“This is the 23rd day of my participation in the Gwen Challenge in November. Check out the details: The Last Gwen Challenge in 2021”

Before reading this article, it is recommended that you first read the related articles in this series.

1. Detailed analysis of the underlying implementation principle of network communication under distributed micro-service architecture (Diagram)

2. After working for 5 years, do you really understand Netty and why you use it? (Deep dry goods)

3. In-depth analysis of the core components of Netty

4. ByteBuf in Netty

5. How to solve the problem of unpacking sticky packets in Netty through a large number of actual cases?

6. Customized message communication protocol based on Netty (protocol design and practical application analysis)

7. The most detailed and complete serialization technology and in-depth analysis and application of the network

8. Implement a basic RPC framework based on Netty

9. (annual pay 60W watershed) Advanced RPC framework implementation based on Netty handwriting (with registry and notes)

The implementation of FastThreadLocal is very similar to that of the JDK's ThreadLocal (java.lang.ThreadLocal).

If you are familiar with ThreadLocal, you will know that it involves several key objects.

  1. Thread
  2. ThreadLocalMap
  3. ThreadLocal

Similarly, Netty provides two important classes tailored specifically for FastThreadLocal: FastThreadLocalThread and InternalThreadLocalMap. Let's take a look at how these two classes are implemented.

If you don’t understand ThreadLocal, you can read this article: How ThreadLocal works

FastThreadLocalThread is a subclass of Thread, and each FastThreadLocalThread holds one InternalThreadLocalMap instance. The performance benefits of FastThreadLocal are available only when FastThreadLocal and FastThreadLocalThread are used together. The definition of FastThreadLocalThread is as follows:

public class FastThreadLocalThread extends Thread {

    private InternalThreadLocalMap threadLocalMap;
    // omit other code
}

You can see that FastThreadLocalThread mainly adds an InternalThreadLocalMap field. We can guess that FastThreadLocalThread stores its data in InternalThreadLocalMap rather than in the ThreadLocalMap of Thread. Therefore, if you want to know the secret of the high performance of FastThreadLocalThread, you must understand the design of InternalThreadLocalMap.

InternalThreadLocalMap

public final class InternalThreadLocalMap extends UnpaddedInternalThreadLocalMap {

    private static final int DEFAULT_ARRAY_LIST_INITIAL_CAPACITY = 8;

    private static final int STRING_BUILDER_INITIAL_SIZE;

    private static final int STRING_BUILDER_MAX_SIZE;

    public static final Object UNSET = new Object();

    private BitSet cleanerFlags;
    private InternalThreadLocalMap() {
        indexedVariables = newIndexedVariableTable();
    }
    private static Object[] newIndexedVariableTable() {
        Object[] array = new Object[INDEXED_VARIABLE_TABLE_INITIAL_SIZE];
        Arrays.fill(array, UNSET);
        return array;
    }
    public static int lastVariableIndex() {
        return nextIndex.get() - 1;
    }

    public static int nextVariableIndex() {
        int index = nextIndex.getAndIncrement();
        if (index < 0) {
            nextIndex.decrementAndGet();
            throw new IllegalStateException("too many thread-local indexed variables");
        }
        return index;
    }
    // other code omitted

}

Like ThreadLocalMap, InternalThreadLocalMap stores its data in an array.

ThreadLocalMap implements a hash table on top of an array and resolves hash collisions with linear probing.

However, instead of using linear probing to resolve hash collisions, InternalThreadLocalMap assigns each FastThreadLocal an array index, index, when the FastThreadLocal is initialized. The index is obtained by calling InternalThreadLocalMap.nextVariableIndex(), which uses an AtomicInteger to guarantee that index values increase in order. When reading and writing data, the index locates the FastThreadLocal's slot in the array directly, so the time complexity is O(1). If the index grows to a very large value, the array becomes correspondingly large, so FastThreadLocal trades space for time to improve read and write performance.
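The idea of handing out a slot number once and then indexing an array directly can be sketched in plain Java. This is only a simplified model under stated assumptions, not Netty's code: the class name IndexedSlots, the initial size of 8, the counter starting at 1, and the growth rule in set() are all hypothetical choices made for illustration.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of index-based storage; not Netty's actual classes.
final class IndexedSlots {
    static final Object UNSET = new Object();

    // Global counter: each "thread-local variable" takes the next free slot number.
    // Starting at 1 is an assumption of this sketch (slot 0 is treated as reserved).
    private static final AtomicInteger NEXT_INDEX = new AtomicInteger(1);

    private Object[] slots = newTable(8);

    static int nextVariableIndex() {
        return NEXT_INDEX.getAndIncrement();
    }

    private static Object[] newTable(int size) {
        Object[] table = new Object[size];
        Arrays.fill(table, UNSET);
        return table;
    }

    // O(1) read: no hashing and no probing, just an array access.
    Object get(int index) {
        return index < slots.length ? slots[index] : UNSET;
    }

    // O(1) write once the array is large enough to contain the index.
    void set(int index, Object value) {
        if (index >= slots.length) {
            int oldLength = slots.length;
            slots = Arrays.copyOf(slots, Math.max(index + 1, oldLength * 2));
            Arrays.fill(slots, oldLength, slots.length, UNSET);
        }
        slots[index] = value;
    }

    public static void main(String[] args) {
        int a = nextVariableIndex();
        int b = nextVariableIndex();
        IndexedSlots map = new IndexedSlots();
        map.set(a, "value1");
        map.set(b, "value2");
        System.out.println(map.get(a) + ", " + map.get(b)); // prints "value1, value2"
    }
}
```

Because every variable owns a distinct slot number, two variables can never collide, which is why no collision handling is needed. The cost is that the array must be at least as long as the largest index in use.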

The following diagram describes the relationship between InternalThreadLocalMap, index, and FastThreadLocal.

What are the differences between FastThreadLocal and ThreadLocal?

FastThreadLocal uses an Object array instead of an Entry array. Object[0] stores a Set<FastThreadLocal<?>>.

Value data is stored directly starting from array subscript 1, instead of using the key-value pair form of ThreadLocal.

Suppose we now have a batch of data to add to the array: value1, value2, value3, and value4. The corresponding FastThreadLocal objects are assigned array indexes 1, 2, 3, and 4 when they are initialized, as shown in the figure below.

At this point, we have a basic understanding of FastThreadLocal. Next, we will analyze how FastThreadLocal is implemented by reading the source code.

Source analysis of FastThreadLocal.set()

Before we look at the source code, let's go back to the ThreadLocal example. What if ThreadLocal were replaced with FastThreadLocal?

import io.netty.util.concurrent.FastThreadLocal;
import io.netty.util.concurrent.FastThreadLocalThread;

public class FastThreadLocalTest {

    private static final FastThreadLocal<String> THREAD_NAME_LOCAL = new FastThreadLocal<>();
    // TradeOrder is the order class used in the ThreadLocal example.
    private static final FastThreadLocal<TradeOrder> TRADE_THREAD_LOCAL = new FastThreadLocal<>();

    public static void main(String[] args) {
        for (int i = 0; i < 2; i++) {
            int tradeId = i;
            String threadName = "thread-" + i;
            new FastThreadLocalThread(() -> {
                THREAD_NAME_LOCAL.set(threadName);
                TradeOrder tradeOrder = new TradeOrder(tradeId, tradeId % 2 == 0 ? "Paid" : "Unpaid");
                TRADE_THREAD_LOCAL.set(tradeOrder);
                System.out.println("threadName: " + THREAD_NAME_LOCAL.get());
                System.out.println("TradeOrder info: " + TRADE_THREAD_LOCAL.get());
            }, threadName).start();
        }
    }
}

As you can see, FastThreadLocal is used almost exactly like ThreadLocal: simply replace Thread and ThreadLocal with FastThreadLocalThread and FastThreadLocal. Netty does a great job on ease of use. Let's focus on the FastThreadLocal.set()/get() methods used in this example.

First, the source code of FastThreadLocal.set():

public final void set(V value) {
    if (value != InternalThreadLocalMap.UNSET) {
        InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.get();
        setKnownNotUnset(threadLocalMap, value);
    } else {
        remove();
    }
}

The FastThreadLocal.set() method is not hard to understand. The process of set() is mainly divided into three steps:

  1. Check whether value is the default object UNSET, and if it is, call the remove() method. We don't yet know what the connection is between the default object and remove(), so we'll leave remove() for last.
  2. If value is not the default object, obtain the current thread's InternalThreadLocalMap.
  3. Replace the corresponding data in InternalThreadLocalMap with the new value.

InternalThreadLocalMap.get()

Start with the InternalThreadLocalMap.get() method:

public static InternalThreadLocalMap get() {
    Thread thread = Thread.currentThread();
    if (thread instanceof FastThreadLocalThread) {
        return fastGet((FastThreadLocalThread) thread);
    } else {
        return slowGet();
    }
}

If the thread instance type is FastThreadLocalThread, fastGet() is called.

The logic of fastGet() is simple.

  1. If the current thread is of type FastThreadLocalThread, the fastGet() method simply reads the threadLocalMap attribute of the FastThreadLocalThread.
  2. If the InternalThreadLocalMap does not exist at this point, one is created and returned.

As described above, the constructor of InternalThreadLocalMap initializes an Object array of length 32 and fills it with 32 references to the default object UNSET.

private static InternalThreadLocalMap fastGet(FastThreadLocalThread thread) {
  InternalThreadLocalMap threadLocalMap = thread.threadLocalMap();
  if (threadLocalMap == null) {
    thread.setThreadLocalMap(threadLocalMap = new InternalThreadLocalMap());
  }
  return threadLocalMap;
}

Otherwise, slowGet() is called. Judging from the code, it is a fallback for calling threads that are not of type FastThreadLocalThread. If the current thread is not a FastThreadLocalThread, it has no InternalThreadLocalMap field. Netty therefore keeps a JDK native ThreadLocal in UnpaddedInternalThreadLocalMap, and that ThreadLocal holds an InternalThreadLocalMap. Retrieving the InternalThreadLocalMap then degrades to a lookup through the JDK's native ThreadLocal.

private static InternalThreadLocalMap slowGet() {
  InternalThreadLocalMap ret = slowThreadLocalMap.get();
  if (ret == null) {
    ret = new InternalThreadLocalMap();
    slowThreadLocalMap.set(ret);
  }
  return ret;
}
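The fast and slow paths can be modeled without Netty to make the difference concrete. The sketch below is an illustration under assumptions: FastSlowLookup, FastThread, and LocalMap are hypothetical stand-ins for InternalThreadLocalMap.get(), FastThreadLocalThread, and InternalThreadLocalMap.

```java
// Hypothetical stand-ins; this is a model of the lookup pattern, not Netty's code.
final class FastSlowLookup {
    static final class LocalMap { }

    // A Thread subclass that carries the map in a plain field.
    static class FastThread extends Thread {
        LocalMap localMap;
        FastThread(Runnable task) { super(task); }
    }

    // Fallback storage for ordinary threads: a regular JDK ThreadLocal.
    private static final ThreadLocal<LocalMap> SLOW = new ThreadLocal<>();

    static LocalMap get() {
        Thread current = Thread.currentThread();
        if (current instanceof FastThread) {
            // Fast path: a single field read on the thread object.
            FastThread fast = (FastThread) current;
            if (fast.localMap == null) {
                fast.localMap = new LocalMap();
            }
            return fast.localMap;
        }
        // Slow path: an extra JDK ThreadLocal lookup to reach the map.
        LocalMap map = SLOW.get();
        if (map == null) {
            map = new LocalMap();
            SLOW.set(map);
        }
        return map;
    }

    public static void main(String[] args) throws InterruptedException {
        LocalMap onMain = get();
        System.out.println(onMain == get()); // prints "true": same thread, same map

        FastThread worker = new FastThread(() -> System.out.println(get() != null));
        worker.start();
        worker.join();
    }
}
```

On an ordinary thread every access pays for the JDK ThreadLocal lookup first and the array lookup second, which is why the slow path cannot beat a plain ThreadLocal.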

setKnownNotUnset

Now that we’re done getting the InternalThreadLocalMap, let’s look at how setKnownNotUnset() adds data to InternalThreadLocalMap.

private void setKnownNotUnset(InternalThreadLocalMap threadLocalMap, V value) {
    if (threadLocalMap.setIndexedVariable(index, value)) {
        addToVariablesToRemove(threadLocalMap, this);
    }
}

setKnownNotUnset() does two main things:

  1. Find the index position in the array and set a new value.
  2. Save the FastThreadLocal object to the Set to be cleaned.

First, let's look at the source of the first step, threadLocalMap.setIndexedVariable():

public boolean setIndexedVariable(int index, Object value) {
    Object[] lookup = indexedVariables;
    if (index < lookup.length) {
        Object oldValue = lookup[index];
        lookup[index] = value;
        return oldValue == UNSET;
    } else {
        expandIndexedVariableTableAndSet(index, value);
        return true;
    }
}

indexedVariables is the array that InternalThreadLocalMap uses to store data. If the array length is greater than the index of the FastThreadLocal, the new value is written directly at the index position, with O(1) time complexity. Before the new value is set, the element previously at the index position is fetched; if that old element is still the default object UNSET, the method returns true.

What if the array runs out of capacity? InternalThreadLocalMap expands it automatically and then sets the value. Next, look at the expansion logic in expandIndexedVariableTableAndSet():

private void expandIndexedVariableTableAndSet(int index, Object value) {
    Object[] oldArray = indexedVariables;
    final int oldCapacity = oldArray.length;
    int newCapacity = index;
    newCapacity |= newCapacity >>>  1;
    newCapacity |= newCapacity >>>  2;
    newCapacity |= newCapacity >>>  4;
    newCapacity |= newCapacity >>>  8;
    newCapacity |= newCapacity >>> 16;
    newCapacity ++;

    Object[] newArray = Arrays.copyOf(oldArray, newCapacity);
    Arrays.fill(newArray, oldCapacity, newArray.length, UNSET);
    newArray[index] = value;
    indexedVariables = newArray;
}

As you can see, InternalThreadLocalMap expands its array in almost the same way as HashMap, so reading source code can be very instructive. InternalThreadLocalMap computes the new capacity from index, rounding up to a power of 2. It then copies the contents of the original array into the new array, fills the remaining slots with the default object UNSET, stores the value at index, and finally assigns the new array to indexedVariables.
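To see what the shift-and-or sequence produces, the capacity calculation can be pulled out into a standalone method. The class name CapacityDemo and the method name capacityFor are hypothetical; the arithmetic is the same sequence of steps as in the expansion code shown above.

```java
// Hypothetical helper that isolates the capacity arithmetic from the expansion code.
final class CapacityDemo {
    // Smears the highest set bit of index into all lower bits, then adds one.
    // The result is the smallest power of two strictly greater than index,
    // so newArray[index] is always a valid position.
    static int capacityFor(int index) {
        int newCapacity = index;
        newCapacity |= newCapacity >>> 1;
        newCapacity |= newCapacity >>> 2;
        newCapacity |= newCapacity >>> 4;
        newCapacity |= newCapacity >>> 8;
        newCapacity |= newCapacity >>> 16;
        newCapacity++;
        return newCapacity;
    }

    public static void main(String[] args) {
        System.out.println(capacityFor(32)); // prints 64
        System.out.println(capacityFor(70)); // prints 128
    }
}
```

For index = 70 (binary 1000110) the or-steps yield 1111111 (127), and the final increment gives 128. Note that an index that is already a power of two, such as 64, also maps to the next one (128) rather than to itself.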

Thinking about the expansion base

Question: why does InternalThreadLocalMap expand based on index instead of the original array size?

Suppose 70 FastThreadLocal objects have been initialized but none of them has called set() yet, so the array still has its default length of 32. When the FastThreadLocal with index = 70 calls set(), doubling the original capacity of 32 gives only 64, which still cannot hold the data at index = 70. Using index as the base solves this problem, but if there are too many FastThreadLocal objects, the array also becomes very large.

Back to the main flow of setKnownNotUnset(), after adding data to InternalThreadLocalMap, the next step is to save the FastThreadLocal object into the Set to be cleaned. Let’s look at how addToVariablesToRemove() is implemented:

addToVariablesToRemove

private static void addToVariablesToRemove(InternalThreadLocalMap threadLocalMap, FastThreadLocal<?> variable) {
    Object v = threadLocalMap.indexedVariable(variablesToRemoveIndex);
    Set<FastThreadLocal<?>> variablesToRemove;
    if (v == InternalThreadLocalMap.UNSET || v == null) {
        variablesToRemove = Collections.newSetFromMap(new IdentityHashMap<FastThreadLocal<?>, Boolean>());
        threadLocalMap.setIndexedVariable(variablesToRemoveIndex, variablesToRemove);
    } else {
        variablesToRemove = (Set<FastThreadLocal<?>>) v;
    }
    variablesToRemove.add(variable);
}

variablesToRemoveIndex is a static final variable that is assigned a value of 0 when FastThreadLocal is initialized. InternalThreadLocalMap first finds the element with subscript 0 in the array.

  1. If the element is the default object UNSET or does not exist, a Set of FastThreadLocal objects is created and stored at array subscript 0.
  2. If the first element of the array is not the default object UNSET, the Set has already been created and is reused. This explains why the value data of InternalThreadLocalMap is stored from subscript 1: subscript 0 is already occupied by the Set.
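The Set in the code above is built with Collections.newSetFromMap over an IdentityHashMap, which means membership is decided by reference identity (==) rather than by equals(). The short sketch below shows that behavior using plain strings as stand-in keys; the class name IdentitySetDemo and the helper newIdentitySet are hypothetical.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical demo of the identity-based Set construction used for the cleanup Set.
final class IdentitySetDemo {
    static Set<Object> newIdentitySet() {
        return Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
    }

    public static void main(String[] args) {
        Set<Object> set = newIdentitySet();
        String a = new String("key");
        String b = new String("key"); // equals(a) is true, but it is a different object

        set.add(a);
        set.add(a); // same reference: ignored
        set.add(b); // different reference: stored as a separate element

        System.out.println(set.size()); // prints 2
    }
}
```

With an identity-based Set, each FastThreadLocal instance is tracked exactly once per thread, no matter how equals() or hashCode() behave on the stored objects.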

Think about Set design

Consider: why does InternalThreadLocalMap store a FastThreadLocal Set at array subscript 0? Now let’s go back to the remove() method.

public final void remove(InternalThreadLocalMap threadLocalMap) {
  if (threadLocalMap == null) {
    return;
  }

  Object v = threadLocalMap.removeIndexedVariable(index);
  removeFromVariablesToRemove(threadLocalMap, this);

  if (v != InternalThreadLocalMap.UNSET) {
    try {
      onRemoval((V) v);
    } catch (Exception e) {
      PlatformDependent.throwException(e);
    }
  }
}

Before the remove operation is performed, InternalThreadLocalMap.getIfSet() is called to obtain the current InternalThreadLocalMap.

With that in mind, understanding the getIfSet() method is pretty straightforward.

  1. If the current thread is a FastThreadLocalThread, the threadLocalMap attribute of the FastThreadLocalThread is used directly.
  2. If it is an ordinary Thread, the map is obtained from slowThreadLocalMap, which is of type ThreadLocal.

After the InternalThreadLocalMap is found, it locates the element at the index position in the array and overwrites that element with the default object UNSET.

The next step is to clean up the current FastThreadLocal object. This is where the Set comes in: InternalThreadLocalMap retrieves the Set at subscript 0 and deletes the current FastThreadLocal from it. Finally, what does the onRemoval() method do? Netty only provides it as an extension point and leaves its implementation empty. Users can override it if they need to perform follow-up work when a value is removed.

Source analysis of FastThreadLocal.get()

Now let's look at the source code of FastThreadLocal.get():

public final V get() {
    InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.get();
    Object v = threadLocalMap.indexedVariable(index);
    if (v != InternalThreadLocalMap.UNSET) {
        return (V) v;
    }

    return initialize(threadLocalMap);
}

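First, the InternalThreadLocalMap is obtained according to whether the current thread is a FastThreadLocalThread. Then the element at the index position is read from the array. If it is not the default object UNSET, the position has already been filled with data, and the value is returned directly.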

public Object indexedVariable(int index) {
  Object[] lookup = indexedVariables;
  return index < lookup.length ? lookup[index] : UNSET;
}

If the element at index is the default object UNSET, it needs to be initialized. As you can see, the initialize() method calls the user-overridden initialValue() method to construct the object data that needs to be stored.

private V initialize(InternalThreadLocalMap threadLocalMap) {
    V v = null;
    try {
        v = initialValue();
    } catch (Exception e) {
        PlatformDependent.throwException(e);
    }

    threadLocalMap.setIndexedVariable(index, v);
    addToVariablesToRemove(threadLocalMap, this);
    return v;
}

A user can override the initialValue() method as follows.

private final FastThreadLocal<String> threadLocal = new FastThreadLocal<String>() {
  @Override
  protected String initialValue() {
    return "hello world";
  }
};

Once the user object data is constructed, it is stored in the array at the index position, and the current FastThreadLocal object is saved into the Set to be cleaned. The whole process was covered in the analysis of FastThreadLocal.set() and won't be repeated here.

So far, the two core FastThreadLocal methods, set() and get(), have been analyzed. There are two questions worth thinking about a little more.

  1. Is FastThreadLocal really faster than ThreadLocal? Not necessarily. It is faster only when used on FastThreadLocalThread threads; on ordinary threads it is actually slower.
  2. Does FastThreadLocal waste a lot of space? FastThreadLocal does trade space for time, but the design assumes from the start that there will not be too many FastThreadLocal objects. In addition, unused elements in the array only hold references to the same default object, so they do not take up much memory.

Copyright notice: unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 4.0. When reprinting, please credit the source: Mic to take you to learn structure.