1 Background and principle of FastThreadLocal

Why would Netty create FastThreadLocal when the JDK already provides ThreadLocal? Where does FastThreadLocal get its speed?

The answer starts with the JDK ThreadLocal itself, as shown in the diagram below:

Each Java thread has a ThreadLocalMap instance variable. (The map is created lazily: it does not exist until the thread accesses a ThreadLocal variable for the first time.)

This map uses linear probing to resolve hash collisions. If the target slot is occupied, the map keeps trying the following slots until it finds a free one and inserts the entry there. This hurts efficiency when hash collisions are frequent.
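To make the cost of linear probing concrete, here is a minimal sketch of an open-addressing table. It is an illustration only, not the JDK's ThreadLocalMap (which also handles stale entries and resizing); the names ProbingTable and LinearProbingDemo are hypothetical.

```java
// Simplified open-addressing table with linear probing (illustrative only).
class LinearProbingDemo {
    public static void main(String[] args) {
        ProbingTable table = new ProbingTable(8);
        // Keys 1, 9 and 17 all map to slot 1 in a table of 8 slots,
        // so each later insert has to inspect more slots.
        System.out.println("probes for key 1:  " + table.put(1, "a"));
        System.out.println("probes for key 9:  " + table.put(9, "b"));
        System.out.println("probes for key 17: " + table.put(17, "c"));
    }
}

final class ProbingTable {
    private final int[] keys;
    private final Object[] values;
    private final boolean[] used;

    ProbingTable(int capacity) {
        if (Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        keys = new int[capacity];
        values = new Object[capacity];
        used = new boolean[capacity];
    }

    /** Inserts or overwrites a key and returns how many slots were inspected. */
    int put(int key, Object value) {
        int mask = keys.length - 1;
        int slot = key & mask;
        for (int probes = 1; probes <= keys.length; probes++) {
            if (!used[slot] || keys[slot] == key) {
                used[slot] = true;
                keys[slot] = key;
                values[slot] = value;
                return probes;
            }
            slot = (slot + 1) & mask; // collision: try the next slot
        }
        throw new IllegalStateException("table is full");
    }

    /** Returns the stored value, or null if the key is absent. */
    Object get(int key) {
        int mask = keys.length - 1;
        int slot = key & mask;
        for (int probes = 0; probes < keys.length && used[slot]; probes++) {
            if (keys[slot] == key) {
                return values[slot];
            }
            slot = (slot + 1) & mask;
        }
        return null;
    }
}
```

The number of inspected slots grows with every colliding key, which is the overhead an index-based array avoids.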

FastThreadLocal (FTL) uses an array directly and avoids hash collisions by assigning an array index to each FastThreadLocal instance. Index allocation is implemented with an AtomicInteger, so every FastThreadLocal obtains a unique index.
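The idea can be sketched in a few lines. This is a simplified illustration under the assumption that each thread owns one plain Object[]; IndexedSlot and IndexedSlotDemo are hypothetical names, not Netty classes.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Each "variable" gets a unique index once; reads and writes are plain array accesses.
class IndexedSlotDemo {
    public static void main(String[] args) {
        IndexedSlot<String> name = new IndexedSlot<>();
        IndexedSlot<Integer> count = new IndexedSlot<>();
        // Stands in for the array owned by a single thread.
        Object[] perThreadArray = new Object[Math.max(name.index, count.index) + 1];
        name.set(perThreadArray, "netty");
        count.set(perThreadArray, 42);
        System.out.println(name.get(perThreadArray) + " " + count.get(perThreadArray));
    }
}

final class IndexedSlot<V> {
    // Shared counter: every instance takes the next free index.
    private static final AtomicInteger NEXT_INDEX = new AtomicInteger();

    final int index = NEXT_INDEX.getAndIncrement();

    void set(Object[] slots, V value) {
        slots[index] = value;
    }

    @SuppressWarnings("unchecked")
    V get(Object[] slots) {
        return (V) slots[index]; // no hashing, no probing
    }
}
```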

When ftl.get() is called, the value is returned directly from the array, in effect return array[index], as shown below:

2 Source code analysis of the implementation

As can be seen from the above diagram, the implementation of FTL involves InternalThreadLocalMap, FastThreadLocalThread and FastThreadLocal classes. From the bottom up, we will start the analysis from InternalThreadLocalMap.

The inheritance diagram of the InternalThreadLocalMap class is as follows:

2.1 UnpaddedInternalThreadLocalMap main properties

static final ThreadLocal<InternalThreadLocalMap> slowThreadLocalMap = new ThreadLocal<InternalThreadLocalMap>();
static final AtomicInteger nextIndex = new AtomicInteger();
Object[] indexedVariables;


The indexedVariables array stores FTL values and is accessed directly by index. nextIndex assigns an index to each FTL instance when the instance is created, and slowThreadLocalMap is used when the current thread is not a FastThreadLocalThread (FTLT).

2.2 InternalThreadLocalMap analysis

The main attributes of InternalThreadLocalMap are:

public static final Object UNSET = new Object();

/**
 * BitSet's default underlying data structure is a long[] array with an initial
 * length of 1 (only long[0]); each long holds 64 bits.
 * bitSet.set(1) sets the second bit of long[0] to true (0000 ... 0010), so
 * long[0] == 2 and bitSet.get(1) returns true. bitSet.set(64) marks index 64
 * in the same way. Conceptually a BitSet is a mapping {index: boolean}.
 *
 * Used to prevent a FastThreadLocal from starting the cleaner thread more than once.
 */
private BitSet cleanerFlags;

private InternalThreadLocalMap() {
    super(newIndexedVariableTable());
}

private static Object[] newIndexedVariableTable() {
    Object[] array = new Object[32];
    Arrays.fill(array, UNSET);
    return array;
}

The newIndexedVariableTable() method creates an array of length 32, fills it with UNSET, and passes it to the parent class. FTL values are then stored in this array.

Note that the array stores the variable values directly rather than entries, unlike the JDK ThreadLocal. The analysis of InternalThreadLocalMap stops here for now; its other methods are covered later together with FTL.
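A short sketch of why a sentinel object is useful here. The reasoning is an assumption on my part rather than something the source states: a dedicated UNSET marker lets the table tell "never initialized" apart from "explicitly set to null". SlotTable and UnsetSentinelDemo are hypothetical names.

```java
import java.util.Arrays;

class UnsetSentinelDemo {
    public static void main(String[] args) {
        SlotTable table = new SlotTable(4);
        System.out.println("before set: isSet = " + table.isSet(2));
        table.set(2, null); // null is a legitimate value
        System.out.println("after set(null): isSet = " + table.isSet(2));
    }
}

final class SlotTable {
    // Unique marker object; compared by identity, never by equals().
    static final Object UNSET = new Object();

    private final Object[] slots;

    SlotTable(int capacity) {
        slots = new Object[capacity];
        Arrays.fill(slots, UNSET);
    }

    void set(int index, Object value) {
        slots[index] = value;
    }

    /** Returns the stored value, or UNSET if the slot was never written or is out of range. */
    Object get(int index) {
        return index < slots.length ? slots[index] : UNSET;
    }

    boolean isSet(int index) {
        return get(index) != UNSET;
    }
}
```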

2.3 Implementation analysis of FastThreadLocalThread (FTLT)

To benefit from FTL's performance, FTL must be used together with FastThreadLocalThread (FTLT); otherwise it degrades to JDK ThreadLocal. FTLT is relatively simple, and the key code is as follows:

public class FastThreadLocalThread extends Thread {
    // This will be set to true if we have a chance to wrap the Runnable.
    private final boolean cleanupFastThreadLocals;

    private InternalThreadLocalMap threadLocalMap;

    public final InternalThreadLocalMap threadLocalMap() {
        return threadLocalMap;
    }

    public final void setThreadLocalMap(InternalThreadLocalMap threadLocalMap) {
        this.threadLocalMap = threadLocalMap;
    }
}

The key to FTLT is the threadLocalMap field: FTLT extends the Java Thread class and holds its own InternalThreadLocalMap. When an FTL variable is accessed on an FTLT thread, the value is obtained directly from that InternalThreadLocalMap.

2.4 FTL implementation analysis

The FTL analysis is based on Netty 4.1.34. The version is stated explicitly because in this version the ObjectCleaner call in the cleanup code has been commented out, which differs from earlier versions.

2.4.1 FTL attributes and instantiation
private final int index;

public FastThreadLocal() {
    index = InternalThreadLocalMap.nextVariableIndex();
}


The index attribute is assigned by a static method of InternalThreadLocalMap:

public static int nextVariableIndex() {
    int index = nextIndex.getAndIncrement();
    if (index < 0) {
        nextIndex.decrementAndGet();
        throw new IllegalStateException("too many thread-local indexed variables");
    }
    return index;
}


As can be seen, each FTL instance obtains its index from a sequence that increases by 1, which keeps the indices dense so the array in InternalThreadLocalMap does not have to grow abruptly.
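The index < 0 check guards against integer overflow of the counter. The following sketch reproduces the same pattern with a configurable start value so the overflow can be observed quickly; IndexAllocator and its constructor parameter are hypothetical and exist only for this demonstration.

```java
import java.util.concurrent.atomic.AtomicInteger;

class IndexAllocatorDemo {
    public static void main(String[] args) {
        // Start just below the limit so the overflow is reached after two calls.
        IndexAllocator allocator = new IndexAllocator(Integer.MAX_VALUE - 1);
        System.out.println(allocator.next());
        System.out.println(allocator.next());
        try {
            allocator.next();
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}

final class IndexAllocator {
    private final AtomicInteger nextIndex;

    IndexAllocator(int start) {
        nextIndex = new AtomicInteger(start);
    }

    int next() {
        int index = nextIndex.getAndIncrement();
        if (index < 0) {
            // The counter wrapped around to a negative value: undo the
            // increment so it stays negative, then reject the request.
            nextIndex.decrementAndGet();
            throw new IllegalStateException("too many indexed variables");
        }
        return index;
    }
}
```

Once the counter has wrapped, every later call fails as well, because the decrement keeps the counter at a negative value.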

2.4.2 get() method implementation analysis
public final V get() {
    InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.get(); // 1
    Object v = threadLocalMap.indexedVariable(index); // 2
    if (v != InternalThreadLocalMap.UNSET) {
        return (V) v;
    }

    V value = initialize(threadLocalMap); // 3
    registerCleaner(threadLocalMap); // 4
    return value;
}

1. The InternalThreadLocalMap.get() method obtains the threadLocalMap:

// ======================= InternalThreadLocalMap =======================
public static InternalThreadLocalMap get() {
    Thread thread = Thread.currentThread();
    if (thread instanceof FastThreadLocalThread) {
        return fastGet((FastThreadLocalThread) thread);
    } else {
        return slowGet();
    }
}

private static InternalThreadLocalMap fastGet(FastThreadLocalThread thread) {
    InternalThreadLocalMap threadLocalMap = thread.threadLocalMap();
    if (threadLocalMap == null) {
        thread.setThreadLocalMap(threadLocalMap = new InternalThreadLocalMap());
    }
    return threadLocalMap;
}

Because FastThreadLocal must be used together with FastThreadLocalThread to realize its performance advantage, the focus here is on the fastGet method. It reads threadLocalMap directly from the FTLT thread; if the map is null, it creates an InternalThreadLocalMap instance, sets it on the thread, and returns it.

2. threadLocalMap.indexedVariable(index) simply reads the value from the array and returns it:

public Object indexedVariable(int index) {
    Object[] lookup = indexedVariables;
    return index < lookup.length ? lookup[index] : UNSET;
}


3. If the value obtained is not UNSET, it is a valid value and is returned directly. If it is UNSET, it is initialized.

The initialize(threadLocalMap) method:

  private V initialize(InternalThreadLocalMap threadLocalMap) {
        V v = null;
        try {
            v = initialValue();
        } catch (Exception e) {
            PlatformDependent.throwException(e);
        }

        threadLocalMap.setIndexedVariable(index, v); // 3-1
        addToVariablesToRemove(threadLocalMap, this); // 3-2
        return v;
    }


3-1. Obtain the FTL's initial value and save it into the array in InternalThreadLocalMap. If the array is too short for the index, the array is expanded first and the value is then saved; otherwise the value is saved directly.

3-2. addToVariablesToRemove(threadLocalMap, this) saves the FTL instance into a Set that is stored in element 0 of the threadLocalMap array.

The code is not posted here. (Diagram: the Set of FTL instances held in element 0 of the array.)
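Step 3-1 (expand the array only when the index does not fit) can be sketched as follows. This is a simplified illustration, not Netty's source: GrowableSlots is a hypothetical class, the initial capacity of 4 is chosen for the demo, and growing to the next power of two above the index is an assumption of this sketch.

```java
import java.util.Arrays;

class GrowableSlotsDemo {
    public static void main(String[] args) {
        GrowableSlots slots = new GrowableSlots();
        slots.set(1, "fits");       // index < capacity: stored directly
        slots.set(10, "expanded");  // index >= capacity: array grows first
        System.out.println(slots.get(10) + ", capacity = " + slots.capacity());
    }
}

final class GrowableSlots {
    static final Object UNSET = new Object();

    private Object[] slots = newTable(4);

    private static Object[] newTable(int capacity) {
        Object[] table = new Object[capacity];
        Arrays.fill(table, UNSET);
        return table;
    }

    void set(int index, Object value) {
        if (index >= slots.length) {
            expand(index);
        }
        slots[index] = value;
    }

    private void expand(int index) {
        int oldCapacity = slots.length;
        // Smallest power of two strictly greater than index
        // (this sketch does not handle indexes of 2^30 or more).
        int newCapacity = Integer.highestOneBit(index) << 1;
        Object[] grown = Arrays.copyOf(slots, newCapacity);
        // Only the newly added tail needs the UNSET marker.
        Arrays.fill(grown, oldCapacity, newCapacity, UNSET);
        slots = grown;
    }

    Object get(int index) {
        return index < slots.length ? slots[index] : UNSET;
    }

    int capacity() {
        return slots.length;
    }
}
```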

4. registerCleaner(threadLocalMap), as it appears in the Netty 4.1.34 source:

private void registerCleaner(final InternalThreadLocalMap threadLocalMap) {
    Thread current = Thread.currentThread();
    if (FastThreadLocalThread.willCleanupFastThreadLocals(current) || threadLocalMap.isCleanerFlagSet(index)) {
        return;
    }

    threadLocalMap.setCleanerFlag(index);

    // TODO: We need to find a better way to handle this.
    /*
    // We will need to ensure we will trigger remove(InternalThreadLocalMap) so everything will be released
    // and FastThreadLocal.onRemoval(...) will be called.
    ObjectCleaner.register(current, new Runnable() {
        @Override
        public void run() {
            remove(threadLocalMap);

            // It's fine to not call InternalThreadLocalMap.remove() here as this will only be triggered once
            // the Thread is collected by GC. In this case the ThreadLocal will be gone away already.
        }
    });
    */
}

Because the call to ObjectCleaner.register has been commented out in this version and the remaining logic is simple, it is not analyzed further.
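The role of the cleaner flag can still be illustrated with the JDK's BitSet alone. The sketch below shows a lazily created BitSet used as a "do this at most once per index" guard, and also checks the bit layout described in the BitSet comment earlier. CleanerFlags and markOnce are hypothetical names for this illustration, not Netty's API.

```java
import java.util.BitSet;

class CleanerFlagDemo {
    public static void main(String[] args) {
        CleanerFlags flags = new CleanerFlags();
        System.out.println("first registration for index 3:  " + flags.markOnce(3));
        System.out.println("second registration for index 3: " + flags.markOnce(3));

        BitSet bits = new BitSet();
        bits.set(1); // second bit of the first long
        System.out.println("first word after set(1): " + bits.toLongArray()[0]);
    }
}

final class CleanerFlags {
    private BitSet flags; // created only when the first flag is set

    boolean isSet(int index) {
        return flags != null && flags.get(index);
    }

    void set(int index) {
        if (flags == null) {
            flags = new BitSet();
        }
        flags.set(index);
    }

    /** Returns true only the first time it is called for a given index. */
    boolean markOnce(int index) {
        if (isSet(index)) {
            return false;
        }
        set(index);
        return true;
    }
}
```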

2.5 Performance degradation of common threads using FTL

With get() analyzed, set(value) works on the same principle, so it is not analyzed separately here for reasons of space.

If FTL is not used together with FTLT, it degrades to JDK ThreadLocal, because ordinary threads do not contain an InternalThreadLocalMap.

InternalThreadLocalMap.get():

// ======================= InternalThreadLocalMap =======================
public static InternalThreadLocalMap get() {
    Thread thread = Thread.currentThread();
    if (thread instanceof FastThreadLocalThread) {
        return fastGet((FastThreadLocalThread) thread);
    } else {
        return slowGet();
    }
}

private static InternalThreadLocalMap slowGet() {
    // Get the InternalThreadLocalMap from this JDK ThreadLocal
    ThreadLocal<InternalThreadLocalMap> slowThreadLocalMap = UnpaddedInternalThreadLocalMap.slowThreadLocalMap;
    InternalThreadLocalMap ret = slowThreadLocalMap.get();
    if (ret == null) {
        ret = new InternalThreadLocalMap();
        slowThreadLocalMap.set(ret);
    }
    return ret;
}

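On this path the InternalThreadLocalMap is first obtained from a JDK ThreadLocal variable, and only then is the value at the specified array index read from that InternalThreadLocalMap. The extra JDK ThreadLocal lookup is what makes ordinary threads slower.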
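The two paths can be reproduced with JDK classes only. In this sketch, MapHoldingThread plays the role of FTLT and SlotMap the role of InternalThreadLocalMap; both are hypothetical stand-ins, and the code is an illustration of the dispatch rather than Netty's implementation.

```java
class FastSlowLookupDemo {
    public static void main(String[] args) throws InterruptedException {
        Runnable report = () -> System.out.println(
                Thread.currentThread().getClass().getSimpleName()
                        + " uses fast path: " + SlotMap.usesFastPath());
        Thread special = new MapHoldingThread(report);
        Thread ordinary = new Thread(report);
        special.start();
        special.join();
        ordinary.start();
        ordinary.join();
    }
}

// Thread subclass that carries its own map in a plain field.
class MapHoldingThread extends Thread {
    SlotMap map; // only read and written by the thread itself

    MapHoldingThread(Runnable task) {
        super(task);
    }
}

final class SlotMap {
    // Fallback storage for threads that are not MapHoldingThread.
    private static final ThreadLocal<SlotMap> SLOW = new ThreadLocal<>();

    final Object[] slots = new Object[32];

    static boolean usesFastPath() {
        return Thread.currentThread() instanceof MapHoldingThread;
    }

    static SlotMap get() {
        Thread thread = Thread.currentThread();
        if (thread instanceof MapHoldingThread) {
            // Fast path: one field read on the thread object.
            MapHoldingThread holder = (MapHoldingThread) thread;
            if (holder.map == null) {
                holder.map = new SlotMap();
            }
            return holder.map;
        }
        // Slow path: goes through the JDK ThreadLocal hash lookup first.
        SlotMap map = SLOW.get();
        if (map == null) {
            map = new SlotMap();
            SLOW.set(map);
        }
        return map;
    }
}
```

Either way each thread ends up with exactly one SlotMap; the difference is only in how that map is located.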

3 Resource recovery mechanism of FTL

Netty provides three reclamation mechanisms for FTL:

Automatic: Use FTLT to execute a Runnable task wrapped by FastThreadLocalRunnable. After the task is executed, FTL is cleared automatically.

Manual: Both FTL and InternalThreadLocalMap provide a remove method, which the user can call at an appropriate time to delete values explicitly (and sometimes must, for example when an ordinary thread pool uses FTL).

Automatic: Register a Cleaner for each FTL of the current thread. When the thread object is no longer strongly reachable, the Cleaner thread reclaims that thread's FTL values. (Netty recommends using the other two methods where possible, because this one needs an extra thread, consumes resources, and introduces contention between threads. In Netty 4.1.34 the code that calls ObjectCleaner has been commented out.)
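Why cleanup matters for pooled threads, and how wrapping a task achieves it, can be shown with a plain JDK ThreadLocal. This is a sketch of the general pattern under the assumption that cleanup runs in a finally block after the task; CleanupDemo, withCleanup and secondTaskSees are hypothetical names, and the code does not use Netty's classes.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CleanupDemo {
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    /** Wraps a task so the thread-local is removed when the task ends, even if it throws. */
    static Runnable withCleanup(Runnable task) {
        return () -> {
            try {
                task.run();
            } finally {
                CONTEXT.remove();
            }
        };
    }

    /** Runs two tasks on the same pooled thread and returns what the second task observes. */
    static String secondTaskSees(boolean cleanup) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable first = () -> CONTEXT.set("left over from task 1");
            pool.submit(cleanup ? withCleanup(first) : first).get();
            Callable<String> readContext = () -> CONTEXT.get();
            return pool.submit(readContext).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("without cleanup: " + secondTaskSees(false));
        System.out.println("with cleanup:    " + secondTaskSees(true));
    }
}
```

Without the wrapper, the value set by the first task is still visible to the next task on the same worker thread, which is why a thread pool of ordinary threads needs an explicit remove call.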

4 FTL usage in Netty

The most important use of FTL in Netty is in allocating ByteBuf. The basic approach is that each thread is assigned a block of memory (a PoolArena); when the thread needs to allocate a ByteBuf, it allocates from its own PoolArena first and falls back to global allocation only if that fails.

However, because memory is limited, several threads may still share the same PoolArena. Even so, this approach minimizes contention between threads and improves the efficiency of the program.

The code is in PoolThreadLocalCache, an inner class of PooledByteBufAllocator:

final class PoolThreadLocalCache extends FastThreadLocal<PoolThreadCache> {

    @Override
    protected synchronized PoolThreadCache initialValue() {
        final PoolArena<byte[]> heapArena = leastUsedArena(heapArenas);
        final PoolArena<ByteBuffer> directArena = leastUsedArena(directArenas);

        Thread current = Thread.currentThread();
        if (useCacheForAllThreads || current instanceof FastThreadLocalThread) {
            // PoolThreadCache encapsulates the memory blocks held by an individual thread
            return new PoolThreadCache(
                    heapArena, directArena, tinyCacheSize, smallCacheSize, normalCacheSize,
                    DEFAULT_MAX_CACHED_BUFFER_CAPACITY, DEFAULT_CACHE_TRIM_INTERVAL);
        }
        // No caching so just use 0 as sizes.
        return new PoolThreadCache(heapArena, directArena, 0, 0, 0, 0, 0);
    }
}
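The "least used arena" idea, where each new thread is bound to the arena currently shared by the fewest threads, can be sketched with JDK classes. Arena, ArenaPool and the counter name threadCaches are hypothetical, and a JDK ThreadLocal stands in for FTL; this illustrates the selection strategy and is not Netty's code.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ArenaSelectionDemo {
    public static void main(String[] args) throws InterruptedException {
        ArenaPool pool = new ArenaPool(2);
        for (int i = 0; i < 3; i++) {
            Thread worker = new Thread(() -> System.out.println(
                    Thread.currentThread().getName() + " -> " + pool.arenaForCurrentThread().name));
            worker.start();
            worker.join(); // run one at a time so the demo is deterministic
        }
    }
}

final class Arena {
    final String name;
    // How many threads are currently bound to this arena.
    final AtomicInteger threadCaches = new AtomicInteger();

    Arena(String name) {
        this.name = name;
    }
}

final class ArenaPool {
    private final Arena[] arenas;
    private final ThreadLocal<Arena> perThread;

    ArenaPool(int arenaCount) {
        arenas = new Arena[arenaCount];
        for (int i = 0; i < arenaCount; i++) {
            arenas[i] = new Arena("arena-" + i);
        }
        perThread = ThreadLocal.withInitial(this::bindLeastUsed);
    }

    // synchronized so two threads initializing at once do not both pick the same arena
    private synchronized Arena bindLeastUsed() {
        Arena min = arenas[0];
        for (int i = 1; i < arenas.length; i++) {
            if (arenas[i].threadCaches.get() < min.threadCaches.get()) {
                min = arenas[i];
            }
        }
        min.threadCaches.incrementAndGet();
        return min;
    }

    /** The first call on a thread binds it to an arena; later calls return the same one. */
    Arena arenaForCurrentThread() {
        return perThread.get();
    }
}
```

With two arenas and three threads, the third thread has to share an arena with an earlier one, which matches the point above that several threads can hold the same PoolArena.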