Preface

Netty uses its own thread-local implementation, FastThreadLocal, in many places. This chapter looks at the advantages FastThreadLocal has over the JDK's ThreadLocal.

I. ThreadLocal

What is the problem with ThreadLocal? Why did Netty re-implement it?

1. Data structure

ThreadLocal actually uses ThreadLocalMap to store the mapping between ThreadLocal instances and user variables.

public void set(T value) {
    // Get the current thread's threadLocals member variable, i.e. its ThreadLocalMap
    Thread t = Thread.currentThread();
    ThreadLocalMap map = getMap(t);
    if (map != null)
        // Pass in the current ThreadLocal instance and the user's value
        map.set(this, value);
    else
        // First use on this thread: create the map with this first entry
        createMap(t, value);
}

Internally, ThreadLocalMap is a hash table that maps ThreadLocal instances to user variables. Any hash table will run into hash collisions, which add overhead to inserts and lookups.

There are four common ways to resolve hash collisions:

  • Open addressing: compute index = hash(key). If that slot is taken, compute the next index as index = fun(index), and keep applying fun until a free slot is found.
  • Rehashing: compute index = hash1(key). If that slot is taken, try index = hash2(key), and so on through n hash functions until there is no collision.
  • Separate chaining: each slot of the hash table is the head node of a linked list. Compute index = hash(key); on a collision the entry joins the list at that index, and lookups walk the list starting from its head node.
  • Public overflow area: the hash table is divided into a base table and an overflow table. Every element that collides in the base table is placed in the overflow table.

ThreadLocalMap uses open addressing: when a collision occurs, it tries the next index.

private void set(ThreadLocal<?> key, Object value) {
    // The hash table
    Entry[] tab = table;
    int len = tab.length;
    // ThreadLocal instance hash code & (table length - 1) gives the table index
    int i = key.threadLocalHashCode & (len - 1);
    // Open addressing resolves hash collisions
    // nextIndex(i, len) is i + 1, wrapping around to 0 at the end of the table
    for (Entry e = tab[i]; e != null; e = tab[i = nextIndex(i, len)]) {
        ThreadLocal<?> k = e.get();
        // Found an entry whose ThreadLocal is the one being inserted: overwrite its value
        if (k == key) {
            e.value = value;
            return;
        }
        // The ThreadLocal key of an Entry is held through a weak reference;
        // once no strong reference remains, it is reclaimed during GC
        if (k == null) {
            // Replace the stale entry (and clean up other stale entries)
            replaceStaleEntry(key, value, i);
            return;
        }
    }
    // No entry for this ThreadLocal exists in the hash table,
    // but a free slot was found: store the entry there
    tab[i] = new Entry(key, value);
    int sz = ++size;
    // If no stale slots were cleaned and the size reached the threshold, rehash
    if (!cleanSomeSlots(i, sz) && sz >= threshold)
        rehash();
}

As you can see, ThreadLocalMap does a lot of work for a simple set operation: it resolves hash collisions, rehashes when the table needs to expand, and handles entries whose weakly referenced ThreadLocal key has been reclaimed.
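
To see the cost of probing in isolation, here is a minimal sketch of an open-addressing table with linear probing. LinearProbingSketch and its put/get signatures are hypothetical and much simpler than ThreadLocalMap: the caller passes the hash explicitly, and there is no resizing and no stale-entry cleanup.

```java
/** Hypothetical, simplified open-addressing table; not the JDK's ThreadLocalMap. */
class LinearProbingSketch {
    private final Object[] keys;
    private final Object[] values;

    /** capacity must be a power of two so that (hash & (capacity - 1)) is a valid index. */
    LinearProbingSketch(int capacity) {
        keys = new Object[capacity];
        values = new Object[capacity];
    }

    /** Stores the pair and returns how many slots were examined. Assumes the table never fills up. */
    int put(Object key, int hash, Object value) {
        int len = keys.length;
        int i = hash & (len - 1);
        int probes = 1;
        while (keys[i] != null && keys[i] != key) {
            i = (i + 1 < len) ? i + 1 : 0; // next slot, wrapping around
            probes++;
        }
        keys[i] = key;
        values[i] = value;
        return probes;
    }

    /** Returns the stored value, or null if the key is absent. */
    Object get(Object key, int hash) {
        int len = keys.length;
        int i = hash & (len - 1);
        while (keys[i] != null) {
            if (keys[i] == key) {
                return values[i];
            }
            i = (i + 1 < len) ? i + 1 : 0;
        }
        return null;
    }

    public static void main(String[] args) {
        LinearProbingSketch table = new LinearProbingSketch(8);
        Object a = new Object(), b = new Object(), c = new Object();
        // All three hashes map to slot 3 in a table of length 8.
        System.out.println(table.put(a, 3, "A"));  // prints 1
        System.out.println(table.put(b, 11, "B")); // prints 2
        System.out.println(table.put(c, 19, "C")); // prints 3
    }
}
```

Each colliding key costs one more probe than the previous one, and a lookup has to repeat the same walk. This is the overhead that an array indexed by a unique ID avoids.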

2. Boundary conditions

Another problem with ThreadLocal is that robustness requires it to handle some boundary conditions. The most obvious one is that the key of each hash table entry is a weak reference to the ThreadLocal.

Why is the key a weak reference? The Javadoc provides the answer.

To help deal with very large and long-lived usages, the hash table entries use WeakReferences for keys. However, since reference queues are not used, stale entries are guaranteed to be removed only when the table starts running out of space.

In other words, entries hold their keys weakly to cope with very large and long-lived usages. However, because no reference queue is used, stale entries are only guaranteed to be removed when the table starts running out of space.

The question is, when does the table count as running out of space, so that stale entries are removed?

The replaceStaleEntry and cleanSomeSlots calls in the set method above both contain logic for cleaning up stale entries, which I won’t go over here. Look at the get method instead.

public T get() {
    Thread t = Thread.currentThread();
    ThreadLocalMap map = getMap(t);
    if (map != null) {
        // Find the Entry corresponding to this ThreadLocal in the hash table
        ThreadLocalMap.Entry e = map.getEntry(this);
        if (e != null) {
            return (T) e.value;
        }
    }
    return setInitialValue();
}

private Entry getEntry(ThreadLocal<?> key) {
    int i = key.threadLocalHashCode & (table.length - 1);
    Entry e = table[i];
    if (e != null && e.get() == key)
        return e;
    else
        // e == null, or a hash collision left a different key here (e.get() != key)
        return getEntryAfterMiss(key, i, e);
}

private Entry getEntryAfterMiss(ThreadLocal<?> key, int i, Entry e) {
    Entry[] tab = table;
    int len = tab.length;
    // Probe the hash table using open addressing
    while (e != null) {
        ThreadLocal<?> k = e.get();
        if (k == key)
            // Found the entry
            return e;
        if (k == null)
            // The weakly referenced ThreadLocal was reclaimed: run the stale-entry cleanup
            expungeStaleEntry(i);
        else
            // i = i + 1, wrapping around at the end of the table
            i = nextIndex(i, len);
        e = tab[i];
    }
    return null;
}

As you can see, if the lookup at index threadLocalHashCode & (len - 1) does not find the entry for this ThreadLocal, getEntryAfterMiss is called. If an Entry's key turns out to be null during the probe, meaning the ThreadLocal instance has been reclaimed by GC, the expungeStaleEntry cleanup logic has to run.
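
The following sketch shows what a stale entry looks like. The Entry class has the same shape as ThreadLocalMap.Entry (weak key, strong value), but everything else is a hypothetical simplification: clear() stands in for the garbage collector so that the result is deterministic, and expunge only empties the stale slots, whereas the JDK's expungeStaleEntry also re-places the entries that follow.

```java
import java.lang.ref.WeakReference;

/** Hypothetical sketch of stale entries in a table with weak keys. */
class StaleEntrySketch {
    /** Weak key, strong value: the same shape as ThreadLocalMap.Entry. */
    static final class Entry extends WeakReference<Object> {
        Object value;

        Entry(Object key, Object value) {
            super(key);
            this.value = value;
        }
    }

    /** An entry is stale when its key is gone but the entry still occupies a slot. */
    static boolean isStale(Entry e) {
        return e.get() == null;
    }

    /** Empties every stale slot and returns how many were emptied. */
    static int expunge(Entry[] table) {
        int removed = 0;
        for (int i = 0; i < table.length; i++) {
            Entry e = table[i];
            if (e != null && isStale(e)) {
                e.value = null; // release the value so it can be collected
                table[i] = null;
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Object key = new Object();
        Entry[] table = { new Entry(key, "payload"), null };
        table[0].clear(); // simulate the GC clearing the weak key
        System.out.println(isStale(table[0])); // prints true
        System.out.println(table[0].value);    // prints payload (still strongly held)
        System.out.println(expunge(table));    // prints 1
    }
}
```

Until something walks the table and expunges the slot, the value stays reachable even though its key is gone, which is why get and set both carry cleanup logic.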

3. Netty’s improvements to ThreadLocal

To address the above two problems, Netty has made the following improvements:

  • The data structure is a plain array instead of a hash table. Each FastThreadLocal is assigned a unique ID when it is instantiated, and that ID is its index into the array in InternalThreadLocalMap. Since there is no hash table, there are no hash collisions, and expanding the underlying container does not involve a rehash.
  • InternalThreadLocalMap has no concept of an Entry, and the FastThreadLocal itself is not stored in the array as a weak reference. With no weak references, there are no stale entries to clean up.

II. FastThreadLocal

1. FastThreadLocalThread

To improve on ThreadLocal, FastThreadLocalThread extends Thread and adds a member variable of type InternalThreadLocalMap, which holds the current thread's thread variables in place of the JDK's ThreadLocalMap.

public class FastThreadLocalThread extends Thread {
    private InternalThreadLocalMap threadLocalMap;

    public final InternalThreadLocalMap threadLocalMap() {
        return threadLocalMap;
    }

    public final void setThreadLocalMap(InternalThreadLocalMap threadLocalMap) {
        this.threadLocalMap = threadLocalMap;
    }
}

2. UnpaddedInternalThreadLocalMap

UnpaddedInternalThreadLocalMap is the parent class of InternalThreadLocalMap.

class UnpaddedInternalThreadLocalMap {
    // Fallback that uses the JDK's ThreadLocal
    static final ThreadLocal<InternalThreadLocalMap> slowThreadLocalMap = new ThreadLocal<InternalThreadLocalMap>();
    // Counter used to assign IDs (array indices) to FastThreadLocal instances
    static final AtomicInteger nextIndex = new AtomicInteger();
    // A plain array, instead of the hash table used in ThreadLocalMap
    Object[] indexedVariables;

    // Other thread variables used by Netty itself; these are not stored in indexedVariables
    int futureListenerStackDepth;
    int localChannelReaderStackDepth;
    Map<Class<?>, Boolean> handlerSharableCache;
    IntegerHolder counterHashCode;
    ThreadLocalRandom random;
    Map<Class<?>, TypeParameterMatcher> typeParameterMatcherGetCache;
    Map<Class<?>, Map<String, TypeParameterMatcher>> typeParameterMatcherFindCache;
    StringBuilder stringBuilder;
    Map<Charset, CharsetEncoder> charsetEncoderCache;
    Map<Charset, CharsetDecoder> charsetDecoderCache;
    ArrayList<Object> arrayList;

    // The constructor receives the array of thread variables
    UnpaddedInternalThreadLocalMap(Object[] indexedVariables) {
        this.indexedVariables = indexedVariables;
    }
}

UnpaddedInternalThreadLocalMap holds several important member variables:

  • slowThreadLocalMap: clients of FastThreadLocal cannot always avoid ordinary threads (threads that are not a FastThreadLocalThread). In that case the implementation degrades to a JDK ThreadLocal that stores the InternalThreadLocalMap instance, so it behaves the same as a plain ThreadLocal.
  • nextIndex: an atomic counter that generates an ID for each FastThreadLocal instance. The ID is the instance's index into the indexedVariables array of InternalThreadLocalMap.
  • indexedVariables: the array itself. Each FastThreadLocal holds an index generated by the nextIndex counter, and the element at that index is the value the client stored.
  • Others: all remaining fields are thread variables needed by the Netty framework itself. Why are they not stored in indexedVariables as well? If they were, every thread variable the framework adds would occupy one slot of indexedVariables, so either 1) the initial capacity of indexedVariables would have to be adjusted each time, or 2) if the initial capacity were left alone, the array would be more likely to expand at runtime.
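
The role of the nextIndex counter can be sketched in a few lines. IndexedHandle is a hypothetical stand-in for a FastThreadLocal-style handle, and the overflow guard is an assumption of this sketch rather than something taken from the listing above.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a FastThreadLocal-style handle. */
class IndexedHandle {
    // Shared by all handles, playing the role of nextIndex.
    private static final AtomicInteger NEXT_INDEX = new AtomicInteger();

    /** Unique per handle; used directly as an array index. */
    final int index;

    IndexedHandle() {
        int i = NEXT_INDEX.getAndIncrement();
        if (i < 0) { // the counter wrapped around (guard assumed for this sketch)
            NEXT_INDEX.decrementAndGet();
            throw new IllegalStateException("too many indexed variables");
        }
        index = i;
    }

    public static void main(String[] args) {
        IndexedHandle a = new IndexedHandle();
        IndexedHandle b = new IndexedHandle();
        Object[] slots = new Object[32]; // one such array would exist per thread
        slots[a.index] = "value of a";
        slots[b.index] = "value of b";
        // Lookups are plain array reads: no hashing, no probing.
        System.out.println(slots[a.index] + ", " + slots[b.index]);
    }
}
```

Because the counter is global while the array is per thread, a given handle uses the same index in every thread's array.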

3. InternalThreadLocalMap

Member variables

public final class InternalThreadLocalMap extends UnpaddedInternalThreadLocalMap {
	// Initialize the array size
    private static final int INDEXED_VARIABLE_TABLE_INITIAL_SIZE = 32;
    // a constant that marks the free position in the array
    public static final Object UNSET = new Object();
}

Constructor

private InternalThreadLocalMap() {
    super(newIndexedVariableTable());
}

private static Object[] newIndexedVariableTable() {
    Object[] array = new Object[INDEXED_VARIABLE_TABLE_INITIAL_SIZE];
    Arrays.fill(array, UNSET);
    return array;
}

The constructor is private, and an InternalThreadLocalMap is created lazily the first time it is obtained from outside (through the get method). The constructor creates an Object array of length 32, fills it with the UNSET object, and passes it to the constructor of the parent class UnpaddedInternalThreadLocalMap.
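
One reason to fill the array with a sentinel object rather than leave it null is that a slot can then distinguish "never set" from "set to null". The sketch below illustrates this with a hypothetical UnsetSentinelSketch class; its set method returns true when the slot was previously unset.

```java
import java.util.Arrays;

/** Hypothetical sketch of a slot array that uses a sentinel for empty slots. */
class UnsetSentinelSketch {
    /** Marks a slot that has never been set; compared by identity. */
    static final Object UNSET = new Object();

    final Object[] slots = new Object[4];

    UnsetSentinelSketch() {
        Arrays.fill(slots, UNSET);
    }

    boolean isSet(int index) {
        return slots[index] != UNSET;
    }

    /** Stores the value and returns true if the slot was previously UNSET. */
    boolean set(int index, Object value) {
        Object old = slots[index];
        slots[index] = value;
        return old == UNSET;
    }

    public static void main(String[] args) {
        UnsetSentinelSketch map = new UnsetSentinelSketch();
        System.out.println(map.isSet(0));     // prints false
        System.out.println(map.set(0, null)); // prints true: first set, even though the value is null
        System.out.println(map.isSet(0));     // prints true
    }
}
```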

Getting the InternalThreadLocalMap

Like ThreadLocal, FastThreadLocal needs to retrieve the map that belongs to the current thread, so InternalThreadLocalMap provides a static method that returns the container holding the current thread’s elements.

public static InternalThreadLocalMap get() {
    Thread thread = Thread.currentThread();
    if (thread instanceof FastThreadLocalThread) {
        return fastGet((FastThreadLocalThread) thread);
    } else {
        return slowGet();
    }
}

If the current thread is a FastThreadLocalThread, the fastGet method is used. Otherwise the slowGet method is used.

The fastGet method is shown below. It reads the InternalThreadLocalMap from the FastThreadLocalThread; if it is null, it creates one and stores it in the thread’s member variable.

private static InternalThreadLocalMap fastGet(FastThreadLocalThread thread) {
    InternalThreadLocalMap threadLocalMap = thread.threadLocalMap();
    if (threadLocalMap == null) {
        thread.setThreadLocalMap(threadLocalMap = new InternalThreadLocalMap());
    }
    return threadLocalMap;
}


The slowGet method is shown below. It uses the ThreadLocal member variable of UnpaddedInternalThreadLocalMap to store the InternalThreadLocalMap instance, so that an ordinary Thread can also use FastThreadLocal. It is a compatibility measure.

private static InternalThreadLocalMap slowGet() {
    ThreadLocal<InternalThreadLocalMap> slowThreadLocalMap = UnpaddedInternalThreadLocalMap.slowThreadLocalMap;
    InternalThreadLocalMap ret = slowThreadLocalMap.get();
    if (ret == null) {
        ret = new InternalThreadLocalMap();
        slowThreadLocalMap.set(ret);
    }
    return ret;
}

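
The two lookup paths can be exercised without Netty on the classpath. The sketch below re-creates the dispatch with hypothetical classes (SlotMap standing in for InternalThreadLocalMap, FastThread for FastThreadLocalThread); it only illustrates the idea and is not Netty's code.

```java
/** Entry point of the sketch; all class names here are hypothetical. */
class FastPathSketch {
    /** Starts a thread of the requested kind and returns the map it obtained. */
    static SlotMap mapSeenBy(boolean fast) {
        SlotMap[] seen = new SlotMap[1];
        Runnable task = () -> seen[0] = SlotMap.get();
        Thread t = fast ? new FastThread(task) : new Thread(task);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(mapSeenBy(true) != mapSeenBy(false)); // prints true: one map per thread
        System.out.println(SlotMap.get() == SlotMap.get());      // prints true: same thread, same map
    }
}

/** Stand-in for InternalThreadLocalMap. */
class SlotMap {
    // Fallback storage for threads that are not FastThread instances.
    private static final ThreadLocal<SlotMap> SLOW = new ThreadLocal<>();

    final Object[] slots = new Object[32];

    static SlotMap get() {
        Thread t = Thread.currentThread();
        if (t instanceof FastThread) {
            // Fast path: the map is a plain field on the thread object.
            FastThread ft = (FastThread) t;
            if (ft.map == null) {
                ft.map = new SlotMap();
            }
            return ft.map;
        }
        // Slow path: one JDK ThreadLocal lookup to find the map.
        SlotMap map = SLOW.get();
        if (map == null) {
            map = new SlotMap();
            SLOW.set(map);
        }
        return map;
    }
}

/** Stand-in for FastThreadLocalThread. */
class FastThread extends Thread {
    SlotMap map;

    FastThread(Runnable task) {
        super(task);
    }
}
```

On the slow path the hash lookup is paid once per access to find the map, after which every variable is still a direct array read.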

Storing elements

public boolean setIndexedVariable(int index, Object value) {
    Object[] lookup = indexedVariables;
    if (index < lookup.length) {
        // index is within the array: store the value and report whether the slot was previously UNSET
        Object oldValue = lookup[index];
        lookup[index] = value;
        return oldValue == UNSET;
    } else {
        // index is beyond the array capacity: expand, then store
        expandIndexedVariableTableAndSet(index, value);
        return true;
    }
}

The method below expands the array and stores the element. Note that the new capacity is the smallest power of two strictly greater than index. For example, index = 32 gives a capacity of 64, index = 64 gives 128, and index = 65 also gives 128.

private void expandIndexedVariableTableAndSet(int index, Object value) {
    Object[] oldArray = indexedVariables;
    final int oldCapacity = oldArray.length;
    // Calculate the capacity expansion
    int newCapacity = index;
    newCapacity |= newCapacity >>>  1;
    newCapacity |= newCapacity >>>  2;
    newCapacity |= newCapacity >>>  4;
    newCapacity |= newCapacity >>>  8;
    newCapacity |= newCapacity >>> 16;
    newCapacity ++;
    // Array copy
    Object[] newArray = Arrays.copyOf(oldArray, newCapacity);
    // the UNSET flag fills
    Arrays.fill(newArray, oldCapacity, newArray.length, UNSET);
    // Save the element
    newArray[index] = value;
    indexedVariables = newArray;
}

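
The capacity calculation can be checked on its own. The sketch below copies the bit-smearing steps into a hypothetical helper, newCapacity, and prints the three cases mentioned above; it assumes 0 <= index < 2^30, since larger values overflow int.

```java
/** Hypothetical helper that isolates the capacity calculation. */
class CapacitySketch {
    /** Smallest power of two strictly greater than index, for 0 <= index < 2^30. */
    static int newCapacity(int index) {
        int n = index;
        // Copy the highest set bit into every lower position...
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        // ...so that adding one carries into the next power of two.
        return n + 1;
    }

    public static void main(String[] args) {
        for (int index : new int[] {32, 64, 65}) {
            System.out.println(index + " -> " + newCapacity(index));
        }
        // prints:
        // 32 -> 64
        // 64 -> 128
        // 65 -> 128
    }
}
```

For a positive index the result equals Integer.highestOneBit(index) << 1, which may be an easier way to read the intent.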

Querying elements

public Object indexedVariable(int index) {
    Object[] lookup = indexedVariables;
    return index < lookup.length? lookup[index] : UNSET;
}


4. FastThreadLocal

Member variables

public class FastThreadLocal<V> {
    // Class variable: the index of the slot that stores the set of all FastThreadLocal instances
    private static final int variablesToRemoveIndex = InternalThreadLocalMap.nextVariableIndex();
    // Each FastThreadLocal is assigned a unique ID when it is created;
    // the ID is its index into the indexedVariables array of UnpaddedInternalThreadLocalMap
    private final int index;
}


Constructor

The FastThreadLocal constructor uses the UnpaddedInternalThreadLocalMap#nextIndex atomic counter to allocate an index into the Object array of InternalThreadLocalMap. Subsequent get/set calls use that index.

public FastThreadLocal() {
    index = InternalThreadLocalMap.nextVariableIndex();
}


set

In a nutshell, the set method simply stores the value at the index location assigned to the Object array.

public final void set(V value) {
    if (value != InternalThreadLocalMap.UNSET) {
        // Get the InternalThreadLocalMap of the current thread
        InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.get();
        // Store the value
        setKnownNotUnset(threadLocalMap, value);
    } else {
        // If value is UNSET, remove this variable from the current thread's map
        remove();
    }
}

private void setKnownNotUnset(InternalThreadLocalMap threadLocalMap, V value) {
    // setIndexedVariable returns true when the slot was previously UNSET, i.e. on the first set
    if (threadLocalMap.setIndexedVariable(index, value)) {
        // Add this FastThreadLocal to the set stored at the variablesToRemoveIndex slot,
        // so that a static method can clean up all FastThreadLocals of the current thread in one call
        // (implementation omitted here)
        addToVariablesToRemove(threadLocalMap, this);
    }
}
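
The listing leaves out how the tracked set is used. One way such bookkeeping could work is sketched below; RemoveAllSketch, TRACKING_SLOT and both methods are hypothetical, the slot array is passed in explicitly instead of being looked up per thread, and the set stores indices rather than FastThreadLocal instances, so this is an illustration of the idea and not Netty's implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: a reserved slot remembers which other slots are in use. */
class RemoveAllSketch {
    static final Object UNSET = new Object();
    /** Reserved slot, playing the role of variablesToRemoveIndex. Callers use indices >= 1. */
    static final int TRACKING_SLOT = 0;

    static Object[] newSlots() {
        Object[] slots = new Object[8];
        Arrays.fill(slots, UNSET);
        return slots;
    }

    /** Stores a value; on the first set of a slot, records its index in the tracking set. */
    @SuppressWarnings("unchecked")
    static void set(Object[] slots, int index, Object value) {
        boolean firstSet = slots[index] == UNSET;
        slots[index] = value;
        if (firstSet) {
            Object tracked = slots[TRACKING_SLOT];
            if (tracked == UNSET) {
                tracked = new HashSet<Integer>();
                slots[TRACKING_SLOT] = tracked;
            }
            ((Set<Integer>) tracked).add(index);
        }
    }

    /** Resets every tracked slot to UNSET and returns how many were reset. */
    @SuppressWarnings("unchecked")
    static int removeAll(Object[] slots) {
        Object tracked = slots[TRACKING_SLOT];
        if (tracked == UNSET) {
            return 0;
        }
        int removed = 0;
        for (int index : (Set<Integer>) tracked) {
            slots[index] = UNSET;
            removed++;
        }
        slots[TRACKING_SLOT] = UNSET;
        return removed;
    }

    public static void main(String[] args) {
        Object[] slots = newSlots();
        set(slots, 1, "a");
        set(slots, 2, "b");
        set(slots, 1, "c"); // overwrite: slot 1 is already tracked
        System.out.println(removeAll(slots));  // prints 2
        System.out.println(slots[1] == UNSET); // prints true
    }
}
```

Tracking only on the first set is why the boolean result of setIndexedVariable matters: repeated sets of the same variable do not touch the set again.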

get

The get method reads the object at the index position of the Object array and returns it if it is not UNSET. Otherwise, initialize computes and stores an initial value. As with ThreadLocal, the client supplies that initial value by overriding initialValue.

public final V get(InternalThreadLocalMap threadLocalMap) {
    Object v = threadLocalMap.indexedVariable(index);
    if (v != InternalThreadLocalMap.UNSET) {
        return (V) v;
    }
    return initialize(threadLocalMap);
}
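
The get-or-initialize pattern can be sketched as follows. LazyInitSketch is hypothetical: it takes the initial value from a Supplier argument instead of an overridable method, and it counts how often the supplier runs so that the laziness is visible.

```java
import java.util.Arrays;
import java.util.function.Supplier;

/** Hypothetical sketch of lazy initialization on first get. */
class LazyInitSketch {
    static final Object UNSET = new Object();

    final Object[] slots = new Object[4];
    /** How many times an initial value has been computed. */
    int initCalls;

    LazyInitSketch() {
        Arrays.fill(slots, UNSET);
    }

    /** Returns the stored value, computing and storing it first if the slot is UNSET. */
    @SuppressWarnings("unchecked")
    <V> V get(int index, Supplier<V> initialValue) {
        Object v = slots[index];
        if (v != UNSET) {
            return (V) v;
        }
        V init = initialValue.get();
        initCalls++;
        slots[index] = init; // even a null initial value marks the slot as set
        return init;
    }

    public static void main(String[] args) {
        LazyInitSketch map = new LazyInitSketch();
        Supplier<String> hello = () -> "hello";
        System.out.println(map.get(1, hello)); // prints hello
        System.out.println(map.get(1, hello)); // prints hello
        System.out.println(map.initCalls);     // prints 1
    }
}
```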

III. A simple implementation

Leaving aside all boundary conditions and compatibility with ordinary Threads, FastThreadLocal can be reduced to the following:

import java.util.concurrent.atomic.AtomicInteger;

public class DemoFastThread extends Thread {
    private DemoThreadLocalMap threadLocalMap;

    public DemoFastThread(Runnable target) {
        super(target);
        this.threadLocalMap = new DemoThreadLocalMap();
    }

    public DemoThreadLocalMap getThreadLocalMap() {
        return threadLocalMap;
    }

    public void setThreadLocalMap(DemoThreadLocalMap threadLocalMap) {
        this.threadLocalMap = threadLocalMap;
    }
}

class DemoThreadLocalMap {
    // Fixed capacity: this demo never expands the array
    Object[] objs = new Object[32];
    // Global counter that hands out array indices
    static final AtomicInteger nextIndex = new AtomicInteger();

    public Object getObj(int index) {
        return objs[index];
    }

    public void setObj(int index, Object obj) {
        this.objs[index] = obj;
    }
}

class DemoFastThreadLocal<V> {
    private final int index;

    public DemoFastThreadLocal() {
        this.index = DemoThreadLocalMap.nextIndex.getAndIncrement();
    }

    // Only works when called on a DemoFastThread
    @SuppressWarnings("unchecked")
    public final V get() {
        DemoThreadLocalMap threadLocalMap = ((DemoFastThread) Thread.currentThread()).getThreadLocalMap();
        Object v = threadLocalMap.getObj(index);
        if (v == null) {
            // null doubles as "not set" here, unlike the UNSET sentinel in Netty
            v = initialValue();
            threadLocalMap.setObj(index, v);
        }
        return (V) v;
    }

    public final void set(V v) {
        DemoThreadLocalMap threadLocalMap = ((DemoFastThread) Thread.currentThread()).getThreadLocalMap();
        threadLocalMap.setObj(index, v);
    }

    protected V initialValue() {
        return null;
    }
}