Preface

This chapter walks through how Netty's pooled allocator manages memory, covering both the allocation and the release paths.

Related historical articles:

  • Netty source code (2), memory pool primer: the main classes of the Netty memory pool.
  • Netty source code (3), memory allocation (part 1): how Chunk and Subpage allocate memory smaller than 16 MB.
  • Netty FastThreadLocal: Netty's implementation of ThreadLocal.
  • Thread cache: how the PoolThreadCache thread cache in the Netty memory pool improves allocation efficiency.

I. PooledByteBufAllocator

When the PooledByteBufAllocator class is loaded, its static initializer determines some key parameters from configuration: page size (8K), Chunk tree depth (11), and Arena array length (number of cores x 2). The Chunk size (16M) is derived indirectly from the page size and the Chunk tree depth.

// 1. Determine the page size to be 8K
int defaultPageSize = SystemPropertyUtil.getInt("io.netty.allocator.pageSize", 8192);
// ...
DEFAULT_PAGE_SIZE = defaultPageSize;

// 2. Determine the Chunk tree depth 11
int defaultMaxOrder = SystemPropertyUtil.getInt("io.netty.allocator.maxOrder", 11);
// ...
DEFAULT_MAX_ORDER = defaultMaxOrder;

// 3. Chunk memory block size 16M = page size << tree depth
final int defaultChunkSize = DEFAULT_PAGE_SIZE << DEFAULT_MAX_ORDER;

final Runtime runtime = Runtime.getRuntime();
// 4. Arena array length number of cores *2
final int defaultMinNumArena = NettyRuntime.availableProcessors() * 2;
DEFAULT_NUM_HEAP_ARENA = Math.max(0,
                                  SystemPropertyUtil.getInt(
                                      "io.netty.allocator.numHeapArenas",
                                      (int) Math.min(
                                          defaultMinNumArena,
                                          runtime.maxMemory() / defaultChunkSize / 2 / 3)));
DEFAULT_NUM_DIRECT_ARENA = Math.max(0,
                                    SystemPropertyUtil.getInt(
                                        "io.netty.allocator.numDirectArenas",
                                        (int) Math.min(
                                            defaultMinNumArena,
                                            PlatformDependent.maxDirectMemory() / defaultChunkSize / 2 / 3)));

// 5. MPSC queue length of different MemoryRegionCache specifications in PoolThreadCache
DEFAULT_TINY_CACHE_SIZE = SystemPropertyUtil.getInt("io.netty.allocator.tinyCacheSize", 512);
DEFAULT_SMALL_CACHE_SIZE = SystemPropertyUtil.getInt("io.netty.allocator.smallCacheSize", 256);
DEFAULT_NORMAL_CACHE_SIZE = SystemPropertyUtil.getInt("io.netty.allocator.normalCacheSize", 64);
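For reference, a minimal sketch of how these defaults could be tuned through the system properties read above (the property names come straight from the snippet; the chosen values and the metric() printout are only illustrative, and the same properties can of course be set via -D JVM flags instead):

import io.netty.buffer.PooledByteBufAllocator;

public class AllocatorTuning {
    public static void main(String[] args) {
        // Must be set before PooledByteBufAllocator is first loaded,
        // because the static initializer reads the properties only once.
        System.setProperty("io.netty.allocator.pageSize", "8192");
        System.setProperty("io.netty.allocator.maxOrder", "11");       // chunkSize = 8K << 11 = 16M
        System.setProperty("io.netty.allocator.tinyCacheSize", "512");

        // Only now touch the allocator, so its static initializer sees the properties above
        PooledByteBufAllocator allocator = PooledByteBufAllocator.DEFAULT;
        System.out.println(allocator.metric());
    }
}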

The PooledByteBufAllocator constructor assigns some member variables and constructs the PoolArena arrays.

public PooledByteBufAllocator(boolean preferDirect, int nHeapArena, int nDirectArena, int pageSize, int maxOrder,
                              int tinyCacheSize, int smallCacheSize, int normalCacheSize,
                              boolean useCacheForAllThreads, int directMemoryCacheAlignment) {
    super(preferDirect);
    threadCache = new PoolThreadLocalCache(useCacheForAllThreads);
    this.tinyCacheSize = tinyCacheSize;
    this.smallCacheSize = smallCacheSize;
    this.normalCacheSize = normalCacheSize;
    chunkSize = validateAndCalculateChunkSize(pageSize, maxOrder);
    int pageShifts = validateAndCalculatePageShifts(pageSize);
    // Construct the heap memory Arena array
    if (nHeapArena > 0) {
        heapArenas = newArenaArray(nHeapArena);
        for (int i = 0; i < heapArenas.length; i ++) {
            PoolArena.HeapArena arena = new PoolArena.HeapArena(
                this, pageSize, maxOrder, pageShifts, chunkSize, directMemoryCacheAlignment);
            heapArenas[i] = arena;
        }
    }
    // Construct the direct memory Arena array
    if (nDirectArena > 0) {
        directArenas = newArenaArray(nDirectArena);
        for (int i = 0; i < directArenas.length; i ++) {
            PoolArena.DirectArena arena = new PoolArena.DirectArena(
                this, pageSize, maxOrder, pageShifts, chunkSize, directMemoryCacheAlignment);
            directArenas[i] = arena;
        }
    }
}

When PooledByteBufAllocator is constructed, the PoolArena instances are constructed along with it (recall the Arena structure from the earlier articles: the tiny/small Subpage pools plus the qInit to q100 PoolChunkLists).

The Allocator can allocate either a heap buffer or a direct buffer. Here we follow the newDirectBuffer method, which allocates direct memory.

private final PoolThreadLocalCache threadCache;
@Override
protected ByteBuf newDirectBuffer(int initialCapacity, int maxCapacity) {
    // 1. Get the current thread's PoolThreadCache and, from it, the PoolArena
    PoolThreadCache cache = threadCache.get();
    PoolArena<ByteBuffer> directArena = cache.directArena;

    final ByteBuf buf;
    if (directArena != null) {
        // 2. Pooled allocation
        buf = directArena.allocate(cache, initialCapacity, maxCapacity);
    } else {
        // Unpooled allocation
        buf = PlatformDependent.hasUnsafe() ?
            UnsafeByteBufUtil.newUnsafeDirectByteBuf(this, initialCapacity, maxCapacity) :
            new UnpooledDirectByteBuf(this, initialCapacity, maxCapacity);
    }
    // 3. If memory leak detection is configured, wrap the ByteBuf (not covered here)
    return toLeakAwareBuffer(buf);
}

For step 1 in the code above, the PoolThreadCache instance is obtained from the PoolThreadLocalCache thread-local variable. If the current thread has not yet been assigned a PoolThreadCache, PoolThreadLocalCache's initialValue method is triggered, which picks the least used Arena for the current thread.

@Override
protected synchronized PoolThreadCache initialValue() {
    // From the shared heapArenas and directArenas arrays, pick the Arena
    // that is currently bound to the fewest threads as this thread's Arena
    final PoolArena<byte[]> heapArena = leastUsedArena(heapArenas);
    final PoolArena<ByteBuffer> directArena = leastUsedArena(directArenas);

    final Thread current = Thread.currentThread();
    if (useCacheForAllThreads || current instanceof FastThreadLocalThread) {
        // Construct the PoolThreadCache
        final PoolThreadCache cache = new PoolThreadCache(
            heapArena, directArena, tinyCacheSize, smallCacheSize, normalCacheSize,
            DEFAULT_MAX_CACHED_BUFFER_CAPACITY, DEFAULT_CACHE_TRIM_INTERVAL);
        return cache;
    }
    // ...
}
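The "least used" Arena is the one with the fewest threads currently bound to it. A rough sketch of the selection, reconstructed from memory of PooledByteBufAllocator#leastUsedArena in Netty 4.1.x (it assumes each PoolArena keeps a numThreadCaches AtomicInteger counter, so treat names and details as approximate):

// Sketch: pick the Arena that the fewest PoolThreadCaches are bound to.
private <T> PoolArena<T> leastUsedArena(PoolArena<T>[] arenas) {
    if (arenas == null || arenas.length == 0) {
        return null;
    }
    PoolArena<T> minArena = arenas[0];
    for (int i = 1; i < arenas.length; i++) {
        PoolArena<T> arena = arenas[i];
        if (arena.numThreadCaches.get() < minArena.numThreadCaches.get()) {
            minArena = arena;
        }
    }
    return minArena;
}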

II. Main process of pooled memory allocation

PoolArena's allocate method first creates a pooled Buffer by calling PoolArena.DirectArena#newByteBuf.

PooledByteBuf<T> allocate(PoolThreadCache cache, int reqCapacity, int maxCapacity) {
    // Create a pooled ByteBuf
    PooledByteBuf<T> buf = newByteBuf(maxCapacity);
    // Allocate memory for ByteBuf
    allocate(cache, buf, reqCapacity);
    return buf;
}

Next, the allocate method assigns a Chunk and a handle to the PooledByteBuf (recall that the handle encodes the offset within the Chunk memory block). This allocate method is the main memory allocation flow, and the path taken depends on the memory size class.

private void allocate(PoolThreadCache cache, PooledByteBuf<T> buf, final int reqCapacity) {
    // Normalize the requested capacity (tiny sizes to a multiple of 16, others to the next power of 2)
    final int normCapacity = normalizeCapacity(reqCapacity);
    // the requested capacity is less than 8KB
    if (isTinyOrSmall(normCapacity)) {
        int tableIdx;
        PoolSubpage<T>[] table;
        boolean tiny = isTiny(normCapacity);
        // 1. Level 1: try to allocate from the thread cache PoolThreadCache
        // The required capacity is less than 512 BYTES
        if (tiny) {
            // Try to fetch it from PoolThreadCache's MemoryRegionCache
            if (cache.allocateTiny(this, buf, reqCapacity, normCapacity)) {
                // was able to allocate out of the cache so move on
                return;
            }
            tableIdx = tinyIdx(normCapacity);
            table = tinySubpagePools;
        }
        // Requested capacity is at least 512 bytes (Small)
        else {
            // Try to fetch it from PoolThreadCache's MemoryRegionCache
            if (cache.allocateSmall(this, buf, reqCapacity, normCapacity)) {
                return;
            }
            tableIdx = smallIdx(normCapacity);
            table = smallSubpagePools;
        }
        // 2. Level 2: try the tinySubpagePools or smallSubpagePools Subpage pool
        final PoolSubpage<T> head = table[tableIdx];

        synchronized (head) {
            final PoolSubpage<T> s = head.next;
            if (s != head) {
                long handle = s.allocate();
                s.chunk.initBufWithSubpage(buf, null, handle, reqCapacity, cache);
                return;
            }
        }
        // 3. Level 3: fall back to the normal allocation logic
        synchronized (this) {
            allocateNormal(buf, reqCapacity, normCapacity, cache);
        }
        return;
    }
    // The requested capacity is between 8KB and 16MB (Normal)
    if (normCapacity <= chunkSize) {
        // 1. Level 1: try to allocate from the thread cache PoolThreadCache
        if (cache.allocateNormal(this, buf, reqCapacity, normCapacity)) {
            return;
        }
        // 2 Level 2 -- Normal allocation logic
        synchronized (this) {
            allocateNormal(buf, reqCapacity, normCapacity, cache);
            ++allocationsNormal;
        }
    }
    // Requested capacity is greater than 16MB (Huge): allocated without pooling
    else {
        allocateHuge(buf, reqCapacity);
    }
}

For Tiny and Small allocations there are four levels: thread cache -> Subpage pool -> ChunkList -> new Chunk, with the last two inside the allocateNormal method. Allocation from the Subpage pool must lock the head node of the Subpage linked list for the corresponding size, because that list is accessed from multiple threads (PoolChunk#allocateSubpage and PoolChunk#free). ChunkList allocation and new-Chunk allocation, both wrapped in allocateNormal, are likewise locked because the ChunkLists are also accessed from multiple threads.

Normal allocations go through only three levels and skip the Subpage pool, because a Normal allocation is at least one page in size and is therefore served by the Chunk itself rather than by a Subpage.

For Huge memory allocation, only a special Chunk that is not pooled is used to allocate memory.
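From the caller's side, these size classes simply determine which path an allocation takes. A minimal usage sketch (the request sizes are chosen only to land in the different classes; the class boundaries are the ones described above):

import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;

public class PooledAllocationDemo {
    public static void main(String[] args) {
        PooledByteBufAllocator allocator = PooledByteBufAllocator.DEFAULT;

        ByteBuf tiny   = allocator.directBuffer(100);               // < 512B: Tiny, served by a Subpage
        ByteBuf small  = allocator.directBuffer(2 * 1024);          // 512B..8KB: Small, served by a Subpage
        ByteBuf normal = allocator.directBuffer(64 * 1024);         // 8KB..16MB: Normal, served by a Chunk
        ByteBuf huge   = allocator.directBuffer(32 * 1024 * 1024);  // > 16MB: Huge, special unpooled Chunk

        // release() decrements the reference count; when it reaches 0 the memory goes back to the pool
        tiny.release();
        small.release();
        normal.release();
        huge.release();
    }
}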

III. Huge memory allocation

Huge refers to allocations whose normalized size exceeds 16MB, i.e. larger than a normal Chunk. Netty handles these by creating a special Chunk for the request so that the rest of the logic can be reused.

private void allocateHuge(PooledByteBuf<T> buf, int reqCapacity) {
    // 1. Create a special PoolChunk
    PoolChunk<T> chunk = newUnpooledChunk(reqCapacity);
    // 2. Run the init0 method of PooledByteBuf
    buf.initUnpooled(chunk, reqCapacity);
}

1. Create a special Chunk

PoolArena.DirectArena's newUnpooledChunk method directly calls a special constructor of PoolChunk.

@Override
protected PoolChunk<ByteBuffer> newUnpooledChunk(int capacity) {
    // The default direct memory alignment padding is 0
    if (directMemoryCacheAlignment == 0) {
        // Create a Huge ByteBuffer. The normal size is 16MB, but this is a Huge memory that exceeds 16MB
        ByteBuffer byteBuffer = allocateDirect(capacity);
        // Construct a special PoolChunk
        return new PoolChunk<ByteBuffer>(this, byteBuffer, capacity, 0);
    }
    // ... omitted
}

This PoolChunk constructor is dedicated to Huge memory. Many of the key member variables are null, such as the memoryMap tree, and the unpooled flag is set to true.

/** Creates a special chunk that is not pooled. */
PoolChunk(PoolArena<T> arena, T memory, int size, int offset) {
    unpooled = true;
    this.arena = arena;
    this.memory = memory;
    this.offset = offset;
    memoryMap = null;
    depthMap = null;
    subpages = null;
    subpageOverflowMask = 0;
    pageSize = 0;
    pageShifts = 0;
    maxOrder = 0;
    unusable = (byte) (maxOrder + 1);
    chunkSize = size;
    log2ChunkSize = log2(chunkSize);
    maxSubpageAllocs = 0;
    cachedNioBuffers = null;
}

2. Initialize Buffer

PooledByteBuf has two initialization entry points: init is used for Subpage and Chunk allocations, while initUnpooled is used for Huge allocations backed by a special Chunk. Both delegate to init0.

// Tiny/Small (Subpage allocation), Normal(Chunk allocation)
void init(PoolChunk<T> chunk, ByteBuffer nioBuffer,
          long handle, int offset, int length, int maxLength, PoolThreadCache cache) {
    init0(chunk, nioBuffer, handle, offset, length, maxLength, cache);
}
// Huge(special Chunk allocation)
void initUnpooled(PoolChunk<T> chunk, int length) {
    init0(chunk, null, 0, chunk.offset, length, length, null);
}
// The common final entry point for the two methods above
private void init0(PoolChunk<T> chunk, ByteBuffer nioBuffer,
                   long handle, int offset, int length, int maxLength, PoolThreadCache cache) {
    this.chunk = chunk;
    memory = chunk.memory;
    tmpNioBuf = nioBuffer;
    allocator = chunk.arena.parent;
    this.cache = cache;
    this.handle = handle;
    this.offset = offset;
    this.length = length;
    this.maxLength = maxLength;
}

IV. Normal allocation logic

The PoolArena#allocateNormal method implements the normal allocation path, which takes memory from a Chunk. The whole method must run inside synchronized(this) because it touches the qXXX linked lists, which are shared across threads.

private final PoolChunkList<T> q050;
private final PoolChunkList<T> q025;
private final PoolChunkList<T> q000;
private final PoolChunkList<T> qInit;
private final PoolChunkList<T> q075;
private final PoolChunkList<T> q100;
// Method must be called inside synchronized(this) { ... } block
private void allocateNormal(PooledByteBuf<T> buf, int reqCapacity, int normCapacity, PoolThreadCache threadCache) {
    // 1. Try to allocate from an existing PoolChunk in the PoolChunkLists
    if (q050.allocate(buf, reqCapacity, normCapacity, threadCache) ||
        q025.allocate(buf, reqCapacity, normCapacity, threadCache) ||
        q000.allocate(buf, reqCapacity, normCapacity, threadCache) ||
        qInit.allocate(buf, reqCapacity, normCapacity, threadCache) ||
        q075.allocate(buf, reqCapacity, normCapacity, threadCache)) {
        return;
    }
    // 2. Create a new PoolChunk, allocate from it, and add it to qInit
    PoolChunk<T> c = newChunk(pageSize, maxOrder, pageShifts, chunkSize);
    boolean success = c.allocate(buf, reqCapacity, normCapacity, threadCache);
    assert success;
    qInit.add(c);
}

We will focus only on step 1, allocation from a PoolChunkList; step 2, allocation from a newly created PoolChunk, has already been covered.

1. Review PoolChunkList

PoolChunkList instances are distinguished by their Chunk usage range (minUsage and maxUsage). Chunks whose usage falls within the same range are kept in the same PoolChunkList instance, stored as a linked list starting at head. The Arena maintains PoolChunkLists for the different usage ranges (the qXXX fields), connected to each other through each PoolChunkList's prevList and nextList pointers.

In a freshly created Arena, every PoolChunkList's head pointer is null. A newly created Chunk is added to the qInit PoolChunkList, and afterwards moves back and forth among the q000 to q100 lists as its usage fluctuates.

final class PoolChunkList<T> implements PoolChunkListMetric {
    // The Arena this list belongs to
    private final PoolArena<T> arena;
    // Successor PoolChunkList
    private final PoolChunkList<T> nextList;
    // Predecessor PoolChunkList
    private PoolChunkList<T> prevList;
    // Chunk usage lower limit
    private final int minUsage;
    // Chunk usage upper limit
    private final int maxUsage;
    // The upper limit of Chunk allocated memory managed by the current instance (calculated by minUsage)
    private final int maxCapacity;
    // The Chunk header node is initially NULL
    private PoolChunk<T> head;
    // The lower limit of free memory. Chunks less than or equal to this value need to be moved to nextList
    private final int freeMinThreshold;
    // The upper limit of free memory. Chunks larger than this value need to be moved to prevList
    private final int freeMaxThreshold;
}
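For context, here is a sketch of how PoolArena wires these lists together in its constructor, reconstructed from memory of Netty 4.1.x (treat the exact signatures and usage bounds as approximate, they have varied slightly across versions):

// Each list is created with its successor and a (minUsage, maxUsage) range;
// the predecessors are wired afterwards, giving the doubly linked chain
// qInit <-> q000 <-> q025 <-> q050 <-> q075 <-> q100.
q100  = new PoolChunkList<T>(this, null, 100, Integer.MAX_VALUE, chunkSize);
q075  = new PoolChunkList<T>(this, q100, 75, 100, chunkSize);
q050  = new PoolChunkList<T>(this, q075, 50, 100, chunkSize);
q025  = new PoolChunkList<T>(this, q050, 25, 75, chunkSize);
q000  = new PoolChunkList<T>(this, q025, 1, 50, chunkSize);
qInit = new PoolChunkList<T>(this, q000, Integer.MIN_VALUE, 25, chunkSize);

q100.prevList(q075);
q075.prevList(q050);
q050.prevList(q025);
q025.prevList(q000);
q000.prevList(null);   // no predecessor: a Chunk that empties out of q000 gets destroyed
qInit.prevList(qInit); // qInit points to itself, so a lightly used new Chunk is never destroyed from qInit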

2. The allocate method

boolean allocate(PooledByteBuf<T> buf, int reqCapacity, int normCapacity, PoolThreadCache threadCache) {
    // If the normalized requested capacity exceeds this list's maxCapacity, the request is not handled here
    if (normCapacity > maxCapacity) {
        return false;
    }
    // Cursor traverses the list
    for (PoolChunk<T> cur = head; cur != null; cur = cur.next) {
        // Try to allocate memory using PoolChunk corresponding to the current cursor
        if (cur.allocate(buf, reqCapacity, normCapacity, threadCache)) {
            // If the allocation is successful, check whether the Chunk free memory is smaller than the current PoolChunkList threshold
            if (cur.freeBytes <= freeMinThreshold) {
                // If so, remove it from the current PoolChunkList
                remove(cur);
                // Add the PoolChunkList of the next specification
                nextList.add(cur);
            }
            return true;
        }
    }
    return false;
}

PoolChunkList#allocate first checks whether this PoolChunkList's maxCapacity can satisfy the normalized request at all. It then traverses the Chunk list until some Chunk's PoolChunk#allocate call succeeds.

If the allocation succeeds and the Chunk's remaining freeBytes has dropped below the current PoolChunkList's freeMinThreshold, the PoolChunk must be moved to the PoolChunkList of the next usage range.

The PoolChunkList#remove method first removes the list node.

private void remove(PoolChunk<T> cur) {
    if (cur == head) {
        head = cur.next;
        if (head != null) {
            head.prev = null;
        }
    } else {
        PoolChunk<T> next = cur.next;
        cur.prev.next = next;
        if (next != null) {
            next.prev = cur.prev;
        }
    }
}

The PoolChunk is then added to the next-range PoolChunkList via its add method, which eventually calls add0. Each newly added PoolChunk is inserted at the head of the linked list.

void add0(PoolChunk<T> chunk) {
    chunk.parent = this;
    if (head == null) {
        head = chunk;
        chunk.prev = null;
        chunk.next = null;
    } else {
        chunk.prev = null;
        chunk.next = head;
        head.prev = chunk;
        head = chunk;
    }
}

V. Memory release

Memory release starts from AbstractReferenceCountedByteBuf#release: the reference count is decremented, and once it drops to zero, deallocate() is invoked.

// ReferenceCountUpdater implements reference counting
private static final ReferenceCountUpdater<AbstractReferenceCountedByteBuf> updater = new ReferenceCountUpdater<AbstractReferenceCountedByteBuf>() {
    // ...
};
@Override
public boolean release(int decrement) {
    return handleRelease(updater.release(this, decrement));
}

private boolean handleRelease(boolean result) {
    if (result) {
        deallocate();
    }
    return result;
}

PooledByteBuf's deallocate is implemented as follows: the member variables are nulled out, the recycle method returns the PooledByteBuf object to the object pool, and the key step is the call to PoolArena's free method.

@Override
protected final void deallocate() {
    if (handle >= 0) {
        final long handle = this.handle;
        this.handle = -1;
        memory = null;
        chunk.arena.free(chunk, tmpNioBuf, handle, maxLength, cache);
        tmpNioBuf = null;
        chunk = null;
        recycle();
    }
}

private void recycle() {
    recyclerHandle.recycle(this);
}

In PoolArena's free method, if the Chunk is a special Chunk (unpooled == true, i.e. a Huge allocation larger than 16MB), the underlying memory resource (the ByteBuffer or byte array) is reclaimed directly. Otherwise, putting the memory back into the PoolThreadCache thread cache is preferred.

void free(PoolChunk<T> chunk, ByteBuffer nioBuffer, long handle, int normCapacity, PoolThreadCache cache) {
    // Release the underlying ByteBuffer directly
    if (chunk.unpooled) {
        destroyChunk(chunk);
    }
    // Pooled Chunk
    else {
        SizeClass sizeClass = sizeClass(normCapacity);
        // Try to put the memory into the thread cache first
        if (cache != null && cache.add(this, chunk, nioBuffer, handle, normCapacity, sizeClass)) {
            return;
        }
        // Otherwise release it back to the Chunk
        freeChunk(chunk, handle, sizeClass, nioBuffer, false);
    }
}

In the following four cases, the release path eventually reaches PoolChunk's free method:

  • The MPSC queue in the MemoryRegionCache corresponding to PoolThreadCache is full and cannot continue to cache elements.
  • The PoolThreadCache cache succeeded, but the owning thread later goes away: PoolThreadCache lives in a FastThreadLocal, and when FastThreadLocal#remove is called the onRemoval hook reclaims the entire PoolThreadCache.
  • The PoolThreadCache cache succeeded, but the entire PoolThreadCache is later reclaimed via its finalize method.
  • The PoolThreadCache cache succeeded, but once the number of allocations served by the PoolThreadCache reaches 8192, a trim is performed.

Next, go to the free method of PoolChunk, which we talked about earlier.

void free(long handle, ByteBuffer nioBuffer) {
    // The lower 32 bits of handle are the subscripts of a memoryMap
    int memoryMapIdx = memoryMapIdx(handle);
    // The high 32 bits of handle are the bitmap index of the subpage
    int bitmapIdx = bitmapIdx(handle);
    // The bitmap index is not 0, indicating that the memory allocated by Subpage is returned to Subpage
    if (bitmapIdx != 0) {
        // Find the header node corresponding to PoolSubpage
        PoolSubpage<T> subpage = subpages[subpageIdx(memoryMapIdx)];
        PoolSubpage<T> head = arena.findSubpagePoolHead(subpage.elemSize);
        synchronized (head) {
            // Returns true if the Subpage is still in use
            // Returns false if the Subpage has been removed from the Arena's Subpage pool and will not be used again,
            // in which case the current memoryMapIdx must be returned to the Chunk
            if (subpage.free(head, bitmapIdx & 0x3FFFFFFF)) {
                return;
            }
        }
    }
    // If the bitmap index is 0, or returning to the Subpage failed, return the memory to the Chunk's memoryMap

    // Increase the available allocated bytes
    freeBytes += runLength(memoryMapIdx);
    // Set memoryMap[memoryMapIdx] = original value = depth[memoryMapIdx]
    setValue(memoryMapIdx, depth(memoryMapIdx));
    // Update nodes on memoryMap[memoryMapIdx] bottom-up
    updateParentsFree(memoryMapIdx);
    // ByteBuffer is cached for future use, which can be used by resetting the index of the ByteBuffer to reduce GC caused by frequent new objects
    if (nioBuffer != null && cachedNioBuffers != null
            && cachedNioBuffers.size() < PooledByteBufAllocator.DEFAULT_MAX_CACHED_BYTEBUFFERS_PER_CHUNK) {
        cachedNioBuffers.offer(nioBuffer);
    }
}

Next, because the Chunk's amount of allocatable memory has changed, the Chunk may need to move between the PoolChunkLists. The move0 method below is called recursively; since the Chunk's usage has decreased, it may end up at zero, in which case no PoolChunkList accepts it and false is returned.

// PoolChunkList.java
boolean free(PoolChunk<T> chunk, long handle, ByteBuffer nioBuffer) {
    // return it to Subpage or Chunk see above
    chunk.free(handle, nioBuffer);
    // Adjust which PoolChunkList Chunk belongs to
    // If there is no predecessor PoolChunkList, false is returned, meaning the Chunk is no longer in use and can be released
    if (chunk.freeBytes > freeMaxThreshold) {
        remove(chunk);
        return move0(chunk);
    }
    return true;
}

private boolean move0(PoolChunk<T> chunk) {
    // If the predecessor is null, the current list is q000 and the Chunk usage is 0, so the Chunk is no longer needed
    if (prevList == null) {
        assert chunk.usage() == 0;
        return false;
    }
    // Recurse into the predecessor list
    return prevList.move(chunk);
}

private boolean move(PoolChunk<T> chunk) {
    if (chunk.freeBytes > freeMaxThreshold) {
        return move0(chunk);
    }
    // ... 
}

Finally, back in PoolArena#freeChunk: if no PoolChunkList accepts the PoolChunk after its usage change, the Chunk is destroyed and its memory resource reclaimed.

void freeChunk(PoolChunk<T> chunk, long handle, SizeClass sizeClass, ByteBuffer nioBuffer, boolean finalizer) {
    final boolean destroyChunk;
    synchronized (this) {
        // ...
        // PoolChunkList#free
        destroyChunk = !chunk.parent.free(chunk, handle, nioBuffer);
    }
    // If no PoolChunkList accepts the Chunk after the change, the entire Chunk is reclaimed
    if (destroyChunk) {
        destroyChunk(chunk);
    }
}

destroyChunk is an abstract method of PoolArena, implemented by its subclasses. The DirectArena implementation is shown below; under the hood it frees the JDK ByteBuffer's resources.

@Override
protected void destroyChunk(PoolChunk<ByteBuffer> chunk) {
    if (PlatformDependent.useDirectBufferNoCleaner()) {
        PlatformDependent.freeDirectNoCleaner(chunk.memory);
    } else {
        PlatformDependent.freeDirectBuffer(chunk.memory);
    }
}

Conclusion

  • In its static initializer, PooledByteBufAllocator determines some key parameters from configuration, such as page size (8K), Chunk tree depth (11), and Arena array length (number of cores x 2). The Chunk size (16M) is derived from the page size and the Chunk tree depth. The PooledByteBufAllocator constructor builds the PoolArena arrays.

  • When allocating memory, PooledByteBufAllocator obtains the PoolThreadCache instance from the PoolThreadLocalCache thread-local variable. If the current thread has not yet been assigned a PoolThreadCache, PoolThreadLocalCache's initialValue method is triggered and the least used Arena is selected for the current thread.

  • For Tiny and Small allocations there are four levels: thread cache -> Subpage pool -> ChunkList -> new Chunk. Normal allocations skip the Subpage pool, because they are at least one page in size and are served by the Chunk directly rather than by a Subpage. Huge allocations use only a special Chunk that is not pooled.

  • Netty requests memory from the system in Chunks of 16 MB. A Chunk is added to a PoolChunkList and moves between the PoolChunkLists of different usage ranges (the qXXX nodes) as its usage changes. After an allocation from a Chunk, if its remaining free memory falls below the current PoolChunkList's threshold, the Chunk is moved to the next PoolChunkList. Conversely, if the Chunk's free memory grows (for example when page-level memory is returned to it), it may move to the previous PoolChunkList.

  • When a ByteBuf's reference count drops to 0, memory release is triggered. Mirroring the tiers of allocation, memory is freed through four tiers: thread cache -> Subpage pool -> Chunk -> system. Huge allocations are returned directly to the system, and Normal allocations do not pass through the Subpage pool.