Preface
Netty owes much of its popularity to its ease of use and high performance. As a communication framework, its first requirement is high I/O performance. Many readers know that Netty uses Direct Memory to reduce memory copies between kernel space and user space and to speed up I/O. However, frequently requesting Direct Memory from the system and releasing it after use is itself a performance problem. To solve this, Netty implements its own memory management mechanism. When requesting memory, Netty asks the operating system for a large chunk at a time, manages that chunk itself, and hands out pieces of it on demand. When memory is released, Netty does not rush to return it to the operating system but reclaims it for later use. This memory management mechanism handles not only Direct Memory but also Heap Memory.
End consumer of memory — ByteBuf
Here I want to stress that ByteBuf and memory are two different concepts that should be understood separately. A ByteBuf is an object; it must be given a chunk of memory before it can work. Memory can be loosely understood as memory obtained from the operating system, though it needs a carrier: heap memory is backed by byte[], and Direct Memory is backed by NIO's ByteBuffer (Java's ability to use Direct Memory is provided by the NIO package in the JDK). This distinction matters because Netty's memory pool (memory management mechanism) is responsible for allocating and reclaiming memory, whereas recycling ByteBuf objects is a separate technique called object pooling (implemented by Recycler). Although the two are often used together, they are independent mechanisms. It may happen that a ByteBuf is reused from the object pool while its memory is newly requested from the operating system, or that a ByteBuf is newly created while its memory comes from the pool. This is because a creation process can be broken down into three steps (a short usage example follows the list):
- Obtain a ByteBuf instance (either newly created or taken from the object pool)
- Request memory from Netty's memory management mechanism (either newly requested from the operating system or previously reclaimed)
- Assign the obtained memory to the ByteBuf
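All three steps are hidden behind a single call on the allocator. As a minimal usage sketch (PooledByteBufAllocator and the methods used below are Netty's public API; the 256-byte capacity is just an example):

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;

import java.nio.charset.StandardCharsets;

public class PooledAllocationExample {
    public static void main(String[] args) {
        // The pooled allocator is the user-facing entry point; heapBuffer() and
        // directBuffer() both go through the machinery described in this article.
        PooledByteBufAllocator allocator = PooledByteBufAllocator.DEFAULT;

        ByteBuf buf = allocator.directBuffer(256); // request a small buffer
        try {
            buf.writeBytes("hello".getBytes(StandardCharsets.UTF_8));
        } finally {
            // release() hands the memory back to the pool (possibly via the
            // thread-local cache) instead of freeing it to the operating system.
            buf.release();
        }
    }
}
```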
This article focuses only on the memory management mechanism, so it will not go into the object recycling mechanism.
Netty memory management related classes
Netty has many classes related to memory management. Internally, the framework provides PoolArena, PoolChunkList, PoolChunk, PoolSubpage and others to manage a block or a group of blocks of memory. Externally, it exposes ByteBufAllocator for users. Below, we introduce these classes and walk through memory allocation and reclamation starting from ByteBufAllocator. For the sake of length and readability, this article does not dive into every line of code but illustrates the process with the necessary pieces. For commented code, see my Netty project on GitHub.
PoolChunk – the smallest unit of memory Netty requests from the OS
As mentioned above, Netty requests a large chunk of memory at a time to avoid frequently asking the operating system for memory. This chunk is then managed internally and handed out to memory consumers (ByteBuf) on demand. That chunk is a PoolChunk, whose size is determined by ChunkSize (16MB by default, that is, 16MB is requested from the OS at a time).
Page – the smallest unit of memory managed by PoolChunk
The smallest unit of memory PoolChunk manages is called a Page, whose size is defined by PageSize (8K by default). That is, each allocation served by a PoolChunk is one or more Pages. When memory needs to be allocated, PoolChunk checks its internal records, finds a position with enough Pages to satisfy the request, and allocates it to the user.
How does PoolChunk manage Pages
We already know that PoolChunk organizes and allocates its memory in units of Pages. So how does PoolChunk manage them in a way that balances allocation efficiency (finding available memory as quickly as possible, and keeping allocated memory contiguous) and utilization efficiency (wasting as little memory as possible)? Netty borrows the idea from jemalloc. PoolChunk organizes its internal memory as a full binary tree. For example, with the default ChunkSize of 16M and PageSize of 8K, a PoolChunk can be divided into 2048 Pages. Treat these 2048 Pages as the leaf nodes, and you get a tree of depth 11 (2^11 = 2048). Each leaf node manages one Page, its parent manages two Pages, and so on, up to the root, which manages all the Pages of the PoolChunk. In other words, the memory managed by a node is the set of Pages managed by all the leaf nodes in its subtree. The benefit is that when memory is needed, the allocator can quickly find where to allocate it (it only needs to find a node that manages enough memory and allocate from that node), and the allocated memory is contiguous (as long as the Pages corresponding to adjacent leaf nodes are contiguous).
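A quick sketch of the arithmetic behind this tree, assuming the default ChunkSize of 16M and PageSize of 8K (the class and variable names are mine, for illustration only):

```java
public class ChunkTreeMath {
    public static void main(String[] args) {
        int pageSize = 8 * 1024;          // default PageSize: 8K
        int chunkSize = 16 * 1024 * 1024; // default ChunkSize: 16M

        int pages = chunkSize / pageSize;                    // 2048 Pages per chunk
        int maxDepth = Integer.numberOfTrailingZeros(pages); // 2^11 = 2048, so the leaves sit at depth 11

        // In the array representation used later in this article, the leaves occupy indices [2048, 4095]:
        // leaf 2048 manages Page0, leaf 2049 manages Page1, and so on.
        int firstLeaf = 1 << maxDepth; // 2048
        System.out.printf("pages=%d, leafDepth=%d, leaves=[%d..%d]%n",
                pages, maxDepth, firstLeaf, firstLeaf + pages - 1);
    }
}
```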
In the figure above, node 512 manages four Pages: Page0, Page1, Page2 and Page3 (because it has four leaf nodes beneath it: 2048, 2049, 2050 and 2051). Node 1024 manages two Pages, Page0 and Page1 (its leaf nodes are 2048 and 2049). To allocate 32K of memory, only node 512 needs to be allocated (once 512 is allocated, all of its children are considered allocated). To allocate 16K of memory, only node 1024 needs to be allocated (once node 1024 is allocated, nodes 2048 and 2049 below it can no longer be allocated).
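The mapping from a node number to the Pages it manages is plain arithmetic. A small sketch under the same default configuration (the helper names are mine, not Netty's):

```java
public class NodePageMath {
    // Depth of a node in the array layout where the root has index 1 and depth 0.
    static int depthOf(int id) {
        return 31 - Integer.numberOfLeadingZeros(id);
    }

    // A node at depth d manages 2^(maxDepth - d) Pages.
    static int pagesManaged(int id, int maxDepth) {
        return 1 << (maxDepth - depthOf(id));
    }

    // Index of the first Page the node covers.
    static int firstPage(int id, int maxDepth) {
        int indexInLayer = id - (1 << depthOf(id)); // position of the node within its own layer
        return indexInLayer * pagesManaged(id, maxDepth);
    }

    public static void main(String[] args) {
        int maxDepth = 11;
        // node 512 -> 4 Pages starting at Page0, node 1024 -> 2 Pages starting at Page0,
        // node 2051 -> 1 Page, namely Page3 (matching the example above).
        for (int id : new int[] {512, 1024, 2051}) {
            System.out.printf("node %d manages %d page(s) starting at Page%d%n",
                    id, pagesManaged(id, maxDepth), firstPage(id, maxDepth));
        }
    }
}
```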
After understanding the internal memory management mechanism of PoolChunk, the reader may have several questions:
- How does PoolChunk internally mark that a node has been allocated?
- When a node is allocated, how is the memory that its ancestors can still allocate updated? For example, once node 2048 is allocated, node 1024 can no longer serve a 16K request, since only 8K remains available under it.
To answer these two questions, PoolChunk maintains two internal variables: byte[] memoryMap and byte[] depthMap. The two arrays have the same length, equal to the number of nodes in the tree plus 1, because the root node is placed at index 1. The positional relationship between a parent and its children in the array is:
Assuming the index of the parent is i, the indices of its children are 2i and 2i+1
If you have ever represented a binary tree with an array, you will recognize this as the same layout used by a binary heap.
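As a tiny illustration, walking from any node up to the root in this layout is just repeated division by two, which is exactly the traversal needed when ancestor values have to be updated after an allocation (a minimal sketch, not Netty code):

```java
public class TreeIndexing {
    public static void main(String[] args) {
        int id = 2049; // a leaf node in the default 16M/8K configuration
        // Walk from the leaf up to the root (index 1), printing every ancestor on the way.
        while (id >= 1) {
            System.out.println("node " + id);
            id >>>= 1; // the parent of node i is i / 2
        }
        // Prints: 2049, 1024, 512, 256, 128, 64, 32, 16, 8, 4, 2, 1
    }
}
```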
Both arrays describe the binary tree, and each element of the array can be viewed as a node of the tree. So what do the element values mean? For depthMap, the value is the layer of the tree the node sits on. For example, depthMap[1] == 0 because it is the root, and depthMap[2] == depthMap[3] == 1, meaning both nodes are one layer below the root. Since the structure of the tree never changes once built, the values in depthMap never change after initialization.
For memoryMap, the value represents the highest layer (the layer closest to the root, i.e. the smallest layer number) at which this node can still satisfy a full allocation. That may sound awkward, so let's use the example above. Initially, when no memory has been allocated, every node can allocate the full amount of its own layer, so the initial state of memoryMap is identical to depthMap. Once a child node is allocated, the full amount its parent can allocate in one contiguous piece shrinks (allocation and reclamation both update the values of the affected nodes in memoryMap; "full amount" here means the contiguous block the node can hand out, not the total free memory beneath it). For example, after node 2048 is allocated, the largest contiguous block node 1024 can offer drops from 16K to 8K, the same as node 2049 (its right child); in other words, node 1024's capability is degraded to that of a node on node 2049's layer. This degradation may propagate to all of its ancestors. At that point, the largest contiguous block node 512 can allocate is 16K, not 24K (besides, since allocation sizes are normalized to powers of two, a consumer that actually wants 21K will be given 32K by Netty's memory management).
However, this does not mean the other 8K managed by node 512 is wasted; that 8K can still be allocated when an 8K request comes in.
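To tie these pieces together, here is a simplified sketch of the tree-based allocation described above. It is not Netty's implementation (PoolChunk's allocateNode and updateParentsAlloc are far more optimized), only an illustration of the idea under the default 16M/8K configuration:

```java
public class ChunkTreeSketch {
    private static final int MAX_DEPTH = 11;                     // 2^11 = 2048 Pages
    private static final byte UNUSABLE = (byte) (MAX_DEPTH + 1); // 12 marks a fully allocated node

    private final byte[] memoryMap = new byte[1 << (MAX_DEPTH + 1)]; // 4096 entries, index 0 unused
    private final byte[] depthMap  = new byte[1 << (MAX_DEPTH + 1)];

    public ChunkTreeSketch() {
        int id = 1;
        for (int d = 0; d <= MAX_DEPTH; d++) {
            for (int p = 0; p < (1 << d); p++) {
                memoryMap[id] = (byte) d; // initially a node can fully allocate its own layer's size
                depthMap[id]  = (byte) d; // the layer of a node never changes
                id++;
            }
        }
    }

    /** Allocates one node at depth d, i.e. a run of 2^(MAX_DEPTH - d) Pages; returns the node id or -1. */
    public int allocate(int d) {
        if (memoryMap[1] > d) {
            return -1; // not even the root can provide a contiguous run of that size
        }
        int id = 1;
        while (depthMap[id] < d) { // descend until we reach the requested depth
            id <<= 1;              // try the left child first
            if (memoryMap[id] > d) {
                id ^= 1;           // the left subtree cannot serve the request, switch to the right child
            }
        }
        memoryMap[id] = UNUSABLE;  // mark the chosen node as fully allocated
        updateParents(id);
        return id;
    }

    private void updateParents(int id) {
        while (id > 1) {
            int parent = id >>> 1;
            // a parent can only "fully" allocate what its better (smaller-valued) child still can
            memoryMap[parent] = (byte) Math.min(memoryMap[2 * parent], memoryMap[2 * parent + 1]);
            id = parent;
        }
    }
}
```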
Let's illustrate PoolChunk's allocation process with a picture. In the figures, value is the node's value in memoryMap and depth is its value in depthMap. For the first allocation, the requester actually needs 6K of memory:
The effect of this allocation is that the allocated node is marked as used and the memoryMap values of all of its ancestors move down one layer. After that, the requester asks for 12K of memory:
Since node 1024 can no longer provide the required memory but node 512 can, node 512 lets its right child, node 1025, handle the allocation.
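To connect the requested sizes in this walkthrough to tree depths: a request is first rounded up to a power of two, and the target depth follows from how many Pages that normalized size spans. A small sketch of the arithmetic (the helper names are mine):

```java
public class DepthForRequest {
    // Round a (Normal-sized) request up to the next power of two.
    static int normalize(int reqCapacity) {
        return 1 << (32 - Integer.numberOfLeadingZeros(reqCapacity - 1));
    }

    // Each level up in the tree doubles the run length, so the depth drops by one per doubling.
    static int depthFor(int normCapacity, int pageSize, int maxDepth) {
        int pages = normCapacity / pageSize; // how many 8K Pages are needed
        return maxDepth - Integer.numberOfTrailingZeros(pages);
    }

    public static void main(String[] args) {
        int pageSize = 8 * 1024, maxDepth = 11;
        // 6K  -> normalized to 8K  -> 1 Page  -> depth 11 (a leaf)
        // 12K -> normalized to 16K -> 2 Pages -> depth 10
        for (int req : new int[] {6 * 1024, 12 * 1024}) {
            int norm = normalize(req);
            System.out.printf("request=%dK normalized=%dK depth=%d%n",
                    req / 1024, norm / 1024, depthFor(norm, pageSize, maxDepth));
        }
    }
}
```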
The above describes the allocation process; reclamation is the reverse: after the memory is returned, the memoryMap values of the affected nodes are restored. I will not go into further detail here.
PoolChunkList – Management of PoolChunk
PoolChunkList holds a linked list of PoolChunks. In general, all PoolChunks in one PoolChunkList have a usage ratio (allocated memory / ChunkSize) within the same range; each PoolChunkList has its own minimum and maximum usage bounds. The PoolChunkLists themselves are also linked together, with the lists covering lower usage ranges placed earlier in the chain. As allocations and releases change a PoolChunk's usage, the chunk is moved back and forth between PoolChunkLists until it sits in the list whose range matches. The benefit is that chunks in an appropriate usage range can be chosen first for allocation, which keeps overall PoolChunk utilization high and avoids memory waste.
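A minimal sketch of the idea, with illustrative names and a single threshold (Netty's PoolChunkList, and the several lists wired together inside PoolArena, are more involved):

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Simplified sketch of a usage-bounded chunk list; not Netty's PoolChunkList. */
class ChunkListSketch {
    final int minUsage;       // lower bound of usage (in %) for chunks kept in this list
    final int maxUsage;       // upper bound of usage (in %)
    ChunkListSketch nextList; // the list that holds chunks with higher usage
    final Deque<ChunkSketch> chunks = new ArrayDeque<>();

    ChunkListSketch(int minUsage, int maxUsage) {
        this.minUsage = minUsage;
        this.maxUsage = maxUsage;
    }

    /** After an allocation, a chunk whose usage grew past maxUsage moves to the next list. */
    void afterAllocation(ChunkSketch chunk) {
        if (chunk.usagePercent() > maxUsage && nextList != null) {
            chunks.remove(chunk);
            nextList.chunks.add(chunk);
        }
    }
}

class ChunkSketch {
    long allocatedBytes;
    final long chunkSize = 16 * 1024 * 1024;
    int usagePercent() { return (int) (allocatedBytes * 100 / chunkSize); }
}
```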
PoolSubpage – Manager of small memory
The smallest memory PoolChunk manages is one Page (8K by default); when we only need a small amount of memory, allocating a whole Page would be wasteful. PoolSubpage is the manager of such small memory.
Small memory here means memory smaller than one Page. It is further divided into Tiny and Small: Tiny is memory smaller than 512 bytes, and Small is memory between 512 and 4096 bytes. Memory of at least one Page and at most one Chunk is called Normal, and memory larger than one Chunk is called Huge.
Tiny and Small are further segmented into specific sizes. Tiny covers 16, 32, 48 ... 496 (increasing in steps of 16), 31 sizes in total; Small covers 512, 1024, 2048 and 4096. A PoolSubpage requests one Page of memory from PoolChunk and divides that Page into equal-sized blocks. A PoolSubpage manages exactly one block size: for example, a PoolSubpage managing 16B splits its Page into 512 blocks of 16B each. Because each PoolSubpage manages only one size, PoolSubpages that handle the same size are organized into a linked list, and different sizes are kept in different places. Managing a single fixed size also means a PoolSubpage does not need PoolChunk's full binary tree: a PoolSubpage that manages 16B only ever hands out 16B blocks (a 32B request is simply served by a different PoolSubpage). It just uses a long[] bitmap (which can be thought of as an array of bits, one bit per block) to record which of its blocks have been allocated. The implementation is therefore much simpler.
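Here is a minimal sketch of such bitmap bookkeeping for one subpage; it is not Netty's PoolSubpage (which also packs extra information into the handle it returns), just the core idea:

```java
/** Minimal sketch: slice one Page into equal blocks of a single size and track them with a bitmap. */
class SubpageSketch {
    private final int elemSize;  // the one block size this subpage manages, e.g. 16 bytes
    private final int numElems;  // how many blocks fit into one Page
    private final long[] bitmap; // one bit per block: 1 = allocated

    SubpageSketch(int pageSize, int elemSize) {
        this.elemSize = elemSize;
        this.numElems = pageSize / elemSize;         // e.g. 8192 / 16 = 512 blocks
        this.bitmap = new long[(numElems + 63) / 64];
    }

    /** Returns the index of a free block, or -1 if the subpage is full. */
    int allocate() {
        for (int i = 0; i < numElems; i++) {
            int word = i >>> 6, bit = i & 63;
            if ((bitmap[word] & (1L << bit)) == 0) {
                bitmap[word] |= 1L << bit;
                return i;
            }
        }
        return -1;
    }

    /** Byte offset of a block inside the Page. */
    int offsetOf(int index) {
        return index * elemSize;
    }

    void free(int index) {
        bitmap[index >>> 6] &= ~(1L << (index & 63));
    }
}
```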
PoolArena – Memory management coordinator
PoolArena is the coordinator of memory management. Internally it holds the linked-up PoolChunkLists (as described above, divided by the usage range each PoolChunkList manages). It also holds two arrays of PoolSubpages: PoolSubpage[] tinySubpagePools and PoolSubpage[] smallSubpagePools. By default, tinySubpagePools has length 31, with the PoolSubpages for sizes 16, 32 ... 496 stored at their respective indices; PoolSubpages of the same size are linked together under the same index. Similarly, smallSubpagePools has length 4 by default and holds the PoolSubpages for 512, 1024, 2048 and 4096. PoolArena decides, per request, whether to allocate from a PoolChunk or from a PoolSubpage.
It is important to note that PoolArena is shared, so memory allocation is contended: PoolArena uses synchronized in the critical sections. Netty mitigates this contention by creating multiple PoolArenas, so that threads spread across different PoolArenas as much as possible and contention is reduced.
PoolThreadCache — thread-local cache to reduce contention for memory allocation
Besides creating multiple PoolArenas to reduce contention, Netty also lets a thread cache the memory it frees instead of immediately returning it to the PoolArena. This cached memory lives in the PoolThreadCache, a thread-local object, so accessing it requires no locking. Inside, PoolThreadCache holds arrays of MemoryRegionCache, again split into Tiny, Small and Normal (Huge is not cached because caching it would be inefficient). The Tiny and Small segments match PoolSubpage's size classes, while Normal has too many possible sizes (one Page, two Pages, four Pages, and so on), so a parameter controls up to which size Normal blocks are cached; blocks above that size are not cached and go straight back to the PoolArena. A MemoryRegionCache is essentially a queue: all nodes in the same queue are memory blocks of the same size that the thread has freed. It also has a size limit to keep the queue from growing too long (when the queue is full, a freed block of that size is not cached but returned directly to the PoolArena). When a thread needs memory, it first locates the cache pool of the right level (the right array) in its PoolThreadCache, then finds the MemoryRegionCache for the right size in that array, and finally takes a block from that queue to serve the allocation.
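A minimal sketch of such a per-thread, per-size cache (illustrative names only; Netty's MemoryRegionCache is more elaborate):

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Simplified sketch of a thread-local cache for freed blocks of one particular size. */
class RegionCacheSketch<T> {
    private final Queue<T> queue = new ArrayDeque<>(); // no locking needed: only one thread touches it
    private final int maxSize;                         // bound on how many freed blocks we keep

    RegionCacheSketch(int maxSize) {
        this.maxSize = maxSize;
    }

    /** Called on free: cache the block if there is room; if not, the caller returns it to the PoolArena. */
    boolean offer(T block) {
        return queue.size() < maxSize && queue.offer(block);
    }

    /** Called on allocation: reuse a cached block of this size if one is available, else null. */
    T poll() {
        return queue.poll();
    }
}
```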
Overview of Netty's memory organization and the PooledByteBufAllocator memory request process
With all of the above concepts in mind, here is a picture to leave the reader with a clearer overall impression.
The figure above only details Heap Memory; Direct Memory is similar.
Finally, let's use PooledByteBufAllocator as the entry point and walk through the memory request process once more (a small runnable demo follows the list):
- PooledByteBufAllocator.newHeapBuffer() starts the memory request
- The thread-local PoolThreadCache, and the PoolArena bound to the thread, are obtained
- The PoolArena starts the allocation: it first obtains a ByteBuf object (reclaimed by the object pool or newly created), then begins allocating memory for it
- The PoolThreadCache is checked first; if it has no cached block of the matching size, memory is allocated from the PoolArena
- For Normal-sized memory, a PoolChunk taken from the PoolChunkList serves the allocation; if no existing PoolChunk can, a new PoolChunk is requested from the OS and that PoolChunk allocates the corresponding Pages
- For Tiny and Small sizes, the allocation is served from the cached PoolSubpages; if there is no suitable PoolSubpage, memory is first allocated from a PoolChunk and handed over to a new PoolSubpage
- Huge-sized memory is never cached: it is requested when needed and returned directly when released
- The ByteBuf uses the obtained memory, which completes one memory request
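As a closing, runnable illustration of the pooling behaviour (the allocator calls are Netty's real API; observing the same address twice is typical with default settings but not guaranteed):

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;

public class ThreadCacheReuseDemo {
    public static void main(String[] args) {
        PooledByteBufAllocator alloc = PooledByteBufAllocator.DEFAULT;

        ByteBuf first = alloc.directBuffer(8 * 1024); // a Normal-sized (one Page) request
        long firstAddress = first.hasMemoryAddress() ? first.memoryAddress() : -1;
        first.release(); // goes into the PoolThreadCache, not back to the operating system

        ByteBuf second = alloc.directBuffer(8 * 1024); // same size, same thread
        long secondAddress = second.hasMemoryAddress() ? second.memoryAddress() : -1;
        second.release();

        // With the defaults one would typically see the same address twice, because the freed
        // block is served back out of the thread-local cache rather than a fresh allocation.
        System.out.println("first=" + firstAddress + ", second=" + secondAddress);
    }
}
```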
Conclusion
Netty's memory management mechanism is quite clever, but it is a little hard to explain in words. I had hoped to stay away from the source code as much as possible and just talk about the principles, yet it still turned into a long read. I hope the pictures above help readers understand. In addition, this article does not cover the process of freeing memory; release is the reverse of allocation. Interested readers can follow the source code themselves, or check the commented source in the project mentioned at the beginning of this article.
Finally
Thank you for reading this far. If anything in the article is lacking, feel free to point it out; if you found it helpful, please give it a thumbs up.