Netty Source Code: Memory Management (1) (4.1.44)
As a high-performance network application framework, Netty has its own memory allocator whose design derives from jemalloc (GitHub); it can fairly be called a Java port of jemalloc. The source code in this chapter is based on Netty 4.1.44, which uses the jemalloc 3.x algorithm; versions from 4.1.45 onward were rewritten around the jemalloc 4.x algorithm, and the two differ considerably.
High-performance Memory allocation
Jemalloc is a new-generation memory allocator introduced by Jason Evans in the FreeBSD project. It is a general-purpose malloc implementation that focuses on reducing memory fragmentation and improving allocation efficiency in high-concurrency scenarios, with the goal of replacing malloc. Jemalloc is widely used in Firefox, Redis, Rust, Netty and other well-known products or programming languages. For details, see Jason Evans' paper [A Scalable Concurrent Malloc Implementation for FreeBSD]. Besides jemalloc, there are other well-known high-performance memory allocators in the industry, such as ptmalloc and tcmalloc. A simple comparison follows:
- ptmalloc (per-thread malloc) is the standard memory allocator shipped with glibc. Its drawback is that memory cannot be shared between threads, so the memory overhead is large.
- tcmalloc (thread-caching malloc) was open-sourced by Google. Its signature feature is thread caching, and it is currently used in Chrome and Safari. Tcmalloc allocates a local cache for each thread: small objects are allocated from the thread-local cache, while large allocations use spin locks to reduce contention and improve efficiency.
jemalloc
Jemalloc borrows the excellent design ideas of tcmalloc, so the two share many similarities in architecture, including the thread-cache feature. But jemalloc is more complex than tcmalloc: it divides allocation granularity into **Small, Large, and Huge** and records a lot of metadata, so its metadata takes up more space than tcmalloc's.
From the above, their core goals are nothing more than two things:
- Allocate and reclaim memory efficiently, improving performance in both single-threaded and multi-threaded scenarios.
- Reduce memory fragmentation, both internal and external, and improve memory utilization.
Memory fragmentation
In the Linux world, physical memory is divided into 4KB pages, the smallest granularity at which memory is allocated; allocation and reclamation are done page by page. Fragments produced inside a page are called internal fragmentation, and fragments produced outside pages are called external fragmentation. Fragmentation occurs when memory is broken into small chunks that, although free and contiguous in address, are too small to be used. As memory is allocated and freed over and over, it becomes less and less contiguous; eventually the whole memory degenerates into fragments, and even if there are enough free page frames to satisfy a request, a large contiguous run of page frames cannot be allocated. So the core of reducing memory waste is to avoid fragmentation as much as possible.
Common memory allocator algorithms
Common memory allocator algorithms are:
- Dynamic memory allocation
- Buddy algorithm
- Slab algorithm
Dynamic memory allocation
Dynamic memory allocation, also called DMA for short, simply means the operating system gives the program however much memory it asks for at run time. In most cases the amount of memory needed is not known until the program runs, so allocating it in advance is hard to get right: too much is wasted, too little is not enough. DMA allocates on demand from a block of memory, records metadata for allocated memory, and maintains free partitions so that an available partition can be found quickly for the next allocation. The following three search strategies are common:
First Fit algorithm
- Free partitions are linked into a doubly linked list ordered by memory address from low to high.
- Allocation always starts searching from the low addresses, so low addresses see heavy use while high addresses see little, and many small fragments are produced at the low end.
Next Fit Algorithm (circular first fit)
- This algorithm is a variant of first fit; the main change is that the next allocation starts searching from the free partition after the one used last time.
- Compared with first fit, allocations are spread more evenly and search efficiency improves, but it leads to more serious memory fragmentation.
Best Fit Algorithm
- The free partition list is always kept sorted by size in increasing order. When allocating, the smallest suitable partition is found from the head of the list, and after the allocation completes the free partition list is re-sorted by partition size.
- This algorithm has higher space utilization, but it leaves behind tiny partitions that are hard to use. The reason is that free block sizes are whatever happens to be left over and are not classified by size; utilization reaches 100% only when a request exactly matches a free block's size.
- The list has to be re-sorted after every allocation, which costs CPU.
Buddy Memory Allocation
Buddy memory allocation is an allocation algorithm that divides memory into partitions and satisfies a request with the best-fitting block; it was invented by Harry Markowitz in 1963. The buddy algorithm groups all free page frames into 11 linked lists of blocks containing 1, 2, 4, 8, 16, 32, 64, 128, 256, 512 and 1024 contiguous page frames respectively, so the largest request is 4MB of contiguous memory.
- Blocks in the same buddy list are the same size and have contiguous addresses.
- Disadvantage: while the buddy algorithm effectively reduces external fragmentation, its minimum granularity is a page, which can cause very severe internal fragmentation, up to 50% of the memory.
Slab algorithm
- The buddy algorithm is not suitable for small-memory scenarios: at least one page is allocated at a time, which wastes memory. The slab algorithm builds on the buddy algorithm and is specially optimized for small memory allocation:
- It provides a cache mechanism that stores kernel objects; when the kernel needs to allocate memory again, the object can be fetched straight from the cache.
- Linux uses the slab algorithm to allocate memory.
Jemalloc algorithm
Jemalloc is based on the slab approach and is more complex than slab. Slab improves speed and efficiency for small allocations; jemalloc additionally achieves excellent allocation efficiency in multi-threaded scenarios through Arenas and Thread Caches. The Arena embodies divide and conquer: rather than one manager for all memory, the work is split among several managers, each working independently without interference (thread contention). The Thread Cache is the core idea of tcmalloc, which jemalloc also borrows: each thread has its own memory manager, so allocations within a thread do not have to compete with other threads. Related documents:
- Facebook Engineering post: written in 2011, corresponding to jemalloc 2.1.0.
- jemalloc(3) manual page: The manual page for the latest release fully describes the API and options supported by jemalloc, and includes a brief summary of its internals.
The underlying memory allocation of Netty is based on the idea of jemalloc algorithm.
Memory size
Netty adopts different memory allocation policies for different sizes, as shown in the figure above. The enum io.netty.buffer.PoolArena.SizeClass describes these size classes: Tiny, Small and Normal. Anything larger than 16MB is treated as the Huge type. Netty also defines finer-grained allocation units for each region: Chunk, Page and Subpage.
// io.netty.buffer.PoolArena.SizeClass
enum SizeClass {
Tiny,
Small,
Normal
}
Memory normalization
Netty normalizes the memory size requested by the user to make subsequent calculation and allocation easier. For example, when a user requests 31B, returning exactly 31B without normalization would amount to plain dynamic memory allocation; with normalization, 31B is rounded up to 32B and 15MB is rounded up to 16MB. Of course, the strategy differs per size class. The rules are as follows:
- For Huge-level sizes, Netty returns exactly the amount the user requested (with memory alignment if necessary).
- For the tiny, small and normal levels, 512B is the dividing line:
  - When >= 512B, the returned value is the nearest power of 2 that is greater than or equal to the requested size. For example, a request for 513B returns 1024B.
  - When < 512B, the returned value is the nearest multiple of 16 that is greater than or equal to the requested size. For example, a request for 17B returns 32B; a request for 46B returns 48B.
The core source code for memory normalization lives in io.netty.buffer.PoolArena, the most important class in Netty's memory management:
Getting the nearest power of 2
The requested memory size needs to be normalized so that it is easy to calculate and manage. The following walks through how 1025 is normalized. The series of shift operations looks dazzling, but the goal is simply to find the smallest power of 2 that is greater than or equal to the requested size. The idea is to turn the binary 0100 0000 0001 (1025) into 0111 1111 1111 (2047) and then add 1 to get 1000 0000 0000 (2048). Call the working value i, and denote the position of its highest 1 bit as j. The execution proceeds as follows:

- First execute i = i - 1. This ensures that a value that is already a power of 2 maps to itself instead of the next power of 2.
- Execute i |= i >>> 1. Bit j is already known to be 1, so after an unsigned right shift by one, bit j-1 is also 1; OR-ing with the original value sets bit j-1. Now bits j and j-1 are both 1, and the subsequent unsigned right shifts by 2, 4, 8 and 16 propagate the 1s to bits j-2, j-3 and so on. Because an int has 32 bits, five rounds (shifting by 1, 2, 4, 8 and 16) are enough to set every bit below bit j to 1.
- Finally add 1 to i to obtain the power of 2.
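To make the bit twiddling concrete, here is a minimal standalone sketch of the trick described above (the method name and structure are illustrative; Netty's actual PoolArena#normalizeCapacity also handles tiny sizes and overflow):

```java
// Round up to the next power of 2 (illustrative sketch of the bit trick described above)
static int nextPowerOfTwo(int reqCapacity) {
    int i = reqCapacity - 1;   // step 1: a value that is already a power of 2 maps to itself
    i |= i >>> 1;              // smear the highest 1 bit one position to the right
    i |= i >>> 2;              // the top 2 bits are now 1, smear them 2 positions
    i |= i >>> 4;
    i |= i >>> 8;
    i |= i >>> 16;             // after 5 rounds every bit below the highest bit is 1
    return i + 1;              // e.g. 1025 -> 2047 -> 2048
}
```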
Gets the next nearest multiple of 16
The idea is simple: first clear the lower four bits (set them to 0), then add 16 to reach the target value.
(reqCapacity & ~15) + 16;
// Example: reqCapacity = 33 (any value from 33 to 47 normalizes to 48)
// 0000 0000 0000 0000 0000 0000 0010 0001   (33, reqCapacity)
// 0000 0000 0000 0000 0000 0000 0000 1111   (15)
// 1111 1111 1111 1111 1111 1111 1111 0000   (~15, i.e. -16)
// 0000 0000 0000 0000 0000 0000 0010 0000   (32, reqCapacity & ~15)
// 0000 0000 0000 0000 0000 0000 0011 0000   (48, after + 16)
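As a minimal sketch of the full rule (assuming, as the earlier rules state, that a value already on a 16-byte boundary is returned unchanged):

```java
// Round a tiny request up to the next multiple of 16 (illustrative)
static int roundUpTo16(int reqCapacity) {
    if ((reqCapacity & 15) == 0) {
        return reqCapacity;            // already a multiple of 16
    }
    return (reqCapacity & ~15) + 16;   // clear the low 4 bits, then add 16
}
```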
summary
Netty uses a lot of bit operations to improve performance, but the resulting code is not very readable; if needed, look up an introduction to bit-manipulation tricks and work through the bits by hand.
- Netty's memory normalization shows off three bit-operation techniques:
  - finding the smallest power of 2 greater than or equal to the requested size;
  - finding the smallest multiple of 16 greater than or equal to the requested size;
  - using a mask to judge whether a value exceeds a certain threshold.
- Memory normalization works in bytes, not bits.
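The third technique (mask-based range checks) looks roughly like the following sketch. The constants follow from the tiny/small boundaries described above, but treat the helper names as illustrative rather than the verbatim Netty source:

```java
// Judge size classes with masks instead of comparisons (illustrative)
static boolean isTiny(int normCapacity) {
    // any bit at or above 512 set => not tiny
    return (normCapacity & 0xFFFFFE00) == 0;      // true when normCapacity < 512
}

static boolean isTinyOrSmall(int normCapacity) {
    // subpage allocations are smaller than the 8KB page size
    return (normCapacity & ~(8192 - 1)) == 0;     // true when normCapacity < 8192
}
```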
Netty memory pool allocation
- First, Netty requests a large contiguous block of memory from the operating system, called a chunk. Unless the request is Huge-level, a chunk is generally 16MB and is wrapped by an io.netty.buffer.PoolChunk object. It looks like this:
- Netty further splits a chunk into multiple pages, each 8KB by default, so every chunk contains 2048 pages. To manage small memory finely, reduce fragmentation and improve utilization, Netty splits a **page** further into several subpages; the subpage size varies dynamically, with a minimum of 16B.
- Calculation: when an allocation is requested, the required size is normalized, and the normalized value determines the exact depth in the tree.
- Search: at the depth corresponding to that size, search from left to right for a free node and allocate it.
- Tag: the node is marked as fully used, and its ancestors' markers are updated in a loop; a parent node takes the smaller value of its two children.
Of course, this is only the overall picture and may still feel up in the clouds; the sections below should bring it down to earth.
Huge Allocation logic overview
Huge allocation is a bit simpler than the other size classes. The unit it operates on is a PoolChunk sized to the user's request (subject to memory alignment). Netty uses a non-pooled strategy for Huge memory blocks: each time, a dedicated non-pooled PoolChunk object is created for the allocation, and when that object is released the whole PoolChunk's memory is released with it. The Huge allocation logic is implemented in io.netty.buffer.PoolArena#allocateHuge.
Normal allocation logic
Normal allocations range from 4097B to 16MB. The core idea is to split a PoolChunk into 2048 pages, the minimum unit of a Normal allocation. Every page is the same size (pageSize = 8KB), and these pages are managed logically through a complete binary tree. The core Normal allocation logic is implemented in PoolChunk#allocateRun(int).
Small allocation logic
Small-level allocations range over (496B, 4096B]. The core idea is to split a page into several subpages, and PoolSubpage is the incarnation of those subpages; it effectively solves the fragmentation problem of small-memory scenarios. A page is 8192B, and a Small subpage comes in only four sizes, 512B, 1024B, 2048B and 4096B, doubling each step. If the required size falls within this range, one of these four element sizes is chosen. When allocating, a free page is found at the bottom of the tree, split into several subpages, and a PoolSubpage is constructed to manage them. The first subpage is chosen for the current request and marked as used, and the PoolSubpage is placed into the linked list at the matching index of the PoolSubpage[] smallSubpagePools array, so that subsequent requests of the same size can be allocated straight from that linked list.
Tiny allocation logic
Tiny-level allocations range over (0B, 496B]. The allocation logic is similar to Small: a free page is found, split into several subpages, and a PoolSubpage is constructed to manage them; the first subpage is selected for the current request and the PoolSubpage object is placed in the linked list at the matching index of the PoolSubpage[] tinySubpagePools array, waiting for the next allocation. The difference is how the subpage size is decided: for Tiny, the element size is the smallest multiple of 16 that is greater than or equal to the normalized value. For example, for a 31B request the next multiple of 16 above 31 is 32, so the page is split into 8192/32 = 256 subpages; the count is determined by the element size and is therefore variable.
PoolArena
The above shows how Netty allocates memory at the different levels. Next, let's get to know a few classes to lay the groundwork for reading the source code. PoolArena is a core class: memory is allocated through a fixed number of Arenas, which by default depends on the number of CPU cores. Arenas are shared among threads, and each thread is bound to exactly one PoolArena: when a thread requests memory for the first time, it obtains an Arena in round-robin fashion and deals only with that Arena for its entire life cycle. As mentioned earlier, PoolArena embodies the idea of divide and conquer and performs well in multi-threaded scenarios. PoolArena has two subclasses, DirectArena and HeapArena, which are needed because the underlying container types differ, but the core logic is done in PoolArena itself. The PoolArena data structure (excluding monitoring metrics) can be roughly divided into two parts: the six PoolChunkLists that store PoolChunks, and the two arrays that store PoolSubpages. The PoolArena constructor also does important initialization work, including chaining the PoolChunkLists together and initializing the PoolSubpage[] arrays.
Initialize PoolChunkList
q000, q025, q050, q075 and q100 indicate the minimum memory usage of the PoolChunks they hold. As shown in the figure below, every PoolChunkList has a lower and an upper bound on usage: minUsage and maxUsage. If a PoolChunk's usage exceeds maxUsage, the PoolChunk is removed from its current PoolChunkList and moved to the next one; likewise, if its usage drops below minUsage, it is removed and moved to the previous PoolChunkList. The bounds of adjacent PoolChunkLists deliberately overlap: PoolChunks move between PoolChunkLists all the time, and if the thresholds lined up exactly, a PoolChunk sitting at the boundary would bounce back and forth between two lists, costing performance. PoolChunkList serves Chunk-level allocation. PoolArena initializes six PoolChunkLists and links them head to tail into a doubly linked list; only q000 has no predecessor, because when none of the other PoolChunkLists has a suitable PoolChunk, a new PoolChunk is created and placed in qInit, and memory is then allocated from it according to the requested size. If a PoolChunk in q000 drops to 0% usage because its memory has been returned, it is not moved back into qInit; instead its destroy method is executed and the whole memory block is freed. In this way the memory in the pool has a create/destroy life cycle, and Netty avoids holding on to memory that is no longer used.
Initialize PoolSubpage []
A PoolSubpage is the incarnation of a single page: a page can be split into several subpages according to elemSize. PoolArena uses PoolSubpage[] arrays to store the PoolSubpage objects, as shown below. Remember this picture: Small comes in four sizes, so the smallSubpagePools array has length 4; smallSubpagePools[0] is the head of the linked list of PoolSubpage objects with elemSize = 512B, smallSubpagePools[1] the list for elemSize = 1024B, and so on. tinySubpagePools works on the same principle, only with a finer granularity (step) that increases in multiples of 16; because of the Tiny size limit there are 32 classes in total, so tinySubpagePools has length 32. Each array index corresponds to a different elemSize, and each slot holds a doubly linked list. In short, the two arrays store PoolSubpage objects, the index is determined from PoolSubpage#elemSize, and the objects at the same index are chained into a doubly linked list.
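A sketch of how the array index can be derived from elemSize; the helper names are illustrative (Netty's PoolArena has equivalent tinyIdx/smallIdx helpers), and the small case assumes elemSize is one of the four power-of-2 sizes:

```java
// Map an element size to its slot in the subpage pool arrays (illustrative)
static int tinyIdx(int elemSize) {
    return elemSize >>> 4;   // tiny sizes step by 16B: 16B -> 1, 32B -> 2, ..., 496B -> 31
}

static int smallIdx(int elemSize) {
    // small sizes are 512B, 1024B, 2048B, 4096B -> indexes 0..3
    return Integer.numberOfTrailingZeros(elemSize) - 9;   // log2(elemSize) - log2(512)
}
```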
The source code
The subclass implementation
The inheritance system is shown in the figure below:
- PoolArenaMetric: defines the monitoring interfaces related to PoolArena.
- PoolArena: an abstract class that defines the main core variables and most of the allocation logic. Because the underlying data containers differ, so do the creation and destruction logic, which is why it has two subclasses: DirectArena and HeapArena.
The abstract class PoolArena declares several methods that subclasses must implement. These abstract methods are exactly where DirectArena and HeapArena differ; the details are not covered here.
PoolChunkList
PoolChunkList is a doubly linked list that stores PoolChunk objects; it holds a pointer to the head of its PoolChunk list. A PoolChunkList node itself is also linked to the other PoolChunkLists, forming a doubly linked list, as shown above. The internal definition of PoolChunkList is simple:
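A simplified sketch of its core fields (paraphrased from the 4.1.x source, not guaranteed verbatim):

```java
// Core state of a PoolChunkList (simplified sketch)
final class PoolChunkList<T> {
    private final PoolArena<T> arena;        // the arena that owns this list
    private final PoolChunkList<T> nextList; // the list with the next-higher usage range
    private PoolChunkList<T> prevList;       // the list with the next-lower usage range
    private final int minUsage;              // lower usage bound (percent)
    private final int maxUsage;              // upper usage bound (percent)
    private PoolChunk<T> head;               // head of the doubly linked list of PoolChunks
}
```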
PoolChunk
PoolChunk is Netty’s description of the idea of Jemalloc3. x algorithm, which is the core class of Netty’s memory allocation.
Translated documentation
Overview
A page is the smallest memory unit a chunk can allocate, and a chunk is a collection of pages. The chunk size is computed as chunkSize = 2^{maxOrder} * pageSize. First we allocate a byte array of size = chunkSize; whenever a ByteBuf of a given size needs to be created, we search the byte array for the first position that has enough free space to accommodate the requested size and return a handle value of type long that encodes the offset information (the memory segment is then marked reserved, so it is always used by exactly one ByteBuf and no more). For simplicity, all requested sizes are normalized by PoolArena#normalizeCapacity, which guarantees that when we request a segment of size >= pageSize, the normalized capacity equals the next nearest power of 2. To find the first offset that can hold the requested size, we build a complete balanced binary tree to speed up the search, and store the tree's information in the memoryMap array. The tree looks roughly like this (the parentheses indicate the size of each node):
- depth=0 1 node (chunkSize)
- depth=1 2 nodes (chunkSize/2)
- .
- depth=d 2^d nodes (chunkSize/2^d)
- .
- depth=maxOrder 2^maxOrder nodes (chunkSize/2^{maxOrder} = pageSize)
When depth = maxOrder, the leaf nodes are the pages.
Search algorithm
The complete binary tree is encoded in the memoryMap array. memoryMap is a byte[] that records the allocation state of the tree; the initial value of each entry is the depth of the corresponding node.
- memoryMap[id] = depth_of_id => the node is free / not allocated.
- memoryMap[id] > depth_of_id => at least one child has been allocated, but the other children can still be allocated.
- memoryMap[id] = maxOrder + 1 => the node is fully allocated, i.e. unavailable.
allocateNode(d)
The goal is to find the first free allocatable node at the corresponding depth from left to right. The d parameter indicates depth.
- Start from the root node (depth = 0, id = 1).
- If memoryMap[1] > d, this Chunk has no memory available for the request.
- If the value of the left child is <= d, we can allocate from the left subtree; repeat until a free node is found.
- Otherwise, descend into the right subtree and repeat until a free node is found.
allocateRun(size)
Allocates a run of pages. The size parameter is the normalized memory size.
- Compute the depth corresponding to size using the formula d = log_2(chunkSize / size).
- Return allocateNode(d).
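A quick worked example of the depth formula, assuming the default chunkSize = 16MB and pageSize = 8KB:

```java
// d = log2(chunkSize / size), computed on powers of two
int chunkSize = 16 * 1024 * 1024;   // 16MB
int pageSize  = 8 * 1024;           // 8KB

// allocating a single page: d = log2(16MB / 8KB) = log2(2048) = 11, the leaf level (maxOrder)
int d1 = Integer.numberOfTrailingZeros(chunkSize / pageSize);    // 11

// allocating 64KB (8 pages): d = log2(16MB / 64KB) = log2(256) = 8
int d2 = Integer.numberOfTrailingZeros(chunkSize / (64 * 1024)); // 8
```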
allocateSubpage(size)
Create/initialize a new PoolSubpage of normCapacity size. Any PoolSubpage created/initialized is added to the subpage memory pool of the PoolArena that owns the PoolChunk.
- Use allocateNode(maxOrder) to find a free page (leaf) node and return a handle.
- Use the handle to build a PoolSubpage object and add it to the subpagePool of the PoolArena.
The source code
PoolChunk
The source code is relatively complex; the member variables need to be understood clearly first, to lay the groundwork for the subsequent analysis of the memory allocation source code.
Overview of related methods:
This is just to give you an idea of what the variables and methods actually do when the source code is analyzed.
PoolSubpage
PoolSubpage is the object used for Small and Tiny level allocations. A PoolSubpage corresponds to one page, so a single PoolSubpage manages 8KB of memory. Its fields are explained as follows:
PoolSubpage is also very clever at managing small memory, more on that later.
Let’s talk about pooled memory allocation
In the ByteBuf chapter we covered the ByteBufAllocator allocator hierarchy, but only as an overview; for the pooled allocator PooledByteBufAllocator we merely sketched the initialization process. Now we pick up from there and work out how these classes allocate and manage memory. PooledByteBufAllocator is a thread-safe class; we can use PooledByteBufAllocator.DEFAULT to obtain an io.netty.buffer.PooledByteBufAllocator pooled allocator, which is Netty's recommended practice. PooledByteBufAllocator initializes two important arrays, heapArenas and directArenas; all memory allocation operations are delegated to heapArenas or directArenas. The array length is usually 2 * CPU_CORE. This reflects Netty's (or, more precisely, the jemalloc algorithm's) design philosophy: adding multiple Arenas reduces memory contention and improves allocation speed and efficiency in multi-threaded environments. The arenas arrays are made up of the PoolArena objects discussed above. PoolArena is the central hub of memory allocation, a big housekeeper: it manages PoolChunk objects, manages PoolSubpage objects, carries the core allocation logic, manages the thread-local object cache pool, destroys the memory pool, and so on; its focus is managing allocated memory objects. PoolChunk is the embodiment of the jemalloc algorithm: it knows how to allocate memory efficiently, and you only need to call the right method to get the chunk of memory you want; PoolArena takes care of the rest. Next, we enter the world of Netty memory allocation through the source code, using PooledByteBufAllocator's methods as the entry point.
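A minimal usage example of the pooled allocator using the standard Netty API:

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;
import java.nio.charset.StandardCharsets;

public class PooledAllocExample {
    public static void main(String[] args) {
        // Netty's recommended entry point: the shared, thread-safe pooled allocator
        ByteBuf buf = PooledByteBufAllocator.DEFAULT.directBuffer(1024); // off-heap, pooled
        try {
            buf.writeBytes("hello netty".getBytes(StandardCharsets.UTF_8));
            System.out.println(buf.readableBytes());   // 11
        } finally {
            buf.release();   // drop the reference count; at 0 the memory returns to the pool
        }
    }
}
```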
Off-heap memory allocation source code implementation
The underlying data storage container for off-heap memory is the java.nio.ByteBuffer object. A pooled off-heap ByteBuf is usually obtained through io.netty.buffer.AbstractByteBufAllocator#directBuffer(int). Following that method, it delegates through the abstract method io.netty.buffer.AbstractByteBufAllocator#newDirectBuffer to a subclass implementation; here that is the pooled allocator PooledByteBufAllocator. The relevant source code is as follows:
// io.netty.buffer.PooledByteBufAllocator#newDirectBuffer
/** Get an off-heap (direct) pooled "ByteBuf" object */
@Override
protected ByteBuf newDirectBuffer(int initialCapacity, int maxCapacity) {
// #1 Fetch the PoolThreadCache object from the local thread cache
PoolThreadCache cache = threadCache.get();
// #2 Retrieve "directArena" from the cache object, select the corresponding "Arena" according to the storage type.
PoolArena<ByteBuffer> directArena = cache.directArena;
final ByteBuf buf;
if (directArena != null) {
// #3 Delegate the allocation to "directArena"
buf = directArena.allocate(cache, initialCapacity, maxCapacity);
} else {
// #3 Fallback: no arena available, create an unpooled direct "ByteBuf"
buf = PlatformDependent.hasUnsafe() ?
UnsafeByteBufUtil.newUnsafeDirectByteBuf(this, initialCapacity, maxCapacity) :
new UnpooledDirectByteBuf(this, initialCapacity, maxCapacity);
}
// #4 Wrap the generated "ByteBuf" object for memory leak detection
return toLeakAwareBuffer(buf);
}
The above is the core source code through which the allocator obtains a pooled ByteBuf object. It does not feel complicated: the memory allocation is delegated to directArena. As mentioned earlier, each thread is bound to exactly one PoolArena object for its lifetime, and that reference is stored in PoolThreadCache. When a thread wants to allocate memory, the call to threadCache.get() initializes the relevant variables. Netty enables the thread-local cache by default, so the directArena object obtained from the cache is normally not null. This PoolThreadCache is useful: it holds the PoolArena reference and caches some ByteBuffer or byte[] information via MemoryRegionCache. For now, all we need to know is that we get a directArena object from the PoolThreadCache thread-local cache; when it is first assigned, PooledByteBufAllocator compares PoolArena#numThreadCaches across arenas and returns the PoolArena with the smallest value, and each thread has its own PoolThreadCache. More on PoolThreadCache in a separate section. Continuing along the main line, PoolArena#allocate(PoolThreadCache, int, int) is invoked next; let's see what PoolArena does:
Phase 1: Initialize a ByteBuf instance object
Object pooling speeds up the allocation and release of ByteBuf objects; the downside is that developers who do not understand Netty's internals can easily cause memory leaks. If no instance is available in the object pool, a new one is created according to the corresponding rules.
// io.netty.buffer.PoolArena#allocate(io.netty.buffer.PoolThreadCache, int, int)
/** Get a pooled instance of "ByteBuf" */
PooledByteBuf<T> allocate(PoolThreadCache cache, int reqCapacity, int maxCapacity) {
// #1 Get a "ByteBuf" instance, either created directly or taken from the object pool.
//    "newByteBuf" is abstract in "PoolArena" and implemented by the subclasses.
PooledByteBuf<T> buf = newByteBuf(maxCapacity);
// #2 Fill in the physical memory information for "buf"
allocate(cache, buf, reqCapacity);
// #3 Return
return buf;
}

// io.netty.buffer.PoolArena.DirectArena#newByteBuf
/** Get a "ByteBuf" instance. */
@Override
protected PooledByteBuf<ByteBuffer> newByteBuf(int maxCapacity) {
if (HAS_UNSAFE) {
// #1 "ByteBuf" backed by "Unsafe", which is generally supported on servers,
//    so let's take a closer look at how this branch is implemented
return PooledUnsafeDirectByteBuf.newInstance(maxCapacity);
} else {
// #2 "ByteBuf" without "Unsafe"
return PooledDirectByteBuf.newInstance(maxCapacity);
}
}
// io.netty.buffer.PooledUnsafeDirectByteBuf
/ * * * "PooledUnsafeDirectByteBuf" is not "public" decorate, it is visible object, therefore, we can't get this type instances by distributor. * this "ByteBuf" has the "ObjectPool" ObjectPool to speed up the allocation of objects. * a and its type, called "io.net ty. Buffer. PooledDirectByteBuf", also use "ObjectPool object pool". * specific difference is "PooledUnsafeDirectByteBuf" maintenance "memoryAddress" internal variables, this is the necessary "Unsafe" operating variables. * /
final class PooledUnsafeDirectByteBuf extends PooledByteBuf<ByteBuffer> {
/ / object pool
private static final ObjectPool<PooledUnsafeDirectByteBuf> RECYCLER = ObjectPool.newPool(
new ObjectCreator<PooledUnsafeDirectByteBuf>() {
@Override
public PooledUnsafeDirectByteBuf newObject(Handle<PooledUnsafeDirectByteBuf> handle) {
return new PooledUnsafeDirectByteBuf(handle, 0); }});static PooledUnsafeDirectByteBuf newInstance(int maxCapacity) {
// #1 Get the instance of "ByteBuf" from the object pool
PooledUnsafeDirectByteBuf buf = RECYCLER.get();
/ / # 2 reset
buf.reuse(maxCapacity);
/ / return
return buf;
}
private long memoryAddress;
// Reset all pointer variables
final void reuse(int maxCapacity) {
maxCapacity(maxCapacity);
resetRefCnt();
setIndex0(0.0);
discardMarks();
}
// ...
}
Phase 2: Fill memory information for ByteBuf
The core method of this phase is io.netty.buffer.PoolArena#allocate(io.netty.buffer.PoolThreadCache, io.netty.buffer.PooledByteBuf<T>, int). PoolArena chooses a memory allocation policy according to the requested size and writes the memory information into the ByteBuf object. The PoolSubpage<T>[] tinySubpagePools and PoolSubpage<T>[] smallSubpagePools arrays mentioned earlier are used for tiny & small level allocations; the next time the same capacity is requested, the memory can be allocated directly from these PoolSubpage<T>[] arrays. Dig into the source:
Now to summarize the off-heap allocation logic:
- First, the requested capacity is normalized; the normalized value is the smallest power of 2 greater than or equal to the original value (or the next multiple of 16 for tiny sizes).
- An appropriate allocation strategy is chosen based on the normalized value. Broadly there are 3 strategies: tiny & small, normal, and huge.
- Huge allocations never try the local thread cache and are not pooled: a PoolChunk object is created directly and returned.
- For Normal allocations, the PoolChunkLists are tried in the order q050 -> q025 -> q000 -> qInit -> q075. Starting from q050 is a compromise: if allocation started from q000, most PoolChunks would face frequent creation and destruction and allocation performance would degrade; starting from q050 keeps PoolChunk usage in the middle range, reducing the probability of PoolChunks being recycled while still taking performance into account. After a successful allocation, the PoolChunk's usage is re-evaluated, and if it exceeds the PoolChunkList's threshold the chunk is moved to the next PoolChunkList. If allocation fails in all lists, a new PoolChunk is created for the request, and once it has served the allocation it is added to the qInit list.
- For Tiny & Small levels, allocation is first attempted through PoolSubpage; if that succeeds, the result is returned. If it fails, the allocation falls back to the Normal allocation logic.
In general, PoolArena#allocate is the core logic through which a PoolArena object allocates memory: it chooses a suitable allocation strategy based on the normalized value, speeds up allocation through the local thread cache, and reduces GC pressure by allocating ByteBuf shells from an object pool.
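Putting the strategy selection together, the control flow of PoolArena#allocate looks roughly like this simplified sketch; the tiny/small helper is abbreviated and labeled as illustrative, so do not read it as the verbatim source:

```java
// Simplified decision flow of PoolArena#allocate(PoolThreadCache, PooledByteBuf<T>, int)
private void allocate(PoolThreadCache cache, PooledByteBuf<T> buf, int reqCapacity) {
    int normCapacity = normalizeCapacity(reqCapacity);
    if (isTinyOrSmall(normCapacity)) {          // smaller than pageSize: tiny & small path
        // 1. try the local thread cache, 2. try the matching PoolSubpage list,
        // 3. fall back to the normal (chunk/page) allocation path
        allocateTinyOrSmall(cache, buf, reqCapacity, normCapacity); // illustrative helper
    } else if (normCapacity <= chunkSize) {     // normal path: q050 -> q025 -> q000 -> qInit -> q075
        allocateNormal(buf, reqCapacity, normCapacity);
    } else {                                    // huge path: unpooled, one dedicated PoolChunk
        allocateHuge(buf, reqCapacity);
    }
}
```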
Overview of in-heap memory allocation
The logic for allocating in-heap and off-heap memory is roughly the same, except that:
- The PoolArena subclass HeapArena completes the allocation.
- The underlying data container is byte[], whereas DirectArena's container is a java.nio.ByteBuffer object.
Memory recovery
Who is the subject of memory reclamation? We know that Netty caches part of the allocated memory through the Thread Cache, so how is that memory reclaimed? There the subject is the Thread Cache. And for the big housekeeper PoolArena, how does it manage reclamation? Most of the time we release ByteBuf objects through ByteBuf#release(). This API only decrements the reference count; physical memory is reclaimed only when the reference count reaches 0. The ByteBuf#release() flow can be summarized as follows: the reference count is updated through an updater object, and when it reaches zero the memory needs to be freed. If the PoolChunk it belongs to is not pooled, it is released directly. For a pooled chunk, Netty first checks whether the local thread can cache the memory information being reclaimed; if caching succeeds, the method returns, otherwise PoolArena handles the reclamation. PoolArena hands the work to the PoolChunkList, whose logic is relatively simple: find the PoolChunk, have it reclaim the memory, then check whether the PoolChunk still satisfies minUsage and, if not, move it toward the previous node. That, at a glance, is what memory reclamation looks like.
// io.netty.buffer.AbstractReferenceCountedByteBuf#release()
@Override
public boolean release() {
// #1 Update the reference count: internally the updater decrements refCnt.
//    updater.release(this) returns true when the reference count of this "ByteBuf" reaches 0,
//    which means it is time to release the memory
// #2 Free the memory
return handleRelease(updater.release(this));
}
// io.netty.buffer.AbstractReferenceCountedByteBuf#handleRelease
private boolean handleRelease(boolean result) {
if (result) {
// Free memory
deallocate();
}
return result;
}
// io.netty.buffer.PooledByteBuf#deallocate
@Override
protected final void deallocate() {
// Only proceed if the handle variable is >= 0
if (handle >= 0) {
final long handle = this.handle;
this.handle = -1;
memory = null;
// Return the memory through PoolArena#free
chunk.arena.free(chunk, tmpNioBuf, handle, maxLength, cache);
tmpNioBuf = null;
chunk = null;
// Recycle the "ByteBuf" object itself back to the object pool
recycle();
}
}

// io.netty.buffer.PoolArena#free
/**
 * PoolArena's "free" entry point.
 * @param chunk        the "PoolChunk" that the "ByteBuf" belongs to
 * @param nioBuffer    the temporary "ByteBuffer" object held inside the "ByteBuf"
 * @param handle       the handle value
 * @param normCapacity the normalized requested memory size
 * @param cache        the thread-local cache
 */
void free(PoolChunk<T> chunk,
ByteBuffer nioBuffer,
long handle, int normCapacity, PoolThreadCache cache) {
if (chunk.unpooled) {
// #1 The Chunk to which "ByteBuf" belongs is not pooled and directly destroyed
// Different destruction strategies are adopted according to the underlying implementation.
// If the container is a "ByteBuffer", take different destruction paths depending on whether a "Cleaner" is available
// If it is a "byte[]", nothing is done and the JVM GC reclaims the memory
int size = chunk.chunkSize();
destroyChunk(chunk);
activeBytesHuge.add(-size);
deallocationsHuge.increment();
} else {
// #2 For pooled "chunks"
SizeClass sizeClass = sizeClass(normCapacity);
if (cache != null &&
// Try adding it to the local cache; how it is added will be explained in another section.
// MemoryRegionCache caches memory information such as the handle value, capacity and chunk,
// so the next request for the same capacity from this thread can be served from the local cache.
// Borrowed but never returned? Not possible: PoolThreadCache maintains an add counter and triggers
// a reclaim action once a threshold is reached, so no memory leak occurs
cache.add(this, chunk, nioBuffer, handle, normCapacity, sizeClass)) {
return;
}
// Adding to the local cache failed, so free the memory back to the chunk
freeChunk(chunk, handle, sizeClass, nioBuffer, false);
}
}

// io.netty.buffer.PoolArena#freeChunk
/**
 * Release the memory occupied by the "ByteBuf" object.
 * @param chunk
* @param handle
* @param sizeClass
* @param nioBuffer
* @param finalizer
*/
void freeChunk(PoolChunk<T> chunk,
long handle,
SizeClass sizeClass,
ByteBuffer nioBuffer, boolean finalizer) {
final boolean destroyChunk;
synchronized (this) {
// We only call this if freeChunk is not called because of the PoolThreadCache finalizer as otherwise this
// may fail due lazy class-loading in for example tomcat.
// This is the judgment of lazy loading. For example, when Tomcat uninstalls an application, it uninstalls the corresponding ClassLoader.
// The thread recovery finalizer may require the class information of this class loader, so let's check
if (!finalizer) {
switch (sizeClass) {
case Normal:
++deallocationsNormal;
break;
case Small:
++deallocationsSmall;
break;
case Tiny:
++deallocationsTiny;
break;
default:
throw new Error();
}
}
// Call PoolChunkList#free to return the memory
destroyChunk = !chunk.parent.free(chunk, handle, nioBuffer);
}
if (destroyChunk) {
// destroyChunk not need to be called while holding the synchronized lock.
destroyChunk(chunk);
}
}

// io.netty.buffer.PoolChunkList#free
boolean free(PoolChunk<T> chunk, long handle, ByteBuffer nioBuffer) {
// #1 collect memory blocks with "PoolChunk#free"
// Handle records the location of the tree
// PoolChunk caches nioBuffer objects for use next time
chunk.free(handle, nioBuffer);
// #2 Determine whether the current usage of PoolChunk needs to be moved to the previous node list
if (chunk.usage() < minUsage) {
remove(chunk);
// Move the PoolChunk down the PoolChunkList linked-list.
return move0(chunk);
}
return true;
}
conclusion
This is a small step in reading Netty's memory code, but a big step toward becoming familiar with Netty's memory management. I hope the analysis of these specific classes and structures gives you a general picture of the whole memory flow; once the overall process is familiar, we will dig into the details.