Today brings another installment packed with solid content. This is the first in the series of the most hardcore JVM analyses on the web, starting with TLAB. Since the article is very long and readers have different habits, it is published both as a single complete edition and as a multi-part edition:

  • The most hardcore JVM TLAB analysis on the web (single complete edition, without the extras)
  • The most hardcore JVM TLAB analysis on the web, part 1: Introduction to the memory allocation idea
  • The most hardcore JVM TLAB analysis on the web, part 2: The TLAB lifecycle and the problems it raises
  • The most hardcore JVM TLAB analysis on the web, part 3: The JVM EMA expectation algorithm and TLAB-related JVM startup parameters
  • The most hardcore JVM TLAB analysis on the web, part 4: Complete analysis of the basic TLAB process
  • The most hardcore JVM TLAB analysis on the web, part 5: Full analysis of the TLAB source code
  • The most hardcore JVM TLAB analysis on the web, part 6: Summary of popular TLAB Q&A
  • The most hardcore JVM TLAB analysis on the web, part 7: Parsing TLAB-related JVM logs
  • The most hardcore JVM TLAB analysis on the web, part 8: Monitoring TLAB through JFR

8. TLAB basic process

8.0. How to design the TLAB size per thread

Earlier, we described the problems that motivate TLAB and the corresponding solutions. Based on those, we can design TLAB as follows.

First, the initial TLAB size should be related to the number of threads that allocate objects within each GC cycle. However, that number is not necessarily stable: it may be high in one period and much lower in the next. Therefore, an EMA (exponential moving average) algorithm is needed to sample, at each GC, the number of threads that allocated objects, and from those samples compute an expected thread count.
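As a reference, here is a minimal sketch of an exponential moving average of the kind described above (the class and parameter names are made up for illustration; HotSpot's own implementation is C++ and differs in detail):

```java
/**
 * A minimal EMA sketch: newer samples are weighted more heavily than older ones,
 * so the "expected" value follows recent behavior while smoothing out spikes.
 */
public class ExpectationEma {
    private final double weight; // percentage weight given to each new sample
    private double average;
    private boolean hasSample;

    public ExpectationEma(double weightPercent) {
        this.weight = weightPercent;
    }

    public void sample(double value) {
        if (!hasSample) {
            average = value; // the first sample initializes the average
            hasSample = true;
        } else {
            // new average = (100 - weight)% of the old average + weight% of the new sample
            average = ((100.0 - weight) * average + weight * value) / 100.0;
        }
    }

    public double average() {
        return average;
    }
}
```

Sampling the number of allocating threads at every GC with such an average gives an expectation that reacts to newly started threads quickly while damping short bursts; as section 8.1.1 mentions, the weight used for the TLAB-related expectations comes from the TLABAllocationWeight parameter.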

Next, the ideal situation is that, within each GC cycle, all memory used for object allocation comes from the TLABs of the corresponding threads. From the JVM's point of view, the memory used for object allocation within each GC cycle is essentially the Eden space. Ideally, a GC is triggered only because Eden is full, not for any other reason; that is the most efficient case. If Eden is exhausted and every allocation happened inside a TLAB, then Eden was entirely occupied by the threads' TLABs, which gives the fastest allocation.

However, the number of allocating threads and the amount of memory they allocate differ from one GC cycle to the next. If too large a chunk is handed out at once, memory is wasted; if it is too small, threads refill TLABs from Eden too frequently and efficiency drops. This size is hard to control directly, but we can limit the maximum number of times a thread may request a TLAB from Eden within one GC cycle, which gives users better control.

Finally, the amount of memory a thread allocates is not stable from one GC cycle to the next either, so using only the initial size to guide subsequent TLAB sizes is clearly not enough. Put differently, the memory a thread allocates is correlated with its history, so we can infer future needs from past allocations. Therefore each thread also uses an EMA to sample how much memory it allocated in each GC cycle, and uses that to guide its next expected TLAB size.

To sum up, we arrive at the following approximate TLAB sizing formulas:

Initial TLAB size per thread = Eden size / (maximum number of TLAB refills from Eden per thread in one GC cycle * expected number of allocating threads (EMA))

TLAB size recomputed after GC = Eden size * allocation ratio of the current thread (EMA) / maximum number of TLAB refills from Eden per thread in one GC cycle
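A small worked example may make the two formulas more concrete (all numbers below are invented purely for illustration):

```java
// Toy numbers, purely to illustrate the two formulas above.
public class TlabSizingExample {
    public static void main(String[] args) {
        long   edenBytes        = 512L * 1024 * 1024; // assume a 512 MB Eden
        int    refillsPerThread = 50;                 // assumed max refills per thread per GC cycle
        double expectedThreads  = 100;                // EMA of allocating threads

        // Initial size: divide Eden evenly among the expected threads,
        // leaving room for each thread to refill several times.
        long initialTlab = (long) (edenBytes / (refillsPerThread * expectedThreads));
        System.out.println("initial TLAB ~ " + initialTlab / 1024 + " KB"); // ~104 KB

        // After GC: a thread that allocated 5% of all TLAB space last cycle
        // gets an expected size proportional to that ratio.
        double allocationRatioEma = 0.05;
        long recomputedTlab = (long) (edenBytes * allocationRatioEma / refillsPerThread);
        System.out.println("recomputed TLAB ~ " + recomputedTlab / 1024 + " KB"); // ~524 KB
    }
}
```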

Next, let’s analyze each process of the TLAB lifecycle in detail.

8.1. TLAB initialization

When a thread is initialized, if TLAB is enabled in the JVM (it is enabled by default and can be turned off with -XX:-UseTLAB), the TLAB is initialized, and TLAB memory of the expected size is requested when the thread first allocates an object. Likewise, after a GC scan has reclaimed the TLAB, new TLAB memory is requested the first time the thread allocates an object again. Let's focus on initialization first. The initialization process is shown in Figure 08:

The initial expected TLAB size is calculated during initialization. This involves the limits on TLAB size:

  • The minimum TLAB size: specified via MinTLABSize.
  • The maximum TLAB size differs per GC. In G1 GC it is the humongous object threshold, i.e. half of a G1 region; as mentioned at the beginning, in G1 large (humongous) objects are not allocated in TLAB but go directly into the old generation. In ZGC it is 1/8 of the page size, and similarly in Shenandoah GC it is in most cases 1/8 of each region size; both expect at least 7/8 of a region not to need to be returned, which reduces the scanning cost when selecting the collection set (CSet). For other GCs it is the maximum size of an int array, which is related to the dummy object used to fill unused TLAB space mentioned earlier; the details are covered later.

In all of the following steps, whenever a TLAB size is computed, it is implicitly clamped between the minimum TLAB size and the maximum TLAB size; we will not repeat this restriction every time to avoid being verbose ~~~! Whenever a TLAB size is computed, it is implicitly clamped between the minimum and maximum TLAB size! Whenever a TLAB size is computed, it is implicitly clamped between the minimum and maximum TLAB size! Important things are said three times.
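Expressed as code, the implicit clamping looks roughly like the sketch below (the class and method names are made up; the real bounds come from MinTLABSize and the GC-specific maximum described above):

```java
class TlabSizeLimits {
    /** Clamp a computed TLAB size into the [min, max] range described above. */
    static long clamp(long computedSize, long minTlabSize, long maxTlabSize) {
        // Hypothetical helper: every size produced by the formulas in this article
        // is assumed to pass through a check like this before being used.
        return Math.max(minTlabSize, Math.min(computedSize, maxTlabSize));
    }
}
```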

The expected TLAB size is calculated during initialization and must be recalculated after the TLAB is reclaimed by GC and similar operations. The thread then uses this expected size as the baseline size each time it requests a new TLAB.

8.1.1. TLAB Initial expected size calculation

As shown in Figure 08, if TLABSize is specified, this size is used as the initial expected size. If not specified, the following formula is used:

Heap space available for TLAB / (expected number of currently allocating threads * configured number of refills)

  1. Heap space available for TLAB: how much heap space can be handed out as TLABs. This differs per GC algorithm, but most GC implementations use the Eden size, for example:
    1. The traditional Parallel Scavenge uses the Eden size. Reference: parallelScavengeHeap.cpp
    2. The default G1 GC uses (number of regions in the young list minus the number of Survivor regions) * region size, which is effectively the Eden size. Reference: g1CollectedHeap.cpp
    3. ZGC uses the remaining page space; a page is similar to the Eden zone in that most objects are allocated there. Reference: zHeap.cpp
    4. Shenandoah GC uses the size of its FreeSet, which is also similar in concept to Eden. Reference: shenandoahHeap.cpp
  2. Expected number of currently allocating threads: this is a global EMA, i.e. an expectation computed with the algorithm described earlier. Its minimum weight is TLABAllocationWeight. A sample is added to this EMA when a thread performs its first effective object allocation, and its value is read during TLAB initialization to calculate the initial expected TLAB size.
  3. Number of TLAB refills: derived from TLABWasteTargetPercent, which expresses the maximum proportion of TLAB space that may be wasted. Why the refill count is tied to this limit is discussed later; a sketch combining all three ingredients follows this list.
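Putting the three ingredients together, the initial expected-size calculation can be sketched as follows. The refill-count derivation shown, 100 / (2 * TLABWasteTargetPercent), reflects the "TLABs are on average half full at GC" reasoning, but treat the exact expression as an assumption; all names are illustrative:

```java
class InitialTlabSize {
    /** Illustrative sketch of the initial expected TLAB size calculation. */
    static long compute(long tlabCapacityBytes,      // e.g. the Eden size
                        double allocatingThreadsEma, // global EMA of allocating threads
                        int tlabWasteTargetPercent) {
        // Assuming each thread's TLAB is on average half full when a GC happens,
        // keeping waste below tlabWasteTargetPercent% works out to roughly this
        // many refills per thread per GC cycle (treated as an assumption here).
        long targetRefills = Math.max(100 / (2L * tlabWasteTargetPercent), 2);

        long desired = (long) (tlabCapacityBytes / (allocatingThreadsEma * targetRefills));
        return desired; // in the real flow this is then clamped to [min, max]
    }
}
```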

8.1.2. TLAB initial allocation ratio calculation

As shown in Figure 08, the TLAB initial allocation ratio is then calculated.

The allocation ratio EMA is thread-private. It is the counterpart of the global EMA of allocating threads: the global EMA describes, on average, how much of the TLAB space each thread should occupy, while the per-thread allocation ratio EMA dynamically controls how much of the total TLAB space this particular thread should occupy.

At initialization, the allocation ratio equals 1 / (expected number of currently allocating threads). Substituting this into the previous formula for the expected TLAB size, as in Figure 08, shows the two are consistent: the initial expected size is simply one thread's "fair share". This value is used as the first sample of the thread-private allocation ratio EMA.
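In code form, again as an illustrative sketch that reuses the hypothetical ExpectationEma class from earlier:

```java
public class AllocationRatioInit {
    public static void main(String[] args) {
        double allocatingThreadsEma = 100;                // global EMA of allocating threads
        double initialRatio = 1.0 / allocatingThreadsEma; // this thread's initial "fair share"

        // Reusing the illustrative ExpectationEma class from the earlier sketch.
        ExpectationEma allocationRatioEma = new ExpectationEma(35 /* assumed TLABAllocationWeight */);
        allocationRatioEma.sample(initialRatio);

        // Eden size * ratio / refills reproduces the initial expected size formula:
        // Eden size / (threads * refills).
        long edenBytes = 512L * 1024 * 1024;
        long targetRefills = 50;
        System.out.println((long) (edenBytes * allocationRatioEma.average() / targetRefills));
    }
}
```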

8.1.3. Clearing thread-private statistics

Finally, the thread-private statistics are cleared. These statistics will later be used to compute and sample the allocation ratio of the current thread, and thus influence the thread's expected TLAB size.

8.2. TLAB allocation

The TLAB allocation process is shown in Figure 09.

8.2.1. Allocation from the thread's current TLAB

If TLAB is enabled (it is enabled by default and can be turned off with -XX:-UseTLAB), memory is first allocated from the thread's current TLAB. If that succeeds, the memory is returned; otherwise, a different allocation strategy is chosen based on the current remaining TLAB space and the current maximum-wasted-space limit. The next steps explain exactly what this limit is.

8.2.2. Requesting a new TLAB for allocation

If the remaining space in the current TLAB is larger than the current maximum-wasted-space limit (from the flow in Figure 08 we know its initial value is the expected size / TLABRefillWasteFraction), the object is allocated directly on the heap; otherwise the current TLAB is retired and a new TLAB is requested. Why is there a maximum-wasted-space limit?

When a new TLAB is requested, the old TLAB may still have some space left. Before the old TLAB is handed back to the heap, that leftover space must be filled with a dummy object. Because only the owning thread knows what was allocated inside a TLAB, and the TLAB is returned to Eden when GC scanning happens, the outside world cannot tell which part was used and which was not unless the unused part is filled; extra checks would otherwise be needed. If the unused space is filled with an object that is known to be garbage, i.e. a dummy object, the GC can simply mark it and skip over this memory, which makes scanning more efficient. This memory already belonged to the TLAB anyway; other threads cannot use it until after the next GC. The dummy object is an int array. To guarantee there is always room to write the dummy object, the TLAB reserves space for a dummy object header, i.e. an int[] header, which is also why a TLAB must not be larger than the maximum size of an int array; otherwise the unused space could not be filled with a dummy object.

However, filling with a dummy object wastes space, and this waste cannot be allowed to grow too large, so the maximum-wasted-space limit is used to bound it.
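Conceptually, retiring a TLAB looks roughly like the sketch below (the 16-byte header and 4-byte element sizes are assumptions for illustration; the real VM writes the filler int[] directly into the TLAB's heap memory):

```java
class TlabRetireSketch {
    /** Illustrative: retire a TLAB by filling its unused tail with an int[] dummy. */
    static void retire(long usedBytes, long tlabSizeBytes) {
        // The TLAB reserves room for one int[] header up front, so there is always
        // enough space left to write a filler array (assumed here: a 16-byte array
        // header and 4-byte int elements).
        long headerBytes = 16;
        long fillerLength = (tlabSizeBytes - usedBytes - headerBytes) / 4;
        System.out.println("fill remaining space with new int[" + fillerLength + "]");
    }
}
```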

The new TLAB size takes the smaller of the following two values (a sketch follows the list):

  • The remaining heap space available for TLAB, which in most GC implementations is the corresponding remaining Eden space:
    • The traditional Parallel Scavenge uses the remaining Eden size. Reference: parallelScavengeHeap.cpp
    • The default G1 GC uses the remaining size of the current allocation region, which belongs to Eden. Reference: g1CollectedHeap.cpp
    • ZGC uses the remaining page space; a page is similar to the Eden zone in that most objects are allocated there. Reference: zHeap.cpp
    • Shenandoah GC uses the remaining size of its FreeSet, which is also similar in concept to Eden. Reference: shenandoahHeap.cpp
  • The expected TLAB size + the size of the allocation currently being requested
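In pseudo-Java, the refill-size decision can be sketched as follows (names are illustrative):

```java
class RefillSizeSketch {
    /** Illustrative: size of the newly requested TLAB. */
    static long newTlabSize(long remainingSpaceForTlab, // e.g. remaining Eden space
                            long expectedTlabSize,
                            long currentAllocationSize) {
        // Take whichever is smaller: what the heap can still hand out for TLABs,
        // or the expected size plus the allocation that triggered the refill.
        return Math.min(remainingSpaceForTlab, expectedTlabSize + currentAllocationSize);
    }
}
```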

After the new TLAB is allocated, the ZeroTLAB flag decides whether every byte is set to 0. When an object is created, its fields are assigned initial values, and most fields start as 0; moreover, when a TLAB is returned to the heap, its remaining space is filled with an int[] full of zeros anyway, so the zeroing can just as well be done up front. In addition, zeroing the memory right when the TLAB is first allocated works together with the Allocation Prefetch mechanism to make better use of CPU cache lines (the Allocation Prefetch mechanism will be described in another series). Therefore, enabling ZeroTLAB writes the zeros immediately after the TLAB space is allocated.

8.2.3. Allocate directly from the heap

Allocating directly from the heap is the slowest allocation path. It happens when the remaining space in the current TLAB is larger than the current maximum-wasted-space limit. In that case the object is allocated directly on the heap, and the current maximum-wasted-space limit is increased by TLABWasteIncrement each time such an allocation happens. After enough direct heap allocations, the limit grows until the remaining TLAB space falls below it, at which point a new TLAB is requested for allocation.
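Combining 8.2.2 and 8.2.3, the slow path can be summarized with the sketch below (the class and field names are illustrative; the real logic lives in HotSpot's C++ allocation code):

```java
class SlowPathSketch {
    long refillWasteLimit;          // current maximum-wasted-space limit
    final long tlabWasteIncrement;  // how much the limit grows per outside-TLAB allocation

    SlowPathSketch(long initialLimit, long increment) {
        this.refillWasteLimit = initialLimit;
        this.tlabWasteIncrement = increment;
    }

    /** Illustrative decision made when the current TLAB cannot satisfy an allocation. */
    String allocateSlowPath(long tlabFreeBytes) {
        if (tlabFreeBytes > refillWasteLimit) {
            // Retiring this TLAB now would waste too much space: allocate this one
            // object directly in the heap and raise the limit so this cannot go on forever.
            refillWasteLimit += tlabWasteIncrement;
            return "allocate outside TLAB, directly in the heap";
        } else {
            // Little space left: fill the rest with a dummy object and request a new TLAB.
            return "retire the current TLAB and refill a new one";
        }
    }
}
```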

8.3. TLAB sampling and expected-size recalculation at GC time

The process is shown in Figure 10, with some operations performed on TLAB before and after GC.

8.3.1. Operations before GC

Before GC, if TLAB is enabled (it is enabled by default and can be turned off with -XX:-UseTLAB), the TLABs of all threads are filled with dummy objects and returned to the heap, and some statistics are computed and sampled for later TLAB size calculations.

First of all, to make sure this round's statistics are meaningful, we check whether more than half of the TLAB space on the heap has actually been used. If less than half was used, this round of GC data is considered to have no reference value. If more than half was used, a new allocation ratio is computed: new allocation ratio = the amount this thread allocated in this GC cycle / the TLAB space used by all threads on the heap. This is because the allocation ratio describes the proportion of the heap's TLAB space taken by the current thread, which differs per thread, and this ratio is used to dynamically adjust the TLAB sizes of different business threads.

The amount a thread allocated in this GC cycle includes both what it allocated inside TLABs and what it allocated outside them; it can be read from the per-thread allocation records shown in the flow charts of Figures 08, 09 and 10. The thread's allocation counter read now, minus the value of that counter at the end of the previous GC cycle, gives the amount the thread allocated in the current GC cycle.
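As an illustrative sketch of the pre-GC sampling described above (reusing the hypothetical ExpectationEma class):

```java
class PreGcSampling {
    /** Illustrative: sample the per-thread allocation ratio just before a GC. */
    static void sampleAllocationRatio(ExpectationEma allocationRatioEma,
                                      long threadAllocatedTotal,    // cumulative bytes allocated by this thread
                                      long threadAllocatedAtLastGc, // snapshot from the end of the last GC
                                      long usedTlabSpace,           // TLAB space used by all threads
                                      long totalTlabSpace) {        // TLAB space available on the heap
        // Only sample when more than half of the TLAB space was actually used;
        // otherwise this cycle's data is considered not representative.
        if (usedTlabSpace * 2 > totalTlabSpace) {
            long allocatedThisCycle = threadAllocatedTotal - threadAllocatedAtLastGc;
            allocationRatioEma.sample((double) allocatedThisCycle / usedTlabSpace);
        }
    }
}
```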

Finally, the current TLAB is filled with a dummy object and returned to the heap.

8.3.2. Operations after GC

If TLAB is enabled (enabled by default, can be turned off with -XX:-UseTLAB) and TLAB resizing is enabled (enabled by default, can be turned off with -XX:-ResizeTLAB), then after GC the expected TLAB size of each thread is recalculated: new expected size = heap space available for TLAB * allocation ratio EMA of the current thread / configured number of refills. The maximum-wasted-space limit is then reset to the new expected size / TLABRefillWasteFraction.
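And a corresponding sketch of the post-GC recalculation (illustrative names; the clamping from section 8.1 still applies):

```java
class PostGcResize {
    /** Illustrative: recompute a thread's expected TLAB size after GC. */
    static long[] resize(long tlabCapacityBytes,      // heap space available for TLAB
                         double allocationRatioEma,   // this thread's allocation ratio EMA
                         long targetRefills,
                         long tlabRefillWasteFraction) {
        long newExpectedSize = (long) (tlabCapacityBytes * allocationRatioEma / targetRefills);
        // (in the real flow this is clamped between MinTLABSize and the GC-specific maximum)
        long newRefillWasteLimit = newExpectedSize / tlabRefillWasteFraction;
        return new long[] { newExpectedSize, newRefillWasteLimit };
    }
}
```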