This is the 28th day of my participation in the August Writing Challenge.

🔉 Introduction

ZGC is also a low-latency garbage collector, developed by Oracle. ZGC and Shenandoah share the same goal, but their designs are completely different. In technical terms, ZGC can be described as follows: it is a garbage collector with a Region-based memory layout and no generations, which uses read barriers, colored pointers, and memory multi-mapping to implement a concurrent mark-compact algorithm, with low latency as its primary goal.

Like Shenandoah, ZGC uses a Region-based heap layout. The difference is that ZGC Regions are dynamic: they are created and destroyed dynamically, and their capacity can vary. ZGC Regions come in three capacities: small, medium, and large.

  • Small Region: fixed capacity of 2 MB; stores objects smaller than 256 KB.
  • Medium Region: fixed capacity of 32 MB; stores objects from 256 KB up to (but not including) 4 MB.
  • Large Region: variable capacity, which must be an integer multiple of 2 MB; stores objects of 4 MB or more. Each large Region holds only one large object, and large Regions are never relocated, because copying such a large object is very expensive.
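The sizing rules in the list above can be sketched in a few lines of Java. This is only an illustration of the thresholds described in this article; the class and method names (`ZRegionSizing`, `classify`, `regionCapacityFor`) are hypothetical and are not HotSpot code.

```java
// Illustrative sketch only: all names here are hypothetical, not taken from HotSpot.
final class ZRegionSizing {
    enum RegionType { SMALL, MEDIUM, LARGE }

    static final long KB = 1024L;
    static final long MB = 1024L * KB;

    static final long SMALL_REGION_CAPACITY = 2 * MB;    // fixed capacity of a small Region
    static final long MEDIUM_REGION_CAPACITY = 32 * MB;  // fixed capacity of a medium Region
    static final long SMALL_OBJECT_LIMIT = 256 * KB;     // objects below this go to small Regions
    static final long MEDIUM_OBJECT_LIMIT = 4 * MB;      // objects below this go to medium Regions
    static final long LARGE_REGION_GRANULE = 2 * MB;     // large Regions are a multiple of this

    /** Picks the Region type for an object of the given size in bytes. */
    static RegionType classify(long objectSize) {
        if (objectSize < SMALL_OBJECT_LIMIT) {
            return RegionType.SMALL;
        }
        if (objectSize < MEDIUM_OBJECT_LIMIT) {
            return RegionType.MEDIUM;
        }
        return RegionType.LARGE;
    }

    /** Capacity of the Region that would hold the object; large Regions round up to a multiple of 2 MB. */
    static long regionCapacityFor(long objectSize) {
        switch (classify(objectSize)) {
            case SMALL:
                return SMALL_REGION_CAPACITY;
            case MEDIUM:
                return MEDIUM_REGION_CAPACITY;
            default:
                return ((objectSize + LARGE_REGION_GRANULE - 1) / LARGE_REGION_GRANULE) * LARGE_REGION_GRANULE;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(100 * KB));
        System.out.println(classify(1 * MB));
        System.out.println(regionCapacityFor(5 * MB) / MB + " MB");
    }
}
```

Under these assumptions, a 5 MB object gets its own large Region whose capacity is rounded up to the next multiple of 2 MB.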

As shown in the figure:

Concurrent compaction

Shenandoah implements concurrent compaction with forwarding pointers and read barriers. ZGC also uses read barriers, but in a more sophisticated way.

Colored pointers

Previously, when the virtual machine or the collector needed to keep extra data about an object, such as its generational age, lock record, or hash code, that data was stored in the object header. This works as long as the object can be accessed, but what if it cannot? The approach breaks down if the object has been moved, or if we want to get this information without accessing the object at all.

So we might ask: can this extra information be obtained from the pointer to the object, or from some other place in memory? Take the tri-color marking mentioned earlier, which records a property of the object. Where should the mark be stored? In the object header? In a data structure separate from the object? Either would work, so which one does ZGC choose?

ZGC's colored pointer is the most direct and purest approach: it records the mark information directly in the reference pointer to the object. Why can the pointer itself store extra information? In theory, a 64-bit system can address 2^64 bytes (16 EB) of memory, but in practice nothing close to that is used, given real needs, performance, and cost. For example, 64-bit Linux supports a 47-bit (128 TB) process virtual address space and a 46-bit (64 TB) physical address space.

On 64-bit Linux, the top 18 bits of a 64-bit pointer cannot be used for addressing. The remaining 46 bits, which can address 64 TB of memory, are what the colored pointer works with. ZGC takes the top 4 of these 46 bits to store four flags. From these flags the virtual machine can tell, by looking at the pointer alone, the tri-color marking state of the referenced object, whether it has entered the relocation set (that is, whether it has been moved), and whether it can only be reached through a finalize() method. Because these 4 bits come out of the 46 address bits, only 42 remain for addressing, so ZGC can manage at most 4 TB (2^42 bytes) of memory, as shown in the figure below.
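To make the bit layout concrete, here is a minimal Java sketch that packs four flag bits above a 42-bit address inside a `long`. The flag names (`MARKED0`, `MARKED1`, `REMAPPED`, `FINALIZABLE`) and their bit order follow the layout commonly documented for early ZGC releases, but treat them as assumptions here; the class and helper names are hypothetical and this is not how HotSpot itself is written.

```java
// Illustrative sketch only: a colored pointer modeled as a long.
// Flag names and bit positions are assumptions; helper names are hypothetical.
final class ColoredPointer {
    static final int ADDRESS_BITS = 42;                         // 2^42 bytes = 4 TB of addressable heap
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;  // low 42 bits hold the object address

    // The 4 flag bits sit directly above the address, in bits 42..45.
    static final long MARKED0     = 1L << 42;
    static final long MARKED1     = 1L << 43;
    static final long REMAPPED    = 1L << 44;
    static final long FINALIZABLE = 1L << 45;
    static final long FLAG_MASK   = MARKED0 | MARKED1 | REMAPPED | FINALIZABLE;

    /** Combines a raw 42-bit address with one flag. */
    static long color(long address, long flag) {
        if ((address & ~ADDRESS_MASK) != 0) {
            throw new IllegalArgumentException("address does not fit in 42 bits");
        }
        return address | flag;
    }

    /** Strips the flag bits and returns the plain address. */
    static long address(long pointer) {
        return pointer & ADDRESS_MASK;
    }

    /** Tests a flag without touching the object the pointer refers to. */
    static boolean hasFlag(long pointer, long flag) {
        return (pointer & flag) != 0;
    }

    /** Replaces whatever flags the pointer carries with the given one. */
    static long recolor(long pointer, long flag) {
        return (pointer & ADDRESS_MASK) | flag;
    }

    public static void main(String[] args) {
        long p = color(0x12345678L, MARKED0);
        System.out.println(Long.toHexString(address(p)));
        System.out.println(hasFlag(p, MARKED0));
        System.out.println(hasFlag(recolor(p, REMAPPED), MARKED0));
    }
}
```

The point of the sketch is that the collector can read or change an object's state with a couple of bit operations on the reference, without dereferencing it.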

Advantages

Although colored pointers impose a 4 TB memory limit and cannot be combined with compressed pointers, the benefits are considerable. The designers of ZGC name three advantages of the colored pointer design:

First, with colored pointers, a Region can be freed and reused as soon as its live objects have been moved out, rather than having to wait until every reference into that Region has been updated, as Shenandoah does. I'll explain why later.

Second, colored pointers sharply reduce the number of memory barriers needed during collection. Barriers, write barriers in particular, usually exist to record changes to object references. If that information is kept in the colored pointer itself, the write barriers are not needed for this purpose, which removes part of the barrier overhead. This helps the running program's efficiency, so ZGC's impact on throughput is low.

Third, the colored pointer is an extensible storage structure that could record more data about the marking and relocation of objects. On 64-bit Linux, the top 18 bits of a pointer are still unused. They cannot be used for addressing, but they could be used for recording information. If they were put to use, the 4 flag bits currently taken from the address could be given back, raising the maximum heap ZGC supports to 64 TB. They could also hold tracing information, so that during relocation the collector could move rarely accessed objects into rarely accessed memory regions.

Existing problems

To use colored pointers, you first have to ask whether the operating system will let you redefine a few bits of a pointer in memory at will. Does the processor agree? This is a very practical problem. It is like a blind date: if you want to be with the girl you like, you still have to ask whether her parents agree. The same goes for our programs, because a program is ultimately turned into a stream of machine instructions run by the processor. The processor treats the entire pointer as a memory address and does not care which bits are flags and which are the actual address. How did the ZGC designers solve this problem? With the memory mapping discussed below.

Memory mapping

To talk about memory mapping, we have to go back to the early days of x86 computers. Resources were scarce back then, so all processes were squeezed into a single memory space. The memory of different processes could not be isolated, so when one process corrupted another's memory, the only way to recover was a reset. How can this problem be solved?

Starting with Intel's 80386, the processor has provided protected mode to isolate processes. It also provides paging, which divides both the linear (virtual) address space and the physical address space into small blocks of equal size called pages. A mapping table between virtual pages and physical pages then lets the processor translate a linear address into a physical address.
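The translation step can be illustrated with a toy page table in Java. This is a simplified model, assuming 4 KB pages and a flat map instead of the multi-level tables real hardware uses; `ToyPageTable` and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: a flat, single-level page table with assumed 4 KB pages.
final class ToyPageTable {
    static final int PAGE_SHIFT = 12;                 // 2^12 = 4096-byte pages (an assumption)
    static final long PAGE_SIZE = 1L << PAGE_SHIFT;
    static final long OFFSET_MASK = PAGE_SIZE - 1;

    // Maps a virtual page number to a physical page number.
    private final Map<Long, Long> virtualToPhysicalPage = new HashMap<>();

    /** Installs a mapping from one virtual page to one physical page. */
    void map(long virtualPage, long physicalPage) {
        virtualToPhysicalPage.put(virtualPage, physicalPage);
    }

    /** Translates a virtual address: look up the page, keep the offset within the page. */
    long translate(long virtualAddress) {
        long virtualPage = virtualAddress >>> PAGE_SHIFT;
        Long physicalPage = virtualToPhysicalPage.get(virtualPage);
        if (physicalPage == null) {
            throw new IllegalStateException("page fault: virtual page " + virtualPage + " is not mapped");
        }
        return (physicalPage << PAGE_SHIFT) | (virtualAddress & OFFSET_MASK);
    }

    public static void main(String[] args) {
        ToyPageTable table = new ToyPageTable();
        table.map(5, 9);
        System.out.println(table.translate(5 * PAGE_SIZE + 123));
    }
}
```

Nothing in this structure prevents two different virtual pages from pointing at the same physical page, which is the property the next paragraph relies on.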

The mapping can be one-to-one, one-to-many, or many-to-one, depending on the design. On Linux/x86-64, ZGC uses multi-mapping to map several virtual memory addresses to the same physical memory address, which is a many-to-one mapping. The flag bits in the colored pointer are treated as part of the address, selecting one of several virtual address segments, and all of these segments are mapped to the same physical memory. After multi-mapping, a colored pointer can be used for addressing as normal. See the diagram below:
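The many-to-one idea can be tried from plain Java by mapping the same file region twice with `FileChannel.map`. This is only a loose analogy to what ZGC does inside the JVM, not its implementation: the class and method names are hypothetical, and the Java API leaves the exact visibility of changes between mappings to the operating system, although shared mappings of one file behave coherently on Linux.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative sketch only: two virtual views backed by the same underlying pages.
final class MultiMappingDemo {
    /** Maps the same 4 KB file region twice, writes through one view, reads through the other. */
    static long writeThroughOneViewReadThroughOther(long value) {
        try {
            Path backing = Files.createTempFile("multi-mapping", ".bin");
            backing.toFile().deleteOnExit();
            try (FileChannel channel = FileChannel.open(
                    backing, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Two separate mappings of the same region: different virtual addresses, same backing pages.
                MappedByteBuffer viewA = channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
                MappedByteBuffer viewB = channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
                viewA.putLong(0, value);   // write via the first virtual view
                return viewB.getLong(0);   // read via the second virtual view
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeThroughOneViewReadThroughOther(42L));
    }
}
```

In the same spirit, pointers that differ only in their flag bits land in different virtual segments but reach the same object in physical memory.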

📝 Digression

In the 20th century, as in the 19th and 18th centuries, it was widely believed that there was an irreconcilable conflict between knowledge and faith. The view prevailing among some prominent figures was that the time had come for faith to be increasingly replaced by knowledge; belief without the support of knowledge was superstition and had to be opposed. According to this view, the sole function of education was to open the way to thinking and knowing, and schools, as the foremost institutions for educating people, had to serve this purpose entirely.