Make writing a habit together! This is the 13th day of my participation in the “Gold Digging Day New Plan · April More Text Challenge”. Click here for more details

This series: The JVM column

preface

Earlier said that the JVM is a big advantage is that it is equivalent to a hosting platform, to deal with some of the more complex logic, such as memory management, automatic memory management of the JVM will originally need by the developer manual recovery of memory, to the JVM garbage collector to automatically recycled, so this article is to explore the knowledge of the garbage collection mechanism.

The body of the

Garbage collection, as the name implies, is the collection of allocated memory that is no longer used so that it can be allocated again. In the JVM, garbage is the heap space taken up by a dead object, and the key here is how do you tell that object is dead

Reference counting method

Let’s start with an ancient method of identification: reference-counting. It does this by adding a reference counter to each object, which counts the number of references to the object. Once an object’s reference counter is zero, the object is dead and needs to be reclaimed.

This design looks good, but difficult to implement, such as a reference, assigned to a certain object, then the object reference counter plus 1, but if the reference point to another object, then the object reference counter minus one, which is to intercept all the references updates, this will be a huge workload.

It also requires extra space to store counters, but the biggest problem is that reference counting has a huge flaw in its ability to handle objects that are referred to in a loop.

That is, suppose objects A and B refer to each other, and there are no other references to a and B. In this case, A and B are actually dead, but since the reference counter is not zero, they cannot be treated as garbage.

Accessibility analysis

At present, the mainstream garbage collection algorithm adopted by JVM is the reachability algorithm, which is the idea of: Take a series of GC Roots as an initial live set, and then start from that set and explore all objects that can be referenced by that set and add them to that set and, this process is called marking, and the unexplored objects are garbage.

GC Roots refers to the reference from the heap to the heap. Generally speaking, GC Roots include but are not limited to the following:

  • Local variables in Java method stack frames; (Data in Java method stack)
  • Static variables of loaded classes; (Data in method area)
  • JNI handles; (Local method stack data)
  • A Java thread that has been started and not stopped.

Reachability analysis can solve the problem of circular reference that reference counting method cannot solve.

Although accessibility analysis is relatively simple, there are still many problems to be solved. For example, in a multi-threaded environment, when you first see an object that is unreachable and needs to be collected, the garbage collector thread starts working on a reference to the object from another thread, but there is no synchronization, the garbage collector mechanism will still collect the object, possibly causing the JVM to crash.

Stop-the-world and safe spots

The traditional garbage collection algorithm in the JVM takes a simple and crude approach: stop-the-world, which means to Stop the work of other non-garbage collection threads until the garbage collection is complete. This is known as GC pause.

Stop-the-world in the JVM is implemented through the safepoint mechanism, where when the JVM receives a stop-the-world request, it waits for all threads to reach the safepoint before allowing the garbage collector thread to work exclusively.

So why this safe point? The point of safety is not to stop other threads, but to find a stable state where the stack in the JVM does not change so that the garbage collector can perform the reachable analysis correctly.

For example, when a Java program executes native code through JNI, if the code does not access Java objects, call Java methods, or return to Java methods, the JVM stack does not change, which means that the native code is also a safe point of execution at the same time as garbage collection. You can continue to execute this native code.

In addition to executing JNI native code, Java threads have several other states: interpreting execution of bytecode, executing just-in-time compiler-generated machine code, and thread blocking. The blocked thread is in the control of the JVM program scheduler and is definitely a safe point. The other states are running states, so the JVM needs to reach the safe point at a foreseeable time. Otherwise, waiting for all threads to enter the safe point for a long time will increase the pause time for garbage collection.

For interpretation execution, both bytecode and bytecode can be used as safe points, because each bytecode instruction is executed and the stack change is completed, so when a safe point is requested, a bytecode is executed to enter the safe point state.

Executing machine code generated by the just-in-time compiler is more complicated. Since the code runs directly on the underlying hardware and is not controlled by the JVM, the just-in-time compiler needs to insert safety-point detection when generating machine code to prevent the machine code from being without safety-point detection for a long time. The current JVM approach for HotSpot is to insert security point checks at the method exit of the generated code.

Why not insert security point detection into every machine code or block of machine code? There are two reasons:

  • The first is that safe point detection has some overhead. Although the JVM has reduced the safe point detection in machine code to a memory access operation, that is, in the case of a safe point request, the JVM will make the page of the memory visited by the safe point detection unreadable and set up a SegFault processor. Intercept the threads that trigger a Segfault by accessing this unreadable memory and suspend them.

  • The second is that the machine code generated by the just-in-time compiler will disrupt the distribution of objects on the original stack frame. When entering the safe point, the machine code also needs to provide some additional information indicating which registers or memory Spaces on the current stack frame hold references to objects. So that the garbage collector can enumerate GC Roots(because the local variable in the stack frame is GC Roots).

Because these information need a lot of space to store, so try to avoid too much security point detection.

Three ways to recycle garbage

Having said how to identify garbage and the thread synchronization problems it causes, let’s look at how to recycle. When all the living objects are marked, the other objects are dead objects, and we can recycle them. Currently, there are three main recycling methods.

Mark clear

The first method is to sweep, which marks the memory occupied by dead objects as free memory and records it in a free list. When new objects need to be created, the memory management module changes back to finding free memory from this free list and allocates it to new objects.

This approach is simple enough, but the drawbacks are obvious: first, memory fragmentation, and since objects in the JVM heap must be continuously distributed, there may be enough total free space that cannot be allocated. Second, allocation efficiency is low. If it is a continuous memory space, we can use pointer bumping to allocate it; however, for the idle list, THE JVM needs to access the items in the list one by one to find the free memory that can be put into new objects.

Mark compression

The second is compact, which aggregates living objects to the beginning of the memory region, leaving a contiguity of memory space. This approach solves the memory fragmentation problem, but at the expense of the performance overhead of the compression algorithm.

Mark copy

The third method is copy, which first divides the memory area into two equal parts, respectively uses two Pointers from and to to maintain, and only allocates memory to the memory area pointed to by the FROM pointer. When garbage collection occurs, the surviving objects are copied to the memory region pointed to by the TO pointer, and the FROM and TO Pointers are swapped. Tag copy can also solve the problem of memory fragmentation, but the disadvantage is also obvious, that is, heap space utilization efficiency.

The above three methods have advantages and disadvantages, and modern garbage collectors combine these methods. In the next article, we’ll take a closer look at the implementation of garbage collection algorithms in the JVM.

conclusion

In this article we learned how to mark an object as not garbage, and how the JVM uses safety points to implement stop-the-world operations in order to prevent stack changes during marking, and three ways to reclaim dead objects: This can result in memory fragmentation cleaning, performance-expensive compression, and inefficient heap replication.

In the next article, we will explain how to combine their advantages and avoid their disadvantages to implement a better garbage collection algorithm.