Java-Bang

Focus on system architecture, high availability, high performance, high concurrency technology sharing

You’ve probably heard the saying that what’s free is the most expensive.

The automatic memory management of the Java virtual machine gives the garbage collector the memory that would otherwise be collected manually by the developer. However, since it is an automatic mechanism, it cannot be as accurate and efficient as manual recycling [1], and it will also bring many problems related to the implementation of garbage recycling.

In the next two articles, we’ll take a closer look at the garbage collector in the Java Virtual Machine. In today’s article, we review the basics of garbage collection.

Reference counting and accessibility analysis

Garbage collection, as the name suggests, is to reclaim memory that has been allocated but is no longer used so that it can be re-allocated. In the context of the Java Virtual Machine, garbage refers to the heap space occupied by dead objects. Here comes a key question: how do you tell whether an object is alive or dead?

Let’s start with reference counting, an ancient discrimination method. It does this by adding a reference counter to each object, which counts the number of references to that object. Once an object’s reference counter is 0, the object is dead and can be reclaimed.

The implementation looks like this: if a reference is assigned to an object, the reference counter for that object is +1. If a reference to an object is assigned to some other value, then the reference counter of that object is -1. That is, we need to intercept all reference update operations and increase or decrease the target object’s reference counter accordingly.

In addition to the extra space required to store counters and tedious update operations, reference counting has a major flaw in its inability to handle circular reference objects.

For example, suppose that objects A and B refer to each other, and there are no other references to either a or B. In this case, A and B are actually dead, but since neither of their reference counters is zero, they are still alive in the mind of reference counting. As a result, the space occupied by these circular reference objects is not recoverable, causing a memory leak.



The current mainstream Java virtual machine garbage collector takesAccessibility analysis algorithm. The essence of this algorithm is to take a series of GC Roots as the initial live set of living objects, and then start from the set to explore all objects that can be referenced by the set and add them to the set. This process is also calledMark. Ultimately, unexplored objects are dead and recyclable.

So what are GC Roots? Generally speaking, GC Roots include (but are not limited to) the following:

  • Local variables in Java method frames;
  • Static variables of loaded classes;
  • JNI handles;
  • A Java thread that has been started and not stopped.

Reachability analysis can solve the problem of circular reference which reference counting method cannot solve. For example, even if objects A and B refer to each other, reachable analysis does not add them to the collection of living objects as long as they cannot be reached from GC Roots.

Although the algorithm of accessibility analysis itself is very simple, there are still many other problems to be solved in practice.

For example, in a multithreaded environment, other threads may update references in objects that have already been accessed, resulting in false positives (setting references to null) or false positives (setting references to objects that have not been accessed).

False positives do no harm, and the Java virtual machine at most loses some garbage collection opportunities. Missing is more troublesome because the garbage collector may reclaim the memory of objects that are actually still being referenced. Once an object that has been reclaimed is accessed from the original reference, it is likely that the Java virtual machine will crash.

Stop-the-world and safe spots

How to solve this problem? In the Java virtual machine, the traditional garbage collection algorithm takes a crude approach of stop-the-world, stopping the work of other non-garbage collection threads until garbage collection is complete. This causes garbage collection to be called GC pauses.

Stop-the-world in Java virtual machines is implemented through the Safepoint mechanism. When the Java virtual machine receives a stop-the-world request, it waits for all threads to reach a safe point before allowing the thread requesting stop-the-world to work exclusively.

The blog post [2] also offers an alternative interpretation: safe words. Once the garbage collector thread calls out the safe word, the other non-garbage collector threads stop.

Of course, the initial purpose of a safe point is not to stop other threads, but to find a stable state of execution. In this execution state, the Stack of the Java virtual machine does not change. This way, the garbage collector can “safely” perform the reachability analysis.

For example, when a Java program executes native code through JNI, if the code does not access Java objects, call Java methods, or return to the original Java methods, the Java virtual machine stack does not change, which means that the native code can act as the same safe point.

As long as you don’t leave this safe point, the Java virtual machine can continue to run this native code while garbage is collected.

Because the native code needs to use JNI’s API to do all three of these operations, the Java virtual machine simply does a safepoint poll at the entry point of the API to test for other thread requests to stay in the safepoint, and can suspend the current thread if necessary.

In addition to executing JNI native code, Java threads have several other states: interpreting execution of bytecode, executing just-in-time compiler-generated machine code, and thread blocking. Blocked threads are safe points because they are under the control of the Java virtual machine thread scheduler.

The other states are running states that require the VM to enter a safe point within a foreseeable period of time. Otherwise, garbage collection threads may be waiting for a long time for all threads to enter the safe point, thereby increasing the pause time for garbage collection.

For interpretation execution, the bytecodes are safe points in between. The Approach taken by the Java virtual machine is to perform a security point detection by executing a bytecode when there is a security point request.

Executing the machine code generated by the just-in-time compiler is more complicated. Since this code runs directly on the underlying hardware and is not controlled by the Java VIRTUAL machine, the just-in-time compiler needs to insert safety-point checks when generating machine code to avoid the machine code going without safety-point checks for a long time. The HotSpot virtual machine does this by inserting security point checks at the method exit of the generated code and at the back-edge of the non-counting loop.

So why not insert safety point checks at every machine code or base block of machine code? There are two main reasons.

First, safety point detection itself also has some overhead. But the HotSpot VIRTUAL machine has reduced the detection of safety points in machine code to a memory access operation. In the case of a safe-point request, the Java virtual machine makes the page of the memory visited by the safe-point detection unreadable and defines a SegFault handler that intercepts and suspends the threads triggering a Segfault by accessing that unreadable memory.

Second, the just-in-time compiler generates machine code that disrupts the distribution of objects on the original frame. When entering the safe point, the machine code also needs to provide some additional information indicating which registers, or memory Spaces on the current stack frame, hold references to objects so that the garbage collector can enumerate GC Roots.

Since this information requires a lot of space to store, the just-in-time compiler tries to avoid too much security point detection.

However, different just-in-time compilers may insert safety point detection at different locations. Graal, for example, inserts a safety point detection at the loop back edge of the counting loop in addition to the above locations. Other virtual machines may also select the method entry rather than the method exit to insert security point detection.

In any case, the goal is to indirectly reduce the garbage collection pause time by avoiding machine code being left out of the safe point for too long, within acceptable performance overhead and memory overhead.

In addition to garbage collection, other Java virtual machine operations that require consistency of stack contents also use the safety point mechanism. I’ll talk more about that when I get to it.

Three ways to recycle garbage

When all live objects are marked, we can recycle dead objects. There are three main types of basic recycling.

The first is aTo sweep, i.e.,Mark the memory occupied by dead objects as free memory and record it in a free list. When a new object needs to be created, the memory management module looks for free memory from the free list and allocates it to the new object.



The principle behind cleaning up this collection is extremely simple, butThere are two disadvantages. One is to be able toMemory fragmentation. Because objects in the Java virtual machine heap must be continuously distributed, there can be extreme cases where the total free memory is sufficient, but cannot be allocated.

The other is less efficient allocation. If we are a contiguity of memory, we can allocate it around pointer addition. With the free list, the Java virtual machine needs to access the list items one by one to find free memory that can fit into the newly created object.

The second isCompact, i.e.,Gather the living objects to the beginning of the memory area, leaving a contiguous memory space. This approach can solve the memory fragmentation problem, butThe cost is the performance overhead of the compression algorithm.



The third isCopy, i.e.,The memory region is divided into two equal parts, maintained with two Pointers to from and to, and allocated with only the memory region pointed to by the FROM pointer. When garbage collection occurs, surviving objects are copied to the memory region pointed to by the TO pointer, and the contents of the FROM and TO Pointers are swapped. Copying this recycle method can also solve the problem of memory fragmentation, but itsdisadvantagesIt is also extremely obvious thatHeap space is extremely inefficient.



Of course, modern garbage collectors tend to combine these methods, synthesizing their advantages while avoiding their disadvantages. In the next article, we’ll look at the implementation of the garbage collection algorithm in Java Virtual machines.