What is the garbage collection mechanism? Why should we learn about garbage collection? Let’s take a look at these two questions today.

In day-to-day development we rarely pay attention to how objects are collected and released, because the JVM handles garbage for us and saves us a lot of work. Garbage collection can therefore seem remote, yet it is something every developer must master on the way from junior to intermediate and senior levels. Handing object collection entirely to the JVM looks liberating, but it also adds uncertainty, and things do not always go perfectly. In today's complex business scenarios, an unsuitable garbage collection algorithm or strategy is often the main cause of a system's performance bottleneck.

Different service scenarios call for different garbage collection measures. If a service has strict memory requirements, garbage collection efficiency needs to be improved; if CPU usage is high, garbage collection frequency needs to be reduced.

As we all know, JVM memory is divided into several regions, and garbage collection is mainly concerned with the heap and the method area. Other regions, such as the program counter, the virtual machine stack, and the native method stack, are allocated and released deterministically along with their threads. We should therefore focus on reclaiming unused objects in the heap and discarded constants and unused classes in the method area.

How does the JVM determine that an object is recyclable?

When you first learned about garbage collection, you probably heard that an object can be collected once it is no longer referenced. There are two main ways to determine whether an object is still referenced: the reference counting algorithm and the reachability analysis algorithm.

Reference counting algorithm: this algorithm uses a reference counter kept for each object to decide whether the object is referenced. Each time a new reference to the object is created, the counter increases by 1; each time a reference becomes invalid, it decreases by 1. A counter value of 0 means the object is no longer referenced and can be reclaimed by the JVM. Although reference counting is simple to implement, it cannot handle circular references: if two objects reference only each other, both counters stay above 0 even when nothing else can reach them, so they are never reclaimed.
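To make the circular-reference problem concrete, here is a minimal Java sketch of reference counting. It is a toy model: the `Obj` class and the `addRef` and `release` helpers are hypothetical and maintain the counters by hand; HotSpot itself does not use reference counting.

```java
import java.util.ArrayList;
import java.util.List;

// A toy reference-counting model; the JVM does not manage memory this way.
class RefCountDemo {
    // Hypothetical managed object that carries its own reference counter.
    static final class Obj {
        final String name;
        int refCount = 0;
        final List<Obj> fields = new ArrayList<>();

        Obj(String name) { this.name = name; }
    }

    // Storing a reference in one of from's fields increments to's counter.
    static void addRef(Obj from, Obj to) {
        from.fields.add(to);
        to.refCount++;
    }

    // Dropping a reference decrements the counter; at 0 the object is freed
    // and the references it held are released in turn.
    static void release(Obj o, List<String> freed) {
        o.refCount--;
        if (o.refCount == 0) {
            freed.add(o.name);
            for (Obj child : o.fields) {
                release(child, freed);
            }
            o.fields.clear();
        }
    }

    public static void main(String[] args) {
        List<String> freed = new ArrayList<>();

        // Acyclic case: a local variable points at c, and c points at d.
        Obj c = new Obj("c");
        Obj d = new Obj("d");
        c.refCount++;               // the local variable's reference
        addRef(c, d);
        release(c, freed);          // the local variable goes out of scope
        System.out.println("freed: " + freed);  // freed: [c, d]

        // Cyclic case: a and b point at each other.
        Obj a = new Obj("a");
        Obj b = new Obj("b");
        a.refCount++;               // the local variable's reference
        addRef(a, b);
        addRef(b, a);
        release(a, freed);          // drop the only outside reference
        // Neither counter reaches 0, so the unreachable pair is never freed.
        System.out.println("a=" + a.refCount + ", b=" + b.refCount);  // a=1, b=1
    }
}
```

In the cyclic case nothing outside the pair can reach `a` or `b`, yet both counters remain at 1, which is exactly the leak described above.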

Reachability analysis algorithm: this algorithm starts from a set of GC Roots, the objects that serve as the starting points of all reference chains, such as objects referenced by local variables in stack frames, by static fields, and by constants. During garbage collection, the JVM searches downward from the GC Roots along references. If no reference chain connects an object to any GC Root, the object is unreachable and can be reclaimed.
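A minimal sketch of reachability analysis, assuming the heap is modeled as a map from a hypothetical object name to the names it references, and the GC Roots are given as a list. It runs a breadth-first search from the roots; anything not visited is collectable. The `a`/`b` cycle, which defeats reference counting, is correctly reported as garbage here.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// A toy reachability analysis over an explicit object graph.
class ReachabilityDemo {
    // Marks every object reachable from the roots with a breadth-first search.
    static Set<String> mark(Map<String, List<String>> heap, Collection<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String obj = work.poll();
            if (marked.add(obj)) {  // first visit: follow its outgoing references
                work.addAll(heap.getOrDefault(obj, List.of()));
            }
        }
        return marked;
    }

    // Every object that was not marked is unreachable and therefore collectable.
    static Set<String> garbage(Map<String, List<String>> heap, Collection<String> roots) {
        Set<String> result = new TreeSet<>(heap.keySet());
        result.removeAll(mark(heap, roots));
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> heap = new HashMap<>();
        heap.put("user", List.of("order"));
        heap.put("order", List.of());
        heap.put("a", List.of("b"));  // a and b reference each other,
        heap.put("b", List.of("a"));  // but no root leads to them
        List<String> roots = List.of("user");  // e.g. a local variable in a stack frame
        System.out.println("garbage: " + garbage(heap, roots));  // garbage: [a, b]
    }
}
```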

How does the garbage collection thread reclaim objects?

The JVM’s ability to reclaim objects follows two main characteristics: automaticity and unpredictability.

Automaticity: the JVM runs a system-level thread that tracks every allocated block of memory, checks the allocated blocks automatically when the JVM is idle, and automatically reclaims the blocks that are no longer in use.

Unpredictability: this refers to whether an object is reclaimed as soon as it is no longer referenced. The answer is uncertain: it may be reclaimed immediately, or it may remain in memory for a long time.
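A small experiment illustrates this unpredictability. It uses the standard `java.lang.ref.WeakReference` to observe an object without keeping it alive, and `System.gc()`, which only requests a collection. The output is deliberately not fixed: depending on the JVM, the collector, and flags such as `-XX:+DisableExplicitGC`, it may print either result.

```java
import java.lang.ref.WeakReference;

// Shows that dropping the last strong reference does not guarantee immediate collection.
class UnpredictableDemo {
    static String probe() {
        Object payload = new byte[1024 * 1024];
        WeakReference<Object> ref = new WeakReference<>(payload);
        payload = null;   // the array is now only weakly reachable
        System.gc();      // only a request; the JVM may ignore it (e.g. with -XX:+DisableExplicitGC)
        return ref.get() == null ? "collected" : "still in memory";
    }

    public static void main(String[] args) {
        // The printed result can differ between JVMs, collectors and runs.
        System.out.println(probe());
    }
}
```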

GC algorithm

The JVM provides a variety of garbage collection algorithms to implement the garbage collection mechanism. Generally speaking, there are four common garbage collection algorithms:

Mark-sweep algorithm

Advantages: No need to move objects, simple and efficient

Disadvantages: The mark and sweep phases are inefficient, and sweeping leaves memory fragmented.
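A toy model of the sweep phase, assuming the heap is an array of cells where each cell holds a hypothetical object id (`null` means free) and that marking has already produced the live set. It shows where fragmentation comes from: survivors stay in place, so the free space ends up split into holes.

```java
import java.util.Arrays;
import java.util.Set;

// A toy mark-sweep heap: each array cell holds the id of the object occupying it, or null.
class MarkSweepDemo {
    // Sweep phase: free the cells of every object the mark phase did not find alive.
    // Survivors stay exactly where they are.
    static void sweep(String[] heap, Set<String> live) {
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && !live.contains(heap[i])) {
                heap[i] = null;
            }
        }
    }

    static int totalFree(String[] heap) {
        int free = 0;
        for (String cell : heap) {
            if (cell == null) free++;
        }
        return free;
    }

    // Longest run of adjacent free cells, i.e. the largest object that still fits.
    static int largestHole(String[] heap) {
        int best = 0, run = 0;
        for (String cell : heap) {
            run = (cell == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "A", "B", "B", "C", "C", "D", "D"};
        sweep(heap, Set.of("A", "C"));  // assume marking found only A and C alive
        System.out.println(Arrays.toString(heap));  // [A, A, null, null, C, C, null, null]
        System.out.println("free=" + totalFree(heap) + ", largest hole=" + largestHole(heap));  // free=4, largest hole=2
    }
}
```

With 4 free cells but a largest hole of only 2, a 3-cell allocation would fail even though enough memory is free in total.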

Copying algorithms

Advantages: Simple and efficient, no memory fragmentation

Disadvantages: Low memory utilization, since only half of the space holds objects at a time, and frequent copying when many objects survive.
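A toy semispace copying collector on the same kind of cell-array model (hypothetical object ids, live set assumed to be given). Live objects are copied back to back into the empty half, so no fragmentation remains, but one half of the memory always sits idle as the copy target.

```java
import java.util.Arrays;
import java.util.Set;

// A toy semispace copying collector over arrays of cells (null = free).
class CopyingDemo {
    // Copies the cells of live objects from 'from' to the start of 'to', back to back,
    // then reclaims the whole from-space at once. Returns the next free index in 'to'.
    static int collect(String[] from, String[] to, Set<String> live) {
        int next = 0;
        for (String cell : from) {
            if (cell != null && live.contains(cell)) {
                to[next++] = cell;
            }
        }
        Arrays.fill(from, null);
        return next;
    }

    public static void main(String[] args) {
        // 16 cells of memory split into two 8-cell halves; only one half holds objects at a time.
        String[] from = {"A", "A", "B", "B", "C", "C", "D", "D"};
        String[] to = new String[8];
        int next = collect(from, to, Set.of("A", "C"));
        // Prints [A, A, C, C, null, null, null, null], next free cell=4
        System.out.println(Arrays.toString(to) + ", next free cell=" + next);
        // After this collection the halves swap roles: 'to' becomes the new from-space.
    }
}
```

HotSpot's young generation reduces this waste by splitting the space into Eden and two smaller survivor spaces rather than two equal halves.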

Mark-compact algorithm

Advantages: High efficiency and no memory fragmentation

Disadvantages: Surviving objects need to be moved
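Mark-compact on the same toy cell-array model: instead of copying into a second space, live cells slide toward the start of a single heap, leaving one contiguous free region. The writes in the loop are the object moves listed as the disadvantage above. A real collector must also update every reference to a moved object, which this sketch leaves out.

```java
import java.util.Arrays;
import java.util.Set;

// A toy mark-compact heap (null = free cell). Only the cells are moved here;
// a real collector would also fix up references to the moved objects.
class MarkCompactDemo {
    // Slides live cells toward the start of the heap in their original order and frees the rest.
    // Returns the index where the single contiguous free region begins.
    static int compact(String[] heap, Set<String> live) {
        int dest = 0;
        for (int src = 0; src < heap.length; src++) {
            String cell = heap[src];
            if (cell != null && live.contains(cell)) {
                heap[dest++] = cell;  // dest <= src, so no cell is overwritten before it is read
            }
        }
        Arrays.fill(heap, dest, heap.length, null);
        return dest;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "A", "B", "B", "C", "C", "D", "D"};
        int free = compact(heap, Set.of("A", "C"));
        // Prints [A, A, C, C, null, null, null, null], free from cell 4
        System.out.println(Arrays.toString(heap) + ", free from cell " + free);
    }
}
```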

Generational collection

Advantages: Each generation is reclaimed separately, using the algorithm best suited to it

Disadvantages: Reclaiming long-lived objects in the old generation is less efficient.
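A minimal sketch of the generational idea, assuming a toy heap in which young objects carry an age (the number of Minor GCs they have survived) and a hypothetical promotion threshold of 3. In HotSpot the corresponding knob is `-XX:MaxTenuringThreshold`. Short-lived objects die cheaply in the young generation, while long-lived ones are promoted to the old generation, which is collected less often and at higher cost.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// A toy two-generation heap: young objects carry an age (Minor GCs survived).
class GenerationalDemo {
    // Illustrative value only; HotSpot's corresponding knob is -XX:MaxTenuringThreshold.
    static final int TENURING_THRESHOLD = 3;

    // One Minor GC: dead young objects are dropped, survivors age by one, and
    // survivors that reach the threshold are promoted to the old generation.
    static void minorGc(Map<String, Integer> young, Set<String> old, Set<String> live) {
        Iterator<Map.Entry<String, Integer>> it = young.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> e = it.next();
            int age = e.getValue() + 1;
            if (!live.contains(e.getKey())) {
                it.remove();              // short-lived object: reclaimed cheaply
            } else if (age >= TENURING_THRESHOLD) {
                old.add(e.getKey());      // long-lived object: promoted
                it.remove();
            } else {
                e.setValue(age);          // survived: one collection older
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> young = new HashMap<>();
        Set<String> old = new TreeSet<>();
        young.put("cache", 0);    // hypothetical object that lives for the whole run
        young.put("request", 0);  // hypothetical object that dies before the first GC
        for (int i = 0; i < 3; i++) {
            minorGc(young, old, Set.of("cache"));
        }
        System.out.println("young=" + young.keySet() + ", old=" + old);  // young=[], old=[cache]
    }
}
```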

Now that we know the four garbage collection algorithms, let's look at some common collectors built on them:

| Collector | Algorithm | Characteristics |
| --- | --- | --- |
| Serial / Serial Old | Copying / mark-compact | Single-threaded collection; simple and efficient, but pauses the application and can cause stutters |
| ParNew | Copying | Multithreaded version of Serial for the young generation; higher efficiency and shorter pauses, but more thread context switching |
| Parallel Scavenge | Copying | Parallel collector; high throughput, high CPU utilization |
| CMS | Mark-sweep | Old-generation collector; highly concurrent, low latency, aims for the shortest GC pause, but high CPU usage |
| G1 | Mark-compact + copying | Highly concurrent, low latency, predictable pause times |
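To see which collector a JVM is actually using, the standard `java.lang.management` API can list the active collectors. This is a small sketch: run it under different collector flags, for example `-XX:+UseSerialGC`, `-XX:+UseParallelGC` or `-XX:+UseG1GC` (CMS, selected with `-XX:+UseConcMarkSweepGC`, was removed in JDK 14), and compare the names it prints. The exact bean names vary by collector and JDK version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Lists the collectors the running JVM is using, via the standard management API.
class ListCollectors {
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount()/getCollectionTime() return -1 if the value is unavailable.
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```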

How do you measure GC performance?

Garbage collectors come in all shapes and sizes, and different scenarios suit different collectors. Choosing the right one depends on three metrics: throughput, pause time, and garbage collection frequency.

Throughput: the proportion of total elapsed time that the system spends running application code rather than collecting garbage, that is, throughput = (total elapsed time - GC time) / total elapsed time. GC throughput should generally be no lower than 95%.
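Throughput can be estimated for a running JVM as 1 - GC time / total elapsed time. The sketch below sums `getCollectionTime()` over all `GarbageCollectorMXBean`s and divides by the JVM uptime from `RuntimeMXBean`. How collection time is accounted can differ between collectors, so treat the live figure as an approximation; the second printed line is a worked example with made-up numbers.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Estimates GC throughput: the share of elapsed time NOT spent in garbage collection.
class ThroughputDemo {
    static double throughput(long gcMillis, long totalMillis) {
        return 1.0 - (double) gcMillis / totalMillis;
    }

    // Sums the accumulated collection time reported by every collector in this JVM.
    static long totalGcMillis() {
        long sum = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) sum += t;  // -1 means "not available"
        }
        return sum;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();  // ms since JVM start
        System.out.printf("GC throughput so far: %.2f%%%n", 100 * throughput(totalGcMillis(), uptime));
        // Worked example: 3 s of GC in 60 s of run time gives 95.00%, right at the usual floor.
        System.out.printf("3 s of GC in 60 s: %.2f%%%n", 100 * throughput(3_000, 60_000));
    }
}
```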

Pause time: the time the application is paused while the garbage collector is working. The serial collector generally has long pauses. Concurrent collectors have short pauses because the collector and the application run alternately, but they collect less efficiently than the serial collector, so system throughput drops.

Garbage collection frequency: collection frequency and collection time affect each other. We can lower the frequency by enlarging memory, but a larger memory accumulates more objects, so each collection takes longer. Memory should therefore be increased only as far as needed to keep the collection frequency at a normal level.

How do I view and analyze GC logs?

GC logging is configured with JVM parameters. The main ones to know are:

```
-XX:+PrintGCDetails       # print detailed GC information
-XX:+PrintGCTimeStamps    # print GC time stamps (time since JVM start)
-XX:+PrintGCDateStamps    # print GC date stamps (wall-clock date and time)
-XX:+PrintHeapAtGC        # print heap information before and after GC
-Xloggc:../logs/gc.log    # log file output path
```
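The flags above are the JDK 8 style. Starting with JDK 9, GC logging moved to unified JVM logging (`-Xlog`), and flags such as `-XX:+PrintGCDateStamps` are no longer recognized. A rough equivalent that keeps the same `../logs/gc.log` path is shown below; the choice of `time,uptime` decorators is one reasonable option, not the only one.

```
-Xlog:gc*:file=../logs/gc.log:time,uptime
```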

Configure the parameters as needed; the printed log looks like the following figure:

For logs covering a short period, Notepad is enough to open and read them. Logs covering a long period are hard to read that way, so we need tools to analyze them. GCViewer is generally easy to use: it opens a log file and shows GC performance graphically, including throughput, pause time, and GC frequency, which gives a very intuitive picture of GC behavior.

GCeasy is another useful GC log analysis tool: compress the log file and upload it to the official website to have it analyzed online. Here is the result of analyzing a local GC log:

GC tuning

After analyzing the GC logs to find the problems affecting performance, it is time to tune. Below are a few common tuning strategies, mainly aimed at reducing the frequency of Minor GC and Full GC.

Reduce the Minor GC frequency

First of all, Minor GC mainly collects objects in the Eden area. Because the young generation is usually small, Eden fills up quickly, which leads to frequent Minor GCs. The remedy is to enlarge the young generation to reduce Minor GC frequency. When discussing GC performance metrics, we noted that enlarging memory lengthens each collection. Minor GC also pauses the application, though only briefly, so whether enlarging Eden lengthens each Minor GC depends on what a Minor GC actually spends its time on.

Each Minor GC does two main things: scanning the young generation (A) and copying surviving objects (B), and copying takes far longer than scanning. Suppose an object lives for 500 ms in Eden and a Minor GC runs every 300 ms. The object is still alive at the next Minor GC, so that collection costs A+B. If, after analyzing the GC logs, we enlarge Eden enough that the next Minor GC comes only after the object has died, the object has already been reclaimed in Eden and does not need to be copied. This saves the time spent copying surviving objects; the only added cost in that Minor GC is a longer scan of the larger young generation.

Summary: the duration of a single Minor GC depends more on the number of objects still alive after the GC than on the size of the Eden area. If the heap holds many long-lived objects, each Minor GC takes longer. If it holds mostly short-lived objects, each Minor GC does not get noticeably longer, and Minor GC frequency drops.
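The 500 ms / 300 ms example can be turned into a back-of-the-envelope calculation. The model is an assumption for illustration only: the object is allocated right after a Minor GC at t = 0, Minor GCs then run at a fixed interval, and the 600 ms interval stands in for a hypothetically enlarged Eden.

```java
// Back-of-the-envelope model of the 500 ms / 300 ms example.
// Assumptions (not from any JVM API): the object is allocated right after a Minor GC at t = 0,
// Minor GCs then run every intervalMs, and the object dies at lifetimeMs.
class MinorGcDemo {
    // Minor GCs that run while the object is still alive; each one has to copy it.
    static long timesCopied(long lifetimeMs, long intervalMs) {
        return (lifetimeMs - 1) / intervalMs;  // GCs at intervalMs, 2*intervalMs, ... below lifetimeMs
    }

    static long gcsPerMinute(long intervalMs) {
        return 60_000 / intervalMs;
    }

    public static void main(String[] args) {
        for (long interval : new long[] {300, 600}) {
            // 300 ms: 200 Minor GCs/min, copied 1 time(s); 600 ms: 100 Minor GCs/min, copied 0 time(s)
            System.out.printf("GC every %d ms: %d Minor GCs/min, object copied %d time(s)%n",
                    interval, gcsPerMinute(interval), timesCopied(500, interval));
        }
    }
}
```

Doubling the interval halves the number of Minor GCs and, for this object, removes the copy entirely; only the scan of the larger Eden remains.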

Reduce the Full GC frequency

Full GC is usually triggered by insufficient heap memory or by too many objects in the old generation. Full GC also causes context switching, which, as covered in the previous article, degrades system performance. We can lower the Full GC frequency in the following ways.

Create fewer large objects: some coding habits make it easy to load a large object from the database in a single query for display on a web page. Such a large object may be allocated directly in the old generation; even if it is allocated in the young generation, the young generation is usually small, so the object is promoted to the old generation after a Minor GC. Large objects like this can trigger Full GC, so make a habit of not querying fields you don't need.

Increase the heap size: when heap memory is insufficient, enlarge the heap and set the initial heap size equal to the maximum heap size (-Xms equal to -Xmx). This can significantly reduce Full GC frequency.
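After changing the heap flags, it is worth confirming what the JVM actually got. A minimal sketch using the standard `MemoryMXBean`; run it with, say, `java -Xms4g -Xmx4g HeapSizeCheck.java`, where 4g is an arbitrary example size. Depending on the collector, the reported values may differ slightly from the flags, so compare them approximately.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Prints the heap sizes the running JVM reports, to check that -Xms/-Xmx took effect.
class HeapSizeCheck {
    static MemoryUsage heap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    public static void main(String[] args) {
        MemoryUsage h = heap();
        // getInit() and getMax() return -1 when the value is undefined.
        System.out.printf("init=%d MB, committed=%d MB, max=%d MB%n",
                h.getInit() >> 20, h.getCommitted() >> 20, h.getMax() >> 20);
    }
}
```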

Choose the right collector: we have covered a variety of collectors, and choosing one that fits the business scenario often works well.

Conclusion

Garbage collection is a complex subject that requires constant practice. After reading this article, you should have a basic understanding of it. Let's put it into action, starting with our company's development environment.