This series of articles is a full translation of the official Java 8 GC tuning guide, Java Platform, Standard Edition HotSpot Virtual Machine Garbage Collection Tuning Guide.
3 Generations
Introduction
One of the strengths of the Java SE Platform is that it shields developers from the complexity of memory allocation and garbage collection. However, once garbage collection becomes the principal bottleneck of a Java program, it helps to understand how the collector is implemented.
An object is considered garbage when it can no longer be reached through any reference in the running program. The simplest garbage collection algorithms iterate over every reachable object; whatever is left over is unreachable and can be reclaimed. The time this takes is proportional to the number of live objects, so such a collector is a poor fit for a large server application that keeps a lot of live data.
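To make reachability concrete, here is a minimal sketch of the naive tracing approach on a toy object graph. The `Node` class and the `markReachable` and `sweep` methods are hypothetical names invented for this illustration; HotSpot's real collectors are far more elaborate.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Toy model of tracing: everything reachable from the roots is live, the rest is garbage. */
final class NaiveTracing {

    /** A hypothetical heap object that holds references to other objects. */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /** Visits every object reachable from the roots; the work grows with the live set. */
    static Set<Node> markReachable(List<Node> roots) {
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (marked.add(current)) {          // first visit: follow its references
                pending.addAll(current.refs);
            }
        }
        return marked;
    }

    /** Returns the objects in the toy heap that were not marked, that is, the garbage. */
    static List<Node> sweep(List<Node> heap, Set<Node> marked) {
        List<Node> garbage = new ArrayList<>();
        for (Node n : heap) {
            if (!marked.contains(n)) garbage.add(n);
        }
        return garbage;
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c");
        a.refs.add(b);                          // a -> b, nothing points to c
        List<Node> heap = new ArrayList<>();
        heap.add(a); heap.add(b); heap.add(c);
        Set<Node> live = markReachable(Collections.singletonList(a));
        for (Node n : sweep(heap, live)) {
            System.out.println("garbage: " + n.name);
        }
    }
}
```

The marking loop touches each live object once, which is why the cost of this style of collection scales with the amount of live data rather than with the amount of garbage.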
The JVM combines a number of different garbage collection algorithms under a generational collection strategy. Whereas naive garbage collection examines every live object in the heap, generational collection exploits several empirically observed properties of applications to reduce the work needed to reclaim garbage. The most important of these observations is the weak generational hypothesis, which states that most objects live for only a short time.
The blue area in the figure below is a typical distribution of object lifetimes. The X-axis is the object lifetime, measured in bytes allocated, and the Y-axis is the total number of bytes in objects with that lifetime. The sharp peak on the left of the figure represents objects that are reclaimed shortly after being allocated. For example, an object created inside a for loop often lives only for one iteration and can be collected once that iteration ends.
Some objects live much longer, so the distribution stretches out to the right. For instance, some objects are allocated when the service starts and live until the process exits. Some Java programs have a very different distribution, but a surprisingly large number of applications have this general shape. A collection strategy that focuses on reclaiming short-lived objects is therefore the most effective.
To optimize for this scenario, memory is managed in generations. When a generation fills up, garbage collection starts in that generation. The vast majority of objects are allocated in the young generation, and most of them die there. When the young generation is full, a minor collection is triggered; garbage in other generations is not collected. A minor collection works best when the weak generational hypothesis holds, so that most objects in the young generation are garbage and can be reclaimed right away. The cost of a minor collection is roughly proportional to the number of live objects being collected; if all objects in the young generation are dead, the collection is very quick. Typically, a portion of the surviving objects moves to the tenured generation during each minor GC. Eventually the tenured generation fills up and triggers a major collection, in which the entire heap is collected. Major collections usually last much longer than minor collections because far more objects are involved.
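One way to watch minor and major collections happen is to read the per-collector counters that the JVM exposes through `java.lang.management`. The sketch below is only an illustration: the class and method names are made up, the bean names depend on the collector in use (for example `Copy` and `MarkSweepCompact` with the serial collector, or `PS Scavenge` and `PS MarkSweep` with the parallel collector), and whether the loop triggers any collection at all depends on heap size and JIT optimizations.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

final class CollectionCounts {

    /** Snapshot of the collections run so far, keyed by collector bean name. */
    static Map<String, Long> snapshot() {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            counts.put(gc.getName(), gc.getCollectionCount()); // -1 if undefined
        }
        return counts;
    }

    /** Allocates many short-lived arrays and returns a checksum so the work has a visible result. */
    static long churn(int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] shortLived = new byte[1024];   // unreachable after this iteration
            shortLived[0] = (byte) i;
            checksum += shortLived[0];
        }
        return checksum;
    }

    public static void main(String[] args) {
        System.out.println("before: " + snapshot());
        System.out.println("checksum: " + churn(2_000_000));
        System.out.println("after:  " + snapshot());
    }
}
```

Running it with a small young generation should make the young collector's counter climb while the old collector's counter stays flat, which matches the behavior described above.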
As mentioned in the Ergonomics section, the JVM dynamically selects a garbage collector to give good performance across different Java applications. The serial garbage collector and its default parameters are designed for small Java applications. The parallel, or throughput, garbage collector is designed for applications with medium to large data sets. The heap size parameters chosen by ergonomics, together with the adaptive sizing features, are designed for server-class applications. These choices work well in most cases, but not always. (This is not an idle caveat: I once heard a senior engineer from a large company say in a talk that the JVM's default collector is garbage, although smaller companies rarely run heaps that large. Hey, garbage collector, why don't you collect yourself?) This leads to the document's central guideline:
If garbage collection becomes a bottleneck, you will most likely need to customize the total heap size and the size of each generation. Check the GC log and see how your performance metrics respond to the garbage collector parameters.
To tune GC parameters, you need to adjust the heap size to the characteristics of the program, because the heap size affects GC pause times. You also need to know which JVM performance parameters are available and adapt them to the physical environment in which your program runs. Watch the GC logs to make sure that throughput and pause times meet your requirements. How do you decide whether both goals are met, and how do you set them in the first place? That depends on your own situation; we will discuss it in detail later, after first reading through the documentation.
The following figure shows the default arrangement of generations (for all collectors except the parallel collector and G1).
When a Java application is initialized, the maximum address space is reserved but not backed by physical memory until it is needed. The full address space reserved for objects is divided into the young and tenured generations.
The young generation consists of one Eden space and two survivor spaces. Most objects are initially allocated in Eden. One survivor space is empty at any time and receives the live objects from Eden during a GC. The other survivor space is the destination for the next copying collection. Objects are copied back and forth between the two survivor spaces until they are old enough to be promoted to the tenured generation.
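The layout described above can be inspected at run time through the memory pool beans in `java.lang.management`. This is a small sketch, not part of the original guide: the class name is invented, and the pool names it prints vary with the collector (the serial collector, for instance, reports `Eden Space`, `Survivor Space` and `Tenured Gen`). Comparing `committed` with `max` shows the difference between memory that is actually backed and address space that is only reserved.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

final class HeapPools {

    /** One line per heap pool: name, used, committed and max (reserved) size in KB. */
    static List<String> describeHeapPools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) continue;   // skip non-heap pools
            MemoryUsage u = pool.getUsage();
            if (u == null) continue;                            // pool is no longer valid
            String max = u.getMax() < 0 ? "undefined" : (u.getMax() / 1024) + "K";
            lines.add(String.format("%-25s used=%dK committed=%dK max=%s",
                    pool.getName(), u.getUsed() / 1024, u.getCommitted() / 1024, max));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeHeapPools()) {
            System.out.println(line);
        }
    }
}
```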
Performance Considerations
There are two main measures of garbage collector performance:
- Throughput: the percentage of total elapsed time that is not spent in garbage collection, measured over long periods. Throughput includes the time spent allocating memory.
- Pauses: the times when the application appears unresponsive because the GC has triggered a stop-the-world (STW) phase.
Beginner tip: as soon as an STW pause is triggered, the application threads stop and must wait for the GC to finish before the program can respond again.
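As a rough illustration of the throughput definition, the sketch below divides the time not spent in GC by the total elapsed time. The class and method names are hypothetical. It assumes that the accumulated time reported by `GarbageCollectorMXBean.getCollectionTime()` is an acceptable approximation of GC time; for concurrent collectors that figure can include work done alongside the application, so treat the result as an estimate only.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcThroughput {

    /** Percentage of elapsed time that was not spent in garbage collection. */
    static double throughputPercent(long gcMillis, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return 100.0 * (elapsedMillis - gcMillis) / elapsedMillis;
    }

    /** Sum of the approximate accumulated collection time reported by every collector. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();      // -1 if the collector does not report it
            if (t > 0) total += t;
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = Math.max(1, ManagementFactory.getRuntimeMXBean().getUptime());
        long gc = totalGcMillis();
        System.out.printf("uptime=%d ms, gc=%d ms, throughput=%.2f%%%n",
                uptime, gc, throughputPercent(gc, uptime));
    }
}
```

For example, 50 ms of GC over 1000 ms of elapsed time gives a throughput of 95%.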
Users have different requirements for garbage collection. For example, some consider throughput the right metric for a web service, because pauses caused by GC may be tolerable or simply masked by network latency. In an interactive graphical program, however, even brief pauses can hurt the user experience.
Some users are more sensitive to other factors. Footprint is the working set of a process, measured in memory pages and cache lines. On systems with limited physical memory or many processes, footprint can determine how well the application scales.
Promptness is the time between when an object dies and when its memory becomes available again. It is an important consideration for distributed systems, including Remote Method Invocation (RMI).
Typically, choosing the size of a particular generation is a trade-off between these factors. For example, a very large young generation may maximize throughput, but at the expense of footprint, promptness, and pause times. Young generation pauses can be shortened by using a smaller young generation, at the expense of throughput. The size of one generation does not affect the collection frequency or pause times of another generation.
There is no single right way to choose the size of a generation. The best choice depends on how the application uses memory and on what the user needs. The JVM's default collector and parameters are not necessarily optimal, and you may need to adjust them to get the best results. See the section Sizing the Generations for details.
Measurement metrics
Throughput and footprint are best measured with application-specific metrics. For example, the throughput of a web service can be measured with a client load test, and the server's footprint can be inspected with the pmap command. GC pauses, on the other hand, can be estimated directly from the JVM's own diagnostic output.
The -verbose:gc option prints heap and garbage collection information at every collection. For example, here is a GC log from a server:
[GC 325407K->83000K(776768K), 0.2300771 secs]
[GC …, 0.2454258 secs]
[Full GC 267628K->83769K, 1.8479984 secs]
The log shows two minor collections followed by one major collection. The two numbers around the arrow (325407K->83000K) are the combined size of the objects in the heap before and after the GC. After a minor collection, this size still includes some objects that are garbage but could not be reclaimed. These objects are either in the tenured generation or referenced from the tenured generation.
The number in parentheses, 776768K, is the committed heap size: the amount of space available for Java objects without requesting more memory from the operating system. Note that this number includes only one of the survivor spaces. Except during a garbage collection, only one survivor space is used to store objects at any given time.
The last item, 0.2300771 secs, is the time taken to perform the collection.
The format of the Full GC entry on the third line is similar to that of the young GC entries above.
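Reading these entries by hand gets tedious, so here is a minimal parsing sketch. It assumes the Java 8 `-verbose:gc` layout `[GC before->after(committed), secs]`, treats the committed size as optional, and uses a hypothetical class name, `VerboseGcLine`; since the output format may change between releases, take it as an illustration rather than a robust parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** One parsed -verbose:gc entry (assumed Java 8 format). */
final class VerboseGcLine {
    private static final Pattern LINE = Pattern.compile(
            "\\[(Full GC|GC) (\\d+)K->(\\d+)K(?:\\((\\d+)K\\))?, ([0-9.]+) secs\\]");

    final boolean full;       // true for a "Full GC" entry
    final long beforeKb;      // heap occupancy before the collection
    final long afterKb;       // heap occupancy after the collection
    final long committedKb;   // committed heap size, or -1 if the entry omits it
    final double seconds;     // duration of the collection

    private VerboseGcLine(boolean full, long beforeKb, long afterKb,
                          long committedKb, double seconds) {
        this.full = full;
        this.beforeKb = beforeKb;
        this.afterKb = afterKb;
        this.committedKb = committedKb;
        this.seconds = seconds;
    }

    static VerboseGcLine parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("unrecognized GC line: " + line);
        }
        long committed = m.group(4) == null ? -1 : Long.parseLong(m.group(4));
        return new VerboseGcLine("Full GC".equals(m.group(1)),
                Long.parseLong(m.group(2)), Long.parseLong(m.group(3)),
                committed, Double.parseDouble(m.group(5)));
    }

    /** Space freed by this collection, in KB. */
    long reclaimedKb() {
        return beforeKb - afterKb;
    }

    public static void main(String[] args) {
        VerboseGcLine gc = parse("[GC 325407K->83000K(776768K), 0.2300771 secs]");
        System.out.println((gc.full ? "major" : "minor") + " collection reclaimed "
                + gc.reclaimedKb() + "K in " + gc.seconds + " s");
    }
}
```

For the first entry of the log, the difference between the two numbers around the arrow is 325407K - 83000K = 242407K of reclaimed space.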
Note: the format of the output produced by -verbose:gc may change in future releases.
The -XX:+PrintGCDetails option logs additional information about each collection. Here is an example:
[GC [DefNew: 64575K->959K(64576K), 0.0457646 secs] 196016K->133633K(261184K), 0.0459067 secs]
This shows that the minor GC reclaimed about 98% of the young generation, DefNew: 64575K->959K(64576K), and took 0.0457646 secs (about 45 milliseconds).
Total heap usage dropped to about 51% (196016K->133633K(261184K)), and the total time of 0.0459067 secs is slightly higher than the time spent collecting the young generation alone.
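The percentages quoted above can be checked with a few lines of code. The sketch below pulls every `before->after(capacity)` triple out of the sample entry and redoes the arithmetic. The class and method names are hypothetical, and the "promoted" figure is an inference, not something the log states: it assumes that any young-generation shrinkage not reflected in the whole-heap numbers was copied into the tenured generation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GcDetailsMath {
    /** Matches one "beforeK->afterK(capacityK)" triple. */
    private static final Pattern TRIPLE = Pattern.compile("(\\d+)K->(\\d+)K\\((\\d+)K\\)");

    /** Returns every {before, after, capacity} triple on the line, in order of appearance. */
    static List<long[]> triples(String line) {
        List<long[]> result = new ArrayList<>();
        Matcher m = TRIPLE.matcher(line);
        while (m.find()) {
            result.add(new long[] {
                Long.parseLong(m.group(1)), Long.parseLong(m.group(2)), Long.parseLong(m.group(3))
            });
        }
        return result;
    }

    static double percent(long part, long whole) {
        return 100.0 * part / whole;
    }

    /** Inferred promotion: what left the young generation but did not leave the heap. */
    static long promotedKb(long[] young, long[] heap) {
        return (young[0] - young[1]) - (heap[0] - heap[1]);
    }

    public static void main(String[] args) {
        String line = "[GC [DefNew: 64575K->959K(64576K), 0.0457646 secs] "
                + "196016K->133633K(261184K), 0.0459067 secs]";
        List<long[]> t = triples(line);
        long[] young = t.get(0);   // first triple: the DefNew (young) generation
        long[] heap = t.get(1);    // second triple: the whole heap
        System.out.printf("young reclaimed: %.1f%%%n", percent(young[0] - young[1], young[0]));
        System.out.printf("heap occupancy after GC: %.1f%%%n", percent(heap[1], heap[2]));
        System.out.println("promoted (inferred): " + promotedKb(young, heap) + "K");
    }
}
```

The young generation shrank by 64575K - 959K = 63616K while the heap as a whole shrank by only 196016K - 133633K = 62383K, so under the stated assumption roughly 1233K was promoted.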
Note: the format of the output produced by -XX:+PrintGCDetails may change in future releases.
The -XX:+PrintGCTimeStamps option adds a timestamp at the start of each collection. This is very useful for seeing how often collections occur.
111.042: [GC 111.042: [DefNew: 8128K->8128K(8128K), 0.0000505 secs]111.042: [Tenured: 18154K->2311K(24576K), 0.1290354 secs] 26282K->2311K(32704K), 0.1293306 secs]
This collection starts about 111 seconds into the run of the Java program. The minor GC starts at about the same time. The log also shows a major collection, marked by Tenured. Usage of the tenured generation was reduced to about 10% (18154K->2311K(24576K)), and that collection took 0.1290354 secs, approximately 130 milliseconds.
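To turn timestamps into a collection frequency, one option is to extract the leading timestamp of every entry and average the gaps between them. The following sketch assumes the `seconds: [GC ...` prefix shown in the entry above and reads the log from a file path passed on the command line; the class name `GcFrequency` and its methods are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GcFrequency {
    /** Leading "seconds: [" prefix written by -XX:+PrintGCTimeStamps (assumed format). */
    private static final Pattern STAMP = Pattern.compile("^(\\d+\\.\\d+): \\[");

    /** Returns the leading timestamp in seconds, or NaN if the line has none. */
    static double startSeconds(String line) {
        Matcher m = STAMP.matcher(line);
        return m.find() ? Double.parseDouble(m.group(1)) : Double.NaN;
    }

    /** Mean gap in seconds between consecutive stamped lines; NaN with fewer than two. */
    static double meanIntervalSeconds(List<String> lines) {
        List<Double> stamps = new ArrayList<>();
        for (String line : lines) {
            double t = startSeconds(line);
            if (!Double.isNaN(t)) stamps.add(t);
        }
        if (stamps.size() < 2) return Double.NaN;
        double span = stamps.get(stamps.size() - 1) - stamps.get(0);
        return span / (stamps.size() - 1);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java GcFrequency <gc-log-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        System.out.println("mean seconds between collections: " + meanIntervalSeconds(lines));
    }
}
```

A short mean interval combined with long pause times is usually the signal to revisit the generation sizes discussed earlier.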