In practice we have found that, for most application domains, a garbage collection (GC) algorithm is evaluated on two criteria:
1. The higher the throughput, the better the algorithm.
2. The shorter the pause time, the better the algorithm.
First, let’s clarify two terms used in garbage collection: throughput and pause time. The JVM performs GC in dedicated threads (GC threads). As long as GC threads are active, they compete with the application threads for the clock cycles of the available CPUs. Put simply, throughput is the fraction of total program execution time during which the application threads are running. For example, a throughput of 99/100 means that over a program execution time of 100 seconds, the application threads ran for 99 seconds while the GC threads ran for only 1 second.
The term “pause time” refers to a period during which the application threads are paused completely so that the GC threads can run. For example, a 100-millisecond pause during GC means that no application thread was active during that 100-millisecond period. If a running application has an “average pause time” of 100 milliseconds, then the average length of all its pauses is 100 milliseconds. Similarly, a “maximum pause time” of 100 milliseconds means that the longest of all its pauses was 100 milliseconds.
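To make the two metrics concrete, here is a minimal Java sketch (with made-up numbers, not taken from a real GC log) that derives throughput, average pause time, and maximum pause time from a list of recorded pause durations:

```java
import java.util.List;

public class GcMetrics {
    public static void main(String[] args) {
        // Hypothetical pause durations (in milliseconds) recorded over one run
        List<Long> pausesMs = List.of(80L, 120L, 100L, 95L, 105L);
        long totalRuntimeMs = 100_000;                 // 100 seconds of total program time

        long totalPauseMs = pausesMs.stream().mapToLong(Long::longValue).sum();
        double throughput = (double) (totalRuntimeMs - totalPauseMs) / totalRuntimeMs;
        double avgPauseMs = (double) totalPauseMs / pausesMs.size();
        long maxPauseMs   = pausesMs.stream().mapToLong(Long::longValue).max().orElse(0);

        System.out.printf("throughput = %.2f%%%n", throughput * 100);  // 99.50%
        System.out.printf("avg pause  = %.1f ms%n", avgPauseMs);       // 100.0 ms
        System.out.printf("max pause  = %d ms%n", maxPauseMs);         // 120 ms
    }
}
```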
(1) Throughput vs. pause time
High throughput is desirable because, from the end user’s point of view, only the application threads perform “productive” work. Intuitively, the higher the throughput, the faster the application runs. Low pause times are also desirable, because from the end user’s point of view it is always bad for an application to hang, regardless of whether the cause is GC or something else. Depending on the type of application, even a brief pause of 200 milliseconds can disrupt the end-user experience. A low maximum pause time is therefore important, especially for interactive applications.
Unfortunately, “high throughput” and “low pause times” are competing goals. Think about it this way (simplified for clarity): a GC needs certain preconditions in order to run safely. For example, it must be guaranteed that the application threads do not modify the state of objects while the GC threads are trying to determine which objects are still referenced and which are not. To this end, the application threads must be stopped during a GC (or, depending on the algorithm used, only during certain phases of a GC). This causes additional thread-scheduling costs: direct costs through context switches and indirect costs through cache effects. Together with the JVM-internal safety mechanisms this requires, it means that every GC pause comes with non-negligible overhead on top of the time the GC threads spend doing their actual work. We can therefore maximize throughput by running the GC as rarely as possible, i.e. only when it is unavoidable, saving all the overhead associated with it.
However, running the GC only rarely means that there is much more work to do whenever it does run, because the number of objects that have accumulated on the heap in the meantime is large. A single GC run then takes longer to complete, which in turn drives up the average and maximum pause times. So with low pause times in mind, it would be better to run the GC frequently so that each run finishes more quickly. That, in turn, increases the total overhead and lowers throughput, and we are back where we started. In summary, when designing (or using) a GC algorithm we have to decide what we are aiming for: a GC algorithm can only target one of the two goals (i.e., focus solely on maximum throughput or on minimum pause times), or try to find a compromise between them.
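The trade-off can be illustrated with a toy calculation. Assume a run of 100 seconds, a fixed overhead of 20 milliseconds per GC cycle (safepointing, scheduling, cache effects), and 980 milliseconds of actual GC work that we may perform in one large cycle or spread over many small ones; all numbers are invented purely for illustration:

```java
public class TradeOff {
    public static void main(String[] args) {
        long totalMs = 100_000, gcWorkMs = 980, overheadPerCycleMs = 20;
        for (int cycles : new int[] {1, 10, 49}) {
            long pausePerCycle = gcWorkMs / cycles + overheadPerCycleMs;
            long totalPause = pausePerCycle * cycles;
            double throughput = 100.0 * (totalMs - totalPause) / totalMs;
            System.out.printf("%2d cycles: max pause = %4d ms, throughput = %.2f%%%n",
                    cycles, pausePerCycle, throughput);
        }
        // 1 cycle : 1000 ms pause, 99.00% throughput
        // 10 cycles:  118 ms pause, 98.82% throughput
        // 49 cycles:   40 ms pause, 98.04% throughput
    }
}
```

The fewer cycles we run, the better the throughput but the longer the single worst-case pause; more cycles shorten the pauses but pay the fixed overhead more often.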
(2) Garbage collection on the HotSpot virtual machine
We discussed the young-generation garbage collectors in Part 5 of this series. For the old generation, the HotSpot virtual machine offers two classes of garbage collection algorithms (leaving aside the newer G1 algorithm): the first class tries to maximize throughput, while the second tries to minimize pause times. Today we focus on the first class, the “throughput-oriented” garbage collection algorithms.
Since we want to concentrate on the JVM configuration flags, I will only briefly outline the throughput-oriented garbage collection algorithms that HotSpot provides. The garbage collector is triggered when an object allocation in the old generation fails for lack of space (in practice, such an “allocation” is usually the promotion of an object from the young generation to the old generation). The collector first determines which objects in the old generation are still reachable from the application’s “GC roots” and which are no longer referenced and can therefore be collected. It then moves the live objects into a non-fragmented block of memory within the old generation and marks the remaining space as free. That is, unlike the copying strategy of the young-generation algorithms, live objects are not moved to a different heap area; instead, the collector defragments the old generation in place by sliding all live objects together within the same area.
The collector uses one or more threads to perform the garbage collection. When multiple threads are used, the individual steps of the algorithm are subdivided so that each collection thread mostly works in its own area without disturbing the others. During the garbage collection, all application threads are paused and are only restarted once the collection has finished. Now let us look at the most important JVM configuration flags related to the throughput-oriented garbage collection algorithms.
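To make the description more tangible, here is a deliberately simplified, single-threaded Java sketch of the mark-then-compact idea: mark everything reachable from the GC roots, then slide the live objects together so that the free space in the same region becomes contiguous. The classes and names are illustrative only and bear no relation to HotSpot internals:

```java
import java.util.*;

public class MarkCompactSketch {
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    public static void main(String[] args) {
        Obj a = new Obj("A"), b = new Obj("B"), c = new Obj("C"), d = new Obj("D");
        a.refs.add(c);                                            // A -> C; B and D are unreachable
        List<Obj> heap  = new ArrayList<>(List.of(a, b, c, d));   // the "old generation"
        List<Obj> roots = List.of(a);                             // the GC roots

        // Phase 1: mark every object reachable from the roots.
        Deque<Obj> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Obj o = stack.pop();
            if (!o.marked) { o.marked = true; stack.addAll(o.refs); }
        }

        // Phase 2: compact by sliding live objects to the front of the same region.
        heap.removeIf(o -> !o.marked);        // unreachable objects become free space
        heap.forEach(o -> o.marked = false);  // reset marks for the next cycle

        System.out.println("live after compaction: "
                + heap.stream().map(o -> o.name).toList());       // prints [A, C]
    }
}
```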
(3) -XX:+UseSerialGC
We use this flag to activate the serial garbage collector, i.e. the single-threaded throughput-oriented garbage collector. Both the young generation and the old generation will then be collected by a single thread. This flag is recommended for JVMs that only have a single processor core available. In that situation, using multiple garbage collection threads can even be counterproductive, because those threads compete for CPU resources and incur synchronization overhead, yet never truly run in parallel.
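If you want to verify which collectors are actually active in a running JVM, the standard management API can list them. Below is a minimal sketch; note that the reported names (for example, the pair “Copy” and “MarkSweepCompact” when launched with -XX:+UseSerialGC) are implementation details and may differ between JVM versions:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class ShowCollectors {
    public static void main(String[] args) {
        // Prints every collector the running JVM is using, plus cumulative stats.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```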
(4) -XX:+UseParallelGC
With this flag, we tell the JVM to perform young-generation garbage collection in parallel using multiple threads. In my opinion, this flag should not be used in Java 6 because -XX:+UseParallelOldGC is obviously more appropriate. Note that in Java 7 this has changed a bit (see this overview): -XX:+UseParallelGC achieves the same effect as -XX:+UseParallelOldGC.
(5) -XX:+UseParallelOldGC
This flag’s name is somewhat unfortunate, because “old” sounds like “outdated”. However, “old” here refers to the old generation, which also explains why -XX:+UseParallelOldGC is superior to -XX:+UseParallelGC: in addition to parallel garbage collection of the young generation, it also activates parallel garbage collection of the old generation. I recommend using this flag whenever high throughput is desired and the JVM has two or more processor cores available. As a side note, HotSpot’s parallel throughput-oriented garbage collection algorithms are often referred to as the “throughput collector” because they are designed to increase throughput through parallel execution.
(6) -XX:ParallelGCThreads
With -XX:ParallelGCThreads=<value> we can specify the number of threads to use for parallel garbage collection. For example, -XX:ParallelGCThreads=6 means that every parallel garbage collection will be executed by six threads. If this flag is not set explicitly, the virtual machine uses a default value that is computed from the number of available (virtual) processors. The determining factor is the value N returned by Runtime.availableProcessors(): for N <= 8 the parallel collector uses N garbage collection threads, and for N > 8 it uses roughly 3 + 5N/8 threads. Using the default makes sense when the JVM has the system and its processors to itself. However, if several JVMs (or other CPU-hungry processes) run on the same machine, we should use -XX:ParallelGCThreads to reduce the number of garbage collection threads to an appropriate value. For example, if four server JVMs run simultaneously on a machine with 16 processor cores, it is advisable to set -XX:ParallelGCThreads=4 so that the garbage collectors of the different JVMs do not interfere with each other.
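The following small program mirrors that sizing heuristic. It is only an approximation of the rule described above, computed from Runtime.getRuntime().availableProcessors(), not a query of the JVM’s actual setting:

```java
public class DefaultGcThreads {
    public static void main(String[] args) {
        int n = Runtime.getRuntime().availableProcessors();
        // N threads for N <= 8, roughly 3 + 5N/8 beyond that (integer arithmetic).
        int gcThreads = (n <= 8) ? n : 3 + (5 * n) / 8;
        System.out.println("available processors : " + n);
        System.out.println("default GC threads   : " + gcThreads);
        // On a 16-core machine this prints 13 (3 + 5*16/8); override it with
        // -XX:ParallelGCThreads=<value> when several JVMs share the hardware.
    }
}
```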
(7) -XX:-UseAdaptiveSizePolicy
The throughput collector offers an interesting (and, at least on modern JVMs, common) mechanism to make the garbage collection configuration more user-friendly. This mechanism is part of the “ergonomics” concept introduced with HotSpot in Java 5. With ergonomics, the garbage collector may dynamically change the sizes of the different heap areas as well as other GC settings, as long as there is evidence that these changes would improve GC performance. What exactly “improve GC performance” means can be specified by the user through the -XX:GCTimeRatio and -XX:MaxGCPauseMillis flags (see below). It is important to know that ergonomics is enabled by default, and rightly so, because adaptive behavior is one of the JVM’s greatest strengths. Still, sometimes we know very well which settings are best for a particular application, and in those cases we may not want the JVM to tamper with our settings. Whenever we find ourselves in such a situation, we can consider disabling part of the ergonomics with -XX:-UseAdaptiveSizePolicy.
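One way to watch ergonomics at work is to sample the committed sizes of the heap memory pools while the application runs; with -XX:-UseAdaptiveSizePolicy they should stay put. A sketch using the standard management API (the pool names, such as “PS Eden Space” or “PS Old Gen”, are implementation-specific and may vary between JVM versions):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

public class WatchHeapSizing {
    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 10; i++) {
            // Print the committed size of every heap pool, once per second.
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                if (pool.getType() == MemoryType.HEAP) {
                    System.out.printf("%-20s committed = %d KB%n",
                            pool.getName(), pool.getUsage().getCommitted() / 1024);
                }
            }
            System.out.println("----");
            Thread.sleep(1000);
        }
    }
}
```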
(8) -XX:GCTimeRatio
With -XX:GCTimeRatio=<value> we tell the JVM which throughput to target. More precisely, -XX:GCTimeRatio=N specifies a target ratio of N/(N+1) for the execution time of the application threads relative to the total program execution time. For example, with -XX:GCTimeRatio=9 we demand that the application threads be active for at least 9/10 of the total execution time (and the GC threads, accordingly, for at most the remaining 1/10). Based on measurements at run time, the JVM will then try to modify the heap and GC settings so as to reach the targeted throughput. The default value of -XX:GCTimeRatio is 99, i.e. the application threads should run for at least 99% of the total execution time.
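The relationship between N and the requested throughput is easy to tabulate. A small sketch that simply evaluates the N/(N+1) formula for a few example values:

```java
public class GcTimeRatio {
    public static void main(String[] args) {
        for (int n : new int[] {9, 19, 99}) {
            double appShare = (double) n / (n + 1);   // minimum application-thread share
            double gcShare  = 1.0 / (n + 1);          // maximum GC share
            System.out.printf("-XX:GCTimeRatio=%d -> app >= %.1f%%, GC <= %.1f%%%n",
                    n, appShare * 100, gcShare * 100);
        }
        // 9 -> 90.0% / 10.0%, 19 -> 95.0% / 5.0%, 99 (the default) -> 99.0% / 1.0%
    }
}
```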
(9) -XX:MaxGCPauseMillis
With -XX:MaxGCPauseMillis=<value> we tell the JVM the maximum pause time, in milliseconds, to aim for. At run time, the throughput collector computes statistics (a weighted average and the standard deviation) over the observed pause times. If the statistics suggest that there is a risk of pauses exceeding the target value, the JVM modifies the heap and GC settings in order to reduce them. Note that these statistics are computed separately for young-generation and old-generation collections, and note also that, by default, no maximum pause time target is set. If both a maximum pause time target and a minimum throughput target are set, reaching the maximum pause time goal has the higher priority. Of course, there is no guarantee that the JVM will actually reach either goal, even though it tries hard; in the end, everything depends on the behavior of the application at hand.
When setting a maximum pause time target, we should be careful not to choose too small a value. As we know by now, keeping pause times low forces the JVM to run the GC more often, which can severely hurt the achievable throughput. That is why, for applications that require low pause times as their primary goal (which is the case for most web applications), I would recommend not using the throughput collector at all and switching to the CMS collector instead. The CMS collector is the subject of the next part of this series.
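As a rough illustration of the kind of bookkeeping described above, the following toy sketch keeps an exponentially weighted average of recent pauses and compares it against the target. The decay factor, the sample values, and the decision rule are invented for illustration; HotSpot’s real policy also tracks the standard deviation and adjusts heap and GC settings rather than merely printing a message:

```java
public class PauseTracker {
    public static void main(String[] args) {
        long targetMs = 200;                               // e.g. -XX:MaxGCPauseMillis=200
        double alpha = 0.5;                                // made-up decay factor
        long[] observedPausesMs = {180, 250, 280, 320};    // hypothetical pause samples
        double weightedAvg = observedPausesMs[0];
        for (int i = 1; i < observedPausesMs.length; i++) {
            // Exponentially weighted moving average: recent pauses count more.
            weightedAvg = alpha * observedPausesMs[i] + (1 - alpha) * weightedAvg;
            System.out.printf("pause %3d ms -> weighted avg %.1f ms%s%n",
                    observedPausesMs[i], weightedAvg,
                    weightedAvg > targetMs ? "  (above target: adjust heap/GC settings)" : "");
        }
    }
}
```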
That’s it! Stay tuned for more next time!