JVM memory model and garbage collection algorithm

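1. The JVM discussed here (HotSpot, Java 6 era) divides memory into the following areas. This generational layout belongs to the HotSpot implementation rather than to the Java Virtual Machine specification itself: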

  • New (young generation)
  • Tenured (old generation)
  • Permanent generation (Perm)

New and Tenured make up the heap, which is allocated from the memory specified by the JVM startup parameter -Xmx (for example -Xmx3G). Perm is not part of the heap; the VM allocates it separately, and its size can be adjusted with parameters such as -XX:PermSize and -XX:MaxPermSize.

  • Young generation (New): holds Java objects that the JVM has just allocated
  • Tenured: objects in the young generation that survive garbage collection are eventually copied to the old generation
  • Permanent generation (Perm): stores class and method metadata. The size Perm needs depends on the scale of the project and the number of classes and methods. Generally 128 MB is sufficient; the sizing principle is to keep about 30% of the space free.

New is divided into several parts:

  • Eden: holds objects that the JVM has just allocated
  • Survivor1
  • Survivor2: the two Survivor spaces are the same size. An object in Eden that survives a garbage collection is copied back and forth between the two Survivor spaces. When a certain condition is met, such as a threshold on the number of copies, the object is copied to Tenured. In effect, the Survivor spaces simply lengthen an object's stay in the young generation, which increases the chance that it is collected there.
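The layout described above can be inspected on a running JVM through the standard java.lang.management API. The following is a minimal sketch, assuming Java 8 or later; the class name MemoryPoolReport is only illustrative. Pool names depend on the collector and JVM version (for example, from Java 8 onward HotSpot reports Metaspace instead of a permanent generation), so the output will not match this article's Java 6 names exactly.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class MemoryPoolReport {

    /** Returns one line per memory pool: name, type (heap or non-heap), used and max bytes. */
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() is -1 when the pool has no defined maximum.
            lines.add(String.format("%-30s %-16s used=%d max=%d",
                    pool.getName(), pool.getType(), usage.getUsed(), usage.getMax()));
        }
        return lines;
    }

    public static void main(String[] args) {
        describePools().forEach(System.out::println);
    }
}
```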

2. Garbage collection algorithm

Garbage collection algorithms can be divided into three categories, all of them based on the mark-sweep (copying) approach:

  • Serial algorithm (single thread)
  • Parallel algorithm
  • Concurrent algorithms

The JVM selects the appropriate reclamation algorithm for each memory generation based on the hardware configuration of the machine. For example, if the machine has more than one core, the parallel algorithm will be selected for the young generation. Please refer to the JVM tuning documentation for details on the selection.

To clarify: the parallel algorithm uses multiple threads for garbage collection and suspends the application while it runs. The concurrent algorithm also uses multiple threads, but does most of its work while the application keeps running, so it suits highly interactive programs. In our observation, the concurrent algorithm shrinks the young generation and in effect relies on a large old generation, which gives relatively lower throughput than the parallel algorithm.

The next question is: when does garbage collection take place?

  • When the young generation is full, a normal GC is triggered, which collects only the young generation. Note that "young generation full" means that Eden is full; a full Survivor space does not trigger a GC
  • When the old generation is full, a Full GC is triggered, which collects both the young and the old generation
  • When the permanent generation is full, a Full GC is also triggered, which causes class and method metadata to be unloaded
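How often each kind of collection has run can be read from the JVM's GarbageCollectorMXBeans. This is a small sketch, assuming Java 8 or later; the class name GcCounters is illustrative. A JVM typically exposes one bean for the young-generation collector and one for the collector that handles Full GC, but the bean names vary with the selected algorithm.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

class GcCounters {

    /** Maps each collector name to {collection count, accumulated collection time in ms}. */
    static Map<String, long[]> snapshot() {
        Map<String, long[]> result = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Either value is -1 when the collector does not report it.
            result.put(gc.getName(), new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        return result;
    }

    public static void main(String[] args) {
        snapshot().forEach((name, v) ->
                System.out.println(name + ": count=" + v[0] + ", totalTimeMs=" + v[1]));
    }
}
```

Taking two snapshots some time apart and dividing the difference in time by the difference in count gives the average duration per collection over that interval.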

Another question is when OutOfMemoryError is thrown. It is not thrown only at the moment memory is completely exhausted, but when both of the following hold:

  • The JVM spends 98% of its time on garbage collection
  • Each collection reclaims less than 2% of memory

When both conditions are met, OutOfMemoryError is thrown. This leaves the system a small window for pre-shutdown work, such as manually writing a heap dump.

Second, memory leakage and solutions

1. Symptoms before a system crash:

  • Garbage collection takes longer and longer: a normal GC grows from about 10 ms to about 50 ms, and a Full GC from about 0.5 s to 4 or 5 s
  • Full GC becomes more and more frequent; at the worst, Full GCs are less than 1 minute apart
  • Old generation usage keeps growing, and no memory is freed after each Full GC

The system then becomes unable to respond to new requests and gradually approaches the point of OutOfMemoryError.
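The third symptom, an old generation that stays full after collection, can be watched from inside the JVM. The sketch below is only an illustration: the class name OldGenWatch is made up, and matching pools by the words "Old" or "Tenured" is an assumption based on common HotSpot pool names, not a guaranteed convention.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

class OldGenWatch {

    /** Heuristic (an assumption): HotSpot old-generation pools have "Old" or "Tenured" in their name. */
    static boolean looksLikeOldGen(MemoryPoolMXBean pool) {
        String name = pool.getName();
        return pool.getType() == MemoryType.HEAP
                && (name.contains("Old") || name.contains("Tenured"));
    }

    /** Bytes still used in the old generation after the most recent collection, or -1 if unknown. */
    static long oldGenUsedAfterLastGc() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!looksLikeOldGen(pool)) {
                continue;
            }
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if the pool does not support it
            if (afterGc != null) {
                return afterGc.getUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("old gen used after last GC: " + oldGenUsedAfterLastGc() + " bytes");
    }
}
```

If this value keeps rising from one Full GC to the next, the old generation is retaining objects that are never released, which matches the leak pattern described above.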

2. Generate a heap dump file

We generated the current heap information through JMX MBeans as an hprof file of about 3 GB (the size of the entire heap). If JMX is not enabled, the same file can be generated with Java's jmap command.
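One way to trigger such a dump programmatically is the HotSpotDiagnosticMXBean, the same MBean that JMX consoles expose. This is a sketch under assumptions: it requires a HotSpot-based JVM (Java 7 or later for getPlatformMXBean), and the class name HeapDumper and the output path are illustrative only.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;

class HeapDumper {

    /**
     * Writes an hprof heap dump to the given path, which must not exist yet.
     * liveOnly = true dumps only reachable objects.
     */
    static File dump(String path, boolean liveOnly) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            bean.dumpHeap(path, liveOnly);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new File(path);
    }

    public static void main(String[] args) {
        // Hypothetical output location; recent JDKs require the ".hprof" extension.
        String path = System.getProperty("java.io.tmpdir") + File.separator
                + "heap-" + System.currentTimeMillis() + ".hprof";
        System.out.println("wrote " + dump(path, true).length() + " bytes to " + path);
    }
}
```

From the command line, the jmap equivalent has the form jmap -dump:format=b,file=<file> <pid>.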

3. Analyze the dump file

The next question is how to open a 3 GB heap dump. An ordinary Windows machine does not have that much memory, so a well-equipped Linux machine is needed. The graphical interface can still be displayed on Windows by forwarding it from Linux through X Window. We considered the following tools for opening the file:

  1. Visual VM
  2. IBM HeapAnalyzer
  3. The hprof tool that comes with the JDK

When using these tools, we recommend setting their maximum memory to 6 GB so that the file loads at a reasonable speed. In practice, none of these tools made the memory leak easy to see. Visual VM shows object sizes but not the call stack; HeapAnalyzer shows the call stack but cannot open a 3 GB file properly. We therefore chose Eclipse's dedicated static memory analysis tool, MAT (Memory Analyzer Tool).

4. Analyze memory leaks

With MAT we can clearly see which objects are suspected of leaking, which objects occupy the most space, and how objects reference one another. In our case there were many JbpmContext instances held in ThreadLocal. Investigation showed that the JbpmContext was never closed.
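The general shape of this kind of leak, and of its fix, can be sketched without jBPM. In the example below, Context is a hypothetical stand-in for a per-request object such as JbpmContext (it is not the real jBPM API), and ThreadLocalCleanup is an illustrative name. The point is that threads in a pool live for a long time, so anything left in a ThreadLocal stays reachable unless it is closed and removed in a finally block.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ThreadLocalCleanup {

    /** Hypothetical stand-in for a per-request context such as JbpmContext; not the real jBPM API. */
    static final class Context implements AutoCloseable {
        static final AtomicInteger OPEN = new AtomicInteger();
        Context() { OPEN.incrementAndGet(); }
        @Override public void close() { OPEN.decrementAndGet(); }
    }

    private static final ThreadLocal<Context> CURRENT = new ThreadLocal<>();

    /** Runs one unit of work with a context bound to the calling thread, then always releases it. */
    static void handleRequest(Runnable work) {
        Context ctx = new Context();
        CURRENT.set(ctx);
        try {
            work.run();
        } finally {
            ctx.close();      // release whatever the context holds
            CURRENT.remove(); // drop the reference so a pooled thread does not retain it
        }
    }

    static boolean hasContext() {
        return CURRENT.get() != null;
    }

    public static void main(String[] args) {
        handleRequest(() -> System.out.println("inside request, bound: " + hasContext()));
        System.out.println("after request, bound: " + hasContext() + ", open: " + Context.OPEN.get());
    }
}
```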

In addition, MAT or JMX can be used to analyze thread state: we can see which object each thread is blocked on and so locate the system's bottleneck.
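The same thread information that a JMX console shows is available through ThreadMXBean. The following is a minimal sketch (the class name BlockedThreads is illustrative) that lists threads blocked on a monitor. Note that the BLOCKED state covers only contention on synchronized monitors; threads parked on java.util.concurrent locks show up as WAITING or TIMED_WAITING instead.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

class BlockedThreads {

    /** Lists threads currently BLOCKED on a monitor, with the lock they wait for and its owner. */
    static List<String> find() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        List<String> blocked = new ArrayList<>();
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                blocked.add(info.getThreadName() + " waits for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return blocked;
    }

    public static void main(String[] args) {
        find().forEach(System.out::println);
    }
}
```

When many request threads report the same lock name, that lock is a likely bottleneck.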

5. Back to the questions

Q: Why is it taking longer and longer to collect garbage before crashes?

A: Under the memory model and collection algorithm described above, a garbage collection has two parts: marking, and sweeping (copying). As long as the amount of memory to be marked is fixed, the marking time stays the same; what changes is the copying time. Because some memory can never be reclaimed, more and more data has to be copied, so each collection takes longer. Growing garbage collection time can therefore also be used as a sign of a memory leak

Q: Why is Full GC getting more frequent?

A: Because leaked memory accumulates and gradually exhausts the old generation, no space is left for allocating new objects, which leads to frequent garbage collection

Q: Why does the old generation take up more and more memory?

A: Because objects in the young generation that cannot be reclaimed keep being copied to the old generation

Third, performance tuning

In addition to the memory leak above, we found that CPU utilization stayed below 3% for long periods and system throughput was insufficient, a serious waste of resources for a 64-bit Linux server with 8 cores and 16 GB of memory.

Although the CPU was underloaded, users occasionally reported that requests took too long, so we realized that both the program and the JVM had to be tuned. We approached this from the following aspects:

  • Thread pool: solves the problem of long user response times
  • Connection pool
  • JVM startup parameters: adjust the memory ratio of each generation and the garbage collection algorithm to improve throughput
  • Program algorithms: improve the program's logic to improve performance

1. Java thread pool (java.util.concurrent.ThreadPoolExecutor)

Most applications on JVM 6 use the thread pool that ships with the JDK. The reason for describing such a mature thread pool at length is that it behaves a little differently from what we expected. It has several important configuration parameters:

  • corePoolSize: the number of core threads (the minimum thread count)
  • maximumPoolSize: the maximum number of threads. Once this limit is reached and a task cannot be queued, the task is rejected. The handling can be customized through the RejectedExecutionHandler interface
  • keepAliveTime: how long an idle thread beyond the core threads is kept alive
  • workQueue: the queue that holds tasks waiting to be executed

The thread pool takes a queue argument (workQueue) that holds the tasks waiting to be executed, and it behaves quite differently depending on which queue is chosen:

  • SynchronousQueue: a queue with no capacity, in which an insert on one thread must wait for a remove on another thread. With this queue, the pool allocates a new thread for every task that cannot be handed to an idle thread
  • LinkedBlockingQueue: an unbounded queue (when created without a capacity). With this queue, the pool ignores maximumPoolSize and uses only corePoolSize threads to process all tasks; unprocessed tasks wait in the LinkedBlockingQueue
  • ArrayBlockingQueue: a bounded queue. With a bounded queue and maximumPoolSize together, the pool is hard to tune: a large queue with a small maximumPoolSize leads to low CPU load, while a small queue with a large pool means the queue does not play the role it should
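The difference between the first two queue choices can be observed directly. The sketch below (QueueChoiceDemo is an illustrative name; the pool sizes 2 and 10 are arbitrary) submits tasks that block until released, then reads how many threads the pool created and how many tasks are waiting in the queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class QueueChoiceDemo {

    /**
     * Submits `tasks` blocking tasks to a pool (core 2, max 10) built on the given queue
     * and returns {threads created, tasks waiting in the queue}.
     */
    static int[] poolSizeAfterSubmitting(BlockingQueue<Runnable> queue, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(2, 10, 30, TimeUnit.SECONDS, queue);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // keep the worker busy until we have measured
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return new int[] {pool.getPoolSize(), pool.getQueue().size()};
        } finally {
            release.countDown(); // let all tasks finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] linked = poolSizeAfterSubmitting(new LinkedBlockingQueue<Runnable>(), 6);
        int[] sync = poolSizeAfterSubmitting(new SynchronousQueue<Runnable>(), 6);
        System.out.println("LinkedBlockingQueue: threads=" + linked[0] + " queued=" + linked[1]);
        System.out.println("SynchronousQueue:    threads=" + sync[0] + " queued=" + sync[1]);
    }
}
```

With six tasks, the LinkedBlockingQueue pool stays at 2 threads with 4 tasks queued, even though maximumPoolSize is 10, while the SynchronousQueue pool grows to 6 threads with nothing queued.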

Our requirement is actually very simple: we want the thread pool to behave like a connection pool, with a configurable minimum and maximum number of threads. When minimum < number of tasks < maximum, new threads should be allocated to handle the tasks; when the number of tasks > maximum, tasks should wait for a free thread.

But ThreadPoolExecutor is designed differently: a task is placed in the queue first, a new thread is considered only when the queue is full, and if the queue is full and no new thread can be created, the task is rejected. The result is "queue first and wait for execution", "start a new thread only when the task cannot be queued", and "reject instead of waiting". Consequently, depending on the queue chosen, simply increasing maximumPoolSize does not necessarily improve throughput.

Reaching our goal therefore requires some wrapping of the thread pool. Fortunately, ThreadPoolExecutor leaves enough extension points to make this possible. We wrapped it as follows:

  • Use SynchronousQueue as the queue argument so that maximumPoolSize takes effect. This keeps threads from being allocated without limit, and throughput can be raised by increasing maximumPoolSize
  • Supply a custom RejectedExecutionHandler to handle the case where the number of threads has reached maximumPoolSize: it periodically checks whether the pool can accept a new task and, if so, puts the rejected task back into the pool. The check interval depends on keepAliveTime
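The article does not show the wrapper's code, so the following is only one possible sketch of the two points above, not the authors' implementation. The names WaitingPool and WaitForFreeThread are hypothetical. The handler retries by offering the rejected task to the SynchronousQueue with a timeout equal to keepAliveTime, which succeeds as soon as an idle worker polls the queue. It assumes a minimum of at least one thread, so that some worker always remains to take hand-offs.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class WaitingPool {

    /** Instead of failing, keeps offering the rejected task to the pool's queue until a worker takes it. */
    static final class WaitForFreeThread implements RejectedExecutionHandler {
        @Override
        public void rejectedExecution(Runnable task, ThreadPoolExecutor pool) {
            // Re-check once per keep-alive period (the interval choice follows the article).
            long waitMs = Math.max(1, pool.getKeepAliveTime(TimeUnit.MILLISECONDS));
            try {
                while (!pool.isShutdown()) {
                    // With a SynchronousQueue, offer succeeds only when an idle worker is polling.
                    if (pool.getQueue().offer(task, waitMs, TimeUnit.MILLISECONDS)) {
                        return;
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            throw new RejectedExecutionException("pool shut down or caller interrupted");
        }
    }

    /** Assumes minThreads >= 1 so that at least one worker stays alive to take hand-offs. */
    static ThreadPoolExecutor create(int minThreads, int maxThreads, long keepAliveSeconds) {
        return new ThreadPoolExecutor(minThreads, maxThreads, keepAliveSeconds, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>(), new WaitForFreeThread());
    }

    /** Runs `tasks` short tasks through a pool of 1 to 2 threads; returns {completed, largest pool size}. */
    static int[] runDemo(int tasks) {
        ThreadPoolExecutor pool = create(1, 2, 1);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(done::incrementAndGet); // blocks in the handler while both threads are busy
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] {done.get(), pool.getLargestPoolSize()};
    }

    public static void main(String[] args) {
        int[] r = runDemo(20);
        System.out.println("completed=" + r[0] + ", largest pool size=" + r[1]);
    }
}
```

A consequence of this design is that the submitting thread itself waits when the pool is saturated, which gives the "wait for a free thread" behavior at the cost of blocking callers.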

2. Connection pool (org.apache.commons.dbcp.BasicDataSource)

We had been using org.apache.commons.dbcp.BasicDataSource with its default configuration, so under heavy traffic JMX showed many Tomcat threads blocked on the lock of the Apache ObjectPool used by BasicDataSource. The direct cause was that the maximum number of connections in the BasicDataSource pool was too small: the default configuration allows only 8.

We also observed that when the system is not accessed for a long time, for example 2 days, MySQL drops all connections, which leaves the connections cached in the pool unusable. To address these issues, we studied BasicDataSource closely and found several tuning points:

  • MySQL supports 100 connections by default, so each connection pool must be sized according to the number of machines in the cluster. If there are two servers, set each connection pool to 60
  • initialSize: the number of connections that are kept open at all times
  • minEvictableIdleTimeMillis: how long a connection may stay idle; a connection idle for longer than this is closed
  • timeBetweenEvictionRunsMillis: the run interval of the background thread that detects expired connections
  • maxActive: the maximum number of connections that can be allocated
  • maxIdle: the maximum number of idle connections. If the number of connections exceeds maxIdle when a connection is returned, that connection is closed immediately. Only the connections in the range initialSize < x < maxIdle are periodically checked for expiry. This parameter mainly helps throughput during access peaks
  • How is initialSize maintained? Reading the code shows that BasicDataSource closes all expired connections and then opens connections until initialSize is reached again. Combined with minEvictableIdleTimeMillis and timeBetweenEvictionRunsMillis, this guarantees that all initialSize connections are periodically re-established, which avoids the problem of MySQL dropping connections that have been idle for a long time

3. JVM parameters

The JVM startup parameters include settings for memory and garbage collection. By default the JVM works fine without any settings, but on a well-equipped server running a specific application, careful tuning is needed for best performance. Through these settings we hope to achieve the following goals:

  • GC time is sufficiently short
  • The number of GCs is sufficiently small
  • The interval between Full GCs is sufficiently long

The first two goals conflict: a short GC time calls for a smaller heap, while a small number of GCs calls for a larger heap, so we have to strike a balance.

(1) The minimum and maximum size of the JVM heap can be specified with -Xms and -Xmx. To keep the garbage collector from shrinking and growing the heap between the two values, we usually set them to the same value.

(2) The young and old generations divide the heap according to a default ratio (1:2). You can change their sizes by adjusting the ratio between them with -XX:NewRatio, or set the absolute size of a generation, such as the young generation, with -XX:NewSize and -XX:MaxNewSize. Likewise, to keep the young generation from being resized, we usually set -XX:NewSize and -XX:MaxNewSize to the same value.
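As a worked example of the ratio, the sketch below computes the generation sizes implied by -XX:NewRatio, where young : old = 1 : NewRatio. The class name GenerationSizing is illustrative, and the result is an approximation: the JVM aligns the actual sizes internally.

```java
class GenerationSizing {

    /** Young generation size implied by -XX:NewRatio=newRatio, i.e. young : old = 1 : newRatio. */
    static long youngBytes(long heapBytes, int newRatio) {
        return heapBytes / (newRatio + 1);
    }

    /** The old generation gets the rest of the heap. */
    static long oldBytes(long heapBytes, int newRatio) {
        return heapBytes - youngBytes(heapBytes, newRatio);
    }

    public static void main(String[] args) {
        long heap = 3L * 1024 * 1024 * 1024; // corresponds to -Xms3G -Xmx3G
        long mb = 1024 * 1024;
        System.out.println("young = " + youngBytes(heap, 2) / mb + " MB");
        System.out.println("old   = " + oldBytes(heap, 2) / mb + " MB");
    }
}
```

For a 3 GB heap with the default ratio of 1:2, this gives a young generation of 1024 MB and an old generation of 2048 MB.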

(3) How large should the young and old generations be? There is no single answer, of course, or there would be no need for tuning. Consider the effect of changing the sizes:

  • A larger young generation necessarily means a smaller old generation. A large young generation lengthens the interval between normal GCs but increases the time of each one; a small old generation leads to more frequent Full GCs
  • A smaller young generation necessarily means a larger old generation. A small young generation leads to more frequent normal GCs, each of which is shorter; a large old generation reduces the frequency of Full GCs
  • The choice should depend on the distribution of object lifetimes in the application: if the application creates many temporary objects, choose a larger young generation; if it has relatively many long-lived objects, enlarge the old generation accordingly. Many applications have no such obvious profile, however, so decide on the following two grounds: (A) Keep Full GCs as rare as possible by letting the old generation hold as many long-lived objects as it can; the JVM's default ratio of 1:2 follows the same reasoning. (B) Observe the application for a while to see how much memory the old generation occupies at peak times, then enlarge the young generation as far as the situation allows without affecting Full GC; the ratio can be set to 1:1, for example. The old generation should still keep at least one third of its space free for growth

(4) On a well-equipped machine (multi-core, large memory), you can select the parallel collection algorithm for the old generation with -XX:+UseParallelOldGC; the default is serial collection

(5) Thread stack size: by default each thread gets a 1 MB stack, which holds stack frames, call parameters, local variables and so on. For most applications this default is too large, and 256 KB is generally enough (set with -Xss). In theory, a smaller stack per thread allows more threads in the same amount of memory, but in practice the number is limited by the operating system.
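Besides the global -Xss setting, a stack size can also be requested for an individual thread through the four-argument Thread constructor. The sketch below (StackSizeDemo is an illustrative name) measures how deep a simple recursion gets with a given stack size. The requested size is only a hint that some platforms round up or ignore, so the depths it prints vary by JVM and operating system.

```java
class StackSizeDemo {

    /** Recurses until the stack overflows and reports the depth that was reached. */
    static int maxDepth() {
        try {
            return 1 + maxDepth();
        } catch (StackOverflowError e) {
            return 1;
        }
    }

    /** Runs maxDepth() on a new thread created with the requested stack size in bytes. */
    static int depthWithStack(long stackBytes) {
        int[] depth = new int[1];
        Thread t = new Thread(null, () -> depth[0] = maxDepth(), "stack-demo", stackBytes);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return depth[0];
    }

    public static void main(String[] args) {
        System.out.println("256 KB stack: depth " + depthWithStack(256 * 1024));
        System.out.println("  1 MB stack: depth " + depthWithStack(1024 * 1024));
    }
}
```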

(6) You can use the following parameters to produce GC log and heap dump information

  • -XX:HeapDumpPath
  • -XX:+PrintGCDetails
  • -XX:+PrintGCTimeStamps
  • -Xloggc:/usr/aaa/dump/heap_trace.txt

The following parameter makes the JVM write a heap dump when an OutOfMemoryError occurs

  • -XX:+HeapDumpOnOutOfMemoryError

Our final configuration (server: 64-bit Linux, 8 cores, 16 GB):

JAVA_OPTS="$JAVA_OPTS -server -Xms3G -Xmx3G -Xss256k -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseParallelOldGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/aaa/dump -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/usr/aaa/dump/heap_trace.txt -XX:NewSize=1G -XX:MaxNewSize=1G"

In our observation this configuration is very stable: each normal GC takes about 10 ms, and Full GC hardly ever occurs, or occurs only once in a very long while.
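When tuning startup parameters, it helps to confirm that the options actually reached the JVM, since a script may override JAVA_OPTS along the way. A minimal check, assuming Java 8 or later (the class name StartupFlags is illustrative):

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class StartupFlags {

    /** The JVM options this process was started with, for example -Xmx3G or -XX:+UseParallelOldGC. */
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        jvmArguments().forEach(System.out::println);
        // The effective heap limit, which should correspond to -Xmx.
        System.out.println("max heap bytes: " + Runtime.getRuntime().maxMemory());
    }
}
```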

Note that whenever the JMX service is enabled in the JVM, a Full GC is performed once an hour to clear references. Please refer to the attached documentation for details.

4. Program algorithm tuning
