How the ParNew young-generation garbage collector works

Online systems often use the ParNew garbage collector for the young generation. (It uses the copying algorithm.)

Multithreaded garbage collection mechanism

Java systems that run on servers should take full advantage of the server's multi-core CPU. (Squeeze the CPU dry.)

Assuming the server has a 4-core CPU, using only a single thread to perform garbage collection when garbage collection occurs will result in underutilization of server CPU resources.

Why do you say that?

If only one garbage collection thread is executing, the resources of the 4-core CPU cannot be fully utilized. In theory, a 4-core CPU can run 4 garbage collection threads in parallel, improving collection performance by up to 4 times.

So ParNew is a multi-threaded garbage collector, while the Serial garbage collector is single-threaded. Both collect the young generation; the only difference is single-threaded versus multi-threaded execution, and the garbage collection algorithm is exactly the same.

The online system specifies ParNew as the new generation garbage collector

Use the "-XX:+UseParNewGC" option

The default number of threads for the ParNew garbage collector

Once we specify the ParNew garbage collector, it defaults to the same number of garbage collection threads as the number of CPU cores.

For example, if our online machine is using a 4-core CPU, or an 8-core CPU, or a 16-core CPU, then ParNew’s garbage collection threads would be 4, 8, or 16, respectively.
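The rule above matches HotSpot's default for machines with up to 8 cores; for larger machines, HotSpot actually scales the count down rather than using one thread per core. A minimal sketch of the commonly cited default formula (this formula is an assumption drawn from HotSpot's documented behavior, not something stated in this article):

```java
public class ParNewThreadDefaults {
    // HotSpot's default ParallelGCThreads: equal to the core count up to
    // 8 cores, then 8 plus 5/8 of the remaining cores (assumed formula,
    // based on HotSpot documentation, not on this article).
    static int defaultParallelGCThreads(int cpuCores) {
        if (cpuCores <= 8) {
            return cpuCores;
        }
        return 8 + (cpuCores - 8) * 5 / 8;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Cores: " + cores
                + ", default GC threads: " + defaultParallelGCThreads(cores));
    }
}
```

So a 4-core or 8-core machine does get 4 or 8 GC threads as the text says, while a 16-core machine would get 13 by this formula.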

This number of threads generally does not need to be manually adjusted.

However, if you must adjust the number of garbage collection threads for ParNew, you can set it with the "-XX:ParallelGCThreads" parameter.
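For example, a launch line that enables ParNew and pins the thread count explicitly (the jar name is just a placeholder; normally the thread count is left at its default):

```shell
# Enable ParNew for the young generation and explicitly set 4 GC threads
# (usually unnecessary: the default follows the CPU core count).
java -XX:+UseParNewGC -XX:ParallelGCThreads=4 -jar app.jar
```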

Is it better for the garbage collector to be single-threaded or multi-threaded?

Server mode and client mode

You can distinguish between server mode and client mode when you start the JVM: starting with "-server" gives server mode, and starting with "-client" gives client mode.

The difference between the two: if your system is deployed on, say, a 4-core 8GB Linux server, server mode should be used; if your system runs on, say, a Windows client machine, client mode should be used.

So what’s the difference between server mode and client mode?

Server mode usually runs large systems such as website systems, e-commerce systems, business systems, and app back-end systems, generally on multi-core CPUs.

Therefore, when garbage collection happens on such machines, ParNew is the better choice: multi-threaded parallel collection makes full use of multi-core CPU resources and improves performance.

On the other hand, if you deploy on a multi-core server but use single-threaded garbage collection, some CPU cores sit idle and are wasted.

So what if your Java program is a client program, such as a Windows client like Baidu Cloud disk, or a Windows client like Evernote, running on a Windows personal operating system?

Many such personal machines have single-core CPUs, and if you use ParNew there, you end up running multiple GC threads on one CPU, which adds overhead and may be less efficient than a single thread.

Because running multiple threads on a single CPU causes frequent thread context switching, which adds overhead.

Therefore, for client programs running on Windows, the Serial garbage collector is recommended: single-CPU, single-threaded garbage collection is more efficient there.

Conclusion:

Java is now mainly used to build complex large-scale back-end business systems, so it is common to specify "-server" mode together with the ParNew multi-threaded garbage collector.
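A typical launch line for that combination (the application jar name is a placeholder):

```shell
# Server mode with the multi-threaded ParNew young-generation collector.
java -server -XX:+UseParNewGC -jar order-system.jar
```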

How the old-generation CMS garbage collector works

Review: When does a Full GC trigger

  1. Before a Minor GC, the old generation's available memory is less than the average size of objects promoted in previous Minor GCs. To be safe, a Full GC is performed before the Minor GC
  2. After a Minor GC, the size of objects to be promoted to the old generation is larger than the old generation's available memory, so a Full GC is performed

The rationale for the CMS garbage collector

Mark-sweep algorithm

Mark which objects are garbage, then sweep them away.

  • First, trace from GC Roots to see whether each object is reachable from GC Roots. If it is, it is a live object; otherwise it is a garbage object (and is marked as garbage).

Disadvantages: it produces a large amount of memory fragmentation, scattered free chunks of assorted sizes. A fragment too small to hold an object is wasted memory space.
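To make the mark-sweep idea concrete, here is a minimal sketch over a toy object graph (hypothetical classes, not JVM internals): reachable objects are marked by tracing from the roots, and everything unmarked is swept.

```java
import java.util.*;

public class MarkSweepSketch {
    // A toy heap object: just an id plus outgoing references.
    static class Obj {
        final String id;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String id) { this.id = id; }
    }

    // Mark phase: depth-first trace from a GC Root.
    static void mark(Obj obj) {
        if (obj.marked) return;
        obj.marked = true;
        for (Obj ref : obj.refs) mark(ref);
    }

    // Sweep phase: keep marked objects; unmarked ones are the garbage.
    static List<Obj> markSweep(List<Obj> heap, List<Obj> roots) {
        for (Obj root : roots) mark(root);
        List<Obj> survivors = new ArrayList<>();
        for (Obj o : heap) if (o.marked) survivors.add(o);
        return survivors;
    }

    public static void main(String[] args) {
        Obj kafka = new Obj("Kafka");        // reachable via a static variable (a GC Root)
        Obj manager = new Obj("ReplicaManager");
        Obj fetcher = new Obj("ReplicaFetcher");
        Obj orphan = new Obj("orphan");      // nothing references this one
        kafka.refs.add(manager);
        manager.refs.add(fetcher);
        List<Obj> heap = Arrays.asList(kafka, manager, fetcher, orphan);
        List<Obj> live = markSweep(heap, Collections.singletonList(kafka));
        System.out.println("Survivors: " + live.size());
    }
}
```

Note that the survivors stay wherever they were in memory; nothing is compacted, which is exactly why this algorithm fragments the heap.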

Execution mode

Doing garbage collection by stopping the world and running the slow mark-sweep algorithm would leave the system stuck for too long, and many requests could not be handled.

Therefore, the CMS garbage collector uses the mode that the garbage collector thread and the system worker thread execute simultaneously as much as possible.

How does CMS realize garbage collection while the system works?

1. Initial mark

Stops all of the system's worker threads, entering the "Stop the World" phase.

"Initial mark": marks the objects referenced directly by GC Roots.

The local variables of methods and the static variables of classes are GC Roots; the instance variables of classes are not.

public class Kafka {
    private static ReplicaManager replicaManager = new ReplicaManager();
}

public class ReplicaManager {
    private ReplicaFetcher replicaFetcher = new ReplicaFetcher();
}

In the initial marking phase, only the object directly referenced by a GC Root is marked: here, the static variable "replicaManager" is the GC Root, and it directly references the ReplicaManager object.

The ReplicaFetcher object is not marked, because it is referenced by "replicaFetcher", an instance variable of the ReplicaManager class, and instance variables of classes are not GC Roots.

So the first stage, initial marking, does "Stop the World" and pauses all worker threads, but it doesn't matter much, because it is fast: it only marks objects referenced directly by GC Roots.

2. Concurrent marking

Let the system threads create whatever objects they want and keep running.

New live objects may be created at run time, or some live objects may be unreferenced (junk objects).

During this process, the garbage collection thread does GC Roots tracing of existing objects as much as possible.

GC Roots tracing means asking, for each existing object such as the ReplicaFetcher object: who references it? For example, it is referenced by an instance variable of the ReplicaManager object. Then, who references the ReplicaManager object? It is referenced by a static variable of the "Kafka" class. So the ReplicaFetcher object is indirectly referenced by GC Roots and does not need to be reclaimed.


However, while concurrent marking is running, the system is still working: new objects may be created, and some of them may become garbage.

The "concurrent marking" phase, in which GC Roots tracing is done for all objects in the old generation, is the most time-consuming phase. (There is no Stop the World.)

It needs to trace whether every object is referenced by GC Roots. Although this phase is the most time-consuming, it runs concurrently with the system program, so it does not really affect the system.

3. Re-marking

During the concurrent marking phase, while live objects and garbage objects are being marked, the system keeps running: new objects are created, and some existing objects become garbage.

So after the concurrent marking phase finishes, there are bound to be many live objects and garbage objects that were not correctly marked during it.

  1. The re-marking phase needs to stop the system again, Stop the World.

  2. It then re-marks the objects newly created during the concurrent marking phase, as well as existing objects that lost their references and became garbage in the meantime.

This re-marking stage is very fast: it only re-marks the few objects that the running system program changed during the concurrent marking stage.

4. Concurrent sweeping

Let the program run at will, and the garbage collector cleans up the objects that were previously marked as garbage.

This stage is actually quite time-consuming, because it has to sweep out the garbage objects, but it runs concurrently with the system program, so it does not affect the system's execution.

Performance analysis of the CMS execution process

Performance-consuming phases

CMS design idea: the performance-consuming phases (concurrent marking and concurrent sweeping) run concurrently, with system worker threads and garbage collection threads executing at the same time, and no Stop the World;

for the less costly phases, Stop the World has little impact on system response because they execute quickly.

Concurrent marking phase

GC Roots tracing is done over all objects in the old generation, to mark which objects can be reclaimed.

Concurrent cleanup phase

The various garbage objects are swept out of memory.

CMS Execution Diagram

Details of CMS garbage collection

1. CPU resources are strained due to concurrent garbage collection

Instead of stopping the world during the most time-consuming phases, concurrent marking and concurrent sweeping, the system worker threads and the garbage collection threads run simultaneously, which means part of the limited CPU resources is consumed by the garbage collection threads.

  • Concurrent marking: GC Roots must be traced deeply to determine which of all the objects are alive. The old generation holds many live objects, so this process traces a large number of objects and is time-consuming
  • Concurrent sweeping: sweeping all the garbage objects out of memory also takes a lot of time

During these two phases, the CMS garbage collection threads are quite CPU-intensive.

The default number of garbage collection threads started by the CMS garbage collector = (number of CPU cores + 3) / 4

Take the most common 2-core 4G and 4-core 8G machines: with a 2-core CPU, where resources are already limited, CMS will still start (2 + 3) / 4 = 1 garbage collection thread, occupying one precious core.
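Under the formula above, the arithmetic (integer division, as an illustration) works out as:

```java
public class CmsThreadCount {
    // Default CMS GC thread count per the formula above:
    // (CPU cores + 3) / 4, using integer division.
    static int cmsGcThreads(int cpuCores) {
        return (cpuCores + 3) / 4;
    }

    public static void main(String[] args) {
        System.out.println("2-core machine: " + cmsGcThreads(2) + " GC thread(s)");
        System.out.println("4-core machine: " + cmsGcThreads(4) + " GC thread(s)");
    }
}
```

So both 2-core and 4-core machines give up one core to the collector; only at 8 cores does the count rise to 2.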

The concurrent garbage collection mechanism of CMS, first and foremost, consumes CPU resources.

2. Concurrent Mode Failure

In the concurrent sweeping phase, the CMS simply reclaims the previously marked garbage objects. However, the system keeps running during this phase; some objects may be promoted to the old generation as it runs and then become garbage. These are "floating garbage".

Floating garbage cannot be collected by the current garbage collection and must wait for the next one.

So, to ensure there is enough memory for the objects promoted during a CMS collection, some space is usually reserved.

One of the triggers for CMS garbage collection is to automatically perform GC when the memory footprint of the old generation reaches a certain percentage.

The "-XX:CMSInitiatingOccupancyFraction" parameter sets the old-generation occupancy percentage that triggers CMS garbage collection; in JDK 1.6 the default is 92%. That is, once the old generation is 92% full, CMS garbage collection runs automatically, leaving 8% of the space for the new objects the system promotes to the old generation during the concurrent collection.
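As a usage sketch (one detail beyond the text: HotSpot only honors this threshold consistently on every cycle when -XX:+UseCMSInitiatingOccupancyOnly is also set; without it, the JVM may adjust the trigger point itself):

```shell
# Trigger CMS once the old generation is 92% occupied, and make the JVM
# honor that threshold on every cycle rather than adapting it.
java -XX:+UseConcMarkSweepGC \
     -XX:CMSInitiatingOccupancyFraction=92 \
     -XX:+UseCMSInitiatingOccupancyOnly \
     -jar app.jar
```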

What if, during a CMS garbage collection, the system needs to put more objects into the old generation than the old generation has available memory?

"Concurrent Mode Failure" then occurs: the concurrent garbage collection has failed, as in "you ran out of memory while I was still collecting garbage".

In this case, the CMS automatically falls back to the Serial Old garbage collector: the system is forced into a long "Stop the World", GC Roots tracing is redone from scratch, all garbage objects are marked, and no new objects may be created in the meantime. The garbage objects are then removed in one pass before the worker threads resume.

Serial Old uses the mark-compact algorithm and leaves no memory fragmentation.

So in production practice, the occupancy ratio that automatically triggers CMS garbage collection ("-XX:CMSInitiatingOccupancyFraction") needs tuning, to avoid the "Concurrent Mode Failure" problem.

3. Memory fragmentation problem

For the old generation, CMS uses the "mark-sweep" algorithm: each collection marks the garbage objects and then reclaims them in place, which leaves a large amount of memory fragmentation. With too much fragmentation, later objects cannot find contiguous memory space in the old generation, and Full GC is triggered more frequently.

So CMS is not purely mark-sweep, because too much memory fragmentation would actually lead to more frequent Full GCs.

CMS has a parameter, "-XX:+UseCMSCompactAtFullCollection", which is enabled by default.

It means that after a Full GC, the JVM "Stops the World" again, pauses the worker threads, and defragments memory: the live objects are moved together to free up a large contiguous memory space and avoid fragmentation.

Another parameter is "-XX:CMSFullGCsBeforeCompaction", meaning how many Full GCs to execute before performing one round of memory defragmentation. The default is 0, which means memory is compacted after every single Full GC.

When the old generation triggers garbage collection

  1. The available memory of the old generation is smaller than the total size of all young-generation objects, and the space-allocation-guarantee parameter is not enabled, although that guarantee parameter is generally enabled by default

  2. The available memory of the old generation is smaller than the average size of objects previously promoted to the old generation, so a Full GC is performed

  3. After a Minor GC, the surviving objects are larger than Survivor space and also larger than the old generation's available memory (there is not enough available memory), so a Full GC is performed

  4. The "-XX:CMSInitiatingOccupancyFraction" parameter:

    Even if, after previous young-generation GCs, the old generation's available memory is larger than the average size of promoted objects, a Full GC is still triggered automatically once the old generation's used memory exceeds the ratio specified by this parameter.
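The conditions above can be summarized as a small decision sketch (the method name, parameter names, and MB sizing are illustrative, not a JVM API):

```java
public class FullGcTriggerSketch {
    // Returns true when, per the conditions above, a Full GC would be
    // triggered around a Minor GC. All sizes in MB; guaranteeEnabled models
    // the space-allocation-guarantee parameter.
    static boolean shouldFullGc(long oldFree, long oldUsed, long oldCapacity,
                                long youngTotal, long avgPromoted,
                                long survivorsAfterMinor, long survivorCapacity,
                                boolean guaranteeEnabled, int occupancyFraction) {
        // Condition 1: no guarantee, and old gen can't cover the whole young gen.
        if (!guaranteeEnabled && oldFree < youngTotal) return true;
        // Condition 2: old gen can't cover the average promotion size.
        if (oldFree < avgPromoted) return true;
        // Condition 3: survivors fit neither in Survivor nor in the old gen.
        if (survivorsAfterMinor > survivorCapacity
                && survivorsAfterMinor > oldFree) return true;
        // Condition 4: occupancy threshold exceeded.
        return oldUsed * 100 >= (long) occupancyFraction * oldCapacity;
    }
}
```

With a healthy 1.5G old generation and ~100MB average promotions, none of the conditions fire; shrink the free space or push occupancy past 92% and the sketch flips to true.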

Interview question: Why is a collection in the old generation many times slower than in the young generation? Where does the slowness come from?

Young-generation collection is actually very fast, because live objects can be traced directly from GC Roots. Very few young-generation objects survive, so tracing is extremely fast: there are not many objects to track.

The surviving objects are then placed directly into a Survivor region, and Eden plus the previously used Survivor region are reclaimed in one pass.

But what about the CMS Full GC?

  1. In the concurrent marking phase, it has to trace all the live objects, which is slow because the old generation holds many live objects;

  2. In the concurrent sweeping phase, it does not reclaim one large contiguous chunk of memory at once, but has to find the garbage objects scattered all over the place, which is also very slow;

  3. Then a round of memory defragmentation may be needed, moving large numbers of live objects together to free up contiguous memory space, all under "Stop the World", which is slower still;

  4. And if, during concurrent sweeping, there is not enough memory to hold newly promoted objects and a Concurrent Mode Failure occurs, the Serial Old garbage collector must take over immediately: after "Stop the World", the whole marking and collection process starts over from scratch, which is extremely time-consuming.

Case study: how to tune the garbage collection parameters of an e-commerce system with hundreds of millions of requests per day

Optimization idea

  1. Analyze the system's memory usage model in the specific scenario
  2. Reasonably size the young generation, old generation, Eden, and Survivor regions
  3. Tune parameters so that objects avoid being promoted from the young generation to the old generation, and are reclaimed in the young generation as much as possible

Case background

The e-commerce system is divided into the following subsystems:

  • Commodity system
  • Order system
  • Promotion system
  • Inventory system
  • Member system

Take the order system as an example. Our background is an e-commerce system with hundreds of millions of requests per day.

Assuming each user makes about 20 requests, hundreds of millions of requests correspond to roughly 5 million daily active users;

At a 10% order conversion rate, that is about 500,000 orders per day.

Those 500,000 orders are concentrated in the roughly 4 peak hours of the day, which still averages only dozens of orders per second. You might think there is nothing to discuss here: under a load of dozens of orders per second, the JVM deserves little attention. Each second uses only a small amount of young-generation memory, the young generation takes a long time to fill, and after each Minor GC the garbage objects are cleared and the memory is free again, with almost no pressure.

Special e-commerce promotion scene

Double 11 and 618

Suppose that on a festival like Double 11, at midnight, many people are waiting for the promotion to start so they can shop. There may then be 500,000 orders within the first 10 minutes of the promotion.

That is close to 1,000 orders per second. Let's analyze the order system's memory usage model for this kind of big-promotion scenario.

How many machines does it take to withstand the instantaneous pressure of the rush?

Basically three machines, so each machine needs to handle about 300 order requests per second. That is quite reasonable, assuming the order system is deployed on the most common standard 4-core 8G machines.

Given the machine's CPU and memory resources, each machine handling 300 order requests per second is fine. (A few hundred QPS on a single machine is no problem.)

The problem, however, is that the JVM's limited memory resources need to be allocated and tuned, including garbage collection, to keep the JVM's GC count as low as possible and avoid Full GC, minimizing the impact of GC on the system at peak time.

Estimating the memory usage model of the order system at peak time

An order system, say three machines, each accepts 300 order requests per second.

Assume one order takes about 1KB, so 300 orders take about 300KB.

Then factor in the order object's associated business objects, such as order item objects, inventory, promotions, and coupons; the memory cost of a single order is generally amplified 10 to 20 times.

Besides placing orders, the order system performs many other order-related operations, such as order queries, so the total can be amplified by another factor of 10.

That gives roughly 300KB × 20 × 10 = 60MB of memory allocated per second. One second later, that 60MB can be considered garbage: the 300 orders have been processed, and all the related objects have lost their references and are reclaimable.
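The estimate above, spelled out (the 20× and 10× amplification factors are the rough assumptions stated in the text):

```java
public class OrderMemoryEstimate {
    // Rough per-second young-generation allocation for one machine, per the
    // text's assumptions: 300 orders/s at 1KB each, x20 for associated
    // business objects, x10 for other order operations such as queries.
    static long kbPerSecond(int ordersPerSecond, int orderSizeKb,
                            int objectAmplification, int operationAmplification) {
        return (long) ordersPerSecond * orderSizeKb
                * objectAmplification * operationAmplification;
    }

    public static void main(String[] args) {
        long kb = kbPerSecond(300, 1, 20, 10);
        // 60,000 KB per second, i.e. roughly 60MB per second.
        System.out.println(kb + " KB/s, roughly " + kb / 1000 + " MB/s");
    }
}
```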

How memory is allocated

Machine configuration: 4 cores 8G

Allocate 4G to the JVM:

  • 3G heap
    • Young generation: 1.5G
    • Old generation: 1.5G
  • Java virtual machine stacks: 1M each, so a few hundred threads take a few hundred MB
  • Permanent generation: 256MB

-Xms3072M -Xmx3072M -Xmn1536M -Xss1M -XX:PermSize=256M -XX:MaxPermSize=256M -XX:+HandlePromotionFailure

After JDK 1.6, the -XX:HandlePromotionFailure parameter was deprecated.

After JDK 1.6, a Minor GC can proceed directly, without first triggering a Full GC, if either of the following holds: available space in the old generation > total size of young-generation objects, or available space in the old generation > average size of objects promoted to the old generation.

-Xms3072M -Xmx3072M -Xmn1536M -Xss1M -XX:PermSize=256M -XX:MaxPermSize=256M
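A sketch of that post-JDK-1.6 check (illustrative names, sizes in MB, not a JVM API):

```java
public class PromotionGuaranteeSketch {
    // After JDK 1.6: a Minor GC may proceed directly if the old generation's
    // free space exceeds either the young generation's total object size or
    // the historical average promotion size.
    static boolean minorGcCanProceed(long oldFree, long youngTotal, long avgPromoted) {
        return oldFree > youngTotal || oldFree > avgPromoted;
    }
}
```

For example, with 1.5G free in the old generation and a ~100MB average promotion, the second clause passes even when the first does not.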

-XX:SurvivorRatio=8 (the default), so Eden : Survivor1 : Survivor2 = 8 : 1 : 1

Eden is 1.5G × 0.8 = 1.2G, and each Survivor region is 0.15G.

During the rush, the order system handles 300 order requests per second, allocating 60MB of memory that becomes garbage one second later. After about 20 seconds, Eden is full and a Minor GC is due. The first check fails: the old generation's 1.5G of available memory is not larger than the young generation's total size. But the second check passes: the old generation's available memory is certainly larger than the average size of objects promoted in previous Minor GCs, because no Minor GC has run yet. So the Minor GC can proceed directly.
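The 20-second figure comes from straightforward division (decimal MB, per the text's estimates):

```java
public class EdenFillEstimate {
    // Seconds until Eden fills: Eden capacity divided by allocation rate.
    static long secondsToFillEden(long edenMb, long allocatedMbPerSecond) {
        return edenMb / allocatedMbPerSecond;
    }

    public static void main(String[] args) {
        // 1.5G young gen with SurvivorRatio=8 gives Eden ~1.2G = 1200MB,
        // at ~60MB allocated per second.
        System.out.println(secondsToFillEden(1200, 60) + " seconds per Eden fill");
    }
}
```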

We assume that the orders from the last second are still being processed at collection time, so roughly 100MB of objects survive each Minor GC.

After the Minor GC, the ~100MB of live objects enter S1. The system runs for about another 20 seconds until Eden fills up again; Eden and S1 are then collected together, and the ~100MB of surviving objects move into S2.

-Xms3072M  -Xmx3072M -Xmn1536M -Xss1M -XX:PermSize=256M -XX:MaxPermSize=256M -XX:SurvivorRatio=8

Young-generation garbage collection optimization

Are the Survivor regions large enough?

  • Tuning analysis:

    The first question when tuning the JVM here is whether the young generation's Survivor regions are large enough. By the logic above, each young-generation collection leaves around 100MB of live objects, possibly exceeding 150MB at peak. Wouldn't it then be common for the survivors of a Minor GC not to fit into a 150MB Survivor region? And wouldn't objects then frequently enter the old generation?

    Can the objects surviving a Minor GC be placed into Survivor?

    Also, even if the objects surviving a Minor GC total less than 150MB, say 100MB entering a Survivor region, they are a batch of same-age objects that together exceed 50% of the Survivor region's space, so by the dynamic age rule they may still be promoted to the old generation.

So Survivor zones are obviously not large enough.

  • Tuning measures:

    The suggestion here is to adjust the relative sizes of the young and old generations. In an ordinary business system like this, most objects clearly have short life cycles and should not frequently enter the old generation, so there is no need to reserve too much memory for the old generation. First, let objects stay in the young generation as long as possible.

    You can set the young generation to 2G and the old generation to 1G; Eden is then 1.6G, and S1 and S2 are 200MB each.

  • Tuning results:

    Larger Survivor regions greatly reduce the chance that the survivors of a young-generation GC cannot fit into Survivor, or exceed 50% of Survivor and are promoted by the dynamic age rule.

-Xms3072M -Xmx3072M -Xmn2048M -Xss1M -XX:PermSize=256M -XX:MaxPermSize=256M -XX:SurvivorRatio=8

How many Minor GCs must an object survive before it enters the old generation?

The default is 15 Minor GCs. In our scenario, 15 Minor GCs take about 300 seconds, roughly 5 minutes; objects that live this long are typically Spring Controllers, Service beans, and long-running core business components.

Such objects should then enter the old generation. They are generally few; a system accumulates at most a few tens of MB of them.

Therefore, we can lower the threshold from 15 to 5, letting long-lived objects enter the old generation sooner.

-Xms3072M -Xmx3072M -Xmn2048M -Xss1M -XX:PermSize=256M -XX:MaxPermSize=256M -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=5

How big must an object be to go directly into the old generation?

In general, 1MB is enough, as there are few large objects larger than 1MB.

If such objects do exist, they are likely pre-allocated large arrays or Lists used to cache data.

-Xms3072M -Xmx3072M -Xmn2048M -Xss1M -XX:PermSize=256M -XX:MaxPermSize=256M -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=5 -XX:PretenureSizeThreshold=1M

Specifying the garbage collectors

The new generation uses ParNew, the old generation uses CMS.

-Xms3072M -Xmx3072M -Xmn2048M -Xss1M -XX:PermSize=256M -XX:MaxPermSize=256M -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=5 -XX:PretenureSizeThreshold=1M
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC

Summary of young-generation garbage collection optimization

The core parameters for the ParNew garbage collector are the size of the young generation and the ratio of Eden to Survivor. Set them properly, and you avoid survivors of a Minor GC failing to fit into Survivor, or being promoted to the old generation by the dynamic age rule. Give Survivor plenty of room in the young generation, and Minor GC will generally be fine.

Then, based on your system's running model, set "-XX:MaxTenuringThreshold" sensibly, so that long-lived objects enter the old generation as soon as possible instead of lingering in the young generation.

Take a look at the JVM parameters of your production system, check the sizes of your young generation, old generation, Eden, and Survivor regions, and estimate your system's runtime model:

  • How much memory per second?
  • How often does a Minor GC trigger?
  • How many live objects do you typically have after a Minor GC?
  • Can Survivor hold them?
  • Does it happen often that objects enter the old generation because Survivor cannot hold them?
  • Could objects enter the old generation via the dynamic age rule?

Old-generation garbage collection

-Xms3072M -Xmx3072M -Xmn2048M -Xss1M -XX:PermSize=256M -XX:MaxPermSize=256M -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=5 -XX:PretenureSizeThreshold=1M
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC

These are the JVM parameters currently specified.

Let's return to the old generation.

In the current case, under what circumstances do objects enter the old generation?

-XX:MaxTenuringThreshold=5

Objects that have survived five consecutive Minor GCs go straight into the old generation.

  • @Controller
  • @Service

Large objects

  • -XX:PretenureSizeThreshold=1M

    In our case, assume that the large object does not exist or is ignored.

Insufficient Survivor space

In addition, objects surviving a Minor GC may enter the old generation if they exceed the 200MB Survivor region, or if a single batch of same-age objects exceeds 50% of Survivor.

However, we previously optimized the JVM parameters of the new generation to avoid this situation, and according to our calculations, this probability should be very low.

But although the probability is very low, it cannot be ruled out entirely; after some GC there may happen to be more than 200MB of surviving objects, which then enter the old generation.

We can assume that during the rush, a small batch of roughly 200MB of objects enters the old generation every 5 minutes or so after Minor GCs.

How often does a Full GC trigger during the rush?

  1. Before each Minor GC, the old generation's available memory is compared with the average size of objects promoted to the old generation per Minor GC;

    if the old generation's available memory < that average promotion size, a Full GC is triggered first.

  2. After a Minor GC, the objects to be promoted to the old generation may be larger than the old generation's available memory, which also triggers a Full GC.

  3. With the "-XX:CMSInitiatingOccupancyFraction" parameter set, say to 92%, the previous conditions may all fail, yet if the old generation's used space happens to exceed 92%, a Full GC (CMS collection) is triggered.

In reality, objects trickle into the old generation slowly as the system runs; because we have optimized the young generation's memory allocation, promotion into the old generation is very slow.

So it is likely that only after the system has run for half an hour to an hour will close to 1GB of objects have entered the old generation. At that point, one of conditions 1, 2, or 3 may be met and trigger a Full GC.

But these three conditions are usually triggered only when the old generation is almost full.

Suppose a Full GC is triggered after the order system runs for an hour during the rush; by then the ordering peak is almost past.

Note that this inference matters. We calculated with 500,000 orders in the promotion's first 10 minutes, but after a big promotion starts, crowds of users keep waiting to place orders, so the first hour could easily see two or three million orders. Big promotions like this happen only a few times a year, and after the peak hour the order system's load drops sharply, so GC is then hardly a problem.

Therefore, after the young-generation optimization, we can estimate that during the peak there may be only about one Full GC per hour, and after the peak, as the order system quiets down, perhaps only one Full GC every few hours.

Concurrent Mode Failure

If, while CMS is collecting concurrently, a Minor GC produces surviving objects that cannot fit into a Survivor region and go straight into an old generation that has no available memory, a Concurrent Mode Failure occurs and a (Serial Old) Full GC is performed.

The probability of that is very low, so we can ignore it here.

-Xms3072M -Xmx3072M -Xmn2048M -Xss1M -XX:PermSize=256M -XX:MaxPermSize=256M
-XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=5 -XX:PretenureSizeThreshold=1M
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=92

Memory defragmentation

By default, a round of memory defragmentation is done after every Full GC, which is fine here.

-Xms3072M -Xmx3072M -Xmn2048M -Xss1M -XX:PermSize=256M -XX:MaxPermSize=256M
-XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=5 -XX:PretenureSizeThreshold=1M
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=92
-XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0

Full GC optimization is premised on Minor GC optimization, and Minor GC optimization is premised on memory allocation, which is premised on memory usage model estimation during system operation.

Thinking about your online system

How does your online system set JVM garbage collection parameters? Is it set up properly?

Go ahead and look at the JVM parameters of your production system, check the sizes of your young generation, old generation, Eden, and Survivor regions, and then estimate your system's running model: how much memory it consumes per second and how often it triggers a Minor GC.

How many objects survive a Minor GC? Can Survivor hold them? Does it happen often that objects enter the old generation because Survivor cannot hold them? Could objects enter the old generation via the dynamic age rule?

Then consider how large each of your memory regions should be so that the objects surviving each young-generation GC stay in Survivor.

Also, estimate your system's running model and see how often the old generation will trigger a Full GC, and whether you need to tune the various CMS parameters for when it does.

Combining what you have learned from this case, draw out your own system's running model, memory allocation, and GC triggering, the whole process and model. You must analyze, think, and draw it yourself to truly absorb the knowledge.