1. Problems arise

I received a container memory alarm yesterday. The service is mainly used for interface development and integration with other systems; its traffic is normally quite low, so it runs in a container with a 2 GB memory limit.

The JVM startup parameters (the non-default VM flags) are as follows:

```
-XX:CICompilerCount=3 -XX:InitialHeapSize=1073741824 -XX:MaxHeapSize=1073741824
-XX:MaxNewSize=357564416 -XX:MinHeapDeltaBytes=524288 -XX:NewSize=357564416
-XX:OldSize=716177408 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+UnlockExperimentalVMOptions
-XX:+UseCGroupMemoryLimitForHeap -XX:+UseCompressedClassPointers
-XX:+UseCompressedOops -XX:+UseFastUnorderedTimeStamps -XX:+UseParallelGC
```

Looking at the container's monitoring over the past 24 hours, the actual memory usage of the instance (the green curve in the chart) kept climbing.

I first used the top command to check the resource usage of the Java process.

Then I ran top -H -p to check the usage of the individual threads within the Java process.
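
For reference, a minimal sketch of these two checks, assuming the Java process runs as PID 1 inside the container (as it does in the jmap and pmap commands later on):

```
# Overall CPU/memory usage of the Java process (PID 1 inside the container)
top -p 1

# The same process with one row per thread (-H shows individual threads)
top -H -p 1
```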

Cross-checking the monitoring data confirmed it was basically accurate: the container memory limit is 2 GB and the Java process itself uses about 1.3 GB, yet the container actually uses more than 2.3 GB of memory, with a large share taken by the page cache.

2. Problem analysis

2.1 Analysis within the JVM

Running <jmap -histo:live 1 | head -10> to look at the biggest classes in the live-object histogram turned up no large objects.
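
A small sketch of this check; note that the :live option forces a full GC before counting, so only objects that are still reachable are reported:

```
# Histogram of live objects, sorted by total bytes per class;
# the first lines therefore show the biggest heap consumers.
jmap -histo:live 1 | head -10
```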

I then dumped the heap with <jmap -dump:format=b,file=21_01_21.hprof 1> and analyzed the dump with MAT; again, there were no large objects in the heap. It seemed the problem was not in the JVM heap.
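
A sketch of the dump step, using the file name from the text; compressing the dump before copying it out of the container is just a convenience and not something the original writeup describes:

```
# Dump the full heap of PID 1 in binary format for offline analysis in MAT
jmap -dump:format=b,file=21_01_21.hprof 1

# Optional: compress the dump before copying it off the container
gzip 21_01_21.hprof
```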

2.2 Linux system analysis

Using <pmap -x 1 | sort -n -k 3 -r> to inspect the process's memory mappings sorted by RSS, I looked for the well-known 64 MB blocks from glibc malloc arenas that people often blog about; judging from the output, no unusually large regions had been allocated.
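
A sketch of that mapping check. The 64 MB blocks referred to here are the per-thread arenas that glibc's malloc can create; if they were the culprit, many anonymous mappings of roughly 64 MB would appear near the top of this output, and capping them with the MALLOC_ARENA_MAX environment variable would be the usual mitigation:

```
# Extended mappings of PID 1, sorted by the RSS column (3rd) in descending order;
# large anonymous entries of about 64 MB would point at glibc malloc arenas.
pmap -x 1 | sort -n -k 3 -r | head -20
```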

Using <strace -f -e "brk,mmap,munmap,fork,clone" -p 1> to trace the system calls that allocate and release memory, I still could not find any abnormal memory allocations.
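
A sketch of the tracing step; -f follows child threads and processes, and -e restricts the output to the calls that grow or shrink the address space:

```
# Trace only memory-related syscalls (plus fork/clone) of PID 1 and its threads
strace -f -e "brk,mmap,munmap,fork,clone" -p 1
```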

So far, there were no memory leaks or large allocations either inside or outside the heap, so I turned to the system-level memory statistics with <cat /proc/meminfo>:

```
# cat /proc/meminfo
...
MemTotal:       129778440 kB
MemFree:          7533696 kB
MemAvailable:    52198264 kB    # memory actually available to applications
Buffers:               ...
Cached:          48735240 kB    # size of the page cache
SwapCached:             0 kB    # size of the swap cache
Active:          88164904 kB    # page cache / memory in frequent use
Inactive:        23182084 kB    # page cache / memory in infrequent use
Active(anon):    26384472 kB    # active anonymous memory
Inactive(anon):      2672 kB    # inactive anonymous memory
Active(file):    61780432 kB    # active file-backed memory
Inactive(file):  23179412 kB    # inactive file-backed memory
...
Shmem:              22056 kB    # allocated shared memory
...
SReclaimable:     8194524 kB    # reclaimable slab memory
```
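
To keep an eye on just the page-cache-related counters while testing, something like the following works (a convenience sketch, not part of the original troubleshooting):

```
# One-shot summary: the buff/cache column is buffers plus page cache
free -h

# Refresh the file-backed (page cache) counters every two seconds
watch -n 2 "grep -E '^(Buffers|Cached|Active\(file\)|Inactive\(file\)|Dirty)' /proc/meminfo"
```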

Since the service's traffic is very small, network I/O could be ruled out; the page cache was almost certainly coming from disk I/O caused by log printing. I used <vmtouch /root/logs> to check how much of the log directory was resident in the page cache.
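
A sketch of how vmtouch is typically used for this kind of check; the /root/logs path comes from the text, and the eviction step with -e is an extra option rather than something the original did:

```
# Report how many pages of the files under /root/logs are resident in the page cache
vmtouch /root/logs

# Optional: evict those files from the page cache to confirm they are the culprit
vmtouch -e /root/logs
```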

3. Problem solving

Using the Arthas command <logger --name root --level error -c XXXXX>, I dynamically raised the instance's log level from INFO to ERROR to reduce the volume of log output. The container's actual memory usage and page cache dropped almost immediately.
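
A sketch of the Arthas session, with the classloader hash left as the placeholder from the text (Arthas prints the real hash when listing the loggers):

```
# inside the Arthas console attached to the Java process

# list the loggers, their current levels and the classloader hash
logger

# raise the root logger from INFO to ERROR for the classloader identified above
logger --name root --level error -c XXXXX
```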

Overnight, however, the actual memory usage climbed again. Looking at the code, it turned out that the RocketMQ client writes its own logs separately elsewhere. Interestingly, the container's memory usage also dropped back at one point during the night; after checking, that moment coincided exactly with the rolling of the RocketMQ client log file. We adjusted the log rolling configuration accordingly, and afterwards the container's memory and page cache usage returned to normal levels.
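
The exact rolling configuration is not reproduced here. Purely as a hypothetical illustration, capping the RocketMQ client's own log output through its logging system properties could look roughly like this; the property names are assumptions and can differ between client versions, so check the ClientLogger class of the version in use:

```
# hypothetical startup, limiting RocketMQ client log level and retained rolled files
java -Drocketmq.client.logRoot=/root/logs/rocketmqlogs \
     -Drocketmq.client.logLevel=ERROR \
     -Drocketmq.client.logFileMaxIndex=3 \
     -jar app.jar
```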

4. Problem review

4.1 Page cache knowledge

The page cache (pcache for short) caches file contents in units of one page, usually 4 KB. When Linux reads or writes files, it keeps the file's logical contents in memory to speed up access to the data on disk, so the page cache effectively reduces disk I/O and improves application I/O speed. It can be monitored through /proc/meminfo, free, and /proc/vmstat. However, an excessively large page cache can also cause problems such as spikes in server load, latency jitter in service responses, and an increase in average access latency.
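
For completeness, a small sketch of observing the page cache and, on a test machine only, releasing it; writing 1 to drop_caches discards only the clean page cache (after sync flushes dirty pages), which is handy for experiments but not something to run casually in production:

```
# How much memory the kernel is using for buffers and page cache
free -h
grep -E '^(Buffers|Cached|Dirty)' /proc/meminfo

# Test machines only: flush dirty pages, then drop the clean page cache
sync
echo 1 > /proc/sys/vm/drop_caches
```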

4.2 Impact on file read and write performance

I reproduced this effect in a development environment and saw the same behavior: the more file data was read and written, the more the memory usage grew. This closely matches the frequent-logging scenario, because log files keep growing and therefore occupy more and more page cache.
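
A minimal sketch of such a test (the file path and size are arbitrary): write a large file and watch the Cached figure grow by roughly the same amount.

```
# Page cache size before the test
grep '^Cached' /proc/meminfo

# Write a 1 GB file; the freshly written pages stay in the page cache
dd if=/dev/zero of=/tmp/pagecache-test bs=1M count=1024

# Cached should now be about 1 GB larger until the kernel reclaims it
grep '^Cached' /proc/meminfo

# Clean up
rm /tmp/pagecache-test
```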

References

www.jianshu.com/p/933c664c2… - /proc/meminfo explained

www.cnblogs.com/coldplayere… - vmtouch usage

blog.csdn.net/top\_explor… - a Java test of the page cache write-back mechanism

www.freesion.com/article/551… - file read/write performance test

blog.csdn.net/u012501054/… - Linux memory explained in detail