A CentOS 7 server running Java programs has been raising memory usage alarms recently. Because the cause of the memory leak had not been found, the server was restarted every few days to release memory. We recently decided to resolve the problem properly.

The following is the process of reproducing and resolving the problem in a test environment.

First, check memory usage with free -m. Most of the memory is occupied by buff/cache.

total = used + free + buff/cache. Linux tries to release buff/cache when memory runs low.
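As a rough illustration of the accounting above, the buff/cache figure can be approximated from /proc/meminfo. This is a minimal sketch, assuming buff/cache is Buffers + Cached + SReclaimable; the exact fields that free counts depend on the procps version, and the function name buff_cache_kb is made up for this example.

```shell
#!/usr/bin/env bash
# Sum the meminfo fields assumed to make up buff/cache, in kB.
# Assumption: buff/cache = Buffers + Cached + SReclaimable.
# The optional argument is a meminfo-style file (defaults to /proc/meminfo).
buff_cache_kb() {
    local meminfo="${1:-/proc/meminfo}"
    awk '$1 == "Buffers:" || $1 == "Cached:" || $1 == "SReclaimable:" { sum += $2 }
         END { print sum + 0 }' "$meminfo"
}
```

Called with no argument, buff_cache_kb reads /proc/meminfo on the current machine; dividing the result by 1024 gives a number comparable to the buff/cache column of free -m.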

The top command shows the memory usage of the Java process (press M to sort by memory usage, P to sort by CPU usage).

jmap -heap pid shows the heap memory usage of the Java process.

The memory actually used by the Java process is 270916 KB (about 264 MB), far less than the 2 GB held by buff/cache.

Since the memory footprint is mostly buff/cache, try clearing the cache:

sync
echo 1 > /proc/sys/vm/drop_caches   # clear the page cache
echo 2 > /proc/sys/vm/drop_caches   # clear slab allocator objects (including the dentry and inode caches)
echo 3 > /proc/sys/vm/drop_caches   # clear both the page cache and slab allocator objects

buff/cache usage drops significantly after running echo 2, but the actual cause of the memory leak is still unknown.
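Because echo 2 targets the slab caches rather than the page cache, comparing the slab figures before and after it helps confirm where the memory was held. The following is a small sketch for taking such a snapshot; slab_report is a hypothetical helper name, while Slab, SReclaimable and SUnreclaim are standard /proc/meminfo fields.

```shell
#!/usr/bin/env bash
# Print the slab-related fields of a meminfo-style file as "Name kB" pairs.
# The optional argument is the file to read (defaults to /proc/meminfo).
slab_report() {
    local meminfo="${1:-/proc/meminfo}"
    awk '$1 == "Slab:" || $1 == "SReclaimable:" || $1 == "SUnreclaim:" {
             sub(":", "", $1)   # strip the trailing colon from the field name
             print $1, $2
         }' "$meminfo"
}
```

One way to use it is to run slab_report, then run sync and echo 2 > /proc/sys/vm/drop_caches as root, then run slab_report again and compare the SReclaimable lines.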

Next I turned to the program's code. I did not write the program and had no source for it, so I copied it to the test environment, attached Alibaba's Arthas to the process, and decompiled the classes that implement the relevant logic:

java -jar arthas-boot.jar
sc com.* | grep message
jad com.cupms.message.tcp.service.impl.CUServiceImpl >/tmp/CUServiceImpl.java
jad com.cupms.message.tcp.handler.ServerIoHandler >/tmp/ServerIoHandler.java

The main logic of ServerIoHandler.java is to call the dealCU method of CUServiceImpl, so the focus is on CUServiceImpl.java.

This method contains two logical blocks:

  1. block30: query the database and build the corresponding string
  2. a for loop iterates over the string array and calls curl to send the data to the request interface

  • Database query: each call takes a new connection from the database connection pool
DBHelper db = new DBHelper();
this.conn = db.getConnection();
this.stt = this.conn.createStatement();
this.set = this.stt.executeQuery(Sql);
while (this.set.next()) {
    account_string = String.valueOf(account_string) + this.set.getString(1) + ",";
}
this.stt.close();
this.conn.close();

The connection is released, but to check whether this code is the problem, I commented out the for loop block that follows, ran only the database query logic to see whether memory still leaked, and used Arthas to compile and hot-load the modified code:

mc /tmp/CUServiceImpl.java
retransform /root/src_message_8240/bin/com/cupms/message/tcp/service/impl/CUServiceImpl.class

I wrote a script that calls the program's interface in a loop and watched memory usage. There was no significant increase, so the first block of logic was ruled out as the cause of the leak.

  • Because buff/cache keeps growing, the cause is probably related to file reads and writes. I wondered whether log4j's log writing had a bug

  • With both of the above ruled out, I analyzed the next piece of code

BufferedReader read = null;
InputStream in = null;
InputStreamReader inReader = null;
pro = Runtime.getRuntime().exec(endCmds); //1
result = pro.waitFor(); //2
in = pro.getInputStream(); //2
inReader = new InputStreamReader(in); //2
read = new BufferedReader(inReader); //2
String line = null;
while ((line = read.readLine()) != null) {
    logger.info("curl command return value :" + line);
}
in.close(); //2
inReader.close(); //2
read.close(); //2
pro.destroy(); //1

The buffers are released here, yet buff/cache keeps growing.

I hot-loaded and tested the code one comment group at a time:

  1. Running only the code marked //1: memory keeps growing.
  2. Running only the code marked //2: pro is never created, so there is no output to read, and memory should not grow.

So the code marked //1 is responsible for the memory increase.

pro = Runtime.getRuntime().exec(endCmds) runs the curl command. Perhaps destroy does not release it completely?

pro.getOutputStream().close();
pro.getInputStream().close();
pro.getErrorStream().close();
pro.destroy();
pro=null;

I added the code above to release pro's streams, but memory still grew in testing. So I commented out the original program code and rewrote the same logic for testing:

String[] cmd={"/bin/sh","-c","curl -H \"Content-type: application/json\" -X POST -d hhhh www.baidu.com"};
pro=Runtime.getRuntime().exec(cmd);
...
in.close();  //2
inReader.close(); //2
read.close(); //2
pro.destroy(); //1

When tested again, buff/cache did not grow, which means pro was released.

This was baffling: why does memory not grow with my test code, while the same logic in the original program grows all the time?

On closer inspection, the only difference was the cmd string array: the one in my test was not the same as the original program's. I replaced it with the original program's array:

String[] cmd={"/bin/sh","-c", "curl -H \"Content-type: application/json\" -X POST -d '.... ' https://www.baidu.com"};

This time, memory grew again!

Looking closer, the address used in my test was www.baidu.com, while the original program used https://www.baidu.com. The original program requests an HTTPS interface. Could curl be failing to release memory for HTTPS requests?

To verify the hypothesis, I wrote a shell script that uses curl to call the HTTP and HTTPS interfaces respectively and observed memory usage:

# $1: number of loops; $2: 1 to request the HTTPS interface, anything else for HTTP
j=$1
for ((i=1; i<=j; i++))
do
    echo "loop $i"
    k=`expr $i % 100`
    if [ $k -eq 0 ]
    then
        free -m
        cat /proc/meminfo | grep Slab
        cat /proc/meminfo | grep SReclaimable
    fi
    if [ $2 -eq 1 ]
    then
        /bin/sh -c "curl -H \"Content-type: application/json\" -X POST -d hhhh https://www.baidu.com" >/dev/null
    else
        /bin/sh -c "curl -H \"Content-type: application/json\" -X POST -d hhhh www.baidu.com" >/dev/null
    fi
done

Searching for curl memory leaks turned up several related write-ups: farll.com/2016/10/hig… www.ddnpc.com/meem-slab.h… blog.huoding.com/2015/06/10/…

curl calls the NSS library when requesting an HTTPS resource. Earlier versions of nss-softokn (3.16.0 and below) have a memory leak.
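To check a machine against that version range, the installed version can be compared with the threshold. This is only a sketch: the 3.16.0 threshold is taken from the text above, nss_softokn_affected is a hypothetical helper name, and the comparison relies on GNU sort -V. Distribution packages may carry backported fixes, so a match here is a hint rather than proof.

```shell
#!/usr/bin/env bash
# Return success (0) if the given nss-softokn version is 3.16.0 or lower.
# Uses version sort: if the given version sorts first (or ties), it is <= threshold.
nss_softokn_affected() {
    local version="$1" threshold="3.16.0"
    [ "$(printf '%s\n%s\n' "$version" "$threshold" | sort -V | head -n 1)" = "$version" ]
}

# Example (not run here): check the installed package on an RPM-based system.
# installed=$(rpm -q --qf '%{VERSION}\n' nss-softokn | head -n 1)
# nss_softokn_affected "$installed" && echo "nss-softokn $installed may be affected"
```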

To detect whether a temporary directory is local or a network resource, NSS accesses hundreds of nonexistent files and measures how long the accesses take. In the process, it creates a large number of dentry cache entries for those nonexistent files. When the dentry cache generated by curl requests grows faster than the system can reclaim it, memory usage increases gradually. A foreign-language blog covers this in more detail. Later versions of NSS fix this bug: NSS now avoids calls to sdb_measureAccess in lib/softoken/sdb.c s_open if NSS_SDB_USE_CACHE is "yes"
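The dentry growth described here can be watched directly through /proc/sys/fs/dentry-state, whose first two fields are the total number of dentries and the number of unused ones. A minimal sketch, with dentry_counts as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Print the total and unused dentry counts from a dentry-state style file.
# The optional argument is the file to read (defaults to /proc/sys/fs/dentry-state).
dentry_counts() {
    local state_file="${1:-/proc/sys/fs/dentry-state}"
    awk '{ print "nr_dentry=" $1, "nr_unused=" $2 }' "$state_file"
}
```

Sampling dentry_counts every hundred curl requests, alongside the Slab and SReclaimable lines, would show whether the HTTPS requests are what drives the counts up.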

Update nss-softokn to nss-softokn-3.53.1-6.el7_9.x86_64, and update nss and curl at the same time:

yum update -y nss-softokn
yum update -y nss
yum update -y curl

On retesting, memory no longer keeps growing, so the problem is resolved.

P.S. I did not set NSS_SDB_USE_CACHE=yes in the test environment. After upgrading the production environment, I added export NSS_SDB_USE_CACHE=yes to /etc/profile.
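One caveat: /etc/profile is read by login shells, so a process started some other way (for example by a service manager) may not inherit the variable. As an alternative sketch, the variable can be set just for the command that needs it. The wrapper name with_nss_cache is hypothetical and not part of NSS or curl.

```shell
#!/usr/bin/env bash
# Run any command with NSS_SDB_USE_CACHE=yes set only for that command.
with_nss_cache() {
    env NSS_SDB_USE_CACHE=yes "$@"
}

# Example (not run here):
# with_nss_cache curl -H "Content-type: application/json" -X POST -d hhhh https://www.baidu.com
```

For the Java program in this article, the equivalent would be to make sure the variable is present in the environment of the Java process itself, since the curl child process inherits its environment from there.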