Background: this issue was reproduced on CentOS 7 with LXCFS and JDK 11.

Here is how we tracked down and solved the problem.

I. Fault phenomenon

The OPPO kernel team received a case from Gan of the JVM team: the hot spot of a Java process was consistently the following function, accounting for a high proportion of CPU time.

at sun.management.OperatingSystemImpl.getProcessCpuLoad(Native Method)

After being granted access, we logged on to the OPPO cloud host, where top showed the following:

   PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                       
617248 service   20   0   11.9g   8.6g   9300 R 93.8  2.3   3272:09 java                                                                          
179526 service   20   0   11.9g   8.6g   9300 R 93.8  2.3  77462:17 java     

II. Fault analysis

A Java thread that drives CPU usage this high is usually either GC or JNI code stuck in an infinite loop. Judging from the hot spots reported by the JVM team, this thread was continuously fetching CPU utilization, which is not one of the usual suspects. From perf's point of view, the thread was constantly issuing read calls.

We then used strace to check the return value of read, which printed the following:

read(360, 0x7f6f9000b000, 4096)         = -1 ENOTCONN (Transport endpoint is not connected)
read(360, 0x7f6f9000b000, 4096)         = -1 ENOTCONN (Transport endpoint is not connected)
read(360, 0x7f6f9000b000, 4096)         = -1 ENOTCONN (Transport endpoint is not connected)
read(360, 0x7f6f9000b000, 4096)         = -1 ENOTCONN (Transport endpoint is not connected)
read(360, 0x7f6f9000b000, 4096)         = -1 ENOTCONN (Transport endpoint is not connected)
read(360, 0x7f6f9000b000, 4096)         = -1 ENOTCONN (Transport endpoint is not connected)

What type of fd is 360?

# ll /proc/22345/fd/360
lr-x------ 1 service service 64 Feb  4 11:24 /proc/22345/fd/360 -> /proc/stat

So the fd that is being read over and over is /proc/stat, the file used to compute CPU utilization.
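As background, CPU utilization is derived by sampling the tick counters on the "cpu" line of /proc/stat twice and comparing the deltas; the JDK's native get_totalticks does essentially this. Below is a minimal sketch of the idea only (the helpers sample_ticks and cpu_load are made-up names for this illustration, not JDK code), written defensively so that a failed read is reported rather than looped on:

#include <stdio.h>

/* Hypothetical helper: read the aggregate "cpu" line of /proc/stat once.
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
static int sample_ticks(unsigned long long *busy, unsigned long long *total)
{
    unsigned long long user, nice, sys, idle, iowait, irq, softirq;
    FILE *fh = fopen("/proc/stat", "r");
    int n;

    if (fh == NULL)
        return -1;
    n = fscanf(fh, "cpu %llu %llu %llu %llu %llu %llu %llu",
               &user, &nice, &sys, &idle, &iowait, &irq, &softirq);
    fclose(fh);
    if (n != 7)
        return -1;   /* parse failure, e.g. the underlying read failed */
    *busy  = user + nice + sys + irq + softirq;
    *total = *busy + idle + iowait;
    return 0;
}

/* Utilization between two samples = delta(busy) / delta(total). */
static double cpu_load(unsigned long long busy0, unsigned long long total0,
                       unsigned long long busy1, unsigned long long total1)
{
    if (total1 <= total0)
        return 0.0;
    return (double)(busy1 - busy0) / (double)(total1 - total0);
}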

Next, check the mount points for /proc/stat inside the container:

# cat /proc/mounts  |grep stat
tmpfs /proc/timer_stats tmpfs rw,nosuid,size=65536k,mode=755 0 0
lxcfs /proc/stat fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
lxcfs /proc/stat fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
lxcfs /proc/diskstats fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0

Oddly enough, there are two mount points that are /proc/stat.

Try reading this file manually:

# cat /proc/stat
cpu  12236554 0 0 9808893 0 0 0 0 0 0
cpu0 10915814 0 0 20934 0 0 0 0 0 0
cpu1 1320740 0 0 9787959 0 0 0 0 0 0

Reading the file from bash works fine. Could bash be accessing a different mount point than the Java thread?

Let's look at the super_block and the mount corresponding to the fd that the Java process has open:

crash> files 22345 |grep -w 360
360 ffff914f93efb400 ffff91541a16d440 ffff9154c7e29500 REG  /rootfs/proc/stat/proc/stat
crash> inode.i_sb ffff9154c7e29500
  i_sb = 0xffff9158797bf000

Then view the mount point information in the namespace of the corresponding process:

crash> mount -n 22345 |grep lxcfs
ffff91518f28b000 ffff9158797bf000 fuse lxcfs /rootfs/proc/stat
ffff911fb0209800 ffff914b3668e800 fuse lxcfs /rootfs/var/lib/baymax/var/lib/baymax/lxcfs
ffff915535becc00 ffff914b3668e800 fuse lxcfs /rootfs/proc/cpuinfo
ffff915876b45380 ffff914b3668e800 fuse lxcfs /rootfs/proc/meminfo
ffff912415b56e80 ffff914b3668e800 fuse lxcfs /rootfs/proc/uptime
ffff91558260f300 ffff914b3668e800 fuse lxcfs /rootfs/proc/stat/proc/stat
ffff911f52a6bc00 ffff914b3668e800 fuse lxcfs /rootfs/proc/diskstats
ffff914d24617180 ffff914b3668e800 fuse lxcfs /rootfs/proc/swaps
ffff911de87d0180 ffff914b3668e800 fuse lxcfs /rootfs/sys/devices/system/cpu/online

The mount point that bash goes through has super_block ffff914b3668e800 (verifying this is trivial, so the steps are not listed here), while the fd held by the Java process points to super_block 0xffff9158797bf000.

This further confirms that the problematic Java process is reading through a stale mount point. So why does reading through this mount point return ENOTCONN?

crash> struct file.private_data ffff914f93efb400
  private_data = 0xffff911e57b49b80
crash> fuse_file.fc 0xffff911e57b49b80
  fc = 0xffff9158797be000
crash> fuse_conn.connected 0xffff9158797be000
  connected = 0

The connection status of fuse_conn is 0. The call chain is as follows:

sys_read-->vfs_read-->fuse_direct_read-->__fuse_direct_read-->fuse_direct_io
    -->fuse_get_req-->__fuse_get_req

static struct fuse_req *__fuse_get_req(struct fuse_conn *fc, unsigned npages,
                                       bool for_background)
{
    ...
    err = -ENOTCONN;
    if (!fc->connected)
        goto out;
    ...
}

The next question is: why does the problematic Java thread keep retrying again and again without checking the return value of read?

A GDB backtrace shows:

(gdb) bt
#0  0x00007fec3a56910d in read () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fec3a4f5c04 in _IO_new_file_underflow (fp=0x7fec0c01ae00) at fileops.c:593
#2  0x00007fec3a4f6dd2 in __GI__IO_default_uflow (fp=0x7fec0c01ae00) at genops.c:414
#3  0x00007fec3a4f163e in _IO_getc (fp=0x7fec0c01ae00) at getc.c:39
#4  0x00007febeacc6738 in get_totalticks.constprop.4 ()
   from /usr/local/paas-agent/HeyTap-Linux-x86_64-11.0.7/lib/libmanagement_ext.so
#5  0x00007febeacc6e47 in Java_com_sun_management_internal_OperatingSystemImpl_getSystemCpuLoad0 ()
   from /usr/local/paas-agent/HeyTap-Linux-x86_64-11.0.7/lib/libmanagement_ext.so

After confirming that the user-space code was looping inside _IO_getc, the JVM team reviewed the code and found something very suspicious:

In UnixOperatingSystem.c, get_totalticks contains a loop like this:

static void next_line(FILE *f) {
    while (fgetc(f) != '\n');
}

From the fuse module code, there are not many places that set fc->connected = 0; it is usually done by an abort or by fuse_put_super. In other words, the fuse mount point has gone stale: the user-space file system lxcfs exited while the mount point and its inodes were still being accessed, so the old super_block could not be reclaimed in time. When lxcfs is restarted, the recovery script remounts the proc files roughly as follows:

remount_container() {
    export f=/proc/stat    && test -f /var/lib/baymax/lxcfs/$f && (umount $f;  mount --bind /var/lib/baymax/lxcfs/$f $f)

It remounts in bind mode; for details on the bind-mount feature, refer to the relevant documentation.
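For illustration only, here is a rough C equivalent of the script's "umount; mount --bind" pair, using umount(2) and mount(2) with MS_BIND. The paths are taken from the script above, and the function name rebind_proc_stat is made up for this sketch:

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/mount.h>

/* Sketch of what the remount script does for a single file: unmount the old
 * entry at /proc/stat, then bind-mount the lxcfs-backed file over it again. */
static int rebind_proc_stat(void)
{
    const char *src = "/var/lib/baymax/lxcfs/proc/stat";
    const char *dst = "/proc/stat";

    /* If some process still holds the old file open, this fails with
     * EBUSY -- the "target is busy" message seen in the journal. */
    if (umount(dst) != 0)
        fprintf(stderr, "umount %s: %s\n", dst, strerror(errno));

    /* MS_BIND makes the lxcfs-provided file visible at /proc/stat again. */
    if (mount(src, dst, NULL, MS_BIND, NULL) != 0) {
        fprintf(stderr, "bind mount %s -> %s: %s\n", src, dst, strerror(errno));
        return -1;
    }
    return 0;
}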

And in the journal log, the key message was found:

: umount: /proc/stat: target is busy

This log line shows that the umount failed: not all accesses to the old mount point had been closed.

III. Fault reproduction

To verify the above, we wrote a very simple demo:

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>

#define SCNd64 "I64d"
#define DEC_64 "%"SCNd64

static void next_line(FILE *f) {
    while (fgetc(f) != '\n');
}

#define SET_VALUE_GDB 10

int main(int argc, char *argv[])
{
    unsigned long i = 1;
    unsigned long j = 0;
    FILE *f = NULL;
    FILE *fh;
    unsigned long userTicks, niceTicks, systemTicks, idleTicks;
    unsigned long iowTicks = 0, irqTicks = 0, sirqTicks = 0;
    int n;

    if ((fh = fopen("/proc/stat", "r")) == NULL) {
        return -1;
    }

    n = fscanf(fh, "cpu " DEC_64 " " DEC_64 " " DEC_64 " " DEC_64 " "
                   DEC_64 " " DEC_64 " " DEC_64,
               &userTicks, &niceTicks, &systemTicks, &idleTicks,
               &iowTicks, &irqTicks, &sirqTicks);

    /* Move to next line.  i never becomes SET_VALUE_GDB unless it is changed
     * from gdb, so this loop spins forever:
     *  - if the mount point is intact, the loop keeps a single core ~40% busy
     *  - if the mount point has gone stale and read() returns ENOTCONN,
     *    a single core approaches 100% */
    while (i != SET_VALUE_GDB) {
        next_line(fh);
        j++;
    }

    fclose(fh);
    return 0;
}

Execute the above code and get the following result:

# gcc -g -o caq.o caq.c

628957 root      20   0    8468    612    484 R 32.5  0.0  18:40.92 caq.o

When the CPU usage is 32.5%, the mount point information is as follows:

crash> mount -n 628957 |grep lxcfs
ffff88a5a2619800 ffff88a1ab25f800 fuse lxcfs /rootfs/proc/stat
ffff88cf53417000 ffff88a4dd622800 fuse lxcfs /rootfs/var/lib/baymax/var/lib/baymax/lxcfs
ffff88a272f8c600 ffff88a4dd622800 fuse lxcfs /rootfs/proc/cpuinfo
ffff88a257b28900 ffff88a4dd622800 fuse lxcfs /rootfs/proc/meminfo
ffff88a5aff40300 ffff88a4dd622800 fuse lxcfs /rootfs/proc/uptime
ffff88a3db2bd680 ffff88a4dd622800 fuse lxcfs /rootfs/proc/stat/proc/stat
ffff88a2836ba400 ffff88a4dd622800 fuse lxcfs /rootfs/proc/diskstats
ffff88bcb361b600 ffff88a4dd622800 fuse lxcfs /rootfs/proc/swaps
ffff88776e623480 ffff88a4dd622800 fuse lxcfs /rootfs/sys/devices/system/cpu/online

# systemctl restart lxcfs      ------ restart lxcfs while the demo keeps its /proc/stat fd open

crash> mount -n 628957 |grep lxcfs
ffff88a5a2619800 ffff88a1ab25f800 fuse lxcfs /rootfs/proc/stat
ffff88a3db2bd680 ffff88a4dd622800 fuse lxcfs /rootfs/proc/stat/proc/stat      ------ this mount point could not be unmounted because the fd was still open; its super_block is still the one from before the restart
ffff887795a8f600 ffff88a53b6c6800 fuse lxcfs /rootfs/var/lib/baymax/var/lib/baymax/lxcfs
ffff88a25472ae80 ffff88a53b6c6800 fuse lxcfs /rootfs/proc/cpuinfo
ffff88cf75ff1e00 ffff88a53b6c6800 fuse lxcfs /rootfs/proc/meminfo
ffff88a257b2ad00 ffff88a53b6c6800 fuse lxcfs /rootfs/proc/uptime
ffff88cf798f0d80 ffff88a53b6c6800 fuse lxcfs /rootfs/proc/stat/proc/stat/proc/stat      ------ the new bind mount
ffff88cf798f1f80 ffff88a53b6c6800 fuse lxcfs /rootfs/proc/diskstats
ffff88cf798f80 ffff88a53b6c6800 fuse lxcfs /rootfs/proc/swaps
ffff88a53f295980 ffff88a53b6c6800 fuse lxcfs /rootfs/sys/devices/system/cpu/online

628957 root      20   0    8468    612    484 R 98.8  0.0  18:40.92 caq.o

IV. Fault avoidance and solution

Our solution is:

1. Check the return value of fgetc instead of looping on it blindly, so the loop also terminates on EOF (see the sketch after this list).

2. LXCFS 3.1.2 has a segfault bug that causes it to exit, so upgrade to a version later than 4.0.0; note that this involves adapting to the corresponding libfuse shared-library version.

3. When files served by a FUSE user-space file system such as LXCFS are remounted, if the umount fails, retry with an increasing delay, and raise an alarm through the log alerting system once the number of retries exceeds a threshold.
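For point 1, a minimal sketch of a defensive next_line is shown below; it only illustrates the idea and is not the actual JDK patch:

#include <stdio.h>

/* Stop on '\n' as before, but also bail out on EOF, which is what fgetc
 * keeps returning once the underlying read() fails (e.g. with ENOTCONN). */
static void next_line(FILE *f)
{
    int c;
    do {
        c = fgetc(f);
    } while (c != '\n' && c != EOF);
}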