Case histories
1. Container engine memory leaks
The memory limit for the container engine itself on this machine is 4 GB. Under normal circumstances, with dozens of containers running, the engine's memory usage is roughly a few hundred megabytes. On the problem machine, pouchd triggered an OOM kill about once a day, which basically confirmed a memory leak.
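To confirm the limit and the baseline usage, it helps to read them straight from the engine's memory cgroup. Below is a minimal sketch; the cgroup path is an assumption (the real path depends on how pouchd is launched and whether the host uses cgroup v1 or v2).

```go
// Sketch: print the memory limit and current usage of the engine's cgroup.
// The path below is assumed (cgroup v1, systemd-style slice for pouchd).
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

func readBytes(path string) (uint64, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.ParseUint(strings.TrimSpace(string(data)), 10, 64)
}

func main() {
	const cg = "/sys/fs/cgroup/memory/system.slice/pouch.service" // assumed path
	limit, err := readBytes(cg + "/memory.limit_in_bytes")
	if err != nil {
		panic(err)
	}
	usage, err := readBytes(cg + "/memory.usage_in_bytes")
	if err != nil {
		panic(err)
	}
	fmt.Printf("limit=%d MiB usage=%d MiB\n", limit>>20, usage>>20)
}
```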
Pouchd's memory grew by more than 100 MB per hour, so the first step was to capture pprof heap data periodically and use `go tool pprof -base` to compare the memory growth between snapshots; this yielded no useful information.
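A minimal sketch of the periodic capture, assuming the daemon exposes net/http/pprof on a debug endpoint (the address below is hypothetical). The saved snapshots can then be diffed with `go tool pprof -base heap-<t1>.pb.gz heap-<t2>.pb.gz`.

```go
// Sketch: periodically save heap profiles from an assumed pprof debug endpoint,
// so that growth between any two snapshots can be diffed with -base.
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

func snapshot(url string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	name := fmt.Sprintf("heap-%s.pb.gz", time.Now().Format("20060102-150405"))
	f, err := os.Create(name)
	if err != nil {
		return err
	}
	defer f.Close()

	_, err = io.Copy(f, resp.Body)
	return err
}

func main() {
	const url = "http://127.0.0.1:6060/debug/pprof/heap" // assumed debug address
	for {
		if err := snapshot(url); err != nil {
			fmt.Fprintln(os.Stderr, "snapshot failed:", err)
		}
		time.Sleep(30 * time.Minute) // one snapshot every 30 minutes
	}
}
```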
We also checked /proc/&lt;pid&gt;/maps periodically and found the number of anonymous segments steadily increasing, suggesting that memory referenced by user-space code was not being released.
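The check itself is simple to script. A small sketch that counts anonymous mappings for a given PID by parsing /proc/&lt;pid&gt;/maps (lines with no pathname in the last column are anonymous segments); watching this count over time is what revealed the steady growth.

```go
// Sketch: count anonymous mappings for a PID by parsing /proc/<pid>/maps.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: anonmaps <pid>")
		os.Exit(1)
	}
	f, err := os.Open("/proc/" + os.Args[1] + "/maps")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	anon := 0
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		// Fields: address perms offset dev inode [pathname]
		fields := strings.Fields(scanner.Text())
		if len(fields) < 6 {
			anon++ // no pathname column: anonymous mapping
		}
	}
	fmt.Println("anonymous mappings:", anon)
}
```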
We also captured goroutine stacks periodically. Analyzing the stacks showed that the number of goroutines blocked acquiring a lock kept growing, from 13,000 to 15,000. Combined with analysis of the container engine code, we found that three goroutines fetching container monitoring data were stuck because the shim did not return; kubelet kept requesting container monitoring data, new goroutines kept being created, and memory usage kept climbing until the daemon was finally OOM-killed.
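A sketch of that periodic check, assuming the same hypothetical pprof debug endpoint as above: it pulls the full goroutine dump and counts goroutines whose wait reason is "semacquire", i.e. blocked acquiring a lock.

```go
// Sketch: periodically count total goroutines and goroutines blocked on a lock
// from the goroutine dump at an assumed pprof debug endpoint.
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
	"time"
)

func countBlocked(url string) (total, blocked int, err error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, 0, err
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	scanner.Buffer(make([]byte, 1<<20), 16<<20) // goroutine dumps can be large
	for scanner.Scan() {
		line := scanner.Text()
		// Headers look like: "goroutine 123 [semacquire]:"
		if strings.HasPrefix(line, "goroutine ") {
			total++
			if strings.Contains(line, "[semacquire") {
				blocked++
			}
		}
	}
	return total, blocked, scanner.Err()
}

func main() {
	const url = "http://127.0.0.1:6060/debug/pprof/goroutine?debug=2" // assumed
	for {
		total, blocked, err := countBlocked(url)
		if err == nil {
			fmt.Printf("%s goroutines=%d blocked-on-lock=%d\n",
				time.Now().Format(time.RFC3339), total, blocked)
		}
		time.Sleep(10 * time.Minute)
	}
}
```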
Once the leaking shim process was removed, the container engine immediately released several gigabytes of memory and returned to its normal footprint of around a hundred megabytes.