Ant Group | How to Troubleshoot Rust Memory Usage in a Production Environment
By ShiKaiWi/Edited by Zhang Handong
Background
Thanks to Rust's memory safety, genuine memory leaks are rare, but allocating memory sensibly is still a problem every complex application faces. The same code may show very different memory usage under different workloads, so excessive memory usage, or memory that grows steadily and is never released, is still quite likely to occur.
In this article, I would like to share some of the memory problems we encountered in practice. The article gives a simple classification of these problems and describes the troubleshooting methods we use in our production environment.
Memory allocator
First, in production we tend not to use the default memory allocator (malloc), but jemalloc, which offers better multi-core performance and less memory fragmentation (see [1] for details). In the Rust ecosystem there are many excellent libraries wrapping jemalloc; rather than debating which wrapper is best, we are more concerned with using the analysis capabilities jemalloc itself provides to help diagnose memory problems.
Reading the jemalloc documentation, you will find that it provides sampling-based memory profiling, and that mallctl can be used to set the prof.active and prof.dump options, giving us a dynamically controllable profiling switch and on-demand dumps of profile data.
Memory grows rapidly until OOM
This usually happens when the same code faces a new business scenario: a particular input (often a large volume of data) causes the program's memory to grow rapidly.
Fortunately, with memory profiling, rapid memory growth is an easy situation to handle: you can turn profiling on while the growth is happening, dump the profile after a while, and visualize the result with a tool. It then becomes clear which call paths are allocating memory for which structures.
Such cases come in two kinds, reproducible and difficult to reproduce, and they are handled differently. The following describes the approach for each.
Reproducible cases
Reproducible scenarios are actually the easiest to solve, because we can dynamically enable profiling while reproducing the problem and collect a large amount of memory allocation information in a short time.
Here is a full demo showing how dynamic memory profiling can be done in a Rust application.
It uses three Rust libraries, jemalloc-sys, jemallocator, and jemalloc-ctl, to profile memory. Their main roles are:
- jemalloc-sys: wraps jemalloc itself.
- jemallocator: implements Rust's GlobalAlloc trait, replacing the default memory allocator.
- jemalloc-ctl: wraps mallctl, which can be used for tuning, configuring the allocator dynamically, and obtaining allocator statistics.
Here are the demo project dependencies:
[dependencies]
jemallocator = "0.3.2"
jemalloc-ctl = "0.3.2"

[dependencies.jemalloc-sys]
version = "0.3.2"
features = ["stats", "profiling", "unprefixed_malloc_on_supported_platforms"]

[profile.release]
debug = true
The key point is that several features of jemalloc-sys must be enabled, otherwise the profiling steps below will fail. Note also that the demo needs to run on Linux.
The src/main.rs of the demo is as follows:
use jemallocator;
use jemalloc_ctl::{AsName, Access};
use std::collections::HashMap;

#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

// mallctl option names must be NUL-terminated byte strings.
const PROF_ACTIVE: &'static [u8] = b"prof.active\0";
const PROF_DUMP: &'static [u8] = b"prof.dump\0";
const PROFILE_OUTPUT: &'static [u8] = b"profile.out\0";

// Toggle the profiling switch via mallctl's prof.active option.
fn set_prof_active(active: bool) {
    let name = PROF_ACTIVE.name();
    name.write(active).expect("Should succeed to set prof");
}

// Dump the profile to PROFILE_OUTPUT via mallctl's prof.dump option.
fn dump_profile() {
    let name = PROF_DUMP.name();
    name.write(PROFILE_OUTPUT).expect("Should succeed to dump profile")
}

fn main() {
    set_prof_active(true);

    // Allocate something observable: 100 pre-sized hash maps.
    let mut buffers: Vec<HashMap<i32, i32>> = Vec::new();
    for _ in 0..100 {
        buffers.push(HashMap::with_capacity(1024));
    }

    set_prof_active(false);
    dump_profile();
}
The demo is already a very reduced test case, and only two points need explanation: set_prof_active and dump_profile both go through jemalloc's mallctl interface. Writing a boolean value to prof.active toggles profiling on and off, and writing a file path to prof.dump dumps a memory profile to that path.
After compiling, you cannot just run the program directly; you first need to set an environment variable to enable the memory profiling feature:
export MALLOC_CONF=prof:true
Running the program now produces a memory profile file whose name is hard-coded in the demo as profile.out. It is a text file, but not one suited to direct reading (it is raw dump data, not a visualization).
With a tool such as jeprof, it can be converted directly into a visual graph:
jeprof --show_bytes --pdf <path_to_binary> ./profile.out > ./profile.pdf
With the visualization, we can see clearly where all the memory comes from.
At this point the whole demo workflow is complete. What remains for production use is a few unremarkable engineering tasks; here is what we did:
- Wrap the switch in an HTTP service, so that profiling can be triggered with a curl command and the result returned in the HTTP response (a sketch follows this list).
- Allow the profiling duration to be configured.
- Handle concurrent triggering of profiles.
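Below is a minimal std-only sketch of such an HTTP trigger, built on the set_prof_active and dump_profile helpers from the demo above. The port, the fixed 30-second duration, and the single-threaded accept loop are illustrative choices of ours, not the original service; MALLOC_CONF=prof:true must still be set at startup as shown earlier.

// Sketch: trigger profiling over HTTP; assumes set_prof_active() and
// dump_profile() from the demo above are in scope.
use std::io::{Read, Write};
use std::net::TcpListener;
use std::time::Duration;

fn serve_profile() -> std::io::Result<()> {
    let listener = TcpListener::bind("0.0.0.0:9095")?; // port is arbitrary
    for stream in listener.incoming() {
        let mut stream = stream?;
        let mut buf = [0u8; 512];
        let _ = stream.read(&mut buf)?; // request contents ignored in this sketch

        set_prof_active(true);
        std::thread::sleep(Duration::from_secs(30)); // profiling duration
        set_prof_active(false);
        dump_profile();

        // Return the dump file in the response body.
        let body = std::fs::read("profile.out").unwrap_or_default();
        let header = format!("HTTP/1.1 200 OK\r\nContent-Length: {}\r\n\r\n", body.len());
        stream.write_all(header.as_bytes())?;
        stream.write_all(&body)?;
    }
    Ok(())
}

A client would then fetch a profile with something like curl -o profile.out http://<host>:9095/. Handling requests one at a time on the accept loop also serializes concurrent triggers, which is the behavior we want for a profiling switch.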
One benefit of this solution not mentioned above is that it is dynamic: enabling memory profiling inevitably has some performance impact (although in our experience not a large one), and we naturally want to avoid that cost when nothing is wrong, so a dynamic switch is very practical.
Difficult to reproduce
In truth, a problem that can be reproduced stably is hardly a problem at all. In production, the most troublesome problems are the ones that are hard to reproduce; they are like time bombs.
Generally, the main idea for such problems is to prepare in advance and preserve the scene: when the problem strikes, the service is already in trouble, but we save the problematic state for later analysis. For excessive memory usage, one very natural idea is to generate a coredump on OOM.
However, we did not adopt the coredump approach in production. The main reason is that server nodes in production usually have a lot of memory, so the coredump is also very large; generating it takes long enough to delay the immediate restart, and such a file is inconvenient to analyze, transfer, and store.
Here is the solution we use in production. It is actually a very simple way to get memory profile output indirectly, using functionality jemalloc already provides.
When starting a long-running program that uses jemalloc, set jemalloc's parameters through an environment variable:
export MALLOC_CONF=prof:true,lg_prof_interval:30
The addition here is lg_prof_interval:30, which means a memory profile is dumped for every additional 1GB (2^30 bytes; adjust as needed, this is just an example) of allocated memory. Over time, if a sudden memory increase occurs (crossing the configured threshold), a profile file is guaranteed to be created, so when a problem occurs we can use the files' creation times to pinpoint what memory allocation was happening at that moment.
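With this option, jemalloc writes out a series of numbered dump files over time (by default named after the opt.prof_prefix setting, e.g. jeprof.<pid>.<seq>.i<n>.heap). Since jeprof is derived from pprof, two consecutive dumps can, to our knowledge, be diffed with --base so that only the allocations made between them are shown; the file names below are illustrative:

jeprof --show_bytes --pdf <path_to_binary> --base=jeprof.1234.0.i0.heap jeprof.1234.1.i1.heap > diff.pdf

This makes it straightforward to see what was allocated during the suspicious growth window rather than over the program's whole lifetime.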
Memory grows slowly and is not released
Unlike rapid memory growth, here overall memory usage stays in a steady state, yet over time memory grows slowly and steadily. With the method described above, it is hard to find out where the memory is going.
This is also one of the hardest problems we have met in production. What we care about here is no longer allocation events but the current distribution of memory; in Rust, which has no GC, inspecting the current memory distribution of a program is not an easy thing to do (especially if production must keep running unaffected).
For this situation, our practice in the production environment is as follows:
- Manually free the memory of certain structures (usually caches).
- Observe the memory usage before and after (i.e., how much was freed) to determine the memory footprint of each module.
With jemalloc's statistics feature we can obtain the current memory usage, so by repeatedly freeing a module's memory and computing the size released, we can map out the distribution of memory.
The drawback of this scheme is also obvious: the modules covered by the check must be known a priori (you cannot detect modules outside your knowledge). But the defect is acceptable, since we usually have a good sense of which places in a program might use too much memory.
The following demo project shows the idea; we apply the same approach in production.
Here are the demo project dependencies:
[dependencies]
jemallocator = "0.3.2"
jemalloc-ctl = "0.3.2"

[dependencies.jemalloc-sys]
version = "0.3.2"
features = ["stats", "profiling", "unprefixed_malloc_on_supported_platforms"]

[profile.release]
debug = true
src/main.rs:
use jemallocator;
use jemalloc_ctl::{epoch, stats};

#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

// Simulate a cache holding roughly 1MB.
fn alloc_cache() -> Vec<i8> {
    let mut v = Vec::with_capacity(1024 * 1024);
    v.push(0i8);
    v
}

fn main() {
    let cache_0 = alloc_cache();
    let cache_1 = alloc_cache();

    let e = epoch::mib().unwrap();
    let allocated_stats = stats::allocated::mib().unwrap();
    let mut heap_size = allocated_stats.read().unwrap();

    // Free cache_0 and measure how much the heap shrank.
    drop(cache_0);
    e.advance().unwrap();
    let new_heap_size = allocated_stats.read().unwrap();
    println!("cache_0 size:{}B", heap_size - new_heap_size);
    heap_size = new_heap_size;

    // Same measurement for cache_1.
    drop(cache_1);
    e.advance().unwrap();
    let new_heap_size = allocated_stats.read().unwrap();
    println!("cache_1 size:{}B", heap_size - new_heap_size);
    heap_size = new_heap_size;

    println!("current heap size:{}B", heap_size);
}
It is a bit longer than the previous demo, but the idea is very simple. Only one usage point of jemalloc-ctl needs a brief explanation: the statistics are cached, so before reading fresh numbers you must advance the epoch (the e.advance() calls above).
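In a real application it is easy to forget the epoch bump, so one would likely wrap it together with the read in a small helper. A sketch using the crate's module-level functions (the helper name is our own):

use jemalloc_ctl::{epoch, stats};

// Current bytes allocated by the application, with stats refreshed first.
fn current_allocated() -> usize {
    epoch::advance().unwrap(); // refresh jemalloc's cached statistics
    stats::allocated::read().unwrap()
}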
Here is the output from compiling and running the demo:
cache_0 size:1048576B
cache_1 size:1038336B
current heap size:80488B
As you can see, the reported size of cache_1 is not exactly 1MB. This is quite normal, and in general (not necessarily in this demo) there are two reasons:
- Other memory changes take place while the statistics are being collected.
- The stats that jemalloc provides are not necessarily exact: for better multi-core performance it cannot maintain globally synchronized counters, so it effectively trades some consistency of the statistics for performance.
However, this imprecision does not prevent us from locating excessive memory usage, because the memory being freed is usually so large that small perturbations do not affect the conclusion.
In addition, there is an even simpler method: release the cache and directly watch the machine's memory change. But be aware that freed memory is not necessarily returned to the OS immediately, and watching with your own eyes is tiring; a better approach is to integrate this kind of memory distribution check into the Rust application itself.
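As a hedged sketch of what that integration could look like (the trait and all names here are our own invention, not code from our production system): each cache-owning module implements a small trait, and an inspection routine, which could be triggered through the same HTTP service as before, drops each cache in turn and measures the difference using the statistics shown above.

use jemalloc_ctl::{epoch, stats};

// Implemented by modules whose memory (usually caches) can be released on demand.
trait MemoryConsumer {
    fn name(&self) -> &str;
    fn release_cache(&mut self);
}

// Drop each module's cache and report roughly how much heap it was holding.
fn report_memory_distribution(consumers: &mut [Box<dyn MemoryConsumer>]) {
    let read_allocated = || {
        epoch::advance().unwrap(); // refresh cached statistics
        stats::allocated::read().unwrap()
    };
    let mut before = read_allocated();
    for c in consumers.iter_mut() {
        c.release_cache();
        let after = read_allocated();
        // Approximate by design: concurrent allocations and the stats
        // imprecision described above both add noise.
        println!("{} held about {}B", c.name(), before.saturating_sub(after));
        before = after;
    }
}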
Other general schemes
Metrics
Another very effective solution, one we use all the time, is to record the size of an allocation as a metric wherever a large amount of memory is allocated, for continuous collection and observation.
The overall scheme is as follows:
- Use a Prometheus client to record allocated memory (application-layer statistics); see the sketch after this list.
- Expose the metrics interface.
- Configure Prometheus Server to pull the metrics.
- Configure Grafana with Prometheus Server as a data source for visual display.
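A minimal sketch of the application-layer recording, assuming the prometheus crate (the metric name and the cache here are illustrative; serving the /metrics endpoint over HTTP is omitted):

use prometheus::{Encoder, IntGauge, Registry, TextEncoder};

fn main() {
    let registry = Registry::new();
    // Gauge tracking the approximate bytes held by one module's cache.
    let cache_bytes = IntGauge::new("my_cache_bytes", "Bytes held by my cache").unwrap();
    registry.register(Box::new(cache_bytes.clone())).unwrap();

    // At each large allocation site we know about, record the size.
    let cache: Vec<u8> = Vec::with_capacity(64 * 1024 * 1024);
    cache_bytes.add(cache.capacity() as i64);

    // Encode in the text exposition format that Prometheus scrapes.
    let mut buf = Vec::new();
    TextEncoder::new()
        .encode(&registry.gather(), &mut buf)
        .unwrap();
    println!("{}", String::from_utf8(buf).unwrap());
}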
Memory checking tools
We have also tried other powerful tools such as heaptrack and Valgrind while troubleshooting high memory usage, but their overhead is so large that it is basically impossible to run a production service under them.
For this reason, we rarely use such tools to troubleshoot memory problems in the production environment.
Conclusion
Although Rust helps us avoid memory leaks, I believe many long-running programs are still quite likely to face high memory usage at some point. This article shared several scenarios of high memory usage in our production environment, along with the troubleshooting methods we commonly use to locate problems quickly without disturbing the running service; I hope it gives you some inspiration and help.
Of course, there are memory problems we have not yet encountered, and there may well be better and more convenient ways to locate and troubleshoot them; if you know of any, we would love to hear from you.
References
[1] Experimental Study of Memory Allocation for High-Performance Query Processing
[2] jemalloc usage documentation
[3] jemallocator
About us
We are the time-series storage team of Ant Group's Intelligent Monitoring Technology Center. We are using Rust to build a new generation of time-series database with high performance, low cost, and real-time analysis capability. You are welcome to join us or recommend candidates; please contact: [email protected]