The underperforming Ark Server was refactored and rewritten. Some time after the refactored version went live, we noticed that the memory footprint of the new Svr kept growing. As shown below:
My first reaction was: this is bad, there must be a memory leak. It took three or four days of hunting with various tools to find nothing.
TCP socket[%d] user check PKG not OK, but no more memory I carefully checked the logic of every caller (the service uses the SPP tasklet framework). The receive buffer is a local member of Msg, and delete is called whenever the Msg is destructed. In other words, a memory leak should be impossible.
If there is no memory leak, why does memory keep going up?
As the chart above shows, memory grew by about 1 GB in a day, which is alarming. Since the only suspects left are the memory allocation operations new and delete, the foul play must be there. "delete does not return memory to the OS" is a long story, so let's first review how memory management works in a C++ program.
Physical memory and virtual memory
Physical memory is straightforward: it is the real RAM installed in the machine; however much RAM you have, that is your physical memory. Virtual memory (the virtual address space) is a logical concept: on a 32-bit system each process has its own independent 4 GB virtual address space. From the process's point of view, it owns the entire 4 GB of memory. The layout of a process's address space is shown below:
As shown above, the top 1 GB is reserved for the kernel. Below it is the stack, which grows toward lower addresses (its size is limited by RLIMIT_STACK, 8 MB by default); then the mmap area (file-mapped memory such as dynamic libraries; the private stacks of SPP tasklets also live here); then the heap (dynamically grown memory), which grows toward higher addresses; and finally the BSS, data, and code segments.
One thing to note: all of this is virtual memory. Physical pages are allocated (by the kernel, via page faults) only when this memory is actually used.
Linux dynamic memory allocation implementation mechanism
Dynamic memory allocation and management in C and C++ are built on malloc and free, and dynamically allocated memory lives in the heap region of the virtual address space. Note that malloc and free, too, operate on virtual addresses.
malloc, the dynamic memory allocation function, obtains memory through two system calls: brk (sbrk) and mmap.
In terms of the process address-space diagram above, brk (sbrk) pushes _edata, the pointer to the highest address of the data segment (.data), toward higher addresses, while mmap finds a free region in the process's virtual address space (the area between the heap and the stack, called the file-mapping area). The two implementations differ roughly as follows:
1. brk (sbrk) has lower overhead; mmap has higher overhead.
2. mmap produces no memory fragmentation (allocations are page-aligned, and whole pages are mapped and released); brk (sbrk) can fragment memory (holes, i.e. fragments, can be left behind when new and delete happen in different orders).
Whether memory is allocated via brk (sbrk) or mmap, what is allocated is virtual address space. Only when the allocated region is accessed for the first time does a page fault occur, at which point the operating system allocates physical memory and establishes the mapping between virtual memory and physical memory.
free (delete), the dynamic memory release path: if the memory was allocated via brk (sbrk), the heap can be shrunk by calling brk (sbrk) with a negative increment; if it was allocated via mmap, munmap returns it. Either way, the process's virtual address space shrinks immediately and the unused physical memory is returned to the operating system.
brk (sbrk) and mmap are system calls. If a program that frequently grows and shrinks its memory issued them directly every time, memory accounting would be precise, but the resulting performance cost would be significant. So most runtime libraries today (glibc and the like) wrap memory management in an extra layer to avoid a system call on every allocation; the allocation algorithm is thus left to the runtime's design.
The standard C library provides the malloc/free functions for allocating and releasing memory; they are implemented on top of the brk, mmap, and munmap system calls.
How do you check how many page faults a process has taken?
Run ps -o majflt,minflt -C program.
majflt stands for major faults, minflt for minor faults; the two values count the page faults the process has taken since it started.
What happens when a page fault occurs?
When a process takes a page fault, it traps into kernel mode and the kernel performs the following steps:
1. Check that the faulting virtual address is legal.
2. Find/allocate a physical page.
3. Fill the physical page (read it from disk, zero it, or do nothing).
4. Establish the mapping (virtual address to physical address).
5. Re-execute the instruction that caused the page fault.
If the third step reads from disk, the fault counts as majflt; otherwise it is minflt.
Run cat /proc/$pid/smaps to check physical memory usage; it records the process's physical page usage per mapping, e.g. Private_Dirty and Private_Clean.
The mmap system call: reading and writing an mmap-mapped region is equivalent to reading and writing the mapped file; the intent is to manipulate a file as if it were memory. Compared with read and write, it saves memory copies: read/write on a disk file copies data from the kernel buffer to the application buffer (read) and from the application buffer back to the kernel buffer (write), whereas mmap maps the kernel's page cache directly into the process, avoiding that copy. However, when modified data is synchronized from the mmap region back to the disk file depends on the system's page-management policy, which by default writes it out lazily; msync is provided to force synchronization to disk.
Glibc memory allocation algorithm
glibc's memory allocator is ptmalloc, which is based on dlmalloc. For dlmalloc details, see "A Memory Allocator" or my earlier article on the glibc memory allocator. Here I will focus on the strategy for returning memory and not expand much on the rest.
In short, to avoid frequent system calls, glibc's allocator internally maintains a memory pool for reuse, known as the free lists or bins, as shown below:
All memory freed by calls to delete is first hung on the free lists (bins) rather than being immediately returned to the operating system via brk (sbrk). The allocator then coalesces memory (an optional step in which adjacent free chunks are merged into larger free chunks) and checks whether the malloc_trim threshold has been reached; if so, malloc_trim is invoked to return part of the free memory to the operating system.
In glibc, the default malloc_trim threshold is 128K. That is, only when the largest reclaimable free block in the pool exceeds 128K does the trim happen and memory go back to the OS. When the reclaimable memory is <= 128K, it stays in the pool even though the program has already deleted it, which is why the process's memory footprint does not drop after a delete call.
In addition, some of glibc's default settings are:
DEFAULT_MXFAST: 64 (32-bit), 128 (64-bit) — maximum block size served by the fast bins (free lists)
DEFAULT_TRIM_THRESHOLD: 128 * 1024 — malloc_trim threshold, 128K
DEFAULT_TOP_PAD: 0
DEFAULT_MMAP_THRESHOLD: 128 * 1024 — threshold above which mmap is used for allocation, 128K
DEFAULT_MMAP_MAX: 65536 — maximum number of mmap'd regions
These parameters can be adjusted with mallopt.
Calling malloc_trim(0) returns free memory to the operating system immediately.
Many early heap-overflow attacks were based on fastbins; if you are interested, Google the keyword fastbin.
Testing:
1. Loop new to allocate 2048 blocks of 64K, write dirty data into them, then loop delete to free them. top shows the process still holding 131 MB; nothing is released. —- brk is used in this case
2. Loop new to allocate 2048 blocks of 128K, write dirty data into them, then loop delete to free them. top shows the process using 2960 KB; the memory is fully released. —- mmap is used in this case
3. Set M_MMAP_THRESHOLD to 256K, loop new to allocate 2048 blocks of 128K, loop delete to free them, then call malloc_trim(0). top shows the process using 2348 KB; the memory is fully released. —- brk is used in this case
Memory usage before delete (64K blocks):
Memory usage after delete (64K blocks):
Memory usage before delete (128K blocks):
Memory usage after delete (128K blocks):
The test code is as follows:
#include <iostream>
#include <vector>
#include <cstdlib>
#include <unistd.h>
#include <malloc.h>   // mallopt, malloc_trim

using namespace std;

int main(int argc, char *argv[])
{
    mallopt(M_MMAP_THRESHOLD, 256 * 1024);
    //mallopt(M_TRIM_THRESHOLD, 64 * 1024);

    // Block size is passed on the command line (e.g. 65536 or 131072).
    int MEMORY_SIZE = atoi(argv[1]);   // original used hydra::CTrans::STOI
    vector<char *> Array;
    for (int j = 0; j < 2048; j++) {
        char *Buff = new char[MEMORY_SIZE];
        for (int i = 0; i < MEMORY_SIZE; i++)   // write dirty data to fault pages in
            Buff[i] = i;
        Array.push_back(Buff);
    }
    sleep(10);
    for (int j = 0; j < 2048; j++)    // original had mismatched bounds (2064/2065)
        delete [] Array[j];
    cout << "Delete All" << endl;
    //sleep(10);
    //malloc_trim(0);
    //cout << "strim" << endl;
    while (1) sleep(10);
}
An example to illustrate the principle of memory allocation
In this example, when malloc requests less than 128K, brk is used: _edata is pushed toward higher addresses (only virtual space is allocated, with no physical memory behind it yet, hence no initialization; the first read or write triggers a page fault, and only then does the kernel allocate the corresponding physical memory and establish the virtual-to-physical mapping), as in the figures below:
1. When the process starts, the initial layout of its (virtual) memory space is shown in Figure 1.
The mmap memory-mapped files (e.g. libc-2.2.93.so and other data files) sit between the heap and the stack; they are omitted here for simplicity.
The _edata pointer (defined in glibc) points to the highest address of the data segment.
2. After the process calls A = malloc(30K), the memory space is shown in Figure 2:
The malloc call uses the brk system call, pushing the _edata pointer 30K toward higher addresses to complete the virtual memory allocation.
You may ask: does simply moving _edata up by 30K really complete the allocation?
In fact, _edata+30K only completes the virtual address allocation; no physical pages back block A yet. The first time the process reads or writes block A, a page fault occurs, and only then does the kernel allocate the physical pages corresponding to A. In other words, if A is allocated with malloc and never accessed, the physical pages for A are never allocated.
3. After the process calls B=malloc(40K), the memory space is shown in Figure 3.
When malloc requests more than 128K, mmap is used: a chunk of free memory is allocated between the heap and the stack (as an independent mapping, initialized to zero), as shown in the figure below:
4. After the process calls C=malloc(200K), the memory space is shown in Figure 4:
By default, if the requested size exceeds 128K (adjustable via the M_MMAP_THRESHOLD option), malloc allocates a chunk of virtual memory between the heap and the stack with the mmap system call instead of pushing the _edata pointer.
The main reason is that brk-allocated memory can only be released after the higher-address memory above it has been freed (for example, A cannot be returned before B is freed, which is how fragmentation arises; see below), whereas mmap-allocated memory can be released individually.
Of course, there are other pros and cons as well; if you are interested, read the malloc code in glibc.
5. After the process calls D=malloc(100K), the memory space is shown in Figure 5;
6. After the process calls free(C), C's virtual memory is released together with its physical memory.
7. After the process calls free(B), as shown in Figure 7:
Neither B's virtual memory nor its physical memory is freed, because there is only one _edata pointer: if it were moved back down, what would happen to D? Of course, block B becomes reusable; if another 40K request arrives at this point, malloc may well return block B.
8. After the process calls free(D), as shown in Figure 8:
B and D are connected to form a 140K block of free memory.
9. By default, when the free memory at the top of the heap exceeds 128K (adjustable via M_TRIM_THRESHOLD), a trim operation is performed. During the previous free, the allocator noticed that the highest-address free memory exceeded 128K, so it trimmed the heap, as shown in Figure 9.
Conclusion: put simply, the root cause of the growing memory curve at the start of this article is that glibc builds its own memory pool out of memory obtained from the operating system. Because the process handles a large volume of requests and calls new and delete frequently, for a while it keeps acquiring memory from the OS to satisfy new calls without returning it. Eventually a critical point is reached at which the memory the process takes from the OS and the memory it returns come into rough balance. Until that dynamic equilibrium is established, memory keeps growing.
According to this theory, the machine's memory should rise and then level off. Here is the machine's memory trend after a few more days:
As you can see, once system memory grew to about 3.7 GB, the machine's memory entered a dynamically balanced phase and stopped growing noticeably. This confirms our inference.
Takeaway
When you see memory growing as described at the beginning of this article, do not jump to the conclusion that there is a memory leak. Wait and observe; it may well be the pooling behavior analyzed above.
Article source: club.perfma.com/article/184…