Current problems encountered

The logging framework previously used in the company was a log module built on the Java-layer IO system, with two main implementation modes. The first records user actions at specific business nodes through the corresponding output streams, which results in frequent IO operations. The second caches logs in memory first; if the user stays in the app for a long time, this consumes a lot of memory, and the logs are written (flushed) to the file system only at an appropriate moment. If the user force-kills the process before the flush, the contents of memory are lost.

Writing logs to files in real time guarantees log integrity, but writing a file is an IO operation that involves switching between user mode and kernel mode. This cost cannot be avoided by moving the work to another thread; in other words, writing logs in real time is still relatively time-consuming even on a new thread.

Therefore, to optimize this, we first need to understand how file reading and writing work.


At a macro level, the Linux operating system is divided into user mode and kernel mode (or user space and kernel space). The kernel is essentially software: it controls the computer's hardware resources and provides the environment in which upper-layer applications run. User mode is where those applications live. To execute, an application must rely on resources provided by the kernel, including CPU, storage, and I/O resources. For upper-layer applications to access these resources, the kernel must provide an interface to them: system calls.

To be more concrete, the kernel controls hardware resources downward and manages operating system resources internally, including process scheduling and management, memory management, file system management, device drivers, and network resources. Upward, it provides system call interfaces to application programs. Seen as a whole, the operating system is therefore divided into two layers: user mode and kernel mode.

Many programs start in user mode, but during execution some operations need kernel privileges, which requires switching from user mode to kernel mode. For example, the memory allocation function malloc() in the C library grows the heap with the sbrk()/brk() system call; when malloc() calls sbrk(), a switch from user mode to kernel mode takes place. Similarly, printf() eventually calls the write() system call to output a string, and so on.
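As a rough illustration (not from the original text), the following C sketch shows where these mode switches come from: printf() and malloc() are user-space library functions, but both end up in system calls that cross into the kernel. Which call malloc() actually makes (brk()/sbrk() or mmap()) depends on the allocator and the request size.

/* Sketch: library calls in user space eventually trigger system calls,
 * and each system call is a user-mode -> kernel-mode switch. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    /* printf() buffers in user space and eventually calls write(). */
    printf("via printf -> write()\n");

    /* Calling write() directly: each call crosses into the kernel. */
    const char msg[] = "via write() directly\n";
    write(STDOUT_FILENO, msg, sizeof(msg) - 1);

    /* malloc() grows the heap with brk()/sbrk() (or uses mmap() for
     * large blocks), which again requires entering the kernel. */
    void *p = malloc(1 << 20);
    free(p);
    return 0;
}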

Under what circumstances does the switch from user mode to kernel mode occur? There are generally three situations:

1) System calls: the reason is analyzed above.

2) Exceptions: while the CPU is executing a program in user mode, an unexpected exception may occur (a page fault is the typical example). Handling it switches execution from the current user-mode process to the kernel-mode handler for that exception.

3) Peripheral interrupts: when a peripheral device completes a request, it sends an interrupt signal to the CPU. The CPU then suspends the instruction it was about to execute and runs the handler corresponding to that interrupt. If the interrupted instruction was running in user mode, this naturally causes a switch from user mode to kernel mode.

In general, then, we want to reduce the number of file reads and writes, much as we try to reduce thread context switches. The question is how to cut down on file I/O while still preventing the data held in memory from being lost when power is cut or the process is killed.

Case: WeChat Xlog

The Xlog module in WeChat's open-source Mars framework is implemented on top of mmap. It uses mmap file-mapped memory as the log cache, preserving log integrity as much as possible without sacrificing performance. Logs are first written to the mmap file-mapped memory; thanks to the properties of mmap, even if the user force-kills the process, the logged data is not lost and is written back to the log file during the next initialization.

mmap

mmap is a method of memory-mapping files. A file (or other object) is mapped into the address space of a process, establishing a correspondence between the file's disk address and a segment of virtual addresses in the process's virtual address space. Once the mapping is established, the process can read and write that memory through ordinary pointers, and the system automatically writes dirty pages back to the corresponding file on disk; that is, file operations are completed without calling system calls such as read() and write(). Conversely, modifications made by kernel space to this area are directly visible to user space, which also allows file sharing between different processes.

The implementation process of mmap memory mapping can be divided into three stages:

(1) The process starts the mapping and creates a virtual mapping area for it in the process's virtual address space. The user-space entry point is the mmap function:

void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);

(2) The kernel-space system call mmap (different from the user-space function) is invoked to establish the one-to-one mapping between the file's physical (disk) address and the process's virtual address.

(3) The process accesses the mapped area for the first time, which triggers a page fault and causes the file contents to be copied from disk into physical memory (main memory).

Note: the first two stages only create the virtual area and complete the address mapping; no file data is copied into main memory. The file is actually read only when the process issues its first read or write.

When the process reads or writes a mapped address in its virtual address space, it looks up the page table and finds that the address is not backed by a physical page. Because only the address mapping has been established so far and the data has not yet been copied from disk into memory, a page fault occurs.

Note: dirty pages are not written back to the file immediately; there is a delay. You can call msync() to force synchronization so that the written content is saved to the file right away.
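To make this concrete, here is a minimal C sketch of the flow described above: map a file, write to it through the returned pointer (no write() call), and use msync() to force the dirty page back to disk. The file name and sizes are illustrative.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    int fd = open("demo.log", O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Make sure the file covers at least one page before mapping it. */
    if (ftruncate(fd, 4096) < 0) { perror("ftruncate"); return 1; }

    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* Writing through the pointer touches the mapped memory directly;
     * no write() system call is issued here. */
    const char line[] = "hello from mmap\n";
    memcpy(p, line, sizeof(line) - 1);

    /* The dirty page would be written back eventually; msync() forces
     * the synchronization so the data reaches the file immediately. */
    if (msync(p, 4096, MS_SYNC) < 0) perror("msync");

    munmap(p, 4096);
    close(fd);
    return 0;
}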

The difference between mmap and regular file operations

A regular file read (calling a function such as read/fread) proceeds as follows:

1. The process initiates a file read request.

2. The kernel searches the process's file descriptor table to locate the file in the kernel's open-file set and find its inode.

3. Through the inode's address_space, the kernel checks whether the requested file page is already cached in the page cache. If it is, the contents of the page are returned directly.

4. If it is not, the kernel locates the file's disk address through the inode and copies the data from disk into the page cache. The read is then resumed and the data is copied from the page cache to the user process.

In summary, regular file operations go through the page cache to improve read/write efficiency and protect the disk. As a result, reading a file first copies the file pages from disk into the page cache. Because the page cache lives in kernel space and cannot be addressed directly by the user process, the pages then have to be copied again into the corresponding user-space memory. Only after these two data copies can the process obtain the file contents. Writes work the same way: the kernel cannot directly use the user buffer to be written, so the data must first be copied into kernel-space memory and then written back to disk (delayed write-back), which again requires two data copies.

When mmap operates on a file, a new virtual memory region is created and the mapping between the file's disk address and that region is established, without copying any file data. Later, when the data is accessed and not found in memory, a page fault is triggered, and through the established mapping the data is copied from disk directly into the user space of the process, requiring only a single copy.

In summary, a regular file operation requires two data copies: from disk to the page cache, and from the page cache to user memory. mmap requires only one: from disk into the user-visible mapping in memory. Put simply, the key point of mmap is that user space and kernel space share the data directly, without the extra copy between the two spaces; this is why mmap is more efficient.
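The sketch below contrasts the two paths side by side; the file name is made up and error handling is kept minimal. read() asks the kernel to copy data into a user buffer, while mmap() lets the process address the file contents directly through the mapping.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("data.bin", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);

    /* Path 1: read() - the kernel copies the data into our buffer. */
    char buf[4096];
    ssize_t n = read(fd, buf, sizeof(buf));
    printf("read() copied %zd bytes into a user buffer\n", n);

    /* Path 2: mmap() - we address the file contents directly. */
    char *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }
    printf("first byte via mmap: 0x%02x\n", (unsigned char)p[0]);

    munmap(p, (size_t)st.st_size);
    close(fd);
    return 0;
}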

Summary of mmap advantages

1. File reads skip the extra copy through the page cache, reducing the number of data copies; memory reads and writes replace I/O reads and writes, improving file access efficiency.

2. Efficient interaction between user space and kernel space. Modifications made in either space are reflected directly in the mapped area and can therefore be picked up by the other side in time.

3. Provides a way for processes to share memory and communicate with each other. Both parent-child and unrelated processes can map their own user space to the same file, or map anonymously to the same area, so interprocess communication and sharing can be achieved by operating on the mapped region (see the sketch after this list).

For example, if process A and process B both map region C, then when A reads C for the first time, a page fault copies the file page from disk into memory. When B later accesses the same page, a page fault still occurs, but the data is already in memory and can be used directly rather than copied from disk again.

4. Can be used for efficient large-scale data transfer. Insufficient memory is a common constraint in big-data workloads; the usual workaround is to use disk space to supplement memory, but that in turn causes a large number of file I/O operations and hurts efficiency. mmap mapping is a good fit for this problem; in other words, mmap is useful whenever disk space has to stand in for memory.
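As a sketch of advantage 3 (referenced above), the following example shares an anonymous MAP_SHARED region between a parent and a child created with fork(); a file-backed shared mapping between unrelated processes works along the same lines. The region size is illustrative.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    /* One shared page, visible to both parent and child after fork(). */
    char *shared = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED) { perror("mmap"); return 1; }

    pid_t pid = fork();
    if (pid == 0) {
        /* Child writes into the shared region. */
        strcpy(shared, "written by the child");
        _exit(0);
    }
    waitpid(pid, NULL, 0);

    /* Parent sees the child's change without any explicit copy. */
    printf("parent reads: %s\n", shared);
    munmap(shared, 4096);
    return 0;
}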

Details of mmap usage

1. A key point when using mmap is that the size of the mapped area must be an integer multiple of the physical page size (page_size, typically 4096 bytes). The reason is that the minimum granularity of memory is the page, and the mapping from process virtual address space to memory is also done page by page; to match memory operations, mmap's mapping from disk to the virtual address space must be page-aligned as well.

2. The kernel tracks the size of the underlying object (the file) being mapped, and the process may legally access the bytes that lie within both the current file size and the memory-mapped area. That is, if the file keeps growing, the process may legally access any data inside the mapping range, regardless of how large the file was when the mapping was created. See "Case 3" for details.

3. Once the mapping is created, it remains valid even if the file is closed, because what is mapped is the disk address, not the file handle itself. At the same time, the address space usable for interprocess communication is not strictly limited by the size of the mapped file, because the mapping is done page by page.

With that in mind, let's look at what happens when the size is not an integer multiple of the page size:

Case 1: A file is 5000 bytes in size. mmap maps 5000 bytes into virtual memory starting from the beginning of the file.

Analysis: a physical page is 4096 bytes. Although the mapped file is only 5000 bytes, the process's virtual address area must cover whole pages, so after mmap runs, 8192 bytes are actually mapped into the virtual memory area, and bytes 5000 to 8191 are zero-filled.

At this time:

(1) Reads and writes of the first 5000 bytes (0 to 4999) operate on the file contents normally.

(2) Reads of bytes 5000 to 8191 all return 0. Writes to bytes 5000 to 8191 do not cause an error, but the written content is not carried back into the original file.

(3) Reads or writes beyond offset 8191 (outside the mapped area) trigger a SIGSEGV error.
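A small sketch of case 1, assuming 4096-byte pages and an illustrative file name: it maps a 5000-byte file with a length of 5000 and touches offsets in each of the ranges described above.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    int fd = open("five_k.bin", O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }
    if (ftruncate(fd, 5000) < 0) { perror("ftruncate"); return 1; }

    /* Requested length 5000, but the mapping covers two whole pages. */
    char *p = mmap(NULL, 5000, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    p[0]    = 'A';   /* (1) inside the file: reflected in the file      */
    p[6000] = 'B';   /* (2) bytes 5000-8191: no error, not written back */
    /* p[9000] = 'C';   (3) beyond 8191: would raise SIGSEGV            */

    msync(p, 5000, MS_SYNC);
    munmap(p, 5000);
    close(fd);
    return 0;
}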

Case 2: A file is 5000 bytes in size. mmap maps 15000 bytes into virtual memory starting from the beginning of the file; that is, the mapping size exceeds the size of the original file.

Analysis: since the file is 5000 bytes, as in case 1 it corresponds to two physical pages. Both of those pages can legally be read and written, although the portion beyond offset 4999 is not reflected in the original file. The program asked to map 15000 bytes, but the file occupies only two physical pages, so bytes 8192 to 14999 cannot be read or written; accessing them raises an exception.

At this time:

(1) The process can read and write the first 5000 mapped bytes (0 to 4999) normally; changes made by writes are reflected in the original file after some time.

(2) For bytes 5000 to 8191, the process can read and write without error, but the content is 0 before writing, and writes are not reflected in the file.

(3) For bytes 8192 to 14999, the process cannot read or write; doing so raises a SIGBUS error.

(4) For bytes beyond 15000, the process cannot read or write either; doing so causes a SIGSEGV error.

Case 3: A file with an initial size of 0 is mapped with mmap over 1000*4K bytes, i.e. about 4 MB of space, or 1000 physical pages. mmap returns the pointer ptr.

Analysis: if the file is read or written right after mapping, a SIGBUS error is raised, because the file size is 0 and there is no valid physical page corresponding to it, as in case 2.

However, if the file size is increased before accessing memory through ptr, then operations on ptr within the new file size are valid. For example, if the file is expanded by 4096 bytes, ptr can operate on the space from ptr to [(char *)ptr + 4095]. As long as the file grows within the 1000 physical pages (the mapping range), ptr can be used to access a correspondingly sized area.

In this way, it is convenient to expand the file at any time and write to it at any time, without wasting space.
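Finally, a sketch of the case 3 pattern (a pattern an mmap-based log writer can follow): map a large range over an initially empty file, grow the file with ftruncate() before writing, and only touch offsets inside the current file size. The file name and sizes are illustrative.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define PAGE     4096
#define MAX_LEN  (1000 * PAGE)   /* ~4 MB mapping, 1000 physical pages */

int main(void) {
    int fd = open("grow.log", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* File size is 0 here; touching ptr now would raise SIGBUS. */
    char *ptr = mmap(NULL, MAX_LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (ptr == MAP_FAILED) { perror("mmap"); return 1; }

    size_t used = 0;
    const char line[] = "one log line\n";

    /* Grow the file first, then write through the pointer. */
    if (ftruncate(fd, used + PAGE) < 0) { perror("ftruncate"); return 1; }
    memcpy(ptr + used, line, sizeof(line) - 1);
    used += sizeof(line) - 1;

    munmap(ptr, MAX_LEN);
    close(fd);
    return 0;
}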