In the article on zero-copy technology, we introduced the principle of zero copy and noted that mmap is one way to implement it. This article focuses on how mmap works.

I. Traditional file reading and writing

Generally speaking, there are three steps to modify the contents of a file:

  • Read the contents of the file into memory.
  • Modify the contents of memory.
  • Write the modified memory back to the file.

The process is shown in Figure 1:

If you use code to implement the above procedure, the code would look like this:

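read(fd, buf, 1024);     // Read the contents of the file into buf
// ... modify the contents of buf ...
lseek(fd, 0, SEEK_SET);  // Rewind to the start of the file (read advanced the file offset)
write(fd, buf, 1024);    // Write the contents of buf back to the file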

As Figure 1 shows, the page cache is the intermediate layer for file reads and writes: the kernel uses the page cache to hold the data blocks of files. So when an application reads or writes a file, it is actually reading or writing the page cache, and read and write copy data between the page cache and the user-space buffer.
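The three traditional steps can be sketched as a small, self-contained function. This is only an illustration: `set_first_byte` is a hypothetical helper name, not something from the original article, and it uses `pread`/`pwrite` with an explicit offset so that the write lands on the same bytes that were read.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: overwrite the first byte of the file at `path`
 * with `c`, using the traditional read -> modify -> write sequence.
 * Returns 0 on success and -1 on failure. */
int set_first_byte(const char *path, char c)
{
    char buf[1024];

    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;

    /* Step 1: copy data from the page cache into the user-space buffer. */
    ssize_t n = pread(fd, buf, sizeof(buf), 0);
    if (n <= 0) {                /* read error or empty file */
        close(fd);
        return -1;
    }

    /* Step 2: modify the copy held in user space. */
    buf[0] = c;

    /* Step 3: copy the buffer back into the page cache at the same offset. */
    ssize_t written = pwrite(fd, buf, (size_t)n, 0);

    close(fd);
    return written == n ? 0 : -1;
}
```

Calling `set_first_byte("data.bin", 'j')` on a file that contains `hello` would leave it containing `jello`. Note that the data crosses the boundary between the page cache and the user-space buffer twice, once in each direction.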

II. Reading and writing files with mmap

The traditional file reading and writing process leaves room for one optimization: if user space could read and write the page cache directly, the copy between the page cache and the user-space buffer could be avoided.

So, is there a technology that can do that? The answer is yes: mmap.

With the mmap system call, you can map (bind) a range of user-space virtual memory addresses to a file, and then read and write the file by reading and writing that memory. The principle is shown in Figure 2:

As mentioned earlier, file reads and writes go through the page cache, so mmap maps the file's page cache rather than the file's blocks on disk. Because of this, there is a synchronization question: when is the page cache written back to disk?

The Linux kernel does not guarantee when the page cache behind an mmap mapping is written back to disk, so a program that needs the data to be on disk at a particular moment must request it explicitly. There are four points at which mmap-mapped memory can be synchronized to disk:

  • Calling the msync function to synchronize the data explicitly (active).
  • Calling the munmap function to unmap the file (active).
  • Process exit (passive).
  • System shutdown (passive).
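Of the four points above, `msync` with the `MS_SYNC` flag is the one that returns only after write-back has completed. The sketch below shows one way to use it; `write_and_sync` is a hypothetical helper introduced for illustration, and it assumes the file already exists and is at least `len` bytes long.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: copy `len` bytes from `data` to the start of the
 * file at `path` through a shared mapping, then flush them with msync.
 * The file must already be at least `len` bytes long.
 * Returns 0 on success and -1 on failure. */
int write_and_sync(const char *path, const char *data, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1 || len == 0 || (size_t)st.st_size < len) {
        close(fd);
        return -1;
    }

    char *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                   /* the mapping stays valid after close */
    if (addr == MAP_FAILED)
        return -1;

    memcpy(addr, data, len);     /* modifies the page cache directly */

    /* MS_SYNC blocks until the dirty pages have been written back. */
    int rc = msync(addr, len, MS_SYNC);

    munmap(addr, len);
    return rc;
}
```

How often to call `msync` is a trade-off: calling it after every change is the safest choice but gives up much of the performance benefit, so it is usually called at points where losing the preceding changes would be unacceptable.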

III. How to use mmap

Here’s how to use mmap. The prototype of the mmap function is as follows:

void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);

The parameters of the mmap function are as follows:

  • addr: the virtual memory address at which to create the mapping. It can be set to NULL to let the Linux kernel choose a suitable virtual memory address automatically.

  • length: the length of the mapping, in bytes.

  • prot: the protection mode of the mapped memory. Possible values are as follows:

    • PROT_EXEC: Can be executed.
    • PROT_READ: Can be read.
    • PROT_WRITE: Can be written.
    • PROT_NONE: Inaccessible.
  • flags: the mapping type. Common values include the following:

    • MAP_FIXED: Creates the mapping at exactly the virtual memory address given in addr.
    • MAP_SHARED: Shares the mapping with all other processes that map this file (which enables shared memory); changes are carried through to the file.
    • MAP_PRIVATE: Creates a private copy-on-write mapping; changes are not written to the file.
    • MAP_LOCKED: Locks the pages of the mapped area so that they are not swapped out of memory.
  • fd: the file descriptor of the file to map.

  • offset: the offset in the file at which the mapping starts. It must be a multiple of the page size.
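The difference between MAP_SHARED and MAP_PRIVATE can be observed with a short experiment. The sketch below is illustrative only: `store_and_reread` is a hypothetical helper that stores one byte through a mapping and then reads the same byte back from the file with `pread`, and it assumes the file exists and is not empty.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map the first byte of the file at `path` with the
 * given `flags` (MAP_SHARED or MAP_PRIVATE), store `c` through the
 * mapping, then read the byte back from the file with pread.
 * The file must not be empty.
 * Returns the byte now visible in the file, or -1 on failure. */
int store_and_reread(const char *path, int flags, char c)
{
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;

    char *addr = mmap(NULL, 1, PROT_READ | PROT_WRITE, flags, fd, 0);
    if (addr == MAP_FAILED) {
        close(fd);
        return -1;
    }

    addr[0] = c;                 /* write through the mapping */
    munmap(addr, 1);

    char in_file;
    ssize_t n = pread(fd, &in_file, 1, 0);   /* what the file itself holds */
    close(fd);
    return n == 1 ? (unsigned char)in_file : -1;
}
```

On a file whose first byte is `A`, calling the helper with MAP_PRIVATE and `'B'` returns `'A'`, because the store went to a private copy of the page. Calling it with MAP_SHARED and `'C'` returns `'C'`, because the store went to the file's page cache.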

Having introduced the prototype of the mmap function, we now use a simple example to show how to use mmap:

int fd = open(filepath, O_RDWR, 0644);                            // Open the file
void *addr = mmap(NULL, 8192, PROT_WRITE, MAP_SHARED, fd, 4096);  // Map the file

In the above example, we first open the file in read-write mode using the open function, and then map the file using the mmap function as follows:

  • addr is set to NULL, so the operating system automatically selects a suitable virtual memory address for the mapping.
  • length is set to 8192, so the mapped region is two memory pages long (assuming a 4 KB page size).
  • prot is set to PROT_WRITE, so the mapped region is writable. On common architectures such as x86 a writable mapping is also readable, which makes the region read-write; portable code usually requests PROT_READ | PROT_WRITE explicitly.
  • flags is set to MAP_SHARED, which creates a shared mapping.
  • fd is set to the file descriptor returned by open.
  • offset is set to 4096, so the mapping starts at byte 4096 of the file.

The mmap function returns the address of the mapped memory (or MAP_FAILED on error), and we can read and write the file through this address. Figure 3 shows how the above example is represented in the kernel:
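A fuller version of the example, with error handling, might look like the sketch below. `patch_second_page` is a hypothetical helper, not part of the original article. Instead of hard-coding 4096 and 8192 it derives the offset and length from the page size reported by `sysconf`, which gives the same values on systems with 4 KB pages, and it requests PROT_READ | PROT_WRITE for portability.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: copy the string `msg` into the file at `path`,
 * starting one page into the file, through a two-page shared mapping.
 * The file must be at least three pages long, because touching a mapped
 * page that lies wholly beyond end-of-file raises SIGBUS.
 * Returns 0 on success and -1 on failure. */
int patch_second_page(const char *path, const char *msg)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    size_t length = 2 * (size_t)page;   /* 8192 with 4 KB pages */
    off_t offset = (off_t)page;         /* 4096 with 4 KB pages */
    size_t len = strlen(msg);
    if (len > length)
        return -1;

    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size < offset + (off_t)length) {
        close(fd);
        return -1;
    }

    char *addr = mmap(NULL, length, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, offset);
    close(fd);                   /* the mapping outlives the descriptor */
    if (addr == MAP_FAILED)
        return -1;

    memcpy(addr, msg, len);      /* addr[0] is byte `offset` of the file */

    return munmap(addr, length);
}
```

After `patch_second_page("data.bin", "hi")` succeeds, reading two bytes from the file at an offset of one page yields `hi`, with no `write` call involved.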

IV. Summary

This article introduced the principle and usage of mmap. mmap can reduce the number of memory copies and system calls needed to read and write a file, which improves the efficiency of file reads and writes.

Because the kernel does not guarantee when data in an mmap-mapped region reaches the disk, data can be lost in some special scenarios (such as a power outage). To avoid this, call the msync function at appropriate points to synchronize the mapped region to disk.