These notes consolidate material about mmap that I recently found online, to make later review easier.

1. What is mmap

mmap is a method of memory-mapping files. A file or other object is mapped into the address space of a process, establishing a correspondence between a range of the file on disk and a range of virtual addresses in the process's virtual address space. Its prototype is:

void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);

Once the mapping is established, the process can read and write that memory through pointers, and the system automatically writes dirty pages back to the corresponding file on disk. In other words, operations on the file complete without calling system calls such as read and write.

[Figure: a file on disk mapped into a segment of the process's virtual address space]

mmap is well suited to frequent reads and writes on the same region of a file. For example, suppose a 64 MB file stores index information that must be modified often and persisted to disk. The file can be mapped into the process's virtual memory with mmap and then modified through a pointer; the operating system automatically flushes the modified pages back to disk. You can also flush to disk manually by calling msync yourself.
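The update-and-flush pattern described above can be sketched as follows. This is a minimal sketch rather than code from the sources collected here: the helper name update_and_sync and its parameters are hypothetical, and it assumes the file already exists and is at least off + n bytes long.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: overwrite n bytes at byte offset off of an existing
 * file through a shared mapping, then flush the dirty pages with msync().
 * Returns 0 on success, -1 on failure. */
int update_and_sync(const char *path, off_t off, const void *src, size_t n)
{
    struct stat sb;
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;
    /* Refuse writes that would fall outside the current file size. */
    if (fstat(fd, &sb) == -1 || off < 0 || (off_t)(off + n) > sb.st_size) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, sb.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    memcpy(p + off, src, n);        /* modify the file through the pointer */

    /* MS_SYNC blocks until the dirty pages have been written back. */
    int rc = msync(p, sb.st_size, MS_SYNC);
    if (munmap(p, sb.st_size) == -1)
        rc = -1;
    return rc;
}
```

A call such as update_and_sync("/tmp/index", 0, "abc", 3) would overwrite the first three bytes of a hypothetical index file. Without the msync call the change still reaches the file eventually; msync only controls when.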

For an analysis of how mmap works from the kernel's perspective, the following illustrated blog post is a good direct reference: Linux mmap memory mapping principle analysis – CSDN blog blog.csdn.net/yusiguyuan/…

A mapping only occupies virtual memory, so there is no need to worry about the size of the mapped file.

  1. The 4 GB address space of each process (on a 32-bit system) is only virtual memory. Every access to an address in it must be translated to a physical address.
  2. All processes share the same physical memory, and each process keeps in physical memory only the parts of its virtual address space that it currently needs.

Linux process virtual memory – fengxin's blog – CSDN blog blog.csdn.net/fengxinlinu…

2. Mmap Parameter Description

mmap maps a file or device into memory; munmap removes the mapping.

The syntax is as follows:

void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
int munmap(void *addr, size_t length);

mmap has three main uses:

  • Mapping an ordinary file into memory, typically when the file is read and written frequently. Memory accesses replace I/O calls to achieve higher performance.
  • Anonymous mappings of special files, which provide shared memory for related processes.
  • POSIX shared memory for unrelated processes (the System V shared memory operations are shmget/shmat).
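The second use, shared memory between related processes, can be sketched with an anonymous shared mapping and fork(). This is an illustrative sketch only: the helper name share_int_with_child is hypothetical, and it relies on MAP_SHARED | MAP_ANONYMOUS as provided on Linux.

```c
#define _DEFAULT_SOURCE             /* exposes MAP_ANONYMOUS on glibc */
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child stores `value` into an anonymous shared
 * mapping; the parent waits for the child and reads the value back.
 * Returns the value seen by the parent, or -1 on failure. */
int share_int_with_child(int value)
{
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    *shared = 0;

    pid_t pid = fork();
    if (pid == -1) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    if (pid == 0) {                 /* child: write into the shared page */
        *shared = value;
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    int seen = *shared;             /* parent sees the child's write */
    munmap(shared, sizeof *shared);
    return seen;
}
```

With MAP_PRIVATE in place of MAP_SHARED the parent would read back 0, because the child's write would land in its own copy-on-write page.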

Let’s look at the input parameter selection of the function:

Parameter addr:

The starting memory address for the mapping. It is usually set to NULL, which lets the system choose the address; the chosen address is returned when the mapping succeeds.

Parameter length:

The number of bytes of the file to map into memory.

Parameter prot:

The protection mode of the mapped region. It can be a combination of the following:

PROT_EXEC: the mapped region can be executed

PROT_READ: the mapped region can be read

PROT_WRITE: the mapped region can be written

PROT_NONE: the mapped region cannot be accessed

Parameter flags:

Controls various properties of the mapped region. Exactly one of MAP_SHARED or MAP_PRIVATE must be specified when mmap() is called.

MAP_FIXED: Place the mapping at exactly the address given by addr. If that address cannot be used, mmap() fails instead of choosing a different address. Use of this flag is usually discouraged.

MAP_SHARED: Writes to the mapped region are carried through to the file and are visible to other processes that map the same file.

MAP_PRIVATE: Writes to the mapped region create a private copy-on-write copy of the affected pages; changes made to the region are not written back to the file.

MAP_ANONYMOUS: Establishes an anonymous mapping. The fd parameter is ignored and no file is involved, so the mapping cannot be shared with unrelated processes (with MAP_SHARED it can still be shared with child processes created by fork).

MAP_DENYWRITE: Intended to reject other direct writes to the underlying file while it is mapped. Modern Linux kernels ignore this flag.

MAP_LOCKED: Locks the mapped region in memory, which means that the region will not be swapped out.
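The difference between MAP_SHARED and MAP_PRIVATE can be observed with a small experiment. The sketch below is illustrative only: the helper name poke_first_byte is hypothetical, and it assumes the file exists and is not empty.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map the start of `path` with the given sharing flag
 * (MAP_SHARED or MAP_PRIVATE), overwrite byte 0 with `c` through the
 * mapping, unmap, and return the first byte as now stored in the file.
 * Returns -1 on error. */
int poke_first_byte(const char *path, int share_flag, char c)
{
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;

    char *p = mmap(NULL, 1, PROT_READ | PROT_WRITE, share_flag, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    p[0] = c;                       /* write through the mapping */
    munmap(p, 1);

    /* Read the byte back through the file descriptor, not the mapping. */
    char in_file;
    ssize_t n = pread(fd, &in_file, 1, 0);
    close(fd);
    return n == 1 ? (unsigned char)in_file : -1;
}
```

For a file that starts with 'a', poke_first_byte(path, MAP_PRIVATE, 'X') returns 'a' because the write stays in a private copy, while the same call with MAP_SHARED returns 'X'.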

Parameter fd:

The file descriptor of the file to map into memory. For an anonymous mapping, that is, when MAP_ANONYMOUS is set in flags, fd is set to -1.

Parameter offset:

The offset into the file at which the mapping starts. It is usually set to 0, meaning the mapping starts at the beginning of the file. offset must be an integer multiple of the page size.
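Because offset must be a multiple of the page size, mapping data at an arbitrary file position means rounding the position down to a page boundary and indexing into the mapping. A minimal sketch of that arithmetic follows; the helper name byte_at is hypothetical.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the byte at file position `pos` by mapping
 * only the page that contains it. Returns -1 on failure. */
int byte_at(const char *path, off_t pos)
{
    long page = sysconf(_SC_PAGESIZE);
    struct stat sb;
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    if (page <= 0 || fstat(fd, &sb) == -1 || pos < 0 || pos >= sb.st_size) {
        close(fd);
        return -1;
    }

    off_t aligned = pos - (pos % page);      /* round down to a page boundary */
    size_t delta = (size_t)(pos - aligned);  /* position inside the mapping   */

    unsigned char *p = mmap(NULL, delta + 1, PROT_READ, MAP_PRIVATE,
                            fd, aligned);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    int value = p[delta];
    munmap(p, delta + 1);
    return value;
}
```

Passing pos directly as the offset would fail with EINVAL whenever pos is not page-aligned, which is why the sketch maps from aligned and reads at index delta.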

Return values

On success, mmap() returns a pointer to the mapped region and munmap() returns 0. On failure, mmap() returns MAP_FAILED, that is (void *)-1, and munmap() returns -1. errno is set to one of the following values:

EACCES: Access error

EAGAIN: The file is locked, or too much memory is locked

EBADF: fd is not a valid file descriptor

EINVAL: One or more parameters are invalid

ENFILE: The system limit on the number of open files has been reached

ENODEV: The file system of the specified file does not support memory mapping

ENOMEM: Insufficient memory, or the process has exceeded the maximum number of memory mappings

EPERM: Insufficient privileges; the operation is not allowed

ETXTBSY: MAP_DENYWRITE was specified but the file is open for writing

Accessing a mapped region can also raise the following signals (these are signals, not errno values):

SIGSEGV: An attempt was made to write to a read-only region

SIGBUS: An attempt was made to access a part of the mapping that does not correspond to the file, for example a page beyond the end of the file

3. Comparison of mmap and direct I/O (read/write) efficiency

You cannot simply say which is more efficient; it depends on the specific implementation and the specific application. The actual write, read, and mmap processes are as follows:

The advantage of mmap is that, by mapping a portion of the file into user space, the user can read and write the kernel buffer pool directly, so there is less copying back and forth between kernel and user space and it is usually faster. But mmap is only a good fit for updating, reading, and writing a fixed-size region of a file, not for things like growing a file by constantly appending to it.

The main difference between the two is that read and write perform more system calls and more copying than mmap and memcpy. read and write copy data from the kernel buffer to the application buffer, and then from the application buffer back to the kernel buffer. mmap and memcpy, on the other hand, copy data directly from one kernel buffer mapped into the address space to another. That copying happens as a result of handling page faults (one fault per page read, one fault per page written) when a memory page that is not yet present is referenced.

So the efficiency comparison between the two comes down to the cost of system calls and extra copies versus the cost of page faults; whichever costs less performs better. mmap also removes the explicit read and write calls, which simplifies the logic and makes programming easier.

The mmap() system call maps a file into memory (the process address space), turning operations on the file into operations on memory and avoiding additional lseek(), read(), and write() calls. This is especially beneficial for large or frequently accessed files. Keep in mind, however, that mappings work in units of pages: offset must be an integer multiple of the page size, and the mapped region always occupies a whole number of pages. If the mapped size is not a multiple of the page size, page_size - (mapped_file_size % page_size) bytes of the last page are wasted.
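The page-rounding arithmetic can be written out as a tiny function. The name tail_waste is hypothetical, and the page size is passed in explicitly so the calculation can be checked by hand; in a real program it would typically come from sysconf(_SC_PAGESIZE).

```c
#include <stddef.h>

/* Bytes left unused in the last page when `size` bytes are mapped and the
 * page size is `page_size` (which must be non-zero). */
size_t tail_waste(size_t size, size_t page_size)
{
    size_t rem = size % page_size;  /* bytes used in the last page */
    return rem == 0 ? 0 : page_size - rem;
}
```

For an 18-byte file and 4096-byte pages this gives 4096 - 18 = 4078 bytes, and a file whose size is an exact multiple of the page size wastes nothing.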

Uses of memory-mapped files:

(1) Reading large data files, which effectively improves the performance of data transfer between disk and memory;

(2) Fast shared memory between processes, for efficient inter-process communication.

Why memory-mapped files can perform better than ordinary I/O:

Both memory-mapped files and ordinary file I/O bring data into memory through the file system and the disk driver. The larger the amount of data, the greater the advantage of memory mapping.

(1) Before any data is actually copied, the mapping information must be established. With a memory-mapped file this mapping relationship is prepared in advance: the kernel has already arranged the memory region within the process and handed it over for pre-processing.

(2) During the actual copy, a memory-mapped file copies disk data directly into the user process's memory space, a single copy, while ordinary I/O first copies the file into the kernel cache and then into the user process's memory space, two copies.

The original post includes a performance table comparing reads of disk files of different sizes using the ordinary fread function and memory-mapped files.

When many files of tens of MB, hundreds of MB, or more than 1 GB need to be accessed frequently, or all of these large files need to be loaded at startup, memory-mapped files should be considered.

4. Online test examples

Both demos convert the contents of /tmp/file_mmap to uppercase, one using mmap and the other using read/write. Create the file /tmp/file_mmap, write www.baidu.com into it, and use strace to trace the system calls.

(1) Demo 1: mmap

/*
* @file: t_mmap.c
*/
#include <stdio.h>
#include <ctype.h>
#include <sys/mman.h> /*mmap munmap*/
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
 
int main(int argc, char *argv[])
{
 int fd;
 char *buf;
 off_t len;
 struct stat sb;
 char *fname = "/tmp/file_mmap";
 
 fd = open(fname, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR);
 if (fd == -1)
 {
  perror("open");
  return 1;
 }
 if (fstat(fd, &sb) == -1)
 {
  perror("fstat");
  return 1;
 }
 
 buf = mmap(0, sb.st_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
 if (buf == MAP_FAILED)
 {
  perror("mmap");
  return 1;
 }
 
 if (close(fd) == -1)
 {
  perror("close");
  return 1;
 }
 
 for (len = 0; len < sb.st_size; ++len)
 {
  buf[len] = toupper((unsigned char)buf[len]);
  /*putchar(buf[len]);*/
 }

 if (munmap(buf, sb.st_size) == -1)
 {
  perror("munmap");
  return 1;
 }
 return 0;
}

Test results:

root@chenwr-pc:/home/workspace/test# gcc tmp.c -o run 
root@chenwr-pc:/home/workspace/test# strace ./run 
execve("./run", ["./run"], [/* 22 vars */]) = 0
brk(0)                                  = 0x1ffa000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=106932, ...}) = 0
mmap(NULL, 106932, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fcab05de000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P \2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1857312, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fcab05dd000
mmap(NULL, 3965632, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fcab000e000
mprotect(0x7fcab01cc000, 2097152, PROT_NONE) = 0
mmap(0x7fcab03cc000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1be000) = 0x7fcab03cc000
mmap(0x7fcab03d2000, 17088, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fcab03d2000
close(3)                                = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fcab05db000
arch_prctl(ARCH_SET_FS, 0x7fcab05db740) = 0
mprotect(0x7fcab03cc000, 16384, PROT_READ) = 0
mprotect(0x600000, 4096, PROT_READ)     = 0
mprotect(0x7fcab05f9000, 4096, PROT_READ) = 0
munmap(0x7fcab05de000, 106932)          = 0
open("/tmp/file_mmap", O_RDWR|O_CREAT, 0600) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=14, ...}) = 0
mmap(NULL, 14, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x7fcab05f8000
close(3)                                = 0
munmap(0x7fcab05f8000, 14)              = 0
exit_group(0)                           = ?
+++ exited with 0 +++

The file has become uppercase.

root@chenwr-pc:/tmp# cat file_mmap 
WWW.BAIDU.COM

Annotated strace output from the online demo:

open("/tmp/file_mmap", O_RDWR|O_CREAT, 0600) = 3 //open, returns fd=3
fstat64(3, {st_mode=S_IFREG|0644, st_size=18, ...}) = 0 //fstat, the file size is 18
mmap2(NULL, 18, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xb7867000 //mmap the file with fd=3
close(3) = 0 //close the file with fd=3
munmap(0xb7867000, 18) = 0 //remove the memory mapping at 0xb7867000

Here addr is 0 (NULL) and the length is 18, which is not an integer multiple of the page size, so 4078 bytes (4 KB - 18) are wasted.

(2) Demo 2: read/write

#include <stdio.h>
#include <ctype.h>
#include <stdlib.h> /*malloc free*/
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
 int fd, len;
 char *buf;
 char *fname = "/tmp/file_mmap";
 ssize_t ret;
 struct stat sb;
 
 fd = open(fname, O_CREAT|O_RDWR, S_IRUSR|S_IWUSR);
 if (fd == -1)
 {
  perror("open");
  return 1;
 }
 if (fstat(fd, &sb) == -1)
 {
  perror("stat");
  return 1;
 }
 
 buf = malloc(sb.st_size);
 if (buf == NULL)
 {
  perror("malloc");
  return 1;
 }
 ret = read(fd, buf, sb.st_size);
 for (len = 0; len < sb.st_size; ++len)
 {
  buf[len] = toupper((unsigned char)buf[len]);
  /*putchar(buf[len]);*/
 }
 lseek(fd, 0, SEEK_SET);
 ret = write(fd, buf, sb.st_size);
 if (ret == -1)
 {
  perror("error");
  return 1;
 }
 
 if (close(fd) == -1)
 {
  perror("close");
  return 1;
 }
 free(buf);
 return 0;
}

Results of my own test run:

root@chenwr-pc:/home/workspace/test# strace ./run 
execve("./run", ["./run"], [/* 22 vars */]) = 0
brk(0)                                  = 0x13ac000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=106932, ...}) = 0
mmap(NULL, 106932, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fb98f1d7000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P \2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1857312, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb98f1d6000
mmap(NULL, 3965632, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fb98ec07000
mprotect(0x7fb98edc5000, 2097152, PROT_NONE) = 0
mmap(0x7fb98efc5000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1be000) = 0x7fb98efc5000
mmap(0x7fb98efcb000, 17088, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fb98efcb000
close(3)                                = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb98f1d4000
arch_prctl(ARCH_SET_FS, 0x7fb98f1d4740) = 0
mprotect(0x7fb98efc5000, 16384, PROT_READ) = 0
mprotect(0x600000, 4096, PROT_READ)     = 0
mprotect(0x7fb98f1f2000, 4096, PROT_READ) = 0
munmap(0x7fb98f1d7000, 106932)          = 0
open("/tmp/file_mmap", O_RDWR|O_CREAT, 0600) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=14, ...}) = 0
brk(0)                                  = 0x13ac000
brk(0x13cd000)                          = 0x13cd000
read(3, "www.baidu.com\n", 14)          = 14
lseek(3, 0, SEEK_SET)                   = 0
write(3, "WWW.BAIDU.COM\n", 14)         = 14
close(3)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++

Annotated strace output from the online demo:

open("/tmp/file_mmap", O_RDWR|O_CREAT, 0600) = 3 //open, fd=3
fstat64(3, {st_mode=S_IFREG|0644, st_size=18, ...}) = 0 //fstat, the file size is 18
brk(0) = 0x9845000 //brk, returns the current program break
brk(0x9866000) = 0x9866000 //malloc allocates memory; new end address of the heap
read(3, "www.perfgeeks.com\n", 18) = 18 //read
lseek(3, 0, SEEK_SET) = 0 //lseek
write(3, "WWW.PERFGEEKS.COM\n", 18) = 18 //write
close(3) = 0 //close

Here, read() is used to read the contents of the file, and after toupper() is applied, write() is called to write the data back. Because the file is so small, the drawbacks of read()/write() do not show up: frequent access to a large file requires many lseek() calls to reposition, and every read()/write() involves an extra copy of the data in physical memory. Of course, the cost of creating and maintaining the mmap() data structures cannot be ignored either. Note: no specific benchmark of mmap against read/write was run here, so it is not possible to say which is better or worse. Just remember: once a file is memory-mapped with mmap, operating on the memory is operating on the file, which saves many system calls (lseek, read, write).

5. A demo I wrote myself

#define _GNU_SOURCE /* exposes lseek64 on glibc */
#include <stdio.h>
#include <ctype.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <time.h>
#define INT64U unsigned long long

#define MSG_ERR 1
#define MSG_WARN 2
#define MSG_INFO 3
#define MSG_DBG 4
#define MSG_NOR 5

#define MSG_HEAD ("libfat->")
#define PRTMSG(level, fmt, args...) \
do {\
    if (level <= MSG_NOR) {\
        if (level <= MSG_NOR) {\
            printf("%s, %s, line %d: " fmt,__FILE__,__FUNCTION__,__LINE__, ##args); \
        } else {\
            printf("%s:" fmt, MSG_HEAD, ##args); \
        }\
    }\
} while(0)

typedef unsigned char       BOOLEAN;
typedef unsigned char       INT8U;
typedef unsigned int        INT16U;
typedef unsigned long       INT32U;

typedef signed char         INT8S;
typedef signed int          INT16S;
typedef signed long         INT32S;

char *filename = "./lt00001";
//char *filename = "/mnt/sdisk/video/lt00004";
char *data = "1111111111222222222233333333334444444444"; /* 40-byte payload, matches data_len */
INT32S data_len = 40;
struct timeval t_start, t_end;
struct stat file_info;
long cost_time = 0;
int write_num = 1000;

/* Map the whole file, copy data_len bytes to the given offset, then unmap. */
INT32S mmap_write(INT32S fd, INT64U offset, void *data, INT32S data_len)
{
	char *buf = NULL;

	if (fstat(fd, &file_info) == -1) {
		perror("fstat");
		PRTMSG(MSG_ERR, "[cwr] Get file info failed\n");
		return -1;
	}

	buf = mmap(0, file_info.st_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		PRTMSG(MSG_ERR, "[cwr] mmap failed\n");
		return -1;
	}

	//offset = (INT64U)((order)*sizeof(FAT_FILE_LIST_T));
	memcpy(buf+offset, data, data_len);
	if (munmap(buf, file_info.st_size) == -1) {
		perror("munmap");
		PRTMSG(MSG_ERR, "[cwr] munmap failed\n");
		return -1;
	}
	return data_len;
}
/* Write write_num records with lseek64 + write, timing the whole loop. */
int write_test()
{
	int fd, i;
	INT64U ret64, offset;
	int ret_len = 0;

	fd = open(filename, O_RDWR);
	if (fd < 0) {
		printf("[cwr] open file failed\n");
		return -1;
	}
	gettimeofday(&t_start, NULL);
	for (i=0; i<write_num; i++) {
		offset = i*data_len;
		ret64 = lseek64(fd, offset, SEEK_SET);
		if (ret64 == -1LL) {
			printf("lseek data fail\n");
			close(fd);
			return -1;
		}
		ret_len = write(fd, data, data_len);
		if (ret_len != data_len) {
			printf("[cwr] count = %d; write error\n", i);
			close(fd);
			return -1;
		}
	}
	gettimeofday(&t_end, NULL);
	printf("[cwr] test end, count = %d\n", i);
	close(fd);
	return 0;
}
/* Write write_num records through mmap_write, timing the whole loop. */
int mmap_write_test()
{
	int fd, i;
	INT64U offset;
	int ret_len = 0;

	fd = open(filename, O_RDWR);
	if (fd < 0) {
		printf("[cwr] open file failed\n");
		return -1;
	}
	gettimeofday(&t_start, NULL);
	for (i=0; i<write_num; i++) {
		offset = i*data_len;
		ret_len = mmap_write(fd, offset, data, data_len);
		if (ret_len != data_len) {
			printf("[cwr] count = %d; mmap write error\n", i);
			close(fd);
			return -1;
		}
	}
	gettimeofday(&t_end, NULL);
	printf("[cwr] mmap write test end, count = %d\n", i);

	close(fd);
	return 0;
}
int main(void)
{
	int ret;

	memset(&file_info, 0, sizeof(file_info));
#if 1
	ret = write_test();

	if (ret != 0) {
		printf("[cwr] write_test failed\n");
	}
#endif
#if 0
	ret = mmap_write_test();

	if (ret != 0) {
		printf("[cwr] mmap_write_test failed\n");
	}
#endif
	/* Include the seconds field: tv_usec alone wraps around every second. */
	cost_time = (t_end.tv_sec - t_start.tv_sec) * 1000000L
	          + (t_end.tv_usec - t_start.tv_usec);
	printf("Start time: %ld us\n", (long)t_start.tv_usec);
	printf("End time: %ld us\n", (long)t_end.tv_usec);
	printf("Cost time: %ld us\n", cost_time);

	while (1) {
		sleep(1);
	}
}

Running results:

Timing obtained for the write test:

buf = mmap(0, 40, PROT_READ|PROT_WRITE, MAP_SHARED, fd, offset*4*1024)

Open question: why did mmap not turn out to be more efficient in this test?
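One factor that may contribute, offered here as an assumption rather than a measured result: mmap_write in the demo calls fstat, mmap, and munmap for every 40-byte record, so each record costs three system calls instead of the two (lseek64 and write) on the plain path. A minimal sketch that maps the file once and performs all copies through the same mapping is shown below. The function name mmap_write_all is hypothetical, and it assumes the file already exists and is at least count * len bytes long.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical variant of the demo: write `count` records of `len` bytes
 * each, back to back from the start of the file, through one mapping.
 * Returns 0 on success, -1 on failure. */
int mmap_write_all(const char *path, const void *rec, size_t len, size_t count)
{
    struct stat sb;
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;
    /* The mapping cannot grow the file, so it must already be big enough. */
    if (fstat(fd, &sb) == -1 || sb.st_size <= 0 ||
        (size_t)sb.st_size < len * count) {
        close(fd);
        return -1;
    }

    char *buf = mmap(NULL, sb.st_size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    close(fd);
    if (buf == MAP_FAILED)
        return -1;

    /* One mmap/munmap pair for the whole run; only memcpy in the loop. */
    for (size_t i = 0; i < count; i++)
        memcpy(buf + i * len, rec, len);

    return munmap(buf, sb.st_size);
}
```

Timing this variant against write_test with the same record size and count would show whether the per-call mapping overhead explains the result; no such measurement is claimed here.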

6. Reference materials

Using the mmap function, explained with examples – u013525455's blog – CSDN blog blog.csdn.net/u013525455/…

Linux mmap memory mapping: mmap() vs read()/write()/lseek() example demo – Linux operating systems: Ubuntu_Centos_Debian – Red Black Alliance www.2cto.com/kf/201806/7…

Linux file read/write mechanism and optimization methods – cnblogs www.cnblogs.com/linuxprobe/…

Manipulating files with memory mapping (mmap) in Linux – Casual Wind's column – CSDN blog blog.csdn.net/windgs_yf/a…

Linux memory management: the mmap function explained – badman250's column – CSDN blog blog.csdn.net/notbaron/ar…

Summary of the sync, fsync and fdatasync functions – pugu12's column – CSDN blog blog.csdn.net/pugu12/arti…

file – Why (ftruncate+mmap+memcpy) is faster than (write)? – Stack Overflow stackoverflow.com/questions/3…

Efficiency comparison between mmap and direct I/O (read, write) – Notes from Hiasa Yousa – CSDN blog blog.csdn.net/qq_15437667…

Testing the write speed of fprintf, fwrite, write and mmap under Linux – penzchan's column – CSDN blog blog.csdn.net/penzchan/ar…

Comparison between mmap and write performance – bjpiao – ChinaUnix blog blog.chinaunix.net/uid-2657535…

Linux mmap memory mapping principle analysis – CSDN blog blog.csdn.net/yusiguyuan/…

Linux process virtual memory – fengxin's blog – CSDN blog blog.csdn.net/fengxinlinu…

[Original] In-depth mmap, starting from three key questions – Jianshu www.jianshu.com/p/eece39bee…