Preface
Recently I read through the code of MMKV. At its core, it reads and writes files through mmap. It also has to deal with issues such as multi-process access, serialization, and key rearrangement, but those are covered in plenty of other articles. This article focuses on implementing simple file reads and writes with mmap, specifically how to use the mmap function and how to expand the file size.
What is mmap()?
#include <sys/mman.h>
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
int munmap(void *addr, size_t length);
mmap() creates a new mapping in the virtual address space of the calling process. The starting address for the new mapping is specified in addr. The length argument specifies the length of the mapping (which must be greater than 0). If addr is NULL, then the kernel chooses the (page-aligned) address at which to create the mapping; this is the most portable method of creating a new mapping.
The rest of the description is omitted here.
addr is the starting address of the mapping and length is its length. flags takes values such as MAP_SHARED, and prot sets the protection of the mapped region with values such as PROT_READ. offset must be an integer multiple of the page size, otherwise the mmap call fails. Usually offset is 0, meaning the mapping starts at the beginning of the file.
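The parameter rules above can be sketched in a few lines of C. This is a minimal illustration, not MMKV's code: the helper names round_up_to_page, open_sized, and map_shared are made up for this example, and sysconf(_SC_PAGESIZE) is used here as the portable way to query the page size.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: round n up to the next multiple of the page size.
static size_t round_up_to_page(size_t n) {
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    return (n + page - 1) / page * page;
}

// Hypothetical helper: open (or create) path and set its size to `size` bytes.
static int open_sized(const char *path, size_t size) {
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return -1;
    if (ftruncate(fd, (off_t) size) != 0) { close(fd); return -1; }
    return fd;
}

// Hypothetical helper: map `length` bytes of fd, starting at file offset
// `offset`, as a shared read/write mapping. Returns NULL on failure.
// Note that mmap() itself reports failure with MAP_FAILED, not NULL.
static void *map_shared(int fd, size_t length, off_t offset) {
    assert(length > 0); // the mapping length must be greater than 0
    void *p = mmap(NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
    return p == MAP_FAILED ? NULL : p;
}
```

Calling map_shared(fd, page, page) on a two-page file maps its second page, while an offset such as 1 is not page-aligned and makes the call fail.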
How mmap() works
Zhuanlan.zhihu.com/p/83398714…
I don't know much about the Linux kernel layer yet, for example the page cache or how the read/write system calls are carried out, so I won't simply repeat what others have written. I'll update this section once I have filled in that knowledge.
Using mmap to read and write files
Following part of the MMKV code, the program first creates the file, or opens it if it already exists. In both cases the file size is kept at an integer multiple of pageSize.
This raises a problem. Consider the following situation:
You create a file and write one byte of data. With a page size of 4KB, the file is 4KB long but holds only 1 byte of real content. The next time we open the file, we want to append right after the end of the previous write.
So the question is how to store the length of the existing content. MMKV uses the first 4 bytes of the file to record the true length of the file's content (on 32-bit machines).
In this article I also use a 4-byte header to store the actual content length.
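The header idea can be sketched on a plain memory buffer, which behaves the same as a mapped file for this purpose. The names HEADER_SIZE, load_length, store_length, and content_end are assumptions for this example, not MMKV identifiers, and the length is stored in the machine's native byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { HEADER_SIZE = 4 }; // assumption: a 4-byte length header, as in this article

// Read the content length stored in the first 4 bytes of the region.
static uint32_t load_length(const char *base) {
    assert(base != NULL);
    uint32_t len;
    memcpy(&len, base, HEADER_SIZE); // memcpy avoids alignment assumptions
    return len;
}

// Store the content length into the first 4 bytes of the region.
static void store_length(char *base, uint32_t len) {
    assert(base != NULL);
    memcpy(base, &len, HEADER_SIZE);
}

// Address where the next write should go: just past header and existing content.
static char *content_end(char *base) {
    return base + HEADER_SIZE + load_length(base);
}
```

A freshly created, zero-filled file reads back a length of 0, so the first write lands directly after the header.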
Code sample
The complete code is on GitHub. It is really just a single C file.
Main logic
// Open the file, creating it if it does not exist
fd = open(fileName, O_RDWR | O_CREAT, 0777);
// Get the size of one memory page
int pageSize = getpagesize();
// If necessary, change the file size to an integer multiple of pageSize
// (mmapSize is that multiple)
ftruncate(fd, mmapSize);
// Map the file into memory
mmapPointer = (char *) mmap(NULL, mmapSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
// Read the actual content length from the header into offset
memcpy(&offset, mmapPointer, offsetSize);
// Write the data after the existing content (the full code also handles expansion here)
memcpy(mmapPointer + offset + offsetSize, data, strlen(data));
// Advance offset past the new data and store it back in the header
offset += strlen(data);
memcpy(mmapPointer, &offset, offsetSize);
// Release resources
munmap(mmapPointer, mmapSize);
close(fd);
First we open the file to obtain a file descriptor, then get the page size with getpagesize(), resize the file with ftruncate(), and finally call mmap(). The returned mmapPointer is the starting address of the file contents in memory.
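Put together, the steps described here could look like the following self-contained function. It is a sketch under stated assumptions rather than the code from the repository: append_bytes and HEADER_SIZE are names invented for this example, error handling is kept minimal, and it simply fails instead of expanding when the data does not fit.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

enum { HEADER_SIZE = 4 }; // assumption: 4-byte length header at the start of the file

// Hypothetical helper: append len bytes to the mmap-backed file at path.
// Returns the new content length, or -1 on error or if the data does not fit.
static long append_bytes(const char *path, const void *data, uint32_t len) {
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }

    // Round the file size up to a page multiple, using at least one page.
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    size_t map_size = ((size_t) st.st_size + page - 1) / page * page;
    if (map_size == 0) map_size = page;
    if ((size_t) st.st_size != map_size && ftruncate(fd, (off_t) map_size) != 0) {
        close(fd);
        return -1;
    }

    char *base = (char *) mmap(NULL, map_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); // the mapping stays valid after the descriptor is closed
    if (base == MAP_FAILED) return -1;

    uint32_t used;
    memcpy(&used, base, HEADER_SIZE); // existing content length
    long result = -1;
    if ((size_t) used <= map_size - HEADER_SIZE &&
        (size_t) len <= map_size - HEADER_SIZE - used) {
        memcpy(base + HEADER_SIZE + used, data, len); // write after old content
        used += len;
        memcpy(base, &used, HEADER_SIZE);             // update the header
        result = (long) used;
    }
    munmap(base, map_size);
    return result;
}
```

Calling append_bytes(path, "hello", 5) twice on a new file leaves a header value of 10 followed by the ten appended bytes.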
What happens if the mmap length is larger than the file itself?
1. Accessing memory between fileSize and mmapSize triggers SIGBUS (more precisely, this applies to pages that lie entirely beyond the end of the file).
2. Accessing memory beyond mmapSize triggers SIGSEGV.
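One way to stay clear of SIGBUS when more room is needed is to grow the file with ftruncate() before mapping the larger range. The sketch below assumes a hypothetical helper named grow_mapping; it is not taken from MMKV, and its sizing policy (round the requested size up to a page multiple) is just one possible choice.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: make sure at least `needed` bytes of fd are mapped.
// Returns the (possibly new) base address and stores the mapped size in
// *new_size. Returns NULL on failure, in which case the old mapping is untouched.
static char *grow_mapping(int fd, char *old_base, size_t old_size,
                          size_t needed, size_t *new_size) {
    assert(new_size != NULL);
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    size_t size = (needed + page - 1) / page * page;
    if (size <= old_size) { // already large enough
        *new_size = old_size;
        return old_base;
    }
    // Grow the file first so every mapped page is backed by file data.
    if (ftruncate(fd, (off_t) size) != 0) return NULL;
    char *base = (char *) mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) return NULL;
    munmap(old_base, old_size); // both mappings shared the same file pages
    *new_size = size;
    return base;
}
```

After a successful call the old pointer must no longer be used, since the data may now live at a different address.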
Since the first four bytes record the actual content length, as mentioned earlier, we read them and convert them to the int offset. mmapPointer + offset + 4 is then the address in memory where the real content of our file ends, and memcpy can write new data there.
As for when the data is written back: because the mapping uses the MAP_SHARED flag, the operating system writes the changes back to disk at an appropriate time (see the mmap documentation for details).
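If waiting for the operating system is not acceptable, msync() can request the write-back explicitly. The following is only a sketch: write_and_sync is a hypothetical helper for this example, it handles at most one page of text, and whether MMKV itself calls msync this way is not something this article establishes.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: write text at the start of the file through a shared
// mapping, then block until the change has been written back.
// Returns 0 on success, -1 on error.
static int write_and_sync(const char *path, const char *text) {
    assert(path != NULL && text != NULL);
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    size_t len = strlen(text);
    if (len > page) return -1; // sketch: one page only

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return -1;
    if (ftruncate(fd, (off_t) page) != 0) { close(fd); return -1; }

    char *base = (char *) mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (base == MAP_FAILED) return -1;

    memcpy(base, text, len);
    int rc = msync(base, page, MS_SYNC); // MS_SYNC waits for the write-back
    munmap(base, page);
    return rc;
}
```

MS_ASYNC is the non-blocking alternative: it schedules the write-back and returns immediately.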
Protobuf
With the file reads and writes above, we can store data in a file. But a value such as an int always takes sizeof(int) bytes, typically 4, even when the value is only 1, which needs just a single bit.
So MMKV uses Protobuf. Protobuf has its own encoding that reduces the space used (though what if the data is all strings?) and offers fast encoding and decoding.
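The space saving for small integers comes from Protobuf's base-128 varint encoding, where each byte carries 7 bits of the value and the top bit says whether more bytes follow. Below is a minimal sketch for unsigned 32-bit values; the function names varint_encode and varint_decode are chosen for this example and are not MMKV or Protobuf API names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Encode value as a base-128 varint into out (up to 5 bytes for 32 bits).
// Returns the number of bytes written.
static size_t varint_encode(uint32_t value, uint8_t *out) {
    size_t n = 0;
    while (value >= 0x80) {
        out[n++] = (uint8_t) (value | 0x80); // low 7 bits plus continuation bit
        value >>= 7;
    }
    out[n++] = (uint8_t) value; // last byte has the continuation bit clear
    return n;
}

// Decode a varint from in into *value. Returns the number of bytes consumed.
static size_t varint_decode(const uint8_t *in, uint32_t *value) {
    assert(in != NULL && value != NULL);
    uint32_t result = 0;
    size_t n = 0;
    int shift = 0;
    while (n < 5) { // a 32-bit value never needs more than 5 bytes
        uint8_t b = in[n++];
        result |= (uint32_t) (b & 0x7F) << shift;
        if (!(b & 0x80)) break; // continuation bit clear: done
        shift += 7;
    }
    *value = result;
    return n;
}
```

With this scheme the value 1 occupies a single byte instead of four, and 300 occupies two bytes (0xAC 0x02). Values close to the 32-bit maximum take five bytes, so the saving depends on the data being mostly small numbers.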
Conclusion
After reading part of the MMKV code, I learned the following:
- Basic usage of mmap() and memcpy()
- Some of the principles behind mmap()
- A little about protobuf