Basic overview

File manipulation is one of the most basic functions of an application. Both the Win32 API and MFC provide functions and classes for file processing; the most commonly used are the Win32 API functions CreateFile(), WriteFile(), and ReadFile(), and the CFile class provided by MFC. These are sufficient in most situations, but the usual file processing methods become impractical in specialized applications that must store tens of GB, hundreds of GB, or even several TB of data. Such large files are generally handled as memory-mapped files.

A memory-mapped file is a mapping from a file to a region of the process address space. In Win32, each process has its own address space, and one process cannot easily access data in another process's address space, so data cannot be shared the way it was in 16-bit Windows. Win32 therefore allows multiple processes running on the same computer to share data through memory-mapped files. In fact, other techniques for sharing and transmitting data, such as SendMessage or PostMessage, use memory-mapped files internally. [1]

Data sharing

File data sharing

In this form of data sharing, two or more processes map views of the same file mapping object, so they share the same pages of physical storage. When one process writes to its view of the memory-mapped file, the other processes immediately see the change in their own views. Note that all of the processes must use the same name for the file mapping object.

Access methods

Once a file is mapped, its data can be accessed with ordinary in-memory read/write instructions instead of I/O system functions such as ReadFile and WriteFile, which increases file access speed.

Scope of application

Memory mapping is best suited to applications that need to read files and parse the information they contain, such as syntax-highlighting editors and compilers that parse input files.

Mapping a file for reading and parsing lets the application manipulate it with memory operations instead of reading, writing, and moving the file pointer back and forth across the file.

Applications

Some operations, such as "un-reading" a character, used to be quite complex, and the programmer had to deal with flushing the buffer. With a mapped file they become much simpler: all the application has to do is decrement the pointer by one.
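As a minimal sketch of this idea, the cursor below walks a byte range the way a parser would walk a mapped view. The ByteCursor type and its members are hypothetical names chosen for illustration, and an ordinary in-memory array stands in for the pointer that a mapping call would return.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical cursor over a byte range. With a real mapped view, `data`
// would be the pointer returned by the mapping call.
struct ByteCursor {
    const char* begin;
    const char* end;
    const char* pos;

    ByteCursor(const char* data, std::size_t size)
        : begin(data), end(data + size), pos(data) {
        assert(data != nullptr);
    }

    // Return the next byte, or -1 at the end of the range.
    int get() { return pos < end ? static_cast<unsigned char>(*pos++) : -1; }

    // "Un-read" the last byte: step the pointer back, with no buffer to flush.
    bool unget() {
        if (pos == begin) return false;  // nothing has been read yet
        --pos;
        return true;
    }
};
```

A parser that reads one character too far can call unget() and continue from the previous position, which is all that "decrementing the pointer" amounts to.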

Another important use of mapped files is named, persistent shared memory. To share memory between two applications, one application creates a file and maps it, and the other application then opens and maps the same file to use it as shared memory.

Using memory-mapped files to handle large files in VC++

Memory file

Memory-mapped files resemble virtual memory: a memory-mapped file reserves a region of address space and commits physical storage to that region. The difference is that the physical storage comes from a file that already exists on disk rather than from the system page file, and the file must be mapped before it can be operated on, after which it behaves as if the entire file had been loaded from disk into memory. As a result, an application that uses a memory-mapped file to process a file stored on disk no longer has to perform explicit I/O operations on it, and it does not need to request and allocate buffers for the file; all file caching is managed directly by the system. Because the steps of loading file data into memory, writing data back from memory to the file, and freeing memory blocks are eliminated, memory-mapped files are particularly valuable for processing files with large volumes of data. In addition, real engineering systems often need to share data among multiple processes. When the amount of data is small, many approaches are workable, but when the shared data is very large, memory-mapped files are needed. In fact, memory-mapped files are the most efficient way to share data between processes on the same machine.

Memory-mapped files are not simple file I/O operations; they rely on a core Windows programming technology, memory management. A deeper understanding of memory-mapped files therefore requires a clear understanding of the memory management mechanism of the Windows operating system. The general procedure for using memory-mapped files is as follows:

First, CreateFile() creates or opens a file kernel object that identifies the file on disk to be used as the memory-mapped file. CreateFile() tells the operating system only where the file's physical storage is located, that is, the path of the file; it does not specify the length of the mapping. To specify how much physical storage the file mapping object requires, call CreateFileMapping() to create a file mapping kernel object, which tells the system the size of the file and how it will be accessed. After the file mapping object is created, an address space region must still be reserved for the file data, and the file data must be committed as the physical storage mapped to that region. The MapViewOfFile() function does this, mapping all or part of the file mapping object into the process address space under system management. From this point on, a memory-mapped file is used and processed in essentially the same way as file data loaded into memory. When the memory-mapped file is no longer needed, a few cleanup steps release the resources used: UnmapViewOfFile() removes the mapping of the file data from the process's address space, and CloseHandle() closes the file mapping object and the file object created earlier.

Related functions

When using memory-mapped files, the API functions used are mainly those mentioned above, which are described below:

HANDLE CreateFile(LPCTSTR lpFileName,DWORD dwDesiredAccess,DWORD dwShareMode,LPSECURITY_ATTRIBUTES lpSecurityAttributes,DWORD dwCreationDisposition,DWORD dwFlagsAndAttributes,HANDLE hTemplateFile);

CreateFile() is commonly used to create and open files even in ordinary file operations. For memory-mapped files, it creates or opens a file kernel object and returns its handle. When calling it, set dwDesiredAccess and dwShareMode according to whether the data will be read or written and whether the file will be shared. Incorrect parameter settings cause the subsequent operations to fail.

HANDLE CreateFileMapping(HANDLE hFile,LPSECURITY_ATTRIBUTES lpFileMappingAttributes,DWORD flProtect,DWORD dwMaximumSizeHigh,DWORD dwMaximumSizeLow,LPCTSTR lpName);

The CreateFileMapping() function creates a file mapping kernel object. The hFile parameter specifies the handle of the file to be mapped into the process address space (the return value of CreateFile()). Because the physical storage of a memory-mapped file comes from a file on disk rather than from space allocated in the system page file, the system does not reserve an address space region for it on its own initiative, nor does it automatically map the file's storage to such a region. The flProtect parameter tells the system which protection attributes to give the pages. The protection attributes PAGE_READONLY, PAGE_READWRITE, and PAGE_WRITECOPY indicate, respectively, that once the file mapping object is mapped the file data can be read, read and written, or read and written with copy-on-write semantics. With PAGE_READONLY, CreateFile() must have been called with GENERIC_READ; PAGE_READWRITE requires GENERIC_READ | GENERIC_WRITE; and PAGE_WRITECOPY requires either GENERIC_READ or GENERIC_READ | GENERIC_WRITE. The DWORD parameters dwMaximumSizeHigh and dwMaximumSizeLow are also important: together they specify the maximum size of the file in bytes. Because the two parameters combine into a 64-bit value, the maximum supported file size is 16 exabytes, which is enough for almost any large-data scenario.
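The split of a 64-bit size into the two 32-bit arguments can be sketched in portable C++ as follows. SizeParts, splitSize, and joinSize are hypothetical helper names, not part of the Win32 API; on Windows the two halves would be passed as dwMaximumSizeHigh and dwMaximumSizeLow.

```cpp
#include <cstdint>

// Hypothetical pair mirroring the dwMaximumSizeHigh / dwMaximumSizeLow arguments.
struct SizeParts {
    std::uint32_t high;  // upper 32 bits of the byte count
    std::uint32_t low;   // lower 32 bits of the byte count
};

// Split a 64-bit byte count into two 32-bit halves.
inline SizeParts splitSize(std::uint64_t size) {
    return SizeParts{static_cast<std::uint32_t>(size >> 32),
                     static_cast<std::uint32_t>(size & 0xFFFFFFFFu)};
}

// Recombine the two halves into the original 64-bit byte count.
inline std::uint64_t joinSize(SizeParts p) {
    return (static_cast<std::uint64_t>(p.high) << 32) | p.low;
}
```

For a 30 GB mapping (30 * 2^30 bytes), splitSize yields a high part of 7 and a low part of 0x80000000. Passing only the low part would silently request a 2 GB mapping instead.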

LPVOID MapViewOfFile(HANDLE hFileMappingObject,DWORD dwDesiredAccess,DWORD dwFileOffsetHigh,DWORD dwFileOffsetLow,SIZE_T dwNumberOfBytesToMap);

The MapViewOfFile() function maps file data into the process's address space. The hFileMappingObject parameter is the handle of the file mapping object returned by CreateFileMapping(). The dwDesiredAccess parameter again specifies how the file data will be accessed, and it must be compatible with the protection attribute passed to CreateFileMapping(). Setting the protection twice may seem redundant, but it gives the application more control over how the data is protected. MapViewOfFile() can map all or part of a file; the caller specifies the offset into the file and the length to map. The offset is a 64-bit value formed from the DWORD parameters dwFileOffsetHigh and dwFileOffsetLow, and it must be an integer multiple of the operating system's allocation granularity. On Windows the allocation granularity is 64 KB in practice, but it is safer to query the value for the current system with the following code:

SYSTEM_INFO sinf;
GetSystemInfo(&sinf);
DWORD dwAllocationGranularity = sinf.dwAllocationGranularity;

The dwNumberOfBytesToMap parameter specifies the mapping length of the data file. In particular, for Windows 9x, MapViewOfFile () returns NULL if it cannot find a region large enough to hold the entire file mapping object. But under Windows 2000, MapViewOfFile () only needs to find a region large enough for the necessary view, regardless of the size of the entire file mapping object.
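Because the view offset must be a multiple of the allocation granularity, reaching data at an arbitrary file position means rounding the offset down and skipping the difference inside the view. The following is a minimal portable sketch of that arithmetic. ViewRequest and alignView are hypothetical names, and the granularity is passed in as a plain number rather than read from GetSystemInfo().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a view that covers [offset, offset + length).
struct ViewRequest {
    std::uint64_t viewOffset;  // aligned file offset to pass to the mapping call
    std::uint64_t delta;       // bytes to skip from the view base to reach `offset`
    std::uint64_t mapLength;   // number of bytes to map (delta + length)
};

// Round `offset` down to a multiple of `granularity` and widen the view
// so that it still covers the requested bytes.
inline ViewRequest alignView(std::uint64_t offset, std::uint64_t length,
                             std::uint64_t granularity) {
    assert(granularity != 0);
    ViewRequest r;
    r.viewOffset = offset - (offset % granularity);
    r.delta = offset - r.viewOffset;
    r.mapLength = r.delta + length;
    return r;
}
```

With a granularity of 65536, a request for 10 bytes at file offset 100000 becomes a view at offset 65536 that is 34474 bytes long, and the requested data starts 34464 bytes into that view.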

When the view mapped into the process's address space is no longer needed, it is released with UnmapViewOfFile(), whose prototype is:

BOOL UnmapViewOfFile(LPCVOID lpBaseAddress);

The only parameter, lpBaseAddress, specifies the base address of the region to be released and must be the value returned by MapViewOfFile(). Every call to MapViewOfFile() must have a corresponding call to UnmapViewOfFile(); otherwise the reserved region is not released until the process terminates. In addition, the file kernel object and the file mapping kernel object created earlier by CreateFile() and CreateFileMapping() must be released with CloseHandle() before the process terminates, or resources will leak.
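In C++, one common way to make sure every acquisition is paired with its release is a scope guard that runs cleanup code automatically. The sketch below is a generic illustration, not part of the Win32 API: ScopeExit, makeScopeExit, and cleanupOrderDemo are hypothetical names, and the logged strings only mark where UnmapViewOfFile() and CloseHandle() would be called on Windows.

```cpp
#include <string>
#include <utility>
#include <vector>

// Generic scope guard: runs the stored callable when it goes out of scope.
template <typename F>
class ScopeExit {
public:
    explicit ScopeExit(F f) : f_(std::move(f)), active_(true) {}
    ScopeExit(ScopeExit&& other)
        : f_(std::move(other.f_)), active_(other.active_) {
        other.active_ = false;  // the moved-from guard must not fire
    }
    ~ScopeExit() { if (active_) f_(); }
    ScopeExit(const ScopeExit&) = delete;
    ScopeExit& operator=(const ScopeExit&) = delete;
private:
    F f_;
    bool active_;
};

template <typename F>
ScopeExit<F> makeScopeExit(F f) { return ScopeExit<F>(std::move(f)); }

// Stand-in for the acquire/use/release sequence of a mapped file.
inline std::vector<std::string> cleanupOrderDemo() {
    std::vector<std::string> log;
    {
        auto closeFile = makeScopeExit([&log] { log.push_back("close file"); });
        auto closeMapping = makeScopeExit([&log] { log.push_back("close mapping"); });
        auto unmapView = makeScopeExit([&log] { log.push_back("unmap view"); });
        log.push_back("use view");
    }  // guards run here, in reverse order of declaration
    return log;
}
```

Because the guards are destroyed in reverse order, the view is unmapped first and the handles are closed afterwards, even if the code in between returns early.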

In addition to these required API functions, there are other helper functions that you can use when using memory-mapped files. For example, when using memory-mapped files, the system caches the data pages of the file for speed and does not update the disk image of the file immediately while processing the file mapping view. To solve this problem, consider using the FlushViewOfFile () function, which forces the system to write some or all of the modified data back to the disk image, thereby ensuring that all data updates are saved to disk in a timely manner.

The sample application

The following example illustrates the use of memory-mapped files. The program receives data from a port and stores it on disk in real time. Because the amount of data is large (tens of GB), memory-mapped files are used for the processing. Below is the main code of the worker thread MainProc, which starts when the program runs. The event hEvent[0] is signaled when data arrives on the port, and WaitForMultipleObjects() waits for that event before the received data is saved to disk. When reception is terminated, the event hEvent[1] is signaled, and its handler is responsible for freeing resources and closing files. The thread function is implemented as follows:

// sinf is assumed to have been filled in earlier by GetSystemInfo(&sinf)

// Create a file kernel object whose handle is stored in hFile
HANDLE hFile = CreateFile("Recv1.zip", GENERIC_WRITE | GENERIC_READ, FILE_SHARE_READ, NULL, CREATE_ALWAYS, FILE_FLAG_SEQUENTIAL_SCAN, NULL);

// Create a file mapping kernel object whose handle is stored in hFileMapping
HANDLE hFileMapping = CreateFileMapping(hFile, NULL, PAGE_READWRITE, 0, 0x4000000, NULL);

// Release the file kernel object
CloseHandle(hFile);

// Set the size, offset and other parameters
__int64 qwFileSize = 0x4000000;
__int64 qwFileOffset = 0; // total number of bytes written so far
__int64 qwViewBase = 0; // file offset at which the current view starts
__int64 T = 600 * sinf.dwAllocationGranularity;
DWORD dwBytesInBlock = 1000 * sinf.dwAllocationGranularity;

// Map file data to the address space of the process
PBYTE pbFile = (PBYTE)MapViewOfFile(hFileMapping, FILE_MAP_ALL_ACCESS, (DWORD)(qwViewBase >> 32), (DWORD)(qwViewBase & 0xFFFFFFFF), dwBytesInBlock);

while (bLoop)
{
    // Wait for event hEvent[0] or event hEvent[1]
    DWORD ret = WaitForMultipleObjects(2, hEvent, FALSE, INFINITE);
    ret -= WAIT_OBJECT_0;
    switch (ret)
    {
    // Receive data event triggered
    case 0:
        // Receive data from the port and save it to the memory-mapped file;
        // the write position is relative to the start of the current view
        nReadLen = syio_Read(port[1], pbFile + (qwFileOffset - qwViewBase), QueueLen);
        qwFileOffset += nReadLen;
        // When the view is 60% full, create a new mapping view to prevent data overflow
        if (qwFileOffset > T)
        {
            T = qwFileOffset + 600 * sinf.dwAllocationGranularity;
            UnmapViewOfFile(pbFile);
            // A view offset must be a multiple of the allocation granularity
            qwViewBase = qwFileOffset - (qwFileOffset % sinf.dwAllocationGranularity);
            // Do not map beyond the end of the file mapping object
            if (qwViewBase + dwBytesInBlock > qwFileSize)
                dwBytesInBlock = (DWORD)(qwFileSize - qwViewBase);
            pbFile = (PBYTE)MapViewOfFile(hFileMapping, FILE_MAP_ALL_ACCESS, (DWORD)(qwViewBase >> 32), (DWORD)(qwViewBase & 0xFFFFFFFF), dwBytesInBlock);
        }
        break;
    // Terminate event triggered
    case 1:
        bLoop = FALSE;
        // Remove the file data mapping from the process's address space
        UnmapViewOfFile(pbFile);
        // Close the file mapping object
        CloseHandle(hFileMapping);
        break;
    }
}
…
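The bookkeeping behind a sliding view like the one in the loop above can be isolated from the Win32 calls. The following portable sketch assumes a hypothetical SlidingWindow structure (not taken from the original program) that tracks where the current view starts, where the next byte goes inside it, and when the view should be replaced.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bookkeeping for a view that slides forward through a large file.
struct SlidingWindow {
    std::uint64_t granularity;  // allocation granularity, e.g. 65536
    std::uint64_t threshold;    // remap once this many bytes of the view are used
    std::uint64_t viewBase;     // file offset where the current view starts
    std::uint64_t written;      // total bytes written to the file so far

    // Index into the current view at which the next byte should be written.
    std::uint64_t indexInView() const { return written - viewBase; }

    // Record `n` freshly written bytes. Returns true when the caller should
    // unmap the current view and map a new one starting at `viewBase`.
    bool advance(std::uint64_t n) {
        assert(granularity != 0);
        written += n;
        if (indexInView() <= threshold) return false;
        viewBase = written - (written % granularity);  // keep the offset aligned
        return true;
    }
};
```

Keeping the view base separate from the total byte count matters for two reasons: the write pointer must be relative to the start of the current view, and the offset handed to the mapping call must stay aligned to the allocation granularity.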

If the termination handler simply calls UnmapViewOfFile() and CloseHandle(), the file is left at the size of the file mapping object rather than the size of the data actually received. For example, if the memory-mapped file was opened at 30 GB and only 14 GB of data was received, the saved file is still 30 GB long after the program above finishes. After processing is complete, the file must therefore be restored to its actual size, again by means of a memory-mapped file. The main code for doing so is as follows:

// Create another file kernel object
hFile2 = CreateFile("Recv.zip", GENERIC_WRITE | GENERIC_READ, FILE_SHARE_READ, NULL, CREATE_ALWAYS, FILE_FLAG_SEQUENTIAL_SCAN, NULL);

// Create another file mapping kernel object with the actual data length
hFileMapping2 = CreateFileMapping(hFile2, NULL, PAGE_READWRITE, (DWORD)(qwFileOffset >> 32), (DWORD)(qwFileOffset & 0xFFFFFFFF), NULL);

// Close the file kernel object
CloseHandle(hFile2);

// Map file data to the address space of the process
pbFile2 = (PBYTE)MapViewOfFile(hFileMapping2, FILE_MAP_ALL_ACCESS, 0, 0, (SIZE_T)qwFileOffset);

// Remap the original file from its beginning so that all received data is visible
// (a single view of this size only fits if the process has enough address space)
UnmapViewOfFile(pbFile);
pbFile = (PBYTE)MapViewOfFile(hFileMapping, FILE_MAP_ALL_ACCESS, 0, 0, (SIZE_T)qwFileOffset);

// Copy data from the original memory-mapped file to this memory-mapped file
memcpy(pbFile2, pbFile, (size_t)qwFileOffset);

// Remove the file data mappings from the process's address space
UnmapViewOfFile(pbFile);
UnmapViewOfFile(pbFile2);

// Close the file mapping objects
CloseHandle(hFileMapping);
CloseHandle(hFileMapping2);

// Delete the temporary file
DeleteFile("Recv1.zip");