An explanation of the sync, fsync, fdatasync, and fflush functions
1. Explanation of terminology
Dirty pages: Because hard disks read and write far more slowly than memory, the Linux kernel keeps frequently read and written data in memory ahead of time to speed up access. This is called the cache, and Linux manages it in units of pages (the page cache). When a process modifies data in the cache, the kernel marks the page as dirty. The kernel writes dirty pages to disk at an appropriate time so that the data in the cache stays consistent with the data on disk.
Memory mapping: A memory-mapped file is a mapping from a file to a region of memory. Win32 provides a function (CreateFileMapping) that lets an application map a file into a process. Memory-mapped files resemble virtual memory: a memory-mapped file reserves a region of address space and commits physical storage to that region, with the physical storage coming from a file that already exists on disk, and the file must be mapped before it can be operated on. Once mapped, a file stored on disk can be processed without explicit file I/O calls, which makes memory-mapped files especially useful for files with a large amount of data.
Delayed write: Traditional UNIX implementations have a buffer cache or page cache in the kernel, and most disk I/O goes through it. When data is written to a file, the kernel usually first copies the data into one of its buffers. If the buffer is not yet full, it is not placed on the output queue; the kernel waits until the buffer fills up, or until it needs to reuse the buffer for other disk block data, and only then queues the buffer for output. The actual I/O is performed when the buffer reaches the head of the queue. This type of output is called delayed write. (Excerpted from Advanced Programming in the UNIX Environment, 3rd edition, p. 65)
2. Main text
Delayed write reduces the number of disk reads and writes, but it also slows down the rate at which file contents are updated: data written to a file may not reach the disk for some time. If the system fails during that time, the file updates can be lost. UNIX provides the sync, fsync, and fdatasync functions to ensure that the actual file system on disk is consistent with the contents of the buffer cache.
2.1 sync function
The sync function simply queues all modified block buffers for writing and then returns; it does not wait for the actual disk writes to finish. A system daemon, usually called update, calls sync periodically (typically every 30 seconds), which ensures that the kernel's block buffers are flushed regularly. The sync(1) command also calls the sync function.
2.2 fsync function
The fsync function acts only on the single file specified by the file descriptor filedes, and it returns only after the disk writes have finished. fsync suits applications such as databases that need to be sure modified blocks have been written to disk.
2.3 fdatasync function
The fdatasync function is similar to fsync, but it affects only the data portion of the file, whereas fsync also synchronously updates the file's attributes. For a database that supports transactions, a transaction may be considered successful and reported back to the application layer only after its transaction log (all of the transaction's modifications plus a commit record) has been completely written to disk.
2.4 fflush function
Standard I/O functions (fread, fwrite, and so on) keep a buffer in user-space memory. fflush flushes that buffer and writes its contents to the kernel buffer; to get the data onto the disk you must then call fsync. (That is, call fflush first and then fsync; fsync alone cannot see data that is still held in the standard I/O buffer.) fflush takes a file stream (a FILE * obtained from a function such as fopen) and returns as soon as the data has moved from the user-space buffer to the kernel buffer, so on its own it is less safe than fsync. Use int fileno(FILE *stream); to convert the file stream (fp) to a file descriptor (fd) so that fsync can be called on it.
How, then, can data be written correctly to permanent external storage on a Linux system?
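The two-step sequence described above (fflush, then fsync on the descriptor returned by fileno) might look like the following minimal sketch. The helper name flush_stream_to_disk is hypothetical and is not part of any library.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: push a stdio stream's data all the way to the
 * storage device. Returns 0 on success, -1 on failure. */
int flush_stream_to_disk(FILE *fp)
{
    /* Step 1: move data from the user-space stdio buffer to the kernel. */
    if (fflush(fp) == EOF)
        return -1;

    /* Step 2: get the underlying descriptor and ask the kernel to write
     * its cached pages for this file to the device. */
    int fd = fileno(fp);
    if (fd == -1)
        return -1;

    return fsync(fd);
}
```

A caller would typically invoke flush_stream_to_disk(fp) after the last fwrite or fputs of a logical update and before fclose, and treat a return value of -1 as "the data may not be on disk".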
2.5 write is not enough: fsync is required
It is easy to assume that once write returns, the data has been written to the file. In general, however, a write to a file on a hard disk (or other persistent storage device) only updates the page cache in memory. Dirty pages are not written to the disk immediately; the operating system schedules that work. For example, the flusher kernel threads synchronize dirty pages to the disk (that is, place them in the device's I/O request queue) when certain conditions are met (a certain interval has elapsed, or dirty pages in memory reach a certain proportion). Because write returns without waiting for the disk I/O to complete, data can be lost if the operating system crashes after write returns but before the dirty pages reach the disk. The time window is small, but the “loose asynchronous semantics” of write() are not enough for a database program that must guarantee the durability and consistency of transactions. Such programs usually rely on the synchronized I/O primitives provided by the operating system:
Function prototype:
2.6 int fsync (int fd);
fsync ensures that all modifications to the file referred to by fd have been correctly synchronized to the hard disk. The call blocks until the device reports that the I/O has completed.
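As a hedged illustration of pairing write with fsync, the sketch below writes a buffer and reports success only after fsync has returned. The helper name durable_write, the open flags, and the file mode are assumptions chosen for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path and return only after
 * fsync() reports success. Returns 0 on success, -1 on error. */
int durable_write(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);   /* may write fewer bytes than asked */
        if (n == -1) {
            if (errno == EINTR)
                continue;                /* interrupted: retry */
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* write() only filled the page cache; fsync() blocks until the
     * device reports the data and metadata as written. */
    if (fsync(fd) == -1) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

The loop is there because write may transfer fewer bytes than requested; calling fsync after a partial write would make only part of the record durable.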
If you use mmap to map the page cache of a file to the address space of the process and modify the file by writing to the memory, there are similar system calls to ensure that the changes are fully synchronized to the hard disk:
#include <sys/mman.h>
2.7 int msync(void *addr, size_t length, int flags);
msync requires the caller to specify the address range to synchronize. Such fine-grained control may seem more efficient than fsync (because an application usually knows where its dirty pages are), but the kernel has very efficient data structures for finding a file's dirty pages quickly, so fsync also writes back only the modified parts of the file.
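The mmap and msync combination described above could be sketched as follows. The helper name mmap_update is hypothetical, and the choice of MS_SYNC (block until the write-back finishes) is an assumption made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: resize a file to len bytes, fill it through a
 * shared mapping, and synchronize the range with msync().
 * Returns 0 on success, -1 on error. */
int mmap_update(const char *path, const void *data, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd == -1)
        return -1;

    /* The file must be at least len bytes long before it is mapped. */
    if (len == 0 || ftruncate(fd, (off_t)len) == -1) {
        close(fd);
        return -1;
    }

    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (addr == MAP_FAILED)
        return -1;

    memcpy(addr, data, len);        /* dirties pages in the page cache */

    /* MS_SYNC blocks until the mapped range has been written back. */
    int rc = msync(addr, len, MS_SYNC);
    if (munmap(addr, len) == -1)
        rc = -1;
    return rc;
}
```

MAP_SHARED matters here: with a private mapping the modifications would never be carried through to the file, so there would be nothing for msync to write back.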
2.8 Difference between fsync and fdatasync
In addition to file data, fsync also synchronizes the file's metadata (size, access time, and so on). Data and metadata are usually stored in different locations on the disk, so fsync requires at least two I/O write operations. According to Wikipedia, the average seek time of current hard drives is about 3 to 15 ms, and the average rotational latency of a 7200 RPM drive is about 4 ms, so one I/O operation takes roughly 10 ms. POSIX also defines fdatasync, which relaxes the synchronization semantics to improve performance:
2.9 int fdatasync (int fd);
fdatasync works like fsync, but it synchronizes metadata only when necessary, which can save one I/O write. “fdatasync does not flush modified metadata unless that metadata is needed in order to allow a subsequent data retrieval to be correctly handled.”
(Excerpt from the man page)
For example, a change in file size must be synchronized immediately. Otherwise, if the OS crashes, the modified content still cannot be read even though the file data itself was synchronized, because the metadata was not. The last access time (atime) and modification time (mtime) do not need to be synchronized on every write; as long as the application has no strict requirements on these two timestamps, this causes essentially no problems.
Note: The O_SYNC and O_DSYNC flags of the open function have meanings similar to fsync and fdatasync: each write blocks until the disk I/O completes.
O_SYNC makes each write wait for the physical I/O to complete, including the I/O needed to update the file attributes changed by that write. O_DSYNC also makes each write wait for the physical I/O to complete, but it does not wait for file attribute updates that do not affect reading back the data just written. The difference is subtle: with O_DSYNC, file attributes are updated synchronously only when they must change to reflect a change in the file data (for example, updating the file size because more data was appended), whereas with O_SYNC data and attributes are always updated synchronously. When a file opened with O_DSYNC has part of its existing content overwritten, the file times are not updated synchronously. By contrast, when a file is opened with O_SYNC, every write updates the file times before write returns, whether it overwrites existing bytes or appends to the file. These flags are less flexible than calling fsync/fdatasync and should be used sparingly.
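To make the flag-based approach concrete, here is a small sketch of opening a file with O_DSYNC so that each write carries its own synchronization. The helper names open_dsync and append_record are hypothetical, and treating a short write as an error is a simplification made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open a file so that each write() returns only
 * after the data (and any metadata needed to read it back) has reached
 * the device. Returns the descriptor, or -1 on error. */
int open_dsync(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND | O_DSYNC, 0644);
}

/* Hypothetical helper: append one record. With O_DSYNC no separate
 * fdatasync() call follows. A short write is reported as an error. */
int append_record(int fd, const void *rec, size_t len)
{
    ssize_t n = write(fd, rec, len);
    return (n == (ssize_t)len) ? 0 : -1;
}
```

Because every write on such a descriptor pays the synchronization cost, a program that issues several writes per logical update may prefer plain writes followed by a single fdatasync call.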
2.10 Optimizing log synchronization with fdatasync (from Blog.csdn.net/cywosp/arti…)
To meet transaction requirements, a database's log file usually needs synchronous I/O. Because a commit must wait for the disk I/O to complete, committing a transaction is often time-consuming and becomes a performance bottleneck. In Berkeley DB, with AUTO_COMMIT enabled (every independent write operation automatically gets transaction semantics) and the default synchronization level (a write does not return until the log has been fully synchronized to disk), writing one record takes about 5 to 10 ms, roughly the cost of one I/O operation (10 ms). We already know that fsync is inefficient here. To use fdatasync and avoid the metadata update, however, the file size must not change across the write. A log file is append-only by nature and keeps growing, which seems to make it hard to take advantage of fdatasync. Berkeley DB handles its log files as follows:
1. Each log file has a fixed size of 10MB. Files are numbered from 1, and the name format is log.%010d.
2. When a log file is created, its last page is written first, which extends the file to 10MB.
3. Appending records to the log file no longer changes the file size, so fdatasync can greatly improve log write efficiency.
4. When a log file is full, a new one is created, at the cost of just one metadata synchronization.
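The log-file scheme described above can be approximated with the sketch below. It is not Berkeley DB's actual code: the helper names log_create and log_append are hypothetical, and writing a single byte at the end of the file stands in for writing the last page. On many file systems this produces a sparse file, so later writes may still trigger block-allocation metadata updates; the sketch only illustrates keeping the file size constant.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

#define LOG_SIZE (10L * 1024 * 1024)   /* fixed 10MB log file size */

/* Hypothetical helper: create a log file and extend it to its final
 * size up front, paying the metadata synchronization cost once with
 * fsync(). Returns the descriptor, or -1 on error. */
int log_create(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0644);
    if (fd == -1)
        return -1;

    /* Writing the last byte extends the file to LOG_SIZE. */
    char zero = 0;
    if (pwrite(fd, &zero, 1, LOG_SIZE - 1) != 1 || fsync(fd) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: write a record at offset inside the preallocated
 * file. The file size does not change, so fdatasync() does not need to
 * flush size metadata. Returns 0 on success, -1 on error or log full. */
int log_append(int fd, off_t offset, const void *rec, size_t len)
{
    if (offset < 0 || (off_t)len > LOG_SIZE - offset)
        return -1;                     /* log full: caller starts a new file */
    if (pwrite(fd, rec, len, offset) != (ssize_t)len)
        return -1;
    return fdatasync(fd);
}
```

The caller is expected to track the next free offset itself and to switch to a freshly created file once log_append reports that the record no longer fits.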
3. Summary
1. To flush all modified buffers, use sync. Note that this function only queues the writes and then returns.
2. To commit the changes made to an open file to disk, call fsync. This function returns only after the data has actually been written to disk, so it is the safest and most reliable choice.
3. If you are working with an open file stream, first call fflush to push the changes to the kernel buffer, then call fsync to actually synchronize them to the hard disk.
Appendix: fsync and fdatasync in the man page
fsync() transfers (“flushes”) all modified in-core data of (i.e., modified buffer cache pages for) the file referred to by the file descriptor fd to the disk device (or other permanent storage device) so that all changed information can be retrieved even after the system crashed or was rebooted. This includes writing through or flushing a disk cache if present. The call blocks until the device reports that the transfer has completed. It also flushes metadata information associated with the file (see stat(2)).
Calling fsync() does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an explicit fsync() on a file descriptor for the directory is also needed.
fdatasync() is similar to fsync(), but does not flush modified metadata unless that metadata is needed in order to allow a subsequent data retrieval to be correctly handled. For example, changes to st_atime or st_mtime (respectively, time of last access and time of last modification; see stat(2)) do not require flushing because they are not necessary for a subsequent data read to be handled correctly. On the other hand, a change to the file size (st_size, as made by say ftruncate(2)), would require a metadata flush.
The aim of fdatasync() is to reduce disk activity for applications that do not require all metadata to be synchronized with the disk.
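The man page's remark about directory entries can be illustrated with a short sketch that opens the containing directory and calls fsync on it. The helper name fsync_dir is hypothetical, and some file systems may not support fsync on a directory, in which case the call returns an error.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: fsync() a directory so that a newly created or
 * renamed entry inside it is itself made durable.
 * Returns 0 on success, -1 on error. */
int fsync_dir(const char *dirpath)
{
    int dfd = open(dirpath, O_RDONLY | O_DIRECTORY);
    if (dfd == -1)
        return -1;

    int rc = fsync(dfd);
    if (close(dfd) == -1)
        rc = -1;
    return rc;
}
```

After creating a file and calling fsync on its descriptor, a program that needs the file to survive a crash would also call fsync_dir on the parent directory.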