This is the 29th day of my participation in the August Wenwen Challenge.
One, foreword
In our last article, “Is Data Safe after Linux I/O Fsync (Fsync, Fwrite, Fflush, Mmap, Write Barriers)”, we learned how to write data safely to disk in Linux. That article mentioned direct IO, which sets up a direct channel between the disk and user space so that a program can work with on-disk data directly from user space.
In this article, we will study direct IO in depth. Before we start, we can think about the following questions and see if they can be answered correctly.
1. What is direct IO?
2. When can direct IO be used?
3. Why do we need direct IO?
4. What are the advantages and disadvantages of direct IO?
5. In direct IO, is there any cache between the application Buffer and disk?
Two, direct IO
1. What is direct IO
Straightforward explanation: direct IO sets up a channel between the application Buffer and the disk that bypasses most of the kernel's caching. Reads and writes then skip the extra copy through the kernel page cache, which reduces the number of data copies and can improve efficiency.
The official explanation (the description of the O_DIRECT flag in the Linux open(2) man page) makes two points:
A. Caching between the application layer and the disk is not eliminated entirely; it is only kept to a minimum.
B. O_DIRECT on its own does not provide the guarantees of O_SYNC.
2. How to use direct IO
Using direct IO is as simple as adding the O_DIRECT flag when opening the file. Java has traditionally not exposed direct IO through its standard file APIs (JDK 10 and later add com.sun.nio.file.ExtendedOpenOption.DIRECT). A Java program can also use direct IO by pulling in the JNA package, calling the underlying Linux system calls directly to operate on the file, and specifying the O_DIRECT flag when it opens the file. Ignite makes good use of direct IO, and you can refer to its code for concrete usage examples (search for the official Ignite source download). The code is in the ignite-direct-io package.
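To make the O_DIRECT flag concrete, here is a minimal sketch in Python rather than Java, since Python exposes the flag as os.O_DIRECT on Linux. The helper names open_direct and write_one_block and the 4096-byte BLOCK constant are assumptions made for illustration. The sketch falls back to a normal open when the filesystem rejects O_DIRECT, and it uses an anonymous mmap as the buffer because such mappings are page-aligned.

```python
import errno
import mmap
import os
import tempfile

BLOCK = 4096  # assumed block size; the real value depends on the device


def open_direct(path, flags, mode=0o644):
    """Open path with O_DIRECT when possible; return (fd, used_direct)."""
    o_direct = getattr(os, "O_DIRECT", 0)  # 0 on platforms without the flag
    if o_direct:
        try:
            return os.open(path, flags | o_direct, mode), True
        except OSError as exc:
            # Some filesystems reject O_DIRECT with EINVAL; fall back below.
            if exc.errno != errno.EINVAL:
                raise
    return os.open(path, flags, mode), False


def write_one_block(path, payload):
    """Write payload (at most BLOCK bytes) as one zero-padded, aligned block."""
    buf = mmap.mmap(-1, BLOCK)  # anonymous mapping: page-aligned, zero-filled
    buf[:len(payload)] = payload
    fd, used_direct = open_direct(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    try:
        os.write(fd, buf)  # the buffer address and the length are both aligned
    finally:
        os.close(fd)
        buf.close()
    return used_direct


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "direct_demo.bin")
    print("O_DIRECT used:", write_one_block(demo_path, b"hello direct io"))
```

The file always ends up exactly one block long, because the payload is padded with zero bytes to satisfy the size requirement of direct IO.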
3. Precautions for using direct IO
Used carelessly, direct IO often leads to performance degradation. Because the kernel caches very little data, every read and write goes straight to the file on disk. You therefore need to cache data at the application layer and organize reads and writes sensibly (for example, by batching small writes), otherwise performance deteriorates sharply.
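As a rough illustration of caching at the application layer, the sketch below collects small writes in memory and hands data onward only in whole blocks. The class name BlockWriteBuffer, the sink callable, and the 4096-byte block size are all assumptions for illustration; a real direct IO sink would also have to copy each chunk into an aligned buffer before writing it.

```python
BLOCK = 4096  # assumed block size


class BlockWriteBuffer:
    """Collects small writes and passes only whole blocks to `sink`."""

    def __init__(self, sink, block=BLOCK):
        self._sink = sink            # hypothetical callable taking one bytes object
        self._block = block
        self._pending = bytearray()  # data not yet handed to the sink

    def write(self, data):
        """Buffer data; emit as many complete blocks as are available."""
        self._pending += data
        whole = len(self._pending) - len(self._pending) % self._block
        if whole:
            self._sink(bytes(self._pending[:whole]))
            del self._pending[:whole]

    def flush(self):
        """Zero-pad the remaining tail to a full block and emit it."""
        if self._pending:
            pad = -len(self._pending) % self._block
            self._sink(bytes(self._pending) + bytes(pad))
            self._pending.clear()


if __name__ == "__main__":
    chunks = []
    buffer = BlockWriteBuffer(chunks.append)
    for _ in range(50):
        buffer.write(b"x" * 100)  # 50 small writes, 5000 bytes in total
    buffer.flush()
    print([len(chunk) for chunk in chunks])
```

With this kind of buffering, fifty 100-byte writes turn into two block-sized writes instead of fifty trips to the disk.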
In addition, direct IO requires reads and writes to be done in units of blocks: the size of each read or write, and in general also the file offset and the buffer address, must be an integer multiple of the device's block size, and in practice of the Linux page size. So we typically read and write data in 4K chunks.
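The alignment arithmetic can be sketched as follows. The helper names align_down, align_up, and aligned_range are hypothetical, and the 4096-byte block size is the 4K assumption from the text, not a value queried from a device.

```python
BLOCK = 4096  # the 4K chunk size assumed in the text


def align_down(n, block=BLOCK):
    """Largest multiple of block that is <= n."""
    return n - n % block


def align_up(n, block=BLOCK):
    """Smallest multiple of block that is >= n."""
    return n + (-n % block)


def aligned_range(offset, length, block=BLOCK):
    """Return (start, size) of the smallest block-aligned range covering the request."""
    start = align_down(offset, block)
    end = align_up(offset + length, block)
    return start, end - start


if __name__ == "__main__":
    # A 100-byte read at offset 5000 becomes one whole block starting at 4096.
    print(aligned_range(5000, 100))
```

An application would issue the aligned read and then slice the bytes it actually wanted out of the result.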
4. Advantages and disadvantages of direct IO
A. Advantages
a. The application layer works with the disk directly, avoiding the extra data copy through the kernel page cache, which can make IO faster.
b. Data is cached directly in the application layer, so applications can manage it more flexibly.
B. Disadvantages
a. The system caches almost no data, so the application must organize its reads and writes sensibly, otherwise performance will be poor.
b. All caching is controlled by the application layer, which makes the application more complex to implement and demands more of its developers.
c. O_DIRECT alone does not guarantee that data has been persisted to disk when each write returns. You therefore need to set the O_SYNC flag or call fsync to synchronize the data.
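The last disadvantage can be sketched like this: the write is made durable either by adding O_SYNC at open time or by calling fsync after the write. The function name durable_block_write and the 4096-byte BLOCK are assumptions for illustration, and the sketch falls back to a normal open if the filesystem rejects O_DIRECT.

```python
import errno
import mmap
import os
import tempfile

BLOCK = 4096  # assumed block size


def durable_block_write(path, payload, use_o_sync=False):
    """Write one aligned block, then make it durable via O_SYNC or fsync."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if use_o_sync:
        flags |= os.O_SYNC  # each write waits until the data is synchronized
    buf = mmap.mmap(-1, BLOCK)  # page-aligned, zero-filled buffer
    buf[:len(payload)] = payload
    o_direct = getattr(os, "O_DIRECT", 0)  # 0 where the flag is unavailable
    try:
        fd = os.open(path, flags | o_direct, 0o644)
    except OSError as exc:
        if exc.errno != errno.EINVAL:
            raise
        fd = os.open(path, flags, 0o644)  # fallback without O_DIRECT
    try:
        os.write(fd, buf)
        if not use_o_sync:
            os.fsync(fd)  # explicit synchronization when O_SYNC is not set
    finally:
        os.close(fd)
        buf.close()


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    durable_block_write(os.path.join(demo_dir, "fsync.bin"), b"log entry")
    durable_block_write(os.path.join(demo_dir, "osync.bin"), b"log entry", use_o_sync=True)
```

Calling fsync once after a batch of writes is usually cheaper than O_SYNC on every write, at the cost of having to remember to call it.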
5. Problem solving
That covers direct IO. Finally, let's answer the questions raised at the beginning.
1. What is direct IO?
Answered in the text above.
2. When can direct IO be used?
Usually when you need to control all data caching yourself and want more flexibility in how file data is written out, for example in databases and synchronous log systems.
3. Why do we need direct IO?
See its advantages above.
4. What are the advantages and disadvantages of direct IO?
Answered in the text above.
5. In direct IO, is there any cache between the application Buffer and disk?
There’s a cache, but it’s small.
One final question: since direct IO still has a cache between the application Buffer and the disk, where does this cache reside? The C library buffer? The kernel page cache?
Three, practice
If you like this article or find it helpful, please support it with a like, a bookmark, and a share. Thank you very much.
If you have any questions or comments about this article, please add lifeofCoder.