LemonNan: juejin.cn/post/684490…

Introduction

A disk is a storage device used to hold large amounts of data. Although it stores many times more data than RAM, the latency for reading from disk is on the order of milliseconds, about 100,000 times slower than reading from DRAM (typically main memory) and about a million times slower than reading from SRAM (CPU cache).

Main structure

Below is a rough schematic of a mechanical hard disk (the picture is a bit crude, but it will have to do).

Spindle, platter, drive arm, read/write head, track/sector

Each drive has a spindle in the middle that rotates at a fixed rate, and the platters rotate with it, usually at 5,400 to 15,000 RPM (my Seagate 1TB is 7,200 RPM). A drive usually contains several such platters.

Each platter consists of a set of concentric circles called tracks, and each track is divided into a set of sectors. Each sector holds the same amount of data (usually 512 bytes; here is some sample output):

// The following is from a CentOS 7.2 machine
[root@xxxxx ~]# fdisk -l
Disk /dev/vda: 42.9 GB, 42949672960 bytes, 83886080 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000d2717

Disk operations

Operating a disk involves several parts: the drive arm, the read/write head, and tracks/sectors.

Drive arm: the drive arm positions the read/write head over a track on the platter. This operation is called seeking and will come up again later.

Read/write head: located at the end of the drive arm, it reads and writes the data. Each platter surface in the drive has its own read/write head.

Track/sector:

Sector: a sector is a segment of a track. The disk reads and writes data in units of sectors (the sector is the smallest physical storage unit).

Track: each concentric circle on a platter is a track, and a track consists of multiple sectors plus the gaps between them (the outermost track is the longest, has the most sectors, and stores the most data).

An aside: one of Google's early search optimizations was to place hot data on the outer tracks of the disk. Because the platter spins at a constant angular velocity, the outer tracks cover the longest path in the same amount of time; in other words, the outer tracks deliver the most data.

Disk operation process

  • (Seek time) First, the physical location of the data to be read is determined, and the drive arm positions the read/write head over the corresponding track. This seek time is generally measured in milliseconds.

  • (Rotation time) Once the head is on the correct track, the drive waits for the platter to rotate until the first byte of the sector holding the data passes under the read/write head.

  • (Transfer time) The time spent actually reading or writing the data, usually the smallest of the three.

So the total time to access the disk is: seek time + rotation time + transfer time.

For disk I/O on mechanical hard disks, the usual ranking of the time components is: seek > rotation >> transfer.

Time

Seek time

Currently, the average seek time of hard disks is about 7.5 to 14 ms; in the calculations below we take Tavg_seek = 9 ms.

Average rotation time

Take my 7,200 RPM mechanical hard disk as an example: 7,200 revolutions per minute is 7200/60 = 120 revolutions per second, so one revolution takes 1/120 s. On average the platter needs to spin half a revolution, so the average rotation time is (1/120)/2 = 1/240 s, a little over 4 ms.
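Just for fun, here is the same arithmetic as a tiny Python sketch (the 7,200 RPM figure is simply my drive's spec; substitute your own):

rpm = 7200                                # spindle speed of my drive
rev_per_sec = rpm / 60                    # 120 revolutions per second
full_rotation_ms = 1000 / rev_per_sec     # ~8.33 ms for one full revolution
avg_rotation_ms = full_rotation_ms / 2    # on average half a revolution is needed: ~4.17 ms
print(f"average rotational latency: {avg_rotation_ms:.2f} ms")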

Data transfer time

The data transfer time is usually negligible compared with the first two, so I won't calculate it here (lazy).


Random I/O and sequential I/O

Random I/O

The random I/O process is just the disk operation process described above: every read or write requires the complete set of steps (seek, rotation, transfer), i.e. the read/write head must first be positioned over the corresponding track each time. When reads and writes are frequent, the head keeps jumping between inner and outer tracks, in short, repeated back-and-forth hopping (I feel like I've heard that phrase somewhere before). Most of the operation time is spent on the seek in the first step and the rotation in the second (only after the seek and rotation is the final position of the data determined).

Sequential I/O

Sequential I/O requires little or no seek time, since it basically operates on the same track or on adjacent tracks; with so few seeks, it is naturally more efficient than random I/O.

As for the subsequent transfer time, both types of I/O are the same.

How does random I/O become sequential I/O

Q: How do I convert random I/Os into sequential I/Os?

A: To answer that, you need to know the difference: random I/O and sequential I/O differ in the number of seeks and rotations. N random I/Os require N seeks and N rotations, while the best case for sequential I/O is a single seek and rotation, so its seek and rotation cost is far lower than that of N random I/Os. For sequential I/O to work, the data must be laid out contiguously on disk.

So data is best read and written as a whole: when the whole block of data is contiguous, sequential I/O can be achieved.
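Here is a rough back-of-the-envelope sketch of why this matters, using the seek and rotation figures from above; the 150 MB/s transfer rate and the 1,000-page workload are assumptions just for illustration:

# Reading 1,000 pages of 4 KB each, randomly vs sequentially (rough estimate).
seek_ms, rotation_ms = 9, 4.17        # figures from the sections above
transfer_mb_per_s = 150               # assumed sequential transfer rate
page_kb, pages = 4, 1000

transfer_ms = page_kb * pages / 1024 / transfer_mb_per_s * 1000  # ~26 ms in total
random_ms = pages * (seek_ms + rotation_ms) + transfer_ms        # one seek + rotation per page
sequential_ms = seek_ms + rotation_ms + transfer_ms              # one seek + rotation in total
print(f"random: {random_ms:.0f} ms, sequential: {sequential_ms:.0f} ms")

With these assumed numbers the random pattern lands around 13 seconds and the sequential one around 40 ms, which is the whole point of keeping data contiguous.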


Performance

The major indicators used to measure disks are IOPS and throughput

IOPS & throughput

IOPS (Input/Output operations Per Second) is the number of read/write operations per second, similar in spirit to QPS.

Disk IOPS = 1000 ms / (seek time + rotation time + transfer time).

Plugging in the figures above, seek time + rotation time + transfer time (ignored here) = 9 + 4 = 13 ms, so 1000/13 ≈ 76.
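As a quick sanity check, the same calculation in Python (transfer time ignored, as above):

seek_ms = 9        # average seek time from above
rotation_ms = 4    # average rotational latency, rounded as in the text
iops = 1000 / (seek_ms + rotation_ms)
print(f"theoretical single-drive IOPS: {int(iops)}")   # prints 76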

Throughput refers to the amount of data that can be transmitted successfully per unit of time.

Test

Mechanical drive

(Hoping to freeload some read/write performance figures, I opened the product page for my home computer's Seagate 7,200 RPM 1TB drive on a certain e-commerce site and asked customer service for a link to the read/write specs; they said the vendor hadn't given them any specific read/write performance parameters. Maybe I was expecting too much? Excuse me~)

So I booted into my own Windows machine, downloaded a benchmarking tool, and ran an I/O performance test; the results are below.

You can see that the Seq numbers are much higher than the non-Seq ones. Seq means sequential I/O and the rest is random I/O, which shows the performance gap: sequential I/O runs at around 150 MB/s, while random I/O sits between 0.5 and 1.5 MB/s.

A few more tests might make a difference, but not by much.
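If you want to reproduce something similar without a GUI tool, here is a minimal Python sketch of the idea. testfile.bin is a placeholder for a large existing file on the disk under test, and note that the OS page cache and read-ahead (covered later) will inflate the numbers unless the file is much larger than RAM or the caches are dropped first:

import os, random, time

PATH = "testfile.bin"      # placeholder: a large file on the disk being tested
BLOCK = 4096               # read in 4 KB blocks
blocks = os.path.getsize(PATH) // BLOCK
n = min(10000, blocks)     # number of blocks to read in each pattern

def read_blocks(offsets):
    with open(PATH, "rb", buffering=0) as f:   # unbuffered at the Python level
        start = time.perf_counter()
        for off in offsets:
            f.seek(off)
            f.read(BLOCK)
        return time.perf_counter() - start

seq = [i * BLOCK for i in range(n)]                          # contiguous blocks
rnd = [random.randrange(blocks) * BLOCK for _ in range(n)]   # scattered blocks
print("sequential:", read_blocks(seq), "s")
print("random:    ", read_blocks(rnd), "s")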

Solid state drives

Next up is the solid-state drive that holds drive C on my Windows PC.

The data shows that the solid-state drive is far ahead in almost every case, because it is physically different from a mechanical drive: solid-state drives are built from semiconductor memory and have no moving parts, so access is much faster than on a mechanical drive.


Optimizations in the OS (Linux, in this case)

Let me start with a principle in computers, the principle of locality.

The principle of locality: when a piece of data is used, it is more likely that neighboring data will be used later.

Based on this principle, Linux makes some optimizations to disk I/O.

Read-ahead

When data is read from disk for the first time, for example only page1 (4 KB), Linux reads the following three pages (page2, 3, 4) along with it and stores all of them in the page cache; this first trigger is a synchronous read-ahead, meaning the upper layer has to wait for all four pages to be read, and page2, 3, and 4 are marked as read-ahead pages. Later, when the upper layer actually accesses one of the marked pages (say page2), an asynchronous read-ahead is triggered: page5~page8 are read into the page cache and marked in the same way, and this asynchronous prefetch process then keeps repeating.
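On Linux you can also hint the kernel about your access pattern so that this read-ahead behaviour is adjusted accordingly; below is a minimal sketch (Linux-only, Python 3.3+, datafile.bin is a placeholder path):

import os

fd = os.open("datafile.bin", os.O_RDONLY)              # placeholder path

# Declare sequential access: the kernel enlarges the read-ahead window.
os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)

# Or declare random access instead, so pages are not prefetched in vain:
# os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)

data = os.read(fd, 4096)   # read the first 4 KB page
os.close(fd)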

This reminds me that a page in MySQL defaults to 16 KB. Is that a coincidence?

That is the read-ahead case; there is also the write-behind case.

Write-behind

The OS does not write to disk every time it receives an I/O request; if it did, efficiency would be as poor as in the random I/O test above. To optimize this, the OS merges writes before they reach the disk: adjacent write requests are combined so that at least partially sequential I/O can be performed, which greatly improves write efficiency. Thanks to this OS optimization, the effective IOPS ends up higher than the figure calculated earlier.
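A small sketch of this deferred-write behaviour (log.bin is a placeholder path): a plain write() returns once the data sits in the page cache, and the kernel flushes and merges the dirty pages later; fsync() is what forces them onto the disk right away.

import os

fd = os.open("log.bin", os.O_WRONLY | os.O_CREAT, 0o644)   # placeholder path

for _ in range(1000):
    os.write(fd, b"x" * 512)   # returns when the data is in the page cache,
                               # not when it is physically on the platter

os.fsync(fd)                   # force the kernel to flush the dirty pages now;
                               # otherwise the OS flushes (and merges) them later
os.close(fd)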

Kafka appends each Segment to its corresponding partition, maintaining a skip-list-like Segment structure. Segment data is read and written sequentially, and partition data is also read and written sequentially; this is one of the reasons Kafka can handle throughput on the order of 100,000 messages per second.

Batching writes is beneficial in most cases; without this OS optimization, even everyday CRUD-style writes would not perform well.


Final words

That wraps up this write-up on disk I/O. (Anyway, the bugs lately have been getting to me a bit.)