
To learn more about Kafka, check out this column: juejin.cn/column/6996…

This article is based on my understanding of how Kafka works. One could say that sequential disk writes are faster than random memory reads, but at the end of the day, Kafka's performance optimizations still come down to making good use of memory.

Kafka is a message queue that supports reading and writing large amounts of data. Kafka is implemented in Scala and Java, both of which run on the JVM. If data were stored in memory, that is, on the JVM heap, Kafka would need a very large heap to support that volume of reads and writes, which would lead to frequent GC pauses that hurt performance. With these considerations in mind, Kafka stores its data on disk.

Kafka file storage format

Messages in Kafka are categorized by topic: producers produce messages and consumers consume messages, and both are topic-oriented. The topic storage structure is shown below:

Messages generated by producers are continuously appended to the end of log files. To prevent inefficient data lookup as log files grow large, Kafka uses segmentation and indexing: each partition is divided into multiple segments, and each segment corresponds to two files, an .index file and a .log file.

Naming rule for partition folders

A partition folder is named after the topic plus the partition number. For example, a topic named kafka with three partitions has the following folders:

kafka-0
kafka-1
kafka-2

Naming rules for index and log files

1) The first segment in a partition folder starts from offset 0. Each subsequent segment file is named after the offset of the last message in the previous segment plus 1, which is the offset of the first message in the current segment.

2) The offset is a 64-bit long value, so the file name is 19 characters long, and positions without digits are padded with zeros.

For example, there are three pairs of files:

0000000000000000000.log
0000000000000000000.index
0000000000000002584.log
0000000000000002584.index
0000000000000006857.log
0000000000000006857.index
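As a minimal sketch (not Kafka's actual source), zero-padded segment file names following the 19-digit convention described above could be produced like this:

```java
// A small, hypothetical helper: format a base offset into a zero-padded
// segment file name, following the 19-digit convention described above.
public class SegmentNames {
    static String logFileName(long baseOffset) {
        return String.format("%019d.log", baseOffset);
    }

    public static void main(String[] args) {
        System.out.println(logFileName(0L));    // 0000000000000000000.log
        System.out.println(logFileName(2584L)); // 0000000000000002584.log
    }
}
```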

Take the second pair of files as an example and look at the corresponding data structure:

Note that the .index file is a sparse index: it does not contain an entry for every message, only for some of them, which is why a lookup may need to scan forward in the log.

Message lookup process:

Suppose we want to find message-2589, whose offset is 2589:

1) Locate the segment file 0000000000000002584 (since 2584 ≤ 2589 < 6857).

2) Compute the offset of the message relative to the segment's base: 2589 - 2584 = 5. Look for 5 in the first column of the index file. If an exact entry is found, take its position and read the message at that position in the log file. If it is not found, as in this example, take the entry with the largest relative offset that does not exceed 5, which is 3 with position 246, then go to the log file and scan forward from position 246.
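To make the lookup concrete, here is a minimal sketch of the process described above. The IndexEntry type and the in-memory list are simplifications; a real broker reads the .index file itself rather than a Java list:

```java
import java.util.List;

// A simplified sketch of the sparse-index lookup described above.
public class SparseIndexLookup {

    // One entry of a sparse .index file: relative offset -> physical position in the .log file.
    record IndexEntry(long relativeOffset, long position) {}

    /**
     * Find the physical position at which to start scanning the .log file
     * for the message with the given absolute offset.
     */
    static long lookupPosition(List<IndexEntry> index, long baseOffset, long targetOffset) {
        long relative = targetOffset - baseOffset;   // e.g. 2589 - 2584 = 5

        // Binary search for the entry with the largest relative offset <= relative.
        int lo = 0, hi = index.size() - 1, best = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (index.get(mid).relativeOffset() <= relative) {
                best = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        // If no entry qualifies, scan from the beginning of the segment.
        return best == -1 ? 0L : index.get(best).position();
    }

    public static void main(String[] args) {
        List<IndexEntry> index = List.of(
                new IndexEntry(1, 0), new IndexEntry(3, 246), new IndexEntry(6, 1407));
        // Nearest entry <= 5 is (3, 246): scan the .log file forward from position 246.
        System.out.println(lookupPosition(index, 2584, 2589)); // 246
    }
}
```

From the returned position (246 in this example), the broker scans the log forward until it reaches the message with offset 2589.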

How Kafka makes disk faster than memory

Kafka stores data on disk, yet it can be faster than a memory-based design. The following techniques make this possible.

Sequential disk writes

When looking at the storage structure earlier, we saw that Kafka appends to the end of the log, that is, it writes sequentially. Because of the mechanics of a disk, sequential writes are much faster: they save most of the time otherwise spent seeking the disk head.
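A rough illustration of append-only writing with Java NIO (not Kafka's actual log code): every write lands at the current end of the file, so the disk head never has to seek backwards.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// A minimal sketch of an append-only log file.
public class AppendOnlyLog {
    public static void main(String[] args) throws IOException {
        try (FileChannel log = FileChannel.open(Path.of("0000000000000000000.log"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            for (int i = 0; i < 3; i++) {
                ByteBuffer msg = ByteBuffer.wrap(("message-" + i + "\n").getBytes(StandardCharsets.UTF_8));
                log.write(msg); // always appended at the end of the file
            }
        }
    }
}
```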

MMAP (Memory Mapped Files)

Mmap, simply put, maps a disk file into memory so that users can modify the file by modifying memory.

Even with sequential writes, disk reads and writes are still much slower than memory, but the operating system helps here. On Linux, the OS reads data from disk into memory in units called pages (the page cache). Reads and writes first go through these in-memory pages; when the number of dirty pages reaches a certain threshold, the OS flushes them, writing the data from memory to disk.

Problem: this is unreliable. Data written through mmap is not actually on disk; the operating system only writes it to the hard disk when a flush happens.

Kafka provides a parameter to control whether it actively flushes: if Kafka flushes immediately after writing to the mmap and only then returns to the producer, that is sync mode; if it returns to the producer immediately after writing to the mmap without calling flush, that is async mode.
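A minimal sketch of writing through mmap in Java (hypothetical file name, not Kafka's code): force() plays the role of the explicit flush in sync mode, and omitting it leaves flushing to the OS as in async mode.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Writes go into the mapped page cache; force() explicitly flushes dirty pages to disk.
public class MmapWrite {
    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Path.of("mmap-demo.log"),
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            buf.put("hello kafka".getBytes(StandardCharsets.UTF_8)); // lands in the page cache
            buf.force(); // sync: flush to disk now; omit this call for async behaviour
        }
    }
}
```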

Zero copy technology

Zero copy does not eliminate copying entirely; it reduces the number of unnecessary copies. It is typically used on I/O read and write paths.

Traditional IO process

As shown in the figure above, the traditional path involves four copies:

1) Data is copied from disk into the kernel-space read buffer;

2) from the kernel-space read buffer into the user-space application buffer;

3) from the user-space buffer into the kernel-space socket buffer;

4) from the socket buffer to the NIC buffer (network interface card).
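For comparison, here is a minimal sketch of that traditional copy path in Java (the host and port are hypothetical). The read() and write() calls are where the two copies between kernel and user space happen:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;

// Traditional read-then-send: the data passes through a user-space buffer.
public class TraditionalCopy {
    public static void main(String[] args) throws IOException {
        byte[] buf = new byte[8192]; // user-space application buffer
        try (FileInputStream in = new FileInputStream("0000000000000000000.log");
             Socket socket = new Socket("broker.example.com", 9092);
             OutputStream out = socket.getOutputStream()) {
            int n;
            while ((n = in.read(buf)) != -1) {   // copy 2: kernel read buffer -> user buffer
                out.write(buf, 0, n);            // copy 3: user buffer -> kernel socket buffer
            }
        }
    }
}
```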

DMA

DMA (Direct Memory Access) is an interface technology that lets peripheral devices exchange data with system memory directly, without involving the CPU. Hardware such as network adapters supports DMA.

As shown in the figure above, with DMA only two CPU copies remain; the transfers between the devices and kernel memory are handled by the DMA controller.

sendfile

The sendfile system call was introduced in kernel version 2.1 to simplify transferring data over the network and between two local files. It also relies on DMA.

As shown in the figure above, the process now involves three copies in total, with only one CPU copy (page cache to socket buffer) remaining.

In the earlier version of sendfile, the data was still copied from the page cache into the socket buffer. In the improved version, only a buffer descriptor and the data length are passed to the socket buffer, so the DMA controller can gather the data directly from the page cache and send it to the network.

As shown in the figure above, the last CPU copy is eliminated: the data flows from disk to the read buffer (page cache) and then directly to the NIC.

Kafka implements zero copy

Kafka is implemented in Java and Scala. It achieves zero copy through the transferTo and transferFrom methods of Java NIO's FileChannel (java.nio.channels.FileChannel), which can be backed by sendfile.

Note: transferTo and transferFrom do not guarantee zero copy. Whether zero copy is actually used depends on the operating system: if the OS provides a zero-copy system call such as sendfile, these methods take full advantage of it through that call; otherwise they cannot implement zero copy by themselves.
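A minimal sketch of zero copy with FileChannel.transferTo (hypothetical file name, host, and port): on Linux this typically maps to sendfile, so the data never enters user space.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Send a log file to a socket without copying it through a user-space buffer.
public class ZeroCopySend {
    public static void main(String[] args) throws IOException {
        try (FileChannel file = FileChannel.open(Path.of("0000000000000000000.log"), StandardOpenOption.READ);
             SocketChannel socket = SocketChannel.open(new InetSocketAddress("broker.example.com", 9092))) {
            long position = 0;
            long remaining = file.size();
            while (remaining > 0) {
                // transferTo may send fewer bytes than requested, so loop until done.
                long sent = file.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}
```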