Kafka’s messages are persisted to disk. Reading and writing to disk is generally thought to hurt performance, because seeking takes time; yet in fact, one of Kafka’s defining characteristics is its high throughput.

Kafka can easily support millions of write requests per second, far more than most messaging middleware, even on ordinary servers. This feature makes Kafka widely used in high-volume data scenarios such as log processing.

The Apache Kafka benchmark: 2 million writes per second (on three cheap machines)

Why is Kafka so fast? The question can be analyzed from two sides: writing data and reading data.

Writing data

Kafka writes every incoming message to the hard disk, so it never loses data. To optimize write speed, Kafka uses two techniques: sequential writes and memory-mapped files (mmap).

Sequential writes

How fast a disk reads and writes depends on how you use it, that is, whether access is sequential or random. With sequential access, a disk in some optimized scenarios can read and write at speeds comparable to memory (note: this claim is disputed; see searene.me/2017/07/09/…).

Because hard disks are mechanical, each read or write goes through seek -> transfer, and the seek is a “mechanical action” that is the most time-consuming part. Hard disks therefore hate random I/O and prefer sequential I/O. To read from and write to the disk quickly, Kafka uses sequential I/O.

Linux also optimizes disk access, with techniques such as read-ahead, write-behind, and disk caching. Meanwhile, Java objects carry a high memory overhead, and the JVM’s GC pauses grow very long as the heap grows, so relying on the disk and the OS page cache instead of the JVM heap has several advantages:

  • Sequential disk read/write speed can exceed random memory access speed
  • The JVM’s GC is inefficient and its objects have a large memory footprint; using the disk avoids this problem
  • The OS page cache stays warm after the process restarts, whereas an in-heap cache would have to be rebuilt

The figure above shows how Kafka writes data to a partition. Each partition is actually a file; after receiving a message, Kafka appends the data to the end of the file (the dotted box).
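
As an illustration of this append-only pattern, here is a minimal Java sketch (an illustration only, not Kafka’s actual log code; the file name is made up). Opening the channel with APPEND guarantees every write lands at the end of the file, which is exactly the sequential I/O the disk prefers:

    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class AppendOnlyLogDemo {
        public static void main(String[] args) throws Exception {
            // APPEND: every write goes to the end of the file, so the disk head
            // moves sequentially instead of seeking randomly.
            try (FileChannel log = FileChannel.open(Path.of("partition-0.log"),
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
                for (int i = 0; i < 3; i++) {
                    log.write(ByteBuffer.wrap(("message-" + i + "\n").getBytes(StandardCharsets.UTF_8)));
                }
            }
        }
    }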

One drawback of this approach is that there is no way to delete individual records, so Kafka does not delete data when it is consumed. It keeps all data, and each consumer maintains an offset per partition of each topic to indicate how far it has read.

Consumer1 has two offsets, for Partition0 and Partition1 (assuming each topic has one partition); Consumer2 has one offset, for Partition2. The offset is kept by the client SDK, and Kafka’s broker ignores it completely. Normally, the SDK saves it in ZooKeeper, which is why you need to give the consumer a ZooKeeper address.
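
The ZooKeeper-stored offset described above is how the older consumer clients worked; the modern Java client keeps offsets in Kafka itself, but the model is unchanged: the consumer, not the broker, decides where to read. A minimal sketch with the current client (the broker address, topic name, and starting offset are all made-up values):

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class OffsetDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("group.id", "demo-group");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                TopicPartition p0 = new TopicPartition("logs", 0);
                consumer.assign(Collections.singletonList(p0));
                consumer.seek(p0, 42L); // resume from offset 42; the broker never moves this for us
                for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofSeconds(1))) {
                    System.out.printf("offset=%d value=%s%n", r.offset(), r.value());
                }
            }
        }
    }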

If data were never deleted, the disk would eventually fill up, so Kafka offers two strategies for deleting old data: based on time, and based on partition file size (for example, the log.retention.hours and log.retention.bytes broker settings). See the configuration documentation for details.

Memory Mapped Files

Even with sequential writes, the hard disk cannot catch up with memory. So Kafka does not write data to disk in real time; instead it takes advantage of the paged memory of modern operating systems, using RAM to improve I/O efficiency.

Memory Mapped Files (mmap) work by mapping a file directly into physical memory through the operating system’s pages; on a 64-bit operating system, a mapping can generally represent a data file of 20 GB. Once the mapping is complete, your operations on that memory are synchronized to the hard disk by the operating system at an appropriate time.

With mmap, a process reads and writes the file as if it were reading and writing memory (virtual memory, of course), and it need not worry about the physical memory size.

This gives a big I/O boost because it eliminates the overhead of copying between user space and kernel space (a read() call first puts the data into a kernel buffer and then copies it into user-space memory). But there is also an obvious flaw: reliability. Data written to an mmap region has not actually been written to disk; the operating system only writes it out when flush is called.
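
A minimal Java sketch of the mmap idea, using NIO’s MappedByteBuffer (an illustration only, not Kafka’s implementation; the file name and mapping size are made up). Note how writing is a plain memory operation, and how persistence is only guaranteed after an explicit flush:

    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class MmapDemo {
        public static void main(String[] args) throws Exception {
            try (FileChannel ch = FileChannel.open(Path.of("segment.log"),
                    StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Map the first 4 KiB of the file into this process's address space.
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
                buf.put("hello kafka".getBytes(StandardCharsets.UTF_8)); // a plain memory write
                buf.force(); // explicit flush; until then, the OS decides when to write back
            }
        }
    }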

Kafka provides a parameter, producer.type, to control whether the producer flushes actively. If Kafka flushes immediately after writing to the mmap region and only then returns to the producer, that is called sync; if it returns to the producer immediately after writing to mmap, without calling flush, that is called async.
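
For illustration, this is roughly how that setting looked in the legacy (pre-0.9) Scala producer’s configuration; these property names do not apply to the modern Java client:

    import java.util.Properties;

    public class LegacyProducerConfigDemo {
        public static void main(String[] args) {
            // Legacy producer settings, shown for illustration only.
            Properties props = new Properties();
            props.put("metadata.broker.list", "localhost:9092"); // assumed broker address
            props.put("producer.type", "sync"); // "sync": flush before returning; "async": return without flushing
            System.out.println(props);
        }
    }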

Reading data

What optimizations does Kafka make when reading from disk?

Zero copy based on sendfile

In the traditional mode, transferring a file over the network goes through the following steps:

  1. A read call copies the file data from disk into a kernel buffer
  2. The read call returns, copying the data from the kernel buffer into the user buffer
  3. A write call copies the data from the user buffer into the kernel’s socket buffer
  4. The data is copied from the socket buffer to the protocol engine (the network card)

In this mode, the file data is copied four times:

Disk -> kernel buffer -> user buffer -> socket buffer -> protocol engine

The sendfile system call provides a way to reduce these copies and improve file transfer performance. It was introduced in kernel version 2.1 to simplify transferring data over the network and between two local files. sendfile not only reduces data copying but also reduces context switches.

ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);  /* e.g. sendfile(socket_fd, file_fd, NULL, len) */

The operation process is as follows:

  1. The sendfile system call copies the file data into a kernel buffer
  2. The data is copied from the kernel buffer into the kernel’s socket buffer
  3. Finally, the socket buffer is copied to the protocol engine

Compared with the traditional read/write path, the sendfile introduced in kernel 2.1 removes the copy from the kernel buffer to the user buffer and the copy from the user buffer to the socket buffer. After kernel 2.4, the socket buffer descriptor was changed: sendfile only appends file descriptors and lengths to the socket buffer, and the protocol engine gathers the data directly from the kernel buffer, eliminating yet another copy.

Apache, Nginx, Lighttpd and other web servers all have a sendfile-related configuration option; enabling sendfile can greatly improve file transfer performance.

Kafka stores all messages in files; when a consumer needs data, Kafka can hand the file contents straight to the consumer over the network. With mmap used for reading and writing the files, the data is passed directly to sendfile.
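
On the JVM, sendfile is exposed through FileChannel.transferTo, which is the call Kafka relies on for this. A minimal sketch (the file name, host, and port are assumptions):

    import java.net.InetSocketAddress;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class ZeroCopyDemo {
        public static void main(String[] args) throws Exception {
            try (FileChannel file = FileChannel.open(Path.of("segment.log"), StandardOpenOption.READ);
                 SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9092))) {
                long pos = 0, remaining = file.size();
                // transferTo delegates to sendfile(2) on Linux: the bytes go
                // disk -> kernel buffer -> socket without ever entering user space.
                while (remaining > 0) {
                    long sent = file.transferTo(pos, remaining, socket);
                    pos += sent;
                    remaining -= sent;
                }
            }
        }
    }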

Batch compression

In many cases, the bottleneck is not CPU or disk but network I/O, especially for data pipelines that need to send messages between data centers over a WAN. Data compression costs a small amount of CPU, but in Kafka’s case, the savings in network I/O make it worthwhile.

  • Compressing each message individually gives a poor compression ratio, so Kafka compresses in batches: multiple messages are compressed together rather than one at a time (see the configuration sketch after this list)
  • Kafka allows recursive message sets: a batch can be transmitted in compressed form and remain compressed in the log until the consumer decompresses it
  • Kafka supports a variety of compression protocols, including Gzip and Snappy
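
With the modern Java producer, batch compression is a matter of configuration. A minimal sketch, where the broker address, topic name, and tuning values are assumptions:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class CompressedProducerDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("compression.type", "snappy"); // whole batches are compressed, not single messages
            props.put("linger.ms", "20");            // wait briefly so batches can fill up
            props.put("batch.size", "65536");        // larger batches compress better
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                for (int i = 0; i < 1000; i++) {
                    producer.send(new ProducerRecord<>("logs", "key-" + i, "message-" + i));
                }
            }
        }
    }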

Conclusion

The secret of Kafka’s speed is that it turns all message handling into batched file operations: sensible batch compression reduces network I/O; mmap improves disk I/O; writes are fast because each partition is only ever appended to at its end; and reads are fast because sendfile sends the data out directly.