Kafka is ubiquitous messaging middleware in the big data space, widely used to build real-time data pipelines within enterprises and to power their own stream-processing applications.
Kafka is a disk-based data store, yet it delivers high performance, high throughput, and low latency; its throughput can range from tens of thousands to tens of millions of messages per second.
Yet many people who have used Kafka stumble when asked: why is Kafka so fast, and why is its throughput so high? Most either draw a blank or can only name a point or two. This article briefly explains where Kafka's speed and throughput come from.
First, sequential reading and writing
It is well known that Kafka persists its message log to local disk. The common perception is that disk read/write performance is poor, so you may wonder how Kafka can guarantee performance. In fact, whether storage is fast or slow depends on the access pattern: both disk and memory access can be sequential or random. Random disk reads and writes are indeed slow, but sequential disk reads and writes are generally about three orders of magnitude faster than random ones, and in some cases sequential disk access is even faster than random memory access.
Here is a performance comparison chart from ACM Queue: queue.acm.org/detail.cfm?…
Sequential disk reads and writes are the most predictable disk access pattern, and operating systems heavily optimize for it. Kafka takes advantage of this: messages are continuously appended to the end of local log files rather than written at random positions, which dramatically increases write throughput.
The figure above shows how Kafka writes data to a partition. Each partition is, in effect, a file, and after receiving a message Kafka appends the data to the end of that file (the dotted box).
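To make the append-only idea concrete, here is a minimal Java sketch of a log that only ever writes to the end of a file. The file name `partition-0.log` and the class are made up for illustration; this is not Kafka's storage code, just the sequential-write pattern it relies on.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Minimal sketch of an append-only log: every write goes to the current
// end of the file, so the disk is only ever accessed sequentially.
public class AppendOnlyLog {
    private final FileChannel channel;

    public AppendOnlyLog(Path file) throws IOException {
        this.channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    // Append one message and return the byte position it was written at.
    public long append(String message) throws IOException {
        long position = channel.size();
        channel.write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));
        return position;
    }

    public static void main(String[] args) throws IOException {
        AppendOnlyLog log = new AppendOnlyLog(Path.of("partition-0.log"));
        System.out.println("written at byte " + log.append("hello, sequential write"));
    }
}
```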
One trade-off of this approach is that individual records cannot be removed, so Kafka does not delete messages as they are consumed. It keeps all the data, and each consumer maintains an offset per partition to record how far it has read.
In the figure, Consumer1 has two offsets, one for Partition0 and one for Partition1 (assuming each topic has one partition), and Consumer2 has an offset for Partition2. The offset is kept by the client SDK; Kafka's broker ignores it completely. Normally the SDK saves it in ZooKeeper, which is why the consumer needs to be given the ZooKeeper address.
Of course, if data is never deleted the disk will eventually fill up, so Kafka offers two strategies for deleting old data: one based on time and one based on partition file size. See its configuration documentation for details; a sketch of the corresponding topic-level settings follows.
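As a hedged illustration of those two strategies, the topic-level settings `retention.ms` and `retention.bytes` can be supplied when creating a topic with the Java AdminClient. The broker address, topic name, and values below are placeholders, not a recommendation.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

// Sketch: create a topic whose old segments are deleted either after 7 days
// or once a partition grows past 1 GB, whichever limit is hit first.
public class RetentionExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker

        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("demo-topic", 3, (short) 1)
                    .configs(Map.of(
                            "retention.ms", "604800000",     // time-based deletion: 7 days
                            "retention.bytes", "1073741824"  // size-based deletion: 1 GB per partition
                    ));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```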
Second, the Page Cache
To optimize read and write performance, Kafka uses the operating system's Page Cache, that is, memory managed by the OS rather than the JVM heap. The benefits of this are:
- Avoiding object overhead: if the data lived on the Java heap, the objects wrapping it would consume a lot of memory, often twice the size of the stored data or more.
- Avoiding GC problems: as the amount of data on the JVM heap grows, garbage collection becomes more complex and slower; using the system cache sidesteps GC entirely.
Relying on the operating system's Page Cache is also much simpler and more reliable than maintaining JVM data structures or an in-process cache. First, cache utilization is higher at the operating system level because it stores compact byte sequences rather than individual objects. Second, the operating system heavily optimizes the Page Cache, providing write-behind, read-ahead, and flush mechanisms. Furthermore, the system cache survives a restart of the server process, which avoids the cost of rebuilding an in-process cache.
Kafka's reads and writes therefore go through the operating system's Page Cache and are essentially memory operations, which greatly improves read and write speed.
Third, zero copy
The zero-copy mechanism in Linux relies on the sendfile system call, which lets the operating system send data from the Page Cache to the network directly; the only copy required is the final one into the NIC buffer, so the data is never copied back and forth through user space. The schematic diagram is as follows:
With this zero-copy mechanism, combining the Page Cache with sendfile, the performance of the Kafka consumer path also improves dramatically. This is why you sometimes see little disk I/O even while consumers keep consuming data: the data is being served from the operating system's cache.
When a Kafka client reads data from a server, if zero copy is not used, the process roughly goes like this:
1. The operating system reads data from the disk into the read buffer of the kernel space.
2. The application (Kafka, in this case) copies data from the read buffer in kernel space to a buffer in user space.
3. The application writes data from the user-space buffer back to the kernel-space socket buffer.
4. The operating system copies the data in the socket buffer to the NIC buffer and sends the data to the client over the network.
(Figure: read path without zero copy)
As the figure shows, the data shuttles between kernel space and user space twice. Can this redundant round trip be avoided? It can: Kafka uses zero-copy technology, in which the data is copied directly from the kernel-space read buffer to the kernel-space socket buffer and then written to the NIC buffer, never crossing between kernel space and user space.
(Figure: read path with zero copy)
As you can see, zero copy does not mean no copying at all; it means avoiding the copies between kernel space and user space. If there were no copy at all, the data would never reach the client. Eliminating just that round trip, however, yields a significant performance improvement.
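In Java, this zero-copy path is exposed through FileChannel.transferTo, which is backed by sendfile on Linux. The sketch below pushes a file to a socket without the data ever entering a user-space buffer; the file name and address are placeholders, and this is not Kafka's actual network code.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch of zero copy with Java NIO: transferTo hands the file bytes to the
// kernel, which moves them from the page cache to the socket directly.
public class ZeroCopySend {
    public static void main(String[] args) throws IOException {
        Path log = Path.of("partition-0.log"); // placeholder file
        try (FileChannel file = FileChannel.open(log, StandardOpenOption.READ);
             SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9999))) {
            long position = 0;
            long remaining = file.size();
            while (remaining > 0) {
                // A single call may transfer fewer bytes than requested, so loop.
                long sent = file.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}
```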
Fourth, partition segmentation + index
Kafka's messages are organized by topic, and the data of each topic is split into partitions that are stored on different broker nodes. Each partition corresponds to a directory on the file system, and within a partition the data is stored as a series of segment files. This mirrors the bucketing approach common in distributed systems.
With this segmented design, Kafka's messages are spread across many small segment files, and each file operation works directly against a single segment. To speed up queries further, Kafka by default builds an index file for each segment data file, the .index files you see on disk. This combination of partitioning and indexing improves both the efficiency of reads and the parallelism of data operations.
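The sketch below shows only the lookup idea behind such a sparse index: map some message offsets to byte positions in the segment file, then search for the greatest indexed offset at or below the target and scan forward from there. Kafka's real .index files are memory-mapped binary files, so treat this TreeMap version as a conceptual stand-in, not its implementation.

```java
import java.util.Map;
import java.util.TreeMap;

// Conceptual sketch of a sparse offset index kept next to a segment file:
// it maps a message offset to the byte position of that message in the log.
public class SparseOffsetIndex {
    private final TreeMap<Long, Long> offsetToFilePosition = new TreeMap<>();

    public void addEntry(long messageOffset, long filePosition) {
        offsetToFilePosition.put(messageOffset, filePosition);
    }

    // Return the byte position to start scanning from for a target offset:
    // the greatest indexed offset that is <= the target, or 0 if none exists.
    public long lookup(long targetOffset) {
        Map.Entry<Long, Long> floor = offsetToFilePosition.floorEntry(targetOffset);
        return floor == null ? 0L : floor.getValue();
    }
}
```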
Fifth, batch reads and writes
Kafka also reads and writes data in batches rather than one message at a time.
In addition to leveraging the underlying techniques above, Kafka provides some application-level means of improving performance. The most obvious is batching. When writing data to Kafka, batching can be enabled to avoid the latency and bandwidth overhead of frequently sending individual messages across the network. Given a network bandwidth of 10 MB/s, transferring 10 MB in a single request is obviously much faster than transferring the same 10 MB as 10,000 separate 1 KB messages.
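Here is a minimal producer sketch with batching enabled, assuming a local broker and a made-up topic name: batch.size caps the per-partition batch in bytes and linger.ms lets the client wait briefly so batches have a chance to fill up.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Sketch: accumulate records on the client and ship them in batched requests
// instead of sending one network request per record.
public class BatchingProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("batch.size", "65536"); // target batch size per partition, in bytes
        props.put("linger.ms", "10");     // wait up to 10 ms to let a batch fill

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 1000; i++) {
                producer.send(new ProducerRecord<>("demo-topic", "key-" + i, "value-" + i));
            }
        }
    }
}
```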
Sixth, batch compression
In many cases the bottleneck is not the CPU or the disk but network I/O, especially for data pipelines that must ship messages between data centers over a WAN. Compression costs a small amount of CPU, but for Kafka the savings in network I/O are usually worth it:
1. Compressing each message on its own yields a relatively poor compression ratio, so Kafka compresses in batches: multiple messages are compressed together rather than one at a time.
2. Kafka allows the use of recursive message sets: a batch can be transmitted in compressed form and remain compressed in the log until the consumer decompresses it.
3. Kafka supports multiple compression protocols, including Gzip and Snappy.
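Batch compression is switched on with the producer's compression.type setting. The sketch below assumes a local broker and a made-up topic, and pairs compression with batching since each batch is compressed as a unit.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Sketch: with compression.type set, each batch is compressed as a whole before
// it is sent and is stored on the broker in compressed form until consumed.
public class CompressedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("compression.type", "snappy"); // gzip is another option mentioned above
        props.put("batch.size", "65536");        // compression works per batch, so batching matters
        props.put("linger.ms", "10");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo-topic", "key", "a message that travels compressed"));
        }
    }
}
```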
In short, the secret of Kafka's speed is that it groups messages into batches and compresses them in batches to cut network I/O, improves I/O speed through mmap, writes at top speed because each partition is only ever appended to at its end, and reads by pushing data straight out with sendfile.