An earlier article in the Redis series, "Redis Core: The Only Secret that Can't Be Broken", explained Redis performance optimization, digging into the internal secrets of Redis's "fast" from the angles of I/O, threading, data structures, encoding, and more. 65 Brother was deeply inspired and, while learning Kafka, noticed that Kafka is another middleware with excellent performance. He asked "Code Brother" to talk about Kafka performance optimization, so "Code Brother" decided to make this performance post the opening work of the Kafka series.

Here’s a preview of the Kafka series:

Starting with this tour of Kafka's performance, let's take a close look at the secrets behind Kafka's "fast." You will not only learn Kafka's individual performance optimizations, but also distill general optimization methodologies that can be applied to your own projects and help you build high-performance systems.

Guan Gong vs. Qin Qiong (an apples-to-oranges matchup)

Redis and Kafka are totally different kinds of middleware. Can they even be compared?

Yes, which is why this is neither "Distributed Cache Selection" nor "Distributed Middleware Comparison." We focus on how these two projects from different domains optimize performance, looking both at the optimization techniques that good projects share and at the techniques specific to each scenario.

Many people learn a lot and understand many frameworks, yet still feel helpless when facing real problems. This comes from never systematizing what they have learned, and never abstracting effective methodologies out of concrete implementations.

One of the most important parts of studying open source projects is exactly this kind of generalization: a methodology that extracts the best implementations from different projects and turns them into your own practice.

A message from Code Brother

Rationality, objectivity, and caution are programmers' characteristic strengths. But in many situations we also need a little sensibility and a little impulse, which can help us make decisions faster. "The pessimist is often right, but the optimist succeeds." I hope we can approach problems with optimism.

Kafka performance overview

From a highly abstract point of view, performance problems cannot escape from the following three aspects:

  • network
  • disk
  • complexity

For a distributed queue like Kafka, network and disk optimizations matter most. The solutions to the abstract problems above are equally abstract and simple:

  • concurrency
  • compression
  • batching
  • caching
  • algorithms

With that in mind, let’s take a look at the roles in Kafka that can be optimized:

  • Producer
  • Broker
  • Consumer

Yes, the questions, the ideas, and the optimization points have all been listed. Now we can drill into each of the three roles in as much detail as possible, until every implementation becomes clear. Even without reading Kafka's source code, we can come up with a point or two that could be optimized.

That's the way to think about it: raise the question → list the problem points → list the optimization methods → pick concrete entry points → weigh trade-offs and refine the implementation.

Now you can try to come up with your own optimization points and methods. They don't have to be perfect or fully worked out; every idea counts.

65 Brother: No way, I'm slow and lazy. Just tell me directly — freeloading suits me better.

Sequential writes

Redis is a purely in-memory system, while Kafka also reads and writes to disk.

Why is disk writing slow?

We shouldn't just accept the conclusion without knowing the why. To answer this question, we have to go back to our operating systems course in school. 65 Brother, do you still have the textbook? Turn to the chapter on disks and let's review how a disk works.

65 Brother: Like hell it's still there. I lost the book before the course was half over. If it weren't for my sharp eyes, I wouldn't even have graduated.

Look at this classic diagram:

To complete disk I/O, you need to go through three steps: seek, rotation, and data transfer.

The factors affecting disk I/O performance all occur in these three steps, so the main time costs are:

  1. Seek time: Tseek is the time required to move the read/write head to the correct track. The shorter the seek time, the faster the I/O operation. Currently, the average seek time of a disk ranges from 3 to 15ms.
  2. Rotational delay: Trotation is the time it takes for the disk to rotate the requested sector under the read/write head. It depends on the disk's rotational speed and is usually expressed as half the time of one full revolution. For example, the average rotational delay of a 7200rpm disk is about 4.17ms (60 x 1000 / 7200 / 2), and that of a 15000rpm disk is 2ms.
  3. Data transfer time: Ttransfer is the time required to transmit the requested data. It depends on the data transfer rate: the data size divided by the transfer rate. Currently IDE/ATA reaches 133MB/s and SATA II reaches 300MB/s, so data transfer time is usually far smaller than the first two components and can be ignored in rough calculations.
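The rotational-delay figures above can be checked with one line of arithmetic. This small helper is just the "60,000 milliseconds per minute, divided by rpm, divided by two" calculation from the text:

```java
public class DiskLatency {
    // Average rotational delay: half of one full revolution, in milliseconds.
    static double avgRotationalDelayMs(int rpm) {
        return 60_000.0 / rpm / 2; // 60,000 ms per minute / revolutions per minute / 2
    }

    public static void main(String[] args) {
        System.out.printf("7200 rpm:  %.2f ms%n", avgRotationalDelayMs(7200));   // ~4.17
        System.out.printf("15000 rpm: %.2f ms%n", avgRotationalDelayMs(15000));  // 2.00
    }
}
```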

Therefore, the disk read/write performance can be greatly improved by eliminating seek and rotation when writing to the disk.

Kafka uses sequential file writes to improve disk write performance. Sequential writes largely eliminate disk seeks and rotations: the heads no longer dance around the tracks, but zip along.

Each Partition in Kafka is an ordered, immutable sequence of messages, with new messages continually appended to the end. A Partition is only a logical concept; physically, it is split into Segments, and each Segment corresponds to a physical file that Kafka writes to sequentially.
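Appending to the end of a segment file can be sketched with Java NIO's `FileChannel` in `APPEND` mode. This is an illustration of the pattern, not Kafka's actual log code; the file name and record format here are made up:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class AppendLog {
    // Append one record to the end of a segment file, Kafka-style:
    // the OS only ever moves the write position forward, no seeking back.
    static void append(Path segment, String record) throws IOException {
        try (FileChannel ch = FileChannel.open(segment,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ch.write(ByteBuffer.wrap((record + "\n").getBytes(StandardCharsets.UTF_8)));
        }
    }

    public static void main(String[] args) throws IOException {
        Path seg = Files.createTempFile("segment", ".log");
        append(seg, "msg-0");
        append(seg, "msg-1");
        System.out.println(Files.readAllLines(seg)); // [msg-0, msg-1]
        Files.delete(seg);
    }
}
```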

Why can Kafka use appending?

This has to do with Kafka's nature. Compare Kafka with Redis: Kafka is essentially a Queue, while Redis is essentially a HashMap. What's the difference between a Queue and a Map?

A Queue is FIFO: its data is ordered. A HashMap's data is unordered and is read and written randomly. Kafka's immutability and ordering are what allow it to write by appending to files.

In fact, many data systems with these characteristics use appending to optimize disk performance. Typical examples include Redis's AOF file and the WAL (Write-Ahead Log) mechanism of various databases.

So: clearly understand the characteristics of your own workload, and you can make targeted optimizations.

Zero copy

65 Brother: Haha, I was asked about that in an interview. Unfortunately my answer was mediocre, alas.

What is zero copy?

Consider the Kafka scenario: a Kafka Consumer consumes data stored on the Broker's disk, so what system interactions happen between reading the Broker's disk and transmitting over the network to the Consumer? When a Consumer fetches data, the Broker reads its log files and sends them using sendfile. With the traditional I/O model, the pseudocode would look like this:

readFile(buffer)  // disk -> kernel buffer -> user-space buffer
send(buffer)      // user-space buffer -> socket buffer -> NIC

As the figure shows, with the traditional I/O process the data is first read from disk and then written to the network, and along the way it is actually copied four times.

  1. First copy: read the disk file into the operating system kernel buffer;
  2. Second copy: copy the kernel buffer data into the application buffer;
  3. Third copy: copy the data from the application buffer into the socket send buffer (kernel space);
  4. Fourth copy: copy the socket buffer data to the network adapter for network transmission.

65 Brother: Huh, is the operating system really that dumb? Copying over and over.

It's not that the operating system is stupid. By design, each application has its own user-space memory, and user memory is isolated from kernel memory for the safety of both applications and the system. Otherwise, any application could read and write kernel memory, or another application's memory, at will.

However, there is zero-copy technology. Zero copy minimizes the number of data copies, reducing the CPU cost of copying and the number of context switches between user mode and kernel mode, thereby optimizing data transfer performance.

There are three common zero-copy ideas:

  • Direct I/O: Data passes directly across the kernel between the user’s address space and the I/O device. The kernel only performs necessary auxiliary tasks such as virtual storage configuration.
  • Avoid data copying between kernel and user space: When the application does not need to access data, it can avoid copying data from kernel space to user space.
  • Copy-on-write: Data does not need to be copied in advance, but partially copied when it needs to be modified.

Kafka uses both mmap and sendfile to achieve zero copy, corresponding to Java's MappedByteBuffer and FileChannel.transferTo respectively.

Zero copy using Java NIO as follows:

fileChannel.transferTo(position, count, socketChannel);

Under this model, a single transferTo() call replaces the separate read and send steps. Specifically, transferTo() instructs the block device to read data into the kernel read buffer through a DMA engine. That buffer is then copied to another kernel buffer staging for the socket. Finally, the socket buffer is copied to the NIC buffer by DMA.

We reduced the number of copies from four to three, and only one of those copies involves the CPU. We also reduced the number of context switches from four to two. This is a big improvement, but we have not reached zero copy yet. That can be achieved as a further optimization on Linux kernel 2.4 and later with network interface cards that support gather operations. As shown below.

As in the previous example, calling transferTo() causes the device to read the data into the kernel read buffer through the DMA engine. But with the gather operation, there is no copy between the read buffer and the socket buffer. Instead, the NIC is handed a descriptor pointing into the read buffer, along with an offset and length, and the data is pulled directly by DMA. The CPU plays no role at all in copying buffers.
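The transferTo() path above can be sketched with Java NIO. To keep the example self-contained and testable, the destination here is a file channel rather than the socket channel Kafka would use; on Linux the JDK can still move the bytes kernel-side without a round trip through user space:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopyDemo {
    // Transfer bytes from one channel to another without staging them
    // in a user-space buffer (no read()-into-array, no write()-from-array).
    static long transfer(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            return in.transferTo(0, in.size(), out);
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("src", ".log");
        Files.writeString(src, "hello kafka");
        Path dst = Files.createTempFile("dst", ".log");
        long n = transfer(src, dst);
        System.out.println(n + " bytes: " + Files.readString(dst));
        Files.delete(src);
        Files.delete(dst);
    }
}
```

In Kafka's real consumer path the target of transferTo is the client's SocketChannel, which is where the sendfile(2) system call comes in.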

For details on zero-copy, read this article zero-copy analysis and its applications.

PageCache

When a Producer produces a message to the Broker, the Broker uses the pwrite() system call (Java NIO's FileChannel.write() API) to write the data at an offset, and the data is written to the page cache first. When a Consumer consumes messages, the Broker uses the sendfile() system call (the FileChannel.transferTo() API) to zero-copy the data from the page cache to the Broker's socket buffer and out over the network.

Synchronization between the leader and followers works the same way as the consumer read path above.

The data in the page cache is written back to disk by the kernel's flusher threads and by calls to sync()/fsync(), so even if the Kafka process crashes, the data is not lost (though a machine power failure is another matter). In addition, if a consumer wants to consume a message that is not in the page cache, it is read from disk, and adjacent blocks are prefetched into the page cache for the next read.

Therefore, if Kafka Producer’s production rate is similar to the consumer’s consumption rate, the entire production-consumption process can be completed almost exclusively by reading and writing to the broker Page cache, with very few disk accesses.

Network model

65 Brother: When it comes to networking, as a Java programmer, the answer is Netty.

Yes, Netty is an excellent network framework for the JVM, providing high performance network services. When most Java programmers think of networking frameworks, Netty is the first thing that comes to mind. Excellent frameworks such as Dubbo and Avro-RPC use Netty as the underlying network communication framework.

Kafka implements the network model itself to do RPC. The base layer is based on Java NIO and uses the same Reactor thread model as Netty.

The Reactor model is divided into three main roles:

  • Reactor: Assigns I/O events to the corresponding handler
  • Acceptor: Processes client connection events
  • Handler: Processes non-blocking tasks

In the traditional blocking I/O model, each connection is handled by a dedicated thread. With many concurrent connections, a large number of threads are created, consuming resources. Moreover, with blocking I/O, if the thread has no data to read after the connection is established, it blocks on the read call, wasting resources.

To address these two problems of the traditional blocking I/O model, the Reactor model applies the pooling idea: it avoids creating a thread per connection and hands business processing to a thread pool once the connection is established. It also builds on the I/O multiplexing model: multiple connections share one blocking object, and the program does not wait on every connection individually. When new data is ready to be processed, the operating system notifies the program, and a thread leaves the blocked state to run the business logic.

Kafka implements multiplexing and processing thread pools based on the Reactor model. Its design is as follows:

One Acceptor thread handles new connections, N Processor threads select and read socket requests, and M Handler threads process the requests and run the corresponding business logic.

I/O multiplexing lets the system handle multiple client requests in a single thread by multiplexing many I/O channels onto the same select call. Its biggest advantage is low system overhead: no new processes or threads need to be created, reducing resource costs.
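As a minimal sketch of the Reactor idea (not Kafka's actual SocketServer), the single-threaded Java NIO loop below plays all three roles: the selector dispatches readiness events (Reactor), the accept branch registers new connections (Acceptor), and the read branch echoes the data back (Handler). The message and buffer sizes are arbitrary:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

public class MiniReactor {
    // One selector thread multiplexes accept + read events, echoing one message.
    static String echoOnce(String msg) throws IOException {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);      // Acceptor role

            // A blocking client, for simplicity of the demo.
            SocketChannel client = SocketChannel.open(server.getLocalAddress());
            client.write(ByteBuffer.wrap(msg.getBytes(StandardCharsets.UTF_8)));

            boolean echoed = false;
            while (!echoed) {
                selector.select();                                   // Reactor: block on readiness
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isAcceptable()) {                        // new connection -> watch reads
                        SocketChannel ch = server.accept();
                        ch.configureBlocking(false);
                        ch.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {                   // Handler role: echo back
                        SocketChannel ch = (SocketChannel) key.channel();
                        ByteBuffer buf = ByteBuffer.allocate(64);
                        ch.read(buf);
                        buf.flip();
                        ch.write(buf);
                        echoed = true;
                    }
                }
                selector.selectedKeys().clear();
            }
            ByteBuffer reply = ByteBuffer.allocate(64);
            client.read(reply);
            reply.flip();
            client.close();
            return StandardCharsets.UTF_8.decode(reply).toString();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(echoOnce("ping"));
    }
}
```

Kafka's real design splits these roles across threads: the Processor threads each run their own selector loop, and Handler work is dispatched to a separate pool.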

Conclusion: the KafkaServer network layer of the Kafka Broker is an excellent piece of network architecture. Anyone doing Java network programming, or needing this technique, would do well to read the source. Later articles in this Kafka series will also walk through this source code.

Batch and compression

Kafka Producer does not send messages to brokers one at a time. Those of you who have used Kafka know that the Producer has two important parameters: batch.size and linger.ms. These two parameters govern the Producer's batch sending.

Kafka Producer’s execution flow is shown below:

The message is sent through the following handlers:

  • Serialize: keys and values are serialized with the configured serializers. A good serialization format improves network transfer efficiency.
  • Partition: determines which partition of the topic the message is written to, using the murmur2 algorithm by default. A custom partitioner can also be supplied to the producer to control which partition messages are routed to.
  • Compress: compression is disabled in the Kafka producer by default. Compression not only speeds up transfer from producer to broker, it also speeds up replication. Compression helps throughput, reduces latency, and improves disk utilization.
  • Accumulate: as the name suggests, a message accumulator. It maintains a Deque (double-ended queue) for each partition, holding batches of data waiting to be sent. When enough data accumulates, or a time limit expires, the accumulator sends the data in batches. Records are accumulated in a buffer per topic partition and grouped according to the producer's batch size property.
  • Group Send: the batches in the record accumulator are grouped by the broker they are destined for. Batched records are sent based on the batch.size and linger.ms properties: when the configured batch size is reached, or the configured delay elapses.
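The accumulator's size-triggered batching can be illustrated with a toy class. This is a deliberately simplified sketch: one partition, no linger.ms timer thread, and a plain callback standing in for the network send. Kafka's real RecordAccumulator keeps a Deque of batches per partition:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class MiniAccumulator {
    // Buffers records and flushes when batchSize is reached, mimicking the
    // producer's batch.size behavior (the linger.ms timeout is omitted here).
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private final Consumer<List<String>> sender;

    MiniAccumulator(int batchSize, Consumer<List<String>> sender) {
        this.batchSize = batchSize;
        this.sender = sender;
    }

    void append(String record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) flush();
    }

    void flush() { // in the real client, also triggered when linger.ms expires
        if (!buffer.isEmpty()) {
            sender.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        List<List<String>> sent = new ArrayList<>();
        MiniAccumulator acc = new MiniAccumulator(3, sent::add);
        for (int i = 0; i < 7; i++) acc.append("msg-" + i);
        acc.flush(); // drain the partial final batch
        System.out.println(sent); // 3 batches: sizes 3, 3, 1
    }
}
```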

Kafka supports multiple compression algorithms: lz4, snappy, and gzip. Kafka 2.1.0 officially added Zstandard, Facebook's open-source compression algorithm designed to provide a superior compression ratio (see zstd for details).

The Producer, Broker, and Consumer all use the same compression algorithm. The Producer writes compressed data to the Broker, and the Consumer reads it from the Broker, without the Broker decompressing in between; the data is only decompressed when the Consumer finally polls the messages. This saves a great deal of network and disk overhead.
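Why does batching make compression so effective? Many similar records compress far better together than individually. The sketch below uses the JDK's gzip (one of Kafka's supported codecs) on a batch of repetitive records; the record content is made up:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class BatchCompression {
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    static byte[] gunzip(byte[] data) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return gz.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        // A batch of similar records compresses far better than one record would.
        byte[] batch = "{\"user\":1,\"event\":\"click\"}\n".repeat(1000)
                .getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(batch);
        System.out.println(batch.length + " -> " + packed.length + " bytes");
    }
}
```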

Partition concurrency

Kafka's topics can be divided into multiple partitions, and each partition behaves like a queue, keeping its data ordered. The partition is the smallest unit of Kafka parallelism, so each additional partition adds consumption concurrency.

Kafka has an excellent partition assignment algorithm, StickyAssignor, which keeps the assignment as balanced as possible while keeping each reassignment as close as possible to the previous one. This keeps the cluster's partitions balanced, so no broker or consumer gets too skewed.

65 Brother: So is it true that the more partitions the better?

Of course not.

More partitions require more file handles to be opened

In a Kafka broker, each partition maps to a directory in the file system. Within Kafka's data log directory, each log segment is allocated two files: an index file and a data file. So as the number of partitions increases, the number of required file handles rises sharply, and you may need to raise the operating system's limit on open file handles.

More partitions require more memory on both the client and the server

The producer client has a parameter, batch.size, 16KB by default. It buffers messages for each partition and, once the buffer fills, packs them up and sends them in bulk. This is a performance-friendly design. But because the parameter is per partition, the more partitions there are, the more memory this buffering requires.
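A back-of-envelope calculation makes the cost concrete: one batch.size buffer per partition gives a rough lower bound on producer buffer memory. The partition count below is an arbitrary example:

```java
public class ProducerMemory {
    // Rough lower bound on producer buffer memory:
    // one batch.size buffer per partition being written to.
    static long minBufferBytes(int partitions, int batchSizeBytes) {
        return (long) partitions * batchSizeBytes;
    }

    public static void main(String[] args) {
        // With the default 16KB batch.size, 1000 partitions pin at least ~16MB.
        System.out.println(minBufferBytes(1000, 16 * 1024)); // 16384000
    }
}
```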

Reduced availability

The more partitions there are, the more partitions each Broker hosts, and when a Broker goes down, recovery takes a very long time.

File structure

Kafka messages are grouped by Topic, which are independent of each other. Each Topic can be divided into one or more partitions. Each partition has a log file for recording message data.

Each Kafka partition log is physically divided into multiple segments by size.

  • Segment file: consists of two main files, an index file and a data file, which come in matching pairs. The suffixes ".index" and ".log" denote the segment index file and data file respectively.
  • Each segment file is named after its base offset — the offset of the first message it contains. The value is a 64-bit long, rendered as 20 decimal digits with the unused positions padded with zeros.
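The naming convention is easy to reproduce: zero-pad the base offset to 20 characters, as seen in log directories (e.g. 00000000000000000000.log). A one-line sketch:

```java
public class SegmentName {
    // Segment files are named after the base offset, zero-padded to 20 digits.
    static String logFileName(long baseOffset) {
        return String.format("%020d", baseOffset) + ".log";
    }

    public static void main(String[] args) {
        System.out.println(logFileName(0));        // 00000000000000000000.log
        System.out.println(logFileName(368_769));  // 00000000000000368769.log
    }
}
```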

The index is sparse, which keeps each index file small. Kafka uses mmap to map index files directly into memory, so that index lookups require no disk I/O. The Java counterpart of mmap is MappedByteBuffer.

Note: mmap is a method of memory-mapping files. A file (or other object) is mapped into a process's address space, establishing a correspondence between the file's disk address and a range of virtual addresses in the process. Once mapped, the process can read and write that memory through pointers, and the system automatically writes dirty pages back to the file on disk; file operations complete without read/write system calls. Conversely, kernel-space modifications to this region are directly visible to user space, which also allows file sharing between processes.
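A minimal MappedByteBuffer sketch shows the pattern: map a small index file, write an entry, read it back through memory. The 16-byte entry layout here (a relative offset plus a file position) is illustrative only and is not Kafka's actual index format:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapIndex {
    // Map a file into the process address space, write an (offset, position)
    // pair through memory, and read it back; dirty pages flush via the page cache.
    static long[] roundTrip() throws IOException {
        Path index = Files.createTempFile("idx", ".index");
        try (FileChannel ch = FileChannel.open(index,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, 16);
            map.putLong(0, 100L);   // e.g. relative offset
            map.putLong(8, 4096L);  // e.g. physical position in the .log file
            return new long[]{map.getLong(0), map.getLong(8)};
        } finally {
            Files.delete(index);
        }
    }

    public static void main(String[] args) throws IOException {
        long[] entry = roundTrip();
        System.out.println(entry[0] + " -> " + entry[1]); // 100 -> 4096
    }
}
```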

Kafka makes full use of binary search to find the position of the message corresponding to an offset:

  1. Use binary search to locate the .log and .index segment whose base offset is less than or equal to the target offset.
  2. Subtract the base offset in the file name from the target offset to get the relative offset of the message within this segment.
  3. Use binary search again in the index file to find the closest index entry.
  4. From that position, scan the .log file sequentially until the message with the target offset is found.
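The "find the greatest index entry not past the target" step can be sketched with a TreeMap, whose floorEntry performs exactly that binary search. The offsets and file positions below are made up, and the real index is a flat mmap'ed array rather than a TreeMap:

```java
import java.util.Map;
import java.util.TreeMap;

public class SparseIndexLookup {
    // Sparse index: only every Nth message has an entry (relative offset -> file position).
    // floorEntry finds the greatest indexed offset <= the target, via binary search.
    static long positionFor(TreeMap<Long, Long> index, long relativeOffset) {
        Map.Entry<Long, Long> entry = index.floorEntry(relativeOffset);
        // From this position we would scan the .log file forward sequentially.
        return entry == null ? 0L : entry.getValue();
    }

    public static void main(String[] args) {
        TreeMap<Long, Long> index = new TreeMap<>();
        index.put(0L, 0L);
        index.put(100L, 8_192L);
        index.put(200L, 16_300L);
        System.out.println(positionFor(index, 150)); // 8192: start scanning there
    }
}
```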

Conclusion

Kafka is an excellent open source project whose performance optimization is carried out incisively and thoroughly. It is a project worth studying in depth; both its ideas and its implementations deserve a careful look and some reflection.

Kafka performance optimization:

  1. Zero-copy network and disk I/O
  2. An excellent network model based on Java NIO
  3. Efficient file data structure design
  4. Partition parallelism and scalability
  5. Batched data transmission
  6. Data compression
  7. Sequential disk reads and writes
  8. Lock-free lightweight offsets

Previous posts

  1. Graphic | distributed, programmers advanced road
  2. Finish Kafka from an interview perspective
  3. Software architecture patterns that must be understood
  4. Redis: A killer app for fast recovery without downtime

If there are mistakes in this article, corrections are welcome. Follow me for real hard-core knowledge. The technical reader group is also open: reply "Add group" in the background to get the WeChat of the author, "Code Brother Byte", so we can grow and communicate together.

That's the secret of Kafka's "fast." If you found this helpful, please like it and share it.