Why Kafka achieves high throughput

Sequential reads and writes

Kafka’s messages are continuously appended to files, which lets Kafka take advantage of the sequential read and write performance of disks.

Sequential reads and writes avoid the seek time of the disk head and incur very little rotational latency, so they are much faster than random reads and writes.

Zero copy

Since Linux kernel 2.2, a “zero-copy” system call (sendfile) has been available. It skips the copy into the “user buffer”: data read from disk is handed from the kernel directly to the network socket, so it is no longer copied into the “user buffer” and back.
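
As a rough illustration of zero-copy transfer, the sketch below sends a file over a socket with Python's `socket.sendfile()`, which uses `os.sendfile()` where the platform supports it and falls back to ordinary `send()` calls otherwise. Kafka itself is commonly described as doing the equivalent through Java's `FileChannel.transferTo`; the helper name `send_file` and the demo payload are assumptions made for this example.

```python
import os
import socket
import tempfile


def send_file(sock: socket.socket, path: str) -> int:
    """Send a whole file over a connected socket; return the number of bytes sent."""
    with open(path, "rb") as f:
        # The application never reads the file contents into its own buffer here.
        return sock.sendfile(f)


# Demo: write a small file, then push it through a local socket pair.
payload = b"kafka-log-segment-bytes" * 100
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(payload)

sender, receiver = socket.socketpair()
sent = send_file(sender, demo_path)
sender.close()

received = b""
while chunk := receiver.recv(4096):
    received += chunk
receiver.close()
os.remove(demo_path)
```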

Partition

The content of a topic in Kafka can be divided into multiple partitions, and each partition is further divided into multiple segments. Each operation therefore touches only a small part of the data, which keeps it lightweight and increases the ability to work in parallel.
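
One way to picture how messages are spread over partitions is a hash of the message key taken modulo the partition count. The sketch below uses `zlib.crc32` as a stand-in hash, which is an assumption for illustration; Kafka's own partitioner uses a different hash function, and `choose_partition` is a hypothetical name.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition; the same key always lands in the same partition."""
    return zlib.crc32(key) % num_partitions


NUM_PARTITIONS = 4
partitions = {p: [] for p in range(NUM_PARTITIONS)}

for i in range(20):
    key = f"user-{i % 5}".encode()
    partitions[choose_partition(key, NUM_PARTITIONS)].append((key, f"event-{i}"))
```

Each partition can then be written and read independently, which is where the parallelism comes from.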

Batch send

Kafka allows you to send messages in batches. When a producer sends a message, it first caches the message locally and sends the cached messages to Kafka once one of the following conditions is met:

  1. The number of cached messages reaches a fixed threshold
  2. A fixed time interval has elapsed
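
The two flush conditions can be sketched as a small buffer that sends when it is full or when its oldest message has waited long enough. `BatchingSender` and its parameters are hypothetical names for this illustration and not part of any Kafka client; in the Java producer the comparable settings are `batch.size` (measured in bytes rather than message count) and `linger.ms`.

```python
import time


class BatchingSender:
    """Buffer messages and hand them to `send` in batches (illustrative sketch)."""

    def __init__(self, send, max_messages=3, max_wait_seconds=0.5, clock=time.monotonic):
        self._send = send
        self._max_messages = max_messages
        self._max_wait = max_wait_seconds
        self._clock = clock
        self._buffer = []
        self._first_buffered_at = None

    def add(self, message):
        if not self._buffer:
            self._first_buffered_at = self._clock()
        self._buffer.append(message)
        # Condition 1: enough messages have accumulated.
        if len(self._buffer) >= self._max_messages:
            self.flush()

    def poll(self):
        # Condition 2: the oldest buffered message has waited long enough.
        if self._buffer and self._clock() - self._first_buffered_at >= self._max_wait:
            self.flush()

    def flush(self):
        if self._buffer:
            self._send(list(self._buffer))
            self._buffer.clear()


# Demo with a fake clock so the timing is deterministic.
fake_now = [0.0]
sent_batches = []
batcher = BatchingSender(sent_batches.append, max_messages=3,
                         max_wait_seconds=0.5, clock=lambda: fake_now[0])

for m in ("a", "b", "c", "d"):
    batcher.add(m)      # "a", "b", "c" go out together; "d" stays buffered

fake_now[0] = 1.0       # pretend one second has passed
batcher.poll()          # the time condition now flushes "d"
```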

Data compression

Kafka also supports compression of message sets. A producer can compress message sets using the GZIP or Snappy formats. Compression reduces the amount of data to be transmitted and relieves pressure on the network.

  • Batch sending and data compression should be used together; compressing messages one at a time has little effect
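
A quick way to see why compression pays off mainly on batches is to compress the same messages one by one and then as a single batch. The message shape below is invented for the example, and the standard-library `gzip` module stands in for whatever codec the producer is configured with.

```python
import gzip
import json

# 200 small, similar messages (made-up sample data).
messages = [
    json.dumps({"user_id": i, "event": "page_view", "url": "/products/list"}).encode()
    for i in range(200)
]

raw_size = sum(len(m) for m in messages)

# Compress every message separately: each one pays the gzip header and trailer,
# and the compressor cannot exploit repetition across messages.
one_by_one_size = sum(len(gzip.compress(m)) for m in messages)

# Compress the whole batch at once: repeated field names and values are shared.
batch_size = len(gzip.compress(b"\n".join(messages)))

print(f"raw={raw_size} one_by_one={one_by_one_size} batch={batch_size}")
```

The batch result should come out far smaller than the per-message total, because the repeated structure across messages is only visible to the compressor when they are compressed together.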

Topic, partition, segment

A topic is divided into multiple partitions

  1. Without partitions, a topic could reside on only one broker, and the amount of data it could store would be limited by that single broker. Once a topic is divided into multiple partitions, each partition can be placed on a different broker, so the contents of a topic can be stored across many machines. This gives the topic horizontal scalability.
  2. Multiple partitions can be consumed concurrently, which improves the ability to handle concurrent load

A partition divides its data into multiple segment files

A segment is a set of files in the partition's folder (a data file + an index file)

If segments were not used, all data inside a partition would be recorded in a single file, and deleting expired data would be troublesome. When a partition is divided into multiple segments, expired data can be removed simply by deleting the segment files that have expired.
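
The segment idea can be sketched as a sorted list of base offsets: finding a message means locating the segment whose base offset is the largest one not exceeding the requested offset, and cleanup means dropping whole segments. The function names, the sample timestamps, and the rule of always keeping the newest segment are simplifying assumptions for this illustration. Naming the data file after the zero-padded base offset follows Kafka's file naming.

```python
import bisect


def segment_file_name(base_offset: int) -> str:
    """Data file name derived from the segment's first offset, zero-padded to 20 digits."""
    return f"{base_offset:020d}.log"


def find_segment(base_offsets, offset: int) -> int:
    """Return the base offset of the segment that contains `offset`."""
    i = bisect.bisect_right(base_offsets, offset) - 1
    if i < 0:
        raise ValueError(f"offset {offset} is before the first segment")
    return base_offsets[i]


def expired_segments(segments, now: float, retention_seconds: float):
    """segments: list of (base_offset, last_modified), sorted by base offset.

    The newest segment is skipped because it is still being written to.
    """
    return [base for base, last_modified in segments[:-1]
            if now - last_modified > retention_seconds]


# Hypothetical partition with three segments.
segments = [(0, 100.0), (1000, 200.0), (2000, 300.0)]
base_offsets = [base for base, _ in segments]

target = find_segment(base_offsets, 1500)
to_delete = expired_segments(segments, now=400.0, retention_seconds=250.0)
```

Deleting expired data then amounts to removing the files for the base offsets in `to_delete`, with no need to rewrite any file that is still in use.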

References

  1. Blog.csdn.net/u013256816/…