Recently I wanted to understand how a distributed messaging system is put together, so I spent some time studying the implementation principles of Kafka. I'm writing them down here for later review. Kafka's design philosophy is subtle, and much of it carries over to other distributed systems.

What problems can Kafka solve?

  • Kafka handles large volumes of data.
  • It deals with message backlog gracefully.
  • Low latency.
  • Support for distributed deployment.

Design concept

Persistence

  1. Read and write the disk as linearly as possible. Sequential disk reads and writes are often cited as more than 1000 times faster than random 4K reads and writes, and linear access patterns are predictable, so the operating system can optimize them heavily (read-ahead, write-behind).
  2. A page-cache-centric design: using the file system and relying on the OS page cache is preferable to maintaining an in-process cache or other structures. On one hand this avoids the cost of GC in the JVM; it also simplifies the implementation.
  3. To persist a queue, simply append to a file; there is no need for a B-tree index file. This reduces the complexity of reads and writes from the O(log N) of a B-tree to O(1) linear appends, greatly improving throughput. It helps with massive data volumes and places low demands on the storage system, reducing cost.
  4. Aggregate multiple messages per request to reduce the average cost per message.
  5. Use zero-copy to reduce the overhead of copying bytes between kernel and user space.
  6. Enable the compression protocol to reduce the space messages occupy.
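Point 3 above can be sketched as a minimal append-only log: writes are O(1) appends to the end of a file, and reads start at a byte offset. This is an illustrative sketch, not Kafka's actual segment format (real segments also keep a sparse index and batch headers).

```python
import os
import tempfile

class AppendOnlyLog:
    """Minimal log segment: O(1) appends, reads by byte offset.
    Illustrative only -- not Kafka's actual on-disk format."""

    def __init__(self, path):
        self.path = path
        # appends always go to the end of the file; the OS page cache
        # absorbs the writes, so no in-process cache is needed
        self.f = open(path, "ab")

    def append(self, record: bytes) -> int:
        """Append one length-prefixed record; return its starting byte offset."""
        offset = self.f.tell()
        self.f.write(len(record).to_bytes(4, "big") + record)
        self.f.flush()
        return offset

    def read(self, offset: int) -> bytes:
        """Read back the record that starts at the given byte offset."""
        with open(self.path, "rb") as r:
            r.seek(offset)
            size = int.from_bytes(r.read(4), "big")
            return r.read(size)

fd, path = tempfile.mkstemp(suffix=".log")
os.close(fd)
log = AppendOnlyLog(path)
o1 = log.append(b"first")
o2 = log.append(b"second")
```

Because every write is an append, the write path never seeks; consumers read sequentially from an offset, which is exactly the access pattern the page cache is best at.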

Producer

  1. The Producer sends messages to the Leader Partition.
  2. The Producer can ask any node about the state of the cluster and which replica is the Leader of a Partition.
  3. The Producer decides which Partition to write to, and which load-balancing strategy to use: random, round-robin, or partitioning by key.
  4. Batch operations are supported: a number of messages are accumulated before being sent, trading a small amount of latency for higher throughput.
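The three load strategies in point 3 can be sketched as below. The `Partitioner` class and its method names are illustrative, not Kafka's API; the key-hashing idea (same key always maps to the same partition, preserving per-key ordering) is the one Kafka's default partitioner also relies on.

```python
import itertools
import random
import zlib

class Partitioner:
    """Sketch of the three partition-selection strategies the text
    mentions: random, round-robin, and partition-by-key."""

    def __init__(self, num_partitions: int):
        self.num_partitions = num_partitions
        self._counter = itertools.count()  # shared round-robin counter

    def random_partition(self) -> int:
        # spreads load, but gives no ordering guarantees per key
        return random.randrange(self.num_partitions)

    def round_robin(self) -> int:
        # evenly cycles through all partitions
        return next(self._counter) % self.num_partitions

    def by_key(self, key: bytes) -> int:
        # same key -> same partition, which preserves per-key ordering
        return zlib.crc32(key) % self.num_partitions

p = Partitioner(num_partitions=4)
first_cycle = [p.round_robin() for _ in range(4)]
```

Partitioning by key is what makes it possible to consume all messages for one entity (say, one user) in order from a single partition.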

Consumer

  1. The Consumer sends a fetch request directly to the Leader Partition, specifying the starting position of consumption (the offset), and retrieves a chunk of data after that offset for processing.
  2. The Consumer decides its own offset, i.e. where it wants to consume from.
  3. Push vs. pull: should messages be pushed or pulled? Kafka uses a mechanism where the Producer pushes messages to the Broker and the Consumer pulls messages from the Broker. This has several advantages. First, the rate of consumption is determined by the Consumer. Second, data can be aggregated and processed in batches: with push, the Broker would have to decide whether to wait for more data or send immediately, whereas a Consumer can pull as much data as it can handle while still consuming in a timely manner.
  4. How are consumed messages tracked efficiently? Topics are divided into ordered partitions, and each partition is consumed by at most one Consumer of a given group at any time, so only the consumed offset needs to be recorded. This position also serves as a checkpoint that can be verified periodically, making acknowledgment cheap.
  5. A Consumer can also be used to read the data and store it on a Hadoop-like platform for persistence.
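Points 1 and 2 above amount to a simple pull loop: the consumer names an offset, fetches a chunk after it, and advances the offset itself. A toy sketch (the class and method names are made up for illustration, not Kafka's API):

```python
from typing import List

class Partition:
    """Toy leader partition holding an ordered log (illustrative only)."""
    def __init__(self):
        self.log: List[bytes] = []

    def fetch(self, offset: int, max_records: int) -> List[bytes]:
        # a fetch names a starting offset and returns a chunk after it
        return self.log[offset : offset + max_records]

class Consumer:
    """The consumer owns its offset and decides where to read from."""
    def __init__(self, partition: Partition, offset: int = 0):
        self.partition = partition
        self.offset = offset

    def poll(self, max_records: int = 2) -> List[bytes]:
        batch = self.partition.fetch(self.offset, max_records)
        self.offset += len(batch)  # advance only by what was actually returned
        return batch

leader = Partition()
leader.log.extend([b"m0", b"m1", b"m2"])
c = Consumer(leader)
```

Because the consumer owns the offset, rewinding is trivial: set `offset` back and replay. A broker that tracked delivery state per message could not offer that so cheaply.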

Semantics of Kafka messages

  1. Messaging systems generally offer the following semantics:
    • At most once: a message may be lost, but is never redelivered.
    • At least once: messages are not lost, but may be delivered more than once.
    • Exactly once: a message is neither lost nor repeated, and is delivered exactly once (what we really want).
  2. After the Producer sends a message there is a notion of commit. If the commit succeeds, the message will not be lost; however, the Producer may commit successfully yet never receive the acknowledgment, and resend. This results in at-least-once semantics.
  3. From the Consumer's point of view, the offset is maintained by the Consumer, so when the offset is updated determines the Consumer's semantics. If the Consumer updates the offset as soon as a message is received and then crashes, the new Consumer resumes from the updated offset, giving at-most-once semantics (messages can be lost, but are not repeated).
  4. If the Consumer updates the offset only after processing is complete, and crashes in between, another Consumer will refetch the message at the old offset, giving at-least-once semantics.
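The difference between points 3 and 4 is purely the order of "commit offset" and "process message" around a crash. A minimal simulation (not Kafka code) makes the two outcomes concrete:

```python
def consume(messages, commit_first: bool, crash_index: int):
    """Simulate a consumer that crashes at crash_index and restarts
    from its last committed offset. Returns everything processed."""
    processed = []
    committed = 0  # offset of the next message to read after a restart

    def run_once(start, crash):
        nonlocal committed
        for i in range(start, len(messages)):
            if commit_first:
                committed = i + 1            # commit the offset first...
                if crash and i == crash_index:
                    return                   # ...crash: message i is lost
                processed.append(messages[i])
            else:
                processed.append(messages[i])
                if crash and i == crash_index:
                    return                   # crash before commit: will redeliver
                committed = i + 1

    run_once(0, crash=True)                  # first run crashes mid-stream
    run_once(committed, crash=False)         # restart from committed offset
    return processed

msgs = ["a", "b", "c"]
lost      = consume(msgs, commit_first=True,  crash_index=1)  # at-most-once
repeated  = consume(msgs, commit_first=False, crash_index=1)  # at-least-once
```

Committing before processing drops "b"; processing before committing delivers "b" twice. Neither ordering alone gives exactly-once.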

Conclusion: by default Kafka provides at-least-once delivery, and lets users obtain at-most-once semantics by saving the position before processing messages. If consumption can be made idempotent, the system as a whole can be considered exactly-once.
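The "idempotent consumption" idea in the conclusion can be sketched as a sink that deduplicates by message id, so at-least-once redelivery becomes harmless. The class below is illustrative, not a Kafka API; it assumes each message carries a stable unique id.

```python
class IdempotentSink:
    """Drop duplicates by message id so redelivery under at-least-once
    semantics has no visible effect (sketch, not a Kafka API)."""

    def __init__(self):
        self.seen = set()   # ids already applied
        self.results = []   # the effective, exactly-once output

    def process(self, msg_id, payload):
        if msg_id in self.seen:
            return          # duplicate from a redelivery: safe to ignore
        self.seen.add(msg_id)
        self.results.append(payload)

sink = IdempotentSink()
# (2, "b") arrives twice, as it would after a crash-and-refetch
for msg_id, payload in [(1, "a"), (2, "b"), (2, "b"), (3, "c")]:
    sink.process(msg_id, payload)
```

In practice the `seen` set would live in the same durable store as the results (so the check and the write are atomic), but the principle is the same.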

Replication

  1. Kafka replicates the partitions of each topic; the number of replicas is set by the user.
  2. By default each partition has one Leader and zero or more Followers.
  3. Followers can be thought of as consumers that consume the Leader's log and back it up locally.
  4. All reads and writes go through the Leader, so the Followers really are just backups.
  5. How does Kafka determine that a Follower is alive?
    • It keeps in touch with ZooKeeper.
    • It copies messages from the Leader and does not lag too far behind (the threshold is configurable).
  6. A message is considered committed only after it has been replicated to all such Followers, so the failure of the Leader does not cause message loss (recall the at-least-once semantics on the Producer side above).
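Points 3 and 6 together can be sketched as follows: followers fetch from the leader's log just like consumers, and the leader's commit point only advances to what the slowest in-sync follower has copied. The classes are illustrative, not Kafka's internals.

```python
class Leader:
    """Sketch: a write counts as committed only once every replica
    in the in-sync set has copied it (illustrative only)."""

    def __init__(self, isr):
        self.isr = isr        # follower logs currently in sync
        self.log = []
        self.committed = 0    # offsets below this are safe from leader loss

    def append(self, record):
        self.log.append(record)   # accepted, but not yet committed

    def followers_fetch(self):
        # followers behave like consumers of the leader's log
        for follower_log in self.isr:
            follower_log.extend(self.log[len(follower_log):])
        # commit point = what the slowest in-sync follower has
        self.committed = min(len(f) for f in self.isr)

f1, f2 = [], []
leader = Leader(isr=[f1, f2])
leader.append(b"m0")
before_fetch = leader.committed   # not replicated yet
leader.followers_fetch()
```

Because the commit point trails full ISR replication, any replica in the ISR can take over as leader without losing a committed message.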

Leader election

  1. After the Leader goes down, a new Leader must be chosen from among the Followers. Kafka dynamically maintains an in-sync replica set (ISR); any Follower in this set can become the Leader. A write is considered committed only after it has been replicated to every member of the ISR. The ISR is also persisted to ZooKeeper.
  2. If all nodes fail, Kafka chooses the first replica to come back (not necessarily one in the ISR) as the Leader; messages may be lost in this case.
  3. The Producer can choose whether to wait for acknowledgment from the replicas, where "the replicas" means the members of the ISR.
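The election rule in points 1 and 2 can be sketched as: prefer any live replica in the ISR (no committed message is lost), and only fall back to an arbitrary live replica, possibly losing messages, when no ISR member survives. The function below is a sketch of that policy, not Kafka's implementation.

```python
def elect_leader(replicas, isr, alive):
    """Pick a new leader: clean election from the ISR if possible,
    otherwise an 'unclean' election that may lose committed messages.
    Returns (leader, unclean). Sketch only, not Kafka's code."""
    for r in replicas:
        if r in isr and r in alive:
            return r, False      # clean: leader has all committed messages
    for r in replicas:
        if r in alive:
            return r, True       # unclean: committed messages may be lost
    raise RuntimeError("no live replica to elect")

replicas = ["b1", "b2", "b3"]
clean = elect_leader(replicas, isr={"b2", "b3"}, alive={"b2", "b3"})
unclean = elect_leader(replicas, isr={"b1"}, alive={"b3"})
```

The unclean fallback is the trade-off in point 2: availability (someone becomes leader) is bought at the price of possible message loss.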