Apache Kafka is a distributed messaging system written in Scala and Java.

Basic Kafka concepts

message

A message is the most basic unit of data in Kafka. It consists of two parts, a key and a value, both of which are byte arrays.

Kafka routes a message to a specific partition based on its key according to a partitioning policy, which ensures that all messages with the same key are written to the same partition.

The value is the business payload that Kafka actually delivers.
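
For illustration, here is a minimal sketch of sending keyed messages with the Java client; the broker address and the topic name "user-events" are assumptions, not anything prescribed by Kafka:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // With the default partitioner, messages sharing the key "user-42"
            // hash to the same partition, so they stay ordered relative to each other.
            producer.send(new ProducerRecord<>("user-events", "user-42", "logged_in"));
            producer.send(new ProducerRecord<>("user-events", "user-42", "logged_out"));
        }
    }
}
```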

Topic, partition, broker

Topic is a logical concept in Kafka. Multiple topics can be defined in a Kafka cluster.

In practice, the producer and consumer agree on a topic name: the producer writes its messages to that topic, and the consumer reads messages from it.

A topic can be thought of as a message channel that connects producers and consumers, as shown in the following figure:

Within the logical concept of a topic, Kafka goes further and splits each topic into one or more partitions (every topic has at least one partition), as shown in the following figure:

As can be seen from the figure above, when a producer pushes a message to a partition, the message is assigned an offset, which uniquely identifies it within that partition.

Through offsets, Kafka guarantees that messages within a partition are ordered; a partition is, in effect, a store of ordered messages.

In a production environment, different topics must support different traffic volumes, so topics need the ability to scale horizontally; Kafka achieves this with partitions.

Different partitions of the same topic can be allocated to different physical machines. A partition is the smallest unit of horizontal scaling for a topic: the ordered messages stored in one partition cannot be split across multiple machines.

A common Kafka architecture is shown below:

A Kafka cluster consists of multiple brokers.

We can increase the throughput of a topic by adding partitions and brokers.
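
As a sketch, a topic with a chosen partition and replica count can be created with the Java AdminClient; the topic name and counts below are illustrative:

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions spread across the brokers, replication factor 3.
            admin.createTopics(List.of(new NewTopic("user-events", 6, (short) 3)))
                 .all().get();
        }
    }
}
```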

The broker is one of Kafka’s core components and has three core responsibilities:

  • Receive messages sent by producers and persist them to disk.
  • Process fetch requests from consumers.
  • Process requests from other brokers in the cluster and return responses according to the request type.

log

After covering the three macro-level concepts of topic, partition, and broker, let’s analyze them further from a micro perspective.

A partition logically corresponds to a log. When a producer pushes a message to a partition, Kafka actually appends the message to the log that corresponds to that partition.

A log consists of multiple segments. Both the log and its segments are logical concepts: a log corresponds to a folder on disk, and each segment corresponds to a segment file plus an index file inside that folder.

Messages are always appended to the latest segment file. Once that segment grows to a configured size, Kafka rolls over: it creates a new segment file and appends subsequent messages there.

Appending only to the latest segment keeps any single segment file from growing too large and keeps writes as sequential I/O.

The index file is a sparse index over its segment file. Kafka memory-maps the index file’s contents to speed up index lookups.
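
Concretely, a partition’s folder on disk typically looks like the listing below (the topic name is assumed; each segment file is named after the offset of its first message):

```
user-events-0/                    # folder for partition 0 of topic "user-events"
├── 00000000000000000000.log     # segment file: the messages themselves
├── 00000000000000000000.index   # sparse offset index for that segment
├── 00000000000000170215.log     # next segment, rolled at offset 170215
└── 00000000000000170215.index
```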

replica

To improve availability and to ensure data integrity and safety, distributed systems often keep redundant copies of data, and Kafka follows this pattern.

In Kafka, partitions usually have multiple replicas (at least one replica). Messages in different replicas of a partition are identical.

Although the replicas of a partition hold the same messages, they play different roles: one replica is the leader replica, and all the other replicas are followers.

All read and write requests are handled by the leader replica, while the follower replicas merely pull newly written messages from the leader periodically and apply them to their own logs. The structure is shown in the following diagram:

As shown in the figure above, different replicas of the same partition are placed on different brokers. This way, when the broker hosting the leader replica goes down, Kafka elects a new leader from the remaining follower replicas, and the new leader continues to provide read and write services.




ISR set

Let’s talk more about replicas. Kafka has a concept called the “ISR” (In-Sync Replicas) set. Replicas in this set must meet the following two requirements:

  • The broker hosting the replica must remain connected to the ZooKeeper cluster.
  • The offset of the replica’s last message must not lag behind the offset of the leader replica’s last message by more than a specified threshold.

A Kafka cluster maintains an ISR set for each partition, and Kafka’s write path is closely tied to it. When the leader replica receives a write request, it processes the request first and persists the message to its local log. The follower replicas then periodically ask the leader for newly written messages and synchronize them into their own local logs. Naturally, the followers’ periodic fetch requests lag slightly behind the leader’s writes, so the messages stored in a follower trail the leader a little. As long as that lag does not exceed the specified threshold, the follower stays in the ISR set.

If a follower replica fails to synchronize with the leader for too long, for example because of a crash, a long GC pause, or a network fault, it no longer meets the second requirement and is removed from the ISR set.

Once the follower recovers from such a fault, it resumes synchronizing with the leader. When it catches up, that is, when the offset of its last message lags the leader’s by less than the specified threshold, the follower is added back to the ISR set.
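
The lag threshold itself is a broker-level configuration. As a hedged sketch (the exact knob depends on the Kafka version; older releases bounded lag by message count, newer ones by time):

```properties
# server.properties (illustrative values)
# Older Kafka versions: evict a follower from the ISR if it trails the
# leader by more than this many messages.
replica.lag.max.messages=4000
# Newer Kafka versions: evict a follower from the ISR if it has not caught
# up to the leader's log end within this many milliseconds.
replica.lag.time.max.ms=10000
```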

HighWatermark, Log End Offset

HW (HighWatermark) and LEO (Log End Offset) are closely related to the ISR set above.

The HW records a special offset value: when consumers pull messages, they can only see messages before the HW; messages at or beyond the HW are invisible to consumers.

Like the ISR set, the HW is managed by the leader replica. Only after all follower replicas in the ISR set have synchronized the messages up to the HW does the leader advance the HW value.

Because the messages before the HW exist in multiple replicas at once, they survive even if the leader replica crashes or its disk is damaged (a follower replica is elected as the new leader and continues to serve requests). That is why Kafka considers messages before the HW to be “committed”.

The LEO (Log End Offset) is an offset marker that every replica maintains. It records the offset of the last message appended to that replica:

  • When the leader replica receives a message from a producer, the leader’s LEO increases.
  • When a follower replica successfully pulls that message from the leader and writes it to its local log, the follower’s LEO increases as well.

With the basic concepts of HW and LEO in place, here is a step-by-step illustration of how they work together, as shown in the following figure:

  1. First, a producer sends a message to a partition of a topic we care about.
  2. When the message is written to the leader replica, it is assigned offset 5, and the leader changes its LEO from 4 to 5. At this point, the leader’s HW remains 4.
  3. A follower replica pulls the message from the leader and synchronizes it to its local log, changing its own LEO from 4 to 5.
  4. Once every replica in the ISR set has synchronized the message at offset=5, the leader advances the HW from 4 to 5. The next time consumers pull messages, they can see the message at offset 5.
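
The walkthrough above boils down to a single rule, which the following toy model makes concrete (this is not Kafka’s actual code): the leader may only advance the HW to the smallest LEO among the replicas currently in the ISR set.

```java
import java.util.List;

public class HighWatermarkDemo {
    static long highWatermark(List<Long> isrLeos) {
        // HW = min(LEO) over the ISR; returning 0 for an empty ISR is a
        // simplification for this sketch.
        return isrLeos.stream().mapToLong(Long::longValue).min().orElse(0L);
    }

    public static void main(String[] args) {
        // Steps 2-3 of the walkthrough: leader LEO = 5, follower LEO = 4.
        System.out.println(highWatermark(List.of(5L, 4L))); // prints 4
        // Step 4: the follower catches up, so the HW advances.
        System.out.println(highWatermark(List.of(5L, 5L))); // prints 5
    }
}
```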

Replication scheme design

With the basics of replicas covered, let’s look at the reasoning behind Kafka’s replication design. As mentioned above, redundant copies are a common backup approach in distributed storage, and there are two common replication schemes, synchronous replication and asynchronous replication:

  • Synchronous replication means a message is committed only after every follower replica has copied it.

  • Asynchronous replication means the leader replica advances the HW as soon as it receives a message from the producer, that is, the message is immediately considered “committed”; follower replicas then synchronize the message from the leader asynchronously.

Here are the problems with the two replication schemes:

  • With synchronous replication, once a single follower replica crashes or stalls in a long full GC and cannot synchronize messages, the HW cannot advance and messages cannot be committed, so consumers get no messages. One faulty follower replica is enough to make the entire distributed system unavailable.

  • Asynchronous replication avoids the problem of a single point of failure dragging down the whole cluster, but it can lose data. For example, suppose all follower replicas in a cluster synchronize slowly and hold far fewer messages than the leader replica. If the leader crashes at that moment, the newly elected leader will not contain all of the original leader’s messages, and that data is lost. Worse, some consumers may already have consumed those lost messages, and no one knows exactly which messages are missing, so the state of the whole system becomes uncontrollable.

Kafka’s replica and ISR set designs strike a balance between synchronous and asynchronous replication:

  • When a follower replica falls too far behind the leader, Kafka treats it as faulty and evicts it from the ISR set. Since message commitment only involves replicas in the ISR set, a slow replica does not drag down the performance of the whole system.
  • When the leader replica goes down, Kafka elects a new leader from the ISR set. Because the new leader contains every message before the HW mark, and messages beyond the HW were still “uncommitted”, the leader switch is invisible from outside the Kafka cluster and no committed data is lost.
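
In practice, producers choose where to sit between these two extremes with the acks setting. A hedged sketch of the relevant knobs (values illustrative):

```properties
# Producer side: how many acknowledgements a write must collect.
#   acks=0    fire-and-forget: pure asynchronous replication
#   acks=1    leader acknowledgement only
#   acks=all  wait for every replica currently in the ISR set: synchronous
#             replication, but scoped to the ISR, so a lagging replica that
#             was evicted from the ISR cannot stall writes
acks=all

# Broker or topic side: reject writes when the ISR shrinks below this size,
# bounding how few copies a "committed" message can have.
min.insync.replicas=2
```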




Retention Policy & Log compaction

If you already know a little about Kafka, you may know that Kafka retains messages for a long time, whether or not they have been consumed. This design allows a consumer to rewind to an earlier offset and start consuming again.

However, Kafka is not a database, and it should not keep historical messages forever, especially messages that will definitely never be used again. Periodic cleanup of historical messages can be configured through Kafka’s retention policy settings.

Kafka provides two retention policies by default:

  1. Cleanup by time: a message can be removed by a background thread once it has been retained for longer than a specified threshold.
  2. Cleanup by size: when a topic’s log grows beyond a size threshold, background threads start removing the oldest messages.

Kafka’s retention policy can be configured globally for all topics as well as overridden for a particular topic.
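
Both policies map to configuration settings. A sketch of the broker-wide defaults, with illustrative values (the same limits can be overridden per topic via retention.ms and retention.bytes):

```properties
# server.properties (illustrative values)
log.retention.hours=168          # policy 1: delete messages older than 7 days
log.retention.bytes=1073741824   # policy 2: cap each partition's log at ~1 GiB
```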

In addition to the retention policy, Kafka also provides log compaction to reduce disk usage. Recall that messages consist of a key and a value. If a key’s value is continually updated and consumers only care about the latest value, we can compact the log. Here is how it works: Kafka starts a background compaction thread that periodically merges messages with the same key, keeping only the latest value for each key.
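
Compaction is enabled per topic through the cleanup.policy setting:

```properties
# Topic-level configuration
cleanup.policy=compact    # or "compact,delete" to combine compaction with retention
```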

The following illustration shows how a log compaction works:




controller

As mentioned earlier, a Kafka cluster is made up of multiple brokers, and the cluster elects one of them to act as the controller.

The controller is responsible for managing the status of partitions and of each partition’s replicas, and for monitoring changes to ZooKeeper data.

The controller follows a leader-follower design: all brokers watch the current controller’s status, and when the controller fails, a new broker is elected as the controller.




consumer

The consumer’s main job is to pull messages from the topic and consume them to complete its business logic.

Each consumer maintains an offset that records its current consumption position in each partition.

Having the consumer maintain its own offset reduces the burden on Kafka brokers of tracking consumer state, and it means a broker failure or delay cannot lose the consumption state or block consumption. This design also lets consumers start consuming from any offset they choose, for example skipping certain messages or re-consuming others.
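
Here is a minimal consumer sketch with the Java client; the broker address, group id, and topic name are assumptions (the group id is explained in the next section):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "demo-group");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("auto.offset.reset", "earliest"); // start from the beginning if no saved offset

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("user-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```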




consumer group

Multiple consumers in Kafka can form a consumer group, which is the basic unit for consuming Kafka messages.

A consumer can belong to only one consumer group. Within a consumer group, each partition of the consumed topic is guaranteed to be assigned to exactly one consumer in the group. Different consumer groups, of course, do not affect each other.

Based on this property of consumer groups, placing each consumer in its own consumer group achieves a broadcast effect (that is, a message is consumed by multiple consumers at the same time).

To achieve exclusive consumption instead, put all consumers of the target topic into a single consumer group, so that each partition is consumed by only one consumer, as sketched below.
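
Building on the consumer sketch shown earlier, the only knob involved is group.id; the group names here are illustrative:

```java
// Exclusive consumption: all consumers share one group id, so each
// partition of the topic is delivered to exactly one of them.
props.put("group.id", "billing-service");

// Broadcast: give every consumer its own group id, so each consumer
// receives every message in the topic.
props.put("group.id", "audit-" + java.util.UUID.randomUUID());
```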

The following figure shows the mapping between consumers and partitions in a consumer group: consumer0 and consumer1 are responsible for consuming partition0 and partition1 respectively, and consumer2 is responsible for consuming partition2 and partition3:

Besides supporting exclusive and broadcast consumption modes, the consumer group also provides horizontal scaling and failover. When consumer2 lacks the processing power to keep up with two partitions, we can add a new consumer to the group, which triggers a reassignment of the partition mapping, as shown below:

After consumer3 joins, consumer2 consumes only the messages in partition2, while partition3 is reassigned to consumer3.

In the failover scenario, when consumer3 fails, the consumer group again redistributes the mapping between consumers and partitions, as shown in the following figure, where consumer2 takes over partition3:

Given the mapping described above, in which each consumer owns one or more partitions, adding ever more consumers to a group does not always help: once the number of consumers in a group exceeds the number of partitions in the topic, some consumers cannot be assigned any partition and sit idle.




conclusion

This lesson introduced the basic concepts of Kafka, covering the following parts:

  • The composition of the message
  • The three core concepts of topic, partition, and broker, and their relationships
  • The logical concept of the log and its physical implementation, including segment files and sparse index files
  • The relationship between replicas and partitions, and the design of replicas
  • An introduction to the ISR set
  • HighWatermark, Log End Offset
  • Replication scheme design
  • Retention Policy & Log compaction
  • controller
  • Consumer, consumer group

Thank you for watching. Articles and videos for this course will also be posted on:

  • WeChat account: Yang Sizheng

  • Bilibili (Station B): Yang Sizheng; corresponding video: Basic Concepts of Kafka