Kafka is not completely synchronous or asynchronous. It is a special ISR (In Sync Replica). In-sync Replica (ISR) is a set of synchronization sets maintained by Kafka for a zone. That is, each zone has its own ISR set. The replica in the ISR set means that the follower replica and the leader replica are synchronized. Only replicas in an ISR collection are eligible to be elected leader.Copy the code
Kafka’s up
- A kafka topic can have n replicas. It is recommended that the number of replicas be less than or equal to the number of brokers. That is, ensure that there is at most one replica in one broker.
- Replicas are created in topic zones. Each zone has a leader and 0 to N-1 followers. Kafka divides multiple replicas into Lerder replicas and follower replicas.
- When the producer writes data to a topic partition, according to the ACK mechanism, the default ACK =1 only writes data to the leader. The data in the leader is then copied to other replicas, and the followers periodically pull data from the leader. However, all read and write operations are in the leader replica. The follower replica is selected only after the leader replica dies, and the follower does not provide external services.
Kafka ISR mechanism
ISR replicas: follower replicas that are basically the same as the leader’s replicas. If synchronization is too slow, they will be kicked out of the ISR replicas.
Copy synchronization:
-
Last End offset (LEO) : indicates the offset at the end of the log, which records the offset of the next message in the underlying log file of the replica object. The LEO value is automatically updated when the replica writes the message. If LE0 is 2, the current offset is 1.
-
High watermark (HW) : indicates the high watermark value. HW must not be greater than LEO value. Messages smaller than HW value are considered “committed” or “backed up” and visible to consumers.
-
The producer sends messages to the leader and then writes them to the leader. The leader generates logs locally. The Follow then pulls messages from the leader and writes them to the local log, which returns an ACK signal to the leader. Once all ack signals in the ISR are received, the HW is incresed and the leader returns an ACK to the producer.
Kafka’s replication mechanism
Kafka each partition consists of an immutable sequence of messages appended sequentially. Each message has a unique offset to mark the location.
When creating a topic in Kafka, you can set a replica count. The replica count determines the number of replicas in the zone. If the leader fails, the replica count is set. Kafka will failover the partition master node to other replica nodes to ensure that messages from this partition are available. The leader node receives messages sent by the producer, while the followers copy messages from the master node.
[image-31f1F9-1614765714779]
The Kakfa log replication algorithm provides the guarantee that after a message is committed by the producer, if the leader node fails and another node is elected as the leader node, the message can also be consumed.
The key configuration: unclean. Leader. Election. The enable
Indicates whether to enable replicas not in the ISR set to be elected as leader as a last resort, even though doing so may result in data loss
Type: boolean
Default: false
Valid Values:
Importance: high
Update Mode: cluster-wide
Copy the code
The default value is false, that is, replica is not selected as the leader in the ISR. This configuration can be configured globally or at the topic level.
Each point in the set must be synchronized with the leader message, that is, there is no delay. The leader of the partition maintains the LIST of ISR sets. If a point falls too far behind, it is kicked out of the ISR set.
After the producer sends a message to the leader, the leader commits only when all replicas in the ISR send an ACK to confirm the message. Then the producer considers the message committed. The write performance of the Kafka client depends on the reception performance of the slowest broker in the ISR set. If a point is poor, it must be identified as soon as possible and then kicked out of the ISR set to avoid performance problems.
How do I determine if a replica will not be removed from an ISR collection?
Replica.lag.max. Messages: Indicates the maximum number of messages of the follower copy that is behind the leader copy. (Removed after 0.9.0.0).
Replica.lag.time.max. ms: Refers not only to the time that has elapsed since the request was last obtained from the copy, but also to the time since the copy was last captured.
Set replica.lag.max.messages to 3. As long as the follower does not lag more than 2 messages behind the leader, the node that can keep up with the leader will not be kicked out.
Set replica.lag.time.max.ms to 300ms, which means that followers are not considered dead and will not be kicked out of the ISR collection as long as they send a fetch request every 300ms.
conclusion
The purpose of the Replica is to take the lead in case of an accident. After the leader fails, a new leader needs to be selected from the followers. During the election, the follower is preferentially selected from the ISR, because the data of the followers in this list is synchronized with that of the leader. Selecting from them ensures data integrity.
However, if all the followers in the ISR list fail, you have to choose from other followers. In this case, there is a risk of data loss because it is not sure whether the follower has copied all the leader’s data.
There is also an extreme case where all replicas fail, and there are two options:
-
Wait for one of the ISRS to come alive. Select Leader. The data is reliable, but the time of coming alive is uncertain.
-
Select the first Replication to come to life, not necessarily in the ISR, and select the leader to restore availability as quickly as possible, but data may not be complete.
Kafka allows you to choose which solution to use through configuration, with tradeoffs based on availability and consistency.