This is the 7th day of my participation in the Gwen Challenge.

Let’s start with the basics of Kafka’s high availability mechanism:

  1. Kafka achieves high availability through a leader-follower multi-replica mechanism
  2. Each partition contains multiple replicas (the partition is the physical storage unit where log messages are actually kept)
  3. Among those replicas, exactly one is the leader, and only the leader serves external requests (both reads and writes)
  4. The other follower replicas are pure backup redundancy
  5. A follower replica synchronizes messages by sending fetch requests to the leader replica

So if the leader replica dies, one of the follower replicas under that partition becomes the new leader. Here is the problem: at the moment the new leader is elected, it may not have fully synchronized the old leader's messages, which can lead to message loss or inconsistency. How does Kafka prevent message errors when the leader replica changes?

HW mechanism

Let’s start with a few key terms:

  1. HW: high watermark. HW is never greater than the log end offset; a log entry with offset < HW is considered committed (it has been replicated) and is visible to consumers
  2. LEO: log end offset. The position where the next log entry will be written
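To make the two offsets concrete, here is a minimal runnable sketch. The `Replica` class and its field names are illustrative, not Kafka's actual code:

```python
# A minimal sketch of the HW/LEO invariant: a replica is just a list of
# messages plus a high watermark (all names here are illustrative).

class Replica:
    def __init__(self):
        self.log = []      # messages stored on this replica
        self.hw = 0        # high watermark: offsets < hw are committed

    @property
    def leo(self):         # log end offset: where the next message goes
        return len(self.log)

    def visible_to_consumers(self):
        # consumers only see messages below the high watermark
        return self.log[:self.hw]

r = Replica()
r.log = ["m0", "m1", "m2"]
r.hw = 2
assert r.hw <= r.leo             # HW never exceeds LEO
print(r.visible_to_consumers())  # ['m0', 'm1']; "m2" is not yet committed
```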

How are these two offsets stored?

  1. The leader stores its own LEO and HW, and also stores the remote LEO of each follower
  2. Each follower stores only its own LEO and HW
  3. In other words: a follower's LEO is stored in two places, on the follower itself and on the leader (as the remote LEO)

Here is how these two offsets are updated:

LEO update

Suppose the producer sends a batch of messages to the broker; each replica changes as follows:

  1. The leader replica writes the messages and increments its own LEO (all reads and writes go through the leader replica)
  2. A follower replica fetches the log messages from the leader to its local log, then increments its own LEO
  3. Every fetch request a follower sends carries its own current LEO; the leader uses it to update the remote follower LEO it stores
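The steps above can be sketched in a few lines. The `Leader` class and `remote_leo` naming are illustrative, not Kafka's internals:

```python
# Illustrative sketch of the LEO bookkeeping described above: the leader
# tracks a "remote LEO" per follower, updated from the offset each fetch
# request carries.

class Leader:
    def __init__(self, follower_ids):
        self.log = []
        self.remote_leo = {f: 0 for f in follower_ids}

    @property
    def leo(self):
        return len(self.log)

    def append(self, msg):
        self.log.append(msg)          # write -> leader LEO + 1

    def fetch(self, follower_id, fetch_offset):
        # the fetch request carries the follower's own LEO;
        # the leader records it as that follower's remote LEO
        self.remote_leo[follower_id] = fetch_offset
        return self.log[fetch_offset:]

leader = Leader(["f1"])
leader.append("m0")                   # leader LEO: 0 -> 1
msgs = leader.fetch("f1", 0)          # follower fetches from its own LEO (0)
follower_leo = 0 + len(msgs)          # follower appends, its LEO -> 1
# leader.remote_leo["f1"] is still 0 until the NEXT fetch carries LEO=1
```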

HW update


The basic process:

  1. The leader partition receives messages from the producer, writes them to the partition's disk, and increments its LEO
  2. A follower sends a fetch request to the leader carrying fetchOffset=N (the offset of the next message it needs). The leader updates that follower's remote LEO, then recomputes leaderHW from the remote LEOs
  3. The leader then sends the log messages, together with the current leaderHW, in the response to the follower. The follower writes the messages to its partition and increments its own follower LEO
  4. The follower sends a second fetch carrying the latest fetchOffset = N+1. The leader receives the request, updates remote LEO = N+1, recomputes the latest leaderHW, and sends leaderHW back to the follower
  5. The follower compares its current LEO with leaderHW and takes the smaller of the two as its new followerHW value
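The two-round process above can be simulated end to end. This is a runnable sketch under simplifying assumptions (one follower, everything in one process); the `Partition` class and field names are our own, not Kafka's:

```python
# Simulation of the two-round fetch protocol: one leader log, one follower
# log, and the HW/LEO bookkeeping described in the steps above.

class Partition:
    def __init__(self):
        self.leader_log, self.follower_log = [], []
        self.leader_hw = self.follower_hw = 0
        self.remote_leo = 0             # follower's LEO as known by the leader

    def produce(self, msg):
        self.leader_log.append(msg)     # step 1: leader writes, leader LEO++

    def follower_fetch(self):
        n = len(self.follower_log)      # fetchOffset = follower's own LEO
        self.remote_leo = n             # step 2: leader updates remote LEO
        # leaderHW = min(leader LEO, remote LEOs)
        self.leader_hw = min(len(self.leader_log), self.remote_leo)
        msgs = self.leader_log[n:]      # step 3: response carries msgs + leaderHW
        self.follower_log.extend(msgs)  # follower writes, follower LEO++
        # step 5: followerHW = min(follower LEO, leaderHW)
        self.follower_hw = min(len(self.follower_log), self.leader_hw)

p = Partition()
p.produce("m0")
p.follower_fetch()  # round 1: follower catches up, but both HWs are still 0
p.follower_fetch()  # round 2: fetchOffset=1 advances leaderHW and followerHW to 1
```

Note that after round 1 the message is replicated but not yet committed; only the second fetch lets the leader (and then the follower) advance the HW.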

HW mechanism defects

In general, two rounds of fetch requests are needed before the leader finishes updating both leaderHW and followerHW. If a leader switch happens in between, data can be lost or become inconsistent across replicas.
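The loss window can be shown in a few lines. Before leader epoch, a restarting follower truncated its log back to its own HW, so a message that was replicated in round 1 but whose HW update never arrived simply disappears (a deliberately simplified sketch):

```python
# Deliberately simplified: the follower replicated "m0" during round 1,
# but restarted before round 2 could advance its HW past 0.
follower_log = ["m0"]
follower_hw = 0

# Pre-leader-epoch restart behaviour: truncate the log back to the HW.
follower_log = follower_log[:follower_hw]

# If this follower is now elected leader, "m0" is gone from the partition.
assert follower_log == []
```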

leader epoch

It was introduced in version 0.11, mainly to compensate for the defects of the HW mechanism.

The leader-epoch-checkpoint file is used to store the leader’s epoch data:

$ cat leader-epoch-checkpoint
0
2
0 10
3 10898

Format: after the version and entry-count lines, each entry is a <epoch, offset> pair. epoch is the leader version, which is monotonically increasing; epoch++ happens on each leader change. offset is the displacement of the first message written by that generation's leader.
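A tiny parser makes the layout explicit (assuming the layout just described: a version line, an entry-count line, then one `<epoch> <startOffset>` pair per line; `parse_checkpoint` is our own helper, not a Kafka API):

```python
def parse_checkpoint(text):
    """Parse a leader-epoch-checkpoint dump: version, entry count, pairs."""
    lines = text.strip().splitlines()
    version, count = int(lines[0]), int(lines[1])
    entries = [tuple(map(int, line.split())) for line in lines[2:2 + count]]
    return version, entries

version, entries = parse_checkpoint("0\n2\n0 10\n3 10898\n")
# entries == [(0, 10), (3, 10898)]:
# epoch 0's leader wrote its first message at offset 10,
# epoch 3's leader wrote its first message at offset 10898
```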

Working mechanism

We can see how the whole mechanism works by looking directly at the defect scenario:

With leader epoch, followers no longer blindly truncate their logs to their HW as they did before.

First, the preconditions of the failure scenario:

  1. Before the outage, the follower was no longer in the ISR list, and unclean.leader.election.enable=true, i.e., non-ISR replicas are allowed to become the leader
  2. The follower's messages were written to the page cache but not yet flushed to disk; at this point its HW may reflect an intermediate state of the in-flight synchronization

Leader epoch resolves the message inconsistency as follows:

  1. The new follower sends an epoch request carrying its own epoch
  2. The new leader sees that the received epoch != its current epoch, and responds with its own epoch start offset
  3. The new follower receives the epoch start offset and truncates its log at that position (so that it stays consistent with the new leader's log)
  4. The follower then sends a fetch request to start a new round of synchronization, so all subsequent messages remain consistent.
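The truncation step itself can be sketched as follows (`truncate_to_epoch_start` is an illustrative helper, not Kafka's internal method):

```python
def truncate_to_epoch_start(follower_log, epoch_start_offset):
    # Drop everything from the new leader's epoch start offset onward:
    # those entries may diverge from the new leader's log.
    return follower_log[:epoch_start_offset]

log = ["m0", "m1", "m2-diverged"]        # "m2-diverged" is not on the new leader
log = truncate_to_epoch_start(log, 2)    # new leader's epoch starts at offset 2
# log == ["m0", "m1"]; the follower now resumes fetching from offset 2
```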