A consumer goes to Kafka to pull messages, but there are no new messages available. What does Kafka do?

As shown in the figure below, both follower replicas have caught up to the latest position of the leader replica and then send fetch requests to it, but no new messages have been written to the leader. What should the leader replica do at this point? It could simply return empty fetch results to the follower replicas, but as long as no new messages are written to the leader, the followers would keep sending fetch requests and keep receiving empty results, which wastes resources and is clearly unreasonable.

This is where Kafka's concept of delayed operations comes in. When Kafka processes a fetch request, it reads the log file once. If it does not collect enough messages (fetchMinBytes, configured by the fetch.min.bytes parameter, default: 1), it creates a DelayedFetch operation and waits until enough messages can be pulled. When the delayed fetch operation is eventually executed, the log file is read again and the fetch result is returned to the follower replica.
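The same mechanism applies to consumer fetches. As a minimal sketch (broker address, topic, and group name are hypothetical), a consumer can raise fetch.min.bytes so the broker deliberately parks the request in a DelayedFetch, and bound the wait with fetch.max.wait.ms:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class MinBytesFetchDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");               // hypothetical group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Ask the broker to hold the fetch until at least 1 KB of messages has accumulated ...
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 1024);
        // ... but to give up waiting and answer anyway after 500 ms.
        props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 500);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic")); // hypothetical topic
            consumer.poll(Duration.ofSeconds(1)).forEach(record ->
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value()));
        }
    }
}
```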

Delayed operations are not unique to fetching messages; there are many kinds of delayed operations in Kafka, such as delayed data deletion and delayed production.

Take delayed production (of messages) as an example. When the producer client sends a message with the acks parameter set to -1, it means the client either waits until all replicas in the ISR set have acknowledged receipt of the message before getting a normal response, or it eventually catches a timeout exception.

Since the client has set acks=-1, it cannot be told that the messages it sent were received correctly until both Follower1 and Follower2 have received messages 3 and 4. If either Follower1 or Follower2 fails to fully pull messages 3 and 4 within a certain amount of time, a timeout exception must be returned to the client. The timeout for produce requests is set by request.timeout.ms, which defaults to 30,000 ms, i.e. 30s.
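A minimal producer sketch showing these two settings (broker address, topic, key, and value are hypothetical) might look like this:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AcksAllProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        // acks=-1 (equivalent to "all"): wait until every replica in the ISR has the message.
        props.put(ProducerConfig.ACKS_CONFIG, "-1");
        // How long the producer waits for that acknowledgement before raising a timeout (default 30s).
        props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 30000);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo-topic", "key", "value"), (metadata, exception) -> {
                if (exception != null) {
                    // e.g. a TimeoutException if the ISR could not catch up in time
                    exception.printStackTrace();
                } else {
                    System.out.printf("acked at offset %d%n", metadata.offset());
                }
            });
        }
    }
}
```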

So who performs the job of waiting for messages 3 and 4 to be written to Follower1 and Follower2 and then returning the corresponding response to the client? After the messages are written to the local log file of the leader replica, Kafka creates a DelayedProduce operation to handle both cases, the messages being written to all replicas normally or the request timing out, and to return the response to the client.

A delayed operation returns its response later. First, it must have a timeout period (delayMs): if the task is not completed within this timeout, it is forced to complete so that a response can be returned to the client. Second, a delayed operation is different from a timed operation. A timed operation runs at a specific time, whereas a delayed operation may complete before its timeout expires, so a delayed operation must support being triggered by external events.
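As a minimal sketch of this idea (not Kafka's actual implementation, and all names here are made up), a delayed operation completes at most once, either because an external event satisfies its condition or because the timeout fires:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of a delayed operation: it completes at most once, either because an
// external event satisfies it (tryComplete) or because delayMs expires (forceComplete).
abstract class SimpleDelayedOperation {
    private static final ScheduledExecutorService TIMER = Executors.newSingleThreadScheduledExecutor();
    private final AtomicBoolean completed = new AtomicBoolean(false);

    SimpleDelayedOperation(long delayMs) {
        // Schedule the forced completion that fires if no external event completes us first.
        TIMER.schedule(this::forceComplete, delayMs, TimeUnit.MILLISECONDS);
    }

    /** Called whenever a relevant external event happens (e.g. the HW advances). */
    final void tryComplete() {
        if (isSatisfied()) {
            forceComplete();
        }
    }

    /** Complete exactly once, whether triggered by an event or by the timeout. */
    final void forceComplete() {
        if (completed.compareAndSet(false, true)) {
            onComplete(); // build and return the response to the client
        }
    }

    /** Has the condition (enough bytes, full ISR acknowledgement, ...) been met? */
    abstract boolean isSatisfied();

    /** Finish the operation, e.g. send the produce/fetch response. */
    abstract void onComplete();
}
```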

In the case of a delayed produce operation, its external event is an increase in the HW (high watermark) of the partition the messages are being written to. In other words, as the follower replicas keep synchronizing messages from the leader replica, the HW keeps rising. Each time the HW rises, Kafka checks whether the delayed produce operation can be completed; if it can, the response is returned to the client. If it cannot be completed within the timeout period, it is forced to complete.
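Continuing the sketch above (again, all names are hypothetical and this is not Kafka's real code), a delayed-produce-style operation could be expressed as: every HW advance calls tryComplete(), and the operation is satisfied once the HW has passed the last offset written for the request.

```java
// Hypothetical stand-in for the broker's per-partition state.
interface Partition {
    long highWatermark();
}

// Hypothetical delayed produce built on SimpleDelayedOperation: it is satisfied once
// the partition's high watermark has moved past the last offset of the produced batch,
// i.e. every replica in the ISR has fetched the messages.
class SimpleDelayedProduce extends SimpleDelayedOperation {
    private final Partition partition;
    private final long requiredOffset;   // last offset written to the leader's log
    private final Runnable sendResponse; // callback that answers the client

    SimpleDelayedProduce(Partition partition, long requiredOffset,
                         Runnable sendResponse, long delayMs) {
        super(delayMs);
        this.partition = partition;
        this.requiredOffset = requiredOffset;
        this.sendResponse = sendResponse;
    }

    @Override
    boolean isSatisfied() {
        return partition.highWatermark() >= requiredOffset;
    }

    @Override
    void onComplete() {
        sendResponse.run();
    }
}
```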

Recall the delayed fetch operation at the beginning of the article, which is likewise completed either by a timeout or by an external event. Timeout triggering is easy to understand: when the wait times out, the log file is read a second time. External-event triggering is a little more complicated, because fetch requests can come not only from follower replicas but also from consumer clients, and these correspond to different external events. For a delayed fetch from a follower replica, the external event is a message being appended to the local log file of the leader replica. For a delayed fetch from a consumer client, the external event can simply be understood as an increase in the HW.

There are deeper mechanisms behind delayed operations, such as the delayed operation purgatory and the expired-operation reaper, which are covered in In-depth Understanding of Kafka.


Welcome to support the author's booklets: "Illustrated Kafka Guide" and "Illustrated Kafka's Core Principles".


Welcome to support my new books: In-depth Understanding of Kafka: Core Design and Practice Principles and RabbitMQ Field Guide. Also welcome to follow my WeChat official account: Zhu Xiaosi's blog.