The prerequisites

Kafka guarantees that messages are not lost only when two core conditions hold.

First, the message must be a committed message. Kafka considers a message committed once the producer has sent it to a broker and the required number of brokers have acknowledged that they persisted it. How many brokers must acknowledge is up to you: a message can count as committed after a single broker saves it, or only after all replicas save it. Either way, Kafka guarantees persistence only for committed messages.

Second, and most fundamentally, although a Kafka cluster is distributed, it still needs enough live brokers to persist messages. If your message is stored on N Kafka brokers, then as long as at least one of them stays alive, Kafka guarantees that the message will never be lost.

How to ensure that messages are not lost

A message can be lost at several stages: when it is produced, when it is sent to Kafka for storage, and when it is fetched for consumption. Let’s walk through what can be done at each stage to ensure messages are not lost.

The producer side

The Producer side can lose messages. The Kafka Producer sends messages asynchronously: calling the producer.send(msg) API usually returns immediately, but that is no guarantee the message was actually sent. Network jitter can keep the message from ever reaching the Broker, or the Broker can reject a non-compliant message (for example, one that exceeds the Broker’s size limit).

Using the producer.send(msg, callback) API instead avoids this problem: the callback tells you whether the send succeeded, so you can react when a message fails to be committed. For a transient error, the Producer can simply retry; for a non-compliant message, fix the format and resend. In short, the responsibility for handling send failures lies with the Producer, not the Broker. Of course, if the Broker itself is down at that point, that is a different matter, and you need to deal with the Broker failure as soon as possible.
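To make this concrete, here is a minimal sketch of a callback-based send. The broker address, topic name, and the error handling shown are illustrative placeholders, not prescriptions:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.RetriableException;

public class SafeProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("my-topic", "key", "value"); // placeholder topic

            // Send with a callback instead of fire-and-forget: the callback is
            // the only place the application learns whether the send succeeded.
            producer.send(record, (metadata, exception) -> {
                if (exception == null) {
                    System.out.printf("sent: partition=%d offset=%d%n",
                            metadata.partition(), metadata.offset());
                } else if (exception instanceof RetriableException) {
                    // Transient failure such as network jitter: the producer's
                    // own `retries` setting retries these, or we can resend.
                    System.err.println("transient failure: " + exception.getMessage());
                } else {
                    // Non-retriable, e.g. the record is too large:
                    // fix the message itself and send it again.
                    System.err.println("permanent failure: " + exception.getMessage());
                }
            });
            producer.flush(); // block until the in-flight send completes
        }
    }
}
```

The key point is that with fire-and-forget there is no signal at all when a send fails; the callback is what makes failure handling possible.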

The consumer side

Losing data on the Consumer side is slightly more complicated. A Consumer tracks an “offset” that indicates its current consumption position within a Topic partition.

To keep messages from being lost, the rule is: consume the message first, update the offset afterwards. The trade-off is that a message may be consumed more than once this way (at-least-once delivery); how to achieve exactly-once semantics is a topic for a separate article.

Things get trickier when the Consumer processes messages asynchronously across multiple threads. With automatic offset commits enabled, the Consumer keeps advancing the offset on its own; if one worker thread fails, its message is never successfully processed, yet the offset has already moved past it, so the message is effectively lost to this Consumer. The crux is the automatic offset commit: you must verify that a message has actually been consumed before updating the offset.
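To make the failure mode concrete, here is a sketch of the risky pattern just described. The broker address, group id, topic name, and the handle method are illustrative placeholders:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// RISKY PATTERN, shown only to illustrate the failure mode described above.
public class LossyAsyncConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("group.id", "demo-group");              // placeholder group
        props.put("enable.auto.commit", "true");          // offsets advance on a timer
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        ExecutorService executor = Executors.newFixedThreadPool(4);
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("my-topic"));      // placeholder topic
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(100))) {
                    // The auto-commit timer can move the offset past this record
                    // while the worker is still running (or after it has failed),
                    // so a failed record is silently skipped, i.e. lost.
                    executor.submit(() -> handle(record));
                }
            }
        }
    }

    private static void handle(ConsumerRecord<String, String> record) {
        // business logic that might throw
    }
}
```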

The fix is conceptually simple: if multiple threads process messages asynchronously, disable the Consumer’s automatic offset commit and let the application commit offsets itself, as in the sketch below. Mind you, consuming with multiple threads in a single Consumer program is easy to say but fiddly to code, because committing offsets correctly is hard: it is easy to avoid losing unconsumed messages, but just as easy to consume the same message more than once.
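Here is a minimal sketch of the safer pattern: auto-commit disabled, synchronous processing, and a manual commit only after the whole batch is handled. The same placeholder names apply:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SafeConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("group.id", "demo-group");              // placeholder group
        props.put("enable.auto.commit", "false");         // we commit offsets ourselves
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("my-topic"));      // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    handle(record); // process synchronously, before any commit
                }
                // Commit only after every record in the batch has been handled.
                // A crash before this line means reprocessing, never loss.
                consumer.commitSync();
            }
        }
    }

    private static void handle(ConsumerRecord<String, String> record) {
        // business logic
    }
}
```

Committing after the batch gives at-least-once semantics: a crash between handle() and commitSync() replays those records on restart, which is exactly the duplicate-consumption trade-off mentioned earlier.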

Practical Configuration [Important]

A checklist of Kafka configurations for avoiding message loss (a consolidated sketch follows the list):

  1. On the Producer side, use the producer.send(msg, callback) method with a callback, never fire-and-forget producer.send(msg).

  2. Set acks = all. acks is a Producer parameter that defines what “committed” means. Setting it to all requires every in-sync replica to receive the message before it counts as committed.

  3. Set retries to a large value. Another Producer parameter. When network jitter makes a send fail, a Producer configured with retries automatically retries the send, avoiding message loss.

  4. Set unclean.leader.election.enable = false. This is a Broker-side parameter whose default value has been changed several times across Kafka releases, with some controversy. It controls which Brokers are eligible to be elected Leader of a partition. If a Broker that has fallen too far behind the original Leader becomes the new Leader, messages are lost, so this parameter should be set to false.

  5. Set replication.factor >= 3. Also a Broker-side parameter. Keeping multiple redundant copies of each message is the primary defense against loss.

  6. Set min.insync.replicas > 1. A Broker-side parameter that controls the minimum number of replicas a message must be written to before it counts as “committed”. Setting it above 1 improves durability; do not keep the default of 1 in production. Also ensure replication.factor > min.insync.replicas: if the two are equal, the whole partition stops working as soon as a single replica goes offline. replication.factor = min.insync.replicas + 1 is recommended.

  7. Make sure message processing completes before the offset is committed. The Consumer parameter enable.auto.commit is best set to false, with the application committing offsets itself.
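Pulling the checklist together, a rough sketch of where each setting lives. The values shown are illustrative, not tuned recommendations:

```java
import java.util.Properties;

// A consolidated sketch of the checklist above; values are illustrative.
public class NoLossConfigSketch {
    public static void main(String[] args) {
        // Producer side (items 1-3; item 1 is using send(msg, callback)).
        Properties producer = new Properties();
        producer.put("acks", "all");                // item 2: all in-sync replicas must persist
        producer.put("retries", Integer.MAX_VALUE); // item 3: keep retrying transient failures

        // Consumer side (item 7).
        Properties consumer = new Properties();
        consumer.put("enable.auto.commit", "false"); // commit offsets manually after processing

        // Broker/topic side (items 4-6), set in server.properties or at topic creation:
        //   unclean.leader.election.enable=false
        //   default.replication.factor=3   (or --replication-factor 3 per topic)
        //   min.insync.replicas=2          (replication.factor = min.insync.replicas + 1)
    }
}
```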
