This article consists of the blogger's study notes. The content comes from the Internet; if anything here infringes, please contact me for removal.

Personal Note: github.com/dbses/TechN…

Under what circumstances can Kafka ensure that messages are not lost?

In a nutshell, Kafka only makes limited persistence guarantees for committed messages.

There are two core elements to this statement.

The first core element is “committed messages.” When one or more of Kafka’s brokers successfully receive a message and write it to their log files, they tell the producer program that the message was successfully committed. At this point, the message officially becomes “committed” in Kafka’s eyes. (How many brokers must acknowledge depends on the acks setting, discussed under best practices below.)

The second core element is “limited persistence guarantees.” Kafka cannot guarantee that messages will never be lost under any circumstances. To take an extreme example, if the earth no longer existed, Kafka obviously could not save any message.

In short, Kafka does not lose messages, but only committed messages, and only under certain conditions: if your message is stored on N Kafka brokers, the prerequisite is that at least one of those N brokers is alive.

Message loss Case 1: The producer program loses data

Kafka’s Producer currently sends messages asynchronously: calling the producer.send(msg) API usually returns immediately, but that return does not mean the message was sent successfully. If the message is lost along the way, the producer knows nothing about it.

The Producer should always send messages with the API variant that takes a callback, producer.send(msg, callback). The callback reports definitively whether the send succeeded, so failures can be handled instead of silently dropped, as sketched below.
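
Here is a minimal sketch of the callback pattern. The broker address, topic, and class names are illustrative assumptions, not from the original article:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CallbackSendDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: a local broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("demo-topic", "key", "value"); // hypothetical topic
            // The callback fires when the broker responds, so a failed send is visible.
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    // Send failed: log it, retry, or route the message elsewhere.
                    exception.printStackTrace();
                } else {
                    System.out.printf("committed to %s-%d at offset %d%n",
                            metadata.topic(), metadata.partition(), metadata.offset());
                }
            });
        } // close() flushes pending sends before returning
    }
}
```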

Message loss Case 2: The consumer application loses data

The Consumer side has the concept of an “offset,” which represents the Consumer’s current consuming position in a topic partition.

To avoid losing messages on the Consumer side, maintain the order “consume the message first, then commit the offset.” The trade-off of this approach, of course, is possible duplicate processing of messages.

There is also a more insidious message-loss scenario. The Consumer program receives messages from Kafka, hands them to multiple threads for asynchronous processing, and automatically commits the offset forward in the meantime. If one of those threads fails, the messages it was responsible for are never processed successfully, yet their offsets have already been committed, so those messages are effectively lost to the Consumer.

The solution to this problem is simple: when multiple threads process messages asynchronously, the Consumer program must not enable auto-commit of offsets; instead, the application commits offsets manually after processing completes, as in the sketch below.
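
A minimal sketch of this pattern, assuming a local broker and a hypothetical topic demo-topic: records from each poll are handed to a thread pool, and the offset is committed only after every worker has finished.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class MultiThreadedManualCommit {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: a local broker
        props.put("group.id", "demo-group");              // illustrative group id
        props.put("enable.auto.commit", "false");         // the key setting: no auto-commit
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        ExecutorService pool = Executors.newFixedThreadPool(4);
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic")); // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                List<Future<?>> results = new ArrayList<>();
                for (ConsumerRecord<String, String> record : records) {
                    results.add(pool.submit(() -> process(record)));
                }
                for (Future<?> f : results) {
                    f.get(); // wait; if any worker failed, this throws and we do NOT commit
                }
                consumer.commitSync(); // commit only after every record was processed
            }
        } finally {
            pool.shutdown();
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
    }
}
```

If any worker throws, f.get() rethrows, the commit is skipped, and the uncommitted records are redelivered after a restart: at-least-once delivery rather than message loss.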

Best practices

Producer:

  1. Instead of calling producer.send(msg), call producer.send(msg, callback).
  2. Set acks = all. acks is a Producer parameter that defines what counts as a “committed” message. Setting it to all means every in-sync replica must receive the message before it is considered “committed”; this is the strongest definition of “committed.”
  3. Set retries to a larger value. retries is also a Producer parameter; it enables automatic resending. When transient network jitter occurs, a send may fail, and a Producer configured with retries > 0 automatically retries the send, avoiding message loss. (A configuration sketch follows this list.)
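
As a concrete illustration, the Producer-side settings above might be wired up as follows. This is a minimal sketch: the broker address, class name, and the exact retries value are illustrative, not prescribed by the article.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

public class ReliableProducerConfig {
    public static KafkaProducer<String, String> create() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: a local broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");                // "committed" = acknowledged by all in-sync replicas
        props.put("retries", Integer.MAX_VALUE); // retry transient send failures automatically
        return new KafkaProducer<>(props);
    }
}
```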

Broker:

  1. Set unclean.leader.election.enable = false. This parameter controls which Brokers are eligible to run for Leader of a partition. If a Broker that has fallen too far behind the original Leader becomes the new Leader, message loss is inevitable. Therefore, this parameter must be set to false, forbidding such elections.

  2. Set replication.factor >= 3. The idea here is to keep multiple copies of each message, since redundancy is currently the main mechanism for preventing message loss.

  3. Set min.insync.replicas > 1. This parameter controls the minimum number of replicas a message must be written to before it is considered “committed.” Setting it greater than 1 improves message durability. Never use the default value of 1 in production.

    This does not conflict with acks = all: acks = all requires acknowledgement from every replica in the ISR, so if the ISR contains only one replica, acks = all degenerates to acks = 1. min.insync.replicas supplies the lower bound, ensuring that a write reaches at least min.insync.replicas replicas in the ISR before it succeeds.

  4. Make sure replication.factor > min.insync.replicas. If the two are equal, the loss of a single replica makes the entire partition unavailable for writes. We want to improve durability without sacrificing availability, so replication.factor = min.insync.replicas + 1 is recommended. (An illustrative configuration follows this list.)
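
Put together, the Broker-side rules above might look like this in server.properties (the values are illustrative, following the replication.factor = min.insync.replicas + 1 rule with a factor of 3):

```properties
# Broker-side settings in server.properties (illustrative values)

# Laggy replicas may never be elected partition Leader
unclean.leader.election.enable=false

# Replication factor applied to auto-created topics
default.replication.factor=3

# A write must reach at least this many ISR replicas to succeed
min.insync.replicas=2
```

Note that replication.factor itself is a per-topic property set when a topic is created; default.replication.factor only covers topics that Kafka auto-creates.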

Consumer:

  1. Ensure that a message is fully consumed before its offset is committed. The Consumer side has a parameter enable.auto.commit; it is best set to false, with offsets committed manually by the application (as in the sketch shown earlier). As mentioned above, this is critical in single-Consumer, multi-threaded processing scenarios.