
How does Kafka keep messages from getting lost

There are three aspects: <1> the producer does not lose messages; <2> the server (the Kafka cluster) does not lose messages; <3> the consumer does not lose messages.

<1> How to ensure that the producer side does not lose messages

Use the send API with a callback, and set the parameters acks, retries, and retry.backoff.ms to control when the system considers a message successfully sent. The retries parameter is the number of times the producer retries sending a message; retry.backoff.ms is the interval between retries after a send fails or times out. The callback tells us whether the message was sent successfully. If sending fails, we need to do exception handling, for example storing failed messages on local disk or in a remote database until the service recovers. This ensures that messages are not lost.
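A minimal Java sketch of this pattern, assuming a local broker; the topic name, parameter values, and the failure handler are illustrative, not from the original:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;

public class SafeProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "all");            // wait for leader + all in-sync replicas
        props.put(ProducerConfig.RETRIES_CONFIG, 3);             // number of resend attempts
        props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 300);  // wait 300 ms between retries

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record = new ProducerRecord<>("demo-topic", "key", "value");
            // The callback fires after the broker acknowledges the write (or the send fails).
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    // Send failed: persist the record somewhere durable for later replay.
                    System.err.println("send failed, stashing record: " + exception.getMessage());
                } else {
                    System.out.printf("sent to %s-%d@%d%n",
                            metadata.topic(), metadata.partition(), metadata.offset());
                }
            });
        }
    }
}
```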

<2> How to ensure that the server does not lose messages

Set three server-side parameters: replication.factor >= 3, so that each partition has multiple replicas; min.insync.replicas > 1, so that a write must be acknowledged by more than one replica; and unclean.leader.election.enable = false, so that a replica that has fallen too far behind can never be elected leader.
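The last two can also be applied per topic when it is created. A sketch using the Java AdminClient; the broker address, topic name, and counts are assumptions for illustration:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class DurableTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions, replication factor 3 (requires at least 3 brokers).
            NewTopic topic = new NewTopic("durable-topic", 3, (short) 3)
                    .configs(Map.of(
                            "min.insync.replicas", "2",                  // a write must reach 2 replicas
                            "unclean.leader.election.enable", "false")); // no out-of-sync leaders
            admin.createTopics(List.of(topic)).all().get(); // block until the topic exists
        }
    }
}
```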

<3> How to ensure that the consumer side does not lose messages (does not miss any messages)

Preface

First of all, consider the following questions: how is message loss caused; what causes message duplication, seen from the production side and the consumption side; how can message order be guaranteed from both sides; and what is the cost of guaranteeing that messages are neither duplicated nor lost?

The consumer side is easy to handle: turn off automatic offset commits and commit the offset manually after the message is processed. Duplicate messages are also not a big problem: regardless of whether the producer resends, the consumer can check a deduplication table before consuming.

Losing data on the production side is the most troublesome case. The resolution strategy:

1. In asynchronous mode, when the buffer is full, block and wait for the buffer to become available; never empty the buffer, because emptying it discards the data.
2. After sending a message, use the callback: if it succeeds, send the next one; record failures in an error log and let a scheduled script scan the log and resend. (A send failure may not be a real failure, just a lost acknowledgement, so the scheduled script may resend duplicates, which the consumer-side deduplication handles.)

How to keep order

If one producer message fails to send, the messages behind it must not be sent, or retransmission would break the order: until the producer receives a success acknowledgement for the current message, it cannot send the next one. This feels unworkable for a streaming producer; blocking the whole producer because one message has not been acknowledged seems unacceptable for the business. The two delivery modes: in synchronous mode, after sending a message the producer must block and wait for the acknowledgement before sending the next one; in asynchronous mode, messages are written into a buffer and then written to the queue in the background. Both have advantages and disadvantages: synchronous mode has lower throughput, but because each message is sent only after the previous one is confirmed, it guarantees both that no messages are lost and that order is preserved.
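A sketch of order-preserving synchronous sending in Java; the max.in.flight setting is the standard client knob that prevents retries from reordering messages, and the parameter values are illustrative:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;

public class OrderedProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.RETRIES_CONFIG, 3);
        // Only one unacknowledged request at a time, so a retried message
        // cannot overtake a later one and reorder the stream.
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 3; i++) {
                // Blocking on get() makes the send synchronous: message i+1
                // is sent only after message i is acknowledged.
                producer.send(new ProducerRecord<>("ordered-topic", "k", "msg-" + i)).get();
            }
        }
    }
}
```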

How Kafka ensures that produced messages are not lost and not consumed repeatedly

1) In synchronous mode, there are three acknowledgement states that determine whether a message was produced safely. If acks is set to 1 (only the leader partition must be written successfully), data will be lost if the leader partition goes down.

2) Messages can also be lost in asynchronous mode: when the buffer is full, if the producer is configured to discard immediately (emptying the buffer as soon as it is full, without acknowledgement), the data is dropped at once.

Ways to avoid data loss in data production: as long as you avoid these two situations, messages will not be lost. 1) In synchronous mode, set the acknowledgement level to -1, so the message must be written to the leader partition and all replica partitions. 2) In asynchronous mode, when the buffer is full and no acknowledgement has arrived, set the blocking timeout in the configuration to unlimited, i.e. keep the production side blocked; then data will not be lost.

Ways to avoid data loss during consumption: if Storm is used, enable Storm's ack/fail mechanism; if not, update the offset only after confirming that the data has been processed. With the low-level API you need to control the offset value manually.

Problems with message queues always start at the source: whether the producer has a problem. Consider a case where the data is sent successfully but the response is lost on the way back; after the machine restarts, the data is resent. Retransmission is easy to handle, since the consumer side can deduplicate against a table, but a producer that loses data is very troublesome.

For repeated consumption there are two approaches: (1) deduplicate: save the unique identifier of each message in an external store, and on every consumption check whether it has already been processed; (2) ignore it: in big-data scenarios such as report systems or log statistics, a few extra or missing records do not affect the final analysis result.

Loss does not usually happen, but in some situations it can. The parameter configuration and best-practice list below do a good job of keeping the data durable (at the expense of throughput, of course; it is a trade-off). Each item in the list is discussed afterwards, and those who are interested can read the analysis below. There is no silver bullet: if you want high throughput you have to tolerate occasional failures (retransmissions, misses, no ordering guarantee).
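A minimal sketch of approach (1), deduplication by unique message ID. Here an in-memory set stands in for the external store (Redis, a database table, etc.), and the ID scheme is an assumption for illustration:

```java
import java.util.HashSet;
import java.util.Set;

public class Deduplicator {
    // Stand-in for an external store such as Redis or a database table.
    private final Set<String> processedIds = new HashSet<>();

    /** Returns true and records the ID if the message is new; false if already seen. */
    public synchronized boolean shouldProcess(String messageId) {
        return processedIds.add(messageId);
    }

    public static void main(String[] args) {
        Deduplicator dedup = new Deduplicator();
        // A resent message carries the same ID, so it is processed only once.
        String[] incoming = {"order-1001", "order-1002", "order-1001"};
        for (String id : incoming) {
            if (dedup.shouldProcess(id)) {
                System.out.println("processing " + id);
            } else {
                System.out.println("duplicate, skipping " + id);
            }
        }
    }
}
```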

The consumer side

The case of message loss on the consumer side is simple: if the offset is committed before message processing is complete, data can be lost. Since the Kafka consumer automatically commits offsets by default, you must ensure the message is fully processed before the offset is committed in the background. Heavy processing logic is therefore not recommended; if processing takes a long time, put the logic in another thread. To avoid data loss, two suggestions: set enable.auto.commit=false to disable automatic offset commits, and commit the offset manually after the message has been processed.
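A sketch of the manual-commit pattern; the topic, group ID, and processing step are illustrative assumptions:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ManualCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // no automatic offset commits

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("demo-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // finish processing first...
                }
                consumer.commitSync(); // ...then commit, so a crash replays instead of losing data
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
    }
}
```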




Simple summary

The consumer loses data: disable automatic offset commits (enable.auto.commit=false) and commit the offset only after processing is complete. Duplicate messages are not a problem: regardless of whether the producer resends, the consumer checks the deduplication table before consuming. The production side losing data: this is the most troublesome case.

Resolution strategy

1. Asynchronous mode: block when the buffer is full and wait for the buffer to become available; the buffer must not be emptied (once emptied, the data is lost).
2. Synchronous mode: use a callback after sending the message (wait for feedback); if the send succeeds, send the next message; record failures in a log and wait for a scheduled script to scan it and resend (a send failure may not be a real failure, just a missing acknowledgement, so the scheduled script may resend duplicates). A sketch of this follows.
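A sketch of strategy 2: a callback that appends failed records to a local log file for a scheduled resend script to pick up. The file path and record format are assumptions for illustration:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.RecordMetadata;

public class FailureLoggingCallback implements Callback {
    private static final Path FAILURE_LOG = Paths.get("failed-records.log");
    private final String payload;

    public FailureLoggingCallback(String payload) {
        this.payload = payload;
    }

    @Override
    public void onCompletion(RecordMetadata metadata, Exception exception) {
        if (exception != null) {
            try {
                // Append the failed payload; a scheduled script scans this file and resends.
                Files.write(FAILURE_LOG,
                        (payload + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            } catch (IOException e) {
                e.printStackTrace(); // last resort: at least surface the double failure
            }
        }
    }
}
```

Usage: `producer.send(record, new FailureLoggingCallback(record.value()));`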

Data loss:

1) In synchronous mode, there are three acknowledgement states; if acks is set to 1 (only the leader partition must be written successfully), data is lost if the leader partition goes down. 2) Messages can also be lost in asynchronous mode: when the buffer is full and the producer is configured to discard immediately (emptying the buffer without acknowledgement), the data is dropped. Avoiding both situations guarantees that messages are not lost: 1) in synchronous mode, set the acknowledgement level to -1, so messages are written to the leader and all replicas; 2) in asynchronous mode, when the buffer is full and no acknowledgement has arrived, set the blocking timeout to unlimited in the configuration, keeping the production side blocked so that data is not lost.

Ack response mechanism

acks=0: the producer does not wait for any response and does not wait for the broker's confirmation. Latency is minimal and throughput is maximal, but the producer cannot know whether the message succeeded, and messages may be lost.
acks=1: the leader confirms receipt of the data; the replicas pull the data asynchronously. If the leader fails before the replicas catch up, data is lost.
acks=-1 (equivalent to acks=all): the send method returns only when the message has been written to the leader and all replicas in the ISR list have returned confirmation. In theory the data will then not be lost.
(In older producer versions the acknowledgement level could be set to a number such as 2, meaning the leader plus one other follower had to confirm, regardless of the remaining followers; current clients accept only 0, 1, and all/-1.)
min.insync.replicas=1 means at least one replica must confirm successfully; otherwise the produce request fails with an exception.
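A short sketch of these settings in producer code. The values are illustrative; NotEnoughReplicasException is the error the Java client surfaces when, with acks=all, the ISR falls below min.insync.replicas:

```java
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.errors.NotEnoughReplicasException;
import org.apache.kafka.common.serialization.StringSerializer;

public class AcksDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // "0" = fire and forget, "1" = leader only, "all"/"-1" = leader + all ISR replicas.
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo-topic", "k", "v")).get();
        } catch (ExecutionException e) {
            if (e.getCause() instanceof NotEnoughReplicasException) {
                // Fewer in-sync replicas than min.insync.replicas: the write is rejected.
                System.err.println("not enough in-sync replicas: " + e.getCause().getMessage());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```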

Conclusion

Message integrity and system throughput are mutually exclusive: to guarantee that no message is lost, some throughput must be sacrificed.

<1> Producer:

1. Set acks to -1 (all), ensuring the message is written to the leader and all replicas.
2. Set min.insync.replicas, the minimum number of in-sync replicas that must confirm a write.
3. Increase the retry count (retries).
4. Use synchronous sending where strict guarantees are needed.
5. For single messages that are too large, configure the maximum accepted size of a single message (e.g. max.request.size on the producer).
6. For asynchronous sending, learn the fate of each message through the callback: use the KafkaProducer.send(record, callback) method instead of send(record).
7. (All replicas of a partition are collectively called the Assigned Replicas, AR.)
8. When the client buffer is full, messages may be lost.
9. Set block.on.buffer.full = true so a full buffer blocks instead of dropping (this legacy setting was later superseded by max.block.ms in newer clients).
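A consolidated sketch of this checklist as producer configuration. The values are illustrative; since block.on.buffer.full belongs to the legacy producer, the sketch uses max.block.ms, its modern counterpart:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;

public class ChecklistProducerConfig {
    public static Producer<String, String> build() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "all");                  // item 1: leader + all replicas
        // Item 2 (min.insync.replicas) is a topic/broker setting, not a producer one.
        props.put(ProducerConfig.RETRIES_CONFIG, 5);                   // item 3: retry on transient failure
        props.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, 1048576);    // item 5: cap single-message size
        props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, Long.MAX_VALUE); // item 9: block, never drop, on full buffer
        return new KafkaProducer<>(props);
    }
}
```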