1. The Offset Topic
The consumer commits offsets to record the position of its most recent consumption, so that when a consumer crashes or a new consumer joins the consumer group and the resulting partition rebalance assigns partitions to different consumers, each consumer can resume from the last committed position. The version of Kafka I tested was 0.11.0.2, in which the consumer sends offset commits as messages to a special topic, "__consumer_offsets".
The contents of the message include:
| Field | Content |
| --- | --- |
| Key | consumer group, topic, partition |
| Payload | offset, metadata, timestamp |
Messages committed to the "__consumer_offsets" topic are partitioned by the consumer group in the key: all offset messages from one consumer group are sent to a single partition of that topic.
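The mapping from a group to its partition of "__consumer_offsets" is deterministic: Kafka computes abs(groupId.hashCode()) % offsets.topic.num.partitions (50 by default). The sketch below reimplements that computation in Python for illustration; the function names are mine, not Kafka's:

```python
def java_string_hash(s: str) -> int:
    """Reimplementation of Java's String.hashCode (signed 32-bit)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def offsets_partition_for(group_id: str, num_partitions: int = 50) -> int:
    """Partition of "__consumer_offsets" that stores this group's commits."""
    h = java_string_hash(group_id)
    # Kafka guards against Integer.MIN_VALUE, whose absolute value
    # does not fit in a signed 32-bit int
    return (0 if h == -(1 << 31) else abs(h)) % num_partitions
```

Because the mapping depends only on the group id, every offset message for a group lands on the same partition, so one broker can own all of a group's offsets.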
2. Offset Commit
The commit logic for offsets is the same as that of a normal producer sending data to Kafka.
2.1. Consumer
When the consumer starts, it creates a built-in producer for the "__consumer_offsets" topic, used to commit offset data.
2.2. Broker
An offset commit is handled as a normal produce request, with the same logic.
The "__consumer_offsets" topic is created automatically when the first offset commit request arrives in the cluster.
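The shape of the auto-created topic is controlled by broker settings. A minimal fragment with the relevant property names; the values shown are the Kafka defaults:

```python
# Broker settings read when "__consumer_offsets" is first created
# (values shown are the Kafka defaults):
offsets_topic_settings = {
    "offsets.topic.num.partitions": 50,     # partitions of the offsets topic
    "offsets.topic.replication.factor": 3,  # replicas per partition
}
```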
3. Offset Commit Methods
Offset commits can cause two problems: duplicate consumption and missed consumption.
- Duplicate consumption occurs when the committed offset is smaller than the offset of the last message the client actually processed. Scenario: consume first, then commit the offset. If consumption succeeds but the commit fails, the consumer will fetch from the previous offset next time, so the same messages are consumed again.
- Missed consumption occurs when the committed offset is greater than the offset of the last message the client actually processed. Scenario: commit the offset first, then consume. If the commit succeeds but consumption fails, the consumer will fetch from the new offset next time, so some messages are never consumed.
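The two scenarios above can be made concrete with a toy model: a broker-side committed offset, a batch of records, and a step that can fail. This is illustrative Python, not the Kafka API:

```python
def consume_then_commit(records, committed, commit_ok):
    """Process first, commit after: a failed commit causes duplicates."""
    processed = list(records)          # all records are handled
    if commit_ok:
        committed = len(records)       # offset advances only on success
    return processed, committed


def commit_then_consume(records, committed, consume_ok):
    """Commit first, process after: a failed consume causes misses."""
    committed = len(records)           # offset advances unconditionally
    processed = list(records) if consume_ok else []
    return processed, committed


records = ["m0", "m1", "m2"]

# consume-then-commit, commit fails: offset stays at 0
processed, committed = consume_then_commit(records, 0, commit_ok=False)

# commit-then-consume, processing fails: offset jumps to 3
processed2, committed2 = commit_then_consume(records, 0, consume_ok=False)
```

In the first case a restart re-reads m0..m2 (duplicates); in the second, the committed offset has already moved past them, so they are never processed (misses).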
Choosing a commit mode appropriate to the specific business situation can effectively mitigate duplicate and missed consumption.
3.1. Automatic commit
Automatic commit is the simplest mode: configuration parameters enable it and set the commit interval. The drawback is that if a new consumer joins or the current consumer dies after some data has been consumed but before the next auto-commit fires, the resulting partition rebalance makes consumers resume from the last committed offset, so the messages consumed since that commit are consumed again. Shortening the auto-commit interval narrows this window but does not eliminate the problem.
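For reference, auto-commit is driven by two consumer properties. A minimal fragment with the real Kafka property names; the values shown are the defaults:

```python
# Consumer settings governing auto-commit (Kafka property names,
# default values):
auto_commit_config = {
    "enable.auto.commit": "true",       # commit offsets in the background
    "auto.commit.interval.ms": "5000",  # commit every 5 seconds
}
```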
3.2. Commit the current offset
With auto-commit turned off, the current offset can be committed through the synchronous commit interface. This gains control but sacrifices throughput, because a synchronous commit inevitably blocks and has a retry mechanism.
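The pattern is: process the batch, then block on a commit. A minimal Python sketch; poll/commit/handle are stand-ins for whatever client API is in use (e.g. commitSync() in the Java client):

```python
def poll_and_commit_sync(consumer, handle):
    """Process one batch, then commit synchronously.

    `consumer` is any object exposing poll() and a blocking commit();
    the real client's commit blocks (and retries on retriable errors)
    until the broker responds, which is what costs throughput.
    """
    for record in consumer.poll():
        handle(record)
    consumer.commit()  # blocks until the commit succeeds or fails fatally
```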
3.3. Asynchronously commit the current offset
Asynchronous commit keeps that control while increasing consumer throughput, but it has no retry mechanism and still does not solve duplicate consumption.
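Why no retry? Commits are applied in arrival order, so retrying a failed older commit after a newer one has succeeded would move the group's offset backwards. A toy illustration, not the Kafka API:

```python
# Broker-side view: the committed offset is simply the last value written.
committed = 0

def apply_commit(offset):
    global committed
    committed = offset  # last write wins

apply_commit(200)  # a later async commit (offset 200) succeeds first
apply_commit(100)  # a retried, older commit (offset 100) arrives afterwards
# committed is now 100: the offset has been rewound, which would cause
# duplicate consumption after the next rebalance
```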
3.4. Combined synchronous and asynchronous commit
In normal operation, use asynchronous commit, which is fast. When shutting down the consumer, use synchronous commit, which retries on failure until the commit succeeds or an unrecoverable error occurs. Even combined, synchronous and asynchronous commit cannot avoid duplicate and missed consumption.
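This pattern can be sketched as a loop of async commits with a final sync commit on shutdown. Python sketch; running/poll/commit_async/commit_sync are stand-in names (the Java client's equivalents are commitAsync() and commitSync()):

```python
def run_consumer(consumer, handle):
    """Fast async commits while running; one blocking sync commit on exit."""
    try:
        while consumer.running():
            for record in consumer.poll():
                handle(record)
            consumer.commit_async()   # fire-and-forget, no retry
    finally:
        consumer.commit_sync()        # retried until success or fatal error
```

Putting the sync commit in `finally` guarantees a last reliable commit even when the loop exits via an exception.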
3.5. Commit a specified offset
Automatic, synchronous, and asynchronous commit all commit the latest offset. Duplicate and missed consumption can be mitigated by committing a specified offset instead, but the consumer side then needs more complex business logic and must track offsets itself.
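When committing specified offsets, the value committed for a partition is the offset of the next record to read, i.e. the last processed offset plus one. A small helper to build such a map (the helper is mine; the Java client takes a Map&lt;TopicPartition, OffsetAndMetadata&gt; with the same semantics):

```python
def offsets_to_commit(last_processed):
    """Build a commit map from last handled offsets.

    last_processed: {(topic, partition): offset of the last record
    the application finished processing}. The committed value is
    that offset + 1, the position to resume from.
    """
    return {tp: offset + 1 for tp, offset in last_processed.items()}


print(offsets_to_commit({("orders", 0): 41}))  # -> {('orders', 0): 42}
```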