This is the 21st day of my participation in the August More Text Challenge

Author: Tom Brother wechat official account: Micro technology

Hi, I’m Tom

Kafka is a messaging framework you are almost certainly familiar with; many of us use it at work. Its core idea is to connect producing and consuming systems through a high-performance MQ service, decoupling the systems from each other and giving the architecture strong scalability.

You may be wondering, what happens if one of these links breaks?

This situation is called message loss, and it can cause data inconsistencies between systems.

So how do we solve this problem? It has to be handled on three sides: the producer, the MQ server, and the consumer.

1. Producer side

The producer is responsible for making sure a produced message actually reaches the MQ server, and we need a response to determine whether the send succeeded.

```java
Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback)
```


The send method above takes a Callback, which tells us whether the message was sent successfully; if it failed, we need to compensate.
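A minimal sketch of this send-with-callback-and-compensate pattern. The `Callback` interface below mirrors the shape of Kafka's `org.apache.kafka.clients.producer.Callback`, but everything here (`send`, the metadata string, the retry queue) is a simplified stand-in so the sketch runs without a broker, not the real client API.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class SendWithCallbackSketch {

    // Mirrors the shape of Kafka's producer Callback: on completion, exactly
    // one of (metadata, exception) is non-null.
    interface Callback {
        void onCompletion(String metadata, Exception exception);
    }

    // Messages that failed to send are queued here for later compensation
    static final Queue<String> retryQueue = new ArrayDeque<>();

    // Stand-in for producer.send(record, callback); the flag lets us
    // simulate a broker failure deterministically.
    static void send(String record, boolean simulateFailure, Callback cb) {
        if (simulateFailure) {
            cb.onCompletion(null, new RuntimeException("broker unreachable"));
        } else {
            cb.onCompletion("topic-0@42", null);
        }
    }

    public static void main(String[] args) {
        send("order-1001", true, (metadata, exception) -> {
            if (exception != null) {
                // Send failed: compensate, e.g. persist the message and retry later
                retryQueue.add("order-1001");
            }
        });
        System.out.println(retryQueue.size());
    }
}
```

The key point is that the callback is the only reliable signal of success; fire-and-forget sends give the producer no chance to compensate.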

In addition, to make sending more flexible, Kafka provides several parameters for different business scenarios to choose from.

1.1 Parameter acks

This parameter indicates how many partition replicas receive the message before the send is considered successful.

  • acks=0: the send is considered successful as soon as the message leaves the producer; the producer does not wait for any response from the server.

  • acks=1: the producer considers the send successful once the leader partition has received the message.

  • acks=-1 (or all): the producer considers the send successful only when all replicas in the ISR have received the message. This is the safest setting, but throughput drops because more nodes must synchronize.

1.2 Parameter retries

The number of times the producer retries a failed send. If the retries are exhausted and the message still fails, a common practice is to persist the message to local disk and resend it after the service recovers. Recommended value: retries = 3.

1.3 Parameter retry.backoff.ms

The interval between retries after a send times out or fails. A value of 300 milliseconds is recommended.

Note that if the MQ service does not respond, it does not necessarily mean the send failed; the response may simply have timed out due to network jitter.
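The three parameters from sections 1.1–1.3 can be sketched as a producer configuration. The keys below are real Kafka producer config names and the values follow the article's recommendations; building the actual `KafkaProducer` from these properties is omitted, since that requires a running broker.

```java
import java.util.Properties;

public class ProducerReliabilityConfig {

    // Reliability-oriented producer settings recommended in this article
    static Properties reliableProducerProps() {
        Properties props = new Properties();
        props.put("acks", "-1");              // wait for all ISR replicas to receive the message
        props.put("retries", "3");            // recommended retry count
        props.put("retry.backoff.ms", "300"); // 300 ms between retries
        return props;
    }

    public static void main(String[] args) {
        System.out.println(reliableProducerProps());
    }
}
```

In real code these properties would be passed to `new KafkaProducer<>(props)` together with the bootstrap servers and serializers.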

Once the producer does all of this, the message is guaranteed to be sent successfully, but it may be sent more than once, producing duplicate messages; we will come back to that later.

2. MQ server

The MQ server, as the storage medium for messages, can also lose them. For example, if a partition suddenly goes down, how do we make sure its data is not lost? Kafka's answer is the replica: the data is protected by keeping backup copies.

What parameters can be set?

2.1 Parameter replication.factor

Set replication.factor > 1, so that when the leader replica dies, a follower replica can be elected as the new leader and continue providing service.

2.2 Parameter min.insync.replicas

min.insync.replicas is usually set > 1, so that at least one in-sync follower replica also holds each message and is available to take over; this way messages are not lost when the leader fails.

2.3 Parameter unclean.leader.election.enable

Controls whether replicas outside the ISR set can be elected as the leader replica.

If this parameter is set to true, an out-of-sync follower can be elected leader, and the messages it has not yet replicated are lost. Keep it false for safety.
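The server-side settings from sections 2.1–2.3 can likewise be sketched as plain key/value pairs. The keys are real Kafka config names; note that replication.factor is normally specified at topic creation time, while min.insync.replicas and unclean.leader.election.enable can be set per topic or broker-wide. The values here follow the article's recommendations.

```java
import java.util.Properties;

public class BrokerReliabilityConfig {

    // Topic/broker settings that protect against message loss on the server side
    static Properties reliableTopicProps() {
        Properties props = new Properties();
        props.put("replication.factor", "3");               // keep backup copies (set at topic creation)
        props.put("min.insync.replicas", "2");              // > 1: a follower must also hold each write
        props.put("unclean.leader.election.enable", "false"); // never elect an out-of-sync replica
        return props;
    }

    public static void main(String[] args) {
        System.out.println(reliableTopicProps());
    }
}
```

Together with acks=-1 on the producer, min.insync.replicas=2 means a write is acknowledged only after at least two in-sync replicas have it.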

3. Consumer side

All the consumer has to do is fully process each message it pulls. But there is one extra step: committing the offset.

Some developers, because business processing takes a long time, start a separate thread to pull messages into a local in-memory queue, and then use a thread pool to process the business logic in parallel. The risk is that if the server goes down before the queued messages are processed, those local messages are lost.

The correct order is: pull the message → process the business logic → commit the consumer offset.

For committing offsets, Kafka provides a dedicated parameter.

Parameter enable.auto.commit

Indicates whether the consumer offset is committed automatically.

If a message is pulled and the offset is committed before the business logic finishes, and the consumer then crashes, then after the consumer recovers (or another consumer takes over the partition) it can no longer pull that message, and the message is lost. So we usually set enable.auto.commit=false and commit the offset manually.

```java
// Pull, finish the business logic, then commit the offset manually
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
processMsg(records);
consumer.commitSync();
```

But that creates another problem. Consider this scenario:

If the consumer crashes before the last offset commit reaches the MQ server, the next poll will still start from message 4, even though those messages have already been processed, resulting in repeated consumption.

So how do we handle repeated consumption and avoid the resulting data inconsistency?

First, duplicate messages on the MQ server side need to be addressed. Since Kafka 0.11.0, the producer can be made idempotent: each message carries a unique identifier (producer ID plus sequence number), and the broker uses it to automatically filter out duplicates caused by retries, trading space for time to keep the write path idempotent.
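Enabling this broker-side deduplication is a producer configuration change. The sketch below shows the relevant settings as plain properties; enable.idempotence is the real Kafka producer config key, and idempotence requires acks=all with retries enabled.

```java
import java.util.Properties;

public class IdempotentProducerConfig {

    // Producer settings that let the broker deduplicate retried sends
    static Properties idempotentProps() {
        Properties props = new Properties();
        props.put("enable.idempotence", "true"); // broker dedups by producer id + sequence number
        props.put("acks", "all");                // required by idempotence
        props.put("retries", "3");               // retries are now safe: no duplicates in the log
        return props;
    }

    public static void main(String[] args) {
        System.out.println(idempotentProps());
    }
}
```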

But this does not fundamentally solve message duplication. Even if no duplicate messages are stored in the MQ service, the consumer uses a pull model; if the same messages are pulled repeatedly, they are still consumed repeatedly.

Scheme 1: pull each message only once (the consumer commits the offset first, then processes the message). But if the system crashes before the business processing finishes, the message can never be pulled again, leading to data inconsistency. This scheme is rarely adopted.

Scheme 2: allow duplicate messages to be pulled, but make the consumer idempotent, ensuring each message takes effect only once.

There are many idempotent techniques. We can use a database table or a Redis cache to store a processing flag for each message, check that flag every time a message is pulled, and then decide whether to process or discard it.
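A minimal sketch of Scheme 2: the consumer records each processed message id and skips duplicates. A `ConcurrentHashMap`-backed set stands in for the Redis cache or database table the article mentions; the message ids and `processMsg` body are illustrative.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class IdempotentConsumerSketch {

    // Stand-in for Redis / a dedup table keyed by message id
    static final Set<String> processed = ConcurrentHashMap.newKeySet();
    static int processedCount = 0;

    static void handle(String messageId, String payload) {
        // add() returns false if the id was already recorded: discard duplicates
        if (!processed.add(messageId)) {
            return;
        }
        processMsg(payload);
    }

    static void processMsg(String payload) {
        processedCount++; // real business logic would go here
    }

    public static void main(String[] args) {
        handle("msg-4", "order created");
        handle("msg-4", "order created"); // redelivered duplicate, skipped
        System.out.println(processedCount);
    }
}
```

In production the flag check and the business update should be atomic (e.g. in one database transaction, or via Redis SETNX), otherwise a crash between the two steps reintroduces the problem.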

More:

Github.com/aalansehaiy…

Author introduction: Tom, computer science master's graduate, joined Alibaba through campus recruitment, P7 technical expert, holds a patent, CSDN blog expert. Has been responsible for e-commerce transactions, community fresh grocery, traffic marketing, Internet finance and other businesses, with many years of first-line team management experience.