Compiled from the Shishan code-farmer course materials; a follow-up will fill in the mind map.

1. Interview questions

How to ensure reliable transmission of messages (how to deal with message loss)?

2. The interviewer’s psychological analysis

This one is a given: the basic rule of MQ is that data can be neither duplicated nor lost. "No more" means no repeated consumption, which is the idempotency problem; "no less" means don't lose the data. That is exactly what this question asks you to think through.

MQ is often used to deliver very core messages, such as billing and fee-deduction messages. I used to design and develop the billing system for a core advertising platform; billing is a heavyweight business whose operations are time-consuming. So in the overall architecture of the advertising system, billing was made asynchronous, with MQ added in the middle.

We spent a lot of effort making sure billing messages were never lost while passing through MQ. An advertiser places an ad with a clear agreement: one user click deducts one yuan. If the billing message is lost whenever a user clicks, the company keeps losing a few yuan here and a few yuan there, and it adds up to a substantial loss.

3. Interview question analysis

Lost data generally falls into three places: the producer loses it when sending, MQ itself loses it, or the consumer loses it. Let's look at RabbitMQ and Kafka separately.

RabbitMQ typically carries a company's core business flows, where data absolutely must not be lost.

(1) RabbitMQ

1) The producer loses data

When a producer sends data to RabbitMQ, the message can be lost in transit, for example because of network problems.

To deal with this, you can use RabbitMQ transactions: open a transaction on the channel (channel.txSelect) before sending the message. If RabbitMQ does not receive the message successfully, the producer gets an exception; at that point you can roll back the transaction (channel.txRollback) and retry the send. If the message is received, you commit the transaction (channel.txCommit). The problem is that once RabbitMQ transactions are used, throughput basically drops, because transactions cost too much performance.
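A minimal sketch of the transaction approach with the Java RabbitMQ client; the queue name and message body are made up for illustration:

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class TxProducer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumed broker address
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {
            channel.txSelect();                       // open a transaction on this channel
            try {
                channel.basicPublish("", "billing.queue", null, "click:1yuan".getBytes());
                channel.txCommit();                   // broker has accepted the message
            } catch (Exception e) {
                channel.txRollback();                 // roll back, then retry the send
                // retry logic goes here
            }
        }
    }
}
```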

So in general, if you want to make sure messages sent to RabbitMQ are not lost, you enable confirm mode. With confirm mode on, each message you write to RabbitMQ is assigned a unique ID. If RabbitMQ handles the message successfully, it sends you back an ack saying the message is OK; if it fails to handle the message, it calls your nack callback telling you the message was not received, and you can retry. On top of this mechanism you can keep the status of each message ID in memory yourself, and if you have not received a confirmation for a message after a certain amount of time, you resend it.

The big difference between transactions and confirm is that transactions are synchronous: you commit a transaction and the channel blocks until it completes. Confirm is asynchronous: you send a message and then immediately send the next one; RabbitMQ receives the message and later calls back an interface asynchronously to tell you it was received.

Therefore, the confirm mechanism is generally used to avoid losing data on the producer side.
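A rough sketch of confirm mode with the Java client; the in-memory map of outstanding messages is a simplified stand-in for whatever bookkeeping you actually use, and the queue name is illustrative:

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

import java.util.concurrent.ConcurrentSkipListMap;

public class ConfirmProducer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumed broker address
        Connection conn = factory.newConnection();
        Channel channel = conn.createChannel();
        channel.confirmSelect(); // switch the channel into confirm mode

        // outstanding messages keyed by publish sequence number
        ConcurrentSkipListMap<Long, String> outstanding = new ConcurrentSkipListMap<>();

        channel.addConfirmListener(
            (seqNo, multiple) -> {            // ack: broker took responsibility for the message
                if (multiple) outstanding.headMap(seqNo, true).clear();
                else outstanding.remove(seqNo);
            },
            (seqNo, multiple) -> {            // nack: broker failed, resend these messages
                // look up outstanding.get(seqNo) and republish it
            });

        String body = "click:1yuan";
        outstanding.put(channel.getNextPublishSeqNo(), body);
        channel.basicPublish("", "billing.queue", null, body.getBytes());
        // no blocking here: publishing continues while confirms arrive asynchronously
    }
}
```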

2) RabbitMQ loses data

To prevent RabbitMQ itself from losing data, you must enable RabbitMQ persistence: messages are persisted to disk after being written, so even if RabbitMQ dies, they are read back automatically after recovery and no data is lost. In the rare case that RabbitMQ dies before a message has been persisted, a small amount of data can be lost, but that is unlikely.

There are two steps to enabling persistence. First, mark the queue as durable when it is created; this makes RabbitMQ persist the queue's metadata, but not the messages in it. Second, set the message's deliveryMode to 2 when publishing, which marks the message itself as persistent so RabbitMQ writes it to disk. Both must be set together; then, even if RabbitMQ hangs and restarts, it restores the queue and its messages from disk.
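Both settings together, sketched with the Java client (the queue name is illustrative):

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.MessageProperties;

public class PersistentProducer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumed broker address
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {
            // step 1: durable = true, so the queue's metadata survives a broker restart
            channel.queueDeclare("billing.queue", true, false, false, null);
            // step 2: deliveryMode = 2 (PERSISTENT_TEXT_PLAIN), so the message itself goes to disk
            channel.basicPublish("", "billing.queue",
                    MessageProperties.PERSISTENT_TEXT_PLAIN,
                    "click:1yuan".getBytes());
        }
    }
}
```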

Persistence can be combined with producer confirms: only send the producer an ack after the message has been persisted to disk. Then, even if RabbitMQ dies before persisting the message, the data is lost but the producer never receives an ack, so the producer can simply resend it.

Even with persistence enabled for RabbitMQ, there is still a small window: a message may have been accepted by RabbitMQ but not yet persisted to disk when RabbitMQ hangs, so that small amount of data held only in memory is lost.

3) The consumer loses data

Data loss on the consumer side mainly happens when your process crashes right after receiving the message (for example, because of a restart) before it has been processed; RabbitMQ assumes you have already consumed it, and the data is lost.

To deal with this, use RabbitMQ's ack mechanism. In simple terms, turn off RabbitMQ's automatic ack and acknowledge each message explicitly through the API, only after your own code has finished processing it. Then, if you have not finished processing a message, there is no ack; RabbitMQ assumes it has not been fully consumed and delivers the message to another consumer, so the message is not lost.
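A minimal consumer sketch with automatic ack turned off; basicAck is called only after your own processing has finished (handleBilling and the queue name are hypothetical placeholders):

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class AckConsumer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumed broker address
        Connection conn = factory.newConnection();
        Channel channel = conn.createChannel();
        channel.queueDeclare("billing.queue", true, false, false, null);

        boolean autoAck = false; // turn off automatic ack
        channel.basicConsume("billing.queue", autoAck,
            (consumerTag, delivery) -> {
                handleBilling(new String(delivery.getBody()));                     // business logic first
                channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);  // ack only when done
            },
            consumerTag -> { /* consumer was cancelled */ });
    }

    private static void handleBilling(String msg) {
        // hypothetical processing; if the process dies before basicAck,
        // RabbitMQ re-delivers the message to another consumer
    }
}
```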

(2) Kafka

Where Kafka can lose messages:

1) The consumer loses data

The only way the consumer loses data is this: you pull the message, the consumer automatically commits the offset, so Kafka believes you have already consumed it, but you crash before you actually process it, and that message is lost.

Kafka, as is well known, commits offsets automatically by default. If you turn off automatic offset commits and commit the offset manually only after processing is finished, you can ensure data is not lost. This does introduce repeated consumption, though: for example, if you crash after processing but before committing the offset, you will definitely consume that batch again, so you just need to make processing idempotent.

One problem we hit in production: our Kafka consumer consumed data and first wrote it to an in-memory queue to buffer it. Sometimes a message had only been written to the in-memory queue when the consumer automatically committed the offset; then we restarted the system, and the data sitting in the queue, not yet processed, was lost.
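To avoid that, a sketch of a consumer with auto-commit disabled, where the offset is committed only after the records have actually been processed (topic, group, and the process method are made up for illustration):

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ManualCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
        props.put("group.id", "billing-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");           // turn off automatic offset commits

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("billing-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
                for (ConsumerRecord<String, String> record : records) {
                    process(record.value());                // finish processing first
                }
                consumer.commitSync();                      // only then commit the offsets
            }
        }
    }

    private static void process(String value) {
        // hypothetical business logic; a crash before commitSync() means
        // the batch is re-consumed, so processing must be idempotent
    }
}
```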

2) Kafka loses data

A common scenario is a Kafka broker going down, forcing a re-election of the partition leader. If some followers have not finished syncing the leader's data when the leader dies, and one of those followers is elected as the new leader, the data it never synced is lost.

We actually hit this: a Kafka leader machine crashed, a follower was switched to leader, and we found that some data had been lost.

Therefore, you generally have to set at least the following four parameters:

Set replication.factor on the topic: the value must be greater than 1, so each partition has at least 2 replicas.

Set min.insync.replicas on the Kafka server side: the value must be greater than 1, which requires a leader to be aware of at least one follower that is still in sync with it.

Set acks=all on the producer side: a write is considered successful only after the data has been written to all in-sync replicas.

Set retries=MAX (an extremely large value, meaning effectively unlimited retries) on the producer side: if a write fails, keep retrying.

This is how our production environment is configured. With these settings, at least on the Kafka broker side, data will not be lost when a leader switch happens because the leader's broker failed.
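For reference, a hedged sketch of the producer-side settings in Java; replication.factor and min.insync.replicas are topic/broker-level settings and are only noted in comments here, and the broker address and topic name are assumptions:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class NoLossProducer {
    public static void main(String[] args) {
        // topic/broker side (set when creating the topic, not here):
        //   replication.factor  >= 2   -- at least two replicas per partition
        //   min.insync.replicas >= 2   -- leader plus at least one in-sync follower
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");                           // wait for all in-sync replicas
        props.put("retries", Integer.MAX_VALUE);            // retry "infinitely" on failure

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("billing-topic", "click", "1yuan"));
        }
    }
}
```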

3) Does the producer lose data?

If acks=all is set as described above, the producer will not lose data: a write counts as successful only after the leader has received the message and all in-sync followers have replicated it. If that condition is not met, the producer keeps retrying automatically, without limit.