How do you handle message backlog and message expiration in a message queue? What do you do when the queue is full, or when millions of messages have been waiting for hours?
Look at how this question is asked: the essence of the scenario is that something has gone wrong on the consumer side, so messages are either not being consumed at all or are being consumed extremely slowly. What do you do if the disks of your message queue cluster are almost full and nobody is consuming? What if everything simply backs up for several hours? Or what if the backlog lasts so long that, say, RabbitMQ's message expiration kicks in and the messages are cleared away?
For example, the consumer writes to MySQL after processing each message, MySQL goes down, and the consumer just hangs there. Or something else breaks on the consumer side and consumption becomes extremely slow.
Let's go through these one by one. Assume the scenario: the consumer side has failed, a huge number of messages are backlogged in MQ, and now it's a production incident and everyone is panicking.
1. A large number of messages have been stuck in MQ for hours
Tens of millions of messages have piled up in MQ for seven or eight hours, from around 4 pm until late at night, 10 or 11 pm.
This is a real scenario we have run into: an actual production fault. The straightforward fix is to repair the consumer, restore its consumption speed, and then wait a few hours for it to work through the backlog. But you can't give that answer in an interview.
One consumer handles 1,000 messages per second, so three consumers handle 3,000 per second, which is 180,000 per minute and more than 10 million per hour.
So even if the consumer recovers, a backlog of millions to tens of millions of messages still takes roughly an hour to drain.
In this case the only real option is temporary emergency scale-out. The procedure is roughly as follows:
1) First fix the consumer's bug so it can consume at full speed again, then stop all of the existing consumers;
2) Create a new topic whose partition count is 10 times the original, i.e. temporarily set up 10x or 20x the original number of queues;
3) Then write a temporary distributor program that consumes the backlogged data; instead of doing any time-consuming processing, it simply forwards each message round-robin into the 10x temporary queues (see the sketch after this list);
4) Next, temporarily requisition 10 times the usual number of machines and deploy consumers on them, with each group of consumers draining its share of the temporary queues;
5) This temporarily expands both queue resources and consumer resources by 10x, so the backlog is consumed at 10 times the normal speed;
6) Once the backlog has been drained, restore the original deployment architecture and go back to consuming messages with the original consumer machines.
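Here is a minimal sketch of what the temporary distributor in step 3 could look like with the Kafka Java client. The broker address, topic names ("order-topic", "order-topic-temp"), group id, and the partition count of 30 are all assumptions for illustration; the point is simply that it does no business logic and only spreads the backlog evenly across the temporary topic:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class BacklogDistributor {
    public static void main(String[] args) {
        Properties cProps = new Properties();
        cProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
        cProps.put(ConsumerConfig.GROUP_ID_CONFIG, "backlog-distributor");
        cProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        cProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        Properties pProps = new Properties();
        pProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
        pProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        pProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        int tempPartitions = 30; // assumption: temporary topic has 10x the original 3 partitions
        int next = 0;

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(pProps)) {
            consumer.subscribe(Collections.singletonList("order-topic")); // the backlogged topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // No time-consuming processing: just forward the raw payload,
                    // round-robin across the temporary topic's partitions.
                    producer.send(new ProducerRecord<>("order-topic-temp", next, record.key(), record.value()));
                    next = (next + 1) % tempPartitions;
                }
                consumer.commitSync();
            }
        }
    }
}
```

The real consumers deployed on the 10x machines then subscribe to "order-topic-temp" and do the actual business processing.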
2. Now assume a second pitfall
If you use RabbitMQ, you can set an expiration time on messages, i.e. a TTL. If a message sits in a queue longer than its TTL, RabbitMQ clears it and the data is gone. That's the second pitfall: the data doesn't pile up in MQ, it simply gets lost.
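For reference, this is how TTL gets configured with the RabbitMQ Java client; the host and queue name here are placeholders. TTL can be set per queue (the `x-message-ttl` argument) or per message (the `expiration` property), and when both are set the smaller value wins:

```java
import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class TtlQueueExample {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("mq-host"); // assumption: your broker address
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {
            // Queue-level TTL: any message sitting in this queue longer than 1 hour is dropped.
            Map<String, Object> queueArgs = new HashMap<>();
            queueArgs.put("x-message-ttl", 60 * 60 * 1000);
            channel.queueDeclare("order-queue", true, false, false, queueArgs);

            // Per-message TTL: milliseconds expressed as a string on the message properties.
            AMQP.BasicProperties props = new AMQP.BasicProperties.Builder()
                    .expiration("3600000")
                    .build();
            channel.basicPublish("", "order-queue", props,
                    "order-123".getBytes(StandardCharsets.UTF_8));
        }
    }
}
```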
In this case the answer isn't to add consumers to work through a backlog, because there is no backlog; a large number of messages have simply been lost. The solution we can adopt is batch re-ingestion, and we have handled a similar situation in production before: when the backlog got huge, we let the data be discarded, and then after the peak period, say after midnight when users are asleep, we wrote a temporary program that looked up the lost batch of data record by record and re-injected it into MQ, making up for the messages that were lost during the day. That's really all there is to it.
For example, suppose 10,000 orders were sitting unprocessed in MQ and 1,000 of them were lost. You write a program that finds those 1,000 orders and re-publishes them to MQ to fill in the gap.
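A sketch of such a re-ingestion job is below. Everything about the data access layer (`OrderDao`, `Order`, `findOrdersMissingDownstream`) is a hypothetical placeholder for your own persistence code; only the `basicPublish` call is the real RabbitMQ API, and the queue name is assumed:

```java
import com.rabbitmq.client.Channel;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.time.LocalDateTime;
import java.util.List;

public class LostMessageReplayJob {
    // Minimal placeholder types so the sketch compiles; replace with your own DAO and model.
    interface Order { String toJson(); }
    interface OrderDao { List<Order> findOrdersMissingDownstream(LocalDateTime from, LocalDateTime to); }

    private final OrderDao orderDao; // source-of-truth order table
    private final Channel channel;   // already-connected RabbitMQ channel

    public LostMessageReplayJob(OrderDao orderDao, Channel channel) {
        this.orderDao = orderDao;
        this.channel = channel;
    }

    public void replay(LocalDateTime from, LocalDateTime to) throws IOException {
        // 1. Diff the order table against the downstream system to find messages that expired or were lost.
        List<Order> lost = orderDao.findOrdersMissingDownstream(from, to);
        // 2. Re-publish each one to the original queue so the now-healthy consumer processes it.
        for (Order order : lost) {
            channel.basicPublish("", "order-queue", null,
                    order.toJson().getBytes(StandardCharsets.UTF_8));
        }
    }
}
```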
3. Finally, assume a third pitfall
What if the backlog sits in MQ unprocessed for so long that MQ itself fills up? Is there another option? Not really; the first plan is too slow at that point. Instead, write an ad hoc program that drains the queue, consuming messages one by one and discarding them, as fast as possible. Then fall back to the second plan and re-ingest the lost data at night.
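A minimal sketch of such an emergency drain with the RabbitMQ Java client might look like this (the host and queue name are assumptions). It deliberately does no business logic: it acks everything immediately so the broker can free disk, and the discarded data is rebuilt later from the source database as in the second plan:

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class EmergencyDrainConsumer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("mq-host"); // assumption: your broker address
        Connection conn = factory.newConnection();
        Channel channel = conn.createChannel();
        channel.basicQos(1000); // large prefetch: we only care about raw drain speed

        // Ack (and therefore discard) every message immediately, without processing it.
        channel.basicConsume("order-queue", false,
                (consumerTag, delivery) ->
                        channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false),
                consumerTag -> { /* consumer cancelled */ });
    }
}
```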
If you have any other good ideas, feel free to leave a comment!