Welcome to my personal public account: Architecture Notes of Huishania (ID: Shishan100)
1. Background
In this article, we discuss how to make sure no data is lost anywhere along the full link, from producer to consumer, when using message-oriented middleware in an online production environment.
This question comes up frequently in interviews at Internet companies, and it is also a very real production problem.
If your resume says that you are familiar with MQ technologies (RabbitMQ, RocketMQ, Kafka) and have used them in projects, expect a very practical question: can data be lost while messages are being sent to MQ or consumed from MQ?
If data can be lost, what do you do in your production deployment to make sure that none of the data passing through MQ is lost? Describe your technical solution in detail, based on the message-oriented middleware you run online.
This is a question that distinguishes the skill level of candidates.
In fact, a significant share of ordinary engineers, even those who have worked at small and medium-sized Internet companies, simply use the MQ cluster their company has deployed. At the code level they do little more than send and consume messages, without giving much thought to the technical solution behind it.
However, using MQ, caching, database sharding, NoSQL, and other such technologies and middleware brings a number of production issues with it.
Each of these issues needs a corresponding technical solution to keep the system robust, stable, and highly available.
In fact, when large and mid-sized Internet companies interview candidates and probe their experience with MQ-related technologies, the question of data loss will most likely come up, because it separates candidates by skill level so well.
In this article we use RabbitMQ as the example and discuss where data can be lost on the way from sending a message to MQ to consuming it from MQ.
Then we look at how to combine the features that MQ itself provides to ensure that data is not lost.
2. Previously in this series
If you are new to MQ and not familiar with the technology, read the previous MQ articles first to learn the basics of how MQ works:
Why did you introduce message-oriented middleware into your system architecture?
What are the disadvantages of introducing message-oriented middleware into a system architecture?
[Java Advanced Interview Series 3] How did message-oriented middleware land in your project?
In addition, two previous articles already covered data loss in message-oriented middleware.
They discussed how to prevent data loss when a consumer suddenly goes down and when the MQ cluster suddenly crashes.
However, those two schemes alone cannot guarantee that no data is lost across the whole link. If you have not read them yet, we suggest taking a look:
Ouch! How do you ensure 100% of data is not lost when online services go down?
How to ensure millions of production data is not lost when the messaging middleware cluster crashes?
Anyway, if you are not familiar with MQ, read the previous series of articles first, then come back for a systematic look at how to keep 100% of MQ data from being lost.
3. Existing technical solutions
From the previous articles, we already know that the first place data can be lost is the consumer: it receives a message and then crashes before it has finished processing it.
With RabbitMQ’s automatic ack mechanism, the MQ cluster treats the message as processed as soon as it has been delivered, and deletes it.
The message is then lost, because no consumer will ever process it.
So, as discussed in detail earlier, we switch the consumer service to manual ack, so that an ack is sent to the MQ cluster only after the message has been processed successfully.
If the consumer service goes down before sending the ack, the MQ cluster senses this automatically and redelivers the message to another consumer service instance.
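The manual ack flow described above can be sketched in Python. This is a minimal sketch, not the code from the earlier article: it assumes the pika client’s consumer callback shape (ch, method, properties, body) and its basic_ack and basic_nack channel methods. The function process_order and the queue name order_queue are hypothetical names made up for illustration.

```python
import json


def process_order(order: dict) -> None:
    """Hypothetical business logic of the warehouse service."""
    if "order_id" not in order:
        raise ValueError("order message has no order_id")
    # ... allocate stock, schedule shipping, and so on ...


def on_message(ch, method, properties, body):
    """Consumer callback in the (ch, method, properties, body) shape pika uses."""
    try:
        process_order(json.loads(body))
    except Exception:
        # Processing failed: ask the broker to requeue the message
        # instead of treating it as done.
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)
        return
    # The ack is sent only after processing finished successfully.
    ch.basic_ack(delivery_tag=method.delivery_tag)


# Registration on a real pika channel (assumed setup, not executed here):
# channel.basic_consume(queue="order_queue",
#                       on_message_callback=on_message,
#                       auto_ack=False)
```

If the process dies before either call is made, no ack ever reaches the broker, so the message stays unacknowledged and can be redelivered. Requeueing on every exception can loop forever on a message that can never be processed, so a real service would probably cap the retries.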
Ouch! How do you ensure 100% of data is not lost when online services go down?
That article discusses this issue in detail; the architecture under manual ack is shown in the following diagram:
Besides this data loss issue, there was also the question of whether data is lost if the MQ cluster itself suddenly goes down.
By default it is, because neither the queue nor the messages are persistent, so restarting the MQ cluster loses some data.
How to ensure millions of production data is not lost when the messaging middleware cluster crashes?
In that article, we looked at how to declare a queue as persistent and how to publish messages as persistent, so that the MQ cluster writes them to disk.
If the MQ cluster then suddenly goes down before a message has been delivered to the consumer service, the data is not lost: when the cluster restarts, it loads the undelivered messages from its disk files and continues delivering them to the consumer service.
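The effect of persistence on a restart can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyBroker is a hypothetical in-memory stand-in, not RabbitMQ code, and it writes to its pretend disk immediately, which a real broker does not do. With the pika client, the corresponding real settings would be queue_declare(queue=..., durable=True) and pika.BasicProperties(delivery_mode=2).

```python
class ToyBroker:
    """Hypothetical stand-in for a broker; it only models what survives a restart."""

    def __init__(self):
        self.disk = {}        # queue name -> persisted messages
        self.memory = {}      # queue name -> messages held only in memory
        self.durable = set()  # names of queues declared as durable

    def declare_queue(self, name, durable):
        self.memory.setdefault(name, [])
        if durable:
            self.durable.add(name)
            self.disk.setdefault(name, [])

    def publish(self, queue, body, persistent):
        # Only a persistent message in a durable queue is written to "disk".
        if persistent and queue in self.durable:
            self.disk[queue].append(body)
        else:
            self.memory[queue].append(body)

    def restart(self):
        # Non-durable queues and in-memory messages are gone after a restart.
        self.memory = {name: [] for name in self.durable}

    def pending(self, queue):
        return self.disk.get(queue, []) + self.memory.get(queue, [])


broker = ToyBroker()
broker.declare_queue("orders", durable=True)
broker.declare_queue("scratch", durable=False)
broker.publish("orders", "order-1", persistent=True)   # durable + persistent
broker.publish("orders", "order-2", persistent=False)  # durable queue, transient message
broker.publish("scratch", "order-3", persistent=True)  # non-durable queue
broker.restart()
print(broker.pending("orders"))   # ['order-1']
print(broker.pending("scratch"))  # []
```

Only the message that was both published as persistent and sent to a durable queue survives, which is why both settings are needed together.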
Likewise, that solution results in the following system architecture diagram:
4. Is the data now 100% safe from loss?
Think about it: does the architecture so far ensure that no data is lost?
In fact, with the current architecture, data can still be lost.
That is, after the producer (the order service) has published a message to the MQ cluster, the message may sit in MQ memory for a while, not yet persisted to disk and not yet delivered to the consumer (the warehouse service).
What if the MQ cluster itself suddenly goes down at that moment?
Embarrassingly, data that exists only in memory is bound to be lost. Let’s take a look at the following diagram.
5. Designing the technical solution around the requirement
So the question the technical solution must answer is: how can the order service be sure that a message has been persisted to disk?
In fact, data is easily lost at the point where the producer (the order service) delivers a message to the MQ cluster.
For example, a network failure may prevent the data from ever arriving. Or, as described above, MQ may have just received the message and still hold it only in memory when the cluster goes down, in which case the data is lost.
So first we need to consider how the producer (the order service) can build a technical solution on the functionality that RabbitMQ provides.
The solution must ensure that once a message sent by the order service has been confirmed as successful, the MQ cluster has definitely persisted it to disk.
Only with that guarantee can we be sure that data delivered to the MQ cluster is not lost.
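The requirement above, a confirmation that is sent only after the message is on disk, can be illustrated with a toy model. This is a sketch under stated assumptions, not RabbitMQ code: ToyBroker and Producer are hypothetical classes, and the resend strategy is one possible choice. RabbitMQ’s publisher confirms are one feature that fits this description, although the article does not name the mechanism at this point.

```python
class ToyBroker:
    """Hypothetical broker that confirms a message only after writing it to disk."""

    def __init__(self):
        self.memory = []  # received, but not yet persisted
        self.disk = []    # persisted

    def receive(self, msg_id):
        self.memory.append(msg_id)

    def flush(self):
        """Persist the buffered messages and return their ids as confirmations."""
        flushed = self.memory
        self.disk.extend(flushed)
        self.memory = []
        return flushed

    def crash(self):
        # Everything that was only in memory is lost.
        self.memory = []


class Producer:
    """Hypothetical order service that remembers what has not been confirmed yet."""

    def __init__(self, broker):
        self.broker = broker
        self.unconfirmed = set()

    def send(self, msg_id):
        self.unconfirmed.add(msg_id)
        self.broker.receive(msg_id)

    def on_confirm(self, msg_ids):
        self.unconfirmed.difference_update(msg_ids)

    def resend_unconfirmed(self):
        for msg_id in sorted(self.unconfirmed):
            self.broker.receive(msg_id)


broker = ToyBroker()
producer = Producer(broker)
producer.send("order-1")
producer.on_confirm(broker.flush())  # order-1 is on disk and confirmed
producer.send("order-2")
broker.crash()                       # order-2 was only in memory: lost in the broker
producer.resend_unconfirmed()        # the producer still knows it was never confirmed
producer.on_confirm(broker.flush())
print(broker.disk)                   # ['order-1', 'order-2']
print(sorted(producer.unconfirmed))  # []
```

Because the producer keeps every message until it is confirmed, a crash inside the in-memory window costs a resend rather than the data. A resend can also produce duplicates if only the confirmation was lost, so the consumer side would need to tolerate them.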
6. Technical details to study
The technical detail we need to study here is how the warehouse service’s manual ack actually works to guarantee that no data is lost.
I have received a lot of questions from readers about this:
How exactly does the warehouse service avoid data loss with manual ack? What are the details and principles of RabbitMQ’s underlying implementation? When the warehouse service goes down without sending an ack, how does RabbitMQ automatically sense that it is down and redeliver the message to other warehouse service instances?
What are the implementation principles and the underlying details behind these things?
Don’t worry: we’ll take a closer look at the principles behind this in the following articles of this series.
Author: Architectural Notes of Huoia. Link: juejin.cn/post/684490… Published on Juejin; copyright belongs to the author. Please contact the author for authorization before reprinting.