In this article we continue our discussion of distributed systems.
When it comes to distributed systems, there is no avoiding the topic of “consistency”, and this time we will talk about “eventual consistency”.
Eventual consistency is at the heart of most highly available distributed systems today.
For those unfamiliar with eventual consistency, here’s a quick introduction:
Eventual consistency means that all the data scattered across different nodes in a system reaches a consistent state, as defined by the business, after a certain period of time.
Key points:
- It concerns data consistency, not transaction consistency (ACID is about transaction consistency);
- Precondition: multiple nodes or systems;
- The data may be temporarily inconsistent, but it becomes consistent eventually (god knows how long “eventually” is).
OK, let’s get started.
Don’t just look at the river’s mirror-flat surface; look at how deep the water runs
Eventual consistency, in a word, is loose in the process and strict in the result. Whatever happens in between, the outcome must meet the business requirements and the data consistency requirements.
Implementations come in all shapes, but the underlying idea is the same. Let’s set aside the flashy details of the process and examine the nature of eventual consistency.
Why is the Way not poor though one lives in poverty? In times of chaos, carry on as in times of calm
When I had just entered the industry, I was a rookie with limited ability and could only work on small functional modules. The one that impressed me most was the order module.
The user places an order, and the order module runs the corresponding business logic after receiving the request. Finally, the order is inserted into the order table and the result is returned to the user. After the user pays, the order module updates the order status according to the payment result.
It was a bit of a struggle at first for a technical novice like me, but eventually I became adept at maintaining the module.
After a while of this simple life, a new task arrived!
The product manager told me that the data audit department wanted the order module I maintained to send them the order data promptly after each order was completed. They provided an interface so that I could send the data to them directly.
Two problems arose:
Problem 1: The user waits too long
The simplest implementation is to call the interface provided by the data audit department right after updating the order data, passing it the order number.
However, the flow from the user’s successful payment to the order status update is synchronous. Assume it takes n milliseconds, so the user has to wait at least n milliseconds. If the call to the data audit department takes another m milliseconds, the user has to wait n+m milliseconds in total.
The waiting time goes up and the experience goes down. The diagram below illustrates this:
Problem 2: Partial success, partial failure
Once a new interface is introduced, a call to it may fail at some point because of network problems, authentication problems, a service outage, or many other reasons. So what happens when the call fails?
If the order update succeeds but the hand-off to the data audit department fails, the order module is left in an awkward position.
On the one hand, you can’t tell the client that the request failed, because the order itself did not fail. On the other hand, you can’t call it a business success either, because data auditing is a very important part of the business logic.
It’s a dilemma.
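To make the two problems concrete, here is a minimal Python sketch of the synchronous flow. The function names, the `AuditUnavailable` exception, and the sleep durations standing in for n and m are all invented for illustration; the real order module and audit interface are not shown in this article.

```python
import time


class AuditUnavailable(Exception):
    """Raised by the hypothetical audit interface when a call fails."""


def update_order_status(orders, order_id, status):
    # Stands in for the n-millisecond database update.
    time.sleep(0.01)
    orders[order_id] = status


def call_audit_api(order_id, healthy=True):
    # Stands in for the m-millisecond remote call to the audit department.
    time.sleep(0.02)
    if not healthy:
        raise AuditUnavailable(f"audit call failed for {order_id}")


def settle_order_sync(orders, order_id, audit_healthy=True):
    """Synchronous flow: the user waits for both steps, one after the other."""
    start = time.perf_counter()
    update_order_status(orders, order_id, "PAID")    # step 1
    call_audit_api(order_id, healthy=audit_healthy)  # step 2 may fail
    return time.perf_counter() - start


orders = {}

# Problem 1: the user waits roughly n + m.
elapsed = settle_order_sync(orders, "order-1")

# Problem 2: step 1 has already succeeded when step 2 fails.
try:
    settle_order_sync(orders, "order-2", audit_healthy=False)
    audit_failed = False
except AuditUnavailable:
    audit_failed = True
```

After the second call, `orders["order-2"]` is already `"PAID"` even though the audit department never heard about it, which is exactly the awkward half-done state described above.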
One solution to both problems is eventual consistency.
We’ve talked about CAP before, so we know that partition tolerance and availability can be guaranteed if some consistency is sacrificed. With eventual consistency, we can’t guarantee that all the data meets the business requirements at every moment, but we can guarantee that the service keeps serving external requests even when something goes wrong internally.
I, Fourth Brother, like playing games, so let’s use the example of buying a Switch on Taobao to explain eventual consistency:
Suppose you buy a digital game and a Switch console at the same time on Taobao. You get the digital game immediately after you pay, but for the console you have to wait a few days until the courier delivers it to your home.
Let’s tease out the details of this example:
- First, there must be a merchant on Taobao selling Switch consoles and digital games who accepts your order and gives you an order number.
- You get the digital game right away, but not the Switch.
- You don’t know how the merchant gets the Switch ready for you. If it was out of stock, did he borrow one from another merchant, or did he wait two days for new stock before shipping? (There could be other reasons for a delayed delivery, which we won’t go into.) It doesn’t matter. What matters is that once the merchant accepts the order, you know he will fulfil it.
- Once you’ve placed your order, you’re guaranteed to get your Switch eventually, though you may not know exactly when.
After a few days you finally receive the goods. Congratulations on falling into the Switch pit.
The example above is what we call eventual consistency. But one very important thing hasn’t come up yet: what drives us to use eventual consistency in the first place?
The answer is data distribution.
What you learn from books always feels shallow; to truly understand something you must practice it yourself
Why do we end up needing eventual consistency?
Because we need to distribute data to different places. Along the way, some deliveries may succeed while others fail, leaving the data inconsistent, and we need the idea of eventual consistency to deal with those situations.
Well, distributing data… OK, you’ve guessed it, right?
Yes, distributing data as messages through an MQ (message queue) is the most common way to achieve eventual consistency.
We package the data to be distributed into a message and send it to the MQ middleware. The middleware broadcasts the message to every service that wants to receive it. Each receiving service then processes the data independently, according to its own business needs.
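The broadcast behaviour can be sketched with a toy in-memory broker. This is only an illustration in Python: `InMemoryBroker`, the topic name `order.paid`, and the service names are made up here, and a real MQ product adds persistence, acknowledgements, and redelivery that this sketch leaves out.

```python
from collections import defaultdict, deque


class InMemoryBroker:
    """Toy stand-in for MQ middleware: one queue per subscriber per topic."""

    def __init__(self):
        self._queues = defaultdict(dict)  # topic -> {subscriber name: deque}

    def subscribe(self, topic, subscriber):
        queue = deque()
        self._queues[topic][subscriber] = queue
        return queue

    def publish(self, topic, message):
        # Broadcast: every subscriber gets its own copy of the message.
        for queue in self._queues[topic].values():
            queue.append(dict(message))
        return len(self._queues[topic])


broker = InMemoryBroker()
order_inbox = broker.subscribe("order.paid", "order-service")
audit_inbox = broker.subscribe("order.paid", "audit-service")

delivered = broker.publish("order.paid", {"order_id": "order-1", "amount": 2099})

# Each service drains its own queue whenever it is ready; here only the
# audit service has caught up, while the order service still has work waiting.
audit_seen = [msg["order_id"] for msg in audit_inbox]
audit_inbox.clear()
```

Because every subscriber owns a separate queue, a slow or temporarily offline consumer does not hold the others back; it simply catches up later.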
Going back to our order module example, we can apply eventual consistency in two ways.
- Write to the database first, then send a message to the data audit department
Here the order module first updates the order status. The order data is then packaged as a message and sent to MQ, and the order module’s job is done. All that remains is for the data audit department to take messages from MQ and process them according to its own business.
With this approach, we make sure both that the database update succeeds and that the data is sent to MQ. Once the data audit department receives the message and processes it, the data as a whole reaches an eventually consistent state.
- Write only to MQ
Here the order module, on receiving the request, packages the data into a message and puts it straight into MQ.
Then the order module itself and the data audit department’s service each take the message from MQ, update their own databases according to their own business logic, run their own checks, and finally reach a consistent state.
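Both approaches can be sketched side by side in Python. Everything named here is an assumption for illustration: the table layout, the function names, and the `deque` objects standing in for MQ. For the first approach the article does not say how the database update and the send to MQ are both ensured; the sketch uses an outbox table (a commonly used technique, swapped in here) so that the order row and the pending message commit in one local transaction.

```python
import json
import sqlite3
from collections import deque


def new_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
    conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,"
                 " payload TEXT, sent INTEGER DEFAULT 0)")
    return conn


# Approach 1: database first, then a message.
def mark_paid_db_first(conn, order_id):
    payload = json.dumps({"order_id": order_id, "status": "PAID"})
    with conn:  # one local transaction: both rows commit or neither does
        conn.execute("INSERT OR REPLACE INTO orders VALUES (?, 'PAID')", (order_id,))
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))


def relay_outbox(conn, audit_queue):
    """Push unsent outbox rows to MQ; safe to run repeatedly."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id").fetchall()
    for row_id, payload in rows:
        audit_queue.append(json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)


# Approach 2: only into MQ; the order module is just another consumer.
def accept_mq_only(order_id, order_queue, audit_queue):
    msg = {"order_id": order_id, "status": "PAID"}
    order_queue.append(dict(msg))
    audit_queue.append(dict(msg))


def order_consumer(conn, order_queue):
    while order_queue:
        msg = order_queue.popleft()
        with conn:
            conn.execute("INSERT OR REPLACE INTO orders VALUES (?, ?)",
                         (msg["order_id"], msg["status"]))


def audit_consumer(audit_queue, audited):
    while audit_queue:
        audited.add(audit_queue.popleft()["order_id"])


def status(conn, order_id):
    row = conn.execute("SELECT status FROM orders WHERE id = ?", (order_id,)).fetchone()
    return row[0] if row else None


conn, order_q, audit_q, audited = new_db(), deque(), deque(), set()

mark_paid_db_first(conn, "order-1")               # approach 1
relayed = relay_outbox(conn, audit_q)

accept_mq_only("order-2", order_q, audit_q)       # approach 2
before = (status(conn, "order-2"), set(audited))  # nothing consumed yet

order_consumer(conn, order_q)
audit_consumer(audit_q, audited)
```

Right after `accept_mq_only` returns, neither the order table nor the audit side knows about `order-2`; only once both consumers have run do the two sides agree. That gap is the temporary inconsistency the article talks about.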
The little lotus has barely shown its pointed tip, and a dragonfly is already perched on it
In the example above, we described the core idea of eventual consistency. We do not guarantee that the data meets the business requirements in real time, but we do guarantee that they are met after a window of time, just as with online shopping.
Simple as it sounds, nothing in the world is that easy. Depending on the business, eventual consistency takes several different forms in practice. For instance:
Retry + reverse pattern
When we take a payment, we need to record it in the accounting system. When accounting fails, we want to retry as many times as is reasonable. Once the retry limit is reached, we may even ask the upstream system to provide retry and cancel interfaces, so that the downstream can tell the upstream to resend the message or to cancel the operation for the time being.
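A minimal Python sketch of this pattern follows. The helper `retry_then_reverse`, the `FlakyAccounting` class, and the payment IDs are hypothetical names used only for illustration; a production version would normally also wait between attempts, which is omitted here to keep the example short.

```python
def retry_then_reverse(action, reverse, max_attempts=3):
    """Try `action` up to `max_attempts` times; if all fail, call `reverse`."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return {"status": "done", "result": action(), "attempts": attempt}
        except Exception as exc:  # any failure counts as a failed attempt
            last_error = exc
    reverse()  # retry limit reached: cancel (reverse) the operation
    return {"status": "reversed", "error": str(last_error), "attempts": max_attempts}


class FlakyAccounting:
    """Hypothetical accounting service that fails a set number of times."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success
        self.booked = []

    def book(self, payment_id):
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("accounting unavailable")
        self.booked.append(payment_id)
        return payment_id


cancelled = []

# Fails twice, succeeds on the third attempt: no reversal needed.
recovering = FlakyAccounting(failures_before_success=2)
first = retry_then_reverse(lambda: recovering.book("pay-1"),
                           lambda: cancelled.append("pay-1"))

# Never recovers within the limit: the payment is reversed.
down = FlakyAccounting(failures_before_success=10)
second = retry_then_reverse(lambda: down.book("pay-2"),
                            lambda: cancelled.append("pay-2"))
```

Here `first` ends with status `"done"` after three attempts, while `second` ends with status `"reversed"` and `pay-2` lands in `cancelled`.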
Recovery task pattern
Suppose accounting has failed and we have used the retry + reverse pattern to cancel the operation. We can then create a recovery task and run it later, when accounting is sure to succeed.
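The recovery step might look roughly like the Python sketch below. It shows only the recovery queue, not the retry + reverse step before it, and the `Accounting` class, `record_payment`, and `run_recovery` are names assumed for this example. In practice the task list would live in a durable store and be run by a scheduler rather than called by hand.

```python
from collections import deque


class Accounting:
    """Hypothetical accounting service that can be switched on and off."""

    def __init__(self):
        self.available = False
        self.ledger = []

    def book(self, payment_id):
        if not self.available:
            raise ConnectionError("accounting unavailable")
        self.ledger.append(payment_id)


recovery_tasks = deque()  # payments that still need to be booked


def record_payment(accounting, payment_id):
    try:
        accounting.book(payment_id)
        return "booked"
    except ConnectionError:
        recovery_tasks.append(payment_id)  # remember it for later
        return "deferred"


def run_recovery(accounting):
    """Try each pending task once; tasks that fail again stay queued."""
    recovered = 0
    for _ in range(len(recovery_tasks)):
        payment_id = recovery_tasks.popleft()
        try:
            accounting.book(payment_id)
            recovered += 1
        except ConnectionError:
            recovery_tasks.append(payment_id)
    return recovered


accounting = Accounting()
first = record_payment(accounting, "pay-1")  # accounting is down
still_down = run_recovery(accounting)        # too early, nothing recovered
accounting.available = True
recovered = run_recovery(accounting)         # now the task goes through
```

The payment is only `"deferred"` at first, yet it ends up in the ledger once accounting is back, so the data converges without the user having to do anything.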
Asynchronous message pattern
When A transfers money to B, the balances of A and B must be strongly consistent once the money has left A. However, the transfer may span services, and we still want performance to be as high as possible. In that case, we can first write to the local database and send a message to the other service within a transaction, and then commit or roll back the overall transaction depending on how the message was processed.
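One way to read this pattern is sketched below in Python. It is an interpretation rather than the author’s implementation: the `pending` table, the two `deque` objects standing in for message queues, and all function names are assumptions. The debit side records the transfer as pending and sends a message; the credit side reports back; the debit side then confirms or rolls back according to that reply.

```python
from collections import deque

accounts = {"A": 100, "B": 0}
pending = {}                        # transfer id -> (source, amount), not yet confirmed
outgoing, replies = deque(), deque()


def start_transfer(transfer_id, src, dst, amount):
    """Local step: debit the source and enqueue a message for the credit side."""
    if accounts[src] < amount:
        raise ValueError("insufficient funds")
    accounts[src] -= amount
    pending[transfer_id] = (src, amount)
    outgoing.append({"id": transfer_id, "dst": dst, "amount": amount})


def credit_service():
    """Remote step: apply each credit and report whether it worked."""
    while outgoing:
        msg = outgoing.popleft()
        ok = msg["dst"] in accounts
        if ok:
            accounts[msg["dst"]] += msg["amount"]
        replies.append({"id": msg["id"], "ok": ok})


def settle_replies():
    """Confirm successful transfers and roll back failed ones."""
    rolled_back = []
    while replies:
        reply = replies.popleft()
        src, amount = pending.pop(reply["id"])
        if not reply["ok"]:
            accounts[src] += amount  # roll back the local debit
            rolled_back.append(reply["id"])
    return rolled_back


start_transfer("t1", "A", "B", 30)      # will succeed
start_transfer("t2", "A", "ghost", 20)  # destination does not exist
in_flight_total = sum(accounts.values())

credit_service()
rolled_back = settle_replies()
```

While messages are in flight the balances add up to only 50, but after `settle_replies` runs the total is back to 100, with `t1` applied and `t2` rolled back.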
As you can see, eventual consistency can be achieved in many ways, but the core is always the same: distributing data through message queues. Once we understand this fundamental principle, the various flavors of distributed transactions, distributed consensus, and so on become much easier to understand.
After the