Every year, Alipay demonstrates its technical strength during the Double 11 and Double 12 shopping festivals. This is reflected not only in handling traffic at extremely high TPS, but also in making almost no mistakes: payments are never processed twice. So how is this achieved?
Admittedly, Alipay has put a great deal of effort into avoiding mistakes under high concurrency, for example idempotent processing and distributed transactions. Personally, though, I think the most critical point is a seemingly unremarkable rule: “one lock, two checks, three updates”.
What is “one lock, two checks, three updates”? Simply put, whenever a concurrent request comes in:
- First, lock the related business record (for example, the payment order).
- Then check the record's status to see whether it has already been updated.
- If step 2 shows that the status has not been updated yet, the request may update it and complete the related business logic. If the status has already been updated, the request must not update it or run the business logic.
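The three steps can be sketched in Java as follows. This is a minimal in-memory model under stated assumptions, not Alipay code: a `ReentrantLock` per order stands in for the database row lock, and `PaymentOrder`, `Status`, `markPaid`, and `runConcurrentRequests` are hypothetical names chosen for illustration. However many concurrent requests hit the same order, only one passes the check and runs the business logic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class OneLockTwoChecksThreeUpdates {

    enum Status { INIT, SUCCESS }

    /** Hypothetical in-memory stand-in for a payment order row. */
    static final class PaymentOrder {
        final ReentrantLock lock = new ReentrantLock();   // plays the role of the row lock
        Status status = Status.INIT;                       // guarded by lock
        final AtomicInteger businessLogicRuns = new AtomicInteger();
    }

    static final Map<String, PaymentOrder> ORDERS = new ConcurrentHashMap<>();

    /** Returns true only for the request that actually performed the update. */
    static boolean markPaid(String orderId) {
        PaymentOrder order = ORDERS.get(orderId);
        order.lock.lock();                                 // step 1: lock
        try {
            if (order.status == Status.SUCCESS) {          // step 2: check
                return false;                              // already final, change nothing
            }
            order.status = Status.SUCCESS;                 // step 3: update the status...
            order.businessLogicRuns.incrementAndGet();     // ...and run the business logic
            return true;
        } finally {
            order.lock.unlock();
        }
    }

    /** Sends `requests` concurrent payment requests to a fresh order; returns how many won. */
    static int runConcurrentRequests(String orderId, int requests) {
        ORDERS.put(orderId, new PaymentOrder());
        AtomicInteger winners = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < requests; i++) {
            pool.submit(() -> {
                if (markPaid(orderId)) {
                    winners.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return winners.get();
    }

    public static void main(String[] args) {
        int winners = runConcurrentRequests("order-1", 100);
        System.out.println("winners=" + winners
                + ", businessLogicRuns=" + ORDERS.get("order-1").businessLogicRuns.get());
    }
}
```

In a real system the lock would be held by the database (or a distributed lock service) rather than the JVM, so that it also covers requests landing on different servers.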
(Figure: schematic diagram of the lock, check, update flow)
Without further ado, let’s go straight to the code:
```java
// Step 1: lock the current payment order
PaymentInfo resultPaymentInfo = commonPayCoreService
        .queryPaymentForUpdate(createPaymentInfo.getId());
if (resultPaymentInfo.isFinalStatus()) {
    // Step 2: check the payment order's current status; if it is already final,
    // return immediately without making any updates
    return resultPaymentInfo;
}
// Step 3: update the payment order to its final status and complete the
// related business logic (payment succeeded)
payCoreService.updateRequestResult(payChannelResult);
```
With this scheme, the same state change can never be applied twice under concurrency, as long as every code path that changes the status follows it. In principle, every state-machine transition should check whether the state has already changed, and it should do so under concurrency-safe conditions, that is, while holding the lock.
Take a look at what happens if step 1 or step 2 is missing:
(Figure: flow when step 1 is missing)
(Figure: flow when step 2 is missing)
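Both failure modes can be reproduced with a small Java sketch. It is a hypothetical in-memory model (the `PaymentOrder` class, `Status` enum, and helper names are illustrative only), and it uses a `CyclicBarrier` to force the bad interleaving deterministically. Without step 1, two concurrent requests both pass the check before either one writes, so the business logic runs twice. Without step 2, the lock serializes the requests, but the second one blindly repeats the update. Both helpers therefore return 2.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class MissingStepDemo {

    enum Status { INIT, SUCCESS }

    static final class PaymentOrder {
        final ReentrantLock lock = new ReentrantLock();
        volatile Status status = Status.INIT;
        final AtomicInteger businessLogicRuns = new AtomicInteger();
    }

    /** Step 1 missing: check and update without holding a lock. */
    static void payWithoutLock(PaymentOrder order, CyclicBarrier bothChecked)
            throws InterruptedException, BrokenBarrierException {
        if (order.status == Status.SUCCESS) {
            return;
        }
        bothChecked.await(); // both requests have passed the check before either updates
        order.status = Status.SUCCESS;
        order.businessLogicRuns.incrementAndGet();
    }

    /** Step 2 missing: lock, but update without checking the current status. */
    static void payWithoutCheck(PaymentOrder order) {
        order.lock.lock();
        try {
            order.status = Status.SUCCESS;
            order.businessLogicRuns.incrementAndGet();
        } finally {
            order.lock.unlock();
        }
    }

    static int runTwoRequestsWithoutLock() {
        PaymentOrder order = new PaymentOrder();
        CyclicBarrier barrier = new CyclicBarrier(2);
        Thread[] threads = new Thread[2];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                try {
                    payWithoutLock(order, barrier);
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return order.businessLogicRuns.get();
    }

    static int runTwoRequestsWithoutCheck() {
        PaymentOrder order = new PaymentOrder();
        payWithoutCheck(order); // the lock makes the duplicate requests run one after another...
        payWithoutCheck(order); // ...but the second one still re-applies the update
        return order.businessLogicRuns.get();
    }

    public static void main(String[] args) {
        System.out.println("without lock: business logic ran " + runTwoRequestsWithoutLock() + " times");
        System.out.println("without check: business logic ran " + runTwoRequestsWithoutCheck() + " times");
    }
}
```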
As long as these three steps are part of our coding standards, we can avoid most concurrent duplicate-processing problems. The same approach applies to handling asynchronous, concurrent, and duplicated messages. And by making the state-machine check more thorough, we can also cope with out-of-order messages.
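One way to make the check "more thorough" is to validate each message against an explicit transition table, so that only legal forward transitions are applied. The sketch below assumes a hypothetical four-state payment lifecycle (`INIT`, `PROCESSING`, `SUCCESS`, `FAILED`) and an in-memory lock; the states and the `ALLOWED` table are illustrative, not taken from Alipay. A `PROCESSING` message that arrives after `SUCCESS` is simply ignored, as is a redelivered `SUCCESS`.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

public class OrderedStateMachine {

    enum Status { INIT, PROCESSING, SUCCESS, FAILED }

    // Hypothetical transition table: the legal target states from each state.
    static final Map<Status, Set<Status>> ALLOWED = Map.of(
            Status.INIT, EnumSet.of(Status.PROCESSING, Status.SUCCESS, Status.FAILED),
            Status.PROCESSING, EnumSet.of(Status.SUCCESS, Status.FAILED),
            Status.SUCCESS, EnumSet.noneOf(Status.class),
            Status.FAILED, EnumSet.noneOf(Status.class));

    static final class PaymentOrder {
        final ReentrantLock lock = new ReentrantLock();
        Status status = Status.INIT; // guarded by lock
    }

    /** Applies a status message only if it is a legal transition from the current state. */
    static boolean onStatusMessage(PaymentOrder order, Status target) {
        order.lock.lock();                                     // 1. lock
        try {
            if (!ALLOWED.get(order.status).contains(target)) { // 2. check against the state machine
                return false; // duplicate, stale, or out-of-order message: ignore it
            }
            order.status = target;                             // 3. update
            return true;
        } finally {
            order.lock.unlock();
        }
    }

    public static void main(String[] args) {
        PaymentOrder order = new PaymentOrder();
        // SUCCESS overtakes PROCESSING, and SUCCESS is then redelivered.
        for (Status msg : new Status[] {Status.SUCCESS, Status.PROCESSING, Status.SUCCESS}) {
            boolean applied = onStatusMessage(order, msg);
            System.out.println(msg + " applied=" + applied + " -> " + order.status);
        }
    }
}
```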
You can choose between pessimistic and optimistic locking depending on the situation. Implementations of pessimistic locks (database row locks) and optimistic locks (database version numbers, or distributed locks) will be discussed in more detail later.
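In the snippet earlier, `queryPaymentForUpdate` presumably takes a pessimistic row lock (for example via `SELECT ... FOR UPDATE`); that is an assumption, since its implementation is not shown. For comparison, here is a minimal sketch of the optimistic alternative with a version column. The single-row `PaymentTable`, the `Row` record, and the SQL in the comment are hypothetical; an `AtomicReference` swap stands in for the database executing the conditional `UPDATE` atomically. It needs Java 16+ for `record`.

```java
import java.util.concurrent.atomic.AtomicReference;

public class OptimisticLockSketch {

    enum Status { INIT, SUCCESS }

    /** Immutable snapshot of a payment order row: status plus a version column. */
    record Row(Status status, long version) {}

    /** Hypothetical single-row table; the atomic swap stands in for the database's atomic UPDATE. */
    static final class PaymentTable {
        private final AtomicReference<Row> row = new AtomicReference<>(new Row(Status.INIT, 0L));

        Row select() {
            return row.get();
        }

        /**
         * Emulates: UPDATE payment SET status = ?, version = version + 1
         *           WHERE id = ? AND version = ?
         * and returns the affected row count (0 or 1).
         */
        int updateIfVersion(long expectedVersion, Status newStatus) {
            Row current = row.get();
            if (current.version() != expectedVersion) {
                return 0; // another request changed the row after it was read
            }
            return row.compareAndSet(current, new Row(newStatus, expectedVersion + 1)) ? 1 : 0;
        }
    }

    /** Check, then update; no lock is held, the version check detects a concurrent writer. */
    static boolean markPaid(PaymentTable table) {
        Row snapshot = table.select();
        if (snapshot.status() == Status.SUCCESS) {
            return false; // already final
        }
        return table.updateIfVersion(snapshot.version(), Status.SUCCESS) == 1;
    }

    public static void main(String[] args) {
        PaymentTable table = new PaymentTable();
        // Two requests read the same version before either one writes.
        Row seenByA = table.select();
        Row seenByB = table.select();
        System.out.println("A updated rows: " + table.updateIfVersion(seenByA.version(), Status.SUCCESS));
        System.out.println("B updated rows: " + table.updateIfVersion(seenByB.version(), Status.SUCCESS));
        System.out.println("late request applied: " + markPaid(table));
    }
}
```

When the conditional update affects 0 rows, the caller would typically re-read the row and repeat the check rather than treat it as an error, because another request may have just completed the payment. The trade-off is that no request ever blocks, at the cost of retries under heavy contention on the same order.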
Some may ask whether pessimistic or optimistic locking hurts system concurrency, and if so, how to deal with it. My view is that in a modern distributed system that aims for high availability and stability, correctness must come first. Performance can then be addressed by optimizing code logic, improving the technical architecture, adding database resources, and so on.
In an earlier load test at Ant Financial, the settlement system I was responsible for made about 10 SQL calls and one remote call (about 100 ms) per request, for a total processing time of about 180 ms. Under load, the Java service reached 150 TPS on a 4-core, 8 GB machine, which was a satisfactory result, and scaling out horizontally with more servers posed no problems.
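As a rough sanity check on these figures (back-of-the-envelope arithmetic using Little's law, not part of the original measurements), 150 TPS at about 180 ms per request means roughly 27 requests are in flight on the machine at any moment:

```latex
L = \lambda W = 150\ \text{requests/s} \times 0.18\ \text{s} = 27\ \text{requests in flight}
```

Because a row lock only serializes requests for the same record, those in-flight requests contend with one another only when they target the same payment order.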
Across Alipay's technical architecture, there is only one scenario that updates directly without locking and checking: the 2016 Spring Festival “Wufu” (Five Blessings) red-envelope campaign. With traffic reaching millions of TPS, it sacrificed state safety to keep the user experience smooth and relied on reconciliation after the fact instead (though if anything had gone wrong, there would not have been much to do about it :))