Or I was told that no order existed even though I had already been charged for an online purchase. All of these situations stem from the absence of transactions, and they illustrate just how important transactions are.
Without a transaction, buying something in a shop means handing over the money and handing over the goods as two separate actions. With a transaction, an online purchase deducts the money and generates the order as a single, indivisible operation.
The specific definition of a transaction
Transactions provide a mechanism to bundle all operations involved in an activity into one indivisible unit of execution. The operations that make up a transaction commit only if every one of them executes correctly; the failure of any operation rolls back the entire transaction.
Simply put, transactions provide an “All or Nothing” mechanism.
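As a rough sketch of this "All or Nothing" behavior, here is a toy purchase using SQLite's local transaction; the table, account names, and amounts are all invented for illustration:

```python
import sqlite3

# A toy ledger: the buyer starts with 100, the shop with 0.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('buyer', 100), ('shop', 0)")
conn.commit()

def purchase(conn, price):
    """Deduct from the buyer and credit the shop as one transaction."""
    try:
        with conn:  # BEGIN ... COMMIT, or ROLLBACK on an exception
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = 'buyer'",
                (price,))
            if price > 100:  # simulate a failed step, e.g. insufficient funds
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = 'shop'",
                (price,))
    except ValueError:
        pass  # the rollback already restored the buyer's balance

purchase(conn, 150)  # fails partway: the rollback undoes the deduction
purchase(conn, 100)  # succeeds: both updates apply together
```

The `with conn:` block issues COMMIT on success and ROLLBACK on an exception, so the failed purchase leaves both balances untouched.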
Database local transactions
ACID
Any discussion of database transactions starts with their four defining characteristics, ACID:
A: Atomicity. All operations in a transaction either all complete or none do; the transaction never ends up partway in between.
If a transaction fails during execution, it will be rolled back to the state before the transaction began, as if the transaction had never been executed.
For example, paying the money and receiving the goods happen together; if the goods are not shipped, the money is returned.
C: Consistency. The database must be in a consistent state both before and after the transaction executes.
If the transaction completes successfully, all changes in the system are applied correctly and the system is in a valid state.
If an error occurs during a transaction, all changes in the system are automatically rolled back and the system is returned to its original state.
I: Isolation. In a concurrent environment, when different transactions manipulate the same data simultaneously, each transaction has its own complete data space.
Changes made by concurrent transactions must be isolated from changes made by any other concurrent transactions. When a transaction views a data update, the data is either in the state it was in before another transaction modified it, or in the state it was in after another transaction modified it, and the transaction does not see the data in the intermediate state.
For example, when you buy something, your purchase does not interfere with anyone else's.
D: Durability, meaning that once a transaction ends successfully, its updates to the database persist permanently.
Even if a system crash occurs, the database system can be restarted to the state it was at the successful end of the transaction.
For example, when you buy something, the purchase is recorded in the ledger, so it survives even if the shopkeeper forgets about it.
InnoDB implementation principle
InnoDB is a storage engine of MySQL. Since most people are familiar with MySQL, here is a brief introduction to the basic principles behind its transaction implementation.
In a local transaction, the services and resources can be treated as a single entity under the transaction's envelope, managed by the resource manager.
The ACID properties of transactions are guaranteed by InnoDB's logging and locking. Transaction isolation is implemented through database locks, durability through the redo log, and atomicity and consistency through the undo log.
The principle of the undo log is very simple: to satisfy atomicity, before any data is modified, the old value is first backed up to a place called the undo log, and only then is the data changed.
If an error occurs or the user performs a Rollback statement, the system can use the backup in the Undo Log to restore the data to the state before the transaction began.
In contrast to the undo log, the redo log records the new data. Before a transaction commits, only the redo log needs to be persisted, not the data pages themselves.
If the system crashes before the data pages are persisted but after the redo log is, the system replays the redo log on restart to bring all data to its latest state. Readers interested in the implementation details can explore further on their own.
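The division of labor between the two logs can be modeled with a toy in-memory version (a deliberately simplified illustration, nothing like InnoDB's real on-disk format):

```python
# Toy undo/redo logging: back the old value up before modifying,
# record the new value for replay after a crash.
data = {"balance": 100}   # the working copy of a "page"
undo_log = []             # old values, backed up before any modification
redo_log = []             # new values, persisted before commit

def update(key, new_value):
    undo_log.append((key, data[key]))   # back the old value up first...
    redo_log.append((key, new_value))   # ...record the new value for replay...
    data[key] = new_value               # ...and only then modify the data

def rollback():
    """Atomicity: restore old values from the undo log, newest first."""
    while undo_log:
        key, old = undo_log.pop()
        data[key] = old
    redo_log.clear()

def recover(persisted_pages):
    """Durability: after a crash, replay the redo log over the stale pages."""
    for key, new_value in redo_log:
        persisted_pages[key] = new_value
    return persisted_pages

update("balance", 60)
rollback()
rolled_back_value = data["balance"]     # back to 100, as if nothing happened

update("balance", 60)                   # commit path: redo log is persisted
recovered = recover({"balance": 100})   # crash: pages stale, redo log survives
```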
Distributed transaction
What are distributed transactions
A distributed transaction is one whose participants, supporting transaction servers, resource servers, and transaction manager are located on different nodes of a distributed system.
To put it simply, a large operation consists of different small operations, which are distributed on different servers and belong to different applications. Distributed transactions need to ensure that all of these small operations either succeed or fail.
Essentially, distributed transactions are designed to ensure data consistency across different databases.
The cause of distributed transactions
Viewed against the local-transaction picture above, the causes fall into two parts:
- the service spans multiple nodes
- the resource spans multiple nodes
Services on multiple nodes
With the rapid development of the Internet, service architecture patterns such as microservices and SOA are being used on a large scale.
For a simple example, within a company, a user’s assets may be divided into several parts, such as balance, points, coupons, and so on.
Within the company, it is possible that the points function is maintained by one microservices team and the coupons are maintained by another team.
With separate services, there is no guarantee that after the points are deducted, the coupon deduction will also succeed.
Resources on multiple nodes
Similarly, because the Internet has grown so fast, a MySQL deployment that holds tens of millions of rows generally has to be sharded into multiple databases and tables.
Take Alipay's transfer business: if you transfer money to a friend, your database may sit in Beijing while your friend's balance is stored in Shanghai, so we still cannot guarantee that both operations succeed at the same time.
The foundation of distributed transactions
From the above point of view, distributed transactions emerge with the rapid development of the Internet, which is inevitable.
We said earlier that the database's four ACID properties can no longer cope with distributed transactions, so new theories were proposed to fill the gap.
CAP
CAP theorem, also known as Brewer's theorem. For architects designing distributed systems (not just distributed transactions), CAP is the entry-point theory.
C (consistency): for a given client, a read operation returns the result of the latest write.
For data distributed across different nodes: if the data is updated on one node and the latest data can then be read on every other node, this is called strong consistency; if some node fails to read the latest data, that is distributed inconsistency.
A (availability): non-failing nodes return reasonable responses (neither errors nor timeouts) within a reasonable amount of time. The two keys to availability are reasonable time and reasonable response.
A reasonable time means that the request cannot be blocked indefinitely and should be returned within a reasonable time. A reasonable response means that the system should definitely return the result and that the result is correct, and by correct I mean, for example, it should return 50, not 40.
P (partition tolerance): the system continues to work after a network partition occurs. For example, if one machine in a cluster has a network problem, the cluster as a whole keeps working.
Anyone familiar with CAP knows that the three properties cannot all be had at once; if you are interested, you can look up the proofs. In a distributed system the network can never be 100% reliable, so partitions are inevitable.
If we chose CA and gave up P, then when a partition occurred we would have to reject requests to preserve consistency, which violates A. Therefore a distributed system theoretically cannot adopt a CA architecture, only CP or AP.
CP gives up availability in pursuit of consistency and partition tolerance; ZooKeeper, for example, actually pursues strong consistency.
AP abandons consistency (strong consistency, that is) in pursuit of partition tolerance and availability. This is the choice of many distributed system designs, and the BASE theory below is an extension of AP.
Incidentally, CAP theory ignores network latency: it assumes that when a transaction commits, replication from node A to node B takes no time. In reality that is obviously impossible, so there is always some window of inconsistency.
At the same time, choosing CP does not mean giving up on A altogether. Because partitions are rare, most of the time you still guarantee both C and A.
Even when a partition does occur, you must prepare to restore availability afterwards, for example by using logs to bring the other machines back in sync.
BASE
BASE is a contraction of the phrases Basically Available, Soft state, and Eventually consistent, and is an extension of AP in CAP.
Basically available: when a distributed system fails, it may sacrifice some functionality to keep the core functions available.
Soft state: the system may pass through intermediate states that do not affect its overall availability; this corresponds to the inconsistency window in CAP.
Eventual consistency: the data on all nodes becomes consistent after some period of time.
BASE addresses CAP theory's unrealistic assumption of zero network latency: through soft state and eventual consistency, it guarantees consistency once the delay has passed.
BASE stands opposite to ACID: rather than ACID's strong-consistency model, it sacrifices strong consistency for availability, allowing data to be inconsistent for a period as long as it eventually reaches a consistent state.
Distributed transaction solutions
With the above theoretical foundation in mind, several common solutions for distributed transactions are introduced here.
Do you really need distributed transactions
Before you talk about scenarios, you must first ask whether you really need distributed transactions.
One of the two causes of distributed transactions above is having too many microservices. I have seen too many teams where a single person maintains several microservices, and too many teams over-engineer until everyone is worn out.
When an excess of microservices is what forced distributed transactions on you, I recommend none of the solutions below. Instead, merge the microservices that need to share a transaction into one service and use the database's local transactions.
Any of these schemes adds complexity to your system, and the cost is simply too high. Do not introduce unnecessary cost and complexity in pursuit of a fashionable design.
If you do want to introduce distributed transactions, look at the following common solutions.
2PC
Any discussion of 2PC has to start with XA Transactions, the database standard for distributed transactions.
There are two stages in the XA protocol:
- Phase 1: the transaction manager asks each database involved in the transaction to pre-commit the operation and report whether it can commit.
- Phase 2: the transaction manager asks each database to commit or roll back the data.
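The two phases can be sketched with in-memory participants; the class and method names here are invented for illustration and are not any real XA API:

```python
# Toy two-phase commit: Participant stands in for a database's
# resource manager.
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        """Phase 1: pre-commit locally and vote on whether commit is possible."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]   # phase 1: collect votes
    if all(votes):                                # phase 2: unanimous yes
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:                        # any no vote: roll back all
        p.rollback()
    return "rolled_back"

ok = two_phase_commit([Participant("beijing"), Participant("shanghai")])
bad = two_phase_commit([Participant("beijing"),
                        Participant("shanghai", can_commit=False)])
```

A single "no" vote in phase 1 forces every participant to roll back, which is exactly the all-or-nothing guarantee 2PC provides.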
Advantages:
- Strong data consistency is ensured as far as possible, and the implementation cost is low: every major mainstream database has its own implementation (MySQL has supported XA since 5.5).
Disadvantages:
- Single point of failure: the transaction manager plays a critical role throughout. If it goes down, say after phase 1 has completed and just before phase 2 commits, the resource managers block indefinitely and the databases become unusable.
- Synchronous blocking: after preparing, the resources held by the resource manager remain blocked until the commit completes and releases them.
- Data inconsistency: Although the two-phase commit protocol is designed for strong consistency of distributed data, there is still the possibility of data inconsistency.
For example, in the second phase, suppose the coordinator sends the Commit notification but, due to network problems, only some participants receive it and commit, while the rest stay blocked waiting for a notification that never arrives; the data then becomes inconsistent.
Overall, the XA protocol is simple and cheap to implement, but its single point of failure and its inability to support high concurrency (because of synchronous blocking) remain its biggest weaknesses.
TCC
The concept of TCC (Try-Confirm-Cancel) was first proposed by Pat Helland in his 2007 paper Life Beyond Distributed Transactions: An Apostate's Opinion.
Compared with XA, the TCC transaction mechanism solves the following disadvantages:
- The coordinator's single point of failure is resolved: the business activity is initiated and completed by the main business service, and the business activity manager itself becomes multi-node through clustering.
- Synchronous blocking: timeouts are introduced, and instead of locking whole resources, locking is converted into business logic at a finer granularity.
- Data consistency: with the help of compensation, consistency is controlled by the business activity manager.
Explanation of TCC:
- In the Try phase, all business checks are performed (consistency) and the necessary business resources are reserved (quasi-isolation).
- In the Confirm phase, the business is actually executed without further checks, using only the resources reserved in the Try phase. The Confirm operation must be designed to be idempotent; if it fails, it is retried.
- In the Cancel phase, execution is cancelled and the resources reserved in the Try phase are released. Cancel must also be idempotent; its exception handling is essentially the same as Confirm's.
A simple example: buying a bottle of water for $100. Try phase: check that your wallet holds $100 and lock that $100; do the same for the bottle of water in stock.
If anything fails, Cancel runs (releasing the $100 and the bottle of water); if Cancel itself fails, it is retried no matter what, which is why Cancel must remain idempotent.
If both Try operations succeed, Confirm runs: the $100 is deducted and the bottle of water is sold. If Confirm fails, it is retried (driven by the activity log).
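The wallet-and-water example can be sketched as follows; the `Wallet` class and its method names are invented for illustration, and a real TCC framework such as ByteTCC structures this very differently:

```python
# Toy TCC resources: Wallet doubles as the wallet (dollars) and the
# stock (bottles), since both need the same reserve/confirm/cancel shape.
class Wallet:
    def __init__(self, balance):
        self.balance = balance
        self.frozen = 0

    def try_reserve(self, amount):
        """Try: check the available balance and lock (freeze) the amount."""
        if self.balance - self.frozen < amount:
            return False
        self.frozen += amount
        return True

    def confirm(self, amount):
        """Confirm: consume the reservation for real."""
        self.frozen -= amount
        self.balance -= amount

    def cancel(self, amount):
        """Cancel: release the reservation."""
        self.frozen -= amount

wallet = Wallet(100)   # dollars
stock = Wallet(10)     # bottles of water

def buy_water(price=100, qty=1):
    reserved = []
    for resource, amount in ((wallet, price), (stock, qty)):
        if resource.try_reserve(amount):
            reserved.append((resource, amount))
        else:
            for r, a in reserved:   # Cancel: undo only what was reserved
                r.cancel(a)
            return "cancelled"
    for r, a in reserved:           # every Try succeeded: Confirm all
        r.confirm(a)
    return "confirmed"

first = buy_water()    # succeeds: $100 and one bottle reserved, then confirmed
second = buy_water()   # the wallet is now empty, so Try fails and we Cancel
```

If any Try fails, only the reservations actually made are cancelled, which keeps Cancel safe to repeat.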
TCC is suited to:
- business activities with strong isolation and strict consistency requirements;
- services whose execution time is short.
Implement reference: https://github.com/liuyangming/ByteTCC/.
Local message table
The local message table scheme was originally proposed by eBay; eBay's complete solution is described at https://queue.acm.org/detail.cfm?id=1394128.
The core of this scheme is to execute tasks that need distributed processing asynchronously by way of message logging. Message logs can be stored in local text files, databases, or message queues, and retries are then initiated automatically or manually according to business rules.
Manual retry is more commonly used in payment scenarios to deal with post-hoc problems through reconciliation systems.
The core idea of the local message table is to break one large transaction into small local transactions. Take buying a bottle of water for $100 again:
1. When deducting the money, add a local message table on the deduction server: write both the money deduction and the "reduce water stock" message into that table within the same database transaction (relying on the database's local transaction to guarantee consistency).
2. A scheduled task then polls the local message table and forwards unsent messages to the commodity inventory server, asking it to reduce the water stock. On arrival, the inventory server first writes the message into its own message table, then performs the deduction.
3. The commodity server, via a scheduled task or a direct notification, reports back to the deduction server, which updates the status in its local message table.
4. For abnormal cases, periodically scan for messages that were not processed successfully and resend them. When the commodity server receives a message, it checks whether it is a duplicate: if the corresponding operation has already been executed, it simply reports the result back; if not, it executes it. The service must be idempotent, so that one bottle of water is never deducted from stock twice.
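Steps 1-4 can be sketched with two SQLite databases standing in for the two servers; the table names and statuses are invented for illustration:

```python
import sqlite3

pay_db = sqlite3.connect(":memory:")    # the deduction server's database
stock_db = sqlite3.connect(":memory:")  # the inventory server's database

pay_db.execute("CREATE TABLE account (balance INTEGER)")
pay_db.execute("INSERT INTO account VALUES (100)")
pay_db.execute("CREATE TABLE msg (id INTEGER PRIMARY KEY, body TEXT, status TEXT)")
stock_db.execute("CREATE TABLE stock (qty INTEGER)")
stock_db.execute("INSERT INTO stock VALUES (10)")
stock_db.execute("CREATE TABLE msg (id INTEGER PRIMARY KEY, status TEXT)")

# Step 1: deduct the money AND record the message in ONE local transaction.
with pay_db:
    pay_db.execute("UPDATE account SET balance = balance - 100")
    pay_db.execute("INSERT INTO msg VALUES (1, 'minus one bottle', 'pending')")

def poll_and_deliver():
    """Steps 2 and 3: forward pending messages and mark them done."""
    pending = pay_db.execute(
        "SELECT id FROM msg WHERE status = 'pending'").fetchall()
    for (mid,) in pending:
        with stock_db:  # the receiver records the id first, for deduplication
            if not stock_db.execute(
                    "SELECT 1 FROM msg WHERE id = ?", (mid,)).fetchone():
                stock_db.execute("INSERT INTO msg VALUES (?, 'done')", (mid,))
                stock_db.execute("UPDATE stock SET qty = qty - 1")
        with pay_db:    # step 3: the sender marks the message as delivered
            pay_db.execute("UPDATE msg SET status = 'done' WHERE id = ?", (mid,))

poll_and_deliver()
poll_and_deliver()  # step 4: a retry is harmless because delivery is idempotent
```

The duplicate check on the receiver's own message table is what makes the resend in step 4 safe: a retried message is recognized by its id and the stock is not deducted twice.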
The local message table follows BASE theory and the eventual-consistency model, so it suits scenarios with relaxed consistency requirements. Implementing it requires careful attention to idempotent retries.
The MQ transaction
Distributed transactions are implemented in RocketMQ, essentially as an encapsulation of the local message table moved inside the MQ itself.
Below is a brief introduction to MQ transactions; for the details, refer to: https://www.jianshu.com/p/453c6e7ff81c.
The basic process is as follows:
- Phase 1: a prepared (half) message is sent, and its address is returned.
- Phase 2: the local transaction is executed.
- Phase 3: the address obtained in phase 1 is used to access the message and update its state. Only then can the message recipient consume it.
If the confirmation in phase 3 fails, the RocketMQ broker periodically scans for messages whose state was never updated.
For each such unacknowledged message, the broker asks the sender whether to commit it; in RocketMQ this check-back is handed to the sender in the form of a Listener.
If consumption times out, it keeps being retried, so the message receiver must be idempotent. If consumption ultimately fails, manual handling is needed: the probability is so low that designing a complex automated process for such a rare event is not worth the cost.
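The half-message flow and the broker's check-back scan can be modeled with a toy broker; the class and method names are invented and do not match the real RocketMQ client API:

```python
# Toy broker modeling transactional (half) messages and check-back.
class Broker:
    def __init__(self):
        self.half = {}      # prepared (half) messages, invisible to consumers
        self.visible = []   # committed messages that consumers can read
        self.next_id = 0

    def send_half(self, body):
        """Phase 1: store a prepared message and return its 'address'."""
        self.next_id += 1
        self.half[self.next_id] = body
        return self.next_id

    def end_transaction(self, msg_id, commit):
        """Phase 3: commit (make visible) or discard the half message."""
        body = self.half.pop(msg_id)
        if commit:
            self.visible.append(body)

    def check_back(self, check_local_tx):
        """Periodic scan: ask the sender about lingering half messages."""
        for msg_id in list(self.half):
            self.end_transaction(msg_id, check_local_tx(msg_id))

broker = Broker()
committed_tx = set()

msg_id = broker.send_half("minus one bottle")  # phase 1: half message
committed_tx.add(msg_id)                       # phase 2: local tx succeeds
# Suppose the phase-3 confirmation was lost; the scan recovers it by
# asking the sender (the Listener role) whether the local tx committed:
broker.check_back(lambda mid: mid in committed_tx)
```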
Saga transaction
Saga is a concept from a database paper published 30 years ago. The core idea is to split a long transaction into a series of short local transactions coordinated by a Saga transaction coordinator: if every sub-transaction completes normally, the Saga completes; if any step fails, the compensating operations are invoked once each, in reverse order.
Composition of a Saga: each Saga consists of a series of sub-transactions Ti, and each Ti has a corresponding compensating action Ci that undoes the effects of Ti. Each Ti here is a local transaction.
As you can see, compared with TCC, Saga has no "Try" reservation step; each Ti commits directly to the database.
There are two sequences of Saga execution:
- T1, T2, T3… , Tn.
- T1, T2… , Tj, Cj… , C2, C1, where 0 < j < n.
Saga defines two recovery strategies:
- Backward recovery is the second execution order above, where j is the sub-transaction in which the error occurred. This approach undoes all the previously successful sub-transactions, so the execution of the entire Saga is rolled back.
- Forward recovery, for scenarios that must succeed, executes in a sequence like T1, T2, ..., Tj (failed), Tj (retried), ..., Tn, where j is the sub-transaction that failed. No Ci is needed in this case.
Note that isolation is not guaranteed in Saga mode: because resources are not locked, other transactions can still modify or observe the data the current transaction is working on.
Let’s take the example of $100 for a bottle of water. Here’s the definition:
- T1 = deduct $100, T2 = give the user a bottle of water, T3 = reduce the water stock by one.
- C1 = refund the $100, C2 = take the bottle of water back from the user, C3 = add a bottle of water back to the inventory.
We execute T1, T2, T3 in order, and if anything goes wrong we execute the C operations of the completed steps in reverse.
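Backward recovery for this example can be sketched as follows, with toy functions standing in for T1..T3 and C1..C2 (T3 is made to fail on purpose):

```python
# Toy Saga with backward recovery over a shared in-memory state.
state = {"money": 100, "user_water": 0, "stock": 10}

def t1(): state["money"] -= 100           # T1: deduct $100
def t2(): state["user_water"] += 1        # T2: give the user a bottle
def t3(): raise RuntimeError("stock service down")   # T3: fails

def c1(): state["money"] += 100           # C1: refund the $100
def c2(): state["user_water"] -= 1        # C2: take the bottle back

def run_saga(steps):
    """Run each Ti in order; on failure, run completed steps' Ci in reverse."""
    done = []
    for t, c in steps:
        try:
            t()
        except RuntimeError:
            for comp in reversed(done):   # backward recovery: Cj-1 ... C1
                comp()
            return "compensated"
        if c is not None:
            done.append(c)
    return "completed"

result = run_saga([(t1, c1), (t2, c2), (t3, None)])
```

After the failure at T3, C2 and then C1 run, and the state is exactly as it was before the Saga started.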
The isolation problem above can then occur: if a rollback happens at T3 but the user has already drunk the water (in another transaction), the rollback finds it cannot take the bottle back from the user.
This is the consequence of having no isolation between transactions, and it shows that Saga's lack of isolation can have a large impact. One reference is Huawei's solution: add a session-and-lock mechanism at the business level to ensure resources are accessed serially.
At the business level, resources can also be isolated by freezing funds in advance, and during operation the latest state can be obtained by reading the current status just in time (see Huawei's ServiceComb).
Conclusion
Again, don’t use distributed transactions if you can. If you have to use distributed transactions, use your own business analysis to see which one your business is more suitable for, whether you care about strong consistency or final consistency.
Finally, in the summary of some questions, we can come down from the article to find the answer:
- Is the CA of ACID and CAP the same?
- What are the advantages and disadvantages of common solutions for distributed transactions? What scenarios are applicable?
- Why do distributed transactions arise? What pain points do they address?
Original article: http://developer.51cto.com/art/201808/581174.htm