Distributed transaction is a technical difficulty in enterprise integration, and it is also involved in every distributed system architecture, especially in microservice architecture, which is almost unavoidable. This article will briefly talk about distributed transaction.
Database transaction
Before we talk about distributed transactions, let’s start with database transactions. Database transactions are probably familiar and often used during development. But even so, perhaps for some details, many people are still unclear. For example, many people know the characteristics of database transactions: Atomicity, Consistency, Isolation and Durabilily (ACID for short). But you don’t know what isolation is when you ask what isolation is, or you know what isolation is when you ask what levels of isolation a database implements, or what are the differences between them.
This article is not intended to cover these database transactions, but you can do some research on them. One thing we need to know is that if the database suddenly loses power while committing a transaction, how does it recover? Why mention this point? Because the core of distributed system is to deal with all kinds of abnormal situations, which is also the complex place of distributed system. Because the distributed network environment is very complex, this kind of “power failure” is much more than the single machine, so when we do distributed system, the first consideration is this kind of situation. These exceptions may include machine downtime, network exceptions, message loss, message disorder, data errors, unreliable TCP, storage data loss, other exceptions, and so on…
Let’s move on to the case where the local transaction database is out of power, how does it guarantee data consistency? Using SQL Server as an example, we know that our SQL Server database consists of two files, a database file and a log file. In general, the log file is much larger than the database file. Database for all the write operation is to write the log, in the same way, when we were in executing a transaction database is first to record this transaction redo log operation, and then began to real database operation, first put the log file to disk before operation, so when suddenly loses power, even if the operation did not complete, When a database is restarted, the database performs undo rollback or redo rollback based on the current data condition to ensure data consistency.
Next, let’s talk about distributed transactions.
Distributed theory
When we have a single database performance bottlenecks, we might have on the database partition, partition here refers to the physical partition, partition after may different libraries in different servers, this time a single database ACID already can not adapt to this kind of situation, in this ACID cluster environment, To ensure that the cluster ACID is almost difficult to meet, or even able to achieve the efficiency and performance will be dropped significantly, the most important thing is hard to expand the new partition again, this time if again pursue cluster ACID can lead to our system is very poor, then we need to introduce a new theoretical principles to adapt to this kind of cluster, The CAP principle or the CAP theorem, so what does the CAP theorem stand for?
The CAP theorem
The CAP theorem was put forward by Eric Brewer, a professor at the University of California, Berkeley, who pointed out that WEB services cannot satisfy three properties simultaneously:
Consistency: the client knows that a series of operations will take place simultaneously.
Availability: Each operation must end with an expected response
Partition tolerance: Operations can be completed even if a single component becomes unavailable
Specifically, in distributed systems, a Web application can only support at most two of these properties simultaneously in any database design. Obviously, any scale-out strategy depends on data partitioning. Therefore, designers must choose between consistency and usability.
This theorem holds true in distributed systems up to now! Why do you say that?
At this time, some students may put the database 2PC (two-phase commit) out to speak. OK, let’s look at the two-phase commit of the database.
For those of you who are familiar with distributed Transactions in databases, you must know that databases support 2PC, also known as XA Transactions.
MySQL has been supported since version 5.5, SQL Server 2005, and Oracle 7.
Wherein, XA is a two-phase commit protocol, which is divided into the following two phases:
Phase 1: The transaction coordinator requires each database involved in a transaction to precommit this operation and reflect whether it can be committed.
Phase 2: The transaction coordinator requires each database to commit data.
If any database rejects the commit, all databases are asked to roll back their part of the transaction. What’s the downside of this? At first glance we can achieve consistency between database partitions.
If the CAP theorem is true, it must affect availability.
Say that the availability of a system represents the sum of the availability of all the components involved in performing an operation. In a two-phase commit, then, availability represents the sum of availability for each database involved. Assuming 99.9% availability for each database during a two-phase commit, the result is 99.8% if two databases are involved in a two-phase commit. According to the system availability calculation formula, assuming 43,200 minutes per month, 99.9% availability is 43,157 minutes, and 99.8% availability is 43,114 minutes, which is equivalent to adding 43 minutes of downtime per month.
So, we can verify that the CAP theorem is theoretically correct. CAP we’ll see that first and we’ll talk about it later.
The BASE theory of
In distributed systems, we tend to pursue availability, which is more important than consistency, so how to achieve high availability? Predecessors have put forward another theory for us, that is, the BASE theory, which is used to further expand the CAP theorem. BASE theory refers to:
Basically Available
Soft state
Eventually consistent
BASE theory is the result of balancing consistency and availability in CAP. The core idea of the theory is that strong consistency cannot be achieved, but each application can adopt an appropriate way to achieve Eventual consistency according to its own business characteristics.
With this theory in mind, let’s look at the problem of distributed transactions.
Distributed transaction
In distributed systems, there are no more than a few solutions to implement distributed transactions.
Phase 1, Phase 2 Submission (2PC)
As with the DATABASE XA transaction mentioned in the previous section, two-phase commit is the principle of using the XA protocol. You can easily see details such as COMMIT and abort in the flow diagram below.
Two-phase delivery of this solution is a trade-off for consistency over availability. In terms of implementation, in.NET, the API provided by TransactionScop can be used to program the two-phase commit in distributed systems, such as WCF. However, between multiple servers, DTC needs to be relied on to achieve transaction consistency. Microsoft has MSDTC service in Windows, but it is more tragic in Linux.
By default, TransactionScop cannot be used for transaction consistency between asynchronous methods because the transaction context is stored in the current thread, so you need to explicitly pass the transaction context if the method is asynchronous.
Advantages: Ensure strong consistency of data as far as possible, suitable for the key fields with high requirements for strong consistency of data. (In fact, there is no 100% guarantee of strong consistency)
Disadvantages: Complex implementation, sacrificing availability, has a great impact on performance, is not suitable for high concurrency and high performance scenarios, if the distributed system is called across the interface, there is no implementation scheme in the.NET field.
Ii. Compensation Transaction (TCC)
TCC is the compensation mechanism adopted. The core idea is that for each operation, a corresponding acknowledgement and compensation (undo) operation should be registered. It is divided into three stages:
In the Try phase, service systems are detected and resources are reserved
The Confirm phase is used to Confirm the submission of the service system. If the Try phase succeeds and the Confirm phase starts, the Confirm phase does not fail by default. If the Try succeeds, Confirm succeeds.
In the Cancel phase, services that need to be rolled back due to errors are cancelled and reserved resources are released.
For example, Bob wants to transfer money to Smith.
We have a local method that we call in turn
First, in the Try phase, the remote interface is called to freeze Smith’s and Bob’s money.
2. In the Confirm phase, perform the transfer operation of the remote call and the transfer is unfrozen successfully.
3. If the execution of step 2 succeeds, the transfer succeeds. If the execution of step 2 fails, the corresponding unfreezing method (Cancel) of the remote freezing interface is called.
Advantages: Compared with 2PC, the implementation and process is relatively simple, but the data consistency is also worse than 2PC
Disadvantages: Disadvantages are obvious, failure is possible in step 2 or 3. TCC is a compensation method at the application layer, so programmers need to write more compensation code during implementation. In some scenarios, some business processes may be difficult to define and handle with TCC.
3. Local message table (asynchronous guarantee)
The implementation of local message table is the most widely used in the industry. Its core idea is to split distributed transactions into locally processed transactions, which originated from ebay. We can see some of the details in the following flow chart:
The basic idea is:
The message producer needs to build an additional message table and record the message sending status. Message tables and business data are committed in a transaction, which means they are in a database. The message is then sent through MQ to the consumer of the message. If the message fails to be sent, it is sent again.
The message consumer needs to process the message and complete its own business logic. If the local transaction is successfully processed, it is successfully processed. If the local transaction fails, the execution will be retried. If a service failure occurs, you can send a service compensation message to the production company to notify the production company to perform operations such as rollback.
Producers and consumers periodically scan the local message table to resend incomplete or failed messages. If there is a sound automatic reconciliation logic, this scheme is very practical.
This scheme follows the BASE theory, USES is eventual consistency, the author considered that several programs which fit in with the actual business scenarios, that there will not be as complicated as 2 PC implementation (when invocation chain is very long, 2 PC availability is very low), also won’t like TCC possible confirm or rollback is not the case.
Advantages: A very classic implementation that avoids distributed transactions and achieves ultimate consistency. There are ready-made solutions in.NET.
Disadvantages: Message tables are coupled to the business system, and without a packaged solution, there is a lot of chores to deal with.
MQ transaction messages
Some third-party MQS, such as RocketMQ, support transaction messaging in a manner similar to the two-phase commit adopted, but some mainstream MQS on the market do not support transaction messaging, such as RabbitMQ and Kafka.
Taking Alibaba’s RocketMQ middleware as an example, its ideas are roughly as follows:
The first stage Prepared message will get the address of the message.
The second phase performs a local transaction, and the third phase accesses the message using the address obtained in the first phase and modifies the state.
That is, two requests, one send message and one acknowledgement message are submitted to the message queue within the business method. RocketMQ will periodically scan the message cluster for transaction messages if it finds Prepared. It will acknowledge the message to the sender, so the producer needs to implement a Check interface. RocketMQ will decide whether to roll back or continue sending the confirmation message based on the policy set by the sender. This ensures that message delivery and the local transaction both succeed or fail at the same time.
Unfortunately, RocketMQ does not have a.NET client.
Advantages: Ultimate consistency is achieved without reliance on local database transactions.
Cons: Difficult to implement, not supported by mainstream MQ, none. NET client, RocketMQ transaction message part of the code is not open source.
Sagas transaction model
The Saga transaction model, also known as long-running-Transaction, was proposed by H.Garcia-Molina et al., Princeton University. It describes another way to solve complex business transaction problems in distributed systems without two-phase commit. You can see the Sagas paper here.
What we are talking about here is a workflow transaction model based on Sagas mechanism. The relevant theory of this model is relatively new at present, so that there is almost no relevant information on Baidu.
The core idea of this model is to split long transactions in a distributed system into many short transactions, or local transactions, which are coordinated by the Sagas workflow engine. If the whole process ends normally, then the business is successfully completed. If the implementation fails during this process, The Sagas workflow engine then invokes the compensation operations in reverse order, rerolling the business.
For example, our business operation of buying travel package involves three operations, namely, car reservation, hotel reservation and air ticket reservation, which belong to three different remote interfaces. They may not be part of the same transaction from our program’s perspective, but they are part of the same transaction from a business perspective.
Their execution sequence is shown in the figure above, so when a failure occurs, compensation is cancelled in turn.
Because long transactions are split into many business flows, one of the most important components of the Sagas transaction model is the workflow or Process Manager, as you might call it. The workflow engine and Process Manager are not the same thing, but in this case, their responsibilities are the same. After selecting the workflow engine, the final code might look something like this
SagaBuilder saga = SagaBuilder.newSaga(“trip”) .activity(“Reserve car”, ReserveCarAdapter.class) .compensationActivity(“Cancel car”, CancelCarAdapter.class) .activity(“Book hotel”, BookHotelAdapter.class) .compensationActivity(“Cancel hotel”, CancelHotelAdapter.class) .activity(“Book flight”, BookFlightAdapter.class) .compensationActivity(“Cancel flight”, CancelFlightAdapter.class) .end() .triggerCompensationOnAnyError(); camunda.getRepositoryService().createDeployment() .addModelInstance(saga.getModel()) .deploy();
Here is an example of C# for those interested.
Advantages and disadvantages are not mentioned here, because this theory is relatively new and there is no solution on the market at present, even in the Java field, I haven’t searched much useful information.
Distributed transaction solution: CAP
The distributed transaction solutions described above you may see elsewhere, but there is no actual code or open source code for them, so they are not dry goods.
In the.NET world, there seems to be no ready-made solution for distributed transactions, or one that exists but is not open sourced. With the author’s understanding, there are some internal companies actually have this kind of solution, but also as one of the company’s core products, did not open source…
In view of the above reasons, the blogger decided to write one and open source it, so he started to work on it since the beginning of 2017, and then spent more than half a year in continuous improvement, which is the following CAP.
GithubCAP: CAP is not CAP theory, but the name of a.NET distributed transaction solution.
Details:
www.cnblogs.com/savorboard/…
Related documents:
www.cnblogs.com/savorboard/…
Exaggeratedly, this solution has a Dashboard, so you can see which messages have been successfully executed and which messages have failed, and whether they have failed to be sent or processed at a glance.
To top it all off, the visualization interface of the solution also provides a real-time dynamic graph that not only shows the real-time message delivery and processing, but also the current system processing speed, as well as the historical message throughput over the past 24 hours.
On top of that, the solution also lets you integrate Consul for distributed node discovery and registration, as well as heartbeat checks, so you can see what other nodes are doing at any time.
Most exaggeratingly, do you think you need to log in to the Dashboard of another node to view data? No, you can open the Dashboard of any of the nodes and click it to switch to the console interface of the node you want, just like you can view local data, they are completely decentralized.
You think that’s enough? No, much more than:
CAP also supports message queues such as RabbitMQ and Kafka
CAP also supports SQL Server, MySql, PostgreSql and other databases
CAP Dashboard supports both Chinese and English interface in both languages, so mom no longer has to worry about me not understanding it
CAP provides a rich interface that can be extended. What serialization, custom processing, custom sending is not a problem
CAP is based on MIT open source, so you can do any secondary development you want. (Remember to keep MIT License)
Now you think I’m done? No!
You can use CAP as an EventBus. CAP has excellent message handling capabilities. Don’t worry about the bottleneck in CAP.
In this article we learned two distributed system theory, they are the CAP and BASE theory, at the same time we also summarized and compared the advantages and disadvantages of several distributed decomposition scheme, distributed transaction itself is a technical problem, is not a perfect solution to all the scene, specific still should go choice according to the business scenario. Then we introduce a distributed transaction solution CAP based on local message.
Finally, a personal note
If you want to know more about distributed knowledge, you can follow me. I will also sort out more knowledge about distributed architecture in the future and share it with you. By the way, I recommend an exchange learning group: 650385180, which will share some videos recorded by senior architects: Spring, MyBatis, Netty source code analysis, high concurrency, high performance, distributed, microservice architecture principles, JVM performance optimization has become an architect’s essential knowledge system. You can also get free learning resources, which have benefited a lot at present. The following course system map is also available in the group.
If you found this article helpful, thank you for your likes.