preface

In the previous article, we split the entire trading system into multiple independent business components through microservices, each containing not only its own business microservices but also an independently managed database. Now consider the order-placing scenario. When a user places an order, there are three main steps: first, freeze the amount; second, create the order; third, deliver the order to the matching engine. These three steps must execute with transactional consistency. That was easy to guarantee when neither the services nor the database were split. After the split, however, the operations of these steps are scattered across different business components, with separate services and separate databases. How to guarantee transaction consistency in this distributed environment is the problem of distributed transactions.

What are the solutions to the distributed transaction problem? How do we choose among them? And how do we put them into practice? This article answers these questions.

Starting from ACID

ACID refers to Atomicity, Consistency, Isolation, Durability for database transactions.

Atomicity requires that the sequence of operations in a transaction either completes entirely or not at all; it cannot stop somewhere in the middle. If an error occurs, the database must be rolled back to the state before the transaction began.

Durability requires that once a transaction completes, its result is persisted. At the database level, WAL (Write-Ahead Logging) is commonly used to guarantee both atomicity and durability.

Isolation deals with concurrent transactions: transactions executing concurrently must be isolated from each other, so that their interleaved execution does not leave the data inconsistent. Without isolation, problems such as dirty reads, non-repeatable reads, and phantom reads can occur. In essence, implementing isolation is a matter of concurrency control. The SQL standard defines four isolation levels, from lowest to highest: Read Uncommitted, Read Committed, Repeatable Read, and Serializable. Lower isolation levels give better concurrency but weaker consistency. These four standard levels were defined with lock-based concurrency control in mind; later, isolation schemes based on MVCC (multi-version concurrency control) appeared. Compared with lock-based concurrency control, the key feature of MVCC is that reads take no locks, which greatly improves concurrency for read-heavy, write-light workloads, so most databases now implement MVCC.

Consistency is easily confused with the C in CAP, but the two are different. Consistency in CAP means that in a distributed database, every node must hold the same copy of the same data. Transactional consistency ensures that a transaction can only move the database from one valid state to another, preserving the database's invariants, with no perceptible intermediate state. A valid state is one that satisfies the predefined constraints, both at the database level and in the business logic.

The most common example used to explain transactional consistency is a transfer: suppose A transfers $100 to B. If A's balance is only $90 and the database constrains balances to be non-negative, then a successful transfer would leave A's balance negative, violating the constraint and therefore consistency. If A's balance is sufficient, the transfer takes two steps: first deduct $100 from A's balance, then add $100 to B's balance. If only the first step completes and the transaction ends there, the business logic is wrong and the transaction as a whole violates consistency. The transaction is consistent only when A's balance decreases by $100, B's balance increases by $100, both steps succeed, and no other concurrent transaction interferes.

Essentially, atomicity, isolation, and durability all exist to guarantee consistency. In other words, consistency is the ultimate goal, and atomicity, isolation, and durability are the means to achieve it.

So the ultimate goal of both local and distributed transactions is to guarantee consistency. Different scenarios simply call for different implementation schemes and different degrees of consistency.

XA specification

There are many solutions for distributed transaction processing, and the XA specification is one of the representative standards. It was proposed by the X/Open organization in 1991 in the document Distributed Transaction Processing: The XA Specification.

The XA specification describes the DTP model, a conceptual model for building distributed transaction processing systems. Not only X/Open but also OSI has official documentation describing the DTP model. The XA specification describes the model as consisting of three roles:

  • AP: Application Program, which defines the boundaries of a transaction and specifies the operations that make up the transaction. It can be understood as the microservice that initiates the transaction.
  • RMs: Resource Managers, of which there may be several; they can be understood as the individual database instances involved in the distributed transaction.
  • TM: Transaction Manager, which coordinates and manages transactions; it is the coordinator that controls the global transaction.

The TM is introduced because the global transaction is split across multiple nodes: each node knows whether its own operation succeeded, but not whether the operations on the other nodes did. The scattered nodes alone cannot guarantee the consistency of the global transaction, so a coordinator is needed to manage things globally and ensure the ACID properties of the global transaction.

The relationship between the three roles is shown in the diagram below:

The XA specification also defines a set of interfaces, collectively called the XA interface, for communication between the TM and the RMs; the main calls are xa_start, xa_end, xa_prepare, xa_commit, xa_rollback, and xa_recover.
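In Java, the XA interface surfaces through JTA as javax.transaction.xa.XAResource, whose methods map roughly onto those xa_* calls. Below is a minimal, illustrative sketch of how a TM might drive a single RM branch through that interface; the XaBranchDriver class and its method names are my own invention, not part of any framework, and real TMs also persist decisions and handle recovery.

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

// Sketch: how a TM might drive one RM branch through the XA interface.
// XAResource is JTA's Java mapping of the XA calls (xa_start, xa_prepare, ...).
public class XaBranchDriver {

    // Enlist work on one branch of the global transaction identified by xid.
    public void runBranch(XAResource rm, Xid xid, Runnable work) throws XAException {
        rm.start(xid, XAResource.TMNOFLAGS);   // corresponds to xa_start
        try {
            work.run();                        // the AP's SQL runs inside the branch
            rm.end(xid, XAResource.TMSUCCESS); // corresponds to xa_end
        } catch (RuntimeException e) {
            rm.end(xid, XAResource.TMFAIL);
            throw e;
        }
    }

    // Phase 1 + phase 2 for one branch; a real TM does this across all RMs.
    public void completeBranch(XAResource rm, Xid xid, boolean commit) throws XAException {
        if (commit) {
            int vote = rm.prepare(xid);        // corresponds to xa_prepare
            if (vote == XAResource.XA_OK) {
                rm.commit(xid, false);         // two-phase commit (onePhase = false)
            }
        } else {
            rm.rollback(xid);                  // corresponds to xa_rollback
        }
    }
}
```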

Transaction commit and rollback between the TM and the RMs are implemented with the two-phase commit protocol (2PC). The 2PC protocol was proposed much earlier than the XA specification; it was first mentioned by Jim Gray, an expert in distributed transactions, in his 1977 article Notes on Database Operating Systems.

2PC protocol

2PC stands for two-phase commit: the transaction is divided into two phases, a Prepare phase and a Commit phase.

Before two-phase commit begins, the RMs involved must first register with the TM. The AP then initiates a global transaction with the TM, after which the two-phase commit starts.

In the Prepare phase, the TM sends a prepare request to every RM involved and waits for their responses. On receiving the request, each RM executes its local transaction without committing it and writes the transaction logs, namely the undo and redo logs. If the local transaction executes successfully the RM responds OK to the TM, otherwise it responds with an error. Note that in this phase an RM whose local transaction succeeded keeps the transaction's resources locked while waiting for the TM's next instruction, because nothing has been committed yet.

After the Prepare phase, there are three possibilities:

  • All RMs responded OK
  • One or more RMs responded with an error
  • The TM timed out waiting for some RM's response

In the Commit phase, the TM acts according to these three outcomes. If all RMs responded OK, the TM sends a commit request to all RMs; in the other two cases, it sends a rollback request to all RMs. An RM that receives a commit request commits the local transaction left uncommitted in the previous phase; an RM that receives a rollback request rolls the local transaction back.

The whole process is roughly as follows:
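To make the decision logic concrete, here is a minimal sketch of a 2PC coordinator in Java. The ResourceManager interface is a hypothetical stand-in for each RM's XA-style API; a real TM would also persist its decisions and handle timeouts and recovery, which are omitted here.

```java
import java.util.List;

// Minimal sketch of a TM's two-phase commit decision logic.
// ResourceManager is a hypothetical interface standing in for each RM's XA-style API.
interface ResourceManager {
    boolean prepare(String globalTxId);   // execute local transaction, write undo/redo logs, don't commit
    void commit(String globalTxId);
    void rollback(String globalTxId);
}

class TwoPhaseCommitCoordinator {

    public boolean execute(String globalTxId, List<ResourceManager> rms) {
        // Phase 1: ask every RM to prepare; any error (or timeout) means abort.
        boolean allPrepared = true;
        for (ResourceManager rm : rms) {
            try {
                if (!rm.prepare(globalTxId)) {
                    allPrepared = false;
                    break;
                }
            } catch (Exception timeoutOrError) {
                allPrepared = false;
                break;
            }
        }

        // Phase 2: commit everywhere only if every RM voted OK, otherwise roll back everywhere.
        for (ResourceManager rm : rms) {
            if (allPrepared) {
                rm.commit(globalTxId);
            } else {
                rm.rollback(globalTxId);
            }
        }
        return allPrepared;
    }
}
```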

The process looks simple, but in a distributed system, network timeouts, retransmissions, and server crashes can happen at any time, and they create problems for distributed transactions. The main ones are idempotency, empty rollback, and resource suspension.

When the TM sends a commit/rollback request to an RM, the request may time out or be interrupted by network jitter, in which case the TM resends it. From the RM's point of view, the first commit/rollback request may already have been received and processed, with only the response failing to reach the TM. If the RM receives the same commit/rollback request again, it must not process the local transaction a second time; the correct approach is to make the RM's commit/rollback interfaces idempotent.

If an RM receives a rollback request but never received the corresponding prepare request, the rollback is an empty rollback with nothing to undo. To handle this, the rollback logic must be able to recognize whether the prepare of the previous phase was actually performed.

Resource suspension arises as follows: the prepare request is delayed by network congestion, the TM times out and initiates a rollback, and the rollback reaches the RM before the delayed prepare does. The RM processes the rollback and the global transaction ends, after which the prepare request finally arrives. If the RM executes this late prepare, it locks resources for a transaction that is already over, and those locks can never be released. This is resource suspension.

These three problems can generally be solved with a transaction state control table, which mainly records the global transaction ID, the branch transaction ID, and the branch transaction state.
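As a rough illustration of how such a table can be used, the sketch below guards a branch's prepare and rollback handlers against duplicate requests, empty rollback, and suspension. The table layout, the BranchTxDao interface, and all names are hypothetical; a production version would also need proper locking or unique-key constraints around the state checks.

```java
// Sketch of using a branch transaction state table to handle idempotency,
// empty rollback, and resource suspension. Table and DAO names are hypothetical.
enum BranchState { TRIED, COMMITTED, ROLLED_BACK }

class BranchTxController {

    private final BranchTxDao dao; // hypothetical DAO over tx_branch(global_tx_id, branch_tx_id, state)

    BranchTxController(BranchTxDao dao) { this.dao = dao; }

    public void prepare(String globalTxId, String branchTxId, Runnable localTx) {
        // Suspension guard: if a rollback already arrived, refuse the late prepare.
        BranchState state = dao.find(globalTxId, branchTxId);
        if (state == BranchState.ROLLED_BACK) {
            return; // the global transaction is already over; do not lock resources again
        }
        localTx.run();
        dao.insert(globalTxId, branchTxId, BranchState.TRIED);
    }

    public void rollback(String globalTxId, String branchTxId, Runnable undo) {
        BranchState state = dao.find(globalTxId, branchTxId);
        if (state == BranchState.ROLLED_BACK) {
            return; // idempotency: this rollback was already processed
        }
        if (state == null) {
            // Empty rollback: prepare never ran; just record the final state so a late
            // prepare can detect it (the suspension guard above).
            dao.insert(globalTxId, branchTxId, BranchState.ROLLED_BACK);
            return;
        }
        undo.run();
        dao.update(globalTxId, branchTxId, BranchState.ROLLED_BACK);
    }
}

interface BranchTxDao {
    BranchState find(String globalTxId, String branchTxId);
    void insert(String globalTxId, String branchTxId, BranchState state);
    void update(String globalTxId, String branchTxId, BranchState state);
}
```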

XA/2PC summary

When XA/2PC is used for distributed transactions, it can generally guarantee the ACID properties of the transaction and relies on the database's own implementation to commit and roll back local transactions, so it is not intrusive to the business. However, 2PC has the following main disadvantages:

  1. Synchronous blocking: during execution, every RM keeps its transaction resources blocked. If an RM holds a shared resource, other parties accessing that resource are blocked as well.
  2. TM single point of failure: once the TM fails, the RMs keep their transaction resources locked while waiting for the TM's message, and the whole system blocks.
  3. Data inconsistency: in the second phase, if a local network failure occurs after the TM sends the commit requests, or the TM crashes partway through sending them, only some of the RMs receive the commit request. Those RMs commit, while the RMs that never receive the request cannot, so the distributed system ends up with inconsistent data.
  4. Uncertain transaction state: if the TM crashes right after sending a commit message and the RM that received it crashes at the same time, then even if a new TM is chosen through an election protocol, the state of the transaction is uncertain and the cluster cannot tell whether it was committed.

The biggest problem with 2PC, though, is poor performance. It may be acceptable for short transactions, but for long transactions the resources stay locked longer and the performance degrades to the point of being unbearable.

To improve on 2PC, 3PC was later proposed. 3PC adds a timeout mechanism on the RM side and splits the transaction into three phases. However, 3PC only solves some of 2PC's problems, and not the poor performance, which the extra phase actually makes worse. As a result, 3PC is hardly used in practice and I have not found a production implementation, so I will not go into it here. Although 2PC has its flaws, XA implementations exist in open-source frameworks such as Atomikos, Bitronix, and Seata, and most major database vendors implement XA/2PC as well. So 2PC is still the best choice for scenarios that demand strong consistency.

Speaking of open-source frameworks, it is worth adding that the mature distributed transaction frameworks, including those mentioned below, are basically Java-based; implementations in other languages are still immature.

Flexible transaction

ACID-compliant transactions, also known as rigid transactions, primarily guarantee strong consistency. XA/2PC is the main solution for rigid distributed transactions, but its poor performance makes it unsuitable for high-performance, high-concurrency Internet scenarios. To address the performance problem, the concept of flexible transactions was proposed, based on the BASE theory: Basically Available, Soft state, and Eventual consistency. The basic idea is:

Even if strong consistency cannot be achieved, each application can adopt a method appropriate to its own business characteristics to reach eventual consistency.

As we said earlier, the (strong) consistency of transactions ensures that a transaction can only move the database from one valid state to another, with no perceptible intermediate state. Flexible transactions allow data to sit in an intermediate (soft) state, as long as consistency is eventually reached within some period of time. The essence of eventual consistency is that the system must guarantee that the data is eventually consistent, but need not guarantee strong consistency in real time. The classic example is an inter-bank transfer, which really does have an intermediate state of frozen funds, and it takes some time to learn the outcome of the transfer, that is, for the result to become eventually consistent.

In terms of isolation, rigid transactions isolate resources mainly by locking them, which the database provides without any work from the business. Flexible transactions, by contrast, generally achieve isolation through resource reservation (such as freezing an amount) rather than locking, and this reservation-style isolation must be implemented and guaranteed by the business itself.

Flexible transaction solutions fall mainly into compensation-based and notification-based categories. The compensation-based category includes the TCC and Saga modes, while the notification-based category is divided into transactional messages and best-effort notification. Compensation-based transactions are synchronous and notification-based transactions are asynchronous, so they are also referred to as synchronous and asynchronous transactions respectively.

TCC

TCC is short for Try-Confirm-Cancel. The idea can be traced back to a 2007 paper, Life Beyond Distributed Transactions: An Apostate's Opinion, where the three operations appeared as tentative operation, confirmation, and cancellation; it was Atomikos that formally named the pattern Try-Confirm-Cancel and registered TCC as a trademark.

TCC follows the same design idea as 2PC: transaction submission is again split into two phases. The first phase invokes the Try interface, and the second phase invokes the Confirm or Cancel interface. TCC's Try-Confirm-Cancel corresponds to 2PC's prepare-commit-rollback, but the implementation is quite different.

2PC is a distributed transaction solution at the database or resource level: prepare, commit, and rollback all happen inside the database, and developers are not aware of them.

TCC is a distributed transaction scheme at the business level: Try, Confirm, and Cancel are all operations implemented in business code, which developers can see and must implement themselves.

In the Try phase, every branch service performs its business checks and moves the required business resources into an intermediate state, reserving them. For example, freezing the amount when transferring money.

If all branch services reply OK in the Try phase, the second phase submits Confirm to every branch service to actually carry out the business. No business checks are needed at this point; the resources in the intermediate state are simply moved to their final state.

If not all branches return OK in the Try phase, the compensation mechanism kicks in: Cancel is invoked on the branches whose Try succeeded, rolling back the Try operation and restoring the reserved resources from the intermediate state to the state before the transaction.
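As a sketch of what the business-level interfaces might look like for the freeze-money example, here is a hypothetical TCC service definition. The names are my own; a real framework such as Seata TCC would add its own annotations and context parameters.

```java
import java.math.BigDecimal;

// Sketch of a TCC-style branch service for the "freeze money" example.
// Class and method names are hypothetical, not tied to any framework.
public interface AccountTccService {

    // Try: check the balance and move the amount into a frozen (intermediate) state.
    boolean tryFreeze(String txId, String accountId, BigDecimal amount);

    // Confirm: the business definitely happens; turn the frozen amount into a real deduction.
    // Must be idempotent, because the coordinator may retry it.
    void confirmDeduct(String txId, String accountId, BigDecimal amount);

    // Cancel: release the frozen amount back to the available balance.
    // Must be idempotent and must tolerate an "empty rollback" (Cancel without Try).
    void cancelFreeze(String txId, String accountId, BigDecimal amount);
}
```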

The specific process of TCC is shown in the figure below:

Note that the Try interface in the first phase is called by the business application (solid arrows), while the Confirm or Cancel interface in the second phase is called by the transaction coordinator TM (dotted arrows). This is the two-phase asynchronization feature of the TCC model: once the first phase of every branch service has executed successfully, the business application can return, and the second phase of each branch service is executed asynchronously by the transaction coordinator.

TCC adds transaction logging, and the Confirm or Cancel phase is retried on failure, so both phases must be idempotent. If the retries keep failing, manual intervention is required for recovery. Besides idempotency, TCC also has to deal with the empty rollback and resource suspension problems mentioned earlier.

Compared with 2PC, TCC does not lock resources and does not block concurrent transactions' access to them, so its performance is much better, making it suitable for high-concurrency scenarios and long transactions.

The disadvantage of TCC is that it is highly intrusive: adapting the business requires a lot of development work, which complicates upgrades and operations. The main open-source TCC implementations include ByteTCC, tcc-transaction, Hmily, and Seata's TCC mode.

Saga

Saga is much older than TCC; it originates from a 1987 paper, "Sagas", on how to handle long lived transactions. The core idea of Saga is to decompose a long transaction into many short transactions (sub-transactions). Each sub-transaction is a local transaction that guarantees its own consistency, and each has a corresponding execution module and compensation module. When any sub-transaction fails, the relevant compensation methods can be called to restore the state before the transaction, thereby achieving eventual consistency.

In general, Saga consists of two parts:

  • Each Saga transaction consists of a series of sub-transactions Ti
  • Each Ti has a corresponding compensation action Ci, which is used to undo the result caused by Ti

Unlike TCC, Saga transactions are not committed in two phases: there is no "reserve" step, and each Ti commits directly to the database. One might ask, then, how Saga guarantees isolation. The answer is that Saga itself does not guarantee isolation; the business must control concurrency itself, for example by locking or reserving resources at the business layer.

In the best case, the whole sub-transaction sequence T1, T2, …, Tn executes successfully and the Saga transaction as a whole succeeds.

If a subtransaction fails, there are two ways to recover: forward and backward.

  • Forward recovery: retry the failed sub-transaction, assuming every sub-transaction will eventually succeed
  • Backward recovery: compensate all completed sub-transactions, essentially rolling back every local transaction that has already committed

Obviously, forward recovery does not require compensation. If your sub-transactions always succeed eventually, or the compensating actions are difficult or impossible to define, forward recovery fits better.

With backward recovery, if a sub-transaction fails, the failure is reported to the AP immediately, and the subsequent compensation operations are performed asynchronously.
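A minimal sketch of backward recovery in Java might look like the following: a Saga is an ordered list of sub-transactions Ti with compensations Ci, and on failure the completed steps are compensated in reverse order. The Saga and Step types are hypothetical; real implementations also persist the Saga's progress so that recovery survives a crash.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a Saga as an ordered list of sub-transactions Ti with compensations Ci.
// Backward recovery: if Ti fails, run C(i-1) ... C1 in reverse order.
class Saga {

    record Step(Runnable action, Runnable compensation) {}

    private final List<Step> steps = new ArrayList<>();

    Saga step(Runnable ti, Runnable ci) {
        steps.add(new Step(ti, ci));
        return this;
    }

    boolean execute() {
        List<Step> completed = new ArrayList<>();
        for (Step step : steps) {
            try {
                step.action().run();          // Ti commits its local transaction directly
                completed.add(step);
            } catch (Exception failure) {
                // Backward recovery: compensate everything that already committed.
                for (int i = completed.size() - 1; i >= 0; i--) {
                    completed.get(i).compensation().run();
                }
                return false;
            }
        }
        return true;
    }
}
```

A caller could then compose something like new Saga().step(this::createOrder, this::cancelOrder).step(this::freezeBalance, this::unfreezeBalance).execute(), with each step mapping to one Ti/Ci pair.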

In theory the compensating action must succeed. If compensation fails because of a server crash or network jitter, it must be retried; if retries keep failing, manual handling is required.

There are two main styles of Saga implementation: centralized and non-centralized.

The centralized implementation relies on a central coordinator (TM) responsible for service invocation and transaction coordination, typically designed and implemented around AOP proxies; Huawei's ServiceComb Saga is implemented this way. Centralized implementations are intuitive, easy to control, easy to develop, and cheap to learn; the downside is relatively tight coupling with the business.

The non-centralized implementation, also called the distributed implementation, does not rely on a central TM; instead it coordinates the transaction through an event-driven mechanism. Seata's Saga mode, which is implemented as a state machine, takes this approach. The advantages of the non-centralized implementation are that event sourcing reduces system complexity and improves scalability, and processing modules subscribe to events, which lowers coupling. The disadvantages are that a Saga system involves a large number of business events, which complicates coding and debugging, and because the related business logic is event-driven, the event-handling modules may end up with circular dependencies.

Because Saga has no two-phase commit, it can handle a transaction request in roughly half the time TCC needs: TCC requires at least two round trips with each service, while Saga needs only one. So, in theory, Saga's throughput should be at least double TCC's. Combined with its low intrusion on the business, this makes Saga one of the most successful schemes in the industry.

Transactional messages

Transactional messaging, also known as asynchronous assurance, uses a message queue (MQ) to achieve eventual consistency. Compared with the synchronous compensation schemes, the asynchronous schemes that introduce MQ have the following advantages:

  • Coupling between microservices of different branching transactions can be reduced
  • Can improve the throughput of each service
  • Can enhance the availability of the overall service

With MQ introduced, the core issue becomes how to keep the success of the service's local transaction consistent with the success of the message delivery. In other words, once the upstream service has processed its local transaction, how does it reliably deliver the MQ message to the downstream service? The industry currently has two solutions:

  • Transaction messages based on MQ itself
  • DB-based local message table

MQ transaction messages

The scheme based on MQ's own transaction messages is currently supported only by RocketMQ among the mainstream MQ products, so the description here is based on RocketMQ. The design follows the 2PC idea, and the interaction flow of transaction messages is shown below:

A few concepts need explaining first:

  • Transaction message: MQ provides distributed transaction functionality similar to X/Open XA; eventual consistency of a distributed transaction can be achieved through MQ transaction messages.
  • Half (semi-transactional) message: a message that is temporarily undeliverable. The sender has successfully delivered it to the MQ server, but the server has not yet received the producer's second acknowledgement, so the message is marked "temporarily undeliverable".
  • Message back-check: if a network failure or a producer restart causes the second acknowledgement of a transaction message to go missing, the MQ server discovers during a scan that a message has been a half message for a long time and asks the producer for its final state (Commit or Rollback). This is the back-check process.

Transaction message sending steps are as follows:

  1. The sender sends a semi-transactional message to the MQ server.
  2. After successfully persisting the message, the MQ server returns an Ack to the sender confirming receipt; at this point the message is a half message.
  3. The sender starts executing its local transaction logic.
  4. Based on the result of the local transaction, the sender submits a second acknowledgement (Commit or Rollback) to the server. On Commit, the half message is marked deliverable and the subscriber eventually receives it; on Rollback, the half message is deleted and the subscriber never receives it.

The steps for the transaction message back-check are as follows:

  1. If the network disconnects or the application restarts, the second acknowledgement from step 4 never reaches the server. After a fixed period, the server initiates a back-check of the message.
  2. On receiving the back-check, the sender checks the final result of the local transaction corresponding to the message.
  3. The sender submits a second acknowledgement based on that final state, and the server handles the half message as described in step 4 above.
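Putting the send and back-check steps together, a sender based on RocketMQ's Java client might look roughly like the sketch below. The producer group, topic, and the commented-out orderService calls are hypothetical placeholders; the RocketMQ classes (TransactionMQProducer, TransactionListener, LocalTransactionState) are the real client API.

```java
import org.apache.rocketmq.client.producer.LocalTransactionState;
import org.apache.rocketmq.client.producer.TransactionListener;
import org.apache.rocketmq.client.producer.TransactionMQProducer;
import org.apache.rocketmq.common.message.Message;
import org.apache.rocketmq.common.message.MessageExt;

// Sketch of the sender side of a RocketMQ transaction message.
// Group/topic names and the orderService references are hypothetical placeholders.
public class TxMessageSender {

    public static void main(String[] args) throws Exception {
        TransactionMQProducer producer = new TransactionMQProducer("order_tx_producer_group");
        producer.setTransactionListener(new TransactionListener() {

            @Override
            public LocalTransactionState executeLocalTransaction(Message msg, Object arg) {
                // Steps 3-4: run the local transaction after the half message is persisted.
                try {
                    // orderService.placeOrder(...); // hypothetical local business transaction
                    return LocalTransactionState.COMMIT_MESSAGE;
                } catch (Exception e) {
                    return LocalTransactionState.ROLLBACK_MESSAGE;
                }
            }

            @Override
            public LocalTransactionState checkLocalTransaction(MessageExt msg) {
                // Back-check: the broker asks for the final state of the local transaction.
                // boolean committed = orderService.isCommitted(msg.getTransactionId());
                boolean committed = true; // placeholder lookup against local state
                return committed ? LocalTransactionState.COMMIT_MESSAGE
                                 : LocalTransactionState.ROLLBACK_MESSAGE;
            }
        });
        producer.start();

        Message halfMessage = new Message("ORDER_TOPIC", "order created".getBytes());
        producer.sendMessageInTransaction(halfMessage, null); // steps 1-2: send the half message

        // In a real service the producer keeps running so it can answer back-checks;
        // it is shut down here only to keep the sketch self-contained.
        producer.shutdown();
    }
}
```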

Note that if the sender does not receive the MQ server's Ack in time, the MQ message may be delivered more than once, so the subscriber must consume messages idempotently to avoid processing the same message twice.

The biggest drawback of the MQ transaction message scheme is that it is intrusive to the business: the sender has to provide a back-check interface.

Local message table

The local message table scheme was first proposed by eBay and later popularized in China by companies such as Alipay. For MQs that do not support transaction messages, the idea is to store the transactional message in a local database, with the message record written in the same local transaction as the business data. Once the message is saved in the DB, a scheduled task polls the DB for messages in the to-send state and delivers them to MQ; after MQ returns a successful ACK, the message records in the DB are updated or deleted. The interaction flow is shown in the figure below:

The processing steps are as follows:

  1. The message producer handles the business update in a local transaction and writes a transaction message to the local message table with the status "to send"; the business operation and the message-table write happen in the same local transaction.
  2. A periodic task polls the local message table for messages in the "to send" state and posts any it finds to the MQ server.
  3. When the MQ server receives a message, it persists it and returns an ACK to the message producer.
  4. After receiving the ACK from the MQ server, the message producer looks up the corresponding record in the local message table and updates its status to "sent", or simply deletes the record.
  5. Having returned the ACK to the producer, the MQ server delivers the message to the message consumer.
  6. The message consumer receives the message, performs its own local transaction, and finally returns an ACK to the MQ server.
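A rough sketch of the producer side (steps 1, 2, and 4) using plain JDBC is shown below. The table names (orders, outbox_message), column names, and the MessageQueueClient wrapper are all hypothetical; a real system would batch the polling, verify the broker's ACK before updating the status, and clean up sent records.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

// Sketch of the producer side of the local message table scheme.
// Table and column names (orders, outbox_message, ...) are hypothetical.
public class LocalMessageTableProducer {

    private final DataSource dataSource;
    private final MessageQueueClient mq; // hypothetical thin wrapper over the MQ client

    public LocalMessageTableProducer(DataSource dataSource, MessageQueueClient mq) {
        this.dataSource = dataSource;
        this.mq = mq;
    }

    // Step 1: business update and message record in the SAME local transaction.
    public void placeOrder(String orderId, String payload) throws Exception {
        try (Connection conn = dataSource.getConnection()) {
            conn.setAutoCommit(false);
            try (PreparedStatement order = conn.prepareStatement(
                         "INSERT INTO orders(id, status) VALUES (?, 'CREATED')");
                 PreparedStatement msg = conn.prepareStatement(
                         "INSERT INTO outbox_message(order_id, payload, status) VALUES (?, ?, 'TO_SEND')")) {
                order.setString(1, orderId);
                order.executeUpdate();
                msg.setString(1, orderId);
                msg.setString(2, payload);
                msg.executeUpdate();
                conn.commit();
            } catch (Exception e) {
                conn.rollback();
                throw e;
            }
        }
    }

    // Steps 2 and 4: a periodic task polls TO_SEND messages, delivers them,
    // and marks them sent once the broker has acknowledged delivery.
    public void pollAndSend() throws Exception {
        try (Connection conn = dataSource.getConnection();
             PreparedStatement query = conn.prepareStatement(
                     "SELECT order_id, payload FROM outbox_message WHERE status = 'TO_SEND'");
             ResultSet rs = query.executeQuery()) {
            while (rs.next()) {
                String orderId = rs.getString(1);
                mq.send("ORDER_TOPIC", rs.getString(2)); // assumed to return only after the broker's ACK
                try (PreparedStatement update = conn.prepareStatement(
                             "UPDATE outbox_message SET status = 'SENT' WHERE order_id = ?")) {
                    update.setString(1, orderId);
                    update.executeUpdate();
                }
            }
        }
    }
}

// Hypothetical MQ client abstraction; replace with the real client of your broker.
interface MessageQueueClient {
    void send(String topic, String payload) throws Exception;
}
```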

Because of MQ outages or network failures, producers may deliver duplicate messages to MQ, so consumers must process received messages idempotently.

Compared with the MQ transaction message scheme, the advantage of this scheme is that it reduces the dependency on MQ, because the reliability of the message data rests on the local message table rather than on MQ; it is also easy to implement. The disadvantages are that the local message table is coupled to the business and hard to generalize, and the message data shares a database with the business data, consuming the business system's resources. Because the local message table relies on database disk I/O, it becomes a performance bottleneck under high concurrency.

Best-effort notification

Best-effort notification is also an extension of transactional messaging; its main use case is notifying external third-party systems. In other words, best-effort notification addresses business interactions across platforms and across enterprises, while the transaction message scheme suits distributed transactions between internal services of the same system.

Best-effort notification usually introduces a notification service that sends notification messages to the third-party system. The simplified flow is shown below:

The notification service interacts with the third-party system by calling its interface, and the notification messages it sends may be lost.

The notification service therefore retries the call to the third-party system's interface at incrementally increasing intervals (5 min, 10 min, 30 min, 1 h, 24 h); this is what "best effort" means. If the notification still fails after N attempts, the fallback is alarm + log + manual intervention.
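As an illustration of the incremental retry schedule, here is a hypothetical notifier that retries at the intervals above and falls back to alarm/log/manual handling when they are exhausted. The ThirdPartyClient interface and the logging are placeholders; a production version would persist the notification records so retries survive a restart.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of a best-effort notification retry schedule.
// ThirdPartyClient and the alarm hook are hypothetical.
public class BestEffortNotifier {

    private static final List<Duration> RETRY_INTERVALS = List.of(
            Duration.ofMinutes(5), Duration.ofMinutes(10), Duration.ofMinutes(30),
            Duration.ofHours(1), Duration.ofHours(24));

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final ThirdPartyClient client;

    public BestEffortNotifier(ThirdPartyClient client) { this.client = client; }

    public void notify(String notificationId, String payload) {
        attempt(notificationId, payload, 0);
    }

    private void attempt(String notificationId, String payload, int retry) {
        boolean ok;
        try {
            ok = client.notify(notificationId, payload); // the third-party interface must be idempotent
        } catch (Exception e) {
            ok = false;
        }
        if (ok) {
            return;
        }
        if (retry >= RETRY_INTERVALS.size()) {
            // Exhausted all retries: alarm + log + manual intervention.
            System.err.println("notification " + notificationId + " failed after all retries");
            return;
        }
        Duration delay = RETRY_INTERVALS.get(retry);
        scheduler.schedule(() -> attempt(notificationId, payload, retry + 1),
                delay.toMillis(), TimeUnit.MILLISECONDS);
    }
}

// Hypothetical client for the third-party system's notification interface.
interface ThirdPartyClient {
    boolean notify(String notificationId, String payload) throws Exception;
}
```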

Because notification services may call third-party systems multiple times, the interfaces provided by third-party systems need to be idempotent.

The notification service also needs a periodic verification mechanism that checks the business data, so that when the third party fails to fulfil its responsibilities the business can be rolled back and data consistency maintained.

How to choose

At this point we have covered essentially all the schemes that can solve the distributed transaction problem. So, for the distributed transactions in our trading system, how should we choose? Let's start with a comparison of the schemes in the table below:

| Attribute | XA/2PC | TCC | Saga | MQ transaction message | Local message table | Best-effort notification |
|---|---|---|---|---|---|---|
| Transaction consistency | Strong | Medium | Weak | Weak | Weak | Weak |
| Performance | Low | Medium | High | High | High | High |
| Business intrusion | Small | Large | Small | Medium | Medium | Medium |
| Complexity | Medium | High | Medium | Low | Low | Low |
| Maintenance cost | Low | High | Medium | Medium | Low | Medium |

The specific choice still depends on the scenario. As I said in the first article, architecture should be scenario-driven; talking about architecture without a scenario is just empty talk.

If we need to handle business interactions with an external third-party system, for example connecting the trading system to a third-party payment system, then best-effort notification is the only option.

For short transactions with a hard requirement for strong consistency but no demand for high performance or high concurrency, XA/2PC is worth considering. For Java, the implementation can simply use the XA mode of the Seata framework.

TCC is better suited to scenarios with high demands on consistency and real-time behaviour and short execution times, such as the trading, payment, and accounting flows of Internet finance. Again, if the implementation language is Java, Seata's TCC mode is recommended.

Because Saga itself does not guarantee isolation, it suits business scenarios where few transactions operate concurrently on the same resource. Also, since Saga has no resource-reservation step, the compensating actions must be manageable in the given scenario.

The MQ transaction message and local message table schemes suit asynchronous transactions with low consistency requirements, where periods of data inconsistency can be tolerated, few parties and links are involved, and an accounting/reconciliation system is in place. If your system already uses RocketMQ, consider the MQ transaction message scheme, which is currently supported only by RocketMQ; otherwise consider the local message table scheme.

There is actually one more option: business avoidance, meaning a small adjustment on the business side that makes the distributed transaction unnecessary. This is the most elegant way to deal with a distributed transaction, bar none. Business avoidance essentially eliminates the problem itself. It takes empathy and the willingness to step out of habitual thinking and look at the problem from a different dimension, and sometimes it means sacrificing some business characteristics. In reality, though, business avoidance is hard to achieve in most scenarios, and the distributed transaction problem has to be solved head-on.

In addition, some real-world scenarios have their own particularities, and the guidance above cannot be applied directly; the plan has to be adjusted to the specific scene. Take our trading system as an example and look at a distributed transaction scheme for placing orders.

There are actually three steps to place an order:

  1. Create an order;
  2. Freezing the balance of the user’s asset account;
  3. Send the order to the matching engine for matching.

The order transaction is initiated by the transaction service; the first step is also done in the transaction service, the second step is done in the public service (since we have not yet split out an account service), and the third step posts the order to the matching engine via MQ.

From the scenario classification above, our order-placing scenario belongs to Internet-finance trading, which suits TCC, but the last step is an asynchronous transaction, so how do we choose? In fact, the first two steps use TCC to guarantee the consistency of the synchronous part, while the third step uses a local message table to guarantee reliable asynchronous delivery of the message. The message must not be written to the message table until the transactions of the first two steps have completed successfully. The matching engine, as the MQ consumer, must process messages idempotently.
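A highly simplified sketch of this combination might look like the following. All interfaces and names are hypothetical, the TCC coordination is inlined into the application code for brevity (in practice a TCC framework such as Seata would drive Confirm/Cancel), and the outbox write stands for the local message table from the previous section.

```java
import java.math.BigDecimal;

// Sketch of combining schemes for order placement: TCC for the two synchronous steps,
// then a local message table (outbox) entry towards the matching engine.
public class PlaceOrderFlow {

    private final OrderTccService orderService;  // step 1: create the order
    private final AssetTccService assetService;  // step 2: freeze the user's balance
    private final OrderOutbox outbox;            // step 3: local message table for the matching engine

    public PlaceOrderFlow(OrderTccService orderService, AssetTccService assetService, OrderOutbox outbox) {
        this.orderService = orderService;
        this.assetService = assetService;
        this.outbox = outbox;
    }

    public boolean placeOrder(String txId, String userId, String orderId, BigDecimal amount) {
        // Phase 1: Try both branches.
        boolean orderTried = orderService.tryCreate(txId, orderId);
        boolean assetTried = orderTried && assetService.tryFreeze(txId, userId, amount);

        if (!(orderTried && assetTried)) {
            // Failure path: Cancel whatever was tried.
            if (orderTried) orderService.cancelCreate(txId, orderId);
            if (assetTried) assetService.cancelFreeze(txId, userId, amount);
            return false;
        }

        // Success path: Confirm both branches, and only then write the outbox message
        // that a polling task will later deliver to MQ for the matching engine.
        orderService.confirmCreate(txId, orderId);
        assetService.confirmFreeze(txId, userId, amount);
        outbox.save(orderId, "ORDER_CREATED");
        return true;
    }
}

interface OrderTccService {
    boolean tryCreate(String txId, String orderId);
    void confirmCreate(String txId, String orderId);
    void cancelCreate(String txId, String orderId);
}

interface AssetTccService {
    boolean tryFreeze(String txId, String userId, BigDecimal amount);
    void confirmFreeze(String txId, String userId, BigDecimal amount);
    void cancelFreeze(String txId, String userId, BigDecimal amount);
}

interface OrderOutbox {
    void save(String orderId, String event);
}
```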

So a single transaction does not always have to use a single scheme; schemes can be combined and can evolve. This is the most fundamental reason the distributed transaction problem is complicated: real scenarios are far more complex and changeable than the theory.

conclusion

We have gone through a long discussion of the various solutions for distributed transactions: starting from ACID, then XA, 2PC, TCC, Saga, MQ transaction messages, the local message table, and best-effort notification, and finally a summary and comparison of all the schemes along with their applicable scenarios. Still, this article is far from exhausting the topic: space is limited, many design details were not covered, and concrete implementations are more complex still. Also, if your system's development language is not Java, you will most likely need to implement the distributed transaction solution yourself.

Finally, not everything said above is necessarily correct. If anything is wrong or missing, please leave a comment and point it out.


Previous articles:

Evolution of Trading System Architecture (III): Microservitization

Evolution of Trading System Architecture (II): Version 2.0

Evolution of Trading System Architecture (I): Version 1.0