Distributed Transactions: From Theory to Practice (PART I)

From centralized to distributed

Mainframes appeared in the 1960s and became mainstream largely because of their security and stability. Since the 1980s, however, computing has trended steadily toward networking and miniaturization, and the traditional centralized processing model has become less and less able to meet real-world needs.

The most obvious problem with a centralized architecture is that it is a single point of failure.

With the continuous improvement of PC performance and the rapid spread of networking technology, the market share of mainframes kept shrinking. Many enterprises began to abandon their mainframes and instead built distributed systems out of minicomputers and ordinary PC servers.


What is a distributed system?

A distributed system is one in which hardware or software components are located on different networked computers and communicate and coordinate with each other only by passing messages.

A standard distributed system has the following characteristics:

  • Distribution (components are spread across multiple computers located anywhere in space)
  • Peer equivalence (replicas are one of the most common concepts in distributed systems)
  • Concurrency
  • Lack of a global clock (there is no global clock to order events across nodes)
  • Failures happen all the time

Moving from a centralized to a distributed architecture inevitably introduces the network, and the unreliability of the network brings additional problems:

  • Communication failures
  • Network partition (commonly known as “split brain”)
  • Node failure

Although there are many problems, there are ways to address them. Just as math problems have formulas, distributed systems have their own theoretical underpinnings, such as the CAP theorem and BASE theory.

CAP

In July 2000, Professor Eric Brewer of the University of California, Berkeley presented the CAP conjecture at the ACM PODC conference. Two years later, Seth Gilbert and Nancy Lynch of the Massachusetts Institute of Technology proved it formally. Since then, CAP has been an accepted theorem in distributed computing.

The CAP theorem tells us that a distributed system cannot simultaneously satisfy all three of the basic requirements C (Consistency), A (Availability), and P (Partition tolerance); it can guarantee at most two of them.

  • Consistency: every read receives the most recent write or an error, which is equivalent to all nodes seeing the same, most recent copy of the data.

  • Availability: every request receives a (non-error) response, with no guarantee that it contains the most recent write.

  • Partition tolerance: the system continues to operate despite an arbitrary number of messages being dropped or delayed by the network between the nodes.

When a network partition occurs, we must decide whether to:

  • Cancel the operation, reducing availability but ensuring consistency
  • Continue the operation, preserving availability but risking inconsistency

For example:

Here is a system topology with three nodes:

We expect that, when a network partition failure occurs, a distributed system can still guarantee consistency and availability for its callers, unless the entire network environment has failed.

If the link from M to S1 fails, or S1 goes down, the nodes split into two partitions: {S1} and {M, S2}.


In this case we want to be partition tolerant: the system should still be able to provide service, rather than becoming unavailable as a whole because of the partition.

According to the CAP theorem only two of the three can be guaranteed. Having chosen P, we must choose between C and A.

  • If AP is chosen to guarantee availability, the {S1} partition and the {M, S2} partition both continue to serve requests, keeping the system available. However, because data can no longer be synchronized between M and S1, consistency cannot be guaranteed, since consistency requires that every read returns the latest write or an error.

  • If CP is chosen to guarantee consistency, then while M and S1 cannot communicate the system must recover from the failure, and during that period it is unavailable. Availability requires that every request receives a response, so availability cannot be guaranteed. Once the system has recovered, the data it serves is consistent again, preserving consistency.

Note: the consistency referred to here is what we get when we give up strong consistency and settle for eventual consistency. Such a system cannot guarantee that data is consistent in real time, but it can promise that the data will eventually reach a consistent state. This introduces the notion of a time window: how long it takes to reach consistency depends on the design of the system, chiefly on how long it takes to replicate data between the replicas on different nodes. In the face of CAP, most of the design effort therefore goes into finding a balance between C and A.

BASE theory

Eric Brewer first introduced the notion of BASE in the 1997 paper Cluster-Based Scalable Network Services. Dan Pritchett, an architect at eBay, first stated BASE explicitly as a theory in his 2008 article BASE: An Acid Alternative.

BASE is short for Basically Available, Soft state, and Eventually consistent. It is the result of trading off consistency and availability in CAP, distilled from practical experience with large-scale distributed Internet systems and gradually derived from the CAP theorem. Its core idea is that even when strong consistency cannot be achieved, each application can choose an approach suited to its own business characteristics to reach eventual consistency.

  • Basically available

    Basic availability means that a distributed system is allowed to lose part of its availability when unforeseen failures occur, for example:

    • a loss in response time
    • a loss of functionality (such as degraded pages)
  • Soft state Soft state, also known as weak state, is the opposite of hard state. It allows data in the system to exist in an intermediate state and considers that this intermediate state does not affect the overall availability of the system; that is, it allows a delay when synchronizing data between replicas on different nodes.

  • Eventual consistency Eventual consistency means that all data replicas in the system can reach a consistent state after a period of synchronization. The essence of eventual consistency is that the system needs to guarantee that the data eventually becomes consistent, rather than guaranteeing strong consistency of the data in real time.

Protocols and Algorithms

To solve the distributed consistency problem, a large number of classic consistency protocols and algorithms emerged over a long period of exploration and research, such as the two-phase and three-phase commit protocols and the Paxos, Raft (Multi-Paxos style), and ZAB (Multi-Paxos style) algorithms.

For the principles and implementations of distributed consensus algorithms, please refer to other resources. This article focuses on distributed transactions and will not cover them further.

Distributed transactions 2PC and 3PC

Transactions

In computer science, transaction processing is information processing divided into separate, indivisible units of work called transactions. Each transaction must succeed or fail as a complete unit; it can never be only partially completed. A transaction should have four properties: atomicity, consistency, isolation, and durability, commonly referred to as the ACID properties.

In a centralized, single-node application, ACID for transactional operations is guaranteed by the database; MySQL, for example, guarantees ACID through its own implementation.

Distributed transaction

A distributed transaction is a database transaction in which two or more networked hosts are involved. Typically the hosts provide transactional resources, and a transaction manager is responsible for creating and managing a global transaction that spans all operations on those resources. Like any other transaction, a distributed transaction must have all four ACID properties (atomicity, consistency, isolation, durability), where atomicity guarantees an all-or-nothing outcome for the unit of work.

A distributed transaction involves operations on multiple databases, and the difficulty lies in how to keep those database operations consistent over an unreliable network.

Recall the protocols mentioned earlier for solving the consistency problem: the two-phase and three-phase commit protocols are the theoretical basis for solving it.

XA, 2PC, and 3PC

X/Open was created in 1984 by a consortium of companies to define and promote open standards in information technology. X/Open managed the UNIX trademark from 1993, and in 1996 it merged with the Open Software Foundation to form The Open Group.

XA stands for eXtended Architecture. It is a specification for distributed transaction processing (DTP) published by X/Open in 1991. XA is a distributed transaction protocol that guarantees strong consistency through the two-phase commit protocol, and the DTP model has become the de facto standard for how transaction model components behave.

Here is my screenshot of the DTP model section from the Open Group standard file:

The DTP model abstracts three roles, AP (application program), TM (transaction manager), and RM (resource manager), to guarantee strong consistency for distributed transactions. TM and RM communicate bidirectionally using the XA protocol.

Compared with a traditional local transaction, an XA transaction adds a prepare phase: in addition to passively accepting a commit instruction, the database can report back to the caller whether the transaction can be committed. The TM collects the prepare results of all branch transactions and performs an atomic commit at the end, guaranteeing strong transactional consistency.

Java implements the XA model through the JTA interfaces. The ResourceManager side of JTA requires database vendors to provide XA drivers, while the TransactionManager side requires a transaction manager vendor's implementation. Traditional transaction managers are expensive to use because they are tied to an application server; an embedded transaction manager can be provided as a JAR and, when integrated with Apache ShardingSphere, can guarantee strong consistency for cross-database transactions after sharding.
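To make this concrete, here is a minimal, hedged sketch of what cross-database work under a JTA transaction can look like. It assumes a JTA UserTransaction and two XA-capable DataSources have already been wired up and enlisted by whatever transaction manager is in use (application server or embedded); the table and column names are invented for illustration.

```java
import javax.sql.DataSource;
import javax.transaction.UserTransaction;
import java.sql.Connection;
import java.sql.PreparedStatement;

public class XaTransferSketch {
    // utx, orderDb and stockDb are assumed to be provided by a JTA transaction
    // manager; the table/column names below are purely illustrative.
    public void placeOrder(UserTransaction utx, DataSource orderDb, DataSource stockDb) throws Exception {
        utx.begin();                     // start the global (XA) transaction
        boolean ok = false;
        try (Connection oc = orderDb.getConnection();
             Connection sc = stockDb.getConnection()) {
            try (PreparedStatement ps = oc.prepareStatement(
                    "INSERT INTO t_order (user_id, sku, qty) VALUES (?, ?, ?)")) {
                ps.setLong(1, 1L);
                ps.setString(2, "sku-1");
                ps.setInt(3, 1);
                ps.executeUpdate();      // branch on the order database
            }
            try (PreparedStatement ps = sc.prepareStatement(
                    "UPDATE t_stock SET qty = qty - 1 WHERE sku = ? AND qty >= 1")) {
                ps.setString(1, "sku-1");
                ps.executeUpdate();      // branch on the stock database
            }
            ok = true;
        } finally {
            if (ok) {
                utx.commit();            // TM drives XA prepare on both RMs, then commit
            } else {
                utx.rollback();          // any failure rolls back both branches
            }
        }
    }
}
```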

Typically, XA transactions are supported only if you use the XA-aware connection pool provided by the transaction manager vendor. When integrating XA transactions, Apache ShardingSphere separates XA transaction management from connection pool management, achieving zero intrusion into applications.

Two-phase commit protocol

As mentioned above, DTP ensures strong consistency through a two-phase commit protocol. So what is the two-phase commit protocol?

In a distributed system, each node knows whether its own operation succeeded or failed, but it cannot know whether the operations on other nodes succeeded. When a transaction spans multiple nodes, preserving the ACID properties requires introducing a component that acts as coordinator, which controls the operation results of all the nodes (called participants) and finally tells them whether to really apply their results (for example, flush the updated data to disk). The idea of two-phase commit can therefore be summarized as: the participants report the success or failure of their operations to the coordinator, and the coordinator, based on the feedback from all participants, decides whether each participant should commit or abort the operation. A minimal coordinator sketch follows the list of stages below.

The two stages are:

  • Phase 1: prepare phase (voting phase)
  • Phase 2: commit phase (execution phase)
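As a thought aid only, here is a toy in-process coordinator that walks through the two phases. The Participant interface and method names are invented for illustration; a real implementation also has to deal with timeouts, transaction logging, and crash recovery.

```java
import java.util.List;

// Toy two-phase commit coordinator; interface and method names are illustrative.
interface Participant {
    boolean prepare();   // phase 1 vote: true = "yes, ready to commit"
    void commit();       // phase 2: make the prepared result durable/visible
    void rollback();     // phase 2: undo the prepared work
}

class TwoPhaseCommitCoordinator {
    /** Returns true if the transaction committed, false if it was rolled back. */
    boolean run(List<Participant> participants) {
        boolean allVotedYes = true;
        for (Participant p : participants) {          // phase 1: collect votes
            try {
                if (!p.prepare()) {
                    allVotedYes = false;
                    break;
                }
            } catch (RuntimeException e) {            // an unreachable/failed participant counts as "no"
                allVotedYes = false;
                break;
            }
        }
        if (allVotedYes) {                            // phase 2: everyone voted yes -> commit
            participants.forEach(Participant::commit);
        } else {                                      // otherwise -> roll everyone back
            participants.forEach(Participant::rollback);
        }
        return allVotedYes;
    }
}
```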

Schematic diagram of "transaction commit" in two-phase commit:

Schematic diagram of "transaction rollback" in two-phase commit:


So far we can see that a so-called XA transaction is simply the DTP model using XA, and DTP's consistency is guaranteed by the two-phase commit protocol.

Three-phase commit protocol

The two-phase commit protocol has its advantages, but its disadvantages are also obvious:

  • Synchronous blocking: all transaction participants block synchronously while waiting for responses from other participants.
  • Single point of failure: the coordinator plays a central role in 2PC, and its failure has a large impact. In particular, if it fails during phase two, all participants remain blocked in a waiting state and cannot complete other operations.
  • Data inconsistency: in phase two, if the coordinator has sent only some of the Commit messages when a network exception occurs, only some participants receive Commit and commit their transactions, leaving the system's data inconsistent.
  • Too conservative: the failure of any single node causes the whole transaction to fail, and there is no complete fault tolerance mechanism.

Three-phase Commit is designed to address the shortcomings of the two-phase commit protocol. Unlike two-phase commit, three-phase commit is a “non-blocking” protocol.

Compared with two-phase commit, three-phase commit makes two changes:

  1. It introduces timeouts, for both the coordinator and the participants.
  2. It inserts an extra preparation phase between the original two phases, so that the state of every participating node is known to be consistent before the final commit phase.

That is, in addition to introducing the timeout mechanism, 3PC splits 2PC's prepare phase into two, giving three phases: CanCommit, PreCommit, and DoCommit.


Compared with 2PC, 3PC mainly addresses the coordinator single point of failure and reduces blocking: once a participant fails to receive a message from the coordinator in time, it defaults to committing, so transaction resources are not held and blocked indefinitely.

However, this mechanism can also cause data consistency problems: if, due to network issues, an abort message sent by the coordinator is not received by a participant in time, the participant will commit after its timeout expires, leaving its data inconsistent with the participants that did receive the abort and rolled back.

Having looked at 2PC and 3PC, we can see that neither can completely solve the distributed consistency problem. Mike Burrows, the author of Google Chubby, put it this way: "There is only one consensus protocol, and that's Paxos. All other approaches are just broken versions of Paxos." In other words, there is only one consensus algorithm in the world, Paxos, and all other consensus algorithms are incomplete versions of it.

Distributed transaction solutions

From the description above, we already know one solution for distributed transactions: XA transactions based on the two-phase commit protocol. What are the alternatives? Here are a few.

TCC

The concept of TCC (Try-Confirm-Cancel) was first proposed by Pat Helland in the 2007 paper Life Beyond Distributed Transactions: An Apostate's Opinion. Compared with XA as described above, the TCC transaction mechanism addresses several of its disadvantages:

  • Coordinator single point: the business activity is initiated and completed by the main business service, and the business activity manager becomes multi-node and can be clustered.
  • Synchronous blocking: timeouts are introduced, and whole resources are no longer locked; resources are instead converted into business logic of finer granularity.
  • Data consistency: with the help of compensation, consistency is controlled by the business activity manager.

TCC (Try-Confirm-Cancel)

  • Try phase: attempt execution; complete all business checks (consistency) and reserve the necessary business resources (quasi-isolation).
  • Confirm phase: confirm execution; use only the business resources reserved in the Try phase. The Confirm operation must be designed to be idempotent and is retried if it fails.
  • Cancel phase: cancel execution; release the business resources reserved in the Try phase. Exception handling in the Cancel phase is basically the same as in the Confirm phase.

In the Try phase the business system performs checks and reserves resources. For example, an order-and-inventory operation needs to check whether the remaining stock is sufficient and reserve it; the Try-phase operation works on the available stock quantity.

Implementing a distributed transaction with TCC splits logic that could otherwise live behind a single interface into three interfaces, Try, Confirm, and Cancel, so the implementation complexity of the code is relatively high.
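As a hedged illustration of that split, an inventory service might expose three methods along these lines (the names, parameters, and reservation scheme are assumptions for illustration, not a prescribed API):

```java
// Hypothetical inventory service split into TCC's three interfaces.
public interface InventoryTccService {

    // Try: check the business rules and reserve resources, e.g. move "qty" units
    // from available stock into a frozen/reserved column keyed by orderId.
    boolean tryReserve(String orderId, String sku, int qty);

    // Confirm: actually consume the reserved stock. Must be idempotent because
    // the transaction manager may retry it after partial failures.
    boolean confirmReserve(String orderId);

    // Cancel: release the reservation made in Try. Must also be idempotent and
    // tolerate an "empty rollback" (Cancel arriving before Try due to reordering).
    boolean cancelReserve(String orderId);
}
```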

Local message table

The local message table solution was originally described by eBay architect Dan Pritchett in his 2008 ACM article. In this scheme there are two roles, message producer and message consumer. Assume system A is the producer and system B is the consumer; the general flow is as follows:

  1. When system A is called by another system to update a database table, it first updates the business table and also inserts a record into a message table in the same database; both operations happen in the same local transaction (see the sketch after this list).
  2. A scheduled job in system A periodically polls the local message table, writes each pending message to MQ, and retries if sending fails.
  3. System B consumes the messages in MQ and executes its business logic. If its local transaction fails, the message in MQ is consumed again and retried; if the business itself fails, system B can notify system A to roll back.
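A minimal sketch of step 1, assuming plain JDBC and invented table names (t_order, t_local_message); the point is only that the business row and the message row commit in one local transaction:

```java
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;

public class LocalMessageTableSketch {
    private final DataSource ds;

    public LocalMessageTableSketch(DataSource ds) {
        this.ds = ds;
    }

    // Step 1: write the business row and the message row in the same local transaction.
    public void createOrder(long orderId, String messagePayload) throws Exception {
        try (Connection c = ds.getConnection()) {
            c.setAutoCommit(false);
            try (PreparedStatement order = c.prepareStatement(
                         "INSERT INTO t_order (id, status) VALUES (?, 'CREATED')");
                 PreparedStatement message = c.prepareStatement(
                         "INSERT INTO t_local_message (order_id, payload, status) VALUES (?, ?, 'PENDING')")) {
                order.setLong(1, orderId);
                order.executeUpdate();
                message.setLong(1, orderId);
                message.setString(2, messagePayload);
                message.executeUpdate();
                c.commit();      // both rows become visible together, or neither does
            } catch (Exception e) {
                c.rollback();
                throw e;
            }
        }
        // Step 2 is a separate scheduled job: poll t_local_message for PENDING rows,
        // publish them to MQ, and mark them SENT, retrying on failure.
    }
}
```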

Local message table implementation conditions:

  • Both the consumer and producer interfaces need to be idempotent
  • The producer needs an additional message table
  • Compensation logic must be provided: if the consumer's business fails, the producer needs to support a rollback operation

Fault tolerance mechanism:

  • If step 1 fails, the local transaction is rolled back directly
  • If writing to MQ in step 2 or consuming from MQ in step 3 fails, retry
  • If the business in step 3 fails, system B notifies system A to roll back the transaction

The core of this solution is to execute tasks that need distributed processing asynchronously through message logging. The message log can be stored in a local file, a database, or a message queue, and retries can then be triggered automatically by business rules or manually. Manual retry is more common in payment scenarios, where a reconciliation system handles problems after the fact.

Reliable message eventual consistency

The general process is as follows

  1. System A sends a prepare message to the MQ server; if the prepare message fails to send, the operation is cancelled.
  2. If the prepare message is sent successfully, system A executes its local transaction.
  3. If the local transaction succeeds, system A sends a confirm message to MQ; if it fails, it sends a rollback message.
  4. System B continuously consumes the confirm messages in MQ, executes its local transaction, and sends an ACK. If system B's local transaction fails, it keeps retrying; if its business fails, it sends a rollback request to system A.
  5. The MQ server periodically polls all prepared messages and calls back an interface provided by system A: if the prepared message's local transaction succeeded, system A resends the confirm message; otherwise the prepared message is rolled back.

The biggest difference between this solution and the local message table is that the local message table is removed. The local message table relies on that table to retry writes to the MQ server; in this solution, the MQ server's polling of prepare message status takes over the retry and rollback role. The other implementation conditions are basically the same. Alibaba's RocketMQ currently provides this capability (transactional messages).
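For reference, a hedged sketch of how this can look with RocketMQ's transactional messages, based on the 4.x Java client as I understand it; the topic, producer group, addresses, and the business logic are placeholders:

```java
import org.apache.rocketmq.client.producer.LocalTransactionState;
import org.apache.rocketmq.client.producer.TransactionListener;
import org.apache.rocketmq.client.producer.TransactionMQProducer;
import org.apache.rocketmq.common.message.Message;
import org.apache.rocketmq.common.message.MessageExt;

public class ReliableMessageSketch {
    public static void main(String[] args) throws Exception {
        TransactionMQProducer producer = new TransactionMQProducer("order_tx_group");
        producer.setNamesrvAddr("127.0.0.1:9876");   // placeholder name server address
        producer.setTransactionListener(new TransactionListener() {
            @Override
            public LocalTransactionState executeLocalTransaction(Message msg, Object arg) {
                try {
                    // run system A's local transaction here (e.g. insert the order row)
                    return LocalTransactionState.COMMIT_MESSAGE;    // "confirm"
                } catch (Exception e) {
                    return LocalTransactionState.ROLLBACK_MESSAGE;  // "rollback"
                }
            }

            @Override
            public LocalTransactionState checkLocalTransaction(MessageExt msg) {
                // broker's back-check of a half (prepare) message whose outcome is unknown:
                // look up the local transaction's state and answer commit or rollback
                return LocalTransactionState.COMMIT_MESSAGE;
            }
        });
        producer.start();
        producer.sendMessageInTransaction(
                new Message("ORDER_TOPIC", "order created".getBytes()), null);
        producer.shutdown();
    }
}
```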

Best-effort notification

Best-effort notification is the simplest kind of flexible transaction. It suits businesses with low sensitivity to how quickly eventual consistency is reached, and where the result of the passive party's processing does not affect the result of the active party's processing.

The general idea of this plan is:

  • After its local transaction completes, system A sends a message to MQ.
  • A dedicated MQ-consuming service consumes the message and calls system B's interface.
  • If system B succeeds, all is well; if system B fails, the best-effort notification service periodically retries the call to system B, up to N times, and finally gives up (see the sketch after this list).
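A toy sketch of that bounded retry loop; the attempt count and backoff schedule are arbitrary assumptions:

```java
import java.util.concurrent.TimeUnit;

public class BestEffortNotifier {
    private static final int MAX_ATTEMPTS = 5;

    // callSystemB stands in for the real call to system B's interface (e.g. an HTTP client call).
    public void notifySystemB(Runnable callSystemB) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                callSystemB.run();
                return;                                    // success: stop notifying
            } catch (Exception e) {
                try {
                    TimeUnit.SECONDS.sleep(10L * attempt); // back off before the next attempt
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
        // After MAX_ATTEMPTS failures we give up; remaining discrepancies are
        // handled after the fact (e.g. by reconciliation), which is the point of "best effort".
    }
}
```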

Seata

What is Seata?

Seata is an open-source distributed transaction solution dedicated to providing high-performance and easy-to-use distributed transaction services. Seata provides AT, TCC, SAGA, and XA transaction modes, aiming to be a one-stop distributed transaction solution for users.

Building a distributed transaction solution for enterprise business on Seata's AT mode brings three core values:

  • Low cost: the programming model is unchanged, dependencies are light, and no special design is needed for distributed transaction scenarios; the business can be built and grown naturally, like building blocks.
  • High performance: no protocol-level blocking; resources are released quickly, preserving business throughput.
  • High availability: in extreme cases, abnormal transactions can be temporarily skipped to keep the whole business system highly available.

Seata currently offers four transaction models:

In daily work we most commonly use AT and TCC: one is non-invasive, the other highly invasive. The following articles will explain them one by one.

Seata’s architecture:


Three components: Transaction Manager (TM), Resource Manager (RM), and Transaction Coordinator (TC).

A typical transaction process:

  • TM asks TC to start a global transaction; the global transaction is created and a globally unique XID is generated.
  • The XID is propagated along the microservice invocation chain.
  • Each RM registers its branch transaction with TC, bringing it under the jurisdiction of the global transaction identified by the XID.
  • TM sends TC a global commit or rollback decision for the XID.
  • TC drives all branch transactions under the XID to complete the commit or rollback (see the sketch after this list).
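A hedged sketch of what the TM-side entry point can look like in AT mode, assuming Seata's annotation support is on the classpath (the @GlobalTransactional annotation is Seata's; the two downstream clients and their methods are invented for illustration):

```java
import io.seata.spring.annotation.GlobalTransactional;

// Hypothetical downstream service clients; each remote service works against
// its own database and becomes an RM branch under the global transaction.
interface OrderClient { void create(String userId, String sku, int qty); }
interface StockClient { void deduct(String sku, int qty); }

public class PlaceOrderService {
    private final OrderClient orderClient;
    private final StockClient stockClient;

    public PlaceOrderService(OrderClient orderClient, StockClient stockClient) {
        this.orderClient = orderClient;
        this.stockClient = stockClient;
    }

    // The annotated method acts as TM: it asks TC for an XID, which is then
    // propagated along the call chain to the downstream services.
    @GlobalTransactional(name = "place-order", rollbackFor = Exception.class)
    public void placeOrder(String userId, String sku, int qty) {
        orderClient.create(userId, sku, qty);   // branch transaction on the order DB
        stockClient.deduct(sku, qty);           // branch transaction on the stock DB
        // any exception here leads TM to ask TC for a global rollback
    }
}
```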

Seata’s global transaction process is divided into two phases:

  • Execution phase: execute the branch transactions and ensure that their execution results are rollbackable and durable.
  • Completion phase: based on the decision formed in the execution phase, TM sends the global commit or rollback request to TC, and TC commands the RMs to drive the branch transactions to commit or roll back.

What Seata calls a transaction mode is the behavior pattern of branch transactions running under Seata's global transaction framework; strictly speaking it should be called a branch transaction mode. The division into two phases also shows that Seata is implemented on the basis of the two-phase commit protocol.

AT mode

AT stands for Automatic Transaction; a minimal data-source wiring sketch follows the two stages below.

Execution stage:

  • Rollbackable: record rollback logs (undo logs) based on the results of SQL parsing
  • Durable: the rollback logs and the business SQL are committed to the database in the same local transaction

Completion stage:

  • Branch commit: asynchronously delete the rollback log records
  • Branch rollback: compensate by applying reverse updates based on the rollback logs
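A minimal wiring sketch, assuming Seata's DataSourceProxy and a HikariCP pool (the pool choice and connection settings are arbitrary; each business database also needs the undo_log table that AT mode writes to):

```java
import javax.sql.DataSource;
import com.zaxxer.hikari.HikariDataSource;
import io.seata.rm.datasource.DataSourceProxy;

public class SeataAtDataSourceSketch {
    // Wrap the ordinary pool with Seata's proxy so that every SQL statement is
    // parsed, an undo log row is written in the same local transaction, and the
    // branch is registered with TC.
    public static DataSource build(String jdbcUrl, String user, String password) {
        HikariDataSource pool = new HikariDataSource();
        pool.setJdbcUrl(jdbcUrl);
        pool.setUsername(user);
        pool.setPassword(password);
        return new DataSourceProxy(pool);
    }
}
```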

TCC mode

Execution stage:

  • Invoke the business-defined Try method (rollbackability and durability are entirely guaranteed by the business layer); a declaration sketch with Seata's TCC annotations follows this subsection

Completion stage:

  • Branch commit: Invoke the Confirm method defined by each transaction branch
  • Branch rollback: Invoke the Cancel method defined by each transaction branch
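For comparison, a hedged sketch of how such a TCC branch is typically declared with Seata's TCC annotations (the annotation names are from Seata's TCC API as I understand it; the business methods and parameters are invented):

```java
import io.seata.rm.tcc.api.BusinessActionContext;
import io.seata.rm.tcc.api.BusinessActionContextParameter;
import io.seata.rm.tcc.api.LocalTCC;
import io.seata.rm.tcc.api.TwoPhaseBusinessAction;

@LocalTCC
public interface StockTccAction {

    // Execution phase: the Try method, registered as a TCC branch. Seata records
    // which commit/rollback methods to call in the completion phase.
    @TwoPhaseBusinessAction(name = "stockTccAction",
            commitMethod = "confirm", rollbackMethod = "cancel")
    boolean tryDeduct(BusinessActionContext ctx,
                      @BusinessActionContextParameter(paramName = "sku") String sku,
                      @BusinessActionContextParameter(paramName = "qty") int qty);

    // Completion phase, branch commit: consume the reservation made in Try.
    boolean confirm(BusinessActionContext ctx);

    // Completion phase, branch rollback: release the reservation made in Try.
    boolean cancel(BusinessActionContext ctx);
}
```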

To be continued

In the following articles we will look at the implementation of Seata's AT and TCC modes with concrete examples, together with the problems encountered during development. Please stay tuned.

For more exciting content, please pay attention to the public account “Small box technology sharing”
