Distributed transactions, you get it

preface

I don’t have to tell you what a transaction is. On a stand-alone system, we can manage transactions through Spring, both programmatic and declarative transactions are very straightforward. But there are transactions in a monolithic system and once you get into distribution, there are some questions, so let’s start by asking what? And in the project transaction is inevitably a problem, distributed system is even more so, so today, let’s talk about the common scheme in distributed system.

Standalone versus distributed transactions

To answer the above question, why do transactions in a stand-alone system not work in a distributed system?

For example, when a user wants to place an order, what the order interface does is first create an order and then remotely call the inventory service to deduct the inventory. If the order fails due to various problems, but the request to deduct the inventory has already been sent to another service where changes have been made to the inventory. There is no way to get another service to roll back in real time. It either succeeds or fails together.

To better understand distributed transactions, let’s understand some basic concepts.

The basic nature of a transaction

Atomicity: A series of operations that are indivisible as a whole and either succeed or fail together. Consistency: Data is consistent across the business before and after a transaction. Isolation: Different transactions are isolated from each other. Persistence: Once a transaction succeeds, the data is persisted in the database.

Transaction ACID, I won’t go into that.

The laws of CAP

Consistency (C) : Whether all data backups in a distributed system have the same value at the same time. Availability (A) : Whether the cluster as A whole can respond to read/write requests from clients after some nodes fail. (High availability for data updates) Partitioning fault tolerance (P) : In practical terms, partitioning is equivalent to time-bound requirements for communication. If the system cannot achieve data consistency within the time limit, it means that A partitioning situation has occurred and that it must choose between C and A for the current operation.

That is to say, these three indicators cannot be done at the same time. At most, 2 can be chosen from 3. This conclusion is called CAP theorem.

The BASE theory of

BASE theory is an acronym for Basically Available, Soft State, and Eventually Consistent. Core idea: Even if Strong consistency cannot be achieved, each application can adopt appropriate methods to achieve Eventual consistency according to its own service characteristics.

Basically Available
- What is basic availability? Suppose the system, with some unexpected failure, still works, compared to a normal system:
- Loss in response time: normal search engines return results in 0.5 seconds, while basic search engines can return results in 2 seconds. Loss of functionality: On an e-commerce site, users can normally complete every order without a hitch. But during the promotion period, some consumers may be directed to a downgraded page to protect the stability of the shopping system.
Soft State* The requirement that copies of data across multiple nodes be consistent is a “hard state” as opposed to atomicity.
- Soft state refers to that data in the system is allowed to exist in an intermediate state, and the state is considered not to affect the overall availability of the system, that is, the system is allowed to have data delay in multiple data copies of different nodes.
Eventually Consistent
- It says soft state, and then you can’t always be soft state, you have to have a period of time. At the end of the period, data consistency should be guaranteed for all replicas to achieve final data consistency. This time frame depends on network latency, system load, data replication scheme design, and so on.

BASE theory is oriented to large, highly available and scalable distributed systems, which is the opposite of ACID, a traditional thing. It is completely different from ACID’s strong consistency model, but rather sacrifices strong consistency to achieve availability and allows the data to be inconsistent for a period of time, but eventually reach a consistent state. However, in actual distributed scenarios, different business units and components have different requirements for data consistency. Therefore, ACID characteristics and BASE theory are often combined in the design process of distributed system architecture. In fact, in distributed systems, not only distributed transactions, but basically all technologies are developed around CAP and BASE.

XA scheme

Xa scheme is based on two-phase commit, which implements the two-phase commit protocol. A transaction manager (TM) and one or more resource managers (RMS) and applications (aps) are defined and communicate between them. The key to XA is the control of two-phase commit by the transaction manager (TM).

In combination with this diagram, the first phase is the preparation phase, where the transaction coordinator asks each database involved in the transaction to commit the operation and reflect whether it can commit. The second stage is the submission stage. The transaction coordinator requires each database to commit data, and if any database rejects the commit, all databases are asked to roll back their part of the transaction.

Advantages and disadvantages: 1.XA is relatively simple, and once commercial databases implement XA, the cost of using distributed transactions is low. 2.XA performance is not ideal, especially in the single trading link, which often has high concurrency.XA cannot meet high concurrency scenario 3. The XA implementation of mysql does not record logs in the prepare phase, causing data inconsistency between the two databases due to the active/standby switchover. Many NoSQL also do not support XA, which makes the application scenario of XA very narrow. 5. If TM has a problem and cannot be repaired, RM will be blocked all the time, but there are also three stages of submission, which introduces the timeout mechanism (whether the coordinator or the participant sends the request to the other party, if there is no response for a long time, they will deal with it accordingly).

TCC transaction compensation scheme

TCC stands for:
Try,
Confirm,
Cancel

Try stage: This phase checks the resources of each service and locks or reserves the resources.

Confirm stage: This phase is about performing the actual operations in the individual services.

Cancel stageIf the business method execution of any of the services fails, there is a need to compensate by performing a rollback of the business logic that has been successfully executed. (Roll back those that performed successfully)

TCC provides another way of thinking, is to freeze the transaction needs of resources, and coordinate all of the application of transaction, if all the applications are successful will commit the transaction, freezing normal consumption of resources, if there is a application transaction fails, will release all the resources of the freeze, TCC also need to be segmented, at the same time also has a transaction manager. Stage 1 prepAR stage: The custom prepare logic is called. Phase 2 Commit/ROLLBACK phase: Invoke custom COMMIT /rollback logic.

Advantages and disadvantages: 1. It will not block other transactions, because resources are frozen in advance, other transactions will not compete with it for resources, so it will not block naturally. 2. Although the frozen resources do not have to compete with other transactions to increase the concurrency, each application still needs to block and wait for the completion of all other applications within this transaction. Therefore, it is recommended to use it when the execution time of each service is relatively short. 3. The service complexity increases, and three methods, try, confirm, and Cancel, need to be written. 4. It is highly intrusive to applications. Different services need to use different service codes for freezing and unfreezing consumption resources, which cannot be used by all applications.

Best effort notice

It is used for SMS verification code, or email verification, etc. We should also have encountered SMS verification code did not send over the situation. Its implementation is to inform according to the rule, does not guarantee that the data can be notified successfully, but will provide a queryable operation interface for checking. This solution is also implemented in combination with MQ.

For example, if you send HTTP requests through MQ, set the maximum number of notifications to five. So when service A fails to call service B, it will try again. After 5 attempts, it will be considered A failure. After all, I have tried my best to inform you, what do you expect from me?

Reliable message + Final consistency scheme

This scheme is based on the local transaction +MQ reliable message implementation, through the way of asynchronous message, to ensure final consistency.

So using the figure above to illustrate the detailed steps. 1. After confirming that the service is processed, service A sends A message to MQ. If the message succeeds, MQ replies to ACK. 2.MQ sends a message to the remote consumer service B and replies to the ACK message. Or MQ can persist messages to DB after receiving them and record the message status. 3. Service B receives the message and confirms it. If the message has been persisted, service B can change the state of the message. 4. Service B consumes the message row and returns to MQ after consuming it. If the message has been persisted, the status of the message changes to done 5. The tables of persistent messages can be scanned periodically, with special cases involving manual processing

As you can see, the most important thing is the reliability of the message. If you want to learn more about reliable message, you can read my other article, Reliable Message

seata

Seata is an open source distributed transaction solution of Alibaba, committed to providing high performance and easy to use distributed transaction services. Seata also has AT, TCC, SAGA and XA transaction modes to create a one-stop distributed solution for users.

This is an image from the front page of the Seata website. It can be seen that its core is completed by the ID+ three components of the distributed transaction process.

A Transaction Coordinator(TC) that maintains the running status of a global Transaction and coordinates and drives the commit or rollback of a global Transaction: Controls the boundaries of a global transaction, is responsible for starting a global transaction, and ultimately initiating a global commit or rollback resolution: Controls branch transactions, is responsible for branch registration, status reporting, and receives instructions from the transaction coordinator, drives commit and rollback of branch (local) transactions (XID) : the ID of the global transaction

The AT pattern uses @globalTransactional to control global transactions without intrusions to the business. This is similar to spring’s @transcational annotation.

Implementation process

TM applied to TC for starting a global transaction, and the global transaction was successfully created and a global unique XID was generated. 2.XID was propagated in the context of microservice invocation link. 4.TM initiates a global commit or rollback resolution against XID to TC. 5.TC schedules all branch transactions under XID to complete the commit or rollback request

AT mode mechanism (to non-intrusion into business)

A phase:
- Business data and rollback log records are committed in the same local transaction, freeing local locks and linked resources.
Stage 2:
- Commit asynchronously, very fast to complete
- Rollback is compensated in reverse by the rollback log of one phase

implementation

One-stage load: In one stage, SEATA intercepts “business SQL” 1. Parse the SQL semantics to find the business data to be updated in business SQL. Before the business data is updated, save it as “Before Image” 2. Run service SQL to update service data. 3. After the service data is updated, save it as after Image and generate a row lock

All of the above operations are done within a single database transaction, which ensures atomicity of the one-phase operations

Two-phase commit: Since the “business SQL” has been committed to the database in phase 1, the SeATA framework only needs to delete the snapshot data and row locks saved in phase 1 to complete data cleansing.

Two-phase rollback: If the two-phase rollback is performed, SEATA needs to roll back the service SQL that has been executed in the first phase to restore service data. Before Image is used to restore service data, but dirty data is checked before restoration. Compare current service data in the database with After Image. If the two data copies are identical, there is no dirty data and service data can be restored. Dirty writing on the need to turn to manual processing.

summary

Through the above, said several commonly used schemes today, but it is not said that distributed transactions can be forced to use, is not to use distributed transactions. Distributed transaction problem in our development or need to actively avoid, use of the word is also according to the project business to specific choice of scheme landing.

I have written several articles on distribution. I was going to write something about cluster building next, but I will have to put it on hold because I have been studying big data recently. However, I will record my learning process during the learning of big data. Would you like to join me or pay attention to it in advance, so that we can communicate and learn together?

Finished!

I have seen it, click a like point a concern and then go ~

Focus on me, we learn together, progress together. Be a little better every day than you were yesterday.