Abstract: How to overcome the problem of distributed transactions under microservices architecture?
What are microservices? What are the advantages and difficulties of microservices?
What is microservices Architecture?
In short, the system of microservice architecture is a distributed system, which is divided into independent service units according to the business, to solve the deficiency of single system, but also meet the increasingly complex business requirements. Each microservice is only focused on completing one task and doing it well.
Advantages of microservices architecture
-
Simplify complex problems by splitting complex businesses into smaller businesses, each business into a service. Conducive to division of labor, reduce the cost of learning new.
-
As a distributed system, microservice system is completely decoupled from services. With the increase of services, it can be split based on services and has a strong horizontal expansion capability.
-
The services use HTTP protocol to communicate, and are completely independent of each other. Each service can choose the appropriate programming language and database based on the business scenario.
-
Services are deployed independently, and the modification and deployment of each service has no impact on other services.
Although micro-service has the advantages mentioned above, the practice of micro-service is still in the stage of exploration. Many small and medium-sized Internet companies find it difficult to implement micro-service in view of experience, technical strength and other problems. Noted architect Chris Richardson points out that microservices are currently facing several difficulties:
-
After a single application is divided into a distributed system, the communication mechanism between processes and troubleshooting measures become more complex.
-
After system microservitization, a seemingly simple function may need to call multiple services and operate multiple database implementations internally, and the problem of distributed transaction of service invocation becomes very prominent.
-
With so many microservices, testing, deploying, monitoring, and so on has become more difficult.
As the RPC framework matures, the first problem has been gradually resolved. For example, Dubbo supports multiple communication protocols, and Spring Cloud supports restful calls very well. As for the third question, with the development of Docker and DevOps technologies and the launch of automatic operation and maintenance tools for public cloud PaaS platforms, the testing, deployment, operation and maintenance of microservices will become easier and easier.
As for the second problem, there is no general solution to solve the transaction problems caused by microservices. Distributed transactions have become the biggest obstacle to the implementation of microservices, which is also the most challenging technical problem. The following will discuss various solutions of distributed transactions under microservice architecture.
How to overcome the problem of distributed transactions under microservice architecture?
What is a transaction
A transaction is a logical processing unit consisting of a set of SQL statements. A transaction has the following four properties, often referred to simply as the ACID property of the transaction:
Atomicity: A transaction is an atomic unit of operation in which all or none of the modifications to data are performed.
Consistent: Data must be Consistent at the beginning and completion of a transaction. This means that all relevant data rules must be applied to transaction modifications to preserve data integrity.
Isolation: The database system provides some Isolation mechanism to ensure that transactions are executed in a “separate” environment that is not affected by external concurrent operations. The database transaction isolation levels from low to high are Read uncommitted, Read committed, Repeatable, Serializable.
Durability: After transactions complete, their modifications to data are permanent and persist even if system failures occur.
Typical scenario of distributed transaction:
Bank transfer business is a typical distributed transaction scenario, which usually includes the following three situations:
A. Intra-branch transfer: intra-branch transfer of the same bank
B. Intra-bank transfer: transfer between different branches of the same bank
C. Inter-bank transfer: transfer between different banks’ systems
For traditional centralized architectures, A and B are usually local transactions, and C is distributed transactions. After business microservice transformation, roll-in and roll-out are usually different microservices, and the same microservice usually runs in different instances. A may become A distributed transaction, or it may be circumvented by some means and completed within A local transaction. B and C are hard to avoid and can only be distributed transactions.
Microservices best practices recommend avoiding distributed transactions as much as possible, but in many business scenarios (such as the B and C transfer scenario above), distributed transactions are a technical problem.
Common solutions for distributed transactions
In order to solve the problem of distributed system consistency, many typical protocols and algorithms have been summarized in the process of balancing performance and data consistency. The most common of these is the 2 Phase Commitment Protocol.
Two-stage proposal submission
Transaction middleware and database use two-phase commit to complete a global transaction through the XA interface specification, which is based on the two-phase commit protocol.
The first stage is the voting stage, in which all participants give feedback on the success of the transaction to the coordinator. The second phase is the implementation phase, where the coordinator, based on the feedback of all participants, notifies all participants and synchronously commits or rolls back on all branches.
Two-phase commit solutions are widely used, with typical commercial applications including Oracle Tuxedo and IBM CICS. It has the advantage of being less intrusive into business code, but the disadvantages are also obvious:
Low performance: Due to the characteristics of XA protocol, transaction resources cannot be released for a long time, and the locking period is long. Moreover, it cannot be intervened at the application layer, and the performance of scenarios with high data concurrency conflicts is poor.
Single point of problem: The coordinator plays an important role in the entire two-phase commit process, and if the coordinator’s server goes down, the entire database cluster will be affected. For example, in the second phase, participants will remain blocked if the coordinator cannot send transaction commit or rollback notifications properly due to a failure.
Synchronous blocking: In the two-phase commit execution process, all participants need to follow the unified scheduling of the coordinator, so they are blocked and cannot engage in other operations, which is extremely inefficient.
Therefore, the two-phase commit scheme is rarely used in Internet services and cannot meet the high concurrency requirements.
In order to make up for the problem of low performance brought by this scheme, we have come up with many solutions to solve it, through the application layer, that is, the way of invading business, the typical TCC scheme and the final consistency scheme based on reliable message.
TCC transaction scheme
TCC transaction model is widely used in e-commerce and financial fields. The TCC solution is actually an improvement over the two-phase commit. Each branch of the entire business logic is explicitly divided into three operations: Try, Confirm, and Cancel. The Try part completes the business preparation, the Confirm part completes the business submission, and the Cancel part completes the transaction rollback. The basic principle is shown in the figure below.
When a transaction starts, the business application registers with the transaction coordinator to start the transaction. The business application then calls the try interface of all the services to complete a phase of preparation. The transaction coordinator then decides to call the Confirm or Cancel interface, depending on what the try interface returns. If the interface call fails, it will be retried.
TCC allows applications to define their own granularity of database operations, which makes it possible to reduce lock conflicts and improve throughput. For example, Huawei distributed transaction middleware DTM has high performance, and the common configuration server can support global transactions of 10,000 + TPS and branch transactions of 30,000 + TPS. Of course, TCC also has its shortcomings, which are mainly reflected in the following two aspects:
Services are highly intrusive. Each branch of the service logic needs to implement the try, Confirm, and Cancel operations. Therefore, the application is highly intrusive and costly.
The implementation is difficult. In order to meet the requirement of consistency, idempotent operation should be fully considered, repeated execution should be allowed, resource suspension should be prevented, concurrent access control and data visibility control should be done.
As a result, TCC solutions are mostly adopted by large companies with strong R&D capabilities and urgent needs. Microservices advocate the lightweight of services, while the processing logic of many transactions in TCC scheme needs to be implemented by its own coding, which is complicated and requires a large amount of development.
Message-based final consistency scheme
The message consistency scheme ensures the consistency of data operation of upstream and downstream applications through message middleware. The basic idea is to place the local operation and the send message in a local transaction, ensuring that either the local operation and the send message succeed or both fail. The downstream application subscribes to the message system and performs operations after receiving the message.
In essence, the message final consistency scheme converts a distributed transaction into two local transactions and then relies on the retry mechanism of the downstream business to achieve the final consistency. Message-based final consistency schemes are also highly intrusive applications that require a lot of business transformation and are very costly.
Invasion code scheme is based on the existing situation of “forced” to launch solution, in fact they implement very not elegant, TCC, for example, a transaction calls are usually accompanied by a series of increases on the transaction interface contrarian, submit logic inevitably accompanied by a rollback logic, this code will make the project very bloated, high maintenance costs.
In view of the pain points of the distributed transaction solution mentioned above, it is obvious that our ideal distributed transaction solution must have good performance and be non-intrusive to the business, the business layer does not need to care about the constraints of the distributed transaction mechanism, and achieve the separation of transaction and business, which is the non-intrusive transaction mainly recommended in this paper.
Non-intrusive transaction scheme
A. Typical architecture
The typical architecture of non-intrusive transactions is shown in the figure below:
The transaction core components include:
A Transaction Coordinator (TC) is a distributed Transaction brain that generates and maintains global transactions, branch transactions, and promotes the two-stage processing of Transaction submission and rollback. TC Server provides transaction coordination capabilities in a clustered form.
Transaction Manager (TM) : Defines the boundaries of global transactions and communicates with the Transaction coordinator to start, commit, or roll back global transactions.
Resource Manager (RM) : A Resource Manager that manages the resources for branch transaction processing, communicates with the transaction coordinator to open and close the transaction branch, and receives instructions from the transaction coordinator to complete the two-phase branch transaction commit or rollback.
Lock Server (LS) : distributed Lock Server, which can be used to query, Lock, and release the resources of ongoing distributed transaction operations.
A distributed transaction is called a global transaction, and a branch transaction is a local transaction that meets ACID. The core idea of non-invasive transaction is that the resource manager intercepts business SQL, parses it and performs some additional data processing, generates undo logs and saves them. Once global transaction rollback occurs, all branch transaction rollback is completed through the undo logs corresponding to each branch transaction.
It is easy to imagine that two global transactions that modify the same data in parallel could result in data errors resulting from undo log complete rollback. The solution is to Lock the transaction modification data through the Lock Server. The Lock is released immediately after the global transaction is committed, and the global transaction rollback waits for the branch transaction rollback to complete.
B. Typical process
Typical distributed transactions are performed as follows:
1.TM requests TC to start a new global transaction, TC creates the global transaction and returns the global transaction ID (XID).
2. Build transaction context based on XID and propagate through the call chain of microservice.
3.RM found itself in the transaction context, obtained the global transaction ID, parsed SQL, generated undo log and distributed transaction lock data, and requested TC to create a branch transaction.
4. The TC is locked by LS. After the lock is successfully added, the BRANCH transaction ID is created and returned.
5.RM associates the branch transaction ID with the Undo log and commits it in a local transaction with the original SQL.
6. Repeat 3 to 5 to create a branch transaction for each local transaction within the scope of the global transaction.
7. If there is no exception within the global transaction boundary, TM requests TC to submit the global transaction; If there is an exception, TM requests TC to roll back the global transaction.
- TC marks the global transaction status and releases the lock via LS immediately if it is committed. Push all branch transactions under the global transaction corresponding to XID for two-stage processing, and send the request to RM.
9.RM commits or rolls back the branch transaction and returns the status to the TC.
10.TC uses LS to release locks for branches that have completed rollback. When all branches are complete, the global transaction results are returned to TM.
The two-stage transaction processing is more critical, which is highlighted here.
C. Branch transaction commit
If the global transaction status is commit, a branch commit is initiated for each branch, as shown in the following figure:
RM received the branch transaction commit request, first saved the branch transaction ID in the queue and returned. A thread periodically retrieves a batch of branch transaction ids from the queue and constructs SQL to delete the corresponding undo log in batch. Branch transaction commits can be processed asynchronously in batches because global transactions have already been committed and undo log is no longer important as an intermediate state, as long as it is cleaned periodically.
D. Branch transaction rollback
If the global transaction status is rolled back or timed out, a branch rollback is initiated for each branch, as shown in the following figure:
RM received the branch transaction rollback request, started a local transaction, found the corresponding Undo log based on the branch ID, constructed the rollback SQL statement and executed it, deleted the Undo log, and then committed the local transaction. After receiving the response, TC clears the resources occupied by the branch through LS.
E. Performance analysis
A significant performance advantage of non-invasive transactions over XA two-phase commit is that resources are locked for less time. In real business, we know that the vast majority of transactions are committed and a small percentage are rolled back. For XA, resources are released in two phases, whether committed or rolled back. For the non-intrusive transactions described in this article, there is no need to take locks in phase 2 for global transactions in the commit state, and only a small proportion of global transactions in the rollback state need to release locks in phase 2.
Non-invasive transactions are not limited to the DATABASE XA interface and are fully controllable. Key components such as TCS, RM, and LS have a significant impact on performance. High performance can be achieved through proper design and implementation. The practice of non-invasive transactions has proved that it can easily meet the performance requirements of most highly concurrent business scenarios.
Typical core business system distributed transaction transformation example
Huawei Cloud Stack is modified for distributed transactions of the core service system of a carrier. The customer service challenges the distributed system in common concurrent scenarios such as the peak hours of charging and charging services at the beginning of the month.
- High concurrency distributed transaction access account table, XA two-stage commit due to long lock time, serious impact on business. Overall performance requirements of 1000+ TPS, traditional or open source distributed transactions are difficult to meet the requirements of high availability and performance.
- XA transaction consistency issues with other database operations. You need to treat the XA transaction as one branch of the DTM TCC transaction and the other database operations as another branch.
Huawei cloud Stack hybrid cloud solution for distributed transaction middleware DTM through a series of technical innovation, to provide high performance, high availability, high reliability, high safety, low invasion, easy to use a distributed transaction service, support TCC and non-intrusive affairs two models, power enterprises value-chain reconstruction, elegant solution to data consistency under the distributed system problems.
Click to follow, the first time to learn about Huawei cloud fresh technology ~