Preface
Among distributed transaction solutions, TCC is one of the classic patterns. It borrows the idea of two-phase commit to achieve eventual consistency across distributed transactions. Lately, though, I have grown to dislike TCC.
WeChat official account “Programmer Jinjunzhu”, written by Jinjunzhu.
TCC review
What exactly is TCC?
Take a classic e-commerce system as an example. When a customer buys an item, three services must cooperate to complete the purchase: the order service creates the order, the inventory service deducts stock, and the account service deducts the payment amount, as shown in the diagram below:
If we follow the pattern shown above, with each service committing its own local transaction, data inconsistencies are likely. The three services use different databases, so the purchase is not a single atomic operation. For example, if the order service commits successfully but the account service fails, the data is left inconsistent.
TCC (Try-Confirm-Cancel) applies the two-phase commit idea. In the try phase, each service attempts to reserve the resources it needs. If every reservation succeeds, the transaction is committed in the commit phase; otherwise the reservations are released in the cancel phase. This requires a coordination node that issues commands to the three services and collects the result of each service’s branch transaction. The try phase is shown in the following figure:
If every service reserves its resources successfully in the try phase, the coordination node issues the commit command to each service, as shown in the following figure:
Once all services have committed, the transaction is complete.
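The flow can be sketched roughly as follows. This is only an illustration of the try-then-commit-or-cancel idea, not any framework’s API: the `TccParticipant` interface, the `TccCoordinator` class, and their method names are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical contract each service would expose to the coordination node.
interface TccParticipant {
    boolean tryReserve(String xid); // phase 1: reserve resources for this XID
    void commit(String xid);        // phase 2a: make the reservation permanent
    void cancel(String xid);        // phase 2b: release the reservation
}

// Hypothetical coordination node: try everything, then commit all or cancel all.
class TccCoordinator {
    private final List<TccParticipant> participants;

    TccCoordinator(List<TccParticipant> participants) {
        this.participants = participants;
    }

    /** Returns true if the global transaction committed, false if it was cancelled. */
    boolean execute(String xid) {
        List<TccParticipant> attempted = new ArrayList<>();
        for (TccParticipant participant : participants) {
            attempted.add(participant);
            boolean reserved;
            try {
                reserved = participant.tryReserve(xid);
            } catch (RuntimeException e) {
                reserved = false; // treat an error as a failed reservation
            }
            if (!reserved) {
                // Cancel goes to every participant that was asked to try,
                // including the one whose try just failed.
                for (TccParticipant a : attempted) {
                    a.cancel(xid);
                }
                return false;
            }
        }
        for (TccParticipant participant : participants) {
            participant.commit(xid);
        }
        return true;
    }
}
```

A real coordination node would also persist the state of each global transaction and retry phase two on failure; the sketch leaves that out.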
Code implementation
The coordination node must assign each distributed transaction a global transaction ID, called an XID, which is bound to each service’s local transaction. Using the account service as an example, let’s look at the code for the three phases, try/commit/cancel:
This code uses JDBC to handle the local transaction. In the try phase we obtain a connection and save it in connectionMap, keyed by the XID. In the commit/cancel phase we pull the connection out of connectionMap and commit or roll back.
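The approach described above can be sketched as follows. Treat it as an illustrative reconstruction under assumptions: the class name `AccountTccService`, the `account` table with `id` and `balance` columns, and the method signatures are all made up for this example. Only `connectionMap` and the use of JDBC come from the description.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.sql.DataSource;

// Sketch only: table, column, and method names are assumptions.
class AccountTccService {
    private final DataSource dataSource;
    // XID -> open connection that holds the uncommitted local transaction
    private final Map<String, Connection> connectionMap = new ConcurrentHashMap<>();

    AccountTccService(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Try: run the deduction inside a local transaction but do not commit it.
    boolean tryDeduct(String xid, long accountId, long amount) {
        Connection conn = null;
        boolean held = false;
        try {
            conn = dataSource.getConnection();
            conn.setAutoCommit(false);
            int updated;
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPDATE account SET balance = balance - ? WHERE id = ? AND balance >= ?")) {
                ps.setLong(1, amount);
                ps.setLong(2, accountId);
                ps.setLong(3, amount);
                updated = ps.executeUpdate();
            }
            if (updated == 1) {
                connectionMap.put(xid, conn); // keep the connection for phase two
                held = true;
            }
            return held;
        } catch (SQLException e) {
            throw new IllegalStateException("try failed for XID " + xid, e);
        } finally {
            if (!held && conn != null) {
                abandon(conn);
            }
        }
    }

    // Commit: finish the local transaction held for this XID.
    boolean commit(String xid) {
        return finish(xid, true);
    }

    // Cancel: roll back the local transaction held for this XID.
    boolean cancel(String xid) {
        return finish(xid, false);
    }

    private boolean finish(String xid, boolean commit) {
        Connection conn = connectionMap.remove(xid);
        if (conn == null) {
            return false; // this instance holds no connection for the XID
        }
        try {
            if (commit) {
                conn.commit();
            } else {
                conn.rollback();
            }
            return true;
        } catch (SQLException e) {
            throw new IllegalStateException("phase two failed for XID " + xid, e);
        } finally {
            try { conn.close(); } catch (SQLException ignored) { }
        }
    }

    private static void abandon(Connection conn) {
        try { conn.rollback(); } catch (SQLException ignored) { }
        try { conn.close(); } catch (SQLException ignored) { }
    }
}
```

Note that `connectionMap` lives in the memory of one service instance, and the connection stays open between the two phases.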
Problems
Is there a problem with the code implementation of the TCC pattern above?
Service cluster
Suppose the order service is deployed as a cluster of three machines. If the try request is sent to order service 1 and the commit request is sent to order service 2, the connectionMap of order service 2 has no entry for XID = 123, so the order service’s local transaction cannot be committed.
So if the transaction is committed by holding on to the connection, the coordination node must ensure that the try/commit/cancel requests for the same XID are all sent to the same machine.
The solution is either to modify the registry or to have the coordination node maintain its own service list. The former couples the registry to business code, while the latter effectively abandons the registry.
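To make the second option concrete, here is a rough sketch of a coordination node that keeps its own instance list and pins every phase of one XID to one machine. `StickyRouter`, its method names, and the hash-based choice of instance are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical router kept inside the coordination node.
class StickyRouter {
    private final List<String> instances; // service list maintained by the coordinator
    private final Map<String, String> instanceByXid = new ConcurrentHashMap<>();

    StickyRouter(List<String> instances) {
        this.instances = new ArrayList<>(instances);
    }

    // Pick an instance for the try request and remember the choice.
    String routeTry(String xid) {
        return instanceByXid.computeIfAbsent(xid,
                k -> instances.get(Math.floorMod(k.hashCode(), instances.size())));
    }

    // Commit and cancel must go to the instance that handled try.
    String routeSecondPhase(String xid) {
        String instance = instanceByXid.get(xid);
        if (instance == null) {
            throw new IllegalStateException("no try was routed for XID " + xid);
        }
        return instance;
    }

    // Forget the mapping once the global transaction has finished.
    void complete(String xid) {
        instanceByXid.remove(xid);
    }
}
```

Even this small sketch shows the cost: the coordinator now duplicates the registry’s job, and if the chosen instance goes down between the two phases, the held connection is lost with it.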
Empty commit
Both the registry approach and the coordination-node approach require a lot of work. Is there another way? Here is an improvement, this time using MyBatis as the ORM framework. The code is as follows:
The commit phase is an empty commit and has no effect on the branch transaction.
Another option is to simply return true in the try phase and commit the local transaction in the commit phase.
But both approaches run counter to the idea behind TCC.
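The first variant (do the real work in try, leave commit empty) might look roughly like this. To keep the sketch self-contained, in-memory maps stand in for the account table and the MyBatis mapper, and the compensating cancel is my assumption about how the deduction would be undone; none of these names come from a real API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: maps stand in for database tables accessed through an ORM.
class EmptyCommitAccountService {
    private final Map<Long, Long> balances = new ConcurrentHashMap<>();
    // XID -> {accountId, amount}, so cancel knows what to compensate
    private final Map<String, long[]> deductions = new ConcurrentHashMap<>();

    void open(long accountId, long balance) {
        balances.put(accountId, balance);
    }

    long balanceOf(long accountId) {
        return balances.get(accountId);
    }

    // Try: the deduction is applied and committed right away.
    synchronized boolean tryDeduct(String xid, long accountId, long amount) {
        Long balance = balances.get(accountId);
        if (balance == null || balance < amount) {
            return false;
        }
        balances.put(accountId, balance - amount);
        deductions.put(xid, new long[] {accountId, amount});
        return true;
    }

    // Commit: nothing left to do, hence the "empty commit".
    boolean commit(String xid) {
        return true;
    }

    // Cancel: compensate by adding the deducted amount back.
    synchronized boolean cancel(String xid) {
        long[] deduction = deductions.remove(xid);
        if (deduction != null) {
            balances.merge(deduction[0], deduction[1], Long::sum);
        }
        return true;
    }
}
```

Because no connection is held between phases, any instance can handle commit or cancel, provided the state lives in a shared database. The price is that try no longer reserves anything: the money is already gone before the global transaction is decided.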
Idempotency
If the coordination node retries on timeout, the following situation can occur: order service 1 goes down after executing the try method. Because the coordination node never receives a success response, it retries, and the order service executes the try method a second time.
To avoid this problem, the try/commit/cancel methods must include idempotency logic that records the execution status of the local transaction associated with the global transaction XID.
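A minimal sketch of such idempotency logic, using an inventory branch as the example. The `IdempotentInventory` class, its `Status` enum, and the available/frozen stock model are assumptions; in a real service the status record would live in the database, in the same local transaction as the business update.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: in-memory state stands in for a status table keyed by XID.
class IdempotentInventory {
    enum Status { TRIED, COMMITTED, CANCELLED }

    private int available;
    private int frozen;
    private final Map<String, Status> statusByXid = new HashMap<>();
    private final Map<String, Integer> quantityByXid = new HashMap<>();

    IdempotentInventory(int available) {
        this.available = available;
    }

    int available() { return available; }
    int frozen() { return frozen; }

    synchronized boolean tryReserve(String xid, int quantity) {
        Status status = statusByXid.get(xid);
        if (status != null) {
            return status == Status.TRIED; // retried try: do not reserve twice
        }
        if (available < quantity) {
            return false;
        }
        available -= quantity;
        frozen += quantity;
        statusByXid.put(xid, Status.TRIED);
        quantityByXid.put(xid, quantity);
        return true;
    }

    synchronized boolean commit(String xid) {
        Status status = statusByXid.get(xid);
        if (status == Status.COMMITTED) {
            return true; // retried commit: already done
        }
        if (status != Status.TRIED) {
            return false;
        }
        frozen -= quantityByXid.get(xid); // reserved stock leaves for good
        statusByXid.put(xid, Status.COMMITTED);
        return true;
    }

    synchronized boolean cancel(String xid) {
        Status status = statusByXid.get(xid);
        if (status == Status.CANCELLED) {
            return true; // retried cancel: already done
        }
        if (status != Status.TRIED) {
            return false;
        }
        int quantity = quantityByXid.get(xid);
        frozen -= quantity;
        available += quantity; // give the reservation back
        statusByXid.put(xid, Status.CANCELLED);
        return true;
    }
}
```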
Empty rollback
When a framework is used to implement the TCC pattern, an empty rollback can occur.
As shown in the figure above, the try method failed because the order service 1 node went down. But the global transaction had already started, so the framework has to drive it to a final state, which means calling the order service’s cancel method to roll back. As a result, the order service runs a cancel method that has nothing to roll back.
To solve this problem, the try phase must record the execution status of the branch transaction for the XID, and the cancel phase decides what to do based on this record.
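As a rough illustration, a cancel method can look up the try record and treat a missing record as an empty rollback. The class, the `CancelResult` enum, and the stock model below are assumptions for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: the map stands in for a per-XID branch record written by try.
class EmptyRollbackAwareInventory {
    enum CancelResult { ROLLED_BACK, EMPTY_ROLLBACK }

    private int available;
    private int frozen;
    private final Map<String, Integer> triedQuantityByXid = new HashMap<>();

    EmptyRollbackAwareInventory(int available) {
        this.available = available;
    }

    int available() { return available; }
    int frozen() { return frozen; }

    synchronized boolean tryReserve(String xid, int quantity) {
        if (available < quantity) {
            return false;
        }
        available -= quantity;
        frozen += quantity;
        triedQuantityByXid.put(xid, quantity); // record that try really ran
        return true;
    }

    synchronized CancelResult cancel(String xid) {
        Integer quantity = triedQuantityByXid.remove(xid);
        if (quantity == null) {
            // Try never ran here, so there is nothing to release.
            return CancelResult.EMPTY_ROLLBACK;
        }
        frozen -= quantity;
        available += quantity;
        return CancelResult.ROLLED_BACK;
    }
}
```

Without the record, cancel would release stock that was never frozen and corrupt the counts.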
Suspension
With Seata, if an empty rollback occurs, the global transaction ends once the cancel method has executed. However, because of network problems, the order service may receive the delayed try request only afterwards. The try method then runs and reserves resources successfully, but these resources can never be released.
The solution is to record the execution status of the branch transaction for the XID in the cancel method, and to check in the try phase whether the branch transaction has already been rolled back.
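A minimal sketch of that guard: cancel always leaves a CANCELLED marker for the XID, and try refuses to reserve when it finds one. The class and its names are assumptions, and the in-memory maps stand in for a status table.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: cancel writes a marker that a late try checks before reserving.
class SuspensionGuardedInventory {
    enum Status { TRIED, CANCELLED }

    private int available;
    private int frozen;
    private final Map<String, Status> statusByXid = new HashMap<>();
    private final Map<String, Integer> quantityByXid = new HashMap<>();

    SuspensionGuardedInventory(int available) {
        this.available = available;
    }

    int available() { return available; }
    int frozen() { return frozen; }

    synchronized boolean cancel(String xid) {
        // Record the rollback even if try never arrived (empty rollback).
        Status previous = statusByXid.put(xid, Status.CANCELLED);
        if (previous == Status.TRIED) {
            int quantity = quantityByXid.remove(xid);
            frozen -= quantity;
            available += quantity;
        }
        return true;
    }

    synchronized boolean tryReserve(String xid, int quantity) {
        Status status = statusByXid.get(xid);
        if (status != null) {
            // CANCELLED: a late try after rollback, reject it.
            // TRIED: a duplicate try, nothing more to reserve.
            return status == Status.TRIED;
        }
        if (available < quantity) {
            return false;
        }
        available -= quantity;
        frozen += quantity;
        statusByXid.put(xid, Status.TRIED);
        quantityByXid.put(xid, quantity);
        return true;
    }
}
```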
High code intrusiveness
TCC’s try/commit/cancel methods are all intrusive to business code, and even more so once you account for idempotency, empty rollback, suspension, and so on.
Conclusion
TCC is a classic pattern in distributed transactions, but even with the help of a framework, the implementation is complicated.
In practice, service clustering, empty commit, idempotency, empty rollback, suspension, and other issues all need to be considered.
It is also highly intrusive to business code.