Distributed Transaction Exploration (1)

1. XA specification

An organization called X/Open defines the Distributed Transaction Processing (DTP) model, which has several roles: AP (Application), TM (Transaction Manager), RM (Resource Manager), and CRM (Communication Resource Manager). The Application is our own system. The TM is a component embedded in the system that specializes in managing transactions across multiple databases. An RM is simply a database (such as MySQL), and the CRM can be message-oriented middleware (but does not have to be).

The model then defines a very important concept, the global transaction. A global transaction is a single transaction that spans multiple databases: it involves operations on several databases and must guarantee that if any one operation fails, the operations on all the other databases are rolled back. This is what we call a distributed transaction.

Put bluntly, XA is the interface specification between the TM and the RM: it defines how the distributed transaction management component (the TM) communicates with each database. The TM talks to and interacts with every database through the interfaces that XA defines. Note that XA is only a specification; the implementation is provided by each database vendor. MySQL, for example, provides the interface functions and class libraries that implement the XA specification.
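In Java, the XA interface shows up as `javax.transaction.xa.XAResource`, which ships with the JDK. The sketch below is only an illustration of the "TM calls, RM implements" split: `RecordingResource`, `SimpleXid` and `XaInterfaceDemo` are hypothetical names, and the fake RM stores nothing, it just records which calls the TM made.

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal Xid; real drivers ship their own implementation.
final class SimpleXid implements Xid {
    private final int formatId;
    private final byte[] gtrid;
    private final byte[] bqual;

    SimpleXid(int formatId, String gtrid, String bqual) {
        this.formatId = formatId;
        this.gtrid = gtrid.getBytes(StandardCharsets.UTF_8);
        this.bqual = bqual.getBytes(StandardCharsets.UTF_8);
    }

    @Override public int getFormatId() { return formatId; }
    @Override public byte[] getGlobalTransactionId() { return gtrid.clone(); }
    @Override public byte[] getBranchQualifier() { return bqual.clone(); }
}

// Hypothetical in-memory RM: it only records which XA calls it received.
final class RecordingResource implements XAResource {
    final List<String> calls = new ArrayList<>();

    @Override public void start(Xid xid, int flags) { calls.add("start"); }
    @Override public void end(Xid xid, int flags) { calls.add("end"); }
    @Override public int prepare(Xid xid) { calls.add("prepare"); return XA_OK; }
    @Override public void commit(Xid xid, boolean onePhase) { calls.add("commit"); }
    @Override public void rollback(Xid xid) { calls.add("rollback"); }
    @Override public void forget(Xid xid) { calls.add("forget"); }
    @Override public Xid[] recover(int flag) { return new Xid[0]; }
    @Override public boolean isSameRM(XAResource other) { return other == this; }
    @Override public int getTransactionTimeout() { return 0; }
    @Override public boolean setTransactionTimeout(int seconds) { return false; }
}

final class XaInterfaceDemo {
    // Plays the TM role for one branch, using nothing but the XAResource interface.
    static boolean runOneBranch(XAResource rm, Xid xid) {
        try {
            rm.start(xid, XAResource.TMNOFLAGS);   // begin work on the branch
            rm.end(xid, XAResource.TMSUCCESS);     // work on the branch is finished
            if (rm.prepare(xid) == XAResource.XA_OK) {
                rm.commit(xid, false);             // false = not a one-phase commit
                return true;
            }
            rm.rollback(xid);
            return false;
        } catch (XAException e) {
            return false;
        }
    }
}
```

Because `runOneBranch` depends only on the interface, the same TM-side code could be pointed at any vendor's `XAResource`, which is the point of having a specification.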

2. 2PC theory

2PC is, simply put, the theory of distributed transactions that goes hand in hand with the XA specification; it can also be called a specification or a protocol. The two-phase commit protocol (2PC) spells out many of the details needed to actually implement distributed transactions under XA.

2.1 Preparation Phase

The TM sends a prepare message to each database, asking it to get ready for its part of the distributed transaction. Each database first opens a local transaction and executes the SQL, so that everything is done except the final step. Note that at this point each database must be able to either commit or roll back later, so it writes the corresponding log records. Each database then returns a response to the TM: a success message if the preparation succeeded, a failure message if it did not.

2.2 Commit Phase

In the first case, the TM finds that some database failed to prepare, or it waits for a long time and some database never returns a message at all. The TM then judges the distributed transaction to have failed, since at least one database has a problem, and tells every database to roll back.

In effect, the TM tells each database to roll back its local transaction. Each database reports back to the TM once it has rolled back, and the TM then considers the distributed transaction rolled back.

In the second case, the TM receives a success message from every database. It then sends a commit message to each database, and each database commits its local transaction and notifies the TM. Once the TM sees that every database has committed successfully, it considers the whole distributed transaction successful.
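The two phases can be sketched as a small in-memory simulation. This is a minimal sketch, not a real transaction manager: `TwoPhaseCommitSketch` and `Participant` are hypothetical names, each participant's vote is scripted in advance, and a timeout is assumed to be reported as a "no" vote.

```java
import java.util.List;

final class TwoPhaseCommitSketch {
    enum State { ACTIVE, PREPARED, COMMITTED, ROLLED_BACK }

    // Hypothetical in-memory participant standing in for a database (RM).
    static final class Participant {
        final String name;
        final boolean votesYes;
        State state = State.ACTIVE;

        Participant(String name, boolean votesYes) {
            this.name = name;
            this.votesYes = votesYes;
        }

        // Phase 1: run the work, stay ready to commit or roll back, then vote.
        boolean prepare() {
            if (votesYes) {
                state = State.PREPARED;
            }
            return votesYes;
        }

        void commit() { state = State.COMMITTED; }
        void rollback() { state = State.ROLLED_BACK; }
    }

    // Plays the TM role; returns true if the global transaction committed.
    static boolean run(List<Participant> participants) {
        boolean allPrepared = true;
        for (Participant p : participants) {      // phase 1: collect votes
            if (!p.prepare()) {
                allPrepared = false;
                break;                            // one "no" is enough to fail
            }
        }
        for (Participant p : participants) {      // phase 2: same decision for all
            if (allPrepared) {
                p.commit();
            } else {
                p.rollback();
            }
        }
        return allPrepared;
    }
}
```

Calling `run` with two participants that both vote yes leaves both in `COMMITTED`; if either votes no, both end in `ROLLED_BACK`.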

2.3 Illustration

2.4 Problems

Synchronous blocking problem

After the TM sends a prepare message to an RM, the RM starts a local transaction and executes the SQL statements, which locks the affected resources. When execution finishes, the resources are not released; only the execution result is returned to the TM. The locks are released only after the TM sends a commit message and the RM commits the transaction. Until then, any other transaction that needs the locked resources stays blocked.
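The blocking window can be shown with a toy model. This is a sketch under a strong simplification: `BlockingSketch` and its `ResourceManager` are hypothetical, and a "lock" is just a row key held in a set, whereas a real database manages locks internally.

```java
import java.util.HashSet;
import java.util.Set;

final class BlockingSketch {
    // Hypothetical RM that tracks row locks by key.
    static final class ResourceManager {
        private final Set<String> lockedRows = new HashSet<>();

        // Prepare: the SQL has run and the row stays locked until phase 2.
        // Returns false if another transaction already holds the row.
        boolean prepare(String row) {
            return lockedRows.add(row);
        }

        // Commit: only now is the lock released.
        void commit(String row) {
            lockedRows.remove(row);
        }

        // Another transaction can touch the row only while nobody holds its lock.
        boolean canAccess(String row) {
            return !lockedRows.contains(row);
        }
    }
}
```

Between `prepare("product:1")` and `commit("product:1")`, `canAccess("product:1")` returns false. The longer the TM takes to send the commit message, the longer everyone else waits.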

TM single point problem

There is only one TM, so if the TM goes down, the whole transaction flow stops with it.

Transaction state loss problem

Suppose TM1 crashes right after sending a commit message to RM1, and a new TM, TM2, is elected. TM2 does not know the state of the in-flight transaction: it cannot tell which RMs have already been sent a commit message.

Split-brain problem

Suppose there are three RMs. Two of them receive the commit message, and the third cannot receive it because of a network problem. The data is now inconsistent across the RMs, and the distributed transaction is broken.

3. 3PC theory

3PC was introduced to solve some of the problems of 2PC and to add some optimizations.

3.1 CanCommit phase

The TM sends a CanCommit message to each database, and each database returns a result. In this phase the databases do not execute any actual SQL; they only report whether they are able to take part in the transaction.

3.2 PreCommit phase

If every database returns success for the CanCommit message, the protocol enters the PreCommit phase and the TM sends a PreCommit message to each database. This phase is equivalent to phase 1 of 2PC. If any database returns a failure for the CanCommit message, the TM sends an abort message to every database and the distributed transaction ends.

3.3 DoCommit phase

If every database returns success for the PreCommit phase, the TM sends a DoCommit message to each database to commit the transaction. If every database reports a successful commit to the TM, the distributed transaction succeeds. If any database returns a failure for PreCommit, or fails to respond before the timeout, the TM considers the distributed transaction failed and sends an abort message to every database to roll back. Once every database has rolled back and notified the TM, the distributed transaction is rolled back.
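The three phases can be sketched from the TM's point of view. As before, this is a minimal in-memory sketch: `ThreePhaseCommitSketch` and `Participant` are hypothetical names, each participant's answers are scripted, and a timeout is assumed to count as a failure answer.

```java
import java.util.List;

final class ThreePhaseCommitSketch {
    enum State { INIT, CAN_COMMIT_OK, PRE_COMMITTED, COMMITTED, ABORTED }

    // Hypothetical in-memory participant; the two flags script its answers.
    static final class Participant {
        final boolean canCommitAnswer;
        final boolean preCommitAnswer;
        State state = State.INIT;

        Participant(boolean canCommitAnswer, boolean preCommitAnswer) {
            this.canCommitAnswer = canCommitAnswer;
            this.preCommitAnswer = preCommitAnswer;
        }

        // Phase 1: no SQL runs, the participant only says whether it could take part.
        boolean canCommit() {
            if (canCommitAnswer) {
                state = State.CAN_COMMIT_OK;
            }
            return canCommitAnswer;
        }

        // Phase 2: equivalent to prepare in 2PC.
        boolean preCommit() {
            if (preCommitAnswer) {
                state = State.PRE_COMMITTED;
            }
            return preCommitAnswer;
        }

        void doCommit() { state = State.COMMITTED; }   // phase 3
        void abort() { state = State.ABORTED; }
    }

    // Plays the TM role; returns true if the global transaction committed.
    static boolean run(List<Participant> participants) {
        for (Participant p : participants) {
            if (!p.canCommit()) {
                return abortAll(participants);
            }
        }
        for (Participant p : participants) {
            if (!p.preCommit()) {
                return abortAll(participants);
            }
        }
        for (Participant p : participants) {
            p.doCommit();
        }
        return true;
    }

    private static boolean abortAll(List<Participant> participants) {
        for (Participant p : participants) {
            p.abort();
        }
        return false;
    }
}
```

The sketch covers only the TM-driven path; the participant-side timeout that 3PC adds is a separate mechanism and is not modeled here.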

3.4 Improvements

3PC introduces the CanCommit phase and adds a timeout mechanism to the DoCommit phase. If a database has received a PreCommit message and returned success, and then receives neither a DoCommit nor an abort message from the TM before the timeout expires, it concludes that the TM may have failed, performs the DoCommit operation on its own, and commits the transaction.

Why is committing on timeout reasonable? If a database received a PreCommit message at all, every database must have returned success for CanCommit in the first phase; otherwise the TM would not have sent PreCommit. The timeout mechanism therefore depends on the CanCommit phase: the extra phase gives each database enough information to act on its own when the timeout fires. This addresses the single-point problem of the TM going down.

Resource blocking is also reduced. A database that never receives a DoCommit message does not hold its locks indefinitely; it commits on its own and releases the resources, so blocking is somewhat less severe than in 2PC.

3.5 Defects

If the TM sends abort messages during the DoCommit phase and one database fails to receive the message because of a network partition (the split-brain problem), that database commits on its own after the timeout while the others roll back, which leaves the data inconsistent across databases.
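The defect is easy to reproduce in a toy model. This is a sketch only: `TimeoutDefectSketch` and `Participant` are hypothetical names, and the timeout is modeled as an explicit `onTimeout()` call rather than a real timer.

```java
final class TimeoutDefectSketch {
    enum State { PRE_COMMITTED, COMMITTED, ABORTED }

    // Hypothetical participant that has already acknowledged PreCommit.
    static final class Participant {
        State state = State.PRE_COMMITTED;

        void onDoCommit() { state = State.COMMITTED; }
        void onAbort() { state = State.ABORTED; }

        // 3PC timeout rule: no DoCommit or abort before the deadline means commit.
        void onTimeout() {
            if (state == State.PRE_COMMITTED) {
                state = State.COMMITTED;
            }
        }
    }

    // The TM decides to abort. Returns true if both participants end in the same state.
    static boolean endsConsistent(boolean abortReachesEveryone) {
        Participant a = new Participant();
        Participant b = new Participant();
        a.onAbort();                    // the message to a always arrives
        if (abortReachesEveryone) {
            b.onAbort();
        } else {
            b.onTimeout();              // the message to b is lost, so b times out
        }
        return a.state == b.state;
    }
}
```

`endsConsistent(true)` holds, but `endsConsistent(false)` does not: one participant has rolled back while the other has committed.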

MySQL supports XA

The logic here is fairly simple. First obtain a connection to each database, and from each connection create the corresponding XAConnection and XAResource. Then set the ID of the distributed transaction and the ID of each database's sub-transaction, and assemble and execute the SQL statements for each database. After the statements have executed, send a prepare message to each database and check the results. If every prepare succeeded, enter the second phase and commit; otherwise, roll back.

MySQL XA implements the 2PC theory.

import com.mysql.jdbc.jdbc2.optional.MysqlXAConnection;
import com.mysql.jdbc.jdbc2.optional.MysqlXid;

import javax.sql.XAConnection;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// The imports assume MySQL Connector/J 5.x; the class name is arbitrary
public class MysqlXaDemo {
    public static void main(String[] args) throws SQLException {
        // Create the RM instance for the product database
        Connection productConnection = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/product", "root", "root");
        // The second argument (true) turns on logging of the XA commands
        XAConnection productXAConnection = new MysqlXAConnection(
                (com.mysql.jdbc.Connection) productConnection, true);
        // This XAResource is the object through which the RM (Resource Manager) is driven
        XAResource productResource = productXAConnection.getXAResource();

        // Create the RM instance for the order database
        Connection orderConnection = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/order", "root", "root");
        XAConnection orderXAConnection = new MysqlXAConnection(
                (com.mysql.jdbc.Connection) orderConnection, true);
        XAResource orderResource = orderXAConnection.getXAResource();

        // These two values are the shared parts of the distributed transaction ID (XID)
        byte[] ptrid = "p123".getBytes();
        int formatId = 1;

        try {
            // Identity of the product database's sub-transaction: the work done in the
            // product database is one sub-transaction of the distributed transaction
            byte[] bqual1 = "c001".getBytes();
            Xid xid1 = new MysqlXid(ptrid, bqual1, formatId);

            // start() and end() delimit the SQL that the product database
            // executes as part of the distributed transaction
            productResource.start(xid1, XAResource.TMNOFLAGS);
            PreparedStatement productPreparedStatement = productConnection.prepareStatement(
                    "UPDATE product SET name = 'new' WHERE id=1");
            productPreparedStatement.execute();
            productResource.end(xid1, XAResource.TMSUCCESS);

            // Identity of the order database's sub-transaction. The XIDs of the
            // sub-transactions share the global part and differ in the branch part
            byte[] bqual2 = "c002".getBytes();
            Xid xid2 = new MysqlXid(ptrid, bqual2, formatId);
            // start() and end() delimit the SQL that the order database executes
            orderResource.start(xid2, XAResource.TMNOFLAGS);
            // `order` is a reserved word in MySQL, so the table name is quoted
            PreparedStatement orderPreparedStatement = orderConnection.prepareStatement(
                    "UPDATE `order` SET POINT=POINT+1.2 WHERE id=1");
            orderPreparedStatement.execute();
            orderResource.end(xid2, XAResource.TMSUCCESS);

            // So far nothing has been committed; each database has only run its own SQL

            // Phase 1 of 2PC: send prepare to both databases. Each one gets its
            // sub-transaction ready to commit but does not commit yet
            int productPrepareResult = productResource.prepare(xid1);
            int orderPrepareResult = orderResource.prepare(xid2);

            // Phase 2 of 2PC: tell both databases to commit, or to roll back
            if (productPrepareResult == XAResource.XA_OK
                    && orderPrepareResult == XAResource.XA_OK) {
                // Both databases answered prepare with OK: commit both local transactions
                productResource.commit(xid1, false);
                orderResource.commit(xid2, false);
            } else {
                // Not every database answered OK: roll back both local transactions
                productResource.rollback(xid1);
                orderResource.rollback(xid2);
            }
        } catch (XAException e) {
            e.printStackTrace();
        }
    }
}