A transaction groups all of its operations into one indivisible unit of execution. Either every operation in the transaction is executed, or none of them is. This is the common understanding of transactions.
Transactions are usually associated with databases, but they are not limited to them. Some message queues, such as RocketMQ and Kafka, also support transactions. Components like these are called Resource Managers (RMs).
Distributed transactions are a newer concept that grew out of the increasingly wide use of distributed systems. Generally, the term refers to a transaction that spans RMs on different nodes. With the popularity of microservices, distributed transactions deserve more and more attention.
Local transactions
This article is intended to introduce distributed transactions, but when it comes to distributed transactions, local transactions are an unavoidable topic. So let’s take a quick look at the concept of local transactions.
ACID
ACID names the four properties that a transaction must have. They are:
1. A is for atomicity. An operation must be an indivisible unit that is either executed or not executed. It cannot be left in a state where half of the operation is executed and the other half is not.
2. C is for consistency. The system is in a consistent state before and after the transaction is executed.
3. I is for isolation. Different transactions should be isolated from each other and should not affect one another.
4. D is for durability. The effects of a committed transaction are permanent and cannot be lost due to a system restart or crash.
In general, the I in ACID leads to another set of concepts: visibility issues and isolation levels.
Visibility problems are problems caused by the visibility of operations in one transaction in another transaction. In general, the higher the visibility, the more likely it is to cause problems.
From high to low visibility, there are several problems:
1. Dirty read (reading uncommitted data). One transaction can read uncommitted changes made by another transaction. This is the most serious problem, because uncommitted data is dirty data.
2. Non-repeatable read. A transaction reads a record, then reads the same record a second time and gets a different result. The reason is that between the two reads, another transaction updated the record and committed it.
3. Phantom read. A transaction runs the same query twice and gets a different number of records each time. The reason is that between the two reads, another transaction added or deleted records and committed them.
As you can see, both non-repeatable reads and phantom reads are caused by another transaction changing the data. The difference is whether the other transaction performs an UPDATE or an INSERT/DELETE.
Isolation levels and visibility issues are closely related: each isolation level exists to solve a visibility issue.
The isolation levels are as follows:
1. Read uncommitted. This is the lowest isolation level, and it solves none of the visibility issues.
2. Read committed. Solves the dirty read problem.
3. Repeatable read. Solves the non-repeatable read problem.
4. Serializable. Solves the phantom read problem.
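The pairing between isolation levels and the problems they solve can be written down as a small lookup table. The sketch below is only an illustration: the `Anomaly` and `IsolationLevel` enums are names made up for this example, while the `TRANSACTION_*` constants are the standard ones from `java.sql.Connection`.

```java
import java.sql.Connection;
import java.util.EnumSet;
import java.util.Set;

final class IsolationDemo {
    public static void main(String[] args) {
        for (IsolationLevel level : IsolationLevel.values()) {
            for (Anomaly anomaly : Anomaly.values()) {
                System.out.println(level + " prevents " + anomaly + ": " + level.prevents(anomaly));
            }
        }
    }
}

/** The three visibility problems described above (hypothetical enum). */
enum Anomaly { DIRTY_READ, NON_REPEATABLE_READ, PHANTOM_READ }

/** The four standard isolation levels, each paired with its JDBC constant. */
enum IsolationLevel {
    READ_UNCOMMITTED(Connection.TRANSACTION_READ_UNCOMMITTED, EnumSet.noneOf(Anomaly.class)),
    READ_COMMITTED(Connection.TRANSACTION_READ_COMMITTED, EnumSet.of(Anomaly.DIRTY_READ)),
    REPEATABLE_READ(Connection.TRANSACTION_REPEATABLE_READ,
            EnumSet.of(Anomaly.DIRTY_READ, Anomaly.NON_REPEATABLE_READ)),
    SERIALIZABLE(Connection.TRANSACTION_SERIALIZABLE, EnumSet.allOf(Anomaly.class));

    /** Value to pass to Connection.setTransactionIsolation. */
    final int jdbcConstant;
    private final Set<Anomaly> prevented;

    IsolationLevel(int jdbcConstant, Set<Anomaly> prevented) {
        this.jdbcConstant = jdbcConstant;
        this.prevented = prevented;
    }

    /** True if the standard definition of this level rules out the given anomaly. */
    boolean prevents(Anomaly anomaly) {
        return prevented.contains(anomaly);
    }
}
```

With a real connection, a level would be applied with `conn.setTransactionIsolation(IsolationLevel.READ_COMMITTED.jdbcConstant)`. The table reflects the standard definitions only; a particular database may prevent more than its level requires.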
Of course, higher isolation levels mean lower throughput for processing data.
Transactions in MySQL
In MySQL, you can use BEGIN, COMMIT, and ROLLBACK to implement transactions.
1. BEGIN starts a transaction.
2. COMMIT commits a transaction.
3. ROLLBACK rolls back a transaction.
MySQL commits transactions automatically by default. You can use
set autocommit = 0 or set autocommit = 1
to turn autocommit off or on.
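From Java, the same switch is normally flipped through JDBC instead of raw SQL. The following is a minimal sketch, not a definitive implementation: `inTransaction`, `SqlWork`, and `recordingConnection` are hypothetical helpers written for this example, and the fake connection exists only so that the code runs without a database. The JDBC calls themselves (`setAutoCommit`, `commit`, `rollback`) are the standard `java.sql.Connection` methods.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class JdbcTxDemo {
    /** Work to run inside one transaction (hypothetical helper type). */
    interface SqlWork {
        void run(Connection conn) throws SQLException;
    }

    /** Turns autocommit off, runs the work, then commits or rolls back. */
    static void inTransaction(Connection conn, SqlWork work) throws SQLException {
        boolean previous = conn.getAutoCommit();
        conn.setAutoCommit(false);           // JDBC counterpart of: set autocommit = 0
        try {
            work.run(conn);
            conn.commit();
        } catch (SQLException | RuntimeException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(previous);    // restore the earlier setting
        }
    }

    /** Fake Connection that only records method names, so no database is needed. */
    static Connection recordingConnection(List<String> calls) {
        return (Connection) Proxy.newProxyInstance(
                JdbcTxDemo.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                (proxy, method, args) -> {
                    calls.add(method.getName());
                    return method.getReturnType() == boolean.class ? Boolean.TRUE : null;
                });
    }

    /** Runs one successful and one failing unit of work and returns the recorded calls. */
    static List<String> demo() {
        List<String> calls = new ArrayList<>();
        Connection conn = recordingConnection(calls);
        try {
            inTransaction(conn, c -> calls.add("work-1"));
            inTransaction(conn, c -> { throw new SQLException("simulated failure"); });
        } catch (SQLException expected) {
            calls.add("caught");
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The first unit of work ends in `commit`, the second in `rollback`, and in both cases the earlier autocommit setting is restored afterwards.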
It is worth noting that MySQL’s default isolation level is “repeatable read”, but InnoDB (for example in MySQL 5.7) uses gap locks at that level, which in practice also prevents phantom reads and brings it close to “serializable”.
Transactions in Spring
Spring supports transactions, but Spring itself is only a wrapper: its transactions ultimately rely on the database’s BEGIN, COMMIT, and ROLLBACK.
Programmatic transaction
Programmatic transactions control commit and rollback manually through TransactionTemplate or a TransactionManager.
Programmatic transactions are more flexible than declarative transactions. For example, they can commit or roll back an individual code segment.
Declarative transactions (proxies)
Declarative transactions are simply annotated transactions. Declare a transaction by adding Spring’s @Transactional annotation to a method or class, hence the name “declarative transaction.”
The commonly used attributes of @Transactional are:
1. propagation: the transaction propagation behavior
2. isolation: the transaction isolation level
3. noRollbackFor: the exceptions that do not roll back the transaction
4. rollbackFor: the exceptions that roll back the transaction
5. timeout: the timeout period for the transaction
Declarative transactions are based on dynamic proxies and AOP, which simply means that begin, commit, and rollback logic is added before and after the execution of a specific method.
The biggest benefit of declarative transactions is simplicity and low code intrusion. The corresponding disadvantage is that the granularity is hard to control: the finest granularity available is a whole method.
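The proxy mechanism described above can be imitated with a plain JDK dynamic proxy. This is a minimal sketch of the idea, not Spring’s actual interceptor: `OrderService`, `OrderServiceImpl`, and `transactional` are hypothetical names, and the "transaction" is only a log of begin/commit/rollback events.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

final class TxProxyDemo {
    /** Hypothetical business interface; JDK proxies can only wrap interfaces. */
    interface OrderService {
        void placeOrder(boolean fail);
    }

    /** Plain implementation with no transaction code of its own. */
    static final class OrderServiceImpl implements OrderService {
        private final List<String> log;

        OrderServiceImpl(List<String> log) {
            this.log = log;
        }

        @Override
        public void placeOrder(boolean fail) {
            log.add("business logic");
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
        }
    }

    /** Wraps the target so that every call is surrounded by begin/commit/rollback. */
    static OrderService transactional(OrderService target, List<String> log) {
        return (OrderService) Proxy.newProxyInstance(
                OrderService.class.getClassLoader(),
                new Class<?>[] {OrderService.class},
                (proxy, method, args) -> {
                    log.add("begin");
                    try {
                        Object result = method.invoke(target, args);
                        log.add("commit");
                        return result;
                    } catch (InvocationTargetException e) {
                        log.add("rollback");
                        throw e.getCause();   // rethrow the original business exception
                    }
                });
    }

    /** Calls the proxied service once and returns everything that was logged. */
    static List<String> demo(boolean fail) {
        List<String> log = new ArrayList<>();
        OrderService service = transactional(new OrderServiceImpl(log), log);
        try {
            service.placeOrder(fail);
        } catch (IllegalStateException e) {
            log.add("caller saw: " + e.getMessage());
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo(false));
        System.out.println(demo(true));
    }
}
```

The business class contains no transaction code at all, which is the low intrusion mentioned above. It also shows why the finest granularity is a method: the proxy can only act before and after a whole call.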
Propagation of transactions (nested transactions)
Transaction propagation, in layman’s terms, is how Spring should handle the nesting of transactions in the invocation chain of multiple methods.
This concept also follows from how declarative transactions work: a dynamic proxy starts a transaction before a method runs and commits it afterwards.
Consider the following scenario:
@Transactional
public void A() {
    B();
    // do something
}
@Transactional
public void B() {
    // do something
    int a = 1 / 0; // throws ArithmeticException
}
Both methods declare a transaction, and B will clearly throw an exception, so B’s transaction will be rolled back. Will A be rolled back as well? That is what the transaction propagation mechanism decides.
Spring has seven transaction propagation mechanisms:
1. PROPAGATION_REQUIRED. The default propagation type. The current method needs to run in a transaction. If no transaction exists, a new one is started.
2. PROPAGATION_SUPPORTS. The current method does not need a transaction, but if one already exists, the method runs inside it.
3. PROPAGATION_MANDATORY. The current method must run in a transaction. If no transaction exists, an exception is thrown.
4. PROPAGATION_REQUIRES_NEW. The current method needs to run in a new transaction. If a transaction already exists when the current method executes, it is suspended.
5. PROPAGATION_NOT_SUPPORTED. The current method should not run in a transaction. If a transaction exists, it is suspended.
6. PROPAGATION_NEVER. The current method should not run in a transaction, and if a transaction exists, an exception is thrown.
7. PROPAGATION_NESTED. If a transaction exists, the current method runs in a nested transaction that can be rolled back on its own. If no transaction exists, it behaves like PROPAGATION_REQUIRED.
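The seven behaviors boil down to one decision per method call: given the propagation setting and whether a transaction is already active, what should happen? The sketch below writes that decision out as a table. The `Propagation` constants follow the names of Spring’s `Propagation` enum, while the `Action` labels and the `decide` method are made up for this illustration.

```java
final class PropagationDemo {
    /** The seven propagation behaviors, named after Spring's Propagation enum. */
    enum Propagation { REQUIRED, SUPPORTS, MANDATORY, REQUIRES_NEW, NOT_SUPPORTED, NEVER, NESTED }

    /** What happens when a method is entered (hypothetical labels). */
    enum Action {
        JOIN_EXISTING, START_NEW, SUSPEND_AND_START_NEW, RUN_WITHOUT_TX,
        SUSPEND_AND_RUN_WITHOUT_TX, START_NESTED, THROW
    }

    /** Decision table: propagation setting plus "is a transaction active?" gives the action. */
    static Action decide(Propagation propagation, boolean txExists) {
        switch (propagation) {
            case REQUIRED:
                return txExists ? Action.JOIN_EXISTING : Action.START_NEW;
            case SUPPORTS:
                return txExists ? Action.JOIN_EXISTING : Action.RUN_WITHOUT_TX;
            case MANDATORY:
                return txExists ? Action.JOIN_EXISTING : Action.THROW;
            case REQUIRES_NEW:
                return txExists ? Action.SUSPEND_AND_START_NEW : Action.START_NEW;
            case NOT_SUPPORTED:
                return txExists ? Action.SUSPEND_AND_RUN_WITHOUT_TX : Action.RUN_WITHOUT_TX;
            case NEVER:
                return txExists ? Action.THROW : Action.RUN_WITHOUT_TX;
            case NESTED:
                return txExists ? Action.START_NESTED : Action.START_NEW;
            default:
                throw new IllegalArgumentException("unknown propagation: " + propagation);
        }
    }

    public static void main(String[] args) {
        for (Propagation p : Propagation.values()) {
            System.out.println(p + ": with tx -> " + decide(p, true)
                    + ", without tx -> " + decide(p, false));
        }
    }
}
```

Applied to the A/B scenario: with the default setting, B is entered while A’s transaction is active, so the table gives `JOIN_EXISTING`. B runs inside A’s transaction, and the exception rolls back the work of both.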
Distributed transaction
DTP, XA, and JTA
The DTP model
Distributed Transaction Processing (DTP) is a distributed transaction model proposed by the X/Open organization.
A DTP model contains at least three elements:
1. AP, the application, which defines the boundaries of a transaction, that is, where it starts and ends.
2. RM, the resource manager. In theory, any resource that supports persistence, such as a database, can be a resource manager.
3. TM, the transaction manager, which coordinates the transaction and is responsible for committing and rolling it back.
XA specification
XA is a distributed transaction specification from X/Open that is language independent.
The XA specification defines the interface through which the RM and the TM interact. For example, the TM can manage an RM through the following interfaces:
1. xa_open and xa_close are used to open and close the connection with an RM.
2. xa_start and xa_end are used to start and end a transaction.
3. xa_prepare, xa_commit, and xa_rollback are used to pre-commit, commit, and roll back a transaction.
4. xa_recover is used to list pre-committed transactions so that they can be recovered, that is, committed or rolled back.
JTA specification
The JTA specification can be regarded as the Java version of the XA specification.
JTA defines a series of distributed transaction-related interfaces:
1. javax.transaction.Status: defines the states of a transaction, for example prepare, commit, and rollback
2. javax.transaction.Synchronization: synchronization
3. javax.transaction.Transaction: the transaction
4. javax.transaction.TransactionManager: the transaction manager
5. javax.transaction.UserTransaction: used to declare a distributed transaction
6. javax.transaction.TransactionSynchronizationRegistry: the transaction synchronization registry
7. javax.transaction.xa.XAResource: the interface to a resource manager
8. javax.transaction.xa.Xid: the transaction ID
The above interfaces are implemented by different roles (RM, TM, etc.).
Two-phase commit (2PC)
Two-phase commit is the simplest solution for distributed transactions. It divides a transaction into two phases: the commit request and the commit/rollback.
The first phase is the request phase, where the coordinator asks all RMs whether the transaction can be committed. Each RM replies YES if it can commit, NO otherwise.
The second phase is the commit phase, where the coordinator decides whether the distributed transaction can be committed based on the responses of all RMs. If all RMs reply YES, the transaction is committed; otherwise the transaction is rolled back.
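The two phases can be sketched as a small in-memory coordinator. This is a toy model under stated assumptions: `Participant`, `FakeRm`, and `run` are hypothetical names, there is no network, and nothing is persisted, so the failure cases of real 2PC do not appear here.

```java
import java.util.ArrayList;
import java.util.List;

final class TwoPhaseCommitDemo {
    /** A resource manager taking part in the transaction (hypothetical interface). */
    interface Participant {
        boolean prepare();   // phase 1: vote YES (true) or NO (false)
        void commit();       // phase 2, when every vote was YES
        void rollback();     // phase 2, when any vote was NO
    }

    /** In-memory participant that records what the coordinator asked it to do. */
    static final class FakeRm implements Participant {
        private final String name;
        private final boolean vote;
        private final List<String> log;

        FakeRm(String name, boolean vote, List<String> log) {
            this.name = name;
            this.vote = vote;
            this.log = log;
        }

        @Override public boolean prepare() { log.add(name + ":prepare"); return vote; }
        @Override public void commit() { log.add(name + ":commit"); }
        @Override public void rollback() { log.add(name + ":rollback"); }
    }

    /** Coordinator logic. Returns true if the global transaction committed. */
    static boolean run(List<? extends Participant> participants) {
        boolean allYes = true;
        for (Participant p : participants) {      // phase 1: ask every RM
            if (!p.prepare()) {
                allYes = false;
            }
        }
        for (Participant p : participants) {      // phase 2: same decision for every RM
            if (allYes) {
                p.commit();
            } else {
                p.rollback();
            }
        }
        return allYes;
    }

    /** Runs 2PC over two fake RMs; the second one votes as requested. */
    static List<String> demo(boolean secondVote) {
        List<String> log = new ArrayList<>();
        List<Participant> rms = List.of(
                new FakeRm("db_user", true, log),
                new FakeRm("db_account", secondVote, log));
        run(rms);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo(true));
        System.out.println(demo(false));
    }
}
```

A single NO vote is enough to roll back every participant, including the ones that voted YES.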
The two-phase commit idea is simple, but it has a lot of problems.
- Coordinator single point problem
- Phase 1 blocking problem
- If an RM does not receive the commit/rollback instruction in the second phase because of network problems, the data becomes inconsistent.
Three-phase commit (3PC)
Three-phase commit was proposed to solve the problems of the two-phase algorithm. It breaks the transaction commit into three phases:
1. CanCommit phase. This is similar to the request phase in 2PC.
2. PreCommit phase. If not every RM responded YES in the CanCommit phase, or an RM timed out, the entire transaction is rolled back. Otherwise, the coordinator sends the PreCommit command and asks each RM to execute the transaction and respond with an ACK.
3. DoCommit phase. If an RM did not respond with an ACK or timed out during the PreCommit phase, the entire transaction is rolled back. Otherwise, the coordinator issues the DoCommit command and each RM actually commits the transaction.
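The three phases above can be sketched in the same toy style. All names here (`Participant`, `FakeRm`, `Outcome`, `run`) are hypothetical, and a `false` return value stands in for both a NO answer and a timeout; a real implementation would also need the participant-side timeouts that the protocol relies on.

```java
import java.util.List;

final class ThreePhaseCommitDemo {
    /** A resource manager in 3PC (hypothetical interface). false also models a timeout. */
    interface Participant {
        boolean canCommit();   // phase 1: can you commit?
        boolean preCommit();   // phase 2: execute the transaction and ACK
        void doCommit();       // phase 3: really commit
        void abort();          // roll back whatever has been done so far
    }

    enum Outcome { COMMITTED, ABORTED_IN_CAN_COMMIT, ABORTED_IN_PRE_COMMIT }

    /** Coordinator logic: any NO, missing ACK, or timeout aborts the whole transaction. */
    static Outcome run(List<? extends Participant> participants) {
        for (Participant p : participants) {
            if (!p.canCommit()) {
                return abortAll(participants, Outcome.ABORTED_IN_CAN_COMMIT);
            }
        }
        for (Participant p : participants) {
            if (!p.preCommit()) {
                return abortAll(participants, Outcome.ABORTED_IN_PRE_COMMIT);
            }
        }
        for (Participant p : participants) {
            p.doCommit();
        }
        return Outcome.COMMITTED;
    }

    private static Outcome abortAll(List<? extends Participant> participants, Outcome outcome) {
        for (Participant p : participants) {
            p.abort();
        }
        return outcome;
    }

    /** In-memory participant with scripted answers, used only for the demo. */
    static final class FakeRm implements Participant {
        private final boolean vote;
        private final boolean ack;
        String state = "INIT";

        FakeRm(boolean vote, boolean ack) {
            this.vote = vote;
            this.ack = ack;
        }

        @Override public boolean canCommit() { return vote; }
        @Override public boolean preCommit() {
            if (ack) {
                state = "PRE_COMMITTED";
            }
            return ack;
        }
        @Override public void doCommit() { state = "COMMITTED"; }
        @Override public void abort() { state = "ABORTED"; }
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(new FakeRm(true, true), new FakeRm(true, true))));
        System.out.println(run(List.of(new FakeRm(true, true), new FakeRm(false, true))));
        System.out.println(run(List.of(new FakeRm(true, true), new FakeRm(true, false))));
    }
}
```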
TCC
TCC stands for Try-Confirm-Cancel. It is a flexible distributed transaction solution that has been popular in recent years.
“Flexible” is meant in contrast to “rigid transactions” such as 2PC and 3PC. Flexible transactions no longer pursue strong consistency, only eventual consistency.
TCC breaks down a distributed transaction into the following three steps:
1. Try phase. Each transaction participant checks service consistency and reserves system resources, for example by locking inventory.
2. Confirm phase. The transaction participant uses the resources reserved during the Try phase to perform the service operation.
3. Cancel phase. If any transaction participant fails during the Try phase, Cancel is performed. Cancel includes releasing resources and reverse compensation.
Try, Confirm, and Cancel correspond one-to-one to request, commit, and rollback in 2PC. In this sense, TCC is essentially a 2PC solution.
There are also two concepts in TCC, master business services and slave business services.
A master business service can be understood, informally, as the service that initiates a transaction. For example, a purchase service invokes an inventory service and an order service. The purchase service can then be considered a master business service.
Correspondingly, the “inventory services” and “order services” mentioned above are slave business services.
Why distinguish between the two services in the first place? Because their duties are different:
1. A slave business service must provide the try, confirm, and cancel methods.
2. The master business service needs to log transactions and, with the coordination of the transaction manager, invoke the three TCC methods of the slave business services as appropriate.
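The division of duties can be sketched with the purchase example. Everything here is hypothetical and in-memory: `TccParticipant` plays the slave-side contract, `InventoryService` and `OrderService` are slave business services, and `purchase` plays the master business service. Transaction logging and the transaction manager are left out.

```java
import java.util.ArrayList;
import java.util.List;

final class TccDemo {
    /** Contract that every slave business service must provide. */
    interface TccParticipant {
        boolean tryReserve();  // "try" is a Java keyword, hence the longer name
        void confirm();
        void cancel();
    }

    /** Inventory: try freezes stock, confirm consumes it, cancel releases it. */
    static final class InventoryService implements TccParticipant {
        int available;
        int frozen;
        private final int quantity;

        InventoryService(int available, int quantity) {
            this.available = available;
            this.quantity = quantity;
        }

        @Override public boolean tryReserve() {
            if (available < quantity) {
                return false;
            }
            available -= quantity;
            frozen += quantity;
            return true;
        }
        @Override public void confirm() { frozen -= quantity; }
        @Override public void cancel() { frozen -= quantity; available += quantity; }
    }

    /** Order service whose try step can be scripted to fail. */
    static final class OrderService implements TccParticipant {
        private final boolean tryOk;
        String status = "NONE";

        OrderService(boolean tryOk) {
            this.tryOk = tryOk;
        }

        @Override public boolean tryReserve() {
            if (tryOk) {
                status = "PENDING";
            }
            return tryOk;
        }
        @Override public void confirm() { status = "CONFIRMED"; }
        @Override public void cancel() { status = "CANCELLED"; }
    }

    /** Master business service: try everything, then confirm all or cancel what was reserved. */
    static boolean purchase(List<? extends TccParticipant> participants) {
        List<TccParticipant> reserved = new ArrayList<>();
        for (TccParticipant p : participants) {
            if (p.tryReserve()) {
                reserved.add(p);
            } else {
                for (TccParticipant r : reserved) {
                    r.cancel();          // only participants whose try succeeded
                }
                return false;
            }
        }
        for (TccParticipant p : reserved) {
            p.confirm();
        }
        return true;
    }

    public static void main(String[] args) {
        InventoryService inventory = new InventoryService(10, 3);
        boolean ok = purchase(List.of(inventory, new OrderService(false)));
        System.out.println("committed=" + ok + ", available=" + inventory.available
                + ", frozen=" + inventory.frozen);
    }
}
```

In this sketch, cancel is only called on participants whose try succeeded. Real slave services usually also have to make confirm and cancel idempotent, because the master may retry them.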
The TCC model is shown in the figure below:
The figure comes from www.tianshouzhi.com/api/tutoria… ; I also highly recommend this blog, which I learned a lot from.
Message queues
Using message queues to achieve eventual consistency is another approach to flexible distributed transactions. The main idea is to complete a distributed transaction asynchronously through message queues, combine this with scheduled tasks for retry and compensation, and fall back to manual intervention when necessary.
In summary, there are three approaches: “best-effort notification”, “local message tables”, and “MQ transaction messages”.
Best-effort notification
Best-effort notification means that the notifying party tries its best to notify the receiving party of the processing result. If the notification fails, it retries at most X times. If the failure persists, the notifying party provides a query interface, and the receiver can look up the result itself.
This is the simplest approach, and in practice it is widely used. Typical examples are:
1. Delivery status callbacks for SMS messages sent through a carrier
2. Payment status callbacks from WeChat Pay and Alipay
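The retry part of best-effort notification fits in a few lines. The sketch below is illustrative only: `notifyWithRetry` is a hypothetical helper, the send operation is passed in as a `BooleanSupplier`, and the waiting between attempts that a real notifier would do is left out.

```java
import java.util.function.BooleanSupplier;

final class BestEffortNotifier {
    /**
     * Calls send up to maxAttempts times and stops at the first success.
     * Returns the attempt number that succeeded, or -1 if every attempt failed.
     */
    static int notifyWithRetry(BooleanSupplier send, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (send.getAsBoolean()) {
                return attempt;
            }
            // A real notifier would wait here, usually with a growing delay.
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical receiver that is unreachable for the first two attempts.
        int result = notifyWithRetry(() -> ++calls[0] >= 3, 5);
        System.out.println(result > 0
                ? "delivered on attempt " + result
                : "gave up; the receiver has to query the result itself");
    }
}
```

A return value of -1 is the point where the notifier stops trying and the receiver is expected to use the query interface instead.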
Local message table
As the name implies, the local message table uses a local database to maintain an intermediate state of transaction completion. In the process of distributed transaction execution, all transaction participants update the state of the message table after completing the operation, and gradually complete a whole transaction.
In abnormal cases, a scheduled task periodically detects the unfinished transactions in the message table and initiates retries. For scheduling solutions, see the six postures for executing a scheduled task in Java.
If either party still fails to complete the transaction, manual intervention provides the compensation.
As the figure above shows:
1. The producer writes the service data and the local message table first, using a local transaction to ensure that both succeed together.
2. The consumer consumes the message and again performs a local transaction, updating the state of the local message table after success. What about failure? The consumer could send a message to the producer asking for a rollback, but that is more complex (if the producer has to implement rollback as well, TCC is the better choice).
3. The producer may write the service data successfully but fail to send the MQ message. In this case, the local message table still contains the unfinished transaction, so the scheduled task finds it on its next scan and retries. Eventually, the entire distributed transaction can be completed.
Of course, the figure above is not 100% perfect, but the local message table is more of an idea, and the implementation may vary, depending on the specific business scenario and business requirements.
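One possible shape of the idea, reduced to an in-memory toy: two maps stand in for the producer’s business table and message table, and a `Predicate` stands in for "send through MQ and wait for the consumer to finish". All names (`createOrder`, `retryPending`, `Status`) are assumptions made for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

final class LocalMessageTableDemo {
    enum Status { PENDING, DONE }

    /** Stand-ins for the producer's database: business rows and the local message table. */
    final Map<String, String> orders = new LinkedHashMap<>();
    final Map<String, Status> messageTable = new LinkedHashMap<>();

    /** Step 1: business data and message row are written together (one local transaction). */
    void createOrder(String orderId) {
        orders.put(orderId, "CREATED");
        messageTable.put(orderId, Status.PENDING);
    }

    /**
     * Scheduled task: redeliver every pending message and mark it DONE once the
     * consumer reports success. Returns how many messages were completed in this scan.
     */
    int retryPending(Predicate<String> deliver) {
        int completed = 0;
        for (Map.Entry<String, Status> entry : messageTable.entrySet()) {
            if (entry.getValue() == Status.PENDING && deliver.test(entry.getKey())) {
                entry.setValue(Status.DONE);
                completed++;
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        LocalMessageTableDemo producer = new LocalMessageTableDemo();
        producer.createOrder("order-1");
        System.out.println("scan 1 completed: " + producer.retryPending(id -> false)); // MQ down
        System.out.println("scan 2 completed: " + producer.retryPending(id -> true));  // MQ back
        System.out.println(producer.messageTable);
    }
}
```

Because a message may be delivered more than once before it is marked as done, the consumer in such a design has to handle duplicates safely.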
MQ transaction messages
The local message table solution was proposed at a time when MQs generally did not implement transactional messages. Now, however, both Kafka and RocketMQ support transactional messages.
With transaction messages, the job of the local tables and scheduled tasks is taken over by the MQ’s transaction mechanism.
For example, www.tianshouzhi.com/api/tutoria… introduces this scheme.
Distributed transaction framework
In practice, distributed transactions come up in two scenarios. Using the purchase service as an example again, the two scenarios might be:
- In the first, a single service operates on multiple RMs directly
- In the second, a service invokes multiple services through RPC and so operates on multiple RMs indirectly
In today’s world of microservices, splitting databases by business domain is a basic architectural principle at most companies. In that sense, the second scenario is the more realistic one.
Of course, the first scenario still exists. For example, in the “local message table” solution above, a single service needs to interact with multiple RMs.
There are many open source distributed transaction frameworks on the market, such as TCC-Transaction. Here we take a look at Atomikos and Seata.
Atomikos
Atomikos is a well-known open source framework for distributed transactions. It has an implementation of the JTA/XA specification, which is free and open source, and a commercial paid version of the TCC mechanism.
Here is an introduction to the implementation of the JTA/XA specification.
As mentioned in the JTA specification section above, JTA defines a set of interfaces that are implemented by different roles. Atomikos’s role is a transaction manager, which implements the following interfaces:
1. javax.transaction.UserTransaction is implemented by com.atomikos.icatch.jta.UserTransactionImp. Users only need to work with this class directly to run a JTA distributed transaction.
2. javax.transaction.TransactionManager is implemented by com.atomikos.icatch.jta.UserTransactionManager. Atomikos uses this implementation class to manage transactions.
3. javax.transaction.Transaction is implemented by com.atomikos.icatch.jta.TransactionImp.
A simple example of using Atomikos (also from www.tianshouzhi.com/api/tutoria.):
- Add the dependencies
<dependency>
    <groupId>com.atomikos</groupId>
    <artifactId>transactions-jdbc</artifactId>
    <version>4.0.6</version>
</dependency>
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.39</version>
</dependency>
- The demo
import com.atomikos.icatch.jta.UserTransactionImp;
import com.atomikos.jdbc.AtomikosDataSourceBean;

import javax.transaction.SystemException;
import javax.transaction.UserTransaction;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class AtomikosExample {

    private static AtomikosDataSourceBean createAtomikosDataSourceBean(String dbName) {
        // Connection properties
        Properties p = new Properties();
        p.setProperty("url", "jdbc:mysql://localhost:3306/" + dbName);
        p.setProperty("user", "root");
        p.setProperty("password", "your password");

        // Use AtomikosDataSourceBean to wrap com.mysql.jdbc.jdbc2.optional.MysqlXADataSource
        AtomikosDataSourceBean ds = new AtomikosDataSourceBean();
        // Atomikos requires a name for each AtomikosDataSourceBean;
        // it is set to the same value as dbName so that it is easy to remember
        ds.setUniqueResourceName(dbName);
        ds.setXaDataSourceClassName("com.mysql.jdbc.jdbc2.optional.MysqlXADataSource");
        ds.setXaProperties(p);
        return ds;
    }

    public static void main(String[] args) {
        AtomikosDataSourceBean ds1 = createAtomikosDataSourceBean("db_user");
        AtomikosDataSourceBean ds2 = createAtomikosDataSourceBean("db_account");

        Connection conn1 = null;
        Connection conn2 = null;
        PreparedStatement ps1 = null;
        PreparedStatement ps2 = null;

        UserTransaction userTransaction = new UserTransactionImp();
        try {
            // Start the transaction
            userTransaction.begin();

            // Execute SQL on db1
            conn1 = ds1.getConnection();
            ps1 = conn1.prepareStatement("INSERT into user(name) VALUES (?)", Statement.RETURN_GENERATED_KEYS);
            ps1.setString(1, "tianshouzhi");
            ps1.executeUpdate();
            ResultSet generatedKeys = ps1.getGeneratedKeys();
            int userId = -1;
            while (generatedKeys.next()) {
                userId = generatedKeys.getInt(1); // the generated user id
            }

            // Uncomment to simulate an exception: execution jumps to the catch block
            // and neither insert is committed
            // int i = 1 / 0;

            // Execute SQL on db2
            conn2 = ds2.getConnection();
            ps2 = conn2.prepareStatement("INSERT into account(user_id,money) VALUES (?,?)");
            ps2.setInt(1, userId);
            ps2.setDouble(2, 10000000);
            ps2.executeUpdate();

            // Two-phase commit
            userTransaction.commit();
        } catch (Exception e) {
            try {
                e.printStackTrace();
                userTransaction.rollback();
            } catch (SystemException e1) {
                e1.printStackTrace();
            }
        } finally {
            try {
                ps1.close();
                ps2.close();
                conn1.close();
                conn2.close();
                ds1.close();
                ds2.close();
            } catch (Exception ignore) {
            }
        }
    }
}
Clearly, this example is a distributed transaction that belongs to scenario 1. If your distributed transactions fall under scenario 1, you can simply use Atomikos, which is simple, direct, and efficient.
In practice, though, most distributed transactions fall under scenario 2, and plain JTA transactions clearly cannot handle those. Distributed transactions in scenario 2 call for flexible-transaction solutions such as TCC or message queues.
Seata
Seata is an open source distributed transaction framework that grew out of the integration of Fescar (TXC/GTC/Fescar) and TCC-Transaction. It implements three modes: AT, TCC, and SAGA.
The official documentation is at seata.io/zh-cn/docs/… It is relatively incomplete, but it is good enough for understanding. Here is a brief introduction as well.
Terminology
TC (transaction coordinator): maintains the state of global and branch transactions and drives the commit or rollback of the global transaction.
TM (transaction manager): defines the scope of a global transaction, that is, it starts, commits, or rolls back the global transaction.
RM (resource manager): manages the resources that branch transactions work on, talks to the TC to register branch transactions and report their status, and drives the commit or rollback of branch transactions.
AT mode
AT stands for Automatic Transaction, which means that this mode is non-intrusive and does not require the services to be modified. It does, however, place requirements on the services:
1. They must be based on a relational database that supports local ACID transactions.
2. They must be Java applications that access the database through JDBC.
The general logic of AT mode is shown as follows:
AT mode also follows the 2PC idea and adds a compensation mechanism, which is similar to the undo log in InnoDB.
The undo log is really a reverse compensation. For an INSERT statement, for example, a corresponding DELETE statement is executed when the transaction is rolled back.
Put in plain English (my understanding), the mode works like this:
1. In phase 1, the undo log is created, and the undo log and the business operation are committed together in one local transaction.
2. In phase 2, under the coordination of the TC, the transaction commits quickly if it can be committed. If a rollback is required, reverse compensation is performed based on the undo log.
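The undo-log idea can be shown with an in-memory table. This is a simplified stand-in and not Seata’s actual undo log format: the `table` map plays the role of a database table, and each change pushes a reverse operation that a rollback replays from newest to oldest. All names are assumptions made for this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class UndoLogDemo {
    /** Stand-in for a database table: primary key to value. */
    final Map<Integer, String> table = new HashMap<>();
    /** Reverse operations for the current transaction, newest first. */
    private final Deque<Runnable> undoLog = new ArrayDeque<>();

    /** Phase 1: apply the INSERT and record its reverse, a DELETE. */
    void insert(int id, String value) {
        if (table.containsKey(id)) {
            throw new IllegalArgumentException("duplicate key " + id);
        }
        table.put(id, value);
        undoLog.push(() -> table.remove(id));
    }

    /** Phase 1: apply the UPDATE and record the old value so it can be restored. */
    void update(int id, String value) {
        if (!table.containsKey(id)) {
            throw new IllegalArgumentException("no row with key " + id);
        }
        String before = table.put(id, value);
        undoLog.push(() -> table.put(id, before));
    }

    /** Phase 2, commit: the data is already in place, so only the undo log is dropped. */
    void commit() {
        undoLog.clear();
    }

    /** Phase 2, rollback: replay the reverse operations, newest first. */
    void rollback() {
        while (!undoLog.isEmpty()) {
            undoLog.pop().run();
        }
    }

    public static void main(String[] args) {
        UndoLogDemo db = new UndoLogDemo();
        db.insert(1, "alice");
        db.commit();
        db.update(1, "bob");
        db.insert(2, "carol");
        db.rollback();
        System.out.println(db.table);
    }
}
```

This also shows why the commit in phase 2 is quick: the business data was already committed in phase 1, so committing only means discarding the undo log.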
Of course, the real implementation is not that simple. See the official website for details.
TCC mode
TCC mode follows the TCC idea introduced above. Seata’s TCC mode is shown below:
In fact, TCC mode is similar to AT mode in that it is also an evolved version of 2PC. Under the coordination of the transaction coordinator (TC), multiple sub-transactions are committed or rolled back.
The difference is that rollback in AT mode is compensation at the database resource level (the undo log is executed), whereas TCC calls custom logic for rollback (the rollback code is executed).
SAGA mode
Saga is a solution for long-running transactions. In the Saga model, each participant in the business process commits a local transaction, and when one of the participants fails, the participants that already succeeded are compensated. Both the phase-one forward service and the phase-two compensation service are implemented by the business developers.
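The forward-then-compensate flow can be sketched as a list of steps, each carrying its own compensation. This is a generic illustration and says nothing about how Seata implements its Saga mode: `Step`, `run`, and the three example steps are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.BooleanSupplier;

final class SagaDemo {
    /** One saga step: a forward action plus the compensation that undoes it. */
    static final class Step {
        final BooleanSupplier forward;   // commits a local transaction; false means it failed
        final Runnable compensate;       // undoes the forward action after it has committed

        Step(BooleanSupplier forward, Runnable compensate) {
            this.forward = forward;
            this.compensate = compensate;
        }
    }

    /** Runs the steps in order; on failure, compensates completed steps in reverse order. */
    static boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            if (step.forward.getAsBoolean()) {
                completed.push(step);
            } else {
                while (!completed.isEmpty()) {
                    completed.pop().compensate.run();
                }
                return false;
            }
        }
        return true;
    }

    /** Purchase saga with three steps; the last one fails if paymentSucceeds is false. */
    static List<String> demo(boolean paymentSucceeds) {
        List<String> log = new ArrayList<>();
        List<Step> steps = List.of(
                new Step(() -> { log.add("reserve stock"); return true; },
                        () -> log.add("release stock")),
                new Step(() -> { log.add("create order"); return true; },
                        () -> log.add("cancel order")),
                new Step(() -> { log.add("charge payment"); return paymentSucceeds; },
                        () -> log.add("refund payment")));
        run(steps);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo(true));
        System.out.println(demo(false));
    }
}
```

The failed step itself is not compensated, only the steps that had already committed, and they are undone in reverse order.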
The saga concept was first proposed in 1987, but Seata’s Saga mode only gained official support in August of this year. I do not understand it well enough to explain it in depth, so a quick look will have to do.
Conclusion
Distributed systems are never a simple concept, especially distributed transactions in distributed systems.
The idea behind distributed transactions may be relatively simple, but there are many details and difficulties that need attention. That is why most companies develop their own practices based on their business reality instead of copying an idea wholesale.
The other side of this is that there really is no perfect distributed transaction solution that we can simply copy. Alibaba’s Seata is open source, and hopefully one day it will become a true one-stop solution to the problem of distributed transactions.
References
www.tianshouzhi.com/api/tutoria…
seata.io/zh-cn/docs/…