Understanding the Nature of Transactions from Single Machine to Distributed Machine (1)
3. Shared transactions
In contrast to the "global transaction" scenario discussed earlier, where a single service uses multiple data sources, a Shared Transaction refers to multiple services sharing the same data source.
It is important to reiterate the distinction between "data source" and "database": a data source is a logical device that provides data and does not necessarily correspond to a physical device. The most common pattern in clustered application deployment is to deploy the same application to multiple middleware servers, forming multiple replica instances that share the traffic load. Although these instances connect to the same database, each node holds its own dedicated data source, usually exposed to application code by the middleware through JNDI. In this case, data access by the replica instances is completely independent, with no intersection, and each node can still use the simplest local transactions.
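As a minimal illustration of the setup just described, and strictly as a sketch (the JNDI name, table and column names are assumptions, not from the original text), one replica instance might obtain its own data source through JNDI and wrap a change in a plain local transaction like this:

```java
import javax.naming.InitialContext;
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;

public class LocalTransactionDemo {
    public static void main(String[] args) throws Exception {
        // Each replica instance looks up its own data source from the middleware via JNDI.
        // The JNDI name "java:comp/env/jdbc/bookstore" is a placeholder for this sketch.
        DataSource ds = (DataSource) new InitialContext()
                .lookup("java:comp/env/jdbc/bookstore");

        try (Connection conn = ds.getConnection()) {
            conn.setAutoCommit(false);          // start a plain local transaction
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPDATE account SET balance = balance - ? WHERE user_id = ?")) {
                ps.setBigDecimal(1, new java.math.BigDecimal("100"));
                ps.setLong(2, 1L);
                ps.executeUpdate();
                conn.commit();                  // success: commit on this node only
            } catch (Exception e) {
                conn.rollback();                // failure: local rollback, no other node involved
                throw e;
            }
        }
    }
}
```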
As a concrete example, suppose that user accounts, merchant accounts and the commodity warehouse are all stored in the same database, but the user, merchant and warehouse domains are each deployed as independent microservices. In this case, the business operation of purchasing a book runs through three microservices, all of which modify data in that database. If we simply treat each different data source as a different database, then the global transactions discussed earlier and the distributed transactions discussed in the next section are both feasible. However, since every data source here connects to the same physical database, this is a special case: a shared transaction may offer a chance to improve performance and reduce complexity in other ways, although it may, of course, also turn out to be a false requirement.
A theoretically feasible solution is to have the individual services share a database connection directly. However, a database connection is built on a network connection, bound to an IP address and a port, so "services on different nodes sharing one database connection" is hard to achieve in the literal sense. To share a transaction, a "transaction server" must therefore be added as an intermediate role: whether it is the user service, the merchant service, or the warehouse service, they all interact with the database through this same transaction server.
If the external interface of the transaction server is implemented according to the JDBC specification, it can be regarded as a remote database connection pool independent of every service, or simply as a database proxy. In this way, the transaction requests issued by the three services can be delivered to the same database connection on the transaction server and completed as a local transaction.
The reason "theoretically feasible" is emphasized is that this scheme runs against the direction of pressure in a real production system: the database is the point of greatest pressure in a service cluster and the hardest part to scale out, which makes it a classic disaster area, and there is practically no transaction proxy that sits in front of a single database to provide transaction coordination for multiple applications. This is also why a shared transaction is more likely to be a bogus requirement: if you have a good reason for multiple microservices to share one database, you also need a good reason to explain to the team what the purpose of splitting the microservices was in the first place.
A more common variant of the above scheme exists in daily development: using a message queue server instead of a transaction server. When the user, merchant and warehouse services carry out business operations, all changes to the database are sent as messages to the message queue server and applied uniformly by the message consumers, so that persistence is still guaranteed by local transactions. The concept of "shared transactions" and the two approaches listed here are not recommended in practice, and there are few successful cases of either.
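A minimal sketch of the message-queue variant follows, assuming a JMS 2.0 broker; the queue name, message format and the ChangePublisher class are purely illustrative and not part of the original text. Each service only publishes the intended change, and a single consumer applies all changes to the database with ordinary local transactions.

```java
import javax.jms.*;

// Sketch of the message-queue variant: a service does not touch the database directly,
// it only publishes the intended change; a single consumer applies all changes with
// ordinary local transactions. Queue name and message format are assumptions.
public class ChangePublisher {
    private final ConnectionFactory factory; // injected, e.g. an ActiveMQ/Artemis factory

    public ChangePublisher(ConnectionFactory factory) {
        this.factory = factory;
    }

    public void publishDebit(String txId, long userId, String amount) {
        try (JMSContext ctx = factory.createContext()) {
            Queue queue = ctx.createQueue("db.changes");
            // A plain text message carrying the change; the consumer turns it into SQL.
            ctx.createProducer().send(queue,
                    "txId=" + txId + ";op=debit;user=" + userId + ";amount=" + amount);
        }
    }
}
```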
4. Distributed transactions
Distributed Transaction refers to a transaction mechanism in which multiple services access multiple data sources at the same time; strictly speaking, it should be called a "transaction mechanism in a distributed service environment".
4.1. CAP and ACID
The CAP Theorem (Consistency, Availability, Partition Tolerance Theorem), also known as Brewer's Theorem, originated in July 2000 as a conjecture presented by Eric Brewer of the University of California, Berkeley at the ACM Symposium on Principles of Distributed Computing (PODC). The theorem states that in a distributed system that shares data, at most two of the following three properties can be satisfied at the same time:
- Consistency: the data is consistent with expectations at any time and on any distributed node. Note that consistency in distributed systems research is a rigorously defined concept with many subdivided types, and the consistency of replica replication is not strictly the same thing as the consistency of database state discussed here.
- Availability: the system's ability to provide services continuously. Understanding availability requires two key metrics: Reliability and Serviceability. Reliability is measured by Mean Time Between Failures (MTBF); serviceability is measured by Mean Time To Repair (MTTR). Availability is the ratio of the time the system can be used normally to the total time, expressed as A = MTBF / (MTBF + MTTR). In other words, availability is a ratio computed from reliability and serviceability; for example, 99.9999% availability means an average of about 32 seconds of fault repair time per year (see the quick calculation after this list).
- Partition Tolerance: the system's ability to continue providing services correctly even after some nodes lose contact with each other for network reasons, that is, after a "network partition" forms between them and other nodes in the distributed environment.
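To make the relationship A = MTBF / (MTBF + MTTR) concrete, here is a quick, purely illustrative check of the downtime budget implied by a given availability target:

```java
public class AvailabilityBudget {
    public static void main(String[] args) {
        double secondsPerYear = 365 * 24 * 3600.0;
        // A = MTBF / (MTBF + MTTR); equivalently, allowed downtime = (1 - A) * total time
        double[] targets = {0.999, 0.9999, 0.999999};
        for (double a : targets) {
            double downtimeSeconds = (1 - a) * secondsPerYear;
            System.out.printf("availability %.6f -> about %.0f seconds of downtime per year%n",
                    a, downtimeSeconds);
        }
        // 99.9999% works out to roughly 31.5 seconds per year, matching the "about 32 seconds" above.
    }
}
```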
Suppose a transaction request is jointly handled by "account node 1", "merchant node 2" and "warehouse node N". When a user buys a commodity worth 100 yuan, account node 1 should first deduct 100 yuan from the user's account. Deducting 100 yuan in its own database is easy, but the node must also inform the other account nodes (nodes 2 through N) of the cluster of this change, and ensure that the related data on the merchant and warehouse cluster nodes is changed correctly. The following situations may then occur:
- If the change is not synchronized to the other account nodes in time, the user may be routed to another node when buying another commodity, and because of the incorrect balance on that node a transaction that should not have been possible goes through by mistake. This is a consistency problem.
- If the change must be synchronized to the other account nodes before continuing, transaction services for this user have to be suspended until the data is consistent, and the user's next purchase may be rejected because the system is temporarily unable to provide service. This is an availability problem.
- If, because of network problems, some nodes of the account service cluster cannot exchange account-change information with the other nodes, then no matter which part of the cluster continues to serve requests, the service it provides may be incorrect. Whether the cluster as a whole can still provide correct service while connections between some of its nodes are interrupted is a partition tolerance problem.
4.2. Rigid versus flexible transactions
Since CAP cannot be satisfied in full, let us analyse the different consequences of giving up C, A or P.
- Abandoning partition tolerance (CA without P) means assuming that communication between nodes is always reliable. Always-reliable communication is bound to fail in a distributed system; this is not a matter of choice, because as long as a network is used to share data, partitions will always exist. The easiest real-world example of giving up partition tolerance is a traditional relational database cluster: such a cluster still uses multiple networked nodes to cooperate, but the data is not shared over the network.
- Abandoning availability (CP without A) means assuming that once a network partition occurs, the time needed to synchronize information between nodes may extend indefinitely. At this point the problem degenerates to the scenario discussed earlier under "global transactions", a single system using multiple data sources, and we can use 2PC/3PC to achieve both partition tolerance and consistency. In reality, CP systems that abandon availability are generally used in scenarios with high demands on data quality. Besides distributed database transactions in the DTP model, HBase is also a CP system: in an HBase cluster, if a RegionServer goes down, all the key ranges it held go offline until the data recovery process completes, and that recovery takes an unpredictable amount of time.
- Abandoning consistency (AP without C) means assuming that once a partition occurs, the data provided by different nodes may be inconsistent. AP systems that give up consistency are currently the mainstream choice when designing distributed systems, because P is a natural property of a distributed network that you can neither avoid nor discard, while A is usually the very purpose of building a distributed system: if availability decreased as the number of nodes grew, many distributed systems would lose their reason to exist. Unless the services involved are ones like banking and securities trading, where money is at stake and it is better to interrupt service than to be wrong, most systems cannot tolerate availability getting lower as more nodes are added. At present, most NoSQL stores and distributed caching frameworks are AP systems. Take a Redis cluster as an example: if a Redis node suffers a network partition, nothing prevents each node from continuing to serve cached data from its own local store, but requests routed to different nodes may then return inconsistent data to the client.
Reading this far, you may feel a little helpless about the conclusion that "AP systems that choose to give up consistency are currently the mainstream choice when designing distributed systems". After all, the topic of this chapter is "transactions", whose original purpose is precisely to obtain "consistency", and yet in a distributed environment consistency is usually the very property that has to be sacrificed.
In any case, we build information systems to ensure that the result of an operation is at least correct at the moment it is finally delivered. This means the data is allowed to be wrong in intermediate states (to be inconsistent), but it should be corrected by the time the result is output. For this reason, people have redefined consistency once more: the consistency discussed in CAP and ACID is called "strong consistency", sometimes also "linear consistency", while the goal pursued by AP systems that sacrifice C yet still try to obtain correct results as far as possible is called "weak consistency".
A slightly stronger special case of weak consistency is called Eventual Consistency: if the data is not changed by further operations for a period of time, it will eventually reach the same state it would have reached under strong consistency. Algorithms that achieve eventual consistency are sometimes called "optimistic replication algorithms".
In "distributed transactions", the goal likewise has to be lowered from the strong consistency pursued by the previous three transaction modes to the pursuit of "eventual consistency". With this change in the definition of consistency, the meaning of the word "transaction" has also been broadened: transactions that follow ACID are called "rigid transactions", while the common distributed transaction practices described below are collectively referred to as "flexible transactions".
4.3. Reliable event queues
The concept of eventual consistency was summarized by eBay's systems architect Dan Pritchett in his 2008 ACM paper "BASE: An Acid Alternative", which describes how to use BASE to achieve consistency goals independently of the strong consistency provided by ACID. BASE stands for Basically Available, Soft State, and Eventually Consistent.
We continue with the purchase scenario to explain the concrete practice of a "reliable event queue"; the goal is still to correctly modify the data in the account, warehouse and merchant services during a transaction:
- After the end user submits the transaction request, first make a prior assessment of the error probability of the three operations: debiting the user account, crediting the merchant account, and shipping the goods out of the warehouse. Arrange their order according to the size of the error probability; this assessment is usually reflected directly in the program code, and some large systems may even schedule it dynamically. For example, statistics may show that the most likely transaction anomaly is that the user buys a product but refuses the payment, or that the account balance is insufficient; the next most likely is that the warehouse finds the goods are not in stock in sufficient quantity to ship; and the lowest-risk step is the merchant receiving the payment, which generally goes without incident. The order should therefore place the most error-prone operation first, namely: account debit → warehouse shipment → merchant credit.
- The account service performs the debit business. If the debit succeeds, it creates a message table in its own database and writes a message into it: "transaction ID: some UUID; debit: 100 yuan (status: completed); warehouse shipment: 1 copy (status: in progress); merchant credit: 100 yuan (status: in progress)". Note that the "debit" and the "write message" in this step are written to the account service's own database within the same local transaction (see the sketch after this list).
- A message service is set up in the system. It polls the message table periodically and sends messages whose status is "in progress" to both the inventory and the merchant service nodes (they could also be sent serially, the second only after the first succeeds, but that is not necessary in the scenario we are discussing). The following situations can then occur:
- 1) Both the merchant and the warehouse service successfully complete the credit and shipment work and return the execution result to the user account server. The user account service updates the message status from "in progress" to "completed". The whole transaction is declared successfully completed and the eventually consistent state is reached.
- 2) At least one of the merchant or warehouse services fails to receive the message from the user account service for network reasons. Because the message stored on the user account server remains in the "in progress" state, the message server keeps re-sending it to the unresponsive service on every poll. The repeatability of this step requires that all messages sent by the message server be idempotent; a typical design is to carry a unique transaction ID so that the shipment and credit actions of one transaction are processed only once.
- 3) Some or all of the merchant or warehouse services are unable to complete their work. For example, the warehouse finds that the goods are out of stock. In this case the message keeps being re-sent automatically until the operation succeeds (for example, after new stock is replenished) or until a human intervenes. Therefore, once the first step of the reliable event queue has completed, there is no concept of failure and rollback for the rest of the transaction: only success is allowed, not failure.
- 4) The merchant and warehouse services successfully complete the credit and shipment work, but the reply message is lost for network reasons. The user account service then keeps re-sending the message, but because the operations are idempotent this does not lead to repeated credits or shipments; it only causes the merchant and warehouse servers to re-send the reply message. This process repeats until network communication between the two sides is restored.
- 5) Some messaging frameworks support distributed transactions natively, RocketMQ for example; in that case, situations 2 and 4 above can also be guaranteed by the messaging framework itself.
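The sketch referred to in the second step above shows the crucial point of the reliable event queue: the debit and the message record are written within one and the same local transaction, so either both become durable or neither does. The table names, columns and payload format are assumptions made for illustration only.

```java
import javax.sql.DataSource;
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.UUID;

public class ReliableEventQueueStep {
    // Debits the account and records the pending message in ONE local transaction.
    public static String debitAndRecordMessage(DataSource ds, long userId, BigDecimal amount)
            throws Exception {
        String txId = UUID.randomUUID().toString();
        try (Connection conn = ds.getConnection()) {
            conn.setAutoCommit(false);
            try (PreparedStatement debit = conn.prepareStatement(
                         "UPDATE account SET balance = balance - ? WHERE user_id = ?");
                 PreparedStatement msg = conn.prepareStatement(
                         "INSERT INTO local_message (tx_id, payload, status) VALUES (?, ?, 'IN_PROGRESS')")) {
                debit.setBigDecimal(1, amount);
                debit.setLong(2, userId);
                debit.executeUpdate();

                msg.setString(1, txId);
                msg.setString(2, "warehouse:ship 1 item; merchant:receive " + amount);
                msg.executeUpdate();

                conn.commit();   // debit and message become durable together
            } catch (Exception e) {
                conn.rollback(); // neither the debit nor the message survives
                throw e;
            }
        }
        return txId;             // the message relay polls local_message and delivers by tx_id
    }
}
```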
4.4. TCC transactions
TCC is another common distributed transaction mechanism; the name stands for Try-Confirm-Cancel. A reliable event queue keeps the final result relatively reliable and the process simple (compared with TCC), but it provides no isolation at all. In some businesses the lack of isolation does not matter, but in others it can cause a great deal of trouble.
An obvious problem caused by this lack of isolation is "overselling": it is entirely possible for two customers to successfully purchase the same item within a short period of time, each buying no more than the current stock, yet the sum of their purchases exceeding the stock. With a rigid transaction and a sufficient isolation level this can be avoided entirely; for example, the scenario above needs the "Repeatable Read" isolation level so that the later-committed transaction fails because it cannot acquire the lock, but a reliable event queue cannot guarantee this. For the relevant local database transaction knowledge, refer to the earlier explanation. If the business requires isolation, architects should usually turn to TCC, which is naturally suited to distributed transactions that need strong isolation.
In terms of concrete implementation, TCC is rather complicated. It is a highly intrusive transaction scheme: the business process must be split into two sub-processes, "reserving business resources" and "confirming/releasing the consumed resources". As the name suggests, TCC consists of the following three phases (a minimal interface sketch follows the list):
- Try: the attempted execution phase. It completes all business executability checks (to ensure consistency) and reserves all the business resources that will be needed (to ensure isolation).
- Confirm: the confirmation phase. No business checks are performed; the resources reserved in the Try phase are used directly to complete the business. The Confirm phase may be executed repeatedly, so the operations performed in it must be idempotent.
- Cancel: the cancellation phase. It releases the business resources reserved during the Try phase. The Cancel phase may also be executed repeatedly and therefore must also be idempotent.
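As a minimal sketch of how the three phases map onto ordinary business code (not tied to any particular framework; the inventory example and the method names are assumptions), a TCC participant for the warehouse might look like this:

```java
// A TCC participant for inventory, written as plain business code.
// Try reserves stock, Confirm consumes the reservation, Cancel releases it.
// All three must be idempotent, keyed by the transaction id.
public interface InventoryTccAction {

    /** Try: check sellable stock and move the quantity into a "reserved" state. */
    boolean tryReserve(String txId, long productId, int quantity);

    /** Confirm: turn the reservation into a real deduction; may be re-invoked, must be idempotent. */
    boolean confirm(String txId);

    /** Cancel: release the reservation made in Try; may be re-invoked, must be idempotent. */
    boolean cancel(String txId);
}
```

The Try step is what gives TCC its isolation: stock is moved into a reserved state keyed by the transaction ID, so a concurrent transaction's Try fails if not enough sellable stock remains, which is exactly how overselling is avoided.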
TCC is similar in concept to the prepare and commit phases of 2PC, but TCC lives at the user code level rather than the infrastructure level, which gives its implementation the flexibility to design the granularity of resource locking as needed. During execution TCC only touches the reserved resources and involves almost no locks or resource contention, so it has high performance potential. But TCC does not bring only benefits: it also means higher development cost and stronger intrusion into the business, and hence a higher cost of building, and later replacing, the transaction scheme. Therefore TCC is usually not implemented entirely by hand-written code, but on top of distributed transaction middleware such as Alibaba's open-source Seata, to reduce part of the coding work.
4.5. SAGA
TCC transactions offer good isolation, avoid the "overselling" problem, and generally have the highest performance among the flexible transaction patterns covered in this article, but they still cannot satisfy every scenario. The main limitation of TCC is its intrusiveness. This is not a repetition of the previous section's point about the coding and coordination work it requires; it refers more to the constraint that TCC needs technical control over the resources involved.
Scenario example: with the growing popularity of online payment in China, suppose users and merchants can now choose not to open a top-up account in the bookstore system, or at least are not required to top up from their bank into the system before spending; instead they are allowed to pay directly from their bank accounts through a bank USB key (U shield) or by scanning a QR code. This requirement fully matches the current state of online payment in China, but it adds an extra restriction to the transaction design of the system: if the account balances of users and merchants are managed by the bank, their operation permissions and data structures can no longer be defined at will, and it is usually impossible to perform operations such as freezing funds, unfreezing them and deducting them, because the bank generally will not cooperate with such operations. The Try phase of TCC therefore often cannot work, and we have to consider another flexible transaction scheme: the SAGA transaction.
SAGA originally means a long story or a long series of events. The term comes from a paper on "Long Lived Transactions", which proposed improving the efficiency of long-running operations by splitting a large transaction into a set of sub-transactions that can be run interleaved. SAGA was originally intended to avoid large transactions locking database resources for a long time, but it has since evolved into a design pattern for breaking a large transaction into a series of local transactions in a distributed environment.
SAGA consists of two parts:
- Splitting the large transaction into small ones: the whole distributed transaction T is decomposed into n sub-transactions, named T1, T2, …, Ti, …, Tn. Each sub-transaction should be, or can be treated as, an atomic action. If the distributed transaction can commit normally, its effect on the data (eventual consistency) should be equivalent to committing T1 to Tn sequentially and successfully.
- Designing a corresponding compensating action for each sub-transaction, named C1, C2, …, Ci, …, Cn. Ti and Ci must satisfy the following conditions:
- Both Ti and Ci are idempotent.
- Ti and Ci are commutative, that is, executing Ti first or Ci first produces the same effect.
- Ci must be guaranteed to commit successfully; that is, the possibility of Ci itself being rolled back is not considered. If Ci fails, it must be retried continuously until it succeeds, or a human must intervene.
If T1 through Tn all commit successfully, the transaction completes successfully; otherwise, one of the following recovery strategies is adopted (a minimal orchestration sketch follows this list):
- Forward Recovery: if sub-transaction Ti fails to commit, Ti is retried until it succeeds (best-effort delivery). This kind of recovery needs no compensation and is suitable for transactions that must ultimately succeed, such as shipping the goods after the payment has already been deducted from the buyer's bank account. The execution pattern of forward recovery is: T1, T2, …, Ti (fails), Ti (retried), …, Ti+1, …, Tn.
- Backward Recovery: if sub-transaction Ti fails to commit, Ci is executed to compensate Ti until it succeeds (best-effort delivery). The requirement here is that Ci must execute successfully (retrying continuously if necessary). The execution pattern of backward recovery is: T1, T2, …, Ti (fails), Ci (compensates Ti), …, C2, C1.
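The orchestration sketch mentioned above is shown below; it illustrates backward recovery only, under the assumption that each step and its compensation are exposed as plain Java callbacks. It is a sketch of the pattern, not of any particular middleware such as Seata.

```java
import java.util.List;

// A minimal orchestrated SAGA: run sub-transactions in order; on failure,
// run the compensations of the failed step and everything before it, in reverse order.
public class SagaOrchestrator {

    public interface Step {
        boolean execute();      // Ti: should be (or be treated as) atomic and idempotent
        void compensate();      // Ci: must eventually succeed, retried until it does
    }

    /** Backward recovery: T1..Ti-1 committed, Ti failed -> Ci, Ci-1, ..., C1. */
    public void run(List<Step> steps) {
        for (int idx = 0; idx < steps.size(); idx++) {
            if (!steps.get(idx).execute()) {
                for (int i = idx; i >= 0; i--) {
                    retryUntilSuccess(steps.get(i)); // compensation itself is never rolled back
                }
                return;
            }
        }
    }

    private void retryUntilSuccess(Step step) {
        while (true) {
            try {
                step.compensate();
                return;
            } catch (Exception e) {
                // in a real system: bounded retries, then an alert for manual intervention
            }
        }
    }
}
```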
SAGA must ensure that all sub-transactions are either committed or compensated, but the SAGA system itself may also crash, so it has to be designed with a database-like logging mechanism (called the SAGA Log) to ensure that after a restart the system can track how far the sub-transactions have progressed, for example which steps have been executed and which compensated. In addition, although a compensating operation is usually easier to implement than freezing and cancelling, guaranteeing that forward and backward recovery are carried out rigorously may still take considerable effort, for instance by driving them through service orchestration or a reliable event queue. For this reason SAGA transactions are generally not implemented by hand-written code either, but on the basis of transaction middleware; Seata, mentioned earlier, also supports the SAGA transaction mode.
4.6. AT transactions
AT transactions are implemented with reference to the two-phase commit of the XA protocol, but they are designed specifically to address a defect of XA 2PC: in the prepare phase the coordinator has to wait until all data sources have returned success before it can issue a unified Commit command, which produces a bucket effect (all locks and resources involved must be held until the slowest transaction completes before they can be released together).
The general approach is to automatically intercept all SQL when business data is submitted, save snapshots of the data before and after the SQL modifies it, generate row locks, and commit all of this to the operated data source in one local transaction, which is equivalent to automatically recording redo and rollback logs. If the distributed transaction commits successfully, the corresponding log records in each data source can simply be cleaned up later; if the distributed transaction needs to roll back, "reverse SQL" is generated automatically from the log data to compensate.
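The sketch below illustrates the idea just described, not Seata's actual implementation: the before and after images of the affected row are recorded alongside the business update in the same local transaction, so that a compensating "reverse SQL" can later be derived from them. Table and column names are assumptions.

```java
import javax.sql.DataSource;
import java.sql.*;

// Illustrative only: record before/after images of the updated row together with the
// business change in one local transaction, so a compensating UPDATE can be generated later.
public class UndoLogSketch {
    public static void debitWithUndoLog(DataSource ds, long userId, java.math.BigDecimal amount,
                                        String globalTxId) throws SQLException {
        try (Connection conn = ds.getConnection()) {
            conn.setAutoCommit(false);
            try {
                // 1. before image
                java.math.BigDecimal before = queryBalance(conn, userId);
                // 2. business SQL
                try (PreparedStatement ps = conn.prepareStatement(
                        "UPDATE account SET balance = balance - ? WHERE user_id = ?")) {
                    ps.setBigDecimal(1, amount);
                    ps.setLong(2, userId);
                    ps.executeUpdate();
                }
                // 3. after image + undo record, written in the same local transaction
                java.math.BigDecimal after = queryBalance(conn, userId);
                try (PreparedStatement undo = conn.prepareStatement(
                        "INSERT INTO undo_log (global_tx_id, table_name, pk, before_image, after_image) "
                                + "VALUES (?, 'account', ?, ?, ?)")) {
                    undo.setString(1, globalTxId);
                    undo.setLong(2, userId);
                    undo.setBigDecimal(3, before);
                    undo.setBigDecimal(4, after);
                    undo.executeUpdate();
                }
                conn.commit(); // the local transaction commits immediately; its locks are released
            } catch (SQLException e) {
                conn.rollback();
                throw e;
            }
        }
    }

    private static java.math.BigDecimal queryBalance(Connection conn, long userId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT balance FROM account WHERE user_id = ?")) {
            ps.setLong(1, userId);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getBigDecimal(1);
            }
        }
    }
}
```

If the global transaction later needs to roll back, the before image can be restored (for example, UPDATE account SET balance = <before image> WHERE user_id = ?), provided the row has not been modified by another transaction in the meantime; that is exactly the dirty-write risk discussed next.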
With this compensation-based approach, every data source involved in the distributed transaction can commit its local transaction immediately and release its locks and resources at once. Compared with 2PC, this asynchronous commit mode greatly improves system throughput. The price is a large loss of isolation, which can even affect atomicity: without isolation, compensating instead of rolling back does not always succeed. For example, if, after the local transaction has committed but before the distributed transaction has finished, the data to be compensated is modified by some other operation, a Dirty Write occurs. In that case, once the distributed transaction needs to roll back, compensation can no longer be carried out by the automatically generated reverse SQL and can only be handled by human intervention.
Generally speaking, dirty writes must be avoided; all traditional relational databases still take locks even at their lowest isolation level in order to prevent them, because a dirty write is extremely difficult to clean up by hand once it happens. Seata therefore adds a "Global Lock" mechanism to achieve write isolation: a local transaction is only allowed to commit after it has obtained the global lock on the records it modifies, and it must wait until that global lock is acquired. This design sacrifices some performance, but it prevents two distributed transactions from containing local transactions that modify the same data, and thus avoids dirty writes. For read isolation, the default isolation level of AT transactions is Read Uncommitted, which means dirty reads can occur. The read isolation problem could also be solved with the global lock, but blocking reads directly is expensive, so this is not usually done.
There is no cure-all solution for distributed transactions; the only effective way is to choose the appropriate transaction handling scheme according to the circumstances.