Hello, I’m looking at the mountains.
From single-machine architecture to distributed architecture, and from monolithic architecture to microservice architecture, the interactions between systems keep getting more complex, and the volume of data exchanged between systems is growing rapidly as well. As a system, we want both the logic and the data to be self-consistent.
There are two requirements for data consistency:
- Independently of the code, the data can verify its own accuracy; that is, the data does not contradict itself.
- All data is accurate and matches expectations.
Meeting both requirements is what we mean by data consistency, and the main tool for achieving it is the transaction.
Note that the data consistency discussed in this article is not consistency between multiple copies of the same data, but consistency of business data between systems.
Local transactions
In earlier systems, we could guarantee data consistency through relational database transactions. Such a transaction has four basic properties: ACID.
- A (Atomicity): All operations in a transaction either complete or all fail; the transaction cannot stop at an intermediate stage. If it fails during execution, it is rolled back to the state before it began, as if it had never been executed.
- C (Consistency): A transaction takes the system from one consistent state to another (a read-only transaction changes nothing). The system must stay in a consistent state no matter how many transactions run concurrently.
- I (Isolation): Each transaction executes as if it were the only operation the system is performing at that time. If two transactions run at the same time and perform the same function, isolation ensures that each appears to be the only one using the system. This property is sometimes called serializability: to prevent transactions from interfering with one another, conflicting requests are serialized so that only one of them works on the same data at a time.
- D (Durability): Once a transaction completes, its changes persist in the database and are not rolled back.
These four properties are the foundation of relational databases. No matter how complex the system is, as long as everything uses the same relational database, we can rely on transactions to guarantee data consistency. Because we trust relational databases, we can treat local transactions as reliable and need no extra work during development. From an architectural point of view, however, the relational database is itself a separate system, so the database and the application together already form a small distributed system. So let’s take a look at how ACID is implemented in this simple distributed system.
First, A (atomicity) and D (durability) are closely linked. Atomicity guarantees that all operations of a transaction either complete or fail together. Durability ensures that once a transaction completes, its changes persist in the database and cannot be undone or lost for any reason.
As we all know, data must be written to disk to be durable; data held only in memory is lost in a system crash, a host power failure, and so on. So “writing to disk” is the key to achieving atomicity and durability, but the action has an intermediate state: a write in progress. For this reason, modern relational databases typically use append-only logging. All the information needed to modify the data, including which data to modify, which memory page and disk block the data physically lives in, and which value is changed to which, is first recorded to disk as sequentially appended log records. Only after all the log records are safely on disk and the database sees a “commit record” in the log, which marks a successful commit, does it modify the real data according to the log. After the modification is complete, an “end record” is appended to the log to indicate that the transaction has been persisted. This way of implementing transactions is called “commit logging”.
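The commit logging sequence above can be sketched in a few lines. This is a toy model under stated assumptions, not how any particular database does it: the `CommitLog` class, the JSON record layout, and the `crash_at` switch are all hypothetical names invented for illustration, and an in-memory dict stands in for the real data.

```python
import json
import os
import tempfile


class CommitLog:
    """Toy commit log in front of an in-memory dict (the "real data")."""

    def __init__(self, path):
        self.path = path
        self.data = {}

    def _append(self, record):
        # Sequential append; fsync so the record survives a crash.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def commit(self, tx_id, changes, crash_at=None):
        # 1. Record every intended change before touching the data.
        for key, value in changes.items():
            self._append({"tx": tx_id, "type": "change", "key": key, "value": value})
        if crash_at == "before_commit":
            return  # simulated crash: no commit record was written
        # 2. The commit record marks the transaction as committed.
        self._append({"tx": tx_id, "type": "commit"})
        if crash_at == "before_apply":
            return  # simulated crash: committed in the log, data not yet changed
        # 3. Apply the changes to the real data, then write the end record.
        self.data.update(changes)
        self._append({"tx": tx_id, "type": "end"})

    def recover(self):
        # Rebuild the data by replaying only transactions that have a commit record.
        changes, committed = {}, []
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                if record["type"] == "change":
                    changes.setdefault(record["tx"], {})[record["key"]] = record["value"]
                elif record["type"] == "commit":
                    committed.append(record["tx"])
        self.data = {}
        for tx_id in committed:
            self.data.update(changes.get(tx_id, {}))


log_path = os.path.join(tempfile.mkdtemp(), "tx.log")
log = CommitLog(log_path)
log.commit("t1", {"a": 1})
log.commit("t2", {"a": 2, "b": 3}, crash_at="before_apply")
log.commit("t3", {"c": 9}, crash_at="before_commit")

restarted = CommitLog(log_path)
restarted.recover()
print(restarted.data)  # {'a': 2, 'b': 3}
```

After the simulated restart, `t2` is redone because its commit record reached the log, while `t3` is ignored because it never got one. That is the sense in which the log gives both atomicity and durability.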
Logging guarantees the atomicity and durability of a single transaction, but what if multiple transactions access the same resource? A resource accessed by multiple threads or processes is called a critical resource, and the straightforward way to resolve conflicts over a critical resource is locking. Relational databases provide three types of locks:
- Write Lock: Only one transaction can hold the write lock on a piece of data at a time, so a write lock is also called an exclusive lock. While it is held, other transactions can neither write the data nor acquire a read lock on it (note: they cannot acquire a read lock, but they can still read the data without locking).
- Read Lock: Multiple transactions can hold read locks on the same data at the same time, so a read lock is also called a shared lock. While data is read-locked, no write lock can be added to it.
- Range Lock: A write lock placed on a range of data, so that nothing in that range can be written. It can be thought of as a batch form of the write lock.
Depending on how these three locks are combined, we get four transaction isolation levels:
- Serializable: Write locks are taken on writes, read locks on reads, and range locks on range reads and writes.
- Repeatable Read: Write locks are taken on writes and read locks on reads, but no range locks are taken. As a result, reading the same range twice can return different results, which is a Phantom Read.
- Read Committed: Write locks are held until the transaction ends, but read locks are released immediately after the read. As a result, the same transaction can read the same data several times and get different results, which is a Non-Repeatable Read.
- Read Uncommitted: Write locks are taken on writes, but reads take no lock at all. A transaction can therefore read data written by another transaction that has not yet committed, which is a Dirty Read.
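The compatibility rules between read locks and write locks can be made concrete with a small sketch. The `LockTable` class below is a hypothetical illustration only: it never blocks or queues (it just reports whether a lock could be granted), and it leaves out range locks entirely.

```python
class LockTable:
    """Grants or refuses read/write locks per key; a sketch, not a real lock manager."""

    def __init__(self):
        self.readers = {}  # key -> set of transaction ids holding a read lock
        self.writer = {}   # key -> transaction id holding the write lock

    def acquire_read(self, tx, key):
        holder = self.writer.get(key)
        if holder is not None and holder != tx:
            return False  # write-locked by someone else: no read lock allowed
        self.readers.setdefault(key, set()).add(tx)
        return True

    def acquire_write(self, tx, key):
        holder = self.writer.get(key)
        if holder is not None and holder != tx:
            return False  # only one writer at a time
        if self.readers.get(key, set()) - {tx}:
            return False  # read-locked by someone else: no write lock allowed
        self.writer[key] = tx
        return True

    def release_all(self, tx):
        # Called when a transaction ends (or right after a read under Read Committed).
        for holders in self.readers.values():
            holders.discard(tx)
        for key in [k for k, holder in self.writer.items() if holder == tx]:
            del self.writer[key]


locks = LockTable()
shared = locks.acquire_read("t1", "row") and locks.acquire_read("t2", "row")
blocked = locks.acquire_write("t3", "row")
print(shared, blocked)  # True False
```

In this picture, the isolation levels differ only in which of these calls a transaction makes and when it calls `release_all`: Read Uncommitted never calls `acquire_read`, and Read Committed releases its read locks right after each read.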
Global transaction
As a system grows, the volume of business keeps increasing. A single application no longer meets the requirements, so we split the system, and then the database. At this point, a single request may access multiple databases. To solve the data consistency problem in this situation, the X/Open organization proposed the X/Open XA transaction architecture in 1991 (when I was still a kid). The core of XA is the communication interface it defines between the global Transaction Manager (which coordinates global transactions) and the local Resource Managers (which drive local transactions). This interface forms a bridge between one transaction manager and multiple resource managers, and by coordinating multiple data sources it commits or rolls back a global transaction as a unit. Accompanying the XA architecture is the Two-Phase Commit protocol (2PC). The key point of this protocol is that the activity of multiple databases is controlled by a single component, the transaction coordinator. Specifically, there are five steps:
- The application invokes the commit method in the transaction manager
- The transaction manager contacts each database involved in the transaction and asks it to prepare to commit (this is the start of phase 1).
- After receiving the prepare request, a database must make sure that it can commit the transaction when asked to commit, or roll it back when asked to roll back. If the database cannot prepare the transaction, it sends a negative response to the transaction manager.
- The transaction manager collects the responses from all databases.
- In phase 2, the transaction manager notifies each database of the outcome. If any database responded negatively, the transaction manager sends a rollback command to all databases involved. If all databases responded positively, the transaction manager instructs all resource managers to commit. Once a database has been told to commit, the transaction can no longer fail: by responding positively in phase 1, each resource manager has guaranteed that the commit will succeed if it is later requested.
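The five steps above can be sketched as a coordinator loop. This is a minimal in-process illustration, not the XA interface: the `ResourceManager` class, its `can_prepare` flag, and `two_phase_commit` are hypothetical names, and there is no network, timeout, or persistence.

```python
class ResourceManager:
    """Stands in for one database taking part in the global transaction."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare  # assumption: decided up front for the demo
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote yes only if a later commit is guaranteed to succeed.
        self.state = "prepared" if self.can_prepare else "refused"
        return self.can_prepare

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(resource_managers):
    # Phase 1: ask everyone to prepare and collect all votes.
    votes = [rm.prepare() for rm in resource_managers]
    # Phase 2: commit only if every vote was positive, otherwise roll back all.
    if all(votes):
        for rm in resource_managers:
            rm.commit()
        return True
    for rm in resource_managers:
        rm.rollback()
    return False


orders = ResourceManager("orders")
inventory = ResourceManager("inventory", can_prepare=False)
outcome = two_phase_commit([orders, inventory])
print(outcome, orders.state, inventory.state)  # False rolled_back rolled_back
```

A single negative vote is enough to roll back every participant, including the ones that had already prepared successfully.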
The two-phase commit protocol is simple to implement, but has several obvious drawbacks:
- Single point of failure: The transaction manager plays a central role in two-phase commit. While waiting for a resource manager’s reply, the transaction manager can use a timeout, so it tolerates a resource manager going down. A resource manager, however, cannot time out while waiting for the transaction manager’s instruction. If the transaction manager itself goes down, rather than one of the resource managers, all resource managers are affected. If the transaction manager never recovers and never sends Commit or Rollback, all resource managers have to keep waiting.
- Performance: During two-phase commit, all resource managers are effectively bound into one unit that is scheduled together. The process takes two rounds of remote service calls and three data persistence operations (writing the redo log in the prepare phase, the transaction manager persisting its state, and writing the commit record in the commit phase). The whole process lasts until the slowest resource manager in the group finishes, which is why two-phase commit usually performs poorly.
- Consistency risk: Although the commit phase is short, it is still a window of real danger. Suppose that, after sending the prepare instruction, the transaction manager concludes from the replies of all resource managers that the transaction can be committed, persists the transaction state, and commits its own part of the transaction. If the network then goes down and the Commit instruction cannot reach all resource managers, some data (the transaction manager’s) has been committed while other data (the resource managers’) has not, and there is no way to roll back. The result is inconsistent data.
Once a problem is visible, a solution can be found. As our high school teacher used to say, as long as your thinking does not slip, there are always more solutions than difficulties. Hence the Three-Phase Commit protocol (3PC), which mitigates the single point of failure and the performance problem of the prepare phase. This protocol splits the prepare phase of 2PC into CanCommit and PreCommit, and renames the commit phase DoCommit. CanCommit is an inquiry phase in which each resource manager judges, based on its own state, whether the transaction is likely to complete.
In essence, 3PC adds an inquiry up front: if every participant says it can do the work, the transaction is very likely to succeed, which avoids paying for the heavy operation of locking resources in the prepare phase only to roll back afterwards. In three-phase commit, if the transaction manager goes down after the PreCommit phase, that is, a resource manager waits for DoCommit and never receives it, the default strategy is to commit the transaction rather than roll it back or keep waiting. This effectively removes the single-point-of-failure risk of the transaction manager.
Distributed transaction
Speaking of distributed transactions, we have to mention CAP theory: a distributed system can satisfy at most two of Consistency, Availability, and Partition tolerance at the same time, never all three.
CAP theory
- Consistency: At any moment, the data seen on any node of the distributed system is as expected.
- Availability: The system’s ability to provide service continuously. It is defined in terms of Reliability and Maintainability (Serviceability). Reliability is measured by Mean Time Between Failures (MTBF), and maintainability by Mean Time To Repair (MTTR). Availability is the ratio of the time the system is usable to the total time: A = MTBF / (MTBF + MTTR).
- Partition Tolerance: The system’s ability to keep providing correct service in a distributed environment even after some nodes lose contact with each other because of network problems.
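To get a feel for the availability formula, here is a small worked example. The function name and the numbers (999 hours between failures, 1 hour to repair) are made up for illustration.

```python
def availability(mtbf_hours, mttr_hours):
    """A = MTBF / (MTBF + MTTR), both arguments in the same time unit."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


a = availability(mtbf_hours=999, mttr_hours=1)
downtime_per_year = (1 - a) * 365 * 24  # expected unavailable hours in one year

print(f"{a:.3%}")                  # 99.900%
print(f"{downtime_per_year:.2f}")  # 8.76
```

So "three nines" of availability still allows a little under nine hours of downtime per year, and halving MTTR helps as much as doubling MTBF does.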
The definition of CAP has been revised several times. The revisions do not differ in essence but are more rigorous logically. For ease of understanding, this article uses the most accessible definition.
Since C, A, and P cannot all be satisfied at once, let’s take a look at what happens when each one is given up:
- Choose CA and give up P: We assume the network is reliable and partitions never occur. Such reliability would mean no network delay, interruption, or similar problem between nodes, which obviously does not hold in practice.
- Choose CP and give up A: To guarantee data consistency, synchronization between nodes may be prolonged indefinitely once a network fault occurs. The CP combination is generally used where data quality requirements are high: to keep data consistent, service is not provided until the network fully recovers, which may take an indefinite amount of time, especially when the system already shows high latency or the network is disconnected by a failure.
- Choose AP and give up C: When a network partition occurs, keeping the service available takes priority and data consistency is given up. This is the mainstream choice for distributed systems today: the network exists to link servers in different locations and is inherently unreliable, so P cannot be given up; and improving availability is the very reason we build distributed systems, so A cannot be given up either.
Note that what AP gives up is Strong Consistency; what it still pursues is Eventual Consistency: all copies of the data in the system reach a consistent state after a period of time, and that period must be acceptable to users.
Eventual consistency is also backed by a theory, called BASE, which consists of:
- Basically Available: When an unpredictable failure occurs, the system is allowed to lose part of its availability. For example, response times may increase, and some non-critical interfaces may be degraded or circuit-broken.
- Soft State: Also called weak state, as opposed to hard state. The system is allowed to contain intermediate states as long as they do not affect overall availability; that is, data synchronization between copies on different nodes is allowed to lag.
- Eventually Consistent: All copies of the data in the system reach a consistent state after a period of synchronization. The essence of eventual consistency is therefore that the system guarantees the data will be consistent in the end, rather than strongly consistent in real time.
In engineering practice, eventual consistency comes in five variants, which are usually combined:
- Causal consistency: If node A notifies node B after updating some data, node B’s subsequent reads and writes of that data are based on A’s updated value. Node C, which has no causal relationship with node A, is under no such restriction.
- Read your writes: After node A updates some data, it always sees the latest value it wrote and never an older one.
- Session consistency: Access to system data is framed in a session, and the system guarantees read-your-writes consistency within the same valid session. That is, after an update, the client always reads the latest value of the data item within that session.
- Monotonic read consistency: Once a node has read a value of a data item, the system must not return any older value to that node on subsequent reads.
- Monotonic write consistency: The system must guarantee that all writes from the same node are executed in order.
With the theory in hand, let’s look at some patterns for achieving eventual consistency.
Reliable event mode
The reliable event pattern is event-driven: when something happens, such as a business entity being updated, the service publishes an event to the message broker. The message broker pushes the event to the services subscribed to it; when those services receive the event, they carry out their own business logic, which may in turn publish more events.
To illustrate the pattern, consider an order system that notifies the inventory system to deduct inventory after an order is placed successfully.
- The order system completes the order according to the user’s action. The order record and the event record are written in the same local transaction.
- A separate relay task polls the event table and sends events whose status is “in progress” to the message service as messages. If sending fails, the event is simply sent again on the next poll, since this is a polling task. (There is room for optimization here, which this example leaves out to keep the model simple.)
- The message service delivers the order-success message to the inventory service, which subscribed to order messages, and the inventory service starts processing. There are several possible outcomes:
  - The inventory service deducts inventory successfully, and the message service receives the success response. The message service passes the response back to the order service, where the event receiver marks the event as completed.
  - The inventory service fails to deduct inventory, and the message service receives a failure response. The message service then resends the message to the inventory service until it gets a success response. If the number of failures reaches a threshold, it can escalate for manual intervention.
  - The message service fails to return the result to the order service, so the order service never sees the success response. The event polling logic then sends the event to the message service again. As a result, the inventory service receives the deduct-inventory message more than once, so it must be idempotent: it detects that the message has already been processed and simply returns success.
This approach of retrying constantly to guarantee reliability is called Best-Effort Delivery, which is where the word “reliable” comes from.
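The order and inventory flow above can be sketched with an event table and an idempotent consumer. Everything here is a simplified assumption made for illustration: the table layout, the function names, and the `ack_lost` flag (which simulates the success response getting lost) are hypothetical, the message broker is replaced by a direct call, and both "services" live in one process.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, sku TEXT, qty INTEGER)")
    db.execute("CREATE TABLE events (order_id INTEGER PRIMARY KEY, status TEXT)")
    return db


def place_order(db, order_id, sku, qty):
    # The order row and the event row are written in the same local transaction.
    with db:
        db.execute("INSERT INTO orders (id, sku, qty) VALUES (?, ?, ?)", (order_id, sku, qty))
        db.execute("INSERT INTO events (order_id, status) VALUES (?, 'in_progress')", (order_id,))


class InventoryService:
    def __init__(self, stock):
        self.stock = stock
        self.processed = set()  # ids of orders already handled

    def handle(self, order_id, sku, qty):
        if order_id in self.processed:
            return True  # duplicate delivery: already done, just report success
        self.stock[sku] -= qty
        self.processed.add(order_id)
        return True


def relay_once(db, inventory, ack_lost=False):
    # One polling round: deliver every event that is still "in progress".
    rows = db.execute(
        "SELECT e.order_id, o.sku, o.qty FROM events e "
        "JOIN orders o ON o.id = e.order_id WHERE e.status = 'in_progress'"
    ).fetchall()
    for order_id, sku, qty in rows:
        ok = inventory.handle(order_id, sku, qty)
        if ok and not ack_lost:
            with db:
                db.execute("UPDATE events SET status = 'completed' WHERE order_id = ?", (order_id,))


db = make_db()
inventory = InventoryService({"book": 10})
place_order(db, 1, "book", 2)
relay_once(db, inventory, ack_lost=True)  # delivered, but the response is lost
relay_once(db, inventory)                 # redelivered on the next poll
print(inventory.stock["book"])  # 8
```

The stock is deducted only once even though the message arrives twice, because the consumer remembers which orders it has already processed.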
The reliable event pattern also has a more general form known as Best-Effort 1PC: complete the most error-prone business, or the core business, as a local transaction, and then drive the other related business in the same distributed transaction to completion by retrying (not necessarily through a message service). To find the most error-prone part, assess the probability of error for each part in advance. To find the core business, look for the part whose success means everything else must also succeed.
Here are two more concepts:
- Business exception: a failure in the business logic itself, such as insufficient account balance or insufficient inventory.
- Technical exception: a failure not caused by business logic, such as a network connection error or a network timeout.
TCC mode
Try-Confirm-Cancel (TCC) is a highly intrusive transaction scheme. Business processing must be split into two sub-processes: reserving business resources, and confirming or releasing the consumption of those resources. A coordinating service schedules the sub-processes across the different business systems. TCC is divided into three phases:
- Try: the tentative execution phase. All business checks are completed (to ensure consistency) and all required business resources are reserved (to ensure isolation).
- Confirm: the confirmation phase. No business checks are performed; the resources reserved in the Try phase are used directly to complete the business. Confirm may be executed repeatedly, so it must be idempotent.
- Cancel: the cancellation phase. The business resources reserved in the Try phase are released. Cancel may be executed repeatedly, so it must be idempotent.
Taking order placement as the example again, the flow is as follows:
1. The order system creates the transaction, generates a transaction ID (used to recognize repeated requests for idempotency), and records it in the activity log through the activity manager.
2. Enter the Try phase:
  - Call the account system to check whether the account balance is sufficient. If so, freeze the required amount. The account balance is a critical resource at this point, so the freeze must be protected by an exclusive lock or an optimistic lock.
  - Call the inventory system to check whether the stock is sufficient. If so, lock the required stock. The stock-locking operation must be protected in the same way.
3. If all services return success, record the activity as Confirm and enter the Confirm phase:
  - Call the account system to deduct the frozen amount.
  - Call the inventory system to deduct the locked stock.
4. If everything in step 3 completes, the transaction is declared complete. If either side raises an exception in step 3, the Confirm operation is repeated according to the activity log (best-effort delivery). The Confirm operation of each business system therefore has to be idempotent.
5. If either side fails in step 2 (business or technical exception), record the activity as Cancel and enter the Cancel phase:
  - Call the account system to release the frozen amount.
  - Call the inventory system to release the locked stock.
6. If everything in step 5 completes, the transaction is declared failed. If either side raises an exception in step 5 (business or technical), the Cancel operation is repeated according to the activity log (best-effort delivery). The Cancel operation of each business system therefore also has to be idempotent.
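The Try, Confirm, and Cancel steps can be sketched as follows. This is a single-process illustration under assumptions: `Participant` and `run_tcc` are hypothetical names, the same class models both the account balance and the stock, and the activity log, locking, and retries on technical exceptions are left out.

```python
class Participant:
    """A resource that supports Try/Confirm/Cancel, e.g. a balance or a stock count."""

    def __init__(self, available):
        self.available = available
        self.reserved = {}  # transaction id -> amount frozen in the Try phase

    def try_reserve(self, tx_id, amount):
        if tx_id in self.reserved:
            return True  # repeated Try for the same transaction
        if self.available < amount:
            return False  # business exception: not enough balance or stock
        self.available -= amount
        self.reserved[tx_id] = amount
        return True

    def confirm(self, tx_id):
        # Consume the reservation; a repeated call finds nothing and does nothing.
        self.reserved.pop(tx_id, None)

    def cancel(self, tx_id):
        # Give the reservation back; a repeated or "empty" call adds 0.
        self.available += self.reserved.pop(tx_id, 0)


def run_tcc(tx_id, steps):
    """steps is a list of (participant, amount) pairs."""
    if all(p.try_reserve(tx_id, amount) for p, amount in steps):
        for p, _ in steps:
            p.confirm(tx_id)
        return "confirmed"
    for p, _ in steps:
        p.cancel(tx_id)
    return "cancelled"


account = Participant(available=100)
stock = Participant(available=1)
first = run_tcc("tx-1", [(account, 30), (stock, 2)])   # stock is insufficient
second = run_tcc("tx-2", [(account, 30), (stock, 1)])
print(first, second, account.available, stock.available)  # cancelled confirmed 70 0
```

Keying every reservation by transaction ID is what makes Confirm and Cancel safe to repeat, which the retry logic in steps 4 and 6 depends on.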
The difference between TCC and 2PC is that TCC is a white box at the business-code level, while 2PC is a black box at the infrastructure level. TCC therefore has the flexibility to adjust the granularity of resource locking as needed.
TCC reserves resources during business execution, which solves the resource isolation problem of the reliable event pattern. However, TCC has two significant drawbacks:
- TCC moves infrastructure-level logic up into business code, which is highly intrusive and raises development cost. Higher development cost in turn means higher maintenance cost and higher demands on developers’ skills.
- TCC requires that resources can be locked, occupied, and released, but some resources belong to external systems and cannot be locked.
Given these two drawbacks, let’s see whether SAGA can make up for them.
SAGA mode
A saga is a long story: a long narrative, a long series of events. The SAGA pattern appeared long before distributed transactions became a mainstream concern (once again, hats off to our predecessors). It originated in a 1987 ACM paper, “SAGAS”, by Hector Garcia-Molina and Kenneth Salem of Princeton University. The paper proposed a way to make “Long Lived Transactions” run more efficiently: decompose one large transaction into a set of sub-transactions that can be interleaved with other transactions. Later this evolved into a design pattern that decomposes a large transaction in a distributed environment into a set of local transactions. Some articles call this the business compensation pattern. SAGA describes the form of the transaction, while business compensation describes its behavior; in essence they are the same.
SAGA mode has two implementations:
- Forward Recovery: Execute the sub-transactions in order. If a sub-transaction fails, retry it until it succeeds, and then move on to the next one. This suits cases that must succeed: for example, once a user has paid for an order, the inventory must be deducted.
- Backward Recovery: Execute the sub-transactions in order. If a sub-transaction fails, run its compensating action (which must be idempotent, so that it tolerates failures caused by technical exceptions), and then run the compensating actions of the already successful sub-transactions in reverse order. This suits batch operations that can be cancelled: for example, a trip booking needs flights, a hotel, and attraction tickets; if buying the tickets fails, the flights and the hotel can be cancelled.
Based on these two implementations, a SAGA consists of two parts:
- Normal Transactions: The large transaction T is split into n sub-transactions, named T1, T2, …, Tn. Each sub-transaction is, or can be treated as, an atomic action. If the distributed transaction commits normally, its effect on the data (eventual consistency) is equivalent to committing T1 through Tn successfully in sequence.
- Compensating Transactions: Each sub-transaction Ti has a corresponding compensating action Ci, named C1, C2, …, Cn.
Subtransactions and compensation actions need to meet some conditions:
- Ti and Ci must correspond
- The compensation action Ci must eventually succeed, that is, it needs best-effort delivery (retry until it succeeds).
- Ti and Ci need to be idempotent
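Backward recovery with (Ti, Ci) pairs can be sketched like this. It is a minimal in-process illustration: `run_saga` and `retry` are hypothetical helpers, the retry count is bounded for the demo (a real system would keep retrying or escalate), and the trip-booking steps are simulated with a set.

```python
def retry(fn, attempts=3):
    # Best-effort execution: call fn again if it raises, up to `attempts` times.
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise


def run_saga(steps):
    """steps is a list of (Ti, Ci) pairs. Returns True if every Ti succeeded."""
    started = []
    for action, compensate in steps:
        started.append(compensate)
        try:
            action()
        except Exception:
            # Compensate the failed step first, then the earlier ones in reverse order.
            for comp in reversed(started):
                retry(comp)
            return False
    return True


booked = set()


def buy_tickets():
    raise RuntimeError("tickets sold out")


trip = [
    (lambda: booked.add("flight"), lambda: booked.discard("flight")),
    (lambda: booked.add("hotel"), lambda: booked.discard("hotel")),
    (buy_tickets, lambda: booked.discard("tickets")),
]
ok = run_saga(trip)
print(ok, booked)  # False set()
```

The compensations use `discard`, which is harmless when the item was never booked or has already been removed. That is the idempotency the conditions above ask for, and it is what makes retrying a compensation safe.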
Summary
This article summarized local transactions, global transactions, and eventual consistency as ways to achieve data consistency, and focused on several patterns for achieving eventual consistency: the reliable event pattern, the TCC pattern, the SAGA pattern, and so on. Data consistency has always been a hard problem, and with the move to microservices it becomes harder still. But there is no need to fear difficulty: as long as you do not give up, it can always be solved.