Preface
Eventual data consistency is a problem every application system must face, whether it is an enterprise application or an Internet application. As distributed architectures become more common, consistency gets harder to guarantee, and there is no silver bullet: introducing a particular piece of middleware or open source framework does not solve it. The right solution depends on the business scenario. This article summarizes a few points from the author's experience in recent years, in the hope that more application systems pay attention to data consistency when coding and become more robust as a result.
Related basic theory
When it comes to transactions, the main theories today are the ACID properties, the CAP theorem, and BASE. ACID describes database transactions, while CAP and BASE are theories for distributed systems. Business systems such as order management or warehouse management can borrow from these theories to solve their consistency problems.
The ACID properties
A (Atomicity): a transaction is an atomic unit of work; either all of its modifications to the data are performed or none are.
C (Consistency): data must be in a consistent state both when a transaction begins and when it completes. All relevant data rules must be applied to the transaction's modifications to preserve data integrity, and all internal data structures must be correct when the transaction ends.
I (Isolation): a transaction executes in an environment isolated from external concurrent operations, so other transactions do not see its intermediate state.
D (Durability): once a transaction completes, its changes to the data are permanent and survive a system failure.
CAP
C (Consistency): consistency refers to the atomicity of data. In a classical database it is guaranteed by transactions: when a transaction completes, whether by commit or rollback, the data is in a consistent state. In a distributed environment, consistency refers to whether the data on multiple nodes is the same.
A (Availability): the service is always available; when a user makes a request, the service returns a result within a bounded period of time.
P (Partition tolerance): in a distributed application, parts of the system may become unable to communicate with each other, for example because of a network failure. With good partition tolerance, the application keeps functioning as a whole even though it is a distributed system.
BASE
BA: Basically Available.
S: Soft state.
E: Eventual consistency.
Several practices for eventual consistency
Transactions in the single database case
If the application uses a single database, consistency is easy to guarantee: the database's own transactions provide it, and the result is strong consistency. Java applications rarely hand-code transaction begin, commit, and rollback; most use Spring's transaction templates or declarative transactions.
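As an illustration of a single-database transaction, here is a minimal sketch in Python using the standard `sqlite3` module rather than Spring. The `account` table and the `transfer` function are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` inside one database transaction."""
    try:
        # `with conn` commits if the block succeeds and rolls back
        # if an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            row = conn.execute(
                "SELECT balance FROM account WHERE id = ?", (src,)
            ).fetchone()
            if row is None or row[0] < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
        return True
    except ValueError:
        # The debit above was rolled back together with everything else.
        return False
```

Both updates become visible together or not at all, which is the all-or-nothing behaviour that declarative transactions provide in a Spring application.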
Eventual consistency based on transactional message queues
The business logic is processed and the message is sent in the same place. The message is committed only after the business logic succeeds, which guarantees that the message is sent whenever the business logic succeeds. The message queue then delivers the message for processing. If processing succeeds, the flow ends; if it fails, it is retried until it succeeds. The premise is that once the first stage (the business logic) has succeeded, the second stage must eventually succeed. This corresponds to process C in the figure above.
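The flow can be sketched in memory as follows. This assumes a broker that supports prepared messages that stay invisible until committed; `TransactionalQueue`, `run_first_stage`, and their methods are hypothetical names, not the API of any real message queue.

```python
class TransactionalQueue:
    """In-memory stand-in for a broker with prepared (uncommitted) messages."""

    def __init__(self):
        self.prepared = {}   # message id -> payload, not yet visible to consumers
        self.committed = []  # payloads visible to consumers
        self._next_id = 0

    def prepare(self, payload):
        self._next_id += 1
        self.prepared[self._next_id] = payload
        return self._next_id

    def commit(self, msg_id):
        self.committed.append(self.prepared.pop(msg_id))

    def rollback(self, msg_id):
        self.prepared.pop(msg_id, None)

    def deliver(self, handler, max_attempts=5):
        """Second stage: deliver each committed message, retrying on failure."""
        while self.committed:
            payload = self.committed[0]
            for _ in range(max_attempts):
                try:
                    handler(payload)
                    break
                except Exception:
                    continue
            else:
                # Every attempt failed: leave the message queued for later.
                raise RuntimeError("handler kept failing; message stays queued")
            self.committed.pop(0)


def run_first_stage(queue, business_logic, payload):
    """First stage: the message becomes visible only if the business logic succeeds."""
    msg_id = queue.prepare(payload)
    try:
        business_logic()
    except Exception:
        queue.rollback(msg_id)
        raise
    queue.commit(msg_id)
```

Because the handler may run more than once for the same message, the consumer side has to be idempotent.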
Eventual consistency based on message queue + timed compensation mechanism
The difference from the transactional message queue approach above is that second-stage retries are no longer handled by the retry logic of the messaging middleware itself but by a separate compensation task mechanism. In most cases the probability of a second-stage failure is small, so keeping failures in a separate compensation task list makes the design clearer and shows exactly how many tasks are currently failed. This corresponds to flow E in the figure above.
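A minimal sketch of such a compensation task list, using Python's `sqlite3` module. The `compensation_task` table, its columns, and the function names are assumptions for illustration; a real system would run `run_compensation` from a scheduler.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE compensation_task ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " payload TEXT NOT NULL,"
    " status TEXT NOT NULL DEFAULT 'PENDING',"
    " attempts INTEGER NOT NULL DEFAULT 0)"
)


def handle_message(payload, second_stage):
    """Consume one message; on failure, record a task instead of retrying inline."""
    try:
        second_stage(payload)
    except Exception:
        with conn:
            conn.execute("INSERT INTO compensation_task (payload) VALUES (?)", (payload,))


def run_compensation(second_stage):
    """Body of the timed job: retry every pending task once per run."""
    rows = conn.execute(
        "SELECT id, payload FROM compensation_task WHERE status = 'PENDING'"
    ).fetchall()
    for task_id, payload in rows:
        try:
            second_stage(payload)
            new_status = "DONE"
        except Exception:
            new_status = "PENDING"
        with conn:
            conn.execute(
                "UPDATE compensation_task SET status = ?, attempts = attempts + 1"
                " WHERE id = ?",
                (new_status, task_id),
            )


def pending_count():
    """How many tasks are currently failed and waiting for compensation."""
    return conn.execute(
        "SELECT COUNT(*) FROM compensation_task WHERE status = 'PENDING'"
    ).fetchone()[0]
```

The `pending_count` query is what makes the number of currently failed tasks visible at a glance.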
Commit/rollback mechanism in the business logic of the business system
Commit and rollback are typical database transaction concepts, but in a distributed system they have to be implemented in business code: commit on success, roll back on failure.
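One possible shape for business-level commit and rollback is a reserve step followed by either a commit or a compensating rollback. The sketch below is an assumption about how this could look; the `Inventory` class, `place_order`, and the `charge` callback are hypothetical.

```python
class Inventory:
    """Hypothetical service exposing business-level commit and rollback."""

    def __init__(self, stock):
        self.stock = stock
        self.reserved = {}  # order id -> reserved quantity

    def reserve(self, order_id, qty):
        if qty > self.stock:
            raise ValueError("not enough stock")
        self.stock -= qty
        self.reserved[order_id] = qty

    def commit(self, order_id):
        # Finalize: the reserved quantity is consumed for good.
        self.reserved.pop(order_id, None)

    def rollback(self, order_id):
        # Compensate: return the reserved quantity to available stock.
        self.stock += self.reserved.pop(order_id, 0)


def place_order(order_id, qty, inventory, charge):
    """Reserve stock, call another system, then commit or roll back in code."""
    inventory.reserve(order_id, qty)
    try:
        charge(order_id)  # call into another system, which may fail
    except Exception:
        inventory.rollback(order_id)
        return False
    inventory.commit(order_id)
    return True
```

Both `commit` and `rollback` are written so that calling them twice for the same order has no further effect, which matters once retries are involved.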
Idempotency control in business application systems
Why make operations idempotent? The reason is simple: a call that does not achieve the expected result gets retried, and a retry must not affect the business logic a second time. Take creating an order: the first call times out, and the caller cannot tell whether the timed-out call failed or succeeded, so it retries. If the first call actually created the order, the retry obviously must not create it again.
Queries
A query API is inherently idempotent: querying changes no data in the system, so querying once is the same as querying many times.
MVCC scheme
Multi-version concurrency control here means updating with a condition. In system design this is a sensible use of optimistic locking: use a version number or another condition as the optimistic lock so that updates remain correct under concurrency. For example, update table_xxx set name=#name#, version=version+1 where version=#version#, or update table_xxx set quality=quality-#subQuality# where quality-#subQuality# >= 0
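The two conditional updates can be tried out with Python's `sqlite3` module. This is a sketch under assumptions: the `id` column and the function names are added for the example, and the second statement is read as a guard that the value never drops below zero (`>= 0`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_xxx ("
    " id INTEGER PRIMARY KEY, name TEXT,"
    " version INTEGER NOT NULL, quality INTEGER NOT NULL)"
)
conn.execute("INSERT INTO table_xxx VALUES (1, 'old', 0, 5)")
conn.commit()


def rename(row_id, new_name, expected_version):
    """Update only if nobody changed the row since `expected_version` was read."""
    with conn:
        cur = conn.execute(
            "UPDATE table_xxx SET name = ?, version = version + 1"
            " WHERE id = ? AND version = ?",
            (new_name, row_id, expected_version),
        )
    return cur.rowcount == 1  # 0 rows means the optimistic lock was lost


def deduct(row_id, sub_quality):
    """Subtract only if the result stays non-negative."""
    with conn:
        cur = conn.execute(
            "UPDATE table_xxx SET quality = quality - ?"
            " WHERE id = ? AND quality - ? >= 0",
            (sub_quality, row_id, sub_quality),
        )
    return cur.rowcount == 1
```

A caller that gets `False` from `rename` should re-read the row and decide whether to retry with the new version.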
A separate deduplication table
If deduplication is needed in many places, for example an ERP system with many kinds of business documents that each need deduplication, a separate deduplication table can be set up. When data is inserted, a row is also inserted into the deduplication table, and the database's unique index guarantees uniqueness.
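A minimal sketch of a shared deduplication table with Python's `sqlite3` module. The `dedup` and `stock_in` tables and the `process_once` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One deduplication table shared by all document types.
conn.execute(
    "CREATE TABLE dedup ("
    " doc_type TEXT NOT NULL, doc_no TEXT NOT NULL,"
    " UNIQUE (doc_type, doc_no))"
)
conn.execute("CREATE TABLE stock_in (doc_no TEXT, qty INTEGER)")


def process_once(doc_type, doc_no, apply):
    """Run `apply` at most once per (doc_type, doc_no)."""
    try:
        with conn:
            # The unique constraint rejects a second insert of the same key.
            conn.execute("INSERT INTO dedup VALUES (?, ?)", (doc_type, doc_no))
            apply(conn)
        return True
    except sqlite3.IntegrityError:
        # Duplicate: the key already exists, so the document was handled before.
        return False
```

The deduplication row and the business write share one transaction, so a failure in `apply` also removes the deduplication row and a later retry can proceed.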
Distributed locks
Take inserting data again as the example. In a distributed system it can be hard to build a unique index, for instance when the field that should be unique cannot be determined. In that case a distributed lock can be introduced through a third-party system: the business system acquires the distributed lock before inserting or updating data, performs the operation, and then releases the lock. This is the idea of locking for multi-threaded concurrency, carried over to multiple systems, that is, to a distributed system.
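The acquire, operate, release pattern can be sketched as below. `LockService` is an in-memory stand-in for the third-party lock system and is purely hypothetical; it is not safe across threads or processes. The expiry time and owner token are common safeguards assumed here, not something the text prescribes.

```python
import time
import uuid


class LockService:
    """In-memory stand-in for a third-party lock system (hypothetical)."""

    def __init__(self):
        self._locks = {}  # key -> (owner token, expiry time)

    def acquire(self, key, ttl_seconds, now=None):
        now = time.monotonic() if now is None else now
        holder = self._locks.get(key)
        if holder is not None and holder[1] > now:
            return None  # someone else holds an unexpired lock
        token = uuid.uuid4().hex
        self._locks[key] = (token, now + ttl_seconds)
        return token

    def release(self, key, token):
        # Only the current owner may release the lock.
        holder = self._locks.get(key)
        if holder is not None and holder[0] == token:
            del self._locks[key]
            return True
        return False


def with_lock(locks, key, action, ttl_seconds=10):
    """Acquire the lock, run the insert or update, then always release."""
    token = locks.acquire(key, ttl_seconds)
    if token is None:
        return False  # caller may retry later
    try:
        action()
        return True
    finally:
        locks.release(key, token)
```

The expiry keeps a crashed holder from blocking everyone forever, and the token keeps a slow holder from releasing a lock that has since passed to someone else.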
Deleting data
When deleting data, only the first delete actually changes data; the second or third delete simply returns success, which makes the operation idempotent.
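A short sketch with Python's `sqlite3` module shows the idea; the `orders` table and the response shape are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, note TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 'to be removed')")
conn.commit()


def delete_order(order_id):
    """Delete the order; repeated calls report success without changing data."""
    with conn:
        cur = conn.execute("DELETE FROM orders WHERE id = ?", (order_id,))
    # rowcount is 1 on the first call and 0 on repeats; both count as success.
    return {"success": True, "deleted_rows": cur.rowcount}
```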
A unique index for inserted data
The uniqueness of inserted data can be enforced through the business primary key. For example, if three fields together determine uniqueness in a particular business scenario, a unique index on those fields can be added to the database table.
Idempotency at the API level
Consider idempotency at the API level, for example how to prevent repeated submission of the same data. A unique identifier such as a UUID can be added to the form or client software that submits the data, and the server then deduplicates on that identifier. This gives the API a workable form of idempotency.
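A minimal in-memory sketch of deduplicating submissions by a UUID. The `OrderApi` class and its methods are hypothetical; a real server would keep the tokens and results in shared storage rather than in process memory.

```python
import uuid


class OrderApi:
    """Hypothetical server that deduplicates submissions by a form token."""

    def __init__(self):
        self._issued = set()  # tokens handed out with a form
        self._results = {}    # token -> result of the first submission
        self.orders = []

    def new_form_token(self):
        """Called when the form is rendered; the token travels with the form."""
        token = str(uuid.uuid4())
        self._issued.add(token)
        return token

    def submit(self, token, order):
        if token in self._results:
            # Repeated submission: return the original result, create nothing.
            return self._results[token]
        if token not in self._issued:
            raise ValueError("unknown token")
        self.orders.append(order)
        result = {"order_id": len(self.orders)}
        self._results[token] = result
        return result
```

Returning the stored result on a repeat lets a client that retried after a timeout learn the outcome of its first attempt.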
Idempotency through state machines
Designing business documents or business-related tasks almost always involves a state machine: the document carries a state that changes under different circumstances, usually as a finite state machine. If the state machine is already in a later state and a request arrives to move it to an earlier state, the change is in principle not allowed. This keeps the finite state machine idempotent.
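One way to enforce this is to make the expected current state part of the update condition. The sketch below uses Python's `sqlite3` module; the order states, the `NEXT_STATE` table, and the `advance` function are assumptions for illustration.

```python
import sqlite3

# Hypothetical order lifecycle: each state has exactly one allowed successor.
NEXT_STATE = {"CREATED": "PAID", "PAID": "SHIPPED", "SHIPPED": "COMPLETED"}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, state TEXT NOT NULL)")
conn.execute("INSERT INTO orders VALUES (1, 'CREATED')")
conn.commit()


def advance(order_id, from_state):
    """Move the order one step forward, but only if it is still in `from_state`."""
    to_state = NEXT_STATE[from_state]
    with conn:
        cur = conn.execute(
            "UPDATE orders SET state = ? WHERE id = ? AND state = ?",
            (to_state, order_id, from_state),
        )
    # 0 rows updated means the order has already moved on; a replayed
    # or late request therefore changes nothing.
    return cur.rowcount == 1
```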
Introducing an asynchronous callback mechanism
Application A calls B, and B returns a successful result to A within the synchronous call. Normally the interaction ends here, and that is fine 99.99% of the time. To get as close to 100% as possible, which system design should at least aim for, system B then calls back A to say: the logic you invoked on me succeeded. This is similar to the three-way handshake in TCP. It corresponds to process B in the figure above.
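The interaction can be sketched in memory as follows. `SystemA`, `SystemB`, the status values, and `flush_callbacks` are hypothetical names; in a real deployment the callback would be a separate network call made by B some time after the synchronous response.

```python
class SystemA:
    def __init__(self):
        self.requests = {}  # request id -> status as seen by A

    def call_b(self, b, request_id):
        self.requests[request_id] = "SENT"
        if b.handle(request_id, callback=self.on_confirmed):
            # Synchronous result: B reported success.
            self.requests[request_id] = "ACKED"

    def on_confirmed(self, request_id):
        # Asynchronous callback: B confirms the earlier call really worked.
        self.requests[request_id] = "CONFIRMED"


class SystemB:
    def __init__(self):
        self.pending_callbacks = []

    def handle(self, request_id, callback):
        # Do the work (omitted), remember to call A back, return success.
        self.pending_callbacks.append((callback, request_id))
        return True

    def flush_callbacks(self):
        """Runs later, outside the original synchronous call."""
        while self.pending_callbacks:
            callback, request_id = self.pending_callbacks.pop(0)
            callback(request_id)
```

A request that stays in `ACKED` for too long is a signal for A to investigate or re-check with B.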
A verification mechanism similar to double-checking
In the same flow as the asynchronous callback above, A calls B synchronously and B returns success, and the call ends. To make sure, A calls B again later, either a few seconds afterwards or at a scheduled time each day, to check whether the earlier call really succeeded. For example, A calls B to update an order's status and the call succeeds; a few seconds later, A queries B to confirm that the status is what it expected. This corresponds to process D in the figure above.
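A minimal sketch of the order-status example. `SystemB`, its two methods, and `update_and_verify` are hypothetical; the delay is passed in so that the check can run either seconds later or from a scheduled job.

```python
import time


class SystemB:
    """Hypothetical downstream system holding order statuses."""

    def __init__(self):
        self.order_status = {}

    def update_status(self, order_id, status):
        self.order_status[order_id] = status
        return True

    def get_status(self, order_id):
        return self.order_status.get(order_id)


def update_and_verify(b, order_id, status, delay_seconds=0.0, sleep=time.sleep):
    """A's side: update through B, then query B again to confirm the result."""
    if not b.update_status(order_id, status):
        return False
    sleep(delay_seconds)  # wait before the second look
    return b.get_status(order_id) == status
```

When the check returns `False` even though the first call reported success, A knows the update was lost and can retry or raise an alert.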