This is a reading note for the paper Efficient Optimistic Concurrency Control Using Loosely Synchronized Clocks.
MIT 6.824 Schedule: Spring 2016
Overview
The paper was published in 1995, when how to implement distributed transactions in distributed data systems was a hot topic. After all these years, the Optimistic Concurrency Control (OCC) it proposed is still a hot topic today.
People have long wanted an efficient, scalable, and reliable persistent storage system, and the OCC proposed in this paper aims at exactly this problem. Its characteristics are:
- Data is cached locally on the client side and persisted on the server side
- Serializability and external consistency guarantees are provided for committed transactions
- A global serialization order is obtained through loosely synchronized clocks
OCC supports concurrent transactions, but unlike traditional schemes it does not keep per-object concurrency-control state; it keeps only a version number per object, which minimizes memory and storage overhead while still delivering good performance.
Introduction
What scenario does the OCC described in this paper target?
A distributed object-oriented database system: data is persisted by servers, and clients cache data locally to improve performance.
Why is it called optimistic concurrency control?
To guarantee external consistency, a simple approach is to serialize all transactions with locks, but that clearly leads to poor performance. The alternative is to drop the locks and act only when a conflict actually occurs. "Optimistic" is thus in contrast to pessimistic locking algorithms.
What is External consistency?
External consistency (sometimes called linearizability or strict serializability) means that a later transaction must see the modifications of every transaction that committed before it.
The Price of Optimism
Since no locks are used, OCC has its limitations: it suits low-conflict workloads. If a large fraction of transactions conflict, OCC can perform very poorly, because:
A pessimistic algorithm merely delays the execution of a conflicting transaction, while an optimistic algorithm aborts the transaction outright once the conflict is detected.
Optimistic implementation
In essence, an optimistic algorithm merely postpones conflict detection and recovers when a conflict occurs. The core problems to solve are therefore:
- Conflict detection
- Conflict recovery
The environment
The OCC proposed in this paper is implemented in the object-oriented database Thor. Thor's model is as follows.
Each application runs directly on a client, and data is persisted on the servers. To improve performance, each client caches data locally: the application runs on the client, manipulates data in the local cache, and communicates with the back-end servers only at commit time.
A commit request carries two pieces of information:
- Validation information: the objects transaction T read and wrote, and whether each access was a read or a write
- Installation information: the new values of the modified objects
The client sends the commit request to a back-end server. If that server owns all the data involved, it carries out the commit itself; otherwise it takes on the coordinator role and commits the transaction together with the participants that own the data.
At this point a two-phase commit protocol runs between the coordinator and the participants. A brief description follows, with a code sketch after the steps.
The first phase:
1.1 The coordinator sends a prepare message to each participant, containing the validation information and the installation information
1.2 If validation succeeds, the participant logs the installation information to disk and replies ok
1.3 Once the coordinator has received ok from all participants, it logs a commit record to its local disk and replies ok to the client
The second phase runs asynchronously:
2.1 The coordinator sends a commit message to each participant
2.2 Each participant overwrites the old values with the new values from the installation information, logs a commit record locally, and replies ok to the coordinator
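To make the flow concrete, here is a minimal Go sketch of the coordinator side. The types and function names (PrepareArgs, Participant, commitTransaction) are my own illustrative assumptions, not Thor's actual interface; the sketch only mirrors the protocol shape described above.

```go
// Sketch of the commit flow described above (coordinator side).
// All names are illustrative assumptions, not Thor's API.
package occ

type ObjectID int

// PrepareArgs carries the validation information (read/write sets) and the
// installation information (new values) for transaction T.
type PrepareArgs struct {
	TxID      int
	ReadSet   []ObjectID
	WriteSet  []ObjectID
	NewValues map[ObjectID][]byte // installation information
}

// Participant is a server owning some of the objects touched by T.
type Participant interface {
	Prepare(args PrepareArgs) bool // phase 1: validate, log installation info, reply ok/no
	Commit(txID int)               // phase 2: install new values, log commit
}

// commitTransaction runs the two phases. logCommit persists the coordinator's
// commit decision before the client is told the transaction committed.
func commitTransaction(parts []Participant, args PrepareArgs, logCommit func(txID int)) bool {
	// Phase 1: every participant must validate and answer ok.
	for _, p := range parts {
		if !p.Prepare(args) {
			return false // validation failed somewhere: abort
		}
	}
	logCommit(args.TxID) // commit point; the client can now be told "ok"
	// Phase 2 is asynchronous: participants install the new values.
	for _, p := range parts {
		go p.Commit(args.TxID)
	}
	return true
}
```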
When a server commits a transaction it needs to send invalidation messages to the other clients that cache the affected data. How does it find those clients?
The server keeps a cached set for each client, recording which objects that client caches. Invalidation need not be precise or prompt, but the following must hold:
- If the client receives an invalidation message and the current transaction has not read the stale data, it simply evicts that data from its local cache
- If the current transaction has already read the stale data, the transaction is aborted immediately
When the client has processed the invalidation messages it replies to the server, and the server removes those objects from the client's invalid set.
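As a small sketch of the client-side handling just described (the Client type and method name are my own, not the paper's):

```go
// Sketch of a client handling an invalidation message under the rules above.
// Names are illustrative assumptions.
package occ

type ObjectID int

type Client struct {
	cache   map[ObjectID][]byte // locally cached objects
	readSet map[ObjectID]bool   // objects read by the current transaction
	aborted bool                // whether the current transaction was aborted
}

// handleInvalidation processes an invalidation for object x and returns the
// object to acknowledge, so the server can remove it from this client's invalid set.
func (c *Client) handleInvalidation(x ObjectID) (ackedObject ObjectID) {
	if c.readSet[x] {
		// The current transaction already read the stale value: abort it.
		c.aborted = true
	}
	// Either way, drop the stale copy from the local cache.
	delete(c.cache, x)
	return x
}
```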
Efficient validation rules
The algorithm guarantees two kinds of consistency:
- Serializability: all committed transactions can be placed in some serial order, and the results are the same as if they had executed one at a time in that order
- External consistency: a later transaction must see the modifications of every transaction that committed before it
Validation occurs when a transaction request is submitted. There are two types of validation:
- Forward validation: check for conflicts against all ongoing transactions
- Backward validation: check for conflicts against all already-validated transactions
We derive the validation rules through a series of examples.
Example 1
Initialization: x=0 y=0 z=0
T1: Rx0 Wx1
T2: Rz0 Wz9
T3: Ry1 Rx1
T4: Rx0 Wy1
During validation, we must find an order for the four transactions above that makes every read and write consistent.
A possible sequence is: T4, T1, T3, T2
Note that in this example transactions see uncommitted data: all four transactions run in parallel and none has committed at validation time, yet they observe each other's writes.
For performance we want a distributed validation rule, since x, y, and z may live on different machines. Hence the following example:
Example 2
Initialization: x=0 y=0 z=0
T1: Rx0 Wx1
T2: Rx0 Wy1
T3: Ry0 Rx1
S1 validates only the information about x:
T1: Rx0 Wx1
T2: Rx0
T3: Rx1
The order T2, T1, T3 works, so S1 answers yes.
S2 only validates y:
T2: Wy1
T3: Ry0
The order T3, T2 works, so S2 also answers yes. Yet taken together these transactions cannot be serialized: S1's order requires T2 before T1 before T3, while S2's order requires T3 before T2, which is a cycle. What went wrong?
Validation must be in a consistent order
The requirement: at commit time the transaction is assigned a timestamp read from a local clock, and every server validates transactions in timestamp order, so all servers use the same order.
So if we validate in timestamp order, what problems remain?
Example 3
T1@100: Rx0 Wx1
T2@50: Rx1 Wx2
T2 reads x=1 and writes x=2. Validating strictly in timestamp order, T2 (ts 50) must come before T1 (ts 100), yet T2 reads the value T1 wrote, so validation fails. But this situation may simply be due to clock skew between the machines that assigned the timestamps; insisting on exact timestamp order causes unnecessary aborts.
So far each commit request has carried the original values of the data it read (e.g. Rx0 means "read x and saw 0"). Those values may be large, which is wasteful. The optimization: give each object a version number and use it to check whether a transaction read the most recently written data. A natural choice for the version number is the timestamp of the transaction that wrote the value.
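A tiny sketch of this optimization, using the writing transaction's timestamp as the object's version (the names and the int64 timestamp representation are my own assumptions):

```go
// Sketch: per-object version numbers instead of shipping the values that were read.
package occ

type ObjectID int

// Versioned pairs an object's value with its version: the timestamp of the
// transaction that wrote it.
type Versioned struct {
	Value   []byte
	Version int64
}

// readIsCurrent reports whether a transaction that read version readVersion
// of object x saw the most recently installed value.
func readIsCurrent(store map[ObjectID]Versioned, x ObjectID, readVersion int64) bool {
	return store[x].Version == readVersion
}
```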
Example 4
Initial values: x=y=0, both at ts 0; x and y are on S1 and S2 respectively
T1@100: Rx@0 Wx
T2@110: Rx@0 Wy
T3@105: Rx@100
S1 only validates the information of x, sorted by ts:
T1@100: Rx@0 Wx
T3@105: Rx@100
T2@110: Rx@0
Here T2 did not read the latest x: in timestamp order it should have read the version written at ts 100, not the version at ts 0, so validation fails.
Here we use version numbers, rather than the values themselves, to decide whether the data read is up to date. What is the drawback?
The versions may differ even though the values are the same:
T1@100: Wx1
T2@110: Wx2
T3@120: Wx1
T4@130: Rx1@100
Judging by versions, T4 must be aborted: it should have read the version written by T3 (ts 120) rather than T1's version (ts 100). But T3 and T1 wrote the same value, so T4 actually read the correct value x=1, and the abort is unnecessary.
With these examples behind us, let's look at some specific issues.
The global order
The global order is obtained from the timestamp on each machine, but machine clocks are not perfectly synchronized, which introduces some skew. Google [Spanner][2] equips its data centers with atomic clocks and GPS receivers to bound this uncertainty and solve the timestamp-ordering problem for distributed transactions. The algorithm in this paper simply assumes that such clock skew is bounded and small.
When the coordinator receives the commit request, it reads its local clock and assigns the resulting timestamp to the transaction. The prepare message the coordinator sends to each participant contains:
- T.ts: the timestamp of transaction T
- T.ReadSet: the IDs of the objects T read
- T.WriteSet: the IDs of the objects T wrote
- the ID of the client running T
Here T.ts = <local clock time, server ID>; appending the server ID makes timestamps globally unique.
Each server keeps a validation queue (VQ) that records the transactions it has validated.
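As a sketch, the timestamp and a VQ entry might look like this in Go. The field names are my own; the <local time, server ID> pair follows the uniqueness trick mentioned above.

```go
// Sketch of per-transaction metadata a server might keep in its validation
// queue (VQ). Names are illustrative assumptions.
package occ

import "time"

type ObjectID int

// Timestamp is the pair <local clock reading, server ID>; appending the server
// ID keeps timestamps globally unique even when two clocks read the same value.
type Timestamp struct {
	Wall     time.Time
	ServerID int
}

// Less compares timestamps lexicographically: clock reading first, then server ID.
func (a Timestamp) Less(b Timestamp) bool {
	if !a.Wall.Equal(b.Wall) {
		return a.Wall.Before(b.Wall)
	}
	return a.ServerID < b.ServerID
}

// VQEntry records one validated transaction.
type VQEntry struct {
	TS        Timestamp
	ReadSet   map[ObjectID]bool
	WriteSet  map[ObjectID]bool
	Committed bool // still false while phase 2 has not run on this server
	ClientID  int
}
```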
Checks against later transactions
Consider the following scenario: S is a transaction that has already been validated, and T is the transaction being validated. Depending on the timestamp order of T and S, different rules apply. First consider the case where S's timestamp is later than T's. Why would S have a later timestamp yet have been validated first? Most likely because clocks on different machines are out of sync. In this case the rule is:
For each validated transaction S with a timestamp greater than T's, we check that T did not read any data S modified and did not modify any data S read. This is called the later-conflict check.
Checks against earlier transactions
For a transaction S that has already been validated and whose timestamp is earlier than T's, consider two cases:
- S reads x and T modifies x: nothing needs to be checked, since S precedes T in the serial order and it is fine for T to overwrite x afterwards
- S modifies x and T reads x: we must make sure T read the value written by S. There are two sub-cases: if S has not committed yet, we abort T, because T must not read uncommitted data; if S has already committed, it depends on which version of x T read
Version check: the usual way to implement a version check is to associate a version with each object, for instance the timestamp of the transaction that committed the write, which is monotonically increasing; but that wastes space. This paper instead provides a method called the current-version check:
check that T read the current (latest) value of x.
How is this done, and why is the current-version check consistent with the version check? Suppose T read x and has already passed the later-conflict check; then no transaction validated with a timestamp later than T's has updated x. If T also read the value produced by the most recent earlier write to x, then the x that T read is indeed the current, latest version.
As mentioned earlier, the server keeps an invalid set for each client. So here we only need to check whether any x read by T is in that client's invalid set:
if x is not in the client's invalid set, the value T read is the current one.
An example of using invalid sets:
Client C2:
T2 ... Rx ... client/TC about to send out prepare msgs
Server:
T1 wrote x, validated, committed
x in C2's invalid set on S
server has sent invalidation message to C2
server is waiting for ack from C2
Three cases:
- The invalidation message reaches C2 before C2 sends the prepare message: C2 aborts T2
- The invalidation message arrives after C2 has sent the prepare message, while C2 is waiting for the yes/no reply: C2 aborts T2
- The invalidation message is lost or delayed, so C2 does not know to abort T2: in that case S has not received the ack from C2, x is still in C2's invalid set on S, and S replies no to T2's prepare message
From the analysis above: if x is not in the invalid set when the current-version check runs, the client must already have received (and acknowledged) the invalidation for x. If the value of x that T read was nevertheless stale, we must be in case 2 above: the invalidation reached the client after it sent the prepare message, and the client aborts the transaction itself even if the server replies ok.
The server needs to keep some data in memory:
- cached sets
- invalid sets
- VQ
The first two are small, but the VQ grows without bound unless it is cleaned up. The next section shows how to truncate the VQ.
Truncation
The VQ stores all validated transactions; if we never clean it up it grows indefinitely. What can we remove?
- Committed transactions: the invalid sets already record the effects of committed transactions, so these entries can be removed
- Read-only transactions
For a read-only transaction that has been removed, how do we know what data it read? We maintain a threshold timestamp that is greater than or equal to the timestamps of all transactions removed from the VQ.
We maintain the following invariant throughout:
the VQ contains all validated but uncommitted read-write transactions, plus all validated transactions whose timestamp is greater than the threshold.
Hence the threshold check: a transaction whose timestamp is below the threshold fails validation, because the server no longer has enough information to perform the later-conflict check for it. For transactions that pass the threshold check, the information remaining in the VQ (plus the invalid sets) is enough to perform the earlier checks.
Given this threshold, how should it be set? If it is too high, too many transactions fail the threshold check; if it is too low, the VQ grows too long.
Assume the network delay of a message is msg_delay and the clock skew between machines is skew. When a prepare message travels from coordinator to participant, it is delayed by at most msg_delay and its timestamp may be off by at most skew. So if the participant's local time is t1, any prepare message still in flight must carry a timestamp t > t1 - (msg_delay + skew). The quantity msg_delay + skew is called the threshold interval.
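A minimal sketch of computing a safe threshold from this interval; the delay and skew bounds below are assumed example values, not numbers from the paper:

```go
// Sketch: deriving a conservative threshold from the threshold interval.
package occ

import "time"

const (
	maxMsgDelay = 50 * time.Millisecond // assumed bound on message delay
	maxSkew     = 10 * time.Millisecond // assumed bound on clock skew
)

// safeThreshold returns a time such that any prepare message still in flight
// must carry a later timestamp, so VQ entries below it can be truncated
// (subject to the invariant about uncommitted read-write transactions).
func safeThreshold(now time.Time) time.Time {
	return now.Add(-(maxMsgDelay + maxSkew))
}
```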
At this point, let's summarize all the validation rules so far (a code sketch follows the list):
- Threshold check: if T's timestamp is below the threshold, validation fails
- Checks against earlier transactions: for every uncommitted transaction S in the VQ with a timestamp less than T's, fail if T read data that S wrote
- Current-version check: for every object x that T read, fail if x is in the client's invalid set
- Checks against later transactions: for every transaction S in the VQ with a timestamp greater than T's, fail if T read data S modified or T wrote data S read
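Putting the four rules together, here is a Go sketch of what a server's validation might look like. It is only an illustration of the checks listed above under my own assumed data structures (Server, validate, invalidSets), not Thor's actual implementation:

```go
// Sketch of the combined validation rules: threshold, earlier, current-version, later.
package occ

import "time"

type ObjectID int

// Timestamp: <local clock reading, server ID>, compared lexicographically.
type Timestamp struct {
	Wall     time.Time
	ServerID int
}

func (a Timestamp) Less(b Timestamp) bool {
	if !a.Wall.Equal(b.Wall) {
		return a.Wall.Before(b.Wall)
	}
	return a.ServerID < b.ServerID
}

// Tx is the validation information carried by a prepare message.
type Tx struct {
	TS        Timestamp
	ClientID  int
	ReadSet   map[ObjectID]bool
	WriteSet  map[ObjectID]bool
	Committed bool
}

// Server holds the in-memory state used for validation.
type Server struct {
	threshold   Timestamp
	vq          []*Tx                     // validated transactions (the VQ)
	invalidSets map[int]map[ObjectID]bool // per-client invalid sets
}

func intersects(a, b map[ObjectID]bool) bool {
	for x := range a {
		if b[x] {
			return true
		}
	}
	return false
}

// validate applies the checks in order: threshold, earlier, current-version, later.
func (s *Server) validate(t *Tx) bool {
	// 1. Threshold check: timestamps below the threshold cannot be validated safely.
	if t.TS.Less(s.threshold) {
		return false
	}
	for _, u := range s.vq {
		if u.TS.Less(t.TS) {
			// 2. Earlier transactions: T must not have read data written by an
			// uncommitted earlier transaction.
			if !u.Committed && intersects(t.ReadSet, u.WriteSet) {
				return false
			}
		} else {
			// 4. Later transactions: T must not read what a later transaction wrote,
			// nor write what a later transaction read.
			if intersects(t.ReadSet, u.WriteSet) || intersects(t.WriteSet, u.ReadSet) {
				return false
			}
		}
	}
	// 3. Current-version check: nothing T read may be stale, i.e. may appear
	// in T's client's invalid set.
	for x := range t.ReadSet {
		if s.invalidSets[t.ClientID][x] {
			return false
		}
	}
	s.vq = append(s.vq, t)
	return true
}
```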
Recovery from a crash
When a server recovers from a crash, we must ensure that transactions validated before the crash still hold and that transactions validated after recovery still satisfy the validation rules. A natural idea is to log the VQ and the threshold to disk.
For read-write transactions, the installation information is already logged at prepare time, so the relevant VQ information can be recovered from the log. Read-only transactions have nothing to install, and logging their VQ entries just for recovery would add cost, so we simply do not log them.
Since read-only transactions are not logged, their information is lost after a crash. But if on recovery we set the threshold above the timestamp of the last transaction the server validated before the crash, losing the read-only information no longer matters.
The cached sets are not persisted either. Instead the server records which clients (their addresses) cache its data; after recovery it contacts those clients to rebuild the cached sets.
Finally, the invalid sets are rebuilt from the logged installation information and the cached sets, but missing client acks could leave unnecessary entries. How is that solved? When a committing transaction triggers invalidation messages, the server generates an invalidation number, stores it with the commit log record, and includes it in the invalidation message; invalidation numbers increase monotonically. The client remembers the latest invalidation number it has received, and during recovery it sends that number back along with its cached set. The server can then rebuild the correct invalid set based on the invalidation number.
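A sketch of rebuilding one client's invalid set with invalidation numbers, as described above; the structure and function names are my own assumptions:

```go
// Sketch: rebuilding a client's invalid set after a crash using invalidation numbers.
package occ

type ObjectID int

// InstallRecord is what the commit log keeps for a committed write, together
// with the invalidation number generated for its invalidation message.
type InstallRecord struct {
	InvalNum int
	Objects  []ObjectID
}

// rebuildInvalidSet recomputes one client's invalid set: only invalidations the
// client has not yet seen (numbers above lastReceived, as reported by the
// client) and that touch objects in the client's cached set are still pending.
func rebuildInvalidSet(log []InstallRecord, cachedSet map[ObjectID]bool, lastReceived int) map[ObjectID]bool {
	invalid := make(map[ObjectID]bool)
	for _, rec := range log {
		if rec.InvalNum <= lastReceived {
			continue // the client already received and handled this invalidation
		}
		for _, x := range rec.Objects {
			if cachedSet[x] {
				invalid[x] = true
			}
		}
	}
	return invalid
}
```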
Experiments
Omitted.
Conclusion
- Caching reduces data fetches between client and server, so it is faster
- Distributed OCC avoids client/server lock traffic and contention
- Loosely synchronized clocks help servers agree on the validation order
- Distributed OCC is still an active research area 20 years later
This is lecture 10 of 6.824: Distributed Systems. Your encouragement motivates me to keep writing, and I look forward to making progress together.