Understanding database transactions from 0 to 1 (part 2) : Isolation level implementation - MVCC and locks

This is the second article, Shared database transactions on an interpretation of the database transaction concurrency will produce problems, this article will speak database in detail how to avoid these problems, which is how to realize the isolation, mainly about two kinds of mainstream technology solutions – MVCC and lock, understand the MVCC and lock, can extrapolate to see various database concurrency control scheme, Understand the problems each implementation solves and the concurrency issues that require the developer’s own attention to better support business development.

To review the issues discussed in the previous article, there is no isolation or the isolation level is insufficient:

Dirty Write
Dirty Read
Unrepeatable Read
Phantom
Read bias (Read Skew)
Write Skew
Lost Updates

As you can see, all problems are fundamentally caused by writing, and the root cause is data changes. Reading is not change the data, so no matter how many concurrent read, will not conflict, if all the transaction only consists of reading, so no matter how to dispatch them, they are serializable, because their execution as a result, are carried out with a serial of the result is the same, but writing can cause the change of the data, a little careless, The result of this concurrent scheduling will be inconsistent with the result of serial scheduling.

In the following discussion, we first define our model for describing transactions: We use R for read, W for Write, followed by A number for which transaction is operating, followed by parentheses for which element is operating R1(A) for transaction 1 to read element A, R2(A) for transaction 2 to read element A

Take a look at how write operations can cause a mismatch between concurrent scheduling and serial execution:

1. Serialization based on lock

(This section takes MySQL InnoDB as the basic model)

1. Read lock and write lock

The cornerstone of serializability is to control conflicts and ensure the serialization of conflicting operations forcibly, so the following principles should be followed:

Read – write should line up
Write. – Write in line

Read cannot write, write cannot read and write, but read can read, because read does not conflict, so the database needs two locks:

In general concurrent programming, we usually add an exclusive lock to a resource. To obtain a lock, the resource must be in the unlocked state. If someone has already locked the resource, they need to wait for the lock to be released. This kind of lock can ensure that a resource can be processed in sequence concurrently and realize the purpose of queuing.
Share lockAlso known as S lock, this is a looser lock than exclusive lock. When a resource is not locked or is currently locked as a shared lock, a shared lock can be added to it. That is, an unlimited number of shared locks can be added to a resource at the same time. At this time, the resource has been locked. Although the shared lock can be added, the exclusive lock cannot be added. The exclusive lock can be obtained only after the resource lock is released. The purpose of shared locks is to increase the concurrency of non-conflicting operations and ensure that conflicting operations are queued.

What is the relationship between these two types of locks and read and write? Both reads and writes are locked, but read-reads can be concurrent, while writes need to queue up any operation, so:

A shared lock (S lock) that obtains records is only allowed to be read by transactions. Simply speaking, a shared lock is only a read lock. After a record is read, other records can also be read.
An exclusive lock (X lock) of a record allows the transaction to update it. An exclusive lock allows a transaction to read and write, and is a universal read-write lock. After an exclusive lock is added to a record, other transactions, whether they want to add an exclusive lock or a shared lock, need to queue for the release of the current exclusive lock. In MySQL, you can use SELECT FOR UPDATE to specify an exclusive lock FOR a record. In MySQL, you can use SELECT FOR UPDATE to specify an exclusive lock FOR a record.

Through read and write operations lock, has realized the read and write, write about the end of the line, but by simple locking guarantee queue, queue size is too small, but is simply the operation and operation between the end of the line, not enough to solve the problem of the cannot be serialized in the figure above, because if the transaction 1 read released immediately after A lock, the transaction 2 can immediately get A write lock, change the value of A, The non-serialization problem above still occurs, so transactions need to ensure a larger granularity of queuing — if a record is read or written by a transaction, it cannot be modified by other transactions until the transaction commits, thus Strict two-phase Locking is born.

2. Strict two-phase Locking

First of all, what is the two-stage locking protocol (2PL)? It stipulates that the locking and unlocking of transactions are divided into two independent stages. In the locking stage, only the locking can be added but not unlocked. Strict two-phase locking (S2PL) specifies that a transaction can only be unlocked after commit or rollback. Therefore, S2PL guarantees that records read or written by a transaction will not be unlocked before commit or ROLLBACK, and therefore cannot be locked by other records. No record changes are made, thus serializability is achieved.

3. Multi-granularity locking and Intention Lock

InnoDB not only supports row-level locks, but also table-level locks. In order to be compatible with multi-granularity locks, a special Lock – Intention Lock is designed, which does not have the Lock function itself, but only takes the “indication” function. To add a table lock, ensure that the row-level lock is fully released and the entire table does not have any locks. So, how do you know if every record in the entire table has been released? It would be inefficient to iterate through the locking status of each record, so an intent lock is needed, which is simply an indicator telling the database if and what locks are being added below this granularity. For example, parking lots will post a sign at the door indicating whether the parking space is “full” or “vacant”, so there is no need to drive to check parking Spaces one by one, which improves efficiency. If InnoDB wants to lock a record, it needs to add an intent lock to the table before it can add a normal lock to the record. If InnoDB fails to acquire the intent lock, it cannot continue to acquire the lock.

Intent locks are fully compatible with each other

In general, we do not add table locks to transactions, because table locks are inefficient, we usually add row-level locks, row-level locks are placed on a particular record, we call it record locks. The main content of this section is to give an idea of multi-granularity locking. In reality, table locking is rarely used. The above shared Lock, exclusive Lock is defined in accordance with the Lock compatibility, table Lock, Record Lock (Record Lock) is defined by the Lock range, according to the Lock range is different, there are other N locks, some will be mentioned below.

4. Avoid Phantom reading

Gap Lock

Consider an example: SELECT name FROM students WHERE age = 18; INSERT INTO students VALUES(” 四”, 18); Transaction 1 executes the same SELECT statement again, and finds that the result is “triple” + “triple”. This is a phantom read. The same transaction does two reads of the same condition, but it has not read the record before. Although a record locks can achieve concurrency control to existing records, which is for update, delete, never has a concurrency issues, but can’t make a concurrency control to insert, because the insert is for there is no record, and also there is no record, we can’t add records for its lock, therefore may produce phantom read phenomenon. To solve this problem, there is a gap lock, which is also applied to a record, but it does not lock the record itself, it only locks the gap between this record and its previous record, preventing insertion. As shown in the figure below, if a table has three records with primary keys 1, 2, and 5, a gap lock on 5 will only lock the open interval (2,5), but not the record itself.

Insert Intention Lock

Next-Key Lock

SELECT * FROM students WHERE id >= 1; SELECT * FROM students WHERE id >= 1; SELECT * FROM students WHERE ID >= 1;

Clearance Lock and record Lock are two types of Lock structures, so they cannot be combined. If three clearance locks are added separately, and four records are added separately, seven Lock records will be generated, which occupies a lot of memory. Therefore, MySQL has a kind of Lock called next-key Lock, if a next-key Lock is added on the red record, (1,3) will Lock the front open and then closed interval, that is, to Lock the Record itself + Record before the Gap, it can be found that, next-key Lock is actually Gap Lock + Record Lock. At this point, the Lock structure can be simplified into the record with ID 1 plus the record Lock + three successive next-key locks. Since the next-key Lock type is the same and continuous, they can be put into the same Lock record. Finally, only the record Lock with ID 1 + one next-key Lock. Next-key Lock is nothing special, just a simplification of Record Lock + Gap Lock.

5. The solution to the concurrent problem

1. Dirty Write

Scheme: Transaction write records must acquire exclusive locks

Principle: An exclusive lock is acquired before a transaction is written to a record, and due to strict two-phase locking, the lock is not released until the transaction commits, so dirty writes are completely avoided.

2. Dirty Read

Scheme: Transaction write records must acquire exclusive locks

How it works: After an exclusive lock is placed on a record, no further locks are allowed, so no transaction can read data written by another transaction that has not yet committed.

3. Unrepeatable Read

Solution: Transaction read records must be locked (S or X locks are acceptable)

How it works: Because the record is already locked while a transaction is reading it, no other transaction can lock it exclusively, so it cannot be modified at all, and no unrepeatable reads occur.

4. Phantom

Solution: gap lock

How it works: Gap locks block insertions, so there are no phantom problems.

5. Read Skew

Reading bias needs to be explained a little more, using the example from the previous article: For example, the balance of both accounts X and Y is 50, and their sum is 100. Transaction A reads balance of X with 50, and then transaction B transfers 50 from X to Y and commits. Transaction A reads balance of Y after B commits, and their sum becomes 150. Can be found, read the deviation is due to the consistency of business is made up of multiple records of total state guarantee, the transaction is A open and read one part of the record, the transaction record of haven’t read A B modified and submitted the B, database has entered A new consistent state at this time, but after A in B submitted to read the part of the record, read the revised data, B Although the database is still in the consistent state, USER A finds that the total status of multiple records is inconsistent with service consistency, resulting in read deviation. The nature of the read bias is that transaction A has some old data and some new data, and the overall state is inconsistent.

Solution: Read data must acquire lock, write data must add exclusive lock

Principle: because the transaction was reading records have been added to the lock, so any transaction can no longer access to exclusive lock, also will not be able to update the data, this article has been read for transaction nature impossible “stale data” say, any data is read, can’t be modified before it to submit, so are the latest data is read.

6. Write Skew

With a detailed when it comes to writing deviation, there is not much said, it is the same as reading deviation nature, because read a certain part of the data become obsolete data, write a deviation to use old data as the premise, so make the wrong judgment, inconsistent results in the business, thus solving the problem of writing deviation stale data need to be solved.

Solution: Read data must acquire lock, write data must add exclusive lock

Principle: It is the same as the solution to write bias, because locking forces the data read by the transaction to be modified and prevents stale data from appearing.

7. Lost Updates

The lost update was also mentioned in the last article, which probably means that transaction A reads X first and calculates X before writing X, but before writing X, transaction B has modified the value of X and committed it, while A, unaware of this, writes the X value that it thinks is correct, overwriting the value of transaction B, which is A lost update. The essence of missing updates is to make modification decisions based on stale data, but stale records and modified records are the same record, which is the only difference from write bias.

Solution: Read data must acquire lock, write data must add exclusive lock

Principle: It is exactly the same principle as avoiding reading and writing bias to avoid records becoming obsolete.

Visible, the InnoDB serializable isolation level, based on the lock, and avoid all the concurrency issues, is the most secure transaction isolation level, but not in the business development every concurrency issues we may encounter, due to the uniqueness of the business, may face some concurrency issues or you can use other ways to avoid these concurrent damage problem of business, In order to avoid all concurrency problems, using locks is obviously a low cost option. Sometimes you can allow some concurrency problems to reduce the use of locks and improve concurrency efficiency. MVCC is a good alternative.

2. Lock replacement – use MVCC to improve concurrency

Serializable while guarantee the absolute security affairs, but the degree is low, many operations need to queue for, in order to improve the efficiency of the SQL standard for compromise on the isolation level, thus has the repeatable read isolation level, read submitted, they are all allowed some concurrency issues, here first repeatable read isolation level. In SQL standard, repeatable read only needs to completely avoid the three exceptions of dirty write, dirty read and unrepeatable read. At this time, if the lock is used again, read-write queuing is too inefficient, so MVCC was born. Concurrency Control is a multi-version Concurrency Control. To put it simply, it creates a snapshot for each transaction. Each transaction is guaranteed to read only its own snapshot data, no matter how many other transactions are updating the same record. The data read by this transaction will not change, that is, multiple versions of a record will be retained. Multiple transactions will read different versions of a record. MVCC replaces read locks and implements read-write non-blocking. The purpose of MVCC is to replace read lock, write is still locked, so as to avoid dirty write. The following is the implementation of MVCC, to understand how MVCC avoids concurrency problems, and finally discuss the limitations of MVCC in concurrency.

1. Implementation principle of MVCC

Version chain (Undo Log)

In MVCC, there are multiple versions of each record, which form a version chain. That is, when a record is updated, it is not In Place, but copied and modified to the version chain. When it is DELET, it is not immediately deleted from the file, but marked as deleted. It’s also part of the version chain. In InnoDB there are 2 hidden columns per record, one is trx_id and one is roll_pointer.

Trx_id indicates the transaction by which the record version was created. The database has a global transaction ID allocator, which must be incremental. The new transaction ID must not duplicate the old one.
Roll_pointer is the pointer to connect the version chain.

Read View

The most commonly heard concept in MVCC is snapshot, which is the end result, not the implementation. Snapshot = version chain + Read View. Instead of freezing a snapshot of all the records in the table for the transaction, MVCC generates a data structure called Read View when the transaction executes the first statement. Note that the Read View is generated only when the transaction executes. Simply executing start Transaction will not generate a Read View. The Read View holds the following information:

When a transaction reads a record, it determines which version of the record is visible to the transaction based on the transaction’s Read View:

If the trx_id of the record is the same as creator_trx_id, it means that the version was created by the transaction and can be read.
If the trx_id of the record is less than min_trx_id, the version was created before the transaction generated the Read View and can be Read.
If the trx_id of the record is greater than or equal to max_trx_id, it indicates that the version was created by a transaction started after the transaction generated the Read View and must not be Read.
If the trx_id of the record is between min_trx_id and max_trx_id, then whether trx_id is in m_IDS or not, then this version was committed when the transaction generated the Read View and can be Read.

With version chains and Read Views, even if other transactions modify the record, the transaction that is the Read View will not Read it. As long as the Read View does not change, the Read version must be the same every time. The repeatable Read and Read commit levels in MySQL are based on MVCC. The only difference is when a Read View is generated. The repeatable Read level generates a Read View when the transaction executes the first SQL, while the Read commit level regenerates a Read View when the transaction executes each SQL.

2. Limitations of MVCC

MVCC replaces the read lock, and while it has the advantage of not blocking writes, it does not prevent all concurrency problems.

1. Can MVCC avoid phantoms

A transaction cannot Read the version of the record generated by another transaction after the Read View is generated, so it can be inserted without a gap lock without reading another transaction. Conclusion: MVCC does not avoid phantom reading. The root cause of this problem is that InnoDB treats Update, Insert, and Delete as special operations. Special operations perform Current Read on records, i.e. Read the latest record. In other words, Read View only works on SELECT statements. If some of the users table with id 1, 2, 3, A total of 3 records, issues A read, transaction inserting A record and submit B, transaction UPDATE is inserted into A record can be successful, because the UPDATE is for the current read, can read id of 4 UPDATE records exist, therefore can be updated successfully, the transaction after the record of A successful UPDATE id of 4, Transaction A will add A new version of transaction A to the record version chain with id 4, so that transaction A can Read the record again, which is in accordance with the Read View rule, but produces A phantom Read.

2. Read Skew and Write Skew cannot be avoided

In MVCC, Read and Write data do not block. Therefore, the snapshot Read by transactions may be expired and the Read data may become stale. Therefore, Read Skew and Write Skew may occur.

3. You can’t avoid losing updates

R1(A) => R2(A) => W2(A) => W1(A) In Postgrel, the Repeatable Read level already avoids missing updates because it uses MVCC+ optimistic lock. If transaction 1 writes to A and the storage engine detects that the value of A has been changed by another transaction since transaction 1 started, it will report an error and prevent transaction 1 from writing to A. MVCC alone does not prevent lost updates; other mechanisms are needed.

Third, better practice of business

In business development, we should first understand the transaction isolation level of the database used by the project and its principle and performance, and then consider a better coding method according to the transaction implementation principle.

1. Avoid deadlocks

Different statement orders cause deadlocks

This situation must be familiar to you:

Try to operate in the same sequence of record statements

Different index orders cause deadlocks

Select * from users where id,user_name,password; select * from users where id,user_name,password; select * from users where id,user_name,password; There is a Unique Index on user_name with the following statement:

UPDATE users SET user_name = ‘[email protected]’ WHERE id = 1;

Mysql > select * from InnoDB where user_name = ‘id’ and user_name = ‘primary key’; mysql > select * from InnoDB where user_name = ‘primary key’; mysql > select * from InnoDB where user_name = ‘primary key’

Add IX locks to the table
Primary key plus X lock
Add X lock to index user_name

If another transaction executes the following statement:

UPDATE users SET password = ‘123’ WHERE user_name = ‘[email protected]’;

Deadlocks can occur. You can think about why. This statement is locked in the following order:

Find the index where user_name is ‘[email protected]’ and add X lock
Add IX lock to table
Primary key plus X lock

If INNODB is locked in the same primary key index and the same secondary index, but the order of the lock is different, it may cause deadlock.

2. Avoid unnecessary tasks

A Serializable transaction is not required in many business scenarios, such as receiving coupons. You do not need to open a Serializable transaction to SELECT the remaining number of coupons, determine whether there is a margin, and then UPDATE the coupon.

UPDATE coupons SET balance = balance – 1 WHERE id = 1 and balance >= 1;

After the statement returns, the number of updated rows is judged. If the number of updated rows is 1, the collection is successful; if the number of updated rows is 0, the collection fails because there are no records that meet the conditions. (Note: this is only considered for the coupon scenario, if the business also needs to write the coupon to the Users table and a series of other operations, it needs to put the transaction according to the business needs.)

3. Avoid putting unnecessary SELECT into transactions

What does it mean to put a SELECT into a transaction?

To read the transaction’s own version, you must put SELECT into the transaction
You need to rely on the results of the SELECT as a prerequisite for other statements, and you must not only put the SELECT into a transaction, but also ensure that the transaction is Serializable

For example, if the product_name of the next product is only used to write to the Orders table, there is no need to place the product_name in the transaction. Product_name written to order is old data 1 second ago and is acceptable.

4. Don’t be superstitious

Many developers mistakenly assume that it is safe to place a SELECT into a transaction, use the result as a judgment condition, or write condition. This may not be true depending on the isolation level. For example:

SELECT users from the SELECT users table. SELECT users from the SELECT users table. SELECT users from the SELECT users table
UPDATE the calculated score to the user_scores table

Placing these two statements in a transaction may not be safe, depending on the implementation of the transaction. If the transaction is Repeatable Read by InnoDB, then the transaction is not safe, because the SELECT reads the snapshot, and other transactions may have changed the user’s level before the UPDATE. He may not satisfy the triple score condition and then UPDATE the user_scores table, which is a business insecure transaction. Therefore, it is important to understand transactions before you use them, otherwise it is easy to use them incorrectly.

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

Understanding database transactions from 0 to 1 (part 2) : Isolation level implementation — MVCC and locks