Note that this article is based on the understanding of two types of reads and row locks in mysql. If you are not familiar with the concepts of none-locking reads, gap lock or insert intention, please read the previous article first.

This paper mainly focuses on the following contents: MVCC parallel to LOCKING, ISOLATION level stipulated by SQL92 and implementation of InnoDB, and the influence of different isolation level of InnoDB on two kinds of reads.

MVCC

What is a MVCC

In the last article we talked about locking, which is a common way to ensure data consistency in concurrent scenarios. This time to talk about MVCC can be said to be the same level of lock design. MVCC stands for Multi-version Concurrency Control, but it’s commonly referred to as MVCC because of its long name. MVCC is essentially a strategy for managing concurrency through a multi-version mechanism. Because locking can limit efficiency, MVCC is often used as a supplement to locking to manage consistency between read and write concurrent requests. From here, we can see that consistency management in concurrent environment has a design idea of MVCC.

Almost all major databases implement MVCC. The implementation schemes and details of MVCC vary from database to database, but the overall idea is based on version read and write isolation: a TRX write operation will cause version changes, while keeping the old data version for other TRX to read.

Innodb implementation of MVCC

How do I represent version Verison?

The first step in implementing MVCC is to need some “tools” to record the version, preferably to help determine which version of the data to read. In innodb’s design, transactionId transactionId is the core tool for recording verison. Transactionids are incrementally added in InnoDB, large transactions start after small transactions (note that the trxId is not simply a time to automatically start the next TRX, which is related to the time the SELECT statement is executed). Innodb adds a hidden DB_TRX_ID field to each row to indicate the last Transaction attempted to modify that row. This field is called DB_TRX_ID. This field helps the SELECT statement determine whether the current row data is readable and the version it can see.

Where does the historical data exist?

Where does the historical version exist? There are two design directions for this: one is to still exist in place, just marked as DELETE, and recreate a currently valid data; The other direction is to change the data in place, change the version number of the data (trx_id), and put the historical data somewhere else.

Innodb uses the second method for row data (postgresQL uses the first method) : Each innoDB row has a DB_ROLL_PTR pointer that points to the previous version of the data in the rollback segment. The rollback segment is a part of undolog that records historical versions of the data. It is called rollback and undo because another key function of this segment is for rolling back transactions.

Innodb divides DML into two types: update and DELETE for existing records, and INSERT for new data. This setup is consistent with the two types of locks we discussed in the previous article: records and voids. The undo log is divided into two parts, update and INSERT. Innodb assumes that the rollback information will be cleared immediately after the INSERT operation is committed. The update class information needs to be kept until all MVCC supported consistent reads no longer require it before the Purge (purge is the innoDB background thread’s gc operation on useless data in Undolog).

Updated in-place & historical reconstruction

Update (update+delete); update(update+delete); insert (update+delete); The pointer on the row is updated with the new undo log location.

When another TRX select finds that the trxId of this row is greater than its own trxId, it will find the history of the row in undolog according to the pointer until it finds the maximum change record that is less than its own trxId. (This is a determination that trxId meets the requirement, and this determination varies according to the isolation level).

How to identify a row requires a rowId design, since PK can also be updated, and the increment invariant rowId is a reliable ID for identifying a row. So there is also a hidden rowId field on the row data.

So when you need to roll back, you can just go back to the historical version of the current row, When another TRX selects a SNAPSHOT, the trxId corresponding to the rowId is found to be inconsistent with the trxId corresponding to the rowId in the snapshot snaptshot of TRX (in a consistent environment, TRX saves the version information in its select snapshot). The undo log is also checked from a pointer to retrieve the original historical version of the data back to the user. From the design of TRX snapshot, we can better understand why a TRX is called a unit of work.

Secondary Index MVCC of the secondary index

The in-place update on record is actually based on a clustered index design, because records are stored in the leaf node of the clustered index. However, the leaf node record of the secondary index only records the ID of the clustered index without a line of data, so the implementation scheme of MVCC for the secondary index is different from that described above.

Instead of adding new fields or updating them in place, the first idea mentioned above is adopted: instead of making changes to the original data, new data is generated. The original data entry will be marked as deleted, and the updated trxId will be recorded on the page where the new index index is located. When SELECT TRX finds a larger trxId on a DELETE or page, select will go back to the table to query the clustered index and use the clustered index information to look up the undo log. From this process, it can be found that the process of MVCC guaranteeing consistent read will cause index coverage failure.

Isolation

Let’s start with Isolation, which is the I in database ACID. The isolation level is the degree to which different transaction transactions are isolated when using the same data. When it comes to isolation levels, there are two levels, one is the isolation level specified in the SQL standard, and one is the implementation of the isolation level by the database.

SQL – 92 definition

The third SQL revision, SQL-92, defines four isolation levels.

  • Read Uncommitted
  • Read Committed
  • Repeatable Read
  • Serializable

In SQL-92, the isolation level is defined by whether there are inconsistencies such as dirty read, none-repeatable read and phantom read.

Dirty, none-repetable, Phantom

Dirty Read The simplest definition of a Dirty Read is to Read only inserts or updates from other TRX that have not yet been committed.

When I first saw unrepeatable reading and unreal reading, I always thought these two are the same thing. I wonder if you have the same doubts as me. However, if you look at the locking situation in the first chapter, you know that the locking situation is completely different between the existing records and the existing records (gaps). Inrepeatability and illusory reading are also distinguished by the existence of records.


Non-repeatability refers to the existing records read in one TRX that have been modified or deleted by other TRX and submitted by other TRX. In other words, the data read for the first time has changed or disappeared. A phantom read is a TRX record that has been read by another TRX insert and commit.

Definition of isolation levels

In SQL-92, isolation levels are based on whether or not these three problems can be prevented (note that this is not entirely consistent with innoDB’s implementation). Serializable is defined as a transaction executed concurrently in a certain order and executed sequentially in the same order. If the parallelism and serialization are identical, then there will be no concurrency inconsistencies.

In fact, an important point to note here is that these definitions are more for non-locking reads, namely ordinary select. Imagine if all rows involved are locked, then even the weakest isolation level will not appear dirty reads. Why is it more? Different isolation levels implement different locks, so the three inconsistent reads can also be discussed in the case of locking reads. The following figure shows the isolation levels defined by SQL-92:

Innodb claims to support four isolation levels, but the implementation schemes and effects are not exactly the same as standard SQL-92. One of the biggest differences is that InnoDB prevents phantom reads in the Repetable AD phase. The isolation level of InnoDB implementation can be seen in the figure below. Serializable is not commonly used. Compared with REPEATable, it reduces the parallelization of SQL by changing all select to SELECT for share.

Isolation level and two readings

The viewpoint thrown out in the last part is that several inconsistent reads referred to by the isolation level are in the case of non-locking reads, but the isolation level also affects the locking of locking reads. Insert Intention lock, gap lock and next key lock are used to lock gaps.

consistent none-locking reads

Innodb serialIZABLE isolation level if auto_commit = false will change non-locking select to locking select for share. If auto_commit = true then each select constitutes a transaction and ends immediately. So none-locking reads need not be discussed at this isolation level.

At another isolation level read-uncommitted, uncommitted changes can also be seen by other TRX’s. There is no consistent reading between TRX’s at this isolation level.

Consistent None-locking reads focuses on read-committed (RC) and Repeatable- Read (RR), which are the two most commonly used isolation levels. For RC and RR isolation levels, none-locking reads use the MVCC mechanism described above to ensure read consistency within the transaction. However, the two have different requirements for consistent. RC believes that any read submitted can be read, so RC’s snaptshot will be updated every time it reads. RR believes that the snaptshot data read for the first time should last until the end of TRX.


This is why illusory and non-repeatability occur in RC and not in RR: As long as udpate and insert are committed, both udpate and INSERT are considered readable in RC. RC select updates snapshot to include the TRx_id of these updates and inserts. Since the TRX in the RR does not change the snapshot it first reads, the UPDATE and insert will not be read because their TRx_id is not in the range that the RR saves the snapshot.

locking reads

Locking reads have different locking states at different I levels. RC and RR can learn in two ways.

Record and gap

Repeatable Read (non-repeatable Read) is protected from Phantom Read (non-repeatable Read) in RC (non-repeatable Read).

The RR level assigns a next key lock forward and a gap lock backward to the next record. The RR level effectively protects against Phantom Read. In fact, the implementation of InnoDB always prevents insert into the previous record and the next record. If the table is very sparse, the locked interval can be quite large. This is another reason why transaction should be as short and fast as possible.

select * from tb_user_complaint where user_id = 222 forupdate; +----+----------------------------+---------+------------------+------------+----------------------------+-----------+ |  id | created_at | user_id | contents | is_archive | last_updated_at | user_name | +----+----------------------------+---------+------------------+------------+----------------------------+-----------+ | 2 | 2020-02-12 15:12:19. 214543 | 222 | | an - test - 1 ^ @ 15:12:19. | 2020-02-12 | 214543 macavity | +----+----------------------------+---------+------------------+------------+----------------------------+-----------+ // RR level: (222,2) X = next-key lock, (500,1042) gap lock = gap lock; +--------+------------------------------------+-----------------------+-----------+----------+---------------+---------- ---------+----------------+-------------------+-------------+-----------------------+-----------+---------------+------- ------+-----------+ | ENGINE | ENGINE_LOCK_ID | ENGINE_TRANSACTION_ID | THREAD_ID | EVENT_ID | OBJECT_SCHEMA | OBJECT_NAME | PARTITION_NAME | SUBPARTITION_NAME | INDEX_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA | +--------+------------------------------------+-----------------------+-----------+----------+---------------+---------- ---------+----------------+-------------------+-------------+-----------------------+-----------+---------------+------- ------+-----------+ | INNODB | 4724256128:1079:140387850342152 | 6118 | 1068 | 134 |test| tb_user_complaint | <null> | <null> | <null> | 140387850342152 | TABLE | IX | GRANTED | <null> | | INNODB | 4724256128:22:5:3:140387876554776 | 6118 | | 1068 | 134test          | tb_user_complaint | <null>         | <null>            | idx_user_id | 140387876554776       | RECORD    | X             | GRANTED     | 222, 2    |
| INNODB | 4724256128:22:4:17:140387876555120 | 6118                  | 1068      | 134      | test| tb_user_complaint | <null> | <null> | PRIMARY | 140387876555120 | RECORD | X,REC_NOT_GAP | GRANTED | 2 | | INNODB | 4724256128:22:5:8:140387876555464 | 6118 | | 1068 | 134test| tb_user_complaint | <null> | <null> | idx_user_id | 140387876555464 | RECORD | X,GAP | GRANTED | 500, 1042 | +--------+------------------------------------+-----------------------+-----------+----------+---------------+---------- ---------+----------------+-------------------+-------------+-----------------------+-----------+---------------+------- ------+-----------+ // At the RC level: There are only two Index Records locks +--------+------------------------------------+-----------------------+-----------+----------+---------------+---------- ---------+----------------+-------------------+-------------+-----------------------+-----------+---------------+------- ------+-----------+ | ENGINE | ENGINE_LOCK_ID | ENGINE_TRANSACTION_ID | THREAD_ID | EVENT_ID | OBJECT_SCHEMA | OBJECT_NAME | PARTITION_NAME | SUBPARTITION_NAME | INDEX_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA | +--------+------------------------------------+-----------------------+-----------+----------+---------------+---------- ---------+----------------+-------------------+-------------+-----------------------+-----------+---------------+------- ------+-----------+ | INNODB | 4724256128:1079:140387850342152 | 6117 | 1068 | 129 |test| tb_user_complaint | <null> | <null> | <null> | 140387850342152 | TABLE | IX | GRANTED | <null> | | INNODB | 4724256128:22:5:3:140387876554776 | 6117 | | 1068 | 129test          | tb_user_complaint | <null>         | <null>            | idx_user_id | 140387876554776       | RECORD    | X,REC_NOT_GAP | GRANTED     | 222, 2    |
| INNODB | 4724256128:22:4:17:140387876555120 | 6117                  | 1068      | 129      | test| tb_user_complaint | <null> | <null> | PRIMARY | 140387876555120 | RECORD | X,REC_NOT_GAP | GRANTED | 2 | +--------+------------------------------------+-----------------------+-----------+----------+---------------+---------- ---------+----------------+-------------------+-------------+-----------------------+-----------+---------------+------- ------+-----------+Copy the code

1) whether the index used in the execution plan is unique, and 2) whether the corresponding record can be found:

  • If the index used is unique, and the search interval is equivalent, and there are hit records; In this case innoDB will no longer use gap or next-key locks because unQIue locks prevent inserts.
  • If the above requirements are not met, such as unique key search is a range, such as unique key search record does not exist, or using a key that is not unique, Innodb then uses next-key lock or gap lock (as in the example above).
Rec_not_gap lock select * from tb_user_complaintwhere id = 2 forupdate; +----+----------------------------+---------+------------------+------------+----------------------------+-----------+ |  id | created_at | user_id | contents | is_archive | last_updated_at | user_name | +----+----------------------------+---------+------------------+------------+----------------------------+-----------+ | 2 | 2020-02-12 15:12:19. 214543 | 222 | | an - test - 1 ^ @ 15:12:19. | 2020-02-12 | 214543 macavity | +----+----------------------------+---------+------------------+------------+----------------------------+-----------+ Time: 0.013 s +--------+------------------------------------+-----------------------+-----------+----------+---------------+---------- ---------+----------------+-------------------+------------+-----------------------+-----------+---------------+-------- -----+-----------+ | ENGINE | ENGINE_LOCK_ID | ENGINE_TRANSACTION_ID | THREAD_ID | EVENT_ID | OBJECT_SCHEMA | OBJECT_NAME | PARTITION_NAME | SUBPARTITION_NAME | INDEX_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA | +--------+------------------------------------+-----------------------+-----------+----------+---------------+---------- ---------+----------------+-------------------+------------+-----------------------+-----------+---------------+-------- -----+-----------+ | INNODB | 4724256128:1079:140387850342152 | 6121 | 1068 | 145 |test| tb_user_complaint | <null> | <null> | <null> | 140387850342152 | TABLE | IX | GRANTED | <null> | | INNODB | 4724256128:22:4:17:140387876554776 | 6121 | | 1068 | 145test| tb_user_complaint | <null> | <null> | PRIMARY | 140387876554776 | RECORD | X,REC_NOT_GAP | GRANTED | 2 | +--------+------------------------------------+-----------------------+-----------+----------+---------------+---------- ---------+----------------+-------------------+------------+-----------------------+-----------+---------------+-------- -----+-----------+Copy the code

Whether index records that do not meet the WHERE condition will be released

This trait feels easy to overlook.

  • Stage 1: The secondary index only contains the index itself and the primary key of the leaf node. If there are other filters in the WHERE conditions of update and DELETE, innoDB will select all rough hit rows with a Record lock.

  • Stage 2: InnDB will then return the pk hit to mysql, which will need to return to table Clustered Index for filtering. Different isolation levels treat rough hits differently here. The RC level releases record locks filtered through the back table (note that the first stage even RC requests locks first), and the RR level continues to hold all locks until TRX ends. 0 –

Let’s look at an example where we restrict the where condition to user_id and user_name, userId=555, and userId is selected as the key in the execution plan. There are 3 rows that hit userId=555 but only 2 rows that match name. When RR and RC are locked, the result is completely different.

select * from tb_user_complaint where user_id = 555 and user_name = "macavity" for update;
+-----+----------------------------+---------+------------------+------------+----------------------------+-----------+
| id  | created_at                 | user_id | contents         | is_archive | last_updated_at            | user_name |
+-----+----------------------------+---------+------------------+------------+----------------------------+-----------+
| 17  | 2020-02-15 16:08:00.184710 | 555     | complaint-test-1 | ^@          | 2020-02-15 16:08:00.184710 | macavity  |
| 123 | 2020-02-12 15:12:11.922491 | 555     | complaint-test-1 | ^@          | 2020-02-14 13:40:04.517506 | macavity  |
+-----+----------------------------+---------+------------------+------------+----------------------------+-----------+
Copy the code

In RR, all conditions hit by userId will be locked. Note that the lock type is X lock (nextKey lock) and the lock range is large.

Select * from ‘RC’ where lock (s) where lock (s) where lock (s) where lock (s);

If all suspected row locks need to be added to the first stage of the RC condition, is it possible for the two WHERE statements to collide in the first stage to cause a block even if they do not collide in the second stage?

Execute two statements below the RC level and the answer is does TRX-2 block? Trx-1 releases the user_name= MIDofinOS line in the second phase of the SQL query. But trX-2’s first stage of rough locking goes back to request a lock on the Macavity row. If one of them uses user_name as the key, there will be no block.

// the first TRX does not lock midofinOS, but the second TRX will still be blocked; // InnoDB uses user_name as index when index hints are dropped. The two TRX's no longer interfere with each other. // trx-1 select * from tb_user_complaint use index (idx_user_id)where user_id = 555 and user_name = "macavity";

// trx-2
select * from tb_user_complaint use index (idx_user_id) where user_id = 555 and user_name = "midofinos";
Copy the code

RU and SER

As for the effect of uncommitted and Serializable on locking-reads, it can be understood that RU has the same effect on locking-reads and RC, while serializable and RR have the same effect. These two levels are relaxation/reinforcement for RC and RR which are more consistent on none-locking reads.

conclusion

DB has always been one of my favorite topics, and I’m done with isolation levels and locks. Hope you enjoy it 🙂