preface

The implementation principle of MVCC is a very frequent interview question. Recently, the technical discussion group has been discussing it. Taking advantage of the National Day, let’s talk about it together.

  • Public number: a boy picking up snails

1. Review of relevant database knowledge points

1.1 What are database transactions and why do they exist

Transactions, consisting of a limited sequence of database operations that are either all or none executed, are an indivisible unit of work.

If A transfers 100 yuan to B, first deduct 100 yuan from A’s account and then add 100 yuan to B’s account. If after deducting 100 yuan from A, there is no time to add it to B, the banking system will be abnormal, and finally the balance of A will decrease, but the balance of B will not increase. So you need A transaction to roll back A’s money, that’s it.

Why should there be transactions? To ensure that the data is ultimately consistent.

1.2 What are the features of transactions?

Four typical features of transactions are ACID, Atomicity, Consistency, Isolation, and Durability.

  • Atomicity: The transaction is executed as a whole, and all or none of the operations on the database contained within it are executed.
  • Consistency: data will not be destroyed before and after the transaction. If account A transfers 10 yuan to account B, the total amount of account A and B will remain the same regardless of success or failure.
  • Isolation: When multiple transactions are accessed concurrently, the transactions are isolated from each other. One transaction should not be disturbed by other transactions, and multiple concurrent transactions should be isolated from each other.
  • Persistence: Indicates that operational changes made by the transaction to the database will be persisted in the database after the transaction is committed.

1.3 Problems of transaction concurrency

Transaction concurrency causes dirty reads, unrepeatable reads, and phantom reads.

1.3.1 dirty read

If a transaction reads data that has been modified by another uncommitted transaction, a dirty read is said to have occurred.

Suppose we now have two transactions A and B:

  • Assuming that A’s balance is now 100, transaction A is preparing to query Jay’s balance
  • Transaction B first deducts Jay’s balance by 10, but it hasn’t committed yet
  • Finally, A reads that the balance is 90, which is the balance after deduction

Because transaction A reads data that transaction B has not committed, this is A dirty read.

1.3.2 Unrepeatable Read

In the same transaction, data is read repeatedly and the data content is inconsistent

Suppose we now have two transactions A and B:

  • Transaction A queries Jay’s balance and finds 100
  • At this time, transaction B deducts Jay’s account balance, deducts 10, and submits the transaction
  • Transaction A then queries Jay’s account balance and finds that it’s 90

Transaction A is interfering with transaction B! In transaction A, two identical queries that read the same record but return different data are called non-repeatable reads.

1.3.3 phantom read

If one transaction queries records based on some search criteria, and another transaction writes records (insert, DELETE, update) that meet those search criteria before the transaction commits, a phantom read has occurred.

Suppose we now have two transactions A and B:

  • Transaction A first queries the account whose ID is greater than 2, and then obtains two records with id=2 and ID =3
  • At this point transaction B opens, inserts a record with ID =4, and commits
  • Transaction A executes the same query again and returns three records with id=2,3,4.

Transaction A queries the result set of A range, and another concurrent transaction B inserts new data into the range and commits the transaction. Then transaction A queries the same range again and reads A different result set. This is A phantom read.

1.4 Four isolation levels

To solve the problems of dirty reads, unrepeatable reads, and phantom reads in concurrent transactions, the database uncle designs four isolation levels. Read uncommitted, read Committed, repeatable read, Serializable.

1.4.1 Read Uncommitted

The read uncommitted isolation level only limits that two data cannot be modified at the same time. However, when the data is modified, even if the transaction is not committed, it can be read by other transactions. This level of transaction isolation has problems such as dirty read, duplicate read, and phantom read.

1.4.2 Read Submitted

Read committed isolation level, the current transaction can only read data submitted by other transactions, so the isolation level of this transaction solves the dirty read problem, but there are still repeated read and phantom read problems.

1.4 3 Repeatable

The repeatable read isolation level limits the ability to modify data when reading data. Therefore, the problem of repeatable read is solved. However, when reading range data, data can be inserted, resulting in illusory read problems.

1.4.4 serialization

The highest isolation level for transactions, at which all transactions are executed in serialized order. Avoid all concurrency problems of dirty reads, unrepeatable reads, and phantom reads. However, at this level of transaction isolation, transaction execution is costly.

1.4.5 What are the concurrency problems of the four isolation levels

Isolation level Dirty read Unrepeatable read Phantom read
Read uncommitted Square root Square root Square root
Reading has been submitted x Square root Square root
Repeatable read x x Square root
serialization x x x

1.5 How does the database ensure transaction isolation?

Databases are locked to achieve transaction isolation. It’s like if you want to be alone and not be disturbed, you can put a lock on your door.

Locking really works and ensures isolation. Serialization isolation levels, for example, are locked. However, frequent locking causes that data cannot be modified when it is read, and data cannot be read when it is modified, which greatly reduces database performance.

So, how to solve the performance problem after locking?

The answer is MVCC multi-version concurrency control! It realizes reading data without locking, can let read data modify at the same time. Data can be read at the same time.

2. What is MVCC?

Concurrency Control — MVCC, for Multi-version Concurrency Control. It is a method of concurrency control, generally in the database management system, to achieve concurrent access to the database, in the programming language to achieve transaction memory.

Popular speaking, at the same time there are multiple versions of the data in the database, not multiple versions of the whole database, but a record of multiple versions exist at the same time, at the time of a transaction carries on the operation, need to look at the records of hidden column transaction version id, than transaction id and according to the isolation level to judge which version to read data.

Database isolation level read committed and repeatable read are implemented based on MVCC. Compared with the simple and crude way of locking, it can deal with read and write conflicts in a better way, which can effectively improve the performance of database concurrency.

3. Key knowledge points of MVCC implementation

3.1 Transaction version number

Before each transaction is started, a self-increasing transaction ID will be obtained from the database, and the sequence of transaction execution can be judged from the transaction ID. This is the transaction version number.

3.2 Implicit fields

For the InnoDB storage engine, each row has two hidden columns trx_ID, roll_pointer, and a third hidden primary key column ROW_id if there are no primary keys and non-null unique keys in the table.

The column name Whether must describe
row_id no Monotonically increasing row ID, not required, takes up 6 bytes.
trx_id is Records the transaction ID that operates on the data transaction
roll_pointer is This hidden column acts as a pointer to the undo log of the rollback segment

3.3 the undo log

Undo log: rollback logs are used to record information before data modification. Before modifying the table record, the data is copied to the Undo log. If the transaction is rolled back, the undo log can be used to restore the data.

You can assume that when a record is deleted, the Undo log records a corresponding INSERT record, and when an update record is updated, it records a corresponding reverse update record.

What is the use of undo log?

  1. Ensure atomicity and consistency when a transaction rolls back.
  2. Used for MVCC snapshot reads.

Version 3.4 of the chain

When multiple transactions operate on a row in parallel, the changes made to the row by different transactions generate multiple versions, which are then connected to a linked list by rolling back the pointer (roll_pointer). As follows:

In fact, through the version chain, we can see the relationship between transaction version number, table hidden columns, and undo log. Let’s do a little bit more analysis.

  1. Select * from core_user where id = 1 and name = sun quan;

  1. Now start A transaction A: execute on the CORE_USER tableUpdate core_user set name =" cao "where id=1The following process is performed
  • First get a transaction ID=100
  • Copy the data from core_USER to undo log
  • Alter TABLE core_user select * from core_user where id=1 and name = cao cao
  • Change the data transaction Id=101 to the current transaction version number and point roll_pointer to the undo log data address.

3.5 Snapshot Read and Current Read

Snapshot reads: Read the visible version of the recorded data (with older versions). No lock, normal SELECT statements are snapshot reads, such as:

select * from core_user where id > 2;
Copy the code

Current read: Reads the latest version of the recorded data, and explicitly locks the current read

select * from core_user where id > 2 for update;
select * from account where id>2 lock in share mode;
Copy the code

3.6 Read the View

  • What is Read View? It is the read view generated when the transaction executes the SQL statement. In fact, in InnoDB, every SQL statement is preceded by a Read View.
  • So what does Read View do? It is mainly used for visibility judgment, that is, to determine which version of data is visible in the current transaction ~

How does a Read View guarantee visibility? Let’s start by looking at a few important properties of the Read View

  • M_ids: Active (uncommitted) read/write transaction ids in the current system. The data structure is a List.
  • Min_limit_id: indicates the minimum transaction ID of active read/write transactions in the system when the ReadView is generated, that is, the minimum value in m_IDS.
  • Max_limit_id: indicates the ID value that should be assigned to the next transaction in the system when a ReadView is generated.
  • Creator_trx_id: transaction ID of the current read View

Read View matching rules are as follows:

  1. If the data transaction IDtrx_id < min_limit_id, indicating that the transaction that generated the version was committed before the Read View was generated (because the transaction ID is incremented), so the version can be accessed by the current transaction.
  2. iftrx_id>= max_limit_id, indicating that the transaction that generated this version was generated after the ReadView was generated, so this version is not accessible by the current transaction.
  3. ifmin_limit_id =<trx_id< max_limit_id, need to be divided into three cases
  • (1) ifm_idscontainstrx_id, the transaction has not yet committed, but if the data istrx_idIs equal to thecreator_trx_idIf so, it means that the data is self-generated, thereforevisible.
  • (2) Ifm_idscontainstrx_idAnd,trx_idIs not equal tocreator_trx_idWhen the Read View is generated, the transaction is not committed and is not self-produced, so is the current transactionCan’t see;
  • (3) ifm_idsDoes not containtrx_idThe transaction was committed before the Read View was generated, and the result of the modification is visible to the current transaction.

4. MVCC implementation principle analysis

4.1 What is the process of querying a Record Based on MVCC

  1. Gets the version number of the transaction itself, the transaction ID
  2. Access to the Read View
  3. The transaction version number in the Read View is compared.
  4. If the Read View’s visibility rules are not met, the Undo log historical snapshot is required.
  5. Finally, the data that matches the rule is returned

InnoDB implements MVCC via Read View+ Undo Log. Undo Log stores historical snapshots. Read View visibility rules help determine whether the current version of data is visible.

4.2 Read Committed (RC) isolation level, there is an analysis process of non-repeatable read problems

  1. Create table core_USER and insert a row of initialization data as follows:

  1. The isolation level is set to read Committed (RC), and both transactions A and B query and modify the CORE_USER table.
SQL > select * from core_user where id=1Copy the code

The execution process is as follows:

In the end, transaction A queries the record of name= Cao Cao. Based on MVCC, we analyze the execution process:

(1). A starts the transaction and first gets A transaction ID 100

(2).B starts the transaction, and the transaction ID is 101

(3). Transaction A generates A Read View with the corresponding value as follows

variable value
m_ids 100101
max_limit_id 102
min_limit_id 100
creator_trx_id 100

Then back to the version chain: Start picking visible records from the version chain:

As can be seen from the figure, the content of column name in the latest version is Sun Quan, and the value of trx_id in this version is 100. Start read View visibility check:

Min_limit_id (100)=<trx_id (100) <102; creator_trx_id = trx_id =100;Copy the code

If trx_id=100, the current transaction is visible. Therefore, it is found that the name of the record is Sun Quan.

(4). Transaction B modifies the operation and changes the name to Cao Cao. Copy the original data to undo log and modify the data to mark the transaction ID and the address of the previous data version in Undo log.

(5) Commit transaction

(6) Transaction A performs the query operation again and generates A new Read View. The corresponding value of Read View is as follows

variable value
m_ids 100
max_limit_id 102
min_limit_id 100
creator_trx_id 100

Then go back to the version chain again: pick visible records from the version chain:

As can be seen from the figure, the column name of the latest version is Cao Cao, and the trx_id value of this version is 101. Start Read View visibility check:

Min_limit_id (100)=<trx_id (101) <max_limit_id (102); However,trx_id=101 does not belong to the m_IDS collectionCopy the code

Therefore, the record trx_id=101 is visible to the current transaction. Therefore, the SQL query shows the record whose name is Cao Cao.

SQL > commit (RC) isolation level; SQL > commit (RC) isolation level; SQL > commit (RC) isolation level;

4.3 Repeatable Read (RR) Isolation level to solve the analysis of non-repeatable read problems

How to solve the problem of unrepeatable reads in RR isolation level? Let’s look at it again,

Same process as in section 4.2, same transaction A and transaction B, as follows:

4.3.1 Read View works differently at different isolation levels

In fact, the Read View works differently at different transaction isolation levels. RR can solve the problem of non-repeatable reads.

  • At the Read Committed (RC) isolation level, each query creates a new copy of the Read View within the same transaction, which can lead to inconsistent reads within the same transaction (non-repeatable Read concurrency).
begin
select * from core_user where id =1 Generate a Read View
/ /
/ /
select * from core_user where id =1 Generate a Read View
  • In RR isolation, read Views are retrieved only once in a transaction and are shared by replicas to ensure that the data is the same for each query.
begin
select * from core_user where id =1 Generate a Read View
/
/
select * from core_user where id =1 Share a copy of the Read View

4.3.2 Example Analysis

Let’s go back to the 4.2 example and execute the second query:

Transaction A performs the query again, reusing A copy of the old Read View with the following values

variable value
m_ids 100101
max_limit_id 102
min_limit_id 100
creator_trx_id 100

Then go back to the version chain again: pick visible records from the version chain:

As can be seen from the figure, the column name of the latest version is Cao Cao, and the trx_id value of this version is 101. Start read View visibility check:

Min_limit_id (100)=<trx_id (101) <max_limit_id (102); Because m_ids{100,101} contains trx_id (101) and creator_trx_id (100) does not equal trx_id (101)Copy the code

Therefore, trx_id=101 is not visible to the current transaction. Trx_id =100; roll_pointer (trx_id=100);

Min_limit_id (100)=<trx_id (100) < max_limit_id (102); Because m_ids{100,101} contains trx_id (100) and creator_trx_id (100) equals trx_id (100)Copy the code

Therefore, the record trx_id=100 is visible to the current transaction. The problem of unrepeatable reads is solved by reusing the old copy of the Read View in RR isolation.

4.4 Network legend, does MVCC solve the illusion problem?

There is a legend that the RR isolation level of MVCC has solved the illusion problem. Let’s analyze it together.

4.4.1 A snapshot read in RR does not have a phantom read problem

According to the figure, the query result set in Step 2 and Step 6 does not change. It seems that the RR level has solved the phantom problem

4.4.2 RR level, a current read example

Suppose we have an ACCOUNT table with four entries in the RR table.

  • Start transaction A, perform current read, and query all records whose ID is >2.
  • Then open transaction B and insert data with id=5.

The process is as follows:

Transaction A is performing A select… Select * from share mode where id = 3 and id = 4; select * from share mode where id > 2;

For RR isolation, select, UPDATE, delete, and other locked statements use a gap lock and a temporary key lock to lock the range between index records and avoid inserting records between ranges.

4.4.3 This particular scenario seems to have illusory problems

Update account set balance=200 where id=5; This step operation, the same transaction, the same SQL, check the result set is different, the result, in line with the definition of magic read ~

This question, my dear friend, do you think it is an illusory problem? So, in essence, RR isolation levels may still have illusory problems. Please leave your comments in the comments section.

Reference and thanks

  • Database foundation (4) Innodb MVCC implementation principle