preface

You now know that the four isolation levels for SQL are RU, RC, RR, and serialization.

The familiar default isolation level for MySQL is the third RR (repeatable read). Compared with THE SQL standard RR, the MySQL RR is implemented based on the MVCC mechanism. In this isolation level, it can prevent dirty writes, dirty reads, unrepeatable reads, and phantom reads.

So what is MVCC?

MVCC stands for Multi-version Concurrent Control, which is implemented based on undo log multi-version chain + ReadView mechanism.

Let’s talk about the implementation of the MVCC mechanism and the RC and RR based on the MVCC implementation.

1. Understand the MVCC prelude, what is the undo log version chain?

We talked about undo log before. Each piece of data has two hidden fields, trx_id and ROLL_pointer. Trx_id is the transaction ID that updated the data recently, and roll_pointer is the undo log generated before the transaction was updated.

If transaction A (id=50) inserts A value of A, then the corresponding trx_id is 50 and roll_pointer points to an empty undo log because it was not inserted before.

Transaction B has id 60 and rooll_pointer points to the actual undo log rollback log:

This undo log records the transaction ID of transaction B, the modified value B, and roll_pointer points to the undo log before it was modified

Transaction C then changes this value to C, and the corresponding transaction ID is 70, as shown in the figure below:

You can see that roll_pointer points to the undo log before this change and is concatenated with the undo log of transaction A.

Therefore, it is clear that trx_id and roll_pointer will be updated every time data is modified. Meanwhile, the undo log corresponding to multiple previous data snapshots will be connected in series through the roll_pointer to form a version chain.

2. What is the ReadView based on undo log multi-version chain?

When a transaction is executed, a ReadView is generated that contains several sections:

  • M_ids: what transactions are executing in MySQL at this time that have not yet been committed
  • Min_trx_id: the smallest value in m_IDS
  • Max_trx_id: specifies the maximum transaction ID of the next MySQL transaction to be generated
  • Creator_trx_id: indicates the ID of the current transaction

Here’s an example to understand the usefulness of ReadView.

Suppose there is already a data entry in the database, inserted by a previous transaction, with a transaction ID of 32 and an initial value of the original value.

Next, two transactions, A and B, are executed concurrently. The transaction ID of transaction A is 45 and that of transaction B is 59. Transaction B updates the row, and transaction A queries the row.

In this ReadView, m_ids contains the ids of transaction A and transaction B, 45 and 59, min_trx_id is 45, max_trx_id is 60, creator_trx_id is the current id of transaction 45, It’s transaction A.

Trx_id = 32 (min_trx_id = 45); trx_id = 32 (min_trx_id = 45); trx_id = 32 (min_trx_id = 45); The transaction to modify this row has already been committed, so it is possible to look up this row.

Transaction B then changes the row, changing the original value to B, and the row’s trx_id becomes 59. Roll_pointer points to the undo log generated before the change.

The trx_id is 59, which is greater than min_trx_id(45) in ReadView and less than max_trx_id (60) in ReadView. If a transaction (trx_id=59) exists in m_IDS, the transaction (trx_id=59) is executed concurrently at the same time as the transaction (trx_id=59).

Since this row of data cannot be queried, what data should be returned?

If trx_id=32, trx_id=32, trx_id=32, trx_id=32. The undo log version must be committed before transaction A is started.

The undo log version chain saves the snapshot chain and allows you to quickly read the previous snapshot value.

The above undo log version chain + ReadView guarantees that transaction A will not read the value of the concurrent transaction B update.

Assuming transaction A updates the row itself to value A, trx_id is updated to 45, and the worth snapshot of the previous changes made by transaction B is saved

Creator_trx_id (45) = creator_trx_id(45);

Then, while transaction A is executing A transaction, A transaction C with id 78 is suddenly started, and the row is updated to value C and committed.

Trx_id =78, which is larger than max_trx_id in ReadView, indicating that another transaction updated the data during transaction A’s execution, so it cannot be queried.

Then follow the undo log version chain to find the value you have modified.

Through the undo log version chain + ReadView mechanism, we know that after a transaction is started, we can only read the data that was modified before the transaction started or by the transaction itself. This enables data isolation when multiple transactions are executed concurrently.

3. RC isolation level based on ReadView mechanism

Committed read isolation level, which means that while a transaction is running, other transactions are executing and committed, and you can update data from other transactions, so there will be non-repeatable reads and phantom reads.

The core of the RC isolation level is to regenerate a ReadView each time a query is issued.

Suppose you now have two transaction pairs that agree to execute A row of data concurrently, transaction A and transaction B. Transaction A is the query data and transaction ID is 50. Transaction B updates data and transaction ID is 70.

Transaction A now initiates the query and opens A ReadView. Since transaction B executes concurrently, the structure in ReadView is:

Therefore, no matter how transaction B modifies the data and commits the transaction, transaction A cannot read the value changed by transaction B. The reason is also simple: the transaction ID of transaction B is in the active transaction list of m_IDS of ReadView.

How can transaction A read transaction B’s update and commit the value of the transaction?

That is, every time you query, you reopen a ReadView.

Now assume that transaction B has changed the row’s data to value B and committed. Transaction A queries again and restarts to generate A ReadView. In this generated ReadView, the only active transaction in the database is transaction A.

Transaction A finds that the trx_id of this data is 70, which is between min_trx_id and max_trx_id, but not in the m_IDS list. Note Transaction B is committed before the ReadView is generated.

Since the transaction was committed before the ReadView was generated, it means that transaction A can query the value changed by transaction B. To achieve committed read.

So the key thing about the committed read isolation level is that a new ReadView is generated for each query.

4. RR isolation level based on ReadView mechanism

Next we’ll look at the default isolation level of repeatable reads in MySQL and how to avoid both unrepeatable and phantom reads.

The repeatable read isolation level, as the name implies, means that a transaction reads the same piece of data no matter how many times it reads the same value. Other transactions cannot read the value even after modifying the data commit. At the same time, if other transactions insert some new data, also cannot read, can avoid unrepeatable read and phantom read.

Assume that A data entry already exists in the database, and transaction A and transaction B are executing at the same time. Transaction A’s ID is 60 and transaction B’s ID is 70

Transaction A issues A query, and the first query generates A ReadView

Trx_id =50 = min_trx_id =50 = min_trx_id =50 = min_trx_id =50 = min_trx_id =50

Transaction B updates the row, changes the value to B, generates an undo log, and commits.

M_ids contains the ids of other transactions that are executing at the time of transaction A. This does not mean that the committed transaction does not exist in m_IDS, except at the same isolation level as RC. Regenerate to a new ReadView.

Therefore, when transaction A queries this row of data again, because there is transaction ID of transaction B in m_IDS list, it indicates that transaction B is also an active transaction of the database. Even if transaction B commits, it will not read value B, but actually find the corresponding value along the undo log version chain.

Here, it becomes clear how ReadView can be used to avoid the problem of unrepeatable reads.

What about the illusion that can result from inserting data?

Select * from table where id > 10; select * from table where id > 10

Now we have a transaction C that inserts a piece of data and commits it.

Transaction A then queries again and finds that there are two days of eligible data, one is the original value and one is the value C.

Trx_id =80, which is larger than max_trx_id (71). Note The transaction is started only after the user initiates the query. Therefore, the data cannot be queried.

So in this query, transaction A can only query one data.

In this case, transaction A does not generate A phantom with the ReadView mechanism.

You can see how RR isolation levels can avoid unrepeatable and phantom reads based on ReadView.

conclusion

Through the analysis of a series of chapters and underlying principles, we all understand how the database dirty write, dirty read, non-repeatable read and phantom read problems arise.

MySQL implements RR isolation based on the undo log multiversion chain + ReadView mechanism to avoid dirty writes, dirty reads, unrepeatable reads, and illusory reads.