Multi-version concurrency control (MVCC) in MySQL is a common way to deal with read/write conflicts in modern database engine implementation. MVCC, as an advanced feature of MySQL, aims to improve the throughput performance of database in high concurrency scenarios.
First, what is the background of MVCC?
The four isolation levels of a transaction and the corresponding three exceptions:
-
Dirty read: a transaction reads data that another transaction has not committed.
-
Non-repeatable read: in the same transaction, the same data is read twice with different contents.
-
Phantom read: in the same transaction, the same operation is read twice, resulting in a different number of records.
In MySQL, the default isolation level is repeatable reads, which addresses dirty and unrepeatable reads, but not phantom reads. If we want to solve the phantom problem, we need to serialize, which is to raise the isolation level to the highest, but this will significantly reduce the transaction concurrency of the database.
MVCC solves the problem of unrepeatable read and phantom read through optimistic lock, which can replace row-level lock in most cases and reduce the system overhead.
MySQL > update loss caused by concurrent transactions
-
Optimistic locking: Its implementation is, as its name suggests, a better case.
Every time we fetch the data, we think that others will not modify it, so we will not lock it. However, when we update the data, we will judge whether others have updated the data during this period, which can be achieved by using the version number mechanism and CAS algorithm.
-
Pessimistic locks: Pessimistic locks are like its name, always assume that is worst, each time you take data all think that others will be modified, so every time when take data will be locked, so people want to take this data will be blocked until it got locked (Shared resources to only one thread at a time using, other threads blocked, after use to transfer resources to other threads).
What is MVCC and what problems does it solve?
MVCC implements database concurrency control through multiple version management of data rows. In a nutshell, the idea is to save historical versions of data.
We can compare the version number to determine whether the data is displayed or not (the rules will be described later), and we can read the data without locking it to ensure that the transaction is isolated.
With MVCC we can solve the following problems:
(1) the problem of blocking between read and write, through MVCC can make read and write do not block each other, that is, read does not block write, write does not block read, so that can improve the transaction concurrent processing ability.
(2) Reduce the probability of deadlock. This is because MVCC uses an optimistic locking approach, which does not require a lock to read data and locks only the necessary rows for write operations.
(3) Solve the problem of consistent reading. Consistent read is also called snapshot read. When we query the snapshot of the database at a certain point in time, we can only see the update result of the transaction committed before this point in time, but not the update result of the transaction committed after this point in time.
A few words that might be difficult to understand:
-
Snapshot read: Reads snapshot data. Simple SELECT operations without locks are snapshot reads (common read operations).
-
Current read: The current read reads the latest data, not the historical version of the data.
The current read (including lock reads and DML operations) is performed for locked SELECT, or for adding, deleting, or modifying data.
Analysis of application examples
To better understand MVCC, let’s use a sample scenario to illustrate.
Suppose you have a user_balance table, which contains three fields, username username, balance balance and bankcard number, as follows:
The database administrator wants to query the total amount in the user_Balance table when user A transfers money to user B. The two scenarios are concurrent. What problems may occur when there is no MVCC?
Case1: Due to the need to add line lock, it takes A long time for user A to transfer money to user B, as shown in the following figure.
Case2When we read with row lock, deadlock may occur, as shown in the following figure.
For example, when we read that A has 1000 yuan, B starts to transfer money to A.
4. How InnoDB implements MVCC?
When querying a record, the process is as follows:
-
First get the version number of the transaction itself, that is, the transaction ID;
-
Get Read View;
-
The resulting data is then compared to the transaction version number in the Read View;
-
If you do not comply with the ReadView rule, you need to retrieve the historical snapshot from Undo Log.
-
Finally, the data that matches the rule is returned.
Relevant concepts
1. Transaction version number
A self-growing transaction ID that marks the order in which transactions are executed.
2. Read View
In THE MVCC mechanism, multiple transactions updating the same row record produce multiple historical snapshots, which are stored in the Undo Log. If a transaction wants to query this row record, which version of the row record needs to be read?
This is where the Read View comes in, which helps us solve the row visibility problem. The Read View holds a list of all transactions that were active (and not committed) when the current transaction was started. In other words, the Read View holds a list of other transaction ids that should not be seen by this transaction.
Several important properties in Read VIew:
-
Up_limit_id, the smallest transaction ID of active transactions;
-
Trx_ids, the set of transaction ids that are currently active in the system;
-
Low_limit_id, the maximum transaction ID of an active transaction;
-
Creator_trx_id, the transaction ID that created the Read View.
3. Hidden columns of row records
InnoDB’s leaf segment stores the data pages, which hold the row records. There are some important hidden fields in these row records:
-
DB_ROW_ID:
6-byte: records the ID of the transaction that operates on the data transaction.
-
DB_TRX_ID:
6-byte: the hidden ID is used to create a clustered index when there is no suitable index for the created table.
-
DB_ROLL_PTR:
7-byte: rollback pointer pointing to the undo log position of the previous version data;
4. Aggregate indexes
A clustered index is when the physical order of the data in a database table row is the same as the logical (index) order of the key values. A table can have only one clustered index. Because a table can have only one physical order, there can only be one clustered index.
5. Undo Log
InnoDB saves snapshots of row records in the Undo Log, which can be found in the rollback section. It is mainly used to record the logs before the data is modified. Before modifying the table information, it copies the data to the Undo Log, and restores the data through the Undo Log when the transaction is rolled back.
The association relationship between rollback Pointers in the rollback segment is as follows:
5. How does InnoDB solve phantom reading?
1. Illusory reading will occur even if MVCC is used to read committed data
We at the same time open transaction and transaction B, first in the transaction to A condition in A range of queries, read the row it locks, increase in transaction B A range of data comply with the conditions, and submit it, then we again in the transaction A range of data query the conditions, you will see the result set of A qualified data, This is where phantasmagoric reading occurs.
The reason for phantom reading is that InnoDB only uses Record Locking when a commit is being read.
InnoDB has three ways to lock rows:
-
Record the lock:
Add locks for a single row record.
-
Gap Locking:
You can lock a range (the gap between indexes), but not the record itself.
The gap lock is used to prevent phantom reading.
-
Next, the Key lock:
Locking a range and locking the record itself, equivalent to gap locking + record locking, solves the phantom problem.
2. In the case of repeatable reads, InnoDB can use next-key lock +MVCC to solve the phantom read problem.
When trying to insert player Alex Len (2.16m), transaction B will time out and cannot insert this data.
This is because the next-key lock locks all ranges of height>2.08, and cannot insert data that matches this range. Transaction A then re-queries the conditional range, and there is no phantom reading.
Six, summarized
The core of MVCC is Undo Log+ Read View.
-
MV uses Undo Log to save historical versions of data to manage multiple versions.
-
“CC” is managed through Read View, which determines whether data should be displayed or not.
At the same time, for different isolation levels, the Read View generation strategy is different, so different isolation levels are implemented.
Author: The road to architecture Improvement, ten years of research and development road, Dachang architect, CSDN blog expert, focus on architecture technology precipitation learning and sharing, career and cognitive upgrade, adhere to share practical articles, looking forward to growing with you
Attention and private message I reply “01”, send you a programmer growth advanced gift package, welcome to hook up.
Thanks for reading!