Writing in the front
- What are the problems caused by the lack of isolation of transactions?
- What is a transaction isolation level and why is it needed?
- What is MVCC and what problems does it solve?
- How is MVCC implemented?
background
As a basic service, mysql usually has multiple application services interacting with it as clients at the same time. Each application service executing multiple statements in a transaction does not want to be affected by or affect transactions of other clients
The simplest approach is for all transactions to execute exclusively, so that when all statements of one transaction have been executed (commit or rollback), statements of other transactions can be executed. But the impact on performance is too great. In fact, such isolation is not required in most business scenarios, so some of it can be sacrificed in exchange for performance of database services
Problems caused by isolation
What problems might arise if there is a lack of isolation between different transactions?
Let’s first define what is no problem: is the result of executing a transaction the same as executing each transaction sequentially
Dirty write
Dirty writes, in which one transaction makes changes to uncommitted change records of another transaction
For example: two transactions A and B, when transaction B updates A record but has not committed it, transaction A updates the record and commits it. If transaction B rolls back later, the record will be in the same state as before transaction B’s update. Not only B’s update will be rolled back, but transaction A’s update will also be rolled back
What’s wrong with that?
- This is coupled with fully guaranteed transaction isolation as a result of serial execution
Does not conform to
, if the serial execution, no matter which A or B executes first, in the case of B rollback, the final result of the record must be the result after A modification, rather than the result before B update - Secondly, transaction A was obviously modified after the modification of transaction B, and finally returned to the previous modification of transaction B, as if the update had been lost, which did not meet most business expectations
Fortunately, Mysql Innodb avoids dirty writes at all transaction isolation levels
Dirty read
Dirty reads, when one transaction reads uncommitted changes from another transaction
For example: two transactions A and B, when transaction B modifies A piece of data before committing it, transaction A reads the record and performs some operations based on the record. If transaction B rolls back later and the modified record of TRANSACTION B is restored to its original state, the previous read of the record of transaction A is A dirty read
What’s wrong with that?
- As with dirty writes, if transactions A and B are executed sequentially, there must be no dirty reads, and the result of this operation is not the same as that of A fully isolated operation
Don't agree
- If transaction B needs to be rolled back later, the modified record also needs to be rolled back. This modified value should never have existed in the first place, and it can have undesirable consequences if transaction A relies on data that should not exist for subsequent operations. That is
Wrong cause leads to wrong effect
Unrepeatable read
Non-repeatable reads, that is, a transaction read to a record, are not the same every time
For example, transaction A reads A record X at the beginning, and the result is M. At this time, another transaction B modifies record X, changing the value of record M to N and committing. Transaction A does not commit, then reads record X, and finds that the value of X changes to N
In this case, unrepeatable reads can only read the data that has been submitted, avoiding dirty reads
So what’s wrong with that?
- It’s ok to say that there is no problem, after all, reading data is
Have submitted
The,There is no problem with dirty reading
, but the value of a record is queried multiple times within a transaction - If there is a problem, there is a problem. Again, this is the same as complete serial execution, which guarantees complete isolation
Not the same
. If it is fully serial, when one transaction starts any operation, no other transaction will intersperse it, and no other transaction will modify any records. Of course, the original transaction will read the same data every time. This is problematic if the user expects the same result to be read every time in a transaction
Phantom read
Phantom read, that is, one transaction queries some records according to a certain condition, and then another transaction inserts some records that serve this condition. When the original transaction queries again according to the original condition, the inserted records will also be read out. This phenomenon is called phantom read
Although repeatable for previously read records, there is no guarantee for newly inserted records
A small summary
From dirty writes to dirty reads to unrepeatable reads to phantom reads, the severity of each isolation problem decreases
In fact, every kind of isolation problem you can say is problematic, because if you compare it to complete isolation, none of these isolation problems more or less fit the definition of complete isolation
But there’s nothing wrong with unrepeatable reads, for example: if the user expects to read the latest committed data every time in a transaction, then there’s nothing wrong with unrepeatable reads
Therefore, databases typically support custom isolation levels that are tailored to your needs and strike a balance between performance and isolation
The isolation level of the transaction
Four isolation levels are defined in the SQL standard. Each isolation level and its possible isolation problems are as follows:
Isolation level | Dirty read | Unrepeatable read | The phantom read |
---|---|---|---|
Uncommitted read | May occur | May occur | May occur |
Has been submitted to read | – | May occur | May occur |
Repeatable read | – | – | May occur |
serializable | – | – | – |
Mysql can prevent phantom reads at repeatable read isolation level
MVCC
The uncommitted read isolation level causes all isolation issues except dirty reads and is simple to implement by reading the most recent value of a record every time, regardless of whether or not the transaction that produced the value is committed
Serializable, which can be implemented to execute each transaction serially, is also simpler
So how does committed read and repeatable read isolation levels prevent dirty reads and avoid dirty and repeatable read problems?
Mysql Innodb uses multi-Version Concurrency Control (MVCC) to implement the Concurrency Control. When reading a record that is being written by another transaction, the client should read the historical Version of the record to support concurrent read-write operations and improve system performance
What is the historical version?
Mysql Innodb has two hidden columns for each clustered index record:
- Trx_id: Each time a transaction changes a cluster index record (insert, update, delete), the transaction ID of that transaction is assigned to
trx_id
Hidden columns
Each transaction in Innodb is assigned a transaction id in ascending order. For example, there are currently two transaction ids 1,2. So the next time a new transaction starts its transaction ID will be 3
- Roll _pointer: every time a change is made to a cluster index record, the old value of the record is not discarded. Instead, the old version of the record is written to
The undo log
The hidden column then acts as a pointer to find the recordInformation before modification
The record was modified in chronological order as follows:
The transaction of 100
Insert aname
forjames
The record ofThe transaction of 101
Will record thename
The column fromjames
Updated tojerry
The transaction of 102
Will record thename
The column fromjerry
Instead oftom
In order to supportMVCC
These historical versions are retained until the system judgesNo longer needed
These historical versions so far
Avoid dirty read
The core of avoiding dirty reads is to ensure that each version of a record read belongs to a committed transaction
Innodb designs a structure called ReadView that contains four fields
m_ids
: A list of transaction ids active in the system at the start of a read operationmin_trx_id
:m_ids
The smallest value inmax_trx_id
: The ID that the system should assign to the next transaction when a read operation beginscreator_trx_id
: indicates the id of the transaction to which the read operation belongs
With ReadView, we can determine which version of a record should be read when reading it, that is, which version is visible to the current transaction:
-
If the trx_id of a version of a record is the same as the creator_trx_id of the current ReadView, that version was created by the current transaction
-
If the trx_id of a version of a record is less than the min_trx_id of the current ReadView:
- First, the transaction ID to which the version belongs is absent
m_ids
, the transaction is inactive and either committed or has not yet started - while
m_ids
Is greater than the transaction ID of this versiontrx_id
, and the transaction ID is incrementally allocated, indicating that it cannot have been started, it must have been committed, and the committed version is readable
- First, the transaction ID to which the version belongs is absent
-
If the trx_id of a version of a record is greater than the max_trx_id of the current ReadView:
- Note The version is in
ReadView
This version cannot be accessed by the current transaction
- Note The version is in
-
Otherwise, only one case remains: if the trx_id of a certain version of a record is between [min_trx_id,max_trx_id-1], you need to check whether the trx_id is in m_IDS
- If yes, it indicates that the version of the record is currently active, i.e., created by an uncommitted transaction. Since it is not committed, it is not visible
- If not, it indicates that the version of the record was generated by a committed transaction and has already been committed
Note that transactions in the interval [m_ids maximum, max_trxid-1] may also be visible, such as max_trx_id=5, transaction 3, and transaction 4 committed when there are active transactions 1, 2, 3,4 in the system
M_ids =[1, 2], max_trxid-1 is [3,4], 3,4 was committed before the current ReadView was created
Each read starts from the latest version of the target record. If a version is not visible, search down the record version chain until it is foundSome visible version
So far. If all versions of the record are not visible, the record is being created in the current transactionReadView
If it does not already exist, return null
Before each read, create a ReadView according to the current active or committed status of all transactions in the system. According to the ReadView, determine which version of the target record is readable to avoid dirty reads, because the read version must have been committed, and all the submitted records in the system can be read at the moment. The committed read isolation level is implemented in this manner
Avoid unrepeatable reads
Let’s look at how to avoid being unrepeatable:
Unlike the dirty read avoidance operation, a ReadView is created on the first read in the transaction, and subsequent reads in the transaction use this ReadView to determine whether a version of the record to be read is visible
When a ReadView is generated for the first time, it is equivalent to taking a snapshot of the transaction execution status in the current system. The subsequent reads in the transaction are based on this snapshot to determine whether the version of a record is visible. Of course, each read result is the same
conclusion
- The lack of isolation of transactions can cause
Dirty write, dirty read, unrepeatable read, phantom read
The isolation problem of - The database allows users to set transaction isolation levels, a direct tradeoff between isolation and performance
- Mysql Innodb through
MVCC
Mechanism to implementHas been submitted to read
andRepeatable read
Isolation level