MVCC is introduced
Concurrency Control MVCC stands for Multiversion Concurrency Control. As its name implies, it is a method of controlling read and write in the event of concurrency. The basic popular Database has its specific implementation.
MVCC enables each transaction to minimize the dependence on locks when reading and writing data. While ensuring transaction isolation, MVCC also enables read and write transactions to run concurrently.
When we make changes, MVCC keeps the previous data records through undo log, as if each record has its own historical version.
By tracing these historical versions, THE MVCC is no longer afraid of being influenced by write operations, which is the essence of MVCC.
MVCC role
As we know, isolation is one of the four characteristics of transactions, and its implementation cannot be achieved without the use of locks. But locks have their own performance bottlenecks, such as queuing or accidentally creating deadlocks.
With MVCC multi-version control, read and write operations no longer need to guard each other carefully, greatly improving the database concurrency processing.
Of course, this is for read and write concurrency, if the write and write operation at the same time, then the original control should be the same control, this is related to the data integrity principle.
Therefore, MVCC is mainly optimized for read and write concurrent operations, which can be optimized for many Internet projects with more read and less write.
MVCC principle
Before going into the details of MVCC, let’s take a look at undo log, which is very useful in MVCC.
Undo log is a log of transaction operations to provide data rollback function. For example, if the transaction now performs an INSERT operation, the rollback will reverse resolve to delete. If it is a DELETE operation, it becomes an INSERT operation, which acts as a logical reverse.
In addition to recording the operation type, data status, and transaction ID of each time, undo log also contains a pointer, that is, the last record pointer, to form a complete historical link.
So the MVCC historical version data is retrieved retrospectively from the undo log record pointer. So, where was the last record pointer stored?
In fact, every time we insert a row of data, the database generates two additional invisible fields for us:
- DB_TRX_ID: specifies the transaction ID
- DB_ROLL_PTR: rollback record pointer.
This line DB_ROLL_PTR is also the last record pointer, so each time you need to trace the historical version, you can start from there.
As mentioned earlier, there are four levels of isolation for transactions: uncommitted reads, committed reads, repeatable reads, and serializable. In fact, MVCC is implemented only at the committed read, repeatable read isolation level.
Because the unsubmitted read requirement is minimal, no matter what, as long as the data can be retrieved will be returned. Serializability inherently shuts out other requests, regardless of the concurrent execution between transactions.
MVCC also differs in the isolation level of committed reads and repeatable reads, mainly in the generation of Read Views.
Read View is a maintenance of the active transaction ID and contains the following fields:
- M_ids: Set of all transaction ids that are occurring at that time
- M_low_limit_id: indicates the maximum transaction ID that can be read by the current transaction. It is equivalent to the “high watermark” warning threshold.
- M_up_limit_id: conversely, it is the “low water” warning line, and all transaction ids read should be greater than this value.
- M_creator_trx_id: indicates the current transaction ID.
When these transaction ids are maintained in the Read View of the current transaction, you can control the visibility of other transactions to the current transaction.
For example, if m_low_limit_id of the current Read View is 10, the current transaction can only Read the transaction whose id is 10, even if a new transaction is created later.
If the current transaction needs to be rolled back, the rollback is controlled based on the undo log pointer and m_up_limit_id of the Read View.
After explaining the use of Read View, let’s move on to the difference between committed reads and repeatable reads on MVCC.
For committed reads, it regenerates the Read View on each SELECT, so committed reads may Read different committed data in the same transaction!
Repeatable reads are only generated on the first SELECT, so all subsequent reads are under version control here!
The historical version of the MVCC generated in the undo log will not remain forever, but will be marked as cleared after certain conditions are met.
Undo logs of insert type, for example, can be marked as cleared after the transaction is committed because it is new and no other transaction depends on it.
conclusion
MVCC obtains historical versions through the undo log record pointer, just like mirroring backup, so that data reading and writing need not depend on one copy of data, which improves the concurrent execution efficiency.
However, if the undo log clearing thread is delayed, the undo log will become larger and larger, affecting the efficiency of disk operation. If necessary, system variables need to be configured to allocate more resources to the cleanup thread.
In general, the use of MVCC makes sense, as does the use of space for time, in this case for redundant version for concurrency.
Interested friends can search the public account “Read new technology”, pay attention to more pushed articles. Thank you for your support! Read new technology, read more new knowledge.