• ** Before we get to MVCC? We need to first for those who do not know MySQL, first to popularize some basic knowledge of MySQL, specific MVCC content to see later oh.

Let’s take a step by step look at what MySQL MVCC really is. **

  • In MySQL, only database tables with InnoDB storage engine support transactions.
  • Transactions can be used to ensure data integrity and consistency, ensuring that batches of SQL statements are either executed or not executed at all.
  • Transactions manage INSERT, UPDATE, and DELETE statements.

1. Four properties (ACID) :

  • ** Atomicity: ** All operations in a transaction either complete or do not complete, and do not end up somewhere in between. If a transaction fails during execution, it will be rolled back to the state before the transaction began, as if the transaction had never been executed.
  • ** Consistency: ** Database integrity is not compromised before and after a transaction. This means that the data written must conform to all the preset rules, including the accuracy of the data, the concatenation of the data, and the ability of the subsequent database to do its predetermined work spontaneously.
  • ** Isolation: ** The ability of a database to allow multiple concurrent transactions to read, write and modify its data at the same time. Isolation prevents data inconsistencies due to cross-execution when multiple transactions are executed concurrently. Transaction isolation can be divided into different levels, including Read uncommitted, Read Committed, Repeatable Read, and Serializable.
  • ** Persistence: After a transaction, changes to the data are permanent and will not be lost even if the system fails.
By default on the MySQL command line, transactions are committed automatically, that is, the COMMIT operation is performed immediately after the SQL statement is executed. Therefore, to explicitly START a TRANSACTION, use the commands BEGIN or START TRANSACTION, or execute the command SET AUTOCOMMIT=0, which disables automatic commit using the current session.Copy the code

2. Isolation level

**

  • Read Uncommitted, a transaction can Read data that has been committed in another transaction. Can appear dirty read, cannot repeat read, unreal read phenomenon.
  • Read Committed, a transaction can only Read data committed by another transaction. There will be non-repeatability, illusory phenomena.
  • Repeatable read keeps the same data from multiple queries within the same transaction. Phantom reading will occur
  • Serializable is a high isolation level that requires read and write locks on the selected objects to be released until the end of the transaction, so it can prevent all problems, but because it is serialized, it is less efficient.

3. Magic reading, unrepeatable reading and dirty reading

Dirty read: When a transaction reads data that has not been committed by another transaction, because the uncommitted data is not necessarily the final valid data. So we say we read dirty data. Dirty reading. Non-repeatable read: After one transaction A reads the data, another transaction B modifies the data. At this time, transaction A searches again and finds that the data is different. This is unrepeatable reading. It can also be called phantasmagoria. Illusory: also called “illusory read “, is a special case of” unrepeatable read “: when transaction 1 executes” SELECT… Transaction 2 creates (for example, [[INSERT]]) a new row of data in the table that satisfies transaction 1’s “WHERE” clause. Note: this may be a bit convoluted, but in general, “non-repeatable” and “phantom” mean roughly the same thing. However, inrepeatability occurs on the data row, i.e., when an update occurs and the data is read, an inrepeatable read occurs. Phantom reads occur on tables, where insert and DELETE operations occur. Read the table again, data entries or rows (number of records) are different. It’s like a hallucination. **

Concurrency Control (MVCC

Databases are used to handle read/write conflicts in order to deliver throughput performance in high concurrency scenarios. Version-linked: For tables using InnoDB’s storage engine, there are two necessary hidden columns in the clustered index record (row_id is not necessary, we create tables with primary keys or non-null unique keys that do not contain row_id columns) : trx_id: Each time a change is made to a record, the corresponding transaction ID is assigned to the trx_ID hidden column. Roll_pointer: Each time a change is made to a record, this hidden column stores a pointer that can be used to find information about the record before the change. Let’s say I have a table like this: T

ID Name
1 Xiao li

Let’s assume that the transaction ID of the new record is 80, so the version list of the new record is as follows (roll_pointer is empty because it is new) :If you have two transaction ids 100 and 200, update this record as follows:

Can't update the same record in two transactions? When the first transaction updates a record, it locks the record, and when another transaction updates again, it waits for the first transaction to commit and release the lock.Copy the code

Every time we make a change to the data record, MySQL records a log, which we call an undo log. Each undo log also has a roll_pointer property (the undo log for insert does not have this property because there is no older version of the record). These undo logs can be linked together into a linked list, so the situation now looks like the following:For the record, after each update will old record into the undo log, it is a historical version of the record, along with the time and increase the number of updates, all the versions will be roll_pointer attribute is connected into a linked list, we call this list 】 【 version chain, head node is the current record of the latest version of the chain value. In addition, each version contains the transaction ID from which the version was generated. This id is so important that the isolation level of subsequent transactions is based on this ID.

ReadView

** For transactions using the [read Uncommitted READ_UNCOMMITTED] isolation level, it is better to read the latest version of the record, and for transactions using the [SERIALIZABLE] isolation level, it is better to access the record by locking. For transactions COMMITTED to READ and REPRATABLE_READ, we need to use version chain. The core problem is that we need to determine which version of the data in the version chain is visible to the current transaction. This ReadView contains all active read/write transactions in MySQL and puts their transaction ids into a list. We call this list m_IDS (an array). So when we access a record, we only need to follow the following steps to determine whether a version of the record is visible (official design rules oh) :

  • If the value of the accessed version’s trx_ID attribute is less than the minimum transaction ID in the M_IDS list, it indicates that the transaction that generated the version was committed before the ReadView was generated, so the version can be accessed by the current transaction.

  • If the value of the accessed version’s trx_ID attribute is greater than the maximum transaction ID in the M_IDS list, the transaction that generated the version was generated after the ReadView was generated, so the version cannot be accessed by the current transaction.

  • If the value of the trx_id attribute is between the maximum transaction ID in m_IDS and the minimum transaction ID in m_IDS, then it is necessary to check whether the value of trx_id attribute is in m_IDS. If it is, then the transaction generated by ReadView is still active, and this version cannot be accessed. If not, the transaction that generated the version of the ReadView when it was created has been committed and the version can be accessed.

If a particular version of the data for the current transaction is not visible, then along the chain to continue to look for the next version version of the data records, we still according to the above steps to determine whether the data can be seen, and so on, has been to release the final version of the chain, if the final version of the data I also not is visible, This means that the record is not visible to the transaction and is not included in the query result. One of the biggest differences between READ COMMITTED and REPEATABLE READ isolation levels in MySQL is when they generate readViews. MySQL determines whether the data is readable (visible) by using the version chain we drew above. –[1]– [R****EAD COMMITTED — ReadView]

# Transaction 100
BEGIN;
 
UPDATE t SET name = 'the little B' WHERE id = 1;
 
UPDATE t SET name = 'small C' WHERE id = 1; # Note: I did not commit this transaction. There is nocommitInstructions: ohCopy the code
# Transaction 200
BEGIN; # update some other table records...Copy the code
The first time a transaction is actually modified (such as with INSERT, DELETE, or UPDATE statements), a single transaction ID is assigned, and this transaction ID is incremented.Copy the code

At this point, the version linked list for the record with ID 1 in table T looks like this:Note that I haven’t committed the transaction 100 yet, and I didn’t execute the commit directive. Suppose a transaction now starts executing with the READ COMMITTED isolation level:

Transactions with READ COMMITTED isolation level (READ COMMITTED)BEGIN; # SELECT1: Transaction100,200Have not been submittedSELECT * FROM t WHERE id = 1; # yields column name with the valueSmall 'A'
Copy the code

The SELECT1 execution flow is as follows:

  • A ReadView is generated when the SELECT statement is executed, and the contents of the m_IDS arraylist of ReadView are [100,200].

  • Select the visible record from the version chain, as shown in the figure, the latest version of the column name content is’ small C’, this version of the trx_id value is 100, so does not meet our visibility requirements, according to roll_pointer jump to the next version.

  • Trx_id = 100, trx_id = 100, trx_id = 100, trx_id = 100

  • The next version of the column name has the content of ‘little A’. The trx_id value of this version is 80, which is less than the smallest transaction id100 in the M_IDS list, so this version is valid and the final version returned to the user is the record with the column name’ little A’.

Then we commit the transaction with id 100 as follows:

# Transaction 100
BEGIN;
 
UPDATE t SET name = 'the little B' WHERE id = 1;
 
UPDATE t SET name = 'small C' WHERE id = 1;
 
COMMIT;    //Submit the ohCopy the code

Select * from t where id = 1 and id = 200; select * from t where id = 1 and id = 200;

# Transaction 200
BEGIN; # update some other table records... UPDATE tSET name = 'little D' WHERE id = 1;
 
UPDATE t SET name = 'the little F' WHERE id = 1;
Copy the code

At the moment, the version chain of the record with ID 1 in table T looks like this:Select * from the transaction where THE COMMITTED isolation level was used and continue to search for the record with ID 1 as follows:

# Use READ COMMITTED isolation level transactionsBEGIN; # SELECT1: Transaction100,200Queries executed when none are committedSELECT * FROM t WHERE id = 1; # yields column name with the valueSmall 'A'# SELECT2: Transaction100Submitted, the Transaction200Queries executed when not committedSELECT * FROM t WHERE id = 1; # yields column name with the value'small C'
Copy the code

This SELECT2 execution is as follows:

  • When the SELECT statement is executed, a ReadView will be created, and the contents of the m_IDS list of ReadView will be [200].

  • Select the visible record from the version chain, as can be seen from the figure, the latest version of the column name content is’ small F’, this version of the trx_id value is 200, so does not meet the visibility requirements, according to roll_pointer to the next version.

  • The value of trx_id is 200, and the value of trx_id is 200, so it does not meet the requirement.

  • The next version of column name has the content of ‘little C’. The trx_id value of this version is 100, which is smaller than the smallest transaction id200 in the M_IDS list, so this version is valid and the final version returned to the user is the record with the column name’ little C’.

In the same way, if a later transaction with id 200 is COMMITTED and a table t with ID 1 COMMITTED is COMMITTED, the result will be ‘little F’. To summarize, transactions with READ COMMITTED isolation level generate a separate ReadView at the start of each query. That said, the isolation level is [read submitted]. Do you understand? If you do not understand, please contact me and we will discuss together. Let’s take a look at how MVCC controls data visibility when the transaction isolation level is repeatable. –[2]–****【REPEATABLE READ — generate a ReadView when reading data for the first time. For transactions with REPEATABLE READ isolation level, only a ReadView is generated when the query statement is executed for the first time. Subsequent queries will not be generated repeatedly. So let’s use an example to see what happens. For example, the system has two transactions with id 100 and id 200 executing:

# Transaction 100
BEGIN;
 
UPDATE t SET name = 'the little B' WHERE id = 1;
 
UPDATE t SET name = 'small C' WHERE id = 1;
Copy the code
# Transaction 200
BEGIN; # update some other table records...Copy the code

At this point, the version linked list for the record with ID 1 in table T looks like this:Suppose a transaction with REPEATABLE READ isolation level is now executed:

# Use REPEATABLE READ to isolate transactionsBEGIN; # SELECT1: Transaction100,200Have not been submittedSELECT * FROM t WHERE id = 1; # yields column name with the valueSmall 'A'
Copy the code

SELECT1 is executed as follows:

  • A ReadView is created when the SELECT statement is executed, and the contents of the m_IDS list of ReadView are [100, 200].

  • Select the visible record from the version chain, as shown in the figure, the latest version of the column name content is’ small C’, this version of the trx_id value is 100, so does not meet the visibility requirements, according to roll_pointer to the next version.

  • Trx_id = 100, trx_id = 100, trx_id = 100, trx_id = 100

  • The next version of the column name has the content of ‘little A’. The trx_id value of this version is 80, which is less than the smallest transaction id100 in the M_IDS list, so this version is valid and the final version returned to the user is the record with the column name’ little A’.

After that, we commit the transaction with transaction ID 100, like this:

# Transaction 100
BEGIN;
 
UPDATE t SET name = 'the little B' WHERE id = 1;
 
UPDATE t SET name = 'small C' WHERE id = 1;
 
COMMIT;
Copy the code

Select * from t where id = 1 and id = 200; select * from t where id = 1 and id = 200;

# Transaction 200
BEGIN; # update some other table records... UPDATE tSET name = 'little D' WHERE id = 1;
 
UPDATE t SET name = 'the little F' WHERE id = 1;
Copy the code

At the moment, the version chain of the record with ID 1 in table T looks like this:REPEATABLE READ isolation level = 1; REPEATABLE READ isolation level = 1;

# Use REPEATABLE READ to isolate transactionsBEGIN; # SELECT1: Transaction100,200Has not been submittedSELECT * FROM t WHERE id = 1; # yields column name with the valueSmall 'A'# SELECT2: Transaction100Submitted, the Transaction200Have not been submittedSELECT * FROM t WHERE id = 1; # the resulting column name is still equal toSmall 'A'
Copy the code

This SELECT2 execution is as follows:

  • The m_IDS list in the ReadView is [100, 200].

  • Select the visible record from the version chain, as can be seen from the figure, the latest version of the column name content is’ small F’, this version of the trx_id value is 200, so does not meet the visibility requirements, according to roll_pointer to the next version.

  • The value of trx_id is 200, and the value of trx_id is 200, so it does not meet the requirement.

  • The next version of the name column is’ little C’, the trx_id value is 100, and the m_IDS list contains the transaction ID with the value 100, so this version is also not valid, and the next version of the name column is’ little B’. Continue to the next version.

  • The value of trx_id is 80,80 is less than the smallest transaction id100 in the m_ids list, so this version is valid, and the final version returned to the user is the record with the column name ‘little A’.

The value of the name column is’ small A’. This is what repeatable means. If we commit the same transaction with transaction ID 200 and then look up the same transaction with transaction ID 1 with REPEATABLE READ isolation level, the result is still ‘small A’.

MVCC summary

Concurrency Control for Multi-version Concurrency Control Multi-version concurrency control (MULTI-version concurrency control) refers to the process of accessing the version chain of records when ordinary SEELCT operations are performed by transactions with READ COMMITTD and REPEATABLE READ isolation levels. In this way, read-write and read-read operations of different transactions can be executed concurrently, thus improving system performance. READ COMMITTD, REPEATABLE READ One big difference between the READ COMMITTD and REPEATABLE READ isolation levels is when a ReadView is generated. READ COMMITTD generates a ReadView before a normal SELECT operation. REPEATABLE READ generates a ReadView only before the first normal SELECT operation and repeats this ReadView for all subsequent query operations.

Back to our title: Can MySQL solve illusions? Or how does MySQL solve phantom reading? Do you understand now? Welcome to discuss