Small knowledge, big challenge! This article is participating in the creation activity of “Essential Tips for Programmers”.

InnoDB storage engine enables READ COMMITTED and REPEATABLE READ (default). The so-called consistent unlocked read means that each row of records may have multiple historical versions, which are connected in series to form a version chain. In this way, transactions started at different times can access data of different versions without locking.

Undo log Version chain

Concurrency non-locked reads are implemented through Multi Version Concurrency Control (MVCC). In fact, there is no universal implementation standard for MVCC, so the implementation mechanism of storage engines is different.

InnoDB storage engine MVCC is implemented using undo log.

In simple terms, undo log is the reverse operation of each operation. For example, if the current transaction performed an operation to insert a record with ID = 100, then undo log stores the operation to delete the record with ID = 100.

So, using multiple versions is not very accurate, because InnoDB doesn’t really make room to store multiple versions of row records, just using undo log to record the reverse of each write.

In other words, there is only one latest version of the record in the B+ index tree, but InnoDB can get the historical version of the data according to the undo log, so as to achieve multi-version control.

So, how does undo Log relate to a row? In other words, how can I find the Undo log it owns from this row record?

In particular, each row in the InnoDB storage engine actually has two hidden fields: trx_id and roll_pointer.

As can be seen from the name, trx_id is the transaction ID of the most recently updated row record, and roll_pointer points to the previously generated Undo log.

For example, if transaction A with id = 100 inserts A row (id = 1, username = “Jack”, age = 18), then, The two hidden fields trx_id = 100 and roll_pointer in this row point to an empty undo log because there was no previous row with transaction id = 1. As shown in the figure:

Then transaction B (id = 200) changes the age of the row from 18 to 20. The trx_id of the row becomes 200. Rooll_pointer points to the undo log generated by transaction A:

Next, transaction C with id = 300 changes the row record again, changing age from 20 to 30, as shown below:

As you can see, the hidden fields trx_ID and roll_pointer are updated with each row change, and the undo logs corresponding to previous snapshots are concatenated by the roll_pointer pointer to form a version chain.

Note that the SELECT query operation does not generate undo log! There are only two types of undo log in InnoDB storage engine:

  • Insert undo log: The undo log generated during the INSERT operation
  • Update undo log: Indicates the undo log generated for delete and update operations

In fact, because of transaction isolation, the records of insert operations are visible only to the transaction itself, not to other transactions, right, so there is no concurrency problem. Update undo log (MVCC) update undo log (MVCC) update undo log (MVCC)

ReadView mechanism

When it comes to MVCC and undo log versioning, you’re probably going to be asked about ReadView if you don’t talk about it yourself.

The ReadView mechanism is used to determine which versions of a transaction can be seen. A ReadView contains several sections:

  • m_ids: Which transactions were executing when the ReadView was generated but had not yet committed (called”Active transaction“), the ids of these active transactions are stored in this field
  • min_trx_id: the smallest value in m_IDS
  • max_trx_id: The value of the ID InnoDB will assign to the next transaction when the ReadView is generated.
  • creator_trx_id: ID of the ReadView transaction currently being created

Next, take out the user table and use an example to understand how the ReaView mechanism can determine which versions of the current transaction can be seen:

Insert row (id = 1, username = “Jack”, age = 18) from transaction A (id = 100)

Id = 200; id = 300; id = 300; update (select);

If transaction B now opens a ReadView, inside this ReadView:

  • m_idsContains the ids of the current active transaction, transaction B and transaction C, 200 and 300
  • min_trx_idIt is 200
  • max_trx_idIs the id of the next transaction that can be assigned, which is 301
  • creator_trx_idIs the id 200 of the ReadView transaction B currently created

Trx_id = 100; trx_id = 100; trx_id = 100; < min_trx_id (200) in ReadView, which means that transaction A has already committed the row before transaction B starts. Therefore, transaction B, which started after transaction A commits, can check the update of this row by transaction A.

row.trx_id < ReadView.min_trx_id
Copy the code

Age = 20; trx_id = 300; roll_pointer points to the undo log generated before transaction C changed it:

The trx_id (300) is greater than ReadView’s min_trx_id (200) and smaller than max_trx_id (301).

row.trx_id > ReadView.min_trx_id && row.trx_id < max_trx_id
Copy the code

This means that the transaction that updates this row is likely to also exist in m_IDS (active transactions) of the ReadView. Trx_id = 300 (trx_id = 300, trx_id = 300, trx_id = 300); Age = 20; age = 20;

Since you can’t query, what should you do? What can be found in transaction B’s query operation?

That’s right, undo log version chain!

Transaction B will follow the roll_pointer pointer of the row to find the latest undo log whose id is 200 and whose trx_id is 100. The undo log version with trx_id = 100 must have been committed before transaction B started. So transaction B’s query reads this version of the data, age = 18.

From the above examples, we conclude that with the undo log version chain and ReadView mechanism, it is possible to guarantee that one transaction will not read the updates of another concurrently executing transaction.


Can you read the value that you modify?

Of course it’s nonsense. You can read it. ReadView creator_trx_id: creator_trx_id: creator_trx_id: creator_trx_id: creator_trx_id: creator_trx_id: creator_trx_id

Assume that the changes to transaction C have been committed, and then transaction B updates the row, changing age = 20 to age = 66, as shown below:

Trx_id = 200 = creator_trx_id = 200 = creator_trx_id = 200 = creator_trx_id = 200 = creator_trx_id = 200

row.trx_id = ReadView.creator_trx_id
Copy the code

If, during the execution of transaction B, transaction D suddenly opens a row with id = 400, updates the row with age = 88 and commits it, then transaction B reads the row, can it read it?

The answer is no.

Trx_id = 500 is greater than max_trx_id = 301 in ReadView. This indicates that another transaction updated data during transaction B’s execution, so the update of another transaction cannot be queried.

row.trx_id > ReadView.max_trx_id
Copy the code

From the above examples, we conclude that with the undo log version chain and ReadView mechanism, it is possible to ensure that a transaction can only read the data that the transaction itself modified or the data before the transaction started.

summary

To summarize, with undo log version chain and ReadView mechanism:

  • It is guaranteed that one transaction will not read updates from another concurrently executing transaction
  • It is guaranteed that a transaction can only read data that the transaction itself has modified or that was before the transaction started

REPEATABLE READ (MVCC) is enabled only at READ COMMITTED and REPEATABLE READ (default) isolation levels. The most fundamental difference, which we’ll explain later, is when they generate readViews

| flying veal 🎉 pay close attention to the public, get updates immediately

  • I am a postgraduate student in Southeast University and a summer intern in Java background development of Ctrip. I run a public account “Flying Veal” in my spare time, which was opened on 2020/12/29. Focus on sharing computer fundamentals (data structure + algorithm + computer network + database + operating system + Linux), Java technology stack and other related original technology good articles. Attention to the public account for the first time to get the article update, background reply 300 can be free to get geek University produced Java interview 300 questions

  • And recommend personal maintenance of open source tutorial project: CS-Wiki (Gitee recommended project, has accumulated 1.8K + STAR), is committed to creating a perfect back-end knowledge system, in the road of technology to avoid detours, welcome friends to come to exchange learning ~ 😊

  • If you don’t have any outstanding projects, you can refer to the Gitee official recommended project “Open Source Community System Echo” written by me, which has accumulated 900+ star so far. SpringBoot + MyBatis + Redis + Kafka + Elasticsearch + Spring Security +… And provide detailed development documents and supporting tutorials. Echo Echo Echo Echo Echo Echo Echo Echo Echo Echo Echo Echo Echo Echo