
Transaction: ACID

Locks: optimistic locks, pessimistic locks, table locks, row-level locks, gap locks, record locks, auto-increment locks, intention locks…

MVCC: multiversion concurrency control, used by the MySQL database to improve read and write concurrency. It is roughly analogous to CAS in Java's JUC (java.util.concurrent) package.

Let's talk about MVCC first

Basic knowledge

Current read

  1. Concept: A current read always reads the latest version of the data, and it locks the records it returns, so that other transactions cannot modify them concurrently.
  2. Scenarios: insert, update, delete, select ... lock in share mode, select ... for update

Here, lock in share mode and for update are locking reads.

Snapshot read

  1. Concept: A snapshot read reads a historical version of the record and takes no lock
  2. Scenario: plain select ...

Isolation levels of InnoDB transactions

  1. Read uncommitted: As soon as data is modified, other transactions can see it immediately, whether or not the modification has been committed
  2. Read committed: A transaction can only see data that other transactions have already committed. This solves the dirty-read problem, so it is a stricter level than read uncommitted: data modified by other transactions must be committed before the current transaction can see it.
  3. Repeatable read: Within the same transaction, the same query always returns the same result
  4. Serializable: Transactions are executed one after another; each must finish before the next runs.

Repeatable read is InnoDB's default isolation level.

For more information about isolation levels, see this blog post: Illustrating the four isolation levels for mysql transactions

Introducing MVCC with an example

There are now two transactions with the following contents:

Q: Can A read the data after B modifies it and commits?

A: At the RC (read committed) isolation level, yes; at the RR (repeatable read) isolation level, no. (This answer is interesting, so keep reading; you may come to question it.)

Reason: this is where MVCC and its visibility algorithm come in.

What is MVCC

It consists of three parts:

1. Hidden fields

In addition to the user-defined columns, every row in a database table carries three fields that are not visible to the user:

  • DB_TRX_ID: the ID of the transaction that created or last modified the record
  • DB_ROW_ID: a hidden row ID, used as the primary key when the table you created has none
  • DB_ROLL_PTR: the rollback pointer, which points to the previous version of the record in the undo log
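To make these fields concrete, here is a minimal Python sketch of one row version carrying InnoDB's hidden columns. The class and field names are illustrative, not MySQL internals:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RowVersion:
    """One version of a row, mirroring InnoDB's hidden columns (illustrative only)."""
    data: dict                            # the user-visible column values
    db_trx_id: int                        # ID of the transaction that created or last modified this version
    db_roll_ptr: Optional["RowVersion"]   # rollback pointer to the previous version in the undo log
    db_row_id: Optional[int] = None       # hidden row ID, used only when the table has no primary key
```

Here `db_roll_ptr` is modeled as a direct reference to the older version; in InnoDB it is a pointer into the undo log.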

2. Undo log (rollback log)

The undo log mainly supports the atomicity of transactions: when a transaction fails, it is rolled back according to the undo log.

  • When the same record is modified by different transactions, its undo log records form a linked list: each new version is inserted at the head of the list, and the tail of the list holds the earliest version.
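The version chain described above can be sketched in a few lines of Python. This is a toy model, not InnoDB's actual undo-log format:

```python
# Sketch of the undo-log version chain: each new modification becomes the new
# head, and the rollback pointer links back to the older versions.
class Version:
    def __init__(self, value, trx_id, roll_ptr=None):
        self.value = value          # column value in this version
        self.trx_id = trx_id        # DB_TRX_ID that wrote it
        self.roll_ptr = roll_ptr    # DB_ROLL_PTR: the previous (older) version

def modify(head, new_value, trx_id):
    """Insert the new version at the head of the chain and return it."""
    return Version(new_value, trx_id, roll_ptr=head)

# Build a chain: trx 1 creates the row, then trx 2 and trx 4 modify it.
head = Version("v0", trx_id=1)
head = modify(head, "v1", trx_id=2)
head = modify(head, "v2", trx_id=4)

# Walking roll_ptr from the head reaches the earliest version last.
chain = []
v = head
while v:
    chain.append((v.value, v.trx_id))
    v = v.roll_ptr
print(chain)   # [('v2', 4), ('v1', 2), ('v0', 1)]
```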

3. Read view

A read view is generated when a transaction performs a snapshot read (note this sentence; it becomes important later).

A read view holds three fields:

  • trx_list: the IDs of the transactions active in the system when the view was created
  • up_limit_id: the smallest transaction ID in that list
  • low_limit_id: the next transaction ID that the system has not yet assigned

What is the visibility algorithm

The visibility algorithm is the rule InnoDB applies when a transaction needs to read data, to decide which version of a record it can see. The rules are as follows:

  1. First compare DB_TRX_ID < up_limit_id. If it is less, the version was committed before the read view was created, so the current transaction can see it; if it is greater than or equal, go to the next check.
  2. Then check DB_TRX_ID >= low_limit_id. If it is greater than or equal, the writing transaction did not yet exist when the read view was generated, so the version is definitely not visible to the current transaction; if it is less, go to the next check.
  3. Finally, check whether DB_TRX_ID is in trx_list, i.e. whether the writing transaction was still active when the read view was generated. If it was active, it had not yet committed, so its modification is not visible; if it is not in the list, it had already committed, so its modification is visible.
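The three rules above can be written as a small Python function. This is a sketch of the algorithm as described here, not InnoDB source code; the read view is modeled as a plain dict:

```python
def visible(db_trx_id, read_view):
    """Visibility check for one row version (illustrative sketch).

    read_view holds:
      trx_list     - IDs of transactions active when the view was created
      up_limit_id  - smallest ID in trx_list
      low_limit_id - next transaction ID not yet assigned by the system
    """
    # Rule 1: committed before any active transaction started -> visible.
    if db_trx_id < read_view["up_limit_id"]:
        return True
    # Rule 2: the writer started after the view was created -> not visible.
    if db_trx_id >= read_view["low_limit_id"]:
        return False
    # Rule 3: in range; visible only if the writer was NOT active
    # (i.e. it had already committed) when the view was created.
    return db_trx_id not in read_view["trx_list"]

view = {"trx_list": {2, 3, 4}, "up_limit_id": 2, "low_limit_id": 5}
print(visible(1, view))   # True: committed before the view was created
print(visible(3, view))   # False: the writer was still active
print(visible(5, view))   # False: the writer started after the view was built
```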

Using MVCC plus the visibility algorithm to answer the read question

Q: Can the snapshot read of transaction 2 read the data modified by transaction 4?

A: Yes, at both the RC and RR isolation levels.

Isn't that confusing? Didn't we say it can't be read under RR?

So let's analyze it with the visibility algorithm. First, list the fields we need and their corresponding values.

Running them through the visibility algorithm, you find that the modified data can in fact be read.

This seems strange, and it contradicts the answer above, where I asked you to note whether the data can be read at the RC and RR isolation levels.

As it turns out, this has to do with when the readView is created:

RC: a new readView is created for every snapshot read. RR: a readView is created only by the first snapshot read; the second and all subsequent snapshot reads reuse that first readView.

In this case, modified data cannot be read at the RR isolation level.

Analysis is as follows:

  • After transaction 2 starts, its first snapshot read creates the readView. When transaction 2 later accesses the data modified by transaction 4, it reuses that same readView, whose values are:
  • up_limit_id (smallest active transaction ID): 1
  • low_limit_id (next transaction ID not yet assigned by the system): 5
  • DB_TRX_ID (transaction that last modified the record): 4

Applying the visibility algorithm, it is easy to see that the modified data cannot be read at the RR isolation level. That is MySQL's MVCC.
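The difference between RC and RR can be checked against the visibility rules with a short Python sketch. The transaction IDs follow the example above; the exact view contents are assumptions for illustration:

```python
# Sketch: why RR cannot see transaction 4's change while RC can.
# The rules are the three visibility checks described in the text.

def visible(db_trx_id, view):
    if db_trx_id < view["up_limit_id"]:
        return True
    if db_trx_id >= view["low_limit_id"]:
        return False
    return db_trx_id not in view["trx_list"]

# Read view created by transaction 2's FIRST snapshot read, while
# transactions 1..4 were still active (assumed values):
first_view = {"trx_list": {1, 2, 3, 4}, "up_limit_id": 1, "low_limit_id": 5}

# RR: the first view is reused, so trx 4's committed change stays invisible.
print(visible(4, first_view))   # False

# RC: a fresh view is built for the later read; by then trx 4 has committed
# and left the active list, so its change becomes visible.
later_view = {"trx_list": {2, 3}, "up_limit_id": 2, "low_limit_id": 5}
print(visible(4, later_view))   # True
```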

In other words, at the RR isolation level, once a transaction has performed a snapshot read, it will not see data that other transactions commit after that read.

Note:

  • At the RR isolation level, if the transaction performs only snapshot reads, the readView created by the first snapshot read is reused by all subsequent snapshot reads
  • At the RR isolation level, if the transaction performs a current read in addition to snapshot reads, the readView is recreated; this is where a phantom read can occur
  • RR does not solve the phantom-read problem through MVCC alone; it solves it by means of gap locks.

Now let's talk about transactional ACID

The four properties of a transaction are atomicity, consistency, isolation and durability.

Atomicity is achieved by the undo log, isolation by MVCC, and durability by the redo log; atomicity, isolation and durability together achieve consistency.

Transaction durability

Transaction durability is implemented through the redo log. How? This involves the redo log's two-phase commit.

Two-phase commit of the redo log

The redo log's two-phase commit involves another mechanism: WAL (Write-Ahead Logging).

We all know that disk I/O comes in two forms, random and sequential, and that random reads and writes are slower than sequential ones; yet most real-world operations are random. WAL addresses this: data is first written sequentially to a log file, and only afterwards written to the corresponding data file on disk. This is faster than writing to the data file randomly and directly, and it also guarantees durability: because the data reaches the log first, it cannot be lost as long as the log survives, whether or not it has been written to the data file yet.
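The idea can be sketched as a toy Python model. All names here are illustrative; real WAL implementations deal with pages, fsync, and checkpoints:

```python
# Toy sketch of write-ahead logging: append the change sequentially to a log
# first, then apply it to the randomly accessed data file. If a crash happens
# before step 2, the log alone is enough to redo the change.
log = []          # sequential, append-only log file
data_file = {}    # random-access "disk" pages

def wal_write(page, value):
    log.append((page, value))   # step 1: fast sequential write, durable
    data_file[page] = value     # step 2: slower random write to the data file

def replay(entries):
    """Rebuild the data file from the log, e.g. after a crash."""
    rebuilt = {}
    for page, value in entries:
        rebuilt[page] = value
    return rebuilt

wal_write("page_7", "hello")
wal_write("page_3", "world")
print(replay(log) == data_file)   # True: the log reproduces the data file
```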

If you look up the redo log's two-phase commit online, you'll see that everyone mentions the binlog. It's a new term, but don't panic: the binlog is just a log file, in binary format, that records the operations you perform on the database, and it ships with MySQL.

So what is it good for? Don't worry; the figure above illustrates it (taking an update as an example).

Illustration:

  1. The executor asks the engine for the data; if it is in memory the engine returns it directly, otherwise the engine loads it from disk and then returns it
  2. The executor takes the data, modifies it, and calls the engine interface to write it back
  3. The engine updates the data in memory and writes the change to the redo log; the redo log is now in the prepare state, and the engine notifies the executor that it has finished and is ready to commit
  4. The executor generates a binlog record of this operation
  5. The executor calls the engine's transaction-commit interface, the engine changes the redo log entry to the commit state, and the update is complete

In a two-phase commit, the data change is first written to the redo log, which is left in the prepare state; the operation is then written to the binlog; after that, the redo log's prepare state is changed to commit.

  • Stage 1: the change is written to the redo log, which enters the prepare state
  • Stage 2: the operation is written to the binlog, and the redo log's prepare state is changed to the commit state.
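The two stages can be sketched as a toy Python model: append to the redo log in the prepare state, write the binlog, then flip the redo entry to commit. Names and structures are illustrative, not MySQL internals:

```python
# Minimal sketch of the redo-log two-phase commit.
redo_log = []   # entries: {"change": ..., "state": "prepare" | "commit"}
binlog = []     # logical statements

def update(change, statement):
    # Phase 1: record the change in the redo log, in the prepare state.
    redo_log.append({"change": change, "state": "prepare"})
    # Write the logical operation to the binlog.
    binlog.append(statement)
    # Phase 2: flip the redo-log entry from prepare to commit.
    redo_log[-1]["state"] = "commit"

update({"c": 1}, "update t set c = 1 where id = 2")
print(redo_log[-1]["state"])   # commit
```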

So why a two-phase commit? To ensure that the data can be recovered correctly and durably after a crash.

Again, take the figure above as an example:

Think about it:

  • If the power fails when your operation has executed up to point ①, what happens to the data?
  • If the power fails when your operation has executed up to point ②, what happens to the data?

The first thing to know is that after a power failure, the database must recover its data, and recovery is implemented through the redo log and the binlog: the database compares the two logs and decides how to restore the data based on the result.

The answers are then easy to work out.

In the first case, the redo log contains the record but the binlog does not, and the redo log is still in the prepare state rather than the final commit state. The database therefore discards the prepared redo log entry and rolls the data back.

In the second case, both the redo log and the binlog contain the record. During recovery, because the binlog record is complete, the redo log entry is changed from prepare to commit and the data is kept rather than discarded.

Note: the two-phase commit must ensure that both the redo log and the binlog are written successfully for the data to be recoverable. If only one of the logs were relied on (i.e. without the two-phase commit), two failure modes appear:

  • Write the redo log first, then the binlog: suppose MySQL restarts unexpectedly after the redo log is complete but before the binlog is written. The system can recover the change from the redo log after the crash, so the value of c is 1. But the binlog never recorded the statement, so any backup taken from the binlog afterwards is missing it. If you later restore a replica from that binlog, the replica misses the update and its value of c is 0, which differs from the original database.
  • Write the binlog first, then the redo log: if the crash happens after the binlog is written but before the redo log, the transaction is invalid after crash recovery, so the value of c in this row is 0. But the binlog has already logged "change c from 0 to 1", so a replica restored from it has c = 1, which again differs from the original database.
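The recovery rule in both cases boils down to a small decision function. This is a sketch of the logic described above, not MySQL's actual recovery code:

```python
def recover(redo_state, binlog_complete):
    """Crash-recovery decision for one redo-log entry (illustrative sketch).

    redo_state: "prepare" or "commit" as found in the redo log.
    binlog_complete: whether a complete binlog record exists for it.
    """
    if redo_state == "commit":
        return "keep"                 # already committed before the crash
    # Redo log still in prepare: the binlog decides.
    return "keep" if binlog_complete else "rollback"

# Power lost at point 1 (binlog not yet written): roll the change back.
print(recover("prepare", binlog_complete=False))   # rollback
# Power lost at point 2 (binlog written, commit mark pending): keep it.
print(recover("prepare", binlog_complete=True))    # keep
```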

Finally, a word about database locks

Types of locks

  1. Shared lock: also called a read lock or S lock. When a transaction holds a read lock on a record, other transactions may also take read locks on it, but cannot take a write lock until the read lock is released.

    Function: supports concurrent reads; the data cannot be modified while being read, which avoids non-repeatable reads.
  2. Exclusive lock: also called a write lock or X lock. When a transaction holds a write lock on a record, no other transaction can take any lock on it.

    Function: while data is being modified, no one else may modify or read it, avoiding dirty data and dirty reads.
  3. Row locks: lock one or more rows in a table; other transactions can still access the unlocked rows.

    Features: small granularity; locking is more involved than table locking, but conflicts are less likely and concurrency is higher than with table locks.
  4. Table locks: other transactions must wait until the locking transaction releases the lock before they can access the table.

    Features: large granularity; locking is simple, but conflicts are likely.
  5. Record lock: a type of row lock that locks a single row of data.

    Features: with a record lock, the data cannot be modified during the query (avoiding non-repeatable reads), and other transactions cannot read it before the modifying transaction commits (avoiding dirty reads).
  6. Gap lock: a type of row lock. A gap lock locks a range of table records: where there is a gap between adjacent ids in the table, an interval is formed, following the left-open, right-closed convention.

    Features: used for range queries and for queries that match no record; the query condition must hit an index; gap locks occur only in transactions at the RR isolation level.
  7. Intention lock: a table-level lock that a transaction takes before acquiring row locks on a table, announcing its intention to lock rows within it.

We will focus on the lock mentioned above that solves phantom reads at the RR level: the gap lock.

Gap lock locking rules:

  1. The basic unit of locking is the next-key lock
  2. Only objects actually scanned during the query are locked
  3. An equality query on a unique index that finds the record adds only a row lock, with no gap lock
  4. In an equality query scanning rightward, the next-key lock on the last value, which does not satisfy the condition, degrades to a gap lock
  5. When a covering index is used and only an S lock is required, the primary key index is not locked

A next-key lock covers the scanned gap plus the next record's value, forming a left-open, right-closed interval (a, b].

These rules are hard to grasp from the concepts alone, so let's walk through some cases.

In the examples below, id is the primary key and c is a secondary index.

  • Case 1: Equivalent query gap lock

  • Case 2: Non-unique index equivalent lock

  • Case 3: primary key index range lock

  • Case 4: Non-unique index range lock

  • Case 5: Non-index field query

Note: under current reads, avoid querying on columns that have no index (corresponding to case 5): that takes a coarse-grained lock, which is bad.