Look at the following SQL statement:

# table T (id int, name varchar(20)) delete from T where id = 10;Copy the code

MySQL > select * from ‘lock’;

Consider the following statement:

Select * from T where id = 10;Copy the code

What about this statement? There’s too much knowledge involved. To answer these two questions, you need to know something first.

Introduction to relevant knowledge

Multi-version concurrency control

The default storage engine for MySQL, InnoDB, is a multi-version Concurrency Control protocol (MVCC). The opposite of MVVC is the Lock-based Concurrency Control. The biggest advantage of MVCC is: read without lock, read and write without conflict.

In OLTP applications with more read and less write, read and write conflict is very important, which greatly improves the concurrency performance of the system. At present, almost all RDBMS support MVCC. In fact, MVCC is summed up in one sentence: a way to temporarily store multiple versions of the same data, thereby achieving concurrency control.

Current read and snapshot read

In MVCC concurrency control, read operations can be divided into two types: snapshot read and current read.

Snapshot read (simple SELECT operation) : Reads the visible version (possibly historical version) of the record without locking. There you have the answer to the second question.

Current read (special SELECT operations, INSERT, DELETE, and UPDATE) : The latest version of the record is read, and any record returned by the current read is locked to ensure that no other transaction can modify the record concurrently.

Clustered index

Also called clustering index. In InnoDB, data is organized in a clustered index: complete records are stored in a primary key index, through which all columns in the record can be retrieved.

Left-most prefix rule

This rule applies to composite indexes and prefix indexes.

Select * from (a, b, c, d) where (a, b, c, d) = 1 and b = 2 and C > 3 and d = 4; select * from (a, b, C, d) where (a, b, C, d) = 4; D (a, b, d, c); d (a, B, d, c);

(a = 1 and b = 2 and c = 3); (a, b, c); (a, b, c);

Two phase lock

One of the principles of traditional RDMS Locking is 2PL(two-phase Locking). That is to say, the lock operation is divided into two stages: lock and unlock stage, and ensure that the lock and unlock stage does not intersect. In other words, no matter how many entries are added, deleted or changed in a transaction, the locks are added in the lock phase. After the commit, all entries will be unlocked in the unlock phase.

Isolation level

MySQL/InnoDB defines four isolation levels:

  • Read Uncommitted: Uncommitted records can be Read. This isolation level is not used.
  • Read Committed (RC) : For current reads, the RC isolation level guarantees that Read records are locked (record locks), and phantom reads occur.
  • Repeatable Read (RR) : For the current Read, the RR isolation level ensures that the Read record is locked (record lock) and the Read range is locked. New records that meet the query conditions cannot be inserted (gap lock) and there is no phantom Read.
  • Serializable: Degenerates from MVCC concurrency control to lock-based concurrency control. There is no difference between snapshot read and current read. All read operations are current read, read plus read lock (S lock), and write plus write lock (X lock). At this isolation level, read and write conflicts occur, so concurrency performance deteriorates dramatically and is not recommended in MySQL/InnoDB.

Gap and next-key locks

In InnoDB, a full row lock consists of three parts:

  • Record Lock: A Record Lock locks a Record in an index.
  • Gap Lock: A Gap Lock either locks the value in the middle of an index record, the value before the first index record, or the value after the last index record.
  • Next-key Lock: A combination of a record Lock on an index record and a gap Lock that precedes the record.

analysis

With that in mind, let’s start analyzing the first question. When you look at this question, you probably don’t hesitate to say, add locks. It’s both wrong and right, because we know so little. So what are the prerequisites that we need to know?

  • Premise 1: Is the ID column a primary key?
  • Premise two: What is the current system isolation level?
  • If the id column is not a primary key, do you have an index on the ID column?
  • Premise 4: If there is a secondary index on the ID column, is it unique?
  • Premise five: What is the SQL execution plan? Index scan? Full table scan again

According to the above prerequisites, there are nine possible combinations, but I haven’t listed them all yet.

  • The ID column is the primary key, RC isolation level
  • The ID column is a secondary unique index, RC isolation level
  • The ID column is a secondary non-unique index, RC isolation level
  • No index on id column, RC isolation level
  • The ID column is the primary key, RR isolation level
  • The ID column is a secondary unique index, RR isolation level
  • The ID column is a secondary non-unique index, RR isolation level
  • No index on id column, RR isolation level

Combination 1: id primary key + RC

This combination is the easiest to parse, and by the time the statement is executed, only records with primary key ID = 10 need to be x-locked. As shown below:

Select * from primary key where id = 10; select * from primary key where id = 10;

Combination two: id unique index + RC

For this combination, id is not the primary key, but a Unique secondary key. How is locking done at the RC isolation level? See below:

Select * from primary key where id = 10; select * from primary key where id = 10; select * from primary key where id = 10; Then X lock the primary key entry corresponding to name = ‘e’ on the cluster index.

SQL > select * from id where id = 10; select * from id where id = 10; select * from id where id = 10;

Combination 3: id not unique index +RC

Select * from MySQL where id column is not unique, but is a common index. See below:

As can be seen from the figure above, first of all, on the id column index, records that meet the query id = 10 will be locked with X. At the same time, records on the primary key index corresponding to these records are also x-locked. The only difference with combination ER is that combination two has at most one record that meets the condition, while combination three locks all records that meet the condition.

Conclusion: If there is a non-unique index on the ID column, then all records that meet the SQL query criteria will be locked. These records are also locked on the primary key index.

Combination 4: ID No index +RC

Select * from where id = 10; select * from where id = 10; select * from where id = 10; How does MySQL lock this combination? See below:

Because there is no index on the ID column, only the clustered index can be used for full table scan. It can be seen from the figure that there are only two records that meet the conditions. However, all records on the cluster index will be locked with X. In practice, however, MySQL has made improvements. When filtering conditions are not met, the unlock_row method is called to lock the records that do not meet the conditions (violating the 2PL principle). This ensures that the last record that meets the condition is locked, but the locking operation for each record cannot be omitted.

MySQL > select * from ‘id’; MySQL > select * from ‘id’; Because it’s at the MySQl Server level. Therefore, every record, regardless of whether it meets the filtering conditions, will be added with the X lock. However, for efficiency, MySQL has improved this aspect. During the scanning process, if the record does not meet the filtering conditions, it will be unlocked. Optimization also violates the 2PL principle.

Combination 5: ID primary key +RR

Repeatable Read isolation level for the above SQL statement, locking process is the same as combination 1 (ID primary key +RC).

Combination 6: ID unique index +RR

This combination is the same as the locking process of combination two.

Combination 7: ID not unique index +RR

Repeatable Read: Repeatable Read: Repeatable Read: Repeatable Read: Repeatable Read: Repeatable Read: Repeatable Read: Repeatable Read: Repeatable Read: Repeatable Read: Repeatable Read How does MySQL lock the above statement? See below:

This combination looks similar to combination three, but is very different. A Gap lock is added to this combination. This Gap lock is the key to preventing illusory reading at RR level compared to RC level. In essence, a Gap lock is not for the record itself, but for the Gap between the records. The so-called phantom read is the same transaction, continuous current read, and read a range of records (including direct query of all record results or aggregation statistics), found inconsistent results (standard file generally refers to the increase of records, the reduction of records should also be regarded as phantom read).

So how to solve this problem? How to ensure that multiple current reads return consistent records requires that no new records that meet the condition are inserted and committed by other transactions between multiple current reads. In order to achieve this result, Gap lock came into being.

As shown in the figure, there are places where new records that meet the condition can be inserted. Considering the orderliness of B+ trees, records that meet the condition must be continuous. [4, b], [10, c], [10, d], [20, e]

Insert (10, aa); Insert (10, aa); Insert (10, AA); Insert (10, AA); Insert (10, AA); Therefore, through the first current read, X locks will be added to the records that meet the condition, and three Gap locks will be added to Insert the three Gap locks that may meet the condition, to ensure that the subsequent Insert cannot Insert new records that meet the id = 10, and thus solve the phantom read problem.

In combination 5, combination 6, the same RR level, but do not add Gap lock, combination 5 id is the primary key, combination 6 ID is Unique key, can guarantee the uniqueness. An equivalent query can only return one record that meets the condition at most, and new records of the same value cannot be inserted.

Conclusion: In RR isolation level, id column has non-unique index, for SQL statement above; First, locate the first record that meets the condition through the ID index, add X lock to the record, and add Gap lock to the Gap lock, and then add X lock to the record that meets the same condition on the primary key cluster index, and return; Then read the next record and repeat. At this point, there is no need to add X lock to the record, but do you need to add Gap lock to the Gap, and finally return the result.

Combination 8: ID No index +RR

Mysql > select * from table_name where table id = 1; mysql > select * from table_name where table id = 1

As shown in the figure, this is a very scary thing, the whole table each record to add X lock, each Gap add Gap lock, if there is a large number of data on the table, what is the situation? In this case, the table, except for the snapshot read without locking, any other lock concurrent SQL, cannot be executed, update, delete, insert, and so on, the whole table is locked.

Of course, like combination four, MySQL has been optimized to be semi-consistent Read. Semi-consistent Read If semi-Consistent Read is enabled, MySQL will release the lock in advance for records that do not meet the criteria, and the Gap lock will also be released. Semi-consistent Read is triggered either at the Read Committed isolation level; Either innodb_LOCKS_unSAFE_for_binlog is set in Repeatable Read isolation level.

Conclusion: In Repeatable Read isolation level, if the current Read of the full table scan is performed, all records on the table will be locked, and all gaps will be locked by Gap, eliminating all delete/update/ INSERT operations. Semi -consistent Read can be triggered in MySQL to alleviate lock overhead and concurrency, but semi-consistent Read itself brings other problems and is not recommended.

Combination 9: Serializable

In the final combination, the locking process is the same as in combination eight for the delete SQL statement of appeal. However, query statements (for example, select * from T1 where ID = 10) are read by snapshots at the RC and RR isolation levels and are not locked. At the Serializable isolation level, no matter the query statement is locked, that is, the snapshot read does not exist, and the MVCC is degraded to lock-based CC.

Conclusion: In MySQL/InnoDB, the so-called read without lock does not apply to all cases, but depends on the isolation level. At the Serializable isolation level, all operations are locked.

A simple delete statement locking is done, but the learning continues. How to lock complex SQL statements? What about index analysis in MySQL? What about performance analysis, performance optimization? Further study and exploration are needed.