The first problem of data management system is data storage, the second problem is fast data search, and the third problem is concurrent data consistency and performance.
- This article is a note type article, only the conclusion is not proved, can be used to review the consolidation of knowledge, can not be used for new knowledge learning. Thank you for correcting any mistakes you have made.
- [MySQL] concurrency, transactions and locks — nuggets (juejin. Cn)
Series of articles:
- 【MySQL】 logical architecture (juejin. Cn)
- MySQL > select * from juejin.cn;
- 【MySQL】 concurrency, transactions and locks
The transaction
A transaction is a unit of execution, the basic unit of recovery and concurrency control, used to ensure system consistency in concurrency. This operation can be a single SQL or a set of consecutive SQL statements. InnoDB, XtraDB, PBXT are all storage engines that support transactions.
Characteristics of transactions
Four major features of transactions, known as ACID properties:
-
Atomicity: Transactions are indivisible, either not executed or all executed.
-
Consistency: Data integrity is consistent before and after a transaction is executed.
-
Isolation: Transactions executed simultaneously cannot affect each other.
-
Durability: Transactions, once executed, must persist for storage.
By following all the rules of a transaction, data consistency can be maintained under concurrent conditions.
Data consistency issues in concurrency
-
Update the lost
Condition: Concurrent writing occurs for the same record.
Definition: the write operation of transaction A has not committed, and the modification is overwritten by the write operation of transaction B. Breaking the atomicity and isolation of transactions.
Solution: write serialization, add mutex when writing concurrent.
-
Dirty read
Condition: Concurrent read and write operations occur for the same record.
Definition: also called uncommitted read. The write operation of transaction A is not committed, and transaction B reads the uncommitted changes. At this time, transaction A rolls back, causing transaction B to read dirty data. Breaking the isolation of transactions.
Resolution: Restrict transactions to see only committed changes.
-
Unrepeatable read
Condition: Concurrent read and write operations occur for the same record.
Definition: In read/write concurrency, transaction A reads A row, and then transaction B modifies the row and commits it. Transaction A reads the changed row again, which is inconsistent with the first read. Broke the consistency of the transaction.
Solution: Before a transaction is executed, the data snapshot created at the beginning of the transaction is saved. The data in the snapshot is read for each query. If the data of the latest version is not read, the unrepeatable read problem does not occur.
-
Phantom reads:
Condition: Read operations, insert operations, and delete operations matching the same query conditions are executed concurrently.
Definition: In read and write concurrency, transaction A queries, and transaction B deletes or inserts some rows and commits them. The deleted or inserted rows just match the query conditions of transaction A, causing transaction A to query again with less or more records, which are called unreal rows.
Solution: Before a transaction is executed, the data snapshot at the beginning of the transaction is saved. The query condition filters only the data in the data snapshot during each query. The query does not overwrite the data of the latest version, so the phantom read problem does not occur. The same is true for deleted rows, as long as the deleted rows remain in the data snapshot, read, and do not suddenly disappear.
Transaction isolation level
Because different users have different requirements for data consistency, transactions can be divided into four isolation levels depending on the problem being solved:
-
READ UNCOMMITTED:
That is, you can read data that has not been committed by other transactions. There are no lost updates, there are dirty reads, unrepeatable reads, and phantom reads.
-
READ COMMITTED:
That is, only data committed by other transactions can be read. Although the dirty read problem is solved, there are still unrepeatable read and phantom read problems.
-
REPEATABLE READ:
That is to ensure that the same data is read by multiple queries within a transaction. There is no unrepeatable read problem, but there is a phantom read problem.
-
SERIALIZABLE:
The highest transaction isolation level, which forces all transactions to execute sequentially, without illusions, without any concurrency problems, because there is no concurrency. This causes a lot of timeouts and lock contention issues, so this level should only be used if concurrency is not considered.
InnoDB’s default transaction isolation level is repeatable reads, and unlike standard isolation levels InnoDB also addresses illusionary reads in repeatable reads. This eliminates the need for serialization, an extremely inefficient isolation level, to solve phantom readings.
It is important to note that InnoDB’s magic solution is implemented through special write operations, and simple snapshot reads cannot completely solve magic.
Snapshot read and current read
In MySQL, read operations are divided into two types: snapshot read and current read. Snapshot read refers to data snapshot when transaction is started, and current read refers to data of the latest version. As mentioned earlier, snapshot reads are used to solve the problem of non-repeatable reads and phantom reads existing in the current read. Snapshot reads protect transaction consistency by allowing multiple reads within a transaction to read the same result. Snapshot read is InnoDB’s default read mode and is implemented through Multi-version Concurrency Control (MVCC).
The purpose of snapshot read is to solve the problem of data consistency in concurrent read and write. Therefore, it is essentially a non-blocking read implementation.
The significance of the current read is to enable some strong real-time queries to read the latest version of data. Because the data is of the latest version, the current read must have the potential of unrepeatable read and phantom read. These hidden dangers are as follows:
-
Transaction A wants to read some data, and after the transaction starts, transaction B changes it and commits it. At this time, transaction A reads the snapshot data for the first time and reads the value at the beginning of the transaction. Then the current read is used and a new value is read. In this case, inconsistent reads within the same transaction cause unrepeatable reads.
-
Transaction A wants to read data with age = 6. After transaction B inserts A data with age = 6 and commits. At this time, the first read of transaction A adopts snapshot reading, and only the rows before the transaction start are read. Then using snapshot reading, the phantom row is read.
Both of these pitfalls are caused by snapshot reads being used before the current reads are used within the transaction. Snapshot reads must be used throughout transactions to ensure data consistency. The current read at this point destroys the snapshot read, reverting to the original scenario where the concurrency problem occurred. Therefore, if snapshot reads are to be used, you must ensure that snapshot reads are used throughout the transaction.
Current read and write operations
However, current reads are necessary because we sometimes need to get the latest data, so it is also necessary to solve the concurrency problem of current reads. It is also important to note that when InnoDB writes, the queries used for InnoDB writes are currently read because they must be modified on data entities. This means that snapshot reads cannot solve the problems of unrepeatable reads and phantom reads in write operations.
InnoDB cannot guarantee concurrency security when snapshot reads are performed first and then the current read or write is performed. Users must consciously guarantee not to perform such queries to ensure consistency.
However, InnoDB locks data for multiple read or write operations after the current read or write. If a current read or write is performed within a transaction, InnoDB locks the data and does not release the lock until the transaction commits. This is the step that ensures consistency. There are two ways to lock:
-
Shared lock: SELECT… FROM … LOCK IN SHARE MODE. This is an explicit shared lock in InnoDB, InnoDB default all index reads are snapshot reads, do not need to lock.
-
Exclusive lock: SELECT… FROM … FOR UPDATE. Write operations are locked exclusively by default because updates need to be kept from being lost.
For the problem of unrepeatable read, the current read and write operations are locked to prevent other transactions from modifying them. For the phantom read problem, the current read and write operations use a gap lock to prevent other transactions from deleting or inserting records that meet the query conditions.
Therefore, whether snapshot read or current read, once a transaction and the current read or write operation, must ensure that there is no snapshot read in the transaction, either before or after; Once a transaction uses snapshot reads, it must be guaranteed that only snapshot reads and no current reads and writes can be performed within the transaction, either before or after.
Automatically submit
MySQL transactions are in AUTOCOMMIT mode. If a transaction is not explicitly enabled, each query is executed as a transaction by default. This is how transactions work at the MySQL service layer.
For storage engines that do not support transactions, there is no problem performing AUTOCOMMIT because they do not have the concept of rollback or commit. Automatic commit is useless, but does not affect usage.
Multi-version Concurrency Control (MVCC)
We know that to solve the problem of unrepeatable and phantom reads, snapshot reads can be solved. That is, a snapshot of data is saved at the beginning of a transaction, and all queries in the process of a transaction are put back to the data in the snapshot, so as to ensure data consistency of concurrent queries. InnoDB implements snapshot reading through Multiple Version Concurrent Control. However, it is important to note that MVCC does not solve the problem of unrepeatable and phantom reads in write operations.
By keeping a snapshot of the data when the transaction was started, the transaction can see a consistent view of the data no matter how long it runs. Depending on when a transaction is started, different things may see different data at the same time.
Some people talk about MVCC as an implementation of optimistic locking, and there’s a lot of writing going on, but MVCC has nothing to do with optimistic locking. MVCC is designed to implement snapshot reads and is only responsible for consistency of data reads within a transaction. The goal of optimistic locking is to solve the problem of writing concurrency in the read-write scenario, which has nothing to do with that. The difference between them is that one is lockless concurrency and the other is locked concurrency.
The realization of the MVCC
MVCC is implemented with three implicit fields, undo log, and read view.
-
Implicit field
Implicit fields include DATA_TRX_ID (6B), DATA_ROLL_PTR (7B), and DB_ROW_ID (6B). DATA_TRX_ID Records the transaction ID of the most recently updated record. The transaction will only read data of the same version as its transaction ID. DB_ROLL_ID is a pointer to the record rollback segment that organizes all rows in a linked list in undo_log. DB_ROW_ID Can only be created if no primary key is defined, that is, hide the primary key. At the same time, there is a special flag (1bit) in the header information of each record to identify whether the current record has been deleted.
-
undo log
Undo logs are called rollback logs, and the data in them are called rollback segments. It is used to save the data of the version before the data update and can be directly restored during the transaction rollback. At the same time, because of the existence of multiple versions of historical data, we can read the data of the old version. Snapshot reading is realized by reading the data of the old version. Undo logs are classified into two types:
- Insert undo log: The undo log used by insert statements. Insert operations written to transactions are not visible to other transactions, so they are first written to the INSERT undo log, which is deleted after the transaction commits. [This point is not very certain, temporarily doubtful]
- Update undo log: The undo log generated when the update and Delete statements are executed. The undo log is not deleted immediately after the transaction commits and needs to be deleted by the Purge thread.
-
Read the view
That is, the data of the corresponding version is queried according to the current transaction ID.
The lock
Locks are essential to the implementation of transactions. In InnoDB, transactions are implemented by locking and MVCC, so the logic is easy to unclog. Locks and MVCC are also the foundation for efficient concurrency in InnoDB.
In MySQL, locks can be divided into table level locks, page level locks, row level locks and gap locks from the granularity of locks. Lock can be divided into shared lock and exclusive lock from its function. The basic comparison of locks of different granularity is as follows:
The lock type | Particle size | overhead | speed | A deadlock | conflict | Concurrent degree | To support the engine |
---|---|---|---|---|---|---|---|
Table level lock | big | small | fast | x | high | low | MyISAM, Memory, InnoDB, BDB |
Page-level locks | In the | In the | In the | Square root | In the | In the | BDB |
Row-level locks | small | big | slow | Square root | low | high | InnoDB, XtraDB |
Gap locks are mainly supported by InnoDB and are used differently from other conventional locks. See more about gap locks. For locks, only more suitable, no better, different locks are suitable for different scenarios, such as table lock is suitable for read and write less scene, such as online query system; Line locking is suitable for high concurrency scenarios, such as students choosing courses; Page locking applies to scenarios with a large amount of data and moderate modification.
Shared locks and exclusive locks
The implementation of Lock in each engine generally designs the Lock as the standard Share Lock (S Lock) and Exclusive Lock (X Lock), which can be roughly called read Lock and write Lock. At that time, read Lock also exists in the communication of thread and process, and its function is not quite the same as here. Therefore, they are best called shared and exclusive locks.
In a database, shared locks allow a transaction to read resources that it has locked with a shared Lock without being able to modify those resources, and other transactions can simultaneously add shared locks to those resources, which is called Lock compatibility, but cannot add exclusive locks to those resources, which is called Lock conflicts. An exclusive lock allows a transaction to read and modify a resource that is locked with an exclusive lock, but other transactions cannot lock a resource with an exclusive lock added. Examples are LOCK IN SHARE MODE and FOR UPDATE.
Table locking at the service layer
Although locks are the storage engine’s responsibility by default, the MySQL service layer has its own lock implementation, which is used for ALTER TABLE and other operations, regardless of the engine’s locking mechanism.
MyISAM table locks
MyISAM does not support transactions, but only supports table locks. Its table locks are implemented in shared and exclusive lock modes. Its table locks have the following features:
-
Default locking: It automatically locks the tables involved in the query before performing the SELECT operation. Before performing a write operation, it automatically locks the involved tables in an exclusive way. The user does not need to explicitly lock the table.
-
Exclusive lock priority: Exclusive locks have higher priority than shared locks by default, so a write request can be inserted in front of the shared lock in the lock queue. However, you can set the priority in the Settings.
-
Concurrent reads and writes: For the write operation insert, MyISAM supports concurrent reads and writes, set using the concurrent_INSERT variable:
- If the value is 0, concurrent insertion is not allowed.
- When the value is 1, data can be inserted to the end of the table when a shared lock is added if there are no empty Spaces in the middle of the table, that is, no deleted rows exist in the middle of the table.
- Equal to 2 allows concurrent insertion of records at the end of the table, regardless of whether there is a void in the table.
For read exclusive lock, if write operations are frequent, serious lock contention will be caused, you can run the SHOW STATUS LIKE ‘Table_locks_waited’; The larger the value is, the more serious the lock contention is.
The lock of the InnoDB
Record locks
InnoDB supports record locking, one record is a row, so it is row level lock, also known as row lock, record lock is also divided into shared lock and exclusive lock. InnoDB locks rows on indexes rather than data, such as primary Key index, which is the primary Key of a clustered index. If a secondary index is used, the primary Key of the secondary index will be locked as well as the primary Key corresponding to that Key in the clustered index. So if our operation requires a lock and the query condition does not match any index, a table lock is used to lock the entire table.
It should be noted that row locking can only solve the illusion problem when deleting data, but the illusion caused by inserting new data cannot be solved. It needs to be solved by gap locking.
Table locks used
InnoDB does not support table locks, but InnoDB does use table locks automatically, as mentioned above. You can also manually enable table locks, using table locks provided by MySQL service layer.
There are two things to be aware of when InnoDB uses table locks:
-
Only when autoCOMMIT =0 and Innodb_table_lock =1 (the default) is InnoDB aware of the table lock added by the service layer. The service layer is aware of the row lock added by InnoDB. In this case, InnoDB automatically recognizes and handles deadlocks involving table locks. If the autocommit mode! = 0, InnoDB does not use table locks.
-
When you use LOCK TABLES to LOCK TABLES, COMMIT or ROLLBACK before you perform UNLOCK TABLE operations, because the UNLOCK TABLE automatically commits transactions.
Conflicts and compatibilities between table and row locks are as follows:
| | row-level Lock X | row-level Lock S | | table level Lock X | | conflict | | table level Lock S | | | compatibility conflictCopy the code
Intent locks
InnoDB supports multi-granularity locking including table locks and row locks. Once an X lock is added, no other transaction can add any table locks. If every time a table lock is added, you need to iterate to see if there is a row lock, which can be a very serious performance problem. So InnoDB uses Intention Lock to solve this problem. Intent locks are used to indicate that a transaction wishes to lock certain data. InnoDB’s intent lock is table level. If a transaction adds an intent lock to a table, it means that the transaction wishes to add a row lock or table lock to the table. Therefore, all transactions must add an intent lock before they are locked.
Intent locks are divided into intent shared locks (IS Lock) and intent exclusive locks (IX Lock). IS and IX are Lock compatible with each other. At the same time, the intent lock is table level lock, and row level lock is not conflict, will only conflict with table level lock. That is, after adding IX, we can add row lock X. The conflicts and compatibility of all locks are as follows:
| | IX | IS | levels table table X | S | level line level X | S | | IX compatible compatible | | | | conflict conflict compatible compatible | | | | IS compatible conflict compatible compatible | | | | | compatible Compatible | | | table level X | | | | | | conflict conflict conflict conflict conflict conflict | | | table level S conflicts compatible | | | compatible compatible conflict of | | | | row-level X | compatible compatible | | | | | conflict conflict conflict Conflict | | row-level conflict compatible | | | S | conflict compatible compatible | | | conflictCopy the code
We can also see from this example that cartesian product operations should be avoided
Here’s an example of what intent locks can do:
-
Transaction A updates the records in the table, so an exclusive row lock needs to be added; Transaction B uses the index-free query mode, so A shared table lock needs to be added. At this time, transaction B learns that the table has A row lock by traversing the data, and therefore blocks waiting for transaction A to execute.
-
After the intentional locking mechanism is added, transaction A first adds an intentional exclusive lock to the table and then to the row. Transaction B adds an intentional shared lock to the table, and then it finds that transaction A has added an intentional exclusive lock, so it knows that there is an exclusive lock in the table, and blocks and waits directly.
-
At this time, transaction C wants to update some records in the table and add intentional exclusive lock to the table. Since it is a row lock, transaction C can still add row lock when the two data are different.
On the lock
The self-increment lock is used by MySQL to maintain the self-increment of tables. If the self-increment field is empty or 0, you need to obtain the self-increment. When acquiring auto-increment, you need to acquire an auto-increment lock, which is a lightweight lock used to ensure that concurrent inserts do not have the same auto-increment. For multi-row inserts, autoincrement locks have three working modes, set according to the innodb_autoinc_lock_mode parameter:
- 0. In traditional mode, the inserted statement holds the lock from the start to the end of execution.
- 1. Continuous mode: Multiple consecutive increments can be generated at one time for consecutive insert statements, and the auto-increment lock will be released after the value is generated.
- 2. Insert mode, auto-increment lock is independent of statement execution and has the highest performance, but there may be a jump in auto-increment in consecutive insert statements, because other transactions may also be acquiring auto-increment inserts.
INSERT INTO (INSERT INTO); INSERT INTO (INSERT INTO); INSERT INTO (INSERT INTO); INSERT INTO (INSERT INTO); The SELECT. These Settings are at the statement level, not at the transaction level, so the rollback of a transaction is independent of the increase in self-increment, so if a transaction rolls back after it gets a self-increment insert, it will result in a primary key void.
Therefore, do not assume that the value increment must be greater than the maximum value of the current field. At the same time, if the value of this field is inserted or updated before the self-increment reaches x, the value of this field is x. An error will be reported when performing inserts when AUTO_INCREMENT == x, so don’t do these tricks when you’re bored.
Gap lock and key lock
A gap lock is used to lock a gap between two index records. Conflicts occur if records are added to the middle of a gap with a gap lock. Gap lock and row lock are collectively called next-key-lock. By locking the gaps between records, a temporary key lock solves the phantom read problem during current read and write operations. However, there is no phantom read problem in common query because the snapshot read mode is adopted in common query.
Gap locks, like row locks, are index-oriented. Temporary key locks do not always use gap locks and row locks in unique and non-unique indexes:
For equivalent query, when the index uses a unique index, records with the same key cannot be inserted, so there is no phantom row problem. It is only necessary to ensure that the record is not deleted. Therefore, for equivalent query with unique index, the temporary key lock degrades to row lock, and the lock record is enough. When an index is a normal index, duplicate keys can exist, so phantom rows appear or disappear at the same time. Therefore, in addition to adding a row lock to the original record, it is necessary to use a gap lock to lock the upper and lower gaps of all hit records.
For range queries, a nonunique key lock behaves the same on a unique index as on a nonunique index. All hit range rows are locked to ensure that they are not deleted, and all hit intervals are locked to ensure that they are not inserted.
conclusion
The following are the important concepts related to concurrency:
-
Transaction is an implementation scheme for ensuring data consistency in database management system.
-
Transactions can only guarantee the ultimate consistency of data in a database management system if they guarantee atomicity, consistency, isolation, and persistence.
-
A variety of problems can occur when transactions run concurrently, including lost updates, dirty reads, unrepeatable reads, and phantom reads. All problems are caused by concurrent write operations, both read and write.
-
Update loss needs to be resolved by locking. Dirty reads need to be addressed by write on commit. The non-repeatable read can be resolved by snapshot read or row lock. Phantom read needs to be solved by snapshot read or temporary key lock.
-
Concurrent write conflicts are resolved by locking, while concurrent read and write conflicts are resolved by snapshot reading.
-
The goal of MVCC is to implement snapshot reads. The goal of snapshot reads is to resolve concurrent read and write conflicts.
-
The current read is the result of forced reading of other transaction updates, and the write operation also needs to be operated according to the latest data version. InnoDB uses temporary key locking to solve the phantom read problem when writing concurrently.
-
Row locks and gap locks are only used for indexes. Therefore, if a query or update fails to match an index, InnoDB will use table locks to perform a full table scan. In this case, data inconsistency will not occur, but the risk of deadlock is high.