preface

As we all know, transactions and locks are very important functions in mysql, and are also the focus and difficulty of the interview. This article will introduce the concepts and implementation principles of transactions and locks in detail, I believe you will have a deeper understanding of transactions and locks after reading.

What is a transaction

In Wikipedia, a transaction is defined as a logical unit within a database management system (DBMS) execution consisting of a limited sequence of database operations.

Four characteristics of transactions

Transactions encompass four major features, Atomicity, Consistency, Isolation and Durability.

1. Atomicity

Atomicity refers to a series of operations on a database that either all succeed or all fail, with no partial success. Take the transfer scenario as an example. If the balance of one account decreases and the balance of the other account increases, both operations must succeed or fail simultaneously.

2. Consistency

Consistency means that the integrity constraints of the database have not been broken and the data state is legal before and after the transaction is executed. The consistency here can mean that the constraint of the database itself is not broken, such as the uniqueness constraint of some fields, field length constraint, etc. It can also represent business constraints in various real-world scenarios, such as the transfer operation above, where the amount reduced in one account must be the same as the amount increased in the other.

3. Isolation

Isolation means that multiple transactions are completely isolated from each other and do not interfere with each other. The ultimate goal of isolation is also to ensure consistency.

4. Durability

Persistence means that as long as the transaction commits successfully, the changes made to the database are preserved forever and cannot be returned to the original state for any reason.

State of a transaction

Depending on the stage of the transaction, the transaction can be roughly divided into the following five states:

1. They are active.

A transaction is active when its corresponding database operation is in progress.

2. Partially committed

A transaction is in a partially committed state when the last operation in the transaction has completed but the changes have not been flushed to disk.

3. failed

When a transaction is active or partially committed and cannot continue due to some error, the transaction is in a failed state.

4. Aborted

When a transaction is in the failed state and the rollback operation is complete, data is restored to the state before the transaction was executed, the transaction is in the aborted state.

5. Committed

A transaction is committed when it is partially committed and all modified data has been synchronized to disk.

Transaction isolation level

As mentioned earlier, transactions must be isolated. The simplest way to achieve isolation is to not allow transactions to be concurrent, and each transaction is queued, but this approach is very poor performance. Transactions support different isolation levels for both isolation and performance.

To illustrate what follows, let’s create a sample table, Hero.

CREATE TABLE hero (
    number INT,
    name VARCHAR(100),
    country varchar(100),
    PRIMARY KEY (number)
) Engine=InnoDB CHARSET=utf8;
Copy the code

Problems encountered with concurrent execution of transactions

There are four types of problems that can occur when transactions are executed concurrently without any control:

1. Dirty Write

Dirty writes are when one transaction modifies data that has not been committed by another transaction.

As shown in the figure above, Session A and Session B each start A transaction. The transaction in Session B first updates the name column of the record whose number is listed 1 to ‘Guan Yu’, and then the transaction in Session A then updates the name column of the record whose number is listed 1 to zhang Fei. If the transaction in Session B is later rolled back, the update in Session A will not exist, which is called dirty write.

2. Dirty Read

Dirty reads are when a transaction reads data that has not been committed by another transaction.

As shown in the figure above, Session A and Session B each start A transaction. The transaction in Session B first updates the name column of the record whose number is 1 to ‘guyu’, and then the transaction in Session A queries the record whose number is 1. If the value of column name is read as’ guyu ‘and the transaction in Session B is later rolled back, then the transaction in Session A reads data that does not exist. This phenomenon is called dirty read.

3. Non-repeatable Read

Non-repeatable read refers to that the data committed by other transactions is read during the execution of a transaction, causing the results of two reads to be inconsistent.

As shown in the figure above, we committed several implicit transactions in Session B (mysql automatically adds transactions for increments, deletes, and deletes). Each of these transactions changed the value of name in the record column whose number is 1. After each transaction committed, if the transaction in Session A can see the latest value, This phenomenon is also known as unrepeatable reading.

4. Phantom

Phantom read refers to that during the execution of a transaction, the newly inserted data of other transactions is read, resulting in inconsistent results of the two reads.

Select * from hero where number > 0; select * from hero where name = ‘liu’; Session B then commits an implicit transaction that inserts a new record into the table Hero; Then the transaction in Session A queries the table Hero according to the same condition number > 0, and the result set contains the new record inserted by the transaction in Session B. This phenomenon is also called phantom read. The difference between an unrepeatable read and a phantom read is that an unrepeatable read reads data that has been modified or deleted by another transaction, while a phantom read reads data that has been inserted by another transaction.

The problem of dirty writing is so serious that any isolation level must be avoided. Other issues, whether dirty reads, unrepeatable reads, or phantom reads, are database read consistency issues, where two consecutive reads are inconsistent within a transaction.

Four levels of isolation

Four isolation levels are established in the SQL standard to address the above read consistency problem. Different isolation levels can solve different read consistency problems.

1.READ UNCOMMITTED

Read not committed.

2.READ COMMITTED

Read submitted.

3.REPEATABLE READ

Repeatable.

4.SERIALIZABLE

Serialization.

Read consistency problems that may occur at each isolation level are as follows:

InnoDB supports four isolation levels (roughly as defined by the SQL standard). The higher the isolation level, the lower the concurrency of the transaction. The only difference is that InnoDB handles phantom reading at REPEATABLE READ level. This is why InnoDB uses repeatable reads as the default isolation level for transactions.

MVCC

Concurrency Control (MVCC) is a multi-Version Concurrency Control that addresses the problem of Concurrency by maintaining historical versions of data.

Version of the chain

In InnoDB, each row actually contains two hidden fields: the transaction ID (trx_id) and the rollback pointer (roll_pointer).

1.trx_id

The transaction id. Each time a row record is modified, the transaction ID of that transaction is assigned to the trx_ID hidden column.

2.roll_pointer

Rollback the pointer. Each time a row is modified, the undo log address is assigned to the ROLL_pointer hidden column.

Suppose there is only one row in the hero table with the transaction ID 80 inserted at that time. Here is an example of the record:

Suppose two transactions with id 100 and id 200 UPDATE this record, and the operation flow is as follows:

Since the undo log is logged first for each change, roll_pointer points to the undo log address. Therefore, you can think of the change logs for this record as a chain of versions, the head node of which is the most recent value for the current record. As follows:

ReadView

If the database isolation level is READ UNCOMMITTED, the latest version in the version chain can be READ. If SERIALIZABLE, transactions are executed with locks, with no read inconsistencies. However, if READ COMMITTED or REPEATABLE READ is COMMITTED, each record in the version chain needs to be iterated to determine whether the record is visible to the current transaction until it is found (if not found, it does not exist). InnoDB implements this functionality through ReadView. ReadView contains the following four contents:

1.m_ids

Represents a list of transaction ids of the read and write transactions currently active on the system at the time the ReadView was generated.

2.min_trx_id

Represents the smallest transaction ID, the minimum value in m_IDS, of the read/write transactions currently active on the system at the time the ReadView was generated.

3.max_trx_id

Represents the id value in the system that should be assigned to the next transaction when the ReadView is generated.

4.creator_trx_id

Represents the transaction ID that generated the ReadView transaction.

With ReadView, we can determine whether a version of the record is visible to the current transaction based on the following steps.

1. If the value of the accessed version’s trx_id attribute is the same as the creator_trx_id value in ReadView, it means that the current transaction is accessing its own modified record, so the version can be accessed by the current transaction.

2. If the value of the trx_id attribute of the accessed version is less than the value of min_trx_id in ReadView, the transaction that generated the version was committed before the ReadView was generated by the current transaction, so the version can be accessed by the current transaction.

3. If the value of the accessed version’s trx_id attribute is greater than or equal to the value of max_trx_id in ReadView, the transaction that generated the version was started after the current transaction generated the ReadView. Therefore, the version cannot be accessed by the current transaction.

If the value of the trx_id attribute is between the min_trx_id and max_trx_id of the ReadView, then check whether the trx_id attribute is in the m_IDS list. If yes, the transaction that generated the ReadView was active when the ReadView was created. The version is not accessible; If not, the transaction that generated the version of the ReadView when it was created has been committed and the version can be accessed.

One big difference between READ COMMITTED and REPEATABLE READ isolation levels in MySQL is when they generate readViews. READ COMMITTED Generates a ReadView before each READ to ensure that data COMMITTED by another transaction can be READ each time. REPEATABLE READ only generates a ReadView on the first READ of the data, thus ensuring that the results of subsequent reads are exactly the same.

The lock

Transaction concurrent access to the same data resource can be divided into read – read, write – write and read – write.

1. Read – reading

That is, concurrent transactions access the same row of data records simultaneously. Since both transactions are read-only and do not affect the record at all, concurrent reads are fully allowed.

2. Write – write

That is, concurrent transactions modify the same row of data records simultaneously. This situation can lead to dirty writing problems, it is in any case, are not allowed to happen, so can only be achieved by locking, when a transaction is need to modify a row, will first give the record locking, if lock continues to execute successfully, otherwise we are waiting in line, transaction completes or rollback automatically releases the lock.

3. Read write

That is, one transaction reads and the other writes. This can result in dirty reads, unrepeatable reads, and phantom reads. The best solution is to use multi-version concurrency control (MVCC) for read operations and lock for write operations.

The granularity of the lock

According to the data range of the lock, the lock can be divided into row – level lock and table – level lock.

1. Row-level locks

The granularity of locks is small when applied to data rows.

2. The table level lock

For the entire table, the granularity of the lock is large.

The classification of the lock

In order to prevent read-read from being affected, and block each other between write-write and read-write, Mysql uses read-write locks to implement the idea. Specifically, it is divided into shared locks and exclusive locks:

1. Shared Locks

The s-lock, for short, is required to obtain the S-lock of a record before a transaction reads it. An S lock can be held by multiple transactions at the same time. We can use select…… lock in share mode; The way to manually add an S lock.

2. Exclusive Locks

X lock for short, before a transaction changes a record, it needs to obtain the X lock of the record first. X locks can be held by at most one transaction at a time. There are two ways to add X lock. The first is automatic lock. When data is added, deleted or modified, an X lock is added by default. There is also manual locking, where we add an X lock to a row with a FOR UPDATE.

It is also important to note that if one transaction already holds an S lock for a row, another transaction cannot add an X lock for that row, and vice versa.

In addition to Shared and Exclusive Locks, Mysql also has Intention Locks. Intent locks are maintained by the database itself. Generally speaking, when we add a shared lock to a row of data, the database automatically adds an intent shared lock (IS lock) to the table. Before we assign an exclusive lock to a row of data, the database automatically assigns an intentional exclusive lock (IX lock) to the table. The intent lock can be considered as the identification of S lock and X lock on the data table. Through the intent lock, it can quickly determine whether any records in the table are locked, so as to avoid traversing to check whether records in the table are locked and improve the locking efficiency. For example, if we want to add table level X lock, if there is row level X lock or S lock in the data table, the lock will fail. At this point, we can directly know whether the table has row level X lock or S lock according to the intent lock.

Table level locking in InnoDB

Table level locks in InnoDB mainly include table level intention shared lock (IS lock), intention exclusive lock (IX lock) and auto-inc lock. The IS lock and IX lock have been introduced before, so we will not repeat them here. Let’s focus on the auto-Inc lock.

As we all know, if we add an AUTO_INCREMENT attribute to a column, we do not need to specify the value of the column when inserting it. The system automatically increments the column. There are two main principles for the system to implement this AUTO_INCREMENT assignment:

1. The AUTO – INC

A table level auto-Inc lock is added before the insert statement is executed. The lock is released immediately after the insert is complete. If our INSERT statement cannot determine exactly how many records to INSERT before execution, such as INSERT… SELECT insert statements are generally locked in auto-inc mode.

2. Lightweight locks

The lightweight lock is acquired when the AUTO_INCREMENT value is generated in the insert statement and then released when the AUTO_INCREMENT value is generated. If our insert statement determines how many records to insert before execution, then we assign AUTO_INCREMENT columns with a lightweight lock. This approach avoids locking tables and improves insert performance. By default, mysql automatically selects a lock mode based on actual scenarios. You can also use innodb_autoinc_lock_mode to force you to use only one lock mode.

Row-level locking in InnoDB

As mentioned earlier, MVCC can solve read consistency problems such as dirty reads, unrepeatable reads, and phantom reads, but in reality it only solves the data reading problems of ordinary SELECT statements. READ operations performed by transactions using MVCC are called snapshot reads. All ordinary SELECT statements are regarded as snapshot reads at READ COMMITTED and REPEATABLE READ isolation levels. In addition to snapshot read, there is another kind of lock read, that is, the record is locked during the read. In the case of lock read, the problems of dirty read, unrepeatable read, and phantom read still need to be solved. These locks are row-level locks because they are placed on records.

InnoDB locks rows by locking indexes. If no index has been used in a query, the entire cluster index will be locked. Depending on the locking scope, row Locks can be implemented using Record Locks, Gap Locks, and next-key Locks. Let’s say I have a table T, and the primary key is ID. We inserted four rows of data with primary keys 1, 4, 7, and 10. Next, we will introduce three types of row locking using clustered indexes as an example.

1. Record Locks

The so-called record refers to the real data stored in the cluster index, such as 1, 4, 7 and 10 above are records.

Obviously, a record lock locks a row directly. When we use unique indexes (including unique indexes and clustered indexes) to perform equivalent query and accurately match a record, the record will be locked directly. For example select * from t where id =4 for update; The record with id=4 is locked.

2. Gap Locks

A gap is the part between two records that has not been logically filled with data, such as (1,4), (4,7), etc.

Similarly, a gap lock locks certain gaps. When we use the equivalent query or range query, and do not hit any record, the corresponding gap will be locked at this time. For example, select * from t where id =3 for update; Select * from t where id > 1 and id < 4 for update; I’m going to lock in the interval 1,4.

3. Next-key Locks

The key refers to the gap plus the record to its right of the left open right closed interval. For example, the above (1,4], (4,7], etc.

A combination of Record Locks and Gap Locks Locks the gaps between the indexes in addition to the records themselves.

When we use a range query, and a portion of the record is hit, the key range is locked. Note that the interval locked by a keylock contains the interval to the right of the last record.

For example, select * from t where id > 5 and ID <= 7 for update; Will lock (4,7), (7,+∞). The default mysql row lock type is next-key Locks.

When unique indexes are used, next-key Locks degrade to record Locks when an equivalent query matches a record. When no records are matched, a gap lock is degraded.

The last

I have arranged a: MySQL related information documents and knowledge map, Java systematic information, (including Java core knowledge points, interview topics and the latest Internet real questions in 20 years, e-books, etc.) friends who need to pay attention to the public number [procedure Yuan Small wan] can be obtained.