The understanding of database transaction concept before there are a lot of places not in place, today with a simple example to illustrate the concept of database transaction and isolation level, also convenient for later review of the new.

1. What are transactions

Transaction is the basic unit of concurrency control. A transaction is a sequence of operations that are either performed or not performed, and it is an indivisible unit of work. For example, a bank transfer job involves deducting money from one account and adding money to another, both or none of which are performed. Therefore, you should think of them as a transaction. A transaction is the unit of data consistency maintained by a database, and data consistency is maintained at the end of each transaction. Let’s take an Msql database operation as an example to further explain database transactions: First of all, we use the following command to view the Mysql session transaction isolation level, about the transaction isolation level and its effects, we’ll have a detail in later chapters, here as long as the simple know database can set different transaction isolation level, different isolation levels will produce different effect to the operation of the transaction. The transaction isolation level of the current Mysql session is REPEATABLE-READ by default.

mysql> select @@tx_isolation;
+-----------------+
| @@tx_isolation  |
+-----------------+
| REPEATABLE-READ |
+-----------------+
Copy the code

To illustrate the transaction with an example, we create the following Bank table and insert a single data,

mysql> describe bank; +---------+---------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +---------+---------------+------+-----+---------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | | name | varchar (40) NO | | NULL | | | the balance | a decimal (10, 2) | YES | | NULL | | +---------+---------------+------+-----+---------+----------------+ mysql> select * from bank; + - + - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | | id name | balance | + - + - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | 3 | fufu | | 2000.00 + - + - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- +Copy the code

Start a database transaction using the start transaction command,

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)
Copy the code

Update balance for row 3 to 3000.00,

mysql> update bank set balance = 3000 whereid = 3; Query OK, 1 row affected (0.09 SEC) Rows matched: 1 Changed: 1 Warnings: 0 mysql> select * from bank; + - + - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | | id name | balance | + - + - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | 3 | fufu | | 3000.00 + - + - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- + 1  rowin set (0.00 sec)
Copy the code

The balance value of row 3 in the select statement has been changed to 3000.00. Then we try to insert a new data.

mysql> insert into bank (name, balance) values ('melo', 1000); Query OK, 1 row affected (0.06 SEC) mysql> select * from bank; + - + - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | | id name | balance | + - + - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | 3 | fufu 3000.00 | | | | 4 melo 1000.00 | | +----+------+---------+ 2 rowsin set (0.00 sec)
Copy the code

The above update and INSERT operations are all part of the same transaction until the end of the transaction. If an error occurs during the INSERT operation, we can see from the transaction definition. All operations that belong to the same transaction are either executed or none are executed. To verify this, use the rollback command to simulate a transaction failure rollback.

mysql> rollback; Query OK, 0 rows affected (0.01sec)Copy the code

Query all data in the database and find that the balance value of line 3 is equal to 2000.

mysql> select * from bank; + - + - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | | id name | balance | + - + - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | 3 | fufu | | 2000.00 + - + - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- + 1  rowin set (0.00 sec)
Copy the code

So far, we have described the definition of database transactions and explained the operation mode of transactions with a simple Mysql operation. We can summarize the life cycle of database transactions as follows:

  • The start boundary of a transaction
  • The normal end boundary of a transaction (COMMIT), which commits the transaction and permanently stores the database state that has been updated by the transaction.
  • ROLLBACK: To ROLLBACK the transaction and return the database to its initial state before the transaction was executed.

Now we come back to think about the sample, the sample of all operations are done in a Mysql session, which is no other users connect to the database at the same time, the use of no concurrent sessions in this scenario, the end of the end of transaction is normal or abnormal, for the user to read data alone won’t cause any influence, Because everything he does is serial. However, in the actual application scenario, the database serves many sessions at any time. Suppose that user A’s transaction A updates the database data, and user B starts to read the data, and user B will read the new value. However, if transaction A is rolled back following an error in the next SQL statement, the data read by user B will be incorrect and invalid. This is a simple problem that database transactions can cause in a concurrent environment, so I’ll elaborate on the problem of concurrent transactions.

2. Problems caused by concurrent transactions

In this section we focus on the problems that can occur when a transaction is concurrent, using points in time and a table of transaction operations as examples.

2.1 Missing Updates

2.1.1 Type 1 lost updates

Definition: when A transaction is revoked, the update data of B transaction is overwritten.

Point in time A transaction Transaction B
T1 Start the transaction
T2 Start the transaction
T3 The query account balance is 1000 yuan
T4 The query account balance is 1000 yuan
T5 Deposit 100 yuan and change the balance to 1100 yuan
T6 Commit the transaction
T7 Take out $100 and change the balance to $900
T8 Cancel the transaction
T9 Balance restored to 1000 yuan (lost update)

The example above illustrates the first type of lost update problem, where transaction B succeeds but its updates are not permanently stored. This concurrency problem is caused by not isolating transactions at all. When two transactions update the same data, if one is committed and the other is undone, the update made by the first transaction is also undone. The start time and end time of transaction A contain the start time and end time of transaction B. It is avoided that transaction A rolls back transaction B’s committed transaction at the same time. This is the first type of lost update.

2.1.2 Type 2 lost updates

Definition: when A transaction commits, the update data of B transaction is overwritten.

Point in time A transaction Transaction B
T1 Start the transaction
T2 Start the transaction
T3 The query account balance is 1000 yuan
T4 The query account balance is 1000 yuan
T5 Take out $100 and change the balance to $900
T6 Commit the transaction
T7 Deposit 100 yuan and change the balance to 1100
T8 Commit the transaction
T9 Balance restored to 1100 yuan (lost update)

The difference between the second type of lost update and the first type is actually whether the impact on the data is caused by the cancellation or commit of A transaction. It is essentially the same type of concurrent problem as unrepeatable reads (described below), and is generally considered A special case of unrepeatable reads. Two or more transactions query the same data. Each then updates the data based on its own query results, resulting in the last committed update transaction that overwrites the other committed update transactions.

2.2 dirty read

Definition: Read data that has not committed an update

Point in time A transaction Transaction B
T1 Start the transaction
T2 Start the transaction
T3 The query account balance is 1000 yuan
T4 Take out $500 and change the balance to $500
T5 Query account balance is 500 yuan (dirty read)
T6 Cancel the transaction, the balance is restored to 1000 yuan
T7 Deposit 100 yuan to change the balance to 600 yuan
T8 Commit the transaction

Transaction A queries the update data that transaction B has not committed, and transaction A continues to perform related operations based on the query result. But then B transactions dropped the update, this could lead to A transaction is dirty data operation, has A dirty read T5 moment in the above example, eventually lead to A transaction is committed, the balance in the account is not correct, could have some question, B transaction has not been submitted or undo, T5 moment why A transaction can read data has been changed, here, The data in the data table changes in real time, and the transaction only controls the final state of the data, meaning that without the correct isolation level, the changed data value can be read by other transactions after the update operation statement, even if the transaction is not completed. So far: all databases avoid dirty READ operation, you can try the above operation with two Mysql sessions, with the default isolation level (REPEATABLE-READ), A transaction READ balance of $1000 at time T5, not $500.

2.3 Unrepeatable read

Definition: Read data that has been committed for update, but two identical queries within a transaction scope return different data.

Point in time A transaction Transaction B
T1 Start the transaction
T2 Start the transaction
T3 The query account balance is 1000 yuan
T4 The query account balance is 1000 yuan
T5 Take out $100 and change the balance to $900
T6 Commit the transaction
T7 The balance of the query account is 900 yuan (inconsistent with the one read in T4, which cannot be repeated)

2.4 phantom read

Definition: A magic read is similar to a non-repeatable read. A magic read is a query for a new insert that has been committed by another transaction. A non-repeatable read is a query for an update that has been committed by another transaction.

Point in time A transaction Transaction B
T1 Start the transaction
T2 Start the transaction
T3 The total deposit of user Z is 1000 YUAN
T4 Add a deposit account of Z, deposit 100 yuan
T5 Commit the transaction
T6
T7 Again, the total deposit of user Z is 1100 yuan (inconsistent with T4 reading, unreal reading)

There was no problem in the first query of transaction A, but the new insert data submitted by transaction B was found in the second query, resulting in different results of the two queries. The difference between unrepeatable and phantom reads: Simply put, unrepeatable reads are caused by data modification, while phantom reads are caused by data insertion or deletion. A non-repeatable read is a database access in which two identical queries within the scope of a transaction return different data. This is caused by the commit of other transaction changes in the system at the time of the query. For example, if transaction T1 reads some data, transaction T2 reads and modifies the data, and T1 reads the data again to verify the read value, it gets a different result. A more understandable way of saying this is to read the same data multiple times within a transaction. The same data is accessed by another transaction while the transaction is still active. So, between the two reads of the data in the first transaction. The data read by the first transaction may not be the same due to the modification of the second transaction, so that the data read by the first transaction is not the same, and therefore is called non-repeatable read, i.e. the original read is not repeatable. The so-called phantom read is that transaction A reads A number of rows matching the search criteria. Transaction B modifies the result set of transaction A by inserting or deleting rows, etc., before committing. Phantom reading is a phenomenon that occurs when a transaction is not executed independently, such as when the first transaction modifies data in a table, such as when the modification involves “all rows” in the table. At the same time, the second transaction also modifies the data in the table by inserting a “new row” into the table. Then, as if in an illusion, the user operating on the first transaction will discover that there are unmodified rows in the table. The general solution to phantom is to increase the scope lock RangeS to lock the scope of the lock as read-only, so as to avoid phantom.

3. Transaction isolation level

These are the five problems caused by concurrent transactions in the database. In summary, there are two types of update problems and three types of read problems. How does the database avoid such concurrent transactions? The answer is through different transaction isolation levels, under different isolation levels, the results of reading data by concurrent transactions are different, for example, in the dirty READ section, if the transaction is in REPEATABLE-READ isolation level, A transaction cannot READ the uncommitted data of B transaction at time T5. We need to set different isolation levels based on business requirements to strike a balance between efficiency and data security. The SQL standard defines four classes of isolation levels, including specific rules that define which changes are visible and which are not, both inside and outside a transaction. Low-level isolation levels generally support higher concurrency and have lower system overhead.

3.1 SERIALIZABLE

When a database system uses the SERIALIZABLE isolation level, a transaction executes without seeing any updates made to the database by other transactions. When two transactions operate on the same data in the database simultaneously, if the first transaction is already accessing the data, the second transaction can only stop and wait until the first transaction completes. So these two transactions actually run serialized.

REPEATABLE READ

When the database system uses the REPEATABLE READ isolation level, a transaction can see newly inserted records that have been committed by other transactions during execution, but can’t see updates made by other transactions to existing records.

3.3 READ COMMITTED

When a database system uses the READ COMMITTED isolation level, a transaction can see newly inserted records COMMITTED by other transactions during execution, as well as updates to existing records COMMITTED by other transactions.

3.4 READ UNCOMMITTED data

When a database system uses the READ UNCOMMITTED isolation level, a transaction can see newly inserted records that have not been committed by other transactions during execution, as well as updates to existing records that have not been committed by other transactions. With the four isolation levels listed from highest to lowest, you might say, choose SERIALIZABLE because it’s the safest! Yes, it’s the safest, but it’s also the slowest! The security of the four isolation levels is inversely proportional to performance! The safest performance is the worst, and the least secure performance is the best!

4. Isolation levels and concurrency issues

By defining the four isolation levels above, we can analyze which concurrency problems can be avoided by each isolation level, as summarized in the following table:

Isolation level The first type is missing updates The second type of lost updates Dirty read Unrepeatable read Phantom read
SERIALIZABLE avoid avoid avoid avoid avoid
REPEATABLE READ (REPEATABLE READ) avoid avoid avoid avoid allow
READ COMMITTED avoid allow avoid allow allow
READ UNCOMMITTED avoid allow allow allow allow

We can easily analyze this table ourselves by defining the isolation level. For example, the repeatable read isolation level is defined as a transaction that can see newly inserted records that have been committed by other transactions during execution, but cannot see updates made by other transactions to existing records. So, under this isolation level, in the dirty read sample moment of T5 and T7 has not repeatable read time, transaction is whether transaction B submitted, A transaction are unable to read from A to B to existing records updated, so won’t produce the dirty read and not repeatable read, and because of this isolation level can be seen under other transactions have submitted new insert record, Nature cannot avoid hallucinations. Also, it is important to note that all isolation levels avoid the problem of type 1 missing updates. Most relational databases use Read COMMITTED isolation by default. Mysql InnoDB uses Read REPEATable isolation by default because Mysql Replication uses the Statement log format. For example, Oracle supports Read Committed and Serializable isolation levels. In addition, Read snapshots can be used to prevent unrepeatable reads at the Read Committed level. MySQL uses the RR isolation level by default. The SQL standard requires RR to solve the problem of unrepeatable reads. However, because MySQL uses gap lock, the RR isolation level of MySQL also solves the problem of phantom reads. That is, Mysql InnoDB uses the next-key locking policy on the Read REPEATable level to avoid phantom reading.