preface
Before, we have analyzed the MySQL index related knowledge and explain execution plan analysis, presumably you have a basic understanding of the index, so this article, I will introduce you to MySQL transaction and MVCC related knowledge
What is a transaction
A Transaction is a program execution unit consisting of a series of operations that access and update data in a database.
All operations in the same transaction either succeed or do nothing at all. The ideal transaction must satisfy four properties, known as ACID.
ACID properties of transactions
Not all transactions are ACID compliant. For example, Oracle and SQL Server databases, which default isolation level is Read COMMITTED, do not satisfy I. The MySQL NDB Cluster engine does not meet the D(persistence) requirements.
A (Atomicity) – Atomicity
Atomicity refers to the fact that a database transaction is an integral part of a transaction, and only when all operations in a transaction succeed will the transaction be considered successful. If one operation fails, the other successful operations must also be rolled back. Take the scenario of transferring 1000 yuan as an example. A transfer process is a transaction, which mainly includes the following two steps: 1, deduct 1000 yuan from account A 2, add 1000 yuan to account B imagine, if the first step is successful, then the second step failed, that is equal to the 1000 yuan of A directly disappeared, I believe that this is not acceptable to anyone, so database transactions need to ensure atomicity.
C (Consistent) – consistency
The database integrity constraint is not broken before and after the transaction, and the data state is legal before and after the transaction execution.
For example, if we have a table in which a field name establishes a unique constraint, the name must remain unique after we commit or roll back the transaction.
I (Isolation) – Isolation
Isolation means that the operations of each transaction should be isolated from each other. For example, one transaction is not visible to another until it is committed.
Isolation is a relatively abstract and complex concept, such as the isolation between transactions how isolated do we really want to be? So, for isolation, the SQL92 standard defines four isolation levels, which are described later in the isolation level for transactions.
D (Durable) – persistence
Persistence is easy to understand the concept, that is to say the transaction by submitting a success, so it should be durable, restart, even the database server goes down, and so on and so forth, data is not lost (of course this cannot include because of natural disasters as earthquake caused by storing data of hard disk damage occurs irreversible damage).
Management of transactions
Automatic commit is enabled for MySQL transactions by default. Therefore, to enable automatic commit, we need to turn off automatic commit or display transaction enabled.
Automatic commit of transactions
To view automatic commit statements:
SHOW VARIABLES LIKE 'autocommit'; -- ON: autocommit enabled SELECT @@autocommit; -- 1 indicates that automatic submission is enabledCopy the code
Execute the following statement to turn off automatic commit:
SET autocommit='OFF';
SET @@autocommit = 0;
Copy the code
Note that this change only takes effect in the current session window and does not take effect in any other session window. Almost all MySQL variable Settings are divided into two levels: session(session) and global(global). The default is session level.
Common transaction control statements
- START TRANSACTION or BEGIN: indicates the START TRANSACTION displayed. Note that only START TRANSACTION can be used to START a TRANSACTION in a stored procedure. END syntax, the two will conflict.
- COMMIT: Commits the transaction. You can also write COMMIT WORK.
- ROLLBACK: Rolls back the transaction. You can also write ROLLBACK WORK.
- SAVEPOINT Identifier: Custom SAVEPOINT for long transactions that can be rolled back to our custom location.
- RELEASE SAVEPOINT Identifier: Deletes a SAVEPOINT. If there is no SAVEPOINT, an error will be reported
- ROLLBACK TO[SAVEPOINT] Identifier: ROLLBACK TO the specified SAVEPOINT.
The difference between COMMIT and COMMIT WORK
Both of these can commit a transaction, but the difference lies in what happens after the transaction is committed, as well as ROLLBACK and ROLLBACK WORK, which are controlled by a single variable: Completion_type.
SHOW VARIABLES LIKE '%completion_type%';
Copy the code
Completion_type has the following three results:
value | describe |
---|---|
NO_CHAIN or 0 | Default value. Commit and rollback are equivalent to commit work and rollback work |
CHAIN or 1 | Commit work and rollback Work are equivalent to commit and chain and rollback and chain. A new transaction with the same isolation level will be started immediately after the transaction is committed or rolled back |
RELEASE or 2 | In this case, commit work and rollback Work are equivalent to COMMIT Release and ROLLBACK Release. The current database connection will be disconnected after the transaction is committed or rolled back |
For example 1:
SET completion_type=1; --1 begin; --2 INSERT test2 VALUES(1,' zhang1 '); --3 commit work; --4 INSERT test2 VALUES(2,' zhang1 '); --5 select * from test2; --6 rollback; --7 select * from test2; 8 -Copy the code
Article 4 the statement, we submit a transaction, 5 statement we have inserted a data, the 6 statements can query the 2 data, next we rolled back, 8 to query statements will be found that there are only a data, because the rollback statement 6 times, we did not show after the statement 4 open a transaction, This means that statement 4 automatically starts a new transaction.
For example, 2:
SET completion_type=2; begin; INSERT test2 VALUES(3,' zhang1 '); commit work; select * from test2;Copy the code
The last statement returns the following:
Disconnect as prompted and then reconnect automatically. Testing this example with a tool such as SQlyog may not be obvious because the tool will automatically help you reconnect as if it is not disconnected. Use a command window to test this example
Classification of transactions
From the perspective of transaction theory, we can divide transaction into the following five categories:
Flat transaction
This is the simplest and most common type of transaction in which all operations are atomic and either all succeed or nothing is done.
Flat transactions with savepoints
This is generally suitable for long transactions, where we can choose not to roll back all the transactions, but to roll back to a savepoint we have defined. Here’s an example:
BEGIN; INSERT test VALUES(1,' 1'); SAVEPOINT A INSERT test VALUES(2,' zhang2 '); ROLLBACK TO A COMMIT;Copy the code
In the example above, we define A savepoint A and then roll back to A to commit the transaction, so the second insert statement fails and the first statement succeeds.
Note: After ROLLBACK to the specified savepoint, the transaction is still active and we still need to execute COMMIT or ROLLBACK statements to end the transaction
Chain business
After committing a transaction, we release the data we don’t need and implicitly pass the necessary data to the next transaction. (Note that committing a transaction and starting the next transaction are atomic operations.) This means that the next transaction can see the results of the previous transaction.
Chain of transactions can be seen as special affairs with save points, their difference is with the savepoint transactions can be rolled back to the arbitrary, but after the ROLLBACK transaction still active, you need to perform a COMMIT or ROLLBACK did not end after the transaction, and chain in the transaction can roll back to a savepoint recent (i.e. the start point of transaction).
Chain transactions can be implemented with the completion_type parameter above. Above has the example use method, here does not repeat the example.
The nested transaction
Nested transaction means that a transaction is nested within another transaction. There is a parent-child relationship between transactions. The child transaction does not take effect until the parent transaction commits.
Note that MySQL does not support nested transactions natively, but it is possible to simulate nested transactions with savepoints without the flexibility of a real nested transaction.
Distributed transaction
Distributed transactions are usually flat transactions running under multiple databases in a distributed environment. Flat transactions running in multiple database environments compose a distributed transaction.
The isolation level of the transaction
Read Uncommitted
RU for short. This is the lowest isolation level, equal to no isolation, and virtually no database will use this level. A transaction can read uncommitted data from another transaction, also known as dirty reads.
What is dirty reading? Consider the following example:
Transaction 1 on the left, check, check the id is 1 data name is zhang SAN, transaction 2 at this time again, the zhang SAN to the bill, then transaction 1 for a query, found out the name of li si, so if rollback transaction 2 and at this time, that is, the name is zhang SAN, but read the transaction 1 bill, this is dirty reads.
Read Committed
Hereinafter referred to as RC. A transaction can only read data that has been committed by another transaction, which means that the same query can be executed twice in a transaction with different results. Default database isolation level for Oracle and SQL Server databases. This isolation level solves the dirty read problem, but can cause unrepeatable reads.
What is unrepeatable reading? For example, if transaction 2 commits the data immediately after the update, then the second query from transaction 1 returns the same result as the dirty read, but this time it is not considered as dirty read, because transaction 2 commits the same data, which is called unrepeatable read.
Repeatable Read(Repeatable Read)
Hereinafter referred to as the RR. This isolation level solves the non-repeatable read problem, that is, the same query is executed in the same transaction with the same result, but this level can cause phantom read problems (except for the InnoDB engine, which solves phantom read problems through gap locking).
What is illusory? Consider the following example:
In the figure above, transaction 1 performs a range query, only one record can be found at the first time, then transaction 2 inserts a data, and then transaction 1 executes the same query again, at this time two records can be found, that is, one more record, giving a kind of illusion, so it is called phantom read. InnoDB solves the illusory problem by using keylocksClick here to)
At this point, some people may have a question, because it seems that the unrepeatable read and the phantom read are the result of the committed transaction, as if there is no difference? Indeed,
Serializable
This is the highest level of isolation, which means that all transactions are executed serially, so there are no concurrent transactions, and the problem of dirty reads, repeatable reads, and phantom reads is eliminated.
Comparison of different isolation levels
Different isolation levels can solve different problems, as shown below:
For uncommitted reads and committed reads, it’s easy to understand that you only need to control the commit of one transaction to be visible to the other. But for repeatable reads, how does MySQL implement that even if one transaction has been committed, the other transaction can be invisible? So that’s MVCC.
Implementation of transaction isolation
There are two implementations of transaction isolation, LBCC and MVCC
LBCC
This chapter begins with some concepts of locking, which I will focus on in the next article. This article will not cover too many concepts of locking. Want to know the
LBCC, Based Concurrency Control. When a transaction attempts to read a piece of data, it locks the data and does not allow other transactions to do so. (Of course, the implementation of the lock is also important. If we only lock the current piece of data, we still cannot solve the illusion problem.)
The current reading
This concept is actually easy to understand, MySQL lock after the current read. If the current transaction only has a shared lock, then other transactions cannot have exclusive locks, that is, cannot modify data; If the current transaction requires an exclusive lock, no other transaction can hold any locks. In summary, successful locking ensures that the current data will not be affected by any transaction other than the current transaction, so naturally the current transaction will read only the latest data, not snapshot data (MVCC explains the concept of snapshot read below).
In the LBCC solution, if our business system is read more than write less, this solution will greatly affect the efficiency, so we have another solution: MVCC.
MVCC
Concurrency Control. Multi Version Concurrency Control. When we modify the data, we can create a snapshot of the data, and then we can read the snapshot directly.
So how exactly does MVCC work?
To implement THE MVCC mechanism, InnoDB internally adds two hidden columns for each row: DB_TRX_ID and DB_ROLL_PTR. (MySQL also has a hidden column DB_ROW_ID, which is used as a primary key when InnoDB tables have no primary key.)
DB_TRX_ID
It is 6 bytes long and stores the transaction ID of the last transaction to insert or update the statement.
DB_ROLL_PTR
The length is 7 bytes and is called: rollback pointer. The rollback pointer points to the undo log record written into the rollback segment. When reading the record, the undo log record will be read according to the pointer.
Since the undo log in MySQL maintains a historical data record, we should get in the habit of committing transactions regularly, otherwise the rollback segment will grow larger and fill up the table space.
Read the snapshot
Snapshot read is for the current read above, which means that in RR isolation level, MySQL will choose to retrieve the snapshot data from the undo log without locking based on the rollback pointer, and not always retrieve the latest data. This is why another transaction commits the data. You still see data in the current transaction before another transaction commits.
When does MySQL start reading snapshots
Let’s look at an example from MySQL’s default isolation level RR (note that test and test2 both start as empty tables with only id and name columns).
- Scenario 1(first query after transaction 1 manipulates data) :
Transaction 1 | Transaction 2 |
---|---|
BEGIN; | BEGIN; |
INSERT INTO test VALUE(1, ‘zhang3’); | |
COMMIT; | |
SELECT * FROM test WHERE id=1;
(id=1, name= zhang3)
|
|
UPDATE test SET NAME= ‘id’ WHERE id=1; | |
SELECT * FROM test WHERE id=1;
(id=1, name= zhang3)
|
|
COMMIT; | COMMIT; |
- Scenario 2(Transaction 1 does nothing and transaction 2 starts the first query)
Transaction 1 | Transaction 2 |
---|---|
BEGIN; | BEGIN; |
SELECT * FROM test WHERE id=1;
Id =1, name= 1
|
|
UPDATE test SET NAME= 1 WHERE id=1; | |
COMMIT; | |
SELECT * FROM test WHERE id=1;
Id =1, name= 1
|
|
COMMIT; |
From the above two scenarios, we can conclude that RR isolation level snapshots are not generated at BEGIN, but are not generated until the first query in the transaction, and subsequent queries only read the snapshot data
- Scenario 3(after transaction 2 makes a t1 query, transaction 1 then operates on other tables T2)
Transaction 1 | Transaction 2 |
---|---|
BEGIN; | BEGIN; |
SELECT * FROM test WHERE id=1;
(id=1, name= 5)
|
|
INSERT INTO test2 VALUE(1, ‘Yang’); | |
COMMIT; | |
SELECT * FROM test2 WHERE id=1;
(empty)
|
|
COMMIT; |
From scenario 3, we can conclude that the RR isolation level snapshot is not only for the data currently queried, but for all data in the current MySQL (cross-library as well, as long as it is in the same MySQL).
MVCC query mechanism
How does the MVCC mechanism actually query? Given that many transactions are performed simultaneously, many snapshots will be generated. How will queries be performed?
DB_TRX_ID and DB_ROLL_PTR are integers. DB_TRX_ID and DB_ROLL_PTR are integers.
DB_TRX_ID = 1,DB_ROLL_PTR = null,DB_ROLL_PTR = null
id | name | Transaction id | Rollback Pointers |
---|---|---|---|
1 | Zhang SAN | 1 | null |
2 | Li si | 1 | null |
Select * from transaction C where id=3 and id=3; select * from transaction C where id=3 and id=3;
id | name | Transaction id | Rollback Pointers |
---|---|---|---|
1 | Zhang SAN | 1 | null |
2 | Li si | 1 | null |
3 | Cathy | 3 | null |
Transaction ID = 3; transaction ID = 3; transaction ID = 3; transaction ID = 3; transaction ID = 3; transaction ID = 3;
Select * from test where transaction id<=2Copy the code
Select * from D where id=1 and id=1; select * from D where id=1 and id=1;
id | name | Transaction id | Rollback Pointers |
---|---|---|---|
1 | Zhang SAN | 1 | 4 |
2 | Li si | 1 | null |
3 | Cathy | 3 | null |
Select * from (select * from (select * from (select * from (select * from (select * from (select * from))));
Select * from test where id<=2 and (select * from test where id<=2)Copy the code
6 E, if this time again affairs, the article 2 data updated, then produce a transaction id for 5 data, and the original data of the rollback Pointers are also marked as the current transaction id: 5, then can get the following data:
id | name | Transaction id | Rollback Pointers |
---|---|---|---|
1 | Zhang SAN | 1 | null |
2 | Li si | 1 | 5 |
3 | Cathy | 3 | 4 |
2 | Cathy | 5 | null |
Execute the following query based on the above guess:
Select * from test where id<=2 and (select * from test where id<=2)Copy the code
At this point, it turns out there are only 1 and 2.
MVCC queries two rules
To sum up, the general query rules of MVCC are as follows: 1. Only query data whose transaction ID is less than or equal to the current transaction ID. Select * from (select * from (select * from (select * from (select * from (select * from (select * from (select * from (select * from (select * from (select * from (select * from (select * from (select * from (select * from (select * from (select * from))))); (This cannot be equal to because if the transaction deletes a data, the rollback pointer of the data will be generated as the current transaction ID, so the deleted data must be excluded)
Of course, the above rule is only simplified, the actual query is far more complex than here, but I hope this simplified concept can help you understand the MVCC works better.