- Getting started with MySQL (1) : Internal implementation of queries and updates
- MySQL introduction (2) : indexes
- MySQL Introduction (3) : Transaction isolation
- MySQL > lock MySQL > lock
- MySQL Introduction (5) : Replication
Abstract
In this article, I will start by talking about what transactions are and why they are necessary.
Then, I’ll take a look at the four different levels of transaction isolation in InnoDB, what it solves, and what it causes.
Finally, I’ll talk about InnoDB’s approach to high-concurrency transactions: multi-version concurrency control.
What is a transaction
A typical example of A transaction is A bank transfer: Suppose both A and B have balances of $100, and A transfers $50 to B. So our operation process is like this:
- Query the balance of A and save it in
balance
And judgebalance
Is it more than 50 yuan - If yes, then
balance
Subtract $50, write it back to the database, and then add $50 to B’s balance, write it back to the database - If not, insufficient balance is returned
Then the problem comes. After the query in the first step, if we make another transfer immediately and the balance of A is still 100 yuan, which is greater than 50, the system judges that the balance is sufficient and the transfer is successful. But when I write back to the database, A’s balance is still $50, and B’s balance is now $200.
I believe you can see that the core of the problem is that this process has been “interfered” by someone, and the transfer process has not been completed quietly and undisturbed.
It is because we want our business logic to be left undisturbed that we have “transactions”.
So what are the conditions for a transaction?
I’m sure you’ve heard the expression ACID, more or less.
1. Atomicity: In the usual sense, Atomicity refers to the indivisibility of a statement. But in a transaction, it means that all statements that make up the transaction must be executed or rolled back.
2. Consistency: The Consistency here is a little different from the data Consistency we said. When we talk about data consistency, we generally mean that the data in MySQL and Redis is consistent, or the data in the primary and secondary MySQL libraries is consistent. But in this case, it usually refers to whether the transaction produces an unexpected intermediate state or result. For example, in the bank transfer example above, the total balance of two people before the transfer was 200 yuan, but after the transfer, it became 250 yuan. That’s not consistent.
3. Isolation: As the name implies, Isolation means that transactions should not affect each other. In MySQL, transaction isolation is divided into four levels, which we’ll cover in more detail later.
4. Duration: This is easy to understand, if a transaction is committed, the data must be saved and not lost.
2 The isolation level of the transaction
The isolation level of a transaction ranges from low to high, with read uncommitted, read committed, repeatable read, and serialized.
At each level of isolation, problems can arise: dirty reads, unrepeatable reads, and phantom reads.
For example, suppose we have a table with only two fields and insert the following data:
CREATE TABLE `t`(
id int,
v int,
PRIMARY KEY (`id`)
)ENGINE=InnoDB;
insert into t(id, v) values(0, 0)
Copy the code
Note that all of the following is based on table T with only one row (0, 0).
2.1 Read Uncommitted
The isolation level of the transaction is read uncommitted:
As can be seen from the figure: at T3, transaction A searched for data (0, 1), but later transaction B rolled back, resulting in the row of data (0, 1) is wrong, which is called dirty read.
The root cause of the problem is that transaction A reads data that transaction B has not committed, which is also the problem with transaction isolation level read uncommitted. Such a transaction isolation level, which only guarantees atomicity but does not guarantee isolation, is the lowest transaction isolation level.
2.2 Read Submitted
Knowing that the above problem is due to a transaction reading data that has not yet been committed, we make the isolation level of the transaction read committed, that is, only committed transactions can be read at this point. So let’s see what happens when we do this:
We know that only committed data can be found in read committed isolation levels. At time T5, transaction B has committed, and its changes are visible to transaction A.
In other words, at time T5, transaction A finds (0, 1). But the problem is that at time T2 transaction A finds (0,0). This is called a “non-repeatable read” in which the same row of data is searched in the same transaction but different results are obtained.
The problem with reading at the committed transaction isolation level is that there is no guarantee that query results will not change within the same transaction.
2.3 Repeatable
Since we found the problem of not being able to keep the result unchanged in a transaction, we have MySQL copy all the data into a snapshot at the moment the transaction starts, and then have all the lookups performed on this snapshot. This way, all queries are consistent within the same transaction.
Such transaction isolation levels are called repeatable reads.
Note that copying all the data into a snapshot is not accurate, as this would double the amount of storage required for each transaction started, which is obviously not possible. But you can understand it this way, and I’ll explain to you later how MySQL does the “snapshot” feature.
So what might go wrong at the repeatable read isolation level?
At time T2, the result of transaction A looks like this:
id | v |
---|---|
0 | 0 |
It is worth noting that we also inserted a row of v 0 into transaction B at time T3, but since we are using the isolation level of repeatable reads, it can be inferred that the lookup at time T5 will not find the newly inserted row.
In other words, at time T5, the query result is the same as at time T2:
id | v |
---|---|
0 | 0 |
But here’s the problem. Transaction A does not know the existence of transaction B. When transaction A finds that there is no data with id 1 and v 0, transaction A tries to insert the data with id 1 and V 0.
ERROR 1062 (23000): Duplicate entry '1' for key 'PRIMARY'
Copy the code
The primary key is duplicate. Then transaction A is confused: why can’t I insert this row of data even though IT doesn’t exist?
This is illusory reading.
The causes and solutions will be mentioned in the next article, but let’s move on to the most stringent transaction isolation level.
2.4 serialization
Serialization, as the name implies, means that transactions must be executed serially.
To be more specific, all transactions that involve writes must be executed sequentially.
Unlike other isolation levels, in serialization, read operations are shared locks and follow a strict two-phase locking protocol.
In serialization, because transactions are executed sequentially, the problems mentioned above are not possible. The problem, however, is that when a transaction is serialized, MySQL can no longer process transactions concurrently, resulting in poor performance.
I’ll cover locks in the next article.
3 Multi-version concurrency control
I mentioned the term “snapshot” in 2.3 Repeatable Readability.
However, this is not accurate because iT is impossible for MySQL to back up all the data at the same time a transaction starts.
I’m ready to introduce InnoDB’s Multi-version Concurrency Control (MVCC).
First, clarify two concepts:
First, each transaction is assigned an ID at startup. This ID is assigned by InnoDB and is incremented.
Second, InnoDB adds three fields to each row of the database. DB_TRX_ID specifies the transaction ID of the row to insert or update. DB_ROLL_PTR is a pointer to older versions of undo log data. DB_ROW_ID is an incrementing row ID.
Let’s start with this picture:
Once again, table T has two fields, id and V. It then adds the pointer and transaction ID fields InnoDB automatically adds and omits the row ID field.
The data row outside the top dotted box represents the latest data with ID 0, where V is 4, changed by the transaction with ID 50.
Looking down, in this latest data, the pointer points to a row with ID 0 and V 3 that was changed by a transaction with ID 44.
InnoDB records the id of the transaction in which the row is updated in the transaction ID field, and then fills the pointer field with the memory address of the original data. That is, InnoDB can find the change history of this row and the transaction ID that generated the record based on the pointer address here.
So what does this have to do with what we call snapshots?
Assuming the transaction isolation level is “repeatable read”, when a transaction is started InnoDB internally generates an array of ids of all transactions that are currently active (that is, still executing and not committed) and sorts them.
So when the current transaction executes the search statement, each row of data found will be judged as follows:
- If the transaction ID of the row is less than the minimum value in the array, then the row has been updated before the transaction starts and can be returned directly
- If the transaction ID of the row is greater than the maximum value in the array, then the row was started and modified after the current transaction. The row is not visible, and the pointer is used to find the next row until the condition is met
- If the transaction ID of this row of data is in the middle of the maximum and minimum values in the array, then it is necessary to determine whether the transaction ID of this row of data is in the array. If yes, it indicates that the transaction is still active. Pointer should be used to find the last data. Otherwise, the transaction is committed and the data can be returned directly
Let’s look at an example:
Assume that table T already has a row of data, id=0, v=1, inserted by a transaction with ID 100.
Then assume that transaction A’s ID is 101, transaction B is 102, and transaction C is 103.
At T4, transaction C updates the row with the following historical version of the data:
Then at time T6, transaction B is ready to update the row. Note that when updating, regardless of the historical version of the data, always update the latest row of data. This is called “current read”, meaning InnoDB updates, inserts and deletes are independent of snapshots and must be updated with the latest data. The content of this part will be further introduced in the next section.
So, it became this:
And then it’s time T7, ready to read.
When transaction B is started, transaction C has not been started, so the array is [101, 102], and the version of the data read is 102, which is the update made by transaction B itself, so this row of data meets the requirements and returns.
At time T8, transaction A is ready to read. When transaction A starts, the array is [101], and the current data transaction ID is 102, greater than 100, so it does not meet the requirements, so we need to find the last data.
However, the id of the previous row is 103, which is also greater than 101, so it does not meet the requirements.
Finally, the row with transaction ID 100 is found and returns.
To put it simply:
- Uncommitted is not visible
- Not visible, committed after the current transaction started
- Current transaction modified, visible
- Committed before the current transaction starts, visible
The analysis above is based on “repeatable reads”, that is, views are created at the moment a transaction is started. In fact, “read committed” means the same thing, except that the consistency view is created not immediately after the transaction starts, but before each SELECT statement (also known as a consistent read).
It should also be added that all historical versions of data are saved in undo log, and InnoDB will decide when they are no longer needed to clean up the data to free up space.
In addition, all undo log updates are stored in the redo log.
Write in the last
First of all, thank you for being here!
This article has been a long time, sorry, there are too many things recently.
This article was originally intended to write “transaction isolation and locking”, but I found that there was too much content, so I decided to finish this article and write about transaction isolation and lock in the next article.
If there is something I understand wrong in this article, or I am not clear enough, welcome to exchange and learn together!
I’ll send you the next one soon. It won’t be a pigeon. (Laughter)
PS: If you have other questions, you can also find the author on the official account. And, all articles will be updated in the public account at the first time, welcome to find the author to play ~