Transaction isolation of the database

Some time ago, the company encountered an internal problem, that is, the same batch of tasks we created were assigned to different instances to execute, resulting in problems in online results. The leader of the other group says that the transaction is not enabled and that it is ok to set up the transaction. As soon as there is a consistency problem in the database, it is said to start the transaction. I have a little doubt that the transaction of the database in the end is how to ensure consistency.

In the following content, we can first consider a few questions.

What are the isolation levels for the database? How is the MVVC view of the database implemented? What problem is the isolation level of the database designed to solve?

After reading the above three questions, how many can you answer? No hurry. Let’s move on

Database transactions

Database transactions are simply used to ensure correctness of data, and have only two operations: transactions either succeed or fail and are rolled back.

Why do you do that? This is because normally when we do a transaction, we do a set of operations. Like your usual financial transfers.

There are two operations in this transfer transaction:

  • Withdrawing money from your own bank account
  • Add the received money to the corresponding account.

Now think about what would have happened if we had not added transactions.

  1. If the first deduction is successful, the execution fails to add money to others. But the money has been deducted, the other party did not receive the money, you say what to do?
  2. If the first to the other party to add money, and deduct your money did not deduct success. Is this money subsidized by the bank? Hey, hey, that bank must not be happy.

Therefore, we can only use transactions in this operation, to ensure the success or failure of the execution, the failure of the rollback, to ensure that the withholding operation is not executed.

The transaction of the ACID

Transactions have four properties, called ACID for short

  • Atomicity: The same set of operations, either performed or not performed. The success of one or two operations in a group does not represent success. This is atomicity, do or do not (fail to roll back).
  • Consistency: data Consistency, like the above example said, you deduct money, the other party did not add money, that certainly not.
  • Isolation: When multiple databases operate the same data, they cannot affect each other. You can’t change the data space when you change it.
  • Persistent: Changes are permanent after transaction results are submitted and cannot be lost by subsequent operations or system failures.

The main thing we’re talking about today is isolation. How do transactions ensure isolation between data

The isolation level of the transaction

Different transaction isolation levels correspond to different data execution efficiencies. The more stringent the isolation, the less efficient the execution. The following four isolation levels are the more stringent the original.

  • Read Uncommitted: When data is not committed at the time of a transaction, other transactions can see the result
  • Read Committed: Index data cannot be seen by other transactions until its transaction commits. A view is created when the SQL statement is executed
  • Repeatable read: During the execution of a transaction, the results seen are the same as those seen when the transaction was started. A view snapshot is created when the transaction is started. In this transaction state, the contents of the view snapshot are viewed consistently, and other transaction changes are not seen. Note that the process is read, if it is update, then the current read is used, that is, the update operation of other transactions will get the result, which is used to ensure data consistency
  • Serializable: As the name implies, multiple transactions can be serialized (lock, read/write conflict, read/write conflict). After one transaction is completed, another transaction can be executed, resulting in poor parallelization and low efficiency.

The default transaction isolation level in Mysql is repeatable read. Use the following command to check the current transaction level.

show variables like 'transaction_isolation'; The following statement modifies the transaction level.SETsession TRANSACTION ISOLATION LEVEL Serializable; (Parameters can be Read uncommitted, Read committed, Repeatable, Serializable)Copy the code

How a transaction is started

In our applications, we often default to auto-commit, meaning that one SQL operation is one transaction, but sometimes we need to combine multiple SQL operations and explicitly enable transactions.

Use begin or start transaction to display the opened statement. At the end of the transaction, commit with COMMIT and roll back with rollBAKc if the transaction fails.

Set autocommit=0, the transaction will not commit automatically, and we need to execute the commit manually.

How to avoid long transactions

After turning off auto-commit transactions, we need to commit transactions ourselves, which is how each statement is executed.


begin
sqlstatementscommit
Copy the code

If we write a program, originally a SQL to solve the operation, as a result, forget to commit the transaction, the next SQL to commit, so that there will be a long transaction.

However, long transactions tend to cause a lot of congestion and lock timeout. If there are read/write operations (read/write conflicts, write conflicts) in transactions, data will be locked, and other transactions will also have to wait.

Therefore, we should try to avoid using large transactions in our programs, and also avoid the occasional large transaction (error 😁) when we write programs.

The solution is that we will automatically commit to open the transaction, which will be displayed when the transaction needs to be used.

What if there are a lot of transaction waits in the program

It is very convenient to locate a long transaction problem in MySQL.

First let’s find out what the long transaction being executed is.

select t.*,to_seconds(now())-to_seconds(t.trx_started) idle_time from INFORMATION_SCHEMA.INNODB_TRX t \G
Copy the code

This statement will show the start time of the transaction execution, we can easily calculate the current transaction execution time, where the above idle_time is the transaction execution time

Assuming that all transactions that exceed 30 seconds are long, we can use the following statement to filter transactions that exceed 30 seconds.

select * from information_schema.innodb_trx where TIME_TO_SEC(timediff(now(),trx_started))>30
Copy the code

Long transactions can be located at information_schema.innodb_trx.

To determine whether the long transaction should be killed or continue to wait. If a deadlock occurs, it is handled similarly, finding the statement that is deadlocked, and then executing (somewhat violently) that kills one of the statements.

How is data isolation implemented

Note: This data isolation is set up in a repeatable read scenario

In the repeatable read scenario, we know that each time a transaction is started, a view is currently started, and this view is a snapshot of the view of the entire database.

Why don’t we feel the elapsed time of creating a snapshot?

This is because view snapshots created by databases take advantage of the ability to create view snapshots quickly by having multiple versions of all data.

What about multiple versions of data?

Prepare the data

Hold your horses. Let’s get the data ready.

Now create a table and insert three pieces of data.

create table scores
(
    id    int   not null
        primary key,
    score float null
);
INSERT INTO scores (id, score) VALUES (1.3.5);
INSERT INTO scores (id, score) VALUES (2.3.65);
INSERT INTO scores (id, Score) VALUES (3.4);
Copy the code

There are two things we need to know before we start using it. Begin/Start Transaction and Start Transaction with consistent Snapshot.

  • Begin/Start Transaction views are created only after begin/start transaction, as in the following example

    begin
    select source from scores;  //The view is created from here instead of herebeginThere to createcommit
    Copy the code
  • Start Transaction with consistent snapshot: The view is created after this statement is executed.

Now that we understand the differences between the two created transactions, let’s look at how views can create multiple versions of data. The following SQL opens in both Windows.

A transaction Transaction B The results of
start transaction with consistent snapshot Start the transaction and create the view
start transaction with consistent snapshot Start the transaction and create the view
select score from scors where id =2 The value in transaction A is 3.65
update scores set scores = 10 where id =2 Transaction B is changed to 10
select score from scores where id =2 Transaction B is shown as 10
select score from scores where id =2 Transaction A is shown as 3.65
select score from scores where id =2 for update Will be locked, waiting for transaction B to release the lock (gap lock)
commit Commit transaction B
select score from scores where id =2 for update This statement becomes 10 (using the current read)
select score from scores where id =2 Without “for update” the result would still be 3.65
commit

The above process is two different requests to the same table of the database different operations.

When transaction A executes start Transaction with consistent Snapshot, A’s view is created, and no changes are made by transaction B. Even after transaction Bcommit, as long as transaction A does not end, it will see the value of its start time.

This corresponds to the statement that the results seen during execution are the same as those seen at startup without repeated commits.

Multiple Snapshot Versions

A snapshot is a transaction that starts on the entire database, and the entire database is large, so how does MYSQL allow us to create a snapshot quickly without feeling.

Snapshot multiple versions you can think of as consisting of the following two parts.

  • Transaction ID: This is requested from InnoDB startup when the transaction is started. And notice that it’s increasing.
  • Row trx_id: Specifies the transaction id of the data version, which is assigned to row trx_id each time a transaction is updated.

When there are multiple data versions of a single record, there are multiple row trxids. For example

version value Transaction ID The corresponding statement operation
v1 score =3 89
v2 score =5 90 update scores set score = 5 where id =3; select score from scores where id =3;
v3 score = 6 91 update scores set score = 6 where id =3;

V1 ->v2-> V3 There are three iterations involved. In the middle is through the undo log to save the updated record.

Note that after the snapshot is started, the value of v1 is obtained in the case of repeatable read isolation. The value is not directly stored by MYSQL, but calculated by using the latest version of the current record and the undo log. For example, score in v1 is calculated by using v3 ->v2 -> v1.

Version to calculate

The MYSQL version is not that simple to calculate, so let’s see how it works.

There are two points that we should pay attention to:

  • When a transaction is started, it requests a transaction ID from InnoDB’s transaction system, which is strictly incremented.
  • Row trx_id ==row trx_id; row trx_id ==row trx_id;
  1. When a transaction starts, the results of all committed transactions are visible, but after it starts, changes in other transactions are invisible.

  2. At the moment a transaction is started, in addition to the transaction that has been committed, there will be a transaction that is running at the moment it is created. MYSQL puts the ids of these running transactions into an array. The smallest transaction ID in the array is denoted as low watermark, and the maximum transaction ID +1 created in the current system is denoted as high watermark.

    Let’s take a simple example.

    MYSQL > create array with transaction ID = 8; MYSQL > create array with transaction ID = 9;

    B. If the transaction ID is lower than the low watermark, the transaction ID is visible to the current transaction. If the transaction ID is higher than the high watermark, the transaction ID is invisible to the current transaction. There are two cases where transaction ids are between low and high water levels.

    • If the transaction ID is in an active array, the version is executing, but the result has not yet been committed, so the transaction changes are not visible to the normal transaction.
    • If the transaction ID is not in the active array, it indicates that the transaction has already been committed, so it is visible. For example, there are 90 transactions, 91 transactions, 92 transactions, 91 transactions, 91 transactions, 91 transactions. A new transaction with id 93 is created, so the transaction in the active array is 90,92,93. As you can see, 91 has already committed, and its transaction is still between the low and high levels, but the result is visible to 93.

The general idea is that the result of your transaction is committed when I create it, so it’s visible, and it’s not visible after it’s committed.

Read process

The data in the old version view is calculated using the latest version and undo log.

A transaction Transaction B The results of
Start Transaction with Consistent Snapshot Transaction ID 89 Start the transaction and create the view
Start Transaction with Consistent Snapshot Transaction ID 92 Start the transaction and create the view
select score from scors where id =2 The value in transaction A is 3.65
update scores set scores = 10 where id =2 Transaction B is changed to 10
select score from scores where id =2 Transaction B is shown as 10
select score from scores where id =2 Transaction A is shown as 3.65
commit

Again, look at the transaction operation. Here’s how the data changes.

  • Assume that there are two active transactions with ID 78,88 before starting.
  • When transaction A starts it’s going to put 78, 88, including itself into the active array.
  • The statement that transaction A operates onselect score from scors where id =2The result it sees is v1, such as row trx_id (** note: row trx_id is assigned to row TRX ID after update) is 86 and saved.
  • When transaction B starts, it will find that the active array is 78,88,89, and its 92.
  • After transaction B executes the update statement, a new version V2 is generated, and the data transformation is V1- >V2. The undo log records intermediate changes. The data stored on ID 89 becomes historical data. The data version row trx_id is 92
  • Row trx_id = 92; row trx_id = 92; V1 = 86; row trx_id = 86; And 89+1 below the high water level. But since 86 is not in the active array and belongs to A committed transaction, the current transaction can see the result, so transaction A gets the read value.

You see, in A few simple steps, we get the transaction data we want to read, so whenever transaction A queries, it gets the same result as it read.

You can see with MVCC(Multi-version concurrency Control) the process of calculating the value of another transaction does not affect the current transaction to read the results.

It is often said not to write a long transaction. As can be seen from the above reading process, if a long transaction exists for a long time, there will be many data versions. Then undo log logs need to be saved for a long time, and these rollback logs will occupy a large amount of memory storage space.

The log can be deleted only when no transaction needs to read the log and version data, freeing up memory.

Update process

A transaction Transaction B The results of
Start Transaction with Consistent Snapshot Transaction ID 89 Start the transaction and create the view
Start Transaction with Consistent Snapshot Transaction ID 92 Start the transaction and create the view
select score from scors where id =2 The value in transaction A is 3.65
update scores set scores = 10 where id =2 Transaction B is changed to 10
select score from scores where id =2 Transaction B is shown as 10
select score from scores where id =2 Transaction A is shown as 3.65
select score from scores where id =2 for update Will be locked, waiting for transaction B to release the lock (gap lock)
commit Commit transaction B
select score from scores where id =2 for update This statement becomes 10 (using the current read)
select score from scores where id =2 Without “for update” the result would still be 3.65
commit

Above said the process of reading, in fact, in the transaction, we also have the update process, the update process is relatively simple, we need to ensure the consistency of data in the update process, can not say that others modify, we can not see, that will cause inconsistent data.

In order to ensure that the latest data is seen, the row is locked (row lock). After the lock, other transactions that update the row must wait for other transactions to commit before obtaining the latest value. This process is called current read.

Select score from scores where id =2 for update;

conclusion

This section mainly combs the isolation level of transactions and the realization principle of MVCC multi-version concurrency control of transactions.

This topic can be varied. The three questions we mentioned at the beginning of the interview can already be answered.

Why don’t you try to answer that?

Next period will talk about the illusion in the database, illusion is often encountered in the interview problem oh.