Introduction and Features

For programmers, programming is to express all kinds of business in the world in code, so the final output of code should be able to correspond to the real world. Let’s take bank transfer as an example. For example, I transfer 100 yuan to my friend A, and the final result is either success, my account balance decreases by 100, my friend’s account balance increases by 100, or failure, both of our balances remain unchanged. It is definitely not allowed that my account balance decreases but my friend’s account balance does not increase, or my friend’s account balance increases but my account balance does not decrease, both of which will make our relationship with the bank very disharmonious.

In this example, the transfer task, while seemingly simple, becomes a bit more troublesome when translated into code. Let’s see what it looks like in code:

At first glance, it seems simple enough that there is no problem, but if you add a step between 2 and 3 to check whether your friend’s account is legitimate, the steps become as follows:

Let’s assume that the third step is wrong, my friend’s account is a money laundering account, and the transfer process is about to end, but the problem is that 100 yuan has been deducted from my account. If I don’t pay back 100 yuan, I will not lose money, so I guess the bank has been destroyed. Therefore, this transfer process must have the following characteristics:

1. Atomicity

Either the transfer is successful, or there is no transfer, there can not be a transfer of half this situation. In other words, all parts of the transfer process are inseparable. No matter how complicated the transfer process is, the final result is either success, my account loses money and my friend’s account adds a corresponding amount of money, or neither my account nor my friend’s account changes. This characteristic is called atomicity.

2. Consistency

Many of the blogs on the web don’t get this right. This feature actually means that the data in the transfer process should be able to correspond with the rules in the real world, or it should meet certain rules, for example, my account must be greater than 0, and the amount deducted must be greater than 0. Failure to meet this constraint indicates that the data is faulty

3. Isolation

At the same time in the real world, there are multiple transfers occurring at the same time, and the data of multiple transfers occurring at the same time cannot affect each other. It will be a little more complicated to clarify this problem. Let’s slowly talk about it, and break down the transfer process of our premise into details as follows:

  • (1) Read the amount in my account. Suppose the amount is 101
  • (2) Deduct 100 from the balance
  • (3) Write the remaining balance to disk
  • (4) Read the amount of a friend’s account
  • (5) Add 100 to your friend’s account
  • (6) Write the new balance to disk

Suppose there are two such transfers going on at the same time. In a computer, because of the multiple processors, the two transfers do not strictly complete one

A time line Transfer 1 Transfer 2
1 Read the amount in my account. Now the amount is 101
2 Read the amount in my account. Now the amount is 101
3 Subtract 100 from the balance
4 Write the remaining balance to disk
5 Subtract 1 from the balance
6 Write the remaining balance to disk
7 Read the amount of a friend’s account
8 Read the amount of a friend’s account
9 Add 100 to your friend’s account
10 Writes the new balance to disk
11 Add 1 to your friend’s account
12 Writes the new balance to disk

Believe that many people have seen problems, because at the time line 2 read or 101 money, so in the sixth place on the disk my balance is 100, this time the bank shall not be death, therefore, the correct process is time line 134 must do these three things, then time line 2 is allowed to occur, the correct process is as follows:

A time line Transfer 1 Transfer 2
1 Read the amount in my account, let’s say the amount is 101
2 Subtract 100 from the balance
3 Write the remaining balance to disk
4 Read the amount in my account, and now the amount is 1
5 Subtract 1 from the balance
6 Write the remaining balance to disk
7 Read the amount of a friend’s account
8 Read the amount of a friend’s account
9 Add 100 to your friend’s account
10 Writes the new balance to disk
11 Add 1 to your friend’s account
12 Writes the new balance to disk

4. Durability

Once the transfer is complete, the data is permanently persisted to disk and not lost.

What we call transcation must have these four features or else there will be problems.

Problems encountered with concurrent read and write transactions

Let’s use changing the balance of my account in the database as an example to see what can go wrong when multiple transactions occur simultaneously:

Dirty write

A time line Transaction 1 Transaction 2
1 start
2 start
3 update user set money=100 where id=1
4 update user set money=50 where id=1
5 commit
6 Select * from user where id=1 select * from user where id=1
7 commit

In this case, transaction 2 modified the data that transaction 1 had not submitted, and transaction 1 was completely confused when it re-queried timeline 6. It was clear that its money should be 100, but it did not know why it changed to 50, thus losing 50 yuan for nothing, and the relationship with the bank became extremely disharmonious.

Dirty read

A time line Transaction 1 Transaction 2
1 start
2 start
3 update user set money=100 where id=1
4 Select * from user where id=1 select * from user where id=1
5 update user set money=50 where id=1
6 commit
7 commit

In this case, transaction 1 first updates my account balance in the database to 100, and then transaction 2 finds that the balance is 100, and tells me that the balance is 100. This kind of data that has not been committed by other transactions is called dirty read. Some people don’t understand where the “dirty” font is now, because it is possible for transaction 1 to continue to modify the money field, such as in timeline 6 above, transaction 1 changed the money field, in which case transaction 2 read the wrong data.

Unrepeatable read

A time line Transaction 1 Transaction 2
1 start
2 start
3 Select * from user where id=1 select * from user where id=1
4 update user set money=50 where id=1
5 Select * from user where id=1 select * from user where id=1
6 commit
7 commit

We call this non-repeatable read when transaction 1 reads data at timeline 5 that has not yet been committed by transaction 2 and is different from the data it reads at timeline 3.

Phantom read

A time line Transaction 1 Transaction 2
1 start
2 start
3 Select * from user where money>100 select * from user where money>100
4 Insert into user (money,id) values (200,2)
5 Select * from user where money>100 select * from user where money>100
6 commit
7 commit

Transaction 1 has different data on timeline 3 and timeline 5, and timeline 5 has read data that Timeline 3 has not read, we call this phenomenon phantom read.

The four phenomena are in order of severity: dirty write > dirty read > Unrepeatable read > phantom read

Isolation level

Innodb transaction isolation level (Innodb transaction isolation level) In MySQL, there are four transaction isolation levels supported:

  • READ UNCOMMITTED: READ is not committed
  • READ COMMITTED: Reads COMMITTED
  • REPEATABLE READ: REPEATABLE READ
  • The SERIALIZABLE: serialization

These are actually the four isolation levels in the SQL standard, which also specifies what can and cannot happen at different isolation levels, as follows:

Isolation level Dirty write Dirty read Unrepeatable read Phantom read
READ UNCOMMITTED possible possible possible
READ COMMITTED possible possible
REPEATABLE READ possible
SERIALIZABLE

As you can see, dirty writing is so severe that it is not allowed at any transaction isolation level.