Preface

When it comes to database transactions, many people think of the four ACID properties (atomicity, consistency, isolation, durability) and the transaction isolation levels. However, it is often unclear why these four properties are what guarantees a transaction, or how the isolation levels came about. Even the introduction to transactions in High Performance MySQL is quite abstract. I happened to see the MOOC video "Principle and Practice of Online Distributed Database", which is quite good, so I am summarizing and sharing these concepts here. I believe you will find them easy to understand after reading this article.

Introduction to transactions

When we write Java programs and run into concurrency problems, we reach for locks. How do we solve concurrency problems in a database? The answer is transactions, which are essentially a combination of locking and concurrency control.

For example, if Bob transfers $100 to Smith, the following operations take place.

We need to lock Bob's and Smith's accounts so that only one thread can operate on them at a time.

For example, when thread 1 enters, threads 2 and 3 wait outside, so only thread 1 can see the intermediate state. If $100 has been subtracted from Bob's account but not yet added to Smith's, then at that moment Bob's account is $0 and Smith's account is $0; threads 2 and 3, however, can only ever see either Bob with $100 or Smith with $100. This is how the transaction guarantees consistency: either Bob has the $100 or Smith has the $100, and no intermediate state is visible.
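To make this concrete, here is a minimal Java sketch (the Account class and the amounts are hypothetical illustrations, not from any real API): the transfer locks both accounts, so no other thread can ever observe the intermediate "Bob = 0, Smith = 0" state.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical in-memory account; a real system would use database rows.
class Account {
    final String owner;
    long balance;
    final ReentrantLock lock = new ReentrantLock();

    Account(String owner, long balance) {
        this.owner = owner;
        this.balance = balance;
    }
}

public class Transfer {
    // Locks both accounts, so the "Bob = 0, Smith = 0" intermediate
    // state is invisible to every other thread.
    static void transfer(Account from, Account to, long amount) {
        from.lock.lock();
        to.lock.lock();
        try {
            from.balance -= amount; // intermediate state exists here...
            to.balance += amount;   // ...but no other thread can see it
        } finally {
            to.lock.unlock();
            from.lock.unlock();
        }
    }

    public static void main(String[] args) {
        Account bob = new Account("Bob", 100);
        Account smith = new Account("Smith", 0);
        transfer(bob, smith, 100);
        System.out.println(bob.balance + " / " + smith.balance); // 0 / 100
    }
}
```

Releasing the locks only after both balance changes are applied is exactly what makes the two updates look atomic to other threads.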

Suppose we now have three transaction units: Bob gives Smith $100, Smith gives Joe $100, and Smith gives Bob $100.

Because they all lock a common account, each transaction must queue until the previous one completes, which gives very poor performance.

Predecessors summarized the happens-before relationships between transaction units. There are only four: read-read, read-write, write-read, and write-write. How can transactions preserve the logical order of these four relationships while completing as fast as possible?

Queuing method (serialized read and write)

Put all requests in one queue and execute them from head to tail. Advantage: no conflict control is required. Disadvantage: nothing ever runs in parallel.
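As a sketch, the queuing method is just a single worker thread draining one queue; modeling each request as a Runnable task here is my simplification:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SerialExecution {
    // One worker thread == one queue: requests run strictly head to
    // tail, so no two transactions ever overlap and no locks are needed.
    static final ExecutorService queue = Executors.newSingleThreadExecutor();

    static void submit(Runnable transactionUnit) {
        queue.submit(transactionUnit);
    }
}
```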

Exclusive locking (controlling access to the same unit)

We used to put all requests in one queue. Can we put them in different queues instead? For example, if Bob gives Smith $100 and Joe gives Lisa $100, these two transaction units touch completely different accounts and can run fully in parallel. How? Just lock the data each transaction actually touches.

If Bob gives Smith $100 and Smith gives Bob $100, however, the two transactions cannot run in parallel, because they share data units.
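One subtlety worth noting (my addition, not from the original text): with the naive two-lock transfer sketched earlier, Bob-to-Smith and Smith-to-Bob running concurrently can deadlock, each thread holding one lock and waiting for the other. A common fix is to always acquire locks in a fixed global order. Disjoint transfers still run fully in parallel because their lock sets do not overlap. A sketch, reusing the hypothetical Account class from above:

```java
public class OrderedTransfer {
    // Always acquire account locks in a fixed global order (here, by
    // owner name), so Bob->Smith and Smith->Bob cannot deadlock, while
    // Bob->Smith and Joe->Lisa still run fully in parallel because
    // their lock sets are disjoint.
    static void transfer(Account a, Account b, long amount) {
        Account first = a.owner.compareTo(b.owner) <= 0 ? a : b;
        Account second = (first == a) ? b : a;
        first.lock.lock();
        second.lock.lock();
        try {
            a.balance -= amount;
            b.balance += amount;
        } finally {
            second.lock.unlock();
            first.lock.unlock();
        }
    }
}
```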


Read-write lock

As mentioned above, there are only four happens-before relationships between transactions: read-read, read-write, write-read, and write-write. If read locks and write locks are separated, reads can run in parallel with each other, which improves parallelism for workloads that read a lot and write a little.
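Java's ReentrantReadWriteLock implements exactly this separation; a minimal sketch of a balance store protected by one:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class Balances {
    private final Map<String, Long> balances = new HashMap<>();
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();

    // Many threads may hold the read lock at once: read-read is parallel.
    public Long read(String owner) {
        rw.readLock().lock();
        try {
            return balances.get(owner);
        } finally {
            rw.readLock().unlock();
        }
    }

    // A writer needs exclusive access: read-write, write-read, and
    // write-write pairs still serialize.
    public void write(String owner, long amount) {
        rw.writeLock().lock();
        try {
            balances.put(owner, amount);
        } finally {
            rw.writeLock().unlock();
        }
    }
}
```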

MVCC

Mainstream databases today implement MVCC (multi-version concurrency control), which is essentially copy-on-write: writes do not block reads. Under MVCC, read-read, read-write, and write-read are all conflict-free; the only remaining conflict is write-write, so the system can sustain very high read concurrency.

MVCC provides a point-in-time consistent view. Under MVCC concurrency control, a read transaction usually uses a timestamp or transaction ID to mark the state (version) of the database it reads, and reads the data of that version. Read and write transactions are isolated from each other and need no locks: under concurrent reads and writes, a write operation creates a new version based on the current database state, while concurrent reads continue to access the old version.

In short, MVCC achieves concurrency control by temporarily retaining multiple versions of the same data.
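A toy copy-on-write sketch of this idea in Java (a deliberate simplification; real engines keep per-row version chains and transaction IDs, not whole-database snapshots):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

public class MvccToy {
    // The "current version": an immutable snapshot, swapped atomically.
    private final AtomicReference<Map<String, Long>> current =
            new AtomicReference<>(new HashMap<>());

    // A read transaction pins one version at its start and reads it for
    // its whole lifetime: a consistent point-in-time view.
    public Map<String, Long> beginRead() {
        return current.get();
    }

    // A write copies the current version, modifies the copy, and
    // publishes it. Ongoing readers still see the old version, so
    // writes never block reads. Write-write conflicts would need extra
    // handling (e.g. compare-and-set in a retry loop); here a simple
    // lock serializes writers.
    public synchronized void write(String key, long value) {
        Map<String, Long> next = new HashMap<>(current.get());
        next.put(key, value);
        current.set(next);
    }
}
```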

A deep dive into single-machine transactions

Let's look at a transaction's ACID properties (atomicity, consistency, isolation, durability).

Atomicity: all operations in a transaction either succeed together or fail together. Let's take the example of Bob's transfer to Smith again.

As shown in the figure above, all the states are listed. Take the second step as an example: the account Smith does not exist at transfer time, so the state must be rolled back to ver1. In fact, for every operation the database records a rollback segment (the undo information in the figure above); when a rollback is needed, it is carried out according to that undo information. If the ver3 transaction's commit times out, ver3 is rolled back to ver2, and ver2 can then be rolled back to ver1. This is the atomicity of the transaction: either Bob has $100 and Smith has $0, or Bob has $0 and Smith has $100.
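A rough sketch of the undo idea described above (the key-value store and Runnable-based log are illustrative simplifications, not how a real engine stores undo segments): before each change the inverse operation is recorded, and rollback replays the log in reverse to walk ver3 back to ver2 back to ver1.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class UndoLog {
    private final Map<String, Long> store = new HashMap<>();
    // Each entry remembers how to restore one prior version.
    private final Deque<Runnable> undo = new ArrayDeque<>();

    public void set(String key, long value) {
        Long old = store.get(key);
        // Record the inverse operation before applying the change.
        undo.push(() -> {
            if (old == null) store.remove(key);
            else store.put(key, old);
        });
        store.put(key, value);
    }

    // Roll back ver3 -> ver2 -> ver1 by replaying undo info in reverse.
    public void rollback() {
        while (!undo.isEmpty()) {
            undo.pop().run();
        }
    }
}
```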

Atomic semantics only guarantee a rollback segment that can roll the state back to the previous version. Next comes consistency. Why is consistency needed? Here is another example building on the one above.

If at ver2 another process sets Smith's balance to $300, then when transaction 1 rolls back it resets Smith to $0, and Smith's $300 is gone. By definition, atomicity does not care about this: it only records undo logs and rolls back.

The core of consistency is the happens-before relationship.

As shown in the figure, when multiple transaction units execute, there are three viewpoints: one transaction happens before or after another (viewpoints 1 and 2), or two transactions happen simultaneously (viewpoint 3). As mentioned above, when transaction 1 has executed up to ver2, transaction 2 modifies the Smith account. To avoid inconsistent data when transaction 1 rolls back, transaction 1 locks what it operates on, which effectively moves the request from viewpoint 3 up to viewpoint 1, because the request at viewpoint 3 cannot acquire the locks on Bob and Smith and is forced to wait.

This is essentially a process of queuing all requests, though not literally a single queue, because the locks are pushed down onto each piece of data. As mentioned above, consistency guarantees that all changes in the system are seen in order, but at this price the system loses concurrency: if one transaction locks both accounts, all other operations on those two accounts cannot run in parallel and must wait outside. So the system has to trade consistency away for another concept: isolation.

Isolation: for performance, consistency is deliberately weakened, which produces the four isolation levels: SERIALIZABLE, REPEATABLE READ, READ COMMITTED, and READ UNCOMMITTED.
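In JDBC these four levels map directly onto constants of java.sql.Connection; a sketch (the URL and credentials are placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class IsolationDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "user", "password")) {
            // Pick one of the four levels discussed above.
            conn.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);
            conn.setAutoCommit(false);
            // ... run queries and updates ...
            conn.commit();
        }
    }
}
```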

Exclusive lock

To guarantee consistency, the happens-before relationships between transactions must be preserved. But guaranteeing absolute strong consistency of a transaction unit means queuing all transactions, which is the SERIALIZABLE isolation level: with an exclusive lock, only one transaction can enter a unit at a time. The performance is, of course, very low.

Read-write lock

Read-write locks give rise to two isolation levels, REPEATABLE READ and READ COMMITTED. How do these two levels come about?

Read-write locks involve an important concept (or rather, a design choice): whether a read lock can be upgraded by a write lock. That is, when a transaction unit holds a read lock and a new write arrives, should the read lock be released so the write can enter?

When the read lock cannot be upgraded by a write lock, only read-read parallelism is achieved; this is the REPEATABLE READ level. That alone does not improve system performance enough, so there is another isolation level: READ COMMITTED.

Under the READ COMMITTED isolation level, a read lock can be upgraded by a write lock. For example, when a transaction unit holds two read locks and a new write arrives, the write request is allowed to upgrade the read locks to a write lock. This achieves both read-read and read-write parallelism.


To see the difference between REPEATABLE READ and READ COMMITTED: if two reads within a transaction run in parallel, the second read returns the same data the first read saw; this is a repeatable read. But if reads and writes can run in parallel, the following happens: the first read sees version 1 of the data, a parallel write then updates it, and reading the data again may return a different version. This is the non-repeatable read.
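The difference can be observed from two JDBC connections; a sketch, assuming a hypothetical accounts table with a row for Smith (the URL, credentials, and schema are placeholders). Under READ COMMITTED, the second read inside transaction 1 sees transaction 2's committed update; under REPEATABLE READ it would not.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class NonRepeatableRead {
    static final String URL = "jdbc:mysql://localhost:3306/test"; // placeholder

    public static void main(String[] args) throws Exception {
        try (Connection t1 = DriverManager.getConnection(URL, "user", "password");
             Connection t2 = DriverManager.getConnection(URL, "user", "password")) {
            t1.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
            t1.setAutoCommit(false);

            printBalance(t1); // first read: e.g. 100

            // Transaction 2 updates and commits between the two reads
            // (autocommit is on, so the update commits immediately).
            try (Statement s = t2.createStatement()) {
                s.executeUpdate("UPDATE accounts SET balance = 300 WHERE owner = 'Smith'");
            }

            printBalance(t1); // 300 under READ COMMITTED; under
                              // REPEATABLE READ it would still be 100
            t1.commit();
        }
    }

    static void printBalance(Connection c) throws Exception {
        try (Statement s = c.createStatement();
             ResultSet rs = s.executeQuery(
                     "SELECT balance FROM accounts WHERE owner = 'Smith'")) {
            if (rs.next()) System.out.println(rs.getLong(1));
        }
    }
}
```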

Write locks only, no read locks

Under the READ UNCOMMITTED isolation level, only write locks are taken and no read locks at all. This achieves read-read, read-write, and write-read parallelism.


Problem: data can be read in the middle of a write. Because reads take no locks and writes take only a write lock, uncommitted internal data can be read. This isolation level is therefore generally not used, because the intermediate state is visible.
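A dirty read can be demonstrated the same way as the sketch above, again assuming the hypothetical accounts table: transaction 1 runs at READ UNCOMMITTED and transaction 2 updates without committing.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DirtyRead {
    static final String URL = "jdbc:mysql://localhost:3306/test"; // placeholder

    public static void main(String[] args) throws Exception {
        try (Connection t1 = DriverManager.getConnection(URL, "user", "password");
             Connection t2 = DriverManager.getConnection(URL, "user", "password")) {
            // At this level, transaction 1 takes no read locks at all.
            t1.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);

            // Transaction 2 writes but does NOT commit.
            t2.setAutoCommit(false);
            try (Statement s = t2.createStatement()) {
                s.executeUpdate("UPDATE accounts SET balance = 300 WHERE owner = 'Smith'");
            }

            printBalance(t1); // dirty read: sees the uncommitted 300
            t2.rollback();    // the 300 is undone...
            printBalance(t1); // ...and the old value is back
        }
    }

    static void printBalance(Connection c) throws Exception {
        try (Statement s = c.createStatement();
             ResultSet rs = s.executeQuery(
                     "SELECT balance FROM accounts WHERE owner = 'Smith'")) {
            if (rs.next()) System.out.println(rs.getLong(1));
        }
    }
}
```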

Durability: after the transaction completes, the changes it made to the database are persisted permanently in the database.

Afterword

Finally, the definitions of the isolation levels from High Performance MySQL are attached for further understanding.

READ UNCOMMITTED

At the READ UNCOMMITTED level, changes in a transaction are visible to other transactions even before they are committed. A transaction can read uncommitted data, which is known as a dirty read. This level causes many problems; READ UNCOMMITTED is not much better than the other levels in terms of performance, yet it lacks many of their benefits, and it is rarely used in practice unless there is a truly compelling reason.

READ COMMITTED

The default isolation level for most database systems is READ COMMITTED (MySQL's is not). READ COMMITTED satisfies the simple definition of isolation mentioned earlier: at the start of a transaction, only changes made by already-committed transactions can be "seen". In other words, any change a transaction makes between its start and its commit is invisible to other transactions. This level is sometimes called nonrepeatable read, because executing the same query twice may return different results.

REPEATABLE READ(REPEATABLE READ)

REPEATABLE READ solves the dirty read problem. This isolation level guarantees that multiple reads of the same record within the same transaction are consistent. In theory, however, REPEATABLE READ still does not solve another problem: phantom reads. A phantom read occurs when a transaction reads records in a range, another transaction inserts new records into that range, and the first transaction's next read of the same range produces phantom rows. The InnoDB and XtraDB storage engines solve the phantom read problem with multiversion concurrency control (MVCC).

SERIALIZABLE

SERIALIZABLE is the highest isolation level. It avoids the phantom read problem mentioned above by forcing transactions to execute serially. In short, SERIALIZABLE places a lock on every row it reads, which can cause many timeouts and much lock contention. This isolation level is also rarely used in practice; consider it only when data consistency absolutely must be guaranteed and no concurrency is acceptable.

| Isolation level  | Dirty reads possible | Non-repeatable reads possible | Phantom reads possible | Read locking |
|------------------|----------------------|-------------------------------|------------------------|--------------|
| READ UNCOMMITTED | Yes                  | Yes                           | Yes                    | No           |
| READ COMMITTED   | No                   | Yes                           | Yes                    | No           |
| REPEATABLE READ  | No                   | No                            | Yes                    | No           |
| SERIALIZABLE     | No                   | No                            | No                     | Yes          |