What is a transaction?

A transaction is the mechanism by which a group of DML statements submitted to the database either all succeed or, if any operation fails, are all rolled back to the original state. The database ensures data integrity and consistency through transactions.
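As a minimal sketch (the accounts table and amounts are hypothetical), the classic transfer example shows this all-or-nothing behavior:

START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;  -- on any error, issue ROLLBACK instead and neither update takes effect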

ACID

A complete transaction must exhibit four characteristics: atomicity, consistency, isolation, and durability.

  • Atomicity: The transaction executes as a whole; the operations on the database it contains are either all performed or none are.
  • Consistency: A transaction moves the database from one consistent state to another. A consistent state is one in which the data satisfies all integrity constraints.
  • Isolation: When multiple transactions execute concurrently, the execution of one transaction does not affect the execution of the others.
  • Durability: Changes made to the database by a committed transaction are stored permanently.

Problems with concurrent transactions

In a multi-threaded environment, multiple threads may modify the same data at the same time, leading to conflicting operations. According to concurrency control theory, there are two ways to resolve such conflicts:

  • Avoid conflicts, for example through serialization or locking.
  • Allow conflicts by keeping multiple versions of the data, for example with MVCC.

Concurrent transactions also raise the question of isolation: when multiple transactions execute at the same time, can the data changes made by one transaction be observed by another? Database theory defines a set of terms to describe the anomalies that can occur:

  • Dirty read: One transaction reads uncommitted data from another transaction, which often leads to inconsistent data.
  • Non-repeatable read: The same row is read twice within one transaction and returns different data.
  • Phantom read: Within one transaction, the same statistical or aggregate query returns inconsistent results, because another transaction has inserted or deleted matching rows in the meantime.

To address these problems, databases define transaction isolation levels. As the table below shows, the stronger the consistency guarantee, the lower the achievable concurrency.

Isolation level      Dirty read      Non-repeatable read      Phantom read
Read Uncommitted     Possible        Possible                 Possible
Read Committed       Not possible    Possible                 Possible
Repeatable Read      Not possible    Not possible             Possible
Serializable         Not possible    Not possible             Not possible
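As a hedged illustration (the accounts table is hypothetical), the following two sessions produce a non-repeatable read under Read Committed; under Repeatable Read, the second SELECT in session A would still return the original value:

-- Session A
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1;  -- returns, say, 100

-- Session B
UPDATE accounts SET balance = 50 WHERE id = 1;  -- autocommits

-- Session A, still in the same transaction
SELECT balance FROM accounts WHERE id = 1;  -- now returns 50: a non-repeatable read
COMMIT;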

InnoDB transactions

In the computer world, data consistency and crash recovery are usually achieved through WAL (Write-Ahead Logging): when a user changes data in the database, the log must be persisted to disk before the data itself. In the InnoDB engine, data consistency and integrity are guaranteed by two logs, the redo log and the binlog (the redo log is written while the transaction executes, and the binlog is written when the transaction commits). If a crash occurs while data pages are being flushed, MySQL can restore the data from the logs on restart, ensuring consistency. In MySQL, the log files are the most important data; the data files are comparatively less so.
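For reference, two server settings control when each log is forced to disk; the values below are the strictest choices (and the defaults in recent MySQL versions):

-- Flush and fsync the redo log at every transaction commit.
SET GLOBAL innodb_flush_log_at_trx_commit = 1;
-- fsync the binlog after every committed transaction.
SET GLOBAL sync_binlog = 1;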

MVCC mechanism

The InnoDB engine is a multi-version storage engine that implements transactions by preserving older versions of data rows.

Snapshot read and current read

In MVCC, data can be read in two forms:

  • Snapshot read: reads the version visible to the current transaction without taking locks. Plain SELECT statements use snapshot reads, which greatly improves query performance.
  • Current read: reads the latest version and locks it. INSERT, UPDATE, DELETE, and locking reads such as SELECT ... FOR UPDATE use the current read, as the sketch below shows.
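A minimal sketch of both read types within one transaction (the accounts table is hypothetical):

START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1;                -- snapshot read: no locks taken
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;     -- current read: locks the row
UPDATE accounts SET balance = balance - 10 WHERE id = 1;  -- current read as well
COMMIT;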

Undo Log

Older versions of data rows are kept as undo logs and are stored in rollback segments of the tablespace. The undo log typically serves two purposes:

  • Rollback
  • Consistent reads

The undo log is a logical log that records the state of a row before the current transaction modified it. Undo logs are divided into insert undo logs and update undo logs.

  • Insert undo log: generated when a row is inserted and used only for rollback. Because the row is brand new, no earlier version exists; there are no multiple versions to maintain, so the log can be discarded as soon as the transaction commits.
  • Update undo log: needed not only for rollback but also for consistent reads, because other transactions may still need the old version of the row. These logs can be discarded only when no transaction depends on them any longer. Long-running transactions therefore deserve attention: as long as a transaction stays open, its corresponding undo logs cannot be purged, and under high throughput they can occupy a large amount of tablespace.
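One practical consequence: long-running transactions that pin undo logs can be spotted through the information_schema.INNODB_TRX view, for example:

SELECT trx_id, trx_started, trx_mysql_thread_id
FROM information_schema.INNODB_TRX
ORDER BY trx_started;  -- oldest open transactions first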

Data row format

The InnoDB engine implicitly adds the following three hidden fields to every data row and implements transactions through their interplay. The row format is as follows:

  • DB_ROW_ID: a 6-byte row ID that uniquely identifies the row. If no primary key is specified, InnoDB automatically generates this row ID and uses it as the primary key.
  • DB_TRX_ID: a 6-byte transaction ID that records the last transaction that inserted or updated the row.
  • DB_ROLL_PTR: a 7-byte rollback pointer that points to the row's undo log record in the rollback segment.

For details on how MVCC is implemented, see “Do you really understand MVCC? How about doing it by hand?”, which walks through the process in great detail.

The Delete process

In InnoDB, DELETE is actually handled as an update. When a row needs to be deleted, InnoDB first sets the delete-mark bit on the row and on its secondary index records in memory. Later, once the row’s undo log is no longer needed, the purge thread physically removes the record.

Similarly, InnoDB’s “physical” deletion is itself a logical process: it merely marks the record as deleted and nothing more. Space occupied by delete-marked records cannot be reused immediately; it is only overwritten when a later insert happens to land there. So after deleting all rows with DELETE, the .ibd file does not shrink. This phenomenon, in which index space is reusable but not actually reclaimed, is usually called index holes. To reclaim the disk space, rebuild the table with the following command:

ALTER TABLE table_name ENGINE = InnoDB;
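For InnoDB tables, OPTIMIZE TABLE has the same effect, since MySQL maps it to a table rebuild followed by a statistics refresh:

OPTIMIZE TABLE table_name;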

Clustered indexes differ from secondary indexes

We know that InnoDB primary keys use a clustered index. In a clustered index, the leaf nodes contain all the fields of the row, so when the data changes the row can be updated in place.

For secondary indexes, things work a little differently. When a secondary index column is updated, InnoDB first delete-marks the old index record and then inserts a new one; the delete-marked record is later removed by the purge thread. As a consequence, if a secondary index record has been updated or delete-marked by an in-flight transaction, the covering-index optimization cannot be used and the query must go back to the clustered index to fetch the row.
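A quick way to check whether a query is covered by a secondary index (the users table and index name are hypothetical):

CREATE INDEX idx_name ON users (name);
EXPLAIN SELECT name FROM users WHERE name = 'alice';
-- "Using index" in the Extra column means the secondary index alone satisfies
-- the query; if it disappears, the query is going back to the clustered index.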

References

  • InnoDB and the ACID Model
  • How does MVCC (Multi-Version Concurrency Control) work
  • MySQL · Engine features · InnoDB Undo log roaming
  • Do you really understand MVCC? How about doing it by hand?