1 UPDATE/INSERT statement execution flow

1.1 Process

1.1.1 The insert process

In our previous article, we walked through the steps required to execute a SELECT statement. An UPDATE/INSERT statement goes through the same steps, with the addition of the redo log and the binlog. Here is the whole process again:

  1. Create a connection

Connect to MySQL over TCP/IP.

  2. The parser

Through lexical and syntactic parsing, MySQL determines that this is an UPDATE statement.

  3. The optimizer

Generate candidate execution plans and select the optimal one.

  4. The executor

This step opens the table. If there is an MDL write lock on the table, it waits; otherwise, an MDL read lock is taken on the table.

If Open_tables is close to table_cache and Opened_tables keeps increasing, it means that when MySQL opens a new table it has to read it from disk rather than fetch it from the cache.

These are the steps at the MySQL Server layer. Now we will focus on what the engine layer does.
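The executor's MDL behaviour described above can be sketched as follows. This is a minimal illustration only, not MySQL's implementation; the function and the lock dictionary are invented for this sketch.

```python
# Sketch of the executor's metadata-lock (MDL) check when opening a table.
# Illustrative only: MySQL's real MDL subsystem is far more involved.

def open_table(mdl_locks, table):
    """Return what the executor does for `table` given the current MDL locks."""
    if mdl_locks.get(table) == "write":
        return "wait"                # an MDL write lock is held: block until released
    mdl_locks[table] = "read"        # otherwise take an MDL read lock and proceed
    return "opened"

locks = {"t1": "write"}
print(open_table(locks, "t1"))  # another session holds the write lock, so: wait
print(open_table(locks, "t2"))  # no conflicting lock, so: opened
```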

  1. Get lock information

Use the metadata (obtained from open_table) to check whether any lock information already exists, and write the INSERT statement's lock information into lock info.

  2. Check whether the data pages are in the InnoDB buffer pool

If not, they need to be loaded from disk into the InnoDB buffer pool.

  3. Assign undo segments and record undo logs

  4. Record the redo log generated by the undo log

Writing the undo log is itself a change to an undo segment, so it also generates redo.

  5. Update the data pages in the InnoDB buffer pool

Updates are made directly in memory rather than on disk.

  6. Record the redo log

Changes made in the InnoDB buffer pool are recorded in the redo log buffer.

  7. Write to the binlog cache

The modified information is recorded in the binlog cache in event format.

  8. Insert into the change buffer

Only changes to non-unique secondary indexes whose data pages are not in the InnoDB buffer pool are inserted into the change buffer.

  9. Two-phase commit

From the SQL statement's point of view, this is the COMMIT step.

  • prepare

Flush and sync the contents of the binlog cache.

  • commit

Because the redo log generated by this transaction has already been synchronized to disk (the exact behavior depends on innodb_flush_log_at_trx_commit), this step simply marks the commit in the redo log.

This is the whole insert process; it is described here for the MySQL 5.6 and InnoDB engine environment only.
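The engine-layer steps above can be condensed into a toy walk-through. This is purely illustrative: the list-based "buffers" and tuple tags are invented for this sketch and do not mirror InnoDB's real data structures.

```python
# Toy model of the engine-layer insert path (undo -> redo -> page -> binlog
# cache -> two-phase commit). Illustrative names only, not InnoDB internals.

def engine_insert(row):
    undo_log, redo_buffer, binlog_cache = [], [], []

    undo_log.append(("insert", row))               # record undo for rollback
    redo_buffer.append(("undo-segment-change",))   # undo write itself generates redo
    page = {"rows": [row]}                         # update the page in memory only
    redo_buffer.append(("page-change", row))       # record redo for the page change
    binlog_cache.append(("write-row-event", row))  # record the event in binlog cache

    # two-phase commit:
    binlog_file = list(binlog_cache)               # prepare: flush/sync binlog cache
    redo_buffer.append(("commit-mark",))           # commit: mark commit in redo log
    return redo_buffer, binlog_file

redo, binlog = engine_insert({"id": 1})
print(redo[-1])  # the final redo record is the commit mark
```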

1.1.2 The update process

The update process is basically the same as the insert process; please leave a comment if you have any questions.

1.1.3 Overall flow chart

2 change buffer

InnoDB introduced the change buffer in version 1.0.x; in earlier versions it was known as the insert buffer. The change buffer can be seen as an upgrade of the insert buffer.

2.1 Role

The main function of the change buffer is to speed up operations on non-clustered (secondary) indexes. As we know from databases generally, random reads from disk are a key factor affecting performance, and most optimizations in MySQL aim to reduce random disk reads; the change buffer is no exception. Take an insert as an example: it involves two parts, the insert into the clustered index and the insert into the non-clustered indexes.

  • Clustered index

In general, if the clustered index uses an auto-increment primary key, the insert is a sequential write with no random disk reads, so it is fast. If you use a non-incrementing primary key such as a UUID instead of auto-increment, or specify the value explicitly, random disk reads will occur.

  • Non-clustered index

Non-clustered indexes again fall into two cases: unique and non-unique indexes.

  • Unique index

A unique index must read the index page to determine whether a conflict exists, so this is necessarily a random read.

  • Non-unique index

Since secondary index pages are stored in index-key order rather than insert order, inserting into a non-unique index generally produces random reads. But sometimes non-unique index inserts can also be sequential.

In some cases, secondary index inserts are still sequential, or relatively sequential, for example a purchase-time field in a user table. In general, the purchase time serves as a secondary index for querying by time criteria, but inserts arrive in increasing time order, so the inserts are "relatively" sequential. — MySQL Technical Insider: InnoDB Storage Engine

Let's think about which of these steps generates random reads that can be optimized. Unique indexes do not seem optimizable, because they must be checked for conflicts; non-unique indexes generate random reads only because of the nature of the B+ tree, so perhaps they can be optimized. The change buffer does exactly that: operations on non-unique indexes are placed in the change buffer first rather than applied directly to the data page. It works as follows. Determine whether the target page is in the InnoDB buffer pool. If it is, modify it in the buffer pool directly; otherwise record the operation in the change buffer, so there is no need to randomly read the page from disk into the buffer pool. The next time a query needs to access that data page, the page is read into memory and the change buffer operations related to it are merged.
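The decision rule just described can be sketched in a few lines. This is an illustrative model only; the function names and the dict standing in for the change buffer are invented for this sketch.

```python
# Sketch of the change buffer decision rule: only changes to non-unique
# secondary indexes whose target page is NOT cached get buffered.
# Illustrative only, not InnoDB code.

def secondary_index_change(page_in_buffer_pool, is_unique):
    if is_unique:
        return "read page, check uniqueness, apply"  # random read is unavoidable
    if page_in_buffer_pool:
        return "apply in buffer pool"                # page already in memory
    return "record in change buffer"                 # avoid the random read

def merge_on_read(change_buffer, page_id):
    """Merge: when the page is finally read in, replay its buffered changes."""
    return change_buffer.pop(page_id, [])

print(secondary_index_change(page_in_buffer_pool=False, is_unique=False))
# record in change buffer
```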

2.2 Principle

The change buffer itself consists of data pages, so it is loaded into the InnoDB buffer pool like any other page. It is persisted in the shared tablespace ibdata1, and modifications to change buffer pages are also recorded in the redo log.

2.2.1 Structure

Since the change buffer is a B+ tree structure, we again distinguish leaf nodes from non-leaf nodes.

  • Non-leaf nodes

A non-leaf node consists of three fields: space, marker, and offset.

  1. space

Records the tablespace ID; each table has a unique space ID.

  2. marker

A reserved field, kept for compatibility with older versions of the insert buffer.

  3. offset

The offset at which the data page resides.

  • Leaf nodes

In addition to the three fields of the non-leaf node, a leaf node contains extra metadata as well as the actual inserted record.

  1. metadata

The metadata holds three fields:

IBUF_REC_OFFSET_COUNTER: a counter used to order the records in the insert buffer

IBUF_REC_OFFSET_TYPE: the operation type (ibuf_op_t)

IBUF_REC_OFFSET_FLAGS: flag bits; currently only IBUF_REC_COMPACT is used

2.2.2 The merge process

Merge is the process of applying the changes buffered in the change buffer to the real non-clustered index pages. In general, a merge can be triggered in the following ways:

  1. A non-clustered index page is loaded into the InnoDB buffer pool
  2. The Master Thread performs its periodic merge
  3. The insert buffer bitmap detects that a secondary index page has less than 1/32 of its space left

The third point involves the insert buffer bitmap, a structure used to track the free space of secondary index pages. For the details, we suggest going directly to MySQL Technical Insider: InnoDB Storage Engine; they are not repeated here.

2.3 Some Problems

2.3.1 How to set the change buffer size

The size of the change buffer can be set dynamically with innodb_change_buffer_max_size. When this parameter is set to 50, the change buffer may occupy at most 50% of the InnoDB buffer pool.
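The ceiling this parameter implies is simple arithmetic, sketched below. The function is illustrative; the documented default for innodb_change_buffer_max_size is 25 and the maximum is 50.

```python
# The memory ceiling implied by innodb_change_buffer_max_size: a percentage
# of the InnoDB buffer pool. Illustrative helper, not a MySQL API.

def change_buffer_ceiling(buffer_pool_bytes, max_size_pct=25):
    assert 0 <= max_size_pct <= 50   # 50 is the parameter's documented maximum
    return buffer_pool_bytes * max_size_pct // 100

GiB = 1 << 30
print(change_buffer_ceiling(8 * GiB, 50))  # with the setting of 50 from the text
```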

2.3.2 Application scenarios for the change buffer

Because read operations trigger merges of the change buffer, its benefit is not obvious when a large number of reads are performed, while its maintenance cost is still incurred. Therefore, for write-heavy, read-light workloads, where the probability of a page being accessed immediately after it is written is small, the change buffer works best. This pattern is common in billing and logging systems.

3 redo log

As the insert flow above shows, every change is written to the redo log first rather than directly to its final place on disk. If every update were written to disk directly, the disk would have to seek to the corresponding record and then update it, so the IO and lookup costs would be very high.

The IO cost consists of addressing time plus the time required for context switches, the most important being the switch between user mode and kernel mode. User-mode code cannot access the disk or other hardware directly; it can only do so through the operating system's kernel-mode interfaces, with a kernel-mode thread performing the access. The context switch here is the switch of thread (and process) context, the context being the thread's execution environment. First, the user thread must save its intermediate results from the CPU registers, store the address of the next CPU instruction in the program counter (to guarantee execution order), and save thread-private information such as the stack. Then execution switches to the kernel-mode thread, which loads its own state into the registers and program counter and performs the disk read. When the read completes, the kernel thread's state is saved again, and after switching back to user mode, the user thread restores its saved environment and resumes execution. This process is a major drain on time.

3.1 Principle

The redo log consists of two parts:

  1. redo log buffer
  2. redo log file

The redo log buffer is stored in memory, so if MySQL fails before the buffer is written to the log file, its contents are lost. The redo log buffer is written to the log file in near real time, and when a session issues a COMMIT statement it is written in real time. (Note that writing to the log file via the OS buffer is not the same as syncing it to disk; see the configuration section below.)

When a record needs to be updated, the InnoDB engine writes the record to the redo log and updates the memory. At the same time, the InnoDB engine will update the operation record to disk when appropriate.

Because the redo log has a fixed size, the data pages it protects must be flushed when the log is nearly full. The redo log is written in a loop: the write position advances around the ring while the flush position chases it from behind. This is known as checkpoint technology.
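The circular write and the chasing checkpoint can be modeled in a few lines. This is a toy model with invented names; real InnoDB checkpointing flushes dirty pages rather than incrementing a counter.

```python
# Toy model of the circular redo log: `write_pos` advances as records are
# appended, `checkpoint` advances as dirty pages are flushed, and free space
# is whatever lies between them. Illustrative only.

class RedoRing:
    def __init__(self, size):
        self.size, self.write_pos, self.checkpoint = size, 0, 0

    def free(self):
        return self.size - (self.write_pos - self.checkpoint)

    def append(self, nbytes):
        while self.free() < nbytes:   # log nearly full:
            self.checkpoint += 1      # flush pages, advancing the checkpoint
        self.write_pos += nbytes      # then write the new log records

log = RedoRing(size=8)
log.append(6)
log.append(6)                         # forces the checkpoint to advance
print(log.checkpoint, log.write_pos)
```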

3.2 Configuration

innodb_flush_log_at_trx_commit = {0 | 1 | 2} specifies when the transaction log is flushed to disk; the default is 1.

  • 0: The redo log buffer is written to the OS buffer and flushed from the OS buffer to the log file on disk once per second.
  • 1: At each transaction commit, the redo log buffer is written to the OS buffer and flushed from the OS buffer to the log file on disk.
  • 2: At each transaction commit, the redo log buffer is written to the OS buffer, but it is flushed from the OS buffer to the log file on disk only once per second.
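The three settings can be summarized as a small lookup table. This is a sketch mirroring the text above; the function name is invented, and "write" and "sync" stand for "redo log buffer → OS buffer" and "OS buffer → disk (fsync)" respectively.

```python
# innodb_flush_log_at_trx_commit semantics as a table, per the text above.
# "write" = redo log buffer -> OS buffer; "sync" = OS buffer -> disk (fsync).

def flush_policy(innodb_flush_log_at_trx_commit):
    return {
        0: {"write": "every second", "sync": "every second"},
        1: {"write": "each commit",  "sync": "each commit"},   # default, safest
        2: {"write": "each commit",  "sync": "every second"},
    }[innodb_flush_log_at_trx_commit]

print(flush_policy(2))
```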

3.3 WAL

WAL stands for Write-Ahead Logging, and its key point is to write to the log first and then to the disk. It is not just MySQL: many systems that write to disk, including ZooKeeper and ES, use it.

3.3.1 Advantages of WAL

  1. Reads and writes can proceed fully concurrently without blocking each other (though writes still cannot run concurrently with one another).
  2. In most cases, WAL gives better performance (you don't have to flush two files on every write).
  3. Disk I/O behavior is more predictable, and fewer fsync() operations are needed, which makes the system less fragile and improves performance.

4 undo log

The redo log is called a redo log because it records the changes made to data pages. The undo log is a rollback log: when a transaction fails, it can be rolled back using the undo log. More on this in the transactions section.

5 binlog

In addition to the redo log, MySQL has another important log, the binlog. Differences between the redo log and the binlog:

  1. The redo log is an InnoDB-specific log, while the binlog is a MySQL Server-layer log.
  2. The redo log records physical changes to data pages, while the binlog records SQL logic.
  3. The redo log is written circularly, while the binlog is append-only.

The statement format of the binlog records the SQL statement itself, while the row format records the contents of the affected rows, both before and after the update.

6 Two-phase commit

The purpose of two-phase commit is to ensure crash safety. If two-phase commit were not used, the state of the database could become inconsistent with the state recovered from its logs.
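Why this keeps the redo log and binlog consistent is easiest to see from the recovery side: after a crash, a transaction's fate is decided by which log records survived. The sketch below is illustrative, not InnoDB's actual recovery code.

```python
# Sketch of the crash-recovery decision under two-phase commit: a transaction
# is kept only if it reached the redo prepare step AND its binlog event was
# written; otherwise it is rolled back. Illustrative only.

def recover(redo_prepared, binlog_has_event):
    """Decide a transaction's fate from what survived the crash."""
    if not redo_prepared:
        return "rollback"        # crashed before prepare: nothing to keep
    if binlog_has_event:
        return "commit"          # binlog was written: replicas may have seen it
    return "rollback"            # prepared, but no binlog event: discard

print(recover(redo_prepared=True, binlog_has_event=True))   # commit
print(recover(redo_prepared=True, binlog_has_event=False))  # rollback
```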


Personal blog

The stones of Sisyphus

The author is of limited level, please point out any errors and omissions.

Referenced articles

  1. MySQL transaction logs in detail: redo log and undo log
  2. 45 Lectures on MySQL in Practice

Reference books

  1. MySQL Technical Insider: InnoDB Storage Engine