In this video we’re going to focus on the InnoDB storage engine. Mainly from InnoDB storage structure, thread model, data files these aspects to do a deep understanding.

InnoDB storage structure

InnoDB memory structure

The InnoDB storage engine has been used by default since MySQL 5.5. InnoDB storage engine is good at handling transactions and has automatic crash recovery features. Let’s look at the architecture diagram of InnoDB storage engine (mainly divided into memory structure and disk structure).

InnoDB’s memory structure mainly includes:

  • Buffer Pool
  • Change Buffer
  • Adaptive Hash Index
  • Log Buffer

Buffer Pool

Buffer pool, BP for short. The unit is Page, the default size is 16K, the underlying data structure uses the linked list, with the linked list to manage the Page. InnoDB adds a cache to the Page when accessing table records and indexes (to reduce disk IO operations). Seeing the importance of Page, let’s look at how to design Page.

Page management mechanism

Page is represented in three states:

  • Free Page: Free page that is not being used
  • Clean Page: A page that is used, but the data in the page has not been modified
  • Dirty Page: Used pages. Data on the pages is modified and inconsistent with data on disks

InnoDB maintains and manages pages in three states through three linked list structures.

  • Free List: Represents the free buffer used to manage free pages
  • Flush List: Indicates that data needs to be flushed to the disk buffer. It is used to manage dirty pages
    • Internal pages are sorted by modification time
    • Dirty pages exist in flush List and lRU list, but the two do not affect each other
      • Flush List manages dirty page flushing operations
      • The LRU List is responsible for managing the availability and release of pages
  • Lru List: indicates the buffer in use, mainly used to manage clean and Dirty pages.
    • The buffer is based on MIDpoint, with the front list called the New list area (for frequently accessed data) accounting for 63%
    • The later linked list, called the Old list area, is used to hold less accessed data, at 37%
LRU algorithm improvement
  • Ordinary LRU algorithm: adopts the end elimination method, adding new data from the head of the linked list, and eliminating data from the end of the linked list when releasing space
  • The improved LRU algorithm:
    • The linked list is divided into two parts: new and old. Instead of inserting new elements from the table head, new elements are inserted from the midpoint in the middle. If the data is accessed soon, the page is moved to the head of the new list. If the data is not accessed, it will be gradually moved to the old tail area, waiting to be eliminated.
    • Every time a new page is read into the buffer pool, InnoDB determines whether there are free pages or whether there are enough free pages
      • If there are enough free pages: The Free page is removed from the Free List and placed in the LRU list
      • If there are not enough free pages: the default LRU linked list pages are weeded out according to the LRU algorithm, and memory space is freed and allocated to new pages

Change Buffer

Write buffer, CB for short. When we perform DML operations on data, if BP does not have its corresponding page data, disk pages are not immediately loaded into the Buffer Pool. Instead, the changes to the Buffer are recorded in CB, and when the data is read, the data is merged and restored to BP.

CB occupies the space of BP, 25% by default and 50% by maximum operation. We can adjust it according to the traffic of reading and writing. The key parameter is innodb_change_BUFFer_MAX_size.

Every time we update a record, if the record is in BP, it is modified directly in BP, a memory operation. If the record does not exist in BP (no hit), a memory operation will be performed directly in CB (no need to query the data to disk, avoiding a disk I/O). The next time the record is queried, it is read from disk and then read from CB and merged, and finally loaded into BP.

Write buffers only apply to non-unique ordinary index pages, why?

If the index is set to be unique, InnoDB must check for uniqueness before making changes, so it must query the disk and perform disk I/O. At this point, records are queried and loaded directly into BP, and then modified in BP (not in CP).

Adaptive Hash Index

Adaptive hash index is mainly used for optimization of BP query data.

The InnoDB storage engine monitors table index lookups and creates hash indexes if it sees a speed gain from creating a hash index, which is called adaptive. InnoDB will automatically hash indexes for certain pages based on the frequency and mode of access.

Log Buffer

The log buffer is used to store data that is to be written to log files (Redo/Undo) on disk. The contents of the log cache are periodically flushed to the log file on the disk. When the log buffer is full, it is automatically flushed to a new disk. Increasing the log buffer size can reduce disk IO whenever large transactions such as BLOBS or multi-line updates are encountered.

The Log Buffer is used to record InnoDB engine logs. During DML operations, Redo and Undo logs are generated. The innodb_log_buffer_size parameter can be used to increase the log buffer size to reduce disk I/O operations.

Innodb_log_buffer_size specifies the size of the log bufferCopy the code

InnoDB also provides the parameter innodb_flush_log_at_trx_commmit, which controls the log flushing behavior. The default value is 1.

innodb_flush_log_at_trx_commmit
Copy the code
  • Innodb_flush_log_at_trx_commmit = 0: Write log files every one second and flush disks. At most 1s of data will be lost.
    • Write log file LogBuffer –> OS cache, flush OS cache –> disk file
  • Innodb_flush_log_at_trx_commmit = 1: The transaction is committed, the log file is written immediately and the disk is flushed. Data is not lost, but I/O operations are performed frequently.
  • Innodb_flush_log_at_trx_commmit = 2: The transaction is committed. The log file is written immediately and the disk is flushed every 1s.

InnoDB disk structure

The disk structure of InnoDB mainly includes:

  • Tablespaces

  • InnoDB Data Dictionary

  • Doublewrite Buffer

  • Redo Log

  • Undo Logs

Tablespaces

Table Spaces are used to store table structures and data.

Table space has different types of partition:

  • System table space
  • Independent table space
  • Generic table space
  • Temporary table space
  • The Undo tablespace
System Tablespace

The system tablespace contains storage areas for data dictionaries, Doublewrite Buffer, Change Buffer, and Undo Logs in InnoDB. The default contains any table data and index data created by the user in the system table space. A system table space is a shared space that can be shared by a table. The data files in this space can be controlled with the innodb_data_file_path parameter. The default is ibdata 1:12m :autoextend (file name ibdata1, size 12M, autoextend)

File-per-table Tablespaces

A separate tablespace is a single tablespace that is created in its own datafile and not in the system tablespace. When innodb_FILe_per_TABLE is enabled, tables will be created into the tablespace. Deny InnboDB will be created into the system tablespace.

Each table file table space is represented by an.idb data file, which is created by default in the database directory. Table files in table Spaces support dynamic and commpressed row formats.

General Tablespaces

A generic tablespace is a shared tablespace created using the CREATE TABLESPACE syntax that can hold multiple tables and supports all row formats. Generic tablespaces can also be created in other tablespaces outside the MySQL data directory.

Ts1 CREATE TABLESPACE ts1 ADD DATAFILE ts1.idb Engine=InnoDB; TABLESPACE ts1 CREATE TABLE T1 (c1 INT PRIMARY KEY) TABLESPACE ts1;Copy the code
Undo Tablespaces

An Undo tablespace consists of one or more Undo log files. Before MySQL 5.7, Undo occupied the System Tablespace share. Since 5.7, Undo has been separated from the System Tablespace. The default value is 0 (ibDATA1; ibDATA1; ibDATA1; If the value is greater than 0, the Undo tablespace undo_001 and undo_002 is used.

InnoDB Data Dictionary

The InnoDB data dictionary consists of internal system tables. These tables contain metadata for looking up objects such as tables, indexes, and table fields. The metadata is physically located in the InnoDB system table space. For historical reasons, data dictionary metadata overlaps to some extent with information stored in InnoDB table metadata files (.frm files).

Doublewrite Buffer

Located in the system table space, it is a storage area.

Data is stored in the Doublewrite buffer before the BufferPage page is flushed to its real location on disk. If the operating system, storage subsystem, or mysqld process crashes during page page writing, InnoDB can find a good backup of the page from the Doublewrite buffer during crash recovery. To disable the Doublewrite buffer, which is enabled by default in most cases, the innodb_doublewrite parameter is set to 0 to indicate that the Doublewrite buffer is disabled. It is recommended to set innodb_flush_method to O_DIRECT when using the Doublewrite buffer. For specific reasons, let’s take a look at the official explanation:

The innodb_flush_method parameter controls the opening and flushing modes of innoDB data files and redo logs. There are three values: fdatasync(default), O_DSYNC, and O_DIRECT. Setting O_DIRECT means that data file writes inform the operating system not to cache data and not to use preread and write directly from Innodb Buffer to disk files. By default, fdatasync writes to the operating system cache, and then calls fsync() to asynchronously flush data files and redo logs.Copy the code
Redo Log

A redo log is a disk-based data structure used to correct data written by incomplete transactions during crash recovery. MySQL writes a Redo Log file to Log all InnoDB changes to the Buffer Pool. When an instance failure (such as a power outage) causes data to fail to be updated to the data file, a Redo operation is performed during database restart to re-update data to the data file. Redo logs are generated during write/write transactions. By default, Redo logs are physically represented on disk by two files (ib_logfile0 and ib_logFILe1).

Undo Logs

Undo logs are logical logs that are backups of modified data saved before the transaction starts and are used to roll back transactions in exceptional cases. Record according to each row.

Undo logs exist in system table Spaces, undo table Spaces, and temporary table Spaces.

MySQL 8.0 has changed its structure

  • MySQL version 5.7
    • Separating the Undo log tablespace from the shared tablespace IBData file allows the user to specify the file size and number when installing MySQL.
    • Added temporary tablespace that stores data for temporary tables or temporary query result sets.
    • The Buffer Pool size can be dynamically modified without restarting the database instance.
  • MySQL version 8.0
    • InnoDB table data dictionary and Undo Log are completely separated from the shared tablespace IBData. Previously, the data dictionary in ibDATA and the data dictionary in ibD files in independent tablespace are consistent.
    • Temporary A temporary table space can also be configured with multiple physical files, all of which are InnoDB storage engines and can create indexes to speed up processing.
    • You can set up some tablespaces like an Oracle database. Each tablespaces corresponds to multiple physical files. Each tablespaces can be used by multiple tables, but a table can only be stored in one tablespace.
    • Separate the Doublewrite Buffer from the shared table space IBData.