1. MySQL infrastructure

I have written several articles about MySQL, mostly explaining basic concepts. When I wanted to understand the storage engine, I found I did not know where or how to start. In retrospect, the problem was that I was not familiar with MySQL's overall architecture, so this article introduces the InnoDB storage engine, but I'll also talk a little about the infrastructure of MySQL.

Take a look at this image (found online):

You can see that the top layer of MySQL is the connection component. Below it, the server is made up of the connection pool, management tools and services, the SQL interface, the parser, the optimizer, caches, the storage engines, and the file system.

Does that make sense? Can't quite parse it? No problem, let's translate:

Generally speaking, MySQL can be divided into Server layer and storage engine layer:

  • Server layer: the Server layer covers most of MySQL's core service functionality, and all cross-storage-engine features, such as stored procedures, triggers, and views, are implemented in this layer;
  • Storage engine layer: there are many storage engines, including MyISAM, InnoDB, Memory, and so on. The most commonly used is InnoDB, which is now MySQL's default storage engine.

1.1 The connector

To use MySQL, the first step is to connect to the MySQL database. The connector is responsible for establishing connections with clients, obtaining permissions, and maintaining and managing connections. We usually run mysql -uroot -p, press Enter, type the password, and log in.

When you submit the password, the MySQL client establishes a connection with the server. After the TCP handshake completes, the connector verifies the user name and password you entered. If they are correct, the login succeeds; if not, you will see the following error message.

[root@VM-12-13-centos ~]# mysql -uroot -p
Enter password: 
ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: YES)
[root@VM-12-13-centos ~]# 
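The connector then keeps the connection in its management list. If you are curious about the connections the server currently holds, show processlist lists them (illustrative output; your IDs and users will differ):

mysql> show processlist;
+----+------+-----------+------+---------+------+-------+------------------+
| Id | User | Host      | db   | Command | Time | State | Info             |
+----+------+-----------+------+---------+------+-------+------------------+
|  3 | root | localhost | NULL | Query   |    0 | init  | show processlist |
+----+------+-----------+------+---------+------+-------+------------------+
1 row in set (0.00 sec)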

1.2 The query cache

After the connection is established, suppose you run a SQL statement to query a piece of data:

select * from user where id = 1

MySQL's execution flow then moves to the query cache. If this SQL statement has been executed before, its result may be cached in memory as a key-value pair: the key is the query statement and the value is the query result. If the statement you are querying hits the cache, the cached result set is returned directly, skipping the rest of MySQL's machinery and greatly improving query efficiency.

However, pros and cons usually come together, and the query cache has a fatal flaw: it is invalidated very frequently. Whenever a table is updated, all query caches on that table are cleared. So you may go to great lengths to save a result, only for an update to wipe it out before it is ever used. We all know how valuable data is, but because of this fatal flaw, the query cache was abandoned in MySQL 8.0; that is, MySQL 8.0 removes the query cache completely.
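On MySQL 5.7 and earlier you can still check whether the query cache is enabled; this is a sketch assuming a pre-8.0 server (on 8.0 the variable no longer exists):

mysql> show variables like 'query_cache_type';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| query_cache_type | OFF   |
+------------------+-------+
1 row in set (0.00 sec)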

1.3 The analyzer

If the cache is not hit, the SQL statement has to be executed for real. The analyzer first performs lexical analysis on the SQL statement: it identifies what the spaces, strings, and keywords you typed mean to MySQL, recognizing, for example, the SELECT keyword, the table name, the column names, and the conditions. Having identified these tokens, it moves on to the parsing phase, which checks whether your SQL statement conforms to MySQL's grammar; if not, you get the familiar "You have an error in your SQL syntax" message.

The part worth paying attention to is the text following "use near", because it tells you roughly where the syntax error is!
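For example, misspelling a keyword produces something like this (an illustrative session):

mysql> selec * from user where id = 1;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'selec * from user where id = 1' at line 1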

1.4 The optimizer

The query optimizer optimizes a SQL statement before it is actually executed.

A SQL statement reaching the optimizer's optimization phase means the analyzer found no errors in it. So what does the optimizer do with it? If the statement can use indexes, the optimizer selects the appropriate one based on its optimization rules. For example, for a composite index subject to the leftmost-prefix principle described earlier, the query optimizer may reorder conditions to arrive at the optimal query plan.

Simply put, the optimizer decides which index you use and which join order you take; its role is to determine the most efficient way to execute.
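You can peek at the optimizer's decision with EXPLAIN. A sketch against the user table from earlier (output abridged; a primary-key lookup shows type const and key PRIMARY):

mysql> explain select * from user where id = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: user
         type: const
possible_keys: PRIMARY
          key: PRIMARY
         rows: 1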

1.5 The executor

The analyzer knows what to do, and the optimizer knows how to do it, which raises the question: who actually does it? As you might guess, the executor now executes the SQL statement. If you do not have the privilege to query the table, an error is returned; if you do, the table is opened and execution continues. The executor calls the interface provided by the table's storage engine to carry out the execution.
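If the connected user lacks the SELECT privilege on the table, the executor stops right there; the error looks roughly like this (hypothetical user name):

mysql> select * from user where id = 1;
ERROR 1142 (42000): SELECT command denied to user 'guest'@'localhost' for table 'user'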

The above content comes from: MySQL infrastructure analysis

2. Storage engine

As you can see from the architecture diagram, one of the most important features that distinguishes MySQL database from other databases is its plug-in table storage engine.

MySQL's pluggable storage engine architecture provides a set of standard management and service facilities. These standards have nothing to do with the storage engine itself and would be needed by any database system: the SQL analyzer, the optimizer, and so on. The storage engine is the implementation of the underlying physical structure and of the actual file reads and writes, and each storage engine developer can implement it however they wish.

It is important to note that the storage engine is table based, not database based.
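Because the engine is chosen per table, two tables in the same database can happily use different engines. A minimal sketch with hypothetical table names:

-- an append-mostly table on MyISAM, a transactional table on InnoDB
CREATE TABLE article_archive (
  id   INT PRIMARY KEY,
  body TEXT
) ENGINE = MyISAM;

CREATE TABLE user_account (
  id   INT PRIMARY KEY,
  name VARCHAR(50)
) ENGINE = InnoDB;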

The advantage of plug-in storage engines is that each storage engine has its own characteristics and can create different storage engine tables according to specific applications. Due to the open source nature of MySQL database, users can write their own storage engine according to MySQL’s predefined storage engine interface. If users are not satisfied with the performance or function of a storage engine, they can modify the source code to get the features they want. This is the convenience and power of open source.

Because MySQL is open source, storage engines can be divided into official MySQL storage engines and third-party storage engines. Some third-party storage engines are very capable, the best known being the InnoDB storage engine (developed by Innobase and later acquired by Oracle), which is widely used; indeed, it is the most widely used storage engine for OLTP (Online Transaction Processing) applications on MySQL.

2.1 InnoDB

InnoDB is MySQL’s default transactional engine and the most important and widely used storage engine.

It is designed to handle a large number of short-lived transactions, which are mostly committed normally and rarely rolled back. InnoDB’s performance and automatic crash recovery features make it popular for non-transactional storage requirements as well. Unless there is a very specific reason to use another storage engine, the InnoDB engine should be preferred.

If you want to study storage engines, InnoDB is also the one most worth spending time on in depth; the payoff is definitely much higher than spreading the same time evenly across every storage engine.

2.2 MyISAM

In MySQL 5.1 and earlier, MyISAM was the default storage engine. MyISAM offers a number of features, including full-text indexing, compression, and spatial functions (GIS), but it does not support transactions or row-level locking, and it has the undeniable drawback that it cannot recover safely from a crash.

Although the MyISAM engine does not support transactions and does not support secure recovery after crashes, it is by no means without merit. For read-only data, or if the table is small enough to tolerate a repair operation, you can still use MyISAM (but don’t default to MyISAM; instead, use InnoDB by default).

Moreover, MyISAM locks the entire table, not rows: it takes shared (read) locks on all tables it reads and exclusive (write) locks on all tables it writes. This table locking makes MyISAM prone to typical performance problems.

2.3 Memory

Memory tables (formerly known as HEAP tables) can be useful if you need fast access to data that is never modified, or that you can afford to lose after a restart. Memory tables are at least an order of magnitude faster than MyISAM tables. Each Memory table corresponds to one disk file, named after the table, of type .frm, and this file stores only the table's structure; the data itself lives in memory, which makes for fast processing and avoids disk I/O entirely. That is why, after a restart, the structure of a Memory table survives but its data is lost.

Memory tables work well in many scenarios:

  • A lookup or mapping table, such as one that maps zip codes to states.
  • Used to cache periodically aggregated data results;
  • Used to store intermediate data generated during data analysis.

Memory tables support hash indexes, so lookups are very fast. But fast as they are, they are no substitute for traditional disk-based tables. Memory tables use table-level locks, so concurrent write performance is low. They support neither BLOB nor TEXT columns, and every row has a fixed length, so even a column declared VARCHAR is actually stored as CHAR, which can waste some memory.
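A minimal sketch of the lookup-table scenario from the list above, with hypothetical names; note the hash index and the fixed-length CHAR columns:

-- maps zip codes to states, entirely in memory
CREATE TABLE zip_to_state (
  zip   CHAR(5) NOT NULL,
  state CHAR(2) NOT NULL,
  PRIMARY KEY (zip) USING HASH
) ENGINE = MEMORY;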

In addition to the above three, there are other engines such as Mrg_MyISAM, Archive, Blackhole, CSV, Federated, as well as third-party engines such as XtraDB, TokuDB, etc.

2.4 Comparison

Differences between MyISAM and InnoDB (key points):

  1. Transaction safety (MyISAM does not support transactions, InnoDB does);
  2. Foreign keys (MyISAM does not support foreign keys, InnoDB does);
  3. Locking mechanism (MyISAM uses table locks, InnoDB uses row locks);
  4. Query and insert speed (MyISAM is fast at batch inserts);
  5. Full-text indexing (MyISAM supports full-text indexing, InnoDB does not);
  6. MyISAM uses less storage space than InnoDB.

As for Memory: if, for example, our data changes frequently and does not need to be persisted in the database, we can consider the Memory engine, which is extremely fast (but if MySQL restarts, the data is gone).

| Characteristic     | MyISAM      | InnoDB    | Memory      |
|--------------------|-------------|-----------|-------------|
| Transactions       |             | ✔️        |             |
| Locking mechanism  | Table locks | Row locks | Table locks |
| Full-text indexing | ✔️          |           |             |
| Foreign keys       |             | ✔️        |             |
| Space usage        | Low         | High      |             |
| Batch insert speed | High        | Low       | High        |

3. InnoDB storage engine

InnoDB is the default storage engine for MySQL. How does InnoDB handle data?

InnoDB is a storage engine that stores data from tables on disk, so our data is still there even after we shut down and restart. The actual processing of the data takes place in memory, so you need to load the data from disk into memory and flush it to disk if you are processing write or modify requests.

And we know that reading from and writing to disk is very slow, several orders of magnitude slower than memory. So when we want to fetch some records from a table, does the InnoDB storage engine read them from disk one record at a time? That would take far too long.

Therefore, InnoDB divides data into pages and uses the page as the basic unit of interaction between disk and memory. An InnoDB page is typically 16KB (which is also the amount of data stored in a leaf node of a B+Tree). That is, at least 16KB is read from disk into memory at a time, and at least 16KB is flushed to disk at a time.
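You can confirm the page size on your own server; 16384 bytes (16KB) is the default:

mysql> show variables like 'innodb_page_size';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| innodb_page_size | 16384 |
+------------------+-------+
1 row in set (0.00 sec)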

3.1 Record storage structure and index pages

We usually insert data into a table record by record. On disk, these records are stored in what is called the row format (or record format); in other words, a page holds row after row of a table's data.

3.1.1 Row format

The InnoDB storage engine provides four row formats: Compact, Redundant, Dynamic, and Compressed.

To see a table's current row format, run the following command:

show table status like 'table'\G;

For example:

mysql> show table status like 'biz_article'\G;
*************************** 1. row ***************************
           Name: biz_article
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 20
 Avg_row_length: 79462
    Data_length: 1589248
Max_data_length: 0
   Index_length: 81920
      Data_free: 4194304
 Auto_increment: 29
    Create_time: 2021-12-01 21:39:38
    Update_time: 2021-12-01 21:33:21
     Check_time: NULL
      Collation: utf8mb4_unicode_ci
       Checksum: NULL
 Create_options: row_format=COMPACT
        Comment: 
1 row in set (0.00 sec)

You can see that the default row format in the table above is COMPACT.

COMPACT stores row records as follows:

1. Variable-length field length list

We know that MySQL supports variable-length data types such as VARCHAR, VARBINARY, the TEXT types, and the BLOB types; columns of these types can be called variable-length fields. The number of bytes a variable-length field occupies is not fixed, so alongside the real data we must also store the number of bytes the data takes up.

If the maximum allowed storage length of the variable field exceeds 255 bytes and the actual number of bytes stored exceeds 127 bytes, use 2 bytes, otherwise use 1 byte.
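A worked example of that rule, assuming a single-byte character set so that characters map one-to-one to bytes:

-- VARCHAR(100):  max length 100 <= 255 bytes       -> length stored in 1 byte
-- VARCHAR(1000): value actually 90 bytes  (<= 127) -> length stored in 1 byte
-- VARCHAR(1000): value actually 700 bytes (> 127)  -> length stored in 2 bytes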

Data overflow

In addition, MySQL stipulates that in a table definition the combined length of all VARCHAR columns cannot exceed 65,535 bytes (there is some additional overhead, so the usable space is slightly less, but the hard limit is 65,535). If the limit is exceeded and SQL_MODE is not set to strict mode, the VARCHAR type is automatically converted to TEXT.

mysql> create table test(a varchar(30000), b varchar(40000));
ERROR 1074 (42000): Column length too big for column 'a' (max = 16383); use BLOB or TEXT instead

2. NULL value list

Some columns in a table may store NULL values, which would take up a lot of space if they were stored in the real data of the record, so the Compact row format manages these null-valued columns and stores them in the NULL list. Each null-allowed column has a binary bit, and a binary bit value of 1 represents the value of NULL for that column. When the value of the binary bit is 0, the value of the column is not NULL.

3. Record header information

The record header is made up of a fixed five bytes. Five bytes is forty binary bits, and each bit means different things.

MySQL adds columns (also known as hidden columns) to each record by default, in addition to the columns we define ourselves.

4. DB_ROW_ID (row_id)

Optional; a 6-byte row ID that uniquely identifies a record.

5. DB_TRX_ID

Required; 6 bytes, representing the transaction ID.

6. DB_ROLL_PTR

Required; 7 bytes, representing the rollback pointer.

InnoDB's primary key generation strategy for a table is: a user-defined primary key is preferred; if there is none, the first non-NULL UNIQUE key is used; if there is no such key either, InnoDB adds a hidden column named row_id as the primary key by default.

Columns DB_TRX_ID (also known as trx_id) and DB_ROLL_PTR (also known as roll_ptr) are mandatory, but row_id is optional (this column is only added if there is no custom primary key or Unique key).

3.1.2 Data pages

Earlier we briefly mentioned the concept of a page, the basic unit in which InnoDB manages storage; a page is typically 16KB. InnoDB has designed many types of pages for different purposes, and the type that stores our table records is one of them: officially it is called an INDEX page, but thinking of it as a data page is fine too, since the clustered index holds both the index and the data.

Data page structure

The storage space of an InnoDB data page is roughly divided into seven parts:

  • File Header: 38 bytes at the head of the page; general information about the page;
  • Page Header: 56 bytes; information specific to data pages;
  • Infimum + Supremum: 26 bytes; two virtual row records representing the minimum and maximum records;
  • User Records: size varies; the actual row records stored;
  • Free Space: size varies; unused space in the page;
  • Page Directory: size varies; the relative positions of some records within the page;
  • File Trailer: 8 bytes; used to verify that the page was written completely.

3.2 Tablespaces

A tablespace is an abstract concept. The system tablespace corresponds to one or more actual files on the file system; each independent tablespace corresponds to an actual file named after its table with an .ibd extension. You can think of a tablespace as a pool divided into many pages: when we want to insert a record into a table, a page is taken from the pool for the data to be written into.

Again, InnoDB manages storage on a per-page basis. Our clustered indexes (i.e., complete table data) and other secondary indexes are stored in the table space as B+ trees whose nodes are data pages.

Any type of page has a part called the File Header. A field in the File Header (FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID) records which tablespace the page belongs to, and each page in a tablespace has a page number (FIL_PAGE_OFFSET). The page number is 4 bytes, i.e. 32 bits, so a tablespace can hold at most 2³² pages, which works out to a maximum of 64TB of data.
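The 64TB figure follows directly from the arithmetic:

-- 4-byte page number          -> at most 2^32 pages per tablespace
-- 2^32 pages x 16KB per page  =  2^32 x 2^14 bytes  =  2^46 bytes  =  64TB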

Tablespaces are divided into independent tablespaces and the system tablespace.

3.2.1 Independent tablespace structure

Extents

Since a tablespace can hold as many as 2³² pages, InnoDB introduces the concept of an extent to make pages easier to manage. For 16KB pages, 64 consecutive pages form one extent, which means an extent takes up 1MB of space by default.

Tablespaces, whether system or independent, can be viewed as consisting of extents, and every 256 extents are grouped together. The first page of the very first group is of type FSP_HDR, while the first page of every other group is of type XDES.

An XDES page registers the attributes of the 256 extents in its group. An FSP_HDR page is similar to an XDES page, but additionally stores attributes of the tablespace as a whole.

So what is the main purpose of introducing extents?

When we insert a record into a table, we are really inserting data into the nodes of the B+ trees of the table's clustered index and all its secondary indexes. The pages in each level of a B+ tree form a doubly linked list, and if storage were allocated page by page, two adjacent pages in that list could end up physically very far apart.

When we introduced the application scenarios of B+ tree indexes, we specifically mentioned that a range query only needs to locate the leftmost and rightmost records and then scan along the doubly linked list. If two adjacent pages in that list are physically far apart, reading them means random I/O.

Again, the disk speed is several orders of magnitude different from the memory speed, and random I/O is very slow, so we should try to make the physical locations of adjacent pages in the linked list adjacent to each other so that we can use so-called sequential I/O for range queries.

An extent is 64 physically consecutive pages. When a table holds a lot of data, space for an index is no longer allocated page by page but extent by extent; when the data volume is very, very large, multiple consecutive extents may even be allocated at once. This eliminates a great deal of random I/O and so helps performance.

Segments

The range query we mentioned is essentially a sequential scan of the records in the leaf nodes of a B+ tree. If leaf and non-leaf nodes were placed into the allocated extents without distinction, the effectiveness of range scans would drop considerably. So InnoDB distinguishes leaf nodes from non-leaf nodes: leaf nodes have their own extents and non-leaf nodes have theirs. The set of leaf-node extents forms one segment and the set of non-leaf-node extents forms another; that is, an index gives rise to two segments, a leaf segment and a non-leaf segment.

A segment does not correspond to a contiguous physical region of a tablespace, but rather is a logical concept.

3.2.2 System tablespaces

The system tablespace records information about the system as a whole, which independent tablespaces do not; a MySQL server has exactly one system tablespace, and its tablespace ID is 0.

The first three pages of the system tablespace have the same types as in an independent tablespace, but pages 3 through 7 are unique to the system tablespace. They are:

| Page number | Page type | English description    | Description                               |
|-------------|-----------|------------------------|-------------------------------------------|
| 3           | SYS       | Insert Buffer Header   | Stores the header of the insert buffer    |
| 4           | INDEX     | Insert Buffer Root     | The root page of the insert buffer        |
| 5           | TRX_SYS   | Transaction System     | Information about the transaction system  |
| 6           | SYS       | First Rollback Segment | The first rollback segment page           |
| 7           | SYS       | Data Dictionary Header | Data dictionary header information        |

Extents 1 and 2 of the system tablespace, the 128 pages numbered 64 through 191, are called the doublewrite buffer.

4. Three features of InnoDB

The doublewrite buffer is one of InnoDB's three major features, along with the buffer pool and the adaptive hash index.

4.1 Doublewrite buffer

The doublewrite buffer, also simply called doublewrite, is a special file-flush technique that gives InnoDB data pages reliability.

Before writing pages to the data files, InnoDB first writes them to a contiguous area called the doublewrite buffer. Only after the doublewrite buffer has been written does InnoDB write the pages to their proper positions in the data files. If the operating system crashes in the middle of a page write, InnoDB can find an intact copy of the page in the doublewrite buffer during recovery.

So although it is called the doublewrite buffer, this buffer is not only a piece of memory; it is also an area of the MySQL system tablespace, i.e. part of a disk file. The following questions make this clearer.

1️⃣ Why do we need a doublewrite buffer?

The data has to be written to the data files on disk anyway, so why write it to a doublewrite buffer first? Doesn't that just add I/O?

2️⃣ What is a partial page write?

InnoDB's page size is generally 16K, while the operating system's page size is 4K, so when InnoDB writes a page back through the operating system, one InnoDB page is written as four OS pages. If the machine crashes or loses power in the middle of this, only part of the page may make it into the file system: the write is only partially successful, and the data is effectively corrupted. This is the partial page write problem: the page's contents are left inconsistent, producing a "torn" page and therefore chaotic data.

The doublewrite buffer in the tablespace is 128 pages (2 extents, extent 1 and extent 2), 2MB in size. When MySQL flushes dirty pages, it first copies them to an in-memory doublewrite buffer (also 2MB), then writes that buffer sequentially to the system tablespace, and immediately calls fsync (an API provided by the underlying operating system that guarantees the data has reached the disk) to synchronize it.

This step is a sequential write, so its overhead is small. Only after the doublewrite completes are the pages written to their actual data files, which is a discrete (random) write. So under normal circumstances MySQL writes each data page to disk twice: first to the doublewrite buffer, then to the actual data file. If something extreme happens (a power failure, say) and InnoDB finds a corrupted page on restart, the data can be recovered from the doublewrite buffer.

3️⃣ Is doublewrite necessary?

As mentioned earlier, the doublewrite buffer in the system tablespace is in fact on disk, and writing to the system tablespace causes extra fsync operations; the disk's fsync performance can drag down MySQL's overall performance. But on the storage side, the doublewrite area is a contiguous space, so writing it is sequential rather than random I/O, and the impact is modest: compared with doublewrite disabled, roughly a 5-10% reduction.

So in some cases you can turn doublewrite off for better performance. For example, it can be turned off on a replica, because the data can be rebuilt from replication even if a partial page write occurs. Likewise, some file systems, such as ZFS, provide their own protection against partial writes, in which case it can also be disabled.
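A minimal my.cnf sketch for such a case, assuming you have verified that the file system really does protect against partial writes; the setting takes effect at server startup:

[mysqld]
innodb_doublewrite = 0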

4️⃣ How is data recovered from the doublewrite buffer?

The doublewrite buffer exists so that InnoDB can keep MySQL's data pages atomic. When the database is started after an abnormal shutdown, it performs recovery (redo). During recovery the database validates each page (checksums and so on); if a page fails its check, the copy kept by doublewrite is used to repair the damaged page's data.


5️⃣ What if the write to the doublewrite buffer itself fails?

The first step, writing the page to the doublewrite buffer on disk, can of course also fail. If it does, MySQL recovers the data from the original page still intact on disk (the B+ tree structure) together with the redo log.

Here comes the question:

6️⃣ Since recovery from the redo log is possible, why do we need the doublewrite buffer at all?

The redo log records physical operations on data pages: update XXX was made at offset XXX of data page XXX in tablespace XXX.

If the page itself is corrupted, the redo log cannot be replayed against it. So we need a copy of the page: if the server goes down, the copy can be used to restore the original data page. That is exactly what doublewrite provides.

7️⃣ What if writing the redo log fails?

InnoDB is transactional, so this situation does not arise: the redo log is flushed when a transaction commits, and the transaction is not considered complete until its redo log has been flushed.

8️⃣ What is a dirty page?

A dirty page is a page in memory whose contents differ from the corresponding data page on disk.

A clean page is one whose contents match the data page on disk, i.e. after the data has been flushed.

4.2 Buffer Pool

As we know, for tables that use InnoDB as their storage engine, everything, whether the indexes that store user data (the clustered index and the secondary indexes) or the various kinds of system data, is stored in tablespaces in the form of pages; and a tablespace is just InnoDB's abstraction over one or more actual files on the file system. In other words, our data ultimately lives on disk.

However, disk speed is slow, so InnoDB storage engine handles client requests and loads the entire page into memory when it needs to access the data of a page. This means that even if we only need to access one record of a page, we need to load the entire page into memory first.

After the whole page has been loaded into memory, it can be read and written. Afterwards InnoDB is in no hurry to free the memory occupied by that page; it keeps it cached, so that a future request for the same page can skip the disk I/O.

Note: The Buffer Pool and MySQL query cache are not the same thing. The Buffer Pool is located in the storage engine layer.

1️⃣ What is the Buffer Pool?

To cache the pages read from disk, InnoDB requests a contiguous piece of memory from the operating system when the MySQL server starts up. This piece of memory is called the Buffer Pool.

By default, the Buffer Pool is only 128 MB in size (this value is actually small).

The Buffer Pool size can be controlled by setting innodb_buffer_pool_size when starting the server.
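For example (134217728 bytes is the 128MB default; since MySQL 5.7 the value can even be resized online):

mysql> show variables like 'innodb_buffer_pool_size';
+-------------------------+-----------+
| Variable_name           | Value     |
+-------------------------+-----------+
| innodb_buffer_pool_size | 134217728 |
+-------------------------+-----------+
1 row in set (0.00 sec)

-- resize online (MySQL 5.7+); value in bytes, here a hypothetical 512MB
SET GLOBAL innodb_buffer_pool_size = 536870912;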

2️⃣ Internal composition of the Buffer Pool

The default page size in the Buffer Pool is the same as the default page size on disk, which is 16KB.

To better manage the cached pages in the Buffer Pool, InnoDB creates some so-called control information for each cached page. This control information includes the tablespace number of the page, the page number, the address of the cache page in the Buffer Pool, the linked list node information, some lock information, LSN information, and of course some other control information, which is called the control block.

PS: most of the data MySQL caches sits in the Buffer Pool. The cache pages hold the data itself: pages read by queries are kept in cache pages to speed up later queries. The control blocks hold each cache page's metadata, including its address, and are used to locate the cache pages.

3️⃣ Linked list structure of the Buffer Pool

(1) Free list

The Free list is simply a linked list that chains together all the idle cache pages; strictly speaking, it links the control blocks of the idle pages. When new data needs to be cached, InnoDB can go straight to the Free list and find a free cache page.

The Free list also has a base node that stores the head, the tail, and the number of nodes in the list.

(2) Flush linked list

The Flush list has exactly the same structure as the Free list.

Since we know that the Buffer Pool caches the data queried from the database, there must be a question: what if the data is modified?

MySQL maintains a linked list of the control blocks of pages that have been modified in the buffer, so modified data, also known as dirty data, can be located quickly; hence it is also called the dirty list.

PS: after a transaction commits, MySQL does not flush the data to disk immediately; it stays in the buffer and is flushed in batches later. Since the redo log is written at commit time, the data can be recovered even if the database crashes.

But if it's dirty data, why not just discard it?

First, MySQL itself does not flush data to disk immediately; a scheduled background task flushes the pages on the Flush list. Second, although it is called dirty data, it is in essence data the user has committed; it simply has not been flushed yet, so reading it directly is perfectly fine.

(3) LRU list

The LRU (Least Recently Used) list implements a least-recently-used eviction policy. The basic logic is simple: maintain a linked list; whenever data is accessed, move it to the head of the list; when the list is full and something must be evicted, evict from the tail.

MySQL has made its own improvements to the LRU list.

The Buffer Pool is limited in size and cannot cache all the data, so MySQL keeps cached pages on an LRU list to weed out the infrequently used data and keep the hot data.

MySQL's improvements to LRU:

  1. The LRU list is split into two parts: the young region (hot data) at the front and the old region (cold data) at the back, with the old region taking up 37% by default. Benefit: separating hot and cold means large one-off reads such as full table scans are kept out of the hot data instead of evicting it from the buffer.
  2. Promotion from the cold region to the hot region is subject to a time window: a page on the cold list that is accessed again within the configured interval is not moved to the hot data region; only an access after that interval promotes it. Benefit: this prevents short bursts of heavy access, such as a full table scan, from flushing the hot data off the hot list.

Pages on the hot list are not moved forward on every access either: only a page sitting outside the first quarter of the young region is moved to the head of the list when accessed. This reduces the resource cost of constant movement and improves performance.

The ratio of the two regions and the promotion interval can be adjusted with the following parameters:

| Parameter              | Description                                                                |
|------------------------|----------------------------------------------------------------------------|
| innodb_old_blocks_pct  | Proportion of the LRU list given to the old (cold) region; default 37 (%)  |
| innodb_old_blocks_time | Promotion time window; default 1000 milliseconds (1 second)                |

View the default values of parameters:

mysql> show variables like 'innodb_old_blocks_pct';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| innodb_old_blocks_pct | 37    |
+-----------------------+-------+
1 row in set (0.00 sec)

mysql> show variables like 'innodb_old_blocks_time';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| innodb_old_blocks_time | 1000  |
+------------------------+-------+
1 row in set (0.00 sec)
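Both variables are dynamic, so a hypothetical tuning session could look like this (illustrative values, not recommendations):

-- shrink the cold region and lengthen the promotion window
SET GLOBAL innodb_old_blocks_pct  = 25;
SET GLOBAL innodb_old_blocks_time = 2000;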

4️⃣ How MySQL flushes dirty data to disk

In the background, MySQL has timers that flush data to disk in batches, so that user threads handling normal requests are not affected. There are two main flush paths:

  1. Flush some pages to disk from the cold data at the tail of the LRU list.

    A background thread periodically scans a number of pages from the tail of the LRU list; how many pages are scanned is controlled by the system variable innodb_lru_scan_depth. Any dirty pages it finds are flushed to disk. This way of flushing pages is called BUF_FLUSH_LRU.

  2. Flush some pages from the Flush list to disk.

    Background threads also periodically flush pages from the Flush list to disk, at a rate that depends on how busy the system is at the time. This way of flushing pages is called BUF_FLUSH_LIST.

Sometimes the background threads fall behind flushing dirty pages, and a user thread needs to load a page into the Buffer Pool but finds no cache page available. In that case the user thread first checks whether the tail of the LRU list has an unmodified page that can be freed directly; if not, it has to synchronously flush a dirty page from the tail of the LRU list to disk (interacting with the disk is slow, which stalls the user request). Flushing a single page to disk like this is called BUF_FLUSH_SINGLE_PAGE.

Of course, when the system is especially busy, user threads may even flush dirty pages from the Flush list in batches. Flushing dirty pages in the middle of serving user requests obviously slows things down severely, but sometimes it is unavoidable.
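The scan depth mentioned in the first path is an ordinary system variable; 1024 is its documented default:

mysql> show variables like 'innodb_lru_scan_depth';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| innodb_lru_scan_depth | 1024  |
+-----------------------+-------+
1 row in set (0.00 sec)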

5️⃣ Multiple Buffer Pool instances

As mentioned above, the Buffer Pool is essentially a contiguous chunk of memory requested from the operating system. In a multi-threaded environment, access to the linked lists in the Buffer Pool must be protected by locks, so when the Buffer Pool is very large and concurrent access is very high, a single Buffer Pool can slow down request processing. Therefore, when the Buffer Pool is very large, we can split it into several smaller Buffer Pools. Each one is called an instance; they are all independent, requesting their own memory and managing their own linked lists, so concurrent access by multiple threads does not interfere, which improves concurrent processing capacity.

We can set the number of Buffer Pool instances at server startup with the innodb_buffer_pool_instances parameter.

How much memory does each Buffer Pool instance actually occupy? It is calculated with this formula:

innodb_buffer_pool_size/innodb_buffer_pool_instances

This is simply the total size divided by the number of instances, giving the size of each Buffer Pool instance. That said, more Buffer Pool instances are not always better: managing each Buffer Pool separately has its own performance overhead. InnoDB stipulates:

If the value of innodb_buffer_pool_size (default 128MB) is less than 1GB, InnoDB silently resets innodb_buffer_pool_instances to 1.

Therefore, if a Buffer Pool is greater than or equal to 1 GB, multiple Buffer Pool instances should be configured.
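A my.cnf sketch following that rule, with illustrative sizes:

[mysqld]
# a 4GB pool split into 4 independent 1GB instances
innodb_buffer_pool_size      = 4G
innodb_buffer_pool_instances = 4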

6️⃣ Viewing Buffer Pool status information

MySQL provides the SHOW ENGINE INNODB STATUS\G statement to view information about INNODB storage ENGINE running STATUS, including information about the Buffer Pool.

Buffer Pool status:

mysql> SHOW ENGINE INNODB STATUS\G
*************************** 1. row ***************************
  Type: InnoDB
  Name:
Status:
=====================================
2021-12-07 19:46:57 0x7f25c4111700 INNODB MONITOR OUTPUT
=====================================
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 137428992
Dictionary memory allocated 208861
Buffer pool size   8191
Free buffers       7016
Database pages     1172
Old database pages 451
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 992, written 1098
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 1172, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------

Let's walk through the fields:

  • Total Memory Allocated: Allocated memory that represents the Buffer Pool allocated to the operating system, including the size of all control blocks, cache pages, and fragments.
  • Dictionary memory allocated: Amount of memory allocated to data Dictionary information. Note that the memory is not related to the Buffer Pool and is not part of Total memory allocated.
  • Buffer pool size: Indicates how many Buffer pages the Buffer pool can hold. Note that the unit is pages.
  • Free buffers: Represents how many Free buffers are left in the current Buffer Pool, that is, how many nodes are left in the Free list.
  • Database Pages: Represents the number of pages in the LRU linked list, including the number of nodes in the young and old sections.
  • Old Database Pages: represents the number of nodes in the Old section of the LRU list.
  • Modified DB Pages: represents the number of dirty pages, which is the number of nodes in the Flush linked list.
  • Pending reads: the number of pages waiting to be loaded from disk into the Buffer Pool. When a page is about to be loaded, a cache page and its corresponding control block are allocated first, the control block is added to the head of the LRU's old section, and this value is incremented by 1; the page itself has not actually been read from disk yet.
  • Pending writes LRU: Number of pages to be flushed from the LRU list to disk.
  • Pending writes Flush List: Number of pages to be flushed from flush list to disk.
  • Pending writes Single Page: Number of pages to be flushed to disk as a single page.
  • Pages Made Young: Represents the number of nodes in the LRU list that have been moved from the old section to the head of the young section.
  • Pages made not young: when innodb_old_blocks_time is set to a value greater than 0, this value is incremented by 1 whenever an access to a node in the old section fails to move it to the head of the young section because the time-interval condition is not met.
  • Youngs /s: Indicates the number of nodes moved from the old region to the head of the Young region per second.
  • Non-youngs /s: Indicates the number of nodes per second that cannot be moved from the old region to the head of the young region because the time limit is not met.
  • Pages Read, created, written: indicates how many Pages are read, created, and written. This is followed by the read, create, and write rates.
  • Buffer pool hit rate: indicates how many of the last (on average) 1000 page accesses were served from the Buffer Pool.
  • Young-making Rate: Indicates how many visits to the page moved to the head of the Young section over a period of time, out of an average of 1000 page visits.
  • Not (Young-making Rate) : Indicates how many of the average 1000 page visits in the past did not move the page to the head of the Young section.
  • LRU len: the number of nodes in the LRU list.
  • unzip_LRU len: the number of nodes in the unzip_LRU list.
  • I/O sum: the total number of disk pages read in the last 50s.
  • I/O cur: the number of disk pages being read now.
  • I/O unzip sum: the number of pages decompressed in the last 50s.
  • I/O unzip cur: the number of pages being decompressed now.

4.3 Adaptive Hash Index

The adaptive hash index is a mechanism the InnoDB engine builds on its own to improve MySQL's query efficiency.

The InnoDB storage engine monitors lookups on table indexes internally. If it observes that certain index values are accessed frequently (hot data), it builds a hash index for them in memory, called the Adaptive Hash Index (AHI).

Once such a hash index exists, a later query for the same value can compute the record's address with the hash function and locate the data in a single step, far more efficient than repeatedly walking three or four levels of B+ tree nodes.

MySQL also applies hotspot partitioning in the design of the adaptive hash index: it is divided into several partitions (8 by default), and hot data is distributed across different partitions, so concurrent access is spread out and performance improves.

Note: adaptive hash indexes are created and used entirely by the database itself; we cannot create or tune them manually.
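What we can do is observe it. Both variables below exist in MySQL 5.7 and later: innodb_adaptive_hash_index_parts is the partition count mentioned above, and innodb_adaptive_hash_index switches the feature on or off as a whole:

mysql> show variables like 'innodb_adaptive_hash_index%';
+----------------------------------+-------+
| Variable_name                    | Value |
+----------------------------------+-------+
| innodb_adaptive_hash_index       | ON    |
| innodb_adaptive_hash_index_parts | 8     |
+----------------------------------+-------+
2 rows in set (0.00 sec)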

You can also use the SHOW ENGINE INNODB STATUS\G command to see the current use of adaptive hash indexes.

-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 34673, node heap has 1 buffer(s)
Hash table size 34673, node heap has 0 buffer(s)
Hash table size 34673, node heap has 0 buffer(s)
Hash table size 34673, node heap has 0 buffer(s)
Hash table size 34673, node heap has 0 buffer(s)
Hash table size 34673, node heap has 1 buffer(s)
Hash table size 34673, node heap has 0 buffer(s)
Hash table size 34673, node heap has 1 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s
--

5. InnoDB memory structure

6. Summary

This article mainly introduced the MySQL infrastructure and the three major features of the InnoDB engine, including a short summary of each. Part of the content is adapted from it_Lihongmin's blog; the main purpose is to deepen and document MySQL knowledge.

In the process of understanding InnoDB engine, I saw such an interview question:

How does a single MySQL instance keep its buffered data consistent with the data on disk?

You should have an answer after reading this article.