Speaking of MySQL, in fact, there are a lot of tutorials on the Internet, why I have to write this article, probably because many websites on the Internet are relatively scattered, and the description is not intuitive, can not have a systematic study of MySQL related knowledge, leading to the formation of knowledge system. I wrote this article to try to make the underlying architecture more intuitive:

  • As far as possible tographicDescribe the technical principles;
  • It involves key technologies, pluswebsiteorTechnical booksSource, convenient for you to further expand learning;
  • Involved in theBackground knowledgeAs far as possible to do an explanation, such as discussing the log buffer flush mode, extend the IO write disk related knowledge.

Well, the MySQL from Unable to Master series is about to begin (if you still can’t… Please ignore this sentence).

Some students may ask: why not directly learn the more advanced TiDB, or the powerful OceanBase.

Actually, MySQL as the old application scenario for the wide range of relational source database, its underlying architecture is very worthy of our learning and absorbing essence of his design, so we can draw lessons from the scheme design of work at ordinary times, if the project is using MySQL, so can the database with the better, to understand the MySQL execution of the underlying principle, It is also a great help for tuning. This article focuses on the MySQL underlying architecture, involving:

  • Memory structure:buffer pool,log buffer,change buffer, what is the page flushing mechanism of buffer pool?
  • The disk structure:System table space,Independent table space,Generic table space,The undo tablespace,redo log;
  • As well asIORelated underlying principles and queriesSQL Execution Process, dataPage structureandLine structureDescription,Clustered indexandSecondary indexThe underlying data organization,MVCCThe underlying implementation principle of multi-version concurrency control, as well as possibleRepeat to read,Reading has been submittedHow to do this through MVCC.

After reading this article, you will know:

  1. **InnoDB storage architecture (1, MySQL)
  2. ** Working principle: ** query statement underlying execution process is how (2, query SQL execution process)
  3. IO performance: ** filesIO operationsWhat are the ways to write disk? What are the IO optimization methods? (3.1.2.
  4. Cache:InnoDB cacheInnodb_flush_method (3.1.2.2, innodb_flush_method)
  5. ** Cache: ** When the log buffer is written to disk (3.10.2, how to ensure data is not lost – where step 4 log buffer persistence to disk is)
  6. ** Cache: ** Why do REDO log prepare write to disk? 3.10.2 How do I keep data from being lost? Why do I write the redo log prepare state to disk in step 2?
  7. ** Cache: ** When do dirty pages write to disk generally occur (3.10.2, how to ensure data is not lost – where step 5: when dirty pages are flushed to disk is)
  8. ** Why can’t unique indexes be updated with change buffer (3.2, change buffer)
  9. Cache:log bufferLog flush control parameters ofinnodb_flush_log_at_trx_commitImpact on write Performance (3.4.1 Configuration Parameters)
  10. ** Buffer pool LRU (3.1.1 buffer pool LRU algorithm)
  11. ** Table storage: ** system table space structure, MySQL InnoDB disk storage format, variousTable spaceWhat are the advantages and disadvantages of (system tablespaces, isolated tablespaces, generic tablespaces)?ibdata,ibd,frmWhat are the files (3.5, tablespace)?
  12. ** Row field storage: ** Underlying page and row storage format (3.6, InnoDB underlying logical storage structure)
  13. Row field storage:varchar.nullWhat is the maximum available storage length (3.6.3.1, MySQL vARCHar maximum length)
  14. ** line field storage: The ** line record is too long for one page, how to store it? (3.6.3.2. How to Store Row Records over page size)
  15. ** Index: ** databaseThe indexWhat is the way of organization of, understand why to adoptB + tree(3.7, index – why MySQL uses B+ trees)
  16. ** Indexes: How are indexes organized and whyLarge fieldTable performance (query performance, update performance) (3.7, index)
  17. Index:Cover index,Joint indexUnder what circumstances does it take effect (3.7.2, Secondary Indexes)
  18. ** Index: ** What isAn index pushdownWhat overhead does index push-down reduce? (3.7.2 Secondary Index – Index Condition push down)
  19. Index:Change BufferWhat are the optimizations for secondary index DML statements (3.2, Change Buffer)
  20. ** How does MySQL guarantee data integrity?redo log,undo logandbuffer poolWhat are the key roles of data integrity (3.10.2, how to ensure data is not lost)
  21. MVCC:MVCCHow is the underlying implementation, repeatable read and read committed implemented (3.11.2 MVCC implementation Principle)
  22. What is the use of a Doublewrite Buffer (3.9, Doublewrite Buffer)
  23. When are Redo logs written in a transaction? What’s the difference between a binlog and a Redo Log? 3.10.1 Timing of Redo Log writing

1. MySQL architecture

Common components of the MySQL architecture are shown below:

2. Query the SQL execution process

There is the following table:

We execute the following SQL:

select * from t_user where user_id=10000;
Copy the code

2.1 establish a connection between the MySQL client and the server

As shown below, the establishment process:

  • The client initiates a connection request through the mysql command.
  • The TCP connection is established with the server after the three-way handshake.
  • The connector uses the user password to authenticate after receiving the request;
  • After the verification, access to the user’s permission information cache, the connection is based on the cached permission to execute SQL;

For Java applications, it is common to reuse established connections in the database connection pool. As long as the connection is not closed, it will remain on the MySQL server, as shown in the show processList command:

Note that there is a Time indicating how long the connection has been inactive. The example above is 656 seconds. By default, if there is no activity for more than 8 hours, the connector will automatically disconnect, which can be controlled with the wait_timeout parameter.

2.2. Execute SQL

Execute SQL as shown below:

  • After receiving the SQL query from the client, the server first tries to query whether the SQL query has the cached result from the query cache. If yes, the server returns the result directly. If no, go to the next step.
  • Analyzer will try to get SQL after lexical analysis and syntax analysis of SQL statements, verify the correctness of syntax, continue to execute after passing;
  • The optimizer takes the SQL from the parser, parses the SQL, determines which index to use, rewrites the SQL based on the actual situation, and finally generates the execution plan.
  • The executor executes the SQL according to the execution plan. Before executing the SQL, the operation permission is verified. Then, according to the table storage engine, the drinking interface is called to query the data. Here, the number of scanned rows refers to the number of records returned by the interface. The executor further processes the returned records, as in this example:
    • Select * from t_user where user_id=10000 (select * from t_user where user_name = “arthinking”); select * from t_user where user_id=10000 (select * from t_user where user_id=10000);

3. InnoDB engine architecture

The following figure shows the architecture of the storage engine:

The structure in memory is not very good to observe directly, but the disk can be seen, let’s go to the disk of the MySQL data folder to see:

CD Innodb_data_home_dir

|- ib_buffer_pool  // Save the tablespace ID and page ID of the page in the buffer pool, which can be used to restart the recovery buffer pool
|- ib_logfile0  // redo log File 1
|- ib_logfile1  // redo log Disk file 2. By default, redo logs are stored in these two files on disk and written to the redo log in a circular fashion
|- ibdata1  // System tablespace files
|- ibtmp1  // Default temporary tablespace file. Innodb_temp_data_file_path specifies the file location
|- mysql/
|- mysql-bin000001.  // bin log file
|- mysql-bin000001.  // bin log file. |- mysql-bin.index// bin log file index
|- mysqld.local.err  // Error log
|- mysqld.local.pid  // Mysql process id
|- performance_schema/  // Performance_schema Database
|- sys/  // Sys database
|- test/  // Database folder
    |- db.opt  // test Database configuration file, containing database character set properties
    |- t.frm  // Data table metadata file, one for each table, whether using a separate table space or a system table space
    |- t.ibd  // Database table independent tablespace file. If independent tablespace is used, each table corresponds to an IBD file. Otherwise, the ibD file is saved in the system tablespace file
Copy the code

innodb_data_home_dir30

ib_buffer_pool31

ib_logfile032

ibtmp133

db.opt34

Let’s go through them one by one.

3.1, the buffer pool

A buffer pool is an area of main memory into which InnoDB caches pages of table data and index data. If the data is read directly from the buffer pool, processing will be faster. On dedicated servers, about 80% of the physical memory is typically allocated to the buffer pool.

To improve cache management efficiency, the buffer pool links pages to lists and uses a modified version of the LRU algorithm to age out little-used data from the cache.

3.1.1 Buffer pool LRU algorithm

Manage buffer pool lists using an improved version of the LRU algorithm.

When new pages need to be stored in the buffer pool, the least recently used pages are weeded out and the new pages are added to the head of the old sublist.

Operation mode of the algorithm:

  • The default 3/8 buffer pool is for old sublists;
  • When a new page becomes a buffer pool, it is first inserted into the head of the old sublist.
  • Repeated visits to the page of the old sublist move it to the head of the new sublist;
  • As the database runs, pages move to the end of the list, and pages that are not located in the buffer pool will eventually age out.

Related optimization parameters:

  • Innodb_old_blocks_pct: Controls the percentage of old sublists in LRU lists. The default value is 37 (3/8), and the range is 5-95.
  • Innodb_old_blocks_time: Specifies a time window after the first time a page is accessed without moving it to the top of the LRU list. The default is 1000, which is 1 second.

Innodb_old_blocks_time is very important. With this 1 second, for the full table scan, because the scan is sequential, generally the data of the same data page is accessed within one second, it is not upgraded to the new sublist, it is always in the old sublist of data, so it does not affect the cache of the new sublist.

3.1.2 About disk I/O modes

O_DIRECT is an optional value for the innodb_flush_method parameter.

This section describes file I/O operations that are closely related to database performance

3.1.2.1 File IO operation method

Database system is based on file system, and its performance is closely related to device read and write mechanism.

Open: Opens a file1
int open(const char *pathname, int flags);
Copy the code

The system call Open gives the process a file descriptor fd. The flags used are as follows:

  • O_WRONLY: indicates that we open it “write”, telling the kernel that we need to write data to the file;
  • O_DSYNC: The system waits for the physical I/O to complete each write operation. However, if the write operation does not affect the reading of newly written data, the system does not wait for the update of file properties.
  • O_SYNC: Each write waits until the physical I/O is complete, including the update of file attributes caused by the write.
  • O_DIRECT: Directly transfers data from user space to files or disk devices, bypassing the buffer cache (kernel buffer). This is called direct IO (DIRECT IO). Because there is no OS cache, O_DIRECT reduces the efficiency of sequential file reads and writes.
Write: Writes files2
ssize_t write(int fd, const void *buf, size_t count);
Copy the code

Once you have the file descriptor, you can call the write function to write the file, depending on the argument to the open function.

Fsync & fdatasync: refreshes files3
#include <unistd.h>

int fsync(int fd);

int fdatasync(int fd);
Copy the code
  • fdatasync: After write, you can flush the file blocks to disk by calling fdatasync. If fdatasync returns a success, the data is considered to have been written to disk.
  • fsync: Similar to the O_SYNC parameter, fsync also updates file metadata to disk;
  • syncSync simply writes the modified block buffer to the queue and returns before the actual write operation is complete.

To ensure that file updates are successfully persisted to disk, you need to call fsync in addition to calling the write method.

The general interaction process is shown as follows:

For more information about Disk IO, see: On Disk IO, Part 1: Flavors of IO27

** Fsync performance issues: ** In addition to flushing dirty pages to disk, fsync also synchronizes file metadata. File data and metadata are usually stored in different places on disk, so fsync requires at least two IO operations.

Fsync performance optimizations: If you can reduce metadata updates due to the above performance issues, you can use Fdatasync. Therefore, you need to ensure that the file size does not change before and after write. To do this, you can create a fixed-size file for writing, and then open a new file to continue writing.

3.1.2.2, innodb_flush_method

Innodb_flush_method defines the methods used to flush data to InnoDB data files and log files, which can affect I/O throughput.

The specific parameters are as follows:

attribute value
Command line format –innodb-flush-method=value
System variables innodb_flush_method
The scope of global
The default value (Windows) unbuffered
The default value (Unix) fsync
RMS (Windows) unbuffered, normal
RMS (Unix) fsync, O_DSYNC, littlesync, nosync, O_DIRECT, O_DIRECT_NO_FSYNC

These three are commonly used:

fsync

Default, using the fsync() system call to flush data files and log files to disk;

O_DSYNC

InnoDB does not use O_DSYNC directly because the O_DSYNC parameter of the open function is problematic on many Unix systems.

InnoDB is used for O_SYNC to open and refresh log files and fsync() to refresh data files.

Log writing is done in write function, data file writing is done by fsync() system call.

O_DIRECT

Open data files using O_DIRECT (which corresponds to Directio () on Solaris) and refresh data files and log files using fsync(). This option is available on some GNU/Linux versions, FreeBSD, and Solaris.

Data files are written directly from the buffer pool to disks without being buffered by the operating system. Logs are still cached by the operating system.

O_DIRECT_NO_FSYNC

InnoDB uses O_DIRECT during I/O refresh and skips fsync() system calls after each write operation.

This setting applies to some types of file systems, but not to others. For example, it does not apply to XFS. If you are not sure whether the file system you are using needs fsync() (for example, to keep all file metadata), use O_DIRECT instead.

As shown below:

Why do I need to call fsync() after using O_DIRECT configuration?

Innodb calls fsync for index with innodb_flush_method=O_DIRECT35 Innodb calls fsync for index with innodb_flush_method=O_DIRECT35

Some tests conducted by Domas show that some file systems (XFS) will not synchronize metadata without fsync. If the metadata changes, you still need to use fsync (or O_SYNC to open the file).

For example, if you increase the file size with O_DIRECT enabled, it will still write to the new part of the file, but since the metadata does not reflect the new size of the file, the tail of the file may be lost if the system crashes at this point.

To this end: When important metadata changes, continue to use fsync or choose to use O_SYNC in addition to O_DIRECT.

MySQL has provided the O_DIRECT_NO_FSYNC option since V5.6.7 to solve this problem.

3.2, Change the Buffer

The change buffer is a special data structure that caches changes when secondary index pages (non-unique indexes) are not in the buffer pool. When pages are loaded into the buffer pool by other read operations, the change buffer generated by INSERT, UPDATE, or DELETE operations (DML) is merged into the data page of the buffer pool.

Why can’t unique indexes use chage buffer?

For unique indexes, if the buffer pool does not have corresponding data pages, you still need to load data pages from disk first to determine whether records are duplicate. This step is unavoidable.

Common indexes are not unique, and inserts occur in a relatively random order. Deletion and update will also affect non-adjacent secondary index trees in the index tree. By using merge buffer, it avoids generating a large number of random I/O accesses on disk to obtain common index pages.

The problem

Change Buffer merges can take several hours when there are many affected rows and many secondary indexes to update, during which time I/O increases and query efficiency can be greatly reduced. Change Buffer merges continue to occur even after the transaction commits or after the server restarts. Related reading: Section 14.22.2, “Forcing InnoDB Recovery”

3.3. Adaptive hash index

The adaptive hash index feature is enabled by the innodb_adaptive_hash_index variable or disabled by — skip-Innodb-adaptive-hash-index at server startup.

3.4, the Log Buffer

The log buffer is used to hold data to be written to a log file on disk. The contents of the log cache are periodically flushed to disk.

The log buffer size is defined by the innodb_log_buffer_size variable. The default size is 16MB. Large log buffers allow large transactions to avoid writing redo logs to disk before committing.

If you have updates, insert, or delete multi-line transactions, try increasing the log buffer size to save disk I/O.

3.4.1. Set Parameters

innodb_flush_log_at_trx_commit

The innodb_flush_LOG_at_trx_COMMIT variable controls how the contents of the log buffer are written and flushed to disk.

This parameter controls whether to store ACID strictly or to try to achieve better performance. This parameter can achieve better performance, but can cause data loss in the event of a system crash.

Optional parameters:

  • 0, after the transaction is committed, the log is only recorded in the log buffer, and the log is written to the cache and refreshed to disk every second. The logs that are not refreshed may be lost.
  • 1, to be fully compliant with ACID, this value must be used to indicate that the log is written to the cache and flushed to disk on each transaction commit;
  • 2. After each transaction is committed, logs are written to page cache and flushed to disk every second. Unrefreshed logs may be lost.

innodb_flush_log_at_timeout

The innodb_flush_LOG_at_TIMEOUT variable controls the log flush frequency. Allows you to set the log refresh rate to *N seconds (where N* is 1… 2700, default is 1)

To prevent data loss, perform the following operations:

  • If binlog is enabled, set sync_binlog=1.
  • innodb_flush_log_at_trx_commit=1;

The configuration effect is as follows:

3.5. Table Spaces

An InnoDB table and its index can be built in a system table space, in a separate table space, or in a generic table space.

  • When innodb_file_per_TABLE is enabled, tables are usually stored in separate table Spaces, which is the default configuration.
  • When innodb_FILe_per_TABLE is disabled, tables are created in the system tablespace.
  • To CREATE a TABLE in a generic TABLE space, use CREATE TABLE… In TABLESPACE syntax. For more information, see the official document 14.6.3.3 General Tablespaces.

Table space overview diagram:

The files involved in the tablespace

Innodb_data_home_dir: innodb_data_home_dir:

| - ibdata1 / / system tablespace file | - ibtmp1 / / the default temporary tablespace file, Through innodb_temp_data_file_path attribute to specify the file location folder | | - test / / / database - db. Opt / / test database configuration file, containing the database character set properties | - t.f rm / / data table metadata file, Whether using separate tablespace or system tablespace, each table should be on a | - t.i bd / / independent table space database table file, if you are using independent table space, is a table corresponding to a ibd files, or stored in the file system table spaceCopy the code

FRM file

When creating an InnoDB table, MySQL creates a.frm file in the database directory. FRM files contain metadata for MySQL tables (such as table definitions). Every InnoDB table has a.FRm file.

Unlike other MySQL storage engines, InnoDB also encodes information about tables in its own internal data dictionary within the system table space. When MySQL deletes a table or database, it deletes one or more. FRM files and the corresponding entries in InnoDB’s data dictionary.

Therefore, in InnoDB, you cannot move tables just by moving.frm files. For information on Moving InnoDB Tables, see the official documentation 14.6.1.4 Moving or Copying InnoDB Tables.

Ibd file

For tables created in a separate table space, an.ibd table space file is also generated in the database directory.

Tables created in a generic tablespace are created in an existing regular tablespace.ibd file. Regular table space files can be created inside or outside the MySQL data directory. For more information, see the official document 14.6.3.3 General Tablespaces.

Ibdata files

System tablespace files, tables created in the InnoDB system tablespace are created in IBData.

3.5.1 System tablespaces

A system table space consists of one or more data files (ibData files). This contains metadata related to InnoDB related objects (InnoDB data Dictionary), as well as the change buffer, A store for doublewrite buffers and undo logs.

InnoDB if the table is created in the system table space, the system table space also contains the table data and index data of the table.

System tablespace problems

Prior to MySQL 5.6.7, the default setting was to keep all InnoDB tables and indexes in the system table space, which often resulted in the file becoming very large. Because system table Spaces never shrink, storage problems can occur if large amounts of temporary data are loaded and then deleted.

In MySQL 5.7, the default is set to independent table space mode, where each table and its associated indexes are stored in a separate.ibd file. This default setting makes it easier to use InnoDB features that use the **Barracuda file format, such as table compression **, efficient storage of off-page columns and large index key prefixes (Innodb_large_prefix).

Keeping all table data in a system table space or in a separate.ibd file usually has an impact on storage management.

InnoDB introduced generic table Spaces 6 in MySQL 5.7.6, which are also represented by.ibd files. A generic TABLESPACE is a shared TABLESPACE created using the CREATE TABLESPACE syntax. They can be created outside of the MySQL data directory and can hold multiple tables and support tables in all row formats.

3.5.2 Independent tablespace

In MySQL 5.7, the configuration parameter innodb_file_per_table is enabled by default. This is an important configuration option that affects InnoDB file storage, functionality availability, I/O features, etc.

When enabled, data and indexes for each table are stored in a separate. Ibd file rather than in a shared IBData file in the system table space.

advantages

  • You can be more flexibleData compression4Line format, as in:
    • By default (innodb_page_size=16K),The prefix index5Contains a maximum of 768 bytes. If innodb_large_prefix is enabled and Innodb tables store rows in DYNAMIC or COMPRESSED format, prefix indexes can contain up to 3072 bytes. Prefix indexes are also applicable.
  • TRUNCATE TABLEExecutes faster, and the reclaimed space is not retained, but used by the operating system;
  • A per-table file tablespace data file can be created on a separate storage device for I/O optimization, space management, or backup. See 14.6.1.2 Creating Tables Tables in the past;

disadvantages

  • The unused space in an independent tablespace can only be used by the same table. If it is not properly managed, space will be wasted.
  • Multiple tables need to be flushed, so only multiple fsync operations can be performed. Write operations on multiple tables cannot be merged, which may result in more total fsync operations.
  • Mysqld must keep an open file handle for each table file space, which may affect performance if there are too many tables;
  • Each table needs its own data file and requires more file descriptors;

Even if innodb_file_per_TABLE is enabled, only data, index and insert cache Bitmap pages are stored in each tablespace. Other data such as rollback information, insert buffer index pages, system transaction information, secondary write buffer, etc. are stored in the original shared tablespace.

3.5.3 Generic tablespaces

A generic TABLESPACE is created using the CREATE TABLESPACE syntax.

Similar to system tablespaces, generic tablespaces are shared tablespaces that can store data from multiple tables.

Generic table Spaces have a potential memory advantage over stand-alone table Spaces, as the server keeps table space metadata in memory for the life of the table space. A generic table space can typically hold multiple table data and consume less table space metadata memory.

Data files can be placed in the MySQL data directory or separate from the MySQL data directory.

3.5.4 undo tablespace

The undo tablespace contains the Undo log.

The innodb_ROLLBACK_segments variable defines the number of rollback segments allocated to each undo tablespace.

Undo logs can be stored in one or more Undo tablespaces rather than system tablespaces.

In the default configuration, undo logs are in the system table space. SSD storage is more suitable for the UNDO log I/O mode. To do this, you can store undo logs on SSDS that are not in the system tablespace.

The innodb_UNDO_TABLespaces configuration option controls the number of undo tablespaces.

3.5.5 Temporary tablespaces

Non-compressed temporary tables created by users and on-disk temporary tables are created in a shared temporary table space.

The innodb_temp_datA_file_path configuration option specifies the path to the table space file at zero. If not specified, an automatic expanded data file ibtmp1 slightly larger than 12MB is created by default in the innodb_data_home_dir directory.

COMPRESSED temporary tables created using the ROW_FORMAT=COMPRESSED attribute are created in the temporary file directory in a separate table space.

Temporary table Spaces are created when the service is started and destroyed when the service is shut down. If the temporary tablespace creation fails, it means that the service failed to start.

3.6 InnoDB underlying logical storage structure

Before introducing indexes, it is worth taking a look at InnoDB’s underlying logical storage structure, on which indexes are created. So far, all we’ve shown is the logical view of the physical disk. Let’s look at the underlying view.

3.6.1 Organizational structure of IBD files

Now open a table space IBD file and see how it is organized.

As shown in the following figure, a tablespace consists of segments, extents, and pages.

InnoDB’s smallest storage unit is pages, which are 16K per page by default.

InnoDB storage engine is row-oriented, data is stored in rows, and the maximum number of rows allowed per page = 16K / 2-200, that is, 7992 rows.

Segment: such as data segment, index segment, and rollback segment. InnoDB storage engine is B+ tree index organization, so data is index, index is data. The leaf nodes of a B+ tree store data segments.

3.6.2. Data page structure13

The name of the Take up the space describe
Fil Header 38 byte Basic information about a page, such as owning tablespace, previous page and next page Pointers.
Page Header 56 byte Data page proprietary related information
Infimun + Supremum 26 byte Two virtual row records that define record boundaries
User Records Dynamic allocation The contents of the row records that are actually stored
Free Space Dynamic adjustment Unused page space
Page Directory Dynamic adjustment The relative positions of some records in a page
Fil Trailer 8 byte Verify page integrity

About Infimun and Supremum: InnoDB automatically sets a minimum record and a maximum record in the root page when the index is first created and never deletes them. The lowest and highest records can be considered as part of the index page overhead. Initially, they both exist on the root page, but as the index grows, the lowest record will exist on the first or lowest leaf page, and the highest record will appear on the last or largest keyword page.

3.6.3. Line record structure description14

Let’s start with the Compact row record format, which was introduced in MySQL5.0. Compact is designed to store data efficiently, allowing more data to fit on a page, and for faster B+ tree look-up.

The name of the describe
Variable-length field length list The maximum field size is 2 bytes, that is, the maximum length: 2^16=65535 bytes; If the field size is smaller than 255 bytes, it is 1 byte.
NULL flag bit Records where fields in the line are null values
Record header information Record header information, fixed 5 bytes
Column 1 data The actual column data, NULL, does not take up space in that section
Column 2 data
.

Record headers are used to chaining consecutive records together and for row-level locking.

Each row of data has two hidden columns in addition to the user-defined columns:

  • A 6-byte transaction ID column;
  • A 7-byte rollback pointer column;
  • If InnoDB does not specify a primary key, a 6-byte ROWId column is added;

The record header packet 12 contains the following contents:

The name of the Size (bit) describe
(a) 1 The unknown
(a) 1 The unknown
deleted_flag 1 Whether the row has been deleted
min_rec_flag 1 1 if the record is the minimum predefined record
n_owned 4 The number of records owned by the record
heap_no 13 The sort number of this record in the index heap
record_type 3 Record type: 000 Common, 001 B+ tree node pointer, 010 Infimum, 011 Supremum, 1XX Reserved
next_record 16 Points to the next record on the page

22.2 InnoDB Page Structure

InnoDB Record Structure 22.1 InnoDB Record Structure

For more details on the Row Formats, see our website: 14.11 InnoDB Row Formats

According to the above format, the organization of records in the data page can be obtained:

3.6.3.1 What is the maximum length of varchar in MySQL

As we know, the maximum value of a field is 65535 bytes, which is the limit of the storage length.

For details, see 8.4.7 Limits on Table Column Count and Row Size

  • MySQL has a hard limit of 4096 columns per table, but for a given table the effective maximum may be less;
  • The maximum limit for a MySQL table is 65,535 bytes per row, which is a logical limit; When actually stored, the physical maximum row size of the table is slightly less than half the page size. If a row is less than half the length of a page, all rows are stored in the local page. If it is more than half of a page, the variable length column is selected for external off-page storage until the row size is kept within half a page.

The actual number of characters that can be stored depends on the encoding.

Background:

  • MySQL > MySQL 4.0; varchar(10); UTF8 (3 bytes);

  • MySQL 5.0 + vARCHar (10), which refers to 10 characters, can store 10 characters whether it is numbers, letters or UTF8 characters (3 bytes each), the maximum size is 65532 bytes;

Therefore, Mysql5 stores different sizes depending on the encoding.

So suppose we use utF8 encoding, then each character occupies a maximum of 3 bytes, that is, the maximum definition of VARCHAR (21845) characters, if the ASCII encoding, a character is equivalent to a byte, the maximum definition of VARCHar (65535) characters, let’s verify.

We tried to create a field like this:

CREATE TABLE `t10` ( `id` int(11) NOT NULL,
                  `a` int(11) NOT NULL.PRIMARY KEY (`id`)
                 ) ENGINE=InnoDB CHARSET=ascii ROW_FORMAT=Compact;


alter table t10 add `str` varchar(21845) DEFAULT NULL;

alter table t10 add `str` varchar(65535) DEFAULT NULL;
Copy the code

Discovery prompts this error:

mysql> alter table t10 add `str` varchar(65535) DEFAULT NULL;
ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs
Copy the code

The reason is that according to the above line format, variable length field length list records also need to occupy space, occupying 2 bytes. In addition, empty fields are allowed within 8 bits, so the NULL flag bit occupies 1 byte, so the total number of characters we can store is:

655-2-2-4-4 =65534

Where -2 bytes represent the variable-length field list, -1 represents the NULL flag bit, and -4 represents the size occupied by two ints

Therefore, the actual size of VARCHAR that can be accommodated is 65524. Let’s verify:

3.6.3.2 How are row records stored when they exceed the page size

Internal representations of MySQL tables have a maximum row size limit of 65,535 bytes. InnoDB with 4KB, 8KB, 16KB, and 32KB Innodb_page_size Settings, the maximum row size of a table (for data stored locally within a database page) is slightly less than half the page size. If InnoDB rows containing variable-length columns exceed the maximum row size, variable-length columns are selected for external off-page storage.

Variable-length columns are too long to fit on b-tree pages. In this case, variable-length columns are stored on separately allocated disk pages. These pages are called overflow pages, and these columns are called out-of-page columns. The value of the off-page column is stored in a single linked list of overflow pages.

InnoDB storage engine supports four row formats: REDUNDANT, COMPACT, DYNAMIC, and COMPRESSED. For details, see 14.11 InnoDB Row Formats.

COMPACT row format processing

Tables using the COMPACT row format store the first 768 bytes of varied-length column values (VARCHAR, VARBINARY and BLOB and TEXT types) in index records within b-tree nodes, with the rest stored on overflow pages.

If the value of the column is 768 bytes or less, no overflow pages are used, so you can save some I/O.

If 768 bytes have been checked, they are stored as follows:

DYNAMIC row format processing

The DYNAMIC row format provides the same storage features as the COMPACT row format, but with improved storage capability for extremely long variable-length columns and support for large index key prefixes.

InnoDB can store excessively long variable length column values entirely out of the page (for VARCHAR, VARBINARY and BLOB and TEXT types), while clustered index records contain only 20-byte Pointers to overflow pages. Fixed-length fields greater than or equal to 768 bytes are encoded as variable-length fields.

Problem caused by large field in table

If there are too many large variable-length fields in a table, resulting in a row that is too long, and the COMPACT row format is used for the entire time, an insert error may occur.

For example, if the size of a page is 16K, MySQL limits the storage of at least two rows of data per page. In COMPACT, the first 768 bytes of a large variable length field are still stored in the index page. 1024 * 16/2/768 = 10.67, then more than 10 large variable length fields will fail to insert.

At this point you can change row format to: DYNAMIC.

Index of 3.7,

Earlier we looked at the underlying storage structure of InnoDB, which organizes data pages in a B+ tree format. You also learned how to store rows in data pages.

When building a B+ tree index, one or more fields must be selected as the index value. If the index is selected as a primary key, then we call it a clustered index. Otherwise, it is a secondary index.

Why does MySQL use B+ trees?

  • Although hash table can provide O(1) single row data operation performance, but can not support sorting and range search, will lead to full table scan;
  • B trees can store data in non-leaf nodes, but this may result in more I/O operations when querying continuous data;
  • While B+ tree data are stored in leaf nodes, which are interconnected by Pointers to reduce the extra random I/O generated by sequential traversal

Why does MySQL use B+ tree 29

3.7.1 Clustered indexes

Now that we know about the underlying logical storage structure, let’s take a closer look at how InnoDB organizes stored data through B+ trees.

Let’s start with clustered indexes.

Clustered index

InnoDB terminology for primary key indexes.

Let’s create a test table and insert data to construct a B+ tree:

CREATE TABLE t20 (
id int NOT NULL,
a int NOT NULL,
b int,
c int.PRIMARY KEY (`id`)
) ENGINE=InnoDB;

insert into t20 values(20.1.2.1);
insert into t20 values(40.1.2.5);
insert into t20 values(30.3.2.4);
insert into t20 values(50.3.6.2);
insert into t20 values(10.1.1.1);
Copy the code

Insert id out of order; insert id out of order;

This sort is built from the B+ index tree.

We can use this online dynamic demo tool to see the construction process of B+ tree, the end result is as follows:

The actual model stored in the database varies with the page size. In order to simplify the model, we explain the data storage structure according to the general model of B+ tree.

Similarly, our data is also organized in this way. In this B+ tree, we build the primary key index and store the complete record under the corresponding page:

Blue is the index page and orange is the data page.

The default size of each page is 16K. If a new row is inserted, it is necessary to apply for a new data page, and then move some data over to realign the B+ tree. This process is called page splitting, and this process affects performance.

Conversely, if InnoDB index page padding factor drops below MERGE_THRESHOLD, which is 50% by default (if not specified), InnoDB tries to shrink the index tree to free pages.

The insertion of auto-increment primary key is in increasing order. Every time records are added, they are appended, which does not involve the moving of records and will not trigger the splitting of leaf nodes. However, the main key of general business fields is often not inserted in order, so the writing cost is high, so we prefer to use auto-increment field as the primary key.

Clustered index considerations

  • When defined in the table abovePRIMARY KEYAfter that, InnoDB uses it as a clustered index. To do this, define one for each of your tablesPRIMARY KEY. If there is no unique and non-empty field or set of columns, add an increment column.
  • If you don’t have a table definitionPRIMARY KEYMySQL will find the first UNIQUE index without null and use it as the clustered index.
  • If the table does not havePRIMARY KEYOr no suitableUNIQUEThe index,InnoDBInternally, a hidden clustered index is generatedGEN_CLUST_INDEX, as the row ID, which is a 6-byte field that increments as data is inserted.

Clustered index lookup

Search for the record with ID =50 according to the index, as shown in the figure below. Search down the B+ tree and finally find the fourth page **. Then load the page into the buffer pool and search for ** through comparison in the cache.

3.7.2 Secondary Indexes

All indexes other than clustered indexes are called secondary indexes (secondary indexes). In InnoDB, each record in the secondary index contains the primary key column for that row and the column specified for the secondary index.

If a record is found in a secondary index, the primary key index ID of the record can be obtained. Then, the primary key index ID can be used to search for specific records in the clustered index. This process is called back table operation.

If the primary key is longer, the secondary index uses more space, so having a short primary key is advantageous.

Let’s add a composite joint index to the table we just created

Add one more field
alter table t20 add column d varchar(20) not null default ' ';
Add a federated index
alter table t20 add index idx_abc(a, b, c);
Copy the code

After the addition, the combined index B+ tree is as follows, where the index key is the combination of the three fields ABC, and the records stored in the index are the primary key ID:

Using index

InnoDB storage engine supports overwriting indexes, which means that the queried records can be retrieved from secondary indexes without having to go back to the table to query the records in the clustered index, thus reducing the amount of I/O operations. The following query uses both the override index idx_abc:

select a, b from t20 where a > 2;
Copy the code

The result is as follows:

The Extra column indicates Using index, and the number of rows scanned is 2. Note: the number of rows scanned here refers to the MySQL executor fetching two records from the engine. The engine may traverse multiple records internally for conditional comparison.

Leftmost matching principle

Since InnoDB is built as an indexed B+ tree, records can be located using the index’s “leftmost prefix”.

That is, not only do all defined fields that use the index go through the index, but you can use the index to speed up retrieval as long as the left-most prefix is satisfied. The leftmost prefix can be the leftmost n field of a federated index.

Using index condition

Index Condition Pushdown (ICP) is an optimization for MySQL to use indexes to retrieve rows from tables.

Push down is a way of leaving the index criteria to the storage engine to determine if the requirements are met, rather than sending the full record back to the MySQL Server layer to determine.

ICP supports range, REF, Eq_REF, and REF_OR_NULL lookups, MyISAM and InnoDB storage engines.

Conditions that reference subqueries cannot be pushed down, triggering conditions cannot be pushed down. For details, see Index Condition Pushdown

If ICP is not used, the storage engine traverses the index to locate rows in the clustered index and returns the results to the MySQL Server layer, which continues to filter rows based on the WHERE condition.

With ICP enabled, if WHERE can evaluate some criteria using only columns in the index, the MySQL Server layer pushes those criteria into the WHERE criteria down to the storage engine. The storage engine then uses index entries to determine index conditions, and only when the conditions are met does it go back to the table to find records and return them to the MySQL Server layer.

The goal of ICP is to reduce the number of rows scanned back to the table, thereby reducing I/O operations. forInnoDBTable, ICP is used for secondary indexes only.

When Using index push-down, Extra in the execution plan prompts Using index condition instead of Using index, because the entire row must be queried back into the table. Using index indicates that an overwrite index is used.

3.8, InnoDB Data Directory

InnoDB Data Directory is stored in the system table space. It mainly contains metadata and is used to track tables, indexes, and table fields. For historical reasons, the metadata in the InnoDB data dictionary is duplicated with the metadata in the.FRm file.

3.9, the Doublewrite Buffer

The Doublewrite Buffer is a storage area, which is 128 pages (2 extans) of InnoDB on the TABLESPACE. The size of the Doublewrite Buffer is 2MB7.

Version differences: Prior to MySQL 8.0.20, the Doublewrite buffer storage was in the InnoDB system table space. As of MySQL 8.0.20, the Doublewrite buffer store is in the Doublewrite file.

This article is written based on MySQL 5.7.

The operating system writes files in 4KB units, so for every InnoDB page written to disk, the operating system needs to write 4 blocks. A partial Page write problem occurs if a system crash occurs while writing to four blocks, resulting in only a partial page write of 16K of data.

InnoDB cannot recover from the redo log because the Fil Trailer (FIL_PAGE_END_LSN) file was damaged.

For this reason, every time InnoDB writes a page to the appropriate location in the data file, it is first written to the double-write buffer. InnoDB writes pages to the final data file only after the buffer is safely flushed to disk.

If the operating system or mysqld process crashes during a page write InnoDB can recover a good copy of the page from the double-write buffer during crash recovery. On recovery, InnoDB scans the double-write buffer and checks the integrity of pages in the data file for each valid page in the buffer.

If system tablespace files (” ibdata files “) are on fusion-IO devices that support atomic write, double-write buffering is automatically disabled and fusion-IO atomic write is used for all data files.

3.10, Redo Log

Redo logs are used for database crash recovery to ensure data integrity.

The redo log consists of two parts:

  • Redo Log Buffer;
  • Redo log files, redo log files on disk by two namesib_logfile0andib_logfile1Physical file representation of.

For data integrity, the redo log must be written to disk before dirty pages are flushed to disk. In addition to data pages, clustered indexes, secondary indexes, and Undo logs are all required to Log redo logs.

3.10.1 Timing of Redo Log writing

In addition to writing Redo logs, we also need to write binlogs. For this purpose, we will briefly introduce binlogs.

3.10.1.1, binlog

Write Binary Log. A binary log is a group of log files. This contains information about data modifications made to the MySQL server instance.

Redo Log is InnoDB engine specific, binlog is MySQL Server implementation, all engines can use.

Redo Log files are written in a circular fashion and run out of space. Binlog files are appended and do not overwrite previous logs.

The main purpose of binlog:

  • Master-slave synchronization, in which the master server sends events contained in binary logs to the slave server, which executes these events to maintain the same data changes as the master server;
  • Some data recovery operations require binary logs to be restored to a backup point.

Binlog is mainly used for master/slave synchronization and data recovery, and Redo Log is mainly used for transaction data integrity, so that InnoDB does not lose data, also known as crash-safe.

Binlog Records two types of logs:

  • Sql-based logging: Events contain SQL statements that make changes to data (insert, add, delete);
  • Row-based logging: Time describing changes to a single row.

Hybrid logging uses statement-based logging by default, but automatically switches to line-based logging as needed.

3.10.1.2 Timing of Redo Log writing

After a brief introduction to binlogs, let’s take a look at the Redo Log writing process.

Suppose we execute an SQL

update t20 set a=10 where id=1;
Copy the code

The execution process is as follows:

3.10.2 How to prevent data loss

When we introduced the Log Buffer, we mentioned that to ensure that data is not lost, we need to do the following:

  • If binlog is enabled, set sync_binlog=1.
  • innodb_flush_log_at_trx_commit=1;
  • Sync_binlog =0: indicates that only write, not fsync, is committed each time.
  • Sync_binlog =1: fsync will be executed every time a transaction is committed.
  • Sync_binlog =N(N>1) : Indicates that every transaction is committed and fsync is performed after N transactions are accumulated.

In the last step of the process, before the commit transaction interface returns to the Server layer, both the binlog cache and the log buffer are fsynced to disk. This ensures that the data will not be lost, even if it crashes. You can also restore data using binlog and redo log. The process is as follows:

The processing flow in disk and memory is as follows:

Where the timing of the fourth step log buffer persistence to disk is:

  • The space occupied by the log buffer is about to reach the limit. Procedureinnodb_log_buffer_sizeHalf the time, the background thread actively writes disk;
  • InnoDB has a thread in the background that flushes the log buffer to disk every second;
  • Because the log buffer is shared by all threads, a redo log written to the log buffer that has not yet been committed is flushed to disk when committed by another transaction thread

Step 5: The time when dirty pages are flushed to disk is:

  • If the system memory is insufficient and dirty pages need to be eliminated, synchronize the dirty pages back to the disk.
  • When MySQL is idle;
  • When MySQL shuts down properly, dirty pages are flushed to disk.

The innodb_max_dirty_pages_pct parameter is the maximum number of dirty pages. The default value is 75%.

Why is the redo log prepare state also written to disk in step 2?

If the redo log is in the prepare state, check whether the corresponding redo log is saved. If the redo log is in the prepare state, the transaction was successfully committed. Use the redo log to restore data.

This allows you to restore a crash using the redo log as follows:

Redo logs are written in a circular fashion as shown in the following figure:

  • When writepos overlaps with checkpoint, logFile is full. The green part is the free space, and the red part is the space where the redo log was written.
  • The checkpoint identifies the currentLSNEach time the system crashes and restarts, the redo log is executed from the current checkpoint. Based on the redo log, check whether the data pages are ok one by one. If any problem occurs, use the redo log to repair it.

LSN Log Sequence Number Indicates the log sequence number. In InnoDB, LSN occupies 8 bytes and monotonically increases.

  • The total amount of redo log writes;
  • Checkpoint locations;
  • Page version;

In addition to the LSN in the redo log, the header of each page also contains the LSN for that page, as we described earlier when we introduced page formats.

The LSN in a page indicates the size of the LSN when the page was last refreshed. 8

3.11, Undo Logs

The redo log records the behavior of transactions and can be used to redo pages, but sometimes food needs to be rolled back and undo logs are required. 15

**InnoDB provides rollback segments. Each rollback segment contains 1024 Undo Log segments. The page with system tablespace offset 5 records the page where all rollback Segment headers are located.

3.11.1 Undo log format

It can be divided into two types according to different behaviors:

insert undo log

Insert undo log: Visible only to the transaction itself, so the insert undo log can be deleted after the transaction is committed without the purge operation;

Insert undo log

next Record the location of an undo log
type_cmpl The undo type is insert or update
*undo_no ID of the record transaction
*table_id Record table object
*len1, col1 Record columns and values
*len2, col2 Record columns and values
. .
start Record the start position of the undo log

T20 table_ID = 10 in transaction 1001

insert into t20(id, a, b, c, d) values(12.2.3.1, "init")
Copy the code

Then an undo log will be generated:

update undo log

Update the undo log: To implement MVCC(described later), update undo logs cannot be deleted immediately after the transaction is committed. You need to add the update undo log to the history list when the transaction is committed. Wait for the Purge thread to make its final deletions.

Update undo log mainly records:

next Record the location of an undo log
type_cmpl The undo type is insert or update
*undo_no Undo Log Number
*table_id Record table object
info_bits
*DATA_TRX_ID The transaction ID
*DATA_ROLL_PTR Rollback Pointers
*len1, i_col1 n_unique_index
*len2, i_col2
.
n_update_fields The following update vector information represents the columns that were sent as a result of the update operation
*pos1, *len1, u_old_col1
*pos2, *len2, u_old_col2
.
n_bytes_below
*pos, *len, col1
*pos, *len, col2
.
start Record the start position of the undo log

Suppose in transaction 1002, execute the following SQL, t20 table_id is 10:

update t20 set d="update1" where id=60;
Copy the code

Then an undo log will be generated:

For each undo log application, a version is rolled back. This is how THE Multi Versioning Concurrency Control (MVCC) works.

We are performing a delete SQL:

delete from t20 where id=60;
Copy the code

The corresponding undo log changes to the following:

As shown above, the actual row record is not deleted immediately, but rather a deleTED_flag bit is recorded in the row record header. The actual deletion will eventually occur when the Purge thread purge the undo log, and the undo log will also be wiped.

3.11.2 Implementation principle of MVCC

As shown in the figure above, MySQL only has one row record, but every time the SQL is executed, the row record is recorded in the undo log format. The undo log is connected by the rollback pointer, so that when we want to go back to a version, we can apply the Undo log and go back to the corresponding version view.

InnoDB supports transaction isolation levels RC(Read Commit) and RR(Repeatable Read), which is implemented through consistent Read View.

When a transaction is started, all active transactions (uncommitted) form a view array. InnoDB uses this view array to determine if row data needs to be unundo to the specified version:

RR Transaction isolation level

Suppose we use the RR transaction isolation level. Let’s look at an example:

In the figure below, assume record A =1 with ID =60

When transaction C starts, the active transactions are shown in yellow below:

If transaction IDDATA_TRX_ID<=11, the maximum transaction IDDATA_TRX_ID can be rolled back using undo. If transaction IDDATA_TRX_ID<=11, the maximum transaction IDDATA_TRX_ID can be undone. If TRX_ID= the current transaction ID, you can also see that you see your changes.

There is one other thing to note:

  • In the RR isolation level, when a transaction updates a transaction, it can only use the current read to obtain the latest version data for update. If the row lock of the current record is occupied by another transaction, it needs to enter the waiting table.
  • At the RC isolation level, each statement execution evaluates the new consistency view.

So let’s analyze the execution flow of the above example:

  • Update (TRX_ID=11); update (TRX_ID=11);
  • Select DATA_TRX_ID=11, undo log =1; select DATA_TRX_ID=11, undo log =1;
  • Transaction B performs update, needs to use the current read, gets the latest record, a=2, then increments 1, finally a=3;
  • Select * from a where id =3; select * from B where id =3;
  • Select (DATA_TRX_ID=11); undo log (DATA_TRX_ID=11);
  • The maximum transaction ID that the new view can see is 14. If transaction B is not committed, transaction C is committed, so a=2:

conclusion

  • Data integrity depends on: redo log
  • The transaction isolation level is implemented by MVCC, and MVCC is implemented by Undo Log
  • I/O performance improvement methods: The buffer pool improves the query efficiency and common index update efficiency, and the log buffer improves the log writing performance
  • The query performance depends on the index. The bottom layer uses page storage. The smaller the field is, the more rows are stored on the page, and the faster the query efficiency is. As a clustered index, the autoincrement field can speed up the insert operation.
  • Fault recovery: double-write buffer, redo log
  • Primary/secondary synchronization: binlog

There is a lot of content in this article, which needs to be combed out after reading it. Finally, you can remember it by referring to this mind map.


The content of this article is almost introduced here, can read here friends are really very patient, thumbs up for you.

This article is written by Arthinking based on relevant technical information and official documentation. It is accurate to ensure that the content is correct. If you find any mistakes or errors, please kindly correct them.

You can follow my blog at itzhai.com for more articles, and I will continue to update on back-end technologies, including THE JVM, Java infrastructure, architectural design, network programming, data structures, databases, algorithms, concurrent programming, distributed systems, and more.

If you feel that you have gained something after reading this article, you can follow my account or like it. The code word is not easy. Your support is the biggest motivation for my writing.

Follow my official account for the latest articles.

More articles

  • JVM series topics: Public accounts send JVMS

Author: Arthinking

Blog links: www.itzhai.com/database/in…

Insight into MySQL’s underlying architecture: walking the line between buffering and disk

Copyright notice: copyright belongs to the author, shall not be reproduced without permission, infringement will be investigated! Please add the public number to contact the author.


References


  1. innodb_data_home_dir. Retrieved from https://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html#sysvar_innodb_data_home_dir↩
  2. Ib_buffer_pool. Retrieved from https://dev.mysql.com/doc/refman/5.6/en/innodb-preload-buffer-pool.html↩
  3. Ib_logfile0. Retrieved from https://dev.mysql.com/doc/refman/5.7/en/innodb-redo-log.html↩
  4. Ibtmp1. Retrieved from https://dev.mysql.com/doc/refman/5.7/en/innodb-temporary-tablespace.html↩
  5. Db. Opt. Retrieved from https://dev.mysql….↩
  6. Linux Programmer’s Manual – OPEN(2). (2020-02-09). Retrieved from http://man7.org/linux/man-pages/man2/open.2.html↩
  7. Retrieved from http://man7.org/l. [2013]. Spectrum of Retrieved from http://man7.org/l…↩
  8. Retrieved from http://man7.org/l, 2013. The spectrum of spectrum of disease [J].↩
  9. On Disk IO, Part 1: Flavors of IO. medium.com. Retrieved from https://medium.com/databasss/on-disk-io-part-1-flavours-of-io-8e1ace1de017↩
  10. Innodb calls fsync for writes with innodb_flush_method=O_DIRECT. Retrieved from https://bugs.mysql.com/bug.php?id=45892↩
  11. 14.6.3.3 General Tablespaces. Retrieved from https://dev.mysql.com/doc/refman/5.7/en/general-tablespaces.html↩
  12. MYSQL INNODB table compression. (2018-03-09). Retrieved from https://cloud.tencent.com/developer/article/1056453↩
  13. Prefix index, a solution of optimizing index size. (2015-03-03). Retrieved from https://www.cnblogs.com/studyzy/p/4310653.html↩
  14. MySQL Internals Manual – innodb page structure[EB/OL]. (2020-05-04). Retrieved 2020-0530, from https://dev.mysql.com/doc/internals/en/innodb-page-structure.html↩
  15. Official.MySQL Internals manual-Innodb Record Structure [EB/OL]. (2020-…↩
  16. Jiang Chengyao. MySQL Technology Insider-InnoDB Storage Engine 2nd edition [M]. China Machine Press, 2013-5:104.↩
  17. Why the MySQL using B + tree. Draveness. Me. (2019-12-11). Retrieved from https://draveness.me/whys-the-design-mysql-b-plus-tree/↩
  18. Retrieved from The Spectrum of InnoDB DoubleWrite Buffer as Read Cache https://www.usenix.org/legacy/events/fast12/poster_descriptions/Kangdescription2-12-12.pdf↩
  19. Jiang Chengyao. MySQL Technology Insider-InnoDB Storage Engine 2nd edition [M]. China Machine Press, 2013-5:302-303.↩
  20. Jiang Chengyao. MySQL Technology Insider-InnoDB Storage Engine 2nd edition [M]. China Machine Press, 2013-5:306.↩