Mysql — InnoDB storage engine AHI

Introduction to the

From the perspective of InnoDB logical storage structure, all InnoDB data is stored in a space, called table space. As shown, a table space consists of segments, extents, and pages.

Table space

Table Spaces can be thought of as the outermost logical structure of InnoDB storage engine. Mysql InnoDB storage engine architecture has been introduced in previous articles. Table Spaces are divided into system table Spaces, independent table Spaces, regular table Spaces, undo independent table Spaces, and shared temporary table Spaces.

Period of

A table space is composed of segments. Common segments include data segments, index segments, and rollback segments.

  • Data segment: leaf node of B+ tree;
  • Index segment: the middle node of the B+ tree;
  • Rollback segment: Records the undo log.

area

Sections are made up of extents, and extents are made up of multiple pages, each of which is 1M in size in any case. To ensure the continuity of pages in the extents, InnoDB storage engine requests space by extents (4~5 extents from disk at a time). By default, the page size in InnoDB is equal to 16KB, that is, 64 consecutive pages in a range.

Of course, InnoDB storage engine in Mysql 5.7 supports page compression, and the size of each page can be 2K, 4K, 8K, 16K, so the number of pages per section is between 64 and 512.

If innodb_file_per_table is enabled, the default table size is 96K. If innodb_file_per_table is enabled, the default table size is 96K. If innodb_file_per_table is enabled, the default table size is 96K.

In fact, 32 shard pages are used to store data at the beginning of each segment (i.e., 6 shard pages are used by default when creating a table), and 64 consecutive pages are applied when 32 shard pages are used up. For some small tables, you can start with less space to save disk space.

page

Pages are the smallest unit of disk management in InnoDB. By default, the size of pages in InnoDB is equal to 16KB. Innodb_page_size can be set to 4K, 8K, or 16K at the beginning of the database, but once the database table is created, it cannot be changed later. Common page types are:

  • Index, data page (B+Tree Node);
  • Undo Page (undo log Page);
  • System Page;
  • Transaction System Page;
  • Insert Buffer Bitmap;
  • Insert Buffer Free List;
  • Uncompressed BLOB Page;
  • Compressed binary large object Page (Compressed BLOB Page).

Index, data page structure

Nodes in the InnoDB storage engine B+ tree. InnoDB’s data page consists of the following 7 sections:

  • File Header: 38 bytes, used to record the Header information of a page. It consists of eight parts.
The name of the Space (bytes) instructions
FIL_PAGE_SPACE_OR_CHKSUM 4 Represents the checksum value for the page
FIL_PAGE_OFFSET 4 The offset value of a table space page, which is the position of the page on all pages of a table space
FIL_PAGE_PREV 4 The location of the previous page of the current page
FIL_PAGE_NEXT 4 The location of the next page of the current page
FIL_PAGE_LSN 8 The log sequence location LSN of the page that was last modified
FIL_PAGE_TYPE 2 The type of page
FIL_PAGE_FILE_FLUSH_LSN 8 “The file has been flushed to disk, at least up to this LSN” is only valid on the first page of the file
FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID 4 Which table space the page belongs to

There are several main types of pages:

The name of the hexadecimal instructions
FIL_PAGE_TYPE_ALLOCATED 0x000 This page is the latest assignment
FIL_PAGE_UNDO_LOG 0x002 Undo the log on page
FIL_PAGE_INODE 0x003 The index node
FIL_PAGE_IBUF_FREE_LIST 0x004 Insert buffer free list
FIL_PAGE_IBUF_BITMAP 0x005 Insert buffer bitmap
FIL_PAGE_TYPE_SYS 0x006 The system page
FIL_PAGE_TYPE_TRX_SYS 0x007 Transaction system data
FIL_PAGE_TYPE_FSP_HDR 0x008 File Space Header
FIL_PAGE_TYPE_XDES 0x009 Extended Description page
FIL_PAGE_TYPE_BLOB 0x00A Page BLOB
FIL_PAGE_INDEX 0x45BF B+ leaf node
  • Page Header: 56 bytes, used to record the status information of the data Page. It consists of 14 parts.
The name of the Space (bytes) instructions
PAGE_N_DIR_SLOTS 2 Represents the number of slots for a page directory
PAGE_HEAP_TOP 2 A pointer to the first record in the heap, which is stored on the page according to the heap form
PAGE_N_HEAP 2 Number of records in the heap. Bit 15 indicates the row record format
PAGE_FREE 2 Header pointer to a reusable space
PAGE_GARBAGE 2 Number of bytes of deleted records
PAGE_LAST_INSERT 2 The position where the record was last inserted
PAGE_DIRECTION 2 Last insertion direction:

PAGE_LEFT(0x01)

PAGE_RIGHT(0x02)

PAGE_SMAE_REC(0x03)

PAGE_SMAE_PAGE(0x04)

PAGE_NO_DIRECTION(0x05)
PAGE_N_DIRECTION 2 The number of consecutive inserts in one direction
PAGE_N_RECS 2 The number of records on the page
PAGE_MAX_TRX_ID 8 Change the maximum transaction ID for the current page, noting that this value is defined only in the secondary index
PAGE_LEVEL 2 The number of levels the current page is in the index tree
PAGE_INDEX_ID 8 Index ID, representing the index to which the page belongs
PAGE_BTR_SEG_LEAF 10 B+ The segment header of the data page that does not contain a leaf node
PAGE_BTR_SEG_TOP 10 B+ The segment header of the tree data page
  • Infimun and Supremum Records: Two virtual row Records that belong to each page of the InnoDB storage engine and are used to define record boundaries. Infimun: lower bound, representing a value less than any primary key value; Supremum: upper bound, indicating a value greater than any primary key value; These two values are created when the page is created and will not be deleted in any case.
  • User Records: Row Records actually stored by the User.
  • Free Space: The Free Space of a page.
  • Page Directory: Pointers to places where records are stored. These Pointers can be called Slots. InnoDB does not have a Slot for every record, but a sparse directory, meaning that a Slot can contain multiple records. In Slots, records are stored in order of index key value, so that the corresponding Slot can be quickly found using a binary lookup tree. When searching for data, the user first finds the corresponding Page through the B+ tree, and then loads the Page into the memory. Then, the user performs binary search through the Page Directory to find the corresponding Slot. Then, the user performs next_record through the record pointed to by the Slot to search for the corresponding record.

  • File Trailer: 8 bytes, in order to check whether the page has been written to disk completely (such as possible disk damage during writing, machine shutdown, etc.).
The name of the Space (bytes) instructions
FILE_PAGE_END_LSN 8 The first 4 bytes represent the checksum value of the page; The last four bytes are the same as the FIL_PAGE_LSN of the File Header. Compare these two values to the FIL_PAGE_SPACE_OR_CHKSUM and FIL_PAGE_LSN of the File Header to see if they are consistent to ensure page integrity.

line

In InnoDB storage engine, tables are organized and stored according to the order of primary keys, that is, primary keys are used as index keys. Therefore, tables in InnoDB storage engine are actually B+Tree. The nodes of the Tree store indexes and data pages, among which leaf nodes store data and non-leaf nodes store indexes. Each leaf node contains multiple rows of data.

InnoDB storage engine has strict requirements on the storage of data per page. It allows a maximum of 16KB/ 2-200 = 7992 rows of data, and requires a minimum of 2 rows of data.

Line record format

InnoDB storage engine provides four formats to store row records, including Redundant, Compact (Mysql5.1 default), Dynamic (Mysql5.7 default), and Compressedshow table status;View the row record format in the table.

Compact row record format

The figure above shows the Compact row record format, which consists of the following sections:

  • Variable-length field list: Stores the length of variable-length fields in reverse order of column order; If the length of the column is <255, it is represented by 1 byte; Otherwise, 2 bytes are used (note that the maximum length of VARCHAR is 65535);

  • NULL flag bit: indicates whether the line change data has empty value. If yes, it is represented by 1, which occupies 1 byte.

  • Record header information: fixed 5 bytes. The storage format is as follows:

    The name of the Space (bit) instructions
    (a) 1 The unknown
    (a) 1 The unknown
    delete_flag 1 Whether the line is deleted
    min_rec_flag 1 If equal to 1, the record is pre-defined as the smallest record
    n_owned 4 The number of records owned by the record
    heap_no 13 Index the sort record of this record in the heap
    record_type 3 Record type:

    000: common

    001 indicates the node pointer in the B+ tree

    010 said Infimum

    011 said Supremun

    1XX indicates reserved
    next_record 16 The relative position of the next record in a page
  • Actually store the data for each column;

Note that if the value of a column is NULL, no space is occupied (except for the NULL flag bit, which is 1).

Each row of data has two hidden columns in addition to user-defined ones: the transaction ID column (6 bytes) and the rollback pointer column (7 bytes). If the table does not define a primary key, a ROWID column is added to each row.

Redundant row record format

Redundant is the row recording format prior to Mysql5.0. The figure above shows the Redundant row record format, which consists of the following sections:

  • Field length offset list: stores the offset of all field lengths in reverse order. For example, 23 20 16 14 13 0c 06 field length offset list. 06 indicates the length of the first column = 6,0 c-06= the length of the second column, 13-0c= the length of the third column, and so on. If the length of the column is <255, it is represented by 1 byte; Otherwise, 2 bytes are used (note that the maximum length of VARCHAR is 65535);

  • Record header information: 6 bytes, stored in the following format:

    The name of the Space (bit) instructions
    (a) 1 The unknown
    (a) 1 The unknown
    delete_flag 1 Whether the line is deleted
    min_rec_flag 1 If equal to 1, the record is pre-defined as the smallest record
    n_owned 4 The number of records owned by the record
    heap_no 13 Index the sort record of this record in the heap
    n_fileds 10 The number of columns in the record
    1byte_offs_flag 1 Is the offset list 1 byte or 2 bytes
    next_record 16 The relative position of the next record in a page
  • Actually store the data for each column;

Mysql Varchar can store up to 65535 bytes (TEXT, BLOB, BLOB, BLOB, BLOB, BLOB, BLOB, BLOB, BLOB, BLOB, BLOB). Do they contradict each other?

It actually contradicts each other, but mysql introducesLine overflowTo solve this problem.In the Compact and Redundant row record formats, the data page retains only the first 768 bytes of prefix data followed by an offset pointer to the row overflow page.

② When will release overflow be hit?

A data page must contain at least 2 rows, so InnoDB storage engine will automatically store rows to overflow pages if the current page contains only one record.

③ Are TEXT or blobs always stored in overflow pages?

Follow step ② above if you want to save it in an overflow page.

④Varchar can store up to 65535 bytes. Does that mean the length of a single column?

No, it means that the total length of all VARCHAR columns in a row cannot exceed 65535 bytes.

⑤ Does n in Varchar(n) or Char(n) refer to the number of bytes or the length of a character?

Refers to the character length. Different character sets affect the column length, such as defining the maximum length of Varchar, lantin can define Varchar(65535), and UTF-8 (3 bytes) can only define Varchar(65535/3). For char(n), the different character set lengths are not fixed and InnoDB storage engine internally treats this as a variable-length character type.

Dynamic Line record format

InnoDB version 1.0X is introducing a new file format, formerly Compact and Redundant called Antelope file format, and a new Compressed and Dynamic file format called Baracuda.

The Dynamic row format provides the same storage features as the Compact row format, but full row overflow for long variable length column values (for VARCHAR, VARBINARY and BLOB and TEXT types). As shown in the figure, a large variable-length column value in a row of the data page holds only 20 bytes of Pointers, and the actual data is stored in the external overflow page.

Compressed record format

Compressed row format provides the same storage features and functions as Dynamic row format, but adds support for table and index data compression (using the Zlib algorithm compression).