This series of articles is a translation of the InnoDB series from Jeremy Cole’s Blog. This paper is the third of 16 articles. The basics of InnoDB Space File layout

Due to the limited amount of translation, in order to avoid misunderstanding to the reader, some proper nouns will be followed by [] marking the original text.

InnoDB space file layout basics

In On Learning InnoDB: A Journey to the Core, I described the project Innodb_Diagrams for documenting the internal structure of InnoDB. The diagrams used in this paper can be found in this project.

InnoDB’s data storage model is called “space”, which in MySQL context is often referred to as “table space”, and in InnoDB itself is sometimes referred to as “file space”. A space may consist of multiple physical files (such as IBDatA1, IBDatA2, etc.) at the operating system level, but they are logically treated as one file, and multiple physical files are treated as connected.

InnoDB allocates a 32-bit integer space ID[space ID] for each space, which is needed in many places to reference the space. InnoDB has a “system space” whose space ID is fixed to 0. The system space is used to keep all the special bookkeeping that InnoDB needs. In MySQL, in addition to system space, InnoDB currently only supports Spaces in the form of File per table, which creates an.ibd file for each table. Inside InnoDB, this.ibd file is actually a full-featured space that can contain multiple tables, but in the MySQL implementation, an.idb file contains only one table.

page

Each space is divided into a number of pages, typically each of which is 16KB in size (this page size may vary due to a compile-time change in the UNIV_PAGE_SIZE definition, or InnoDB compression). InnoDB allocates a 32-bit integer page number for each page in the space, often called an “offset,” which is really just the page’s offset from the beginning of the space (because it’s not necessarily the offset from the beginning of the file for a space consisting of multiple files). So page 0 is at offset 0, page 1 is at offset 16384, and so on. (Some readers may know that InnoDB has a data size limit of 64TB, which is actually a per-space size limit, mainly because the page number is a 32-bit integer, combined with the default page size of 16KB: 232 x 16KB = 64TB)

In this case, the size limit of InnoDB is also dependent on the page size. The maximum tablespace limit is 232 x page size. The official documentation also explains this: 14.23 InnoDB Limits

The basic layout of a page is as follows:

Each page has a 38-byte FIL Header and 8-byte FIL Trailer (FIL is short for “File”). The FILE Header contains a field that indicates the page type, and different types of pages have different structures for the rest of the page. The detailed structure of the FIL Header and FIL Trailer is as follows:

  • Page Type(2): The page type is stored inFIL HeaderIn the. Because pages are used for file space management, extent management, transaction systems, data dictionaries, rollback logs [undo log], binary large objects [blobs], and indexes (that is, table data). Pages for different purposes may have different structures, so you must have the page type field in order to parse page data.
  • Space ID(4): Indicates the ID of the space to which the current page belongs.
  • Offset(Page Number)(4): the page number. When the page is initialized, the page number is stored inFIL HeaderIn the. By comparing the page number read from this field to the page number based on file offsets, you can verify that the page is read correctly, and that the field being initialized also indicates that the page has been initialized.
  • Checksum(4) & Old-style Checksum(4)A:32The bit checksum is stored inFIL Header, an old format [older format[and is damaged]32The bit checksum is stored inFIL TrailerIn the. This old format checksum may be deprecated in the future, and its space may be reclaimed
  • Previous Page(4) & Next Page(4): Logical previous and next page Pointers for pages of the same type are stored inFIL HeaderIn the. These two Pointers are used to build a two-way linked list, which is used to bring the same levelINDEXType page linking, which is very effective when performing range scans, such as full index scans. But many types of pages don’t actually use either field.
  • LSN for last page modification(8) & Low 32 bits of LSN(4): Indicates the sequence number of the last modified log on the pageLSN, 64-bit integer) is stored inFIL HeaderAnd the sameLSNThe low32Bits are stored inFIL TrailerIn the.
  • Flush LSN(8)A:64bit"Flush LSN"Fields are stored inFIL HeaderIn fact, in the whole system there is only0Room no.0This field records the highest number of refreshes to the page in the entire system (all space)LSN. This field is well suited for other uses in the rest of the space.

The translator’s note:

  • The checksum isInnoDBA mechanism used to verify that a page read into memory from disk is corrupt. Calculate the checksum when writing the page. Calculate the checksum again when reading the page. If the checksum is the same as the one stored on the page, the page is not damaged.
  • InnoDB has three validation algorithms:crc32,innodb,none, through system variablesinnodb_checksum_algorithmTo specify the difference between them, refer to the official documentation:innodb_checksum_algorithm
  • When the check algorithm iscrc32andnoneWhen,FIL HeaderFIL TrailerThe values of the two checksum fields in theinnodbWhen the two checksum fields are different.

Space file

A spatial file is made up of many pages (up to 232) linked together. To improve management efficiency, these pages are divided into 1 MB blocks (or 64 consecutive pages for a 16KB page size) called “extences.” Many structures allocate pages within space in units of “extents” only.

InnoDB needs to use some bookkeeping information to keep track of all pages, sections, and the space itself, so there are some required “super-structures” in the space file:

The first page in the Space (page 0) is fixed as the FSP_HDR (short for File Space Header) page. The FSP_HDR page contains an FSP Header structure (the name can be confusing) that tracks space size along with free extentlists, fragmented extentlists, and full extentlists, etc. (Free space management will be discussed in more detail in a future article)

An FSP_HDR page has enough internal space to store only 256 extent-level (16,384 pages, 256 MB) bookkeeping information, so you need to use extra space every 16,384 pages to store XDES pages for logging extent-level bookkeeping information. The structure of an XDES page is the same as that of an FSP_HDR page. The only difference is that in an XDES page, the FSP Header structure is set to zero and not used. As the space file grows, these additional pages are automatically allocated.

The third page in the space (page 2) is an INODE page that stores lists related to file segments (a file segment can be understood as an array of groups of extents plus individually allocated “fragmented” pages). Each INODE page can store 85 INODE entries, and each index requires two INODE entries. (The contents of INODE entries and file segments are discussed in more detail in a future article.)

In addition to each FSP_HDR page and each XDES page, there are IBUF_BITMAP pages for insert buffer-related bookkeeping information, which is beyond the scope of this article.

Insert buffers, also known as Change buffers, are used to speed up changes to data, such as INSEERT, DELETE, and UPDATE. Official document: 14.5.2 Change Buffer

System space

In InnoDB, the system space (space ID 0) is special. It contains a number of pages allocated on a fixed page number and is used to store a large amount of information critical to InnoDB operations. System space is just like any other space in that its first three pages are allocated FSP_HDR, IBUF_BITMAP, and INODE pages. In addition, it has some other special pages:

Idbata1 is the name of the InnoDB system tablespace on disk

The following pages are allocated in the system tablespace:

  • 3, the type isSYS: Header and bookkeeping information related to insert buffering
  • 4, the type isINDEX: The root page of the index tree used to insert buffers
  • 5, the type isTRX_SYS: the storage andInnoDBTransaction Information about the operation of the system, such as the latest transactionID,MySQL Bin LogRelated information as wellDouble Write BufferThe location of the section
  • 6, the type isSYS: The page of the first rollback segment.InnoDBAdditional pages (or entire extents) are allocated as needed to store the rollback segment data
  • 7, the type isSYS: Header information associated with the data dictionary, including the root page number of the indexes that make up the data dictionary. This information enables you to find any other indexes because their root page numbers are stored in the data dictionary
  • 64127Page:Double Write BufferIn the first block (64One page, one section).Double Write BufferInnoDBPart of the crash recovery mechanism
  • 128191Page:Double Write BufferThe second block of

In addition to these special pages, other pages are assigned as needed to indexes, rollback segments, undo logs, and so on.

Separate tablespace files

InnoDB provides a File per Table feature that creates a file for each MySQL table created. As mentioned earlier, this actually creates one space per table, so a better name for this feature would be “space per table” rather than “File per table.” The.idb files created for each table have the typical spatial file structure:

After the necessary three initial pages, the next pages allocated in the space are the root pages for each index in the table, sorted in the order they were defined when the table was created (not taking into account the fast Index creation feature involved in adding indexes after the table was created). Page 3 is the root page of the cluster index, page 4 is the root page of the first secondary index, and so on.

Fast Index Creation is a new feature introduced in MySQL 5.5. It is designed to speed up the creation and deletion of secondary indexes. After creating a table, add or delete a secondary index without rebuilding the entire table. In MySQL 5.6, this feature became more general, and more kinds of ALERT TABLE operations can be performed without copying the entire TABLE or blocking DML operations, so in MySQL 5.6 this feature is called Online DDL. Related documents:

  1. Chapter 2. Fast Index Creation in the InnoDB Storage Engine
  2. Fast Index Creation
  3. 14.13 InnoDB and Online DDL

Since most of InnoDB’s bookkeeping structures are stored in system space, most pages allocated in separate table Spaces are indeet-type pages for storing table data.

What’s next

Next, we’ll look at free space management in InnoDB: extent descriptors, file segments (inodes), and a series of linked list structures.