This series of articles is a translation of the InnoDB series from Jeremy Cole’s Blog. This paper is the fourth of 16 articles. Page Management in InnoDB Space Files
Due to the limited amount of translation, in order to avoid misunderstanding to the reader, some proper nouns will be followed by [] marking the original text.
Page management in InnoDB space file
In On Learning InnoDB: A Journey to the Core, I described the project Innodb_Diagrams for documenting the internal structure of InnoDB. The diagrams used in this paper can be found in this project.
The basic structure of space and The basic structure of each page are described in The article “Basics of InnoDB Space File Layout”. Next, we will describe in detail The structure related to page management, section management, and free space management in InnoDB. And how it tracks the usage of pages assigned to different purposes.
Extents and extent descriptors
As described earlier, InnoDB pages are typically 16KB and grouped into 1MB blocks, each containing 64 contiguous pages, known as “extences”. InnoDB allocates FSP_HDR and XDES pages at fixed locations within the space to keep track of which extents are in use and which pages within each extent are in use. The structure of these pages is very simple:
They contain the generic FIL Header and FIL Trailer, an FSP Header (discussed later), 256 “extent descriptors,” and a lot of unused space.
An extent descriptor has the following structure:
The purpose of each field in the extent descriptor is as follows:
File Segment ID(8)
, the file sectionID
: If an extent belongs to a file segment, this field indicates the extent of the file segment to which the extent belongsID
List node for XDES list(12)
.XDES
List node of a list: A pointer to one extent up and the next extent in the extent descriptor listState(4)
, extents’ current state: Only four values are currently defined:FREE
.FREE_FRAG
.FULL_FRAG
These three states indicate that the extent belongs toFREE
List /FREE_FRAGE
List /FULL_FRAGE
List),FSEG
This state indicates that the extent belongs to a file segment, and the file segment ID is stored inXDES Entry
The structure of theFile segment ID
In the field)Page State Bitmap(16)
, page status bitmap: Allocated for each page in this sectiontwo
Bitmap (64
x2
=128
position16
Bytes). The first digit indicates whether the page is free; The second bit is reserved to indicate whether the page is clean [clean
] (clean means there is no dirty data that has not been flushed to disk), but this bit is currently not used and is always set to1
Other structures that reference extents are implemented using a combination of the page number of the FSP_HDR page (or XDES page) on which the extent descriptor is located and the byte offset within the page of the extent descriptor entry itself. For example, page 0 offset 150 refers to the first extent in the space and contains pages 0-63, while page 16384 offset 270 contains pages 16576-16639.
The translator’s note: Here’s a more detailed calculation for the second example. For page 16384 offset 270, first page 16384 is the first page in the 256th extent (XDES type), so this page records the extent descriptors in the 256-511 extent. Offset 270 refers to the fourth XDES entry on this page (38+112+40×3 = 270), so it describes the 259th section of information, i.e., pages 16576-16639.
Linked list base node and linked list node
Linked lists (InnoDB calls them “free lists”) are a very generic structure that links multiple related structures together. It consists of two complementary structures, “base list node” and “list node”, which together make up a good bidirectional list on disk. The structure of the “base node of the list” is as follows:
“List base nodes” are stored only once in some high-level structure, such as FSP headers. It contains the length of the list and Pointers to the first and last nodes in the list. The structure of a “linked list node” looks very similar, except that instead of storing Pointers to the first and last nodes, a “linked list node” stores Pointers to the previous and next nodes:
All Pointers contain a page number (which must be in the same space) and a byte offset from the page where the linked list node can be found. All Pointers point to the beginning of the list node structure (that is, N+0), not necessarily to the beginning of the structure to which the list node belongs. For example, in a linked list of extent descriptor entries, because the list node is offset by 8 in the XDES entry structure, the code reading the XDES entry must “know” that the XDES structure begins eight bytes before the offset of the list node, and start reading the structure from there. (It might be a good idea to make sure that list nodes are first in any structure, but they’re not.)
File space header and extent list
In addition to storing the extent descriptor entry itself, the FSP_HDR page (which is always page 0 in space) also stores the FSP Header structure, which contains a large number of linked lists, which is why I didn’t describe this structure earlier. The structure of the FSP Header is as follows:
Non-list-related fields in the FSP Header (out of order) :
Space ID(4)
: Indicates the ID of the current spaceHighest page number in the space (size)(4)
:size
Represents the highest valid page number and increases as the file grows. However, not all pages assigned with page numbers are initialized (some may be zero-filled) because space expansion is a multi-step processHighest page number initialized (free limit)(4)
.free limit
Is that allFIL header
The maximum number of pages in all pages that have been initialized, which involves storing the page number in the page itself.free limit
Is always less than or equal tosize
.Flags(4)
: Stores space-related tagsNext Unused Segment ID(8)
: The file segment that will be used in the next allocated file segmentID
. (This is an auto-incrementing integer.)Number of pages used in the FREE_FRAG list
: This field is stored for performance optimization purposes so that it can be computed quicklyFREE_FRAG
The sum of free pages for all extents in the list, without traversing all extents in the list and summing up the free pages available in each extent.
The list base nodes for the following list of extent descriptors are also stored in the FSP Header:
FREE_FRAG
Linked list: The extents in this list are all pages that are free and can be allocated in a “fragmented” manner, meaning that individual pages can be allocated according to different needs, rather than the entire extent. For example, each hasFSP_HDR
The page orXDES
Sections of the page will be placed inFREE_FRAG
List so that the remaining free pages in the extent can be allocated for other purposes.FULL_FRAG
List: withFREE_FRAG
Exactly the same, but the linked list holds extentswith no remaining pages available. whenFREE_FRAGE
When an extent in a linked list becomes full (all pages are used), the extent is removed fromFREE_FRAG
The list moves toFULL_FRAG
Linked list, if any pages are released, these sections are moved backFREE_FRAG
Linked lists, because they’re no longer full.FREE
Linked list: The extents in the list are all completely unused extents and can be used to form a whole by the entire extent.FREE
Extents can be assigned to a file segment (and placed in the appropriateINODE
List), or move toFREE_FRAG
The linked list is allocated according to the “fragmentation” mode of page use.
File segments and inodes
File segments and inodes are perhaps the most obscure areas of InnoDB terminology and documentation. InnoDB overloads the common file system term INODE, using it for INODE entries (single small structures) and INODE pages (pages containing many INODE entries). Naming confusion aside, an INODE entry in InnoDB describes only a single file segment (usually called an FSEG), which we’ll refer to as a “file segment INODE” for the rest of this article. The INODE page that contains this information has the following structure:
Each INODE page contains 85 file segment INODE entries (for a 16KB page), each of which is 192 bytes. In addition, each INODE page contains a linked list node that can be used to form the following two linked lists of INODE pages:
FREE_INODES
: At least one is freeINODE
The entry ofINODE
A linked list of pages.FULL_INODES
: No leisureINODE
The entry ofINODE
A linked list of pages. When using”file per table
“Type table space, the list in each table space will be empty unless the table has more than one42
Because each index consumes exactly two file segmentsINODE
Entries.
The base node of the INODE page list is stored in the FSP Header structure of the FSP_HDR page mentioned above.
A file segment INODE entry has the following structure:
The non-list fields of a file segment INODE entry are used for the following purposes:
File Segment ID
, the file sectionID
: The file segmentINODE
The file segment of the entry description (FSEG
)ID
. ifID
为0
, the entry is not used.Magic Number
, magic number: fixed storage97937874
As the file segmentINODE
The flag that the entry was properly initialized.Number of used pages in "NOT_FULL" list
.NOT_FULL
Number of pages already used in the list: with spaceFREE_FRAG
Linked list (saved inFSP Header
Similarly, this field stores isNOT_FULL
The number of pages that have been used in the list so that you can quickly calculate the number of free pages in the list without having to traverse and sum all the sections in the list.Fragement Array
, fragment page array: from spaceFREE_FRAG
List orFULL_FRAG
In the Fragments section of a linked list32
An array of page numbers for each individually allocated page. Once this array becomes full, only full extents can be allocated to file segments.
As the table grows, InnoDB first allocates individual pages in each file segment until the array of fragmented pages in the file segment becomes full, then changes to allocating one extent at a time, and finally allocating four at a time.
Each file segment INODE entry also contains the base node of the extent descriptor list:
FREE
Linked list: Contains the completely free extents allocated to this file segment.NOT_FULL
Linked list: Contains the extent assigned to this file segment and has at least one used page. When the last free page is used, the section is moved toFULL
Chain in the table.FULL
Linked list: Contains extences allocated to this file segment that have no free pages. If a page in one of the sections becomes free, the section moves toNOT_FULL
In the list.
If the last used page in an extent of the NOT_FULL list is freed, the extent can be moved to the FREE list of the file segment, but in fact the extent is directly moved back to the FREE list of the space.
How does the index use segments
While we haven’t covered the INDEX page yet, it’s time to take a look at one small aspect of it. The root page of each index contains Pointers to file segment INODE entries that describe the file segments used by the index. Each index uses one file segment for leaf pages and one file segment for non-leaf (internal) pages. This information is stored in the FSEG Header structure (which is a structure in the INDEX page) :
The Space ids stored there are a bit redundant — they will always be the same as the current Space. The Page Number and Offset point to a file segment INODE entry in the INODE Page. These two file segments will always exist, even though they may be completely empty.
For example, in a newly created table, the only page that exists is the root page, which is also a leaf node page but exists in an “internal” file segment (so that you don’t have to move it when you add data). The INODE list and fragment page array for the “leaf” file segment are both empty. The INODE list for the “internal” file segment is also empty, and the only allocated root page exists in the fragmented page array.
Put it all together
The following diagram links the entire multilevel structure of the index together:
The index root page points to two file segments, each of which has an array of shard pages (pointing to individual pages in the shard page list, up to 32), and a list of several full extents-linked together by a list node pointer in the extent descriptor. The extent descriptor is used to reference the extent and to track the free pages within the extent. Very simple!
What’s next
Next, we’ll look at the structure of INDEX pages, one of the most important page types in InnoDB, from the user’s perspective. Then we’ll look at how InnoDB builds indexes on a macro level.