An overview,

  • Usually only know Innodb is a B+ tree, for its disk on the specific form of storage, has not been very deep understanding, today I read a blog, do a record

On page 2,

  • The size of a disk or a file is also very significant,innodbDividing files into equal-sized chunks of storage, also known aspage;

  • The default page size is 16K, and disk reads are 4K, which means at least four blocks should be read at a time.
  • On the other hand, because the concept of cache Page chache exists, so a read tends to pull some data before and after the memory

Three, area

  • CPU in the use of data, the next step will also use a large probability of logical adjacent data. So to improve the performance of data reads, InnoDB stores data that are logically adjacent to each other in physical pages as much as possible. To achieve this, Innodb introduces the concept of extents/clusters;

  • By default 64 consecutive pages are an extent exactly equal to 1M= 64 * 16K

Four, period

  • In InnoDB, the management information that records the storage space status of the related area is called segment entity. The sum of the areas managed by segment entity is called segment. The purpose of segment is to manage the usage of the area and provide the storage state of the space when allocating space for data.
  • InnoDB allocates two segments for each index, one for the non-leaf nodes of the B+ tree (the index segment) and the other for the leaf nodes (the data segment).

1, the summary

  • In short,
    • The page concept is used for minimal disk IO
    • The concept of a zone is used to make logically continuous data as physically continuous as possible
    • The concept of segments is used to manage the usage of areas,

5. Overall architecture

  • Index data and business row data have different data structures, so they are stored separately. The index data of non-leaf nodes is stored in one segment, and the business data of leaf nodes is stored in another segment. Correspondingly, they are also stored in areas and pages of different structures.

1. Logical structure

2. Physical structure

The overall architecture

6. Relevant details

page

  • Any page consists of a header, a body, and a header.

  • The default page size is 16kB and can be changed

    • Page = 4,6,16 KB; Area = 1 MB
    • Page = 32KB and 64KB time range = 2 or 4MB
  • By default, a page is 16KB, and the amount of pointer data for segments and extents is small, so only partial header information is needed to maintain it. Most of the remaining space is used to store partial partitioned entity information owned by the current tablespace.

When a new record is inserted into the InnoDB clustered Index, InnoDB reserves 1/16 of the page for future insertion or update of the index record.

  • For indexes that are written sequentially (either incrementing or declinating, whichever is the order), the index leaves can be up to 15/16 full.

  • With random index write behavior, the leaves are only 1/2 to 15/16 full. When a leaf is filled less than half full, or removed less than half full, Innodb will shorten the index tree in an attempt to free the leaf so that data can be written to it.

  • File Header

    • The File Header in a Page holds two Pointers, one to the previous Page and one to the next. The Header also contains the Page’s type information and the number that uniquely identifies the Page. From these two Pointers, you can imagine that the Page is linked to a two-way list structure.

  • User Record
  • Both the data and the index are stored in the User Records section of the Page.
    • User Records consist of one Record after another.
    • Within a Page, a single linked list is represented by two virtual records with fixed contents,
      • The string “Infimum” represents the beginning,
      • “Supremum” represents the end and does not itself store data.
    • When indexes are first created, InnoDB automatically sets them in the root page and never deletes them. The two Records that represent the beginning and end are stored in System Records, which are parallel to User Records.
  • InnoDB records exist in four different nodes, which are
    • 1 Non-leaf nodes of the primary key index tree
    • 2 Primary key Index tree leaf node
    • 3 Auxiliary keys Non-leaf nodes of the index tree
    • 4 Secondary keys Leaf node of the index tree.
  • The Record format of these four nodes is somewhat different, but they all store the Next pointer to the Next Record.
  • User records exist on a Page as a single linked list. As new data is inserted and old data is deleted, the physical order of the data becomes chaotic, but they remain in logical order (Next pointer). A linked list of available Spaces can be obtained by concatenating the next Record in the record header.

  • Note here that the B+ tree index does not find a specific row for a key value. B + tree indexOnly the page of the row being searched can be foundAnd then the database gets the result by reading the page into memory and looking it up in memory.
  • A single linked list is traversed from the “Infimum” node in the Page (this traversal is often optimized), and returns successful if the key is found. If the record reaches “supremum”, there is no suitable key in the current Page. In this case, skip to the Next Page using the Next Page pointer of the Page and continue searching from “Infimum”.

How does page-like data stay in order?

  • Innodb page records are already maintained by a linked list that increases from the head of the chain to the end of the chain (common method 2) without binary search.

  • The records pointed to by slots are still in order from right to left. We use rec[S2] to represent the records pointed to by S2, so rec[S2]>rec[S1]. In addition, the record between two slots is called a slot branch chain in this paper. The part circled by the dotted line in Figure 3 represents the slot branch chain to which S2 points. Innodb maintains Slot branch chain heights of 4-8, if the height of one branch chain exceeds or falls short, it will result in the corresponding branch chain split and merge operation.

How to query?

If you are querying record R1 in the page (assume that the index key is unique for simplicity). First, locate Slot number X through binary search, and satisfy

rec[X-1]< r1 <= rec[X]

Then record R1 either does not exist or is in the Slot branch chain X. Then it’s a matter of traversing the branch chain to find the real record. But the search of branch chain can only be traversed one by one, can not use binary search.

How do I insert?

Firstly, the inserted Slot branch chain and insertion position are determined by means of query, and the space is obtained in the free space linked list or unallocated space and the record content is written. The height of the Slot branch chain is increased by 1, and the relationship of the original linked list is maintained.

After the record is inserted, if the height of the Slot branch chain exceeds 8, then the branch chain is split into two sub-chains and one more Slot is applied at the same time (translation of this Slot and the space behind it).

In terms of access efficiency, binary search is not complete, so binary search + sequential traversal is needed

area

  • Extents/clusters themselves are not numbered, but like pages, extents/clusters are allocated consecutively, starting with the first byte of the file. At the same time, the first page of the first partition of every 256 extents/clusters is the index page of the 256 extents/clusters, that is, the XDES page.

  • In order to ensure the continuity of pages in the extones, the InnoDB storage engine applies 4 to 5 extones at a time from the disk during the expansion.

reference

  • Juejin. Cn/post / 691926…