Have feelings, have dry goods, wechat search [three prince Ao Bing] pay attention to this different programmer.

This article has been included in GitHub github.com/JavaFamily, there are a line of large factory interview complete test sites, information and my series of articles.

preface

The system went online last week, and the query was slow when a large amount of data was found. After investigation found that the sister in charge of this function did not specify the primary key when building the table.

Sister was severely criticized after crying: three slanted said don’t use the UUID master key, I will delete the UUID.

Me: fool won’t let you use the uUID primary key, not the primary key.

Watching my sister’s sad face, I relented a little. Do you know why they won’t let you use uUID?

When I asked her questions, she and her sisters around were vague and dismissed me by saying, “It will cause slow writing.”

I wasn’t angry at all. I knew it was time for me to pretend to be a force. I took my sisters’ little hands and sat around me and said:

This shows you don’t understand the storage structure of mysql, so I’m going to give you an idea of the storage structure of the database by inserting a single entry into the table.

The Tablespace/in Tablespace

If I want to become a writer and aspire to write a long work, I need to put words on paper. The first step is to buy two large rolls of uncut white paper from a paper mill. Accordingly, in the computer, all data also needs to be recorded on disk, tape, CD and other storage media for long-term storage.

These media are divided into files, which are the physical Spaces where data is stored.

Since I bought two rolls of paper, and either can store words, I had a hard time deciding which roll to put on when I began writing. It was painful for me as a choicer.

So I planned to hire a secretary to dictate to him what I wanted to write, and he would transfer it to a specific roll of paper. I didn’t care which roll I wrote it on.

Similarly, when a program manipulates a file, it first needs to specify a file path. But in a database, tables are development oriented, and storage devices are operation oriented. When developing and creating tables, it is difficult to determine which file a table corresponds to. O&m dynamically adds files to the database based on the actual situation.

The tight coupling of tables and files seriously restricts the convenience of database use, so it is logical to add a layer of table space between files and files, which connects tables up and files down; The developer only needs to manipulate tables in the table space, and the storage is automatically maintained by the Innodb storage engine based on the table space.

Table Spaces are the highest logical structure in the InnoDB storage engine, where all data is logically stored.

Table Spaces mainly include the following types:

  • The system table space stores the change buffer, Doublewrite Buffer, and metadata for all objects associated with InnoDB. Such as: table space and database information, table structure and field information, and so on. The. FRM files used to store table structure information have been removed from mysql8.0 and all metadata is stored in this system table space. The related core views in the system tablespace INFORMATION_SCHEMA library are as follows:

Select * from information_schema. innodb_TABLESPACE WHERE name=’world/user’; (space: tablespace ID, name: tablespace name)

Run the following command to query information about data files corresponding to the information_schema.files WHERE TABLespacE_NAME =’world/user’; (file_name: relative path of data file)

Select * from information_schema.innodb_tables WHERE name=’world/user’;

Select * from information_schema.innodb_indexes WHERE table_id=1269 and name=’primary’; (page_no: B+ tree root page no; Name =’primary’;

The system tablespace also has a corresponding Data file, which defaults (on Windows) to XXX \MySQL Server 8.0\Data\ibdata1. Only system tablespaces can specify multiple files. All other tablespaces can specify only one data file.

  • Independent tablespace Each table corresponds to an independent tablespace. Ini parameter innodb_file_per_table=1 to start the independent table space, otherwise, default to the system table space. 5.6.6 After this configuration is enabled by default, the default is independent tablespace.When a table is created, a tablespace corresponding to the table name is automatically created for the table and a tablespace file named table name. ibd is generated in the database directory. Create table user in database world

  • A common tablespace is a tablespace created manually by using create TABLESPACE tablespace name.

  • Temporary table Spaces store temporary tables and rollback segments corresponding to temporary table changes. The default temporary file is XXX \MySQL Server 8.0\Data\ibtmp1 (on Windows)

Area / / among clusters

Because a roll of original paper is too large, it may spread over the floor of my mansion for more than ten or even dozens of floors, which is very inconvenient to use. After all, my mansion of 9 square meters still needs to leave space to receive guests. The best way to do this is to cut these pages into A4 size data sheets.

Similarly, the size of a disk or file is very large and difficult to manage, so InnoDB divides files into equal-sized storage blocks, also known as pages.For a literary story, as long as you can find the next page by page number, so as to complete the story. Usually when we finish the first page, we will immediately move on to the second, but if the corresponding pages are scattered in the bedroom, bathroom, and living room, the reading experience will be greatly reduced. If the cost of these scattered pages can be adjusted, the convenience of reading can be greatly improved.

According to the locality principle, when the CPU uses data, it is highly likely to use the data that is logically adjacent in the next step. So to improve the performance of data reads, InnoDB stores logically adjacent data on as many physical pages as possible; To achieve this, Innodb introduces the concept of extents/clusters;

An extent/cluster is a segment of space that is physically consecutively allocated. An extent is divided into contiguous pages to store data of the same logical unit (index segment, data segment). An area/cluster consists of 64 consecutive pages by default, and the default size of each Page is 16K.

In fact, InnoDB divides files into contiguous sections/clusters and then within sections/clusters into contiguous pages. In general, a file is a series of contiguous pages on the micro level and a series of contiguous sections/clusters on the macro level. Know a number of pages and the page size this page can be calculated in the location of the disk, likewise know that a page number can calculate an area/cluster and the size of the page in the area/cluster is which one area/cluster (itself without number, but assumes that the first area/cluster number is 0, can know it logically be p1).

If you think of pages as pages in a real book, then extent can be thought of as a real book.

The purpose of extents is to allocate contiguous space for logical units and also to manage the state of storage space within the extents (e.g., which pages are full within the extents, which are unused, and which contain fragments). Specifically, the space state of the area itself can be specified through different linked lists of areas/clusters, and the space state of pages in the area can be specified through XDES_BITMAP in XDES Entry.

# # # / Segment

After Liu wrote the first three-body Problem book, he did not update it, but the content was so wonderful that Obama sent emails and urged him to update it through diplomatic means. To avoid damaging sino-US relations, Liu followed suit and wrote two more books in succession.

The three books with logically coherent stories are collectively called the three-body Problem, so we call these interrelated books a set. Similarly, InnoDB groups logically related extents/clusters into segments.

In order to make the same logical unit can physically have contiguous storage space, Innodb proposed the concept of area, but the smallest operation unit of IO is page, one IO cannot write a full area, and data can be erased (deleted) and rewritten, so it must record the area itself and the space state of the area: Which areas are full, which areas are unused, and which areas have debris space.

Innodb records the management information of storage space status with correlation areas as segment entities, and the sum of areas managed by segment entities is called segments. The purpose of a segment is to manage extents usage and provide space storage status when allocating space for data.

A paragraph can be similarly seen as a set of books in reality.

Innodb data is organized in a B+ tree, with leaf nodes storing keywords and rows, and non-leaf nodes storing keywords (index data) and page numbers. Index data and business row data have different data structures, so they are stored separately. The index data of non-leaf nodes is stored in one segment, and the business data of leaf nodes is stored in another segment, and they are also stored in extents and pages of different structures.

The logical structure of the data is as follows:

The physical storage structure is as follows:Segments are the logical part of a table space to store data that has the same meaning, such as non-leaf nodes in a B+ pair or leaf nodes in a B+ tree. Common segments are data segment, index segment, rollback segment, and so on.

For each index created, two segments are created: a data segment (the leaf node corresponding to the B+ tree) and an index segment (the non-leaf node). For clustered indexes (typically primary key indexes) data segments store index keys and business rows (all fields); For a non-clustered index, the data segment stores the index key and primary key. If the query is performed through a non-clustered index, the primary key must be found through the B+ tree and the specific row can be queried from the clustered index through the primary key. This is called a back table. Image below: Secondary index (non-clustered index) on left, primary key index (clustered index) on right

Table data is stored by clustered index organization, that is, B+ trees created by primary key indexes. Therefore, you should specify a primary key when creating a table. If no primary key is specified and no unique index is created, the table defaults to creating an increment of the hidden field row_ID as the key field in the clustered index B+ tree. Because it is a hidden field, this field can only be used for queries back to the table.

Page/Page

As mentioned above, a page, like the pages of a book in real life, is the smallest unit of IO operations in InnoDB. Pages in InnoDB are similar to pages in a real book.

The default page size is 16KB; The innodb_page_size parameter can be specified. The options are 4KB, 8KB, 16KB, 32KB, or 64KB. If the page size is 4 KB, 8 KB, or 16KB, the number of pages corresponding to an extent changes synchronously to ensure that the size of an extent(area/cluster) remains 1 MB. When the page size is 32KB or 64KB, the number of pages in an extent area remains the same, and the page size is 2 MB or 4 MB.

Each page has a corresponding number starting from 0, which is called the page number. Because the data files of the table space are divided into equal-sized pages, knowing the page number and based on the initial location of the file, you can calculate the exact location of the page on disk.

Similarly, a table corresponds to a clustered index, and the clustered index metadata specifies the root page number, so Innodb engine can calculate the exact address of index B+ tree root page based on the page number and page size, and thus operate on the entire table data.

The page is used to store service-related data, but extents and segments are also required to manage memory allocation. Innodb stores the following categories according to the content of the page:

  • FSP HDR page: A table space may correspond to multiple data files, each with its own number. A table space is the topmost structure in a database. Metadata in a system table space can be used to query meta information such as files in the corresponding table space, but the segment and extent information corresponding to the current table space cannot be queried. Therefore, storage status of pages in the table space cannot be obtained.

    To provide an external access point to the physical storage of a table space, specify page 0 of file 0 in the table space to store table space information and Pointers to the segment linked list currently owned by the table space.

Any page consists of a header, a body, and a footer.

A page defaults to 16KB, and the amount of pointer data corresponding to segments and extents is not very large, so only part of the header information can be maintained. Most of the remaining space is used to store information about some extents owned by the current tablespace.

Page header: indicates the current page number, type, and owning table space. Footer: used for data verification. Body: This is the main part of the page where data is stored.

The page body is divided into table space header area and business data area. FSP HEADER: (1) : table space information: corresponding space ID, total page number of the table space, etc. (2) : segment information: linked list pointer of the page where the segment entity is filled with data, and linked list pointer of the page where the segment entity is not filled with data (it points to the page where the segment entity is located, and 85 segment entities are stored on one page). (3) : Shard/cluster information: free shard/cluster (XDES entity itself, not the page where XEDS entity resides) list pointer, incomplete shard list pointer, and fully written shard list. This extent/cluster information does not belong to any segment, but belongs to the tablespace and is allocated to the segment when it requests space next time.

In theory, an extent/cluster can be completely allocated to a segment, but some extents/clusters are directly allocated to the tablespace after being created and used as shards. To reduce waste, only some of the pages in these extents are allocated to a specified segment.

For example, when you announce that you are going to write a great book and ask your secretary for 500 pages, the secretary is likely to see through it all and respond with yes and no, but will only give you three pages because she thinks you are likely to be unable to produce six words in seven days. Similarly, when InnoDB allocates space to a newly created segment, it does not allocate an extent/cluster at the beginning. Instead, it allocates 32 pages from the fragment first. Only after the 32 pages are used up, InnoDB considers the segment as a big data segment and starts to formally allocate a complete extent/cluster to it.

Data section:

The FSP HEADER points to the segment and shard lists, but these are only Pointers to the list. The real extent information nodes are stored in the data section of the current page. An area/cluster information entity is called an XDES Entry(eXtent DEScript). A page stores 256 XDES entries.

XDES Entry, as shown above, contains the segment ID (if assigned to a segment) and the pointer to the next node in the shard list. It does not contain page information, because the extents/clusters have corresponding physical space, and the pages in the space are owned pages, so there is no need to specify in the entry.

Careful friends may find that although XDES Entry is a description area/cluster, it does not specify the number or address of the area/cluster. Then, which area/cluster does it correspond to in the physical space?

Extents/clusters themselves are not numbered, but like pages, extents/clusters are allocated consecutively from the first byte of the file. At the same time, the first page of the first area of every 256 areas/clusters is the index page of the 256 areas/clusters, namely XDES Page.

XDES page has page No, so you can calculate the address of this XDES page, that is, the address of all extents/clusters in this page. The next 255 areas/clusters have a corresponding XDES Entry stored in the XDES Page. The offset of these XDES entries in this page is the offset of the next 255 areas/clusters. You can calculate the physical location of an XDES Entry based on the location and offset of all areas/clusters in the current XDES page.

The FSP HDR page is like the cover page of a table space and is the entry page to the entire table space.

  • XDES page: An XDES page is an eXtent DEScript area/cluster description page used to store area/cluster information entities, that is, XDES Entry pages. It is identical except for the FSP HEADER in the FSP page. In essence, the home page is also an XDES page, except that it is the first page of the entire table space, so it also records table space information on the side.

XDES Entry: a logical block that stores information about an area.

An XDES page can only store 256 entries corresponding to 256 extents. Therefore, an XDEX page is logically required to store the information of the next series of 256 extents every 256 extents.

  • INODE pages: Like entries corresponding to extents/clusters, a table space only points to a linked list of page segments (not segment entities) in various states, but does not store segment information itself. An inode page is a page that stores inode entries describing segments.

By default, an inode page stores 85 segment entities, each of which points to a different state of extents/clusters for the segment: unused extents/clusters, full extents/clusters, and not full extents/clusters.

  • The pages above the Index page store physical space usage and are used to manage areas/clusters and segments themselves. The Index page is used for final storage of business data. Table data in InnoDB is organized by clustered indexes, while leaf nodes are stored in one segment and non-leaf nodes are stored in another segment, but are eventually stored in pages of type Index.

The detailed items on the INDEX page are shown as follows:

The storage structure of index page is shown as follows:The business data in a page is a logically ordered one-way linked list. There are two virtual rows in a page, which represent the smallest row and the largest row in the entire page, i.e. the first and last row in the list, and are used to define the scope of the list.

In addition, for index segments, there is approximately 16250B of user data on a page. A row contains a 4-byte int key, a 6-byte page number to the leaf node, and a 6-byte row header for a total of about 16 bytes. Then a rough calculation of one page can store 16250/16 for about 1000 entries. To optimize the query, every 4-8 rows of data is stored in a 2-byte space called slot, which together form an array called Page Directory.

The last slot stores the first row of infimum, the penultimate slot stores row4, and the first slot stores the last row of supremum. The Page Directory array is then an ordered array that can be quickly located using a binary lookup algorithm and then iterated through the block to find the data that ultimately matches the requirements.

Note: Since there is free space between the user row and the end of the page, and the number of slots is not fixed due to the number of lines in the page, the size of the page dirctory array is not fixed, so slot allocation is appending forward in reverse order.

The overall structure

That’s the structure and data information for the different objects in the table space, but let’s take a look at how the components are related as a whole.

  • At the micro level, table space files are physically separated into equal and contiguous pages.

  • Macroscopically, table space files are physically separated into contiguous extents/clusters multiplied by large amounts of water.

  • Page 0 of file 0 is called the FSP page, the home page, and can be assumed to be the cover page of the table space. It stores linked list Pointers to other components of the entire table space and is the entry page for the entire table space.

  • Logically, the FSP page points to different components through two lines. (1) : Refers to segment information through FSP_SEG_INODES_FULL(full segment page list) and FSP_SEG_INODES_FREE(not full segment page list). The segment entity points to the segment through FSEG_FREE(free list of extents/clusters), FSEG_FULL(full extents/clusters), and FSEG_NOT_FULL(not full extents/clusters). (2) : FSP_FREE (linked list of free extents/clusters), FSP_FREE_FRAG (unwritten fragments/clusters), and FSP_FULL_FRAG (written fragments/clusters) point to the extents/clusters that do not belong to any segment.

  • The first page of the first zone/cluster of each 256 zones/clusters stores the management information of the 256 zones/clusters. Page 0 is called the FSP page because of its uniqueness, and the other pages are called XDES pages. Using this page number and the offset of the Entry position stored in it, you can easily calculate the location of the 256 extents on disk. Therefore, even if XDES Entry does not record the number or address of the area/cluster, you can know which area/cluster each Entry manages.

  • When the index page to insert a data, if this page is full, you will need to apply to the area where this page/cluster for space, if the area/cluster is full, then to the section of the application, if the section is full, also can apply to the table space, table space will be through the operating system apply to the disk 3 area/cluster, and join the FSP FSP_FREE chain in the table. It is then allocated one by one and stored in its corresponding linked list.

Line/Row

All of the objects described above are intended to allocate a piece of physical storage space for business data, which can finally be recorded on a specified page. Innodb stores data in rows. Here’s a quick look at the Compact format.

Each record contains a series of headers that describe the storage state of the current record as shown in figure. However, in addition to the header information, the data stored varies depending on the node where the record is located.

  • Clustered leaf nodes, which store business rows in the table, contain transaction IDS, rollback segment Pointers, and a hidden ROW_ID if no primary key or unique index is specified, in addition to the row data itself.

  • Non-leaf nodes are searched against a B+ tree, so the minimum recorded value of the child node and the page number of the child node are recorded.

Relationship between B+ tree nodes and pages

Innodb Page is just a physical storage space, equivalent to a page in a book, just a carrier of data. B+ tree nodes are logical structures of data, and theoretically they have no necessary relationship. You can store a complete B+ tree within a page, or store a complete B+ tree with multiple page pages, or even map page pages to nodes in the B+ tree.

In fact, in Order to achieve simplicity in Innodb, B+ tree nodes and pages are one-to-one correspondence, the following is a simple extension process.

Suppose there is a clustered index B+ tree that starts like this:

Insert rows 16, 17, and 18 into the B+ tree as follows (green) :

Insert another 19 rows into the B+ tree and the original space expands as follows (blue) :

If the clustered index uses an incremented primary key, the data is appended to each page, and if the page is full, you simply allocate a page to continue appending.

If the clustered index uses unordered columns such as Uuid, since B+ is a logically ordered set, inserting data into the B+ tree is likely to be inserted into an already full page, causing the original page to split. It moves first like inserting data into an array to make room for the new data. Therefore, it is recommended to use ordered columns for clustered indexes.

How to store a piece of data step by step

After a lot of trouble, you can finally insert a single piece of data from start to finish, and see how InnoDB stores data into a file step by step. The younger sisters are estimated to have listened to, think about all happy, I can be really a small clever ghost.

Stretch, sweet look at the sisters.

Hey, what about people? Am I crossing into equal space?

Well, good beginning, good end, I will speak to yourself, not others, I also don’t believe that not yourself.

  • Create the table user in database World
 CREATE TABLE user (
    id int(11) NOT NULL AUTO_INCREMENT,
    name varchar(10) DEFAULT NULL,
    age int(11) DEFAULT NULL,
    gender smallint(6) DEFAULT NULL,
    create_time date DEFAULT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
Copy the code

Innodb stores table structure information into tables and columns of the Information_SCHEMA library in the system table Spaces

  • Create an independent tablespace world/user and the corresponding data file world/user.ibd and update the tablespace information_schema.innodb_tablespaces to information_schema.innodb_tablespaces

Synchronizes file information about the tablespace to information_schema.files

Specify page 0 of file world/user.ibd as the cover page of the tablespace.

  • If the specified primary key or unique index is used, an aggregated index is created using the specified column; otherwise, an aggregated index is created using the hidden column ROW_ID and stored in information_schema.innodb_INDEXES

Create two segments for the index: index segment (non-leaf node) and data segment (leaf node), and store the segment information in the segment linked list on the cover page of the table space.

Create the first Page for the index, Root Page, and record the segment information in the Root Page segment list to manage the segment information of the B+ tree. Also log Root PageNo to information_schema.innodb_Indexes, as shown in the figure above. To associate the logical table with physical storage from the Page, the Root Page acts as the cover of the index.

  • Insert data Insert a data entry into the table as follows
Insert into world.user(name,age,gender,create_time) values(' xiaoxiao ',18,1,now())Copy the code

Extract database and table names from SQL and retrieve table ids from information_schema.innodb_tables

The Root Page No of the clustered index at information_schema. innodb_INDEXES is 4.

Run Root Page No 4 to calculate the physical address of the Root Page. According to the segment information specified in Root Page, index data is inserted into Root Page, data rows are inserted into the corresponding Page of the data segment, and the two types of pages are associated.

If the space of a page is insufficient, the system calculates the area/cluster where the current page resides and applies for space for the area/cluster. The area/cluster searches for idle pages according to the bitmap in XDES Entry and allocates space. If the partition/cluster does not have free space, the system applies for required space from the preceding segment, tablespace, and OS level by level.

The requested tablespaces will be stored in the corresponding linked list (for example, the requested tablespaces will be stored in the corresponding FSP_FREE linked list).

When a Page is allocated or extended, to ensure that it can be found by innodb_Indexes’ Root Page No, the Root Page physical space and the Root node corresponding to the B+ tree remain the same, that is, the Page number is always the 4 space.

In order to keep the Root Page unchanged, InnoDB swaps the data of the new Root node to the original Root Page by copying the data of the new Root node to the original Root Page. In this way, the Root Page is guaranteed to remain unchanged forever. That is, ensure that the table is never disconnected from the physical space.

conclusion

  • A table space is a logical structure in a database that decouples the association of tables, indexes, and so on with files.
  • A segment is also a logical structure that allows data with the same specific logical meaning and the same storage structure to be grouped together for easy management.
  • Extents are physical storage structures that correspond to real physical space on a large disk. It divides the file by the same size starting with the first byte and logically concatenates the areas through XDES Entry. You can calculate the relationship between an XDES Entry and the physical space managed by the XDES Entry based on the page where the XDES Entry resides and the offset within the page.
  • A page is the smallest unit of physical storage IO operations. It is equally sized from the first byte of the file. The table organizes data by index. The clustered index metadata stores Root Page No for the table pair. Pages are numbered so that they can be associated with physical space.

Segments and extents are used to manage the storage status of the space and allocate space services for pages. Real queries only need to operate data on the entire physical space of the table through the association relationship between nodes at all levels in the Page No and B+ tree.

  • Rows are the physical units that ultimately store business data. The default page is 16K, which can store approximately 1000 + rows of index data (non-leaf nodes), or 20 or more lines of business data (leaf nodes). The “binary search” algorithm of B+ tree is used to quickly locate data between pages, and Page Directory is used to divide multiple rows into a group corresponding to a slot in the ordered array of Page Directory. In this way, the “binary search” optimization can be performed within a Page.

To record the state of the row itself, innoDB adds additional record headers to a record. For leaf nodes, additional fields such as Row_ID (hidden primary key), TRx_ID (transaction ID), and rollback pointer are added.

I’m Aobing, the more you know, the more you don’t know, thank you for your talent: likes, favorites and comments, we’ll see you next time!


This article is constantly updated. You can search “Santaizi Aobing” on wechat and read it for the first time. Reply [Information] There are the interview materials and resume templates for first-line big factories prepared by me.