This is the first day of my participation in Gwen Challenge

One, foreword

Data pages are associated with data pages through bidirectional linked lists, and data rows are preceded by one-way linked lists, as shown in the following figure:

The association between data rows, as shown below:

The head of each row has a datatype to indicate the current row:

  • 2: indicates the minimum line
  • 0: indicates a common data row
  • 3: indicates the maximum row


The core goal of page splitting is to ensure that the primary key in the next data page is greater than the primary key in the previous data page.

Inserting data into a table over and over again involves page splitting:

When a new data page is added, it actually moves the larger primary key value from the previous data page to the new data page, and then moves the smaller primary key value from the new data page to the previous data page, ensuring that the primary key value in the new data page must be larger than that in the previous data page.





Second, the index

Question 1: How is the primary key index designed, and how is the query based on the primary key index?

Suppose you want to searchid = 5How do you know which data page this data is on?

In the absence of an index, a full table scan is actually performed at the physical level, scanning each row within each data page in turn.

To design an index for a primary key, generate a primary key directory: the page number of each data page, along with the smallest primary key value in the data page, is put together to form an index directory. As shown in figure:

So when querying, first go to the primary key directory to find the corresponding data page, and then find the corresponding data row in the data page.


Question 2: How is the index usedB+The tree?

In the case of the previous question, the data gradually increased, and so did the index, thus forming the index page, as shown in the figure:

Tips: The subindex page contains the corresponding data page and can point to the corresponding data page.

In order to speed up the search efficiency, B+ tree is introduced, as shown in the figure below:

In a large B+ tree index data structure, the leaf nodes are the data pages themselves, so the B+ tree index can be called a clustered index.

In the InnoDb storage engine, adding, deleting, modifying and querying data is performed directly on the data pages in the clustered index.


Question 3: How do I index fields other than the primary key?

Index other fields, such as the name field.

When inserting data:

  1. Insert full data into the data page of the leaf node of the cluster index and maintain the cluster index
  2. Other fields are indexed separatelyB+Tree, and maintenance

For example, creating an index for the name field produces a B+ tree, as shown below:

The leaf node of the B+ tree is the data page, which contains only the primary key field and the name field.

When searching by the name field:

SELECT * FROM table WHERE name = 'yy';
Copy the code
  1. Start from the root node in the index B+ tree of the name field, and then go to the data page of the leaf node to find the primary key corresponding to name

    Tips: Find the primary key value here

  2. So if I get some other value for this row, then I need to go back to the table.

    Search the cluster index based on the primary key obtained in the previous step

The same goes for syndication indexes, such as name + age.


Q4: How do insert data maintain different indexesB+The tree?

If there are multiple indexes, the index trees need to be maintained.

That is, maintain the clustered index while maintaining the other index trees.


Problem 5: Joint index query principle and full value matching rule

Design system, generally is the design of joint index: in order to make the index number as few as possible, avoid disk occupation too much, add, delete, change performance is poor.

There is a table of student scores:

The basic attributes are as follows: Student, subject and score have primary key ID, and the joint index is as follows: STU, subject and Score

CREATE TABLE IF NOT EXISTS `stu_score` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'primary key',
`stu` VARCHAR(32) NOT NULL COMMENT 'students',
`subject` VARCHAR(32) NOT NULL COMMENT 'subjects',
`score` INT UNSIGNED NOT NULL COMMENT 'scores',
`create_time` DATETIME NOT NULL DEFAULT current_timestamp COMMENT 'Creation time',
`modify_time` DATETIME NOT NULL DEFAULT current_timestamp ON UPDATE current_timestamp COMMENT 'Update Time'.PRIMARY KEY (`id`),
INDEX `idx_stu_subject_score` (`stu`, `subject`, `score`)
) ENGINE=InnoDB DEFAULT CHARSET utf8mb4 COLLATE=utf8mb4_unicode_ci COMMENT 'Student Score Table';
Copy the code

The data are:

A primary key students subjects score
13 lala English 89
43 cc English 99
22 product English 79
111 Ha ha English 100

The index tree for the federated index, as shown below:

For full-equivalent queries, you can use this federated index:

SELECT id FROM stu_score WHERE stu = 'lesbian' AND subject = 'English' AND score = 100;
Copy the code

Common index usage rules:

  1. Equivalent matching:WHEREThe fields and the order of the fields are exactly the same as in the union index
  2. Leftmost column matches
  3. Left-most prefix matching rule
  4. Range lookup rule
  5. Equivalent matching + range matching rule


1. The leftmost column matches

The joint index is KEY(STu, subject, score)

You don’t have to look at the three fields in the WHERE statement, just the left-most part of the field.

For example, both can:

  • SELECT * FROM stu_score WHERE stu;
  • SELECT * FROM stu_score WHERE stu AND subject;
  • SELECT * FROM stu_score WHERE stu AND subject AND score;


2. Rules for matching left-most prefixes

For example, use the like syntax.

SELECT * FROM stu_score WHERE stu LIKE ‘pull %’;

SELECT * FROM stu_score WHERE stu LIKE ‘% pull ‘;


3. Scope lookup rules

If there is a range query, the index can only be used for a range query on the leftmost column in the federated index.

SELECT * FROM stu_score WHERE stu < ‘lalu’ AND stu > ‘lalu ‘;

SELECT * FROM stu_score WHERE stu < ‘LBD’ AND subject < ‘LBD ‘; SELECT * FROM stu_score WHERE stu <‘ LBD ‘AND subject <‘ LBD ‘;


4. Equivalent matching + range matching rules

SELECT * FROM stu_score WHERE stu = ‘Lola’ AND subject < ‘Lola’ AND score < 100 AND score > 60;

Stu can be used to locate exactly a batch of data in the index, which is arranged in subject order, so subject < ‘English’ will also be looked up based on the index.

But score < 100 cannot be indexed.


Question 6: How does a sort use an index

You’ll encounter filesort, which sorts files based on disk.

Create INDEX INDEX(xx1, xx2, xx3), ORDER BY xx1, xx2, xx3; ORDER BY xx1 DESC, xx2 DESC, xx3 DESC;


Q7: How can groups use indexes

Group by and order by are indexed in the same order as group by and order by.

In general, for fields after group by, it is best to start with the leftmost field in the union index.