Introduction
Indexing is one of the harder parts of MySQL, but it is also an essential basic skill for programmers. In everyday project development, indexes are an important tool for SQL optimization. In job interviews, they are a topic interviewers often use to assess a candidate's grasp of database performance tuning. A thorough understanding of how indexes work, and the ability to apply it to real queries, is therefore something every programmer needs. This article explains MySQL indexes in terms of how they work and how they should be designed. After reading it, you should understand MySQL index lookups well enough to hold your own even with an Alibaba interviewer. Are you ready? Here we go.
Index principles
Before designing and optimizing indexes, let's take a closer look at how indexes work, because all design and optimization must rest on a thorough understanding of the principles.
Many people know that, for the same table and the same data, a SQL query that uses an index can perform very differently from one that does not. Why is there such a gap? Simply put, if the business data is a dictionary, then the index is the dictionary's table of contents. If I ask you to look up a word without using the table of contents, you can only turn the pages one by one. If you are unlucky, you may have to go all the way to the last page before you find the word. This is the notorious full table scan. With the table of contents, however, you can quickly locate the page that holds the word and then find the word itself. The power of an index is that it makes data queries more efficient. Now that we have an intuitive picture of indexes, let's look at the details.
In MySQL's InnoDB storage engine, an index is structured as a B+ tree, so let's go step by step through how an index on disk grows into a B+ tree.
1. Data pages
In everyday project development, most of our business data is stored in relational databases, and the data in every table ultimately lives on the server's hard disk. Have you ever wondered how it is stored there? The tables we work with in MySQL every day are logical tables, meant for human understanding. On disk, the data is actually stored as data pages. The data page is the basic unit of exchange between disk and memory. MySQL's InnoDB storage engine does not manipulate data pages on disk directly; it works with them through the buffer pool. The structure of the data page is shown below:
Adjacent data pages reference each other through a doubly linked list. As shown in the figure below, the orange-red parts are the data pages, and the small boxes in the middle can be understood as the individual rows. An InnoDB data page is 16KB by default. InnoDB uniquely identifies a data page by its page number, so each data page has its own page number. As the figure above shows, each data page has a header (in InnoDB's page layout this is the File Header) that stores the page number of the current page, the page number of the next page, and the page number of the previous page.
Adjacent data pages refer to each other through these pointers, and each pointer records the page number of the neighboring page. Each data page stores a contiguous range of data. The record header of each row stores the address offset of the next row's data, so inside a data page the rows form a singly linked list. This list is ordered by primary key id, from smallest to largest.
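The layout described above can be sketched in a few lines of Python. This is only an illustrative model, not InnoDB's on-disk format: the class names Row and DataPage and the helpers build_page, link_pages, and scan_ids are hypothetical, and Python object references stand in for page numbers and record offsets.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(eq=False)
class Row:
    id: int
    payload: str
    # Stands in for the "offset of the next row" kept in the record header.
    next_row: Optional["Row"] = None


@dataclass(eq=False)
class DataPage:
    page_no: int
    first_row: Optional[Row] = None
    prev_page: Optional["DataPage"] = None  # previous page in the doubly linked list
    next_page: Optional["DataPage"] = None  # next page in the doubly linked list

    def rows(self) -> Iterator[Row]:
        """Walk the singly linked list of rows inside this page."""
        row = self.first_row
        while row is not None:
            yield row
            row = row.next_row


def build_page(page_no, items):
    """Build a page whose rows are chained in ascending primary-key order."""
    page = DataPage(page_no)
    tail = None
    for row_id, payload in sorted(items):
        row = Row(row_id, payload)
        if tail is None:
            page.first_row = row
        else:
            tail.next_row = row
        tail = row
    return page


def link_pages(pages):
    """Connect neighboring pages in both directions."""
    for left, right in zip(pages, pages[1:]):
        left.next_page = right
        right.prev_page = left


def scan_ids(first_page):
    """Follow next_page pointers and collect every row id in order."""
    ids = []
    page = first_page
    while page is not None:
        ids.extend(row.id for row in page.rows())
        page = page.next_page
    return ids
```

Walking the page list from the first page visits every row in primary key order, which is the property the later sections rely on.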
As you can see from the data page structure above, each time a row is inserted the User Records area grows and the Free Space area shrinks by the same amount. When the Free Space is used up, a page split occurs and a new data page is created. One thing to note: if the table uses an auto-increment primary key, rows are guaranteed to be laid out in ascending id order. If we assign the primary key values ourselves and they are not increasing, a later row may have a smaller primary key than an earlier one; in that case, when a page splits, MySQL rearranges the rows by primary key. You may wonder why the rows must be sorted by primary key at all. The reason has to do with how data is queried later: keeping the rows of a data page in primary key order is the foundation that lets the index work. The general process is shown in the figure below:
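To make the split behavior concrete, here is a minimal sketch under simplifying assumptions: each page is just a sorted Python list of primary keys, PAGE_CAPACITY is a made-up limit standing in for the 16KB page size, and an overfull page is split in half. The real InnoDB split logic is more involved; the function name insert is hypothetical.

```python
import bisect

PAGE_CAPACITY = 4  # hypothetical limit; a real page holds whatever fits in 16KB


def insert(pages, key):
    """Insert a primary key into a list of pages, splitting a page when it overflows.

    `pages` is a list of sorted lists; pages themselves are kept in key order.
    """
    if not pages:
        pages.append([key])
        return
    # Choose the last page whose smallest key is <= key (or the first page).
    min_keys = [page[0] for page in pages]
    idx = max(bisect.bisect_right(min_keys, key) - 1, 0)
    page = pages[idx]
    bisect.insort(page, key)  # keep rows ordered by primary key inside the page
    if len(page) > PAGE_CAPACITY:
        mid = len(page) // 2
        # Page split: the upper half of the rows moves to a new page.
        pages[idx:idx + 1] = [page[:mid], page[mid:]]
```

Inserting 10, 20, 30, 40 and then 25 fills the first page, places 25 in its sorted position, and splits the page into [10, 20] and [25, 30, 40], so every key on the first page is still smaller than every key on the second.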
2. Page directory
As the page structure above shows, each data page has its own Page Directory, whose job is to locate rows within the page. The rows in a data page are divided into groups, and each slot in the page directory corresponds to one group. To query a row, MySQL uses the id to find the right slot, follows the slot to its group of rows in the data page, and then traverses the rows in that group until it finds the one it is looking for.
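The slot lookup can be sketched as follows. This is a simplified model that assumes each slot records the largest id in its group and that the slots are searched with binary search; GROUP_SIZE and the function names build_directory and find_in_page are hypothetical and chosen only for illustration.

```python
import bisect

GROUP_SIZE = 4  # hypothetical group size, chosen only for the example


def build_directory(sorted_ids, group_size=GROUP_SIZE):
    """Split the page's sorted row ids into groups; one slot per group.

    Each slot stores the largest id in its group.
    """
    groups = [sorted_ids[i:i + group_size]
              for i in range(0, len(sorted_ids), group_size)]
    slots = [group[-1] for group in groups]
    return groups, slots


def find_in_page(groups, slots, target_id):
    """Return (slot index, position in group) for target_id, or None if absent."""
    # Binary search: first slot whose largest id is >= target_id.
    slot = bisect.bisect_left(slots, target_id)
    if slot == len(slots):
        return None  # target is larger than every id on the page
    # Only the rows of one small group are traversed.
    for position, row_id in enumerate(groups[slot]):
        if row_id == target_id:
            return slot, position
    return None
```

The point of the directory is that only one small group is scanned row by row, instead of the whole page.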
3. Analysis of index principles
(1) Index basics
With the basics of data pages from the previous two sections in place, it is easier to understand how indexes work. Without an index, a query performs a full table scan: it walks through every row in a data page, then moves on to the next page, and so on until it finds the rows that match the condition. Query efficiency is therefore very low. How can it be improved? Could there be a primary key directory, like the table of contents of a dictionary, that tells us which data page to look in? The answer is yes, and that is exactly what MySQL does: it optimizes queries through a primary key directory. Each entry in the directory contains two important elements: the smallest primary key in a data page and the page number of that data page. With it, data can be located easily through the primary key directory.
For example, to query the row with primary key id = 5, you first look in the primary key directory. Because 5 is greater than the smallest primary key of data page 1 (id = 1) but less than the smallest primary key of the next data page (id = 8), we can conclude that the row is on data page 1.
In practice, a MySQL table has many data pages, so the primary key directory has many entries as well. MySQL therefore uses binary search over the directory to locate the data page, and then finds the row inside that page.
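The lookup just described can be sketched with Python's bisect module. The directory contents are hypothetical apart from the ids 1 and 8 taken from the example above, and locate_page is an illustrative name, not a MySQL function.

```python
import bisect

# Each entry: (smallest primary key on the page, page number).
# The entries for pages 1 and 2 follow the example in the text; page 3 is made up.
directory = [(1, 1), (8, 2), (15, 3)]


def locate_page(directory, target_id):
    """Binary-search the primary key directory for the page that may hold target_id."""
    min_ids = [min_id for min_id, _ in directory]
    # Index of the last entry whose smallest key is <= target_id.
    idx = bisect.bisect_right(min_ids, target_id) - 1
    if idx < 0:
        return None  # target_id is smaller than every key in the table
    return directory[idx][1]
```

For id = 5 the search lands on the entry (1, 1), so the row is looked up on data page 1; for id = 8 it lands on (8, 2).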
(2) Index pages
Internet companies are growing quickly, and their business volumes are huge, so the amount of data in their databases is very large as well. Tables with millions or tens of millions of rows are common, and with the primary key directory described above, a very large number of primary keys and data page numbers would need to be stored. Even with binary search, querying such a directory is relatively inefficient.
MySQL actually stores these index entries in index pages. When the amount of data is large, the index is large too, so MySQL uses dedicated index pages to hold the index data. On top of these index pages, another layer of index pages stores primary keys and index page numbers, which is used to locate the right index page. This gives the structure below, where each id shown is the smallest id in the page it points to.
As an index page accumulates more and more entries, it splits as well, so the index pages themselves form multiple levels. The upper-level index pages, the lower-level index pages, and the data pages together form a B+ tree. The following figure shows the B+ tree structure of an index, which is far more efficient to search than a full table scan. The data itself is stored in the leaf nodes of the B+ tree. The figure below shows a primary key index, also known as a clustered index. Its fundamental idea is divide and conquer: there is a lot of data, so it is broken up into many data pages; the data pages are organized by index pages; and those index pages are in turn indexed by higher-level index pages.
Let's look at how a row is found in the B+ tree. Suppose we need the row with id 3. The search starts at the top-level index page, index page 3. There, comparing the id against the entries tells us to continue in index page 1. In index page 1, the same comparison points to the data page with page number 1. Finally, the rows in that data page are traversed until the matching row is found.
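The descent through the tree can be sketched as below. This is a minimal model, not InnoDB's implementation: IndexPage, DataPage, and search are hypothetical names, the keys stored in each page are invented for the example, and a dict lookup stands in for traversing the rows inside a data page. The page numbers are chosen to match the walk-through above (index page 3, then index page 1, then data page 1).

```python
import bisect
from dataclasses import dataclass


@dataclass
class DataPage:
    page_no: int
    rows: dict  # primary key id -> row data (leaf level holds the actual rows)


@dataclass
class IndexPage:
    page_no: int
    min_ids: list   # smallest id reachable through each child, ascending
    children: list  # IndexPage or DataPage objects, parallel to min_ids


def search(root, target_id):
    """Descend from the root index page to a data page; return (row, path taken)."""
    path = []
    node = root
    while isinstance(node, IndexPage):
        path.append(("index", node.page_no))
        # Pick the last child whose smallest id is <= target_id.
        idx = max(bisect.bisect_right(node.min_ids, target_id) - 1, 0)
        node = node.children[idx]
    path.append(("data", node.page_no))
    return node.rows.get(target_id), path


# Hypothetical three-level tree.
data_1 = DataPage(1, {1: "row-1", 3: "row-3", 5: "row-5"})
data_2 = DataPage(2, {8: "row-8", 10: "row-10"})
data_3 = DataPage(3, {15: "row-15", 20: "row-20"})
data_4 = DataPage(4, {30: "row-30", 40: "row-40"})
index_1 = IndexPage(1, [1, 8], [data_1, data_2])
index_2 = IndexPage(2, [15, 30], [data_3, data_4])
root = IndexPage(3, [1, 15], [index_1, index_2])
```

Calling search(root, 3) visits index page 3, index page 1, and data page 1, touching three pages instead of scanning all four data pages.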
The B+ tree above, made up of index pages and data pages, is the clustered index. We can of course also create ordinary (secondary) indexes on other columns. Note that the leaf nodes of a secondary index store the corresponding primary key id rather than the full row, so a secondary index has the "back to the table" problem: after the id has been found in the secondary index, the full row still has to be looked up in the clustered index by that id, and only then can all the columns requested by select * be returned. This extra lookup can be avoided by using a covering index, that is, an index that already contains every column the query needs.
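The difference between a back-to-table lookup and a covered query can be sketched as follows. The table contents, the column names name and age, and the function names are all hypothetical; a dict stands in for the clustered index and a sorted list of (name, id) pairs stands in for a secondary index on name.

```python
import bisect

# Clustered index: primary key id -> full row.
clustered_index = {
    1: {"id": 1, "name": "alice", "age": 30},
    2: {"id": 2, "name": "bob", "age": 25},
    3: {"id": 3, "name": "carol", "age": 41},
    4: {"id": 4, "name": "bob", "age": 37},
}

# Secondary index on name: entries hold only the indexed column and the primary key.
name_index = sorted((row["name"], row["id"]) for row in clustered_index.values())


def ids_for_name(name):
    """Search the secondary index and return the matching primary key ids."""
    start = bisect.bisect_left(name_index, (name,))
    ids = []
    for indexed_name, row_id in name_index[start:]:
        if indexed_name != name:
            break
        ids.append(row_id)
    return ids


def select_star_by_name(name):
    """Like SELECT * ... WHERE name = ?: each id needs a second lookup
    in the clustered index (the "back to the table" step)."""
    return [clustered_index[row_id] for row_id in ids_for_name(name)]


def select_id_by_name(name):
    """Like SELECT id ... WHERE name = ?: the secondary index already holds
    every requested column, so the clustered index is never touched."""
    return ids_for_name(name)
```

Here select_id_by_name is answered from the secondary index alone, while select_star_by_name performs one extra clustered index lookup per matching row.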
Conclusion
This article has taken apart the index principles of MySQL InnoDB step by step, with figures, building up the B+ tree index structure and describing how a query finds its data. You should now have a deeper understanding of indexes. Next, we will take a practical perspective and look at how to design indexes and how to deal with cases where an index is not used.