The most difficult thing is to know yourself!
Personal website, welcome to visit!
Preface:
This article mainly describes the actual storage structure of the joint index on B+Tree.
The storage structure and data searching method of joint index in B+ tree
The main contents of this article are:
-
The storage structure of a federated index in a B+ tree
-
How to find a federated index
-
Why is there a left-most prefix matching rule
Before sharing this post, I did some research online on the storage structure of MySQL synth index in the B+ tree, and read a number of blogs and technical articles, several of which were contrary to the facts. Details are as follows:
The value of the first field in the union index will only be stored in the non-leaf node of the B+ tree. The value of the remaining fields in the union index will only be stored in the leaf node of the B+ tree. (This is actually not true)
The following diagram shows the B+ tree storage structure of the wrong joint index:
Fortunately, through continuous query, I found a post from the community about “the storage structure of joint index on B+Tree?” Question and answer, the answer to this question, and posted an article and a picture and a simple description. PS: The link to the post is no longer open.
So this article was born under such conditions.
Federated index storage structure:
The following is a reference to the question and answer of the Sifu community to expand the storage structure of the joint index we are going to discuss today.
Think no questions, storage structure of joint index (segmentfault.com/q/101000001…). Users answered as follows:
The joint index BCD looks like the following figure in the index tree. In the process of comparison, b is judged first, c is judged then D:
As the answer is only such a picture, it may make you a little confused, so we will use the shoulders of predecessors to use this example to explore the storage structure of the joint index in the B+ tree more carefully.
Select idx_t1_bcd(b, C, D) from T1; select idx_t1_bcd(b, C,d) from T1; select idx_t1_bcd(b, C, D); All index columns of the union index appear on the index number and the three columns are compared in turn. The figure above has only two levels of tree height and is not easy to understand. Here is the hypothetical table data and my improvements to the structure diagram of its joint index on the B+ tree. PS: Based on InnoDB storage engine.
The structure of index (b, C, d) on the b + tree is as follows:
The data in the T1 table is shown below :(the data in the B+ tree above is from the following figure)
From these two figures, we have a general understanding of the storage structure of the joint index in the B+ tree. Here with my language to explain it for you.
InnoDB uses the primary key index to maintain indexes and data files in a B+ tree. Then we create a joint index (B, C, D) that also generates an index tree, the same structure as a B+ tree. Except that its data section stores the primary key of the row where the union index is located (the purple background of the leaf node above). Why the primary key and not the entire row? Because the federated index is a non-clustered index.
All right, that’s all for the big picture. Let’s combine these two diagrams to explain.
There are only a few more columns for a union index than for a single value index, and these index columns all appear on the index tree. For a federated index, the storage engine will first sort by the first index column, as in the figure above, we can look at the first index column alone, for example, 1, 1, 5, 12, 13… It’s monotonically increasing; If the first column is equal, then the second column is sorted, and so on to form the index tree of the figure above, which can be illustrated by 1, 1, 4, 1, 1, 5 and 13, 12, 4, 13, 16, 1, 13, 16, 5.
Joint index specific search steps:
Select * from T1 where b = 12 and c = 14 and d = 3; This is the entry at column A 4 in table T1.
The search steps are as follows:
- The storage engine starts with the root node (usually resident memory). The first index of the first index is 1,12 is greater than 1, and the first index of the second index is 56,12 is less than 56. The disk file address of the next node is read from the middle of the two indexes. Points to the disk location of the next node.
- A disk IO, the node values loaded in memory, then according to the first step in the same judgment, found that the data are matched, then according to the pointer will leaf node in which the joint index values are loaded from the disk memory, again at this time there was a disk IO, finally according to the leaf nodes in the index value of the associated primary key values.
- Query the primary key index tree (clustered index) for specific row records based on primary key values.
The leftmost prefix principle for federated indexes:
The reason for the left-most prefix matching principle is related to the index construction and storage structure of the joint index.
Select * from idx_t1_bcd(b,c,d); select * from idx_t1_bcd(b,c,d);
In the idx_t1_bcd(b, C,d) example above, the index tree is first built using the first column of the multi-column index. If the values of column B are equal, the index tree is then sorted by column C. If the values of column C are equal, the index tree is sorted by column D. We can take the leaves of the index tree and look at them.
The first column of the index, namely column B, increases monotonously from left to right, but column C and column D do not have this property. They can only increase in a small range if the values of column B are equal, such as the first and second elements of the first leaf node and the last three elements of the second leaf node. Because of the index construction and storage structure of the union index, the union index can only be searched from the first column of the multi-column index. So if your search criteria do not contain b columns such as (c,d), (c), (d), you cannot apply the cache, and you cannot use the index across columns such as (b,d), only use the B column index.
It’s like our phone book, first name and last name and phone number, first name and last name are joint indexes. If the last name can be sorted by the first letter of the last name, and the first letter of the last name is the same, then the first letter of the first name.
Such as:
M not MAO178* * * * * * * * ma teng183* * * * * * * * the horse cloud188* * * * * * * * Z, zhang jie189* * * * * * * * zhang jing glume138* * * * * * * * piece of art176* * * * * * * *Copy the code
We know the first name and last name is soon can position from the name of the first letter of the index to the name, then positioning to the name, and then find the phone number, because all the name of from the top down according to the established rules (initials) is orderly, and name is the name of the first letter of certain conditions are sorted by name the first letter of, but on the whole, all in the name of the together is unordered, So if you only know the name, it’s going to be slow to look it up, because you can’t quickly look it up with a structured structure.
Now if you see why we have the leftmost prefix matching rule.
Practice:
Here are some examples of SQL index usage:
select * from T1 where b = 12 and c = 14 and d = 3;Full value index matches all three columns
select * from T1 where b = 12 and c = 14 and e = 'xml';-- Applies to two column indexes
select * from T1 where b = 12 and e = 'xml';-- Applies to a column of indexes
select * from T1 where b = 12 and c > = 14 and e = 'xml';Push optimization applied to a list of indexes and index conditions
select * from T1 where b = 12 and d = 3;-- Apply to an index column because the index cannot be used across columns
select * from T1 where c = 14 and d = 3;Unable to apply index, violating leftmost matching rule
Copy the code
Postscript:
Here MySQL index index of united way of storage structure and search is finished, I ability is limited, is standing on the shoulders of predecessors’ creation of this article, because I saw before the search engine’s search results exist in several technical articles about unclear or wrong place, so their only summarizes this article to share with everyone, If there is wrong place must correct, thank.
This article has been drawing and writing for two or three days, and the main content is the above. Admittedly, this article is a bit of an armchair strategist to a certain extent, because I am a rookie in the use of MySQL and do not have much experience in database tuning. It is really a shame to talk about it here. Think of it as my personal study notes.
In addition, MySQL indexes and knowledge are extensive, and this article only covers a portion of them. Such as and ORDER BY related index optimization and Covering index (Covering index) topic is not involved in this article, at the same time in addition to b-tree index MySQL also according to different engine support hash index, full text index and so on this article is not involved. If there is a chance, I hope to supplement the parts not covered in this article.
Don’t forget to leave your learning footprints
All see the article not praise are “play rascal”, hey hey ヾ(◍°∇°◍) Blue! Just as a joke, move your little hands and click “like”, each of you will make more learners join in with a contribution (like + comment)! Thank you very much!  ̄  ̄ omega =