preface
Why Innodb index structure is B+ tree? When it comes to this question, leave yourself a way out. Don’t spray the B tree. Because some answers on the web say that b-trees are not suitable for indexing file storage systems. If you follow that answer, you have dug a hole for yourself, and it is difficult to end it.
Mysql here refers to the index structure of Innodb’s storage engine. We will not discuss other storage engines for the moment.
B tree and B plus tree
At the beginning, let’s recall the structure and characteristics of B tree and B+ tree, as follows :B tree
Notice two distinct features of B trees
- Each node in the tree stores data
- No pointer is adjacent between leaf nodes
B + tree
Notice two salient features of B+ trees
- Data only appears in leaf nodes
- A chain pointer has been added to all leaf nodes
In view of the characteristics of the above B+ tree and B tree, we make a summary:
(1) The tree of B tree stores data, so the query efficiency of B tree is not fixed when a single data is queried. The best case is O(1). We can assume that b-tree average performance is better when doing a single data query. However, since there are no adjacent Pointers between nodes in a B-tree, it is not suitable for some data traversal operations.
(2) The data of B+ tree only appear on leaf nodes, so the query speed is very stable when querying single data. Therefore, the average performance of a b-tree is not as good as that of a single data query. However, there are Pointers on leaf nodes of B+ trees to connect, so only leaf nodes need to be traversed during data traversal. This feature makes B+ trees very suitable for range query.
Therefore, we can make a corollary: maybe Mysql has a lot of data traversal, so use B+ tree as index structure. However, Mongodb does more single queries and less data traversal operations, so B tree is used as the index structure.
So why does Mysql do data traversal more? And Mongodb does less data traversal? Because Mysql is a relational database, and Mongodb is non-relational.
So why do relational databases do a lot of data traversal?
As opposed to a relational database, do less data traversal? Let’s move on
Relational vs. non-relational
Suppose we have two logical entities: Student and Class, and there is a one-to-many relationship between the two logical entities. After all, there are multiple students in a class, and a student can only belong to one class. Relational databases In relational databases, we consider several tables to represent the entity relationship between the two. It’s nothing more than a one-to-one relationship, using a list. One-to-many, with two tables. Many-to-many, with three tables. So here, we need two tables to represent the logical relationship between the two, as follows
How many students are there in the class whose cname is class 1? Let’s say this column, cname, we have an index! Execute the SQL as shown below!
SELECT *FROM t_student t1, ( SELECT cid FROM t_class WHERE cname = '1' ) t2WHERE t1.cid = t2.cidCopy the code
And this involves data traversal operations!
You can’t get rid of the join operation. If the index structure is B+ tree, there are Pointers on leaf nodes, which can greatly improve the matching speed of line by line!
Some people might argue that if I go first
SELECT cidFROM t_classWHERE cname = '1'Copy the code
Once the CID is obtained, go through the loop
SELECT *FROM t_studentWHERE cid = ...Copy the code
Can avoid the join operation?
To this, I would say. You did avoid the join operation, but you did not avoid the data traversal operation. You still have to iterate over and over on the leaves of student’s table!
So in a non-relational database, how do we find the class with cname 1, how many students are there? Non-relational databases and someone said, well, you can design it this way, right? So I have two sets as follows
Then, execute two queries to get the results! Student = student = student = student = student = student = student
Yes, it’s possible. I’m not saying no. It’s just not what non-relational databases are designed for. In MongoDB, this is not recommended at all. Although, Mongodb has a $lookup operation, can do join query. Ideally, though, this $lookup operation should not be used very often. If you need to use it often, you are using the wrong data store: if you have associated data, you should use a relational database.
Therefore, the formal design should be as follows
Let’s say the name column, we have an index! I only need to execute the statement once
db.class.find( { name: '1'})Copy the code
In this way, you can query the results you want.
And this is a single data query! After all, you don’t need to do line-by-line matching, there’s no traversal involved, and if you’re lucky, it’s possible to get the result you want in one IO.
Therefore, because of the difference between relational database and non-relational data design. As a result, traversal operations are common in relational data, so it is appropriate to use B+ trees as indexes. In a non-relational database, a single query is more common, so it is appropriate to use b-tree as index.
The interview routines
Sets a
You put mysql in your resume, not mongodb! Interviewer :” What about the mysql index structure?” Me :” Blah, blah, blah. “Interviewer :” You know why B+ trees and not B trees?” At this point, the normal candidate will be fooled, will be the b-tree shortcomings of the spray! The next question comes from the interviewer :” Do you know why some non-relational databases like mongodb use B trees?” And then you went back to wait!
Routine 2
You write mysql resume, also write mongodb! This situation is even more perfect! Interviewer :” What about the mysql index structure?” Interviewer :” You’ve got Mongodb on your resume. Do you know anything about its index structure?” Interviewer :” Why do Mongodb indexes use B trees and Mysql uses B+ trees?” And then you went back to wait!
Routine three
You don’t write mysql or mongodb on your resume! Interviewer;” If you were designing a database, what data structure would you use for its indexes? Me :” First of all, forget red black trees, blah blah blah… “B tree or B+ tree.” The interviewer; “If I were to design a non-relational database like Mongodb, what data structure would I index it with?” Then you can go back and wait!
All three routines are real! Anyway, this is the question the interviewer wants to ask!