preface
Hello everyone, I am Xing.
In this article, we will talk about indexes in the following article.
It is recommended to read the previous article before reading this article, eating the best effect.
The index
Let’s say you’re given a very thick book called Ideas for Java programming, and you don’t have a table of contents. If you want to quickly find a chapter, you might have to look for it for a while. If you have a table of contents, it’s different.
Index is actually to improve the efficiency of data query, just like the book directory, for the database table, index is actually its directory.
Binary search tree
There are many kinds of indexes, such as the common ordered array, hash table, tree and so on. Different structures have their own application scenarios and limitations. In the field of database, tree structure is widely used.
Let’s start with the most basic binary search tree.
The binary search tree has the following characteristics: the values of all nodes in the left subtree of the parent node are smaller than those of the parent node, and the values of all nodes in the right subtree are larger than those of the parent node, as shown in the following figure
If you want to search for data with ID =4, the search order is index page A -> index page B -> index page D -> data page 0, and the time complexity is O(log(N)).
In other words, the search speed is related to the height, the higher the tree, the worse the performance, suppose that a table of 1 million rows, using a binary tree to store, tree height 20, disk randomly read a block of data each time it takes about 10ms, a single row access may take 20 10ms, this query is really slow.
N cross search tree
In order to reduce random disk read I/OS, you must control the height of the tree, so you should not use binary trees, but use n-tree, where N represents the size of the data block.
In other words, the more data you store on an index page, the shorter the tree will be. InnoDB uses B+ trees for indexing.
Take InnoDB’s integer field index as an example.
A page defaults to 16KB, with an 8B bigint field, followed by a 6B pointer to its subtree, which means that an index page can store close to 1200 pieces of data (16KB /14B ≈ 1170).
If the B+ tree is 4 in height, it can store 1,200 to the third, or 1.7 billion pieces of data.
Given that the root node is always in memory, there is a high probability that the second layer of the tree is also in memory, so a search only requires access at most2
A diskIO
.
Why is the root node of the tree and the second layer of the tree in memory, but not the third and fourth layers?
The reason is very simple, look at the data size is clear
- The root of the tree is going to be
16kb
Index page, memory can be completely down, inside storage1200
Bar index - The second layer of the tree is total
1200
Index page,1200 * 16KB = 20M
There’s still room in the memory - The third layer of the tree
1200 * 1200 = 144w
Page,144w * 16kB = 23G
Memory is not appropriate - The fourth layer of the tree is the data page, which belongs to the complete data, let alone can not be loaded into memory
Finally, feel the flow of index search.
Select * from table where id=2699; select * from table where id=2699
- The root index page is directly obtained from the memory, and the directories in the root index page are searched in two ways to locate the index page of the second layer
- The index page of the second layer is directly obtained in memory, and the directories in the index page are searched in two ways to locate the index page of the third layer
- Load the index page of the third layer from disk to memory, search the directory in the index page and locate the fourth layer data page
- The data page of the fourth layer is loaded from disk to memory, and the data page becomes a cache page. Binary search is performed on the directory in the cache page to locate the specific row data
At this point, this article is almost over, and the rest of the index will be covered in a follow-up section.
MySQL good article recommended
- Over the years CURD, have you learned about the architecture design of MySQL?
- MySQL InnoDB memory component
- What is a redo log?
- No, no, nobody knows binlog?
- Redo log vs. binlog
- InnoDB: Buffer Pool to make MySQL faster
- How InnoDB works: Talk about how data pages become indexes