MySQL > select * from B+Tree; MySQL > select * from B+Tree; MySQL > select * from B+Tree;
Amateurs watch the crowd, amateurs watch the door. For this kind of popular knowledge and questions, I hope the students can do well.
Today we are going to review the past and present of MySQL index. Let’s talk about indexes.
What is an index?
In relational databases, an index is a single, physical storage structure that sorts the values of one or more columns in a database table. It is a set of values of one or more columns in a table and the corresponding logical pointer list to the data page in the table that physically identifies these values. Index function is equivalent to a book catalog, you can quickly find the required content according to the page number in the catalog.
When there are a large number of records in the table, if you want to query the table, the first way to search information is the full table search, is to take out all records one by one, compare with the query conditions one by one, and then return the records that meet the conditions, this will consume a lot of database system time, and cause a lot of disk I/O operations; The second method is to build an index in the table, and then find the index value that meets the query conditions in the index. Finally, ROWID (equivalent to page number) stored in the index is used to quickly find the corresponding record in the table.
After MySQL5.5 InnoDB storage engine used index data structure mainly used: B+Tree; B+Tree B+Tree B+Tree
Mark:
B+Tree can use indexes for <, <=, =, >, >=, BETWEEN, IN, and LIKE that do not start with a wildcard. (after MySQL5.5)
These facts may upend some of your beliefs, such as in other articles or books you read. All of these are “range queries” and do not go to the index!
Yes, before previous 5.5, the optimizer would not choose to search by index. The optimizer thinks that the rows extracted in this way are more than the rows scanned in the full table. Because we need to check the table again, the number of rows may involve MORE I/O, which will be abandoned by the optimizer.
After optimization by algorithm (B+Tree), it supports scanning of partial range types (benefit and order of B+Tree data structure). This practice also violates the leftmost prefix principle and results in a condition after a range query not using a federated index, as we’ll explain later.
The advantages and disadvantages of indexes
1, the advantages of
Indexes greatly reduce the amount of data that the server has to scan
Indexes help the server avoid sorting and temporary tables
Indexes can turn random I/ OS into sequential I/ OS
2 and disadvantages
While indexes greatly speed up queries, they slow down the speed of updating tables, such as INSERTS, UPDATES, and DELETES. MySQL not only saves data, but also index files when updating tables.
Index files that take up disk space. In general this is not a serious problem, but if you create multiple composite indexes on a large table with a large number of inserts, the index file size can grow rapidly.
If a data column contains a lot of duplicate content, there is not much practical value in indexing it.
For very small tables, a simple full table scan is more efficient in most cases;
Therefore, only the most frequently queried and ordered data columns should be indexed. MySQL > alter table set index = 16;
One of the purposes of database existence is to solve data storage and fast lookup. So where does the database data live? Yes, disks. What are the advantages of disks? Cheap! Drawbacks? Slower than memory access.
Do you know the main data structure used by MySQL indexes?
B + tree! You blurt it out.
So what kind of data structure is a B+ tree? MySQL > select * from B+ tree;
In fact, the final choice of B+ tree was a long evolution:
Binary sorting Tree → binary balanced Tree →B+Tree (B+Tree)
Some friends asked me, “What’s the difference between a B-tree and a B-tree?” B+Tree (B+Tree); B+Tree (B+Tree); B+Tree (B+Tree
Red black tree is a storage structure in programming language, not MySQL. Java’s HashMap, for example, uses a linked list plus a red-black tree.
Ok, today, let’s take a look at the evolution of a B+ tree.
B+Tree index
1. Binary sort tree
Before we get to B+ trees, a binary sort tree, for a node, all the children of its left subtree are smaller than it is, and all the children of its right subtree are larger than it is, if all the nodes satisfy this condition, then it is a binary sort tree. (Here can string the knowledge point of binary search)
Here is a binary sort tree, you can try to take advantage of its features and experience the process of finding 9:
-
9 is less than 10. Look in its left subtree (node 3)
-
9 is larger than 3. Look for the right subtree of node 3 (node 4)
-
9 is larger than 4. Look for the right subtree of node 4 (node 9)
-
Node 9 is equal to node 9. The search succeeds
A total of 4 comparisons, have you thought about the optimization of the above structure?
2. AVL tree (self-balanced binary search tree)
The figure above is an AVL tree with the same number of nodes and values as a binary sort tree
Let’s look at the process of finding 9 again:
-
9 is greater than 4, so go to its right subtree
-
9 is less than 10. Go to its left subtree
-
Node 9 is equal to node 9. The search succeeds
Three times in total, the same amount of data is less than the binary sorting tree, why? Because the height of AVL tree is smaller than that of binary sort tree, the higher the height, the more times of comparison; Do not underestimate the optimization of this time, if it is 200W data, the number of comparisons will be significantly different.
You can imagine a 1 million-node balanced binary tree, 20 tall. A query may require access to 20 data blocks. In the days of mechanical hard disks, reading a block of data randomly from a disk took about 10 ms of addressing time. In other words, for a 1 million row table, if stored in a binary tree, it might take 20 10 ms to access a single row, which is quite a slow query!
B Tree (Balanced Tree) (Balanced Tree) (Balanced Tree
A B tree is a multi-path self-balanced search tree, which is similar to a normal binary tree, but B books allow each node to have more children. The schematic diagram of B-tree is as follows: It is worth noting that data data of non-leaf nodes and leaf nodes of B-tree are stored separately, so common features such as range query and sorting are unfriendly.
B tree features:
-
All the keys are distributed throughout the tree
-
Any keyword appears and only appears on one node
-
The search may end at a non-leaf node
-
In the keyword set to do a search, performance close to binary search algorithm
To improve efficiency, minimize the number of disk I/ OS. In practice, the disk is not read strictly on demand, but preread every time.
After the disk has read all the data it needs, it reads more data into memory in sequence, based on the principle of locality known in computer science:
-
Because sequential disk reads are efficient (no addressing time required, very little rotation time required), preread can improve I/O efficiency for localized programs. The prefetch length is an integer multiple of a page.
-
MySQL(which uses the InnoDB engine by default) manages records by page, with a default size of 16K per page (which can be changed).
B-tree uses the disk prefetch mechanism:
Each time a new node is created, it is applying for a page space, so it only needs one I/O to find a node. In practical applications, the node depth is very small, so the search efficiency is very high.
So how does the final B+ tree work?
4, B+ Tree (B+ Tree)
It can also be seen from the figure that the difference between B+ tree and B tree is as follows:
-
All keywords are stored in leaf nodes, while non-leaf nodes do not store real data, so they can be quickly located to leaf nodes.
-
The addition of a chain pointer to all leaf nodes means that all values are stored sequentially and that each leaf page is the same distance from the root, making it a good place to find scope data. Supports range query and natural sorting.
Therefore, B+Tree can use indexes for <, <=, =, >, >=, BETWEEN, IN, and LIKE that do not start with a wildcard character. And if this index is used, the consumption of the sorting function is greatly reduced.
Advantages of B+ trees:
The comparison times are balanced, reducing THE I/O times, improving the search speed, and making the search more stable.
-
B+ trees have lower disk read and write costs
-
The query efficiency of B+ tree is more stable
Keep in mind that every time you create a table, the system automatically creates an ID-based clustered index (the B+ tree above) for you to store all the data; Each time you add an index, the database creates an additional index for you (above B+ tree). The number of fields selected by the index is the number of data indexes that each node stores. Note that the index does not store all data.
MySQL > select * from B+ tree;
B+ trees are more suitable for external storage (generally disk storage). Since inner nodes (non-leaf nodes) do not store data, a node can store more inner nodes, and each node can be indexed more accurately. In other words, the B+ tree provides more disk I/O information and higher I/O efficiency than the B tree.
Mysql is a relational database, and it often accesses an index column according to the interval. A chain pointer is established between leaf nodes of B+ tree in order to strengthen the interval access. Therefore, B+ tree is friendly to the interval range query on index column. However, the key and data of each node in the B tree are together, so interval search cannot be performed.
5. What you should know about indexes
1, back table query
For example, if you create the name, age index name_age_index, it is used to query data
Select * from table where name =' age 'and age = 26;Copy the code
Select * from (select * from); select * from (select * from); select * from (select * from);
2. Index coverage
It is better to understand when combined with the back table, such as the above name_age_index index, there is a query
Select name, age from table where name =' age 'and age = 26;Copy the code
In this case, select fields name and age can be obtained in the name_age_index index. Therefore, data in the index is directly returned without returning to the table. It is the preferred optimization method for DBA students.
3. Left-most prefix rule
B+ tree node storage index order is from left to right storage, in the matching process naturally also need to meet the matching from left to right; Usually when we create a joint index, that is, we create an index for multiple fields. I believe that students who have built an index will find that both Oracle and MySQL will let us choose the order of the index. For example, if we want to create a joint index for a, B, and C, we can choose the priority we want. A, B, C, or B, A, C, or C, A, B, etc. Why does the database let us choose the order of the fields? Are not all three field joint indexes? This leads to the principle of the left-most prefix for database indexes.
In our development, we often encounter the problem that the SQL query does not use the index when it is clear that the field has a joint index. SQL > alter table abc_index (a,b,c); SQL > alter table ABc_index (a,b,c);
select * from table where c = '1'; select * from table where b ='1' and c ='2';Copy the code
The following three cases will go through the index:
select * from table where a = '1'; select * from table where a = '1' and b = '2'; select * from table where a = '1' and b = '2' and c='3';Copy the code
Can you see anything from the above two examples?
Yes, the index abc_index:(a,b,c) will only be used for (a), (a,b), (a,b,c). (a, C) will also go, but only the A field index, not the C field index.
In addition, there is a special case, the following type of index will only a and B, c will not go.
select * from table where a = '1' and b > '2' and c='3';
Copy the code
For SQL statements like the one above, c is already unordered after A and B have completed the index, so C cannot go through the index, and the optimizer will consider it not as fast as scanning the whole table for C.
Left-most prefix: as the name implies, it is left-most first. In the above example, we create a_b_c multi-column index, which is equivalent to (a) single column index, (a,b) composite index, and (a,b, C) composite index.
Therefore, when creating a multi-column index, the most frequently used column in the WHERE clause is placed at the far left, depending on business requirements
4. Index push-down optimization
The index name_age_index has the following SQL
Select * from table where name like 'Chen %' and age > 26;Copy the code
This statement can be executed in two ways:
-
Hit the name_age_index syndicate index, search for all rows whose names start with “Chen”, and then return to the table to search for all rows that meet the criteria.
-
Select * from name_age_index where name = ‘Chen’ and age>20; select * from name_age_index where name = ‘Chen’ and age>20;
The second way to query back to the table has fewer rows and fewer I/ OS. This is called index push-down. So not all likes will miss the index.