What data structure do MySQL indexes use? This is a common interview question. What categories of indexes are there? Why does an index sometimes fail to take effect in development even though it was clearly added? How should indexes be tuned?
Index data structures
MySQL indexes speed up data access.
When designing an index, the candidate data structures are: hash tables, balanced binary trees (AVL trees), B trees and B+ trees.
Hash table
A hash table maps each key to a bucket with a hash function, and keys that collide are chained together in a linked list. Using a hash table as an index has the following disadvantages:
1. Inserting data causes hash collisions, so a good hash function must be designed to keep collisions rare and avoid wasting storage on empty slots;
2. A range search must check entries one by one, which is very inefficient, so a hash table is not suitable for range queries;
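To make the two disadvantages concrete, here is a minimal sketch of a hash index with separate chaining. It is an illustration, not MySQL's implementation; the class name `HashIndex`, the bucket count and the sample ages are hypothetical. A point lookup touches one chain, while a range lookup has to visit every bucket because hashing destroys key order.

```python
class HashIndex:
    """Hypothetical hash index: each bucket is a list acting as the collision chain."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # hash() maps the key to a bucket; different keys may land in the same one
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, row_id):
        self._bucket(key).append((key, row_id))

    def get(self, key):
        # Point lookup: only one chain has to be scanned
        return [rid for k, rid in self._bucket(key) if k == key]

    def range(self, low, high):
        # Range lookup: key order is lost, so every entry in every bucket is
        # checked, and the hits still have to be sorted afterwards
        hits = [(k, rid) for b in self.buckets for k, rid in b if low <= k <= high]
        return [rid for k, rid in sorted(hits)]


idx = HashIndex()
for row_id, age in enumerate([30, 25, 41, 25, 19, 33]):
    idx.insert(age, row_id)

print(idx.get(25))        # [1, 3]
print(idx.range(20, 35))  # [1, 3, 0, 5]
```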
Balanced binary trees
How many nodes can a balanced binary tree contain?
This depends on the height of the tree. If the tree has height $H$, level $n$ holds at most $2^{n-1}$ nodes, so the whole tree holds at most $2^0 + 2^1 + 2^2 + \cdots + 2^{H-1} = 2^H - 1$ nodes.
By this calculation, a balanced binary tree holding 1 million records is about 20 levels high, which means that in the worst case finding a single record takes 20 node visits, each of which may cost a disk I/O.
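The arithmetic above can be checked with a short sketch. The function names are hypothetical; `min_height` relies on the fact that the smallest $H$ with $2^H - 1 \ge N$ is exactly the bit length of $N$.

```python
def max_nodes(height):
    # Level n holds at most 2^(n-1) nodes; summing levels 1..H gives 2^H - 1
    return sum(2 ** (n - 1) for n in range(1, height + 1))


def min_height(num_records):
    # Smallest H with 2^H - 1 >= num_records, which is num_records' bit length
    return num_records.bit_length()


print(max_nodes(20))          # 1048575
print(min_height(1_000_000))  # 20
```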
Because each node of a balanced binary tree holds only one key, storing more data makes the tree deeper, which means more I/O operations and lower efficiency.
Using a balanced binary tree therefore has the following disadvantages:
1. Each node holds only one key, so the amount of data it can store efficiently is limited;
2. When a lot of data is stored, the tree becomes deep and each access needs more I/O, which hurts efficiency;
B tree
In terms of structure, a B tree stores more keys per node than a balanced binary tree. However, each node of a B tree stores the row data as well as the key values. In InnoDB the default page size is 16 KB, so the more bytes each entry takes, the fewer entries fit in one page; storing the same amount of data therefore increases the height of the tree.
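A back-of-the-envelope sketch shows how much the row data inside each node costs. Only the 16 KB page size comes from the text; the 8-byte BIGINT key, the 6-byte child pointer and the 1 KB row size are assumptions, and page headers and other overhead are ignored. For contrast, the same estimate is shown for a B+ tree, whose internal nodes hold only keys and pointers.

```python
PAGE_SIZE = 16 * 1024  # InnoDB default page size (16 KB)
KEY_SIZE = 8           # assumption: BIGINT primary key
POINTER_SIZE = 6       # assumption: size of a child-page pointer
ROW_SIZE = 1024        # assumption: each row takes about 1 KB


def b_tree_rows(height):
    # Every node stores whole rows plus child pointers, so fan-out stays small
    per_node = PAGE_SIZE // (ROW_SIZE + POINTER_SIZE)
    fanout = per_node + 1
    # Level i has fanout^i nodes, each holding per_node rows
    return sum(per_node * fanout ** level for level in range(height))


def b_plus_tree_rows(height):
    # Internal nodes store only key + pointer; rows live only in the leaves
    fanout = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)
    rows_per_leaf = PAGE_SIZE // ROW_SIZE
    return fanout ** (height - 1) * rows_per_leaf


print(b_tree_rows(3))       # 4095
print(b_plus_tree_rows(3))  # 21902400
```

Under these assumptions a three-level B tree holds only a few thousand rows, while a three-level B+ tree holds roughly 21.9 million.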
Using a B tree therefore has the following disadvantages:
1. Each node can store only limited data, so searches need more disk I/O and queries become slower;
2. It is not well suited to range searches and sorting;
B+ tree
Compared with the B tree and the other data structures, the B+ tree has the following advantages:
1. Non-leaf nodes store only index keys, not row data, so each node holds more keys. Compared with a B tree, the tree is shorter and fatter and needs fewer I/O operations;
2. Leaf nodes are linked to the previous and next leaves, which makes range queries convenient;
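The benefit of the leaf chain can be sketched as follows. This is a simplified model, not InnoDB code: the `Leaf` class, `build_leaves` and `range_scan` are hypothetical, and a binary search over each leaf's first key stands in for descending the internal nodes. Once the first leaf is found, the scan simply follows `next` pointers instead of climbing back up the tree.

```python
import bisect


class Leaf:
    def __init__(self, keys):
        self.keys = keys   # sorted keys stored in this leaf page
        self.next = None   # pointer to the next leaf (the leaf chain)


def build_leaves(sorted_keys, leaf_size):
    leaves = [Leaf(sorted_keys[i:i + leaf_size])
              for i in range(0, len(sorted_keys), leaf_size)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves


def range_scan(leaves, low, high):
    # Stand-in for the tree descent: locate the leaf that may contain `low`
    first_keys = [leaf.keys[0] for leaf in leaves]
    pos = max(bisect.bisect_right(first_keys, low) - 1, 0)
    leaf, result = leaves[pos], []
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return result
            if key >= low:
                result.append(key)
        leaf = leaf.next  # follow the chain to the neighboring leaf
    return result


leaves = build_leaves(list(range(0, 100, 5)), leaf_size=4)
print(range_scan(leaves, 12, 38))  # [15, 20, 25, 30, 35]
```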
Index classification
Clustered index and non-clustered index
By whether the index is built on the primary key, indexes fall into two categories: the primary key index (clustered index) and non-primary-key indexes (non-clustered indexes, also called secondary indexes);
The difference between them is that the leaf nodes of a clustered index store the primary key values together with the full row data, while the leaf nodes of a non-clustered index store the indexed column values and the corresponding primary key values.
Therefore, a search through a non-clustered index first finds the primary key value that corresponds to the index key, then uses that primary key value to search the primary key index tree for the corresponding row. This process is called going back to the table.
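The two-step lookup can be sketched with plain dictionaries standing in for the two B+ trees. This is an assumption made for brevity; the table contents and the name `find_by_name` are hypothetical. The function reports how many index trees it had to search.

```python
# Clustered (primary key) index: leaf entries map id -> full row
clustered = {
    1: {"id": 1, "name": "alice", "age": 30},
    2: {"id": 2, "name": "bob", "age": 25},
    3: {"id": 3, "name": "carol", "age": 41},
}

# Secondary index on name: leaf entries map name -> primary key only
name_index = {row["name"]: pk for pk, row in clustered.items()}


def find_by_name(name):
    trees_searched = 1              # step 1: search the secondary index tree
    pk = name_index.get(name)
    if pk is None:
        return None, trees_searched
    trees_searched += 1             # step 2: back to the table (clustered index)
    return clustered[pk], trees_searched


row, trees = find_by_name("bob")
print(row["age"], trees)  # 25 2
```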
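This is why DBAs recommend creating every table with a primary key; a table without a primary key is soulless. If you do not define a primary key, InnoDB uses the first UNIQUE index whose columns are all NOT NULL as the clustered index, and if there is no such index, it generates a hidden row ID column and clusters on that.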
Classification by type
By index type, indexes are classified as follows:
- Unique index: UNIQUE KEY
- Normal index: INDEX (KEY)
- Full-text index: FULLTEXT
- Composite index: a combined index on two or more columns
As mentioned above, the index structure is a B+ tree, and each index is a separate B+ tree: a table with several indexes has several B+ trees, but the full row data is stored only in the primary key index tree.
Causes of index failure
An index may exist but still not take effect for several reasons:
- Fuzzy queries with a leading wildcard: a left-fuzzy pattern such as '%shaw' or a full-fuzzy pattern such as '%shaw%' cannot use the index, but a right-fuzzy pattern such as 'shaw%' can;
- Applying a function to the indexed column: for example, SELECT name FROM user WHERE substring(name, 1, 3) = 'ABC';
- Performing arithmetic on the indexed column: for example, SELECT id FROM user WHERE id / 2 = 100;
- Not following the leftmost prefix of a composite index: for an index on (A, B, C), a query that filters only on B or C cannot use it, and a query that filters on A and C but skips B can use only the A part of the index to locate rows;
- Type mismatch: for example, if age is a VARCHAR column and the query is SELECT name FROM user WHERE age = 16 with the value unquoted, MySQL has to convert the column values implicitly, so the index is not used.
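Why a leading wildcard defeats the index can be sketched with a sorted Python list standing in for the index leaves. The names and both helper functions are hypothetical. Values matching 'shaw%' sit in one contiguous run that binary search can jump to, while values matching '%shaw' are scattered across the sort order, so every entry must be examined.

```python
import bisect

# A sorted list stands in for the leaf level of an index on `name`
names = sorted(["shawn", "shaw", "bradshaw", "alice", "shawna", "kershaw", "zoe"])


def like_prefix(index, prefix):
    # LIKE 'shaw%': matches are contiguous, so binary-search to the first one
    start = bisect.bisect_left(index, prefix)
    result, examined = [], 0
    for name in index[start:]:
        examined += 1
        if not name.startswith(prefix):
            break
        result.append(name)
    return result, examined


def like_suffix(index, suffix):
    # LIKE '%shaw': sort order does not help, so every entry is examined
    return [n for n in index if n.endswith(suffix)], len(index)


print(like_prefix(names, "shaw"))  # (['shaw', 'shawn', 'shawna'], 4)
print(like_suffix(names, "shaw"))  # (['bradshaw', 'kershaw', 'shaw'], 7)
```

The same reasoning covers functions and arithmetic on the column: the index is sorted by the raw values, not by substring(name, 1, 3) or id / 2.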
Index tuning
1. Avoid creating too many indexes and prefer composite indexes (no more than 5 indexes per table);
2. Add indexes to fields that are often used in query conditions;
3. Add indexes to fields that are often used in GROUP BY and ORDER BY, which can greatly improve the efficiency of grouping and sorting;
4. If a field's data is unique, create a unique index so that correctness is enforced at the database level;
5. Do not add indexes to fields with low selectivity, such as a gender field (when a single value accounts for more than 30% of the rows, an index is not recommended);
6. Do not add indexes to fields that are updated frequently, because every update must also maintain the index, which increases database overhead;
7. Use covering indexes whenever possible;
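Rule 5 can be checked against real data before deciding on an index. The sketch below computes selectivity, the ratio that COUNT(DISTINCT col) / COUNT(*) would give in SQL, and the share of rows taken by the most common value. Reading the 30% threshold as "one value covers more than 30% of rows" is an assumption about the rule's intent, and the function names are hypothetical.

```python
from collections import Counter


def selectivity(values):
    # Distinct values / total rows, like COUNT(DISTINCT col) / COUNT(*)
    return len(set(values)) / len(values)


def top_value_share(values):
    # Fraction of rows taken by the single most common value
    return Counter(values).most_common(1)[0][1] / len(values)


def worth_indexing(values, threshold=0.30):
    # Assumed reading of the rule: skip the index if one value covers > 30% of rows
    return top_value_share(values) <= threshold


gender = ["M", "F"] * 50
user_ids = list(range(100))
print(selectivity(gender), worth_indexing(gender))      # 0.02 False
print(selectivity(user_ids), worth_indexing(user_ids))  # 1.0 True
```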
Index knowledge points
Back to the table
The database first scans a non-primary-key (secondary) index to find the primary key ID of the matching row, then uses that ID to fetch the full row from the primary key index. In other words, a query based on a non-primary-key index has to search one more index tree.
Covering index
If an index contains (covers) the values of all the fields a query needs, so that no lookup back to the table is required, this is called a covering index.
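Whether a query is covered comes down to a set comparison. In this minimal sketch, which assumes InnoDB-style secondary indexes whose leaves carry the indexed columns plus the primary key, the function `is_covering` and the column names are hypothetical.

```python
def is_covering(index_columns, primary_key, query_columns):
    # A secondary index leaf holds the indexed columns plus the primary key
    available = set(index_columns) | {primary_key}
    return set(query_columns) <= available


# Hypothetical index on (name, age) in a table whose primary key is `id`
print(is_covering(["name", "age"], "id", ["id", "name"]))     # True
print(is_covering(["name", "age"], "id", ["name", "email"]))  # False
```

In the second case `email` is missing from the index, so each match would still need a lookup back to the table.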
Leftmost matching principle
A composite index is created on several fields at once, and column order matters: (A, B, C) and (A, C, B) are two different composite indexes. For example, creating a composite index on (A, B, C) is equivalent to creating indexes on A, (A, B) and (A, B, C). The composite index is really a single index rather than several, but its effect is equivalent to having all three;
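The leftmost matching principle follows from how a composite index is sorted. In this sketch a sorted list of tuples stands in for an index on (a, b, c); the rows and helper names are hypothetical. An equality condition on a leftmost prefix selects one contiguous run that binary search can find, whereas a condition on b alone matches entries scattered throughout, forcing a full scan.

```python
import bisect

rows = [(1, "x", 10), (1, "y", 20), (2, "x", 30), (2, "z", 10), (3, "y", 20)]
index = sorted(rows)  # composite index on (a, b, c): sorted by a, then b, then c


def seek_prefix(index, prefix):
    # Equality on a leftmost prefix: matches form one contiguous run
    n = len(prefix)
    start = bisect.bisect_left(index, prefix)
    result = []
    for entry in index[start:]:
        if entry[:n] != prefix:
            break
        result.append(entry)
    return result


def scan_on_b(index, b):
    # Condition on b alone skips column a, so matches are not contiguous
    return [entry for entry in index if entry[1] == b]


print(seek_prefix(index, (1,)))      # [(1, 'x', 10), (1, 'y', 20)]
print(seek_prefix(index, (2, "z")))  # [(2, 'z', 10)]
print(scan_on_b(index, "y"))         # [(1, 'y', 20), (3, 'y', 20)]
```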
Index condition pushdown
MySQL 5.6 introduced index condition pushdown (ICP). While traversing an index, MySQL can first evaluate conditions on fields contained in the index and filter out records that do not match, which reduces the number of lookups back to the table.
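The effect of ICP can be sketched by counting lookups back to the table for WHERE name LIKE 'zhang%' AND age = 10 on a composite index over (name, age). The table contents and the `query` function are hypothetical, and the loop checks every index entry for simplicity instead of doing a real range seek. Without pushdown each name match triggers a table lookup before age is checked; with pushdown, age is filtered inside the index first.

```python
# Hypothetical table: primary key id -> row
table = {
    1: {"name": "zhang san", "age": 10},
    2: {"name": "zhang si", "age": 20},
    3: {"name": "zhao wu", "age": 10},
    4: {"name": "zhang liu", "age": 10},
}

# Composite secondary index on (name, age); each entry carries the primary key
index = sorted((row["name"], row["age"], pk) for pk, row in table.items())


def query(prefix, age, pushdown):
    """WHERE name LIKE '<prefix>%' AND age = <age>; returns (ids, table lookups)."""
    ids, lookups = [], 0
    for name, idx_age, pk in index:
        if not name.startswith(prefix):
            continue  # the name condition is checked in the index either way
        if pushdown and idx_age != age:
            continue  # ICP: age is in the index, so filter before the lookup
        lookups += 1
        row = table[pk]  # back to the table
        if row["age"] == age:
            ids.append(pk)
    return sorted(ids), lookups


print(query("zhang", 10, pushdown=False))  # ([1, 4], 3)
print(query("zhang", 10, pushdown=True))   # ([1, 4], 2)
```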