The execution plan
An execution plan describes how MySQL will execute an SQL statement. What we actually see is not the execution itself, but the key information MySQL will use while executing the statement, and we make judgments based on that information.
For example, how do you tell whether an SQL query actually uses an index? That is exactly what the execution plan tells you.
To view the execution plan, prefix the SQL statement with the EXPLAIN keyword. For example:
explain select * from emp where name = 'john';
After this statement executes, a row with a dozen or so fields is output. We need to know the meanings of the key ones:
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SIMPLE | emp | (Null) | ALL | (Null) | (Null) | (Null) | (Null) | 12 | 100.00 | (Null) |
- Type: indicates the access type. The possible values include system, const, ref, range, index, and ALL, ordered from most to least efficient. If the value is ALL, MySQL performs a full table scan. It is recommended to keep every SQL statement at the range level at worst.
- Key: the index actually used when executing this SQL, if any.
- Extra: additional information that helps us judge how efficiently the current SQL executes. The main values are: Using index (covering index), Using index condition (index condition pushdown), and Using filesort (sorting using temporary space).
The index type
Indexes are implemented in MySQL's storage engine layer, not in the server layer. Indexes may therefore differ between storage engines, and not every storage engine supports every index type.
MySQL currently provides the following four kinds of index:
- B-tree index: the most common index type; most storage engines support it.
- Hash index: supported only by the MEMORY engine.
- R-tree index: also called a "spatial index", a special index type of the MyISAM engine, mainly used for geospatial data.
- Full-text index: a special index type of the MyISAM engine, mainly used for full-text search; InnoDB has supported full-text indexes since MySQL 5.6.
The four index types and the storage engines that support them:
| Index type | InnoDB | MyISAM | MEMORY |
|---|---|---|---|
| B-tree index | Supported | Supported | Supported |
| Hash index | Not supported | Not supported | Supported |
| R-tree index | Not supported | Supported | Not supported |
| Full-text index | Supported since 5.6 | Supported | Not supported |
B-tree index
A B-tree index is an index organized as a B+ tree (a balanced multi-way search tree). Clustered indexes, composite indexes, prefix indexes, and unique indexes are all organized as B+ trees by default.
Index structure
1. Read data in blocks
First of all, an index stores key-value data, and many data structures can store key-value data, such as a HashMap or a tree. MySQL ultimately chose the B+ tree to store indexes. Why the B+ tree?
As a table grows very large, its index grows too. What happens when the index becomes too large to load into memory in one go?
If memory is 8GB and a MySQL index file reaches 16GB, the best solution is to read in blocks, say reading only 1GB of the index file into memory at a time. This solution reflects the idea of divide and conquer.
At the same time, we want to improve I/O efficiency as much as possible, for example by replacing mechanical hard drives with solid-state drives. But that is a hardware matter beyond the software level. What we can do in software is:
- Reduce the I/O volume: reading 10 MB costs far less than reading 100 MB.
- Reduce the I/O count: one read operation costs far less than ten read operations.
Two operating-system mechanisms help here:
- The locality principle:
  - Temporal locality: recently accessed data is likely to be accessed again, far more likely than data never touched before.
  - Spatial locality: data and programs tend to cluster, so data near what was just accessed is likely to be needed soon.
- Disk prefetching: memory and disk interact in a minimum logical unit called a page (data page), generally 4KB or 8KB depending on the operating system. When data is read into memory, a whole number of pages is typically read at once: 4KB, 8KB, 16KB, and so on.
The InnoDB engine reads 16KB of data at a time when loading data, a direct application of reading in blocks.
Why not use a HashMap to store indexes?
- Hash storage requires loading all the data files into memory, which consumes memory space.
- It requires a good hash algorithm; a poor one causes hash collisions, unevenly distributed data, and wasted storage space.
- The key-value pairs stored in a hash table are unordered, so a range lookup must traverse every entry in the table, which is very expensive.
However, MySQL still provides hash indexes: the MEMORY engine supports them, and InnoDB supports an adaptive hash index. "Adaptive" means that InnoDB normally uses B-tree indexes, and MySQL itself, not the user, decides whether to build a hash index on top of them.
Why not use binary search trees, balanced binary search trees, AVL trees, or red-black trees to store indexes?
What these typical data structures have in common is that they are binary: each node has at most two children. So when a large amount of data is inserted, whether or not the tree stays ordered and balanced, it becomes very deep. A search then requires many comparisons, fewer than a full scan but still many, and each comparison amounts to one disk I/O operation, which raises the I/O count and hurts query efficiency.
Why does each comparison equal one disk I/O? Because each node occupies one disk block, and that block must be read from disk into memory before the comparison can happen.
To improve on these data structures, two changes are needed:
- Increase the capacity of a single node: for example, instead of one value per node, store five values per node.
- Increase the number of children of a single node: for example, allow each node up to five children instead of two.
So we have B trees and B+ trees.
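To make the depth argument concrete, here is a back-of-the-envelope sketch in Python. The row count and fanout are hypothetical, chosen only for illustration: each tree level costs roughly one disk read, so the depth difference is the I/O difference.

```python
import math

# Hypothetical numbers, purely for illustration: 10 million indexed keys,
# and a multi-way tree with 1000 children per node.
N = 10_000_000

# One node per disk block, so tree depth ~= disk reads per lookup.
binary_depth = math.ceil(math.log2(N))        # at most 2 children per node
multiway_depth = math.ceil(math.log(N, 1000)) # 1000 children per node

print(binary_depth, multiway_depth)  # 24 3
```

With the same data, a binary tree costs about 24 block reads per lookup while a 1000-way tree costs 3, which is exactly why multi-way trees win on disk.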
2. B tree
2.1 Definition
The B tree, also written B-tree, is a balanced multi-way search tree.
First, a point of terminology: when describing B trees and B+ trees we must specify the order, i.e. the maximum number of children a node may have, generally denoted m.
A B tree of order m is defined as follows:
- Each node stores key-value pairs.
- The root node holds at least one key-value pair.
- Every non-root node holds at least floor(m/2) key-value pairs.
- Each node holds at most m-1 key-value pairs.
- All leaf nodes are on the same level.
- Every non-leaf node has exactly one more child than it has key-value pairs.
- The keys within each node are sorted in ascending order; for any key, all keys in its left subtree are smaller and all keys in its right subtree are larger.
Based on the above definitions, we can conclude that:
- The number of key-value pairs in the root node ranges over [1, m-1].
- The number of key-value pairs in a non-root node ranges over [floor(m/2), m-1].
Taking a fifth-order B tree as an example, the root node holds between 1 and 4 key-value pairs and every non-root node holds between 2 and 4.
2.2 Inserting Data
The insertion rule: insert the key-value pair into the appropriate node, then check whether the node still holds at most m-1 pairs.
- If the node holds <= m-1 pairs, the insertion is complete.
- If the node holds more than m-1 pairs, it is split into two nodes around the Key in its middle, and that middle Key is pushed up into the parent node (which may split in turn).
For example, consider a fifth-order B tree. Since the values play no role in illustrating the structure, only the keys are shown.
The keys 70, 18, 40, 50, 22, 23, 25, 39 are inserted in order.
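The insert-then-split rule can be sketched for a single node in Python. This is a simplified, hypothetical sketch of one node of a fifth-order B tree, not a full B-tree implementation: a real tree would recurse and insert the promoted middle key into the parent node.

```python
import bisect

M = 5  # order: each node holds at most M - 1 = 4 keys

def insert_and_maybe_split(node, key):
    """Insert key into a sorted node; on overflow, split around the middle key.

    Returns (left, promoted_middle_key, right) after a split,
    or (node, None, None) when the node still fits.
    """
    bisect.insort(node, key)
    if len(node) <= M - 1:
        return node, None, None
    mid = len(node) // 2
    return node[:mid], node[mid], node[mid + 1:]

node = []
for k in [70, 18, 40, 50]:           # first four keys from the example above
    node, promoted, right = insert_and_maybe_split(node, k)
print(node)                           # [18, 40, 50, 70] -- still one node

node, promoted, right = insert_and_maybe_split(node, 22)  # fifth key overflows
print(node, promoted, right)          # [18, 22] 40 [50, 70] -- 40 moves up
```

Inserting the fifth key overflows the node, so it splits into [18, 22] and [50, 70] while the middle key 40 is promoted, matching the rule above.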
2.3 Deleting Data
Deletion rules:
- Deleting a key-value pair from a leaf node:
  - If the leaf still holds >= floor(m/2) pairs afterwards: delete directly.
  - If the leaf would hold < floor(m/2) pairs afterwards:
    - If a sibling holds > floor(m/2) pairs: delete, then move the separating key of the parent down into this node, and move the sibling's nearest key up into the parent.
    - If the sibling holds <= floor(m/2) pairs: delete, then move the separating key of the parent down and merge this node with the sibling.
- Deleting a key-value pair from a non-leaf node: delete it directly, then move the smallest key of its successor child up into this node, and check how many pairs the successor now holds:
  - If it holds >= floor(m/2) pairs: nothing more to do.
  - If it holds < floor(m/2) pairs:
    - If a sibling holds > floor(m/2) pairs: move the separating key of the parent down into the successor, and move the sibling's nearest key up into the parent.
    - If the sibling holds <= floor(m/2) pairs: move the separating key of the parent down and merge the successor with the sibling.
The following steps illustrate deletion:
- The initial state of the tree is as follows.
- Delete 15.
- Delete 22.
- Delete 28.
2.4 Querying Data
When querying data, the B tree described above needs a few more details:
- Each node occupies one disk block and contains k key-value pairs in ascending order plus k+1 pointers to child nodes. Each pointer stores the address of the disk block where that child resides.
- k keys divide the key space into k+1 ranges; for example, two keys form three ranges, corresponding to three pointers to different children, and each child stores the key-value pairs of its range.
- Take the root node as an example: its keys are 16 and 34. The child pointed to by P1 covers keys < 16; the child pointed to by P2 covers keys between 16 and 34; the child pointed to by P3 covers keys > 34.
Suppose we query Key = 28. The process is:
- Locate disk block 1 from the root node and read it into memory (disk I/O 1).
- Compare: 28 lies between 16 and 34, so follow pointer P2 of disk block 1.
- Locate disk block 3 via pointer P2 and read it into memory (disk I/O 2).
- Compare: 28 lies between 25 and 31, so follow pointer P2 of disk block 3.
- Locate disk block 8 via pointer P2 and read it into memory (disk I/O 3).
- Compare: Key = 28 is found in disk block 8; the lookup ends.
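The walk above can be mimicked in Python with disk blocks modeled as dict entries. Block numbers and keys follow the example; sibling blocks the lookup never visits are omitted from this hypothetical sketch.

```python
import bisect

# Only the blocks on the search path for Key = 28 are modeled here;
# the other children (blocks 2, 4, 7, 9) are never read.
blocks = {
    1: {"keys": [16, 34], "children": [2, 3, 4]},   # root
    3: {"keys": [25, 31], "children": [7, 8, 9]},
    8: {"keys": [26, 28, 30], "children": None},    # leaf
}

def search(block_id, key):
    reads = 0
    while True:
        block = blocks[block_id]   # one simulated disk I/O per block
        reads += 1
        if key in block["keys"]:
            return block_id, reads
        if block["children"] is None:
            return None, reads     # reached a leaf without finding the key
        i = bisect.bisect(block["keys"], key)
        block_id = block["children"][i]

print(search(1, 28))  # (8, 3): found in block 8 after 3 block reads
```

The key is found after exactly three block reads, one per level, matching the three disk I/Os in the walkthrough.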
2.5 Advantages and disadvantages
Advantage:
B trees are mostly used to index data on disk. Because a disk stores far more data than fits in memory, data can only be loaded page by page, with each disk page corresponding to one node. The B tree reduces the height of the tree and therefore the number of I/O operations: each read brings in more data, and the number of reads is small enough to give it a crushing advantage over AVL and red-black trees.
Disadvantage:
Each node stores values alongside keys, and the storage space of each page is limited; if the values are large, the number of keys each node can hold decreases, which makes the tree deeper.
3. B+ tree
3.1 Definition
B+ tree is a data structure optimized on the basis of B tree.
B+ trees have two different types of nodes:
- Index node: a non-leaf node that stores keys but not values.
- Leaf node: a leaf node stores both keys and values.
The B+ tree's definition differs from the B tree's as follows:
- Each node holds at most m key-value pairs.
- A parent node stores the first key of each of its children.
- Each leaf node holds a pointer to the adjacent leaf node.
- Every non-leaf node has exactly as many children as it has keys.
3.2 Querying Data
B+ trees can be searched in two ways:
- Search from the root: all data is stored in the leaf nodes, and a point lookup descends from the root exactly as in a B tree, with O(log N) time complexity.
- Search along the leaf linked list: the leaf nodes form a linked list, so a query can first locate the leftmost qualifying leaf node and then scan sequentially.
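The linked-list search can be sketched as follows. This is a hypothetical three-leaf chain; real leaves would hold far more pairs.

```python
# Sketch of a B+ tree leaf chain: leaves hold sorted (key, value) pairs and
# a pointer to the next leaf, so a range scan finds the first qualifying
# leaf and then walks siblings sequentially.
class Leaf:
    def __init__(self, pairs):
        self.pairs = pairs      # sorted (key, value) tuples
        self.next = None        # pointer to the adjacent leaf

l1, l2, l3 = Leaf([(1, 'a'), (4, 'b')]), Leaf([(7, 'c'), (9, 'd')]), Leaf([(12, 'e')])
l1.next, l2.next = l2, l3

def range_scan(start_leaf, lo, hi):
    """Collect all values with lo <= key <= hi by walking the leaf chain."""
    out, leaf = [], start_leaf
    while leaf:
        for k, v in leaf.pairs:
            if k > hi:
                return out      # keys are sorted: nothing further matches
            if k >= lo:
                out.append(v)
        leaf = leaf.next
    return out

print(range_scan(l1, 4, 9))  # ['b', 'c', 'd']
```

Once the first leaf is located, the scan never returns to the upper levels of the tree, which is why B+ tree range queries are cheap.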
3.3 Advantages
- Since B+ tree index nodes store only keys, each node can hold more keys than a B tree node, which further lowers the height of the tree and divides the data into more ranges; the more ranges, the faster the retrieval.
- Since leaf nodes hold pointers to their neighbors, sequential (range) queries are faster.
- Typically, a B+ tree of three or four levels is enough to support tens of millions of rows; beyond that, the data should be split up.
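A rough capacity check of the last claim, using hypothetical sizes (16KB pages, 8-byte keys, 6-byte page pointers, roughly 1KB rows):

```python
PAGE = 16 * 1024            # InnoDB page size
KEY, PTR = 8, 6             # hypothetical bigint key and page-pointer sizes

fanout = PAGE // (KEY + PTR)     # children per index page
rows_per_leaf = PAGE // 1024     # leaf capacity with ~1KB rows

# Three levels: root -> internal -> leaf.
capacity = fanout * fanout * rows_per_leaf
print(fanout, capacity)  # 1170 21902400
```

Under these assumptions a single index page fans out to about 1170 children, so a three-level tree already covers roughly 22 million rows.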
Index design
When choosing a field to index, what data type do we want the field (the Key) to be?
From the B+ tree analysis above: since the space occupied by pointers is fixed, what determines how many entries fit in a node is the size of the Key.
- If the Key is of type int, it occupies 4 bytes regardless of its value;
- If the Key is of type char, then the specified length is occupied.
- If the length is less than 4, the space occupied is less than int.
- If the length is greater than 4, the space occupied is greater than int.
So when designing an index, the rule of thumb is to make the Key occupy as little storage space as possible.
The prefix index
1. Index selectivity
Index selectivity = cardinality / total records in the table, with range (1 / total records, 1].
Cardinality is the number of distinct values in an indexed column of a table.
MySQL estimates cardinality by sampling index pages, so the value is an estimate rather than an exact count; we only need its order of magnitude, not the exact value.
The more selective an index is, the more efficient the query, because a highly selective index lets MySQL filter out more rows during a lookup.
An index with selectivity 1 is a unique index; this is the best possible selectivity and gives the best performance.
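The formula is easy to compute on a hypothetical column sample; in SQL it would be roughly `select count(distinct name) / count(*)` over the table.

```python
# Hypothetical column values standing in for an indexed name column.
names = ["john", "mary", "john", "alice", "mary", "bob"]

cardinality = len(set(names))           # number of distinct values
selectivity = cardinality / len(names)  # cardinality / total records
print(cardinality, round(selectivity, 3))  # 4 0.667
```

Four distinct values over six rows gives a selectivity of about 0.667; a unique column would give exactly 1.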
2. Prefix index
The prefix index is an important index-optimization technique: it indexes only the leading characters of the target column. Shortening the indexed characters saves index space and improves index efficiency, but reduces the index's selectivity.
For BLOB, TEXT, or very long VARCHAR columns, you must use a prefix index, because MySQL does not allow indexing the full length of such columns.
For example, suppose you want to index a VARCHAR(20) field whose stored values are either 4 or 16 characters long. Indexing the full column means some keys occupy 4 characters while others occupy 16, making the index larger than it needs to be.
Instead, we can use a fixed-length prefix of each value as the key. When choosing the prefix length we want good selectivity (close to that of indexing the whole column), but the prefix should not be too long.
See the following example:
Suppose a table city has a field city_name, and we want to add a prefix index to it.
- Count the 10 most repeated values in the field and their repetition counts:
select count(*) as count, city_name from city group by city_name order by count desc limit 10;
- Look at the 10 most repeated 3-character prefixes of the field and their repetition counts:
select count(*) as count, left(city_name, 3) as prefix from city group by prefix order by count desc limit 10;
- Check whether the second result differs much from the first. If it does, extend the prefix by one character and check the first four characters:
select count(*) as count, left(city_name, 4) as prefix from city group by prefix order by count desc limit 10;
- Check whether the third result differs much from the second. If it does, extend the prefix again and check the first five characters:
select count(*) as count, left(city_name, 5) as prefix from city group by prefix order by count desc limit 10;
- Check whether the fourth result differs much from the third. If it does, check the first six characters:
select count(*) as count, left(city_name, 6) as prefix from city group by prefix order by count desc limit 10;
- And so on, until the results stop changing much, say at the sixth such query, which checked the first 7 characters. Then 7 can be used as the final prefix length, and the prefix index is created:
alter table city add key(city_name(7));
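The manual procedure above can be mirrored in Python: compare the selectivity of increasingly long prefixes against the full-column selectivity and stop when they are close. The city names here are hypothetical; on a real table you would run the GROUP BY queries shown above instead.

```python
# Hypothetical city names standing in for the city_name column.
cities = ["san jose", "san diego", "san antonio", "santa fe",
          "springfield", "sacramento", "san jose", "san diego"]

def selectivity(prefix_len):
    """Distinct prefixes / total rows, like left(city_name, n) above."""
    return len({c[:prefix_len] for c in cities}) / len(cities)

full = len(set(cities)) / len(cities)   # selectivity of the whole column
for n in range(3, 7):
    print(n, selectivity(n), "target:", full)
# Here selectivity(5) already equals the full-column selectivity, so a
# 5-character prefix would be enough for this (tiny) data set.
```

The loop makes the stopping rule explicit: extend the prefix one character at a time until its selectivity approaches that of the whole column.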
3. Advantages and disadvantages
Advantages: Smaller indexes, greatly saving index space and improving index efficiency.
Disadvantage:
- Reduce index selectivity.
- MySQL cannot use a prefix index for GROUP BY or ORDER BY, nor can it use one for a covering scan.
Clustered/non-clustered index
Clustered and non-clustered indexes are not separate index types but two ways of storing data; the details depend on how the storage engine implements them.
B-tree indexes can be classified into clustered indexes and non-clustered indexes according to different storage modes.
1. Clustered index
A clustered index builds a B+ tree from the table's primary key and stores the table's row data in the leaf nodes. In plain English, the table data is stored together with the index: finding the index means finding the data.
This storage nature also determines that each table can have only one clustered index.
The InnoDB engine uses clustered storage for the primary-key index. If no primary key is defined in the table, InnoDB selects a non-null unique column instead; if there is no such column either, it implicitly generates a 6-byte ROWID as the primary key for the clustered index.
In InnoDB engine, the table data file itself is an index structure organized according to B+ tree. The Key of the node in this tree is the primary Key of the table, and the Value of the leaf node saves the complete data record.
So it can be said that in the InnoDB engine, index files and data files are not separated, and the table data itself is the primary index.
2. Non-clustered index
A non-clustered index is also known as a secondary index or auxiliary index.
An index created on top of a clustered index is called a non-clustered index; in layman's terms, the data is stored separately from the index.
Composite indexes, prefix indexes, unique indexes, and primary key indexes in MyISAM engine are all non-clustered indexes.
A non-clustered index, like a clustered index, is constructed from a B+ tree, but instead of storing the row records of the table, the leaf nodes store primary keys or specified index fields.
- In the InnoDB engine, the leaves of B+ trees that are not primary key indexes store primary keys.
- In MyISAM, the primary key is stored in the leaf nodes of the B+ tree indexed by the primary key.
- In MyISAM, the leaf nodes of B+ trees that are not primary key indexes store the specified index fields.
3. Application scenarios
Suppose you now have a table user with a primary key on the id field and an index on the name field.
There are now two SQL statements:
- select * from user where id = 7;
- select * from user where name = 'jobs';
What are the search procedures for SQL1 and SQL2 in InnoDB and MyISAM respectively?
Use the InnoDB engine to find processes:
SQL1 is a clustered index lookup and SQL2 is a non-clustered index lookup.
SQL1: InnoDB's primary-key index is a clustered index that organizes primary keys into a B+ tree and stores the rows in the leaf nodes. Using the condition where id = 7, search the B+ tree of the primary-key index with the B+ tree search algorithm, find the corresponding leaf node, and read the row data there.
SQL2: in InnoDB, every index other than the primary-key index is non-clustered, and accessing row data through a non-clustered index requires a second lookup. Finding rows with the condition where name = 'jobs' takes two steps:
- Get the primary key for row data by ‘jobs’ in the B+ tree of the name index.
- Row data is accessed through the obtained primary key in the B+ tree of the primary key index.
SQL2's execution process is also known as going back to the table. Since such a query involves two separate lookups, its performance is lower than a pure clustered-index query; avoid going back to the table when a clustered index can serve the query.
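A toy model of InnoDB's two B+ trees makes the two-step lookup visible. Dicts stand in for the trees and all data is hypothetical.

```python
# Secondary index on name stores primary keys; the clustered index maps
# primary key -> full row.
clustered = {  # primary key -> row
    7: {"id": 7, "name": "jobs", "company": "Apple"},
    8: {"id": 8, "name": "gates", "company": "Microsoft"},
}
name_index = {"jobs": [7], "gates": [8]}  # name -> primary keys

def find_by_name(name):
    rows = []
    for pk in name_index.get(name, []):   # step 1: secondary index lookup
        rows.append(clustered[pk])        # step 2: back to the clustered index
    return rows

print(find_by_name("jobs"))  # [{'id': 7, 'name': 'jobs', 'company': 'Apple'}]
```

Each match found in the name index costs one more lookup in the clustered index, which is exactly the "back to the table" step.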
Use the MyISAM engine to find the flow:
Both SQL1 and SQL2 are non-clustered index lookups.
In the MyISAM engine, both the primary-key index and non-primary-key indexes are non-clustered, so the B+ tree node structure is exactly the same for both; the only difference is that the keys of the primary-key tree are primary keys, while the keys of a non-primary-key tree are the indexed non-primary-key fields.
In MyISAM, table data is stored independently, and the leaf node of every B+ tree, whether for the primary key or not, stores an address pointing to a row in the table data. Each index B+ tree is therefore completely independent and requires no access to any other tree (unlike InnoDB).
SQL1: use where id = 7 to search the B+ tree of the primary-key index, find the leaf node, take the row address stored there, and read the row from the table data file.
SQL2: use where name = 'jobs' to search the B+ tree of the name index, find the leaf node, take the row address stored there, and read the row from the table data file.
4. Advantages and disadvantages of clustered indexes
Advantage:
- A clustered-index lookup is faster than a non-clustered one, because it needs neither the second lookup that InnoDB's non-clustered indexes require nor the extra addressing step that MyISAM's indexes require.
- Because the row data of the clustered index is stored together with the leaf node, there can be multiple rows of data in the same data page. When you access data on different rows on a data page, all the data on the page has been loaded into the memory. When you access a row on the page again, the data is directly accessed in the memory without accessing disks, reducing I/O operations.
- In The InnoDB engine, if the data is organized by the primary key, the primary key is loaded into memory along with the row data, and the data can be returned immediately after the leaf node is found, which is faster to retrieve.
- The InnoDB engine uses primary-key values rather than addresses as "pointers" in non-clustered indexes, to reduce index maintenance when rows move or data pages split. In practice, row positions change as the table's data changes, which shows up as node splits in the B+ tree. Although a primary-key value takes more space than an address, the "pointer" in a non-clustered index never has to be updated when a row moves. Making the primary-key index clustered thus guarantees that however the primary-key B+ tree changes, the B+ trees of the non-primary-key indexes are unaffected.
Disadvantages:
- A table can have multiple non-clustered indexes but only one clustered index.
- Clustered indexes are expensive to maintain, because row movement causes fragmentation.
- If a UUID is used as the primary key, inserts are scattered because UUID values themselves are sparse, so a query through the clustered index may even be slower than a full table scan. It is therefore recommended to use an auto-increment ID as the primary key. The reason is that in a clustered index the physical storage order of the data matches the index order: adjacent index entries mean the corresponding rows are stored adjacently on disk. If the primary key is not an auto-increment ID, the storage engine constantly has to adjust the physical location of data and split pages; these operations can be reduced but not completely avoided. With an auto-increment ID, data only needs to be written page by page, the index structure stays compact, and there is less disk fragmentation and higher efficiency.
- In the InnoDB engine, a primary key that takes more space makes every non-clustered index take more physical space too, because the leaves of non-clustered indexes store the primary-key value.
5. Choosing between them
- Clustered indexes are suitable for sorting, but non-clustered indexes are not.
- When fetching a range of data, use a clustered index.
- You can keep related data together and then use clustered indexes to aggregate the data so that you only need to read a few data pages from disk to get all the data you need, reducing disk IO.
- MyISAM is recommended for sorting large amounts of data, full table scans, count operations, and so on, because its indexes have a small footprint and these operations need to be done in memory.
Index coverage
1. Definition
Also called a covering index.
Index coverage is in effect when the Extra field of the EXPLAIN output shows Using index.
In layman’s terms, index coverage is the ability to retrieve all the column data required by SQL from a single index tree without going back to the table.
The way to achieve index coverage is to create a joint index for the field to be queried.
Suppose you have a table user, using the InnoDB engine, with primary key id and a normal index on the name field.
Perform SQL1:
select id, name from user where name = 'Gates';
Hitting the name index, MySQL can take both id and name directly from the index tree, because the leaf of the name index stores the primary key id. Every column SQL1 asks for is available in the index, so there is no need to go back to the table. This satisfies index coverage and is efficient.
Perform SQL2:
select id, name, company from user where name = 'Gates';
Hitting the name index yields id and name, but the company field is not stored in the name index tree, so it must be fetched from the primary-key index using the obtained id value. This goes back to the table and is less efficient.
If the single-column index index(name) is upgraded to the composite index index(name, company), SQL2 satisfies index coverage and no longer needs to go back to the table.
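The difference can be sketched with dicts standing in for the two index shapes (hypothetical data): with index(name, company), every column SQL2 asks for is already in the index entry, while index(name) forces a trip back to the clustered index.

```python
# Clustered index: primary key -> full row (hypothetical data).
clustered = {7: {"id": 7, "name": "Gates", "company": "Microsoft"}}

single = {"Gates": [7]}                    # index(name): leaf holds the pk
composite = {"Gates": [(7, "Microsoft")]}  # index(name, company): pk + company

def sql2_with_single(name):
    # company is missing from the index, so each hit goes back to the table
    return [clustered[pk] for pk in single[name]]

def sql2_with_composite(name):
    # id, name and company all come from the index tree itself: covered
    return [{"id": pk, "name": name, "company": co} for pk, co in composite[name]]

print(sql2_with_composite("Gates"))  # no clustered-index access needed
```

Both functions return the same rows, but the covered version never touches the clustered index at all.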
2. Application scenarios
Which scenarios can use index coverage to optimize SQL?
2.1 Scenario 1
Scenario 1: Optimize the count query of the full table.
Suppose you have a user table, using the innoDB engine, and the primary key is ID.
Index coverage cannot be achieved when the following SQL is executed:
select count(name) from user;
After adding an index to the name field, index coverage can be achieved, improving query speed:
alter table user add key (name);
2.2 Scenario 2
Scenario 2: Column query back to table optimization
That’s the example in the definition.
2.3 Scenario 3
Scenario 3: Paging query
Suppose the table from the definition example runs a paginated query that goes back to the table:
select id, name, company from user order by name limit 500, 100;
If the single-column index index(name) is upgraded to the composite index index(name, company), going back to the table can be avoided.
Leftmost matching principle
1. Definition
Composite indexes follow the leftmost match rule.
Suppose we create a composite index index(a, b); its B+ tree looks like this:
As you can see, the values of field a are in order: 1,1,2,2,3,3; The order of the b field is based on the order of a, but is unordered as a whole: 1,2,1,4,1,2.
- The condition "where b = 2" cannot use the composite index: the whole index is sorted by a first, so there is no way to search on the b field alone.
- The condition "where a = 1 and b = 2" can use the composite index, because b is ordered within each fixed value of a.
- The condition "where a > 1 and b = 2" can use the composite index only for field a: the index is sorted by a, so the range condition on a can use it, but within that range the b values are unordered, so b cannot.
- The condition "where a > 1 and a < 3 and b > 1" behaves the same way: the range on a can use the index, but b is unordered within that range and cannot.
This yields the leftmost-match rule: starting from the leftmost field defined in the composite index, query conditions whose fields and order match the index definition can use the composite index in full. However, when range queries (>, <, between), fuzzy queries (like), or order by appear in the conditions, each case needs detailed analysis.
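The core of the rule can be condensed into a small helper: walk the index columns left to right; equality conditions extend the usable prefix, while a range condition is itself usable but stops everything after it. A sketch only (it ignores like, order by, and optimizer details), run on the cases from above:

```python
def usable_prefix(index_cols, eq_cols, range_cols):
    """Return the index columns a query can actually use, leftmost first."""
    used = []
    for col in index_cols:
        if col in eq_cols:
            used.append(col)   # equality keeps later columns ordered
        elif col in range_cols:
            used.append(col)   # a range column is the last usable one
            break
        else:
            break              # gap in the prefix: nothing further is usable
    return used

print(usable_prefix(["a", "b"], eq_cols={"a", "b"}, range_cols=set()))  # ['a', 'b']
print(usable_prefix(["a", "b"], eq_cols={"b"}, range_cols=set()))       # []
print(usable_prefix(["a", "b"], eq_cols={"b"}, range_cols={"a"}))       # ['a']
```

The three calls reproduce the three conditions analyzed above: a = 1 and b = 2 uses both columns, b = 2 alone uses nothing, and a > 1 and b = 2 uses only a.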
2. Application scenarios
2.1 Scenario 1
Suppose we have the following table user, using the InnoDB engine with primary key id, and a composite index on the name and company fields.
Which of the following four SQL statements will use the composite index?
// SQL1
select * from user where name = 'Gates' and company = 'Microsoft';
// SQL2
select * from user where name = 'Gates';
// SQL3
select * from user where company = 'Microsoft';
// SQL4
select * from user where company = 'Microsoft' and name = 'Gates';
SQL1 follows the leftmost principle: the conditions start from the leftmost defined field name, and the company field is also in the index definition, so the composite index is used.
SQL2 follows the leftmost principle: the condition starts from the leftmost defined field name, so the composite index is used.
SQL3 does not follow the leftmost principle: the conditions do not include the leftmost field name as a starting point, so the composite index cannot be used.
SQL4 does not follow the leftmost order as written, since its conditions do not start with the leftmost field name, but MySQL's query optimizer automatically reorders the conditions, so the composite index is still used.
2.2 Scenario 2
Suppose we again have the table user, using the InnoDB engine with primary key id, and a composite index on the name and company fields.
Which of the following four SQL statements will use the composite index?
// SQL1
select * from user where name like 'Ga%';
// SQL2
select * from user where name like '%tes';
// SQL3
select * from user where name like '%ate%';
// SQL4
select * from user where company like 'Mic%' and name like 'Ga%';
When the key is character data, the B+ tree orders strings as follows: compare the first characters first, and the string whose first character is smaller is smaller as a whole; if the first characters are equal, compare the second characters, and so on, until some character differs.
Therefore, an index whose key is character data is sorted by the prefix of the key. When fuzzy or range queries involve character data, we must not only check the leftmost principle but also judge whether the B+ tree can actually be searched.
SQL1 follows the leftmost principle: the condition starts from the leftmost defined field name, and the fuzzy condition provides the prefix 'Ga' of name, so the composite index is used.
SQL2 starts from the leftmost defined field name, but the fuzzy condition provides no prefix of name, only a suffix, so no node search in the B+ tree is possible and the composite index cannot be used.
SQL3 likewise starts from the leftmost defined field name, but the fuzzy condition provides only an infix, not a prefix, so the B+ tree cannot be searched and the composite index cannot be used.
SQL4 does not start with the leftmost field name as written, but the fuzzy conditions provide the prefixes 'Ga' for name and 'Mic' for company, and MySQL's query optimizer automatically reorders the conditions, so the composite index is used.
2.3 Scenario 3
Suppose a table has three fields a, b, and c, all of type int, using the InnoDB engine, with a composite index on (a, b, c).
Which of the following five SQL statements will use the composite index?
// SQL1
select * from t order by a,b,c limit 10;
// SQL2
select * from t order by b,c,a limit 10;
// SQL3
select * from t order by a limit 10;
// SQL4
select * from t order by a,b limit 10;
// SQL5
select * from t where a = 1 order by b,c limit 10;
Copy the code
In general, MySQL can only sort records by loading them into memory and applying a sorting algorithm there, such as quicksort or merge sort. If the result set is too large to sort in memory, it may temporarily use disk space to store intermediate results, returning the sorted result to the client once the sort completes.
This in-memory or on-disk sort is what MySQL calls a filesort. A filesort is slow, but if the ORDER BY clause uses indexed columns, the filesort step can sometimes be skipped entirely.
Since the B+ tree index is already sorted according to the rules above, data can be read from the index in order directly, followed by a back-to-table lookup to fetch any columns not contained in the index.
For the ORDER BY clause to use a composite index, the fields it lists must follow the leftmost-prefix order in which the index fields were defined.
SQL1 conforms to the leftmost-prefix principle: the ORDER BY clause starts from the leftmost field a and uses the index fields in their defined order a, b, c, so the composite index is used.
SQL2 does not conform: the ORDER BY clause does not start from the leftmost field a. Since the whole B+ tree is sorted by field a first, the composite index cannot be used, and the query optimizer cannot reorder ORDER BY fields the way it reorders AND conditions.
SQL3 conforms: the ORDER BY clause starts from the leftmost field a, so the composite index is used.
SQL4 conforms: the ORDER BY clause starts from the leftmost field a and uses the fields in their defined order a, b, so the composite index is used.
SQL5 conforms: the leftmost field a is fixed to a constant by the WHERE clause, so within that range b is ordered, and within each b value c is ordered. The b and c values can therefore be read in sorted order from the B+ tree, and the composite index is used.
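The filesort-versus-index distinction can be observed in any B-tree engine's plan output. A minimal sketch using Python's bundled sqlite3 module as an analogy (SQLite reports its sort step as "USE TEMP B-TREE FOR ORDER BY" rather than "Using filesort"; the table and index names are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INT, b INT, c INT)")
con.execute("CREATE INDEX idx_abc ON t (a, b, c)")

def plan(sql):
    # Column 3 of each EXPLAIN QUERY PLAN row is the readable plan detail.
    return [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

# SQL1: ORDER BY matches the index definition order, so rows can be read
# from idx_abc already sorted and no separate sort step is needed.
p1 = plan("SELECT * FROM t ORDER BY a, b, c LIMIT 10")
# SQL2: ORDER BY b, c, a does not match the index order, so the engine
# adds an explicit sort step (SQLite's analogue of MySQL's filesort).
p2 = plan("SELECT * FROM t ORDER BY b, c, a LIMIT 10")
print(p1)
print(p2)
```

In MySQL the same comparison shows up in EXPLAIN's Extra column: the first query avoids "Using filesort" while the second does not.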
Index condition pushdown
1. Definition
Index Condition Pushdown (ICP) is a query optimization, available since MySQL 5.6, that pushes parts of the WHERE condition down from the Server layer to the storage engine, so rows can be filtered using index columns before the table rows are read.
If the Extra column of the execution plan shows Using index condition, the SQL statement is using index condition pushdown.
In terms of the MySQL architecture, the two cases work as follows:
Without ICP, when querying through a secondary (non-primary-key) index, the storage engine retrieves rows via the index and passes them all to the Server layer, which then checks whether each row satisfies the query conditions.
With ICP, the Server pushes the applicable conditions down to the storage engine, which evaluates them against the index entries; only rows that match are read from the table and returned to the Server.
2. Application scenarios
Applicable scenarios for index push-down:
- It applies to the range, ref, eq_ref, and ref_or_null access methods when full table rows need to be accessed.
- It works with both the InnoDB and MyISAM engines.
- For InnoDB, ICP applies only to secondary (non-clustered) indexes.
- Conditions involving subqueries cannot be pushed down, triggered conditions cannot be pushed down, and conditions that invoke stored functions cannot be pushed down.
Suppose a user table uses the InnoDB engine and has fields id, name, and age, where id is the primary key and a composite index index(name, age) is created on the name and age fields.
Execute the following SQL:
select * from user where name like 'Ga%' and age = 20;
Copy the code
When ICP is not used, the age condition is ignored inside the index: the engine matches only on the name prefix, finds two entries in the composite index’s B+ tree (ids 5 and 7), and performs two back-to-table lookups by id; the Server then filters out the row whose age is not 20.
When ICP is used, the index internally checks whether age equals 20 and skips entries that do not match, so only the entry that also satisfies age = 20 is taken back to the table, and a single back-to-table lookup is needed instead of two.
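The two code paths can be modeled in a few lines of Python. This is a toy simulation of the engine/Server split, not MySQL internals; the index entries and ids follow the article's example:

```python
# Composite index (name, age) entries stored as (name, age, id),
# kept in index order; the row store is keyed by primary key id.
index_entries = [("Gabe", 35, 7), ("Gavin", 20, 5), ("Zoe", 20, 9)]
table = {
    5: {"id": 5, "name": "Gavin", "age": 20},
    7: {"id": 7, "name": "Gabe", "age": 35},
    9: {"id": 9, "name": "Zoe", "age": 20},
}

def run(use_icp):
    back_to_table = 0
    rows = []
    for name, age, row_id in index_entries:
        if not name.startswith("Ga"):          # index seek on the name prefix
            continue
        if use_icp and age != 20:              # ICP: age checked inside the "engine"
            continue
        row = table[row_id]                    # back-to-table lookup by primary key
        back_to_table += 1
        if not use_icp and row["age"] != 20:   # no ICP: "Server" filters age afterwards
            continue
        rows.append(row["id"])
    return rows, back_to_table

print(run(use_icp=False))  # ([5], 2) — two lookups, one row survives
print(run(use_icp=True))   # ([5], 1) — age filtered in the index, one lookup
```

Both paths return the same row; the difference is purely how many back-to-table lookups were paid for it.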
3. Advantages
Index condition pushdown reduces the number of times the storage engine must access the base table and the number of rows the Server layer receives from the storage engine.
In plain terms, ICP optimizes queries on secondary indexes by cutting down back-to-table lookups, which can significantly improve query efficiency.