I am the wind. This article will talk about how to use MySQL index well.
To better explain, I created a table user_InnoDB with storage engine as InnoDB and batch initialized 500W+ data. Contains the primary key id, the name field, the gender field (0,1 for different genders), the phone number field (phone), and creates a joint index for the name and phone fields.
CREATE TABLE `user_innodb` (
`id` int NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL.`gender` tinyint(1) DEFAULT NULL.`phone` varchar(11) DEFAULT NULL,
PRIMARY KEY (`id`),INDEX IDX_NAME_PHONE (name, phone)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Copy the code
1. The cost of indexing
Indexes can be very effective in improving query efficiency, so why not create an index for each field? I advise you to take it easy.
Every coin has two sides, index is no exception. Overusing indexes costs us both in space and time.
1.1 Spatial costs
An index is a B+ number. Each index requires a B+ tree. Each node of the B+ tree is a data page, and each data page occupies 16KB disk space by default. So create a lot of indexes and your disk space will be consumed quickly.
1.2 Cost in time
The cost of space is something you can do with money power, but the cost of time is something we can’t do.
Maintenance of linked lists
I take the primary key index as an example. The records in each node of the B+ tree of the primary key index are connected in a one-way linked list in ascending order of primary key values. As shown below:
If I now delete the record with primary key ID 1, I will break the records ordering in the three data pages and need to rearrange the records in the three data pages, as well as insert and modify operations.
Note: In fact, the delete operation does not immediately rearrange the records in the data page, but will mark the deleted records with a delete mark. When it is appropriate, the records will be removed from the linked list, but it always needs to involve the maintenance of the sorting, which is bound to consume performance.
If the table has 12 fields and we index all 12 fields in the table, we delete 1 record, which requires sorting maintenance of records in N data pages of 12 B+ trees.
To make matters worse, when you add, delete, or change records, you can trigger the recycling and fragmentation of data pages. Again, if I delete the record with id 13, then data page 124 is unnecessary and will be reclaimed by InnoDB storage engine. I insert a record with id 12. If there is not enough space on data page 32 to store the record, InnoDB needs to split the page again. We don’t need to know the details of page reclamation and page splitting, but you can imagine how complex this operation can be.
If you create indexes for every field, you can imagine the performance penalty of all this index maintenance.
Query plan
Before the query statement is executed, the MySQL query optimizer optimizes a query statement based on cost and generates an execution plan. If too many indexes are created, the optimizer calculates the search cost per index, resulting in too much time spent on analysis, which ultimately affects query execution efficiency.
2. The cost of returning tables
2.1 What is a back table
We can find the leaf nodes in the B+ tree through the secondary index, but the leaf nodes in the secondary index are not complete, only the value of the index column and the primary key value. We need to take the primary key value and then go to the leaf node of the clustered index (primary key index) to retrieve the complete user record, this process is called table back.
In the figure above, I take the name secondary index as an example, and only draw the leaf nodes of the secondary index and the leaf nodes of the cluster index, omitting the non-leaf nodes of the two B+ trees.
The three lines that extend from the leaf node of the secondary index represent back table operations.
2.2 Cost of returning to the table
The cost of searching the leaf node of the secondary index based on the name field is relatively small, for two reasons:
- The page where the leaf node is located is associated by bidirectional linked list, and the traversal speed is relatively fast.
- MySQL tries to keep the data pages of leaf nodes of the same index next to each other in disk space to avoid random I/OS.
However, there is no regularity in the arrangement of primary key IDS in the secondary index leaf node. After all, the name index sorts the name field. When performing back tables, it is highly likely that the record with the primary key ID will repeatedly hop across the cluster index leaf node (as indicated by the three lines in the back table above), i.e., random IO. This isn’t too bad if the target data page happens to be in memory, but if it isn’t, loading the contents of a data page (16KB) from disk into memory is too slow.
Now that I’ve talked about the cost of going back to the table, will I give you a more efficient way to search? No, back to the table has been a relatively efficient way to search, we need to do is to minimize the loss caused by back to the table operation, summed up in two points:
- Can not return to the table is not back;
- Reduce the number of times you must return to the table.
I’m going to introduce you to two important concepts related to back tables. The methods involved in these concepts are also part of the principles of indexing, so I’ll explain them to you first.
3. Index overwrite and index push-down
3.1 Index Coverage
If the leaves of a non-clustered index have all the data you want, you don’t need to return to the table. For example, I create a joint index for the name and phone fields, as shown below:
If we just want to search the name, phone, and primary key fields,
SELECT id.name, phone FROM user_innodb WHERE name = "Cicada Mu Wind";
Copy the code
You can get all the data directly from the leaf node, without going back to the table at all.
We call a query whose index already contains all the column data that needs to be read an overwrite index (or index overwrite).
3.2 Index push-down
3.2.1 concept
Mysql > select * from user where name = ‘mufeng’ and phone = ‘6606’; mysql > select * from user where name = ‘mufeng’;
SELECT * FROM user_innodb WHERE name = "Cicada Mu Wind" AND phone LIKE "% 6606";
Copy the code
Since the records of the leaf node of the joint index are sorted by the name field first, and then by the phone field if the name field is the same, it is impossible to use the orderliness of the index to make a quick comparison when adding % before the phone field. This means that only the name field in the query can be quickly compared and filtered using the index. The normal query process would look like this:
-
InnoDB uses the union index to find all secondary index data with the name of the index, and obtains 3 primary keys: 3485,78921, 1,423476;
-
Get the primary key index for back table, get these three complete user records in the cluster index;
-
InnoDB returns these three complete user records to MySQL’s Server layer, where it filters out users with the end number 6606.
As shown in the following two pictures, the first picture shows that InnoDB gets three complete user records through three times of table return and delivers them to the Server layer. The second picture shows that the Server layer finds the records that meet the search criteria after filtering by phone LIKE “%6606” and returns them to the client.
Of concern is that index use is done in the storage engine, while data record comparison is done in the Server layer. Now let’s take the above search to the extreme. If all 100,000 records in the table meet the condition name=’ mu wind ‘and only one of them meets the condition phone LIKE’ %6606 ‘, this means that InnoDB needs to send 99,999 invalid records to the Server layer for filtering. What’s more, the 99,999 entries were all searched through the table. You already know the cost of returning tables.
Now introduce index push-downs. Index Condition Pushdown (ICP) means that the filtering is done by the lower storage engine layer using indexes instead of pushing up to the Server layer for processing. ICP is a post-mysql5.6 feature.
In the first step, we found 3 records in the leaf node of the joint index by name = “mu wind”, and the phone field also happens to be in the leaf node of the joint index. At this time, the leaf node of the joint index can be directly traversed to screen out the record with the tail number of 6606 and find the record with the primary key value of 78921. Finally, only one time is required to perform table back operation to find the one record that meets all conditions and return it to the Server layer.
Obviously, using ICP can effectively reduce the number of times to return to the table.
In addition, ICP is enabled by default. For secondary indexes, as long as the conditions can be thrown to the storage engine below, the storage engine will filter without our intervention.
3.2.2 demo
Check the current ICP status:
SHOW VARIABLES LIKE 'optimizer_switch';
Copy the code
Execute the following SQL statement and view the execution plan Using Index condition Using EXPLAIN
EXPLAIN SELECT * FROM user_innodb WHERE name = "Cicada Mu Wind" AND phone LIKE "% 6606";
Copy the code
And then turn off the ICP
SET optimizer_switch="index_condition_pushdown=off";
Copy the code
Check the status of ICP again
Execute the query again and look at the execution plan Using WHERE Using EXPLAIN
EXPLAIN SELECT * FROM user_innodb WHERE name = "Cicada Mu Wind" AND phone LIKE "% 6606";
Copy the code
Note: Even if the criteria for index push-down are met, the query optimizer may not use index push-down because there may be a more efficient way.
Because I created an index for the name field before, I did not use the index push down. The EXPLAIN statement shows that the name index is used instead of the joint index of name and phone. The effect of the above screenshot was obtained after the name index was deleted. You have to be careful when you do experiments.
By now you should be aware of the performance issues associated with indexing and back tables, and this is not to scare you away from indexing, but rather to embrace indexing in the right way to minimize its negative impact and maximize its benefits. How to use the index well, consider from two aspects:
- Efficient use of indexes that have been created (avoid index invalidation)
- Create appropriate indexes for appropriate columns (index creation principles)
4. When does an index fail?
4.1 Violation of left-most prefix rule
Taking the joint index created at the beginning of this paper as an example, the records in the B+ tree data page of the joint index are sorted according to the name field first, and then sorted according to the phone field if the name field is the same.
Therefore, if we search directly with the phone field, we cannot take advantage of the orderliness of the index.
EXPLAIN SELECT * FROM user_innodb WHERE phone = "13203398311";
Copy the code
EXPLAIN allows you to view the execution plan of the search statement. The possible_keys column indicates possible indexes in the current query. The key column indicates which indexes are actually used.
But once we add name to the search criteria, we use the federated index, and we don’t care WHERE name is in the WHERE clause because the query optimizer does the optimization for us.
EXPLAIN SELECT * FROM user_innodb WHERE phone = "13203398311" AND name = 'Cicada Mu Wind';
Copy the code
4.2 Using reverse Query (! =, < >, NOT LIKE)
MySQL is using reverse query (! =, <>, NOT LIKE), will cause a full table scan, except overwrite index.
EXPLAIN SELECT * FROM user_innodb WHERE name! ='Cicada Mu Wind';
Copy the code
4.3 LIKE Starts with a wildcard character
Mysql > select * from B+ tree where name = ‘% mu ‘or name = ‘% mu’; mysql > select * from B+ tree where name = ‘% mu ‘; mysql > select * from B+ tree where name = ‘% mu ‘ The query can be performed only in full table scan mode.
EXPLAIN SELECT * FROM user_innodb WHERE name LIKE Mphone '%';
Copy the code
But using wildcard endings is fine
EXPLAIN SELECT * FROM user_innodb WHERE name LIKE 'cicada mu %';
Copy the code
4.4 Performing any operation on an index Column
If you do something other than just use the index column, such as numeric evaluation, use of functions, or (manual or automatic) type conversion, the index will be invalidated.
4.4.1 Using Functions
EXPLAIN SELECT * FROM user_innodb WHERE LEFT(name.3) = 'Cicada Mu Wind';
Copy the code
MySQL8.0 added the function index function. You can create an index to the result of a function using the following statement
ALTER TABLE user_innodb ADD KEY IDX_NAME_LEFT ((left(name.3)));
Copy the code
Execute the EXPLAIN statement again, at which point the index takes effect
4.4.2 Using expressions
EXPLAIN SELECT * FROM user_innodb WHERE id + 1 = 1100000;
Copy the code
Another way to use indexes efficiently is to use ids alone:
EXPLAIN SELECT * FROM user_innodb WHERE id = 1100000 - 1;
Copy the code
4.4.3 Using type conversion
Case 1
The phone field in user_InnoDB is of type VARCHAR. Before the experiment, we create an index for the phone field
ALTER TABLE user_innodb ADD INDEX IDX_PHONE (phone);
Copy the code
Search any existing phone number to see if the index is successful
EXPLAIN SELECT * FROM user_innodb WHERE phone = '13203398311';
Copy the code
Now let’s change it a little bit: phone = ‘13203398311’ to phone = 13203398311. This means that we have changed the string search criteria to an integer search criteria. Let’s see if the index is still used:
EXPLAIN SELECT * FROM user_innodb WHERE phone = 13203398311;
Copy the code
Display index invalid.
Case 2
Let’s look at another example where the primary key ID is of type Bigint, but I expect to use string type in the search criteria:
EXPLAIN SELECT * FROM user_innodb WHERE id = '1099999';
Copy the code
conclusion
To summarize the problem a little bit, when the index field type is string, searching with numeric type does not use the index; When the index field type is numeric, a string search uses the index.
To understand this, we need to know what MySQL’s data type conversion rules are. Does MySQL automatically convert numbers to strings or does MySQL automatically convert strings to numbers?
MySQL > SELECT * from ’10’ where ’10’ > 9;
- If the value is 1, MySQL automatically converts the string type to a number
SELECT 10 > 9
; - If the value is 0, MySQL automatically converts numbers to strings
SELECT '10' > '9'
.
mysql> SELECT '10' > 9; + -- -- -- -- -- -- -- -- -- -- + | | '10' > 9 + -- -- -- -- -- -- -- -- -- -- + | 1 | + -- -- -- -- -- -- -- -- -- -- + 1 row in the set (0.00 SEC)Copy the code
MySQL automatically converts a string to a number when it encounters a type conversion. Therefore, for example 1:
EXPLAIN SELECT * FROM user_innodb WHERE phone = 13203398311;
Copy the code
Is equivalent to
EXPLAIN SELECT * FROM user_innodb WHERE CAST(phone AS signed int) = 13203398311;
Copy the code
That is, using a function on an index field. As described above, using a function on an index does not use an index.
For example 2:
EXPLAIN SELECT * FROM user_innodb WHERE id = '1099999';
Copy the code
Is equivalent to
EXPLAIN SELECT * FROM user_innodb WHERE id = CAST('1099999' AS unsigned int);
Copy the code
No operations are added to the index field, so the index can be used.
4.5 the OR connection
Queries that use OR joins do not use indexes if the condition column before OR is an index column, but the condition column after OR is not an index column. For example:
EXPLAIN SELECT * FROM user_innodb WHERE id = 1099999 OR gender = 0;
Copy the code
The above summary summarizes some index failure scenarios, these lessons are often useful for SQL optimization, but it is important to note that these lessons are not a golden rule.
For example, when using <> queries, there are times when indexes can be used:
EXPLAIN SELECT * FROM user_innodb WHERE id <> 1099999;
Copy the code
In the end, it is up to the MySQL Optimizer to decide whether to use an index or not, and the Optimizer is based on the Cost Base Optimizer. The Optimizer is not based on specific rules, it is not based on semantics, and it simply executes a low-cost solution. As a result, you can see possible_keys column in the EXPLAIN result, where the optimizer will try out the possible_keys column once. , and then select the least overhead, if not too much line, then directly scan the full table.
The cost depends on the database version, the amount of data, etc., so if you want to improve index functionality more precisely, embrace EXPLAIN!
5. Index creation (use) principles
Index overwrite and index push-down can be used as the principle of index creation, which is to make the best use of index overwrite and index push-down when creating an index.
Try to avoid the above mentioned index failure situation, also the index use principle.
In addition, let me introduce some more to you.
5.1 Do not create indexes for highly discrete columns
COUNT(DISTINCT(column_name))/COUNT(*), the ratio of the number of DISTINCT columns to all data rows. In short, the more duplicate values a column has, the less discrete the column is. The fewer repetitions, the higher the dispersion.
For example, if we create an index for the Gender column, which has only 0 and 1 values, we would find a lot of duplicate data in the secondary index and do a lot of back-table operations. A lot of back table huh? You get the idea.
Do not create indexes for columns with many duplicate values
5.2 Create indexes only for columns used for searching, sorting, or grouping
We create indexes only for columns that appear in the WHERE clause or in the ORDER BY and GROUP BY clauses. Columns that appear only in the query list do not need to be indexed.
5.3 Make good use of federated indexes
Use two SQL statements to illustrate the problem:
1. SELECT * FROM user_innodb WHERE name = 'Cicada Mu Wind' AND phone = '13203398311';
Copy the code
2. SELECT * FROM user_innodb WHERE name = 'Cicada Mu Wind';
Copy the code
Both statements 1 and 2 can use indexes, which brings us to an index design principle:
Do not create a separate index for the first index column of a federated index
Because the union index itself sorts by the name column first, the union index searches for name effectively and does not need to create a separate index for name. And that’s why
When creating a federated index, always place the most frequently used column at the far left
5.4 Creating prefix Indexes for Fields that are Too long
If a column in string format takes up a lot of space (that is, long string data is allowed to be stored), creating an index for that column means that the data for that column will be fully recorded in every record on every data page and will take up a significant amount of storage space.
To do this, we can create an index for the first few characters of the column, which means that only the first few characters of the string are retained in the records of the secondary index. For example, we could create an index for the phone column that only keeps the first 3 digits of the phone number:
ALTER TABLE user_innodb ADD INDEX IDX_PHONE_3 (phone(3));
Copy the code
Then execute the following SQL statement:
EXPLAIN SELECT * FROM user_innodb WHERE phone = '1320';
Copy the code
Since only the first three digits of the mobile phone number are reserved in the IDX_PHONE_3 index, we can only locate the secondary index records starting with 132, and then determine whether they meet the condition that the fourth digit is 0 when traversing all the secondary index records.
If a column contains a large number of characters, creating a prefix index for the column can effectively save disk space
5.5 Values that are frequently updated. Do not use them as primary keys or indexes
This is because of possible data page splitting situations that can affect performance.
5.6 Random and out-of-order values, such as ID and UUID, are not recommended as indexes
MySQL > select * from primary key; MySQL > select * from primary key;
6. Recommended reading
- MySQL primary key query why is so fast
- This time, understand MySQL indexes thoroughly
I am Cicada Mu wind, if you think this article is well written, please like and comment, your support means a lot to me! See you next time!