Students often ask me: "My SQL statement uses an index, so why does it still end up in the slow query log?" Today we'll start from this question and talk about indexes and slow queries.
As an aside, in my opinion a team should use an ORM judiciously, weighing its trade-offs. An ORM has clear advantages for object-oriented code and for write operations, and it helps you avoid pitfalls in join queries (unless, of course, your Linq query skills are strong enough that this is a non-issue). But an ORM also hides a great deal of low-level database knowledge, which is not a good thing for programmers; a team that pursues ultimate performance but does not understand its ORM thoroughly should be all the more careful.
Case analysis
Without further ado, let's experiment. I created the following table:
CREATE TABLE `t` (
  `id` int(11) NOT NULL,
  `a` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `a` (`a`)
) ENGINE=InnoDB;
This table has two fields: id is the primary key, and a carries an ordinary index.
First, MySQL decides whether a statement is a slow query by its execution time: it compares the execution time against the system parameter long_query_time, and if the statement runs longer than long_query_time it is written to the slow query log. The default value of this parameter is 10 seconds. Of course, we would not set it that large in production; 1 second is typical, and for some latency-sensitive businesses the threshold may even be set below 1 second.
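As a quick reference, here is a minimal sketch (assuming a stock MySQL server) of how to inspect and adjust these settings; note that SET GLOBAL long_query_time only takes effect for new connections:
-- Check the current threshold (defaults to 10 seconds) and whether slow query logging is on.
SHOW VARIABLES LIKE 'long_query_time';
SHOW VARIABLES LIKE 'slow_query_log%';
-- A production-style setting: log anything slower than 1 second.
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1;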
Whether an index was used during statement execution can be seen from the explain output of the statement: if an index is used, the key value is not NULL.
Let's look at a few examples:
explain select * from t; the key value is NULL.
explain select * from t where id=2; the key value is PRIMARY, meaning the primary key index is used.
explain select a from t; the key value is a, meaning index a is used.
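Putting the three together as runnable SQL, a minimal sketch (the exact explain columns vary by MySQL version, but the key column is the part we care about):
EXPLAIN SELECT * FROM t;              -- key is NULL: no index is chosen
EXPLAIN SELECT * FROM t WHERE id = 2; -- key is PRIMARY: point lookup on the primary key
EXPLAIN SELECT a FROM t;              -- key is a: the statement walks index a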
Although neither of the latter two queries has a NULL key, the last one actually scans the entire index tree a.
Suppose the table has 1 million rows. The second statement, the point lookup by id, still executes quickly, but the third, the full scan of index a, is definitely slow. In more extreme cases, for example when the CPU pressure on the database is very high, even the second statement may take longer than long_query_time and land in the slow query log.
So we can conclude that whether a statement uses an index and whether it becomes a slow query are not directly related. Using an index only describes how the statement is executed; whether it ends up in the slow query log is determined by its execution time, which can be affected by all kinds of external factors. In other words, statements that use indexes can still be slow.
Insufficiency of full index scan
Looking at the problem one level deeper, there is a hidden question that needs to be clarified: what does "using an index" actually mean?
As we all know, InnoDB uses index-organized tables: all data is stored in index trees. For example, table t above contains two indexes, a primary key index and an ordinary index. In InnoDB, the row data lives in the primary key index.
If you run select * from t where id>0, do you think this statement uses an index?
The explain output of this statement shows PRIMARY in the key column. But looking at the data, the statement clearly has to scan every row. The optimizer nevertheless considers that execution starts from the primary key index, locating the first value that satisfies id>0, so this too counts as using the index.
So even when the key field in the explain output is not NULL, the statement may in fact amount to a full table scan. Strictly speaking, there is only one case in InnoDB that uses no index at all: scanning the whole primary key index tree rightward from its leftmost leaf.
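You can verify this yourself; a quick sketch of the check (the exact type and rows estimates depend on your data):
-- key shows PRIMARY, yet with id > 0 matching every row the statement
-- still walks essentially the whole primary key tree.
EXPLAIN SELECT * FROM t WHERE id > 0;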
That said, "not using an index" is not a very precise description.
You can say a query does a full table scan, meaning it traverses the whole primary key index tree;
you can say a query such as select a from t does a full index scan, meaning it traverses an entire ordinary index tree;
and for a query like select * from t where id=2 we say it uses the index in the meaningful sense: it exploits the index's fast-lookup capability and effectively reduces the number of rows scanned.
The index should be sufficiently filterable
Based on the above analysis, we know that a full index scan can make a query slow, so let's talk about how well an index filters.
Consider the statement: select * from t_people where age between 10 and 15
Seeing this statement, your first reaction is surely to add an index on the age field, otherwise it will be a full table scan. But you will find that the statement is still slow after the index is added, because there may be more than 100 million rows of data satisfying the condition.
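For concreteness, a sketch of that first step, assuming t_people has an integer age column (the index name idx_age is only an illustrative choice):
-- Add an ordinary index on age, then run the query against it.
ALTER TABLE t_people ADD INDEX idx_age (age);
SELECT * FROM t_people WHERE age BETWEEN 10 AND 15;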
The execution flow of this statement looks like this:
From index age, find the first record whose age equals 10 and obtain its primary key id; using that id, fetch the entire row from the primary key index and return it as part of the result set;
Scan rightward on index age, take the id of the next record, fetch the entire row from the primary key index, and return it as part of the result set;
Repeat the above steps until you hit the first record whose age is greater than 15;
Looking at this statement, even though it uses an index, it scans over 100 million rows. So now you know: when we talk about indexes, what we really care about is the number of rows scanned.
For a large table, not only must there be an index, but the index must also filter well enough.
In this example, age alone does not filter well enough. When we design a table structure, we want the filtering to be good enough, that is, the indexed columns should be sufficiently discriminating.
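One rough way to gauge how discriminating a column is, sketched below (a ratio close to 1 means high selectivity, close to 0 means low; note that the COUNT scan is itself expensive on a very large table, and SHOW INDEX reports the cardinality the optimizer tracks):
SELECT COUNT(DISTINCT age) / COUNT(*) AS age_selectivity FROM t_people;
SHOW INDEX FROM t_people;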
The cost of returning to the table
So if the filtering is good, does that guarantee the number of scanned rows is small?
Let’s look at another example:
select * from t_people where name='zhang3' and age=8
Suppose the t_people table has a joint index on (name, age). The filtering power of this joint index should be quite good: we can quickly locate in it the first record whose name is 'zhang3' and whose age is 8. Of course there won't be many such children, so only a few rows need to be scanned to the right, and the query is efficient.
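A sketch of the setup this example assumes (idx_name_age is an illustrative index name):
ALTER TABLE t_people ADD INDEX idx_name_age (name, age);
SELECT * FROM t_people WHERE name = 'zhang3' AND age = 8;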
But the query's filter conditions and the index do not always match up. Suppose you now need to find all children who are 8 years old and whose name begins with 'zhang'. The statement would be: select * from t_people where name like 'zhang%' and age=8;
In MySQL 5.5 and earlier versions, this statement executes like this:
First, from the joint index, locate the first record whose name begins with 'zhang', take its primary key id, then go to the primary key index tree and fetch the whole row by that id;
Check whether the age field equals 8; if it does, return the row as part of the result set, otherwise discard it;
Keep traversing rightward over the joint index, repeating the table lookup and the age check, until you reach a record on the joint index tree whose name no longer begins with 'zhang'.
We call the action of fetching an entire row from the primary key index by id "going back to the table" (a table lookup). As you can see, the most time-consuming step in this process is going back to the table. Suppose there are 80 million people in the country whose name begins with 'zhang': this process then goes back to the table 80 million times. When locating the first record, only the left-most prefix of the joint index can be used; this is called the left-most prefix principle.
Looking at this execution process, it goes back to the table a great many times and its performance is not good enough. Is there a way to optimize it?
MySQL 5.6 introduced the index condition pushdown (ICP) optimization. Let's look at the optimized execution flow:
First, on the joint index tree, locate the first record whose name begins with 'zhang'. Check, directly on the index record, whether its age value is 8: if it is, go back to the table, fetch the whole row, and return it as part of the result set; if not, discard it;
Keep traversing rightward in the joint index tree, checking the age field on the index record and going back to the table only when needed, until you reach a record whose name no longer begins with 'zhang';
Compared with the earlier flow, the difference is that while traversing the joint index, the condition age=8 is pushed down into the traversal itself, which reduces the number of times we go back to the table. Suppose 80 million people have names beginning with 'zhang' and 1 million of them are 8-year-old children: the query still traverses the joint index 80 million times, but it only needs to go back to the table 1 million times.
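You can check whether ICP kicks in, and compare it against the pre-5.6 behaviour, with something like the sketch below (when ICP is used, the Extra column of explain typically shows "Using index condition"):
EXPLAIN SELECT * FROM t_people WHERE name LIKE 'zhang%' AND age = 8;
-- ICP can be toggled at the session level for comparison.
SET optimizer_switch = 'index_condition_pushdown=off';
SET optimizer_switch = 'index_condition_pushdown=on';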
Virtual columns
You can see that this optimization is still pretty good, but it still doesn’t bypass the left-most prefix rule, so you still have to scan 80 million rows in the joint index. Is there a further optimization method?
What we really want is a joint index on (the first character of name, age). This can be done with the generated (virtual) columns introduced in MySQL 5.7. The statement to alter the table structure is:
alter table t_people add name_first varchar(2) generated always as (left(name,1)) virtual, add index(name_first,age);
Let's look at the table definition after this statement takes effect:
CREATE TABLE `t_people` (
  `id` int(11) DEFAULT NULL,
  `name` varchar(20) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  `name_first` varchar(2) GENERATED ALWAYS AS (left(`name`,1)) VIRTUAL,
  KEY `name_first` (`name_first`,`age`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
This creates a joint index on name_first and age. The value of the virtual column is always the first character of name; you cannot specify its value when inserting data, nor change it in an update. It is generated automatically from its definition, and it is updated automatically whenever the name field is modified.
select * from t_people where name_first='zhang' and age=8
In this case, only 1 million rows of the joint index need to be scanned, with 1 million table lookups. The essence of this optimization is that we create a more compact index to speed up the query.
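A quick way to confirm the new index is picked up, as a sketch (the explain output is expected to show key = name_first):
EXPLAIN SELECT * FROM t_people WHERE name_first = 'zhang' AND age = 8;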
Conclusion
This article introduced the basic structure of indexes and some basic ideas of query optimization. You now know that statements that use an index can still be slow queries, and that query optimization is very often the process of reducing the number of rows scanned.
The causes of slow queries can be summed up as follows:
- Full table scans
- Full index scans
- Insufficient index filtering
- The cost of frequent table lookups
Thinking
What if the business requirement really is to count, among 1.4 billion people, those between the ages of 10 and 15, with no additional filter factor to add? (select * from t_people where age between 10 and 15)
Assume the statistics must be computed on the OLTP database and presented in real time.
Author: Zhang Feihong
cnblogs.com/jackyfei/p/12122767.html