Interviewer: What about index failures? What is the cause of the failure?

Hello, I’m Xiao Lin.

At work, if we want to speed up a statement query, we usually want to index a field.

But indexes are not a panacea. The creation of an index does not mean that any query can go through the index scan.

A little attention, you may write a query statement will lead to index failure, so that the full table scan, although the query result is no problem, but the performance of the query is greatly reduced.

Today, I’ll tell you about six common index failures.

We will not only illustrate with experimental cases, but also make clear the reason for each index failure.

Start!

What does the index storage structure look like?

Let’s take a look at what the index storage structure looks like, okay? Only by knowing the storage structure of the index can we better understand the index failure problem.

The index structure depends on which storage engine MySQL uses, because storage engines persist data to disk, and different storage engines use different index data structures.

MySQL’s default storage engine is InnoDB, which uses B+Tree as its index data structure. For details, see this article: Why MySQL likes B+Tree?

When creating a table, InnoDB storage engine creates a primary key index (the clustered index) by default, and all other indexes are secondary indexes.

MySQL’s MyISAM storage engine supports a variety of index data structures, such as B+ tree index, R tree index, and full-text index. When the MyISAM storage engine creates a table, the primary key index created by default uses B+ tree index.

Although InnoDB and MyISAM both support B+ tree indexes, their data storage structures are implemented differently. The differences are:

InnoDB storage engine: the leaf node of the B+ tree index stores the data itself;
MyISAM storage engine: the physical address of the data stored in the leaf node of the B+ tree index;

Next, I will give you an example to show the difference between the index storage structure of the two storage engines.

Here is a T_USER table where the ID field is the primary key index and everything else is plain.

If the MyISAM storage engine is used, the leaf node of the B+ tree index holds the physical address of the data, i.e. the pointer to the user data, as shown below:

If InnoDB storage engine is used, the leaf node of the B+ tree index stores the data itself, as shown below:

InnoDB storage engine is divided into clustered indexes (shown above) and secondary indexes according to index type. The difference is that the leaf node of the clustered index stores the actual data, and all the complete user data is stored in the leaf node of the clustered index, while the leaf node of the secondary index stores the primary key, not the actual data.

If the name field is set to a normal index, the secondary index looks like this, with the leaf node holding only primary keys.

Now that we know the storage structure of the clustered index and secondary index of InnoDB storage engine, we will explain how to choose which index type to use in the query process.

When we use the “primary key index” field as the conditional query, if the data to be queried are all in the leaf node of the “cluster index”, the corresponding leaf node will be retrieved in the B+ tree of the “cluster index”, and then the data to be queried will be directly read. Such as the following statement:

Select * from t_user where id=1;Copy the code

When we use the “secondary index” field as a conditional query, we need to retrieve two B+ trees if all the data to be queried are in the “clustered index” leaf node:

First, find the corresponding leaf node in the B+ tree of the “secondary index” and obtain the primary key value.
Then use the primary key value obtained in the previous step to retrieve the corresponding leaf node in the B+ tree of “Cluster index”, and then obtain the data to be queried.

The above process is called back to the table, as in the following statement:

Select * from t_user where name=" * ";Copy the code

When we use the “secondary index” field as a conditional query, if the data to be queried is in the leaf node of the “secondary index”, we only need to find the corresponding leaf node in the B+ tree of the “secondary index” and then read the data to be queried, which is called the overwriting index. Such as the following statement:

Select id from t_user where name=" Lin ";Copy the code

The conditions of these queries all use index columns, so indexes are used in the query process.

However, it does not mean that just because a query condition uses the index column, the query process must use the index. Let’s take a look at the circumstances that cause the index to be implemented and the full table scan to occur.

First of all, for the following experimental example, I used MySQL version 8.0.26.

Use left or right fuzzy matching for indexes

When we use left or right fuzzy matching, that is, like %xx or like %xx%, both methods will cause index invalidation.

Select * from ‘Lin’ where ‘Lin’ = ‘Lin’; select * from ‘Lin’ where ‘Lin’ = ‘Lin’;

Select * from t_user where name like '% Lin ';Copy the code

Type =range; key=index_name;

Select * from t_user where name like 'Lin %';Copy the code

Why can’t the left or right fuzzy match of the like keyword go through the index?

Because index B+ trees are stored in order by “index value”, comparisons can only be made by prefix.

For example, the following secondary index chart is stored in order of the name field.

SQL > select * from ‘Lin %’ where name =’ Lin %’;

Comparison of the first node query: The pinyin size of the word Lin is larger than the word Chen in the first index value of the first node, but smaller than the word Zhou in the second index value of the first node, so node 2 is selected to continue the query.
Query comparison of node 2: The pinyin size of Chen character in the first index value of node 2 is smaller than that of Lin character, so we continue to look at the next index value, and find that node 2 has an index value matching the prefix of Lin character, so we search for leaf node, namely leaf node 4.
Node 4 query comparison: the first index of node 4 has a prefix that matches the forest word, so it reads the row and continues to match to the right until no index with a prefix of forest is matched.

If name like ‘% Lin ‘is used, the result of the query may be “Chen Lin, Zhang Lin, Zhou Lin”, etc., so we do not know which index value to start the comparison, so we can only use the full table scan method to query.

To learn more about InnoDB’s B+ tree query process, you can read my article: What are the nodes in the B+ tree? What is the process of querying data?

Use functions for indexes

Sometimes we use MySQL’s own function to get the result we want. If we use a function on an index field, it will invalidate the index.

Select * from table where type=ALL; select * from table where type=ALL;

Select * from t_user where length(name)=6;Copy the code

Why can’t I use a function on an index?

Because the index stores the original value of the index field, rather than the calculated value of the function, there is no way to use the index.

However, since MySQL 8.0, the index feature has added a functional index, which means that an index can be created on the evaluated value of a function, meaning that the value of the index is the evaluated value of the function, so that the data can be queried by scanning the index.

For example, I create an index named idx_name_length on the result of length(name).

alter table t_user add key idx_name_length ((length(name)));
Copy the code

Then I use the following query, which will go to the index.

Perform an expression evaluation on an index

In the query condition of the index expression calculation, also cannot go to the index.

SQL > select * from table where type = ALL;

explain select * from t_user where id + 1 = 10;
Copy the code

Select * from table_name where id = 10; select * from table_name where id = 10;

Why can’t an index be evaluated by an expression?

The reason is similar to using a function on an index.

Because the index stores the original value of the index field, rather than the value calculated by the ID + 1 expression, it cannot go to the index. Instead, it can only extract the value of the index field, and then calculate the expression in turn to determine the condition. Therefore, it adopts the mode of full table scan.

For example, if id + 1 = 10 is changed to ID = 10-1, it is possible to perform a simple index scan.

Yes, it can be implemented, but MySQL still stole the lazy and didn’t implement it.

My idea, perhaps also because there are so many different expressions to evaluate, is that the code can become bloated if you have to consider each of them, so just tell the programmer about the index failure scenario and make sure that the programmer doesn’t evaluate the index in the query.

Implicitly cast an index

If the index field is a string, but the input parameter is an integer in a conditional query, you will find that this statement will perform a full table scan at the end of the execution plan.

I added a phone field to the original T_user table, which is a secondary index and of type VARCHar.

Then I use an integer as input parameter in a conditional query. In this case, type = ALL in the execution plan is used to query data through a full table scan.

select * from t_user where phone = 1300000001;
Copy the code

However, if the index field is an integer, the input parameter in the query condition, even if it is a string, does not invalidate the index and can still be scanned through the index.

Let’s look at the second example, where id is an integer, but the following statement still does the index scan.

 explain select * from t_user where id = '1';
Copy the code

Why does the first example cause index invalidation, but not the second?

In order to understand this reason, we need to know what are the data type conversion rules of MySQL? This is to see if MySQL converts strings to numbers or numbers to strings.

Select * from mysql45 where ’10’ > 9; select * from mysql45 where ’10’ > 9

If the rule is that MySQL will automatically convert “string” to “number”, this is equivalent to select 10 > 9, this is a number comparison, so the result should be 1;
If the rule is that MySQL will automatically convert “numbers” to “strings”, this is equivalent to select “10” > “9”, this is a string comparison, string comparison size is a bit-by-bit from high to low (ASCII), then “10” string is equivalent to “1 “and” 0″ character combination, So first, the “1” character is compared to the “9” character. Since the “1” character is smaller than the “9” character, the result should be 0.

In MySQL, the result is as follows:

The result above is 1, indicating that MySQL automatically converts strings to numbers and then compares them.

SQL > select * from table_name; select * from table_name;

Select * from t_user where phone = 1300000001;Copy the code

MySQL automatically converts the string to a number because phone is a string, so this is equivalent to:

select * from t_user where CAST(phone AS signed int) = 1300000001;
Copy the code

The CAST function applies to the phone field, and the phone field is the index. As mentioned earlier, using a function on an index will invalidate the index.

The query in example 2, which I told you would go through the index scan:

Select * from t_user where id = "1";Copy the code

Since the string part is the input parameter, we need to convert the string to a number, so this statement equals:

select * from t_user where id = CAST("1" AS signed int);
Copy the code

As you can see, the index field does not use any function. The CAST function is used in the input parameter, so it can be scanned through the index.

The union index is not a leftmost match

An index on a primary key field is called a clustered index, and an index on a normal field is called a secondary index.

Then an index created by combining common fields is called a joint index, also known as a composite index.

When creating federated indexes, we need to pay attention to the order in which they are created, because the federated indexes (a, b, c) and (c, b, a) can be used differently.

The correct use of a federated index must follow the left-most matching principle, that is, the index is matched in the left-most first way.

For example, if you create a (a, B, c) federated index, you can match the federated index if the query criteria are:

Where a = 1;
Where a=1 and b=2 and c=3;
Where a=1 and b=2;

Note that the order of the A fields in the WHERE clause is not important because of the query optimizer.

However, if the query condition is any of the following, the union index cannot be matched because it does not comply with the leftmost matching principle, and the union index will be invalid:

Where b = 2;
Where c = 3;
Where b=2 and c=3;

Select * from a where a = 1 and c = 3 where a = 1 and c = 3

This is actually strictly index truncation and is handled differently in different versions.

MySQL 5.5: select * from primary key; select * from primary key; select * from primary key;

From MySQL5.6, there is an index push down function, can be in the index traversal process, the index contained in the first judgment, directly filter out the records that do not meet the conditions, reduce the number of times back to the table.

Truncated fields are pushed down to the storage engine layer for conditional judgment (because the value of the C field is in the (A, B, C) joint index), then filtered out and returned to the Server layer. Because a large amount of data is filtered out in the engine layer, there is no need to go back to the table to read data for judgment, reducing the number of times back to the table, thus improving performance.

For example, where a= 1 and c = 0, we can use index push-down from the Extra=Using index condition in the execution plan.

Why does a federated index fail if it does not follow the leftmost matching principle?

The reason is that in the case of a joint index, the data is sorted by the first column of the index, and the second column is sorted only if the first column is the same.

That is, if we want to use as many columns as possible in the federated index, the columns in the query criteria must be consecutive columns from the leftmost in the federated index. If we just search by the second column, we can’t go through the index.

OR in the WHERE clause

In the WHERE clause, if the condition column before OR is an index column, and the condition column after OR is not an index column, then the index is invalidated.

For example, in the following query, id is the primary key, age is the normal column, and the result of the execution plan is a full table scan.

select * from t_user where id = 1 or age = 18;
Copy the code

This is because the meaning of OR is that only one of two conditional columns is needed, so it is meaningless to have only one conditional column as an index column. As long as any conditional column is not an index column, a full table scan will be performed.

The solution is simple: set the AGE field to the index.

You can see that type=index merge, which means that the id and age are scanned separately, and then the two result sets are merged. The advantage of this is that the full table scan is avoided.

conclusion

Here are 6 ways index failures can occur:

When we use left or right fuzzy matching, that islike %xxorlike %xx%Both methods cause index invalidation;
When we use a function on an index column in a query condition, the index is invalidated.
When we perform expression calculation on the index column in the query condition, we also cannot walk the index.
MySQL automatically converts a string to a number and then compares it. If the string is an index column and the input parameter in the conditional statement is a number, then the index column will be CAST implicitly. Since CAST is implemented through CAST, it is equivalent to using a function on the index column, so the index will be invalidated.
To properly use a federated index, it must follow the left-most matching principle, that is, the index is matched in the left-most first way; otherwise, the index will fail.
In the WHERE clause, if the condition column before OR is an index column, and the condition column after OR is not an index column, then the index is invalidated.

Interviewer: What about index failures? What is the cause of the failure?

What does the index storage structure look like?

Use left or right fuzzy matching for indexes

Use functions for indexes

Perform an expression evaluation on an index

Implicitly cast an index

The union index is not a leftmost match

OR in the WHERE clause

conclusion

Related Posts

Spring MVC view parser

Five ways to count numbers in Python

Architecture optimization practices for high concurrency IM systems