preface

See you again! Another two weeks have passed, and I have several more half-written drafts sitting in my cloud notes. Some need more work because the quality isn't what I hoped for; others are just a spark of inspiration with no content at all. I envy the big names who can turn out five or six articles a week; even with two livers I couldn't keep up. Well, without further ado…

Recently, our production environment hit a database fault caused by a slow SQL query, which affected online services. While executing the SQL, the MySQL optimizer selected the wrong index (not literally the "wrong" index, but one that took far longer to execute). During troubleshooting I consulted a lot of material and learned the basic criteria the MySQL optimizer uses to select indexes. In this article I share my approach to solving the problem. My understanding of MySQL's depths is limited; if there are errors, rational discussion and corrections are welcome.

"This incident fully shows how important an in-depth understanding of how MySQL works really is; it is the key to solving problems independently." Imagine a dark and windy night: the company's production service suddenly goes down, your colleagues are offline, and you alone have the chance to solve the problem. If at that moment you are tripped up by an engineer's basic skills, wouldn't that be embarrassing?

Fault description

At 11:00 on July 24, a database suddenly received a large number of alarms: the number of slow queries exceeded the threshold and the connection count rose rapidly, so the database responded slowly and services were affected. As the chart shows, slow queries peaked at 140,000 (14W) per minute, whereas under normal circumstances the count stays below two digits:

Let's look at the slow SQL in question:

select
  *
from
  sample_table
where
    1 = 1
    and (city_id = 565)
    and (type = 13)
order by
  id desc
limit
  0, 1

It looks very simple, nothing special. But each execution took a staggering 44 seconds.

Sensational. If that isn't slow, nothing is…

Next, view the table data information as shown below:

As you can see, the table holds a lot of data: the estimated row count is 83,683,240, roughly 80 million (8000W), "a table with tens of millions of rows".
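As a sketch, the table size above can be checked with standard commands (using the article's placeholder name sample_table):

```sql
-- estimated row count and data size, as reported by InnoDB statistics
SHOW TABLE STATUS LIKE 'sample_table';

-- the same estimate from information_schema (TABLE_ROWS is approximate for InnoDB)
SELECT table_rows, data_length, index_length
FROM information_schema.tables
WHERE table_schema = DATABASE() AND table_name = 'sample_table';
```

Note that for InnoDB these row counts are sampled estimates, not exact values; we will come back to that later.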

That's the overall situation; now let's move on to the troubleshooting.

Fault Cause Troubleshooting

First, let's look at the indexes on the table. The relevant part of its DDL is:

KEY `idx_1` (`city_id`,`type`,`rank`),
KEY `idx_log_dt_city_id_rank` (`log_dt`,`city_id`,`rank`),
KEY `idx_city_id_type` (`city_id`,`type`)

Please ignore the duplicate indexes idx_1 and idx_city_id_type. This is a historical problem.
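If you want to reproduce this check yourself, the index definitions can be inspected like this (a sketch; sample_table stands in for the real table):

```sql
-- full table definition, including all index declarations
SHOW CREATE TABLE sample_table\G

-- or list the indexes together with their sampled cardinalities
SHOW INDEX FROM sample_table;
```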

Our query conditions are city_id and type, and idx_city_id_type (as well as the duplicate idx_1) covers exactly those two columns, so in principle a suitable index exists.

But are city_id and type really all of our query criteria? (Sharp-eyed readers may have already spotted the problem; for now, read on.)

Now that an index exists, the next step is to check whether the statement actually uses it. MySQL provides Explain to analyze SQL statements; here we use it to analyze our SELECT query.

Explain important fields include:

  • select_type: the query type (simple query, union query, subquery, etc.)
  • key: the index actually used
  • rows: the estimated number of rows to be scanned

For a more detailed treatment of Explain, see: MySQL performance optimization artifact Explain usage analysis

We analyze this statement using Explain:

explain select * from sample_table where city_id = 565 and type = 13 order by id desc limit 0, 1

Results obtained:

As you can see, possible_keys includes our index, but the optimizer ended up choosing the primary key index. Traversing the primary key index took MySQL a long time, resulting in the slow query.

We can use force index(idx_city_id_type) to make the statement select the joint index we set:

select * from sample_table force index(idx_city_id_type) where (((1 = 1) and (city_id = 565)) and (type = 13)) order by id desc limit 0, 1

This time the execution is obviously very fast.

The actual execution time is 0.00175714s. With the joint index in use, it is no longer a slow query.

The problem is found; to sum up: "The MySQL optimizer judged that with limit 1, the primary key index could find a matching row faster, and that scanning along the primary key index needs no extra sort because it is naturally ordered. But MySQL walked through all 80 million (8000W) rows without finding a matching one, wasting a great deal of time."

MySQL index selection principle

Optimizer index selection criteria

Index selection happens in the query optimizer:

An explanation cited in reference:

❝ First of all, it is the MySQL optimizer’s job to select indexes. The purpose of the optimizer’s index selection is to find an optimal execution plan and to execute the statement with the minimum cost. In a database, the number of rows scanned is one of the factors affecting the execution cost. Fewer rows scanned means less disk data is accessed and less CPU resources are consumed. “Of course, the number of rows scanned is not the only criterion. The optimizer also takes into account factors such as whether to use temporary tables, whether to sort, etc.” ❞

In summary, there are many factors to consider in optimizer selection: “number of rows scanned, whether to use temporary tables, whether to sort, etc.”

Let’s go back to the previous two Explain screenshots:



With the primary key index, rows is estimated at 1833; with the forced joint index, it is 45640. So, without the forced index, "the optimizer chose the primary key index because it believed the primary key index would scan fewer rows and require no additional sort, since the primary key index is naturally ordered."
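The two plans compared above can be reproduced as follows (the rows values quoted come from this incident; yours will differ):

```sql
-- default plan: the optimizer picks PRIMARY, rows estimated at 1833
explain select * from sample_table
where city_id = 565 and type = 13
order by id desc limit 0, 1;

-- forced plan: idx_city_id_type, rows estimated at 45640, plus an extra sort
explain select * from sample_table force index(idx_city_id_type)
where city_id = 565 and type = 13
order by id desc limit 0, 1;
```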

How is rows estimated?

You might ask why rows is only 1833 when the query actually scans the entire primary key index, far more than a few thousand rows. The rows shown by Explain is the number of rows MySQL "estimates" based on the query conditions, the index statistics, and the limit.

How does MySQL get the cardinality of the index? Here, I give you a brief introduction to the MySQL sampling statistics method.

Why do we sample statistics? Although accurate results can be obtained by taking out the whole table for statistics row by row, the cost is too high, so we can only choose “sampling statistics”.

InnoDB will select N data pages by default, count the different values of these pages, get an average, and then multiply by the number of pages in the index to get the cardinality of the index.

Tables are constantly updated, so index statistics cannot stay fixed. Therefore, when the number of changed rows exceeds 1/M of the table, a new round of index statistics is automatically triggered.

In MySQL, there are two ways to store index statistics, which can be selected by setting the value of innodb_stats_persistent:

  • Set to on: statistics are stored persistently. By default, N is 20 and M is 10.
  • Set to off: statistics are stored only in memory. By default, N is 8 and M is 16.

Because this is sampling, the cardinality can easily be off, whether N is 20 or 8. We can use the ANALYZE TABLE t command to recompute the index statistics. In a production environment you need to go through a DBA for this command, so I won't run the experiment here; you can try it yourself.
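As a sketch, the statistics settings can be inspected and refreshed like so (sample_table is again the article's placeholder):

```sql
-- check how index statistics are stored
SHOW VARIABLES LIKE 'innodb_stats_persistent';

-- with persistent statistics, the sampled values are visible here
SELECT index_name, stat_name, stat_value, sample_size
FROM mysql.innodb_index_stats
WHERE database_name = DATABASE() AND table_name = 'sample_table';

-- recompute the statistics (coordinate with your DBA in production)
ANALYZE TABLE sample_table;
```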

Index selection takes the order by fields into account

Why do I say so? Because if the table had a combined index on city_id, type, and id, the optimizer would choose that combined index: within each (city_id, type) group the entries are already sorted by id, so no extra sort is needed.

Can changing the limit size fix the problem?

Does increasing the limit affect the estimated number of rows and thus the optimizer’s index selection?

The answer is yes.

Let's try limit 10:

select * from sample_table where city_id = 565 and type = 13 order by id desc limit 0, 10

In the figure, rows becomes 18211, a tenfold increase. What happens if you use limit 100?

With limit 100, the optimizer chose the joint index. The estimated rows grows several-fold again, so the optimizer abandons the primary key index: rather than scan the primary key, it now prefers the joint index followed by a sort.
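The limit 100 variant described above can be checked the same way (a sketch; the tipping point depends on your data and statistics):

```sql
-- at limit 100 the estimated cost tips the plan over to the joint index
explain select * from sample_table
where city_id = 565 and type = 13
order by id desc limit 0, 100;
```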

Why are abnormally slow queries suddenly appearing

Q: This query has been running steadily online for a long time. Why the slow query?

Earlier executions of this statement found a matching row quickly on the primary key index, so limit 1 returned almost immediately. This time, however, the query conditions matched no rows at all: the scan along the primary key index never found a hit and had to walk the entire index, which is what made it slow.

The solution

Now that we know why MySQL chose this index, we can use the above ideas to enumerate the solutions.

There are two general directions:

  • Forcing an index
  • Influencing the optimizer's choice

Forcibly select an index: force index

As I did at the beginning, we can use force index directly to make the statement use the index we want.

select * from sample_table force index(idx_city_id_type) where (((1 = 1) and (city_id = 565)) and (type = 13)) order by id desc limit 0, 1

The advantage of this is that the results are quick and the problem can be solved immediately.

The disadvantages are also obvious:

  • High coupling. Hard-coding force index into application code makes it difficult to maintain: if the index is renamed or dropped, the code has to change with it.
  • Much of our code builds SQL through a framework, where force index() is not easy to inject. "Instead, let's guide the optimizer toward the joint index."

Influencing the optimizer: increase the limit

By increasing the limit, we rapidly increase the estimated number of scanned rows, for example with limit 0, 1000:

select * from sample_table where city_id = 565 and type = 13 order by id desc limit 0, 1000

This does switch the plan to the joint index plus a sort, but inflating the limit like this always feels like a hack that treats the symptom rather than the cause. Do we have a more elegant solution?

Influencing the optimizer: add a joint index that includes the order by id field

Our slow query uses order by id, but id is not part of the joint index, so the optimizer assumes that using the joint index requires an extra sort and therefore avoids it.

We can solve this problem by creating a combined index on city_id, type, and id.
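A sketch of that index (the name idx_city_id_type_id is my own placeholder, not from the incident):

```sql
-- on an 80-million-row table this DDL is expensive;
-- run it via your DBA or online-DDL tooling, not ad hoc
alter table sample_table add index idx_city_id_type_id (city_id, type, id);
```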

This also has its drawbacks. My table holds 80 million (8000W) rows, so building a new index is very time-consuming, and an index here runs to around 3-4 GB. If we solve every problem by piling on indexes, new problems may follow; a table should not carry too many indexes.

Influencing the optimizer: rewrite as a subquery

What else can we do? We can rewrite the statement as a subquery: first use the covering joint index to find the ids that satisfy city_id and type, then fetch the rows by primary key.

However, subqueries carry risks, and our DBA does not recommend using them in the first place, preferring that complex queries be completed in application logic. Of course, our statement is not complicated ~

select * from sample_table where id in (select id from `newhome_db`.`af_hot_price_region` where city_id = 565 and type = 13) limit 0, 1

There are many more solutions… SQL optimization is a big topic, and we have plenty of ways to resolve this slow query; I won't expand on them all here. I'll leave them for you to think about.

conclusion

This article reviewed an online slow-query incident caused by the MySQL optimizer selecting the wrong index. As we saw, the optimizer's index selection does not rely on a single criterion but is the result of weighing several factors. My own understanding of this area is not deep yet and needs much more study before I can write a proper summary of index selection (digging myself a hole here). Anyway, time to pick up the hefty High Performance MySQL and get started…

Hold down my instant noodles…

“Finally, a summary of the article:”

  • This slow query's order by id led the optimizer to choose between the primary key index and the combined (city_id, type) index, and it picked the slower one.
  • The problem can be solved by forcing the index, creating a joint index that includes id, or increasing the limit.
  • In daily development, especially on tables with large data volumes, pay attention to SQL statement standards and index design to avoid such incidents.


    Author: the back-end technology rambling links: club.perfma.com/article/216…