Preface:
Recently, when I opened a chart page in the test environment, it took ages for the backend to return the data the page needed for rendering. The backend response was very slow, which badly hurt the user experience.
The problem turned out to be caused by a slow GROUP BY query.
Outline of this article:
1. Briefly describe the troubleshooting steps;
2. Optimize the slow GROUP BY query.
The troubleshooting steps:
The investigation consists of two main steps:
- Monitor the backend interface to see which method call takes the most time;
- Enable the database's slow query log to record which SQL statements execute slowly.
For the first step, I recommend Arthas, Alibaba's open-source Java online diagnostic tool; its trace command measures the time spent on each method call along the invocation chain.
Arthas is well suited to troubleshooting slow responses from online services.
With Arthas, I found that the time was being spent in a method that runs a GROUP BY query against the database.
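For reference, tracing a method with Arthas looks roughly like this; the class and method names below are placeholders for illustration, not the actual ones from my project:
# Attach Arthas to the target Java process, then trace the suspect method,
# printing only invocations that take longer than 200 ms.
java -jar arthas-boot.jar
trace com.example.chart.ChartService listYears '#cost > 200'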
To further confirm that this query was the bottleneck, I enabled MySQL's slow query log and called the backend interface again; sure enough, the SQL statement showed up in the slow query log.
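For completeness, the slow query log can be switched on at runtime roughly as follows (the threshold value here is only illustrative):
-- Turn on the slow query log and lower the threshold so the slow statement is captured.
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;              -- log statements slower than 1 second
SHOW VARIABLES LIKE 'slow_query_log_file';   -- where the log file is written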
The SQL statement is as follows:
SELECT
date_format(createts, '%Y') AS YEAR
FROM
t_test_log
GROUP BY
date_format(createts, '%Y')
ORDER BY
createts DESC
This SQL statement lists the years in which the rows in the table were created.
Let's look at why this SQL is slow and how it can be optimized.
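The article does not show the table structure; for readers who want to follow along, a minimal definition consistent with the statements in this article could look like the following (only createts appears in the original; the other columns and types are my own assumptions):
CREATE TABLE t_test_log (
  id       BIGINT   NOT NULL AUTO_INCREMENT,
  createts DATETIME NOT NULL,   -- record creation time, used by the GROUP BY
  PRIMARY KEY (id)
) ENGINE = InnoDB;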
Optimizing the slow GROUP BY query:
When it comes to optimizing GROUP BY queries, two terms come to mind; these two kinds of index scan can complete a GROUP BY operation efficiently (a hypothetical example follows the list):
- Loose Index Scan
- Tight Index Scan
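As a rough, made-up illustration (the user_id column and the composite index below are not part of the original table), a query of the following shape can be completed with a loose index scan once a composite index on (user_id, createts) exists; EXPLAIN would then show Using index for group-by:
ALTER TABLE t_test_log ADD INDEX idx_user_create (user_id, createts);

SELECT user_id, MIN(createts)
FROM t_test_log
GROUP BY user_id;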
When no suitable index is available, a GROUP BY usually scans the whole table, extracts the data into a temporary table, and then sorts that table by the columns listed in the GROUP BY; in the temporary table the rows of each group end up adjacent to one another.
Once the sort is done, all the groups are available and the aggregate functions can be applied.
So when no index can be used, a temporary table and a sort are needed; in the execution plan this usually shows up as Using temporary; Using filesort in the Extra column.
1. Check the execution plan of the SQL:
The first thing to do with the slow SQL is to analyze its execution plan with the EXPLAIN keyword:
The execution plan showed that the statement performs a full table scan, reading roughly 99974 rows to produce the final result set, and that a temporary table and a file sort are used during execution.
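The original screenshot of the plan is not reproduced here; running EXPLAIN on the statement gives output roughly of the following shape (the row estimate matches the description above and will differ on other data):
EXPLAIN
SELECT date_format(createts, '%Y') AS YEAR
FROM t_test_log
GROUP BY date_format(createts, '%Y')
ORDER BY createts DESC;

-- id | select_type | table      | type | key  | rows  | Extra
-- 1  | SIMPLE      | t_test_log | ALL  | NULL | 99974 | Using temporary; Using filesort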
2. What the execution plan tells us:
When reading an execution plan, focus on three columns: type, rows, and Extra.
- type: the access type, an important indicator in SQL query optimization. From best to worst, the values are: system > const > eq_ref > ref > range > index > ALL;
- rows: an estimate, based on table statistics and index selection, of how many rows have to be read to find the desired records;
- Extra: additional information about how the statement executes, which matters a lot for SQL tuning. Some common values:
- Using filesort: the ordering cannot be satisfied by the index order, so an extra sort (a filesort) has to be performed;
- Using temporary: a temporary table is used to hold intermediate results; this is common with GROUP BY;
- Using index: the required values can be read directly from the index tree, so no lookup back to the table is needed, which indicates very good SQL performance;
- Using index for group-by: a loose index scan is used, which indicates good SQL performance, because a loose index scan only needs to read a small amount of data to complete the GROUP BY, so it executes very efficiently;
- Select tables optimized away: when there is no GROUP BY clause, MIN()/MAX() aggregations that can be resolved from an index are completed while the execution plan is generated, so nothing is left to compute during the execution phase; when this appears, the SQL performance is optimal, and it usually goes together with the system access type (a small example follows this list).
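To illustrate that last item (assuming the index on createts from the next section already exists), a MIN/MAX query without GROUP BY can be answered from the index while the plan is built:
EXPLAIN SELECT MAX(createts) FROM t_test_log;
-- Extra: Select tables optimized away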
3. Create the index and check the execution plan again:
From the execution plan above we can see that, because no suitable index exists, a full table scan is performed, which is the worst case. So let's create an index on the createts field and then look at the execution plan again:
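The original does not show the exact DDL; creating the index might look like this (the index name is my own choice):
ALTER TABLE t_test_log ADD INDEX idx_createts (createts);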
The execution plan after creating the index shows that the full table scan is gone: the query is upgraded from a full table scan to a full index scan. However, it still has to scan roughly 99974 rows to produce the final result set, so performance improves very little.
Using temporary; Using filesort still appear in the Extra column, which means neither a loose index scan nor a tight index scan is being used.
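Again as a sketch rather than the original screenshot, the plan after creating the index looks roughly like this:
-- id | select_type | table      | type  | key          | rows  | Extra
-- 1  | SIMPLE      | t_test_log | index | idx_createts | 99974 | Using temporary; Using filesort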
Then analyze the SQL statement again:
SELECT
date_format(createts, '%Y') AS YEAR
FROM
t_test_log
GROUP BY
date_format(createts, '%Y')
ORDER BY
createts DESC
The GROUP BY is on date_format(createts, '%Y'), that is, on a function of createts rather than on the createts column itself, so the index on createts cannot be used to complete the grouping. The SQL therefore needs to be rewritten.
4. Optimize by rewriting SQL:
The rewritten SQL is as follows:
SELECT
date_format(createts, '%Y') AS years
FROM
(
SELECT
createts
FROM
t_test_log
GROUP BY
createts
) t_test_log_1
GROUP BY
date_format(createts, '%Y')
ORDER BY
createts DESC
After rewriting the SQL and running it again, the query is noticeably faster; performance improves dramatically.
Then check its execution plan as follows:
Looking at the execution plan of the nested query above, the subquery scans only about 52 rows to produce its result set, compared with roughly 99974 rows originally. In addition, Using index for group-by appears in the Extra column of the subquery, which means a loose index scan is used; that is what makes the query efficient.
The outer query then groups and sorts the subquery's result set (52 rows) again; this step scans the entire derived result set, so if the subquery result set were very large it would not be efficient.
Therefore, before adopting this rewrite, estimate how large the subquery's result set will be; if it is large, this optimization is not recommended.
5. Optimize by rewriting SQL + rewriting code:
The previous scheme only rewrites the SQL and needs no code changes; the scheme below rewrites the SQL and also requires changes to the code.
The idea is to query only the minimum year and the maximum year from the table:
(
SELECT
date_format(createts, '%Y') AS years
FROM
t_test_log
ORDER BY
createts
LIMIT 1
)
UNION ALL
(
SELECT
date_format(createts, '%Y') AS years
FROM
t_test_log
ORDER BY
createts DESC
LIMIT 1
)
Check the execution plan of the above SQL statement:
In the execution plan, each subquery of the UNION shows rows: 1, meaning each part only needs to read a single row, so this SQL performs very well.
Keep in mind, though, that the result set returned by this SQL is not the final data we need; we still have to write code to compute the final result set:
- If the maximum and minimum years are equal, all the data in the table belongs to the same year;
- If the maximum and minimum years differ:
- If their difference is 1, the two years are adjacent and the result set can be returned directly;
- If their difference is greater than 1, there are years between the minimum and the maximum, and the intermediate years are filled in by calculation (a sketch of this logic follows the list).
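A minimal sketch of that application-side logic, in Java; the class and method names are my own, not from the original article:
import java.util.ArrayList;
import java.util.List;

public class YearRangeHelper {

    // minYear/maxYear are the two values returned by the UNION ALL query above.
    public static List<Integer> expandYears(int minYear, int maxYear) {
        List<Integer> years = new ArrayList<>();
        if (minYear == maxYear) {
            // Max and min are equal: all rows in the table belong to one year.
            years.add(minYear);
        } else {
            // Max and min differ: list every year from max down to min
            // (descending, to match the ORDER BY createts DESC of the earlier query).
            // When the difference is greater than 1, the intermediate years are
            // generated here even if the table has no rows for some of them
            // (see the note right after this sketch).
            for (int y = maxYear; y >= minYear; y--) {
                years.add(y);
            }
        }
        return years;
    }
}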
Note, however, that computing the final years in code has one problem: a calculated intermediate year may not actually have any data in the table.
For example, if the SQL query returns 2018 as the minimum year and 2021 as the maximum, the code fills in the intermediate years 2019 and 2020; but if the table contains no data for 2019, that year is still returned, which is a problem.
Therefore, this solution needs to be analyzed based on specific business scenarios and actual data.
Extension:
In the "rewrite SQL + rewrite code" scheme, the SQL above is not the only possible rewrite; there is another rewrite whose query efficiency is also quite high:
it uses the MIN and MAX aggregate functions. With aggregate functions the SQL can be written in the following two ways. Which of the two rewrites is more efficient? I'll leave that as a little puzzle for you to analyze; share your answer in the comments!
The first way to rewrite SQL:
(
SELECT
date_format(min(createts), '%Y') AS years
FROM
t_test_log
)
UNION ALL
(
SELECT
date_format(max(createts), '%Y') AS years
FROM
t_test_log
)
The second way to rewrite SQL:
(
SELECT
date_format(minyear, '%Y') AS years
FROM
(
SELECT
min(createts) AS minyear
FROM
t_test_log
) t_test_log_1
)
UNION ALL
(
SELECT
date_format(maxyear, '%Y') AS years
FROM
(
SELECT
max(createts) AS maxyear
FROM
t_test_log
) t_test_log_2
)
Like, comment and share!
If this article helped you, please give it a thumbs-up; your support is what keeps me writing. Thank you!
You can also search for the [Muzi thunder] official account on WeChat (VX), where there are plenty of Java articles to read!