Truly slow SQL is often accompanied by extensive row scans, temporary file sorts, or frequent disk flushes. The resulting rise in disk I/O turns normal SQL into slow SQL and causes executions to time out on a large scale. This article shares our slow SQL governance process.
Author: Shi Weimin | Source: Alibaba Technology official account
I. Why do this
1 What is slow SQL?
This refers to slow MySQL queries: SQL statements whose execution time exceeds the long_query_time value.
MySQL has many kinds of logs: redo log, undo log, and so on. For slow queries there is also a dedicated slow query log (slowlog), used to record statements whose response time exceeds a threshold.
Don't be misled by the name into thinking the slow query log only records SELECT statements. It also records INSERT, UPDATE, and other DML statements whose execution time exceeds the long_query_time threshold.
```sql
SHOW VARIABLES LIKE 'slow_query_log%';
SHOW VARIABLES LIKE 'long_query_time';
```
For the AliSQL-X-Cluster (XDB) we use, slow query logging is enabled by default and long_query_time is set to 1 second.
2 Why do slow queries cause faults?
Truly slow SQL is often accompanied by massive row scans, temporary file sorts, or frequent disk flushes. The direct effect is increased disk I/O: normal SQL turns into slow SQL, and executions time out in large numbers.
After last year's Double 11, the Cainiao CTO launched several special governance projects for problems exposed on the technical side, with a CTO-D sponsoring each one. The large team I was on took responsibility for the slow SQL governance project.
II. To what degree
1 Measurement metrics

How do we measure how severely an application is affected by slow SQL?
Micro average
```
                sum(slow SQL executions in the Aone application)
micro average = ------------------------------------------------
                sum(all SQL executions in the Aone application)
```
We take a larger value to mean greater impact, and a smaller value to mean less impact.
In the extreme case where every SQL statement the application executes is slow, the value is 1; if none of the executed SQL is slow, the value is 0.
The problem with this metric, however, is poor discrimination, especially for SQL with high QPS that is fast in most cases: its occasional slow executions are drowned out.
Another question: is occasional slow SQL really slow SQL? We encountered many statements captured by the slow query log that were in fact affected by other slow SQL, MySQL disk jitter, or optimizer choices, making SQL that is clearly fast under normal conditions appear slow.
Macro average
```
                sum(slow SQL 1 executions)         sum(slow SQL n executions)
                -------------------------- + ... + --------------------------
                  sum(SQL 1 executions)              sum(SQL n executions)
macro average = -------------------------------------------------------------
                                          n
```
This algorithm gives each captured slow SQL a base weight of its own, which reduces the impact of "pseudo" slow SQL.
However, when an application's QPS is very low, that is, its SQL executes only a few times a day, a single pseudo slow SQL can badly skew the statistic.
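To make the two formulas concrete, here is a small sketch in Python (the per-SQL counts are hypothetical, not real data) showing how the micro average drowns out a low-QPS slow query while the macro average surfaces it:

```python
# Hypothetical per-SQL stats: (slow executions, total executions).
sqls = [
    (5, 1_000_000),  # high-QPS SQL, occasionally slow
    (90, 100),       # low-QPS SQL that is almost always slow
]

def micro_average(stats):
    # Total slow executions over total executions across the application.
    slow = sum(s for s, _ in stats)
    total = sum(t for _, t in stats)
    return slow / total

def macro_average(stats):
    # Mean of each SQL's own slow ratio: every SQL weighs equally.
    return sum(s / t for s, t in stats) / len(stats)

print(micro_average(sqls))  # tiny: the always-slow SQL is drowned out
print(macro_average(sqls))  # large: the always-slow SQL dominates
```

The flip side is visible in the same numbers: if the second entry were a pseudo slow SQL caught only a few times, the macro average would overstate it just as strongly.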
Number of executions
```
                             sum(Aone application slow SQL executions over the last week)
average daily executions = --------------------------------------------------------------
                                                        7
```
The average number of slow SQL executions per day over the last week; this eliminates the pseudo slow SQL problem of the macro average.
Number of slow SQL templates
Each of the above dimensions carries a time window. To trace the historical handling of slow SQL, we also introduced a global count of slow SQL templates:
```
count(distinct(slow SQL template in the Aone application))
```
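Counting templates requires normalizing raw statements so that queries differing only in literal values collapse into one template. A minimal sketch of such a fingerprint (a rough illustration, not the normalization our tooling actually uses; it ignores comments, escapes, and other corner cases):

```python
import re

def sql_template(sql: str) -> str:
    """Crude SQL fingerprint: collapse literals so that queries differing
    only in constant values map to the same template."""
    s = sql.strip().lower()
    s = re.sub(r"'[^']*'", "?", s)                              # string literals
    s = re.sub(r"\b\d+\b", "?", s)                              # numeric literals
    s = re.sub(r"in\s*\(\s*(\?\s*,\s*)*\?\s*\)", "in (?)", s)   # IN lists of any size
    s = re.sub(r"\s+", " ", s)                                  # normalize whitespace
    return s

a = sql_template("SELECT * FROM t WHERE id = 42 AND name = 'foo'")
b = sql_template("SELECT * FROM t WHERE id = 7  AND name = 'bar'")
assert a == b  # same template despite different literals
```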
2 Goals
- Core applications: eliminate all slow SQL
- Ordinary applications: reduce the micro average metric by 50%
3 CTO report
Applications are aggregated per CTO-D line as a weighted average over the above multidimensional metrics and ranked from low to high; the top 3 and the tail are highlighted and broadcast weekly.
III. Why me
I guess it may have something to do with my background: I come from C/C++, was responsible for designing and implementing a company-level multi-site active-active architecture at my previous employer, and understand MySQL fairly well.
Then again, it may be unrelated: my small team's business had just started and had no slow SQL of its own, so I could step into any line of business.
IV. Action support
1 Group MySQL specification
Excerpts from the index conventions:
[Mandatory] A join must involve no more than three tables. The data types of joined fields must be identical, and in multi-table queries every association field must have an index.
Note: Even a two-table join requires attention to table indexes and SQL performance.
[Mandatory] When creating an index on a VARCHAR field, an index length must be specified; indexing the full field is unnecessary. The index length is decided by the actual distinctiveness of the text.
Note: Index length and distinctiveness are in tension. For string data, an index of length 20 usually achieves over 90% distinctiveness, which can be measured with count(distinct left(column, index_length)) / count(*).
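The distinctiveness check in the note can also be prototyped offline on a sampled column before touching the index. A sketch mirroring count(distinct left(col, N)) / count(*) (the sample values below are made up):

```python
def prefix_selectivity(values, length):
    """Estimate index distinctiveness for a given prefix length,
    mirroring count(distinct left(col, length)) / count(*) in SQL."""
    return len({v[:length] for v in values}) / len(values)

# Hypothetical column sample: find the shortest prefix with high selectivity.
names = ["warehouse-east-01", "warehouse-east-02", "warehouse-west-01",
         "store-north-01", "store-south-01"]
full = prefix_selectivity(names, 100)  # full column is fully distinct
short = prefix_selectivity(names, 9)   # short prefixes collide heavily
```

Scanning a range of lengths this way gives the shortest prefix whose selectivity clears the 90% bar.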
[Mandatory] Left-fuzzy or full-fuzzy matching is strictly forbidden in page searches; if it is needed, solve it with a search engine.
Note: Index files have the B-tree leftmost prefix matching property. If the leftmost value is undetermined, the index cannot be used.
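The leftmost-prefix rule can be observed directly in a query planner. A small demonstration using SQLite (standing in for MySQL here; the B-tree prefix behavior is analogous): a left-anchored LIKE can seek into the index, while a leading wildcard forces a scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA case_sensitive_like = ON")  # lets LIKE 'x%' use a BINARY index
conn.execute("CREATE TABLE page (title TEXT)")
conn.execute("CREATE INDEX idx_title ON page (title)")

def plan(where):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail).
    rows = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM page WHERE " + where)
    return " ".join(r[3] for r in rows)

# Left-anchored pattern: the planner can seek into the index (SEARCH).
print(plan("title LIKE 'abc%'"))
# Leading wildcard: the leftmost value is unknown, so it must scan (SCAN).
print(plan("title LIKE '%abc'"))
```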
[Recommendation] Prevent index invalidation due to implicit conversion caused by different field types.
Avoid the following extreme misunderstandings when creating indexes:
1) Err on the side of too many: believing every query needs its own index.
2) Stinginess with indexes: believing indexes consume space and seriously slow down updates and inserts.
3) Resisting unique indexes: believing uniqueness should be enforced at the application layer by "check then insert".
2 DB change standard
DDL must control its change speed, with attention to grayscale rollout and concurrency control, and changes must be released within the designated change window.
V. Some optimization cases I took part in
1 Uneven data distribution
1) Unreasonable database and table sharding
The service's data was split into 8 databases with 16 tables each. Inspecting table space showed the data concentrated almost entirely in two tables of each database: the sharding strategy was flawed, and business growth had been overestimated.
2) Unreasonable indexes
A single table had a composite index idx_logistics_corp_id_special_id, but even its combined distinctiveness was too low. The combination of logistics_corp_id and transport_type_id was highly distinctive, and the business also queried by transport_type_id alone.
2 Index Problems
```sql
SELECT COUNT(0) AS `tmp_count`
FROM (
    SELECT
        `table_holder`.`user_id`,
        `table_holder`.`sc_item_id`,
        SUM(CASE `table_holder`.`inventory_type` WHEN 1   THEN `table_holder`.`quantity`      ELSE 0 END) AS `saleable_quantity`,
        SUM(CASE `table_holder`.`inventory_type` WHEN 1   THEN `table_holder`.`lock_quantity` ELSE 0 END) AS `saleable_lock_quantity`,
        SUM(CASE `table_holder`.`inventory_type` WHEN 401 THEN `table_holder`.`quantity`      ELSE 0 END) AS `transfer_on_way_quantity`,
        `table_holder`.`store_code`,
        MAX(`table_holder`.`gmt_modified`) AS `gmt_modified`
    FROM `table_holder`
    WHERE (`table_holder`.`is_deleted` = 0)
      AND (`table_holder`.`quantity` > 0)
      AND `table_holder`.`user_id` IN (3405569954)
      AND `table_holder`.`store_code` IN ('ZJJHBHYTJJ0001', '...')
    GROUP BY `table_holder`.`user_id`, `table_holder`.`sc_item_id`
    ORDER BY `table_holder`.`user_id` ASC, `table_holder`.`sc_item_id` ASC
) `a`;
```
The table in this case did have an index on store_code, so the SQL was assumed to be fine and beyond optimization. The execution plan, however, showed that MySQL chose a full table scan. Working through this case showed that the optimizer stops using the field's index once the number of values in the range (IN) query exceeds 200; this looks consistent with MySQL's eq_range_index_dive_limit parameter (default 200 since 5.7), beyond which the optimizer switches from index dives to index statistics for cost estimation.
Finally, after pulling the related queries from a recent period and combining them with the business data distribution, we found the problem could be solved with an index on (is_deleted, quantity).
To determine the length of the index actually used from an execution plan, apply the key_len calculation rules (MySQL >= 5.6.4):
```
char(N),    NULL allowed:  N * charset bytes (utf8mb4=4, utf8=3, gbk=2, latin1=1) + 1 (NULL flag)
char(N),    NOT NULL:      N * charset bytes
varchar(N), NULL allowed:  N * charset bytes + 1 (NULL flag) + 2 (length header)
varchar(N), NOT NULL:      N * charset bytes + 2 (length header)
int,        NULL allowed:  4 + 1 (NULL flag)
int,        NOT NULL:      4
timestamp,  NULL allowed:  4 + 1 (NULL flag)
timestamp,  NOT NULL:      4
datetime,   NULL allowed:  5 + 1 (NULL flag)
datetime,   NOT NULL:      5
```
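The rules above can be wrapped in a small helper for checking an EXPLAIN's key_len against expectations (a sketch covering only the types listed; the function and parameter names are my own):

```python
CHARSET_BYTES = {"utf8mb4": 4, "utf8": 3, "gbk": 2, "latin1": 1}

def key_len(col_type, length=0, charset="utf8mb4", nullable=True):
    """key_len of one index column as reported by EXPLAIN (MySQL >= 5.6.4)."""
    if col_type == "char":
        n = length * CHARSET_BYTES[charset]
    elif col_type == "varchar":
        n = length * CHARSET_BYTES[charset] + 2  # variable-length header
    elif col_type in ("int", "timestamp"):
        n = 4
    elif col_type == "datetime":
        n = 5
    else:
        raise ValueError("unsupported type: " + col_type)
    return n + (1 if nullable else 0)  # NULL flag byte

# varchar(10) NULL in utf8mb4: 10 * 4 + 2 + 1 = 43
assert key_len("varchar", 10, "utf8mb4", nullable=True) == 43
```

Comparing the computed value with the key_len in the plan tells you how many of a composite index's columns were actually used.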
3 Affected by others
An indexed query still scanned 20 million rows:
The indexed field is highly distinctive:
At the same time, regular SQL turned into slow queries:
The DB was hitting the data disk:
Looking at the SQL alone, nothing was wrong; the cause lay outside the statement itself:
Multiple MySQL instance leader nodes were deployed on the same physical machine; although Docker isolated CPU, memory, and other resources, buffer I/O was not yet isolated.
4 What indexes cannot solve
Summarizing and analyzing the high-frequency queries and designing suitable indexes with the business in mind solves most everyday slow queries, but it is not a panacea.
For example, queries like the following would require adding more and more indexes:
In some scenarios, such as combination queries over many fields with no mandatory field, it is unreasonable to cover every combination with indexes.
In query scenarios, making a highly distinctive field mandatory is good practice; for large query combinations, consider moving to a storage or search engine with better retrieval support.
VI. Daily governance
With deep governance in each CTO-D line, every metric improved markedly: core applications cleared all slow queries, several of the most impactful slow SQL statements were fixed, and my team's ranking moved from the bottom 3 to the top 3.
Slow SQL governance has become routine: slow SQL work orders are pushed on a fixed weekly schedule, owners take them over and respond, and a habit and atmosphere of regular clearing has formed. The slow SQL governance project has been praised several times.
VII. Summary
This is a belated summary. Looking back now, I believe the process of setting strategy, analyzing problems, and solving them is worth sharing.
This article is the original content of Aliyun and shall not be reproduced without permission.