preface
In the early stage of application development, the amount of data is small, and developers pay more attention to the realization of functions when developing functions. With the growth of production data, many SQL statements begin to expose performance problems, which have an increasing impact on production. Sometimes these problematic SQL statements may be the bottleneck of the performance of the whole system.
General steps for SQL optimization
1. Locate the SQL statements that are less efficient to execute by slowly checking the log
2. Explain the execution plan of SQL
Focus on Type, Rows, Filtered, Extra.
Type is more and more efficient from top to bottom
- ALL full table scan
- Index full scan
- >=,between,in, etc
- Ref uses a nonunique index scan or a unique index prefix scan to return a single record, often found in associative queries
- EQ_REF is similar to REF, except that it uses a unique index and an associative query that uses a primary key
- Const /system is a single record that the system treats as constants other columns in the matching row, such as primary keys or unique index queries
- NULL MySQL does not access any tables or indexes and returns results directly
SELECT * FROM T WHERE A = 1 AND B IN (1, 2) ORDER BY C; SELECT * FROM T WHERE A = 1 AND B IN (1, 2) ORDER BY C; SELECT * FROM T WHERE A = 1 AND B IN (1, 2) ORDER BY C; If you go idx1, then type is range; if you go idx2, then type is ref; IDX1 will be used when the number of rows that need to be scanned is about 5 times greater than IDX1, otherwise IDX2 will be used
Extra
- Using Filesort: MySQL requires an extra pass to figure out how to retrieve rows in sort order. The sorting is done by browsing all rows by join type and saving the sort keyword and pointer to the row for all rows that match the WHERE clause. The keywords are then sorted and the rows are retrieved in sort order.
- Using Temporary: A temporary table is used to store intermediate results. Performance is particularly poor and requires significant optimization
- Using index: select index from table Using index from table Using index from table Using index If a Using Where occurs at the same time, it means that the matching data cannot be queried directly through the index lookup.
-
Using Index Condition: ICP added after MySQL5.6. Using Index Condtion uses ICP to filter data in the storage engine layer, rather than in the service layer. Using index existing data to reduce the data back to the table.
3. Show Profile
Understand the status of the thread of SQL execution and the elapsed time. Set profiling = 1; Set profiling = 1;
SHOW PROFILES ;
SHOW PROFILE FOR QUERY #{id};
4, the trace
Trace analyzes how the optimizer selects an execution plan, and the trace file provides further insight into why the coupon selects an execution plan A over A plan B.
set optimizer_trace="enabled=on";
set optimizer_trace_max_mem_size=1000000;
select * from information_schema.optimizer_trace;
5. Determine the problem and take appropriate measures
- Optimize the index
- SQL statement optimization: modify SQL, IN query segmentation, time query segmentation, based on the last data filtering
- Use other implementation methods: ES, data storehouse, etc
-
Data fragmentation
Scenario analysis
Case 1, most left match
The index
KEY `idx_shopid_orderno` (`shop_id`,`order_no`)
The SQL statement
select * from _t where orderno=''
The query matches from left to right. To index with ORDER_NO, the query condition must carry SHOP_ID or the index (SHOP_ID, ORDER_NO) in reverse order
Case 2: Implicit conversion
The index
KEY `idx_mobile` (`mobile`)
The SQL statement
select * from _user where mobile=12345678901
Implicit conversions are equivalent to performing an operation on an index and invalidate the index. Mobile is a character type that uses numbers, so string matching should be used, otherwise MySQL will use implicit substitution, which will invalidate the index.
Case 3, large pages
The index
KEY `idx_a_b_c` (`a`, `b`, `c`)
The SQL statement
select * from _t where a = 1 and b = 2 order by c desc limit 10000, 10;
For large paging scenarios, product optimization requirements can be prioritized. If there is no optimization, there are two optimization methods as follows. One is to transfer the last piece of data, that is, the above C, and then do “C < XXX” processing. The other is to use a lazy association to reduce SQL back to the table, but remember that the index needs to be fully overwritten for this to work
select t1.* from _t t1, (select id from _t where a = 1 and b = 2 order by c desc limit 10000, 10) t2 where t1.id = t2.id;
In + order by
The index
KEY `idx_shopid_status_created` (`shop_id`, `order_status`, `created_at`)
The SQL statement
select * from _order where shop_id = 1 and order_status in (1, 2, 3) order by created_at desc limit 10
In query is an N * M query in MySQL, similar to a UNION, but more efficient than a UNION. When in query calculates COST (cost = number of tuples * average value of IO), the number of tuples is obtained by querying the values contained in in one by one.
This calculation process is slow, so MySQL sets a critical value (EQ_RANGE_INDEX_DIVE_LIMIT). After 5.6, the cost of this column will not participate in the calculation if the value exceeds this critical value. This can result in inaccurate execution plan selection. The default is 200, that is, if the IN condition exceeds 200 data, there will be problems in the cost calculation of IN, which may lead to inaccurate index selected by MySQL.
(ORDER_STATUS, CREATED_AT) can be used to swap the order and adjust the SQL to defer association.
Case 5: Range query is blocked, and subsequent fields cannot be indexed
The index
KEY `idx_shopid_created_status` (`shop_id`, `created_at`, `order_status`)
The SQL statement
select * from _order where shop_id = 1 and created_at > '2021-01-01 00:00:00' and order_status = 10
Range queries also have “IN, between”
Case 6 is not equal to and does not include a fast search that cannot be indexed. (You can use ICP)
Select * from _order where shop_id=1 and order_status not in (1,2) select * from _order where shop_id=1 and order_status not in (1,2 ! = 1
On indexes, avoid NOT,! =, <>,! <,! >, NOT EXISTS, NOT IN, NOT LIKE, etc
Case 7: A case where the optimizer chooses not to use an index
If the amount of data required to access is small, the optimizer will still choose the secondary index, but when the data accessed is a relatively large portion of the total table data (typically around 20%), the optimizer will choose to find the data by clustering the index.
select * from _order where order_status = 1
Query all outstanding orders, such orders are usually very few, even if the index is built, can not use the index.
Case 8: Complex queries
select sum(amt) from _t where a = 1 and b in (1, 2, 3) and c > '2020-01-01';
select * from _t where a = 1 and b in (1, 2, 3) and c > '2020-01-01' limit 10;
If it is to statistics some data, may use the number warehouse to solve; If there are so complex queries on the business, it may not be recommended to continue to go SQL, but to use other ways to solve, such as the use of ES, etc.
Case 9: ASC and DESC mixed
Select * from _t where a=1 order by b desc select * from _t where a=1 order by b desc
Case 10: Big data
For the data storage of push service, there may be a large amount of data. As for the choice of scheme, the final choice is to store it in MySQL and save it for 7 days or other validity period. It should be noted that frequent cleaning of data will cause data fragmentation, so it is necessary to contact the DBA for data fragmentation processing.
This is the end of the article, if you do not understand the “high-performance MySQL” book, you can click on the electronic fileportalTo receive.