In this article we will look at the optimization of the ORDER BY statement. Before that, you need to have a basic understanding of indexes. Now let’s get started.
There are two sorts in MySQL
1. Returns ordered data through sequential index scan
Because the index structure is a B+ tree, the data in the index is arranged in a certain order, so if you can use the index in the sort query, you can avoid additional sorting operations. When parsing a query EXPLAIN, Extra shows Using index.
2.Filesort sort, sort the returned data
All operations that do not return a sort directly by index are Filesort sorts, which means that additional sort operations have been performed. When parsing a query EXPLAIN, Extra is displayed as Using filesort.
The core principle of ORDER BY optimization
Minimize extra sorting and return ordered data directly through the index.
ORDER BY optimize actual combat
The index situation for the Customer table used in the experiment:
First note:
MySQL can only use one index at a time. If you want to use multiple indexes, create a composite index.
The ORDER BY optimization
1. The queried fields contain only the index fields and primary keys used in the query. The other non-index fields and index fields are used as query fields and do not use indexes.
To query only the index field used for sorting, you can use index sort:
explain select store_id,email from customer order by store_id,email;
SQL > select * from ‘sort’ where ‘sort’ = ‘index sort’;
explain select store_id,email,last_name from customer order by store_id,email,last_name;
To query only the index fields and primary keys used for sorting, use index sort:
Voiceover: MySQL’s default InnoDB engine physically uses clustered indexes to search by primary keys, so InnoDB requires tables to have primary keys. Even if a primary key is not explicitly specified, InnoDB generates a unique implicit primary key, which means that the index must have a primary key.
explain select customer_id,store_id,email from customer order by store_id,email;
SQL > select * from primary key; SQL > select * from primary key;
explain select store_id,email,last_name from customer order by store_id,email;
explain select * from customer order by store_id,email;
WHERE + ORDER BY optimization
1. Index sort cannot be used if the field is in multiple indexes
Sort fields in multiple indexes (not in the same index) cannot take advantage of index sort:
explain select * from customer where last_name='swj' order by last_name,store_id;
Voice-over: When the sort fields are not in the same index, the sort cannot be done in a B+ tree and an additional sort must be done
If the ORDER BY column is in an index and the WHERE condition is in the same index as the ORDER BY column, we can use index sort:
explain select * from customer where last_name='swj' order by last_name;
Of course, composite indexes can also use index sort:
Note that the field store_id,email is in a composite index
explain select * from customer where store_id = 5 order by store_id,email;
2. The order of sorted columns is inconsistent with that of indexed columns, and index sort cannot be used
The WHERE clause must have the first column in the index. The ORDER BY clause does not require this, but it does require that the ORDER of the sorted columns match the ORDER of the combined index columns. When we usually use composite index, we must develop a good habit of writing according to the sequence of composite index columns.
The order of the sorted columns is inconsistent with the order of the indexed columns and cannot use the index sort:
explain select * from customer where store_id > 5 order by email,store_id;
You should ensure that the sort field order is the same as the index column order to take advantage of index sort:
explain select * from customer where store_id > 5 order by store_id,email;
The ORDER BY clause does not require the first column to be indexed, so index sort can still be used. However, there is a precondition that can only be used in equivalence filtering, but not in range query:
explain select * from customer where store_id = 5 order by email;
explain select * from customer where store_id > 5 order by email;
Voice-over:
The reason for this is simple: when a range query is performed, the first column A must be sorted (ascending by default), while the second column B is not sorted. But if field A has the same value, then field B is sorted. So if it’s a range query, you can only do an extra sort on B once.
3. The ascending and descending order is inconsistent, and index sort cannot be used
Fields must be sorted either in positive ORDER or in reverse ORDER, otherwise you cannot take advantage of index sort.
explain select * from customer where store_id > 5 order by store_id,email;
explain select * from customer where store_id > 5 order by store_id desc,email desc;
explain select * from customer where store_id > 5 order by store_id desc,email asc;
Conclusion:
The above optimization can be summarized as: WHERE condition uses the same index as ORDER BY, and ORDER BY is in the same ORDER as index ORDER, and ORDER BY is in ascending or descending ORDER. Otherwise, you would definitely need an extra sort operation, and Filesort would appear.
Filesort optimization
It is possible to reduce the presence of Filesort by creating appropriate indexes, but in some cases, it is not possible to completely eliminate Filesort.
Two sorting algorithms of Filesort:
1. Double scan algorithm
The sort field and row pointer information are first fetched according to the condition, and then sorted in sort buffer. This sort algorithm requires two accesses to the data, the first to get sorted field and row pointer information, the second to get records based on the row pointer, and the second read operation may result in a large number of random I/O operations. The advantage is that sorting costs less memory.
2. One scan algorithm
Fetch all the fields of the row that meet the condition at one time, and then output the result set directly after sorting in sort buffer. The memory overhead of sorting is relatively large, but the sorting efficiency is higher than that of two-scan algorithm.
According to the characteristics of the two sorting algorithms, appropriately increasing the value of system variable max_LENGTH_FOR_sort_data can make MySQL choose a more optimized Filesort sorting algorithm. In addition, when writing SQL statements, only use required fields, rather than SELECT * all fields, so as to reduce the use of sorting area, improve SQL performance.
reference
MySQL in Simple Form
Recommended reading
MySQL — Analyze SQL execution plans through EXPLAIN
MySQL — Index base
MySQL — Index optimization
The data structure behind the database index
What influences database index selection?