MySQL8.0 LIMIT to Using filesort

preface

Recently, I really feel the pain of transition, slowly accumulate precipitation, or understanding is not good enough.

The following test cases are provided on the official websitesakilaDatabase, attachedDownload link.

The role of the LIMIT

As we all know,limitThe second and third statements have the same functions as each other, as follows:

 , as shown in figure 1
 select * from film limit 5
 , as shown in figure 2
 select * from film limit 6.5
 select * from film limit 5 OFFSET 6
Copy the code

The following2This is illustrated in the diagramlimitThe red part is the result.

Optimization of a`LIMIT`statements

forsakila.filmTable, respectively for the corresponding fieldlimitOperation, useEXPLAINAnalyze its execution plan.

Note: 1. The title field corresponds to the index IDx_title and the type is VARCHar (255).

In my actual work, I encountered a case aboutLIMITThe slow query needs to be optimized, used herefilmTable statement corresponding to,The parameters are slightly exaggerated to show the difference in query time, as shown below.

Sort by movie title, use LIMIT when paging
select film_id, description from film order by title limit 50000.10
Copy the code

You can start withEXPLAINTake a look at the execution plan, as follows

explain select film_id, description from film order by title limit 50000.10
Copy the code

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	film	NULL	ALL	NULL	NULL	NULL	NULL	64818	100.00	Using filesort

Execution time: about 0.15s
We can see that type is ALL, Extra is Using filesort, the data is scanned by the row pointer, the title field is sorted, the first 50000 data is discarded according to the value OFFET, and the number of LIMIT data is returned.
MySQL > select * from idex_title; select * from idex_title; select * from idex_title;
We can force MySQL to use the idx_title index.

explain select film_id, description  from film force index(`idx_title`) order by `title` limit 50000.10
Copy the code

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	film	NULL	index	NULL	idx_title	767	NULL	50010	100.00	NULL

Execution time: about 0.08 seconds
As you can see, it is up to the query optimizer to decide whether or not the order BY statement uses an existing sort of index, requiring a detailed look at the execution plan in Explain. The reasons can be seen on the official website:

the query uses SELECT , which may select more columns than key_part1 and key_part2. In that case, scanning an entire index and looking up table rows to find columns not in the index may be more expensive than scanning The table and sorting the results. If so, the optimizer probably will not use the index. The query optimizer will not use the index when the cost of scanning the table using the index and querying the row data back to the table is higher than the cost of scanning the table data using the row pointer and sorting, depending on the implementation of the query optimizer.

A common optimization method is deferred association, which first finds the primary key of the required data by overwriting the index, and then uses the primary key for association to find the required data. The specific statement is as follows:

select film_id, description  from film 
		inner join(
			select film_id from film ORDER BY title limit 50000.10
		) a USING(film_id);
Copy the code

Execution time: about 0.012 seconds
Note: Of course, because the number behind OFFSET is still large, the number of scanned data is still large. You can try to avoid OFFSET statements with your own business.

`Using filesort`Sorting (not involved in specific algorithms)

So, inUsing filesortHow does MySQL sort?

Two-pass sort of data transfer

Sort two data transfers: read the row pointer and the field to be sorted, and then read the corresponding row data based on the sorted result, as shown below (Has nothing to do with the specific storage format, just the definition).
Advantages: Sort stores as little data as possible, with more rows in the “sort buffer” (memory).
Disadvantages: Two data transfers, the second read generates a large number of random I/O, high cost.

Single-pass Sort of data transfer

Sort a data transfer: first read all the columns required by the query, sort according to the given column, and finally return the sorting result directly.
Advantages: Only one sequential I/O is required to read the data, no additional random I/O is required
Disadvantages: Sort redundant columns and take up space.

Which one?

The sum of all required columns in the queryORDER BYThe total column size exceedsmax_length_for_sort_data(valid before 8.0.20) bytestwo-passThe algorithm;BLOBorTEXTEven if notORDER BYUse it and will use ittwo-pass.

Simple understanding: The size of redundant columns is relatively large, and the space cost is relatively large.

show variables Like '%max_length_for%'
Copy the code

Variable_name	Value
max_length_for_sort_data	4096

Memory or disk?

Depending on the sort buffer size setting soft_buffer_size, if the sorted data is larger than the sort buffer size, the sort operation is done using temporary files from the disk. Ideally, sorting can be done in memory more efficiently, but soft_buffer_size is pre-allocated in version 5.7, and increasing it can lead to memory overuse. This has been improved in MySQL 8.0.12, where the optimizer allocates memory as needed (except when a lot of sorting queries concurrency… Not much). Also note that max_sort_length determines the column size of the sort buffer.

summary

SELECTWhen a non-indexed column is included,Order byWhether or not the statement uses index scanning depends on the query optimizer’s decision in a specific case as neededEXPLAIN.
Using filesortDivided intosingle_passandtwo_passData transfer process, and sorting parameters.

reference

High performance MySQL
Dev.mysql.com/doc/refman/…
Stackoverflow.com/questions/1…
Dev.mysql.com/doc/refman/…

preface

The role of the LIMIT

Optimization of aLIMITstatements

Using filesortSorting (not involved in specific algorithms)

Two-pass sort of data transfer

Single-pass Sort of data transfer

Which one?

Memory or disk?

summary

reference

Related Posts

The use of set in Redis

Is learning IT self-taught still system grooms good? How to choose?

Reading this summary of the G1 garbage collector is enough to blow your interviewer away

Optimization of a`LIMIT`statements

`Using filesort`Sorting (not involved in specific algorithms)