Preface

Recently I have really felt the pain of transition. I am slowly building up knowledge, but my understanding is still not deep enough.

  • The following test cases use the sakila sample database, which can be downloaded from the official website.

The role of LIMIT

  • As we all know, LIMIT restricts the rows a query returns. The second and third statements below are equivalent to each other:
-- Statement 1, as shown in figure 1
select * from film limit 5;
-- Statements 2 and 3, as shown in figure 2
select * from film limit 6, 5;
select * from film limit 5 OFFSET 6;
  • The following 2 figures illustrate LIMIT; the red part is the result.

Optimizing a LIMIT statement

  • For the sakila.film table, run LIMIT queries on the corresponding fields and use EXPLAIN to analyze their execution plans.

Note: 1. The title field is indexed by idx_title, and its type is VARCHAR(255).

  • In my actual work I ran into a slow query involving LIMIT that needed optimizing. Here I use an equivalent statement on the film table; the parameters are slightly exaggerated to show the difference in query time, as shown below.
-- Sort by movie title, using LIMIT for paging
select film_id, description from film order by title limit 50000, 10;
  • You can start by looking at the execution plan with EXPLAIN, as follows:
explain select film_id, description from film order by title limit 50000, 10;
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE film NULL ALL NULL NULL NULL NULL 64818 100.00 Using filesort
  • Execution time: about 0.15s

  • We can see that type is ALL and Extra is Using filesort: MySQL scans the whole table, sorts the rows by the title field, discards the first 50000 rows according to the OFFSET value, and returns the LIMIT number of rows.

  • Note that although the title field has the index idx_title, MySQL did not use it for this query (key is NULL in the plan above).

  • We can force MySQL to use the idx_title index.

explain select film_id, description from film force index(`idx_title`) order by `title` limit 50000, 10;
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE film NULL index NULL idx_title 767 NULL 50010 100.00 NULL
  • Execution time: about 0.08 seconds

  • As you can see, whether an ORDER BY statement uses an existing index for sorting is decided by the query optimizer, so you need to check the execution plan with EXPLAIN. The reason is given in the official documentation:

The query uses SELECT *, which may select more columns than key_part1 and key_part2. In that case, scanning an entire index and looking up table rows to find columns not in the index may be more expensive than scanning the table and sorting the results. If so, the optimizer probably will not use the index.

In other words, the query optimizer will not use the index when it estimates that scanning the index and then looking up each row in the table costs more than scanning the whole table and sorting. The exact decision depends on the implementation of the query optimizer.

  • A common optimization method is the deferred join (deferred association): first find the primary keys of the required rows using a covering index, then join on the primary key to fetch the remaining columns. The specific statement is as follows:
select film_id, description from film
    inner join (
        select film_id from film order by title limit 50000, 10
    ) a using (film_id);
  • Execution time: about 0.012 seconds
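
To check that the inner query really is answered from the index alone, the same statement can be run under EXPLAIN. This is a minimal sketch, assuming film is an InnoDB table (so the secondary index idx_title also carries the primary key film_id); the exact plan rows depend on the MySQL version and table statistics.

```sql
-- Assumption: film is an InnoDB table, so idx_title(title) implicitly contains film_id.
-- In the plan row for the inner scan of film, Extra would be expected to show
-- "Using index", meaning the skipped entries are read from idx_title only
-- and the table rows themselves are not touched.
explain select film_id, description from film
    inner join (
        select film_id from film order by title limit 50000, 10
    ) a using (film_id);
```

Only the 10 surviving film_id values are then looked up in the table to fetch description, which is where the saving comes from.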

  • Note: because the OFFSET value is still large, the number of rows scanned is still large. Depending on your business logic, you can try to avoid OFFSET altogether.
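
One way to avoid OFFSET, when the application can remember where the previous page ended, is to seek past the last row seen instead of counting rows (often called keyset or seek pagination). This is a different technique from the deferred join above. The sketch below assumes pages are ordered by title and then film_id as a tie-breaker; the variable names and values are hypothetical placeholders, not taken from the original query.

```sql
-- Hypothetical: title and film_id of the last row shown on the previous page.
set @last_title = 'ACADEMY DINOSAUR';
set @last_film_id = 1;

-- Fetch the next 10 rows that sort strictly after that row.
-- film_id breaks ties between films that share a title.
select film_id, description
from film
where title > @last_title
   or (title = @last_title and film_id > @last_film_id)
order by title, film_id
limit 10;
```

The trade-off is that this only supports "next page" style navigation; jumping straight to an arbitrary page number still needs an offset.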

Using filesort sorting (specific algorithms not covered)

  • So, when the plan shows Using filesort, how does MySQL sort?

Two-pass sort (two data transfers)

  • Two-pass sort: read the row pointers and the fields to be sorted, sort them, and then read the corresponding row data based on the sorted result, as shown in the figure below (it is independent of the specific storage format and only illustrates the definition).
  • Advantages: as little data as possible is stored during the sort, so the "sort buffer" (memory) can hold more rows.
  • Disadvantages: two data transfers are required, and the second read generates a large amount of random I/O, which is costly.

Single-pass sort (one data transfer)

  • Single-pass sort: first read all the columns required by the query, sort by the given columns, and finally return the sorted result directly.
  • Advantages: only one sequential read of the data is required, with no additional random I/O.
  • Disadvantages: the extra columns are carried through the sort and take up space.

Which one?

  • When the total size of all columns required by the query plus the ORDER BY columns exceeds max_length_for_sort_data bytes (effective before 8.0.20), the two-pass algorithm is used; if any required column is BLOB or TEXT, two-pass is used even if that column does not appear in ORDER BY.

Simply put: when the extra columns are large, the space cost of carrying them through the sort is high.

show variables like '%max_length_for%';
Variable_name Value
max_length_for_sort_data 4096
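
To see which mode a particular query actually used, one option is the optimizer trace. This is a hedged sketch: it assumes a MySQL version that provides the optimizer_trace session variable and the information_schema.OPTIMIZER_TRACE table, and the exact labels in the trace vary between versions.

```sql
-- Turn tracing on for this session only.
set session optimizer_trace = 'enabled=on';

-- Run the query to be inspected.
select film_id, description from film order by title limit 50000, 10;

-- Read the trace of the last statement. In the filesort_summary section,
-- look at sort_mode: a value mentioning rowid indicates the two-pass style,
-- while a value mentioning additional_fields indicates the single-pass style.
select trace from information_schema.optimizer_trace;

-- Turn tracing off again.
set session optimizer_trace = 'enabled=off';
```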

Memory or disk?

It depends on the sort buffer size setting, sort_buffer_size: if the data to be sorted is larger than the sort buffer, the sort is done using temporary files on disk. Ideally the sort completes in memory, which is more efficient, but in version 5.7 sort_buffer_size is allocated in full up front, so increasing it can lead to excessive memory use. This was improved in MySQL 8.0.12, where the optimizer allocates memory as needed (except when many sorting queries run concurrently… I will not go into that here). Also note that max_sort_length limits how many bytes of each column value are used when sorting.
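
A rough way to tell whether a sort spilled to disk is to watch the Sort_merge_passes status counter around the query. The sketch below is illustrative only; the 4 MB value is an arbitrary assumption, not a recommendation from the original text.

```sql
-- Current sort buffer size, in bytes.
show variables like 'sort_buffer_size';

-- The counter is cumulative for the session, so note its value first.
show session status like 'Sort_merge_passes';

select film_id, description from film order by title limit 50000, 10;

-- If the counter increased, the sort had to merge temporary files on disk.
show session status like 'Sort_merge_passes';

-- Hypothetical tuning step: raise the buffer to 4 MB for this session only.
set session sort_buffer_size = 4194304;
```

Setting the value at session level keeps the change away from other connections, which matters given the memory concern described above.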

Summary

  • When SELECT includes non-indexed columns, whether the ORDER BY statement uses an index scan depends on the query optimizer's decision; check the specific case with EXPLAIN as needed.
  • Using filesort covers the single-pass and two-pass data transfer processes, along with the related sorting parameters.

References

  • High Performance MySQL
  • Dev.mysql.com/doc/refman/…
  • Stackoverflow.com/questions/1…
  • Dev.mysql.com/doc/refman/…