It was a fine day, and Little A was happily honing his slacking-off skills at his workstation.
Suddenly the product manager called ☎️ with a requirement: find everyone whose city is "Shanghai", and return the name and age of the first 1000 of them sorted by name.
Little A hurriedly sat up straight, dug the needed table out of the pile of tables in the database, and pulled up its definition:
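The table definition itself didn't survive the copy-paste; here is a minimal sketch of what the citizen table could look like, keeping only the fields this article actually uses (name varchar(16) is confirmed later by the packed_additional_fields discussion, and city varchar(16) + name varchar(16) + age int matches the total defined length of 36 mentioned below; the rest is an assumption):

CREATE TABLE `citizen` (
  `id` int NOT NULL,
  `city` varchar(16) NOT NULL,
  `name` varchar(16) NOT NULL,
  `age` int NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;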
He glanced at the table structure, then back at the product requirement.
It felt very easy, and he wrote the SQL in one go. Alas, the statement looked simple and unpretentious, as if the requirement had been answered perfectly. But to show off his performance-optimization skills and avoid a full table scan, he decided to give the city field an index.
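A sketch of the query and the index DDL Little A used (the query text and the index name city are taken from the explain output that follows; the exact DDL wording is an assumption):

select city, name, age from citizen where city = 'Shanghai' order by name limit 1000;

alter table citizen add index city(city);

After creating the index, verify with explain: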
explain select city, name, age from citizen where city = 'Shanghai' order by name limit 1000;
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-----------------------------+
| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                       |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-----------------------------+
|  1 | SIMPLE      | citizen | NULL       | ALL  | city          | NULL | NULL    | NULL |   32 |   100.00 | Using where; Using filesort |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-----------------------------+
1 row in set, 1 warning (0.00 sec)
The Using filesort in Extra means the query needs to sort. MySQL allocates a block of memory to each thread for sorting, called sort_buffer.
The product manager asked: so do you actually know how this order by is executed? Little A woke up with a start — he really had never thought about it.
The product manager sneered: do you even know what your city index looks like? You built it yourself and you've never looked? Fine, I'll just draw it for you:
- Index diagram of the city field
Records with primary keys id_x through id_(x+n) all satisfy city = 'Shanghai'.
Product manager: then walk me through the execution flow of this SQL. You can't? Fine, let me tell you:
1. Initialize sort_buffer and decide to put in three fields: name, city, and age.
2. From the city index, find the first primary key id that satisfies city = 'Shanghai', i.e. id_x.
3. Go to the primary key index, fetch the whole row by that id, take the values of the name, city, and age fields, and store them in sort_buffer.
4. From the city index, fetch the primary key id of the next record.
5. Repeat steps 3 and 4 until the value of city no longer meets the query condition; the corresponding primary key is id_y.
6. Quick-sort the data in sort_buffer by the name field.
7. Take the first 1000 rows of the sorted result and return them to the client.
This is called full-field sort; the execution flow is as described above.
Sorting by name may be done in memory, or it may require an external sort, depending on
- Memory required for sorting
- Parameter sort_buffer_size
sort_buffer_size is the size of the memory (sort_buffer) MySQL allocates for sorting. If the amount of data to sort is less than sort_buffer_size, the sort is done in memory. If the amount of data is too large to fit in memory, temporary files on disk have to be used to assist the sort.
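As an aside, a quick way to inspect and adjust this knob for the current session only — sort_buffer_size is a standard MySQL system variable, but the 4 MB value below is purely an illustrative assumption:

-- check the current per-sort memory budget, in bytes
SHOW VARIABLES LIKE 'sort_buffer_size';
-- raise it for this session only, e.g. to 4 MB
SET SESSION sort_buffer_size = 4194304;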
The product manager started showing off again: do you know when a sort statement has to use temporary files? Uh... that really hit another blind spot! The method below can be used to determine whether a sort statement used temporary files:
mysql> SET optimizer_trace='enabled=on';
Query OK, 0 rows affected (0.00 sec)
/* Save the initial value of Innodb_rows_read with @a */
mysql> select VARIABLE_VALUE into @a from performance_schema.session_status where variable_name = 'Innodb_rows_read';
Query OK, 1 row affected (0.00 sec)
/* Execute the statement */
mysql> select city, name, age from citizen where city='Shanghai' order by name limit 1000;
+----------+------+-----+
| city     | name | age |
+----------+------+-----+
| Shanghai | Java |  22 |
| ...      | ...  | ... |

/* View the OPTIMIZER_TRACE output */
SELECT * FROM `information_schema`.`OPTIMIZER_TRACE`\G
/* Use @b to save the current value of Innodb_rows_read */
mysql> select VARIABLE_VALUE into @b from performance_schema.session_status where variable_name = 'Innodb_rows_read';
Query OK, 1 row affected (0.00 sec)
/* Calculate the difference of Innodb_rows_read */
mysql> select @b-@a;
Check the number_of_tmp_files field in the OPTIMIZER_TRACE result to see if temporary files are used.
"filesort_execution": [
],
"filesort_summary": {
"rows": 4000
"examined_rows": 4000,
"number_of_tmp_files": 12,
"sort_buffer_size": 32664 ,
"sort_mode": "<sort_key, packed_additional_fields>"
- number_of_tmp_files
The number of temporary files used during the sort. Why 12 files? If the data doesn't fit in memory, an external sort is needed, and external sorts generally use merge sort: MySQL splits the data to be sorted into 12 pieces, sorts each piece in memory and writes it to a temporary file, then merges the 12 sorted files into one large sorted file.
If sort_buffer_size exceeds the amount of data to sort, number_of_tmp_files is 0 and the sort is done entirely in memory. Otherwise it has to sort in temporary files: the smaller sort_buffer_size is, the more pieces the data is split into, and the larger number_of_tmp_files becomes (a quick way to see this is sketched right after this list).
- examined_rows
The number of rows participating in the sort. The test table has 4000 records satisfying city = 'Shanghai', so this value is 4000.
- sort_mode: packed_additional_fields
The packed_additional_fields in sort_mode means strings are packed during sorting: even though the name field is defined as varchar(16), space is allocated according to the actual length of each value during the sort.
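As mentioned in the number_of_tmp_files item above, you can watch that value react to the sort buffer size by rerunning the same trace with a larger session buffer. A sketch, assuming 4 MB comfortably exceeds the volume of these 4000 small rows:

mysql> SET SESSION sort_buffer_size = 4194304; /* assumed large enough here */
mysql> SET optimizer_trace='enabled=on';
mysql> select city, name, age from citizen where city='Shanghai' order by name limit 1000;
mysql> SELECT * FROM `information_schema`.`OPTIMIZER_TRACE`\G
/* with enough memory, filesort_summary should now report "number_of_tmp_files": 0 */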
The result of select @b-@a is 4000, meaning the whole execution scanned only 4000 rows.
Note that to avoid interfering with this conclusion, I set internal_tmp_disk_storage_engine to MyISAM. Otherwise the result of select @b-@a would be 4001: the default value of internal_tmp_disk_storage_engine is InnoDB, and with InnoDB the value of Innodb_rows_read is increased by 1 when data is fetched from the temporary table.
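For reference, the setting mentioned above would be applied like this. It only works on MySQL versions that still have this variable (it was removed in later 8.0 releases), so treat it as version-dependent:

-- make disk-based internal temporary tables use MyISAM instead of InnoDB
SET GLOBAL internal_tmp_disk_storage_engine = 'MyISAM';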
Little A looked at the product manager in awe, as if at a guru: boss, how about you take over my code and I go do product?
Rowid sort
The algorithm above only reads the original table once; the rest of the work is done in sort_buffer and temporary files. But there is a problem: if the query returns many fields, sort_buffer has to hold many fields per row, fewer rows fit in memory at a time, more temporary files are needed, and sorting performance gets worse. So if a single row is large, this method is not efficient enough. The product manager started making trouble again: then do you know what MySQL does when it thinks a single row to sort is too long?
Now change a parameter to let MySQL use a different algorithm.
SET max_length_for_sort_data = 16;
- max_length_for_sort_data
A MySQL parameter that controls the maximum row length of the data used for sorting. If the length of a single row exceeds this value, MySQL considers the row too large and switches to a different algorithm.
The total defined length of the city, name, and age fields is 36, so setting max_length_for_sort_data to 16 makes every row count as "too long".
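If you later want to check or undo this knob (the server default differs across MySQL versions, so none is hard-coded here), a session-level sketch:

SHOW VARIABLES LIKE 'max_length_for_sort_data';
-- put the session value back to the server's global/default setting
SET max_length_for_sort_data = DEFAULT;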
The new algorithm puts only the column to be sorted (the name field) and the primary key id into sort_buffer. Since the sort result no longer contains the city and age fields, it cannot be returned directly, and the execution flow becomes:
1. Initialize sort_buffer and decide to put in two fields: name and id.
2. From the city index, find the first primary key id that satisfies city = 'Shanghai', i.e. id_x in the diagram.
3. Go to the primary key index, fetch the row, take the name and id fields, and store them in sort_buffer.
4. From the city index, fetch the primary key id of the next record.
5. Repeat steps 3 and 4 until city = 'Shanghai' is no longer satisfied, i.e. id_y in the diagram.
6. Sort the data in sort_buffer by the name field.
7. Iterate over the sorted result, take the first 1000 rows, and go back to the original table by id to fetch the city, name, and age fields, returning them to the client.
Hearing this, Little A felt he understood a little: don't worry, product manager, look at this schematic I drew of the rowid sort execution flow — does it look right? Note that step 7 has to go back to the citizen primary key index one more time to fetch the data.
Note that the final "resultSet" here is a logical concept: the MySQL server fetches ids one by one from the sorted sort_buffer and looks up the city, name, and age fields in the original table; it does not need extra server memory to hold the result set, but returns the rows to the client directly.
Look at the OPTIMIZER_TRACE result for rowid sort and see what the difference is
"filesort_execution": [
],
"filesort_summary": {
"rows": 4000
"examined_rows": 4000,
"number_of_tmp_files": 10,
"sort_buffer_size": 32728 ,
"sort_mode": "<sort_key, rowid>"
- The result of select @b-@a becomes 5000: besides the sort itself, after sorting completes the query has to go back to the original table by id, and since the statement is limit 1000, it reads 1000 more rows.
- sort_mode becomes <sort_key, rowid>, indicating that only the name and id fields participate in the sort.
- number_of_tmp_files drops to 10: the number of rows participating in the sort is still 4000, but each row is smaller, so the total amount of data to sort is smaller and fewer temporary files are needed.
The product manager finally concluded:
- If MySQL thinks the sort memory is too small and will hurt sort efficiency, it uses rowid sort: more rows can be sorted at a time, but it has to go back to the table at the end.
- If MySQL decides the memory is large enough, it prefers full-field sort: it keeps all needed fields in sort_buffer and returns the query result straight from memory after sorting, with no need to go back to the table.
This reflects a design principle of MySQL: if there is enough memory, use more memory and minimize disk access.
For InnoDB tables, rowid sort needs to go back to the table, which causes more disk reads, so it is not preferred. Either way, sorting in MySQL is a relatively expensive operation.
- Do all order by statements need a sort? If the correct result can be obtained without sorting, the system cost is much lower and the statement runs much faster.
Not every order by requires a sort operation. MySQL only has to set up sort_buffer (and possibly temporary files) and sort because the original data is unordered.
- If we could guarantee that the rows retrieved from the city index were already in ascending order by name, could we skip the sort?
Yes.
So create a joint index on (city, name):
alter table citizen add index city_name(city, name);
- Diagram of the (city,name) index
You can still use tree search to locate the first record satisfying city = 'Shanghai', and you are guaranteed that, as you walk to the "next record" in order, as long as city is still 'Shanghai' the name values are already in ascending order. The flow of the whole query then becomes:
1. From the (city,name) index, find the first primary key id satisfying city = 'Shanghai'.
2. Go to the primary key index, fetch the whole row, take the values of name, city, and age, and return them directly as part of the result set.
3. From the (city,name) index, fetch the primary key id of the next record.
4. Repeat steps 2 and 3 until the 1000th record is found, or until the condition city = 'Shanghai' is no longer satisfied.
- Execution flow diagram of the query after the (city,name) joint index is introduced
As you can see, this query procedure does not require temporary tables or sorting.
- Use explain to view the execution plan of the query with the (city,name) joint index
explain select city, name, age from citizen where city = 'Shanghai' order by name limit 1000;
+----+-------------+---------+------------+------+---------------+------+---------+-------+------+----------+-----------------------+
| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra                 |
+----+-------------+---------+------------+------+---------------+------+---------+-------+------+----------+-----------------------+
|  1 | SIMPLE      | citizen | NULL       | ref  | city,name     | name | 51      | const | 4000 |   100.00 | Using index condition |
+----+-------------+---------+------------+------+---------------+------+---------+-------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)
Extra no longer contains Using filesort, so no sorting is needed. And because the (city,name) joint index is itself ordered, the query doesn't have to read all 4000 rows: it only has to find the first 1000 records that meet the condition and can stop. In this case only 1000 rows are scanned.
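If you want to double-check the row count, you can reuse the Innodb_rows_read measurement from earlier in the article; no particular figure is claimed here, this just shows how to measure it under the new index:

mysql> select VARIABLE_VALUE into @a from performance_schema.session_status where variable_name = 'Innodb_rows_read';
mysql> select city, name, age from citizen where city='Shanghai' order by name limit 1000;
mysql> select VARIABLE_VALUE into @b from performance_schema.session_status where variable_name = 'Innodb_rows_read';
mysql> select @b-@a;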
Is it possible to simplify the execution of this statement further?
- Covering index
The index already contains enough information to satisfy the query, so there is no need to go back to the primary key index to fetch the data.
With a covering index, the execution flow of this query can be optimized further. For this query, we can create a joint index on city, name, and age. The corresponding SQL statement is as follows:
alter table citizen add index city_user_age(city, name, age);
In this index, rows with the same city value are already in ascending order of name, so the query no longer needs to sort. The entire execution flow becomes:
1. From the (city,name,age) index, find the first record satisfying city = 'Shanghai', take the values of the city, name, and age fields, and return them directly as part of the result set.
2. Fetch the next record from the (city,name,age) index, take the same three fields, and return them directly as part of the result set.
3. Repeat step 2 until the 1000th record is found, or until city = 'Shanghai' is no longer satisfied.
- Execution flow diagram of the query after the (city,name,age) joint index is introduced
- Use explain to view the execution plan of the query with the (city,name,age) joint index
explain select city, name, age from citizen where city = 'Shanghai' order by name limit 1000;
+----+-------------+---------+------------+------+---------------+------+---------+-------+------+----------+--------------------------+
| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra                    |
+----+-------------+---------+------------+------+---------------+------+---------+-------+------+----------+--------------------------+
|  1 | SIMPLE      | citizen | NULL       | ref  | city,name,age | age  | 51      | const | 4000 |   100.00 | Using where; Using index |
+----+-------------+---------+------------+------+---------------+------+---------+-------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
The Extra field now also contains "Using index", which means the covering index is used, so performance is noticeably better. This does not mean you should build a joint index over all the fields of every statement just so each query can be covered; after all, indexes take space.