MySQL Order by execution flow

It is common to use order BY in your work, but what do you know about the mysql Order execution flow?

For example 🌰, suppose we have a table

CREATE TABLE `t` (
`id` int(11) NOT NULL,
`city` varchar(16) NOT NULL, `name` varchar(16) NOT NULL, `age` int(11) NOT NULL,
`addr` varchar(128) DEFAULT NULL.PRIMARY KEY (`id`),
KEY `city` (`city`)
) ENGINE=InnoDB;
Copy the code

And then you execute one

select city,name,age from t where city='hangzhou' order by name limit 1000;
Copy the code

Full field scan

Let’s use the Explain command to see how this statement executes

The “Using filesort” field in Extra indicates that the sorting is required. The “Using filesort” field indicates that the index is used. MySQL gives each thread a memory called “sort_buffer” for sorting.

The above SQL statement execution process is simply summarized

  1. Initialize sort_buffer to determine the select field.
  2. Find all columns that meet the criteria in the index, get the entire row in the primary key, retrieve the select field, and put it into sort Buffer.
  3. Sort the contents of sort Buffer (the above SQL sort by name, this sort action is done in memory).
  4. Take the number of lines of limit and return it to the front end.

⚠️ Note that all values that meet the criteria are put into sort Buffer.

Since all the values are put into the sort buffer, the sort buffer is memory, so it may not fit, so there is a situation where the sort buffer does not fit. Sort_buffer_size specifies the size of sort Buffer. If the sort buffer does not fit, MySQL will need temporary files.

/* Open optimizer_trace, valid only for this thread */ 
SET optimizer_trace='enabled=on';
/* @a saves the initial value of Innodb_rows_read */
select VARIABLE_VALUE into @a from performance_schema.session_status where variable_name = 'Innodb_rows_read'
/* Execute the statement */
select city, name,age from t where city='hangzhou' order by name limit 1000;
/* View the OPTIMIZER_TRACE output */
SELECT * FROM `information_schema`.`OPTIMIZER_TRACE`\G
/* @b saves the current value of Innodb_rows_read */
select VARIABLE_VALUE into @b from performance_schema.session_status where variable_name = 'Innodb_rows_read';
/* Calculate the Innodb_rows_read difference */ 
select @b-@a;
Copy the code

This method is verified by looking at the result of OPTIMIZER_TRACE, which you can see from number_of_tmp_files to see if temporary files are used.

Number_of_tmp_files is the number of temporary files used during sorting. MySQL divides the data that needs to be sorted into 12 pieces, and each piece is sorted separately and stored in these temporary files. Then combine the 12 ordered files into one large ordered file. Number_of_tmp_files is 0, which means that sorting can be done directly in memory.

The rowid sorting

In the full field scan above, only the original table data is read once, and the rest is done in sort_buffer and temporary files. If you select a large number of columns from the sort buffer, the space of the sort buffer will be limited. If you want to put as few columns as possible into the sort buffer, you will only have the columns to sort (i.e. the name field and the primary key ID). The entire execution process is as follows:

  1. Initialize sort_buffer to place the field (name) and primary key ID that you want to sort.
  2. Find the ID of the column that meets the WHERE condition, get the column that you want to sort from the cluster index, and put it into sort buffer.
  3. Sort the contents of sort Buffer.
  4. Fetches the limit number of rows for the sorted column.
  5. Retrieve other fields from the clustered index by ID. Back to the front end.

Sorting data is a very expensive thing. If the data itself is in order, there is no need to sort it. How can we ensure that it is in order? That’s the SQL we’re executing up thereSelect city,name,age from t where city=' hangzhou 'order by name limit 1000;To do this, create a federated indexalter table t add index city_user(city, name);Why is creating a federated index ok? Recall for yourself the data structure of a federated index.

You can use tree search to locate the first record that satisfies city=’ Hangzhou ‘, and additionally ensure that the value of name must be in order as long as the value of city is Hangzhou in the following “next record” traversal. The execution process is as follows:

  1. Find columns that meet the WHERE condition based on the index (city,name)
  2. Uncluster the index to retrieve the desired field and return it to the front end (repeat the median limit number of times)

A quick thought: would it still work if you created a syndicated index with name on the left and city on the right?

The above process does not need to fetch columns from the clustered index if there are already columns in the index that need to be returned. In the case of the above query, we do not need to fetch data from the clustered index if we create the federated index (city,name,age).