sequence
This article is a supplement to the cognitive article, focusing on MySQL’s cost-based optimizer.
Post a query flow chart first
Cost-based query optimizer
A query can be executed in many ways, all returning the same result. The optimizer’s job is to find the best execution plan. The execution plan is the query tree, which consists of a series of physical operators that form the execution plan of the query according to certain operation relationships.
MySQL uses a cost-based optimizer, which will try to predict the cost of a query using a certain execution plan and select the one with the least cost. Here are some default cost configuration items:
It can be seen (cost in descending order) :
key_compare_cost
Represents the cost of key comparison. The more key comparisons, the more expensive this term is, and this is an important metric, with a default value of 0.05.memory_temptable_row_cost
Represents the row cost of a temporary table in memory. The default value is 0.1.row_evaluate_cost
If the number of qualified rows increases, the cost of this item increases. Therefore, this is an important metric. The default value is 0.1.memory_block_read_cost
The cost of memory reading a page of data is 0.25.disk_temptable_row_cost
, represents the row cost of a temporary table file (MyISAM or InnoDB). The default value is 0.5.memory_temptable_create_cost
Represents the creation cost of temporary tables in memory. The default value is 1.io_block_read_cost
The cost of disk reading a page of data is 1.disk_temptable_create_cost
, represents the creation cost of temporary table files (MyISAM or InnoDB). The default value is 20.
The optimizer estimates costs based on statistics such as the number of pages per table or index, index cardinality (the number of different values in the index), index and data row length, and index distribution. These statistics are obtained by the optimizer from the storage engine.
Why does the optimizer go wrong
The optimizer may choose a less-than-optimal execution plan due to some factors, such as:
- Inaccurate statistics. For example, InnoDB cannot maintain accurate statistics on the number of rows in a table due to its MVCC architecture.
- The estimated cost of executing the plan is not the same as the actual cost of executing it. For example, when multiple pages of data are read sequentially or the data is stored in memory, the optimizer has no way of knowing how many physical IO it actually takes to read the data.
- MySQL optimality does not necessarily mean the fastest execution time, i.e. lower cost does not mean faster execution.
- MySQL does not consider concurrent queries, does not consider caching.
- MySQL is not cost-based optimization all the time.
- MySQL does not consider the cost of operations that are not under its control.
- MySQL cannot estimate all possible execution plans.
What optimizers do in general
MySQL has two optimization strategies. One is static optimization, which can directly analyze and optimize the parsing tree. For example, by doing some simple algebraic transformations to convert the WHERE condition into another equivalent form. Static optimization does not depend on specific values and remains effective after the first completion, even if the query is repeatedly executed with different parameters. The other is dynamic optimization, which depends on the query context, such as the value in the WHERE condition, the number of rows corresponding to the entry in the index, and so on.
Here are some optimizations that MySQL can handle:
- Redefine the order of associated tables
- Convert an outer join to an inner join
- Use the equivalence transformation rule
- Optimize COUNT(), MIN(), and MAX()
- Estimate and convert to a constant expression
- Overwrite index scan
- Subquery optimization
- Submit termination query
- Equivalent transmission
- Comparison of the list IN()
Optimization of associated query
What is associated query? MySQL performs nested loop association for any association, which means that MySQL loops through one table to retrieve a single item of data, then loops through the next table to find matching rows, and so on, until it finds matching behavior in all tables. Temporary tables are used in some cases: when a subquery is encountered in the FROM clause, the subquery is executed first and the results are placed in a temporary table; A series of query results are put into temporary tables for UNION queries. Note here that temporary tables use the Memory storage engine and do not have any indexes.
When dealing with associated queries, the optimizer changes the association order to allow fewer nested loops and backtracking. It can be simply understood as keeping the result set of the first query as small as possible to reduce the data range of the next query.
Optimization of sorting
Sorting is an expensive operation, so if possible sort by index. If it is not possible to use the index to generate the sort result, MySQL will need to do the sort itself, in memory when the data is small, in disk when the data is large. MySQL’s own sorting process is called file sorting.
MySQL > select * from ‘sort’;
Two-pass :(old version) reads the row pointer and the field to be sorted, sorts it, and then reads the desired rows based on the sorted result. Disadvantages: Two data transfers, and the second time will generate a lot of random IO Advantages: store as little data as possible when sorting, which makes it possible to hold as many rows in the sort cache for sorting
Single-pass: (new version) First reads the rows and columns required by the query, then sorts the query based on the given column, and finally returns the sorting result directly. Advantages: One data transfer, no random IO. Disadvantages: Very many and large columns are returned. Takes up a lot of extra space
If all columns in the ORDER BY clause are from the first table associated with the optimizer, then MySQL will sort the files in the first table associated with the optimizer.
other
Simple lists are no progress in writing articles.
If you find this article helpful, please leave a red like before you go, or browse the author’s other articles if you are interested. If you think this article is a waste of your time, please leave your comments in the comments section. If you have any questions, please leave a message. The author is willing to spend time and energy to find answers and discuss them together.