As for MySQL, only by more comprehensive understanding of each process of SQL statement execution can WE better design and optimize SQL. When you want MySQL to run queries with higher performance, it’s best to figure out how MySQL optimizes and executes queries. Once you understand this, a lot of query optimization is actually about following some principles and working the way you expect it to work reasonably. Here’s what MySQL does when it sends a request to MySQL:

  1. The client sends a query to the server.

  2. The server first checks the query cache, and if a hit is made, it immediately returns the result stored in the cache. Otherwise move on to the next stage.

  3. SQL is parsed and preprocessed on the server, and then the optimizer generates the corresponding execution plan.

  4. MySQL then calls the storage engine API to execute the query based on the execution plan generated by the optimizer.

  5. Returns the result to the client.

SQL statement execution process

The query cache

The MySQL query cache holds the complete structure of the query returned. When a query hits the cache, MySQL returns results immediately, skipping parsing, optimization, and execution. The query caching system keeps track of every table involved in a query, and if the tables change, all cached data associated with that table is invalidated. MySQL stores the cache in a reference table, referenced by a hash value that includes the query itself, the current database being queried, the version of the client protocol, and other information that may affect the results returned. When determining whether or not the cache hits, MySQL does not parse the query, but instead uses the SQL statement and other raw information sent by the client. Therefore, any character differences, such as Spaces, annotations, etc., will result in a cache miss. If there is some indeterminate data in the query, it will not be cached. For example, queries that contain the function NOW() or CURRENT_DATE() are not cached. Tables that contain any user-defined functions, storage functions, user variables, temporary tables, system tables in the mysql database, or tables that contain any column-level permissions will not be cached. It is important to note that MySQL does not not check the query cache just because the query contains an indeterminate function, because MySQL does not parse the query before checking the query cache, so it has no way of knowing if there are indeterminate functions in the statement. The fact is, if the query contains any undefined function, the query result will not be cached because it will not be found in the query cache. The configuration for the query cache is shown below.

  • Query_cache_type: indicates whether query cache is enabled. The value can be OFF, ON, or DEMAND. DEMAND indicates that only statements that explicitly specify SQL_CACHE in the query statement will be placed in the query cache.

  • Query_cache_size: Queries the total memory space used by the cache.

  • Query_cache_min_res_unit: The smallest unit of memory allocated in the query cache. A smaller value reduces the amount of memory space wasted by fragmentation, but results in more frequent memory block operations.

  • Query_cache_limit: indicates the maximum query result that MySQL can query. If the query result is greater than this value, it will not be cached. Since the query cache attempts to cache data at the time the data is generated, MySQL does not know if the query results exceed the limit until all the results are returned. After that, the results are removed from the query cache.

Query cache optimization is an important part of database performance optimization. The judging process is roughly as shown in the figure below.

Query cache judgment flowchart

The cache hit ratio can be calculated using the following formula: Qcache_hits/(Qcache_hits + Com_select).

Parsing and preprocessing

The parser parses SQL statements by keyword and generates the corresponding parse tree. The MySQL parser validates and parses queries using MySQL syntax rules. The preprocessor further checks that the parse book is valid based on some MySQL rules, such as checking for tables and columns, and parsing names and aliases to see if they are ambiguous.

Query optimizer

The query optimizer transforms the parse tree into an execution plan. A query can be executed in multiple ways, all returning the same result. The optimizer’s job is to find the best execution plan. The process of generating an execution plan can be time consuming, especially if there are many alternative execution plans. If the final execution plan corresponding to an SQL statement is cached during the execution of the statement, the cached execution plan can be directly used when similar statements are entered into the server again. In this way, the whole process of generating an execution plan for SQL statements is skipped and the execution speed of statements is improved.

Execution plan cache

MySQL uses a cost-based query Optimizer (CBO). It tries to predict the cost of a query using a certain execution plan and selects the one that costs the least. The optimizer will transform the relational expression according to the optimization rules. The transformation here means that one relational expression will generate another relational expression after the optimization rules, and the original expression will also be retained. After a series of transformations, multiple execution plans will be generated. Then CBO will calculate the Cost of each execution plan based on statistical information and Cost Model, and select the execution plan with the lowest Cost. It can be seen from the above that there are two dependencies in CBO: statistical information and cost model. Whether the statistical information is accurate or not and whether the cost model is reasonable or not will affect CBO to choose the optimal plan. The principle of the optimizer is very complicated, so I won’t go into details here. You can learn by yourself.

Query execution engine

During the parsing and optimization phase, MySQL will generate the corresponding execution plan of the query, and the query execution engine of MySQL will complete the entire query according to this execution plan. Here the execution plan is a data structure, rather than generating the corresponding bytecode as other relational databases do.

Returns the result to the client

If the query can be cached, the MySQL page will store the results in the query cache at this stage. MySQL returns the result set to the client in an incremental, step-by-step process. After the query produces the first result, MySQL can start gradually returning the result set to the client.

Subscribe to the latest articles, please follow my wechat official account

reference

  • SQL optimizer principle, the query optimizer review on https://zhuanlan.zhihu.com/p/40478975

  • High Performance MySQL

  • InnoDB Storage Engine