1. The client sends a query to the server.

  2. The server first checks the query cache, and if it hits the cache, it immediately returns the results stored in the cache. Otherwise move on to the next stage;

  3. SQL parsing and preprocessing are performed on the server, and then the optimizer generates the corresponding execution plan.

  4. MySQL invokes the storage engine API to execute the query according to the execution plan generated by the optimizer.

  5. Returns the result to the client.

Query cache

Before parsing a query statement, if the query cache is open, MySQL checks first to see if the query matches the data in the query cache. This check is implemented through a case-sensitive hash lookup. Even if the query differs from the cached query by only one byte, it will not match the cached result, in which case the query will proceed to the next stage of processing.

If the current query happens to hit the query cache, MySQL checks user permissions before returning the query result. There is still no need to parse the query SQL statement, because the query cache already holds the table information that the current query needs to access. If the permissions are ok, MySQL skips all other stages and gets the results directly from the cache and returns them to the client. In this case, the query is not parsed, no execution plan is generated, and no execution is performed.

Cache configuration parameters:

Query_cache_limit: The maximum number of results that MySQL can cache. If the value is exceeded, increase the value of Qcache_not_cached and delete the query result

Query_cache_min_res_unit: Minimum unit size for allocating memory blocks

Query_cache_size: The total memory space used by the cache, in bytes. This value must be a multiple of 1024, otherwise MySQL may actually allocate different memory space.

Query_cache_type: indicates whether the cache is enabled. OFF: indicates that the cache is disabled. ON: indicates that the cache is always enabled

Query_cache_wlock_invalidate: Whether data is still returned from the cache if a table is locked. The default value is OFF to indicate that data can still be returned

Syntax parsers and preprocessors

First, MySQL parses SQL statements by keyword and generates a corresponding “parse tree”. The MySQL parser validates and parses queries using MySQL syntax rules. For example, it verifies that the wrong keywords are used, or that the keywords are used in the right order, or that the quotes match correctly.

The preprocessor further checks the parse tree for validity based on some MySQL rules, such as tables and columns, and resolves names and aliases to see if they are ambiguous.

The next step is for the preprocessor to validate the permissions, which is usually quick unless there are a lot of permissions set on the server.

Query optimizer

The syntax tree is now considered legal and is translated into an execution plan by the optimizer. A query can be executed in many ways, all returning the same result. The optimizer’s job is to find the best execution plan.

MySQL uses a cost-based optimizer, which will try to predict the cost of a query using a certain execution plan and select the one with the least cost. Initially, the minimum unit of cost was the cost of randomly reading a PAGE of 4K data, but later the costing formula became more complex and “factors” were introduced to estimate the cost of certain operations, such as when performing a WHERE conditional comparison. You can find the cost of the current query calculated by MySQL by querying the value of last_query_cost of the current session.

There are many reasons why the MySQL optimizer may choose the wrong execution plan, such as:

  1. Statistical information is not accurate.

  2. The cost estimate in the execution plan is not equivalent to the actual cost of the execution plan.

  3. MySQL optimality may not be what you think optimality is.

  4. MySQL never considers other concurrent queries, which can affect the speed of the current query.

  5. MySQL is not always cost-based optimization, but sometimes it is based on fixed rules.

  6. MySQL does not consider costs that are not under its control, such as the cost of executing stored procedures or user-defined functions.

MySQL query optimization uses a number of optimization strategies to generate an optimal execution plan. Optimization strategies can be divided into two types, static optimization and dynamic optimization. Static optimization can directly analyze the analytic tree and complete optimization. For example, the optimizer can convert the WHERE condition into another equivalent form by some simple algebraic transformations. Static optimization does not depend on particular values, such as some constants substituted in the WHERE condition. Static optimization persists after the first completion, even if the query is repeated with different parameters, and can be considered a “compile-time optimization.”

Dynamic optimization, on the other hand, depends on the context of the query. It can also be related to many other factors, such as the value in the WHERE condition, the number of rows corresponding to the entry in the index, and so on, which need to be reevaluated with each query and can be considered “runtime optimization.”

Here are some of the optimizations MySQL can handle:

  1. Redefine the order of associated tables

Table associations are not always performed in the order specified in the query, and determining the order of associations is an important part of the optimizer’s function.

  1. Convert an outer join to an inner join

Not all outer JOIN statements must be executed as outer joins. Many factors, such as where conditions and library table structure, can make an outer join equivalent to an inner join. MySQL can recognize this and rewrite the query so that it can adjust the association order.

  1. Use the equivalence transformation rule

MySQL can simplify and standardize expressions by using some equivalent transformations. It can merge and reduce some comparisons, and it can remove some judgments that are always true and some that are never true. For example :(5=5 and a>5) would be rewritten as a>5. Similarly, if I have a5 and b=c and a=5.

  1. Optimize count(), min(), and Max ()

Whether indexes and columns are empty or not usually helps MySQL optimize such expressions. For example, to find the minimum value of a column, you only need to query the left end of the corresponding B-tree index. MySQL can directly retrieve the first row of the index. This can be taken advantage of when the optimizer generates the execution plan. In the B-tree index, the optimizer treats this expression as a constant. Similarly, to find a maximum value, you only need to read the last record in the B-tree index. If MySQL uses this type of optimization, you can see “Select Tables Optimized Away” in Explain. Literally, it indicates that the optimizer has removed the table from the execution plan and replaced it with a constant.

Similarly, count(*) queries without any WHERE conditions can often use some of the optimizations provided by the storage engine. For example, MyISAM maintains a variable to hold the number of rows in a table.

  1. Estimate and convert to a constant expression

  2. Overwrite index scan

When the columns in the index contain all the columns needed in the query, MySQL can use the index to return the required data without querying the corresponding rows.

  1. Subquery optimization

MySQL can convert subqueries into a more efficient form in some cases, reducing the number of queries accessing data multiple times.

  1. Early termination of query

MySQL can always terminate the query immediately when it finds that the query requirements have been met. A typical example is when the limit clause is used. In addition, MySQL can also terminate a query prematurely in several cases, such as when a condition is found that is not true, MySQL can immediately return a null result.

As the above example shows, the query was terminated during the optimization phase.

  1. Equivalent transmission

  2. Comparison of the list in()

In many database systems, in() is exactly equivalent to multiple or conditions, because the two are completely equivalent. This is not true in MySQL. MySQL sorts the data in the in() list first, and then determines whether the values in the list meet the conditions through binary search. This is an O (log n) complexity operation, equivalent to the complexity of the query converted to or, for the in() list with a large number of values, MySQL will be faster.

Query execution engine

During the parsing and optimization phase, MySQL generates the corresponding execution plan for the query, and the query execution engine of MySQL completes the entire query according to this execution plan. The execution plan is a data structure, not the bytecode that many relational databases generate.

In contrast to the query optimization phase, the query execution phase is not so complicated: MySQL simply executes step by step according to the instructions given in the execution plan. In the process of step-by-step execution according to the execution plan, a large number of operations need to be done by calling the interfaces implemented by the storage engine, which we call the “Handler API.” In fact, MySQL creates a handler instance for each table during the optimization phase. The optimizer can obtain information about the table based on the interface of these instances, including all column names of the table, index statistics, and so on.

Returns the result to the client

The final stage of query execution is to return the results to the client. Even if the query does not return results to the client, MySQL still returns some information about the query, such as the number of rows affected by the query.

If the query can be cached, MySQL will store the result in the query cache at this stage.

MySQL returns the results to the client in an incremental, step-by-step process. For example, in an associative table operation, once the server has processed the last associative table and started generating the first result, MySQL can start gradually returning the result set to the client.

This has two advantages: the server doesn’t have to store too many results, and it doesn’t consume too much memory by returning too many results. In addition, this processing allows the MySQL client to get the results returned in the first place.

Each row in the result set is sent in a packet that meets the MySQL client/server communication protocol, and then transmitted over TCP. During TCP transmission, MySQL packets may be cached and then transmitted in batches.