MySQL uses syndication far more than we used to recognize. Basically, it assumes that every query has a union, not just matching rows from two tables, which includes subqueries and even SELECT operations only on a single table. Therefore, it is important to understand how MySQL performs federation.

MySQL federated query execution policy.

Taking a UNION query as an example, MySQL will execute the UNION query as a series of single queries, then put the corresponding results into a temporary table, and finally read back. In MySQL, each independent query is a federated query, as is the return result from a read from a temporary table.

In this case, MySQL’s federated query execution is simple — it treats the federated query here as a nested loop of federated queries. This means that MySQL runs a loop to read rows from the table, while running a nested loop to read matching rows from the next table. This process continues until all matching rows in the federated query are found. The result is then constructed based on the columns required in the SELECT statement. The following query statement looks like this:

SELECT tb1.col1, tb2.col2
FROM tb1 INNER JOIN tb2 USING(col3)
WHERE tb1.col1 IN(5.6);
Copy the code

The actual conversion to the pseudo-code that MySQL might execute would look like this:

outer_iter = iterator over tb1 where col1 IN(5.6);
outer_row = outer_iter.next;
while outer_row
	inner_iter = iterator over tb2 where col3 = outer_row.col3;
	inner_row = inner_iter.next
    while inner_row
    	output [outer_row.col1, inner_row.col2];
        inner_row = inner_iter.next;
	end
    outer_row = outer.iter.next;
end
Copy the code

This query execution plan handles multi-table queries as simply as single-table queries, which is why even single-table queries can be treated as federated queries — the basic operation for more complex combined federated queries. The same is true for external joins, such as the following query:

SELECT tb1.col1, tb2.col2
FROM tb1 LEFT OUTER JOIN tb2 USING(col3)
WHERE tb1.col1 IN(5.6);
Copy the code

Converted to pseudocode, it looks like this:

outer_iter = iterator over tb1 where col1 IN(5.6);
outer_row = outer_iter.next;
while outer_row
	inner_iter = iterator over tb2 where col3 = outer_row.col3;
	inner_row = inner_iter.next
    if inner_row
        while inner_row
            output [outer_row.col1, inner_row.col2];
            inner_row = inner_iter.next;
        end
    else
    	output [outer_row.col1, NULL];
	end
    outer_row = outer.iter.next;
end
Copy the code

Another way to visualize a query plan is in the form of a swimlane map. The figure below shows the swimlane map for the inner join query.

MySQL performs all kinds of queries in essentially the same way. For example, subqueries that need to be executed first in the FROM condition are also put into temporary tables, which are then federated as if they were normal tables. MySQL also uses temporary tables to perform federated queries, and then overrides the right join query to its equivalent left join. In short, the current version of MySQL handles queries in this way as much as possible (more sophisticated processing has been introduced since the latest version of MySQL5.6).

Of course, not all legitimate SQL queries can do this, and some queries can do poorly.

The execution plan

Unlike many other database products, MySQL does not generate bytecode from query statements to execute query plans. In effect, the query execution plan is an instruction tree from which the query execution engine generates query results. The final query plan contains enough information to refactor the original query. If you execute EXPLAIN EXTENDED on the query (not required after MySQL 8) and then execute SHOW WARNINGS, you can see the refactored query.

Multi-table queries can be conceptually represented by trees. For example, a query with four tables might look like the tree below. This is called a balanced tree in computers,

However, this is not how MySQL performs queries. As mentioned earlier, MySQL always starts with one table and then looks for matching rows from the next table. Therefore, MySQL’s query plan looks like the left deep join tree below.

Federated query optimizer

The most important part of MySQL’s query optimizer is the federated query optimizer, which determines the optimal order of multi-table query execution. It is often possible to obtain the same results in the order of multiple federated queries. The federated query optimizer tries to estimate the cost of these alternatives and then selects the least expensive option to execute.

The following is an example of a federated query that queries the same results but in a different order.

SELECT film.film_id, film.title, film.release_year, actor.actor_id, actor.first_name, actor.last_name
FROM sakila.film
INNER JOIN sakila.film_actor USING(film_id)
INNER JOIN sakila.actor USING(actor_id);
Copy the code

There might be a few different ways to query. For example, MySQL could start from the film table, use the film_id index of film_actor to find the corresponding acTOR_di value, and then use the primary key to find the corresponding actor row from the actor table. An Oracle user might say, “Film is the driver of film_actor, and film_actor is the driver of actor.” The results of parsing using Explain are as follows:

******** 1.row ********
id: 1
select_type: SIMPLE
table: actor
type: ALL
possible_keys: PRIMARY
key: NULL
key_len: NULL
ref: NULL
rows: 200
Extra:
******** 2.row ********
id: 1
select_type: SIMPLE
table: film_actor
type: ref
possible_keys: PRIMARY, idx_fk_film_id
key: PRIMARY
key_len: 2
ref: sakila.film.film_id
rows: 1
Extra: USING index
******** 3.row ********
id: 1
select_type: SIMPLE
table: film
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 2
ref: sakila.film_actor.film_id
rows: 1
Extra: 
Copy the code

The execution plan is quite different from what we expected. MySQL starts with actor tables first and then reverses the order. Is it really more effective? We could add STRAIGHT_JOIN to EXPLAIN to avoid optimization:

EXPLAIN SELECT STRAIGHT_JOIN film.film_id, film.title, film.release_year, actor.actor_id, actor.first_name, actor.last_name
FROM sakila.film
INNER JOIN sakila.film_actor USING(film_id)
INNER JOIN sakila.actor USING(actor_id);
Copy the code
******** 1.row ********
id: 1
select_type: SIMPLE
table: film
type: ALL
possible_keys: PRIMARY
key: NULL
key_len: NULL
ref: NULL
rows: 951
Extra:
******** 2.row ********
id: 1
select_type: SIMPLE
table: film_actor
type: ref
possible_keys: PRIMARY, idx_fk_film_id
key: idx_fk_film_id
key_len: 2
ref: sakila.film.film_id
rows: 1
Extra: USING index
******** 3.row ********
id: 1
select_type: SIMPLE
table: actor
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 2
ref: sakila.film_actor.actor_id
rows: 1
Extra: 
Copy the code

This explains why MySQL needs to perform queries in reverse order, which results in fewer rows being examined.

  • Querying film first will require 951 queries for film_actor and actor (outermost loop)
  • If the actor table is front-loaded, you only need to perform 200 queries on the other tables.

As you can see from this example, MySQL’s federated query optimizer can reduce query costs by adjusting the order of the query tables. A reordered federated query is usually a very efficient optimization, often a multifold performance improvement. If there were no performance gains, STRAIGHT_JOIN could have been used to avoid reordering and use what we thought was best for the query. This is rarely the case, and in most cases, the federated query optimizer will do a better job than a human.

The federated Query Optimizer view builds a query execution tree with minimal completion cost. If possible, it starts with a full single-table plan and examines all possible combinations of subtrees. Unfortunately, a combined query of N tables has an N factorial number of combined orders. This is called the search space for all possible query plans, and the number is growing very fast. A 10-table joint index can have 3,628,800 different ways! Once the search space grows too large, the optimization of the query will take too long. At this time, the server will stop doing full analysis and instead complete optimization in a way similar to greedy algorithm. This amount is controlled by the Optimizer_search_depth system variable, which you can modify yourself.