Hello everyone, I am Wolf King, a programmer who loves playing ball

We usually deal with SQL statements of time certainly a lot, have written simple SQL, also for SQL is very complex business logic racking their brains, that I ask you here a simple question: that you know SQL statement query order is what?

This is what I thought when I first saw the problem

This should be an easy question to answer, since you’ve written countless SQL queries, some of them quite complex. You don’t have enough of this??

But the truth is, IT’s still hard for me to say exactly what the order is. \

Let’s look at the complete syntax of the SELECT statement:

1. SELECT 
2. DISTINCT <select_list>
3. FROM <left_table>
4. <join_type> JOIN <right_table>
5. ON <join_condition>
6. WHERE <where_condition>
7. GROUP BY <group_by_list>
8. HAVING <having_condition>
9. ORDER BY <order_by_condition>
10.LIMIT <limit_number>
Copy the code

However, the order of execution is as follows

JOIN < JOIN, left JOIN, right JOIN... > <join table > # specify join to add data to virtual table after ON, For example, left JOIN will add the remaining data of the left table to the virtual table WHERE < WHERE condition > # filter the above virtual table GROUP BY < GROUP condition > # aggregate function <SUM() > # Having < group filter > # aggregate filter on the results of a group SELECT < return list of data > # the single column returned must be in the group by clause. ORDER BY * ORDER BY * ORDER BY * ORDER BY * LIMITCopy the code

For each of these steps, the SQL engine creates a virtual table in memory, performs subsequent operations on the virtual table, frees the memory of unused virtual tables, and so on.

The specific process is as follows :(VT indicates → virtual table virtual)

  1. From: select * from table_1, table_2; Select * from table_1 join table_2; Is the same result, is the cartesian product; This is used to directly compute the Cartesian product of two tables to obtain the virtual table VT1, which is the first operation performed by all SELECT statements. The other operations are performed on this table, which is what the FROM operation does
  2. On: Filter qualified data from VT1 table to form VT2 table;
  3. Join: Add the data of the join type to the VT2 table. For example, left Join adds the remaining data of the left table to the virtual table VT2 to form the VT3 table. If the number of tables is greater than 2, steps 1-3 are repeated.
  4. Where: Perform filtering, (without aggregation function) get VT4 table;
  5. Group by: group VT4 tables to obtain VT5 tables. Columns used in subsequent statements, such as SELECT and HAVING, must be included in the group by condition.
  6. Having: Filter grouped data to get VT6 table;
  7. Select * from VT7;
  8. Distinct: used to obtain VT8 table by deduplication.
  9. Order by: used to sort VT9 table;
  10. Limit: return the required number of rows, get VT10;

Note that:

  1. In the group by condition, each column must be a valid column and cannot be an aggregation function;
  2. Null values are also returned as a group;
  3. Except for aggregate functions, columns in the SELECT clause must be in the group by condition;

This gives us an idea of what a query will return, and answers the following questions:

  1. Can I use WHERE after GRROUP BY? (No, GROUP BY is after WHERE!)
  2. Can I filter the results returned by window functions? SELECT SELECT after WHERE and GROUP BY
  3. Can WE ORDER BY based on things in GROUP BY? (Yes, ORDER BY is basically executed at the end, so you can ORDER BY based on anything)
  4. When is LIMIT implemented? (At the end!)

However, database engines do not necessarily execute SQL queries in this order, as they are optimized for faster query execution, as explained below.

Do aliases in SQL affect the order in which SQL is executed?

The following SQL is displayed:

SELECT 
CONCAT(first_name, ' ', last_name) AS full_name, 
count(*)
FROM table
GROUP BY full_name
Copy the code

From this statement, it looks as if GROUP BY is executed after SELECT because it references an alias in SELECT. But it doesn’t have to be this way, the database engine will rewrite the query like this:

SELECT 
CONCAT(first_name, ' ', last_name) AS full_name, 
count(*)
FROM table
GROUP BY CONCAT(first_name, ' ', last_name)
Copy the code

So, GROUP BY is still executed first.

In addition, the database engine does a series of checks to make sure that things in SELECT and GROUP BY are valid, so it does an overall check of the query before generating the execution plan.

The database is likely to execute queries out of the normal order (optimization)

In practice, databases do not necessarily execute queries in JOIN, WHERE, or GROUP BY order, because they perform a series of optimizations that throw the order out of order and make the query execute faster, as long as the query results are not changed.

This query illustrates why queries need to be executed in a different order:

SELECT * FROM
dept d LEFT JOIN student s 
ON d.student_id = s.id
WHERE s.name = 'the Wolf king'
Copy the code

If you only need to find the student named Wolf King, there is no need to left-join all the data in the two tables. The query will be much faster if you filter before joining, and for this query, filtering first will not change the result of the query.


All right. That’s all for today. I will continue to share what I have learned and thought. I hope we can walk on the road to success together!

Willing to output dry Java technology public number: Wolf king programming. The public number has a large number of technical articles, massive video resources, beautiful brain map, might as well pay attention to it! Get lots of learning resources and free books!

Forwarding moments is the biggest support for me!

\

Click “like and watching” if you feel something! Thank you for your support!