Note: This article from “Mysql Technology insider: SQL programming”, a good book, explain the SQL principle of Mysql, recommended reading
Question 1: There is an SQL statement, so how does the query result return?
Data Table 1:
CREATE TABLE `test1` (
`id` int(4) NOT NULL,
`age` int(4) DEFAULT NULL,
) ENGINE=InnoDB DEFAULT CHARSET=utf8; Copy the code
Data Table 2:
CREATE TABLE `test2` (
`id` int(4) DEFAULT NULL,
`t1` int(4) NOT NULL COMMENT 'foreign key id in test1',
`name` varchar(100) NOT NULL COMMENT 'name'
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Copy the code
sql: Copy the code
select DISTINCT a.id, a.age,b.id from test1 a
left JOIN test2 b on a.id = b.t1
where a.id in(1,2) GROUP BY A.id with cube HAVING B.id < 100 order BY A.age desclimit10, 20;Copy the code
How efficient is this SQL statement? How is it optimized? With such questions, we gradually learn the execution process of SQL statements and how to improve the efficiency of SQL queries
First, query processing
1. The logical execution flow of mysql abstracts the SQL statement in Question 1 as follows:
(8)select (9)DISTINCT<select_list>
(1)from <left_table>
(3)<join_type> JOIN <right_table>
(2)on <JOIN_condition>
(4)where <where_condition>
(5)GROUP BY<group_list>
(6)WITH<cube|ROLLUP>
(7)having<having_condition>
(10)order by<order_BY_list>
(11)limit <limit_num> Copy the code
Mysql always starts with the FROM table, each step produces a virtual table, and the final result is not returned until the last step is complete.
(1) Left_table and right_table perform cartesian product to generate virtual table VT1. If the records in the left table are M and the records in the right table are N, then the number of records in the virtual table is not m× N.
(2) Through the ON condition, filter the records that meet the condition to the virtual table VT2, pay attention to the NULL in the ON condition, usually try not to have null values in the attributes of the ON condition;
Select * from outer join (select * from outer join); select * from outer join (select * from outer join); select * from outer join (select * from outer join);
(4) According to where_condition of VT3, select the qualified record to generate VT4; Note: Aggregate functions such as count(), min(), Max () cannot be used for where conditions that are not grouped, and attribute aliases cannot be used without performing select.
(5) Generate VT5 according to the group column grouping of VT4;
(6) Vt5 aggregation operation to generate VT6;
(7) Having vt7 filter generate VT7;
Select vt8 from vt8;
(9)vt8 to obtain vt9, this will use a temporary table to handle, add a unique index, achieve deduplication;
Innodb stores vt10 by index. If there is no index in the order by column, it will rescan the order, which will consume a lot of performance.
(11) in VT10, select the specified number of records by limit, return to the client, vt10 for a table scan through this analysis process, roughly know the problem in the SQL execution process
2. Physical query processing
Mysql has two components: Parser and Optimizer. Parser parses SQL statements and Optimizer optimizes SQL statements. The real query is not necessarily in accordance with the process of logical query processing, index plays a key role in query optimization, if there is no index, according to the logical query to deal with, the efficiency is very low, the cost is very high. Each virtual table actually exists in memory. Question 2: Why is it true that indexing improves query efficiency? We’ll find out later on.
Second, subquery
Mysql subqueries have always been a maligned process with poor performance and are rarely used in applications. But sometimes it’s easy to implement and SQL is more readable.
The primary table of self-query is scanned for all tables. If sub-tables match indexes, indexes are used. Otherwise, Cartesian product is performed