This article should take about 6 minutes to read.
preface
Brush online to a database optimization article, to study a wave of their own.
scenario
Database version: 5.7.25, running on a VIRTUAL machine.
The curriculum
# the curriculum
create table Course(
c_id int PRIMARY KEY,
name varchar(10)
)Copy the code
Add 100 pieces of data
Add 100 pieces of data to the class schedule
DROP PROCEDURE IF EXISTS insert_Course;
DELIMITER $
CREATE PROCEDURE insert_Course()
BEGIN
DECLARE i INT DEFAULT 1;
WHILE i<=100 DO
INSERT INTO Course(`c_id`,`name`) VALUES(i, CONCAT('Chinese',i+' '));
SET i = i+1;
END WHILE;
END $
CALL insert_Course();Copy the code
Run time consuming
CALL insert_Course(); > OK > Time: 0.152sCopy the code
Course data
Students table
# students table
create table Student(
s_id int PRIMARY KEY,
name varchar(10)
)Copy the code
Add 7W pieces of data
Add 70000 entries to student table
DROP PROCEDURE IF EXISTS insert_Student;
DELIMITER $
CREATE PROCEDURE insert_Student()
BEGIN
DECLARE i INT DEFAULT 1;
WHILE i<=70000 DO
INSERT INTO Student(`s_id`,`name`) VALUES(i, CONCAT('Joe',i+' '));
SET i = i+1;
END WHILE;
END $
CALL insert_Student();Copy the code
The results
CALL insert_Student(); > OK > Time: 175.838sCopy the code
The student data
League tables
The result for #
CREATE table Result(
r_id int PRIMARY KEY,
s_id int,
c_id int,
score int
)Copy the code
Add 70W pieces of data
Add 70W data to the score table
DROP PROCEDURE IF EXISTS insert_Result;
DELIMITER $
CREATE PROCEDURE insert_Result()
BEGIN
DECLARE i INT DEFAULT 1;
DECLARE sNum INT DEFAULT 1;
DECLARE cNum INT DEFAULT 1;
WHILE i<=700000 DO
if (sNum%70000 = 0) THEN
set sNum = 1;
elseif (cNum%100 = 0) THEN
set cNum = 1;
end if;
INSERT INTO Result(`r_id`,`s_id`,`c_id`,`score`) VALUES(i,sNum ,cNum , (RAND()*99)+1);
SET i = i+1;
SET sNum = sNum+1;
SET cNum = cNum+1;
END WHILE;
END $
CALL insert_Result();Copy the code
The results
CALL insert_Result(); > OK > Time: 2029.5sCopy the code
Results data
test
The business requirements
Find Chinese 1 score 100 test takers
The query
Select * from test1 where test1 = 100
select s.* from Student s where s.s_id in
(select s_id from Result r where r.c_id = 1 and r.score = 100)Copy the code
Execution time: 0.937s
Query result: 32 students meet the conditions
0.9s, to view the query plan:
EXPLAIN
select s.* from Student s where s.s_id in
(select s_id from Result r where r.c_id = 1 and r.score = 100)Copy the code
Select * from ‘where’; select * from ‘where’; select * from ‘where’;
In the query result, column TYPE: all indicates that all tables are scanned, and index indicates that index is scanned.
Create index (c_id, score)
CREATE index result_c_id_index on Result(c_id);
CREATE index result_score_index on Result(score);Copy the code
Run the preceding query statement again in 0.027 seconds
Fast 34.7 times (rounded), greatly shorten the query time, it seems that the index can greatly improve the query efficiency, it is necessary to build an index in the right column, many times forget to build an index, when the amount of data is small, there is no feeling, this optimization feeling is very nice.
If the same SQL statement is executed multiple times, you will find that the first time is the longest, and subsequent executions will take slightly less time than the first time because the same statement is read directly from the cache the second time.
0.027s is very short, but can it be optimized again? Take a closer look at the execution plan:
View the optimized SQL:
SELECT
`example`.`s`.`s_id` AS `s_id`,
`example`.`s`.`name` AS `name`
FROM
`example`.`Student` `s` semi
JOIN ( `example`.`Result` `r` )
WHERE
(
( `example`.`s`.`s_id` = `<subquery2>`.`s_id` )
AND ( `example`.`r`.`score` = 100 )
AND ( `example`.`r`.`c_id` = 1 )
)Copy the code
How do I view optimized statements?
The method is as follows (executed in a command window) :
# to perform
EXPLAIN
select s.* from Student s where s.s_id in
(select s_id from Result r where r.c_id = 1 and r.score = 100);
# in the implementation
show warnings;Copy the code
The results are as follows
A type = all
As previously thought, the order in which this SQL is executed is to execute subqueries
select s_id from Result r where r.c_id = 1 and r.score = 100Copy the code
Time: 1.402 s
The following results are obtained (partially)
And then execute
select s.* from Student s where s.s_id in(12871409 87467 29613 81395 5106 87140 47269 17288 97311 74388 96565 18107 74250 30977 8125 44247 21272 95603 61, 38479469 90669 88679 0359 95461 92475 78581 71632 20668 5673 72462 79646 93.Copy the code
Time: 0.222 s
MySQL > execute (MATERIALIZED subquery); MySQL > execute (MATERIALIZED subquery); MySQL > execute (MATERIALIZED subquery); MySQL > select * from student where ID = 1; MySQL > select * from student where ID = 1;
Materialized subqueries: The optimizer uses materialization to process subqueries more efficiently. Materialization speeds up query execution by treating subquery results as a temporary table, normally in memory. The first time mysql needs a subquery result, it materializes the result into a temporary table. Whenever the result set is needed later, mysql references the temporary table again. The optimizer may use a hash index to make the query faster and cheaper. Indexes are unique, eliminating duplication and making the table less data.
What about joining queries instead?
In order to re-analyze the connection query, delete the indexes result_C_ID_index and result_score_index temporarily.
DROP index result_c_id_index on Result;
DROP index result_score_index on Result;Copy the code
Join queries
select s.* from
Student s
INNER JOIN Result r
on r.s_id = s.s_id
where r.c_id = 1 and r.score = 100;Copy the code
Execution time: 1.293s
The query results
EXPLAIN + query SQL to check the execution plan:
Select * from s_id; select * from s_id
CREATE index result_s_id_index on Result(s_id);
show index from Result;Copy the code
A join query is being executed
Time: 1.17s (a little weird, it should be longer depending on the article you are reading)
Take a look at the execution plan:
The optimized query statement is as follows:
SELECT `example`.`s`.`s_id` AS `s_id`, `example`.`s`.`name` AS `name` FROM `example`.`Student` `s` JOIN `example`.`Result` `r` WHERE ( ( `example`.`s`.`s_id` = `example`.`r`.`s_id` ) AND ( `example`.`r`.`score` = 100 ) AND ( `example`.`r`.`c_id` = 1 ) )Copy the code
SQL > select ‘where’ from ‘join’;
Back to the previous execution plan:
SQL > alter table SQL > alter table SQL > alter table SQL > alter table SQL > alter table SQL > alter table SQL
Normally, join is performed before WHERE filtering, but in our case, if join is performed first, 70W data will be sent to join. Therefore, the wise scheme of WHERE filtering is implemented first. Now, in order to exclude mysql query optimization, I write an optimized SQL.
Delete index first
DROP index result_s_id_index on Result;Copy the code
Execute your own written optimized SQL
SELECT
s.*
FROM
(
SELECT * FROM Result r WHERE r.c_id = 1 AND r.score = 100
) t
INNER JOIN Student s ON t.s_id = s.s_idCopy the code
The time is 0.413 seconds
Shorter than the previous SQL time.
Viewing the Execution Plan
It is much more efficient to extract result first and then join the table. The problem now is that the scan table appears when extracting result, so it is clear that relevant indexes need to be established.
CREATE index result_c_id_index on Result(c_id);
CREATE index result_score_index on Result(score);Copy the code
Execute the query again
SELECT
s.*
FROM
(
SELECT * FROM Result r WHERE r.c_id = 1 AND r.score = 100
) t
INNER JOIN Student s ON t.s_id = s.s_idCopy the code
The time is 0.044 seconds
That’s a good time, 10 times faster.
Execution Plan:
We’re going to see that the index is used to get the result and then the table.
SQL > execute SQL
EXPLAIN
select s.* from
Student s
INNER JOIN Result r
on r.s_id = s.s_id
where r.c_id = 1 and r.score = 100;Copy the code
Execution time: 0.050 seconds
Execution Plan:
SQL > select * from ‘where’; SQL > select * from ‘where’; SQL > select * from ‘where’;
Expand the test data, adjust the content of the result table to increase the data to 300W, student data is more scattered.
DROP PROCEDURE IF EXISTS insert_Result_TO300W;
DELIMITER $
CREATE PROCEDURE insert_Result_TO300W()
BEGIN
DECLARE i INT DEFAULT 700001;
DECLARE sNum INT DEFAULT 1;
DECLARE cNum INT DEFAULT 1;
WHILE i<=3000000 DO
INSERT INTO Result(`r_id`,`s_id`,`c_id`,`score`)
VALUES(i,(RAND()*69999)+1 ,(RAND()*99)+1 , (RAND()*99)+1);
SET i = i+1;
END WHILE;
END $
CALL insert_Result_TO300W();Copy the code
Replaced the way of data generation, all using random number format.
A quick recap:
show index from Result;Copy the code
Execute SQL
select s.* from
Student s
INNER JOIN Result r
on r.s_id = s.s_id
where r.c_id = 81 and r.score = 84;Copy the code
Execution time: 1.278s
Execution Plan:
Intersect union operation is used here, that is, the union of the results of the two indexes is obtained at the same time, and then the distinction between score and C_id is looked at. However, the distinction between score and C_id is not very high in a single field. If the Result table is retrieved, c_id = 81, the Result is 81. Score = 84 is 84.
The result of c_id = 81 and score = 84 is 19881, that is, the distinction between the two fields combined is relatively high, so the query efficiency of establishing the joint index will be higher. From another perspective, the data of the table is 300W, and there will be more in the future, which is not a small number in terms of index storage. As the amount of data increases, indexes cannot be loaded into the memory, but must be read from disk. In this way, the more indexes there are, the more the cost of reading disk is. Therefore, it is necessary to establish a multi-column joint index according to specific service conditions.
DROP index result_c_id_index on Result;
DROP index result_score_index on Result;
CREATE index result_c_id_score_index on Result(c_id,score);Copy the code
Points to the above query statement
Consumption time: 0.025s
That’s pretty fast and acceptable.
The optimization of this statement is over for now.
conclusion
-
MySQL nested subqueries are really inefficient
-
It can be optimized for join queries
-
When joining a table, you can filter the table with a WHERE condition and then join the table.
-
Set up appropriate indexes, and set up multi-column federated indexes if necessary
-
Learn to analyze SQL execution plans, mysql will optimize SQL, all analysis plans are important
Knowledge extension
The index optimization
I talked about optimization of subqueries and how to build indexes, and in cases where multiple fields are indexed, a single index is created for each field.
Later, it is found that in fact, it is more efficient to establish joint index, especially in the case of large data volume and low differentiation of single column.
Single index
The query statement is as follows:
select * from user_test_copy where sex = 2 and type = 2 and age = 10Copy the code
Index:
CREATE index user_test_index_sex on user_test_copy(sex);
CREATE index user_test_index_type on user_test_copy(type);
CREATE index user_test_index_age on user_test_copy(age);Copy the code
Sex, Type and age fields are indexed, and the data volume is 300W
Query time: 0.415s
Execution Plan:
Find type = index_merge
This is mysql’s optimization for multiple single-column indexes with the INTERSECT union operation on the result set
The column index more
The column index more
We can create multiple column indexes on these three columns and copy the table for testing purposes.
create index user_test_index_sex_type_age on user_test(sex,type,age);Copy the code
Query statement:
select * from user_test where sex = 2 and type = 2 and age = 10Copy the code
Execution time: 0.032s
More than 10 times faster, and the higher the differentiation of multi-column indexes, the faster the improvement.
Execution Plan:
The most left prefix
Multi-column indexes also have the left-most prefix feature:
That is, the first field in the index, sex, should appear in the WHERE condition.
Execute the following statement:
select * from user_test where sex = 2
select * from user_test where sex = 2 and type = 2
select * from user_test where sex = 2 and age = 10Copy the code
Indexes cover
The columns of the query are indexed, so that when the result set is retrieved, the index data can be returned directly without having to fetch data of other columns from disk
Such as:
select sex,type,age from user_test where sex = 2 and type = 2 and age = 10Copy the code
Execution time: 0.003s
It’s much faster than getting all the fields
The sorting
select * from user_test where sex = 2 and type = 2 ORDER BY user_nameCopy the code
Time: 0.139 s
Indexing sorted fields improves sorting efficiency
select * from user_test where sex = 2 and type = 2 ORDER BY user_nameCopy the code
Finally, some SUMMARY of SQL tuning is attached for further study when there is time
-
Column types should be defined as numeric types and as short as possible, such as primary and foreign keys, type fields, and so on
-
Create a single-column index
-
Create multi-column federated indexes as needed
-
If there is a lot of data after a single column is filtered, the efficiency of the index will be low, that is, the differentiation of the columns will be low. If the index is built on multiple columns, the differentiation of the columns will be large and the efficiency will be significantly improved.
-
Build coverage indexes based on business scenarios
-
Query only the fields required by services. If these fields are covered by indexes, the query efficiency is greatly improved
-
Indexes are required on fields of multiple table joins
-
This greatly improves the efficiency of table joins
-
An index is required on the WHERE condition field
-
An index is required on the sort field
-
Indexes need to be created on grouped fields
-
Do not use operations on Where conditions to avoid index invalidation
, END,
The growth path of programmers
Though the road is long, the journey is sure to come
This article was originally posted on the wechat public account of the same name “The Growth of programmers”, reply to “1024” you know, give a thumbs up.
Wechat ID: cxyDCzzl
Highlights from the past
7 big platform tools for programmers to connect private work
The growth path of Java programmers
Why does TCP require a three-way handshake
50 Details of Java Performance Optimization (Collector’s Edition)
Design e-commerce platform coupon system
A conversation that tells you what an architect does?
Teach you a way to use IDE programming to improve efficiency SAO operation!
The classic ebook package for programmers