Preface

In the previous article, I discussed the role of indexes in database queries: their main benefit is reducing the amount of data scanned, which improves execution efficiency — fundamentally, by cutting I/O cost. This article works through concrete optimization examples against a MySQL database, connecting queries with indexes to strengthen practical application skills.

MySQL

Query optimization is inseparable from the use of indexes. This article takes InnoDB, MySQL's most common storage engine, as the subject of study, and briefly explains the B+ tree index structure the InnoDB engine uses.

InnoDB's B+ tree

Create a table Student with id as its primary key:

CREATE TABLE `Student` (
  `id` int(16) NOT NULL AUTO_INCREMENT,
  `name` varchar(10) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

Insert 12 rows of data:

insert into Student(id,name) values(1,'XiaoHong');
insert into Student(id,name) values(2,'XiaoMing');
insert into Student(id,name) values(3,'XiaoFang');
...
insert into Student(id,name) values(12,'XiaoYou');


The InnoDB engine creates a B+ tree index structure based on the primary key id, which we can abstract as follows:



How to understand the shape of the structure in the diagram?

Table data is stored in disk space in primary key id order. The leaf nodes in the figure store the actual data of each row in the table, so the table data itself can be recognized as part of the primary key index. As the following figure shows, each row is stored in primary key id order:



Suppose the id column (int) takes 4 bytes and the name column is a fixed 10-byte char, so each row of the Student table occupies 14 bytes of disk space. Ideally, we can simplify it like this: if the first row (1, 'XiaoHong') sits at disk address 0x01, then the second row (2, 'XiaoMing') sits at disk address 0x0f (0x01 + 14 = 0x0f), and so on.
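This address arithmetic can be sketched in a few lines of Python. Note this is a deliberately idealized model: real InnoDB pages have headers and variable-length rows, so actual layouts differ.

```python
ROW_SIZE = 4 + 10  # id: 4-byte int, name: fixed 10-byte char

def row_address(base, row_no):
    # Rows laid out contiguously in primary key order (idealized model)
    return base + (row_no - 1) * ROW_SIZE

print(hex(row_address(0x01, 1)))  # row (1, 'XiaoHong') at 0x1
print(hex(row_address(0x01, 2)))  # row (2, 'XiaoMing') at 0xf
```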

Non-leaf nodes store index values and the corresponding pointers. We can see that the 12 rows of data are divided into five nodes (one non-leaf node and four leaf nodes) by primary key id. In a real environment, MySQL sizes each tree node to a disk block — also called a page; disk blocks are generally 4KB, and InnoDB sets its page size to 16KB — based on the principle of block-based disk reads, so that each disk I/O fetches all the data of the corresponding node.

For non-leaf nodes, the pointer to the left of each index value points to the node holding data less than that index value, and the pointer to the right points to the node holding data greater than or equal to it:



As shown in the figure above, data in the node pointed to by the left pointer of index value 4 must be less than 4, and data in the node pointed to by its right pointer must be greater than or equal to 4. In addition, to keep the tree as short as possible for a given number of index values, B+ trees require each non-leaf node to store as many values as it can: generally at least (n-1)/2 index values, where n is the maximum number of values one page can hold. Under the structure assumed above, each page can hold only three index values (or at most three rows of data) — real capacities are much larger. On this premise, if we continue inserting rows, the last leaf node will eventually have no room for new data, and that leaf will split to create a new leaf:



Now let's try inserting two more rows:

insert into Student(id,name) values(13,'XiaoRui');
insert into Student(id,name) values(14,'XiaoKe');

The result will be the following:



Because each non-leaf node holds at most three index values and four corresponding pointers (the fan-out), the complexity of a lookup is O(log4 N), where N is the number of rows in the table. For a Student table with 1000 rows, a lookup by id costs about log4(1000) ≈ 5. Here, query complexity can be intuitively understood as the height of the B+ tree: level-by-level comparisons in the non-leaf nodes finally locate the page holding the target data.
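A quick back-of-the-envelope sketch of this height calculation in Python. The fan-out numbers are illustrative assumptions, not measured InnoDB values:

```python
import math

def btree_height(rows, fanout):
    # Height = number of level-by-level hops needed to reach a leaf
    return math.ceil(math.log(rows, fanout))

print(btree_height(1000, 4))          # the toy tree above: ~5 levels
print(btree_height(2_000_000, 1200))  # a 16KB page holds far more pointers,
                                      # so millions of rows fit in ~3 levels
```

This is why real B+ trees stay so shallow: the height grows only logarithmically in the row count, with a very large base.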

Therefore, the InnoDB engine organizes table data by the primary key index structure, with leaf nodes storing row data — a kind of B+ tree file organization. Locating a row by primary key is thus very efficient, and the engine automatically creates a primary key index for every table and builds this B+ tree organization. In practice, a primary key lookup first walks the B+ tree to the address of the specific leaf node; because a leaf node occupies an integer multiple of the disk block size at contiguous addresses, a fast sequential I/O loads the entire node into memory, and the target data is then filtered out of the node's contents in memory.

But the InnoDB engine also allows us to create separate indexes on other table fields, also known as secondary indexes. For example, we create table Student like this:

CREATE TABLE `Student` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(10) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `index_name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

Insert sample data:

insert into Student(id,name) values(1,'A');
insert into Student(id,name) values(2,'A');
insert into Student(id,name) values(3,'B');
...
insert into Student(id,name) values(12,'F');

How should we understand the index structure on the name field? See the figure:



The secondary index also builds a B+ tree index structure, except that its leaf nodes store primary key id values. Non-numeric index values are sorted within the index structure according to the predefined character-set collation rules; for example, name='A' sorts before 'B' under the corresponding collation.

As per the above structure, suppose we do the following:

select * from Student where name='A';

First, the secondary index locates leaf node 1, which is loaded into memory; there, two primary key ids are found: 1 and 2. Each id is then looked up in the primary key index to load the corresponding row data.
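This two-step lookup — secondary index first, then back to the primary key index — can be mimicked with two dictionaries. This is a hypothetical in-memory model for intuition, not InnoDB's actual page format:

```python
# Primary key (clustered) index: id -> full row; leaves hold row data
primary = {1: (1, 'A'), 2: (2, 'A'), 3: (3, 'B')}
# Secondary index on name: name -> list of primary key ids; leaves hold ids only
by_name = {'A': [1, 2], 'B': [3]}

def find_by_name(name):
    # Step 1: locate ids via the secondary index
    # Step 2: fetch each row from the primary key index
    return [primary[pk] for pk in by_name.get(name, [])]

print(find_by_name('A'))  # [(1, 'A'), (2, 'A')]
```

The second step is exactly the extra "back to the table" cost that a secondary index incurs compared with a primary key lookup.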

Among secondary indexes, InnoDB also supports composite indexes, which combine multiple fields into a single index. For example, we create the following table StudentTmp:

CREATE TABLE `StudentTmp` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(10) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `index_name_age` (`name`,`age`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Name and age combine to form an index. The corresponding B+ tree index structure has the following form:



In this composite index, leaf node contents are sorted first by the name field, then by the age field when name values are equal. When name and age are queried together as a combined condition, this two-level ordering is fully exploited to position the data through the index.
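The (name, age) ordering of a composite index behaves like sorting tuples: entries sort by name first, then by age within equal names, so a combined condition is a single binary search over the tuple order. A small sketch with illustrative data:

```python
import bisect

# Leaf entries of a hypothetical (name, age) composite index, kept sorted
entries = sorted([('B', 20), ('A', 19), ('A', 18), ('C', 21)])
print(entries)  # [('A', 18), ('A', 19), ('B', 20), ('C', 21)]

# Locating name='A' and age=19 is one binary search over the tuple order
i = bisect.bisect_left(entries, ('A', 19))
print(entries[i])  # ('A', 19)
```

This also illustrates the leftmost-prefix behavior: the entries are useful for conditions on name alone, but a condition on age alone cannot exploit the ordering.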

OK — we need not dwell on every detail of the B+ tree. What matters is building the general shape of the index structure in our mind: imagine an index query as a process of progressively locating the target data within a tree, and realize that query complexity is tied to the number of pointers in the B+ tree's non-leaf nodes. This is very helpful for developing a sensitivity to query cost.

Use the Explain method to understand query costs

Having seen the efficiency gains that indexes bring, how do we judge index utilization for each query SQL in a real application? The Explain method lets us make that judgment before execution — especially useful for complex queries with multiple nested or joined levels — through its output parameters.

Plan analysis can be done by prefixing the query SQL with the explain keyword:

explain select id from Student where id=1;

The result is as follows:



The result has ten columns — id, table, select_type, and so on — each with a corresponding value. Next we will work through them step by step.

id: identifies the execution order of subqueries; a larger value indicates higher execution priority

The query above is only a simple query, so the id column shows just 1. Now let's add a subquery:

explain select name from Student where id=(select max(id) from Student);

We get:



The table field names the table each query plan touches, and the id value indicates which query plan is executed first.

Table: specifies the name of the table associated with the query plan

The plan with id=2 corresponds to the subquery select max(id) from Student. We find this note in its Extra field: Select tables optimized away, which tells us the engine has optimized the plan away entirely. Index-based optimizations such as min/max or count(*) operations do not need to wait for table retrieval — the value may be stored somewhere in advance and read directly. Since the id field belongs to the Student table's primary key index, the engine can obtain min(id) and max(id) cheaply, for example by reading the first and last leaf nodes of the primary key index tree, so plans like this are extremely efficient in practice.

Select_type: identifies the type of the query plan

There are several different types of select_type:

  • SIMPLE: a simple SELECT without UNION or subquery
  • PRIMARY: if the query contains any complex subparts, the outermost SELECT is marked PRIMARY
  • UNION: the second or subsequent SELECT statement in a UNION
  • SUBQUERY: the first SELECT in a subquery
  • DERIVED: a SELECT in the FROM clause (a derived-table subquery)

Explain select id from Student where id=1;

Here select_type is SIMPLE, meaning the SQL query is in its simplest form.

Explain select name from Student union select name from Course; gives:



This time there are two query plans: the outer select on the Student table is marked PRIMARY, and the select on the Course table is marked UNION, which confirms the description of the UNION type above. Combined with what the id field represents, we can see the engine executes the Course table plan first and then the Student table plan.

Explain select id,(select count(*) from Course) as count from Student; gives:



There are again two query plans this time; the difference is that we constructed a subquery on the Course table, whose select_type is SUBQUERY. According to id, this SQL executes the Course table plan first and then the Student table plan.

Explain select name from (select name from Student where id=1) TB; gives:



What makes this statement special is that the subquery plan for the Student table is wrapped in a layer, so the corresponding select_type is DERIVED.

With select_type, we can split a SQL statement containing multiple query plans into its parts, analyze each query plan one by one, and finally complete the overall optimization.

Let's now focus on a few other fields in the Explain analysis result:

  • type: the retrieval method used to access rows
  • possible_keys: indexes that could be used by the query plan
  • key: the index actually used by the query plan
  • rows: query complexity — roughly, the number of rows the query plan needs to examine

These fields are closely related to the index and will really provide a reference for our analysis of query costs. We can use these fields to determine index utilization.

Type: The retrieval method used when querying data from a table

Type indicates whether the query plan makes use of the index structure and the specific characteristics of the search. The specific categories are:

  • ALL: no index is used; MySQL traverses the whole table to find matching rows
  • index: only the index structure is used; in InnoDB this can be understood as scanning an entire B+ tree without operating on the table rows directly
  • range: retrieves only rows in a given range, using an index to select them
  • ref: retrieval through an index that is not unique, so multiple records with the same value may be retrieved
  • eq_ref: similar to ref, except the index used is unique — for each index key value, only one record in the table matches
  • const, system: used when MySQL optimizes part of the query and converts it to a constant, e.g. when the primary key appears in the WHERE list; system is a special case of const where the queried table has only one row
  • NULL: MySQL resolves the statement during optimization without accessing the table or index at all — for example, selecting the minimum value of an index column via a single index lookup

Explain select name from Student where name='Student'; gives:



Here type is ALL: without an index on name, finding the rows matching name='Student' requires a full table scan. Correspondingly, rows is 699323 — close to the total number of rows in the table.

Explain select name from Student; gives:



type is index: because we built a secondary index on name in advance, retrieving all the name values in the table only requires a scan over the B+ tree of the name index:



As shown in the figure above, the scan starts directly at the leftmost leaf node of the secondary index tree and reads all name values. The data retrieved this way is already sorted, which is convenient if the SQL happens to need ordering:

select name from Student order by name asc;

The query speed is much faster than retrieving from the table data structure!

Explain select * from Student where id>1 and id<5; gives:



type is range, indicating that the range of the query is bounded through the index structure, as shown below:



Explain select name from Student where name='A'; gives:



type is ref, indicating that the name index is not unique: the table may contain multiple records with the same name, and retrieval through the name index structure fetches all records matching the condition.

Explain select Student.name, Score.score from Score join Student on Score.s_id = Student.id; gives:



Note that in this join query, with Student's primary key id as the join condition, the Student table's plan type is eq_ref, indicating that the unique-index property is used: each lookup against Student locates exactly one result.

Explain select id from Student where id=1; gives:



type is const. This occurs only when the query condition compares a unique index with a constant; it indicates the engine gives the statement special treatment. Another example:

Explain select Student.name, Score.score from Score join Student on Score.s_id = Student.id where Student.id=1; gives:



For this join query, ignoring the const pre-optimization for the moment, the execution logic can be expressed in pseudocode:

// Perform a full row scan of the Score table
outerIterator = select A.s_id, A.score from Score as A;
while (outerRow = outerIterator.next) {
    innerIterator = select A.id, A.name from Student as A where A.id = 1;
    innerRow = innerIterator.next;
    // Record the result matching the conditions
    if (innerRow.id == outerRow.s_id) {
        print(outerRow.score, innerRow.name);
    }
}

With the const optimization, the engine instead executes the Student table plan once first, extracts its result as constants, and then scans the Score table comparing against those constants:

// Execute the Student plan once and extract the result as constants
constIterator = select A.id, A.name from Student as A where A.id = 1;
constRow = constIterator.next;
constId = constRow.id;
constName = constRow.name;

outerIterator = select A.s_id, A.score from Score as A;
while (outerRow = outerIterator.next) {
    // During plan execution only a comparison with the constant is needed,
    // which greatly improves efficiency
    if (outerRow.s_id == constId) {
        print(outerRow.score, constName);
    }
}

By extracting the results of the Student table plan into constants, you avoid the query cost of circular retrieval, and the resulting performance gain is significant.

So far we have covered the basics of the Explain method. Splitting SQL into query plans and judging the degree of index utilization already provides most optimization ideas. Next we will test against real data, focusing on changes in the rows field to judge the effect of our optimizations — and hopefully prompting more thought about the whole process.

Practical optimization

A while ago I saw students discussing a database optimization case in a forum; here we reproduce the tables they faced to demonstrate. Assume we are optimizing the database of a huge educational administration system running MySQL 5.5, involving three tables:

Student table (id: student id; name: student name)

CREATE TABLE `Student` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(10) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

Initialize the Student table with 700,000 rows of data:

INSERT INTO `Student` (`id`, `name`)
VALUES
	(1, 'student0'),
	(2, 'student1'),
	(3, 'student2'),
	...
	(700000, 'student699999');

Course table (id: course id; name: course name)

CREATE TABLE `Course` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(10) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

Initialize the Course table with 100 rows of data:

INSERT INTO `Course` (`id`, `name`)
VALUES
	(1, 'course0'),
	(2, 'course1'),
	(3, 'course2'),
	...
	(100, 'course99');

Score table (id: record id; s_id: student id; c_id: course id; score: score)

CREATE TABLE `Score` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `s_id` int(11) DEFAULT NULL,
  `c_id` int(11) DEFAULT NULL,
  `score` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

The Score table records each student's score for the corresponding course. We initialize it with 2 million rows: 100,000 students × 20 courses:

INSERT INTO `Score` (`id`, `s_id`, `c_id`, `score`)
VALUES
	(1, 1, 1, 63),
	(2, 2, 1, 67),
	(3, 3, 1, 40),
	...
	(2000000, 100000, 20, 95);

Query the names of all students who scored 100 points in a course.

select Student.name from Student where id in (select s_id from Score where c_id = 1 and score = 100 );

I tried to run it and had to terminate it after a few minutes of waiting. Why was it so time-consuming? Through Explain analysis, the results are as follows:



The result contains two query plans. Note that rows is about 100,000 and 2,000,000 respectively — query complexities of O(100000) and O(2000000), matching the table sizes exactly — which indicates full table scans. The type field being ALL confirms this. Next, try indexing for the subquery:

alter table Score add index index_cid(c_id);

After a few more minutes of waiting I gave up again, and checked the Explain result:

Reducing the Score plan's complexity to O(200000) only brings a tenfold improvement. type=ref tells us this is a non-unique index: c_id contains a large number of identical values in the table, so the optimization effect is limited. Let's try a composite secondary index on c_id and score instead:

alter table Score add index index_cid_score(c_id,score);

This time we got 1566 results, which took 103s in total:



Corresponding Explain analysis results:



With the composite index, the Score plan's complexity drops to O(1565) — a huge improvement over the single-column index — but the overall execution time is still unsatisfactory. To find out why, let's split the statement and execute the two plans separately. First, the Score plan:

select s_id from Score where c_id = 1 and score = 100;

Total 1566 results, 45ms:



Then execute the Student plan:

select Student.name from Student where id in (55,68,104,243,...,99688);

A total of 1566 results, taking 1.06s:



We would expect the total time of the two plans to be about 1.06s + 0.045s ≈ 1.1s — far from the actual 103s. How to explain this? Looking carefully, the Score plan's select_type is DEPENDENT SUBQUERY. The SUBQUERY form was introduced above (the first SELECT in a subquery); the DEPENDENT mark means this plan has a dependency — the Score plan's execution depends on the result of the outer plan (the Student table) and is re-executed for each of its rows. The whole logical process in pseudocode:

outerIterator = select Student.id, Student.name from Student;
while (outerRow = outerIterator.next) {
    innerIterator = select Score.s_id from Score where c_id = 1 and score = 100;
    innerRow = innerIterator.next;
    // Record the matching results
    if (innerRow.s_id == outerRow.id) {
        print(outerRow.name);
    }
}

The query complexity is O(total) = O(Student) × O(Score); here MySQL bears O(157337275) = O(100535) × O(1565). Is there a way to make the Student table's id index work, so that in theory the whole query stays around 1s?
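The arithmetic behind these figures is worth spelling out. A tiny Python cost model, using the row counts reported by Explain above:

```python
def dependent_subquery_cost(outer_rows, inner_rows):
    # In a dependent subquery, the inner plan re-runs once per outer row
    return outer_rows * inner_rows

# Student drives, Score subquery re-runs per row: the 103s disaster
print(dependent_subquery_cost(100_535, 1_565))   # 157337275

# If Score drove instead and each Student lookup were a single PK probe:
print(dependent_subquery_cost(1_565, 1))         # 1565
```

The second line is exactly the shape we want the execution to take, and it motivates rewriting the subquery as a join below.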

From an execution logic point of view we can imagine a situation like this:

outerIterator = select Score.s_id from Score where c_id = 1 and score = 100;
while (outerRow = outerIterator.next) {
    innerIterator = select Student.id, Student.name from Student where id = outerRow.s_id;
    innerRow = innerIterator.next;
    // Record the result that matches the condition
    if (innerRow != null) {
        print(innerRow.name);
    }
}

Here the outer plan retrieves the rows matching c_id=1 and score=100 through the composite index, and the inner lookup hits the Student primary key. Let's verify the cost of each part with Explain:

explain select Score.s_id from Score where c_id = 1 and score = 100; gives:

explain select name from Student where id=?; gives:

The estimate becomes O(1565) = O(1565) × O(1). Let's express the SQL as a join query:

select Student.name from Student inner join Score on Student.id = Score.s_id where Score.c_id = 1 and Score.score = 100;



It only costs 0.048s! Take a look at Explain analysis:



As expected, the Student table plan uses the unique (primary key) index, the Score table uses the composite index, and the final query complexity is controlled at O(1565). Unlike the initial subquery, why can the join query make full use of the indexes? How should we understand its internal execution logic?

In MySQL, a join is essentially a Cartesian product, in which every row of one table is joined with every row of the other:

select * from Student,Course;

Corresponding logical process:

outerIterator = select Student.id, Student.name from Student;
while (outerRow = outerIterator.next) {
    innerIterator = select Course.id, Course.name from Course;
    while (innerRow = innerIterator.next) {
        // Output the joined result
        print(outerRow.id, outerRow.name, innerRow.id, innerRow.name);
    }
}

In mysql, we usually express it like this:

select Student.id,Student.name,Course.id,Course.name from Student join Course;

The maximum complexity of a join query can reach O(rows of Student) × O(rows of Course). With n tables joined, the complexity model becomes O = O(rows of table 1) × O(rows of table 2) × … × O(rows of table n) — near-exponential growth! In practical applications, on and where conditions are used to control the scale of the join: result rows are filtered by the actual conditions, and the internal lookups are optimized with indexes.

Ok, back to the above question:

select Student.name from Student inner join Score on Student.id = Score.s_id where Score.c_id = 1 and Score.score = 100;

From what I have read, in general the database performs the Cartesian product when executing a join, uses the on condition to decide whether rows join, and finally filters the resulting rows with the where condition:

outerIterator = select Student.id, Student.name from Student;
while (outerRow = outerIterator.next) {
    innerIterator = select Score.s_id, Score.c_id, Score.score from Score;
    while (innerRow = innerIterator.next) {
        // The on condition decides whether the rows join
        if (outerRow.id == innerRow.s_id) {
            // Save the rows matching the join condition
            tmpArr[] = (outerRow.name, innerRow.c_id, innerRow.score);
        }
    }
}
// Filter the results with the where condition
for (i = 0; i < n; i++) {
    if (tmpArr[i].c_id == 1 && tmpArr[i].score == 100) {
        resultArr[] = tmpArr[i];
    }
}
// Produce the final output
print(resultArr);

Following this procedure, the predicted complexity would be O = O(rows of Student) × O(rows of Score) — yet the actual query took only 0.048s, so MySQL must have optimized the execution internally. Look again at the Explain result:

Score uses the composite index index_cid_score, so we can guess that the engine evaluates the where condition first and then joins the filtered result set with the Student table; during the join, it recognizes that the Student table can profitably use its primary key index. The optimized logic looks like:

// Filter the Score table with the where condition first (composite index)
outerIterator = select Score.s_id, Score.c_id, Score.score from Score
                where c_id = 1 and score = 100;
while (outerRow = outerIterator.next) {
    // The inner lookup on Student uses the primary key index
    innerIterator = select Student.name from Student where id = outerRow.s_id;
    // If a matching row exists, retain it
    if (innerRow = innerIterator.next) {
        resultArr[] = (innerRow.name, outerRow.c_id, outerRow.score);
    }
}
// Produce the final output
print(resultArr);

Obviously, the complexity of the above logic depends on the number of rows left after the conditional retrieval on the Score table, which is consistent with the results of our actual Explain analysis.
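To make the difference concrete, here is a toy Python comparison of the two strategies — the full Cartesian scan versus filtering Score first and probing Student by primary key. The data and the comparison-count cost model are synthetic illustrations, not MySQL internals:

```python
students = {i: f'student{i}' for i in range(1, 1001)}             # id -> name
scores = [(s, 1, 100 if s % 97 == 0 else 60) for s in students]   # (s_id, c_id, score)

# Strategy 1: Cartesian product then filter -> |Student| * |Score| comparisons
naive_cmp = len(students) * len(scores)

# Strategy 2: filter Score first, then one PK probe per surviving row
matches = [s_id for (s_id, c_id, sc) in scores if c_id == 1 and sc == 100]
indexed_cmp = len(scores) + len(matches)  # one pass over Score + one probe each

print(naive_cmp, indexed_cmp)  # the indexed plan does orders of magnitude less work
```

Even at this small scale the gap is three orders of magnitude; at the article's 2-million-row Score table it is the difference between 103s and 0.048s.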

That said, for MySQL one cannot assume a join will always meet optimization requirements. Different engines and different MySQL versions may adopt different optimization strategies, and there is no fixed standard; such variations should be analyzed with Explain in the context of the actual business.

For this example, I also tried it on MySQL 5.6 and found that for the SQL:

select Student.name from Student where Student.id in (select s_id from Score where c_id = 1 and score = 100 )

Under the same index construction, the Explain analysis results are:



The Score query plan has id 2 and thus higher execution priority, but its select_type shows a value not seen in MySQL 5.5: MATERIALIZED.



This is on par with what we achieved via the join in MySQL 5.5! Coming back to MATERIALIZED, the official description is:

The optimizer uses materialization to enable more efficient subquery processing. Materialization speeds up query execution by generating a subquery result as a temporary table, normally in memory. The first time MySQL needs the subquery result, it materializes that result into a temporary table. Any subsequent time the result is needed, MySQL refers again to the temporary table. The optimizer may index the table with a hash index to make lookups fast and inexpensive. The index is unique, which eliminates duplicates and makes the table smaller.

In other words, to make subquery processing more efficient, the optimizer generates a temporary table from the subquery result — normally kept in memory — and may build a hash index on it to speed up in-memory lookups. The logical process for our example can be described as:

// Materialize the subquery result into an in-memory temporary table
materializedRows = select Score.s_id from Score where c_id = 1 and score = 100;

for (i = 0; i < n; i++) {
    materializedRow = materializedRows[i];
    // The inner lookup on Student uses the primary key index
    innerIterator = select Student.name from Student where id = materializedRow.s_id;
    // Retain the row if a match exists
    if (innerRow = innerIterator.next) {
        resultArr[] = innerRow.name;
    }
}
// Produce the final output
print(resultArr);

This logic is similar to the join process above. There is a way to see the optimized form of the SQL: enter these two statements in a row on the console:

explain select s.name from Student s where s.id in (select s_id from Score sc where sc.c_id = 1 and sc.score = 100 );
show warnings;

The optimized SQL form is as follows:

select `test`.`Student`.`name` AS `name` from `test`.`Student` semi 
    join (`test`.`Score`) 
    where((`test`.`Student`.`id` = `<subquery2>`.`s_id`) 
            and (`test`.`Score`.`score` = 100) 
            and (`test`.`Score`.`c_id` = 1)
    )

As expected, in MySQL 5.6 the subquery is transformed into a join form; the so-called MATERIALIZED optimization borrows the join's optimization shape. This shows that different MySQL versions will restructure SQL statements differently. When facing complex queries, I suggest using this method to understand them, combined with Explain analysis!

That completes the sample optimization walkthrough. No matter how complex the query requirements in a real environment, we can try to classify the query plans, observe each plan's execution priority, understand the engine's internal execution logic, and finally estimate the overall query cost while adjusting and optimizing step by step — an approach that has worked in most cases I have tried!

Conclusion

We began by studying the InnoDB engine's index structure in order to build a sensitivity to query cost and gain theoretical support for judging query complexity; the Explain method then applies that understanding to real queries — the most straightforward optimization means I can think of. The final example demonstrated the flexibility of the optimization process, reflected in the different behavior across MySQL versions; handling all this well takes accumulated practical experience. One reminder: indexes also affect database maintenance cost — besides improving query efficiency, they add burden to deletes and inserts, so balance them against the actual situation!