The art of MySQL optimization | Booklet free learning

As you may remember, MySQL chooses an index based on cost. The cost calculation rules, however, inevitably deviate from the actual cost of IO and of going back to the table. In other words, when we know the data well enough to be sure that an index of our choosing gives a more efficient query than the one the system picks, MySQL lets us make the choice ourselves. So how do you specify an index?

Self-reliance! My query is my decision

Forced index selection

SELECT *
FROM table_name
FORCE INDEX (`index_name`)
WHERE ...

Specify the index

USE INDEX [FOR {JOIN|ORDER BY|GROUP BY}] ([index_list])

CREATE TABLE student (
    number INT NOT NULL AUTO_INCREMENT COMMENT 'student id',
    name VARCHAR(30) COMMENT 'name', -- wide enough for the sample names below
    major VARCHAR(30) COMMENT 'major',
    PRIMARY KEY (number)
) Engine=InnoDB CHARSET=utf8 COMMENT 'Student information table';
INSERT INTO student VALUES (1111, 'Stomachache', 'School of Software');
INSERT INTO student VALUES (2222, 'Fan Tong', 'School of Computer science');
INSERT INTO student VALUES (3333, 'It''s the truth', 'School of Computer science');
CREATE INDEX name ON student(name);
CREATE INDEX major ON student(major);
-- Ask the optimizer to consider only the index on major
EXPLAIN SELECT name FROM student USE INDEX (major)
WHERE name = 'Stomachache' AND major = 'School of Software';

The result: [image: EXPLAIN output of the query above]

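So what is the difference between USE INDEX and FORCE INDEX?

USE INDEX is only a suggestion: MySQL can still decide not to use the listed index and fall back to a full table scan. FORCE INDEX treats a full table scan as very expensive, so MySQL scans the table only when it has no way to use the listed index.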

Ignoring an index

IGNORE INDEX [FOR {JOIN|ORDER BY|GROUP BY}] ([index_list])

A series of problems caused by join

In SQL Server, the execution engine turns an Inner Join or Outer Join between tables into one of three physical joins (Loop Join, Merge Join, or Hash Join), depending on the selected columns, whether the data is indexed, and the data being selected.

Inner joins and outer joins

Joining two tables with no conditions produces every combination of their rows (Num1 * Num2 rows), so a join normally comes with filter conditions. These conditions can be roughly divided into two categories:

Conditions that relate only to a single table: t1.m1 > 1
Conditions that relate to both tables: t1.m1 = t2.m2, t1.n1 > t2.n2

Driver table: the table that is queried first; the other table is then queried using its rows
Driven table: the table that is queried using the rows of the driver table
Inner join: outputs only the rows that have a match

Outer join: also outputs the driver-table rows that have no match
Left outer join: the left table is the driver table
Right outer join: the right table is the driver table

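A match here means that a record in the driver table finds a matching record in the driven table. Because the driver table differs, a left outer join and a right outer join can output a different number of rows.

Sometimes we want the unmatched rows in the result and sometimes we do not, so to state our intent precisely, filter conditions are split between WHERE and ON. A row that fails a WHERE condition is never output. In an outer join, a driver-table row with no driven-table row satisfying the ON condition is still output, with the driven table's columns filled with NULL.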
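The difference between putting a filter in ON and putting it in WHERE can be checked with a small experiment. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the rows in t1 and t2 are made up for the example.

```python
import sqlite3

# In-memory database; SQLite stands in for MySQL here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (m1 INT, n1 TEXT);
    CREATE TABLE t2 (m2 INT, n2 TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO t2 VALUES (2, 'b'), (3, 'c'), (4, 'd');
""")

# Filter placed in ON: driver rows without a match are kept, padded with NULL.
on_rows = conn.execute(
    "SELECT t1.m1, t2.m2 FROM t1 LEFT JOIN t2 "
    "ON t1.m1 = t2.m2 AND t2.m2 > 2 ORDER BY t1.m1"
).fetchall()

# Same filter placed in WHERE: rows where t2.m2 is NULL fail it and are dropped.
where_rows = conn.execute(
    "SELECT t1.m1, t2.m2 FROM t1 LEFT JOIN t2 "
    "ON t1.m1 = t2.m2 WHERE t2.m2 > 2 ORDER BY t1.m1"
).fetchall()

print(on_rows)     # [(1, None), (2, None), (3, 3)]
print(where_rows)  # [(3, 3)]
```

With the filter in ON, all three rows of the driver table t1 survive. With the filter in WHERE, only the row that really matched remains.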

Nested-loop Join

It is essentially a for loop:

First, find the rows of the driver table that satisfy the conditions.
Then, for each of those rows, query the driven table.

The second step is too slow if it scans the whole driven table every time, so consider using an index on the driven table.
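The two variants can be sketched in a few lines of Python. This is a minimal illustration and not MySQL's implementation: rows are plain dicts, the column names m1 and m2 follow the examples above, the sample data is made up, and a dict plays the role of the index on the driven table.

```python
def nested_loop_join(driver_rows, driven_rows):
    """Plain nested loop: a full scan of the driven table per driver row."""
    result = []
    for d in driver_rows:            # step 1: each row of the driver table
        for r in driven_rows:        # step 2: scan the whole driven table
            if d["m1"] == r["m2"]:
                result.append((d, r))
    return result


def index_nested_loop_join(driver_rows, driven_index):
    """Same loop, but step 2 is a lookup in an index on the driven table."""
    result = []
    for d in driver_rows:
        for r in driven_index.get(d["m1"], []):
            result.append((d, r))
    return result


t1 = [{"m1": 1, "n1": "a"}, {"m1": 2, "n1": "b"}, {"m1": 3, "n1": "c"}]
t2 = [{"m2": 2, "n2": "b"}, {"m2": 3, "n2": "c"}, {"m2": 4, "n2": "d"}]

# Hypothetical "index" on t2.m2: join value -> list of rows with that value.
t2_index = {}
for row in t2:
    t2_index.setdefault(row["m2"], []).append(row)

plain = nested_loop_join(t1, t2)
indexed = index_nested_loop_join(t1, t2_index)
print([(d["m1"], r["m2"]) for d, r in plain])  # [(2, 2), (3, 3)]
```

Both functions return the same pairs. The indexed version replaces the inner scan with a direct lookup, which is the saving the index brings.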

Block Nested-loop Join

As mentioned before, most of what MySQL spends is IO, and an ordinary nested loop spends too much of it. Can we change the loop order: read the driver table's rows a block at a time into memory (the join buffer), then scan the driven table once per block, matching each of its rows against the whole block? The benefit is that the driven table is read once per block instead of once per driver row, which significantly reduces IO.
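A rough sketch of the block idea follows, assuming the blocks are filled with driver-table rows, as MySQL's join buffer does. The function name, the block_size parameter and the sample rows are hypothetical; the scan counter exists only to make the saving visible.

```python
def block_nested_loop_join(driver_rows, driven_rows, block_size):
    """Join on m1 = m2, reading the driven table once per block of driver rows."""
    result = []
    driven_scans = 0
    for start in range(0, len(driver_rows), block_size):
        block = driver_rows[start:start + block_size]  # fill the "join buffer"
        driven_scans += 1
        for r in driven_rows:          # one pass over the driven table per block
            for d in block:            # compare against every buffered driver row
                if d["m1"] == r["m2"]:
                    result.append((d["m1"], r["m2"]))
    return result, driven_scans


t1 = [{"m1": k} for k in (1, 2, 3, 4, 5)]
t2 = [{"m2": k} for k in (2, 4, 6)]

pairs, scans = block_nested_loop_join(t1, t2, block_size=2)
print(pairs)  # [(2, 2), (4, 4)]
print(scans)  # 3
```

With five driver rows and a block size of two, the driven table is scanned three times instead of five. The larger the block, the fewer passes are needed.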

Merge Join

First, Merge Join requires both inputs to be sorted on the join columns. Second, Merge Join requires at least one equality in the join condition before the query analyzer will select it.

For two ordered tables A and B:
Merge Join starts by taking the first row from each of the two inputs. If the rows match, it returns the matched row. If they do not match, it advances the input with the smaller value by one row.
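The walk over two sorted inputs can be sketched as below. This is a simplified illustration, not SQL Server's implementation: each input is a list of (key, payload) tuples already sorted by key, the sample data is made up, and duplicate keys are handled by rewinding the right input within a group of equal keys.

```python
def merge_join(left, right):
    """Join two lists of (key, payload) tuples that are sorted by key."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1                      # left value is smaller: advance left
        elif lk > rk:
            j += 1                      # right value is smaller: advance right
        else:
            # Equal keys: pair every left row of this key with every right row.
            group_start = j
            while i < len(left) and left[i][0] == lk:
                j = group_start
                while j < len(right) and right[j][0] == lk:
                    result.append((left[i], right[j]))
                    j += 1
                i += 1
    return result


a = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
b = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = merge_join(a, b)
print(len(joined))  # 5
```

Apart from the rewind inside a group of equal keys, each input is read front to back only once, which is why sorted inputs make this join cheap.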

Generally speaking, Merge Join is highly efficient when both inputs are already ordered. If an explicit Sort is needed just to make the inputs ordered, Hash Join is usually the more efficient choice. There is an exception: ORDER BY, GROUP BY, DISTINCT and the like in the query may force the query analyzer to do an explicit sort anyway. From the analyzer's point of view, since the explicit Sort has already been paid for, why not reuse its result for a lower-cost Merge Join? In that case Merge Join is the better choice.

Hash join

Hash matching joins are more complex than the previous two methods, but for large amounts of unordered data they perform better than Merge Join and Loop Join. When the join columns are not sorted (that is, there is no index), the query analyzer tends to use Hash Join.

A hash join inevitably has to hash the data (which is why it tries to pick the table with less data as the driver table), and this inevitably costs a lot of CPU. In my opinion, it is not suitable when the data volume is large.
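The two phases of a hash join, build and probe, can be sketched as follows. This is an in-memory illustration only, assuming the smaller table is used as the build side; the function and parameter names and the sample rows are hypothetical.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Build a hash table on the smaller input, then probe it with the larger."""
    table = defaultdict(list)
    for b in build_rows:                         # build phase: hash every row
        table[b[build_key]].append(b)
    result = []
    for p in probe_rows:                         # probe phase: one lookup per row
        for b in table.get(p[probe_key], []):
            result.append((b, p))
    return result


small = [{"m1": 2}, {"m1": 4}]
large = [{"m2": 1}, {"m2": 2}, {"m2": 2}, {"m2": 3}, {"m2": 4}]

matches = hash_join(small, large, "m1", "m2")
print([(b["m1"], p["m2"]) for b, p in matches])  # [(2, 2), (2, 2), (4, 4)]
```

Neither input has to be sorted or indexed. The price is hashing every build row and every probe row, which is the CPU cost mentioned above.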

When joining, whether the driver table is well chosen is bound to affect efficiency

When filter conditions are specified, MySQL takes as the driver table the table with fewer rows satisfying the conditions. When none are specified, the driver table is the one with fewer rows to scan. Roughly speaking, the MySQL optimizer settles the execution order by letting the smaller table drive the larger one.
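Why a small driver table helps can be seen with a back-of-the-envelope estimate. The model below is a toy assumption made for this illustration and is not MySQL's real cost model: the driver table is scanned once, and each driver row costs one index lookup of about log2(N) steps in a driven table of N rows. The row counts are made up.

```python
import math


def index_nested_loop_cost(driver_rows, driven_rows):
    """Toy cost: scan the driver once, plus one index lookup per driver row.

    Assumption (not MySQL's cost model): a lookup in an index over
    driven_rows entries takes about log2(driven_rows) steps.
    """
    return driver_rows + driver_rows * math.log2(driven_rows)


small, large = 100, 1_000_000
small_drives_large = index_nested_loop_cost(small, large)
large_drives_small = index_nested_loop_cost(large, small)

print(round(small_drives_large))
print(round(large_drives_small))
```

Under these assumptions, letting the 100-row table drive is orders of magnitude cheaper, because the number of lookups equals the number of driver rows.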

Therefore, for an inner join, if you want to decide the driver table yourself, you can use STRAIGHT_JOIN, which makes MySQL read the left table before the right one.

Subquery optimization

What is a subquery?

A subquery is a query B whose result is used by another query A as a condition (or range).

That sounds hard to understand, so let's look at some examples.

The result of a query can be many rows, a single row, or a single value, so we have:

-- The subquery returns a single value
SELECT * FROM t1 WHERE m1 = (SELECT MIN(m2) FROM t2);
-- The subquery returns a single row (which is really no different from a single value)
SELECT * FROM t1 WHERE (m1, n1) = (SELECT m2, n2 FROM t2 LIMIT 1);
-- The subquery returns multiple rows
SELECT * FROM t1 WHERE m1 IN (SELECT m2 FROM t2);
SELECT * FROM t1 WHERE (m1, n1) IN (SELECT m2, n2 FROM t2);

In addition to IN, there are ANY, SOME, ALL, and so on.

Optimization of subqueries

Optimization of IN subquery

Transform the subquery into an inner join after materializing it

select * from user where userid in (3,4); is equivalent to select * from user where userid = 3 or userid = 4;

Expanding the IN list this way can be costly if the subquery result is particularly large, so we create a temporary table for the subquery result set.

This is possible, of course, only if the subquery is not correlated with the outer query.

The temporary table is deduplicated
A hash index is created on it

If the result set is really big, it is not good to keep it all in memory, so it is stored on disk with a B+ tree index instead.

Materialization: result set -> temporary table, which we also call the materialized table.

select * from user where userid in (select cuserid from orders where name = 'blind box');

We can think of it as

select * from user inner join orders where user.userid = orders.cuserid and orders.name = 'blind box';

After the subquery is converted to an inner join, MySQL selects the driver table based on cost.
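The role of the deduplicated materialized table can be illustrated in Python. This is a sketch under simplifying assumptions: a Python set stands in for the materialized table with its hash index, the column names follow the query above, and the sample rows in users and orders are made up.

```python
users = [
    {"userid": 1, "name": "u1"},
    {"userid": 2, "name": "u2"},
    {"userid": 3, "name": "u3"},
]
orders = [
    {"cuserid": 1, "name": "blind box"},
    {"cuserid": 1, "name": "blind box"},   # user 1 bought it twice
    {"cuserid": 2, "name": "poster"},
    {"cuserid": 3, "name": "blind box"},
]

# Materialize the subquery: run it once, deduplicate, keep it hash-addressable.
materialized = {o["cuserid"] for o in orders if o["name"] == "blind box"}

# IN (subquery) becomes a hash lookup per outer row.
in_result = [u["userid"] for u in users if u["userid"] in materialized]

# Joining the raw orders rows instead would repeat user 1.
raw_join = [
    u["userid"]
    for u in users
    for o in orders
    if u["userid"] == o["cuserid"] and o["name"] == "blind box"
]

print(in_result)  # [1, 3]
print(raw_join)   # [1, 1, 3]
```

The comparison shows why the materialized table is deduplicated first: only then does joining against it return each user once, as the IN query does.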

Convert the subquery to a semi-join

To be completed
