preface

For Internet companies, with the increasing number of users and data, slow query is an inevitable problem. In general, slow query indicates that the interface responds slowly or times out. In a high concurrency scenario, the database connection may be occupied, leading to service unavailability.

Slow queries do cause a lot of problems. How can we optimize slow queries?

The main solutions are:

  • Monitor SQL execution, send email, SMS alarm, easy to quickly identify slow query SQL
  • Enable the database slow query log function
  • Simplifying business logic
  • Code refactoring and optimization
  • Asynchronous processing
  • SQL optimization
  • The index optimization

Other methods first do not say, the back of the opportunity to introduce alone. Today I’m going to focus on index optimization because it’s the most effective way to solve the problem of slow query SQL.

How do I check the index execution of an SQL query?

Insert the explain keyword in front of the SQL to see its execution plan. Through the execution plan, we can clearly see the status of the table and index execution, whether or not the index is executed, the order of index execution, and the type of index.

The steps of index optimization are:

  1. Use Explain to view SQL execution plans
  2. Determine which indexes are being used incorrectly
  3. Optimize SQL, which may require multiple optimizations to reach the optimal value for index usage

Since the first step in index optimization is to use Explain, let’s take a thorough look at it.

Explain to introduce

Let’s look at the official documentation for mysql to describe explain:

  • EXPLAIN can be used with SELECT, DELETE, INSERT, REPLACE, and UPDATE statements.
  • When EXPLAIN is used with explainable statements, MySQL displays information from the optimizer about the statement execution plan. That is, MySQL explains how it will process the statement, including information about how tables are joined and in what order.
  • When used with non-explainable statements, EXPLAIN displays the execution plan for the statement executed in the named connection.
  • For SELECT statements, EXPLAIN warnings about other execution plans that can be displayed.

Explain,

Explain syntax:

{EXPLAIN | DESCRIBE | DESC}
    tbl_name [col_name | wild]

{EXPLAIN | DESCRIBE | DESC}
    [explain_type]
    {explainable_stmt | FORCONNECTION connection_id}

explain_type: {
    EXTENDED
  | PARTITIONS
  | FORMAT = format_name
}

format_name: {
    TRADITIONAL
  | JSON
}

explainable_stmt: {
    SELECTstatement
  | DELETEstatement
  | INSERTstatement
  | REPLACEstatement
  | UPDATEstatement
}
Copy the code

Use a simple SQL statement to see the effect of using the explain keyword:

explain select * from test1;
Copy the code

Execution Result:

In the preceding figure, 12 columns of information are displayed in the execution result. The details of each column are as follows:

To put it bluntly, we need to understand what these columns mean in order to properly judge index usage.

Without further ado, let’s get right to the introduction.

1. The id column

The value of this column is the ordinal number in the SELECT query, such as 1, 2, 3, 4, and so on, and determines the order in which the table is executed.

There are three common situations in an SQL execution plan:

  • Same id
  • Id is different
  • I have the same id and I have different ids

So what is the order of execution of the three tables?

1.1 same id

Execute SQL as follows:

explain select * from test1 t1 inner join test1 t2 on t1.id=t2.id
Copy the code

Results:In the figure, we see that the two data ids in the execution result are both 1 and identical.

What is the order of execution of this fact sheet?

Answer: Execute from top to bottom, first executing table T1, then table T2.

How do we look at tables that are executed?

Answer: Look at the table field, which will be explained later.

1.2 different id

Execute SQL as follows:

explain select * from test1 t1 where t1.id = (select id from  test1 t2 where  t2.id=2);
Copy the code

Results:

We see that the two data ids in the result are different. The first data is 1 and the second data is 2.

What is the order of execution of this fact sheet?

Table T2 is executed first, and then table T1 is executed from the bottom up.

1.3 The ids can be identical or different

Execute SQL as follows:

explain
select t1.* from test1 t1
inner join (select max(id) mid from test1 group by id) t2
on t1.id=t2.mid
Copy the code

Results:

The first two ids have the same id, and the third one has a different ID.

What is the order of execution of this fact sheet?

Answer: Execute the large number first, execute it from bottom to top. When the serial number is the same, execute it from the top down. So the order of the tables in this column is: test1, T1,

You may be wondering: what the hell?

It represents a derived table, and we’ll talk about that later.

One more question: Are id column values allowed to be null?

The answer will come later.

2. Select_type columns

This column represents the type of select. It contains the following 11 types:

But commonly used is actually the following:

Let’s see how these SELECT types appear in detail:

  1. SIMPLE

Execute SQL as follows:

explain select * from test1;
Copy the code

Results:

The picture

It only occurs in simple SELECT queries and does not include subqueries or unions, which is more intuitive.

  1. PRIMARY and the SUBQUERY

Execute SQL as follows:

explain select * from test1 t1 where t1.id = (select id from  test1 t2 where  t2.id=2);
Copy the code

Results:

We see that in this SQL nested query, the outermost T1 table is of type PRIMARY and the innermost SUBQUERY T2 table is of type SUBQUERY.

  1. DERIVED

Execute SQL as follows:

explain
select t1.* from test1 t1
inner join (select max(id) mid from test1 group by id) t2
on t1.id=t2.mid
Copy the code

Results:

The last record is a derived table, which is typically a subquery contained in a FROM list, in this case a grouped subquery in SQL.

  1. The UNION and the UNION RESULT

Execute SQL as follows:

explain
select * from test1
union
select* from test2
Copy the code

Results:

Table test2 is the query after the UNION keyword and is marked as UNION, while test1 is the PRIMARY table and is marked as PRIMARY. While <union1,2> represents the table union with id=1 and ID =2, whose RESULT is marked as union RESULT.

UNION and UNION RESULT usually come in pairs.

Also, answer the question above: Is the value of the ID column allowed to be null?

If you look closely at the figure above, you can see that the ID column is allowed to be null, and only when the SELECT type is: UNION RESULT.

3. The table columns

The value of this column represents the name of the table referenced by the output row, for example: test1, test2, and so on.

But it can also be one of the following:

<unionM,N> : union N of the row M with and id values. : the value N used for the result ID of the derived table from the row. Derived tables may come FROM, for example, a subquery in the FROM clause. : Result of subquery, whose ID value is N

4. The partitions are listed

The value of this column represents the partition from which the query will match the record

5. The type column

The value of this column represents the connection type and is an important indicator of index performance. Contains the following types:

Results are executed from top to bottom in the order of best to worst.

We need to focus on the following types:

system > const > eq_ref > ref > range > index > ALL

Before we demonstrate, there is only one data in table test2:

And a normal index is built over the code field:

Let’s take a look at each of the common connection types:

system

This type requires that there be only one data in the database table, which is a special case of const and is not normally present.

const

SQL > select primary key, primary key, primary key, primary key, primary key, primary key, primary key, primary key, primary key, primary key, primary key, primary key

explain select * from test2 where id=1;
Copy the code

Results:

eq_ref

Often used for primary key or unique index scans. Execute SQL as follows:

explain select * from test2 t1 inner join test2 t2 on t1.id=t2.id;
Copy the code

Results:

At this point, one might wonder, const and eq_ref are both scans of primary keys or unique indexes. What’s the difference?

A const index is only indexed once. The eq_ref primary key matches the eq_ref primary key. The eq_ref primary key matches the eq_ref primary key.

ref

Often used for non-primary key and unique index scans. Execute SQL as follows:

explain select * from test2 where code = '001';
Copy the code

Results:

range

Commonly used for range queries, for example, between… And or In, execute the following SQL:

explain select * from test2 where id between 1 and 2;
Copy the code

Results:

index

Full index scan. Execute SQL as follows:

explain select code from test2;
Copy the code

Results:

ALL

Full table scan. Execute SQL as follows:

explain select *  from test2;
Copy the code

Results:

6. Possible_keys columns

This column represents possible index selections.

Note that this column is completely independent of the table order, which means possible_keys may not be used in practice with the generated table order.

If this column is NULL, there is no associated index. In this case, you can improve query performance by checking the WHERE clause to see if it references any columns that are appropriate for the index.

7. The key columns

This column represents the actual index used.

Possible_keys can be NULL, but the key is not NULL.

Before demonstrating, let’s look at the structure of the test1 table:

Test1 表 名 :

Index used:

The code and name fields use joint indexes.

Execute SQL as follows:

explain select code  from test1;
Copy the code

Results:

This SQL does not expect to use an index, but actually uses an index in full index scan mode.

8. Key_len columns

This column represents the length of the index to use. The key column above shows whether the index is being used, and the key_len column further shows whether the index is being used adequately. It is, unsurprisingly, the most important column.

A key question arises: how is key_len computed?

Three factors determine the value of key_len:

  1. Character set

  2. The length of the

  3. Whether is empty

Commonly used character encodings occupy the following number of bytes:

My current database character encoding format is UTF8 with 3 bytes.

Mysql > select * from user where id = 1;

Additionally, add 1 byte if the field type is allowed to be null.

What about 184 in the figure above?

184 is 30 times 3 plus 2 plus 30 times 3 plus 2

To change the code field type test1 tables to char, and to allow null:

Execute SQL as follows:

explain select code  from test1;
Copy the code

Results:

How did it work out? 183 is 30 times 3 plus 1 plus 30 times 3 plus 2

One more question: why does this column indicate that the index is underused and underused?

Execute SQL as follows:

explain select code  from test1 where code='001';
Copy the code

Results:

In the figure above, the associative index is used: idx_code_name. Key_len should be 183 if all the indexes match, but in fact it is 92, indicating that not all indexes are used and that the index is underused.

8. Ref

This column represents the column or constant hit by the index.

Execute SQL as follows:

explain select *  from test1 t1 inner join test1 t2 on t1.id=t2.id where t1.code='001';
Copy the code

Results:

We see that the index hit by table T1 is const(constant) and the index hit by table T2 is the ID field of table T1 in the column Sue library.

9. Rows

This column represents the number of rows that MySQL believes must be checked to execute the query.

For InnoDB tables, this number is an estimate and may not always be accurate.

10. Filtered column

This column represents the estimated percentage of table rows filtered by table criteria. The maximum value is 100, which represents unfiltered rows. A decrease in the value from 100 indicates an increase in the amount of filtering.

Rows shows the estimated number of rows checked, and rows× Filtered shows the number of rows connected to the table below. For example, if rows is 1000 and filtered is 50.00 (50%), the number of rows connected to the table below is 1000 x 50% = 500.

11. Extra columns

This column contains additional information about how MySQL parses the query, which is important, but it contains too many values to cover, just a few common ones.

Impossible WHERE

The condition after WHERE is always false

Execute SQL as follows:

explain select code  from test1 where 'a' = 'b';
Copy the code

Results:

Using filesort

Sort by file, usually only when the specified sort and index sort are inconsistent.

Execute SQL as follows:

explain select code  from test1 order by name desc;
Copy the code

Results:

The joint index of code and name is established here. The order of the joint index is code before name, and the order of the joint index is directly descending by name, which is different from the previous joint index.

Using index

Indicates whether an overwrite index is used and, in plain English, indicates whether all obtained columns are indexed.

Using index is used in the example above, because only one code column is returned, and its fields are indexed.

Using temporary

Indicates whether a temporary table is used. This is common in order BY and group BY statements.

Execute SQL as follows:

explain select name  from test1 group by name;
Copy the code

Results:

Using where

Indicates that where condition filtering is used.

Using join buffer

Indicates whether connection buffering is used. Tables from the earlier join are partially read into the join buffer, and their rows are then used from the buffer to perform the join with the current table.

The process of index optimization

  1. Use slow query logs to locate the SQL to be optimized

  2. Use the Explain execution plan to view index usage

  3. Focus on:

    Key (check if index is used)

    Key_len (to see if the index is fully used)

    Type (View index type)

    Extra (View additional information: Sort, temporary table, where condition false, etc.)

    In general, index problems can be found based on these four columns.

  4. Optimize the SQL based on the index problems identified in the previous step

  5. Go back to step 2

One last word (attention, don’t fuck me for nothing)

If this article is of any help or inspiration to you, please scan the QR code and pay attention to it. Your support is the biggest motivation for me to keep writing.

If this article is of any help or inspiration to you, please pay attention to it. Your support is the biggest motivation for me to keep writing.

Ask for a key three even: like, forward, look.

In addition to pay attention to the public number: [Su SAN said technology], in the public number reply: interview, code artifact, development manual, time management have super good fan welfare, in addition reply: add group, can communicate with a lot of BAT big factory seniors and learn.