preface

Friends must have encountered such feedback: the page loading data is too slow, some even timeout, the user experience is very poor, need to quickly optimize;

Feedback equals complaint ah, how many times, I guess the leader wants to talk to you.

So had to stop the head of the live, hurriedly investigation, may be the final process of the problem, may also be a large number of concurrent queuing problems, may also be SQL query performance caused by; And in many cases, slow SQL query is the most direct culprit, slow down the system is to achieve a same function, some friend millisecond rendering effects, while others want to several seconds, and tuning needs to spend time to be reckoned with, and ultimately may reflect on personal business ability and image: wow, really cow force, points minutes; Novice, unexpectedly write such SQL;

And for SQL tuning, search engine a check, 72 stunts are absolutely enough show, so according to the beginning of the practice, good luck is solved, bad luck how to use all not good; So more important is the business scenario, to learn to analyze the cause, finally know how to solve; This series is about database optimization, why, and how.

MySQL logical structure oracle

The logical structure of MySQL is divided into four layers, like project layers, with each layer handling different business logic.

Overview:

  • Client: this refers to connect to MySQL in various forms, such as. Net in the use of ADO connection, Java using JDBC connection; MySQL is a client and server mode. The connection must be established before data transmission and related logic can be processed.

  • Business logic: There are many modules in MySQL to deal with related business logic.

    Connection management: responsible for connection authentication, connection number judgment, connection pool processing and other business logic processing;

    Query cache: When a SQL came in, if open the query cache feature, MySQL will priority to check whether there is data in the query cache match, if there is a match, don’t go to parse the corresponding SQL, but if the statement have a user-defined function, storage function, the user variables, temporary tables, system tables in MySQL database, don’t walk the cache; For query caching, which has been removed in MySQL8.0, the official response is that query caching can cause performance bottlenecks in certain scenarios.

    Parser: For an SQL statement, MySql parses it according to the syntax rules and generates an internally recognized parse tree.

    Optimizer: is responsible for optimizing the parse tree obtained by the parser. MySQL will find an execution plan that MySQL considers optimal based on its internal algorithm, and then follow this execution plan. Therefore, the following analysis is the optimal execution plan selected by MySQL for SQL statements, combined with the business, according to the rules of SQL optimization, so that SQL statements in MySQL to achieve the real optimal.

    Executor: After obtaining the execution plan, the system finds the corresponding storage engine and executes the commands according to the execution plan.

  • Storage engine: the storage and extraction of data finally depends on the storage engine; MySQL implements a pluggable storage engine mechanism. Different storage engines execute different logic.

  • Physical file: The final location of data storage, that is, on disk; The storage engine collaborates with the storage engine to read and write data.

About the logical structure of MySql, the above is only a simple description, the functional modules of the business logic layer are far more than the above mentioned, partners interested in special study, the purpose here is to reflect the SQL statement to the server through several key steps, to facilitate the understanding of subsequent optimization.

2. Order of keyword execution in SQL statements

When writing a query statement, it is customary to start typing from the beginning to the end, should be from select, but does not seem to pay attention to their actual execution order; Since to optimize, certainly need to know a SQL statement about the execution process, combined with the execution plan, the purpose is more clear; The previous picture was easy to understand:

Key words:

  • FROM: Determine the data source, that is, specify the table;
  • JOIN… ON: Determine the association table and association conditions;
  • WHERE: Specifies the filtering criteria to filter out the data that meets the criteria.
  • GROUP BY: Groups the filtered data according to the specified field.
  • HAVING: Specifies filtering conditions for grouped data.
  • SELECT: find the desired field data;
  • DISTINCT: Deduplication is performed on the searched data.
  • ORDER BY: Sorts the specified fields of the data after deduplication.
  • LIMIT: Limits the number of pages that can be retrieved after deduplication.

Now that you understand the logical structure of MySQL and the order in which SQL query keywords are executed, you can talk about the execution plan.

3. Talk about the execution plan

When a SQL is sent to MySQL for execution, it needs to be optimized by the internal optimizer. The explain keyword can simulate the optimizer to execute SQL query statements, so as to know how MySQL processes SQL, that is, the execution plan of SQL. Analyze SQL statements based on the execution plan information provided by Explain, and then perform related optimization actions. The following example uses five tables: USER(USER table), MENU(MENU table), ROLE(ROLE table), USER_ROLE(USER ROLE table), ROLE_MENU(ROLE MENU table), ADDR(USER address table), FRIEND(FRIEND table), The relationship between them is not detailed here, partners must understand, this is the control of the menu permissions of the five basic tables and two basic information tables;

The demo version is MySql5.5, which varies from version to version, so don’t call me a mess if the test results are different. In fact, it is important to view the idea, the whole is more or less the same. (Please forgive me……)

The following information will be output through Explain. Many partners only pay attention to the part marked in the red box (index), but in fact, it is not enough. Next, we will talk about it one by one.

  • id

    This id is not the same as the primary key ID of the table. Here the ID represents the order in which the table is loaded in each SQL statement execution plan. There are three cases:

    When the ID is the same: execute from top to bottom;

    EXPLAIN SELECT t.ID,t.USER_NAME,r.ROLE_NAME FROM USER t 
    	JOIN USER_ROLE tr ON t.ID = tr.USER_ID
    	JOIN ROLE r ON tr.ROLE_ID = r.ID
    Copy the code

    If the following statement is executed, the result is as follows:

    T ->tr-> R (where the table is alias);

    When ids are different: When ids are different, the larger ids are, the earlier the command is executed.

    EXPLAIN SELECT t.ID,t.MENU_NAME,t.MENU_URL FROM MENU t
    	WHERE t.ID IN (SELECT MENU_ID FROM ROLE_MENU rm 
    		WHERE rm.ROLE_ID IN(SELECT ROLE_ID FROM USER_ROLE ur WHERE ur.USER_ID=1))
    Copy the code

    The subquery will increment the ID as follows:

    Ur ->rm->t; ur->rm->t;

    When the ids are the same and different at the same time: The group with the same ID is considered to be the same, or the group is loaded from the top down. The bigger the bigger the better

    EXPLAIN SELECT t.ROLE_ID,m.ID,m.MENU_NAME,m.MENU_URL FROM 
    	(SELECT ROLE_ID FROM USER_ROLE WHERE USER_ID=3) t,ROLE_MENU rm,MENU m
    	WHERE t.ROLE_ID=rm.ROLE_ID
    	AND rm.MENU_ID=m.ID
    Copy the code

    The result is as follows:

    As shown in the figure above, if the ids are the same or different, the load sequence of the corresponding tables is USER_ROLE->derived2 ->rm->m; The deriveD2 deriveD2 deriveD2 deriveD2 deriveD2 deriveD2 deriveD2 deriveD2 deriveD2 deriveD2

  • select_type

    Select_type is the query type of each step, so that the analyst can directly see what query is executed in the current step. There are many types, as shown in the following figure:

    1> SIMPLE: a SIMPLE SELECT query that does not contain subqueries or unions;

    EXPLAIN SELECT * FROM USER;
    Copy the code

    The following output is displayed:

    2> PRIMARY: the outermost SELECT is marked as this type if the query contains other subqueries or UNION operations;

    As shown in the figure above, the query contains sub-queries, with the outermost query marked as PRIMARY;

    3> SUBQUERY: Subqueries contained in SELECT or WHERE are marked as this type;

    As shown in the PRIMARY figure, subqueries are marked as subQueries when they exist

    4> MATERIALIZED: MATERIALIZED subquery, that is, MATERIALIZED the corresponding subquery into a temporary table;

    EXPLAIN SELECT t.ID,t.MENU_NAME,t.MENU_URL FROM MENU t
    	WHERE t.ID IN (SELECT MENU_ID FROM ROLE_MENU rm 
    		WHERE rm.ROLE_ID IN(SELECT ROLE_ID FROM USER_ROLE ur WHERE ur.USER_ID=1));
    Copy the code

    MySQL8.0 = MySQL8.0; MySQL8.0 = MySQL8.0; MySQL8.0 = MySQL8.0

    Materialize the subquery as a temporary table subQuery2, as shown in the figure above, by setting the optimizer’s corresponding switch.

    5> DERIVED: Subqueries after FROM are marked as of this type, again putting the results in a temporary table;

    EXPLAIN SELECT tm.MENU_NAME,rm.ROLE_ID FROM 
    	(SELECT * FROM MENU WHERE ID >3 ) tm ,ROLE_MENU rm 
    	WHERE tm.ID=rm.MENU_ID AND rm.ROLE_ID=1
    Copy the code

    Output result:

    As shown in the figure, the subquery following FROM is labeled DERIVED and the DERIVED table generated by the step is deriveD2. The higher version seems to have been optimized, and the 8.0 form is considered a simple query.

    6> UNION: SELECT the inner part of the query;

    EXPLAIN SELECT * FROM USER_ROLE T1 WHERE T1.USER_ID=1
    UNION
    SELECT * FROM USER_ROLE T2 WHERE T2.USER_ID=2
    Copy the code

    The following output is displayed:

    As shown in the figure above, annotate the second SELECT as UNION, which means that the corresponding loaded table is T2.

    **7> UNIOIN RESULT: **UNION RESULT, corresponding id is empty, representing a RESULT set;

    See the UNIOIN figure. UNIOIN RESULT represents the RESULT after UNION, and the corresponding ID is empty.

  • table

    Table represents which table is loaded by the corresponding step. Temporary tables such as SubQuery2 and DeriveD2 appear in the middle. The last number represents the ID of the corresponding step that generated the table.

  • type

    Represents the access type, which is divided into multiple types within MySQL. The common types are shown in the following order from best to worst:

    system->const->eq_ef->ref->fulltext->ref_or_null->index_merge->unique_subquery->index_subquery->range->index->ALL;

    In actual development scenarios, the more common types are as follows: const->eq_ref->ref->range->index->ALL(order from good to bad), usually at least at the range level or above, such as ref is relatively good;

    Good to bad refers to query performance.

    1>const (primary key); const (primary key);

    2> eQ_ref: unique index scan. For each index key, only one record in the table matches it. It is usually used for primary key or unique index scan.

    3>ref: non-unique index scan, return all matched rows, for example, create a friend maintenance table, maintain the corresponding friends of the user, and create non-unique index on the user ID;

    4>range: use an index to retrieve rows in a specified range. Generally, between, <, >, and in will be queried in the WHERE statement.

    5>index: full index scan, only traverses the index tree;

    6>ALL: scans ALL tables and finds matching rows. Compared with index, ALL scans disk data, and index traverses the index tree.

  • possible_keys

    Displays indexes that might be used, but not necessarily in the actual query. The query involves fields that are listed if an index exists, but are listed only in key if an overridden index is used.

  • key

    If the value is NULL, it indicates that the index is not used. This is also the key to determine whether to use the index.

  • key_len

    Key_len indicates the number of bytes used by the index. Based on this value, you can determine the usage of the index. Especially when combining indexes, it is very important to determine how much part of the index is used. Key_len is computed from the table definition. Create a non-unique index on USER_NAME in table USER as follows:

    Here key_len is evaluated as follows, provided the specified string set is UTF8, variably long and allowed to be null:

    128(variable length set)*3(utF8 for 3 bytes)+1(null flag allowed for 1 byte)+2(length information for 2 bytes)=387;

    Key_len is calculated differently for different types of fields. Here we use the USER table as an example:

    field Key_len instructions
    ID(int, not null) 4 Int is 4 bytes, not null
    USER_NAME(varchar(128), utF8, null) 128 * 3 + 1 + 2 = 387 The value can be 128, and each utF8 takes 3 bytes. One byte identifies the controllable length and two bytes identifies the length

    Different types of fields and character sets require different bytes. A field that is allowed to be empty requires one byte, and a field with variable length requires two bytes. You can calculate other types by following this idea.

  • ref

    Shows which columns of the index are referenced, usually the corresponding field or const;

  • rows

    Approximate the number of rows scanned to find the desired record data based on table statistics and index usage; Not the number of rows of required data.

  • Extra

    This field contains some additional information, but is also an important reference for tuning SQL. The following information usually appears:

    Using index: indicates that an overwrite index is used in a query statement. Data rows of the table are not accessed, resulting in high query efficiency.

    If you use SELECT *, there is no Using index.

    Using filesort: indicates that MySQL will use an external index to sort data (filesort) instead of Using an in-table index. This situation should be avoided in SQL query, preferably not in Extra:

    This is usually caused BY the use of the ORDER BY statement, as is the case in the figure above, when a field without an index is used in the wrong way, such as when combined indexes are used incorrectly.

    Using temporary: create a temporary table to save intermediate results. SQL does not allow you to create temporary tables to save intermediate results.

    This type is often caused BY ORDER BY and GROUP BY, so when sorting data and grouping queries, it is important to pay attention to the proper use of indexes.

    Using where: Use where to filter data.

    Using join buffer: indicates that the table join buffer is used. If the table contains a large amount of data, the buffer may be too large and the query efficiency may be low. In this case, use indexes correctly on the table join fields.

    If the table join query is slow, add an index to the join field to try, cure the problem;

    **impossible WHERE: ** indicates where the condition is always false and no data can be matched.

    The tables and data used are obtained from Gitgub: github.com/zyq025/SQL_…

conclusion

After reading this article, partners to find some SQL to see the corresponding execution plan, is not understand, for optimization significance; But this is not enough, next we need to talk about index, talk about index failure, talk about other optimization methods besides EXPALIN, finally the daily development optimization should be able to handle, away from inefficient SQL, whether there is more time to learn.

A handsome guy who was made ugly by the program, pay attention to “Code variety circle “, learn with me ~~~