SQL optimization and diagnosis


Explain the diagnosis

Explain The meanings of each parameter are as follows:

The column name instructions
id Execute the number to identify the row to which select belongs. If there is no subquery or associated query in the statement, only a unique SELECT, each row will display a 1. Otherwise, the inner SELECT statements are typically sequentially numbered, corresponding to their position in the original statement
select_type Show that the row is a simple or complex select, with the outermost layer marked PRIMARY (DERIVED, UNION, UNION RESUIT) if the query has any complex subqueries.
table Which table is referenced (referencing a query, such as “derived3”)
type Data access/read operation type (All, index, range, ref, eq_ref, const/system, NULL)
possible_key Revealing which indexes might facilitate efficient lookups
key Shows which index mysql actually decides to use to optimize the query
key_len Displays the number of bytes used by mysql in the index
ref Shows the column or constant used by the previous table to find the value in the index of the key column record
rows The number of rows to be read in order to find the desired row, estimated
Extra Additional information, such as using index, filesORT, etc

Select_type Common type and its meanings

  • SIMPLE: Queries that do not contain subqueries or UNION operations
  • PRIMARY: If the query contains any subqueries, the outermost query is marked as PRIMARY
  • SUBQUERY: the first SELECT in a SUBQUERY
  • DEPENDENT SUBQUERY: The first SELECT in a SUBQUERY that depends on an external query
  • UNION: The second or subsequent query of the UNION operation
  • DEPENDENT UNION: The second or subsequent query of the UNION operation, DEPENDENT on the external query
  • UNION RESULT: The RESULT set generated by the UNION
  • DERIVED: a subquery that appears in the FROM clause

Type Common types and their meanings

  • System: This is a special case of const type and occurs only if the table to be queried has only one row of data
  • consts: Usually occurs when the primary key or unique index is compared with the constant value. In this case, the query performance is optimal
  • Eq_ref: This is used when the connection is using a full INDEX and is a PRIMARY KEY or UNIQUE NOT NULL INDEX
  • ref: Used when the join uses a prefix INDEX or when the join condition is not a PRIMARY KEY or UNIQUE INDEX
  • Ref_or_null: Similar to a ref type query, but with a NULL value column attached
  • Index_merge: This join type indicates that indexes are used for merge optimization
  • range: Uses an index to perform a range scan. This is common for query conditions such as BETWEEN, >, and <
  • index: The index join type is the same as that of ALL, except that the index tree is scanned, which usually occurs when the index is the overwritten index of the query
  • ALL: full table scan, which is the least efficient search method

Ali coding specification requirements: at least to reach the range level, the requirement is ref level, if can be consts best

The key column

Whether an index flag field is actually used in the query

Extra column

The Extra column displays additional information. Common information and its meanings are as follows:

  • Using WHERE: The MySQL server filters rows after the storage engine retrieves them
  • Using filesort: Usually occurs in a GROUP BY or ORDER BY statement, and the sort or grouping is not based on an index. In this case, you need to use files to sort in memory
  • Using index: A query Using an overridden index can retrieve all data from the index without accessing the table
  • Using index condition: An index is used, but the data is queried back to the table
  • Using TEMPORARY: Indicates that a temporary table is required to process a query. This is usually used in a GROUP BY or ORDER BY statement

How to view SQL optimized by the Mysql optimizer

Explain extended SELECT * FROM 'student' WHERE 'name' = 1 and 'age' = 1; # Run show warnings; # Result:  /* select#1 */ select `mytest`.`student`.`age` AS `age`,`mytest`.`student`.`name` AS `name`,`mytest`.`student`.`year` AS `year` from `mytest`.`student` where ((`mytest`.`student`.`age` = 1) and (`mytest`.`student`.`name` = 1))Copy the code

Why do it? We know Mysql has a leftmost matching rule, so if my index is set to “age”, “name”, then I can query in order of “name”, “age”. In fact, we can, because the Mysql query optimizer helps us automatically optimize the order of SQL execution to select the cheapest way to query (note: the cheapest, not the shortest).

SQL optimization

Oversized paging scenario solution

If the data in the table needs to be deeply paging, how can I improve efficiency? In the Java programming specification produced by Ali, it says:

Use deferred correlation or subqueries to optimize hyperpaging scenarios

Description: Instead of skipping the offset rows, MySQL takes offset+N rows and returns the previous offset rows, returning N rows. When the offset is very large, it is very ineffective to either control the total number of pages returned or do SQL rewriting for pages that exceed a certain threshold

Select * from task_result LIMIT 2000000, 10; SELECT a.* FROM task_result a, (SELECT id FROM task_result LIMIT 20000000, 10) b where a.id = b.id; The task_result table is a table in the production environment. The total data amount is 34 million, id is the primary key, and the offset is 20 millionCopy the code

Limit 1 when retrieving a piece of data

If the situation of the data table is known and a service needs to obtain a data that meets a Where condition, use Limit

Note: In many cases where we know that only one of the data exists, we should tell the database that only one of the data exists, otherwise a full table scan will be performed

# counterexample (takes 2424.612 s) select * from task_result where unique_key = 'ebbf420b65d95573db7669f21fa3be3e_861414030800727_48'; # are case (takes 1.036 s) select * from task_result where unique_key = 'ebbf420b65d95573db7669f21fa3be3e_861414030800727_48' LIMIT 1; # task_result = task_result = task_result; # task_result = task_resultCopy the code

Bulk insert

INSERT into person(name,age) values('A',24) INSERT into person(name,age) values('B',24 The person (name, age) values (' C ', 24) # is INSERT into the person (name, age) values (' A ', 24), (' B ', 24), (' C ', 24); # description is more conventional, so I won't do much explanationCopy the code

Optimization of like statements

General business requirements of like statements are in the form of ‘% keyword %’, but we still need to consider whether the right fuzzy way can be used to replace the requirements of products, in which The Coding specification of Ali mentions:

Page search is strictly prohibited left or all fuzzy, if you need to go to the search engine to solve

# counterexample (takes 78.843 s) EXPLAIN the select * from task_result where taskid LIKE '% % tt600e6b601677b5cbfe516a013b8e46 LIMIT 1; # are case (takes 0.986 s) select * from task_result where taskid LIKE 'tt600e6b601677b5cbfe516a013b8e46%' LIMIT 1 # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # for example Explain 1 SIMPLE task_result range SIMPLE task_result ALL 33628554 11.11 Using WHERE # The TASK_result table is a table in the production environment, and the total data amount is 34 million. Taskid is a common index column. Therefore, indexes cannot be used in the %% matching mode, and the full table scan efficiency is very lowCopy the code

Avoid function transformations or expression evaluations of the WHERE field in SQL

Select * from task_result where id + 1 = 15551; Select * from task_result where id = 15550; # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # for example Explain 1 SIMPLE task_result const Explain 1 SIMPLE task_result ALL 33631512 100.00 Using where # Actually after know the SQL optimizer, I personally feel this common expression conversion should be processed in advance and then to query, so can use index, but the problem again, if the mysql optimizer can calculate the result in advance, then write the SQL statement must also can calculate the result in advance, So there is a contradiction here, which makes indexing impossible until version 5.7, and it may be optimized in the futureCopy the code

Use ISNULL() to determine if the value ISNULL

Note: A direct comparison of NULL to any value is NULL

# 1) NULL<>NULL returns NULL, not false. # 2) NULL=NULL returns NULL instead of true. # 3) NULL<>1 returns NULL instead of true.Copy the code

Multi-table query

The company I work for basically prohibits multiple table queries, so if we have to use them, we can refer to Alibaba’s coding specifications together

Eg: Join cannot be joined if more than three tables exist. The data type of the fields to be joined must be absolutely the same. When multiple tables are queried by association, ensure that the associated fields have indexes

Why do full table scans when there are indexes

When I answered some interview questions before, THERE was a deviation in my understanding of a certain point, that is, I thought that as long as the queried column had an index, the index would be used to Push the data

However, there is more to it than that, and it should be: too many rows for the query will be converted to a full table query

So what does this over-reference mean?

My test results are 50%, but personally I don’t think the MySQL optimizer is completely preoccupied with row count and whether it’s full table or not. There are a lot of other factors that combine to find that full table scans are more efficient, etc., so just be aware of this issue

Count (*) or count (id)

Ali’s Java coding specification includes the following:

[Mandatory] Do not use count(column name) or count(constant) instead of count(*)

Count (*) is the standard syntax for counting the number of rows defined by SQL92. It is independent of the database, and is independent of NULL and non-NULL.

Note: Count (*) counts NULL rows, while count(column name) does not count NULL rows

The index is invalid because the field type is different

Ali’s Java coding specification includes the following:

[Recommended] Prevent index invalidation due to implicit conversion due to different field types

In fact, the database will perform an implicit conversion when querying. For example, the vARCHar type field will be queried by number

SELECT * FROM 'user_coll' where pid = '1'; SELECT * FROM 'user_coll' WHERE pid = 1; SELECT * FROM 'user_coll' where pid = 1; Type: index ref: NULL rows:3 Using index # Indicates that the pid field has an index and is in vARCHar formatCopy the code

about

Thanks to the following blog posts and their authors:

Dry! SQL performance optimization to write high-quality SQL statements

Dry! SQL Performance optimization, writing high-quality SQL Statements (part 2)

MySQL official document

Tips

Self-built data table for testing

CREATE TABLE 'student' (' id 'bigint(20) NOT NULL AUTO_INCREMENT COMMENT' primary key ', 'name' vARCHar (255) NOT NULL, 'class' varchar(255) DEFAULT NULL, 'page' bigint(20) DEFAULT NULL, 'status' Tinyint (3) unsigned NOT NULL COMMENT' ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET= UTF8MB4Copy the code

Insert data

DELIMITER ;;
    CREATE PROCEDURE insertData()
    BEGIN
        declare i int;
        set i = 1 ;
        WHILE (i < 1000000) DO
            INSERT INTO student(`name`,class,`page`,`status`)
                VALUES(CONCAT('class_', i),
                    CONCAT('class_', i),
                i, (SELECT FLOOR(RAND() * 2)));
            set i = i + 1;
        END WHILE;
        commit;
		END;;
CALL insertData();
Copy the code