MySQL index optimization analysis

Why are your SQL queries slow to write? Why do your indexes often fail? In this chapter, you will learn the causes of MySQL performance degradation, an introduction to indexes, principles for index creation, the use of EXPLAIN commands, and the meaning of explain output fields. Help you understand indexes, analyze indexes, and use indexes to write higher performance SQL statements. What are you waiting for? Rolling up your sleeves is dry!

Case analysis

Let’s take a quick look at the difference between a non-relational database and a relational database. MongoDB is one of the NoSQL types. NoSQL stands for Not only SQL, non-relational database. It is characterized by high performance, strong expansibility and flexible mode, especially in high concurrency scenarios. But at present it is only the supplement of the relational database, it in the consistency of data, data security, query complexity and relational database still have a certain gap. MySQL is a relational database, strong query function, high data consistency, high data security, support secondary index. However, the performance is not as good as that of MongoDB, especially for data with a level of more than one million, which is prone to slow queries. At this time need to analyze the reasons for the slow query, generally is the programmer SQL write bad, or there is no key index, or index failure and other reasons. The company’s ERP system database is mainly MongoDB (NoSQL, which is closest to relational data), followed by Redis, and MySQL only occupies a small part. MySQL is now being used again thanks to Alibaba’s Qimen and Jushita systems. Considering that the number of orders has been more than a million levels, the performance analysis of MySQL is particularly important.

Let’s start with two simple examples. The functions and meanings of each parameter are described in details later. Note: the SQL you need to use has been put on Github, like students can click star, haha. Github.com/ITDragonBlo…

Scenario 1: Order import, using the transaction number to avoid repeated orders

Business logic: When importing an order, in order to avoid repeated lead orders, the transaction number is generally used to query the database to determine whether the order already exists.

The most basic SQL statement

mysql> select * from itdragon_order_list where transaction_id = "81X97310V32236260E"; +-------+--------------------+-------+------+----------+--------------+----------+------------------+-------------+----- --------+------------+---------------------+ | id | transaction_id | gross | net | stock_id | order_status | descript | finance_descript | create_type | order_level | input_user | input_date | +-------+--------------------+-------+------+----------+--------------+----------+------------------+-------------+----- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | 10000 | 81 x97310v32236260e 10 | | | | | 1 6.13 6.6 ok ok | | auto | | 1 itdragon | 2017-08-18 17:01:49 | +-------+--------------------+-------+------+----------+--------------+----------+------------------+-------------+----- --------+------------+---------------------+ mysql> explain select * from itdragon_order_list where transaction_id = "81X97310V32236260E"; +----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+ -------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+ -- -- -- -- -- -- -- -- -- -- -- -- -- + | | 1 SIMPLE | itdragon_order_list | NULL | | NULL ALL | NULL | NULL | NULL | 3 | | 33.33 Using the where | +----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+ -------------+Copy the code

There was nothing wrong with the query itself, nor was there anything wrong with the offline test environment. However, once the function is online, the problem of slow query is oncoming. Millions of orders, full table scan? Ah? Hum! How do I know that this SQL is a full table scan? The explain command shows how MySQL processes SQL statements. The printed contents are as follows: ID: The serial number is 1. Select_type: query type is simple query, simple SELECT statement without union and subquery. Table: The table is itdragon_ORDER_list. Partitions: No partitions. Type: indicates the connection type. All indicates that full table scan is adopted. Possible_keys: Possible null index. Key: the actual index is null. Key_len: The length of the index is also null. Ref: No column or parameter is used with the key. Extra: uses where query. Because there are only three pieces of data in the database, the rows and filtered information doesn’t matter much. If type is ALL, the performance of the full table scan is the worst. If there are millions of data in the database, it will be unusually slow without the help of the index.

Preliminary optimization: Create an index for transaction_id

mysql> create unique index idx_order_transaID on itdragon_order_list (transaction_id); mysql> explain select * from itdragon_order_list where transaction_id = "81X97310V32236260E"; +----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+----- --+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+----- --+------+----------+-------+ | 1 | SIMPLE | itdragon_order_list | NULL | const | idx_order_transaID | idx_order_transaID | 453 | const | 1 | 100 | NULL | +----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+----- --+------+----------+-------+Copy the code

The index created here is a unique index, not a normal index. The only index that prints the type value is const. Indicates that it can be found by indexing once. That is to find the value of the end of the scan to return the query results. A normal index prints a type value of ref. Represents a non-unique index scan. Find the value and continue scanning until the index file has been scanned. (No code is posted here.) It is obvious that const performs much better than ref. And based on the business logic, it makes sense to create a unique index.

Optimize again: overwrite indexes

mysql> explain select transaction_id from itdragon_order_list where transaction_id = "81X97310V32236260E"; +----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+----- --+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref  | rows | filtered | Extra | +----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+----- --+------+----------+-------------+ | 1 | SIMPLE | itdragon_order_list | NULL | const | idx_order_transaID | idx_order_transaID | 453 | const | 1 | 100 | Using index | +----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+----- --+------+----------+-------------+Copy the code

Select * from select transaction_id from Extra (select transaction_id from Extra) Using index (select transaction_id from Extra) Using filesort(Using internal sort) and Using temporary(Using temporary table) indicate that the SQL needs to be optimized immediately. According to the business logic, the query structure return transaction_id can satisfy the business logic requirements.

Scenario 2, order management page, sorted by order level and order entry time

Business logic: Priority processing orders with high order level and long input time. Order by, order by, order by, order by, order by, order by

The most basic SQL statement

mysql> explain select * from itdragon_order_list order by order_level,input_date; +----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+ ----------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered  | Extra | +----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+ ----------------+ | 1 | SIMPLE | itdragon_order_list | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100 | Using filesort  | +----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+ ----------------+Copy the code

First of all, Using full table scan is not reasonable, and Using file sorting Using filesort further slows down performance. MySQL before version 4.1 uses the dual-path sorting algorithm to sort files. As a result, it takes a long time to scan disks twice. Finally, a single sorting algorithm is optimized. The essence of this is to trade space for time, but if the amount of data is too large, the buffer space is insufficient, resulting in multiple I/ OS. The effect is worse. Rather than ask your operations colleagues to change the MySQL configuration, build the index yourself.

Preliminary optimization: create composite index for order_level,input_date

mysql> create index idx_order_levelDate on itdragon_order_list (order_level,input_date); mysql> explain select * from itdragon_order_list order by order_level,input_date; +----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+ ----------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered  | Extra | +----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+ ----------------+ | 1 | SIMPLE | itdragon_order_list | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100 | Using filesort  | +----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+ ----------------+Copy the code

After creating a composite index, you will be surprised to find that it is the same as if you did not create the index. Both are full table scans, both use file sort. Index failure? Failed index creation? Let’s try to see what the print looks like

mysql> explain select order_level,input_date from itdragon_order_list order by order_level,input_date; +----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+-- ----+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+-- ----+----------+-------------+ | 1 | SIMPLE | itdragon_order_list | NULL | index | NULL | idx_order_levelDate | 68 | NULL | 3 | 100 | Using index | +----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+-- ----+----------+-------------+Copy the code

Select order_level,input_date from select * from Type changed from all to index, indicating full index scan. Extra also indicates that overwrite index is used. But wrong !!!! Order_level and input_date are the only two fields returned. Should we create a composite index for each field? MySQL is not so stupid as to use force index to force an index. Change force index(idx_order_levelDate) on the original SQL statement.

mysql> explain select * from itdragon_order_list force index(idx_order_levelDate) order by order_level,input_date; +----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+-- ----+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+-- ----+----------+-------+ | 1 | SIMPLE | itdragon_order_list | NULL | index | NULL | idx_order_levelDate | 68 | NULL | 3 | 100 | NULL | +----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+-- ----+----------+-------+Copy the code

Optimize again: Should order levels really be sorted?

It doesn’t make much sense to sort an order level, nor does it make much sense to index an order level. Because order_level can only have low, medium, high, and rush. Sorting and indexing are of little use for fields that are repetitive and evenly distributed. Can we fix order_level first and then sort input_date? If the query is effective, you can recommend this query method to your colleagues.

mysql> explain select * from itdragon_order_list where order_level=3 order by input_date; +----+-------------+---------------------+------------+------+---------------------+---------------------+---------+---- ---+------+----------+-----------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+---------------------+------------+------+---------------------+---------------------+---------+---- ---+------+----------+-----------------------+ | 1 | SIMPLE | itdragon_order_list | NULL | ref | idx_order_levelDate | idx_order_levelDate | 5 | const | 1 | 100 | Using index condition | +----+-------------+---------------------+------------+------+---------------------+---------------------+---------+---- ---+------+----------+-----------------------+Copy the code

Type is upgraded from index to ref(non-unique index scan) compared to the previous SQL. Index length changed from 68 to 5, indicating that only one index was used. Ref is also a constant. Extra: Using index condition: automatically selects index scan or full table scan based on the critical value. Overall, the performance is much better than the previous SQL.

The above two examples are just a quick start, but it’s important to remember that optimization is based on business logic. Business logic should never be modified for optimization. It would be best if it could be modified.

The index profile

Official definition: An Index is a data structure that helps MySQL efficiently retrieve data.

Why is an index a data structure and how does it speed up queries? We take the most common binary tree to analyze how indexing works. Take a look at the picture below:

Advantages of creating indexes 1 Speed up data retrieval and reduce database IO costs: The purpose of using indexes is to speed up search by reducing the number of records in a table that need to be queried. 2 reduce the cost of sorting data, reduce THE CPU consumption: the index is fast, because the data is sorted first, if the field exactly needs to be sorted, it really reduces the cost of sorting.

Disadvantages of creating an index 1 Storage space: An index is actually a table that records primary keys and index fields. It is usually stored on disk as an index file. 2 Slow down the update speed of the table: When the data of the table changes, the corresponding index also needs to be changed, thus reducing the update speed. Otherwise, the index may point to the wrong physical data, which is one of the causes of index failure. Quality index creation is difficult: Index creation is not overnight or constant. The best indexes need to be created frequently based on user behavior and specific business logic.

The index classification

When we say index, we usually refer to the index of the BTree structure. There are aggregated index, secondary index, composite index, prefix index, unique index, general index, of course, in addition to B+ tree, there are hash index, etc.

Single value index: an index contains only a single column, a table can have multiple single index only index: the value of the index columns must be the only, but allow free value composite index: an index contains multiple columns, it is recommended to use in practical development in practical development is recommended to use a composite index, and the index number of the single table creation advice not more than five

Basic syntax: Create:

create [unique] index indexName on tableName (columnName...)
alter tableName add [unique] index [indexName] on (columnName...)Copy the code

Delete:

drop index [indexName] on tableNameCopy the code

To view:

show index from tableNameCopy the code

Indexes are required in the following situations: 1 Primary key and unique index 2 Indexes are required for fields that are frequently used for query. 3 Indexes are required for fields that are frequently used for sorting, grouping, and statistics. 4 Indexes are required for fields that are associated with other tables in the query and foreign key relationships

Do not create an index in the following situations: 1 the table has too few records and does not need to create an index for the data of less than millions. 2 The table that is frequently added or deleted does not need to create an index. 3 The table that has the same data and is evenly distributed does not need to create an index, such as true and false. 4 Do not create indexes for fields that are frequently updated. 5 Do not create indexes for fields that are not used in the WHERE condition

Performance analysis

MySQL bottlenecks

The disk space is insufficient, the disk I/O is too large, and the server hardware performance is low. 1 CPU: Data is loaded into the memory or read from the disk when the CPU is saturated. 2 IO: Disk I/O bottleneck occurs when the data is loaded into the memory. 3 Performance bottleneck of the server hardware: Top,free, IOstat, and vmstat You can view the system performance status

Explain analyze SQL statements

Use the Explain keyword to simulate the optimizer’s execution of SQL queries to see how MySQL processes SQL statements.

+----+-------------+-------+------------+------+---------------+-----+---------+------+------+----------+-------+ | id |  select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+-----+---------+------+------+----------+-------+Copy the code

id

Select The serial number of a query, containing a repeatable set of numbers, indicating the order in which SQL statements are executed in the query. Generally, there are three cases: first, all ids are the same, SQL execution order is from top to bottom; Second: all IDS are different, SQL execution order is based on the id of the first execution; The third type: id exists both identical and different. The command is executed based on a large ID first, and then the command is executed based on the same ID from top to bottom.

select_type

Simple: a simple select query that does not contain a subquery or a union primary: If the query contains any complex subquery, the outermost query is marked as primary subQuery. The subqueries contained in the FROM list are marked as derived and MySQL performs these subqueries recursively, putting the results in temporary tables. Union: If the second select appears after the union, it is marked as union, and if the union is included in the subquery of the FROM clause, the outer select is marked as: Derived Union result: The select that gets the result from the union table

partitions

The table uses partitions, if you want to count the amount of company orders for ten years, you can divide the data into ten zones, each year represents a zone. This can greatly improve query efficiency.

type

This is a very important parameter, connection type, common are: all, index, range, ref, eq_ref, const, system, null eight levels. For a Java programmer, it is a good and responsible programmer to ensure that the query reaches at least range and preferably ref. All (Full table scan) is the worst. If there are millions of data, the full table scan will be very slow. A full index file scan is much better than an ALL file scan. After all, it is faster to find data from an index tree than from a full table. Range: Retrieves only rows in a given range, using indexes to match rows. The scope is narrowed, certainly faster than full table and full index file scans. SQL statements generally contain between, in, >, and < queries. Ref: Non-unique index scan, which is essentially an index access, returns all rows that match a single value. For example, if you query all the colleagues belonging to the r&d team in the company, the matching result will be multiple values, not unique values. Eq_ref: Unique index scan, for each index key, there is one record in the table that matches it. For example, if you query the CEO of a company, the result can only be one record, const: indicates that it can be found through the index once, const is used to compare the primary key or unique index. Because only one row of data is matched, MySQL will soon be able to convert the query to a constant if the primary key is placed in the WHERE list. System: The table has only one record (equal to the system table), which is a const column

possible_keys

Displays indexes (one, more, or null) that may be used by the query but may not be actually used by the query. For reference only.

key

Displays the actual index used by the query statement. If null, no index is used.

key_len

Displays the number of bytes used in the index. The length of the index used in the query can be calculated by key_len. The shorter the index length, the better, without sacrificing accuracy. The value key_len displays is the most likely length of the index field, not the actual length, i.e. key_len is calculated from the table definition, not retrieved from the table.

ref

Shows which column or constant of the index is used to find the value on the index column.

rows

Based on table statistics and index selection, the number of rows that need to be read to find the desired record is roughly estimated. The larger the value is, the worse it is.

extra

Using filesort: indicates that MySQL uses an external index sort for data, instead of reading data in the order of the indexes in the table. A sort operation in MySQL that cannot be done with an index is called “file sort”. Optimize SQL immediately when this occurs. Using temporary: A temporary table is used to hold intermediate results. MySQL uses temporary tables when sorting query results. Common in sort order by and group by queries. This makes it even more important to optimize SQL immediately. Using index: the corresponding select operation Using index (Covering index), avoid access to the table row, effect is good! Using WHERE indicates that the index is used to perform a lookup of the index key value. If Using WHERE is not present at the same time, the index is used to read data rather than perform lookup actions. Covering Index (Covering Index) : also called Index Covering, is the select data column only from the Index can be obtained, do not have to read the data row, MySQL can use the Index to return the select list of fields, and do not have to read the data file again according to the Index. Using Index condition: A new feature added since version 5.6 allows the optimizer to select whether to use an index or perform a full table traversal based on the ratio of the number of items in the RANGE to the total number in the presence of an index. Using JOIN buffer: The value of the impossible WHERE: WHERE statement is always false and cannot be used to get any element distinct: Optimize the distinct operation to stop finding the same value once the first matching tuple is found.

filtered

A value of a percentage, used with the value of the Rows column, estimates the result set of the previous table in the query execution plan (QEP) to determine the number of join loops. Small tables drive large tables, reducing the number of connections.

From the parameter introduction of explain, we can know: 1 table read order (ID), 2 data read operation type (type), 3 which indexes are actually used (key), 4 references between tables (ref), 5 How many rows of each table are queried by the optimizer (rows).

Causes of performance degradation

From the programmer’s point of view, 1 the query statement is not written properly. 2 No index is created, the index is improperly created, or the index is invalid. 3 Too many Joins exist in the associated query

conclusion

1 An index is a data structure that is ordered and quickly searched. Its purpose is to improve the efficiency of query. 2 After indexes are created, data is queried faster but updated slower. 3 Performance degradation is most likely caused by index failure. 4 Principles for creating indexes. Frequently queried fields are suitable for creating indexes. Frequently updated data is not suitable for creating indexes. 5 Frequent update of index fields or physical deletion of table data may cause index invalidity. In addition to optimizing SQL statements, you can also optimize the design of tables. As far as possible to make a single table query, reduce the association between tables. Design archiving forms, etc.

This is the end of the MySQL index optimization analysis, if there are any errors, you can point out. Click on the recommendation if you think it’s good.

reference

MySQL order by sorting optimization: blog.51cto.com/ustb80/1073…

ITDragon dragon






www.cnblogs.com/itdragon/