Hey guys, this is programmer cxuan. Welcome to my latest article, a summary of my MySQL tuning experience from development work. I hope you find it helpful. Let's get started.
This article is also included in my GitHub repository under MySQL optimization.
In general, traditional Internet companies rarely run into SQL optimization problems. The reason is that data volumes are small and the database performance of most vendors can meet daily business needs, so there is no need for SQL optimization. However, as applications grow and data volumes surge, the database itself can no longer keep up. That is when you need to optimize from the perspective of the SQL itself, which is what this article discusses.
SQL Optimization Steps
When faced with a SQL that needs to be optimized, what kinds of troubleshooting ideas do we have?
Run the show status command to check the number of SQL executions
First, we can use the show status command to check server status information. The show status command displays each server status variable (Variable_name) and its Value; status variables are read-only. You can restrict the results with a LIKE or WHERE clause; LIKE does standard pattern matching on the variable names.
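For example, a quick sketch of restricting the output (Com_select is a real MySQL status variable):

```sql
-- Show all statement counters (the Com_* family)
SHOW STATUS LIKE 'Com_%';

-- Narrow it down to the SELECT counter only
SHOW STATUS LIKE 'Com_select';

-- A WHERE clause works as well
SHOW STATUS WHERE Variable_name = 'Com_select';
```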
The output is long, so I have not captured the full picture; there are many more variables, and readers can try it themselves. You can also obtain the same information with the mysqladmin extended-status command at the operating-system level.
But this error occurs when I execute mysqladmin extended-status:
mysqladmin -P3306 -uroot -p -h127.0.0.1 -r -i 1 extended-status
Note that there are two levels at which statistics can be added to the show status command
- Session level: statistics for the current connection (the default)
- Global level: statistics since the database was last started
If the statistics level is not specified, the session level is used by default.
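To be explicit about the level, you can add the SESSION or GLOBAL keyword yourself; a quick sketch:

```sql
-- Counters for the current connection only (the default)
SHOW SESSION STATUS LIKE 'Com_select';

-- Counters accumulated since the server last started
SHOW GLOBAL STATUS LIKE 'Com_select';
```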
There are two types of parameters that need to be paid attention to in the statistical results queried by show status. One type of parameters starts with Com_, and the other type of parameters starts with Innodb_.
The following are the parameters starting with Com_; there are many of them, and I have not truncated the list here.
Com_xxx indicates the number of times each xxx statement has been executed. We usually care about the counts for SELECT, INSERT, UPDATE, and DELETE statements, i.e.
- Com_select: the number of SELECT operations performed; each query increments it by 1.
- Com_insert: the number of INSERT operations performed; a batch insert counts only once.
- Com_update: the number of UPDATE operations performed.
- Com_delete: the number of DELETE operations performed.
Parameters starting with Innodb_ are mainly
- Innodb_rows_read: Number of rows returned from a select query.
- Innodb_rows_inserted: Number of rows inserted during the INSERT operation.
- Innodb_rows_updated: Number of rows updated during the UPDATE operation.
- Innodb_rows_deleted: Number of rows deleted by the DELETE operation.
Through the statistics of the results of the above parameters, we can roughly understand the current database is mainly updated (including insert, delete) or query.
In addition, there are a few other parameters that you can use to get the basics of your database.
- Connections: the number of connection attempts to the MySQL server, counted whether or not the connection succeeded.
- Uptime: how long the server has been up, in seconds.
- Slow_queries: the number of slow queries.
- Threads_connected: Displays the number of currently open connections.
The following blog summarizes almost all of the show status parameters and can serve as a reference manual.
Blog.csdn.net/ayay_870621…
Locate SQL that is executing inefficiently
There are two methods to locate SQL statements that are not executed efficiently
- You can use the slow query log to locate SQL statements that execute inefficiently.
MySQL provides a logging facility for slow queries: SQL statements that take longer than a configured number of seconds are written to the slow query log. During routine maintenance, you can quickly and accurately identify problems from this log. When started with the --log-slow-queries option, mysqld writes a log file containing all SQL statements that take longer than long_query_time seconds to execute. You can locate inefficient SQL statements by examining this log file.
For example, we can add the following code to my.cnf, then exit and restart MySQL.
log-slow-queries = /tmp/mysql-slow.log
long_query_time = 2
Usually we set long_query_time to 2 seconds, which means queries that take longer than 2 seconds are recorded. 2 seconds is generally a reasonable starting point, though for many web applications 2 seconds is still a long time.
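Note that the log-slow-queries option is from older MySQL releases; on MySQL 5.6+ the same settings are exposed as the slow_query_log family of system variables, so you can also configure them at runtime. A sketch (the file path is only an example):

```sql
-- Modern equivalents of the my.cnf settings above
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL slow_query_log_file = '/tmp/mysql-slow.log';  -- example path
SET GLOBAL long_query_time = 2;
```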
The slow query log can also be enabled at runtime using commands. First check whether it is enabled:
show variables like "%slow%";
Then enable the slow query log:
set global slow_query_log='ON';
Then check again to confirm that the slow query log is enabled.
As shown in the figure, we have enabled the slow query log.
Slow query logs are written only after a query completes, so they cannot locate problems while a slow statement is still running. In that case, run the show processlist command to see the threads currently running in MySQL: you can view SQL execution in real time, including each thread's state and whether tables are locked. You can also get this information with the mysqladmin processlist command.
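A quick sketch of inspecting running threads (note that the Info column is truncated unless you ask for the full statement; the thread id in the KILL example is hypothetical):

```sql
-- Summary of all current threads
SHOW PROCESSLIST;

-- Same, but with the complete SQL text in the Info column
SHOW FULL PROCESSLIST;

-- Kill a runaway statement by its Id (27 is a hypothetical thread id)
KILL 27;
```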
Here is what each field means:
- Id: an identifier, useful when you need to kill a statement, as in kill <process id>.
- User: the current user. Unless you are root, you can only see threads within your own permission.
- Host: the host and IP the connection came from, useful for tracing the source of a problem.
- Db: the database the process is currently connected to; NULL means no database is selected.
- Command: the command the current connection is executing, generally query, sleep, or connect.
- Time: how long the thread has been in its current state, in seconds.
- State: the current state of the SQL statement. This is very important and is explained below.
- Info: the SQL statement being executed.
The State column is very important; it has many possible values, which you can read about in this article:
Blog.csdn.net/weixin_3435…
This command shows thread states and whether tables are locked, letting you watch SQL execution in real time and optimize some table-locking issues.
The EXPLAIN command is used to analyze the SQL execution plan
After inefficient SQL statements are queried in the preceding steps, you can use the EXPLAIN or DESC command to obtain information about how MySQL executes SELECT statements, including how tables are joined and the join order during SELECT statement execution.
For example, we use the following SQL statement to analyze the execution plan
explain select * from test1;
The contents of the above table are as follows
- Select_type: the type of the SELECT query. Common values include:
SIMPLE: a simple SQL statement that contains no UNION or subquery.
PRIMARY: the outermost SELECT when the query contains any subquery or UNION.
UNION: the second or later SELECT statements in a UNION (when the inner SELECT does not depend on the outer one).
SUBQUERY: the first SELECT in a subquery.
- Table: the table from which the result-set row is produced.
- Type: this option is worth exploring in depth, because many SQL tuning methods are based on it; but since this article focuses on optimization methods, we only introduce the type field briefly here.
The type field reflects the performance of the access method. From best to worst, its values are:
System: the table has only one row, so it is queried like a constant table.
Const: at most one row in the table matches, for example a lookup on the primary key or a unique index.
Eq_ref: the primary key or unique index of a table is used in a multi-table join, for example
select A.text, B.text from A, B where A.ID = B.ID
For each row of table A, there is exactly one matching row in table B, because B.ID is unique.
Ref: not as fast as eq_ref above, because for each row scanned in table A there may be several matching rows in table B (the joined column is not unique).
Ref_or_null: similar to ref, except that the search also includes NULL values.
Index_merge: the query uses two or more indexes, typically with AND or OR conditions; it may perform worse than range (described below) because too many indexes have to be read.
Unique_subquery: often appears for an IN subquery in a WHERE clause of the form
value IN (SELECT primary_key FROM single_table WHERE some_expr)
Range: a range scan on an index, typically for queries using =, <>, >, >=, <, <=, IS NULL, <=>, BETWEEN, IN(), or LIKE.
Index: an index full scan, reading the index from beginning to end.
ALL: a full table scan, for example select * from xxx with no usable index; this is the worst case.
The above is a general explanation of type. We often use Explain to analyze type in SQL tuning, and then improve the query method. The closer to system, the higher the query efficiency, and the closer to ALL, the lower the query efficiency.
- Possible_keys: possible index to use in a query.
- Key: indicates the index in use.
- Key_len: indicates the length of the index field.
- Rows: Number of rows scanned.
- Filtered: the estimated percentage of the examined rows that satisfy the query condition.
- Extra: a description of the execution.
From the above analysis, we can roughly determine the reason for the low efficiency of SQL. A very effective way to improve the efficiency of SQL queries is to use indexes. Next, I will explain how to use indexes to improve the efficiency of queries.
Indexes
Indexes are one of the most common and important methods of database optimization. Most SQL performance problems can be solved with a suitable index, and indexing is one of the most frequently asked optimization topics in interviews. So do not just use indexes; understand them!
Introduction to indexes
The purpose of an index is to quickly find data for a column, and using an index for related columns can greatly improve the performance of query operations. Without indexes, MySQL must read the entire table from the first record until it finds the relevant row, and the larger the table, the more time it takes to query the data. If the columns in the table are queried with indexes, MySQL can quickly go to a location to search the data file without having to look at all the data, which will save a significant amount of time.
Index classification
First, let’s look at the categories of indexes.
Full-text index (FULLTEXT)
Full-text indexes, currently supported only by the MyISAM engine, were developed to solve the problem of inefficient fuzzy queries against text, and are limited to CHAR, VARCHAR, and TEXT columns.
HASH index
A hash index can locate a key in a single step and does not need to walk down a tree node by node, but this only works for looking up a single key; for range lookups, the performance of a HASH index is poor. The MEMORY storage engine uses HASH indexes by default, though it also supports BTREE indexes.
B-Tree index
BTree stands for Balance Tree. There are many BTree variants; the most common is the B+Tree, which MySQL uses widely.
R-Tree index
R-Tree indexes are rarely used in MySQL and only support the geometry data type; only the MyISAM, BDB, InnoDB, NDB, and Archive storage engines support them. Compared with a B-Tree, an R-Tree's advantage is range (spatial) search.
Logically, MySQL indexes fall into the following categories.
- Ordinary index: the most basic index type, with no restrictions. Create it as follows:
create index normal_index on cxuan003(id);
And delete it like this:
drop index normal_index on cxuan003;
- Unique index: the values in a unique index column must be unique, though NULL values are allowed. For a composite unique index, the combination of column values must be unique.
create unique index unique_index on cxuan003(id);
- Primary key index: a special unique index. A table can have only one primary key, which cannot contain NULL values. The primary key index is generally created together with the table.
CREATE TABLE `table` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `title` char(255) NOT NULL,
  PRIMARY KEY (`id`)
);
- Composite index: an index created over multiple columns. The index is used only when the first column of the index appears in the query condition; composite indexes follow the leftmost prefix rule, which we will demonstrate below.
- Full-text index: used to find keywords in text rather than comparing values directly against the index. Currently full-text indexes can only be created on CHAR, VARCHAR, and TEXT columns.
CREATE TABLE `table` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `title` char(255) NOT NULL,
  `content` text NULL,
  `time` int(10) NULL DEFAULT NULL,
  PRIMARY KEY (`id`),
  FULLTEXT (content)
);
A full-text index can also be created directly:
CREATE FULLTEXT INDEX index_content ON article(content);
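To make the leftmost prefix rule concrete, here is a sketch using a hypothetical table t with a composite index on (name, age):

```sql
CREATE INDEX idx_name_age ON t(name, age);

-- Uses the index: filters on the leftmost column
SELECT * FROM t WHERE name = 'cxuan';

-- Uses the index: both columns, leftmost column first
SELECT * FROM t WHERE name = 'cxuan' AND age = 18;

-- Does NOT use the index: skips the leftmost column
SELECT * FROM t WHERE age = 18;
```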
Index usage
An index can be created when the table is created, or added separately afterwards. Here we create a prefix index on cxuan004.
Using explain, we can see how cxuan004 uses the index.
If you no longer want an index, you can drop it.
Index Usage Rules
We create a composite index on cxuan005 based on ID and hash, as shown below
create index id_hash_index on cxuan005(id, hash);
The execution plan is then analyzed according to the ID
explain select * from cxuan005 where id = '333';
You can see that the index is still used even though the WHERE condition references only id rather than the full composite index (id, hash); this is the leftmost prefix property of the index. However, if the query filters on hash alone, the index is not used:
explain select * from cxuan005 where hash='8fd1f12575f6b39ee7c6d704eb54b353';
For a LIKE query, the index can be used only when % is not the first character.
With our composite index, only LIKE queries on id can use the index, because by the leftmost prefix rule a query on the hash column alone never uses the index.
explain select * from cxuan005 where id like '%1';
As you can see, if the first character is %, no index is used.
explain select * from cxuan005 where id like '1%';
If % is not the first character, the index is used.
If a column is indexed, an IS NULL query on that column can also use the index.
explain select * from cxuan005 where id is null;
There are also cases where indexes exist but MySQL will not use them.
- In the simplest case, MySQL will not use an index if using it would be less efficient than a full table scan.
- If SQL uses an OR condition, and the column before OR has an index while the column after it does not, then none of the indexes involved is used. For example, in cxuan005 only the id and hash columns are indexed, while the info column is not:
explain select * from cxuan005 where id = '111' or info = 'cxuan';
As the explain output shows, possible_keys still lists id_hash_index, but key and key_len show that the index is not actually used by the statement.
- Querying on a column of a composite index that is not the first column does not use the index:
explain select * from cxuan005 where hash = '8fd1f12575f6b39ee7c6d704eb54b353';
- If a column in the WHERE condition participates in a calculation, the index is not used either:
explain select * from cxuan005 where id + '111' = '666';
- If an index column is wrapped in a function, the index is not used either:
explain select * from cxuan005 where concat(id, '111') = '666';
- If an index column is used with LIKE and % is the first character, the index is not used.
- In an ORDER BY operation, if the sorted column does not match the index used by the WHERE clause, the index is not used for sorting.
- When an implicit type conversion occurs, for example when a VARCHAR column is compared to a number without quotes and gets converted to an int, the index is invalidated and a full table scan is triggered.
- Using IS NOT NULL on an index column does not use the index.
- Using the <> or != operators: these non-equality comparisons generally cannot use an index and result in a full table scan.
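As a sketch of the implicit conversion case, suppose cxuan005.id is a VARCHAR column (as in the earlier examples):

```sql
-- id is compared to a bare number: MySQL must convert each id to a number,
-- so the index is invalidated and a full table scan results
EXPLAIN SELECT * FROM cxuan005 WHERE id = 111;

-- id is compared to a string of the same type: the index can be used
EXPLAIN SELECT * FROM cxuan005 WHERE id = '111';
```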
There are many scenarios in which an index is defined but does not take effect; this is something to keep summarizing and refining in daily work. The failure scenarios above, however, cover most cases.
View the index usage
When MySQL uses indexes, the Handler_read_key status variable counts the number of times a row was read via an index. A low Handler_read_key value indicates that the performance gain from the indexes is less than ideal and the indexes may not be used as often as they should be.
Another value is Handler_read_rnd_next, which counts requests to read the next row in the data file. A high value means queries are running inefficiently and should be rescued with an index: if many table scans are being performed and Handler_read_rnd_next is high, the tables are indexed incorrectly or the queries are written in a way that cannot use the indexes.
MySQL analyze table, check table, and optimize table
For most developers, they prefer to solve the optimization of simple SQL, and leave the optimization of complex SQL to the company DBA.
Here are some simple ways to optimize from the perspective of ordinary programmers.
MySQL analyze table
ANALYZE TABLE analyzes and stores the key distribution of a table. The results give the system accurate statistics so that the optimizer can generate correct execution plans. If the actual execution plan does not seem to match expectations, analyzing the table may solve the problem:
analyze table cxuan005;
The field attributes involved in the analysis result are as follows
Table: the table name.
Op: the operation performed: analyze for analyzing, check for checking, optimize for optimizing.
Msg_type: the message type, usually one of status, info, warning, or error.
Msg_text: the message text.
Regular analysis of tables improves performance and should be part of your daily routine. This is because database performance can be improved by analyzing tables by updating their index information.
MySQL check table
Databases often run into errors, for example an error while data is being written to disk, an index that is not updated in sync, or the database being stopped without MySQL being shut down cleanly. In such cases you may see an error like "Incorrect key file for table: '...'; try to repair it". We can then use the CHECK TABLE statement to check the table and its corresponding indexes.
check table cxuan005;
The main purpose of CHECK TABLE is to check one or more tables for errors. It is useful for MyISAM and InnoDB tables, and it can also check views for errors.
MySQL optimizing table
OPTIMIZE TABLE is used when a large amount of data has been deleted from a table, or when a table with variable-length rows (VARCHAR, BLOB, or TEXT columns) has been changed extensively. It merges space fragments and reclaims the space wasted by deletes and updates. The command is as follows:
optimize table cxuan005;
My storage engine is InnoDB, and as the output shows, InnoDB reports that it does not support optimize directly and does a recreate + analyze instead. The optimize command itself only works on MyISAM and BDB tables.
Common SQL optimizations
Earlier we introduced the use of indexes to optimize MySQL, so how to optimize the various syntax and syntax of SQL? Next, I’ll talk about SQL optimization from the perspective of SQL commands.
Optimization of imports
For a table of type MyISAM, you can import a large amount of data in the following manner
ALTER TABLE tblname DISABLE KEYS;
loading the data
ALTER TABLE tblname ENABLE KEYS;
These two commands disable and re-enable updates to the non-unique indexes of a MyISAM table. Using them improves efficiency when importing large amounts of data into a non-empty MyISAM table. When importing into an empty MyISAM table, MySQL by default imports the data first and creates the indexes afterwards, so no such setting is needed.
For InnoDB storage engine tables, however, this trick does not improve import efficiency. For InnoDB, there are several other ways to speed up imports:
- Since InnoDB tables are stored in the order of primary keys, the import data can be arranged in the order of primary keys to effectively improve the efficiency of data import. If InnoDB table does not have a primary key, the system will create an internal column as the primary key by default, so if you can create a primary key for the table, you can take advantage of this to improve the efficiency of data import.
- Before importing data, run SET UNIQUE_CHECKS = 0 to disable uniqueness checks; after the import, run SET UNIQUE_CHECKS = 1 to restore them. This improves import efficiency.
- If the application uses automatic commit, you are advised to run SET AUTOCOMMIT = 0 to disable automatic commit before the import and run SET AUTOCOMMIT = 1 to enable automatic commit after the import to improve the import efficiency.
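Putting the InnoDB import tips together, here is a hedged sketch of a bulk-load session (the file path and table name are just examples):

```sql
SET UNIQUE_CHECKS = 0;   -- skip uniqueness checks during the load
SET AUTOCOMMIT = 0;      -- batch the whole load into one transaction

-- Source file sorted by primary key beforehand for best InnoDB insert order
LOAD DATA INFILE '/tmp/data.csv' INTO TABLE cxuan005
  FIELDS TERMINATED BY ',';

COMMIT;
SET UNIQUE_CHECKS = 1;   -- restore uniqueness checks
SET AUTOCOMMIT = 1;      -- restore autocommit
```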
The optimization of the insert
There are several ways to consider tuning when inserting statements
- If you want to insert many rows into the same table, it is best to do so in a single statement with multiple VALUES lists, which reduces connect/disconnect overhead, as shown below:
insert into test values (1, 2), (1, 3), (1, 4);
- You can use the INSERT DELAYED statement to improve perceived efficiency: DELAYED means the statement returns immediately, while the rows are queued in memory and not yet actually written to disk. (Note that INSERT DELAYED applies only to MyISAM-style engines and was removed in MySQL 5.7.)
- For MyISAM tables, the value of Bulk_insert_buffer_size can be increased to improve insertion efficiency.
- It is best to keep indexes and data files on separate disks.
Optimization of group by
In grouping scenarios, GROUP BY sorts its grouped result by default (in older MySQL versions); if you do not need the result sorted, specify ORDER BY NULL to suppress the sort, because this avoids a time-consuming filesort. As shown below:
explain select id, sum(moneys) from sales2 group by id order by null;
Optimization of order by
In execution plans you will often see Using filesort in the Extra column. A filesort means MySQL sorts the rows itself, in a sort buffer and possibly temporary files, instead of reading them in index order. This kind of sort is slow; we consider it bad and want to optimize it away.
The way to optimize is to use indexes.
We create an index on cxuan005.
create index idx on cxuan005(id);
We then use the query fields and sort in the same order.
explain select id from cxuan005 where id > '111' order by id;
As you can see, in this query, Using Index is used. This indicates that we are using an index.
If the ORDER BY column does not match the order of the index, Using filesort appears:
explain select id from cxuan005 where id > '111' order by info;
MySQL supports two kinds of sorting, filesort and index. Using index means that MySQL produces sorted output by scanning the index itself, which is efficient; filesort is less efficient.
ORDER BY uses the index only if one of the following conditions is met:
- The ORDER BY statement uses the leftmost prefix of the index.
- The WHERE clause and the ORDER BY clause together cover the leftmost prefix of the index.
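A sketch of these cases, assuming the composite index on (id, hash) created earlier on cxuan005:

```sql
-- Index used: ORDER BY starts at the leftmost index column
SELECT id, hash FROM cxuan005 ORDER BY id, hash;

-- Index used: WHERE pins the first column, ORDER BY continues the prefix
SELECT id, hash FROM cxuan005 WHERE id = '111' ORDER BY hash;

-- Filesort: ORDER BY skips the leftmost column
SELECT id, hash FROM cxuan005 ORDER BY hash;
```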
Optimize nested queries
Nested queries (subqueries) use the result of one SELECT statement as a condition or range in another query. Subqueries can break a complex query into logically independent parts that are easier to understand, and the code is easier to maintain and reuse.
However, in some cases, the efficiency of subquery is not high. Generally, join is used to replace subquery.
Explain analysis using SQL statements with nested queries is as follows
explain select c05.id from cxuan005 c05 where id not in (select id from cxuan003);
From the explain results, the main table access type is index and the subquery is index_subquery, neither of which is very efficient. Our optimized plan using a join is as follows.
explain select c05.id from cxuan005 c05 left join cxuan003 c03 on c05.id = c03.id;
From the explain analysis, the accesses are now index and ref respectively, and ref has relatively high execution efficiency. The efficiency order of the type values, from best to worst, is: system > const > eq_ref > ref > fulltext > ref_or_null > index_merge > unique_subquery > index_subquery > range > index > ALL.
The optimization of the count
Count is often used to count the number of rows in a result set. When MySQL can confirm that the expression inside the parentheses cannot be NULL, it is really just counting rows.
count(column) counts the number of values in that column and by default does not count NULL values.
A common mistake is to specify a column inside the parentheses while expecting to count the rows of the result set. If you want the number of rows in the result set, use count(*).
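A small sketch of the difference, assuming the info column of cxuan005 can contain NULLs (an assumption for illustration):

```sql
-- Counts every row in the table
SELECT COUNT(*) FROM cxuan005;

-- Counts only rows where info IS NOT NULL,
-- so it can return a smaller number than COUNT(*)
SELECT COUNT(info) FROM cxuan005;
```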
Optimization of limit paging
Normally our system does paging, and normally we do that with a limit plus an offset. We’ll also add an Order by statement to sort it. If you use indexes, the efficiency is generally not a problem. If you don’t use indexes, MySQL will probably do a lot of file sorting.
In general, we may end up with something like LIMIT 1000, 50: MySQL fetches 1050 rows, discards the first 1000, and returns only 50, which is very expensive. If all pages are visited equally often, such queries on average read half the table.
To tune such queries, either limit the number of pages or optimize performance for large offsets.
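One common technique for large offsets is a deferred join: first page through a covering index to find the row ids, then fetch only the full rows that are needed. A hedged sketch using the article's sample table:

```sql
-- Naive version: reads and discards 1000 full rows
SELECT * FROM cxuan005 ORDER BY id LIMIT 1000, 50;

-- Deferred join: the subquery pages through the index only,
-- then just the 50 matching full rows are fetched
SELECT c.* FROM cxuan005 c
  JOIN (SELECT id FROM cxuan005 ORDER BY id LIMIT 1000, 50) t
  ON c.id = t.id;
```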
SQL should not contain too many values for IN
MySQL optimizes IN by storing all the constants in an array; if the list is large, the cost rises, for example
select name from dual where num in (4, 5, 6);
For SQL statements like this, use BETWEEN instead of IN when the values are contiguous.
Only one piece of data is needed
If only one row is needed, it is recommended to add LIMIT 1, so that MySQL can stop scanning as soon as it finds a match.
If indexes are not used, try to minimize sorting
Try to use union all instead of union
The main difference between UNION and UNION ALL is that UNION deduplicates the combined result set, which involves sorting, adds a lot of CPU work, and increases resource consumption and latency. Of course, using UNION ALL presumes that the two result sets have no duplicates that need removing.
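A sketch of the two forms (t1 and t2 are hypothetical tables with a compatible name column):

```sql
-- UNION deduplicates, which costs an extra sort/dedup pass
SELECT name FROM t1
UNION
SELECT name FROM t2;

-- UNION ALL simply concatenates the two result sets
SELECT name FROM t1
UNION ALL
SELECT name FROM t2;
```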
Where condition optimization
- Try to avoid NULL checks on columns in the WHERE clause.
- Avoid using the != or <> operators in the WHERE clause.
- Do not use leading-% fuzzy queries such as LIKE '%name' or LIKE '%name%'; they invalidate the index and cause a full table scan. LIKE 'name%' can still use the index.
- Avoid expressions on columns: write select user_id, user_project from table_name where age = 36/2 instead of where age*2 = 36.
- Make sure the type of a column matches the type of the parameter passed in, so that no implicit conversion is needed.
When querying, specify the field name as much as possible
In daily SELECT queries, specify the column names explicitly and avoid select *, which adds a lot of unnecessary consumption (CPU, IO, memory, network bandwidth) and lowers query efficiency.