Preface
Interview question from: "Sharing my experienced-hire interviews after one and a half years of work (including Alibaba, Meituan, Toutiao, JD and Didi)"
Common MySQL optimization methods fall into the following areas:
SQL optimization, design optimization, hardware optimization, and so on, and each of these broad directions contains a number of smaller optimization points
Let's look at each in detail
This article was first published on my official account (Yuebeifeiyu) and then synced to Juejin (Nuggets) and my personal website: xiaoflyfish.cn/
If you found this useful, please like and share it. Thank you!
SQL optimization
This part improves the efficiency of the MySQL database by optimizing SQL statements and indexes. The details are as follows:
Paging optimization
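For example, the query below has to read and throw away the first 190,289 matching rows before it can return 10:
select * from table where type = 2 and level = 9 order by id asc limit 190289,10;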
Optimization options:
- Deferred join
First, run the WHERE condition, ORDER BY and LIMIT in a subquery that reads only the primary key IDs, which can typically be served from a secondary index. Then join those IDs back to the original table and fetch the full rows by primary key, instead of fetching full rows through the secondary index for every skipped row
For example:
select a.* from table a, (select id from table where type = 2 and level = 9 order by id asc limit 190289,10) b where a.id = b.id order by a.id;
- Bookmark method
The bookmark method first finds the primary key of the row at the LIMIT offset, then filters on that primary key and applies LIMIT from there
For example:
select * from table where type = 2 and level = 9 and id >= (select id from table where type = 2 and level = 9 order by id asc limit 190289,1) order by id asc limit 10;
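To see why both rewrites help, here is a toy Python model (an illustration, not MySQL internals). It assumes the IDs matching `type = 2 and level = 9` are already sorted, as an index scan would return them, and counts "entries examined" as a rough stand-in for cost. LIMIT with a large offset must walk past every skipped entry, while once the boundary primary key is known, a B+ tree seek (modelled here with `bisect`) jumps straight to it. The ID range is hypothetical.

```python
import bisect

# Hypothetical: IDs of rows matching "type = 2 and level = 9", in index (id) order.
matching_ids = list(range(1, 200_001))


def offset_page(ids, offset, size):
    """Model of LIMIT offset, size: every skipped entry is still read."""
    examined = 0
    page = []
    for row_id in ids:
        examined += 1
        if examined > offset:
            page.append(row_id)
            if len(page) == size:
                break
    return page, examined


def bookmark_page(ids, boundary_id, size):
    """Model of WHERE id >= boundary_id ORDER BY id LIMIT size: seek, then read `size` entries."""
    start = bisect.bisect_left(ids, boundary_id)  # stands in for a B+ tree descent
    page = ids[start:start + size]
    return page, len(page)


page_a, cost_a = offset_page(matching_ids, 190289, 10)
page_b, cost_b = bookmark_page(matching_ids, 190290, 10)
print(page_a == page_b, cost_a, cost_b)  # True 190299 10
```

In the subquery form above, finding the boundary ID still walks the index, but it reads only narrow index entries rather than full rows. If the application remembers the last ID of the previous page, even that walk disappears.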
Index optimization
Proper use of indexes
Without an index, a query triggers a full table scan: a large amount of data is read and the query is slow. To improve query performance, add indexes to the fields most commonly used in query conditions
Build covering indexes
When InnoDB finds rows through a secondary index, it normally has to go back to the table (the clustered primary key index) to fetch the remaining fields. If the leaf nodes of the index already contain every field the query needs, this lookup is unnecessary. This is called a covering index
For example:
select name from test where city='Shanghai'
We put the queried fields into one composite index, so the query result can be read directly from the index
alter table test add index idx_city_name (city, name);
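The covering condition can be written as a simple set check. In InnoDB, secondary-index leaf entries hold the indexed columns plus the primary-key columns, so a query is covered when every column it touches is among those. The Python sketch below assumes the table's primary key is a column named `id`, which is hypothetical since the article doesn't show the table definition.

```python
def is_covering(index_columns, primary_key, needed_columns):
    """Return True if an InnoDB secondary index alone can answer the query.

    InnoDB secondary-index leaf entries store the indexed columns plus the
    primary-key columns, so only those can be read without a back-to-table lookup.
    """
    available = set(index_columns) | set(primary_key)
    return set(needed_columns) <= available


# select name from test where city = 'Shanghai' needs both name and city.
print(is_covering(["city", "name"], ["id"], ["name", "city"]))        # True
print(is_covering(["city"], ["id"], ["name", "city"]))                # False
print(is_covering(["city", "name"], ["id"], ["id", "name", "city"]))  # True
```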
Avoid OR queries before MySQL 5.0
Before MySQL 5.0, you should avoid OR queries and use UNION or subqueries instead. MySQL 5.0 introduced index merge
Index merge means that for a multi-condition query, such as one combining conditions with OR or AND, MySQL scans several indexes (one per condition) and then merges their results, so the OR no longer prevents index use
If the type column of the EXPLAIN output is index_merge, MySQL is using an index merge to execute the query
For how to use EXPLAIN, see my previous article: "The Most Complete EXPLAIN Summary: SQL Optimization Is No Longer Difficult"
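The union variant of index merge can be pictured as merging sorted row-ID lists, one per indexed condition, and dropping duplicates. The intersection variant keeps only IDs common to all lists. The Python sketch below models that idea for a hypothetical `where a = 1 or b = 2` (or `and`) with separate indexes on `a` and `b`. The row IDs are made up, and MySQL also has a sort-union variant that is not modelled here.

```python
import heapq


def merge_union(*sorted_id_lists):
    """OR: merge row IDs from several index scans, keeping each ID once."""
    merged = []
    for row_id in heapq.merge(*sorted_id_lists):
        if not merged or merged[-1] != row_id:
            merged.append(row_id)
    return merged


def merge_intersection(*sorted_id_lists):
    """AND: keep only row IDs returned by every index scan."""
    common = set(sorted_id_lists[0])
    for ids in sorted_id_lists[1:]:
        common &= set(ids)
    return sorted(common)


ids_from_idx_a = [2, 5, 9]  # hypothetical rows where a = 1
ids_from_idx_b = [1, 5, 7]  # hypothetical rows where b = 2
print(merge_union(ids_from_idx_a, ids_from_idx_b))         # [1, 2, 5, 7, 9]
print(merge_intersection(ids_from_idx_a, ids_from_idx_b))  # [5]
```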
Avoid the != or <> operator in WHERE conditions
In SQL, the not-equal operator often causes the optimizer to abandon the index and do a full table scan, even if the compared field is indexed
Workaround: rewrite the not-equal condition as an OR of two range conditions, which gives the optimizer index ranges to work with
For example, change column <> 'aaa' to column > 'aaa' or column < 'aaa' so the index can be used
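The rewrite works because, in a sorted index, "not equal to x" is exactly the two ranges on either side of x. A minimal Python sketch with hypothetical string values shows that the two ranges return the same rows as `<>`. Python compares strings by code point, whereas MySQL follows the column's collation. Whether MySQL actually uses the index for these ranges still depends on its cost estimate: if most rows differ from 'aaa', a full scan can be cheaper.

```python
import bisect


def not_equal_via_ranges(sorted_values, x):
    """Rows with value <> x, read as two index ranges: value < x and value > x."""
    lower_end = bisect.bisect_left(sorted_values, x)     # first entry >= x
    upper_start = bisect.bisect_right(sorted_values, x)  # first entry > x
    return sorted_values[:lower_end] + sorted_values[upper_start:]


index_values = sorted(["bbb", "aaa", "a", "abc", "aaa"])  # ['a', 'aaa', 'aaa', 'abc', 'bbb']
print(not_equal_via_ranges(index_values, "aaa"))  # ['a', 'abc', 'bbb']
print([v for v in index_values if v != "aaa"])    # same rows, found by checking every entry
```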
Use prefix indexes appropriately
MySQL supports prefix indexes, which means you can index only the leading part of a string column
The longer the index key, the more disk space it takes and the fewer index entries fit into one data page, so searching the index takes longer and queries become less efficient. Using prefix indexes where appropriate reduces the space used and improves query efficiency
For example, if every email address ends in the same fixed suffix "@xxx.com", the suffix does nothing to distinguish values, so such a field is a good candidate for a prefix index on its leading characters
alter table test add index index2(email(6));
A prefix index with a well-chosen length saves space without adding much extra query cost
However, MySQL cannot use a prefix index for ORDER BY or GROUP BY, and a prefix index cannot serve as a covering index
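A common way to choose the prefix length is to compare the selectivity of the prefix, COUNT(DISTINCT LEFT(col, n)) / COUNT(*), with that of the full column and pick the shortest n that comes close enough. The Python sketch below computes this on a small hypothetical sample of addresses. In practice you would run the equivalent SQL on the real data.

```python
def prefix_selectivity(values, length):
    """COUNT(DISTINCT LEFT(col, length)) / COUNT(*) over the sample."""
    return len({v[:length] for v in values}) / len(values)


def shortest_prefix(values, min_ratio=1.0):
    """Smallest prefix length whose selectivity reaches min_ratio of the full column's."""
    longest = max(len(v) for v in values)
    full = prefix_selectivity(values, longest)
    for n in range(1, longest + 1):
        if prefix_selectivity(values, n) >= min_ratio * full:
            return n
    return longest  # fallback, only reached if min_ratio > 1


emails = ["zhangsan@xxx.com", "zhangwei@xxx.com", "zhaoliu@xxx.com",
          "lisi@xxx.com", "wangwu@xxx.com"]  # hypothetical sample
for n in (3, 5, 6):
    print(n, prefix_selectivity(emails, n))  # 3 0.6, then 5 0.8, then 6 1.0
print(shortest_prefix(emails))               # 6
```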
Query specific fields instead of all fields
Avoid SELECT * and query only the fields you need. This speeds up the query, reduces network traffic, and gives covering indexes a chance to apply
Optimize subqueries
Prefer JOIN statements over subqueries where possible. A subquery is a nested query, and MySQL may create a temporary table for it; creating and destroying temporary tables consumes system resources and time, and subqueries that return large result sets hurt query performance even more
You can refer to my previous article on how to write good Join statements
Small tables drive large tables
If table B holds less data than table A, evaluate table B first and then look up table A. The query looks like this:
select name from A where id in (select id from B);
Do not perform operations on columns
Do not apply arithmetic or other expressions to indexed columns in query conditions, because the query engine then usually cannot use the index on that column, which hurts query efficiency
select * from test where id + 1 = 50;
select * from test where month(updateTime) = 7;
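The toy model below uses hypothetical data, with a sorted Python list standing in for the index on `id`. It shows the difference: with `id + 1 = 50` every index entry must be evaluated, whereas the equivalent `id = 49` can seek directly. The same idea applies to dates: `month(updateTime) = 7` can only become an index range when the year is known, for example `updateTime >= '2021-07-01' and updateTime < '2021-08-01'` (the year here is just an example).

```python
import bisect

index_on_id = list(range(1, 100_001))  # hypothetical sorted index entries for test.id


def scan_with_expression(index, predicate):
    """WHERE f(id) = c: the expression hides the index order, so every entry is checked."""
    hits, examined = [], 0
    for value in index:
        examined += 1
        if predicate(value):
            hits.append(value)
    return hits, examined


def seek_equal(index, target):
    """WHERE id = c: a B+ tree seek, modelled by binary search."""
    pos = bisect.bisect_left(index, target)
    hits = []
    while pos < len(index) and index[pos] == target:
        hits.append(index[pos])
        pos += 1
    return hits


print(scan_with_expression(index_on_id, lambda v: v + 1 == 50))  # ([49], 100000)
print(seek_equal(index_on_id, 49))                               # [49]
```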
An easy trap to fall into: implicit type conversion
select * from test where skuId=123456
The skuId field has an index, but the explain results show that this statement is a full table scan
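The reason is that skuId is of type varchar(32) while the compared value is an integer, so MySQL has to convert the column values to numbers before comparing, which prevents it from using the index. Writing the value as a string, skuId='123456', avoids the conversion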
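Why can't the index help? When a string column is compared with a number, each string is converted to a number, and many different strings convert to the same number. The sketch below uses a deliberately simplified conversion that only parses a leading integer (MySQL actually compares as floating point and accepts more formats) on hypothetical skuId values. The numeric matches are scattered through the string-ordered index, so no single index range contains them all, while the string comparison hits one contiguous range.

```python
import re


def simplified_string_to_number(s):
    """Rough model of MySQL's cast: parse a leading integer, otherwise 0."""
    match = re.match(r"\s*([+-]?\d+)", s)
    return int(match.group(1)) if match else 0


# Hypothetical skuId index, kept in string order like a varchar index.
sku_index = sorted(["123456", "0123456", "99", "1", "123456abc"])
print(sku_index)  # ['0123456', '1', '123456', '123456abc', '99']

numeric_hits = [i for i, v in enumerate(sku_index) if simplified_string_to_number(v) == 123456]
string_hits = [i for i, v in enumerate(sku_index) if v == "123456"]
print(numeric_hits)  # [0, 2, 3]: not one contiguous range
print(string_hits)   # [2]: a single range the index can seek to
```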
Add redundant fields where appropriate
Redundant fields can eliminate many multi-table join queries, which perform poorly. Adding redundant fields where appropriate to reduce joins is a space-for-time optimization strategy
Use composite indexes correctly
For MySQL storage engines that use B+ trees, such as InnoDB, conditions on a composite index are matched from left to right, so pay attention to the column order when creating a composite index
For example, with a composite index idx(name, age, sex), conditions on name + age + sex, name + age, or name alone satisfy the leftmost-prefix rule and can use the index. Conditions that leave out the leftmost column, such as age + sex, cannot use it for lookups. The order in which conditions are written in WHERE does not matter, because the optimizer rearranges them, so sex + name can still use the name part of the index
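A minimal Python model of the leftmost-prefix rule. It only covers equality conditions and leaves out range conditions, which also stop the matching after the column they apply to.

```python
def usable_index_prefix(index_columns, equality_columns):
    """Leading columns of a composite index that equality conditions can use.

    Matching stops at the first index column without a condition; the order in
    which conditions appear in WHERE is irrelevant, so a set is used.
    """
    used = []
    for column in index_columns:
        if column not in equality_columns:
            break
        used.append(column)
    return used


idx = ["name", "age", "sex"]
print(usable_index_prefix(idx, {"name", "age", "sex"}))  # ['name', 'age', 'sex']
print(usable_index_prefix(idx, {"name", "age"}))         # ['name', 'age']
print(usable_index_prefix(idx, {"sex", "name"}))         # ['name'] (sex is checked afterwards)
print(usable_index_prefix(idx, {"age", "sex"}))          # [] -> no index lookup possible
```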
JOIN optimization
MySQL executes joins with the nested-loop join algorithm, which works like nested loops: it traverses the driving (outer) table and, for each row read, searches the driven (inner) table for matching rows
To improve JOIN performance, minimize the work done inside the nested loops
An important optimization is to index the join column of the driven table, so matching rows are found quickly instead of being compared with every row of the inner table, which greatly reduces the total number of iterations. Another is to let the smaller result set drive the larger one, which further reduces the number of loop iterations on top of the index optimization
If it is hard to tell which table is larger, use an INNER JOIN: the optimizer then chooses the join order itself and usually lets the smaller table drive the larger one (with a LEFT JOIN, the left table generally drives)
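The effect of both optimizations can be counted in a small Python model. It uses comparisons or index lookups as a rough proxy for cost and ignores the B+ tree descent behind each lookup as well as join buffers. The `users` and `orders` tables are hypothetical.

```python
from collections import defaultdict


def simple_nested_loop(outer, inner, key):
    """No index on the driven table: compare every outer row with every inner row."""
    result, comparisons = [], 0
    for o in outer:
        for i in inner:
            comparisons += 1
            if o[key] == i[key]:
                result.append((o, i))
    return result, comparisons


def build_index(rows, key):
    """Stand-in for an index on the driven table's join column."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def index_nested_loop(outer, inner_index, key):
    """Index on the driven table: one lookup per outer row."""
    result, lookups = [], 0
    for o in outer:
        lookups += 1
        for i in inner_index.get(o[key], []):
            result.append((o, i))
    return result, lookups


users = [{"uid": u} for u in range(10)]                         # hypothetical small table
orders = [{"uid": n % 10, "order_no": n} for n in range(1000)]  # hypothetical large table

print(simple_nested_loop(users, orders, "uid")[1])                     # 10000 comparisons
print(index_nested_loop(users, build_index(orders, "uid"), "uid")[1])  # 10 lookups: small drives large
print(index_nested_loop(orders, build_index(users, "uid"), "uid")[1])  # 1000 lookups: large drives small
```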
Avoid joining too many tables
MySQL has a join buffer, whose size is set by the join_buffer_size parameter
One join buffer is allocated for each join in a statement that cannot use an index, so the more tables a single SQL statement joins, the more buffers, and the more memory, it may use
If many queries join multiple tables and join_buffer_size is set unreasonably, the server can easily run out of memory, which harms the stability of the database server
ORDER BY optimization
Sort by index scan
MySQL can produce ordered results in two ways: sorting the result set (filesort), or scanning an index in order
However, if the index does not cover all the columns the query needs, MySQL has to go back to the table for every index record it scans. These reads are random I/O and are usually slower than a sequential full table scan
Therefore, when designing indexes, try to make the same index serve both sorting and finding rows
For example:
-- Create index (date,staff_id,customer_id)
select staff_id, customer_id from test where date = '2010-01-01' order by staff_id,customer_id;
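An index can be used to sort results only if the order of the index columns exactly matches the ORDER BY clause and all columns are sorted in the same direction. The exception is when leading index columns are fixed to constants in the WHERE clause, as with date in the example above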
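A simplified Python model of this rule, covering equality conditions only. It assumes an all-ascending index (which MySQL can also scan backwards for a uniform DESC). MySQL 8.0 descending indexes can additionally support mixed directions when the index is declared that way, which the model does not cover.

```python
def index_can_sort(index_columns, equality_columns, order_by):
    """Can an index scan return rows already in ORDER BY order?

    order_by is a list of (column, direction) pairs; the index is assumed all-ascending.
    """
    if len({direction for _, direction in order_by}) > 1:
        return False  # mixed ASC/DESC needs a filesort with an all-ascending index
    pos = 0
    for column, _ in order_by:
        # Leading index columns fixed by "col = constant" do not affect the order.
        while (pos < len(index_columns) and index_columns[pos] != column
               and index_columns[pos] in equality_columns):
            pos += 1
        if pos == len(index_columns) or index_columns[pos] != column:
            return False
        pos += 1
    return True


idx = ["date", "staff_id", "customer_id"]
asc = "ASC"
print(index_can_sort(idx, {"date"}, [("staff_id", asc), ("customer_id", asc)]))     # True
print(index_can_sort(idx, set(), [("staff_id", asc), ("customer_id", asc)]))        # False: date not fixed
print(index_can_sort(idx, {"date"}, [("customer_id", asc)]))                       # False: skips staff_id
print(index_can_sort(idx, {"date"}, [("staff_id", asc), ("customer_id", "DESC")]))  # False: mixed directions
```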
UNION optimization
MySQL executes UNION by creating a temporary table, filling it with the result of each query, and then reading from that table. Many optimizations do not apply to UNION queries because the temporary table cannot use indexes
It is best to push conditions such as WHERE and LIMIT down into each subquery of the UNION manually, so the optimizer can make full use of them
Also, use UNION ALL unless you really need the server to remove duplicates. Without the ALL keyword, MySQL adds the DISTINCT option to the temporary table, which causes an expensive uniqueness check over the entire temporary table
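The difference between the two forms can be modelled in a few lines of Python with hypothetical query results. UNION checks every row against the rows kept so far, while UNION ALL just appends. In this toy model the set lookup is cheap; in MySQL the equivalent check runs against the temporary table. SQL does not guarantee result order, although the model keeps insertion order.

```python
def union_all(*results):
    """UNION ALL: append every row, no duplicate check."""
    rows = []
    for result in results:
        rows.extend(result)
    return rows


def union_distinct(*results):
    """UNION: every row is checked against all rows kept so far."""
    seen, rows = set(), []
    for result in results:
        for row in result:
            if row not in seen:
                seen.add(row)
                rows.append(row)
    return rows


q1 = [(1, "x"), (2, "y")]  # hypothetical result of the first SELECT
q2 = [(2, "y"), (3, "z")]  # hypothetical result of the second SELECT
print(len(union_all(q1, q2)))       # 4
print(len(union_distinct(q1, q2)))  # 3
```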
Slow Query logs
The usual way to troubleshoot slow queries is to find the slow SQL statements in the slow query log, examine their execution plans with EXPLAIN, analyze and locate the root cause, and then fix it
The slow query log is a MySQL feature that, once enabled, records every SQL statement whose execution time exceeds long_query_time (in seconds)
Enable it by setting slow_query_log=1, for example with SET GLOBAL slow_query_log = 1
Note that slow query logging adds overhead and may affect MySQL performance, so use it with caution in production
Design optimization
Try to avoid using NULL
NULL is awkward to handle in MySQL: nullable columns need extra storage and special operators such as IS NULL, and they make queries harder to optimize
Declare columns NOT NULL where possible and replace NULL with 0, an empty string, or another special value, for example INT NOT NULL DEFAULT 0
Use the smallest data length that fits
Smaller data types generally take less space on disk, in memory, and in CPU caches, and are faster to process
Use the simplest data type
Operations on simple data types are cheaper. For example, do not use VARCHAR when INT will do, because integers are cheaper to compare and process than strings
Define as few TEXT fields as possible
TEXT fields are slow to query. If a field must be TEXT, move it into a separate sub-table and join to it only when you need it; this keeps queries on the main table efficient
Split tables and databases appropriately
Table splitting means that when a table has many fields, we split the large table into several sub-tables: frequently used core fields stay in the main table and the rest go into sub-tables. Most queries can then be answered from the narrower main table alone, which improves query efficiency
Database splitting means dividing one database into several. For example, one database (the primary) handles writes and updates, while the others replicate the primary's data and serve client queries. This spreads the read and write load of a single database across several databases, improving overall efficiency
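The primary/replica arrangement is usually paired with read/write splitting in the application or in a proxy. Below is a minimal Python sketch with hypothetical server names: plain SELECT statements rotate across replicas, and everything else, including locking reads such as SELECT ... FOR UPDATE, goes to the primary. Real routers must also handle replication lag (a read right after a write may not see it on a replica) and transactions, which this sketch ignores.

```python
import itertools


class ReadWriteRouter:
    """Hypothetical router: writes and locking reads go to the primary, plain reads to replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)  # simple round-robin

    def route(self, sql):
        words = sql.lower().split()
        if not words:
            raise ValueError("empty statement")
        is_plain_read = words[0] == "select" and "for update" not in " ".join(words)
        return next(self._replicas) if is_plain_read else self.primary


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
print(router.route("select * from test where id = 1"))             # replica-1
print(router.route("update test set name = 'a' where id = 1"))     # primary
print(router.route("SELECT * FROM test WHERE id = 1 FOR UPDATE"))  # primary
print(router.route("select count(*) from test"))                   # replica-2
```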
Common type choices
Integer display width
MySQL lets you specify a display width for integer types, such as INT(11), but it has no real effect: it does not limit the range of values, and INT(1) and INT(20) are identical for storage and computation. The integer display width attribute is deprecated as of MySQL 8.0.17
VARCHAR and CHAR
CHAR is fixed-length, while VARCHAR stores variable-length strings and usually takes less space than a fixed-length type. However, VARCHAR needs an extra 1 or 2 bytes to record the string length, and updates can cause fragmentation
Choose VARCHAR when the maximum length of a column is much larger than its average length, or when the column is rarely updated, so fragmentation is not a concern. Choose CHAR for very short strings, for values that all have the same length (such as MD5 hashes), or for columns whose data changes frequently
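The length overhead is easy to quantify. According to the MySQL documentation, a VARCHAR value is stored with a 1-byte length prefix if the column's values can need no more than 255 bytes and a 2-byte prefix otherwise, while CHAR pads values to the full declared length. The Python sketch below assumes a single-byte character set such as latin1 (multi-byte character sets change the arithmetic) and ignores row-format overhead.

```python
def varchar_bytes(value, max_chars):
    """Bytes for a VARCHAR(max_chars) value in a single-byte charset: data plus length prefix."""
    if len(value) > max_chars:
        raise ValueError("value longer than the column")
    length_prefix = 1 if max_chars <= 255 else 2
    return len(value) + length_prefix


def char_bytes(value, max_chars):
    """Bytes for a CHAR(max_chars) value in a single-byte charset: always padded to full length."""
    if len(value) > max_chars:
        raise ValueError("value longer than the column")
    return max_chars


md5_hex = "d41d8cd98f00b204e9800998ecf8427e"  # a 32-character MD5 string
print(char_bytes(md5_hex, 32), varchar_bytes(md5_hex, 32))  # 32 33
print(char_bytes("Tom", 255), varchar_bytes("Tom", 255))    # 255 4
print(varchar_bytes("Tom", 300))                            # 5
```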
DATETIME and TIMESTAMP
DATETIME has a wider range, from the year 1000 to 9999, while TIMESTAMP only covers 1970 to 2038. DATETIME is time-zone independent, whereas the value a TIMESTAMP displays depends on the session time zone. Both types work in most scenarios, but TIMESTAMP is recommended for space efficiency: DATETIME took 8 bytes before MySQL 5.6.4 (5 bytes plus fractional-seconds storage since then), while TIMESTAMP takes 4 bytes
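The 2038 limit comes from how TIMESTAMP values are stored: as seconds since 1970-01-01 UTC, with a documented maximum of '2038-01-19 03:14:07' UTC, which is 2**31 - 1 seconds after the epoch. A quick Python check of that arithmetic, plus the UTC conversion behind TIMESTAMP's time-zone dependence (the sample date and UTC+8 offset are just examples):

```python
from datetime import datetime, timedelta, timezone

epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
timestamp_max = epoch + timedelta(seconds=2**31 - 1)
print(timestamp_max)  # 2038-01-19 03:14:07+00:00

# TIMESTAMP values are converted from the session time zone to UTC for storage:
# 08:00 in UTC+8 is stored as 00:00 UTC (a DATETIME would keep 08:00 as written).
local = datetime(2021, 7, 1, 8, 0, tzinfo=timezone(timedelta(hours=8)))
print(local.astimezone(timezone.utc))  # 2021-07-01 00:00:00+00:00
```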
BLOB and TEXT types
BLOB and TEXT are string types designed to store large amounts of data, in binary and character form respectively
Use both with caution in practice because they are slow to query. If you need them, move the field into a separate sub-table and join to it when needed, which keeps queries on the main table efficient
Normalization
When data is well normalized, less data has to be modified, and normalized tables are usually smaller, so more of the data can be cached in memory and operations are faster
The downside is that queries need more joins
First normal form: every field is atomic (indivisible), which relational databases essentially enforce by default
Second normal form: eliminate partial dependencies, where a non-key field depends on only part of a composite primary key. In practice, tables commonly use a business-independent column, such as an auto-increment ID, as the primary key
Third normal form: eliminate transitive dependencies on the primary key (non-key fields depending on other non-key fields), typically by splitting tables, which reduces data redundancy
Hardware optimization
MySQL's hardware requirements mainly concern three areas: disk, network, and memory
Disk
Use disks with high read/write performance, such as solid-state drives, to reduce I/O time and improve MySQL's overall efficiency
You can also use several small disks instead of one large disk: a disk's rotational speed is fixed, so several small disks working in parallel provide more throughput than one
Network
A stable, low-latency network with sufficient bandwidth is a basic requirement for MySQL to run normally. If possible, set up multiple network adapters to improve the MySQL server's efficiency during network peak hours
Memory
The more memory the MySQL server has, the more data and indexes it can cache, and memory is far faster than disk, so more memory improves MySQL's efficiency
Finally
Search WeChat for "Yuebeifeiyu" to add me as a friend and join the interview discussion group
Reply "666" to the official account to get free e-books