Preface

Interview question source: an interview write-up shared by a candidate with a year and a half of experience (covering Alibaba, Meituan, Toutiao, JD.com, and Didi)

Common MySQL optimization methods fall into three broad areas:

SQL optimization, design optimization, and hardware optimization. Each area contains a number of smaller optimization points.

Let's look at each in detail.

This article was first published on the WeChat official account Yuebanfeiyu and then synced to the author's personal website: xiaoflyfish.cn/

SQL optimization

SQL optimization means improving the running efficiency of a MySQL database by optimizing SQL statements and indexes. The details are as follows:

Paging optimization

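For example:

select * from table where type = 2 and level = 9 order by id asc limit 190289,10;

With such a large offset, MySQL has to read and discard the first 190289 matching rows before it returns 10, so the query gets slower as the offset grows.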

Optimization scheme:

  • Deferred join

    First, fetch only the primary keys that match the WHERE condition, which can be done from the index alone. Then join that result back to the original table, so that full rows are fetched by primary key for just those ids instead of through the secondary index for every skipped row

    For example:

select a.* from table a, (select id from table where type = 2 and level = 9 order by id asc limit 190289,10) b where a.id = b.id;

  • Bookmark method

    The bookmark method finds the primary key of the row at the offset given by the first parameter of limit, then filters on that primary key and applies the limit

    For example:

select * from table where type = 2 and level = 9 and id >= (select id from table where type = 2 and level = 9 order by id asc limit 190289, 1) order by id asc limit 10;
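
A variant of the bookmark idea, when the client can remember the last id of the page it just displayed, is to skip the offset entirely. The sketch below is only an illustration: the table name items, the index name, and the id value 190500 are hypothetical, and it assumes the page is always ordered by id.

```sql
-- Hypothetical table standing in for the article's placeholder table.
-- An index on (type, level) lets InnoDB read the matching ids in order,
-- because secondary index entries also carry the primary key.
alter table items add index idx_type_level (type, level);

-- First page: no offset needed.
select * from items
where type = 2 and level = 9
order by id asc
limit 10;

-- Next page: continue after the last id shown on the previous page
-- (190500 is an illustrative value supplied by the client).
select * from items
where type = 2 and level = 9 and id > 190500
order by id asc
limit 10;
```

The trade-off is that this only supports "next page" style navigation, not jumping to an arbitrary page number.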

Index optimization

Proper use of indexes

Without an index, a query triggers a full table scan, so it reads far more data than it needs and runs slowly. To improve query performance, add indexes on the columns that queries filter on most often.

Build covering indexes

When InnoDB queries through a secondary index, it normally has to go back to the clustered index (the table) to fetch the row. If the leaf nodes of the secondary index already contain all the columns the query needs, that lookup is skipped. This is called a covering index.

For example:

select name from test where city = 'Shanghai';

We put the queried columns into one composite index so that the result can be read directly from the index:

alter table test add index idx_city_name (city, name);
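
One way to check whether a query is actually covered is to look at the Extra column of EXPLAIN. The sketch below assumes the test table and the idx_city_name index from the example, and that the table also has columns outside that index.

```sql
-- Covered: both city and name live in idx_city_name, so the Extra column
-- is expected to include "Using index".
explain select name from test where city = 'Shanghai';

-- Not covered (assuming test has columns beyond id, city, and name):
-- MySQL still needs the full row, so "Using index" should be absent.
explain select * from test where city = 'Shanghai';
```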

Avoid OR queries before MySQL 5.0

Before MySQL 5.0, avoid OR queries and use UNION or subqueries instead. MySQL 5.0 introduced index merge.

Put simply, index merge lets a multi-condition query, such as one that combines conditions with OR or AND, scan several indexes separately and then merge the results, so the indexes remain usable.

If the type column of the EXPLAIN output is index_merge, MySQL is using index merge for that table.

For how to read EXPLAIN output, see my previous article: The Most Complete Explain Summary, SQL Optimization Is No Longer Difficult
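
As a rough illustration of both approaches, the sketch below uses a hypothetical users_demo table with one index per condition. Whether the optimizer really picks index merge depends on the data and statistics, so treat the comments as what you might see rather than guaranteed output.

```sql
-- Hypothetical table with a separate index for each condition.
create table users_demo (
  id int not null auto_increment primary key,
  name varchar(32) not null,
  phone varchar(20) not null,
  key idx_name (name),
  key idx_phone (phone)
);

-- With index merge, EXPLAIN may report type = index_merge and an Extra
-- value such as "Using union(idx_name,idx_phone); Using where".
explain select * from users_demo
where name = 'alice' or phone = '13800000000';

-- The manual rewrite: each branch can use its own index, and UNION
-- removes rows that match both conditions.
select * from users_demo where name = 'alice'
union
select * from users_demo where phone = '13800000000';
```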

Avoid the != or <> operator in WHERE conditions

A not-equal condition usually matches most of the rows in a table, so the optimizer often ignores the index and performs a full table scan, even if the compared column is indexed.

Workaround: rewrite the not-equal condition as two range conditions joined with OR, which may allow the index to be used.

For example, change column <> 'aaa' to column > 'aaa' or column < 'aaa', then confirm with EXPLAIN that the index is used.

Use prefix indexes appropriately

MySQL supports prefix indexes, which index only the leading part of a string column.

The longer an index entry is, the more disk space it takes and the fewer entries fit on one data page, so an index lookup has to read more pages and the query slows down. A prefix index of suitable length reduces the space used and improves query efficiency.

For example, if all email addresses share the fixed suffix "@xxx.com", the distinguishing part of each value is at the front, which makes the column a good candidate for a prefix index:

alter table test add index index2(email(6));

A prefix index with a well-chosen length saves space without adding much query cost.

Note that MySQL cannot use a prefix index for ORDER BY or GROUP BY, and a prefix index cannot act as a covering index.
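
A common way to pick the prefix length is to compare the selectivity of a few candidate prefixes with the selectivity of the full column. This is a sketch against the test table and email column used above; the lengths 6 and 10 are just example candidates.

```sql
-- Selectivity = distinct values / total rows (1.0 means every value is unique).
-- Choose the shortest prefix whose selectivity is close to the full column's.
select
  count(distinct email) / count(*)           as full_selectivity,
  count(distinct left(email, 6)) / count(*)  as prefix6_selectivity,
  count(distinct left(email, 10)) / count(*) as prefix10_selectivity
from test;
```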

Query specific fields instead of all fields

Avoid SELECT * where possible and query only the columns you need. This speeds up the query and reduces the bandwidth used by network transfer.

Optimize subqueries

Prefer JOIN statements to subqueries where possible. A subquery is a nested query, and MySQL may materialize its result in a temporary table. Creating and destroying temporary tables consumes system resources and takes time, and a subquery that returns a large result set hurts query performance even more.

You can refer to my previous article on how to write good Join statements
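
As a small illustration of the rewrite, here is a subquery and an equivalent join. The tables orders_demo and customers_demo are hypothetical, and the two forms return the same rows only because customers_demo.id is assumed to be the primary key (so the join cannot duplicate orders).

```sql
-- Subquery form.
select o.*
from orders_demo o
where o.customer_id in (
  select c.id from customers_demo c where c.city = 'Shanghai'
);

-- Join form: same result, provided customers_demo.id is unique.
select o.*
from orders_demo o
join customers_demo c on c.id = o.customer_id
where c.city = 'Shanghai';
```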

Small tables drive large tables

If table B holds less data than table A, the execution order should be to query table B first and then table A. The query looks like this:

select name from A where id in (select id from B);
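
For the opposite case, where A is the smaller table, the usual counterpart is EXISTS, which walks A and probes B for each row. This is a sketch using the same A and B tables; recent MySQL versions may rewrite both forms into the same semijoin plan, so it is worth checking with EXPLAIN before relying on the difference.

```sql
-- B is smaller: the IN subquery on B is evaluated first and drives the lookup into A.
select name from A where id in (select id from B);

-- A is smaller: iterate over A and probe B (ideally through an index on B.id).
select name from A where exists (select 1 from B where B.id = A.id);
```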

Do not perform operations on columns

Do not apply arithmetic or other expressions to a column in a condition, because this can prevent the query engine from using the index on that column and slow the query down. Both of the following queries have this problem:

select * from test where id + 1 = 50;
select * from test where month(updateTime) = 7;
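
The usual fix is to move the computation to the constant side so the column is compared as-is. A sketch for the two queries above; note that month(updateTime) = 7 matches July of every year, so the range version assumes only July 2021 is wanted (the year is an illustrative assumption).

```sql
-- The arithmetic is done on the constant, so an index on id stays usable.
select * from test where id = 50 - 1;

-- A range on the bare column can use an index on updateTime.
-- Assumption: only July 2021 is needed.
select * from test
where updateTime >= '2021-07-01' and updateTime < '2021-08-01';
```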

A common pitfall: implicit type conversion

select * from test where skuId = 123456;

The skuId column is indexed, but EXPLAIN shows that this statement performs a full table scan.

The reason is that skuId is of type varchar(32) while the compared value is an integer, so MySQL has to convert the column value of every row before comparing, which prevents it from using the index.
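
The fix is to pass a value of the same type as the column. A minimal sketch against the same test table:

```sql
-- Quoting the value makes it a string, matching the varchar column,
-- so the index on skuId can be used for the lookup.
explain select * from test where skuId = '123456';
```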

Add redundant fields appropriately

Redundant columns can eliminate many join queries. Joining several tables can be slow, so adding a redundant column where appropriate reduces the number of multi-table joins. This is a space-for-time trade-off.

Use composite indexes correctly

MySQL storage engines that use B+ trees, such as InnoDB, match the columns of a composite index from left to right, so pay attention to the column order when creating one.

For example, if we create a composite index idx (name, age, sex), then conditions on name + age + sex, on name + age, or on name alone all match a leftmost prefix and can use the index. The order in which the conditions are written in the WHERE clause does not matter. However, a condition that leaves out the leftmost column, such as age + sex, cannot use the index for the lookup.
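
The leftmost-prefix rule can be tried out with a sketch like the following. The table person_demo and its sample values are hypothetical; compare the key column of each EXPLAIN result.

```sql
create table person_demo (
  id int not null auto_increment primary key,
  name varchar(32) not null,
  age tinyint unsigned not null,
  sex char(1) not null,
  address varchar(64) not null default '',
  key idx_name_age_sex (name, age, sex)
);

-- These conditions start from the leftmost column and can use the index.
explain select * from person_demo where name = 'Tom';
explain select * from person_demo where name = 'Tom' and age = 20;
explain select * from person_demo where name = 'Tom' and age = 20 and sex = 'M';

-- No condition on name: the index cannot be used to locate the rows.
explain select * from person_demo where age = 20 and sex = 'M';
```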

Join optimization

MySQL executes join statements with the nested-loop join algorithm. Put simply, it iterates over the driving table (outer table) and, for each row read, looks up the matching rows in the driven table (inner table).

To improve join performance, minimize the number of nested-loop iterations.

One effective optimization is to index the join column of the driven table, so that matching rows are found quickly without comparing against every row of the inner table, which greatly reduces the total number of iterations. Another is to let the small result set drive the large one, which reduces the iterations further on top of the index optimization.

If it is hard to tell which table is large and which is small, use inner join and MySQL will choose the driving table automatically, normally the smaller one.
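
A minimal sketch of indexing the driven table's join column, using two hypothetical tables (small_t and big_t). In the EXPLAIN output the first row is normally the driving table, and the driven table should show the new index in its key column.

```sql
create table small_t (
  id int not null primary key,
  label varchar(32) not null
);

create table big_t (
  id int not null auto_increment primary key,
  small_id int not null,
  payload varchar(64) not null
);

-- Index the join column of the (expected) driven table.
alter table big_t add index idx_small_id (small_id);

-- With inner join the optimizer picks the driving table itself.
explain
select s.label, b.payload
from small_t s
inner join big_t b on b.small_id = s.id;
```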

You can refer to my previous article on how to write good Join statements

Avoid joining too many tables

MySQL has a join buffer, whose size is set by the join_buffer_size parameter.

For each additional table joined in the same SQL statement, MySQL may allocate another join buffer, so the more tables a statement joins, the more memory it occupies.

If many queries perform multi-table joins and join_buffer_size is set unreasonably, the server can easily run out of memory, which affects the stability of the database server.
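
To see and adjust the setting, something like the following can be used. The 4 MB value is only an illustrative number, not a recommendation.

```sql
-- Current value in bytes.
show variables like 'join_buffer_size';

-- Raise it for the current session only, e.g. before one heavy report query,
-- rather than raising the global value for every connection.
set session join_buffer_size = 4 * 1024 * 1024;
```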

Sorting optimization

Sort by index scan

MySQL produces ordered results in two ways: by sorting the result set, or by scanning an index in order.

However, if the index does not cover the columns the query needs, MySQL has to go back to the table for every index entry it scans. These reads are random I/O and are usually slower than a sequential full table scan.

Therefore, when designing an index, try to make the same index serve both for finding rows and for sorting.

For example:

select staff_id, customer_id from test where date = '2010-01-01' order by staff_id, customer_id;

The index can be used for sorting only if the order of its columns exactly matches the ORDER BY clause and all columns are sorted in the same direction.
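
A sketch of an index that serves both the lookup and the sort for a query of this shape. The table rental_demo and its columns are hypothetical; the leading column is fixed by an equality condition, so the remaining index columns line up with the ORDER BY.

```sql
create table rental_demo (
  id int not null auto_increment primary key,
  rental_date date not null,
  staff_id int not null,
  customer_id int not null,
  key idx_date_staff_customer (rental_date, staff_id, customer_id)
);

-- Rows for one date are already stored in (staff_id, customer_id) order
-- inside the index, so Extra is not expected to show "Using filesort".
explain
select staff_id, customer_id
from rental_demo
where rental_date = '2010-01-01'
order by staff_id, customer_id;

-- Mixed sort directions no longer match the index order,
-- so "Using filesort" is expected here.
explain
select staff_id, customer_id
from rental_demo
where rental_date = '2010-01-01'
order by staff_id desc, customer_id asc;
```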

UNION optimization

MySQL handles a UNION by creating a temporary table, filling it with the result of each query, and then reading from it. Many optimization strategies do not work well with UNION because the temporary table has no indexes to take advantage of.

It is best to push WHERE, LIMIT, and similar clauses down into each query of the UNION by hand so that the optimizer can make full use of them.

In addition, use UNION ALL unless you really need the server to remove duplicate rows. Without the ALL keyword, MySQL adds the distinct option to the temporary table, which results in an expensive uniqueness check over the whole temporary table.
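
A sketch of pushing the limit down by hand, with two hypothetical tables (actors_demo and customers_demo) that both have a name column. Each branch returns at most 20 rows, so the temporary table holds at most 40 rows instead of both full tables.

```sql
(select name from actors_demo order by name limit 20)
union all
(select name from customers_demo order by name limit 20)
order by name
limit 20;
```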

Slow Query logs

The usual way to troubleshoot slow queries is to find the slow SQL statements in the slow query log, inspect their execution plans with EXPLAIN, locate the root cause, and then fix it.

The slow query log is a MySQL feature that can be switched on in the configuration. Any SQL statement whose execution time exceeds long_query_time is recorded in the log.

The slow query log is turned on by setting slow_query_log=1.

Note that the slow query log has some impact on MySQL performance, so use it with caution in production.
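
The settings can also be changed at runtime. A minimal sketch; the 1 second threshold is an illustrative value, and the account needs the privilege to set global variables.

```sql
-- Turn the slow query log on and record statements slower than 1 second.
set global slow_query_log = 1;
set global long_query_time = 1;  -- takes effect for new connections

-- Check the switch and the log file location.
show variables like 'slow_query_log%';
show variables like 'long_query_time';
```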

Design optimization

Try to avoid using NULL

NULL values are awkward to handle in MySQL: they need extra storage space and special operators, and columns that contain NULL are harder to optimize queries for.

Declare columns as NOT NULL and replace NULL with 0, an empty string, or another special value, for example int not null default 0.

Minimum data length

Smaller data type lengths generally require less space in disk, memory, and CPU caches and are faster to process

Use the simplest data type

Operations on simple data types cost less. For example, do not use varchar where int will do, because int is more efficient than varchar.
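
Put together, the three points above (avoid NULL, keep lengths small, prefer simple types) might look like this in a table definition. The table account_demo and its columns are hypothetical.

```sql
create table account_demo (
  id int unsigned not null auto_increment primary key,
  -- A small status code fits in one byte; no need for int or varchar.
  status tinyint unsigned not null default 0,
  -- Empty string instead of NULL, with a length sized to the real data.
  nickname varchar(32) not null default '',
  -- 0 instead of NULL for a counter.
  login_count int unsigned not null default 0
);
```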

Define text columns as rarely as possible

Queries on text columns are inefficient. If a text column is necessary, move it into a separate sub-table and join to that table only when the column is needed, which keeps queries on the main table efficient.
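
A sketch of this split, with a hypothetical article_demo main table and an article_body_demo sub-table that holds the text column:

```sql
-- Main table: small, frequently queried columns only.
create table article_demo (
  id int unsigned not null auto_increment primary key,
  title varchar(128) not null default '',
  created_at datetime not null
);

-- Sub-table: the large text column, one row per article.
create table article_body_demo (
  article_id int unsigned not null primary key,
  body text not null
);

-- List pages touch only the main table.
select id, title from article_demo order by created_at desc limit 20;

-- The detail page joins in the body only when it is needed.
select a.title, b.body
from article_demo a
join article_body_demo b on b.article_id = a.id
where a.id = 1;
```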

Split tables and databases where appropriate

Table splitting means that when a table has many columns, we split it into several sub-tables, keeping the frequently used main information in the main table and moving the rest into sub-tables. Most queries can then be served by the main table alone, which has fewer columns, and this effectively improves query efficiency.

Database splitting means dividing one database into several. For example, we use one database for writes and modifications and another that synchronizes the data from the master and serves client queries. The read and write load of a single database is thus spread over several, which improves the overall efficiency of the database.

Common type selection

Integer type width setting

MySQL lets you specify a width for integer types, such as int(11), but the width has little practical meaning. It does not limit the range of values: int(1) and int(20) are identical for storage and computation.

VARCHAR and CHAR

The char type is fixed length, whereas varchar stores variable-length strings and so saves space compared with fixed-length strings. However, varchar needs an extra 1 or 2 bytes to record the string length and is prone to fragmentation when rows are updated.

Choose varchar if the maximum length of the column is much larger than the average length, or if the column is rarely updated. Choose char for very short strings, for values that all have the same length (such as MD5 hashes), or for columns whose data changes frequently.

DATETIME and TIMESTAMP

Datetime has a larger range, covering the years 1000 to 9999, while timestamp covers only 1970 to 2038. Datetime is independent of the time zone, whereas the value that timestamp displays depends on the time zone. Both types work fine in most scenarios, but timestamp is recommended because it is more space efficient: datetime takes 8 bytes and timestamp only 4 (since MySQL 5.6.4, datetime takes 5 bytes plus fractional-seconds storage, so the gap is smaller).
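
The time zone difference is easy to see with a small experiment. The table ts_demo is hypothetical; the session time zone is switched between UTC and UTC+8.

```sql
create table ts_demo (
  dt datetime not null,
  ts timestamp not null
);

-- Insert the same wall-clock value while the session is in UTC.
set time_zone = '+00:00';
insert into ts_demo (dt, ts) values ('2021-01-01 00:00:00', '2021-01-01 00:00:00');

-- Read it back from a session in UTC+8.
set time_zone = '+08:00';
select dt, ts from ts_demo;
-- dt is still 2021-01-01 00:00:00 (stored as written),
-- ts now displays 2021-01-01 08:00:00 (stored as UTC, converted on read).
```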

BLOB and TEXT types

Blob and text are string data types designed to store large amounts of data in binary and character formats, respectively

In practice, use both types with caution because queries on them are inefficient. If they are necessary, move the column into a separate sub-table and join to it only when needed, which keeps queries on the main table efficient.

Normalization

When data is well normalized, there is less data to modify, and normalized tables are usually smaller, so more of the data fits in the memory cache and operations run faster.

The disadvantage is that queries need more joins.

First normal form: fields are indivisible (atomic). The database supports this by default.

Second normal form: eliminate partial dependence on the primary key. One way is to add a column that is independent of business logic as the primary key, such as an auto-increment id.

Third normal form: eliminate transitive dependence on the primary key. This is done by splitting tables, which reduces data redundancy.

Hardware optimization

MySQL's hardware requirements show up mainly in three areas: disk, network, and memory.

Disk

Use disks with high read and write performance, such as solid-state drives, to reduce I/O time and improve the overall efficiency of MySQL.

You can also use several small disks instead of one large disk. The rotational speed of a disk is fixed, so several small disks effectively work as multiple disks running in parallel.

Network

A smooth, low-latency network with sufficient bandwidth is a basic condition for MySQL to run normally. If conditions permit, install multiple network adapters to improve the efficiency of the MySQL server during network peak hours.

Memory

The more memory the MySQL server has, the more data it can store and cache. Because memory is very fast, this improves the efficiency of MySQL.
