Preface

  • "How do you optimize a MySQL database and its indexes?" is a question that comes up in interviews all the time, so today this article sums up the main techniques.

Several steps of SQL optimization

1. Use the `show status` command to learn the execution frequency of various SQL commands

show [session | global] status;
  • Add the `session` (current connection, the default) or `global` (since the server last started) scope as needed.
show status like 'Com_%';  -- Displays the values of all the statistics parameters of the current connection.
  • Each `Com_xxx` counter records the number of times the corresponding `xxx` statement has been executed; the ones usually worth watching are `Com_select`, `Com_insert`, `Com_update`, and `Com_delete`.

2. Locate SQL statements that are inefficient

  • Run the `show processlist` command to view, in real time, the SQL the server is currently executing.

  • Locate inefficient statements through the slow query log.
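The slow query log is off by default. A minimal sketch of turning it on at runtime (these settings revert when the server restarts; the threshold is illustrative):

```sql
-- Enable the slow query log for the running server
SET GLOBAL slow_query_log = ON;

-- Log any statement that takes longer than 1 second (illustrative threshold)
SET GLOBAL long_query_time = 1;

-- Check where the log file is written
SHOW VARIABLES LIKE 'slow_query_log_file';
```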

3. Use `explain`/`desc` to analyze the execution plans of inefficient SQL

  • See the earlier article exploring MySQL's `explain` execution plans.

4. Use `show profile` to analyze SQL

  • `show profile` helps us see where the execution time actually goes.

  • `show profiles` gives a clearer view of the stages of SQL execution.
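A minimal sketch of the profiling workflow (profiling is per session; the table `t` is hypothetical, and `SHOW PROFILE` is deprecated in favor of the Performance Schema in recent MySQL versions):

```sql
SET profiling = 1;                        -- turn on profiling for this session
SELECT COUNT(*) FROM t;                   -- the statement to analyze (hypothetical table)
SHOW PROFILES;                            -- list recent statements with their Query_ID
SHOW PROFILE CPU, BLOCK IO FOR QUERY 1;   -- time spent in each execution stage
```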

5. Use the optimizer `trace` to analyze how the optimizer chooses an execution plan

  • MySQL 5.6 provides a trace of SQL optimization. It can help us understand why the optimizer chooses plan A rather than plan B, and gives further insight into the optimizer's behavior.
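A sketch of how the trace is captured (the statement and table `t` are hypothetical; the JSON result is read from information_schema):

```sql
SET optimizer_trace = 'enabled=on';
SELECT id FROM t WHERE num = 100;                       -- the statement to trace
SELECT trace FROM information_schema.optimizer_trace;   -- why this plan was chosen
SET optimizer_trace = 'enabled=off';
```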

6. Identify problems and take appropriate optimization measures.

MySQL SQL statement optimization method

  1. Avoid using `!=` or `<>` in the WHERE clause; otherwise the engine may abandon the index and perform a full table scan.

  2. Optimize queries to avoid full table scans; first consider building indexes on the columns used in WHERE and ORDER BY.

  3. Try to avoid NULL tests on fields in the WHERE clause, as they may cause the engine to abandon the index and perform a full table scan. For example:

select id from t where num is null
  • It is better to give num a default value of 0, ensure the column never stores NULL, and query like this instead:
select id from t where num=0
  4. Avoid using OR to join conditions in the WHERE clause: if one field has an index and the other does not, the engine may abandon the index and perform a full table scan.
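A common rewrite, sketched against a hypothetical table `t` where `num` is indexed and `name` is not: split the OR into branches that can each use their own access path and merge them with UNION ALL.

```sql
-- Instead of:
--   select id from t where num = 10 or name = 'abc'
select id from t where num = 10
union all
select id from t where name = 'abc' and num <> 10;   -- second branch excludes rows already returned
```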

  5. A leading-wildcard fuzzy query causes a full table scan:

  select id from t where name like '%c%'
  • A trailing-wildcard query, by contrast, can use the index:
select id from t where name like 'c%'
  6. Use in and not in with caution, as they can cause a full table scan. For continuous values, use between rather than in; in many cases, exists can replace in.
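Sketches of both rewrites, using hypothetical tables `t` and `s`:

```sql
-- Continuous values: BETWEEN instead of IN
select id from t where num in (1, 2, 3);
select id from t where num between 1 and 3;   -- preferred

-- Correlated EXISTS instead of IN
select num from t where num in (select num from s);
select num from t where exists (select 1 from s where s.num = t.num);   -- often better
```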

  7. Using local variables in the WHERE clause can also lead to a full table scan. Because SQL resolves local variables only at run time, the optimizer cannot defer the choice of access plan until then; it must choose at compile time, when the variable's value is unknown and therefore cannot inform index selection. The following statement performs a full table scan:

  select id from t where num=@num
  • You can force the query to use an index instead (in MySQL the hint is `force index`; `index_name` is a placeholder):
  select id from t force index(index_name) where num=@num
  8. Avoid, as far as possible, applying expressions, functions, or other operations to fields in the WHERE clause; this causes the engine to abandon the index and perform a full table scan. For example:

select id from t where num/2=100     -- bad: expression on the column
select id from t where num=100*2     -- good: the index on num can be used

select id from t where substring(name,1,3)='abc'   -- bad: function on the column
select id from t where name like 'abc%'            -- good: index-friendly

select id from t where datediff(createdate,'2005-11-30')=0                    -- bad
select id from t where createdate>='2005-11-30' and createdate<'2005-12-01'   -- good

  9. In UPDATE statements, if only one or two fields change, do not update all fields; otherwise frequent calls bring significant performance cost and a large volume of logs.

  10. When a composite index is used as a condition, the first (leftmost) field of the index must appear in the condition or the index cannot be used; keep the field order as consistent with the index order as possible.
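The leftmost-prefix rule sketched with a hypothetical composite index on table `t`:

```sql
create index idx_abc on t (a, b, c);

select id from t where a = 1 and b = 2;   -- usable: leftmost prefix (a, b)
select id from t where b = 2 and c = 3;   -- not usable: leading column a is missing
```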

  11. Not all indexes are effective for every query. The optimizer plans queries based on the data in the table, and when an indexed column contains a great deal of duplicate data the index may not be used. If a table has a sex field that is roughly half male and half female, even an index on sex does little for query efficiency.

  12. More indexes are not always better. Indexes speed up the corresponding SELECTs but slow down INSERTs and UPDATEs, which may need to maintain or rebuild the index; keeping a table to no more than about six indexes is a common rule of thumb.

  13. Avoid updating clustered index columns as much as possible, because the order of the clustered index is the physical storage order of the table's records; changing those columns forces the records of the whole table to be reshuffled, which consumes considerable resources.

  14. Prefer numeric fields: if a field holds only numeric information, do not store it as a character type, which lowers query and join performance and raises storage overhead. The engine compares strings character by character when processing queries and joins, whereas a numeric type needs only a single comparison.

  15. Do not write select * from t anywhere; replace * with the specific list of fields, and do not return any field you do not need.

  16. For multi-table joins over large data volumes, paginate first and then join; otherwise the logical reads will be very high and performance poor.
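A sketch of "paginate first, join later", with hypothetical tables `orders` and `users`:

```sql
-- Poor: join everything, then take one page
select o.id, u.name
from orders o
join users u on u.id = o.user_id
order by o.id
limit 100000, 20;

-- Better: page within the driving table first, then join the 20-row result set
select o.id, u.name
from (select id, user_id from orders order by id limit 100000, 20) o
join users u on u.id = o.user_id;
```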

  17. Use table variables instead of temporary tables whenever possible (a tip from the SQL Server world, like several of the items that follow).

  18. Consider using temporary tables to hold intermediate results. Temporary tables are not forbidden; used properly they can make some queries more efficient, for example when a data set from a large or frequently used table must be referenced repeatedly. Storing intermediate results in a temporary table lets subsequent queries run against tempdb, avoiding multiple scans of the main table, greatly reducing shared-lock blocking during execution, reducing update locks, and improving concurrency. For one-off uses, however, a derived (exported) table is better.

  19. When creating a temporary table, if a large amount of data is inserted at once, use SELECT INTO rather than CREATE TABLE plus INSERT, to avoid generating a large volume of logs and to improve speed; if the amount of data is small, create the table first and then insert, to spare system table resources.

  20. If temporary tables are used, explicitly delete them all at the end of the stored procedure: TRUNCATE TABLE first, then DROP TABLE, so system tables are not locked for long.

  21. Avoid frequently creating and deleting temporary tables, to reduce the consumption of system table resources.

  22. Avoid cursors where possible, as they are inefficient. Like temporary tables, though, cursors are not forbidden: a FAST_FORWARD cursor over a small data set is often better than other row-by-row methods, especially when several tables must be referenced to obtain the data. Routines that compute "totals" in the result set usually run faster than cursor-based equivalents.

  23. Use SET NOCOUNT ON at the beginning of all stored procedures and triggers and SET NOCOUNT OFF at the end.

  24. Avoid returning large amounts of data to the client.

  25. Avoid large transactions as far as possible, to improve system concurrency.

  26. Replace the HAVING clause with the WHERE clause where possible

Avoid the HAVING clause where WHERE will do: HAVING filters the result set only after all records have been retrieved and grouped, a process that requires sorting, whereas limiting the records with WHERE reduces that overhead. ON, WHERE, and HAVING are all conditional clauses, applied in that order: ON first, WHERE second, HAVING last.
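The evaluation order can be seen in a single sketched query (tables `emp` and `dept` are hypothetical):

```sql
select d.name, count(*)
from emp e
join dept d on d.id = e.dept_id   -- ON: applied first, while joining
where e.sal > 1000                -- WHERE: applied next, before grouping
group by d.name
having count(*) > 5;              -- HAVING: applied last, after grouping
```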

  27. Use TRUNCATE instead of DELETE

Use TRUNCATE instead of DELETE to remove all records from a table. In general, rollback segments hold the information needed for recovery: if you have not committed the transaction, Oracle can restore the data to its state before the DELETE ran. With TRUNCATE, the rollback segment stores no recoverable information; once the command executes the data cannot be restored, so few resources are used and execution is fast.

  28. Use table aliases:

When joining multiple tables in a SQL statement, give each table an alias and prefix every column with it. This reduces parsing time and avoids syntax errors caused by ambiguous column names.

  29. Use `union all` instead of `union`

When a SQL statement needs to merge two result sets with union, the two sets are combined and then sorted to remove duplicates before the final result is returned. If union all is used instead, this deduplicating sort is skipped and efficiency improves accordingly. Note that union all will return a record that appears in both result sets more than once.

  30. Use WHERE instead of ORDER BY where possible:

    The ORDER BY clause uses an index only under two strict conditions: ① all columns in the ORDER BY must be contained in the same index and keep the order they have in the index; ② all columns in the ORDER BY must be defined as NOT NULL.

SELECT DEPT_CODE FROM DEPT ORDER BY DEPT_TYPE      -- inefficient: needs a sort

SELECT DEPT_CODE FROM DEPT WHERE DEPT_TYPE > 0      -- efficient: an index on DEPT_TYPE can be used

  31. Avoid type conversions on indexed columns:

Suppose EMP_TYPE is an indexed column of character type. The statement SELECT … FROM EMP WHERE EMP_TYPE = 123 will not use the index because of the internal type conversion; write SELECT … FROM EMP WHERE EMP_TYPE = '123' instead. When a conversion is genuinely needed, express it explicitly rather than letting the database cast implicitly; note that when characters are compared with numbers, the character side is converted to a number, which defeats an index on the character column.
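The same pitfall in MySQL terms, assuming `emp_type` is an indexed VARCHAR column on a hypothetical `emp` table:

```sql
-- Implicit conversion: the column side is cast, so the index is skipped
select id from emp where emp_type = 123;

-- Types match: the index on emp_type can be used
select id from emp where emp_type = '123';
```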

  32. Optimize GROUP BY

Improve the efficiency of GROUP BY statements by filtering out unwanted records before the GROUP BY. The following two queries return the same result, but the second is significantly faster.

SELECT JOB, AVG(SAL) FROM EMP GROUP BY JOB HAVING JOB = 'PRESIDENT' OR JOB = 'MANAGER'   -- slower

SELECT JOB, AVG(SAL) FROM EMP WHERE JOB = 'PRESIDENT' OR JOB = 'MANAGER' GROUP BY JOB    -- faster

  33. Avoid resource-intensive operations:

SQL statements containing DISTINCT, UNION, MINUS, INTERSECT, or ORDER BY make the SQL engine perform resource-consuming sort operations: DISTINCT needs one sort, and the others need at least two. In general, statements with UNION, MINUS, or INTERSECT can be rewritten in other ways, but if your database's SORT_AREA_SIZE is well tuned they are acceptable too, as they are very readable.

  34. In application code, use `PreparedStatement` rather than `Statement` wherever possible.
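The same idea exists inside MySQL as server-side prepared statements, which client-side APIs such as JDBC's PreparedStatement build on. A sketch with a hypothetical table `t`:

```sql
prepare stmt from 'select id from t where num = ?';
set @num = 100;
execute stmt using @num;    -- the value is bound, not spliced into the SQL text
deallocate prepare stmt;
```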

MySQL index optimization

  • For index optimization, an earlier article already covered the twenty iron rules of indexing in detail; interested readers can see that article to learn how to optimize indexes!

MySQL database optimization goals, common mistakes and basic principles

The optimization goal

  • Two common bottlenecks in MySQL are CPU and I/O. CPU saturation generally occurs while data is being loaded into memory or read from disk; disk I/O bottlenecks occur when the working set is far larger than available memory.

    • Reduce the number of I/O operations: I/O is always where a database most easily bottlenecks; this follows from what a database does, since more than 90% of the time of most database operations is taken up by I/O. Reducing I/O is the first priority in SQL optimization, and also the most visible lever.

    • Reduce CPU computation: beyond I/O, SQL optimization must also consider CPU cost. `order by`, `group by`, `distinct` and the like are all CPU hogs (these operations boil down to CPU-driven comparisons of in-memory data). Once I/O optimization has reached a certain stage, reducing CPU computation becomes the next important goal.

Common misconceptions

  1. `count(1)` and `count(primary_key)` are better than `count(*)`:

Many people use count(1) or count(primary_key) instead of count(*) to count rows, believing it performs better, but this is a mistake: in some scenarios it is actually worse, because the database specifically optimizes count(*). MyISAM, for instance, stores the table's total row count and can return it directly for an unfiltered count(*).

  2. `count(column)` and `count(*)` are the same:

In fact, count(column) and count(*) are entirely different operations with entirely different meanings. count(column) counts how many non-NULL values of that column appear in the result set; NULLs are skipped. count(*) counts how many records are in the result set, skipping nothing.

  3. `select a, b from …` lets the database access less data than `select a, b, c from …`:

    In fact, most relational databases store data by row, and data access works in fixed-size I/O units called blocks or pages, typically 4KB, 8KB… Most of the time each I/O unit holds multiple rows, and each row stores all of that row's fields (special types such as LOBs excepted). So whether we fetch one field or several, the database must read the same amount of table data.

Of course, there is an exception: when the query can be satisfied entirely from an index. If only a and b are needed and both live in the index, no lookup back to the table is required; if c is not in the index, the table must be visited to fetch it, and the I/O of the two queries then differs considerably.

  4. ORDER BY always requires a sort operation:

    As we know, index data is stored in order. If the ordering we need agrees with some index, and the query executes via that index, the database can skip the sort and return the data directly, because it knows the data already satisfies our ordering requirement. Using indexes to optimize SQL with sorting requirements is in fact a very important technique.

  5. `filesort` in the execution plan means files on disk are being sorted:

    This is not our misunderstanding but an unfortunate choice of words by the MySQL developers. filesort is what we may see in the Extra column when viewing an execution plan with the explain command. In fact, Using filesort appears whenever a statement needs a sort operation; it does not mean files on disk will be sorted.

Basic principles

  1. Join as little as possible

    MySQL's strength is its simplicity, which in some respects is also its weakness. The MySQL optimizer is efficient, but with only limited statistics available its decisions are more likely to go astray. For complex multi-table joins, the optimizer's limits and the modest effort invested in join processing leave performance well behind veteran relational databases such as Oracle. For simple single-table queries, however, the gap is small, and in some scenarios MySQL even does better than those predecessors.

  2. Sort as little as possible

    Sorting consumes a lot of CPU. When the cache hit ratio is high and I/O capacity is sufficient, reducing sorting therefore improves SQL response time markedly. MySQL offers several ways to reduce sorting, for example: sort via an index; reduce the number of records taking part in the sort; and simply do not sort data that does not need sorting.

  3. Try to avoid select *

    Many people assume the field list barely matters, but `select *` can prevent two important index optimizations: it defeats covering-index scans, and it can stop an ORDER BY from being satisfied by an index. When everything the query needs can be found in the index itself, the table never has to be visited and I/O drops dramatically.

  4. Use joins instead of subqueries whenever possible

    Although join performance is not MySQL's strong point, a join still holds a significant performance advantage over MySQL's subqueries.

  5. Use `or` as little as possible

    When several conditions in a WHERE clause are combined with or, the MySQL optimizer does not always work out a good execution plan, and MySQL's layered SQL/storage architecture compounds the poor performance. It is often better to replace the or with union all, or with union if deduplication is required.

  6. Try to use union all instead of union

    The main difference between union and union all is that the former must merge the result sets and then filter out duplicates, which involves sorting, a large amount of CPU work, and extra resource consumption and latency. So use union all instead of union whenever we can be sure there are no duplicate result sets, or we do not care about duplicates.

  7. Filter as early as possible

    This strategy is most familiar from index design (putting the most selective fields first), but the same principle applies when writing SQL to optimize some join queries. For example, when paging over a multi-table join, it is best to filter and paginate within one table first and then join that page's result set with the other tables; this avoids unnecessary I/O and greatly reduces the time it consumes.

  8. Avoid type conversions

    Type conversion here means the implicit conversion that happens when the type of a column in the WHERE clause does not match the type of the parameter passed in. Do any needed conversion explicitly on the parameter side, so the index on the column remains usable.

  9. Prioritize SQL with high concurrency over some “big” SQL with low execution frequency

    High-concurrency SQL is always more destructive than low-frequency SQL, because high-concurrency statements can overwhelm the system without giving us any chance to catch our breath. A slow, I/O-heavy statement that runs rarely will at worst slow the whole system down, and the system can usually hold on long enough for us to react.

  10. Optimize from the overall point of view, rather than one-sided adjustment

SQL optimization cannot target one statement in isolation; it must take all the SQL in the system into account. Especially when tuning execution plans by adjusting indexes, do not fix one query at the expense of others; that is penny wise and pound foolish.

  11. Explain every SQL statement running in the database whenever possible

To optimize SQL you need to know its execution plan, both to judge whether there is room for improvement and whether the plan itself is the problem. After the SQL in a database has been optimized for a while, few statements have obvious problems and most issues must be dug out, which calls for explaining large numbers of statements, collecting their plans, and judging whether optimization is needed.

MySQL database table structure optimization

  • MySQL database is a database based on Row storage, and the database operates IO in page block mode, that is, if we reduce the space occupied by each record, we will increase the number of rows per page. The number of rows that can be accessed per IO increases. On the other hand, when processing data in the same row, fewer pages need to be accessed, that is, fewer I/O operations, which directly improves performance. In addition, since our memory is limited, increasing the number of rows per page is equivalent to increasing the amount of cached data per memory block, and it also increases the chance of a hit in memory swap, also known as the cache hit ratio.

Data type selection

  1. The most time-consuming part of database work is I/O: more than 90% of the time of most database operations is spent on I/O reads and writes, so minimizing the amount of I/O can greatly improve performance. We cannot change what data the database must store, but we can put thought into how it is stored. The following field-type suggestions mainly apply to scenarios with many records and large data volumes, because elaborate data-type settings raise maintenance costs and over-optimization can cause problems of its own:

Numeric types: use DOUBLE only when truly necessary, considering precision as well as length. Likewise, DECIMAL is not recommended for fixed-precision decimals; converting to integer storage by a fixed multiplier is suggested instead, which saves considerable space at no extra maintenance cost. For integer storage with large data volumes, it is worth distinguishing TINYINT/INT/BIGINT, since their storage footprints differ greatly; for fields guaranteed never to hold negative values, adding the unsigned attribute is recommended. Of course, in a small database you need not be strict about the three integer types.

An INT primary key is 4 bytes, i.e. 32 bits. Systems used to be generally 32-bit, so operating on a 4-byte value fit the word size exactly and was most efficient; today's systems are basically 64-bit, where INT no longer exploits the CPU word, so the suggestion here is to design tables with an 8-byte BIGINT primary key rather than INT directly.

Character types: do not use the TEXT type unless you absolutely must, since its handling performs worse than CHAR or VARCHAR. For fixed-length fields CHAR is recommended; for variable-length ones use VARCHAR, and set only a sensible maximum length rather than an arbitrarily large cap, because MySQL stores them differently: CHAR(10) holding 'ABC' still occupies the full 10 characters of space (space-padded), whereas VARCHAR occupies nothing for what is not stored and uses only as much space as the data itself.

Time types: use TIMESTAMP wherever possible, since it needs only half the storage of DATETIME. For values that need only day precision, DATE is recommended, as it needs just three bytes, less than TIMESTAMP. Storing a Unix timestamp in an INT column is not recommended: it is unintuitive, brings needless maintenance trouble, and offers no benefit.

ENUM & SET: for status fields, consider storing them as ENUM, which greatly reduces storage; even when new values must be added later, as long as they are appended to the end of the definition the table data need not be rebuilt. For predefinable attribute data, try the SET type: even with multiple attributes it saves a lot of storage.

LOB types: storing LOB data in the database is strongly discouraged. Databases offer the feature, but it is not their specialty; we should let the appropriate tool do what it is good at. Storing LOB data in a database is like asking a marketing professional who learned a little Java in school years ago to write Java code.

Character encoding: the character set directly determines how data is encoded in MySQL. Since different character sets need different amounts of space for the same content, the right choice helps shrink the data and hence the I/O. ① For content expressible in pure Latin characters, there is no need for any encoding beyond latin1, which saves a lot of storage. ② If we are sure we never need to store multiple languages, there is no need for utf8 or other Unicode character sets, which waste storage. ③ MySQL lets data types (and character sets) be set per field, so in large databases storing multi-byte data we can choose differently for different fields of different tables, greatly shrinking storage, reducing I/O, and improving the cache hit ratio.

Split properly: There are times when you might want to map a complete object to a database table, which is great for application development, but there are other times when it can cause major performance problems. When we have a large table field such as TEXT or a large VARCHAR, if we do not need the field for most of the table access, we should split it into a separate table to reduce the storage space used for frequently used data. One obvious benefit of this is that the number of pieces of data that can be stored in each data block can be greatly increased, both reducing the number of physical I/OS and greatly increasing the cache hit ratio in memory.
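The data-type advice above can be condensed into one hypothetical table definition (table and column names are illustrative):

```sql
create table account (
  id         bigint unsigned not null auto_increment,   -- 8-byte surrogate key
  balance    bigint not null,            -- money as integer cents, not DOUBLE/DECIMAL
  status     tinyint unsigned not null,  -- small enumerated state in 1 byte
  country    char(2) not null,           -- fixed length: CHAR
  nickname   varchar(32) not null,       -- variable length with a sensible cap
  birthday   date not null,              -- day precision: 3 bytes
  created_at timestamp not null default current_timestamp,
  primary key (id)
) engine = InnoDB default charset = latin1;   -- charset chosen per the advice above
```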

  2. The suggestions above reduce the storage size of each record so that each page holds more records, cutting I/O and improving the cache hit ratio. The following recommendation may puzzle many developers, as it is classic anti-normalization design and runs counter to the goals above.

Moderate redundancy: why keep redundant data at all? Doesn't it enlarge each record and reduce how many fit in a block? It does, but in some cases we still want it: when every join exists only to fetch one small field, yet the joined records are large, a great deal of unnecessary I/O results; trading space for time through redundancy optimizes this away. Redundancy must, however, go hand in hand with keeping data consistent: redundant fields must be updated together with their source.

Use NOT NULL as much as possible: the NULL type is special and hard for SQL to optimize. Although MySQL's NULLs, unlike Oracle's, do enter the index, in a composite index NULLs can greatly hurt the efficiency of the whole index. Many people think NULL saves space and use it to save I/O, but most of the time this backfires: a little space may be saved, yet many other optimization problems appear, and SQL I/O may grow rather than shrink. Declaring a non-NULL DEFAULT value is also good practice.

MySQL database cache parameter optimization

  • Not very useful. Ignore it

Conclusion

  • The most commonly used database optimization methods are: SQL statements and indexes, database table structure, system configuration, hardware.
  • Optimization payoff: SQL statements and indexes > database table structure > system configuration > hardware, while cost runs from low to high in the same order.
  • Database optimization method summary:
    1. Design databases that conform to the paradigm
    2. Select the appropriate storage engine
    3. SQL statement optimization
    4. Index optimization: create indexes for highly separated fields
    5. SQL table structure, field optimization
    6. Database parameters optimization: IO parameters, CPU parameters
    7. Division table: vertical segmentation and horizontal segmentation
    8. Partition: Stores table data in different partitions according to specific rules to improve disk I/O efficiency and database performance
    9. Master/slave replication and read/write splitting: replication relies on three main threads plus the binlog and relay-log files; the master handles writes and the slaves handle reads
    10. Load balancing
    11. Database cluster
    12. hardware
  • For more articles, follow the WeChat official account [code ape Technology column].