• Optimization goals
1. Reduce IO times. IO is always the most likely bottleneck in a database, which follows from what a database is responsible for: in most database operations, more than 90% of the time is taken up by IO. Reducing the number of IO operations is therefore the first priority in SQL optimization and, of course, also the optimization with the most obvious effect.
2. Reduce CPU calculation. Besides the IO bottleneck, SQL optimization also has to consider the amount of CPU computation. order by, group by, distinct… are all CPU hogs (these operations are basically the CPU comparing data held in memory). Once IO optimization has reached a certain stage, reducing CPU computation becomes an important goal of SQL optimization.
• Optimization method
1. Change the SQL execution plan. Once the optimization goals are clear, we need to decide how to achieve them. For a SQL statement there is really only one way to reach the two goals above: change its execution plan so that it takes as few “detours” as possible and finds the data we need through whatever “shortcuts” are available, thereby achieving “reduce IO times” and “reduce CPU calculation”.
• Common misconceptions
1. count(1) and count(primary_key) are better than count(*). Many people use count(1) or count(primary_key) instead of count(*) in the belief that they perform better. This is actually a mistake. In some scenarios they may even perform worse, because the database has specific optimizations for count(*) counting operations.
2. count(column) is the same as count(*). This misconception is common even among senior engineers and DBAs, many of whom take it for granted. In fact, count(column) and count(*) are different operations with different meanings: count(column) returns the number of records in the result set in which column is not NULL, while count(*) returns the number of records in the entire result set.
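The difference between the two counts is easy to check on a small table. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example runs anywhere), and the users table and its email column are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO users (id, email) VALUES (?, ?)",
    [(1, "a@example.com"), (2, None), (3, "c@example.com"), (4, None)],
)

# count(*) counts every row; count(email) skips rows where email IS NULL.
total_rows, non_null_emails = conn.execute(
    "SELECT count(*), count(email) FROM users"
).fetchone()

print(total_rows, non_null_emails)
```

Two of the four rows have a NULL email, so the two counts differ.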
3. select a,b from … makes the database access less data than select a,b,c from …. This misconception is widespread among developers, mainly because the storage principles of databases are not well understood. In fact, most relational databases store data by row, and data access is performed in fixed-size IO units (called blocks or pages), typically 4KB, 8KB… Most of the time each IO unit holds multiple rows, and each row stores all of its fields (except for special types such as LOBs). Therefore, whether we fetch one field or several, the database has to access the same amount of data in the table. There is an exception: when the query can be completed entirely within an index. If an index contains a and b but not c, a query that selects only a and b does not need to go back to the table, whereas a query that also selects c must go back to the table to fetch it. In that case the IO volume of the two queries can differ considerably.
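The index-only exception can be observed in a query plan. The following is a minimal sketch that uses SQLite (through Python's sqlite3 module) as a stand-in for MySQL; the table t, the index idx_a_b and the plan helper are hypothetical names. SQLite labels an index-only lookup "COVERING INDEX"; MySQL reports the comparable case as "Using index" in the Extra column of EXPLAIN.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c TEXT)")
conn.execute("CREATE INDEX idx_a_b ON t (a, b)")  # index holds a and b, but not c
conn.executemany(
    "INSERT INTO t (a, b, c) VALUES (?, ?, ?)",
    [(i % 10, i, "x" * 100) for i in range(1000)],
)

def plan(sql):
    """Return the human-readable detail strings of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Only a and b are requested: the index alone can answer the query.
plan_ab = plan("SELECT a, b FROM t WHERE a = 1")
# c is not in the index: each match requires a trip back to the table.
plan_abc = plan("SELECT a, b, c FROM t WHERE a = 1")

print(plan_ab)
print(plan_abc)
```

Both queries use the same index to locate rows; only the first one avoids reading the table rows themselves.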
4. order by always requires a sort operation. We know that index data is ordered. If we need the data in the same order as an index, and the query is executed through that index, the database skips the sort operation and returns the data directly, because it knows the data already satisfies our ordering requirement. In fact, using indexes to optimize SQL that has sorting requirements is a very important optimization technique.
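A query plan shows whether a separate sort step is needed. This is a minimal sketch using SQLite via Python's sqlite3 module as a stand-in for MySQL; the orders table, its columns and the plan helper are hypothetical. SQLite reports an explicit sort as "USE TEMP B-TREE FOR ORDER BY", which plays the role that "Using filesort" plays in MySQL's EXPLAIN output.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT, note TEXT)")
conn.execute("CREATE INDEX idx_created_at ON orders (created_at)")

def plan(sql):
    """Return the human-readable detail strings of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Ordering by an indexed column: rows are read in index order, no sort step.
indexed_plan = plan("SELECT created_at FROM orders ORDER BY created_at")
# Ordering by a column without an index: an explicit sort step is required.
unindexed_plan = plan("SELECT note FROM orders ORDER BY note")

print(indexed_plan)
print(unindexed_plan)
```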
5. “Using filesort” in the execution plan means a disk file sort is performed. This misconception is not really our fault; it comes from the MySQL developers’ choice of words. “Using filesort” is information we may see in the “Extra” column when viewing the execution plan of a SQL statement with the EXPLAIN command. In fact, “Using filesort” is displayed whenever a SQL statement requires a sort operation; it does not mean that the sort will actually be performed using files.
• Basic principles
1. Minimize joins. MySQL’s advantage is its simplicity, but in some respects this is also its disadvantage. The MySQL optimizer is efficient, but because it has only a limited amount of statistics to work with, it is more likely to arrive at a biased plan. For complex multi-table joins, the limited optimizer and the insufficient effort invested in join handling leave its performance well behind that of relational database predecessors such as Oracle. For simple single-table queries, however, the gap is smaller, and in some cases MySQL even outperforms these predecessors.
2. Sort as little as possible. Sort operations consume a lot of CPU resources, so when the cache hit ratio is high and IO capacity is sufficient, reducing sorting has a large effect on SQL response time. For MySQL, there are many ways to reduce sorting, such as:
• Sort through an index, as mentioned in the misconceptions above
• Reduce the number of records participating in the sort
• Do not sort data unless it is necessary
•…
3. Try to avoid select *. Many people find this hard to understand at first: did the misconceptions above not just say that the number of fields in the select clause does not affect the amount of data read? Yes, most of the time it does not affect the IO volume, but when there is an order by operation, the number of fields in the select clause can greatly affect sorting efficiency. In addition, as the same misconception noted, the IO volume is unaffected only most of the time: when the query can be answered from an index alone, selecting fewer fields still greatly reduces the IO volume.
4. Try to use joins instead of subqueries. Although join performance is not good, it still has a great performance advantage over MySQL subqueries. MySQL has had major problems with the execution plans of subqueries for many years, and the problems have persisted in all stable releases so far with little improvement. Although the problem has long been officially acknowledged and a fix promised, at least so far we have not seen a version that resolves it.
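As an illustration of the rewrite, the sketch below expresses the same question once as an IN subquery and once as a join, and checks that both return the same rows. It runs on SQLite through Python's sqlite3 module as a stand-in for MySQL, so it shows only that the two forms are equivalent, not how MySQL executes them. The customers and orders tables are hypothetical; the rewrite is safe here because customers.id is a primary key, so the join cannot duplicate order rows.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Berlin'), (2, 'Paris'), (3, 'Berlin');
    INSERT INTO orders VALUES (10, 1, 50), (11, 2, 70), (12, 3, 20), (13, 1, 90);
""")

# Subquery form: orders placed by customers located in Berlin.
subquery_sql = """
    SELECT id, amount
    FROM orders
    WHERE customer_id IN (SELECT id FROM customers WHERE city = 'Berlin')
    ORDER BY id
"""

# Join form of the same question.
join_sql = """
    SELECT o.id, o.amount
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE c.city = 'Berlin'
    ORDER BY o.id
"""

subquery_rows = conn.execute(subquery_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
print(subquery_rows == join_rows, join_rows)
```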
5. Minimize or. When multiple conditions in the where clause are combined with “or”, the MySQL optimizer does not handle the execution plan well; together with MySQL’s characteristic layered architecture of a SQL layer and a storage engine layer, this results in poor performance. In many cases it is better to use union all, or union if necessary, instead of “or”.
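One way to carry out this rewrite is sketched below, using SQLite through Python's sqlite3 module as a stand-in for MySQL; the tickets table and its columns are hypothetical. With union all, the second branch has to exclude rows already returned by the first branch, otherwise rows matching both conditions appear twice. The exclusion written here relies on the columns being NOT NULL; with nullable columns, union is the simpler choice.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tickets (
        id INTEGER PRIMARY KEY,
        owner_id INTEGER NOT NULL,
        reviewer_id INTEGER NOT NULL
    );
    CREATE INDEX idx_owner ON tickets (owner_id);
    CREATE INDEX idx_reviewer ON tickets (reviewer_id);
    INSERT INTO tickets VALUES (1, 7, 8), (2, 9, 7), (3, 7, 7), (4, 5, 6);
""")

or_sql = "SELECT id FROM tickets WHERE owner_id = 7 OR reviewer_id = 7 ORDER BY id"

# UNION ALL: the second branch skips rows the first branch already returned.
union_all_sql = """
    SELECT id FROM tickets WHERE owner_id = 7
    UNION ALL
    SELECT id FROM tickets WHERE reviewer_id = 7 AND owner_id <> 7
    ORDER BY id
"""

# UNION: duplicates are removed by the database instead.
union_sql = """
    SELECT id FROM tickets WHERE owner_id = 7
    UNION
    SELECT id FROM tickets WHERE reviewer_id = 7
    ORDER BY id
"""

or_rows = conn.execute(or_sql).fetchall()
union_all_rows = conn.execute(union_all_sql).fetchall()
union_rows = conn.execute(union_sql).fetchall()
print(or_rows, union_all_rows, union_rows)
```

Each branch of the rewritten query has a single equality condition, so each can be served by its own index.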
6. Try to replace union with union all. The main difference between union and union all is that the former must combine the two (or more) result sets and then filter out duplicates, which involves sorting, adds a large amount of CPU work, and increases resource consumption and latency. So use union all instead of union when we can confirm that duplicate results are impossible, or when we do not care about duplicates.
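The behavioural difference can be seen on two small result sets that share one value. This is a minimal sketch on SQLite via Python's sqlite3 module, used as a stand-in for MySQL; both tables are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE current_users (email TEXT);
    CREATE TABLE archived_users (email TEXT);
    INSERT INTO current_users VALUES ('a@example.com'), ('b@example.com');
    INSERT INTO archived_users VALUES ('b@example.com'), ('c@example.com');
""")

# UNION removes the duplicate 'b@example.com', which costs extra work.
union_rows = conn.execute("""
    SELECT email FROM current_users
    UNION
    SELECT email FROM archived_users
    ORDER BY email
""").fetchall()

# UNION ALL simply concatenates both result sets.
union_all_rows = conn.execute("""
    SELECT email FROM current_users
    UNION ALL
    SELECT email FROM archived_users
    ORDER BY email
""").fetchall()

print(len(union_rows), len(union_all_rows))
```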
7. Filter as early as possible. This strategy is most commonly seen in index design (placing the more selective fields earlier in the index). The same principle can be applied when writing SQL to optimize some join statements. For example, when running a paginated query across multiple tables, it is best to filter and paginate within one table first and then join that page of results with the other tables. This avoids as much unnecessary IO as possible and greatly reduces the time spent on IO.
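The pagination example can be sketched as two queries that return the same page. The code uses SQLite through Python's sqlite3 module as a stand-in for MySQL; the posts and authors tables are hypothetical. The two forms are equivalent here only because every post has exactly one matching author, so the inner join neither drops nor duplicates rows.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
""")
conn.executemany(
    "INSERT INTO authors VALUES (?, ?)",
    [(i, f"author{i}") for i in range(1, 6)],
)
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [(i, (i % 5) + 1, f"post{i}") for i in range(1, 101)],
)

page_size, offset = 10, 20

# Join everything first, then cut out the page.
join_then_page = """
    SELECT p.id, p.title, a.name
    FROM posts AS p
    JOIN authors AS a ON a.id = p.author_id
    ORDER BY p.id
    LIMIT ? OFFSET ?
"""

# Cut out the page in one table first, then join only those rows.
page_then_join = """
    SELECT p.id, p.title, a.name
    FROM (
        SELECT id, author_id, title FROM posts ORDER BY id LIMIT ? OFFSET ?
    ) AS p
    JOIN authors AS a ON a.id = p.author_id
    ORDER BY p.id
"""

rows_join_first = conn.execute(join_then_page, (page_size, offset)).fetchall()
rows_page_first = conn.execute(page_then_join, (page_size, offset)).fetchall()
print(rows_join_first == rows_page_first)
```

In the second form the join only ever touches the ten rows of the requested page.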
8. Avoid type conversions. This refers to the case where the type of a column in the where clause is inconsistent with the type of the parameter passed in:
• The index cannot be used by MySQL (in fact, other databases have the same problem). If a conversion is unavoidable, it should be applied to the parameter passed in
If the data type we pass in is inconsistent with the field type and we do not convert it ourselves, MySQL may convert the data itself, or it may leave the data unprocessed and hand it to the storage engine. In that case the index becomes unusable, causing execution plan problems.
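The effect of converting the column versus converting the parameter can be sketched with a query plan. SQLite (through Python's sqlite3 module) is used here as a stand-in for MySQL, and its implicit conversion rules differ from MySQL's, so the example writes the column-side conversion as an explicit CAST to make the effect visible. The accounts table, the idx_code index and the plan helper are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, code TEXT)")
conn.execute("CREATE INDEX idx_code ON accounts (code)")

def plan(sql, params=()):
    """Return the human-readable detail strings of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

raw_value = 123  # the application holds an integer, the column stores text

# Conversion applied to the column: the index on code cannot be used for a lookup,
# so every row has to be visited.
column_side = plan(
    "SELECT id FROM accounts WHERE CAST(code AS INTEGER) = ?", (raw_value,)
)

# Conversion applied to the parameter: the column is compared as stored,
# so the index can be searched directly.
param_side = plan("SELECT id FROM accounts WHERE code = ?", (str(raw_value),))

print(column_side)
print(param_side)
```

In SQLite's output, a line starting with SCAN means all rows are visited, while SEARCH means an index lookup.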
9. Prioritize high-concurrency SQL over low-frequency “large” SQL. In terms of destructive potential, high-concurrency SQL is always more dangerous than low-frequency SQL, because once a high-concurrency statement goes wrong, the system is overwhelmed without giving us any breathing room. SQL that consumes a lot of IO and responds slowly but runs infrequently will, at worst, slow the whole system down; the system can usually hold out for a while, which gives us a chance to react.
10. Optimize from a global perspective rather than making one-sided adjustments. SQL optimization cannot target a single statement in isolation; it should take all the SQL in the system into account. This matters especially when adjusting indexes to change the execution plan of one statement: lose sight of the rest and you end up penny wise and pound foolish.
11. Explain every SQL statement running in the database where possible. To optimize SQL we need to know what we are dealing with: only by knowing a statement’s execution plan can we judge whether there is room for optimization and whether the plan has a problem. After a period of optimizing the SQL running in the database, statements with obvious problems may become rare, and most remaining problems have to be dug out. At that point a large number of explain operations are needed to collect execution plans and decide whether optimization is required.
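Collecting plans in bulk can be automated. The sketch below gathers the plan of each statement in a list and flags the ones that scan all rows. It uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module as a stand-in; in MySQL the statement is EXPLAIN and the output columns are different. The events table, the statement names and the collect_plans helper are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, kind TEXT)")
conn.execute("CREATE INDEX idx_user ON events (user_id)")

# Statements the application is assumed to run.
statements = {
    "by_user": "SELECT id FROM events WHERE user_id = 42",
    "by_kind": "SELECT id FROM events WHERE kind = 'login'",
}

def collect_plans(connection, named_sql):
    """Map each statement name to the detail lines of its query plan."""
    return {
        name: [row[3] for row in connection.execute("EXPLAIN QUERY PLAN " + sql)]
        for name, sql in named_sql.items()
    }

plans = collect_plans(conn, statements)

# Flag statements whose plan visits every row instead of searching an index.
needs_review = sorted(
    name
    for name, details in plans.items()
    if any(detail.startswith("SCAN") for detail in details)
)
print(needs_review)
```

Only the statement filtering on the unindexed kind column is flagged for review.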