1. The LIMIT statement
Paging queries are among the most common scenarios, and also among the most problematic. For a simple statement like the following, the typical DBA solution is to add a composite index on the type, name, and create_time fields. The filter and sort can then use the index, and performance improves immediately.
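The article's original query is not preserved here; a representative paging statement of the kind described, against a hypothetical `operation` table, might look like this:

```sql
-- Hypothetical schema: the composite index covers the filter columns
-- plus the sort column, so MySQL can read rows already ordered by
-- create_time and stop after the requested page.
CREATE INDEX idx_type_name_ctime ON operation (type, name, create_time);

SELECT *
FROM operation
WHERE type = 'SQLStats'
  AND name = 'SlowLog'
ORDER BY create_time
LIMIT 1000, 10;
```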
Perhaps 90% or more of DBAs stop there. But when the LIMIT clause becomes "LIMIT 1000000, 10", programmers still complain: why is it slow when I only fetch 10 records?
The database has no way of knowing where the 1,000,000th record starts; even with an index, it still has to count from the beginning. When this kind of performance problem appears, most of the time it is because the programmer was lazy.
In scenarios such as front-end page-by-page browsing or batched export of large data sets, the maximum value from the previous page can be used as a query condition. The SQL is redesigned as follows:
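A sketch of that keyset ("seek") pagination rewrite, assuming the same hypothetical `operation` table and index as above:

```sql
-- Carry the last create_time of the previous page as a filter, so the
-- index seeks directly to the start of the next page instead of
-- skipping a million rows (the timestamp literal is a placeholder).
SELECT *
FROM operation
WHERE type = 'SQLStats'
  AND name = 'SlowLog'
  AND create_time > '2017-03-16 14:00:00'  -- max value from previous page
ORDER BY create_time
LIMIT 10;
```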
Under the new design, the query time is essentially constant and does not grow with the volume of data.
2. Implicit conversion
A mismatch between the type of a query variable and the field's defined type is another common error in SQL statements, as in the following:
Here the field bpn is defined as varchar(20), and MySQL's policy is to convert strings to numbers before comparing. That applies a function to the table field, so the index is invalidated.
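A minimal sketch of the problem and the fix, assuming a hypothetical `my_balance` table with an indexed varchar(20) column `bpn`:

```sql
-- Comparing a varchar column to a numeric literal forces MySQL to
-- cast the column to a number for every row, so the index on bpn
-- cannot be used:
SELECT * FROM my_balance b
WHERE b.bpn = 14000000123;      -- implicit string-to-number cast

-- Passing the value as a string keeps the comparison on the column's
-- own type, and the index stays usable:
SELECT * FROM my_balance b
WHERE b.bpn = '14000000123';
```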
Such parameters may be filled in automatically by the application framework rather than intended by the programmer. Many application frameworks today are very complex; they are easy to use, but they can also quietly dig holes for you.
3. Associated updates and deletes
While MySQL 5.6 introduced subquery materialization, note that it currently applies only to query statements. For UPDATE or DELETE, you need to rewrite the subquery as a JOIN manually.
For example, in the UPDATE statement below, MySQL actually executes a DEPENDENT SUBQUERY, and the execution time is as long as you might imagine.
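The original statement is not shown here; an illustrative UPDATE of this shape, against a hypothetical `operation` table, might be:

```sql
-- The IN (SELECT ...) predicate is re-evaluated per outer row as a
-- DEPENDENT SUBQUERY; the extra derived-table wrapper is the usual
-- workaround for updating a table referenced in its own subquery.
UPDATE operation o
SET    o.status = 'applying'
WHERE  o.id IN (SELECT id
                FROM (SELECT o2.id
                      FROM operation o2
                      WHERE o2.group_id = 123
                        AND o2.status NOT IN ('done')
                      ORDER BY o2.parent, o2.id
                      LIMIT 1) t);
```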
Execution Plan:
When rewritten as a JOIN, the subquery's select_type changes from DEPENDENT SUBQUERY to DERIVED, and the execution time drops from 7 seconds to 2 milliseconds.
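Assuming the same hypothetical schema, the JOIN form might look like this:

```sql
-- The derived table t is materialized once, then joined against,
-- instead of the subquery being re-run per row.
UPDATE operation o
JOIN  (SELECT o2.id
       FROM operation o2
       WHERE o2.group_id = 123
         AND o2.status NOT IN ('done')
       ORDER BY o2.parent, o2.id
       LIMIT 1) t ON o.id = t.id
SET    o.status = 'applying';
```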
The simplified execution plan:
4. Mixed sorting
MySQL cannot use an index for a sort that mixes ascending and descending directions. In some scenarios, though, there are still opportunities to improve performance with special methods.
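A representative mixed-direction sort, against hypothetical `my_order` and `my_appraise` tables (before MySQL 8.0's descending indexes, no single B-tree index can serve both directions):

```sql
-- ASC on is_reply plus DESC on appraise_time defeats index-based
-- sorting, so MySQL falls back to a filesort over the join result.
SELECT o.*, a.is_reply, a.appraise_time
FROM my_order o
INNER JOIN my_appraise a ON a.orderid = o.id
ORDER BY a.is_reply ASC, a.appraise_time DESC
LIMIT 0, 20;
```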
The execution plan shows a full table scan:
Since is_reply has only the two states 0 and 1, we can rewrite it as follows, reducing the execution time from 1.58 seconds to 2 milliseconds.
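A sketch of that rewrite under the same hypothetical schema: each branch sorts in a single direction (and can use an index), and the outer query merges the two small result sets.

```sql
-- Split on the two is_reply values; each branch needs at most 20
-- rows, so the outer sort touches at most 40 rows.
SELECT *
FROM ((SELECT o.*, a.is_reply, a.appraise_time
       FROM my_order o
       INNER JOIN my_appraise a
               ON a.orderid = o.id AND a.is_reply = 0
       ORDER BY a.appraise_time DESC
       LIMIT 20)
      UNION ALL
      (SELECT o.*, a.is_reply, a.appraise_time
       FROM my_order o
       INNER JOIN my_appraise a
               ON a.orderid = o.id AND a.is_reply = 1
       ORDER BY a.appraise_time DESC
       LIMIT 20)) t
ORDER BY is_reply ASC, appraise_time DESC
LIMIT 20;
```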
5. The EXISTS statement
MySQL still executes the EXISTS clause as a nested (dependent) subquery, as in the following statement:
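A representative statement of this shape, against hypothetical `my_neighbor` and `message_info` tables ('xxx' is a placeholder value):

```sql
-- The EXISTS predicate runs as a DEPENDENT SUBQUERY, executed once
-- for every qualifying row of my_neighbor.
SELECT *
FROM my_neighbor n
WHERE n.topic_status < 4
  AND EXISTS (SELECT 1
              FROM message_info m
              WHERE m.neighbor_id = n.id
                AND m.inuser = 'xxx');
```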
The execution plan is:
Changing EXISTS to a JOIN avoids the nested subquery and reduces the execution time from 1.93 seconds to 1 millisecond.
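Under the same hypothetical schema, the JOIN form might be:

```sql
-- Rewritten as an inner join; DISTINCT guards against duplicate
-- neighbor rows when several messages match one neighbor (drop it
-- if message_info is known to hold at most one match per neighbor).
SELECT DISTINCT n.*
FROM my_neighbor n
JOIN message_info m
  ON m.neighbor_id = n.id AND m.inuser = 'xxx'
WHERE n.topic_status < 4;
```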
The new execution plan:
6. Condition pushdown
External query conditions cannot be pushed down into complex views or subqueries in cases such as:
- Aggregate subquery;
- Subquery with LIMIT;
- UNION or UNION ALL subquery;
- Subqueries in output fields;
In the following statement, the execution plan shows that the condition is applied only after the aggregate subquery has run:
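An illustrative statement of this shape, against the hypothetical `operation` table ('rm-xxxx' is a placeholder):

```sql
-- The outer filter on target applies only after the subquery has
-- grouped and counted the entire table.
SELECT *
FROM (SELECT target, COUNT(*) AS cnt
      FROM operation
      GROUP BY target) t
WHERE t.target = 'rm-xxxx';
```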
After confirming that the condition can be pushed down semantically, rewrite it as follows:
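The pushed-down form, under the same assumptions:

```sql
-- Filtering before the aggregation means only the matching rows are
-- scanned and grouped.
SELECT target, COUNT(*) AS cnt
FROM operation
WHERE target = 'rm-xxxx'
GROUP BY target;
```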
The execution plan becomes:
7. Narrow the result set early
Start with the initial SQL statement:
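The original statement is not shown here; a representative shape, against hypothetical `my_order`, `my_userinfo`, and `my_productinfo` tables, might be:

```sql
-- The WHERE and ORDER BY touch only my_order, but the left joins
-- run over every qualifying row before the sort and LIMIT apply.
SELECT *
FROM my_order o
LEFT JOIN my_userinfo u    ON o.uid = u.uid
LEFT JOIN my_productinfo p ON o.pid = p.pid
WHERE o.display = 0
  AND o.ostatus = 1
ORDER BY o.selltime DESC
LIMIT 0, 15;
```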
The number of rows scanned is 900,000, and the execution time is 12 seconds.
Since the final WHERE conditions and the sort all apply to the leftmost main table, we can shrink the my_order result set before the left joins. The SQL is rewritten as follows, and the execution time drops to about 1 millisecond.
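Under the same hypothetical schema, the rewrite might look like this:

```sql
-- Shrink my_order to 15 rows first, then join only that small
-- derived table against the lookup tables.
SELECT *
FROM (SELECT *
      FROM my_order o
      WHERE o.display = 0
        AND o.ostatus = 1
      ORDER BY o.selltime DESC
      LIMIT 0, 15) o
LEFT JOIN my_userinfo u    ON o.uid = u.uid
LEFT JOIN my_productinfo p ON o.pid = p.pid
ORDER BY o.selltime DESC;
```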
Checking the execution plan again: the subquery is materialized (select_type=DERIVED) and participates in the joins. Although the estimated scan is still 900,000 rows, the actual execution time becomes very small thanks to the index and the LIMIT clause.
8. Push down the intermediate result set
Consider the following example, which has already been preliminarily optimized (in the left join, the query condition is applied to the main table first):
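A representative shape of that statement, against hypothetical `my_distribute` and `my_resources` tables: subquery a is already limited to 20 rows, but subquery c still aggregates the whole resources table.

```sql
SELECT a.*, c.allocated
FROM  (SELECT resourceid
       FROM my_distribute d
       WHERE isdelete = 0
         AND cusmanagercode = '1234567'
       ORDER BY salecode
       LIMIT 20) a
LEFT JOIN
      -- Full-table aggregation, regardless of which 20 rows a holds:
      (SELECT resourceid, SUM(IFNULL(allocation, 0)) AS allocated
       FROM my_resources
       GROUP BY resourceid) c
  ON a.resourceid = c.resourceid;
```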
So are there any other problems with this statement? It is not difficult to see that subquery c is a full-table aggregation, which drags down the performance of the whole statement when the table is particularly large.
In fact, in the left join the result set only needs the rows of subquery c that match the main table's resourceid. So we can rewrite the statement as follows, reducing the execution time from 2 seconds to 2 milliseconds.
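Under the same assumptions, pushing the 20-row result set into the aggregation might look like this:

```sql
-- Restrict the aggregation to resources that appear in a, so only
-- those rows of my_resources are grouped.
SELECT a.*, c.allocated
FROM  (SELECT resourceid
       FROM my_distribute d
       WHERE isdelete = 0
         AND cusmanagercode = '1234567'
       ORDER BY salecode
       LIMIT 20) a
LEFT JOIN
      (SELECT r.resourceid, SUM(IFNULL(r.allocation, 0)) AS allocated
       FROM my_resources r
       JOIN (SELECT resourceid
             FROM my_distribute d
             WHERE isdelete = 0
               AND cusmanagercode = '1234567'
             ORDER BY salecode
             LIMIT 20) a2 ON r.resourceid = a2.resourceid
       GROUP BY r.resourceid) c
  ON a.resourceid = c.resourceid;
```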
But subquery a now appears multiple times in the statement. Besides the extra overhead, this makes the whole statement look cumbersome. Rewriting again with a WITH clause:
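With a common table expression (available in MySQL 8.0+), the same hypothetical query can name subquery a once and reference it in both places:

```sql
WITH a AS (SELECT resourceid
           FROM my_distribute d
           WHERE isdelete = 0
             AND cusmanagercode = '1234567'
           ORDER BY salecode
           LIMIT 20)
SELECT a.*, c.allocated
FROM a
LEFT JOIN (SELECT r.resourceid, SUM(IFNULL(r.allocation, 0)) AS allocated
           FROM my_resources r
           JOIN a ON r.resourceid = a.resourceid
           GROUP BY r.resourceid) c
  ON a.resourceid = c.resourceid;
```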
Conclusion
The database compiler generates the execution plan, which determines how the SQL is actually executed. But compilers only do their best, and no database compiler is perfect.
Most of the scenarios mentioned above cause performance issues in other databases as well. Understanding the characteristics of the database compiler and steering around its shortcomings is how you write high-performance SQL statements.
Programmers should bring algorithmic thinking to the design of data models and the writing of SQL statements.
Get into the habit of using the WITH statement when writing complex SQL statements. Concise and clear SQL statements can also reduce the burden on the database.