In the previous section, we explained how to choose optimal data types and how to use indexes efficiently, which are essential for high performance MySQL. But that’s not enough. You need to design queries properly. If the query is poorly written, even the most well-structured table and well-indexed table will not achieve high performance.
When it comes to MySQL performance optimization, query optimization as the source of optimization, it is also the best indication of whether a system is faster. This chapter and the next few chapters will be about the content of the query performance optimization, emphasis on to introduce some query optimization techniques, help people more deeply understand how MySQL really execute queries where, slow, how to make it fast, and I understand that the cause of the high and low, so that more will help you to better to optimize the query SQL statements.
This chapter starts from “why the query speed is so slow”, so that you can clearly know the query may be slow in which links, so that you will help you better optimize the query, do know, better than others.
Where is slow?
The real measure of query speed is response time. ** If you think of a query as a task, it is made up of a series of subtasks, each of which consumes a certain amount of time. If you want to optimize the query, and actually optimize the subtasks, eliminate some of the subtasks, reduce the number of times the subtasks are executed, or make the subtasks run faster.
What are the subtasks that take the most time to execute a query in MySQL? This requires some tools or methods (such as execution plans) to parse the query to locate where the discovery is slow.
In general, the life cycle of a query can be viewed roughly in order: ** from client to server, where it is parsed, an execution plan is generated, executed, and results are returned to the client. ** Among them, “execution” can be considered as the most important stage in the whole life cycle, which includes a large number of calls to retrieve data to the storage engine and data processing after the call, including sorting, grouping, etc.
At the completion of these tasks, the query need to spend time in different places at different stages of, including network, CPU, generate statistics and execution plan, lock waits, such as operation, especially the call to retrieve data from the underlying storage engine operation, these calls need to be in memory, CPU operation, also may produce a large number of context switches, and system calls.
In each of these operations, a significant amount of time is consumed, and there may be unnecessary additional operations, some of which may be repeated many times, some of which may be performed slowly, and so on. This is where queries can really be slow, and the purpose of optimizing queries is to reduce and eliminate the time spent on these operations.
Through the above analysis, we have an overall understanding of the process of query, can clearly know where the query may have problems, ultimately lead to the whole query is very slow, for the actual query optimization to provide direction.
In other words, query optimization can be done from two perspectives:
- Reduce the number of subqueries
- Reduce extra, repetitive operations
A common reason for poor query performance is that too much data is accessed. When the amount of data is small, the query speed is fine, but once the amount of data is up, the query speed will change dramatically, making people crazy and the experience is terrible. For query optimization, you can perform the following operations:
- Whether unnecessary data is queried
- Whether additional records were scanned
2. Whether unnecessary data is queried
Many times in a real query, the data that is actually needed is queried, and then the excess data is discarded by the application. This is an additional overhead for MySQL and also consumes CPU and memory resources of the application server.
Some typical cases are as follows:
1. Query unnecessary records
This is a common mistake, and often leads to the misconception that MySQL returns only the required data, when in fact MySQL returns the entire result set first.
Developers routinely use SELECT statements to query a large number of results, and then apply the query or front end presentation layer to retrieve the first N rows of data, for example, to query 100 records on a news site, but only to display the first 10 on the page.
The most efficient solution is to query as many records as you need, usually at the end of the queryLIMIT
, that is, paging query.
2. Return all columns when multiple tables are associated
If you want to query all the actors who appear in Academy Dinosaur, do not do this:
select * fromt actor a
inner join film_actor fa.actorId = a.actorId
inner join film f f.filmId = fa.filmId
where fa.title = 'Academy Dinosaur';
Copy the code
This will return all the columns of the three tables, but the actual requirement is to query the actor information, the correct format should be:
select a.* fromt actor a
inner join film_actor fa.actorId = a.actorId
inner join film f f.filmId = fa.filmId
where fa.title = 'Academy Dinosaur';
Copy the code
3. Always query all columns
Every time I seeselect *
Do you really need to return all columns?
In most cases, no. Select * results in a full table scan, which makes it impossible for the optimizer to do optimizations like index scans, and too many columns can cause additional I/O, memory, and CPU consumption on the server. Even if you really need to query all columns, you should list all columns one by one rather than *.
4. Repeatedly query the same data
If you’re not careful, it’s easy to make the mistake of executing the same query over and over again and then returning exactly the same data each time.
For example, if you need to query the URL of the user’s profile picture in the user’s comment section, the user may repeatedly query this data when making multiple comments. It is better to cache this data for the first query and then fetch it directly from the cache for subsequent use.
3. Whether additional records are scanned
Once you have determined that the query queries only the data you need, you should then look to see if too much data is being scanned during the query. For MySQL, the simplest three metrics to measure query overhead are as follows:
- The response time
- Number of rows scanned
- The number of rows returned
No metric is a complete measure of the cost of a query, but they can give you a rough idea of how much data needs to be accessed to perform queries internally, and can give you a rough idea of how queries are actually running. All three of these metrics are recorded in the MySQL slow log, so checking the slow log is the way to find a query that scans too many rows.
Slow query: records statements whose response time exceeds the threshold (long_query_time, 10s by default) in the MySQL database. Slow queries are recorded in slow logs. Slow query can be enabled with the slow_query_long variable, which is disabled by default, and slow logs can be logged to the table slow_log or to a file for review and analysis.
1. Response time
Response time is the sum of two parts: service time and queue time. Service time is how long it actually takes the database to process the query. Queuing time is the time that the server is waiting for some resource without actually executing a query, whether it is waiting for I/O operations, row locks, and so on.
There is no consistent pattern or formula for response times under different types of application pressures. Many factors such as storage engine locks (table locks, row locks), high concurrency resource contention, hardware responses, and so on can affect response time, so response time can be either the result of a problem or the cause of a problem, depending on the case.
When you see the response time for a query, you first need to ask yourself whether the response time is a reasonable value.
2. Number of scanned rows and number of returned rows
When analyzing a query, it is very helpful to look at the number of rows scanned by the query and, on top of that, to analyze whether additional records have been scanned.
This metric may not be perfect for spotting bad queries, because not all rows have the same access cost. Short rows are much faster to access, and rows in memory are much faster to access than rows on disk.
** Ideally, the number of rows scanned and returned should be the same. For example, when doing an associative query, the ratio of rows scanned to rows returned is usually very small, usually between 1:1 and 10:1, although sometimes this value can be very large.
3. Number of scanned rows and access type
When evaluating the cost of a query, consider the cost of finding a row of data from a table. MySQL has several ways of accessing it to find and return a row of results. These accesses may require many rows to return a single result, and some may return results without scanning.
Executing the planEXPLAIN
In the statementtype
Columns reflect the access type. There are many types of access, ranging from full table scan to index scan, range scan, unique index, constant index, etc. So these things right here, the speed goes from slow to fast, and the number of rows scanned goes from high to low.
If the query cannot find a suitable access type, the best solution is usually to add a suitable index, which is the problem we discussed earlier. Now you can see why indexes are so important for query optimization. Indexes allow MySQL to find the records it needs in the most efficient way with the least number of rows scanned.
If you find that a query scans a large amount of data but returns only a few rows, you can usually try the following techniques to optimize it:
- An index override scan is used to place all required columns into the index so that the storage engine can return results without going back to the table to retrieve the corresponding row.
- Optimize the table structure. For example, use separate summary tables to complete queries.
- Rewrite complex queries so that the MySQL optimizer can execute the query in a more optimized manner.