When optimizing problematic queries, we need to change the way we get query results — but that doesn’t mean we get the same result set from MySQL. Sometimes we can convert a query to a better performance query that gets the same results. However, we also need to consider rewriting the query to get different results, as this can improve development efficiency. You can also modify the application code to achieve the same effect. This article will cover tips on how to rewrite queries.

Complex query and step query

An important query design topic is whether it is better to decompose complex queries into simple ones. Traditional database design emphasizes solving a lot of work with as few queries as possible. In the past, this approach would have been better. This is due to the higher cost of network communication and the overhead of considering query parsers and optimizers.

However, this advice does not apply very well to MySQL, due to the efficient way MySQL handles connection establishment and disconnection, and its quick response to simple queries. Today’s Internet speeds are also much faster than before. Depending on the server version, MySQL can run more than 100,000 simple queries per second on a normal machine and 2,000 query communications per second on a gigabit network. Therefore, doing distributed queries is not as bad as it used to be.

The connection response is still slow compared to the number of data rows traversed per second. In memory data, this time reaches the millisecond level. Of course, it’s still a good idea to use as many queries as possible. However, sometimes we can improve performance by splitting complex queries into several simple ones. We’ll show you some examples.

Using too many queries is a common mistake in programming. For example, some applications perform 10 separate queries to fetch 10 rows of data (using a loop to fetch one row of data) that could have been done by querying 10 rows of data in one query. Therefore, this is not an advocate of splitting queries every time, but rather on a case-by-case basis.

Shard query statement

Another approach is to split the query and recombine it. Reduce the number of rows affected at a time by splitting large queries into smaller queries.

Cleaning old data is a classic example. Periodic data cleaning requires the removal of large amounts of data, which locks up large numbers of rows for long periods of time. This operation also generates transaction logs, consumes a lot of resources, and blocks queries with small data volumes that should not be interrupted. After the DELETE statement is shred, using a medium-size query can significantly improve performance and reduce the additional delay caused by repeated queries when the query is repeated. For example, the following delete statement:

DELETE FROM messages WHERE created < DATE_SUB(NOW(), INTERVAL 3 MONTH);
Copy the code

The pseudo-code applied looks like this:

rows_affected = 0
do {
  rows_affected = do_query (
  "DELETE FROM messages WHERE created < DATE_SUB(NOW(), INTERVAL 3 MONTH) LIMIT 10000")}while rows_affected > 0
Copy the code

Deleting 10,000 rows at a time is a large enough task to improve the efficiency of each query. A short enough task will reduce the impact on the server (the transaction storage engine will benefit). It is also a good idea to insert some sleep time in DELETE statements to spread the load over time and shorten the duration of lock holding.

Disassemble federated queries

Many high-performance applications disassemble federated queries. You can do this by splitting the federated query into multiple single-table queries and then combining the results in your application. Such as:

SELECT * FROM tag
	JOIN tag_post ON tag_post.tag_id=tag.id
  JOIN post ON tag_post.post_id=post.id
WHERE tag.tag='mysql';
Copy the code

You can split the federated query as follows.

SELECT * FROM tag WHERE tag='mysql';
SELECT * FROM tag_post WHERE tag_id=1234;
SELECT * FROM post WHERE post.id IN (123.456.567.9098.8904);
Copy the code

Note: tag_id=1234 and post.id IN (123, 456, 567, 9098, 8904) are based on the results of the previous query. Why would you do that? At first glance it may seem unnecessary — it just increases the number of queries. However, this rebuilt query can bring the following advantages:

  • A caching mechanism would be more effective. Many applications directly use ORM to map data tables. In this example, if the object with the mysql tag is already cached, the first query is skipped. If the posts id is 123,567 or 9908 IN the cache, you can remove them from the IN list. With this strategy, query caching benefits accordingly. If only one of the tables changes frequently, dismantling the federated query can reduce the number of cache invalidations.
  • Executing these queries individually can sometimes reduce the chance of locking tables.
  • This makes it easy to scale the database and put the tables on different machines.
  • The query itself can be optimized. IN this example, it might be better for MySQL to sort row ids and retrieve data rows by using the IN query instead of the union query.
  • Redundant row accesses can be reduced. Using this approach means that row fetching is done only once, whereas it is possible to fetch the same data repeatedly in a federated query. For this reason, it is also possible to reduce the overall network load and memory footprint.
  • As an extension, it is possible to replace the nested loop of MySQL federated queries by artificially running hash federated queries, which may also be more efficient.

Ultimately, it can be seen that by dismantling federated queries, cache reuse can be improved, multi-server distributed data schemes can be simplified, and IN queries can be used IN large data tables instead of federated queries or multiple repeated queries of the same table.