When faced with an under-optimized or poorly performing SQL statement, the common thought is to refactor the SQL statement so that the result set of the query remains the same as before, and hopefully improve SQL performance. When refactoring SQL, there are usually some methods and techniques for reference. This article will show you how to use these techniques to reconstruct SQL.

First, decompose SQL

Sometimes for a complex SQL, the first thing we think about is whether we need to split a complex SQL into multiple simple SQL to achieve the same business processing results.

In the past, people always emphasized the need to the database layer as much as possible to complete the work, and it is easy to understand in some old products, projects, often see many super, super long SQL statements, complex logic thought to do it many times interactions, the network bandwidth and network communication between application and database is a costly thing. And now, both in terms of bandwidth and latency, the network is much faster than it used to be, and multiple interactions are not too much of a problem. Even on a generic server, you can run over 100,000 queries per second, so running multiple small queries is not a big deal now.

Decomposition of complex SQL, the performance improvement is particularly significant in the face of super complex SQL statements. Therefore, when faced with super complex SQL statements and performance problems, it is recommended to break down into small queries for optimization.

However, in application design, if a query is competent and does not cause performance problems, it can be done with a slightly more complex SQL, if the rigid force to break up into multiple small queries is not wise.

** In many high-performance application systems today, it is highly recommended to use a single table operation, and then associate the query results of a single table in the application to meet the query requirements of complex business. ** Why write it separately when one SQL will do the job and you have to execute the SQL query multiple times in your application and then associate the result sets?

At first glance, this may seem complicated and unhelpful, as one query becomes multiple. In fact, this decomposition has the following advantages:

  • Make caching more efficient. In the application, it is very convenient to cache the result object corresponding to the query result of a single table, so that the data can be obtained directly from the result object at any time later.
  • By decomposing the query, a single query can reduce table lock contention.
  • Associated in the application layer, it is easier to split the database, easier to achieve high performance and scalability.
  • The efficiency of single-table query is higher than that of multi-table complex query.
  • Reduce the search for redundant records. Association at the application layer means that only one query is required for a record application, while associated query in the database may require repeated access to some data records. From this point of view, such refactoring may also reduce network and memory consumption.

Query sharding

Sometimes for a large query, that is, a query with a large result set, we need to adopt the idea of “divide and conquer”, dividing the large query into smaller queries, each of which performs exactly the same function, but only completes a small portion of the query, and only returns a small portion of the query results each time. In plain English, the filter scope of the WHERE condition is shred, and only one part of the data is queried at a time, similar to paging queries.

This is a small overhead, both for the SQL query itself and for the upper level business. The most typical case is paging query, currently all kinds of frameworks have a good support, such as: MyBatis, only in the actual use of a little attention can be avoided.

Three, implementation plan

Using the execution plan EXPLAIN keyword lets us know how MySQL executes SQL statements, which can help us analyze performance bottlenecks in our query statements or table structures. The EXPLAIN query results will also tell us how index primary keys are utilized and how tables are searched or sorted…. And so on.

The syntax format is:

EXPLAIN SELECT statement;

Executing the plan results will guide us to further refactor SQL statements, such as adding indexes, reordering indexes, avoiding certain functions, and so on.

The implementation plan will be explained in detail in a separate section.

4. Abide by principles

Some SQL performance problems can be largely avoided by developing good habits and paying more attention when writing SQL. Summary is as follows:

  • Always set an ID primary key for each table.

  • Avoid using SELECT *.

  • Index the search field.

  • Use columns of the corresponding type when joining a table and index them.

  • Use NOT NULL whenever possible.

  • The smaller the column, the faster it will be.

  • LIMIT 1 is used when there is only one row of data.

  • Operator optimization, as far as possible not to use bad operators index, the purpose is to avoid full table scan.

    1) In and not in should be used cautiously. Try to use between instead of in and not exists instead of not in

    2) Is null and is not null with caution

    3)! Use the = or <> operators at all costs, otherwise it will cause the engine to abandon the index for a full table scan.

Use the query cache

When many of the same queries are executed multiple times, the query results are placed in a cache so that subsequent queries can access the cached results without doing anything.

The MySQL query cache holds the complete results returned by the query. When a query hits the cache, MySQL like returns the result, skipping parsing, optimization, and truncation.

This is one of the most effective ways to improve query performance, and it is handled by the MySQL engine. Generally, MySQL does not enable query caching by default, so you need to manually enable it.

The query cache is completely transparent to the application. The application does not need to care whether MySQL returns the results from a query or from an actual execution. In fact, the results are exactly the same. In other words, query caching does not require any syntax.

As today’s general-purpose servers become more powerful, query caching is found to be a factor in server scalability. It can become a single point of contention for resources across the entire server and, in the case of multi-core servers, can lead to server death. Therefore, query caching should be disabled by default most of the time. If query caching is very useful, you can configure a small cache space of tens of megabytes. (When choosing, there are trade-offs.)

You can configure the following parameters for the query cache:

  • query_cache_type

    Whether to enable query cache. The value can be OFF, ON, or DEMAND. DEMAND indicates that only statements explicitly written into sql_cache are placed in the query cache.

  • query_cache_size

    Total memory space, in bytes, used by the query cache. The value must be an integer multiple of 1024; otherwise, the actual allocated data will be different from the specified size.

  • query_cache_min_res_unit

    The smallest unit of memory allocated in the query cache.

  • query_cache_limit

    Maximum cached query result. If the query result is greater than this value, it will not be cached. Since the query cache attempts to cache data at the time the data is generated, MySQL does not know if the query results exceed the limit until all the results are returned.

Query caching will be covered in detail in a separate section.