
Hello everyone, I'm Xiaohei. August and September were hectic and I put my sharing on hold; things have eased up recently, so I'm catching up.

Background

If you keep an eye on the slow query log of your production database, I'm sure COUNT() statements are among the most frequent offenders. Today let's look at how to optimize COUNT() statements.

The basic principle of the COUNT() function

Let’s take a look at the basics of COUNT().

COUNT() is essentially a built-in MySQL function that takes the name of a column in a table as its input. There are two special forms: COUNT(1) and COUNT(*).

If you pass a column name, COUNT(col) counts the number of rows where that column is not NULL. With COUNT(1) or COUNT(*), MySQL simply counts the number of rows in the table.
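
To make this concrete, here is a small sketch; the table t and its data are made up purely for illustration:

-- hypothetical table, only to show how NULL values are handled
CREATE TABLE t (id INT PRIMARY KEY, nickname VARCHAR(20) NULL);
INSERT INTO t VALUES (1, 'tom'), (2, NULL), (3, 'jerry');

SELECT COUNT(*) FROM t;         -- 3: every row is counted
SELECT COUNT(1) FROM t;         -- 3: same result as COUNT(*)
SELECT COUNT(nickname) FROM t;  -- 2: the row where nickname IS NULL is skipped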

Beyond that, the details differ between storage engines.

MyISAM, for example, stores the row count of each table directly, so COUNT(1)/COUNT(*) returns instantly (as long as there is no WHERE condition). InnoDB, in order to support transactions, does not store a table's row count, so executing COUNT(1)/COUNT(*) often requires a full table scan.
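
A minimal sketch of that difference, using two hypothetical tables that differ only in their storage engine:

-- hypothetical tables, identical except for the storage engine
CREATE TABLE t_myisam (id INT PRIMARY KEY, v INT) ENGINE = MyISAM;
CREATE TABLE t_innodb (id INT PRIMARY KEY, v INT) ENGINE = InnoDB;

INSERT INTO t_myisam VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO t_innodb SELECT * FROM t_myisam;

SELECT COUNT(*) FROM t_myisam;  -- MyISAM returns its stored row count directly, no scan
SELECT COUNT(*) FROM t_innodb;  -- InnoDB has to walk the table/index to count the rows

-- once a WHERE condition is involved, even MyISAM has to scan
SELECT COUNT(*) FROM t_myisam WHERE v > 10;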

In daily work we usually use the InnoDB engine, which does not keep a row count, and our COUNT queries often carry a WHERE condition as well. When the data volume is large, many rows have to be scanned, so COUNT(1)/COUNT(*) naturally becomes slow.

COUNT(*), COUNT(1), COUNT(primary key ID), COUNT(col)

So how should you choose among these different forms of COUNT()? (The following applies to the InnoDB engine only.)

  • COUNT(col) counts the number of rows where col is not NULL; the column value has to be read out and checked row by row
  • COUNT(primary key ID) works like COUNT(col), but since the primary key is defined NOT NULL, MySQL optimizes away the NULL check
  • COUNT(1) and COUNT(*) behave the same: the rows are still iterated, but no column value needs to be pulled out of the storage engine layer, so they perform better

COUNT(col) < COUNT(primary key ID) < COUNT(1) = COUNT(*)

Scenarios with low precision requirements

So how do you optimize? It depends on the scenario. Some scenarios do not need high precision. For example, when Baidu shows roughly how many results a search term has, users do not need the exact row count; a ballpark figure is enough.

For these scenarios, you can start with EXPLAIN. The rows field (the third column from the right) is MySQL's estimate of the table's row count, and MySQL itself uses this value when choosing an execution plan.

In fact, if writes are mostly sequential and random deletes are rare, this estimate is fairly accurate. To be safe, you can also set up a scheduled task against the table, running ANALYZE TABLE t so that MySQL re-samples its statistics.
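
As a minimal sketch, the statement itself looks like this (here run against the city table used in the example below; scheduling it would be handled by cron or a similar mechanism):

-- re-sample the statistics so the rows estimate stays reasonably accurate
ANALYZE TABLE city;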

For example, get the estimated row count of the city table with EXPLAIN:
mysql> EXPLAIN SELECT * FROM city;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
|  1 | SIMPLE      | city  | NULL       | ALL  | NULL          | NULL | NULL    | NULL |  600 |   100.00 | NULL  |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
1 row in set, 1 warning (0.01 sec)

Scenarios with high precision requirements

What about scenarios where an exact count is required?

For these cases the count can be maintained in Redis: whenever a row is inserted or deleted, the Redis counter is updated synchronously. The drawback is that if the MySQL transaction is rolled back, the Redis count ends up inconsistent with the actual number of rows in MySQL.

You can also keep the count in MySQL itself by designing a counter table: when rows are inserted or deleted, the counter table is updated within the same transaction. The drawback is that you still have to watch out for consistency issues such as phantom reads under concurrent access.
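
Here is a minimal sketch of the counter-table idea; the orders table, the row_counts table and their columns are made up for illustration:

-- hypothetical business table and counter table
CREATE TABLE orders (
  id      BIGINT AUTO_INCREMENT PRIMARY KEY,
  user_id BIGINT NOT NULL,
  amount  DECIMAL(10, 2) NOT NULL
) ENGINE = InnoDB;

CREATE TABLE row_counts (
  table_name VARCHAR(64) PRIMARY KEY,
  cnt        BIGINT NOT NULL
) ENGINE = InnoDB;

INSERT INTO row_counts VALUES ('orders', 0);

-- insert a business row and bump the counter in the same transaction,
-- so a rollback reverts both changes together
START TRANSACTION;
INSERT INTO orders (user_id, amount) VALUES (42, 99.90);
UPDATE row_counts SET cnt = cnt + 1 WHERE table_name = 'orders';
COMMIT;

-- reading the exact count is now a single-row lookup
SELECT cnt FROM row_counts WHERE table_name = 'orders';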

Conclusion

To sum up:

  1. For performance, COUNT(col) < COUNT(primary key ID) < COUNT(1) = COUNT(*). In daily work, COUNT(*) is recommended; it is the standard row-counting syntax defined in SQL-92.

  2. If you want to optimize COUNT(*), the first step is to identify the scenario. Of performance, precision, and cost, you can generally only have two of the three.
    • Want performance at low cost and can live without precision: SHOW TABLE STATUS (see the sketch after this list)
    • Want performance and precision and don't mind the cost: design your own counting mechanism (pay attention to consistency under concurrency and to transaction rollback)
    • Want precision at low cost and can accept the performance hit: then say no more, just honestly use COUNT(*)
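
As a sketch of the SHOW TABLE STATUS route from the first bullet, reusing the city table from the earlier EXPLAIN example; for InnoDB tables the Rows column is an estimate, not an exact count:

-- Rows in the output is an estimate for InnoDB tables
SHOW TABLE STATUS LIKE 'city';

-- the same estimate is also exposed through information_schema
SELECT TABLE_ROWS FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'city';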

Thanks for reading.

Ref

  1. Geek Time, "14 | Why is count(*) so slow, and what should I do?"

Time.geekbang.org/column/arti…

  2. "Even the COUNT statement in MySQL can get you grilled by an interviewer?"

Mp.weixin.qq.com/s/IOHvtel2K…