Why count(*) is so slow?

takeaway

In the development of a table will be used to count the number of rows, such as a trading system, the boss will let you generate a report every day, these statistics information is indispensable in the SQL count function.
But as more records are recorded, the speed of the query becomes slower and slower. Why is this so? What is the process inside Mysql?
Today’s article will take a closer look at how the count function is handled within Mysql.

How count is implemented

The count function is implemented differently by different storage engines in Mysql.
MyISAM keeps the total number of rows in a table on disk, so count(*) returns that number, which is efficient (no WHERE query).
The InnoDB engine does not store the total directly on disk. Instead, it reads the data line by line when the count(*) function is executed and then aggregates the total.

InnoDB supports transactions. Transactions are isolated. If you store totals, how can you ensure the consistency of totals between transactions? Look at the picture below:

A transactionandTransaction BIn thecount(*)The results of InnoDB’s execution vary, so the number of rows returned by the InnoDB engine per transaction is indeterminate, and can only be read line by line to determine the total.

inInnoDBFor how to improvecount(*)Query efficiency, there are a variety of solutions online, here mainly introduce three, and analyze the feasibility.

The show table status command can quickly query the number of rows in each table in the database, but can it really replace count(*)?
The answer is no. The reason for this is simple: the value calculated by this command is an “estimate” and therefore inaccurate, with the official documentation stating that the error is somewhere between 40% and 50%.
So this method passes directly, so why use it if it’s not accurate.

This method is also the easiest to think of, add a row +1, delete a row -1, and the cache system is fast to read, both simple and convenient why not?
The cache system and Mysql are two different systems, such as Redis and Mysql, which are typically compared. The most difficult thing about the two systems is that they cannot guarantee data consistency under high concurrency. Let’s take a look at the following two figures:

With the above two graphs, either the redis count +1 or the insert into user first results in logically inconsistent data. In the first graph, the redis count is low, and in the second graph, the count is correct but the inserted row is not queried.
In a concurrent system, it is impossible to precisely control the execution time of different threads because of the sequence of operations shown in the figure, so we can say that even if Redis works properly, the count is still logically imprecise.

Through the analysis of the cache system, we learned that the use of cache can not ensure the consistency of data in logic, so we thought of directly using the database to save, with the support of “transaction”, it will ensure the consistency of data.
How do you use it? Very simple, just store the count in a table (table_name,total).
As for the logic executed, just change the redis count +1 in the cache system to total field +1, as shown below:

Count () is an aggregate function that evaluates the returned result set line by line, incrementing the total if the count function argument is not NULL, otherwise it is not incremented. Finally returns the cumulative value.
Count can be used in several ways: count(*), count(field), count(1), count(primary key ID). So what’s the difference? Of course, “provided there is no WHERE condition statement.”
Count (ID) : The InnoDB engine will traverse the entire table and fetch the ID of each row and return it to the server layer. The server layer gets the ID, determines that it cannot be empty, and adds it up by row.
Count (1) : InnoDB engine traverses the entire table, but does not take a value. For each row returned, the server layer adds the number 1 to it, judging that it cannot be empty, and adds it up.
Count (field) : if the “field” is defined as not null, read the field from the record line by line.
If the field definition is allowed to be NULL, then the value must be extracted to determine if it is not null.
Count (*) : does not fetch all fields, but is optimized and does not take a value. Count (*) must not be null.
So the conclusion is simple: “In order of efficiency, count(field)
“Note” : if count(id) is not an index, why is the query efficiency similar to other indexes? Chen here to explain, although the index, but still need to scan line by line to count the total.

conclusion

MyISAM tables, while count(*) is fast, do not support transactions;
The show table status command returns quickly but inaccurately.
InnoDB directly count(*) traverses the entire table (without where conditions), which is accurate but can cause performance problems.
Although the storage count of the cache system is simple and efficient, it cannot guarantee the consistency of data.
Database save count is very simple, but also to ensure data consistency, recommended use.