Hello, I’m Xiao Lin.

When we make statistics on the records in a data table, we usually use the count function to make statistics, but the count function passes in many kinds of parameters, such as count(1), count(*), count(field) and so on.

What kind of efficiency is best? Is count(*) the least efficient?

I used to think that count(*) was the least efficient, because cognitively selete * from t would read all the fields in the table, so everything with * would think that it would read all the fields in the table. There were many blogs on the Internet at that time also said that.

However, when I delve into the principle of the count function, I am smacked in the face!

No more talking, let’s go!

Which count performs best?

Which count performs best?

Let me jump to the conclusion:

To understand this, we need to dig into the principle of count, which is based on the common InnoDB storage engine.

What is count()?

Count () is an aggregate function that takes the name of a field or any other expression. It counts the number of records that match the query condition and whose parameter is not NULL.

Suppose the count() function takes a field name as follows:

select count(name) from t_order;
Copy the code

Select * from table T_ORDER where name is not NULL; That is, if the value of the name field in a record is NULL, it will not be counted.

Suppose the count() function takes the number 1 as follows:

select count(1) from t_order;
Copy the code

This statement counts the number of records in the T_ORDER table where 1 is not NULL.

1 is just a number, it’s never NULL, so this statement is actually counting how many records there are in the T_ORDER table.

How is count(primary key field) executed?

The server layer of MySQL maintains a variable named count when counting the number of records through the count function.

The server layer circulates a record to InnoDB. If count is not NULL, the variable count is incremented by 1 until all records that match the query have been read. Finally, the value of the count variable is sent to the client.

InnoDB uses B+ tree to keep records. According to the type of index, it is divided into clustered index and secondary index. The difference between them is that the leaf node of the clustered index stores the actual data, while the leaf node of the secondary index stores the primary key, not the actual data.

Use the following statement as an example:

Select count(id) from t_order;Copy the code

If there is only primary key index in the table and no secondary index, InnoDB loops through the cluster index and returns the read record to the server layer. Then InnoDB reads the id value in the record and determines whether it is NULL. If not, InnoDB increses the count variable by 1.

However, if there are secondary indexes in the table, InnoDB loops over objects that are not clustered indexes, but secondary indexes.

The picture

This is because the same number of secondary index records can take up less storage space than the clustered index records, so the secondary index tree is smaller than the clustered index tree, so the I/O cost of traversing the secondary index is smaller than the I/O cost of traversing the clustered index, so the “optimizer” preferentially selects the secondary index.

How is count(1) executed?

Use the following statement as an example:

select count(1) from t_order;
Copy the code

If there is only a primary key index in the table, but no secondary index.

So InnoDB loops through the cluster index (primary key index) and returns the read record to the Server layer, but does not read the value of any field in the record, because the count function takes 1, not a field, so there is no need to read the field value in the record. The parameter 1 is obviously not NULL, so the server layer increses the count variable every time it reads a record from InnoDB.

As you can see, count(1) is one step less than count(the primary key field) and does not need to read the field value in the record, so it is usually said that count(1) is a little more efficient than count(the primary key field).

However, if there is a secondary index in the table, InnoDB loops over objects that are secondary indexed.

How does count(*) execute?

When you see the * character, do you think you are reading all the fields in the record?

This is true for selete *, but not for count(*).

Count (*) is actually equal to count(0), which means that when you use count(*), MySQL converts the * argument to the 0 argument.

Therefore, the count(*) execution is basically the same as the count(1) execution, with no performance difference.

In the official MySQL 5.7 manual there is a sentence like this:

InnoDB handles SELECT COUNT(*) and SELECT COUNT(1) operations in the same way. There is no performance difference.

InnoDB handles SELECT COUNT in the same way.*) and SELECT COUNT (1) operation, no performance difference.

MySQL also optimizes count(*) and count(1). If there are multiple secondary indexes, the optimizer will use the smallest secondary index of key_len to scan.

Primary key indexes are used for statistics only when there are no secondary indexes.

How does count(field) execute?

Count (field) is the least efficient compared to count(1), count(*), and count(primary key field).

Use the following statement as an example:

Select count(name) from t_order;Copy the code

For this query, a full table scan is used to count, so it is inefficient to perform.

The picture

summary

Count (1), count(*), count(primary key field) if there is a secondary index in the table, the optimizer selects the secondary index for the scan.

Therefore, if you want to execute count(1), count(*), count(primary key field), try to create a secondary index on the table, so that the optimizer will automatically scan the secondary index with the smallest key_len, which is more efficient than scanning the primary key index.

Also, don’t use count(field) to count the number of records, because it is the least efficient, use full table scan method to count. If you want to count the number of records in the table where the field is not NULL, it is recommended to create a secondary index for the field.

Why do I count by traversal?

You might wonder, why does the count function need to iterate over the number of records?

The previous examples are all based on Innodb storage engine, but in MyISAM storage engine, the way to execute the count function is different, usually without any query condition count(*), MyISAM query speed is significantly faster than Innodb.

With MyISAM, only O(1) complexity is required to execute the count function. This is because each MyISAM table has a meta information that stores the row_count value, which is guaranteed by table-level locking. So reading row_count directly is the result of the count function.

InnoDB storage engine supports transactions, multiple queries at the same time, and due to multi-version concurrency control (MVCC), InnoDB tables “should return how many rows” is also uncertain, so it cannot maintain only one row_count variable like MyISAM.

For example, suppose the table T_ORDER has 100 records and now has two sessions in parallel with the following statements:

Query the total number of records in t_order at the last moment of session A and session B, and you can find that the displayed results are different. So, when using InnoDB storage engine, you need to scan tables to count specific records.

With the WHERE condition, MyISAM is no different from InnoDB in that both need to scan tables to count the number of records.

How do I optimize count(*)?

If you use count(*) constantly on a large table, it’s not good.

Select count(*) from t_order; select count(*) from t_order;

Is there any other better way to deal with the record statistics of large tables?

The first is approximation

If your business does not need to be precise about statistics, for example, the number of search results given by a search engine when searching for keywords is approximate.

At this point, we can use the show table status or explain command to estimate the table.

Executing the EXPLAIN command is efficient because it does not actually query. The rows field values in the figure below are the estimated values of the EXPLAIN command for the table T_ORDER records.

Second, extra tables hold counted values

If we want to get the exact total number of records in a table, we can store the count value in a separate count table.

When we insert a record into the table, we add + 1 to the count field in the count table. In other words, we need to maintain the counter table as we add and delete operations. Mp.weixin.qq.com/s?__biz=MzU…).