Today someone told me that count(1) is faster than count(*) in MySQL. You have to have some fun with him.

Disclaimer: The following discussion is based on the InnoDB storage engine, MyISAM, which I will cover separately at the end of this article due to special circumstances.

Conclusion first: There is not much difference between the two properties.

Practice 1.

I prepared a table with 100W pieces of data. The structure of the table is as follows:

CREATE TABLE `user` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `username` varchar(255) DEFAULT NULL,
  `address` varchar(255) DEFAULT NULL,
  `password` varchar(255) DEFAULT NULL.PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Copy the code

As you can see, there is a primary key index.

Let’s count the number of entries in the table in two ways, as follows:

As you can see, the execution efficiency of the two SQL statements is actually about the same, 0.14s for both.

Let’s look at two more stats:

Id is the primary key, username and address are common fields.

As you can see, using ID for statistics also has an advantage. In Songgo, the test data sample is relatively small, so the effect is not obvious. Friends can increase the test data, so the difference will be more obvious.

So what is the cause of this difference? Let’s make a brief analysis.

2. The explain analysis

Let’s use Explain to look at the different execution plans for SQL:

As can be seen, the execution plan of the first three statistical methods is the same, and the last two are the same.

Let me compare the different items in Explain with you:

  • Type: the type value of the first three indexes is index, indicating that the whole index is scanned. The type value of the latter two is all, indicating a full table scan, that is, no index is used.
  • Key: this indicates that MySQL determines which index to use to optimize access to this table. PRIMARY indicates that the PRIMARY key index is used, and NULL indicates that no index is used.
  • Key_len: This represents the length of the key used by MySQL. Since our primary key type is INT and non-empty, the value is 4.
  • Extra: The Using index in this means that the optimizer only needs to access the index to get the required data (no need to return to the table).

In fact, we can generally see from explain that the execution efficiency of the first three statistical methods is higher (because indexes are used), while the latter two statistical efficiency is relatively lower (there is no index and full table scan is required).

Only the above analysis is not enough, let’s analyze it from the perspective of principle.

3. Principle analysis

3.1 Primary Key Indexes and Common Indexes

Before we begin our analysis of principles, I want to take you through B+ trees, which are important for understanding what we are going to do next.

InnoDB’s primary key index is stored as a B+ tree. InnoDB’s primary key index is stored as a B+ tree. InnoDB’s primary key index is stored as a B+ tree.

You can see that in the primary key index, the leaf node holds the data for each row.

In the normal index, the leaf node stores the primary key value. When we use the normal index to search for data, we first find the primary key in the leaf node, and then use the primary key to search for data in the primary key index, which is equivalent to doing two searches, which is commonly referred to as the back table operation.

3.2 Principle Analysis

I don’t know if you’ve noticed that when we were studying MySQL, the count function was classified as an aggregate function, avg, sum, etc., and the count function was grouped with these, so it’s also an aggregate function.

Since it is an aggregate function, it is necessary to judge the result set returned line by line, which involves the question, what is the result returned? Let’s look at them separately:

Select count(1) from user; For this query, the InnoDB engine will find the smallest index tree to traverse (not necessarily the primary key index), but will not read the data. Instead, it will read a leaf node, return 1, and add up the results.

Select count(id) from user; For this query, the InnoDB engine will traverse the entire primary key index and read the ID and return it, but since the ID is the primary key and is on the leaf of the B+ tree, there is no random IO involved and the performance is OK.

Select count(username) from user; For this query, InnoDB will scan the entire table, read the username field in each row and return it. If username is set to not NULL, then count the number of username directly. If username is not set to not null, check whether username is null.

Select count(*) from user; When MySQL sees count(*), it knows that you want to count the total number of records, so it will find the smallest index tree to traverse and count the total number of records.

Because the leaf nodes of a primary key index (clustered index) are data, whereas the leaf nodes of a normal index are primary key values, the index tree of a normal index is smaller. However, in the case above, we only have primary key indexes, so we end up using primary key indexes.

Now, if I change the table above and add an index to username field as well, then we will look at explain select count(*) from user; Implementation plan of

Select * from username where username = ‘username’;

From the above description, we can see that the first query has the highest performance, the second (because the ID needs to be read and returned), the third (because the full table scan is required), and the fourth is close to the first.

4. MyISAM?

MyISAM select count(*) from user; The operation is very fast because MyISAM stores the number of rows in the table directly to disk and reads them when needed, so it is very fast.

The MyISAM engine does this mainly because it does not support transactions, so its statistics are actually quite easy to add a row of records and a row of records.

Our usual InnoDB can’t do that! Why? Because InnoDB supports transactions! Are introduced in order to support transactions, InnoDB MVCC multi version concurrency control, so there may be dirty when data is read read, illusion and not repeatable read, specific can refer to www.bilibili.com/video/BV14L… Video.

Therefore, InnoDB needs to take out each row of data and determine whether the row is visible to the current session. If so, InnoDB counts the row, otherwise it does not count the row.

Of course, MVCC in MySQL is actually a very big topic, songko will tell you more about MVCC when he has time.

Okay, now do you guys get it? If you have any questions, please leave a comment.