COUNT(*); COUNT(1); COUNT(primary key); COUNT(field); There’s a lot of debate about which way MYSQL executes faster, and I didn’t know which way was faster or who was right (laughing and crying). Fortunately, I recently devoted a section of my MySQL column on Geek Time to discuss this issue, and I was able to clear up the confusion for a long time.
MyISAM engine records the total number of rows of a table in the disk, which is very efficient when querying (if the where condition is added, it cannot be returned directly from the disk). For InnoDB, due to multi-version concurrency control (MVCC), even if InnoDB tables are queried at the same time, it is not certain how many rows should be returned. For example, if there are 10000 rows in table T:
moment | The session A | Session B | Session C |
---|---|---|---|
T1 | begin; | ||
T2 | select count(*) from t; | ||
T3 | Insert into t; | ||
T4 | begin; | ||
T5 | Insert into t; | ||
T6 | select count(*) from t; (Return 10000) | select count(*) from t; (return 10002); | select count(*) from t; (Return 10001) |
Session A starts the transaction on T1 and gets the consistency view. At the repeatable read level, the data read at any time in the transaction is the same. The update of other transactions does not affect session A, so the result of count(*) is 10000. A new statement has been inserted and committed in session C before T4 (an update statement is executed separately, InnoDB will start a transaction by itself and commit as soon as the statement is executed). Session B inserts a new piece of data at T5, and the result of count(*) in T6 query is 10002(the SESSION C INSERT statement was committed at T4 BEGIN, so you can see this update in session B’s transaction). Since session B’s transaction has not committed at T6, session C cannot see the update of session B, so session C’s count(*) at T6 is 10001.
COUNT is an aggregate function that evaluates each row in the returned result set. If the COUNT argument is not NULL, it increments by 1, otherwise it does not, and returns the cumulative value. Let’s take a look at the efficiency of each COUNT version:
- COUNT(primary key ID) InnoDB traverses the table and returns the primary key value of each row to the Server layer of MySQL. Since the primary key cannot be NULL, the Server layer accumulates the primary key by row and returns the cumulative value to the client.
- COUNT(1) traverses the table without a value, and the Server layer adds the number “1” to each row returned. COUNT(1) is faster than COUNT(primary key) because no value is required, reducing data transfer.
- COUNT(field) traverses the table, row by row reading the field value from the record to the Server layer, the Server layer determines that the value is not NULL and then accumulates.
COUNT(*)
MySQL is optimized to find the smallest index tree in the table. InnoDB normal index tree is much smaller than primary key indexCOUNT(*)
Which tree is going to be the same,count(*)
MySQL does not take the record value whencount(*)
Also definitely not NULL, Server layer directly by row accumulation.
So this version of COUNT from low to high is:
< COUNT(1) ≈ COUNT(*)
It is recommended that you use count(*) to get the number of rows.
Note also that many people will record the number of rows in the table in Redis for sales purposes, but this does not guarantee that the number of rows in the table in Redis is exactly the same as the number of rows in the table in MySQL. This is because two different storage systems do not support distributed transactions, so it is impossible to get an accurate consistent view. If you want to store rows in a separate MySQL table for efficiency, then it is better to store rows in a separate MySQL table so that you can’t get a consistent view.
For a more detailed analysis of MySQL, we recommend to focus on MySQL practice 45