It's the SELECT COUNT statement, which can be abused by an interviewer

Database query I believe that many people are not unfamiliar, all often some people ridicule programmers is CRUD commissioner, the so-called CRUD refers to the increase, deletion, change and check database.

Query operation is the most frequently used in the operation of adding, deleting, changing and checking database. In all the query operations, the statistical operation is often used.

There is a function for counting rows in a database, whether MySQL or Oracle, and that is COUNT.

However, this commonly used COUNT function, but hidden a lot of secrets, especially in the interview, not careful will be abused. If you don’t believe me, try answering the following questions:

1. How many uses is COUNT?

What is the difference between COUNT(*) and COUNT(*)?

3. What’s the difference between COUNT(1) and COUNT(*)?

4. Which is more efficient between COUNT(1) and COUNT(*)?

5. Why alibaba Java Development Manual suggests using COUNT(*)

MySQL MyISAM optimizes COUNT(*)

MySQL InnoDB engine optimizes COUNT(*)

8. What is the key premise for MySQL to optimize COUNT(*)?

SELECT COUNT(*) where COUNT(*);

COUNT(*), COUNT(1), and COUNT(field name).

If you can answer all of the above questions correctly, then you really know the COUNT function. If there is any knowledge that you do not know, then this article can help you to answer the question.

To know the COUNT

The COUNT function is described in detail on the MySQL website:

A brief translation:

COUNT(expr) returns the number of rows whose expr value is not NULL. The result is a BIGINT value.

2. If no record is matched in the query result, 0 is returned

3. It is worth noting, however, that the COUNT(*) COUNT contains the number of rows whose value is NULL.

That is, the following table records

create table #bla(id int,id2 int)
insert #bla values(null,null)
insert #bla values(1,null)
insert #bla values(null,1)
insert #bla values(1,null)
insert #bla values(null,1)
insert #bla values(1,null)
insert #bla values(null,null)
Copy the code

Use count(*),count(id),count(id2) to query the result as follows:

select count(*),count(id),count(id2)
from #bla
results 7 3 2
Copy the code

In addition to COUNT(id) and COUNT(*), you can use COUNT(constant), such as COUNT(1), to COUNT rows. What is the difference between these three SQL statements? Which is more efficient? Why is it mandatory in the Alibaba Java Development Manual not to use COUNT(column name) or COUNT(constant) instead of COUNT(*)?

The difference between COUNT(column name), COUNT(constant), and COUNT(*)

COUNT(expr) specifies the number of rows whose expr is not NULL. COUNT(expr) specifies the number of rows whose expr is not NULL. COUNT(constant) specifies the number of rows whose expr is not NULL.

In the column name, constant, and * conditions, constant is a fixed value and must not be NULL. Select * from row where column names are NULL.

So,COUNT (constant) 和 COUNT(*)Represents the number of rows directly queried for a qualified database table. whileCOUNT (column name)Represents the number of rows where the value of the eligible column is not NULL.

COUNT(*) is standard row COUNT syntax compared to COUNT(constant) and COUNT(column name). Because it is standard syntax, MySQL database has optimized it a lot.

SQL92, is an ANSI/ISO standard for databases. It defines a language (SQL) and the behavior of the database (transactions, isolation levels, and so on).

The optimization of COUNT (*)

As mentioned earlier, COUNT(*) is the standard row COUNT syntax defined by SQL92, so MySQL database has optimized it a lot. So, what exactly did you do?

The presentation here differentiates between different execution engines. The most common execution engines in MySQL are InnoDB and MyISAM.

There are many differences between MyISAM and InnoDB. One key difference is related to COUNT(*), which is that MyISAM does not support transactions. The locks in MyISAM are table-level locks. InnoDB supports transactions and row-level locking.

Is because the lock MyISAM table level lock, so the same table operations require a serial of above, so, do a simple optimization MyISAM, is that it can record total number of rows of tables separately, if from a table using the COUNT (*) query, can be directly returns the value of the record is ok, of course, The premise is that you can’t have a WHERE condition.

MyISAM is able to record the total number of rows in a table for use in COUNT(*) queries because MyISAM databases are table locked and do not have concurrent database row changes, so the query is accurate.

For InnoDB, however, this caching is not possible, because InnoDB supports transactions, most of which are row-level locks, so it is possible that the number of rows in the table can be changed concurrently, and the total number of rows recorded by the cache will not be accurate.

However, InnoDB does make some optimizations for COUNT(*) statements.

In InnoDB, when using COUNT(*) to query the number of rows, it is inevitable to scan tables, so you can work hard to optimize the efficiency of the scan process.

Since MySQL 8.0.13, InnoDB’s SELECT COUNT(*) FROM tbl_name statement has been optimized for table scanning. Ensure that the query statement does not contain conditions such as WHERE or GROUP BY.

As we know, COUNT(*) is only used to COUNT the total number of rows, so he doesn’t care about the specific value he finds. Therefore, he can save a lot of time if he can choose a lower cost index in the process of scanning the table.

As we know, indexes in InnoDB are divided into clustered index (primary key index) and non-clustered index (non-primary key index). The leaf node of the clustered index stores the whole row record, while the leaf node of the non-clustered index stores the primary key value of the row record.

Therefore, non-clustered indexes are much smaller than clustered indexes, so MySQL preferentially selects the smallest non-clustered indexes to scan the table. Therefore, when we build the table, it is necessary to create a non-primary key index in addition to the primary key index.

MySQL database optimizations for COUNT(*) do not include WHERE and GROUP BY conditions.

COUNT (*) and COUNT (1)

COUNT(*), let’s look at COUNT(1), and there’s a lot of debate online about whether there’s a difference between the two.

Some say COUNT(*) is converted to COUNT(1) when executed, so COUNT(1) has fewer conversion steps, so it is faster.

COUNT(*) is faster because MySQL is specially optimized for COUNT(*).

So which one is true? Check out the MySQL documentation to see what it says:

InnoDB handles SELECT COUNT(*) and SELECT COUNT(1) operations in the same way. There is no performance difference.

Same way, no performance difference. So, MySQL optimizations are exactly the same for COUNT(1) and COUNT(*), there is no faster than the other!

So since COUNT(*) is the same as COUNT(1), which one do I recommend?

COUNT(*)! Is recommended. Because this is SQL92 definition of standard row count syntax, and this article is only based on MySQL analysis, about Oracle this problem, is also controversial.

COUNT (field)

Finally, we have not mentioned the COUNT(field), his query is relatively simple, it is a full table scan, and determine whether the specified field value is NULL, not NULL, summation.

COUNT(field) has one more step than COUNT(*) to determine whether the queried field is NULL, so its performance is slower than COUNT(*).

conclusion

This article introduces the use of the COUNT function, which is mainly used to COUNT table rows. The main uses are COUNT(*), COUNT(field), and COUNT(1).

Since COUNT(*) is a standard row COUNT syntax defined in SQL92, MySQL has made many optimizations for it. MyISAM will directly record the total number of rows for COUNT(*) query, while InnoDB will select the smallest index when scanning tables to reduce costs. Of course, all of these optimizations assume that there are no conditional queries for WHERE and group.

In InnoDB, COUNT(*) and COUNT(1) are implemented in the same way, and the efficiency is the same, but COUNT(field) requires non-null judgment, so the efficiency is lower.

Because COUNT(*) is a standard row COUNT syntax defined in SQL92 and is efficient, use COUNT(*) directly to query the number of rows in the table!

Resources: dev.mysql.com/doc/refman/… Geek Time — MySQL Tutorial 45

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

It’s the SELECT COUNT statement, which can be abused by an interviewer

To know the COUNT

The difference between COUNT(column name), COUNT(constant), and COUNT(*)

The optimization of COUNT (*)

COUNT (*) and COUNT (1)

COUNT (field)

conclusion

It’s the SELECT COUNT statement, which can be abused by an interviewer

To know the COUNT

The difference between COUNT(column name), COUNT(constant), and COUNT(*)

The optimization of COUNT (*)

COUNT (*) and COUNT (1)

COUNT (field)

conclusion

Related Posts

What is JIT? How do you optimize?

Still copying objects with BeanUtils? MapStruct is king! (4)

Use of IK word dividers in Elasticsearch