This is the 10th day of my participation in the August More Text Challenge
Innodb repeatable read isolation level: count(field), count(primary key ID), count(1), count(*)
Reason 1.
1.1 Why Are there Four counts
Count definition: Count () is an aggregate function that evaluates the returned result set line by line if the count function argument is not NULL, the cumulative value is incremented, otherwise it is not incremented. Finally returns the cumulative value.
Count (*), count(primary key ID), and count(1) all represent the total number of rows that return a result set that meets the criteria; Count (field) indicates the total number of rows in which the parameter “field” is not NULL.
From the definition, we can infer that count(*), count(primary key ID), and count(1) get the same result. Count (field) may not be NULL, so the result may be inconsistent with the other three.
I have four counts, but they correspond to the same function.
1.2 Count is sometimes slow
Many students have intuitive experience, when the table is large, the count speed is very slow, sometimes takes tens of seconds or even a few minutes, but MyISAM can quickly return the value.
The main reasons are as follows:
-
MyISAM saves the exact number of rows of the table, so this code is executed in the MyISAM storage engine by simply reading the number of saved rows.
-
Innodb does not store the number of rows, count is traversed across the table, row by row. The reason for doing this is that MVCC needs to determine which rows are visible and which rows are not visible to the current count statement. For details, see Innodb transaction isolation. .
2. The count process
The count operation involves the Server layer and storage engine layer of MySQL. Server is responsible for judging and counting, Innodb storage engine is responsible for fetching data sets.
Why do I need to brush dirty pages in Innodb? When reading data, it is necessary to read data from disk to memory. Therefore, count takes the following nodes:
-
Reading data from disk into memory, which causes many I/ OS, is a major time drain
-
To read data from memory into the returned data set. The engine’s return of a dataset involves parsing rows of data and copying field values
-
The Server layer nulls the data set for counting operations
From this, several aspects that affect performance can be analyzed:
-
The amount of data read into memory from disk
-
Whether a value needs to be fetched from memory
-
Whether NULL checks are required for the Server layer
3. Performance differences
Count (primary key ID), count(1), count(*).
For count(primary key ID), the InnoDB engine iterates through the table, fetching the ID value for each row and returning it to the server layer. The server layer gets the ID, determines that it cannot be empty, and adds it up by row.
For count(1), the InnoDB engine traverses the entire table without taking a value. The server layer adds a number “1” to each row returned, judging that it cannot be empty, and adds it up.
For count(field) :
-
If the “field” is defined as not null, read the column from the record line by line.
-
If the “field” definition is allowed to be NULL, then the value must be extracted to determine if it is not null.
Count (*) is an exception. It does not extract all fields, but is optimized for no value. Count (*) must not be null.
To summarize each operation:
operation | Whether the values | Whether the judge | note |
---|---|---|---|
count(*) | no | no | The fastest |
count(1) | no | is | |
Count (primary key id) | is | is | The smallest index tree is possible |
Count (field) | is | is | If there is no index on a field, only the primary key index can be selected |
Count (field)<=count(primary key ID)<count(1)≈count(*)
Count (field)<=count(primary key ID) because count(primary key ID) is likely to select the smallest non-clustered index, whereas count(field) can only select the primary key index if the “field” has no index, thus loading much more data into memory. The other two may choose the same index tree.
Of course, count(primary key ID) selects the minimum index tree only if there is no where condition and group by condition in the query statement.
Count (*), count(1), count(primary key ID), and count(primary key ID) can be optimized in the same way.
When you’re counting, you can just choose count(*).
4. The optimization
Count (*) is already the fastest solution, so how do you optimize count(*) performance if you don’t meet the criteria?
There are two options:
-
Create a minimum index so that count(*) computations select the minimum index to reduce I/O
-
Use transactions and count yourself. Create count tables and update count tables as data is added or deleted.
conclusion
You know a little bit more about count, but there are a lot of little things to think about.
data
-
Innodb count(*),count(field),count(primary key ID),count(1)
-
Count (1); count(*); count(*)
-
MySQL > alter table COUNT(*)
-
Optimization of the count(*) function in MySQL’s InnoDB storage engine
-
MySQL InnoDB count() function
The last
If you like my article, you can follow my public account (Programmer Malatang)
My personal blog is shidawuhen.github. IO /
Review of previous articles:
-
Design patterns
-
recruitment
-
thinking
-
storage
-
The algorithm series
-
Reading notes
-
Small tools
-
architecture
-
network
-
The Go