“This is the 21st day of my participation in the Gwen Challenge in November. See details: The Last Gwen Challenge in 2021”

In development, we often do some interface of the query function, basically based data query will adopt the paging mode, every time the query is a query in the article the total number, according to the query to the total number of article, and then calculate the need to display the total number of pages, render to the browser allows users to paging query data. The query statements are generally as follows:

SELECT count(*) FROM table_name;
Copy the code

Such query statements, in the system at the beginning of the online data volume is not very much, so the query performance is relatively fast, when the data reaches millions of pieces of data, the query time may be a few seconds left longer, so that a query page will appear data for a long time will affect the user experience. Next, we will thoroughly understand why MySQL query total is so slow, which will help us optimize the function in the future.

The implementation of counting

The most commonly used MySQL engines are MyISAM and InnoDB. These two engines have different implementation methods for count(*) totals:

  • MyISAM engine: in the data increase or delete, each will be the total number of rows in the table on disk, in each execution of count(*) query total time, will directly return the total number, so that it will soon return the total number;
  • InnoDB engine: when querying the total number, it will accumulate the number of rows, so that each query will count the total number, resulting in a very long time to query the total number;

Why can’t InnoDB just record a total like MyISAM and return it every time InnoDB queries a total?

InnoDB internally records a total number of entries, but this number is approximate and inaccurate. When it needs to know the exact number, InnoDB will not return this number to the client, but go to the field to count the total. The reason for this:

  • InnoDB supports transactions. In order to ensure data isolation, it adopts MVCC multi-version concurrency control. In different sessions, according to the Rodo log and the transaction isolation level of each session, it needs to determine which data should be visible at present. The total number of queries for different session levels may be inconsistent.

When InnoDB calculates the total number, it searches the total number based on the index tree. In the constructed index, the write index tree is relatively small and only calculates the total number. InnoDB will select a relatively small index number to traverse, which can reduce the amount of scanned data, improve the efficiency of query, and get the accurate total number.

Count (*), count(1), count(primary key), and count(common field) efficiency

Count (*), count(1), count(primary key), count(1), count(primary key), count(primary key) ,

Count () is an aggregate function, so when counting the total, the result set is returned, line by line. If count is not NULL, the total is incremented by 1; if it is NULL, it is not.

  • Count (primary key) and count(primary key) : the logic is the same, the whole table will be traversed, each row of the primary key or field is fetched, and then the server will get the data, the server will determine whether the data is empty: If a field is allowed to be NULL, the value must be fetched to determine if it is not NULL. Count (primary key) > count(field)

  • Count (1) : As with the above traversal logic, the server does not take the field value, but directly returns the number 1. For each row returned by the server, it is determined that it cannot be empty, so it directly accumulates by row. Compared with count(ID), it does not take the specific field value during the query, which will reduce part of the time. So count(1) is more efficient than count(primary key)

  • Count () : this is a special case for MySQL. For the syntax of count(), the server layer does not determine whether the count() is empty, so that the query speed is slightly higher than count(1).

Count (*) ≈ count(1) > count(primary key) > count(field)

How do I avoid counting

In my work project, our entire team for queries are not calculated the total pages, but to the user query, the default queries only the first page, not telling the user total current conditions, if the query returns a number is less than the current page number, is considered to be the last page, the user can not click next time, if so, The user can only click the next page one by one, and the next page can only query 10 pages at most. If the user clicks more pages, the paging will become more and the query efficiency will be very low.

Before we do this project, we have too much on the user’s query page number page, click on the survey statistics, the user clicks to the low possibility of 10 pages back, if you want to go to see more data asynchronous export function, it will speed up the user interface of the query, also can reduce the query server stress, improve the performance of the database.