Preface
When a table that needs to be queried has tens of thousands of records, querying all of the results at once is slow, and it only gets slower as the data volume grows, so paging queries are required. There are many methods and optimization points for database paging queries; below are some of the approaches I know.
Preparation
To test some of the optimizations listed below, an existing table was used. It is described as follows:
- Table name: orders_history
- Description: the order history table of a certain business
- Main fields: unsigned int id, tinyint(4) int type
- Fields: the table contains 37 fields and no large-data columns such as text; the largest is varchar(500). The id field is indexed and auto-incrementing.
- Data volume: 5709294
- MySQL version: 5.7.16
It is not easy to find an offline test table with millions of rows; if you need to run the tests yourself, you can write a shell script or SQL to insert test data, for example with a stored procedure like the sketch below.
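A minimal sketch for generating test data (the simplified table definition, column choices, and procedure name here are assumptions for illustration, not the real 37-column table):

drop table if exists orders_history_test;
create table orders_history_test (
    id int unsigned not null auto_increment,  -- indexed, auto-incrementing, as in the real table
    type tinyint(4) not null default 0,       -- the filter column used in the tests below
    remark varchar(500) default null,         -- varchar(500) is the largest type in the real table
    primary key (id)
) engine=InnoDB;

delimiter $$
create procedure insert_test_data(in total int)
begin
    declare i int default 0;
    while i < total do
        insert into orders_history_test (type, remark)
        values (floor(rand() * 10), concat('test row ', i));
        set i = i + 1;
    end while;
end$$
delimiter ;

-- for example, generate 100000 rows (call again or raise the argument for more)
call insert_test_data(100000);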
The execution environment for all of the following SQL statements is unchanged. The baseline test results:
select count(*) from orders_history;
Result: 5709294
The three query times are as follows:
8903 ms, 8323 ms, 8401 ms
General paging query
General paging queries can be implemented with a simple limit clause. The limit clause is declared as follows:
SELECT * FROM table LIMIT [offset,] rows | rows OFFSET offset
The LIMIT clause can be used to specify the number of records returned by the SELECT statement. Note the following points:
- The first parameter specifies the offset of the first row to return
- The second parameter specifies the maximum number of rows to return
- If only one argument is given, it indicates the maximum number of rows to return (see the sketch after this list)
- MySQL does not accept -1 as the second argument; to retrieve all rows from an offset to the end of the result set, pass a sufficiently large number as the second parameter
- The offset of the initial row is 0 (not 1)
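A quick sketch of these forms (standard MySQL limit syntax):

-- two-argument form: skip 10000 rows, return the next 100
select * from orders_history limit 10000, 100;

-- equivalent offset form
select * from orders_history limit 100 offset 10000;

-- single-argument form: the first 100 rows (offset 0)
select * from orders_history limit 100;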
Here is an example application:
select * from orders_history where type=8 limit 1000,10;
This statement will query the 10 rows after the 1000th row from the table orders_history, i.e. rows 1001 through 1010.
By default, records are returned in primary key order (usually id), so the result above is equivalent to:
select * from orders_history where type=8 order by id limit 1000,10;
The three query times are as follows:
3040 ms, 3063 ms, 3018 ms
For this type of query, the following tests the impact of the number of records returned on query time:
select * from orders_history where type=8 limit 10000,1;
select * from orders_history where type=8 limit 10000,10;
select * from orders_history where type=8 limit 10000,100;
select * from orders_history where type=8 limit 10000,1000;
select * from orders_history where type=8 limit 10000,10000;
The three query times are as follows:
Query 1 record: 3072 ms, 3092 ms, 3002 ms
Query 10 records: 3081 ms, 3077 ms, 3032 ms
Query 100 records: 3118 ms, 3200 ms, 3128 ms
Query 1000 records: 3412 ms, 3468 ms, 3394 ms
Query 10000 records: 3749 ms, 3802 ms, 3696 ms
In addition, I ran a dozen or so more queries. From the query times, it can basically be concluded that when fewer than 100 records are returned there is no real difference in query time; as the number of records returned grows beyond that, the query time grows as well.
Tests against the query offset:
select * from orders_history where type=8 limit 100,100;
select * from orders_history where type=8 limit 1000,100;
select * from orders_history where type=8 limit 10000,100;
select * from orders_history where type=8 limit 100000,100;
select * from orders_history where type=8 limit 1000000,100;
The three query times are as follows:
Offset 100: 25 ms, 24 ms, 24 ms
Offset 1000: 78 ms, 76 ms, 77 ms
Offset 10000: 3092 ms, 3212 ms, 3128 ms
Offset 100000: 3878 ms, 3812 ms, 3798 ms
Offset 1000000: 14608 ms, 14062 ms, 14700 ms
As the query offset grows, especially once it exceeds 100,000, the query time rises sharply.
This is because this kind of paging query scans from the first record in the table up to the offset, so the deeper the page, the slower the query; and the more rows returned, the slower the overall query.
Use subquery optimization
This approach first locates the id at the offset position and then queries from that id onwards. It is suitable when the id is monotonically increasing.
select * from orders_history where type=8 limit 100000,1;
select id from orders_history where type=8 limit 100000,1;
select * from orders_history where type=8 and
id>=(select id from orders_history where type=8 limit 100000,1) limit 100;
select * from orders_history where type=8 limit 100000,100;
The query times of the four statements are as follows:
Statement 1: 3674 ms
Statement 2: 1315 ms
Statement 3: 1327 ms
Statement 4: 3710 ms
Note for the above query:
- Comparing statement 1 with statement 2: using select id instead of select * is about 3 times faster
- Comparing statement 2 with statement 3: the speed difference is only tens of milliseconds
- Comparing statement 3 with statement 4: statement 3 is about 3 times faster, thanks to the select id speedup
This method will be several times faster than the original general query method.
Use id restriction optimization
This method assumes that the table's id is continuously increasing, so we can calculate the id range of a page from the page number and page size (for example, with 100 rows per page, page n covers ids (n-1)*100+1 through n*100) and query it with id between ... and ...:
select * from orders_history where type=2
and id between 1000000 and 1000100 limit 100;
Query times: 15 ms, 12 ms, 9 ms
This query method can greatly improve query speed; it basically completes within tens of milliseconds. The limitation is that it can only be used when the id range is clearly known. However, an auto-increment id field is generally added when a table is created, which makes this style of paging query much easier.
There’s another way to write it:
select * from orders_history where id >= 1000001 limit 100;
Of course, you can also query using in. This approach is often used when several tables are associated, paging by the set of ids queried from another table:
select * from orders_history where id in
(select order_id from trade_2 where goods = 'pen')
limit 100;
Note this use of in: some versions of MySQL do not support using limit inside an in subquery.
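If your MySQL version rejects that combination, a common workaround is to wrap the limited subquery in a derived table, so that the limit no longer sits directly inside the in subquery (sketched with the same tables as above):

select * from orders_history where id in (
    select t.order_id from (
        select order_id from trade_2 where goods = 'pen' limit 100
    ) as t
);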
Use temporary table optimization
Strictly speaking, this approach is no longer query optimization on a single statement; it is mentioned here in passing.
For the id-restriction optimization above, the ids must increase sequentially. In some scenarios, however, such as when using a history table or when data is missing and ids have gaps, you can consider using a temporary table to record the paging ids and then use those ids for in queries. This can greatly improve the speed of traditional paging queries, especially when the data volume is in the tens of millions; a rough sketch follows.
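A rough sketch of this idea (the temporary table name is an assumption, and how you populate and refresh it would depend on the business):

-- pre-locate the ids for one page and record them in a temporary table
create temporary table tmp_page_ids (
    id int unsigned not null,
    primary key (id)
);

insert into tmp_page_ids (id)
select id from orders_history where type = 8 limit 100000, 100;

-- then fetch the full rows via the pre-located ids
select * from orders_history
where id in (select id from tmp_page_ids);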
A note on data table ids
In general, when creating a table in a database, an auto-increment id field is added to each table to make subsequent queries easier.
If the data volume is very large, as with an order database, the data is generally split across multiple databases and tables. In that case it is not recommended to use the database id as the unique business identifier; instead, use a distributed, high-concurrency unique id generator and store that unique identifier in an additional field of the table, as in the sketch below.
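A minimal DDL sketch of that layout (all names here are hypothetical, and the generator itself, e.g. a snowflake-style service, is outside the scope of this article):

create table orders_history_shard_0 (
    id bigint unsigned not null auto_increment,  -- per-table auto-increment, still usable for paging
    order_sn bigint unsigned not null,           -- globally unique id from the distributed generator
    type tinyint(4) not null default 0,
    primary key (id),
    unique key uk_order_sn (order_sn)
) engine=InnoDB;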
First using a range query to locate the id (or index), and then using the index to locate the data, can speed up queries several times: that is, first select id, then select *.
Finally
Welcome to follow my WeChat public account [Programmer Chasing Wind], where articles and collected materials will be posted.