In MySQL, we usually use limit for pagination, but once the table grows large, the further back you page, the slower the interface responds.

This article discusses why limit pagination with a large offset is slow and how to optimize it. To simulate the situation, the table below is created and the listed SQL statements are executed.

Scenario simulation

Table creation statement

The user table has a simple structure: an id, a sex column, and name columns. To make differences in SQL execution time more obvious, there are nine name columns.

CREATE TABLE `user`  (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `sex` tinyint(4) NULL DEFAULT NULL COMMENT 'Sex 0-male 1-female',
  `name1` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL COMMENT 'name',
  `name2` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL COMMENT 'name',
  `name3` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL COMMENT 'name',
  `name4` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL COMMENT 'name',
  `name5` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL COMMENT 'name',
  `name6` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL COMMENT 'name',
  `name7` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL COMMENT 'name',
  `name8` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL COMMENT 'name',
  `name9` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL COMMENT 'name',
  PRIMARY KEY (`id`) USING BTREE,
  INDEX `sex`(`sex`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 9000001 CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;

Populating the data

A stored procedure is created to populate the table with a total of 9 million rows. After the procedure finishes, an SQL statement is executed to modify the sex column.

Ps: this procedure takes a long time to execute; my run took 617.284 seconds.

-- Stored procedure that inserts 9,000,000 rows into the user table
DELIMITER ;;
CREATE DEFINER=`root`@`localhost` PROCEDURE `data`()
begin
  declare i int;
  set i = 1;
  while (i <= 9000000) do
    insert into user values(i, 0, i, i, i, i, i, i, i, i, i);
    set i = i + 1;
  end while;
end;;
DELIMITER ;

-- Populate the table (this is the long-running step)
call data();

-- Set the sex of every user with an even id to 1 (female)
update user set sex = 1 where id % 2 = 0;

SQL and execution time

SQL | Execution time
select * from user where sex = 1 limit 100, 10; | 0.005000 s
select * from user where sex = 1 limit 1000, 10; | 0.007000 s
select * from user where sex = 1 limit 10000, 10; | 0.016000 s
select * from user where sex = 1 limit 100000, 10; | 0.169000 s
select * from user where sex = 1 limit 1000000, 10; | 5.892000 s
select * from user where sex = 1 limit 10000000, 10; | 33.465000 s

As you can see, the larger the offset of limit, the longer the execution time.
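
If you want to see how MySQL plans one of these queries, you can run EXPLAIN on it. This is only a verification sketch: the exact plan (which index is chosen, the estimated row count) depends on the MySQL version and the optimizer's statistics, and at very large offsets the optimizer may even prefer a full table scan over the sex index.

-- Inspect the plan of the slowest query; the output varies by MySQL version and statistics
explain select * from user where sex = 1 limit 10000000, 10;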

Cause analysis

Let’s first examine the execution of this SQL statement, using the first row in the table above as an example.

select * from user where sex = 1 limit 100, 10;

MySQL executes this statement roughly as follows:

  • The condition is on sex, and sex has a secondary index, so MySQL walks the sex index tree to find entries with sex = 1. A secondary index entry stores only the indexed column and the primary key id.
  • The query selects all columns, so for every matching id MySQL must go back to the primary key (clustered) index tree to read the other columns; this extra lookup is the "back to table" operation.
  • Because of limit 100, 10, MySQL repeats the two steps above until it has collected 100 + 10 rows, then discards the first 100 and returns the remaining 10.

To summarize, in the execution process above, the causes of the long execution time with a large limit offset are:

  • Selecting all columns forces a back-to-table lookup on the clustered index for every scanned row
  • limit a, b scans the first a + b matching rows and discards the first a

Putting these two causes together: MySQL spends most of the query time on back-to-table lookups, and almost all of those lookups are for rows that never appear in the result set, which is why the query gets slower and slower as the offset grows.
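
A quick way to confirm that the back-to-table lookups dominate the cost is to compare the same offset with and without them. The sex index already stores the primary key id, so a query that selects only id can be answered from the index alone; the comparison below is a sketch, not a measurement from the original test run.

-- every scanned row needs a back-to-table lookup to fetch the name columns
select * from user where sex = 1 limit 1000000, 10;

-- selecting only id can be served entirely from the sex index (no back-to-table lookups),
-- so at the same offset it should be noticeably faster
select id from user where sex = 1 limit 1000000, 10;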

Optimization scheme

Covering index

Since the wasted back-to-table lookups are the main cause of the slowness, the optimization should focus on reducing them. For limit a, b, if we first fetch only the ids of rows a + 1 through a + b, and then go back to the table for the other columns of just those b rows, the number of back-to-table operations drops to b and the query becomes much faster.

This is where covering indexes come in. A covering index is one from which the query can get all the data it needs, so there is no need to go back to the primary key (clustered) index for the other columns, which can significantly improve performance.

Based on this idea, the optimization is to query only the primary key ids first (which the sex index covers), and then fetch the other columns by joining on those ids. The optimized SQL and execution times are shown in the following table.

Optimized SQL | Execution time
select * from user a join (select id from user where sex = 1 limit 100, 10) b on a.id=b.id; | 0.000000 s
select * from user a join (select id from user where sex = 1 limit 1000, 10) b on a.id=b.id; | 0.000000 s
select * from user a join (select id from user where sex = 1 limit 10000, 10) b on a.id=b.id; | 0.002000 s
select * from user a join (select id from user where sex = 1 limit 100000, 10) b on a.id=b.id; | 0.015000 s
select * from user a join (select id from user where sex = 1 limit 1000000, 10) b on a.id=b.id; | 0.151000 s
select * from user a join (select id from user where sex = 1 limit 10000000, 10) b on a.id=b.id; | 1.161000 s

Sure enough, execution efficiency improved significantly.
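
To confirm that the inner query really is covered by the sex index, you can EXPLAIN the optimized statement. As a rough expectation (the exact output depends on the MySQL version), the derived-table row should show the sex index being used with "Using index" in the Extra column, meaning no back-to-table lookups happen until the final join on the ten ids.

-- Check that the subquery only touches the sex index (look for "Using index" in Extra)
explain select * from user a
join (select id from user where sex = 1 limit 1000000, 10) b on a.id = b.id;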

Conditional filtering

Of course, another approach, though it has its limitations, is conditional filtering based on an ordered column.

For example, with the user table above, suppose I want to use limit pagination to fetch rows 1,000,001 to 1,000,010. I can write the SQL like this:

select * from user where sex = 1 and id > (select id from user where sex = 1 limit 1000000, 1) limit 10;

However, this optimization only works when the primary key ids are ordered. Given an ordered column, you can also use another field, such as a creation time, instead of the primary key id, provided that field is indexed.
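
In application code this idea usually takes the form of keyset (cursor) pagination: instead of computing a large offset, the caller remembers the largest id returned by the previous page and uses it as the lower bound for the next one. The sketch below illustrates that pattern; :last_id is a placeholder supplied by the application and is not part of the original test.

-- First page: take 10 rows and remember the last id that was returned
select * from user where sex = 1 order by id limit 10;

-- Next page: start after the remembered id instead of skipping rows with a large offset
select * from user where sex = 1 and id > :last_id order by id limit 10;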

Overall, conditional filtering has limitations as a way to optimize limit queries, and the covering-index approach is generally recommended.

Summary

This article analyzed why limit pagination with a large offset executes slowly, presented the corresponding optimizations, and recommends the covering-index approach for reducing the execution time of large-offset limit queries.

I hope it helps.

Ps: this article appeared on planeswalker23’s personal blog