1. Foreword

MySQL pagination queries come up constantly in Java interviews, so it is worth practicing them hands-on; after all, practice is the best teacher. Many readers lack a massive data set for testing, but MySQL actually provides an official test database.

2. Simulated data

There are two ways to get test data. If you only need data for testing, the official test database is recommended. If the official data set is not large enough, we can generate our own.

1. Import the official test library

Download the official database file, or get it from GitHub.

The test library contains six tables.

Enter the employees_db directory and run the import command:

mysql -uroot -proot -t < employees.sql

Some environments may report an error:

ERROR 1193 (HY000) at line 38: Unknown system variable 'storage_engine'

Connect to MySQL and check the engine variables. The default engine is set correctly, so this is not a local environment problem: the standalone storage_engine system variable was removed in newer MySQL versions, which is why the script's set storage_engine line fails.

mysql> show variables like '%engine%';
+----------------------------------+--------+
| Variable_name                    | Value  |
+----------------------------------+--------+
| default_storage_engine           | InnoDB |
| default_tmp_storage_engine       | InnoDB |
| disabled_storage_engines         |        |
| internal_tmp_disk_storage_engine | InnoDB |
+----------------------------------+--------+
4 rows in set (0.01 sec)

Modify the employees.sql script accordingly:

   set default_storage_engine = InnoDB;
-- set storage_engine = MyISAM;
-- set storage_engine = Falcon;
-- set storage_engine = PBXT;
-- set storage_engine = Maria;

select CONCAT('storage engine: ', @@default_storage_engine) as INFO;

With that change, the import succeeds:

➜ employees_db mysql -uroot -proot -t < employees.sql
mysql: [Warning] Using a password on the command line interface can be insecure.
+-----------------------------+
| INFO                        |
+-----------------------------+
| CREATING DATABASE STRUCTURE |
+-----------------------------+
+------------------------+
| INFO                   |
+------------------------+
| storage engine: InnoDB |
+------------------------+
+---------------------+
| INFO                |
+---------------------+
| LOADING departments |
+---------------------+
+-------------------+
| INFO              |
+-------------------+
| LOADING employees |
+-------------------+
+------------------+
| INFO             |
+------------------+
| LOADING dept_emp |
+------------------+
+----------------------+
| INFO                 |
+----------------------+
| LOADING dept_manager |
+----------------------+
+----------------+
| INFO           |
+----------------+
| LOADING titles |
+----------------+
+------------------+
| INFO             |
+------------------+
| LOADING salaries |
+------------------+

Verify the result (after applying the same configuration change to the validation script):

➜ employees_db mysql -uroot -proot -t < test_employees_sha.sql
mysql: [Warning] Using a password on the command line interface can be insecure.
+----------------------+
| INFO                 |
+----------------------+
| TESTING INSTALLATION |
+----------------------+
+--------------+------------------+------------------------------------------+
| table_name   | expected_records | expected_crc                             |
+--------------+------------------+------------------------------------------+
| departments  |                9 | 4b315afa0e35ca6649df897b958345bcb3d2b764 |
| dept_emp     |           331603 | d95ab9fe07df0865f592574b3b33b9c741d9fd1b |
| dept_manager |               24 | 9687a7d6f93ca8847388a42a6d8d93982a841c6c |
| employees    |           300024 | 4d4aa689914d8fd41db7e45c2168e7dcb9697359 |
| salaries     |          2844047 | b5a1785c27d75e33a4173aaa22ccf41ebd7d4a9f |
| titles       |           443308 | d12d5f746b88f07e69b9e36675b6067abb01b60e |
+--------------+------------------+------------------------------------------+

We can see that dept_emp has about 330,000 rows.
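As a quick sanity check (a minimal sketch; the expected counts are taken from the validation output above), you can compare the row counts yourself:

use employees;
show tables;                     -- departments, dept_emp, dept_manager, employees, salaries, titles
select count(*) from dept_emp;   -- expect 331603
select count(*) from employees;  -- expect 300024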

2. Importing simulated data with a stored procedure

If we need more data than that, we can batch-insert it with a stored procedure.

Start by creating a table:

drop table if exists `user`;
create table `user`(
  `id` int unsigned auto_increment,
  `username` varchar(64) not null default '',
  `score` int(11) not null default 0,
    primary key(`id`)
)ENGINE = InnoDB;

Then create and run the stored procedure:

DROP PROCEDURE IF EXISTS batchInsert;

delimiter ?                  -- change the statement delimiter so ; can be used inside the body
create procedure batchInsert()
begin
    declare num int;
    set num = 1;
    while num <= 3000000 do  -- loop until 3 million rows are inserted
        insert into `user`(`username`, `score`) values (concat('user-', num), num);
        set num = num + 1;   -- increment the loop variable
    end while;
end ?                        -- end of the stored procedure body
delimiter ;                  -- restore the default delimiter

CALL batchInsert();          -- execute the stored procedure

In my test, inserting the 3 million rows took about 1046s; the originally planned 10 million rows would simply have taken too long.
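The procedure above commits every insert individually under autocommit, which is the main reason it is so slow. A common speed-up, sketched below under my own assumptions (hypothetical procedure name, batch size of 10000; not part of the original test), is to commit in batches:

DROP PROCEDURE IF EXISTS batchInsertFast;

delimiter ?
create procedure batchInsertFast()
begin
    declare num int default 1;
    start transaction;
    while num <= 3000000 do
        insert into `user`(`username`, `score`) values (concat('user-', num), num);
        if num % 10000 = 0 then  -- commit every 10000 rows instead of every row
            commit;
            start transaction;
        end if;
        set num = num + 1;
    end while;
    commit;                      -- commit the final partial batch
end ?
delimiter ;

CALL batchInsertFast();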

3. Reproducing and optimizing common MySQL paging query problems

We test against the user table created above, which holds 3 million rows.

1. Pre-check

Start by looking at the table structure and which indexes currently exist:

mysql> desc user;
+----------+------------------+------+-----+---------+----------------+
| Field    | Type             | Null | Key | Default | Extra          |
+----------+------------------+------+-----+---------+----------------+
| id       | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| username | varchar(30)      | NO   |     |         |                |
| score    | int(11)          | NO   |     | 0       |                |
+----------+------------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

mysql> show index from user;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| user  |          0 | PRIMARY  |            1 | id          | A         |     2991886 | NULL     | NULL   |      | BTREE      |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
1 row in set (0.00 sec)

You can see there is only the primary key index on id.


Next, check whether the query cache is enabled, so that caching does not skew the timings:

mysql> show variables like '%query_cache%';
+------------------------------+---------+
| Variable_name                | Value   |
+------------------------------+---------+
| have_query_cache             | YES     |
| query_cache_limit            | 1048576 |
| query_cache_min_res_unit     | 4096    |
| query_cache_size             | 1048576 |
| query_cache_type             | OFF     |
| query_cache_wlock_invalidate | OFF     |
+------------------------------+---------+
6 rows in set (0.00 sec)

mysql> show profiles;
Empty set, 1 warning (0.00 sec)

have_query_cache = YES together with query_cache_type = OFF means the query cache is supported but not enabled. show profiles returning an empty set means profiling is currently disabled.


Enable profiling:

mysql> SET profiling = 1;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> show profiles;
+----------+------------+-------------------+
| Query_ID | Duration   | Query             |
+----------+------------+-------------------+
|        1 | 0.00012300 | SET profiling = 1 |
+----------+------------+-------------------+
1 row in set, 1 warning (0.00 sec)

2. Paging without an index

Paging is usually written as order by + limit m,n. Let's test its performance:

select * from user order by score limit 0,10;        -- 10 rows in set (0.65 sec)
select * from user order by score limit 10000,10;    -- 10 rows in set (0.83 sec)
select * from user order by score limit 100000,10;   -- 10 rows in set (1.03 sec)
select * from user order by score limit 1000000,10;  -- 10 rows in set (1.14 sec)

Let's confirm whether an index is used here:

mysql> explain select * from user order by score limit 1000000,10;
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra          |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+----------------+
|  1 | SIMPLE      | user  | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 2991995 |   100.00 | Using filesort |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+----------------+
1 row in set, 1 warning (0.00 sec)

As you can see, no index is used: each query does a full table scan with a filesort, and at an offset of 1,000,000 it takes about 1.14s.
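As an aside not benchmarked in the original test: if paging by score were a real requirement, a secondary index on score would give the optimizer a sorted access path, though for select * at deep offsets it may still prefer the full scan because of the extra lookups back to the clustered index. A sketch:

-- hypothetical index, not part of the benchmark above
alter table `user` add index idx_score (`score`);
explain select * from user order by score limit 1000000,10;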

3. Paging on an index

select * from user order by id limit 10000,10;    -- 10 rows in set (0.00 sec)
select * from user order by id limit 1000000,10;  -- 10 rows in set (0.18 sec)
select * from user order by id limit 2000000,10;  -- 10 rows in set (0.35 sec)

These queries go through the primary key index, so they are comparatively fast; still, query time degrades noticeably as the offset grows, because limit m,n must walk past the first m index entries before returning anything.

Again, let's confirm the index is used:

mysql> explain select * from user order by id limit 2000000,10;
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows    | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------+
|  1 | SIMPLE      | user  | NULL       | index | NULL          | PRIMARY | 4       | NULL | 2000010 |   100.00 | NULL  |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------+
1 row in set, 1 warning (0.00 sec)

You can see a full index scan was used, reading 2,000,010 rows in total just to return 10.

4. Optimization

We use MySQL's query profiling tool to see how much time each stage of the statement takes. The time is overwhelmingly concentrated in the Sending data stage, which covers reading and returning the 2,000,010 scanned rows. But the client only needs 10 of them; can we arrange to return just those 10?

mysql> show profiles;
+----------+------------+-----------------------------------------------------------+
| Query_ID | Duration   | Query                                                     |
+----------+------------+-----------------------------------------------------------+
|        1 | 0.00012300 | SET profiling = 1                                         |
|        2 | 0.00009200 | SET profiling = 1                                         |
|        3 | 0.35689500 | select * from user order by id limit 2000000, 10          |
|        4 | 0.00023900 | explain select * from user order by id limit 2000000, 10  |
+----------+------------+-----------------------------------------------------------+
4 rows in set, 1 warning (0.00 sec)

mysql> show profile for query 3;
+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000071 |
| checking permissions | 0.000007 |
| Opening tables       | 0.000012 |
| init                 | 0.000017 |
| System lock          | 0.000008 |
| optimizing           | 0.000005 |
| statistics           | 0.000024 |
| preparing            | 0.000016 |
| Sorting result       | 0.000004 |
| executing            | 0.000003 |
| Sending data         | 0.356653 |
| end                  | 0.000013 |
| query end            | 0.000005 |
| closing tables       | 0.000008 |
| freeing items        | 0.000019 |
| cleaning up          | 0.000030 |
+----------------------+----------+
16 rows in set, 1 warning (0.00 sec)

The optimization scheme commonly suggested online: subquery + covering index.

mysql> select * from user where id > (select id from user order by id limit 2000000, 1) limit 10;
+---------+--------------+---------+
| id      | username     | score   |
+---------+--------------+---------+
| 2000002 | user-2000002 | 2000002 |
| 2000003 | user-2000003 | 2000003 |
| 2000004 | user-2000004 | 2000004 |
| 2000005 | user-2000005 | 2000005 |
| 2000006 | user-2000006 | 2000006 |
| 2000007 | user-2000007 | 2000007 |
| 2000008 | user-2000008 | 2000008 |
| 2000009 | user-2000009 | 2000009 |
| 2000010 | user-2000010 | 2000010 |
| 2000011 | user-2000011 | 2000011 |
+---------+--------------+---------+
10 rows in set (0.29 sec)

mysql> explain select * from user where id > (select id from user order by id limit 2000000, 1) limit 10;
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
|  1 | PRIMARY     | user  | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL | 1495997 |   100.00 | Using where |
|  2 | SUBQUERY    | user  | NULL       | index | NULL          | PRIMARY | 4       | NULL | 2000001 |   100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
2 rows in set, 1 warning (0.30 sec)

However, this barely improves query performance (0.29s vs 0.35s). So where is the problem? The execution plan shows the indexes are used exactly as we expected, yet rows is still very large. Let's examine the subquery on its own:

mysql> select id from user order by id limit 2000000, 1;
+---------+
| id      |
+---------+
| 2000001 |
+---------+
1 row in set (0.29 sec)

mysql> explain select id from user order by id limit 2000000, 1;
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
|  1 | SIMPLE      | user  | NULL       | index | NULL          | PRIMARY | 4       | NULL | 2000001 |   100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

Here you can see that even with a covering index, the subquery still takes about 0.3s. I believe this is simply normal index I/O time, but I could not find official benchmark figures to compare against, nor data on how long a single MySQL I/O takes.

With an int primary key, a single B+Tree page can hold on the order of 1000 keys. Assuming the root page stays in memory, the second level can then address about 1,000,000 keys, while our test row at offset 2,000,000 lies beyond that, so reaching it should theoretically take 2 I/Os. If 2 I/Os take 0.3s, 1 I/O should take about 0.15s. Let's query around offset 990,000, which should need only 1 I/O, and see whether that fits the hypothesis:

mysql> select id from user order by id limit 990000,1;
+--------+
| id     |
+--------+
| 990001 |
+--------+
1 row in set (0.15 sec)
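The observed 0.15s matches the two-level hypothesis. As a rough back-of-the-envelope check of the assumed fan-out (my own numbers: 16KB InnoDB page, 4-byte int key plus roughly 8 bytes of page pointer, page headers ignored):

select 16384 div (4 + 8)         as keys_per_page;    -- ≈ 1365, the same order as the ~1000 assumed above
select pow(16384 div (4 + 8), 2) as level2_capacity;  -- ≈ 1.86M keys reachable from a memory-resident root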

So my bold guess is that this cost is just normal index traversal overhead.
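For completeness, one scheme I did not benchmark here is keyset (cursor-based) pagination: instead of an offset, the client passes the last id it saw, so fetching the next page is a pure range scan. It only supports sequential next-page access, but it avoids the deep-offset cost entirely. A sketch using the page shown earlier:

select * from `user`
where id > 2000011  -- id of the last row on the previous page
order by id
limit 10;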

4. Conclusion

I originally set out to check whether the paging optimization schemes found online are reliable, but the results differed from what I expected. Readers with different opinions are welcome to share them.