A financial flow table currently holding 9,555,695 rows is paged with LIMIT. Before the optimization the query took 16 s 938 ms (execution: 16 s 831 ms); after adjusting the SQL as below, it took 347 ms (execution: 163 ms, fetching: 184 ms).

Operation: move the filter condition into a sub-query that looks up only the primary key IDs, then join on the primary keys returned by the sub-query to fetch the remaining attribute columns.

Principle: reduce the number of back-to-table (clustered-index lookup) operations.

SQL, before:

SELECT * FROM table_name WHERE ... LIMIT 0,10;

SQL, after:

SELECT * FROM table_name main_table
RIGHT JOIN (SELECT <primary key> FROM table_name WHERE ... LIMIT 0,10) temp_table
ON temp_table.<primary key> = main_table.<primary key>;

("..." stands for the original filter condition, which the article does not show.)
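As a concrete illustration, here is the same rewrite applied to a hypothetical flow table (all table, column, and index names below are invented for this sketch; the real financial table from the case above is not shown in the article):

-- Assumes t_account_flow with primary key id and a secondary index on account_id.
-- Before: deep paging reads and then discards hundreds of thousands of full rows.
SELECT *
FROM t_account_flow
WHERE account_id = 1001
LIMIT 900000, 20;

-- After: the sub-query returns only 20 primary keys and can be served from the
-- secondary index on account_id, so only 20 rows are fetched from the clustered index.
SELECT f.*
FROM t_account_flow f
INNER JOIN (
    SELECT id
    FROM t_account_flow
    WHERE account_id = 1001
    LIMIT 900000, 20
) t ON t.id = f.id;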

1. Preface


The MySQL version used below:

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.7.17    |
+-----------+
1 row in set (0.00 sec)

Table structure:

mysql> desc test;
+--------+---------------------+------+-----+---------+----------------+
| Field  | Type                | Null | Key | Default | Extra          |
+--------+---------------------+------+-----+---------+----------------+
| id     | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment |
| val    | int(10) unsigned    | NO   | MUL | 0       |                |
| source | int(10) unsigned    | NO   |     | 0       |                |
+--------+---------------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

id is an auto-increment primary key, and val has a non-unique secondary index.
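For completeness, a CREATE TABLE statement consistent with this structure (reconstructed from the desc output above; the storage engine and the secondary-index name are assumptions):

CREATE TABLE test (
  id     BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
  val    INT(10) UNSIGNED    NOT NULL DEFAULT 0,
  source INT(10) UNSIGNED    NOT NULL DEFAULT 0,
  PRIMARY KEY (id),
  KEY idx_val (val)  -- non-unique index on val; the index name is assumed
) ENGINE=InnoDB;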

The table holds a large amount of data, just over 5 million rows:

mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
|  5242882 |
+----------+
1 row in set (4.25 sec)

As we know, when the offset in LIMIT offset, rows is large, the query becomes inefficient:

mysql> select * from test where val=4 limit 300000,5;
+---------+-----+--------+
| id      | val | source |
+---------+-----+--------+
| 3327622 |   4 |      4 |
| 3327632 |   4 |      4 |
| 3327642 |   4 |      4 |
| 3327652 |   4 |      4 |
| 3327662 |   4 |      4 |
+---------+-----+--------+
5 rows in set (15.98 sec)

To get the same result, we usually rewrite the statement as follows:

mysql> select * from test a inner join (select id from test where val=4 limit 300000,5) b on a.id=b.id;
+---------+-----+--------+---------+
| id      | val | source | id      |
+---------+-----+--------+---------+
| 3327622 |   4 |      4 | 3327622 |
| 3327632 |   4 |      4 | 3327632 |
| 3327642 |   4 |      4 | 3327642 |
| 3327652 |   4 |      4 | 3327652 |
| 3327662 |   4 |      4 | 3327662 |
+---------+-----+--------+---------+
5 rows in set (0.38 sec)

The time difference is obvious.

Why such a large difference? Consider how select * from test where val=4 limit 300000,5; is processed:

MySQL walks the leaf nodes of the secondary index on val, and for each primary key value found there it looks up the clustered index to fetch all the required column values. Roughly like this:

(figure omitted: every scanned entry of the val index triggers a lookup into the clustered index)

As shown above, the query visits 300,005 secondary-index entries and performs 300,005 clustered-index lookups, only to discard the first 300,000 rows and return the last five. MySQL spends a huge amount of random I/O on clustered-index data, and 300,000 of those random reads never appear in the result set.

The obvious question: since the query already walks the index, why not follow the index leaf nodes all the way to the last five entries first, and only then fetch the actual rows from the clustered index? That would take only 5 random I/Os, similar to the process shown below.

(figure omitted: walking the index leaf nodes to the last five entries, then performing only five clustered-index lookups)
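This is essentially what the rewritten query approximates: its sub-query selects only id, so it can be answered entirely from the secondary index on val (InnoDB secondary-index entries already contain the primary key), and only the final five ids require clustered-index lookups. A quick sanity check, sketched here without reproducing the output, is to EXPLAIN both forms and look for "Using index" in the Extra column of the covering sub-query:

-- Covering sub-query: expected to report "Using index", i.e. no clustered-index
-- access while skipping over the first 300000 index entries.
EXPLAIN SELECT id FROM test WHERE val = 4 LIMIT 300000, 5;

-- The original statement needs every column, so each scanned entry costs a row lookup.
EXPLAIN SELECT * FROM test WHERE val = 4 LIMIT 300000, 5;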

2. Verification

Let’s verify the above inference in practice:

Taking select * from test where val=4 limit 300000,5 as the example, we need to know whether MySQL offers a way to count how many times a SQL statement reaches data nodes through index nodes. I tried the Handler_read_* status variables first, but unfortunately none of them fit the requirement.
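For reference, the check with the handler counters presumably looked something like this (a sketch; FLUSH STATUS resets the session counters before the query and SHOW SESSION STATUS reads them afterwards):

FLUSH STATUS;
SELECT * FROM test WHERE val = 4 LIMIT 300000, 5;
SHOW SESSION STATUS LIKE 'Handler_read%';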

I can only confirm this indirectly:

InnoDB has a buffer pool that caches recently accessed pages, both data pages and index pages. So we can run the two SQL statements against a cold buffer pool and compare how many data pages of the test table each one pulls in:

select * from test a inner join (select id from test where val=4 limit 300000,5) b on a.id=b.id;
select * from test where val=4 limit 300000,5;

If the inference is right, the first statement touches only 5 data rows and therefore at most 5 data pages, while the second has to read data pages for all 300,005 rows it scans.

Experiment 1: select * from test where val=4 limit 300000,5

mysql> select index_name, count(*) from information_schema.INNODB_BUFFER_PAGE where INDEX_NAME in ('val','primary') and TABLE_NAME like '%test%' group by index_name;
Empty set (0.04 sec)

You can see that there are currently no pages of the test table in the buffer pool.

mysql> select * from test where val=4 limit 300000,5;
+---------+-----+--------+
| id      | val | source |
+---------+-----+--------+
| 3327622 |   4 |      4 |
| 3327632 |   4 |      4 |
| 3327642 |   4 |      4 |
| 3327652 |   4 |      4 |
| 3327662 |   4 |      4 |
+---------+-----+--------+
5 rows in set (26.19 sec)

mysql> select index_name, count(*) from information_schema.INNODB_BUFFER_PAGE where INDEX_NAME in ('val','primary') and TABLE_NAME like '%test%' group by index_name;
+------------+----------+
| index_name | count(*) |
+------------+----------+
| PRIMARY    |     4098 |
| val        |      208 |
+------------+----------+
2 rows in set (0.04 sec)

The buffer pool now holds 4098 data (PRIMARY) pages and 208 index (val) pages of the test table.

Experiment 2: select * from test a inner join (select id from test where val=4 limit 300000,5) b on a.id=b.id. To rule out interference from the previous experiment, we clear the buffer pool by restarting MySQL.

mysqladmin shutdown
/usr/local/bin/mysqld_safe &

mysql> select index_name, count(*) from information_schema.INNODB_BUFFER_PAGE where INDEX_NAME in ('val','primary') and TABLE_NAME like '%test%' group by index_name;
Empty set (0.03 sec)

mysql> select * from test a inner join (select id from test where val=4 limit 300000,5) b on a.id=b.id;
+---------+-----+--------+---------+
| id      | val | source | id      |
+---------+-----+--------+---------+
| 3327622 |   4 |      4 | 3327622 |
| 3327632 |   4 |      4 | 3327632 |
| 3327642 |   4 |      4 | 3327642 |
| 3327652 |   4 |      4 | 3327652 |
| 3327662 |   4 |      4 | 3327662 |
+---------+-----+--------+---------+
5 rows in set (0.09 sec)

mysql> select index_name, count(*) from information_schema.INNODB_BUFFER_PAGE where INDEX_NAME in ('val','primary') and TABLE_NAME like '%test%' group by index_name;
+------------+----------+
| index_name | count(*) |
+------------+----------+
| PRIMARY    |        5 |
| val        |      390 |
+------------+----------+
2 rows in set (0.03 sec)

The first SQL loaded 4098 data pages into the buffer pool, while the second loaded only 5, which matches our prediction. It also confirms why the first SQL is slow: it reads a large number of rows (300,000) through the clustered index and then discards them. Beyond being slow, this pollutes the buffer pool: many data pages that are never needed again are loaded in and occupy buffer pool space.

Problems encountered

To ensure that the buffer pool is empty after every restart, we need to turn off innodb_buffer_pool_dump_at_shutdown and innodb_buffer_pool_load_at_startup. These two options control whether the buffer pool contents are dumped to disk when the database shuts down and loaded back from disk when it starts up.
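A minimal configuration sketch for this (a my.cnf fragment; both variables exist in MySQL 5.7, and turning them off only affects whether the buffer pool content is saved and restored across restarts):

[mysqld]
# Do not dump the buffer pool content to disk at shutdown ...
innodb_buffer_pool_dump_at_shutdown = OFF
# ... and do not reload it at startup, so every restart begins with a cold buffer pool.
innodb_buffer_pool_load_at_startup = OFF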
