Preface
- Interviewer: Tell me, how would you query a table with ten million rows?
- B: Just page through it directly with LIMIT.
- Interviewer: Have you actually done this in practice?
- B: Of course.
Cue the song "Liang Liang" ("cool, cool", in other words: doomed).
Perhaps some of you have never worked with a table of tens of millions of rows and have no idea what happens when you query one.
MySQL 5.7.26
Preparing the data
What if we don't have 10 million rows?
Just create them.
Generate 10 million rows from application code? No way, that's far too slow; it could literally take all day. A database script runs much faster.
Create a table
CREATE TABLE `user_operation_log` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` varchar(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`ip` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`op_data` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr1` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr2` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr3` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr4` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr5` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr6` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr7` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr8` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr9` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr10` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr11` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr12` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 1 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;
Data creation script
Batch inserts are much more efficient than inserting one row at a time. Also, commit every 1,000 rows: if a single batch carries too much data, the batch insert itself slows down.
DELIMITER ;;
CREATE PROCEDURE batch_insert_log()
BEGIN
DECLARE i INT DEFAULT 1;
DECLARE userId INT DEFAULT 10000000;
set @execSql = 'INSERT INTO `test`.`user_operation_log`(`user_id`, `ip`, `op_data`, `attr1`, `attr2`, `attr3`, `attr4`, `attr5`, `attr6`, `attr7`, `attr8`, `attr9`, `attr10`, `attr11`, `attr12`) VALUES';
set @execData = ' ';
WHILE i<=10000000 DO
set @attr = "'A test attribute that is long, long, long, long, long, long, long, long, long, long, long'";
set @execData = concat(@execData, "(", userId + i, ", '10.0.69.175', 'User login operation'", ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ")");
if i % 1000 = 0
then
set @stmtSql = concat(@execSql, @execData,";");
prepare stmt from @stmtSql;
execute stmt;
DEALLOCATE prepare stmt;
commit;
set @execData = "";
else
set @execData = concat(@execData, ",");
end if;
SET i=i+1;
END WHILE;
END;;
DELIMITER ;
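With the procedure defined, kick off the data generation. A minimal sketch of how to run it (it targets the `test` database named in the INSERT above, and on modest hardware it can run for a long time):
CALL batch_insert_log();
-- optional cleanup once the data has been generated
DROP PROCEDURE IF EXISTS batch_insert_log;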
Start testing
My computer spec is fairly low: Windows 10, a run-of-the-mill standard-voltage i5, and an SSD that reads and writes at roughly 500 MB/s.
Because of the low spec, I only prepared 3,148,000 rows of data for this test. They took up 5 GB of disk (without indexes) and the script ran for 38 minutes. If your machine is better configured, feel free to insert more data for the test.
SELECT count(1) FROM `user_operation_log`
Result: 3148000
The three query times are as follows:
- 14060 ms
- 13755 ms
- 13447 ms
Plain paging query
MySQL supports the LIMIT clause to select a specified number of rows; Oracle uses ROWNUM instead.
The MySQL paging syntax is as follows:
SELECT * FROM table LIMIT [offset,] rows | rows OFFSET offset
- The first parameter specifies the offset of the first returned row
- The second parameter specifies the maximum number of rows to return
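For instance, the two statements below are equivalent ways of asking for the same ten rows, one for each of the syntax forms above (a quick illustration, not part of the benchmark):
SELECT * FROM `user_operation_log` LIMIT 10000, 10
SELECT * FROM `user_operation_log` LIMIT 10 OFFSET 10000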
Let’s start testing the query results:
SELECT * FROM `user_operation_log` LIMIT 10000, 10
The query times are as follows:
- 59 ms
- 49 ms
- 50 ms
That looks reasonably fast, but this is a local database, which is naturally quicker.
Let's test it from a different angle.
Same offset, different data volume
SELECT * FROM `user_operation_log` LIMIT 10000, 10
SELECT * FROM `user_operation_log` LIMIT 10000, 100
SELECT * FROM `user_operation_log` LIMIT 10000, 1000
SELECT * FROM `user_operation_log` LIMIT 10000, 10000
SELECT * FROM `user_operation_log` LIMIT 10000, 100000
SELECT * FROM `user_operation_log` LIMIT 10000, 1000000
The query time is as follows:
Rows returned | First run | Second run | Third run |
---|---|---|---|
10 rows | 53ms | 52ms | 47ms |
100 rows | 50ms | 60ms | 55ms |
1000 rows | 61ms | 74ms | 60ms |
10000 rows | 164ms | 180ms | 217ms |
100000 rows | 1609ms | 1741ms | 1764ms |
1000000 rows | 16219ms | 16889ms | 17081ms |
Conclusion: the more data returned, the longer the query takes.
Same amount of data, different offsets
SELECT * FROM `user_operation_log` LIMIT 100, 100
SELECT * FROM `user_operation_log` LIMIT 1000, 100
SELECT * FROM `user_operation_log` LIMIT 10000, 100
SELECT * FROM `user_operation_log` LIMIT 100000, 100
SELECT * FROM `user_operation_log` LIMIT 1000000, 100
Offset | First run | Second run | Third run |
---|---|---|---|
100 | 36ms | 40ms | 36ms |
1000 | 31ms | 38ms | 32ms |
10000 | 53ms | 48ms | 51ms |
100000 | 622ms | 576ms | 627ms |
1000000 | 4891ms | 5076ms | 4856ms |
Conclusion from the above results: the larger the offset, the longer the query takes, because MySQL still has to read through and discard all of the offset rows before returning the requested ones.
Incidentally, at the same offset you can also compare fetching all columns with fetching only a couple of them:
SELECT * FROM `user_operation_log` LIMIT 100, 100
SELECT id, attr1 FROM `user_operation_log` LIMIT 100, 100
How to optimize
After all the tossing and turning above, we have our conclusions. Now let's optimize for the two problems separately: large offsets and large amounts of returned data.
Optimize the problem of large offset
Using a subquery
We can first locate the id at the offset position, then query the rows from there:
SELECT * FROM `user_operation_log` LIMIT 1000000, 10
SELECT id FROM `user_operation_log` LIMIT 1000000, 1
SELECT * FROM `user_operation_log` WHERE id >= (SELECT id FROM `user_operation_log` LIMIT 1000000, 1) LIMIT 10
The query results are as follows:
SQL | Time taken |
---|---|
Query 1 | 4818ms |
Query 2 (without index) | 4329ms |
Query 2 (with index) | 199ms |
Query 3 (without index) | 4319ms |
Query 3 (with index) | 201ms |
Conclusions from the above results:
- Query 1 takes the longest, and query 3 does slightly better
- The subqueries are faster when they can use an index
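If you want to verify this yourself, run EXPLAIN on the statements and look at the key and Extra columns to see which index each one uses (a quick check only; the output differs between MySQL versions):
EXPLAIN SELECT id FROM `user_operation_log` LIMIT 1000000, 1
EXPLAIN SELECT * FROM `user_operation_log` WHERE id >= (SELECT id FROM `user_operation_log` LIMIT 1000000, 1) LIMIT 10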
Drawback: this only applies when the id is monotonically increasing.
For non-incrementing ids you can use the form below; its drawback is that the paging has to live inside the subquery.
Note: some MySQL versions do not support LIMIT directly inside an IN subquery, which is why an extra nested SELECT is used.
SELECT * FROM `user_operation_log` WHERE id IN (SELECT t.id FROM (SELECT id FROM `user_operation_log` LIMIT 1000000, 10) AS t)
Using an id range
This approach is more demanding: the ids must be continuous and increasing, and you have to work out the id range in advance. The SQL then looks like this:
SELECT * FROM `user_operation_log` WHERE id between 1000000 AND 1000100 LIMIT 100
SELECT * FROM `user_operation_log` WHERE id >= 1000000 LIMIT 100
The query results are as follows:
SQL | Time taken |
---|---|
Query 1 | 22ms |
Query 2 | 21ms |
As you can see from the results, this approach is very fast
Note: here LIMIT restricts the number of rows returned, not the offset.
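A closely related variant, when ids keep increasing, is to remember the last id of the page you just returned and start the next page from there instead of pre-computing ranges. A sketch, where 1000100 stands in for the last id seen on the previous page:
SELECT * FROM `user_operation_log` WHERE id > 1000100 ORDER BY id LIMIT 100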
Optimize the problem of large data volume
The amount of data returned also directly affects speed
SELECT * FROM `user_operation_log` LIMIT 1, 1000000
SELECT id FROM `user_operation_log` LIMIT 1, 1000000
SELECT id, user_id, ip, op_data, attr1, attr2, attr3, attr4, attr5, attr6, attr7, attr8, attr9, attr10, attr11, attr12 FROM `user_operation_log` LIMIT 1, 1000000
The query results are as follows:
SQL | Time taken |
---|---|
Query 1 | 15676ms |
Query 2 | 7298ms |
Query 3 | 15960ms |
As the results show, cutting out unneeded columns noticeably improves query efficiency.
The first and third queries are about the same speed, so at this point you will surely jeer: why bother writing out all those fields when * gets the job done?
Note that my MySQL server and client are on the same machine, so their timings are close. If you have the means, test with the client and MySQL on separate machines.
Isn't SELECT * good enough?
While we are here, let me add why SELECT * is banned in many coding standards. Isn't it simple and effortless?
Two main points:
- With SELECT *, the database has to resolve more objects, fields, permissions, attributes and other related metadata; when the SQL statement is complex and hard parses are frequent, this puts a heavy burden on the database.
- It increases network overhead. * can accidentally drag along useless, large text fields such as log or IconMD5 columns, and the size of the transferred data grows dramatically. This is especially noticeable when MySQL and the application are not on the same machine.
The end
Finally, I hope you will go and try this yourself; you will get much more out of it. Feel free to leave a comment!
I have already given you the data creation script, so what are you waiting for!
And once again, let me serve up my earlier piece on how to optimize MySQL.