Preface
- Interviewer: Tell me, how would you query a table with ten million rows?
- B: Just page through it directly with LIMIT.
- Interviewer: Have you actually done this in practice?
- B: Of course.
Cue the song "Liang Liang" ("cool, cool", in other words: doomed).
Perhaps some of you have never worked with a table of tens of millions of rows and have no idea what happens when you query one.
MySQL 5.7.26
Preparing the data
What if we don't have 10 million rows?
Just create them.
Generate 10 million rows from application code? No way, that's far too slow; it could literally take all day. A database script runs much faster.
Create a table
CREATE TABLE `user_operation_log` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` varchar(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`ip` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`op_data` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr1` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr2` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr3` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr4` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr5` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr6` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr7` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr8` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr9` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr10` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr11` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`attr12` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 1 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;
Data creation script
Batch inserts are much more efficient than inserting one row at a time. Also, commit every 1,000 rows: if a single batch carries too much data, the batch insert itself slows down.
DELIMITER ;;
CREATE PROCEDURE batch_insert_log()
BEGIN
DECLARE i INT DEFAULT 1;
DECLARE userId INT DEFAULT 10000000;
set @execSql = 'INSERT INTO `test`.`user_operation_log`(`user_id`, `ip`, `op_data`, `attr1`, `attr2`, `attr3`, `attr4`, `attr5`, `attr6`, `attr7`, `attr8`, `attr9`, `attr10`, `attr11`, `attr12`) VALUES';
set @execData = ' ';
WHILE i<=10000000 DO
set @attr = "'A test attribute that is long, long, long, long, long, long, long, long, long, long, long'";
set @execData = concat(@execData, "(", userId + i, ", '10.0.69.175', 'User login operation'", ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ",", @attr, ")");
if i % 1000 = 0
then
set @stmtSql = concat(@execSql, @execData,";");
prepare stmt from @stmtSql;
execute stmt;
DEALLOCATE prepare stmt;
commit;
set @execData = "";
else
set @execData = concat(@execData, ",");
end if;
SET i=i+1;
END WHILE;
END;;
DELIMITER ;
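With the procedure defined, kick off the data generation. A minimal sketch of how to run it (it targets the `test` database named in the INSERT above, and on modest hardware it can run for a long time):
CALL batch_insert_log();
-- optional cleanup once the data has been generated
DROP PROCEDURE IF EXISTS batch_insert_log;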
Start testing
My computer spec is fairly low: Windows 10, a run-of-the-mill standard-voltage i5, and an SSD that reads and writes at roughly 500 MB/s.
Because of the low spec, I only prepared 3,148,000 rows of data for this test. They took up 5 GB of disk (without indexes) and the script ran for 38 minutes. If your machine is better configured, feel free to insert more data for the test.
SELECT count(1) FROM `user_operation_log`
Result: 3148000
The three query times are as follows:
- 14060 ms
- 13755 ms
- 13447 ms
Plain paging query
MySQL supports the LIMIT clause to select a specified number of rows; Oracle uses ROWNUM instead.
The MySQL paging syntax is as follows:
SELECT * FROM table LIMIT [offset,] rows | rows OFFSET offset
- The first parameter specifies the offset of the first returned row
- The second parameter specifies the maximum number of rows to return
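For instance, the two statements below are equivalent ways of asking for the same ten rows, one for each of the syntax forms above (a quick illustration, not part of the benchmark):
SELECT * FROM `user_operation_log` LIMIT 10000, 10
SELECT * FROM `user_operation_log` LIMIT 10 OFFSET 10000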
Let’s start testing the query results:
SELECT * FROM `user_operation_log` LIMIT 10000, 10
The query times are as follows:
- 59 ms
- 49 ms
- 50 ms
That looks reasonably fast, but this is a local database, which is naturally quicker.
Let's test it from a different angle.
Same offset, different data volume
SELECT * FROM `user_operation_log` LIMIT 10000, 10
SELECT * FROM `user_operation_log` LIMIT 10000, 100
SELECT * FROM `user_operation_log` LIMIT 10000, 1000
SELECT * FROM `user_operation_log` LIMIT 10000, 10000
SELECT * FROM `user_operation_log` LIMIT 10000, 100000
SELECT * FROM `user_operation_log` LIMIT 10000, 1000000
The query time is as follows:
Rows returned | First run | Second run | Third run |
---|---|---|---|
10 rows | 53ms | 52ms | 47ms |
100 rows | 50ms | 60ms | 55ms |
1000 rows | 61ms | 74ms | 60ms |
10000 rows | 164ms | 180ms | 217ms |
100000 rows | 1609ms | 1741ms | 1764ms |
1000000 rows | 16219ms | 16889ms | 17081ms |
Conclusion: the more data returned, the longer the query takes.
Same amount of data, different offsets
SELECT * FROM `user_operation_log` LIMIT 100, 100
SELECT * FROM `user_operation_log` LIMIT 1000, 100
SELECT * FROM `user_operation_log` LIMIT 10000, 100
SELECT * FROM `user_operation_log` LIMIT 100000, 100
SELECT * FROM `user_operation_log` LIMIT 1000000, 100
Offset | First run | Second run | Third run |
---|---|---|---|
100 | 36ms | 40ms | 36ms |
1000 | 31ms | 38ms | 32ms |
10000 | 53ms | 48ms | 51ms |
100000 | 622ms | 576ms | 627ms |
1000000 | 4891ms | 5076ms | 4856ms |
Conclusion from the above results: the larger the offset, the longer the query takes, because MySQL still has to read through and discard all of the offset rows before returning the requested ones.
Incidentally, at the same offset you can also compare fetching all columns with fetching only a couple of them:
SELECT * FROM `user_operation_log` LIMIT 100, 100
SELECT id, attr1 FROM `user_operation_log` LIMIT 100, 100
How to optimize
After all the tossing and turning above, we have our conclusions. Now let's optimize for the two problems separately: large offsets and large amounts of returned data.
Optimize the problem of large offset
Using a subquery
We can first locate the id at the offset position, then query the rows from there:
SELECT * FROM `user_operation_log` LIMIT 1000000, 10
SELECT id FROM `user_operation_log` LIMIT 1000000, 1
SELECT * FROM `user_operation_log` WHERE id >= (SELECT id FROM `user_operation_log` LIMIT 1000000, 1) LIMIT 10
The query results are as follows:
SQL | Time taken |
---|---|
Query 1 | 4818ms |
Query 2 (without index) | 4329ms |
Query 2 (with index) | 199ms |
Query 3 (without index) | 4319ms |
Query 3 (with index) | 201ms |
Conclusions from the above results:
- Query 1 takes the longest, and query 3 does slightly better
- The subqueries are faster when they can use an index
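If you want to verify this yourself, run EXPLAIN on the statements and look at the key and Extra columns to see which index each one uses (a quick check only; the output differs between MySQL versions):
EXPLAIN SELECT id FROM `user_operation_log` LIMIT 1000000, 1
EXPLAIN SELECT * FROM `user_operation_log` WHERE id >= (SELECT id FROM `user_operation_log` LIMIT 1000000, 1) LIMIT 10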
Drawback: this only applies when the id is monotonically increasing.
For non-incrementing ids you can use the form below; its drawback is that the paging has to live inside the subquery.
Note: some MySQL versions do not support LIMIT directly inside an IN subquery, which is why an extra nested SELECT is used.
SELECT * FROM `user_operation_log` WHERE id IN (SELECT t.id FROM (SELECT id FROM `user_operation_log` LIMIT 1000000, 10) AS t)
Using an id range
This approach is more demanding: the ids must be continuous and increasing, and you have to work out the id range in advance. The SQL then looks like this:
SELECT * FROM `user_operation_log` WHERE id between 1000000 AND 1000100 LIMIT 100
SELECT * FROM `user_operation_log` WHERE id >= 1000000 LIMIT 100
The query results are as follows:
SQL | Time taken |
---|---|
Query 1 | 22ms |
Query 2 | 21ms |
As you can see from the results, this approach is very fast
Note: here LIMIT restricts the number of rows returned, not the offset.
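A closely related variant, when ids keep increasing, is to remember the last id of the page you just returned and start the next page from there instead of pre-computing ranges. A sketch, where 1000100 stands in for the last id seen on the previous page:
SELECT * FROM `user_operation_log` WHERE id > 1000100 ORDER BY id LIMIT 100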
Optimize the problem of large data volume
The amount of data returned also directly affects speed
SELECT * FROM `user_operation_log` LIMIT 1, 1000000
SELECT id FROM `user_operation_log` LIMIT 1, 1000000
SELECT id, user_id, ip, op_data, attr1, attr2, attr3, attr4, attr5, attr6, attr7, attr8, attr9, attr10, attr11, attr12 FROM `user_operation_log` LIMIT 1, 1000000
The query results are as follows:
SQL | Time taken |
---|---|
Query 1 | 15676ms |
Query 2 | 7298ms |
Query 3 | 15960ms |
As the results show, cutting out unneeded columns noticeably improves query efficiency.
The first and third queries are about the same speed, so at this point you will surely jeer: why bother writing out all those fields when * gets the job done?
Note that my MySQL server and client are on the same machine, so their timings are close. If you have the means, test with the client and MySQL on separate machines.
Isn't SELECT * good enough?
While we are here, let me add why SELECT * is banned in many coding standards. Isn't it simple and effortless?
Two main points:
- With SELECT *, the database has to resolve more objects, fields, permissions, attributes and other related metadata; when the SQL statement is complex and hard parses are frequent, this puts a heavy burden on the database.
- It increases network overhead. * can accidentally drag along useless, large text fields such as log or IconMD5 columns, and the size of the transferred data grows dramatically. This is especially noticeable when MySQL and the application are not on the same machine.
The end
Finally, I hope you will go and try this yourself; you will get much more out of it. Feel free to leave a comment!
I have already given you the data creation script, so what are you waiting for!
And once again, let me serve up my earlier piece on how to optimize MySQL.