Recently, while working on a project, I had to handle operations on a large amount of data, including a large number of CRUD operations. Performance tests of the initial implementation showed that it did not perform well, so I reviewed the code and optimized the operations where performance could be improved.

Principles

First of all, I will not cover indexes or database parameter tuning here; I assume you already have a working understanding of indexes. The optimizations I made this time boil down to two main principles:

  • In certain scenarios, process data in batches where possible, for example batch inserts and batch updates;
  • Combine SQL with the business logic to minimize the number of statements executed, and avoid querying data you do not need.

Scenario practice

To simulate the scenario, I created a table and inserted 3 million rows into it. The table structure is as follows:

CREATE TABLE `tb_big_data` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `weixin_id` varchar(64) NOT NULL,
 `openid` varchar(64) NOT NULL,
 `status` int(3) NOT NULL,
 `gmt_create` datetime NOT NULL,
 `gmt_modified` datetime NOT NULL,
 PRIMARY KEY (`id`),
 KEY `weixin_id_gmt_create_openid` (`weixin_id`, `gmt_create`, `openid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
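For reference, here is one way to seed that volume of test data. This stored procedure is a hypothetical helper added here for illustration (it was not part of the original setup), with random hex strings standing in for openid values:

DELIMITER $$
CREATE PROCEDURE seed_tb_big_data(IN n INT)
BEGIN
  DECLARE i INT DEFAULT 0;
  WHILE i < n DO
    -- One row per iteration; batching the VALUES list would make seeding much faster
    INSERT INTO `tb_big_data` (`weixin_id`, `openid`, `status`, `gmt_create`, `gmt_modified`)
    VALUES ('gh_266a30a8a1f6', SUBSTRING(MD5(RAND()), 1, 16), 1, NOW(), NOW());
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;

CALL seed_tb_big_data(3000000);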

1. A small paging-query optimization

Paged queries are a well-worn topic, and plenty of optimization techniques are documented online, so I will not repeat them here. I just want to share one small trick:

How can we improve performance when paging with the most common LIMIT clause?

Suppose we now have an SQL that looks like this:

SELECT * FROM `tb_big_data` where weixin_id ='gh_266a30a8a1f6' and gmt_create > '2017-10-10 00:00:00' order by id asc limit 800000, 100;

Execution time: 100 rows in set (1.53 sec)

How can we speed up this SQL if no other optimizations are available, such as passing in a minimum id, splitting the query across tables, or warming up the query beforehand? It is actually very simple; we only need one IN operation:

SELECT * FROM `tb_big_data` t1 where t1.id in (
    SELECT tt.id FROM (
        SELECT id FROM `tb_big_data` t2 where weixin_id = 'gh_266a30a8a1f6' and gmt_create > '2017-10-10 00:00:00' order by t2.id asc limit 800000, 100
        ) as tt);

Execution time: 100 rows in set (1.17 sec)

As you can see, with just a few modifications the SQL runs 30 to 40 percent faster, and the larger the paging offset, the bigger the gain. This is not the best paging method, just a small trick.
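The same idea is often written as a self-join instead of IN; the following equivalent form is a sketch added here for comparison, not something benchmarked in the original test:

SELECT t1.* FROM `tb_big_data` t1
JOIN (
    SELECT id FROM `tb_big_data`
    WHERE weixin_id = 'gh_266a30a8a1f6' AND gmt_create > '2017-10-10 00:00:00'
    ORDER BY id ASC LIMIT 800000, 100
) t2 ON t1.id = t2.id;

Both forms are fast for the same reason: the inner query pages through ids using only the secondary index (which in InnoDB also stores the primary key), so full table rows are fetched for just the final 100 ids instead of all 800,100 scanned entries.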

2. Reduce the number of SQL queries

Now suppose we have a requirement: given a list of users (where openid uniquely identifies a user), determine whether each user has a record for the current day.

The problem is actually very simple, and the first approach that comes to mind is to loop over the list and check each user one by one. That is easy to write, but real testing revealed very poor performance: with a large amount of data the cost grows by multiples, since every iteration pays both the network round-trip time and the execution time of the SQL itself.

Suppose we now execute the following SQL:

SELECT * FROM `tb_big_data` WHERE weixin_id ='gh_266a30a8a1f6' and gmt_create > '2017-10-13 00:00:00' and openid='2n6bvynihm5bzgyx';

Execution time: 1 row in set (0.95 sec)

If we executed this 100 times, the total cost is easy to imagine. Fortunately we caught the problem early, because with a small amount of data it is not nearly as obvious. In fact, with only a few changes we can solve it far more efficiently:

SELECT * FROM `tb_big_data` WHERE weixin_id ='gh_266a30a8a1f6' and gmt_create > '2017-10-13 00:00:00' and openid in ('2n6bvynihm5bzgyx', '1stbvdnl63de2q37', '3z8552gxzfi3wy27', ...);

Execution time: 100 rows in set (1.05 sec)

With a single IN query, the execution time is almost the same as a single lookup. As you can see, simply consolidating the checks into one statement improves performance dramatically.
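Since the requirement is only to know whether each user has a record, a further variant worth noting (my addition, not benchmarked in the article) is to select just the openid column and diff the result against the input list in application code:

SELECT DISTINCT openid FROM `tb_big_data`
WHERE weixin_id = 'gh_266a30a8a1f6'
  AND gmt_create > '2017-10-13 00:00:00'
  AND openid IN ('2n6bvynihm5bzgyx', '1stbvdnl63de2q37', '3z8552gxzfi3wy27');

Every openid returned has a record for the day; every openid missing from the result does not. Because all referenced columns appear in the (`weixin_id`, `gmt_create`, `openid`) index, this query can be answered from the index alone without touching the table rows.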

3. Use batch SQL processing in specific scenarios

This point is similar to the previous one about reducing the number of queries, which covered only reads. When there is a large volume of create/update/delete (CUD) operations, executing each statement individually means the database processes a transaction for each one, which consumes a great deal of time; in extreme cases it can leave large batches of statements waiting, unable to execute, and cause business errors. For precisely these reasons, we can use batch processing in appropriate situations to solve the problem.

(1) Batch insert

Batch inserts are simple and common. Here is the basic syntax:

INSERT INTO table_name (field1, field2, ...) VALUES (value11, value12, ...), (value21, value22, ...), ...;
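As a concrete sketch against the demo table (the row values are made up for illustration):

INSERT INTO `tb_big_data` (`weixin_id`, `openid`, `status`, `gmt_create`, `gmt_modified`)
VALUES ('gh_266a30a8a1f6', 'w9q8fmodytjgppsr', 1, '2017-10-13 12:00:00', '2017-10-13 12:00:00'),
       ('gh_266a30a8a1f6', 'bu1flmch4i8eegzf', 1, '2017-10-13 12:00:00', '2017-10-13 12:00:00');

A single multi-row INSERT pays the network round-trip and transaction overhead once instead of once per row, which is where the speedup comes from.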
(2) Batch update

Let me take a simple example: we want to update some data according to certain conditions. The SQL is as follows:

update `tb_big_data` set status = 2 WHERE weixin_id ='gh_266a30a8a1f6' and gmt_create > '2017-10-13 00:00:00' and openid = '2n6bvynihm5bzgyx';

Query OK, 1 row affected (2.28 sec)
Rows matched: 1  Changed: 1  Warnings: 0

We were surprised: we only updated one record, and the WHERE clause hit a composite index, yet it was still this slow. You can imagine how much time a batch of such updates would take.

But let’s look at another SQL:

update `tb_big_data` set status = 1 WHERE id = 900098;

Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

The id in the condition above is the primary key of a record previously filtered out by the business conditions. Surprisingly, this SQL executes in almost no time at all. We can exploit this by combining a batch query with a batch update: first query the primary keys that match the conditions, then update by primary key. This does not give optimal performance, but it is a big improvement. I ran a test, batch-updating 100 rows according to the corresponding conditions:

Method                                          Time consumed
Direct batch update                             289.12 s
Query primary keys first, then update by id     1.342 s

As you can see, this method improves performance dramatically compared with the plain approach. When executing the statements, we can also wrap them in a single transaction to reduce the number of database transactions, although that is only a code-level optimization.
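A sketch of this two-step approach (the ids in step 2 stand in for the hypothetical results of step 1):

-- Step 1: fetch only the primary keys of the rows matching the business conditions
SELECT id FROM `tb_big_data`
WHERE weixin_id = 'gh_266a30a8a1f6'
  AND gmt_create > '2017-10-13 00:00:00'
  AND openid IN ('2n6bvynihm5bzgyx', '1stbvdnl63de2q37');

-- Step 2: update by primary key, optionally inside one transaction
START TRANSACTION;
UPDATE `tb_big_data` SET status = 2 WHERE id IN (900098, 900099);
COMMIT;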

In addition, we can use the special syntax provided by MySQL to perform batch updates. The specific syntax is:

INSERT INTO table_name (id, field1, field2, ...) VALUES (id1, value11, value12, ...), (id2, value21, value22, ...), ... ON DUPLICATE KEY UPDATE field = VALUES(field);

A concrete example:

INSERT INTO `tb_big_data` (id, weixin_id, openid, gmt_create, status)
VALUES (1, 'gh_266a30a8a1f6', 'w9q8fmodytjgppsr', '2017-10-13 12:00:00', 3),
       (2, 'gh_266a30a8a1f6', 'bu1flmch4i8eegzf', '2017-10-13 12:00:00', 3)
ON DUPLICATE KEY UPDATE status = VALUES(status);

In testing, this method performs about the same as the previous one with small amounts of data, but its relative performance gets better and better as the data volume grows. Its drawback is that the amount of data transmitted is large: fields that do not need to be updated must still be sent.
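A related alternative that avoids sending untouched fields, sketched here even though the article did not benchmark it, is a single UPDATE with a CASE expression keyed on the primary key (the ids and status values below are hypothetical):

UPDATE `tb_big_data`
SET status = CASE id
    WHEN 900098 THEN 2
    WHEN 900099 THEN 3
END
WHERE id IN (900098, 900099);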

In addition, it is not recommended to update too many rows in one batch; keep each batch to no more than 1,000 rows.

Conclusion

In general, SQL optimization is a meticulous discipline: you need to keep trying and testing to find the best approach. It is also important to take the actual situation into account and choose the right method after weighing everything.