1. Background

On the XX instance (one master, one slave), an SLA alarm is reported every morning. The alarm means there is a noticeable master/slave replication delay. (If a master/slave switchover happened at that point, it would take a long time to complete, because the slave must first catch up on the delayed binlog to keep the master/slave data consistent.)
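The delay behind this alarm is the replication lag visible on the slave. A minimal way to check it (standard MySQL, shown as a sketch rather than the monitoring actually used here):

-- Seconds_Behind_Master in this output is the master/slave delay that the SLA alarm tracks.
show slave status\G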

The XX instance also has the largest number of slow queries (SQL statements that take more than 1s to execute are recorded), and every night the XX application runs a task that deletes data older than one month.
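The 1s threshold corresponds to slow-query log settings like the following (a sketch; the instance's actual configuration is not shown in the source):

-- Record statements that take longer than 1 second to execute.
set global slow_query_log = ON;
set global long_query_time = 1;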

2. Analysis

Use pt-query-digest to analyze mysql-slow.log for the last week:

pt-query-digest --since=148h mysql-slow.log | less

Results Part I:

Over the last week, the total recorded slow-query execution time was 25,403s, the slowest single SQL statement took 266s, the average slow-SQL execution time was 5s, and the average number of rows scanned per statement was 17.66 million.

Results Part II:

select arrival_record accounted for the most recorded slow queries, more than 40,000, with an average response time of 4s; delete arrival_record recorded 6 slow queries, with an average response time of 258s.

3. The select xxx_record statement

The select arrival_record slow queries all look like the following; the fields in the WHERE clause are the same and only the parameter values differ:

select count(*) from arrival_record where product_id=26 and receive_time between '2019-03-25 14:00:00' and '2019-03-25 15:00:00' and receive_spend_ms>=0\G

The select arrival_record statements scanned up to 56 million rows at most, and 1.72 million rows on average.

View the execution plan:

explain select count(*) from arrival_record where product_id=26 and receive_time between '2019-03-25 14:00:00' and '2019-03-25 15:00:00' and receive_spend_ms>=0\G; 
*************************** 1. row *************************** 
id: 1 
select_type: SIMPLE 
table: arrival_record 
partitions: NULL 
type: ref 
possible_keys: IXFK_arrival_record 
key: IXFK_arrival_record 
key_len: 8 
ref: const 
rows: 32261320 
filtered: 3.70 
Extra: Using index condition; Using where 
1 row in set, 1 warning (0.00 sec)

The IXFK_arrival_record index is used, but the estimated number of rows to scan is still more than 30 million:

show index from arrival_record;
+----------------+------------+---------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table          | Non_unique | Key_name            | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------------+------------+---------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| arrival_record |          0 | PRIMARY             |            1 | id           | A         |   107990720 |     NULL | NULL   |      | BTREE      |         |               |
| arrival_record |          1 | IXFK_arrival_record |            1 | product_id   | A         |        1344 |     NULL | NULL   |      | BTREE      |         |               |
| arrival_record |          1 | IXFK_arrival_record |            2 | station_no   | A         |       22161 |     NULL | NULL   | YES  | BTREE      |         |               |
| arrival_record |          1 | IXFK_arrival_record |            3 | sequence     | A         |    77233384 |     NULL | NULL   |      | BTREE      |         |               |
| arrival_record |          1 | IXFK_arrival_record |            4 | receive_time | A         |    65854652 |     NULL | NULL   | YES  | BTREE      |         |               |
| arrival_record |          1 | IXFK_arrival_record |            5 | arrival_time | A         |    73861904 |     NULL | NULL   | YES  | BTREE      |         |               |
+----------------+------------+---------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

show create table arrival_record;
..........
  arrival_spend_ms bigint(20) DEFAULT NULL,
  total_spend_ms bigint(20) DEFAULT NULL,
  PRIMARY KEY (id),
  KEY IXFK_arrival_record (product_id,station_no,sequence,receive_time,arrival_time) USING BTREE,
  CONSTRAINT FK_arrival_record_product FOREIGN KEY (product_id) REFERENCES product (id) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=614538979 DEFAULT CHARSET=utf8 COLLATE=utf8_bin |

① The table has about 100 million records in total and only one composite index. The leading column product_id has very low cardinality, so its selectivity is poor.

② The filter conditions passed in are:

where product_id=26 and receive_time between '2019-03-25 14:00:00' and '2019-03-25 15:00:00' and receive_spend_ms>=0

Because the WHERE clause contains no station_no condition, the composite index IXFK_arrival_record can only be used up to its leading column; the station_no, sequence, and receive_time columns of the index cannot be used.

③ In effect, the select arrival_record statements use only the first column of the composite index, product_id, and since that column has poor selectivity, a huge number of rows are scanned and execution is slow.

④ The receive_time column has high cardinality and good selectivity, so creating a separate index on it would let the select arrival_record SQL use that index.
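Point ① can be checked quickly with a selectivity query (a sketch, not part of the original analysis; with a cardinality of 1344 over roughly 100 million rows, the ratio is close to 0):

-- Selectivity = distinct values / total rows; low values mean a poor index column.
-- Note: this is a full table scan, so it is expensive on a 100M-row table.
select count(distinct product_id) / count(*) as product_id_selectivity from arrival_record;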

To confirm whether the application sends select arrival_record queries with filter conditions other than (product_id, receive_time, receive_spend_ms), the live select statements were captured with tcpdump.

Capture the select statements with tcpdump:

tcpdump -i bond0 -s 0 -l -w - dst port 3316 | strings | grep select | egrep -i 'arrival_record' >/tmp/select_arri.log

Extract the WHERE conditions (everything after FROM) from the captured select statements:

IFS_OLD=$IFS
IFS=$'\n'
for i in `cat /tmp/select_arri.log`; do echo ${i#*'from'}; done | less
IFS=$IFS_OLD
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=17 and arrivalrec0_.station_no='56742' 
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=22 and arrivalrec0_.station_no='S7100' 
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=24 and arrivalrec0_.station_no='V4631' 
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=22 and arrivalrec0_.station_no='S9466' 
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=24 and arrivalrec0_.station_no='V4205' 
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=24 and arrivalrec0_.station_no='V4105' 
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=24 and arrivalrec0_.station_no='V4506' 
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=24 and arrivalrec0_.station_no='V4617' 
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=22 and arrivalrec0_.station_no='S8356' 
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=22 and arrivalrec0_.station_no='S8356'

The captured statements show that the application also queries on the combination of product_id, station_no, and sequence (all equality conditions), so these three columns still need a composite index.

In summary, the optimization method is as follows:

Delete the composite index IXFK_arrival_record

Create the composite index idx_product_id_sequence_station_no (product_id, sequence, station_no)

Create a separate index idx_receive_time (receive_time)
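Expressed as DDL, the change looks like the statement below (this mirrors what is actually executed in section 6; the foreign key has to be dropped first because it depends on the old index, and it is re-added afterwards):

alter table arrival_record
  drop foreign key FK_arrival_record_product,
  drop index IXFK_arrival_record,
  add index idx_product_id_sequence_station_no (product_id, sequence, station_no),
  add index idx_receive_time (receive_time);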

4. The delete xxx_record statement

The average number of rows scanned by this DELETE operation was 110 million, and the average execution time was 262s.

The delete statement looks like this, with different values passed in for each recorded slow query:

delete from arrival_record where receive_time < STR_TO_DATE('2019-02-23', '%Y-%m-%d')\G

explain select * from arrival_record where receive_time < STR_TO_DATE('2019-02-23', '%Y-%m-%d')\G 
*************************** 1. row *************************** 
id: 1 
select_type: SIMPLE 
table: arrival_record 
partitions: NULL 
type: ALL 
possible_keys: NULL 
key: NULL 
key_len: NULL 
ref: NULL 
rows: 109501508 
filtered: 33.33 
Extra: Using where 
1 row in set, 1 warning (0.00 sec)

The delete statement uses no index (no suitable index is available) and performs a full table scan, which causes the long execution time.

The optimization is the same: create a separate index idx_receive_time (receive_time).

5. Testing

Copy the arrival_record table to a test instance to test the delete operation and the index changes.

First, check the arrival_record table's footprint on disk and in MySQL:

du -sh /datas/mysql/data/3316/cq_new_cimiss/arrival_record*
12K /datas/mysql/data/3316/cq_new_cimiss/arrival_record.frm
48G /datas/mysql/data/3316/cq_new_cimiss/arrival_record.ibd

select count(*) from cq_new_cimiss.arrival_record;
+-----------+
| count(*)  |
+-----------+
| 112294946 |
+-----------+

More than 100 million records

SELECT table_name,
       CONCAT(FORMAT(SUM(data_length)/1024/1024, 2), 'M')  AS dbdata_size,
       CONCAT(FORMAT(SUM(index_length)/1024/1024, 2), 'M') AS dbindex_size,
       CONCAT(FORMAT(SUM(data_length + index_length)/1024/1024/1024, 2), 'G') AS `table_size(G)`,
       AVG_ROW_LENGTH, table_rows, update_time
FROM information_schema.tables
WHERE table_schema = 'cq_new_cimiss' AND table_name = 'arrival_record';

+----------------+-------------+--------------+---------------+----------------+------------+---------------------+
| table_name     | dbdata_size | dbindex_size | table_size(G) | AVG_ROW_LENGTH | table_rows | update_time         |
+----------------+-------------+--------------+---------------+----------------+------------+---------------------+
| arrival_record | 18268.02M   | 13868.05M    | 31.38G        |            175 |  109155053 | 2019-03-26 12:40:17 |
+----------------+-------------+--------------+---------------+----------------+------------+---------------------+

The table occupies 48 GB on disk while MySQL reports its size as 31 GB, so there are roughly 17 GB of fragments, mostly left behind by delete operations (the records are deleted, but the space is not reclaimed).
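As an aside, such fragmentation can also be reclaimed in place by rebuilding the table (standard MySQL behavior, not the approach taken below):

-- Rebuilds the table and its indexes; with innodb_file_per_table the .ibd file
-- shrinks, returning the ~17 GB of free space to the OS.
alter table cq_new_cimiss.arrival_record engine = InnoDB;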

Restore the table to a new instance, delete the original composite index, and add the new indexes there for testing.

Back up the table with mydumper:

user=root
passwd=xxxx
socket=/datas/mysql/data/3316/mysqld.sock
db=cq_new_cimiss
table_name=arrival_record
backupdir=/datas/dump_$table_name
mkdir -p $backupdir

nohup echo `date +%T` && mydumper -u $user -p $passwd -S $socket -B $db -c -T $table_name -o $backupdir -t 32 -r 2000000 && echo `date +%T` &

The parallel compressed backup took 52s and 1.2 GB of space (the table actually occupies 48 GB on disk, so mydumper's compression ratio is quite high):

Started dump at: 2019-03-26 12:46:04
........
Finished dump at: 2019-03-26 12:46:56

du -sh /datas/dump_arrival_record/
1.2G /datas/dump_arrival_record/

Copy the dump data to the test node:

scp -rp /datas/dump_arrival_record [email protected]:/datas

Import the data with multiple threads:

time myloader -u root -S /datas/mysql/data/3308/mysqld.sock -P 3308 -p root -B test -d /datas/dump_arrival_record -t 32 

real 126m42.885s 
user 1m4.543s 
sys 0m4.267s

Disk space occupied by the table after the logical import:

du -h -d 1 /datas/mysql/data/3308/test/arrival_record.* 
12K /datas/mysql/data/3308/test/arrival_record.frm 
30G /datas/mysql/data/3308/test/arrival_record.ibd

This matches the table's logical size on the source instance; the fragmentation disappears with a logical import. Back up the datadir so that each index-change test can start from the same data:

cp -rp /datas/mysql/data/3308 /datas

Both online DDL and the pt-osc tool were used to test deleting and rebuilding the indexes.

Because the foreign key column (product_id) is the leading column of the composite index, the foreign key constraint must be dropped before the composite index can be deleted.
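The reason is that InnoDB refuses to drop an index that a foreign key constraint depends on; roughly (the error text below is MySQL's standard ER_DROP_INDEX_FK message, shown as an illustration):

alter table arrival_record drop index IXFK_arrival_record;
-- ERROR 1553 (HY000): Cannot drop index 'IXFK_arrival_record': needed in a foreign key constraint
-- So FK_arrival_record_product must be dropped first, then the index.

The timing log from the test run: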

nohup bash /tmp/ddl_index.sh & 
2019-04-04-10:41:39 begin stop mysqld_3308 
2019-04-04-10:41:41 begin rm -rf datadir and cp -rp datadir_bak 
2019-04-04-10:46:53 start mysqld_3308 
2019-04-04-10:46:59 online ddl begin 
2019-04-04-11:20:34 online ddl stop 
2019-04-04-11:20:34 begin stop mysqld_3308 
2019-04-04-11:20:36 begin rm -rf datadir and cp -rp datadir_bak 
2019-04-04-11:22:48 start mysqld_3308 
2019-04-04-11:22:53 pt-osc begin 
2019-04-04-12:19:15 pt-osc stop

Online DDL took 34 minutes and pt-osc took 57 minutes; online DDL needed about half the time of the pt-osc tool.
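For the online DDL path, MySQL 5.6+ can be told explicitly to run the change in place without blocking DML; a minimal sketch (the ALGORITHM/LOCK clauses make the server fail fast if the online path is not possible):

alter table arrival_record
  add index idx_receive_time (receive_time),
  algorithm=inplace, lock=none;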


6. Implementation

Since this is a master/slave instance and the application connects through a VIP, online DDL was used to delete and rebuild the indexes.

Replication is stopped and the DDL is first executed on the slave instance (without recording binlog); then a master/slave switchover is performed, and the DDL is executed again on the newly switched slave instance (again without recording binlog):

function red_echo () {
    local what="$*"
    echo -e "$(date +%F-%T) ${what}"
}

function check_las_comm () {
    if [ "$1" != "0" ]; then
        red_echo "$2"
        echo "exit 1"
        exit 1
    fi
}

red_echo "stop slave"
mysql -uroot -p$passwd --socket=/datas/mysql/data/${port}/mysqld.sock -e "stop slave"
check_las_comm "$?" "stop slave failed"

red_echo "online ddl begin"
mysql -uroot -p$passwd --socket=/datas/mysql/data/${port}/mysqld.sock -e "set sql_log_bin=0; select now() as ddl_start; ALTER TABLE $db_.\`${table_name}\` DROP FOREIGN KEY FK_arrival_record_product, drop index IXFK_arrival_record, add index idx_product_id_sequence_station_no(product_id,sequence,station_no), add index idx_receive_time(receive_time); select now() as ddl_stop" >> ${log_file} 2>&1
red_echo "online ddl stop"

red_echo "add foreign key"
mysql -uroot -p$passwd --socket=/datas/mysql/data/${port}/mysqld.sock -e "set sql_log_bin=0; ALTER TABLE $db_.${table_name} ADD CONSTRAINT _FK_${table_name}_product FOREIGN KEY (product_id) REFERENCES cq_new_cimiss.product (id) ON DELETE NO ACTION ON UPDATE NO ACTION;" >> ${log_file} 2>&1
check_las_comm "$?" "add foreign key error"
red_echo "add foreign key stop"

red_echo "start slave"
mysql -uroot -p$passwd --socket=/datas/mysql/data/${port}/mysqld.sock -e "start slave"
check_las_comm "$?" "start slave failed"

Execution time:

2019-04-08-11:17:36 stop slave 
mysql: [Warning] Using a password on the command line interface can be insecure. 
ddl_start 
2019-04-08 11:17:36 
ddl_stop 
2019-04-08 11:45:13 
2019-04-08-11:45:13 online ddl stop 
2019-04-08-11:45:13 add foreign key 
mysql: [Warning] Using a password on the command line interface can be insecure. 
2019-04-08-12:33:48 add foreign key stop 
2019-04-08-12:33:48 start slave

Deleting and rebuilding the indexes took 28 minutes, and adding the foreign key constraint took another 48 minutes.

Check the execution plans of the delete and select statements again:

explain select count(*) from arrival_record where receive_time < STR_TO_DATE('2019-03-10', '%Y-%m-%d')\G 
*************************** 1. row *************************** 
id: 1 
select_type: SIMPLE 
table: arrival_record 
partitions: NULL 
type: range 
possible_keys: idx_receive_time 
key: idx_receive_time 
key_len: 6 
ref: NULL 
rows: 7540948 
filtered: 100.00 
Extra: Using where; Using index 

explain select count(*) from arrival_record where product_id=26 and receive_time between '2019-03-25 14:00:00' and '2019-03-25 15:00:00' and receive_spend_ms>=0\G; 
*************************** 1. row *************************** 
id: 1 
select_type: SIMPLE 
table: arrival_record 
partitions: NULL 
type: range 
possible_keys: idx_product_id_sequence_station_no,idx_receive_time 
key: idx_receive_time 
key_len: 6 
ref: NULL 
rows: 291448 
filtered: 16.66 
Extra: Using index condition; Using where

Both now use the idx_receive_time index, and the number of rows scanned is significantly reduced.

7. After index optimization

The delete still took 77s:

delete from arrival_record where receive_time < STR_TO_DATE('2019-03-10', '%Y-%m-%d')\G

The delete statement removed more than 3 million records through the receive_time index in about 77s.

8. Optimize the large-table delete into small-batch deletes

The application side has been optimized to delete 10 minutes of data at a time (each execution takes about 1s), and the SLA (master/slave delay) alarm no longer appears in XXX.
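The window-by-window delete presumably looks like the following (the dates are illustrative; the application advances the 10-minute window on each run, and each slice is resolved through idx_receive_time):

-- Delete one 10-minute slice of expired data per run.
delete from arrival_record
where receive_time >= '2019-02-23 00:00:00'
  and receive_time <  '2019-02-23 00:10:00';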

Another approach is to delete 20,000 records at a time in primary-key order:

select max(id) into @need_delete_max_id from arrival_record where receive_time < '2019-03-01';
delete from arrival_record where id < @need_delete_max_id limit 20000;
select ROW_COUNT();  -- returns 20000
-- Repeat the small-batch delete until ROW_COUNT() returns 0, then exit the loop:
delete from arrival_record where id < @need_delete_max_id limit 20000;
select ROW_COUNT();
-- the program sleeps 0.5s between batches
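The same loop can also be driven inside MySQL with a stored procedure; a sketch under assumed names (the original drives the loop from application code, sleeping 0.5s between batches):

delimiter //
create procedure purge_arrival_record_batch()
begin
  declare rows_deleted int default 1;
  -- Fix the upper bound once, so every batch is a pure primary-key range scan.
  select max(id) into @need_delete_max_id
    from arrival_record where receive_time < '2019-03-01';
  while rows_deleted > 0 do
    delete from arrival_record where id < @need_delete_max_id limit 20000;
    set rows_deleted = row_count();  -- 0 once nothing is left to delete
    do sleep(0.5);                   -- give replication time to catch up
  end while;
end //
delimiter ;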

9. Summary

When a table holds a large amount of data, pay attention both to the response time of access to the table and to its maintenance cost (for example, DDL changes and deleting historical data take a long time).

When performing DDL on a large table, consider the table's actual situation (for example, the concurrency on the table and whether it has foreign keys) to choose a suitable DDL change method.

When deleting large amounts of data from a big table, delete in small batches to reduce the pressure on the master instance and the master/slave delay.

Original link: www.cnblogs.com/YangJiaXin/…
