1. Background
On instance XX (one master, one slave), an SLA alarm is reported every morning, which means there is noticeable master/slave replication delay. (If a master/slave switchover happened at that moment, it would take a long time to complete: the slave first has to catch up on the delayed changes to guarantee master/slave data consistency.)
Instance XX also records the largest number of slow queries (any SQL taking more than 1s to execute is logged), and every night the XX application runs a job that deletes data older than one month.
2. Analysis
Use pt-query-digest to analyze the mysql-slow.log of the last week:
pt-query-digest --since=148h mysql-slow.log | less
Results Part I:
In the last week, the total recorded slow query execution time was 25403s, the maximum slow SQL execution time was 266s, the average slow SQL execution time was 5s, and the average number of scanned rows was 17.66 million.
Results Part II:
The select arrival_record statement accounts for the most slow queries, up to 40,000, with an average response time of 4s; the delete arrival_record statement accounts for six slow queries with an average response time of 258s.
3. The select xxx_record statement
The select arrival_record slow queries all look like the following; the fields in the where clause are the same, only the parameter values differ:
select count(*) from arrival_record where product_id=26 and receive_time between '2019-03-25 14:00:00' and '2019-03-25 15:00:00' and receive_spend_ms>=0\G
The maximum number of rows scanned by the select arrival_record statements was 56 million, and the average was 1.72 million.
View the execution plan:
explain select count(*) from arrival_record where product_id=26 and receive_time between '2019-03-25 14:00:00' and '2019-03-25 15:00:00' and receive_spend_ms>=0\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: arrival_record
partitions: NULL
type: ref
possible_keys: IXFK_arrival_record
key: IXFK_arrival_record
key_len: 8
ref: const
rows: 32261320
filtered: 3.70
Extra: Using index condition; Using where
1 row in set, 1 warning (0.00 sec)
The composite index IXFK_arrival_record is used, but the estimated number of rows to scan is still more than 30 million:
show index from arrival_record;
+----------------+------------+---------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table          | Non_unique | Key_name            | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------------+------------+---------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| arrival_record |          0 | PRIMARY             |            1 | id           | A         |   107990720 |     NULL | NULL   |      | BTREE      |         |               |
| arrival_record |          1 | IXFK_arrival_record |            1 | product_id   | A         |        1344 |     NULL | NULL   |      | BTREE      |         |               |
| arrival_record |          1 | IXFK_arrival_record |            2 | station_no   | A         |       22161 |     NULL | NULL   | YES  | BTREE      |         |               |
| arrival_record |          1 | IXFK_arrival_record |            3 | sequence     | A         |    77233384 |     NULL | NULL   |      | BTREE      |         |               |
| arrival_record |          1 | IXFK_arrival_record |            4 | receive_time | A         |    65854652 |     NULL | NULL   | YES  | BTREE      |         |               |
| arrival_record |          1 | IXFK_arrival_record |            5 | arrival_time | A         |    73861904 |     NULL | NULL   | YES  | BTREE      |         |               |
+----------------+------------+---------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

show create table arrival_record;
...
  arrival_spend_ms bigint(20) DEFAULT NULL,
  total_spend_ms bigint(20) DEFAULT NULL,
  PRIMARY KEY (id),
  KEY IXFK_arrival_record (product_id,station_no,sequence,receive_time,arrival_time) USING BTREE,
  CONSTRAINT FK_arrival_record_product FOREIGN KEY (product_id) REFERENCES product (id) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=614538979 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
① The table has about 100 million rows in total, and there is only one composite index on it. The product_id column has a very low cardinality, so its selectivity is poor (a quick way to measure selectivity is sketched after this list).
② Incoming filtering conditions:
where product_id=26 and receive_time between '2019-03-25 14:00:00' and '2019-03-25 15:00:00' and receive_spend_ms>=0
The condition contains no station_no, so beyond product_id the station_no, sequence, and receive_time columns of IXFK_arrival_record cannot be used.
③ By the leftmost-prefix rule, the query therefore only uses the first column of the composite index, product_id, and because its selectivity is so poor, tens of millions of rows are scanned and execution takes a long time.
④ The receive_time column has a high cardinality and good selectivity; if a standalone index is created on it, the select arrival_record SQL will use that index.
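Point ① can be sanity-checked directly. A minimal sketch, not from the original article: approximate a column's selectivity as COUNT(DISTINCT col)/COUNT(*); values near 1 filter well, values near 0 do not. On a table of ~100 million rows this is a full scan, so run it on a replica or against a sample:

-- Rough selectivity check (illustrative; expensive on a table this size)
SELECT COUNT(DISTINCT product_id)   / COUNT(*) AS product_id_selectivity,
       COUNT(DISTINCT receive_time) / COUNT(*) AS receive_time_selectivity
FROM arrival_record;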
The slow log tells us that the select arrival_record queries filter on product_id, receive_time, and receive_spend_ms. Is the table also queried through other fields? Time to bring out tcpdump.
Capture the select statements sent to this instance with tcpdump:
tcpdump -i bond0 -s 0 -l -w - dst port 3316 | strings | grep select | egrep -i 'arrival_record' >/tmp/select_arri.log
Extract the part after from (table name plus where condition) from each captured select statement:
IFS_OLD=$IFS
IFS=$'\n'
for i in `cat /tmp/select_arri.log`; do echo ${i#*'from'}; done | less
IFS=$IFS_OLD
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=17 and arrivalrec0_.station_no='56742'
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=22 and arrivalrec0_.station_no='S7100'
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=24 and arrivalrec0_.station_no='V4631'
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=22 and arrivalrec0_.station_no='S9466'
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=24 and arrivalrec0_.station_no='V4205'
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=24 and arrivalrec0_.station_no='V4105'
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=24 and arrivalrec0_.station_no='V4506'
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=24 and arrivalrec0_.station_no='V4617'
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=22 and arrivalrec0_.station_no='S8356'
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=22 and arrivalrec0_.station_no='S8356'
The tcpdump capture shows that the other select statements against this table all filter on sequence, product_id, and station_no with equality conditions, so one composite index on those three columns can cover all of them.
In summary, the optimization is as follows (a condensed DDL sketch follows this list):
Drop the composite index IXFK_arrival_record;
Create the composite index idx_product_id_sequence_station_no (product_id, sequence, station_no);
Create a standalone index idx_receive_time (receive_time).
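Condensed into one statement, the change looks like the following; this mirrors the production script shown in section 6. The foreign key has to be dropped first (its column anchors the old composite index) and is re-created afterwards:

ALTER TABLE arrival_record
    DROP FOREIGN KEY FK_arrival_record_product,
    DROP INDEX IXFK_arrival_record,
    ADD INDEX idx_product_id_sequence_station_no (product_id, sequence, station_no),
    ADD INDEX idx_receive_time (receive_time);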
4. The delete xxx_record statement
The average number of rows scanned by this DELETE operation was 110 million, and the average execution time was 262s.
The delete statement looks like this, with different values passed in for each recorded slow query:
delete from arrival_record where receive_time < STR_TO_DATE('2019-02-23', '%Y-%m-%d')\G
explain select * from arrival_record where receive_time < STR_TO_DATE('2019-02-23', '%Y-%m-%d')\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: arrival_record
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 109501508
filtered: 33.33
Extra: Using where
1 row in set, 1 warning (0.00 sec)
The delete statement uses no index (there is no suitable index available) and performs a full table scan, hence the long execution time.
The fix is the same as above: create a standalone index idx_receive_time on receive_time.
5. Testing
Copy the arrival_record table to a test instance to rehearse the delete operation and the index rebuild.
First check the disk space occupied by the arrival_record table on the source instance:
du -sh /datas/mysql/data/3316/cq_new_cimiss/arrival_record*
12K /datas/mysql/data/3316/cq_new_cimiss/arrival_record.frm
48G /datas/mysql/data/3316/cq_new_cimiss/arrival_record.ibd
select count(*) from cq_new_cimiss.arrival_record;
+-----------+
| count(*)  |
+-----------+
| 112294946 |
+-----------+
More than 100 million records
SELECT table_name,
       CONCAT(FORMAT(SUM(data_length) / 1024 / 1024, 2), 'M') AS dbdata_size,
       CONCAT(FORMAT(SUM(index_length) / 1024 / 1024, 2), 'M') AS dbindex_size,
       CONCAT(FORMAT(SUM(data_length + index_length) / 1024 / 1024 / 1024, 2), 'G') AS table_size,
       AVG_ROW_LENGTH, table_rows, update_time
FROM information_schema.tables
WHERE table_schema = 'cq_new_cimiss' AND table_name = 'arrival_record';
+----------------+-------------+--------------+------------+----------------+------------+---------------------+
| table_name     | dbdata_size | dbindex_size | table_size | AVG_ROW_LENGTH | table_rows | update_time         |
+----------------+-------------+--------------+------------+----------------+------------+---------------------+
| arrival_record | 18268.02M   | 13868.05M    | 31.38G     |            175 |  109155053 | 2019-03-26 12:40:17 |
+----------------+-------------+--------------+------------+----------------+------------+---------------------+
The table occupies 48 GB on disk while its size inside MySQL is only 31 GB, so about 17 GB is fragmentation, mostly caused by the delete operations (rows are deleted, but the space is not reclaimed).
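As a cross-check (a sketch, not from the article): information_schema exposes the reclaimable space directly through DATA_FREE, and rebuilding the table would reclaim it:

-- Reclaimable (fragmented) space, in GB
SELECT table_name,
       ROUND(data_free / 1024 / 1024 / 1024, 2) AS data_free_gb
FROM information_schema.tables
WHERE table_schema = 'cq_new_cimiss' AND table_name = 'arrival_record';

-- A null rebuild would reclaim it, at the cost of a long, I/O-heavy copy on a table this size:
-- ALTER TABLE arrival_record ENGINE=InnoDB;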
Restore the table to a new instance, drop the original composite index, and add the new indexes for testing.
Back up the table with mydumper:
user=root
passwd=xxxx
socket=/datas/mysql/data/3316/mysqld.sock
db=cq_new_cimiss
table_name=arrival_record
backupdir=/datas/dump_$table_name
mkdir -p $backupdir
nohup echo `date +%T` && mydumper -u $user -p $passwd -S $socket -B $db -c -T $table_name -o $backupdir -t 32 -r 2000000 && echo `date +%T` &
The parallel compressed backup took 52s and 1.2 GB of space (the table actually occupies 48 GB on disk, so mydumper's compression ratio is quite high):
Started dump at: 2019-03-26 12:46:04
........
Finished dump at: 2019-03-26 12:46:56

du -sh /datas/dump_arrival_record/
1.2G /datas/dump_arrival_record/
Copy dump data to test node:
scp -rp /datas/dump_arrival_record [email protected]:/datas
Multithreaded import data:
time myloader -u root -S /datas/mysql/data/3308/mysqld.sock -P 3308 -p root -B test -d /datas/dump_arrival_record -t 32
real 126m42.885s
user 1m4.543s
sys 0m4.267s
Disk space occupied by the logical import of this table:
du -h -d 1 /datas/mysql/data/3308/test/arrival_record.*
12K /datas/mysql/data/3308/test/arrival_record.frm
30G /datas/mysql/data/3308/test/arrival_record.ibd
After the logical import the table occupies 30 GB on disk, consistent with the table size reported inside MySQL (the fragmentation is gone after the reload).
Back up the data directory so each test can start from the same state:

cp -rp /datas/mysql/data/3308 /datas
Delete and rebuild the indexes with online DDL and with the pt-osc tool respectively, to compare the two.
The foreign key must be dropped first; otherwise the composite index cannot be dropped, because the foreign key column is the first column of that composite index.
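For illustration (assumed InnoDB behavior, not output captured in the article), dropping the index while the foreign key still exists fails:

-- Fails while FK_arrival_record_product still exists:
ALTER TABLE arrival_record DROP INDEX IXFK_arrival_record;
-- ERROR 1553 (HY000): Cannot drop index 'IXFK_arrival_record': needed in a foreign key constraint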
nohup bash /tmp/ddl_index.sh &
2019-04-04-10:41:39 begin stop mysqld_3308
2019-04-04-10:41:41 begin rm -rf datadir and cp -rp datadir_bak
2019-04-04-10:46:53 start mysqld_3308
2019-04-04-10:46:59 online ddl begin
2019-04-04-11:20:34 online ddl stop
2019-04-04-11:20:34 begin stop mysqld_3308
2019-04-04-11:20:36 begin rm -rf datadir and cp -rp datadir_bak
2019-04-04-11:22:48 start mysqld_3308
2019-04-04-11:22:53 pt-osc begin
2019-04-04-12:19:15 pt-osc stop
Online DDL took 34 minutes and pt-osc took 57 minutes; online DDL takes about half the time of the pt-osc tool.
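One more advantage of online DDL (a sketch, assuming MySQL 5.6+): the expected algorithm and locking can be stated explicitly, so the statement fails fast instead of silently falling back to a blocking table copy:

-- Request an in-place, non-blocking index build; error out if that is impossible
ALTER TABLE arrival_record
    ADD INDEX idx_receive_time (receive_time),
    ALGORITHM = INPLACE, LOCK = NONE;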
6. Implementation
Since this is a master/slave pair and the application connects through a VIP, the indexes are deleted and rebuilt with online DDL.
After stopping replication, the DDL is run first on the slave (without writing binlog); after a master/slave switchover, it is run again on the newly demoted slave (again without writing binlog):
function red_echo () {
    local what="$*"
    echo -e "$(date +%F-%T) ${what}"
}

function check_las_comm () {
    if [ "$1" != "0" ]; then
        red_echo "$2"
        echo "exit 1"
        exit 1
    fi
}

red_echo "stop slave"
mysql -uroot -p$passwd --socket=/datas/mysql/data/${port}/mysqld.sock -e"stop slave"
check_las_comm "$?" "stop slave failed"

red_echo "online ddl begin"
mysql -uroot -p$passwd --socket=/datas/mysql/data/${port}/mysqld.sock -e"set sql_log_bin=0; select now() as ddl_start; ALTER TABLE $db_.\`${table_name}\` DROP FOREIGN KEY FK_arrival_record_product, drop index IXFK_arrival_record, add index idx_product_id_sequence_station_no(product_id,sequence,station_no), add index idx_receive_time(receive_time); select now() as ddl_stop" >>${log_file} 2>&1
red_echo "online ddl stop"

red_echo "add foreign key"
mysql -uroot -p$passwd --socket=/datas/mysql/data/${port}/mysqld.sock -e"set sql_log_bin=0; ALTER TABLE $db_.${table_name} ADD CONSTRAINT _FK_${table_name}_product FOREIGN KEY (product_id) REFERENCES cq_new_cimiss.product (id) ON DELETE NO ACTION ON UPDATE NO ACTION;" >>${log_file} 2>&1
check_las_comm "$?" "add foreign key error"
red_echo "add foreign key stop"

red_echo "start slave"
mysql -uroot -p$passwd --socket=/datas/mysql/data/${port}/mysqld.sock -e"start slave"
check_las_comm "$?" "start slave failed"
Execution time:
2019-04-08-11:17:36 stop slave
mysql: [Warning] Using a password on the command line interface can be insecure.
ddl_start
2019-04-08 11:17:36
ddl_stop
2019-04-08 11:45:13
2019-04-08-11:45:13 online ddl stop
2019-04-08-11:45:13 add foreign key
mysql: [Warning] Using a password on the command line interface can be insecure.
2019-04-08-12:33:48 add foreign key stop
2019-04-08-12:33:48 start slave
It takes 28 minutes to delete and rebuild indexes, and 48 minutes to add foreign key constraints.
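Most of those 48 minutes go into validating every existing row against the parent table. A possible shortcut, not what the article did: with foreign_key_checks disabled, MySQL adds the constraint in place and skips the validation, which means you must already trust the existing data:

-- Assumed alternative, not from the article: skip validation of existing rows
SET foreign_key_checks = 0;
ALTER TABLE arrival_record
    ADD CONSTRAINT _FK_arrival_record_product FOREIGN KEY (product_id)
    REFERENCES cq_new_cimiss.product (id) ON DELETE NO ACTION ON UPDATE NO ACTION;
SET foreign_key_checks = 1;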
Look again at the execution plans of the delete and select statements:
explain select count(*) from arrival_record where receive_time < STR_TO_DATE('2019-03-10', '%Y-%m-%d')\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: arrival_record
partitions: NULL
type: range
possible_keys: idx_receive_time
key: idx_receive_time
key_len: 6
ref: NULL
rows: 7540948
filtered: 100.00
Extra: Using where; Using index
explain select count(*) from arrival_record where product_id=26 and receive_time between '2019-03-25 14:00:00' and '2019-03-25 15:00:00' and receive_spend_ms>=0\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: arrival_record
partitions: NULL
type: range
possible_keys: idx_product_id_sequence_station_no,idx_receive_time
key: idx_receive_time
key_len: 6
ref: NULL
rows: 291448
filtered: 16.66
Extra: Using index condition; Using where
Both now use the idx_receive_time index, and the number of rows scanned has dropped significantly.
7. After index optimization
Delete still takes 77s:
delete from arrival_record where receive_time < STR_TO_DATE('2019-03-10', '%Y-%m-%d')\G
Even through the receive_time index, deleting more than 3 million rows in one statement took the delete 77s.
8. Optimize the big delete into small-batch deletes
The application was changed to delete only 10 minutes' worth of data at a time (each execution takes about 1s), and the SLA (master/slave delay) alarm no longer appears on XXX.
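A sketch of one such batch (the window boundaries are illustrative); with idx_receive_time in place, each batch is a small range scan:

DELETE FROM arrival_record
WHERE receive_time >= '2019-02-23 00:00:00'
  AND receive_time <  '2019-02-23 00:10:00';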
Another method is to delete 20000 records at a time in primary key order:
-- Get the maximum primary key id that satisfies the time condition (run once):
SELECT MAX(id) INTO @need_delete_max_id FROM `arrival_record` WHERE receive_time < '2019-03-01';

-- Then loop, deleting 20000 rows per batch in primary key order:
DELETE FROM arrival_record WHERE id < @need_delete_max_id LIMIT 20000;
select ROW_COUNT();
-- The program checks ROW_COUNT() after each batch: if it is not 0, run the loop again;
-- if it is 0, the deletion is complete and the loop exits.
-- The program sleeps 0.5s between batches.
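The article drives this loop from application code; purely for illustration, here is the same logic as a stored procedure (the procedure name is ours, not the article's):

DELIMITER $$
CREATE PROCEDURE purge_arrival_record()
BEGIN
    DECLARE max_id   BIGINT;
    DECLARE affected INT DEFAULT 1;
    -- Maximum primary key satisfying the time condition, computed once
    SELECT MAX(id) INTO max_id FROM arrival_record WHERE receive_time < '2019-03-01';
    WHILE affected > 0 DO
        DELETE FROM arrival_record WHERE id < max_id LIMIT 20000;
        SET affected = ROW_COUNT();   -- rows deleted by the batch above
        DO SLEEP(0.5);                -- throttle so the slave can keep up
    END WHILE;
END$$
DELIMITER ;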
9. Summary
When a table holds a very large amount of data, pay attention not only to the response time of queries against it, but also to its maintenance cost (for example, DDL changes and deleting historical data both take a long time).
When performing DDL on a large table, consider the table's actual situation (for example, how heavily it is accessed concurrently and whether it has foreign keys) to choose the appropriate DDL change method.
When deleting a large amount of data from a big table, delete in small batches to reduce the pressure on the master instance and the master/slave delay.
Original link: www.cnblogs.com/YangJiaXin/… Sourced from the web, for learning purposes only; contact us for deletion in case of infringement.