This article is from huawei cloud MySQL RESEARCH and development team. It mainly shares the backup process of MySQL backup tool Xtrabackup, the optimization and improvement made by Huawei cloud database team, as well as possible problems and solutions in use. This article mainly discusses Huawei cloud RDS for MySQL and user-built community MySQL database, hoping to help you understand and use Xtrabackup, and face Xtrabackup more calmly in the future.

Introduction to Xtrabackup

Xtrabackup is an open source backup tool developed by Percona team for physical hot backup of MySQL database. It has the characteristics of fast backup speed, backup data compression, automatic verification of backup data, streaming output, and almost no impact on services during backup. Xtrabackup is widely used by various cloud vendors as a MySQL backup tool.

Xtrabackup has two versions: Xtrabackup 2.4.x and 8.0.x. Xtrabackup is used to backup MySQL 5.x and MySQL 8.0.x respectively. Here we introduce how Xtrabackup backs up MySQL community edition and the backup principle of Xtrabackup on Huawei Cloud.

MySQL Xtrabackup backup

Xtrabackup is designed for Percona MySQL and also supports backing up the official community version of MySQL as shown in the following figure:

Figure 1: Xtrabackup backup official MySQL flow diagram

  1. Compatibility check: Xtrabackup community version only supports tables of MyISAM, InnoDB, CSV and MRG_MYISAM. Tables of other storage engines are not backed up. In this step, Xtrabackup will not backup the table if the storage engine that contains the table is not one of the above four engines by querying tables.
  2. Start the redo log backup thread: Start the redo log backup thread and back up all incremental redo logs from the last checkpoint LSN of the backup instance until the backup task is complete.
  3. Load all InnoDB table Spaces: Open and scan all innoDB table data files, check the first page of all table Spaces, initialize the memory structure of all tables.
  4. Back up innoDB tables: Walk through the memory structure of the tables built in Step 3, back up the data files of each InnoDB table, and check whether the data of each page is correct during the backup process.
  5. FLUSH TABLES WITH READ LOCK (FTWRL) : FLUSH TABLES WITH READ LOCK (FTWRL) : FLUSH TABLES WITH READ LOCK (FTWRL) : FLUSH TABLES WITH READ LOCK (FTWRL) : FLUSH TABLES WITH READ LOCK (FTWRL)
  6. Back up non-InnoDB tables: It is safe to back up non-InnoDB tables at this time because we have locked the instance in Step 5.
  7. Record the current GTID information of binlog: Note that we still hold the global read lock at this time. The purpose of this step is to use the backup set to quickly create a standby machine.
  8. stop redo Backup thread.
  9. Lock resources are released. Backup is complete.

Note that Xtrabackup 2.4.x differs from 8.0.x in steps 7 and 8. This difference is due to MySQL 8.0.x.

Huawei Cloud RDS for MySQL backup

When backing up a community edition MySQL instance, Xtrabackup will add a global read lock (FTWRL) to the instance. This lock has a serious impact on the services of the database, and even causes the database to “hang”, which is unacceptable to customers. Therefore, huawei Cloud MySQL team has optimized this process, mainly in two aspects:

  1. Add LOCK TABLES FOR BACKUP to MySQL 5.x and 0.x
  2. MySQL 5.x add binlog LOCK: LOCK binlog FOR BACKUP

After optimization, huawei Cloud Xtrabackup backs up MySQL as follows:

Figure 2 Xtrabackup process for backing up huawei cloud MySQL

Compared to FTWRL, LOCK TABLES FOR BACKUP has little impact on client instances. The LOCK process is simple. DML operations on InnoDB TABLES are not affected during the LOCK process, but all update operations and DDL operations on non-InnoDB TABLES are still not allowed.

After backing up all table files, Xtrabackup needs to obtain binlog GTID information.

  • FOR MySQL 5.x, Xtrabackup 2.4.x performs the LOCK BINLOG FOR BACKUP operation, locks the BINLOG, and obtains the GTID information.
  • For MySQL 8.0.x, Huawei Cloud Xtrabackup 8.0.x uses the official consistency backup point query method. When Xtrabackup queries log_status, the MySQL server uses lightweight locks on redo logs, binlogs, etc., to obtain consistent backup points. This process is very short and has little impact on the running of the instance. MySQL 8.0.x backup consistency points: redo log LSN and binlog GTID; Xtrabackup backs up the last binlog file to determine whether the transaction needs to be rolled back during recovery. Finally, the redo log backup thread task stops when the redo log LSN is greater than the redo log LSN at the backup consistency point.

Since Xtrabackup 2.4.x and 8.0.x handle binlogs differently, the recovery process is also different, which will be explained in more detail in a future article.

4. Common problems and solutions

Huawei Cloud has used Xtrabackup to provide backup services for almost all MySQL instances of the company. During the process of using Xtrabackup, we actively keep in touch with the community and report some problems to the Percona community to help Xtrabackup to better evolution. In addition, if the community fails to rectify fatal problems, Huawei cloud database team will rectify them in time to ensure the correctness of backup data.

The following is a summary of the problems we may encounter in each stage of the Xtrabackup backup process, and analysis of their causes and corresponding solutions.

1. Check compatibility* * * *

  • Symptom: After Xtrabackup is started, Xtrabackup is suspended for a long time, and the redo log backup thread is not started.

Cause: The MDL lock cannot be obtained during Xtrabackup compatibility check. The Xtrabackup compatibility check is implemented by querying the imformation_schema.tables plug-in table:

“SELECT CONCAT(table_schema, ‘/’, table_name), engine FROM information_schema.tables WHERE engine NOT IN (‘MyISAM’, ‘InnoDB’, ‘CSV’, ‘MRG_MYISAM’) AND table_schema NOT IN (‘performance_schema’, ‘information_schema’, ‘mysql’)”

If there is a long DML or DDL statement in the MySQL instance, or if there is an MDL deadlock, the above query will be blocked waiting for the MDL lock, and Xtrabackup will “hang” for a long time.

Solution: If the reason for the lock is only because other SQL statements are blocked, wait for other SQL statements to complete. If a deadlock occurs, analyze the cause of the deadlock and remove the deadlock. Huawei Cloud RDS for MySQL provides the MDL lock view function to help users analyze SERVICE MDL deadlocks.

2.redo log Backup phase* * * *

  • Symptom 1: The redo log fails to be backed up. Xtrabackup displays the following error message:

“Xtrabackup: error:it looks like InnoDB log has wrapped around before xtrabackup could process all records due to either log copying Being too slow, or log files being too small.\n”);”

The reason: The backup speed of the redo log backup thread on Xtrabackup is lower than that of the redo log backup thread on Xtrabackup. Because the MySQL redo log file is written in round-robin mode, the newly written logs overwrite the previously written logs that have not been backed up. Therefore, the backup fails.

Solution: Backup during off-peak hours or increase the redo log size.

  • Symptom 2: Backup fails due to DDL operation, and the following error information is displayed:

“An Optimized (without redo logging) DDLoperation has been performed. All modified pages may not have been flushed to the disk yet.

PXB will not be able to take a consistent backup. Retry the backup operation”

Cause: DURING the backup process, a DDL operation was performed to create indexes for the MySQL instance. Because redo data is not written during index creation, data inconsistency may occur if you continue the backup. Therefore, Xtrabackup backup failure is expected in this scenario.

Workaround: Do not create indexes during the backup process. If necessary, it is recommended to directly include indexes in table statements or to use lock-DDL arguments for backups (blocking new DDL operations on the instance).

  • Symptom 3: The undo truncate backup fails, and the Xtrabackup error message is as follows:

“An undo DDL truncation (could be automatic) operation has been performed.”

Cause: During Xtrabackup backup, if undo TRUNCate occurs in MySQL instances, undo logs written into new undo files (with different space ids) may be lost, causing problems in restored data. This problem is fixed in Xtrabackup 8.0.14 (based on MySQL 8.0.21) using the redo log thread. If the redo log is parsed as the TRUNCate operation of undo log, the backup fails. Unfortunately, this fix does not completely resolve the issue, and there are still two scenarios where the community version of Xtrabackup can have inconsistent recovered data:

  1. MySQL 8.0.21;
  2. A user creates a new Undo TABLESPACE during the backup process.

Solution: During backup, disable the TRUNCate operation of undo tablespace and prohibit users from creating undo tablespace. In this way, inconsistent backup data can be effectively prevented. In addition, Huawei Cloud Xtrabackup has further repaired this problem, which can effectively prevent this kind of phenomenon.

3. Loading the tablespace* * * *

  • Symptom 1: Xtrabackup displays an error message: Too many open files

Cause: The operating system allows a limited number of files to be opened at the same time. Xtrabackup will open all table files at the same time during the load TABLESPACE phase. If the number of tables opened by Xtrabackup exceeds the limit, the backup fails.

Workaround: Scale up the operating system to allow the maximum number of files to be turned on at the same time, or use the lock-DDL parameter (blocking new DDL operations on the instance).

  • Symptom 2: A rename table backup fails, and the following error information is displayed:

“Trying to add tablespace ‘XXXX’ with ID XXX to the tablespace memory cache, but tablespace xxxx already exists in the cache! ;”

Cause: Xtrabackup does not lock the entire process of opening a tablespace. If a RENAME table occurs, the same tablespace may be loaded repeatedly. In this case, Xtrabackup will detect duplicate TABLESPACE ids, and therefore backup fails.

Percona Xtrabackup 2.4.x only supports single-thread loading of a table space. Huawei Xtrabackup supports multi-thread loading of a table space.

4. Backup innodb Table stage* * * *

  • Symptom: The innoDB table data file is damaged and the backup fails. The error message is as follows:

“Xtrabackup: Database Page Corruption detected at Page XXXX, retrying.”

Cause: Xtrabackup will check the checksum of each page when backing up innoDB table data files. If the checksum is not correct, then the backup fails. Then the data of MySQL instance has been corrupted (for example, disk silence error).

Solution: The backup can only succeed if the data is restored by restoring the previous backup data or using other methods, which will be covered in more detail in future articles.

Five, the conclusion

This article mainly introduces the backup principle of Xtrabackup, MySQL backup community edition and huawei cloud’s improvement on it, and shares the troubleshooting and solution of Xtrabackup common problems. We will also bring you more in-depth analysis and more practical use skills later. I hope this will help you understand and use Xtrabackup. We will continue to provide customers with better database services, and always protect the data security of customers.

 

Finally, tell everyone a good news, cloud database MySQL 19.9 yuan in package, power enterprise safe on cloud, welcome everyone to come to experience > > activity.huaweicloud.com/dbs_Promoti…