Abstract: This article walks through the whole Roach full backup flow and describes how a cluster-level full backup works in Roach.
Data backup and recovery is one of the key ways to protect data. The Roach tool supports several types of backup and recovery, such as cluster-level physical backup and table-level logical backup. It supports DISK, NBU, OBS, and EISOO as backup media, and it supports disaster recovery (DR) between two clusters to ensure data reliability.
Roach supports the following features:
[Figure: list of Roach features]
This article walks through the whole Roach full backup flow and describes how a cluster-level full backup works. PITR, incremental backup, resumable backup, dual-cluster DR, table-level backup, and other functions are not covered.
Roach's cluster-level backup is a physical backup: the database is backed up by copying its physical files, and it can be fully restored from the backed-up data files and logs. A full backup copies all data in the database as of the current point in time. In general, a full backup can cover an entire hard drive, an entire partition, or a specific directory. Its main benefit is easy recovery: because all data is in one backup set, restoring that set restores everything.
The advantages are that physical backup is fast and that, with reasonable planning, the cost of backup and recovery is low.
The disadvantage is that a full backup takes longer than an incremental backup.
2. Backup architecture
2.1 Overall Roach Backup Flow
Roach backup uses the producer-consumer model. The following figure shows how the threads and I/O interact during backup. The exec thread acts as the producer, and parallel reader threads are added to read small files, which reduces the load on the exec thread. The sender thread acts as the consumer. The two sides are connected by a large buffer (256MB, configurable).
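The producer-consumer arrangement described above can be sketched in a few lines of Python. This is only an illustration of the pattern, not Roach's implementation: the function names, the tiny chunk size, and the use of in-memory byte strings in place of real files are all assumptions made for the example, and a bounded queue stands in for the 256MB buffer.

```python
import queue
import threading

CHUNK = 4            # tiny chunk size so the demo is easy to follow (assumption)
BUFFER_SLOTS = 8     # stand-in for the configurable 256MB buffer (assumption)
DONE = object()      # sentinel that tells the sender to stop


def reader(blobs, buf):
    """Producer: split each in-memory 'file' into chunks and enqueue them."""
    for name, data in blobs:
        for off in range(0, len(data), CHUNK):
            buf.put((name, data[off:off + CHUNK]))  # blocks while the buffer is full


def sender(buf, out):
    """Consumer: drain the buffer and 'send' chunks to the media (a dict here)."""
    while True:
        item = buf.get()
        if item is DONE:
            break
        name, chunk = item
        out[name] = out.get(name, b"") + chunk


def run_backup(large_files, small_files, n_readers=2):
    """Run one sender plus n_readers helper producers around a bounded buffer."""
    buf = queue.Queue(maxsize=BUFFER_SLOTS)
    out = {}
    consumer = threading.Thread(target=sender, args=(buf, out))
    consumer.start()
    # Parallel reader threads take the small files off the exec thread's hands.
    shares = [small_files[i::n_readers] for i in range(n_readers)]
    readers = [threading.Thread(target=reader, args=(s, buf)) for s in shares]
    for t in readers:
        t.start()
    reader(large_files, buf)   # the calling thread plays the role of the exec thread
    for t in readers:
        t.join()
    buf.put(DONE)              # all producers are finished, so the sender may stop
    consumer.join()
    return out
```

Because each file is read by exactly one producer and the queue is first-in, first-out, the single sender sees the chunks of any given file in order. The bounded queue also provides back-pressure: producers pause when the sender falls behind.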
The backup components of the GaussDB kernel are GaussRoach.py and gs_roach. Backup jobs must be started from within the cluster.
2.2 Scheduling Process
GaussRoach.py: the Roach entry point for a full backup of a single cluster. Each run of python GaussRoach.py -t backup … starts Roach's Python layer.
GaussRoach.py can be started on any node, which then acts as the master node. It starts a gs_roach process on every node, and these processes back up their local nodes, including all DNs on each node, in parallel.
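The fan-out from one master to one agent per node can be pictured with the sketch below. It is an assumption-laden illustration only: backup_node is a hypothetical stand-in for the work a gs_roach process does, the cluster is a plain dictionary, and threads replace the real remote process launch, whose mechanism this article does not describe.

```python
from concurrent.futures import ThreadPoolExecutor


def backup_node(node, instances):
    """Hypothetical agent: back up every instance on one node, return a summary."""
    return {"node": node, "backed_up": sorted(instances)}


def run_master(cluster, agent=backup_node):
    """Start one agent per node in parallel and collect the results by node."""
    with ThreadPoolExecutor(max_workers=max(1, len(cluster))) as pool:
        futures = {node: pool.submit(agent, node, insts)
                   for node, insts in cluster.items()}
        # result() re-raises any agent exception, so one failed node fails the job.
        return {node: fut.result() for node, fut in futures.items()}
```

For example, run_master({"node1": ["dn_1", "dn_2"], "node2": ["dn_3"]}) returns one summary per node, and the call only returns once every node has finished.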
2.3 Backup process
2.4 Interface Invocation
After the data has been backed up and compressed, the compressed files must be transferred to the remote storage medium. An interface layer keeps the storage medium and the Roach backup process loosely coupled. A third-party medium does not need to know about the Roach backup flow; it only has to implement the interface to work with Roach. Likewise, the Roach flow does not need to know how the underlying storage medium is implemented, which avoids unnecessary branching.
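The loose coupling described above is the familiar "program to an interface" idea. The sketch below shows it with a hypothetical BackupMedia base class and a disk-backed implementation. None of these class or method names come from Roach, and the real media interface is not shown in this article, so treat this purely as an illustration.

```python
import os
import shutil
from abc import ABC, abstractmethod


class BackupMedia(ABC):
    """Hypothetical media interface; the names are illustrative, not Roach's API."""

    @abstractmethod
    def put(self, local_path, remote_name):
        """Store one compressed backup file under remote_name."""

    @abstractmethod
    def get(self, remote_name, local_path):
        """Fetch one backup file back to local_path."""


class DiskMedia(BackupMedia):
    """Simplest possible implementation: a directory on local disk."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def put(self, local_path, remote_name):
        shutil.copyfile(local_path, os.path.join(self.root, remote_name))

    def get(self, remote_name, local_path):
        shutil.copyfile(os.path.join(self.root, remote_name), local_path)


def upload_all(media, paths):
    """The backup side only sees the interface, never the concrete media type."""
    for p in paths:
        media.put(p, os.path.basename(p))
```

Supporting another medium would then mean adding one more subclass, with no change to upload_all and no per-media branch inside the backup flow.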
2.5 Full Backup
Based on the functional structure of the Gauss database, the required files are backed up in the following order:
· Database-related configuration files.
· All row-store data: GaussDB A supports row storage.
· Xlog files: Roach supports backup while services remain online. The xlog files are backed up so that, during recovery, the operations that ran during the backup can be redone, which ensures data consistency.
· All column-store data: GaussDB A supports column storage.
· Data is backed up on a per-node basis, so each node stores only its own backup.
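As a small illustration of the ordering above, the sketch below sorts a mixed file list into the four groups in the stated order. The category labels and the sample paths are invented for the example; how Roach actually classifies files is not covered here.

```python
# Category names are illustrative labels for the four groups listed above.
BACKUP_ORDER = ["config", "row_store", "xlog", "column_store"]


def order_files(files):
    """Sort (category, path) pairs into the backup order, stable within a group."""
    rank = {cat: i for i, cat in enumerate(BACKUP_ORDER)}
    return sorted(files, key=lambda f: rank[f[0]])
```

Because Python's sort is stable, files within the same group keep the order in which they were listed.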
2.6 Process Analysis
An analysis of the Roach backup scheduling flow diagram and the log output follows:
The upper layer of the whole backup flow is Python code, namely GaussRoach.py. Python creates the master process and starts the agent processes, and the Python side also checks the backup configuration and parameters. Once the agent process on each node has been started, the C-side code performs the actual backup work. While the C-side code is running, you can check the status of the gs_roach process with ps ux.
· Until ③ is reached, the backup has not started;
· ②: The metadata list is the list of files backed up in this run;
· The percentage shown during backup indicates progress through the process, not progress in terms of time;
· Until ④ is printed, the backup has not finished, even if the progress shows 100%;
· ⑤: After the deferred DDL parameter is enabled, DDL statements are executed only logically, and operations on physical files do not take effect immediately;
· ⑥: Data changes made after the barrier is created are backed up through xlog records;
· ⑨: After the deferred DDL parameter is turned off, all delayed operations on physical files are executed immediately;
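The pairing of ⑤ and ⑨ is essentially an "enable, do the work, always disable" pattern. The sketch below expresses that pattern with a Python context manager and a fake database object that only records calls. Every method name here is hypothetical, and the order of the steps between ⑤ and ⑨ is an assumption made for the illustration, not something taken from the Roach logs.

```python
from contextlib import contextmanager


@contextmanager
def deferred_ddl(db):
    """Enable deferred DDL for the duration of the block, then always disable it."""
    db.set_deferred_ddl(True)       # cf. ⑤: physical file operations are postponed
    try:
        yield
    finally:
        db.set_deferred_ddl(False)  # cf. ⑨: postponed operations run now


class FakeDB:
    """Stand-in that just records what was called, in order."""

    def __init__(self):
        self.events = []

    def set_deferred_ddl(self, on):
        self.events.append("deferred_ddl_on" if on else "deferred_ddl_off")

    def create_barrier(self):
        self.events.append("create_barrier")

    def copy_data_files(self):
        self.events.append("copy_data_files")

    def backup_xlog(self):
        self.events.append("backup_xlog")


def full_backup(db):
    """Hypothetical step order between enabling and disabling deferred DDL."""
    with deferred_ddl(db):
        db.create_barrier()     # cf. ⑥: later changes are covered by xlog records
        db.copy_data_files()
        db.backup_xlog()
    return db.events
```

The try/finally inside the context manager means the deferred DDL switch is turned off even if a step in the middle raises, so physical file operations are not left postponed after a failed backup.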
2.7 Storage Mode of Backup Sets
· Backup data is compressed and written to .rch files, which are stored in the instance folder under the backup path. Each .rch file is 4GB in size;
· Storage path: [store path]/roach/backupkey/hostname/, where store path is the value of --media-destination specified in the backup command, backupkey identifies a specific backup set by the start time of that backup, and hostname is the host name of the current node. As follows:
· Internal structure of a Roach compressed file (*.rch):
· Backup control metadata is stored in the path given by --metadata-destination in the backup command. The contents of the path are as follows:
· The meta information of each backup is recorded in the INI file:
{"BackupCount": 1, "BackupDetails": [{"S_NO": 1, "BackupKey": "20190814_163625", "BackupType": "FULL", ….}
· The roach folder stores the meta information for a specific backup.
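To make the path layout and the meta record above more concrete, here is a minimal sketch that builds the per-node storage path, reads a backup key as a timestamp, and lists the FULL backups in a record. It assumes the record is the JSON object shown above, with the elided fields simply left out. The helper names and the sample store path are hypothetical.

```python
import json
import posixpath
from datetime import datetime

# Only the fields visible in the article; the elided ones are left out.
SAMPLE = ('{"BackupCount": 1, "BackupDetails": '
          '[{"S_NO": 1, "BackupKey": "20190814_163625", "BackupType": "FULL"}]}')


def backup_dir(store_path, backup_key, hostname):
    """Build [store path]/roach/backupkey/hostname/ for one node."""
    return posixpath.join(store_path, "roach", backup_key, hostname) + "/"


def key_to_time(backup_key):
    """Interpret a backup key such as 20190814_163625 as a start time."""
    return datetime.strptime(backup_key, "%Y%m%d_%H%M%S")


def full_backup_keys(meta_text):
    """Return the keys of all FULL backups recorded in the meta information."""
    meta = json.loads(meta_text)
    return [d["BackupKey"] for d in meta["BackupDetails"]
            if d["BackupType"] == "FULL"]
```

With these helpers, full_backup_keys(SAMPLE) yields the single key from the record, and backup_dir can then be called with that key and a host name to find where that node's .rch files would be stored.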
2.8 Log Collection
Logs are essential for checking how the code ran and for locating errors. Roach collects its logs at a single location: $GAUSSLOG/roach/.
Roach logs are organized into three folders:
· The agent folder stores the logs generated on the kernel side;
· The controller folder stores the scheduling information from the Python side;
· The frame folder stores the logs generated by the Python code during the dual-cluster DR process.
(1) Kernel logs
· By default, kernel logs record only messages at WARNING level and above, that is, ERROR and WARNING. Add --logging --logging-level INFO to the command to enable INFO-level logging.
· If the backup or restore operation fails, you can view the error summary displayed in the console to identify the host where the error occurred.
(2) System logs
· Linux logs system events to system logs. The Roach tool logs FATAL and ERROR messages to the same system log file. For example, on devices running SUSE Linux, Roach logs to the /var/log/messages file.
(3) Security logs
· Users can save all activity information to a file. Security log files include timestamps as well as details of backup, restore, and build files. The security log file is named in the following format: roach-agent-security-YYYY-MM-DD_HHMMSS.log.
(4) Controller logs
· Controller logs are Python script run logs. Users can save controller logs to files.
· The controller log file is named in the following format: roach-controller-YYYY-MM-DD_HHMMSS.log. For example: roach-controller-2015-12-15_203415.log
When a fault occurs, the on-screen information and the corresponding log file help you locate it quickly, which makes troubleshooting more efficient.
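When there are many log files, the naming formats above make it easy to find the one you need programmatically. The sketch below parses such names and picks the most recent file of one kind. It assumes both the controller and security logs follow the YYYY-MM-DD_HHMMSS pattern, and the helper names are made up for the example.

```python
import re
from datetime import datetime

# Matches roach-controller-... and roach-agent-security-... log file names.
LOG_NAME = re.compile(
    r"^roach-(controller|agent-security)-(\d{4}-\d{2}-\d{2}_\d{6})\.log$"
)


def parse_log_name(name):
    """Return (kind, timestamp) for a Roach log file name, or None if no match."""
    m = LOG_NAME.match(name)
    if m is None:
        return None
    return m.group(1), datetime.strptime(m.group(2), "%Y-%m-%d_%H%M%S")


def newest(names, kind="controller"):
    """Pick the most recent log of one kind from a directory listing."""
    parsed = [(parse_log_name(n), n) for n in names]
    hits = [(p[1], n) for p, n in parsed if p and p[0] == kind]
    return max(hits)[1] if hits else None
```

In practice the list of names could come from os.listdir on the controller folder under $GAUSSLOG/roach/, assuming that is where the files are kept on your system.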
3. Summary
Backup makes it possible to recover lost, damaged, and historical data, and it is the foundation of any disaster recovery solution. How to back up big data quickly and effectively is an important topic today.
This article is shared from "Full Backup of GaussDB(DWS)" in the Huawei Cloud community.