Abstract: To work around the limited capacity of a single disk, a per-disk storage method is proposed so that backup files from the primary cluster can be distributed across multiple disks in the standby cluster.

1. Introduction

In a dual-cluster DR (disaster recovery) scenario, data in the primary cluster must be backed up to the standby cluster. As the data volume in the primary cluster grows, however, the standby cluster may not have any single disk large enough to hold the backup set sent from the primary cluster, or the drive where backup sets are kept may run out of space before all the backup files arrive. Backup and restore between the two clusters must still work in a DR scenario. Therefore, to overcome the disk capacity limitation, a per-disk storage method is proposed to distribute the backup files across the disks of the standby cluster.

2. Principle of per-disk storage

In the dual-cluster DR design, the primary cluster only performs backup and the standby cluster only performs restore. To keep the data of the two clusters in sync, the backup set is copied between them as compressed .rch files.

As shown in the preceding figure, before optimization the primary cluster used SCP to copy all of the compressed data into a single RoachBackup directory on the standby cluster. This storage mode places a heavy burden on one disk of the standby cluster.

After optimization, the primary cluster SCPs the compressed data into a RoachBackup folder inside each instance directory of the standby cluster. The DN and CN entries in the backup directory are then connected to those RoachBackup folders through soft links. Because the CN and DN instances are distributed across different disks, and the soft links let Roach see the same paths as before, the backup data is effectively stored disk by disk.

3. Per-disk storage procedure

Step 1: Create a roachBackup folder in each CN and DN instance directory of the standby cluster. From the primary cluster, create directory links under the media directory of the standby cluster according to config.ini:

For example, /data1/roach3/mediadata/roach/20210129_181026/ecs-env-2998/dn_6001_6002 is created as a symbolic link pointing to the roachbackup directory of the standby cluster, /data1/ha_install_3/data1/roachbackup/20210129_091422/master1. The master1 subdirectories archive, data, and data_colstore hold the logs, row-store data, and column-store data respectively.
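The link creation in step 1 can be sketched in shell as follows. The paths match the example above, but the script stages everything under a temporary scratch directory so it can run anywhere; in the real procedure the paths come from config.ini and the command runs against real mount points:

```shell
#!/bin/sh
# Sketch of step 1: link an entry under the backup media directory to the
# roachbackup directory that lives inside a DN instance directory on the
# standby cluster. BASE is a scratch dir standing in for the real disks.
BASE=$(mktemp -d)

# roachbackup directory inside the DN instance directory (on its own disk),
# with the archive / data / data_colstore subdirectories described above.
TARGET="$BASE/data1/ha_install_3/data1/roachbackup/20210129_091422/master1"
mkdir -p "$TARGET/archive" "$TARGET/data" "$TARGET/data_colstore"

# The path Roach expects to find under the media directory.
MEDIA="$BASE/data1/roach3/mediadata/roach/20210129_181026/ecs-env-2998"
mkdir -p "$MEDIA"

# Create the soft link: Roach keeps using its usual path, but the data
# actually resides in the instance directory on another disk.
ln -s "$TARGET" "$MEDIA/dn_6001_6002"

# The link resolves to the roachbackup master1 directory.
readlink "$MEDIA/dn_6001_6002"
```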

Step 2: Before the standby cluster is restored, skip the RoachBackup folder in each instance directory.

Step 3: The backupKey folders of the different backup directories and instance directories contain different node directories. The CN and DN directories under each node directory are soft-linked to the instance directories.

Step 4: This is mandatory only for full backup and full restore; for incremental backup and restore it is optional. Under the current design logic, both full and incremental backups use soft links.

Step 5: After the restore is complete, delete the symbolic links created in the standby cluster and the RoachBackup folder in each instance directory. In the primary cluster, the data in the roachBackup directory is deleted as it is transferred.
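The cleanup in step 5 amounts to removing the link first and the RoachBackup folder afterwards. A minimal sketch, again using illustrative paths under a scratch directory:

```shell
#!/bin/sh
# Sketch of step 5: after restore, remove the symbolic link and then the
# roachbackup folder in the instance directory. Paths are illustrative.
BASE=$(mktemp -d)
INSTANCE_DIR="$BASE/data1/ha_install_3/data1"
mkdir -p "$INSTANCE_DIR/roachbackup/20210129_091422/master1"

MEDIA="$BASE/data1/roach3/mediadata/roach/20210129_181026/ecs-env-2998"
mkdir -p "$MEDIA"
ln -s "$INSTANCE_DIR/roachbackup/20210129_091422/master1" \
      "$MEDIA/dn_6001_6002"

# 1. Delete the symbolic link created in the standby cluster.
#    (rm removes the link itself, not the directory it points to.)
rm -f "$MEDIA/dn_6001_6002"

# 2. Delete the roachbackup folder in the instance directory.
rm -rf "$INSTANCE_DIR/roachbackup"
```

Note that `rm -f` on a symbolic link removes only the link; deleting the target data is a separate, explicit step, which is why the order above is safe.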

4. Per-disk storage result

As shown in the figure above, the CN and DN directories are soft-linked to the corresponding data_CN, master1, and Dummy1 directories under RoachBackup. Thanks to the soft links, Roach believes it is SCPing data to one disk, while the data is actually being written to another, so the backup data ends up stored on different disks.
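This effect can be demonstrated with a small shell snippet (hypothetical paths under a scratch directory standing in for two disks): a file written through the linked path physically lands in the link's target directory.

```shell
#!/bin/sh
# Demo of the per-disk result: a write through the linked media path lands
# in the instance directory, as if on another disk. Paths are illustrative.
BASE=$(mktemp -d)
TARGET="$BASE/instance/roachbackup/master1"   # stands in for disk 2
MEDIA="$BASE/media/backupset"                 # stands in for disk 1
mkdir -p "$TARGET" "$(dirname "$MEDIA")"
ln -s "$TARGET" "$MEDIA"

# Roach writes to its usual media path...
echo "backup data" > "$MEDIA/seg.rch"

# ...but the file physically resides under the instance directory.
cat "$TARGET/seg.rch"
```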

5. What is a soft link?

Per-disk storage is built on soft links. So what exactly is a soft link?

A soft link (symbolic link) has its own inode number and its own user data block, but that data block merely stores the path name of another file.
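Both properties are easy to observe on Linux: the link gets an inode of its own, and its content is just the stored path. A short demonstration (uses GNU `stat`; on non-GNU systems the flags differ):

```shell
#!/bin/sh
# A soft link has its own inode; its data block stores the target's path.
BASE=$(mktemp -d)
echo "hello" > "$BASE/file"
ln -s "$BASE/file" "$BASE/link"

# GNU stat does not follow symlinks by default, so this prints two
# different inode numbers: one for the file, one for the link itself.
stat -c '%i' "$BASE/file"
stat -c '%i' "$BASE/link"

# The link's "content" is nothing but the target path name.
readlink "$BASE/link"

# Reading through the link transparently reaches the target's data.
cat "$BASE/link"
```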

As shown above, when the cluster restores the dn_6001_6002 data, it actually reads the real data from the /data1/ha_install_3/data1/roachbackup/20210129_091422/master1 directory. Data access completes through the link, and the backup set on the standby cluster can finally be restored.

6. Conclusion

As data warehouse services develop and customer data volumes grow, dual-cluster DR requires continuous iteration. Deeply mining customer requirements is the first principle of R&D.

A dual-cluster DR task involves many policies, such as per-disk storage, life-cycle cleanup, resumable (breakpoint) backup and restore, and active/standby switchover. Building on this foundation, new dual-cluster features still need to be developed, such as automated test tools, process monitoring tools, and cloud deployment. Looking ahead, building the product ecosystem remains the primary productive force of R&D.

This article is shared from the Huawei Cloud community post "Disk Storage for GaussDB(DWS) Backup DR" by Zxy_db.
