Abstract: This article describes an HBase migration method that can be used in certain scenarios.

Background

Data migration is often required in HBase clusters for various reasons. In most cases, you can negotiate a maintenance window with users and migrate the data offline. Offline migration is straightforward: for example, you can copy the entire HBase data directory to the new cluster. However, when the cluster holds a large amount of data, copying the files takes a long time and the interruption affects the users' services. This article describes how to use built-in HBase features to migrate a cluster while keeping the service interruption short.
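For a sense of what the offline approach looks like, here is a minimal sketch, assuming HBase on the source cluster can be stopped first; the NameNode addresses are placeholders. It simply copies the whole HBase root directory with DistCp:

# Stop HBase on cluster A first so the files on disk are consistent, then copy the root directory
hadoop distcp hdfs://clusterA:8020/hbase hdfs://clusterB:8020/hbase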

Introduction to the approach

As is well known, HBase provides a snapshot function. A snapshot records the data of a table at a certain point in time, and the table can later be restored to the moment the snapshot was taken. An HBase snapshot can therefore be used to export the full data set as of a given point in time.
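As a quick illustration of the snapshot commands (the table and snapshot names here are only examples), in the HBase shell:

# Record the table's data at the current point in time
snapshot 'SomeTable', 'SomeTable_snapshot'
# List existing snapshots
list_snapshots
# Materialize the snapshot as a new table without affecting the original
clone_snapshot 'SomeTable_snapshot', 'SomeTable_restored'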

Because the user's business keeps writing to the table, besides migrating the full data as of the snapshot point in time, we also need to continuously migrate the incremental data written after that point. One option would be double writing, that is, having the business write every record to both clusters, but the user's application cannot do this, and doing it properly would require keeping the two writes transactionally consistent. Instead, we can use the HBase replication feature: the source cluster keeps its WAL (write-ahead log) files and replays them on the destination cluster. In this way the client, the source cluster, and the destination cluster form a single data flow, and HBase guarantees data correctness.
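Once the peer and table replication are configured (see the steps below), a simple way to sanity-check that replication is flowing is to write a test row on cluster A and read it back on cluster B; the row key and value here are only illustrative:

# On cluster A: confirm the peer exists, then write a test row
list_peers
put 'Student', 'row-replication-test', 'f:c1', 'v1'
# On cluster B: the same row should appear shortly afterwards
get 'Student', 'row-replication-test'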

Therefore, the migration approach is to use snapshots to migrate the full data and replication to migrate the incremental data.

Migration steps

The figure above shows the overall timeline of the migration, with five main points in time.

T0: The replication relationship from cluster A to cluster B has been configured and the tables have been set to replicate. From this point on, data newly written to the tables in cluster A is retained in WAL files (the target table does not yet exist in cluster B, so the logs queue up).

T1: Generate the full data as of this point in time and export it to cluster B, by creating a snapshot and exporting the snapshot data.

T2: After the snapshot data exported at T1 has been imported into cluster B, the table is created in cluster B from the snapshot data. At this point, replication on cluster A automatically replays the WAL files retained since T0 against the table in cluster B, starting the incremental data synchronization.

T3: Because the operations between T0 and T3 take some time, a large number of WAL files will have accumulated, and it takes a while to replay them to the new cluster. Monitor the data synchronization here; once the old cluster has gradually worked through the WAL backlog, stop the business writes to the old cluster and prepare to switch all reads and writes to the new cluster B.
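One way to watch the WAL backlog drain is the replication status output in the HBase shell on cluster A; the exact metrics shown (for example the log queue size and the age of the last shipped operation) vary between HBase versions:

status 'replication', 'source'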

T4: The interval between T3 and T4 should be very short, and only during this window is there any interruption in the whole migration. At this point the user switches the business completely to the new cluster B, and the migration is complete.
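As an optional follow-up sketch (not part of the original steps), once the cutover has been verified, the replication relationship on cluster A can be removed; 'peer_name' is the peer created in step 1 below:

disable_peer 'peer_name'
remove_peer 'peer_name'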

Commands involved in the operation

1. Set the peer relationship between clusters A and B

In the HBase shell of the source cluster, set the peer:

add_peer 'peer_name', 'ClusterB:2181:/hbase'

2. Set the replication property in the table of cluster A

Column family: f

In the HBase shell:

alter 'Student', {NAME => 'f', REPLICATION_SCOPE => '1'}
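To confirm that the attribute took effect, the table definition can be checked afterwards; the output should show REPLICATION_SCOPE => '1' for the f column family:

describe 'Student'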

3. Create a snapshot of the table in cluster A

In the HBase shell:

snapshot 'Student', 'Student_table_snapshot'

4. Export the snapshot from cluster A

hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot Student_table_snapshot -copy-to /snapshot-backup/Student
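ExportSnapshot runs as a MapReduce job, so its parallelism and bandwidth can be limited to avoid saturating the source cluster; the values below are only examples, and the destination can also be given as a full HDFS URI of cluster B:

hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot Student_table_snapshot -copy-to hdfs://clusterB:8020/snapshot-backup/Student -mappers 16 -bandwidth 50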

5. Save the snapshot data to the corresponding directory in cluster B

The preceding command exports two directories: one contains the snapshot metadata and the other contains the raw data.

Place the metadata under /hbase/.hbase-snapshot and the raw data under /hbase/archive.

Because the HBase archive directory is cleaned up periodically, increase the hbase.master.cleaner.interval value on the HMaster of cluster B in advance, so that the data is not cleaned up while it is being copied.
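A sketch of the corresponding hbase-site.xml setting on the HMaster of cluster B; the value is in milliseconds and is only an example:

<property>
  <name>hbase.master.cleaner.interval</name>
  <value>3600000</value>
</property>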

If these directories do not exist in cluster B, create them in advance:

hdfs dfs -mkdir -p /hbase/.hbase-snapshot
hdfs dfs -mkdir -p /hbase/archive/data/default/

Move the exported snapshot files into place:

hdfs dfs -mv /snapshot-backup/Student/.hbase-snapshot/Student_table_snapshot /hbase/.hbase-snapshot/
hdfs dfs -mv /snapshot-backup/Student/archive/data/default/Student /hbase/archive/data/default/

6. Restore the snapshot of the table in cluster B

In the HBase shell:

restore_snapshot 'Student_table_snapshot'

After the restore is complete, remember to change the hbase.master.cleaner.interval value on the HMaster of cluster B back to its original setting.
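A simple sanity check after the restore, run in the HBase shell of cluster B (count scans the whole table, so it may take a while on large tables):

list
count 'Student'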

