1 Foreword

This document describes three simple HBase disaster recovery (DR) backup schemes: CopyTable, Export/Import, and Snapshot. Each is introduced in turn below.

2 CopyTable

2.1 Introduction

CopyTable copies data from an existing table to a new table. Its main characteristics are:

  • It supports restricting the copy to a time range or a row range, renaming the table, renaming column families, and optionally copying deleted data.
  • Before running the command, you must create the new table with the same structure as the original table.
  • CopyTable is built on the HBase Client API: it reads with scan and writes with put.
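
Since the target table has to exist before CopyTable runs, it can help to generate the HBase shell create statement from the list of column families. The following is a minimal sketch, not part of CopyTable itself: the function name create_stmt and the names tableCopy, cf1, and cf2 are hypothetical, and it assumes your HBase version supports the shell's non-interactive -n flag.

```shell
# Build an HBase shell `create` statement for the target table.
# Table and column family names are hypothetical placeholders; they
# should mirror the column families of the source table.
create_stmt() {
    table=$1
    shift
    stmt="create '$table'"
    for cf in "$@"; do
        stmt="$stmt, '$cf'"
    done
    printf '%s\n' "$stmt"
}

# Print the statement only (dry run).
create_stmt tableCopy cf1 cf2

# To execute it on a real cluster (assumption: `hbase shell -n` is available):
#   create_stmt tableCopy cf1 cf2 | hbase shell -n
```

Running describe on the source table in the HBase shell shows the column families that the new table needs.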

2.2 Command Format

Usage: CopyTable [general options] [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] <tablename>
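
The --peer.adr value identifies the destination cluster as ZooKeeper quorum, ZooKeeper client port, and znode parent, joined by colons. A small helper can assemble it; this is only a sketch, the function name peer_adr and the host names are hypothetical, and the defaults 2181 and /hbase are the usual HBase defaults, which your cluster may override.

```shell
# Assemble a --peer.adr value: <zk quorum>:<zk client port>:<znode parent>.
# Host names are hypothetical; port and znode parent fall back to the
# common defaults when omitted.
peer_adr() {
    quorum=$1
    port=${2:-2181}
    znode=${3:-/hbase}
    printf '%s:%s:%s\n' "$quorum" "$port" "$znode"
}

peer_adr "zk1,zk2,zk3"             # prints zk1,zk2,zk3:2181:/hbase
peer_adr "zk1" 2182 /hbase-secure  # prints zk1:2182:/hbase-secure
```

The result can be passed directly, for example as --peer.adr="$(peer_adr "zk1,zk2,zk3")".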

2.3 Common Commands

  1. CopyTable in the same cluster
hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=tableCopy  tableOrig
  2. CopyTable between different clusters
# The target table has the same name as the source table
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
--peer.adr=dstClusterZK:2181:/hbase tableOrig

# A different name can also be given to the target table
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
--peer.adr=dstClusterZK:2181:/hbase \
--new.name=tableCopy tableOrig
  3. A more complete example from the official documentation, specifying start and end times, the peer cluster address, and copying only the specified column families:
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
--starttime=1265875194289 \
--endtime=1265878794289 \
--peer.adr=server1,server2,server3:2181:/hbase \
--families=myOldCf:myNewCf,cf2,cf3 TestTable
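
The --starttime and --endtime values are epoch timestamps in milliseconds. A sketch of how to derive them from readable dates is shown below; it assumes GNU date (the -d option), and the function name to_epoch_ms is hypothetical.

```shell
# Convert a UTC date string to epoch milliseconds.
# Assumption: GNU date is available (`date -d` is not portable to BSD/macOS).
to_epoch_ms() {
    secs=$(date -u -d "$1" +%s) || return 1
    printf '%s000\n' "$secs"
}

start=$(to_epoch_ms "2010-02-11 07:59:54")
end=$(to_epoch_ms "2010-02-11 08:59:54")
echo "--starttime=$start --endtime=$end"
# prints: --starttime=1265875194000 --endtime=1265878794000
```

These two values match the timestamps in the example above, truncated to whole seconds, so that example covers a one-hour window.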

2.4 More Parameters

You can list all supported parameters with --help:

hbase org.apache.hadoop.hbase.mapreduce.CopyTable --help

3 Export/Import

3.1 Introduction

  • Export writes table data to HDFS, and Import loads it back from HDFS. Export also accepts start and end times for the exported data, so it can be used for incremental backups.
  • Like CopyTable, Export relies on HBase scan operations.

3.2 Command Format

# Export
hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]

# Import
hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>
  • The outputdir directory does not need to be created in advance; the export job creates it (and expects that it does not already exist). After the export completes, the exported files are owned by the user who ran the export command.
  • By default, only the latest version of each Cell is exported, regardless of historical versions. To export multiple versions, replace the <versions> parameter with the required number of versions.
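
To illustrate the incremental backup mentioned above, the sketch below builds an Export command for a 24-hour window. It only prints the command rather than running it. The function name export_cmd, the /backup path, and the fixed end timestamp are hypothetical; the argument order follows the command format shown earlier.

```shell
# Build an Export command for one time window.
# Arguments: table, HDFS output dir, versions, start (ms), end (ms).
export_cmd() {
    printf 'hbase org.apache.hadoop.hbase.mapreduce.Export %s %s %s %s %s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Hypothetical daily window: the 24 hours ending at a given epoch-ms instant.
DAY_MS=86400000
end_ms=1265878794289
start_ms=$((end_ms - DAY_MS))

# Each run writes to its own directory, since the output dir must be new.
export_cmd tableOrig "/backup/tableOrig/$end_ms" 1 "$start_ms" "$end_ms"
```

A scheduler would typically record the end timestamp of each run and use it as the start of the next one, so that consecutive windows neither overlap nor leave gaps.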

3.3 Common Commands

  1. Export command
hbase org.apache.hadoop.hbase.mapreduce.Export tableName <HDFS path>/tableName.db
  2. Import command
hbase org.apache.hadoop.hbase.mapreduce.Import tableName <HDFS path>/tableName.db

4 Snapshot

4.1 Introduction

The HBase Snapshot feature lets you obtain a copy of a table (both content and metadata) with minimal performance overhead, because a snapshot stores only the table's metadata and references to its HFiles. The clone operation creates a new table from a snapshot, and the restore operation returns a table's contents to the state captured in the snapshot. Neither clone nor restore copies any data: the underlying HFiles (the files that hold HBase table data) are not modified; only the table's metadata is changed.

4.2 Configuration

Whether snapshots are enabled by default depends on the HBase version: they are on by default in 0.95 and later, and off by default in 0.94.x. To enable the feature explicitly, add the following property to the hbase-site.xml file:

<property>
    <name>hbase.snapshot.enabled</name>
    <value>true</value>
</property>

4.3 Common Commands

All of the snapshot commands below are run in the HBase Shell interactive command-line interface (CLI).

1. Take a Snapshot

# Take a snapshot
hbase> snapshot 'tableName', 'snapshotName'

By default, in-memory data is flushed before a snapshot is taken, so that it is included in the snapshot. If you do not want to include in-memory data, you can disable the flush with the SKIP_FLUSH option.

# Take a snapshot without flushing in-memory data first
hbase> snapshot 'tableName', 'snapshotName', {SKIP_FLUSH => true}
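
For scheduled backups it is common to generate the snapshot statement from a script instead of typing it interactively. The sketch below prints a snapshot statement with a date-stamped name. The function name snapshot_stmt, the <table>_snap_<YYYYMMDD> naming scheme, and the table name orders are hypothetical, and piping into hbase shell -n assumes your version supports non-interactive mode.

```shell
# Emit an HBase shell `snapshot` statement with a date-stamped name.
# Arguments: table name, date stamp (YYYYMMDD), optional "skip" to
# add SKIP_FLUSH so in-memory data is not flushed first.
snapshot_stmt() {
    name="${1}_snap_${2}"
    if [ "${3:-}" = "skip" ]; then
        printf "snapshot '%s', '%s', {SKIP_FLUSH => true}\n" "$1" "$name"
    else
        printf "snapshot '%s', '%s'\n" "$1" "$name"
    fi
}

snapshot_stmt orders "$(date +%Y%m%d)"
snapshot_stmt orders 20240101 skip

# To execute on a real cluster (assumption: `hbase shell -n` is available):
#   snapshot_stmt orders "$(date +%Y%m%d)" | hbase shell -n
```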

2. Listing Snapshots

# List all snapshots
hbase> list_snapshots

3. Deleting Snapshots

# Delete a snapshot
hbase> delete_snapshot 'snapshotName'
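
If snapshots are taken on a schedule, old ones eventually need to be deleted. The following sketch shows one possible retention filter: it reads snapshot names and emits delete_snapshot statements for those older than a cutoff date. It assumes a hypothetical naming convention in which each name ends in _<YYYYMMDD>; names that do not follow it are left alone. The function name prune_stmts is also hypothetical.

```shell
# Read snapshot names (one per line) on stdin and emit `delete_snapshot`
# statements for those whose trailing YYYYMMDD stamp is older than the cutoff.
# The `<table>_snap_<YYYYMMDD>` naming scheme is a hypothetical convention.
prune_stmts() {
    cutoff=$1
    while IFS= read -r name; do
        stamp=${name##*_}
        case $stamp in
            [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])
                if [ "$stamp" -lt "$cutoff" ]; then
                    printf "delete_snapshot '%s'\n" "$name"
                fi
                ;;
        esac
    done
}

printf '%s\n' orders_snap_20231201 orders_snap_20240105 manual_backup |
    prune_stmts 20240101
# prints: delete_snapshot 'orders_snap_20231201'
```

Reviewing the generated statements before feeding them to the HBase shell is a sensible precaution, since deleting a snapshot cannot be undone.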

4. Clone a table from snapshot

# Create a new table from an existing snapshot
hbase> clone_snapshot 'snapshotName', 'newTableName'

5. Restore a snapshot

To restore a table to the state captured in a snapshot, disable the table first, then re-enable it once the restore has finished:

hbase> disable 'tableName'
hbase> restore_snapshot 'snapshotName'
hbase> enable 'tableName'
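
The restore sequence can also be generated by a script so that no step is skipped. This is a minimal sketch that only prints the statements. The function name restore_stmts and the table and snapshot names are hypothetical, and the final enable step is an addition here, because a table stays disabled after restore_snapshot until it is enabled again.

```shell
# Emit the HBase shell statements for restoring a table from a snapshot:
# disable the table, restore the snapshot, then re-enable the table.
# Arguments: table name, snapshot name.
restore_stmts() {
    printf "disable '%s'\n" "$1"
    printf "restore_snapshot '%s'\n" "$2"
    printf "enable '%s'\n" "$1"
}

restore_stmts orders orders_snap_20240101

# To execute on a real cluster (assumption: `hbase shell -n` is available):
#   restore_stmts orders orders_snap_20240101 | hbase shell -n
```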

Note: if replication is configured in HBase, keep in mind that replication works at the log level while snapshots work at the file system level. After a restore, the replica and the primary cluster may therefore be in different states. In that case, stop replication, restore all clusters to the same point in time, and then re-establish replication.

References

  1. Online Apache HBase Backups with CopyTable
  2. Apache HBase ™ Reference Guide
