Migrating and restoring Apache HBase tables on Apache HBase on Amazon S3 - Migrating to Apache HBase on Amazon S3 on Amazon EMR

Data migration

This paper covers using the ExportSnapshot tool to migrate the data. For additional options, refer to Tips for Migrating to Apache HBase on Amazon S3 from HDFS.

Creating a snapshot

To create a snapshot, run the following commands from the HBase shell:

hbase shell
hbase(main):001:0> disable 'table_name'
hbase(main):002:0> snapshot 'table_name', 'table_name_snapshot_date'
hbase(main):003:0> enable 'table_name'

If you are taking the snapshot from a production HBase cluster and cannot afford service disruption, you do not need to disable the table to take a snapshot. There is minimal performance degradation if you keep the table active. However, there may be some inconsistencies between the state of the table at the end of the snapshot operation and the snapshot contents.

If you can afford service disruption in your production HBase cluster, disabling the table guarantees that the snapshot is fully consistent with the state of the disabled table.
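The two approaches described above differ only in whether the table is disabled around the snapshot. The following is a minimal sketch that prints the HBase shell commands for either case. The function name, the "offline"/"online" mode labels, and the date-stamped naming scheme are illustrative assumptions, not part of HBase.

```shell
#!/usr/bin/env bash
# Sketch: print the HBase shell commands for an offline or online snapshot.
# snapshot_commands, the mode labels, and the naming scheme are hypothetical.

snapshot_commands() {
  local table="$1" mode="$2" stamp="$3"
  local snap="${table}_snapshot_${stamp}"
  # Offline mode disables the table first so the snapshot is fully consistent.
  if [ "$mode" = "offline" ]; then
    printf "disable '%s'\n" "$table"
  fi
  printf "snapshot '%s', '%s'\n" "$table" "$snap"
  # Re-enable the table only if this function disabled it.
  if [ "$mode" = "offline" ]; then
    printf "enable '%s'\n" "$table"
  fi
}

# Print the commands for review.
snapshot_commands "table_name" "offline" "$(date +%Y%m%d)"
```

After reviewing the printed commands, one option is to pipe them into the HBase shell in non-interactive mode, for example `snapshot_commands table_name online "$(date +%Y%m%d)" | hbase shell -n`.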

Validating the snapshot

After the snapshot completes, use the following command to confirm that it succeeded.

hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -stats -snapshot table_name_snapshot_date

Snapshot Info
----------------------------------------
   Name: table_name_snapshot_date
   Type: FLUSH
  Table: table_name
 Format: 2
Created: 2018-03-29T16:02:06
  Owner:

10 HFiles (0 in archive), total size 48.8 K (100.00% 48.8 K shared with the source table)
0 Logs, total size 0
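When the check is scripted, the HFile count can be pulled out of the SnapshotInfo output. The sketch below assumes the summary line has the same layout as the sample output above ("<count> HFiles ..."), which may vary between HBase versions. The function name `hfile_count` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read SnapshotInfo -stats output on stdin and print the HFile count.
# Assumes a summary line of the form "<count> HFiles (...)".

hfile_count() {
  # Print the first field of the line whose second field is "HFiles";
  # exit non-zero if no such line is found.
  awk '$2 == "HFiles" { print $1; found = 1 } END { exit !found }'
}

# Demonstration using the summary line from the sample output.
sample='10 HFiles (0 in archive), total size 48.8 K (100.00% 48.8 K shared with the source table)'
printf '%s\n' "$sample" | hfile_count
```

Against a live cluster, the same function could be used as `hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -stats -snapshot table_name_snapshot_date | hfile_count`.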

Exporting a snapshot to Amazon S3

Next, use org.apache.hadoop.hbase.snapshot.ExportSnapshot to copy the snapshot data to the Apache HBase root directory on Amazon S3.

hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot <snapshot_name> -copy-to s3://<HBase_on_S3_root_dir>/
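ExportSnapshot runs as a MapReduce job, and its `-mappers` option sets how many map tasks copy files in parallel. The sketch below only assembles and prints the command so that it can be reviewed before running. The function name `export_cmd` and the mapper counts (a default of 16, and 32 in the example) are illustrative assumptions, not recommendations.

```shell
#!/usr/bin/env bash
# Sketch: assemble an ExportSnapshot command line with an explicit mapper count.
# export_cmd and the mapper values are hypothetical; tune them for your cluster.

export_cmd() {
  local snapshot="$1" dest="$2" mappers="${3:-16}"
  printf 'hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot %s -copy-to %s -mappers %s\n' \
    "$snapshot" "$dest" "$mappers"
}

# Print the command; the destination placeholder is left as in the text above.
export_cmd "table_name_snapshot_date" "s3://<HBase_on_S3_root_dir>/" 32
```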

As an example, exporting 40 TB of data over four 10 Gbps AWS Direct Connect connections takes approximately four to five hours.
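To put that figure in context, the following sketch computes the theoretical minimum transfer time. It assumes the example refers to four links of 10 gigabits per second each and uses decimal units (1 TB = 10^12 bytes). The function name `transfer_seconds` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: lower bound on transfer time at full line rate.
# Assumes decimal units and that all links are fully used.

transfer_seconds() {
  local terabytes="$1" links="$2" gbps_per_link="$3"
  # (TB * 10^12 bytes * 8 bits) / (links * Gbps * 10^9 bits/s)
  #   = TB * 8 * 1000 / (links * Gbps)
  echo $(( terabytes * 8 * 1000 / (links * gbps_per_link) ))
}

transfer_seconds 40 4 10   # prints 8000
```

At line rate, the transfer would take 8,000 seconds, or about 2.2 hours. Under these assumptions, the observed four to five hours corresponds to an effective throughput of roughly 45 to 55 percent of the aggregate link capacity.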

Data restore

Creating an empty table

If you are restoring data from a snapshot, first create an empty table and then issue a snapshot restore instead of a snapshot clone. A snapshot clone (clone_snapshot) produces an actual copy of the files. A snapshot restore (restore_snapshot) creates links to the files copied to the Amazon S3 root directory.

hbase shell
hbase(main):001:0> create 'table-name', 'cf1'
hbase(main):002:0> disable 'table-name'

Restoring the snapshot from the HBase shell

After creating an empty table, you can restore the snapshot.

hbase(main):004:0> restore_snapshot 'table-name-snapshot'
hbase(main):005:0> enable 'table-name'
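The restore steps above can be collected into a single sequence. The following sketch prints the four HBase shell commands in order, using straight quotes as the HBase shell requires. The function name `restore_commands` and its arguments are hypothetical; the column family must match the one used by the source table.

```shell
#!/usr/bin/env bash
# Sketch: print the create / disable / restore / enable sequence for review.
# restore_commands and its parameter names are illustrative only.

restore_commands() {
  local table="$1" family="$2" snapshot="$3"
  printf "create '%s', '%s'\n" "$table" "$family"
  # The table must be disabled before the snapshot can be restored.
  printf "disable '%s'\n" "$table"
  printf "restore_snapshot '%s'\n" "$snapshot"
  printf "enable '%s'\n" "$table"
}

restore_commands "table-name" "cf1" "table-name-snapshot"
```

As with snapshot creation, one option is to pipe the reviewed output into `hbase shell -n` on the Amazon EMR cluster.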