Copying a large HBase table from one cluster to another cluster.
This can be done in two ways:
A. Creating a snapshot of the present table.
B. Copying the table using the CopyTable class.
A. Creating a snapshot of the present table
In the first case, both HBase clusters should run the same version; otherwise the table cannot be copied between them.
First things first: enable snapshots in hbase-site.xml under $Hbase_Home/conf on both clusters.
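The property itself is not shown above. On older HBase releases where snapshots are off by default, the setting is usually hbase.snapshot.enabled set to true; treat that name as an assumption and confirm it against the documentation for your version. A minimal sketch that checks a config file for it (the sample fragment and the helper name snapshots_enabled are illustrative only):

```shell
#!/bin/sh
# Assumption: the property meant above is hbase.snapshot.enabled.
# Succeeds if the given hbase-site.xml sets it to true.
snapshots_enabled() {
  # Strip whitespace so <name> and <value> may sit on separate lines.
  tr -d ' \t\n' < "$1" |
    grep -q '<name>hbase.snapshot.enabled</name><value>true</value>'
}

# Demo against a sample fragment; in practice pass
# $Hbase_Home/conf/hbase-site.xml on each cluster instead.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
<property>
  <name>hbase.snapshot.enabled</name>
  <value>true</value>
</property>
EOF
snapshots_enabled "$sample" && echo "snapshots enabled"
rm -f "$sample"
```

Run the same check on both clusters before taking the snapshot.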
On the source HBase cluster:
1. Enter the HBase shell on the source cluster and create a snapshot of the table you want to copy.
2. The snapshot name can be anything you want.
Now check whether the snapshot was created by using the "list_snapshots" command.
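Steps 1 and 2 can be sketched as follows. The table and snapshot names are placeholders, and the helper snapshot_commands is hypothetical; it only prints the HBase shell commands so you can review them before piping them into the shell.

```shell
#!/bin/sh
# Placeholder names; replace with your own table and snapshot name.
TABLE="source_table"
SNAPSHOT="snapshotName"

# Print the HBase shell commands that take the snapshot and then list snapshots.
snapshot_commands() {
  printf "snapshot '%s', '%s'\n" "$1" "$2"
  printf "list_snapshots\n"
}

# Dry run: show the commands. To execute them on the source cluster,
# pipe the output into the HBase shell from $Hbase_Home:
#   snapshot_commands "$TABLE" "$SNAPSHOT" | ./bin/hbase shell
snapshot_commands "$TABLE" "$SNAPSHOT"
```

You can also type the two printed commands directly at the HBase shell prompt.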
3. Now export the snapshot to the destination cluster as shown below (16 is only an example; set -mappers to the number of mappers you want):
./bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot snapshotName -copy-to namenodeURI:PortNumber/hbase -mappers 16
Now the snapshot will be copied to the destination cluster.
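To avoid typos in the long export command, it can help to build it from variables first. This is only a sketch: export_snapshot_cmd is a hypothetical helper, and hdfs://dest-namenode:8020/hbase is a made-up destination that stands in for your own namenode URI, port and HBase root directory.

```shell
#!/bin/sh
# Print the ExportSnapshot command for a snapshot name, a destination
# HBase root directory, and an optional mapper count (defaults to 16).
export_snapshot_cmd() {
  printf './bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot %s -copy-to %s -mappers %s\n' \
    "$1" "$2" "${3:-16}"
}

# Dry run with placeholder values; review the printed line, then run it
# from $Hbase_Home on the source cluster.
export_snapshot_cmd "snapshotName" "hdfs://dest-namenode:8020/hbase"
```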
On the receiving HBase cluster:
1. In the HBase shell, run: clone_snapshot 'snapshotName', 'Namespace:tableName'
Tadumm..TSSSS
The table is copied.
B. Copying the table using the CopyTable class:
This approach can be applied between clusters with different HBase versions.
1. In the HBase shell on the destination cluster, create a table with the desired name and the column families you will copy.
2. Go to $Hbase_Home on the source cluster and run the command below:
./bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=zkHost:5181:/hbase --families=Column_family_name(s) --new.name=destination_table source_table
To verify the copy, count the rows of the table on each cluster and compare the results.
Count of HBase table rows (run from $Hbase_Home):
./bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'table_name'
This runs a MapReduce job.
Copying the table using a timestamp range:
./bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=zkHost:5181:/hbase --families=Column_family_name(s) --starttime=timestamp --endtime=timestamp --new.name=destination_table source_table
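The timestamp values are expected in milliseconds since the Unix epoch, which is how HBase stamps cells by default. A small sketch for producing them, where to_millis is a hypothetical helper and the dates are arbitrary examples:

```shell
#!/bin/sh
# Convert epoch seconds to epoch milliseconds.
to_millis() {
  echo "$(( $1 * 1000 ))"
}

# 1420070400 is 2015-01-01 00:00:00 UTC. With GNU date (an assumption
# about your system) the seconds value could come from:
#   date -u -d '2015-01-01 00:00:00' +%s
start_ms="$(to_millis 1420070400)"
end_ms="$(to_millis 1420156800)"   # one day later

# Print the two options to paste into the CopyTable command.
echo "--starttime=$start_ms --endtime=$end_ms"
```

If your tables were written with custom cell timestamps, use values in whatever unit the writers used.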