Apache HBase
HBase works seamlessly with Hadoop, sharing its file system and serving as a direct input and
output to the MapReduce framework and execution engine. HBase also integrates with Apache
Hive, enabling SQL-like queries over HBase tables, joins with Hive-based tables, and support
for Java Database Connectivity (JDBC). For more information about HBase, see Apache HBase.
With HBase on Amazon EMR, you can also back up your HBase data directly to Amazon Simple Storage Service (Amazon S3), and restore from a previously created backup when launching an HBase cluster. Amazon EMR offers additional options to integrate with Amazon S3 for data persistence and disaster recovery.
- HBase on Amazon S3 - With Amazon EMR version 5.2.0 and later, you can use HBase on Amazon S3 to store a cluster's HBase root directory and metadata directly in Amazon S3. You can subsequently start a new cluster and point it to the root directory location in Amazon S3. Only one cluster at a time can use the HBase location in Amazon S3, with the exception of a read-replica cluster. For more information, see HBase on Amazon S3 (Amazon S3 storage mode).
- HBase read-replicas - Amazon EMR version 5.7.0 and later with HBase on Amazon S3 supports read-replica clusters. A read-replica cluster provides read-only access to a primary cluster's store files and metadata. For more information, see Using a read-replica cluster.
- HBase snapshots - As an alternative to HBase on Amazon S3, with Amazon EMR version 4.0 and later you can create snapshots of your HBase data directly in Amazon S3 and then recover data from those snapshots. For more information, see Using HBase snapshots.
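HBase on Amazon S3 and read-replica clusters are enabled through cluster configuration classifications. The following is a minimal sketch in Python of how such a configuration list might be assembled. The classification and property names (`hbase-site` with `hbase.rootdir`, and `hbase` with `hbase.emr.storageMode` and `hbase.emr.readreplica.enabled`) are taken from the Amazon EMR documentation for HBase on Amazon S3; the helper function and the bucket name are hypothetical and exist only for illustration.

```python
import json


def hbase_s3_configurations(root_dir, read_replica=False):
    """Build EMR configuration classifications for HBase on Amazon S3.

    root_dir is the S3 URI of the HBase root directory. This helper and the
    example bucket used below are illustrative assumptions, not part of EMR.
    """
    if not root_dir.startswith("s3://"):
        raise ValueError("root_dir must be an s3:// URI")

    hbase_props = {"hbase.emr.storageMode": "s3"}
    if read_replica:
        # Read-replica clusters require Amazon EMR 5.7.0 or later.
        hbase_props["hbase.emr.readreplica.enabled"] = "true"

    return [
        {"Classification": "hbase-site", "Properties": {"hbase.rootdir": root_dir}},
        {"Classification": "hbase", "Properties": hbase_props},
    ]


if __name__ == "__main__":
    # Hypothetical bucket and prefix; replace with your own location.
    configs = hbase_s3_configurations(
        "s3://amzn-s3-demo-bucket/hbase-root", read_replica=True
    )
    print(json.dumps(configs, indent=2))
```

The resulting list can be saved as JSON and supplied with the AWS CLI `--configurations` option, or passed as the `Configurations` argument when you create a cluster through an SDK.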
Important
We do not recommend using managed scaling or scaling with custom policies on Amazon EMR clusters that run HBase.
The following table lists the version of HBase included in the latest release of the Amazon EMR 7.x series, along with the components that Amazon EMR installs with HBase.
For the version of components installed with HBase in this release, see Release 7.4.0 Component Versions.
Amazon EMR Release Label | HBase Version | Components Installed With HBase |
---|---|---|
emr-7.4.0 | HBase 2.5.5 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-wal-cli, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-mapred, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hbase-hmaster, hbase-client, hbase-region-server, hbase-rest-server, hbase-thrift-server, hbase-operator-tools, zookeeper-client, zookeeper-server |
The following table lists the version of HBase included in the latest release of the Amazon EMR 6.x series, along with the components that Amazon EMR installs with HBase.
For the version of components installed with HBase in this release, see Release 6.15.0 Component Versions.
Amazon EMR Release Label | HBase Version | Components Installed With HBase |
---|---|---|
emr-6.15.0 | HBase 2.4.17 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-wal-cli, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-mapred, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hbase-hmaster, hbase-client, hbase-region-server, hbase-rest-server, hbase-thrift-server, hbase-operator-tools, zookeeper-client, zookeeper-server |
Note
Apache HBase HBCK2 is a separate operational tool for repairing HBase regions and system
tables. In Amazon EMR version 6.1.0 and later, the hbase-hbck2.jar is provided in /usr/lib/hbase-operator-tools/
on the primary node. For more information about how to build and use the tool, see HBase HBCK2.
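As a rough sketch of how the bundled jar might be invoked on the primary node, the following Python assembles an HBCK2 command line. It assumes the `hbase hbck -j <jar>` launcher form described in the Apache HBase operator tools documentation and joins the directory and jar name given in the note above; the helper function is hypothetical, and the `-h` argument only prints the tool's help text.

```python
import shlex

# Jar location from the note above (Amazon EMR 6.1.0 and later, primary node).
HBCK2_JAR = "/usr/lib/hbase-operator-tools/hbase-hbck2.jar"


def hbck2_command(*args):
    """Return the argument list for running HBCK2 through the hbase launcher.

    Assumes the `hbase hbck -j <jar>` form from the HBCK2 documentation.
    This helper is illustrative and does not execute anything.
    """
    return ["hbase", "hbck", "-j", HBCK2_JAR, *args]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(hbck2_command("-h")))
```

Printing the command rather than executing it keeps the sketch safe to run anywhere; on a cluster, the same list could be handed to `subprocess.run` once the HBCK2 subcommand has been verified against the tool's help output.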
The following table lists the version of HBase included in the latest release of the Amazon EMR 5.x series, along with the components that Amazon EMR installs with HBase.
For the version of components installed with HBase in this release, see Release 5.36.2 Component Versions.
Amazon EMR Release Label | HBase Version | Components Installed With HBase |
---|---|---|
emr-5.36.2 | HBase 1.4.13 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-mapred, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hbase-hmaster, hbase-client, hbase-region-server, hbase-rest-server, hbase-thrift-server, zookeeper-client, zookeeper-server |
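The release label determines which HBase version a cluster receives. As a minimal sketch, the following Python builds the keyword arguments that could be passed to the `run_job_flow` operation of the AWS SDK for Python (boto3) to request a cluster with HBase installed. The cluster name, instance type, and instance count are assumptions chosen for illustration, the role names are the Amazon EMR default roles, and the helper function is hypothetical.

```python
def hbase_cluster_request(release_label="emr-7.4.0",
                          instance_type="m5.xlarge",
                          instance_count=3):
    """Build run_job_flow keyword arguments for a cluster that installs HBase.

    The name, instance type, and instance count are illustrative assumptions;
    adjust them, and the IAM roles, to match your account.
    """
    return {
        "Name": "hbase-example",
        "ReleaseLabel": release_label,  # for example emr-7.4.0 or emr-6.15.0
        "Applications": [{"Name": "HBase"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


if __name__ == "__main__":
    request = hbase_cluster_request("emr-6.15.0")
    print(request["ReleaseLabel"], request["Applications"])
    # With boto3 installed and credentials configured, the request could be
    # submitted with: boto3.client("emr").run_job_flow(**request)
```

Building the request as plain data keeps the release label in one place, so moving between the 7.x, 6.x, and 5.x series only requires changing a single argument.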
Topics
- Creating a cluster with HBase
- HBase on Amazon S3 (Amazon S3 storage mode)
- Write-ahead logs (WAL) for Amazon EMR
- Using the HBase shell
- Access HBase tables with Hive
- Using HBase snapshots
- Configure HBase
- View the HBase user interface
- View HBase log files
- Monitor HBase with Ganglia
- Migrating from previous HBase versions
- HBase release history