Apache Hadoop : Creating HBase table with HBase shell and Hue
In this chapter, we're going to learn the basics of HBase. First, we will create a table with the HBase shell, and then create the same table using the UI provided by Hue.
- Wide-column NoSQL database; does not provide SQL-based access.
- Column-Oriented data store, known as "Hadoop Database".
- Unlike HDFS, HBase supports random real-time CRUD.
- Distributed and designed to serve large tables.
- Based on Google's Bigtable: just as Bigtable is built on top of GFS, HBase is implemented on top of HDFS.
- Horizontally scalable - automatic sharding.
- Integration with MapReduce framework.
- Accessible through Thrift, Avro, and RESTful web services.
- Use CREATE TABLE over HDFS data.
- Then, query with Hive.
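The "automatic sharding" item in the list above refers to HBase splitting a table into regions, each holding a contiguous range of row keys. The idea can be sketched in Python as follows. The split keys and region names are made up for illustration, and this is a toy model rather than how HBase itself is implemented:

```python
import bisect

# Hypothetical split points: three boundaries divide the key space
# into four regions. Each region serves keys in [start, next_start).
split_keys = ["g", "n", "t"]
region_names = ["region-0", "region-1", "region-2", "region-3"]


def locate_region(row_key):
    """Return the name of the region whose key range contains row_key."""
    # bisect_right counts how many split points are <= row_key,
    # which is exactly the index of the region that owns the key.
    return region_names[bisect.bisect_right(split_keys, row_key)]


print(locate_region("apple"))
print(locate_region("monkey"))
print(locate_region("zebra"))
```

Because row keys are kept in sorted order, finding the owning region is a binary search over the split points, and adding capacity amounts to adding more split points.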
HBase uses a data model very similar to that of Bigtable. Users store data rows in labelled tables. A data row has a sortable key and an arbitrary number of columns. The table is stored sparsely, so that rows in the same table can have crazily-varying columns, if the user likes.
"A Bigtable is a sparse, distributed, persistent multidimensional sorted map"
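HBase is an open source, non-relational, distributed database modeled after Google's Bigtable.
So, the best way to get to know HBase is to understand the six concepts in the Bigtable definition quoted above: sparse, distributed, persistent, multidimensional, sorted, and map.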
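To make the "sparse, multidimensional sorted map" part of that definition more concrete, here is a minimal Python sketch. It is a toy in-memory model, not HBase code: the class name `SparseSortedMap` and its methods are invented for illustration, and it ignores the "distributed" and "persistent" parts entirely.

```python
from collections import defaultdict


class SparseSortedMap:
    """Toy model of the mapping (row key, column, timestamp) -> value."""

    def __init__(self):
        # row key -> column -> {timestamp: value}.
        # Cells that were never written take up no space (sparse).
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, timestamp):
        self._rows[row][column][timestamp] = value

    def get(self, row, column):
        """Return the newest value of a cell, or None if the cell is absent."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def row_keys(self):
        """Return the row keys in sorted order."""
        return sorted(self._rows)


table = SparseSortedMap()
table.put("row2", "cf:b", "value2", timestamp=2)
table.put("row1", "cf:a", "old", timestamp=1)
table.put("row1", "cf:a", "value1", timestamp=3)

print(table.row_keys())
print(table.get("row1", "cf:a"))
print(table.get("row2", "cf:a"))
```

Each dimension of the map (row, column, timestamp) is one level of nesting, and a row only stores the columns that were actually written to it.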
HDFS | HBase |
---|---|
Distributed file system | Built on top of HDFS |
No fast data lookups | Fast data lookups via indexed files |
Latency : high | Latency : low |
Sequential access only | Random access by row key via indexed store files (HFiles) |
RowID | Column Family 1: col 1 | Column Family 1: col 2 | Column Family 1: col 3 | Column Family 2: col 1 | Column Family 2: col 2 | Column Family 2: col 3 |
---|---|---|---|---|---|---|
1 | | | | | | |
2 | | | | | | |
    hduser@laptop:/usr/local/hbase/bin$ hbase
    Usage: hbase [<options>] <command> [<args>]
    Options:
      --config DIR    Configuration direction to use. Default: ./conf
      --hosts HOSTS   Override the list in 'regionservers' file
      --auth-as-server Authenticate to ZooKeeper using servers configuration
    Commands:
    Some commands take arguments. Pass no args or -h for usage.
      shell           Run the HBase shell
      hbck            Run the hbase 'fsck' tool
      snapshot        Create a new snapshot of a table
      snapshotinfo    Tool for dumping snapshot information
      wal             Write-ahead-log analyzer
      hfile           Store file analyzer
      zkcli           Run the ZooKeeper shell
      upgrade         Upgrade hbase
      master          Run an HBase HMaster node
      regionserver    Run an HBase HRegionServer node
      zookeeper       Run a Zookeeper server
      rest            Run an HBase REST server
      thrift          Run the HBase Thrift server
      thrift2         Run the HBase Thrift2 server
      clean           Run the HBase clean up script
      classpath       Dump hbase CLASSPATH
      mapredcp        Dump CLASSPATH entries required by mapreduce
      pe              Run PerformanceEvaluation
      ltt             Run LoadTestTool
      version         Print the version
      CLASSNAME       Run the class named CLASSNAME

We can connect to the running instance of HBase using the hbase shell command, located in the bin/ directory of our HBase install. The HBase Shell is a command interpreter for HBase, written in Ruby.
The HBase Shell prompt ends with a > character:
    hduser@laptop:/usr/local/hbase/bin$ hbase shell
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/local/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    HBase Shell; enter 'help<RETURN>' for list of supported commands.
    Type "exit<RETURN>" to leave the HBase Shell
    Version 1.2.4, r67592f3d062743907f8c5ae00dbbe1ae4f69e5af, Tue Oct 25 18:10:20 CDT 2016
    hbase(main):001:0>

Type help and press Enter to display some basic usage information for the HBase Shell, as well as several example commands. Notice that table names, rows, and columns must all be enclosed in quote characters.
    hbase(main):001:0> help
    HBase Shell, version 1.2.4, r67592f3d062743907f8c5ae00dbbe1ae4f69e5af, Tue Oct 25 18:10:20 CDT 2016
    Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
    Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.
    COMMAND GROUPS:
      Group name: general
      Commands: status, table_help, version, whoami
      Group name: ddl
      Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, locate_region, show_filters
      Group name: namespace
      Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables
      Group name: dml
      Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve
      Group name: tools
      Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, close_region, compact, compact_rs, flush, major_compact, merge_region, move, normalize, normalizer_enabled, normalizer_switch, split, trace, unassign, wal_roll, zk_dump
      Group name: replication
      Commands: add_peer, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, list_peers, list_replicated_tables, remove_peer, remove_peer_tableCFs, set_peer_tableCFs, show_peer_tableCFs
      Group name: snapshots
      Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, list_snapshots, restore_snapshot, snapshot
      Group name: configuration
      Commands: update_all_config, update_config
      Group name: quotas
      Commands: list_quotas, set_quota
      Group name: security
      Commands: grant, list_security_capabilities, revoke, user_permission
      Group name: procedures
      Commands: abort_procedure, list_procedures
      Group name: visibility labels
      Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility
    SHELL USAGE:
    Quote all names in HBase Shell such as table and column names.
    Commas delimit command parameters.
    Type <RETURN> after entering a command to run it.
    Dictionaries of configuration used in the creation and alteration of tables are Ruby Hashes.
    They look like this:
      {'key1' => 'value1', 'key2' => 'value2', ...}
    and are opened and closed with curley-braces.
    Key/values are delimited by the '=>' character combination.
    Usually keys are predefined constants such as NAME, VERSIONS, COMPRESSION, etc.
    Constants do not need to be quoted.
    Type 'Object.constants' to see a (messy) list of all constants in the environment.
    If you are using binary keys or values and need to enter them in the shell, use double-quote'd hexadecimal representation.
    For example:
      hbase> get 't1', "key\x03\x3f\xcd"
      hbase> get 't1', "key\003\023\011"
      hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"
    The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
    For more on the HBase Shell, see http://hbase.apache.org/book.html
    hbase(main):002:0>
    hduser@laptop:~$ hdfs dfs -ls /
    Found 3 items
    drwxr-xr-x   - hduser supergroup          0 2016-11-19 21:59 /hbase
    drwx------   - hduser supergroup          0 2016-11-18 16:04 /tmp
    drwxr-xr-x   - hduser supergroup          0 2016-11-18 09:13 /user
    hduser@laptop:~$ hdfs dfs -ls /hbase
    Found 7 items
    drwxr-xr-x   - hduser supergroup          0 2016-11-19 21:59 /hbase/.tmp
    drwxr-xr-x   - hduser supergroup          0 2016-11-19 23:00 /hbase/MasterProcWALs
    drwxr-xr-x   - hduser supergroup          0 2016-11-19 21:59 /hbase/WALs
    drwxr-xr-x   - hduser supergroup          0 2016-11-19 22:00 /hbase/data
    -rw-r--r--   1 hduser supergroup         42 2016-11-19 21:59 /hbase/hbase.id
    -rw-r--r--   1 hduser supergroup          7 2016-11-19 21:59 /hbase/hbase.version
    drwxr-xr-x   - hduser supergroup          0 2016-11-19 22:59 /hbase/oldWALs
The following couple of sections are from Get Started with HBase.
We can create a table using the create command. We must specify both the table name and the column family name. The syntax for creating a table in the HBase shell is:
create '<table name>','<column family>'
First, let's check the status of HBase:
    hbase(main):001:0> status
    1 active master, 0 backup masters, 1 servers, 0 dead, 2.0000 average load
Use the create command to create a new table. We must specify the table name and the ColumnFamily name:
    hbase(main):002:0> create 'test', 'cf'
    0 row(s) in 2.9340 seconds
    => Hbase::Table - test

Use the list command to confirm that the table exists:

    hbase(main):003:0> list 'test'
    TABLE
    test
    1 row(s) in 0.0990 seconds
    => ["test"]
Here, we insert three values, one at a time.
The first insert is at row1, column cf:a, with a value of value1.
Columns in HBase consist of a column family prefix (cf in this example), followed by a colon and a column qualifier suffix (a in the first command below):

    hbase(main):004:0> put 'test', 'row1', 'cf:a', 'value1'
    0 row(s) in 1.1960 seconds
    hbase(main):005:0> put 'test', 'row2', 'cf:b', 'value2'
    0 row(s) in 0.0780 seconds
    hbase(main):006:0> put 'test', 'row3', 'cf:c', 'value3'
    0 row(s) in 0.0700 seconds
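The family:qualifier naming used in these put commands can be illustrated with a small Python helper. The function name `split_column` is hypothetical and not part of any HBase API. As a simplifying assumption, this sketch splits at the first colon only:

```python
def split_column(column):
    """Split a 'family:qualifier' column name into (family, qualifier).

    Only the first colon is treated as the separator. A name with nothing
    after the colon, such as 'f1:', yields an empty qualifier.
    """
    family, _, qualifier = column.partition(":")
    return family, qualifier


for name in ("cf:a", "cf:b", "f1:"):
    print(name, "->", split_column(name))
```

Every column in the example table shares the family cf, so the three cells differ only in their row key and qualifier.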
We can get data from HBase using scan. We can limit our scan, but for now, we fetch all data:
    hbase(main):007:0> scan 'test'
    ROW                  COLUMN+CELL
     row1                column=cf:a, timestamp=1479672857436, value=value1
     row2                column=cf:b, timestamp=1479672868662, value=value2
     row3                column=cf:c, timestamp=1479672883262, value=value3
    3 row(s) in 0.1960 seconds
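A scan returns rows in sorted row-key order, and the shell's scan command accepts options such as LIMIT and STARTROW to restrict it. The following Python sketch models that behaviour over an ordinary dictionary holding the three cells written above. The `scan` function and its parameters are illustrative stand-ins, not the HBase client API:

```python
import bisect

# The three cells written by the put commands, deliberately out of order.
rows = {
    "row3": {"cf:c": "value3"},
    "row1": {"cf:a": "value1"},
    "row2": {"cf:b": "value2"},
}


def scan(table, start_row="", limit=None):
    """Yield (row key, cells) in sorted key order, beginning at start_row."""
    keys = sorted(table)
    # Binary search for the first key that is >= start_row.
    first = bisect.bisect_left(keys, start_row)
    selected = keys[first:] if limit is None else keys[first:first + limit]
    for key in selected:
        yield key, table[key]


for key, cells in scan(rows):
    print(key, cells)

print(list(scan(rows, start_row="row2", limit=1)))
```

Because the keys are sorted, a scan that starts at a given row can jump straight to it and then read forward, rather than examining every row.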
To get a single row of data at a time, we can use the get command.
    hbase(main):008:0> get 'test', 'row1'
    COLUMN               CELL
     cf:a                timestamp=1479672857436, value=value1
    1 row(s) in 0.1200 seconds

Before we can delete a table or change its settings (and in a few other situations), we must first disable the table with the disable command. We can re-enable it with the enable command.

    hbase(main):009:0> disable 'test'
    0 row(s) in 4.7960 seconds
    hbase(main):010:0> enable 'test'
    0 row(s) in 2.5360 seconds
To drop (delete) a table, use the drop command.
    hbase(main):011:0> drop 'test'
    ERROR: Table test is enabled. Disable it first.
    Here is some help for this command:
    Drop the named table. Table must first be disabled:
      hbase> drop 't1'
      hbase> drop 'ns1:t1'
    hbase(main):012:0> disable 'test'
    0 row(s) in 4.4210 seconds
    hbase(main):013:0> drop 'test'
    0 row(s) in 2.5440 seconds
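The disable-before-drop rule seen in this session is a small state machine: a table is either enabled or disabled, and only a disabled table may be dropped. Here is a minimal Python sketch of that rule. The `TableCatalog` class is invented for illustration and only mimics the behaviour shown in the shell output:

```python
class TableCatalog:
    """Toy catalog tracking which tables exist and whether each is enabled."""

    def __init__(self):
        self._enabled = {}  # table name -> True (enabled) or False (disabled)

    def _require(self, name):
        if name not in self._enabled:
            raise KeyError(f"Table {name} does not exist")

    def create(self, name):
        if name in self._enabled:
            raise ValueError(f"Table {name} already exists")
        self._enabled[name] = True  # a newly created table starts enabled

    def disable(self, name):
        self._require(name)
        self._enabled[name] = False

    def enable(self, name):
        self._require(name)
        self._enabled[name] = True

    def drop(self, name):
        self._require(name)
        if self._enabled[name]:
            raise RuntimeError(f"Table {name} is enabled. Disable it first.")
        del self._enabled[name]

    def tables(self):
        return sorted(self._enabled)


catalog = TableCatalog()
catalog.create("test")
try:
    catalog.drop("test")      # rejected: the table is still enabled
except RuntimeError as error:
    print("ERROR:", error)
catalog.disable("test")
catalog.drop("test")          # succeeds now that the table is disabled
print(catalog.tables())
```

Requiring an explicit disable step makes dropping a table a deliberate two-step action rather than a single command that could be issued by accident.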
To exit the HBase Shell and disconnect from our cluster, use the quit command. Note that HBase is still running in the background.
    hbase(main):014:0> quit
    hduser@laptop:/usr/local/hbase/bin$

(Note) The following material repeats what we did in the previous sections, but it is based on the CDH QuickStart. We may therefore want to install the QuickStart from Cloudera before we proceed.
Here, we'll create the same table using the HBase browser in Hue.
Under Hue > Data Browsers > HBase:
Hit "Submit". Then, in the lower right corner, press the "New Row" button:
Type in "row1" for the row key, "cf:a" for family:column_name, and "value1" for cell_value:
Do the same for "row2" and "row3" to complete the table.