HBase's clients include the native Java client, the HBase Shell, Thrift, REST, MapReduce, the Web UI, and more.
Below are common usages of these clients.
1. Native Java Client
The native Java client is HBase's primary and most efficient client.
It covers the CRUD APIs and also implements DDL operations such as creating, deleting, and altering tables.
Configuring a Java connection to HBase
Connecting to HBase from Java requires two classes:
HBaseConfiguration
ConnectionFactory
First, configure an HBase connection, for example the ZooKeeper address and port:
hbase.zookeeper.quorum
hbase.zookeeper.property.clientPort
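Set programmatically, a minimal sketch might look like this (the host names and port are placeholders):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
Configuration config = HBaseConfiguration.create(); // starts from hbase-default.xml/hbase-site.xml on the classpath
// point the client at the cluster's ZooKeeper quorum (placeholder hosts)
config.set("hbase.zookeeper.quorum", "host1,host2,host3");
config.set("hbase.zookeeper.property.clientPort", "2181");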
A more common approach is to write an hbase-site.xml file and load the configuration from it.
Example hbase-site.xml:
<configuration>
<property>
<name>hbase.master</name>
<value>host1:60000</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>host1,host2,host3</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
</configuration>
Then load the configuration file and create the connection:
Configuration config = HBaseConfiguration.create();
config.addResource(new Path(System.getenv("HBASE_CONF_DIR"), "hbase-site.xml"));
Connection connection = ConnectionFactory.createConnection(config);
Creating a table
To create a table we first need an Admin object:
Admin admin = connection.getAdmin(); // get an Admin object from the connection
TableName tableName = TableName.valueOf("test"); // define the table name
HTableDescriptor htd = new HTableDescriptor(tableName); // define the table descriptor
HColumnDescriptor hcd = new HColumnDescriptor("data"); // define the column family descriptor
htd.addFamily(hcd); // add the column family to the table
admin.createTable(htd); // create the table
Creating a table in HBase 2.x
HBase 2.x uses a new API for creating tables:
TableName tableName = TableName.valueOf("test"); // define the table name
// TableDescriptor objects are built with TableDescriptorBuilder
TableDescriptorBuilder tableDescriptor = TableDescriptorBuilder.newBuilder(tableName);
ColumnFamilyDescriptor family = ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("data")).build(); // build the column family descriptor
tableDescriptor.setColumnFamily(family); // set the column family
admin.createTable(tableDescriptor.build()); // create the table
Inserting data
Table table = connection.getTable(tableName); // get a Table object
try {
byte[] row = Bytes.toBytes("row1"); // row key
Put put = new Put(row); // create a Put object
byte[] columnFamily = Bytes.toBytes("data"); // column family
byte[] qualifier = Bytes.toBytes(String.valueOf(1)); // column qualifier
byte[] value = Bytes.toBytes("張三豐"); // value
put.addColumn(columnFamily, qualifier, value);
table.put(put); // write the data to the table
} finally {
// release resources when finished
table.close();
}
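Table.put also accepts a list of Puts, which batches several writes together. A minimal sketch, reusing the table and the "data" family from the example above (row keys and values here are placeholders):
import java.util.ArrayList;
import java.util.List;
List<Put> puts = new ArrayList<>();
for (int i = 1; i <= 3; i++) {
    Put p = new Put(Bytes.toBytes("row" + i)); // one Put per row key
    p.addColumn(Bytes.toBytes("data"), Bytes.toBytes(String.valueOf(i)), Bytes.toBytes("value" + i));
    puts.add(p);
}
table.put(puts); // Table.put(List<Put>) lets the client group the batch by region server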
Getting a specified row
// fetch the data
Get get = new Get(Bytes.toBytes("row1")); // define a Get object
Result result = table.get(get); // fetch the row through the Table object
System.out.println("Result: " + result);
// often we only need the value; this fetches the value of column data:1
byte[] valueBytes = result.getValue(Bytes.toBytes("data"), Bytes.toBytes("1")); // returns a byte array
// convert the bytes to a string
String valueStr = new String(valueBytes, "utf-8");
System.out.println("value:" + valueStr);
Scanning the table's data
Scan scan = new Scan();
ResultScanner scanner = table.getScanner(scan);
try {
for (Result scannerResult: scanner) {
System.out.println("Scan: " + scannerResult);
byte[] row = scannerResult.getRow();
System.out.println("rowName:" + new String(row,"utf-8"));
}
} finally {
scanner.close();
}
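A scan can also be limited to a row range and a single column; a sketch using the HBase 2.x Scan methods (the row keys here are assumptions):
Scan rangeScan = new Scan();
rangeScan.withStartRow(Bytes.toBytes("row1")); // inclusive start row
rangeScan.withStopRow(Bytes.toBytes("row3"));  // exclusive stop row
rangeScan.addColumn(Bytes.toBytes("data"), Bytes.toBytes("1")); // restrict to one column
try (ResultScanner rangeScanner = table.getScanner(rangeScan)) {
    for (Result r : rangeScanner) {
        System.out.println("rowName:" + Bytes.toString(r.getRow()));
    }
}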
Deleting a table
TableName tableName = TableName.valueOf("test");
admin.disableTable(tableName); // disable the table first
admin.deleteTable(tableName); // delete the table
A complete HBase Java API table DDL example:
package com.example.hbase.admin;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.io.compress.Compression.Algorithm;
public class Example {
private static final String TABLE_NAME = "MY_TABLE_NAME_TOO";
private static final String CF_DEFAULT = "DEFAULT_COLUMN_FAMILY";
public static void createOrOverwrite(Admin admin, HTableDescriptor table) throws IOException {
if (admin.tableExists(table.getTableName())) {
admin.disableTable(table.getTableName());
admin.deleteTable(table.getTableName());
}
admin.createTable(table);
}
public static void createSchemaTables(Configuration config) throws IOException {
try (Connection connection = ConnectionFactory.createConnection(config);
Admin admin = connection.getAdmin()) {
HTableDescriptor table = new HTableDescriptor(TableName.valueOf(TABLE_NAME));
table.addFamily(new HColumnDescriptor(CF_DEFAULT).setCompressionType(Algorithm.NONE));
System.out.print("Creating table. ");
createOrOverwrite(admin, table);
System.out.println(" Done.");
}
}
public static void modifySchema (Configuration config) throws IOException {
try (Connection connection = ConnectionFactory.createConnection(config);
Admin admin = connection.getAdmin()) {
TableName tableName = TableName.valueOf(TABLE_NAME);
if (!admin.tableExists(tableName)) {
System.out.println("Table does not exist.");
System.exit(-1);
}
HTableDescriptor table = admin.getTableDescriptor(tableName);
// add a new column family to the table
HColumnDescriptor newColumn = new HColumnDescriptor("NEWCF");
newColumn.setCompactionCompressionType(Algorithm.GZ);
newColumn.setMaxVersions(HConstants.ALL_VERSIONS);
admin.addColumn(tableName, newColumn);
// update an existing column family
HColumnDescriptor existingColumn = new HColumnDescriptor(CF_DEFAULT);
existingColumn.setCompactionCompressionType(Algorithm.GZ);
existingColumn.setMaxVersions(HConstants.ALL_VERSIONS);
table.modifyFamily(existingColumn);
admin.modifyTable(tableName, table);
// disable the table
admin.disableTable(tableName);
// delete a column family
admin.deleteColumn(tableName, CF_DEFAULT.getBytes("UTF-8"));
// delete the table (it must be disabled first)
admin.deleteTable(tableName);
}
}
public static void main(String... args) throws IOException {
Configuration config = HBaseConfiguration.create();
// add the required configuration files (hbase-site.xml, core-site.xml)
config.addResource(new Path(System.getenv("HBASE_CONF_DIR"), "hbase-site.xml"));
config.addResource(new Path(System.getenv("HADOOP_CONF_DIR"), "core-site.xml"));
createSchemaTables(config);
modifySchema(config);
}
}
2. Using the HBase Shell to operate HBase
From the bin/ directory under the HBase installation directory, use the hbase shell command to connect to a running HBase instance:
$ ./bin/hbase shell
hbase(main):001:0>
Viewing the HBase Shell help text
Type help and press Enter to see basic information about the HBase Shell and some example commands.
Creating a table
Use create to create a table; a table name and a column family name must be specified:
hbase(main):001:0> create 'test', 'cf'
0 row(s) in 0.4170 seconds
=> Hbase::Table - test
Table information
Use list to view existing tables:
hbase(main):002:0> list 'test'
TABLE
test
1 row(s) in 0.0180 seconds
=> ["test"]
Use describe to view table details and configuration:
hbase(main):003:0> describe 'test'
Table test is ENABLED
test
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE =>
'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'f
alse', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE
=> '65536'}
1 row(s)
Took 0.9998 seconds
Inserting data
Use put to insert data:
hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.0850 seconds
hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0110 seconds
hbase(main):005:0> put 'test', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0100 seconds
Scanning all data
One way to fetch data from HBase is scan. Use the scan command to scan a table's data; restrictions can be applied to the scan:
hbase(main):006:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1421762485768, value=value1
row2 column=cf:b, timestamp=1421762491785, value=value2
row3 column=cf:c, timestamp=1421762496210, value=value3
3 row(s) in 0.0230 seconds
Getting a single row
Use the get command to fetch a single row at a time:
hbase(main):007:0> get 'test', 'row1'
COLUMN CELL
cf:a timestamp=1421762485768, value=value1
1 row(s) in 0.0350 seconds
Disabling a table
Use the disable command to disable a table:
hbase(main):008:0> disable 'test'
0 row(s) in 1.1820 seconds
Use the enable command to re-enable it:
hbase(main):009:0> enable 'test'
0 row(s) in 0.1770 seconds
Disable the table again before dropping it:
hbase(main):010:0> disable 'test'
0 row(s) in 1.1820 seconds
Dropping a table
hbase(main):011:0> drop 'test'
0 row(s) in 0.1370 seconds
Exiting the HBase Shell
Use the quit command to exit the shell and disconnect from the cluster.
3. Accessing HBase with the Thrift Client
Since HBase is written in Java, it natively provides a Java interface. What about non-Java programmers? Fortunately, HBase also ships a Thrift interface server, so HBase clients can be written in other languages as well. The commonly used Python interface to HBase is introduced here; other languages work similarly.
1. Start the thrift-server
To use HBase's Thrift interface, the service must be started. Start the HBase thrift-server process as follows:
cd /app/zpy/hbase/bin
./hbase-daemon.sh start thrift
Run jps to check:
34533 ThriftServer
Thrift's default port is 9090; after a successful start, you can check whether the port is listening.
2. Install the dependencies Thrift needs
(1) Install dependencies:
yum install automake libtool flex bison pkgconfig gcc-c++ boost-devel libevent-devel zlib-devel python-devel ruby-devel openssl-devel
(2) Install boost:
wget http://sourceforge.net/projects/boost/files/boost/1.53.0/boost_1_53_0.tar.gz
tar xvf boost_1_53_0.tar.gz
cd boost_1_53_0
./bootstrap.sh
./b2 install
3. Install the Thrift client
Download thrift-0.11.0.tar.gz from the official site, then extract and install it:
wget http://mirrors.hust.edu.cn/apache/thrift/0.11.0/thrift-0.11.0.tar.gz
tar xzvf thrift-0.11.0.tar.gz
cd thrift-0.11.0
mkdir /app/zpy/thrift
./configure --prefix=/app/zpy/thrift
make
make install
make may fail with an error like:
g++: error: /usr/lib64/libboost_unit_test_framework.a: No such file or directory
Fix:
find / -name libboost_unit_test_framework.*
cp /usr/local/lib/libboost_unit_test_framework.a /usr/lib64/
4. Connect to HBase with Python 3
Install the required packages:
pip install thrift
pip install hbase-thrift
The Python script is as follows:
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase
from hbase.ttypes import *
transport = TSocket.TSocket('localhost', 9090)  # Thrift server host and port
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = Hbase.Client(protocol)  # HBase Thrift client
transport.open()
a = client.getTableNames()  # list all table names
print(a)
transport.close()
4. REST Client
1. Start the REST service
a. Start the REST server in non-daemon mode (Ctrl+C to stop):
bin/hbase rest start
b. Start the REST server in daemon mode:
bin/hbase-daemon.sh start rest
The server listens on port 8080 by default (the port can be specified with a startup parameter) and can then be accessed. curl http://
2. Java invocation example:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.rest.client.Client;
import org.apache.hadoop.hbase.rest.client.Cluster;
import org.apache.hadoop.hbase.rest.client.RemoteHTable;
import org.apache.hadoop.hbase.util.Bytes;
import util.HBaseHelper;
import java.io.IOException;
/**
* Created by root on 15-1-9.
*/
public class RestExample {
public static void main(String[] args) throws IOException {
Configuration conf = HBaseConfiguration.create();
HBaseHelper helper = HBaseHelper.getHelper(conf);
helper.dropTable("testtable");
helper.createTable("testtable", "colfam1");
System.out.println("Adding rows to table...");
helper.fillTable("testtable", 1, 10, 5, "colfam1");
Cluster cluster = new Cluster();
cluster.add("hadoop", 8080); // REST server host and port
Client client = new Client(cluster);
RemoteHTable table = new RemoteHTable(client, "testtable"); // access the table through the REST gateway
Get get = new Get(Bytes.toBytes("row-30"));
get.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("col-3"));
Result result1 = table.get(get);
System.out.println("Get result1: " + result1);
Scan scan = new Scan();
scan.setStartRow(Bytes.toBytes("row-10"));
scan.setStopRow(Bytes.toBytes("row-15"));
scan.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("col-5"));
ResultScanner scanner = table.getScanner(scan);
for (Result result2 : scanner) {
System.out.println("Scan row[" + Bytes.toString(result2.getRow()) +
"]: " + result2);
}
table.close(); // release the remote table
}
}
5. Operating HBase with MapReduce
Apache MapReduce is the software framework provided by Hadoop for large-scale data analysis.
mapred and mapreduce
As with MapReduce itself, HBase has two MapReduce API packages: org.apache.hadoop.hbase.mapred and org.apache.hadoop.hbase.mapreduce. The former uses the old-style API, the latter the new style; the latter is the more flexible of the two.
HBase MapReduce examples
HBase MapReduce read example
Configuration config = HBaseConfiguration.create();
Job job = new Job(config, "ExampleRead");
job.setJarByClass(MyReadJob.class); // class that contains mapper
Scan scan = new Scan();
scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false); // don't set to true for MR jobs
// set other scan attrs
...
TableMapReduceUtil.initTableMapperJob(
tableName, // input HBase table name
scan, // Scan instance to control CF and attribute selection
MyMapper.class, // mapper
null, // mapper output key
null, // mapper output value
job);
job.setOutputFormatClass(NullOutputFormat.class); // because we aren't emitting anything from mapper
boolean b = job.waitForCompletion(true);
if (!b) {
throw new IOException("error with job!");
}
public static class MyMapper extends TableMapper<Text, Text> {
public void map(ImmutableBytesWritable row, Result value, Context context) throws InterruptedException, IOException {
// process data for the row from the Result instance.
}
}
HBase MapReduce read/write example
Configuration config = HBaseConfiguration.create();
Job job = new Job(config,"ExampleReadWrite");
job.setJarByClass(MyReadWriteJob.class); // class that contains mapper
Scan scan = new Scan();
scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false); // don't set to true for MR jobs
// set other scan attrs
TableMapReduceUtil.initTableMapperJob(
sourceTable, // input table
scan, // Scan instance to control CF and attribute selection
MyMapper.class, // mapper class
null, // mapper output key
null, // mapper output value
job);
TableMapReduceUtil.initTableReducerJob(
targetTable, // output table
null, // reducer class
job);
job.setNumReduceTasks(0);
boolean b = job.waitForCompletion(true);
if (!b) {
throw new IOException("error with job!");
}
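This driver sets the reducer class to null and the number of reduce tasks to 0, so the Puts emitted by the mapper are written straight to the target table. A sketch of such a mapper, copying every cell of each source row into a Put (following the pattern in the HBase reference guide):
public static class MyMapper extends TableMapper<ImmutableBytesWritable, Put> {
    public void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
        // emit one Put per source row, keyed by the row itself
        context.write(row, resultToPut(row, value));
    }
    private static Put resultToPut(ImmutableBytesWritable key, Result result) throws IOException {
        Put put = new Put(key.get());
        for (Cell cell : result.rawCells()) {
            put.add(cell); // keeps the original family, qualifier and timestamp
        }
        return put;
    }
}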
6. HBase Web UI
HBase provides a web-based user interface through which users can view status information such as the HBase cluster's properties. The web pages are divided into a Master status page, a RegionServer status page, and a ZooKeeper statistics page.
The default addresses are:
ip:60010 (Master status)
ip:60030 (RegionServer status)
ip:60010/zk.jsp (ZooKeeper statistics)
(In HBase 1.0 and later, the default info ports changed to 16010 for the Master and 16030 for the RegionServer.)
The Master status page shows the details of the Master's state.
The page is roughly divided into HBase cluster information, task information, table information, and RegionServer information; each part contains a number of specific attributes.
The RegionServer status page shows the details of a RegionServer's state:
the node's attribute information, task information, and Region information.
The ZooKeeper statistics page is a very simple semi-structured text dump.