HBase's clients include the native Java client, the HBase Shell, Thrift, REST, MapReduce, the Web UI, and more.
Below are common usages of these clients.
1. Native Java Client
The native Java client is HBase's primary and most efficient client.
It covers the CRUD APIs and also implements DDL operations such as creating, deleting, and altering tables.
Configuring a Java connection to HBase
Connecting to HBase from Java requires two classes:
HBaseConfiguration
ConnectionFactory
First, configure an HBase connection, for example the ZooKeeper address and port:
hbase.zookeeper.quorum
hbase.zookeeper.property.clientPort
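Set programmatically, a minimal sketch might look like this (the host names and port are placeholders):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
Configuration config = HBaseConfiguration.create(); // starts from hbase-default.xml/hbase-site.xml on the classpath
// point the client at the cluster's ZooKeeper quorum (placeholder hosts)
config.set("hbase.zookeeper.quorum", "host1,host2,host3");
config.set("hbase.zookeeper.property.clientPort", "2181");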
A more common approach is to write an hbase-site.xml file and load the configuration from it.
Example hbase-site.xml:
<configuration>
<property>
<name>hbase.master</name>
<value>host1:60000</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>host1,host2,host3</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
</configuration>
Then load the configuration file and create the connection:
Configuration config = HBaseConfiguration.create();
config.addResource(new Path(System.getenv("HBASE_CONF_DIR"), "hbase-site.xml"));
Connection connection = ConnectionFactory.createConnection(config);
Creating a table
To create a table we first need an Admin object:
Admin admin = connection.getAdmin(); // get an Admin object from the connection
TableName tableName = TableName.valueOf("test"); // define the table name
HTableDescriptor htd = new HTableDescriptor(tableName); // define the table descriptor
HColumnDescriptor hcd = new HColumnDescriptor("data"); // define the column family descriptor
htd.addFamily(hcd); // add the column family to the table
admin.createTable(htd); // create the table
Creating a table in HBase 2.x
HBase 2.x uses a new API for creating tables:
TableName tableName = TableName.valueOf("test"); // define the table name
// TableDescriptor objects are built with TableDescriptorBuilder
TableDescriptorBuilder tableDescriptor = TableDescriptorBuilder.newBuilder(tableName);
ColumnFamilyDescriptor family = ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("data")).build(); // build the column family descriptor
tableDescriptor.setColumnFamily(family); // set the column family
admin.createTable(tableDescriptor.build()); // create the table
Inserting data
Table table = connection.getTable(tableName); // get a Table object
try {
byte[] row = Bytes.toBytes("row1"); // row key
Put put = new Put(row); // create a Put object
byte[] columnFamily = Bytes.toBytes("data"); // column family
byte[] qualifier = Bytes.toBytes(String.valueOf(1)); // column qualifier
byte[] value = Bytes.toBytes("張三豐"); // value
put.addColumn(columnFamily, qualifier, value);
table.put(put); // write the data to the table
} finally {
// release resources when finished
table.close();
}
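Table.put also accepts a list of Puts, which batches several writes together. A minimal sketch, reusing the table and the "data" family from the example above (row keys and values here are placeholders):
import java.util.ArrayList;
import java.util.List;
List<Put> puts = new ArrayList<>();
for (int i = 1; i <= 3; i++) {
    Put p = new Put(Bytes.toBytes("row" + i)); // one Put per row key
    p.addColumn(Bytes.toBytes("data"), Bytes.toBytes(String.valueOf(i)), Bytes.toBytes("value" + i));
    puts.add(p);
}
table.put(puts); // Table.put(List<Put>) lets the client group the batch by region server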
Getting a specified row
// fetch the data
Get get = new Get(Bytes.toBytes("row1")); // define a Get object
Result result = table.get(get); // fetch the row through the Table object
System.out.println("Result: " + result);
// often we only need the value; this fetches the value of column data:1
byte[] valueBytes = result.getValue(Bytes.toBytes("data"), Bytes.toBytes("1")); // returns a byte array
// convert the bytes to a string
String valueStr = new String(valueBytes, "utf-8");
System.out.println("value:" + valueStr);
Scanning the table's data
Scan scan = new Scan();
ResultScanner scanner = table.getScanner(scan);
try {
for (Result scannerResult: scanner) {
System.out.println("Scan: " + scannerResult);
byte[] row = scannerResult.getRow();
System.out.println("rowName:" + new String(row,"utf-8"));
}
} finally {
scanner.close();
}
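A scan can also be limited to a row range and a single column; a sketch using the HBase 2.x Scan methods (the row keys here are assumptions):
Scan rangeScan = new Scan();
rangeScan.withStartRow(Bytes.toBytes("row1")); // inclusive start row
rangeScan.withStopRow(Bytes.toBytes("row3"));  // exclusive stop row
rangeScan.addColumn(Bytes.toBytes("data"), Bytes.toBytes("1")); // restrict to one column
try (ResultScanner rangeScanner = table.getScanner(rangeScan)) {
    for (Result r : rangeScanner) {
        System.out.println("rowName:" + Bytes.toString(r.getRow()));
    }
}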
Deleting a table
TableName tableName = TableName.valueOf("test");
admin.disableTable(tableName); // disable the table first
admin.deleteTable(tableName); // delete the table
A complete HBase Java API table DDL example:
package com.example.hbase.admin;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.io.compress.Compression.Algorithm;
public class Example {
private static final String TABLE_NAME = "MY_TABLE_NAME_TOO";
private static final String CF_DEFAULT = "DEFAULT_COLUMN_FAMILY";
public static void createOrOverwrite(Admin admin, HTableDescriptor table) throws IOException {
if (admin.tableExists(table.getTableName())) {
admin.disableTable(table.getTableName());
admin.deleteTable(table.getTableName());
}
admin.createTable(table);
}
public static void createSchemaTables(Configuration config) throws IOException {
try (Connection connection = ConnectionFactory.createConnection(config);
Admin admin = connection.getAdmin()) {
HTableDescriptor table = new HTableDescriptor(TableName.valueOf(TABLE_NAME));
table.addFamily(new HColumnDescriptor(CF_DEFAULT).setCompressionType(Algorithm.NONE));
System.out.print("Creating table. ");
createOrOverwrite(admin, table);
System.out.println(" Done.");
}
}
public static void modifySchema (Configuration config) throws IOException {
try (Connection connection = ConnectionFactory.createConnection(config);
Admin admin = connection.getAdmin()) {
TableName tableName = TableName.valueOf(TABLE_NAME);
if (!admin.tableExists(tableName)) {
System.out.println("Table does not exist.");
System.exit(-1);
}
HTableDescriptor table = admin.getTableDescriptor(tableName);
// add a new column family to the table
HColumnDescriptor newColumn = new HColumnDescriptor("NEWCF");
newColumn.setCompactionCompressionType(Algorithm.GZ);
newColumn.setMaxVersions(HConstants.ALL_VERSIONS);
admin.addColumn(tableName, newColumn);
// update an existing column family
HColumnDescriptor existingColumn = new HColumnDescriptor(CF_DEFAULT);
existingColumn.setCompactionCompressionType(Algorithm.GZ);
existingColumn.setMaxVersions(HConstants.ALL_VERSIONS);
table.modifyFamily(existingColumn);
admin.modifyTable(tableName, table);
// disable the table
admin.disableTable(tableName);
// delete a column family
admin.deleteColumn(tableName, CF_DEFAULT.getBytes("UTF-8"));
// delete the table (it must be disabled first)
admin.deleteTable(tableName);
}
}
public static void main(String... args) throws IOException {
Configuration config = HBaseConfiguration.create();
// add the required configuration files (hbase-site.xml, core-site.xml)
config.addResource(new Path(System.getenv("HBASE_CONF_DIR"), "hbase-site.xml"));
config.addResource(new Path(System.getenv("HADOOP_CONF_DIR"), "core-site.xml"));
createSchemaTables(config);
modifySchema(config);
}
}
2. Using the HBase Shell to operate HBase
From the bin/ directory under the HBase installation directory, use the hbase shell command to connect to a running HBase instance:
$ ./bin/hbase shell
hbase(main):001:0>
Viewing the HBase Shell help text
Type help and press Enter to see basic information about the HBase Shell and some example commands.
Creating a table
Use create to create a table; a table name and a column family name must be specified:
hbase(main):001:0> create 'test', 'cf'
0 row(s) in 0.4170 seconds
=> Hbase::Table - test
Table information
Use list to view existing tables:
hbase(main):002:0> list 'test'
TABLE
test
1 row(s) in 0.0180 seconds
=> ["test"]
Use describe to view table details and configuration:
hbase(main):003:0> describe 'test'
Table test is ENABLED
test
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE =>
'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'f
alse', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE
=> '65536'}
1 row(s)
Took 0.9998 seconds
Inserting data
Use put to insert data:
hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.0850 seconds
hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0110 seconds
hbase(main):005:0> put 'test', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0100 seconds
Scanning all data
One way to fetch data from HBase is scan. Use the scan command to scan a table's data; restrictions can be applied to the scan:
hbase(main):006:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1421762485768, value=value1
row2 column=cf:b, timestamp=1421762491785, value=value2
row3 column=cf:c, timestamp=1421762496210, value=value3
3 row(s) in 0.0230 seconds
Getting a single row
Use the get command to fetch a single row at a time:
hbase(main):007:0> get 'test', 'row1'
COLUMN CELL
cf:a timestamp=1421762485768, value=value1
1 row(s) in 0.0350 seconds
Disabling a table
Use the disable command to disable a table:
hbase(main):008:0> disable 'test'
0 row(s) in 1.1820 seconds
Use the enable command to re-enable it:
hbase(main):009:0> enable 'test'
0 row(s) in 0.1770 seconds
Disable the table again before dropping it:
hbase(main):010:0> disable 'test'
0 row(s) in 1.1820 seconds
Dropping a table
hbase(main):011:0> drop 'test'
0 row(s) in 0.1370 seconds
Exiting the HBase Shell
Use the quit command to exit the shell and disconnect from the cluster.
3. Accessing HBase with the Thrift Client
Since HBase is written in Java, it natively provides a Java interface. What about non-Java programmers? Fortunately, HBase also ships a Thrift interface server, so HBase clients can be written in other languages as well. The commonly used Python interface to HBase is introduced here; other languages work similarly.
1. Start the thrift-server
To use HBase's Thrift interface, the service must be started. Start the HBase thrift-server process as follows:
cd /app/zpy/hbase/bin
./hbase-daemon.sh start thrift
Run jps to check:
34533 ThriftServer
Thrift's default port is 9090; after a successful start, you can check whether the port is listening.
2. Install the dependencies Thrift needs
(1) Install dependencies:
yum install automake libtool flex bison pkgconfig gcc-c++ boost-devel libevent-devel zlib-devel python-devel ruby-devel openssl-devel
(2) Install boost:
wget http://sourceforge.net/projects/boost/files/boost/1.53.0/boost_1_53_0.tar.gz
tar xvf boost_1_53_0.tar.gz
cd boost_1_53_0
./bootstrap.sh
./b2 install
3. Install the Thrift client
Download thrift-0.11.0.tar.gz from the official site, then extract and install it:
wget http://mirrors.hust.edu.cn/apache/thrift/0.11.0/thrift-0.11.0.tar.gz
tar xzvf thrift-0.11.0.tar.gz
cd thrift-0.11.0
mkdir /app/zpy/thrift
./configure --prefix=/app/zpy/thrift
make
make install
make may fail with an error like:
g++: error: /usr/lib64/libboost_unit_test_framework.a: No such file or directory
Fix:
find / -name libboost_unit_test_framework.*
cp /usr/local/lib/libboost_unit_test_framework.a /usr/lib64/
4. Connect to HBase with Python 3
Install the required packages:
pip install thrift
pip install hbase-thrift
The Python script is as follows:
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase
from hbase.ttypes import *
transport = TSocket.TSocket('localhost', 9090)  # Thrift server host and port
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = Hbase.Client(protocol)  # HBase Thrift client
transport.open()
a = client.getTableNames()  # list all table names
print(a)
transport.close()
4. REST Client
1. Start the REST service
a. Start the REST server in non-daemon mode (Ctrl+C to stop):
bin/hbase rest start
b. Start the REST server in daemon mode:
bin/hbase-daemon.sh start rest
The server listens on port 8080 by default (the port can be specified with a startup parameter) and can then be accessed. curl http://
2. Java invocation example:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.rest.client.Client;
import org.apache.hadoop.hbase.rest.client.Cluster;
import org.apache.hadoop.hbase.rest.client.RemoteHTable;
import org.apache.hadoop.hbase.util.Bytes;
import util.HBaseHelper;
import java.io.IOException;
/**
* Created by root on 15-1-9.
*/
public class RestExample {
public static void main(String[] args) throws IOException {
Configuration conf = HBaseConfiguration.create();
HBaseHelper helper = HBaseHelper.getHelper(conf);
helper.dropTable("testtable");
helper.createTable("testtable", "colfam1");
System.out.println("Adding rows to table...");
helper.fillTable("testtable", 1, 10, 5, "colfam1");
Cluster cluster = new Cluster();
cluster.add("hadoop", 8080); // REST server host and port
Client client = new Client(cluster);
RemoteHTable table = new RemoteHTable(client, "testtable"); // access the table through the REST gateway
Get get = new Get(Bytes.toBytes("row-30"));
get.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("col-3"));
Result result1 = table.get(get);
System.out.println("Get result1: " + result1);
Scan scan = new Scan();
scan.setStartRow(Bytes.toBytes("row-10"));
scan.setStopRow(Bytes.toBytes("row-15"));
scan.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("col-5"));
ResultScanner scanner = table.getScanner(scan);
for (Result result2 : scanner) {
System.out.println("Scan row[" + Bytes.toString(result2.getRow()) +
"]: " + result2);
}
table.close(); // release the remote table
}
}
5. Operating HBase with MapReduce
Apache MapReduce is the software framework provided by Hadoop for large-scale data analysis.
mapred and mapreduce
As with MapReduce itself, HBase has two MapReduce API packages: org.apache.hadoop.hbase.mapred and org.apache.hadoop.hbase.mapreduce. The former uses the old-style API, the latter the new style; the latter is the more flexible of the two.
HBase MapReduce examples
HBase MapReduce read example
Configuration config = HBaseConfiguration.create();
Job job = new Job(config, "ExampleRead");
job.setJarByClass(MyReadJob.class); // class that contains mapper
Scan scan = new Scan();
scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false); // don't set to true for MR jobs
// set other scan attrs
...
TableMapReduceUtil.initTableMapperJob(
tableName, // input HBase table name
scan, // Scan instance to control CF and attribute selection
MyMapper.class, // mapper
null, // mapper output key
null, // mapper output value
job);
job.setOutputFormatClass(NullOutputFormat.class); // because we aren't emitting anything from mapper
boolean b = job.waitForCompletion(true);
if (!b) {
throw new IOException("error with job!");
}
public static class MyMapper extends TableMapper<Text, Text> {
public void map(ImmutableBytesWritable row, Result value, Context context) throws InterruptedException, IOException {
// process data for the row from the Result instance.
}
}
HBase MapReduce read/write example
Configuration config = HBaseConfiguration.create();
Job job = new Job(config,"ExampleReadWrite");
job.setJarByClass(MyReadWriteJob.class); // class that contains mapper
Scan scan = new Scan();
scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false); // don't set to true for MR jobs
// set other scan attrs
TableMapReduceUtil.initTableMapperJob(
sourceTable, // input table
scan, // Scan instance to control CF and attribute selection
MyMapper.class, // mapper class
null, // mapper output key
null, // mapper output value
job);
TableMapReduceUtil.initTableReducerJob(
targetTable, // output table
null, // reducer class
job);
job.setNumReduceTasks(0);
boolean b = job.waitForCompletion(true);
if (!b) {
throw new IOException("error with job!");
}
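This driver sets the reducer class to null and the number of reduce tasks to 0, so the Puts emitted by the mapper are written straight to the target table. A sketch of such a mapper, copying every cell of each source row into a Put (following the pattern in the HBase reference guide):
public static class MyMapper extends TableMapper<ImmutableBytesWritable, Put> {
    public void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
        // emit one Put per source row, keyed by the row itself
        context.write(row, resultToPut(row, value));
    }
    private static Put resultToPut(ImmutableBytesWritable key, Result result) throws IOException {
        Put put = new Put(key.get());
        for (Cell cell : result.rawCells()) {
            put.add(cell); // keeps the original family, qualifier and timestamp
        }
        return put;
    }
}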
6. HBase Web UI
HBase provides a web-based user interface through which users can view status information such as the HBase cluster's properties. The web pages are divided into a Master status page, a RegionServer status page, and a ZooKeeper statistics page.
The default addresses are:
ip:60010 (Master status)
ip:60030 (RegionServer status)
ip:60010/zk.jsp (ZooKeeper statistics)
(In HBase 1.0 and later, the default info ports changed to 16010 for the Master and 16030 for the RegionServer.)
The Master status page shows the details of the Master's state.
The page is roughly divided into HBase cluster information, task information, table information, and RegionServer information; each part contains a number of specific attributes.
The RegionServer status page shows the details of a RegionServer's state:
the node's attribute information, task information, and Region information.
The ZooKeeper statistics page is a very simple semi-structured text dump.