Starting from the Hadoop cluster created in HDP 2.4 Installation (5): Cluster and Component Installation, we modify the default configuration so that HBase stores its data in Azure Blob Storage.
Contents:
- Overview
- Configuration
- Verification
Overview:
- hadoop-azure provides the integration between Hadoop and Azure Blob Storage. It requires the hadoop-azure.jar package, which is already included by default in the HDP 2.4 installation package, as shown in the figure below:
- Once the configuration is in place, all data read and written is stored in the Azure Blob Storage account.
- Multiple Azure Blob Storage accounts can be configured, and the integration implements the standard Hadoop FileSystem interface.
- File system paths are referenced as URLs with the wasb scheme.
- Tested on both Linux and Windows. Tested at scale.
- Azure Blob Storage involves three concepts:
- Storage Account: All access is done through a storage account
- Container: A container is a grouping of multiple blobs. A storage account may have multiple containers. In Hadoop, an entire file system hierarchy is stored in a single container. It is also possible to configure multiple containers, effectively presenting multiple file systems that can be referenced using distinct URLs.
- Blob: A file of any type and size. In Hadoop, files are stored in blobs. The internal implementation also uses blobs to persist the file system hierarchy and other metadata
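Putting the three together, every path under wasb is addressed by a URL of the following shape (shown here with the Azure China endpoint suffix and the container/account names used later in this post):

wasb://<container>@<storage account>.blob.core.chinacloudapi.cn/<path inside the container>
wasb://[email protected]/hbase/data/default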
Configuration:
- In the Azure China portal (https://manage.windowsazure.cn), create a Blob storage account; in this example it is named localhbase, as shown in the figure below.
- Configure the credentials/access key for Azure Blob Storage and switch the default file system by editing the local Hadoop core-site.xml file with the following content:
<property>
  <name>fs.defaultFS</name>
  <value>wasb://[email protected]</value>
</property>
<property>
  <name>fs.azure.account.key.localhbase.blob.core.chinacloudapi.cn</name>
  <value>YOUR ACCESS KEY</value>
</property>
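With only these two properties in place, a quick sanity check is to list the container root through the wasb scheme from any cluster node; the container and account names below are the ones from the example configuration:

# should succeed and show an empty (or near-empty) listing on a fresh container
hdfs dfs -ls wasb://[email protected]/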
- In most Hadoop clusters the core-site.xml file is world-readable. For better security, the access key can be stored in encrypted form and decrypted at runtime by a program you configure. In that scenario the configuration looks as follows (an optional, security-oriented setup):
<property>
  <name>fs.azure.account.keyprovider.localhbase.blob.core.chinacloudapi.cn</name>
  <value>org.apache.hadoop.fs.azure.ShellDecryptionKeyProvider</value>
</property>
<property>
  <name>fs.azure.account.key.localhbase.blob.core.chinacloudapi.cn</name>
  <value>YOUR ENCRYPTED ACCESS KEY</value>
</property>
<property>
  <name>fs.azure.shellkeyprovider.script</name>
  <value>PATH TO DECRYPTION PROGRAM</value>
</property>
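ShellDecryptionKeyProvider invokes the script configured in fs.azure.shellkeyprovider.script with the encrypted key passed as the last argument, and treats the script's standard output as the decrypted key. A minimal sketch of such a script is shown below; the openssl cipher and passphrase file are illustrative assumptions, not part of hadoop-azure:

#!/bin/sh
# Hypothetical key-decryption script for ShellDecryptionKeyProvider.
# $1 is the encrypted storage key; the plain-text key must be printed to stdout.
ENCRYPTED_KEY="$1"
# Assumption: the key was previously encrypted with openssl using a passphrase
# file readable only by the hadoop service account.
echo "$ENCRYPTED_KEY" | openssl enc -d -aes-256-cbc -a -pass file:/etc/hadoop/conf/.wasbkey.pass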
- The Azure Blob Storage interface for Hadoop supports two kinds of blobs: block blobs and page blobs. Block blobs are the default kind and are a good fit for most big-data use cases, such as input data for Hive, Pig, and analytical MapReduce jobs.
- Page blob handling in hadoop-azure was introduced to support HBase log files. Page blobs can be written any number of times, whereas block blobs can only be appended to 50,000 times before you run out of blocks and writes start to fail. That does not work for HBase logs, so page blob support was introduced to overcome this limitation.
- Page blobs can be up to 1 TB in size, larger than the maximum 200 GB size for block blobs.
- To have the files you create stored as page blobs, set the configuration variable fs.azure.page.blob.dir to a comma-separated list of folder names:
<property>
  <name>fs.azure.page.blob.dir</name>
  <value>/hbase/WALs,/hbase/oldWALs,/mapreducestaging,/hbase/MasterProcWALs,/atshistory,/tezstaging,/ams/hbase</value>
</property>
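If you want to double-check that WAL files really end up as page blobs, one option is to inspect a blob's type with the Azure CLI. The command below is only a sketch: the WAL file name is a placeholder, and for Azure China the CLI must first be switched to the China cloud environment:

# Assumes: az cloud set --name AzureChinaCloud
az storage blob show \
    --account-name localhbase \
    --account-key "YOUR ACCESS KEY" \
    --container-name hbase \
    --name "hbase/WALs/<some WAL file>" \
    --query properties.blobType        # expected output: "PageBlob"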
Verification:
- All of the parameters above are configured in Ambari; restart the services that depend on them.
- Command: hdfs dfs -ls /hbase/data/default. As shown in the figure below, there is no data yet.
- Follow HBase (3): Importing Azure HDInsight HBase Table Data into a Local HBase to import the test table data; once finished it looks like the figure below:
- Command: ./hbase hbck -repair -ignorePreCheckPermission
- Command: hbase shell
- Inspect the data; if it looks like the figure below, everything is OK.
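A few hbase shell commands that are handy for this check; the table name test_table below is a placeholder for whatever table was imported in the previous step:

# run inside `hbase shell`
list                              # the imported table should show up
scan 'test_table', {LIMIT => 5}   # print a few rows
count 'test_table'                # row count should match the source table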
- Verify the data with our home-grown query tool, as shown in the figure below; development of this tool is covered in the next chapter.
- Reference: https://hadoop.apache.org/docs/current/hadoop-azure/index.html