An introduction to the hadoop and hdfs commands in Hadoop

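Some Hive installation guides use hdfs dfs -mkdir, which shows that the hdfs script can also run file system commands. In 2.8.0 both hdfs dfs and hadoop fs work; it is the older hadoop dfs form that is deprecated and kept only for backward compatibility.

This article briefly introduces the relevant commands, to give a general impression of the Hadoop command line and to show where to find help when you need it.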

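Overview

The hadoop and hdfs scripts are both located in $HADOOP_HOME/bin.

A good part of what hdfs does can currently also be done with hadoop, but hdfs has some functions of its own. The hadoop script targets a broader and more varied set of functions.

This article covers the hadoop, hdfs, and yarn commands, mainly so that I have a general picture of them for later reference.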

Part 1: hadoop commands

See http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/CommandsManual.html

Usage: hadoop [--config confdir] [--loglevel loglevel] [COMMAND] [GENERIC_OPTIONS] [COMMAND_OPTIONS]

GENERIC_OPTION	Description
`-archives <comma separated list of archives>`	Specify comma separated archives to be unarchived on the compute machines. Applies only to job.
`-conf <configuration file>`	Specify an application configuration file.
`-D <property>=<value>`	Use value for given property.
`-files <comma separated list of files>`	Specify comma separated files to be copied to the map reduce cluster. Applies only to job.
`-fs <file:///> or <hdfs://namenode:port>`	Specify default filesystem URL to use. Overrides the 'fs.defaultFS' property from configurations.
`-jt <local> or <resourcemanager:port>`	Specify a ResourceManager. Applies only to job.
`-libjars <comma separated list of jars>`	Specify comma separated jar files to include in the classpath. Applies only to job.

In most cases the generic options above are not needed.
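As a rough illustration of where the generic options go on the command line, the sketch below prints (rather than runs) three invocations. The DRY_RUN switch, the run helper, and the file name alt-core-site.xml are assumptions made for this example; the host and paths reuse the test cluster that appears later in this article.

```shell
# Dry-run helper (an assumption of this sketch): prints the command
# unless DRY_RUN=0 is set, in which case the command is executed.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

generic_option_examples() {
  # -D overrides a single property for this invocation only.
  run hadoop fs -D dfs.replication=2 -put start-hadoop.sh /log
  # -fs selects the file system, overriding fs.defaultFS.
  run hadoop fs -fs hdfs://bigdata.lzf:9001 -ls /
  # -conf loads an extra configuration file (hypothetical file name).
  run hadoop fs -conf alt-core-site.xml -ls /
}

generic_option_examples
```

Generic options are placed after the command (here fs) and before the command's own options, as in the usage line above.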

The commands are described below:

archive
checknative
classpath
credential
distcp
fs
jar
key
trace
version
CLASSNAME

1.1 archive

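Creates a Hadoop archive. For details see http://hadoop.apache.org/docs/r2.8.0/hadoop-archives/HadoopArchives.html

A Hadoop archive is not an ordinary archive: it uses its own format and cannot be unpacked with tools such as rar, zip, or tar. The file extension is .har. An archive directory contains both metadata and data.

The purpose is usually described as saving space and reducing the amount of data transferred. Strictly speaking, a har does not compress file contents; its main benefit is packing many small files together, which reduces the number of objects the NameNode has to track.

Note: the official documentation does not explain much. Does that mean har files only serve MapReduce? If we do not use MapReduce, can we ignore them?

Creating an archive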

hadoop archive -archiveName name -p <parent> [-r <replication factor>] <src>* <dest>

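For example:

Archive the contents of /foo/bar as zoo.har and store it under /outputdir:

hadoop archive -archiveName zoo.har -p /foo/bar -r 3 /outputdir

Archive the directories /user/hadoop/dir1 and /user/hadoop/dir2 as foo.har and store it in /user/zoo. The source paths are relative to the parent given with -p, so this form places dir1 and dir2 at the top level of the archive:

hadoop archive -archiveName foo.har -p /user/hadoop dir1 dir2 /user/zoo

Unarchiving

Extract the dir1 directory from foo.har to /user/zoo/newdir:

hdfs dfs -cp har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir

Unarchive in parallel (DistCp runs the copy as a MapReduce job):

hadoop distcp har:///user/zoo/foo.har/dir1 hdfs:/user/zoo/newdir

List the files in the archive: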

hdfs dfs -ls -R har:///user/zoo/foo.har/
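The archive workflow above (create, list, extract) can be wrapped in a few small functions. This is only a sketch: the function names, the argument order, and the DRY_RUN switch are assumptions of this example, and by default it prints the commands instead of running them.

```shell
# Dry-run helper (an assumption of this sketch): prints the command
# unless DRY_RUN=0 is set, in which case the command is executed.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# har_create NAME PARENT DEST SRC...   (SRC paths are relative to PARENT)
har_create() {
  name=$1; parent=$2; dest=$3
  shift 3
  run hadoop archive -archiveName "$name" -p "$parent" "$@" "$dest"
}

# har_list ARCHIVE_PATH   (recursive listing through the har:// scheme)
har_list() {
  run hdfs dfs -ls -R "har://$1/"
}

# har_extract ARCHIVE_PATH SUBDIR DEST   (parallel copy through DistCp)
har_extract() {
  run hadoop distcp "har://$1/$2" "$3"
}

har_create foo.har /user/hadoop /user/zoo dir1 dir2
har_list /user/zoo/foo.har
har_extract /user/zoo/foo.har dir1 hdfs:/user/zoo/newdir
```

Set DRY_RUN=0 to execute the commands against a real cluster.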

1.2 checknative

hadoop checknative [-a] [-h]

-a checks all libraries

-h shows help

Checks the availability of Hadoop's native code; most users will not need it. For details see http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/NativeLibraries.html

1.3 classpath

hadoop classpath [--glob |--jar <path> |-h |--help]

Prints the class path needed to get the Hadoop jar and the required libraries.

1.4 credential

hadoop credential <subcommand> [options]

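Manages credentials, passwords, and secrets within credential providers.

View the help:

hadoop credential -list

Note: I have not looked into this yet; it is presumably related to security and authentication.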

1.5 distcp

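Purpose: copy files or directories

For details see: http://hadoop.apache.org/docs/r2.8.0/hadoop-distcp/DistCp.html

distcp is short for "distributed copy", as the name suggests. It is mainly used to copy files within a cluster or between clusters, and it relies on MapReduce.

The official guide spends considerable space on this tool, which suggests it is widely used; moving huge numbers of files is complicated enough to deserve a dedicated tool.

Simple copy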

[hadoop@bigdata ~]$ hadoop distcp /tmp/input/hadoop /tmp/input/haoop1
17/06/07 15:57:53 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
17/06/07 15:57:53 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
17/06/07 15:57:54 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, overwrite=false, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=100, sslConfigurationFile='null', copyStrategy='uniformsize', preserveStatus=[], preserveRawXattrs=false, atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[/tmp/input/hadoop], targetPath=/tmp/input/haoop1, targetPathExists=false, filtersFile='null'}
17/06/07 15:57:54 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/06/07 15:57:56 INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 31; dirCnt = 1
17/06/07 15:57:56 INFO tools.SimpleCopyListing: Build file listing completed.
17/06/07 15:57:56 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
17/06/07 15:57:56 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
17/06/07 15:57:57 INFO tools.DistCp: Number of paths in the copy list: 31
17/06/07 15:57:57 INFO tools.DistCp: Number of paths in the copy list: 31
17/06/07 15:57:58 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/06/07 15:57:59 INFO mapreduce.JobSubmitter: number of splits:20
17/06/07 15:58:00 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1496800112089_0001
17/06/07 15:58:01 INFO impl.YarnClientImpl: Submitted application application_1496800112089_0001
17/06/07 15:58:01 INFO mapreduce.Job: The url to track the job: http://bigdata.lzf:8099/proxy/application_1496800112089_0001/
17/06/07 15:58:01 INFO tools.DistCp: DistCp job-id: job_1496800112089_0001
17/06/07 15:58:01 INFO mapreduce.Job: Running job: job_1496800112089_0001
17/06/07 15:58:24 INFO mapreduce.Job: Job job_1496800112089_0001 running in uber mode : false

-- Note: the rest of the output is long and is omitted here

Check the result (partial listing):

hadoop fs -ls /tmp/input/haoop1

[hadoop@bigdata ~]$ hadoop fs -ls /tmp/input/haoop1
17/06/07 16:05:58 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
17/06/07 16:05:58 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
Found 30 items
-rw-r--r--   1 hadoop supergroup       4942 2017-06-07 15:59 /tmp/input/haoop1/capacity-scheduler.xml
-rw-r--r--   1 hadoop supergroup       1335 2017-06-07 15:58 /tmp/input/haoop1/configuration.xsl
-rw-r--r--   1 hadoop supergroup        318 2017-06-07 15:59 /tmp/input/haoop1/container-executor.cfg
-rw-r--r--   1 hadoop supergroup       1443 2017-06-07 15:59 /tmp/input/haoop1/core-site.xml
-rw-r--r--   1 hadoop supergroup       3804 2017-06-07 16:00 /tmp/input/haoop1/hadoop-env.cmd
-rw-r--r--   1 hadoop supergroup       4755 2017-06-07 16:00 /tmp/input/haoop1/hadoop-env.sh
-rw-r--r--   1 hadoop supergroup       2490 2017-06-07 15:58 /tmp/input/haoop1/hadoop-metrics.properties
-rw-r--r--   1 hadoop supergroup       2598 2017-06-07 15:59 /tmp/input/haoop1/hadoop-metrics2.properties
-rw-r--r--   1 hadoop supergroup       9683 2017-06-07 16:00 /tmp/input/haoop1/hadoop-policy.xml
-rw-r--r--   1 hadoop supergroup       1527 2017-06-07 15:58 /tmp/input/haoop1/hdfs-site.xml
-rw-r--r--   1 hadoop supergroup       1449 2017-06-07 15:59 /tmp/input/haoop1/httpfs-env.sh
-rw-r--r--   1 hadoop supergroup       1657 2017-06-07 15:59 /tmp/input/haoop1/httpfs-log4j.properties

........

Paths can be given as URIs, for example hadoop distcp hdfs://bigdata.lzf:9001/tmp/input/hadoop hdfs://bigdata.lzf:9001/tmp/input/hadoop1

There can be several sources, for example hadoop distcp hdfs://bigdata.lzf:9001/tmp/input/hadoop hdfs://bigdata.lzf:9001/tmp/input/test hdfs://bigdata.lzf:9001/tmp/input/hadoop1

Note: the copy always prints these messages:

17/06/07 16:09:52 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
17/06/07 16:09:52 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor

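These can be ignored. The job configuration shown in the web UI on port 8099 confirms that the values actually used are mapreduce.task.io.sort.mb and mapreduce.task.io.sort.factor.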
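For repeated copies, DistCp can skip files that already match at the target and can cap the number of map tasks. The sketch below uses the -update and -m options, which correspond to the syncFolder and maxMaps fields in the "Input Options" log line above; neither option is used elsewhere in this article. The sync_dir function, its default of 20 maps, and the DRY_RUN switch are assumptions of this example.

```shell
# Dry-run helper (an assumption of this sketch): prints the command
# unless DRY_RUN=0 is set, in which case the command is executed.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# sync_dir SRC DST [MAX_MAPS]
# -update copies only files that are missing or differ at the target;
# -m limits the number of simultaneous map tasks.
sync_dir() {
  src=$1; dst=$2; maps=${3:-20}
  run hadoop distcp -update -m "$maps" "$src" "$dst"
}

sync_dir hdfs://bigdata.lzf:9001/tmp/input/hadoop hdfs://bigdata.lzf:9001/tmp/input/hadoop1 10
```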

1.6 fs

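This is one of the most frequently used commands. It is largely equivalent to hdfs dfs, with a few differences.

http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/FileSystemShell.html

The documentation is detailed and the usage is simple.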

•appendToFile
•cat
•checksum
•chgrp
•chmod
•chown
•copyFromLocal
•copyToLocal
•count
•cp
•createSnapshot
•deleteSnapshot
•df
•du
•dus
•expunge
•find
•get
•getfacl
•getfattr
•getmerge
•help
•ls
•lsr
•mkdir
•moveFromLocal
•moveToLocal
•mv
•put
•renameSnapshot
•rm
•rmdir
•rmr
•setfacl
•setfattr
•setrep
•stat
•tail
•test
•text
•touchz
•truncate
•usage
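These subcommands are easy to understand; most of them mirror the common Linux file system commands.

A few interesting and frequently used ones are described here.

Copying from the local file system to a Hadoop URI: copyFromLocal

hadoop fs -copyFromLocal <localsrc> URI

In most cases this is equivalent to put, except that the source must be on the local file system.

For example: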

[hadoop@bigdata ~]$ hadoop fs -copyFromLocal start-hadoop.sh /log
17/06/07 17:10:21 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
17/06/07 17:10:21 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library

-- using a URI and forcing an overwrite

[hadoop@bigdata ~]$ hadoop fs -copyFromLocal -f start-hadoop.sh hdfs://bigdata.lzf:9001/log
17/06/07 17:12:03 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
17/06/07 17:12:03 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library

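Copying a file from a URI to the local file system: copyToLocal

hadoop fs -copyToLocal [-ignorecrc] [-crc] URI <localdst>

Equivalent to get, except that the destination must be on the local file system.

For example: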

hadoop fs -copyToLocal -f hdfs://bigdata.lzf:9001/log/start-hadoop.sh /home/hadoop/testdir

Counting: count

hadoop fs -count [-q] [-h] [-v] [-x] [-t [<storage type>]] [-u] <paths>

This command is quite useful.

Count the number of directories, files and bytes under the paths that match the specified file pattern. Get the quota and the usage. The output columns with -count are: DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME

In short, it counts directories, files, and bytes.

For example:

[hadoop@bigdata ~]$ hadoop fs -count /tmp/input/hadoop
17/06/07 17:41:04 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
17/06/07 17:41:04 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
1 30 83564 /tmp/input/hadoop

This command gives a quick picture of what is stored under a path.
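The four output columns are easy to post-process. The sketch below labels them with awk; the function name summarize_count is an assumption of this example, and the sample line is copied from the output above. On a real cluster, pipe hadoop fs -count <path> into the function instead. Paths containing spaces are not handled.

```shell
# Reads `hadoop fs -count` output on stdin and labels the columns
# DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME.
# Lines whose first two fields are not plain integers (for example
# log lines) are skipped.
summarize_count() {
  awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && NF >= 4 {
    printf "%s: %d dirs, %d files, %d bytes\n", $4, $1, $2, $3
  }'
}

# Sample line taken from the run shown above.
printf '%s\n' '1 30 83564 /tmp/input/hadoop' | summarize_count
```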

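Copying: cp

Usage: hadoop fs -cp [-f] [-p | -p[topax]] URI [URI ...] <dest>

Multiple sources are allowed, in which case the destination must be a directory.

It is similar to distcp, but cp copies through the client process while distcp runs as a MapReduce job. I am not sure whether cp is limited to a single Hadoop cluster.

For example: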

hadoop fs -cp /tmp/input/hadoop1/hadoop/*.* /tmp/input/hadoop

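Creating a snapshot: createSnapshot

http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html

Snapshots are mainly used for backup.

Deleting a snapshot: deleteSnapshot

Omitted.

Showing free space: df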

hadoop fs -df [-h] URI [URI ...]

[hadoop@bigdata ~]$ hadoop fs -df -h /
17/06/07 17:51:31 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
17/06/07 17:51:31 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
Filesystem Size Used Available Use%
hdfs://bigdata.lzf:9001 46.5 G 1.5 M 35.8 G 0%

Showing directory sizes in bytes: du

hadoop fs -du [-s] [-h] [-x] URI [URI ...]

count can replace part of what du does.

[hadoop@bigdata ~]$ hadoop fs -du -h /
17/06/07 17:52:56 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
17/06/07 17:52:56 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
0      /input
63     /log
0      /test
1.0 M /tmp
9      /warehouse
[hadoop@bigdata ~]$ hadoop fs -du -h -s /
17/06/07 17:53:19 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
17/06/07 17:53:19 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
1.0 M /
[hadoop@bigdata ~]$

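The options are essentially the same as for the Linux command.

Emptying the trash: expunge

hadoop fs -expunge

Permanently deletes the files in expired trash checkpoints and creates a new checkpoint. Checkpoints older than fs.trash.interval are removed the next time this command runs.

Searching: find

hadoop fs -find <path> ... <expression> ...

find matches on file names, not on file contents.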

[hadoop@bigdata ~]$ hadoop fs -find / -name hadoop -print
17/06/08 11:59:04 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
17/06/08 11:59:04 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
/tmp/hadoop-yarn/staging/hadoop
/tmp/hadoop-yarn/staging/history/done_intermediate/hadoop
/tmp/hive/hadoop
/tmp/input/hadoop
/tmp/input/hadoop1/hadoop

Or use -iname for a case-insensitive match:

hadoop fs -find / -iname hadoop -print

[hadoop@bigdata ~]$ hadoop fs -find / -name hadooP -print
17/06/08 12:00:59 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
17/06/08 12:00:59 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
[hadoop@bigdata ~]$ hadoop fs -find / -iname hadooP -print
17/06/08 12:01:06 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
17/06/08 12:01:06 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
/tmp/hadoop-yarn/staging/hadoop
/tmp/hadoop-yarn/staging/history/done_intermediate/hadoop
/tmp/hive/hadoop
/tmp/input/hadoop
/tmp/input/hadoop1/hadoop

Downloading files to the local file system: get

Similar to copyToLocal; supports CRC checking.

hadoop fs -get [-ignorecrc] [-crc] [-p] [-f] <src> <localdst>

For example:

hadoop fs -get /tmp/input/hadoop/*.xml /home/hadoop/testdir/

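Displaying extended attributes of a file or directory: getfattr

hadoop fs -getfattr [-R] -n name | -d [-e en] <path>

-n name and -d are mutually exclusive; -d dumps all attributes. -R lists recursively. -e en encodes the retrieved values, where en is one of "text", "hex", or "base64".

For example:

hadoop fs -getfattr -d /file

hadoop fs -getfattr -R -n user.myAttr /dir

Judging from these examples, I do not yet see a particular use for it.

Merging files: getmerge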

hadoop fs -getmerge -nl /src /opt/output.txt

hadoop fs -getmerge -nl /src/file1.txt /src/file2.txt /output.txt

For example:

hadoop fs -getmerge -nl /tmp/input/hadoop/hadoop-env.sh /tmp/input/hadoop/slaves /home/hadoop/testdir/merget-test.txt

Note: the destination is a local file, not a URI.

Listing files: ls

hadoop fs -ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] <args>

Creating directories: mkdir

hadoop fs -mkdir [-p] <paths>

Moving a local file into HDFS: moveFromLocal

hadoop fs -moveFromLocal <localsrc> <dst>

Similar to put, except that the local source is deleted after it is copied.

Moving files within the cluster: mv

hadoop fs -mv URI [URI ...] <dest>

Multiple sources are allowed.

For example: hadoop fs -mv /tmp/input/hadoop1/hadoop/slaves /tmp/input/hadoop1/

Uploading files: put

hadoop fs -put [-f] [-p] [-l] [-d] [ - | <localsrc1> .. ]. <dst>

Similar to copyFromLocal.

Deleting files: rm

hadoop fs -rm [-f] [-r |-R] [-skipTrash] [-safely] URI [URI ...]

Deleting directories: rmdir

hadoop fs -rmdir [--ignore-fail-on-non-empty] URI [URI ...]

Showing the end of a file: tail

hadoop fs -tail [-f] URI

The remaining subcommands are omitted.
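Several of the subcommands above combine naturally. As a small sketch, the function below makes sure the target directory exists (mkdir -p) and then uploads a file, overwriting any previous copy (put -f). The function name upload and the DRY_RUN switch are assumptions of this example; by default the commands are only printed.

```shell
# Dry-run helper (an assumption of this sketch): prints the command
# unless DRY_RUN=0 is set, in which case the command is executed.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# upload LOCAL_FILE HDFS_DIR
# The upload is attempted only if the directory step succeeds.
upload() {
  run hadoop fs -mkdir -p "$2" &&
    run hadoop fs -put -f "$1" "$2"
}

upload start-hadoop.sh /log
```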

1.7 jar

Runs a jar file with hadoop.

hadoop jar <jar> [mainClass] args...

However, Hadoop recommends using yarn jar instead of hadoop jar.

For the yarn jar command see http://hadoop.apache.org/docs/r2.8.0/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#jar

1.8 key

Manages keys via the KeyProvider.

Details omitted.

1.9 trace

Views and modifies the tracing settings. For details see http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/Tracing.html

1.10 version

Prints the version information.

hadoop version

1.11 CLASSNAME

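Runs a class with hadoop.

Syntax: hadoop CLASSNAME

The following comes from http://www.thebigdata.cn/Hadoop/1924.html

Before using hadoop CLASSNAME you need to set HADOOP_CLASSPATH.

export HADOOP_CLASSPATH=/home/hadoop/jardir/*:/home/hadoop/workspace/hdfstest/bin/

/home/hadoop/jardir/ contains all of my Hadoop jars. The entry ends in /* because that is the wildcard form the Java class path accepts; a pattern such as *.jar is not expanded.

/home/hadoop/workspace/hdfstest/bin/ is the directory that holds my compiled classes.

I develop in Java with Eclipse. Because Eclipse compiles automatically, once the code is written I can run it directly from the command line with hadoop CLASSNAME: hadoop FileSystemDoubleCat hdfs://Hadoop:8020/xstartup

You can also package the project as a runnable jar (bundling all the dependency jars) and run it with hadoop jar followed by the jar name, the class, and the arguments. Building a jar every time is very inconvenient for testing, though.

This form mainly exists to make testing easier for developers.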

Part 2: hdfs commands

Typing hdfs on the command line with no arguments prints the built-in help:

dfs                  run a filesystem command on the file systems supported in Hadoop.
classpath            prints the classpath
namenode -format     format the DFS filesystem
secondarynamenode    run the DFS secondary namenode
namenode             run the DFS namenode
journalnode          run the DFS journalnode
zkfc                 run the ZK Failover Controller daemon
datanode             run a DFS datanode
debug                run a Debug Admin to execute HDFS debug commands
dfsadmin             run a DFS admin client
haadmin              run a DFS HA admin client
fsck                 run a DFS filesystem checking utility
balancer             run a cluster balancing utility
jmxget               get JMX exported values from NameNode or DataNode.
mover                run a utility to move block replicas across
                     storage types
oiv                  apply the offline fsimage viewer to an fsimage
oiv_legacy           apply the offline fsimage viewer to an legacy fsimage
oev                  apply the offline edits viewer to an edits file
fetchdt              fetch a delegation token from the NameNode
getconf              get config values from configuration
groups               get the groups which users belong to
snapshotDiff         diff two snapshots of a directory or diff the
                     current directory contents with a snapshot
lsSnapshottableDir   list all snapshottable dirs owned by the current user
                     Use -help to see options
portmap              run a portmap service
nfs3                 run an NFS version 3 gateway
cacheadmin           configure the HDFS cache
crypto               configure HDFS encryption zones
storagepolicies      list/get/set block storage policies
version              print the version

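The official documentation is also available at http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html

There is too much to read in one go, so the table below lists each command with a brief description, and a few are covered in more detail.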

Command	Syntax	Summary
classpath	`hdfs classpath [--glob \|--jar <path> \|-h \|--help]`	Prints the class path needed to get the Hadoop jar and the required libraries
dfs	`hdfs dfs [COMMAND [COMMAND_OPTIONS]]`	Equivalent to the hadoop fs command
fetchdt	`hdfs fetchdt <opts> <token_file_path>`	Fetches a delegation token from the NameNode
fsck	hdfs fsck <path> [-list-corruptfileblocks \| [-move \| -delete \| -openforwrite] [-files [-blocks [-locations \| -racks \| -replicaDetails]]] [-includeSnapshots] [-storagepolicies] [-blockId <blk_Id>]	Runs the HDFS file system check. Administrators should run this command regularly.
getconf	hdfs getconf -namenodes hdfs getconf -secondaryNameNodes hdfs getconf -backupNodes hdfs getconf -includeFile hdfs getconf -excludeFile hdfs getconf -nnRpcAddresses hdfs getconf -confKey [key]	Gets configuration information
groups	`hdfs groups [username ...]`	Gets the groups a user belongs to
lsSnapshottableDir	`hdfs lsSnapshottableDir [-help]`	Lists the snapshottable directories
jmxget	`hdfs jmxget [-localVM ConnectorURL \| -port port \| -server mbeanserver \| -service service]`	Dumps JMX information from a given service
oev	`hdfs oev [OPTIONS] -i INPUT_FILE -o OUTPUT_FILE` See http://lxw1234.com/archives/2015/08/442.htm	Offline edits viewer
oiv	`hdfs oiv [OPTIONS] -i INPUT_FILE` See http://lxw1234.com/archives/2015/08/440.htm	Offline image viewer
snapshotDiff	`hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot>` See http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html#Get_Snapshots_Difference_Report	Reports the differences between two snapshots
version	`hdfs version`	Prints the version
balancer	hdfs balancer [-threshold <threshold>] [-policy <policy>] [-exclude [-f <hosts-file> \| <comma-separated list of hosts>]] [-include [-f <hosts-file> \| <comma-separated list of hosts>]] [-source [-f <hosts-file> \| <comma-separated list of hosts>]] [-blockpools <comma-separated list of blockpool ids>] [-idleiterations <idleiterations>] For details see http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Balancer	Runs the cluster balancer. A very important command: data has to be rebalanced across DataNodes for various reasons, for example after new nodes are added.
cacheadmin	`hdfs cacheadmin -addDirective -path <path> -pool <pool-name> [-force] [-replication <replication>] [-ttl <time-to-live>]` For details see http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html#cacheadmin_command-line_interface	Cache management. A very important command: the official documentation devotes a long article to it.
datanode	`hdfs datanode [-regular \| -rollback \| -rollingupgrade rollback]`	DataNode management: starts a DataNode, and rolls back during a rolling upgrade
dfsadmin	hdfs dfsadmin [GENERIC_OPTIONS] [-report [-live] [-dead] [-decommissioning]] [-safemode enter \| leave \| get \| wait \| forceExit] [-saveNamespace] [-rollEdits] [-restoreFailedStorage true \|false \|check] [-refreshNodes] [-setQuota <quota> <dirname>...<dirname>] [-clrQuota <dirname>...<dirname>] [-setSpaceQuota <quota> [-storageType <storagetype>] <dirname>...<dirname>] [-clrSpaceQuota [-storageType <storagetype>] <dirname>...<dirname>] [-finalizeUpgrade] [-rollingUpgrade [<query> \|<prepare> \|<finalize>]] [-metasave filename] [-refreshServiceAcl] [-refreshUserToGroupsMappings] [-refreshSuperUserGroupsConfiguration] [-refreshCallQueue] [-refresh <host:ipc_port> <key> [arg1..argn]] [-reconfig <datanode \|...> <host:ipc_port> <start \|status>] [-printTopology] [-refreshNamenodes datanodehost:port] [-deleteBlockPool datanode-host:port blockpoolId [force]] [-setBalancerBandwidth <bandwidth in bytes per second>] [-getBalancerBandwidth <datanode_host:ipc_port>] [-allowSnapshot <snapshotDir>] [-disallowSnapshot <snapshotDir>] [-fetchImage <local directory>] [-shutdownDatanode <datanode_host:ipc_port> [upgrade]] [-getDatanodeInfo <datanode_host:ipc_port>] [-evictWriters <datanode_host:ipc_port>] [-triggerBlockReport [-incremental] <datanode_host:ipc_port>] [-help [cmd]]	Core file system administration command; critically important
haadmin	hdfs haadmin -checkHealth <serviceId> hdfs haadmin -failover [--forcefence] [--forceactive] <serviceId> <serviceId> hdfs haadmin -getServiceState <serviceId> hdfs haadmin -help <command> hdfs haadmin -transitionToActive <serviceId> [--forceactive] hdfs haadmin -transitionToStandby <serviceId>	Core high-availability administration command; critically important
journalnode	`hdfs journalnode` See http://blog.csdn.net/kiwi_kid/article/details/53514314 http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#Administrative_commands http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html http://blog.csdn.net/dr_guo/article/details/50975851 (reference for building an HA cluster)	Runs a JournalNode, the service that synchronizes edits between NameNodes
mover	`hdfs mover [-p <files/dirs> \| -f <local file name>]` See http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html#Mover_-_A_New_Data_Migration_Tool	Runs the data migration tool used for archival storage. Like the balancer, it is run periodically to move block replicas to where they belong.
namenode	hdfs namenode [-backup] \| [-checkpoint] \| [-format [-clusterid cid ] [-force] [-nonInteractive] ] \| [-upgrade [-clusterid cid] [-renameReserved<k-v pairs>] ] \| [-upgradeOnly [-clusterid cid] [-renameReserved<k-v pairs>] ] \| [-rollback] \| [-rollingUpgrade <rollback \|started> ] \| [-finalize] \| [-importCheckpoint] \| [-initializeSharedEdits] \| [-bootstrapStandby [-force] [-nonInteractive] [-skipSharedEditsCheck] ] \| [-recover [-force] ] \| [-metadataVersion ]	NameNode management (core command, critically important): backup, format, upgrade, rollback, recovery, and other critical operations.
nfs3	hdfs nfs3 See http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html#Start_and_stop_NFS_gateway_service	Starts an NFS3 gateway, so that HDFS can be browsed like an operating system file system. This is sometimes a more convenient way to work.
portmap	hdfs portmap See http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html#Start_and_stop_NFS_gateway_service	Used together with the NFS gateway
secondarynamenode	`hdfs secondarynamenode [-checkpoint [force]] \| [-format] \| [-geteditsize]` See http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Secondary_NameNode	Runs the Secondary NameNode
storagepolicies	`hdfs storagepolicies` See http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html	Archival storage policy management. Useful in some environments. Perhaps one day SSDs will stop being a special case and the only distinction left will be memory versus disk.
zkfc	`hdfs zkfc [-formatZK [-force] [-nonInteractive]]` See http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#Administrative_commands	Runs the ZooKeeper Failover Controller. Together with the JournalNode, it is an important part of high availability.
debug	`hdfs debug verifyMeta -meta <metadata-file> [-block <block-file>]`	Verifies HDFS metadata and block files.
	`hdfs debug computeMeta -block <block-file> -out <output-metadata-file>` Use with caution; the official warning reads: Use at your own risk! If the block file is corrupt and you overwrite it’s meta file, it will show up as ‘good’ in HDFS, but you can’t read the data. Only use as a last measure, and when you are 100% certain the block file is good.	Computes the metadata from a block file
	`hdfs debug recoverLease -path <path> [-retries <num-retries>]`	Recovers the lease on the specified path
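A few of the administrative commands in the table are commonly run together as a quick cluster check. The sketch below strings three of them into one function; the function name hdfs_health_check and the DRY_RUN switch are assumptions of this example, and by default the commands are only printed. On a real cluster they usually have to be run as the HDFS superuser.

```shell
# Dry-run helper (an assumption of this sketch): prints the command
# unless DRY_RUN=0 is set, in which case the command is executed.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# hdfs_health_check [PATH]   (PATH defaults to /)
hdfs_health_check() {
  # Is the NameNode currently in safe mode?
  run hdfs dfsadmin -safemode get
  # Capacity summary plus the state of each DataNode.
  run hdfs dfsadmin -report
  # File system check for the given path.
  run hdfs fsck "${1:-/}"
}

hdfs_health_check /
```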

Part 3: yarn commands

For details see http://hadoop.apache.org/docs/r2.8.0/hadoop-yarn/hadoop-yarn-site/YarnCommands.html

The table below gives an overview of the commands.

yarn command overview
Command	Syntax and overview	Notes
application	`yarn application [options]`	Prints application reports or kills applications
applicationattempt	`yarn applicationattempt [options]`	Prints application attempt reports
classpath	`yarn classpath [--glob \|--jar <path> \|-h \|--help]`	Prints the class path needed to get the Hadoop jar and the required libraries
container	`yarn container [options]`	Prints container reports
jar	`yarn jar <jar> [mainClass] args...`	Runs a jar through YARN; the jar is expected to bundle YARN code
logs	`yarn logs -applicationId <application ID> [options]`	Dumps the container logs
node	`yarn node [options]`	Prints node reports
queue	`yarn queue [options]`	Prints queue information
version	`yarn version`	Prints the version
daemonlog	yarn daemonlog -getlevel <host:httpport> <classname> yarn daemonlog -setlevel <host:httpport> <classname> <level> For example: bin/yarn daemonlog -setlevel 127.0.0.1:8088 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl DEBUG	Gets or sets the log level of a class
nodemanager	`yarn nodemanager`	Starts the YARN NodeManager
proxyserver	`yarn proxyserver`	Starts the web proxy server
resourcemanager	`yarn resourcemanager [-format-state-store]`	Starts the YARN ResourceManager
rmadmin	Usage: yarn rmadmin -refreshQueues -refreshNodes [-g [timeout in seconds]] -refreshNodesResources -refreshSuperUserGroupsConfiguration -refreshUserToGroupsMappings -refreshAdminAcls -refreshServiceAcl -getGroups [username] -addToClusterNodeLabels <"label1(exclusive=true),label2(exclusive=false),label3"> -removeFromClusterNodeLabels <label1,label2,label3> (label splitted by ",") -replaceLabelsOnNode <"node1[:port]=label1,label2 node2[:port]=label1,label2"> [-failOnUnknownNodes] -directlyAccessNodeLabelStore -refreshClusterMaxPriority -updateNodeResource [NodeID] [MemSize] [vCores] ([OvercommitTimeout]) -transitionToActive [--forceactive] <serviceId> -transitionToStandby <serviceId> -failover [--forcefence] [--forceactive] <serviceId> <serviceId> -getServiceState <serviceId> -checkHealth <serviceId> -help [cmd]	Administers the ResourceManager
scmadmin	`yarn scmadmin [options]` yarn scmadmin -runCleanerTask	Runs the shared cache manager admin client
sharedcachemanager	`yarn sharedcachemanager`	Starts the shared cache manager
timelineserver	`yarn timelineserver`	Starts the timeline server
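To follow up on a job such as the DistCp run in Part 1, the application and logs commands are the usual starting point. The sketch below reuses the application ID from that run; the function names and the DRY_RUN switch are assumptions of this example, and by default the commands are only printed. yarn logs generally returns output only when log aggregation is enabled on the cluster.

```shell
# Dry-run helper (an assumption of this sketch): prints the command
# unless DRY_RUN=0 is set, in which case the command is executed.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# app_status APP_ID: prints the report for one application
app_status() {
  run yarn application -status "$1"
}

# app_logs APP_ID: dumps the container logs of one application
app_logs() {
  run yarn logs -applicationId "$1"
}

# List applications, then inspect the DistCp job from Part 1.
run yarn application -list
app_status application_1496800112089_0001
app_logs application_1496800112089_0001
```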

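Part 4: Summary

1. There are many important commands.

2. Getting to know all of them takes a lot of time and requires a properly set-up environment.

3. Do not put too many tables into a blog post, or you will regret it.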