Running the wordcount example on Hadoop, plus basic HDFS operations

Source: http://www.cnblogs.com/tijun/archive/2017/09/18/7544228.html

This post verifies a Hadoop cluster setup using the example resources bundled with the Hadoop installation package, and briefly introduces a few basic HDFS operations.


1. Check the Hadoop version

[hadoop@ltt1 sbin]$ hadoop version
Hadoop 2.6.0-cdh5.12.0
Subversion http://github.com/cloudera/hadoop -r dba647c5a8bc5e09b572d76a8d29481c78d1a0dd
Compiled by jenkins on 2017-06-29T11:33Z
Compiled with protoc 2.5.0
From source with checksum 7c45ae7a4592ce5af86bc4598c5b4
This command was run using /home/hadoop/hadoop260/share/hadoop/common/hadoop-common-2.6.0-cdh5.12.0.jar
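As a small aside (not part of the original post), the version string in this output is easy to pick out programmatically, which is handy in setup scripts. The sample text below is copied from the session above:

```python
import re

# Sample output of `hadoop version`, copied from the session above.
version_output = """\
Hadoop 2.6.0-cdh5.12.0
Subversion http://github.com/cloudera/hadoop -r dba647c5a8bc5e09b572d76a8d29481c78d1a0dd
Compiled by jenkins on 2017-06-29T11:33Z
"""

# The version number follows the word "Hadoop" on the first line.
match = re.match(r"Hadoop\s+(\S+)", version_output)
version = match.group(1)
print(version)  # 2.6.0-cdh5.12.0
```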

2. The example jar that ships with Hadoop can be used for some quick functional tests.

Original post: 提君博客

Running hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar with no arguments lists the MapReduce programs it supports:

[hadoop@ltt1 sbin]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.

3. Create a directory on HDFS

hadoop fs -mkdir /input

4. List the HDFS root directory

[hadoop@ltt1 ~]$ hadoop fs -ls /
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2017-09-17 08:11 /input
drwx------ - hadoop supergroup 0 2017-09-17 08:07 /tmp

5. Upload local files to HDFS

hadoop fs -put $HADOOP_HOME/*.txt /input

6. List the files under /input on HDFS

[hadoop@ltt1 ~]$ hadoop fs -ls /input
Found 3 items
-rw-r--r--   2 hadoop supergroup      85063 2017-09-17 08:15 /input/LICENSE.txt
-rw-r--r--   2 hadoop supergroup      14978 2017-09-17 08:15 /input/NOTICE.txt
-rw-r--r--   2 hadoop supergroup       1366 2017-09-17 08:15 /input/README.txt
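The three file sizes in this listing add up to exactly the `Bytes Read=101407` reported by the File Input Format counter in the job output later in this post, which is a quick way to confirm that wordcount read all three files in full:

```python
# File sizes taken from the `hadoop fs -ls /input` listing above.
input_sizes = {
    "LICENSE.txt": 85063,
    "NOTICE.txt": 14978,
    "README.txt": 1366,
}

total_bytes = sum(input_sizes.values())
print(total_bytes)  # 101407, matching the File Input Format counter
```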

7. A quick wordcount test.


[hadoop@ltt1 ~]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar wordcount /input /output
17/09/17 08:19:12 INFO input.FileInputFormat: Total input paths to process : 3
17/09/17 08:19:13 INFO mapreduce.JobSubmitter: number of splits:3
17/09/17 08:19:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1505605169997_0002
17/09/17 08:19:14 INFO impl.YarnClientImpl: Submitted application application_1505605169997_0002
17/09/17 08:19:14 INFO mapreduce.Job: The url to track the job: http://ltt1.bg.cn:9180/proxy/application_1505605169997_0002/
17/09/17 08:19:14 INFO mapreduce.Job: Running job: job_1505605169997_0002
17/09/17 08:19:27 INFO mapreduce.Job: Job job_1505605169997_0002 running in uber mode : false
17/09/17 08:19:27 INFO mapreduce.Job:  map 0% reduce 0%
17/09/17 08:19:39 INFO mapreduce.Job:  map 33% reduce 0%
17/09/17 08:19:48 INFO mapreduce.Job:  map 100% reduce 0%
17/09/17 08:19:50 INFO mapreduce.Job:  map 100% reduce 100%
17/09/17 08:19:50 INFO mapreduce.Job: Job job_1505605169997_0002 completed successfully
17/09/17 08:19:50 INFO mapreduce.Job: Counters: 50
	File System Counters
		FILE: Number of bytes read=42705
		FILE: Number of bytes written=588235
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=101699
		HDFS: Number of bytes written=30167
		HDFS: Number of read operations=12
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=3
		Launched reduce tasks=1
		Data-local map tasks=2
		Rack-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=47617
		Total time spent by all reduces in occupied slots (ms)=8244
		Total time spent by all map tasks (ms)=47617
		Total time spent by all reduce tasks (ms)=8244
		Total vcore-milliseconds taken by all map tasks=47617
		Total vcore-milliseconds taken by all reduce tasks=8244
		Total megabyte-milliseconds taken by all map tasks=48759808
		Total megabyte-milliseconds taken by all reduce tasks=8441856
	Map-Reduce Framework
		Map input records=2035
		Map output records=14239
		Map output bytes=155828
		Map output materialized bytes=42717
		Input split bytes=292
		Combine input records=14239
		Combine output records=2653
		Reduce input groups=2402
		Reduce shuffle bytes=42717
		Reduce input records=2653
		Reduce output records=2402
		Spilled Records=5306
		Shuffled Maps =3
		Failed Shuffles=0
		Merged Map outputs=3
		GC time elapsed (ms)=881
		CPU time spent (ms)=22320
		Physical memory (bytes) snapshot=690192384
		Virtual memory (bytes) snapshot=10862809088
		Total committed heap usage (bytes)=380243968
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=101407
	File Output Format Counters
		Bytes Written=30167
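A few of these counters relate to one another in ways worth knowing. The combiner collapses the 14239 raw map output records down to 2653, the single reducer receives exactly those 2653 records, and it emits one output record per distinct word (the 2402 reduce input groups). In this particular run every record spilled once on the map side and once on the reduce side, so Spilled Records is twice the combine output. A small sanity check over the reported numbers:

```python
# Counter values copied from the job log above.
counters = {
    "Map output records": 14239,
    "Combine input records": 14239,
    "Combine output records": 2653,
    "Reduce input records": 2653,
    "Reduce output records": 2402,
    "Reduce input groups": 2402,
    "Spilled Records": 5306,
}

# The combiner consumes every map output record...
assert counters["Combine input records"] == counters["Map output records"]
# ...and its output is exactly what the reducer receives.
assert counters["Reduce input records"] == counters["Combine output records"]
# One output line per distinct word: output records == input groups.
assert counters["Reduce output records"] == counters["Reduce input groups"]
# Each record spilled once map-side and once reduce-side in this run.
assert counters["Spilled Records"] == 2 * counters["Combine output records"]
print("counters are consistent")
```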

8. View the wordcount results (the full output is long; only a portion is shown)

[hadoop@ltt1 ~]$ hadoop fs -cat /output/*
worldwide,    4
would    1
writing    2
writing,    4
written    19
xmlenc    1
year    1
you    12
your    5
zlib    1
 252.227-7014(a)(1))    1
§    1
“AS    1
“Contributor    1
“Contributor”    1
“Covered    1
“Executable”    1
“Initial    1
“Larger    1
“Licensable”    1
“License”    1
“Modifications”    1
“Original    1
“Participant”)    1
“Patent    1
“Source    1
“Your”)    1
“You”    2
“commercial    3
“control”    1
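Note that `writing` and `writing,` are counted as separate words: the example's mapper splits each line with Java's `StringTokenizer`, which breaks only on whitespace and leaves punctuation attached to the token. A minimal Python sketch of the same counting behavior (an illustration, not the actual Java source):

```python
from collections import Counter

def wordcount(lines):
    """Count words using whitespace-only tokenization,
    mirroring StringTokenizer's default behavior."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

counts = wordcount(["to be, or not to be,"])
print(counts["be,"])  # 2 -- punctuation stays attached to the token
print(counts["be"])   # 0
```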


That wraps it up: through a small wordcount example, we have walked through the HDFS basics of creating a directory, uploading files, listing directories, and running the wordcount job.
