Big Data (1): MapReduce Programming on the sogou.500w.utf8 Dataset


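1. Use Eclipse to package and run the WordCount example, counting the occurrences of each word in the works of Shakespeare (file shakespeare.txt).

① Modify the main function in WordCount.java as follows: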

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Job.getInstance replaces the deprecated Job constructor (Hadoop 2.x and later)
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Set the input path
    FileInputFormat.addInputPath(job, new Path("/input"));
    // Set the output path for the MapReduce result (the directory must not exist yet)
    FileOutputFormat.setOutputPath(job, new Path("/output/wordcount"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}

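② Export the WordCount jar:
    Export -> JAR file -> Next -> Next -> select WordCount as the Main class -> Finish.
③ Use scp to copy wc.jar to the node1 machine. Create the input directory with hadoop fs -mkdir /input, upload shakespeare.txt to HDFS (for example with hadoop fs -put shakespeare.txt /input), then run the jar: hadoop jar wc.jar
④ Use hadoop fs -cat /output/wordcount/part-r-00000 | head -n 30 to view the first 30 lines of the output.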
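
The TokenizerMapper and IntSumReducer classes referenced in main are not shown in this post; they are presumably the ones from the stock Hadoop WordCount example. As a rough local sketch of what the job computes (an assumption about those classes, not their actual code), the map step splits each line on whitespace and the reduce step sums a 1 for every token. The class name WordCountSketch and the sample lines are made up for illustration.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Local sketch only: mimics tokenize (map) and sum (reduce) in memory, without Hadoop.
class WordCountSketch {

    // Splits each line on whitespace (StringTokenizer's default delimiters)
    // and adds 1 per token. TreeMap keeps the words sorted by key.
    static Map<String, Integer> countWords(String... lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                counts.merge(tokens.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not taken from shakespeare.txt
        Map<String, Integer> counts =
                countWords("to be or not to be", "to thine own self be true");
        counts.forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Because tokens are split on whitespace only, punctuation stays attached to words and case is preserved, so "be" and "be," would be counted separately.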

2. For the sogou.500w.utf8 file, complete the following programming tasks:

(1) Compute the total length of the keywords searched by each user

MapReduce program:

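import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class sougou3 {

    // Emits (uid, keyword length) for every search record.
    public static class Sougou3Map extends
            Mapper<Object, Text, Text, IntWritable> {
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Columns are tab-separated: time, uid, keyword, ...
            String[] vals = value.toString().split("\t");
            if (vals.length < 3) {
                return; // skip malformed lines
            }
            String uid = vals[1];
            String search = vals[2];
            context.write(new Text(uid), new IntWritable(search.length()));
        }
    }

    // Sums the keyword lengths of each uid, giving the total the task asks for.
    // The driver (not shown) must declare IntWritable as the output value class.
    public static class Sougou3Reduce extends
            Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable value : values) {
                total += value.get();
            }
            // TextOutputFormat already puts a tab between key and value
            context.write(key, new IntWritable(total));
        }
    }
}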
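
The per-user total can be checked locally before submitting a job. The following is a minimal sketch, assuming the tab-separated column order used above (time, uid, keyword, ...); the class name KeywordLengthSketch and the sample records are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Local sketch only: groups records by uid and sums keyword lengths in memory.
class KeywordLengthSketch {

    static Map<String, Integer> totalLengths(String... lines) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (String line : lines) {
            String[] vals = line.split("\t");
            if (vals.length < 3) {
                continue; // skip lines without a keyword column
            }
            // String.length() counts UTF-16 code units, so a typical
            // Chinese character contributes 1 to the total.
            totals.merge(vals[1], vals[2].length(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical records in the assumed column layout
        Map<String, Integer> totals = totalLengths(
                "20111230010101\tu1\thadoop\t1\t1\thttp://example.com",
                "20111230010202\tu1\tmapreduce",
                "20111230010303\tu2\thdfs");
        totals.forEach((uid, total) -> System.out.println(uid + "\t" + total));
    }
}
```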


(2) Which UIDs searched between 1:00 and 2:00 on December 30, 2011?

MapReduce program:

import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class sougou1 {

    public static class Sougou1Map extends
            Mapper<Object, Text, Text, Text> {

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] vals = line.split("\t");
            // Skip malformed lines; the time column is expected as yyyyMMddHHmmss
            if (vals.length < 2 || vals[0].length() < 14) {
                return;
            }
            String time = vals[0];
            String uid = vals[1];
            // Reformat to e.g. 2008-07-10 19:20:00
            String formatTime = time.substring(0,4)+"-"+time.substring(4,6)+"-"+time.substring(6,8)+" "
                    +time.substring(8,10)+":"+time.substring(10,12)+":"+time.substring(12,14);
            SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            try {
                Date date = sdf.parse(formatTime);
                Date date1 = sdf.parse("2011-12-30 01:00:00");
                Date date2 = sdf.parse("2011-12-30 02:00:00");
                // Keep records in [01:00:00, 02:00:00), so 01:00:00 itself is included
                if (date.getTime() >= date1.getTime() && date.getTime() < date2.getTime()){
                    context.write(new Text(uid), new Text(formatTime));
                }
            } catch (ParseException e) {
                e.printStackTrace();
            }
        }
    }

    // Joins all matching timestamps of one uid with "|".
    public static class Sougou1Reduce extends
            Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterable<Text> values,
                Context context) throws IOException, InterruptedException {
            StringBuilder result = new StringBuilder();
            for (Text value : values) {
                result.append(value.toString()).append("|");
            }
            context.write(key, new Text(result.toString()));
        }
    }
}

Output:
The user ID is on the left and that user's search times are on the right, separated by "|".
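
The time-window test is the part of this job most prone to boundary mistakes, so it helps to try it in isolation. Below is a minimal sketch using java.time instead of SimpleDateFormat; this is a substitute technique, not the approach used in the mapper above. It assumes the time column is a 14-digit yyyyMMddHHmmss string and treats the window as half-open, [01:00:00, 02:00:00). The class name TimeWindowSketch is hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Local sketch only: checks whether a raw timestamp falls in the target hour.
class TimeWindowSketch {

    // Assumed layout of the time column: yyyyMMddHHmmss
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");
    private static final LocalDateTime START = LocalDateTime.of(2011, 12, 30, 1, 0, 0);
    private static final LocalDateTime END = LocalDateTime.of(2011, 12, 30, 2, 0, 0);

    // True when START <= timestamp < END; unparseable input counts as outside.
    static boolean inWindow(String timestamp) {
        try {
            LocalDateTime t = LocalDateTime.parse(timestamp, FORMAT);
            return !t.isBefore(START) && t.isBefore(END);
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(inWindow("20111230010000")); // start of the window
        System.out.println(inWindow("20111230020000")); // end of the window (excluded)
    }
}
```

DateTimeFormatter is immutable and thread-safe, so it can be held in a static field, whereas SimpleDateFormat is not thread-safe and is usually created per use or per task.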

(3) For each UID that searched for '仙劍奇俠', count how many times it searched for that keyword.

MapReduce program:

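import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class sougou2 {

    // Emits (uid, 1) for every record whose keyword is exactly the target string.
    public static class Sougou2Map extends
            Mapper<Object, Text, Text, IntWritable> {
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] vals = line.split("\t");
            if (vals.length < 3) {
                return; // skip malformed lines
            }
            String uid = vals[1];
            String search = vals[2];
            if (search.equals("仙劍奇俠")){
                context.write(new Text(uid), new IntWritable(1));
            }
        }
    }

    // Sums the 1s of each uid to get its search count.
    public static class Sougou2Reduce extends
            Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int result = 0;
            for (IntWritable value : values) {
                result += value.get();
            }
            // TextOutputFormat already puts a tab between key and value
            context.write(key, new IntWritable(result));
        }
    }
}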

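Output:
The user with UID 6856e6e003a05cc912bfe13ebcea8a04 searched for "仙劍奇俠" once in total.

3. Use MapReduce to sort the following data stored in a file (one number per line, since the mapper parses each line as a single integer): 78 11 56 87 25 63 19 22 55

MapReduce program: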

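import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class Sort {
    // map converts each input value to an IntWritable and emits it as the output key
    public static class Map extends Mapper<Object,Text,IntWritable,NullWritable>{
        private final IntWritable data = new IntWritable();

        public void map(Object key,Text value,Context context)
                throws IOException,InterruptedException{
            String line = value.toString().trim();
            if (line.isEmpty()) {
                return; // skip blank lines
            }
            data.set(Integer.parseInt(line));
            context.write(data, NullWritable.get());
        }
    }

    // reduce copies the input key to the output key and writes it once per
    // element of the value list, so duplicate numbers are preserved.
    // Keys arrive at the reducer already sorted by the shuffle; the output is
    // globally sorted only when the job runs with a single reducer.
    public static class Reduce extends
            Reducer<IntWritable,NullWritable,IntWritable,NullWritable>{

        public void reduce(IntWritable key,Iterable<NullWritable> values,Context context)
                throws IOException,InterruptedException{
            for(NullWritable val:values){
                context.write(key, NullWritable.get());
            }
        }
    }
}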

Output (ascending order, one number per line): 11 19 22 25 55 56 63 78 87
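
The job itself contains no sorting code: it relies on the framework ordering keys between map and reduce. As a rough local analogy (a sketch, not how Hadoop implements the shuffle), a TreeMap can stand in for the sorted, grouped keys, with the stored count playing the role of the value list length. The class name SortSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local sketch only: emulates "group by key in sorted order, emit once per value".
class SortSketch {

    static List<Integer> sortLikeShuffle(String... lines) {
        // Key = parsed number, value = how many times it occurred
        Map<Integer, Integer> grouped = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            grouped.merge(Integer.parseInt(trimmed), 1, Integer::sum);
        }
        // "Reduce": write each key once per occurrence, in key order
        List<Integer> output = new ArrayList<>();
        for (Map.Entry<Integer, Integer> entry : grouped.entrySet()) {
            for (int i = 0; i < entry.getValue(); i++) {
                output.add(entry.getKey());
            }
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(sortLikeShuffle("78", "11", "56", "87", "25", "63", "19", "22", "55"));
    }
}
```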

4. The student score file (TXT, fields separated by TAB) has the following content. Use MapReduce to compute each student's average score.

李平 87 89 98 75
張三 66 78 69 70
李四 96 82 78 90
王五 82 77 74 86
趙六 88 72 81 76

MapReduce program:

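import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class Score {

    // Emits (name, score) for every score on the line, so all four
    // scores of a student are counted.
    public static class ScoreMap extends
            Mapper<Object, Text, Text, IntWritable> {

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Fields are TAB-separated; \s+ also tolerates spaces
            String[] vals = value.toString().trim().split("\\s+");
            if (vals.length < 2) {
                return; // skip lines without any score
            }
            Text name = new Text(vals[0]);
            for (int i = 1; i < vals.length; i++) {
                context.write(name, new IntWritable(Integer.parseInt(vals[i])));
            }
        }
    }

    // Averages all scores of one student as a double, avoiding the
    // truncation of integer division.
    // The driver (not shown) must set the map output value class to
    // IntWritable, because it differs from the final output value class.
    public static class ScoreReduce extends
            Reducer<Text, IntWritable, Text, DoubleWritable> {
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            int count = 0;
            for (IntWritable value : values) {
                sum += value.get();
                count++;
            }
            context.write(key, new DoubleWritable((double) sum / count));
        }
    }
}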
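
The averaging arithmetic can be verified by hand or with a few lines of plain Java. This is a minimal sketch, assuming each record is a name followed by one or more integer scores separated by whitespace; the class name AverageSketch is hypothetical, and the sample record is the first row of the data above.

```java
// Local sketch only: averages every score on one record, without Hadoop.
class AverageSketch {

    // Assumes the line holds a name followed by at least one integer score.
    static double average(String line) {
        String[] vals = line.trim().split("\\s+");
        int sum = 0;
        for (int i = 1; i < vals.length; i++) {
            sum += Integer.parseInt(vals[i]);
        }
        // Cast before dividing so the fractional part is kept
        return (double) sum / (vals.length - 1);
    }

    public static void main(String[] args) {
        // (87 + 89 + 98 + 75) / 4 = 87.25
        System.out.println(average("李平\t87\t89\t98\t75"));
    }
}
```

Dividing as integers would give 87 for this record, and averaging only the first three scores would give 91, so both the cast and the loop over all columns matter.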
