Environment Setup

Source: https://www.cnblogs.com/qw77/p/18102996


Chapter 4: Hadoop Configuration Files and Parameters

Lab 1: Hadoop Fully Distributed Configuration

1.1 Objectives

After completing this lab, you should be able to:

  • Configure Hadoop in fully distributed mode
  • Install Hadoop in fully distributed mode
  • Explain the meaning of the parameters in the Hadoop configuration files

1.2 Requirements

  • Be familiar with the fully distributed Hadoop installation
  • Understand the purpose of the Hadoop configuration files

1.3 Procedure

1.3.1 Task 1: Install Hadoop on the Master Node

1.3.1.1 Step 1: Extract the jdk-8u152-linux-x64.tar.gz and hadoop-2.7.1.tar.gz packages to the /usr/local/src directory
[root@master ~]# tar zvxf jdk-8u152-linux-x64.tar.gz -C /usr/local/src/

[root@master ~]# tar zvxf hadoop-2.7.1.tar.gz -C /usr/local/src/
1.3.1.2 Step 2: Rename the hadoop-2.7.1 folder to hadoop
[root@master ~]# cd /usr/local/src/
[root@master src]# ls
hadoop-2.7.1  jdk1.8.0_152
[root@master src]# mv hadoop-2.7.1/ hadoop
[root@master src]# mv jdk1.8.0_152/ jdk
[root@master src]# ls
hadoop  jdk
1.3.1.3 Step 3: Configure the Hadoop environment variables

[root@master ~]# vi /etc/profile.d/hadoop.sh

Note: environment variables were already configured for the single-node Hadoop installation in Chapter 2. Delete that earlier configuration before adding the lines below.

# Add the following lines
export JAVA_HOME=/usr/local/src/jdk
export HADOOP_HOME=/usr/local/src/hadoop
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
1.3.1.4 Step 4: Apply the Hadoop environment variables
[root@master ~]# source /etc/profile.d/hadoop.sh 
[root@master ~]# echo $PATH
/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
1.3.1.5 Step 5: Edit the hadoop-env.sh configuration file
[root@master ~]# vi /usr/local/src/hadoop/etc/hadoop/hadoop-env.sh

# Add the following line
export JAVA_HOME=/usr/local/src/jdk

1.3.2 Task 2: Configure the hdfs-site.xml parameters

Run the following command to edit the hdfs-site.xml configuration file.

[root@master ~]# vi /usr/local/src/hadoop/etc/hadoop/hdfs-site.xml

# Add the following properties between the <configuration> and </configuration> tags
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/src/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/src/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>

Create the directories
[root@master ~]# mkdir -p /usr/local/src/hadoop/dfs/{name,data}

Hadoop's distributed filesystem, HDFS, normally stores data redundantly; the replication factor usually defaults to 3, meaning each piece of data is kept in three copies. Here the dfs.replication setting is changed so that HDFS keeps 2 replicas of each file.
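The repetitive <property> stanzas in these configuration files can also be generated from name/value pairs instead of typed by hand. A minimal sketch, where hprop is a hypothetical helper (not part of Hadoop):

```shell
# hprop is a hypothetical helper that prints one Hadoop <property> stanza.
hprop() {
  printf '    <property>\n        <name>%s</name>\n        <value>%s</value>\n    </property>\n' "$1" "$2"
}

# Emit the three hdfs-site.xml properties used above.
hprop dfs.namenode.name.dir file:/usr/local/src/hadoop/dfs/name
hprop dfs.datanode.data.dir file:/usr/local/src/hadoop/dfs/data
hprop dfs.replication 2
```

The output can be pasted between the <configuration> tags, which avoids mismatched tags from manual editing.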

1.3.3 Task 3: Configure the core-site.xml parameters

[root@master ~]# vi /usr/local/src/hadoop/etc/hadoop/core-site.xml

# Add the following properties between the <configuration> and </configuration> tags

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/src/hadoop/tmp</value>
    </property>
</configuration>

# After saving the configuration, create the directory
[root@master ~]# mkdir -p /usr/local/src/hadoop/tmp

If the hadoop.tmp.dir parameter is not set, the system falls back to the default temporary directory /tmp/hadoop-hadoop. That directory is deleted on every Linux reboot, so the Hadoop filesystem would have to be reformatted after each restart; otherwise Hadoop fails to run.
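Before the first format it can help to confirm that every directory named in these configs actually exists. A hedged sketch, where missing_dirs is an illustrative helper rather than a Hadoop tool:

```shell
# missing_dirs: print each argument that is not an existing directory,
# and return the number of missing entries (illustrative helper).
missing_dirs() {
  m=0
  for d in "$@"; do
    if [ ! -d "$d" ]; then
      echo "missing: $d"
      m=$((m+1))
    fi
  done
  return "$m"
}

# Directories created earlier in this chapter:
missing_dirs /usr/local/src/hadoop/tmp \
             /usr/local/src/hadoop/dfs/name \
             /usr/local/src/hadoop/dfs/data \
  || echo "create the directories listed above before formatting"
```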

1.3.4 Task 4: Configure mapred-site.xml

[root@master ~]# cd /usr/local/src/hadoop/etc/hadoop/
[root@master hadoop]# cp mapred-site.xml.template mapred-site.xml

# Add the following properties between the <configuration> and </configuration> tags

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
</configuration>

1.3.5 Task 5: Configure yarn-site.xml

[root@master hadoop]# vi /usr/local/src/hadoop/etc/hadoop/yarn-site.xml

# Add the following properties between the <configuration> and </configuration> tags

<configuration>
<!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>

1.3.6 Task 6: Other Hadoop Configuration

1.3.6.1 Step 1: Configure the masters file
# Edit the masters configuration file
[root@master ~]# vi /usr/local/src/hadoop/etc/hadoop/masters

# Add the following line
10.10.10.128
1.3.6.2 Step 2: Configure the slaves file
# Edit the slaves configuration file
[root@master ~]# vi /usr/local/src/hadoop/etc/hadoop/slaves

# Delete localhost, then add the following lines
10.10.10.129
10.10.10.130
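Forgetting to remove localhost from the slaves file is a common mistake, and it can be checked by a script. A small sketch, where check_workers is an illustrative helper, not part of Hadoop:

```shell
# check_workers: fail if the given file still contains a localhost line,
# or contains no entries at all (illustrative helper for masters/slaves files).
check_workers() {
  f="$1"
  if grep -qx 'localhost' "$f"; then
    echo "error: $f still lists localhost"
    return 1
  fi
  if ! grep -q '[^[:space:]]' "$f"; then
    echo "error: $f is empty"
    return 1
  fi
}
```

For example: check_workers /usr/local/src/hadoop/etc/hadoop/slaves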
1.3.6.3 Step 3: Create a user and change directory ownership
# Create the user
[root@master ~]# useradd hadoop 
[root@master ~]# echo 'hadoop' | passwd --stdin hadoop
Changing password for user hadoop.
passwd: all authentication tokens updated successfully.

# Change directory ownership
[root@master ~]# chown -R hadoop.hadoop /usr/local/src/
[root@master ~]# cd /usr/local/src/
[root@master src]# ll
total 0
drwxr-xr-x 11 hadoop hadoop 171 Mar 27 01:51 hadoop
drwxr-xr-x  8 hadoop hadoop 255 Sep 14  2017 jdk
1.3.6.4 Step 4: Configure passwordless SSH login from master to all slave nodes
[root@master ~]# ssh-keygen -t rsa

Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:Ibeslip4Bo9erREJP37u7qhlwaEeMOCg8DlJGSComhk root@master
The key's randomart image is:
+---[RSA 2048]----+
|B.oo |
|Oo.o |
|=o=.  . o|
|E.=.o  + o   |
|.* BS|
|* o =  o |
| * * o+  |
|o O *o   |
|.=.+==   |
+----[SHA256]-----+

[root@master ~]# ssh-copy-id root@slave1
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'slave1 (10.10.10.129)' can't be established.
ECDSA key fingerprint is SHA256:Z643OMlGh0yMEc5i85oZ7c21NHdkzSZD9hY9K39xzP4.
ECDSA key fingerprint is MD5:e0:ef:47:5f:ad:75:9a:44:08:bc:f2:10:8e:d6:53:4a.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@slave1's password: 
Number of key(s) added: 1
Now try logging into the machine, with:   "ssh 'root@slave1'"
and check to make sure that only the key(s) you wanted were added.

[root@master ~]# ssh-copy-id root@slave2
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'slave2 (10.10.10.130)' can't be established.
ECDSA key fingerprint is SHA256:Z643OMlGh0yMEc5i85oZ7c21NHdkzSZD9hY9K39xzP4.
ECDSA key fingerprint is MD5:e0:ef:47:5f:ad:75:9a:44:08:bc:f2:10:8e:d6:53:4a.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@slave2's password: 
Number of key(s) added: 1  
Now try logging into the machine, with:   "ssh 'root@slave2'"
and check to make sure that only the key(s) you wanted were added.
   
[root@master ~]# ssh slave1
Last login: Sun Mar 27 02:58:38 2022 from master
[root@slave1 ~]# exit
logout
Connection to slave1 closed.

[root@master ~]# ssh slave2
Last login: Sun Mar 27 00:26:12 2022 from 10.10.10.1
[root@slave2 ~]# exit
logout
Connection to slave2 closed.
1.3.6.5 Step 5: Sync all files under /usr/local/src/ to all slave nodes
[root@master ~]# scp -r /usr/local/src/* root@slave1:/usr/local/src/

[root@master ~]# scp -r /usr/local/src/* root@slave2:/usr/local/src/

[root@master ~]# scp /etc/profile.d/hadoop.sh root@slave1:/etc/profile.d/
hadoop.sh                                   100%  151    45.9KB/s   00:00 
   
[root@master ~]# scp /etc/profile.d/hadoop.sh root@slave2:/etc/profile.d/
hadoop.sh                                   100%  151    93.9KB/s   00:00    
1.3.6.6 Step 6: Run the following commands on all slave nodes
(1) On slave1

[root@slave1 ~]# useradd hadoop 
[root@slave1 ~]# echo 'hadoop' | passwd --stdin hadoop 
Changing password for user hadoop.
passwd: all authentication tokens updated successfully.

[root@slave1 ~]# chown -R hadoop.hadoop /usr/local/src/
[root@slave1 ~]# ll /usr/local/src/
total 0
drwxr-xr-x 11 hadoop hadoop 171 Mar 27 03:07 hadoop
drwxr-xr-x  8 hadoop hadoop 255 Mar 27 03:07 jdk

[root@slave1 ~]# source /etc/profile.d/hadoop.sh 

[root@slave1 ~]# echo $PATH
/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin

(2) On slave2

[root@slave2 ~]# useradd hadoop
[root@slave2 ~]# echo 'hadoop' | passwd --stdin hadoop
Changing password for user hadoop.
passwd: all authentication tokens updated successfully.

[root@slave2 ~]# chown -R hadoop.hadoop /usr/local/src/
[root@slave2 ~]# ll /usr/local/src/
total 0
drwxr-xr-x 11 hadoop hadoop 171 Mar 27 03:09 hadoop
drwxr-xr-x  8 hadoop hadoop 255 Mar 27 03:09 jdk

[root@slave2 ~]# source /etc/profile.d/hadoop.sh 

[root@slave2 ~]# echo $PATH
/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin

Chapter 5: Running the Hadoop Cluster

Lab 1: Running the Hadoop Cluster

1.1 Objectives

After completing this lab, you should be able to:

  • Check the running state of Hadoop
  • Format the Hadoop filesystem
  • Inspect the Hadoop Java processes
  • View the Hadoop HDFS report
  • Check the status of the Hadoop nodes
  • Stop the Hadoop processes

1.2 Requirements

  • Be familiar with checking the running state of Hadoop
  • Be familiar with stopping the Hadoop processes

1.3 Procedure

1.3.1 Task 1: Format the Hadoop Filesystem

1.3.1.1 Step 1: Format the NameNode

Formatting clears the data on the NameNode. Format HDFS only before its first start; do not format again on later starts, or the DataNode processes will go missing. Also, once HDFS has run, the Hadoop working directory (set to /usr/local/src/hadoop/tmp in this guide) contains data. If you do need to reformat, delete the data under the working directory first, or the format will fail.
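The cleanup described above can be captured in a small helper that empties the working directories before a re-format. This is a destructive sketch under the path layout of this guide; clean_hadoop_workdirs is hypothetical, and you would still run `hdfs namenode -format` afterwards:

```shell
# clean_hadoop_workdirs: empty and recreate the Hadoop working directories
# (tmp, dfs/name, dfs/data) so a re-format starts from a clean state.
# DESTRUCTIVE: all HDFS data under these paths is deleted.
clean_hadoop_workdirs() {
  base="${1:?usage: clean_hadoop_workdirs /usr/local/src/hadoop}"
  for d in tmp dfs/name dfs/data; do
    rm -rf "${base:?}/${d}"
    mkdir -p "${base}/${d}"
  done
}
```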

Run the following commands to format the NameNode:

[root@master ~]# su - hadoop 
Last login: Fri Apr  1 23:34:46 CST 2022 on pts/1

[hadoop@master ~]$ cd /usr/local/src/hadoop/
[hadoop@master hadoop]$ ./bin/hdfs namenode -format
22/04/02 01:22:42 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************

1.3.1.2 Step 2: Start the NameNode

[hadoop@master hadoop]$ hadoop-daemon.sh start namenode
namenode running as process 11868. Stop it first.

1.3.2 Task 2: View the Java Processes

After startup, use the jps command to check whether it succeeded. jps is a Java utility that lists the PIDs of all current Java processes.

[hadoop@master hadoop]$ jps
12122 Jps
11868 NameNode
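Scripts can check the jps output for a specific daemon instead of reading it by eye. A minimal sketch, where daemon_listed is an illustrative filter fed by jps on a real node:

```shell
# daemon_listed: read `jps` output on stdin and succeed only if the
# named daemon appears (illustrative helper).
daemon_listed() {
  awk '{print $2}' | grep -qx "$1"
}

# On a running master you would pipe real output:  jps | daemon_listed NameNode
printf '12122 Jps\n11868 NameNode\n' | daemon_listed NameNode && echo "NameNode is up"
```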

1.3.2.1 Step 1: Switch to the hadoop user

[hadoop@master ~]$ su - hadoop 
Password: 
Last login: Sat Apr  2 01:22:13 CST 2022 on pts/1
Last failed login: Sat Apr  2 04:47:08 CST 2022 on pts/1
There was 1 failed login attempt since the last successful login.

1.3.3 Task 3: View the HDFS Report

[hadoop@master ~]$ hdfs dfsadmin -report
Configured Capacity: 0 (0 B)
Present Capacity: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used: 0 (0 B)
DFS Used%: NaN%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------

1.3.3.1 Step 1: Generate SSH keys for the hadoop user

[hadoop@master ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa): 
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:nW/cVxmRp5Ht9TKGT61OmGbhQtkBdpHyS5prGhx24pI [email protected]
The key's randomart image is:
+---[RSA 2048]----+
|  o.oo +.|
| ...o o.=|
|   = o *+|
| .o.* * *|
|S.+= O =.|
|   = ++oB.+ .|
|  E +  =+o. .|
|   . .o.  .. |
|.o   |
+----[SHA256]-----+

[hadoop@master ~]$ ssh-copy-id slave1
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
The authenticity of host 'slave1 (10.10.10.129)' can't be established.
ECDSA key fingerprint is SHA256:BE2tM2BCeGBc6aGRKBTbMTh80VP9noFKzqDknL+0Jes.
ECDSA key fingerprint is MD5:a2:25:9c:bc:d0:df:fc:ec:44:4a:c0:10:26:f2:ef:c7.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@slave1's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'slave1'"
and check to make sure that only the key(s) you wanted were added.

[hadoop@master ~]$ ssh-copy-id slave2
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
The authenticity of host 'slave2 (10.10.10.130)' can't be established.
ECDSA key fingerprint is SHA256:BE2tM2BCeGBc6aGRKBTbMTh80VP9noFKzqDknL+0Jes.
ECDSA key fingerprint is MD5:a2:25:9c:bc:d0:df:fc:ec:44:4a:c0:10:26:f2:ef:c7.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@slave2's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'slave2'"
and check to make sure that only the key(s) you wanted were added.

[hadoop@master ~]$ ssh-copy-id master
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
The authenticity of host 'master (10.10.10.128)' can't be established.
ECDSA key fingerprint is SHA256:BE2tM2BCeGBc6aGRKBTbMTh80VP9noFKzqDknL+0Jes.
ECDSA key fingerprint is MD5:a2:25:9c:bc:d0:df:fc:ec:44:4a:c0:10:26:f2:ef:c7.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@master's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'master'"
and check to make sure that only the key(s) you wanted were added.

1.3.4 Task 4: Stop HDFS with stop-dfs.sh

[hadoop@master ~]$ stop-dfs.sh 
Stopping namenodes on [master]
master: stopping namenode
10.10.10.129: no datanode to stop
10.10.10.130: no datanode to stop
Stopping secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is SHA256:BE2tM2BCeGBc6aGRKBTbMTh80VP9noFKzqDknL+0Jes.
ECDSA key fingerprint is MD5:a2:25:9c:bc:d0:df:fc:ec:44:4a:c0:10:26:f2:ef:c7.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: no secondarynamenode to stop

1.3.4.1 Restart and verify

[hadoop@master ~]$ start-dfs.sh 
Starting namenodes on [master]
master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-master.example.com.out
10.10.10.129: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave1.out
10.10.10.130: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave2.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-secondarynamenode-master.example.com.out

[hadoop@master ~]$ start-yarn.sh 
starting yarn daemons
starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-resourcemanager-master.example.com.out
10.10.10.129: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave1.out
10.10.10.130: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave2.out

[hadoop@master ~]$ jps
12934 NameNode
13546 Jps
13131 SecondaryNameNode
13291 ResourceManager

If you see ResourceManager on the master and NodeManager on the slaves, the startup succeeded.
[hadoop@master ~]$ jps
12934 NameNode
13546 Jps
13131 SecondaryNameNode
13291 ResourceManager

[root@slave1 ~]# jps
11906 NodeManager
11797 DataNode
12037 Jps

[root@slave2 ~]# jps
12758 NodeManager
12648 DataNode
12889 Jps

[hadoop@master ~]$ hdfs dfs -mkdir /input
[hadoop@master ~]$ hdfs dfs -ls /
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2022-04-02 05:18 /input
[hadoop@master ~]$ mkdir ~/input
[hadoop@master ~]$ vim ~/input/data.txt
Hello World
Hello Hadoop
Hello Huasan

[hadoop@master ~]$ hdfs dfs -put ~/input/data.txt 
.bash_logout       .bashrc            .oracle_jre_usage/ .viminfo           
.bash_profile      input/             .ssh/              
[hadoop@master ~]$ hdfs dfs -put ~/input/data.txt /input
[hadoop@master ~]$ hdfs dfs -cat /input/data.txt
Hello World
Hello Hadoop
Hello Huasan
[hadoop@master ~]$ hadoop jar /usr/local/src/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /input/data.txt /output
22/04/02 05:31:20 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
22/04/02 05:31:21 INFO input.FileInputFormat: Total input paths to process : 1
22/04/02 05:31:21 INFO mapreduce.JobSubmitter: number of splits:1
22/04/02 05:31:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1648846845675_0001
22/04/02 05:31:22 INFO impl.YarnClientImpl: Submitted application application_1648846845675_0001
22/04/02 05:31:22 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1648846845675_0001/
22/04/02 05:31:22 INFO mapreduce.Job: Running job: job_1648846845675_0001
22/04/02 05:31:30 INFO mapreduce.Job: Job job_1648846845675_0001 running in uber mode : false
22/04/02 05:31:30 INFO mapreduce.Job:  map 0% reduce 0%
22/04/02 05:31:38 INFO mapreduce.Job:  map 100% reduce 0%
22/04/02 05:31:42 INFO mapreduce.Job:  map 100% reduce 100%
22/04/02 05:31:42 INFO mapreduce.Job: Job job_1648846845675_0001 completed successfully
22/04/02 05:31:42 INFO mapreduce.Job: Counters: 49
    File System Counters
            FILE: Number of bytes read=56
            FILE: Number of bytes written=230931
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=136
            HDFS: Number of bytes written=34
            HDFS: Number of read operations=6
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=2
    Job Counters 
            Launched map tasks=1
            Launched reduce tasks=1
            Data-local map tasks=1
            Total time spent by all maps in occupied slots (ms)=5501
            Total time spent by all reduces in occupied slots (ms)=1621
            Total time spent by all map tasks (ms)=5501
            Total time spent by all reduce tasks (ms)=1621
            Total vcore-seconds taken by all map tasks=5501
            Total vcore-seconds taken by all reduce tasks=1621
            Total megabyte-seconds taken by all map tasks=5633024
            Total megabyte-seconds taken by all reduce tasks=1659904
    Map-Reduce Framework
            Map input records=3
            Map output records=6
            Map output bytes=62
            Map output materialized bytes=56
            Input split bytes=98
            Combine input records=6
            Combine output records=4
            Reduce input groups=4
            Reduce shuffle bytes=56
            Reduce input records=4
            Reduce output records=4
            Spilled Records=8
            Shuffled Maps =1
            Failed Shuffles=0
            Merged Map outputs=1
            GC time elapsed (ms)=572
            CPU time spent (ms)=1860
            Physical memory (bytes) snapshot=428474368
            Virtual memory (bytes) snapshot=4219695104
            Total committed heap usage (bytes)=284164096
    Shuffle Errors
            BAD_ID=0
            CONNECTION=0
            IO_ERROR=0
            WRONG_LENGTH=0
            WRONG_MAP=0
            WRONG_REDUCE=0
    File Input Format Counters 
            Bytes Read=38
    File Output Format Counters 
            Bytes Written=34

[hadoop@master ~]$ hdfs dfs -cat /output/part-r-00000
Hadoop  1
Hello   3
Huasan  1
World   1
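The wordcount result can be cross-checked locally with standard shell tools, recounting the same three lines of data.txt:

```shell
# Recount the words of data.txt with coreutils to cross-check the MapReduce job.
# Expected counts match part-r-00000: Hadoop 1, Hello 3, Huasan 1, World 1.
printf 'Hello World\nHello Hadoop\nHello Huasan\n' \
  | tr ' ' '\n' | sort | uniq -c
```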

Chapter 6: Hive Component Installation and Configuration

Lab 1: Hive Component Installation and Configuration

1.1. Objectives

After completing this lab, you should be able to:

  • Install and configure the Hive component
  • Initialize and start the Hive component

1.2. Requirements

  • Be familiar with installing and configuring the Hive component
  • Understand how the Hive component is initialized and started

1.3. Procedure

1.3.1. Task 1: Download and Extract the Installation Files

1.3.1.1. Step 1: Base environment and installation preparation

Hive runs on top of a Hadoop system, so before installing the Hive component, make sure Hadoop is running normally. This chapter installs Hive on the master node of the fully distributed Hadoop system deployed earlier.
The Hive deployment plan and package paths are as follows:

(1) A fully distributed Hadoop system is already installed in the current environment.

(2) MySQL is installed locally (account root, password Password123$); the packages are under /opt/software/mysql-5.7.18.

(3) MySQL uses port 3306.

(4) The MySQL JDBC driver is /opt/software/mysql-connector-java-5.1.47.jar; it is used for the Hive metadata store.

(5) The Hive package is /opt/software/apache-hive-2.0.0-bin.tar.gz.

1.3.1.2. Step 2: Extract the installation files

(1) As the root user, extract the Hive package /opt/software/apache-hive-2.0.0-bin.tar.gz to /usr/local/src.

[root@master ~]# tar -zxvf /opt/software/apache-hive-2.0.0-bin.tar.gz -C /usr/local/src/

(2) Rename the extracted apache-hive-2.0.0-bin folder to hive.

[root@master ~]# mv /usr/local/src/apache-hive-2.0.0-bin/ /usr/local/src/hive/

(3) Change the owner and group of the hive directory to hadoop.

[root@master ~]# chown -R hadoop:hadoop /usr/local/src/hive 

1.3.2. Task 2: Set Up the Hive Environment

1.3.2.1. Step 1: Remove the MariaDB database

Hive stores its metadata in a MySQL database, so before deploying the Hive component, MySQL must be installed on the Linux system and configured: character set, security initialization, and remote access privileges. Log in as root and perform the following steps:

(1) Stop the Linux system firewall and disable it at boot.

[root@master ~]# systemctl stop firewalld
[root@master ~]# systemctl disable firewalld

(2) Remove the MariaDB packages bundled with the Linux system.

1) First check whether MariaDB is installed.

    [root@master ~]# rpm -qa | grep mariadb

2) Remove the MariaDB packages.
Nothing was installed here, so there is nothing to remove.

1.3.2.2. Step 2: Install the MySQL database

(1) Install the MySQL database packages in the following order: mysql common, mysql libs, mysql client.

[root@master ~]# cd /opt/software/mysql-5.7.18/

[root@master mysql-5.7.18]# rpm -ivh mysql-community-common-5.7.18-1.el7.x86_64.rpm
warning: mysql-community-common-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing...  ################################# [100%]
package mysql-community-common-5.7.18-1.el7.x86_64 is already installed

[root@master mysql-5.7.18]# rpm -ivh mysql-community-libs-5.7.18-1.el7.x86_64.rpm
warning: mysql-community-libs-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing...  ################################# [100%]
package mysql-community-libs-5.7.18-1.el7.x86_64 is already installed

[root@master mysql-5.7.18]# rpm -ivh mysql-community-client-5.7.18-1.el7.x86_64.rpm
warning: mysql-community-client-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing...  ################################# [100%]
package mysql-community-client-5.7.18-1.el7.x86_64 is already installed

(2) Install the mysql server package.

[root@master mysql-5.7.18]# rpm -ivh mysql-community-server-5.7.18-1.el7.x86_64.rpm 
warning: mysql-community-server-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing...  ################################# [100%]
package mysql-community-server-5.7.18-1.el7.x86_64 is already installed

(3) Modify the MySQL configuration by adding the settings shown in Table 6-1 to the /etc/my.cnf file.

Add the following lines directly below the symbolic-links=0 line in /etc/my.cnf:

default-storage-engine=innodb
innodb_file_per_table
collation-server=utf8_general_ci
init-connect='SET NAMES utf8'
character-set-server=utf8
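These lines can also be inserted by a script rather than by hand. A hedged sketch: add_mysql_settings is a hypothetical helper that places the settings directly below the symbolic-links=0 line of the given file:

```shell
# add_mysql_settings: insert the MySQL options from this guide directly
# below the symbolic-links=0 line of the given my.cnf (hypothetical helper).
add_mysql_settings() {
  cnf="$1"
  awk -v q="'" '
    { print }
    /^symbolic-links=0/ {
      print "default-storage-engine=innodb"
      print "innodb_file_per_table"
      print "collation-server=utf8_general_ci"
      print "init-connect=" q "SET NAMES utf8" q
      print "character-set-server=utf8"
    }
  ' "$cnf" > "${cnf}.new" && mv "${cnf}.new" "$cnf"
}
```

For example: add_mysql_settings /etc/my.cnf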

(4) Start the MySQL database.

[root@master ~]# systemctl start mysqld

(5) Check the MySQL database status. If the mysqld process state is active (running), MySQL is running normally.

If the mysqld state is failed, MySQL failed to start; in that case, check the /etc/my.cnf file.

[root@master ~]# systemctl status mysqld
● mysqld.service - MySQL Server
   Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2022-04-10 22:54:39 CST; 1h 0min ago
 Docs: man:mysqld(8)
   http://dev.mysql.com/doc/refman/en/using-systemd.html
 Main PID: 929 (mysqld)
   CGroup: /system.slice/mysqld.service
   └─929 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/my...

Apr 10 22:54:35 master systemd[1]: Starting MySQL Server...
Apr 10 22:54:39 master systemd[1]: Started MySQL Server.

(6) Look up the default MySQL password.

[root@master ~]# cat /var/log/mysqld.log | grep password
2022-04-08T16:20:04.456271Z 1 [Note] A temporary password is generated for root@localhost: 0yf>>yWdMd8_

The default password is generated randomly at installation, so it differs on every install.

(7) Initialize the MySQL database.

Run the mysql_secure_installation command to initialize MySQL. During initialization you set the database root password, which must satisfy the security rules (upper- and lowercase letters, digits, and special characters); here it is set to Password123$.

The initialization asks the following interactive questions:

1) Change the password for root ? ((Press y|Y for Yes, any other key for No): whether to change the root password; type y and press Enter.

2) Do you wish to continue with the password provided?(Press y|Y for Yes, any other key for No): whether to keep the password you entered; type y and press Enter.

3) Remove anonymous users? (Press y|Y for Yes, any other key for No): whether to remove anonymous users; type y and press Enter.

4) Disallow root login remotely? (Press y|Y for Yes, any other key for No): whether to forbid remote root login; type n and press Enter to keep remote root login allowed.

5) Remove test database and access to it? (Press y|Y for Yes, any other key for No): whether to remove the test database; type y and press Enter.

6) Reload privilege tables now? (Press y|Y for Yes, any other key for No): whether to reload the privilege tables; type y and press Enter.

The mysql_secure_installation session looks like this:

[root@master ~]# mysql_secure_installation

Securing the MySQL server deployment.

Enter password for user root: 
The 'validate_password' plugin is installed on the server.
The subsequent steps will run with the existing configuration
of the plugin.
Using existing password for root.

Estimated strength of the password: 100 
Change the password for root ? ((Press y|Y for Yes, any other key for No) : y

New password: 

Re-enter new password: 

Estimated strength of the password: 100 
Do you wish to continue with the password provided?(Press y|Y for Yes, any other key for No) : y
By default, a MySQL installation has an anonymous user,
allowing anyone to log into MySQL without having to have
a user account created for them. This is intended only for
testing, and to make the installation go a bit smoother.
You should remove them before moving into a production
environment.

Remove anonymous users? (Press y|Y for Yes, any other key for No) : y
Success.

Normally, root should only be allowed to connect from
'localhost'. This ensures that someone cannot guess at
the root password from the network.

Disallow root login remotely? (Press y|Y for Yes, any other key for No) : n

 ... skipping.
By default, MySQL comes with a database named 'test' that
anyone can access. This is also intended only for testing,
and should be removed before moving into a production
environment.

Remove test database and access to it? (Press y|Y for Yes, any other key for No) : y
 - Dropping test database...
Success.

 - Removing privileges on test database...
Success.

Reloading the privilege tables will ensure that all changes
made so far will take effect immediately.

Reload privilege tables now? (Press y|Y for Yes, any other key for No) : y
Success.

All done! 

(8) Grant the root user privileges to access the MySQL database both locally and remotely.

[root@master ~]# mysql -u root -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 9
Server version: 5.7.18 MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> grant all privileges on *.* to root@'localhost' identified by 'Password123$'; 
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> grant all privileges on *.* to root@'%' identified by 'Password123$';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql> select user,host from mysql.user where user='root';
+------+-----------+
| user | host  |
+------+-----------+
| root | % |
| root | localhost |
+------+-----------+
2 rows in set (0.00 sec)

mysql> exit;
Bye
1.3.2.3. Step 3: Configure the Hive component

(1) Set the Hive environment variables and apply them.

[root@master ~]# vim /etc/profile

export HIVE_HOME=/usr/local/src/hive
export PATH=$PATH:$HIVE_HOME/bin

[root@master ~]# source /etc/profile

(2) Modify the Hive configuration file.

Switch to the hadoop user to perform the following Hive configuration steps.
Copy the hive-default.xml.template file under /usr/local/src/hive/conf to hive-site.xml.

[root@master ~]# su - hadoop 
Last login: Sun Apr 10 23:27:25 CS

[hadoop@master ~]$ cp /usr/local/src/hive/conf/hive-default.xml.template  /usr/local/src/hive/conf/hive-site.xml

(3) Edit the hive-site.xml file with vi to connect Hive to the MySQL database and to set the Hive temporary file path.

[hadoop@master ~]$ vi /usr/local/src/hive/conf/hive-site.xml

1) Set the MySQL database connection.

<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
<description>JDBC connect string for a JDBC metastore</description>

2) Set the MySQL root password.

<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>Password123$</value>
<description>password to use against metastore database</description>
</property>

3) Check metastore schema version consistency. If this already defaults to false, no change is needed.

<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
<description>
Enforce metastore schema version consistency.
True: Verify that version information stored in the metastore is compatible with the Hive jars. Also disable automatic schema migration.
False: Warn if the version information stored in the metastore doesn't match the version of the Hive jars.
</description>
</property>

4) Configure the database driver.

<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>

5) Set the database user name javax.jdo.option.ConnectionUserName to root.

<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>Username to use against metastore database</description>
</property>

6) Replace ${system:java.io.tmpdir}/${system:user.name} in the following locations with the /usr/local/src/hive/tmp directory and its subdirectories.

The following four settings need to be changed:

<name>hive.querylog.location</name>
<value>/usr/local/src/hive/tmp</value>
<description>Location of Hive run time structured log file</description>

<name>hive.exec.local.scratchdir</name>
<value>/usr/local/src/hive/tmp</value>

<name>hive.downloaded.resources.dir</name>
<value>/usr/local/src/hive/tmp/resources</value>

<name>hive.server2.logging.operation.log.location</name>
<value>/usr/local/src/hive/tmp/operation_logs</value>
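Because these values share the ${system:java.io.tmpdir}/${system:user.name} prefix, much of the replacement can be done in one sed pass. A hedged sketch assuming GNU sed; fix_hive_tmpdirs is hypothetical, and you should still verify the resulting hive-site.xml, since some of the four properties may use a different placeholder form in the template and need a manual edit:

```shell
# fix_hive_tmpdirs: rewrite every ${system:java.io.tmpdir}/${system:user.name}
# occurrence in the given file to /usr/local/src/hive/tmp (hypothetical helper,
# GNU sed assumed for the -i flag).
fix_hive_tmpdirs() {
  sed -i 's|\${system:java\.io\.tmpdir}/\${system:user\.name}|/usr/local/src/hive/tmp|g' "$1"
}
```

For example: fix_hive_tmpdirs /usr/local/src/hive/conf/hive-site.xml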

7) Create the tmp temporary folder in the Hive installation directory.

[hadoop@master ~]$ mkdir /usr/local/src/hive/tmp 

At this point, the Hive component is installed and configured.

1.3.2.4. Step 4: Initialize the Hive metastore

1) Copy the MySQL JDBC driver (/opt/software/mysql-connector-java-5.1.46.jar) into the lib directory of the Hive installation;

[hadoop@master ~]$ cp /opt/software/mysql-connector-java-5.1.46.jar /usr/local/src/hive/lib/ 

2) Restart Hadoop.

[hadoop@master ~]$ stop-all.sh 
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [master]
master: stopping namenode
10.10.10.129: stopping datanode
10.10.10.130: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
10.10.10.129: stopping nodemanager
10.10.10.130: stopping nodemanager
no proxyserver to stop

[hadoop@master ~]$ start-all.sh 
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-master.out
10.10.10.129: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave1.out
10.10.10.130: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave2.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-resourcemanager-master.out
10.10.10.130: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave2.out
10.10.10.129: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave1.out

3) Initialize the database.

[hadoop@master ~]$ schematool -initSchema -dbType mysql 
which: no hbase in (/usr/local/src/jdk/bin:/usr/local/src/hadoop/bin:/usr/local/src/hadoop/sbin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/src/hive/bin:/home/hadoop/.local/bin:/home/hadoop/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:     jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&useSSL=false
Metastore Connection Driver :com.mysql.jdbc.Driver
Metastore connection User:   root
Mon Apr 11 00:46:32 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Starting metastore schema initialization to 2.0.0
Initialization script hive-schema-2.0.0.mysql.sql
Password123$
Password123$
No current connection
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!

Note: this initialization attempt failed ("No current connection" above indicates that schematool could not connect to MySQL). Verify the JDBC connection settings in hive-site.xml, in particular javax.jdo.option.ConnectionPassword, and re-run schematool -initSchema -dbType mysql until it completes successfully before starting Hive.

4) Start Hive.

[hadoop@master hive]$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/local/src/hive/lib/hive-common-2.0.0.jar!/hive-log4j2.properties
Fri May 20 18:51:50 CST 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
(the same SSL warning is printed several more times, once per metastore connection)
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> 

Chapter 7: ZooKeeper Component Installation and Configuration

Experiment 1: ZooKeeper Installation and Configuration

1.1 Experiment Objectives

After completing this experiment, you should be able to:

  • Download and install ZooKeeper
  • Master the ZooKeeper configuration options
  • Start ZooKeeper

1.2 Experiment Requirements

  • Understand the ZooKeeper configuration options
  • Be familiar with starting ZooKeeper

1.3 Experiment Procedure

1.3.1 Task 1: Configure Time Synchronization

Install chrony on all three nodes, replace the default pool servers in /etc/chrony.conf with a single reachable time server (time1.aliyun.com here), then restart and enable the chronyd service.
[root@master ~]# yum -y install chrony

[root@master ~]# cat /etc/chrony.conf 
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
server time1.aliyun.com iburst 
 
[root@master ~]# systemctl restart chronyd.service 
[root@master ~]# systemctl enable chronyd.service 

[root@master ~]# date 
Fri Apr 15 15:40:14 CST 2022
[root@slave1 ~]# yum -y install chrony

[root@slave1 ~]# cat /etc/chrony.conf 
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
server time1.aliyun.com iburst

[root@slave1 ~]# systemctl restart chronyd.service
[root@slave1 ~]# systemctl enable chronyd.service

[root@slave1 ~]# date
Fri Apr 15 15:40:17 CST 2022  
[root@slave2 ~]# yum -y install chrony

[root@slave2 ~]# cat /etc/chrony.conf 
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
server time1.aliyun.com iburst

[root@slave2 ~]# systemctl restart chronyd.service
[root@slave2 ~]# systemctl enable chronyd.service 

[root@slave2 ~]# date
Fri Apr 15 15:40:20 CST 2022
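The three per-node edits above differ only in the prompt; what matters is that every node ends up with the same single server line. A minimal sketch that checks a chrony.conf for exactly one configured server (run against a demo file under /tmp here, since editing the real /etc/chrony.conf requires root):

```shell
# Demo stand-in for /etc/chrony.conf after editing.
cat > /tmp/chrony-demo.conf <<'EOF'
# Use public servers from the pool.ntp.org project.
server time1.aliyun.com iburst
EOF

# Count configured server lines; each node should have exactly one.
grep -c '^server ' /tmp/chrony-demo.conf   # → 1
```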
1.3.2 Task 2: Download and Install ZooKeeper

The latest ZooKeeper release can be obtained from the project site http://hadoop.apache.org/zookeeper/. The ZooKeeper version you install must be compatible with your Hadoop environment.

Note: the firewall must be disabled on every node, otherwise connection problems will occur.

1. The ZooKeeper installation package zookeeper-3.4.8.tar.gz has been placed in the /opt/software directory of the Linux system.

2. Extract the package to the target directory by running the following commands on the Master node.

[root@master ~]# tar xf /opt/software/zookeeper-3.4.8.tar.gz -C /usr/local/src/

[root@master ~]# cd /usr/local/src/
[root@master src]# mv zookeeper-3.4.8/ zookeeper
1.3.3 Task 3: ZooKeeper Configuration Options
1.3.3.1 Step 1: Configure the Master Node

(1) Create the data and logs folders under the ZooKeeper installation directory.

[root@master src]# cd /usr/local/src/zookeeper/
[root@master zookeeper]# mkdir data logs

(2) Write the node's identifier into the myid file on each node; the number differs per node: write 1 on master, 2 on slave1, and 3 on slave2.

[root@master zookeeper]# echo '1' > /usr/local/src/zookeeper/data/myid

(3) Modify the configuration file zoo.cfg.

[root@master zookeeper]# cd /usr/local/src/zookeeper/conf/
[root@master conf]# cp zoo_sample.cfg zoo.cfg

Change the dataDir parameter as follows:

[root@master conf]# vi zoo.cfg 
dataDir=/usr/local/src/zookeeper/data

(4) Append the following parameters to the end of zoo.cfg; they describe the ports used by the three ZooKeeper nodes.

server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
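In each server.N line, 2888 is the port followers use to connect to the leader and 3888 is the leader-election port. Since the three lines differ only in the ID and hostname, they can also be generated in a loop; a minimal sketch that writes to a demo file (the real target would be conf/zoo.cfg):

```shell
cfg=/tmp/zoo-demo.cfg
: > "$cfg"                       # start from an empty demo file
id=1
for host in master slave1 slave2; do
  # server.<myid>=<hostname>:<quorum port>:<election port>
  echo "server.${id}=${host}:2888:3888" >> "$cfg"
  id=$((id + 1))
done
cat "$cfg"
```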

(5) Change the owner of the ZooKeeper installation directory to the hadoop user.

[root@master conf]# chown -R hadoop:hadoop /usr/local/src/ 
1.3.3.2 Step 2: Configure the Slave Nodes

(1) Copy the ZooKeeper installation directory from the Master node to the two Slave nodes (addressed as node1 and node2 in the commands below).

[root@master ~]# scp -r /usr/local/src/zookeeper node1:/usr/local/src/
[root@master ~]# scp -r /usr/local/src/zookeeper node2:/usr/local/src/

(2) On slave1, change the owner of the zookeeper directory to the hadoop user.

[root@slave1 ~]# chown -R hadoop:hadoop /usr/local/src/
[root@slave1 ~]# ll /usr/local/src/
total 4
drwxr-xr-x. 12 hadoop hadoop  183 Apr  2 18:11 hadoop
drwxr-xr-x   9 hadoop hadoop  183 Apr 15 16:37 hbase
drwxr-xr-x.  8 hadoop hadoop  255 Apr  2 18:06 jdk
drwxr-xr-x  12 hadoop hadoop 4096 Apr 22 15:31 zookeeper

(3) On slave1, set the node's myid to 2.

[root@slave1 ~]# echo 2 > /usr/local/src/zookeeper/data/myid

(4) On slave2, change the owner of the zookeeper directory to the hadoop user.

[root@slave2 ~]# chown -R hadoop:hadoop /usr/local/src/

(5) On slave2, set the node's myid to 3.

[root@slave2 ~]# echo 3 > /usr/local/src/zookeeper/data/myid
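The number in each node's data/myid must match the N of that node's server.N entry in zoo.cfg; this is how a starting node identifies which line in the shared configuration describes itself. A minimal sketch of that correspondence check, using demo paths under /tmp instead of the real installation:

```shell
demo=/tmp/zk-id-demo
mkdir -p "$demo/data"

# The same three server lines configured earlier.
printf 'server.1=master:2888:3888\nserver.2=slave1:2888:3888\nserver.3=slave2:2888:3888\n' > "$demo/zoo.cfg"

# On slave1 the myid file contains 2 ...
echo 2 > "$demo/data/myid"

# ... so it must select the server.2 line.
grep "^server.$(cat "$demo/data/myid")=" "$demo/zoo.cfg"   # → server.2=slave1:2888:3888
```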
1.3.3.3 Step 3: Configure the System Environment Variables

Add the environment variable configuration on all three nodes: master, slave1, and slave2.

[root@master conf]# vi /etc/profile.d/zookeeper.sh
export ZOOKEEPER_HOME=/usr/local/src/zookeeper
export PATH=${ZOOKEEPER_HOME}/bin:$PATH

[root@master ~]# scp /etc/profile.d/zookeeper.sh node1:/etc/profile.d/
zookeeper.sh                                  100%   87    42.3KB/s   00:00

[root@master ~]# scp /etc/profile.d/zookeeper.sh node2:/etc/profile.d/
zookeeper.sh                                  100%   87    50.8KB/s   00:00
1.3.4 Task 4: Start ZooKeeper

ZooKeeper must be started as the hadoop user.

(1) Start ZooKeeper on each of master, slave1, and slave2 with the zkServer.sh start command.

[root@master ~]# su - hadoop 
Last login: Fri Apr 15 21:54:17 CST 2022 on pts/0

[hadoop@master ~]$ jps
3922 Jps

[hadoop@master ~]$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

[hadoop@master ~]$ jps
3969 Jps
3950 QuorumPeerMain

[root@slave1 ~]# su - hadoop 
Last login: Fri Apr 15 22:06:47 CST 2022 on pts/0

[hadoop@slave1 ~]$ jps
1370 Jps

[hadoop@slave1 ~]$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

[hadoop@slave1 ~]$ jps
1395 QuorumPeerMain
1421 Jps

[root@slave2 ~]# su - hadoop 
Last login: Fri Apr 15 16:25:52 CST 2022 on pts/1

[hadoop@slave2 ~]$ jps
1336 Jps

[hadoop@slave2 ~]$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

[hadoop@slave2 ~]$ jps
1361 QuorumPeerMain
1387 Jps

(2) After all three nodes have started, check the ZooKeeper running state on each node.

[hadoop@master conf]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Mode: follower

[hadoop@slave1 ~]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Mode: leader

[hadoop@slave2 conf]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Mode: follower
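In a healthy quorum exactly one node reports Mode: leader and the rest report Mode: follower (which node wins the election can vary between runs). For scripted checks, the Mode line can be extracted from the status output; a minimal sketch, fed from a captured sample string since the real command needs a running quorum:

```shell
# Sample output as printed by `zkServer.sh status` above.
status_output='ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Mode: leader'

# Keep only the value after "Mode: ".
mode=$(printf '%s\n' "$status_output" | awk -F': ' '/^Mode:/ {print $2}')
echo "$mode"   # → leader
```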

Chapter 8: HBase Component Installation and Configuration

Experiment 1: HBase Installation and Configuration

1.1 Experiment Objectives

After completing this experiment, you should be able to:

  • Install and configure HBase

  • Use common HBase shell commands

