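I originally planned to use this section to set up a distributed, highly available Hadoop environment, but halfway through I realized that ZooKeeper needs an introduction first.
ZooKeeper concepts
ZooKeeper provides coordination services for distributed applications, and it is itself distributed: a fault-tolerant deployment runs on at least three machines, which together form a quorum, much like a theater troupe. The troupe has a director, the leader; everyone else is a follower. Every member keeps the same data in their head (ZooKeeper holds its data in memory) and stays in sync with the leader, so a client can connect to any server. Each member has a number, its myid. If the leader goes down, the remaining members elect a new leader so that the service keeps running.
1. Installing ZooKeeper
Installing ZooKeeper is simple: unpack the archive and set the environment variables.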
[root@hadoop100 bin]# tar -zxvf /opt/software/zookeeper-3.4.10.tar.gz -C /opt/modules/
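Open /etc/profile and append the following environment variables:
#JAVA_HOME
export JAVA_HOME=/opt/modules/jdk1.8.0_121
export PATH=$PATH:$JAVA_HOME/bin
#HADOOP_HOME
export HADOOP_HOME=/opt/modules/hadoop-2.7.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
#ZOOKEEPER
export ZOOKEEPER_HOME=/opt/modules/zookeeper-3.4.10
export PATH=$PATH:$ZOOKEEPER_HOME/bin
Then run source /etc/profile so the change takes effect. Remember to use the xsync and xcall helper scripts to sync the change to the whole cluster. Note that source only affects the shell it runs in; new login sessions on each node read /etc/profile on their own.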
[root@hadoop100 bin]# xsync /etc/profile
[root@hadoop100 bin]# xcall source /etc/profile
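2. Configuring ZooKeeper
ZooKeeper needs a data directory to store the snapshots of its in-memory database and its logs. Then edit zoo.cfg: the unpacked distribution ships /opt/modules/zookeeper-3.4.10/conf/zoo_sample.cfg; copy it or rename it to zoo.cfg and point dataDir at the data directory.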
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/opt/modules/zookeeper-3.4.10/zkData
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
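The copy-and-edit step described above can be sketched as a small shell function. This is only a sketch: prepare_zk_conf is a hypothetical helper name, and it assumes the sample file already contains a dataDir= line, as the stock zoo_sample.cfg does.

```shell
#!/bin/sh
# Hypothetical helper: create zoo.cfg from the bundled sample and point
# dataDir at <zk_home>/zkData.
# Usage: prepare_zk_conf /opt/modules/zookeeper-3.4.10
prepare_zk_conf() {
    zk_home=$1
    # The data directory holds snapshots, logs and (later) the myid file.
    mkdir -p "$zk_home/zkData" || return 1
    cp "$zk_home/conf/zoo_sample.cfg" "$zk_home/conf/zoo.cfg" || return 1
    # Rewrite the dataDir line via a temp file, so no sed -i is required.
    sed "s|^dataDir=.*|dataDir=$zk_home/zkData|" "$zk_home/conf/zoo.cfg" \
        > "$zk_home/conf/zoo.cfg.tmp" \
        && mv "$zk_home/conf/zoo.cfg.tmp" "$zk_home/conf/zoo.cfg"
}
```

Copying rather than renaming keeps the original sample around for reference.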
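For a ZooKeeper cluster, use an odd number of servers, with at least three. ZooKeeper needs a majority of the servers to be alive, so an even-sized cluster tolerates no more failures than the odd size just below it.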
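The reason for preferring an odd count comes down to majority arithmetic, which a few lines of shell can illustrate. The function names quorum_size and tolerated_failures are made up for this sketch; the formula is the usual majority rule, floor(N/2) + 1.

```shell
#!/bin/sh
# Servers needed for a majority out of N voting servers: floor(N/2) + 1.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Servers that may fail while a majority is still alive.
tolerated_failures() {
    echo $(( $1 - $1 / 2 - 1 ))
}

for n in 3 4 5 6; do
    echo "servers=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Three and four servers both tolerate one failure, and five and six both tolerate two, so the extra even-numbered server buys no additional fault tolerance.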
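# Number of ticks a server may take to connect to the leader; a server that has not connected within this many ticks is considered dead
initLimit = 5
# Maximum number of ticks a server may lag behind while syncing with the leader; beyond this it has fallen out of the group
syncLimit = 2
server.100=hadoop100:2888:3888
server.101=hadoop101:2888:3888
server.102=hadoop102:2888:3888
server.103=hadoop103:2888:3888
server.104=hadoop104:2888:3888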
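Each server entry has the following format:
server.A=B:C:D
A is a number identifying the server;
B is the server's hostname or IP address;
C is the port this server uses to exchange information with the cluster's leader;
D is the port used for leader election: if the leader goes down, the servers talk to each other on this port to elect a new one.
Don't forget to distribute the changes to the cluster when you are done. ZooKeeper itself is distributed, so first copy the relevant files to the other machines in the cluster.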
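To make the four fields concrete, here is a small sketch that splits one such line using only shell parameter expansion. parse_server_line and the field labels peer_port and election_port are names invented for this illustration; they are not ZooKeeper terms.

```shell
#!/bin/sh
# Split one "server.A=B:C:D" line into its four fields.
parse_server_line() {
    line=$1
    key=${line%%=*}            # server.100
    value=${line#*=}           # hadoop100:2888:3888
    id=${key#server.}          # A: 100
    host=${value%%:*}          # B: hadoop100
    ports=${value#*:}          # 2888:3888
    peer_port=${ports%%:*}     # C: 2888
    election_port=${ports#*:}  # D: 3888
    echo "id=$id host=$host peer_port=$peer_port election_port=$election_port"
}

parse_server_line "server.100=hadoop100:2888:3888"
# prints: id=100 host=hadoop100 peer_port=2888 election_port=3888
```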
[root@hadoop100 modules]# xsync zookeeper-3.4.10/
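Then give every machine its own identity: create a file named myid in the data directory whose content is that server's number from the configuration file above.
[root@hadoop100 zookeeper-3.4.10]# cd zkData/
[root@hadoop100 zkData]# echo 100 > myid
On the other machines in the cluster, set the content of myid so that it matches each server's number in the configuration file. This part is a little tedious: log in to each machine in turn and create the myid file in its data directory.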
[root@hadoop100 zkData]# ssh hadoop101
Last login: Thu Sep 19 14:10:35 2019 from gateway
[root@hadoop101 ~]# echo 101 > /opt/modules/zookeeper-3.4.10/zkData/myid
[root@hadoop101 ~]# exit
logout
Connection to hadoop101 closed.
[root@hadoop100 zkData]# ssh hadoop102
Last login: Tue Sep 17 13:26:48 2019 from hadoop100
[root@hadoop102 ~]# echo 102 > /opt/modules/zookeeper-3.4.10/zkData/myid
[root@hadoop102 ~]# exit
logout
Connection to hadoop102 closed.
[root@hadoop100 zkData]# ssh hadoop103
Last login: Tue Sep 17 13:17:00 2019 from hadoop100
[root@hadoop103 ~]# echo 103 > /opt/modules/zookeeper-3.4.10/zkData/myid
[root@hadoop103 ~]# exit
logout
Connection to hadoop103 closed.
[root@hadoop100 zkData]# ssh hadoop104
Last login: Tue Sep 17 11:04:38 2019 from hadoop100
[root@hadoop104 ~]# echo 104 > /opt/modules/zookeeper-3.4.10/zkData/myid
[root@hadoop104 ~]# exit
logout
Connection to hadoop104 closed.
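The manual logins above could also be replaced by a loop. The sketch below only prints the ssh commands so they can be reviewed first; myid_commands is a hypothetical helper, and it assumes the hostnames follow the hadoop<id> pattern used in this post and that passwordless ssh is already set up.

```shell
#!/bin/sh
ZK_DATA=/opt/modules/zookeeper-3.4.10/zkData

# Print one ssh command per server id; nothing is executed here.
myid_commands() {
    for id in "$@"; do
        echo "ssh hadoop$id 'echo $id > $ZK_DATA/myid'"
    done
}

myid_commands 101 102 103 104
```

Once the printed commands look right, piping the output into sh runs them: myid_commands 101 102 103 104 | sh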
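Check that everything is in place:
[root@hadoop100 bin]# xcall cat /opt/modules/zookeeper-3.4.10/zkData/myid
---------running at localhost--------
100
---------running at hadoop101-------
101
---------running at hadoop102-------
102
---------running at hadoop103-------
103
---------running at hadoop104-------
104
[root@hadoop100 bin]#
The basic configuration is done and we are ready to start. The ZooKeeper service has to be started on every machine in the cluster, so I used the xcall helper script introduced earlier. (I later found this approach unreliable: it reported a successful start, but the service was not actually up.)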
[root@hadoop100 zkData]# xcall /opt/modules/zookeeper-3.4.10/bin/zkServer.sh start
---------running at localhost--------
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
---------running at hadoop101-------
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
---------running at hadoop102-------
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
---------running at hadoop103-------
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
---------running at hadoop104-------
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@hadoop100 zkData]#
Troubleshooting: Error contacting service. It is probably not running.
Checking the status, uh-oh, why is nothing running?
[root@hadoop100 bin]# xcall /opt/modules/zookeeper-3.4.10/bin/zkServer.sh status
---------running at localhost--------
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
---------running at hadoop101-------
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
---------running at hadoop102-------
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
---------running at hadoop103-------
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
---------running at hadoop104-------
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
[root@hadoop100 bin]#
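It later turned out that logging in to each machine over ssh and starting the service there works; perhaps the xcall script is not always reliable. One tip: the zkServer.sh start-foreground command prints the detailed startup output, which makes troubleshooting easier.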
[root@hadoop101 ~]# /opt/modules/zookeeper-3.4.10/bin/zkServer.sh start-foreground
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
2019-09-19 14:52:29,093 [myid:] - INFO [main:QuorumPeerConfig@134] - Reading configuration from: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
2019-09-19 14:52:29,122 [myid:] - INFO [main:QuorumPeer$QuorumServer@167] - Resolved hostname: hadoop104 to address: hadoop104/192.168.56.104
2019-09-19 14:52:29,123 [myid:] - INFO [main:QuorumPeer$QuorumServer@167] - Resolved hostname: hadoop103 to address: hadoop103/192.168.56.103
2019-09-19 14:52:29,123 [myid:] - INFO [main:QuorumPeer$QuorumServer@167] - Resolved hostname: hadoop102 to address: hadoop102/192.168.56.102
2019-09-19 14:52:29,124 [myid:] - INFO [main:QuorumPeer$QuorumServer@167] - Resolved hostname: hadoop101 to address: hadoop101/192.168.56.101
2019-09-19 14:52:29,124 [myid:] - INFO [main:QuorumPeer$QuorumServer@167] - Resolved hostname: hadoop100 to address: hadoop100/192.168.56.100
2019-09-19 14:52:29,124 [myid:] - INFO [main:QuorumPeerConfig@396] - Defaulting to majority quorums
2019-09-19 14:52:29,134 [myid:101] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2019-09-19 14:52:29,135 [myid:101] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2019-09-19 14:52:29,135 [myid:101] - INFO [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2019-09-19 14:52:29,150 [myid:101] - INFO [main:QuorumPeerMain@127] - Starting quorum peer
2019-09-19 14:52:29,171 [myid:101] - INFO [main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:2181
2019-09-19 14:52:29,172 [myid:101] - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
java.net.BindException: Address already in use
	at sun.nio.ch.Net.bind0(Native Method)
	at sun.nio.ch.Net.bind(Net.java:433)
	at sun.nio.ch.Net.bind(Net.java:425)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
	at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:90)
	at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:130)
	at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
	at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
[root@hadoop101 ~]#
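The java.net.BindException: Address already in use in this log means that something was already listening on client port 2181 on hadoop101, possibly a ZooKeeper instance left over from the earlier xcall start. If the jps command lists QuorumPeerMain, the ZooKeeper server process is up.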
[root@hadoop100 bin]# jps
1885 QuorumPeerMain
2029 Jps
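jps shows whether the JVM exists; to see whether the client port is already held before starting another instance, the listening sockets can be inspected as well. The sketch below is one way to do that, not part of the original setup: port_listening is a hypothetical helper that reads the output of ss -ltn on stdin, and it assumes the ss tool from iproute2 is installed.

```shell
#!/bin/sh
# Reads `ss -ltn` output on stdin; succeeds if a listening socket's
# local address ends in ":<port>".
port_listening() {
    awk -v p=":$1" '
        $1 == "LISTEN" && substr($4, length($4) - length(p) + 1) == p { found = 1 }
        END { exit !found }
    '
}

# Usage on a cluster node:
#   ss -ltn | port_listening 2181 && echo "port 2181 is already in use"
```

Matching on the ":2181" suffix keeps a port such as 12181 from being mistaken for 2181.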
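After logging in to each server over ssh, starting the service and checking its status, I can see that in my cluster hadoop102 is currently the leader and the others are followers: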
[root@hadoop100 bin]# /opt/modules/zookeeper-3.4.10/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower
[root@hadoop100 bin]# ssh hadoop101
Last login: Thu Sep 19 15:04:12 2019 from hadoop100
[root@hadoop101 ~]# /opt/modules/zookeeper-3.4.10/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower
[root@hadoop101 ~]# exit
logout
Connection to hadoop101 closed.
[root@hadoop100 bin]# ssh hadoop102
Last login: Thu Sep 19 15:04:48 2019 from hadoop100
[root@hadoop102 ~]# /opt/modules/zookeeper-3.4.10/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: leader
[root@hadoop102 ~]# exit
logout
Connection to hadoop102 closed.
[root@hadoop100 bin]# ssh hadoop103
Last login: Thu Sep 19 15:05:07 2019 from hadoop100
[root@hadoop103 ~]# /opt/modules/zookeeper-3.4.10/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower
[root@hadoop103 ~]# exit
logout
Connection to hadoop103 closed.
[root@hadoop100 bin]# ssh hadoop104
Last login: Thu Sep 19 15:05:51 2019 from hadoop100
[root@hadoop104 ~]# /opt/modules/zookeeper-3.4.10/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower
[root@hadoop104 ~]# exit
logout
Connection to hadoop104 closed.
[root@hadoop100 bin]#
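The per-host logins can be condensed into a loop. The sketch below only prints the commands; zk_commands is a hypothetical helper. It loads /etc/profile explicitly on the remote side because a command run through ssh does not start a login shell, so variables such as JAVA_HOME may be missing there. Whether that was the reason the xcall start misbehaved is only a guess.

```shell
#!/bin/sh
ZK_BIN=/opt/modules/zookeeper-3.4.10/bin

# Print one ssh command per host for the given zkServer.sh action
# (start, stop, status, ...); nothing is executed here.
zk_commands() {
    action=$1
    shift
    for host in "$@"; do
        echo "ssh $host '. /etc/profile; $ZK_BIN/zkServer.sh $action'"
    done
}

zk_commands status hadoop100 hadoop101 hadoop102 hadoop103 hadoop104
```

Piping the output into sh runs the commands. Keep in mind that status can only report leader or follower once a majority of the servers is up, so checking right after starting the first node may still show an error.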
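That's it: my ZooKeeper cluster is now up and running.
A side note
Virtual machines are fine for learning and experimenting, but for serious work you will want the cloud, for example Alibaba Cloud (Aliyun). If you need it, you can use my link below, which comes with a discount and cashback.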
https://promotion.aliyun.com/ntms/yunparter/invite.html?userCode=vltv9frd