This article is mainly based on http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/RackAwareness.html
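Hadoop components are rack-aware (translator's note: a rack can simply be understood as the physical placement of the nodes).
For example, HDFS block placement uses rack awareness for fault tolerance by placing one block replica on a different rack. This keeps data available in the event of a network switch failure or a partition within the cluster.
Translator's note: in critical systems this is well worth considering. Network devices fail too; after all, they also carry a heavy load.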
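The Hadoop master daemons obtain the rack id of the cluster workers (data nodes) by invoking either an external script or a Java class, as specified in the configuration files. Whichever method is used, the output must conform to the Java interface org.apache.hadoop.net.DNSToSwitchMapping.
The interface expects a one-to-one correspondence to be maintained, with topology information in the form '/myrack/myhost', where '/' is the topology delimiter, 'myrack' is the rack identifier and 'myhost' is the individual host. Assuming a single /24 subnet per rack, one could use '/192.168.100.0/192.168.100.5' as a unique rack-host topology mapping.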
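To make the '/rack/host' format concrete, here is a minimal Python sketch that assumes the one-/24-subnet-per-rack layout described above. It uses the standard library ipaddress module; the function names topology_path and split_topology_path are hypothetical helpers for illustration, not part of Hadoop.

```python
import ipaddress
from typing import Tuple


def topology_path(ip: str, prefix_len: int = 24) -> str:
    """Build '/<rack>/<host>', using the IP's /prefix_len network address as the rack id."""
    # strict=False accepts a host address and returns the network that contains it
    network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return f"/{network.network_address}/{ip}"


def split_topology_path(path: str) -> Tuple[str, str]:
    """Split a '/rack/host' path on the '/' delimiter into (rack, host)."""
    _, rack, host = path.split("/")
    return rack, host


if __name__ == "__main__":
    for ip in ["192.168.100.5", "192.168.100.77", "192.168.101.9"]:
        path = topology_path(ip)
        print(ip, "->", path, split_topology_path(path))
```

Hosts in the same /24 (192.168.100.5 and 192.168.100.77) share the rack component '192.168.100.0', while 192.168.101.9 lands on a different rack.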
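To use a Java class for topology mapping, specify the class name with the net.topology.node.switch.mapping.impl parameter in the configuration file. One example, NetworkTopology.java, is included in the Hadoop distribution and can be customized by the administrator.
The advantage of a Java class over an external script is performance: Hadoop does not need to fork an external process when a new data node registers itself.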
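If an external script is used, it is specified with the net.topology.script.file.name parameter in the configuration files. Unlike the Java class, no external topology script ships with the Hadoop distribution; the administrator must provide one. When Hadoop forks the topology script, it passes multiple IP addresses in ARGV. The number of IP addresses sent per invocation is controlled by net.topology.script.number.args, which defaults to 100. If net.topology.script.number.args is set to 1, the script is forked once for every IP address submitted by DataNodes and/or NodeManagers.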
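The effect of net.topology.script.number.args on how often the script is forked can be modeled with a short Python sketch. It only illustrates the batching arithmetic, assuming the pending IPs are split into consecutive groups of at most number.args addresses; batch_ips is a hypothetical helper, not Hadoop code.

```python
from typing import Iterator, List


def batch_ips(ips: List[str], number_args: int = 100) -> Iterator[List[str]]:
    """Yield consecutive batches of at most number_args IPs, one batch per script run."""
    if number_args < 1:
        raise ValueError("number_args must be at least 1")
    for start in range(0, len(ips), number_args):
        yield ips[start:start + number_args]


if __name__ == "__main__":
    ips = [f"10.0.{i // 256}.{i % 256}" for i in range(250)]
    for args in (100, 1):
        runs = sum(1 for _ in batch_ips(ips, args))
        print(f"number.args={args}: {runs} script invocations for {len(ips)} IPs")
```

With 250 addresses, the default of 100 needs 3 invocations, while a value of 1 forks the script 250 times, which is why the Java class route is cheaper.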
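If net.topology.script.file.name or net.topology.node.switch.mapping.impl is not set, the rack id '/default-rack' is returned for every IP address. Although this fallback may seem harmless, it causes problems for HDFS block replication: by default one replica is written off-rack, which is impossible when the only rack is '/default-rack' (translator's note: with every node on the same rack, replica placement has no rack diversity, so fault tolerance suffers).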
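A deliberately simplified Python sketch of why a single '/default-rack' hurts. It assumes a placement step that looks for a node on a rack different from the writer's; the real HDFS block placement policy is more involved, and resolve_rack and pick_off_rack_node are hypothetical names used only for illustration.

```python
from typing import Dict, List, Optional

DEFAULT_RACK = "/default-rack"


def resolve_rack(ip: str, mapping: Optional[Dict[str, str]] = None) -> str:
    """Return the rack for ip; with no mapping configured every IP lands on DEFAULT_RACK."""
    if not mapping:
        return DEFAULT_RACK
    return mapping.get(ip, DEFAULT_RACK)


def pick_off_rack_node(writer_ip: str, nodes: List[str],
                       mapping: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return the first node whose rack differs from the writer's, or None if none exists."""
    writer_rack = resolve_rack(writer_ip, mapping)
    for node in nodes:
        if node != writer_ip and resolve_rack(node, mapping) != writer_rack:
            return node
    return None


if __name__ == "__main__":
    nodes = ["10.0.1.11", "10.0.1.12", "10.0.2.21", "10.0.2.22"]
    racks = {ip: "/rack-" + ip.split(".")[2] for ip in nodes}
    print(pick_off_rack_node("10.0.1.11", nodes))         # no mapping: None
    print(pick_off_rack_node("10.0.1.11", nodes, racks))  # with mapping: 10.0.2.21
```

Without a mapping there is no off-rack candidate at all; once hosts are spread over two racks, a node on the other rack can be chosen.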
Translator's note: all of the parameters above are configured in core-site.xml.
The original document gives two examples, one in Python and one in bash.
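Since these parameters live in core-site.xml as ordinary property name/value pairs, one quick way to check which topology settings a node actually has is to parse the file. This is a minimal sketch using Python's standard xml.etree.ElementTree; the sample file content and the script path /etc/hadoop/conf/topology.py are made up for illustration, and topology_settings is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Topology-related keys discussed above
TOPOLOGY_KEYS = (
    "net.topology.node.switch.mapping.impl",
    "net.topology.script.file.name",
    "net.topology.script.number.args",
)


def topology_settings(core_site_xml: str) -> Dict[str, str]:
    """Return the net.topology.* properties explicitly set in a core-site.xml document."""
    root = ET.fromstring(core_site_xml)
    settings = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in TOPOLOGY_KEYS:
            settings[name] = (prop.findtext("value") or "").strip()
    return settings


# Hypothetical core-site.xml content
SAMPLE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>net.topology.script.file.name</name>
    <value>/etc/hadoop/conf/topology.py</value>
  </property>
  <property>
    <name>net.topology.script.number.args</name>
    <value>100</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(topology_settings(SAMPLE))
```

Keys that are absent from the result fall back to Hadoop's defaults; an empty result is a hint that every node may end up on '/default-rack'.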
------------------------
-- python
------------------------
#!/usr/bin/env python3
# This script makes assumptions about the physical environment.
# 1) Each rack is its own layer 3 network with a /24 subnet, which
#    could be typical where each rack has its own
#    switch with uplinks to a central core router.
#
#     +-----------+
#     |core router|
#     +-----------+
#       /       \
# +-----------+   +-----------+
# |rack switch|   |rack switch|
# +-----------+   +-----------+
# | data node |   | data node |
# +-----------+   +-----------+
# | data node |   | data node |
# +-----------+   +-----------+
#
# 2) The topology script gets a list of IPs as input, calculates the network address,
#    and prints '/network_address' as the rack id, one line per input IP.
import sys

import netaddr  # third-party package: pip install netaddr

sys.argv.pop(0)  # discard the script name from argv; we only want the IP addresses
netmask = '255.255.255.0'  # set to the netmask used in your environment; this example uses a /24

for ip in sys.argv:  # loop over the list of datanode IPs
    address = '{0}/{1}'.format(ip, netmask)  # 'ip/netmask' form that netaddr understands
    try:
        network_address = netaddr.IPNetwork(address).network  # calculate the network address
        print("/{0}".format(network_address))
    except netaddr.AddrFormatError:
        print("/rack-unknown")  # catch-all value if the network address cannot be calculated
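Because netaddr is a third-party package that may not be installed on every node, a variant using only the standard library ipaddress module can be handy. This is a sketch under the same /24-per-rack assumption, not part of the original document; rack_for and main are hypothetical names.

```python
#!/usr/bin/env python3
"""Stdlib-only variant of the topology script: one '/network' rack id per input IP."""
import ipaddress
import sys

PREFIX_LEN = 24  # assumption: one /24 subnet per rack, as in the script above


def rack_for(ip: str, prefix_len: int = PREFIX_LEN) -> str:
    """Return '/<network address>' for ip, or '/rack-unknown' if it cannot be parsed."""
    try:
        network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    except ValueError:
        return "/rack-unknown"
    return f"/{network.network_address}"


def main(argv: list) -> None:
    # Hadoop passes the IPs as arguments; print exactly one rack id per argument, in order
    for ip in argv:
        print(rack_for(ip))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Keeping exactly one output line per argument matters, since the mapping must stay one-to-one with the IPs Hadoop passed in.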
------------------------
-- bash
------------------------
#!/bin/bash
# Here's a bash example to show just how simple these scripts can be.
# Assuming we have a flat network with everything on a single switch, we can fake a rack topology.
# This could occur in a lab environment where we have limited nodes, like 2-8 physical machines on an unmanaged switch.
# This may also apply to multiple virtual machines running on the same physical hardware.
# The number of machines isn't important; what matters is that we are faking a network topology where there isn't one.
#
#    +----------+    +--------+
#    |jobtracker|    |datanode|
#    +----------+    +--------+
#               \    /
# +--------+  +--------+  +--------+
# |datanode|--| switch |--|datanode|
# +--------+  +--------+  +--------+
#               /    \
#    +--------+    +--------+
#    |datanode|    |namenode|
#    +--------+    +--------+
#
# With this network topology, we are treating each host as a rack. This is done by taking the last octet
# of the datanode's IP and prepending the string '/rack-' to it. The advantage of doing this is that HDFS
# can create its 'off-rack' block copy.
# 1) 'echo $@' echoes all ARGV values to xargs.
# 2) 'xargs -n 1' ensures we print a single argv value per line.
# 3) 'awk' splits fields on dots and appends the last field to the string '/rack-'. If awk
#    fails to split on four dots, it will still print '/rack-' followed by the last field value.
echo $@ | xargs -n 1 | awk -F '.' '{print "/rack-"$NF}'
Summary:
How the network topology is designed deserves close attention in any cluster.