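My last Redis study notes date back to 2014, and almost two years have slipped by since. One of the biggest changes in Redis over that time is the official release of the cluster feature. Building a Redis cluster used to mean doing your own sharding with consistent hashing. Now it is much easier: you use the cluster feature directly, and it also supports adding nodes dynamically, HA, and redistributing cached data after nodes are added or removed (resharding).
Below is the process of setting up a cluster on a Mac, following the official cluster-tutorial: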
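1. Download and compile the latest Redis
The latest version at the time of writing is 3.0.7. Download: http://www.redis.io/download
Compiling is simple, a single make command is enough. If you are unsure how, see my earlier note: Redis Study Notes (1): Compiling, Starting, Stopping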
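2. Create six directories
mkdir -p ~/app/redis-cluster/ # create a root directory first
cd ~/app/redis-cluster/
mkdir 7000 7001 7002 7003 7004 7005
Note: like most distributed middleware, Redis relies on an election algorithm for high availability, so, as with ZK, an odd number of masters is typical (fewer than N/2 of them may fail). A failover needs a majority of the masters to agree, which means at least 3 masters. Since each master also needs a slave as its backup, a Redis cluster needs at least 6 nodes.
Then copy the Redis build from step 1 into each of these six directories.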
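The node-count reasoning in the note above can be sketched in a few lines. This is only an illustration of majority arithmetic, assuming that a failover needs a strict majority of masters; the function names are invented here.

```python
def majority(masters: int) -> int:
    """Smallest number of masters that forms a strict majority."""
    return masters // 2 + 1


def tolerable_master_failures(masters: int) -> int:
    """How many masters may be down while a majority is still reachable."""
    return masters - majority(masters)


def total_nodes(masters: int, replicas_per_master: int) -> int:
    """Every master plus its replicas."""
    return masters * (1 + replicas_per_master)


if __name__ == "__main__":
    for m in (3, 4, 5):
        print(f"{m} masters: majority={majority(m)}, "
              f"may lose {tolerable_master_failures(m)}")
    # 3 masters with 1 slave each is the smallest layout used in this post
    print("nodes needed:", total_nodes(3, 1))
```

Note that 4 masters tolerate no more failures than 3, which is why an odd count is the usual choice.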
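3. Configuration file
port 7000
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
Save the lines above as redis-cluster.conf and put a copy in the redis directory under each of the six directories. Remember to change the port: port 7000 in the 7000 directory, port 7001 in the 7001 directory, and so on.
cluster-node-timeout is the maximum number of milliseconds a node may stay out of contact while the nodes talk to each other. It is set to 5 seconds above: if a node fails to report to the other nodes for more than 5 seconds, it is considered down.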
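Editing six nearly identical files by hand invites typos, so here is a minimal sketch that generates them. It assumes a simplified layout in which the file is written straight into each port directory (the post keeps it inside the redis directory under each one); write_cluster_confs and CONF_TEMPLATE are names made up for this example.

```python
import tempfile
from pathlib import Path

# Same five settings as the configuration above; only the port varies.
CONF_TEMPLATE = """port {port}
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
"""


def write_cluster_confs(root: Path, ports, conf_name: str = "redis-cluster.conf"):
    """Write one config file per port under root/<port>/ and return the paths."""
    written = []
    for port in ports:
        node_dir = root / str(port)
        node_dir.mkdir(parents=True, exist_ok=True)
        conf_path = node_dir / conf_name
        conf_path.write_text(CONF_TEMPLATE.format(port=port))
        written.append(conf_path)
    return written


if __name__ == "__main__":
    # A temporary directory keeps the demo from touching a real setup.
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_cluster_confs(Path(tmp), range(7000, 7006)):
            print(path, "->", path.read_text().splitlines()[0])
```

Point root at your own cluster directory (and adjust the subpath) to use it for real.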
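4. Start each Redis instance in turn
In the src subdirectory of the redis directory under each of the six directories, run:
./redis-server ../redis-cluster.conf
This starts the six nodes 7000 to 7005.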
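5. Install the Ruby redis module
gem install redis # note: if rubygems.org is slow or unreachable from your network, go through a proxy for this step
Explanation: although step 4 started the six Redis servers successfully, they are completely independent of each other, and a separate tool is needed to join them into a cluster. That tool is a Ruby script shipped with Redis called redis-trib.rb (my guess is that the author of Redis is rather fond of Ruby). The Mac comes with a Ruby 2.0 environment but without the redis module, so it has to be installed; otherwise creating the cluster in the next step will fail.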
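6. Create the cluster
./redis-trib.rb create --replicas 1 127.0.0.1:7000 127.0.0.1:7001 \
127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005
Still in the src subdirectory under one of the directories, run the command above and the cluster is created. --replicas 1 means that every master gets one replica (that is, a slave), so of the six nodes 127.0.0.1:7000 to 127.0.0.1:7005, three are assigned as masters and the other three as slaves.
Note: creating the cluster with redis-trib only has to be done once. If the machine is shut down and all six nodes are stopped, they come back in cluster mode automatically on the next start, with no need to run redis-trib.rb create again.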
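To make the effect of --replicas concrete, here is a rough sketch of the arithmetic: how many masters six nodes yield, and how 16384 slots could be split between them. The rounding rule is chosen because it reproduces the ranges that check prints later in this post; it is my reading of that output, not code taken from redis-trib. All names are illustrative.

```python
TOTAL_SLOTS = 16384


def master_count(node_count: int, replicas: int) -> int:
    """Each master comes with `replicas` slaves, so divide accordingly."""
    return node_count // (replicas + 1)


def split_slots(masters: int):
    """Split the slot space into `masters` contiguous, nearly equal ranges."""
    per_master = TOTAL_SLOTS / masters
    ranges = []
    first = 0
    cursor = 0.0
    for i in range(masters):
        # Round half up; the last master always takes whatever remains.
        last = int(cursor + per_master - 1 + 0.5)
        if i == masters - 1 or last > TOTAL_SLOTS - 1:
            last = TOTAL_SLOTS - 1
        ranges.append((first, last))
        first = last + 1
        cursor += per_master
    return ranges


if __name__ == "__main__":
    masters = master_count(6, replicas=1)
    for first, last in split_slots(masters):
        print(f"slots:{first}-{last} ({last - first + 1} slots)")
```

The uneven middle range (5462 slots) is simply where the rounding lands when 16384 is not divisible by 3.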
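If you now look at the Redis processes with ps, each one has the word cluster appended.
To find out which ports are masters and which are slaves, use the following command:
./redis-trib.rb check 127.0.0.1:7000
The output looks like this: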
>>> Performing Cluster Check (using node 127.0.0.1:7000)
S: 0b7e0d5337e87ac7b59bba4c1248e5c9e8d1905e 127.0.0.1:7000
   slots: (0 slots) slave
   replicates 38910c5baafea02c5303505acfd9bd331c608cfc
M: e0e8dfddd4e9d855090d6efd18e55ea9c0e1f7aa 127.0.0.1:7001
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: 88e16f91609c03277f2ee6ce5285932f58c221c1 127.0.0.1:7005
   slots: (0 slots) slave
   replicates ec964a7c7cd53b986f54318a190c1426fc53a5fa
S: be7e9fd3b7d096b037306bc14e1017150fa59d7a 127.0.0.1:7004
   slots: (0 slots) slave
   replicates e0e8dfddd4e9d855090d6efd18e55ea9c0e1f7aa
M: 38910c5baafea02c5303505acfd9bd331c608cfc 127.0.0.1:7003
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
M: ec964a7c7cd53b986f54318a190c1426fc53a5fa 127.0.0.1:7002
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
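From the output above, 7000, 7004 and 7005 are slaves, while 7001, 7003 and 7002 are masters (if you have run some failover tests by hand, such as stopping a node and bringing it back, your output may differ). Besides check, another commonly used command is info:
./redis-trib.rb info 127.0.0.1:7000
The output looks like this: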
127.0.0.1:7001 (e0e8dfdd...) -> 2 keys | 5462 slots | 1 slaves.
127.0.0.1:7003 (38910c5b...) -> 2 keys | 5461 slots | 1 slaves.
127.0.0.1:7002 (ec964a7c...) -> 0 keys | 5461 slots | 1 slaves.
[OK] 4 keys in 3 masters.
0.00 keys per slot on average.
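It prints information for every master: how many cached keys and how many slaves each one has, the total number of keys across all masters, and the average number of keys per slot. To see the other commands of the redis-trib script, run:
./redis-trib.rb help
The output is: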
Usage: redis-trib <command> <options> <arguments ...>
  create          host1:port1 ... hostN:portN
                  --replicas <arg>
  check           host:port
  info            host:port
  fix             host:port
                  --timeout <arg>
  reshard         host:port
                  --from <arg>
                  --to <arg>
                  --slots <arg>
                  --yes
                  --timeout <arg>
                  --pipeline <arg>
  rebalance       host:port
                  --weight <arg>
                  --auto-weights
                  --use-empty-masters
                  --timeout <arg>
                  --simulate
                  --pipeline <arg>
                  --threshold <arg>
  add-node        new_host:new_port existing_host:existing_port
                  --slave
                  --master-id <arg>
  del-node        host:port node_id
  set-timeout     host:port milliseconds
  call            host:port command arg arg .. arg
  import          host:port
                  --from <arg>
                  --copy
                  --replace
  help            (show this help)
For check, fix, reshard, del-node, set-timeout you can specify the host and port of any working node in the cluster.
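The word slot has come up several times already, so a brief explanation:
As the figure above shows, redis-cluster divides the storage space of the whole cluster into 16384 slots. With six nodes split into three masters and three slaves, the cluster effectively contains three HA groups, and the three masters share all the slots evenly. For every operation on a key (reading or writing a cache entry, for example), Redis computes the CRC16 of the key, takes the result modulo 16384, and uses the remainder to decide which slot the entry belongs to. Once the slot is known, so is the master that stores it. When the cluster grows or loses nodes, only the slots have to be reassigned (that is, some slots move from certain nodes to others).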
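The key-to-slot mapping is small enough to sketch in full. Redis Cluster hashes keys with CRC16 (the XMODEM variant, polynomial 0x1021, initial value 0) and takes the result modulo 16384; if a key contains a non-empty {...} section, only that part is hashed. The function names below are my own, and this is an illustration rather than a copy of the Redis source.

```python
def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16 with polynomial 0x1021 and initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Slot (0..16383) that a key maps to, honouring {hash tags}."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end > start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


if __name__ == "__main__":
    for key in ("user1", "{user1}.following"):
        print(key, "->", key_slot(key))
```

For the key user1 this gives 8106, the same slot that redis-cli reports in the next section.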
7. Working with the redis-cli client
./redis-cli -c -h localhost -p 7000
Note the -c flag, which puts the client in cluster mode. Add any cache entry to try it out:
localhost:7000> set user1 jimmy
-> Redirected to slot [8106] located at 127.0.0.1:7001
OK
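Note the second line of the output: the key user1 hashes to slot 8106, which lives on the node at port 7001 (explanation: slot 8106 falls in the range 5461-10922 owned by master 7001, whereas 7000 is a slave and only masters accept writes). Repeating the same operation on 7001 does not produce the second line (explanation: 7001 is the master that owns the slot, so there is nothing to redirect).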
➜ src ./redis-cli -c -h localhost -p 7001
localhost:7001> set user1 yang
OK
localhost:7001>
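Behind the "Redirected to slot" line is an error reply of the form MOVED <slot> <host>:<port>, which redis-cli follows on its own when started with -c. Below is a minimal sketch of how a client might parse such a reply; Redirect and parse_moved are names made up for this illustration, not part of any Redis client library.

```python
import re
from typing import NamedTuple, Optional


class Redirect(NamedTuple):
    slot: int
    host: str
    port: int


# "MOVED 8106 127.0.0.1:7001": slot number, then host:port of the owner.
_MOVED = re.compile(r"^MOVED (\d+) (\S+):(\d+)$")


def parse_moved(error_message: str) -> Optional[Redirect]:
    """Return the redirect target, or None if this is not a MOVED reply."""
    match = _MOVED.match(error_message.strip())
    if match is None:
        return None
    slot, host, port = match.groups()
    return Redirect(int(slot), host, int(port))


if __name__ == "__main__":
    print(parse_moved("MOVED 8106 127.0.0.1:7001"))
    print(parse_moved("OK"))
```

A cluster-aware client would reconnect to the returned host and port and resend the command there.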
8. Failover test
First use redis-trib.rb to look at the current masters and slaves:
➜ src ./redis-trib.rb check localhost:7000
>>> Performing Cluster Check (using node localhost:7000)
S: 0b7e0d5337e87ac7b59bba4c1248e5c9e8d1905e localhost:7000
   slots: (0 slots) slave
   replicates 38910c5baafea02c5303505acfd9bd331c608cfc
M: ec964a7c7cd53b986f54318a190c1426fc53a5fa 127.0.0.1:7002
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
M: e0e8dfddd4e9d855090d6efd18e55ea9c0e1f7aa 127.0.0.1:7001
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: be7e9fd3b7d096b037306bc14e1017150fa59d7a 127.0.0.1:7004
   slots: (0 slots) slave
   replicates e0e8dfddd4e9d855090d6efd18e55ea9c0e1f7aa
S: 88e16f91609c03277f2ee6ce5285932f58c221c1 127.0.0.1:7005
   slots: (0 slots) slave
   replicates ec964a7c7cd53b986f54318a190c1426fc53a5fa
M: 38910c5baafea02c5303505acfd9bd331c608cfc 127.0.0.1:7003
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
According to the output, 7000 is the slave of 7003 (38910c5baafea02c5303505acfd9bd331c608cfc). Now kill the Redis process of 7003 by hand and watch the terminal output of 7000:
3872:S 21 Mar 10:55:55.663 * Connecting to MASTER 127.0.0.1:7003
3872:S 21 Mar 10:55:55.663 * MASTER <-> SLAVE sync started
3872:S 21 Mar 10:55:55.663 # Error condition on socket for SYNC: Connection refused
3872:S 21 Mar 10:55:55.771 * Marking node 38910c5baafea02c5303505acfd9bd331c608cfc as failing (quorum reached).
3872:S 21 Mar 10:55:55.771 # Cluster state changed: fail
3872:S 21 Mar 10:55:55.869 # Start of election delayed for 954 milliseconds (rank #0, offset 183).
3872:S 21 Mar 10:55:56.703 * Connecting to MASTER 127.0.0.1:7003
3872:S 21 Mar 10:55:56.703 * MASTER <-> SLAVE sync started
3872:S 21 Mar 10:55:56.703 # Error condition on socket for SYNC: Connection refused
3872:S 21 Mar 10:55:56.909 # Starting a failover election for epoch 10.
3872:S 21 Mar 10:55:56.911 # Failover election won: I'm the new master.
3872:S 21 Mar 10:55:56.911 # configEpoch set to 10 after successful failover
3872:M 21 Mar 10:55:56.911 * Discarding previously cached master state.
3872:M 21 Mar 10:55:56.911 # Cluster state changed: ok
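Note log lines 5, 6 and 11. Line 5 shows that the cluster switched to the fail state because 7003 went down, line 6 shows the slave scheduling an election, and line 11 shows the node at port 7000 being elected as the new master.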
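The "delayed for 954 milliseconds (rank #0 ...)" entry follows from the delay rule in the Redis Cluster specification: a fixed 500 ms, plus a random 0 to 500 ms, plus 1000 ms for each rank position. A small sketch of that rule follows; the function name is mine, and the real server computes this internally.

```python
import random
from typing import Optional


def election_delay_ms(rank: int, rng: Optional[random.Random] = None) -> int:
    """Delay before a slave starts its failover election.

    rank 0 is the slave with the most up-to-date replication offset;
    higher ranks wait longer so the best candidate tends to go first.
    """
    rng = rng or random.Random()
    return 500 + rng.randint(0, 500) + rank * 1000


if __name__ == "__main__":
    for rank in (0, 1, 2):
        print(f"rank #{rank}: {election_delay_ms(rank)} ms")
```

The 954 ms seen in the log sits inside the 500 to 1000 ms window expected for rank #0.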
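9. Scaling out the cluster
As the business grows, scaling out the cluster is only a matter of time. The following shows how to add two more nodes. First make two copies of the 7000 directory, named 7006 and 7007, then go into the src subdirectory of the redis directory under 7006 and 7007: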
rm nodes.conf dump.rdb appendonly.aof
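Because 7000 has already been running, it holds some data. The data file, the append-only log file and the cluster's nodes.conf therefore have to be deleted so that the copy becomes an empty, standalone Redis node; otherwise it cannot join the cluster.
Then edit redis-cluster.conf: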
port 7000
cluster-enabled yes
cluster-config-file "nodes.conf"
cluster-node-timeout 10000
appendonly yes
# Generated by CONFIG REWRITE
dir "/Users/yjmyzz/app/redis-cluster/7000/redis-3.0.7/src"
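Two things need to change. First, the port on the first line must match 7006 or 7007. Second, the last two lines were added automatically after 7000 ran; delete them.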
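Both changes are mechanical, so they can be scripted. The sketch below assumes the config looks like the one shown above, with everything from the "# Generated by CONFIG REWRITE" marker onward being safe to drop; config_for_new_node is a hypothetical helper.

```python
def config_for_new_node(original: str, new_port: int) -> str:
    """Return a copy of the config with the port replaced and the
    lines appended by CONFIG REWRITE removed."""
    kept = []
    for line in original.splitlines():
        stripped = line.strip()
        if stripped.startswith("# Generated by CONFIG REWRITE"):
            break  # the marker and everything after it are dropped
        if stripped.startswith("port "):
            line = f"port {new_port}"
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Sample input mirroring the file above; the dir path is a placeholder.
    sample = (
        "port 7000\n"
        "cluster-enabled yes\n"
        'cluster-config-file "nodes.conf"\n'
        "cluster-node-timeout 10000\n"
        "appendonly yes\n"
        "# Generated by CONFIG REWRITE\n"
        'dir "/path/to/7000/src"\n'
    )
    print(config_for_new_node(sample, 7006), end="")
```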
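With that done, start the two Redis nodes 7006 and 7007. At this point the two new nodes have nothing to do with the cluster. The following command adds 7006 to the cluster as a master:
./redis-trib.rb add-node 127.0.0.1:7006 127.0.0.1:7000
Note: the first argument is the "IP:port" of the new node, and the second is any working node already in the cluster.
If all goes well, the output looks like this: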
>>> Adding node 127.0.0.1:7006 to cluster 127.0.0.1:7000
>>> Performing Cluster Check (using node 127.0.0.1:7000)
M: 0b7e0d5337e87ac7b59bba4c1248e5c9e8d1905e 127.0.0.1:7000
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
M: e0e8dfddd4e9d855090d6efd18e55ea9c0e1f7aa 127.0.0.1:7001
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: be7e9fd3b7d096b037306bc14e1017150fa59d7a 127.0.0.1:7004
   slots: (0 slots) slave
   replicates e0e8dfddd4e9d855090d6efd18e55ea9c0e1f7aa
M: ec964a7c7cd53b986f54318a190c1426fc53a5fa 127.0.0.1:7002
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
S: 88e16f91609c03277f2ee6ce5285932f58c221c1 127.0.0.1:7005
   slots: (0 slots) slave
   replicates ec964a7c7cd53b986f54318a190c1426fc53a5fa
S: 38910c5baafea02c5303505acfd9bd331c608cfc 127.0.0.1:7003
   slots: (0 slots) slave
   replicates 0b7e0d5337e87ac7b59bba4c1248e5c9e8d1905e
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 127.0.0.1:7006 to make it join the cluster.
[OK] New node added correctly.
Run check again to confirm the state:
➜ src ./redis-trib.rb check 127.0.0.1:7000
>>> Performing Cluster Check (using node 127.0.0.1:7000)
M: 0b7e0d5337e87ac7b59bba4c1248e5c9e8d1905e 127.0.0.1:7000
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
M: e0e8dfddd4e9d855090d6efd18e55ea9c0e1f7aa 127.0.0.1:7001
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: be7e9fd3b7d096b037306bc14e1017150fa59d7a 127.0.0.1:7004
   slots: (0 slots) slave
   replicates e0e8dfddd4e9d855090d6efd18e55ea9c0e1f7aa
M: 226d1af3c95bf0798ea9fed86373b89347f889da 127.0.0.1:7006
   slots: (0 slots) master
   0 additional replica(s)
M: ec964a7c7cd53b986f54318a190c1426fc53a5fa 127.0.0.1:7002
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
S: 88e16f91609c03277f2ee6ce5285932f58c221c1 127.0.0.1:7005
   slots: (0 slots) slave
   replicates ec964a7c7cd53b986f54318a190c1426fc53a5fa
S: 38910c5baafea02c5303505acfd9bd331c608cfc 127.0.0.1:7003
   slots: (0 slots) slave
   replicates 0b7e0d5337e87ac7b59bba4c1248e5c9e8d1905e
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
Lines 12-14 (the 7006 entry) show that 7006 is now a new master in the cluster. Next, use the following command to add 7007 as a slave:
./redis-trib.rb add-node --slave --master-id 226d1af3c95bf0798ea9fed86373b89347f889da 127.0.0.1:7007 127.0.0.1:7000
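There are two extra options here: --slave says that the new node should join as a slave, and --master-id xxxxx says whose slave it should become. The xxxxx part is the ID of 7006 taken from the earlier check output. Once that is done, confirm the state again: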
➜ src ./redis-trib.rb check 127.0.0.1:7000
>>> Performing Cluster Check (using node 127.0.0.1:7000)
M: 0b7e0d5337e87ac7b59bba4c1248e5c9e8d1905e 127.0.0.1:7000
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
S: 792bcccf35845c4922dd33d7f9827420ebb89bc9 127.0.0.1:7007
   slots: (0 slots) slave
   replicates 226d1af3c95bf0798ea9fed86373b89347f889da
M: e0e8dfddd4e9d855090d6efd18e55ea9c0e1f7aa 127.0.0.1:7001
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: be7e9fd3b7d096b037306bc14e1017150fa59d7a 127.0.0.1:7004
   slots: (0 slots) slave
   replicates e0e8dfddd4e9d855090d6efd18e55ea9c0e1f7aa
M: 226d1af3c95bf0798ea9fed86373b89347f889da 127.0.0.1:7006
   slots: (0 slots) master
   1 additional replica(s)
M: ec964a7c7cd53b986f54318a190c1426fc53a5fa 127.0.0.1:7002
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
S: 88e16f91609c03277f2ee6ce5285932f58c221c1 127.0.0.1:7005
   slots: (0 slots) slave
   replicates ec964a7c7cd53b986f54318a190c1426fc53a5fa
S: 38910c5baafea02c5303505acfd9bd331c608cfc 127.0.0.1:7003
   slots: (0 slots) slave
   replicates 0b7e0d5337e87ac7b59bba4c1248e5c9e8d1905e
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
Lines 6-8 and 15-17 (the 7007 and 7006 entries) show that 7007 is now a slave of 7006.
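Copying 40-character node IDs out of the check output by hand is error-prone. Here is a small sketch that extracts them, assuming the output has one "M:" or "S:" header line per node as redis-trib prints it in a terminal. parse_nodes is a hypothetical helper, not part of redis-trib.

```python
import re

# Header line of a node entry: role letter, node ID, host:port.
NODE_LINE = re.compile(r"^([MS]): ([0-9a-f]+) (\S+):(\d+)$")

SAMPLE = """\
M: 226d1af3c95bf0798ea9fed86373b89347f889da 127.0.0.1:7006
   slots: (0 slots) master
   1 additional replica(s)
S: 792bcccf35845c4922dd33d7f9827420ebb89bc9 127.0.0.1:7007
   slots: (0 slots) slave
   replicates 226d1af3c95bf0798ea9fed86373b89347f889da
"""


def parse_nodes(check_output: str) -> dict:
    """Map port -> {'role', 'id', 'host'} for every node header found."""
    nodes = {}
    for line in check_output.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:
            role, node_id, host, port = match.groups()
            nodes[int(port)] = {
                "role": "master" if role == "M" else "slave",
                "id": node_id,
                "host": host,
            }
    return nodes


if __name__ == "__main__":
    nodes = parse_nodes(SAMPLE)
    # The ID to pass to --master-id when attaching a slave to 7006:
    print(nodes[7006]["id"])
```

Keying by port assumes all nodes share one host, as they do in this local setup.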
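10. reshard: redistributing slots
Adding new nodes raises a problem: the 16384 slots have already been divided among the other three groups of nodes, so the new node has no slots and cannot store any cache entries. The slots therefore need to be redistributed.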
➜ src ./redis-trib.rb info 127.0.0.1:7000
127.0.0.1:7000 (0b7e0d53...) -> 4 keys | 5461 slots | 1 slaves.
127.0.0.1:7001 (e0e8dfdd...) -> 4 keys | 5462 slots | 1 slaves.
127.0.0.1:7006 (226d1af3...) -> 0 keys | 0 slots | 1 slaves. # 7006 has no slots at all
127.0.0.1:7002 (ec964a7c...) -> 9 keys | 5461 slots | 1 slaves.
[OK] 17 keys in 4 masters.
0.00 keys per slot on average.
The following command reassigns slots:
./redis-trib.rb reshard 127.0.0.1:7000
The IP:port after reshard can be any working node in the cluster.
➜ src ./redis-trib.rb reshard 127.0.0.1:7000
>>> Performing Cluster Check (using node 127.0.0.1:7000)
M: 0b7e0d5337e87ac7b59bba4c1248e5c9e8d1905e 127.0.0.1:7000
   slots:1792-4095 (2304 slots) master
   0 additional replica(s)
...
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 1000 # enter how many slots to move
What is the receiving node ID? 0b7e0d5337e87ac7b59bba4c1248e5c9e8d1905e # enter the ID of the target node
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:all # use every node as a source
...
    Moving slot 4309 from ec964a7c7cd53b986f54318a190c1426fc53a5fa
    Moving slot 4310 from ec964a7c7cd53b986f54318a190c1426fc53a5fa
    Moving slot 4311 from ec964a7c7cd53b986f54318a190c1426fc53a5fa
    Moving slot 4312 from ec964a7c7cd53b986f54318a190c1426fc53a5fa
    Moving slot 4313 from ec964a7c7cd53b986f54318a190c1426fc53a5fa
Do you want to proceed with the proposed reshard plan (yes/no)? yes # confirm and run
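Note: think carefully about the first prompt, the number of slots to move. Entering 16384 moves every slot onto a single node, which makes the distribution even more unbalanced. Moving 500 to 1000 slots at a time is recommended, since that has less impact on a live system.
When entering the source nodes, besides all you can also enter the ID of a specific source node, like this: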
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 300
What is the receiving node ID? 0b7e0d5337e87ac7b59bba4c1248e5c9e8d1905e
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:226d1af3c95bf0798ea9fed86373b89347f889da # enter the ID of the source node
Source node #2:done # done means no more source nodes will be added
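reshard can be run repeatedly until the distribution is what you want. (Note: personally I find resharding in Redis a bit cumbersome here, because the number of slots to move has to be worked out by hand; an option that spreads the 16384 slots evenly and automatically would be welcome. The rebalance command listed in the help output above looks intended for exactly that, though it is not tried in this post.) After the adjustment, look at the distribution again: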
➜ src ./redis-trib.rb info 127.0.0.1:7000
127.0.0.1:7000 (0b7e0d53...) -> 4 keys | 4072 slots | 0 slaves.
127.0.0.1:7001 (e0e8dfdd...) -> 5 keys | 4099 slots | 0 slaves.
127.0.0.1:7006 (226d1af3...) -> 5 keys | 4132 slots | 4 slaves.
127.0.0.1:7002 (ec964a7c...) -> 3 keys | 4081 slots | 0 slaves.
[OK] 17 keys in 4 masters.
0.00 keys per slot on average.
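The manual calculation that reshard asks for can be sketched as follows: given the current slot count per master, work out how many slots an empty master needs to reach an even share and how many each existing master should give up. This is only the arithmetic, under my own assumption of an equal share per master; it does not talk to Redis, and plan_even_reshard is an invented name.

```python
def plan_even_reshard(slot_counts: dict, target: str) -> dict:
    """Return {source_node: slots_to_move} so that `target` ends up
    with an equal share, taking only from nodes above that share."""
    total = sum(slot_counts.values())
    fair_share = total // len(slot_counts)
    remaining = fair_share - slot_counts[target]
    if remaining <= 0:
        return {}
    surplus = {
        node: count - fair_share
        for node, count in slot_counts.items()
        if node != target and count > fair_share
    }
    plan = {}
    # Take from the most overloaded nodes first.
    for node, extra in sorted(surplus.items(), key=lambda kv: -kv[1]):
        take = min(extra, remaining)
        if take > 0:
            plan[node] = take
            remaining -= take
    return plan


if __name__ == "__main__":
    # Slot counts from the info output right after 7006 was added.
    before = {"7000": 5461, "7001": 5462, "7006": 0, "7002": 5461}
    for source, slots in plan_even_reshard(before, "7006").items():
        print(f"move {slots} slots from {source} to 7006")
```

Each entry in the plan corresponds to one reshard run with a single source node, which can be split further into batches of 500 to 1000 slots.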
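11. Removing nodes with del-node
Where there is scaling out there is also the opposite need: nodes that are no longer required can be removed with del-node. For example, after my random tinkering just now, 7006 has ended up with 4 slaves while the other masters have none, which is clearly unreasonable.
The command to remove a node: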
./redis-trib.rb del-node 127.0.0.1:7006 88e16f91609c03277f2ee6ce5285932f58c221c1
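The ip:port after del-node can be any working node in the cluster, and the last argument is the ID of the node to remove (here 88e16f91..., the ID of slave 7005 in the earlier check output). Note: only slave nodes and empty master nodes can be removed. If a master is not empty, first use reshard to move its slots to other nodes and then remove it. If, in a master-slave group, all slots of the master are moved to other nodes and the master is then removed, the remaining slave finds itself a new master and becomes the slave of another master.
Also: removing a node does not only take it out of the cluster, it also stops the Redis service on the target node directly.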
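The removal rules above fit in a few lines. This sketch only encodes the decision described in this section (slaves and empty masters can go straight away, other masters need a reshard first); the function and step names are illustrative and nothing here calls redis-trib.

```python
def removal_steps(role: str, slot_count: int) -> list:
    """List the steps needed before a node can be removed with del-node."""
    if role not in ("master", "slave"):
        raise ValueError(f"unknown role: {role!r}")
    if role == "master" and slot_count > 0:
        # A master that still owns slots must hand them over first.
        return ["reshard its slots to other masters", "del-node"]
    return ["del-node"]


if __name__ == "__main__":
    print(removal_steps("slave", 0))
    print(removal_steps("master", 0))
    print(removal_steps("master", 4132))
```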