MHA(Master High Availability)是一套相對成熟的MySQL高可用方案,能做到在0~30s內自動完成資料庫的故障切換操作,在master伺服器不宕機的情況下,基本能保證數據的一致性。 它由兩部分組成:MHA Manager(管理節點)和MHA Node(數據節點)。其中,MH ...
MHA(Master High Availability)是一套相對成熟的MySQL高可用方案,能做到在0~30s內自動完成資料庫的故障切換操作,在master伺服器不宕機的情況下,基本能保證數據的一致性。
它由兩部分組成:MHA Manager(管理節點)和MHA Node(數據節點)。其中,MHA Manager可以單獨部署在一臺獨立的機器上管理多個master-slave集群,也可以部署在一臺slave上。MHA Node則運行在每個mysql節點上,MHA Manager會定時探測集群中的master節點,當master出現故障時,它自動將最新數據的slave提升為master,然後將其它所有的slave指向新的master。
在MHA自動故障切換過程中,MHA試圖保存master的二進位日誌,從而最大程度地保證數據不丟失,當這並不總是可行的,譬如,主伺服器硬體故障或無法通過ssh訪問,MHA就沒法保存二進位日誌,這樣就只進行了故障轉移但丟失了最新數據。可結合MySQL 5.5中推出的半同步複製來降低數據丟失的風險。
MHA軟體由兩部分組成:Manager工具包和Node工具包,具體說明如下:
MHA Manager:
1. masterha_check_ssh:檢查MHA的SSH配置狀況
2. masterha_check_repl:檢查MySQL的複製狀況
3. masterha_manager:啟動MHA
4. masterha_check_status:檢測當前MHA運行狀態
5. masterha_master_monitor:檢測master是否宕機
6. masterha_master_switch:控制故障轉移(自動或手動)
7. masterha_conf_host:添加或刪除配置的server信息
8. masterha_stop:關閉MHA
MHA Node:
save_binary_logs:保存或複製master的二進位日誌
apply_diff_relay_logs:識別差異的relay log並將差異的event應用到其它slave中
filter_mysqlbinlog:去除不必要的ROLLBACK事件(MHA已不再使用這個工具)
purge_relay_logs:消除中繼日誌(不會堵塞SQL線程)
另有如下幾個腳本需自定義:
1. master_ip_failover:管理VIP
2. master_ip_online_change:
3. masterha_secondary_check:當MHA manager檢測到master不可用時,通過masterha_secondary_check腳本來進一步確認,減低誤切的風險。
4. send_report:當發生故障切換時,可通過send_report腳本發送告警信息。
集群信息
角色 IP地址 ServerID 類型
Master 192.168.244.10 1 寫入
Candicate master 192.168.244.20 2 讀
Slave 192.168.244.30 3 讀
Monitor host 192.168.244.40 監控集群組
註:操作系統均為RHEL 6.7
其中,master對外提供寫服務,備選master提供讀服務,slave也提供相關的讀服務,一旦master宕機,將會把備選master提升為新的master,slave指向新的master
一、在所有節點上安裝MHA node
1. 在MySQL伺服器上安裝MHA node所需的perl模塊(DBD:mysql)
# yum install perl-DBD-MySQL -y
2. 在所有的節點上安裝mha node
下載地址為:https://code.google.com/p/mysql-master-ha/wiki/Downloads?tm=2
由於該網址在國內被牆,相關文件下載後,放到了個人網盤中,http://pan.baidu.com/s/1boS31vT,有需要的童鞋可自行下載。
# tar xvf mha4mysql-node-0.56.tar.gz
# cd mha4mysql-node-0.56
# perl Makefile.PL
Can't locate ExtUtils/MakeMaker.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/Install/Can.pm line 6. BEGIN failed--compilation aborted at inc/Module/Install/Can.pm line 6. Compilation failed in require at inc/Module/Install.pm line 283. Can't locate ExtUtils/MakeMaker.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/Install/Makefile.pm line 4. BEGIN failed--compilation aborted at inc/Module/Install/Makefile.pm line 4. Compilation failed in require at inc/Module/Install.pm line 283. Can't locate ExtUtils/MM_Unix.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/Install/Metadata.pm line 349.View Code
通過報錯可以看出,是相關依賴包沒有安裝。
# yum install perl-ExtUtils-MakeMaker -y
# perl Makefile.PL
*** Module::AutoInstall version 1.03 *** Checking for Perl dependencies... Can't locate CPAN.pm in @INC (@INC contains: inc /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at inc/Module/AutoInstall.pm line 277.
# yum install perl-CPAN -y
# perl Makefile.PL
*** Module::AutoInstall version 1.03 *** Checking for Perl dependencies... [Core Features] - DBI ...loaded. (1.609) - DBD::mysql ...loaded. (4.013) *** Module::AutoInstall configuration finished. Checking if your kit is complete... Looks good Writing Makefile for mha4mysql::nodeView Code
# make
# make install
至此,MHA node節點安裝完畢,會在/usr/local/bin下生成以下腳本文件
# ll /usr/local/bin/ total 44 -r-xr-xr-x 1 root root 16367 Jul 20 07:00 apply_diff_relay_logs -r-xr-xr-x 1 root root 4807 Jul 20 07:00 filter_mysqlbinlog -r-xr-xr-x 1 root root 8261 Jul 20 07:00 purge_relay_logs -r-xr-xr-x 1 root root 7525 Jul 20 07:00 save_binary_logs
二、在Monitor host節點上部署MHA Manager
# tar xvf mha4mysql-manager-0.56.tar.gz
# cd mha4mysql-manager-0.56
# perl Makefile.PL
*** Module::AutoInstall version 1.03 *** Checking for Perl dependencies... [Core Features] - DBI ...loaded. (1.609) - DBD::mysql ...loaded. (4.013) - Time::HiRes ...missing. - Config::Tiny ...missing. - Log::Dispatch ...missing. - Parallel::ForkManager ...missing. - MHA::NodeConst ...missing. ==> Auto-install the 5 mandatory module(s) from CPAN? [y] y *** Dependencies will be installed the next time you type 'make'. *** Module::AutoInstall configuration finished. Checking if your kit is complete... Looks good Warning: prerequisite Config::Tiny 0 not found. Warning: prerequisite Log::Dispatch 0 not found. Warning: prerequisite MHA::NodeConst 0 not found. Warning: prerequisite Parallel::ForkManager 0 not found. Warning: prerequisite Time::HiRes 0 not found. Writing Makefile for mha4mysql::managerView Code
# make
# make install
執行完畢後,會在/usr/local/bin下新增以下幾個文件
# ll /usr/local/bin/ total 40 -r-xr-xr-x 1 root root 1991 Jul 20 00:50 masterha_check_repl -r-xr-xr-x 1 root root 1775 Jul 20 00:50 masterha_check_ssh -r-xr-xr-x 1 root root 1861 Jul 20 00:50 masterha_check_status -r-xr-xr-x 1 root root 3197 Jul 20 00:50 masterha_conf_host -r-xr-xr-x 1 root root 2513 Jul 20 00:50 masterha_manager -r-xr-xr-x 1 root root 2161 Jul 20 00:50 masterha_master_monitor -r-xr-xr-x 1 root root 2369 Jul 20 00:50 masterha_master_switch -r-xr-xr-x 1 root root 5167 Jul 20 00:50 masterha_secondary_check -r-xr-xr-x 1 root root 1735 Jul 20 00:50 masterha_stop
三、配置SSH登錄無密碼驗證
1. 在manager上配置到所有Node節點的無密碼驗證
# ssh-keygen
一路按“Enter”
# ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
# ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
# ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
2. 在Master(192.168.244.10)上配置
# ssh-keygen
# ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
# ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
3. 在Candicate master(192.168.244.20)上配置
# ssh-keygen
# ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
# ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
4. 在Slave(192.168.244.30)上配置
# ssh-keygen
# ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
# ssh-copy-id -i /root/.ssh/id_rsa.pub [email protected]
四、搭建主從複製環境
1. 在Master上執行備份
# mysqldump --master-data=2 --single-transaction -R --triggers -A > all.sql
其中,-R是備份存儲過程,--triggers是備份觸發器 -A代表全庫
2. 在Master上創建複製用戶
mysql> grant replication slave on *.* to 'repl'@'192.168.244.%' identified by 'repl'; Query OK, 0 rows affected (0.09 sec)
3. 查看備份文件all.sql中的CHANGE MASTER語句
# head -n 30 all.sql
-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=120;
4. 將備份文件複製到Candicate master和Slave上
# scp all.sql 192.168.244.20:/root/
# scp all.sql 192.168.244.30:/root/
5. 在Candicate master上搭建從庫
# mysql < all.sql
設置複製信息
mysql> CHANGE MASTER TO -> MASTER_HOST='192.168.244.10', -> MASTER_USER='repl', -> MASTER_PASSWORD='repl', -> MASTER_LOG_FILE='mysql-bin.000002', -> MASTER_LOG_POS=120; Query OK, 0 rows affected, 2 warnings (0.19 sec) mysql> start slave; Query OK, 0 rows affected (0.02 sec) mysql> show slave status\G
6. 在Slave上搭建從庫
7. slave伺服器設置為read only
mysql> set global read_only=1; Query OK, 0 rows affected (0.04 sec)
8. 在Master中創建監控用戶
mysql> grant all privileges on *.* to 'monitor'@'%' identified by 'monitor123'; Query OK, 0 rows affected (0.07 sec)
五、 配置MHA
1. 在Monitor host(192.168.244.40)上創建MHA工作目錄,並且創建相關配置文件
# mkdir -p /etc/masterha
# vim /etc/masterha/app1.cnf
[server default] manager_log=/masterha/app1/manager.log //設置manager的日誌 manager_workdir=/masterha/app1 //設置manager的工作目錄 master_binlog_dir=/var/lib/mysql //設置master預設保存binlog的位置,以便MHA可以找到master的日誌 master_ip_failover_script= /usr/local/bin/master_ip_failover //設置自動failover時候的切換腳本 master_ip_online_change_script= /usr/local/bin/master_ip_online_change //設置手動切換時候的切換腳本 user=monitor // 設置監控用戶 password=monitor123 //設置監控用戶的密碼 ping_interval=1 //設置監控主庫,發送ping包的時間間隔,預設是3秒,嘗試三次沒有回應的時候進行自動failover remote_workdir=/tmp //設置遠端mysql在發生切換時binlog的保存位置 repl_user=repl //設置複製環境中的複製用戶名 repl_password=repl //設置複製用戶的密碼 report_script=/usr/local/bin/send_report //設置發生切換後發送的報警的腳本 secondary_check_script= /usr/local/bin/masterha_secondary_check -s 192.168.244.20 -s 192.168.244.30 --user=root --master_host=192.168.244.10 --master_ip=192.168.244.10 --master_port=3306 //一旦MHA到master的監控之間出現問題,MHA Manager將會判斷其它兩個slave是否能建立到master_ip 3306埠的連接 shutdown_script="" //設置故障發生後關閉故障主機腳本(該腳本的主要作用是關閉主機防止發生腦裂) ssh_user=root //設置ssh的登錄用戶名 [server1] hostname=192.168.244.10 port=3306 [server2] hostname=192.168.244.20 port=3306 candidate_master=1 //設置為候選master,如果設置該參數以後,發生主從切換以後將會將此從庫提升為主庫,即使這個主庫不是集群中最新的slave check_repl_delay=0 //預設情況下如果一個slave落後master 100M的relay logs的話,MHA將不會選擇該slave作為一個新的master,因為對於這個slave的恢復需要花費很長時間,通過設置check_repl_delay=0,MHA觸發切換在選擇一個新的master的時候將會忽略複製延時,這個參數對於設置了candidate_master=1的主機非常有用,因為它保證了這個候選主在切換過程中一定是最新的master [server3] hostname=192.168.244.30 port=3306
註意:
1> 在編輯該文件時,後面的註釋切記要去掉,MHA並不會將後面的內容識別為註釋。
2> 配置文件中設置了master_ip_failover_script和secondary_check_script兩個選項,對應的文件在上面提供的百度雲盤中有。
2. 設置relay log清除方式(在每個Slave上)
mysql> set global relay_log_purge=0; Query OK, 0 rows affected (0.00 sec)
MHA在發生切換過程中,從庫在恢復的過程中,依賴於relay log的相關信息,所以我們這裡要將relay log的自動清楚設置為OFF,採用手動清楚relay log的方式。
在預設情況下,從伺服器上的中繼日誌會在SQL線程執行完後被自動刪除。但是在MHA環境中,這些中繼日誌在恢復其它從伺服器時可能會被用到,因此需要禁用中繼日誌的自動清除。改為定期手動清除SQL線程應用完的中繼日誌。
在ext3文件系統下,刪除大的文件需要一定的時間,這樣會導致嚴重的複製延遲,所以在Linux中,一般都是通過硬鏈接的方式來刪除大文件。
3. 設置定期清理relay腳本
MHA節點中包含了purge_relay_logs腳本,它可以為relay log創建硬鏈接,執行set global relay_log_purge=1,等待幾秒鐘以便SQL線程切換到新的中繼日誌,再執行set global relay_log_purge=0。
下麵看看腳本的使用方法:
# purge_relay_logs --user=monitor --password=monitor123 -disable_relay_log_purge --workdir=/tmp/
2017-04-24 20:27:46: purge_relay_logs script started. Found relay_log.info: /var/lib/mysql/relay-log.info Opening /var/lib/mysql/mysqld-relay-bin.000001 .. Opening /var/lib/mysql/mysqld-relay-bin.000002 .. Opening /var/lib/mysql/mysqld-relay-bin.000003 .. Opening /var/lib/mysql/mysqld-relay-bin.000004 .. Opening /var/lib/mysql/mysqld-relay-bin.000005 .. Opening /var/lib/mysql/mysqld-relay-bin.000006 .. Executing SET GLOBAL relay_log_purge=1; FLUSH LOGS; sleeping a few seconds so that SQL thread can delete older relay log files (if i t keeps up); SET GLOBAL relay_log_purge=0; .. ok.2017-04-24 20:27:50: All relay log purging operations succeeded.
其中,
--user:mysql用戶名
--password:mysql用戶的密碼
--host: mysqlserver地址
--workdir:指定創建relay log的硬鏈接的位置,預設的是/var/tmp。由於系統不同分區創建硬鏈接文件會失敗,故需要指定具體的硬鏈接的位置。
--disable_relay_log_purge:預設情況下,如果relay_log_purge=1,則腳本會直接退出。通過設置這個參數,該腳本會首先將relay_log_purge設置為1,清除掉relay log後,再將該參數設置為0。
設置crontab來定期清理relay log
MHA在切換的過程中會直接調用mysqlbinlog命令,故需要在環境變數中指定mysqlbinlog的具體路徑。
# vim /etc/cron.d/purge_relay_logs
0 4 * * * /usr/local/bin/purge_relay_logs --user=monitor --password=monitor123 -disable_relay_log_purge --workdir=/tmp/ >> /tmp/purge _relay_logs.log 2>&1
註意:最好是每台slave伺服器在不同時間點執行該計劃任務。
4. 將mysqlbinlog的路徑添加到環境變數中
六、 檢查SSH的配置
在Monitor host上執行
# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Wed Jul 20 14:33:36 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Wed Jul 20 14:33:36 2016 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Wed Jul 20 14:33:36 2016 - [info] Reading server configuration from /etc/masterha/app1.cnf.. Wed Jul 20 14:33:36 2016 - [info] Starting SSH connection tests.. Wed Jul 20 14:33:51 2016 - [debug] Wed Jul 20 14:33:36 2016 - [debug] Connecting via SSH from root@192.168.244.10(192.168.244.10:22) to root@192.168.244.20(192.168.244.20:22).. Wed Jul 20 14:33:48 2016 - [debug] ok. Wed Jul 20 14:33:48 2016 - [debug] Connecting via SSH from root@192.168.244.10(192.168.244.10:22) to root@192.168.244.30(192.168.244.30:22).. Wed Jul 20 14:33:50 2016 - [debug] ok. Wed Jul 20 14:33:55 2016 - [debug] Wed Jul 20 14:33:37 2016 - [debug] Connecting via SSH from root@192.168.244.30(192.168.244.30:22) to root@192.168.244.10(192.168.244.10:22).. Wed Jul 20 14:33:49 2016 - [debug] ok. Wed Jul 20 14:33:49 2016 - [debug] Connecting via SSH from root@192.168.244.30(192.168.244.30:22) to root@192.168.244.20(192.168.244.20:22).. Wed Jul 20 14:33:54 2016 - [debug] ok. Wed Jul 20 14:33:55 2016 - [debug] Wed Jul 20 14:33:36 2016 - [debug] Connecting via SSH from root@192.168.244.20(192.168.244.20:22) to root@192.168.244.10(192.168.244.10:22).. Wed Jul 20 14:33:49 2016 - [debug] ok. Wed Jul 20 14:33:49 2016 - [debug] Connecting via SSH from root@192.168.244.20(192.168.244.20:22) to root@192.168.244.30(192.168.244.30:22).. Wed Jul 20 14:33:54 2016 - [debug] ok. Wed Jul 20 14:33:55 2016 - [info] All SSH connection tests passed successfully.View Code
七、查看整個集群的狀態
在Monitor host上執行
# masterha_check_repl --conf=/etc/masterha/app1.cnf
Wed Jul 20 14:44:30 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Wed Jul 20 14:44:30 2016 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Wed Jul 20 14:44:30 2016 - [info] Reading server configuration from /etc/masterha/app1.cnf.. Wed Jul 20 14:44:30 2016 - [info] MHA::MasterMonitor version 0.56. Wed Jul 20 14:44:31 2016 - [info] GTID failover mode = 0 Wed Jul 20 14:44:31 2016 - [info] Dead Servers: Wed Jul 20 14:44:31 2016 - [info] Alive Servers: Wed Jul 20 14:44:31 2016 - [info] 192.168.244.10(192.168.244.10:3306) Wed Jul 20 14:44:31 2016 - [info] 192.168.244.20(192.168.244.20:3306) Wed Jul 20 14:44:31 2016 - [info] 192.168.244.30(192.168.244.30:3306) Wed Jul 20 14:44:31 2016 - [info] Alive Slaves: Wed Jul 20 14:44:31 2016 - [info] 192.168.244.20(192.168.244.20:3306) Version=5.6.31 (oldest major version between slaves) log-bin:disabled Wed Jul 20 14:44:31 2016 - [info] Replicating from 192.168.244.10(192.168.244.10:3306) Wed Jul 20 14:44:31 2016 - [info] Primary candidate for the new Master (candidate_master is set) Wed Jul 20 14:44:31 2016 - [info] 192.168.244.30(192.168.244.30:3306) Version=5.6.31 (oldest major version between slaves) log-bin:disabled Wed Jul 20 14:44:31 2016 - [info] Replicating from 192.168.244.10(192.168.244.10:3306) Wed Jul 20 14:44:31 2016 - [info] Current Alive Master: 192.168.244.10(192.168.244.10:3306) Wed Jul 20 14:44:31 2016 - [info] Checking slave configurations.. Wed Jul 20 14:44:31 2016 - [warning] log-bin is not set on slave 192.168.244.20(192.168.244.20:3306). This host cannot be a master. Wed Jul 20 14:44:31 2016 - [warning] log-bin is not set on slave 192.168.244.30(192.168.244.30:3306). This host cannot be a master. Wed Jul 20 14:44:31 2016 - [info] Checking replication filtering settings.. Wed Jul 20 14:44:31 2016 - [info] binlog_do_db= , binlog_ignore_db= Wed Jul 20 14:44:31 2016 - [info] Replication filtering check ok. Wed Jul 20 14:44:31 2016 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln361] None of slaves can be master. Check failover configuration file or log-bin settings in my.cnf Wed Jul 20 14:44:31 2016 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations. at /usr/local/bin/masterha_check_repl line 48. Wed Jul 20 14:44:31 2016 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers. Wed Jul 20 14:44:31 2016 - [info] Got exit code 1 (Not master dead). MySQL Replication Health is NOT OK!View Code
報錯很明顯,Candicate master和Slave都沒有啟動log-bin,如果沒有啟動的話,後續就無法提升為主
設置log-bin後,重新執行:
Wed Jul 20 15:49:58 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Wed Jul 20 15:49:58 2016 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Wed Jul 20 15:49:58 2016 - [info] Reading server configuration from /etc/masterha/app1.cnf.. Wed Jul 20 15:49:58 2016 - [info] MHA::MasterMonitor version 0.56. Wed Jul 20 15:49:59 2016 - [info] GTID failover mode = 0 Wed Jul 20 15:49:59 2016 - [info] Dead Servers: Wed Jul 20 15:49:59 2016 - [info] Alive Servers: Wed Jul 20 15:49:59 2016 - [info] 192.168.244.10(192.168.244.10:3306) Wed Jul 20 15:49:59 2016 - [info] 192.168.244.20(192.168.244.20:3306) Wed Jul 20 15:49:59 2016 - [info] 192.168.244.30(192.168.244.30:3306) Wed Jul 20 15:49:59 2016 - [info] Alive Slaves: Wed Jul 20 15:49:59 2016 - [info] 192.168.244.20(192.168.244.20:3306) Version=5.6.31-log (oldest major version between slaves) log-bin:enabled Wed Jul 20 15:49:59 2016 - [info] Replicating from 192.168.244.10(192.168.244.10:3306) Wed Jul 20 15:49:59 2016 - [info] Primary candidate for the new Master (candidate_master is set) Wed Jul 20 15:49:59 2016 - [info] 192.168.244.30(192.168.244.30:3306) Version=5.6.31-log (oldest major version between slaves) log-bin:enabled Wed Jul 20 15:49:59 2016 - [info] Replicating from 192.168.244.10(192.168.244.10:3306) Wed Jul 20 15:49: