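Startup
MHA is started with the masterha_manager script (default path after installation: /usr/local/bin/masterha_manager). During startup it actively checks that SSH connectivity between the nodes and the state of master-slave replication are healthy. While running, the manager calls the masterha_master_monitor script (which in turn calls XXX/mha4mysql-manager-0.5?/lib/MHA/MasterMonitor.pm, HealthCheck.pm, and related modules) to probe how each node is doing. The probe interval is set by the ping_interval parameter in the manager configuration file; if the master fails to respond to three consecutive probes, it is judged to be down.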
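The detection rule described above (probe every ping_interval seconds, declare the master dead after three consecutive misses) can be sketched as follows. This is a minimal illustration in Python, not MHA's actual Perl implementation; the function name `wait_until_unreachable`, the `probe` callback, and the default values are assumptions made for the example.

```python
import time
from typing import Callable


def wait_until_unreachable(
    probe: Callable[[], bool],
    ping_interval: float = 3.0,   # illustrative value, stands in for ping_interval
    max_failures: int = 3,        # "three probes with no response" from the text
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Block until `probe` fails `max_failures` times in a row.

    Returns the total number of probes that were sent.
    """
    consecutive_failures = 0
    probes_sent = 0
    while consecutive_failures < max_failures:
        probes_sent += 1
        if probe():
            consecutive_failures = 0  # any success resets the counter
        else:
            consecutive_failures += 1
        # Wait before the next probe unless the master was just declared dead.
        if consecutive_failures < max_failures:
            sleep(ping_interval)
    return probes_sent
```

Because a single successful probe resets the counter, a master that only flaps briefly is not declared dead; only an uninterrupted run of failures triggers the next stage.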
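Selecting the new master
---Check whether the configuration file designates a candidate master with candidate_master=1; if it does, and check_repl_delay=0 is also set, that node is promoted to the new master.
---If no candidate master is designated, compare how much log each slave has received and promote the slave whose data is closest to the master's.
---Otherwise, choose according to the order in which the nodes appear in the configuration file.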
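The three rules above can be read as an ordered decision procedure. The sketch below is one possible reading in Python, not MHA's own code: the `Slave` class, the `received_log_pos` field, and the treatment of configuration order as a tie-breaker are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slave:
    host: str
    received_log_pos: int           # hypothetical measure of how much master log was received
    candidate_master: bool = False  # mirrors candidate_master=1
    check_repl_delay: bool = True   # True mirrors check_repl_delay=1


def pick_new_master(slaves: List[Slave]) -> Optional[Slave]:
    """Apply the three rules in order; `slaves` is in configuration-file order."""
    if not slaves:
        return None
    # Rule 1: a designated candidate with check_repl_delay=0 wins outright.
    for s in slaves:
        if s.candidate_master and not s.check_repl_delay:
            return s
    # Rule 2: otherwise the slave that has received the most log wins.
    # Rule 3: max() returns the first maximal element, so ties fall back
    # to configuration-file order (an assumed interpretation).
    return max(slaves, key=lambda s: s.received_log_pos)
```

For example, with a candidate at position 100 and a non-candidate at position 200, the candidate is still chosen as long as check_repl_delay is off for it; remove the candidate flag and the slave at position 200 wins.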
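Data compensation
---Check SSH connectivity to the old master. If it is reachable, the "save_binary_logs" script sends the missing binlog events to the slaves, where they are applied;
---If the old master cannot be reached, the "apply_diff_relay_logs" script computes the differences between the slaves' relay logs and applies them to the other slaves.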
Role switch
The newly elected master drops its slave role, and the remaining slaves set up replication from the new master.
VIP migration
The virtual IP is bound to the new master.
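Something to think about
What happens if the master recovers while a failover is in progress?
It depends: the failover may continue, or it may be aborted. Below is the log of an aborted failover.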
Sat Jan 20 09:27:28 2024 - [warning] Got timeout on MySQL Ping(SELECT) child process and killed it! at /usr/local/share/perl5/MHA/HealthCheck.pm line 431.
Sat Jan 20 09:27:28 2024 - [info] Executing SSH check script: exit 0
Sat Jan 20 09:27:32 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.171.172.171' (4))
Sat Jan 20 09:27:32 2018 - [warning] Connection failed 2 time(s)..
Sat Jan 20 09:27:34 2024 - [warning] HealthCheck: Got timeout on checking SSH connection to 172.171.172.171! at /usr/local/share/perl5/MHA/HealthCheck.pm line 342.
Sat Jan 20 09:27:35 2024 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.171.172.171' (4))
Sat Jan 20 09:27:35 2024 - [warning] Connection failed 3 time(s)..
Sat Jan 20 09:27:38 2024 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.171.172.171' (4))
Sat Jan 20 09:27:38 2024 - [warning] Connection failed 4 time(s)..
Sat Jan 20 09:27:38 2024 - [warning] Master is not reachable from health checker!
Sat Jan 20 09:27:38 2024 - [warning] Master 172.171.172.171(172.171.172.171:3307) is not reachable!
Sat Jan 20 09:27:38 2024 - [warning] SSH is NOT reachable.
Sat Jan 20 09:27:38 2024 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /data/mhacnf/qqweixinod.cnf again, and trying to connect to all servers to check server status..
Sat Jan 20 09:27:38 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Jan 20 09:27:38 2024 - [info] Reading application default configuration from /data/mhacnf/qqweixinod.cnf..
Sat Jan 20 09:27:38 2024 - [info] Reading server configuration from /data/mhacnf/qqweixinod.cnf..
Sat Jan 20 09:27:39 2024 - [info] GTID failover mode = 1
Sat Jan 20 09:27:39 2024 - [info] Dead Servers:
Sat Jan 20 09:27:39 2024 - [info] 172.171.172.171(172.171.172.171:3307)
Sat Jan 20 09:27:39 2024 - [info] Alive Servers:
Sat Jan 20 09:27:39 2024 - [info] 172.171.172.172(172.171.172.172:3307)
Sat Jan 20 09:27:39 2024 - [info] 172.171.172.173(172.171.172.173:3307)
Sat Jan 20 09:27:39 2024 - [info] Alive Slaves:
Sat Jan 20 09:27:39 2024 - [info] 172.171.172.172(172.171.172.172:3307) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Sat Jan 20 09:27:39 2024 - [info] GTID ON
Sat Jan 20 09:27:39 2024 - [info] Replicating from 172.171.172.171(172.171.172.171:3307)
Sat Jan 20 09:27:39 2024 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Jan 20 09:27:39 2024 - [info] 172.171.172.173(172.171.172.173:3307) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Sat Jan 20 09:27:39 2024 - [info] GTID ON
Sat Jan 20 09:27:39 2024 - [info] Replicating from 172.171.172.171(172.171.172.171:3307)
Sat Jan 20 09:27:39 2024 - [info] Checking slave configurations..
Sat Jan 20 09:27:39 2024 - [info] Checking replication filtering settings..
Sat Jan 20 09:27:39 2024 - [info] Replication filtering check ok.
Sat Jan 20 09:27:39 2024 - [info] Master is down!
Sat Jan 20 09:27:39 2024 - [info] Terminating monitoring script.
Sat Jan 20 09:27:39 2024 - [info] Got exit code 20 (Master dead).
Sat Jan 20 09:27:39 2024 - [info] MHA::MasterFailover version 0.56.
Sat Jan 20 09:27:39 2024 - [info] Starting master failover.
Sat Jan 20 09:27:39 2024 - [info]
Sat Jan 20 09:27:39 2024 - [info] * Phase 1: Configuration Check Phase..
Sat Jan 20 09:27:39 2024 - [info]
Sat Jan 20 09:27:40 2024 - [info] GTID failover mode = 1
Sat Jan 20 09:27:40 2024 - [info] Dead Servers:
Sat Jan 20 09:27:40 2024 - [info] 172.171.172.171(172.171.172.171:3307)
Sat Jan 20 09:27:40 2018 - [info] Checking master reachability via MySQL(double check)...
Sat Jan 20 09:27:40 2018 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln218] The master 172.171.172.171(172.171.172.171:3307) is reachable via MySQL (error=1:Connection Succeeded) ! Stop failover.
Sat Jan 20 09:27:40 2018 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/local/bin/masterha_manager line 65.
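Note: the 3307 in the log is the database port, so don't be surprised by it.
If the master is found to have recovered during the "Checking master reachability via MySQL(double check)" step (or before it), the failover is abandoned. The MHA process itself also exits (it is killed), so masterha_manager has to be restarted manually.
The "Checking master reachability via MySQL(double check)" step is implemented in MasterFailover.pm.
The source code is as follows: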
# quick check that the dead server is really dead
# not double check when ping_type is insert,
# because check_connection_fast_util can return true if insert-check detects I/O failure.
if ( $servers_config[0]->{ping_type} ne $MHA::ManagerConst::PING_TYPE_INSERT ) {
  $log->info("Checking master reachability via MySQL(double check)...");
  if (
    my $rc = MHA::DBHelper::check_connection_fast_util(
      $dead_master->{hostname}, $dead_master->{port},
      $dead_master->{user},     $dead_master->{password}
    )
    )
  {
    $log->error(
      sprintf(
        "The master %s is reachable via MySQL (error=%s) ! Stop failover.",
        $dead_master->get_hostinfo(), $rc
      )
    );
    croak;
  }
  $log->info(" ok.");
}
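To make the control flow of this double check easier to follow, here is a rough Python analogue. It is only a sketch under stated assumptions: the real check opens a MySQL connection through MHA::DBHelper, whereas this version uses a plain TCP connect as a stand-in, and the names `is_reachable`, `double_check_master_is_dead`, and `FailoverAborted` are hypothetical.

```python
import socket


class FailoverAborted(Exception):
    """Raised when the supposedly dead master answers again."""


def is_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unroutable: treat all of these as unreachable.
        return False


def double_check_master_is_dead(host: str, port: int) -> None:
    """Abort the failover if the master turns out to be reachable after all."""
    if is_reachable(host, port):
        raise FailoverAborted(
            f"The master {host}:{port} is reachable. Stop failover."
        )
```

As in the Perl excerpt, a successful connection is the error case here: it means the master came back, so raising an exception plays the role of croak and stops the failover before any slave is promoted.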