概述 在RAC環境下配置OGG,要想實現RAC節點故障時,OGG能自動的failover到正常節點,要保證兩點: 1. OGG的checkpoint,trail,BR文件放置在共用的集群文件系統上,RAC各節點都能訪問到 2. 需要有集群軟體的來監測OGG進程,以及發生故障時,自動在正常節點重啟OG ...
概述
在RAC環境下配置OGG,要想實現RAC節點故障時,OGG能自動的failover到正常節點,要保證兩點:
1. OGG的checkpoint,trail,BR文件放置在共用的集群文件系統上,RAC各節點都能訪問到
2. 需要有集群軟體的來監測OGG進程,以及發生故障時,自動在正常節點重啟OGG(failover)
Oracle Grid Infrastructure Standalone Agents (XAG)搭配Oracle支持的集群文件系統,可以實現OGG的自動failover,本文介紹相關的配置步驟。
組件及版本要求
要想使用XAG實現自動failover,相關軟體的版本必須滿足要求:
至於集群文件系統,Oracle官方文檔給出的建議是ACFS,DBFS和OCFS,我覺得其他集群文件系統,比如Veritas 的集群文件系統應該也可以。
本文示例使用的是ACFS。
測試環境軟體版本
源端資料庫:11.2.0.4 RAC (ASM)
目標端資料庫:12.1.0.2 RAC(ASM)
GoldenGate : 12.2.0.1.1
操作系統:源端和目標端都是Oracle Enterprise Linux 6.5 (64bit)
配置步驟
安裝GI XAG
XAG需要單獨去Oracle官網下載安裝 ,下載位置是:http://www.oracle.com/technetwork/database/database-technologies/clusterware/downloads/index.html
目前的版本是7,文件是xagpack_7b.zip
解壓縮文件,然後用GI的安裝用戶(一般是“grid”),執行xagsetup.sh進行安裝:
[grid@rac1 xag]$ ./xagsetup.sh --install --directory /u01/app/grid/xaghome --all_nodes
Installing Oracle Grid Infrastructure Agents on: rac1
Installing Oracle Grid Infrastructure Agents on: rac2
Done.
在目標端也安裝XAG,方法和源端相同。
源端(11.2)創建ACFS
11.2.0.4 在OEL上如果想用ACFS,必須安裝PSU補丁到11.2.0.4.4以上。補丁過程略過。
使用ACFS的磁碟組的屬性值COMPATIBLE.ASM和COMPATIBLE.ADVM必須設置為11.2 :
使用ASMCMD或ASMCA創建ACFS捲:
創建通用ACFS
此時ACFS還不是CRS管理的,可以使用ASMCMD的volinfo命令或/sbin/acfsutil registry查看ACFS信息
ASMCMD> volinfo -a
Diskgroup Name: DATA
Volume Name: VOLOGG1
Volume Device: /dev/asm/vologg1-426
State: ENABLED
Size (MB): 3072
Resize Unit (MB): 32
Redundancy: UNPROT
Stripe Columns: 4
Stripe Width (K): 128
Usage: ACFS
Mountpath: /u01/app/grid/acfsmounts/data_vol1
[root@rac1 ~]# /sbin/acfsutil registry
Mount Object:
Device: /dev/asm/vologg1-426
Mount Point: /u01/app/grid/acfsmounts/data_vol1
Disk Group: DATA
Volume: VOLOGG1
Options: none
Nodes: all
源端(11.2)將ACFS註冊到CRS
首先從通用ACFS的註冊信息中刪除我們剛纔創建的ACFS的條目
[root@rac1 ~]# /sbin/acfsutil registry -d /u01/app/grid/acfsmounts/data_vol1
acfsutil registry: successfully removed ACFS mount point /u01/app/grid/acfsmounts/data_vol1 from Oracle Registry
然後,用SRVCTL工具進行CRS資源註冊:
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/srvctl add filesystem -d /dev/asm/vologg1-426 -v VOLOGG1 -g DATA -m /u01/app/grid/acfsmounts/data_vol1 -u grid
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/crsctl status resource -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.LISTENER.lsnr
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.asm
ONLINE ONLINE rac1 Started
ONLINE ONLINE rac2 Started
ora.data.vologg1.acfs
OFFLINE OFFLINE rac1
OFFLINE OFFLINE rac2
ora.gsd
OFFLINE OFFLINE rac1
OFFLINE OFFLINE rac2
ora.net1.network
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.ons
ONLINE ONLINE rac1
ONLINE ONLINE rac2
--------------------------------------------------------------------------------
手工啟動資源,(mount ACFS)
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/srvctl start filesystem -d /dev/asm/vologg1-426
[root@rac1 ~]#
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/crsctl status resource -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.LISTENER.lsnr
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.asm
ONLINE ONLINE rac1 Started
ONLINE ONLINE rac2 Started
ora.data.vologg1.acfs
ONLINE ONLINE rac1 mounted on /u01/app /grid/acfsmounts/dat a_vol1
ONLINE ONLINE rac2 mounted on /u01/app/grid/acfsmounts/dat a_vol1
[root@rac1 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_rac1-lv_root 45G 32G 12G 74% /
tmpfs 2.0G 437M 1.6G 23% /dev/shm
/dev/sda1 477M 55M 397M 13% /boot
/dev/asm/vologg1-426 3.0G 83M 3.0G 3% /u01/app/grid/acfsmounts/data_vol1
[root@rac2 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_rac1-lv_root 45G 25G 19G 58% /
tmpfs 2.0G 440M 1.6G 23% /dev/shm
/dev/sda1 477M 55M 397M 13% /boot
/dev/asm/vologg1-426 3.0G 83M 3.0G 3% /u01/app/grid/acfsmounts/data_vol1
目標端(12.1)創建ACFS及註冊
12c創建ACFS和11g的主要區別是,沒有了通用和資料庫home用文件系統的選項,創建後會生成註冊文件系統到CRS的腳本。
運行系統生成的腳本,完成註冊及掛載:
[root@oel65vm11 scripts]# ./acfs_script.sh
ACFS file system /u01/app/grid/acfsmounts/ogg_vol1 is mounted on nodes oel65vm11,oel65vm12
查看資源信息:
[root@oel65vm11 bin]# ./crsctl status resource -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.VOLOGG2.advm
ONLINE ONLINE oel65vm11 STABLE
ONLINE ONLINE oel65vm12 STABLE
ora.DATA.dg
ONLINE ONLINE oel65vm11 STABLE
ONLINE ONLINE oel65vm12 STABLE
ora.LISTENER.lsnr
ONLINE ONLINE oel65vm11 STABLE
ONLINE ONLINE oel65vm12 STABLE
ora.asm
ONLINE ONLINE oel65vm11 Started,STABLE
ONLINE ONLINE oel65vm12 Started,STABLE
ora.data.vologg2.acfs
ONLINE ONLINE oel65vm11 mounted on /u01/app/grid/acfsmounts/ogg_vol1,STABLE
ONLINE ONLINE oel65vm12 mounted on /u01/app/grid/acfsmounts/ogg_vol1,STABLE
ora.net1.network
ONLINE ONLINE oel65vm11 STABLE
ONLINE ONLINE oel65vm12 STABLE
ora.ons
ONLINE ONLINE oel65vm11 STABLE
ONLINE ONLINE oel65vm12 STABLE
註意,所有節點必須關掉SELINUX,否則會出現ACFS無權寫入的錯誤。
安裝Oracle GoldenGate
這個版本的ogg同時支持11g和12c的資料庫,在圖形界面安裝時,用戶可以選擇對應不同資料庫版本的ogg
將OGG安裝在前面創建的ACFS上:
源端的安裝位置:/u01/app/grid/acfsmounts/data_vol1/ogg12
目標端的安裝位置:/u01/app/grid/acfsmounts/ogg_vol1/ogg12
選擇自動啟動Manager進程。
資料庫準備工作
l 變更源端資料庫為歸檔模式,過程省略。
l 源端資料庫增加相關日誌及修改參數:
SQL> ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
Database altered.
SQL> ALTER DATABASE FORCE LOGGING;
Database altered.
SQL> SELECT supplemental_log_data_min, force_logging FROM v$database;
SUPPLEME FORCE_LOGGING
-------- ---------------------------------------
YES YES
SQL> ALTER SYSTEM SWITCH LOGFILE;
System altered.
SQL> alter system set ENABLE_GOLDENGATE_REPLICATION=true;
System altered.
l 在源端和目標端創建OGG資料庫用戶及授權,我的例子里創建的用戶是GGADM。
OGG用戶需要的許可權可以參閱聯機文檔《Installing and Configuring Oracle GoldenGate for Oracle Database 12c (12.2.0.1)》中的章節 4.1.4.1 Oracle 11.2.0.4 or Later Database Privileges,我們這個測試為了方便,授予用戶DBA角色,以及使用特定系統包的授權:
SQL> BEGIN
dbms_goldengate_auth.grant_admin_privilege
2 3 (
grantee => 'GGADM',
privilege_type => 'CAPTURE',
grant_select_privileges => TRUE
);
END;
/ 4 5 6 7 8 9
PL/SQL procedure successfully completed.
源端OGG設置
l 登錄資料庫:
GGSCI (rac1.hthorizontest.com) 1> dblogin userid ggadm password ggadm
Successfully logged into database.
l 註冊集成式抽取
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 2> register extract ext1 database;
2016-04-07 23:44:38 INFO OGG-02003 Extract EXT1 successfully registered with database at SCN 1291634.
l 增加抽取進程
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 3> ADD EXTRACT ext1 INTEGRATED TRANLOG, BEGIN NOW
EXTRACT (Integrated) added.
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 4> ADD EXTTRAIL /u01/app/grid/acfsmounts/data_vol1/ogg12/dirdat/et, EXTRACT ext1
EXTTRAIL added.
l 增加傳送進程
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 5> ADD EXTRACT pump1 EXTTRAILSOURCE /u01/app/grid/acfsmounts/data_vol1/ogg12/dirdat/et
EXTRACT added.
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 6>EDIT PARAMS EXT1
加入下麵內容:
EXTRACT ext1
USERID ggadm, PASSWORD ggadm
TRANLOGOPTIONS INTEGRATED PARAMS (MAX_SGA_SIZE 100)
EXTTRAIL /u01/app/grid/acfsmounts/data_vol1/ogg12/dirdat/et
TABLE test.*;
GGSCI (rac1.hthorizontest.com as ggadm@tdb1) 7>EDIT PARAMS PUMP1
加入下麵內容:
EXTRACT pump1
USERID ggadm, PASSWORD ggadm
RMTHOST 192.168.0.11, MGRPORT 7809
RMTTRAIL /u01/app/grid/acfsmounts/ogg_vol1/ogg12/dirdat/rt
TABLE TEST.*;
然後啟動所有進程。
在11.2.0.4版本,如果實現集成的capture模式,在啟動抽取進程時,會提示需要安裝補丁17030189,主要是因為使用集成的capture,需要修改數據字典表。
但是在安裝了PSU後,有時會導致這個補丁和其他補丁衝突,也可以手工執行prvtlmpg.plb來解決問題。
(EXTRACT Abending With OGG-02912 (Doc ID 2091679.1))
目標端OGG設置
GGSCI (oel65vm11.hthorizon.com) 8> dblogin userid ggadm password ggadm
Successfully logged into database.
GGSCI (oel65vm11.hthorizon.com as ggadm@racdb1) 9>ADD CHECKPOINTTABLE ggadm.checkpointtab
Successfully created checkpoint table ggadm.checkpointtab
GGSCI (oel65vm11.hthorizon.com as ggadm@racdb1) 10> ADD REPLICAT rep1, EXTTRAIL /u01/app/grid/acfsmounts/ogg_vol1/ogg12/dirdat/rt checkpointtable ggadm.checkpointtab
REPLICAT added.
GGSCI (oel65vm11.hthorizon.com as ggadm@racdb1) 11>EDIT PARAMS REP1
加入下麵內容:
REPLICAT rep1
USERID ggadm, PASSWORD ggadm
ASSUMETARGETDEFS
DISCARDFILE /u01/app/grid/acfsmounts/ogg_vol1/ogg12/dirdat/rt, PURGE
MAP TEST.* TARGET TEST.*;
然後啟動進程,測試OGG數據複製是否正常
修改OGG MGR參數
為了讓OGG的Manager進程能夠自動啟動複製進程,需要將下列配置加進Manager的配置文件:
AUTORESTART ER *, RETRIES 5, WAITMINUTES 1, RESETMINUTES 60
AUTOSTART ER *
重啟Manager進程使之生效。
源端和目標端都要修改。
配置源端XAG
l 添加APP VIP(以root身份)
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/appvipcfg create -network=1 -ip=192.168.0.36 -vipname=xag.gg_1-vip.vip -user=oracle
l 允許grid用戶啟動資源(以root身份)
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/crsctl setperm resource xag.gg_1-vip.vip -u user:grid:r-x
l 啟動VIP(以grid身份)
[root@rac1 ~]# su - grid
[grid@rac1 ~]$ /u01/app/11.2.0/grid/bin/crsctl start resource xag.gg_1-vip.vip
CRS-2672: Attempting to start 'xag.gg_1-vip.vip' on 'rac1'
CRS-2676: Start of 'xag.gg_1-vip.vip' on 'rac1' succeeded
l 查看狀態
[grid@rac1 ~]$ crsctl status resource xag.gg_1-vip.vip
NAME=xag.gg_1-vip.vip
TYPE=app.appvip_net1.type
TARGET=ONLINE
STATE=ONLINE on rac1
l 創建OGG對應的CRS資源(以root身份)
[root@rac1 bin]# /u01/app/grid/xaghome/bin/agctl add goldengate gg_1 --gg_home /u01/app/grid/acfsmounts/data_vol1/ogg12 --instance_type source --nodes rac1,rac2 --vip_name xag.gg_1-vip.vip --filesystems ora.data.vologg1.acfs --databases ora.tdb.db --oracle_home /u01/app/oracle/product/11.2.0/dbhome_1 --monitor_extracts ext1,pump1
[root@rac1 ~]# cd /u01/app/grid/xaghome/bin
[root@rac1 bin]# ./agctl status goldengate gg_1
Goldengate instance 'gg_1' is not running
l 授權grid啟動資源
上面的命令執行完畢,會自動創建一個對應ogg的CRS資源,需要授權grid有權管理它:
[root@oel65vm11 bin]# /u01/app/11.2.0/grid/bin/crsctl setperm resource xag.gg_1.goldengate -u user:grid:r-x
配置目標端XAG
過程和源端類似,
l 創建VIP資源:
[root@oel65vm11 ~]# /u01/app/12.1.0/grid/bin/appvipcfg create -network=1 -ip=192.168.0.26 -vipname=xag.gg_1-vip.vip -user=oracle
[root@oel65vm11 ~]# /u01/app/12.1.0/grid/bin/crsctl setperm resource xag.gg_1-vip.vip -u user:grid:r-x
[root@oel65vm11 ~]# /u01/app/12.1.0/grid/bin/crsctl start resource xag.gg_1-vip.vip
CRS-2672: Attempting to start 'xag.gg_1-vip.vip' on 'oel65vm12'
CRS-2676: Start of 'xag.gg_1-vip.vip' on 'oel65vm12' succeeded
[root@oel65vm11 ~]# /u01/app/12.1.0/grid/bin/crsctl relocate resource xag.gg_1-vip.vip -n oel65vm11
CRS-2673: Attempting to stop 'xag.gg_1-vip.vip' on 'oel65vm12'
CRS-2677: Stop of 'xag.gg_1-vip.vip' on 'oel65vm12' succeeded
CRS-2672: Attempting to start 'xag.gg_1-vip.vip' on 'oel65vm11'
CRS-2676: Start of 'xag.gg_1-vip.vip' on 'oel65vm11' succeeded
l 創建ogg 對應的CRS資源
[root@oel65vm11 bin]# /u01/app/grid/xaghome/bin/agctl add goldengate gg_2 --gg_home /u01/app/grid/acfsmounts/ogg_vol1/ogg12 --instance_type target --nodes oel65vm11,oel65vm12 --vip_name xag.gg_1-vip.vip --filesystems ora.data.vologg2.acfs --databases ora.racdb.db --oracle_home /u01/app/oracle/product/12.1.0/dbhome_1 --monitor_replicats rep1
l 授權
[root@oel65vm11 bin]# /u01/app/12.1.0/grid/bin/crsctl setperm resource xag.gg_2.goldengate -u user:grid:r-x
修改PUMP進程
將PUMP進程對應的源端地址修改為我們剛纔創建的VIP
RMTHOST 192.168.0.26, MGRPORT 7809
重啟PUMP進程
啟動CRS OGG資源
進入ggsci命令行,將源端和目標段進程都停掉
l 啟動目標端資源
[grid@oel65vm11 ~]$ cd $ORACLE_BASE
[grid@oel65vm11 grid]$ cd xaghome/bin
[grid@oel65vm11 bin]$ ./agctl start goldengate gg_2 --node oel65vm11
[grid@oel65vm11 bin]$ crsctl status resource xag.gg_2.goldengate
NAME=xag.gg_2.goldengate
TYPE=xag.goldengate.type
TARGET=ONLINE
STATE=ONLINE on oel65vm11
l 啟動源端資源
[grid@rac1 bin]$ cd $ORACLE_BASE
[grid@rac1 grid]$ cd xaghome/bin
[grid@rac1 bin]$ ./agctl start goldengate gg_1 --node rac1
[grid@rac1 bin]$ crsctl status resource xag.gg_1.goldengate
NAME=xag.gg_1.goldengate
TYPE=xag.goldengate.type
TARGET=ONLINE
STATE=ONLINE on rac1
啟動後,進入GGSCI命令行,查看進程狀態,如果進程都自動啟動了,說明配置沒有問題。
切換測試
使用命令測試源端切換:
[grid@rac1 bin]$ ./agctl relocate goldengate gg_1 --node rac2
[grid@rac1 bin]$ crsctl status resource –t
。。。。。。
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
。。。。。。
xag.gg_1-vip.vip
1 ONLINE ONLINE rac2
xag.gg_1.goldengate
1 ONLINE ONLINE rac2
再做一個切斷電源的測試,我們以“關掉電源”的方式關閉目標端的主機oel65vm11
在主機oel65vm12上,可以看到RAC的vip failover到了本節點,ogg的vip和gg_2對應的資源也自動failover到了本節點:
[grid@oel65vm12 ~]$ crsctl status resource -t
。。。。。。
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
。。。。。。
ora.oel65vm11.vip
1 ONLINE INTERMEDIATE oel65vm12 FAILED OVER,STABLE
ora.oel65vm12.vip
1 ONLINE ONLINE oel65vm12 STABLE
ora.racdb.db
1 ONLINE OFFLINE STABLE
2 ONLINE ONLINE oel65vm12 Open,STABLE
ora.scan1.vip
1 ONLINE ONLINE oel65vm12 STABLE
xag.gg_1-vip.vip
1 ONLINE ONLINE oel65vm12 STABLE
xag.gg_2.goldengate
1 ONLINE ONLINE oel65vm12 STABLE
上面只是一個最簡單的例子,沒有考慮各種複雜的情況,例如,同時部署有監控jagent,或者downstream複製等等,所以現實的生產環境往往比這個例子複雜得多。