CRS fails to start after a network cable was replaced: clssnmSendingThread: sending status msg to all nodes
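I shut node 2 down cleanly before a colleague replaced the network cable. When he told me the swap was done, I found that node 2 simply would not come back up. I went through the logs below and a number of forum posts and still could not fix it. I tried rebooting, unplugging and re-plugging the cable, checking whether the storage was healthy, and remounting the storage. One post suggested that the OCR information might have changed, but there was no earlier backup, and I did not look further in that direction.
In the end I still could not resolve it, mainly because my skills are limited: without pinning down the exact problem I did not dare change things at random.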
20xx-12-16 19:01:05.792: [ CSSD][3786819328]clssnmSendingThread: sending join msg to all nodes
20xx-12-16 19:01:05.792: [ CSSD][3786819328]clssnmSendingThread: sent 5 join msgs to all nodes
20xx-12-16 19:01:06.295: [GIPCHALO][3811858176] gipchaLowerProcessNode: no valid interfaces found to node for 7286464 ms, node 0x7fecd0028450 { host 'myrac1', haName 'CSS_myrac-cluster', srcLuid fac66ea4-f1a960af, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [249 : 249], createTime 7037424, sentRegister 1, localMonitor 1, flags 0x4 }
20xx-12-16 19:01:06.303: [ CSSD][3789973248]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
20xx-12-16 19:01:06.420: [ CSSD][3799754496]clssnmvDHBValidateNcopy: node 1, myrac1, has a disk HB, but no network HB, DHB has rcfg 471981092, wrtcnt, 211618800, LATS 7286584, lastSeqNo 211618797, uniqueness 1576485880, timestamp 1576494065/8540734
20xx-12-16 19:01:06.435: [ CSSD][3804591872]clssnmvDHBValidateNcopy: node 1, myrac1, has a disk HB, but no network HB, DHB has rcfg 471981092, wrtcnt, 211618802, LATS 7286594, lastSeqNo 211618799, uniqueness 1576485880, timestamp 1576494066/8541524
20xx-12-16 19:01:07.304: [ CSSD][3789973248]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
20xx-12-16 19:01:07.421: [ CSSD][3799754496]clssnmvDHBValidateNcopy: node 1, myrac1, has a disk HB, but no network HB, DHB has rcfg 471981092, wrtcnt, 211618803, LATS 7287584, lastSeqNo 211618800, uniqueness 1576485880, timestamp 1576494066/8541734
20xx-12-16 19:01:07.435: [ CSSD][3804591872]clssnmvDHBValidateNcopy: node 1, myrac1, has a disk HB, but no network HB, DHB has rcfg 471981092, wrtcnt, 211618805, LATS 7287604, lastSeqNo 211618802, uniqueness 1576485880, timestamp 1576494067/8542524
20xx-12-16 19:01:08.304: [ CSSD][3789973248]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
20xx-12-16 19:01:08.422: [ CSSD][3799754496]clssnmvDHBValidateNcopy: node 1, myrac1, has a disk HB, but no network HB, DHB has rcfg 471981092, wrtcnt, 211618806, LATS 7288584, lastSeqNo 211618803, uniqueness 1576485880, timestamp 1576494067/8542734
20xx-12-16 19:01:08.436: [ CSSD][3804591872]clssnmvDHBValidateNcopy: node 1, myrac1, has a disk HB, but no network HB, DHB has rcfg 471981092, wrtcnt, 211618808, LATS 7288604, lastSeqNo 211618805, uniqueness 1576485880, timestamp 1576494068/8543524
20xx-12-16 19:01:09.304: [ CSSD][3789973248]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
20xx-12-16 19:01:09.422: [ CSSD][3799754496]clssnmvDHBValidateNcopy: node 1, myrac1, has a disk HB, but no network HB, DHB has rcfg 471981092, wrtcnt, 211618809, LATS 7289584, lastSeqNo 211618806, uniqueness 1576485880, timestamp 1576494068/8543744
20xx-12-16 19:01:09.437: [ CSSD][3804591872]clssnmvDHBValidateNcopy: node 1, myrac1, has a disk HB, but no network HB, DHB has rcfg 471981092, wrtcnt, 211618811, LATS 7289604, lastSeqNo 211618808, uniqueness 1576485880, timestamp 1576494069/8544524
20xx-12-16 19:01:09.803: [ CSSD][3785242368]clssnmRcfgMgrThread: Local Join
20xx-12-16 19:01:09.803: [ CSSD][3785242368]clssnmLocalJoinEvent: begin on node(2), waittime 193000
20xx-12-16 19:01:09.803: [ CSSD][3785242368]clssnmLocalJoinEvent: set curtime (7289964) for my node
20xx-12-16 19:01:09.803: [ CSSD][3785242368]clssnmLocalJoinEvent: scanning 32 nodes
20xx-12-16 19:01:09.803: [ CSSD][3785242368]clssnmLocalJoinEvent: Node myrac1, number 1, is in an existing cluster with disk state 3
20xx-12-16 19:01:09.803: [ CSSD][3785242368]clssnmLocalJoinEvent: takeover aborted due to cluster member node found on disk
20xx-12-16 19:01:10.305: [ CSSD][3789973248]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
20xx-12-16 19:01:10.423: [ CSSD][3799754496]clssnmvDHBValidateNcopy: node 1, myrac1, has a disk HB, but no network HB, DHB has rcfg 471981092, wrtcnt, 211618812, LATS 7290584, lastSeqNo 211618809, uniqueness 1576485880, timestamp 1576494069/8544744
20xx-12-16 19:01:10.437: [ CSSD][3804591872]clssnmvDHBValidateNcopy: node 1, myrac1, has a disk HB, but no network HB, DHB has rcfg 471981092, wrtcnt, 211618814, LATS 7290604, lastSeqNo 211618811, uniqueness 1576485880, timestamp 1576494070/8545524
20xx-12-16 19:01:10.794: [ CSSD][3786819328]clssnmSendingThread: sending join msg to all nodes
20xx-12-16 19:01:10.794: [ CSSD][3786819328]clssnmSendingThread: sent 5 join msgs to all nodes
20xx-12-16 20:36:02.919: [ CSSD][2756265728]clssgmUpdateGrpData: grock(CLSN.ONSNETPROC.MASTER), commissioner(-1/0)
20xx-12-16 20:36:02.919: [ CSSD][2756265728]clssgmHandleGrockRcfgUpdate: grock(CLSN.ONSNETPROC.MASTER), updateseq(118), status(0), sendresp(1)
20xx-12-16 20:36:02.920: [ CSSD][2756265728]clssgmTestSetLastGrockUpdate: grock(CLSN.ONSNETPROC.MASTER), updateseq(118) msgseq(119), lastupdt<0x7fbb58031e10>, ignoreseq(0)
20xx-12-16 20:36:02.920: [ CSSD][2756265728]clssgmGrockOpTagProcess: Request to commission member(1) using key(1) for grock(CLSN.ONSNETPROC.MASTER)
20xx-12-16 20:36:02.920: [ CSSD][2756265728]clssgmUpdateGrpData: grock(CLSN.ONSNETPROC.MASTER), commissioner(1/1)
20xx-12-16 20:36:02.920: [ CSSD][2756265728]clssgmHandleGrockRcfgUpdate: grock(CLSN.ONSNETPROC.MASTER), updateseq(119), status(0), sendresp(1)
20xx-12-16 20:36:02.921: [ CSSD][2756265728]clssgmTestSetLastGrockUpdate: grock(CLSN.ONSNETPROC.MASTER), updateseq(119) msgseq(120), lastupdt<0x7fbb5804d490>, ignoreseq(0)
20xx-12-16 20:36:02.921: [ CSSD][2756265728]clssgmUpdateGrpData: grock(CLSN.ONSNETPROC.MASTER), private data(2052), incarn(40)
20xx-12-16 20:36:02.921: [ CSSD][2756265728]clssgmHandleGrockRcfgUpdate: grock(CLSN.ONSNETPROC.MASTER), updateseq(120), status(0), sendresp(1)
20xx-12-16 20:36:02.922: [ CSSD][2756265728]clssgmTestSetLastGrockUpdate: grock(CLSN.ONSNETPROC.MASTER), updateseq(120) msgseq(121), lastupdt<0x7fbb5803dee0>, ignoreseq(0)
20xx-12-16 20:36:02.922: [ CSSD][2756265728]clssgmGrockOpTagProcess: Request to commission member(-1) using key(1) for grock(CLSN.ONSNETPROC.MASTER)
20xx-12-16 20:36:02.922: [ CSSD][2756265728]clssgmUpdateGrpData: grock(CLSN.ONSNETPROC.MASTER), commissioner(-1/0)
20xx-12-16 20:36:02.922: [ CSSD][2756265728]clssgmHandleGrockRcfgUpdate: grock(CLSN.ONSNETPROC.MASTER), updateseq(121), status(0), sendresp(1)
20xx-12-16 20:36:05.064: [ CSSD][2753111808]clssnmSendingThread: sending status msg to all nodes
20xx-12-16 20:36:05.064: [ CSSD][2753111808]clssnmSendingThread: sent 5 status msgs to all nodes
20xx-12-16 20:36:09.065: [ CSSD][2753111808]clssnmSendingThread: sending status msg to all nodes
20xx-12-16 20:36:09.065: [ CSSD][2753111808]clssnmSendingThread: sent 4 status msgs to all nodes
20xx-12-16 20:36:14.066: [ CSSD][2753111808]clssnmSendingThread: sending status msg to all nodes
...
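Can you tell from the logs that the bond configuration had changed? I did not notice it at the time and could not work it out from the logs. In the end my colleague said he had changed the bond! Wasn't the plan just to swap one cable and tidy up the cabling? I told him to change it back and try, and sure enough, after a restart everything was normal.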
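The two messages that repeat in the excerpt, "has a disk HB, but no network HB" and "no valid interfaces found to node", both say that node 2 could still see node 1 on the shared disk but could not reach it over the network. They do not name the bond, but they do narrow the search to the interconnect. Below is a minimal Python sketch that counts these messages per peer host. The function name, the dictionary keys and the sample list are made up for illustration, and the sample lines are abridged copies of the lines quoted in this post.

```python
import re

# Both patterns are copied from messages quoted in the ocssd.log excerpt in this post.
NO_NETWORK_HB = re.compile(
    r"clssnmvDHBValidateNcopy: node (\d+), ([^,\s]+), has a disk HB, but no network HB"
)
NO_VALID_INTERFACE = re.compile(
    r"gipchaLowerProcessNode: no valid interfaces found to node for (\d+) ms.*?host '([^']+)'"
)


def _entry(summary, host):
    """Fetch (or create) the per-host counters. Hypothetical helper."""
    return summary.setdefault(host, {"no_network_hb": 0, "max_no_interface_ms": 0})


def summarize_interconnect_symptoms(lines):
    """Return {peer_host: {"no_network_hb": count, "max_no_interface_ms": ms}}."""
    summary = {}
    for line in lines:
        hb = NO_NETWORK_HB.search(line)
        if hb:
            _entry(summary, hb.group(2))["no_network_hb"] += 1
            continue
        iface = NO_VALID_INTERFACE.search(line)
        if iface:
            entry = _entry(summary, iface.group(2))
            entry["max_no_interface_ms"] = max(
                entry["max_no_interface_ms"], int(iface.group(1))
            )
    return summary


# Abridged copies of three lines from the excerpt above (trailing fields cut).
SAMPLE_LINES = [
    "20xx-12-16 19:01:06.295: [GIPCHALO][3811858176] gipchaLowerProcessNode: "
    "no valid interfaces found to node for 7286464 ms, node 0x7fecd0028450 "
    "{ host 'myrac1', haName 'CSS_myrac-cluster' }",
    "20xx-12-16 19:01:06.420: [ CSSD][3799754496]clssnmvDHBValidateNcopy: "
    "node 1, myrac1, has a disk HB, but no network HB, DHB has rcfg 471981092",
    "20xx-12-16 19:01:06.435: [ CSSD][3804591872]clssnmvDHBValidateNcopy: "
    "node 1, myrac1, has a disk HB, but no network HB, DHB has rcfg 471981092",
]

if __name__ == "__main__":
    for host, counters in summarize_interconnect_symptoms(SAMPLE_LINES).items():
        minutes = counters["max_no_interface_ms"] / 60000
        print(host, counters, f"(no usable interface for about {minutes:.0f} min)")
```

If a peer shows up here with a non-zero count, the next thing to compare is probably the private network setup on the two nodes (cabling, bond members, IP addresses) rather than OCR or storage.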
I had restarted things blindly, and it did not come up:
[root@myrac2 bin]# ./crsctl query crs activeversion
Oracle Cluster Registry initialization failed accessing Oracle Cluster Registry device: PROC-26: Error while accessing the physical storage
ORA-15077: could not locate ASM instance serving a required diskgroup
[root@myrac2 bin]# ./ocrcheck
PROT-602: Failed to retrieve data from the cluster registry
PROC-26: Error while accessing the physical storage
ORA-15077: could not locate ASM instance serving a required diskgroup
[grid@myrac2 ~]$ cd /u01/app/11.2.0/grid/bin/
[grid@myrac2 bin]$ srvctl start nodeapps -n myrac2
PRCR-1070 : Failed to check if resource ora.gsd is registered
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.net1.network is registered
Cannot communicate with crsd
PRCR-1035 : Failed to look up CRS resource myrac2 for ora.cluster_vip.type
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.ons is registered
Cannot communicate with crsd
[grid@myrac2 bin]$ srvctl start asm -n myrac2
PRCR-1070 : Failed to check if resource ora.asm is registered
Cannot communicate with crsd
[grid@myrac2 bin]$ srvctl start database -d testdb2
PRCD-1027 : Failed to retrieve database testdb2
PRCR-1115 : Failed to find entities of type resource that match filters ((NAME == ora.testdb2.db) && (TYPE == ora.database.type)) and contain attributes VERSION,ORACLE_HOME,DATABASE_TYPE
Cannot communicate with crsd
[grid@myrac2 bin]$
The bond configuration that was modified on node 2 is clearly different from node 1:
[root@myrac2 11.2.0]# service network status
Configured devices:
lo bond0 bond1 em1 em2 em3 em4
Currently active devices:
lo em1 em2 em3 em4 bond0 bond1
[root@myrac2 11.2.0]#
Node 1:
[root@myrac1 ~]# service network status
Configured devices:
lo bond0 em1 em2 em3 em4 idrac
Currently active devices:
lo em1 em2 em3 bond0
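The difference between the two `service network status` outputs is easy to miss by eye. As a rough sketch, the device lists can be parsed and compared as sets. The function names below are hypothetical, and the two text constants are copied from the outputs shown in this post.

```python
def parse_network_status(text):
    """Parse `service network status` output into configured/active device sets."""
    devices = {"configured": set(), "active": set()}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Configured devices:"):
            section = "configured"
        elif line.startswith("Currently active devices:"):
            section = "active"
        elif line and section and not line.startswith("["):
            # Device names are whitespace-separated; shell prompt lines are skipped.
            devices[section].update(line.split())
    return devices


def diff_devices(first, second):
    """For each section, return (only_in_first, only_in_second) as sorted lists."""
    return {
        key: (sorted(first[key] - second[key]), sorted(second[key] - first[key]))
        for key in ("configured", "active")
    }


# Outputs copied from the post.
NODE1_STATUS = """Configured devices:
lo bond0 em1 em2 em3 em4 idrac
Currently active devices:
lo em1 em2 em3 bond0"""

NODE2_STATUS = """Configured devices:
lo bond0 bond1 em1 em2 em3 em4
Currently active devices:
lo em1 em2 em3 em4 bond0 bond1"""

if __name__ == "__main__":
    diff = diff_devices(
        parse_network_status(NODE1_STATUS), parse_network_status(NODE2_STATUS)
    )
    for section, (only_node1, only_node2) in diff.items():
        print(f"{section}: only on node 1 = {only_node1}, only on node 2 = {only_node2}")
```

On the outputs above, this flags `bond1` as configured and active on node 2 only, which matches the change my colleague made.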
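Technical skill aside, this incident alone shows that cooperation between colleagues is sometimes more important. If you are not careful, you end up digging a pit for someone else or falling into one that someone else dug for you.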