The controlfile header block returned by the OS has a sequence number that is too old. The controlfile might be corrupted. [VMware/CentOS]

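Environment: Oracle 11.2, CentOS 6.6, VMware

Cause: poor disk I/O performance; VMware virtual machine backups were consuming a large share of the disk I/O throughput.

Symptom: the last entries in the alert log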

Fri Feb 03 17:38:57 2017
********************* ATTENTION: ********************
The controlfile header block returned by the OS
has a sequence number that is too old.
The controlfile might be corrupted.
PLEASE DO NOT ATTEMPT TO START UP THE INSTANCE
without following the steps below.
RE-STARTING THE INSTANCE CAN CAUSE SERIOUS DAMAGE
TO THE DATABASE, if the controlfile is truly corrupted.
In order to re-start the instance safely,
please do the following:
(1) Save all copies of the controlfile for later
     analysis and contact your OS vendor and Oracle support.
(2) Mount the instance and issue:
     ALTER DATABASE BACKUP CONTROLFILE TO TRACE;
(3) Unmount the instance.
(4) Use the script in the trace file to
     RE-CREATE THE CONTROLFILE and open the database.
*****************************************************
USER (ospid: 30341): terminating the instance
Fri Feb 03 17:38:58 2017
System state dump requested by (instance=1, osid=30341 (PR00)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/wmsdst/WMSDI/trace/WMSDI_diag_30193_20170203173858.trc
Dumping diagnostic data in directory=[cdmp_20170203173858], requested by (instance=1, osid=30341 (PR00)), summary=[abnormal instance termination].
Instance terminated by USER, pid = 30341
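
When an alert log is large, it can help to locate this banner automatically. The following is a minimal Python sketch, assuming a plain-text alert log whose timestamp lines use the format shown above and an English/C locale for parsing day and month names. The function name find_stale_controlfile_events is hypothetical and not part of any Oracle tool.

```python
import re
from datetime import datetime

# Timestamp lines in the log above look like "Fri Feb 03 17:38:57 2017".
TIMESTAMP_RE = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2} \d{4}$"
)
# The second line of the warning banner, copied from the log above.
MARKER = "has a sequence number that is too old."


def find_stale_controlfile_events(lines):
    """Return the timestamp that precedes each 'too old' banner (hypothetical helper)."""
    events = []
    last_timestamp = None
    for raw in lines:
        line = raw.strip()
        if TIMESTAMP_RE.match(line):
            last_timestamp = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
        elif line == MARKER:
            events.append(last_timestamp)
    return events
```

A typical call would pass an open file object, for example find_stale_controlfile_events(open(path_to_alert_log)), where path_to_alert_log is whatever location the instance writes its text alert log to.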

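Resolution: identify and eliminate whatever is degrading disk performance, then look up and follow the latest solution on My Oracle Support (MOS).

Quick fix: first back up every control file currently in use, because we do not know which copy is the most recent.

Then run startup mount; to see which control file has the newest sequence number, and copy it over the other control files that are out of sync.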
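
The backup step can be scripted so that no copy is overwritten by accident. The following is a minimal Python sketch, assuming the instance is down and the control file paths are already known (for example from the control_files parameter); the function name backup_controlfiles and the backup file naming are hypothetical. The checksums only show which copies are byte-identical; they do not tell you which copy carries the newest sequence number.

```python
import hashlib
import shutil
import time
from pathlib import Path


def backup_controlfiles(controlfiles, backup_dir):
    """Copy each control file into backup_dir.

    Returns {original path: (backup path, sha256 hex digest)}.
    """
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d%H%M%S")
    result = {}
    for index, src in enumerate(map(Path, controlfiles), start=1):
        # Prefix with an index so copies that share a file name do not collide.
        dst = backup_dir / f"{index:02d}_{src.name}.{stamp}.bak"
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        digest = hashlib.sha256(dst.read_bytes()).hexdigest()
        result[str(src)] = (str(dst), digest)
    return result
```

If two copies report different digests, they have diverged, and the originals should stay untouched until the newest one has been identified.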

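As for the solution given on MOS under BUG 14281768 - CONTROL FILE GETS CORRUPTED, avoid it where possible; use it only as a temporary measure when the cause of the poor disk I/O performance cannot be found within the time available.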

This error is typically raised when the control file is overwritten by an older copy of itself, most likely because of a storage or I/O error.
All copies of the control file must have the same internal sequence number for Oracle to start up the database or shut it down in normal or immediate mode.

To enable a sanity check going forward, set the following parameter:

SQL> alter system set "_controlfile_update_check"='HIGH' scope=spfile; -- then bounce the database.

Please check with your OS System/Storage admin regarding the issue.

The precaution is to relocate the control file to a fast disk with direct I/O enabled; the main goal is to prevent the OS from writing an old (cached) copy of the control file to it.
To reverse the parameter setting:

SQL> alter system set "_controlfile_update_check"='OFF' scope=spfile; -- then bounce the database.

More scenarios that can trigger the problem:

Oracle Databases on VMware Best Practices Guide, page 54

12. Backup and Recovery
12.1 Oracle Backup and Recovery Overview
The main purpose of a database backup and recovery strategy is to protect the database against data loss and reconstruct the database after data loss. Typical backup tasks performed by an Oracle DBA would include setting up the database environment for backup and recovery, setting up a backup schedule, monitoring the backup and recovery environment, and troubleshooting backup problems.
A backup can be either a physical or a logical backup. Physical backups are physical copies of the database files which include data files, control files, and archive log files. Logical backups contain a logical copy of the data, such as tables, indexes, procedures, functions, and so on. You can use Oracle Data Pump to export logical data to binary files, which you can later import into the database.
There are several levels at which Oracle database backups can be triggered within the VMware environment:
* In guest Oracle backup using Oracle Recovery Manager (RMAN)
* VMware level backup using VMware vSphere Data Protection™ / VMware vSphere Data Protection Advanced
* Storage based backup tools
* vSphere Virtual Volumes using vSphere 6.0
vSphere recommends either using Oracle Recovery Manager (RMAN), storage-based backup tools, or vSphere Virtual Volumes using vSphere 6.0.
12.2 Oracle Recovery Manager (RMAN)
For implementing an effective Oracle database backup and recovery strategy, Oracle Recovery Manager (RMAN) is typically the preferred Oracle solution.
RMAN provides a common interface for backup tasks across different host operating systems, and offers several backup techniques not available through user-managed methods.
The method of deploying and using RMAN to back up an Oracle database does not change when virtualizing an Oracle database. It is the same across both physical and virtualized environments.
For more information on Oracle Recovery Manager, see the Oracle documentation at https://docs.oracle.com/database/121/BRADV/toc.htm.
12.3 vSphere Data Protection
Any virtual machine VMDK can be backed up with VMware snapshot technology as long as it is not set to Independent-Persistent mode.
A virtual machine housing an Oracle database has two types of VMDKs—guest OS VMDK and the VMDKs housing the Oracle data files.
VMware does not recommend that you back up a highly transactional, heavy I/O-centric Oracle database using VMware snapshot technology because, during the snapshot removal (consolidation), there is a brief stun moment. No activity is permitted against the virtual machine, which might result in performance issues and service disruptions.
For more information, see A snapshot removal can stop a virtual machine for long time (http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002836).
You can, however, back up Oracle non-production databases (development, test, QA, pre-production, and so on) using VMware snapshot technology.

Source: http://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/solutions/vmware-oracle-databases-on-vmware-best-practices-guide.pdf

Backing up a virtual machine fails when Changed Block Tracking (CBT) is enabled (2119254)

Symptoms

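Disclaimer: this article is a translation of Backing up the virtual machine fails when CBT is enabled (2114076). Although we continually strive to provide the best translation, localized content may be out of date. For the latest content, see the English version.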

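Backing up a virtual machine with Changed Block Tracking (CBT) enabled fails after the host is upgraded to VMware ESXi 6.0.x.
Backing up a virtual machine with Changed Block Tracking (CBT) enabled fails after VMware ESXi 6.0.x is installed on the host.
Powering on the virtual machine fails.
Expanding the size of a virtual disk fails.
Taking a quiesced snapshot of the virtual machine fails.
The vSphere Client displays an error similar to: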

An error occurred while taking a snapshot: msg.snapshot.error-QUIESCINGERROR

Note: this error may or may not appear.

In the /var/log/vmkernel.log file on the ESXi host running the affected virtual machine, you see errors similar to:

<YYYY-MM-DD>T<TIME>.623Z cpu5:809536)WARNING: CBT: 191: No memory available! Called from 0x4180219af50e
<YYYY-MM-DD>T<TIME>.637Z cpu5:809536)WARNING: CBT: 191: No memory available! Called from 0x4180219af50e
<YYYY-MM-DD>T<TIME>.648Z cpu5:809536)WARNING: CBT: 191: No memory available! Called from 0x4180219af50e

In the vmware.log file of the affected virtual machine, you see entries similar to:

vcpu-0| I120: DISKLIB-CBT : Creating cbt node 92b78c-cbt failed with error Cannot allocate memory (0xbad0014, Out of memory)
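
To check quickly whether a host or virtual machine is hitting this condition, the two log signatures above can be counted with a short script. This is a minimal Python sketch; the regular expressions are derived only from the sample messages quoted here, so other ESXi builds may word them differently, and the function name count_cbt_heap_errors is hypothetical.

```python
import re

# Patterns derived from the sample vmkernel.log and vmware.log lines above.
VMKERNEL_RE = re.compile(r"WARNING: CBT: \d+: No memory available!")
VMWARE_LOG_RE = re.compile(
    r"DISKLIB-CBT\s*: Creating cbt node \S+ failed with error Cannot allocate memory"
)


def count_cbt_heap_errors(lines):
    """Count lines that match either CBT out-of-memory signature."""
    counts = {"vmkernel": 0, "vmware_log": 0}
    for line in lines:
        if VMKERNEL_RE.search(line):
            counts["vmkernel"] += 1
        elif VMWARE_LOG_RE.search(line):
            counts["vmware_log"] += 1
    return counts
```

Non-zero counts around the time of a failed backup point toward the CBT heap exhaustion described in this article rather than toward the backup software itself.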

Purpose

To resolve the issue of virtual machine backups failing after CBT is enabled, upgrade to ESXi 6.0 Build 2715440.

Cause

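This issue occurs because of heap exhaustion. It can occur when attempting to enable Changed Block Tracking (CBT). If a large number of virtual disks in virtual machines reaches the upper threshold, enabling CBT fails because the heap is exhausted. The issue also occurs when multiple virtual machines have CBT enabled. For Windows virtual machines with VSS enabled, taking a quiesced snapshot doubles the memory overhead. Finally, if the heap is close to exhaustion, performing a vMotion can also trigger the issue, because that process also involves taking a snapshot.

Note: the virtual disks can be spread across virtual machines or belong to a single virtual machine.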

Resolution

This issue is resolved in VMware ESXi 6.0 Build 2715440, available at VMware Downloads. For more information, see VMware ESXi 6.0, Patch Release ESXi600-201505001 (2116125).
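
Whether a given host already carries the fix can be checked by comparing its build number with 2715440. This is a minimal Python sketch, assuming a version string of the form "VMware ESXi 6.0.0 build-NNNNNNN" (the form printed by vmware -v on an ESXi host); the function name has_cbt_fix is hypothetical, and hosts that are not ESXi 6.0.x are reported as None because this article does not cover them.

```python
import re

FIXED_BUILD = 2715440  # ESXi 6.0 build named in the Resolution section
VERSION_RE = re.compile(r"ESXi (\d+\.\d+)\.\d+ build-(\d+)")


def has_cbt_fix(version_string):
    """Return True/False for ESXi 6.0.x hosts, None for anything else."""
    match = VERSION_RE.search(version_string)
    if match is None or match.group(1) != "6.0":
        return None
    return int(match.group(2)) >= FIXED_BUILD
```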

Additional Information

For details on enabling and disabling CBT, see Enabling Changed Block Tracking (CBT) on virtual machines (2078214).

Source:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2119254

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2114076