Contact: mobile/WeChat (+86 17813235971), QQ (107644445)
Author: 惜分飞 © All rights reserved [Reproduction in any form without my consent is prohibited; I reserve the right to pursue legal action.]
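Several nodes of a RAC database were all in the open state and queries worked normally, but application inserts occasionally failed with errors such as ORA-01187: cannot read from file because it failed verification tests.
The fault was originally triggered by a bad disk. After the disk was replaced, the database instances on two nodes crashed: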
Mon Aug 19 21:16:47 2024
Read of datafile '+DATA/xifenfei99.dbf' (fno 1399) header failed with ORA-01207
Rereading datafile 1399 header failed with ORA-01207
Errors in file /u01/app/oracle/diag/rdbms/xff/xff5/trace/xff5_ckpt_75779.trc:
ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
ORA-01207: file is more recent than control file - old control file
Errors in file /u01/app/oracle/diag/rdbms/xff/xff5/trace/xff5_ckpt_75779.trc:
ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
ORA-01207: file is more recent than control file - old control file
CKPT (ospid: 75779): terminating the instance due to error 1242
Mon Aug 19 21:16:47 2024
System state dump requested by (instance=5, osid=75779 (CKPT)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xff/xff5/trace/xff5_diag_75725.trc
Mon Aug 19 21:16:52 2024
ORA-1092 : opitsk aborting process
Mon Aug 19 21:16:53 2024
ORA-1092 : opitsk aborting process
Mon Aug 19 21:16:53 2024
License high water mark = 131
Termination issued to instance processes. Waiting for the processes to exit
Mon Aug 19 21:17:02 2024
Instance termination failed to kill one or more processes
Instance terminated by CKPT, pid = 75779
Mon Aug 19 21:17:03 2024
USER (ospid: 33495): terminating the instance
Termination issued to instance processes. Waiting for the processes to exit
Mon Aug 19 21:17:13 2024
Instance termination failed to kill one or more processes
Instance terminated by USER, pid = 33495
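However, the instances were then started manually without problems, and a query showed every data file in ONLINE status.
Even so, some of the data-loading processes were very slow, with many sessions waiting on enq: HW - contention.
The alert logs on all database nodes occasionally reported ORA-01186: file 1399 failed verification tests and related errors: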
Tue Aug 20 21:30:02 2024
Read of datafile '+DATA/xifenfei99.dbf' (fno 1399) header failed with ORA-01207
Rereading datafile 1399 header failed with ORA-01207
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_dbw0_43828.trc:
ORA-01186: file 1399 failed verification tests
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
ORA-01207: file is more recent than control file - old control file
File 1399 not verified due to error ORA-01122
Read of datafile '+DATA/xifenfei99.dbf' (fno 1399) header failed with ORA-01207
Rereading datafile 1399 header failed with ORA-01207
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_dbw0_43828.trc:
ORA-01186: file 1399 failed verification tests
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
ORA-01207: file is more recent than control file - old control file
File 1399 not verified due to error ORA-01122
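To check which data files are affected and how often, the alert log of each node can be scanned for the header-read failure message. The following is only a minimal sketch: the function name `failed_headers` and the regular expression are illustrative assumptions written against the message format seen in the excerpt above, and `SAMPLE` is a shortened copy of that excerpt rather than a full alert log.

```python
import re
from collections import Counter

# Shortened copy of the alert log excerpt shown above.
SAMPLE = """\
Tue Aug 20 21:30:02 2024
Read of datafile '+DATA/xifenfei99.dbf' (fno 1399) header failed with ORA-01207
Rereading datafile 1399 header failed with ORA-01207
ORA-01186: file 1399 failed verification tests
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
"""

# Matches only the initial "Read of datafile ..." line, not the "Rereading" retry.
HEADER_FAIL = re.compile(
    r"Read of datafile '([^']+)' \(fno (\d+)\) header failed with (ORA-\d+)"
)

def failed_headers(text):
    """Count header-read failures keyed by (file number, path, error code)."""
    hits = Counter()
    for match in HEADER_FAIL.finditer(text):
        path, fno, err = match.groups()
        hits[(int(fno), path, err)] += 1
    return hits

for (fno, path, err), count in failed_headers(SAMPLE).items():
    print(f"file {fno} ({path}): {err} x{count}")
```

Run against the real alert log of every node, a scan like this would show whether file 1399 is the only data file reporting ORA-01207.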
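Based on this, my preliminary assessment was:
1. Because the cluster has multiple nodes (6 nodes), as long as at least one node stays open, the other nodes can be shut down and started again without error, but no data can be written to the data file reporting ORA-01207 (reads still work).
2. If all nodes are shut down, the database will no longer start and will report ORA-01207: file is more recent than control file.
In my experience, ORA-01207: file is more recent than control file can be resolved by recreating the control file. I first shut down all nodes and then tried to start a single node: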
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
ORA-01207: file is more recent than control file - old control file

alter database open
Wed Aug 21 14:14:22 2024
SUCCESS: diskgroup REDO was mounted
Wed Aug 21 14:14:22 2024
NOTE: dependency between database xff and diskgroup resource ora.REDO.dg is established
Wed Aug 21 14:14:27 2024
Errors in file /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_47884.trc:
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
ORA-01207: file is more recent than control file - old control file
ORA-1122 signalled during: alter database open...
The open failed as expected. I then tried recreating the control file, after which recovery reported ORA-00600 [krhpfh_03-1210]:
SQL> shutdown immediate;
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
SQL> startup nomount pfile='/tmp/xff/pfile';
ORACLE instance started.

Total System Global Area 1.3255E+11 bytes
Fixed Size                  2244832 bytes
Variable Size            9.7442E+10 bytes
Database Buffers         3.4897E+10 bytes
Redo Buffers              208654336 bytes
SQL> @rectl

Control file created.

SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-01610: recovery using the BACKUP CONTROLFILE option must be done
SQL> recover database using backup controlfile;
ORA-00283: recovery session canceled due to errors
ORA-00600: internal error code, arguments: [krhpfh_03-1210], [fno =], [1399], [fhcpc =], [274968], [fhccc =], [274983], [], [], [], [], []
ORA-01110: data file 1399: '+DATA/xifenfei99.dbf'
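The ORA-00600 argument list already names the file and the two header counters involved. As a small illustration, the bracketed arguments can be split into label/value pairs. The helper name `parse_krhpfh` is hypothetical, and the parsing assumes the alternating "label =" / value layout seen in this particular message; it is not a general ORA-00600 parser.

```python
import re

# Error line copied from the recovery session above.
ERR = ("ORA-00600: internal error code, arguments: [krhpfh_03-1210], "
       "[fno =], [1399], [fhcpc =], [274968], [fhccc =], [274983], "
       "[], [], [], [], []")

def parse_krhpfh(line):
    """Split the bracketed arguments into {'code': ..., label: int_value}."""
    args = re.findall(r"\[([^\]]*)\]", line)
    parsed = {"code": args[0]}
    rest = args[1:]
    # Assumed layout: arguments alternate between "label =" and a number.
    for label, value in zip(rest[0::2], rest[1::2]):
        if label.endswith("=") and value.isdigit():
            parsed[label.rstrip("= ")] = int(value)
    return parsed

info = parse_krhpfh(ERR)
print(info)
print("fhccc - fhcpc =", info["fhccc"] - info["fhcpc"])
```

For this message the two counters differ by 15, with fhccc the larger of the two.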
The message indicates that the fhcpc and fhccc values are inconsistent. I checked the corresponding values with bbed:
BBED> set file 1399
        FILE#           1399

BBED> p kcvfhccc
ub4 kcvfhccc                                @148      0x00043227   ===>274983 (decimal)

BBED> p kcvfhcpc
ub4 kcvfhcpc                                @140      0x00043218   ===>274968 (decimal)
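The hexadecimal values printed by bbed can be cross-checked against the decimal arguments of the ORA-00600 message. The sketch below also shows how such a ub4 would appear as raw bytes in a block dump, assuming a little-endian platform (an assumption, though it agrees with the byte order visible in the bbed dumps in this post). The helper name `on_disk_hex` is illustrative.

```python
import struct

# Values exactly as printed by bbed above.
bbed_values = {"kcvfhcpc": "0x00043218", "kcvfhccc": "0x00043227"}
# Values reported in the ORA-00600 [krhpfh_03-1210] arguments.
ora600_args = {"kcvfhcpc": 274968, "kcvfhccc": 274983}

def on_disk_hex(value):
    """Hex string of a ub4 stored little-endian (assumed byte order)."""
    return struct.pack("<I", value).hex()

for name, text in bbed_values.items():
    value = int(text, 16)
    # Confirm the header field matches what the error message reported.
    assert value == ora600_args[name]
    print(name, value, on_disk_hex(value))
```

The byte-swapped form matters because `m /x` in bbed takes the bytes in the order they are stored, not the value as `p` prints it.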
The error is fairly explicit, so I modified these two values with bbed:
BBED> m /x 2a390400 offset 148
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
 File: /tmp/xff/1399.dbf.header (1399)
 Block: 1                Offsets:  148 to  659           Dba:0x5dc00001
------------------------------------------------------------------------
 2a390400 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 0c000000 0f004441
 5441315f 5442535f 45515f30 31000000 00000000 00000000 00000000 78010000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 cfebdd33 01000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 419333df 81001c0a 6ab13046 06000000
 c1520400 02000000 10000000 7e000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 0d000d00 0d000100 00000000 00000000

 <32 bytes per line>

BBED> m /x 2b390400 offset 140
 File: /tmp/xff/1399.dbf.header (1399)
 Block: 1                Offsets:  140 to  651           Dba:0x5dc00001
------------------------------------------------------------------------
 2b390400 e6ef524d 2a390400 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 0c000000 0f004441 5441315f 5442535f 45515f30 31000000 00000000 00000000
 00000000 78010000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 cfebdd33 01000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 419333df 81001c0a
 6ab13046 06000000 c1520400 02000000 10000000 7e000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 00000000 00000000 00000000 00000000 00000000 00000000 0d000d00 0d000100

 <32 bytes per line>
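To make the effect of the two `m /x` commands easier to read, the bytes written can be decoded back into decimal counters. This is a minimal sketch that assumes the four bytes are a little-endian ub4, consistent with the dumps above; the helper name `ub4_from_dump` is illustrative. The post does not say how the new values were chosen, so the sketch only reports the arithmetic.

```python
def ub4_from_dump(hex_text):
    """Decode four dump bytes (e.g. '2a390400') as a little-endian ub4."""
    return int.from_bytes(bytes.fromhex(hex_text), "little")

# Counters before the edit, as reported by bbed and ORA-00600.
before = {"kcvfhcpc": 274968, "kcvfhccc": 274983}
# Counters after the edit, decoded from the bytes passed to "m /x".
after = {
    "kcvfhcpc": ub4_from_dump("2b390400"),  # written at offset 140
    "kcvfhccc": ub4_from_dump("2a390400"),  # written at offset 148
}

for label, values in (("before", before), ("after", after)):
    diff = values["kcvfhcpc"] - values["kcvfhccc"]
    print(label, values, "kcvfhcpc - kcvfhccc =", diff)
```

Before the edit kcvfhccc was 15 higher than kcvfhcpc; after it, kcvfhcpc (276779) is exactly one higher than kcvfhccc (276778).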
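After these values were modified, recover database and opening the database both succeeded. The dictionary check came back clean and business reads and writes worked normally, which completed this recovery task.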
SQL> @hcheck
HCheck Version 07MAY18 on 21-AUG-2024 15:13:02
----------------------------------------------
Catalog Version 11.2.0.3.0 (1102000300)
db_name: XFF

                                   Catalog       Fixed
Procedure Name                     Version    Vs Release    Timestamp      Result
------------------------------ ... ---------- -- ---------- -------------- ------
.- LobNotInObj                 ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
.- MissingOIDOnObjCol          ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
.- SourceNotInObj              ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
.- OversizedFiles              ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
.- PoorDefaultStorage          ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
.- PoorStorage                 ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
.- TabPartCountMismatch        ... 1102000300 <=  *All Rel* 08/21 15:13:02 PASS
.- OrphanedTabComPart          ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- MissingSum$                 ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- MissingDir$                 ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- DuplicateDataobj            ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- ObjSynMissing               ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- ObjSeqMissing               ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- OrphanedUndo                ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- OrphanedIndex               ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- OrphanedIndexPartition      ... 1102000300 <=  *All Rel* 08/21 15:13:03 PASS
.- OrphanedIndexSubPartition   ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- OrphanedTable               ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- OrphanedTablePartition      ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- OrphanedTableSubPartition   ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- MissingPartCol              ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- OrphanedSeg$                ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- OrphanedIndPartObj#         ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- DuplicateBlockUse           ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- FetUet                      ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- Uet0Check                   ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- SeglessUET                  ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- BadInd$                     ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- BadTab$                     ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- BadIcolDepCnt               ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- ObjIndDobj                  ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- TrgAfterUpgrade             ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- ObjType0                    ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- BadOwner                    ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- StmtAuditOnCommit           ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- BadPublicObjects            ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- BadSegFreelist              ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- BadDepends                  ... 1102000300 <=  *All Rel* 08/21 15:13:04 PASS
.- CheckDual                   ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- ObjectNames                 ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- BadCboHiLo                  ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- ChkIotTs                    ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- NoSegmentIndex              ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- BadNextObject               ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- DroppedROTS                 ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- FilBlkZero                  ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- DbmsSchemaCopy              ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- OrphanedObjError            ... 1102000300 >  1102000000 08/21 15:13:05 PASS
.- ObjNotLob                   ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- MaxControlfSeq              ... 1102000300 <=  *All Rel* 08/21 15:13:05 PASS
.- SegNotInDeferredStg         ... 1102000300 >  1102000000 08/21 15:13:06 PASS
.- SystemNotRfile1             ... 1102000300 >   902000000 08/21 15:13:06 PASS
.- DictOwnNonDefaultSYSTEM     ... 1102000300 <=  *All Rel* 08/21 15:13:07 PASS
.- OrphanTrigger               ... 1102000300 <=  *All Rel* 08/21 15:13:07 PASS
.- ObjNotTrigger               ... 1102000300 <=  *All Rel* 08/21 15:13:07 PASS
---------------------------------------
21-AUG-2024 15:13:07 Elapsed: 5 secs
---------------------------------------
Found 0 potential problem(s) and 0 warning(s)

PL/SQL procedure successfully completed.

Statement processed.

Complete output is in trace file:
/u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_70961_HCHECK.trc