联系:手机/微信(+86 17813235971) QQ(107644445)
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
一套近200T的,6个节点的RAC,由于存储管线链路不稳定,导致服务器经常性掉盘,引起asm 磁盘组频繁dismount/mount,数据库集群节点不停的重启,修复好链路问题之后,数据库启动报ORA-01113,ORA-01110
通过Oracle数据库异常恢复检查脚本(Oracle Database Recovery Check)脚本检测,发现有10个数据文件异常,无法正常恢复
该库比较大,有近200T,因此恢复需要各位谨慎(无法做现场备份,另外客户要求2天时间必须恢复好)
由于数据库是非归档模式,该库无法通过应用归档日志来实现对这些文件进行恢复,对于这种情况,直接使用dbms_diskgroup把数据文件头拷贝到文件系统中,类似操作
SQL> @dbms_diskgroup_get_block.sql +DATA/xifenfei.dbf 1 1 /tmp/xff/xifenfei.dbf.header Parameter 1: ASM_file_name (required) Parameter 2: block_to_extract (required) Parameter 3 number_of_blocks_to_extract (required) Parameter 4: FileSystem_File_Name (required) old 14: v_AsmFilename := '&ASM_File_Name'; new 14: v_AsmFilename := '+DATA/xifenfei.dbf'; old 15: v_offstart := '&block_to_extract'; new 15: v_offstart := '1'; old 16: v_numblks := '&number_of_blocks_to_extract'; new 16: v_numblks := '1'; old 17: v_FsFilename := '&FileSystem_File_Name'; new 17: v_FsFilename := '/tmp/xff/xifenfei.dbf.header'; File: +DATA/xifenfei.dbf Type: 2 Data File Size (in logical blocks): 3978880 Logical Block Size: 16384 Physical Block Size: 512 PL/SQL procedure successfully completed.
然后通过bbed修改相关scn
BBED> set filename 'xifenfei.dbf.header' FILENAME xifenfei.dbf.header BBED> set blocksize 16384 BLOCKSIZE 16384 BBED> map File: xifenfei.dbf.header (0) Block: 1 Dba:0x00000000 ------------------------------------------------------------ Data File Header struct kcvfh, 860 bytes @0 ub4 tailchk @16380 BBED> p kcvfh.kcvfhckp.kcvcpscn struct kcvcpscn, 8 bytes @484 ub4 kscnbas @484 0xa8061324 ub2 kscnwrp @488 0x0081 BBED> assign file 295 block 1 kcvfh.kcvfhckp.kcvcpscn = file 1 block 1 kcvfh.kcvfhckp.kcvcpscn; struct kcvcpscn, 8 bytes @484 ub4 kscnbas @484 0xa8133e2b ub2 kscnwrp @488 0x0081
然后把修改的数据文件头写回到asm中
SQL> @dbms_diskgroup_cp_block_to_asm.sql /tmp/xff/xifenfei.dbf.header +DATA/xifenfei.dbf 1 1 Parameter 1: v_FsFileName (required) Parameter 2: v_AsmFileName (required) Parameter 3 v_offstart (required) Parameter 4 v_numblks (required) old 16: v_FsFileName := '&v_FsFileName'; new 16: v_FsFileName := '/tmp/xff/xifenfei.dbf.header'; old 17: v_AsmFileName := '&v_AsmFileName'; new 17: v_AsmFileName := '+DATA/xifenfei.dbf'; old 18: v_offstart := '&v_offstart'; new 18: v_offstart := '1'; old 19: v_numblks := '&v_numblks'; new 19: v_numblks := '1'; File: +DATA/xifenfei.dbf Type: 2 Data File Size (in logical blocks): 3978880 Logical Block Size: 16384 PL/SQL procedure successfully completed.
查询文件头是否修改成功
[oracle@xff1 xff]$ sqlplus / as sysdba SQL*Plus: Release 11.2.0.3.0 Production on Sat Aug 10 16:45:02 2024 Copyright (c) 1982, 2011, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP, Data Mining and Real Application Testing options SQL> set numw 16 SQL> select CHECKPOINT_CHANGE# from v$datafile_header where file# in (1,295); CHECKPOINT_CHANGE# ------------------ 556870614571 556870614571 SQL> recover datafile 295; Media recovery complete.
通过上述操作,确认bbed修改文件头成功,后续类似方法对其他9个文件进行修改,并打开数据库
SQL> recover database; Media recovery complete. SQL> alter database open; Database altered.
alert日志提示
Sat Aug 10 16:46:11 2024 ALTER DATABASE RECOVER datafile 295 Media Recovery Start Serial Media Recovery started WARNING! Recovering data file 295 from a fuzzy backup. It might be an online backup taken without entering the begin backup command. Media Recovery Complete (xff1) Completed: ALTER DATABASE RECOVER datafile 295 Sat Aug 10 16:46:39 2024 ALTER DATABASE RECOVER database Media Recovery Start started logmerger process Sat Aug 10 16:46:51 2024 WARNING! Recovering data file 1139 from a fuzzy backup. It might be an online backup taken without entering the begin backup command. WARNING! Recovering data file 1140 from a fuzzy backup. It might be an online backup taken without entering the begin backup command. WARNING! Recovering data file 1601 from a fuzzy backup. It might be an online backup taken without entering the begin backup command. WARNING! Recovering data file 1803 from a fuzzy backup. It might be an online backup taken without entering the begin backup command. WARNING! Recovering data file 1827 from a fuzzy backup. It might be an online backup taken without entering the begin backup command. WARNING! Recovering data file 1931 from a fuzzy backup. It might be an online backup taken without entering the begin backup command. WARNING! Recovering data file 2185 from a fuzzy backup. It might be an online backup taken without entering the begin backup command. WARNING! Recovering data file 2473 from a fuzzy backup. It might be an online backup taken without entering the begin backup command. WARNING! Recovering data file 2616 from a fuzzy backup. It might be an online backup taken without entering the begin backup command. Sat Aug 10 16:46:54 2024 Parallel Media Recovery started with 64 slaves Media Recovery Complete (xff1) Completed: ALTER DATABASE RECOVER database Sat Aug 10 17:19:58 2024 alter database open This instance was first to open Sat Aug 10 17:19:58 2024 SUCCESS: diskgroup DATA was mounted Sat Aug 10 17:19:58 2024 NOTE: dependency between database xff and diskgroup resource ora.DATA.dg is established Sat Aug 10 17:20:10 2024 Picked broadcast on commit scheme to generate SCNs Sat Aug 10 17:20:10 2024 SUCCESS: diskgroup REDO was mounted Sat Aug 10 17:20:10 2024 NOTE: dependency between database xff and diskgroup resource ora.REDO.dg is established Thread 1 opened at log sequence 124958 Current log# 14 seq# 124958 mem# 0: +REDO/xff/log2.ora Successful open of redo thread 1 MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set Sat Aug 10 17:20:14 2024 SMON: enabling cache recovery Instance recovery: looking for dead threads Instance recovery: lock domain invalid but no dead threads [33770] Successfully onlined Undo Tablespace 2. Undo initialization finished serial:0 start:261099864 end:261100854 diff:990 (9 seconds) Verifying file header compatibility for 11g tablespace encryption.. Verifying 11g file header compatibility for tablespace encryption completed SMON: enabling tx recovery Database Characterset is ZHS16GBK Sat Aug 10 17:20:16 2024 minact-scn: Inst 1 is now the master inc#:2 mmon proc-id:33650 status:0x7 minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000 Starting background process GTX0 Sat Aug 10 17:20:16 2024 GTX0 started with pid=45, OS id=34119 Starting background process RCBG Sat Aug 10 17:20:16 2024 RCBG started with pid=46, OS id=34121 replication_dependency_tracking turned off (no async multimaster replication found) Starting background process QMNC Sat Aug 10 17:20:16 2024 QMNC started with pid=47, OS id=34134 Starting background process SMCO Completed: alter database open
检查数据字典一致性
SQL> @hcheck.sql HCheck Version 07MAY18 on 10-AUG-2024 18:24:49 ---------------------------------------------- Catalog Version 11.2.0.3.0 (1102000300) db_name: XFF Catalog Fixed Procedure Name Version Vs Release Timestamp Result ------------------------------ ... ---------- -- ---------- -------------- ------ .- LobNotInObj ... 1102000300 <= *All Rel* 08/10 18:24:49 PASS .- MissingOIDOnObjCol ... 1102000300 <= *All Rel* 08/10 18:24:49 PASS .- SourceNotInObj ... 1102000300 <= *All Rel* 08/10 18:24:49 PASS .- OversizedFiles ... 1102000300 <= *All Rel* 08/10 18:24:50 PASS .- PoorDefaultStorage ... 1102000300 <= *All Rel* 08/10 18:24:50 PASS .- PoorStorage ... 1102000300 <= *All Rel* 08/10 18:24:50 PASS .- TabPartCountMismatch ... 1102000300 <= *All Rel* 08/10 18:24:50 PASS .- OrphanedTabComPart ... 1102000300 <= *All Rel* 08/10 18:24:50 PASS .- MissingSum$ ... 1102000300 <= *All Rel* 08/10 18:24:50 PASS .- MissingDir$ ... 1102000300 <= *All Rel* 08/10 18:24:50 PASS .- DuplicateDataobj ... 1102000300 <= *All Rel* 08/10 18:24:50 PASS .- ObjSynMissing ... 1102000300 <= *All Rel* 08/10 18:24:51 PASS .- ObjSeqMissing ... 1102000300 <= *All Rel* 08/10 18:24:51 PASS .- OrphanedUndo ... 1102000300 <= *All Rel* 08/10 18:24:51 PASS .- OrphanedIndex ... 1102000300 <= *All Rel* 08/10 18:24:51 PASS .- OrphanedIndexPartition ... 1102000300 <= *All Rel* 08/10 18:24:51 PASS .- OrphanedIndexSubPartition ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS .- OrphanedTable ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS .- OrphanedTablePartition ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS .- OrphanedTableSubPartition ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS .- MissingPartCol ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS .- OrphanedSeg$ ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS .- OrphanedIndPartObj# ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS .- DuplicateBlockUse ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS .- FetUet ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS .- Uet0Check ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS .- SeglessUET ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS .- BadInd$ ... 1102000300 <= *All Rel* 08/10 18:24:52 PASS .- BadTab$ ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS .- BadIcolDepCnt ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS .- ObjIndDobj ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS .- TrgAfterUpgrade ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS .- ObjType0 ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS .- BadOwner ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS .- StmtAuditOnCommit ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS .- BadPublicObjects ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS .- BadSegFreelist ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS .- BadDepends ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS .- CheckDual ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS .- ObjectNames ... 1102000300 <= *All Rel* 08/10 18:24:53 PASS .- BadCboHiLo ... 1102000300 <= *All Rel* 08/10 18:24:54 PASS .- ChkIotTs ... 1102000300 <= *All Rel* 08/10 18:24:54 PASS .- NoSegmentIndex ... 1102000300 <= *All Rel* 08/10 18:24:54 PASS .- BadNextObject ... 1102000300 <= *All Rel* 08/10 18:24:54 PASS .- DroppedROTS ... 1102000300 <= *All Rel* 08/10 18:24:54 PASS .- FilBlkZero ... 1102000300 <= *All Rel* 08/10 18:24:54 PASS .- DbmsSchemaCopy ... 1102000300 <= *All Rel* 08/10 18:24:54 PASS .- OrphanedObjError ... 1102000300 > 1102000000 08/10 18:24:54 PASS .- ObjNotLob ... 1102000300 <= *All Rel* 08/10 18:24:54 PASS .- MaxControlfSeq ... 1102000300 <= *All Rel* 08/10 18:24:55 PASS .- SegNotInDeferredStg ... 1102000300 > 1102000000 08/10 18:25:18 PASS .- SystemNotRfile1 ... 1102000300 > 902000000 08/10 18:25:18 PASS .- DictOwnNonDefaultSYSTEM ... 1102000300 <= *All Rel* 08/10 18:25:18 PASS .- OrphanTrigger ... 1102000300 <= *All Rel* 08/10 18:25:18 PASS .- ObjNotTrigger ... 1102000300 <= *All Rel* 08/10 18:25:18 PASS --------------------------------------- 10-AUG-2024 18:25:18 Elapsed: 29 secs --------------------------------------- Found 0 potential problem(s) and 0 warning(s) PL/SQL procedure successfully completed. Statement processed. Complete output is in trace file: /u01/app/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_71148_HCHECK.trc
运气不错,数据字典本身没有损坏,业务直接运行,一切正常(主要原因是在光纤链路不稳定的情况下,客户已经没有往库中写入数据)