本站文章除注明转载外,均为本站原创: 转载自love wife & love life —Roger的Oracle&MySQL技术博客
本文链接地址: 某省GA 30TB数据库恢复记录
前不久某省GA的由于存储掉电导致多套数据库无法启动,其中恢复某套大库时遇到了一些新的问题,之前从来没有遇到过。开始整个磁盘组都无法mount,经过抢修后成功打开了磁盘组。
Tue Jun 04 20:53:05 2019 Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. NOTE: ASM instance returned error dumped to trace file /u01/app/oracle/diag/rdbms/xxxx/xxxx1/trace/xxxx1_ora_111035.trc Errors in file /u01/app/oracle/diag/rdbms/xxxx/xxxx1/trace/xxxx1_ora_111035.trc: ORA-00600: internal error code, arguments: [ORA_NPI_ERROR], [600], [ORA-00600: internal error code, arguments: [17090], [], [], [], [], [], [], [], [], [], [], [] ], [], [], [], [], [], [], [], [], [] Tue Jun 04 20:53:05 2019 SUCCESS: diskgroup xxxx was dismounted SUCCESS: diskgroup xxxx was dismounted Tue Jun 04 20:53:05 2019 Errors in file /u01/app/oracle/diag/rdbms/xxxx/xxxx1/trace/xxxx1_lmon_105457.trc: ORA-00202: control file: '+xxxx/xxxx/controlfile/current.260.958145939' ORA-15078: ASM diskgroup was forcibly dismounted Tue Jun 04 20:53:06 2019 Errors in file /u01/app/oracle/diag/rdbms/xxxx/xxxx1/trace/xxxx1_ckpt_105541.trc: ORA-00206: error in writing (block 3, # blocks 1) of control file ORA-00202: control file: '+xxxx/xxxx/controlfile/current.260.958145939' ORA-15078: ASM diskgroup was forcibly dismounted ORA-15078: ASM diskgroup was forcibly dismounted Errors in file /u01/app/oracle/diag/rdbms/xxxx/xxxx1/trace/xxxx1_ckpt_105541.trc: ORA-00221: error on write to control file ORA-00206: error in writing (block 3, # blocks 1) of control file ORA-00202: control file: '+xxxx/xxxx/controlfile/current.260.958145939' ORA-15078: ASM diskgroup was forcibly dismounted ORA-15078: ASM diskgroup was forcibly dismounted CKPT (ospid: 105541): terminating the instance due to error 221 Tue Jun 04 20:53:06 2019 System state dump requested by (instance=1, osid=105541 (CKPT)), summary=[abnormal instance termination]. System State dumped to trace file /u01/app/oracle/diag/rdbms/xxxx/xxxx1/trace/xxxx1_diag_105444_20190604205306.trc Dumping diagnostic data in directory=[cdmp_20190604205306], requested by (instance=1, osid=105541 (CKPT)), summary=[abnormal instance termination]. Instance terminated by CKPT, pid = 105541 。。。。。。。。。 Dump continued from file: /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_32878.trc ORA-15335: ASM metadata corruption detected in disk group 'xxxx' ORA-15130: diskgroup "xxxx" is being dismounted ORA-15066: offlining disk "xxxx_0015" in group "xxxx" may result in a data loss ORA-15196: invalid ASM block header [kfc.c:26368] [chec
经过一番抢救之后,数据库终于打开了,但是无法进行任何DDL操作,创建测试表报错:
SQL> create table test110 as select * from dba_objects where rownum <10; create table test110 as select * from dba_objects where rownum <10 * ERROR at line 1: ORA-00600: internal error code, arguments: [kqd-objerror$], [U], [2], [107], [PG_TJBB_DTZH], [], [], [], [], [], [], []
这个错误之前很少见到。我们跑一下Oracle 官方提供的hcheck脚本,检查了一下数据库中的数据字典情况,发现确实存在一些不一致的问题,如下:
Catalog Fixed Procedure Name Version Vs Release Timestamp Result ------------------------------ ... ---------- -- ---------- -------------- ------ .- LobNotInObj ... 1102000400 <= *All Rel* 06/06 08:05:19 PASS .- MissingOIDOnObjCol ... 1102000400 <= *All Rel* 06/06 08:05:19 PASS .- SourceNotInObj ... 1102000400 <= *All Rel* 06/06 08:05:19 PASS .- OversizedFiles ... 1102000400 <= *All Rel* 06/06 08:05:41 PASS .- PoorDefaultStorage ... 1102000400 <= *All Rel* 06/06 08:05:42 PASS .- PoorStorage ... 1102000400 <= *All Rel* 06/06 08:05:42 PASS .- TabPartCountMismatch ... 1102000400 <= *All Rel* 06/06 08:05:42 PASS .- OrphanedTabComPart ... 1102000400 <= *All Rel* 06/06 08:05:42 PASS .- MissingSum$ ... 1102000400 <= *All Rel* 06/06 08:05:42 PASS .- MissingDir$ ... 1102000400 <= *All Rel* 06/06 08:05:42 PASS .- DuplicateDataobj ... 1102000400 <= *All Rel* 06/06 08:05:42 FAIL HCKE-0014: Duplicate dataobj# (Doc ID 1360519.1) DATAOBJ#=378264 Tablespace=TBSCRJ_DATA_2 Name=CRJBKU.D_MH_DWXXGLB_190603 Type=TABLE DATAOBJ#=378264 Tablespace=TBSCRJ_DATA_2 Name=CRJBKU.D_MH_DWXXGLB_190603 Type=TABLE DATAOBJ#=378265 Tablespace=TBSCRJ_DATA_2 Name=CRJBKU.D_MH_STYSB_190603 Type=TABLE DATAOBJ#=378265 Tablespace=TBSCRJ_DATA_2 Name=CRJBKU.D_MH_STYSB_190603 Type=TABLE DATAOBJ#=378266 Tablespace=TBSCRJ_DATA_2 Name=CRJBKU.D_SAVECOLUMNINFO_190603 Type=TABLE DATAOBJ#=378266 Tablespace=TBSCRJ_DATA_2 Name=CRJBKU.D_SAVECOLUMNINFO_190603 Type=TABLE DATAOBJ#=378283 Tablespace=TBSCRJ_DATA_2 Name=CRJBKU.D_YH_DWXX_190603 Type=TABLE DATAOBJ#=378283 Tablespace=TBSCRJ_DATA_2 Name=CRJBKU.D_YH_DWXX_190603 Type=TABLE .- ObjSynMissing ... 1102000400 <= *All Rel* 06/06 08:05:44 PASS .- ObjSeqMissing ... 1102000400 <= *All Rel* 06/06 08:05:46 PASS .- OrphanedUndo ... 1102000400 <= *All Rel* 06/06 08:05:46 PASS .- OrphanedIndex ... 1102000400 <= *All Rel* 06/06 08:05:46 FAIL HCKE-0016: Orphaned IND$ (no SEG$) (Doc ID 1360531.1) ORPHAN IND$: OBJ=378298 DOBJ=378298 TS=59 RFILE/BLOCK=59/3827354 INDEX=CRJBKU.SYS_IL0000378296C00004$$ OF TABLE=CRJBKU.D_YH_XP_190603 BO#=378296 ORPHAN IND$: OBJ=378301 DOBJ=378301 TS=59 RFILE/BLOCK=236/1244970 INDEX=CRJBKU.SYS_IL0000378299C00009$$ OF TABLE=CRJBKU.D_YH_ZW_190603 BO#=378299 ORPHAN IND$: OBJ=378303 DOBJ=378303 TS=59 RFILE/BLOCK=236/1244986 INDEX=CRJBKU.SYS_IL0000378299C00010$$ OF TABLE=CRJBKU.D_YH_ZW_190603 BO#=378299 ORPHAN IND$: OBJ=378305 DOBJ=378305 TS=59 RFILE/BLOCK=236/1245002 INDEX=CRJBKU.SYS_IL0000378299C00013$$ OF TABLE=CRJBKU.D_YH_ZW_190603 BO#=378299 ORPHAN IND$: OBJ=378307 DOBJ=378307 TS=59 RFILE/BLOCK=236/1245018 INDEX=CRJBKU.SYS_IL0000378299C00023$$ OF TABLE=CRJBKU.D_YH_ZW_190603 BO#=378299 ORPHAN IND$: OBJ=378309 DOBJ=378309 TS=59 RFILE/BLOCK=236/1245034 INDEX=CRJBKU.SYS_IL0000378299C00024$$ OF TABLE=CRJBKU.D_YH_ZW_190603 BO#=378299 ORPHAN IND$: OBJ=378311 DOBJ=378311 TS=59 RFILE/BLOCK=236/1245050 INDEX=CRJBKU.SYS_IL0000378299C00025$$ OF TABLE=CRJBKU.D_YH_ZW_190603 BO#=378299 ORPHAN IND$: OBJ=378313 DOBJ=378313 TS=59 RFILE/BLOCK=236/1249290 INDEX=CRJBKU.SYS_IL0000378299C00031$$ OF TABLE=CRJBKU.D_YH_ZW_190603 BO#=378299 ORPHAN IND$: OBJ=378315 DOBJ=378315 TS=59 RFILE/BLOCK=236/1249306 INDEX=CRJBKU.SYS_IL0000378299C00032$$ OF TABLE=CRJBKU.D_YH_ZW_190603 BO#=378299 .- OrphanedIndexPartition ... 1102000400 <= *All Rel* 06/06 08:05:46 PASS .- OrphanedIndexSubPartition ... 1102000400 <= *All Rel* 06/06 08:05:46 PASS .- OrphanedTable ... 1102000400 <= *All Rel* 06/06 08:05:46 PASS .- OrphanedTablePartition ... 1102000400 <= *All Rel* 06/06 08:05:46 PASS .- OrphanedTableSubPartition ... 1102000400 <= *All Rel* 06/06 08:05:46 PASS .- MissingPartCol ... 1102000400 <= *All Rel* 06/06 08:05:46 PASS .- OrphanedSeg$ ... 1102000400 <= *All Rel* 06/06 08:05:46 FAIL HCKE-0023: Orphaned SEG$ Entry (Doc ID 1360934.1) ORPHAN SEG$: SegType=TYPE2 UNDO TS=2 RFILE/BLOCK=109/352 ORPHAN SEG$: SegType=TYPE2 UNDO TS=2 RFILE/BLOCK=109/304 .- OrphanedIndPartObj# ... 1102000400 <= *All Rel* 06/06 08:05:46 PASS .- DuplicateBlockUse ... 1102000400 <= *All Rel* 06/06 08:05:46 PASS .- FetUet ... 1102000400 <= *All Rel* 06/06 08:05:46 PASS .- Uet0Check ... 1102000400 <= *All Rel* 06/06 08:05:46 PASS .- SeglessUET ... 1102000400 <= *All Rel* 06/06 08:05:46 PASS .- BadInd$ ... 1102000400 <= *All Rel* 06/06 08:05:46 PASS .- BadTab$ ... 1102000400 <= *All Rel* 06/06 08:05:46 PASS .- BadIcolDepCnt ... 1102000400 <= *All Rel* 06/06 08:05:46 PASS .- ObjIndDobj ... 1102000400 <= *All Rel* 06/06 08:05:46 PASS .- TrgAfterUpgrade ... 1102000400 <= *All Rel* 06/06 08:05:46 PASS .- ObjType0 ... 1102000400 <= *All Rel* 06/06 08:05:47 PASS .- BadOwner ... 1102000400 <= *All Rel* 06/06 08:05:47 PASS .- StmtAuditOnCommit ... 1102000400 <= *All Rel* 06/06 08:05:47 PASS .- BadPublicObjects ... 1102000400 <= *All Rel* 06/06 08:05:47 PASS .- BadSegFreelist ... 1102000400 <= *All Rel* 06/06 08:05:47 PASS .- BadDepends ... 1102000400 <= *All Rel* 06/06 08:05:47 PASS .- CheckDual ... 1102000400 <= *All Rel* 06/06 08:05:48 PASS .- ObjectNames ... 1102000400 <= *All Rel* 06/06 08:05:48 PASS .- BadCboHiLo ... 1102000400 <= *All Rel* 06/06 08:05:49 PASS .- ChkIotTs ... 1102000400 <= *All Rel* 06/06 08:05:52 PASS .- NoSegmentIndex ... 1102000400 <= *All Rel* 06/06 08:05:52 PASS .- BadNextObject ... 1102000400 <= *All Rel* 06/06 08:05:53 PASS .- DroppedROTS ... 1102000400 <= *All Rel* 06/06 08:05:53 PASS .- FilBlkZero ... 1102000400 <= *All Rel* 06/06 08:05:53 PASS .- DbmsSchemaCopy ... 1102000400 <= *All Rel* 06/06 08:05:53 PASS .- OrphanedObjError ... 1102000400 > 1102000000 06/06 08:05:53 PASS .- ObjNotLob ... 1102000400 <= *All Rel* 06/06 08:05:53 PASS .- MaxControlfSeq ... 1102000400 <= *All Rel* 06/06 08:05:53 PASS .- SegNotInDeferredStg ... 1102000400 > 1102000000 06/06 08:05:55 PASS .- SystemNotRfile1 ... 1102000400 > 902000000 06/06 08:05:56 PASS .- DictOwnNonDefaultSYSTEM ... 1102000400 <= *All Rel* 06/06 08:05:56 PASS .- OrphanTrigger ... 1102000400 <= *All Rel* 06/06 08:05:56 PASS .- ObjNotTrigger ... 1102000400 <= *All Rel* 06/06 08:05:56 PASS --------------------------------------- 06-JUN-2019 08:05:56 Elapsed: 38 secs --------------------------------------- Found 19 potential problem(s) and 0 warning(s) Contact Oracle Support with the output and trace file to check if the above needs attention or not
我们可以看到数据字典确实有问题,比如ind$。针对这个问题有2种基本处理方式,可以参考官方的文档,分别时补充sys.objerror$的记录和删除记录(当然是针对不同的情况)。
我这里通过类似的方法处理后,可以顺利删除上述异常的一些对象(经客户确认数据可以丢弃,我们备份表结构后删除即可)。
整个恢复其实不复制,主要是前期无法远程,导致进展很慢,简单记录一下。