某省GA 30TB数据库恢复记录

本站文章除注明转载外,均为本站原创: 转载自love wife & love life —Roger的Oracle&MySQL技术博客

本文链接地址: 某省GA 30TB数据库恢复记录

前不久某省GA的由于存储掉电导致多套数据库无法启动,其中恢复某套大库时遇到了一些新的问题,之前从来没有遇到过。开始整个磁盘组都无法mount,经过抢修后成功打开了磁盘组。

Tue Jun 04 20:53:05 2019
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
NOTE: ASM instance returned error dumped to trace file /u01/app/oracle/diag/rdbms/xxxx/xxxx1/trace/xxxx1_ora_111035.trc
Errors in file /u01/app/oracle/diag/rdbms/xxxx/xxxx1/trace/xxxx1_ora_111035.trc:
ORA-00600: internal error code, arguments: [ORA_NPI_ERROR], [600], [ORA-00600: internal error code, arguments: [17090], [], [], [], [], [], [], [], [], [], [], []
], [], [], [], [], [], [], [], [], []
Tue Jun 04 20:53:05 2019
SUCCESS: diskgroup xxxx was dismounted
SUCCESS: diskgroup xxxx was dismounted
Tue Jun 04 20:53:05 2019
Errors in file /u01/app/oracle/diag/rdbms/xxxx/xxxx1/trace/xxxx1_lmon_105457.trc:
ORA-00202: control file: '+xxxx/xxxx/controlfile/current.260.958145939'
ORA-15078: ASM diskgroup was forcibly dismounted
Tue Jun 04 20:53:06 2019
Errors in file /u01/app/oracle/diag/rdbms/xxxx/xxxx1/trace/xxxx1_ckpt_105541.trc:
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '+xxxx/xxxx/controlfile/current.260.958145939'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/oracle/diag/rdbms/xxxx/xxxx1/trace/xxxx1_ckpt_105541.trc:
ORA-00221: error on write to control file
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '+xxxx/xxxx/controlfile/current.260.958145939'
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
CKPT (ospid: 105541): terminating the instance due to error 221
Tue Jun 04 20:53:06 2019
System state dump requested by (instance=1, osid=105541 (CKPT)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xxxx/xxxx1/trace/xxxx1_diag_105444_20190604205306.trc
Dumping diagnostic data in directory=[cdmp_20190604205306], requested by (instance=1, osid=105541 (CKPT)), summary=[abnormal instance termination].
Instance terminated by CKPT, pid = 105541
。。。。。。。。。


Dump continued from file: /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_32878.trc
ORA-15335: ASM metadata corruption detected in disk group 'xxxx'
ORA-15130: diskgroup "xxxx" is being dismounted
ORA-15066: offlining disk "xxxx_0015" in group "xxxx" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [chec

经过一番抢救之后,数据库终于打开了,但是无法进行任何DDL操作,创建测试表报错:

SQL> create table test110 as select * from dba_objects where rownum <10;
create table test110 as select * from dba_objects where rownum <10
                                      *
ERROR at line 1:
ORA-00600: internal error code, arguments: [kqd-objerror$], [U], [2], [107],
[PG_TJBB_DTZH], [], [], [], [], [], [], []

这个错误之前很少见到。我们跑一下Oracle 官方提供的hcheck脚本,检查了一下数据库中的数据字典情况,发现确实存在一些不一致的问题,如下:

Catalog	 Fixed
Procedure Name			   Version    Vs Release    Timestamp	   Result
------------------------------ ... ---------- -- ---------- -------------- ------
.- LobNotInObj		       ... 1102000400 <=  *All Rel* 06/06 08:05:19 PASS
.- MissingOIDOnObjCol	       ... 1102000400 <=  *All Rel* 06/06 08:05:19 PASS
.- SourceNotInObj	       ... 1102000400 <=  *All Rel* 06/06 08:05:19 PASS
.- OversizedFiles	       ... 1102000400 <=  *All Rel* 06/06 08:05:41 PASS
.- PoorDefaultStorage	       ... 1102000400 <=  *All Rel* 06/06 08:05:42 PASS
.- PoorStorage		       ... 1102000400 <=  *All Rel* 06/06 08:05:42 PASS
.- TabPartCountMismatch        ... 1102000400 <=  *All Rel* 06/06 08:05:42 PASS
.- OrphanedTabComPart	       ... 1102000400 <=  *All Rel* 06/06 08:05:42 PASS
.- MissingSum$		       ... 1102000400 <=  *All Rel* 06/06 08:05:42 PASS
.- MissingDir$		       ... 1102000400 <=  *All Rel* 06/06 08:05:42 PASS
.- DuplicateDataobj	       ... 1102000400 <=  *All Rel* 06/06 08:05:42 FAIL

HCKE-0014: Duplicate dataobj# (Doc ID 1360519.1)
DATAOBJ#=378264 Tablespace=TBSCRJ_DATA_2 Name=CRJBKU.D_MH_DWXXGLB_190603 Type=TABLE
DATAOBJ#=378264 Tablespace=TBSCRJ_DATA_2 Name=CRJBKU.D_MH_DWXXGLB_190603 Type=TABLE
DATAOBJ#=378265 Tablespace=TBSCRJ_DATA_2 Name=CRJBKU.D_MH_STYSB_190603 Type=TABLE
DATAOBJ#=378265 Tablespace=TBSCRJ_DATA_2 Name=CRJBKU.D_MH_STYSB_190603 Type=TABLE
DATAOBJ#=378266 Tablespace=TBSCRJ_DATA_2 Name=CRJBKU.D_SAVECOLUMNINFO_190603 Type=TABLE
DATAOBJ#=378266 Tablespace=TBSCRJ_DATA_2 Name=CRJBKU.D_SAVECOLUMNINFO_190603 Type=TABLE
DATAOBJ#=378283 Tablespace=TBSCRJ_DATA_2 Name=CRJBKU.D_YH_DWXX_190603 Type=TABLE
DATAOBJ#=378283 Tablespace=TBSCRJ_DATA_2 Name=CRJBKU.D_YH_DWXX_190603 Type=TABLE

.- ObjSynMissing	       ... 1102000400 <=  *All Rel* 06/06 08:05:44 PASS
.- ObjSeqMissing	       ... 1102000400 <=  *All Rel* 06/06 08:05:46 PASS
.- OrphanedUndo 	       ... 1102000400 <=  *All Rel* 06/06 08:05:46 PASS
.- OrphanedIndex	       ... 1102000400 <=  *All Rel* 06/06 08:05:46 FAIL

HCKE-0016: Orphaned IND$ (no SEG$) (Doc ID 1360531.1)
ORPHAN IND$: OBJ=378298 DOBJ=378298 TS=59 RFILE/BLOCK=59/3827354 INDEX=CRJBKU.SYS_IL0000378296C00004$$ OF TABLE=CRJBKU.D_YH_XP_190603 BO#=378296
ORPHAN IND$: OBJ=378301 DOBJ=378301 TS=59 RFILE/BLOCK=236/1244970 INDEX=CRJBKU.SYS_IL0000378299C00009$$ OF TABLE=CRJBKU.D_YH_ZW_190603 BO#=378299
ORPHAN IND$: OBJ=378303 DOBJ=378303 TS=59 RFILE/BLOCK=236/1244986 INDEX=CRJBKU.SYS_IL0000378299C00010$$ OF TABLE=CRJBKU.D_YH_ZW_190603 BO#=378299
ORPHAN IND$: OBJ=378305 DOBJ=378305 TS=59 RFILE/BLOCK=236/1245002 INDEX=CRJBKU.SYS_IL0000378299C00013$$ OF TABLE=CRJBKU.D_YH_ZW_190603 BO#=378299
ORPHAN IND$: OBJ=378307 DOBJ=378307 TS=59 RFILE/BLOCK=236/1245018 INDEX=CRJBKU.SYS_IL0000378299C00023$$ OF TABLE=CRJBKU.D_YH_ZW_190603 BO#=378299
ORPHAN IND$: OBJ=378309 DOBJ=378309 TS=59 RFILE/BLOCK=236/1245034 INDEX=CRJBKU.SYS_IL0000378299C00024$$ OF TABLE=CRJBKU.D_YH_ZW_190603 BO#=378299
ORPHAN IND$: OBJ=378311 DOBJ=378311 TS=59 RFILE/BLOCK=236/1245050 INDEX=CRJBKU.SYS_IL0000378299C00025$$ OF TABLE=CRJBKU.D_YH_ZW_190603 BO#=378299
ORPHAN IND$: OBJ=378313 DOBJ=378313 TS=59 RFILE/BLOCK=236/1249290 INDEX=CRJBKU.SYS_IL0000378299C00031$$ OF TABLE=CRJBKU.D_YH_ZW_190603 BO#=378299
ORPHAN IND$: OBJ=378315 DOBJ=378315 TS=59 RFILE/BLOCK=236/1249306 INDEX=CRJBKU.SYS_IL0000378299C00032$$ OF TABLE=CRJBKU.D_YH_ZW_190603 BO#=378299

.- OrphanedIndexPartition      ... 1102000400 <=  *All Rel* 06/06 08:05:46 PASS
.- OrphanedIndexSubPartition   ... 1102000400 <=  *All Rel* 06/06 08:05:46 PASS
.- OrphanedTable	       ... 1102000400 <=  *All Rel* 06/06 08:05:46 PASS
.- OrphanedTablePartition      ... 1102000400 <=  *All Rel* 06/06 08:05:46 PASS
.- OrphanedTableSubPartition   ... 1102000400 <=  *All Rel* 06/06 08:05:46 PASS
.- MissingPartCol	       ... 1102000400 <=  *All Rel* 06/06 08:05:46 PASS
.- OrphanedSeg$ 	       ... 1102000400 <=  *All Rel* 06/06 08:05:46 FAIL

HCKE-0023: Orphaned SEG$ Entry (Doc ID 1360934.1)
ORPHAN SEG$: SegType=TYPE2 UNDO TS=2 RFILE/BLOCK=109/352
ORPHAN SEG$: SegType=TYPE2 UNDO TS=2 RFILE/BLOCK=109/304

.- OrphanedIndPartObj#	       ... 1102000400 <=  *All Rel* 06/06 08:05:46 PASS
.- DuplicateBlockUse	       ... 1102000400 <=  *All Rel* 06/06 08:05:46 PASS
.- FetUet		       ... 1102000400 <=  *All Rel* 06/06 08:05:46 PASS
.- Uet0Check		       ... 1102000400 <=  *All Rel* 06/06 08:05:46 PASS
.- SeglessUET		       ... 1102000400 <=  *All Rel* 06/06 08:05:46 PASS
.- BadInd$		       ... 1102000400 <=  *All Rel* 06/06 08:05:46 PASS
.- BadTab$		       ... 1102000400 <=  *All Rel* 06/06 08:05:46 PASS
.- BadIcolDepCnt	       ... 1102000400 <=  *All Rel* 06/06 08:05:46 PASS
.- ObjIndDobj		       ... 1102000400 <=  *All Rel* 06/06 08:05:46 PASS
.- TrgAfterUpgrade	       ... 1102000400 <=  *All Rel* 06/06 08:05:46 PASS
.- ObjType0		       ... 1102000400 <=  *All Rel* 06/06 08:05:47 PASS
.- BadOwner		       ... 1102000400 <=  *All Rel* 06/06 08:05:47 PASS
.- StmtAuditOnCommit	       ... 1102000400 <=  *All Rel* 06/06 08:05:47 PASS
.- BadPublicObjects	       ... 1102000400 <=  *All Rel* 06/06 08:05:47 PASS
.- BadSegFreelist	       ... 1102000400 <=  *All Rel* 06/06 08:05:47 PASS
.- BadDepends		       ... 1102000400 <=  *All Rel* 06/06 08:05:47 PASS
.- CheckDual		       ... 1102000400 <=  *All Rel* 06/06 08:05:48 PASS
.- ObjectNames		       ... 1102000400 <=  *All Rel* 06/06 08:05:48 PASS
.- BadCboHiLo		       ... 1102000400 <=  *All Rel* 06/06 08:05:49 PASS
.- ChkIotTs		       ... 1102000400 <=  *All Rel* 06/06 08:05:52 PASS
.- NoSegmentIndex	       ... 1102000400 <=  *All Rel* 06/06 08:05:52 PASS
.- BadNextObject	       ... 1102000400 <=  *All Rel* 06/06 08:05:53 PASS
.- DroppedROTS		       ... 1102000400 <=  *All Rel* 06/06 08:05:53 PASS
.- FilBlkZero		       ... 1102000400 <=  *All Rel* 06/06 08:05:53 PASS
.- DbmsSchemaCopy	       ... 1102000400 <=  *All Rel* 06/06 08:05:53 PASS
.- OrphanedObjError	       ... 1102000400 >  1102000000 06/06 08:05:53 PASS
.- ObjNotLob		       ... 1102000400 <=  *All Rel* 06/06 08:05:53 PASS
.- MaxControlfSeq	       ... 1102000400 <=  *All Rel* 06/06 08:05:53 PASS
.- SegNotInDeferredStg	       ... 1102000400 >  1102000000 06/06 08:05:55 PASS
.- SystemNotRfile1	       ... 1102000400 >   902000000 06/06 08:05:56 PASS
.- DictOwnNonDefaultSYSTEM     ... 1102000400 <=  *All Rel* 06/06 08:05:56 PASS
.- OrphanTrigger	       ... 1102000400 <=  *All Rel* 06/06 08:05:56 PASS
.- ObjNotTrigger	       ... 1102000400 <=  *All Rel* 06/06 08:05:56 PASS
---------------------------------------
06-JUN-2019 08:05:56  Elapsed: 38 secs
---------------------------------------
Found 19 potential problem(s) and 0 warning(s)
Contact Oracle Support with the output and trace file
to check if the above needs attention or not

我们可以看到数据字典确实有问题,比如ind$。针对这个问题有2种基本处理方式,可以参考官方的文档,分别时补充sys.objerror$的记录和删除记录(当然是针对不同的情况)。

我这里通过类似的方法处理后,可以顺利删除上述异常的一些对象(经客户确认数据可以丢弃,我们备份表结构后删除即可)。

 

整个恢复其实不复制,主要是前期无法远程,导致进展很慢,简单记录一下。

此条目发表在 未分类 分类目录。将固定链接加入收藏夹。

评论功能已关闭。