本站文章除注明转载外,均为本站原创: 转载自love wife & love life —Roger的Oracle&MySQL技术博客
本文链接地址: 一个135TB的Oracle DataBase恢复case
前几天帮助某用户进行了一次数据库恢复操作;通过检查数据库文件发现发现整个库的文件Size之和超过135TB,其中段大小超过132TB:
SQL> select sum(bytes/1024/1024/1024/1024) "TB" from dba_segments; SUM(BYTES/1024/1024/1024/1024) "TB" ------------------------------ 132.866246
首先我们来看看数据库的恢复情况,考虑到客户信息涉密文件,这里我仅做简单的描述:
Wed May 16 15:34:22 2018 Errors in file /oracle/admin/yddb/udump/yddb1_ora_3081048.trc: ORA-00600: internal error code, arguments: [504], [0x70000001000AF08], [1], [0], [slave class create], [0], [0], [0x700000501CE7C08] Wed May 16 15:34:22 2018 Errors in file /oracle/admin/xxxx/bdump/xxxx1_smon_2818542.trc: ORA-00604: error occurred at recursive SQL level 1 ORA-01578: ORACLE data block corrupted (file # 1, block # 22479) ORA-01110: data file 1: '+DISKGROUP1/xxxx/datafile/system.259.738348071' Wed May 16 15:34:22 2018 Errors in file /oracle/admin/xxxx/bdump/xxxx1_smon_2818542.trc: ORA-00604: error occurred at recursive SQL level 1 ORA-01578: ORACLE data block corrupted (file # 1, block # 22479) ORA-01110: data file 1: '+DISKGROUP1/xxxx/datafile/system.259.738348071' Wed May 16 15:34:23 2018 Errors in file /oracle/admin/xxxx/bdump/xxxx1_smon_2818542.trc: ORA-00604: error occurred at recursive SQL level 1 ORA-01578: ORACLE data block corrupted (file # 1, block # 22479) ORA-01110: data file 1: '+DISKGROUP1/xxxx/datafile/system.259.738348071' Wed May 16 15:34:24 2018 Errors in file /oracle/admin/xxxx/udump/xxxx1_ora_3081048.trc: ORA-00600: internal error code, arguments: [504], [0x70000001000AF08], [1], [0], [slave class create], [0], [0], [0x700000501CE7C08] Wed May 16 15:34:24 2018 Errors in file /oracle/admin/xxxx/udump/xxxx1_ora_3081048.trc: ORA-00600: internal error code, arguments: [4193], [64], [69], [], [], [], [], [] Wed May 16 15:34:24 2018 Trace dumping is performing id=[cdmp_20180516153424] Wed May 16 15:34:24 2018 Errors in file /oracle/admin/xxxx/bdump/xxxx1_smon_2818542.trc: ORA-00604: error occurred at recursive SQL level 1 ORA-01578: ORACLE data block corrupted (file # 1, block # 22479) ORA-01110: data file 1: '+DISKGROUP1/xxxx/datafile/system.259.738348071' Wed May 16 15:34:25 2018 Errors in file /oracle/admin/xxxx/bdump/xxxx1_smon_2818542.trc: ORA-00604: error occurred at recursive SQL level 1 ORA-01578: ORACLE data block corrupted (file # 1, block # 22479) ORA-01110: data file 1: '+DISKGROUP1/xxxx/datafile/system.259.738348071' Wed May 16 15:34:25 2018 DEBUG: Replaying xcb 0x7000004fc60b848, pmd 0x700000501ce8c48 for failed op 8 Doing block recovery for file 2 block 508535 No block recovery was needed Wed May 16 15:34:35 2018 Errors in file /oracle/admin/xxxx/bdump/xxxx1_smon_2818542.trc: ORA-00604: error occurred at recursive SQL level 1 ORA-01578: ORACLE data block corrupted (file # 1, block # 22479) ORA-01110: data file 1: '+DISKGROUP1/xxxx/datafile/system.259.738348071'
这里可以发现Oracle SMON进程在进行事务恢复时报错,同时alert抛出了ORA-00600 [4193]错误;而该错误则与Undo表空间存在联系。
由此可见Undo表空间也存在异常。对于system数据字典的file #1 block 22479号block。这里我们首先将其从ASM磁盘组copy到文件系统,然后通过提前编译好的Oracle内置block edit tools进行修改,将该block中涉及到的未提交事务强制进行提交;通过人为沟通已提交事务的方式,来绕过Oracle对该Block涉及的事务进行回滚(该数据块涉及的对象为obj#=8,是一个Cluster block,修改相对复杂一些)。
修改完该block之后,再将该block copy到asm中,再次启动数据库,发现能够顺利open数据库。
其次由于undo表空间也存在异常,因此这里需要将undo表空间进行重建,如下是重建的步骤:
1) 创建新的undo表空间 create undo tablespace undotbs11 datafile '+DISKGROUP1' size 4096m; 2) 修改参数 alter systen set undo_tablespace=undotbs11 scope=spfile sid='xxxx1'; 3) 删除损坏的undo表空间 drop tablespace undotbs1 including contents and datafiles;
重建完undo之后,接下来就是处理坏块相关内容,300多个坏块这里分别进行了排查,基本上集中在某一天的数据上,其中还有个别是Index。处理相对简单一些,这里不再重复!
整个恢复Case并不复杂,这里不再重复,相关ORA-00600错误的处理方法在blog中均可搜索到。