联系:手机/微信(+86 17813235971) QQ(107644445)
作者:惜分飞©版权所有[未经本人同意,不得以任何形式转载,否则有进一步追究法律责任的权利.]
接手客户一个云平台硬件故障恢复之后,数据库无法启动的case,通过分析alert日志,发现数据库在open过程中报ORA-01172: recovery of thread 1 stuck at block 2220167 of file 262,ORA-01151: use media recovery to recover block, restore backup if needed等相关错误(其实也就是在做实例恢复的过程中报了logically corrupt导致无法完成实例恢复
Sat Nov 01 14:29:10 2025 ALTER DATABASE OPEN Beginning crash recovery of 1 threads parallel recovery started with 15 processes Started redo scan Completed redo scan read 7034 KB redo, 937 data blocks need recovery Started redo application at Thread 1: logseq 296553, block 389408 Recovery of Online Redo Log: Thread 1 Group 3 Seq 296553 Reading mem 0 Mem# 0: /data/orcl/onlinelog/redo03a.log Sat Nov 01 14:29:11 2025 Hex dump of (file 262, block 2220584) in trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_p009_2648.trc Sat Nov 01 14:29:11 2025 Hex dump of (file 262, block 2218886) in trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_p007_2644.trc Reading datafile '/data/orcl/datafile/xifenfei12.dbf' for corruption at rdba: 0x41a1db86 (file 262, block 2218886) Reading datafile '/data/orcl/datafile/xifenfei12.dbf' for corruption at rdba: 0x41a1e228 (file 262, block 2220584) Sat Nov 01 14:29:11 2025 Hex dump of (file 262, block 2219845) in trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_p008_2646.trc Reading datafile '/data/orcl/datafile/xifenfei12.dbf' for corruption at rdba: 0x41a1df45 (file 262, block 2219845) Sat Nov 01 14:29:11 2025 Hex dump of (file 262, block 2220167) in trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_p001_2632.trc Reading datafile '/data/orcl/datafile/xifenfei12.dbf' for corruption at rdba: 0x41a1e087 (file 262, block 2220167) Reread (file 262, block 2218886) found same corrupt data (logically corrupt) RECOVERY OF THREAD 1 STUCK AT BLOCK 2218886 OF FILE 262 Reread (file 262, block 2220584) found same corrupt data (logically corrupt) RECOVERY OF THREAD 1 STUCK AT BLOCK 2220584 OF FILE 262 Reread (file 262, block 2219845) found same corrupt data (logically corrupt) RECOVERY OF THREAD 1 STUCK AT BLOCK 2219845 OF FILE 262 Reread (file 262, block 2220167) found same corrupt data (logically corrupt) RECOVERY OF THREAD 1 STUCK AT BLOCK 2220167 OF FILE 262 Sat Nov 01 14:29:26 2025 Slave exiting with ORA-1172 exception Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_p007_2644.trc: ORA-01172: recovery of thread 1 stuck at block 2218886 of file 262 ORA-01151: use media recovery to recover block, restore backup if needed Sat Nov 01 14:29:26 2025 Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_p008_2646.trc: ORA-10388: parallel query server interrupt (failure) Sat Nov 01 14:29:26 2025 Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_p009_2648.trc: ORA-10388: parallel query server interrupt (failure) Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_p008_2646.trc: ORA-10388: parallel query server interrupt (failure) Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_p009_2648.trc: ORA-10388: parallel query server interrupt (failure) Sat Nov 01 14:29:26 2025 Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_p001_2632.trc: ORA-10388: parallel query server interrupt (failure) Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_p001_2632.trc: ORA-10388: parallel query server interrupt (failure) Sat Nov 01 14:29:26 2025 Aborting crash recovery due to slave death, attempting serial crash recovery Beginning crash recovery of 1 threads Started redo scan Completed redo scan read 7034 KB redo, 937 data blocks need recovery Started redo application at Thread 1: logseq 296553, block 389408 Recovery of Online Redo Log: Thread 1 Group 3 Seq 296553 Reading mem 0 Mem# 0: /data/orcl/onlinelog/redo03a.log Hex dump of (file 262,block 2220167) in trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_2606.trc Reading datafile '/data/orcl/datafile/xifenfei12.dbf'for corruption at rdba: 0x41a1e087 (file 262,block 2220167) Reread (file 262, block 2220167) found same corrupt data (logically corrupt) RECOVERY OF THREAD 1 STUCK AT BLOCK 2220167 OF FILE 262 Aborting crash recovery due to error 1172 Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_2606.trc: ORA-01172: recovery of thread 1 stuck at block 2220167 of file 262 ORA-01151: use media recovery to recover block, restore backup if needed Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_2606.trc: ORA-01172: recovery of thread 1 stuck at block 2220167 of file 262 ORA-01151: use media recovery to recover block, restore backup if needed ORA-1172 signalled during: ALTER DATABASE OPEN...
接手故障之后,尝试recover database恢复,结果报ORA-600 4552错误
SQL> recover database; ORA-10562: Error occurred while applying redo to data block (file# 262, block# 2222153) ORA-10564: tablespace XIFENFEI ORA-01110: data file 262: '/data/orcl/datafile/xifenfei12.dbf' ORA-10560: block type '0' ORA-00600: internal error code, arguments: [4552], [1], [0], [], [], [], [], [], [], [], [], []
关于ORA-600 4552对应的alert日志信息
Sat Nov 01 17:49:58 2025 ALTER DATABASE RECOVER database Media Recovery Start started logmerger process Parallel Media Recovery started with 16 slaves Sat Nov 01 17:49:59 2025 Recovery of Online Redo Log: Thread 1 Group 3 Seq 296553 Reading mem 0 Mem# 0: /data/orcl/onlinelog/redo03a.log Sat Nov 01 17:49:59 2025 Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_pr0c_28770.trc (incident=1018821): ORA-00600: internal error code, arguments: [4552], [1], [0], [], [], [], [], [], [], [], [], [] Incident details in:/u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_1018821/orcl_pr0c_28770_i1018821.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Sat Nov 01 17:50:03 2025 Sweep [inc][1018821]: completed Sweep [inc2][1018821]: completed Slave exiting with ORA-10562 exception Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_pr0c_28770.trc: ORA-10562: Error occurred while applying redo to data block (file# 262, block# 2222153) ORA-10564: tablespace xifenfei ORA-01110: data file 262: '/data/orcl/datafile/xifenfei12.dbf' ORA-10560: block type '0' ORA-00600: internal error code, arguments: [4552], [1], [0], [], [], [], [], [], [], [], [], [] Recovery Slave PR0C previously exited with exception 10562 Media Recovery failed with error 448 Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_pr00_28732.trc: ORA-00283: recovery session canceled due to errors ORA-00448: normal completion of background process ORA-10562 signalled during: ALTER DATABASE RECOVER database ...
无法整个库级别recover,尝试数据文件recover操作
SQL> recover datafile 1; Media recovery complete. ………… SQL> recover datafile 22,23,24,26,25,27,28,29,30; Media recovery complete. ………… SQL> recover datafile 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261; Media recovery complete. SQL> recover datafile 262; ORA-00283: recovery session canceled due to errors ORA-00600: internal error code, arguments: [3020], [262], [2215808], [1101123456], [], [], [], [], [], [], [], [] ORA-10567: Redo is inconsistent with data block (file# 262, block# 2215808, file offset is 972029952 bytes) ORA-10564: tablespace XIFENFEI ORA-01110: data file 262: '/data/orcl/datafile/xifenfei12.dbf' ORA-10560: block type '0' SQL> recover datafile 263; Media recovery complete.
出file# 262数据文件之外,其他文件全部recover成功,对应的ORA-600 3020错误相关alert日志信息
Sat Nov 01 17:53:37 2025 ALTER DATABASE RECOVER datafile 262 Media Recovery Start Serial Media Recovery started Recovery of Online Redo Log: Thread 1 Group 3 Seq 296553 Reading mem 0 Mem# 0: /data/orcl/onlinelog/redo03a.log Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_28561.trc (incident=1018717): ORA-00600: internal error code, arguments: [3020], [262], [2215808], [1101123456], [], [], [], [], [] ORA-10567: Redo is inconsistent with data block (file# 262, block# 2215808, file offset is 972029952 bytes) ORA-10564: tablespace xifenfei ORA-01110: data file 262: '/data/orcl/datafile/xifenfei12.dbf' ORA-10560: block type '0' Incident details in:/u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_1018717/orcl_ora_28561_i1018717.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Media Recovery failed with error 600 ORA-283 signalled during: ALTER DATABASE RECOVER datafile 262 ...
对于这种情况,有两种处理方式:
1)在recover过程中对于报错的block标记为坏块,然后继续恢复,这样正常应用日志成功,再把标记的坏块修复好
2)直接修改该文件头跳过该文件跳过这些block的应用日志,直接骗过数据库
在本case中由于客户急着恢复业务,需要尽快处理,所以采用了第一个方案,这里我使用自研的m_scn(modify_scn)工具快速修改相关数据文件信息
[oracle@host-172-18-50-10 tmp]$ cat 1.txt 1@/data/orcl/datafile/system01.dbf 262@/data/orcl/datafile/xifenfei12.dbf [oracle@host-172-18-50-10 tmp]$ ./m_scn 1.txt Please Enter Password: ===== Starting Datafile Header modification program ===== Datafile list file: 1.txt Operation Mode: Only Modify Datafile Header CheckPoint Block Size: 8192 Log Path: /tmp/modify_scn --------------------------------------------------------- Preparing Datafile list file... Verifying Datafile existence... Datafile verification passed Initializing working directory... Recovery script created: /tmp/modify_scn/backup/recover_datafile.sh --------------------------------------------------------- Starting Datafile Header processing (total 2 files)... [1/2] Processing Datafile Header: /data/orcl/datafile/system01.dbf (File number: 1) - Skipping file number 1 (control file) --------------------------------------------------------- [2/2] Processing Datafile Header: /data/orcl/datafile/xifenfei12.dbf (File number: 262) - Backing up Datafile header... - Executing Datafile Header modification with block size 8192... - Datafile Header processing completed --------------------------------------------------------- Cleaning up temporary files... ================= All operations completed ================= Note: Execute /tmp/modify_scn/backup/recover_datafile.sh operation for rollback
然后查询相关scn信息,确认修改文件信息没有问题并尝试recover 262号文件
[oracle@host-172-18-50-10 tmp]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Sat Nov 1 18:02:18 2025
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup mount;
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.
Total System Global Area 4.1823E+10 bytes
Fixed Size 2262368 bytes
Variable Size 4294970016 bytes
Database Buffers 3.7447E+10 bytes
Redo Buffers 78614528 bytes
Database mounted.
SQL> set pages 10000
SQL> set numw 16
SQL> SELECT status,
2 checkpoint_change#,
3 to_char(checkpoint_time,'yyyy-mm-dd hh24:mi:ss') checkpoint_time,
4 last_change#,
5 count(*) ROW_NUM
FROM v$datafile
6 7 GROUP BY status, checkpoint_change#, checkpoint_time,last_change#
ORDER BY status, checkpoint_change#, checkpoint_time;
8
set numw 16
col CHECKPOINT_TIME for a40
set lines 150
set pages 1000
SELECT status,
to_char(checkpoint_time,'yyyy-mm-dd hh24:mi:ss') checkpoint_time,FUZZY,checkpoint_change#,
count(*) ROW_NUM
FROM v$datafile_header
GROUP BY status, checkpoint_change#, to_char(checkpoint_time,'yyyy-mm-dd hh24:mi:ss'),fuzzy
ORDER BY status, checkpoint_change#, checkpoint_time;
SELECT dd.FILE#,
dd.NAME,
dd.STATUS,
dd.checkpoint_change# dfile_chkp_change,
dh.checkpoint_change# dfile_hed_chkp_change,
dh.recover,
dh.fuzzy
FROM v$datafile dd,
v$datafile_header dh
WHERE dd.FILE#=dh.FILE#
AND dd.checkpoint_change#<>dh.checkpoint_change#;
STATUS CHECKPOINT_CHANGE# CHECKPOINT_TIME LAST_CHANGE# ROW_NUM
------- ------------------ ------------------- ---------------- ----------------
ONLINE 16816934458875 2025-11-01 17:58:28 16816934458875 258
RECOVER 16816934368799 2025-11-01 05:29:39 16816934456943 1
SYSTEM 16816934458875 2025-11-01 17:58:28 16816934458875 4
SQL> SQL> SQL> SQL> SQL> SQL> SQL> 2 3 4 5 6
STATUS CHECKPOINT_TIME FUZ CHECKPOINT_CHANGE# ROW_NUM
------- ---------------------------------------- --- ------------------ ----------------
OFFLINE 2025-11-01 17:58:28 NO 16816934458875 1
ONLINE 2025-11-01 17:58:28 NO 16816934458875 262
SQL> SQL> 2 3 4 5 6 7 8 9 10 11
FILE#
----------------
NAME
----------------------------------------------------------------------------------
STATUS DFILE_CHKP_CHANGE DFILE_HED_CHKP_CHANGE REC FUZ
------- ----------------- --------------------- --- ---
262
/data/orcl/datafile/xifenfei12.dbf
RECOVER 16816934368799 16816934458875 YES NO
SQL> recover datafile 262;
Media recovery complete.
open数据库成功
SQL> alter database open; Database altered.
Sat Nov 01 18:06:00 2025 ALTER DATABASE OPEN Thread 1 opened at log sequence 296554 Current log# 1 seq# 296554 mem# 0: /data/orcl/onlinelog/redo01a.log Successful open of redo thread 1 MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set SMON: enabling cache recovery [33941] Successfully onlined Undo Tablespace 143. Undo initialization finished serial:0 start:12793234 end:12793304 diff:70 (0 seconds) Verifying file header compatibility for 11g tablespace encryption.. Verifying 11g file header compatibility for tablespace encryption completed SMON: enabling tx recovery Database Characterset is ZHS16GBK No Resource Manager plan active replication_dependency_tracking turned off (no async multimaster replication found) Starting background process QMNC Sat Nov 01 18:06:01 2025 QMNC started with pid=20, OS id=33973 Completed: ALTER DATABASE OPEN
至此基本上完成本次恢复任务,后续根据alert日志,有个别表可能由于在file# 262中丢失一些数据导致不一致的问题进行单独,其他没有太大问题,最快帮客户恢复了业务

