Contact: mobile (17813235971), QQ (107644445)
Author: 惜分飞 © All rights reserved. [Do not reproduce in any form without my consent; I reserve the right to pursue legal action.]
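A reader asked for help: through carelessness, an operator had mapped the ASM disks to another machine and formatted them as Windows NTFS file systems. As a result the ASM disk group failed and the database became unusable.
The ASM alert log reports ORA-27072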
Mon Nov 30 12:00:13 2015
Errors in file c:\app\administrator\diag\asm\+asm\+asm\trace\+asm_gmon_868.trc:
ORA-27070: async read/write failed
OSD-04008: WriteFile() 失败, 无法写入文件
O/S-Error: (OS 21) 设备未就绪。
WARNING: IO Failed. group:1 disk(number.incarnation):0.0xf0f0bbfb disk_path:\\.\ORCLDISKDATA0 AU:1 disk_offset(bytes):2093056 io_size:4096 operation:Write type:synchronous result:I/O error process_id:868
WARNING: disk 0.4042308603 (DATA_0000) not responding to heart beat
ERROR: too many offline disks in PST (grp 1)
WARNING: Disk DATA_0000 in mode 0x7f will be taken offline
Mon Nov 30 12:00:13 2015
NOTE: process 576:37952 initiating offline of disk 0.4042308603 (DATA_0000) with mask 0x7e in group 1
WARNING: Disk DATA_0000 in mode 0x7f is now being taken offline
NOTE: initiating PST update: grp = 1, dsk = 0/0xf0f0bbfb, mode = 0x15
kfdp_updateDsk(): 5
kfdp_updateDskBg(): 5
Errors in file c:\app\administrator\diag\asm\+asm\+asm\trace\+asm_gmon_868.trc:
ORA-27072: File I/O error
WARNING: IO Failed. group:1 disk(number.incarnation):1.0xf0f0bbfc disk_path:\\.\ORCLDISKDATA1 AU:1 disk_offset(bytes):1048576 io_size:4096 operation:Read type:synchronous result:I/O error process_id:868
Errors in file c:\app\administrator\diag\asm\+asm\+asm\trace\+asm_gmon_868.trc:
ORA-27072: File I/O error
WARNING: IO Failed. group:1 disk(number.incarnation):1.0xf0f0bbfc disk_path:\\.\ORCLDISKDATA1 AU:1 disk_offset(bytes):1052672 io_size:4096 operation:Read type:synchronous result:I/O error process_id:868
Errors in file c:\app\administrator\diag\asm\+asm\+asm\trace\+asm_gmon_868.trc:
ORA-27072: File I/O error
WARNING: IO Failed. group:1 disk(number.incarnation):2.0xf0f0bbfd disk_path:\\.\ORCLDISKDATA2 AU:1 disk_offset(bytes):1048576 io_size:4096 operation:Read type:synchronous result:I/O error process_id:868
Errors in file c:\app\administrator\diag\asm\+asm\+asm\trace\+asm_gmon_868.trc:
ORA-27072: File I/O error
WARNING: IO Failed. group:1 disk(number.incarnation):2.0xf0f0bbfd disk_path:\\.\ORCLDISKDATA2 AU:1 disk_offset(bytes):1052672 io_size:4096 operation:Read type:synchronous result:I/O error process_id:868
Errors in file c:\app\administrator\diag\asm\+asm\+asm\trace\+asm_gmon_868.trc:
ORA-27072: File I/O error
WARNING: IO Failed. group:1 disk(number.incarnation):3.0xf0f0bbfe disk_path:\\.\ORCLDISKDATA3 AU:1 disk_offset(bytes):1048576 io_size:4096 operation:Read type:synchronous result:I/O error process_id:868
Errors in file c:\app\administrator\diag\asm\+asm\+asm\trace\+asm_gmon_868.trc:
ORA-27072: File I/O error
WARNING: IO Failed. group:1 disk(number.incarnation):3.0xf0f0bbfe disk_path:\\.\ORCLDISKDATA3 AU:1 disk_offset(bytes):1052672 io_size:4096 operation:Read type:synchronous result:I/O error process_id:868
Errors in file c:\app\administrator\diag\asm\+asm\+asm\trace\+asm_gmon_868.trc:
ORA-27072: File I/O error
WARNING: IO Failed. group:1 disk(number.incarnation):4.0xf0f0bbff disk_path:\\.\ORCLDISKDATA4 AU:1 disk_offset(bytes):1048576 io_size:4096 operation:Read type:synchronous result:I/O error process_id:868
Errors in file c:\app\administrator\diag\asm\+asm\+asm\trace\+asm_gmon_868.trc:
ORA-27072: File I/O error
WARNING: IO Failed. group:1 disk(number.incarnation):4.0xf0f0bbff disk_path:\\.\ORCLDISKDATA4 AU:1 disk_offset(bytes):1052672 io_size:4096 operation:Read type:synchronous result:I/O error process_id:868
Errors in file c:\app\administrator\diag\asm\+asm\+asm\trace\+asm_gmon_868.trc:
ORA-27072: File I/O error
WARNING: IO Failed. group:1 disk(number.incarnation):6.0xf0f0bc01 disk_path:\\.\ORCLDISKDATA6 AU:1 disk_offset(bytes):1048576 io_size:4096 operation:Read type:synchronous result:I/O error process_id:868
Errors in file c:\app\administrator\diag\asm\+asm\+asm\trace\+asm_gmon_868.trc:
ORA-27072: File I/O error
WARNING: IO Failed. group:1 disk(number.incarnation):6.0xf0f0bc01 disk_path:\\.\ORCLDISKDATA6 AU:1 disk_offset(bytes):1052672 io_size:4096 operation:Read type:synchronous result:I/O error process_id:868
Errors in file c:\app\administrator\diag\asm\+asm\+asm\trace\+asm_gmon_868.trc:
ORA-27072: File I/O error
WARNING: IO Failed. group:1 disk(number.incarnation):7.0xf0f0bc02 disk_path:\\.\ORCLDISKDATA7 AU:1 disk_offset(bytes):1048576 io_size:4096 operation:Read type:synchronous result:I/O error process_id:868
Errors in file c:\app\administrator\diag\asm\+asm\+asm\trace\+asm_gmon_868.trc:
ORA-27072: File I/O error
WARNING: IO Failed. group:1 disk(number.incarnation):7.0xf0f0bc02 disk_path:\\.\ORCLDISKDATA7 AU:1 disk_offset(bytes):1052672 io_size:4096 operation:Read type:synchronous result:I/O error process_id:868
ERROR: no PST quorum in group: required 1, found 0
WARNING: Disk DATA_0000 in mode 0x7f offline aborted
Mon Nov 30 12:00:14 2015
SQL> alter diskgroup DATA dismount force /* ASM SERVER */
NOTE: cache dismounting (not clean) group 1/0xBB404B03 (DATA)
Mon Nov 30 12:00:14 2015
NOTE: halting all I/Os to diskgroup DATA
Mon Nov 30 12:00:14 2015
NOTE: LGWR doing non-clean dismount of group 1 (DATA)
NOTE: LGWR sync ABA=367.7265 last written ABA 367.7265
NOTE: cache dismounted group 1/0xBB404B03 (DATA)
kfdp_dismount(): 6
kfdp_dismountBg(): 6
NOTE: De-assigning number (1,0) from disk (\\.\ORCLDISKDATA0)
NOTE: De-assigning number (1,1) from disk (\\.\ORCLDISKDATA1)
NOTE: De-assigning number (1,2) from disk (\\.\ORCLDISKDATA2)
NOTE: De-assigning number (1,3) from disk (\\.\ORCLDISKDATA3)
NOTE: De-assigning number (1,4) from disk (\\.\ORCLDISKDATA4)
NOTE: De-assigning number (1,5) from disk (\\.\ORCLDISKDATA5)
NOTE: De-assigning number (1,6) from disk (\\.\ORCLDISKDATA6)
NOTE: De-assigning number (1,7) from disk (\\.\ORCLDISKDATA7)
SUCCESS: diskgroup DATA was dismounted
NOTE: cache deleting context for group DATA 1/-1153414397
SUCCESS: alter diskgroup DATA dismount force /* ASM SERVER */
ERROR: PST-initiated MANDATORY DISMOUNT of group DATA
The ASM log makes the cause clear: the ASM disks could no longer be accessed normally, ORA-27072 was raised, and the disk group was force-dismounted.
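When the alert log is this long, it can help to condense the "IO Failed" warnings into a per-disk summary. The following is a minimal sketch, assuming the warning format shown in the log above; the function name `summarize_io_failures` and the `SAMPLE` excerpt are illustrative and not part of any Oracle tooling.

```python
import re
from collections import defaultdict

# Pattern for the "WARNING: IO Failed." entries as they appear in the ASM alert log.
# \s+ is used between fields so the match survives entries wrapped across lines.
IO_FAILED = re.compile(
    r"WARNING: IO Failed\.\s+group:(?P<group>\d+)\s+"
    r"disk\(number\.incarnation\):(?P<disk>\d+)\.(?P<inc>0x[0-9a-fA-F]+)\s+"
    r"disk_path:(?P<path>\S+)\s+AU:(?P<au>\d+)\s+"
    r"disk_offset\(bytes\):(?P<offset>\d+)\s+io_size:(?P<size>\d+)\s+"
    r"operation:(?P<op>\w+)"
)


def summarize_io_failures(log_text):
    """Return {disk_number: [(operation, byte_offset), ...]} for every failed I/O."""
    failures = defaultdict(list)
    for match in IO_FAILED.finditer(log_text):
        disk = int(match.group("disk"))
        failures[disk].append((match.group("op"), int(match.group("offset"))))
    return dict(failures)


# Short excerpt copied from the log above (hypothetical variable name).
SAMPLE = r"""
WARNING: IO Failed. group:1 disk(number.incarnation):0.0xf0f0bbfb disk_path:\\.\ORCLDISKDATA0 AU:1 disk_offset(bytes):2093056 io_size:4096 operation:Write type:synchronous result:I/O error process_id:868
WARNING: IO Failed. group:1 disk(number.incarnation):1.0xf0f0bbfc disk_path:\\.\ORCLDISKDATA1 AU:1 disk_offset(bytes):1048576 io_size:4096 operation:Read type:synchronous result:I/O error process_id:868
WARNING: IO Failed. group:1 disk(number.incarnation):1.0xf0f0bbfc disk_path:\\.\ORCLDISKDATA1 AU:1 disk_offset(bytes):1052672 io_size:4096 operation:Read type:synchronous result:I/O error process_id:868
"""

if __name__ == "__main__":
    for disk, events in sorted(summarize_io_failures(SAMPLE).items()):
        print(disk, events)
```

Run against the full log excerpt, a summary like this lists disks 0 to 4, 6 and 7, while disk 5 has no failed I/O in the portion shown.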
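Analyzing the disks
After talking with the customer, we confirmed that drives I: through O: were originally ASM disks and had been formatted as NTFS. Combined with the asmtool output, we found that one ASM disk had not been formatted: of the 8 disks in the disk group, 7 had been formatted.
Inspecting the disks with kfed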
C:\Users\Administrator>kfed read '\\.\J:'
kfbh.endian: 235 ; 0x000: 0xeb
kfbh.hard: 82 ; 0x001: 0x52
kfbh.type: 144 ; 0x002: *** Unknown Enum ***
kfbh.datfmt: 78 ; 0x003: 0x4e
kfbh.block.blk: 542328404 ; 0x004: T=0 NUMB=0x20534654
kfbh.block.obj: 2105376 ; 0x008: TYPE=0x0 NUMB=0x2020
kfbh.check: 2050 ; 0x00c: 0x00000802
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 63488 ; 0x014: 0x0000f800
kfbh.spare1: 16711743 ; 0x018: 0x00ff003f
kfbh.spare2: 2048 ; 0x01c: 0x00000800
ERROR!!!, failed to get the oracore error message

C:\Users\Administrator>kfed read '\\.\J:' blkn=2
kfbh.endian: 70 ; 0x000: 0x46
kfbh.hard: 73 ; 0x001: 0x49
kfbh.type: 76 ; 0x002: *** Unknown Enum ***
kfbh.datfmt: 69 ; 0x003: 0x45
kfbh.block.blk: 196656 ; 0x004: T=0 NUMB=0x30030
kfbh.block.obj: 33563364 ; 0x008: TYPE=0x0 NUMB=0x22e4
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 65537 ; 0x010: 0x00010001
kfbh.fcn.wrap: 65592 ; 0x014: 0x00010038
kfbh.spare1: 416 ; 0x018: 0x000001a0
kfbh.spare2: 1024 ; 0x01c: 0x00000400
ERROR!!!, failed to get the oracore error message

C:\Users\Administrator>kfed read '\\.\J:' blkn=256
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 13 ; 0x002: KFBTYP_PST_NONE
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 2147483648 ; 0x004: T=1 NUMB=0x0
kfbh.block.obj: 2147483654 ; 0x008: TYPE=0x8 NUMB=0x6
kfbh.check: 17662471 ; 0x00c: 0x010d8207
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
ERROR!!!, failed to get the oracore error message

C:\Users\Administrator>kfed read '\\.\J:' blkn=510
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 254 ; 0x004: T=0 NUMB=0xfe
kfbh.block.obj: 2147483654 ; 0x008: TYPE=0x8 NUMB=0x6
kfbh.check: 717599272 ; 0x00c: 0x2ac5b228
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISKDATA6 ; 0x000: length=13
kfdhdb.driver.reserved[0]: 1096040772 ; 0x008: 0x41544144
kfdhdb.driver.reserved[1]: 54 ; 0x00c: 0x00000036
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
…………
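The analysis shows that the backup block of the ASM disk header was not overwritten, so in principle the disk group can be recovered from the backup block, which makes the recovery considerably easier.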
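The "Unknown Enum" values in blocks 0 and 2 are not random. kfed is interpreting NTFS structures as if they were an ASM block header. The sketch below re-packs the values kfed printed back into raw bytes, assuming kfed decoded them as little-endian on this x86 Windows host; the helper name `kfbh_prefix_bytes` is hypothetical.

```python
import struct


def kfbh_prefix_bytes(endian, hard, type_, datfmt, blk, obj):
    """Re-pack the first 12 bytes of a block from the kfbh.* values kfed printed.

    Assumes little-endian layout: four single-byte fields followed by two
    32-bit integers (kfbh.block.blk and kfbh.block.obj).
    """
    return struct.pack("<4BII", endian, hard, type_, datfmt, blk, obj)


# Values taken from "kfed read '\\.\J:'" (block 0) in the output above.
block0 = kfbh_prefix_bytes(235, 82, 144, 78, 542328404, 2105376)

# Values taken from "kfed read '\\.\J:' blkn=2" in the output above.
block2 = kfbh_prefix_bytes(70, 73, 76, 69, 196656, 33563364)

if __name__ == "__main__":
    print(block0)      # jump instruction EB 52 90 followed by the OEM ID
    print(block2[:4])  # record signature
```

Block 0 starts with the bytes EB 52 90 followed by the OEM ID `NTFS    `, which is how an NTFS boot sector begins, and block 2 starts with `FILE`, the signature NTFS uses for MFT file records. Blocks 256 and 510, by contrast, still decode as valid ASM metadata.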
Repairing the disk header with kfed
C:\Users\Administrator> kfed repair '\\.\J:'

C:\Users\Administrator>kfed read '\\.\J:'
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 254 ; 0x004: T=0 NUMB=0xfe
kfbh.block.obj: 2147483654 ; 0x008: TYPE=0x8 NUMB=0x6
kfbh.check: 717599272 ; 0x00c: 0x2ac5b228
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISKDATA6 ; 0x000: length=13
kfdhdb.driver.reserved[0]: 1096040772 ; 0x008: 0x41544144
kfdhdb.driver.reserved[1]: 54 ; 0x00c: 0x00000036
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
…………
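The reason `blkn=510` was the block to check can be worked out from the disk geometry. As far as I know, from 11.1.0.7 onward ASM keeps a backup copy of the disk header in the second-to-last metadata block of allocation unit 1. The sketch below computes that block number, assuming a 4096-byte metadata block and a 1 MiB AU (the log above shows AU:1 starting at byte offset 1048576); the function name is hypothetical.

```python
def backup_header_block(au_size_bytes, metadata_block_size=4096):
    """Block number of the backup disk header: second-to-last block of AU 1.

    Assumption: AU 0 and AU 1 are laid out back to back from the start of
    the disk, so AU 1 ends at block 2 * blocks_per_au - 1.
    """
    blocks_per_au = au_size_bytes // metadata_block_size
    return 2 * blocks_per_au - 2


if __name__ == "__main__":
    one_mib = 1024 * 1024
    blkn = backup_header_block(one_mib)
    print("blkn =", blkn, "byte offset =", blkn * 4096)
```

For a 1 MiB AU this gives block 510, matching the `kfed read ... blkn=510` call used earlier. With a different AU size the block number changes, so the value should be recomputed rather than reused.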
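Confirming the ASM disk information
After the same procedure was applied to all 7 formatted disks, the tool displayed the relevant disk information.
Recovery
From the way NTFS lays out its on-disk structures, we know that even though the backup block of the ASM disk header is intact, quite a few AUs in the middle of each ASM disk will still have been destroyed.
In this situation it is not appropriate to copy the datafiles out directly with a tool: the metadata that records where a file's blocks live may itself have been overwritten, which would make the copied files invalid. We tested this during the recovery: small files copied out fine, but large files, once copied and checked with dbv, contained many corrupt blocks. Instead we used a tool (see "asm disk header 彻底损坏恢复", recovery from a completely destroyed ASM disk header) to scan the disks at a low level and reassemble the datafiles directly from the ASM disks. We then combined these with the control files, redo files and parameter file that had been copied out, renamed the relevant paths, and opened the database directly.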
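The scanning tool itself is not shown here, so the following is only a rough illustration of the kind of check a low-level scan can apply to each candidate block, not the tool's actual method. It assumes the commonly documented Oracle block header layout on a little-endian platform (type, format, two spare bytes, 4-byte RDBA, 4-byte SCN base, 2-byte SCN wrap, sequence, flag) and the tail check stored in the last 4 bytes of the block. All function names are hypothetical.

```python
import struct

HEADER_FMT = "<BBBBIIHBB"  # type, format, spare1, spare2, rdba, scn_base, scn_wrap, seq, flg


def parse_block_header(block):
    """Decode the leading header fields and verify the tail check of one block."""
    type_, _fmt, _s1, _s2, rdba, scn_base, _wrap, seq, _flg = struct.unpack_from(
        HEADER_FMT, block, 0
    )
    (tail,) = struct.unpack_from("<I", block, len(block) - 4)
    # Assumed tail layout: low 2 bytes of the SCN base, then block type, then sequence.
    expected_tail = ((scn_base & 0xFFFF) << 16) | (type_ << 8) | seq
    return {
        "type": type_,
        "rel_file": rdba >> 22,        # upper 10 bits of the RDBA
        "block_no": rdba & 0x3FFFFF,   # lower 22 bits of the RDBA
        "plausible": type_ != 0 and tail == expected_tail,
    }


def make_block(type_, rdba, scn_base, seq, size=8192):
    """Build a synthetic block with a consistent header and tail (demo data only)."""
    block = bytearray(size)
    struct.pack_into(HEADER_FMT, block, 0, type_, 0xA2, 0, 0, rdba, scn_base, 0, seq, 0)
    tail = ((scn_base & 0xFFFF) << 16) | (type_ << 8) | seq
    struct.pack_into("<I", block, size - 4, tail)
    return bytes(block)


if __name__ == "__main__":
    good = make_block(6, (5 << 22) | 1234, 0x12345678, 1)
    overwritten = b"FILE" + bytes(8192 - 4)  # e.g. an NTFS record written over the block
    print(parse_block_header(good))
    print(parse_block_header(overwritten))
```

A block that passes such a check can be placed by its relative file number and block number, while a block that fails (for example one overwritten by NTFS metadata) has to be treated as lost. A real tool needs far more than this, including checksum validation and handling for bigfile tablespaces.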
Q:\>sqlplus / as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on 星期三 1月 22 16:08:18 2014

Copyright (c) 1982, 2010, Oracle. All rights reserved.

连接到:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> set pages 1000
SQL> col name for a100
SQL> set lines 150
SQL> select file#,name from v$datafile;

     FILE# NAME
---------- --------------------------------------------------------------------
         1 +DATA/vspdb/datafile/system.256.778520603
         2 +DATA/vspdb/datafile/sysaux.257.778520603
         3 +DATA/vspdb/datafile/undotbs1.258.778520603
         4 +DATA/vspdb/datafile/users.259.778520603
         5 +DATA/vspdb/datafile/vsp_tbs.293.779926097
…………
       147 +DATA/vspdb/datafile/index_dg.418.864665747
       148 +DATA/vspdb/datafile/data_dg.419.864667053
       149 +DATA/vspdb/datafile/vsp_mm_tbs.420.890410367
       150 +DATA/vspdb/datafile/vsp_mm_tbs.421.890410457

SQL> select member from v$logfile;

MEMBER
-------------------------------------------------------------------------------------
+DATA/vspdb/onlinelog/group_7.263.862676593
+DATA/vspdb/onlinelog/group_7.262.862676601
+DATA/vspdb/onlinelog/group_4.410.862652291
+DATA/vspdb/onlinelog/group_4.411.862652307
+DATA/vspdb/onlinelog/group_5.412.862653715
+DATA/vspdb/onlinelog/group_5.413.862653727
+DATA/vspdb/onlinelog/group_6.414.862676425
+DATA/vspdb/onlinelog/group_6.415.862676433

Rename the datafiles and redo files, then open the database

SQL> recover database;
完成介质恢复。
SQL> alter database open;

数据库已更改。

已用时间: 00: 00: 04.51
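With 150 datafiles and 8 redo members, the rename step is easier to script than to type. The session above does not show the exact commands or the target location, so the following is a sketch that generates `ALTER DATABASE RENAME FILE` statements from the `+DATA/...` names. The target directory `Q:\restore` and the choice to keep the ASM file name as the new file name are assumptions for illustration.

```python
def rename_statements(asm_paths, target_dir):
    """Build one ALTER DATABASE RENAME FILE statement per ASM path.

    The new name is target_dir plus the last component of the ASM path.
    Both the directory and this naming scheme are illustrative assumptions.
    """
    target_dir = target_dir.rstrip("\\/")
    statements = []
    for old in asm_paths:
        base = old.rsplit("/", 1)[-1]
        new = f"{target_dir}\\{base}"
        statements.append(f"ALTER DATABASE RENAME FILE '{old}' TO '{new}';")
    return statements


if __name__ == "__main__":
    # Names taken from the v$datafile and v$logfile output above.
    paths = [
        "+DATA/vspdb/datafile/system.256.778520603",
        "+DATA/vspdb/onlinelog/group_7.263.862676593",
    ]
    for stmt in rename_statements(paths, "Q:\\restore"):
        print(stmt)
```

The generated statements are meant to be run with the database mounted and the recovered files already in place under the new names. The same statement form covers both datafiles and online redo log members.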
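Because some blocks had been overwritten and were replaced with empty blocks, any access that touches such a block raises ORA-08103 (see "模拟普通ORA-08103并解决" and "模拟极端ORA-08103并解决"). For objects affected this way, the simplest fix is to extract the data with dul, truncate the table, and load the data back in. Of course, if you want to be completely safe, rebuilding the database logically is the most reliable option.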