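Contact: mobile (17813235971), QQ (107644445)
Title: ASM diskgroup cannot be mounted because partitions are not recognized
Author: 惜分飞 © All rights reserved [Do not reproduce in any form without my consent; I reserve the right to pursue legal action.]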
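A customer contacted us: after a host reboot, data2, one of four diskgroups, could no longer be mounted (ORA-15032, ORA-15017, ORA-15063) and the database could not be opened. They asked us to help analyze and fix the problem.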
Wed Mar 09 18:10:53 2016
NOTE: Assigning number (1,1) to disk (/dev/oracleasm/disks/VOL011)
Wed Mar 09 18:10:53 2016
ERROR: no read quorum in group: required 1, found 0 disks
NOTE: cache dismounting (clean) group 1/0xBD42B778 (DATA2)
NOTE: messaging CKPT to quiesce pins Unix process pid: 45093, image: oracle@BA (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0xBD42B778 (DATA2)
NOTE: cache ending mount (fail) of group DATA2 number=1 incarn=0xbd42b778
NOTE: cache deleting context for group DATA2 1/0xbd42b778
GMON dismounting group 1 at 16 for pid 18, osid 45093
NOTE: Disk DATA2_0001 in mode 0x9 marked for de-assignment
ERROR: diskgroup DATA2 was not mounted
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATA2" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA2"
ERROR: ALTER DISKGROUP DATA2 MOUNT /* asm agent *//* {0:0:431} */
The log makes it clear that data2 cannot be mounted because an ASM disk is missing. Further analysis shows that data2 is made up of two disks.
Mon Sep 14 13:14:35 2015
SQL> create diskgroup data2 external redundancy disk '/dev/oracleasm/disks/VOL010','/dev/oracleasm/disks/VOL011'
NOTE: Assigning number (4,0) to disk (/dev/oracleasm/disks/VOL010)
NOTE: Assigning number (4,1) to disk (/dev/oracleasm/disks/VOL011)
NOTE: initializing header on grp 4 disk DATA2_0000
NOTE: initializing header on grp 4 disk DATA2_0001
NOTE: initiating PST update: grp = 4
Mon Sep 14 13:14:35 2015
GMON updating group 4 at 29 for pid 26, osid 51535
NOTE: group DATA2: initial PST location: disk 0000 (PST copy 0)
NOTE: PST update grp = 4 completed successfully
NOTE: cache registered group DATA2 number=4 incarn=0xea085f62
NOTE: cache began mount (first) of group DATA2 number=4 incarn=0xea085f62
NOTE: cache opening disk 0 of grp 4: DATA2_0000 path:/dev/oracleasm/disks/VOL010
NOTE: cache opening disk 1 of grp 4: DATA2_0001 path:/dev/oracleasm/disks/VOL011
NOTE: cache creating group 4/0xEA085F62 (DATA2)
NOTE: cache mounting group 4/0xEA085F62 (DATA2) succeeded
NOTE: allocating F1X0 on grp 4 disk DATA2_0000
NOTE: diskgroup must now be re-mounted prior to first use
NOTE: cache dismounting (clean) group 4/0xEA085F62 (DATA2)
NOTE: messaging CKPT to quiesce pins Unix process pid: 51535, image: oracle@BA (TNS V1-V3)
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 4/0xEA085F62 (DATA2)
GMON dismounting group 4 at 30 for pid 26, osid 51535
GMON dismounting group 4 at 31 for pid 26, osid 51535
NOTE: Disk DATA2_0000 in mode 0x7e marked for de-assignment
NOTE: Disk DATA2_0001 in mode 0x7e marked for de-assignment
SUCCESS: diskgroup DATA2 was created
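From this we can confirm that the data2 diskgroup consists of two disks, VOL010 and VOL011. Only VOL011 is discovered now, so data2 cannot be mounted. The system uses asmlib. Combining the output of the oracleasm querydisk command with the devices listed by fdisk,
we can be fairly sure that the missing VOL010 should sit on the mpathb disk (it is the only disk and partition not in use; every other disk and partition is already in use as an ASM disk by the asmlib disks that can currently be queried).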
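The comparison between the creation log and the failed mount can also be done mechanically. Below is a minimal sketch, assuming the "Assigning number" lines look exactly like the ones quoted in the two logs above; the function names are illustrative and the log excerpts are trimmed to the relevant lines.

```python
import re

# Matches lines such as:
# "NOTE: Assigning number (4,0) to disk (/dev/oracleasm/disks/VOL010)"
ASSIGN_RE = re.compile(r"Assigning number \(\d+,\d+\) to disk \(([^)]+)\)")

def assigned_disks(log_text):
    """Return the set of disk paths named in 'Assigning number' lines."""
    return set(ASSIGN_RE.findall(log_text))

def missing_disks(create_log, mount_log):
    """Disks seen when the diskgroup was created but not at mount time."""
    return sorted(assigned_disks(create_log) - assigned_disks(mount_log))

# Excerpt from the diskgroup creation log.
create_log = """\
NOTE: Assigning number (4,0) to disk (/dev/oracleasm/disks/VOL010)
NOTE: Assigning number (4,1) to disk (/dev/oracleasm/disks/VOL011)
"""

# Excerpt from the failed mount attempt.
mount_log = """\
NOTE: Assigning number (1,1) to disk (/dev/oracleasm/disks/VOL011)
ERROR: no read quorum in group: required 1, found 0 disks
"""

print(missing_disks(create_log, mount_log))
```

The comparison is by path rather than by group number, because the group number differs between the two logs (4 at creation, 1 at the failed mount).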
Disk /dev/mapper/mpathb: 3846.7 GB, 3846677987328 bytes
255 heads, 63 sectors/track, 467665 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

               Device Boot      Start         End      Blocks   Id  System
/dev/mapper/mpathbp1                1      267350  2147483647+  ee  GPT

Disk /dev/mapper/mpathbp1: 3846.7 GB, 3846675890176 bytes
255 heads, 63 sectors/track, 467665 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xb84bb99a

                 Device Boot      Start         End      Blocks   Id  System
/dev/mapper/mpathbp1p1                1      200513  1610620641   83  Linux
/dev/mapper/mpathbp1p2           200514      267349   536860170   83  Linux
/dev/mapper/mpathbp1p3           267350      467665  1609038270   83  Linux
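Here we notice something odd: mpathb was first partitioned with parted into a single mpathbp1 partition, and fdisk was then used inside that partition to create three sub-partitions, p1p1, p1p2 and p1p3. Next we checked the devices under /dev/mapper/.
The three sub-partitions p1p1, p1p2 and p1p3 that should belong to mpathb are not there. The cause is now fairly clear: mpathb was partitioned with parted first and then with fdisk, and after the operating system rebooted the sub-partitions could no longer be recognized. There are three ways to approach the fix.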
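1. The partition table on disk is intact; it simply has not been synchronized to the operating system. Finding a way to synchronize it is enough. This is an OS topic and is not covered here.
2. Rebuild the data directly from the two ASM disks of data2 by reassembling the data files. Since the three sub-partitions are not visible, scan the mpathbp1 partition directly. See: asm disk header 彻底损坏恢复 (recovery from a completely corrupted ASM disk header).
3. For an ASM disk, a partition mainly defines an offset into the device and a size. If we find the offset and determine the ASM disk size, we can dd that region to a new disk device and then mount the diskgroup directly. This article focuses on the third method.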
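The geometry in the fdisk listing already hints at the offset and size that the third method needs. The sketch below is only a rough estimate. It assumes the legacy DOS-compatible layout, in which the first partition starts one track (63 sectors) into the device and ends on a cylinder boundary. That assumption is not stated in the listing, and the function name is illustrative. The real values still have to be confirmed from the on-disk ASM header.

```python
SECTOR_BYTES = 512            # "Sector size (logical/physical): 512 bytes"
SECTORS_PER_TRACK = 63        # "63 sectors/track"
SECTORS_PER_CYLINDER = 16065  # "Units = cylinders of 16065 * 512"

def first_partition_extent(end_cylinder):
    """Estimate (byte offset, byte length) of a first partition.

    ASSUMPTION: legacy DOS-compatible alignment, i.e. the partition
    starts at sector 63 and runs to the end of `end_cylinder`
    (fdisk prints cylinders counted from 1).
    """
    start_sector = SECTORS_PER_TRACK
    end_sector_exclusive = end_cylinder * SECTORS_PER_CYLINDER
    length_sectors = end_sector_exclusive - start_sector
    return start_sector * SECTOR_BYTES, length_sectors * SECTOR_BYTES

# mpathbp1p1 is listed with Start 1 and End 200513.
offset, length = first_partition_extent(200513)
print(offset)                    # 32256
print(length // 1024)            # 1610620641, the "Blocks" value fdisk shows
print(length // (1024 * 1024))   # 1572871
```

The computed size in 1 KB blocks equals the Blocks column that fdisk prints for mpathbp1p1, which suggests the alignment assumption holds for this disk.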
Use dd to dump the beginning of mpathbp1, then use bbed to find the offset. The main clue is the first block that starts with 01820101.
BBED> d
 File: bp1 (0)
 Block: 64               Offsets:    0 to   63           Dba:0x00000000
------------------------------------------------------------------------
 01820101 00000000 00000080 bc60223c 00000000 00000000 00000000 00000000
 4f52434c 4449534b 564f4c30 31300000 00000000 00000000 00000000 00000000

 <32 bytes per line>

BBED> show all
        FILE#           0
        BLOCK#          64
        OFFSET          0
        DBA             0x00000000 (0 0,64)
        FILENAME        bp1
        BIFILE          bifile.bbd
        LISTFILE
        BLOCKSIZE       512
        MODE            Browse
        EDIT            Unrecoverable
        IBASE           Dec
        OBASE           Dec
        WIDTH           80
        COUNT           64
        LOGFILE         log.bbd
        SPOOL           No
This essentially pins the ASM disk header at an offset of 32256 bytes into mpathbp1, that is 63 sectors of 512 bytes. We then dd out the ASM disk header for analysis.
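The same search can be scripted instead of paging through bbed. This is a minimal sketch, assuming the dumped image is scanned in 512-byte blocks as in the bbed session. The 01820101 marker is the criterion used in this article. The extra check for the ORCLDISK string at byte 32 is an added guard taken from the second line of the dump, and it only makes sense for asmlib disks. The function name and the synthetic test image are illustrative.

```python
ASM_HEADER_MAGIC = bytes.fromhex("01820101")  # first 4 bytes of the block in the dump
ASMLIB_TAG = b"ORCLDISK"                      # bytes 32..39 in the dump (asmlib disks)

def find_asm_header(data, block_size=512):
    """Return (byte offset, asmlib label) of the first matching block, or None."""
    for offset in range(0, len(data) - block_size + 1, block_size):
        block = data[offset:offset + block_size]
        if block[:4] == ASM_HEADER_MAGIC and block[32:40] == ASMLIB_TAG:
            # In the dump the label follows ORCLDISK and is NUL padded.
            label = block[40:64].split(b"\x00", 1)[0].decode("ascii", "replace")
            return offset, label
    return None

# Synthetic image: 63 empty sectors followed by a header-like block.
header = bytearray(512)
header[0:4] = ASM_HEADER_MAGIC
header[32:46] = b"ORCLDISKVOL010"
image = bytes(63 * 512) + bytes(header)

print(find_asm_header(image))  # (32256, 'VOL010')
```

On a real dump the image would be read from the file produced by dd. A match is only a candidate, and the header should still be inspected with kfed before any data is copied.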
Use kfed to view the disk header information.
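We can now be fairly sure that the ASM disk size is 1572871M and that the disk offset is 32256. We used dd to copy that region to a new disk device and then ran oracleasm scandisks.
After that, data2 mounted successfully, the database opened normally, and the database was fully recovered.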
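The copy step is essentially "skip the offset, then copy a fixed number of bytes". The sketch below illustrates that idea on temporary files. It is not the dd command used in this case, and `copy_region` is a hypothetical helper name. On a real system the source would be the mpathbp1 device, the offset 32256 and the length the ASM disk size. Such a copy should only be run against a target that is known to be free.

```python
import os
import tempfile

def copy_region(src_path, dst_path, offset, length, chunk_size=1024 * 1024):
    """Copy `length` bytes starting at byte `offset` of src_path to the
    beginning of dst_path. Returns the number of bytes actually copied."""
    copied = 0
    # "r+b" writes into an existing target without truncating it first.
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        src.seek(offset)
        while copied < length:
            buf = src.read(min(chunk_size, length - copied))
            if not buf:  # source ended before `length` bytes were read
                break
            dst.write(buf)
            copied += len(buf)
    return copied

# Small demonstration on temporary files instead of real devices.
with tempfile.TemporaryDirectory() as workdir:
    src_path = os.path.join(workdir, "partition.img")
    dst_path = os.path.join(workdir, "new_disk.img")
    payload = b"ASMDISK!" * 512  # stands in for the ASM disk contents
    with open(src_path, "wb") as f:
        f.write(bytes(32256) + payload + b"trailing data")
    open(dst_path, "wb").close()  # the target must already exist
    copied = copy_region(src_path, dst_path, 32256, len(payload), chunk_size=1000)
    with open(dst_path, "rb") as f:
        restored = f.read()

print(copied == len(payload), restored == payload)
```

Copying in bounded chunks keeps memory use flat regardless of the region size, which matters for a region of roughly 1.5 TB.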