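Contact: mobile/WeChat (+86 17813235971) QQ (107644445)
Title: "WARNING: Read Failed." causes an ASM disk group failure
Author: 惜分飞 © All rights reserved [No reproduction in any form without my consent; otherwise I reserve the right to pursue legal liability.]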
A customer expanded an ASM disk group, and some time later the ASM DATA disk group dismounted outright. The ASM alert log shows:
Wed May 29 18:37:25 2019
SUCCESS: ALTER DISKGROUP DATA ADD DISK '/dev/oracleasm/disks/DATA_0028' SIZE 511993M , '/dev/oracleasm/disks/DATA_0027' SIZE 511993M , '/dev/oracleasm/disks/DATA_0026' SIZE 511993M , '/dev/oracleasm/disks/DATA_0025' SIZE 511993M /* ASMCA */
NOTE: starting rebalance of group 1/0x9e18e2f1 (DATA) at power 1
Wed May 29 18:37:26 2019
Starting background process ARB0
Wed May 29 18:37:26 2019
ARB0 started with pid=34, OS id=96638
NOTE: assigning ARB0 to group 1/0x9e18e2f1 (DATA) with 1 parallel I/O
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
cellip.ora not found.
Wed May 29 19:21:43 2019
WARNING: Read Failed. group:1 disk:27 AU:0 offset:360448 size:4096
WARNING: cache failed reading from group=1(DATA) dsk=27 blk=88 count=1 from disk= 27 (DATA_0027) kfkist=0x20 status=0x02 osderr=0x0 file=kfc.c line=11596
ERROR: cache failed to read group=1(DATA) dsk=27 blk=88 from disk(s): 27(DATA_0027)
ORA-15080: synchronous I/O operation to a disk failed
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 704
Additional information: -1
NOTE: cache initiating offline of disk 27 group DATA
NOTE: process _user31879_+asm1 (31879) initiating offline of disk 27.3915911747 (DATA_0027) with mask 0x7e in group 1
NOTE: initiating PST update: grp = 1, dsk = 27/0xe9681243, mask = 0x6a, op = clear
Wed May 29 19:21:43 2019
GMON updating disk modes for group 1 at 10 for pid 35, osid 31879
ERROR: Disk 27 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 1)
Wed May 29 19:21:43 2019
NOTE: cache dismounting (not clean) group 1/0x9E18E2F1 (DATA)
NOTE: messaging CKPT to quiesce pins Unix process pid: 90256, image: oracle@ftz-db-o1 (B000)
Wed May 29 19:21:43 2019
NOTE: halting all I/Os to diskgroup 1 (DATA)
WARNING: Offline for disk DATA_0027 in mode 0x7f failed.
Wed May 29 19:21:43 2019
NOTE: LGWR doing non-clean dismount of group 1 (DATA)
NOTE: LGWR sync ABA=27.3207 last written ABA 27.3207
Wed May 29 19:21:43 2019
ERROR: ORA-15130 thrown in ARB0 for group number 1
Errors in file /oracle/grid_base/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_96638.trc:
ORA-15130: diskgroup "" is being dismounted
ORA-15130: diskgroup "DATA" is being dismounted
Wed May 29 19:21:43 2019
NOTE: stopping process ARB0
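The key line in the log above is the "WARNING: Read Failed." entry. As a rough illustration (not part of the original analysis), the Python sketch below pulls the fields out of that line and divides the offset by the read size. The function name `parse_read_failed` and the `block_in_au` field are hypothetical, and treating offset / size as the block number is an inference from the log: the result, 88, happens to match the `blk=88` reported on the next line.

```python
import re

# Matches the alert-log line:
#   WARNING: Read Failed. group:G disk:D AU:A offset:O size:S
READ_FAILED = re.compile(
    r"WARNING: Read Failed\. group:(\d+) disk:(\d+) AU:(\d+) offset:(\d+) size:(\d+)"
)


def parse_read_failed(line):
    """Return the numeric fields of a 'Read Failed' warning, or None if absent."""
    match = READ_FAILED.search(line)
    if match is None:
        return None
    group, disk, au, offset, size = map(int, match.groups())
    return {
        "group": group,
        "disk": disk,
        "au": au,
        "offset": offset,
        "size": size,
        # Assumption: blocks inside the AU are `size` bytes each,
        # so the block index is simply offset // size.
        "block_in_au": offset // size,
    }


line = "WARNING: Read Failed. group:1 disk:27 AU:0 offset:360448 size:4096"
info = parse_read_failed(line)
print(info["disk"], info["block_in_au"])
```

Running the same function over a whole alert log makes it easy to see whether the read errors are confined to one disk and one block or are spread more widely.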
A subsequent attempt to mount the DATA disk group succeeded, but the group dismounted again immediately.
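The root cause of the behaviour above is that, after the new disks were added to the ASM disk group, a rebalance started and then hit an I/O read error on disk 27. Because the disk group uses external redundancy, the failing disk cannot be taken offline, so the rebalance cannot complete and the DATA disk group cannot stay mounted. The approach to resolving this is to patch the ASM disk group to disable the rebalance, so that the DATA disk group no longer dismounts, and then carry on with the subsequent recovery.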
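Before touching any ASM metadata, it can help to confirm at the OS level that the read error is reproducible. The following is a minimal Python sketch under stated assumptions, not the procedure used in this case: `probe_read` is a hypothetical helper, and it uses an ordinary buffered read, whereas ASM normally reads with direct I/O, so a cached page could hide a real media error. The demonstration runs against a scratch file rather than a real device.

```python
import errno
import os
import tempfile


def probe_read(path, offset, size):
    """Read `size` bytes at `offset` from `path`.

    Returns "ok" for a full read, "short" if fewer bytes came back,
    and "eio" if the kernel reported errno 5 (Input/output error).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, size, offset)
    except OSError as exc:
        if exc.errno == errno.EIO:
            return "eio"
        raise
    finally:
        os.close(fd)
    return "ok" if len(data) == size else "short"


# Demonstration on an 8 KiB scratch file standing in for the disk.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * 8192)
    scratch = tmp.name
try:
    print(probe_read(scratch, 4096, 4096))  # whole block lies inside the file
    print(probe_read(scratch, 8000, 4096))  # read runs past the end of the file
finally:
    os.unlink(scratch)
```

For the failure in this log the call would be `probe_read('/dev/oracleasm/disks/DATA_0027', 360448, 4096)`, run as a user with read access to the device. Since the failing read is in AU 0, the offset in the log is assumed to be the absolute byte offset on the disk.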