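Contact: mobile/WeChat (+86 17813235971) QQ (107644445)
Title: Hardware failure causing ORA-01242, ORA-01122 and related errors
Author: 惜分飞 © All rights reserved [May not be reproduced in any form without the author's consent; the author reserves the right to pursue legal action.]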
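A customer running a multi-node RAC reported in the morning that the instances on two nodes had failed and asked for a root-cause analysis. The database alert log on one of those nodes shows that access to datafile 1399 failed with ORA-01242, ORA-01122 and related errors, which caused the instance to crash.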
Mon Aug 19 20:48:02 2024
Read of datafile '+DATA/xifenfei_01-157.dbf' (fno 1399) header failed with ORA-01207
Rereading datafile 1399 header failed with ORA-01207
Errors in file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_ckpt_75582.trc:
ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei_01-157.dbf'
ORA-01207: file is more recent than control file - old control file
Errors in file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_ckpt_75582.trc:
ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei_01-157.dbf'
ORA-01207: file is more recent than control file - old control file
CKPT (ospid: 75582): terminating the instance due to error 1242
Mon Aug 19 20:48:02 2024
System state dump requested by (instance=6, osid=75582 (CKPT)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_diag_75520.trc
Termination issued to instance processes. Waiting for the processes to exit
Mon Aug 19 20:48:13 2024
ORA-1092 : opitsk aborting process
Further down in the log, the cluster tried to restart the instance, but the startup failed with ORA-01186 and ORA-01122.
ALTER DATABASE OPEN /* db agent *//* {0:6:39} */
Mon Aug 19 20:49:34 2024
SUCCESS: diskgroup DATA was mounted
Mon Aug 19 20:49:34 2024
NOTE: dependency between database xff and diskgroup resource ora.DATA.dg is established
Mon Aug 19 20:50:41 2024
Picked broadcast on commit scheme to generate SCNs
Mon Aug 19 20:50:42 2024
Read of datafile '+DATA/xifenfei_01-157.dbf' (fno 1399) header failed with ORA-01207
Rereading datafile 1399 header failed with ORA-01207
Errors in file /u01/app/oracle/diag/rdbms/xff/xff6/trace/xff6_dbw0_29208.trc:
ORA-01186: file 1399 failed verification tests
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei_01-157.dbf'
ORA-01207: file is more recent than control file - old control file
File 1399 not verified due to error ORA-01122
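When an alert log is long, it helps to pull out which ORA- codes appear and which datafile they point at. The following is only a minimal sketch for that kind of triage; the function name summarize_alert and the regular expressions are my own assumptions for illustration, not an Oracle tool, and the sample text is copied from the alert log above.

```python
import re
from collections import Counter

# Matches "ORA-01122: ..." as well as the short "ORA-1092 : ..." form.
ORA_RE = re.compile(r"ORA-(\d+)\s*:")
# Matches lines such as: ORA-01110: data file 1399: '+DATA/xifenfei_01-157.dbf'
DATAFILE_RE = re.compile(r"ORA-01110: data file (\d+): '([^']+)'")


def summarize_alert(text):
    """Return (error-code counts, {file number: file name}) for an alert-log excerpt."""
    codes = Counter(f"ORA-{int(m.group(1)):05d}" for m in ORA_RE.finditer(text))
    files = {int(fno): name for fno, name in DATAFILE_RE.findall(text)}
    return codes, files


# Sample lines copied from the alert log excerpt above.
sample = """\
ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
ORA-01122: database file 1399 failed verification check
ORA-01110: data file 1399: '+DATA/xifenfei_01-157.dbf'
ORA-01207: file is more recent than control file - old control file
CKPT (ospid: 75582): terminating the instance due to error 1242
"""

codes, files = summarize_alert(sample)
print(dict(codes))
print(files)
```

In practice the text would be read from the alert log file of the affected instance rather than from an inline string.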
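These errors come from abnormal access to the database file. In my experience, this kind of problem is usually caused by a fault in the underlying layer, and the system messages log does show disk errors at the hardware level.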
Aug 19 20:41:58 xff6 fcoemon: FC_HOST_EVENT 6894 at 1724071318 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:41:58 xff6 kernel: sd 1:0:0:43: [sdas]
Aug 19 20:41:58 xff6 kernel: Sense Key : Recovered Error [current]
Aug 19 20:41:58 xff6 kernel: sd 1:0:0:43: [sdas]
Aug 19 20:41:58 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:03 xff6 kernel: sd 1:0:0:43: [sdas]
Aug 19 20:42:03 xff6 kernel: Sense Key : Recovered Error [current]
Aug 19 20:42:03 xff6 kernel: sd 1:0:0:43: [sdas]
Aug 19 20:42:03 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:03 xff6 fcoemon: FC_HOST_EVENT 6895 at 1724071323 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:07 xff6 fcoemon: FC_HOST_EVENT 6896 at 1724071327 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:07 xff6 kernel: sd 1:0:0:44: [sdat]
Aug 19 20:42:07 xff6 kernel: Sense Key : Recovered Error [current]
Aug 19 20:42:07 xff6 kernel: sd 1:0:0:44: [sdat]
Aug 19 20:42:07 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:12 xff6 fcoemon: FC_HOST_EVENT 6897 at 1724071332 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:12 xff6 kernel: sd 1:0:0:44: [sdat]
Aug 19 20:42:12 xff6 kernel: Sense Key : Recovered Error [current]
Aug 19 20:42:12 xff6 kernel: sd 1:0:0:44: [sdat]
Aug 19 20:42:12 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:25 xff6 fcoemon: FC_HOST_EVENT 6898 at 1724071345 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:25 xff6 kernel: sd 1:0:0:42: [sdar]
Aug 19 20:42:25 xff6 kernel: Sense Key : Recovered Error [current]
Aug 19 20:42:25 xff6 kernel: sd 1:0:0:42: [sdar]
Aug 19 20:42:25 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1
Aug 19 20:42:41 xff6 fcoemon: FC_HOST_EVENT 6899 at 1724071361 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:42: [sdar]
Aug 19 20:42:41 xff6 kernel: Sense Key : Recovered Error [current]
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:42: [sdar]
Aug 19 20:42:41 xff6 kernel: <<vendor>> ASC=0xd0 ASCQ=0x6ASC=0xd0 ASCQ=0x6
Aug 19 20:42:41 xff6 fcoemon: FC_HOST_EVENT 6900 at 1724071361 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]
Aug 19 20:42:41 xff6 kernel: Sense Key : Recovered Error [current]
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]
Aug 19 20:42:41 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]
Aug 19 20:42:41 xff6 kernel: Sense Key : Recovered Error [current]
Aug 19 20:42:41 xff6 kernel: sd 1:0:0:41: [sdaq]
Aug 19 20:42:41 xff6 kernel: <<vendor>> ASC=0xd0 ASCQ=0x6ASC=0xd0 ASCQ=0x6
Aug 19 20:42:41 xff6 fcoemon: FC_HOST_EVENT 6901 at 1724071361 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:53 xff6 fcoemon: FC_HOST_EVENT 6902 at 1724071373 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:42:53 xff6 kernel: sd 1:0:0:41: [sdaq]
Aug 19 20:42:53 xff6 kernel: Sense Key : Recovered Error [current]
Aug 19 20:42:53 xff6 kernel: sd 1:0:0:41: [sdaq]
Aug 19 20:42:53 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Aug 19 20:43:03 xff6 kernel: sd 1:0:0:40: [sdap]
Aug 19 20:43:03 xff6 kernel: Sense Key : Recovered Error [current]
Aug 19 20:43:03 xff6 kernel: sd 1:0:0:40: [sdap]
Aug 19 20:43:03 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Aug 19 20:43:03 xff6 fcoemon: FC_HOST_EVENT 6903 at 1724071383 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:43:03 xff6 fcoemon: FC_HOST_EVENT 6904 at 1724071383 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:43:03 xff6 fcoemon: FC_HOST_EVENT 6905 at 1724071383 secs on host1:code 65535=vendor_unique datalen 32 data=512
Aug 19 20:43:03 xff6 kernel: sd 1:0:0:43: [sdas]
Aug 19 20:43:03 xff6 kernel: Sense Key : Recovered Error [current]
Aug 19 20:43:03 xff6 kernel: sd 1:0:0:43: [sdas]
Aug 19 20:43:03 xff6 kernel: <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Aug 19 20:49:26 xff6 kernel: scsi_verify_blk_ioctl: 683 callbacks suppressed
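To see at a glance which LUNs were affected, the sense events in the messages log can be tallied per device. This is a rough sketch, assuming the kernel prints the device line ("sd 1:0:0:43: [sdas]") immediately before the "Sense Key" line, as it does in the excerpt above; count_sense_events is a hypothetical helper name, not part of any existing tool.

```python
import re
from collections import Counter

# Device line, e.g. "... kernel: sd 1:0:0:43: [sdas]"
DEVICE_RE = re.compile(r"kernel: sd \S+ \[(\w+)\]")
# Sense line, e.g. "... kernel: Sense Key : Recovered Error [current]"
SENSE_RE = re.compile(r"kernel: Sense Key : (.+?) \[current\]")


def count_sense_events(lines):
    """Count (device, sense key) pairs, attributing each sense line to the
    most recent device line seen before it."""
    counts = Counter()
    last_device = None
    for line in lines:
        dev = DEVICE_RE.search(line)
        if dev:
            last_device = dev.group(1)
            continue
        sense = SENSE_RE.search(line)
        if sense and last_device:
            counts[(last_device, sense.group(1))] += 1
    return counts


# Sample lines copied from the messages log above.
sample_lines = [
    "Aug 19 20:41:58 xff6 kernel: sd 1:0:0:43: [sdas]",
    "Aug 19 20:41:58 xff6 kernel: Sense Key : Recovered Error [current]",
    "Aug 19 20:41:58 xff6 kernel: sd 1:0:0:43: [sdas]",
    "Aug 19 20:41:58 xff6 kernel: <<vendor>> ASC=0xe0 ASCQ=0x1ASC=0xe0 ASCQ=0x1",
    "Aug 19 20:42:07 xff6 kernel: sd 1:0:0:44: [sdat]",
    "Aug 19 20:42:07 xff6 kernel: Sense Key : Recovered Error [current]",
]

counts = count_sense_events(sample_lines)
for (device, key), n in sorted(counts.items()):
    print(device, key, n)
```

Note that the events here start at 20:41:58, a few minutes before the instance terminated at 20:48:02, which is consistent with a storage-side cause.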
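The customer's further analysis found that a disk in the storage array had failed the day before and a hot spare had taken over. It is not clear exactly why file access then became abnormal, but it may be related to the rebuild that was running at the time. Because this is a RAC environment, some of the remaining nodes were still running normally, and starting the database directly on the failed nodes succeeded.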
Afterwards, writes on the nodes failed with ORA-01187: cannot read from file because it failed verification tests.
After ALTER SYSTEM CHECK DATAFILES was executed on all nodes, every node operated normally.
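For repeating the check on several nodes, the statement can be wrapped in a small SQL*Plus script. The sketch below only assembles the script text; build_check_script is a hypothetical helper, and the follow-up query against v$datafile_header is my own addition for verification, not something taken from this case. As I understand the Oracle documentation, the statement accepts an optional GLOBAL or LOCAL keyword with GLOBAL as the default, but please confirm this for your version.

```python
def build_check_script(scope="GLOBAL"):
    """Assemble SQL*Plus input that re-verifies datafiles and then lists
    any datafile headers that still report an error.

    scope: "GLOBAL" (all instances that have the database open) or
           "LOCAL" (only the instance you are connected to).
    """
    if scope not in ("GLOBAL", "LOCAL"):
        raise ValueError("scope must be 'GLOBAL' or 'LOCAL'")
    statements = [
        f"ALTER SYSTEM CHECK DATAFILES {scope};",
        # Assumption: an empty result here means no header is still failing.
        "SELECT file#, status, error FROM v$datafile_header WHERE error IS NOT NULL;",
        "EXIT",
    ]
    return "\n".join(statements) + "\n"


print(build_check_script())
```

The generated text can be saved to a file and run with sqlplus as a SYSDBA user on each node, or with LOCAL when you want to check one instance at a time.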