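Contact: mobile/WeChat (+86 17813235971), QQ (107644445)
Author: 惜分飞 © All rights reserved. [Do not reproduce in any form without the author's consent; the author reserves the right to pursue legal action.]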
A customer's 12.1.0.1 RAC database restarted unexpectedly, and they asked us to help find the cause.
The database alert log reported ORA-15064 errors:
Mon Apr 15 15:06:26 2019
WARNING: inbound connection timed out (ORA-3136)
Mon Apr 15 15:41:26 2019
NOTE: ASMB terminating
Mon Apr 15 15:41:26 2019
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_asmb_61426.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 1892 Serial number: 29
Mon Apr 15 15:41:26 2019
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_asmb_61426.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 1892 Serial number: 29
Mon Apr 15 15:41:26 2019
System state dump requested by (instance=1, osid=61426 (ASMB)), summary=[abnormal instance termination].
Mon Apr 15 15:41:26 2019
USER (ospid: 61426): terminating the instance due to error 15064
Mon Apr 15 15:41:26 2019
System State dumped to trace file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_61287.trc
Mon Apr 15 15:41:27 2019
opiodr aborting process unknown ospid (1171) as a result of ORA-1092
Mon Apr 15 15:41:27 2019
ORA-1092 : opitsk aborting process
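This shows clearly that the ASMB process failed, leaving the database unable to access ASM, so the instance crashed (terminated with error 15064).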
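To confirm from the alert log which process brought the instance down, the ospid in the "terminating the instance" line can be matched against trace file names, which in this log follow `<sid>_<process>_<ospid>.trc` (for example `orcl1_asmb_61426.trc`). A minimal Python sketch, assuming the alert log text has already been read into a string; `find_terminator` is a hypothetical helper, and the file-name pattern is inferred only from the log above:

```python
import re

# "USER (ospid: 61426): terminating the instance due to error 15064"
TERMINATE_RE = re.compile(
    r"\(ospid: (\d+)\): terminating the instance due to error (\d+)"
)
# Assumed trace naming, as seen in this log: <sid>_<process>_<ospid>.trc,
# e.g. .../trace/orcl1_asmb_61426.trc
TRACE_RE = re.compile(r"/(\w+?)_([a-z0-9]+)_(\d+)\.trc")


def find_terminator(alert_text):
    """Return (process, ospid, error) for the process that terminated the
    instance, or None if no such line exists. process is None when no
    trace file written by that ospid appears in the text."""
    m = TERMINATE_RE.search(alert_text)
    if m is None:
        return None
    ospid, error = m.group(1), int(m.group(2))
    # Map the ospid back to a background process name via its trace file.
    process = next(
        (t.group(2).upper() for t in TRACE_RE.finditer(alert_text)
         if t.group(3) == ospid),
        None,
    )
    return process, int(ospid), error
```

Applied to the alert log above, it would pair ospid 61426 with `orcl1_asmb_61426.trc` and report ASMB with error 15064. The regexes do not depend on line breaks, so they also work on a log pasted as a single line.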
Next, the ASM alert log:
Mon Apr 15 15:41:26 2019
WARNING: client [+ASM1:+ASM] not responsive for 2069s; state=0x1. pid 23155
NOTE: umbilicus traces dumped to /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_gen0_23050.trc
WARNING: client [orcl1:orcl] not responsive for 2069s; state=0x1. killing pid 61436
NOTE: umbilicus traces dumped to /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_gen0_23050.trc
WARNING: fencing client [orcl1:orcl] after 2069 seconds (mbr 2)
WARNING: client [-MGMTDB:_mgmtdb] not responsive for 2070s; state=0x1. killing pid 24026
NOTE: umbilicus traces dumped to /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_gen0_23050.trc
WARNING: fencing client [-MGMTDB:_mgmtdb] after 2070 seconds (mbr 1)
Mon Apr 15 15:41:26 2019
NOTE: cleaned up ASM client -MGMTDB:_mgmtdb
NOTE: cleaned up ASM client orcl1:orcl
Mon Apr 15 15:41:43 2019
NOTE: Standard client -MGMTDB:_mgmtdb registered, osid 183707, mbr 0x1 (reg:1371965153)
Mon Apr 15 15:42:16 2019
NOTE: Standard client orcl1:orcl registered, osid 184063, mbr 0x2 (reg:2088418628)
Mon Apr 15 15:44:30 2019
Warning: VKTM detected a time drift.
Time drifts can result in an unexpected behavior such as time-outs. Please check trace file for more details.
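To see at a glance which clients ASM declared unresponsive, for how long, and which ones it killed or fenced, the "not responsive" and "fencing client" lines can be pulled out with two regular expressions. A minimal Python sketch, assuming the ASM log text has been read into a string; `summarize_unresponsive` and its field names are hypothetical:

```python
import re

# e.g. "client [orcl1:orcl] not responsive for 2069s; state=0x1. killing pid 61436"
NOT_RESPONSIVE_RE = re.compile(
    r"client \[([^\]]+)\] not responsive for (\d+)s; "
    r"state=(0x[0-9a-fA-F]+)\.\s*(killing pid|pid) (\d+)"
)
# e.g. "fencing client [orcl1:orcl] after 2069 seconds"
FENCING_RE = re.compile(r"fencing client \[([^\]]+)\] after (\d+) seconds")


def summarize_unresponsive(asm_text):
    """List every client the ASM instance reported as unresponsive.

    Each entry records how long ASM considered the client unresponsive,
    whether ASM killed the reported pid, and whether it fenced the client.
    """
    fenced = {m.group(1) for m in FENCING_RE.finditer(asm_text)}
    rows = []
    for m in NOT_RESPONSIVE_RE.finditer(asm_text):
        seconds = int(m.group(2))
        rows.append({
            "client": m.group(1),
            "seconds": seconds,
            "minutes": round(seconds / 60, 1),
            "state": m.group(3),
            "pid": int(m.group(5)),
            "killed": m.group(4) == "killing pid",
            "fenced": m.group(1) in fenced,
        })
    return rows
```

Run against the ASM log above, it would list all three clients at about 34.5 minutes (2069 s and 2070 s), with orcl1:orcl and -MGMTDB:_mgmtdb both killed and fenced, while +ASM1:+ASM was only reported.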
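The ASM log closely matches the symptoms described in the MOS note "GEN0 terminating the ASM instance due to error 15082" (Doc ID 2096988.1). The customer confirmed that they had changed the system time with NTP, and the VKTM time-drift warning at 15:44:30 in the ASM log is consistent with that. We can therefore be fairly certain the crash was caused by Oracle Bug 19032250 (fixed in 12.1.0.2), which is triggered when NTP moves the time by too large a step. Changing the time manually can cause the same problem.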
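Why a large time step matters is easier to see once wall-clock time is separated from monotonic time. The Python sketch below is a generic illustration, not Oracle's VKTM logic: it compares how far the wall clock (`time.time()`) advanced between two samples with how far a monotonic clock (`time.monotonic()`, which system clock updates do not affect) advanced, and reports the difference as a step. The `ClockSample` type, the function names and the 1-second threshold are assumptions made for illustration.

```python
import time
from dataclasses import dataclass


@dataclass
class ClockSample:
    wall: float       # seconds since the epoch, as from time.time()
    monotonic: float  # seconds from an arbitrary origin, as from time.monotonic()


def take_sample():
    """Read both clocks as close together as possible."""
    return ClockSample(time.time(), time.monotonic())


def detect_step(prev, curr, threshold=1.0):
    """Return the apparent wall-clock step between two samples, in seconds.

    Positive means the wall clock jumped forward, negative means it was set
    back. Differences within +/- threshold (an assumed tolerance) return 0.0.
    """
    step = (curr.wall - prev.wall) - (curr.monotonic - prev.monotonic)
    return step if abs(step) > threshold else 0.0


if __name__ == "__main__":
    before = take_sample()
    time.sleep(0.5)
    after = take_sample()
    # Prints 0.0 unless the wall clock was stepped by more than the
    # threshold during the sleep (e.g. by NTP or by hand).
    print(detect_step(before, after))
```

A process that measures timeouts with the wall clock sees a 30-minute forward step as 30 minutes of waiting, even though almost no real time has passed; that is the general hazard the MOS note and bug describe for ASM clients.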
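Recommendations for changing the time on RAC:
1. If the clock is slow: shut down the database and the cluster, move the time forward directly, then start the cluster and the database.
2. If the clock is fast: shut down the database and the cluster, wait until the real time has passed the time (as recorded by the fast clock) at which the cluster and database were shut down, then set the time back and start the cluster and the database.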
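Read this way, both rules ensure that, once corrected, the clock never shows a time earlier than the moment the cluster and database were stopped. The Python sketch below only plans the correction and does not touch any clock; the function name `plan_clock_fix`, its parameters and the example times are hypothetical.

```python
from datetime import datetime, timedelta


def plan_clock_fix(system_time, true_time, shutdown_time):
    """Decide how to correct a RAC node's clock once the cluster is down.

    system_time   -- what the node's (wrong) clock shows now
    true_time     -- the correct current time, from a trusted reference
    shutdown_time -- the node's clock reading when the database and
                     cluster were shut down
    Returns (action, wait): wait is how long to keep everything down
    before the clock may be set back.
    """
    offset = system_time - true_time
    if offset == timedelta(0):
        return "no change needed", timedelta(0)
    if offset < timedelta(0):
        # Clock is slow: stepping it forward cannot move it behind any
        # timestamp already written, so it can be corrected right away.
        return "set forward now", timedelta(0)
    # Clock is fast: wait until the true time has passed the shutdown
    # timestamp, so setting the clock back never goes earlier than it.
    return "wait, then set back", max(shutdown_time - true_time, timedelta(0))


if __name__ == "__main__":
    # Hypothetical example: the node clock is 10 minutes fast.
    action, wait = plan_clock_fix(
        system_time=datetime(2019, 4, 15, 15, 55),
        true_time=datetime(2019, 4, 15, 15, 45),
        shutdown_time=datetime(2019, 4, 15, 15, 50),
    )
    print(action, wait)  # wait, then set back 0:05:00
```

In the example the cluster was stopped at 15:50 by the node's fast clock while the real time is 15:45, so everything stays down for another 5 minutes before the clock is set back and the cluster and database are started.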