本站文章除注明转载外,均为本站原创: 转载自love wife & love life —Roger 的Oracle技术博客
本文链接地址: PMON crash the instance of RAC
这是某个网友的问题,其一套rac的某个节点被重启了,通过分析日志我们可以看到其中一个rac节点crash了,如下:
Errors in file /opt/oracle/diag/rdbms/cspora/cspora2/trace/cspora2_pmon_57410148.trc (incident=584434): ORA-00600: internal error code, arguments: [kjucvl:!busy], [8], [], [], [], [], [], [], [], [], [], [] Incident details in: /opt/oracle/diag/rdbms/cspora/cspora2/incident/incdir_584434/cspora2_pmon_57410148_i584434.trc Thu Feb 05 10:24:04 2015 Trace dumping is performing id=[cdmp_20150205102404] Errors in file /opt/oracle/diag/rdbms/cspora/cspora2/trace/cspora2_pmon_57410148.trc (incident=584435): ORA-00600: internal error code, arguments: [kjuscv], [6], [], [], [], [], [], [], [], [], [], [] ORA-00600: internal error code, arguments: [kjucvl:!busy], [8], [], [], [], [], [], [], [], [], [], [] Incident details in: /opt/oracle/diag/rdbms/cspora/cspora2/incident/incdir_584435/cspora2_pmon_57410148_i584435.trc Thu Feb 05 10:24:06 2015 Sweep Incident[584434]: completed Errors in file /opt/oracle/diag/rdbms/cspora/cspora2/trace/cspora2_pmon_57410148.trc: ORA-00600: internal error code, arguments: [kjuscv], [6], [], [], [], [], [], [], [], [], [], [] ORA-00600: internal error code, arguments: [kjucvl:!busy], [8], [], [], [], [], [], [], [], [], [], [] PMON (ospid: 57410148): terminating the instance due to error 472 Thu Feb 05 10:24:18 2015 Termination issued to instance processes. Waiting for the processes to exit Instance termination failed to kill one or more processes Instance terminated by PMON, pid = 57410148 Thu Feb 05 10:24:58 2015 Starting ORACLE instance (normal) sskgpgetexecname failed to get name LICENSE_MAX_SESSION = 0 LICENSE_SESSIONS_WARNING = 0 Picked latch-free SCN scheme 3 Autotune of undo retention is turned on. LICENSE_MAX_USERS = 0 SYS auditing is disabled Starting up ORACLE RDBMS Version: 11.1.0.7.0. Using parameter settings in server-side spfile /dev/rlv_spfile ....... ....... ORA-1154 signalled during: alter database open... Thu Feb 05 10:31:11 2015 Reconfiguration started (old inc 102, new inc 104) List of nodes: 1 Global Resource Directory frozen * dead instance detected - domain 0 invalid = TRUE Communication channels reestablished Master broadcasted resource hash value bitmaps Non-local Process blocks cleaned out Thu Feb 05 10:31:12 2015 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived Thu Feb 05 10:31:12 2015
从上述alert log可以看出,数据库实例是被pmon进程强行终止的,之所以pmon进程会强行终止instance,
很明显之前出现了ORA-00600错误。很明显上述ORA-00600错误是pmon进程出现的。我们来分析pmon进程的trace内容:
*** ACTION NAME:() 2015-02-05 10:24:01.185 Dump continued from file: /opt/oracle/diag/rdbms/cspora/cspora2/trace/cspora2_pmon_57410148.trc ORA-00600: internal error code, arguments: [kjucvl:!busy], [8], [], [], [], [], [], [], [], [], [], [] ========= Dump for incident 584434 (ORA 600 [kjucvl:!busy]) ======== ----- Beginning of Customized Incident Dump(s) ----- ----------enqueue 7000001cbbcdaf0------------------------ lock version : 311947 Owner node : 1 grant_level : KJUSEREX req_level : KJUSEREX bast_level : KJUSERNL notify_func : 0 resp : 70000018f124ac0 procp : 7000001cb0e3960 pid : 46924312 proc version : 922 oprocp : 0 opid : 0 group lock owner : 0 xid : 0000-0000-00000000 dd_time : 0.0 secs dd_count : 0 timeout : 0.0 secs On_timer_q? : N On_dd_q? : N lock_state : CONVERTING Open Options : KJUSERPROCESS_OWNED Convert options : KJUSERGETVALUE KJUSERNOQUEUE KJUSEREXPRESS KJUSERNODEADLOCKWAIT KJUSERNODEADLOCKBLOCK History : 0x955252b5 Msg_Seq : 0x0 res_seq : 41 valblk : 0x514e5844970d5d33d156c8d80004ac1e QNXD]3V ----------resource 70000018f124ac0---------------------- resname : [0x970d5d33][0x0],[DX] Local node : 1 dir_node : 0 master_node : 0 hv idx : 18 hv last r.inc : 98 current inc : 98 hv status : 0 hv master : 0 open options : Held mode : KJUSEREX Cvt mode : KJUSERNL Next Cvt mode : KJUSERNL msg_seq : 1 res_seq : 41 grant_bits : KJUSERNL KJUSEREX grant mode : KJUSERNL KJUSERCR KJUSERCW KJUSERPR KJUSERPW KJUSEREX count : 1 0 0 0 0 1 val_state : KJUSERVS_VALUE valblk : 0x514e5844970d5d33d156c8d80004ac1e QNXD]3V access_node : 1 vbreq_state : 0 state : x8 resp : 70000018f124ac0 On Scan_q? : N Total accesses: 5475 Imm. accesses: 5057 Granted_locks : 2 Cvting_locks : 0 value_block: 51 4e 58 44 97 0d 5d 33 d1 56 c8 d8 00 04 ac 1e GRANTED_Q : lp 7000001cbbcdaf0 gl KJUSEREX rp 70000018f124ac0 [0x970d5d33][0x0],[DX] master 0 pid 46924312 bast 0 rseq 41 mseq 0 history 0x955252b5 open opt KJUSERPROCESS_OWNED CONVERT_Q: ----- End of Customized Incident Dump(s) ----- *** 2015-02-05 10:24:01.286 ----- SQL Statement (None) ----- Current SQL information unavailable - no cursor. ----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dubious value) -------------------- -------- -------------------- ---------------------------- skdstdst()+002c bl 105e6a628 ksedst1()+0064 bl 101f9c14c ksedst()+0028 bl ksedst1() FFFFFFFFFFF4880 ? 000000000 ? dbkedDefDump()+0874 bl ksedst() 110EA2180 ? ksedmp()+0048 bl dbkedDefDump() 3FFFF51D0 ? 110000A90 ? ksfdmp()+0058 bl ksedmp() 000000020 ? dbgexPhaseII()+0130 bl _ptrgl() dbgexExplicitEndInc bl dbgexPhaseII() 000000000 ? 000000000 ? ()+0210 000000000 ? dbgeEndDDEInvocatio bl dbgexExplicitEndInc 110378410 ? 11037F370 ? nImpl()+0224 () dbgeEndDDEInvocatio bl dbgeEndDDEInvocatio 110378410 ? 11037F370 ? n()+0038 nImpl() kjucvl()+0590 bl 101f9d824 kjuscv()+0c90 bl kjucvl() 000000002 ? FFFFFFFFFFF9700 ? FFFFFFFFFFF9570 ? 24244228CB0E3960 ? 102196C80 ? FFFFFFFFFFF9220 ? 000000000 ? 110323C28 ? ksicon_v()+06cc bl kjuscv() 7000001CBBCDAF0 ? 500000000000000 ? 79FFFF9D00 ? 0105358E0 ? 7000001BDF32FB4 ? 000000000 ? 000000000 ? 000000000 ? k2qsod()+0250 bl ksicon_v() 1071DED20 ? 000000000 ? 000000000 ? 084242240 ? 70000017FEF01A0 ? 000000046 ? 70000017FEF01C0 ? kssxdl()+076c bl _ptrgl() kssdel()+0048 bl kssxdl() 7000001BDF32F38 ? 382242228 ? kssdch()+0d30 bl kssdel() FFFFFFFFFFFA550 ? 4224822800000003 ? ksuxds()+08b8 bl kssdch() 3CD1FEAE8 ? 7000001CD3ACCA0 ? kssxdl()+076c bl _ptrgl() kssdel()+0048 bl kssxdl() 7000001CD3ACCA0 ? 3CE4C7B78 ? kssdch()+0d30 bl kssdel() FFFFFFFFFFFF5D0 ? FFFFFFFFFFFF5E0 ? ksudlp()+0180 bl kssdch() 000000000 ? 000000000 ? kssxdl()+076c bl _ptrgl() kssdel()+0048 bl kssxdl() 7000001CF9B5E48 ? 300000000 ? ksuxdl()+029c bl kssdel() 033339530 ? 093339530 ? ksuxda()+02ac bl ksuxdl() 7000001CF9B5E48 ? 0FFFFC1D0 ? ksucln()+0938 bl ksuxda() ksbrdp()+075c bl _ptrgl() opirip()+0444 bl 105e65300 opidrv()+0418 bl opirip() 732F6373706F7261 ? 410323C28 ? FFFFFFFFFFFF420 ? sou2o()+0090 bl opidrv() 3204C0756C ? 4A0071E60 ? FFFFFFFFFFFF420 ? opimai_real()+0148 bl 101f9be30 main()+0090 bl opimai_real() 000000000 ? 000000000 ? __start()+0070 bl main() 000000000 ? 000000000 ?
从上述信息来看,出现了DX锁,对于DX锁,这是一种分布式事务锁。通过分析call stack,我们可以定位到
该问题其实是一个Oracle bug:11868640。
通过对比Bug 11868640 – ORA-600 [kjucvl:!busy] possible in PMON in RAC (文档 ID 11868640.8) ,
我们可以发现pmon trace中的call stack与上述bug描述完全一致。实际上该问题,可以通过安装patch或psu来解决。
很简单的一个问题,供大家参考!
Related posts: