PMON crash the instance of RAC

本站文章除注明转载外,均为本站原创: 转载自love wife & love life —Roger 的Oracle技术博客

本文链接地址: PMON crash the instance of RAC

这是某个网友的问题,其一套rac的某个节点被重启了,通过分析日志我们可以看到其中一个rac节点crash了,如下:

Errors in file /opt/oracle/diag/rdbms/cspora/cspora2/trace/cspora2_pmon_57410148.trc  (incident=584434):
ORA-00600: internal error code, arguments: [kjucvl:!busy], [8], [], [], [], [], [], [], [], [], [], []
Incident details in: /opt/oracle/diag/rdbms/cspora/cspora2/incident/incdir_584434/cspora2_pmon_57410148_i584434.trc
Thu Feb 05 10:24:04 2015
Trace dumping is performing id=[cdmp_20150205102404]
Errors in file /opt/oracle/diag/rdbms/cspora/cspora2/trace/cspora2_pmon_57410148.trc  (incident=584435):
ORA-00600: internal error code, arguments: [kjuscv], [6], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kjucvl:!busy], [8], [], [], [], [], [], [], [], [], [], []
Incident details in: /opt/oracle/diag/rdbms/cspora/cspora2/incident/incdir_584435/cspora2_pmon_57410148_i584435.trc
Thu Feb 05 10:24:06 2015
Sweep Incident[584434]: completed
Errors in file /opt/oracle/diag/rdbms/cspora/cspora2/trace/cspora2_pmon_57410148.trc:
ORA-00600: internal error code, arguments: [kjuscv], [6], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kjucvl:!busy], [8], [], [], [], [], [], [], [], [], [], []
PMON (ospid: 57410148): terminating the instance due to error 472
Thu Feb 05 10:24:18 2015
Termination issued to instance processes. Waiting for the processes to exit
Instance termination failed to kill one or more processes
Instance terminated by PMON, pid = 57410148
Thu Feb 05 10:24:58 2015
Starting ORACLE instance (normal)
sskgpgetexecname failed to get name
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 3
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up ORACLE RDBMS Version: 11.1.0.7.0.
Using parameter settings in server-side spfile /dev/rlv_spfile
.......
.......
ORA-1154 signalled during: alter database open...
Thu Feb 05 10:31:11 2015
Reconfiguration started (old inc 102, new inc 104)
List of nodes:
1
 Global Resource Directory frozen
 * dead instance detected - domain 0 invalid = TRUE
Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Thu Feb 05 10:31:12 2015
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Thu Feb 05 10:31:12 2015

从上述alert log可以看出,数据库实例是被pmon进程强行终止的,之所以pmon进程会强行终止instance,
很明显之前出现了ORA-00600错误。很明显上述ORA-00600错误是pmon进程出现的。我们来分析pmon进程的trace内容:

*** ACTION NAME:() 2015-02-05 10:24:01.185

Dump continued from file: /opt/oracle/diag/rdbms/cspora/cspora2/trace/cspora2_pmon_57410148.trc
ORA-00600: internal error code, arguments: [kjucvl:!busy], [8], [], [], [], [], [], [], [], [], [], []

========= Dump for incident 584434 (ORA 600 [kjucvl:!busy]) ========
----- Beginning of Customized Incident Dump(s) -----
----------enqueue 7000001cbbcdaf0------------------------
lock version     : 311947
Owner node       : 1
grant_level      : KJUSEREX
req_level        : KJUSEREX
bast_level       : KJUSERNL
notify_func      : 0
resp             : 70000018f124ac0
procp            : 7000001cb0e3960
pid              : 46924312
proc version     : 922
oprocp           : 0
opid             : 0
group lock owner : 0
xid              : 0000-0000-00000000
dd_time          : 0.0 secs
dd_count         : 0
timeout          : 0.0 secs
On_timer_q?      : N
On_dd_q?         : N
lock_state       : CONVERTING
Open Options     :  KJUSERPROCESS_OWNED
Convert options  : KJUSERGETVALUE KJUSERNOQUEUE KJUSEREXPRESS KJUSERNODEADLOCKWAIT KJUSERNODEADLOCKBLOCK
History          : 0x955252b5
Msg_Seq          : 0x0
res_seq          : 41
valblk           : 0x514e5844970d5d33d156c8d80004ac1e QNXD]3V
----------resource 70000018f124ac0----------------------
resname       : [0x970d5d33][0x0],[DX]
Local node    : 1
dir_node      : 0
master_node   : 0
hv idx        : 18
hv last r.inc : 98
current inc   : 98
hv status     : 0
hv master     : 0
open options  :
Held mode     : KJUSEREX
Cvt mode      : KJUSERNL
Next Cvt mode : KJUSERNL
msg_seq       : 1
res_seq       : 41
grant_bits    : KJUSERNL KJUSEREX
grant mode    : KJUSERNL  KJUSERCR  KJUSERCW  KJUSERPR  KJUSERPW  KJUSEREX
count         : 1         0         0         0         0         1
val_state     : KJUSERVS_VALUE
valblk        : 0x514e5844970d5d33d156c8d80004ac1e QNXD]3V
access_node   : 1
vbreq_state   : 0
state         : x8
resp          : 70000018f124ac0
On Scan_q?    : N
Total accesses: 5475
Imm.  accesses: 5057
Granted_locks : 2
Cvting_locks  : 0
value_block:  51 4e 58 44 97 0d 5d 33 d1 56 c8 d8 00 04 ac 1e
GRANTED_Q :
lp 7000001cbbcdaf0 gl KJUSEREX rp 70000018f124ac0 [0x970d5d33][0x0],[DX]
 master 0 pid 46924312 bast 0 rseq 41 mseq 0 history 0x955252b5
 open opt  KJUSERPROCESS_OWNED
CONVERT_Q:
----- End of Customized Incident Dump(s) -----

*** 2015-02-05 10:24:01.286
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.

----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
skdstdst()+002c      bl       105e6a628
ksedst1()+0064       bl       101f9c14c
ksedst()+0028        bl       ksedst1()            FFFFFFFFFFF4880 ? 000000000 ?
dbkedDefDump()+0874  bl       ksedst()             110EA2180 ?
ksedmp()+0048        bl       dbkedDefDump()       3FFFF51D0 ? 110000A90 ?
ksfdmp()+0058        bl       ksedmp()             000000020 ?
dbgexPhaseII()+0130  bl       _ptrgl()
dbgexExplicitEndInc  bl       dbgexPhaseII()       000000000 ? 000000000 ?
()+0210                                            000000000 ?
dbgeEndDDEInvocatio  bl       dbgexExplicitEndInc  110378410 ? 11037F370 ?
nImpl()+0224                  ()
dbgeEndDDEInvocatio  bl       dbgeEndDDEInvocatio  110378410 ? 11037F370 ?
n()+0038                      nImpl()
kjucvl()+0590        bl       101f9d824
kjuscv()+0c90        bl       kjucvl()             000000002 ? FFFFFFFFFFF9700 ?
 FFFFFFFFFFF9570 ?
 24244228CB0E3960 ?
 102196C80 ? FFFFFFFFFFF9220 ?
 000000000 ? 110323C28 ?
ksicon_v()+06cc      bl       kjuscv()             7000001CBBCDAF0 ?
 500000000000000 ?
 79FFFF9D00 ? 0105358E0 ?
 7000001BDF32FB4 ? 000000000 ?
 000000000 ? 000000000 ?
k2qsod()+0250        bl       ksicon_v()           1071DED20 ? 000000000 ?
 000000000 ? 084242240 ?
 70000017FEF01A0 ? 000000046 ?
 70000017FEF01C0 ?
kssxdl()+076c        bl       _ptrgl()
kssdel()+0048        bl       kssxdl()             7000001BDF32F38 ? 382242228 ?
kssdch()+0d30        bl       kssdel()             FFFFFFFFFFFA550 ?
 4224822800000003 ?
ksuxds()+08b8        bl       kssdch()             3CD1FEAE8 ? 7000001CD3ACCA0 ?
kssxdl()+076c        bl       _ptrgl()
kssdel()+0048        bl       kssxdl()             7000001CD3ACCA0 ? 3CE4C7B78 ?
kssdch()+0d30        bl       kssdel()             FFFFFFFFFFFF5D0 ?
 FFFFFFFFFFFF5E0 ?
ksudlp()+0180        bl       kssdch()             000000000 ? 000000000 ?
kssxdl()+076c        bl       _ptrgl()
kssdel()+0048        bl       kssxdl()             7000001CF9B5E48 ? 300000000 ?
ksuxdl()+029c        bl       kssdel()             033339530 ? 093339530 ?
ksuxda()+02ac        bl       ksuxdl()             7000001CF9B5E48 ? 0FFFFC1D0 ?
ksucln()+0938        bl       ksuxda()
ksbrdp()+075c        bl       _ptrgl()
opirip()+0444        bl       105e65300
opidrv()+0418        bl       opirip()             732F6373706F7261 ?
 410323C28 ? FFFFFFFFFFFF420 ?
sou2o()+0090         bl       opidrv()             3204C0756C ? 4A0071E60 ?
 FFFFFFFFFFFF420 ?
opimai_real()+0148   bl       101f9be30
main()+0090          bl       opimai_real()        000000000 ? 000000000 ?
__start()+0070       bl       main()               000000000 ? 000000000 ?

从上述信息来看,出现了DX锁,对于DX锁,这是一种分布式事务锁。通过分析call stack,我们可以定位到
该问题其实是一个Oracle bug:11868640。

通过对比Bug 11868640 – ORA-600 [kjucvl:!busy] possible in PMON in RAC (文档 ID 11868640.8) ,
我们可以发现pmon trace中的call stack与上述bug描述完全一致。实际上该问题,可以通过安装patch或psu来解决。

很简单的一个问题,供大家参考!

Related posts:

  1. ora-00600 [kkslgbv0]
  2. BUG 10008092 caused instance crash
  3. database crash with ora-00494
  4. PMON failed to acquire latch导致crash的案例联想
  5. Instance immediate crash after open
此条目发表在 未分类 分类目录,贴了 标签。将固定链接加入收藏夹。

评论功能已关闭。