Contact: phone/WeChat (+86 17813235971), QQ (107644445)
Title: Troubleshooting ora.storage failing to start with ORA-12514
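Author: 惜分飞 © All rights reserved [May not be reproduced in any form without the author's consent; the author reserves the right to pursue legal liability.]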
In a 19.11 cluster, CRS failed to start properly on node 2 after that node was manually rebooted:
[grid@xff2 ~]$ crsctl status res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       xff2                     STABLE
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       xff2                     STABLE
ora.crf
      1        ONLINE  ONLINE       xff2                     STABLE
ora.crsd
      1        ONLINE  OFFLINE                               STABLE
ora.cssd
      1        ONLINE  ONLINE       xff2                     STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       xff2                     STABLE
ora.ctssd
      1        ONLINE  ONLINE       xff2                     OBSERVER,STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.drivers.acfs
      1        ONLINE  ONLINE       xff2                     STABLE
ora.evmd
      1        ONLINE  ONLINE       xff2                     STABLE
ora.gipcd
      1        ONLINE  ONLINE       xff2                     STABLE
ora.gpnpd
      1        ONLINE  ONLINE       xff2                     STABLE
ora.mdnsd
      1        ONLINE  ONLINE       xff2                     STABLE
ora.storage
      1        ONLINE  OFFLINE                               STABLE
--------------------------------------------------------------------------------
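In the listing above, ora.crsd and ora.storage have Target ONLINE but State OFFLINE. As a minimal sketch, a filter like the following could pick such resources out of the output automatically. The function name offline_resources is hypothetical, and the script assumes the usual multi-line layout in which each resource name line is followed by an indented "1  TARGET  STATE ..." line.

```shell
#!/bin/sh
# Sketch: print init resources whose Target is ONLINE but whose State is not.
# Assumption: input is `crsctl status res -t -init` output in its normal
# multi-line layout (resource name on one line, instance row on the next).
offline_resources() {
    awk '
        /^ora\./ { name = $1; next }          # remember the current resource name
        $1 ~ /^[0-9]+$/ && $2 == "ONLINE" && $3 != "ONLINE" { print name }
    '
}

# Hypothetical usage on a cluster node, as the grid user:
#   crsctl status res -t -init | offline_resources
```

Resources that are intentionally offline (Target OFFLINE, such as ora.diskmon here) are not reported, since only a mismatch between Target and State is treated as a problem.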
The CRS alert log shows:
2024-03-05 12:46:26.021 [CLSECHO(3653)]ACFS-9327: Verifying ADVM/ACFS devices.
2024-03-05 12:46:26.040 [CLSECHO(3661)]ACFS-9156: Detecting control device '/dev/asm/.asm_ctl_spec'.
2024-03-05 12:46:26.065 [CLSECHO(3673)]ACFS-9156: Detecting control device '/dev/ofsctl'.
2024-03-05 12:46:26.357 [CLSECHO(3703)]ACFS-9294: updating file /etc/sysconfig/oracledrivers.conf
2024-03-05 12:46:26.376 [CLSECHO(3711)]ACFS-9322: completed
2024-03-05 12:46:27.764 [CSSDMONITOR(3855)]CRS-8500: Oracle Clusterware CSSDMONITOR process is starting with operating system process ID 3855
2024-03-05 12:46:27.839 [OSYSMOND(3857)]CRS-8500: Oracle Clusterware OSYSMOND process is starting with operating system process ID 3857
2024-03-05 12:46:28.129 [CSSDAGENT(3890)]CRS-8500: Oracle Clusterware CSSDAGENT process is starting with operating system process ID 3890
2024-03-05 12:46:29.125 [OCSSD(3910)]CRS-8500: Oracle Clusterware OCSSD process is starting with operating system process ID 3910
2024-03-05 12:46:30.187 [OCSSD(3910)]CRS-1713: CSSD daemon is started in hub mode
2024-03-05 12:46:31.428 [OCSSD(3910)]CRS-1707: Lease acquisition for node xff2 number 2 completed
2024-03-05 12:46:32.630 [OCSSD(3910)]CRS-1621: The IPMI configuration data for this node stored in the Oracle registry is incomplete; details at (:CSSNK00002:) in /u01/app/grid/diag/crs/xff2/crs/trace/ocssd.trc
2024-03-05 12:46:32.630 [OCSSD(3910)]CRS-1617: The information required to do node kill for node xff2 is incomplete; details at (:CSSNM00004:) in /u01/app/grid/diag/crs/xff2/crs/trace/ocssd.trc
2024-03-05 12:46:32.638 [OCSSD(3910)]CRS-1605: CSSD voting file is online: /dev/sda1; details in /u01/app/grid/diag/crs/xff2/crs/trace/ocssd.trc.
2024-03-05 12:46:33.546 [OCSSD(3910)]CRS-1601: CSSD Reconfiguration complete. Active nodes are xff1 xff2 .
2024-03-05 12:46:35.405 [OCSSD(3910)]CRS-1720: Cluster Synchronization Services daemon (CSSD) is ready for operation.
2024-03-05 12:46:35.533 [OCTSSD(4138)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 4138
2024-03-05 12:46:36.339 [OCTSSD(4138)]CRS-2403: The Cluster Time Synchronization Service on host xff2 is in observer mode.
2024-03-05 12:46:37.601 [OCTSSD(4138)]CRS-2407: The new Cluster Time Synchronization Service reference node is host xff1.
2024-03-05 12:46:37.601 [OCTSSD(4138)]CRS-2401: The Cluster Time Synchronization Service started on host xff2.
2024-03-05 12:46:54.181 [ORAROOTAGENT(2427)]CRS-5019: All OCR locations are on ASM disk groups [SYSTEMDG], and none of these disk groups are mounted. Details are at "(:CLSN00140:)" in "/u01/app/grid/diag/crs/xff2/crs/trace/ohasd_orarootagent_root.trc".
2024-03-05 12:47:15.209 [OLOGGERD(4553)]CRS-8500: Oracle Clusterware OLOGGERD process is starting with operating system process ID 4553
2024-03-05 12:52:04.581 [CRSCTL(8313)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /u01/app/grid/diag/crs/xff2/crs/trace/crsctl_8313.trc.
2024-03-05 12:56:44.519 [ORAROOTAGENT(2427)]CRS-5818: Aborted command 'start' for resource 'ora.storage'. Details at (:CRSAGF00113:) {0:5:3} in /u01/app/grid/diag/crs/xff2/crs/trace/ohasd_orarootagent_root.trc.
2024-03-05 12:56:44.608 [OHASD(2217)]CRS-2757: Command 'Start' timed out waiting for response from the resource 'ora.storage'. Details at (:CRSPE00221:) {0:5:3} in /u01/app/grid/diag/crs/xff2/crs/trace/ohasd.trc.
2024-03-05 12:56:44.606 [ORAROOTAGENT(2427)]CRS-5017: The resource action "ora.storage start" encountered the following error:
2024-03-05 12:56:44.606+agent's abort action pending. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/xff2/crs/trace/ohasd_orarootagent_root.trc".
2024-03-05 12:57:58.464 [CRSD(11801)]CRS-8500: Oracle Clusterware CRSD process is starting with operating system process ID 11801
2024-03-05 12:58:12.059 [CRSD(11801)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /u01/app/grid/diag/crs/xff2/crs/trace/crsd.trc.
The ohasd_orarootagent_root trace shows:
2024-03-05 12:52:00.769 : OCRRAW:4255452928: kgfnConnect3: Got a Connection Error when connecting to ASM.
2024-03-05 12:52:00.771 : OCRRAW:4255452928: kgfnConnect2: failed to connect
2024-03-05 12:52:00.771 : OCRRAW:4255452928: kgfnConnect2Retry: failed to connect connect after 1 attempts, 124s elapsed
2024-03-05 12:52:00.771 : OCRRAW:4255452928: kgfo_kge2slos error stack at kgfoAl06: ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
2024-03-05 12:52:00.771 : OCRRAW:4255452928: -- trace dump on error exit --
2024-03-05 12:52:00.771 : OCRRAW:4255452928: Error [kgfoAl06] in [kgfokge] at kgfo.c:2176
2024-03-05 12:52:00.771 : OCRRAW:4255452928: ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
ORA-12514: TNS:listener does not currently know of service requested
2024-03-05 12:52:00.771 : OCRRAW:4255452928: Category: 7
"/u01/app/grid/diag/crs/xff2/crs/trace/crsctl_8313.trc" 208L, 11809C
2024-03-05 12:52:03.543 : OCRRAW:4255452928: 9379 Error 4 opening dom root in 0xf9afdb79c0
2024-03-05 12:52:03.551 : OCRRAW:4255452928: kgfnConnect2: kgfnGetBeqData failed
2024-03-05 12:52:03.577 : OCRRAW:4255452928: kgfnConnect2Int: cstr=(DESCRIPTION=(TCP_USER_TIMEOUT=1)(CONNECT_TIMEOUT=60)(EXPIRE_TIME=1)(ADDRESS_LIST=(LOAD_BALANCE=ON)(ADDRESS=(PROTOCOL=tcp)(HOST=<node 1 private IP>)(PORT=1525)))(CONNECT_DATA=(SERVICE_NAME=+ASM)))
2024-03-05 12:52:03.578 : OCRRAW:4255452928: kgfnConnect2Int: ServerAttach
2024-03-05 12:52:04.579 : OCRRAW:4255452928: kgfnServerAttachConnErrors: Encountered service based error 12514
2024-03-05 12:52:04.579 : OCRRAW:4255452928: kgfnRecordErr 12514 OCI error:
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
2024-03-05 12:52:04.579 : OCRRAW:4255452928: kgfnConnect3: Got a Connection Error when connecting to ASM.
2024-03-05 12:52:04.581 : OCRRAW:4255452928: kgfnConnect2: failed to connect
2024-03-05 12:52:04.581 : OCRRAW:4255452928: kgfnConnect2Retry: failed to connect connect after 1 attempts, 122s elapsed
2024-03-05 12:52:04.581 : OCRRAW:4255452928: kgfo_kge2slos error stack at kgfoAl06: ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
2024-03-05 12:52:04.581 : OCRRAW:4255452928: -- trace dump on error exit --
2024-03-05 12:52:04.581 : OCRRAW:4255452928: Error [kgfoAl06] in [kgfokge] at kgfo.c:3180
2024-03-05 12:52:04.581 : OCRRAW:4255452928: ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
ORA-12514: TNS:listener does not currently know of service requested
2024-03-05 12:52:04.581 : OCRRAW:4255452928: Category: 7
2024-03-05 12:52:04.581 : OCRRAW:4255452928: DepInfo: 12514
2024-03-05 12:52:04.581 : OCRRAW:4255452928: ADR is not properly configured
2024-03-05 12:52:04.581 : OCRRAW:4255452928: -- trace dump end --
OCRASM:4255452928: SLOS : SLOS: cat=7, opn=kgfoAl06, dep=12514, loc=kgfokge
2024-03-05 12:52:04.581 : OCRASM:4255452928: ASM Error Stack : ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
2024-03-05 12:52:04.581 : OCRASM:4255452928: proprasmo: kgfoCheckMount returned [7]
2024-03-05 12:52:04.581 : OCRASM:4255452928: proprasmo: The ASM instance is down
2024-03-05 12:52:04.635 : OCRRAW:4255452928: proprioo: Failed to open [+SYSTEMDG/xff-cluster/OCRFILE/registry.255.1072903025]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2024-03-05 12:52:04.635 : OCRRAW:4255452928: proprioo: No OCR/OLR devices are usable
OCRUTL:4255452928: u_fill_errorbuf: Error Info : [Insufficient quorum to open OCR devices]
default:4255452928: u_set_gbl_comp_error: comptype '107' : error '0'
2024-03-05 12:52:04.635 : OCRRAW:4255452928: proprinit: Could not open raw device
2024-03-05 12:52:04.635 : default:4255452928: a_init:7!: Backend init unsuccessful : [26]
2024-03-05 12:52:04.637 : default:4255452928: clsvactversion:4: Retrieving Active Version from local storage.
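The two facts that matter in a trace like the one above are the ORA error code and the connect descriptor (the cstr= value) the client was trying to reach. As a rough sketch, they could be pulled out of a trace file with standard text tools. The function names ora_codes and connect_strings are hypothetical, and the sketch assumes the descriptor is logged as cstr=... on a single line, as it is here.

```shell
#!/bin/sh
# Sketch: summarise an OCR/ASM client trace file.
# Assumption: grep supports -o (true for GNU grep on Linux).

# Print each distinct ORA-nnnnn code that appears in the trace file $1.
ora_codes() {
    grep -o 'ORA-[0-9][0-9]*' "$1" | sort -u
}

# Print each distinct connect descriptor logged as "cstr=..." in file $1.
connect_strings() {
    sed -n 's/.*cstr=//p' "$1" | sort -u
}

# Hypothetical usage, with the trace path taken from the alert log:
#   ora_codes /u01/app/grid/diag/crs/xff2/crs/trace/crsctl_8313.trc
#   connect_strings /u01/app/grid/diag/crs/xff2/crs/trace/crsctl_8313.trc
```

The host and port in the extracted descriptor identify which listener on which node to inspect next.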
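From the trace above, the preliminary diagnosis is that node 2 fails when connecting to (DESCRIPTION=(TCP_USER_TIMEOUT=1)(CONNECT_TIMEOUT=60)(EXPIRE_TIME=1)(ADDRESS_LIST=(LOAD_BALANCE=ON)(ADDRESS=(PROTOCOL=tcp)(HOST=<node 1 private IP>)(PORT=1525)))(CONNECT_DATA=(SERVICE_NAME=+ASM))). Check the status of that listener on node 1: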
[grid@xff1 ~]$ lsnrctl status ASMNET1LSNR_ASM
LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 05-MAR-2024 13:04:51
Copyright (c) 1991, 2021, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=ASMNET1LSNR_ASM)))
STATUS of the LISTENER
------------------------
Alias                     ASMNET1LSNR_ASM
Version                   TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Start Date                20-MAY-2021 23:53:50
Uptime                    25 days 8 hr. 15 min. 15 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/19c/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/xff1/asmnet1lsnr_asm/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=ASMNET1LSNR_ASM)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=<node 1 private IP>)(PORT=1525)))
The listener supports no services
The command completed successfully
No services are registered with this listener. Check the listener-related parameters:
[grid@xff1 ~]$ sqlplus / as sysdba
SQL*Plus: Release 19.0.0.0.0 - Production on Tue Mar 5 13:26:29 2024
Version 19.11.0.0.0
Copyright (c) 1982, 2020, Oracle.  All rights reserved.
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.11.0.0.0
SQL> show parameter listener;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
forward_listener                     string
listener_networks                    string
local_listener                       string
remote_listener                      string
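The symptom seen in the lsnrctl output earlier, a listener that reports "The listener supports no services", can also be checked from a script. The following is only a sketch: the function name listener_has_service is hypothetical, and it relies on the "Service "<name>" has N instance(s)." line that lsnrctl status prints for each registered service.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if the given service is registered with a listener.
# Reads `lsnrctl status <listener>` output on stdin.
# $1: service name, for example +ASM
listener_has_service() {
    # -F: fixed-string match, so the "+" in "+ASM" is not treated as a regex.
    # The closing quote in the pattern keeps "+ASM" from matching "+ASM_DATA".
    grep -F -q "Service \"$1\" has"
}

# Hypothetical usage on node 1, as the grid user:
#   if lsnrctl status ASMNET1LSNR_ASM | listener_has_service '+ASM'; then
#       echo "+ASM is registered"
#   else
#       echo "+ASM is NOT registered"
#   fi
```

Running the same check before and after changing the ASM instance's listener parameters gives a quick confirmation that dynamic registration has taken effect.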
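The preliminary conclusion is that the ASMNET1LSNR_ASM listener on node 1 is in an abnormal state, most likely because the ASM instance's listener parameters are wrong. The safer fix is to restart node 1 so that it regenerates the listener parameters and dynamic registration resumes. As a temporary workaround, set local_listener on node 1's ASM instance by hand (with SCOPE=MEMORY, so the setting is not persisted):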
[grid@xff1 ~]$ sqlplus / as sysasm
SQL*Plus: Release 19.0.0.0.0 - Production on Tue Mar 5 13:05:11 2024
Version 19.11.0.0.0
Copyright (c) 1982, 2020, Oracle.  All rights reserved.
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.11.0.0.0
SQL> ALTER SYSTEM SET local_listener ='(ADDRESS=(PROTOCOL=TCP)(HOST=<node 1 private IP>)(PORT=1525))' sid='+ASM1' SCOPE=MEMORY;
System altered.
[grid@xff1 ~]$ lsnrctl status ASMNET1LSNR_ASM
LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 05-MAR-2024 13:05:21
Copyright (c) 1991, 2021, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=ASMNET1LSNR_ASM)))
STATUS of the LISTENER
------------------------
Alias                     ASMNET1LSNR_ASM
Version                   TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Start Date                20-MAY-2021 23:53:50
Uptime                    25 days 8 hr. 15 min. 45 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/19c/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/xff1/asmnet1lsnr_asm/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=ASMNET1LSNR_ASM)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=<node 1 private IP>)(PORT=1525)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "+ASM_DATA" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "+ASM_FRA" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "+ASM_SYSTEMDG" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
The command completed successfully
[grid@xff1 ~]$
After local_listener was set on node 1's ASM instance, the clusterware stack on node 2 started successfully:
[grid@xff2 ~]$ crsctl status res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       xff2                     STABLE
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       xff2                     STABLE
ora.crf
      1        ONLINE  ONLINE       xff2                     STABLE
ora.crsd
      1        ONLINE  ONLINE       xff2                     STABLE
ora.cssd
      1        ONLINE  ONLINE       xff2                     STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       xff2                     STABLE
ora.ctssd
      1        ONLINE  ONLINE       xff2                     OBSERVER,STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.drivers.acfs
      1        ONLINE  ONLINE       xff2                     STABLE
ora.evmd
      1        ONLINE  ONLINE       xff2                     STABLE
ora.gipcd
      1        ONLINE  ONLINE       xff2                     STABLE
ora.gpnpd
      1        ONLINE  ONLINE       xff2                     STABLE
ora.mdnsd
      1        ONLINE  ONLINE       xff2                     STABLE
ora.storage
      1        ONLINE  ONLINE       xff2                     STABLE
--------------------------------------------------------------------------------