What happens when the temporary files under /var/tmp/.oracle/ are deleted in a RAC environment, and how to recover

Contact: QQ (5163721)

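Author: Lunar © All rights reserved [This article may be reposted, but the source address must be cited as a link; otherwise legal liability will be pursued.]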

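Purpose: simulate an operator mistake in a RAC environment in which the Oracle temporary files under /var/tmp/.oracle/* (the network socket files) are deleted.
Procedure: observe the consequences and work out how to recover.

Test environment: OEL 6.6, Oracle 11.2.0.4 standalone (a single instance using ASM, i.e. Oracle Restart).
On RAC the conclusions should be broadly the same, because the mechanism is similar.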

[root@lunarlib rootwork]# cat /etc/oracle-release 
Oracle Linux Server release 6.6
[root@lunarlib rootwork]# 
[root@lunarlib rootwork]# uname -a
Linux lunarlib 3.8.13-44.1.1.el6uek.x86_64 #2 SMP Wed Sep 10 06:10:25 PDT 2014 x86_64 x86_64 x86_64 GNU/Linux
[root@lunarlib rootwork]# 

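On Linux, RAC and HAS (a single-instance environment that uses ASM, that is, standalone or what we call Oracle Restart) keep their network socket files under /var/tmp/.oracle/.
(On other platforms such as AIX or HP-UX, the network socket files may be in /tmp/.oracle or /usr/tmp/.oracle.)
On the test host the directory looks like this: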

[root@lunarlib etc]# ls -lrt /var/tmp/.oracle/* 
prw-r--r-- 1 grid oinstall 0 Oct 11 01:30 /var/tmp/.oracle/npohasd
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:43 /var/tmp/.oracle/sprocr_local_conn_0_PROL
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:43 /var/tmp/.oracle/slunarlibDBG_OHASD
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:43 /var/tmp/.oracle/sOHASD_IPC_SOCKET_11
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:43 /var/tmp/.oracle/sOHASD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:43 /var/tmp/.oracle/sCRSD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/slunarlibDBG_EVMD
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/s#4577.2
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/s#4577.1
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/sAevm
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/sSYSTEM.evm.acceptor.auth
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/sCevm
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/slunarlibDBG_CSSD
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/sOCSSD_LL_lunarlib_
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/sOracle_CSS_LclLstnr_localhost_1
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/sOCSSD_LL_lunarlib_localhost
[root@lunarlib etc]# 
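
The listing above mixes three kinds of entries: sockets (mode s), one named pipe (npohasd, mode p), and, once the stack has been restarted, plain *_lock files. As a rough sketch for taking a quick inventory before touching anything, the helper below counts each kind. The function names count_type and summarize_socket_dir are my own, not Oracle tools, and the default path is simply the Linux location used in this article.

```shell
# Count the direct children of a directory that match the given find predicates.
# $1 = directory, remaining arguments = find predicates (e.g. -type s).
count_type() {
    d=$1
    shift
    find "$d" -mindepth 1 -maxdepth 1 "$@" | wc -l | tr -d ' '
}

# Print how many sockets, named pipes and *_lock files a directory holds.
# The default directory is the Linux location discussed in this article.
summarize_socket_dir() {
    dir=${1:-/var/tmp/.oracle}
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    printf 'sockets=%s pipes=%s locks=%s\n' \
        "$(count_type "$dir" -type s)" \
        "$(count_type "$dir" -type p)" \
        "$(count_type "$dir" -type f -name '*_lock')"
}
```

Running summarize_socket_dir with no argument inspects /var/tmp/.oracle; on AIX or HP-UX you would pass one of the other candidate directories instead.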

Stop HAS with crsctl stop has -f; once it is down, the network socket files under /var/tmp/.oracle/* can be deleted directly:

[root@lunarlib rootwork]# crsctl stop has -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'lunarlib'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'lunarlib'
CRS-2673: Attempting to stop 'ora.CRSDG.dg' on 'lunarlib'
CRS-2673: Attempting to stop 'ora.lunardb.db' on 'lunarlib'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'lunarlib' succeeded
CRS-2677: Stop of 'ora.lunardb.db' on 'lunarlib' succeeded
CRS-2673: Attempting to stop 'ora.DATADG1.dg' on 'lunarlib'
CRS-2673: Attempting to stop 'ora.DATADG2.dg' on 'lunarlib'
CRS-2677: Stop of 'ora.DATADG1.dg' on 'lunarlib' succeeded
CRS-2677: Stop of 'ora.DATADG2.dg' on 'lunarlib' succeeded
CRS-2677: Stop of 'ora.CRSDG.dg' on 'lunarlib' succeeded
CRS-2679: Attempting to clean 'ora.CRSDG.dg' on 'lunarlib'
CRS-2681: Clean of 'ora.CRSDG.dg' on 'lunarlib' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'lunarlib'
CRS-2677: Stop of 'ora.asm' on 'lunarlib' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'lunarlib'
CRS-2677: Stop of 'ora.cssd' on 'lunarlib' succeeded
CRS-2673: Attempting to stop 'ora.evmd' on 'lunarlib'
CRS-2677: Stop of 'ora.evmd' on 'lunarlib' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'lunarlib' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@lunarlib rootwork]# ls -lrt /var/tmp/.oracle/* 
prw-r--r-- 1 grid oinstall 0 Oct 11 01:30 /var/tmp/.oracle/npohasd
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/s#4577.2
srwxrwxrwx 1 grid oinstall 0 Oct 11 05:44 /var/tmp/.oracle/s#4577.1
-rw-r--r-- 1 grid oinstall 0 Jan 11 11:01 /var/tmp/.oracle/sprocr_local_conn_0_PROL_lock
-rw-r--r-- 1 grid oinstall 0 Jan 11 11:01 /var/tmp/.oracle/sOHASD_IPC_SOCKET_11_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 11:03 /var/tmp/.oracle/s#5185.2
srwxrwxrwx 1 grid oinstall 0 Jan 11 11:03 /var/tmp/.oracle/s#5185.1
-rw-r--r-- 1 grid oinstall 0 Jan 11 11:03 /var/tmp/.oracle/sOCSSD_LL_lunarlib__lock
-rw-r--r-- 1 grid oinstall 0 Jan 11 11:03 /var/tmp/.oracle/sOracle_CSS_LclLstnr_localhost_1_lock
-rw-r--r-- 1 grid oinstall 0 Jan 11 11:03 /var/tmp/.oracle/sOCSSD_LL_lunarlib_localhost_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 11:33 /var/tmp/.oracle/s#5516.2
srwxrwxrwx 1 grid oinstall 0 Jan 11 11:33 /var/tmp/.oracle/s#5516.1
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:12 /var/tmp/.oracle/sprocr_local_conn_0_PROL
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:12 /var/tmp/.oracle/sOHASD_IPC_SOCKET_11
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:12 /var/tmp/.oracle/slunarlibDBG_OHASD
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:12 /var/tmp/.oracle/sOHASD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:12 /var/tmp/.oracle/sCRSD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:13 /var/tmp/.oracle/slunarlibDBG_EVMD
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:13 /var/tmp/.oracle/slunarlibDBG_CSSD
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:13 /var/tmp/.oracle/sAevm
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:13 /var/tmp/.oracle/sSYSTEM.evm.acceptor.auth
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:13 /var/tmp/.oracle/sCevm
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:13 /var/tmp/.oracle/sOCSSD_LL_lunarlib_
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:14 /var/tmp/.oracle/sOracle_CSS_LclLstnr_localhost_1
srwxrwxrwx 1 grid oinstall 0 Jan 11 17:14 /var/tmp/.oracle/sOCSSD_LL_lunarlib_localhost
[root@lunarlib rootwork]#
[root@lunarlib rootwork]# rm -rf /var/tmp/.oracle/*
[root@lunarlib rootwork]# ll /var/tmp/.oracle
total 0
[root@lunarlib rootwork]# crsctl start has
CRS-4123: Oracle High Availability Services has been started.
[root@lunarlib rootwork]# 

If the /var/tmp/.oracle directory itself no longer exists, it can be recreated by hand:

[root@lunarlib rootwork]# mkdir /var/tmp/.oracle
[root@lunarlib rootwork]# ll /var/tmp/.oracle
total 0
[root@lunarlib rootwork]# crsctl start has
CRS-4123: Oracle High Availability Services has been started.
[root@lunarlib rootwork]# 
[root@lunarlib rootwork]# ps -ef|grep d.bin
grid      5177     1  1 18:12 ?        00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
grid      5306     1  1 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
grid      5311     1  1 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid      5339     1  0 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmd.bin
grid      5341     1  0 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
grid      5356     1  0 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/ocssd.bin 
grid      5387  5339  0 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
root      5400  5264  0 18:14 pts/1    00:00:00 grep d.bin
[root@lunarlib rootwork]# ls -lrt /var/tmp/.oracle/* 
prw-r--r-- 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/npohasd
-rw-r--r-- 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/sprocr_local_conn_0_PROL_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/sprocr_local_conn_0_PROL
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/slunarlibDBG_OHASD
-rw-r--r-- 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/sOHASD_IPC_SOCKET_11_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/sOHASD_IPC_SOCKET_11
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/sOHASD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:12 /var/tmp/.oracle/sCRSD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/s#5341.2
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/s#5341.1
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/slunarlibDBG_EVMD
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/slunarlibDBG_CSSD
-rw-r--r-- 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sOCSSD_LL_lunarlib__lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sOCSSD_LL_lunarlib_
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sAevm
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sSYSTEM.evm.acceptor.auth
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sCevm
-rw-r--r-- 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sOracle_CSS_LclLstnr_localhost_1_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sOracle_CSS_LclLstnr_localhost_1
-rw-r--r-- 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sOCSSD_LL_lunarlib_localhost_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 18:14 /var/tmp/.oracle/sOCSSD_LL_lunarlib_localhost
[root@lunarlib rootwork]# 
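
The safe sequence shown so far (stop HAS, clean the directory, start HAS) depends on the daemons really being down before anything is removed. A minimal sketch of a guard for that is below. Both function names are hypothetical, the process-name pattern is an assumption taken from the ps output in this article, and the directory is recreated with default permissions only because that is what the mkdir above did.

```shell
# Return 0 if a clusterware daemon appears to be running.
# Assumption: the daemon names match those seen in the ps output above.
clusterware_running() {
    pgrep -f 'ohasd\.bin|ocssd\.bin|evmd\.bin|crsd\.bin' >/dev/null 2>&1
}

# Empty the socket directory, but only when no clusterware daemon is running.
# Recreates the directory if it is missing.
clean_socket_dir() {
    dir=${1:-/var/tmp/.oracle}
    if clusterware_running; then
        echo "clusterware daemons are running; refusing to clean $dir" >&2
        return 1
    fi
    [ -d "$dir" ] || mkdir -p "$dir" || return 1
    find "$dir" -mindepth 1 -maxdepth 1 -exec rm -rf {} +
}
```

Typical use would be crsctl stop has -f, then clean_socket_dir, then crsctl start has. The refusal branch is the whole point: it is there to prevent the situation tested in the next section.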

If the Oracle temporary files above are deleted while HAS is running normally, the database remains usable but can no longer be shut down cleanly:

[root@lunarlib rootwork]# rm -rf /var/tmp/.oracle/* 
[root@lunarlib rootwork]# ll /var/tmp/.oracle/* 
ls: cannot access /var/tmp/.oracle/*: No such file or directory
[root@lunarlib rootwork]# ll /var/tmp/.oracle/
total 0
[root@lunarlib rootwork]# 
[root@lunarlib rootwork]# ps -ef|grep ohasd
root      2877     1  0 17:12 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
grid      5177     1  0 18:12 ?        00:00:04 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
root      5653  5264  0 18:21 pts/1    00:00:00 grep ohasd
[root@lunarlib rootwork]# ps -ef|grep ohasd
root      2877     1  0 17:12 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
grid      5177     1  0 18:12 ?        00:00:05 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
root      5660  5264  0 18:23 pts/1    00:00:00 grep ohasd
[root@lunarlib rootwork]# ps -ef|grep d.bin
grid      5177     1  0 18:12 ?        00:00:05 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
grid      5306     1  0 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
grid      5311     1  0 18:14 ?        00:00:05 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid      5339     1  0 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmd.bin
grid      5341     1  0 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
grid      5356     1  0 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/ocssd.bin 
grid      5387  5339  0 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
root      5662  5264  0 18:23 pts/1    00:00:00 grep d.bin
[root@lunarlib rootwork]# crsctl status res -t
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Status failed, or completed with errors.
[root@lunarlib rootwork]# 
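
The symptom above (daemons visible in ps, yet crsctl reports CRS-4639) can be recognised without calling crsctl at all. The following is only a sketch under my own assumptions: the function names and the three state labels are made up for illustration, and the daemon pattern is taken from the process names in this article.

```shell
# Return 0 if a clusterware daemon appears to be running.
# Assumption: the daemon names match those seen in the ps output above.
clusterware_running() {
    pgrep -f 'ohasd\.bin|ocssd\.bin|evmd\.bin|crsd\.bin' >/dev/null 2>&1
}

# Print one of: stopped, running, running-without-sockets.
socket_state() {
    dir=${1:-/var/tmp/.oracle}
    count=0
    if [ -d "$dir" ]; then
        # Count sockets and named pipes directly under the directory.
        count=$(find "$dir" -mindepth 1 -maxdepth 1 \( -type s -o -type p \) | wc -l | tr -d ' ')
    fi
    if clusterware_running; then
        if [ "$count" -eq 0 ]; then
            echo "running-without-sockets"
        else
            echo "running"
        fi
    else
        echo "stopped"
    fi
}
```

A result of running-without-sockets corresponds to the broken state in this test: the processes are alive but nothing can reach them.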

As you can see, communication with CRS is now broken.
Let's look at the database:

[oracle@lunarlib work]$ ss

SQL*Plus: Release 11.2.0.4.0 Production on Mon Jan 11 18:26:17 2016

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning and Automatic Storage Management options

SYS@lunardb>alter system switch logfile;

System altered.

Elapsed: 00:00:00.14
SYS@lunardb>alter system checkpoint;

System altered.

Elapsed: 00:00:00.06
SYS@lunardb>shutdown immediate
ORA-29701: unable to connect to Cluster Synchronization Service
SYS@lunardb>exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning and Automatic Storage Management options
[oracle@lunarlib work]$ 

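The database is still usable, but it cannot be shut down: shutdown immediate fails with an error saying it cannot communicate with the CSS process.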

[oracle@lunarlib work]$ ss

SQL*Plus: Release 11.2.0.4.0 Production on Mon Jan 11 18:26:46 2016

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning and Automatic Storage Management options

SYS@lunardb>

The database alert log and the session trace file show:

Mon Jan 11 18:26:37 2016
Shutting down instance (immediate)
Stopping background process SMCO
Shutting down instance: further logons disabled
[oracle@lunarlib trace]$ cat lunardb_ora_22027.trc
Trace file /u01/app/oracle/diag/rdbms/lunardb/lunardb/trace/lunardb_ora_22027.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning and Automatic Storage Management options
ORACLE_HOME = /u01/app/oracle/product/11.2.0.4/dbhome_1
System name:    Linux
Node name:      lunarlib
Release:        3.8.13-44.1.1.el6uek.x86_64
Version:        #2 SMP Wed Sep 10 06:10:25 PDT 2014
Machine:        x86_64
Instance name: lunardb
Redo thread mounted by this instance: 1
Oracle process number: 23
Unix process pid: 22027, image: oracle@lunarlib (TNS V1-V3)


*** 2016-01-11 18:26:37.174
*** SESSION ID:(135.10871) 2016-01-11 18:26:37.174
*** CLIENT ID:() 2016-01-11 18:26:37.174
*** SERVICE NAME:(SYS$USERS) 2016-01-11 18:26:37.174
*** MODULE NAME:(sqlplus@lunarlib (TNS V1-V3)) 2016-01-11 18:26:37.174
*** ACTION NAME:() 2016-01-11 18:26:37.174
 
Stopping background process SMCO

*** 2016-01-11 18:26:38.176
kgxgncin: CLSS init failed with status 3
kgxgncin: return status 3 (1311719766 SKGXN not av) from CLSS
NOTE: kfmsInit: ASM failed to initialize group services
[oracle@lunarlib trace]$ 
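
When hunting for this problem in a directory full of trace files, it helps to search for the specific messages seen above. The helper below is a small sketch; the name scan_css_errors is my own, and the pattern list contains only the three strings that appear in this article, so it is not an exhaustive list of CSS-related errors.

```shell
# Print, with line numbers, the lines that indicate the instance could not
# reach CSS. The patterns are the messages observed in this article.
scan_css_errors() {
    grep -n -E 'ORA-29701|CLSS init failed|failed to initialize group services' "$@"
}
```

For example, scan_css_errors lunardb_ora_22027.trc would print the kgxgncin and kfmsInit lines from the trace shown above.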

Check the Oracle processes:

[oracle@lunarlib trace]$ ps -ef|grep ora_
oracle    5495     1  0 18:14 ?        00:00:00 ora_pmon_lunardb
oracle    5497     1  0 18:14 ?        00:00:00 ora_psp0_lunardb
oracle    5504     1  4 18:14 ?        00:00:36 ora_vktm_lunardb
oracle    5508     1  0 18:14 ?        00:00:00 ora_gen0_lunardb
oracle    5510     1  0 18:14 ?        00:00:00 ora_diag_lunardb
oracle    5512     1  0 18:14 ?        00:00:00 ora_dbrm_lunardb
oracle    5514     1  0 18:14 ?        00:00:00 ora_dia0_lunardb
oracle    5516     1  0 18:14 ?        00:00:00 ora_mman_lunardb
oracle    5518     1  0 18:14 ?        00:00:00 ora_dbw0_lunardb
oracle    5520     1  0 18:14 ?        00:00:00 ora_lgwr_lunardb
oracle    5522     1  0 18:14 ?        00:00:00 ora_ckpt_lunardb
oracle    5524     1  0 18:14 ?        00:00:00 ora_smon_lunardb
oracle    5526     1  0 18:14 ?        00:00:00 ora_reco_lunardb
oracle    5528     1  0 18:14 ?        00:00:00 ora_rbal_lunardb
oracle    5530     1  0 18:14 ?        00:00:00 ora_asmb_lunardb
oracle    5532     1  0 18:14 ?        00:00:00 ora_mmon_lunardb
oracle    5536     1  0 18:14 ?        00:00:00 ora_mmnl_lunardb
oracle    5540     1  0 18:14 ?        00:00:00 ora_mark_lunardb
oracle    5568     1  0 18:14 ?        00:00:00 ora_arc0_lunardb
oracle    5570     1  0 18:14 ?        00:00:00 ora_arc1_lunardb
oracle    5572     1  0 18:14 ?        00:00:00 ora_arc2_lunardb
oracle    5574     1  0 18:14 ?        00:00:00 ora_arc3_lunardb
oracle    5583     1  0 18:14 ?        00:00:00 ora_qmnc_lunardb
oracle    5611     1  0 18:14 ?        00:00:00 ora_q000_lunardb
oracle    5613     1  0 18:14 ?        00:00:00 ora_q001_lunardb
oracle    6691  6657  0 18:29 pts/4    00:00:00 grep ora_
oracle   22988     1  0 18:26 ?        00:00:00 ora_o000_lunardb
oracle   23012     1  0 18:26 ?        00:00:00 ora_o001_lunardb
[oracle@lunarlib trace]$ 

Shut the database down with shutdown abort:

SYS@lunardb>shutdown abort
ORACLE instance shut down.
SYS@lunardb>exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning and Automatic Storage Management options
[oracle@lunarlib work]$ 
[oracle@lunarlib trace]$ ps -ef|grep ora_
oracle    6709  6657  0 18:31 pts/4    00:00:00 grep ora_
[oracle@lunarlib trace]$ 

The alert log shows:

Mon Jan 11 18:30:38 2016
Shutting down instance (abort)
License high water mark = 5
USER (ospid: 26332): terminating the instance
Instance terminated by USER, pid = 26332
Mon Jan 11 18:30:38 2016
Instance shutdown complete

At this point, trying to start the database again fails:

[oracle@lunarlib work]$ ss

SQL*Plus: Release 11.2.0.4.0 Production on Mon Jan 11 18:31:50 2016

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to an idle instance.

SYS@lunardb>startup
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+DATADG1/lunardb/spfilelunardb.ora'
ORA-17503: ksfdopn:2 Failed to open file +DATADG1/lunardb/spfilelunardb.ora
ORA-29701: unable to connect to Cluster Synchronization Service
SYS@lunardb>

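Meanwhile the other HAS processes still exist (apart from oraagent.bin, which ohasd keeps failing to restart, as the log further down shows); only the network socket files under /var/tmp/.oracle/* are gone: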

[root@lunarlib rootwork]# ll /var/tmp/.oracle/* 
ls: cannot access /var/tmp/.oracle/*: No such file or directory
[root@lunarlib rootwork]# ps -ef|grep ohasd
root      2877     1  0 17:12 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
grid      5177     1  0 18:12 ?        00:00:08 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
root      6723  4677  0 18:33 pts/0    00:00:00 grep ohasd
[root@lunarlib rootwork]# ps -ef|grep d.bin
grid      5177     1  0 18:12 ?        00:00:08 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
grid      5306     1  0 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
grid      5339     1  0 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmd.bin
grid      5341     1  0 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
grid      5356     1  0 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/ocssd.bin 
grid      5387  5339  0 18:14 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
root      6725  4677  0 18:33 pts/0    00:00:00 grep d.bin
[root@lunarlib rootwork]# 
ohasd.log under /u01/app/11.2.0.4/grid/log/lunarlib/ohasd contains the following:
2016-01-11 18:28:09.091: [ CRSCOMM][406906624] IpcL: connection to member 9 has been removed
2016-01-11 18:28:09.091: [CLSFRAME][406906624] Removing IPC Member:{Relative|Node:0|Process:9|Type:3}
2016-01-11 18:28:09.091: [CLSFRAME][406906624] Disconnected from AGENT process: {Relative|Node:0|Process:9|Type:3}
2016-01-11 18:28:09.092: [    AGFW][333440768]{0:0:132} Agfw Proxy Server received process disconnected notification, count=1
2016-01-11 18:28:09.092: [    AGFW][333440768]{0:0:132} /u01/app/11.2.0.4/grid/bin/oraagent_grid disconnected.
2016-01-11 18:28:09.092: [    AGFW][333440768]{0:0:132} Agent /u01/app/11.2.0.4/grid/bin/oraagent_grid[5311] stopped!
2016-01-11 18:28:09.092: [ CRSCOMM][333440768]{0:0:132} IpcL: removeConnection: Member 9 does not exist in pending connections.
2016-01-11 18:28:09.093: [    AGFW][333440768]{0:0:132} Restarting the agent /u01/app/11.2.0.4/grid/bin/oraagent_grid
2016-01-11 18:28:09.093: [    AGFW][333440768]{0:0:132} Starting the agent: /u01/app/11.2.0.4/grid/bin/oraagent with user id: grid and incarnation:3
2016-01-11 18:28:09.095: [   CRSPE][322934528]{0:0:133} Disconnected from server:
2016-01-11 18:28:09.098: [    AGFW][333440768]{0:0:132} Starting the HB [Interval =  30000, misscount = 6kill allowed=1] for agent: /u01/app/11.2.0.4/grid/bin/oraagent_grid
2016-01-11 18:31:39.112: [    INIT][333440768]{0:0:132} {0:0:132} Created alert : (:CRSAGF00130:) :  Failed to start the agent /u01/app/11.2.0.4/grid/bin/oraagent_grid
2016-01-11 18:31:39.112: [    AGFW][333440768]{0:0:132} Can not stop the agent: /u01/app/11.2.0.4/grid/bin/oraagent_grid because pid is not initialized
2016-01-11 18:31:39.112: [    AGFW][333440768]{0:0:132} Restarting the agent /u01/app/11.2.0.4/grid/bin/oraagent_grid
2016-01-11 18:31:39.112: [    AGFW][333440768]{0:0:132} Starting the agent: /u01/app/11.2.0.4/grid/bin/oraagent with user id: grid and incarnation:5
2016-01-11 18:31:39.119: [    AGFW][333440768]{0:0:132} Starting the HB [Interval =  30000, misscount = 6kill allowed=1] for agent: /u01/app/11.2.0.4/grid/bin/oraagent_grid
2016-01-11 18:35:09.131: [    INIT][333440768]{0:0:132} {0:0:132} Created alert : (:CRSAGF00130:) :  Failed to start the agent /u01/app/11.2.0.4/grid/bin/oraagent_grid
2016-01-11 18:35:09.131: [    AGFW][333440768]{0:0:132} Can not stop the agent: /u01/app/11.2.0.4/grid/bin/oraagent_grid because pid is not initialized
2016-01-11 18:35:09.131: [    AGFW][333440768]{0:0:132} Restarting the agent /u01/app/11.2.0.4/grid/bin/oraagent_grid
2016-01-11 18:35:09.131: [    AGFW][333440768]{0:0:132} Starting the agent: /u01/app/11.2.0.4/grid/bin/oraagent with user id: grid and incarnation:7
2016-01-11 18:35:09.137: [    AGFW][333440768]{0:0:132} Starting the HB [Interval =  30000, misscount = 6kill allowed=1] for agent: /u01/app/11.2.0.4/grid/bin/oraagent_grid
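
In the log above, ohasd raises the alert CRSAGF00130 about every three and a half minutes because the restarted agent can never connect back. To see how long this has been going on, the timestamps of those alerts can be pulled out with awk. This is only a sketch: agent_start_failures is a name I made up, and it assumes the log line layout shown above, where the first two fields are the date and the time followed by a colon.

```shell
# Print the date and time of every "Failed to start the agent" alert
# (tag CRSAGF00130) found in the given ohasd.log file(s).
agent_start_failures() {
    awk '/CRSAGF00130/ { sub(/:$/, "", $2); print $1, $2 }' "$@"
}
```

Against the excerpt above, agent_start_failures ohasd.log would list the 18:31:39.112 and 18:35:09.131 entries.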

At this point crsctl stop has -f cannot stop the HAS service:

[root@lunarlib rootwork]# crsctl stop has -f
CRS-4544: Unable to connect to OHAS
CRS-4000: Command Stop failed, or completed with errors.
[root@lunarlib rootwork]# 

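A reboot is the better choice. But what if rebooting the host is inconvenient?
If the host cannot be rebooted, the problem can be handled by hand. First, manually clean out the network communication socket files of all HAS processes: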

[root@lunarlib rootwork]# rm -rf /var/tmp/.oracle/*
[root@lunarlib rootwork]# ll /var/tmp/.oracle/
total 0
[root@lunarlib rootwork]# ps -ef|grep d.bin
grid      4332     1  0 18:40 ?        00:00:09 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
grid      4560     1  0 18:42 ?        00:00:01 /u01/app/11.2.0.4/grid/bin/cssdagent
grid      4566     1  0 18:42 ?        00:00:11 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid      4591     1  0 18:42 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmd.bin
grid      4594     1  0 18:42 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
grid      4603     1  0 18:42 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/ocssd.bin 
grid      4639  4591  0 18:42 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
root      4994  4305  0 19:02 pts/1    00:00:00 grep d.bin
[root@lunarlib rootwork]# ps -ef|grep ohasd
root      2882     1  0 18:40 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
grid      4332     1  0 18:40 ?        00:00:09 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
root      4996  4305  0 19:02 pts/1    00:00:00 grep ohasd
[root@lunarlib rootwork]# crsctl status res -t
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Status failed, or completed with errors.
[root@lunarlib rootwork]# 
[root@lunarlib rootwork]# crsctl stop has -f
CRS-4544: Unable to connect to OHAS
CRS-4000: Command Stop failed, or completed with errors.
[root@lunarlib rootwork]# 

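None of the normal commands for stopping HAS work now, because we deleted the socket files used for inter-process communication.
We can, however, kill the processes. Note in the transcript that /bin/sh /etc/init.d/init.ohasd run reappears with a new PID each time it is killed; what matters is that the ohasd.bin, ocssd.bin and other daemons are gone: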

[root@lunarlib rootwork]# kill -9 4332 4560 4566 4591 4594 4603 4639 2882 4332
[root@lunarlib rootwork]# ps -ef|grep d.bin
root     15575  4305  0 19:04 pts/1    00:00:00 grep d.bin
[root@lunarlib rootwork]# ps -ef|grep ohasd
root     15548     1  0 19:04 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
root     15580  4305  0 19:04 pts/1    00:00:00 grep ohasd
[root@lunarlib rootwork]# kill -9 15548
[root@lunarlib rootwork]# ps -ef|grep ohasd
root     15581     1  0 19:04 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
root     15608  4305  0 19:04 pts/1    00:00:00 grep ohasd
[root@lunarlib rootwork]# 
[root@lunarlib rootwork]# ps -ef|grep d.bin
root     15623  4305  0 19:04 pts/1    00:00:00 grep d.bin
[root@lunarlib rootwork]# 
[root@lunarlib rootwork]# /etc/init.d/init.ohasd stop -f
[root@lunarlib rootwork]# ps -ef|grep ohasd
root     15581     1  0 19:04 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
root     15650  4305  0 19:05 pts/1    00:00:00 grep ohasd
[root@lunarlib rootwork]# /etc/init.d/init.ohasd stop
[root@lunarlib rootwork]# ps -ef|grep ohasd
root     15581     1  0 19:04 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
root     15672  4305  0 19:05 pts/1    00:00:00 grep ohasd
[root@lunarlib rootwork]# 
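
Typing the PID list by hand, as I did above, is error-prone (4332 even appears twice). As a sketch, the helper below extracts the PIDs of everything started from the Grid home out of ps -ef output. The name grid_home_pids is hypothetical, and it assumes the Linux ps -ef layout shown in this article, where the PID is field 2 and the command is field 8. It only prints PIDs; review the list before passing it to kill, and remember that on RAC killing ocssd.bin may reboot the node.

```shell
# Read `ps -ef` output on stdin and print the PID of every process whose
# command path starts with GRID_HOME/bin/.
# $1 = Grid Infrastructure home, e.g. /u01/app/11.2.0.4/grid
grid_home_pids() {
    home=${1:?usage: grid_home_pids GRID_HOME}
    awk -v prefix="$home/bin/" 'index($8, prefix) == 1 { print $2 }'
}
```

On the test host this would be used as ps -ef | grid_home_pids /u01/app/11.2.0.4/grid. The init.ohasd wrapper is deliberately not matched, because its command is /bin/sh.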

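In my test on HAS, killing all the processes in one go did not reboot the host (on RAC, killing ocssd.bin may cause the host to reboot). ipcs shows that no shared memory segments were left behind: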

[root@lunarlib rootwork]# ipcs -ma

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status      

------ Semaphore Arrays --------
key        semid      owner      perms      nsems     
0x00000000 0          root       600        1         
0x00000000 65537      root       600        1         

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages    

[root@lunarlib rootwork]#
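
In my test nothing was left over, but after a kill -9 it is worth checking for shared memory segments still owned by the grid or oracle users. A minimal sketch is below. leftover_shm_ids is my own name, the default owner list is an assumption based on the accounts used on this host, and the parser assumes the Linux ipcs -m column order shown above (key, shmid, owner).

```shell
# Read `ipcs -m` output on stdin and print the shmid of every segment
# owned by one of the given users.
# $1 = comma-separated owner list (default: grid,oracle)
leftover_shm_ids() {
    owners=${1:-grid,oracle}
    awk -v owners="$owners" '
        BEGIN { n = split(owners, o, ","); for (i = 1; i <= n; i++) want[o[i]] = 1 }
        $1 ~ /^0x/ && ($3 in want) { print $2 }
    '
}
```

Usage would be ipcs -m | leftover_shm_ids grid,oracle; any id it prints can then be inspected and, if it really is orphaned, removed with ipcrm -m.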

Then restart HAS manually:

[root@lunarlib rootwork]# crsctl start has
CRS-4123: Oracle High Availability Services has been started.
[root@lunarlib rootwork]# 
[root@lunarlib rootwork]# ps -ef|grep ohasd
root     15581     1  0 19:04 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
grid     15811     1  1 19:09 ?        00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
root     15817  4520  0 19:09 pts/0    00:00:00 tail -f ohasd.log
root     15935 15908  0 19:10 pts/2    00:00:00 grep ohasd
[root@lunarlib rootwork]# ps -ef|grep d.bin
root     15806  4305  0 19:09 pts/1    00:00:00 /u01/app/11.2.0.4/grid/bin/crsctl.bin start has
grid     15811     1  1 19:09 ?        00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
grid     15851     1  0 19:09 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root     15937 15908  0 19:10 pts/2    00:00:00 grep d.bin
[root@lunarlib rootwork]# 

As HAS starts, it creates new network communication socket files by itself:

[root@lunarlib rootwork]# ll /var/tmp/.oracle
total 0
prw-r--r-- 1 grid oinstall 0 Jan 11 19:04 npohasd
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sCRSD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 slunarlibDBG_OHASD
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sOHASD_IPC_SOCKET_11
-rw-r--r-- 1 grid oinstall 0 Jan 11 19:09 sOHASD_IPC_SOCKET_11_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sOHASD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sprocr_local_conn_0_PROL
-rw-r--r-- 1 grid oinstall 0 Jan 11 19:09 sprocr_local_conn_0_PROL_lock
[root@lunarlib rootwork]# ps -ef|grep d.bin
root     15806  4305  0 19:09 pts/1    00:00:00 /u01/app/11.2.0.4/grid/bin/crsctl.bin start has
grid     15811     1  1 19:09 ?        00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
grid     15851     1  0 19:09 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
root     15940 15908  0 19:11 pts/2    00:00:00 grep d.bin
[root@lunarlib rootwork]# ps -ef|grep d.bin
grid     15811     1  1 19:09 ?        00:00:01 /u01/app/11.2.0.4/grid/bin/ohasd.bin reboot
grid     15947     1  0 19:11 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/cssdagent
grid     15952     1  1 19:11 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/oraagent.bin
grid     15977     1  0 19:11 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/tnslsnr LISTENER -inherit
grid     15980     1  1 19:11 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmd.bin
grid     15994     1  1 19:11 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/ocssd.bin 
grid     16026 15980  0 19:11 ?        00:00:00 /u01/app/11.2.0.4/grid/bin/evmlogger.bin -o /u01/app/11.2.0.4/grid/evm/log/evmlogger.info -l /u01/app/11.2.0.4/grid/evm/log/evmlogger.log
root     16040 15908  0 19:11 pts/2    00:00:00 grep d.bin
[root@lunarlib rootwork]# ll /var/tmp/.oracle
total 0
prw-r--r-- 1 grid oinstall 0 Jan 11 19:04 npohasd
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 s#15977.1
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 s#15977.2
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 sAevm
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 sCevm
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sCRSD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 slunarlibDBG_CSSD
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 slunarlibDBG_EVMD
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 slunarlibDBG_OHASD
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 sOCSSD_LL_lunarlib_
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 sOCSSD_LL_lunarlib_localhost
-rw-r--r-- 1 grid oinstall 0 Jan 11 19:11 sOCSSD_LL_lunarlib_localhost_lock
-rw-r--r-- 1 grid oinstall 0 Jan 11 19:11 sOCSSD_LL_lunarlib__lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sOHASD_IPC_SOCKET_11
-rw-r--r-- 1 grid oinstall 0 Jan 11 19:09 sOHASD_IPC_SOCKET_11_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sOHASD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 sOracle_CSS_LclLstnr_localhost_1
-rw-r--r-- 1 grid oinstall 0 Jan 11 19:11 sOracle_CSS_LclLstnr_localhost_1_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:09 sprocr_local_conn_0_PROL
-rw-r--r-- 1 grid oinstall 0 Jan 11 19:09 sprocr_local_conn_0_PROL_lock
srwxrwxrwx 1 grid oinstall 0 Jan 11 19:11 sSYSTEM.evm.acceptor.auth
[root@lunarlib rootwork]# 

HAS is now fully up and running normally:

[root@lunarlib rootwork]# crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRSDG.dg
               ONLINE  ONLINE       lunarlib                                     
ora.DATADG1.dg
               ONLINE  ONLINE       lunarlib                                     
ora.DATADG2.dg
               ONLINE  ONLINE       lunarlib                                     
ora.LISTENER.lsnr
               ONLINE  ONLINE       lunarlib                                     
ora.asm
               ONLINE  ONLINE       lunarlib                 Started             
ora.ons
               OFFLINE OFFLINE      lunarlib                                     
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
      1        ONLINE  ONLINE       lunarlib                                     
ora.diskmon
      1        OFFLINE OFFLINE                                                   
ora.evmd
      1        ONLINE  ONLINE       lunarlib                                     
ora.lunardb.db
      1        ONLINE  ONLINE       lunarlib                 Open                
[root@lunarlib rootwork]# 
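
Reading that table by eye is fine on a small host; for a quick scripted check, the sketch below lists resources whose TARGET is ONLINE but whose STATE is not. The function name resources_not_online is hypothetical, and the parser assumes the 11.2 crsctl status res -t layout shown above (a resource name line followed by indented state lines, with an instance number in front for cluster resources). Other versions may format the output differently.

```shell
# Read `crsctl status res -t` output on stdin and print the name of each
# resource that has TARGET=ONLINE but a STATE other than ONLINE.
resources_not_online() {
    awk '
        /^ora\./ { name = $1; next }
        name != "" && NF >= 2 {
            # Cluster resources carry a leading instance number.
            i = ($1 ~ /^[0-9]+$/) ? 2 : 1
            if ($i == "ONLINE" && $(i + 1) != "ONLINE") print name
        }
    '
}
```

With the output above, crsctl status res -t | resources_not_online prints nothing, since ora.ons and ora.diskmon have an OFFLINE target and are therefore not counted as failures.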

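Summary, for RAC or HAS:
1. On Linux, the network socket files live in /var/tmp/.oracle/. On other platforms the directory may be /tmp/.oracle or /usr/tmp/.oracle.
2. If CRS or HAS is not running, deleting the Oracle temporary files (network socket files) does no harm; they are recreated automatically when CRS or HAS is restarted.
3. If CRS or HAS is up and running normally, deleting the Oracle temporary files does not affect the running database, but the database can no longer be shut down normally (shutdown abort works, but the instance cannot then be started again; startup fails with ORA-29701).
4. In case 3, CRS cannot be stopped, not even with the -f option. Short of a reboot, the only way out is to kill the processes by hand and clean up any leftover shared memory segments. On HAS, killing ocssd.bin does not reboot the host; on RAC, killing ocssd.bin can reboot the host.
5. Once step 4 is done, simply restart CRS or HAS.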
