Contact: QQ (5163721)
Title: Rebuilding CRS in a 10.2.0.1 RAC (e.g., when an incorrect IP change leaves CRS unable to start)
Author: Lunar © All rights reserved. [Reposting is permitted, but the source must be credited with a link; otherwise legal action may be taken.]
The VIP is a static IP, introduced in Oracle 10.1, that is bound on top of the public IP; its information, together with that of the public network interface, is stored in the OCR (Oracle Cluster Registry).
Customers often have a requirement like this: the test environment uses one set of IPs, and after migration to production a different set is used.
If the customer does not want to reinstall the system, the RAC IPs must be changed.
.
Such a change is normally a small operation that requires no heroics, provided you follow the official procedure exactly; for example, when changing an IP, should CRS be up or down?
Another common mistake: when updating the network configuration, what must be supplied is the subnet information, not a host IP address. This is a frequent cause of CRS failing to start after an IP change.
Overlooking small details like these is enough to leave CRS unable to start.
Whether it is 10.2, 11.1, or 11.2, the core of all this has changed very little; the difference is that 11.2 is more robust and offers more repair options (including some that are not officially supported).
.
To summarize the main points:
1. The hostname associated with the public network cannot be changed; the official position is that changing it requires reinstalling CRS.
My own feeling is this: if you rename the hostname-named directories on each host to the new hostname and then re-run the CRS reconfiguration procedure, it should work in theory.
In 10g that procedure is rootdelete.sh, rootdeinstall.sh, and root.sh;
in 11.2 and later it is deconfig and reconfig.
That said, this is fine for test systems and personal sandboxes; for production I would still follow the official procedure, otherwise you are digging yourself a hole for all kinds of strange problems later...
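As a rough sketch of the 11.2 deconfig/reconfig pair mentioned above (paths assume a standard Grid home referenced as $GRID_HOME; treat this as an outline to verify against your release, not an exact runbook):

```bash
# On each node, as root: deconfigure the local GI stack
# (-force is needed when the stack is already down or broken)
$GRID_HOME/crs/install/rootcrs.pl -deconfig -force

# On the last node only, also wipe the OCR/voting-disk contents:
$GRID_HOME/crs/install/rootcrs.pl -deconfig -force -lastnode

# Then re-run root.sh on each node to reconfigure the cluster
$GRID_HOME/root.sh
```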
.
2. For the public network, if you keep the hostname and change only the public IP, no CRS configuration change is needed, because in 10g, 11.1, and 11.2 alike, CRS does not record the specific IP address.
3. To change the public network's interface name, subnet, or netmask, use oifcfg.
A customer once asked how long this takes.
Answer: in theory under 10 minutes. If verification and rollback checks are needed, ask for a 30-minute to 1-hour outage window.
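A minimal oifcfg sketch of such a change (interface name and subnet below are illustrative; note that oifcfg takes the subnet address, not a host IP):

```bash
# Show the interfaces currently registered in the cluster
oifcfg getif

# Drop the old public definition and register the new subnet
oifcfg delif -global eth0
oifcfg setif -global eth0/192.168.10.0:public
```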
.
4. For private network changes: in 10.2 and 11.1 this information is stored in CRS, so the CRS configuration must be updated accordingly (using oifcfg).
Starting with 11.2, CRS (Cluster Ready Services) evolved into GI (Grid Infrastructure). The private hostname corresponding to the private IP is no longer stored in the OCR;
it is managed through the GPnP framework instead (Grid Plug and Play, covered later), and the OCR no longer has any dependency on the private hostname.
The private network can therefore be changed freely without touching anything at the CRS level.
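In 10.2/11.1 the private interconnect change is the same kind of oifcfg edit, performed against the cluster_interconnect interface (the subnets below are illustrative):

```bash
# Register the new private subnet, then drop the old one
oifcfg setif -global eth1/192.168.30.0:cluster_interconnect
oifcfg delif -global eth1/192.168.20.0
```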
.
5. If you get the change wrong, don't worry. In 10.2 and 11.1 the last resort is rootdelete.sh, rootdeinstall.sh, and root.sh.
In 11.2, besides the deconfig/reconfig used to reconfigure the cluster, there is also the option of repairing via the gpnp profile.
Moreover, gpnptool can fix not only this class of problem but also corrupted location information for the ASM spfile and the CRS profile, among other things
(for example, if the ASM disk discovery path in the profile is wrong, GI cannot start).
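A rough sketch of the 11.2 gpnptool workflow just mentioned (the option names and values here are from memory and illustrative; verify against your release's gpnptool help before use):

```bash
# Dump the current GPnP profile of the local node
gpnptool get -o=/tmp/profile.xml

# Edit the profile: e.g. fix the ASM discovery string and bump the sequence number
gpnptool edit -p=/tmp/profile.xml -o=/tmp/profile.xml -ovr \
        -prf_sq=2 -asm_dis='/dev/raw/*'

# Re-sign the profile with the cluster wallet so GI will accept it
gpnptool sign -p=/tmp/profile.xml -o=/tmp/profile.xml -ovr \
        -w=file:$GRID_HOME/gpnp/wallets/peer
```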
.
Now for the details, starting with the procedure for rebuilding CRS in 10.2:
This procedure can be used to repair two classes of problems:
1. CRS cannot start because of an IP change (public IP, VIP, or private IP).
2. CRS cannot start because of some other CRS-configuration-related problem.
.
The steps are as follows:
1. Stop CRS on node 1
[root@rh1 ~]# crsctl stop crs
Stopping resources.
Successfully stopped CRS resources
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
[root@rh1 ~]#
2. Run rootdelete.sh on node 1 to remove the CRS configuration:
[root@rh1 ~]# cd $ORA_CRS_HOME/install
[root@rh1 install]# ls
cluster.ini         install.excl  paramfile.crs  rootaddnode.sbs   rootdeletenode.sh  rootlocaladd
cmdllroot.sh        install.incl  preupdate.sh   rootconfig        rootdelete.sh      rootupgrade
envVars.properties  make.log      readme.txt     rootdeinstall.sh  rootinstall        templocal
[root@rh1 install]# ./rootdelete.sh
Shutting down Oracle Cluster Ready Services (CRS):
Stopping resources.
Error while stopping resources.
Possible cause: CRSD is down.
Stopping CSSD.
Unable to communicate with the CSS daemon.
Shutdown has begun. The daemons should exit soon.
Checking to see if Oracle CRS stack is down...
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Updating ocr file for downgrade
Cleaning up SCR settings in '/etc/oracle/scls_scr'
[root@rh1 install]#
Check the current network interface information:
[root@rh1 install]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:8A:1A:12
          inet addr:192.168.10.11  Bcast:192.168.10.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8a:1a12/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1504 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1295 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:129195 (126.1 KiB)  TX bytes:118741 (115.9 KiB)
          Interrupt:10 Base address:0x1400
eth1      Link encap:Ethernet  HWaddr 00:0C:29:8A:1A:1C
          inet addr:192.168.20.11  Bcast:192.168.20.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8a:1a1c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:204 errors:0 dropped:0 overruns:0 frame:0
          TX packets:210 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:19882 (19.4 KiB)  TX bytes:19404 (18.9 KiB)
          Interrupt:9 Base address:0x1480
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:19827 errors:0 dropped:0 overruns:0 frame:0
          TX packets:19827 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:6788710 (6.4 MiB)  TX bytes:6788710 (6.4 MiB)
[root@rh1 install]#
3. Stop CRS on node 2
[root@rh2 ~]# crsctl stop crs
Stopping resources.
Error while stopping resources.
Possible cause: CRSD is down.
Stopping CSSD.
Unable to communicate with the CSS daemon.
[root@rh2 ~]#
4. Run rootdelete.sh on node 2 to remove the CRS configuration:
[root@rh2 ~]# cd $ORA_CRS_HOME/install
[root@rh2 install]# ./rootdelete.sh
Shutting down Oracle Cluster Ready Services (CRS):
Stopping resources.
Error while stopping resources.
Possible cause: CRSD is down.
Stopping CSSD.
Unable to communicate with the CSS daemon.
Shutdown has begun. The daemons should exit soon.
Checking to see if Oracle CRS stack is down...
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Updating ocr file for downgrade
Cleaning up SCR settings in '/etc/oracle/scls_scr'
[root@rh2 install]#
5. Run rootdeinstall.sh on node 1 to wipe the OCR device:
[root@rh1 install]# ./rootdeinstall.sh
Removing contents from OCR device
2560+0 records in
2560+0 records out
[root@rh1 install]#
[root@rh1 install]# ps -e | grep -i 'ocs[s]d'
[root@rh1 install]# ps -e | grep -i 'cr[s]d.bin'
[root@rh1 install]# ps -e | grep -i 'ev[m]d.bin'
[root@rh1 install]# ps -ef|grep crs
root      2309 32489  0 13:32 pts/1    00:00:00 grep crs
[root@rh1 install]# ps -ef|grep d.bin
root      2311 32489  0 13:32 pts/1    00:00:00 grep d.bin
[root@rh1 install]#
6. Run root.sh on node 1 to complete the CRS reconfiguration on node 1:
[root@rh1 crs]# ./root.sh
WARNING: directory '/u01/app/oracle/product/ora10g' is not owned by root
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
Checking to see if Oracle CRS stack is already configured
Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/u01/app/oracle/product/ora10g' is not owned by root
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
assigning default hostname rh1 for node 1.
assigning default hostname rh2 for node 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 1: rh1 int1 rh1
node 2: rh2 int2 rh2
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Now formatting voting device: /dev/raw/raw1
Format of 1 voting devices complete.
Startup will be queued to init within 90 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
 rh1
CSS is inactive on these nodes.
 rh2
Local node checking complete.
Run root.sh on remaining nodes to start CRS daemons.
[root@rh1 crs]#
7. Run root.sh on node 2 to complete the CRS reconfiguration on node 2:
[root@rh2 crs]# ./root.sh
WARNING: directory '/u01/app/oracle/product/ora10g' is not owned by root
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
Checking to see if Oracle CRS stack is already configured
Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/u01/app/oracle/product/ora10g' is not owned by root
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
assigning default hostname rh1 for node 1.
assigning default hostname rh2 for node 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 1: rh1 int1 rh1
node 2: rh2 int2 rh2
clscfg: Arguments check out successfully.
NO KEYS WERE WRITTEN. Supply -force parameter to override.
-force is destructive and will destroy any previous cluster configuration.
Oracle Cluster Registry for cluster has already been initialized
Startup will be queued to init within 90 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
 rh1
 rh2
CSS is active on all nodes.
Waiting for the Oracle CRSD and EVMD to start
Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps
The given interface(s), "eth0" is not public. Public interfaces should be used to configure virtual IPs.
[root@rh2 crs]#
Confirm that the CRS processes have started normally on both nodes:
[root@rh1 crs]# ps -ef|grep d.bin
root     29156     1  0 12:17 ?        00:00:00 /u01/app/oracle/product/ora10g/crs/bin/crsd.bin reboot
oracle   29418 29154  0 12:18 ?        00:00:00 /u01/app/oracle/product/ora10g/crs/bin/evmd.bin
oracle   29585 29555  0 12:18 ?        00:00:00 /u01/app/oracle/product/ora10g/crs/bin/ocssd.bin
[root@rh1 crs]#
[root@rh2 crs]# ps -ef|grep d.bin
root     19689     1  0 11:57 ?        00:00:00 /u01/app/oracle/product/ora10g/crs/bin/crsd.bin reboot
oracle   19961 19687  0 11:58 ?        00:00:00 /u01/app/oracle/product/ora10g/crs/bin/evmd.bin
oracle   20096 20070  0 11:58 ?        00:00:00 /u01/app/oracle/product/ora10g/crs/bin/ocssd.bin
root     21283  8784  0 11:59 pts/1    00:00:00 grep d.bin
[root@rh2 crs]#
Check the interface information again; the VIP is now bound on each node's public interface:
[root@rh1 crs]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:3F:E6:E7
          inet addr:192.168.10.11  Bcast:192.168.10.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe3f:e6e7/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1305 errors:0 dropped:0 overruns:0 frame:0
          TX packets:731 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:202565 (197.8 KiB)  TX bytes:184325 (180.0 KiB)
          Interrupt:9 Base address:0x1400
eth1      Link encap:Ethernet  HWaddr 00:0C:29:3F:E6:F1
          inet addr:192.168.20.11  Bcast:192.168.20.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe3f:e6f1/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:5727 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8359 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:4252719 (4.0 MiB)  TX bytes:7524822 (7.1 MiB)
          Interrupt:10 Base address:0x1480
eth1:1    Link encap:Ethernet  HWaddr 00:0C:29:3F:E6:F1
          inet addr:192.168.10.21  Bcast:192.168.10.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:10 Base address:0x1480
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:85980 errors:0 dropped:0 overruns:0 frame:0
          TX packets:85980 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:6918640 (6.5 MiB)  TX bytes:6918640 (6.5 MiB)
[root@rh1 crs]#
Then run VIPCA on node 1 (rh1) to configure the nodeapps:
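If root.sh ends with the "The given interface(s) ... is not public" message, as in the node-2 output above, the usual fix before (or instead of) running vipca is to register the interfaces with oifcfg and then add the nodeapps as root. A sketch for this example environment (rh2's VIP address 192.168.10.22 is an assumption, since only rh1's VIP appears in the output):

```bash
# As root on one node: tell CRS which subnets are public / private
oifcfg setif -global eth0/192.168.10.0:public
oifcfg setif -global eth1/192.168.20.0:cluster_interconnect

# Then run vipca (GUI) as root, or add the VIPs from the command line:
srvctl add nodeapps -n rh1 -o $ORA_CRS_HOME -A 192.168.10.21/255.255.255.0/eth0
srvctl add nodeapps -n rh2 -o $ORA_CRS_HOME -A 192.168.10.22/255.255.255.0/eth0
```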
[root@rh1 crs]# crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.rh1.gsd    application    ONLINE    ONLINE    rh1
ora.rh1.ons    application    ONLINE    ONLINE    rh1
ora.rh1.vip    application    ONLINE    ONLINE    rh1
ora.rh2.gsd    application    ONLINE    ONLINE    rh2
ora.rh2.ons    application    ONLINE    ONLINE    rh2
ora.rh2.vip    application    ONLINE    ONLINE    rh2
[root@rh1 crs]#
[root@rh2 crs]# crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.rh1.gsd    application    ONLINE    ONLINE    rh1
ora.rh1.ons    application    ONLINE    ONLINE    rh1
ora.rh1.vip    application    ONLINE    ONLINE    rh1
ora.rh2.gsd    application    ONLINE    ONLINE    rh2
ora.rh2.ons    application    ONLINE    ONLINE    rh2
ora.rh2.vip    application    ONLINE    ONLINE    rh2
[root@rh2 crs]#
On node 1, add the database, instance, and ASM resources:
[oracle@rh1 ~]$ srvctl add database -d rac -o /u01/app/oracle/product/ora10g/db
[oracle@rh1 ~]$ srvctl add instance -d rac -i rac1 -n rh1
[oracle@rh1 ~]$ srvctl add instance -d rac -i rac2 -n rh2
[oracle@rh1 ~]$ srvctl add asm -n rh1 -i +ASM1 -o $ORACLE_HOME
[oracle@rh1 ~]$ srvctl add asm -n rh2 -i +ASM2 -o $ORACLE_HOME
[oracle@rh1 ~]$
Start ASM and the database:
[oracle@rh1 ~]$ srvctl start asm -n rh1
[oracle@rh1 ~]$ srvctl start asm -n rh2
[oracle@rh1 ~]$ srvctl start database -d rac
[oracle@rh1 ~]$
Everything is OK now:
[oracle@rh1 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.rac.db     application    ONLINE    ONLINE    rh1
ora....c1.inst application    ONLINE    ONLINE    rh1
ora....c2.inst application    ONLINE    ONLINE    rh2
ora....SM1.asm application    ONLINE    ONLINE    rh1
ora.rh1.gsd    application    ONLINE    ONLINE    rh1
ora.rh1.ons    application    ONLINE    ONLINE    rh1
ora.rh1.vip    application    ONLINE    ONLINE    rh1
ora....SM2.asm application    ONLINE    ONLINE    rh2
ora.rh2.gsd    application    ONLINE    ONLINE    rh2
ora.rh2.ons    application    ONLINE    ONLINE    rh2
ora.rh2.vip    application    ONLINE    ONLINE    rh2
[oracle@rh1 ~]$
Rebuild the LISTENER with netca:
[oracle@rh1 admin]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.rac.db     application    ONLINE    ONLINE    rh1
ora....c1.inst application    ONLINE    ONLINE    rh1
ora....c2.inst application    ONLINE    ONLINE    rh2
ora....SM1.asm application    ONLINE    ONLINE    rh1
ora....H1.lsnr application    ONLINE    ONLINE    rh1
ora.rh1.gsd    application    ONLINE    ONLINE    rh1
ora.rh1.ons    application    ONLINE    ONLINE    rh1
ora.rh1.vip    application    ONLINE    ONLINE    rh1
ora....SM2.asm application    ONLINE    ONLINE    rh2
ora....H2.lsnr application    ONLINE    ONLINE    rh2
ora.rh2.gsd    application    ONLINE    ONLINE    rh2
ora.rh2.ons    application    ONLINE    ONLINE    rh2
ora.rh2.vip    application    ONLINE    ONLINE    rh2
[oracle@rh1 admin]$