Oracle database 10g RAC,非正常删除节点
如题所示,本文将呈现关于如何以非正常的方式删除Oracle 10g RAC的节点的技术细节。
通常,在你需要以非正常方式删除节点的时候,都是某个节点的状态已经异常了。
要开始本文的前提是:你需要有一个运行正常的三节点Oracle database 10g RAC环境。
关于该环境的搭建,可以参考我以往的文章:
Oracle database 10g RAC on RHEL5U10,http://d-prototype.com/archives/881
在本文中,模拟故障的节点为:节点三。
下面开始本文的话题。
——————————————————————————
在开始前,查看下RAC当前的状态:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 |
[root@rac1 ~]# su - grid [grid@rac1 ~]$ crs_stat -t -v Name Type R/RA F/FT Target State Host ---------------------------------------------------------------------- ora.dblogic.db application 0/1 0/1 ONLINE ONLINE rac2 ora....c1.inst application 0/5 0/0 ONLINE ONLINE rac1 ora....c2.inst application 0/5 0/0 ONLINE ONLINE rac2 ora....c3.inst application 0/5 0/0 ONLINE ONLINE rac3 ora....SM1.asm application 0/5 0/0 ONLINE ONLINE rac1 ora....C1.lsnr application 0/5 0/0 ONLINE ONLINE rac1 ora.rac1.gsd application 0/5 0/0 ONLINE ONLINE rac1 ora.rac1.ons application 0/3 0/0 ONLINE ONLINE rac1 ora.rac1.vip application 0/0 0/0 ONLINE ONLINE rac1 ora....SM2.asm application 0/5 0/0 ONLINE ONLINE rac2 ora....C2.lsnr application 0/5 0/0 ONLINE ONLINE rac2 ora.rac2.gsd application 0/5 0/0 ONLINE ONLINE rac2 ora.rac2.ons application 0/3 0/0 ONLINE ONLINE rac2 ora.rac2.vip application 0/0 0/0 ONLINE ONLINE rac2 ora....SM3.asm application 0/5 0/0 ONLINE ONLINE rac3 ora....C3.lsnr application 0/5 0/0 ONLINE ONLINE rac3 ora.rac3.gsd application 0/5 0/0 ONLINE ONLINE rac3 ora.rac3.ons application 0/3 0/0 ONLINE ONLINE rac3 ora.rac3.vip application 0/0 0/0 ONLINE ONLINE rac3 [grid@rac1 ~]$ [root@rac1 ~]# su - grid [grid@rac1 ~]$ crs_stat -t -v Name Type R/RA F/FT Target State Host ---------------------------------------------------------------------- ora.dblogic.db application 0/1 0/1 ONLINE ONLINE rac2 ora....c1.inst application 0/5 0/0 ONLINE ONLINE rac1 ora....c2.inst application 0/5 0/0 ONLINE ONLINE rac2 ora....c3.inst application 0/5 0/0 ONLINE ONLINE rac3 ora....SM1.asm application 0/5 0/0 ONLINE ONLINE rac1 ora....C1.lsnr application 0/5 0/0 ONLINE ONLINE rac1 ora.rac1.gsd application 0/5 0/0 ONLINE ONLINE rac1 ora.rac1.ons application 0/3 0/0 ONLINE ONLINE rac1 ora.rac1.vip application 0/0 0/0 ONLINE ONLINE rac1 ora....SM2.asm application 0/5 0/0 ONLINE ONLINE rac2 ora....C2.lsnr application 0/5 0/0 ONLINE ONLINE rac2 ora.rac2.gsd application 0/5 0/0 ONLINE ONLINE rac2 ora.rac2.ons application 0/3 0/0 ONLINE ONLINE rac2 ora.rac2.vip application 0/0 0/0 ONLINE ONLINE rac2 ora....SM3.asm application 0/5 0/0 ONLINE ONLINE rac3 ora....C3.lsnr application 0/5 0/0 ONLINE ONLINE rac3 ora.rac3.gsd application 0/5 0/0 ONLINE ONLINE rac3 ora.rac3.ons application 0/3 0/0 ONLINE ONLINE rac3 ora.rac3.vip application 0/0 0/0 ONLINE ONLINE rac3 [grid@rac1 ~]$ |
为了模拟节点的异常状态,将节点三的主机关闭:
关闭前:
1 2 3 4 5 6 7 8 9 10 |
[grid@rac1 ~]$ ping -c 3 rac3 PING rac3 (166.100.0.15) 56(84) bytes of data. 64 bytes from rac3 (166.100.0.15): icmp_seq=1 ttl=64 time=0.195 ms 64 bytes from rac3 (166.100.0.15): icmp_seq=2 ttl=64 time=0.190 ms 64 bytes from rac3 (166.100.0.15): icmp_seq=3 ttl=64 time=0.193 ms --- rac3 ping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 1999ms rtt min/avg/max/mdev = 0.190/0.192/0.195/0.016 ms [grid@rac1 ~]$ |
关闭:
1 2 3 4 5 6 |
[root@rac3 ~]# shutdown -h now Broadcast message from root (pts/2) (Tue Feb 3 15:40:41 2015): The system is going down for system halt NOW! [root@rac3 ~]# |
关闭后:
1 2 3 4 5 6 7 |
[grid@rac1 ~]$ ping -c 3 rac3 PING rac3 (166.100.0.15) 56(84) bytes of data. --- rac3 ping statistics --- 3 packets transmitted, 0 received, 100% packet loss, time 1999ms [grid@rac1 ~]$ |
节点三关闭后,查看CRS的资源状态:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 |
[grid@rac1 ~]$ crs_stat -t -v Name Type R/RA F/FT Target State Host ---------------------------------------------------------------------- ora.dblogic.db application 0/1 0/1 ONLINE ONLINE rac2 ora....c1.inst application 0/5 0/0 ONLINE ONLINE rac1 ora....c2.inst application 0/5 0/0 ONLINE ONLINE rac2 ora....c3.inst application 0/5 0/0 ONLINE OFFLINE ora....SM1.asm application 0/5 0/0 ONLINE ONLINE rac1 ora....C1.lsnr application 0/5 0/0 ONLINE ONLINE rac1 ora.rac1.gsd application 0/5 0/0 ONLINE ONLINE rac1 ora.rac1.ons application 0/3 0/0 ONLINE ONLINE rac1 ora.rac1.vip application 0/0 0/0 ONLINE ONLINE rac1 ora....SM2.asm application 0/5 0/0 ONLINE ONLINE rac2 ora....C2.lsnr application 0/5 0/0 ONLINE ONLINE rac2 ora.rac2.gsd application 0/5 0/0 ONLINE ONLINE rac2 ora.rac2.ons application 0/3 0/0 ONLINE ONLINE rac2 ora.rac2.vip application 0/0 0/0 ONLINE ONLINE rac2 ora....SM3.asm application 0/5 0/0 ONLINE OFFLINE ora....C3.lsnr application 0/5 0/0 ONLINE OFFLINE ora.rac3.gsd application 0/5 0/0 ONLINE OFFLINE ora.rac3.ons application 0/3 0/0 ONLINE OFFLINE ora.rac3.vip application 0/0 0/0 ONLINE ONLINE rac1 [grid@rac1 ~]$ [grid@rac1 ~]$ crs_stat -t -v | grep --color lsnr ora....C1.lsnr application 0/5 0/0 ONLINE ONLINE rac1 ora....C2.lsnr application 0/5 0/0 ONLINE ONLINE rac2 ora....C3.lsnr application 0/5 0/0 ONLINE OFFLINE [grid@rac1 ~]$ [grid@rac1 ~]$ crs_stat -t -v | grep --color db ora.dblogic.db application 0/1 0/1 ONLINE ONLINE rac2 [grid@rac1 ~]$ [grid@rac1 ~]$ crs_stat -t -v | grep --color inst ora....c1.inst application 0/5 0/0 ONLINE ONLINE rac1 ora....c2.inst application 0/5 0/0 ONLINE ONLINE rac2 ora....c3.inst application 0/5 0/0 ONLINE OFFLINE [grid@rac1 ~]$ [grid@rac1 ~]$ crs_stat -t -v | grep --color asm ora....SM1.asm application 0/5 0/0 ONLINE ONLINE rac1 ora....SM2.asm application 0/5 0/0 ONLINE ONLINE rac2 ora....SM3.asm application 0/5 0/0 ONLINE OFFLINE [grid@rac1 ~]$ [grid@rac1 ~]$ crs_stat -t -v | grep --color vip ora.rac1.vip application 0/0 0/0 ONLINE ONLINE rac1 ora.rac2.vip application 0/0 0/0 ONLINE ONLINE rac2 ora.rac3.vip application 0/0 0/0 ONLINE ONLINE rac1 [grid@rac1 ~]$ [grid@rac1 ~]$ crs_stat -t -v | grep --color ons ora.rac1.ons application 0/3 0/0 ONLINE ONLINE rac1 ora.rac2.ons application 0/3 0/0 ONLINE ONLINE rac2 ora.rac3.ons application 0/3 0/0 ONLINE OFFLINE [grid@rac1 ~]$ [grid@rac1 ~]$ |
从上面的状态可以看到:与节点三有关的资源都处于不可用的状态。
(其实就算这种程度的故障,其实还是很理想的。一般,某个节点异常的时候,比方说节点三异常了,可能节点三对应的VIP的资源“ora.rac3.vip”还是ONLINE得,又或者其他的一些资源飘在“rac3”上面,…等等。本文由于环境限制,无法模拟更复杂的错误场景,故仅用最理想的错误现象,去描述发生问题时,将异常节点从RAC架构中删除的方式,予以参考。)
(可选)环境变量:$DISPLAY
1 2 3 4 |
[root@rac1 ~]# export DISPLAY=168.0.1.190:0.0 [root@rac1 ~]# xhost + access control disabled, clients can connect from any host [root@rac1 ~]# |
下面正式开始描述:
一、使用DBCA删除DATABASE实例
在正常的删除节点步骤中,删除Database实例的第一步应该是停掉异常节点上的RAC相关服务。
你可以选择以下两种方式达成该目的:
1.异常节点上:crsctl stop crs
2.集群任意节点上发出:crs_stop <异常节点的CRS资源名称,Eg:ora.rac.rac3.inst>
这里由于集群异常,异常节点已被关机,所以,上述两个步骤不存在。
然后,就可以真正的操作数据库实例的删除操作了:
在RAC当前的存活节点上运行DBCA:dbca,如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
[root@rac1 ~]# su - oracle [oracle@rac1 ~]$ env | grep SID ORACLE_SID=dblogic1 [oracle@rac1 ~]$ ps -ef | grep pmon oracle 17009 1 0 Jan30 ? 00:00:08 asm_pmon_+ASM1 oracle 30480 30374 0 16:06 pts/2 00:00:00 grep pmon oracle 31076 1 0 Jan30 ? 00:00:08 ora_pmon_dblogic1 [oracle@rac1 ~]$ [oracle@rac1 ~]$ xhost + access control disabled, clients can connect from any host [oracle@rac1 ~]$ [oracle@rac1 ~]$ dbca & [1] 30553 [oracle@rac1 ~]$ |
删除完成后,查看下当前RAC资源的状态:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
[oracle@rac1 ~]$ su - grid Password: [grid@rac1 ~]$ crs_stat -t -v Name Type R/RA F/FT Target State Host ---------------------------------------------------------------------- ora.dblogic.db application 0/1 0/1 ONLINE ONLINE rac2 ora....c1.inst application 0/5 0/0 ONLINE ONLINE rac1 ora....c2.inst application 0/5 0/0 ONLINE ONLINE rac2 ora....SM1.asm application 0/5 0/0 ONLINE ONLINE rac1 ora....C1.lsnr application 0/5 0/0 ONLINE ONLINE rac1 ora.rac1.gsd application 0/5 0/0 ONLINE ONLINE rac1 ora.rac1.ons application 0/3 0/0 ONLINE ONLINE rac1 ora.rac1.vip application 0/0 0/0 ONLINE ONLINE rac1 ora....SM2.asm application 0/5 0/0 ONLINE ONLINE rac2 ora....C2.lsnr application 0/5 0/0 ONLINE ONLINE rac2 ora.rac2.gsd application 0/5 0/0 ONLINE ONLINE rac2 ora.rac2.ons application 0/3 0/0 ONLINE ONLINE rac2 ora.rac2.vip application 0/0 0/0 ONLINE ONLINE rac2 ora....SM3.asm application 0/5 0/0 ONLINE OFFLINE ora....C3.lsnr application 0/5 0/0 ONLINE OFFLINE ora.rac3.gsd application 0/5 0/0 ONLINE OFFLINE ora.rac3.ons application 0/3 0/0 ONLINE OFFLINE ora.rac3.vip application 0/0 0/0 ONLINE ONLINE rac1 [grid@rac1 ~]$ [grid@rac1 ~]$ crs_stat -t -v | grep db ora.dblogic.db application 0/1 0/1 ONLINE ONLINE rac2 [grid@rac1 ~]$ [grid@rac1 ~]$ crs_stat -t -v | grep --color inst ora....c1.inst application 0/5 0/0 ONLINE ONLINE rac1 ora....c2.inst application 0/5 0/0 ONLINE ONLINE rac2 [grid@rac1 ~]$ |
你可以发现,和之前相比,与节点三相关的“inst”资源已经被删掉了。
二、删除ASM实例
上面虽然已经删掉了节点三的”inst”相关的资源,但是“asm”的资源仍旧存在的:
1 2 3 4 5 |
[grid@rac1 ~]$ crs_stat -t -v | grep --color asm ora....SM1.asm application 0/5 0/0 ONLINE ONLINE rac1 ora....SM2.asm application 0/5 0/0 ONLINE ONLINE rac2 ora....SM3.asm application 0/5 0/0 ONLINE OFFLINE [grid@rac1 ~]$ |
在删除ASM实例前,需要停掉要删除的节点上的ASM实例,在本环境中,异常节点关机,故而该步骤省掉了。
如果需要停掉某个节点上的ASM资源,可以如下操作:
/u01/app/10g/grid/bin/srvctl stop asm -n
删除ASM实例,在任意一个RAC当前活动的节点上都可以操作,具体如下:
1 2 3 4 5 6 7 8 9 10 |
[oracle@rac1 ~]$ /u01/app/10g/grid/bin/crs_stat -t -v | grep --color asm ora....SM1.asm application 0/5 0/0 ONLINE ONLINE rac1 ora....SM2.asm application 0/5 0/0 ONLINE ONLINE rac2 ora....SM3.asm application 0/5 0/0 ONLINE OFFLINE [oracle@rac1 ~]$ [oracle@rac1 ~]$ /u01/app/10g/grid/bin/srvctl remove asm -n rac3 [oracle@rac1 ~]$ /u01/app/10g/grid/bin/crs_stat -t -v | grep --color asm ora....SM1.asm application 0/5 0/0 ONLINE ONLINE rac1 ora....SM2.asm application 0/5 0/0 ONLINE ONLINE rac2 [oracle@rac1 ~]$ |
如果“srvctl remove asm -n
/u01/app/10g/grid/bin/srvctl remove asm -n rac3 -f
如上所示,此时,故障节点的ASM实例的RAC资源也被删除了。
三、删掉故障节点的监听资源
删除前:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
[oracle@rac1 ~]$ /u01/app/10g/grid/bin/crs_stat -t -v Name Type R/RA F/FT Target State Host ---------------------------------------------------------------------- ora.dblogic.db application 0/1 0/1 ONLINE ONLINE rac2 ora....c1.inst application 0/5 0/0 ONLINE ONLINE rac1 ora....c2.inst application 0/5 0/0 ONLINE ONLINE rac2 ora....SM1.asm application 0/5 0/0 ONLINE ONLINE rac1 ora....C1.lsnr application 0/5 0/0 ONLINE ONLINE rac1 ora.rac1.gsd application 0/5 0/0 ONLINE ONLINE rac1 ora.rac1.ons application 0/3 0/0 ONLINE ONLINE rac1 ora.rac1.vip application 0/0 0/0 ONLINE ONLINE rac1 ora....SM2.asm application 0/5 0/0 ONLINE ONLINE rac2 ora....C2.lsnr application 0/5 0/0 ONLINE ONLINE rac2 ora.rac2.gsd application 0/5 0/0 ONLINE ONLINE rac2 ora.rac2.ons application 0/3 0/0 ONLINE ONLINE rac2 ora.rac2.vip application 0/0 0/0 ONLINE ONLINE rac2 ora....C3.lsnr application 0/5 0/0 ONLINE OFFLINE ora.rac3.gsd application 0/5 0/0 ONLINE OFFLINE ora.rac3.ons application 0/3 0/0 ONLINE OFFLINE ora.rac3.vip application 0/0 0/0 ONLINE ONLINE rac1 [oracle@rac1 ~]$ [oracle@rac1 ~]$ /u01/app/10g/grid/bin/crs_stat -t -v | grep --color lsnr ora....C1.lsnr application 0/5 0/0 ONLINE ONLINE rac1 ora....C2.lsnr application 0/5 0/0 ONLINE ONLINE rac2 ora....C3.lsnr application 0/5 0/0 ONLINE OFFLINE [oracle@rac1 ~]$ |
删除:
1 2 3 4 5 6 |
[root@rac1 ~]# /u01/app/10g/grid/bin/crs_unregister ora.rac3.LISTENER_RAC3.lsnr [root@rac1 ~]# [root@rac1 ~]# /u01/app/10g/grid/bin/crs_stat -t -v | grep --color lsnr ora....C1.lsnr application 0/5 0/0 ONLINE ONLINE rac1 ora....C2.lsnr application 0/5 0/0 ONLINE ONLINE rac2 [root@rac1 ~]# |
四、删除故障节点的NodeApp资源
删除前的状态:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
[root@rac1 ~]# /u01/app/10g/grid/bin/crs_stat -t -v Name Type R/RA F/FT Target State Host ---------------------------------------------------------------------- ora.dblogic.db application 0/1 0/1 ONLINE ONLINE rac2 ora....c1.inst application 0/5 0/0 ONLINE ONLINE rac1 ora....c2.inst application 0/5 0/0 ONLINE ONLINE rac2 ora....SM1.asm application 0/5 0/0 ONLINE ONLINE rac1 ora....C1.lsnr application 0/5 0/0 ONLINE ONLINE rac1 ora.rac1.gsd application 0/5 0/0 ONLINE ONLINE rac1 ora.rac1.ons application 0/3 0/0 ONLINE ONLINE rac1 ora.rac1.vip application 0/0 0/0 ONLINE ONLINE rac1 ora....SM2.asm application 0/5 0/0 ONLINE ONLINE rac2 ora....C2.lsnr application 0/5 0/0 ONLINE ONLINE rac2 ora.rac2.gsd application 0/5 0/0 ONLINE ONLINE rac2 ora.rac2.ons application 0/3 0/0 ONLINE ONLINE rac2 ora.rac2.vip application 0/0 0/0 ONLINE ONLINE rac2 ora.rac3.gsd application 0/5 0/0 ONLINE OFFLINE ora.rac3.ons application 0/3 0/0 ONLINE OFFLINE ora.rac3.vip application 0/0 0/0 ONLINE ONLINE rac1 [root@rac1 ~]# [root@rac1 ~]# /u01/app/10g/grid/bin/crs_stat -t -v | grep --color rac3 ora.rac3.gsd application 0/5 0/0 ONLINE OFFLINE ora.rac3.ons application 0/3 0/0 ONLINE OFFLINE ora.rac3.vip application 0/0 0/0 ONLINE ONLINE rac1 [root@rac1 ~]# [root@rac1 ~]# |
停掉与节点三有关的NodeApp资源,并予以删除:
1 2 3 4 5 6 7 |
[root@rac1 ~]# /u01/app/10g/grid/bin/srvctl stop nodeapps -n rac3 [root@rac1 ~]# [root@rac1 ~]# /u01/app/10g/grid/bin/srvctl remove nodeapps -n rac3 Please confirm that you intend to remove the node-level applications on node rac3 (y/[n]) y [root@rac1 ~]# [root@rac1 ~]# /u01/app/10g/grid/bin/crs_stat -t -v | grep --color rac3 [root@rac1 ~]# |
此时,RAC的集群资源状态:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
[root@rac1 ~]# /u01/app/10g/grid/bin/crs_stat -t -v Name Type R/RA F/FT Target State Host ---------------------------------------------------------------------- ora.dblogic.db application 0/1 0/1 ONLINE ONLINE rac2 ora....c1.inst application 0/5 0/0 ONLINE ONLINE rac1 ora....c2.inst application 0/5 0/0 ONLINE ONLINE rac2 ora....SM1.asm application 0/5 0/0 ONLINE ONLINE rac1 ora....C1.lsnr application 0/5 0/0 ONLINE ONLINE rac1 ora.rac1.gsd application 0/5 0/0 ONLINE ONLINE rac1 ora.rac1.ons application 0/3 0/0 ONLINE ONLINE rac1 ora.rac1.vip application 0/0 0/0 ONLINE ONLINE rac1 ora....SM2.asm application 0/5 0/0 ONLINE ONLINE rac2 ora....C2.lsnr application 0/5 0/0 ONLINE ONLINE rac2 ora.rac2.gsd application 0/5 0/0 ONLINE ONLINE rac2 ora.rac2.ons application 0/3 0/0 ONLINE ONLINE rac2 ora.rac2.vip application 0/0 0/0 ONLINE ONLINE rac2 [root@rac1 ~]# |
五、从OCR中删除故障节点,并更新OCR
具体操作如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
[root@rac1 ~]# /u01/app/10g/grid/bin/olsnodes -n -i rac1 1 rac1-vip rac2 2 rac2-vip rac3 3 <none> [root@rac1 ~]# [root@rac1 ~]# /u01/app/10g/grid/install/rootdeletenode.sh rac3,3 CRS-0210: Could not find resource 'ora.rac3.ons'. CRS-0210: Could not find resource 'ora.rac3.vip'. CRS-0210: Could not find resource 'ora.rac3.gsd'. CRS-0210: Could not find resource ora.rac3.vip. CRS nodeapps are deleted successfully clscfg: EXISTING configuration version 3 detected. clscfg: version 3 is 10G Release 2. Successfully deleted 14 values from OCR. Key SYSTEM.css.interfaces.noderac3 marked for deletion is not there. Ignoring. Successfully deleted 5 keys from OCR. Node deletion operation successful. 'rac3,3' deleted successfully [root@rac1 ~]# [root@rac1 ~]# /u01/app/10g/grid/bin/olsnodes -n -i rac1 1 rac1-vip rac2 2 rac2-vip [root@rac1 ~]# |
六、删除故障节点的Clusterware
具体如下:
1 2 3 4 5 6 7 8 |
[oracle@rac1 ~]$ /u01/app/10g/grid/oui/bin/runInstaller -updateNodeList ORACLE_HOME=/u01/app/10g/grid/ "CLUSTER_NODES=rac1,rac2" CRS=TRUE Starting Oracle Universal Installer... No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed. The inventory pointer is located at /etc/oraInst.loc The inventory is located at /u01/app/grid/oraInventory 'UpdateNodeList' was successful. [oracle@rac1 ~]$ |
————————————————————————————
Done。