Oracle DB 10g RAC: Fixing a VIPCA Misconfiguration
Today, while working on a 10g RAC system, I ran into the following problem:
[root@xymis2 ~]#
ora.xymis1.gsd  application    0/5    0/0    ONLINE    OFFLINE
ora.xymis1.ons  application    0/3    0/0    ONLINE    OFFLINE
ora.xymis1.vip  application    0/0    0/0    ONLINE    ONLINE    xymis2
ora.xymis2.gsd  application    0/5    0/0    ONLINE    ONLINE    xymis2
ora.xymis2.ons  application    0/3    0/0    ONLINE    ONLINE    xymis2
ora.xymis2.vip  application    0/0    0/0    ONLINE    OFFLINE
[root@xymis2 ~]#
As shown above, the RAC cluster resources are not in a healthy state.
Moreover, CRS could only be started ("crsctl start crs") on node 2; starting CRS on node 1 failed because node 1's private NIC would not come up (its IP address was already in use elsewhere).
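A quick way to see which interface is actually holding the conflicting address is to walk the interface lists on both nodes. The snippet below is only a diagnostic sketch, assuming the private IPs are 10.0.0.1/10.0.0.2 as in this environment and that arping (from iputils) is installed:

# Run on each node: list every local interface that carries one of the private IPs
/sbin/ifconfig -a | grep -B 1 "inet addr:10\.0\.0\."

# From a node whose private NIC is up: duplicate-address detection for node 1's
# private IP; a non-zero exit status means another host already answers for it
arping -c 3 -D -I eth1 10.0.0.1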
The error log on node 2 contains the following messages:
Log file: /u01/app/10g/grid/log/xymis2/racg/ora.xymis2.vip.log
2015-06-09 17:53:00.352: [ RACG][780690112] [7838][780690112][ora.xymis2.vip]: IP:10.0.0.2 is already up in the network (host=xymis2)
IP:10.0.0.2 is already up in the network (host=xymis2)
IP:10.0.0.2 is already up in the network (host=xymis2)
IP:10.0.0.2 is already up in the network (host=xymis2)
2015-06-09 17:53:00.352: [ RACG][780690112] [7838][780690112][ora.xymis2.vip]: IP:10.0.0.2 is already up in the network (host=xymis2)
IP:10.0.0.2 is already up in the network (host=xymis2)
IP:10.0.0.2 is already up in the network (host=xymis2)
IP:10.0.0.2 is already up in the network (host=xymis2)
2015-06-09 17:53:00.352: [ RACG][780690112] [7838][780690112][ora.xymis2.vip]: IP:10.0.0.2 is already up in the network (host=xymis2)
IP:10.0.0.2 is already up in the network (host=xymis2)
IP:10.0.0.2 is already up in the network (host=xymis2)
IP:10.0.0.2 is already up in the network (host=xymis2)
2015-06-09 17:53:00.352: [ RACG][780690112] [7838][780690112][ora.xymis2.vip]: IP:10.0.0.2 is already up in the network (host=xymis2)
IP:10.0.0.2 is already up in the network (host=xymis2)
IP:10.0.0.2 is already up in the network (host=xymis2)
IP:10.0.0.2 is already up in the network (host=xymis2)
2015-06-09 17:53:00.352: [ RACG][780690112] [7838][780690112][ora.xymis2.vip]: IP:10.0.0.2 is already up in the network (host=xymis2)
IP:10.0.0.2 is already up in the network (host=xymis2)
2015-06-09 17:53:00.352: [ RACG][780690112] [7838][780690112][ora.xymis2.vip]: IP:10.0.0.2 is already up in the n
2015-06-09 17:53:02.355: [ RACG][780690112] [7838][780690112][ora.xymis2.vip]: etwork (host=xymis2)
IP:10.0.0.2 is already up in the network (host=xymis2)
IP:10.0.0.2 is already up in the network (host=xymis2)
2015-06-09 17:53:02.355: [ RACG][780690112] [7838][780690112][ora.xymis2.vip]: clsrcexecut: env ORACLE_CONFIG_HOME=/u01/app/10g/grid
2015-06-09 17:53:02.355: [ RACG][780690112] [7838][780690112][ora.xymis2.vip]: clsrcexecut: cmd = /u01/app/10g/grid/bin/racgeut -e _USR_ORA_DEBUG=0 54 /u01/app/10g/grid/bin/racgvip start xymis2
2015-06-09 17:53:02.355: [ RACG][780690112] [7838][780690112][ora.xymis2.vip]: clsrcexecut: rc = 1, time = 22.010s
2015-06-09 17:53:02.559: [ RACG][780690112] [7838][780690112][ora.xymis2.vip]: IP:10.0.0.2 is already up in the network (host=xymis2)
2015-06-09 17:53:02.559: [ RACG][780690112] [7838][780690112][ora.xymis2.vip]: clsrcexecut: env ORACLE_CONFIG_HOME=/u01/app/10g/grid
2015-06-09 17:53:02.559: [ RACG][780690112] [7838][780690112][ora.xymis2.vip]: clsrcexecut: cmd = /u01/app/10g/grid/bin/racgeut -e _USR_ORA_DEBUG=0 54 /u01/app/10g/grid/bin/racgvip check xymis2
2015-06-09 17:53:02.559: [ RACG][780690112] [7838][780690112][ora.xymis2.vip]: clsrcexecut: rc = 1, time = 0.210s
2015-06-09 17:53:02.559: [ RACG][780690112] [7838][780690112][ora.xymis2.vip]: end for resource = ora.xymis2.vip, action = start, status = 1, time = 22.260s
2015-06-09 17:53:23.059: [ RACG][1485247168] [8419][1485247168][ora.xymis2.vip]: IP:10.0.0.2 is already up in the network (host=xymis2)
IP:10.0.0.2 is already up in the network (host=xymis2)
IP:10.0.0.2 is already up in the network (host=xymis2)
IP:10.0.0.2 is already up in the network (host=xymis2)
————————————————
This problem was caused by a misconfigured VIP.
In my environment, the IP address allocation scheme is as follows:
[root@xymis2 ~]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.

# Localhost
127.0.0.1       localhost

# Public
170.0.2.15      xymis1
170.0.2.16      xymis2

# Private
10.0.0.1        xymis1-priv
10.0.0.2        xymis2-priv

# VIP
170.0.2.17      xymis1-vip
170.0.2.18      xymis2-vip
[root@xymis2 ~]#
At the time the problem occurred, the IP status of node 2 (the node that could start CRS) was:
[root@xymis2 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr A0:D3:C1:F2:D0:CC
          inet addr:170.0.2.16  Bcast:170.0.2.255  Mask:255.255.255.0
          inet6 addr: fe80::a2d3:c1ff:fef2:d0cc/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2720690 errors:0 dropped:0 overruns:0 frame:0
          TX packets:287158 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:4041509021 (3.7 GiB)  TX bytes:23654520 (22.5 MiB)
          Interrupt:91 Memory:f6bf0000-f6c00000

eth0:1    Link encap:Ethernet  HWaddr A0:D3:C1:F2:D0:CC
          inet addr:10.0.0.1  Bcast:10.0.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:91 Memory:f6bf0000-f6c00000

eth1      Link encap:Ethernet  HWaddr A0:D3:C1:F2:D0:CD
          inet addr:10.0.0.2  Bcast:10.0.0.255  Mask:255.255.255.0
          inet6 addr: fe80::a2d3:c1ff:fef2:d0cd/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:8917 errors:0 dropped:0 overruns:0 frame:0
          TX packets:9634 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3290794 (3.1 MiB)  TX bytes:5150971 (4.9 MiB)
          Interrupt:99 Memory:f6bc0000-f6bd0000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:95671 errors:0 dropped:0 overruns:0 frame:0
          TX packets:95671 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:5248180 (5.0 MiB)  TX bytes:5248180 (5.0 MiB)
[root@xymis2 ~]#
The conflict on node 1 arose because the VIP resource running on node 2 had incorrectly brought up node 1's private IP (visible above as eth0:1 with 10.0.0.1).
During VIPCA configuration, the wizard asks for the VIP alias (IP alias name) of each node.
If the alias supplied there is wrong, the IP address the VIP ultimately resolves to will also be wrong.
What should be entered is the "XXXX-vip" alias (xymis1-vip / xymis2-vip in this environment).
In the error shown at the beginning of this post, VIPCA had mistakenly been given the address of the RAC private interconnect NIC. Since that address was already up, the VIP resource log keeps reporting:
"IP:10.0.0.2 is already up in the network (host=xymis2)"
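Before changing anything, it is worth confirming what the VIP resources are actually configured with. A small sketch (output not reproduced here; on a misconfigured system it would show the private 10.0.0.x address instead of a 170.0.2.x VIP):

# Show the nodeapps/VIP definition for node 2 (-a prints the VIP configuration)
/u01/app/10g/grid/bin/srvctl config nodeapps -n xymis2 -a

# Alternatively, dump the VIP resource profile and look at its USR_ORA_VIP attribute
/u01/app/10g/grid/bin/crs_stat -p ora.xymis2.vip | grep -i usr_ora_vip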
Solution:
Modify the VIP settings of the RAC cluster resources, as shown below.
First, stop the nodeapps on node 1 and node 2.
Since the clusterware can currently only be reached from node 2, all commands are run on node 2.
[root@xymis2 ~]# /u01/app/10g/grid/bin/srvctl stop nodeapps -n xymis1
[root@xymis2 ~]# /u01/app/10g/grid/bin/srvctl stop nodeapps -n xymis2
[root@xymis2 ~]#
Status after the nodeapps on all nodes have been stopped:
[root@xymis2 ~]# /u01/app/10g/grid/bin/crs_stat -t -v
Name           Type           R/RA   F/FT   Target    State     Host
----------------------------------------------------------------------
ora.xymis1.gsd application    0/5    0/0    OFFLINE   OFFLINE
ora.xymis1.ons application    0/3    0/0    OFFLINE   OFFLINE
ora.xymis1.vip application    0/0    0/0    OFFLINE   OFFLINE
ora.xymis2.gsd application    0/5    0/0    OFFLINE   OFFLINE
ora.xymis2.ons application    0/3    0/0    OFFLINE   OFFLINE
ora.xymis2.vip application    0/0    0/0    OFFLINE   OFFLINE
[root@xymis2 ~]#
Once the nodeapps on all nodes are down, node 2 releases node 1's private IP.
On node 1, bring the private NIC back up with "ifup eth1".
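Before moving on, it helps to confirm that the conflict is really gone. A small sketch, with interface names and host aliases as in this environment:

# On node 1: bring the private NIC up and confirm it now owns 10.0.0.1
ifup eth1
/sbin/ifconfig eth1 | grep "inet addr"

# On node 2: 10.0.0.1 should no longer appear on any local interface,
# and node 1 should answer over the interconnect
/sbin/ifconfig -a | grep "inet addr:10\.0\.0\.1" || echo "10.0.0.1 not held locally"
ping -c 2 xymis1-priv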
Then, modify the VIP settings according to the system's IP allocation scheme:
[root@xymis2 ~]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.

# Localhost
127.0.0.1       localhost

# Public
170.0.2.15      xymis1
170.0.2.16      xymis2

# Private
10.0.0.1        xymis1-priv
10.0.0.2        xymis2-priv

# VIP
170.0.2.17      xymis1-vip
170.0.2.18      xymis2-vip
[root@xymis2 ~]#
[root@xymis2 ~]# /u01/app/10g/grid/bin/srvctl modify nodeapps -n xymis1 -A 170.0.2.17/255.255.255.0/eth0
[root@xymis2 ~]# /u01/app/10g/grid/bin/srvctl modify nodeapps -n xymis2 -A 170.0.2.18/255.255.255.0/eth0
[root@xymis2 ~]#
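After the modify commands, the new definitions can be double-checked before the nodeapps are started again; a sketch, with output omitted (it should now list the 170.0.2.17/170.0.2.18 addresses on eth0):

# Verify the corrected VIP definitions for both nodes
/u01/app/10g/grid/bin/srvctl config nodeapps -n xymis1 -a
/u01/app/10g/grid/bin/srvctl config nodeapps -n xymis2 -a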
On node 1, the CRS stack had not come up earlier because its private NIC was down.
Before continuing, start CRS on node 1: /u01/app/10g/grid/bin/crsctl start crs
Start the nodeapps on node 1 and node 2:
[root@xymis2 ~]# /u01/app/10g/grid/bin/srvctl start nodeapps -n xymis1
[root@xymis2 ~]# /u01/app/10g/grid/bin/srvctl start nodeapps -n xymis2
Finally, check the current state of the RAC clusterware:
[root@xymis2 ~]# /u01/app/10g/grid/bin/crs_stat -t -v
Name           Type           R/RA   F/FT   Target    State     Host
----------------------------------------------------------------------
ora.xymis1.gsd application    0/5    0/0    ONLINE    ONLINE    xymis1
ora.xymis1.ons application    0/3    0/0    ONLINE    ONLINE    xymis1
ora.xymis1.vip application    0/0    0/0    ONLINE    ONLINE    xymis1
ora.xymis2.gsd application    0/5    0/0    ONLINE    ONLINE    xymis2
ora.xymis2.ons application    0/3    0/0    ONLINE    ONLINE    xymis2
ora.xymis2.vip application    0/0    0/0    ONLINE    ONLINE    xymis2
[root@xymis2 ~]#
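As a final sanity check (not part of the transcript above), the VIP aliases should now answer on the public network and be plumbed as eth0 sub-interfaces on their home nodes:

# The VIPs should respond on the public LAN ...
ping -c 2 xymis1-vip
ping -c 2 xymis2-vip

# ... and each should appear as an eth0:N sub-interface on its home node
/sbin/ifconfig | grep -B 1 "inet addr:170\.0\.2\.1[78]"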
——————————————————————————————————
Done.
Lesson learned: do not run VIPCA while the nodeapps are in an abnormal state; it only makes the situation more complicated. On the bright side, my skills have improved and my troubleshooting is more methodical now!