MOUNT.OCFS2 and Oracle RAC 10g error: Transport endpoint is not connected while mounting
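As the title says, this problem occurs when a newly added node tries to mount the cluster file system after being added to the OCFS2 cluster. The exact error message is: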
[root@rac3 ~]# mount.ocfs2 -o datavolume,nointr /dev/sdb1 /ocfs/
mount.ocfs2: Transport endpoint is not connected while mounting /dev/sdb1 on /ocfs/. Check 'dmesg' for more information on this error.
[root@rac3 ~]#
The error log (/var/log/messages) shows:
Sep 17 18:41:50 rac3 kernel: (3558,0):o2net_connect_expired:1585 ERROR: no connection established with node 1 after 30.0 seconds, giving up and returning errors.
Sep 17 18:41:50 rac3 kernel: (4531,0):dlm_request_join:901 ERROR: status = -107
Sep 17 18:41:50 rac3 kernel: (4531,0):dlm_try_to_join_domain:1049 ERROR: status = -107
Sep 17 18:41:50 rac3 kernel: (4531,0):dlm_join_domain:1321 ERROR: status = -107
Sep 17 18:41:50 rac3 kernel: (4531,0):dlm_register_domain:1514 ERROR: status = -107
Sep 17 18:41:50 rac3 kernel: (4531,0):ocfs2_dlm_init:2024 ERROR: status = -107
Sep 17 18:41:50 rac3 kernel: (4531,0):ocfs2_mount_volume:1133 ERROR: status = -107
Sep 17 18:41:50 rac3 kernel: ocfs2: Unmounting device (8,17) on (node 2)
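The status = -107 in these lines is a negated Linux errno: 107 is ENOTCONN, whose message is the same "Transport endpoint is not connected" text that mount.ocfs2 printed. To see which peer node the o2net layer could not reach, the node numbers can be pulled out of the log. The following is only a minimal sketch that assumes the message format shown above; the function name o2net_failed_nodes is made up for illustration and is not part of ocfs2-tools.

```shell
# Hypothetical helper (not part of ocfs2-tools): list the node numbers that
# o2net gave up connecting to, based on the o2net_connect_expired message
# format seen in the log above.
o2net_failed_nodes() {
    # $1: log file to scan, e.g. /var/log/messages
    sed -n 's/.*o2net_connect_expired.*no connection established with node \([0-9][0-9]*\) after.*/\1/p' "$1" |
        sort -un
}
```

It would be called as o2net_failed_nodes /var/log/messages. The numbers it prints are OCFS2 node numbers as reported by the kernel, not host names.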
The possible causes of this problem fall into the following areas:
1. Firewall and SELinux on every OCFS2 node:
[root@rac3 ~]# service iptables status
Firewall is stopped.
[root@rac3 ~]# sestatus
SELinux status: disabled
[root@rac3 ~]#
Connectivity:
[root@rac3 ~]# nc -z 192.168.119.21 7777
Connection to 192.168.119.21 7777 port [tcp/cbt] succeeded!
[root@rac3 ~]# nc -z 192.168.119.22 7777
Connection to 192.168.119.22 7777 port [tcp/cbt] succeeded!
[root@rac3 ~]# nc -z 192.168.119.31 7777
Connection to 192.168.119.31 7777 port [tcp/cbt] succeeded!
[root@rac3 ~]#
2. The O2CB status (the configured values) must be identical on every node:
Node 1:
[root@rac1 ~]# /etc/init.d/o2cb status
Module "configfs": Loaded
Filesystem "configfs": Mounted
Module "ocfs2_nodemanager": Loaded
Module "ocfs2_dlm": Loaded
Module "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster ocfs2: Online
Heartbeat dead threshold: 61
Network idle timeout: 30000
Network keepalive delay: 2000
Network reconnect delay: 3000
Checking O2CB heartbeat: Not active
[root@rac1 ~]#
Node 2:
[root@rac2 ~]# /etc/init.d/o2cb status
Module "configfs": Loaded
Filesystem "configfs": Mounted
Module "ocfs2_nodemanager": Loaded
Module "ocfs2_dlm": Loaded
Module "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster ocfs2: Online
Heartbeat dead threshold: 61
Network idle timeout: 30000
Network keepalive delay: 2000
Network reconnect delay: 3000
Checking O2CB heartbeat: Not active
[root@rac2 ~]#
Node 3:
[root@rac3 ~]# /etc/init.d/o2cb status
Module "configfs": Loaded
Filesystem "configfs": Mounted
Module "ocfs2_nodemanager": Loaded
Module "ocfs2_dlm": Loaded
Module "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster ocfs2: Online
Heartbeat dead threshold: 61
Network idle timeout: 30000
Network keepalive delay: 2000
Network reconnect delay: 3000
Checking O2CB heartbeat: Not active
[root@rac3 ~]#
Normally, once the points above have been taken care of, the mount succeeds.
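The connectivity check and the O2CB comparison above are easy to script when there are more than a few nodes. The sketch below is one possible way to do it, not a standard tool: the function names check_o2net_port and o2cb_timeouts and the NC_CMD override variable are hypothetical, and port 7777 is simply the port tested with nc above.

```shell
# Hypothetical helpers for the two checks above; the function names and the
# NC_CMD override are illustrative only.

# Check that port 7777 (the port tested with nc above) accepts TCP
# connections on every host given as an argument.
# Returns non-zero if any host is unreachable.
check_o2net_port() {
    port=7777
    failed=0
    for host in "$@"; do
        # NC_CMD lets a different netcat binary be substituted; defaults to nc.
        if ${NC_CMD:-nc} -z -w 3 "$host" "$port" >/dev/null 2>&1; then
            echo "$host:$port reachable"
        else
            echo "$host:$port UNREACHABLE"
            failed=1
        fi
    done
    return "$failed"
}

# Reduce `/etc/init.d/o2cb status` output (read from stdin) to the four
# timeout settings, so the result can be compared between nodes.
o2cb_timeouts() {
    grep -E 'Heartbeat dead threshold|Network (idle timeout|keepalive delay|reconnect delay)' |
        sed 's/^[[:space:]]*//'
}
```

On this cluster the first function would be called as check_o2net_port 192.168.119.21 192.168.119.22 192.168.119.31. Running /etc/init.d/o2cb status | o2cb_timeouts on each node and diffing the results shows at a glance whether any timeout value differs.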
——————————————————
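** If Oracle RAC is running on the mounted OCFS2 volume, RAC must be stopped (on node 1 and node 2) before the new node (node 3) can mount the volume; otherwise the error above still occurs.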
(Node 1)
Stop Oracle RAC 10g:
[root@rac1 ~]# /u01/app/10g/grid/bin/crs_stop -all
Attempting to stop `ora.rac1.gsd` on member `rac1`
Attempting to stop `ora.ORCL.db` on member `rac1`
Attempting to stop `ora.rac1.ons` on member `rac1`
Stop of `ora.rac1.gsd` on member `rac1` succeeded.
Stop of `ora.rac1.ons` on member `rac1` succeeded.
Stop of `ora.ORCL.db` on member `rac1` succeeded.
Attempting to stop `ora.rac3.vip` on member `rac1`
Attempting to stop `ora.rac1.LISTENER_RAC1.lsnr` on member `rac1`
Target set to OFFLINE for `ora.rac2.LISTENER_RAC2.lsnr`
Target set to OFFLINE for `ora.rac2.ASM2.asm`
Attempting to stop `ora.rac2.vip` on member `rac1`
Stop of `ora.rac3.vip` on member `rac1` succeeded.
Stop of `ora.rac2.vip` on member `rac1` succeeded.
Stop of `ora.rac1.LISTENER_RAC1.lsnr` on member `rac1` succeeded.
`ora.ORCL.ORCL1.inst` is already OFFLINE.
Attempting to stop `ora.rac1.ASM1.asm` on member `rac1`
Stop of `ora.rac1.ASM1.asm` on member `rac1` succeeded.
Attempting to stop `ora.rac1.vip` on member `rac1`
Stop of `ora.rac1.vip` on member `rac1` succeeded.
CRS-0216: Could not stop resource 'ora.ORCL.ORCL1.inst'.
[root@rac1 ~]#
[root@rac1 ~]# /u01/app/10g/grid/bin/crs_stat -t -v
Name           Type           R/RA   F/FT   Target    State     Host
----------------------------------------------------------------------
ora....L1.inst application    0/5    0/0    OFFLINE   OFFLINE
ora....L2.inst application    0/5    0/0    OFFLINE   OFFLINE
ora.ORCL.db    application    0/1    0/1    OFFLINE   OFFLINE
ora....SM1.asm application    0/5    0/0    OFFLINE   OFFLINE
ora....C1.lsnr application    0/5    0/0    OFFLINE   OFFLINE
ora.rac1.gsd   application    0/5    0/0    OFFLINE   OFFLINE
ora.rac1.ons   application    0/3    0/0    OFFLINE   OFFLINE
ora.rac1.vip   application    0/0    0/0    OFFLINE   OFFLINE
ora....SM2.asm application    0/5    0/0    OFFLINE   OFFLINE
ora....C2.lsnr application    0/5    0/0    OFFLINE   OFFLINE
ora.rac2.gsd   application    0/5    0/0    ONLINE    OFFLINE
ora.rac2.ons   application    0/3    0/0    ONLINE    OFFLINE
ora.rac2.vip   application    0/0    0/0    OFFLINE   OFFLINE
ora.rac3.gsd   application    0/5    0/0    ONLINE    OFFLINE
ora.rac3.ons   application    0/3    0/0    ONLINE    OFFLINE
ora.rac3.vip   application    0/0    0/0    OFFLINE   OFFLINE
[root@rac1 ~]#
[root@rac1 ~]# ps -ef | grep d.bin
root 4478 1 0 18:42 ? 00:00:00 /bin/su -l oracle -c sh -c 'ulimit -c unlimited; cd /u01/app/10g/grid/log/rac1/evmd; exec /u01/app/10g/grid/bin/evmd '
root 4557 1 0 18:42 ? 00:00:00 /u01/app/10g/grid/bin/crsd.bin reboot
oracle 6310 4478 0 19:00 ? 00:00:00 /u01/app/10g/grid/bin/evmd.bin
root 6397 6284 0 19:00 ? 00:00:00 /bin/su -l oracle -c /bin/sh -c 'ulimit -c unlimited; cd /u01/app/10g/grid/log/rac1/cssd; /u01/app/10g/grid/bin/ocssd || exit $?'
oracle 6398 6397 0 19:00 ? 00:00:00 /bin/sh -c ulimit -c unlimited; cd /u01/app/10g/grid/log/rac1/cssd; /u01/app/10g/grid/bin/ocssd || exit $?
oracle 6421 6398 0 19:00 ? 00:00:00 /u01/app/10g/grid/bin/ocssd.bin
oracle 6664 6310 0 19:00 ? 00:00:00 /u01/app/10g/grid/bin/evmlogger.bin -o /u01/app/10g/grid/evm/log/evmlogger.info -l /u01/app/10g/grid/evm/log/evmlogger.log
root 14795 4108 0 19:05 pts/1 00:00:00 grep d.bin
[root@rac1 ~]#
[root@rac1 ~]# /etc/init.d/init.crs stop
Shutting down Oracle Cluster Ready Services (CRS):
Stopping resources.
Successfully stopped CRS resources
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
Shutdown has begun. The daemons should exit soon.
[root@rac1 ~]#
[root@rac1 ~]# ps -ef | grep d.bin
root 15325 4108 0 19:05 pts/1 00:00:00 grep d.bin
[root@rac1 ~]#
[root@rac1 ~]#
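The ps -ef | grep d.bin check above always matches the grep process itself, so it has to be read by eye. A script that waits for CRS to go down needs a stricter test. The following is a small sketch of one; the function name crs_daemons_running is hypothetical, and the daemon names are the ones visible in the ps output above.

```shell
# Hypothetical helper: succeed (exit 0) if `ps -ef` output on stdin still
# lists a Clusterware daemon (crsd.bin, ocssd.bin or evmd.bin), and print
# the matching lines. Lines belonging to a grep process are filtered out.
crs_daemons_running() {
    grep -E '(crsd|ocssd|evmd)\.bin' | grep -v 'grep'
}
```

A typical use would be: if ps -ef | crs_daemons_running; then echo "CRS is still running"; fi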
If Oracle RAC 10g is not stopped, unmounting the OCFS2 mount point fails with a "device is busy" error:
[root@rac1 ~]# umount /ocfs
umount: /ocfs: device is busy
umount: /ocfs: device is busy
[root@rac1 ~]#
While the device is busy, the O2CB service cannot be forced to stop either:
[root@rac2 ~]# /etc/init.d/o2cb unload
Stopping O2CB cluster ocfs2: Failed
Unable to stop cluster as heartbeat region still active
[root@rac2 ~]#
[root@rac2 ~]# /etc/init.d/o2cb stop
Stopping O2CB cluster ocfs2: Failed
Unable to stop cluster as heartbeat region still active
[root@rac2 ~]#
(Node 1 and Node 2)
Once RAC has been stopped successfully, the OCFS2 mount point can be unmounted:
[root@rac1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                       16G  6.6G  8.2G  45% /
/dev/sda1              99M   13M   82M  13% /boot
tmpfs                1005M     0 1005M   0% /dev/shm
/dev/hdc              3.3G  3.3G     0 100% /iso
/dev/sdc1             1.9G  289M  1.6G  16% /ocfs
[root@rac1 ~]#
[root@rac1 ~]# umount /ocfs
[root@rac1 ~]#
(Node 3)
Try mounting OCFS2 again:
[root@rac3 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                       16G  3.6G   12G  25% /
/dev/sda1              99M   13M   82M  13% /boot
tmpfs                1005M     0 1005M   0% /dev/shm
/dev/hdc              3.3G  3.3G     0 100% /iso
[root@rac3 ~]# mount -t ocfs2 -o datavolume,nointr /dev/sdb1 /ocfs/
[root@rac3 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                       16G  3.6G   12G  25% /
/dev/sda1              99M   13M   82M  13% /boot
tmpfs                1005M     0 1005M   0% /dev/shm
/dev/hdc              3.3G  3.3G     0 100% /iso
/dev/sdb1             1.9G  281M  1.6G  15% /ocfs
[root@rac3 ~]#
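Comparing two df -h listings works interactively, but a script would more likely look at the kernel's mount table. Here is a minimal sketch, assuming a Linux /proc/mounts; the function name is_ocfs2_mounted and its optional second argument are hypothetical and only there to make the check reusable.

```shell
# Hypothetical helper: succeed if the given mount point appears with
# filesystem type ocfs2 in a mounts table (default: /proc/mounts).
is_ocfs2_mounted() {
    mp=$1
    # Drop one trailing slash so that /ocfs/ and /ocfs compare equal,
    # but leave the root directory "/" alone.
    [ "$mp" != "/" ] && mp=${mp%/}
    # Field 2 is the mount point, field 3 the filesystem type.
    awk -v mp="$mp" '$2 == mp && $3 == "ocfs2" { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}
```

For example: if is_ocfs2_mounted /ocfs/; then echo "mounted"; else echo "not mounted"; fi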
The mount succeeded.
Log of the successful mount (/var/log/messages):
Sep 17 19:12:23 rac3 kernel: ocfs2_dlm: Nodes in domain ("4D31A0A86089432A82FA73C4E94C724B"): 2
Sep 17 19:12:23 rac3 kernel: kjournald starting. Commit interval 5 seconds
Sep 17 19:12:23 rac3 kernel: ocfs2: Mounting device (8,17) on (node 2, slot 0)
Sep 17 19:12:33 rac3 kernel: (3558,0):o2net_connect_expired:1585 ERROR: no connection established with node 2 after 30.0 seconds, giving up and returning errors.
(Note that node 3, where the problem occurred, had not yet run root.sh to write its own information into the OCR.)
——————————————————————————————————————————
End.