Very recently I had to reconfigure a customer's RAC private interconnect from bonding to HAIP, to get the benefit of both NICs.
So I would like to recap here the steps you would follow if you need to do the same.
In this example I'll switch from a single-NIC interconnect (eth1) rather than from a bond configuration, so if you are familiar with the RAC Attack! environment you can try to put everything in place on your own.
First, you need to plan the new network configuration in advance, keeping in mind that there are a couple of important restrictions:
- Your interconnect interface naming must be uniform on all nodes in the cluster. The interconnect uses the interface name in its configuration, and it doesn't support different names on different hosts.
- You must place the different private interconnect interfaces in different subnets (see MOS Note 1481481.1 – 11gR2 CSS Terminates/Node Eviction After Unplugging one Network Cable in Redundant Interconnect Environment if you need an explanation). A quick sanity check of both rules is sketched right below.
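Something like this, assuming user equivalence (passwordless SSH) between the nodes is already in place, lets you compare interface names and subnets across the nodes at a glance:

# run from any node: the interface names must match on every host,
# and each interconnect interface must sit in its own subnet
for h in collabn1 collabn2; do
    echo "== $h =="
    ssh $h "ip -o -4 addr show"
done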
Implementation
The RAC Attack book uses one interface per node for the interconnect (eth1, on network 172.16.100.0).
To make things a little more complex, we will not use eth1 in the new HAIP configuration, so we will also test the deletion of the old interface.
What you need to do is add two new interfaces (host-only, in your VirtualBox) and configure them as eth2 and eth3, e.g. in networks 172.16.101.0 and 172.16.102.0.
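On Oracle Linux, each interface is driven by a script under /etc/sysconfig/network-scripts. A minimal sketch of what ifcfg-eth2 could look like on collabn1 (the HWADDR and IP come from the output below; adapt them to your own VMs):

# /etc/sysconfig/network-scripts/ifcfg-eth2 -- hypothetical example for collabn1
DEVICE=eth2
HWADDR=08:00:27:32:76:DD
BOOTPROTO=none
IPADDR=172.16.101.51
NETMASK=255.255.255.0
ONBOOT=yes
NM_CONTROLLED=no

Repeat for eth3 with its own address, then bring the interfaces up (ifup eth2; ifup eth3). On collabn1 they look like this: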
eth2      Link encap:Ethernet  HWaddr 08:00:27:32:76:DD
          inet addr:172.16.101.51  Bcast:172.16.101.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe32:76dd/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:29 errors:0 dropped:0 overruns:0 frame:0
          TX packets:25 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2044 (1.9 KiB)  TX bytes:1714 (1.6 KiB)

eth3      Link encap:Ethernet  HWaddr 08:00:27:2E:05:4B
          inet addr:172.16.102.61  Bcast:172.16.102.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe2e:54b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:19 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1140 (1.1 KiB)  TX bytes:720 (720.0 b)
modify /var/named/racattack in order to use the new addresses (RAC doesn’t care about logical names, it’s just for our convenience):
collabn1        A       192.168.78.51
collabn1-vip    A       192.168.78.61
collabn1-priv   A       172.16.100.51
collabn1-priv1  A       172.16.101.51
collabn1-priv2  A       172.16.102.61
collabn2        A       192.168.78.52
collabn2-vip    A       192.168.78.62
collabn2-priv   A       172.16.100.52
collabn2-priv1  A       172.16.101.52
collabn2-priv2  A       172.16.102.62
also add the reverse lookup entries in the in-addr.arpa zone:
51.101.16.172   PTR     collabn1-priv1.racattack.
61.102.16.172   PTR     collabn1-priv2.racattack.
52.101.16.172   PTR     collabn2-priv1.racattack.
62.102.16.172   PTR     collabn2-priv2.racattack.
restart named on the first node and check that both nodes can ping all the names correctly:
[root@collabn1 named]# ping collabn2-priv1
PING collabn2-priv1.racattack (172.16.101.52) 56(84) bytes of data.
64 bytes from 172.16.101.52: icmp_seq=1 ttl=64 time=1.27 ms
64 bytes from 172.16.101.52: icmp_seq=2 ttl=64 time=0.396 ms
^C
--- collabn2-priv1.racattack ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1293ms
rtt min/avg/max/mdev = 0.396/0.835/1.275/0.440 ms

[root@collabn1 named]# ping collabn2-priv2
PING collabn2-priv2.racattack (172.16.102.62) 56(84) bytes of data.
64 bytes from 172.16.102.62: icmp_seq=1 ttl=64 time=0.924 ms
64 bytes from 172.16.102.62: icmp_seq=2 ttl=64 time=0.251 ms
^C
--- collabn2-priv2.racattack ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1480ms
rtt min/avg/max/mdev = 0.251/0.587/0.924/0.337 ms

[root@collabn1 named]# ping collabn1-priv2
PING collabn1-priv2.racattack (172.16.102.61) 56(84) bytes of data.
64 bytes from 172.16.102.61: icmp_seq=1 ttl=64 time=0.019 ms
64 bytes from 172.16.102.61: icmp_seq=2 ttl=64 time=0.032 ms
^C
--- collabn1-priv2.racattack ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1240ms
rtt min/avg/max/mdev = 0.019/0.025/0.032/0.008 ms

[root@collabn1 named]# ping collabn1-priv1
PING collabn1-priv1.racattack (172.16.101.51) 56(84) bytes of data.
64 bytes from 172.16.101.51: icmp_seq=1 ttl=64 time=0.017 ms
64 bytes from 172.16.101.51: icmp_seq=2 ttl=64 time=0.060 ms
^C
--- collabn1-priv1.racattack ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1224ms
rtt min/avg/max/mdev = 0.017/0.038/0.060/0.022 ms
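If you prefer a single-shot check of both zones, dig can query forward and reverse resolution at once (a sketch, assuming dig from bind-utils is installed):

[root@collabn1 named]# dig +short collabn1-priv2.racattack
172.16.102.61
[root@collabn1 named]# dig +short -x 172.16.102.61
collabn1-priv2.racattack.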
check the nodes that compose the cluster:
[root@collabn1 network-scripts]# olsnodes -s
collabn1        Active
collabn2        Active
on all nodes, make a copy of the gpnp profile.xml (just in case: the oifcfg tool actually makes a backup automatically):
$ cd $GRID_HOME/gpnp/`hostname`/profiles/peer/
$ cp -p profile.xml profile.xml.bk
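If you're curious, the interconnect definition lives in exactly this profile; gpnptool can dump it (output heavily abridged here, and the XML layout may vary slightly between versions):

$ $GRID_HOME/bin/gpnptool get
...
<gpnp:Network-Profile><gpnp:HostNetwork id="gen" HostName="*">
  <gpnp:Network id="net1" IP="192.168.78.0" Adapter="eth0" Use="public"/>
  <gpnp:Network id="net2" IP="172.16.100.0" Adapter="eth1" Use="cluster_interconnect"/>
</gpnp:HostNetwork></gpnp:Network-Profile>
...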
List the available networks:
[root@collabn1 bin]# ./oifcfg iflist -p -n
eth0  192.168.78.0  PRIVATE  255.255.255.0
eth1  172.16.100.0  PRIVATE  255.255.255.0
eth1  169.254.0.0   UNKNOWN  255.255.0.0
eth2  172.16.101.0  PRIVATE  255.255.255.0
eth3  172.16.102.0  PRIVATE  255.255.255.0
Get the current IP configuration for the interconnect:
[root@collabn1 bin]# ./oifcfg getif
eth0  192.168.78.0  global  public
eth1  172.16.100.0  global  cluster_interconnect
On one node only, set the new interconnect interfaces:
[root@collabn1 network-scripts]# oifcfg setif -global eth2/172.16.101.0:cluster_interconnect
[root@collabn1 network-scripts]# oifcfg setif -global eth3/172.16.102.0:cluster_interconnect
[root@collabn1 network-scripts]# oifcfg getif
eth0  192.168.78.0  global  public
eth1  172.16.100.0  global  cluster_interconnect
eth2  172.16.101.0  global  cluster_interconnect
eth3  172.16.102.0  global  cluster_interconnect
check that the other nodes have received the new configuration:
[root@collabn2 bin]# ./oifcfg getif
eth0  192.168.78.0  global  public
eth1  172.16.100.0  global  cluster_interconnect
eth2  172.16.101.0  global  cluster_interconnect
eth3  172.16.102.0  global  cluster_interconnect
Before deleting the old interface, it is sensible to stop your cluster resources first (in some cases, one of the nodes may be evicted). In any case, the cluster must be restarted completely for the new interfaces to come into use.
Note: having three interfaces in a HAIP interconnect works perfectly well; HAIP supports from two to four interfaces. I'm showing how to delete eth1 just for information!! 🙂
[root@collabn1 network-scripts]# oifcfg delif -global eth1/172.16.100.0
[root@collabn1 network-scripts]# oifcfg getif
eth0  192.168.78.0  global  public
eth2  172.16.101.0  global  cluster_interconnect
eth3  172.16.102.0  global  cluster_interconnect
on all nodes, shut down the CRS:
[root@collabn1 network-scripts]# crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'collabn1'
...
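Before touching the network, you can verify that the stack is really down on each node; this is the message you should expect once everything is stopped:

[root@collabn1 network-scripts]# crsctl check crs
CRS-4639: Could not contact Oracle High Availability Services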
Now you can disable the old interface:
[root@collabn1 network-scripts]# ifdown eth1
and set the parameter ONBOOT=no inside the configuration file of the eth1 interface.
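A hypothetical one-liner to flip the flag on both nodes at once, assuming passwordless SSH and the standard location of the ifcfg files:

# set ONBOOT=no for eth1 on both nodes
for h in collabn1 collabn2; do
    ssh $h "sed -i 's/^ONBOOT=yes/ONBOOT=no/' /etc/sysconfig/network-scripts/ifcfg-eth1"
done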
Start the cluster again:
[root@collabn1 network-scripts]# crsctl start crs
And check that the resources are up & running:
# crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       collabn1
               ONLINE  ONLINE       collabn2
ora.LISTENER.lsnr
               ONLINE  ONLINE       collabn1
               ONLINE  ONLINE       collabn2
ora.asm
               ONLINE  ONLINE       collabn1                 Started
               ONLINE  ONLINE       collabn2                 Started
ora.gsd
               OFFLINE OFFLINE      collabn1
               OFFLINE OFFLINE      collabn2
ora.net1.network
               ONLINE  ONLINE       collabn1
               ONLINE  ONLINE       collabn2
ora.ons
               ONLINE  ONLINE       collabn1
               ONLINE  ONLINE       collabn2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       collabn2
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       collabn1
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       collabn1
ora.collabn1.vip
      1        ONLINE  ONLINE       collabn1
ora.collabn2.vip
      1        ONLINE  ONLINE       collabn2
ora.cvu
      1        ONLINE  ONLINE       collabn1
ora.oc4j
      1        ONLINE  ONLINE       collabn1
ora.orcl.db
      1        ONLINE  ONLINE       collabn1                 Open
      2        ONLINE  ONLINE       collabn2                 Open
ora.scan1.vip
      1        ONLINE  ONLINE       collabn2
ora.scan2.vip
      1        ONLINE  ONLINE       collabn1
ora.scan3.vip
      1        ONLINE  ONLINE       collabn1
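You can also check the HAIP resource itself, which runs in the lower stack of the clusterware (output abridged; this is what a healthy node should show):

# crsctl stat res ora.cluster_interconnect.haip -init
NAME=ora.cluster_interconnect.haip
TYPE=ora.haip.type
TARGET=ONLINE
STATE=ONLINE on collabn1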
Testing the high availability
Disconnect the cable from one of the two interfaces (virtually, if you're in VirtualBox 🙂 )
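From the host, a hypothetical way to pull the virtual cable, assuming eth2 is the third network adapter of the collabn1 VM (adapt the VM name and adapter number to your setup):

$ VBoxManage controlvm collabn1 setlinkstate3 off

(setlinkstate3 on plugs it back in.)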
Pay attention to the NO-CARRIER status (on eth2 in this example):
# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:07:33:94 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
    link/ether 08:00:27:7f:b4:88 brd ff:ff:ff:ff:ff:ff
4: eth2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
    link/ether 08:00:27:51:1d:78 brd ff:ff:ff:ff:ff:ff
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:39:86:f2 brd ff:ff:ff:ff:ff:ff
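ethtool reports the same loss of link, if you prefer a per-interface check:

# ethtool eth2 | grep 'Link detected'
        Link detected: no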
check that the CRS is still up & running:
# crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       collabn1
               ONLINE  ONLINE       collabn2
ora.LISTENER.lsnr
               ONLINE  ONLINE       collabn1
               ONLINE  ONLINE       collabn2
ora.asm
               ONLINE  ONLINE       collabn1                 Started
               ONLINE  ONLINE       collabn2                 Started
ora.gsd
               OFFLINE OFFLINE      collabn1
               OFFLINE OFFLINE      collabn2
ora.net1.network
               ONLINE  ONLINE       collabn1
               ONLINE  ONLINE       collabn2
ora.ons
               ONLINE  ONLINE       collabn1
               ONLINE  ONLINE       collabn2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       collabn2
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       collabn1
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       collabn1
ora.collabn1.vip
      1        ONLINE  ONLINE       collabn1
ora.collabn2.vip
      1        ONLINE  ONLINE       collabn2
ora.cvu
      1        ONLINE  ONLINE       collabn1
ora.oc4j
      1        ONLINE  ONLINE       collabn1
ora.orcl.db
      1        ONLINE  ONLINE       collabn1                 Open
      2        ONLINE  ONLINE       collabn2                 Open
ora.scan1.vip
      1        ONLINE  ONLINE       collabn2
ora.scan2.vip
      1        ONLINE  ONLINE       collabn1
ora.scan3.vip
      1        ONLINE  ONLINE       collabn1
The virtual interface eth2:1 has failed over to the second interface as eth3:2:
eth3:1    Link encap:Ethernet  HWaddr 08:00:27:39:86:F2
          inet addr:169.254.185.134  Bcast:169.254.255.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth3:2    Link encap:Ethernet  HWaddr 08:00:27:39:86:F2
          inet addr:169.254.104.52  Bcast:169.254.127.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
After the cable is reconnected, the virtual interface is back on eth2:
eth2:1    Link encap:Ethernet  HWaddr 08:00:27:51:1D:78
          inet addr:169.254.104.52  Bcast:169.254.127.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
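The same link-local HAIP addresses can be cross-checked from inside the database: v$cluster_interconnects shows which ones the instance is actually using. A sketch, assuming your environment points to the local orcl instance (the addresses shown are the ones from this example):

$ sqlplus -s / as sysdba <<'EOF'
select name, ip_address from v$cluster_interconnects;
EOF

NAME            IP_ADDRESS
--------------- ----------------
eth2:1          169.254.104.52
eth3:1          169.254.185.134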
Further information
For this post I've used RAC version 11.2, but RAC 12c uses the very same procedure.
You can discover more here about HAIP:
http://docs.oracle.com/cd/E11882_01/server.112/e10803/config_cw.htm#HABPT5279
And here about how to set it up (besides this post!):
https://docs.oracle.com/cd/E11882_01/rac.112/e41959/admin.htm#CWADD90980
https://docs.oracle.com/cd/E11882_01/rac.112/e41959/oifcfg.htm#BCGGEFEI
Cheers
—
Ludo