I have a customer that needs to migrate its Oracle RAC cluster from SuSE to OEL.
I know, I know, there is a paper from Dell and Oracle named:
How Dell Migrated from SUSE Linux to Oracle Linux
that explains how Dell migrated its many RAC clusters from SuSE to OEL. The problem is that they used a different strategy:
– backup the configuration of the nodes
– then, for each node, one at a time:
– stop the node
– reinstall the OS
– restore the configuration and the Oracle binaries
– relink
– restart
What I want to achieve instead is:
– add one OEL node to the SuSE cluster as new node
– remove one SuSE node from the now-mixed cluster
– install/restore/relink the RDBMS software (RAC) on the new node
– move the RAC instances to the new node (taking care NOT to run more than the licensed number of nodes/CPUs at any time; see the sketch after this list)
– repeat (for the remaining nodes)
because the customer will also migrate to new hardware.
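For steps 3 and 4, here is a minimal sketch of what I plan to run, assuming an admin-managed database named ORCL and hypothetical node/instance names (oldnode/newnode, ORCL1/ORCL2); adapt names, paths, undo tablespaces and init parameters to the real environment:

# Extend the RDBMS home to the new node (run from an existing node, as oracle):
$ORACLE_HOME/addnode/addnode.sh -silent "CLUSTER_NEW_NODES={newnode}"

# Register a new instance of the database on the new node:
srvctl add instance -d ORCL -i ORCL2 -n newnode

# Stop the instance on the old node BEFORE starting the new one,
# so the licensed node/CPU count is never exceeded:
srvctl stop instance -d ORCL -i ORCL1
srvctl start instance -d ORCL -i ORCL2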
In order to test this migration path, I’ve set up a SINGLE NODE cluster (if it works for one node, it will for two or more).
oracle@sles01:~> crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       sles01                   STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       sles01                   STABLE
ora.asm
               ONLINE  ONLINE       sles01                   Started,STABLE
ora.net1.network
               ONLINE  ONLINE       sles01                   STABLE
ora.ons
               ONLINE  ONLINE       sles01                   STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       sles01                   STABLE
ora.cvu
      1        ONLINE  ONLINE       sles01                   STABLE
ora.oc4j
      1        OFFLINE OFFLINE                               STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       sles01                   STABLE
ora.sles01.vip
      1        ONLINE  ONLINE       sles01                   STABLE
--------------------------------------------------------------------------------

oracle@sles01:~> cat /etc/issue
Welcome to SUSE Linux Enterprise Server 11 SP4  (x86_64) - Kernel \r (\l).
I have to prepare the new node carefully, much as I would for a traditional node addition:
- Add new ip addresses (public, private, vip) to the DNS/hosts
- Install the new OEL server
- Keep the same user and groups (uid, gid, etc)
- Verify the network connectivity and setup SSH equivalence
- Check that the multicast connection is ok
- Add the storage, configure persistent naming (udev) and verify that the disks (major, minor, names) are the very same (a udev sketch follows this list)
- The network cards also must be the very same
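For the storage point, this is a minimal udev sketch for the new OEL6 node; the WWID, symlink name and owner/group below are example values, not the customer's, and the resulting device names and permissions must match what the ASM discovery string expects on the existing SuSE node:

# /etc/udev/rules.d/99-oracle-asm.rules (example values)
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/$parent", RESULT=="36000c29examplewwid0000000000000001", SYMLINK+="asm-data01", OWNER="oracle", GROUP="asmadmin", MODE="0660"

# Reload the rules and verify the device:
udevadm control --reload-rules
udevadm trigger --subsystem-match=block
ls -l /dev/asm-data01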
Once the new host is ready, cluvfy stage -pre nodeadd will likely fail due to:
- Kernel release mismatch
- Package mismatch
Here’s an example of output:
oracle@sles01:~> cluvfy stage -pre nodeadd -n rhel01

Performing pre-checks for node addition

Checking node reachability...
Node reachability check passed from node "sles01"

Checking user equivalence...
User equivalence check passed for user "oracle"
Package existence check passed for "cvuqdisk"

Checking CRS integrity...
CRS integrity check passed
Clusterware version consistency passed.

Checking shared resources...
Checking CRS home location...
Location check passed for: "/u01/app/12.1.0/grid"
Shared resources check for node addition passed

Checking node connectivity...
Checking hosts config file...
Verification of the hosts config file successful

Check: Node connectivity using interfaces on subnet "192.168.56.0"
Node connectivity passed for subnet "192.168.56.0" with node(s) sles01,rhel01
TCP connectivity check passed for subnet "192.168.56.0"

Check: Node connectivity using interfaces on subnet "172.16.100.0"
Node connectivity passed for subnet "172.16.100.0" with node(s) rhel01,sles01
TCP connectivity check passed for subnet "172.16.100.0"

Checking subnet mask consistency...
Subnet mask consistency check passed for subnet "192.168.56.0".
Subnet mask consistency check passed for subnet "172.16.100.0".
Subnet mask consistency check passed.

Node connectivity check passed

Checking multicast communication...
Checking subnet "172.16.100.0" for multicast communication with multicast group "224.0.0.251"...
Check of subnet "172.16.100.0" for multicast communication with multicast group "224.0.0.251" passed.
Check of multicast communication passed.

Total memory check passed
Available memory check passed
Swap space check passed
Free disk space check passed for "sles01:/usr,sles01:/var,sles01:/etc,sles01:/u01/app/12.1.0/grid,sles01:/sbin,sles01:/tmp"
Free disk space check passed for "rhel01:/usr,rhel01:/var,rhel01:/etc,rhel01:/u01/app/12.1.0/grid,rhel01:/sbin,rhel01:/tmp"
Check for multiple users with UID value 1101 passed
User existence check passed for "oracle"
Run level check passed
Hard limits check passed for "maximum open file descriptors"
Soft limits check passed for "maximum open file descriptors"
Hard limits check passed for "maximum user processes"
Soft limits check passed for "maximum user processes"
System architecture check passed

WARNING:
PRVF-7524 : Kernel version is not consistent across all the nodes.
Kernel version = "3.0.101-63-default" found on nodes: sles01.
Kernel version = "3.8.13-16.2.1.el6uek.x86_64" found on nodes: rhel01.
Kernel version check passed

Kernel parameter check passed for "semmsl"
Kernel parameter check passed for "semmns"
Kernel parameter check passed for "semopm"
Kernel parameter check passed for "semmni"
Kernel parameter check passed for "shmmax"
Kernel parameter check passed for "shmmni"
Kernel parameter check passed for "shmall"
Kernel parameter check passed for "file-max"
Kernel parameter check passed for "ip_local_port_range"
Kernel parameter check passed for "rmem_default"
Kernel parameter check passed for "rmem_max"
Kernel parameter check passed for "wmem_default"
Kernel parameter check passed for "wmem_max"
Kernel parameter check passed for "aio-max-nr"
Package existence check passed for "make"
Package existence check passed for "libaio"
Package existence check passed for "binutils"
Package existence check passed for "gcc(x86_64)"
Package existence check passed for "gcc-c++(x86_64)"
Package existence check passed for "glibc"
Package existence check passed for "glibc-devel"
Package existence check passed for "ksh"
Package existence check passed for "libaio-devel"
Package existence check failed for "libstdc++33"
Check failed on nodes:
        rhel01
Package existence check failed for "libstdc++43-devel"
Check failed on nodes:
        rhel01
Package existence check passed for "libstdc++-devel(x86_64)"
Package existence check failed for "libstdc++46"
Check failed on nodes:
        rhel01
Package existence check failed for "libgcc46"
Check failed on nodes:
        rhel01
Package existence check passed for "sysstat"
Package existence check failed for "libcap1"
Check failed on nodes:
        rhel01
Package existence check failed for "nfs-kernel-server"
Check failed on nodes:
        rhel01
Check for multiple users with UID value 0 passed
Current group ID check passed

Starting check for consistency of primary group of root user
Check for consistency of root user's primary group passed
Group existence check passed for "asmadmin"
Group existence check passed for "asmoper"
Group existence check passed for "asmdba"

Checking ASMLib configuration.
Check for ASMLib configuration passed.

Checking OCR integrity...
OCR integrity check passed

Checking Oracle Cluster Voting Disk configuration...
Oracle Cluster Voting Disk configuration check passed
Time zone consistency check passed

Starting Clock synchronization checks using Network Time Protocol(NTP)...
NTP Configuration file check started...
No NTP Daemons or Services were found to be running
Clock synchronization check using Network Time Protocol(NTP) passed

User "oracle" is not part of "root" group. Check passed

Checking integrity of file "/etc/resolv.conf" across nodes
"domain" and "search" entries do not coexist in any "/etc/resolv.conf" file
All nodes have same "search" order defined in file "/etc/resolv.conf"
PRVF-5636 : The DNS response time for an unreachable node exceeded "15000" ms on following nodes: sles01,rhel01
Check for integrity of file "/etc/resolv.conf" failed

Checking integrity of name service switch configuration file "/etc/nsswitch.conf" ...
Check for integrity of name service switch configuration file "/etc/nsswitch.conf" passed

Pre-check for node addition was unsuccessful on all the nodes.
So the point is not whether the check succeeds (it will not), but what fails.
Solving all the problems that are NOT caused by the SuSE-OEL difference is crucial, because addnode.sh would otherwise fail with the same errors. I then need to run it with the -ignorePrereq and -ignoreSysPrereqs switches. Let’s see how it works:
oracle@sles01:/u01/app/12.1.0/grid/addnode> ./addnode.sh -silent "CLUSTER_NEW_NODES={rhel01}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={rhel01-vip}" -ignorePrereq -ignoreSysPrereqs
Starting Oracle Universal Installer...

Checking Temp space: must be greater than 120 MB.   Actual 27479 MB    Passed
Checking swap space: must be greater than 150 MB.   Actual 2032 MB    Passed

Prepare Configuration in progress.

Prepare Configuration successful.
..................................................   9% Done.
You can find the log of this install session at:
 /u01/app/oraInventory/logs/addNodeActions2015-11-09_09-57-16PM.log

Instantiate files in progress.

Instantiate files successful.
..................................................   15% Done.

Copying files to node in progress.

Copying files to node successful.
..................................................   79% Done.

Saving cluster inventory in progress.
..................................................   87% Done.

Saving cluster inventory successful.
The Cluster Node Addition of /u01/app/12.1.0/grid was successful.
Please check '/tmp/silentInstall.log' for more details.

As a root user, execute the following script(s):
        1. /u01/app/oraInventory/orainstRoot.sh
        2. /u01/app/12.1.0/grid/root.sh

Execute /u01/app/oraInventory/orainstRoot.sh on the following nodes:
[rhel01]
Execute /u01/app/12.1.0/grid/root.sh on the following nodes:
[rhel01]

The scripts can be executed in parallel on all the nodes.

If there are any policy managed databases managed by cluster, proceed with the addnode procedure without executing the root.sh script. Ensure that root.sh script is executed after all the policy managed databases managed by clusterware are extended to the new nodes.
..........
Update Inventory in progress.
..................................................   100% Done.

Update Inventory successful.
Successfully Setup Software.
Then, as instructed by addnode.sh, I run root.sh on the new node and expect it to work:
[oracle@rhel01 install]$ sudo /u01/app/12.1.0/grid/root.sh
Performing root user operation for Oracle 12c

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /u01/app/12.1.0/grid
   Copying dbhome to /usr/local/bin ...
   Copying oraenv to /usr/local/bin ...
   Copying coraenv to /usr/local/bin ...

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Relinking oracle with rac_on option
Using configuration parameter file: /u01/app/12.1.0/grid/crs/install/crsconfig_params
2015/11/09 23:18:42 CLSRSC-363: User ignored prerequisites during installation
OLR initialization - successful
2015/11/09 23:19:08 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.conf'
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Oracle High Availability Services has been started.
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.mdnsd' on 'rhel01'
CRS-2672: Attempting to start 'ora.evmd' on 'rhel01'
CRS-2676: Start of 'ora.mdnsd' on 'rhel01' succeeded
CRS-2676: Start of 'ora.evmd' on 'rhel01' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rhel01'
CRS-2676: Start of 'ora.gpnpd' on 'rhel01' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'rhel01'
CRS-2676: Start of 'ora.gipcd' on 'rhel01' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rhel01'
CRS-2676: Start of 'ora.cssdmonitor' on 'rhel01' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rhel01'
CRS-2672: Attempting to start 'ora.diskmon' on 'rhel01'
CRS-2676: Start of 'ora.diskmon' on 'rhel01' succeeded
CRS-2789: Cannot stop resource 'ora.diskmon' as it is not running on server 'rhel01'
CRS-2676: Start of 'ora.cssd' on 'rhel01' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rhel01'
CRS-2672: Attempting to start 'ora.ctssd' on 'rhel01'
CRS-2676: Start of 'ora.ctssd' on 'rhel01' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rhel01' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rhel01'
CRS-2676: Start of 'ora.asm' on 'rhel01' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'rhel01'
CRS-2676: Start of 'ora.storage' on 'rhel01' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'rhel01'
CRS-2676: Start of 'ora.crsd' on 'rhel01' succeeded
CRS-6017: Processing resource auto-start for servers: rhel01
CRS-2672: Attempting to start 'ora.ons' on 'rhel01'
CRS-2676: Start of 'ora.ons' on 'rhel01' succeeded
CRS-6016: Resource auto-start has completed for server rhel01
CRS-6024: Completed start of Oracle Cluster Ready Services-managed resources
CRS-4123: Oracle High Availability Services has been started.
2015/11/09 23:22:06 CLSRSC-343: Successfully started Oracle clusterware stack
clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 12c Release 1.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Preparing packages for installation...
cvuqdisk-1.0.9-1
2015/11/09 23:22:23 CLSRSC-325: Configure Oracle Grid Infrastructure for a Cluster ... succeeded
Bingo! Let’s check if everything is up and running:
[oracle@rhel01 ~]$ /u01/app/12.1.0/grid/bin/crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       rhel01                   STABLE
               ONLINE  ONLINE       sles01                   STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       rhel01                   STABLE
               ONLINE  ONLINE       sles01                   STABLE
ora.asm
               ONLINE  ONLINE       rhel01                   Started,STABLE
               ONLINE  ONLINE       sles01                   Started,STABLE
ora.net1.network
               ONLINE  ONLINE       rhel01                   STABLE
               ONLINE  ONLINE       sles01                   STABLE
ora.ons
               ONLINE  ONLINE       rhel01                   STABLE
               ONLINE  ONLINE       sles01                   STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       sles01                   STABLE
ora.cvu
      1        ONLINE  ONLINE       sles01                   STABLE
ora.oc4j
      1        OFFLINE OFFLINE                               STABLE
ora.rhel01.vip
      1        ONLINE  ONLINE       rhel01                   STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       sles01                   STABLE
ora.sles01.vip
      1        ONLINE  ONLINE       sles01                   STABLE
--------------------------------------------------------------------------------
[oracle@rhel01 ~]$ olsnodes -s
sles01  Active
rhel01  Active
[oracle@rhel01 ~]$ ssh rhel01 uname -r
3.8.13-16.2.1.el6uek.x86_64
[oracle@rhel01 ~]$ ssh sles01 uname -r
3.0.101-63-default
[oracle@rhel01 ~]$ ssh rhel01 cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.5 (Santiago)
[oracle@rhel01 ~]$ ssh sles01 cat /etc/issue
Welcome to SUSE Linux Enterprise Server 11 SP4  (x86_64) - Kernel \r (\l).
So yes, it works, but remember that it’s not a supported long-term configuration.
In my case I expect to migrate the whole cluster from SLES to OEL in one day.
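The next step of the strategy, removing the old SuSE node once its instances and VIP have been relocated, is not shown in this test; here is a rough sketch based on the standard 12c node-deletion procedure, using the hostnames of this lab (check the documentation for the complete steps):

# On sles01 (the node being removed), as root: deconfigure Clusterware locally
/u01/app/12.1.0/grid/crs/install/rootcrs.pl -deconfig -force

# On rhel01 (a surviving node), as root: drop the node from the cluster
/u01/app/12.1.0/grid/bin/crsctl delete node -n sles01

# Still on rhel01, as oracle: update the inventory of the Grid home
/u01/app/12.1.0/grid/oui/bin/runInstaller -updateNodeList ORACLE_HOME=/u01/app/12.1.0/grid "CLUSTER_NODES={rhel01}" CRS=TRUE -silent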
NOTE: using OEL6 as the new target is easy because the interface names do not change. OEL7 introduces a new interface naming scheme, so if you need to migrate without cluster downtime you either set up the new OEL7 nodes with the legacy interface names, following this post: http://ask.xmodulo.com/change-network-interface-name-centos7.html
or you configure the new interface names for the cluster with oifcfg, as sketched below.
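A minimal oifcfg sketch, assuming the existing nodes use eth0/eth1 and the new OEL7 nodes come up with predictable names like ens3/ens4 (interface names and subnets are examples from my lab, not a real customer configuration):

# Interfaces currently registered in the cluster:
oifcfg getif
# eth0  192.168.56.0  global  public
# eth1  172.16.100.0  global  cluster_interconnect

# Register the new interface names for the same subnets (global = valid for all nodes):
oifcfg setif -global ens3/192.168.56.0:public
oifcfg setif -global ens4/172.16.100.0:cluster_interconnect

# Once no node uses the old names anymore, remove them:
oifcfg delif -global eth0/192.168.56.0
oifcfg delif -global eth1/172.16.100.0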
HTH
—
Ludovico