Yesterday I have presented the Oracle Rapid Home Provisioning technology for the RAC SIG, you can find the recording on YouTube:
Cheers
—
Ludo
Yesterday I have presented the Oracle Rapid Home Provisioning technology for the RAC SIG, you can find the recording on YouTube:
Cheers
—
Ludo
In a few days I will give a presentation at UKOUG Tech15 about Rapid Home Provisioning, it will be the first time that I present this session in public.
I usually like to give the link to the material to my audience, so here we go:
Slides:
Demo:
Enjoy
—
Ludovico
In a few days I will give a presentation at UKOUG Tech15 about Global Data Services, it will be the first time that I present this session.
I usually like to give the link to the material to my audience, so here we go:
Credits
I have to give special credits to my colleague Robert Bialek. I’ve got a late confirmation for this session and my slide deck was not ready at all, so I have used a big part of his original work. Most of the content included in the slides has been created by Robert, not me. (Thank you for your help! :-))
Slides
Demo recording
Demo script
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 |
clear function db { export ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1 export PATH=$ORACLE_HOME/bin:$ORACLE_HOME/OPatch:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin } function gsm { export ORACLE_HOME=/u01/app/oracle/product/12.1.0/gsmhome_1 export PATH=$ORACLE_HOME/bin:$ORACLE_HOME/OPatch:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin } db echo "#### CURRENT CONFIGURATION: CLASSIC DATA GUARD, 3 DATABASES ####" dgmgrl -echo sys/password1@oltp_de <<EOF show configuration EOF echo "next: GSM config" read -p "" gsm echo "#### GSM CONFIGURATION ####" echo "GDS COMMAND: config" gdsctl <<EOF connect gsm_admin/password1@gsm1 config exit EOF echo "next: ADD GDSPOOL" read -p "" echo "#### ADD GDSPOOL ####" echo "GDS COMMAND: add gdspool -gdspool sales" gdsctl <<EOF connect gsm_admin/password1@gsm1 add gdspool -gdspool sales exit EOF echo "next: ADD BROKERCONFIG" read -p "" echo "#### ADD BROKERCONFIG ####" echo "GDS COMMAND: add brokerconfig -connect gsm02.trivadistraining.com:1521/oltp_de -pwd password1 -gdspool sales -region germany" gdsctl <<EOF connect gsm_admin/password1@gsm1 add brokerconfig -connect gsm02.trivadistraining.com:1521/oltp_de -pwd password1 -gdspool sales -region germany exit EOF echo "next: config databases" read -p "" echo "#### CONFIG DATABASES ####" echo "GDS COMMAND: config database" gdsctl <<EOF connect gsm_admin/password1@gsm1 config database exit EOF echo "next: modify databases" read -p "" echo "#### MODIFY DATABASES ####" echo "GDS COMMAND: modify database -database oltp_ch1 -region switzerland modify database -database oltp_ch2 -region switzerland " gdsctl <<EOF connect gsm_admin/password1@gsm1 modify database -database oltp_ch1 -region switzerland modify database -database oltp_ch2 -region switzerland config database exit EOF echo "next: add service read/write" read -p "" echo "#### ADD SERVICE R/W ####" echo "GDS COMMAND: add service -gdspool sales -service gsales_rw -role primary -preferred_all -failovertype SELECT -failovermethod BASIC -failoverretry 5 -failoverdelay 3 -locality LOCAL_ONLY -region_failover start service -service gsales_rw services" gdsctl <<EOF connect gsm_admin/password1@gsm1 add service -gdspool sales -service gsales_rw -role primary -preferred_all -failovertype SELECT -failovermethod BASIC -failoverretry 5 -failoverdelay 3 -locality LOCAL_ONLY -region_failover start service -service gsales_rw services exit EOF echo "next: ADD SERVICE R/O" read -p "" echo "#### ADD SERVICE R/O ####" echo "GDS COMMAND: add service -gdspool sales -service gsales_ro -role PHYSICAL_STANDBY -failover_primary -lag 20 -preferred_all -failovertype SELECT -failovermethod BASIC -failoverretry 5 -failoverdelay 3 -locality LOCAL_ONLY -region_failover start service -service gsales_ro services " gdsctl <<EOF connect gsm_admin/password1@gsm1 add service -gdspool sales -service gsales_ro -role PHYSICAL_STANDBY -failover_primary -lag 20 -preferred_all -failovertype SELECT -failovermethod BASIC -failoverretry 5 -failoverdelay 3 -locality LOCAL_ONLY -region_failover start service -service gsales_ro services exit EOF echo "next: stop apply ch1 (run cli_ro_short.sh first)" read -p "" db echo "#### STOP APPLY DATA GUARD ON OLTP_CH1 ####" dgmgrl -echo sys/password1@oltp_de <<EOF edit database oltp_ch1 set state='apply-off'; EOF echo "next: gds services" read -p "" gsm echo "#### GDS SERVICES ####" echo "GDS COMMAND: services " gdsctl <<EOF connect gsm_admin/password1@gsm1 services exit EOF echo "next: stop apply ch2 (run cli_ro_short.sh first)" read -p "" db echo "#### STOP APPLY DATA GUARD ON OLTP_CH2 ####" dgmgrl -echo sys/password1@oltp_de <<EOF edit database oltp_ch2 set state='apply-off'; EOF echo "next: gds services" read -p "" gsm echo "#### GDS SERVICES ####" echo "GDS COMMAND: services " gdsctl <<EOF connect gsm_admin/password1@gsm1 services exit EOF echo "next: gds services" read -p "" gsm echo "#### GDS SERVICES ####" echo "GDS COMMAND: services " gdsctl <<EOF connect gsm_admin/password1@gsm1 services exit EOF echo "next: start apply ch1 and ch2" read -p "" db echo "#### START APPLY DATA GUARD ON OLTP_CH1 and OLTP_CH2 ####" dgmgrl -echo sys/password1@oltp_de <<EOF edit database oltp_ch1 set state='apply-on'; EOF echo "sleeping 5" sleep 5 dgmgrl -echo sys/password1@oltp_de <<EOF edit database oltp_ch2 set state='apply-on'; EOF echo "next: gds services" read -p "" gsm echo "#### GDS SERVICES ####" echo "GDS COMMAND: services " gdsctl <<EOF connect gsm_admin/password1@gsm1 services exit EOF echo "next: gds services" read -p "" gsm echo "#### GDS SERVICES ####" echo "GDS COMMAND: services " gdsctl <<EOF connect gsm_admin/password1@gsm1 services exit EOF echo "next: switchover to CH1 (run cli_ro_long.sh and cli_rw_long.sh first)" read -p "" db echo "#### VALIDATE DATABASE OLTP_CH1 ####" dgmgrl -echo sys/password1@oltp_de <<EOF validate database oltp_ch1; EOF echo "next: switchover" read -p "" echo "#### SWITCHOVER TO OLTP_CH1 ####" dgmgrl -echo sys/password1@oltp_de <<EOF switchover to oltp_ch1; EOF echo "next: gds services" read -p "" |
And the script to revert the demo:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 |
clear function db { export ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1 export PATH=$ORACLE_HOME/bin:$ORACLE_HOME/OPatch:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin } function gsm { export ORACLE_HOME=/u01/app/oracle/product/12.1.0/gsmhome_1 export PATH=$ORACLE_HOME/bin:$ORACLE_HOME/OPatch:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin } db dgmgrl -echo sys/password1@oltp_de <<EOF switchover to oltp_de; EOF gsm echo "#### STOP and DELETE SERVICE, REMOVE BROKERCONFIG, REMOVE POOL ####" gdsctl <<EOF connect gsm_admin/password1@gsm1 stop service -service gsales_ro stop service -service gsales_rw remove service -service gsales_ro remove service -service gsales_rw remove brokerconfig remove gdspool -gdspool sales config exit EOF db dgmgrl -echo sys/password1@oltp_de <<EOF show configuration EOF echo "DEMO reverted." read -p "" |
Cheers
—
Ludovico
Update: I will give this presentation at UKOUG Tech15, Wed 9 December at 14:30.
This presentation has had a very poor score in selections for conferences (no OOW, no DOAG) but people liked it very much at Paris Oracle Meetup. The Database on ACFS is mainstream now, thanks to the new ODA releases. Having some knowledge about why and how you should run (not) Databases on ACFS is definitely worth a read.
Slides
Demo 1 recording
Demo 2 recording
Demo script (DB ACFS clone from Standby Database)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 |
#!/bin/bash #################### # create the clone # #################### set -x NUM=`echo $$ | cut -c 1-4` export NEWNAME=${1:-SN$NUM} cat <<EOF ################################################ ################################################ ## ## CLONING DATABASE USING NEW SID: $NEWNAME ## ################################################ ################################################ EOF export ORACLE_SID=ACFSDB_1 dgmgrl <<EOF connect sys/racattack edit database ACFSDB set state="APPLY-OFF"; exit EOF acfsutil snap create -w $NEWNAME /u02 dgmgrl <<EOF connect sys/racattack edit database ACFSDB set state="APPLY-ON"; exit EOF cd /u02/.ACFS/snaps/$NEWNAME/ACFSDB sqlplus / as sysdba <<EOF alter database backup controlfile to trace as '/u02/.ACFS/snaps/$NEWNAME/ACFSDB/control.trc' reuse resetlogs; create pfile='/u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora' from spfile; exit EOF sed -i -e "s/u02\/ACFSDB\//u02\/.ACFS\/snaps\/$NEWNAME\/ACFSDB\//g" /u02/.ACFS/snaps/$NEWNAME/ACFSDB/control.trc sed -i -e "s/CREATE CONTROLFILE.*$/CREATE CONTROLFILE REUSE SET DATABASE \"$NEWNAME\" RESETLOGS FORCE LOGGING NOARCHIVELOG/" /u02/.ACFS/snaps/$NEWNAME/ACFSDB/control.trc rm /u02/.ACFS/snaps/$NEWNAME/ACFSDB/ACFSDB/fast_recovery_area/ACFSDB/controlfile/* rm /u02/.ACFS/snaps/$NEWNAME/ACFSDB/ACFSDB/controlfile/* sed -i '/^ACFS.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora sed -i -e "s/u02\/ACFSDB\//u02\/.ACFS\/snaps\/$NEWNAME\/ACFSDB\//g" /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora sed -i '/^\*\.db_name.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora sed -i '/^\*\.db_unique_name.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora sed -i '/^\*\.dispatchers.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora sed -i '/^\*\.audit_file_dest.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora sed -i '/^\*\.fal_server.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora sed -i '/^\*\.log_archive_config.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora sed -i '/^\*\.log_archive_dest_1.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora sed -i '/^\*\.memory_target.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora sed -i '/^\*\.service_names.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora sed -i '/^\*\.cluster_database.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora #find /u02/.ACFS/snaps/SNAP1/ACFSDB/ACFSDB/fast_recovery_area/ACFSDB/archivelog/ -type f -exec mv {} /u02/.ACFS/snaps/SNAP1/ACFSDB/archivelog/ \; mkdir -p $ORACLE_BASE/admin/$NEWNAME/adump cat >>/u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora <<EOF *.audit_file_dest='$ORACLE_BASE/admin/$NEWNAME/adump' *.db_name='$NEWNAME' *.db_unique_name='$NEWNAME' *.dispatchers='(PROTOCOL=TCP) (SERVICE=${NEWNAME}XDB)' *.log_archive_dest_1='location=USE_DB_RECOVERY_FILE_DEST' *.sga_target=1900M $NEWNAME.instance_number=1 $NEWNAME.undo_tablespace=UNDOTBS1 *.service_names='$NEWNAME' *.cluster_database=false EOF export ORACLE_SID=$NEWNAME head -n $((`grep -n ^RECOVER /u02/.ACFS/snaps/$NEWNAME/ACFSDB/control.trc | awk -F: '{print $1}'`-2)) /u02/.ACFS/snaps/$NEWNAME/ACFSDB/control.trc > /u02/.ACFS/snaps/$NEWNAME/ACFSDB/control1.trc sqlplus / as sysdba <<EOF create spfile from pfile='/u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora'; @/u02/.ACFS/snaps/$NEWNAME/ACFSDB/control1.trc --recover automatic database using backup controlfile until cancel; --CANCEL alter database open resetlogs; alter tablespace temp add tempfile size 50M ; EOF |
Comments are, as always, very appreciated 🙂
—
Ludo
I have a customer that needs to migrate its Oracle RAC cluster from SuSE to OEL.
I know, I know, there is a paper from Dell and Oracle named:
That explains how Dell migrated its many RAC clusters from SuSE to OEL. The problem is that they used a different strategy:
– backup the configuration of the nodes
– then for each node, one at time
– stop the node
– reinstall the OS
– restore the configuration and the Oracle binaries
– relink
– restart
What I want to achieve instead is:
– add one OEL node to the SuSE cluster as new node
– remove one SuSE node from the now-mixed cluster
– install/restore/relink the RDBMS software (RAC) on the new node
– move the RAC instances to the new node (taking care to NOT run more than the number of licensed nodes/CPUs at any time)
– repeat (for the remaining nodes)
because the customer will also migrate to new hardware.
In order to test this migration path, I’ve set up a SINGLE NODE cluster (if it works for one node, it will for two or more).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
oracle@sles01:~> crsctl stat res -t -------------------------------------------------------------------------------- Name Target State Server State details -------------------------------------------------------------------------------- Local Resources -------------------------------------------------------------------------------- ora.DATA.dg ONLINE ONLINE sles01 STABLE ora.LISTENER.lsnr ONLINE ONLINE sles01 STABLE ora.asm ONLINE ONLINE sles01 Started,STABLE ora.net1.network ONLINE ONLINE sles01 STABLE ora.ons ONLINE ONLINE sles01 STABLE -------------------------------------------------------------------------------- Cluster Resources -------------------------------------------------------------------------------- ora.LISTENER_SCAN1.lsnr 1 ONLINE ONLINE sles01 STABLE ora.cvu 1 ONLINE ONLINE sles01 STABLE ora.oc4j 1 OFFLINE OFFLINE STABLE ora.scan1.vip 1 ONLINE ONLINE sles01 STABLE ora.sles01.vip 1 ONLINE ONLINE sles01 STABLE -------------------------------------------------------------------------------- oracle@sles01:~> cat /etc/issue Welcome to SUSE Linux Enterprise Server 11 SP4 (x86_64) - Kernel \r (\l). |
I have to setup the new node addition carefully, mainly as I would do with a traditional node addition:
Once the new host ready, the cluvfy stage -pre nodeadd will likely fail due to
Here’s an example of output:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 |
oracle@sles01:~> cluvfy stage -pre nodeadd -n rhel01 Performing pre-checks for node addition Checking node reachability... Node reachability check passed from node "sles01" Checking user equivalence... User equivalence check passed for user "oracle" Package existence check passed for "cvuqdisk" Checking CRS integrity... CRS integrity check passed Clusterware version consistency passed. Checking shared resources... Checking CRS home location... Location check passed for: "/u01/app/12.1.0/grid" Shared resources check for node addition passed Checking node connectivity... Checking hosts config file... Verification of the hosts config file successful Check: Node connectivity using interfaces on subnet "192.168.56.0" Node connectivity passed for subnet "192.168.56.0" with node(s) sles01,rhel01 TCP connectivity check passed for subnet "192.168.56.0" Check: Node connectivity using interfaces on subnet "172.16.100.0" Node connectivity passed for subnet "172.16.100.0" with node(s) rhel01,sles01 TCP connectivity check passed for subnet "172.16.100.0" Checking subnet mask consistency... Subnet mask consistency check passed for subnet "192.168.56.0". Subnet mask consistency check passed for subnet "172.16.100.0". Subnet mask consistency check passed. Node connectivity check passed Checking multicast communication... Checking subnet "172.16.100.0" for multicast communication with multicast group "224.0.0.251"... Check of subnet "172.16.100.0" for multicast communication with multicast group "224.0.0.251" passed. Check of multicast communication passed. Total memory check passed Available memory check passed Swap space check passed Free disk space check passed for "sles01:/usr,sles01:/var,sles01:/etc,sles01:/u01/app/12.1.0/grid,sles01:/sbin,sles01:/tmp" Free disk space check passed for "rhel01:/usr,rhel01:/var,rhel01:/etc,rhel01:/u01/app/12.1.0/grid,rhel01:/sbin,rhel01:/tmp" Check for multiple users with UID value 1101 passed User existence check passed for "oracle" Run level check passed Hard limits check passed for "maximum open file descriptors" Soft limits check passed for "maximum open file descriptors" Hard limits check passed for "maximum user processes" Soft limits check passed for "maximum user processes" System architecture check passed WARNING: PRVF-7524 : Kernel version is not consistent across all the nodes. Kernel version = "3.0.101-63-default" found on nodes: sles01. Kernel version = "3.8.13-16.2.1.el6uek.x86_64" found on nodes: rhel01. Kernel version check passed Kernel parameter check passed for "semmsl" Kernel parameter check passed for "semmns" Kernel parameter check passed for "semopm" Kernel parameter check passed for "semmni" Kernel parameter check passed for "shmmax" Kernel parameter check passed for "shmmni" Kernel parameter check passed for "shmall" Kernel parameter check passed for "file-max" Kernel parameter check passed for "ip_local_port_range" Kernel parameter check passed for "rmem_default" Kernel parameter check passed for "rmem_max" Kernel parameter check passed for "wmem_default" Kernel parameter check passed for "wmem_max" Kernel parameter check passed for "aio-max-nr" Package existence check passed for "make" Package existence check passed for "libaio" Package existence check passed for "binutils" Package existence check passed for "gcc(x86_64)" Package existence check passed for "gcc-c++(x86_64)" Package existence check passed for "glibc" Package existence check passed for "glibc-devel" Package existence check passed for "ksh" Package existence check passed for "libaio-devel" Package existence check failed for "libstdc++33" Check failed on nodes: rhel01 Package existence check failed for "libstdc++43-devel" Check failed on nodes: rhel01 Package existence check passed for "libstdc++-devel(x86_64)" Package existence check failed for "libstdc++46" Check failed on nodes: rhel01 Package existence check failed for "libgcc46" Check failed on nodes: rhel01 Package existence check passed for "sysstat" Package existence check failed for "libcap1" Check failed on nodes: rhel01 Package existence check failed for "nfs-kernel-server" Check failed on nodes: rhel01 Check for multiple users with UID value 0 passed Current group ID check passed Starting check for consistency of primary group of root user Check for consistency of root user's primary group passed Group existence check passed for "asmadmin" Group existence check passed for "asmoper" Group existence check passed for "asmdba" Checking ASMLib configuration. Check for ASMLib configuration passed. Checking OCR integrity... OCR integrity check passed Checking Oracle Cluster Voting Disk configuration... Oracle Cluster Voting Disk configuration check passed Time zone consistency check passed Starting Clock synchronization checks using Network Time Protocol(NTP)... NTP Configuration file check started... No NTP Daemons or Services were found to be running Clock synchronization check using Network Time Protocol(NTP) passed User "oracle" is not part of "root" group. Check passed Checking integrity of file "/etc/resolv.conf" across nodes "domain" and "search" entries do not coexist in any "/etc/resolv.conf" file All nodes have same "search" order defined in file "/etc/resolv.conf" PRVF-5636 : The DNS response time for an unreachable node exceeded "15000" ms on following nodes: sles01,rhel01 Check for integrity of file "/etc/resolv.conf" failed Checking integrity of name service switch configuration file "/etc/nsswitch.conf" ... Check for integrity of name service switch configuration file "/etc/nsswitch.conf" passed Pre-check for node addition was unsuccessful on all the nodes. |
So the problem is not if the check succeed or not (it will not), but what fails.
Solving all the problems not related to the difference SuSE-OEL is crucial, because the addNode.sh will fail with the same errors. I need to run it using -ignorePrereqs and -ignoreSysPrereqs switches. Let’s see how it works:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 |
oracle@sles01:/u01/app/12.1.0/grid/addnode> ./addnode.sh -silent "CLUSTER_NEW_NODES={rhel01}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={rhel01-vip}" -ignorePrereq -ignoreSysPrereqs Starting Oracle Universal Installer... Checking Temp space: must be greater than 120 MB. Actual 27479 MB Passed Checking swap space: must be greater than 150 MB. Actual 2032 MB Passed Prepare Configuration in progress. Prepare Configuration successful. .................................................. 9% Done. You can find the log of this install session at: /u01/app/oraInventory/logs/addNodeActions2015-11-09_09-57-16PM.log Instantiate files in progress. Instantiate files successful. .................................................. 15% Done. Copying files to node in progress. Copying files to node successful. .................................................. 79% Done. Saving cluster inventory in progress. .................................................. 87% Done. Saving cluster inventory successful. The Cluster Node Addition of /u01/app/12.1.0/grid was successful. Please check '/tmp/silentInstall.log' for more details. As a root user, execute the following script(s): 1. /u01/app/oraInventory/orainstRoot.sh 2. /u01/app/12.1.0/grid/root.sh Execute /u01/app/oraInventory/orainstRoot.sh on the following nodes: [rhel01] Execute /u01/app/12.1.0/grid/root.sh on the following nodes: [rhel01] The scripts can be executed in parallel on all the nodes. If there are any policy managed databases managed by cluster, proceed with the addnode procedure without executing the root.sh script. Ensure that root.sh script is executed after all the policy managed databases managed by clusterware are extended to the new nodes. .......... Update Inventory in progress. .................................................. 100% Done. Update Inventory successful. Successfully Setup Software. |
Then, as stated by the addNode.sh, I run the root.sh and I expect it to work:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 |
[oracle@rhel01 install]$ sudo /u01/app/12.1.0/grid/root.sh Performing root user operation for Oracle 12c The following environment variables are set as: ORACLE_OWNER= oracle ORACLE_HOME= /u01/app/12.1.0/grid Copying dbhome to /usr/local/bin ... Copying oraenv to /usr/local/bin ... Copying coraenv to /usr/local/bin ... Entries will be added to the /etc/oratab file as needed by Database Configuration Assistant when a database is created Finished running generic part of root script. Now product-specific root actions will be performed. Relinking oracle with rac_on option Using configuration parameter file: /u01/app/12.1.0/grid/crs/install/crsconfig_params 2015/11/09 23:18:42 CLSRSC-363: User ignored prerequisites during installation OLR initialization - successful 2015/11/09 23:19:08 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.conf' CRS-4133: Oracle High Availability Services has been stopped. CRS-4123: Oracle High Availability Services has been started. CRS-4133: Oracle High Availability Services has been stopped. CRS-4123: Oracle High Availability Services has been started. CRS-4133: Oracle High Availability Services has been stopped. CRS-4123: Starting Oracle High Availability Services-managed resources CRS-2672: Attempting to start 'ora.mdnsd' on 'rhel01' CRS-2672: Attempting to start 'ora.evmd' on 'rhel01' CRS-2676: Start of 'ora.mdnsd' on 'rhel01' succeeded CRS-2676: Start of 'ora.evmd' on 'rhel01' succeeded CRS-2672: Attempting to start 'ora.gpnpd' on 'rhel01' CRS-2676: Start of 'ora.gpnpd' on 'rhel01' succeeded CRS-2672: Attempting to start 'ora.gipcd' on 'rhel01' CRS-2676: Start of 'ora.gipcd' on 'rhel01' succeeded CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rhel01' CRS-2676: Start of 'ora.cssdmonitor' on 'rhel01' succeeded CRS-2672: Attempting to start 'ora.cssd' on 'rhel01' CRS-2672: Attempting to start 'ora.diskmon' on 'rhel01' CRS-2676: Start of 'ora.diskmon' on 'rhel01' succeeded CRS-2789: Cannot stop resource 'ora.diskmon' as it is not running on server 'rhel01' CRS-2676: Start of 'ora.cssd' on 'rhel01' succeeded CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rhel01' CRS-2672: Attempting to start 'ora.ctssd' on 'rhel01' CRS-2676: Start of 'ora.ctssd' on 'rhel01' succeeded CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rhel01' succeeded CRS-2672: Attempting to start 'ora.asm' on 'rhel01' CRS-2676: Start of 'ora.asm' on 'rhel01' succeeded CRS-2672: Attempting to start 'ora.storage' on 'rhel01' CRS-2676: Start of 'ora.storage' on 'rhel01' succeeded CRS-2672: Attempting to start 'ora.crsd' on 'rhel01' CRS-2676: Start of 'ora.crsd' on 'rhel01' succeeded CRS-6017: Processing resource auto-start for servers: rhel01 CRS-2672: Attempting to start 'ora.ons' on 'rhel01' CRS-2676: Start of 'ora.ons' on 'rhel01' succeeded CRS-6016: Resource auto-start has completed for server rhel01 CRS-6024: Completed start of Oracle Cluster Ready Services-managed resources CRS-4123: Oracle High Availability Services has been started. 2015/11/09 23:22:06 CLSRSC-343: Successfully started Oracle clusterware stack clscfg: EXISTING configuration version 5 detected. clscfg: version 5 is 12c Release 1. Successfully accumulated necessary OCR keys. Creating OCR keys for user 'root', privgrp 'root'.. Operation successful. Preparing packages for installation... cvuqdisk-1.0.9-1 2015/11/09 23:22:23 CLSRSC-325: Configure Oracle Grid Infrastructure for a Cluster ... succeeded |
Bingo! Let’s check if everything is up and running:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 |
[oracle@rhel01 ~]$ /u01/app/12.1.0/grid/bin/crsctl stat res -t -------------------------------------------------------------------------------- Name Target State Server State details -------------------------------------------------------------------------------- Local Resources -------------------------------------------------------------------------------- ora.DATA.dg ONLINE ONLINE rhel01 STABLE ONLINE ONLINE sles01 STABLE ora.LISTENER.lsnr ONLINE ONLINE rhel01 STABLE ONLINE ONLINE sles01 STABLE ora.asm ONLINE ONLINE rhel01 Started,STABLE ONLINE ONLINE sles01 Started,STABLE ora.net1.network ONLINE ONLINE rhel01 STABLE ONLINE ONLINE sles01 STABLE ora.ons ONLINE ONLINE rhel01 STABLE ONLINE ONLINE sles01 STABLE -------------------------------------------------------------------------------- Cluster Resources -------------------------------------------------------------------------------- ora.LISTENER_SCAN1.lsnr 1 ONLINE ONLINE sles01 STABLE ora.cvu 1 ONLINE ONLINE sles01 STABLE ora.oc4j 1 OFFLINE OFFLINE STABLE ora.rhel01.vip 1 ONLINE ONLINE rhel01 STABLE ora.scan1.vip 1 ONLINE ONLINE sles01 STABLE ora.sles01.vip 1 ONLINE ONLINE sles01 STABLE -------------------------------------------------------------------------------- |
1 2 3 4 5 6 7 8 9 10 11 12 13 |
[oracle@rhel01 ~]$ olsnodes -s sles01 Active rhel01 Active [oracle@rhel01 ~]$ ssh rhel01 uname -r 3.8.13-16.2.1.el6uek.x86_64 [oracle@rhel01 ~]$ ssh sles01 uname -r 3.0.101-63-default [oracle@rhel01 ~]$ ssh rhel01 cat /etc/redhat-release Red Hat Enterprise Linux Server release 6.5 (Santiago) [oracle@rhel01 ~]$ ssh sles01 cat /etc/issue Welcome to SUSE Linux Enterprise Server 11 SP4 (x86_64) - Kernel \r (\l). |
So yes, it works, but remember that it’s not a supported long-term configuration.
In my case I expect to migrate the whole cluster from SLES to OEL in one day.
NOTE: using OEL6 as new target is easy because the interface names do not change. The new OEL7 interface naming changes, if you need to migrate without cluster downtime you need to setup the new OEL7 nodes following this post: http://ask.xmodulo.com/change-network-interface-name-centos7.html
Otherwise, you need to configure a new interface name for the cluster with oifcfg.
HTH
—
Ludovico
Yesterday I’ve got a weird problem with Adaptive Cursor Sharing. I’m not sure yet about the issue, but it seems to be related to cursor sharing histograms. Hopefully one day I will blog about what I’ve learnt from this experience.
To better monitor the problem on that specific query, I’ve prepared this script (tested on 12.1.0.2):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 |
COLUMN Shareable HEADING 'S|H|A|R|E|A|B|L|E' COLUMN "Bind-Aware" HEADING 'B|I|N|D| |A|W|A|R|E' COLUMN Sensitive HEADING 'S|E|N|S|I|T|I|V|E' COLUMN Reoptimizable HEADING 'R|E|O|P|T|I|M|I|Z|A|B|L|E' BREAK on child_number on Execs on "Gets/Exec" on "Ela/Exec" on "Sensitive" on "Shareable" on "Bind-Aware" on bucket0 on bucket1 on bucket2 on cnt on "Reoptimizable" on is_resolved_adaptive_plan select * from (select * from ( select s.child_number, s.plan_hash_value, executions as Execs, round(buffer_gets/executions) as "Gets/Exec", round(elapsed_time/executions) as "Ela/Exec", is_bind_sensitive as "Sensitive", is_shareable as "Shareable", is_bind_aware as "Bind-Aware", to_char(h.bucket_id) as bucket, h.count as cnt, is_reoptimizable as "Reoptimizable", is_resolved_adaptive_plan, "UNBOUND_CURSOR", "SQL_TYPE_MISMATCH", "OPTIMIZER_MISMATCH", "OUTLINE_MISMATCH", "STATS_ROW_MISMATCH", "LITERAL_MISMATCH", "FORCE_HARD_PARSE", "EXPLAIN_PLAN_CURSOR", "BUFFERED_DML_MISMATCH", "PDML_ENV_MISMATCH", "INST_DRTLD_MISMATCH", "SLAVE_QC_MISMATCH", "TYPECHECK_MISMATCH", "AUTH_CHECK_MISMATCH", "BIND_MISMATCH", "DESCRIBE_MISMATCH", "LANGUAGE_MISMATCH", "TRANSLATION_MISMATCH", "BIND_EQUIV_FAILURE", "INSUFF_PRIVS", "INSUFF_PRIVS_REM", "REMOTE_TRANS_MISMATCH", "LOGMINER_SESSION_MISMATCH", "INCOMP_LTRL_MISMATCH", "OVERLAP_TIME_MISMATCH", "EDITION_MISMATCH", "MV_QUERY_GEN_MISMATCH", "USER_BIND_PEEK_MISMATCH", "TYPCHK_DEP_MISMATCH", "NO_TRIGGER_MISMATCH", "FLASHBACK_CURSOR", "ANYDATA_TRANSFORMATION", "PDDL_ENV_MISMATCH", "TOP_LEVEL_RPI_CURSOR", "DIFFERENT_LONG_LENGTH", "LOGICAL_STANDBY_APPLY", "DIFF_CALL_DURN", "BIND_UACS_DIFF", "PLSQL_CMP_SWITCHS_DIFF", "CURSOR_PARTS_MISMATCH", "STB_OBJECT_MISMATCH", "CROSSEDITION_TRIGGER_MISMATCH", "PQ_SLAVE_MISMATCH", "TOP_LEVEL_DDL_MISMATCH", "MULTI_PX_MISMATCH", "BIND_PEEKED_PQ_MISMATCH", "MV_REWRITE_MISMATCH", "ROLL_INVALID_MISMATCH", "OPTIMIZER_MODE_MISMATCH", "PX_MISMATCH", "MV_STALEOBJ_MISMATCH", "FLASHBACK_TABLE_MISMATCH", "LITREP_COMP_MISMATCH", "PLSQL_DEBUG", "LOAD_OPTIMIZER_STATS", "ACL_MISMATCH", "FLASHBACK_ARCHIVE_MISMATCH", "LOCK_USER_SCHEMA_FAILED", "REMOTE_MAPPING_MISMATCH", "LOAD_RUNTIME_HEAP_FAILED", "HASH_MATCH_FAILED", "PURGED_CURSOR", "BIND_LENGTH_UPGRADEABLE", "USE_FEEDBACK_STATS" from v$sql s join v$sql_cs_histogram h on (s.sql_id=h.sql_id and s.child_number=h.child_number and s.con_id=h.con_id ) join v$sql_shared_cursor shc on (shc.sql_id=h.sql_id and shc.child_number=h.child_number and s.con_id=shc.con_id ) where s.sql_id='&sql_id' ) pivot (sum(cnt) for (bucket) IN ('0' AS Bucket0,'1' AS Bucket1,'2' AS Bucket2)) ) unpivot (result FOR reason_type IN ("UNBOUND_CURSOR", "SQL_TYPE_MISMATCH", "OPTIMIZER_MISMATCH", "OUTLINE_MISMATCH", "STATS_ROW_MISMATCH", "LITERAL_MISMATCH", "FORCE_HARD_PARSE", "EXPLAIN_PLAN_CURSOR", "BUFFERED_DML_MISMATCH", "PDML_ENV_MISMATCH", "INST_DRTLD_MISMATCH", "SLAVE_QC_MISMATCH", "TYPECHECK_MISMATCH", "AUTH_CHECK_MISMATCH", "BIND_MISMATCH", "DESCRIBE_MISMATCH", "LANGUAGE_MISMATCH", "TRANSLATION_MISMATCH", "BIND_EQUIV_FAILURE", "INSUFF_PRIVS", "INSUFF_PRIVS_REM", "REMOTE_TRANS_MISMATCH", "LOGMINER_SESSION_MISMATCH", "INCOMP_LTRL_MISMATCH", "OVERLAP_TIME_MISMATCH", "EDITION_MISMATCH", "MV_QUERY_GEN_MISMATCH", "USER_BIND_PEEK_MISMATCH", "TYPCHK_DEP_MISMATCH", "NO_TRIGGER_MISMATCH", "FLASHBACK_CURSOR", "ANYDATA_TRANSFORMATION", "PDDL_ENV_MISMATCH", "TOP_LEVEL_RPI_CURSOR", "DIFFERENT_LONG_LENGTH", "LOGICAL_STANDBY_APPLY", "DIFF_CALL_DURN", "BIND_UACS_DIFF", "PLSQL_CMP_SWITCHS_DIFF", "CURSOR_PARTS_MISMATCH", "STB_OBJECT_MISMATCH", "CROSSEDITION_TRIGGER_MISMATCH", "PQ_SLAVE_MISMATCH", "TOP_LEVEL_DDL_MISMATCH", "MULTI_PX_MISMATCH", "BIND_PEEKED_PQ_MISMATCH", "MV_REWRITE_MISMATCH", "ROLL_INVALID_MISMATCH", "OPTIMIZER_MODE_MISMATCH", "PX_MISMATCH", "MV_STALEOBJ_MISMATCH", "FLASHBACK_TABLE_MISMATCH", "LITREP_COMP_MISMATCH", "PLSQL_DEBUG", "LOAD_OPTIMIZER_STATS", "ACL_MISMATCH", "FLASHBACK_ARCHIVE_MISMATCH", "LOCK_USER_SCHEMA_FAILED", "REMOTE_MAPPING_MISMATCH", "LOAD_RUNTIME_HEAP_FAILED", "HASH_MATCH_FAILED", "PURGED_CURSOR", "BIND_LENGTH_UPGRADEABLE", "USE_FEEDBACK_STATS")) where result='Y' order by child_number; |
The result is something similar (in my case it has 26 child cursors):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 |
R E O B P S S I T E H N I N A D M S R I I E A Z T A W A I B A B V L R L CHILD_NUMBER PLAN_HASH_VALUE EXECS Gets/Exec Ela/Exec E E E E I BUCKET0 BUCKET1 BUCKET2 REASON_TYPE R ------------ --------------- ---------- ---------- ---------- - - - - - ---------- ---------- ---------- ----------------------------- - 0 2293695281 455 2466 14464 Y Y Y N 0 455 0 ROLL_INVALID_MISMATCH Y 2293695281 BIND_EQUIV_FAILURE Y 1 1690560038 99 13943 103012 Y Y Y N 0 99 0 ROLL_INVALID_MISMATCH Y 1690560038 BIND_EQUIV_FAILURE Y 2 3815006743 541 43090 230245 Y Y Y N 0 541 0 BIND_EQUIV_FAILURE Y 3815006743 ROLL_INVALID_MISMATCH Y 3 1483632464 251 4111 18940 Y Y Y N 49 202 0 ROLL_INVALID_MISMATCH Y 1483632464 BIND_EQUIV_FAILURE Y 4 3815006743 1152 42632 220730 Y Y Y N 0 1000 0 BIND_EQUIV_FAILURE Y 3815006743 ROLL_INVALID_MISMATCH Y 5 3922835573 150 39252 184176 Y Y Y N 0 150 0 ROLL_INVALID_MISMATCH Y 3922835573 BIND_EQUIV_FAILURE Y 6 767857637 3 4731 124707 Y Y Y N 0 3 0 ROLL_INVALID_MISMATCH Y 767857637 BIND_EQUIV_FAILURE Y 7 767857637 11 4739 71119 Y Y Y N 0 11 0 BIND_EQUIV_FAILURE Y 8 2800467281 1 307 249727 Y Y Y N 0 1 0 BIND_EQUIV_FAILURE Y 9 3123241890 536 2982 14428 Y Y Y N 6 530 0 ROLL_INVALID_MISMATCH Y 3123241890 BIND_EQUIV_FAILURE Y 10 3125518635 17 315 16492 Y Y Y N 16 1 0 ROLL_INVALID_MISMATCH Y 3125518635 BIND_EQUIV_FAILURE Y 11 2184442252 130 4686 40188 Y Y Y N 0 130 0 ROLL_INVALID_MISMATCH Y 2184442252 BIND_EQUIV_FAILURE Y 12 3815006743 553 42765 231391 Y Y Y N 0 553 0 ROLL_INVALID_MISMATCH Y 3815006743 BIND_EQUIV_FAILURE Y 13 1166983254 47 14193 111256 Y Y Y N 0 47 0 BIND_EQUIV_FAILURE Y 1166983254 ROLL_INVALID_MISMATCH Y 14 2307602173 2 38 45922 Y Y Y N 2 0 0 BIND_EQUIV_FAILURE Y 2307602173 ROLL_INVALID_MISMATCH Y 15 767857637 11 4304 59617 Y Y Y N 0 11 0 BIND_EQUIV_FAILURE Y 767857637 ROLL_INVALID_MISMATCH Y 16 3108045525 2 34591 176749 Y N N N 1 1 0 ROLL_INVALID_MISMATCH Y 3108045525 LOAD_OPTIMIZER_STATS Y 3108045525 BIND_EQUIV_FAILURE Y 17 3108045525 6 1794 33335 Y Y Y N 4 2 0 BIND_EQUIV_FAILURE Y 3108045525 ROLL_INVALID_MISMATCH Y 18 2440443365 470 2009 13361 Y Y Y N 0 470 0 ROLL_INVALID_MISMATCH Y 2440443365 BIND_EQUIV_FAILURE Y 19 4079924956 15 2032 19773 Y Y Y N 8 7 0 ROLL_INVALID_MISMATCH Y 4079924956 BIND_EQUIV_FAILURE Y 20 777919270 32 2675 18260 Y Y Y N 11 21 0 BIND_EQUIV_FAILURE Y 777919270 ROLL_INVALID_MISMATCH Y 21 1428146033 63 13929 111116 Y Y Y N 0 63 0 ROLL_INVALID_MISMATCH Y 1428146033 BIND_EQUIV_FAILURE Y 22 3815006743 218 43673 234642 Y Y Y N 0 218 0 BIND_EQUIV_FAILURE Y 3815006743 ROLL_INVALID_MISMATCH Y 23 277802667 1 62 99268 Y Y Y N 1 0 0 BIND_EQUIV_FAILURE Y 277802667 ROLL_INVALID_MISMATCH Y 24 3898025231 3 2364 111231 Y Y Y N 0 3 0 BIND_EQUIV_FAILURE Y 3898025231 ROLL_INVALID_MISMATCH Y 25 767857637 2 6495 169363 Y Y Y N 0 2 0 ROLL_INVALID_MISMATCH Y 767857637 BIND_EQUIV_FAILURE Y 26 3690167092 100 2998 20138 Y Y Y N 0 100 0 BIND_EQUIV_FAILURE Y 3690167092 ROLL_INVALID_MISMATCH Y |
It’s a quick way to get the relevant information in a single result.
Off course, if you need deeper details, you should consider something more powerful like SQLd360 from Mauro Pagano.
Credits: I’ve got the unpivot idea (and copied that part of the code) from this post by Timur Akhmadeev.
—
Ludo
This quick post is mainly for myself… I will certainly use it for reference in the future.
Debugging problems due to adaptive dynamic sampling and in general adaptive features sometimes needs to get historical data about, e.g., parse time.
In order to get this information you may need to query the view DBA_HIST_SYS_TIME_MODEL (take care, it needs Diagnostic Pack license!)
You can use this query as an example.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 |
with h as ( select s.snap_id, s.BEGIN_INTERVAL_TIME, --s.END_INTERVAL_TIME, g.STAT_ID, g.stat_name, nvl( decode( greatest( VALUE, nvl(lag(VALUE) over (partition by s.dbid, s.instance_number, g.stat_name order by s.snap_id),0) ), VALUE, VALUE - lag(VALUE) over (partition by s.dbid, s.instance_number, g.stat_name order by s.snap_id ), VALUE ), 0 ) VALUE from DBA_HIST_SNAPSHOT s, DBA_HIST_SYS_TIME_MODEL g, v$instance i where s.SNAP_ID=g.SNAP_ID and s.BEGIN_INTERVAL_TIME >= trunc(to_timestamp(nvl('&startdate',to_char(sysdate,'YYYYMMDD')),'YYYYMMDD')) and s.BEGIN_INTERVAL_TIME < = trunc(to_timestamp(nvl('&enddate',to_char(sysdate,'YYYYMMDD')),'YYYYMMDD')+1) and s.instance_number=i.instance_number and s.instance_number=g.instance_number ) select p.begin_interval_time, p.value as "parse time elapsed", t.value as "DB time", round(p.value/t.value,2)*100 as "parse pct", par.value as opt_adapt_feat from h p, h t , dba_hist_parameter par where p.snap_id=t.snap_id and p.snap_id=par.snap_id and p.stat_name='parse time elapsed' and t.stat_name='DB time' and par.parameter_name='optimizer_adaptive_features' and t.value>0 order by p.begin_interval_time / |
In this specific example, it shows the “parse time elapsed”, the “DB time” and the percentage parse/dbtime, along with the value of the parameter “optimizer_adaptive_features“. You can use it to check if changing the parameters related to adaptive dynamic sampling improves or not the parse time.
The output will be something like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 |
BEGIN_INTERVAL_TIME parse time elapsed DB time parse pct OPT_ADAPT_FEAT -------------- ---------- ------------------ ----------- ---------- ---------- 23-OCT-15 03.00.36.569 AM 3235792 57030479 5.67 TRUE 23-OCT-15 03.30.38.712 AM 3438093 60262996 5.71 TRUE 23-OCT-15 04.00.40.709 AM 4622998 69813760 6.62 TRUE 23-OCT-15 04.30.42.776 AM 4590463 56441202 8.13 TRUE 23-OCT-15 05.00.44.735 AM 13772357 113741371 12.11 TRUE 23-OCT-15 05.30.46.722 AM 3448944 49807800 6.92 TRUE 23-OCT-15 06.00.48.664 AM 4792886 54235691 8.84 TRUE 23-OCT-15 06.30.50.713 AM 8527305 58775613 14.51 TRUE 23-OCT-15 07.00.52.667 AM 8518273 75248056 11.32 TRUE 23-OCT-15 07.30.54.622 AM 9800048 17381081 1.07 TRUE 23-OCT-15 08.00.56.609 AM 6986551 1629027583 .43 TRUE 23-OCT-15 08.30.58.568 AM 8414695 2493025822 .34 TRUE 23-OCT-15 09.00.00.457 AM 13648260 2412333113 .57 TRUE 23-OCT-15 09.30.02.384 AM 15186610 4635080356 .33 TRUE 23-OCT-15 10.00.04.298 AM 23465769 39080849 3.17 FALSE 23-OCT-15 10.30.06.421 AM 12152991 2654461964 .46 FALSE 23-OCT-15 11.00.08.444 AM 24901111 549936076 4.53 FALSE 23-OCT-15 11.30.10.485 AM 8080236 354568317 2.28 FALSE 23-OCT-15 12.00.12.453 PM 4291839 91028268 4.71 FALSE 23-OCT-15 12.30.14.430 PM 3675163 177312397 2.07 FALSE 23-OCT-15 01.00.16.468 PM 9184841 231138367 3.97 FALSE 23-OCT-15 01.30.18.438 PM 8132397 162607229 5 FALSE 23-OCT-15 02.00.20.707 PM 13375709 210251458 6.36 FALSE 23-OCT-15 02.30.23.740 PM 10116413 285114368 3.55 FALSE 23-OCT-15 03.00.25.699 PM 8067777 123864339 6.51 FALSE 23-OCT-15 03.30.27.641 PM 5787931 110621767 5.23 FALSE |
HTH
—
Ludo
Losing the Disk Group that contains OCR and voting files has always been a challenge. It requires you to take regular backups of OCR, spfile and diskgroup metadata.
Since Oracle 12cR1, there are a few additional components you must take care of:
– The ASM password file (if you have Flex ASM it can be quite critical)
– The Grid Infrastructure Management Repository
Why ASM password file is important? Well, you can read this good blog post form my colleague Robert Bialek: http://blog.trivadis.com/b/robertbialek/archive/2014/10/26/are-you-using-oracle-12c-flex-asm-if-yes-do-you-have-asm-password-file-backup.aspx
So the problem here, is not whether you should back them up or not, but how you can restore them quickly.
Assumptions: you back up regularly:
ASM parameter file:
1 2 3 |
SQL> create pfile='/backup/spfileASM.ora' from spfile; File created. |
Oracle Cluster Registry:
1 2 |
grid@tvdrach01:~/ [+ASM1] sudo $ORACLE_HOME/bin/ocrconfig -manualbackup tvdrach03 2015/09/21 14:30:39 /u01/app/grid/12.1.0.2/cdata/tvdrac-cluster/backup_20150921_143039.ocr 0 |
ASM Diskgroup Metadata:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
ASMCMD [+] > md_backup GRID.dg -G GRID Disk group metadata to be backed up: GRID Current alias directory path: _MGMTDB/DATAFILE Current alias directory path: _MGMTDB/FD9B43BF6A646F8CE043B6A9E80A2815/DATAFILE Current alias directory path: tvdrac-cluster Current alias directory path: _MGMTDB/FD9AC0F7C36E4438E043B6A9E80A24D5/DATAFILE Current alias directory path: _MGMTDB/FD9AC0F7C36E4438E043B6A9E80A24D5 Current alias directory path: ASM/PASSWORD Current alias directory path: _MGMTDB/TEMPFILE Current alias directory path: tvdrac-cluster/ASMPARAMETERFILE Current alias directory path: _MGMTDB/20BC39F0F36C18F4E0533358A8C058F7/TEMPFILE Current alias directory path: _MGMTDB/FD9B43BF6A646F8CE043B6A9E80A2815 Current alias directory path: _MGMTDB/20BC2691871B0B14E0533358A8C01AC6 Current alias directory path: _MGMTDB/ONLINELOG Current alias directory path: _MGMTDB Current alias directory path: ASM Current alias directory path: tvdrac-cluster/OCRFILE Current alias directory path: _MGMTDB/20BC39F0F36C18F4E0533358A8C058F7 Current alias directory path: _MGMTDB/20BC2691871B0B14E0533358A8C01AC6/TEMPFILE Current alias directory path: _MGMTDB/CONTROLFILE Current alias directory path: _MGMTDB/PARAMETERFILE |
ASM password file:
1 2 |
ASMCMD [+GRID] > pwcopy +GRID/orapwASM /backup/ copying +GRID/orapwASM -> /backup/orapwASM |
What about the GIMR?
According to the MOS Note: FAQ: 12c Grid Infrastructure Management Repository (GIMR) (Doc ID 1568402.1), there is no such need for the moment.
Weird, huh? The -MGMTDB itself contains for the moment just the Cluster Health Monitor repository, but expect to see its important increasing with the next versions of Oracle Grid Infrastructure.
If you REALLY want to back it up (even if not fundamental, it is not a bad idea, after all), you can do it.
The -MGMTDB is in noarchivelog by default. You need to either put it in archivelog mode (and set a recovery area, etc etc) or back it up while it is mounted.
Because the Cluster Health Monitor (ora.crf) depends on it, you have to stop it beforehand:
1 2 3 |
grid@tvdrach01:~/ [-MGMTDB] crsctl stop resource ora.crf -init CRS-2673: Attempting to stop 'ora.crf' on 'tvdrach01' CRS-2677: Stop of 'ora.crf' on 'tvdrach01' succeeded |
Then you can operate with -MGMTDB:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 |
grid@tvdrach01:~/ [-MGMTDB] srvctl stop mgmtdb -stopoption IMMEDIATE grid@tvdrach01:~/ [-MGMTDB] srvctl start mgmtdb -startoption MOUNT grid@tvdrach01:~/ [-MGMTDB] grid@tvdrach02:~/ [-MGMTDB] rman Recovery Manager: Release 12.1.0.2.0 - Production on Sun Sep 27 17:59:55 2015 Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved. RMAN> connect target / connected to target database: _MGMTDB (DBID=1095800268, not open) RMAN> backup as compressed backupset database format '+DATA'; Starting backup at 27-SEP-15 using target database control file instead of recovery catalog allocated channel: ORA_DISK_1 channel ORA_DISK_1: SID=24 device type=DISK channel ORA_DISK_1: starting compressed full datafile backup set channel ORA_DISK_1: specifying datafile(s) in backup set input datafile file number=00011 name=+GRID/_MGMTDB/FD9B43BF6A646F8CE043B6A9E80A2815/DATAFILE/sysmgmtdata.269.891526555 input datafile file number=00007 name=+GRID/_MGMTDB/FD9B43BF6A646F8CE043B6A9E80A2815/DATAFILE/system.270.891526555 input datafile file number=00008 name=+GRID/_MGMTDB/FD9B43BF6A646F8CE043B6A9E80A2815/DATAFILE/sysaux.271.891526555 input datafile file number=00010 name=+GRID/_MGMTDB/FD9B43BF6A646F8CE043B6A9E80A2815/DATAFILE/sysgridhomedata.272.891526555 input datafile file number=00012 name=+GRID/_MGMTDB/FD9B43BF6A646F8CE043B6A9E80A2815/DATAFILE/sysmgmtdatadb.273.891526555 input datafile file number=00009 name=+GRID/_MGMTDB/FD9B43BF6A646F8CE043B6A9E80A2815/DATAFILE/users.274.891526555 channel ORA_DISK_1: starting piece 1 at 27-SEP-15 channel ORA_DISK_1: finished piece 1 at 27-SEP-15 piece handle=+DATA/_MGMTDB/20BC39F0F36C18F4E0533358A8C058F7/BACKUPSET/2015_09_27/nnndf0_tag20150927t180016_0.256.891540019 tag=TAG20150927T180016 comment=NONE channel ORA_DISK_1: backup set complete, elapsed time: 00:00:25 channel ORA_DISK_1: starting compressed full datafile backup set channel ORA_DISK_1: specifying datafile(s) in backup set input datafile file number=00001 name=+GRID/_MGMTDB/DATAFILE/system.258.891526155 input datafile file number=00003 name=+GRID/_MGMTDB/DATAFILE/sysaux.257.891526135 input datafile file number=00004 name=+GRID/_MGMTDB/DATAFILE/undotbs1.259.891526181 channel ORA_DISK_1: starting piece 1 at 27-SEP-15 channel ORA_DISK_1: finished piece 1 at 27-SEP-15 piece handle=+DATA/_MGMTDB/BACKUPSET/2015_09_27/nnndf0_tag20150927t180016_0.257.891540043 tag=TAG20150927T180016 comment=NONE channel ORA_DISK_1: backup set complete, elapsed time: 00:00:25 channel ORA_DISK_1: starting compressed full datafile backup set channel ORA_DISK_1: specifying datafile(s) in backup set input datafile file number=00005 name=+GRID/_MGMTDB/FD9AC0F7C36E4438E043B6A9E80A24D5/DATAFILE/system.265.891526233 input datafile file number=00006 name=+GRID/_MGMTDB/FD9AC0F7C36E4438E043B6A9E80A24D5/DATAFILE/sysaux.266.891526233 channel ORA_DISK_1: starting piece 1 at 27-SEP-15 channel ORA_DISK_1: finished piece 1 at 27-SEP-15 piece handle=+DATA/_MGMTDB/20BC2691871B0B14E0533358A8C01AC6/BACKUPSET/2015_09_27/nnndf0_tag20150927t180016_0.258.891540069 tag=TAG20150927T180016 comment=NONE channel ORA_DISK_1: backup set complete, elapsed time: 00:00:15 Finished backup at 27-SEP-15 Starting Control File and SPFILE Autobackup at 27-SEP-15 piece handle=/u01/app/grid/12.1.0.2/dbs/c-1095800268-20150927-00 comment=NONE Finished Control File and SPFILE Autobackup at 27-SEP-15 RMAN> alter database open; Statement processed RMAN> |
Now, imagine that you loose the GRID diskgroup (nowadays, with the ASM Filter Driver, it’s more complex to corrupt a device by mistake, but let’s assume that you do it):
1 2 3 4 |
root@tvdrach01:~/ [-MGMTDB] dd if=/dev/zero of=/dev/asm-disk1 bs=1M count=128 128+0 records in 128+0 records out 134217728 bytes (134 MB) copied, 0.360653 s, 372 MB/s |
The cluster will not start anymore, you need to disable the crs, reboot and start it in exclusive mode:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
root@tvdrach01:~/ [-MGMTDB] crsctl start crs -excl -nocrs CRS-4123: Oracle High Availability Services has been started. CRS-2672: Attempting to start 'ora.evmd' on 'tvdrach01' CRS-2672: Attempting to start 'ora.mdnsd' on 'tvdrach01' CRS-2676: Start of 'ora.mdnsd' on 'tvdrach01' succeeded CRS-2676: Start of 'ora.evmd' on 'tvdrach01' succeeded CRS-2672: Attempting to start 'ora.gpnpd' on 'tvdrach01' CRS-2676: Start of 'ora.gpnpd' on 'tvdrach01' succeeded CRS-2672: Attempting to start 'ora.cssdmonitor' on 'tvdrach01' CRS-2672: Attempting to start 'ora.gipcd' on 'tvdrach01' CRS-2676: Start of 'ora.cssdmonitor' on 'tvdrach01' succeeded CRS-2676: Start of 'ora.gipcd' on 'tvdrach01' succeeded CRS-2672: Attempting to start 'ora.cssd' on 'tvdrach01' CRS-2672: Attempting to start 'ora.diskmon' on 'tvdrach01' CRS-2676: Start of 'ora.diskmon' on 'tvdrach01' succeeded CRS-2676: Start of 'ora.cssd' on 'tvdrach01' succeeded CRS-2672: Attempting to start 'ora.drivers.acfs' on 'tvdrach01' CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'tvdrach01' CRS-2672: Attempting to start 'ora.ctssd' on 'tvdrach01' CRS-2676: Start of 'ora.ctssd' on 'tvdrach01' succeeded CRS-2676: Start of 'ora.drivers.acfs' on 'tvdrach01' succeeded CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'tvdrach01' succeeded CRS-2672: Attempting to start 'ora.asm' on 'tvdrach01' CRS-2676: Start of 'ora.asm' on 'tvdrach01' succeeded root@tvdrach01:~/ [-MGMTDB] |
Then you can recreate the GRID disk group and restore everything inside it:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
SQL> alter system set asm_diskstring='/dev/asm*'; System altered. SQL> create diskgroup GRID external redundancy disk '/dev/asm-disk1' attribute 'COMPATIBLE.ADVM'='12.1.0.0.0', 'COMPATIBLE.ASM'='12.1.0.0.0'; Diskgroup created. SQL> create spfile='+GRID' from pfile='/backup/spfileASM.ora'; File created. SQL> root@tvdrach01:~/ [+ASM1] ocrconfig -restore /u01/app/grid/12.1.0.2/cdata/tvdrac-cluster/backup_20150927_174702.ocr root@tvdrach01:~/ [+ASM1] grid@tvdrach01:~/ [+ASM1] crsctl replace votedisk '+GRID' Successful addition of voting disk a375f4bdb7854f8fbf7a92cd880fba60. Successfully replaced voting disk group with +GRID. CRS-4266: Voting file(s) successfully replaced root@tvdrach01:~/ [+ASM1] crsctl stop crs -f ... root@tvdrach01:~/ [+ASM1] crsctl start crs ... ASMCMD [+] > pwcopy --asm /backup/orapwASM +GRID/orapwASM copying /backup/orapwASM -> +GRID/orapwASM |
Finally, the last missing component: the GIMR.
You can recreate it or restore it (if you backed it up at some point in time).
Let’s see how to recreate it:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 |
grid@tvdrach03:~/ [-MGMTDB] srvctl disable mgmtdb grid@tvdrach03:~/ [-MGMTDB] srvctl remove mgmtdb Remove the database _mgmtdb? (y/[n]) y grid@tvdrach01:~/ [+ASM1] dbca -silent -createDatabase -sid -MGMTDB \ > -createAsContainerDatabase true -templateName MGMTSeed_Database.dbc \ > -gdbName _mgmtdb -storageType ASM -diskGroupName +GRID \ > -datafileJarLocation $ORACLE_HOME/assistants/dbca/templates -characterset AL32UTF8 \ > -autoGeneratePasswords -skipUserTemplateCheck Cleaning up failed steps 5% complete Registering database with Oracle Grid Infrastructure 11% complete Copying database files 12% complete 14% complete 21% complete 27% complete 34% complete 41% complete 44% complete Creating and starting Oracle instance 46% complete 51% complete 52% complete 53% complete 58% complete 62% complete 63% complete 66% complete Completing Database Creation 70% complete 80% complete 90% complete 100% complete Look at the log file "/u01/app/oracle/cfgtoollogs/dbca/_mgmtdb/_mgmtdb0.log" for further details. grid@tvdrach01:~/ [+ASM1] dbca -silent -createPluggableDatabase -sourceDB -MGMTDB \ > -pdbName tvdrac_cluster -createPDBFrom RMANBACKUP \ > -PDBBackUpfile $ORACLE_HOME/assistants/dbca/templates/mgmtseed_pdb.dfb \ > -PDBMetadataFile $ORACLE_HOME/assistants/dbca/templates/mgmtseed_pdb.xml \ > -createAsClone true -internalSkipGIHomeCheck Creating Pluggable Database Creating Pluggable Database 4% complete 12% complete 21% complete 38% complete 55% complete O-GRINF Grid Infrastructure Disaster Recovery Page 21 85% complete Completing Pluggable Database Creation 100% complete Look at the log file "/u01/app/oracle/cfgtoollogs/dbca/_mgmtdb/tvdrac_cluster/_mgmtdb.log" for further details. grid@tvdrach01:~/ [+ASM1] srvctl status mgmtdb Database is enabled Instance -MGMTDB is running on node tvdrach01 grid@tvdrach01:~/ [+ASM1] sudo $ORACLE_HOME/bin/crsctl modify res ora.crf -attr ENABLED=1 -init grid@tvdrach01:~/ [+ASM1] crsctl start res ora.crf -init CRS-2672: Attempting to start 'ora.crf' on 'tvdrach01' CRS-2676: Start of 'ora.crf' on 'tvdrach01' succeeded grid@tvdrach01:~/ [+ASM1] |
Conclusion
Recovering from a lost Disk Group / Cluster is not rocket science. Just practice it every now and then. If you do not have a test RAC, you can build your lab on your laptop using the RAC Attack instructions. If you want to test all the scenarios, the RAC SIG webcast: Oracle 11g Clusterware failure scenarios with practical demonstrations by Kamran Agayev is the best starting point, IMHO. Just keep in mind that Flex ASM and the GIMR add more complexity.
HTH
—
Ludovico
Last week I have hosted the second Swiss RAC Attack workshop at Trivadis offices in Geneva. It has been a great success, with 21 total participants: 5 Ninjas, 4 alumni and 14 people actively installing or playing with RAC 12c on their laptops.
Last year I was suprised by a participant coming fron Nanterre. This year two people came directly from Moscow, just for the workshop!
We’ve got good pizza and special beer: Chimay , Vedett, Duvel, Andechs…
Last but not least, our friend Marc Fielding was visiting Switzerland last week, so he took the opportunity to join us and make the workshop even more interesting! 😀
Looking forward to organize it again in one year! Thank you guys 🙂
—
Ludovico
When I started my contribution to the Oracle community, I was doing it for two reasons. The noble: Give back to the community what I have learnt from it. The narcissist: Try to be as good as my favourite bloggers and get fame and prestige.
What I was not expecting, is that now I am getting more from the community than what I was getting before starting my contribution. Not only at a technical level, but also in terms of friends, travels, network.
Earlier this summer, I have been invited, along with Franck Pachot, to present at the Paris Oracle Meetup. We have been there last Friday (4 Sept 2015), and it has been mind blowing.
http://www.meetup.com/parisoracle/events/224181879/
I’m excited about it because this meetup started only a couple of years ago, and it has already had speakers like Jonathan Lewis and Tom Kyte. But what surprises more, is the meetup has a near-zero budget.
The legend says that Gregory Gouillou and other french guys met Christian Antognini at the Open World (or was it UKOUG Tech?), discussed about the lack of active OUGs in France and then Chris said: “If you want it, make it happen”.
The Paris Oracle Meetup now is a reality, and I am proud of having been part of it.
Thank you @RBELHADJ, @Ycolin, @GregoryGuillou, @ParisOracle ! 🙂
Here is @ludodba SLOBing ASM vs. ACFS at @ParisOracle pic.twitter.com/YbnpJ4fSGQ
— Franck Pachot (@FranckPachot) September 4, 2015