Losing the Disk Group that contains OCR and voting files has always been a challenge. It requires you to take regular backups of OCR, spfile and diskgroup metadata.
Since Oracle 12cR1, there are a few additional components you must take care of:
– The ASM password file (if you have Flex ASM it can be quite critical)
– The Grid Infrastructure Management Repository
Why ASM password file is important? Well, you can read this good blog post form my colleague Robert Bialek: http://blog.trivadis.com/b/robertbialek/archive/2014/10/26/are-you-using-oracle-12c-flex-asm-if-yes-do-you-have-asm-password-file-backup.aspx
So the problem here, is not whether you should back them up or not, but how you can restore them quickly.
Assumptions: you back up regularly:
ASM parameter file:
1 2 3 |
SQL> create pfile='/backup/spfileASM.ora' from spfile; File created. |
Oracle Cluster Registry:
1 2 |
grid@tvdrach01:~/ [+ASM1] sudo $ORACLE_HOME/bin/ocrconfig -manualbackup tvdrach03 2015/09/21 14:30:39 /u01/app/grid/12.1.0.2/cdata/tvdrac-cluster/backup_20150921_143039.ocr 0 |
ASM Diskgroup Metadata:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
ASMCMD [+] > md_backup GRID.dg -G GRID Disk group metadata to be backed up: GRID Current alias directory path: _MGMTDB/DATAFILE Current alias directory path: _MGMTDB/FD9B43BF6A646F8CE043B6A9E80A2815/DATAFILE Current alias directory path: tvdrac-cluster Current alias directory path: _MGMTDB/FD9AC0F7C36E4438E043B6A9E80A24D5/DATAFILE Current alias directory path: _MGMTDB/FD9AC0F7C36E4438E043B6A9E80A24D5 Current alias directory path: ASM/PASSWORD Current alias directory path: _MGMTDB/TEMPFILE Current alias directory path: tvdrac-cluster/ASMPARAMETERFILE Current alias directory path: _MGMTDB/20BC39F0F36C18F4E0533358A8C058F7/TEMPFILE Current alias directory path: _MGMTDB/FD9B43BF6A646F8CE043B6A9E80A2815 Current alias directory path: _MGMTDB/20BC2691871B0B14E0533358A8C01AC6 Current alias directory path: _MGMTDB/ONLINELOG Current alias directory path: _MGMTDB Current alias directory path: ASM Current alias directory path: tvdrac-cluster/OCRFILE Current alias directory path: _MGMTDB/20BC39F0F36C18F4E0533358A8C058F7 Current alias directory path: _MGMTDB/20BC2691871B0B14E0533358A8C01AC6/TEMPFILE Current alias directory path: _MGMTDB/CONTROLFILE Current alias directory path: _MGMTDB/PARAMETERFILE |
ASM password file:
1 2 |
ASMCMD [+GRID] > pwcopy +GRID/orapwASM /backup/ copying +GRID/orapwASM -> /backup/orapwASM |
What about the GIMR?
According to the MOS Note: FAQ: 12c Grid Infrastructure Management Repository (GIMR) (Doc ID 1568402.1), there is no such need for the moment.
Weird, huh? The -MGMTDB itself contains for the moment just the Cluster Health Monitor repository, but expect to see its important increasing with the next versions of Oracle Grid Infrastructure.
If you REALLY want to back it up (even if not fundamental, it is not a bad idea, after all), you can do it.
The -MGMTDB is in noarchivelog by default. You need to either put it in archivelog mode (and set a recovery area, etc etc) or back it up while it is mounted.
Because the Cluster Health Monitor (ora.crf) depends on it, you have to stop it beforehand:
1 2 3 |
grid@tvdrach01:~/ [-MGMTDB] crsctl stop resource ora.crf -init CRS-2673: Attempting to stop 'ora.crf' on 'tvdrach01' CRS-2677: Stop of 'ora.crf' on 'tvdrach01' succeeded |
Then you can operate with -MGMTDB:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 |
grid@tvdrach01:~/ [-MGMTDB] srvctl stop mgmtdb -stopoption IMMEDIATE grid@tvdrach01:~/ [-MGMTDB] srvctl start mgmtdb -startoption MOUNT grid@tvdrach01:~/ [-MGMTDB] grid@tvdrach02:~/ [-MGMTDB] rman Recovery Manager: Release 12.1.0.2.0 - Production on Sun Sep 27 17:59:55 2015 Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved. RMAN> connect target / connected to target database: _MGMTDB (DBID=1095800268, not open) RMAN> backup as compressed backupset database format '+DATA'; Starting backup at 27-SEP-15 using target database control file instead of recovery catalog allocated channel: ORA_DISK_1 channel ORA_DISK_1: SID=24 device type=DISK channel ORA_DISK_1: starting compressed full datafile backup set channel ORA_DISK_1: specifying datafile(s) in backup set input datafile file number=00011 name=+GRID/_MGMTDB/FD9B43BF6A646F8CE043B6A9E80A2815/DATAFILE/sysmgmtdata.269.891526555 input datafile file number=00007 name=+GRID/_MGMTDB/FD9B43BF6A646F8CE043B6A9E80A2815/DATAFILE/system.270.891526555 input datafile file number=00008 name=+GRID/_MGMTDB/FD9B43BF6A646F8CE043B6A9E80A2815/DATAFILE/sysaux.271.891526555 input datafile file number=00010 name=+GRID/_MGMTDB/FD9B43BF6A646F8CE043B6A9E80A2815/DATAFILE/sysgridhomedata.272.891526555 input datafile file number=00012 name=+GRID/_MGMTDB/FD9B43BF6A646F8CE043B6A9E80A2815/DATAFILE/sysmgmtdatadb.273.891526555 input datafile file number=00009 name=+GRID/_MGMTDB/FD9B43BF6A646F8CE043B6A9E80A2815/DATAFILE/users.274.891526555 channel ORA_DISK_1: starting piece 1 at 27-SEP-15 channel ORA_DISK_1: finished piece 1 at 27-SEP-15 piece handle=+DATA/_MGMTDB/20BC39F0F36C18F4E0533358A8C058F7/BACKUPSET/2015_09_27/nnndf0_tag20150927t180016_0.256.891540019 tag=TAG20150927T180016 comment=NONE channel ORA_DISK_1: backup set complete, elapsed time: 00:00:25 channel ORA_DISK_1: starting compressed full datafile backup set channel ORA_DISK_1: specifying datafile(s) in backup set input datafile file number=00001 name=+GRID/_MGMTDB/DATAFILE/system.258.891526155 input datafile file number=00003 name=+GRID/_MGMTDB/DATAFILE/sysaux.257.891526135 input datafile file number=00004 name=+GRID/_MGMTDB/DATAFILE/undotbs1.259.891526181 channel ORA_DISK_1: starting piece 1 at 27-SEP-15 channel ORA_DISK_1: finished piece 1 at 27-SEP-15 piece handle=+DATA/_MGMTDB/BACKUPSET/2015_09_27/nnndf0_tag20150927t180016_0.257.891540043 tag=TAG20150927T180016 comment=NONE channel ORA_DISK_1: backup set complete, elapsed time: 00:00:25 channel ORA_DISK_1: starting compressed full datafile backup set channel ORA_DISK_1: specifying datafile(s) in backup set input datafile file number=00005 name=+GRID/_MGMTDB/FD9AC0F7C36E4438E043B6A9E80A24D5/DATAFILE/system.265.891526233 input datafile file number=00006 name=+GRID/_MGMTDB/FD9AC0F7C36E4438E043B6A9E80A24D5/DATAFILE/sysaux.266.891526233 channel ORA_DISK_1: starting piece 1 at 27-SEP-15 channel ORA_DISK_1: finished piece 1 at 27-SEP-15 piece handle=+DATA/_MGMTDB/20BC2691871B0B14E0533358A8C01AC6/BACKUPSET/2015_09_27/nnndf0_tag20150927t180016_0.258.891540069 tag=TAG20150927T180016 comment=NONE channel ORA_DISK_1: backup set complete, elapsed time: 00:00:15 Finished backup at 27-SEP-15 Starting Control File and SPFILE Autobackup at 27-SEP-15 piece handle=/u01/app/grid/12.1.0.2/dbs/c-1095800268-20150927-00 comment=NONE Finished Control File and SPFILE Autobackup at 27-SEP-15 RMAN> alter database open; Statement processed RMAN> |
Now, imagine that you loose the GRID diskgroup (nowadays, with the ASM Filter Driver, it’s more complex to corrupt a device by mistake, but let’s assume that you do it):
1 2 3 4 |
root@tvdrach01:~/ [-MGMTDB] dd if=/dev/zero of=/dev/asm-disk1 bs=1M count=128 128+0 records in 128+0 records out 134217728 bytes (134 MB) copied, 0.360653 s, 372 MB/s |
The cluster will not start anymore, you need to disable the crs, reboot and start it in exclusive mode:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
root@tvdrach01:~/ [-MGMTDB] crsctl start crs -excl -nocrs CRS-4123: Oracle High Availability Services has been started. CRS-2672: Attempting to start 'ora.evmd' on 'tvdrach01' CRS-2672: Attempting to start 'ora.mdnsd' on 'tvdrach01' CRS-2676: Start of 'ora.mdnsd' on 'tvdrach01' succeeded CRS-2676: Start of 'ora.evmd' on 'tvdrach01' succeeded CRS-2672: Attempting to start 'ora.gpnpd' on 'tvdrach01' CRS-2676: Start of 'ora.gpnpd' on 'tvdrach01' succeeded CRS-2672: Attempting to start 'ora.cssdmonitor' on 'tvdrach01' CRS-2672: Attempting to start 'ora.gipcd' on 'tvdrach01' CRS-2676: Start of 'ora.cssdmonitor' on 'tvdrach01' succeeded CRS-2676: Start of 'ora.gipcd' on 'tvdrach01' succeeded CRS-2672: Attempting to start 'ora.cssd' on 'tvdrach01' CRS-2672: Attempting to start 'ora.diskmon' on 'tvdrach01' CRS-2676: Start of 'ora.diskmon' on 'tvdrach01' succeeded CRS-2676: Start of 'ora.cssd' on 'tvdrach01' succeeded CRS-2672: Attempting to start 'ora.drivers.acfs' on 'tvdrach01' CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'tvdrach01' CRS-2672: Attempting to start 'ora.ctssd' on 'tvdrach01' CRS-2676: Start of 'ora.ctssd' on 'tvdrach01' succeeded CRS-2676: Start of 'ora.drivers.acfs' on 'tvdrach01' succeeded CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'tvdrach01' succeeded CRS-2672: Attempting to start 'ora.asm' on 'tvdrach01' CRS-2676: Start of 'ora.asm' on 'tvdrach01' succeeded root@tvdrach01:~/ [-MGMTDB] |
Then you can recreate the GRID disk group and restore everything inside it:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
SQL> alter system set asm_diskstring='/dev/asm*'; System altered. SQL> create diskgroup GRID external redundancy disk '/dev/asm-disk1' attribute 'COMPATIBLE.ADVM'='12.1.0.0.0', 'COMPATIBLE.ASM'='12.1.0.0.0'; Diskgroup created. SQL> create spfile='+GRID' from pfile='/backup/spfileASM.ora'; File created. SQL> root@tvdrach01:~/ [+ASM1] ocrconfig -restore /u01/app/grid/12.1.0.2/cdata/tvdrac-cluster/backup_20150927_174702.ocr root@tvdrach01:~/ [+ASM1] grid@tvdrach01:~/ [+ASM1] crsctl replace votedisk '+GRID' Successful addition of voting disk a375f4bdb7854f8fbf7a92cd880fba60. Successfully replaced voting disk group with +GRID. CRS-4266: Voting file(s) successfully replaced root@tvdrach01:~/ [+ASM1] crsctl stop crs -f ... root@tvdrach01:~/ [+ASM1] crsctl start crs ... ASMCMD [+] > pwcopy --asm /backup/orapwASM +GRID/orapwASM copying /backup/orapwASM -> +GRID/orapwASM |
Finally, the last missing component: the GIMR.
You can recreate it or restore it (if you backed it up at some point in time).
Let’s see how to recreate it:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 |
grid@tvdrach03:~/ [-MGMTDB] srvctl disable mgmtdb grid@tvdrach03:~/ [-MGMTDB] srvctl remove mgmtdb Remove the database _mgmtdb? (y/[n]) y grid@tvdrach01:~/ [+ASM1] dbca -silent -createDatabase -sid -MGMTDB \ > -createAsContainerDatabase true -templateName MGMTSeed_Database.dbc \ > -gdbName _mgmtdb -storageType ASM -diskGroupName +GRID \ > -datafileJarLocation $ORACLE_HOME/assistants/dbca/templates -characterset AL32UTF8 \ > -autoGeneratePasswords -skipUserTemplateCheck Cleaning up failed steps 5% complete Registering database with Oracle Grid Infrastructure 11% complete Copying database files 12% complete 14% complete 21% complete 27% complete 34% complete 41% complete 44% complete Creating and starting Oracle instance 46% complete 51% complete 52% complete 53% complete 58% complete 62% complete 63% complete 66% complete Completing Database Creation 70% complete 80% complete 90% complete 100% complete Look at the log file "/u01/app/oracle/cfgtoollogs/dbca/_mgmtdb/_mgmtdb0.log" for further details. grid@tvdrach01:~/ [+ASM1] dbca -silent -createPluggableDatabase -sourceDB -MGMTDB \ > -pdbName tvdrac_cluster -createPDBFrom RMANBACKUP \ > -PDBBackUpfile $ORACLE_HOME/assistants/dbca/templates/mgmtseed_pdb.dfb \ > -PDBMetadataFile $ORACLE_HOME/assistants/dbca/templates/mgmtseed_pdb.xml \ > -createAsClone true -internalSkipGIHomeCheck Creating Pluggable Database Creating Pluggable Database 4% complete 12% complete 21% complete 38% complete 55% complete O-GRINF Grid Infrastructure Disaster Recovery Page 21 85% complete Completing Pluggable Database Creation 100% complete Look at the log file "/u01/app/oracle/cfgtoollogs/dbca/_mgmtdb/tvdrac_cluster/_mgmtdb.log" for further details. grid@tvdrach01:~/ [+ASM1] srvctl status mgmtdb Database is enabled Instance -MGMTDB is running on node tvdrach01 grid@tvdrach01:~/ [+ASM1] sudo $ORACLE_HOME/bin/crsctl modify res ora.crf -attr ENABLED=1 -init grid@tvdrach01:~/ [+ASM1] crsctl start res ora.crf -init CRS-2672: Attempting to start 'ora.crf' on 'tvdrach01' CRS-2676: Start of 'ora.crf' on 'tvdrach01' succeeded grid@tvdrach01:~/ [+ASM1] |
Conclusion
Recovering from a lost Disk Group / Cluster is not rocket science. Just practice it every now and then. If you do not have a test RAC, you can build your lab on your laptop using the RAC Attack instructions. If you want to test all the scenarios, the RAC SIG webcast: Oracle 11g Clusterware failure scenarios with practical demonstrations by Kamran Agayev is the best starting point, IMHO. Just keep in mind that Flex ASM and the GIMR add more complexity.
HTH
—
Ludovico
Latest posts by Ludovico (see all)
- New views in Oracle Data Guard 23c - January 3, 2024
- New in Data Guard 21c and 23c: Automatic preparation of the primary - December 22, 2023
- Does FLASHBACK QUERY work across incarnations or after a Data Guard failover? - December 13, 2023
You Rock man! 🙂
Ciao, non ci ho capito molto ma tanto era per dirti ciao!
Pingback: Grid Infrastructure 12c: Recovering the GRID Disk Group and recreating the GIMR - Ludovico Caldara - Blogs - triBLOG