In my config, the Oracle Database version is 19.7 and the databases are actually CDBs. No Grid Infrastructure, non-OMF datafiles.
It is important to highlight that a lot of things have changed since 12.1. And because 19c is the LTS version now, it does not make sense to try anything older.
First, I just want to make sure that my standbys are aligned.
“Specify ALL to include the new PDB in all standby CDBs. This is the default.”
“Specify NONE to exclude the new PDB from all standby CDBs. When a PDB is excluded from all standby CDBs, the PDB’s data files are unnamed and marked offline on all of the standby CDBs. Standby recovery will not stop if the data files for the PDB are not found on the standby. […]”
So, in order to avoid the MRP to crash, I should have included STANDBYS=NONE
But the documentation is not up to date, because in my case the PDB is skipped automatically and the recovery process DOES NOT STOP:
However, the recovery is marked ENABLED for the PDB on the standby, while usind STANDBYS=NONE it would have been DISABLED.
So, another difference with the doc who states: “You can enable a PDB on a standby CDB after it was excluded on that standby CDB by copying the data files to the correct location, bringing the PDB online, and marking it as enabled for recovery.”
This reflects the findings of Philippe Fierens in his blog (http://pfierens.blogspot.com/2020/04/19c-data-guard-series-part-iii-adding.html).
This behavior has been introduced probably between 12.2 and 19c, but I could not manage to find exactly when, as it is not explicitly stated in the documentation.
However, I remember well that in 220.127.116.11, the MRP process was crashing.
In my configuration, not on purpose, but interesting for this article, the first standby has the very same directory structure, while the cascaded standby has not.
In any case, there is a potentially big problem for all the customers implementing Multitenant on Data Guard:
With the old behaviour (MRP crashing), it was easy to spot when a PDB was cloned online into a primary database, because a simple dgmgrl “show configuration” whould have displayed a warning because of the increasing lag (following the MRP crash).
With the current behavior, the MRP keeps recovering and the “show configuration” displays “SUCCESS” despite there is a PDB not copied on the standby (thus not protected).
The missing PDB is easy to spot once I know that I have to do it. However, for each PDB to recover (I might have many!), I have to prepare the rename of datafiles and creation of directory (do not forget I am using non-OMF here).
Now, the datafile names on the standby got changed to …/UNNAMEDnnnnn.
So I have to get the original ones from the primary database and do the same replace that db_file_name_convert would do:
select'set newname for datafile '||file#||' to '''||replace(name,'/TOOLCDB1/','/TOOLCDX1/')||''';'asrename_filefromv$datafilewherecon_id=6;
and put this in a rman script (this will be for the second standby, the first has the same name so same PATH):
Then, I need to stop the recovery, start it and stopping again, put the datafiles online and finally restart the recover.
These are the same steps used my Philippe in his blog post, just adapted to my taste 🙂
Now, I do not have anymore any datafiles offline on the standby:
SQL>select'ERROR: CON_ID '||con_id||' has '||count(*)||' datafiles offline!'fromv$recover_filewhereonline_status='OFFLINE'groupbycon_id;
I will not publish the steps for the second standby, they are exactly the same (same output as well).
At the end, for me it is important to highlight that monitoring the OFFLINE datafiles on the standby becomes a crucial point to guarantee the health of Data Guard in Multitenant. Relying on the Broker status or “PDB recovery disabled” is not enough.
On the bright side, it is nice to see that Cascade Standby configurations do not introduce any variation, so cascaded standbys can be threated the same as “direct” standby databases.
Not at all! The tables are still there and reference to tablespaces that do not exist anymore. Possible?
Actually, the tablespaces definition are still there if we look at v$tablespace:
If we give a look at the DBA_TABLESPACES view definition, there are a few interesting filters:
What is their meaning?
online$/* status (see KTT.H): */
/* 1 = ONLINE, 2 = OFFLINE, 3 = INVALID */
flags/* various flags: see ktt3.h */
/* 0x800 = this actually describes a group */-- 2048
/* 0x1000000 = bad transport of ts in pdb */-- 16777216
So the first WHERE clause skips all the INVALID TABLESPACES (when you drop a tablespace it is still stored in ts$ with this state), the second skips the definition of TEMPORARY TABLESPACE GROUPS, the third one is actually our candidate.
Indeed, this is what we get from ts$ for these tablespaces:
dbms_output.put_line('Cannot drop default tablespace '||r_tbsps.name||'. Please change the default tablespace and drop it afterwards.');
This is the output for the clone procedure we have just seen:
The PL/SQL block can be quite slow depending on the segments and tablespaces, so it might be a good idea to have a custom script instead of this automated one.
What about user DEF?
The automated procedure has not locked the account DEF. Why?
Actually, the user DEF still has some segments in the DATA tablespace. Hence, the procedure cannot be sure what was the original intention: copy the user ABC ? The clone procedure allows only to specify the tablespaces, so this is the only possible result.
There are case, however, where you cannot do it. For example, it the existing PDB should have been the clone, or if you are converting a copy of the same database from Non-CDB to PDB using autoupgrade (with autoupgrade you cannot modify the CREATE PLUGGABLE DATABASE statement).
In this case the solution might be to change the DBID of the existing PDB, via unplug/plug:
Ok, if you’re reading this post, you may want to read also the previous one that explains something more about the problem.
Briefly said, if you have a CDB running on ASM in a MAA architecture and you do not have Active Data Guard, when you clone a PDB you have to “copy” the datafiles somehow on the standby. The only solution offered by Oracle (in a MOS Note, not in the documentation) is to restore the PDB from the primary to the standby site, thus transferring it over the network. But if you have a huge PDB this is a bad solution because it impacts your network connectivity. (Note: ending up with a huge PDB IMHO can only be caused by bad consolidation. I do not recommend to consolidate huge databases on Multitenant).
So I’ve worked out another solution, that still has many defects and is almost not viable, but it’s technically interesting because it permits to discover a little more about Multitenant and Data Guard.
The three options
At the primary site, the process is always the same: Oracle copies the datafiles of the source, and it modifies the headers so that they can be used by the new PDB (so it changes CON_ID, DBID, FILE#, and so on).
On the standby site, by opposite, it changes depending on the option you choose:
Option 1: Active Data Guard
If you have ADG, the ADG itself will take care of copying the datafile on the standby site, from the source standby pdb to the destination standby pdb. Once the copy is done, the MRP0 will continue the recovery. The modification of the header block of the destination PDB is done by the MRP0 immediately after the copy (at least this is what I understand).
Option 2: No Active Data Guard, but STANDBYS=none
In this case, the copy on the standby site doesn’t happen, and the recovery process just add the entry of the new datafiles in the controlfile, with status OFFLINE and name UNKNOWNxxx. However, the source file cannot be copied anymore, because the MRP0 process will expect to have a copy of the destination datafile, not the source datafile. Also, any tentative of restore of the datafile 28 (in this example) will give an error because it does not belong to the destination PDB. So the only chance is to restore the destination PDB from the primary.
Option 3: No Active Data Guard, no STANDBYS=none
This is the case that I want to explain actually. Without the flag STANDBYS=none, the MRP0 process will expect to change the header of the new datafile, but because the file does not exist yet, the recovery process dies.
We can then copy it manually from the source standby pdb, and restart the recovery process, that will change the header. This process needs to be repeated for each datafile. (that’s why it’s not a viable solution, right now).
We need to fix the datafiles one by one, but most of the steps can be done once for all the datafiles.
Copy the source PDB from the standby
What do we need to do? Well, the recovery process is stopped, so we can safely copy the datafiles of the source PDB from the standby site because they have not moved yet. (meanwhile, we can put the primary source PDB back in read-write mode).
Now there’s the interesting part: we need to assign the datafile copies of the maaz PDB to LUDO.
Sadly, the OMF will create the copies on the bad location (it’s a copy, to they are created on the same location as the source PDB).
We cannot try to uncatalog and recatalog the copies, because they will ALWAYS be affected to the source PDB. Neither we can use RMAN because it will never associate the datafile copies to the new PDB. We need to rename the files manually.
In my #OOW14 presentation about MAA and Multitenant, more precisely at slide #59, “PDB Creation from other PDB without ADG*”, I list a few commands that you can use to achieve a “correct” Pluggable Database clone in case you’re not using Active Data Guard.
What’s the problem with cloning a PDB in a MAA environment without ADG? If you’ve attended my session you should know the answer…
If you plan to create a PDB as a clone from a different PDB, then copy the data files that belong to the source PDB over to the standby database. (This step is not necessary in an Active Data Guard environment because the data files are copied automatically when the PDB is created on the standby database.)
But because there are good possibilities (99%?) that in a MAA environment you’re using ASM, this step is not so simple: you cannot copy the datafiles exactly where you want, it’s OMF, and the recovery process expects the files to be where the controlfile says they should be.
So, if you clone the PDB, the recovery process on the standby doesn’t find the datafiles at the correct location, thus the recovery process will stop and will not start until you fix manually. That’s why Oracle has implemented the new syntax “STANDBYS=NONE” that disables the recovery on the standby for a specific PDB: it lets you disable the recovery temporarily while the recovery process continues to apply logs on the remaining PDBs. (Note, however, that this feature is not intended as a generic solution for having PDBs not replicated. The recommended solution in this case is having two distinct CDBs, one protected by DG, the other not).
With ADG, when you clone the PDB on the primary, on the standby the ADG takes care of the following steps, no matter if on ASM or FS:
recover up to the point where the file# is registered in the controlfile
copy the datafiles from the source DB ON THE STANDBY DATABASE (so no copy over the network)
rename the datafile in the controlfile
continue with the recovery
If you don’t have ADG, and you’re on ASM, Oracle documentation says nothing with enough detail to let you solve the problem. So in August I’ve worked out the “easy” solution that I’ve also included in my slides (#59 and #60):
SQL> create pluggable database DEST from SRC standbys=none;
RMAN> backup as copy pluggable database DEST format ‘/tmp/dest%f.dbf’;
$ scp /tmp/dest*.dbf remote:/tmp
RMAN> catalog start with ‘/tmp/dest’
RMAN> set newnamefor pluggable database DEST to new;
RMAN> restore pluggable database DEST;
RMAN> switch pluggable database DEST to copy;
DGMGRL> edit database ‘STBY’ set state=’APPLY-OFF’;
SQL> Alter pluggable database DEST enable recovery;
DGMGRL> edit database ‘STBY’ set state=’APPLY-ON’;
Once at #OOW14, after endless conversations at the Demo Grounds, I’ve discovered that Oracle has worked out the very same solution requiring network transfer and that it has been documented in a new note.
Making Use of the STANDBYS=NONE Feature with Oracle Multitenant (Doc ID 1916648.1)
This note is very informative and I recommend to read it carefully!
What changes (better) in comparison with my first solution, is that Oracle suggests to use the new feature “restore from service”:
I’ve questioned the developers at the Demo Grounds about the necessity to use network transfer (I had the chance to speak directly with the developer who has written this piece of code!! :-)) and they said that they had worked out only this solution. So, if you have a huge PDB to clone, the network transfer from the primary to standby may impact severely your Data Guard environment and/or your whole infrastructure, for the time of the transfer.
Of course, I have a complex, undocumented solution, I hope I will find the time to document it, so stay tuned if you’re curious! 🙂
DISCLAIMER: I’ve got this information by chatting with Oracle developers at the Demo Grounds. The functionality is not documented yet and Oracle may change it at its sole discretion. Please refer to the documentation if/when it will be updated 😉
Here you can find the material related to my session at Oracle Open World 2014. I’m sorry I’m late in publishing them, but I challenge you to find spare time during Oracle Open World! It’s the busiest week of the year! (Hard Work, Hard Play)
select inst_id,con_id,name,open_mode from gv\$pdbswhere con_id!=2order by con_id,inst_id;
pause"please connect to the RW service"
pause"next... dgmgrl status and validate"
echos"Validate Standby database"
pause"next... switchover to CDBGVA"
echos"Switchover to CDBGVA! (it takes a while)"
There’s one slide describing the procedure for cloning one PDB using the standbys clause. Oracle has released a Note while I was preparing my slides (one month ago) and I wasn’t aware of it, so you may also checkout this note on MOS:
Making Use of the STANDBYS=NONE Feature with Oracle Multitenant (Doc ID 1916648.1)
But, on the standby database, the PDB somehow was existing.
I’ve tried to play a little, and finally decided to disable the recovery for the PDB (new in 18.104.22.168).
But to disable the recovery I was needing to connect to the PDB, but the PDB was somehow “inexistent”: