How to fix CPU usage problem in 12c due to DBMS_FEATURE_AWR

I love my job because I always have suprises. This week’s surprise has been another problem related to SQL Plan Directives in 12c. Because it is a common problem that potentially affects ALL the customers, I am glad to share the solution on my blog ūüėÄ

Symptom of the problem: High CPU usage on the server

My customer’s DBA team has spotted a consistent high CPU utilisation on its servers:

spd_awr_high_cpu_sar

Everyday, at the same time, and for 20-40 minutes, the servers hosting the Oracle databases run literally out of CPU.

spd_awr_high_cpu_em

 

Troubleshooting

Ok, it would be too easy to give the solution now. If you cannot wait, jump at the end of this post. But what I like more is to explain how I came to it.

First, I gave a look at the processes consuming CPU. Most of the servers have many consolidated databases on them. Surprisingly, this is what I have found:

spd_awr_high_cpu_m001It seems that the source of the problem is not a single database, but all of them. Isn’t it? And I see another pattern here: the CPU usage comes always from the [m001] process, so it is not related to a user process.

My customer has Diagnostic Pack so it is easy to go deeper, but you can get the same result with other free tools like s-ash, statspack and snapper. However, this is what I have found in the Instance Top Activity:

spd_awr_high_cpu_instOk, everything comes from a single query with sql_id auyf8px9ywc6j. This is the full sql_text:

(SELECT /*+ FULL(ST) */ SN.DBID ,SN.INSTANCE_NUMBER ,SN.STARTUP_TIME ,ST.STAT_ID ,ST.STAT_NAME ,MIN(SN.SNAP_ID) AS MIN_SNAP ,MAX(SN.SNAP_ID) AS MAX_SNAP ,MIN(CAST(BEGIN_INTERVAL_TIME AS DATE)) AS MIN_DATE ,MAX(CAST(END_INTERVAL_TIME AS DATE)) AS MAX_DATE
FROM DBA_HIST_SNAPSHOT SN ,WRH$_STAT_NAME ST
WHERE SN.BEGIN_INTERVAL_TIME > TRUNC(SYSDATE) - 7 AND SN.END_INTERVAL_TIME < TRUNC(SYSDATE) AND SN.DBID = ST.DBID AND ST.STAT_NAME IN ('DB time', 'DB CPU') GROUP BY SN.DBID,SN.INSTANCE_NUMBER,SN.STARTUP_TIME,ST.STAT_ID,ST.STAT_NAME ) ,DELTA_DATA AS
(SELECT SR.DBID ,SR.INSTANCE_NUMBER ,SR.STAT_NAME ,CASE WHEN SR.STARTUP_TIME BETWEEN SR.MIN_DATE AND SR.MAX_DATE THEN TM1.VALUE + (TM2.VALUE - TM1.VALUE) ELSE (TM2.VALUE - TM1.VALUE) END AS DELTA_TIME
FROM WRH$_SYS_TIME_MODEL TM1 ,WRH$_SYS_TIME_MODEL TM2 ,SNAP_RANGES SR
WHERE TM1.DBID = SR.DBID AND TM1.INSTANCE_NUMBER = SR.INSTANCE_NUMBER AND TM1.SNAP_ID = SR.MIN_SNAP AND TM1.STAT_ID = SR.STAT_ID AND TM2.DBID = SR.DBID AND TM2.INSTANCE_NUMBER = SR.INSTANCE_NUMBER AND TM2.SNAP_ID = SR.MAX_SNAP AND TM2.STAT_ID = SR.STAT_ID )
SELECT STAT_NAME ,ROUND(SUM(DELTA_TIME/1000000),2) AS SECS
FROM DELTA_DATA GROUP BY STAT_NAME

It looks like something made by a DBA, but it comes from the MMON.

Looking around, it seems closely related to two PL/SQL calls that I could find in the SQL Monitor and that systematically fail every day:

spd_cpu_sql_monitorDBMS_FEATURE_AWR function calls internally the SQL auyf8px9ywc6j.

The MOS does not know anything about that query, but the internet does:

spd_awr_franckOh no, not Franck again! He always discovers new stuff and blogs about it before I do ūüôā

In his blog post, he points out that the query fails because of error ORA-12751 (resource plan limiting CPU usage) and that  it is a problem of Adaptive Dynamic Sampling. Is it true?

What I like to do when I have a problematic sql_id, is to run sqld360 from Mauro Pagano, but the resulting zip file does not contain anything useful, because actually there are no executions and no plans.

SQL> select sql_id,  executions, loads, cpu_time from v$sqlstats where sql_id='auyf8px9ywc6j';

SQL_ID        EXECUTIONS      LOADS   CPU_TIME
------------- ---------- ---------- ----------
auyf8px9ywc6j          0         11          0

SQL> select sql_id,  child_number from v$sql where sql_id='auyf8px9ywc6j';

no rows selected

SQL>

During the execution of the statement (or better, during the period with high CPU usage), there is an entry in v$sql, but no plans associated:

SQL> select sql_id, child_number from v$sql where sql_id='auyf8px9ywc6j';

SQL_ID        CHILD_NUMBER
------------- ------------
auyf8px9ywc6j            0

SQL> select * from table (dbms_xplan.display_cursor('auyf8px9ywc6j',0, 'ALL +NOTE'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  auyf8px9ywc6j, child number 0

WITH SNAP_RANGES AS (SELECT /*+ FULL(ST) */ SN.DBID ,SN.INSTANCE_NUMBER
,SN.STARTUP_TIME ,ST.STAT_ID ,ST.STAT_NAME ,MIN(SN.SNAP_ID) AS MIN_SNAP
,MAX(SN.SNAP_ID) AS MAX_SNAP ,MIN(CAST(BEGIN_INTERVAL_TIME AS DATE)) AS
MIN_DATE ,MAX(CAST(END_INTERVAL_TIME AS DATE)) AS MAX_DATE FROM
DBA_HIST_SNAPSHOT SN ,WRH$_STAT_NAME ST WHERE SN.BEGIN_INTERVAL_TIME >
TRUNC(SYSDATE) - 7 AND SN.END_INTERVAL_TIME < TRUNC(SYSDATE) AND
SN.DBID = ST.DBID AND ST.STAT_NAME IN ('DB time', 'DB CPU') GROUP BY
SN.DBID,SN.INSTANCE_NUMBER,SN.STARTUP_TIME,ST.STAT_ID,ST.STAT_NAME )
,DELTA_DATA AS (SELECT SR.DBID ,SR.INSTANCE_NUMBER ,SR.STAT_NAME ,CASE
WHEN SR.STARTUP_TIME BETWEEN SR.MIN_DATE AND SR.MAX_DATE THEN TM1.VALUE
+ (TM2.VALUE - TM1.VALUE) ELSE (TM2.VALUE - TM1.VALUE) END AS
DELTA_TIME FROM WRH$_SYS_TIME_MODEL TM1 ,WRH$_SYS_TIME_MODEL TM2
,SNAP_RANGES SR WHERE TM1.DBID = SR.DBID AND TM1.INSTANCE_NUMBER =
SR.INSTANCE_NUMBER AND TM1.SNAP_ID = SR.MIN_SNAP AND TM1.STAT_ID =
SR.STAT_ID AND TM2.DBID = SR.DBID AND TM2.

NOTE: cannot fetch plan for SQL_ID: auyf8px9ywc6j, CHILD_NUMBER: 0
      Please verify value of SQL_ID and CHILD_NUMBER;
      It could also be that the plan is no longer in cursor cache (check v$sql_plan)


22 rows selected.

And this is very likely because the statement is still parsing, and all the time is due to the Dynamic Sampling. But because the plan is not there yet, I cannot check it in the DBMS_XPLAN.DISPLAY_CURSOR.

I decided then to trace it with those two statements:

SQL> alter system set events 'sql_trace [sql:auyf8px9ywc6j]';

SQL> alter system set events 'trace[rdbms.SQL_Optimizer.*][sql:auyf8px9ywc6j]';

At the next execution I see indeed the Adaptive Dynamic Sampling in the trace file, the errror due to the exhausted CPU in the resource plan, and the directives that caused the Adaptive Dynamic Sampling:

=======================================
SPD: BEGIN context at query block level
=======================================
Query Block SEL$3877D5D0 (#3)
Applicable DS directives:
   dirid = 17707367266596005344, state = 5, flags = 1, loc = 1 {CJ(8694)[1, 2]}
   dirid = 17748238338555778238, state = 5, flags = 1, loc = 4 {(8694)[2, 3, 4]; (8460)[2, 3]}
   dirid = 10027833930063681981, state = 1, flags = 5, loc = 4 {(8694)[2, 3, 4]; (8460)[2, 3]; (8436)[1, 5]; (8436)[1, 5]}
Checking valid directives for the query block
  SPD: Directive valid: dirid = 17748238338555778238, state = 5, flags = 1, loc = 4 {(8694)[2, 3, 4]; (8460)[2, 3]}
  SPD: Return code in qosdDSDirSetup: EXISTS, estType = GROUP_BY
  SPD: Return code in qosdDSDirSetup: NODIR, estType = HAVING
  SPD: Return code in qosdDSDirSetup: NODIR, estType = QUERY_BLOCK

 

PARSING IN CURSOR #139834781881608 len=1106 dep=4 uid=0 oct=3 lid=0 tim=3349661181783 hv=4280474888 ad='95770310' sqlid='8w3h8fvzk5r88'
SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring optimizer_features_enable(default) no_parallel result_cache(snapshot=3600) */ SUM(C1) FROM (SELECT /*+ qb_name("innerQuery")  */ 1 AS C1 FROM (SELECT /*+ FULL ("ST") */ "WRM$_SNAPSHOT"."DBID" "DBID","WRM$_SNAPSHOT"."INSTANCE_NUMBER" "INSTANCE_NUMBER","WRM$_SNAPSHOT"."STARTUP_TIME" "STARTUP_TIME","ST"."STAT_ID" "STAT_ID","ST"."STAT_NAME" "STAT_NAME",MIN("WRM$_SNAPSHOT"."SNAP_ID") "MIN_SNAP",MAX("WRM$_SNAPSHOT"."SNAP_ID") "MAX_SNAP",MIN(CAST("WRM$_SNAPSHOT"."BEGIN_INTERVAL_TIME" AS DATE)) "MIN_DATE",MAX(CAST("WRM$_SNAPSHOT"."END_INTERVAL_TIME" AS DATE)) "MAX_DATE" FROM SYS."WRM$_SNAPSHOT" "WRM$_SNAPSHOT","WRH$_STAT_NAME" "ST" WHERE "WRM$_SNAPSHOT"."DBID"="ST"."DBID" AND ("ST"."STAT_NAME"='DB CPU' OR "ST"."STAT_NAME"='DB time') AND "WRM$_SNAPSHOT"."STATUS"=0 AND "WRM$_SNAPSHOT"."BEGIN_INTERVAL_TIME">TRUNC(SYSDATE@!)-7 AND "WRM$_SNAPSHOT"."END_INTERVAL_TIME"<TRUNC(SYSDATE@!) GROUP BY "WRM$_SNAPSHOT"."DBID","WRM$_SNAPSHOT"."INSTANCE_NUMBER","WRM$_SNAPSHOT"."STARTUP_TIME","ST"."STAT_ID","ST"."STAT_NAME") "VW_DIS_1") innerQuery
END OF STMT
...
>> Query Blk Card adjusted from 3.000000 to 2.000000 due to adaptive dynamic sampling

 

*** KEWUXS - encountered error: (ORA-12751: violation du temps UC ou des règles relatives au temps d'exécution
ORA-06512: à "SYS.DBMS_FEATURE_AWR", ligne 14
ORA-06512: à "SYS.DBMS_FEATURE_AWR", ligne 92
ORA-06512: à ligne 1
ORA-06512: à "SYS.DBMS_SQL", ligne 1707
ORA-06512: à "SYS.DBMS_FEATURE_USAGE_INTERNAL", ligne 312
ORA-06512: à "SYS.DBMS_FEATURE_USAGE_INTERNAL", ligne 522
ORA-06512: à "SYS.DBMS_FEATURE_USAGE_INTERNAL", ligne 694
ORA-06512: à "SYS.DBMS_FEATURE_USAGE_INTERNAL", ligne 791
ORA-06512: à ligne 1
)

So, there are some SQL Plan Directives that force the CBO to run ADS for this query.

SQL> select TYPE, ENABLED, STATE, AUTO_DROP, REASON, CREATED, LAST_MODIFIED, LAST_USED from dba_sql_plan_directives where directive_id in (10027833930063681981, 17707367266596005344, 17748238338555778238);

TYPE             ENA STATE      AUT REASON                               CREATED
---------------- --- ---------- --- ------------------------------------ ---------------------------------------------------------------------------
LAST_MODIFIED                                                               LAST_USED
--------------------------------------------------------------------------- ---------------------------------------------------------------------------
DYNAMIC_SAMPLING YES USABLE     YES GROUP BY CARDINALITY MISESTIMATE     03-JUN-16 02.10.41.000000 PM
03-JUN-16 04.14.32.000000 PM

DYNAMIC_SAMPLING YES USABLE     YES SINGLE TABLE CARDINALITY MISESTIMATE 27-MAR-16 09.01.20.000000 AM
17-APR-16 09.13.01.000000 AM                                                17-APR-16 09.13.01.000000000 AM

DYNAMIC_SAMPLING YES USABLE     YES GROUP BY CARDINALITY MISESTIMATE     13-FEB-16 06.07.36.000000 AM
27-FEB-16 06.03.09.000000 AM                                                03-JUN-16 02.10.41.000000000 PM

This query touches three tables, so instead of relying on the DIRECTIVE_IDs, it’s better to get the directives by object name:

SQL> r
  1  select distinct d.directive_id, TYPE, ENABLED, STATE, AUTO_DROP, REASON, CREATED, LAST_MODIFIED
  2  from dba_sql_plan_directives d join dba_sql_plan_dir_objects o on
  3*     (d.directive_id=o.directive_id) where o.owner='SYS' and o.object_name in ('WRH$_SYS_TIME_MODEL','WRH$_STAT_NAME','WRM$_SNAPSHOT')

DIRECTIVE_ID TYPE             ENA STATE      AUT REASON                               CREATED
------------ ---------------- --- ---------- --- ------------------------------------ ---------------------------------------------------------------------------
LAST_MODIFIED
---------------------------------------------------------------------------
  8.8578E+18 DYNAMIC_SAMPLING YES USABLE     YES JOIN CARDINALITY MISESTIMATE         14-FEB-16 08.11.29.000000 AM
06-JUN-16 01.57.35.000000 PM

  1.7748E+19 DYNAMIC_SAMPLING YES USABLE     YES GROUP BY CARDINALITY MISESTIMATE     19-MAR-16 02.15.17.000000 AM
06-JUN-16 01.57.35.000000 PM

  1.7170E+19 DYNAMIC_SAMPLING YES USABLE     YES JOIN CARDINALITY MISESTIMATE         14-FEB-16 08.11.29.000000 AM
06-JUN-16 01.57.35.000000 PM

  1.7707E+19 DYNAMIC_SAMPLING YES USABLE     YES SINGLE TABLE CARDINALITY MISESTIMATE 13-MAR-16 08.04.38.000000 AM
06-JUN-16 01.57.35.000000 PM

Solution

At this point, the solution is the same already pointed out in one of my previous blog posts: disable the directives individually!

BEGIN
  FOR rec in (select d.directive_id as did 
    from dba_sql_plan_directives d join dba_sql_plan_dir_objects o on
    (d.directive_id=o.directive_id) where o.owner='SYS'
      and o.object_name in ('WRH$_SYS_TIME_MODEL','WRH$_STAT_NAME','WRM$_SNAPSHOT'))
  LOOP
    DBMS_SPD.ALTER_SQL_PLAN_DIRECTIVE ( rec.did, 'ENABLED','NO');
    DBMS_SPD.ALTER_SQL_PLAN_DIRECTIVE ( rec.did, 'AUTO_DROP','NO');
  END LOOP;
END;
/

This very same PL/SQL block must be run on ALL the 12c databases affected by this Adaptive Dynamic Sampling problem on the sql_id auyf8px9ywc6j.

If you have just migrated the database to 12c, it would make even more sense to programmatically “inject” the disabled SQL Plan Directives into every freshly created or upgraded 12c database (until Oracle releases a patch for this non-bug).

-- export from a source where the directives exist and have been disabled
SET SERVEROUTPUT ON
DECLARE
  my_list  DBMS_SPD.OBJECTTAB := DBMS_SPD.ObjectTab();
  dir_cnt  NUMBER;
BEGIN
  DBMS_SPD.CREATE_STGTAB_DIRECTIVE  (table_name => 'AUYF8PX9YWC6J_DIRECTIVES', table_owner=> 'SYSTEM' );
  my_list.extend(3);
 
  -- TAB table
  my_list(1).owner := 'SYS';
  my_list(1).object_name := 'WRH$_SYS_TIME_MODEL';
  my_list(1).object_type := 'TABLE';
  my_list(2).owner := 'SYS';
  my_list(2).object_name := 'WRH$_STAT_NAME';
  my_list(2).object_type := 'TABLE';
  my_list(3).owner := 'SYS';
  my_list(3).object_name := 'WRM$_SNAPSHOT';
  my_list(3).object_type := 'TABLE';

  dir_cnt := DBMS_SPD.PACK_STGTAB_DIRECTIVE(table_name => 'AUYF8PX9YWC6J_DIRECTIVES', table_owner=> 'SYSTEM', obj_list => my_list);
   DBMS_OUTPUT.PUT_LINE('dir_cnt = ' || dir_cnt);
END;
/

expdp directory=data_pump_dir dumpfile=AUYF8PX9YWC6J_DIRECTIVES.dmp logfile=expdp_AUYF8PX9YWC6J_DIRECTIVES.log tables=system.AUYF8PX9YWC6J_DIRECTIVES

-- import into the freshly upgraded/created 12c database
impdp directory=data_pump_dir dumpfile=AUYF8PX9YWC6J_DIRECTIVES.dmp logfile=impdp_AUYF8PX9YWC6J_DIRECTIVES.log

SELECT DBMS_SPD.UNPACK_STGTAB_DIRECTIVE(table_name => 'AUYF8PX9YWC6J_DIRECTIVES', table_owner=> 'SYSTEM') FROM DUAL;

It comes without saying that the next execution has been very quick, consuming almost no CPU and without using ADS.

HTH

Ludovico

 

Cloning a PDB with ASM and Data Guard (no ADG) without network transfer

Ok, if you’re reading this post, you may want to read also the previous one that explains something more about the problem.

Briefly said, if you have a CDB running on ASM in a MAA architecture and you do not have Active Data Guard, when¬†you clone a PDB you have to “copy” the datafiles somehow on the standby. The only solution offered by Oracle (in a MOS Note, not in the documentation) is to restore the PDB from the primary to the standby site, thus transferring it over the network. But if you have a huge PDB this is a bad solution because it impacts your network connectivity. (Note: ending up with a huge PDB IMHO can only be caused by bad consolidation. I do not recommend to consolidate huge databases on Multitenant).

So I’ve worked out another¬†solution, that still has many defects and is almost not viable, but it’s technically interesting because it permits to discover a little more about Multitenant and Data Guard.

The three options

At the primary site, the process is always the same: Oracle copies the datafiles of the source, and it modifies the headers so that they can be used by the new PDB (so it changes CON_ID, DBID, FILE#, and so on).

On the standby site, by opposite, it changes depending on the option you choose:

Option 1: Active Data Guard

If you have ADG, the ADG itself will take care of copying the datafile on the standby site, from the source standby pdb to the destination standby pdb. Once the copy is done, the MRP0 will continue the recovery. The modification of the header block of the destination PDB is done by the MRP0 immediately after the copy (at least this is what I understand).

ADG_PDB_copy

Option 2: No Active Data Guard, but STANDBYS=none

In this case, the copy on the standby site doesn’t happen, and the recovery process just add the entry of¬†the new datafiles in the controlfile, with status OFFLINE and name UNKNOWNxxx. ¬†However, the source file cannot be copied¬†anymore, because the MRP0 process will expect to have a copy of the destination datafile, not the source datafile. Also, any tentative of restore of the datafile 28 (in this example) will give an error because it does not belong to the destination PDB. So the only chance is to restore the destination PDB from the primary.
NOADG_PDB_STANDBYS_NONE_copy

Option 3: No Active Data Guard, no STANDBYS=none

This is the case that I want to explain actually. Without the flag STANDBYS=none, the MRP0 process will expect to change the header of the new datafile, but because the file does not exist yet, the recovery process dies.
We can then copy it manually from the source standby pdb, and restart the recovery process, that will change the header. This process needs to be repeated for each datafile. (that’s why it’s not a viable solution, right now).

NOADG_PDB_copy

Let’s try it together:

The Environment

Primary

08:13:08 SYS@CDBATL_2> select db_unique_name, instance_name from v$database, gv$instance;

DB_UNIQUE_NAME                 INSTANCE_NAME
------------------------------ ----------------
CDBATL                         CDBATL_2
CDBATL                         CDBATL_1

Standby

07:35:56 SYS@CDBGVA_2> select db_unique_name, instance_name from v$database, gv$instance;

DB_UNIQUE_NAME                 INSTANCE_NAME
------------------------------ ----------------
CDBGVA                         CDBGVA_1
CDBGVA                         CDBGVA_2

The current user PDB (any resemblance to real people is purely coincidental ūüėȬ†#haveUSeenMaaz):

08:14:31 SYS@CDBATL_2> select open_mode, name from gv$pdbs where name='MAAZ';

OPEN_MODE  NAME
---------- ------------------------------
OPEN       MAAZ
OPEN       MAAZ

Cloning the PDB on the primary

First, make sure that the source PDB is open read-only

08:45:54 SYS@CDBATL_2> alter pluggable database maaz close immediate instances=all;

Pluggable database altered.

08:46:20 SYS@CDBATL_2> alter pluggable database maaz open read only instances=all;

Pluggable database altered.

08:46:32 SYS@CDBATL_2> select open_mode, name from gv$pdbs where name='MAAZ' ;

OPEN_MODE  NAME
---------- ------------------------------
READ ONLY  MAAZ
READ ONLY  MAAZ

Then, clone the PDB on the primary without the clause STANDBYS=NONE:

08:46:41 SYS@CDBATL_2> create pluggable database LUDO from MAAZ;

Pluggable database created.

Review the clone on the Standby

At this point, on the standby the alert log show that the SYSTEM datafile is missing, and the recovery process stops.

Mon Dec 15 17:46:11 2014
Recovery created pluggable database LUDO
Mon Dec 15 17:46:11 2014
Errors in file /u01/app/oracle/diag/rdbms/cdbgva/CDBGVA_2/trace/CDBGVA_2_mrp0_16464.trc:
ORA-01565: error in identifying file '+DATA'
ORA-17503: ksfdopn:2 Failed to open file +DATA
ORA-15045: ASM file name '+DATA' is not in reference form
Recovery was unable to create the file as:
'+DATA'
MRP0: Background Media Recovery terminated with error 1274
Mon Dec 15 17:46:11 2014
Errors in file /u01/app/oracle/diag/rdbms/cdbgva/CDBGVA_2/trace/CDBGVA_2_mrp0_16464.trc:
ORA-01274: cannot add data file that was originally created as '+DATA/CDBATL/0A4A0048D5321597E053334EA8C0E40A/DATAFILE/system.825.866396765'
Mon Dec 15 17:46:11 2014
Managed Standby Recovery not using Real Time Apply
Mon Dec 15 17:46:11 2014
Recovery interrupted!
Recovery stopped due to failure in applying recovery marker (opcode 17.34).
Datafiles are recovered to a consistent state at change 10433175 but controlfile could be ahead of datafiles.
Mon Dec 15 17:46:11 2014
Errors in file /u01/app/oracle/diag/rdbms/cdbgva/CDBGVA_2/trace/CDBGVA_2_mrp0_16464.trc:
ORA-01274: cannot add data file that was originally created as '+DATA/CDBATL/0A4A0048D5321597E053334EA8C0E40A/DATAFILE/system.825.866396765'
Mon Dec 15 17:46:11 2014
MRP0: Background Media Recovery process shutdown (CDBGVA_2)

One remarkable thing, is that in the standby controlfile, ONLY THE SYSTEM DATAFILE exists:

18:02:50 SYS@CDBGVA_2> select con_id from v$pdbs where name='LUDO';

    CON_ID
----------
         4

18:03:10 SYS@CDBGVA_2> select name from v$datafile where con_id=4;

NAME
---------------------------------------------------------------------------
+DATA/CDBATL/0A4A0048D5321597E053334EA8C0E40A/DATAFILE/system.825.866396765

We need to fix the datafiles one by one, but most of the steps can be done once for all the datafiles.

Copy the source PDB from the standby

What do we need to do? Well, the recovery process is stopped, so we can safely copy the datafiles of  the source PDB from the standby site because they have not moved yet. (meanwhile, we can put the primary source PDB back in read-write mode).

-- on primary
08:58:07 SYS@CDBATL_2> alter pluggable database maaz close immediate instances=all;

Pluggable database altered.

08:58:15 SYS@CDBATL_2> alter pluggable database maaz open read write instances=all;

Pluggable database altered.

Copy the datafiles:

## on the standby:
RMAN> backup as copy pluggable database MAAZ;

Starting backup at 15-DEC-14
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=58 instance=CDBGVA_2 device type=DISK
channel ORA_DISK_1: starting datafile copy
input datafile file number=00029 name=+DATA/CDBGVA/0243BF7B39D4440AE053334EA8C0E471/DATAFILE/sysaux.463.857404625
output file name=+DATA/CDBGVA/0243BF7B39D4440AE053334EA8C0E471/DATAFILE/sysaux.863.866397043 tag=TAG20141215T175041 RECID=54 STAMP=866397046
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:07
channel ORA_DISK_1: starting datafile copy
input datafile file number=00028 name=+DATA/CDBGVA/0243BF7B39D4440AE053334EA8C0E471/DATAFILE/system.283.857404623
output file name=+DATA/CDBGVA/0243BF7B39D4440AE053334EA8C0E471/DATAFILE/system.864.866397049 tag=TAG20141215T175041 RECID=55 STAMP=866397051
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:03
Finished backup at 15-DEC-14

Starting Control File and SPFILE Autobackup at 15-DEC-14
piece handle=+DATA/CDBGVA/AUTOBACKUP/2014_12_15/s_866396771.865.866397053 comment=NONE
Finished Control File and SPFILE Autobackup at 15-DEC-14

Do the magic

Now there’s the interesting part: we need to assign the datafile copies of the maaz PDB to LUDO.

Sadly, the OMF will create the copies on the bad location (it’s a copy, to they are created on the same location as the source PDB).

We cannot try to uncatalog and recatalog the copies, because they will ALWAYS be affected to the source PDB. Neither we can use RMAN because it will never associate the datafile copies to the new PDB. We need to rename the files manually.

RMAN> list datafilecopy all;

List of Datafile Copies
=======================

Key File S Completion Time Ckp SCN Ckp Time
------- ---- - --------------- ---------- ---------------
55 28 A 15-DEC-14 10295232 14-DEC-14
 Name: +DATA/CDBGVA/0243BF7B39D4440AE053334EA8C0E471/DATAFILE/system.864.86639709
 Tag: TAG20141215T175041

54 29 A 15-DEC-14 10295232 14-DEC-14
 Name: +DATA/CDBGVA/0243BF7B39D4440AE053334EA8C0E471/DATAFILE/sysaux.863.86639703
 Tag: TAG20141215T175041


RMAN> select name, guid from v$pdbs;

NAME       GUID
---------- --------------------------------
PDB$SEED   FFBCECBB503D606BE043334EA8C019B7
MAAZ       0243BF7B39D4440AE053334EA8C0E471
LUDO       0A4A0048D5321597E053334EA8C0E40A

It’s better to uncatalog the datafile copies before, so we keep the catalog clean:

RMAN> change datafilecopy '+DATA/CDBGVA/0243BF7B39D4440AE053334EA8C0E471/DATAFILE/system.864.866397049' uncatalog;

uncataloged datafile copy
datafile copy file name=+DATA/CDBGVA/0243BF7B39D4440AE053334EA8C0E471/DATAFILE/system.864.866397049 RECID=55 STAMP=866397051
Uncataloged 1 objects


RMAN> change datafilecopy '+DATA/CDBGVA/0243BF7B39D4440AE053334EA8C0E471/DATAFILE/sysaux.863.866397043' uncatalog;

uncataloged datafile copy
datafile copy file name=+DATA/CDBGVA/0243BF7B39D4440AE053334EA8C0E471/DATAFILE/sysaux.863.866397043 RECID=54 STAMP=866397046
Uncataloged 1 objects

Then, because we cannot rename files on a standby database with standby file management set to AUTO, we need to put it temporarily to MANUAL.

10:24:21 SYS@CDBGVA_2> alter database rename file '+DATA/CDBATL/0A4A0048D5321597E053334EA8C0E40A/DATAFILE/system.825.866396765' to '+DATA/CDBGVA/0243BF7B39D4440AE053334EA8C0E471/DATAFILE/system.864.866397049';
alter database rename file '+DATA/CDBATL/0A4A0048D5321597E053334EA8C0E40A/DATAFILE/system.825.866396765' to '+DATA/CDBGVA/0243BF7B39D4440AE053334EA8C0E471/DATAFILE/system.864.866397049'
*
ERROR at line 1:
ORA-01275: Operation RENAME is not allowed if standby file management is automatic.
10:27:49 SYS@CDBGVA_2> select name, ispdb_modifiable from v$parameter where name like 'standby%';

NAME                                                         ISPDB
------------------------------------------------------------ -----
standby_archive_dest                                         FALSE
standby_file_management                                      FALSE

standby_file_management is not PDB modifiable, so we need to do it for the whole CDB.

10:31:42 SYS@CDBGVA_2> alter system set standby_file_management=manual;

System altered.

18:05:04 SYS@CDBGVA_2> alter database rename file '+DATA/CDBATL/0A4A0048D5321597E053334EA8C0E40A/DATAFILE/system.825.866396765' to '+DATA/CDBGVA/0243BF7B39D4440AE053334EA8C0E471/DATAFILE/system.864.866397049';

Database altered.



then we need to set back the standby_file_management=auto or the recover will not start:

10:34:24 SYS@CDBGVA_2> alter system set standby_file_management=auto;
System altered.

We can now restart the recovery.

The recovery process will:
Рchange the new datafile by modifying the header for the new PDB
– create the entry for the second datafile in the controlfile
– crash again because the datafile is missing

18:11:30 SYS@CDBGVA_2> alter database recover managed standby database;
alter database recover managed standby database
*
ERROR at line 1:
ORA-00283: recovery session canceled due to errors
ORA-01111: name for data file 61 is unknown - rename to correct file
ORA-01110: data file 61: '/u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/UNNAMED00061'
ORA-01157: cannot identify/lock data file 61 - see DBWR trace file
ORA-01111: name for data file 61 is unknown - rename to correct file
ORA-01110: data file 61: '/u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/UNNAMED00061'


18:11:33 SYS@CDBGVA_2> select name from v$datafile where con_id=4;

NAME
---------------------------------------------------------------------------
+DATA/CDBGVA/0243BF7B39D4440AE053334EA8C0E471/DATAFILE/system.864.866397049
/u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/UNNAMED00061

We already have the SYSAUX datafile, right? So we can alter the name again:

18:14:21 SYS@CDBGVA_2> alter system set standby_file_management=manual;

System altered.

18:14:29 SYS@CDBGVA_2> alter database rename file '/u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/UNNAMED00061' to '+DATA/CDBGVA/0243BF7B39D4440AE053334EA8C0E471/DATAFILE/sysaux.863.866397043';

Database altered.

18:14:31 SYS@CDBGVA_2> alter system set standby_file_management=auto;

System altered.

18:14:35 SYS@CDBGVA_2> alter database recover managed standby database;

This time all the datafiles have been copied (no user datafile for this example) and the recovery process will continue!! ūüôā so we can hit ^C and start it in background.

18:14:35 SYS@CDBGVA_2> alter database recover managed standby database;
alter database recover managed standby database
*
ERROR at line 1:
ORA-16043: Redo apply has been canceled.
ORA-01013: user requested cancel of current operation

 

18:18:10 SYS@CDBGVA_2> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;

Database altered.

18:18:19 SYS@CDBGVA_2>

The Data Guard configuration reflects the success of this operation.

Do we miss anything?

Of course, we do!! The datafile names of the new PDB reside in the wrong ASM path. We need to fix them!

18:23:07 SYS@CDBGVA_2> alter database recover managed standby database cancel;

Database altered.

RMAN> backup as copy pluggable database ludo;

Starting backup at 15-DEC-14
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=60 instance=CDBGVA_2 device type=DISK
channel ORA_DISK_1: starting datafile copy
input datafile file number=00061 name=+DATA/CDBGVA/0243BF7B39D4440AE053334EA8C0E471/DATAFILE/sysaux.863.866397043
output file name=+DATA/CDBGVA/0A4A0048D5321597E053334EA8C0E40A/DATAFILE/sysaux.866.866398933 tag=TAG20141215T182213 RECID=56 STAMP=866398937
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:07
channel ORA_DISK_1: starting datafile copy
input datafile file number=00060 name=+DATA/CDBGVA/0243BF7B39D4440AE053334EA8C0E471/DATAFILE/system.864.866397049
output file name=+DATA/CDBGVA/0A4A0048D5321597E053334EA8C0E40A/DATAFILE/system.867.866398941 tag=TAG20141215T182213 RECID=57 STAMP=866398943
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:03
Finished backup at 15-DEC-14

Starting Control File and SPFILE Autobackup at 15-DEC-14
piece handle=+DATA/CDBGVA/AUTOBACKUP/2014_12_15/s_866398689.868.866398945 comment=NONE
Finished Control File and SPFILE Autobackup at 15-DEC-14

RMAN> switch pluggable database ludo to copy;

using target database control file instead of recovery catalog
datafile 60 switched to datafile copy "+DATA/CDBGVA/0A4A0048D5321597E053334EA8C0E40A/DATAFILE/system.867.866398941"
datafile 61 switched to datafile copy "+DATA/CDBGVA/0A4A0048D5321597E053334EA8C0E40A/DATAFILE/sysaux.866.866398933"

18:23:54 SYS@CDBGVA_2> select name from v$datafile where con_id=4;

NAME
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+DATA/CDBGVA/0A4A0048D5321597E053334EA8C0E40A/DATAFILE/system.867.866398941
+DATA/CDBGVA/0A4A0048D5321597E053334EA8C0E40A/DATAFILE/sysaux.866.866398933

 

I know there’s no practical use of this procedure, but it helps a lot in understanding how Multitenant has been implemented.

I expect some improvements in 12.2!!

Cheers

Ludo