DBMS_AUDIT_MGMT.CLEAN_AUDIT_TRAIL not working on 12c? Here’s why…

It is bad to realize, after a few years, that my customer’s Audit Cleanup procedures are not working properly for every database…

NOTE: The post is based on standard audit, not unified audit.

My customer developed a quite nice procedure for database housekeeping (including diag dest, OS audit trail, recyclebin, DB audit…)

But after some performance problems, I came across the infamous sql_id 4ztz048yfq32s:

This SQL comes from the “Failed Logon Attempts” metric in Enterprise Manager.

I’ve checked the specific database, and the table SYS.AUD$ contained way too many rows, dating from before our purge window:

The cleanup procedure does basically this:
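Roughly, something along these lines (a minimal sketch using the documented DBMS_AUDIT_MGMT calls, not the customer's exact code; the 31-day retention matches our setup):

```sql
-- Sketch: set the "last archive" timestamp to now minus 31 days,
-- then purge every standard audit record older than that timestamp.
BEGIN
  DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP(
    audit_trail_type  => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD,
    last_archive_time => SYSTIMESTAMP - INTERVAL '31' DAY);

  DBMS_AUDIT_MGMT.CLEAN_AUDIT_TRAIL(
    audit_trail_type        => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD,
    use_last_arch_timestamp => TRUE);
END;
/
```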

But despite a retention window of 31 days, the rows are still there:

(today is 27.04.2018, so the oldest records are more than 1 year old)

I’ve checked with ASH: the actual delete statement executed by the clean_audit_trail procedure is:

So, the DBID clause is OK, but the NTIMESTAMP# clause is not!

Why?

Long story long (hint, it’s a bug, not filed yet):

The cleanup metadata is stored in the view DBA_AUDIT_MGMT_LAST_ARCH_TS. Its structure in 11g was:

But in 12c, there are 2 new columns:

When the database is upgraded from 11g to 12c, the two new columns are set to “0” by default.

But when the procedure DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP is executed, the actual dbid is used, and new lines appear:

It is clear now that the DELETE statement is not constructed properly. It should get the LAST_ARCHIVE_TS of the actual DBID being purged… but it takes the other one.

According to my tests, it uses neither the correct timestamp for the DBID nor the oldest timestamp: it uses instead the timestamp of the first record found by the clause “WHERE AUDIT_TRAIL='STANDARD AUDIT TRAIL'”. It depends on the physical location of the row in the table! Clearly a big mess… (PS: not 100% sure, but this is what I suppose.)

So, I have tried to modify the archive time for DBID 0:

Executing the cleanup again now leads to a better timestamp:

I have then tried to play a little bit with the DBA_AUDIT_MGMT_LAST_ARCH_TS view (and the underlying table DAM_LAST_ARCH_TS$).

First, I’ve faked the DBID:

Then, I have tried to increase the retention timestamp (500 days):

Finally, I have tried to purge the audit trail with both DBIDs:

As I expected, in both cases the cleanup generated the delete with the timestamp of the fake DBID:

Is it possible to delete the unwanted records from the view DBA_AUDIT_MGMT_LAST_ARCH_TS?

Not only is it possible, but I recommend it:
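Something along these lines should do it (a sketch, assuming the 12c view exposes the DATABASE_ID column and is deletable; otherwise the same predicate can be applied to the underlying table SYS.DAM_LAST_ARCH_TS$):

```sql
-- Sketch: drop the metadata rows that do not belong to the current DBID
-- (e.g. the DBID=0 rows left over by the 11g -> 12c upgrade).
DELETE FROM dba_audit_mgmt_last_arch_ts
WHERE  database_id <> (SELECT dbid FROM v$database);

COMMIT;
```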

Afterwards, the timestamp in the where condition is correct and remains correct after subsequent executions of DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP.

Conclusions, IMPORTANT FOR THE DATABASE OPERATIONS:

The upgrade causes the unwanted lines with DBID=0 in the DBA_AUDIT_MGMT_LAST_ARCH_TS view.

Moreover, any duplicate changes the DBID: any subsequent execution of DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP in the duplicated database will lead to additional lines in the view.

This is what I plan to do now:

  • Whenever I upgrade from 11g to 12c, clean up the data from DBA_AUDIT_MGMT_LAST_ARCH_TS and schedule the cleanup for DBID 0 as well
  • Whenever I duplicate a database, I execute a DELETE (without clauses) from DBA_AUDIT_MGMT_LAST_ARCH_TS and a truncate of the table SYS.AUD$ (it is a duplicate, after all!)

HTH

BP and Patch 22652097: set optimizer_adaptive_statistics to FALSE explicitly or it might not work!

Update 14.03.2018: After some exchanges with Nigel Bayliss, the behaviour described here has been filed as unpublished bug 27626925: OPTIMIZER ADAPTIVE STATS DEFAULT FALSE NOT HONORED WHEN ENABLED IN OCT OR JAN BP. It will be fixed starting with April’s bundle patch.

 

According to Nigel’s blog post:

The Oracle 12.1.0.2 October 2017 BP and the Adaptive Optimizer

if you installed the patch 22652097 prior to applying the Bundle Patch 171018, the BP apply in the database should recognize that the patch was already in place and keep it activated. This is done through the fix control 26664361.

When fix_control 26664361:0 -> Patch 22652097 is not enabled: the parameter optimizer_adaptive_features (OAF) works.

When fix_control 26664361:1 -> Patch 22652097 is enabled: optimizer_adaptive_features is discarded and the two new parameters take priority: optimizer_adaptive_plans (OAP) and optimizer_adaptive_statistics (OAS).

But at my customer’s site, I observed a different behavior.

My patching story might be very similar to yours!

When I started upgrading my customer’s databases to 12c in early 2015, I very soon experienced the infamous problems with SQL Plan Directives (SPD) and Adaptive Dynamic Sampling (ADS) that I described in my paper: ADAPTIVE FEATURES OR: HOW I LEARNED TO STOP WORRYING AND TROUBLESHOOT THE BOMB.

Early fixes

When I was new to the problem, the quick fix for the problematic applications was to set OAF to FALSE.

Later, I discovered some more details and decided to opt for setting:
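Judging from the parameters that get reset later in this post, it was essentially this kind of setting (a sketch, not the exact commands used back then):

```sql
-- Sketch: disable SQL Plan Directive usage via the hidden parameter,
-- leaving the documented adaptive parameters at their defaults.
ALTER SYSTEM SET "_optimizer_dsdir_usage_control" = 0 SCOPE=BOTH;
```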

In other cases, I disabled the specific directives that were causing problems.

But many databases did not have so many problems, and I left the defaults.

Patch 22652097 on top of BP170718 

At some point, my customer and I decided to apply the fix 22652097 on top of BP170718, which was our current patch level at that time.

The patch installation on a test database complained about optimizer_adaptive_features being set: this parameter is not used anymore. This issue is nicely explained by Flora in her post Patch 22652097 in 12.1 makes optimizer_adaptive_features parameter obsolete.

In order to apply that patch on the remaining databases, we did:

  • alter system reset optimizer_adaptive_features;
  • alter system reset “_optimizer_dsdir_usage_control”;
  • Applied the patch on binaries and datapatch on the databases.

The result at this point was that:

  • optimizer_adaptive_features was not set
  • optimizer_adaptive_plans was set to true
  • optimizer_adaptive_statistics was set to false.

It might seem superfluous to say, but it’s not: the SQL Plan Directives were not used anymore, so no Adaptive Dynamic Sampling and no performance problems.

Bundle Patch 180116

Three weeks ago, we installed the latest Bundle Patch in order to fix some Grid Infrastructure problems, and the BP, as described in Nigel’s note (and by Mike Dietrich and many other bloggers :-)), contains the patch 22652097.

According to Nigel’s post, the patch installation should have detected that the patch 22652097 was already there and activated it.

And indeed, after we applied the BP, the fix_control 26664361 was set to 1 (that means that the patch 22652097 is enabled). So we went live with this setup without additional checks.

One week later, we started experiencing performance problems again. I noticed immediately that the Adaptive Dynamic Sampling was very aggressive again, and the SQL Plan Directives were used again.

But the fix was there AND ENABLED!

After a few tests, I realized that the SPDs are not used anymore only if I set optimizer_adaptive_statistics EXPLICITLY to false.

optimizer_adaptive_statistics must be set explicitly, the default does not work

And here’s the proof:

I use once again the great SPD example by Tim Hall (sorry Tim, it’s not the first time that I steal your work 🙂 ). You can find it here:

SQL Plan Directives in Oracle Database 12c Release 1 (12.1)

After applying the BP, I have the default parameter, not set explicitly, and the fix_control enabled:
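A quick way to verify both conditions (a simple sketch against the standard views):

```sql
-- Is optimizer_adaptive_statistics still at its default value?
SELECT name, value, isdefault
FROM   v$parameter
WHERE  name = 'optimizer_adaptive_statistics';

-- Is the fix control that enables patch 22652097 set to 1?
SELECT bugno, value, description
FROM   v$system_fix_control
WHERE  bugno = 26664361;
```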


If I run the test statement (again, find it here https://oracle-base.com/articles/12c/sql-plan-directives-12cr1) the directives are used:


but then I set the parameter explicitly:
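In other words, simply this (SCOPE depends on your setup):

```sql
-- Set the parameter explicitly to its (nominal) default value.
ALTER SYSTEM SET optimizer_adaptive_statistics = FALSE SCOPE=BOTH;
```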

and the SPD usage (and consequently, ADS), are gone:

Conclusion

Set the parameter EXPLICITLY when you apply the BP that contains the fix.

And ALWAYS test the behavior!

You can check how many statements use the dynamic sampling by following this short blog post by Dominic Brooks:

Which of my sql statements are using dynamic sampling?

HTH

Get the Most out of Oracle Data Guard – The material

Here we go: the feedback that I usually get after my talks (specifically, after the POUG High Five conference) is whether I will share my demo scripts and material.

Sadly, the demos I am doing for my presentation “Get the most out of Oracle Data Guard” are quite tied to an environment built for the purpose of the demos. So, do not expect scripts that are easy to use as-is, but rather some ideas beyond the demos themselves.

I hope they will help you get the whole picture.

Of course, if you need to implement a cloning strategy based on Data Guard or any other solution that I describe in this post, please feel free to contact me, I will be glad to help you implement it in your environment.

Slides

Demo 1

Video:

Scripts:

 

Demo 2

Video:


Scripts:

 

Demo 3

Video:

Scripts:

Preparation:

snap_acfs.pl

 

snap_database.pl

clone_from_snap.pl

Cheers

Ludovico

12.1.0.2 Bundle Patch 170718 breaks Data Guard and Duplicate from active database

Recently my customer patched its 12.1.0.2 databases with the Bundle Patch 170718 on the new servers (half of the customer’s environment). The old servers are still on 161018 Bundle Patch.

We realized that we could no longer move the databases from the old servers to the new ones, because the duplicate from active database was failing with this error:

The last lines show the same error that Franck blogged about some months ago.

Oracle 12.2 introduced an incompatibility with previous releases in remote file transfer via SQL*Net. At least, that is how it looks. According to Oracle, this is due to a bugfix present in Oracle 12.2.

Now, the bundle patch that we installed (BP 170718) contains the same bugfix (patch for bug 18633374).

So, the incompatibility happens now between databases of the same “Major Release” (12.1.0.2).

There are two possible workarounds:

  1. Apply the same patch level on both sides (BP170718 in my case)
  2. Apply just the patch 18633374 on top of your current PSU/DBBP (a merge might be necessary).

We used the second approach, and now we can set up Data Guard again to move our databases without downtime:
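For reference, the kind of command that was failing and now works again is the classic active duplicate for standby (a sketch with placeholder connection strings, not our exact run):

```
rman target sys@PRIMARY auxiliary sys@STANDBY

RMAN> DUPLICATE TARGET DATABASE FOR STANDBY FROM ACTIVE DATABASE NOFILENAMECHECK;
```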

HTH

Ludovico

 

 

Another problem with “KSV master wait” and “ASM file metadata operation”

My customer today tried to do a duplicate on a cluster. When preparing the auxiliary instance, she noticed that the startup nomount was hanging forever: nothing in the alert log, nothing in the trace files.

Because the database and the spfile were stored inside ASM, I became quite suspicious…

The ASM trace files had the following entries:

The ASM instance had the following sessions waiting:
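Something like this shows them (a sketch; run it on the ASM instance):

```sql
-- Sketch: sessions stuck on the two suspicious wait events.
SELECT inst_id, sid, serial#, program, event, seconds_in_wait
FROM   gv$session
WHERE  event IN ('ASM file metadata operation', 'KSV master wait')
ORDER  BY seconds_in_wait DESC;
```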

OMS?

Around 12:38:56, another colleague in the office added a disk to one of the disk groups, through Enterprise Manager 12c!

But there were no rebalance operations:
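The check is the usual one on GV$ASM_OPERATION (an empty result means no rebalance is running):

```sql
-- Sketch: any ongoing rebalance would show up here.
SELECT group_number, operation, state, power, est_minutes
FROM   gv$asm_operation;
```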

It’s not the first time that I have hit this type of problem. Sadly, sometimes it requires a full restart of the cluster or of ASM (because of different bugs).

This time, however, I have tried to kill only the foreground sessions waiting on “ASM file metadata operation”, starting with the one coming from the OMS.
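The kill itself is the standard command (a sketch; SID and SERIAL# are placeholders taken from the query above, and on RAC you would add the instance number):

```sql
-- Sketch: kill the blocking foreground session.
ALTER SYSTEM KILL SESSION '123,45678' IMMEDIATE;
```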

Surprisingly, after killing that session, everything was fine again:

I never add disks via OMS (I’m a sqlplus guy ;-)), so I wonder what went wrong with it 🙂

Ludovico

RMAN Catalog Housekeeping: how to purge the old incarnations

First, let me apologize because every post in my blog starts with a disclaimer… but sometimes it is really necessary. 😉

Disclaimer: this blog post contains PL/SQL code that deletes incarnations from your RMAN recovery catalog. Please DON’T use it unless you deeply understand what you are doing, as it can compromise your backup and recovery strategy.

Small introduction

You may have a central RMAN catalog that stores all the backup metadata for your databases. If it is the case, you will have a database entry for each of your databases and a new incarnation entry for each duplicate, incomplete recovery or flashback (or whatever).

You should also have a delete strategy that deletes the obsolete backups from either your DISK or SBT_TAPE media. If you have old incarnations, however, after some time you will notice that their information never goes away from your catalog, and sooner or later you will end up doing some housekeeping. But there is nothing more tedious than checking and deleting the incarnations one by one, especially if you have big numbers like this catalog:

Where db, dbinc, bdf and brl contain, respectively, the registered databases, the incarnations, the datafile backups and the archivelog backups.

Different incarnations?

Consider the following query:
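A sketch of an equivalent query, written against the RC_DATABASE_INCARNATION view instead of the catalog base tables:

```sql
-- Sketch: incarnations connected hierarchically to their parent incarnation.
SELECT name, db_key, dbid, dbinc_key, current_incarnation, LEVEL AS inc_level
FROM   rc_database_incarnation
START WITH parent_dbinc_key IS NULL
CONNECT BY PRIOR dbinc_key = parent_dbinc_key
ORDER SIBLINGS BY name, db_key;
```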

You can run it safely: it returns the list of incarnations hierarchically connected to their parent, by database name, key and level.

Then you have several types of behaviors:

  • Normal databases (created once, never restored or flashed back) will have just one or two incarnations (it depends on how they are created):

They are usually the ones that you may want to keep in your catalog, unless the database no longer exists: in this case, perhaps you omitted the deletion from the catalog when you dropped the database?

  • Flashed back databases (flashed back multiple times) will have as many incarnations as the number of flashbacks, but all connected with the incarnation prior to the flashback:

Here, although there are several incarnations, they all belong to the same database (same DB_KEY and DBID), so you must keep it inside the recovery catalog as well.

  • Non-production databases that are frequently refreshed from the production database (via duplicate) will have several incarnations with different DBIDs and DB_KEY:

This is usually the most frequent case: here you want to delete the old incarnations, but only if there are no backups attached to them that are still inside the recovery window.

  • You may also have orphaned incarnations:

In this case, again, it depends whether the DBID and DB_KEY are the same as the current incarnation or not.

What do you need to delete?

Basically:

  • Incarnations of databases that no longer exist
  • Incarnations of existing databases where the database has a more recent current incarnation, only if there are no backups still in the retention window

How to do it?

In order to be 100% sure that you can delete an incarnation, you have to verify that there are no recent backups (for instance, no backups more recent than the current recovery window for that database). If the database does not have a specified recovery window but rather the default “CONFIGURE RETENTION POLICY TO REDUNDANCY 1; # default”, it is a bit more problematic… in this case, let’s assume that we consider “old” an incarnation that has not been backed up for 1 year (365 days), ok?

Getting the last backup of each database

Sadly, there is not a single table where you can verify that. You have to collect the information from several tables. I think bdf, al, cdf, bs would suffice in most cases.

When you delete an incarnation you specify a db_key: you have to get the last backup for each db_key, with queries like this:
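A sketch of the idea, using the RC_BACKUP_SET view rather than the base tables:

```sql
-- Sketch: last backup completion time for each registered db_key.
SELECT db_key, MAX(completion_time) AS last_backup
FROM   rc_backup_set
GROUP  BY db_key;
```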

Putting together all the tables:

Getting the recovery window

The configuration information for each database is stored inside the conf table, but the retention information is stored in a VARCHAR2, either ‘TO RECOVERY WINDOW OF % DAYS’ or ‘TO REDUNDANCY %’.

You need to convert it to a number when the retention policy is a recovery window; otherwise you default it to 365 days when redundancy is used. You can add a column and a join to the query:
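The conversion can be sketched like this (using RC_RMAN_CONFIGURATION instead of the conf base table, and 365 days as the fallback):

```sql
-- Sketch: derive a retention in days from the stored retention policy.
SELECT db_key,
       value AS retention_policy,
       CASE
         WHEN value LIKE 'TO RECOVERY WINDOW OF%'
           THEN TO_NUMBER(REGEXP_SUBSTR(value, '\d+'))
         ELSE 365  -- REDUNDANCY or anything else: fall back to one year
       END AS retention_days
FROM   rc_rman_configuration
WHERE  name = 'RETENTION POLICY';
```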

and finally, either display whether the incarnation is no longer used or filter by usage:

Delete the incarnations!

You can delete the incarnations with this procedure:
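A minimal version of such a procedure, based on the documented DBMS_RCVCAT.UNREGISTERDATABASE call (run as the catalog owner; the two values are placeholders for the DB_KEY and DBID of the entry to drop), could look like this:

```sql
-- Sketch: unregister one database entry (db_key, dbid) from the catalog.
BEGIN
  dbms_rcvcat.unregisterdatabase(12345, 987654321);
END;
/
```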

This procedure will raise an exception (-20001, ‘Database not found’) when a database does not exist anymore (either already deleted by this procedure or by another session), so you need to handle it.

Putting all together:
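A sketch of the whole thing, using the simplified queries above (RC_ views and a flat 365-day threshold) together with a handler for the -20001 exception:

```sql
-- Sketch only: unregister every catalog entry whose most recent backup
-- is older than 365 days. Review the candidate list before running anything!
SET SERVEROUTPUT ON
DECLARE
  e_db_not_found EXCEPTION;
  PRAGMA EXCEPTION_INIT(e_db_not_found, -20001);
BEGIN
  FOR r IN (SELECT d.db_key, d.dbid, d.name
            FROM   rc_database d
            WHERE  NOT EXISTS (SELECT 1
                               FROM   rc_backup_set bs
                               WHERE  bs.db_key = d.db_key
                               AND    bs.completion_time > SYSDATE - 365))
  LOOP
    BEGIN
      dbms_rcvcat.unregisterdatabase(r.db_key, r.dbid);
      dbms_output.put_line('Unregistered '||r.name||' (db_key='||r.db_key||')');
    EXCEPTION
      WHEN e_db_not_found THEN
        dbms_output.put_line('Already unregistered: '||r.name);
    END;
  END LOOP;
END;
/
```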

I have used this procedure today for the first time and it worked like a charm.

However, if you have any adjustments or suggestions, don’t hesitate to leave a comment 🙂

HTH

DBMS_QOPATCH, datapatch, rollback, apply force

I am working for a customer on a quite big implementation of Cold Failover Cluster with Oracle Grid Infrastructure on Linux. I hope to have some material to publish soon about it! However, in this post I will be talking about patching the database in a cold-failover environment.

DISCLAIMER: I make heavy use of the scripts provided in this great blog post by Simon Pane:

https://www.pythian.com/blog/oracle-database-12c-patching-dbms_qopatch-opatch_xml_inv-and-datapatch/

Thank you Simon for sharing this 🙂

Intro

We are not yet in the process of doing out-of-place patching; at the moment the customer prefers to do in-place patching:

  • evacuate a node by relocating all the databases to other nodes
  • patch the node binaries
  • move back the databases and patch them with datapatch
  • do the same for the remaining nodes

I beg to disagree with this method, being a fan of having many patched golden copies distributed on all servers and patching the databases by just changing the ORACLE_HOME and running datapatch (like Rapid Home Provisioning does). But, this is the situation today, and we have to live with it.

Initial situation

  • Server 1, 2 and 3: one-off 20139391 applied
  • New database created

(figure: cfc_qopatch1)

When DBCA creates a new database in 12.1.0.2, it does not run datapatch by default; thus, the database does not have any patches installed.

However, this specific one-off patch does not modify anything in the database (sql_patch=false):
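For instance, the inventory as seen from inside the database can be rendered with the documented DBMS_QOPATCH calls (a much simpler sketch than Simon's scripts):

```sql
-- Sketch: show the opatch inventory known to the database as readable text.
SET LONG 1000000 LONGCHUNKSIZE 1000000 PAGESIZE 0
SELECT xmltransform(dbms_qopatch.get_opatch_lsinventory,
                    dbms_qopatch.get_opatch_xslt)
FROM   dual;
```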

and the datapatch runs without touching the db:

Next step: I evacuate server 2 and patch it, then I relocate my database onto it.

(figure: cfc_qopatch2)

Now the database is not at the same level as the binaries and needs to be patched:

The column CONSTITUENT is important here because it tells us what the parent patch_id is. This is the column that we have to check when we want to know if the patch has been applied to the database.

Now the patch is visible inside the dba_registry_sqlpatch:
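A minimal way to check it (a sketch; pick the columns you need):

```sql
-- Sketch: patches recorded by datapatch inside this database.
SELECT patch_id, patch_uid, action, status, bundle_series, description
FROM   dba_registry_sqlpatch
ORDER  BY action_time;
```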

Notice that the child patches are not listed in this view.

Rolling back

Now, one node is patched, but the others are not. What happens if I relocate the patched database to a non-patched node?

(figure: cfc_qopatch3)

The patch is applied inside the database but not in the binaries!