BP and Patch 22652097: set optimizer_adaptive_statistics to FALSE explicitly or it might not work!

Update 14.03.2018: After some exchanges with Nigel Bayliss, the behaviour described here has been filed as unpublished bug 27626925: OPTIMIZER ADAPTIVE STATS DEFAULT FALSE NOT HONORED WHEN ENABLED IN OCT OR JAN BP. It will be fixed starting with April’s bundle patch.

 

According to Nigel’s blog post:

The Oracle 12.1.0.2 October 2017 BP and the Adaptive Optimizer

if you installled the patch 22652097 prior to apply the Bundle Patch 171018, the BP apply in the database should recognize that the patch was already in place and keep it activated. This is done through the fix control 26664361.

When fix_control 26664361:0 -> Patch 22652097 is not enabled: the parameter optimizer_adaptive_features (OAF) works

When fix_control 26664361:1 -> Patch 22652097 is enabled; optimizer_adaptive_features is discarded and the two new parameters have the priority: optimizer_adaptive_plans (OAP) and optimizer_adaptive_statistics (OAS).

But at my customer, I had another behavior.

My patching story might be very similar to yours!

When I started upgrading my customer’s database to 12c in early 2015, I experienced very soon the infamous problems with SQL Plan Directives (SPD) and Adaptive Dynamic Sampling (ADS) that I described in my paper: ADAPTIVE FEATURES OR: HOW I LEARNED TO STOP WORRYING AND TROUBLESHOOT THE BOMB .

Early fixes

When I was new to the problem, the quick fix for the problematic applications was to set OAF to FALSE.

Later, I discovered some more details and decided to opt for setting:

In other cases, I disabled the specific directives that were causing problems.

But many databases did not have so many problems, and I left the defaults.

Patch 22652097 on top of BP170718 

At some point, me and my customer decided to apply the fix 22652097, on top of BP170718 that was our current patch level at that time.

The patch installation on a test database was complaining about the optimizer_adaptive_feature set: this parameter was not used anymore. This issue is nicely explained by Flora in her post Patch 22652097 in 12.1 makes optimizer_adaptive_features parameter obsolete.

In order to apply that patch on the remaining databases, we did:

  • alter system reset optimizer_adaptive_features;
  • alter system reset “_optimizer_dsdir_usage_control”;
  • Applied the patch on binaries and datapatch on the databases.

The result at this point was that:

  • optimizer_adaptive_features was not set
  • optimizer_adaptive_plans was set to true
  • optimizer_adaptive_statistics was set to false.

It might seems superflous to say, but it’s not, the SQL Plan Directives were not used anymore: no Adaptice Dynamic Sampling and no performance problems.

Bundle Patch 180116

Three weeks ago, we installled the last Bundle Patch in order to fix some Grid Infrastructure problems, and the BP, as described in Nigel’s note (and Mike Dietrich and many other bloggers :-)) contains the patch 22652097.

According to Nigel’s post, the patch installation should have detected that the patch 22652097 was already there and activate it.

And indeed, after we applied the BP, the fix_control 26664361 was set to 1 (that means that the patch 22652097 is enabled). So we went live with this setup without additional checks.

One week later, we started experiencing performance problems again. I noticed immediately that the Adaptive Dynamic Sampling was very aggressive again, and the SQL Plan Directives used again.

But the fix was there AND ENABLED!

After a few tests, I realized that the SPD is not used anymore if I set optimizer_adaptive_statistics EXPLICITLY to false.

optimizer_adaptive_statistics must be set explicitly, the default does not work

And here’s the proof:

I use once again the great SPD example by Tim Hall (sorry Tim, it’s not the first time that I steal your work 🙂 ) . You can find here:

SQL Plan Directives in Oracle Database 12c Release 1 (12.1)

After applying the BP, I have the default parameter, not set explicitly, and the fix_control enabled:


If I run the test statement (again, find it here https://oracle-base.com/articles/12c/sql-plan-directives-12cr1) the directives are used:


but then I set the parameter explicitly:

and the SPD usage (and consequently, ADS), are gone:

Conclusion

Set the parameter EXPLICITLY when you apply the BP that contains the fix.

And ALWAYS test the behavior!

You can check how many statements use the dynamic sampling by following this short blog post by Dominic Brooks:

Which of my sql statements are using dynamic sampling?

HTH

My own Dbvisit Replicate integration with Grid Infrastructure

I am helping my customer for a PoC of Dbvisit Replicate as a logical replication tool. I will not discuss (at least, not in this post) about the capabilities of the tool itself, its configuration or the caveats that you should beware of when you do logical replication. Instead, I will concentrate on how we will likely integrate it in the current environment.

My role in this PoC is to make sure that the tool will be easy to operate from the operational point of view, and the database operations, here, are supported by Oracle Grid Infrastructure and cold failover clusters.

Note: there are official Dbvisit  online resources  about how to configure Dbvisit Replicate in a cluster. I aim to complement those informations, not copy them.

Quick overview

If you know Dbvisit replicate, skip this paragraph.

There are three main components of Dbvisit Replicate: The FETCHER, the MINE and the APPLY processes. The FETCHER gets the redo stream from the source and sends it to the MINE process. The MINE process elaborates the redo streams and converts it in proprietary transaction log files (named plog). The APPLY process gets the plog files and applies the transactions on the destination database.

From an architectural point of view, MINE and APPLY do not need to run close to the databases that are part of the configuration. The FETCHER process, by opposite, needs to be local to the source database online log files (and archived logs).

Because the MINE process is the most resource intensive, it is not convenient to run it where the databases reside, as it might consume precious CPU resources that are licensed for Oracle Database. So, first step in this PoC: the FETCHER processes will run on the cluster, while MINE and APPLY will run on a dedicated Virtual Machine.

dbvisit_gi_overview

Clustering considerations

  • the FETCHER does NOT need to run on the server of the source database: having access to the online logs through the ASM instance is enough
  • to avoid SPoF, the fetcher should be a cluster resource that can relocate without problems
  • to simplify the configuration, the FETCHER configuration and the Dbvisit binaries should be on a shared filesystem (the FETCHER does not persist any data, just the logs)
  • the destination database might be literally anywhere: the APPLY connects via SQL*Net, so a correct name resolution and routing to the destination database are enough

so the implementation steps are:

  1. create a shared filesystem
  2. install dbvisit in the shared filesystem
  3. create the Dbvisit Replicate configuration on the dedicated VM
  4. copy the configuration files on the cluster
  5. prepare an action script
  6. configure the resource
  7. test!

Convention over configuration: the importance of a strong naming convention

Before starting the implementation, I decided to put all the caveats related to the FETCHER  resource relocation on paper:

  • Where will the configuration files reside? Dbvisit has an important variable: the Configuration Name. All the operations are done by passing a configuration file named /{PATH}/{CONFIG_NAME}/{CONFIG_NAME}-{PROCESS_TYPE}.ddc to the dbvrep binary. So, I decided to put ALL the configuration directories under the same path: given the Configuration Name, I will always be able to get the configuration file path.
  • How will the configuration files relocate from one node to the other? Easy here: they won’t. I will use an ACFS filesystem
  • How can I link the cluster resource with its configuration name? Easy again: I call my resources dbvrep.CONFIGNAME.PROCESS_TYPE. e.g. dbvrep.FROM_A_TO_B.fetcher
  • How will I manage the need to use a new version of dbvisit in the future? Old and new versions must coexist: Instead of using external configuration files, I will just use a custom resource attribute named DBVREP_HOME inside my resource type definition. (see later)
  • What port number should I use? Of course, many fetchers started on different servers should not have conflicts. This is something that might be either planned or made dynamic. I will opt for the first one. But instead of getting the port number inside the Dbvisit configuration, I will use a custom resource attribute: DBVREP_PORT.

Considerations on the FETCHER listen address

This requires a dedicated paragraph. The Dbvisit documentation suggest to  create a VIP, bind on the VIP address and create a dependency between the FETCHER resource and the VIP. Here is where my configuration will differ.

Having a separate VIP per FETCHER resource might, potentially, lead to dozens of VIPs in the cluster. Everything will depend on the success of the PoC and on how many internal clients will decide to ask for such implementation. Many VIPs == many interactions with network admins for address reservation, DNS configurations, etc. Long story short, it might slow down the creation and maintenance of new configurations.

Instead, each FETCHER will listen to the local server address, and the action script will take care of:

  • getting the current host name
  • getting the current ASM instance
  • changing the settings of the specific Dbvisit Replicate configuration (ASM instance and FETCHER listen address)
  • starting the FETCHER

Implementation

Now that all the caveats and steps are clear, I can show how I implemented it:

Create a shared filesystem

Install dbvisit in the shared filesystem

Create the Dbvisit Replicate configuration on the dedicated VM

Copy the configuration files from the Dbvisit VM to the cluster

Prepare an action script

Configure the resource

Test!

 

Also the relocation worked as expected: when the settings are modified through:

The MINE process get the change dynamically, so no need to restart it.

Last consideration

Adding a hard dependency between the DB and the FETCHER will require to stop the DB with the force option or to always stop the fetcher before the database. Also, the start of the DB will pullup the FETCHER (pullup:always) and the opposite as well. We will consider furtherly if we will use this dependency or if we will manage it differently (e.g. through the action script).

The hard dependency declared without the global keyword, will always start the fetcher on the server where the database runs. This is not required, but it might be nice to see the fetcher on the same node. Again, a consideration that we will discuss furtherly.

HTH

Ludovico

Get the Most out of Oracle Data Guard – The material

Here we go: as usual, the feedback that I usually get after my talks (specifically, after POUG High Five conference), is if I will share my demo scripts and material.

Sadly, the demos I am doing for my presentation “Get the most out of Oracle Data Guard” are quite tied to an environment built for the purpose of the demos. So, do not expect to get scripts easy to use as is, but rather to get some ideas beyond the demo themselves.

I hope they will help to get the whole picture.

Of course, if you need to implement a cloning strategy based on Data Guard or any other solution that I describe in this post, please feel free to contact me, I will be glad to help you implement it in your environment.

Slides

Demo 1

Video:

Scripts:

 

Demo 2

Video:


Scripts:

 

Demo 3

Video:

Scripts:

Preparation:

snap_acfs.pl

 

snap_databasae.pl

clone_from_snap.pl

Cheers

Ludovico

12.1.0.2 Bundle Patch 170718 breaks Data Guard and Duplicate from active database

Recently my customer patched its 12.1.0.2 databases with the Bundle Patch 170718 on the new servers (half of the customer’s environment). The old servers are still on 161018 Bundle Patch.

We realized that we could not move anymore the databases from the old servers to the new ones because the duplicate from active database was failing with this error:

The last lines shows the same error that Franck blogged about some months ago.

Oracle 12.2 had introduced incompatibility with previous releases in remote file transfer via SQL*Net. At least this is what it seems. According to Oracle, this is due to a bugfix present in Oracle 12.2

Now, the bundle patch that we installed on BP 170718 contains the same bugfix (Patch for bug 18633374).

So, the incompatibility happens now between databases of the same “Major Release” (12.1.0.2).

There are two possible workarounds:

  1. Apply the same patch level on both sides (BP170718 in my case)
  2. Apply just the patch 18633374 on top of your current PSU/DBBP (a merge might be necessary).

We used the second approach and now we can setup Data Guard again to move our databases without downtime:

HTH

Ludovico

 

 

Which Oracle Databases use most CPU on my server?

Assumptions

  • You have many (hundreds) of instances and more than a couple of servers
  • One of your servers have high CPU Load
  • You have Enterprise Manager 12c but the Database Load does not filter by server
  • You want to have an historical representation of the user CPU utilization, per instance

Getting the data from the EM Repository

With the following query, connected to the SYSMAN schema of your EM repository, you can get the hourly max() and/or avg() of user CPU by instance and time.

Suppose you select just the max value: the result will be similar to this:

 

Putting it into excel

There are one million ways to do something more reusable than excel (like rrdtool scripts, gnuplot, R, name it), but Excel is just right for most people out there (including me when I feel lazy).

  • Configure an Oracle Client and add the ODBC data source to the EM repository:

odbc_emrep

  • Open Excel, go to “Data” – “Connections” and add a new connection:
    • Search…
    • New Source
    • DSN ODBC
  • Select your new ODBC data source, user, password
  • Uncheck “Connection to a specific table”
  • Give a name and click Finish
  • On the DSN -> Properties -> Definition, enter the SQL text I have provided previously

connection_properties_odbc_excel

The result should be something similar: ( but much longer :-))

first_step_excelPivoting the results

Create e new sheet and name it “pivot”, Click on “Create Pivot Table”, select your data and your dimensions:

pivotThe result:

pivotedCreating the Graph

Now that the data is correctly formatted, it’s easyy to add a graph:

just select the entire pivot table and create a new stacked area graph.

The result will be similar to this:

graph_cpu_load_excel

With such graph, it is easy to spot which databases consumed most CPU on the system in a defined period, and to track the progress if you start a “performance campaign”.

For example, you can see that the “green” and “red” databases were consuming constantly some CPU up to 17.05.2017 and then some magic solved the CPU problem for those instances.

It is also quite convenient for checking the results of new instance caging settings…

The resulting CPU will not necessarily be 100%: the SYS CPU time is not included, as well as the user CPU of all the other processes that are either not DB or not monitored with Enterprise Manager.

HTH

Ludovico

Another problem with “KSV master wait” and “ASM file metadata operation”

My customer today tried to do a duplicate on a cluster. When preparing the auxiliary instance, she noticed that the startup nomount was hanging forever: Nothing in the alert, nothing in the trace files.

Because the database and the spfile were stored inside ASM, I’ve been quite suspicious…

The ASM trace files had the following entries:

The ASM instance had the following sessions waiting:

OMS?

Around 12:38:56, another colleague in the office added a disk to one of the disk groups, through Enterprise Manager 12c!

But there were no rebalance operations:

It’s not the first time that I hit this type of problems. Sadly, sometimes it requires a full restart of the cluster or of ASM (because of different bugs).

This time, however, I have tried to kill only the foreground sessions waiting on “ASM file metadata operation”, starting with the one coming from the OMS.

Surprisingly, after killing that session, everything was fine again:

I never add disks via OMS (I’m a sqlplus guy ;-)) , I wonder what went wrong with it 🙂

Ludovico

RMAN Catalog Housekeeping: how to purge the old incarnations

First, let me apologize because every post in my blog starts with a disclaimer… but sometimes it is really necessary. 😉

Disclaimer: this blog post contains PL/SQL code that deletes incarnations from your RMAN recovery catalog. Please DON’T use it unless you deeply understand what you are doing, as it can compromise your backup and recovery strategy.

Small introduction

You may have a central RMAN catalog that stores all the backup metadata for your databases. If it is the case, you will have a database entry for each of your databases and a new incarnation entry for each duplicate, incomplete recovery or  flashback (or whatever).

You should also have a delete strategy that deletes the obsolete backups from either your DISK or SBT_TAPE media. If you have old incarnations, however, after some time you will notice that their information never goes away from your catalog, and you may end up soon or later to do some housekeeping. But there is nothing more tedious than checking and deleting the incarnations one by one, especially if you have average big numbers like this catalog:

Where db, dbinc, bdf and brl contain reslectively the registered databases, incarnations, datafile backups and archivelog backups.

Different incarnations?

Consider the following query:

You can run it safely: it returns the list of incarnations hierarchically connected to their parent, by database name, key and level.

Then you have several types of behaviors:

  • Normal databases (created once, never restored or flashed back) will have just one or two incarnations (it depends on how they are created):

They are usually the ones that you may want to keep in your catalog, unless the database no longer exist: in this case perhaps you omitted the deletion from the catalog when you have dropped your database?

  • Flashed back databases (flashed back multiple times) will have as many incarnations as the number of flashbacks, but all connected with the incarnation prior to the flashback:

Here, despite you have several incarnations, they all belong to the same database (same DB_KEY and DBID), then you must also keep it inside the recovery catalog.

  • Non-production databases that are frequently refreshed from the production database (via duplicate) will have several incarnations with different DBIDs and DB_KEY:

This is usually the most frequent case: here you want to delete the old incarnations, but only as far as there are no backups attached to them that are still in the recovery window.

  • You may also have orphaned incarnations:

In this case, again, it depends whether the DBID and DB_KEY are the same as the current incarnation or not.

What do you need to delete?

Basically:

  • Incarnations of databases that no longer exist
  • Incarnations of existing databases where the database has a more recent current incarnation, only if there are no backups still in the retention window

How to do it?

In order to be sure 100% that you can delete an incarnation, you have to verify that there are no recent backups (for instance, no backups more rercent than the current recovery window for that database). If the database does not have a specified recovery window but rather a default “CONFIGURE RETENTION POLICY TO REDUNDANCY 1; # default”, it is a bit more problematic… in this case let’s assume that we consider “old” an incarnation that does not backup since 1 year (365 days), ok?

Getting the last backup of each database

Sadly, there is not a single table where you can verify that. You have to collect the information from several tables. I think bdf, al, cdf, bs would suffice in most cases.

When you delete an incarnation you specify a db_key: you have to get the last backup for each db_key, with queries like this:

Putting together all the tables:

Getting the  recovery window

The configuration information for each database is stored inside the conf table, but the retention information is stored in a VARCHAR2, either ‘TO RECOVERY WINDOW OF % DAYS’ or ‘TO REDUNDANCY %’

You need to convert it to a number when the retention policy is recovery windows, otherwise you default it to 365 days wher the redundancy is used. You can add a column and a join to the query:

and eventually, either display if it the incarnation is no more used or filter by usage:

Delete the incarnations!

You can delete the incarnations with this procedure:

This procedure will raise an exception (-20001, ‘Database not found’) when a database does not exist anymore (either already deleted by this procedure or by another session), so you need to handle it.

Putting all together:

I have used this procedure today for the first time and it worked like a charm.

However, if you have any adjustment or suggestion, don’t hesitate to comment it 🙂

HTH

DBMS_QOPATCH, datapatch, rollback, apply force

I am working for a customer on a quite big implementation of Cold Failover Cluster with Oracle Grid Infrastructure on Linux. I hope to have some material to publish soon about it! However, in this post I will be talking about patching the database in a cold-failover environment.

DISCLAIMER: I use massively scripts provided in this great blog post by Simon Pane:

https://www.pythian.com/blog/oracle-database-12c-patching-dbms_qopatch-opatch_xml_inv-and-datapatch/

Thank you Simon for sharing this 🙂

Intro

We are not yet in the process of doing out-of-place patching; at the moment the customer prefers to do in-place patching:

  • evacuate a node by relocating all the databases on other nodes
  • patching the node binaries
  • move back the databases and patch them with datapatch
  • do the same for the remaining nodes

I beg to disagree with this method, being a fan of having many patched golden copies distributed on all servers and patching the databases by just changing the ORACLE_HOME and running datapatch (like Rapid Home Provisioning does). But, this is the situation today, and we have to live with it.

Initial situation

  • Server 1, 2 and 3: one-off 20139391 applied
  • New database created

cfc_qopatch1When the DBCA creates a new database, in 12.1.0.2, it does not run datapatch by default, thus, the database does not have any patches installed.

However, this specific one-off patch does not modify anything in the database (sql_patch=false)

and the datapatch runs without touching the db:

Next step: I evacuate the server 2 and patch it, then I relocate my database on it

cfc_qopatch2

Now the database is not at the same level of the binaries and need to be patched:

The column CONSTITUENT is important here because it tells us what the parent patch_id is. This is the column that we have to check when we want to know if the patch has been applied on the database.

Now the patch is visible inside the dba_registry_sqlpatch:

Notice that the child patches are not listed in thie view.

Rolling back

Now, one node is patched, but the others are not. What happen if I relocate the patched database to a non-patched node?

cfc_qopatch3

The patch is applied inside the database but not in the binaries!

If I run datapatch again, the patch is rolled back:

The patch has been rolled back according to the datapatch, and the action is shown in the dba_registry_sqlpatch:

But if I look at the logfile, the patch had some errors:

Indeed, the patch looks still there:

If I try to run it again, it does nothing/it fails saying the patch is not there:

What does it say on the patched node?

Whaaat? datapatch there says that the patch IS in the registry and there’s nothing to do. Let’s try to force its apply again:

Conclusion

I’m not sure whether it is safe to run the patched database in a non-patched Oracle Home. I guess it is time for a new SR 🙂

Meanwhile, we will try hard not to relocate the databases once they have been patched.

Cheers

Ludo

Getting the Oracle Homes in a server from the oraInventory

The information contained in the oratab should always be updated, but it is not always reliable. If you want to know what Oracle installations you have in a server, better to get it from the Oracle Universal Installer or, if you want some shortcuts, do some grep magics inside the inventory with the shell.

The following diagram is a simplified structure of the inventory that shows what entries are present in the central inventory (one per server) and the local inventories (one per Oracle Home).

inventory_structureYou can use this simple function to get some content out of it, including the edition (that information is a step deeper in the local inventory).

HTH

Loading resolved Adaptive Plans in the SQL Plan Management

In my previous post, I have shown that loading Adaptive Plans in the SQL Plan Baseline leads to using the original plan. Well, actually, this is true when you capture them via the OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES parameter.

Thanks to a tweet by Neil Chandler, I’ve realized that it was a good idea to show also the case when the plan is loaded manually.

When the adaptive plan switches to the alternative plan, the plan_hash_value also changes, and can be loaded manually in the baseline with DBMS_SPM.

Let’s reset everything and retry quickly to:

  • Capture the plan automatically (this will lead to the original plan)
  • Load the plan manually (I will specify to load the alternative plan, if resolved)
  • Drop the plan captured automatically
  • Use the newly accepted baseline

To recap:

  • The capture process will always load the original plan
  • It is possible to decide to load manually the original one or the alternative one (if resolved)
  • Using automatic capture is a bad idea

HTH

Ludo