Getting the Oracle Homes in a server from the oraInventory

The information contained in the oratab should always be kept up to date, but it is not always reliable. If you want to know which Oracle installations you have on a server, it is better to get the list from the Oracle Universal Installer or, if you want some shortcuts, do some grep magic inside the inventory with the shell.

The following diagram shows a simplified structure of the inventory, with the entries present in the central inventory (one per server) and in the local inventories (one per Oracle Home).

[Image: inventory_structure]

You can use the simple shell function sketched below to get some content out of it, including the edition (that information sits one step deeper, in the local inventory).
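
The original function is longer; here is a minimal sketch under the usual Linux layout (it assumes /etc/oraInst.loc points to the central inventory, and the edition file path in the comment is an assumption based on typical 12c installations):

  #!/bin/bash
  # List the Oracle Homes registered in the central inventory.
  lshomes () {
    local invloc
    invloc=$(awk -F= '/^inventory_loc=/{print $2}' /etc/oraInst.loc)
    grep '<HOME NAME=' "${invloc}/ContentsXML/inventory.xml" |
      grep -v 'REMOVED="T"' |
      sed 's/.*NAME="\([^"]*\)" LOC="\([^"]*\)".*/\1 \2/'
  }
  # The edition (EE/SE) sits one step deeper, in each Home's local
  # inventory; the exact file below is an assumption (typical 12c layout):
  #   grep oracle_install_db_InstallType \
  #     $ORACLE_HOME/inventory/globalvariables/oracle.server/globalvariables.xml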

HTH

Loading resolved Adaptive Plans in the SQL Plan Management

In my previous post, I have shown that loading Adaptive Plans into the SQL Plan Baseline leads to using the original plan. Well, actually, this is true when you capture them via the OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES parameter.

Thanks to a tweet by Neil Chandler, I realized that it would be a good idea to also show the case when the plan is loaded manually.

When the adaptive plan switches to the alternative plan, the plan_hash_value also changes, and the resolved plan can be loaded manually into the baseline with DBMS_SPM.
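
A minimal sketch of such a manual load (sql_id and plan_hash_value are placeholders to be taken from the cursor cache):

  SET SERVEROUTPUT ON
  DECLARE
    l_plans PLS_INTEGER;
  BEGIN
    -- Load only the resolved (alternative) plan into the baseline.
    l_plans := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
                 sql_id          => '&sql_id',       -- placeholder
                 plan_hash_value => &resolved_phv);  -- placeholder
    DBMS_OUTPUT.PUT_LINE('Plans loaded: ' || l_plans);
  END;
  /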

Let’s reset everything and retry quickly to:

  • Capture the plan automatically (this will lead to the original plan)
  • Load the plan manually (I will specify to load the alternative plan, if resolved)
  • Drop the plan captured automatically (a sketch of the drop call follows this list)
  • Use the newly accepted baseline
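
The drop, sketched here with placeholder identifiers (read the real sql_handle and plan_name from DBA_SQL_PLAN_BASELINES):

  DECLARE
    l_plans PLS_INTEGER;
  BEGIN
    -- Drop the automatically captured (original) plan, keeping only
    -- the manually loaded, resolved one.
    l_plans := DBMS_SPM.DROP_SQL_PLAN_BASELINE(
                 sql_handle => '&sql_handle',                -- placeholder
                 plan_name  => '&auto_captured_plan_name');  -- placeholder
  END;
  /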

To recap:

  • The capture process will always load the original plan
  • It is possible to decide to load manually the original one or the alternative one (if resolved)
  • Relying on automatic capture to get the resolved plan is a bad idea

HTH

Ludo

How do Adaptive Plans work with SQL Plan Baselines?

Disclaimer: after writing this post (but before publishing it) I saw that other people had already blogged about it, so I am a bit ashamed to publish it anyway… but that’s a blogger’s life 🙂

On Wednesday I got a nice question after my presentation about Adaptive Features at the DOAG16 conference:

What happens when you load an adaptive plan in a SQL Plan Baseline?
Does it load only the final plan or does it load the whole plan including the inactive operations? Will the plan be evaluated again using the inflection point?

I decided to do some tests in order to give the best possible answer. I did not spend time reinventing a way to produce an adaptive plan: Tim Hall has already published an excellent test case to create and alter an adaptive plan on his blog, so I have reused most of his code. Thanks Tim :-).

I will not post all the code (please find it in Tim’s post); I will go straight to the plans.

First: I have an adaptive plan that resolves to NESTED LOOPS:

Second: I load the plan (lazy way: using baseline capture at session level)
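
The capture itself is just a session parameter (remember that automatic capture only considers statements executed more than once):

  -- The "lazy way": capture a baseline for every repeated statement.
  ALTER SESSION SET optimizer_capture_sql_plan_baselines = TRUE;
  -- ... run the statement (at least twice) ...
  ALTER SESSION SET optimizer_capture_sql_plan_baselines = FALSE;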

Third: re-run the statement and check the plan
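
The plan output itself is not reproduced here; a quick way to check it is the +ADAPTIVE format switch of DBMS_XPLAN (the sql_id is a placeholder):

  SELECT *
  FROM   TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
           sql_id => '&sql_id', format => '+ADAPTIVE'));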

It does not look adaptive, but I can also check with the function DBMS_XPLAN.DISPLAY_SQL_PLAN_BASELINE:
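
A sketch of that check (the sql_handle is a placeholder taken from DBA_SQL_PLAN_BASELINES):

  SELECT *
  FROM   TABLE(DBMS_XPLAN.DISPLAY_SQL_PLAN_BASELINE(
           sql_handle => '&sql_handle', format => 'TYPICAL'));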

Again, despite the Note section saying it is adaptive, it does not look like an adaptive plan.

Can I trust this information? Of course I did not, so I checked the plan with and without the baseline after changing the rows to force a plan switch to HJ (again taking Tim’s example):

After changing the rows:

  • when I do not use the baseline, the plan resolves to HASH JOIN
  • when I use it, the baseline forces NESTED LOOPS.

So the plan in the baseline is not adaptive, and it forces whatever has been loaded. Is it the final plan or the original one? I have to capture it again to see if a new baseline appears:

A new baseline does not appear, so it looks like the capture process considers the original plan and not the resolved one! To be 100% sure, let’s drop the existing one and redo the test:

So, despite the fact that I have an adaptive plan that switches from NL to HJ, only the NESTED LOOPS operations are captured in the baseline; I can infer that only the original plan is loaded as a SQL Plan Baseline.


How to fix a CPU usage problem in 12c due to DBMS_FEATURE_AWR

I love my job because I always have surprises. This week’s surprise has been another problem related to SQL Plan Directives in 12c. Because it is a common problem that potentially affects ALL customers, I am glad to share the solution on my blog 😀

Symptom of the problem: High CPU usage on the server

My customer’s DBA team has spotted consistently high CPU utilisation on its servers:

[Image: spd_awr_high_cpu_sar]

Every day, at the same time and for 20-40 minutes, the servers hosting the Oracle databases literally run out of CPU.

[Image: spd_awr_high_cpu_em]


Troubleshooting

Ok, it would be too easy to give the solution now. If you cannot wait, jump to the end of this post. But what I like most is explaining how I came to it.

First, I took a look at the processes consuming CPU. Most of the servers have many consolidated databases on them. Surprisingly, this is what I found:

[Image: spd_awr_high_cpu_m001]

It seems that the source of the problem is not a single database, but all of them. And I see another pattern here: the CPU usage always comes from the [m001] process, an MMON slave, so it is not related to a user process.

My customer has the Diagnostic Pack, so it is easy to go deeper (you can get a similar result with free tools like S-ASH, Statspack and Snapper). This is what I found in the Instance Top Activity:

[Image: spd_awr_high_cpu_inst]

Ok, everything comes from a single query with sql_id auyf8px9ywc6j. This is the full sql_text:

It looks like something written by a DBA, but it comes from MMON.

Looking around, it seems closely related to two PL/SQL calls that I could find in the SQL Monitor and that systematically fail every day:

[Image: spd_cpu_sql_monitor]

The DBMS_FEATURE_AWR function internally calls the SQL auyf8px9ywc6j.

The MOS does not know anything about that query, but the internet does:

[Image: spd_awr_franck]

Oh no, not Franck again! He always discovers new stuff and blogs about it before I do 🙂

In his blog post, he points out that the query fails because of error ORA-12751 (the resource plan limiting CPU usage) and that it is a problem of Adaptive Dynamic Sampling. Is it true?

What I like to do when I have a problematic sql_id is to run sqld360 by Mauro Pagano, but the resulting zip file does not contain anything useful, because there are actually no executions and no plans.

During the execution of the statement (or better, during the period with high CPU usage), there is an entry in v$sql, but no plan associated with it:
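
A sketch of the kind of check I mean:

  -- Is the statement in the cursor cache, and does it have a plan yet?
  SELECT sql_id, executions, loads, plan_hash_value
  FROM   v$sql
  WHERE  sql_id = 'auyf8px9ywc6j';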

And this is very likely because the statement is still parsing, and all the time is spent on Dynamic Sampling. But because the plan is not there yet, I cannot check it with DBMS_XPLAN.DISPLAY_CURSOR.

I then decided to trace it with these two statements:
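
I am not reproducing the original statements; my assumption of what was used is the per-sql_id trace event:

  -- Enable SQL trace only for the problematic statement, system-wide:
  ALTER SYSTEM SET EVENTS 'sql_trace[SQL: auyf8px9ywc6j]';
  -- ... wait for the next MMON execution, then switch it off:
  ALTER SYSTEM SET EVENTS 'sql_trace[SQL: auyf8px9ywc6j] off';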

At the next execution I could indeed see, in the trace file, the Adaptive Dynamic Sampling, the error due to the CPU limit exhausted in the resource plan, and the directives that caused the Adaptive Dynamic Sampling:


So, there are some SQL Plan Directives that force the CBO to run ADS for this query.

This query touches three tables, so instead of relying on the DIRECTIVE_IDs, it is better to get the directives by object name:
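
A sketch of such a lookup, with placeholder table names, over the standard 12c dictionary views for directives:

  SELECT d.directive_id, o.owner, o.object_name,
         d.type, d.state, d.reason
  FROM   dba_sql_plan_directives  d
         JOIN dba_sql_plan_dir_objects o
              ON o.directive_id = d.directive_id
  WHERE  o.object_name IN ('&TAB1', '&TAB2', '&TAB3')  -- placeholders
  ORDER  BY d.directive_id;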

Solution

At this point, the solution is the same one already pointed out in one of my previous blog posts: disable the directives individually!
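
The original block is not reproduced here; a minimal sketch that does the same thing, again with placeholder table names:

  BEGIN
    -- Disable every directive that references the tables touched
    -- by sql_id auyf8px9ywc6j.
    FOR d IN (SELECT DISTINCT sd.directive_id
              FROM   dba_sql_plan_directives  sd
                     JOIN dba_sql_plan_dir_objects so
                          ON so.directive_id = sd.directive_id
              WHERE  so.object_name IN ('&TAB1', '&TAB2', '&TAB3'))
    LOOP
      DBMS_SPD.ALTER_SQL_PLAN_DIRECTIVE(d.directive_id, 'ENABLED', 'NO');
    END LOOP;
  END;
  /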

This very same PL/SQL block must be run on ALL the 12c databases affected by this Adaptive Dynamic Sampling problem on the sql_id auyf8px9ywc6j.

If you have just migrated the database to 12c, it would make even more sense to programmatically “inject” the disabled SQL Plan Directives into every freshly created or upgraded 12c database (until Oracle releases a patch for this non-bug).

It goes without saying that the next execution has been very quick, consuming almost no CPU and without using ADS.

HTH

Ludovico


Getting the DBID and Incarnation from the RMAN Catalog

Using the RMAN catalog is an option. There is a long-standing discussion among DBAs on whether you should use the catalog or not.

But because I like (a lot) the RMAN catalog and I generally use it, I assume that most of you do it 😉

When you want to restore from the RMAN catalog, you need to get the DBID of the database you want to restore and, sometimes, also the incarnation key.

The DBID is used to identify the database you want to restore. The DBID is different for every newly created or duplicated database, but beware: if you duplicate your database manually (using restore/recover), you need to change the DBID afterwards with the nid tool, otherwise you will end up having more than one database registered in the catalog with the very same DBID. This is evil! The DB_NAME is also something that you may want to keep unique within your database farm.

The Incarnation Key changes whenever you do an “open resetlogs”, following for example a flashback database, an incomplete recovery, or just an “open resetlogs” without any specific need.

[Image: database incarnation timeline before and after the open resetlogs]

In the image, you can see that you may want to restore to a point in time after the open resetlogs (blue incarnation) or before it (red incarnation). Depending on which one you need to restore, you may need to use the command RESET DATABASE TO INCARNATION.

https://docs.oracle.com/database/121/RCMRF/rcmsynta2007.htm#RCMRF148

If you have a dynamic and big environment, you probably script your restore procedures; that’s why getting the DBID and incarnation key with RMAN commands may be more complex than just querying the catalog with sqlplus.

How do I get the history of my database incarnations?

You can get it easily for all your databases using a handy hierarchical query on the RMAN catalog (db names and ids are obfuscated for obvious reasons):
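
The original query is obfuscated; this sketch over the standard catalog views (RC_DATABASE, RC_DATABASE_INCARNATION) shows the idea:

  -- Incarnation tree per database, ordered by resetlogs time.
  SELECT db.name, db.dbid, i.dbinc_key,
         LPAD(' ', 2 * (LEVEL - 1)) ||
           TO_CHAR(i.resetlogs_time, 'YYYY-MM-DD HH24:MI:SS') AS resetlogs_time,
         i.current_incarnation
  FROM   rc_database_incarnation i
         JOIN rc_database db ON db.db_key = i.db_key
  START  WITH i.parent_dbinc_key IS NULL
  CONNECT BY PRIOR i.dbinc_key = i.parent_dbinc_key
  ORDER  SIBLINGS BY i.resetlogs_time;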

What about getting the correct DBID/DBINC_KEY pair for a specific database/time?

You can get the time windows for each incarnation using the lead() analytical function:
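
Something like this (again a sketch, not the original query):

  -- Each incarnation with its reset time and the next reset time.
  SELECT db.name, db.dbid, i.dbinc_key, i.resetlogs_time,
         LEAD(i.resetlogs_time)
           OVER (PARTITION BY i.db_key
                 ORDER BY i.resetlogs_time) AS next_resetlogs_time
  FROM   rc_database_incarnation i
         JOIN rc_database db ON db.db_key = i.db_key;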

With this query, you can see that every incarnation has a reset time and a “next reset time”.

It’s easy then to get exactly what you need by adding a couple of where clauses:
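
For example (database name and restore time are placeholders):

  WITH inc AS (
    SELECT db.name, db.dbid, i.dbinc_key, i.resetlogs_time,
           LEAD(i.resetlogs_time)
             OVER (PARTITION BY i.db_key
                   ORDER BY i.resetlogs_time) AS next_resetlogs_time
    FROM   rc_database_incarnation i
           JOIN rc_database db ON db.db_key = i.db_key
  )
  SELECT name, dbid, dbinc_key
  FROM   inc
  WHERE  name = '&db_name'
  AND    resetlogs_time <= TO_DATE('&until_time', 'YYYY-MM-DD HH24:MI:SS')
  AND    (next_resetlogs_time IS NULL
          OR next_resetlogs_time > TO_DATE('&until_time', 'YYYY-MM-DD HH24:MI:SS'));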

So, if I need to restore the database 1465419F until time 2016-01-20 00:00:00, I need to set DBID=1048383773 and reset the database to incarnation 1256014297.
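
In RMAN that translates into something like this sketch (the catalog connect string is a placeholder, and the usual spfile/controlfile restore steps are omitted):

  # placeholder connect string:
  rman target / catalog rman/&rman_pwd@rcat
  SET DBID 1048383773;
  # (once the controlfile is restored and the database is mounted)
  RESET DATABASE TO INCARNATION 1256014297;
  RUN {
    SET UNTIL TIME "TO_DATE('2016-01-20 00:00:00','YYYY-MM-DD HH24:MI:SS')";
    RESTORE DATABASE;
    RECOVER DATABASE;
  }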

Cheers

Ludo

Recording of “Rapid Home Provisioning” webinar for the RAC SIG

Yesterday I presented the Oracle Rapid Home Provisioning technology for the RAC SIG; you can find the recording on YouTube:

Cheers

Ludo

Rapid Home Provisioning

In a few days I will give a presentation at UKOUG Tech15 about Rapid Home Provisioning; it will be the first time that I present this session in public.

I usually like to give the link to the material to my audience, so here we go:

Slides:

Demo:

Enjoy

Ludovico

Oracle Active Data Guard and Global Data Services in Action!

In a few days I will give a presentation at UKOUG Tech15 about Global Data Services; it will be the first time that I present this session.

I usually like to give the link to the material to my audience, so here we go:

Credits

I have to give special credit to my colleague Robert Bialek. I got a late confirmation for this session and my slide deck was not ready at all, so I have used a big part of his original work. Most of the content included in the slides has been created by Robert, not me. (Thank you for your help! :-))

Slides

Demo recording

Demo script

And the script to revert the demo:

Cheers

Ludovico


Oracle Database on ACFS: a perfect marriage?

Update: I will give this presentation at UKOUG Tech15, Wed 9 December at 14:30.

This presentation has had a very poor score in selections for conferences (no OOW, no DOAG), but people liked it very much at the Paris Oracle Meetup. The Database on ACFS is mainstream now, thanks to the new ODA releases, so some knowledge about why and how you should (or should not) run databases on ACFS is definitely worth a read.

Slides

Demo 1 recording

Demo 2 recording

Demo script (DB ACFS clone from Standby Database)


Comments are, as always, very appreciated 🙂

Ludo

Migrating Oracle RAC from SuSE to OEL (or RHEL) live

I have a customer that needs to migrate its Oracle RAC cluster from SuSE to OEL.

I know, I know, there is a paper from Dell and Oracle named:

How Dell Migrated from SUSE Linux to Oracle Linux

That explains how Dell migrated its many RAC clusters from SuSE to OEL. The problem is that they used a different strategy:

– back up the configuration of the nodes
– then, for each node, one at a time:
  – stop the node
  – reinstall the OS
  – restore the configuration and the Oracle binaries
  – relink
  – restart

What I want to achieve instead is:
– add one OEL node to the SuSE cluster as a new node
– remove one SuSE node from the now-mixed cluster
– install/restore/relink the RDBMS software (RAC) on the new node
– move the RAC instances to the new node (taking care to NOT run more than the number of licensed nodes/CPUs at any time)
– repeat (for the remaining nodes)

because the customer will also migrate to new hardware.

In order to test this migration path, I’ve set up a SINGLE NODE cluster (if it works for one node, it will work for two or more).

I have to prepare the new node addition carefully, mostly as I would do for a traditional node addition:

  • Add new ip addresses (public, private, vip) to the DNS/hosts
  • Install the new OEL server
  • Keep the same user and groups (uid, gid, etc)
  • Verify the network connectivity and setup SSH equivalence
  • Check that the multicast connection is ok
  • Add the storage, configure persistent naming (udev) and verify that the disks (major, minor, names) are the very same
  • The network cards also must be the very same

Once the new host is ready, cluvfy stage -pre nodeadd (sketched after the list below) will likely fail due to:

  • Kernel release mismatch
  • Package mismatch
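
The invocation itself, with a placeholder node name, looks like this:

  # Run from an existing cluster node as the Grid Infrastructure owner;
  # "oel01" is a placeholder for the new node's name.
  cluvfy stage -pre nodeadd -n oel01 -verbose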

Here’s an example of output:

So the point is not whether the check succeeds (it will not), but what fails.

Solving all the problems not related to the SuSE-OEL difference is crucial, because addNode.sh will fail with the same errors. I need to run it using the -ignorePrereqs and -ignoreSysPrereqs switches. Let’s see how it works:
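
A sketch of the call, with placeholder node and VIP names and the switches spelled as above (the script lives in $GRID_HOME/oui/bin in 11g, $GRID_HOME/addnode in 12c):

  cd $GRID_HOME/oui/bin
  ./addNode.sh -silent -ignorePrereqs -ignoreSysPrereqs \
    "CLUSTER_NEW_NODES={oel01}" \
    "CLUSTER_NEW_VIRTUAL_HOSTNAMES={oel01-vip}"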

Then, as instructed by addNode.sh, I run root.sh and expect it to work:

Bingo! Let’s check if everything is up and running:

So yes, it works, but remember that it is not a supported long-term configuration.

In my case I expect to migrate the whole cluster from SLES to OEL in one day.

NOTE: using OEL6 as the new target is easy because the interface names do not change. OEL7 changes the interface naming scheme, so if you need to migrate without cluster downtime, you need to set up the new OEL7 nodes following this post: http://ask.xmodulo.com/change-network-interface-name-centos7.html

Otherwise, you need to configure a new interface name for the cluster with oifcfg.

HTH

Ludovico