Loading resolved Adaptive Plans in the SQL Plan Management

In my previous post, I have shown that loading Adaptive Plans in the SQL Plan Baseline leads to using the original plan. Well, actually, this is true when you capture them via the OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES parameter.

Thanks to a tweet by Neil Chandler, I’ve realized that it was a good idea to show also the case when the plan is loaded manually.

When the adaptive plan switches to the alternative plan, the plan_hash_value also changes, and can be loaded manually in the baseline with DBMS_SPM.

Let’s reset everything and retry quickly to:

  • Capture the plan automatically (this will lead to the original plan)
  • Load the plan manually (I will specify to load the alternative plan, if resolved)
  • Drop the plan captured automatically
  • Use the newly accepted baseline

To recap:

  • The capture process will always load the original plan
  • It is possible to decide to load manually the original one or the alternative one (if resolved)
  • Using automatic capture is a bad idea

HTH

Ludo

How Adaptive Plans work with SQL Plan Baselines?

Disclaimer: after writing this post (but before publishing it) I have seen that other people already blogged about it, so I am ashamed of publishing it anyway… but that’s blogger’s life ūüôā

Wednesday I have got a nice question after my presentation about Adaptive Features at the DOAG16 conference:

What happens when you load an adaptive plan in a SQL Plan Baseline?
Does it load only the final plan or does it load the whole plan including the inactive operations? Will the plan be evaluated again using the inflection point?

I have decided to do some tests in order to give the best possible answer. I did not spend the time to rethink about producing an adaptive plan. Tim Hall already did an excellent test case to create and alter an adaptive plan in his blog, so I have reused massively most of its code. Thanks Tim :-).

I will not post all the code (please find it in Tim’s post), I will go straight to the plans.

First: I have an adaptive plan that resolves to NESTED LOOPS:

Second: I load the plan (lazy way: using baseline capture at session level)

Third: re-run the statement and check the plan

It does not look adaptive, but I can also check from the function DBMS_XPLAN.DISPLAY_SQL_PLAN_BASELINE:

Again, despite in the Note section it says it is adaptive, it does not look like an adaptive plan.

Can I trust this information? Of course I did not and tried to check the plan with and without baseline after changing the rows to force a plan switch to HJ (again taking Tim’s example):

After changing the rows:

  • when I do not use the baseline, the plan resolves to HASH JOIN
  • when I use it, the baseline forces to NESTED LOOPS.

So the plan in the baseline is not adaptive and it forces to what has been loaded. Is it the final plan or the original one? I have to capture it again to see if a new baseline appears:

A new baseline does not appear, so it looks that the original plan is considered by the capture process and not the resolved one! To be 100% sure, let’s try to drop the existing one and redo the test:

So, despite the fact that I have an adaptive plan that switches from NL to HJ, only the NESTED LOOPS operations are captured in the baseline, I can infer the only the original plan is loaded as SQL Plan Baseline.

References:

How to fix CPU usage problem in 12c due to DBMS_FEATURE_AWR

I love my job because I always have suprises. This week’s surprise has been another problem related to SQL Plan Directives in 12c. Because it is a common problem that potentially affects ALL the customers, I am glad to share the solution on my blog ūüėÄ

Symptom of the problem: High CPU usage on the server

My customer’s DBA team has spotted a consistent high CPU utilisation on its servers:

spd_awr_high_cpu_sar

Everyday, at the same time, and for 20-40 minutes, the servers hosting the Oracle databases run literally out of CPU.

spd_awr_high_cpu_em

 

Troubleshooting

Ok, it would be too easy to give the solution now. If you cannot wait, jump at the end of this post. But what I like more is to explain how I came to it.

First, I gave a look at the processes consuming CPU. Most of the servers have many consolidated databases on them. Surprisingly, this is what I have found:

spd_awr_high_cpu_m001It seems that the source of the problem is not a single database, but all of them. Isn’t it? And I see another pattern here: the CPU usage comes always from the [m001] process, so it is not related to a user process.

My customer has Diagnostic Pack so it is easy to go deeper, but you can get the same result with other free tools like s-ash, statspack and snapper. However, this is what I have found in the Instance Top Activity:

spd_awr_high_cpu_instOk, everything comes from a single query with sql_id auyf8px9ywc6j. This is the full sql_text:

It looks like something made by a DBA, but it comes from the MMON.

Looking around, it seems closely related to two PL/SQL calls that I could find in the SQL Monitor and that systematically fail every day:

spd_cpu_sql_monitorDBMS_FEATURE_AWR function calls internally the SQL auyf8px9ywc6j.

The MOS does not know anything about that query, but the internet does:

spd_awr_franckOh no, not Franck again! He always discovers new stuff and blogs about it before I do ūüôā

In his blog post, he points out that the query fails because of error ORA-12751 (resource plan limiting CPU usage) and that  it is a problem of Adaptive Dynamic Sampling. Is it true?

What I like to do when I have a problematic sql_id, is to run sqld360 from Mauro Pagano, but the resulting zip file does not contain anything useful, because actually there are no executions and no plans.

During the execution of the statement (or better, during the period with high CPU usage), there is an entry in v$sql, but no plans associated:

And this is very likely because the statement is still parsing, and all the time is due to the Dynamic Sampling. But because the plan is not there yet, I cannot check it in the DBMS_XPLAN.DISPLAY_CURSOR.

I decided then to trace it with those two statements:

At the next execution I see indeed the Adaptive Dynamic Sampling in the trace file, the errror due to the exhausted CPU in the resource plan, and the directives that caused the Adaptive Dynamic Sampling:

 

 

So, there are some SQL Plan Directives that force the CBO to run ADS for this query.

This query touches three tables, so instead of relying on the DIRECTIVE_IDs, it’s better to get the directives by object name:

Solution

At this point, the solution is the same already pointed out in one of my previous blog posts: disable the directives individually!

This very same PL/SQL block must be run on ALL the 12c databases affected by this Adaptive Dynamic Sampling problem on the sql_id auyf8px9ywc6j.

If you have just migrated the database to 12c, it would make even more sense to programmatically “inject” the disabled SQL Plan Directives into every freshly created or upgraded 12c database (until Oracle releases a patch for this non-bug).

It comes without saying that the next execution has been very quick, consuming almost no CPU and without using ADS.

HTH

Ludovico

 

Oracle Active Data Guard and Global Data Services in Action!

In a few days I will give a presentation at UKOUG Tech15 about Global Data Services, it will be the first time that I present this session.

I usually like to give the link to the material to my audience, so here we go:

Credits

I have to give special credits to my colleague Robert Bialek. I’ve got a late confirmation for this session and my slide deck was not ready at all, so I have used a big part of his original¬†work. Most of the content included in the slides has been created by Robert, not me. (Thank you for your help! :-))

Slides

Demo recording

Demo script

And the script to revert the demo:

Cheers

Ludovico

 

Oracle Database on ACFS: a perfect marriage?

Update: I will give this presentation at UKOUG Tech15, Wed 9 December at 14:30.

This presentation has had a very poor score in selections for conferences (no OOW, no DOAG) but people liked it very much at Paris Oracle Meetup.  The Database on ACFS is mainstream now, thanks to the new ODA releases. Having some knowledge about why and how you should run (not) Databases on ACFS is definitely worth a read.

Slides

Demo 1 recording

Demo 2 recording

Demo script (DB ACFS clone from Standby Database)

 

Comments are, as always, very appreciated ūüôā

Ludo

Migrating Oracle RAC from SuSE to OEL (or RHEL) live

I have a customer that needs to migrate its Oracle RAC cluster from SuSE to OEL.

I know, I know, there is a paper from Dell and Oracle named:

How Dell Migrated from SUSE Linux to Oracle Linux

That explains how Dell migrated its many RAC clusters from SuSE to OEL. The problem is that they used a different strategy:

– backup the configuration of the nodes
– then for each node, one at time
– stop the node
– reinstall the OS
– restore the configuration and the Oracle binaries
– relink
– restart

What I want to achieve instead is:
add one OEL node to the SuSE cluster as new node
– remove one SuSE node from the now-mixed cluster
– install/restore/relink the RDBMS software (RAC) on the new node
–¬†move the RAC instances to the new node (taking care to NOT run more than the number of licensed nodes/CPUs at any time)
Рrepeat (for the remaining nodes)

because the customer will also migrate to new hardware.

In order to test this migration path, I’ve set up a SINGLE NODE cluster (if it works for one node, it will for two or more).

I have to setup the new node addition carefully, mainly as I would do with a traditional node addition:

  • Add new ip addresses (public, private, vip) to the DNS/hosts
  • Install the new OEL server
  • Keep the same user and groups (uid, gid, etc)
  • Verify the network connectivity and setup SSH equivalence
  • Check that the multicast connection is ok
  • Add the storage, configure persistent naming (udev) and verify that the disks (major, minor, names) are the very same
  • The network cards also must be the very same

Once the new host ready, the cluvfy stage -pre nodeadd will likely fail due to

  • Kernel release mismatch
  • Package mismatch

Here’s an example of output:

So the problem is not if the check succeed or not (it will not), but what fails.

Solving all the problems not related to the difference SuSE-OEL is crucial, because the addNode.sh will fail with the same errors.¬†¬†I¬†need to run it using -ignorePrereqs and -ignoreSysPrereqs switches. Let’s see how it¬†works:

Then, as stated by the addNode.sh, I run the root.sh and I expect it to work:

Bingo! Let’s check if everything is up and running:

So yes, it works, but remember that it’s not a supported long-term configuration.

In my case I expect to migrate the whole cluster from SLES to OEL in one day.

NOTE: using OEL6 as new target is easy because the interface names do not change. The new OEL7 interface naming changes, if you need to migrate without cluster downtime you need to setup the new OEL7 nodes following this post: http://ask.xmodulo.com/change-network-interface-name-centos7.html

Otherwise, you need to configure a new interface name for the cluster with oifcfg.

HTH

Ludovico

Get information about Cursor Sharing for a SQL_ID

Yesterday I’ve got a weird problem with Adaptive Cursor Sharing. I’m not sure yet about the issue, but it seems to be related to cursor sharing histograms. Hopefully one day I will blog about what I’ve learnt from this experience.

To better monitor the problem on that specific query, I’ve prepared this script (tested on 12.1.0.2):

The result is something similar (in my case it has 26 child cursors):

It’s a quick way to get the relevant information in a single result.

Off course, if you need deeper details, you should consider something more powerful like SQLd360 from Mauro Pagano.

Credits: I’ve got the unpivot idea (and copied that part of the code) from this post by Timur Akhmadeev.

Ludo

Querying the dba_hist_sys_time_model to get historical data

This quick post is mainly¬†for myself… I will certainly use it for reference in the future.

Debugging problems due to adaptive dynamic sampling and in general adaptive features sometimes needs to get historical data about, e.g., parse time.

In order to get this information you may need to query the view DBA_HIST_SYS_TIME_MODEL (take care, it needs Diagnostic Pack license!)

You can use this query as an example.

 

In this specific example, it shows the “parse time elapsed”, the “DB time” and the percentage parse/dbtime, along with the value of the parameter “optimizer_adaptive_features“. You can use it to check if changing the parameters related to adaptive dynamic sampling improves or not the parse time.

The output will be something like this:

HTH

Ludo