trivadis sessions at UKOUG Tech16

UKOUG Tech16 will start in less than a week. Trivadis will be there with many speakers, 10 sessions in total 🙂
If you are a delegate, come along and have a chat with us!

Super Sunday

Monday 05/12

Tuesday 06/12

Wednesday 07/12

See you there 🙂

DBMS_QOPATCH, datapatch, rollback, apply force

I am working for a customer on a quite big implementation of Cold Failover Cluster with Oracle Grid Infrastructure on Linux. I hope to have some material to publish soon about it! However, in this post I will be talking about patching the database in a cold-failover environment.

DISCLAIMER: I use massively scripts provided in this great blog post by Simon Pane:

https://www.pythian.com/blog/oracle-database-12c-patching-dbms_qopatch-opatch_xml_inv-and-datapatch/

Thank you Simon for sharing this 🙂

Intro

We are not yet in the process of doing out-of-place patching; at the moment the customer prefers to do in-place patching:

  • evacuate a node by relocating all the databases on other nodes
  • patching the node binaries
  • move back the databases and patch them with datapatch
  • do the same for the remaining nodes

I beg to disagree with this method, being a fan of having many patched golden copies distributed on all servers and patching the databases by just changing the ORACLE_HOME and running datapatch (like Rapid Home Provisioning does). But, this is the situation today, and we have to live with it.

Initial situation

  • Server 1, 2 and 3: one-off 20139391 applied
  • New database created

cfc_qopatch1When the DBCA creates a new database, in 12.1.0.2, it does not run datapatch by default, thus, the database does not have any patches installed.

However, this specific one-off patch does not modify anything in the database (sql_patch=false)

and the datapatch runs without touching the db:

Next step: I evacuate the server 2 and patch it, then I relocate my database on it

cfc_qopatch2

Now the database is not at the same level of the binaries and need to be patched:

The column CONSTITUENT is important here because it tells us what the parent patch_id is. This is the column that we have to check when we want to know if the patch has been applied on the database.

Now the patch is visible inside the dba_registry_sqlpatch:

Notice that the child patches are not listed in thie view.

Rolling back

Now, one node is patched, but the others are not. What happen if I relocate the patched database to a non-patched node?

cfc_qopatch3

The patch is applied inside the database but not in the binaries!

If I run datapatch again, the patch is rolled back:

The patch has been rolled back according to the datapatch, and the action is shown in the dba_registry_sqlpatch:

But if I look at the logfile, the patch had some errors:

Indeed, the patch looks still there:

If I try to run it again, it does nothing/it fails saying the patch is not there:

What does it say on the patched node?

Whaaat? datapatch there says that the patch IS in the registry and there’s nothing to do. Let’s try to force its apply again:

Conclusion

I’m not sure whether it is safe to run the patched database in a non-patched Oracle Home. I guess it is time for a new SR 🙂

Meanwhile, we will try hard not to relocate the databases once they have been patched.

Cheers

Ludo

Getting the Oracle Homes in a server from the oraInventory

The information contained in the oratab should always be updated, but it is not always reliable. If you want to know what Oracle installations you have in a server, better to get it from the Oracle Universal Installer or, if you want some shortcuts, do some grep magics inside the inventory with the shell.

The following diagram is a simplified structure of the inventory that shows what entries are present in the central inventory (one per server) and the local inventories (one per Oracle Home).

inventory_structureYou can use this simple function to get some content out of it, including the edition (that information is a step deeper in the local inventory).

HTH

Loading resolved Adaptive Plans in the SQL Plan Management

In my previous post, I have shown that loading Adaptive Plans in the SQL Plan Baseline leads to using the original plan. Well, actually, this is true when you capture them via the OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES parameter.

Thanks to a tweet by Neil Chandler, I’ve realized that it was a good idea to show also the case when the plan is loaded manually.

When the adaptive plan switches to the alternative plan, the plan_hash_value also changes, and can be loaded manually in the baseline with DBMS_SPM.

Let’s reset everything and retry quickly to:

  • Capture the plan automatically (this will lead to the original plan)
  • Load the plan manually (I will specify to load the alternative plan, if resolved)
  • Drop the plan captured automatically
  • Use the newly accepted baseline

To recap:

  • The capture process will always load the original plan
  • It is possible to decide to load manually the original one or the alternative one (if resolved)
  • Using automatic capture is a bad idea

HTH

Ludo

How Adaptive Plans work with SQL Plan Baselines?

Disclaimer: after writing this post (but before publishing it) I have seen that other people already blogged about it, so I am ashamed of publishing it anyway… but that’s blogger’s life 🙂

Wednesday I have got a nice question after my presentation about Adaptive Features at the DOAG16 conference:

What happens when you load an adaptive plan in a SQL Plan Baseline?
Does it load only the final plan or does it load the whole plan including the inactive operations? Will the plan be evaluated again using the inflection point?

I have decided to do some tests in order to give the best possible answer. I did not spend the time to rethink about producing an adaptive plan. Tim Hall already did an excellent test case to create and alter an adaptive plan in his blog, so I have reused massively most of its code. Thanks Tim :-).

I will not post all the code (please find it in Tim’s post), I will go straight to the plans.

First: I have an adaptive plan that resolves to NESTED LOOPS:

Second: I load the plan (lazy way: using baseline capture at session level)

Third: re-run the statement and check the plan

It does not look adaptive, but I can also check from the function DBMS_XPLAN.DISPLAY_SQL_PLAN_BASELINE:

Again, despite in the Note section it says it is adaptive, it does not look like an adaptive plan.

Can I trust this information? Of course I did not and tried to check the plan with and without baseline after changing the rows to force a plan switch to HJ (again taking Tim’s example):

After changing the rows:

  • when I do not use the baseline, the plan resolves to HASH JOIN
  • when I use it, the baseline forces to NESTED LOOPS.

So the plan in the baseline is not adaptive and it forces to what has been loaded. Is it the final plan or the original one? I have to capture it again to see if a new baseline appears:

A new baseline does not appear, so it looks that the original plan is considered by the capture process and not the resolved one! To be 100% sure, let’s try to drop the existing one and redo the test:

So, despite the fact that I have an adaptive plan that switches from NL to HJ, only the NESTED LOOPS operations are captured in the baseline, I can infer the only the original plan is loaded as SQL Plan Baseline.

References:

Autumn: a season of conferences and travels

It is not a news that autumn is the busiest season for people involved in the Oracle Community. Thanks to the OTN Nordic Tour this year I am setting my new record 🙂

In the next 2 months I will give 15 presentations in 8 distinct countries and in 3 distinct languages (Italian, French, English).

If you are based in one of those countries, you can join and say hello 🙂

Date/Time Event
11/10/2016
11:00 am - 12:00 pm
Adaptive Features or: How I Learned to Stop Worrying and Troubleshoot the Bomb [Nordic Tour 2016 - Denmark]
Oracle Denmark, Ballerup
11/10/2016
2:10 pm - 3:10 pm
Migrating to 12c: 300 DBs in 300 days. What we learned [Nordic Tour 2016 - Denmark]
Oracle Denmark, Ballerup
12/10/2016
11:15 am - 12:00 pm
Migrating to 12c: 300 DBs in 300 days. What we learned. [Nordic Tour 2016 - Norway]
Felix Conference Center, Oslo
12/10/2016
1:00 pm - 1:45 pm
Self-Service Database Operations made easy with APEX [Nordic Tour 2016 - Norway]
Felix Conference Center, Oslo
12/10/2016
3:00 pm - 3:45 pm
Database Migration Assistant for Unicode (DMU): a Real Customer Case [Nordic Tour 2016 - Norway]
Felix Conference Center, Oslo
13/10/2016
3:10 pm - 4:00 pm
Migrating to 12c: 300 DBs in 300 days. What we learned. [Nordic Tour 2016 - Finland]
Accenture Finland, Helsinki
14/10/2016
9:00 am - 9:45 am
Migrating to 12c: 300 DBs in 300 days. What we learned. [Nordic Tour 2016 - Sweden]
Stockholm, Stockholm
14/10/2016
10:00 am - 10:45 am
Adaptive Features or: How I Learned to Stop Worrying and Troubleshoot the Bomb. [Nordic Tour 2016 - Sweden]
Stockholm, Stockholm
11/11/2016
9:30 am - 10:15 am
Migrating to 12c: 300 DBs in 300 days. What we learned. [ITOUG Tech Day 2016]
UNA Hotel Century, Milano
11/11/2016
12:00 pm - 12:45 pm
Adaptive Features or: How I Learned to Stop Worrying and Troubleshoot the Bomb.
UNA Hotel Century, Milano
16/11/2016
11:00 am - 11:45 am
Adaptive Features or: How I Learned to Stop Worrying and Troubleshoot the Bomb [DOAG 2016]
DOAG Konferenz 2016, Nürnberg
22/11/2016
10:50 am - 11:30 am
Montée en version de 300 bases de données vers Oracle 12c en 300 jours. Quels problèmes peut-on rencontrer ? [Swiss Data Forum 16]
Aquatis Hotel, Lausanne
23/11/2016
9:00 am - 12:00 pm
Migrating to Oracle Database 12c: 300 Databases in 300 Days [Oracle Tech Breakfast]
Oracle Business Breakfast, Oracle Suisse SA, Geneva
07/12/2016
12:30 pm - 1:15 pm
Upgrading 300 Databases to 12c in 300 Days. What Can Go Wrong? [UKOUG_Tech16]
International Convention Centre, Birmingham, Birmingham
07/12/2016
3:10 pm - 4:00 pm
Adaptive Features or: How I Learned to Stop Worrying & Troubleshoot the Bomb [UKOUG Tech16]
International Convention Centre, Birmingham, Birmingham

The updated list of upcoming events can be found here.

How to fix CPU usage problem in 12c due to DBMS_FEATURE_AWR

I love my job because I always have suprises. This week’s surprise has been another problem related to SQL Plan Directives in 12c. Because it is a common problem that potentially affects ALL the customers, I am glad to share the solution on my blog 😀

Symptom of the problem: High CPU usage on the server

My customer’s DBA team has spotted a consistent high CPU utilisation on its servers:

spd_awr_high_cpu_sar

Everyday, at the same time, and for 20-40 minutes, the servers hosting the Oracle databases run literally out of CPU.

spd_awr_high_cpu_em

 

Troubleshooting

Ok, it would be too easy to give the solution now. If you cannot wait, jump at the end of this post. But what I like more is to explain how I came to it.

First, I gave a look at the processes consuming CPU. Most of the servers have many consolidated databases on them. Surprisingly, this is what I have found:

spd_awr_high_cpu_m001It seems that the source of the problem is not a single database, but all of them. Isn’t it? And I see another pattern here: the CPU usage comes always from the [m001] process, so it is not related to a user process.

My customer has Diagnostic Pack so it is easy to go deeper, but you can get the same result with other free tools like s-ash, statspack and snapper. However, this is what I have found in the Instance Top Activity:

spd_awr_high_cpu_instOk, everything comes from a single query with sql_id auyf8px9ywc6j. This is the full sql_text:

It looks like something made by a DBA, but it comes from the MMON.

Looking around, it seems closely related to two PL/SQL calls that I could find in the SQL Monitor and that systematically fail every day:

spd_cpu_sql_monitorDBMS_FEATURE_AWR function calls internally the SQL auyf8px9ywc6j.

The MOS does not know anything about that query, but the internet does:

spd_awr_franckOh no, not Franck again! He always discovers new stuff and blogs about it before I do 🙂

In his blog post, he points out that the query fails because of error ORA-12751 (resource plan limiting CPU usage) and that  it is a problem of Adaptive Dynamic Sampling. Is it true?

What I like to do when I have a problematic sql_id, is to run sqld360 from Mauro Pagano, but the resulting zip file does not contain anything useful, because actually there are no executions and no plans.

During the execution of the statement (or better, during the period with high CPU usage), there is an entry in v$sql, but no plans associated:

And this is very likely because the statement is still parsing, and all the time is due to the Dynamic Sampling. But because the plan is not there yet, I cannot check it in the DBMS_XPLAN.DISPLAY_CURSOR.

I decided then to trace it with those two statements:

At the next execution I see indeed the Adaptive Dynamic Sampling in the trace file, the errror due to the exhausted CPU in the resource plan, and the directives that caused the Adaptive Dynamic Sampling:

 

 

So, there are some SQL Plan Directives that force the CBO to run ADS for this query.

This query touches three tables, so instead of relying on the DIRECTIVE_IDs, it’s better to get the directives by object name:

Solution

At this point, the solution is the same already pointed out in one of my previous blog posts: disable the directives individually!

This very same PL/SQL block must be run on ALL the 12c databases affected by this Adaptive Dynamic Sampling problem on the sql_id auyf8px9ywc6j.

If you have just migrated the database to 12c, it would make even more sense to programmatically “inject” the disabled SQL Plan Directives into every freshly created or upgraded 12c database (until Oracle releases a patch for this non-bug).

It comes without saying that the next execution has been very quick, consuming almost no CPU and without using ADS.

HTH

Ludovico

 

The short story of two ACE Directors, competitors and friends

Well, this is a completely different post from what I usually publish. I like to blog about technology, personal interests and achievements.

This time I really would like to spend a few words to praise a friend.

I met Franck Pachot for the first time back in 2012, it was my first month in Trivadis and, believe it or not, Franck was working for it as well. I have the evidence here 😉

It was the first time since years that I was meeting someone at least as smart as me on the Oracle stack (later, it happened many more times to meet smarter people, but that’s another story).

A few months later, he left Trivadis to join it’s sworn enemy dbi services. But established friendships and like mindedness don’t disappear, we continued to meet whenever an opportunity was coming up, and we started almost simultaneously to boost our blogging activities, doing public presentations and expanding our presence on the social medias (mostly Twitter).

After I’ve got my Oracle ACE status in 2014, we went together at the Oracle Open World. I used to know many folks there and I can say that I helped Franck to meet many smart people inside and outside the ACE Program. A month after the OOW, he became an Oracle ACE.

DSC_0088_2DSC02749_2Franck’s energy, passion and devotion for the Oracle Community are endless. What he’s doing, including his last big effort, is just great and all the people in the Oracle Community respect him. I can say that now he is far more active than me in the Oracle Community (at least regarding “public” activities ;-))

DSC02741_2We both had the target of becoming Oracle ACE Directors, and I have spent a bad month in April when I became an ACE Director and his nomination was still pending.

I said: “If you become ACE Director by the end of April I will write a blog post about you.” And that’s where this post comes from.

Congratulations ACE Director Franck, perfect timing! 🙂

O_ACEDirectorLogo_clrrev

Ludo

 

 

Bash tips & tricks [ep. 7]: Cleanup on EXIT with a trap

This is the seventh epidose of a small series.

Description:

Pipes, temporary files, lock files, processes spawned in background, rows inserted in a status table that need to be updated… Everything need to be cleaned up if the script exits, even when the exit condition is not triggered inside the script.

BAD:

The worst practice is, of course, to forget to cleanup the tempfiles, leaving my output and temporary directories full of files *.tmp, *.pipe, *.lck, etc. I will not show the code because the list of bad practices is quite long…

Better than forgiving to cleanup, but still very bad, is to cleanup everything just before triggering the exit command (in the following example, F_check_exit is a function that exits the script if the first argument is non-zero, as defined it in the previous episode):

A better approach, would be to put all the cleanup tasks in a Cleanup()  function and then call this function instead of duplicating all the code everywhere:

But still, I need to make sure that I insert this piece of code everywhere. Not optimal yet.

I may include the Cleanup function inside the F_check_exit function, but then I have two inconvenients:
1 – I need to define the Cleanup function in every script that includes my include file
2 – still there will be exit conditions that are not trapped

GOOD:

The good approach would be to trap the EXIT signal with the Cleanup function:

Much better! But what if my include script has some logic that also creates some temporary files?

I can create a global F_Cleanup function that eventually executes the local Cleanup function, if defined. Let me show this:

Include script:

Main script:

The Cleanup function will be executed only if defined.

No Cleanup function: no worries, but still the F_Cleanup function can do some global cleanup not specific to the main script.

Bash tips & tricks [ep. 6]: Check the exit code

This is the sixth epidose of a small series.

Description:

Every command in a script may fail due to external reasons. Bash programming is not functional programming! 🙂

After running a command, make sure that you check the exit code and either raise a warning or exit with an error, depending on how a failure can impact the execution of the script.

BAD:

The worst example is not to check the exit code at all:

Next one is better, but you may have a lot of additional code to type:

Again, Log_Close, eok, eerror, etc are functions defined using the previous Bash Tips & Tricks in this series.

GOOD:

Define once the check functions that you will use after every command: