How to fix CPU usage problem in 12c due to DBMS_FEATURE_AWR

I love my job because I always have suprises. This week’s surprise has been another problem related to SQL Plan Directives in 12c. Because it is a common problem that potentially affects ALL the customers, I am glad to share the solution on my blog 😀

Symptom of the problem: High CPU usage on the server

My customer’s DBA team has spotted a consistent high CPU utilisation on its servers:

spd_awr_high_cpu_sar

Everyday, at the same time, and for 20-40 minutes, the servers hosting the Oracle databases run literally out of CPU.

spd_awr_high_cpu_em

 

Troubleshooting

Ok, it would be too easy to give the solution now. If you cannot wait, jump at the end of this post. But what I like more is to explain how I came to it.

First, I gave a look at the processes consuming CPU. Most of the servers have many consolidated databases on them. Surprisingly, this is what I have found:

spd_awr_high_cpu_m001It seems that the source of the problem is not a single database, but all of them. Isn’t it? And I see another pattern here: the CPU usage comes always from the [m001] process, so it is not related to a user process.

My customer has Diagnostic Pack so it is easy to go deeper, but you can get the same result with other free tools like s-ash, statspack and snapper. However, this is what I have found in the Instance Top Activity:

spd_awr_high_cpu_instOk, everything comes from a single query with sql_id auyf8px9ywc6j. This is the full sql_text:

It looks like something made by a DBA, but it comes from the MMON.

Looking around, it seems closely related to two PL/SQL calls that I could find in the SQL Monitor and that systematically fail every day:

spd_cpu_sql_monitorDBMS_FEATURE_AWR function calls internally the SQL auyf8px9ywc6j.

The MOS does not know anything about that query, but the internet does:

spd_awr_franckOh no, not Franck again! He always discovers new stuff and blogs about it before I do 🙂

In his blog post, he points out that the query fails because of error ORA-12751 (resource plan limiting CPU usage) and that  it is a problem of Adaptive Dynamic Sampling. Is it true?

What I like to do when I have a problematic sql_id, is to run sqld360 from Mauro Pagano, but the resulting zip file does not contain anything useful, because actually there are no executions and no plans.

During the execution of the statement (or better, during the period with high CPU usage), there is an entry in v$sql, but no plans associated:

And this is very likely because the statement is still parsing, and all the time is due to the Dynamic Sampling. But because the plan is not there yet, I cannot check it in the DBMS_XPLAN.DISPLAY_CURSOR.

I decided then to trace it with those two statements:

At the next execution I see indeed the Adaptive Dynamic Sampling in the trace file, the errror due to the exhausted CPU in the resource plan, and the directives that caused the Adaptive Dynamic Sampling:

 

 

So, there are some SQL Plan Directives that force the CBO to run ADS for this query.

This query touches three tables, so instead of relying on the DIRECTIVE_IDs, it’s better to get the directives by object name:

Solution

At this point, the solution is the same already pointed out in one of my previous blog posts: disable the directives individually!

This very same PL/SQL block must be run on ALL the 12c databases affected by this Adaptive Dynamic Sampling problem on the sql_id auyf8px9ywc6j.

If you have just migrated the database to 12c, it would make even more sense to programmatically “inject” the disabled SQL Plan Directives into every freshly created or upgraded 12c database (until Oracle releases a patch for this non-bug).

It comes without saying that the next execution has been very quick, consuming almost no CPU and without using ADS.

HTH

Ludovico

 

The following two tabs change content below.

Ludovico

Principal Product Manager at Oracle
Ludovico is a member of the Oracle Database High Availability (HA), Scalability & Maximum Availability Architecture (MAA) Product Management team in Oracle. He focuses on Oracle Data Guard, Flashback technologies, and Cloud MAA.

14 thoughts on “How to fix CPU usage problem in 12c due to DBMS_FEATURE_AWR

  1. In my case we are getting CPU Spikes everyday 2-3 incidents for more than 75% CPU usage, I compared the 4-5 AWR reports and found that, MMON_SLAVE sql module I found in all the AWR’s but not ORA-12751 in the alert log, can it be the same issue ?

    Thanks
    Ojas

  2. do you think it applies to out situation by looking the the below logs:

    2019-02-12T13:07:48.655533-05:00
    Archived Log entry 52779 added for T-1.S-26944 ID 0x2999d297 LAD:1
    2019-02-12T13:09:59.923455-05:00
    Thread 1 cannot allocate new log, sequence 26946
    Checkpoint not complete
    Current log# 5 seq# 26945 mem# 0: /export/home/oracle/ora11g/oradata/maindb1/redolog/redo5.log
    2019-02-12T13:16:10.323674-05:00
    minact-scn: useg scan erroring out with error e:12751
    2019-02-12T13:26:22.679074-05:00
    minact-scn: useg scan erroring out with error e:12751
    2019-02-12T13:41:34.211132-05:00
    Active Session History (ASH) performed an emergency flush. This may mean that ASH is undersized. If emergency flushes are a recurring issue, you may c
    onsider increasing ASH size by setting the value of _ASH_SIZE to a sufficiently large value. Currently, ASH size is 16777216 bytes. Both ASH size and
    the total number of emergency flushes since instance startup can be monitored by running the following query:
    select total_size,awr_flush_emergency_count from v$ash_info;
    2019-02-12T13:41:40.211818-05:00
    minact-scn: useg scan erroring out with error e:12751
    Suspending MMON action ‘Block Cleanout Optim, Undo Segment Scan’ for 104400 seconds
    2019-02-12T13:51:52.988842-05:00
    Suspending MMON action ‘undo usage’ for 104400 seconds

  3. Yours is a different issue as I see for
    This document Doc ID 2006839.1
    MMON Slave Fails With ORA-12751 While Querying V$RMAN_STATUS (Doc ID 2006839.1)

  4. In my case Oracle supported pointed to Doc ID 2006839.1 which refers to bug 14078947 and we applied the workaround by deleting and locking the table stats for X$KCCRSR.

  5. I’m having exact the same line errors as AL. Working now with Oracle support to determine the issue and hopefully get this fixed somehow.

  6. Not sure my is same case or not with below errors:

    not in wait at each sample
    —– END DDE Action: ‘ORA_12751_DUMP’ (SUCCESS, 9 csec) —–
    —– END DDE Actions Dump (total 9 csec) —–
    *** KEWUXS – encountered error: (ORA-12751: cpu time or run time policy violation
    ORA-06512: at “SYS.DBMS_SQL”, line 1839
    ORA-06512: at “SYS.DBMS_FEATURE_USAGE_INTERNAL”, line 276
    ORA-06512: at “SYS.DBMS_FEATURE_USAGE_INTERNAL”, line 522
    ORA-06512: at “SYS.DBMS_FEATURE_USAGE_INTERNAL”, line 694
    ORA-06512: at “SYS.DBMS_FEATURE_USAGE_INTERNAL”, line 791
    ORA-06512: at line 1
    )
    DDE rules only execution for: ORA 12751
    *****************************************************
    [121 samples, 13:03:01 – 13:05:01]
    not in wait at each sample
    —– END DDE Action: ‘ORA_12751_DUMP’ (SUCCESS, 7 csec) —–
    —– END DDE Actions Dump (total 7 csec) —–
    *** KEWFADS: Error=13515 encountered by Auto DBFUS slave.
    KEBM: MMON slave action policy violation. kewfmadsa_; viol=2; err=13515

  7. Pingback: How to fix CPU usage problem in 12c due to DBMS_FEATURE_AWR - Ludovico Caldara - Blogs - triBLOG

  8. Hi Ludo,

    When I opened the SR, Oracle support suggested:

    alter system set “_swrf_mmon_dbfus”=false;
    or
    alter system set “_optimizer_dsdir_usage_control” 0;

    But your solution is much better as the solution (or workaround) is in the same scope than the problem.

    Regards,
    Franck.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.