DBA survival BLOG

Some DBA stuff and lots of Oracle Data Guard

How to fix CPU usage problem in 12c due to DBMS_FEATURE_AWR

Posted on June 6, 2016 by Ludovico

I love my job because I always have suprises. This week’s surprise has been another problem related to SQL Plan Directives in 12c. Because it is a common problem that potentially affects ALL the customers, I am glad to share the solution on my blog 😀

Symptom of the problem: High CPU usage on the server

My customer’s DBA team has spotted a consistent high CPU utilisation on its servers:

Everyday, at the same time, and for 20-40 minutes, the servers hosting the Oracle databases run literally out of CPU.

Troubleshooting

Ok, it would be too easy to give the solution now. If you cannot wait, jump at the end of this post. But what I like more is to explain how I came to it.

First, I gave a look at the processes consuming CPU. Most of the servers have many consolidated databases on them. Surprisingly, this is what I have found:

It seems that the source of the problem is not a single database, but all of them. Isn’t it? And I see another pattern here: the CPU usage comes always from the [m001] process, so it is not related to a user process.

My customer has Diagnostic Pack so it is easy to go deeper, but you can get the same result with other free tools like s-ash, statspack and snapper. However, this is what I have found in the Instance Top Activity:

Ok, everything comes from a single query with sql_id auyf8px9ywc6j. This is the full sql_text:

(SELECT /*+ FULL(ST) */ SN.DBID ,SN.INSTANCE_NUMBER ,SN.STARTUP_TIME ,ST.STAT_ID ,ST.STAT_NAME ,MIN(SN.SNAP_ID) AS MIN_SNAP ,MAX(SN.SNAP_ID) AS MAX_SNAP ,MIN(CAST(BEGIN_INTERVAL_TIME AS DATE)) AS MIN_DATE ,MAX(CAST(END_INTERVAL_TIME AS DATE)) AS MAX_DATE
FROM DBA_HIST_SNAPSHOT SN ,WRH$_STAT_NAME ST
WHERE SN.BEGIN_INTERVAL_TIME > TRUNC(SYSDATE) - 7 AND SN.END_INTERVAL_TIME < TRUNC(SYSDATE) AND SN.DBID = ST.DBID AND ST.STAT_NAME IN ('DB time', 'DB CPU') GROUP BY SN.DBID,SN.INSTANCE_NUMBER,SN.STARTUP_TIME,ST.STAT_ID,ST.STAT_NAME ) ,DELTA_DATA AS
(SELECT SR.DBID ,SR.INSTANCE_NUMBER ,SR.STAT_NAME ,CASE WHEN SR.STARTUP_TIME BETWEEN SR.MIN_DATE AND SR.MAX_DATE THEN TM1.VALUE + (TM2.VALUE - TM1.VALUE) ELSE (TM2.VALUE - TM1.VALUE) END AS DELTA_TIME
FROM WRH$_SYS_TIME_MODEL TM1 ,WRH$_SYS_TIME_MODEL TM2 ,SNAP_RANGES SR
WHERE TM1.DBID = SR.DBID AND TM1.INSTANCE_NUMBER = SR.INSTANCE_NUMBER AND TM1.SNAP_ID = SR.MIN_SNAP AND TM1.STAT_ID = SR.STAT_ID AND TM2.DBID = SR.DBID AND TM2.INSTANCE_NUMBER = SR.INSTANCE_NUMBER AND TM2.SNAP_ID = SR.MAX_SNAP AND TM2.STAT_ID = SR.STAT_ID )
SELECT STAT_NAME ,ROUND(SUM(DELTA_TIME/1000000),2) AS SECS
FROM DELTA_DATA GROUP BY STAT_NAME

(SELECT /*+ FULL(ST) */ SN.DBID ,SN.INSTANCE_NUMBER ,SN.STARTUP_TIME ,ST.STAT_ID ,ST.STAT_NAME ,MIN(SN.SNAP_ID) AS MIN_SNAP ,MAX(SN.SNAP_ID) AS MAX_SNAP ,MIN(CAST(BEGIN_INTERVAL_TIME AS DATE)) AS MIN_DATE ,MAX(CAST(END_INTERVAL_TIME AS DATE)) AS MAX_DATE

FROM DBA_HIST_SNAPSHOT SN ,WRH$_STAT_NAME ST

WHERE SN.BEGIN_INTERVAL_TIME > TRUNC(SYSDATE) - 7 AND SN.END_INTERVAL_TIME < TRUNC(SYSDATE) AND SN.DBID = ST.DBID AND ST.STAT_NAME IN ('DB time', 'DB CPU') GROUP BY SN.DBID,SN.INSTANCE_NUMBER,SN.STARTUP_TIME,ST.STAT_ID,ST.STAT_NAME ) ,DELTA_DATA AS

(SELECT SR.DBID ,SR.INSTANCE_NUMBER ,SR.STAT_NAME ,CASE WHEN SR.STARTUP_TIME BETWEEN SR.MIN_DATE AND SR.MAX_DATE THEN TM1.VALUE + (TM2.VALUE - TM1.VALUE) ELSE (TM2.VALUE - TM1.VALUE) END AS DELTA_TIME

FROM WRH$_SYS_TIME_MODEL TM1 ,WRH$_SYS_TIME_MODEL TM2 ,SNAP_RANGES SR

WHERE TM1.DBID = SR.DBID AND TM1.INSTANCE_NUMBER = SR.INSTANCE_NUMBER AND TM1.SNAP_ID = SR.MIN_SNAP AND TM1.STAT_ID = SR.STAT_ID AND TM2.DBID = SR.DBID AND TM2.INSTANCE_NUMBER = SR.INSTANCE_NUMBER AND TM2.SNAP_ID = SR.MAX_SNAP AND TM2.STAT_ID = SR.STAT_ID )

SELECT STAT_NAME ,ROUND(SUM(DELTA_TIME/1000000),2) AS SECS

FROM DELTA_DATA GROUP BY STAT_NAME

It looks like something made by a DBA, but it comes from the MMON.

Looking around, it seems closely related to two PL/SQL calls that I could find in the SQL Monitor and that systematically fail every day:

DBMS_FEATURE_AWR function calls internally the SQL auyf8px9ywc6j.

The MOS does not know anything about that query, but the internet does:

Oh no, not Franck again! He always discovers new stuff and blogs about it before I do 🙂

In his blog post, he points out that the query fails because of error ORA-12751 (resource plan limiting CPU usage) and that it is a problem of Adaptive Dynamic Sampling. Is it true?

What I like to do when I have a problematic sql_id, is to run sqld360 from Mauro Pagano, but the resulting zip file does not contain anything useful, because actually there are no executions and no plans.

SQL> select sql_id,  executions, loads, cpu_time from v$sqlstats where sql_id='auyf8px9ywc6j';

SQL_ID        EXECUTIONS      LOADS   CPU_TIME
------------- ---------- ---------- ----------
auyf8px9ywc6j          0         11          0

SQL> select sql_id,  child_number from v$sql where sql_id='auyf8px9ywc6j';

no rows selected

SQL>

SQL> select sql_id, executions, loads, cpu_time from v$sqlstats where sql_id='auyf8px9ywc6j';

SQL_ID EXECUTIONS LOADS CPU_TIME

------------- ---------- ---------- ----------

auyf8px9ywc6j 0 11 0

SQL> select sql_id, child_number from v$sql where sql_id='auyf8px9ywc6j';

no rows selected

SQL>

During the execution of the statement (or better, during the period with high CPU usage), there is an entry in v$sql, but no plans associated:

SQL> select sql_id, child_number from v$sql where sql_id='auyf8px9ywc6j';

SQL_ID        CHILD_NUMBER
------------- ------------
auyf8px9ywc6j            0

SQL> select * from table (dbms_xplan.display_cursor('auyf8px9ywc6j',0, 'ALL +NOTE'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  auyf8px9ywc6j, child number 0

WITH SNAP_RANGES AS (SELECT /*+ FULL(ST) */ SN.DBID ,SN.INSTANCE_NUMBER
,SN.STARTUP_TIME ,ST.STAT_ID ,ST.STAT_NAME ,MIN(SN.SNAP_ID) AS MIN_SNAP
,MAX(SN.SNAP_ID) AS MAX_SNAP ,MIN(CAST(BEGIN_INTERVAL_TIME AS DATE)) AS
MIN_DATE ,MAX(CAST(END_INTERVAL_TIME AS DATE)) AS MAX_DATE FROM
DBA_HIST_SNAPSHOT SN ,WRH$_STAT_NAME ST WHERE SN.BEGIN_INTERVAL_TIME >
TRUNC(SYSDATE) - 7 AND SN.END_INTERVAL_TIME < TRUNC(SYSDATE) AND
SN.DBID = ST.DBID AND ST.STAT_NAME IN ('DB time', 'DB CPU') GROUP BY
SN.DBID,SN.INSTANCE_NUMBER,SN.STARTUP_TIME,ST.STAT_ID,ST.STAT_NAME )
,DELTA_DATA AS (SELECT SR.DBID ,SR.INSTANCE_NUMBER ,SR.STAT_NAME ,CASE
WHEN SR.STARTUP_TIME BETWEEN SR.MIN_DATE AND SR.MAX_DATE THEN TM1.VALUE
+ (TM2.VALUE - TM1.VALUE) ELSE (TM2.VALUE - TM1.VALUE) END AS
DELTA_TIME FROM WRH$_SYS_TIME_MODEL TM1 ,WRH$_SYS_TIME_MODEL TM2
,SNAP_RANGES SR WHERE TM1.DBID = SR.DBID AND TM1.INSTANCE_NUMBER =
SR.INSTANCE_NUMBER AND TM1.SNAP_ID = SR.MIN_SNAP AND TM1.STAT_ID =
SR.STAT_ID AND TM2.DBID = SR.DBID AND TM2.

NOTE: cannot fetch plan for SQL_ID: auyf8px9ywc6j, CHILD_NUMBER: 0
      Please verify value of SQL_ID and CHILD_NUMBER;
      It could also be that the plan is no longer in cursor cache (check v$sql_plan)


22 rows selected.

SQL> select sql_id, child_number from v$sql where sql_id='auyf8px9ywc6j';

SQL_ID CHILD_NUMBER

------------- ------------

auyf8px9ywc6j 0

SQL> select * from table (dbms_xplan.display_cursor('auyf8px9ywc6j',0, 'ALL +NOTE'));

PLAN_TABLE_OUTPUT

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

SQL_ID auyf8px9ywc6j, child number 0

WITH SNAP_RANGES AS (SELECT /*+ FULL(ST) */ SN.DBID ,SN.INSTANCE_NUMBER

,SN.STARTUP_TIME ,ST.STAT_ID ,ST.STAT_NAME ,MIN(SN.SNAP_ID) AS MIN_SNAP

,MAX(SN.SNAP_ID) AS MAX_SNAP ,MIN(CAST(BEGIN_INTERVAL_TIME AS DATE)) AS

MIN_DATE ,MAX(CAST(END_INTERVAL_TIME AS DATE)) AS MAX_DATE FROM

DBA_HIST_SNAPSHOT SN ,WRH$_STAT_NAME ST WHERE SN.BEGIN_INTERVAL_TIME >

TRUNC(SYSDATE) - 7 AND SN.END_INTERVAL_TIME < TRUNC(SYSDATE) AND

SN.DBID = ST.DBID AND ST.STAT_NAME IN ('DB time', 'DB CPU') GROUP BY

SN.DBID,SN.INSTANCE_NUMBER,SN.STARTUP_TIME,ST.STAT_ID,ST.STAT_NAME )

,DELTA_DATA AS (SELECT SR.DBID ,SR.INSTANCE_NUMBER ,SR.STAT_NAME ,CASE

WHEN SR.STARTUP_TIME BETWEEN SR.MIN_DATE AND SR.MAX_DATE THEN TM1.VALUE

+ (TM2.VALUE - TM1.VALUE) ELSE (TM2.VALUE - TM1.VALUE) END AS

DELTA_TIME FROM WRH$_SYS_TIME_MODEL TM1 ,WRH$_SYS_TIME_MODEL TM2

,SNAP_RANGES SR WHERE TM1.DBID = SR.DBID AND TM1.INSTANCE_NUMBER =

SR.INSTANCE_NUMBER AND TM1.SNAP_ID = SR.MIN_SNAP AND TM1.STAT_ID =

SR.STAT_ID AND TM2.DBID = SR.DBID AND TM2.

NOTE: cannot fetch plan for SQL_ID: auyf8px9ywc6j, CHILD_NUMBER: 0

Please verify value of SQL_ID and CHILD_NUMBER;

It could also be that the plan is no longer in cursor cache (check v$sql_plan)

22 rows selected.

And this is very likely because the statement is still parsing, and all the time is due to the Dynamic Sampling. But because the plan is not there yet, I cannot check it in the DBMS_XPLAN.DISPLAY_CURSOR.

I decided then to trace it with those two statements:

SQL> alter system set events 'sql_trace [sql:auyf8px9ywc6j]';

SQL> alter system set events 'trace[rdbms.SQL_Optimizer.*][sql:auyf8px9ywc6j]';

SQL> alter system set events 'sql_trace [sql:auyf8px9ywc6j]';

SQL> alter system set events 'trace[rdbms.SQL_Optimizer.*][sql:auyf8px9ywc6j]';

At the next execution I see indeed the Adaptive Dynamic Sampling in the trace file, the errror due to the exhausted CPU in the resource plan, and the directives that caused the Adaptive Dynamic Sampling:

=======================================
SPD: BEGIN context at query block level
=======================================
Query Block SEL$3877D5D0 (#3)
Applicable DS directives:
   dirid = 17707367266596005344, state = 5, flags = 1, loc = 1 {CJ(8694)[1, 2]}
   dirid = 17748238338555778238, state = 5, flags = 1, loc = 4 {(8694)[2, 3, 4]; (8460)[2, 3]}
   dirid = 10027833930063681981, state = 1, flags = 5, loc = 4 {(8694)[2, 3, 4]; (8460)[2, 3]; (8436)[1, 5]; (8436)[1, 5]}
Checking valid directives for the query block
  SPD: Directive valid: dirid = 17748238338555778238, state = 5, flags = 1, loc = 4 {(8694)[2, 3, 4]; (8460)[2, 3]}
  SPD: Return code in qosdDSDirSetup: EXISTS, estType = GROUP_BY
  SPD: Return code in qosdDSDirSetup: NODIR, estType = HAVING
  SPD: Return code in qosdDSDirSetup: NODIR, estType = QUERY_BLOCK

=======================================

SPD: BEGIN context at query block level

=======================================

Query Block SEL$3877D5D0 (#3)

Applicable DS directives:

dirid = 17707367266596005344, state = 5, flags = 1, loc = 1 {CJ(8694)[1, 2]}

dirid = 17748238338555778238, state = 5, flags = 1, loc = 4 {(8694)[2, 3, 4]; (8460)[2, 3]}

dirid = 10027833930063681981, state = 1, flags = 5, loc = 4 {(8694)[2, 3, 4]; (8460)[2, 3]; (8436)[1, 5]; (8436)[1, 5]}

Checking valid directives for the query block

SPD: Directive valid: dirid = 17748238338555778238, state = 5, flags = 1, loc = 4 {(8694)[2, 3, 4]; (8460)[2, 3]}

SPD: Return code in qosdDSDirSetup: EXISTS, estType = GROUP_BY

SPD: Return code in qosdDSDirSetup: NODIR, estType = HAVING

SPD: Return code in qosdDSDirSetup: NODIR, estType = QUERY_BLOCK

PARSING IN CURSOR #139834781881608 len=1106 dep=4 uid=0 oct=3 lid=0 tim=3349661181783 hv=4280474888 ad='95770310' sqlid='8w3h8fvzk5r88'
SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring optimizer_features_enable(default) no_parallel result_cache(snapshot=3600) */ SUM(C1) FROM (SELECT /*+ qb_name("innerQuery")  */ 1 AS C1 FROM (SELECT /*+ FULL ("ST") */ "WRM$_SNAPSHOT"."DBID" "DBID","WRM$_SNAPSHOT"."INSTANCE_NUMBER" "INSTANCE_NUMBER","WRM$_SNAPSHOT"."STARTUP_TIME" "STARTUP_TIME","ST"."STAT_ID" "STAT_ID","ST"."STAT_NAME" "STAT_NAME",MIN("WRM$_SNAPSHOT"."SNAP_ID") "MIN_SNAP",MAX("WRM$_SNAPSHOT"."SNAP_ID") "MAX_SNAP",MIN(CAST("WRM$_SNAPSHOT"."BEGIN_INTERVAL_TIME" AS DATE)) "MIN_DATE",MAX(CAST("WRM$_SNAPSHOT"."END_INTERVAL_TIME" AS DATE)) "MAX_DATE" FROM SYS."WRM$_SNAPSHOT" "WRM$_SNAPSHOT","WRH$_STAT_NAME" "ST" WHERE "WRM$_SNAPSHOT"."DBID"="ST"."DBID" AND ("ST"."STAT_NAME"='DB CPU' OR "ST"."STAT_NAME"='DB time') AND "WRM$_SNAPSHOT"."STATUS"=0 AND "WRM$_SNAPSHOT"."BEGIN_INTERVAL_TIME">TRUNC(SYSDATE@!)-7 AND "WRM$_SNAPSHOT"."END_INTERVAL_TIME"<TRUNC(SYSDATE@!) GROUP BY "WRM$_SNAPSHOT"."DBID","WRM$_SNAPSHOT"."INSTANCE_NUMBER","WRM$_SNAPSHOT"."STARTUP_TIME","ST"."STAT_ID","ST"."STAT_NAME") "VW_DIS_1") innerQuery
END OF STMT
...
>> Query Blk Card adjusted from 3.000000 to 2.000000 due to adaptive dynamic sampling

PARSING IN CURSOR #139834781881608 len=1106 dep=4 uid=0 oct=3 lid=0 tim=3349661181783 hv=4280474888 ad='95770310' sqlid='8w3h8fvzk5r88'

SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring optimizer_features_enable(default) no_parallel result_cache(snapshot=3600) */ SUM(C1) FROM (SELECT /*+ qb_name("innerQuery") */ 1 AS C1 FROM (SELECT /*+ FULL ("ST") */ "WRM$_SNAPSHOT"."DBID" "DBID","WRM$_SNAPSHOT"."INSTANCE_NUMBER" "INSTANCE_NUMBER","WRM$_SNAPSHOT"."STARTUP_TIME" "STARTUP_TIME","ST"."STAT_ID" "STAT_ID","ST"."STAT_NAME" "STAT_NAME",MIN("WRM$_SNAPSHOT"."SNAP_ID") "MIN_SNAP",MAX("WRM$_SNAPSHOT"."SNAP_ID") "MAX_SNAP",MIN(CAST("WRM$_SNAPSHOT"."BEGIN_INTERVAL_TIME" AS DATE)) "MIN_DATE",MAX(CAST("WRM$_SNAPSHOT"."END_INTERVAL_TIME" AS DATE)) "MAX_DATE" FROM SYS."WRM$_SNAPSHOT" "WRM$_SNAPSHOT","WRH$_STAT_NAME" "ST" WHERE "WRM$_SNAPSHOT"."DBID"="ST"."DBID" AND ("ST"."STAT_NAME"='DB CPU' OR "ST"."STAT_NAME"='DB time') AND "WRM$_SNAPSHOT"."STATUS"=0 AND "WRM$_SNAPSHOT"."BEGIN_INTERVAL_TIME">TRUNC(SYSDATE@!)-7 AND "WRM$_SNAPSHOT"."END_INTERVAL_TIME"<TRUNC(SYSDATE@!) GROUP BY "WRM$_SNAPSHOT"."DBID","WRM$_SNAPSHOT"."INSTANCE_NUMBER","WRM$_SNAPSHOT"."STARTUP_TIME","ST"."STAT_ID","ST"."STAT_NAME") "VW_DIS_1") innerQuery

END OF STMT

...

>> Query Blk Card adjusted from 3.000000 to 2.000000 due to adaptive dynamic sampling

*** KEWUXS - encountered error: (ORA-12751: violation du temps UC ou des règles relatives au temps d'exécution
ORA-06512: à "SYS.DBMS_FEATURE_AWR", ligne 14
ORA-06512: à "SYS.DBMS_FEATURE_AWR", ligne 92
ORA-06512: à ligne 1
ORA-06512: à "SYS.DBMS_SQL", ligne 1707
ORA-06512: à "SYS.DBMS_FEATURE_USAGE_INTERNAL", ligne 312
ORA-06512: à "SYS.DBMS_FEATURE_USAGE_INTERNAL", ligne 522
ORA-06512: à "SYS.DBMS_FEATURE_USAGE_INTERNAL", ligne 694
ORA-06512: à "SYS.DBMS_FEATURE_USAGE_INTERNAL", ligne 791
ORA-06512: à ligne 1
)

*** KEWUXS - encountered error: (ORA-12751: violation du temps UC ou des règles relatives au temps d'exécution

ORA-06512: à "SYS.DBMS_FEATURE_AWR", ligne 14

ORA-06512: à "SYS.DBMS_FEATURE_AWR", ligne 92

ORA-06512: à ligne 1

ORA-06512: à "SYS.DBMS_SQL", ligne 1707

ORA-06512: à "SYS.DBMS_FEATURE_USAGE_INTERNAL", ligne 312

ORA-06512: à "SYS.DBMS_FEATURE_USAGE_INTERNAL", ligne 522

ORA-06512: à "SYS.DBMS_FEATURE_USAGE_INTERNAL", ligne 694

ORA-06512: à "SYS.DBMS_FEATURE_USAGE_INTERNAL", ligne 791

ORA-06512: à ligne 1

)

So, there are some SQL Plan Directives that force the CBO to run ADS for this query.

SQL> select TYPE, ENABLED, STATE, AUTO_DROP, REASON, CREATED, LAST_MODIFIED, LAST_USED from dba_sql_plan_directives where directive_id in (10027833930063681981, 17707367266596005344, 17748238338555778238);

TYPE             ENA STATE      AUT REASON                               CREATED
---------------- --- ---------- --- ------------------------------------ ---------------------------------------------------------------------------
LAST_MODIFIED                                                               LAST_USED
--------------------------------------------------------------------------- ---------------------------------------------------------------------------
DYNAMIC_SAMPLING YES USABLE     YES GROUP BY CARDINALITY MISESTIMATE     03-JUN-16 02.10.41.000000 PM
03-JUN-16 04.14.32.000000 PM

DYNAMIC_SAMPLING YES USABLE     YES SINGLE TABLE CARDINALITY MISESTIMATE 27-MAR-16 09.01.20.000000 AM
17-APR-16 09.13.01.000000 AM                                                17-APR-16 09.13.01.000000000 AM

DYNAMIC_SAMPLING YES USABLE     YES GROUP BY CARDINALITY MISESTIMATE     13-FEB-16 06.07.36.000000 AM
27-FEB-16 06.03.09.000000 AM                                                03-JUN-16 02.10.41.000000000 PM

SQL> select TYPE, ENABLED, STATE, AUTO_DROP, REASON, CREATED, LAST_MODIFIED, LAST_USED from dba_sql_plan_directives where directive_id in (10027833930063681981, 17707367266596005344, 17748238338555778238);

TYPE ENA STATE AUT REASON CREATED

---------------- --- ---------- --- ------------------------------------ ---------------------------------------------------------------------------

LAST_MODIFIED LAST_USED

--------------------------------------------------------------------------- ---------------------------------------------------------------------------

DYNAMIC_SAMPLING YES USABLE YES GROUP BY CARDINALITY MISESTIMATE 03-JUN-16 02.10.41.000000 PM

03-JUN-16 04.14.32.000000 PM

DYNAMIC_SAMPLING YES USABLE YES SINGLE TABLE CARDINALITY MISESTIMATE 27-MAR-16 09.01.20.000000 AM

17-APR-16 09.13.01.000000 AM 17-APR-16 09.13.01.000000000 AM

DYNAMIC_SAMPLING YES USABLE YES GROUP BY CARDINALITY MISESTIMATE 13-FEB-16 06.07.36.000000 AM

27-FEB-16 06.03.09.000000 AM 03-JUN-16 02.10.41.000000000 PM

This query touches three tables, so instead of relying on the DIRECTIVE_IDs, it’s better to get the directives by object name:

SQL> r
  1  select distinct d.directive_id, TYPE, ENABLED, STATE, AUTO_DROP, REASON, CREATED, LAST_MODIFIED
  2  from dba_sql_plan_directives d join dba_sql_plan_dir_objects o on
  3*     (d.directive_id=o.directive_id) where o.owner='SYS' and o.object_name in ('WRH$_SYS_TIME_MODEL','WRH$_STAT_NAME','WRM$_SNAPSHOT')

DIRECTIVE_ID TYPE             ENA STATE      AUT REASON                               CREATED
------------ ---------------- --- ---------- --- ------------------------------------ ---------------------------------------------------------------------------
LAST_MODIFIED
---------------------------------------------------------------------------
  8.8578E+18 DYNAMIC_SAMPLING YES USABLE     YES JOIN CARDINALITY MISESTIMATE         14-FEB-16 08.11.29.000000 AM
06-JUN-16 01.57.35.000000 PM

  1.7748E+19 DYNAMIC_SAMPLING YES USABLE     YES GROUP BY CARDINALITY MISESTIMATE     19-MAR-16 02.15.17.000000 AM
06-JUN-16 01.57.35.000000 PM

  1.7170E+19 DYNAMIC_SAMPLING YES USABLE     YES JOIN CARDINALITY MISESTIMATE         14-FEB-16 08.11.29.000000 AM
06-JUN-16 01.57.35.000000 PM

  1.7707E+19 DYNAMIC_SAMPLING YES USABLE     YES SINGLE TABLE CARDINALITY MISESTIMATE 13-MAR-16 08.04.38.000000 AM
06-JUN-16 01.57.35.000000 PM

SQL> r

1 select distinct d.directive_id, TYPE, ENABLED, STATE, AUTO_DROP, REASON, CREATED, LAST_MODIFIED

2 from dba_sql_plan_directives d join dba_sql_plan_dir_objects o on

3* (d.directive_id=o.directive_id) where o.owner='SYS' and o.object_name in ('WRH$_SYS_TIME_MODEL','WRH$_STAT_NAME','WRM$_SNAPSHOT')

DIRECTIVE_ID TYPE ENA STATE AUT REASON CREATED

------------ ---------------- --- ---------- --- ------------------------------------ ---------------------------------------------------------------------------

LAST_MODIFIED

---------------------------------------------------------------------------

8.8578E+18 DYNAMIC_SAMPLING YES USABLE YES JOIN CARDINALITY MISESTIMATE 14-FEB-16 08.11.29.000000 AM

06-JUN-16 01.57.35.000000 PM

1.7748E+19 DYNAMIC_SAMPLING YES USABLE YES GROUP BY CARDINALITY MISESTIMATE 19-MAR-16 02.15.17.000000 AM

06-JUN-16 01.57.35.000000 PM

1.7170E+19 DYNAMIC_SAMPLING YES USABLE YES JOIN CARDINALITY MISESTIMATE 14-FEB-16 08.11.29.000000 AM

06-JUN-16 01.57.35.000000 PM

1.7707E+19 DYNAMIC_SAMPLING YES USABLE YES SINGLE TABLE CARDINALITY MISESTIMATE 13-MAR-16 08.04.38.000000 AM

06-JUN-16 01.57.35.000000 PM

Solution

At this point, the solution is the same already pointed out in one of my previous blog posts: disable the directives individually!

BEGIN
  FOR rec in (select d.directive_id as did 
    from dba_sql_plan_directives d join dba_sql_plan_dir_objects o on
    (d.directive_id=o.directive_id) where o.owner='SYS'
      and o.object_name in ('WRH$_SYS_TIME_MODEL','WRH$_STAT_NAME','WRM$_SNAPSHOT'))
  LOOP
    DBMS_SPD.ALTER_SQL_PLAN_DIRECTIVE ( rec.did, 'ENABLED','NO');
    DBMS_SPD.ALTER_SQL_PLAN_DIRECTIVE ( rec.did, 'AUTO_DROP','NO');
  END LOOP;
END;
/

BEGIN

FOR rec in (select d.directive_id as did

from dba_sql_plan_directives d join dba_sql_plan_dir_objects o on

(d.directive_id=o.directive_id) where o.owner='SYS'

and o.object_name in ('WRH$_SYS_TIME_MODEL','WRH$_STAT_NAME','WRM$_SNAPSHOT'))

LOOP

DBMS_SPD.ALTER_SQL_PLAN_DIRECTIVE ( rec.did, 'ENABLED','NO');

DBMS_SPD.ALTER_SQL_PLAN_DIRECTIVE ( rec.did, 'AUTO_DROP','NO');

END LOOP;

END;

This very same PL/SQL block must be run on ALL the 12c databases affected by this Adaptive Dynamic Sampling problem on the sql_id auyf8px9ywc6j.

If you have just migrated the database to 12c, it would make even more sense to programmatically “inject” the disabled SQL Plan Directives into every freshly created or upgraded 12c database (until Oracle releases a patch for this non-bug).

-- export from a source where the directives exist and have been disabled
SET SERVEROUTPUT ON
DECLARE
  my_list  DBMS_SPD.OBJECTTAB := DBMS_SPD.ObjectTab();
  dir_cnt  NUMBER;
BEGIN
  DBMS_SPD.CREATE_STGTAB_DIRECTIVE  (table_name => 'AUYF8PX9YWC6J_DIRECTIVES', table_owner=> 'SYSTEM' );
  my_list.extend(3);
 
  -- TAB table
  my_list(1).owner := 'SYS';
  my_list(1).object_name := 'WRH$_SYS_TIME_MODEL';
  my_list(1).object_type := 'TABLE';
  my_list(2).owner := 'SYS';
  my_list(2).object_name := 'WRH$_STAT_NAME';
  my_list(2).object_type := 'TABLE';
  my_list(3).owner := 'SYS';
  my_list(3).object_name := 'WRM$_SNAPSHOT';
  my_list(3).object_type := 'TABLE';

  dir_cnt := DBMS_SPD.PACK_STGTAB_DIRECTIVE(table_name => 'AUYF8PX9YWC6J_DIRECTIVES', table_owner=> 'SYSTEM', obj_list => my_list);
   DBMS_OUTPUT.PUT_LINE('dir_cnt = ' || dir_cnt);
END;
/

expdp directory=data_pump_dir dumpfile=AUYF8PX9YWC6J_DIRECTIVES.dmp logfile=expdp_AUYF8PX9YWC6J_DIRECTIVES.log tables=system.AUYF8PX9YWC6J_DIRECTIVES

-- import into the freshly upgraded/created 12c database
impdp directory=data_pump_dir dumpfile=AUYF8PX9YWC6J_DIRECTIVES.dmp logfile=impdp_AUYF8PX9YWC6J_DIRECTIVES.log

SELECT DBMS_SPD.UNPACK_STGTAB_DIRECTIVE(table_name => 'AUYF8PX9YWC6J_DIRECTIVES', table_owner=> 'SYSTEM') FROM DUAL;

-- export from a source where the directives exist and have been disabled

SET SERVEROUTPUT ON

DECLARE

my_list DBMS_SPD.OBJECTTAB := DBMS_SPD.ObjectTab();

dir_cnt NUMBER;

BEGIN

DBMS_SPD.CREATE_STGTAB_DIRECTIVE (table_name => 'AUYF8PX9YWC6J_DIRECTIVES', table_owner=> 'SYSTEM' );

my_list.extend(3);

-- TAB table

my_list(1).owner := 'SYS';

my_list(1).object_name := 'WRH$_SYS_TIME_MODEL';

my_list(1).object_type := 'TABLE';

my_list(2).owner := 'SYS';

my_list(2).object_name := 'WRH$_STAT_NAME';

my_list(2).object_type := 'TABLE';

my_list(3).owner := 'SYS';

my_list(3).object_name := 'WRM$_SNAPSHOT';

my_list(3).object_type := 'TABLE';

dir_cnt := DBMS_SPD.PACK_STGTAB_DIRECTIVE(table_name => 'AUYF8PX9YWC6J_DIRECTIVES', table_owner=> 'SYSTEM', obj_list => my_list);

DBMS_OUTPUT.PUT_LINE('dir_cnt = ' || dir_cnt);

END;

expdp directory=data_pump_dir dumpfile=AUYF8PX9YWC6J_DIRECTIVES.dmp logfile=expdp_AUYF8PX9YWC6J_DIRECTIVES.log tables=system.AUYF8PX9YWC6J_DIRECTIVES

-- import into the freshly upgraded/created 12c database

impdp directory=data_pump_dir dumpfile=AUYF8PX9YWC6J_DIRECTIVES.dmp logfile=impdp_AUYF8PX9YWC6J_DIRECTIVES.log

SELECT DBMS_SPD.UNPACK_STGTAB_DIRECTIVE(table_name => 'AUYF8PX9YWC6J_DIRECTIVES', table_owner=> 'SYSTEM') FROM DUAL;

It comes without saying that the next execution has been very quick, consuming almost no CPU and without using ADS.

HTH

—

Ludovico

Oracle Database on ACFS: a perfect marriage?

Posted on November 26, 2015 by Ludovico

Update: I will give this presentation at UKOUG Tech15, Wed 9 December at 14:30.

This presentation has had a very poor score in selections for conferences (no OOW, no DOAG) but people liked it very much at Paris Oracle Meetup. The Database on ACFS is mainstream now, thanks to the new ODA releases. Having some knowledge about why and how you should run (not) Databases on ACFS is definitely worth a read.

Slides

Demo 1 recording

Demo 2 recording

Demo script (DB ACFS clone from Standby Database)

#!/bin/bash
####################
# create the clone #
####################

set -x

NUM=`echo $$ | cut -c 1-4`
export NEWNAME=${1:-SN$NUM}

cat <<EOF 
################################################
################################################
##
## CLONING DATABASE USING NEW SID: $NEWNAME
##
################################################
################################################

EOF

export ORACLE_SID=ACFSDB_1

dgmgrl <<EOF
connect sys/racattack
edit database ACFSDB set state="APPLY-OFF";
exit
EOF

acfsutil snap create -w $NEWNAME /u02

dgmgrl <<EOF
connect sys/racattack
edit database ACFSDB set state="APPLY-ON";
exit
EOF

cd /u02/.ACFS/snaps/$NEWNAME/ACFSDB

sqlplus / as sysdba <<EOF
alter database backup controlfile to trace as '/u02/.ACFS/snaps/$NEWNAME/ACFSDB/control.trc' reuse resetlogs;
create pfile='/u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora' from spfile;
exit
EOF

sed -i -e "s/u02\/ACFSDB\//u02\/.ACFS\/snaps\/$NEWNAME\/ACFSDB\//g" /u02/.ACFS/snaps/$NEWNAME/ACFSDB/control.trc
sed -i -e "s/CREATE CONTROLFILE.*$/CREATE CONTROLFILE REUSE SET DATABASE \"$NEWNAME\" RESETLOGS FORCE LOGGING NOARCHIVELOG/" /u02/.ACFS/snaps/$NEWNAME/ACFSDB/control.trc
rm /u02/.ACFS/snaps/$NEWNAME/ACFSDB/ACFSDB/fast_recovery_area/ACFSDB/controlfile/*
rm /u02/.ACFS/snaps/$NEWNAME/ACFSDB/ACFSDB/controlfile/*
sed -i '/^ACFS.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora

sed -i -e "s/u02\/ACFSDB\//u02\/.ACFS\/snaps\/$NEWNAME\/ACFSDB\//g" /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora

sed -i '/^\*\.db_name.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora
sed -i '/^\*\.db_unique_name.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora
sed -i '/^\*\.dispatchers.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora
sed -i '/^\*\.audit_file_dest.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora
sed -i '/^\*\.fal_server.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora
sed -i '/^\*\.log_archive_config.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora
sed -i '/^\*\.log_archive_dest_1.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora
sed -i '/^\*\.memory_target.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora
sed -i '/^\*\.service_names.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora
sed -i '/^\*\.cluster_database.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora


#find /u02/.ACFS/snaps/SNAP1/ACFSDB/ACFSDB/fast_recovery_area/ACFSDB/archivelog/ -type f -exec mv {} /u02/.ACFS/snaps/SNAP1/ACFSDB/archivelog/ \;

mkdir -p $ORACLE_BASE/admin/$NEWNAME/adump

cat  >>/u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora <<EOF
*.audit_file_dest='$ORACLE_BASE/admin/$NEWNAME/adump'
*.db_name='$NEWNAME'
*.db_unique_name='$NEWNAME'
*.dispatchers='(PROTOCOL=TCP) (SERVICE=${NEWNAME}XDB)'
*.log_archive_dest_1='location=USE_DB_RECOVERY_FILE_DEST'
*.sga_target=1900M
$NEWNAME.instance_number=1
$NEWNAME.undo_tablespace=UNDOTBS1
*.service_names='$NEWNAME'
*.cluster_database=false
EOF

export ORACLE_SID=$NEWNAME

head -n $((`grep -n ^RECOVER /u02/.ACFS/snaps/$NEWNAME/ACFSDB/control.trc | awk -F: '{print $1}'`-2)) /u02/.ACFS/snaps/$NEWNAME/ACFSDB/control.trc > /u02/.ACFS/snaps/$NEWNAME/ACFSDB/control1.trc

 
sqlplus / as sysdba <<EOF
create spfile from pfile='/u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora';
@/u02/.ACFS/snaps/$NEWNAME/ACFSDB/control1.trc
--recover automatic database using backup controlfile until cancel;
--CANCEL
alter database open resetlogs;
alter tablespace temp add tempfile size 50M ;
EOF

#!/bin/bash

####################

# create the clone #

####################

set -x

NUM=`echo $$ | cut -c 1-4`

export NEWNAME=${1:-SN$NUM}

cat <<EOF

################################################

## CLONING DATABASE USING NEW SID: $NEWNAME

################################################

EOF

export ORACLE_SID=ACFSDB_1

dgmgrl <<EOF

connect sys/racattack

edit database ACFSDB set state="APPLY-OFF";

exit

EOF

acfsutil snap create -w $NEWNAME /u02

dgmgrl <<EOF

connect sys/racattack

edit database ACFSDB set state="APPLY-ON";

exit

EOF

cd /u02/.ACFS/snaps/$NEWNAME/ACFSDB

sqlplus / as sysdba <<EOF

alter database backup controlfile to trace as '/u02/.ACFS/snaps/$NEWNAME/ACFSDB/control.trc' reuse resetlogs;

create pfile='/u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora' from spfile;

exit

EOF

sed -i -e "s/u02\/ACFSDB\//u02\/.ACFS\/snaps\/$NEWNAME\/ACFSDB\//g" /u02/.ACFS/snaps/$NEWNAME/ACFSDB/control.trc

sed -i -e "s/CREATE CONTROLFILE.*$/CREATE CONTROLFILE REUSE SET DATABASE \"$NEWNAME\" RESETLOGS FORCE LOGGING NOARCHIVELOG/" /u02/.ACFS/snaps/$NEWNAME/ACFSDB/control.trc

rm /u02/.ACFS/snaps/$NEWNAME/ACFSDB/ACFSDB/fast_recovery_area/ACFSDB/controlfile/*

rm /u02/.ACFS/snaps/$NEWNAME/ACFSDB/ACFSDB/controlfile/*

sed -i '/^ACFS.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora

sed -i -e "s/u02\/ACFSDB\//u02\/.ACFS\/snaps\/$NEWNAME\/ACFSDB\//g" /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora

sed -i '/^\*\.db_name.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora

sed -i '/^\*\.db_unique_name.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora

sed -i '/^\*\.dispatchers.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora

sed -i '/^\*\.audit_file_dest.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora

sed -i '/^\*\.fal_server.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora

sed -i '/^\*\.log_archive_config.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora

sed -i '/^\*\.log_archive_dest_1.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora

sed -i '/^\*\.memory_target.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora

sed -i '/^\*\.service_names.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora

sed -i '/^\*\.cluster_database.*$/d' /u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora

#find /u02/.ACFS/snaps/SNAP1/ACFSDB/ACFSDB/fast_recovery_area/ACFSDB/archivelog/ -type f -exec mv {} /u02/.ACFS/snaps/SNAP1/ACFSDB/archivelog/ \;

mkdir -p $ORACLE_BASE/admin/$NEWNAME/adump

cat >>/u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora <<EOF

*.audit_file_dest='$ORACLE_BASE/admin/$NEWNAME/adump'

*.db_name='$NEWNAME'

*.db_unique_name='$NEWNAME'

*.dispatchers='(PROTOCOL=TCP) (SERVICE=${NEWNAME}XDB)'

*.log_archive_dest_1='location=USE_DB_RECOVERY_FILE_DEST'

*.sga_target=1900M

$NEWNAME.instance_number=1

$NEWNAME.undo_tablespace=UNDOTBS1

*.service_names='$NEWNAME'

*.cluster_database=false

EOF

export ORACLE_SID=$NEWNAME

head -n $((`grep -n ^RECOVER /u02/.ACFS/snaps/$NEWNAME/ACFSDB/control.trc | awk -F: '{print $1}'`-2)) /u02/.ACFS/snaps/$NEWNAME/ACFSDB/control.trc > /u02/.ACFS/snaps/$NEWNAME/ACFSDB/control1.trc

sqlplus / as sysdba <<EOF

create spfile from pfile='/u02/.ACFS/snaps/$NEWNAME/ACFSDB/init$NEWNAME.ora';

@/u02/.ACFS/snaps/$NEWNAME/ACFSDB/control1.trc

--recover automatic database using backup controlfile until cancel;

--CANCEL

alter database open resetlogs;

alter tablespace temp add tempfile size 50M ;

EOF

Comments are, as always, very appreciated 🙂

—

Ludo

Generating graphs massively from Windows Performance Counters logs

Posted on April 23, 2013 by Ludovico

Windows Performance Monitor is an invaluable tool when you don’t have external enterprise monitoring tools and you need to face performance problems, whether you have a web/application server, a mail server or a database server.

But what I don’t personally like of it is what you get in terms of graphing. If you schedule and collect a big amount of performance metrics you will likely get lost in adding/removing such metrics from the graphical interface.

What I’ve done long time ago (and I’ve done again recently after my old laptop has been stolen 🙁 ) is to prepare a PHP script that parse the resulting CSV file and generate automatically one graph for each metric that could be found.

Unfortunately, most of Windows Sysadmin between you will disagree that I’ve done this using a Linux Box. But I guess you can use my script if you install php inside cygwin. The other tool you need, is rrdtool, again I use it massively to resolve my graphing needs.

How to collect your data

Basically you need to create any Data Collector within the Performance Monitor that generates a log file. You can specify directly a CSV file (Log format: Comma separated) or generate a BLG file and convert it later (Log format: Binary). System dumps are not used, so if you use the standard Performace template, you can delete it from your collection.

Remember that the more counters you take, the more the graph generation will take. The script does not run in parallel, so it will use only one core. Generally:

(Time to complete) = (Num Counters) * (Num Samples) * (Speed factor)

1	(Time to complete) = (Num Counters) * (Num Samples) * (Speed factor)

Where (Speed factor) is depending on both the CPU speed and the disk speed because of the huge number of syncs required to update several thousands of files. I’ve tried to reduce the number of rrdupdates by queuing several update values in a single command line and I’ve noticed an important increase of performances, but I know it’s not enough.

Converting a BLG (binary) log into a CSV log

Just use the relog tool:

C:\PerfLogs\Admin\Perftest\LUDO_20130423-000002> <strong>relog "Performance Counter.blg" -f csv -o "Performance Counter.csv"</strong>

Input
----------------
File(s):
     Performance Counter.blg (Binary)

Begin:    23.4.2013 14:56:02
End:      23.4.2013 15:33:37
Samples:  452

100.00%

Output
----------------
File:     Performance Counter.csv

Begin:    23.4.2013 14:56:02
End:      23.4.2013 15:33:37
Samples:  452

The command completed successfully.

C:\PerfLogs\Admin\Perftest\LUDO_20130423-000002> <strong>relog "Performance Counter.blg" -f csv -o "Performance Counter.csv"</strong>

Input

----------------

File(s):

Performance Counter.blg (Binary)

Begin: 23.4.2013 14:56:02

End: 23.4.2013 15:33:37

Samples: 452

100.00%

Output

----------------

File: Performance Counter.csv

Begin: 23.4.2013 14:56:02

End: 23.4.2013 15:33:37

Samples: 452

The command completed successfully.

Generating the graphs

Transfer the CSV on the box where you have the php and rrdtool configured, then run:

[root@lucrac01 temp]# php process_l.php PerformanceCounter.csv

453--Creating rrd /root/temp/LUDO/IPv4/Datagrams_Received_Delivered_sec.rrd

Creating rrd /root/temp/LUDO/IPv4/Datagrams_Received_Unknown_Protocol.rrd
Creating rrd /root/temp/LUDO/IPv4/Fragmented_Datagrams_sec.rrd
Creating rrd /root/temp/LUDO/IPv4/Datagrams_sec.rrd

...

Creating rrd /root/temp/LUDO/Memory/Pages_Input_sec.rrd
Creating rrd /root/temp/LUDO/Memory/Pool_Paged_Resident_Bytes.rrd
Creating rrd /root/temp/LUDO/Memory/Write_Copies_sec.rrd

...

Creating rrd /root/temp/LUDO/PhysicalDisk_2_E__/Avg._Disk_sec_Transfer.rrd
Creating rrd /root/temp/LUDO/PhysicalDisk_1_D__/Avg._Disk_sec_Transfer.rrd
Creating rrd /root/temp/LUDO/PhysicalDisk_0_C__/Avg._Disk_sec_Transfer.rrd
----......

1.Generating Graph: /root/temp/LUDO/IPv4/Datagrams_Received_Delivered_sec.png
rrdtool graph /root/temp/LUDO/IPv4/Datagrams_Received_Delivered_sec.png --start "1366721762" --end "1366724017" --width 453 DEF:ds0=/root/temp/LUDO/IPv4/Datagrams_Received_Delivered_sec.rrd:value:LAST:step=5 LINE1:ds0#0000FF:"IPv4\Datagrams Received Delivered/sec" VDEF:ds0max=ds0,MAXIMUM VDEF:ds0avg=ds0,AVERAGE VDEF:ds0min=ds0,MINIMUM COMMENT:" " COMMENT:" Maximum " GPRINT:ds0max:"%6.2lf" COMMENT:" Average " GPRINT:ds0avg:"%6.2lf" COMMENT:" Minimum " GPRINT:ds0min:"%6.2lf"
534x177
2.Generating Graph: /root/temp/LUDO/IPv4/Datagrams_Received_Unknown_Protocol.png
rrdtool graph /root/temp/LUDO/IPv4/Datagrams_Received_Unknown_Protocol.png --start "1366721762" --end "1366724017" --width 453 DEF:ds0=/root/temp/LUDO/IPv4/Datagrams_Received_Unknown_Protocol.rrd:value:LAST:step=5 LINE1:ds0#0000FF:"IPv4\Datagrams Received Unknown Protocol" VDEF:ds0max=ds0,MAXIMUM VDEF:ds0avg=ds0,AVERAGE VDEF:ds0min=ds0,MINIMUM COMMENT:" " COMMENT:" Maximum " GPRINT:ds0max:"%6.2lf" COMMENT:" Average " GPRINT:ds0avg:"%6.2lf" COMMENT:" Minimum " GPRINT:ds0min:"%6.2lf"
534x177
...

[root@lucrac01 temp]# php process_l.php PerformanceCounter.csv

453--Creating rrd /root/temp/LUDO/IPv4/Datagrams_Received_Delivered_sec.rrd

Creating rrd /root/temp/LUDO/IPv4/Datagrams_Received_Unknown_Protocol.rrd

Creating rrd /root/temp/LUDO/IPv4/Fragmented_Datagrams_sec.rrd

Creating rrd /root/temp/LUDO/IPv4/Datagrams_sec.rrd

...

Creating rrd /root/temp/LUDO/Memory/Pages_Input_sec.rrd

Creating rrd /root/temp/LUDO/Memory/Pool_Paged_Resident_Bytes.rrd

Creating rrd /root/temp/LUDO/Memory/Write_Copies_sec.rrd

...

Creating rrd /root/temp/LUDO/PhysicalDisk_2_E__/Avg._Disk_sec_Transfer.rrd

Creating rrd /root/temp/LUDO/PhysicalDisk_1_D__/Avg._Disk_sec_Transfer.rrd

Creating rrd /root/temp/LUDO/PhysicalDisk_0_C__/Avg._Disk_sec_Transfer.rrd

----......

1.Generating Graph: /root/temp/LUDO/IPv4/Datagrams_Received_Delivered_sec.png

rrdtool graph /root/temp/LUDO/IPv4/Datagrams_Received_Delivered_sec.png --start "1366721762" --end "1366724017" --width 453 DEF:ds0=/root/temp/LUDO/IPv4/Datagrams_Received_Delivered_sec.rrd:value:LAST:step=5 LINE1:ds0#0000FF:"IPv4\Datagrams Received Delivered/sec" VDEF:ds0max=ds0,MAXIMUM VDEF:ds0avg=ds0,AVERAGE VDEF:ds0min=ds0,MINIMUM COMMENT:" " COMMENT:" Maximum " GPRINT:ds0max:"%6.2lf" COMMENT:" Average " GPRINT:ds0avg:"%6.2lf" COMMENT:" Minimum " GPRINT:ds0min:"%6.2lf"

534x177

2.Generating Graph: /root/temp/LUDO/IPv4/Datagrams_Received_Unknown_Protocol.png

rrdtool graph /root/temp/LUDO/IPv4/Datagrams_Received_Unknown_Protocol.png --start "1366721762" --end "1366724017" --width 453 DEF:ds0=/root/temp/LUDO/IPv4/Datagrams_Received_Unknown_Protocol.rrd:value:LAST:step=5 LINE1:ds0#0000FF:"IPv4\Datagrams Received Unknown Protocol" VDEF:ds0max=ds0,MAXIMUM VDEF:ds0avg=ds0,AVERAGE VDEF:ds0min=ds0,MINIMUM COMMENT:" " COMMENT:" Maximum " GPRINT:ds0max:"%6.2lf" COMMENT:" Average " GPRINT:ds0avg:"%6.2lf" COMMENT:" Minimum " GPRINT:ds0min:"%6.2lf"

534x177

...

Now it’s done!

The script generate a folder with the name of the server (LUDO in my example) and a subfolder for each class of counters (as you see in Performance Monitor).

Inside each folder you will have a PNG (and an rrd) for each metric.

Important: The RRD are generated with a single round-robin archive with a size equal to the number of samples. If you want to have the rrd to store your historical data you’ll need to modify the script. Also, the size of the graph will be the same as the number of samples (for best reading), but limited to 1000 to avoid huge images.

Future Improvements

Would be nice to have a prepared set of graphs for standard graphs with multiple metrics (e.g. CPU user, system and idle together) and additional lines like regressions…

Download the script: process_l_php.txt and rename it with a .php extension.

Hope you’ll find it useful!

Cheers

Ludo

Oracle capacity planning with RRDTOOL

Posted on May 25, 2009 by Ludovico

RRDize everything, chapter 2

Oracle Database Server has the most powerful system catalog that allows to query almost any aspect inside an oracle instance.
You can query many v$ fixed views at regular intervals and populate many RRD files through rrdtool: space usage, wait events. system statistics and so on…

Since release 10.1 Oracle has introduced Automatic Workload Repository, a finer version of old good Statspack.
No matter if you are using AWR or statspack, you can rely on their views to collect data for your RRDs.

If you are administering a new instance and you haven’t collected its statistics so far, you can query (as example) the DBA_HIST_BG_EVENT_SUMMARY view to gather all AWR data about wait events. Historical views could be useful also to collect historical data once a week rather than query the fixed views every few minutes doing the hard work twice (you and AWR).

The whole process of gathering performance data and update rrd files can be resumed into the following steps:

– connect to the database
– query the AWR’s views
– build and execute an rrdtool update command
– check if rrd file exists or create it
– update the rrd file

The less rrdtool update commands you will execute, the better the whole process will perform.
Do it in a language you are comfortable with and that supports easily connection descriptors.

Since I’m very comfortable with php, I did it this way.

This is a very basilar script that works greatly for me with good performances:

#!/usr/bin/php -f
< ?php                                         

define('WD','/opt/oracle/awr');
$cs         = $_SERVER['argv'][1];
$user       = 'mymonitoruser';
$pass       = 'mystrongpassword'; 

/* open a new connection */
$ds = oci_connect($user, $pass, $cs)
        or die ("Cannot connect to Oracle Database ".$cs."\n");

/* setting client nls environment */
$sql = "alter session set nls_timestamp_format='MM/DD/YY HH24:MI'";
$stmt = oci_parse($ds, $sql);
oci_execute($stmt);
oci_free_statement($stmt);                                         

/* create directory that will contain rrds (if not exists) */
if(!file_exists(WD.'/'.$cs))
                mkdir(WD.'/'.$cs);
if(!file_exists(WD.'/'.$cs.'/wait'))
                mkdir(WD.'/'.$cs.'/wait');                   

/* function to create new RRDs */
function createRRD($name, $interval, $cs) {
        $hb = $interval*5; //heartbeat
        $cmd="rrdtool create ".WD."/".$cs."/wait/${name}.rrd -s ".$interval." \
                -b \"now -3month\" DS:waits:DERIVE:$hb:0:U \
                DS:mswaited:DERIVE:$hb:0:U \
                RRA:AVERAGE:0.5:1:1440 RRA:AVERAGE:0.5:30:336 \
                RRA:AVERAGE:0.5:120:372 RRA:AVERAGE:0.5:720:730 \
                RRA:MIN:0.5:1:1440 RRA:MIN:0.5:30:336 \
                RRA:MIN:0.5:120:372 RRA:MIN:0.5:720:730 \
                RRA:MAX:0.5:1:1440 RRA:MAX:0.5:30:336 \
                RRA:MAX:0.5:120:372 RRA:MAX:0.5:720:730 \
                RRA:LAST:0.5:1:1440";
        //print $cmd."\n";
        return passthru($cmd);
}                                                                              

/* take the snapshot frequency from dba_hist_wr_control
 to create the RDD with correct heartbeat value */
$sql = 'select extract(hour from snap_interval)*3600 +
extract(minute from snap_interval)*60 as SEED from DBA_HIST_WR_CONTROL';
$stmt = oci_parse($ds, $sql);
oci_execute($stmt);
$row = oci_fetch_assoc($stmt);
$interval = $row['SEED'];
unset($row);
oci_free_statement($stmt);                                              

/* statement definition that will collect
 all snapshots for a certain wait event with more than
 a certain amonut of time waited.
 Gathering ALL EVENTS could be time consuming and useless.
 I fetch rows ordered by event_name rather
 then by date because I can update many values
 into the same rrd with very few rrdupdate commands
*/
$sql = 'select s.END_INTERVAL_TIME END_INTERVAL_TIME,
    g.EVENT_NAME, g.WAIT_CLASS, g.TOTAL_WAITS,
    round(g.TIME_WAITED_MICRO/1000) MS
  from DBA_HIST_SNAPSHOT s,
   dba_hist_bg_event_summary g,
   v$instance i
 where s.SNAP_ID=g.SNAP_ID and g.wait_class!=\'Idle\'
  and g.TIME_WAITED_MICRO&gt;100000
  and s.instance_number=i.instance_number
  and s.instance_number=g.instance_number
 order by 2,1';                                      

/* default prefetch size (148) matches default snapshot retention (24hx7dd) */
$stmt = oci_parse($ds, $sql);
oci_set_prefetch($stmt, 148);
oci_execute($stmt);

$i=0;
$oldevent="";
while ($row = oci_fetch_assoc($stmt)) {
        if ($oldevent != $row['EVENT_NAME']) {
                //NEW EVENT DETECTED: WILL START A NEW UPDATE CMD
                if ($i != 0 &amp;&amp; !empty($cmd)) {
                        /* not the first occurrence,
                         I bet there's something in my buffer */
                        passthru($cmd);
                }
                $cleanName = preg_replace ("([^[:alnum:]_-])","_",$row['EVENT_NAME']);
                // if there is no rrd for this event, I create a new one
                if (!file_exists(WD."/".$cs."/wait/${cleanName}.rrd")) {
                        createRRD($cleanName, $interval, $cs);
                }
                /*
                * I initialize a new update command. This string act as a buffer: I append many
                * values to be updated so I'll update many values in a single command line:
                * less forks of rrdtool and less file opens: the whole update process has an
                * enormous improvement.
                */
                $precmd="rrdtool update ".WD."/".$cs."/wait/${cleanName}.rrd ";
                $lastcmd="rrdtool info ".WD."/".$cs."/wait/${cleanName}.rrd".
                        "| grep last_update | awk '{print \$NF}'";
                $last=trim(`$lastcmd`);
                printf ("%s - %s - last: %d\n", $row['EVENT_NAME'], $cleanName, $last);
                $i=0;
                $cmd=$precmd;
                $oldevent=$row['EVENT_NAME'];
        }
        $time=strtotime($row['END_INTERVAL_TIME']);
        //print "time: ".$time."  last: ".$last."\n";
        if ( $time &gt; $last ) {
                $cmd.=" ".$time.":".$row['TOTAL_WAITS'].":".$row['MS'];
                $i++;
        }
        if ($i &gt;= 40) {
                // when I reach 40 values per commandline I force
                // the update: next loop will reinitialize a new commandline.
                passthru($cmd);
                $cmd=$precmd;
                $i=0;
        }
        unset($row);

}
if ($i != 0) {
        /* one more update pending in my buffer */
        passthru($cmd);
}
oci_free_statement($stmt);
oci_close($ds);
?>

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

#!/usr/bin/php -f

< ?php

define('WD','/opt/oracle/awr');

$cs = $_SERVER['argv'][1];

$user = 'mymonitoruser';

$pass = 'mystrongpassword';

/* open a new connection */

$ds = oci_connect($user, $pass, $cs)

or die ("Cannot connect to Oracle Database ".$cs."\n");

/* setting client nls environment */

$sql = "alter session set nls_timestamp_format='MM/DD/YY HH24:MI'";

$stmt = oci_parse($ds, $sql);

oci_execute($stmt);

oci_free_statement($stmt);

/* create directory that will contain rrds (if not exists) */

if(!file_exists(WD.'/'.$cs))

mkdir(WD.'/'.$cs);

if(!file_exists(WD.'/'.$cs.'/wait'))

mkdir(WD.'/'.$cs.'/wait');

/* function to create new RRDs */

function createRRD($name, $interval, $cs) {

$hb = $interval*5; //heartbeat

$cmd="rrdtool create ".WD."/".$cs."/wait/${name}.rrd -s ".$interval." \

-b \"now -3month\" DS:waits:DERIVE:$hb:0:U \

DS:mswaited:DERIVE:$hb:0:U \

RRA:AVERAGE:0.5:1:1440 RRA:AVERAGE:0.5:30:336 \

RRA:AVERAGE:0.5:120:372 RRA:AVERAGE:0.5:720:730 \

RRA:MIN:0.5:1:1440 RRA:MIN:0.5:30:336 \

RRA:MIN:0.5:120:372 RRA:MIN:0.5:720:730 \

RRA:MAX:0.5:1:1440 RRA:MAX:0.5:30:336 \

RRA:MAX:0.5:120:372 RRA:MAX:0.5:720:730 \

RRA:LAST:0.5:1:1440";

//print $cmd."\n";

return passthru($cmd);

}

/* take the snapshot frequency from dba_hist_wr_control

to create the RDD with correct heartbeat value */

$sql = 'select extract(hour from snap_interval)*3600 +

extract(minute from snap_interval)*60 as SEED from DBA_HIST_WR_CONTROL';

$stmt = oci_parse($ds, $sql);

oci_execute($stmt);

$row = oci_fetch_assoc($stmt);

$interval = $row['SEED'];

unset($row);

oci_free_statement($stmt);

/* statement definition that will collect

all snapshots for a certain wait event with more than

a certain amonut of time waited.

Gathering ALL EVENTS could be time consuming and useless.

I fetch rows ordered by event_name rather

then by date because I can update many values

into the same rrd with very few rrdupdate commands

$sql = 'select s.END_INTERVAL_TIME END_INTERVAL_TIME,

g.EVENT_NAME, g.WAIT_CLASS, g.TOTAL_WAITS,

round(g.TIME_WAITED_MICRO/1000) MS

from DBA_HIST_SNAPSHOT s,

dba_hist_bg_event_summary g,

v$instance i

where s.SNAP_ID=g.SNAP_ID and g.wait_class!=\'Idle\'

and g.TIME_WAITED_MICRO>100000

and s.instance_number=i.instance_number

and s.instance_number=g.instance_number

order by 2,1';

/* default prefetch size (148) matches default snapshot retention (24hx7dd) */

$stmt = oci_parse($ds, $sql);

oci_set_prefetch($stmt, 148);

oci_execute($stmt);

$i=0;

$oldevent="";

while ($row = oci_fetch_assoc($stmt)) {

if ($oldevent != $row['EVENT_NAME']) {

//NEW EVENT DETECTED: WILL START A NEW UPDATE CMD

if ($i != 0 && !empty($cmd)) {

/* not the first occurrence,

I bet there's something in my buffer */

passthru($cmd);

}

$cleanName = preg_replace ("([^[:alnum:]_-])","_",$row['EVENT_NAME']);

// if there is no rrd for this event, I create a new one

if (!file_exists(WD."/".$cs."/wait/${cleanName}.rrd")) {

createRRD($cleanName, $interval, $cs);

}

* I initialize a new update command. This string act as a buffer: I append many

* values to be updated so I'll update many values in a single command line:

* less forks of rrdtool and less file opens: the whole update process has an

* enormous improvement.

$precmd="rrdtool update ".WD."/".$cs."/wait/${cleanName}.rrd ";

$lastcmd="rrdtool info ".WD."/".$cs."/wait/${cleanName}.rrd".

"| grep last_update | awk '{print \$NF}'";

$last=trim(`$lastcmd`);

printf ("%s - %s - last: %d\n", $row['EVENT_NAME'], $cleanName, $last);

$i=0;

$cmd=$precmd;

$oldevent=$row['EVENT_NAME'];

}

$time=strtotime($row['END_INTERVAL_TIME']);

//print "time: ".$time." last: ".$last."\n";

if ( $time > $last ) {

$cmd.=" ".$time.":".$row['TOTAL_WAITS'].":".$row['MS'];

$i++;

}

if ($i >= 40) {

// when I reach 40 values per commandline I force

// the update: next loop will reinitialize a new commandline.

passthru($cmd);

$cmd=$precmd;

$i=0;

}

unset($row);

}

if ($i != 0) {

/* one more update pending in my buffer */

passthru($cmd);

}

oci_free_statement($stmt);

oci_close($ds);

Depending on how many different wait events you have, you’ll have a certain number of rrd files:

# ls -l
total 3864
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 Streams_AQ__enqueue_blocked_on_low_memory.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 20 08:18 buffer_busy_waits.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 control_file_parallel_write.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 control_file_sequential_read.rrd
-rw-r--r-- 1 ludovico ludovico 165304 Apr 30 10:12 cursor__pin_S_wait_on_X.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 db_file_scattered_read.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 db_file_sequential_read.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 events_in_waitclass_Other.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 latch__cache_buffers_chains.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 latch__library_cache.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 11 13:22 latch__library_cache_lock.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 20 08:18 latch__redo_writing.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 latch__row_cache_objects.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 latch__shared_pool.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 library_cache_load_lock.rrd
-rw-r--r-- 1 ludovico ludovico 165304 Apr 15 13:17 library_cache_lock.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 log_buffer_space.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 log_file_parallel_write.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 log_file_sequential_read.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 log_file_single_write.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 log_file_switch_completion.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 11 13:22 log_file_sync.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 os_thread_startup.rrd

# ls -l

total 3864

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 Streams_AQ__enqueue_blocked_on_low_memory.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 20 08:18 buffer_busy_waits.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 control_file_parallel_write.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 control_file_sequential_read.rrd

-rw-r--r-- 1 ludovico ludovico 165304 Apr 30 10:12 cursor__pin_S_wait_on_X.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 db_file_scattered_read.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 db_file_sequential_read.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 events_in_waitclass_Other.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 latch__cache_buffers_chains.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 latch__library_cache.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 11 13:22 latch__library_cache_lock.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 20 08:18 latch__redo_writing.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 latch__row_cache_objects.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 latch__shared_pool.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 library_cache_load_lock.rrd

-rw-r--r-- 1 ludovico ludovico 165304 Apr 15 13:17 library_cache_lock.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 log_buffer_space.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 log_file_parallel_write.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 log_file_sequential_read.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 log_file_single_write.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 log_file_switch_completion.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 11 13:22 log_file_sync.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 os_thread_startup.rrd

As you can see, they are not so big…

Once you have your data in rrd files, it’s quite simple to script even complex plots with several datasources. Everything depends on the results you want.
This script stack all my wait events for a certain instance: it takes the directory containing all the rrds as first argument and the number of hours we want to be plotted as second argument:

cs=$1
hours=${2:-148}

eventlist=`ls $cs/wait/*rrd`

colors[1]="#000000"
colors[2]="#000055"
colors[3]="#0000aa"
colors[4]="#0000ff"
colors[5]="#550055"
colors[6]="#aa00aa"
colors[7]="#ff00ff"
colors[8]="#550000"
colors[9]="#aa0000"
colors[10]="#ff0000"
colors[11]="#555500"
colors[12]="#aaaa00"
colors[13]="#ffff00"
colors[14]="#005500"
colors[15]="#00aa00"
colors[16]="#00ff00"
colors[17]="#005555"
colors[18]="#00aaaa"
colors[19]="#00ffff"
colors[20]="#555555"
colors[21]="#aaaaaa"

i=0

for event in $eventlist ; do
        if [ $i -eq 0 ] ; then
                end=`rrdtool info $event | grep last_update | awk '{print $NF}'`
                end=`rrdtool info $cs/wait/control_file_parallel_write.rrd | grep last_update | awk '{print $NF}'`
                cmd="rrdtool graph - -s end-${hours}hours -e $end  -v \"milliseconds waited\" -l 0 -w 640 -h 240 -t \"$cs WAIT PROFILE\""
                i=$(($i+1))
        fi
        color=${colors[$i]}
        echo $color
        evname=`basename $event | sed -e s/\.rrd\$//`
        cmd="$cmd  DEF:$evname=$event:mswaited:AVERAGE"
        cmd="$cmd  AREA:${evname}${color}:"$evname":STACK"
        i=$(($i+1))
        if [ $i -eq 20 ] ; then
                i=1
        fi
done
        cmd="$cmd  |display /dev/input"
        echo $cmd
        eval $cmd
exit

cs=$1

hours=${2:-148}

eventlist=`ls $cs/wait/*rrd`

colors[1]="#000000"

colors[2]="#000055"

colors[3]="#0000aa"

colors[4]="#0000ff"

colors[5]="#550055"

colors[6]="#aa00aa"

colors[7]="#ff00ff"

colors[8]="#550000"

colors[9]="#aa0000"

colors[10]="#ff0000"

colors[11]="#555500"

colors[12]="#aaaa00"

colors[13]="#ffff00"

colors[14]="#005500"

colors[15]="#00aa00"

colors[16]="#00ff00"

colors[17]="#005555"

colors[18]="#00aaaa"

colors[19]="#00ffff"

colors[20]="#555555"

colors[21]="#aaaaaa"

i=0

for event in $eventlist ; do

if [ $i -eq 0 ] ; then

end=`rrdtool info $event | grep last_update | awk '{print $NF}'`

end=`rrdtool info $cs/wait/control_file_parallel_write.rrd | grep last_update | awk '{print $NF}'`

cmd="rrdtool graph - -s end-${hours}hours -e $end -v \"milliseconds waited\" -l 0 -w 640 -h 240 -t \"$cs WAIT PROFILE\""

i=$(($i+1))

color=${colors[$i]}

echo $color

evname=`basename $event | sed -e s/\.rrd\$//`

cmd="$cmd DEF:$evname=$event:mswaited:AVERAGE"

cmd="$cmd AREA:${evname}${color}:"$evname":STACK"

i=$(($i+1))

if [ $i -eq 20 ] ; then

i=1

done

cmd="$cmd |display /dev/input"

echo $cmd

eval $cmd

exit

The resulting command is very long:

rrdtool graph - -s end-148hours -e 1243252800 \
 -v "milliseconds waited" -l 0 -w 640 -h 240 -t "mydb WAIT PROFILE"\
 DEF:Streams_AQ__enqueue_blocked_on_low_memory=mydb/wait/Streams_AQ__enqueue_blocked_on_low_memory.rrd:mswaited:AVERAGE \
 AREA:Streams_AQ__enqueue_blocked_on_low_memory#000000:Streams_AQ__enqueue_blocked_on_low_memory:STACK\
 DEF:buffer_busy_waits=mydb/wait/buffer_busy_waits.rrd:mswaited:AVERAGE \
 AREA:buffer_busy_waits#000055:buffer_busy_waits:STACK\
 DEF:control_file_parallel_write=mydb/wait/control_file_parallel_write.rrd:mswaited:AVERAGE \
 AREA:control_file_parallel_write#0000aa:control_file_parallel_write:STACK\
 DEF:control_file_sequential_read=mydb/wait/control_file_sequential_read.rrd:mswaited:AVERAGE \
 AREA:control_file_sequential_read#0000ff:control_file_sequential_read:STACK\
 DEF:cursor__pin_S_wait_on_X=mydb/wait/cursor__pin_S_wait_on_X.rrd:mswaited:AVERAGE \
 AREA:cursor__pin_S_wait_on_X#550055:cursor__pin_S_wait_on_X:STACK\
 DEF:db_file_scattered_read=mydb/wait/db_file_scattered_read.rrd:mswaited:AVERAGE \
 AREA:db_file_scattered_read#aa00aa:db_file_scattered_read:STACK\
 DEF:db_file_sequential_read=mydb/wait/db_file_sequential_read.rrd:mswaited:AVERAGE \
 AREA:db_file_sequential_read#ff00ff:db_file_sequential_read:STACK\
 DEF:events_in_waitclass_Other=mydb/wait/events_in_waitclass_Other.rrd:mswaited:AVERAGE \
 AREA:events_in_waitclass_Other#550000:events_in_waitclass_Other:STACK\
 DEF:latch__cache_buffers_chains=mydb/wait/latch__cache_buffers_chains.rrd:mswaited:AVERAGE \
 AREA:latch__cache_buffers_chains#aa0000:latch__cache_buffers_chains:STACK\
 DEF:latch__library_cache=mydb/wait/latch__library_cache.rrd:mswaited:AVERAGE \
 AREA:latch__library_cache#ff0000:latch__library_cache:STACK\
 DEF:latch__library_cache_lock=mydb/wait/latch__library_cache_lock.rrd:mswaited:AVERAGE \
 AREA:latch__library_cache_lock#555500:latch__library_cache_lock:STACK\
 DEF:latch__redo_writing=mydb/wait/latch__redo_writing.rrd:mswaited:AVERAGE \
 AREA:latch__redo_writing#aaaa00:latch__redo_writing:STACK\
 DEF:latch__row_cache_objects=mydb/wait/latch__row_cache_objects.rrd:mswaited:AVERAGE \
 AREA:latch__row_cache_objects#ffff00:latch__row_cache_objects:STACK\
 DEF:latch__shared_pool=mydb/wait/latch__shared_pool.rrd:mswaited:AVERAGE \
 AREA:latch__shared_pool#005500:latch__shared_pool:STACK\
 DEF:library_cache_load_lock=mydb/wait/library_cache_load_lock.rrd:mswaited:AVERAGE \
 AREA:library_cache_load_lock#00aa00:library_cache_load_lock:STACK\
 DEF:library_cache_lock=mydb/wait/library_cache_lock.rrd:mswaited:AVERAGE \
 AREA:library_cache_lock#00ff00:library_cache_lock:STACK\
 DEF:log_buffer_space=mydb/wait/log_buffer_space.rrd:mswaited:AVERAGE \
 AREA:log_buffer_space#005555:log_buffer_space:STACK\
 DEF:log_file_parallel_write=mydb/wait/log_file_parallel_write.rrd:mswaited:AVERAGE \
 AREA:log_file_parallel_write#00aaaa:log_file_parallel_write:STACK\
 DEF:log_file_sequential_read=mydb/wait/log_file_sequential_read.rrd:mswaited:AVERAGE \
 AREA:log_file_sequential_read#00ffff:log_file_sequential_read:STACK\
 DEF:log_file_single_write=mydb/wait/log_file_single_write.rrd:mswaited:AVERAGE \
 AREA:log_file_single_write#000000:log_file_single_write:STACK\
 DEF:log_file_switch_completion=mydb/wait/log_file_switch_completion.rrd:mswaited:AVERAGE \
 AREA:log_file_switch_completion#000055:log_file_switch_completion:STACK\
 DEF:log_file_sync=mydb/wait/log_file_sync.rrd:mswaited:AVERAGE \
 AREA:log_file_sync#0000aa:log_file_sync:STACK\
 DEF:os_thread_startup=mydb/wait/os_thread_startup.rrd:mswaited:AVERAGE \
 AREA:os_thread_startup#0000ff:os_thread_startup:STACK |display /dev/input

rrdtool graph - -s end-148hours -e 1243252800 \

-v "milliseconds waited" -l 0 -w 640 -h 240 -t "mydb WAIT PROFILE"\

DEF:Streams_AQ__enqueue_blocked_on_low_memory=mydb/wait/Streams_AQ__enqueue_blocked_on_low_memory.rrd:mswaited:AVERAGE \

AREA:Streams_AQ__enqueue_blocked_on_low_memory#000000:Streams_AQ__enqueue_blocked_on_low_memory:STACK\

DEF:buffer_busy_waits=mydb/wait/buffer_busy_waits.rrd:mswaited:AVERAGE \

AREA:buffer_busy_waits#000055:buffer_busy_waits:STACK\

DEF:control_file_parallel_write=mydb/wait/control_file_parallel_write.rrd:mswaited:AVERAGE \

AREA:control_file_parallel_write#0000aa:control_file_parallel_write:STACK\

DEF:control_file_sequential_read=mydb/wait/control_file_sequential_read.rrd:mswaited:AVERAGE \

AREA:control_file_sequential_read#0000ff:control_file_sequential_read:STACK\

DEF:cursor__pin_S_wait_on_X=mydb/wait/cursor__pin_S_wait_on_X.rrd:mswaited:AVERAGE \

AREA:cursor__pin_S_wait_on_X#550055:cursor__pin_S_wait_on_X:STACK\

DEF:db_file_scattered_read=mydb/wait/db_file_scattered_read.rrd:mswaited:AVERAGE \

AREA:db_file_scattered_read#aa00aa:db_file_scattered_read:STACK\

DEF:db_file_sequential_read=mydb/wait/db_file_sequential_read.rrd:mswaited:AVERAGE \

AREA:db_file_sequential_read#ff00ff:db_file_sequential_read:STACK\

DEF:events_in_waitclass_Other=mydb/wait/events_in_waitclass_Other.rrd:mswaited:AVERAGE \

AREA:events_in_waitclass_Other#550000:events_in_waitclass_Other:STACK\

DEF:latch__cache_buffers_chains=mydb/wait/latch__cache_buffers_chains.rrd:mswaited:AVERAGE \

AREA:latch__cache_buffers_chains#aa0000:latch__cache_buffers_chains:STACK\

DEF:latch__library_cache=mydb/wait/latch__library_cache.rrd:mswaited:AVERAGE \

AREA:latch__library_cache#ff0000:latch__library_cache:STACK\

DEF:latch__library_cache_lock=mydb/wait/latch__library_cache_lock.rrd:mswaited:AVERAGE \

AREA:latch__library_cache_lock#555500:latch__library_cache_lock:STACK\

DEF:latch__redo_writing=mydb/wait/latch__redo_writing.rrd:mswaited:AVERAGE \

AREA:latch__redo_writing#aaaa00:latch__redo_writing:STACK\

DEF:latch__row_cache_objects=mydb/wait/latch__row_cache_objects.rrd:mswaited:AVERAGE \

AREA:latch__row_cache_objects#ffff00:latch__row_cache_objects:STACK\

DEF:latch__shared_pool=mydb/wait/latch__shared_pool.rrd:mswaited:AVERAGE \

AREA:latch__shared_pool#005500:latch__shared_pool:STACK\

DEF:library_cache_load_lock=mydb/wait/library_cache_load_lock.rrd:mswaited:AVERAGE \

AREA:library_cache_load_lock#00aa00:library_cache_load_lock:STACK\

DEF:library_cache_lock=mydb/wait/library_cache_lock.rrd:mswaited:AVERAGE \

AREA:library_cache_lock#00ff00:library_cache_lock:STACK\

DEF:log_buffer_space=mydb/wait/log_buffer_space.rrd:mswaited:AVERAGE \

AREA:log_buffer_space#005555:log_buffer_space:STACK\

DEF:log_file_parallel_write=mydb/wait/log_file_parallel_write.rrd:mswaited:AVERAGE \

AREA:log_file_parallel_write#00aaaa:log_file_parallel_write:STACK\

DEF:log_file_sequential_read=mydb/wait/log_file_sequential_read.rrd:mswaited:AVERAGE \

AREA:log_file_sequential_read#00ffff:log_file_sequential_read:STACK\

DEF:log_file_single_write=mydb/wait/log_file_single_write.rrd:mswaited:AVERAGE \

AREA:log_file_single_write#000000:log_file_single_write:STACK\

DEF:log_file_switch_completion=mydb/wait/log_file_switch_completion.rrd:mswaited:AVERAGE \

AREA:log_file_switch_completion#000055:log_file_switch_completion:STACK\

DEF:log_file_sync=mydb/wait/log_file_sync.rrd:mswaited:AVERAGE \

AREA:log_file_sync#0000aa:log_file_sync:STACK\

DEF:os_thread_startup=mydb/wait/os_thread_startup.rrd:mswaited:AVERAGE \

AREA:os_thread_startup#0000ff:os_thread_startup:STACK |display /dev/input

This is the resulting graph:

OHHHHHHHHHHHH COOOOL!!!
😉

Any comment is appreciated! thanks

How to collect Oracle Application Server performance data with DMS and RRDtool

Posted on March 2, 2009 by Ludovico

RRDize everything, chapter 1

If you are managing some Application Server deployments you should have wondered how to check and collect performance data.
As stated in documentation, you can gather performance metrics with the dmstool utility.
AFAIK, this can be done from 9.0.2 release upwards, but i’m concerned DMS will not work on Weblogic.

Mainly, you should have an external server that acts as collector (it could be a server in the Oracle AS farm as well): copy the dms.jar library from an Oracle AS installation to your collector and use it as you would use dmstool:

java -jar dms.jar [dmstool options]

1	java -jar dms.jar [dmstool options]

There are three basilar methods to get data:

Get all metrics at once:

java -jar dms.jar -dump -a "youraddress://..." [format=xml]

1	java -jar dms.jar -dump -a "youraddress://..." [format=xml]

Get only the interesting metrics:

java -jar dms.jar -a "youraddress://..." metric metric ...

1	java -jar dms.jar -a "youraddress://..." metric metric ...

Get metrics included into specific DMS tables:

java -jar dms.jar -a "youraddress://..." -table table table ...

1	java -jar dms.jar -a "youraddress://..." -table table table ...

What youraddress:// is, it depends on the component you are trying to connect:

opmn://asserver:6003
http://asserver:7200/dms0/Spy
ajp13://asserver:3301/dmsoc4j/Spy

opmn://asserver:6003

http://asserver:7200/dms0/Spy

ajp13://asserver:3301/dmsoc4j/Spy

If you are trying to connect to the OHS (Apache), be careful to allow remote access from the collector by editing the dms.conf file.

Now that you can query dms data, you should store it somewhere.
Personally, I did a first attempt with dmstool -dump format=xml. I wrote a parser in PHP with SimpleXML extension and I did a lot of inserts into a MySQL database. After a few months the whole data collected from tens of servers was too much to be mantained…
To avoid the maintenance of a DWH-grade database I investigated and found RRDTool. Now I’m asking how could I live without it!

I then wrote a parser in awk that parse the output of the dms.jar call and invoke an rrdtool update command.
I always use dms.jar -table command. The output has always the same format:

###SOF

Mon Mar 02 17:01:19 CET 2009

---------------
TABLE1_Name
---------------

record1_metric1.name:     value       units
record1_metric2.name:     value       units
....

record2_metric1.name:     value       units
record2_metric2.name:     value       units
....

---
TABLE2_Name
---

record1_metric1.name:     value       units
record1_metric2.name:     value       units
....

record2_metric1.name:     value       units
record2_metric2.name:     value       units
....

##EOF

###SOF

Mon Mar 02 17:01:19 CET 2009

---------------

TABLE1_Name

---------------

record1_metric1.name: value units

record1_metric2.name: value units

....

record2_metric1.name: value units

record2_metric2.name: value units

....

---

TABLE2_Name

---

record1_metric1.name: value units

record1_metric2.name: value units

....

record2_metric1.name: value units

record2_metric2.name: value units

....

##EOF

So I written an awk file that works for me.
use it this way:

 java -jar dms.jar ... | awk -f parse_output.awk

1	java -jar dms.jar ... \| awk -f parse_output.awk

####################
# parse_output.awk #
####################

#function pl() replaces all non alphanumeric occurrences with an underscore
function pl(input) {
        return gensub("[^[:alnum:]_-]","_","G",input);
}

# function get_rrd_path() returns a path where the rrd files should be placed
# I should rewrite a new path for each dms table... I'll skip many of them
function get_rrd_path() {
        if (table == "mod_oc4j_destination_metrics")
                return sprintf("%s/%s/%s/%s.rrd", record["Host"],
                    pl(table), pl(record["Name.value"]), pl(var) );
        if (table == "mod_oc4j_mount_pt_metrics")
                return sprintf("%s/%s/%s/%s/%s.rrd", record["Host"],
                    pl(table), pl(record["Destination.value"]), pl(record["Name.value"]), pl(var) );
        if (table == "ohs_server")
                return sprintf("%s/%s/%s.rrd", record["Host"], pl(table), pl(var) );
        if (table == "JVM")
                return sprintf("%s/%s/%s/%s.rrd", record["Host"],
                    pl(table), pl(record["Process"]), pl(var) );
        if (table == "opmn_process")
                return sprintf("%s/%s/%s/%s/%s/%s/%s/%s.rrd", record["Host"], pl(table),
                  pl(record["iasInstance.value"]), pl(record["opmn_ias_component"]),
                  pl(record["opmn_process_type"]),pl(record["opmn_process_set"]),
                  pl(record["Name"]), pl(var) );

        return sprintf("%s/%s/%s.rrd", record["Host"], pl(table), pl(var) );
}
# function process_record actually does the dirty work of invoking the update script
function process_record() {
        #every record has a timeStamp.ts metric that I should use to update my rrd
        ts=substr(record["timeStamp.ts"],0,10);
        for ( var in record ) {
        if ( var != "timeStamp.ts" &amp;&amp; record[var] ~ /^[[:digit:]]+$/ ) {
            if ( var ~ /\.(count|completed|time)$/ ) {
                dstype="DERIVE";
            } else {
                if ( var == "responseSize.value" ) {
                    dstype="DERIVE";
                } else {
                    dstype="GAUGE";
                }
            }
            rrdFile=sprintf("/path_to_data/%s",get_rrd_path());
            #### update_metric_rrd is a shell script listed below!!!!!
            cmd=sprintf("/path_to_scripts/update_metric_rrd %s %s %d %d",
                rrdFile,dstype,ts,record[var]);
            system(cmd);
            }
        }
}

# parse_record() populates an hash array
# with all metrics belonging to the table record
function parse_record() {
    #print "RRRR -  START OF RECORD (table " table ")"
    delete record
    while ( ! /^$/ ) {
        # I'm parsing the record as far I'm in this while statement
        # the array hash is the name of the dms metric basename.
        # $1 is the metric name but I have to trim the final ":"
        key=substr($1,0,length($1)-1)
        record[key]=$2
        getline
    }
    # this function is included in funcions.awk:
    # I invoke it to process the record I've just parsed
    process_record();
}
BEGIN {
    # as far as started is 0, I've never reached the first table
    started=0
}

#MAIN
{
    # I jump over the first lines until I reach the first table
    if (started==0) {
        while ( ! /^---/ ) {
           getline
        }
        started=1
    }

    # looking for the next occurrence of a table
    # all tables start with:
    # ----------
    # table_name
    # ----------
    if ( /^---/ ) {
        # first table reached: the next row is my table name,
        # then I reach again a dashed line -----
        getline table
        getline trash
        #print ""
        #print "##########################"
        print "  TABELLA " table
        #print "##########################"
        next
    }

    if ( ! /^$/ ) {
        # reached an empty line: could be the end of a record or the and of a table
        # since a new table is threated in previous "if" statement, I'm starting a new record.
        parse_record()
    }

}

END {
}

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

####################

# parse_output.awk #

####################

#function pl() replaces all non alphanumeric occurrences with an underscore

function pl(input) {

return gensub("[^[:alnum:]_-]","_","G",input);

}

# function get_rrd_path() returns a path where the rrd files should be placed

# I should rewrite a new path for each dms table... I'll skip many of them

function get_rrd_path() {

if (table == "mod_oc4j_destination_metrics")

return sprintf("%s/%s/%s/%s.rrd", record["Host"],

pl(table), pl(record["Name.value"]), pl(var) );

if (table == "mod_oc4j_mount_pt_metrics")

return sprintf("%s/%s/%s/%s/%s.rrd", record["Host"],

pl(table), pl(record["Destination.value"]), pl(record["Name.value"]), pl(var) );

if (table == "ohs_server")

return sprintf("%s/%s/%s.rrd", record["Host"], pl(table), pl(var) );

if (table == "JVM")

return sprintf("%s/%s/%s/%s.rrd", record["Host"],

pl(table), pl(record["Process"]), pl(var) );

if (table == "opmn_process")

return sprintf("%s/%s/%s/%s/%s/%s/%s/%s.rrd", record["Host"], pl(table),

pl(record["iasInstance.value"]), pl(record["opmn_ias_component"]),

pl(record["opmn_process_type"]),pl(record["opmn_process_set"]),

pl(record["Name"]), pl(var) );

return sprintf("%s/%s/%s.rrd", record["Host"], pl(table), pl(var) );

}

# function process_record actually does the dirty work of invoking the update script

function process_record() {

#every record has a timeStamp.ts metric that I should use to update my rrd

ts=substr(record["timeStamp.ts"],0,10);

for ( var in record ) {

if ( var != "timeStamp.ts" && record[var] ~ /^[[:digit:]]+$/ ) {

if ( var ~ /\.(count|completed|time)$/ ) {

dstype="DERIVE";

} else {

if ( var == "responseSize.value" ) {

dstype="DERIVE";

} else {

dstype="GAUGE";

}

rrdFile=sprintf("/path_to_data/%s",get_rrd_path());

#### update_metric_rrd is a shell script listed below!!!!!

cmd=sprintf("/path_to_scripts/update_metric_rrd %s %s %d %d",

rrdFile,dstype,ts,record[var]);

system(cmd);

}

# parse_record() populates an hash array

# with all metrics belonging to the table record

function parse_record() {

#print "RRRR - START OF RECORD (table " table ")"

delete record

while ( ! /^$/ ) {

# I'm parsing the record as far I'm in this while statement

# the array hash is the name of the dms metric basename.

# $1 is the metric name but I have to trim the final ":"

key=substr($1,0,length($1)-1)

record[key]=$2

getline

}

# this function is included in funcions.awk:

# I invoke it to process the record I've just parsed

process_record();

}

BEGIN {

# as far as started is 0, I've never reached the first table

started=0

}

#MAIN

{

# I jump over the first lines until I reach the first table

if (started==0) {

while ( ! /^---/ ) {

getline

}

started=1

}

# looking for the next occurrence of a table

# all tables start with:

# ----------

# table_name

# ----------

if ( /^---/ ) {

# first table reached: the next row is my table name,

# then I reach again a dashed line -----

getline table

getline trash

#print ""

#print "##########################"

print " TABELLA " table

#print "##########################"

}

if ( ! /^$/ ) {

# reached an empty line: could be the end of a record or the and of a table

# since a new table is threated in previous "if" statement, I'm starting a new record.

parse_record()

}

END {

}

And this is the code for update_metric_rrd:

#!/bin/bash
RRDFILE=$1
DSTYPE=$2
TS=$3
VALUE=$4

rrdtool update $RRDFILE ${TS}:${VALUE}

if [ $? -ne 0 ] ; then
        DIR=`dirname $RRDFILE`

        [ -d $DIR ] || mkdir -p $DIR
        [ -f $RRDFILE ] || rrdtool create $RRDFILE -b "now-1month" -s 1800 \
                DS:metric:${DSTYPE}:7200:0:U \
                RRA:AVERAGE:0.5:1:672 \
                RRA:AVERAGE:0.5:4:1080 \
                RRA:AVERAGE:0.5:12:1460 \
                RRA:AVERAGE:0.5:48:1095 \
                RRA:MAX:0.5:4:1080 \
                RRA:MAX:0.5:12:1460 \
                RRA:MAX:0.5:48:1095 \
                RRA:LAST:0.5:1:672
        rrdtool update $RRDFILE ${TS}:${VALUE}
fi

#!/bin/bash

RRDFILE=$1

DSTYPE=$2

TS=$3

VALUE=$4

rrdtool update $RRDFILE ${TS}:${VALUE}

if [ $? -ne 0 ] ; then

DIR=`dirname $RRDFILE`

[ -d $DIR ] || mkdir -p $DIR

[ -f $RRDFILE ] || rrdtool create $RRDFILE -b "now-1month" -s 1800 \

DS:metric:${DSTYPE}:7200:0:U \

RRA:AVERAGE:0.5:1:672 \

RRA:AVERAGE:0.5:4:1080 \

RRA:AVERAGE:0.5:12:1460 \

RRA:AVERAGE:0.5:48:1095 \

RRA:MAX:0.5:4:1080 \

RRA:MAX:0.5:12:1460 \

RRA:MAX:0.5:48:1095 \

RRA:LAST:0.5:1:672

rrdtool update $RRDFILE ${TS}:${VALUE}

Once you have all your rrd files populated, it’s easy to script automatic reporting. You would probably want a graph with the request count served by your Apache cluster, along with its linear regression:

rrdtool graph - -s "end-${hours}hours" -e $end \
                -v "Requests Completed/sec" \
        -w 640 -h 240 --slope-mode \
                -t "HTTP Requests for www.ludovicocaldara.net" \
                DEF:1request_completed=/data/wwwserver1/ohs_server/request_completed.rrd:metric:AVERAGE \
                DEF:2request_completed=/data/wwwserver2/ohs_server/request_completed.rrd:metric:AVERAGE \
                CDEF:request_completed=1request_completed,2request_completed,+ \
                VDEF:slope=request_completed,LSLSLOPE \
                VDEF:lslint=request_completed,LSLINT \
                CDEF:reg=request_completed,POP,slope,COUNT,*,lslint,+ \
                LINE1:reg#666666:"Regression" \
                AREA:1request_completed#4040AA:"wwwserver1"  \
                AREA:2request_completed#6666FF:"wwwserver1":STACK  \
        &gt; mygraph.png

rrdtool graph - -s "end-${hours}hours" -e $end \

-v "Requests Completed/sec" \

-w 640 -h 240 --slope-mode \

-t "HTTP Requests for www.ludovicocaldara.net" \

DEF:1request_completed=/data/wwwserver1/ohs_server/request_completed.rrd:metric:AVERAGE \

DEF:2request_completed=/data/wwwserver2/ohs_server/request_completed.rrd:metric:AVERAGE \

CDEF:request_completed=1request_completed,2request_completed,+ \

VDEF:slope=request_completed,LSLSLOPE \

VDEF:lslint=request_completed,LSLINT \

CDEF:reg=request_completed,POP,slope,COUNT,*,lslint,+ \

LINE1:reg#666666:"Regression" \

AREA:1request_completed#4040AA:"wwwserver1" \

AREA:2request_completed#6666FF:"wwwserver1":STACK \

> mygraph.png

This is the result:
OHS request completed
OHHHHHHHHHHHH!!!! COOL!!!!

That’s all for DMS capacity planning. Stay tuned, more about rrdtool is coming!

JBoss Portal and MySQL scalability: What The…???

Posted on December 10, 2008 by Ludovico

I found several queries running on a MySQL 5.0 database like this one:

SELECT PATH, NAME FROM JBP_OBJECT_NODE  WHERE PK IN (
SELECT NODE_KEY FROM JBP_OBJECT_NODE_SEC WHERE ROLE IN (
SELECT jr.jbp_name from jbp_users ju, jbp_role_membership jrm,
jbp_roles jr
where jrm.jbp_uid = ju.jbp_uid
and jr.jbp_rid = jrm.jbp_rid
and ju.jbp_uname = 'LUDOVICO'
and ju.jbp_enabled = 1));

SELECT PATH, NAME FROM JBP_OBJECT_NODE WHERE PK IN (

SELECT NODE_KEY FROM JBP_OBJECT_NODE_SEC WHERE ROLE IN (

SELECT jr.jbp_name from jbp_users ju, jbp_role_membership jrm,

jbp_roles jr

where jrm.jbp_uid = ju.jbp_uid

and jr.jbp_rid = jrm.jbp_rid

and ju.jbp_uname = 'LUDOVICO'

and ju.jbp_enabled = 1));

This query is related to JBoss Portal and does a full scan on table JBP_OBJECT_NODE.

It has bad performances (>0.8 sec) with just a few records:

mysql> select count(*) from JBP_OBJECT_NODE;
+———-+
| count(*) |
+———-+
| 33461 |
+———-+

If I rewrite the query using an inner join (à la Oracle, please forgive me) instead of a subquery I get an index scan:

SELECT distinct a.PATH, a.NAME /* , b.NODE_KEY */ from  JBP_OBJECT_NODE
a, JBP_OBJECT_NODE_SEC b
where a.pk=b.NODE_KEY
and b.ROLE IN (
SELECT jr.jbp_name from jbp_users ju, jbp_role_membership jrm,
jbp_roles jr
where jrm.jbp_uid = ju.jbp_uid
and jr.jbp_rid = jrm.jbp_rid
and ju.jbp_uname = 'UTDEMO'
and ju.jbp_enabled = 1);

SELECT distinct a.PATH, a.NAME /* , b.NODE_KEY */ from JBP_OBJECT_NODE

a, JBP_OBJECT_NODE_SEC b

where a.pk=b.NODE_KEY

and b.ROLE IN (

SELECT jr.jbp_name from jbp_users ju, jbp_role_membership jrm,

jbp_roles jr

where jrm.jbp_uid = ju.jbp_uid

and jr.jbp_rid = jrm.jbp_rid

and ju.jbp_uname = 'UTDEMO'

and ju.jbp_enabled = 1);

With 30k records the execution time falls down from 0.8 secs to 0.01 secs…
That’s NOT all! I found this open bug:

https://jira.jboss.org/jira/browse/JBPORTAL-2040

With many users registered in, the JBoss Portal Admin console tooks over a minute to show a single page…

I don’t like portals…