DBA survival BLOG

DBA stuff and Oracle Data Guard

Dataguard check script for Real Application Clusters (MAA)

Posted on December 31, 2010 by Ludovico

Two years after my posts:
Quick Oracle Dataguard check script and More about Dataguard and how to check it I faced a whole new Dataguard between two Oracle Real Application Clusters, aka Oracle Maximum Availability Architecture (MAA).

This enviromnent is relying on Windows OS. Don’t know how this could be called “availability” but here we are. I revisited my scripts in a quick and very dirty way. Please consider that I did copy and paste to check the alignment once per thread, but it should be improved with some kind of iteration to check each thread in a more structured fashion.

#!D:\oracle\product\10.2.0\db_1\perl\5.8.3\bin\MSWin32-x86-multi-thread\perl.exe -w
use DBI;
use DBD::Oracle qw(:ora_session_modes);
# DB connection #
my $prod  = "prod";
my $stby = "stby";
my $prodh;
unless ($prodh = DBI-&gt;connect('dbi:Oracle:'.$prod, 
    'sys', 'strongpwd', 
    {PrintError=&gt;0, AutoCommit =&gt; 0,
    ora_session_mode =&gt; ORA_SYSDBA}))  {
print "Error connecting to DB: $DBI::errstr\n";
exit(1);
}
$prodh-&gt;{RaiseError}=1;

my $stbyh;
unless ($stbyh = DBI-&gt;connect('dbi:Oracle:'.$stby,
    'sys', 'strongpwd',
    {PrintError=&gt;0, AutoCommit =&gt; 0,
    ora_session_mode =&gt; ORA_SYSDBA}))  {
print "Error connecting to DB: $DBI::errstr\n";
$prodh-&gt;disconnect;
exit(1);
}
$stbyh-&gt;{RaiseError}=1;

my $sth;
### query stdby MRP0
$sth = $stbyh-&gt;prepare( &lt;&lt;EOSQL );
select thread#, SEQUENCE#, BLOCK#
    from gv\$managed_standby 
    where process='MRP0'
EOSQL
$sth-&gt;execute();
my ($mrpthread, $mrpsequence, $mrpblock) = $sth-&gt;fetchrow_array();
$sth-&gt;finish();

### query stdby RFS
$sth = $stbyh-&gt;prepare( &lt;&lt;EOSQL );
select thread#, SEQUENCE#, BLOCK#
    from gv\$managed_standby 
    where process='RFS' and client_process='LGWR' order by thread#
EOSQL
$sth-&gt;execute();
my ($rfsthread1, $rfssequence1, $rfsblock1) = $sth-&gt;fetchrow_array();
my ($rfsthread2, $rfssequence2, $rfsblock2) = $sth-&gt;fetchrow_array();
$sth-&gt;finish();

### query prod
$sth = $prodh-&gt;prepare( &lt;&lt;EOSQL );
select thread#, SEQUENCE#, BLOCK#
    from gv\$managed_standby
    where process='LNS' order by thread#
EOSQL
$sth-&gt;execute();
my ($pthread1, $psequence1, $pblock1) = $sth-&gt;fetchrow_array();
my ($pthread2, $psequence2, $pblock2) = $sth-&gt;fetchrow_array();
$sth-&gt;finish();


printf ("ENVIRONM  Thread Sequence   Block\n");
printf ("--------- ------ ---------- ----------\n");
printf ("PROD     LNS1  1 %10d %10d\n", $psequence1, $pblock1);
printf ("STANDBY  RFS1  1 %10d %10d\n", $rfssequence1, $rfsblock1);
printf ("PROD     LSN2  2 %10d %10d\n", $psequence2, $pblock2);
printf ("STANDBY  RFS2  2 %10d %10d\n", $rfssequence2, $rfsblock2);
printf ("STANDBY  MRP0  %d %10d %10d\n", $mrpthread, $mrpsequence, $mrpblock);

my $psequence;
my $pblock;
if ( $mrpthread == 1 ) {
$psequence=$psequence1;
$pblock=$pblock1;
} else {
$psequence=$psequence2;
$pblock=$pblock2;
}

$sth = $stbyh-&gt;prepare( &lt;&lt;EOSQL );
select nvl(sum(blocks),0)
+ $pblock - $mrpblock as BLOCK_GAP
from gv\$archived_log
where thread#=$mrpthread and sequence#
between $mrpsequence and $psequence
EOSQL
$sth-&gt;execute();
my ($mrpblockgap) = $sth-&gt;fetchrow_array();
$sth-&gt;finish();

$sth = $stbyh-&gt;prepare( &lt;&lt;EOSQL );
select nvl(sum(blocks),0)
+ $pblock1 - $rfsblock1 as BLOCK_GAP
from gv\$archived_log
where thread#=1 and sequence#
between $rfssequence1 and $psequence1
EOSQL
$sth-&gt;execute();
my ($rfsblockgap1) = $sth-&gt;fetchrow_array();
$sth-&gt;finish();

$sth = $stbyh-&gt;prepare( &lt;&lt;EOSQL );
select nvl(sum(blocks),0)
+ $pblock2 - $rfsblock2 as BLOCK_GAP
from gv\$archived_log
where thread#=2 and sequence#
between $rfssequence2 and $psequence2
EOSQL
$sth-&gt;execute();
my ($rfsblockgap2) = $sth-&gt;fetchrow_array();
$sth-&gt;finish();
printf ("\n\n%-10d blocks gap in TRANSMISSION\n", $rfsblockgap1+$rfsblockgap2);
printf ("%-10d blocks gap in APPLY (MRP0)\n", $mrpblockgap);

$stbyh-&gt;disconnect;
$prodh-&gt;disconnect;

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

#!D:\oracle\product\10.2.0\db_1\perl\5.8.3\bin\MSWin32-x86-multi-thread\perl.exe -w

use DBI;

use DBD::Oracle qw(:ora_session_modes);

# DB connection #

my $prod = "prod";

my $stby = "stby";

my $prodh;

unless ($prodh = DBI->connect('dbi:Oracle:'.$prod,

'sys', 'strongpwd',

{PrintError=>0, AutoCommit => 0,

ora_session_mode => ORA_SYSDBA})) {

print "Error connecting to DB: $DBI::errstr\n";

exit(1);

}

$prodh->{RaiseError}=1;

my $stbyh;

unless ($stbyh = DBI->connect('dbi:Oracle:'.$stby,

'sys', 'strongpwd',

{PrintError=>0, AutoCommit => 0,

ora_session_mode => ORA_SYSDBA})) {

print "Error connecting to DB: $DBI::errstr\n";

$prodh->disconnect;

exit(1);

}

$stbyh->{RaiseError}=1;

my $sth;

### query stdby MRP0

$sth = $stbyh->prepare( <<EOSQL );

select thread#, SEQUENCE#, BLOCK#

from gv\$managed_standby

where process='MRP0'

EOSQL

$sth->execute();

my ($mrpthread, $mrpsequence, $mrpblock) = $sth->fetchrow_array();

$sth->finish();

### query stdby RFS

$sth = $stbyh->prepare( <<EOSQL );

select thread#, SEQUENCE#, BLOCK#

from gv\$managed_standby

where process='RFS' and client_process='LGWR' order by thread#

EOSQL

$sth->execute();

my ($rfsthread1, $rfssequence1, $rfsblock1) = $sth->fetchrow_array();

my ($rfsthread2, $rfssequence2, $rfsblock2) = $sth->fetchrow_array();

$sth->finish();

### query prod

$sth = $prodh->prepare( <<EOSQL );

select thread#, SEQUENCE#, BLOCK#

from gv\$managed_standby

where process='LNS' order by thread#

EOSQL

$sth->execute();

my ($pthread1, $psequence1, $pblock1) = $sth->fetchrow_array();

my ($pthread2, $psequence2, $pblock2) = $sth->fetchrow_array();

$sth->finish();

printf ("ENVIRONM Thread Sequence Block\n");

printf ("--------- ------ ---------- ----------\n");

printf ("PROD LNS1 1 %10d %10d\n", $psequence1, $pblock1);

printf ("STANDBY RFS1 1 %10d %10d\n", $rfssequence1, $rfsblock1);

printf ("PROD LSN2 2 %10d %10d\n", $psequence2, $pblock2);

printf ("STANDBY RFS2 2 %10d %10d\n", $rfssequence2, $rfsblock2);

printf ("STANDBY MRP0 %d %10d %10d\n", $mrpthread, $mrpsequence, $mrpblock);

my $psequence;

my $pblock;

if ( $mrpthread == 1 ) {

$psequence=$psequence1;

$pblock=$pblock1;

} else {

$psequence=$psequence2;

$pblock=$pblock2;

}

$sth = $stbyh->prepare( <<EOSQL );

select nvl(sum(blocks),0)

+ $pblock - $mrpblock as BLOCK_GAP

from gv\$archived_log

where thread#=$mrpthread and sequence#

between $mrpsequence and $psequence

EOSQL

$sth->execute();

my ($mrpblockgap) = $sth->fetchrow_array();

$sth->finish();

$sth = $stbyh->prepare( <<EOSQL );

select nvl(sum(blocks),0)

+ $pblock1 - $rfsblock1 as BLOCK_GAP

from gv\$archived_log

where thread#=1 and sequence#

between $rfssequence1 and $psequence1

EOSQL

$sth->execute();

my ($rfsblockgap1) = $sth->fetchrow_array();

$sth->finish();

$sth = $stbyh->prepare( <<EOSQL );

select nvl(sum(blocks),0)

+ $pblock2 - $rfsblock2 as BLOCK_GAP

from gv\$archived_log

where thread#=2 and sequence#

between $rfssequence2 and $psequence2

EOSQL

$sth->execute();

my ($rfsblockgap2) = $sth->fetchrow_array();

$sth->finish();

printf ("\n\n%-10d blocks gap in TRANSMISSION\n", $rfsblockgap1+$rfsblockgap2);

printf ("%-10d blocks gap in APPLY (MRP0)\n", $mrpblockgap);

$stbyh->disconnect;

$prodh->disconnect;

Please foreward me every improvement you implement over my code: it would be nice to post it here.

Oracle capacity planning with RRDTOOL

Posted on May 25, 2009 by Ludovico

RRDize everything, chapter 2

Oracle Database Server has the most powerful system catalog that allows to query almost any aspect inside an oracle instance.
You can query many v$ fixed views at regular intervals and populate many RRD files through rrdtool: space usage, wait events. system statistics and so on…

Since release 10.1 Oracle has introduced Automatic Workload Repository, a finer version of old good Statspack.
No matter if you are using AWR or statspack, you can rely on their views to collect data for your RRDs.

If you are administering a new instance and you haven’t collected its statistics so far, you can query (as example) the DBA_HIST_BG_EVENT_SUMMARY view to gather all AWR data about wait events. Historical views could be useful also to collect historical data once a week rather than query the fixed views every few minutes doing the hard work twice (you and AWR).

The whole process of gathering performance data and update rrd files can be resumed into the following steps:

– connect to the database
– query the AWR’s views
– build and execute an rrdtool update command
– check if rrd file exists or create it
– update the rrd file

The less rrdtool update commands you will execute, the better the whole process will perform.
Do it in a language you are comfortable with and that supports easily connection descriptors.

Since I’m very comfortable with php, I did it this way.

This is a very basilar script that works greatly for me with good performances:

#!/usr/bin/php -f
< ?php                                         

define('WD','/opt/oracle/awr');
$cs         = $_SERVER['argv'][1];
$user       = 'mymonitoruser';
$pass       = 'mystrongpassword'; 

/* open a new connection */
$ds = oci_connect($user, $pass, $cs)
        or die ("Cannot connect to Oracle Database ".$cs."\n");

/* setting client nls environment */
$sql = "alter session set nls_timestamp_format='MM/DD/YY HH24:MI'";
$stmt = oci_parse($ds, $sql);
oci_execute($stmt);
oci_free_statement($stmt);                                         

/* create directory that will contain rrds (if not exists) */
if(!file_exists(WD.'/'.$cs))
                mkdir(WD.'/'.$cs);
if(!file_exists(WD.'/'.$cs.'/wait'))
                mkdir(WD.'/'.$cs.'/wait');                   

/* function to create new RRDs */
function createRRD($name, $interval, $cs) {
        $hb = $interval*5; //heartbeat
        $cmd="rrdtool create ".WD."/".$cs."/wait/${name}.rrd -s ".$interval." \
                -b \"now -3month\" DS:waits:DERIVE:$hb:0:U \
                DS:mswaited:DERIVE:$hb:0:U \
                RRA:AVERAGE:0.5:1:1440 RRA:AVERAGE:0.5:30:336 \
                RRA:AVERAGE:0.5:120:372 RRA:AVERAGE:0.5:720:730 \
                RRA:MIN:0.5:1:1440 RRA:MIN:0.5:30:336 \
                RRA:MIN:0.5:120:372 RRA:MIN:0.5:720:730 \
                RRA:MAX:0.5:1:1440 RRA:MAX:0.5:30:336 \
                RRA:MAX:0.5:120:372 RRA:MAX:0.5:720:730 \
                RRA:LAST:0.5:1:1440";
        //print $cmd."\n";
        return passthru($cmd);
}                                                                              

/* take the snapshot frequency from dba_hist_wr_control
 to create the RDD with correct heartbeat value */
$sql = 'select extract(hour from snap_interval)*3600 +
extract(minute from snap_interval)*60 as SEED from DBA_HIST_WR_CONTROL';
$stmt = oci_parse($ds, $sql);
oci_execute($stmt);
$row = oci_fetch_assoc($stmt);
$interval = $row['SEED'];
unset($row);
oci_free_statement($stmt);                                              

/* statement definition that will collect
 all snapshots for a certain wait event with more than
 a certain amonut of time waited.
 Gathering ALL EVENTS could be time consuming and useless.
 I fetch rows ordered by event_name rather
 then by date because I can update many values
 into the same rrd with very few rrdupdate commands
*/
$sql = 'select s.END_INTERVAL_TIME END_INTERVAL_TIME,
    g.EVENT_NAME, g.WAIT_CLASS, g.TOTAL_WAITS,
    round(g.TIME_WAITED_MICRO/1000) MS
  from DBA_HIST_SNAPSHOT s,
   dba_hist_bg_event_summary g,
   v$instance i
 where s.SNAP_ID=g.SNAP_ID and g.wait_class!=\'Idle\'
  and g.TIME_WAITED_MICRO&gt;100000
  and s.instance_number=i.instance_number
  and s.instance_number=g.instance_number
 order by 2,1';                                      

/* default prefetch size (148) matches default snapshot retention (24hx7dd) */
$stmt = oci_parse($ds, $sql);
oci_set_prefetch($stmt, 148);
oci_execute($stmt);

$i=0;
$oldevent="";
while ($row = oci_fetch_assoc($stmt)) {
        if ($oldevent != $row['EVENT_NAME']) {
                //NEW EVENT DETECTED: WILL START A NEW UPDATE CMD
                if ($i != 0 &amp;&amp; !empty($cmd)) {
                        /* not the first occurrence,
                         I bet there's something in my buffer */
                        passthru($cmd);
                }
                $cleanName = preg_replace ("([^[:alnum:]_-])","_",$row['EVENT_NAME']);
                // if there is no rrd for this event, I create a new one
                if (!file_exists(WD."/".$cs."/wait/${cleanName}.rrd")) {
                        createRRD($cleanName, $interval, $cs);
                }
                /*
                * I initialize a new update command. This string act as a buffer: I append many
                * values to be updated so I'll update many values in a single command line:
                * less forks of rrdtool and less file opens: the whole update process has an
                * enormous improvement.
                */
                $precmd="rrdtool update ".WD."/".$cs."/wait/${cleanName}.rrd ";
                $lastcmd="rrdtool info ".WD."/".$cs."/wait/${cleanName}.rrd".
                        "| grep last_update | awk '{print \$NF}'";
                $last=trim(`$lastcmd`);
                printf ("%s - %s - last: %d\n", $row['EVENT_NAME'], $cleanName, $last);
                $i=0;
                $cmd=$precmd;
                $oldevent=$row['EVENT_NAME'];
        }
        $time=strtotime($row['END_INTERVAL_TIME']);
        //print "time: ".$time."  last: ".$last."\n";
        if ( $time &gt; $last ) {
                $cmd.=" ".$time.":".$row['TOTAL_WAITS'].":".$row['MS'];
                $i++;
        }
        if ($i &gt;= 40) {
                // when I reach 40 values per commandline I force
                // the update: next loop will reinitialize a new commandline.
                passthru($cmd);
                $cmd=$precmd;
                $i=0;
        }
        unset($row);

}
if ($i != 0) {
        /* one more update pending in my buffer */
        passthru($cmd);
}
oci_free_statement($stmt);
oci_close($ds);
?>

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

#!/usr/bin/php -f

< ?php

define('WD','/opt/oracle/awr');

$cs = $_SERVER['argv'][1];

$user = 'mymonitoruser';

$pass = 'mystrongpassword';

/* open a new connection */

$ds = oci_connect($user, $pass, $cs)

or die ("Cannot connect to Oracle Database ".$cs."\n");

/* setting client nls environment */

$sql = "alter session set nls_timestamp_format='MM/DD/YY HH24:MI'";

$stmt = oci_parse($ds, $sql);

oci_execute($stmt);

oci_free_statement($stmt);

/* create directory that will contain rrds (if not exists) */

if(!file_exists(WD.'/'.$cs))

mkdir(WD.'/'.$cs);

if(!file_exists(WD.'/'.$cs.'/wait'))

mkdir(WD.'/'.$cs.'/wait');

/* function to create new RRDs */

function createRRD($name, $interval, $cs) {

$hb = $interval*5; //heartbeat

$cmd="rrdtool create ".WD."/".$cs."/wait/${name}.rrd -s ".$interval." \

-b \"now -3month\" DS:waits:DERIVE:$hb:0:U \

DS:mswaited:DERIVE:$hb:0:U \

RRA:AVERAGE:0.5:1:1440 RRA:AVERAGE:0.5:30:336 \

RRA:AVERAGE:0.5:120:372 RRA:AVERAGE:0.5:720:730 \

RRA:MIN:0.5:1:1440 RRA:MIN:0.5:30:336 \

RRA:MIN:0.5:120:372 RRA:MIN:0.5:720:730 \

RRA:MAX:0.5:1:1440 RRA:MAX:0.5:30:336 \

RRA:MAX:0.5:120:372 RRA:MAX:0.5:720:730 \

RRA:LAST:0.5:1:1440";

//print $cmd."\n";

return passthru($cmd);

}

/* take the snapshot frequency from dba_hist_wr_control

to create the RDD with correct heartbeat value */

$sql = 'select extract(hour from snap_interval)*3600 +

extract(minute from snap_interval)*60 as SEED from DBA_HIST_WR_CONTROL';

$stmt = oci_parse($ds, $sql);

oci_execute($stmt);

$row = oci_fetch_assoc($stmt);

$interval = $row['SEED'];

unset($row);

oci_free_statement($stmt);

/* statement definition that will collect

all snapshots for a certain wait event with more than

a certain amonut of time waited.

Gathering ALL EVENTS could be time consuming and useless.

I fetch rows ordered by event_name rather

then by date because I can update many values

into the same rrd with very few rrdupdate commands

$sql = 'select s.END_INTERVAL_TIME END_INTERVAL_TIME,

g.EVENT_NAME, g.WAIT_CLASS, g.TOTAL_WAITS,

round(g.TIME_WAITED_MICRO/1000) MS

from DBA_HIST_SNAPSHOT s,

dba_hist_bg_event_summary g,

v$instance i

where s.SNAP_ID=g.SNAP_ID and g.wait_class!=\'Idle\'

and g.TIME_WAITED_MICRO>100000

and s.instance_number=i.instance_number

and s.instance_number=g.instance_number

order by 2,1';

/* default prefetch size (148) matches default snapshot retention (24hx7dd) */

$stmt = oci_parse($ds, $sql);

oci_set_prefetch($stmt, 148);

oci_execute($stmt);

$i=0;

$oldevent="";

while ($row = oci_fetch_assoc($stmt)) {

if ($oldevent != $row['EVENT_NAME']) {

//NEW EVENT DETECTED: WILL START A NEW UPDATE CMD

if ($i != 0 && !empty($cmd)) {

/* not the first occurrence,

I bet there's something in my buffer */

passthru($cmd);

}

$cleanName = preg_replace ("([^[:alnum:]_-])","_",$row['EVENT_NAME']);

// if there is no rrd for this event, I create a new one

if (!file_exists(WD."/".$cs."/wait/${cleanName}.rrd")) {

createRRD($cleanName, $interval, $cs);

}

* I initialize a new update command. This string act as a buffer: I append many

* values to be updated so I'll update many values in a single command line:

* less forks of rrdtool and less file opens: the whole update process has an

* enormous improvement.

$precmd="rrdtool update ".WD."/".$cs."/wait/${cleanName}.rrd ";

$lastcmd="rrdtool info ".WD."/".$cs."/wait/${cleanName}.rrd".

"| grep last_update | awk '{print \$NF}'";

$last=trim(`$lastcmd`);

printf ("%s - %s - last: %d\n", $row['EVENT_NAME'], $cleanName, $last);

$i=0;

$cmd=$precmd;

$oldevent=$row['EVENT_NAME'];

}

$time=strtotime($row['END_INTERVAL_TIME']);

//print "time: ".$time." last: ".$last."\n";

if ( $time > $last ) {

$cmd.=" ".$time.":".$row['TOTAL_WAITS'].":".$row['MS'];

$i++;

}

if ($i >= 40) {

// when I reach 40 values per commandline I force

// the update: next loop will reinitialize a new commandline.

passthru($cmd);

$cmd=$precmd;

$i=0;

}

unset($row);

}

if ($i != 0) {

/* one more update pending in my buffer */

passthru($cmd);

}

oci_free_statement($stmt);

oci_close($ds);

Depending on how many different wait events you have, you’ll have a certain number of rrd files:

# ls -l
total 3864
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 Streams_AQ__enqueue_blocked_on_low_memory.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 20 08:18 buffer_busy_waits.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 control_file_parallel_write.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 control_file_sequential_read.rrd
-rw-r--r-- 1 ludovico ludovico 165304 Apr 30 10:12 cursor__pin_S_wait_on_X.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 db_file_scattered_read.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 db_file_sequential_read.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 events_in_waitclass_Other.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 latch__cache_buffers_chains.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 latch__library_cache.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 11 13:22 latch__library_cache_lock.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 20 08:18 latch__redo_writing.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 latch__row_cache_objects.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 latch__shared_pool.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 library_cache_load_lock.rrd
-rw-r--r-- 1 ludovico ludovico 165304 Apr 15 13:17 library_cache_lock.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 log_buffer_space.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 log_file_parallel_write.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 log_file_sequential_read.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 log_file_single_write.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 log_file_switch_completion.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 11 13:22 log_file_sync.rrd
-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 os_thread_startup.rrd

# ls -l

total 3864

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 Streams_AQ__enqueue_blocked_on_low_memory.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 20 08:18 buffer_busy_waits.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 control_file_parallel_write.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 control_file_sequential_read.rrd

-rw-r--r-- 1 ludovico ludovico 165304 Apr 30 10:12 cursor__pin_S_wait_on_X.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 db_file_scattered_read.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 db_file_sequential_read.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 events_in_waitclass_Other.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 latch__cache_buffers_chains.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 latch__library_cache.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 11 13:22 latch__library_cache_lock.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 20 08:18 latch__redo_writing.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 latch__row_cache_objects.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 latch__shared_pool.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 library_cache_load_lock.rrd

-rw-r--r-- 1 ludovico ludovico 165304 Apr 15 13:17 library_cache_lock.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 log_buffer_space.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 log_file_parallel_write.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 log_file_sequential_read.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 log_file_single_write.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 log_file_switch_completion.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 11 13:22 log_file_sync.rrd

-rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 os_thread_startup.rrd

As you can see, they are not so big…

Once you have your data in rrd files, it’s quite simple to script even complex plots with several datasources. Everything depends on the results you want.
This script stack all my wait events for a certain instance: it takes the directory containing all the rrds as first argument and the number of hours we want to be plotted as second argument:

cs=$1
hours=${2:-148}

eventlist=`ls $cs/wait/*rrd`

colors[1]="#000000"
colors[2]="#000055"
colors[3]="#0000aa"
colors[4]="#0000ff"
colors[5]="#550055"
colors[6]="#aa00aa"
colors[7]="#ff00ff"
colors[8]="#550000"
colors[9]="#aa0000"
colors[10]="#ff0000"
colors[11]="#555500"
colors[12]="#aaaa00"
colors[13]="#ffff00"
colors[14]="#005500"
colors[15]="#00aa00"
colors[16]="#00ff00"
colors[17]="#005555"
colors[18]="#00aaaa"
colors[19]="#00ffff"
colors[20]="#555555"
colors[21]="#aaaaaa"

i=0

for event in $eventlist ; do
        if [ $i -eq 0 ] ; then
                end=`rrdtool info $event | grep last_update | awk '{print $NF}'`
                end=`rrdtool info $cs/wait/control_file_parallel_write.rrd | grep last_update | awk '{print $NF}'`
                cmd="rrdtool graph - -s end-${hours}hours -e $end  -v \"milliseconds waited\" -l 0 -w 640 -h 240 -t \"$cs WAIT PROFILE\""
                i=$(($i+1))
        fi
        color=${colors[$i]}
        echo $color
        evname=`basename $event | sed -e s/\.rrd\$//`
        cmd="$cmd  DEF:$evname=$event:mswaited:AVERAGE"
        cmd="$cmd  AREA:${evname}${color}:"$evname":STACK"
        i=$(($i+1))
        if [ $i -eq 20 ] ; then
                i=1
        fi
done
        cmd="$cmd  |display /dev/input"
        echo $cmd
        eval $cmd
exit

cs=$1

hours=${2:-148}

eventlist=`ls $cs/wait/*rrd`

colors[1]="#000000"

colors[2]="#000055"

colors[3]="#0000aa"

colors[4]="#0000ff"

colors[5]="#550055"

colors[6]="#aa00aa"

colors[7]="#ff00ff"

colors[8]="#550000"

colors[9]="#aa0000"

colors[10]="#ff0000"

colors[11]="#555500"

colors[12]="#aaaa00"

colors[13]="#ffff00"

colors[14]="#005500"

colors[15]="#00aa00"

colors[16]="#00ff00"

colors[17]="#005555"

colors[18]="#00aaaa"

colors[19]="#00ffff"

colors[20]="#555555"

colors[21]="#aaaaaa"

i=0

for event in $eventlist ; do

if [ $i -eq 0 ] ; then

end=`rrdtool info $event | grep last_update | awk '{print $NF}'`

end=`rrdtool info $cs/wait/control_file_parallel_write.rrd | grep last_update | awk '{print $NF}'`

cmd="rrdtool graph - -s end-${hours}hours -e $end -v \"milliseconds waited\" -l 0 -w 640 -h 240 -t \"$cs WAIT PROFILE\""

i=$(($i+1))

color=${colors[$i]}

echo $color

evname=`basename $event | sed -e s/\.rrd\$//`

cmd="$cmd DEF:$evname=$event:mswaited:AVERAGE"

cmd="$cmd AREA:${evname}${color}:"$evname":STACK"

i=$(($i+1))

if [ $i -eq 20 ] ; then

i=1

done

cmd="$cmd |display /dev/input"

echo $cmd

eval $cmd

exit

The resulting command is very long:

rrdtool graph - -s end-148hours -e 1243252800 \
 -v "milliseconds waited" -l 0 -w 640 -h 240 -t "mydb WAIT PROFILE"\
 DEF:Streams_AQ__enqueue_blocked_on_low_memory=mydb/wait/Streams_AQ__enqueue_blocked_on_low_memory.rrd:mswaited:AVERAGE \
 AREA:Streams_AQ__enqueue_blocked_on_low_memory#000000:Streams_AQ__enqueue_blocked_on_low_memory:STACK\
 DEF:buffer_busy_waits=mydb/wait/buffer_busy_waits.rrd:mswaited:AVERAGE \
 AREA:buffer_busy_waits#000055:buffer_busy_waits:STACK\
 DEF:control_file_parallel_write=mydb/wait/control_file_parallel_write.rrd:mswaited:AVERAGE \
 AREA:control_file_parallel_write#0000aa:control_file_parallel_write:STACK\
 DEF:control_file_sequential_read=mydb/wait/control_file_sequential_read.rrd:mswaited:AVERAGE \
 AREA:control_file_sequential_read#0000ff:control_file_sequential_read:STACK\
 DEF:cursor__pin_S_wait_on_X=mydb/wait/cursor__pin_S_wait_on_X.rrd:mswaited:AVERAGE \
 AREA:cursor__pin_S_wait_on_X#550055:cursor__pin_S_wait_on_X:STACK\
 DEF:db_file_scattered_read=mydb/wait/db_file_scattered_read.rrd:mswaited:AVERAGE \
 AREA:db_file_scattered_read#aa00aa:db_file_scattered_read:STACK\
 DEF:db_file_sequential_read=mydb/wait/db_file_sequential_read.rrd:mswaited:AVERAGE \
 AREA:db_file_sequential_read#ff00ff:db_file_sequential_read:STACK\
 DEF:events_in_waitclass_Other=mydb/wait/events_in_waitclass_Other.rrd:mswaited:AVERAGE \
 AREA:events_in_waitclass_Other#550000:events_in_waitclass_Other:STACK\
 DEF:latch__cache_buffers_chains=mydb/wait/latch__cache_buffers_chains.rrd:mswaited:AVERAGE \
 AREA:latch__cache_buffers_chains#aa0000:latch__cache_buffers_chains:STACK\
 DEF:latch__library_cache=mydb/wait/latch__library_cache.rrd:mswaited:AVERAGE \
 AREA:latch__library_cache#ff0000:latch__library_cache:STACK\
 DEF:latch__library_cache_lock=mydb/wait/latch__library_cache_lock.rrd:mswaited:AVERAGE \
 AREA:latch__library_cache_lock#555500:latch__library_cache_lock:STACK\
 DEF:latch__redo_writing=mydb/wait/latch__redo_writing.rrd:mswaited:AVERAGE \
 AREA:latch__redo_writing#aaaa00:latch__redo_writing:STACK\
 DEF:latch__row_cache_objects=mydb/wait/latch__row_cache_objects.rrd:mswaited:AVERAGE \
 AREA:latch__row_cache_objects#ffff00:latch__row_cache_objects:STACK\
 DEF:latch__shared_pool=mydb/wait/latch__shared_pool.rrd:mswaited:AVERAGE \
 AREA:latch__shared_pool#005500:latch__shared_pool:STACK\
 DEF:library_cache_load_lock=mydb/wait/library_cache_load_lock.rrd:mswaited:AVERAGE \
 AREA:library_cache_load_lock#00aa00:library_cache_load_lock:STACK\
 DEF:library_cache_lock=mydb/wait/library_cache_lock.rrd:mswaited:AVERAGE \
 AREA:library_cache_lock#00ff00:library_cache_lock:STACK\
 DEF:log_buffer_space=mydb/wait/log_buffer_space.rrd:mswaited:AVERAGE \
 AREA:log_buffer_space#005555:log_buffer_space:STACK\
 DEF:log_file_parallel_write=mydb/wait/log_file_parallel_write.rrd:mswaited:AVERAGE \
 AREA:log_file_parallel_write#00aaaa:log_file_parallel_write:STACK\
 DEF:log_file_sequential_read=mydb/wait/log_file_sequential_read.rrd:mswaited:AVERAGE \
 AREA:log_file_sequential_read#00ffff:log_file_sequential_read:STACK\
 DEF:log_file_single_write=mydb/wait/log_file_single_write.rrd:mswaited:AVERAGE \
 AREA:log_file_single_write#000000:log_file_single_write:STACK\
 DEF:log_file_switch_completion=mydb/wait/log_file_switch_completion.rrd:mswaited:AVERAGE \
 AREA:log_file_switch_completion#000055:log_file_switch_completion:STACK\
 DEF:log_file_sync=mydb/wait/log_file_sync.rrd:mswaited:AVERAGE \
 AREA:log_file_sync#0000aa:log_file_sync:STACK\
 DEF:os_thread_startup=mydb/wait/os_thread_startup.rrd:mswaited:AVERAGE \
 AREA:os_thread_startup#0000ff:os_thread_startup:STACK |display /dev/input

rrdtool graph - -s end-148hours -e 1243252800 \

-v "milliseconds waited" -l 0 -w 640 -h 240 -t "mydb WAIT PROFILE"\

DEF:Streams_AQ__enqueue_blocked_on_low_memory=mydb/wait/Streams_AQ__enqueue_blocked_on_low_memory.rrd:mswaited:AVERAGE \

AREA:Streams_AQ__enqueue_blocked_on_low_memory#000000:Streams_AQ__enqueue_blocked_on_low_memory:STACK\

DEF:buffer_busy_waits=mydb/wait/buffer_busy_waits.rrd:mswaited:AVERAGE \

AREA:buffer_busy_waits#000055:buffer_busy_waits:STACK\

DEF:control_file_parallel_write=mydb/wait/control_file_parallel_write.rrd:mswaited:AVERAGE \

AREA:control_file_parallel_write#0000aa:control_file_parallel_write:STACK\

DEF:control_file_sequential_read=mydb/wait/control_file_sequential_read.rrd:mswaited:AVERAGE \

AREA:control_file_sequential_read#0000ff:control_file_sequential_read:STACK\

DEF:cursor__pin_S_wait_on_X=mydb/wait/cursor__pin_S_wait_on_X.rrd:mswaited:AVERAGE \

AREA:cursor__pin_S_wait_on_X#550055:cursor__pin_S_wait_on_X:STACK\

DEF:db_file_scattered_read=mydb/wait/db_file_scattered_read.rrd:mswaited:AVERAGE \

AREA:db_file_scattered_read#aa00aa:db_file_scattered_read:STACK\

DEF:db_file_sequential_read=mydb/wait/db_file_sequential_read.rrd:mswaited:AVERAGE \

AREA:db_file_sequential_read#ff00ff:db_file_sequential_read:STACK\

DEF:events_in_waitclass_Other=mydb/wait/events_in_waitclass_Other.rrd:mswaited:AVERAGE \

AREA:events_in_waitclass_Other#550000:events_in_waitclass_Other:STACK\

DEF:latch__cache_buffers_chains=mydb/wait/latch__cache_buffers_chains.rrd:mswaited:AVERAGE \

AREA:latch__cache_buffers_chains#aa0000:latch__cache_buffers_chains:STACK\

DEF:latch__library_cache=mydb/wait/latch__library_cache.rrd:mswaited:AVERAGE \

AREA:latch__library_cache#ff0000:latch__library_cache:STACK\

DEF:latch__library_cache_lock=mydb/wait/latch__library_cache_lock.rrd:mswaited:AVERAGE \

AREA:latch__library_cache_lock#555500:latch__library_cache_lock:STACK\

DEF:latch__redo_writing=mydb/wait/latch__redo_writing.rrd:mswaited:AVERAGE \

AREA:latch__redo_writing#aaaa00:latch__redo_writing:STACK\

DEF:latch__row_cache_objects=mydb/wait/latch__row_cache_objects.rrd:mswaited:AVERAGE \

AREA:latch__row_cache_objects#ffff00:latch__row_cache_objects:STACK\

DEF:latch__shared_pool=mydb/wait/latch__shared_pool.rrd:mswaited:AVERAGE \

AREA:latch__shared_pool#005500:latch__shared_pool:STACK\

DEF:library_cache_load_lock=mydb/wait/library_cache_load_lock.rrd:mswaited:AVERAGE \

AREA:library_cache_load_lock#00aa00:library_cache_load_lock:STACK\

DEF:library_cache_lock=mydb/wait/library_cache_lock.rrd:mswaited:AVERAGE \

AREA:library_cache_lock#00ff00:library_cache_lock:STACK\

DEF:log_buffer_space=mydb/wait/log_buffer_space.rrd:mswaited:AVERAGE \

AREA:log_buffer_space#005555:log_buffer_space:STACK\

DEF:log_file_parallel_write=mydb/wait/log_file_parallel_write.rrd:mswaited:AVERAGE \

AREA:log_file_parallel_write#00aaaa:log_file_parallel_write:STACK\

DEF:log_file_sequential_read=mydb/wait/log_file_sequential_read.rrd:mswaited:AVERAGE \

AREA:log_file_sequential_read#00ffff:log_file_sequential_read:STACK\

DEF:log_file_single_write=mydb/wait/log_file_single_write.rrd:mswaited:AVERAGE \

AREA:log_file_single_write#000000:log_file_single_write:STACK\

DEF:log_file_switch_completion=mydb/wait/log_file_switch_completion.rrd:mswaited:AVERAGE \

AREA:log_file_switch_completion#000055:log_file_switch_completion:STACK\

DEF:log_file_sync=mydb/wait/log_file_sync.rrd:mswaited:AVERAGE \

AREA:log_file_sync#0000aa:log_file_sync:STACK\

DEF:os_thread_startup=mydb/wait/os_thread_startup.rrd:mswaited:AVERAGE \

AREA:os_thread_startup#0000ff:os_thread_startup:STACK |display /dev/input

This is the resulting graph:

OHHHHHHHHHHHH COOOOL!!!
😉

Any comment is appreciated! thanks

How to collect Oracle Application Server performance data with DMS and RRDtool

Posted on March 2, 2009 by Ludovico

RRDize everything, chapter 1

If you are managing some Application Server deployments you should have wondered how to check and collect performance data.
As stated in documentation, you can gather performance metrics with the dmstool utility.
AFAIK, this can be done from 9.0.2 release upwards, but i’m concerned DMS will not work on Weblogic.

Mainly, you should have an external server that acts as collector (it could be a server in the Oracle AS farm as well): copy the dms.jar library from an Oracle AS installation to your collector and use it as you would use dmstool:

java -jar dms.jar [dmstool options]

1	java -jar dms.jar [dmstool options]

There are three basilar methods to get data:

Get all metrics at once:

java -jar dms.jar -dump -a "youraddress://..." [format=xml]

1	java -jar dms.jar -dump -a "youraddress://..." [format=xml]

Get only the interesting metrics:

java -jar dms.jar -a "youraddress://..." metric metric ...

1	java -jar dms.jar -a "youraddress://..." metric metric ...

Get metrics included into specific DMS tables:

java -jar dms.jar -a "youraddress://..." -table table table ...

1	java -jar dms.jar -a "youraddress://..." -table table table ...

What youraddress:// is, it depends on the component you are trying to connect:

opmn://asserver:6003
http://asserver:7200/dms0/Spy
ajp13://asserver:3301/dmsoc4j/Spy

opmn://asserver:6003

http://asserver:7200/dms0/Spy

ajp13://asserver:3301/dmsoc4j/Spy

If you are trying to connect to the OHS (Apache), be careful to allow remote access from the collector by editing the dms.conf file.

Now that you can query dms data, you should store it somewhere.
Personally, I did a first attempt with dmstool -dump format=xml. I wrote a parser in PHP with SimpleXML extension and I did a lot of inserts into a MySQL database. After a few months the whole data collected from tens of servers was too much to be mantained…
To avoid the maintenance of a DWH-grade database I investigated and found RRDTool. Now I’m asking how could I live without it!

I then wrote a parser in awk that parse the output of the dms.jar call and invoke an rrdtool update command.
I always use dms.jar -table command. The output has always the same format:

###SOF

Mon Mar 02 17:01:19 CET 2009

---------------
TABLE1_Name
---------------

record1_metric1.name:     value       units
record1_metric2.name:     value       units
....

record2_metric1.name:     value       units
record2_metric2.name:     value       units
....

---
TABLE2_Name
---

record1_metric1.name:     value       units
record1_metric2.name:     value       units
....

record2_metric1.name:     value       units
record2_metric2.name:     value       units
....

##EOF

###SOF

Mon Mar 02 17:01:19 CET 2009

---------------

TABLE1_Name

---------------

record1_metric1.name: value units

record1_metric2.name: value units

....

record2_metric1.name: value units

record2_metric2.name: value units

....

---

TABLE2_Name

---

record1_metric1.name: value units

record1_metric2.name: value units

....

record2_metric1.name: value units

record2_metric2.name: value units

....

##EOF

So I written an awk file that works for me.
use it this way:

 java -jar dms.jar ... | awk -f parse_output.awk

1	java -jar dms.jar ... \| awk -f parse_output.awk

####################
# parse_output.awk #
####################

#function pl() replaces all non alphanumeric occurrences with an underscore
function pl(input) {
        return gensub("[^[:alnum:]_-]","_","G",input);
}

# function get_rrd_path() returns a path where the rrd files should be placed
# I should rewrite a new path for each dms table... I'll skip many of them
function get_rrd_path() {
        if (table == "mod_oc4j_destination_metrics")
                return sprintf("%s/%s/%s/%s.rrd", record["Host"],
                    pl(table), pl(record["Name.value"]), pl(var) );
        if (table == "mod_oc4j_mount_pt_metrics")
                return sprintf("%s/%s/%s/%s/%s.rrd", record["Host"],
                    pl(table), pl(record["Destination.value"]), pl(record["Name.value"]), pl(var) );
        if (table == "ohs_server")
                return sprintf("%s/%s/%s.rrd", record["Host"], pl(table), pl(var) );
        if (table == "JVM")
                return sprintf("%s/%s/%s/%s.rrd", record["Host"],
                    pl(table), pl(record["Process"]), pl(var) );
        if (table == "opmn_process")
                return sprintf("%s/%s/%s/%s/%s/%s/%s/%s.rrd", record["Host"], pl(table),
                  pl(record["iasInstance.value"]), pl(record["opmn_ias_component"]),
                  pl(record["opmn_process_type"]),pl(record["opmn_process_set"]),
                  pl(record["Name"]), pl(var) );

        return sprintf("%s/%s/%s.rrd", record["Host"], pl(table), pl(var) );
}
# function process_record actually does the dirty work of invoking the update script
function process_record() {
        #every record has a timeStamp.ts metric that I should use to update my rrd
        ts=substr(record["timeStamp.ts"],0,10);
        for ( var in record ) {
        if ( var != "timeStamp.ts" &amp;&amp; record[var] ~ /^[[:digit:]]+$/ ) {
            if ( var ~ /\.(count|completed|time)$/ ) {
                dstype="DERIVE";
            } else {
                if ( var == "responseSize.value" ) {
                    dstype="DERIVE";
                } else {
                    dstype="GAUGE";
                }
            }
            rrdFile=sprintf("/path_to_data/%s",get_rrd_path());
            #### update_metric_rrd is a shell script listed below!!!!!
            cmd=sprintf("/path_to_scripts/update_metric_rrd %s %s %d %d",
                rrdFile,dstype,ts,record[var]);
            system(cmd);
            }
        }
}

# parse_record() populates an hash array
# with all metrics belonging to the table record
function parse_record() {
    #print "RRRR -  START OF RECORD (table " table ")"
    delete record
    while ( ! /^$/ ) {
        # I'm parsing the record as far I'm in this while statement
        # the array hash is the name of the dms metric basename.
        # $1 is the metric name but I have to trim the final ":"
        key=substr($1,0,length($1)-1)
        record[key]=$2
        getline
    }
    # this function is included in funcions.awk:
    # I invoke it to process the record I've just parsed
    process_record();
}
BEGIN {
    # as far as started is 0, I've never reached the first table
    started=0
}

#MAIN
{
    # I jump over the first lines until I reach the first table
    if (started==0) {
        while ( ! /^---/ ) {
           getline
        }
        started=1
    }

    # looking for the next occurrence of a table
    # all tables start with:
    # ----------
    # table_name
    # ----------
    if ( /^---/ ) {
        # first table reached: the next row is my table name,
        # then I reach again a dashed line -----
        getline table
        getline trash
        #print ""
        #print "##########################"
        print "  TABELLA " table
        #print "##########################"
        next
    }

    if ( ! /^$/ ) {
        # reached an empty line: could be the end of a record or the and of a table
        # since a new table is threated in previous "if" statement, I'm starting a new record.
        parse_record()
    }

}

END {
}

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

####################

# parse_output.awk #

####################

#function pl() replaces all non alphanumeric occurrences with an underscore

function pl(input) {

return gensub("[^[:alnum:]_-]","_","G",input);

}

# function get_rrd_path() returns a path where the rrd files should be placed

# I should rewrite a new path for each dms table... I'll skip many of them

function get_rrd_path() {

if (table == "mod_oc4j_destination_metrics")

return sprintf("%s/%s/%s/%s.rrd", record["Host"],

pl(table), pl(record["Name.value"]), pl(var) );

if (table == "mod_oc4j_mount_pt_metrics")

return sprintf("%s/%s/%s/%s/%s.rrd", record["Host"],

pl(table), pl(record["Destination.value"]), pl(record["Name.value"]), pl(var) );

if (table == "ohs_server")

return sprintf("%s/%s/%s.rrd", record["Host"], pl(table), pl(var) );

if (table == "JVM")

return sprintf("%s/%s/%s/%s.rrd", record["Host"],

pl(table), pl(record["Process"]), pl(var) );

if (table == "opmn_process")

return sprintf("%s/%s/%s/%s/%s/%s/%s/%s.rrd", record["Host"], pl(table),

pl(record["iasInstance.value"]), pl(record["opmn_ias_component"]),

pl(record["opmn_process_type"]),pl(record["opmn_process_set"]),

pl(record["Name"]), pl(var) );

return sprintf("%s/%s/%s.rrd", record["Host"], pl(table), pl(var) );

}

# function process_record actually does the dirty work of invoking the update script

function process_record() {

#every record has a timeStamp.ts metric that I should use to update my rrd

ts=substr(record["timeStamp.ts"],0,10);

for ( var in record ) {

if ( var != "timeStamp.ts" && record[var] ~ /^[[:digit:]]+$/ ) {

if ( var ~ /\.(count|completed|time)$/ ) {

dstype="DERIVE";

} else {

if ( var == "responseSize.value" ) {

dstype="DERIVE";

} else {

dstype="GAUGE";

}

rrdFile=sprintf("/path_to_data/%s",get_rrd_path());

#### update_metric_rrd is a shell script listed below!!!!!

cmd=sprintf("/path_to_scripts/update_metric_rrd %s %s %d %d",

rrdFile,dstype,ts,record[var]);

system(cmd);

}

# parse_record() populates an hash array

# with all metrics belonging to the table record

function parse_record() {

#print "RRRR - START OF RECORD (table " table ")"

delete record

while ( ! /^$/ ) {

# I'm parsing the record as far I'm in this while statement

# the array hash is the name of the dms metric basename.

# $1 is the metric name but I have to trim the final ":"

key=substr($1,0,length($1)-1)

record[key]=$2

getline

}

# this function is included in funcions.awk:

# I invoke it to process the record I've just parsed

process_record();

}

BEGIN {

# as far as started is 0, I've never reached the first table

started=0

}

#MAIN

{

# I jump over the first lines until I reach the first table

if (started==0) {

while ( ! /^---/ ) {

getline

}

started=1

}

# looking for the next occurrence of a table

# all tables start with:

# ----------

# table_name

# ----------

if ( /^---/ ) {

# first table reached: the next row is my table name,

# then I reach again a dashed line -----

getline table

getline trash

#print ""

#print "##########################"

print " TABELLA " table

#print "##########################"

}

if ( ! /^$/ ) {

# reached an empty line: could be the end of a record or the and of a table

# since a new table is threated in previous "if" statement, I'm starting a new record.

parse_record()

}

END {

}

And this is the code for update_metric_rrd:

#!/bin/bash
RRDFILE=$1
DSTYPE=$2
TS=$3
VALUE=$4

rrdtool update $RRDFILE ${TS}:${VALUE}

if [ $? -ne 0 ] ; then
        DIR=`dirname $RRDFILE`

        [ -d $DIR ] || mkdir -p $DIR
        [ -f $RRDFILE ] || rrdtool create $RRDFILE -b "now-1month" -s 1800 \
                DS:metric:${DSTYPE}:7200:0:U \
                RRA:AVERAGE:0.5:1:672 \
                RRA:AVERAGE:0.5:4:1080 \
                RRA:AVERAGE:0.5:12:1460 \
                RRA:AVERAGE:0.5:48:1095 \
                RRA:MAX:0.5:4:1080 \
                RRA:MAX:0.5:12:1460 \
                RRA:MAX:0.5:48:1095 \
                RRA:LAST:0.5:1:672
        rrdtool update $RRDFILE ${TS}:${VALUE}
fi

#!/bin/bash

RRDFILE=$1

DSTYPE=$2

TS=$3

VALUE=$4

rrdtool update $RRDFILE ${TS}:${VALUE}

if [ $? -ne 0 ] ; then

DIR=`dirname $RRDFILE`

[ -d $DIR ] || mkdir -p $DIR

[ -f $RRDFILE ] || rrdtool create $RRDFILE -b "now-1month" -s 1800 \

DS:metric:${DSTYPE}:7200:0:U \

RRA:AVERAGE:0.5:1:672 \

RRA:AVERAGE:0.5:4:1080 \

RRA:AVERAGE:0.5:12:1460 \

RRA:AVERAGE:0.5:48:1095 \

RRA:MAX:0.5:4:1080 \

RRA:MAX:0.5:12:1460 \

RRA:MAX:0.5:48:1095 \

RRA:LAST:0.5:1:672

rrdtool update $RRDFILE ${TS}:${VALUE}

Once you have all your rrd files populated, it’s easy to script automatic reporting. You would probably want a graph with the request count served by your Apache cluster, along with its linear regression:

rrdtool graph - -s "end-${hours}hours" -e $end \
                -v "Requests Completed/sec" \
        -w 640 -h 240 --slope-mode \
                -t "HTTP Requests for www.ludovicocaldara.net" \
                DEF:1request_completed=/data/wwwserver1/ohs_server/request_completed.rrd:metric:AVERAGE \
                DEF:2request_completed=/data/wwwserver2/ohs_server/request_completed.rrd:metric:AVERAGE \
                CDEF:request_completed=1request_completed,2request_completed,+ \
                VDEF:slope=request_completed,LSLSLOPE \
                VDEF:lslint=request_completed,LSLINT \
                CDEF:reg=request_completed,POP,slope,COUNT,*,lslint,+ \
                LINE1:reg#666666:"Regression" \
                AREA:1request_completed#4040AA:"wwwserver1"  \
                AREA:2request_completed#6666FF:"wwwserver1":STACK  \
        &gt; mygraph.png

rrdtool graph - -s "end-${hours}hours" -e $end \

-v "Requests Completed/sec" \

-w 640 -h 240 --slope-mode \

-t "HTTP Requests for www.ludovicocaldara.net" \

DEF:1request_completed=/data/wwwserver1/ohs_server/request_completed.rrd:metric:AVERAGE \

DEF:2request_completed=/data/wwwserver2/ohs_server/request_completed.rrd:metric:AVERAGE \

CDEF:request_completed=1request_completed,2request_completed,+ \

VDEF:slope=request_completed,LSLSLOPE \

VDEF:lslint=request_completed,LSLINT \

CDEF:reg=request_completed,POP,slope,COUNT,*,lslint,+ \

LINE1:reg#666666:"Regression" \

AREA:1request_completed#4040AA:"wwwserver1" \

AREA:2request_completed#6666FF:"wwwserver1":STACK \

> mygraph.png

This is the result:
OHS request completed
OHHHHHHHHHHHH!!!! COOL!!!!

That’s all for DMS capacity planning. Stay tuned, more about rrdtool is coming!

More about Dataguard and how to check it

Posted on February 6, 2009 by Ludovico

After my post Quick Oracle Dataguard check script I have some considerations to add:
to check the gap of applied log stream by MRP0 process it’s sufficient to replace this query in the perl script I posted:

 select SEQUENCE#, BLOCK# from v\$managed_standby
        where process='RFS' and client_process='LGWR'

1 2	select SEQUENCE#, BLOCK# from v\$managed_standby where process='RFS' and client_process='LGWR'

with this new one:

 select SEQUENCE#, BLOCK# from v\$managed_standby
        where process='MRP0'

1 2	select SEQUENCE#, BLOCK# from v\$managed_standby where process='MRP0'

To check this you have to meet the following condition: You should have real-time apply enabled (and possibly NODELAY clause specified in your recover statement). Check it with this query:

SELECT RECOVERY_MODE FROM V$ARCHIVE_DEST_STATUS;

1	SELECT RECOVERY_MODE FROM V$ARCHIVE_DEST_STATUS;

It should be “MANAGED REAL TIME APPLY”.
If not using realtime apply your MRP0 process will wait until you have a new archive, so even if you have redo transport mode set to LGWR you’ll wait for standby log completion. Your gap of applied redo stream will be at least one sequence#.

With transport mode set to LGWR and real-time apply the output of the perl script is similar to this one:

# ./checkDataGuard.sh
PROD   :       1230      20631
STANDBY:       1230      20613
18         blocks gap

# ./checkDataGuard.sh

PROD : 1230 20631

STANDBY: 1230 20613

18 blocks gap

The whole gap between your primary and standby database should be LOW.

Clustering the RMAN catalog on a RAC environment

Posted on January 13, 2009 by Ludovico

You have your brand new RAC deployed on a cluster and you want to manage your backups through a recovery catalog.
Suppose you don’t have a dedicate server to host your catalog, perhaps you wouldn’t configure your catalog as a RAC database: so why don’t you use Clusterware to configure your catalog as a single instance in cold failover?

OTN has a very nice whitepaper describing how to protect a single instance database. This can be nicely applied on 10g, 10gR2 or 11g: Using Oracle Clusterware to Protect A Single Instance Oracle Database 11g.

Clusterware is appealing also for traditional cold failover clusters. Licensing allows you to use Clusterware as far as you protect Oracle software or 3rd party software that use Oracle as database backend.

Quick Oracle Dataguard check script

Posted on January 5, 2009 by Ludovico

Oracle Dataguard has his own command-line dgmgrl to check the whole dataguard configuration status.
At least you should check that the show configuration command returns SUCCESS.

This is an hypothetic script:

#!/bin/bash
export ORACLE_HOME=/u1/app/oracle/product/10.2.0
export ORACLE_SID=orcldg
result=`echo "show configuration;" | \
  $ORACLE_HOME/bin/dgmgrl sys/strongpasswd | \
  grep -A 1 "Current status for" | grep -v "Current status for"`
if [ "$result" = "SUCCESS" ] ; then
    exit 0
else
    exit 1
fi

#!/bin/bash

export ORACLE_HOME=/u1/app/oracle/product/10.2.0

export ORACLE_SID=orcldg

result=`echo "show configuration;" | \

$ORACLE_HOME/bin/dgmgrl sys/strongpasswd | \

grep -A 1 "Current status for" | grep -v "Current status for"`

if [ "$result" = "SUCCESS" ] ; then

exit 0

else

exit 1

Another script should check for the gap between production online log and the log stream received by the standby database. This can be accomplished with v$managed_standby view.
The Total Block Gap between production and standby can be calculated this way:
Sum all blocks from v$archived_logs where seq# between Current Standby Seq# and Current Production Seq#. Then add current block# of the production LGWR process and subtract current block# from RFS standby process. This gives you total blocks even if there is a log sequence gap between sites.
This is NOT the gap of online log APPLIED to the standby database. THIS IS THE GAP OF ONLINE LOG TRANSMITTED TO THE STANDBY RFS PROCESS and can be used to monitor your dataguard transmission from production to disaster recovery environment.

This is an excerpt of such script (please take care that it does not check against RFS failures, so it can fails when RFS is not alive):

#!/u1/app/oracle/product/10.2.0/perl/bin/perl -w
use DBI;
use DBD::Oracle qw(:ora_session_modes);
# DB connection #
my $prod  = "orclprod";
my $stby = "orcldr";
my $prodh;
unless ($prodh = DBI->connect('dbi:Oracle:'.$prod,
  'sys', 'strongpassword',
  {PrintError=>0, AutoCommit => 0,
  ora_session_mode => ORA_SYSDBA}))  {
    print "Error connecting to DB: $DBI::errstr\n";
        exit(1);
}
$prodh->{RaiseError}=1;

my $stbyh;
unless ($stbyh = DBI->connect('dbi:Oracle:'.$stby,
  'sys', 'strongpassword',
  {PrintError=>0, AutoCommit => 0,
  ora_session_mode => ORA_SYSDBA}))  {
    print "Error connecting to DB: $DBI::errstr\n";
        $prodh->disconnect;
        exit(1);
}
$stbyh->{RaiseError}=1;

my $sth;
### query prod
$sth = $prodh->prepare( < <eosql );
        select SEQUENCE#, BLOCK# from v\$managed_standby
        where process='LGWR'
EOSQL
$sth->execute();
my ($psequence, $pblock) = $sth->fetchrow_array();
$sth->finish();
### query stdby
$sth = $stbyh->prepare( < </eosql><eosql );
        select SEQUENCE#, BLOCK# from v\$managed_standby
        where process='RFS' and client_process='LGWR'
EOSQL
$sth->execute();
my ($ssequence, $sblock) = $sth->fetchrow_array();
$sth->finish();

printf ("PROD   : %10d %10d\n", $psequence, $pblock);
printf ("STANDBY: %10d %10d\n", $ssequence, $sblock);

$sth = $stbyh->prepare( < </eosql><eosql );
        select nvl(sum(blocks),0)
        + $pblock - $sblock as BLOCK_GAP
    from v\$archived_log
        where sequence# between $ssequence and $psequence
EOSQL
$sth->execute();
my ($blockgap) = $sth->fetchrow_array();
$sth->finish();
printf ("%-10d blocks gap\n", $blockgap);

$stbyh->disconnect;
$prodh->disconnect;
</eosql>

#!/u1/app/oracle/product/10.2.0/perl/bin/perl -w

use DBI;

use DBD::Oracle qw(:ora_session_modes);

# DB connection #

my $prod = "orclprod";

my $stby = "orcldr";

my $prodh;

unless ($prodh = DBI->connect('dbi:Oracle:'.$prod,

'sys', 'strongpassword',

{PrintError=>0, AutoCommit => 0,

ora_session_mode => ORA_SYSDBA})) {

print "Error connecting to DB: $DBI::errstr\n";

exit(1);

}

$prodh->{RaiseError}=1;

my $stbyh;

unless ($stbyh = DBI->connect('dbi:Oracle:'.$stby,

'sys', 'strongpassword',

{PrintError=>0, AutoCommit => 0,

ora_session_mode => ORA_SYSDBA})) {

print "Error connecting to DB: $DBI::errstr\n";

$prodh->disconnect;

exit(1);

}

$stbyh->{RaiseError}=1;

my $sth;

### query prod

$sth = $prodh->prepare( < <eosql );

select SEQUENCE#, BLOCK# from v\$managed_standby

where process='LGWR'

EOSQL

$sth->execute();

my ($psequence, $pblock) = $sth->fetchrow_array();

$sth->finish();

### query stdby

$sth = $stbyh->prepare( < </eosql><eosql );

select SEQUENCE#, BLOCK# from v\$managed_standby

where process='RFS' and client_process='LGWR'

EOSQL

$sth->execute();

my ($ssequence, $sblock) = $sth->fetchrow_array();

$sth->finish();

printf ("PROD : %10d %10d\n", $psequence, $pblock);

printf ("STANDBY: %10d %10d\n", $ssequence, $sblock);

$sth = $stbyh->prepare( < </eosql><eosql );

select nvl(sum(blocks),0)

+ $pblock - $sblock as BLOCK_GAP

from v\$archived_log

where sequence# between $ssequence and $psequence

EOSQL

$sth->execute();

my ($blockgap) = $sth->fetchrow_array();

$sth->finish();

printf ("%-10d blocks gap\n", $blockgap);

$stbyh->disconnect;

$prodh->disconnect;

</eosql>

Any comment is appreciated!

Tips: Bash Prompt and Oracle

Posted on December 30, 2008 by Ludovico

You may want to check the NEW VERSION of this prompt here.

export PS1=\u@\h:\w\$

1	export PS1=\u@\h:\w\$

I disagree with default bash prompt. Do you? It’s quote common to work with long paths:

ludovico@host:/u01/app/oracle/product/10.2.0/network/admin$ \
/nooo/this/command/line/is/really/long/and/offcourse -I \
-will -wrap -my -command -line

ludovico@host:/u01/app/oracle/product/10.2.0/network/admin$ \

/nooo/this/command/line/is/really/long/and/offcourse -I \

-will -wrap -my -command -line

and, when working on multi-database environments I need to check my environment:

env | grep -i oracle
#or
echo $ORACLE_SID
echo $ORACLE_HOME

env | grep -i oracle

#or

echo $ORACLE_SID

echo $ORACLE_HOME

I currently use this prompt, instead:

export PS1=$'\\n# [ $LOGNAME@\h:$PWD [\\t] [`ohvers` SID:${ORACLE_SID:-"no sid"}] ]\\n# '

# [ ludovico@caldara_2k:/u01/app/oracle/product/10.2.0/db_1/network/admin [23:15:58] [10.2.0 SID:orcl] ]
#

export PS1=$'\\n# [ $LOGNAME@\h:$PWD [\\t] [`ohvers` SID:${ORACLE_SID:-"no sid"}] ]\\n# '

# [ ludovico@caldara_2k:/u01/app/oracle/product/10.2.0/db_1/network/admin [23:15:58] [10.2.0 SID:orcl] ]

What is ohvers?? I defined this function to get the version of oracle from my ORACLE_HOME variable:

ohvers ()
{
echo -n $ORACLE_HOME | sed -n 's/.*\/\([[:digit:].]\+\)\/.*/\1/p'
}

ohvers ()

{

echo -n $ORACLE_HOME | sed -n 's/.*\/$[[:digit:].]\+$\/.*/\1/p'

}

Pros:

I have a blank line that separate my prompt from previous output
I get the system clock (useful when saving my konsole history. Did I say konsole?)
I can see my Oracle Environment before launching dangerous commands
I have an empty line to start my endless commands
I have a lot of sharps “#” : they are fine against wrong copy&paste operations…

Suggestions?

JBoss Portal and MySQL scalability: What The…???

Posted on December 10, 2008 by Ludovico

I found several queries running on a MySQL 5.0 database like this one:

SELECT PATH, NAME FROM JBP_OBJECT_NODE  WHERE PK IN (
SELECT NODE_KEY FROM JBP_OBJECT_NODE_SEC WHERE ROLE IN (
SELECT jr.jbp_name from jbp_users ju, jbp_role_membership jrm,
jbp_roles jr
where jrm.jbp_uid = ju.jbp_uid
and jr.jbp_rid = jrm.jbp_rid
and ju.jbp_uname = 'LUDOVICO'
and ju.jbp_enabled = 1));

SELECT PATH, NAME FROM JBP_OBJECT_NODE WHERE PK IN (

SELECT NODE_KEY FROM JBP_OBJECT_NODE_SEC WHERE ROLE IN (

SELECT jr.jbp_name from jbp_users ju, jbp_role_membership jrm,

jbp_roles jr

where jrm.jbp_uid = ju.jbp_uid

and jr.jbp_rid = jrm.jbp_rid

and ju.jbp_uname = 'LUDOVICO'

and ju.jbp_enabled = 1));

This query is related to JBoss Portal and does a full scan on table JBP_OBJECT_NODE.

It has bad performances (>0.8 sec) with just a few records:

mysql> select count(*) from JBP_OBJECT_NODE;
+———-+
| count(*) |
+———-+
| 33461 |
+———-+

If I rewrite the query using an inner join (à la Oracle, please forgive me) instead of a subquery I get an index scan:

SELECT distinct a.PATH, a.NAME /* , b.NODE_KEY */ from  JBP_OBJECT_NODE
a, JBP_OBJECT_NODE_SEC b
where a.pk=b.NODE_KEY
and b.ROLE IN (
SELECT jr.jbp_name from jbp_users ju, jbp_role_membership jrm,
jbp_roles jr
where jrm.jbp_uid = ju.jbp_uid
and jr.jbp_rid = jrm.jbp_rid
and ju.jbp_uname = 'UTDEMO'
and ju.jbp_enabled = 1);

SELECT distinct a.PATH, a.NAME /* , b.NODE_KEY */ from JBP_OBJECT_NODE

a, JBP_OBJECT_NODE_SEC b

where a.pk=b.NODE_KEY

and b.ROLE IN (

SELECT jr.jbp_name from jbp_users ju, jbp_role_membership jrm,

jbp_roles jr

where jrm.jbp_uid = ju.jbp_uid

and jr.jbp_rid = jrm.jbp_rid

and ju.jbp_uname = 'UTDEMO'

and ju.jbp_enabled = 1);

With 30k records the execution time falls down from 0.8 secs to 0.01 secs…
That’s NOT all! I found this open bug:

https://jira.jboss.org/jira/browse/JBPORTAL-2040

With many users registered in, the JBoss Portal Admin console tooks over a minute to show a single page…

I don’t like portals…

Oracle RAC Standard Edition to achieve low cost and high performance

Posted on November 28, 2008 by Ludovico

I finished today to create a new production environment based on 2 Linux serverX86_64 and running Oracle RAC 10gR2. (I know, there is 11g right now, but I’m a conservative!)
Wheeew, I just spent a couple of hours applying all the recommended patches!
We choosed 2 nodes with a maximum of 2 multi-core processors each one so we can license Standard Edition instead of Enterprise Edition. 64bits addressing allow us to allocate many gigabytes of SGA. I’m starting with 5Gb but I think we’ll need more. And a set of 6x300Gb 15krpms disks (it can be expanded with more disks and more shelves).
This configuration keeps low the total cost of ownership but achieves best performance.
Due to disks layout, costs and needed usable storage, we had to configure one huge RAID5 on the SAN with multi-path. I decided anyway to create 2 ASM disk groups (ASM is mandatory for Standard Edition RAC), one for the DB, the second one for the recovery area. With spare disks we should have enough availability and even if it’s a RAID5 I saw good write performances (>150M/s).

Welcome new RAC, I hope we’ll feel good together!

It’s time to trouble…

Posted on November 21, 2008 by Ludovico

Sometimes it’s hard to find enough time to write something or even to only THINK about writing something…

The following are the projects I have to complete before the deadline of December 17th (at least if I still want to go on vacation…)

A totally new Oracle 10gR2 RAC SE on Linux (OCFS2, ASM) including jboss frontends, backups, monitoring, documentation. (Servers are ready today).

A Disaster recovery architecture based on Dataguard with scripts based on rsync to do filesystem replication, with failover and failback, including backups, monitoring, documentation. (The server in DR site is reachable via network today).

A 17 server infrastructure (among others a RAC 10gR2 on linux) transfer from Milan datacenter to here. It’s planned for december 11th but I have to crosscheck backup and contingency requirements.

A 14 server infrastructure (based on Windows and SqlServer) transfer from Milan datacenter to here. To be planned in december.

A totally new cold failover cluster based on linux with Oracle DBMS and E-business suite (Servers will be provided soon, I hope!).

A new standalone Windows Server 64bit to outstand the 32bit allocation bottleneck for a 500Gb oracle database (Server will be provided not before december 10th).

Normally manage the day-by-day work, including replying to e-mails and answering the phone.

AARGH!!