How to collect Oracle Application Server performance data with DMS and RRDtool

RRDize everything, chapter 1

If you are managing some Application Server deployments, you have probably wondered how to check and collect performance data.
As stated in the documentation, you can gather performance metrics with the dmstool utility.
AFAIK this can be done from the 9.0.2 release onwards, but I'm afraid DMS will not work on WebLogic.

Mainly, you need an external server that acts as a collector (it could be a server in the Oracle AS farm as well): copy the dms.jar library from an Oracle AS installation to your collector and use it as you would use dmstool:

There are three basic ways to get data:

Get all metrics at once:
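From memory, the call looks like this (the -dump flag is the same you would pass to dmstool; the address syntax is discussed below):

java -jar dms.jar -a youraddress:// -dump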

Get only the interesting metrics:
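Same call, but passing the full DMS paths of the metrics you care about (the paths below are hypothetical):

java -jar dms.jar -a youraddress:// /path/to/metric1.value /path/to/metric2.value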

Get metrics included into specific DMS tables:
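For example, assuming ohs_server is the DMS table that holds the Apache server metrics:

java -jar dms.jar -a youraddress:// -table ohs_server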

What youraddress:// should be depends on the component you are trying to connect to:
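these are the typical forms, from memory (double-check the ports for your release):

opmn://hostname:6003        for OPMN-managed components (OC4J and friends)
http://hostname:7200/dms0   for the OHS, through the Location defined in dms.conf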

If you are trying to connect to the OHS (Apache), make sure to allow remote access from the collector by editing the dms.conf file.

Now that you can query DMS data, you need to store it somewhere.
Personally, I made a first attempt with dmstool -dump format=xml: I wrote a parser in PHP with the SimpleXML extension and did a lot of inserts into a MySQL database. After a few months, the data collected from tens of servers was too much to maintain…
To avoid maintaining a DWH-grade database I investigated and found RRDtool. Now I wonder how I ever lived without it!

I then wrote an awk parser that reads the output of the dms.jar call and invokes an rrdtool update command.
I always use the dms.jar -table command, so the output format is always the same.

So I wrote an awk script that works for me. Use it this way:
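a hypothetical invocation (the script name and the DMS table are mine):

java -jar dms.jar -a youraddress:// -table ohs_server | awk -f update_metric_rrd.awk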

And this is the code for update_metric_rrd:
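Or rather, a minimal sketch of it: it assumes that the first field of each metric line is the metric name, that the last field is its numeric value, and that one .rrd file per metric already exists under /var/rrd.

#!/usr/bin/awk -f
# update_metric_rrd.awk -- feed the dms.jar -table output to rrdtool
BEGIN { RRDPATH = "/var/rrd" }
# skip headers and separators: keep only lines ending with a number
$NF ~ /^[0-9]+(\.[0-9]+)?$/ {
    metric = $1
    value  = $NF
    # one update per metric line, timestamped with the current time (N:)
    system("rrdtool update " RRDPATH "/" metric ".rrd N:" value)
}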

Once you have all your rrd files populated, it's easy to script automatic reporting. You will probably want a graph of the request count served by your Apache cluster, along with its linear regression:
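A sketch of the graph command (the file names and the DS name "completed" are my own; the trend line uses rrdtool's documented least-squares VDEF functions LSLSLOPE and LSLINT):

rrdtool graph ohs_requests.png --start -1w --title "OHS request completed" \
  DEF:r1=ohs1_requests.rrd:completed:AVERAGE \
  DEF:r2=ohs2_requests.rrd:completed:AVERAGE \
  CDEF:total=r1,r2,+ \
  VDEF:slope=total,LSLSLOPE \
  VDEF:offset=total,LSLINT \
  CDEF:trend=total,POP,slope,COUNT,*,offset,+ \
  AREA:total#00CC66:"requests/s" \
  LINE2:trend#FF0000:"linear regression"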

This is the result:
[graph: OHS request completed]
OHHHHHHHHHHHH!!!! COOL!!!!

That’s all for DMS capacity planning. Stay tuned, more about rrdtool is coming!

More about Dataguard and how to check it

After my post Quick Oracle Dataguard check script, I have some considerations to add:
to check the gap of the log stream applied by the MRP0 process, it's sufficient to replace this query in the perl script I posted:
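(from memory, the old query tracked the last sequence# received by the RFS process, something like this:)

select max(sequence#) from v$managed_standby where process = 'RFS';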

with this new one:
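a sketch of it, using the same view (the view and column names are real; the exact text may differ):

select process, status, sequence#, block#
  from v$managed_standby
 where process = 'MRP0';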

For this check to be meaningful, the following condition must be met: you need real-time apply enabled (and possibly the NODELAY clause specified in your recover statement). Check it with this query:
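On the standby, something along these lines (v$archive_dest_status is the view that exposes the recovery mode):

select dest_id, recovery_mode
  from v$archive_dest_status
 where recovery_mode <> 'IDLE';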

It should be “MANAGED REAL TIME APPLY”.
If you are not using real-time apply, your MRP0 process will wait until a new archived log is available, so even if your redo transport mode is set to LGWR you'll wait for the standby log completion. Your gap of applied redo stream will be at least one sequence#.

With the transport mode set to LGWR and real-time apply enabled, the whole gap between your primary and standby database, as reported by the perl script, should be LOW.

Clustering the RMAN catalog on a RAC environment

You have your brand-new RAC deployed on a cluster and you want to manage your backups through a recovery catalog.
Suppose you don't have a dedicated server to host your catalog, and you probably wouldn't configure the catalog itself as a RAC database: so why not use Clusterware to run your catalog as a single instance in cold failover?

OTN has a very nice whitepaper describing how to protect a single-instance database; it applies nicely to 10g, 10gR2 and 11g: Using Oracle Clusterware to Protect A Single Instance Oracle Database 11g.
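The gist of it, roughly (the resource name, action script path and option values are placeholders; the action script must implement start, stop and check for the catalog instance):

crs_profile -create rcat_db -t application \
            -a /u01/app/crs/public/act_rcat.scr \
            -o ci=60,ra=5,st=300
crs_register rcat_db
crs_start rcat_db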

Clusterware is appealing for traditional cold failover clusters, too. The licensing allows you to use Clusterware as long as it protects Oracle software, or third-party software that uses Oracle as its database backend.

Quick Oracle Dataguard check script

Oracle Dataguard has its own command-line utility, dgmgrl, to check the status of the whole Dataguard configuration.
At the very least, you should check that the show configuration command returns SUCCESS.

This is a hypothetical script:
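or rather, a minimal sketch of one (credentials and connect string are placeholders):

#!/bin/bash
# exit 0 if dgmgrl reports SUCCESS, 1 otherwise
if dgmgrl -silent sys/"${SYSPWD}"@"${DB}" "show configuration" | grep -q SUCCESS
then
    echo "Dataguard configuration: OK"
else
    echo "Dataguard configuration: ERROR"
    exit 1
fi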

Another script should check the gap between the production online log and the log stream received by the standby database. This can be accomplished with the v$managed_standby view.
The total block gap between production and standby can be calculated this way:
sum all the blocks from v$archived_log where the sequence# is between the current standby sequence# and the current production sequence#; then add the current block# of the production LGWR process and subtract the current block# of the standby RFS process. This gives you the total blocks even if there is a log sequence gap between the sites.
This is NOT the gap of the online log APPLIED to the standby database. THIS IS THE GAP OF THE ONLINE LOG TRANSMITTED TO THE STANDBY RFS PROCESS, and it can be used to monitor the Dataguard transmission from the production to the disaster recovery environment.

This is an excerpt of such a script (please note that it does not check for RFS failures, so it can fail when RFS is not alive):
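The queries behind the calculation look more or less like this (run the LGWR one on the primary and the RFS one on the standby; the bind names are mine):

-- on the standby: last sequence#/block# received by RFS
select sequence#, block# from v$managed_standby where process = 'RFS';

-- on the primary: current sequence#/block# written by LGWR
select sequence#, block# from v$managed_standby where process = 'LGWR';

-- on the primary: blocks of the archived sequences in between
select nvl(sum(blocks), 0)
  from v$archived_log
 where sequence# >= :standby_seq
   and sequence# < :primary_seq;

-- total block gap = sum(blocks) + primary block# - standby block#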

Any comment is appreciated!

Tips: Bash Prompt and Oracle

You may want to check the NEW VERSION of this prompt here.

I disagree with the default bash prompt. Do you? It's quite common to work with long paths:
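something like this (a made-up but realistic example):

[oracle@myserver /u01/app/oracle/product/10.2.0/db_1/network/admin]$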

and, when working in multi-database environments, I need to check my environment:
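usually with a couple of extra commands, like these:

$ echo $ORACLE_SID
PROD1
$ echo $ORACLE_HOME
/u01/app/oracle/product/10.2.0/db_1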

I currently use this prompt, instead:
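A reconstruction from the “Pros” list below (the exact string may differ):

PS1='\n### \t ### ${ORACLE_SID:-nosid} ### $(ohvers) ### \u@\h:\w ###\n# '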

What is ohvers? I defined this function to get the Oracle version from my ORACLE_HOME variable:
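Not necessarily the original function, but a guess that behaves as described: it extracts a version-looking token from the path.

ohvers () {
    # e.g. /u01/app/oracle/product/10.2.0/db_1 -> 10.2.0
    echo "${ORACLE_HOME}" | grep -o '[0-9]\+\(\.[0-9]\+\)\+' | head -1
}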

Pros:

  • I have a blank line that separates my prompt from the previous output
  • I get the system clock (useful when saving my konsole history. Did I say konsole?)
  • I can see my Oracle Environment before launching dangerous commands
  • I have an empty line to start my endless commands
  • I have a lot of sharps “#”: the shell treats them as comments, so they are fine against wrong copy&paste operations…

Suggestions?

JBoss Portal and MySQL scalability: What The…???

I found several queries like this one running on a MySQL 5.0 database:
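The shape was an IN-subquery on the same table, which MySQL 5.0 executes as a DEPENDENT SUBQUERY; something like this (column names are from the JBoss Portal schema, the predicate value is made up):

SELECT o.PK_, o.NAME_, o.PATH_
  FROM JBP_OBJECT_NODE o
 WHERE o.PARENT_KEY IN
       (SELECT p.PK_ FROM JBP_OBJECT_NODE p WHERE p.PATH_ = 'default.default');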

This query is related to JBoss Portal and does a full scan on table JBP_OBJECT_NODE.

Its performance is bad (>0.8 sec) even with just a few records:

mysql> select count(*) from JBP_OBJECT_NODE;
+----------+
| count(*) |
+----------+
|    33461 |
+----------+

If I rewrite the query with an inner join (à la Oracle, please forgive me) instead of a subquery, I get an index scan:
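Using the same hypothetical shape as above:

SELECT o.PK_, o.NAME_, o.PATH_
  FROM JBP_OBJECT_NODE o
  JOIN JBP_OBJECT_NODE p ON p.PK_ = o.PARENT_KEY
 WHERE p.PATH_ = 'default.default';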

With 30k records, the execution time drops from 0.8 secs to 0.01 secs…
That’s NOT all! I found this open bug:

https://jira.jboss.org/jira/browse/JBPORTAL-2040

With many registered users, the JBoss Portal admin console takes over a minute to show a single page…

I don’t like portals…

Oracle RAC Standard Edition to achieve low cost and high performance

Today I finished creating a new production environment based on 2 Linux x86_64 servers running Oracle RAC 10gR2. (I know, 11g is out right now, but I'm a conservative!)
Wheeew, I just spent a couple of hours applying all the recommended patches!
We chose 2 nodes with a maximum of 2 multi-core processors each, so we can license Standard Edition instead of Enterprise Edition. 64-bit addressing allows us to allocate many gigabytes of SGA: I'm starting with 5 GB, but I think we'll need more. And there is a set of 6x300 GB 15k rpm disks (it can be expanded with more disks and more shelves).
This configuration keeps the total cost of ownership low while achieving very good performance.
Due to the disk layout, the costs and the usable storage we needed, we had to configure one huge RAID5 on the SAN, with multipathing. I decided anyway to create 2 ASM disk groups (ASM is mandatory for Standard Edition RAC): one for the DB, the second one for the recovery area. With spare disks we should have enough availability, and even though it's a RAID5 I saw good write performance (>150 MB/s).
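Creating the disk groups is plain ASM (the discovery paths are placeholders; redundancy is EXTERNAL because the SAN already provides RAID5):

CREATE DISKGROUP DATA  EXTERNAL REDUNDANCY DISK '/dev/mapper/asmdata*';
CREATE DISKGROUP FLASH EXTERNAL REDUNDANCY DISK '/dev/mapper/asmflash*';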

Welcome new RAC, I hope we’ll feel good together!

It’s time to trouble…

Sometimes it's hard to find enough time to write something, or even to just THINK about writing something.

The following are the projects I have to complete before the deadline of December 17th (at least, if I still want to go on vacation…):

  • A totally new Oracle 10gR2 RAC SE on Linux (OCFS2, ASM), including JBoss frontends, backups, monitoring and documentation. (Servers are ready today.)
  • A disaster recovery architecture based on Dataguard, with rsync-based scripts for filesystem replication, with failover and failback, including backups, monitoring and documentation. (The server in the DR site is reachable via network today.)
  • The transfer of a 17-server infrastructure (among others, a RAC 10gR2 on Linux) from the Milan datacenter to here. It's planned for December 11th, but I have to crosscheck the backup and contingency requirements.
  • The transfer of a 14-server infrastructure (based on Windows and SQL Server) from the Milan datacenter to here. To be planned in December.
  • A totally new cold failover cluster based on Linux, with the Oracle DBMS and E-Business Suite. (Servers will be provided soon, I hope!)
  • A new standalone 64-bit Windows server to overcome the 32-bit memory allocation bottleneck for a 500 GB Oracle database. (The server will be provided no earlier than December 10th.)
  • Managing the normal day-by-day work, including replying to e-mails and answering the phone.

AARGH!!

System triggers, stats$user_log and side effects

Sometimes people get advice from the internet: from Metalink as well as from well-known consulting sites.
If people need a fix or a feature, they tend to trust that advice.

Last week a colleague told me about a 10g RAC database with performance problems and, since I never sit still in my chair, I probed both AWR and ADDM. I immediately recognized heavy enqueues and physical reads
over a segment named STATS$USER_LOG. “Strange”, I said, “I don't remember this name among either the perfstat or the catalog segments”.
Then I searched the internet and Metalink, and found the same thing on BOTH metalink.oracle.com and www.dba-oracle.com: a trick to trace logons and logoffs into a table using system triggers.

Look at this code:

create or replace trigger
logon_audit_trigger
AFTER LOGON ON DATABASE
BEGIN
insert into stats$user_log values(
user,
sys_context('USERENV','SESSIONID'),
sys_context('USERENV','HOST'),
sysdate,
to_char(sysdate, 'hh24:mi:ss'),
[...]
);
COMMIT;
END;
/

Cool, every single access is kept in stats$user_log.

Let’s see the logoff trigger:

create or replace trigger
logoff_audit_trigger
BEFORE LOGOFF ON DATABASE
BEGIN
-- ***************************************************
-- Update the last action accessed
-- ***************************************************
update stats$user_log [...]
--***************************************************
-- Update the last program accessed
-- ***************************************************
update stats$user_log [...]
-- ***************************************************
[ ... many, many updates ...]
-- ***************************************************
update stats$user_log [...]
-- ***************************************************
-- Compute the elapsed minutes
-- ***************************************************
update stats$user_log set elapsed_minutes =
round((logoff_day - logon_day)*1440)
where
sys_context('USERENV','SESSIONID') = session_id;
COMMIT;
END;
/

That's all. It inserts one row when someone logs on. It updates MANY rows when someone logs off.
There is no match between the record inserted and the records updated, except for the session_id.
And there are neither indexes nor constraints.

What’s the matter?

What happens if we have many logons?

SQL> select num_rows from dba_tables where table_name='STATS$USER_LOG';

  NUM_ROWS
----------
   3053931

What happens if the execution plan does a full scan?

SQL> explain plan for update stats$user_log […]

Explained.

SQL> @?/rdbms/admin/utlxpls
---------------------------------------------
| Id | Operation | Name |
---------------------------------------------
| 0 | UPDATE STATEMENT | |
| 1 | UPDATE | STATS$USER_LOG |
| 2 | TABLE ACCESS FULL| STATS$USER_LOG |
---------------------------------------------

How many reads should it take?

SQL> select bytes/1024/1024 Mb from dba_Segments where segment_name='STATS$USER_LOG';

        MB
----------
       237

The database performance will decrease constantly and very slowly…
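If you really must keep those triggers, the least invasive mitigation I can think of is an index on the only column that matches the updates with the inserted row (a sketch, untested against the original scripts):

create index stats$user_log_i1 on stats$user_log (session_id);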
Remember: never trust a solution if it involves a change on the system.

Plot Oracle historical statistics within SQL*Plus

More often than not, I'm asked to investigate “what happened yesterday when the performance problems appeared”.

Sometimes I have the Enterprise Manager DB Console licensed, sometimes not. Sometimes I have direct SQL*Net access to the database, which I can use to produce custom reports with my self-developed LAMP application. But it may happen that only ssh access to the db server is granted.

That’s why I started to develop some little scripts to plot the trends of database timed statistics.

Let’s see this one:

SQL> @sysstat.sql
Enter a sysstat to search for: physical reads


    STAT_ID STAT_NAME
----------- ------------------------------------------
 2263124246 physical reads
 4171507801 physical reads cache
  297908839 physical reads cache prefetch
 2589616721 physical reads direct
 2564935310 physical reads direct (lob)
 2663793346 physical reads direct temporary tablespace
  473165409 physical reads for flashback new
 3102888545 physical reads prefetch warmup
  531193461 physical reads retry corrupt

9 rows selected.

Enter the desired stat_id: 2263124246
Enter the start date (YYYYMMDD) [defaults today] : 20080922
Enter the end date (YYYYMMDD) [defaults today] : 20080922

STAT_NAME        START    END
---------------- -------- --------
physical reads   20080922 20080922

BEGIN_INTERVAL_TIME           VALORE PLOTTED_VALUE
------------------------- ---------- -------------------------
22-SEP-08 12.00.12.122 AM          0
22-SEP-08 01.00.28.253 AM     120092
22-SEP-08 02.00.05.039 AM      35780
22-SEP-08 03.00.55.595 AM       4792
22-SEP-08 04.00.43.725 AM       4905
22-SEP-08 05.00.31.855 AM       7300
22-SEP-08 06.00.17.017 AM     234596
22-SEP-08 07.00.08.132 AM      24651
22-SEP-08 08.00.50.936 AM     481884
22-SEP-08 09.00.33.488 AM     130201
22-SEP-08 10.00.03.805 AM    1300306 **
22-SEP-08 11.00.07.764 AM     491857
22-SEP-08 12.00.31.548 PM     304702
22-SEP-08 01.01.04.880 PM    1023664 *
22-SEP-08 02.00.17.822 PM    8588180 ************
22-SEP-08 03.00.36.969 PM    2201615 ***
22-SEP-08 04.01.01.397 PM   17237098 *************************
22-SEP-08 05.00.39.262 PM    1606300 **
22-SEP-08 06.00.03.829 PM     451568
22-SEP-08 07.00.31.461 PM     137684
22-SEP-08 08.00.05.966 PM     203803
22-SEP-08 09.00.24.829 PM     536394
22-SEP-08 10.00.12.945 PM   10209783 **************
22-SEP-08 11.00.35.123 PM    6151663 *********

24 rows selected.

Oh! At 4.00 PM we had a lot of physical reads. Nice.

This is the code:
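Or rather, a compact sketch that produces the same kind of report (the variable and alias names are mine; it reads the AWR views, so the Diagnostics Pack license is required):

set verify off pagesize 100 linesize 120
accept search prompt 'Enter a sysstat to search for: '

select stat_id, stat_name
  from dba_hist_sysstat
 where stat_name like '%&search%'
 group by stat_id, stat_name;

accept statid prompt 'Enter the desired stat_id: '
column today new_value today noprint
select to_char(sysdate, 'YYYYMMDD') today from dual;
accept begindate default &today prompt 'Enter the start date (YYYYMMDD) [defaults today] : '
accept enddate   default &today prompt 'Enter the end date (YYYYMMDD) [defaults today] : '

with deltas as (
  select sn.begin_interval_time,
         st.value - lag(st.value) over (order by sn.snap_id) valore
    from dba_hist_sysstat  st
    join dba_hist_snapshot sn
      on  sn.snap_id         = st.snap_id
     and  sn.dbid            = st.dbid
     and  sn.instance_number = st.instance_number
   where st.stat_id = &statid
     and sn.begin_interval_time >= to_date('&begindate', 'YYYYMMDD')
     and sn.begin_interval_time <  to_date('&enddate',   'YYYYMMDD') + 1
)
select begin_interval_time,
       valore,
       -- scale the bar to 25 stars at the maximum delta in the period
       rpad('*', round(valore / nullif(max(valore) over (), 0) * 25), '*') plotted_value
  from deltas
 where valore is not null
 order by begin_interval_time;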

Ciao

Ludovico