Archive for the ‘Linux’ Category

A great talk

Friday, July 10th, 2009

John Allspaw and Paul Hammond (Flickr) talk about Dev and Ops Cooperation at Flickr (very good).

Have a look at Kitchen Soap blog.

How to collect Oracle Application Server performance data with DMS and RRDtool

Monday, March 2nd, 2009

RRDize everything, chapter 1

If you are managing some Application Server deployments you should have wondered how to check and collect performance data.
As stated in documentation, you can gather performance metrics with the dmstool utility.
AFAIK, this can be done from 9.0.2 release upwards, but i’m concerned DMS will not work on Weblogic.

Mainly, you should have an external server that acts as collector (it could be a server in the Oracle AS farm as well): copy the dms.jar library from an Oracle AS installation to your collector and use it as you would use dmstool:

java -jar dms.jar [dmstool options]

There are three basilar methods to get data:

Get all metrics at once:

java -jar dms.jar -dump -a "youraddress://..." [format=xml]

Get only the interesting metrics:

java -jar dms.jar -a "youraddress://..." metric metric ...

Get metrics included into specific DMS tables:

java -jar dms.jar -a "youraddress://..." -table table table ...

What youraddress:// is, it depends on the component you are trying to connect:

opmn://asserver:6003
http://asserver:7200/dms0/Spy
ajp13://asserver:3301/dmsoc4j/Spy

If you are trying to connect to the OHS (Apache), be careful to allow remote access from the collector by editing the dms.conf file.

Now that you can query dms data, you should store it somewhere.
Personally, I did a first attempt with dmstool -dump format=xml. I wrote a parser in PHP with SimpleXML extension and I did a lot of inserts into a MySQL database. After a few months the whole data collected from tens of servers was too much to be mantained…
To avoid the maintenance of a DWH-grade database I investigated and found RRDTool. Now I’m asking how could I live without it!

I then wrote a parser in awk that parse the output of the dms.jar call and invoke an rrdtool update command.
I always use dms.jar -table command. The output has always the same format:

###SOF

Mon Mar 02 17:01:19 CET 2009

---------------
TABLE1_Name
---------------

record1_metric1.name:     value       units
record1_metric2.name:     value       units
....

record2_metric1.name:     value       units
record2_metric2.name:     value       units
....

---
TABLE2_Name
---

record1_metric1.name:     value       units
record1_metric2.name:     value       units
....

record2_metric1.name:     value       units
record2_metric2.name:     value       units
....

##EOF

So I written an awk file that works for me.
use it this way:

 java -jar dms.jar ... | awk -f parse_output.awk
####################
# parse_output.awk #
####################
 
#function pl() replaces all non alphanumeric occurrences with an underscore
function pl(input) {
        return gensub("[^[:alnum:]_-]","_","G",input);
}
 
# function get_rrd_path() returns a path where the rrd files should be placed
# I should rewrite a new path for each dms table... I'll skip many of them
function get_rrd_path() {
        if (table == "mod_oc4j_destination_metrics")
                return sprintf("%s/%s/%s/%s.rrd", record["Host"],
                    pl(table), pl(record["Name.value"]), pl(var) );
        if (table == "mod_oc4j_mount_pt_metrics")
                return sprintf("%s/%s/%s/%s/%s.rrd", record["Host"],
                    pl(table), pl(record["Destination.value"]), pl(record["Name.value"]), pl(var) );
        if (table == "ohs_server")
                return sprintf("%s/%s/%s.rrd", record["Host"], pl(table), pl(var) );
        if (table == "JVM")
                return sprintf("%s/%s/%s/%s.rrd", record["Host"],
                    pl(table), pl(record["Process"]), pl(var) );
        if (table == "opmn_process")
                return sprintf("%s/%s/%s/%s/%s/%s/%s/%s.rrd", record["Host"], pl(table),
                  pl(record["iasInstance.value"]), pl(record["opmn_ias_component"]),
                  pl(record["opmn_process_type"]),pl(record["opmn_process_set"]),
                  pl(record["Name"]), pl(var) );
 
        return sprintf("%s/%s/%s.rrd", record["Host"], pl(table), pl(var) );
}
# function process_record actually does the dirty work of invoking the update script
function process_record() {
        #every record has a timeStamp.ts metric that I should use to update my rrd
        ts=substr(record["timeStamp.ts"],0,10);
        for ( var in record ) {
        if ( var != "timeStamp.ts" && record[var] ~ /^[[:digit:]]+$/ ) {
            if ( var ~ /\.(count|completed|time)$/ ) {
                dstype="DERIVE";
            } else {
                if ( var == "responseSize.value" ) {
                    dstype="DERIVE";
                } else {
                    dstype="GAUGE";
                }
            }
            rrdFile=sprintf("/path_to_data/%s",get_rrd_path());
            #### update_metric_rrd is a shell script listed below!!!!!
            cmd=sprintf("/path_to_scripts/update_metric_rrd %s %s %d %d",
                rrdFile,dstype,ts,record[var]);
            system(cmd);
            }
        }
}
 
# parse_record() populates an hash array
# with all metrics belonging to the table record
function parse_record() {
    #print "RRRR -  START OF RECORD (table " table ")"
    delete record
    while ( ! /^$/ ) {
        # I'm parsing the record as far I'm in this while statement
        # the array hash is the name of the dms metric basename.
        # $1 is the metric name but I have to trim the final ":"
        key=substr($1,0,length($1)-1)
        record[key]=$2
        getline
    }
    # this function is included in funcions.awk:
    # I invoke it to process the record I've just parsed
    process_record();
}
BEGIN {
    # as far as started is 0, I've never reached the first table
    started=0
}
 
#MAIN
{
    # I jump over the first lines until I reach the first table
    if (started==0) {
        while ( ! /^---/ ) {
           getline
        }
        started=1
    }
 
    # looking for the next occurrence of a table
    # all tables start with:
    # ----------
    # table_name
    # ----------
    if ( /^---/ ) {
        # first table reached: the next row is my table name,
        # then I reach again a dashed line -----
        getline table
        getline trash
        #print ""
        #print "##########################"
        print "  TABELLA " table
        #print "##########################"
        next
    }
 
    if ( ! /^$/ ) {
        # reached an empty line: could be the end of a record or the and of a table
        # since a new table is threated in previous "if" statement, I'm starting a new record.
        parse_record()
    }
 
}
 
END {
}

And this is the code for update_metric_rrd:

#!/bin/bash
RRDFILE=$1
DSTYPE=$2
TS=$3
VALUE=$4
 
rrdtool update $RRDFILE ${TS}:${VALUE}
 
if [ $? -ne 0 ] ; then
        DIR=`dirname $RRDFILE`
 
        [ -d $DIR ] || mkdir -p $DIR
        [ -f $RRDFILE ] || rrdtool create $RRDFILE -b "now-1month" -s 1800 \
                DS:metric:${DSTYPE}:7200:0:U \
                RRA:AVERAGE:0.5:1:672 \
                RRA:AVERAGE:0.5:4:1080 \
                RRA:AVERAGE:0.5:12:1460 \
                RRA:AVERAGE:0.5:48:1095 \
                RRA:MAX:0.5:4:1080 \
                RRA:MAX:0.5:12:1460 \
                RRA:MAX:0.5:48:1095 \
                RRA:LAST:0.5:1:672
        rrdtool update $RRDFILE ${TS}:${VALUE}
fi

Once you have all your rrd files populated, it’s easy to script automatic reporting. You would probably want a graph with the request count served by your Apache cluster, along with its linear regression:

rrdtool graph - -s "end-${hours}hours" -e $end \
                -v "Requests Completed/sec" \
        -w 640 -h 240 --slope-mode \
                -t "HTTP Requests for www.ludovicocaldara.net" \
                DEF:1request_completed=/data/wwwserver1/ohs_server/request_completed.rrd:metric:AVERAGE \
                DEF:2request_completed=/data/wwwserver2/ohs_server/request_completed.rrd:metric:AVERAGE \
                CDEF:request_completed=1request_completed,2request_completed,+ \
                VDEF:slope=request_completed,LSLSLOPE \
                VDEF:lslint=request_completed,LSLINT \
                CDEF:reg=request_completed,POP,slope,COUNT,*,lslint,+ \
                LINE1:reg#666666:"Regression" \
                AREA:1request_completed#4040AA:"wwwserver1"  \
                AREA:2request_completed#6666FF:"wwwserver1":STACK  \
        > mygraph.png

This is the result:
OHS request completed
OHHHHHHHHHHHH!!!! COOL!!!!

That’s all for DMS capacity planning. Stay tuned, more about rrdtool is coming!

Interesting Slides about Capacity Planning

Thursday, January 29th, 2009

Browsing the web I found this interesting presentation by John Allspaw:

Slides from ‘Capacity Planning for LAMP’ talk at MySQL Conf 2007.
(Damn, I had to clean out the embedded flash because of invalid markup…)

I’m preparing some stuff related to RRDTool, I’ll post soon here some code snippets.

Awk snippet to count TCP sockets grouped by state

Monday, January 19th, 2009

Depending on the release of awk it could be:

#!/usr/bin/gawk -f
{
        if ( ($NF) in stats )  {
                stats[$NF] = stats[$NF]+1;
        } else {
                stats[$NF]=1;
        }
}
END {
        for ( var in stats) {
                print var " = " stats[var];
        }
}

I saved the script as netstat_c.
I have to filter my netstat output to match only my tcp sockets prior to pipe the output to the script.

On linux:

$ netstat -a | grep ^tcp | netstat_c
LISTEN = 13
ESTABLISHED = 74
TIME_WAIT = 7

This is great to check my webserver connections when I do stress tests.

Quick Oracle Dataguard check script

Monday, January 5th, 2009

Oracle Dataguard has his own command-line dgmgrl to check the whole dataguard configuration status.
At least you should check that the show configuration command returns SUCCESS.

This is an hypothetic script:

#!/bin/bash
export ORACLE_HOME=/u1/app/oracle/product/10.2.0
export ORACLE_SID=orcldg
result=`echo "show configuration;" | \
  $ORACLE_HOME/bin/dgmgrl sys/strongpasswd | \
  grep -A 1 "Current status for" | grep -v "Current status for"`
if [ "$result" = "SUCCESS" ] ; then
    exit 0
else
    exit 1
fi

Another script should check for the gap between production online log and the log stream received by the standby database. This can be accomplished with v$managed_standby view.
The Total Block Gap between production and standby can be calculated this way:
Sum all blocks from v$archived_logs where seq# between Current Standby Seq# and Current Production Seq#. Then add current block# of the production LGWR process and subtract current block# from RFS standby process. This gives you total blocks even if there is a log sequence gap between sites.
This is NOT the gap of online log APPLIED to the standby database. THIS IS THE GAP OF ONLINE LOG TRANSMITTED TO THE STANDBY RFS PROCESS and can be used to monitor your dataguard transmission from production to disaster recovery environment.

This is an excerpt of such script (please take care that it does not check against RFS failures, so it can fails when RFS is not alive):

#!/u1/app/oracle/product/10.2.0/perl/bin/perl -w
use DBI;
use DBD::Oracle qw(:ora_session_modes);
# DB connection #
my $prod  = "orclprod";
my $stby = "orcldr";
my $prodh;
unless ($prodh = DBI->connect('dbi:Oracle:'.$prod,
  'sys', 'strongpassword',
  {PrintError=>0, AutoCommit => 0,
  ora_session_mode => ORA_SYSDBA}))  {
    print "Error connecting to DB: $DBI::errstr\n";
        exit(1);
}
$prodh->{RaiseError}=1;
 
my $stbyh;
unless ($stbyh = DBI->connect('dbi:Oracle:'.$stby,
  'sys', 'strongpassword',
  {PrintError=>0, AutoCommit => 0,
  ora_session_mode => ORA_SYSDBA}))  {
    print "Error connecting to DB: $DBI::errstr\n";
        $prodh->disconnect;
        exit(1);
}
$stbyh->{RaiseError}=1;
 
my $sth;
### query prod
$sth = $prodh->prepare( < <EOSQL );
        select SEQUENCE#, BLOCK# from v\$managed_standby
        where process='LGWR'
EOSQL
$sth->execute();
my ($psequence, $pblock) = $sth->fetchrow_array();
$sth->finish();
### query stdby
$sth = $stbyh->prepare( < <EOSQL );
        select SEQUENCE#, BLOCK# from v\$managed_standby
        where process='RFS' and client_process='LGWR'
EOSQL
$sth->execute();
my ($ssequence, $sblock) = $sth->fetchrow_array();
$sth->finish();
 
printf ("PROD   : %10d %10d\n", $psequence, $pblock);
printf ("STANDBY: %10d %10d\n", $ssequence, $sblock);
 
$sth = $stbyh->prepare( < <EOSQL );
        select nvl(sum(blocks),0)
        + $pblock - $sblock as BLOCK_GAP
    from v\$archived_log
        where sequence# between $ssequence and $psequence
EOSQL
$sth->execute();
my ($blockgap) = $sth->fetchrow_array();
$sth->finish();
printf ("%-10d blocks gap\n", $blockgap);
 
$stbyh->disconnect;
$prodh->disconnect;

Any comment is appreciated!

Tips: Bash Prompt and Oracle

Tuesday, December 30th, 2008
export PS1=\u@\h:\w\$

I disagree with default bash prompt. Do you? It’s quote common to work with long paths:

ludovico@host:/u01/app/oracle/product/10.2.0/network/admin$ \
/nooo/this/command/line/is/really/long/and/offcourse -I \
-will -wrap -my -command -line

and, when working on multi-database environments I need to check my environment:

env | grep -i oracle
#or
echo $ORACLE_SID
echo $ORACLE_HOME

I currently use this prompt, instead:

export PS1=$'\\n# [ $LOGNAME@\h:$PWD [\\t] [`ohvers` SID:${ORACLE_SID:-"no sid"}] ]\\n# '
 
# [ ludovico@caldara_2k:/u01/app/oracle/product/10.2.0/db_1/network/admin [23:15:58] [10.2.0 SID:orcl] ]
#

What is `ohvers`?? I defined this function to get the version of oracle from my ORACLE_HOME variable:

ohvers ()
{
echo -n $ORACLE_HOME | sed -n 's/.*\/\([[:digit:].]\+\)\/.*/\1/p'
}

Pros:

  • I have a blank line that separate my prompt from previous output
  • I get the system clock (useful when saving my konsole history. Did I say konsole?)
  • I can see my Oracle Environment before launching dangerous commands
  • I have an empty line to start my endless commands
  • I have a lot of sharps “#” : they are fine against wrong copy&paste operations…

Suggestions?

Oracle RAC Standard Edition to achieve low cost and high performance

Friday, November 28th, 2008

I finished today to create a new production environment based on 2 Linux serverX86_64 and running Oracle RAC 10gR2. (I know, there is 11g right now, but I’m a conservative!)
Wheeew, I just spent a couple of hours applying all the recommended patches!
We choosed 2 nodes with a maximum of 2 multi-core processors each one so we can license Standard Edition instead of Enterprise Edition. 64bits addressing allow us to allocate many gigabytes of SGA. I’m starting with 5Gb but I think we’ll need more. And a set of 6×300Gb 15krpms disks (it can be expanded with more disks and more shelves).
This configuration keeps low the total cost of ownership but achieves best performance.
Due to disks layout, costs and needed usable storage, we had to configure one huge RAID5 on the SAN with multi-path. I decided anyway to create 2 ASM disk groups (ASM is mandatory for Standard Edition RAC), one for the DB, the second one for the recovery area. With spare disks we should have enough availability and even if it’s a RAID5 I saw good write performances (>150M/s).

Welcome new RAC, I hope we’ll feel good together!

It’s time to trouble…

Friday, November 21st, 2008

Sometimes it’s hard to find enough time to write something or even to only THINK about writing something

The following are the projects I have to complete before the deadline of December 17th (at least if I still want to go on vacation…)

  • A totally new Oracle 10gR2 RAC SE on Linux (OCFS2, ASM) including jboss frontends, backups, monitoring, documentation. (Servers are ready today).
  • A Disaster recovery architecture based on Dataguard with scripts based on rsync to do filesystem replication, with failover and failback, including backups, monitoring, documentation. (The server in DR site is reachable via network today).
  • A 17 server infrastructure (among others a RAC 10gR2 on linux) transfer from Milan datacenter to here. It’s planned for december 11th but I have to crosscheck backup and contingency requirements.
  • A 14 server infrastructure (based on Windows and SqlServer) transfer from Milan datacenter to here. To be planned in december.
  • A totally new cold failover cluster based on linux with Oracle DBMS and E-business suite (Servers will be provided soon, I hope!).
  • A new standalone Windows Server 64bit to outstand the 32bit allocation bottleneck for a 500Gb oracle database (Server will be provided not before december 10th).
  • Normally manage the day-by-day work, including replying to e-mails and answering the phone.

AARGH!!