RRDize everything, chapter 2
Oracle Database Server has the most powerful system catalog that allows to query almost any aspect inside an oracle instance.
You can query many v$ fixed views at regular intervals and populate many RRD files through rrdtool: space usage, wait events. system statistics and so on…
Since release 10.1 Oracle has introduced Automatic Workload Repository, a finer version of old good Statspack.
No matter if you are using AWR or statspack, you can rely on their views to collect data for your RRDs.
If you are administering a new instance and you haven’t collected its statistics so far, you can query (as example) the DBA_HIST_BG_EVENT_SUMMARY view to gather all AWR data about wait events. Historical views could be useful also to collect historical data once a week rather than query the fixed views every few minutes doing the hard work twice (you and AWR).
The whole process of gathering performance data and update rrd files can be resumed into the following steps:
– connect to the database
– query the AWR’s views
– build and execute an rrdtool update command
– check if rrd file exists or create it
– update the rrd file
The less rrdtool update commands you will execute, the better the whole process will perform.
Do it in a language you are comfortable with and that supports easily connection descriptors.
Since I’m very comfortable with php, I did it this way.
This is a very basilar script that works greatly for me with good performances:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 |
#!/usr/bin/php -f < ?php define('WD','/opt/oracle/awr'); $cs = $_SERVER['argv'][1]; $user = 'mymonitoruser'; $pass = 'mystrongpassword'; /* open a new connection */ $ds = oci_connect($user, $pass, $cs) or die ("Cannot connect to Oracle Database ".$cs."\n"); /* setting client nls environment */ $sql = "alter session set nls_timestamp_format='MM/DD/YY HH24:MI'"; $stmt = oci_parse($ds, $sql); oci_execute($stmt); oci_free_statement($stmt); /* create directory that will contain rrds (if not exists) */ if(!file_exists(WD.'/'.$cs)) mkdir(WD.'/'.$cs); if(!file_exists(WD.'/'.$cs.'/wait')) mkdir(WD.'/'.$cs.'/wait'); /* function to create new RRDs */ function createRRD($name, $interval, $cs) { $hb = $interval*5; //heartbeat $cmd="rrdtool create ".WD."/".$cs."/wait/${name}.rrd -s ".$interval." \ -b \"now -3month\" DS:waits:DERIVE:$hb:0:U \ DS:mswaited:DERIVE:$hb:0:U \ RRA:AVERAGE:0.5:1:1440 RRA:AVERAGE:0.5:30:336 \ RRA:AVERAGE:0.5:120:372 RRA:AVERAGE:0.5:720:730 \ RRA:MIN:0.5:1:1440 RRA:MIN:0.5:30:336 \ RRA:MIN:0.5:120:372 RRA:MIN:0.5:720:730 \ RRA:MAX:0.5:1:1440 RRA:MAX:0.5:30:336 \ RRA:MAX:0.5:120:372 RRA:MAX:0.5:720:730 \ RRA:LAST:0.5:1:1440"; //print $cmd."\n"; return passthru($cmd); } /* take the snapshot frequency from dba_hist_wr_control to create the RDD with correct heartbeat value */ $sql = 'select extract(hour from snap_interval)*3600 + extract(minute from snap_interval)*60 as SEED from DBA_HIST_WR_CONTROL'; $stmt = oci_parse($ds, $sql); oci_execute($stmt); $row = oci_fetch_assoc($stmt); $interval = $row['SEED']; unset($row); oci_free_statement($stmt); /* statement definition that will collect all snapshots for a certain wait event with more than a certain amonut of time waited. Gathering ALL EVENTS could be time consuming and useless. I fetch rows ordered by event_name rather then by date because I can update many values into the same rrd with very few rrdupdate commands */ $sql = 'select s.END_INTERVAL_TIME END_INTERVAL_TIME, g.EVENT_NAME, g.WAIT_CLASS, g.TOTAL_WAITS, round(g.TIME_WAITED_MICRO/1000) MS from DBA_HIST_SNAPSHOT s, dba_hist_bg_event_summary g, v$instance i where s.SNAP_ID=g.SNAP_ID and g.wait_class!=\'Idle\' and g.TIME_WAITED_MICRO>100000 and s.instance_number=i.instance_number and s.instance_number=g.instance_number order by 2,1'; /* default prefetch size (148) matches default snapshot retention (24hx7dd) */ $stmt = oci_parse($ds, $sql); oci_set_prefetch($stmt, 148); oci_execute($stmt); $i=0; $oldevent=""; while ($row = oci_fetch_assoc($stmt)) { if ($oldevent != $row['EVENT_NAME']) { //NEW EVENT DETECTED: WILL START A NEW UPDATE CMD if ($i != 0 && !empty($cmd)) { /* not the first occurrence, I bet there's something in my buffer */ passthru($cmd); } $cleanName = preg_replace ("([^[:alnum:]_-])","_",$row['EVENT_NAME']); // if there is no rrd for this event, I create a new one if (!file_exists(WD."/".$cs."/wait/${cleanName}.rrd")) { createRRD($cleanName, $interval, $cs); } /* * I initialize a new update command. This string act as a buffer: I append many * values to be updated so I'll update many values in a single command line: * less forks of rrdtool and less file opens: the whole update process has an * enormous improvement. */ $precmd="rrdtool update ".WD."/".$cs."/wait/${cleanName}.rrd "; $lastcmd="rrdtool info ".WD."/".$cs."/wait/${cleanName}.rrd". "| grep last_update | awk '{print \$NF}'"; $last=trim(`$lastcmd`); printf ("%s - %s - last: %d\n", $row['EVENT_NAME'], $cleanName, $last); $i=0; $cmd=$precmd; $oldevent=$row['EVENT_NAME']; } $time=strtotime($row['END_INTERVAL_TIME']); //print "time: ".$time." last: ".$last."\n"; if ( $time > $last ) { $cmd.=" ".$time.":".$row['TOTAL_WAITS'].":".$row['MS']; $i++; } if ($i >= 40) { // when I reach 40 values per commandline I force // the update: next loop will reinitialize a new commandline. passthru($cmd); $cmd=$precmd; $i=0; } unset($row); } if ($i != 0) { /* one more update pending in my buffer */ passthru($cmd); } oci_free_statement($stmt); oci_close($ds); ?> |
Depending on how many different wait events you have, you’ll have a certain number of rrd files:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
# ls -l total 3864 -rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 Streams_AQ__enqueue_blocked_on_low_memory.rrd -rw-r--r-- 1 ludovico ludovico 165304 May 20 08:18 buffer_busy_waits.rrd -rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 control_file_parallel_write.rrd -rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 control_file_sequential_read.rrd -rw-r--r-- 1 ludovico ludovico 165304 Apr 30 10:12 cursor__pin_S_wait_on_X.rrd -rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 db_file_scattered_read.rrd -rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 db_file_sequential_read.rrd -rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 events_in_waitclass_Other.rrd -rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 latch__cache_buffers_chains.rrd -rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 latch__library_cache.rrd -rw-r--r-- 1 ludovico ludovico 165304 May 11 13:22 latch__library_cache_lock.rrd -rw-r--r-- 1 ludovico ludovico 165304 May 20 08:18 latch__redo_writing.rrd -rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 latch__row_cache_objects.rrd -rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 latch__shared_pool.rrd -rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 library_cache_load_lock.rrd -rw-r--r-- 1 ludovico ludovico 165304 Apr 15 13:17 library_cache_lock.rrd -rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 log_buffer_space.rrd -rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 log_file_parallel_write.rrd -rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 log_file_sequential_read.rrd -rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 log_file_single_write.rrd -rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 log_file_switch_completion.rrd -rw-r--r-- 1 ludovico ludovico 165304 May 11 13:22 log_file_sync.rrd -rw-r--r-- 1 ludovico ludovico 165304 May 25 15:00 os_thread_startup.rrd |
As you can see, they are not so big…
Once you have your data in rrd files, it’s quite simple to script even complex plots with several datasources. Everything depends on the results you want.
This script stack all my wait events for a certain instance: it takes the directory containing all the rrds as first argument and the number of hours we want to be plotted as second argument:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 |
cs=$1 hours=${2:-148} eventlist=`ls $cs/wait/*rrd` colors[1]="#000000" colors[2]="#000055" colors[3]="#0000aa" colors[4]="#0000ff" colors[5]="#550055" colors[6]="#aa00aa" colors[7]="#ff00ff" colors[8]="#550000" colors[9]="#aa0000" colors[10]="#ff0000" colors[11]="#555500" colors[12]="#aaaa00" colors[13]="#ffff00" colors[14]="#005500" colors[15]="#00aa00" colors[16]="#00ff00" colors[17]="#005555" colors[18]="#00aaaa" colors[19]="#00ffff" colors[20]="#555555" colors[21]="#aaaaaa" i=0 for event in $eventlist ; do if [ $i -eq 0 ] ; then end=`rrdtool info $event | grep last_update | awk '{print $NF}'` end=`rrdtool info $cs/wait/control_file_parallel_write.rrd | grep last_update | awk '{print $NF}'` cmd="rrdtool graph - -s end-${hours}hours -e $end -v \"milliseconds waited\" -l 0 -w 640 -h 240 -t \"$cs WAIT PROFILE\"" i=$(($i+1)) fi color=${colors[$i]} echo $color evname=`basename $event | sed -e s/\.rrd\$//` cmd="$cmd DEF:$evname=$event:mswaited:AVERAGE" cmd="$cmd AREA:${evname}${color}:"$evname":STACK" i=$(($i+1)) if [ $i -eq 20 ] ; then i=1 fi done cmd="$cmd |display /dev/input" echo $cmd eval $cmd exit |
The resulting command is very long:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 |
rrdtool graph - -s end-148hours -e 1243252800 \ -v "milliseconds waited" -l 0 -w 640 -h 240 -t "mydb WAIT PROFILE"\ DEF:Streams_AQ__enqueue_blocked_on_low_memory=mydb/wait/Streams_AQ__enqueue_blocked_on_low_memory.rrd:mswaited:AVERAGE \ AREA:Streams_AQ__enqueue_blocked_on_low_memory#000000:Streams_AQ__enqueue_blocked_on_low_memory:STACK\ DEF:buffer_busy_waits=mydb/wait/buffer_busy_waits.rrd:mswaited:AVERAGE \ AREA:buffer_busy_waits#000055:buffer_busy_waits:STACK\ DEF:control_file_parallel_write=mydb/wait/control_file_parallel_write.rrd:mswaited:AVERAGE \ AREA:control_file_parallel_write#0000aa:control_file_parallel_write:STACK\ DEF:control_file_sequential_read=mydb/wait/control_file_sequential_read.rrd:mswaited:AVERAGE \ AREA:control_file_sequential_read#0000ff:control_file_sequential_read:STACK\ DEF:cursor__pin_S_wait_on_X=mydb/wait/cursor__pin_S_wait_on_X.rrd:mswaited:AVERAGE \ AREA:cursor__pin_S_wait_on_X#550055:cursor__pin_S_wait_on_X:STACK\ DEF:db_file_scattered_read=mydb/wait/db_file_scattered_read.rrd:mswaited:AVERAGE \ AREA:db_file_scattered_read#aa00aa:db_file_scattered_read:STACK\ DEF:db_file_sequential_read=mydb/wait/db_file_sequential_read.rrd:mswaited:AVERAGE \ AREA:db_file_sequential_read#ff00ff:db_file_sequential_read:STACK\ DEF:events_in_waitclass_Other=mydb/wait/events_in_waitclass_Other.rrd:mswaited:AVERAGE \ AREA:events_in_waitclass_Other#550000:events_in_waitclass_Other:STACK\ DEF:latch__cache_buffers_chains=mydb/wait/latch__cache_buffers_chains.rrd:mswaited:AVERAGE \ AREA:latch__cache_buffers_chains#aa0000:latch__cache_buffers_chains:STACK\ DEF:latch__library_cache=mydb/wait/latch__library_cache.rrd:mswaited:AVERAGE \ AREA:latch__library_cache#ff0000:latch__library_cache:STACK\ DEF:latch__library_cache_lock=mydb/wait/latch__library_cache_lock.rrd:mswaited:AVERAGE \ AREA:latch__library_cache_lock#555500:latch__library_cache_lock:STACK\ DEF:latch__redo_writing=mydb/wait/latch__redo_writing.rrd:mswaited:AVERAGE \ AREA:latch__redo_writing#aaaa00:latch__redo_writing:STACK\ DEF:latch__row_cache_objects=mydb/wait/latch__row_cache_objects.rrd:mswaited:AVERAGE \ AREA:latch__row_cache_objects#ffff00:latch__row_cache_objects:STACK\ DEF:latch__shared_pool=mydb/wait/latch__shared_pool.rrd:mswaited:AVERAGE \ AREA:latch__shared_pool#005500:latch__shared_pool:STACK\ DEF:library_cache_load_lock=mydb/wait/library_cache_load_lock.rrd:mswaited:AVERAGE \ AREA:library_cache_load_lock#00aa00:library_cache_load_lock:STACK\ DEF:library_cache_lock=mydb/wait/library_cache_lock.rrd:mswaited:AVERAGE \ AREA:library_cache_lock#00ff00:library_cache_lock:STACK\ DEF:log_buffer_space=mydb/wait/log_buffer_space.rrd:mswaited:AVERAGE \ AREA:log_buffer_space#005555:log_buffer_space:STACK\ DEF:log_file_parallel_write=mydb/wait/log_file_parallel_write.rrd:mswaited:AVERAGE \ AREA:log_file_parallel_write#00aaaa:log_file_parallel_write:STACK\ DEF:log_file_sequential_read=mydb/wait/log_file_sequential_read.rrd:mswaited:AVERAGE \ AREA:log_file_sequential_read#00ffff:log_file_sequential_read:STACK\ DEF:log_file_single_write=mydb/wait/log_file_single_write.rrd:mswaited:AVERAGE \ AREA:log_file_single_write#000000:log_file_single_write:STACK\ DEF:log_file_switch_completion=mydb/wait/log_file_switch_completion.rrd:mswaited:AVERAGE \ AREA:log_file_switch_completion#000055:log_file_switch_completion:STACK\ DEF:log_file_sync=mydb/wait/log_file_sync.rrd:mswaited:AVERAGE \ AREA:log_file_sync#0000aa:log_file_sync:STACK\ DEF:os_thread_startup=mydb/wait/os_thread_startup.rrd:mswaited:AVERAGE \ AREA:os_thread_startup#0000ff:os_thread_startup:STACK |display /dev/input |
OHHHHHHHHHHHH COOOOL!!!
😉
Any comment is appreciated! thanks