Oracle Instances and real memory consumption on Linux and Solaris

There’s a way to know the REAL memory usage by Oracle Instance, including all connecting processes and using the shell rather than a connection to oracle?

The short answer is “I think so” 🙂

Summing up RSS column from ps output, is not reliable because Linux uses a copy-on-write on process forks and also doesn’t take into account correctly the shared memory and other shared allocations.

I’ve come across this post on Pythian’s Blog from Marc Billette.

While it seems good I’ve had discording results depending on platform and release.

Instead, I’ve tried to create a shell snippet that always uses pmap but works differently and SEEMS to work correctly on Linux ans Solaris.

Basically, using the pmap script I get a lot of information about the different memory areas allocated to the process:

 

Initially I’ve tried to decode correctly the different kinds of memory the same way other scripts I’ve found online do:

but finally the ADDRESS is the same from different processes when the memory area is shared, so my script now just get a unique line for each address and sums up the memory size (not the rss one!):

This should give the total virtual memory allocated by the different Oracle instances.

The results I get are plausible both on Linux and Solaris.

Example:

If you find any error let me know and I’ll fix the script!

Ludovico

A good news to start November in a good mood

I’ve just noticed that my room mate at #OOW13 has been recognised as ACE Director.

Even if he was already a world-known performance specialist, this announcement makes me one more time proud to know him. Well deserved Chris! 🙂

Generating graphs massively from Windows Performance Counters logs

Windows Performance Monitor is an invaluable tool when you don’t have external enterprise monitoring tools and you need to face performance problems, whether you have a web/application server, a mail server or a database server.

But what I don’t personally like of it is what you get in terms of graphing. If you schedule and collect a big amount of performance metrics you will likely get lost in adding/removing such metrics from the graphical interface.

What I’ve done long time ago (and I’ve done again recently after my old laptop has been stolen 🙁 ) is to prepare a PHP script that parse the resulting CSV file and generate automatically one graph for each metric that could be found.

Unfortunately, most of Windows Sysadmin between you will disagree that I’ve done this using a Linux Box. But I guess you can use my script if you install php inside cygwin. The other tool you need, is rrdtool, again I use it massively to resolve my graphing needs.

How to collect your data

Basically you need to create any Data Collector within the Performance Monitor that generates a log file. You can specify directly a CSV file (Log format: Comma separated) or generate a BLG file and convert it later (Log format: Binary). System dumps are not used, so if you use the standard Performace template, you can delete it from your collection.

Remember that the more counters you take, the more the graph generation will take. The script does not run in parallel, so it will use only one core. Generally:

Where (Speed factor) is depending on both the CPU speed and the disk speed because of the huge number of syncs required to update several thousands of files. I’ve tried to reduce the number of rrdupdates by queuing several update values in a single command line and I’ve noticed an important increase of performances, but I know it’s not enough.

Converting a BLG (binary) log into a CSV log

Just use the relog tool:

 Generating the graphs

Transfer the CSV on the box where you have the php and rrdtool configured, then run:

 

generated_graphs

Now it’s done! 

The script generate a folder with the name of the server (LUDO in my example) and a subfolder for each class of counters (as you see in Performance Monitor).

Inside each folder you will have a PNG (and an rrd) for each metric.

 

generated_graph_cpu

 

Important: The RRD are generated with a single round-robin archive with a size equal to the number of samples. If you want to have the rrd to store your historical data you’ll need to modify the script. Also, the size of the graph will be the same as the number of samples (for best reading), but limited to 1000 to avoid huge images.

Future Improvements

Would be nice to have a prepared set of graphs for standard graphs with multiple metrics (e.g. CPU user, system and idle together) and additional lines like regressions…

Download the script: process_l_php.txt and rename it with a .php extension.

Hope you’ll find it useful!

Cheers

Ludo

Oracle capacity planning with RRDTOOL

RRDize everything, chapter 2

Oracle Database Server has the most powerful system catalog that allows to query almost any aspect inside an oracle instance.
You can query many v$ fixed views at regular intervals and populate many RRD files through rrdtool: space usage, wait events. system statistics and so on…

Since release 10.1 Oracle has introduced Automatic Workload Repository, a finer version of old good Statspack.
No matter if you are using AWR or statspack, you can rely on their views to collect data for your RRDs.

If you are administering a new instance and you haven’t collected its statistics so far, you can query (as example) the DBA_HIST_BG_EVENT_SUMMARY view to gather all AWR data about wait events. Historical views could be useful also to collect historical data once a week rather than query the fixed views every few minutes doing the hard work twice (you and AWR).

The whole process of gathering performance data and update rrd files can be resumed into the following steps:

– connect to the database
– query the AWR’s views
– build and execute an rrdtool update command
– check if rrd file exists or create it
– update the rrd file

The less rrdtool update commands you will execute, the better the whole process will perform.
Do it in a language you are comfortable with and that supports easily connection descriptors.

Since I’m very comfortable with php, I did it this way.

This is a very basilar script that works greatly for me with good performances:

Depending on how many different wait events you have, you’ll have a certain number of rrd files:

As you can see, they are not so big…

Once you have your data in rrd files, it’s quite simple to script even complex plots with several datasources. Everything depends on the results you want.
This script stack all my wait events for a certain instance: it takes the directory containing all the rrds as first argument and the number of hours we want to be plotted as second argument:

The resulting command is very long:

This is the resulting graph:
Graph plotted with rrdtool displaying Oracle instance Wait Events

OHHHHHHHHHHHH COOOOL!!!
😉

Any comment is appreciated! thanks

How to collect Oracle Application Server performance data with DMS and RRDtool

RRDize everything, chapter 1

If you are managing some Application Server deployments you should have wondered how to check and collect performance data.
As stated in documentation, you can gather performance metrics with the dmstool utility.
AFAIK, this can be done from 9.0.2 release upwards, but i’m concerned DMS will not work on Weblogic.

Mainly, you should have an external server that acts as collector (it could be a server in the Oracle AS farm as well): copy the dms.jar library from an Oracle AS installation to your collector and use it as you would use dmstool:

There are three basilar methods to get data:

Get all metrics at once:

Get only the interesting metrics:

Get metrics included into specific DMS tables:

What youraddress:// is, it depends on the component you are trying to connect:

If you are trying to connect to the OHS (Apache), be careful to allow remote access from the collector by editing the dms.conf file.

Now that you can query dms data, you should store it somewhere.
Personally, I did a first attempt with dmstool -dump format=xml. I wrote a parser in PHP with SimpleXML extension and I did a lot of inserts into a MySQL database. After a few months the whole data collected from tens of servers was too much to be mantained…
To avoid the maintenance of a DWH-grade database I investigated and found RRDTool. Now I’m asking how could I live without it!

I then wrote a parser in awk that parse the output of the dms.jar call and invoke an rrdtool update command.
I always use dms.jar -table command. The output has always the same format:

So I written an awk file that works for me.
use it this way:

And this is the code for update_metric_rrd:

Once you have all your rrd files populated, it’s easy to script automatic reporting. You would probably want a graph with the request count served by your Apache cluster, along with its linear regression:

This is the result:
OHS request completed
OHHHHHHHHHHHH!!!! COOL!!!!

That’s all for DMS capacity planning. Stay tuned, more about rrdtool is coming!

JBoss Portal and MySQL scalability: What The…???

I found several queries running on a MySQL 5.0 database like this one:

This query is related to JBoss Portal and does a full scan on table JBP_OBJECT_NODE.

It has bad performances (>0.8 sec) with just a few records:

mysql> select count(*) from JBP_OBJECT_NODE;
+———-+
| count(*) |
+———-+
|    33461 |
+———-+

If I rewrite the query using an inner join (à la Oracle, please forgive me) instead of a subquery I get an index scan:

With 30k records the execution time falls down from 0.8 secs to 0.01 secs…
That’s NOT all! I found this open bug:

https://jira.jboss.org/jira/browse/JBPORTAL-2040

With many users registered in, the JBoss Portal Admin console tooks over a minute to show a single page…

I don’t like portals…

Oracle RAC Standard Edition to achieve low cost and high performance

I finished today to create a new production environment based on 2 Linux serverX86_64 and running Oracle RAC 10gR2. (I know, there is 11g right now, but I’m a conservative!)
Wheeew, I just spent a couple of hours applying all the recommended patches!
We choosed 2 nodes with a maximum of 2 multi-core processors each one so we can license Standard Edition instead of Enterprise Edition. 64bits addressing allow us to allocate many gigabytes of SGA. I’m starting with 5Gb but I think we’ll need more. And a set of 6x300Gb 15krpms disks (it can be expanded with more disks and more shelves).
This configuration keeps low the total cost of ownership but achieves best performance.
Due to disks layout, costs and needed usable storage, we had to configure one huge RAID5 on the SAN with multi-path. I decided anyway to create 2 ASM disk groups (ASM is mandatory for Standard Edition RAC), one for the DB, the second one for the recovery area. With spare disks we should have enough availability and even if it’s a RAID5 I saw good write performances (>150M/s).

Welcome new RAC, I hope we’ll feel good together!

System triggers, stats$user_log and side effects

Sometimes people get advice from internet: both Metalink or well-known consulting sites.
If people need a fix or a feature, they use to trust advices.

Last week I heard a collegue about a 10g RAC database with performance problems and, since I never lay on my chair, I probed both AWR and ADDM . I suddenly recognized heavy enqueues and physical reads
over a segment named STATS$USER_LOG. “Strange”, I said, “I cannot remember this name in neither perfstat or catalog segments”.
Then I searched the Internet and the Metalink and found the same thing in BOTH metalink.oracle.com and www.dba-oracle.com: a trick to trace logon and logoffs into a table using system triggers.

Look at this code:

create or replace trigger
logon_audit_trigger
AFTER LOGON ON DATABASE
BEGIN
insert into stats$user_log values(
user,
sys_context('USERENV','SESSIONID'),
sys_context('USERENV','HOST'),
sysdate,
to_char(sysdate, 'hh24:mi:ss'),
[...]
);
COMMIT;
END;
/

Cool, every single access is kept into stats$user_log.

Let’s see the logoff trigger:

create or replace trigger
logoff_audit_trigger
BEFORE LOGOFF ON DATABASE
BEGIN
-- ***************************************************
-- Update the last action accessed
-- ***************************************************
update stats$user_log [...]
--***************************************************
-- Update the last program accessed
-- ***************************************************
update stats$user_log [...]
-- ***************************************************
[ ... many, many updates ...]
-- ***************************************************
update stats$user_log [...]
-- ***************************************************
-- Compute the elapsed minutes
-- ***************************************************
update stats$user_log set elapsed_minutes =
round((logoff_day - logon_day)*1440)
where
sys_context('USERENV','SESSIONID') = session_id;
COMMIT;
END;
/

That’s all. It inserts a row when someone logons. It updates MANY rows when someone logoffs.
There is no match between the record inserted and the records updated (but the session_id).
Neither indexes or constraints.

What’s the matter?

What happens if we have many logons?

SQL> select num_rows from dba_tables where table_name='STATS$USER_LOG';

NUM_ROWS
———-
3053931

What happens if the execution plan does a full scan?

SQL> explain plan for update stats$user_log […]

Explained.

SQL> @?/rdbms/admin/utlxpls
---------------------------------------------
| Id | Operation | Name |
---------------------------------------------
| 0 | UPDATE STATEMENT | |
| 1 | UPDATE | STATS$USER_LOG |
| 2 | TABLE ACCESS FULL| STATS$USER_LOG |
---------------------------------------------

How many reads should it take?

SQL> select bytes/1024/1024 Mb from dba_Segments where segment_name='STATS$USER_LOG';

MB
———-
237

The database performace will decrease constantly and very slowly…..
Remember: never trust a solution if it involves a change on the system.