JBoss Portal and MySQL scalability: What The…???

I found several queries running on a MySQL 5.0 database like this one (a reconstruction of the pattern: every name except JBP_OBJECT_NODE is illustrative):
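select o.*
from   JBP_OBJECT_NODE o
where  o.PK_ in
       (select n.NODE_KEY           -- MySQL 5.0 turns this into a
        from   JBP_PORTAL_NODE n    -- DEPENDENT SUBQUERY, re-executed
        where  n.TYPE_ = 1);        -- for every row of the outer table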

This query is related to JBoss Portal and does a full scan on table JBP_OBJECT_NODE.

It performs badly (>0.8 sec) even with just a few records:

mysql> select count(*) from JBP_OBJECT_NODE;
+----------+
| count(*) |
+----------+
|    33461 |
+----------+

If I rewrite the query using an inner join (à la Oracle, please forgive me) instead of the subquery, I get an index scan (again with my illustrative names):
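select o.*
from   JBP_OBJECT_NODE o
       inner join JBP_PORTAL_NODE n on n.NODE_KEY = o.PK_
where  n.TYPE_ = 1;

EXPLAIN makes the difference obvious: the first version shows a DEPENDENT SUBQUERY and a full scan of JBP_OBJECT_NODE, the second a plain index lookup on its primary key.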

With 30k records the execution time drops from 0.8 seconds to 0.01 seconds…
That’s NOT all! I found this open bug:

https://jira.jboss.org/jira/browse/JBPORTAL-2040

With many registered users, the JBoss Portal admin console takes over a minute to show a single page…

I don’t like portals…

Oracle RAC Standard Edition to achieve low cost and high performance

Today I finished setting up a new production environment based on two x86_64 Linux servers running Oracle RAC 10gR2. (I know, 11g is out there, but I'm a conservative!)
Wheeew, I just spent a couple of hours applying all the recommended patches!
We chose two nodes with a maximum of two multi-core processors each, so we can license Standard Edition instead of Enterprise Edition. 64-bit addressing lets us allocate many gigabytes of SGA: I'm starting with 5 GB, but I think we'll need more. Add a set of 6x300 GB 15k rpm disks (expandable with more disks and more shelves).
This configuration keeps the total cost of ownership low while still achieving very good performance.
Due to the disk layout, the costs, and the usable storage we needed, we had to configure one huge RAID 5 on the SAN, accessed via multipath. I decided anyway to create two ASM disk groups (ASM is mandatory for Standard Edition RAC): one for the database, the other for the recovery area. With spare disks we should have enough availability, and even though it's RAID 5 I measured good write performance (>150 MB/s).
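Creating the disk groups is then trivial. A minimal sketch from the ASM instance, with hypothetical device paths, using external redundancy because the SAN RAID 5 already provides the protection:

-- run on the ASM instance; the device paths are hypothetical
create diskgroup DATA external redundancy
  disk '/dev/mapper/mpath1p1';

create diskgroup FRA external redundancy
  disk '/dev/mapper/mpath2p1';

Then db_create_file_dest points to +DATA and db_recovery_file_dest to +FRA.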

Welcome new RAC, I hope we’ll feel good together!

Time for trouble…

Sometimes it's hard to find enough time to write something, or even just to THINK about writing something.

These are the projects I have to complete before the December 17th deadline (at least if I still want to go on vacation…):

  • A totally new Oracle 10gR2 RAC SE on Linux (OCFS2, ASM), including JBoss front ends, backups, monitoring, and documentation. (The servers are ready today.)
  • A disaster recovery architecture based on Data Guard, plus rsync-based scripts for filesystem replication, with failover and failback, including backups, monitoring, and documentation. (The server in the DR site is reachable via the network today.)
  • The transfer of a 17-server infrastructure (among others, a 10gR2 RAC on Linux) from the Milan datacenter to here. It's planned for December 11th, but I have to cross-check the backup and contingency requirements.
  • The transfer of a 14-server infrastructure (based on Windows and SQL Server) from the Milan datacenter to here. To be planned in December.
  • A totally new cold failover cluster based on Linux, with the Oracle DBMS and E-Business Suite. (The servers will be provided soon, I hope!)
  • A new standalone 64-bit Windows server to get past the 32-bit memory addressing bottleneck of a 500 GB Oracle database. (The server will be provided no earlier than December 10th.)
  • The normal day-by-day work, including replying to e-mails and answering the phone.

AARGH!!

System triggers, stats$user_log and side effects

Sometimes people get advice from the Internet: from Metalink or from well-known consulting sites.
When they need a fix or a feature, they tend to trust that advice.

Last week a colleague told me about a 10g RAC database with performance problems and, since I never just sit back in my chair, I probed both AWR and ADDM. I immediately recognized heavy enqueues and physical reads on a segment named STATS$USER_LOG. “Strange”, I said, “I don't remember this name among either the PERFSTAT or the catalog segments”.
Then I searched the Internet and Metalink, and found the same thing on BOTH metalink.oracle.com and www.dba-oracle.com: a trick to trace logons and logoffs into a table using system triggers.
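The target table is, give or take a column, something like this (my reconstruction, so don't take the exact definition literally):

create table stats$user_log (
  user_id         varchar2(30),
  session_id      number(8),
  host            varchar2(30),
  last_program    varchar2(48),
  last_action     varchar2(32),
  last_module     varchar2(32),
  logon_day       date,
  logon_time      varchar2(10),
  logoff_day      date,
  logoff_time     varchar2(10),
  elapsed_minutes number(8)
);
-- note: no primary key, no indexes, no constraints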

Look at this code:

create or replace trigger logon_audit_trigger
AFTER LOGON ON DATABASE
BEGIN
  insert into stats$user_log values(
    user,
    sys_context('USERENV','SESSIONID'),
    sys_context('USERENV','HOST'),
    sysdate,
    to_char(sysdate, 'hh24:mi:ss'),
    [...]
  );
  COMMIT;
END;
/

Cool, every single access is kept into stats$user_log.

Let’s see the logoff trigger:

create or replace trigger logoff_audit_trigger
BEFORE LOGOFF ON DATABASE
BEGIN
  -- ***************************************************
  -- Update the last action accessed
  -- ***************************************************
  update stats$user_log [...]
  -- ***************************************************
  -- Update the last program accessed
  -- ***************************************************
  update stats$user_log [...]
  -- ***************************************************
  [ ... many, many updates ...]
  -- ***************************************************
  update stats$user_log [...]
  -- ***************************************************
  -- Compute the elapsed minutes
  -- ***************************************************
  update stats$user_log set elapsed_minutes =
    round((logoff_day - logon_day)*1440)
  where sys_context('USERENV','SESSIONID') = session_id;
  COMMIT;
END;
/

That's all. It inserts one row at every logon and runs MANY updates at every logoff.
The only link between the inserted row and the updated rows is the session_id.
And the table has neither indexes nor constraints.

What’s the matter?

What happens if we have many logons?

SQL> select num_rows from dba_tables where table_name='STATS$USER_LOG';

  NUM_ROWS
----------
   3053931

What happens if the execution plan does a full scan?

SQL> explain plan for update stats$user_log […]

Explained.

SQL> @?/rdbms/admin/utlxpls
---------------------------------------------
| Id  | Operation          | Name           |
---------------------------------------------
|   0 | UPDATE STATEMENT   |                |
|   1 |  UPDATE            | STATS$USER_LOG |
|   2 |   TABLE ACCESS FULL| STATS$USER_LOG |
---------------------------------------------

And how much data does each of those full scans have to read?

SQL> select bytes/1024/1024 Mb from dba_segments where segment_name='STATS$USER_LOG';

        MB
----------
       237

237 MB to scan, several times, at every single logoff: the database performance will degrade constantly, slowly but inexorably…
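If you really want to keep these triggers, the least you can do is index the only correlation column and purge the table periodically. A minimal sketch (the index name is mine):

-- turns each logoff update into an index lookup instead of a full scan
create index stats$user_log_sid_idx on stats$user_log (session_id);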
Remember: never blindly trust a solution if it involves a change at the system level.

Plot Oracle historical statistics within SQL*Plus

More often than not I'm asked to investigate “what happened yesterday when the performance problems appeared”.

Sometimes I have the Enterprise Manager DB Console licensed, sometimes not. Sometimes I have direct SQL*Net access to the database, which I can use to produce custom reports with my self-developed LAMP application. But it may happen that only an ssh access to the db server is granted.

That’s why I started to develop some little scripts to plot the trends of database timed statistics.

Let’s see this one:

SQL> @sysstat.sql
Enter a sysstat to search for: physical reads


   STAT_ID STAT_NAME
---------- ------------------------------------------
2263124246 physical reads
4171507801 physical reads cache
 297908839 physical reads cache prefetch
2589616721 physical reads direct
2564935310 physical reads direct (lob)
2663793346 physical reads direct temporary tablespace
 473165409 physical reads for flashback new
3102888545 physical reads prefetch warmup
 531193461 physical reads retry corrupt

9 rows selected.

Enter the desired stat_id: 2263124246
Enter the start date (YYYYMMDD) [defaults today] : 20080922
Enter the end date (YYYYMMDD) [defaults today] : 20080922

STAT_NAME        START    END
---------------- -------- --------
physical reads   20080922 20080922

BEGIN_INTERVAL_TIME           VALORE PLOTTED_VALUE
------------------------- ---------- -------------------------
22-SEP-08 12.00.12.122 AM          0
22-SEP-08 01.00.28.253 AM     120092
22-SEP-08 02.00.05.039 AM      35780
22-SEP-08 03.00.55.595 AM       4792
22-SEP-08 04.00.43.725 AM       4905
22-SEP-08 05.00.31.855 AM       7300
22-SEP-08 06.00.17.017 AM     234596
22-SEP-08 07.00.08.132 AM      24651
22-SEP-08 08.00.50.936 AM     481884
22-SEP-08 09.00.33.488 AM     130201
22-SEP-08 10.00.03.805 AM    1300306 **
22-SEP-08 11.00.07.764 AM     491857
22-SEP-08 12.00.31.548 PM     304702
22-SEP-08 01.01.04.880 PM    1023664 *
22-SEP-08 02.00.17.822 PM    8588180 ************
22-SEP-08 03.00.36.969 PM    2201615 ***
22-SEP-08 04.01.01.397 PM   17237098 *************************
22-SEP-08 05.00.39.262 PM    1606300 **
22-SEP-08 06.00.03.829 PM     451568
22-SEP-08 07.00.31.461 PM     137684
22-SEP-08 08.00.05.966 PM     203803
22-SEP-08 09.00.24.829 PM     536394
22-SEP-08 10.00.12.945 PM   10209783 **************
22-SEP-08 11.00.35.123 PM    6151663 *********

24 rows selected.

Oh! At 4.00 PM we had a lot of physical reads. Nice.

This is the idea behind the code, reduced to a minimal single-instance sketch (the prompt and variable names are illustrative, and DBA_HIST_SYSSTAT requires the Diagnostics Pack license):
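set verify off lines 120 pages 50
col begin_interval_time format a25
col plotted_value format a25

accept stat_search prompt 'Enter a sysstat to search for: '
select stat_id, name stat_name
from   v$statname
where  name like '%&stat_search%';

accept stat_id    prompt 'Enter the desired stat_id: '
accept start_date prompt 'Enter the start date (YYYYMMDD) [defaults today] : '
accept end_date   prompt 'Enter the end date (YYYYMMDD) [defaults today] : '

with deltas as (
  select sn.begin_interval_time,
         st.value - lag(st.value) over (order by sn.snap_id) valore
  from   dba_hist_sysstat st, dba_hist_snapshot sn
  where  sn.snap_id         = st.snap_id
  and    sn.dbid            = st.dbid
  and    sn.instance_number = st.instance_number
  and    st.stat_id         = &stat_id
  and    sn.begin_interval_time >=
           to_date(nvl('&start_date',to_char(sysdate,'yyyymmdd')),'yyyymmdd')
  and    sn.begin_interval_time <
           to_date(nvl('&end_date',to_char(sysdate,'yyyymmdd')),'yyyymmdd') + 1
)
select begin_interval_time, valore,
       -- one star per 1/25th of the biggest delta in the interval
       rpad('*', round(valore / nullif(max(valore) over (), 0) * 25), '*') plotted_value
from   deltas
order  by begin_interval_time;

The bar is scaled to the biggest delta of the interval, which is why the quiet hours plot as blank lines.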

Ciao

Ludovico