Today my customer tried to run a duplicate on a cluster. When preparing the auxiliary instance, she noticed that the startup nomount was hanging forever: nothing in the alert log, nothing in the trace files.
Because the database and the spfile were stored inside ASM, I was quite suspicious of ASM itself…
The ASM trace files had the following entries:
    kfgbDiscoverNow: called for group 1/0x9f5bfe53 (ACFS)

    *** 2017-03-24 12:42:13.327
    2017-03-24 12:42:13.327: [ GPNP]clsgpnp_dbmsGetItem_profile: [at clsgpnp_dbms.c:345] Result: (0) CLSGPNP_OK. (:GPNP00401:)got ASM-Profile.DiscoveryString='/dev/mapper/asm_*,/dev/asm_*'

    *** 2017-03-24 12:42:15.386
    kfgbTryFn: failed to acquire DG.1.3 for kfgbRefreshNow (of group 1/0x9f5bfe53)

    *** 2017-03-24 12:42:18.387
    kfgbTryFn: failed to acquire DG.1.3 for kfgbRefreshNow (of group 1/0x9f5bfe53)

    *** 2017-03-24 12:42:21.393
    kfgbTryFn: failed to acquire DG.1.3 for kfgbRefreshNow (of group 1/0x9f5bfe53)

    *** 2017-03-24 12:42:24.398
    kfgbTryFn: failed to acquire DG.1.3 for kfgbRefreshNow (of group 1/0x9f5bfe53)

    *** 2017-03-24 12:42:27.403
    kfgbTryFn: failed to acquire DG.1.3 for kfgbRefreshNow (of group 1/0x9f5bfe53)
The ASM instance had the following sessions waiting:
    SQL> select inst_id, sid, serial#, status, event, wait_class, wait_time, logon_time, program, machine from gv$session where wait_class!='Idle' order by sid;

    INST_ID  SID SERIAL# STATUS  EVENT                        WAIT_CLASS WAIT_TIME LOGON_TIME          PROGRAM                             MACHINE
    ------- ---- ------- ------- ---------------------------- ---------- --------- ------------------- ----------------------------------- --------
          2   36   41916 ACTIVE  ASM file metadata operation  Other              0 24.03.2017 13:47:28 oracle@clusrv02 (O001)              clusrv02
          2  266   64885 ACTIVE  KSV master wait              Other              0 24.03.2017 13:47:25 oracletorcl01v@clusrv02 (TNS V1-V3) clusrv02
          1  483   63446 ACTIVE  KSV master wait              Other              0 24.03.2017 13:31:14 oracletorcl01v@clusrv01 (TNS V1-V3) clusrv01
          1  497   31202 ACTIVE  ASM file metadata operation  Other              0 24.03.2017 13:39:07 oracletorcl01v@clusrv01 (TNS V1-V3) clusrv01
          3  708     484 ACTIVE  ASM file metadata operation  Other              0 24.03.2017 12:38:56 OMS                                 omssrv01
OMS?
Around 12:38:56, another colleague in the office added a disk to one of the disk groups, through Enterprise Manager 12c!
But there were no rebalance operations:
    SQL> select * from gv$asm_operation;

    no rows selected
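With no rebalance running, one more thing worth a look is the state of the disks themselves. The query below is only a sketch, not something that was run during this incident. It assumes that `v$asm_disk_stat` is usable here: unlike `v$asm_disk`, it does not trigger a new disk discovery, which matters when discovery itself appears to be stuck. It may not show a disk that never made it into a mounted disk group.

```sql
-- Sketch only: list the disks ASM already knows about, without forcing
-- a fresh discovery (v$asm_disk would perform one).
select group_number,
       disk_number,
       name,
       path,
       header_status,
       mount_status,
       mode_status,
       state
from   v$asm_disk_stat
order  by group_number, disk_number;
```

A disk with an unexpected `state` or `mode_status` in the disk group that was being extended would be a hint that the add operation never completed.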
It’s not the first time that I have hit this type of problem. Sadly, it sometimes requires a full restart of the cluster or of ASM (because of various bugs).
This time, however, I tried killing only the foreground sessions waiting on “ASM file metadata operation”, starting with the one coming from the OMS.
Surprisingly, after killing that session, everything was fine again:
    -- on +ASM3
    SQL> alter system kill session '708,484';

    System altered.

    SQL>
    SQL> select inst_id, sid, serial#, status, event, wait_class, wait_time, logon_time, program, machine from gv$session where wait_class!='Idle' order by sid;

    no rows selected

    SQL>
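To pick the candidates without scanning the output by eye, something like the following could be used. It is a minimal sketch under a few assumptions: that filtering on `type = 'USER'` is enough to leave background and slave processes (such as the O001 one above) alone, and that the oldest logon is the most likely culprit, as it was here. The `kill_stmt` column alias is just an illustrative name; the statements are only generated, not executed.

```sql
-- Sketch only: list foreground sessions stuck on the ASM metadata wait,
-- oldest first, together with the kill statement to run on that instance.
select inst_id,
       logon_time,
       program,
       machine,
       'alter system kill session ''' || sid || ',' || serial# || ''';' as kill_stmt
from   gv$session
where  event = 'ASM file metadata operation'
and    type  = 'USER'
order  by logon_time;
```

Each generated statement is meant to be reviewed and then run on the ASM instance shown in `inst_id`, one session at a time, checking the waits again after each kill.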
I never add disks via OMS (I’m a sqlplus guy ;-)), so I wonder what went wrong with it 🙂
—
Ludovico