DBA survival BLOG

DBA stuff and Oracle Data Guard

Another problem with “KSV master wait” and “ASM file metadata operation”

Posted on March 24, 2017 by Ludovico

Today my customer tried to run an RMAN duplicate on a cluster. While preparing the auxiliary instance, she noticed that STARTUP NOMOUNT hung indefinitely: nothing in the alert log, nothing in the trace files.

Because the database and the spfile were stored in ASM, I was suspicious of ASM right away…

The ASM trace files had the following entries:

kfgbDiscoverNow: called for group 1/0x9f5bfe53 (ACFS)

*** 2017-03-24 12:42:13.327
2017-03-24 12:42:13.327: [    GPNP]clsgpnp_dbmsGetItem_profile: [at clsgpnp_dbms.c:345] Result: (0) CLSGPNP_OK. (:GPNP00401:)got ASM-Profile.DiscoveryString='/dev/mapper/asm_*,/dev/asm_*'

*** 2017-03-24 12:42:15.386
kfgbTryFn: failed to acquire DG.1.3 for kfgbRefreshNow (of group 1/0x9f5bfe53)

*** 2017-03-24 12:42:18.387
kfgbTryFn: failed to acquire DG.1.3 for kfgbRefreshNow (of group 1/0x9f5bfe53)

*** 2017-03-24 12:42:21.393
kfgbTryFn: failed to acquire DG.1.3 for kfgbRefreshNow (of group 1/0x9f5bfe53)

*** 2017-03-24 12:42:24.398
kfgbTryFn: failed to acquire DG.1.3 for kfgbRefreshNow (of group 1/0x9f5bfe53)

*** 2017-03-24 12:42:27.403
kfgbTryFn: failed to acquire DG.1.3 for kfgbRefreshNow (of group 1/0x9f5bfe53)
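The kfgbTryFn entries repeat roughly every three seconds: ASM keeps retrying DG.1.3 and never gets it. The Python sketch below pulls those failed attempts out of a trace excerpt and measures the gap between them. The function names are hypothetical. The parsing assumes every entry is preceded by a `*** YYYY-MM-DD HH:MI:SS.fff` timestamp line, as in this trace.

```python
from datetime import datetime

# Excerpt of the ASM trace shown above. Assumption: each entry follows a
# "*** YYYY-MM-DD HH:MI:SS.fff" timestamp line, as in this particular trace.
TRACE = """\
kfgbDiscoverNow: called for group 1/0x9f5bfe53 (ACFS)
*** 2017-03-24 12:42:13.327
2017-03-24 12:42:13.327: [    GPNP]clsgpnp_dbmsGetItem_profile: [at clsgpnp_dbms.c:345] Result: (0) CLSGPNP_OK. (:GPNP00401:)got ASM-Profile.DiscoveryString='/dev/mapper/asm_*,/dev/asm_*'
*** 2017-03-24 12:42:15.386
kfgbTryFn: failed to acquire DG.1.3 for kfgbRefreshNow (of group 1/0x9f5bfe53)
*** 2017-03-24 12:42:18.387
kfgbTryFn: failed to acquire DG.1.3 for kfgbRefreshNow (of group 1/0x9f5bfe53)
*** 2017-03-24 12:42:21.393
kfgbTryFn: failed to acquire DG.1.3 for kfgbRefreshNow (of group 1/0x9f5bfe53)
*** 2017-03-24 12:42:24.398
kfgbTryFn: failed to acquire DG.1.3 for kfgbRefreshNow (of group 1/0x9f5bfe53)
*** 2017-03-24 12:42:27.403
kfgbTryFn: failed to acquire DG.1.3 for kfgbRefreshNow (of group 1/0x9f5bfe53)
"""


def failed_acquire_times(trace):
    """Return the timestamp of each 'kfgbTryFn: failed to acquire' entry."""
    times = []
    current = None  # most recent "***" timestamp seen so far
    for line in trace.splitlines():
        if line.startswith("*** "):
            current = datetime.strptime(line[4:].strip(), "%Y-%m-%d %H:%M:%S.%f")
        elif line.startswith("kfgbTryFn: failed to acquire") and current is not None:
            times.append(current)
    return times


def retry_intervals(times):
    """Seconds elapsed between consecutive failed attempts."""
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    attempts = failed_acquire_times(TRACE)
    print(f"{len(attempts)} failed attempts, gaps in seconds: {retry_intervals(attempts)}")
```

A steady retry interval with no successful acquisition suggests the lock is being held by another session rather than failing sporadically. That was the next thing to check.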

The ASM instances had the following non-idle sessions:

SQL>  select inst_id, sid, serial#, status, event, wait_class, wait_time, logon_time , program, machine from gv$session where wait_class!='Idle' order by sid;

INST_ID  SID SERIAL# STATUS  EVENT                        WAIT_CLASS WAIT_TIME LOGON_TIME          PROGRAM                             MACHINE
------- ---- ------- ------- ---------------------------- ---------- --------- ------------------- ----------------------------------- --------
      2   36   41916 ACTIVE  ASM file metadata operation  Other              0 24.03.2017 13:47:28 oracle@clusrv02 (O001)              clusrv02
      2  266   64885 ACTIVE  KSV master wait              Other              0 24.03.2017 13:47:25 oracletorcl01v@clusrv02 (TNS V1-V3) clusrv02
      1  483   63446 ACTIVE  KSV master wait              Other              0 24.03.2017 13:31:14 oracletorcl01v@clusrv01 (TNS V1-V3) clusrv01
      1  497   31202 ACTIVE  ASM file metadata operation  Other              0 24.03.2017 13:39:07 oracletorcl01v@clusrv01 (TNS V1-V3) clusrv01
      3  708     484 ACTIVE  ASM file metadata operation  Other              0 24.03.2017 12:38:56 OMS                                 omssrv01
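Ordering the waiting sessions by LOGON_TIME quickly shows which one doesn't belong: the oldest waiter is the one most likely to have started the pile-up. The Python sketch below uses rows transcribed by hand from the output above. The `oldest_first` helper is hypothetical. It assumes LOGON_TIME is rendered as DD.MM.YYYY HH24:MI:SS, as it is here.

```python
from datetime import datetime

# Non-idle sessions transcribed from the gv$session output above:
# (inst_id, sid, serial#, event, logon_time, program)
SESSIONS = [
    (2, 36, 41916, "ASM file metadata operation", "24.03.2017 13:47:28", "oracle@clusrv02 (O001)"),
    (2, 266, 64885, "KSV master wait", "24.03.2017 13:47:25", "oracletorcl01v@clusrv02 (TNS V1-V3)"),
    (1, 483, 63446, "KSV master wait", "24.03.2017 13:31:14", "oracletorcl01v@clusrv01 (TNS V1-V3)"),
    (1, 497, 31202, "ASM file metadata operation", "24.03.2017 13:39:07", "oracletorcl01v@clusrv01 (TNS V1-V3)"),
    (3, 708, 484, "ASM file metadata operation", "24.03.2017 12:38:56", "OMS"),
]


def oldest_first(rows):
    """Sort sessions by LOGON_TIME (assumed format: DD.MM.YYYY HH24:MI:SS)."""
    return sorted(rows, key=lambda r: datetime.strptime(r[4], "%d.%m.%Y %H:%M:%S"))


if __name__ == "__main__":
    for inst_id, sid, serial, event, logon, program in oldest_first(SESSIONS):
        print(f"{logon}  inst {inst_id}  {sid},{serial}  {event:<28}  {program}")
```

The OMS session sorts to the top, almost an hour older than any other waiter.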

OMS?

Around 12:38:56, another colleague in the office had added a disk to one of the disk groups through Enterprise Manager 12c!

Yet no rebalance operation was running:

SQL> select * from gv$asm_operation;

no rows selected

It’s not the first time I have hit this type of problem. Sadly, depending on the bug involved, it sometimes takes a full restart of ASM or even of the whole cluster.

This time, however, I tried killing only the foreground sessions waiting on “ASM file metadata operation”, starting with the one coming from the OMS.

Surprisingly, after killing that session, everything was fine again:

-- on +ASM3
SQL> alter system kill session '708,484';

System altered.

SQL>

SQL>  select inst_id, sid, serial#, status, event, wait_class, wait_time, logon_time , program, machine from gv$session where wait_class!='Idle' order by sid;

no rows selected

SQL>
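The cleanup order used here was: foreground waiters on “ASM file metadata operation” only, oldest (the OMS session) first. The Python sketch below turns that order into a list of statements to review, not to run blindly; here, killing the first one was enough. The names are hypothetical, and two points are assumptions:

- The background filter is a heuristic that treats a trailing process name in parentheses, such as "(O001)", as a background slave.
- The `'sid,serial#,@inst_id'` form targets a specific RAC instance from any node. If your version does not accept it, run the plain `'sid,serial#'` form on the owning instance, as done above on +ASM3.

```python
import re
from datetime import datetime

# Sessions waiting on "ASM file metadata operation", transcribed from the
# gv$session output: (inst_id, sid, serial#, logon_time, program)
WAITERS = [
    (2, 36, 41916, datetime(2017, 3, 24, 13, 47, 28), "oracle@clusrv02 (O001)"),
    (1, 497, 31202, datetime(2017, 3, 24, 13, 39, 7), "oracletorcl01v@clusrv01 (TNS V1-V3)"),
    (3, 708, 484, datetime(2017, 3, 24, 12, 38, 56), "OMS"),
]

# Heuristic (assumption): background/slave processes end with a 4-character
# process name in parentheses, e.g. "(O001)". Selecting gv$session.TYPE in
# the query would likely be a more reliable foreground/background filter.
BACKGROUND_SUFFIX = re.compile(r"\([A-Z][A-Z0-9]{3}\)$")


def kill_statements(waiters):
    """Return ALTER SYSTEM KILL SESSION commands for foreground waiters, oldest first."""
    foreground = [w for w in waiters if not BACKGROUND_SUFFIX.search(w[4])]
    foreground.sort(key=lambda w: w[3])  # oldest logon first
    # 'sid,serial#,@inst_id' addresses a session on a specific cluster instance.
    return [f"alter system kill session '{sid},{serial},@{inst}';"
            for inst, sid, serial, _, _ in foreground]


if __name__ == "__main__":
    for stmt in kill_statements(WAITERS):
        print(stmt)
```

Kill one session at a time and re-check gv$session in between. In this case, the waiters behind the OMS session cleared as soon as it was gone.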

I never add disks via OMS (I’m a sqlplus guy ;-)), so I wonder what went wrong with it 🙂

—

Ludovico