DBA survival BLOG

DBA stuff and Oracle Data Guard

Changing FPP temporary directory (/tmp in noexec and other issues)

Posted on July 13, 2021 by Ludovico

When using FPP, you might experience the following error (PRVF-7546):

$ rhpctl add workingcopy -workingcopy WC_db_19_11_FPPC -image db_19_11 -path /u01/app/oracle/product/WC_db_19_11_FPPC -client fppc -oraclebase /u01/app/oracle
fpps01: Audit ID: 121
PRGO-1260 : Cluster Verification checks for database home provisioning  failed for the specified working copy WC_db_19_11_FPPC.
PRCR-1178 : Execution of command failed on one or more nodes
 
PRVF-7546 : The work directory "/tmp/CVU_19.0.0.0.0_oracle/" cannot be used on node "fppc02"

This is often caused by the /tmp filesystem being mounted with the “noexec” option:

$ mount | grep /tmp
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noexec)
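
To run the same check from a script, the mount options can be parsed instead of read by eye. This is only a minimal sketch: the function name has_noexec is my own, and it assumes the "device on dir type fs (options)" layout printed by mount on Linux.

```shell
#!/usr/bin/env bash
# Sketch: report whether a mount point carries the noexec option.
# Reads `mount`-style lines ("<dev> on <dir> type <fs> (<opts>)") from stdin.
# has_noexec is a hypothetical helper name, not part of any Oracle tool.
has_noexec() {
  local mount_point="$1"
  awk -v mp="$mount_point" '
    $2 == "on" && $3 == mp {
      opts = $NF                      # last field, e.g. "(rw,nosuid,nodev,noexec)"
      gsub(/[()]/, "", opts)          # drop the surrounding parentheses
      n = split(opts, o, ",")
      for (i = 1; i <= n; i++) if (o[i] == "noexec") found = 1
    }
    END { exit(found ? 0 : 1) }
  '
}

# Sample line taken from the output above.
sample='tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noexec)'
if printf '%s\n' "$sample" | has_noexec /tmp; then
  echo "/tmp is mounted noexec"
else
  echo "/tmp allows execution"
fi
```

On a real host the input would come from the command itself, for example: mount | has_noexec /tmp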

Although it is tempting to just remount the filesystem with “exec”, you might be in this situation because your systems are configured to adhere to the STIG recommendations:

The noexec option must be added to the /tmp partition (https://www.stigviewer.com/stig/red_hat_enterprise_linux_6/2016-12-16/finding/V-57569)

FPP 19.9 contains fix 30885598 that allows specifying the temporary location for FPP operations:

$ srvctl modify rhpserver  -tmploc <new_tmp>
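
Before pointing -tmploc to a new directory, it may be worth checking that the directory really allows execution. The following is a rough sketch, not something FPP provides: the helper name dir_allows_exec and the path /u01/app/oracle/fpp_tmp are assumptions of mine for illustration.

```shell
#!/usr/bin/env bash
# Sketch: return 0 if a script placed in the given directory can be executed.
# dir_allows_exec is a hypothetical helper name.
dir_allows_exec() {
  local dir="$1" probe rc
  [ -d "$dir" ] && [ -w "$dir" ] || return 1
  probe=$(mktemp "$dir/exec_probe.XXXXXX") || return 1
  printf '#!/bin/sh\nexit 0\n' > "$probe"
  chmod +x "$probe"
  # On a noexec filesystem this call fails with "Permission denied".
  if "$probe" 2>/dev/null; then rc=0; else rc=1; fi
  rm -f "$probe"
  return "$rc"
}

candidate=/u01/app/oracle/fpp_tmp   # hypothetical location, adapt to your system
if dir_allows_exec "$candidate"; then
  echo "$candidate can be used as a temporary location"
else
  echo "$candidate is missing, not writable, or mounted noexec"
fi
```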

After that, the operation should run smoothly:

fppc02: Successfully executed clone operation.
fppc02: Executing root script on nodes ltora401,ltora402.
fppc02: Successfully executed root script on nodes fppc01,fppc02.
fppc02: Working copy creation completed.
fppc02: Oracle home provisioned.
fpps01: Client-side action completed.

HTH

—

Ludo

Oracle Fleet Patching and Provisioning (FPP): My new role as PM and a brand new series of blog posts

Posted on May 4, 2021 by Ludovico

It’s been 6 years since I tried FPP for the first time (formerly Rapid Home Provisioning, or RHP).

Rapid Home Provisioning

FPP was still young and lacking many features at that time, but it already changed the way I worked over the following years. I embraced out-of-place patching, developed some basic scripts to install Oracle Homes, and sought automation and standardization at all costs:

Oracle Home Management – part 7: Putting all together

When 18c came with the FPP local-mode automaton, I implemented it for the Grid Infrastructure patching strategy at CERN:

Oracle Grid Infrastructure 18c patching part 3: Executing out-of-place patching with the local-mode automaton

And I discovered that, in the meantime, FPP had made giant strides, with many new features and fixes for quite a few usability and performance problems.

Last year, when joining the Oracle Database High Availability (HA), Scalability, and Maximum Availability Architecture (MAA) Product Management Team at Oracle, I took (among others) the Product Manager role for FPP.

Becoming an Oracle employee after 20 years of working with Oracle technology is a big leap. It allows me to understand how big the company is, and how collaborative and friendly the Oracle employees are (Yes, I was used to marketing nonsense, pushy salesmen, and unfriendly license auditors. This is slowly changing with Oracle embracing the Cloud, but it is still a fresh wound for many customers. Expect this to change even more! Regarding me… I’ll be the same as I’ve always been 🙂 ).

Now I have daily meetings with big customers (bigger than the ones I have ever had in the past), development teams, other product managers, Oracle consultants, and community experts. My primary goal is to make the product better, increase its adoption, and help customers have the best experience with it. This includes testing the product myself, writing specs, presentations, and videos, collecting feedback from customers, tracking bugs, and managing escalations.

I am a Product Manager for other products as well, but I have to admit that FPP is the product that takes most of my Product Manager time. Why?

I will give a few reasons in my next blog post(s).

—

Ludo

The fear of (availability) loss is a path to the dark side.

Posted on October 5, 2020 by Ludovico

I have been a DBA/consultant for customers and big production environments for over twenty years. I have more or less explained my career path in this blog post.

Database (and application) high availability has always been one of my favorite areas. Over the years I have become a high availability expert (my many blog posts are there to confirm it) and I have spent a lot of time building, troubleshooting, teaching, presenting, advocating these gems of technology that are RAC, Data Guard, Application Continuity and the many other products that are part of the Oracle Maximum Availability Architecture solution. Customers fear downtime, and I have always been with them on that. But in my case, it looks like Yoda’s famous quote worked well for me (in a good way):

I’ll be joining the Oracle Maximum Availability Architecture Product Management team as MAA Product Manager (or rather Cloud MAA, I will not explain here ;-)) next November.

(for those who are not familiar with the joke, the “Dark Side” is how we often refer to the Oracle employees in the Oracle Community 😉 )

I remember as if it were yesterday that I was presenting some Data Guard 12c new features in front of a big audience at Collaborate 2014. There I met two incredible people who were part of the MAA Product Management team: Larry Carpenter and Markus Michalewicz. Larry has been a great source of inspiration for growing in seniority and becoming more at ease when presenting in public, while Markus has become a friend over the years, in addition to being one of the most influential people in my professional network.

Now I have the opportunity to join that team, and I feel like it’s the most natural change to make in my career.

And because I imagine some of you will have some questions, here are some answers to the questions I’ve been asked most frequently so far:

– Being MAA PM does not mean becoming a team lead or supervising other colleagues: I’ll be a “regular” PM
– I will stay in Switzerland and work remotely from here
– I will stay in “the conference circus” and keep presenting as soon as the COVID-19 situation allows it
– Yes, I was VERY happy at Trivadis and it will always have a special place in my heart
– Yep, that means no ACE Director award anymore 😉

Exciting times ahead! 🙂

Data Guard, Easy Connect and the Observer for multiple configurations

Posted on August 14, 2020 by Ludovico

EZConnect

One of the challenges of automation in big Oracle environments is dealing with tnsnames.ora files.
These files might grow big and are sometimes hard to distribute and maintain properly.
The worst is when manual modifications are needed: manual operations, if not made carefully, can screw up the connection to the databases.
The best solution is always using LDAP naming resolution. I have seen customers using OID, OUD, Active Directory, OpenLDAP, all with a great level of control and automation. However, some customers don’t have or don’t want this possibility and keep relying on TNS naming resolution.
When Data Guard (and possibly RAC) is in place, the tnsnames.ora gets filled with entries for the DGConnectIdentifier and StaticConnectIdentifier properties. If I add the observer, an additional entry is required to access the dbname_CFG service created by Fast-Start Failover.

Actually, none of these entries is required if I use Easy Connect.

My friend Franck Pachot wrote a couple of nice blog posts about Easy Connect while working with me at CERN:
https://medium.com/@FranckPachot/19c-easy-connect-e0c3b77968d7

https://medium.com/@FranckPachot/19c-ezconnect-and-wallet-easy-connect-and-external-password-file-8e326bb8c9f5
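
To make the idea concrete, here is a small sketch that builds the Easy Connect strings replacing those tnsnames.ora entries. The helper ezconnect is a name I made up for illustration; the hosts, port and service names are the ones from my lab, and the _DGMGRL and _CFG suffixes follow the conventions mentioned above.

```shell
#!/usr/bin/env bash
# Sketch: build an Easy Connect string of the form host[,host...]:port/service.
# ezconnect is a hypothetical helper, not an Oracle utility.
ezconnect() {
  local hosts="$1" port="$2" service="$3"
  printf '%s:%s/%s\n' "$hosts" "$port" "$service"
}

ezconnect newbox01 1521 TOOLCDB1_SITE1            # usable as DGConnectIdentifier
ezconnect newbox01 1521 TOOLCDB1_SITE1_DGMGRL     # usable as StaticConnectIdentifier
ezconnect newbox01,newbox02 1521 TOOLCDB1_CFG     # entry point for the observer
```

The third call lists both hosts, so the same string reaches the configuration whichever site currently offers the service.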

Basic Data Guard configuration

The basic configuration with Data Guard is quite simple to achieve with Easy Connect. In this example I have:
– The primary database TOOLCDB1_SITE1
– The duplicated database for standby TOOLCDB1_SITE2

After setting up the static registration (no Grid Infrastructure in my lab):

SID_LIST_LISTENER=
  (SID_LIST=
    (SID_DESC=
      (GLOBAL_DBNAME=TOOLCDB1_SITE1_DGMGRL)
      (SID_NAME=TOOLCDB1)
      (ORACLE_HOME=/u01/app/oracle/product/db_19_8_0)
    )
  )

and copying the password file, the configuration can be created with:

DGMGRL> create configuration TOOLCDB1 as primary database is TOOLCDB1_SITE1 connect identifier is 'newbox01:1521/TOOLCDB1_SITE1';
Configuration "toolcdb1" created with primary database "toolcdb1_site1"

DGMGRL>  edit database TOOLCDB1_SITE1 set property 'StaticConnectIdentifier'='newbox01:1521/TOOLCDB1_SITE1_DGMGRL';
Property "StaticConnectIdentifier" updated

DGMGRL>  add database TOOLCDB1_SITE2 as connect identifier is 'newbox02:1521/TOOLCDB1_SITE2';
Database "toolcdb1_site2" added

DGMGRL>  edit database TOOLCDB1_SITE2 set property 'StaticConnectIdentifier'='newbox02:1521/TOOLCDB1_SITE2_DGMGRL';
Property "StaticConnectIdentifier" updated

DGMGRL>  enable configuration;
Enabled.

That’s it.

Now, if I want to have the configuration observed, I need to enable Fast-Start Failover:

DGMGRL> edit database toolcdb1_site1 set property LogXptMode='SYNC';
Property "logxptmode" updated

DGMGRL> edit database toolcdb1_site2 set property LogXptMode='SYNC';
Property "logxptmode" updated

DGMGRL> edit database toolcdb1_site1 set property FastStartFailoverTarget='toolcdb1_site2';
Property "faststartfailovertarget" updated

DGMGRL> edit database toolcdb1_site2 set property FastStartFailoverTarget='toolcdb1_site1';
Property "faststartfailovertarget" updated

DGMGRL> edit configuration set protection mode as maxavailability;
Succeeded.

DGMGRL> enable fast_start failover;
Enabled in Zero Data Loss Mode.

With just two databases, FastStartFailoverTarget is not explicitly needed, but I usually set it anyway, as other databases might be added to the configuration in the future.
After that, the broker complains that FSFO is enabled but there is no observer yet:

DGMGRL> show fast_start failover;

Fast-Start Failover: Enabled in Zero Data Loss Mode

  Protection Mode:    MaxAvailability
  Lag Limit:          0 seconds

  Threshold:          180 seconds
  Active Target:      toolcdb1_site2
  Potential Targets:  "toolcdb1_site2"
    toolcdb1_site2 valid
  Observer:           (none)
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: 180 seconds
  Observer Override:  FALSE

Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                  NO
    Datafile Write Errors          YES

  Oracle Error Conditions:
    (none)


DGMGRL> show configuration;

Configuration - toolcdb1

  Protection Mode: MaxAvailability
  Members:
  toolcdb1_site1 - Primary database
    Warning: ORA-16819: fast-start failover observer not started

    toolcdb1_site2 - (*) Physical standby database

Fast-Start Failover: Enabled in Zero Data Loss Mode

Configuration Status:
WARNING   (status updated 39 seconds ago)

Observer for multiple configurations

This feature was introduced in 12.2 but it is still not widely used.
Before 12.2, the Observer was a foreground process: DBAs had to start it in a wrapper script executed with nohup in order to keep it alive.
Since 12.2, the observer can run as a background process as long as there is a valid wallet for the connection to the databases.
Also, 12.2 introduced the capability of starting the observers for multiple configurations with a single dgmgrl command: “START OBSERVING”.

For more information about it, you can check the documentation here:
https://docs.oracle.com/en/database/oracle/oracle-database/19/dgbkr/using-data-guard-broker-to-manage-switchovers-failovers.html#GUID-BC513CDB-1E06-4EB3-9FE1-E1331E15E492

How to set it up with Easy Connect?

First, I need a wallet. And here comes the first compromise:
Having a single dgmgrl session to start all my configurations means that I have a single wallet for all the databases that I want to observe.
Fair enough, all the DBs (CDBs?) are managed by the same team in this case.
If I have only observers on my host I can easily point to the wallet from my central sqlnet.ora:

WALLET_LOCATION =
   (SOURCE =
      (METHOD = FILE)
      (METHOD_DATA = (DIRECTORY = /u01/app/oracle/admin/observers/wallet))
  )
SQLNET.WALLET_OVERRIDE = TRUE

Otherwise I need to create a separate TNS_ADMIN for my observer management environment.
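
Such a dedicated TNS_ADMIN could be prepared along these lines. This is only a sketch: the function name write_observer_sqlnet is my own, and the demo writes to a scratch directory, whereas a real setup would use a permanent path.

```shell
#!/usr/bin/env bash
# Sketch: create a dedicated TNS_ADMIN directory whose sqlnet.ora contains
# only the wallet settings shown above.
# write_observer_sqlnet is a hypothetical helper name.
write_observer_sqlnet() {
  local tns_admin="$1" wallet_dir="$2"
  mkdir -p "$tns_admin" || return 1
  cat > "$tns_admin/sqlnet.ora" <<EOF
WALLET_LOCATION =
   (SOURCE =
      (METHOD = FILE)
      (METHOD_DATA = (DIRECTORY = $wallet_dir))
   )
SQLNET.WALLET_OVERRIDE = TRUE
EOF
}

# Demo in a scratch directory (assumption: replace with a permanent location).
scratch=$(mktemp -d)
write_observer_sqlnet "$scratch/tns_admin" /u01/app/oracle/admin/observers/wallet
export TNS_ADMIN="$scratch/tns_admin"   # tools started from this shell use it
cat "$TNS_ADMIN/sqlnet.ora"
rm -rf "$scratch"
```
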
Then, I create the wallet:

$ WALLET_DIR=$ORACLE_BASE/admin/observers/wallet
$ mkdir -p $WALLET_DIR
$ orapki wallet create -wallet $WALLET_DIR -auto_login_local -pwd Password2020
Oracle PKI Tool Release 21.0.0.0.0 - Production
Version 21.0.0.0.0
Copyright (c) 2004, 2020, Oracle and/or its affiliates. All rights reserved.

Operation is successfully completed.

Now I need to add the connection descriptors.

Which connection descriptors do I need?
The Observer uses the DGConnectIdentifier to keep observing the databases, but it needs a connection to both of them using the TOOLCDB1_CFG service (unless I specify something different with the broker configuration property ConfigurationWideServiceName) to connect to the configuration and get the DGConnectIdentifier information. Again, you can check this in the documentation or in the note Oracle 12.2 – Simplified OBSERVER Management for Multiple Fast-Start Failover Configurations (Doc ID 2285891.1).

So I need to specify three secrets for three connection descriptors:

$ mkstore -wrl "$TNS_ADMIN" -createCredential newbox01,newbox02:1521/TOOLCDB1_CFG sysdg
Oracle Secret Store Tool Release 21.0.0.0.0 - Production
Version 21.0.0.0.0
Copyright (c) 2004, 2020, Oracle and/or its affiliates. All rights reserved.

Your secret/Password is missing in the command line
Enter your secret/Password:
Re-enter your secret/Password:
Enter wallet password:

$ mkstore -wrl "$TNS_ADMIN" -createCredential newbox01:1521/TOOLCDB1_SITE1 sysdg
Oracle Secret Store Tool Release 21.0.0.0.0 - Production
Version 21.0.0.0.0
Copyright (c) 2004, 2020, Oracle and/or its affiliates. All rights reserved.

Your secret/Password is missing in the command line
Enter your secret/Password:
Re-enter your secret/Password:
Enter wallet password:


$ mkstore -wrl "$TNS_ADMIN" -createCredential newbox02:1521/TOOLCDB1_SITE2 sysdg
Oracle Secret Store Tool Release 21.0.0.0.0 - Production
Version 21.0.0.0.0
Copyright (c) 2004, 2020, Oracle and/or its affiliates. All rights reserved.

Your secret/Password is missing in the command line
Enter your secret/Password:
Re-enter your secret/Password:
Enter wallet password:

The first one will be used for the initial connection. The other two are used to observe the primary and the standby.
I need to be careful that the first EZConnect descriptor matches EXACTLY what I put in observer.ora (see next step) and the last two match my DGConnectIdentifier (unless I specify something different with ObserverConnectIdentifier), otherwise I will get some errors and the observer will not observe correctly (or will not start at all).

dgmgrl then needs a file named observer.ora.
$ORACLE_BASE/admin/observers or the central TNS_ADMIN would be good locations, but what if I have observers that must be started from multiple Oracle Homes?
In that case, having an observer.ora in $ORACLE_HOME/network/admin (or $ORACLE_BASE/homes/{OHNAME}/network/admin/ if Read-Only Oracle Home is enabled) would be a better solution, although I would then need to start one dgmgrl session per Oracle Home.

The content of my observer.ora must be something like:

BROKER_CONFIGS=
   (
     (CONFIG=
       (NAME=TOOLCDB1)
       (CONNECT_ID=newbox01,newbox02:1521/TOOLCDB1_CFG)
       (CONFIG_HOME=/export/soft/oracle/admin/TOOLCDB1/observer)
     )
   )

This is the example for my configuration, but I can put as many (CONFIG=…) as I want in order to observe multiple configurations.
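
When the number of configurations grows, the file can be generated rather than edited by hand. Here is a minimal sketch, assuming a simple "name connect_id config_home" input format of my own; gen_observer_ora is a made-up helper and OTHERCDB is a hypothetical second configuration that does not exist in my lab.

```shell
#!/usr/bin/env bash
# Sketch: emit an observer.ora with one (CONFIG=...) entry per input line.
# Input lines on stdin: <name> <connect_id> <config_home>
# gen_observer_ora is a hypothetical helper name.
gen_observer_ora() {
  echo "BROKER_CONFIGS="
  echo "   ("
  while read -r name connect_id config_home; do
    [ -n "$name" ] || continue          # skip empty lines
    echo "     (CONFIG="
    echo "       (NAME=$name)"
    echo "       (CONNECT_ID=$connect_id)"
    echo "       (CONFIG_HOME=$config_home)"
    echo "     )"
  done
  echo "   )"
}

# OTHERCDB is invented here only to show a second entry.
gen_observer_ora <<'EOF'
TOOLCDB1 newbox01,newbox02:1521/TOOLCDB1_CFG /export/soft/oracle/admin/TOOLCDB1/observer
OTHERCDB newbox01,newbox02:1521/OTHERCDB_CFG /export/soft/oracle/admin/OTHERCDB/observer
EOF
```
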
Then, if everything is configured properly, I can start all the observers with a single command:

DGMGRL> SET OBSERVERCONFIGFILE=/u01/app/oracle/admin/observers/observer.ora
DGMGRL> START OBSERVING
ObserverConfigFile=observer.ora
observer configuration file parsing succeeded
Submitted command "START OBSERVER" using connect identifier "newbox01,newbox02:1521/TOOLCDB1_CFG"

Check superobserver.log, individual observer logs and Data Guard Broker logs for execution details.

DGMGRL> show observers
ObserverConfigFile=/u01/app/oracle/admin/observers/observer.ora
observer configuration file parsing succeeded
Submitted command "SHOW OBSERVER" using connect identifier "newbox01,newbox02:1521/TOOLCDB1_CFG"
Connected to "TOOLCDB1_SITE2"

Configuration - toolcdb1

  Primary:            toolcdb1_site1
  Active Target:      toolcdb1_site2

Observer "newbox03.trivadistraining.com1" - Master

  Host Name:                    newbox03.trivadistraining.com
  Last Ping to Primary:         1 second ago
  Last Ping to Target:          2 seconds ago

Troubleshooting

If the observer does not work, it is sometimes not easy to understand the cause. A few things to check:

– Has the SYSDG privilege been granted to the SYSDG user? Is the SYSDG account unlocked?
– Does sqlnet.ora contain the correct wallet location?
– Is the wallet accessible in auto-login mode?
– Are the entries in the wallet correct? (check with “sqlplus /@connstring as sysdg”)
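
For the last check, the connect strings can be pulled straight out of observer.ora so that each one is tested exactly as written. This is a sketch under assumptions: list_connect_ids is a name of my own, and it expects the keyword CONNECT_ID in upper case with the value on a single line, as in my file above. It only prints the sqlplus commands, it does not run them.

```shell
#!/usr/bin/env bash
# Sketch: print every CONNECT_ID value found in observer.ora content (stdin).
# list_connect_ids is a hypothetical helper name.
list_connect_ids() {
  sed -n 's/.*(CONNECT_ID=\([^)]*\)).*/\1/p'
}

# Demo on the observer.ora content shown earlier.
list_connect_ids <<'EOF' | while read -r id; do echo "sqlplus /@${id} as sysdg"; done
BROKER_CONFIGS=
   (
     (CONFIG=
       (NAME=TOOLCDB1)
       (CONNECT_ID=newbox01,newbox02:1521/TOOLCDB1_CFG)
       (CONFIG_HOME=/export/soft/oracle/admin/TOOLCDB1/observer)
     )
   )
EOF
```

The printed string must be character for character the alias stored in the wallet, which is why it is extracted rather than retyped.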

Missing pieces

Here are a few features that I think would be nice additions in the future:

– Awareness of the ORACLE_HOME to be used for each observer
– Possibility to specify a different TNS_ADMIN per observer (different wallets)
– Integration with Grid Infrastructure (srvctl add observer…) and support for multiple observers

—

Ludovico

Real-Time Cascade Standby Container Databases without Oracle Managed Files

Posted on July 10, 2020 by Ludovico

OK, the title might not be the best… I would just like to add more detail to content you can already find in other blogs (e.g. this nice one from Philippe Fierens: http://pfierens.blogspot.com/2020/04/19c-data-guard-series-part-iii-adding.html).

I have this Cascade Standby configuration:

DGMGRL> connect /
Connected to "TOOLCDB1_SITE1"
Connected as SYSDG.
DGMGRL> show configuration;

Configuration - toolcdb1

  Protection Mode: MaxPerformance
  Members:
  toolcdb1_site1 - Primary database
    toolcdb1_site2 - Physical standby database
      toolcdx1_site2 - Physical standby database (receiving current redo)

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 42 seconds ago)

Years ago I wrote this whitepaper about cascaded standbys:
https://fr.slideshare.net/ludovicocaldara/2014-603-caldarappr
While it is still relevant for non-CDBs, things have changed with the Multitenant architecture.

In my config, the Oracle Database version is 19.7 and the databases are actually CDBs. No Grid Infrastructure, non-OMF datafiles.
It is important to highlight that a lot of things have changed since 12.1. And because 19c is the LTS version now, it does not make sense to try anything older.

First, I just want to make sure that my standbys are aligned.

Primary:

alter system switch logfile;

1st Standby alert log:

2020-07-07T10:20:23.370868+02:00
 rfs (PID:6408): Archived Log entry 58 added for B-1044796516.T-1.S-39 ID 0xf15601c6 LAD:2
 rfs (PID:6408): No SRLs available for T-1
2020-07-07T10:20:23.386410+02:00
 rfs (PID:6408): Opened log for T-1.S-40 dbid 4048667172 branch 1044796516
2020-07-07T10:20:24.552766+02:00
PR00 (PID:6478): Media Recovery Log /u03/oradata/fra/TOOLCDB1_SITE2/archivelog/2020_07_07/o1_mf_1_39_hj8cs7vo_.arc
PR00 (PID:6478): Media Recovery Waiting for T-1.S-40 (in transit)

2nd Standby alert log:

2020-07-07T10:20:31.051281+02:00
 rfs (PID:6498): Opened log for T-1.S-39 dbid 4048667172 branch 1044796516
2020-07-07T10:20:31.150748+02:00
 rfs (PID:6498): Archived Log entry 38 added for B-1044796516.T-1.S-39 ID 0xf15601c6 LAD:2
2020-07-07T10:20:31.862337+02:00
PR00 (PID:6718): Media Recovery Log /u03/oradata/fra/TOOLCDX1_SITE2/archivelog/2020_07_07/o1_mf_1_39_hj8d2h1k_.arc
PR00 (PID:6718): Media Recovery Waiting for T-1.S-40
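
Comparing the two alert logs by eye works for a lab. A rough scripted version of the same check is sketched below: it assumes that "aligned" means both standbys are waiting for the same thread and sequence, and waiting_for is a helper name I invented.

```shell
#!/usr/bin/env bash
# Sketch: print the last "T-<thread>.S-<sequence>" that media recovery
# reports waiting for, reading alert log lines from stdin.
# waiting_for is a hypothetical helper name.
waiting_for() {
  sed -n 's/.*Media Recovery Waiting for \(T-[0-9]*\.S-[0-9]*\).*/\1/p' | tail -n 1
}

# Last lines of the two alert log excerpts above.
first=$(printf '%s\n' 'PR00 (PID:6478): Media Recovery Waiting for T-1.S-40 (in transit)' | waiting_for)
second=$(printf '%s\n' 'PR00 (PID:6718): Media Recovery Waiting for T-1.S-40' | waiting_for)

if [ "$first" = "$second" ]; then
  echo "both standbys are waiting for $first"
else
  echo "standbys differ: $first vs $second"
fi
```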

Then, I create a pluggable database (from PDB$SEED):

SQL>         CREATE PLUGGABLE DATABASE LATERALUS ADMIN USER PDBADMIN IDENTIFIED BY "NfrwTgbjwq7MbPNT92cH"  ROLES=(DBA)
  2                  FILE_NAME_CONVERT=('/pdbseed/','/LATERALUS/')
  3                  DEFAULT TABLESPACE USERS DATAFILE '/u02/oradata/TOOLCDB1/data/LATERALUS/USERS01.dbf' SIZE 50M AUTOEXTEND ON NEXT 50M MAXSIZE 1G;

Pluggable database created.

SQL>         ALTER PLUGGABLE DATABASE LATERALUS OPEN;

Pluggable database altered.

SQL>         ALTER PLUGGABLE DATABASE LATERALUS SAVE STATE;

Pluggable database altered.

On the first standby I get:

2020-07-07T10:23:33.148457+02:00
 rfs (PID:6408): Archived Log entry 60 added for B-1044796516.T-1.S-40 ID 0xf15601c6 LAD:2
 rfs (PID:6408): No SRLs available for T-1
2020-07-07T10:23:33.184335+02:00
 rfs (PID:6408): Opened log for T-1.S-41 dbid 4048667172 branch 1044796516
2020-07-07T10:23:33.887665+02:00
PR00 (PID:6478): Media Recovery Log /u03/oradata/fra/TOOLCDB1_SITE2/archivelog/2020_07_07/o1_mf_1_40_hj8d27d0_.arc
Recovery created pluggable database LATERALUS
Recovery copied files for tablespace SYSTEM
Recovery successfully copied file /u02/oradata/TOOLCDB1/data/LATERALUS/system01.dbf from /u02/oradata/TOOLCDB1/data/pdbseed/system01.dbf
LATERALUS(4):WARNING: File being created with same name as in Primary
LATERALUS(4):Existing file may be overwritten
LATERALUS(4):Recovery created file /u02/oradata/TOOLCDB1/data/LATERALUS/system01.dbf
LATERALUS(4):Successfully added datafile 16 to media recovery
LATERALUS(4):Datafile #16: '/u02/oradata/TOOLCDB1/data/LATERALUS/system01.dbf'
2020-07-07T10:23:35.846985+02:00
Recovery copied files for tablespace SYSAUX
Recovery successfully copied file /u02/oradata/TOOLCDB1/data/LATERALUS/sysaux01.dbf from /u02/oradata/TOOLCDB1/data/pdbseed/sysaux01.dbf
LATERALUS(4):WARNING: File being created with same name as in Primary
LATERALUS(4):Existing file may be overwritten
LATERALUS(4):Recovery created file /u02/oradata/TOOLCDB1/data/LATERALUS/sysaux01.dbf
LATERALUS(4):Successfully added datafile 17 to media recovery
LATERALUS(4):Datafile #17: '/u02/oradata/TOOLCDB1/data/LATERALUS/sysaux01.dbf'
2020-07-07T10:23:41.004383+02:00
Recovery copied files for tablespace UNDOTBS1
Recovery successfully copied file /u02/oradata/TOOLCDB1/data/LATERALUS/undotbs01.dbf from /u02/oradata/TOOLCDB1/data/pdbseed/undotbs01.dbf
LATERALUS(4):WARNING: File being created with same name as in Primary
LATERALUS(4):Existing file may be overwritten
LATERALUS(4):Recovery created file /u02/oradata/TOOLCDB1/data/LATERALUS/undotbs01.dbf
LATERALUS(4):Successfully added datafile 18 to media recovery
LATERALUS(4):Datafile #18: '/u02/oradata/TOOLCDB1/data/LATERALUS/undotbs01.dbf'
2020-07-07T10:23:42.191607+02:00
(4):WARNING: File being created with same name as in Primary
(4):Existing file may be overwritten
(4):Recovery created file /u02/oradata/TOOLCDB1/data/LATERALUS/USERS01.dbf
(4):Successfully added datafile 19 to media recovery
(4):Datafile #19: '/u02/oradata/TOOLCDB1/data/LATERALUS/USERS01.dbf'
PR00 (PID:6478): Media Recovery Waiting for T-1.S-41 (in transit)

On the second:

2020-07-07T10:24:31.393410+02:00
 rfs (PID:6500): Opened log for T-1.S-40 dbid 4048667172 branch 1044796516
2020-07-07T10:24:31.460391+02:00
 rfs (PID:6500): Archived Log entry 39 added for B-1044796516.T-1.S-40 ID 0xf15601c6 LAD:2
2020-07-07T10:24:32.360726+02:00
PR00 (PID:6718): Media Recovery Log /u03/oradata/fra/TOOLCDX1_SITE2/archivelog/2020_07_07/o1_mf_1_40_hj8d9zd7_.arc
Recovery created pluggable database LATERALUS
2020-07-07T10:24:36.000250+02:00
Recovery copied files for tablespace SYSTEM
Recovery successfully copied file /u02/oradata/TOOLCDX1/data/LATERALUS/system01.dbf from /u02/oradata/TOOLCDX1/data/pdbseed/system01.dbf
LATERALUS(4):Recovery created file /u02/oradata/TOOLCDX1/data/LATERALUS/system01.dbf
LATERALUS(4):Successfully added datafile 16 to media recovery
LATERALUS(4):Datafile #16: '/u02/oradata/TOOLCDX1/data/LATERALUS/system01.dbf'
2020-07-07T10:24:40.657596+02:00
Recovery copied files for tablespace SYSAUX
Recovery successfully copied file /u02/oradata/TOOLCDX1/data/LATERALUS/sysaux01.dbf from /u02/oradata/TOOLCDX1/data/pdbseed/sysaux01.dbf
LATERALUS(4):Recovery created file /u02/oradata/TOOLCDX1/data/LATERALUS/sysaux01.dbf
LATERALUS(4):Successfully added datafile 17 to media recovery
LATERALUS(4):Datafile #17: '/u02/oradata/TOOLCDX1/data/LATERALUS/sysaux01.dbf'
2020-07-07T10:24:47.688298+02:00
Recovery copied files for tablespace UNDOTBS1
Recovery successfully copied file /u02/oradata/TOOLCDX1/data/LATERALUS/undotbs01.dbf from /u02/oradata/TOOLCDX1/data/pdbseed/undotbs01.dbf
LATERALUS(4):Recovery created file /u02/oradata/TOOLCDX1/data/LATERALUS/undotbs01.dbf
LATERALUS(4):Successfully added datafile 18 to media recovery
LATERALUS(4):Datafile #18: '/u02/oradata/TOOLCDX1/data/LATERALUS/undotbs01.dbf'
(4):Recovery created file /u02/oradata/TOOLCDX1/data/LATERALUS/USERS01.dbf
(4):Successfully added datafile 19 to media recovery
(4):Datafile #19: '/u02/oradata/TOOLCDX1/data/LATERALUS/USERS01.dbf'
2020-07-07T10:24:48.924510+02:00
PR00 (PID:6718): Media Recovery Waiting for T-1.S-41

So, yeah, not having OMF might get you some warnings like: WARNING: File being created with same name as in Primary
But it is good to know that the cascade standby deals well with new PDBs.

Of course, this case is not the most interesting one: I know that the problem with Multitenant comes from cloning PDBs from either local or remote PDBs in read-write mode.

So let’s try a relocate from another CDB:

 CREATE PLUGGABLE DATABASE PNEUMA FROM PNEUMA@LUDOCDB1_PNEUMA_tempclone
         RELOCATE AVAILABILITY NORMAL
         file_name_convert=('/LUDOCDB1/data/PNEUMA/','/TOOLCDB1/data/PNEUMA/')
         PARALLEL 2;

Pluggable database created.

SQL>         ALTER PLUGGABLE DATABASE PNEUMA OPEN;

Pluggable database altered.

SQL>         ALTER PLUGGABLE DATABASE PNEUMA SAVE STATE;

Pluggable database altered.

This is what I get on the first standby:

2020-07-07T12:03:02.364271+02:00
Recovery created pluggable database PNEUMA
PNEUMA(5):Tablespace-SYSTEM during PDB create skipped since source is in            r/w mode or this is a refresh clone
PNEUMA(5):File #20 added to control file as 'UNNAMED00020'. Originally created as:
PNEUMA(5):'/u02/oradata/TOOLCDB1/data/PNEUMA/system01.dbf'
PNEUMA(5):because the pluggable database was created with nostandby
PNEUMA(5):or the tablespace belonging to the pluggable database is
PNEUMA(5):offline.
PNEUMA(5):Tablespace-SYSAUX during PDB create skipped since source is in            r/w mode or this is a refresh clone
PNEUMA(5):File #21 added to control file as 'UNNAMED00021'. Originally created as:
PNEUMA(5):'/u02/oradata/TOOLCDB1/data/PNEUMA/sysaux01.dbf'
PNEUMA(5):because the pluggable database was created with nostandby
PNEUMA(5):or the tablespace belonging to the pluggable database is
PNEUMA(5):offline.
PNEUMA(5):Tablespace-UNDOTBS1 during PDB create skipped since source is in            r/w mode or this is a refresh clone
PNEUMA(5):File #22 added to control file as 'UNNAMED00022'. Originally created as:
PNEUMA(5):'/u02/oradata/TOOLCDB1/data/PNEUMA/undotbs01.dbf'
PNEUMA(5):because the pluggable database was created with nostandby
PNEUMA(5):or the tablespace belonging to the pluggable database is
PNEUMA(5):offline.
PNEUMA(5):Tablespace-TEMP during PDB create skipped since source is in            r/w mode or this is a refresh clone
PNEUMA(5):Tablespace-USERS during PDB create skipped since source is in            r/w mode or this is a refresh clone
PNEUMA(5):File #23 added to control file as 'UNNAMED00023'. Originally created as:
PNEUMA(5):'/u02/oradata/TOOLCDB1/data/PNEUMA/USERS01.dbf'
PNEUMA(5):because the pluggable database was created with nostandby
PNEUMA(5):or the tablespace belonging to the pluggable database is
PNEUMA(5):offline.

and this is on the cascaded standby:

2020-07-07T12:03:02.368014+02:00
Recovery created pluggable database PNEUMA
PNEUMA(5):Tablespace-SYSTEM during PDB create skipped since source is in            r/w mode or this is a refresh clone
PNEUMA(5):File #20 added to control file as 'UNNAMED00020'. Originally created as:
PNEUMA(5):'/u02/oradata/TOOLCDB1/data/PNEUMA/system01.dbf'
PNEUMA(5):because the pluggable database was created with nostandby
PNEUMA(5):or the tablespace belonging to the pluggable database is
PNEUMA(5):offline.
PNEUMA(5):Tablespace-SYSAUX during PDB create skipped since source is in            r/w mode or this is a refresh clone
PNEUMA(5):File #21 added to control file as 'UNNAMED00021'. Originally created as:
PNEUMA(5):'/u02/oradata/TOOLCDB1/data/PNEUMA/sysaux01.dbf'
PNEUMA(5):because the pluggable database was created with nostandby
PNEUMA(5):or the tablespace belonging to the pluggable database is
PNEUMA(5):offline.
PNEUMA(5):Tablespace-UNDOTBS1 during PDB create skipped since source is in            r/w mode or this is a refresh clone
PNEUMA(5):File #22 added to control file as 'UNNAMED00022'. Originally created as:
PNEUMA(5):'/u02/oradata/TOOLCDB1/data/PNEUMA/undotbs01.dbf'
PNEUMA(5):because the pluggable database was created with nostandby
PNEUMA(5):or the tablespace belonging to the pluggable database is
PNEUMA(5):offline.
PNEUMA(5):Tablespace-TEMP during PDB create skipped since source is in            r/w mode or this is a refresh clone
PNEUMA(5):Tablespace-USERS during PDB create skipped since source is in            r/w mode or this is a refresh clone
PNEUMA(5):File #23 added to control file as 'UNNAMED00023'. Originally created as:
PNEUMA(5):'/u02/oradata/TOOLCDB1/data/PNEUMA/USERS01.dbf'
PNEUMA(5):because the pluggable database was created with nostandby
PNEUMA(5):or the tablespace belonging to the pluggable database is
PNEUMA(5):offline.

So absolutely the same behavior between the two levels of standby.
According to the documentation: https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/CREATE-PLUGGABLE-DATABASE.html#GUID-F2DBA8DD-EEA8-4BB7-A07F-78DC04DB1FFC
I quote what is specified for the parameter STANDBYS={ALL|NONE|…}:
“If you include a PDB in a standby CDB, then during standby recovery the standby CDB will search for the data files for the PDB. If the data files are not found, then standby recovery will stop and you must copy the data files to the correct location before you can restart recovery.”

“Specify ALL to include the new PDB in all standby CDBs. This is the default.”

“Specify NONE to exclude the new PDB from all standby CDBs. When a PDB is excluded from all standby CDBs, the PDB’s data files are unnamed and marked offline on all of the standby CDBs. Standby recovery will not stop if the data files for the PDB are not found on the standby. […]”

So, to keep the MRP from crashing, I should have included STANDBYS=NONE.
But the documentation is not up to date, because in my case the PDB is skipped automatically and the recovery process DOES NOT STOP:

SQL> r
  1* select process, status, sequence#, client_process from v$managed_standby

PROCESS   STATUS        SEQUENCE# CLIENT_P
--------- ------------ ---------- --------
ARCH      CONNECTED             0 ARCH
DGRD      ALLOCATED             0 N/A
DGRD      ALLOCATED             0 N/A
ARCH      CLOSING              43 ARCH
ARCH      CLOSING              40 ARCH
ARCH      CLOSING              42 ARCH
RFS       IDLE                  0 Archival
RFS       IDLE                  0 UNKNOWN
RFS       IDLE                 44 LGWR
RFS       IDLE                  0 UNKNOWN
MRP0      APPLYING_LOG         44 N/A
LNS       WRITING              44 LNS
DGRD      ALLOCATED             0 N/A

13 rows selected.

However, the recovery is marked ENABLED for the PDB on the standby, while with STANDBYS=NONE it would have been DISABLED.

  1* select name, recovery_status from v$pdbs

NAME                           RECOVERY
------------------------------ --------
PDB$SEED                       ENABLED
LATERALUS                      ENABLED
PNEUMA                         ENABLED

So, this is another difference from the doc, which states:
“You can enable a PDB on a standby CDB after it was excluded on that standby CDB by copying the data files to the correct location, bringing the PDB online, and marking it as enabled for recovery.”

This reflects the findings of Philippe Fierens in his blog (http://pfierens.blogspot.com/2020/04/19c-data-guard-series-part-iii-adding.html).

This behavior was probably introduced between 12.2 and 19c, but I could not find out exactly when, as it is not explicitly stated in the documentation.
However, I remember well that in 12.1.0.2 the MRP process was crashing.

In my configuration (not on purpose, but interesting for this article), the first standby has the very same directory structure as the primary, while the cascaded standby does not.

In any case, there is a potentially big problem for all the customers implementing Multitenant on Data Guard:

With the old behavior (MRP crashing), it was easy to spot when a PDB was cloned online into a primary database, because a simple dgmgrl “show configuration” would have displayed a warning due to the increasing lag (following the MRP crash).

With the current behavior, the MRP keeps recovering and “show configuration” displays “SUCCESS” even though a PDB has not been copied to the standby (and is thus not protected).

Indeed, this is what I get after the clone:

DGMGRL> show configuration;

Configuration - toolcdb1

  Protection Mode: MaxPerformance
  Members:
  toolcdb1_site1 - Primary database
    toolcdb1_site2 - Physical standby database
      toolcdx1_site2 - Physical standby database (receiving current redo)

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 21 seconds ago)

DGMGRL> show database  toolcdb1_site2;

Database - toolcdb1_site2

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 1 second ago)
  Apply Lag:          0 seconds (computed 1 second ago)
  Average Apply Rate: 8.00 KByte/s
  Real Time Query:    ON
  Instance(s):
    TOOLCDB1

Database Status:
SUCCESS

I can see that the Data Guard Broker is completely silent about the missing PDB. So I might think my PDB is protected while it is not!

I actually have to add a check on the standby DBs to see whether any datafiles are missing:

1* select con_id, name, status from v$datafile where status not in ('SYSTEM','ONLINE');

    CON_ID NAME                                                  STATUS
---------- ----------------------------------------------------- -------
         5 /u01/app/oracle/product/db_19_7_0/dbs/UNNAMED00020    SYSOFF
         5 /u01/app/oracle/product/db_19_7_0/dbs/UNNAMED00021    RECOVER
         5 /u01/app/oracle/product/db_19_7_0/dbs/UNNAMED00022    RECOVER
         5 /u01/app/oracle/product/db_19_7_0/dbs/UNNAMED00023    RECOVER

Although this first query seems OK to get the missing datafiles, actually the next one is the correct one to use:

SQL> select * from v$recover_file where online_status='OFFLINE';

     FILE# ONLINE  ONLINE_ ERROR               CHANGE# TIME             CON_ID
---------- ------- ------- ---------------- ---------- ------------ ----------
        20 OFFLINE OFFLINE FILE MISSING              0                       5
        21 OFFLINE OFFLINE FILE MISSING              0                       5
        22 OFFLINE OFFLINE FILE MISSING              0                       5
        23 OFFLINE OFFLINE FILE MISSING              0                       5

This check should be implemented and put under monitoring (custom metrics in OEM?)

SQL> select 'ERROR: CON_ID '||con_id||' has '||count(*)||' datafiles offline!' from v$recover_file where online_status='OFFLINE' group by con_id;

'ERROR:CON_ID'||CON_ID||'HAS'||COUNT(*)||'DATAFILESOFFLINE!'
--------------------------------------------------------------------------------
ERROR: CON_ID 5 has 4 datafiles offline!
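
As a starting point for such a check, here is a minimal shell sketch that could be called from cron or from a custom metric. The function name check_offline_datafiles and the exit code 2 are my own assumptions, not something provided by Oracle or OEM. The function only scans the text returned by the query above, so it can be tried without a database.

```shell
#!/bin/sh
# Hypothetical helper (not an Oracle tool): reads the output of the
# v$recover_file query on stdin, prints every "ERROR: CON_ID ..." line
# and returns 2 when at least one is found, 0 otherwise.
check_offline_datafiles() {
    # grep prints the matching lines and succeeds only if there is a match
    if grep '^ERROR: CON_ID '; then
        return 2    # assumed "critical" code, adapt it to the monitoring tool
    fi
    return 0
}

# Possible usage on a standby host (assumes OS authentication works):
#
#   sqlplus -s / as sysdba <<EOF | check_offline_datafiles
#   set heading off feedback off pagesize 0
#   select 'ERROR: CON_ID '||con_id||' has '||count(*)||' datafiles offline!'
#     from v\$recover_file where online_status='OFFLINE' group by con_id;
#   EOF
```

The exit status of the pipeline is the one of the function, so the calling scheduler or agent can alert on a non-zero return code.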

The missing PDB is easy to spot once I know that I have to look for it. However, for each PDB to recover (I might have many!), I have to prepare the rename of the datafiles and the creation of the directories (do not forget I am using non-OMF here).

Now, the datafile names on the standby got changed to …/UNNAMEDnnnnn.

So I have to get the original ones from the primary database and do the same replace that db_file_name_convert would do:

set trim on
col rename_file for a300
set lines 400
select 'set newname for datafile '||file#||' to '''||replace(name,'/TOOLCDB1/','/TOOLCDX1/')||''';' as rename_file  from v$datafile where con_id=6;

and put this in an RMAN script (this one is for the second standby; the first has the same directory names, so the same paths):

run {
set newname for datafile 20 to '/u02/oradata/TOOLCDX1/data/PNEUMA/system01.dbf';
set newname for datafile 21 to '/u02/oradata/TOOLCDX1/data/PNEUMA/sysaux01.dbf';
set newname for datafile 22 to '/u02/oradata/TOOLCDX1/data/PNEUMA/undotbs01.dbf';
set newname for datafile 23 to '/u02/oradata/TOOLCDX1/data/PNEUMA/USERS01.dbf';
restore pluggable database PNEUMA from service 'newbox01:1521/TOOLCDB1_SITE1_DGMGRL' ;
}
switch pluggable database PNEUMA to copy;

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

Starting restore at 07-JUL-2020 14:19:22
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1530 device type=DISK

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: using network backup set from service newbox01:1521/TOOLCDB1_SITE1_DGMGRL
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00020 to /u02/oradata/TOOLCDB1/data/PNEUMA/system01.dbf
channel ORA_DISK_1: restore complete, elapsed time: 00:00:03
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: using network backup set from service newbox01:1521/TOOLCDB1_SITE1_DGMGRL
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00021 to /u02/oradata/TOOLCDB1/data/PNEUMA/sysaux01.dbf
channel ORA_DISK_1: restore complete, elapsed time: 00:00:07
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: using network backup set from service newbox01:1521/TOOLCDB1_SITE1_DGMGRL
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00022 to /u02/oradata/TOOLCDB1/data/PNEUMA/undotbs01.dbf
channel ORA_DISK_1: restore complete, elapsed time: 00:00:03
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: using network backup set from service newbox01:1521/TOOLCDB1_SITE1_DGMGRL
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00023 to /u02/oradata/TOOLCDB1/data/PNEUMA/USERS01.dbf
channel ORA_DISK_1: restore complete, elapsed time: 00:00:07
Finished restore at 07-JUL-2020 14:19:43

datafile 20 switched to datafile copy "/u02/oradata/TOOLCDB1/data/PNEUMA/system01.dbf"
datafile 21 switched to datafile copy "/u02/oradata/TOOLCDB1/data/PNEUMA/sysaux01.dbf"
datafile 22 switched to datafile copy "/u02/oradata/TOOLCDB1/data/PNEUMA/undotbs01.dbf"
datafile 23 switched to datafile copy "/u02/oradata/TOOLCDB1/data/PNEUMA/USERS01.dbf"
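
With many PDBs, preparing the SET NEWNAME lines and the target directories by hand gets tedious. The following is a minimal sketch of how both could be derived in the shell. It assumes the file# and name columns of v$datafile on the primary have been spooled as "file#|name" lines; that input format, the function name convert_datafile_names and its mode argument are assumptions of mine.

```shell
#!/bin/sh
# Hypothetical helper (not an Oracle tool).
# stdin: one "file#|name" line per datafile, as spooled from the primary.
# $1 = path fragment on the primary, $2 = its replacement on the standby
# $3 = "newname" to print RMAN SET NEWNAME commands,
#      "dirs"    to print the distinct target directories
convert_datafile_names() {
    awk -F'|' -v from="$1" -v to="$2" -v mode="$3" '
        NF >= 2 {
            rest = $2; name = ""
            # literal (non-regex) replacement of every occurrence of "from"
            while (from != "" && (i = index(rest, from)) > 0) {
                name = name substr(rest, 1, i - 1) to
                rest = substr(rest, i + length(from))
            }
            name = name rest
            if (mode == "dirs") {
                sub("/[^/]*$", "", name)          # strip the file name
                if (!(name in seen)) { seen[name] = 1; print name }
            } else {
                printf "set newname for datafile %s to \047%s\047;\n", $1, name
            }
        }'
}

# Possible usage, with files.lst holding the spooled "file#|name" lines:
#   convert_datafile_names /TOOLCDB1/ /TOOLCDX1/ dirs < files.lst | xargs mkdir -p
#   convert_datafile_names /TOOLCDB1/ /TOOLCDX1/ newname < files.lst
```

The replacement is a plain string substitution, like the replace() in the SQL query, so it does not cover every pattern that db_file_name_convert may be configured with.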

Then, I need to stop the recovery, start it and stop it again, put the datafiles online, and finally restart the recovery.
These are the same steps used by Philippe in his blog post, just adapted to my taste 🙂

DGMGRL> edit database "TOOLCDB1_SITE2" set state='APPLY-OFF';

For the second part, I use this HEREDOC to online all offline datafiles:

$ sqlplus / as sysdba <<EOF
RECOVER STANDBY DATABASE UNTIL CANCEL;
CANCEL
ALTER SESSION SET CONTAINER=PNEUMA;
DECLARE
        -- every datafile of this PDB that is still reported as offline
        CURSOR c_fileids IS
                SELECT file# FROM v\$recover_file WHERE online_status='OFFLINE';
        r_fileid c_fileids%ROWTYPE;
BEGIN
        OPEN c_fileids;
        LOOP
                FETCH c_fileids INTO r_fileid;
                EXIT WHEN c_fileids%NOTFOUND;
                EXECUTE IMMEDIATE 'ALTER DATABASE DATAFILE '||to_char(r_fileid.file#)||' ONLINE';
        END LOOP;
        CLOSE c_fileids;
END;
/
exit
EOF

and finally:

DGMGRL> edit database "TOOLCDB1_SITE2" set state='APPLY-ON';

Now I no longer have any offline datafiles on the standby:

SQL> select 'ERROR: CON_ID '||con_id||' has '||count(*)||' datafiles offline!' from v$recover_file where online_status='OFFLINE' group by con_id;

no rows selected

I will not publish the steps for the second standby: they are exactly the same (same output as well).

In the end, for me it is important to highlight that monitoring the OFFLINE datafiles on the standby becomes crucial to guarantee the health of Data Guard in Multitenant. Relying on the Broker status or “PDB recovery disabled” is not enough.

On the bright side, it is nice to see that Cascade Standby configurations do not introduce any variation, so cascaded standbys can be treated the same as “direct” standby databases.

HTH

—

Ludovico

Parameter REMOTE_LISTENER pointing to a TNS alias? Beware of how it registers.

Posted on August 23, 2019 by Ludovico

On an Oracle Database instance, if I set:

alter system set remote_listener='cluster-scan:1521';

The instance tries to resolve the cluster-scan name to detect whether it is a SCAN address.
Once the name is resolved, it stores all the addresses it gets and registers with each of them.
I can check which addresses it has stored with this query:

SQL>  select type, value from v$listener_network where type='REMOTE LISTENER';

TYPE             VALUE
---------------- ---------------------------------------------------------------------------------------------------
REMOTE LISTENER  (DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=))(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.1)(PORT=1521)))
REMOTE LISTENER  (DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=))(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.2)(PORT=1521)))
REMOTE LISTENER  (DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=))(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.0.3)(PORT=1521)))

In this case, the instance registers to the three addresses discovered, which is OK: all three SCAN listeners will get service updates from the instance.
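
To see at a glance how many endpoints an instance has registered with, the HOST values can be extracted from the query output. This is a minimal sketch, assuming the v$listener_network output is available as text on stdin; the function name remote_listener_hosts is hypothetical.

```shell
# Hypothetical helper: reads v$listener_network output on stdin and prints
# the HOST value of every REMOTE LISTENER row, one per line.
remote_listener_hosts() {
    grep 'REMOTE LISTENER' | sed -n 's/.*(HOST=\([^)]*\)).*/\1/p'
}
```

Counting the lines it prints (for example with wc -l) and comparing the result with the number of SCAN addresses gives a quick sanity check of the registration.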

But if I have this TNS alias:

REMOTE_LISTENER=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(PORT=1521)(HOST=cluster-scan)))

and I set:

alter system set remote_listener='remote_listener';

I get:

SQL>  select type, value from v$listener_network where type='REMOTE LISTENER';

TYPE             VALUE
---------------- ---------------------------------------------------------------------------
REMOTE LISTENER  (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(PORT=1521)(HOST=cluster-scan)))

The result is that the instance registers only with the first IP address it gets from the DNS, leaving the other SCAN listeners without the service registration and thus causing random errors like:

ORA-12514, TNS:listener does not currently know of service requested in connect descriptor

This is in my opinion quite annoying, as my goal here was to have all the DBs set with:

local_listener=local_listener
remote_listener=remote_listener

in order to facilitate changes of ports, database migrations from different clusters, clones, etc.

So the solution is either to revert to the syntax “cluster-scan:port”, or to specify explicitly all the endpoints in the address list:

REMOTE_LISTENER = (DESCRIPTION= (ADDRESS_LIST=
  (ADDRESS= (PROTOCOL=TCP) (PORT=1521) (HOST=10.0.0.1))
  (ADDRESS= (PROTOCOL=TCP) (PORT=1521) (HOST=10.0.0.2))
  (ADDRESS= (PROTOCOL=TCP) (PORT=1521) (HOST=10.0.0.3))
 ))
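
If the list of SCAN IPs changes over time, the alias can be generated instead of being typed by hand. This is a minimal sketch under the assumption that the port and the IP addresses are passed as arguments; the function name gen_remote_listener_alias is hypothetical, and where the IPs come from (a DNS lookup, an inventory file) is left open.

```shell
# Hypothetical generator: prints a REMOTE_LISTENER alias that lists every
# address explicitly.
# Usage: gen_remote_listener_alias PORT HOST [HOST...]
gen_remote_listener_alias() {
    if [ "$#" -lt 2 ]; then
        echo "usage: gen_remote_listener_alias PORT HOST [HOST...]" >&2
        return 1
    fi
    port=$1
    shift
    printf 'REMOTE_LISTENER = (DESCRIPTION= (ADDRESS_LIST=\n'
    for host in "$@"; do
        printf '  (ADDRESS= (PROTOCOL=TCP) (PORT=%s) (HOST=%s))\n' "$port" "$host"
    done
    printf ' ))\n'
}
```

Calling it with 1521 and the three SCAN IPs produces the same entry as above, ready to be placed in tnsnames.ora.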

I am sure it is “working as designed”, but I wonder if it could be an enhancement to have the address fully expanded also in the case of a TNS alias…
Or… do you know any way to do it from a TNS alias without having the full IP list?

Cheers

—

Ludo

FPP local-mode: Steps to remove/add node from a cluster if RHP fails to move gihome

Posted on July 9, 2019 by Ludovico

I am getting more and more experience with patching clusters with the local-mode automaton. The whole process would be very complex, but the local-mode automaton makes it really easy.

Nevertheless, I have had a couple of clusters where the process did not work:

#1: The very first cluster that I installed in 18c

This cluster has “kind of failed” patching the first node. Actually, the rhpctl command exited with an error:

$ rhpctl move gihome -sourcehome /u01/crs/crs1830 -desthome /u01/crs/crs1860 -node server1
server1.cern.ch: Audit ID: 2
server1.cern.ch: verifying versions of Oracle homes ...
server1.cern.ch: verifying owners of Oracle homes ...
server1.cern.ch: verifying groups of Oracle homes ...
server1.cern.ch: starting to move the Oracle Grid Infrastructure home from "/u01/crs/crs1830" to "/u01/crs/crs1860" on server cluster "AISTEST-RAC16"
[...]
2019/07/08 09:45:06 CLSRSC-329: Replacing Clusterware entries in file 'oracle-ohasd.service'
PRCG-1239 : failed to close a proxy connection
Connection refused to host: server1.cern.ch; nested exception is:
        java.net.ConnectException: Connection refused (Connection refused)
PRCG-1079 : Internal error: ClientFactoryImpl-submitAction-error1
PROC-32: Cluster Ready Services on the local node is not running Messaging error [gipcretConnectionRefused] [29]

But actually, the helper kept running and configured everything properly:

$ tail -f /ORA/dbs01/oracle/crsdata/server1/crsconfig/crs_postpatch_server1_2019-07-08_09-41-36AM.log
2019-07-08 09:55:25:
2019-07-08 09:55:25: Succeeded in writing the checkpoint:'ROOTCRS_POSTPATCH' with status:SUCCESS
2019-07-08 09:55:25: Executing cmd: /u01/crs/crs1860/bin/clsecho -p has -f clsrsc -m 672
2019-07-08 09:55:25: Executing cmd: /u01/crs/crs1860/bin/clsecho -p has -f clsrsc -m 672
2019-07-08 09:55:25: Command output:
>  CLSRSC-672: Post-patch steps for patching GI home successfully completed.
>End Command output
2019-07-08 09:55:25: CLSRSC-672: Post-patch steps for patching GI home successfully completed.

The cluster was OK on the first node, with the correct patch level. The second node, however, was failing with:

$  rhpctl move gihome -sourcehome /u01/crs/crs1830 -desthome /u01/crs/crs1860 -node server2
server1.cern.ch: retrieving status of databases ...
server1.cern.ch: retrieving status of services of databases ...
PRCT-1011 : Failed to run "rhphelper". Detailed error: <HLP_EMSG>,RHPHELP_procCmdLine-05,</HLP_EMSG>,<HLP_VRES>3</HLP_VRES>,<HLP_IEEMSG>,PRCG-1079 : Internal error: RHPHELP122_main-01,</HLP_IEEMSG>,<HLP_ERES>1</HLP_ERES>

I am not sure about the cause, but let’s assume it is irrelevant for the moment.

#2: A cluster with new GI home not properly linked with RAC

This was another funny case, where the first node patched successfully, but the second one failed in the middle of the process with a Java NullPointerException. We made a few unsuccessful attempts with prePatch and postPatch to fix it, but after that the second node of the cluster was in an inconsistent state: in ROLLING_UPGRADE mode and impossible to patch anymore.

Common solution: removing the node from the cluster and adding it back

In both cases we were in the following situation:

one node was successfully patched to 18.6
one node was not patched, and it was no longer possible to patch it (at least without heavy interventions)

So, for me, the easiest solution has been removing the failing node and adding it back with the new patched version.

Steps to remove the node

Although the steps are described here: https://docs.oracle.com/en/database/oracle/oracle-database/18/cwadd/adding-and-deleting-cluster-nodes.html#GUID-8ADA9667-EC27-4EF9-9F34-C8F65A757F2A, there are a few differences that I will highlight:

Stopping the cluster:

(root)# crsctl stop crs

The actual procedure to remove a node asks to deconfigure the databases and managed homes from the active cluster version. But as we manage our homes with golden images, we do not need this; we rather want to keep all the entries in the OCR so that when we add it back, everything is in place.

Once the CRS was stopped, we deinstalled the CRS home on the failing node:

(oracle)$ $OH/deinstall/deinstall -local

This complained that the CRS was down, but it continued and asked for this script to be executed:

/u01/crs/crs1830/crs/install/rootcrs.sh -force  -deconfig -paramfile "/tmp/deinstall2019-07-08_11-37-20AM/response/deinstall_1830.rsp"

We got errors from this script as well, but the removal process was OK after all.

Then, from the surviving node:

root # crsctl delete node -n server2
oracle $ srvctl stop vip -vip server2
root # srvctl remove vip -vip server2

Adding the node back

From the surviving node, we ran gridSetup.sh and followed the steps to add the node.

Wait before running root.sh.

In our case, we had originally installed the cluster starting with a SW_ONLY install. This type of installation keeps some leftovers in the configuration files that prevent root.sh from configuring the cluster, so we had to modify rootconfig.sh:

check/modify /u01/crs/crs1860/crs/config/rootconfig.sh and change this:
# before:
# SW_ONLY=true
# after:
SW_ONLY=false

Then, after running root.sh and the config tools, everything was back as it was before removing the node from the cluster.
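
The rootconfig.sh change can also be scripted when several clusters are affected. This is a minimal sketch, assuming the setting appears alone on a line exactly as SW_ONLY=true; the function name fix_sw_only and the .bak backup convention are hypothetical.

```shell
# Hypothetical helper: flips SW_ONLY=true to SW_ONLY=false in the given file,
# keeping a copy of the original as FILE.bak, then prints the resulting setting.
# Assumes the setting sits alone at the start of a line.
fix_sw_only() {
    file=$1
    [ -f "$file" ] || { echo "not found: $file" >&2; return 1; }
    cp -p "$file" "$file.bak" || return 1
    # Rewrite in place from the backup so that permissions are preserved
    sed 's/^SW_ONLY=true$/SW_ONLY=false/' "$file.bak" > "$file"
    grep '^SW_ONLY=' "$file"
}
```

It would be called with the path of rootconfig.sh in the new Grid home before running root.sh.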

For one of the clusters, both nodes were at the same patch level, but the cluster was still in ROLLING_PATCH mode. So we had to run:

(root) # crsctl stop rollingpatch
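
The state can be verified before and after that command. This is a minimal sketch, assuming that "crsctl query crs activeversion -f" prints a sentence of the form "The cluster upgrade state is [STATE]"; the exact wording is an assumption about the output format and may need adapting. The helper name cluster_upgrade_state is hypothetical.

```shell
# Hypothetical helper: extracts the upgrade state from the text printed by
# "crsctl query crs activeversion -f", passed on stdin.
# Assumes the text contains: The cluster upgrade state is [STATE]
cluster_upgrade_state() {
    sed -n 's/.*cluster upgrade state is \[\([^]]*\)].*/\1/p'
}
```

Piping the crsctl output into the function prints only the state, which makes it easy to test in a script.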

—

Ludo

Oracle SW_ONLY install leads to relink with rac_off at every attachHome

Posted on July 4, 2019 by Ludovico

OK, I really do not know what other title I should use for this post.

I have developed and presented a few times my personal approach to Oracle Home provisioning and patching. You can read more in this series.

With this approach:

I install the software (either GI or RDBMS) with the option SW_ONLY once
I patch it to the last version
I create a golden image that I evolve for the rest of the release lifecycle

When I need to install it, I just unzip the golden image and attach it to the Central Inventory.

I discovered quite a long time ago that, every time I attached the home to the inventory, the binaries were relinked with rac_off, regardless of the fact that the home I had zipped actually had RAC enabled. This is quite annoying in my work at CERN, as all our databases are RAC.

So my solution to the problem is to detect if the server is on a cluster, and relink on the fly:

### EARLIER, IN THE ENVIRONMENT SCRIPTS
if [ -f /etc/oracle/olr.loc ] ; then
        export CRS_EXISTS=1
else
        export CRS_EXISTS=0
fi

### LATER, AFTER ATTACHING THE ORACLE_HOME:
pushd "$ORACLE_HOME/rdbms/lib"
if [ "$CRS_EXISTS" -eq 1 ] ; then
        make -f ins_rdbms.mk rac_on
else
        make -f ins_rdbms.mk rac_off
fi
make -f ins_rdbms.mk ioracle
# return to the directory we started from
popd

This is a simplified snippet of my actual code, but it gives the idea.

What causes the relink with rac_off?

I have discovered recently that the steps used by the runInstaller process to attach the Oracle Home are described in this file:

$ORACLE_HOME/inventory/make/makeorder.xml

and in my case, for all my golden images, it contains:

<ohmd:MAKE
MAKEPATH="/usr/bin/make" FILENAME="rdbms/lib/ins_rdbms.mk" >
<ohmd:TARGET ACTIONTYPE="INSTALL" TARGETNAME="rac_off" >
<ohmd:INPUT_LIST>
<ohmd:INPUT VAL="ORACLE_HOME=%ORACLE_HOME%"/>
</ohmd:INPUT_LIST>
<ohmd:COMP_LIST>
<ohmd:COMP NAME="oracle.rdbms" VERSION="18.0.0.0.0"/>
</ohmd:COMP_LIST>
</ohmd:TARGET>
</ohmd:MAKE>

So, it does not matter how I prepare my images: unless I change this file and put rac_on, the runInstaller keeps relinking with rac_off.
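
To find out what a given image will do at attach time, the target recorded in makeorder.xml can be read directly. This is a minimal sketch, assuming the TARGETNAME attribute appears as in the excerpt above; the function name makeorder_rac_target is hypothetical.

```shell
# Hypothetical helper: prints the RAC-related make target (rac_on or rac_off)
# recorded in makeorder.xml. Defaults to the file under $ORACLE_HOME.
makeorder_rac_target() {
    file=${1:-$ORACLE_HOME/inventory/make/makeorder.xml}
    sed -n 's/.*TARGETNAME="\(rac_o[nf]*\)".*/\1/p' "$file"
}
```

For the golden images described here, it would print rac_off.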

I have thought about changing the file, but then realized that I prefer to check and recompile at runtime, so I can reuse my images also for standalone servers (in case we need them).

Just to avoid surprises, it is convenient to check whether an ORACLE_HOME is linked with RAC with this small function:

$ type isRACoh
isRACoh is a function
isRACoh ()
{
    OH2CHECK=${1:-$ORACLE_HOME};
    ar -t $OH2CHECK/rdbms/lib/libknlopt.a | grep --color=auto kcsm.o > /dev/null;
    if [ $? -eq 0 ]; then
        echo "Enabled";
    else
        echo "Disabled";
        false;
    fi
}

This is true especially for Grid Infrastructure golden images, as they have the very same behavior as RDBMS homes, with the exception that they might break out-of-place patching if RAC is not enabled: the second ASM instance will not mount because the first will be exclusively mounted without the RAC option.

HTH.

—

Ludovico

Oracle Grid Infrastructure 19c does not configure FPP in local-mode by default. How to add it?

Posted on June 13, 2019 by Ludovico

I have been installing Grid Infrastructure 18c for a while, then switched to 19c when it became GA.

At the beginning I was overly enthusiastic about the shorter installation time:

Grid Infra 19c install process is MUCH faster than 18c/12cR2. Mean time for 2 node clusters @ CERN (incl. volumes, puppet runs, etc.) lowered from 1h30 to 45mins. No GIMR anymore by default!

Ludovico Caldara (@ludodba), May 5, 2019

The GIMR is now optional: installing it is the customer’s choice, and a customer might want to keep it or not, depending on their practices.

Not having the GIMR by default means not having the local-mode automaton. This is also not a problem at all. The default configuration is good for most customers and works really well.

This new simplified configuration reduces some maintenance effort at the beginning, but personally I use the local-mode automaton a lot for out-of-place patching of Grid Infrastructure (read my blog posts to know why I really love the local-mode automaton), so it is something that I definitely need in my clusters.

A choice that makes sense for Oracle and most customers

Oracle’s vision for Grid Infrastructure consists of central management of clusters, using the Oracle Domain Services Cluster. In this kind of deployment, the Management Repository, TFA, and many other services are centralized. All the clusters use those services remotely instead of having them configured locally. The local-mode automaton is no exception: the full, enterprise-grade version of Fleet Patching and Provisioning (FPP, formerly Rapid Home Provisioning or RHP) allows much more than just out-of-place patching of Grid Infrastructure, so it makes perfect sense to avoid those configurations everywhere if you use a Domain Cluster architecture. Read more here.

Again, as I said many times in the past, doing out-of-place patching is the best approach in my opinion, but if you keep doing in-place patching, not having the local-mode automaton is not a problem at all and the default behavior in 19c is a good thing for you.

I need the local-mode automaton on 19c: what do I need to do at install time?

If you have many clusters, you are not installing them by hand with the graphical interface (hopefully!). In the responseFile for the 19c Grid Infrastructure installation, this is all you need to change compared to 18c:

$ diff grid_install_template_18.rsp grid_install_template_19.rsp
1c1
< oracle.install.responseFileVersion=/oracle/install/rspfmt_crsinstall_response_schema_v18.0.0
---
> oracle.install.responseFileVersion=/oracle/install/rspfmt_crsinstall_response_schema_v19.0.0
25a26
> oracle.install.crs.configureGIMR=true
27c28
< oracle.install.crs.config.storageOption=
---
> oracle.install.crs.config.storageOption=FLEX_ASM_STORAGE

As you can see, Flex ASM is also not part of the game by default in 19c.

Once you specify in the responseFile that you want GIMR, then the local-mode automaton is installed as well by default.
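
When the response files are generated from templates, it can be worth checking the two settings before launching an installation. This is a minimal sketch, assuming plain KEY=value lines as in the diff above; the function name rsp_value is hypothetical.

```shell
# Hypothetical helper: prints the value of KEY from a response file made of
# KEY=value lines. Prints nothing if the key is absent.
# Usage: rsp_value KEY FILE
rsp_value() {
    key=$1
    file=$2
    awk -F= -v k="$key" '$1 == k { print substr($0, length(k) + 2) }' "$file"
}
```

For a 19c template prepared as described, asking for oracle.install.crs.configureGIMR should print true.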

I installed GI 19c without GIMR and local-mode automaton. How can I add them to my new cluster?

First, recreate the empty MGMTDB CDB by hand:

$ dbca -silent -createDatabase -sid -MGMTDB -createAsContainerDatabase true \
 -templateName MGMTSeed_Database.dbc -gdbName _mgmtdb \
 -storageType ASM -diskGroupName +MGMT \
 -datafileJarLocation $OH/assistants/dbca/templates \
 -characterset AL32UTF8 -autoGeneratePasswords -skipUserTemplateCheck

Prepare for db operation
10% complete
Registering database with Oracle Grid Infrastructure
14% complete
Copying database files
43% complete
Creating and starting Oracle instance
45% complete
49% complete
54% complete
58% complete
62% complete
Completing Database Creation
66% complete
69% complete
71% complete
Executing Post Configuration Actions
100% complete
Database creation complete. For details check the logfiles at:
 /u01/app/oracle/cfgtoollogs/dbca/_mgmtdb.
Database Information:
Global Database Name:_mgmtdb
System Identifier(SID):-MGMTDB
Look at the log file "/u01/app/oracle/cfgtoollogs/dbca/_mgmtdb/_mgmtdb2.log" for further details.

Then, configure the PDB for the cluster. Pay attention to the -local switch that is not documented (or at least it does not appear in the inline help):

$ mgmtca -local

After that, you might check that the PDB for your cluster exists inside the MGMTDB; I’ll skip this step.

Before creating the rhpserver (local-mode automaton resource), we need the volume and filesystem to make it work (read here for more information).

The volume:

ASMCMD> volcreate -G MGMT -s 1536M --column 8 --width 1024k --redundancy unprotected GHCHKPT

ASMCMD> volinfo --all
Diskgroup Name: MGMT

         Volume Name: GHCHKPT
         Volume Device: /dev/asm/ghchkpt-303
         State: ENABLED
         Size (MB): 1536
         Resize Unit (MB): 64
         Redundancy: UNPROT
         Stripe Columns: 8
         Stripe Width (K): 1024
         Usage:
         Mountpath:

The filesystem:

(oracle)$ mkfs -t acfs /dev/asm/ghchkpt-303

(root)# $CRS_HOME/bin/srvctl add filesystem -d /dev/asm/ghchkpt-303 -m /opt/oracle/rhp_images/chkbase -u oracle -fstype ACFS
(root)# $CRS_HOME/bin/srvctl enable filesystem -volume ghchkpt -diskgroup MGMT
(root)# $CRS_HOME/bin/srvctl start filesystem -volume ghchkpt -diskgroup MGMT

Finally, create the local-mode automaton resource:

(root)# $CRS_HOME/bin/srvctl add rhpserver -local -storage /opt/oracle/rhp_images

Again, note that there is a -local switch that is not documented. Specifying it will create the resource as a local-mode automaton and not as a full FPP Server (or RHP Server, damn, this change of name gets me mad when I write blog posts about it 🙂 ).

HTH

—

Ludovico

Oracle Clusterware Services Status at a glance, fast!

Posted on March 20, 2019 by Ludovico

If you use Oracle Clusterware or you deploy your databases to the Oracle Cloud, you probably have some application services defined with srvctl for your database.

If you have many databases, services and nodes, it can be hard, when doing maintenance or service relocation, to get a quick overview of how services are distributed across the nodes and what their status is.

With srvctl (the official tool for that), it is a per-database operation:

$ srvctl status service
PRKO-2082 : Missing mandatory option -db

If you have many databases, you have to run it database by database.

It is also slow! For example, this database has 20 services. Getting the status takes 27 seconds:

# [ oracle@server1:/home/oracle/ [15:52:00] [11.2.0.4.0 [DBMS EE] SID=HRDEV1] 1 ] #
$ time srvctl status service -d hrdev_site1
Service SERVICE_NUMBER_01 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_02 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_03 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_04 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_05 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_06 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_07 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_08 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_09 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_10 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_11 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_12 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_13 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_14 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_15 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_16 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_17 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_18 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_19 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_20 is running on instance(s) HRDEV4

real    0m27.858s
user    0m1.365s
sys     0m1.143s

Instead of operating row-by-row (getting the status for each service), why not rely on the cluster resources with crsctl and get the big picture at once?

$ time crsctl stat res -f -w "(TYPE = ora.service.type)"
...
...

real    0m0.655s
user    0m0.169s
sys     0m0.098s

crsctl stat res -f returns a list of ATTRIBUTE_NAME=value pairs for each service, possibly more than one set if the service is not singleton/single instance but uniform/multi instance.

Parsing them with some awk code can provide nice results!
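
As a warm-up, here is a much smaller parser than the full function: a minimal sketch that prints one line per resource instance with its name, state and target. It assumes that every instance block starts with a NAME= line and that values contain no "=" sign; the function name res_summary is hypothetical.

```shell
# Hypothetical helper: reads ATTRIBUTE_NAME=value lines on stdin (as printed
# by "crsctl stat res -f") and prints "NAME STATE TARGET" for each block.
res_summary() {
    awk -F= '
        # print the block collected so far, if any
        function emit() { if (name != "") print name, state, target }
        $1 == "NAME"   { emit(); name = $2; state = ""; target = "" }
        $1 == "STATE"  { state = $2 }
        $1 == "TARGET" { target = $2 }
        END            { emit() }
    '
}
```

The output of the crsctl command shown above can be piped straight into it.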

STATE, INTERNAL_STATE and TARGET are useful in this case and might be used to display colours as well.

Green: Status ONLINE, Target ONLINE, STABLE
Black: Status OFFLINE, Target OFFLINE, STABLE
Red: Status OFFLINE, Target ONLINE, STABLE
Yellow: all other cases
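
The decision can be isolated in a tiny function. This is a minimal sketch that follows the branch order of the awk code in svcstat: anything not STABLE is yellow, a stable ONLINE service is green, and a stable offline service is black when its target is OFFLINE and red otherwise. The function name svc_colour is hypothetical.

```shell
# Hypothetical helper: maps the three attributes to a colour name.
# Usage: svc_colour STATE TARGET INTERNAL_STATE
svc_colour() {
    state=$1
    target=$2
    internal=$3
    if [ "$internal" != "STABLE" ]; then
        echo yellow      # starting, stopping, cleaning, etc.
    elif [ "$state" = "ONLINE" ]; then
        echo green       # online and stable
    elif [ "$target" = "OFFLINE" ]; then
        echo black       # offline on purpose
    else
        echo red         # should be online but is not
    fi
}
```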

Here’s the code:

if [ -f /etc/oracle/olr.loc ] ; then
        export ORA_CLU_HOME=`cat /etc/oracle/olr.loc 2>/dev/null | grep crs_home | awk -F= '{print $2}'`
        export CRS_EXISTS=1
        export CRSCTL=$ORA_CLU_HOME/bin/crsctl
else
        export CRS_EXISTS=0
fi

svcstat ()
{
    if [ $CRS_EXISTS -eq 1 ]; then
        ${CRSCTL} stat res -f -w "(TYPE = ora.service.type)" | awk -F= '
function print_row() {
        dbbcol="";
        dbecol="";
        instbcol="";
        instecol="";
        instances=res["INSTANCE_COUNT 1"];
        for(i=1;i<=instances;i++) {
                # if at least one of the services is online, the service is online (then I paint it green)
                if (res["STATE " i] == "ONLINE" ) {
                        dbbcol="\033[0;32m";
                        dbecol="\033[0m";
                }
        }
        # db unique name is always the second part of the resource name
        # because it does not change, I can get it once from the resource name
        res["DB_UNIQUE_NAME"]=substr(substr(res["NAME"],5),1,index(substr(res["NAME"],5),".")-1);

        # same for service name
        res["SERVICE_NAME"]=substr(res["NAME"],index(substr(res["NAME"],5),".")+5,length(substr(res["NAME"],index(substr(res["NAME"],5),".")+5))-4);

        #starting printing the first part of the information
        printf ("%s%-24s %-30s%s",dbbcol, res["DB_UNIQUE_NAME"], res["SERVICE_NAME"], dbecol);

        # here, instance need to map to the correct server.
        # the mapping is node by attribute TARGET_SERVER (not last server)
        for ( n in node ) {
                node_name=node[n];
                status[node_name]="";
                for (i=1; i<=instances; i++) {
                        # we are on the instance that matches the server
                        if (node_name == res["TARGET_SERVER " i]) {
                                res["SERVER_NAME " i]=node_name;
                                if (status[node_name] !~ "ONLINE") {
                                        # when a service relocates both instances get the survival target_server
                                        # but just one is ONLINE... so we need to get always the ONLINE one.
                                        #printf("was::%s:", status[node_name]);
                                        status[node_name]=res["STATE " i];
                                }

                                # colors modes
                                if ( res["STATE " i] == "ONLINE" && res["INTERNAL_STATE " i] == "STABLE" ) {
                                        # online and stable: GREEN
                                        status[node_name]=sprintf("\033[0;32m%-14s\033[0m", status[node_name]);
                                }
                                else if ( res["STATE " i] != "ONLINE" && res["INTERNAL_STATE " i] == "STABLE" ) {
                                        # offline and stable
                                        if ( res["TARGET " i] == "OFFLINE" ) {
                                                # offline, stable, target offline: BLACK
                                                status[node_name]=sprintf("%-14s", status[node_name]);
                                        }
                                        else {
                                                # offline, stable, target online: RED
                                                status[node_name]=sprintf("\033[0;31m%-14s\033[0m", status[node_name]);
                                        }
                                }
                                else {
                                        # all other cases: offline and starting, online and stopping, cleaning, etc.: YELLOW
                                        status[node_name]=sprintf("\033[0;33m%-14s\033[0m", status[node_name]);
                                }
                                #printf("%s %s %s %s\n", status[node_name], node[n], res["STATE " i], res["INTERNAL_STATE " i]);
                        }
                }
               printf(" %-14s", status[node_name]);
        }
        printf("\n");
}
function pad (string, len, char) {
        ret = string;
        for ( i = length(string); i<len ; i++) {
                ret = sprintf("%s%s",ret,char);
        }
        return ret;
}
BEGIN {
        debug = 0;
        first = 1;
        afterempty=1;
        # this loop should set:
        # node[1]=server1; node[2]=server2; nodes=2;
        nodes=0;
        while ("olsnodes" | getline a) {
                nodes++;
                node[nodes] = a;
        }
        fmt="%-24s %-30s";
        printf (fmt, "DB_Unique_Name", "Service_Name");
        for ( n in node ) {
                printf (" %-14s", node[n]);
        }
        printf ("\n");
        printf (fmt, pad("",24,"-"), pad("",30,"-"));
        for ( n in node ) {
                printf (" %s", pad("",14,"-"));
        }
        printf ("\n");

}
# MAIN awk svcstat
{
        if ( $1 == "NAME" ) {
                if ( first != 1 && res["NAME"] == $2 ) {
                        if ( debug == 1 ) print "Secondary instance";
                        instance++;
                }
                else {
                        if ( first != 1 ) {
                                print_row();
                        }
                        first = 0;
                        instance=1;
                        delete res;
                        res["NAME"] = $2;
                }
        }
        else  {
                res[$1 " " instance] = $2 ;

        }
}
END {
        #if ( debug == 1 ) for (key in res) { print key ": " res[key] }
        print_row();
}
';
    else
        echo "svcstat not available on non-clustered environments";
        false;
    fi
}

Here’s what you can expect for 92 services distributed across 4 nodes and a dozen databases (the output is snipped and the names are masked):

$ time svcstat
DB_Unique_Name     Service_Name       server1  server2  server3  server4
------------------ ------------------ -------- -------- -------- --------
hrdev_site1        SERVICE_NUMBER_01                             ONLINE
hrdev_site1        SERVICE_NUMBER_02                             ONLINE
...
hrdev_site1        SERVICE_NUMBER_20                             ONLINE
hrstg_site1        SERVICE_NUMBER_21                    ONLINE  
hrstg_site1        SERVICE_NUMBER_22                    ONLINE  
...
hrstg_site1        SERVICE_NUMBER_41                    ONLINE  
hrtest_site1       SERVICE_NUMBER_42           ONLINE           
hrtest_site1       SERVICE_NUMBER_43           ONLINE           
...
hrtest_site1       SERVICE_NUMBER_62           ONLINE           
hrtest_site1       SERVICE_NUMBER_63           ONLINE           
hrtest_site1       SERVICE_NUMBER_64           ONLINE           
hrtest_site1       SERVICE_NUMBER_65           ONLINE           
hrtest_site1       SERVICE_NUMBER_66           ONLINE           
erpdev_site1       SERVICE_NUMBER_67  ONLINE                    
erptest_site1      SERVICE_NUMBER_68  ONLINE                    
cmsstg_site1       SERVICE_NUMBER_69  ONLINE                    
cmsstg_site1       SERVICE_NUMBER_70  ONLINE                    
...
cmsstg_site1       SERVICE_NUMBER_74  ONLINE                    
cmsstg_site1       SERVICE_NUMBER_75  ONLINE                    
cmstest_site1      SERVICE_NUMBER_76  ONLINE                    
...
cmstest_site1      SERVICE_NUMBER_81  ONLINE                    
kbtest_site1       SERVICE_NUMBER_82                    ONLINE           
...
kbtest_site1       SERVICE_NUMBER_84                    ONLINE           
reporting_site1    SERVICE_NUMBER_85  ONLINE                    
paydev_site1       SERVICE_NUMBER_86           ONLINE           
payrep_site1       SERVICE_NUMBER_87           ONLINE           
...
paytest_site1      SERVICE_NUMBER_90           ONLINE           
paytest_site1      SERVICE_NUMBER_91           ONLINE           
crm_site1          SERVICE_NUMBER_92                             ONLINE

real    0m0.358s
user    0m0.232s
sys     0m0.134s
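
If you want to check the coloring rules without a cluster at hand, here is a minimal sketch that applies the same three-way decision as print_row to hand-written input. The classify_states function and its space-separated input format (STATE, INTERNAL_STATE, TARGET) are hypothetical and exist only for this illustration; they are not part of svcstat and do not call crsctl.

```shell
#!/bin/sh
# Hypothetical helper (not part of svcstat): reads lines of the form
#   STATE INTERNAL_STATE TARGET
# and prints the color that the rules in print_row would pick.
classify_states () {
    awk '
    function color(state, internal, target) {
        # online and stable
        if (state == "ONLINE" && internal == "STABLE") return "GREEN";
        # not online but stable: the color depends on the target
        if (state != "ONLINE" && internal == "STABLE")
            return (target == "OFFLINE") ? "BLACK" : "RED";
        # anything not stable (starting, stopping, cleaning, ...)
        return "YELLOW";
    }
    { print color($1, $2, $3); }
    '
}

# Sample triplets, one per rule
printf "%s\n" \
    "ONLINE STABLE ONLINE" \
    "OFFLINE STABLE OFFLINE" \
    "OFFLINE STABLE ONLINE" \
    "OFFLINE STARTING ONLINE" | classify_states
```

The sample input has one line per rule, so it is easy to see that a service which is offline on purpose stays uncolored, while one that should be running but is not gets flagged in red.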

I’d be curious to know whether it works well in your environment. Please leave a comment here. 🙂

Thanks

—

Ludo