New views in Oracle Data Guard 23c

Oracle Data Guard 23c comes with many nice improvements for observability, which greatly increase the usability of Data Guard in environments with a high level of automation.

For the 23c version, we have the following new views.V$DG_BROKER_ROLE_CHANGE

This view tracks the last role transitions that occurred in the configuration. Example:

The event might be a Switchover, Failover, or Fast-Start Failover.

In the case of Fast-Start Failover, you will see the reason (typically “Primary Disconnected” if it comes from the observer, or whatever reason you put in DBMS_DG.INITIATE_FS_FAILOVER.

No more need to analyze the logs to find out which database was primary at any moment in time!

V$DG_BROKER_PROPERTY

Before 23c, the only possible way to get a broker property from SQL was to use undocumented (unsupported) procedures in the fixed package DBMS_DRS. I’ve blogged about it in the past, before joining Oracle.

Now, it’s as easy as selecting from a view, where you can get the properties per member or per configuration:

The example selects just three columns, but the view is rich in detailing which properties apply to which situation (scope, valid_role):

The monitorable properties can be monitored using DBMS_DG.GET_PROPERTY(). I’ll write a blog post about the new PL/SQL APIs in the upcoming weeks.

I wish I had this view when I was a DBA 🙂

V$FAST_START_FAILOVER_CONFIG

If you have a Fast-Start Failover configuration, this view will show its details:

This view replaces some columns currently in v$database, that are therefore deprecated:

V$FS_LAG_HISTOGRAM

This view is useful to calculate the optimal FastStartFailoverLagTime.

It shows the frequency of Fast-Start Failover lags and the most recent occurrence for each bucket.

LAG_TIME is the upper bound of the bucket, e.g.

  • 5 -> between 0 and 5 seconds
  • 10 -> between 5 and 10 seconds
  • etc.

It’s refreshed every minute, only when Fast-Start Failover is enabled (also in observe-only mode).

V$FS_FAILOVER_OBSERVERS

This view is not new, however, its definition now contains more columns:

This gives important additional information about the observers, for example, the last time a specific observer was able to ping the primary or the target (in seconds).

Also, the path of the log file and runtime data file are available, making it easier to find them on the observer host in case of a problem.

Conclusion

These new views should greatly improve the experience when monitoring or diagnosing problems with Data Guard. But they are just a part of many improvements we introduced in 23c. Stay tuned for more 🙂

Ludovico

New in Data Guard 21c and 23c: Automatic preparation of the primary

Oracle Data Guard 21c came with a new command:

The command prepare database for data guard automatically sets parameters and creates standby redo logs according to best practices.This command prepares a database to become primary in a Data Guard configuration.

It sets many recommended parameters:

Sets the RMAN archive deletion policy, enables flashback and force logging, creates the standby logs according to the online redo logs configuration, and creates an spfile if the database is running with an init file.

If you tried this in 21c, you have noticed that there is an automatic restart of the database to set all the static parameters. If you weren’t expecting this, the sudden restart could be a bit brutal approach.

In 23c, we added an additional keyword “restart” to specify that you are OK with the restart of the database. If you don’t specify it, the broker will complain that it cannot proceed without a restart:

If you specify it, it will proceed with the restart:

Notice that if you already have these static parameters set, the broker will just set the missing dynamic parameters without the need for a restart:

This new command greatly simplifies the preparation of a Data Guard configuration!

Before 21c, you had to do everything by hand.

Ludo

When it comes to using Oracle, trust Oracle…

A month ago, I saw this article published on the AWS architecture blog:

Disaster Recovery for Oracle Database on Amazon EC2 with Fast-Start Failover

I love seeing people suggesting Oracle Data Guard Fast-Start Failover for high availability. Nevertheless, there are a few problems with the architecture and steps proposed in the article.

I sent my comments via Disqus on the AWS blogging platform, but after a month, my comment was rejected, and the blog content hasn’t changed.

For this reason, I don’t have other places to post my comment but here…

  1. The link to the setup procedure is from 2009.
    We have official documentation that we keep up to date. The Fast-Start Failover part:
    https://docs.oracle.com/en/database/oracle/oracle-database/19/dgbkr/using-data-guard-broker-to-manage-switchovers-failovers.html#GUID-D26D79F2-0093-4C0E-98CD-224A5C8CBFA4
    and the Best Practices guide:
    https://docs.oracle.com/en/database/oracle/oracle-database/19/haovw/oracle-data-guard-best-practices.html#GUID-C3A78B07-6584-4380-8D53-E5B831A5894C
  2. The part about cascading standbys references a step-by-step guide from an external blog written many years ago for 11gR2.
  3. The DBMS_SERVICE doc is from 12cR1, while other links are from 21c doc or 19c doc. As of today, most implement 19c. That’s probably the version to use.
    https://docs.oracle.com/en/database/oracle/oracle-database/19/arpls/DBMS_SERVICE.html#GUID-C11449DC-EEDE-4BB8-9D2C-0A45198C1928
  4. The steps used to create the database service do not include any HA property, which will make most efforts useless. (see Table 153-6 in the link above).
  5. The article talks about TAF, but no steps exist to configure it. We don’t recommend TAF since 12c anyway. Today (19c), the recommendation is TAC (Transparent Application Continuity).
    https://www.oracle.com/docs/tech/application-checklist-for-continuous-availability-for-maa.pdf
  6. But, most important, TAF (or Oracle connectivity in general) does NOT require a host IP change! There is no need to change the DNS when using the recommended connection string with multiple address_lists.
  7. Some RedoRoutes examples are not correct. In this video I explain how they work and how to set them up:
    https://www.youtube.com/watch?v=huG8JPu_s4Q
  8. The diagram shows the master observer together with the standby database, which is a bad practice. I explain why and how here:
    https://www.youtube.com/watch?v=e81UPLfnLi0

The central message is:

If you need to implement a complex architecture using a software solution, pay attention that the practices suggested by the partner/integrator/3rd party match the ones from the software vendor. In the case of Oracle Data Guard, Oracle knows better 😉

Cheers

Ludovico

 

Video: Where should I put the Observer in a Fast-Start Failover configuration?

The video explains best practices and different failure scenarios for different observer placements. It also shows how to configure high availability for the observer.

Here’s the summary:

  • Always try to put the observer(s) on an external site.
  • If you don’t have any, put it where the primary database is, and have one ready on the secondary site after the role transition.
  • Don’t put the observer together with the standby database!
  • Configure multiple observers for high availability, and use the PreferredObserverHosts Data Guard member property to ensure you never run the observer where the standby database is.

 

Far Sync and Fast-Start Failover Protection modes

Oracle advertises Far Sync as a solution for “Zero Data Loss at any distance”. This is because the primary sends its redo stream synchronously to the Far Sync, which relays it to the remote physical standby.

There are many reasons why Far Sync is an optimal solution for this use case, but that’s not the topic of this post 🙂

Some customers ask: Can I configure Far Sync to receive the redo stream asynchronously?

Although a direct standby receiving asynchronously would be a better idea, Far Sync can receive asynchronously as well.

And one reason might be to send asynchronously to one Far Sync member that redistributes locally to many standbys.

It is very simple to achieve: just changing the RedoRoutes property on the primary.

This will work seamlessly. The v$dataguard_process will show the async transport process:

 

What about Fast-Start Failover?

Up to and including 19c, ASYNC transport to Far Sync will not work with Fast-Start Failover (FSFO).

ASYNC redo transport mandates Maximum Performance protection mode, and FSFO supports that in conjunction with Far Sync only starting with 21c.

Before 21c, trying to enable FSFO with a Far Sync will fail with:

So if you want FSFO with Far Sync in 19c, it has to be MaxAvailability (and SYNC redo transport to the FarSync).


If you don’t need FSFO, as we have seen, there is no problem. The only protection mode that will not work with Far Sync is Maximum Protection:

If FSFO is required, and you want Maximum Performance before 21c, or Maximum Protection, you have to remove Far Sync from the redo route.

Ludovico

Can a physical standby database receive the redo SYNC if the Far Sync instance fails?

The answer is YES.

In the following configuration, cdgsima_lhr1pq (primary) sends synchronously to cdgsima_farsync1 (far sync), which forwards the redo stream asynchronously to cdgsima_lhr1bm (physical standby):

But if cdgsima_farsync1 is not available, I want the primary to send synchronously to the physical standby database. I accept a performance penalty, but I do not want to compromise my data protection.

I just need to set up the Redoroutes as follows:

This is defined the second part of the RedoRoutes rules:

Let’s test. If I shutdown abort the farsync instance:

I can see the new SYNC destination being open almost instantaneously (because the old destination fails immediately with ORA-03113):

Indeed, I can see the new NSS process (synchronous redo transport) spawn at that time:

Ludo

Can I rename a PDB in a Data Guard configuration?

Someone asked me this question recently.

The answer is: yes!

Let’s see it in action.

On the primary I have:

And of course the same PDBs on the standby:

Let’s change the PDB RED name to TOBY: The PDB rename operation is straightforward (but it requires a brief downtime). To be done on the primary:

On the standby, I can see that the PDB changed its name:

The PDB name change is propagated transparently with the redo apply.

Ludo

Data Guard, Easy Connect and the Observer for multiple configurations

EZConnect

One of the challenges of automation in bin Oracle Environments is dealing with tnsnames.ora files.
These files might grow big and are sometimes hard to distribute/maintain properly.
The worst is when manual modifications are needed: manual operations, if not made carefully, can screw up the connection to the databases.
The best solution is always using LDAP naming resolution. I have seen customers using OID, OUD, Active Directory, openldapd, all with a great level of control and automation. However, some customer don’t have/want this possibility and keep relying on TNS naming resolution.
When Data Guard (and eventually RAC) are in place, the tnsnames.ora gets filled by entries for the DGConnectIdentifiers and StaticConnectIdentifier. If I add the observer, an additional entry is required to access the dbname_CFG service created by the Fast Start Failover.

Actually, all these entries are not required if I use Easy Connect.

My friend Franck Pachot wrote a couple of nice blog posts about Easy Connect while working with me at CERN:
https://medium.com/@FranckPachot/19c-easy-connect-e0c3b77968d7

https://medium.com/@FranckPachot/19c-ezconnect-and-wallet-easy-connect-and-external-password-file-8e326bb8c9f5

Basic Data Guard configuration

The basic configuration with Data Guard is quite simple to achieve with Easy Connect. In this examples I have:
– The primary database TOOLCDB1_SITE1
– The duplicated database for standby TOOLCDB1_SITE2

After setting up the static registration (no Grid Infrastructure in my lab):

and copying the passwordfile, the configuration can be created with:

That’s it.

Now, if I want to have the configuration observed, I need to activate the Fast Start Failover:

With just two databases, FastStartFailoverTarget is not explicitly needed, but I usually do it as other databases might be added to the configuration in the future.
After that, the broker complains that FSFO is enabled but there is no observer yet:

 

Observer for multiple configurations

This feature has been introduced in 12.2 but it is still not widely used.
Before 12.2, the Observer was a foreground process: the DBAs had to start it in a wrapper script executed with nohup in order to keep it live.
Since 12.2, the observer can run as a background process as far as there is a valid wallet for the connection to the databases.
Also, 12.2 introduced the capability of starting multiple configurations with a single dgmgrl command: “START OBSERVING”.

For more information about it, you can check the documentation here:
https://docs.oracle.com/en/database/oracle/oracle-database/19/dgbkr/using-data-guard-broker-to-manage-switchovers-failovers.html#GUID-BC513CDB-1E06-4EB3-9FE1-E1331E15E492

How to set it up with Easy Connect?

First, I need a wallet. And here comes the first compromise:
Having a single dgmgrl session to start all my configurations means that I have a single wallet for all the databases that I want to observe.
Fair enough, all the DBs (CDBs?) are managed by the same team in this case.
If I have only observers on my host I can easily point to the wallet from my central sqlnet.ora:

Otherwise I need to create a separate TNS_ADMIN for my observer management environment.
Then, I create the wallet:

Now I need to add the connection descriptors.

Which connection descriptors do I need?
The Observer uses the DGConnectIdentifier to keep observing the databases, but needs a connection to both of them using the TOOLCDB1_CFG service (unless I specify something different with the broker configuration property ConfigurationWideServiceName) to connect to the configuration and get the DGConnectIdentifier information. Again, you can check it in the doc. or the note Oracle 12.2 – Simplified OBSERVER Management for Multiple Fast-Start Failover Configurations (Doc ID 2285891.1)

So I need to specify three secrets for three connection descriptors:

The first one will be used for the initial connection. The other two to observe the Primary and Standby.
I need to be careful that the first EZConnect descriptor matches EXACTLY what I put in observer.ora (see next step) and the last two match my DGConnectIdentifier (unless I specify something different with ObserverConnectIdentifier), otherwise I will get some errors and the observer will not observe correctly (or will not start at all).

The dgmgrl needs then a file named observer.ora.
$ORACLE_BASE/admin/observers or the central TNS_ADMIN would be good locations, but what if I have observers that must be started from multiple Oracle Homes?
In that case, having a observer.ora in $ORACLE_HOME/network/admin (or $ORACLE_BASE/homes/{OHNAME}/network/admin/ if Read-Only Oracle Home is enabled) would be a better solution: in this case I would need to start one session per Oracle Home

The content of my observer.ora must be something like:

This is the example for my configuration, but I can put as many (CONFIG=…) as I want in order to observe multiple configurations.
Then, if everything is configured properly, I can start all the observers with a single command:

Troubleshooting

If the observer does not work, sometimes it is not easy to understand the cause.

  • Has SYSDG been granted to SYSDG user? Is SYSDG account unlocked?
  • Does sqlnet.ora contain the correct wallet location?
  • Is the wallet accessible in autologin?
  • Are the entries in the wallet correct? (check with “sqlplus /@connstring as sysdg”)

Missing pieces

Here, a few features that I think would be a nice addition in the future:

  • Awareness for the ORACLE_HOME to be used for each observer
  • Possibility to specify a different TNS_ADMIN per observer (different wallets)
  • Integration with Grid Infrastructure (srvctl add observer…) and support for multiple observers

Ludovico

Script to check Data Guard status from SQL

In a previous blog post I have explained how to get the basic configuration from x$drc and display something like:

There are other possibilities, by using the DBMS_DRS PL/SQL package.

The package is quite rich. In order to get more details, I use CHECK_CONNECT to check the connectivity to the member databases:

example:

In the first case I get no exceptions, that means that the database is reachable using the DGConnectIdentifier specified in the configuration (‘TOOLCDB1_SITE2’ is my database name in the configuration, it is NOT a TNS entry. I use EZConnect in my lab).

In the second case I specify a database that is not in the configuration.

In the third case, it looks like the database is down (no service), or the DGConnectIdentifier is not correct.

 

GET_PROPERTY_OBJ is useful to get e single property of a database/instance:

Example:

Here I have, for the primary (the object_id from x$drc), a TransportLagThreshold of 30 seconds.

DO_CONTROL does a specific check and returns a document with the results:

The problem is… what’s the format for indoc?

To get the correct format, I have enabled sql trace to get the executions, with bind variables, of the dgmgrl commands. It happens that the input format is XML and the output format is HTML.

This is how you can get the LogXptStatus, for example:

 

The big script

So I said… why not trying to have a comprehensive SQL script to check a few vital statuses of Data Guard?

This is the script that came out:

Of course, it is not perfect (many checks missing: FSFO readiness, observer checks, etc.), but it is good enough for base monitoring. Also, it’s faster than a normal shell+dgmgrl script.

Output on a Primary database:

Output on a standby database:

In case of errors (e.g. standby listener stopped), I would get:

So easy to spot the error and use a shell wrapper to grep ^ERROR or similar.

Be careful, the script is not RAC aware, and it lacks some checks, so you might want to reuse it and extend it to fit your exact configuration.

Hope you like it!

Ludovico