Data Guard 26ai – #5: Fast-Start Failover Observer Priority

This post is part of a blog series.

Technically, this is a 21c feature, but it’s worth calling out some of those 21c improvements, because most customers are still running on 19c or earlier. That means when you upgrade to 26ai, you’ll pick up all the 21c goodies too!

Here’s one: Fast-Start Failover observer priority. In 19c, you could list preferred observers for each possible primary with the property PreferredObserverHosts, but you couldn’t actually assign an observer priority based, for example, on the observer’s location.

21c fixes that. Now, you can give each observer a priority by adding a colon and a number right after the hostname. The lower the number, the higher the priority. This lets you spell out exactly which observers should be chosen first if a promotion is needed.
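To make the ordering rule concrete, here is a small Python sketch. It is only an illustration of "lower number wins", not the broker's actual selection logic. The host names are hypothetical, and the handling of entries without a priority (sorted last) is an assumption made for this example.

```python
def parse_preferred_observers(value):
    """Parse a 'host:priority,host:priority' string into hosts ordered by priority.

    Lower numbers come first. Entries without a priority are placed last
    (an assumption for this sketch, not documented broker behavior).
    """
    entries = []
    for position, item in enumerate(value.split(",")):
        item = item.strip()
        if not item:
            continue
        host, sep, prio = item.partition(":")
        priority = int(prio) if sep else float("inf")
        # 'position' keeps the original order for entries with equal priority
        entries.append((priority, position, host.strip()))
    entries.sort()
    return [host for _, _, host in entries]


def pick_observer(value, running_hosts):
    """Return the highest-priority host that is currently running, or None."""
    for host in parse_preferred_observers(value):
        if host in running_hosts:
            return host
    return None


# Hypothetical hosts: the external observer is preferred, the local one is the backup.
preferred = "obs-site1:2, obs-ext:1"
print(parse_preferred_observers(preferred))      # ['obs-ext', 'obs-site1']
print(pick_observer(preferred, {"obs-site1"}))   # obs-site1
```

If the external observer is unavailable, the sketch falls back to the next host in priority order, which is the behavior the priorities are meant to express.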

The diagram below shows an example: the external site’s observer is set as the top pick for both databases at the primary and the secondary sites, with each site’s local observer as the backup (which is our recommendation in this case).

[Diagram: two databases, one per site, each with its top preferred observer on an external site and a backup observer on its local site.]

You can create that setup now—thanks to observer priorities.

Data Guard 26ai – #4: Faster DML Redirection

This post is part of a blog series.

Before Oracle 26ai, Active Data Guard’s DML redirection was significantly slower than DMLs executed directly on the primary database. When your app ran DML on the standby, the changes had to be executed on the primary and then returned and applied on the standby before your session could continue. That led to unnecessary pauses, with sessions often waiting on the “standby query scn advance” wait event.

Most of that waiting isn’t needed. Theoretically, you only have to wait if your session needs to commit or read the updated data.

Oracle AI Database 26ai fixes this. Now, once DML succeeds on the primary, your session can continue with the next statement (or commit) without waiting. The only wait required to keep ACID consistency is upon commit or read. With this brilliant change, redirected transactions are up to 33 times faster in our internal tests compared to 19c.

This new behavior is on by default, but if you prefer the old way, you can set the hidden parameter “_alter_adg_redirect_behavior” to “sync_each_dml”.

The chart below shows the difference. We tested both 19c and 26ai (primary and standby) with SwingBench running 16 concurrent order-entry sessions, each doing a mix of “newCustomerProcess” and “browseProducts” operations.

[Chart: in 26ai, redirected DML transactions are 13x faster in mixed workloads with 5% writes, and 33x faster in mixed workloads with 25% writes.]

Data Guard 26ai – #3: Choice of Lag Type for Fast-Start Failover

This post is part of a blog series.

In Data Guard, when Fast-Start Failover runs in Maximum Performance mode, the FSFP process tracks lag to keep the primary database within safe limits.
Traditionally, FSFP used APPLY LAG to measure lag, a legacy from older redo transport. But APPLY LAG may not reflect real data loss risk: TRANSPORT LAG shows how much data hasn’t reached the standby.
With 26ai, you can set FastStartFailoverLagType to APPLY (default) or TRANSPORT.
Consider switching this property to TRANSPORT to track real data loss exposure.
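The difference between the two lag types can be illustrated with a simplified Python model. This is a sketch under assumptions: the three timestamps and the 30-second limit are hypothetical, and the functions are not how the broker measures lag internally.

```python
from datetime import datetime, timedelta


def redo_lags(primary_latest, standby_received, standby_applied):
    """Return (transport_lag, apply_lag) for a simplified model.

    transport lag: redo generated on the primary but not yet received by the standby
    apply lag:     redo generated on the primary but not yet applied by the standby
    """
    zero = timedelta(0)
    transport = max(primary_latest - standby_received, zero)
    apply = max(primary_latest - standby_applied, zero)
    return transport, apply


def exceeds_limit(lag, limit_seconds):
    """True when the lag is above the configured limit (in seconds)."""
    return lag.total_seconds() > limit_seconds


# Hypothetical situation: redo arrives quickly, but apply is behind.
now = datetime(2025, 1, 1, 12, 0, 40)
transport, apply = redo_lags(
    primary_latest=now,
    standby_received=now - timedelta(seconds=2),
    standby_applied=now - timedelta(seconds=35),
)

limit = 30  # hypothetical lag limit in seconds
print(exceeds_limit(apply, limit))      # True:  APPLY lag is over the limit
print(exceeds_limit(transport, limit))  # False: only 2 seconds of redo are missing
```

In this example, the standby has already received almost all the redo, so the potential data loss is about two seconds even though the apply lag is much higher.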

[Table: before 26ai, APPLY is the only lag type; in 26ai, it can be APPLY or TRANSPORT.]

Data Guard 26ai – #2: Minimized Stall in Maximum Performance

This post is part of a blog series.

Did you know that Oracle Data Guard fast-start failover in maximum performance mode can briefly stall your primary database?

Most users don’t notice, but in environments with strict performance needs, these stalls can matter.

Here’s why: when the database shifts from “UNDER LAG” to “OVER LAG” status, the primary waits for the observer’s acknowledgment. This pause ensures the database doesn’t breach its recovery point objective set with the FastStartFailoverLagLimit property, but it can last up to three seconds (default observer ping time).

[Diagram: when transitioning to “OVER LAG”, the primary stalls waiting for the observer’s acknowledgment.]

Stalls are especially common if the standby can’t keep up with primary redo generation, causing frequent state transitions.

There's a grace period where the primary asks the observer to pre-acknowledge the state change before stalling.
Now in 26ai, the new FastStartFailoverLagGraceTime property lets the observer acknowledge a “pre-stall” before reaching the real lag limit. That way, when the database hits the actual limit, it won’t need to pause: the acknowledgment is already done.
This simple change removes stalls during state transitions, so even the strictest environments meet their performance goals.

What you need to do: set FastStartFailoverLagGraceTime to a value greater than 0 and less than or equal to 3 to make this feature effective. The default of 0 keeps the old behavior.
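As a rough mental model, the idea can be sketched in Python. The valid range check (greater than 0, at most 3) comes from the text above. The exact point at which the pre-acknowledgment is requested (here: once the lag is within the grace time of the limit) is an assumption made for illustration, and the state name "PRE-ACKNOWLEDGED" is hypothetical.

```python
def grace_time_is_effective(grace_time):
    """The feature is effective for values greater than 0 and at most 3; 0 keeps the old behavior."""
    return 0 < grace_time <= 3


def lag_state(lag, lag_limit, grace_time=0):
    """Classify the current lag (all values in seconds).

    Assumption for this sketch: the pre-acknowledgment window starts
    'grace_time' seconds before the lag limit is reached.
    """
    if lag > lag_limit:
        return "OVER LAG"
    if grace_time_is_effective(grace_time) and lag >= lag_limit - grace_time:
        return "PRE-ACKNOWLEDGED"  # hypothetical name for the pre-stall window
    return "UNDER LAG"


# With a 30-second limit and a 2-second grace time:
print(lag_state(10, 30, grace_time=2))  # UNDER LAG
print(lag_state(29, 30, grace_time=2))  # PRE-ACKNOWLEDGED
print(lag_state(31, 30, grace_time=2))  # OVER LAG
# With the default of 0 there is no pre-acknowledgment window:
print(lag_state(29, 30))                # UNDER LAG
```

In this model, the transition to "OVER LAG" with a grace time always passes through the pre-acknowledged window first, which is why the primary no longer has to wait for the observer at the limit itself.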

 

Data Guard 26ai – #1: Faster role transitions

This post is part of a blog series.
I’ve already blogged about it on the official Oracle MAA Blog (read here), but it bears repeating.

Role transitions (switchover, failover) are much faster in Oracle Data Guard 26ai.

Depending on the configuration and workload, they can be up to five times faster! No changes to the application code or configuration: you get this improvement out of the box.

Here’s an example of two identical configurations using 19.29 and 23.26.1, one PDB, and no application services (basically, an empty database):

Switchover in 19.29

Total: ~44 seconds

Switchover in 23.26.1

Total: < 20 seconds

😎

Mini-blog series: Oracle Data Guard 26ai new features

Thank you for your patience: Oracle AI Database 26ai is now available on Linux x86_64 systems!
This release delivers many new features, including key updates to Data Guard and Active Data Guard: two areas I track as a product manager.
Some improvements aren’t listed in the feature guide, so I’m launching a daily series of brief blog posts over the next month. Each one will spotlight a practical change or enhancement you can try right away.

  1. Faster role transitions
  2. Minimized Stall in Maximum Performance
  3. Choice of Lag Type for Fast-Start Failover
  4. Faster DML Redirection
  5. Fast-Start Failover Observer Priority (21c)
  6. Up to four Fast-Start Failover Observers (21c)
  7. Rolling Upgrade with Application Continuity
  8. Multiple ASYNC connections
  9. Automatic preparation of primary and standby
  10. Data Guard Broker PL/SQL API
  11. SQLcl support for Data Guard commands
  12. ORDS support for Data Guard
  13. Show / edit all members at once
  14. JSON output for DGMGRL
  15. Prevent standby databases from becoming primary
  16. Configuration and member tagging
  17. Automatic standby tempfile creation
  18. PDB Recovery Isolation
  19. Easy AWR snapshots on the standby
  20. Strict database validation
  21. Switchover and Failover Readiness
  22. Easier tracking of role transitions
  23. Easier checking of Data Guard configurations
  24. New command: VALIDATE DGConnectIdentifier
  25. Easier checking of Fast-Start Failover configurations
  26. Fast-Start Failover Lag Histogram
  27. Enhanced observer diagnostic
  28. Fast Start Failover Configuration Validation
  29. Offload AI Inference and Vector Search to Oracle Active Data Guard

Announcing ADG-Topology

Today is my birthday! 🎂🍾

📣 I’m excited to announce the release of my personal project, ADG-Topology: a free, open-source tool to help you create RedoRoutes rules by drawing Oracle Active Data Guard topologies. This is not an official Oracle product: it’s available “AS IS” under the MIT license.

What is ADG-Topology?

Draw Active Data Guard Topologies

ADG-Topology is a web application built with ReactFlow. It lets you easily design complex Active Data Guard diagrams, then automatically generates DGMGRL commands to create the corresponding RedoRoutes rules.

Get Active Data Guard RedoRoutes generated automatically

Key Features

  • Draw topologies that include cascaded physical standbys, Far Sync, and ZDLRA
  • Model different redo topologies for every possible primary
  • Support alternate destinations
  • Perform basic validation (see the GitHub repository for details)
  • 100% browser-based: your data never leaves your computer
  • Auto-generate RedoRoutes rules
  • Export and import reusable JSON files

Try it out!

Get started at ludovicocaldara.net/adg-topology. There’s a direct link in my blog’s menu.
Want to build or customize it? Visit the GitHub repository for setup instructions and details on contributing.
I hope ADG-Topology makes your Active Data Guard projects easier and more efficient.

Happy diagramming!

Reinstate an Oracle Database after failover without Flashback logging enabled

The key difference between a switchover and a failover in Data Guard is the synchronization between primary and standby to avoid data loss.

Switchover
A switchover is a planned event. The primary and standby databases synchronize to allow a smooth, lossless role reversal:

  • The primary database stops the activity.
  • It writes a “redo marker” to the redo stream.
  • It flushes all the remaining redo to the standby.
  • The standby applies all the redo and, upon reaching the marker, knows it’s up to date and can open as the new primary.
  • Finally, the old primary becomes a standby and starts recovering changes from the new primary.

Failover
A failover usually happens after the primary becomes unavailable. Unlike a switchover, the new primary may not have all the latest redo from the old primary, even in synchronous mode, because in-progress changes can exist past the last transmitted redo. As a result, their timelines diverge. The former primary (and possibly other standbys) must be reinstated to rejoin the configuration.

Flashback and Reinstate
Oracle’s Flashback Database feature allows you to rewind a database to a specific point in time. This makes recovering a failed primary much faster and easier because it does not require a full restore.
With Flashback enabled, the Data Guard Broker can quickly reinstate the former primary after a failover with a single DGMGRL command, REINSTATE DATABASE.

A customer recently asked how to reinstate a former primary when Flashback Logging is disabled.
Although we strongly recommend enabling Flashback Logging, some environments cannot support it due to storage constraints.
In such cases, you can still reinstate the database manually. Here’s how.

Step-by-step: reinstate a database from a backup and re-enable it in the broker

We are in a Maximum Performance configuration:

Let’s fail over:

Then restart the former primary in MOUNT mode:

The broker now says the former primary requires a reinstate:

As I said, we can’t proceed with a normal reinstate if there’s no flashback:

Let’s restore! This is the only way to bring a database to a past point in time if there is no flashback logging.

First, we put the DB in NOMOUNT mode to restore the standby controlfile:

Then we wipe out the directory that contains the DB (careful with this command!)

We restore the standby controlfile and the database:

Now let’s enable the freshly reinstated standby database:

Oops, I made a mistake. Real-Time Apply isn’t working. Of course! I also deleted the standby redo logs, so I need to clear them.

Everything looks good now.

HTH
Ludovico

SHOW CONFIGURATION VERBOSE changes in 23.9

Traditionally, the DGMGRL command SHOW CONFIGURATION VERBOSE not only retrieved detailed configuration information but also triggered a health check. The health check operation can be resource-intensive and time-consuming, especially when executed repeatedly across multiple database instances or as part of automated workflows.

Starting with Oracle 23.9 (and planned also for a future 19c Release Update), the behavior of SHOW CONFIGURATION VERBOSE changes with the introduction of the following fix:

Bug 37829413 – ‘SHOW CONFIGURATION VERBOSE’ UNNECESSARILY TRIGGERS A FORCED HEALTH CHECK

You can check it yourself in the Oracle Release Analyzer Diff Utility:

https://oradiff.oraclecorp.com/ords/r/oradiff/oradiff/search-fixes

Previous behavior

Each use of SHOW CONFIGURATION VERBOSE triggered a fresh, full health check before showing configuration details, regardless of whether up-to-date health information was needed.

New behavior

The command now returns comprehensive configuration details and property values without forcing an immediate health check.

Why this change?

This change eliminates unnecessary resource usage and network communication, improving performance especially in automated systems that repeatedly gather configuration info, such as Oracle TFA or custom scripts. The goal is to make monitoring and troubleshooting more efficient.

What’s the impact for me?

When you execute SHOW CONFIGURATION, at the bottom you see when the last health check was executed:

The health check is scheduled automatically every minute.

When there was a warning, it was common to execute “SHOW CONFIGURATION VERBOSE” to force a refresh of the status and get the most recent status. This won’t work anymore, and you’ll have to wait until the next scheduled health check.

In Oracle 23ai, you can still force a health check explicitly with:

Remember, avoid running it unless you are in an emergency!

Ludovico

New views in Oracle Data Guard 23c

Oracle Data Guard 23c comes with many nice improvements for observability, which greatly increase the usability of Data Guard in environments with a high level of automation.

For the 23c version, we have the following new views.

V$DG_BROKER_ROLE_CHANGE

This view tracks the last role transitions that occurred in the configuration. Example:

The event might be a Switchover, Failover, or Fast-Start Failover.

In the case of Fast-Start Failover, you will see the reason (typically “Primary Disconnected” if it comes from the observer, or whatever reason you put in DBMS_DG.INITIATE_FS_FAILOVER).

No more need to analyze the logs to find out which database was primary at any moment in time!
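Once the role-change history is available as rows, answering "which database was primary at time T?" becomes a simple lookup. The Python sketch below assumes the rows have already been fetched and reduced to (timestamp, new primary) pairs. The tuple layout and database names are hypothetical and do not reflect the view's actual column names.

```python
from bisect import bisect_right
from datetime import datetime


def primary_at(role_changes, initial_primary, when):
    """Return the primary database at time 'when'.

    role_changes: list of (timestamp, new_primary) tuples sorted by timestamp.
    initial_primary: the primary before the first recorded role change.
    """
    timestamps = [ts for ts, _ in role_changes]
    # Index of the first role change strictly after 'when'
    idx = bisect_right(timestamps, when)
    if idx == 0:
        return initial_primary
    return role_changes[idx - 1][1]


# Hypothetical history: a failover to site2_db, then a switchover back to site1_db.
history = [
    (datetime(2025, 3, 1, 10, 0), "site2_db"),
    (datetime(2025, 3, 5, 8, 0), "site1_db"),
]

print(primary_at(history, "site1_db", datetime(2025, 2, 1)))  # site1_db
print(primary_at(history, "site1_db", datetime(2025, 3, 2)))  # site2_db
print(primary_at(history, "site1_db", datetime(2025, 3, 6)))  # site1_db
```

Keep in mind that the view tracks the last role transitions, so a lookup like this is only reliable within the history the view still holds.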

V$DG_BROKER_PROPERTY

Before 23c, the only possible way to get a broker property from SQL was to use undocumented (unsupported) procedures in the fixed package DBMS_DRS. I’ve blogged about it in the past, before joining Oracle.

Now, it’s as easy as selecting from a view, where you can get the properties per member or per configuration:

The example selects just three columns, but the view is rich in detailing which properties apply to which situation (scope, valid_role):

Monitorable properties can be read using DBMS_DG.GET_PROPERTY(). I’ll write a blog post about the new PL/SQL APIs in the upcoming weeks.

I wish I had this view when I was a DBA 🙂

V$FAST_START_FAILOVER_CONFIG

If you have a Fast-Start Failover configuration, this view will show its details:

This view replaces some columns currently in V$DATABASE, which are therefore deprecated:

V$FS_LAG_HISTOGRAM

This view is useful to calculate the optimal FastStartFailoverLagLimit.

It shows the frequency of Fast-Start Failover lags and the most recent occurrence for each bucket.

LAG_TIME is the upper bound of the bucket, e.g.

  • 5 -> between 0 and 5 seconds
  • 10 -> between 5 and 10 seconds
  • etc.

It’s refreshed every minute, only when Fast-Start Failover is enabled (also in observe-only mode).
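One possible way to use the histogram is to pick the smallest bucket that covers a chosen share of the observed lags. The Python sketch below assumes the rows were already fetched as (LAG_TIME, count) pairs. The count column, the sample numbers, and the 99% target are assumptions for illustration, not a recommendation.

```python
def lag_limit_for(histogram, coverage=0.99):
    """Return the smallest bucket upper bound covering 'coverage' of all samples.

    histogram: list of (lag_time_upper_bound_seconds, count) tuples.
    Returns None when the histogram holds no samples.
    """
    buckets = sorted(histogram)
    total = sum(count for _, count in buckets)
    if total == 0:
        return None
    running = 0
    for upper_bound, count in buckets:
        running += count
        if running / total >= coverage:
            return upper_bound
    return buckets[-1][0]


# Hypothetical data: most lags are under 5 seconds, a few reach 30 seconds.
sample = [(5, 900), (10, 80), (15, 15), (30, 5)]

print(lag_limit_for(sample, coverage=0.99))  # 15
print(lag_limit_for(sample, coverage=0.50))  # 5
```

With this sample, a limit of 15 seconds would have covered 99.5% of the observed lags, while the remaining 0.5% would have pushed the configuration over the limit.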

V$FS_FAILOVER_OBSERVERS

This view is not new; however, its definition now contains more columns:

This gives important additional information about the observers, for example, the last time a specific observer was able to ping the primary or the target (in seconds).

Also, the path of the log file and runtime data file are available, making it easier to find them on the observer host in case of a problem.

Conclusion

These new views should greatly improve the experience when monitoring or diagnosing problems with Data Guard. But they are just a part of many improvements we introduced in 23c. Stay tuned for more 🙂

Ludovico