Oracle Data Guard 23c comes with many nice improvements for observability, which greatly increase the usability of Data Guard in environments with a high level of automation.
For the 23c version, we have the following new views.

V$DG_BROKER_ROLE_CHANGE
This view tracks the last role transitions that occurred in the configuration. Example:
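A minimal query to inspect it might look like this (the exact columns returned depend on your 23c release):

```
SQL> select * from v$dg_broker_role_change;
```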
The event might be a Switchover, Failover, or Fast-Start Failover.
In the case of Fast-Start Failover, you will see the reason (typically “Primary Disconnected” if it comes from the observer, or whatever reason you put in DBMS_DG.INITIATE_FS_FAILOVER).
No more need to analyze the logs to find out which database was primary at any moment in time!
V$DG_BROKER_PROPERTY
Before 23c, the only possible way to get a broker property from SQL was to use undocumented (unsupported) procedures in the fixed package DBMS_DRS. I’ve blogged about it in the past, before joining Oracle.
Now, it’s as easy as selecting from a view, where you can get the properties per member or per configuration:
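For instance, a quick first look at everything the broker exposes (a sketch; filter on the member or configuration columns once you see the actual output):

```
SQL> select * from v$dg_broker_property;
```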
This gives important additional information about the observers, for example, the last time a specific observer was able to ping the primary or the target (in seconds).
Also, the path of the log file and runtime data file are available, making it easier to find them on the observer host in case of a problem.
Conclusion
These new views should greatly improve the experience when monitoring or diagnosing problems with Data Guard. But they are just a part of many improvements we introduced in 23c. Stay tuned for more 🙂
This command prepares a database to become primary in a Data Guard configuration.
It sets many recommended parameters:
DB_FILES=1024
LOG_BUFFER=256M
DB_BLOCK_CHECKSUM=TYPICAL
DB_LOST_WRITE_PROTECT=TYPICAL
DB_FLASHBACK_RETENTION_TARGET=120
PARALLEL_THREADS_PER_CPU=1
STANDBY_FILE_MANAGEMENT=AUTO
DG_BROKER_START=TRUE
It also sets the RMAN archive deletion policy, enables Flashback Database and force logging, creates the standby redo logs according to the online redo log configuration, and creates an spfile if the database is running with an init file.
If you tried this in 21c, you may have noticed that the database is automatically restarted to set all the static parameters. If you weren’t expecting this, the sudden restart could feel a bit brutal.
In 23c, we added an additional keyword, “restart”, to specify that you are OK with the restart of the database. If you don’t specify it, the broker will complain that it cannot proceed without a restart.
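As a sketch, the 23c call might look like this in DGMGRL (the DB_UNIQUE_NAME is just an example; check HELP PREPARE for the exact clause order):

```
DGMGRL> PREPARE DATABASE FOR DATA GUARD WITH DB_UNIQUE_NAME IS TOOLCDB1_SITE1 RESTART;
```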
I love seeing people suggesting Oracle Data Guard Fast-Start Failover for high availability. Nevertheless, there are a few problems with the architecture and steps proposed in the article.
I sent my comments via Disqus on the AWS blogging platform, but after a month, my comment was rejected, and the blog content hasn’t changed.
For this reason, I don’t have other places to post my comment but here…
The steps used to create the database service do not include any HA property, which will make most efforts useless. (see Table 153-6 in the link above).
But, most importantly, TAF (or Oracle connectivity in general) does NOT require a host IP change! There is no need to change the DNS when using the recommended connection string with multiple address lists.
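For reference, a connection string of that shape looks like the following (host and service names are placeholders); the client reaches whichever site currently runs the service, with no DNS change:

```
MYAPP =
  (DESCRIPTION =
    (CONNECT_TIMEOUT = 5)(TRANSPORT_CONNECT_TIMEOUT = 3)(RETRY_COUNT = 3)
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = primary-host)(PORT = 1521)))
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = standby-host)(PORT = 1521)))
    (CONNECT_DATA = (SERVICE_NAME = myapp_service))
  )
```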
If you need to implement a complex architecture using a software solution, pay attention that the practices suggested by the partner/integrator/3rd party match the ones from the software vendor. In the case of Oracle Data Guard, Oracle knows better 😉
The video explains best practices and different failure scenarios for different observer placements. It also shows how to configure high availability for the observer.
Here’s the summary:
Always try to put the observer(s) on an external site.
If you don’t have any, put it where the primary database is, and have one ready on the secondary site after the role transition.
Don’t put the observer together with the standby database!
Configure multiple observers for high availability, and use the PreferredObserverHosts Data Guard member property to ensure you never run the observer where the standby database is.
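Setting that property is a one-liner per member (member and host names here are hypothetical):

```
DGMGRL> edit database mydb_site1 set property PreferredObserverHosts='obshost1,obshost2';
```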
Why is Fast-Start Failover a crucial component for mission-critical Data Guard deployments?
The observer lowers the RTO in case of failure, and the Fast-Start Failover protection modes protect the database from split-brain and data loss.
Oracle advertises Far Sync as a solution for “Zero Data Loss at any distance”. This is because the primary sends its redo stream synchronously to the Far Sync, which relays it to the remote physical standby.
There are many reasons why Far Sync is an optimal solution for this use case, but that’s not the topic of this post 🙂
Some customers ask: Can I configure Far Sync to receive the redo stream asynchronously?
Although a direct standby receiving asynchronously would be a better idea, Far Sync can receive asynchronously as well.
And one reason might be to send asynchronously to one Far Sync member that redistributes locally to many standbys.
It is very simple to achieve: just change the RedoRoutes property on the primary:
RedoRoutes='(LOCAL : cdgsima_farsync1 ASYNC)'
This will work seamlessly. The v$dataguard_process view will show the async transport process:
NAME PID TYP ACTION CLIENT_PID CLIENT_ROLE GROUP# RESETLOG_ID THREAD# SEQUENCE# BLOCK#
In the following configuration, cdgsima_lhr1pq (primary) sends synchronously to cdgsima_farsync1 (far sync), which forwards the redo stream asynchronously to cdgsima_lhr1bm (physical standby):
But if cdgsima_farsync1 is not available, I want the primary to send synchronously to the physical standby database. I accept a performance penalty, but I do not want to compromise my data protection.
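This can be expressed with a priority group in the RedoRoutes property (a sketch based on the members above; with PRIORITY, the lower-priority SYNC destination is used only while the higher-priority one is unavailable):

```
DGMGRL> edit database cdgsima_lhr1pq set property RedoRoutes='(LOCAL : (cdgsima_farsync1 SYNC PRIORITY=1, cdgsima_lhr1bm SYNC PRIORITY=2))';
```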
One of the challenges of automation in big Oracle environments is dealing with tnsnames.ora files.
These files might grow big and are sometimes hard to distribute/maintain properly.
The worst case is when manual modifications are needed: if not made carefully, they can break the connection to the databases.
The best solution is always to use LDAP naming resolution. I have seen customers using OID, OUD, Active Directory, or OpenLDAP, all with a great level of control and automation. However, some customers don’t have/want this possibility and keep relying on TNS naming resolution.
When Data Guard (and eventually RAC) is in place, the tnsnames.ora gets filled with entries for the DGConnectIdentifiers and StaticConnectIdentifier. If I add the observer, an additional entry is required to access the dbname_CFG service created by Fast-Start Failover.
Actually, none of these entries are required if I use Easy Connect.
The basic configuration with Data Guard is quite simple to achieve with Easy Connect. In this example I have:
– The primary database TOOLCDB1_SITE1
– The duplicated database for standby TOOLCDB1_SITE2
After setting up the static registration (no Grid Infrastructure in my lab):
SID_LIST_LISTENER=
  (SID_LIST=
    (SID_DESC=
      (GLOBAL_DBNAME=TOOLCDB1_SITE1_DGMGRL)
      (SID_NAME=TOOLCDB1)
      (ORACLE_HOME=/u01/app/oracle/product/db_19_8_0)
    )
  )
and copying the password file, the configuration can be created with:
DGMGRL> create configuration TOOLCDB1 as primary database is TOOLCDB1_SITE1 connect identifier is 'newbox01:1521/TOOLCDB1_SITE1';
Configuration "toolcdb1" created with primary database "toolcdb1_site1"
DGMGRL> edit database TOOLCDB1_SITE1 set property 'StaticConnectIdentifier'='newbox01:1521/TOOLCDB1_SITE1_DGMGRL';
Property "StaticConnectIdentifier" updated
DGMGRL> add database TOOLCDB1_SITE2 as connect identifier is 'newbox02:1521/TOOLCDB1_SITE2';
Database "toolcdb1_site2" added
DGMGRL> edit database TOOLCDB1_SITE2 set property 'StaticConnectIdentifier'='newbox02:1521/TOOLCDB1_SITE2_DGMGRL';
Property "StaticConnectIdentifier" updated
DGMGRL> enable configuration;
Enabled.
That’s it.
Now, if I want to have the configuration observed, I need to activate Fast-Start Failover:
DGMGRL> edit database toolcdb1_site1 set property LogXptMode='SYNC';
Property "logxptmode" updated
DGMGRL> edit database toolcdb1_site2 set property LogXptMode='SYNC';
Property "logxptmode" updated
DGMGRL> edit database toolcdb1_site1 set property FastStartFailoverTarget='toolcdb1_site2';
Property "faststartfailovertarget" updated
DGMGRL> edit database toolcdb1_site2 set property FastStartFailoverTarget='toolcdb1_site1';
Property "faststartfailovertarget" updated
DGMGRL> edit configuration set protection mode as maxavailability;
Succeeded.
DGMGRL> enable fast_start failover;
Enabled in Zero Data Loss Mode.
With just two databases, FastStartFailoverTarget is not explicitly needed, but I usually do it as other databases might be added to the configuration in the future.
After that, the broker complains that FSFO is enabled but there is no observer yet:
DGMGRL> show fast_start failover;
Fast-Start Failover: Enabled in Zero Data Loss Mode
Protection Mode: MaxAvailability
Lag Limit: 0 seconds
Threshold: 180 seconds
Active Target: toolcdb1_site2
Potential Targets: "toolcdb1_site2"
toolcdb1_site2 valid
Observer: (none)
Shutdown Primary: TRUE
Auto-reinstate: TRUE
Observer Reconnect: 180 seconds
Observer Override: FALSE
Configurable Failover Conditions
Health Conditions:
Corrupted Controlfile YES
Corrupted Dictionary YES
Inaccessible Logfile NO
Stuck Archiver NO
Datafile Write Errors YES
Oracle Error Conditions:
(none)
DGMGRL> show configuration;
Configuration - toolcdb1
Protection Mode: MaxAvailability
Members:
toolcdb1_site1 - Primary database
Warning: ORA-16819: fast-start failover observer not started
toolcdb1_site2 - (*) Physical standby database
Fast-Start Failover: Enabled in Zero Data Loss Mode
Configuration Status:
WARNING (status updated 39 seconds ago)
Observer for multiple configurations
This feature was introduced in 12.2, but it is still not widely used.
Before 12.2, the Observer was a foreground process: the DBAs had to start it in a wrapper script executed with nohup in order to keep it alive.
Since 12.2, the observer can run as a background process as long as there is a valid wallet for the connection to the databases.
Also, 12.2 introduced the capability of starting multiple configurations with a single dgmgrl command: “START OBSERVING”.
First, I need a wallet. And here comes the first compromise:
Having a single dgmgrl session to start all my configurations means that I have a single wallet for all the databases that I want to observe.
Fair enough, all the DBs (CDBs?) are managed by the same team in this case.
If I have only observers on my host, I can easily point to the wallet from my central sqlnet.ora:
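A minimal sqlnet.ora for this would be something like (the wallet directory is an example path):

```
WALLET_LOCATION =
  (SOURCE =
    (METHOD = FILE)
    (METHOD_DATA = (DIRECTORY = /u01/app/oracle/wallet))
  )
SQLNET.WALLET_OVERRIDE = TRUE
```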
Which connection descriptors do I need?
The Observer uses the DGConnectIdentifier to keep observing the databases, but it first needs a connection to both of them using the TOOLCDB1_CFG service (unless I specify something different with the broker configuration property ConfigurationWideServiceName) to connect to the configuration and get the DGConnectIdentifier information. Again, you can check it in the documentation or in the note Oracle 12.2 – Simplified OBSERVER Management for Multiple Fast-Start Failover Configurations (Doc ID 2285891.1).
So I need to specify three secrets for three connection descriptors:
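With mkstore, that means one credential per descriptor (the wallet path is an example; the prompts below are from one of these invocations):

```
$ mkstore -wrl /u01/app/oracle/wallet -createCredential newbox01,newbox02:1521/TOOLCDB1_CFG sys
$ mkstore -wrl /u01/app/oracle/wallet -createCredential newbox01:1521/TOOLCDB1_SITE1 sys
$ mkstore -wrl /u01/app/oracle/wallet -createCredential newbox02:1521/TOOLCDB1_SITE2 sys
```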
Oracle Secret Store Tool Release 21.0.0.0.0 - Production
Version 21.0.0.0.0
Copyright (c) 2004, 2020, Oracle and/or its affiliates. All rights reserved.
Your secret/Password is missing in the command line
Enter your secret/Password:
Re-enter your secret/Password:
Enter wallet password:
The first one will be used for the initial connection; the other two, to observe the Primary and the Standby.
I need to be careful that the first EZConnect descriptor matches EXACTLY what I put in observer.ora (see next step) and that the last two match my DGConnectIdentifier (unless I specify something different with ObserverConnectIdentifier); otherwise I will get errors and the observer will not observe correctly (or will not start at all).
dgmgrl then needs a file named observer.ora.
$ORACLE_BASE/admin/observers or the central TNS_ADMIN would be good locations, but what if I have observers that must be started from multiple Oracle Homes?
In that case, having an observer.ora in $ORACLE_HOME/network/admin (or $ORACLE_BASE/homes/{OHNAME}/network/admin/ if Read-Only Oracle Home is enabled) would be a better solution; in this case, I would need to start one session per Oracle Home.
The content of my observer.ora must be something like:
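Based on the MOS note above, a single-configuration file would be along these lines (keywords per the note; the connect identifier must match the first wallet credential exactly):

```
(CONFIG=
  (NAME=TOOLCDB1)
  (CONNECT_IDENTIFIER=newbox01,newbox02:1521/TOOLCDB1_CFG)
)
```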
This is the example for my configuration, but I can put as many (CONFIG=…) as I want in order to observe multiple configurations.
Then, if everything is configured properly, I can start all the observers with a single command:
DGMGRL> SET OBSERVERCONFIGFILE=/u01/app/oracle/admin/observers/observer.ora
DGMGRL> START OBSERVING
ObserverConfigFile=observer.ora
observer configuration file parsing succeeded
Submitted command "START OBSERVER" using connect identifier "newbox01,newbox02:1521/TOOLCDB1_CFG"
Check superobserver.log, individual observer logs and Data Guard Broker logs for execution details.
In my config, the Oracle Database version is 19.7 and the databases are actually CDBs. No Grid Infrastructure, non-OMF datafiles.
It is important to highlight that a lot of things have changed since 12.1. And because 19c is the LTS version now, it does not make sense to try anything older.
First, I just want to make sure that my standbys are aligned.
PR00 (PID:6718): Media Recovery Waiting for T-1.S-41
So, yeah, not having OMF might get you some warnings like: WARNING: File being created with same name as in Primary
But it is good to know that the cascade standby deals well with new PDBs.
Of course, this is not of big interest as I know that the problem with Multitenant comes from CLONING PDBs from either local or remote PDBs in read-write mode.
So absolutely the same behavior between the two levels of standby.
According to the documentation: https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/CREATE-PLUGGABLE-DATABASE.html#GUID-F2DBA8DD-EEA8-4BB7-A07F-78DC04DB1FFC
I quote what is specified for the parameter STANDBYS={ALL|NONE|…}: “If you include a PDB in a standby CDB, then during standby recovery the standby CDB will search for the data files for the PDB. If the data files are not found, then standby recovery will stop and you must copy the data files to the correct location before you can restart recovery.”
“Specify ALL to include the new PDB in all standby CDBs. This is the default.”
“Specify NONE to exclude the new PDB from all standby CDBs. When a PDB is excluded from all standby CDBs, the PDB’s data files are unnamed and marked offline on all of the standby CDBs. Standby recovery will not stop if the data files for the PDB are not found on the standby. […]”
So, in order to prevent the MRP from crashing, I should have included STANDBYS=NONE.
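For example, a clone excluded from all standbys would look like this (the PDB and source names are illustrative):

```
SQL> create pluggable database PNEUMA from LATERALUS standbys=none;
```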
But the documentation is not up to date: in my case, the PDB is skipped automatically and the recovery process DOES NOT STOP.
However, the recovery is marked ENABLED for the PDB on the standby, while using STANDBYS=NONE it would have been DISABLED.
1* select name, recovery_status from v$pdbs

NAME                           RECOVERY
------------------------------ --------
PDB$SEED                       ENABLED
LATERALUS                      ENABLED
PNEUMA                         ENABLED
So, another difference from the doc, which states: “You can enable a PDB on a standby CDB after it was excluded on that standby CDB by copying the data files to the correct location, bringing the PDB online, and marking it as enabled for recovery.”
This reflects the findings of Philippe Fierens in his blog (http://pfierens.blogspot.com/2020/04/19c-data-guard-series-part-iii-adding.html).
This behavior was probably introduced between 12.2 and 19c, but I could not find exactly when, as it is not explicitly stated in the documentation.
However, I remember well that in 12.1.0.2, the MRP process was crashing.
In my configuration (not on purpose, but interesting for this article), the first standby has the very same directory structure, while the cascaded standby does not.
In any case, there is a potentially big problem for all the customers implementing Multitenant on Data Guard:
With the old behavior (MRP crashing), it was easy to spot when a PDB was cloned online into a primary database, because a simple dgmgrl “show configuration” would have displayed a warning due to the increasing lag (following the MRP crash).
With the current behavior, the MRP keeps recovering, and “show configuration” displays “SUCCESS” even though there is a PDB that has not been copied to the standby (and is thus not protected).
The missing PDB is easy to spot once I know that I have to look for it. However, for each PDB to recover (I might have many!), I have to prepare the rename of the datafiles and the creation of directories (do not forget I am using non-OMF here).
Now, the datafile names on the standby got changed to …/UNNAMEDnnnnn.
So I have to get the original ones from the primary database and do the same replace that db_file_name_convert would do:
set trim on
col rename_file for a300
set lines 400
select 'set newname for datafile '||file#||' to '''||replace(name,'/TOOLCDB1/','/TOOLCDX1/')||''';' as rename_file from v$datafile where con_id=6;
and put this in an RMAN script (this will be for the second standby; the first has the same name, so the same path):
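The resulting script might be shaped like this (file numbers, paths, and the service name are illustrative; restoring from service is one way to bring the files over):

```
run {
  set newname for datafile 54 to '/u01/app/oracle/oradata/TOOLCDX1/PNEUMA/system01.dbf';
  set newname for datafile 55 to '/u01/app/oracle/oradata/TOOLCDX1/PNEUMA/sysaux01.dbf';
  restore datafile 54, 55 from service 'primary-host:1521/TOOLCDB1_SITE1';
  switch datafile all;
}
```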
Then, I need to stop the recovery, start it and stop it again, put the datafiles online, and finally restart the recovery.
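In SQL*Plus terms, the sequence is roughly the following (datafile numbers are illustrative):

```
-- stop the recovery before renaming/restoring the files
alter database recover managed standby database cancel;
-- after the RMAN script: start and stop the recovery once more
alter database recover managed standby database disconnect from session;
alter database recover managed standby database cancel;
-- bring the datafiles online
alter database datafile 54, 55 online;
-- restart the recovery
alter database recover managed standby database disconnect from session;
```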
These are the same steps used by Philippe in his blog post, just adapted to my taste 🙂
Now, I no longer have any datafiles offline on the standby:
SQL> select 'ERROR: CON_ID '||con_id||' has '||count(*)||' datafiles offline!' from v$recover_file where online_status='OFFLINE' group by con_id;

no rows selected
I will not publish the steps for the second standby, they are exactly the same (same output as well).
At the end, for me it is important to highlight that monitoring the OFFLINE datafiles on the standby becomes a crucial point to guarantee the health of Data Guard in Multitenant. Relying on the Broker status or “PDB recovery disabled” is not enough.
On the bright side, it is nice to see that Cascade Standby configurations do not introduce any variation, so cascaded standbys can be treated the same as “direct” standby databases.
HTH
—
Ludovico