The video explains best practices and different failure scenarios for different observer placements. It also shows how to configure high availability for the observer.
Here’s the summary:
Always try to put the observer(s) on an external site.
If you don’t have one, put it where the primary database is, and have another one ready on the secondary site to start after a role transition.
Don’t put the observer together with the standby database!
Configure multiple observers for high availability, and use the PreferredObserverHosts Data Guard member property to ensure you never run the observer where the standby database is.
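As a sketch of how that last recommendation can be applied (the property is set per member with DGMGRL; the member and host names below are hypothetical):

```
DGMGRL> edit database prod_site1 set property PreferredObserverHosts='obshost1,obshost2';
```

With only non-standby hosts in the list, the observer for this member never ends up running where the standby database is.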
Why is Fast-Start Failover a crucial component for mission-critical Data Guard deployments?
The observer lowers the RTO in case of failure, and the Fast-Start Failover protection modes protect the database from split-brain and data loss.
Oracle advertises Far Sync as a solution for “Zero Data Loss at any distance”. This is because the primary sends its redo stream synchronously to the Far Sync, which relays it to the remote physical standby.
There are many reasons why Far Sync is an optimal solution for this use case, but that’s not the topic of this post 🙂
Some customers ask: Can I configure Far Sync to receive the redo stream asynchronously?
Although a direct standby receiving asynchronously would be a better idea, Far Sync can receive asynchronously as well.
And one reason might be to send asynchronously to one Far Sync member that redistributes locally to many standbys.
It is very simple to achieve: just change the RedoRoutes property on the primary:

RedoRoutes='(LOCAL : cdgsima_farsync1 ASYNC)'
This will work seamlessly. v$dataguard_process will show the async transport process:

NAME PID TYP ACTION CLIENT_PID CLIENT_ROLE GROUP# RESETLOG_ID THREAD# SEQUENCE# BLOCK#
In the following configuration, cdgsima_lhr1pq (primary) sends synchronously to cdgsima_farsync1 (far sync), which forwards the redo stream asynchronously to cdgsima_lhr1bm (physical standby):
But if cdgsima_farsync1 is not available, I want the primary to send synchronously to the physical standby database. I accept a performance penalty, but I do not want to compromise my data protection.
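One way to express this requirement is the priority-based RedoRoutes syntax (a sketch based on the broker’s documented alternate-destination rules, using the member names of this configuration):

```
RedoRoutes='(LOCAL : (cdgsima_farsync1 SYNC PRIORITY=1, cdgsima_lhr1bm SYNC PRIORITY=2))'
```

As long as cdgsima_farsync1 is reachable, redo goes synchronously to the far sync only; if it becomes unavailable, the primary ships synchronously to the standby (priority 2), preserving the protection level at the cost of remote SYNC performance.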
The “RHPHELP_preNodeAddVal” might already give an idea of the cause: something related to the “cluvfy stage -pre nodeadd” evaluation that we normally do when adding a node by hand. FPP does not really run cluvfy, but it calls the same primitives cluvfy is based on.
In FPP, when the error does not give any useful information, this is the flow to follow:
use “rhpctl query audit” to get the date and time of the failing operation
open the “rhpserver.log.0” and look for the operation log in that time frame
get the UID of the operation; e.g., in the following line it is “1556344143”:
PRGO-1260: Cluster Verification checks for database home provisioning failed for the specified working copy WC_db_19_11_FPPC.
PRCR-1178: Execution of command failed on one or more nodes
PRVF-7546: The work directory "/tmp/CVU_19.0.0.0.0_oracle/" cannot be used on node "fppc02"
This is often related to the filesystem /tmp that has the “noexec” option:
$ mount | grep /tmp
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noexec)
Although it is tempting to just remount the filesystem with “exec”, you might be in this situation because your systems are configured to adhere to the STIG recommendations.
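If your security baseline allows it, the filesystem can be remounted with exec just for the duration of the operation (standard Linux mount usage, not an FPP-specific command; requires root and reverts at the next reboot unless /etc/fstab is changed):

```shell
# verify the current mount options of /tmp
mount | grep ' /tmp '

# temporarily allow execution from /tmp
mount -o remount,exec /tmp
```

An alternative worth considering is pointing the verification work directory somewhere else (cluvfy honors the CV_DESTLOC environment variable), but test it in your environment first.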
FPP was still young and lacking many features at that time, but it changed the way I worked over the following years. I embraced out-of-place patching, developed some basic scripts to install Oracle Homes, and sought automation and standardization at all costs:
Meanwhile, I discovered that FPP had made giant strides, with many new features and fixes for quite a few usability and performance problems.
Last year, when joining the Oracle Database High Availability (HA), Scalability, and Maximum Availability Architecture (MAA) Product Management Team at Oracle, I took (among others) the Product Manager role for FPP.
Becoming an Oracle employee after 20 years of working with Oracle technology is a big leap. It allows me to understand how big the company is, and how collaborative and friendly Oracle employees are. (Yes, I was used to marketing nonsense, insistent salespeople, and unfriendly license auditors. This is slowly changing as Oracle embraces the Cloud, but it is still a fresh wound for many customers. Expect this to change even more! As for me… I’ll be the same I’ve always been 🙂 )
Now I have daily meetings with big customers (bigger than any I have had in the past), development teams, other product managers, Oracle consultants, and community experts. My primary goal is to make the product better, increase its adoption, and help customers have the best experience with it. This includes testing the product myself, writing specs, preparing presentations and videos, collecting feedback from customers, tracking bugs, and managing escalations.
I am a Product Manager for other products as well, but I have to admit that FPP is the product that takes most of my Product Manager time. Why?
I will give a few reasons in my next blog post(s).
I have been a DBA/consultant for customers and big production environments for over twenty years. I have explained more or less my career path in this blog post.
Database (and application) high availability has always been one of my favorite areas. Over the years I have become a high availability expert (my many blog posts are there to confirm it), and I have spent a lot of time building, troubleshooting, teaching, presenting, and advocating for those gems of technology that are RAC, Data Guard, Application Continuity, and the many other products that are part of the Oracle Maximum Availability Architecture solution. Customers fear downtime, and I have always been with them on that. But in my case, it looks like Yoda’s famous quote worked well for me (in a good way):
I’ll be joining the Oracle Maximum Availability Architecture Product Management team as MAA Product Manager (or rather Cloud MAA, I will not explain here ;-)) next November.
(for those who are not familiar with the joke, the “Dark Side” is how we often refer to the Oracle employees in the Oracle Community 😉 )
I remember as if it were yesterday presenting some Data Guard 12c new features in front of a big audience at Collaborate 2014. There I met two incredible people who were part of the MAA Product Management team: Larry Carpenter and Markus Michalewicz. Larry has been a great source of inspiration for improving my seniority and my ease of presenting in public, while Markus has become a friend over the years, in addition to being one of the most influential people in my professional network.
Now I have the opportunity to join that team, and it feels like the most natural next step in my career.
And because I imagine some of you will have questions, here are some answers to the ones I have been asked most frequently so far:
MAA PM does not mean becoming team lead or supervising other colleagues, I’ll be a “regular” PM
I will stay in Switzerland and work remotely from here
I will stay in “the conference circus” and keep presenting as soon as the COVID-19 situation allows it
Yes, I was VERY happy at Trivadis, and it will always have a special place in my heart
One of the challenges of automation in big Oracle environments is dealing with tnsnames.ora files.
These files can grow big and are sometimes hard to distribute and maintain properly.
The worst case is when manual modifications are needed: manual operations, if not made carefully, can screw up the connection to the databases.
The best solution is always using LDAP naming resolution. I have seen customers using OID, OUD, Active Directory, and openldapd, all with a great level of control and automation. However, some customers don’t have or don’t want this possibility and keep relying on TNS naming resolution.
When Data Guard (and eventually RAC) is in place, the tnsnames.ora gets filled with entries for the DGConnectIdentifier and StaticConnectIdentifier properties. If I add the observer, an additional entry is required to access the dbname_CFG service created by Fast-Start Failover.
Actually, all these entries are not required if I use Easy Connect.
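For example, a classic tnsnames.ora entry like this one (standard TNS syntax, with the host names of this lab):

```
TOOLCDB1_SITE1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = newbox01)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = TOOLCDB1_SITE1))
  )
```

collapses to the Easy Connect string newbox01:1521/TOOLCDB1_SITE1, with nothing to distribute or maintain.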
The basic configuration with Data Guard is quite simple to achieve with Easy Connect. In this example I have:
– The primary database TOOLCDB1_SITE1
– The duplicated database for standby TOOLCDB1_SITE2
After setting up the static registration (no Grid Infrastructure in my lab):
SID_LIST_LISTENER=
  (SID_LIST=
    (SID_DESC=
      (GLOBAL_DBNAME=TOOLCDB1_SITE1_DGMGRL)
      (SID_NAME=TOOLCDB1)
      (ORACLE_HOME=/u01/app/oracle/product/db_19_8_0)
    )
  )
and copying the password file, the configuration can be created with:
DGMGRL> create configuration TOOLCDB1 as primary database is TOOLCDB1_SITE1 connect identifier is 'newbox01:1521/TOOLCDB1_SITE1';
Configuration "toolcdb1" created with primary database "toolcdb1_site1"
DGMGRL> edit database TOOLCDB1_SITE1 set property 'StaticConnectIdentifier'='newbox01:1521/TOOLCDB1_SITE1_DGMGRL';
Property "StaticConnectIdentifier" updated
DGMGRL> add database TOOLCDB1_SITE2 as connect identifier is 'newbox02:1521/TOOLCDB1_SITE2';
Database "toolcdb1_site2" added
DGMGRL> edit database TOOLCDB1_SITE2 set property 'StaticConnectIdentifier'='newbox02:1521/TOOLCDB1_SITE2_DGMGRL';
Property "StaticConnectIdentifier" updated
DGMGRL> enable configuration;
Enabled.
That’s it.
Now, if I want the configuration to be observed, I need to activate Fast-Start Failover:
DGMGRL> edit database toolcdb1_site1 set property LogXptMode='SYNC';
Property "logxptmode" updated
DGMGRL> edit database toolcdb1_site2 set property LogXptMode='SYNC';
Property "logxptmode" updated
DGMGRL> edit database toolcdb1_site1 set property FastStartFailoverTarget='toolcdb1_site2';
Property "faststartfailovertarget" updated
DGMGRL> edit database toolcdb1_site2 set property FastStartFailoverTarget='toolcdb1_site1';
Property "faststartfailovertarget" updated
DGMGRL> edit configuration set protection mode as maxavailability;
Succeeded.
DGMGRL> enable fast_start failover;
Enabled in Zero Data Loss Mode.
With just two databases, FastStartFailoverTarget is not explicitly needed, but I usually do it as other databases might be added to the configuration in the future.
After that, the broker complains that FSFO is enabled but there is no observer yet:
DGMGRL> show fast_start failover;

Fast-Start Failover: Enabled in Zero Data Loss Mode

  Protection Mode:    MaxAvailability
  Lag Limit:          0 seconds

  Threshold:          180 seconds
  Active Target:      toolcdb1_site2
  Potential Targets:  "toolcdb1_site2"
    toolcdb1_site2 valid
  Observer:           (none)
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: 180 seconds
  Observer Override:  FALSE

Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile           NO
    Stuck Archiver                 NO
    Datafile Write Errors          YES

  Oracle Error Conditions:
    (none)

DGMGRL> show configuration;

Configuration - toolcdb1

  Protection Mode: MaxAvailability
  Members:
  toolcdb1_site1 - Primary database
    Warning: ORA-16819: fast-start failover observer not started

    toolcdb1_site2 - (*) Physical standby database

Fast-Start Failover: Enabled in Zero Data Loss Mode

Configuration Status:
WARNING   (status updated 39 seconds ago)
Observer for multiple configurations
This feature was introduced in 12.2, but it is still not widely used.
Before 12.2, the observer was a foreground process: DBAs had to start it in a wrapper script executed with nohup in order to keep it alive.
Since 12.2, the observer can run as a background process, as long as there is a valid wallet for the connection to the databases.
12.2 also introduced the capability of starting observers for multiple configurations with a single dgmgrl command: “START OBSERVING”.
First, I need a wallet. And here comes the first compromise:
Having a single dgmgrl session to start all my configurations means that I have a single wallet for all the databases that I want to observe.
Fair enough, all the DBs (CDBs?) are managed by the same team in this case.
If I have only observers on my host I can easily point to the wallet from my central sqlnet.ora:
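A minimal sketch of such a sqlnet.ora (the wallet directory is a hypothetical path; SQLNET.WALLET_OVERRIDE=TRUE is what makes the client use the wallet credentials for “/@connstring” logins):

```
WALLET_LOCATION =
  (SOURCE =
    (METHOD = FILE)
    (METHOD_DATA = (DIRECTORY = /u01/app/oracle/admin/observers/wallet))
  )
SQLNET.WALLET_OVERRIDE = TRUE
```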
Which connection descriptors do I need?
The observer uses the DGConnectIdentifier to keep observing the databases, but it also needs a connection to both of them using the TOOLCDB1_CFG service (unless I specify something different with the broker configuration property ConfigurationWideServiceName) in order to connect to the configuration and get the DGConnectIdentifier information. Again, you can check this in the documentation or in the note Oracle 12.2 – Simplified OBSERVER Management for Multiple Fast-Start Failover Configurations (Doc ID 2285891.1).
So I need to specify three secrets for three connection descriptors:
Oracle Secret Store Tool Release 21.0.0.0.0 - Production
Version 21.0.0.0.0
Copyright (c) 2004, 2020, Oracle and/or its affiliates. All rights reserved.
Your secret/Password is missing in the command line
Enter your secret/Password:
Re-enter your secret/Password:
Enter wallet password:
The first one will be used for the initial connection. The other two will be used to observe the primary and the standby.
I need to be careful that the first EZConnect descriptor matches EXACTLY what I put in observer.ora (see next step) and that the last two match my DGConnectIdentifier (unless I specify something different with ObserverConnectIdentifier); otherwise I will get errors and the observer will not observe correctly (or will not start at all).
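For reference, the three secrets could be added with mkstore along these lines (a sketch: the wallet path is hypothetical, SYSDG is the assumed observer user, and each connect string must match as explained above):

```shell
mkstore -wrl /u01/app/oracle/admin/observers/wallet -createCredential "newbox01,newbox02:1521/TOOLCDB1_CFG" sysdg
mkstore -wrl /u01/app/oracle/admin/observers/wallet -createCredential "newbox01:1521/TOOLCDB1_SITE1" sysdg
mkstore -wrl /u01/app/oracle/admin/observers/wallet -createCredential "newbox02:1521/TOOLCDB1_SITE2" sysdg
```

Each command prompts for the secret (the password of the observer user) and then for the wallet password, producing output like the one above.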
dgmgrl then needs a file named observer.ora.
$ORACLE_BASE/admin/observers or the central TNS_ADMIN would be good locations, but what if I have observers that must be started from multiple Oracle Homes?
In that case, having an observer.ora in $ORACLE_HOME/network/admin (or $ORACLE_BASE/homes/{OHNAME}/network/admin/ if Read-Only Oracle Home is enabled) would be a better solution; in this case I would need to start one session per Oracle Home.
The content of my observer.ora must be something like:
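A minimal sketch, using the (CONFIG=…) entry format described in the MOS note (the exact attribute list is in Doc ID 2285891.1; the names below are the ones of this lab):

```
(CONFIG=
  (NAME=TOOLCDB1)
  (CONNECT_IDENTIFIER=newbox01,newbox02:1521/TOOLCDB1_CFG)
)
```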
This is the example for my configuration, but I can put as many (CONFIG=…) as I want in order to observe multiple configurations.
Then, if everything is configured properly, I can start all the observers with a single command:
DGMGRL> SET OBSERVERCONFIGFILE=/u01/app/oracle/admin/observers/observer.ora
DGMGRL> START OBSERVING
ObserverConfigFile=observer.ora
observer configuration file parsing succeeded
Submitted command "START OBSERVER" using connect identifier "newbox01,newbox02:1521/TOOLCDB1_CFG"
Check superobserver.log, individual observer logs and Data Guard Broker logs for execution details.
If the observer does not work, sometimes it is not easy to understand the cause.
Has the SYSDG privilege been granted to the SYSDG user? Is the SYSDG account unlocked?
Does sqlnet.ora contain the correct wallet location?
Is the wallet accessible in autologin?
Are the entries in the wallet correct? (check with “sqlplus /@connstring as sysdg”)
Missing pieces
Here, a few features that I think would be a nice addition in the future:
Awareness of the ORACLE_HOME to be used for each observer
Possibility to specify a different TNS_ADMIN per observer (different wallets)
Integration with Grid Infrastructure (srvctl add observer…) and support for multiple observers
—
Ludovico