PRGO-1260: Cluster Verification checks for database home provisioning failed for the specified working copy WC_db_19_11_FPPC.
PRCR-1178: Execution of command failed on one or more nodes
PRVF-7546: The work directory "/tmp/CVU_19.0.0.0.0_oracle/" cannot be used on node "fppc02"
This is often related to the /tmp filesystem being mounted with the “noexec” option:
Shell
$ mount | grep /tmp
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noexec)
Although it is tempting to just remount the filesystem with “exec”, you might be in this situation because your systems are configured to adhere to the STIG recommendations:
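If remounting /tmp with exec is not an option, a possible workaround is pointing CVU to a different scratch area. A minimal sketch, assuming a hypothetical path and that your CVU run honors the CV_DESTLOC environment variable (worth verifying for rhpctl-triggered checks):

Shell
# create an alternate work area on a filesystem mounted without noexec
mkdir -p /u01/app/oracle/cvu_tmp
# CV_DESTLOC tells CVU where to place its work directory
export CV_DESTLOC=/u01/app/oracle/cvu_tmp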
FPP was still young and lacking many features at that time, but it already changed the way I worked over the following years. I embraced out-of-place patching, developed some basic scripts to install Oracle Homes, and sought automation and standardization at all costs.
And discovered that, in the meantime, FPP had made giant strides, with many new features and fixes for quite a few usability and performance problems.
Last year, when joining the Oracle Database High Availability (HA), Scalability, and Maximum Availability Architecture (MAA) Product Management Team at Oracle, I took (among others) the Product Manager role for FPP.
Becoming an Oracle employee after 20 years of working with Oracle technology is a big leap. It allows me to understand how big the company is, and how collaborative and friendly the Oracle employees are (Yes, I was used to marketing nonsense, insisting salesmen, and unfriendly license auditors. This is slowly changing with Oracle embracing the Cloud, but it is still a fresh wound for many customers. Expect this to change even more! Regarding me… I’ll be the same I’ve always been 🙂 ).
Now I have daily meetings with big customers (bigger than the ones I have ever had in the past), development teams, other product managers, Oracle consultants, and community experts. My primary goal is to make the product better, increase its adoption, and help customers have the best experience with it. This includes testing the product myself, writing specs, preparing presentations and videos, collecting feedback from customers, tracking bugs, and managing escalations.
I am a Product Manager for other products as well, but I have to admit that FPP is the product that takes most of my Product Manager time. Why?
I will give a few reasons in my next blog post(s).
I have been a DBA/consultant for customers and big production environments for over twenty years. I have explained more or less my career path in this blog post.
Database (and application) high availability has always been one of my favorite areas. Over the years I have become a high availability expert (my many blog posts are there to confirm it) and I have spent a lot of time building, troubleshooting, teaching, presenting, and advocating these gems of technology that are RAC, Data Guard, Application Continuity, and the many other products that are part of the Oracle Maximum Availability Architecture solution. Customers fear downtime, and I have always been with them on that. But in my case, it looks like Yoda’s famous quote worked well for me (in a good way):
I’ll be joining the Oracle Maximum Availability Architecture Product Management team as MAA Product Manager (or rather Cloud MAA, I will not explain here ;-)) next November.
(for those who are not familiar with the joke, the “Dark Side” is how we often refer to the Oracle employees in the Oracle Community 😉 )
I remember as if it were yesterday presenting some Data Guard 12c new features in front of a big audience at Collaborate 2014. There I met two incredible people who were part of the MAA Product Management team: Larry Carpenter and Markus Michalewicz. Larry has been a great source of inspiration to improve my seniority and my ease of presenting in public, while Markus has become a friend over the years, in addition to being one of the most influential people in my professional network.
Now I have the opportunity to join that team, and it feels like the most natural next step in my career.
And because I imagine some of you will have questions, here are the answers to the ones I have been asked most frequently so far:
MAA PM does not mean becoming team lead or supervising other colleagues, I’ll be a “regular” PM
I will stay in Switzerland and work remotely from here
I will stay in “the conference circus” and keep presenting as soon as the COVID-19 situation allows it
Yes, I was VERY happy in Trivadis and it will always have a special place in my heart
One of the challenges of automation in big Oracle environments is dealing with tnsnames.ora files.
These files might grow big and are sometimes hard to distribute/maintain properly.
The worst is when manual modifications are needed: manual operations, if not done carefully, can screw up the connections to the databases.
The best solution is always using LDAP naming resolution. I have seen customers using OID, OUD, Active Directory, openldapd, all with a great level of control and automation. However, some customers don't have/want this possibility and keep relying on TNS naming resolution.
When Data Guard (and possibly RAC) is in place, the tnsnames.ora gets filled with entries for the DGConnectIdentifiers and StaticConnectIdentifier. If I add the observer, an additional entry is required to access the dbname_CFG service created by Fast-Start Failover.
Actually, all these entries are not required if I use Easy Connect.
The basic configuration with Data Guard is quite simple to achieve with Easy Connect. In this example I have:
– The primary database TOOLCDB1_SITE1
– The duplicated database for standby TOOLCDB1_SITE2
After setting up the static registration (no Grid Infrastructure in my lab):
listener.ora
SID_LIST_LISTENER=
  (SID_LIST=
    (SID_DESC=
      (GLOBAL_DBNAME=TOOLCDB1_SITE1_DGMGRL)
      (SID_NAME=TOOLCDB1)
      (ORACLE_HOME=/u01/app/oracle/product/db_19_8_0)
    )
  )
and copying the password file, the configuration can be created with:
DGMGRL
DGMGRL> create configuration TOOLCDB1 as primary database is TOOLCDB1_SITE1 connect identifier is 'newbox01:1521/TOOLCDB1_SITE1';
Configuration "toolcdb1" created with primary database "toolcdb1_site1"
DGMGRL> edit database TOOLCDB1_SITE1 set property 'StaticConnectIdentifier'='newbox01:1521/TOOLCDB1_SITE1_DGMGRL';
Property "StaticConnectIdentifier" updated
DGMGRL> add database TOOLCDB1_SITE2 as connect identifier is 'newbox02:1521/TOOLCDB1_SITE2';
Database "toolcdb1_site2" added
DGMGRL> edit database TOOLCDB1_SITE2 set property 'StaticConnectIdentifier'='newbox02:1521/TOOLCDB1_SITE2_DGMGRL';
Property "StaticConnectIdentifier" updated
DGMGRL> enable configuration;
Enabled.
That’s it.
Now, if I want to have the configuration observed, I need to activate the Fast Start Failover:
DGMGRL
DGMGRL> edit database toolcdb1_site1 set property LogXptMode='SYNC';
Property "logxptmode" updated
DGMGRL> edit database toolcdb1_site2 set property LogXptMode='SYNC';
Property "logxptmode" updated
DGMGRL> edit database toolcdb1_site1 set property FastStartFailoverTarget='toolcdb1_site2';
Property "faststartfailovertarget" updated
DGMGRL> edit database toolcdb1_site2 set property FastStartFailoverTarget='toolcdb1_site1';
Property "faststartfailovertarget" updated
DGMGRL> edit configuration set protection mode as maxavailability;
Succeeded.
DGMGRL> enable fast_start failover;
Enabled in Zero Data Loss Mode.
With just two databases, FastStartFailoverTarget is not explicitly needed, but I usually do it as other databases might be added to the configuration in the future.
After that, the broker complains that FSFO is enabled but there is no observer yet:
DGMGRL
DGMGRL> show fast_start failover;

Fast-Start Failover: Enabled in Zero Data Loss Mode

  Protection Mode:    MaxAvailability
  Lag Limit:          0 seconds

  Threshold:          180 seconds
  Active Target:      toolcdb1_site2
  Potential Targets:  "toolcdb1_site2"
    toolcdb1_site2 valid
  Observer:           (none)
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: 180 seconds
  Observer Override:  FALSE

Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                  NO
    Datafile Write Errors          YES

  Oracle Error Conditions:
    (none)

DGMGRL> show configuration;

Configuration - toolcdb1

  Protection Mode: MaxAvailability
  Members:
  toolcdb1_site1 - Primary database
    Warning: ORA-16819: fast-start failover observer not started

    toolcdb1_site2 - (*) Physical standby database

Fast-Start Failover: Enabled in Zero Data Loss Mode

Configuration Status:
WARNING   (status updated 39 seconds ago)
Observer for multiple configurations
This feature was introduced in 12.2, but it is still not widely used.
Before 12.2, the Observer was a foreground process: DBAs had to start it in a wrapper script executed with nohup in order to keep it alive.
Since 12.2, the observer can run as a background process, as long as there is a valid wallet for the connection to the databases.
Also, 12.2 introduced the capability of starting multiple configurations with a single dgmgrl command: “START OBSERVING”.
First, I need a wallet. And here comes the first compromise:
Having a single dgmgrl session to start all my configurations means that I have a single wallet for all the databases that I want to observe.
Fair enough, all the DBs (CDBs?) are managed by the same team in this case.
If I have only observers on my host I can easily point to the wallet from my central sqlnet.ora:
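Something along these lines (a sketch; the wallet directory is an assumption for this lab):

sqlnet.ora
WALLET_LOCATION=
  (SOURCE=
    (METHOD=FILE)
    (METHOD_DATA=(DIRECTORY=/u01/app/oracle/admin/wallets))
  )
SQLNET.WALLET_OVERRIDE=TRUE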
Which connection descriptors do I need?
The Observer uses the DGConnectIdentifier to keep observing the databases, but it needs a connection to both of them using the TOOLCDB1_CFG service (unless I specify something different with the broker configuration property ConfigurationWideServiceName) in order to connect to the configuration and get the DGConnectIdentifier information. Again, you can check it in the documentation or in the note Oracle 12.2 – Simplified OBSERVER Management for Multiple Fast-Start Failover Configurations (Doc ID 2285891.1).
So I need to specify three secrets for three connection descriptors:
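The commands look more or less like this (a sketch: the wallet path and the user are my assumptions; mkstore prompts for the secret when it is not passed on the command line, which explains the output below):

Shell
$ mkstore -wrl /u01/app/oracle/admin/wallets -createCredential newbox01,newbox02:1521/TOOLCDB1_CFG sys
$ mkstore -wrl /u01/app/oracle/admin/wallets -createCredential newbox01:1521/TOOLCDB1_SITE1 sys
$ mkstore -wrl /u01/app/oracle/admin/wallets -createCredential newbox02:1521/TOOLCDB1_SITE2 sys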
Oracle Secret Store Tool Release 21.0.0.0.0 - Production
Version 21.0.0.0.0
Copyright (c) 2004, 2020, Oracle and/or its affiliates. All rights reserved.
Your secret/Password is missing in the command line
Enter your secret/Password:
Re-enter your secret/Password:
Enter wallet password:
The first one will be used for the initial connection; the other two will be used to observe the Primary and the Standby.
I need to be careful that the first EZConnect descriptor matches EXACTLY what I put in observer.ora (see the next step) and that the last two match my DGConnectIdentifier (unless I specify something different with ObserverConnectIdentifier); otherwise I will get errors and the observer will not observe correctly (or will not start at all).
dgmgrl then needs a file named observer.ora.
$ORACLE_BASE/admin/observers or the central TNS_ADMIN would be good locations, but what if I have observers that must be started from multiple Oracle Homes?
In that case, having an observer.ora in $ORACLE_HOME/network/admin (or $ORACLE_BASE/homes/{OHNAME}/network/admin/ if Read-Only Oracle Home is enabled) would be a better solution: in this case I would need to start one observing session per Oracle Home.
The content of my observer.ora must be something like:
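Something like this (a sketch reconstructed from the (CONFIG=...) syntax in Doc ID 2285891.1 and from the connect identifier visible in the START OBSERVING output below):

observer.ora
(CONFIG=(NAME=TOOLCDB1)(CONNECT_IDENTIFIER=newbox01,newbox02:1521/TOOLCDB1_CFG))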
This is the example for my configuration, but I can put as many (CONFIG=…) as I want in order to observe multiple configurations.
Then, if everything is configured properly, I can start all the observers with a single command:
DGMGRL
DGMGRL> SET OBSERVERCONFIGFILE=/u01/app/oracle/admin/observers/observer.ora
DGMGRL> START OBSERVING
ObserverConfigFile=observer.ora
observer configuration file parsing succeeded
Submitted command "START OBSERVER" using connect identifier "newbox01,newbox02:1521/TOOLCDB1_CFG"
Check superobserver.log, individual observer logs and Data Guard Broker logs for execution details.
In my config, the Oracle Database version is 19.7 and the databases are actually CDBs. No Grid Infrastructure, non-OMF datafiles.
It is important to highlight that a lot of things have changed since 12.1. And because 19c is the LTS version now, it does not make sense to try anything older.
First, I just want to make sure that my standbys are aligned.
PR00 (PID:6718): Media Recovery Waiting for T-1.S-41
So, yeah, not having OMF might get you some warnings like: WARNING: File being created with same name as in Primary
But it is good to know that the cascade standby deals well with new PDBs.
Of course, this is not of big interest as I know that the problem with Multitenant comes from CLONING PDBs from either local or remote PDBs in read-write mode.
So absolutely the same behavior between the two levels of standby.
According to the documentation: https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/CREATE-PLUGGABLE-DATABASE.html#GUID-F2DBA8DD-EEA8-4BB7-A07F-78DC04DB1FFC
I quote what is specified for the parameter STANDBYS={ALL|NONE|…}: “If you include a PDB in a standby CDB, then during standby recovery the standby CDB will search for the data files for the PDB. If the data files are not found, then standby recovery will stop and you must copy the data files to the correct location before you can restart recovery.”
“Specify ALL to include the new PDB in all standby CDBs. This is the default.”
“Specify NONE to exclude the new PDB from all standby CDBs. When a PDB is excluded from all standby CDBs, the PDB’s data files are unnamed and marked offline on all of the standby CDBs. Standby recovery will not stop if the data files for the PDB are not found on the standby. […]”
So, in order to avoid the MRP crashing, I should have included STANDBYS=NONE.
But the documentation is not up to date: in my case, the PDB is skipped automatically and the recovery process DOES NOT STOP.
However, recovery is marked ENABLED for the PDB on the standby, while with STANDBYS=NONE it would have been DISABLED:
SQL
1* select name, recovery_status from v$pdbs

NAME                           RECOVERY
------------------------------ --------
PDB$SEED                       ENABLED
LATERALUS                      ENABLED
PNEUMA                         ENABLED
So, another difference from the doc, which states: “You can enable a PDB on a standby CDB after it was excluded on that standby CDB by copying the data files to the correct location, bringing the PDB online, and marking it as enabled for recovery.”
This reflects the findings of Philippe Fierens in his blog (http://pfierens.blogspot.com/2020/04/19c-data-guard-series-part-iii-adding.html).
This behavior was probably introduced between 12.2 and 19c, but I could not find exactly when, as it is not explicitly stated in the documentation.
However, I remember well that in 12.1.0.2 the MRP process was crashing.
In my configuration (not on purpose, but interesting for this article), the first standby has the very same directory structure as the primary, while the cascaded standby does not.
In any case, there is a potentially big problem for all the customers implementing Multitenant on Data Guard:
With the old behavior (MRP crashing), it was easy to spot when a PDB had been cloned online into a primary database, because a simple dgmgrl “show configuration” would have displayed a warning because of the increasing lag (following the MRP crash).
With the current behavior, the MRP keeps recovering and “show configuration” displays “SUCCESS” despite there being a PDB that is not copied to the standby (and thus not protected).
The missing PDB is easy to spot once I know that I have to look for it. However, for each PDB to recover (I might have many!), I have to prepare the renaming of the datafiles and the creation of the directories (do not forget I am using non-OMF here).
Now, the datafile names on the standby have been changed to …/UNNAMEDnnnnn.
So I have to get the original names from the primary database and do the same replace that db_file_name_convert would do:
SQL
set trim on
col rename_file for a300
set lines 400
select 'set newname for datafile '||file#||' to '''||replace(name,'/TOOLCDB1/','/TOOLCDX1/')||''';' as rename_file from v$datafile where con_id=6;
and put the result in an RMAN script (this one is for the second standby; the first one has the same directory structure as the primary, so the same paths):
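Something like this (a sketch with hypothetical datafile numbers and paths; RESTORE ... FROM SERVICE pulls the files from the primary over the network):

RMAN
run {
  # one SET NEWNAME per datafile, generated by the query above (numbers and paths are hypothetical)
  set newname for datafile 58 to '/u02/oradata/TOOLCDX1/LATERALUS/system01.dbf';
  set newname for datafile 59 to '/u02/oradata/TOOLCDX1/LATERALUS/sysaux01.dbf';
  restore datafile 58, 59 from service 'newbox01:1521/TOOLCDB1_SITE1';
  switch datafile all;
}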
Then, I need to stop the recovery, start it and stop it again, put the datafiles online, and finally restart the recovery.
These are the same steps used by Philippe in his blog post, just adapted to my taste 🙂
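A sketch of the sequence (database name, PDB name, and datafile numbers are hypothetical, matching the example above):

DGMGRL / SQL*Plus
DGMGRL> edit database toolcdb1_site2 set state='APPLY-OFF';
-- run the RMAN restore/switch shown above, then:
SQL> alter session set container=LATERALUS;
SQL> alter database datafile 58, 59 online;
DGMGRL> edit database toolcdb1_site2 set state='APPLY-ON';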
Now, I no longer have any datafiles offline on the standby:
SQL
SQL> select 'ERROR: CON_ID '||con_id||' has '||count(*)||' datafiles offline!' from v$recover_file where online_status='OFFLINE' group by con_id;

no rows selected
I will not publish the steps for the second standby: they are exactly the same (same output as well).
In the end, it is important for me to highlight that monitoring the OFFLINE datafiles on the standby becomes a crucial point to guarantee the health of Data Guard in Multitenant. Relying on the Broker status or on “PDB recovery disabled” is not enough.
On the bright side, it is nice to see that Cascade Standby configurations do not introduce any variation, so cascaded standbys can be treated the same as “direct” standby databases.
The instance tries to resolve the cluster-scan name to detect whether it is a SCAN address.
So, after resolving it, it stores all the addresses it gets and registers with them.
I can check which addresses the instance registers with using this query:
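For example, something like this (a sketch of one way to check it; v$listener_network exposes the endpoints the instance uses for listener registration):

SQL
-- TYPE distinguishes LOCAL LISTENER, REMOTE LISTENER, and SERVICE entries
select type, value from v$listener_network;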
In this case, the instance registers to the three addresses discovered, which is OK: all three SCAN listeners will get service updates from the instance.
the result is that the instance registers only with the first IP it gets from the DNS, leaving the other SCAN listeners without the service registration and thus causing random connection errors.
I am sure it is “working as designed”, but I wonder if it could be an enhancement to have the address expanded fully also in the case of a TNS alias…
Or… do you know any way to do it from a TNS alias without having the full IP list?
I am getting more and more experience with patching clusters with the local-mode automaton. The whole process would otherwise be very complex, but the local-mode automaton makes it really easy.
I have had nevertheless a couple of clusters where the process did not work:
#1: The very first cluster that I installed in 18c
This cluster “kind of failed” when patching the first node. Actually, the rhpctl command exited with an error:
server1.cern.ch: retrieving status of databases ...
server1.cern.ch: retrieving status of services of databases ...
PRCT-1011 : Failed to run "rhphelper". Detailed error: <HLP_EMSG>,RHPHELP_procCmdLine-05,</HLP_EMSG>,<HLP_VRES>3</HLP_VRES>,<HLP_IEEMSG>,PRCG-1079 : Internal error: RHPHELP122_main-01,</HLP_IEEMSG>,<HLP_ERES>1</HLP_ERES>
I am not sure about the cause, but let’s assume it is irrelevant for the moment.
#2: A cluster with new GI home not properly linked with RAC
This was another funny case: the first node patched successfully, but the second one failed in the middle of the upgrade with a Java NullPointerException. We made a few bad attempts with prePatch and postPatch to fix it, but after that the second node of the cluster was in an inconsistent state: in ROLLING_UPGRADE mode and impossible to patch anymore.
Common solution: removing the node from the cluster and adding it back
In both cases we were in the following situation:
one node was successfully patched to 18.6
one node was not patched, and it was not possible to patch it anymore (at least without heavy interventions)
So, for me, the easiest solution has been removing the failing node and adding it back with the new patched version.
The official procedure to remove a node requires deconfiguring the databases and managed homes from the active cluster version. But as we manage our homes with golden images, we do not need this; we rather want to keep all the entries in the OCR so that when we add the node back, everything is in place.
Once CRS was stopped, we deinstalled the CRS home on the failing node:
Shell
(oracle)$ $OH/deinstall/deinstall -local
This complained about CRS being down, but it continued and asked for a script to be executed. We got errors for this script as well, but the removal process was OK after all.
Then, from the surviving node:
Shell
root # crsctl delete node -n server2
oracle $ srvctl stop vip -vip server2
root $ srvctl remove vip -vip server2
Adding the node back
From the surviving node, we ran gridSetup.sh and followed the steps to add the node.
Wait before running root.sh.
In our case, we had originally installed the cluster starting with a SW_ONLY install. This type of installation keeps some leftovers in the configuration files that prevent root.sh from configuring the cluster, so we had to modify rootconfig.sh:
Shell
# check/modify /u01/crs/crs1860/crs/config/rootconfig.sh and change this:
# before:
# SW_ONLY=true
# after:
SW_ONLY=false
Then, after running root.sh and the config tools, everything was back as it was before removing the node from the cluster.
For one of the clusters, both nodes were at the same patch level, but the cluster was still in ROLLING_PATCH mode, so we had to take it out of the rolling patch state:
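Presumably something like this (my assumption: the command that takes the cluster out of the ROLLING_PATCH state, run as root once all nodes are at the same level):

Shell
# crsctl stop rollingpatch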
OK, I really do not know what other title I should use for this post.
I have developed and presented a few times my personal approach to Oracle Home provisioning and patching. You can read more in this series.
With this approach:
I install the software (either GI or RDBMS) with the option SW_ONLY once
I patch it to the last version
I create a golden image that I evolve for the rest of the release lifecycle
When I need to install it, I just unzip the golden image and attach it to the Central Inventory.
I discovered quite a long time ago that, every time I attached the home to the inventory, the binaries were relinked with rac_off, disregarding the fact that the home I zipped actually had RAC enabled. This is quite annoying in my work at CERN, as all our databases are RAC.
So my solution to the problem is to detect if the server is on a cluster, and relink on the fly:
Shell
### EARLIER, IN THE ENVIRONMENT SCRIPTS
if [ -f /etc/oracle/olr.loc ]; then
    export CRS_EXISTS=1
else
    export CRS_EXISTS=0
fi

### LATER, AFTER ATTACHING THE ORACLE_HOME:
pushd $ORACLE_HOME/rdbms/lib
if [ $CRS_EXISTS -eq 1 ]; then
    make -f ins_rdbms.mk rac_on
else
    make -f ins_rdbms.mk rac_off
fi
make -f ins_rdbms.mk ioracle
This is a simplified snippet of my actual code, but it gives the idea.
What causes the relink with rac_off?
I have discovered recently that the steps used by the runInstaller process to attach the Oracle Home are described in this file:
$ORACLE_HOME/inventory/make/makeorder.xml
and in my case, for all my golden images, it contains the relink step with rac_off.
So, it does not matter how I prepare my images: unless I change this file and put rac_on, the runInstaller keeps relinking with rac_off.
I have thought about changing the file, but then realized that I prefer to check and recompile at runtime, so I can reuse my images also for standalone servers (in case we need them).
Just to avoid surprises, it is convenient to check if a ORACLE_HOME is linked with RAC with this small function:
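A minimal sketch of such a check (my own version, based on the classic trick of looking for kcsm.o inside libknlopt.a):

Shell
function is_rac_enabled() {
    # kcsm.o present in libknlopt.a means the binaries are linked with the RAC option
    if ar -t "$ORACLE_HOME/rdbms/lib/libknlopt.a" | grep -q '^kcsm\.o$'; then
        echo "true"
    else
        echo "false"
    fi
}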
This is true especially for Grid Infrastructure golden images, as they have the very same behavior as RDBMS homes, with the exception that they might break out-of-place patching if RAC is not enabled: the second ASM instance will not mount, because the first one will be mounted exclusively when the RAC option is off.
I have been installing Grid Infrastructure 18c for a while, then switched to 19c when it became GA.
At the beginning I was overly enthusiastic about the shorter installation time:
Grid Infra 19c install process is MUCH faster than 18c/12cR2. Mean time for 2 node clusters @ CERN (incl. volumes, puppet runs, etc.) lowered from 1h30 to 45mins. No GIMR anymore by default!
The GIMR is now optional: deciding whether to install it is a choice of the customer, who might like to keep it or not, depending on their practices.
Not having the GIMR by default means not having the local-mode automaton. This is also not a problem at all. The default configuration is good for most customers and works really well.
This new simplified configuration reduces some maintenance effort at the beginning, but personally I use a lot the local-mode automaton for out-of-place patching of Grid Infrastructure (read my blog posts to know why I really love the local-mode automaton), so it is something that I definitely need in my clusters.
A choice that makes sense for Oracle and most customers
Oracle's vision regarding Grid Infrastructure consists of central management of clusters, using the Oracle Domain Services Cluster. In this kind of deployment, the Management Repository, TFA, and many other services are centralized. All the clusters use those services remotely instead of having them configured locally. The local-mode automaton is no exception: the full, enterprise-grade version of Fleet Patching and Provisioning (FPP, formerly Rapid Home Provisioning or RHP) allows much more than just out-of-place patching of Grid Infrastructure, so it makes perfect sense to avoid those local configurations everywhere, if you use a Domain Cluster architecture. Read more here.
Again, as I said many times in the past, doing out-of-place patching is the best approach in my opinion, but if you keep doing in-place patching, not having the local-mode automaton is not a problem at all and the default behavior in 19c is a good thing for you.
I need the local-mode automaton on 19c: what do I need to do at install time?
If you have many clusters, you are not installing them by hand with the graphical interface (hopefully!). In the responseFile for the 19c Grid Infrastructure installation, this is all you need to change compared to 18c:
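A sketch (the parameter name is my recollection; verify it against your 19c gridsetup.rsp before relying on it):

gridsetup.rsp
oracle.install.crs.configureGIMR=true

The DBCA output that follows shows the Management Database (_MGMTDB) being created.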
Registering database with Oracle Grid Infrastructure
14% complete
Copying database files
43% complete
Creating and starting Oracle instance
45% complete
49% complete
54% complete
58% complete
62% complete
Completing Database Creation
66% complete
69% complete
71% complete
Executing Post Configuration Actions
100% complete
Database creation complete. For details check the logfiles at:
/u01/app/oracle/cfgtoollogs/dbca/_mgmtdb.
Database Information:
Global Database Name:_mgmtdb
System Identifier(SID):-MGMTDB
Look at the log file "/u01/app/oracle/cfgtoollogs/dbca/_mgmtdb/_mgmtdb2.log" for further details.
Then, configure the PDB for the cluster. Pay attention to the -local switch that is not documented (or at least it does not appear in the inline help):
Shell
$ mgmtca -local
After that, you might check that you have the PDB for your cluster inside the MGMTDB, I’ll skip this step.
Before creating the rhpserver (local-mode automaton resource), we need the volume and filesystem to make it work (read here for more information).
Again, note that there is a -local switch that is not documented. Specifying it will create the resource as a local-mode automaton and not as a full FPP Server (or RHP Server, damn, this change of name gets me mad when I write blog posts about it 🙂 ).
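Presumably something along these lines (a sketch: the storage path and diskgroup are my assumptions, and the exact options may vary by version; the -local switch is the relevant part):

Shell
$ srvctl add rhpserver -storage /u01/app/oracle/rhp_storage -diskgroup DATA -local
$ srvctl start rhpserver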
If you use Oracle Clusterware or you deploy your databases to the Oracle Cloud, you probably have some application services defined with srvctl for your database.
If you have many databases, services and nodes, it might be annoying, when doing maintenance or service relocation, to get a quick overview of how services are distributed across the nodes and what their status is.
With srvctl (the official tool for that), it is a per-database operation:
Shell
$ srvctl status service
PRKO-2082 : Missing mandatory option -db
If you have many databases, you have to run it database by database.
It is also slow! For example, this database has 20 services; getting the status takes 27 seconds:
Service SERVICE_NUMBER_01 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_02 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_03 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_04 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_05 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_06 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_07 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_08 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_09 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_10 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_11 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_12 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_13 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_14 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_15 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_16 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_17 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_18 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_19 is running on instance(s) HRDEV4
Service SERVICE_NUMBER_20 is running on instance(s) HRDEV4
real 0m27.858s
user 0m1.365s
sys 0m1.143s
Instead of operating row by row (getting the status of each service), why not rely on the cluster resources with crsctl and get the big picture at once?
Shell
$ time crsctl stat res -f -w "(TYPE = ora.service.type)"
...
...
real 0m0.655s
user 0m0.169s
sys 0m0.098s
crsctl stat res -f returns a list of ATTRIBUTE_NAME=value pairs for each service, possibly more than one set if the service is not singleton/single-instance but uniform/multi-instance.
Parsing them with some awk code can provide nice results!
STATE, INTERNAL_STATE, and TARGET are useful in this case and might be used to display colors as well.
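A minimal sketch of such parsing (my own approximation; the attribute blocks in the crsctl output are separated by empty lines):

Shell
crsctl stat res -f -w "(TYPE = ora.service.type)" | awk -F= '
  /^NAME=/           {name=$2}
  /^STATE=/          {state=$2}
  /^INTERNAL_STATE=/ {istate=$2}
  /^TARGET=/         {target=$2}
  /^$/ && name!=""   {printf "%-55s %-25s %-15s %s\n", name, state, istate, target; name=""}
  END {if (name!="") printf "%-55s %-25s %-15s %s\n", name, state, istate, target}
'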
I’d be curious to know if it works well for your environment, please comment here. 🙂
Thanks
—
Ludo