Data Guard, Easy Connect and the Observer for multiple configurations

EZConnect

One of the challenges of automation in Oracle environments is dealing with tnsnames.ora files.
These files might grow big and are sometimes hard to distribute and maintain properly.
The worst case is when manual modifications are needed: manual operations, if not made carefully, can break the connection to the databases.
The best solution is always using LDAP naming resolution. I have seen customers using OID, OUD, Active Directory, OpenLDAP, all with a great level of control and automation. However, some customers don’t have or want this possibility and keep relying on TNS naming resolution.
When Data Guard (and possibly RAC) is in place, the tnsnames.ora gets filled with entries for the DGConnectIdentifiers and the StaticConnectIdentifier. If I add the observer, an additional entry is required to access the dbname_CFG service created by Fast-Start Failover.

Actually, none of these entries are required if I use Easy Connect.

My friend Franck Pachot wrote a couple of nice blog posts about Easy Connect while working with me at CERN:
https://medium.com/@FranckPachot/19c-easy-connect-e0c3b77968d7

https://medium.com/@FranckPachot/19c-ezconnect-and-wallet-easy-connect-and-external-password-file-8e326bb8c9f5

Basic Data Guard configuration

The basic configuration with Data Guard is quite simple to achieve with Easy Connect. In this example I have:
– The primary database TOOLCDB1_SITE1
– The duplicated database for standby TOOLCDB1_SITE2

After setting up the static registration (no Grid Infrastructure in my lab):
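The original listener.ora snippet is not reproduced here; a minimal static registration for the broker looks roughly like this (hostname, SID and ORACLE_HOME are placeholders, and on SITE2 the GLOBAL_DBNAME would be TOOLCDB1_SITE2_DGMGRL):

# listener.ora on site1 (sketch)
SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = TOOLCDB1_SITE1_DGMGRL)
      (SID_NAME = TOOLCDB1)
      (ORACLE_HOME = /u01/app/oracle/product/19.0.0.0/dbhome_1)
    )
  )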

and copying the password file, the configuration can be created with:
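A sketch of the broker creation with EZConnect (SYS_PWD, hostnames and ports are placeholders; the database names are the DB_UNIQUE_NAMEs):

# create the broker configuration using EZConnect strings as connect identifiers
dgmgrl sys/"${SYS_PWD}"@site1:1521/TOOLCDB1_SITE1 <<'EOF'
CREATE CONFIGURATION toolcdb1 AS
  PRIMARY DATABASE IS TOOLCDB1_SITE1
  CONNECT IDENTIFIER IS 'site1:1521/TOOLCDB1_SITE1';
ADD DATABASE TOOLCDB1_SITE2 AS
  CONNECT IDENTIFIER IS 'site2:1521/TOOLCDB1_SITE2';
ENABLE CONFIGURATION;
SHOW CONFIGURATION;
EOF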

That’s it.

Now, if I want to have the configuration observed, I need to activate the Fast Start Failover:
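Again as a sketch (same placeholders as above), setting the targets explicitly and then enabling FSFO:

dgmgrl sys/"${SYS_PWD}"@site1:1521/TOOLCDB1_SITE1 <<'EOF'
EDIT DATABASE TOOLCDB1_SITE1 SET PROPERTY FastStartFailoverTarget='TOOLCDB1_SITE2';
EDIT DATABASE TOOLCDB1_SITE2 SET PROPERTY FastStartFailoverTarget='TOOLCDB1_SITE1';
ENABLE FAST_START FAILOVER;
EOF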

With just two databases, FastStartFailoverTarget is not explicitly needed, but I usually set it anyway, as other databases might be added to the configuration in the future.
After that, the broker complains that FSFO is enabled but there is no observer yet:

 

Observer for multiple configurations

This feature was introduced in 12.2 but it is still not widely used.
Before 12.2, the Observer was a foreground process: the DBAs had to start it in a wrapper script executed with nohup in order to keep it alive.
Since 12.2, the observer can run as a background process as long as there is a valid wallet for the connection to the databases.
Also, 12.2 introduced the capability of starting multiple configurations with a single dgmgrl command: “START OBSERVING”.

For more information about it, you can check the documentation here:
https://docs.oracle.com/en/database/oracle/oracle-database/19/dgbkr/using-data-guard-broker-to-manage-switchovers-failovers.html#GUID-BC513CDB-1E06-4EB3-9FE1-E1331E15E492

How to set it up with Easy Connect?

First, I need a wallet. And here comes the first compromise:
Having a single dgmgrl session to start all my configurations means that I have a single wallet for all the databases that I want to observe.
Fair enough, all the DBs (CDBs?) are managed by the same team in this case.
If this host runs only observers, I can easily point to the wallet from my central sqlnet.ora:
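Something along these lines (the directory is a placeholder):

# sqlnet.ora of the observer environment
WALLET_LOCATION =
  (SOURCE =
    (METHOD = FILE)
    (METHOD_DATA = (DIRECTORY = /u01/app/oracle/admin/observers/wallet))
  )
SQLNET.WALLET_OVERRIDE = TRUE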

Otherwise I need to create a separate TNS_ADMIN for my observer management environment.
Then, I create the wallet:
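A minimal sketch with mkstore (the path is a placeholder; mkstore prompts for the wallet password and also creates the auto-login file):

mkstore -wrl /u01/app/oracle/admin/observers/wallet -create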

Now I need to add the connection descriptors.

Which connection descriptors do I need?
The Observer uses the DGConnectIdentifier to keep observing the databases, but it needs a connection to both of them using the TOOLCDB1_CFG service (unless I specify something different with the broker configuration property ConfigurationWideServiceName) in order to connect to the configuration and get the DGConnectIdentifier information. Again, you can check this in the documentation or in the note Oracle 12.2 – Simplified OBSERVER Management for Multiple Fast-Start Failover Configurations (Doc ID 2285891.1).

So I need to specify three secrets for three connection descriptors:
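As a sketch (hostnames and ports are placeholders; the user could be the built-in SYSDG account or another user with the SYSDG privilege, and mkstore prompts for each password):

# one credential for the _CFG service (initial connection) and one per DGConnectIdentifier
WALLET=/u01/app/oracle/admin/observers/wallet
mkstore -wrl $WALLET -createCredential site1:1521/TOOLCDB1_CFG   sysdg
mkstore -wrl $WALLET -createCredential site1:1521/TOOLCDB1_SITE1 sysdg
mkstore -wrl $WALLET -createCredential site2:1521/TOOLCDB1_SITE2 sysdg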

The first one will be used for the initial connection. The other two will be used to observe the primary and the standby.
I need to be careful that the first EZConnect descriptor matches EXACTLY what I put in observer.ora (see next step), and that the last two match my DGConnectIdentifier (unless I specify something different with ObserverConnectIdentifier); otherwise I will get errors and the observer will not observe correctly (or will not start at all).

dgmgrl then needs a file named observer.ora.
$ORACLE_BASE/admin/observers or the central TNS_ADMIN would be good locations, but what if I have observers that must be started from multiple Oracle Homes?
In that case, having an observer.ora in $ORACLE_HOME/network/admin (or $ORACLE_BASE/homes/{OHNAME}/network/admin/ if Read-Only Oracle Home is enabled) would be a better solution: in this case I would need to start one session per Oracle Home.

The content of my observer.ora must be something like:
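The structure is roughly the following (verify the exact keywords in Doc ID 2285891.1; the CONNECT_ID must match the first wallet entry, and CONFIG_HOME, where the observer writes its runtime files and logs, is an assumption):

# observer.ora (sketch)
(CONFIG=
  (NAME=toolcdb1)
  (CONNECT_ID=site1:1521/TOOLCDB1_CFG)
  (CONFIG_HOME=/u01/app/oracle/admin/observers/toolcdb1)
)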

This is the example for my configuration, but I can put as many (CONFIG=…) entries as I want in order to observe multiple configurations.
Then, if everything is configured properly, I can start all the observers with a single command:
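Roughly like this (the TNS_ADMIN path is a placeholder and must contain the sqlnet.ora and observer.ora shown above):

export TNS_ADMIN=/u01/app/oracle/admin/observers
dgmgrl <<'EOF'
START OBSERVING;
EOF
# STOP OBSERVING stops all of them with a single command as well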

Troubleshooting

If the observer does not work, sometimes it is not easy to understand the cause.

  • Has the SYSDG privilege been granted to the SYSDG user? Is the SYSDG account unlocked?
  • Does sqlnet.ora contain the correct wallet location?
  • Is the wallet accessible in autologin?
  • Are the entries in the wallet correct? (check with “sqlplus /@connstring as sysdg”)

Missing pieces

Here are a few features that I think would be a nice addition in the future:

  • Awareness of the ORACLE_HOME to be used for each observer
  • Possibility to specify a different TNS_ADMIN per observer (different wallets)
  • Integration with Grid Infrastructure (srvctl add observer…) and support for multiple observers

Ludovico

Script to check Data Guard status from SQL

In a previous blog post I have explained how to get the basic configuration from x$drc and display something like:

There are other possibilities, by using the DBMS_DRS PL/SQL package.

The package is quite rich. In order to get more details, I use CHECK_CONNECT to check the connectivity to the member databases:

example:

In the first case I get no exceptions, which means that the database is reachable using the DGConnectIdentifier specified in the configuration (‘TOOLCDB1_SITE2’ is my database name in the configuration; it is NOT a TNS entry: I use EZConnect in my lab).

In the second case I specify a database that is not in the configuration.

In the third case, it looks like the database is down (no service), or the DGConnectIdentifier is not correct.

 

GET_PROPERTY_OBJ is useful to get a single property of a database/instance:

Example:

Here I have, for the primary (the object_id from x$drc), a TransportLagThreshold of 30 seconds.

DO_CONTROL does a specific check and returns a document with the results:

The problem is… what’s the format for indoc?

To get the correct format, I have enabled SQL trace to capture the executions, with bind variables, of the dgmgrl commands. It turns out that the input format is XML and the output format is HTML.

This is how you can get the LogXptStatus, for example:

 

The big script

So I said… why not try to build a comprehensive SQL script that checks a few vital statuses of Data Guard?

This is the script that came out:

Of course, it is not perfect (many checks missing: FSFO readiness, observer checks, etc.), but it is good enough for base monitoring. Also, it’s faster than a normal shell+dgmgrl script.

Output on a Primary database:

Output on a standby database:

In case of errors (e.g. standby listener stopped), I would get:

So it is easy to spot the error and use a shell wrapper to grep for ^ERROR or similar.
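For example, a minimal cron-friendly wrapper could look like this (dg_check.sql is a placeholder name for the script above):

#!/bin/bash
# Hypothetical wrapper: run the Data Guard check script and fail if any ^ERROR line is found.
OUT=$(sqlplus -s -L / as sysdba @dg_check.sql)
if echo "$OUT" | grep -q '^ERROR'; then
  echo "$OUT" | grep '^ERROR'
  exit 1
fi
exit 0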

Be careful, the script is not RAC aware, and it lacks some checks, so you might want to reuse it and extend it to fit your exact configuration.

Hope you like it!

Ludovico

Real-Time Cascade Standby Container Databases without Oracle Managed Files

OK, the title might not be the best… I would just like to add more detail to content you can already find in other blogs (e.g. this nice one from Philippe Fierens: http://pfierens.blogspot.com/2020/04/19c-data-guard-series-part-iii-adding.html).

I have this Cascade Standby configuration:

Years ago I wrote this whitepaper about cascaded standbys:
https://fr.slideshare.net/ludovicocaldara/2014-603-caldarappr
While it is still relevant for non-CDBs, things have changed with Multitenant architecture.

In my config, the Oracle Database version is 19.7 and the databases are actually CDBs. No Grid Infrastructure, non-OMF datafiles.
It is important to highlight that a lot of things have changed since 12.1. And because 19c is the LTS version now, it does not make sense to try anything older.

First, I just want to make sure that my standbys are aligned.

Primary:

1st Standby alert log:

2nd Standby alert log:

Then, I create a pluggable database (from PDB$SEED):
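A sketch of the statement (names, password and paths are placeholders; with non-OMF datafiles, FILE_NAME_CONVERT is needed):

sqlplus / as sysdba <<'EOF'
CREATE PLUGGABLE DATABASE PDB1 ADMIN USER pdbadmin IDENTIFIED BY "MyPassword1"
  FILE_NAME_CONVERT=('/u02/oradata/TOOLCDB1/pdbseed/','/u02/oradata/TOOLCDB1/PDB1/');
ALTER PLUGGABLE DATABASE PDB1 OPEN;
EOF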

On the first standby I get:

On the second:

So, yeah, not having OMF might get you some warnings like: WARNING: File being created with same name as in Primary
But it is good to know that the cascade standby deals well with new PDBs.

Of course, this is not of big interest as I know that the problem with Multitenant comes from CLONING PDBs from either local or remote PDBs in read-write mode.

So let’s try a relocate from another CDB:
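As a sketch (the database link, PDB name and paths are placeholders; the exact clause order and conversion parameters may need adjusting to your environment):

sqlplus / as sysdba <<'EOF'
CREATE PLUGGABLE DATABASE PDB2 FROM PDB2@CDB2_LINK RELOCATE
  FILE_NAME_CONVERT=('/u02/oradata/CDB2/PDB2/','/u02/oradata/TOOLCDB1/PDB2/');
ALTER PLUGGABLE DATABASE PDB2 OPEN;
EOF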

This is what I get on the first standby:

and this is on the cascaded standby:

So absolutely the same behavior between the two levels of standby.
According to the documentation: https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/CREATE-PLUGGABLE-DATABASE.html#GUID-F2DBA8DD-EEA8-4BB7-A07F-78DC04DB1FFC
I quote what is specified for the parameter STANDBYS={ALL|NONE|…}:
“If you include a PDB in a standby CDB, then during standby recovery the standby CDB will search for the data files for the PDB. If the data files are not found, then standby recovery will stop and you must copy the data files to the correct location before you can restart recovery.”

“Specify ALL to include the new PDB in all standby CDBs. This is the default.”

“Specify NONE to exclude the new PDB from all standby CDBs. When a PDB is excluded from all standby CDBs, the PDB’s data files are unnamed and marked offline on all of the standby CDBs. Standby recovery will not stop if the data files for the PDB are not found on the standby. […]”

So, in order to avoid the MRP crashing, I should have included STANDBYS=NONE.
But the documentation is not up to date, because in my case the PDB is skipped automatically and the recovery process DOES NOT STOP:

However, the recovery is marked ENABLED for the PDB on the standby, while using STANDBYS=NONE it would have been DISABLED.

So, another difference from the documentation, which states:
“You can enable a PDB on a standby CDB after it was excluded on that standby CDB by copying the data files to the correct location, bringing the PDB online, and marking it as enabled for recovery.”

This reflects the findings of Philippe Fierens in his blog (http://pfierens.blogspot.com/2020/04/19c-data-guard-series-part-iii-adding.html).

This behavior was probably introduced between 12.2 and 19c, but I could not manage to find exactly when, as it is not explicitly stated in the documentation.
However, I remember well that in 12.1.0.2, the MRP process was crashing.

In my configuration, not on purpose but interesting for this article, the first standby has the very same directory structure as the primary, while the cascaded standby does not.

In any case, there is a potentially big problem for all the customers implementing Multitenant on Data Guard:

With the old behaviour (MRP crashing), it was easy to spot when a PDB had been cloned online into a primary database, because a simple dgmgrl “show configuration” would have displayed a warning because of the increasing lag (following the MRP crash).

With the current behavior, the MRP keeps recovering and the “show configuration” displays “SUCCESS” even though there is a PDB that has not been copied to the standby (and is thus not protected).

Indeed, this is what I get after the clone:

I can see that the Data Guard Broker is completely silent about the missing PDB. So I might think my PDB is protected while it is not!

I actually have to add a check on the standby DBs to check if I have any missing datafiles:
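A minimal sketch of such a check (to be run on each standby): datafiles that were never received get an UNNAMED name and/or stay offline.

sqlplus -s / as sysdba <<'EOF'
set lines 200 pages 100
col name for a80
select con_id, file#, name, status
from   v$datafile
where  name like '%UNNAMED%'
   or  status not in ('ONLINE','SYSTEM');
EOF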

This check should be implemented and put under monitoring (custom metrics in OEM?)

The missing PDB is easy to spot once I know that I have to look for it. However, for each PDB to recover (I might have many!), I have to prepare the rename of the datafiles and the creation of the directories (do not forget I am using non-OMF here).

Now, the datafile names on the standby got changed to …/UNNAMEDnnnnn.

So I have to get the original ones from the primary database and do the same replace that db_file_name_convert would do:

and put this in a rman script (this will be for the second standby, the first has the same name so same PATH):
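One possible shape for it, using restore from service (datafile numbers, paths and the primary connect string are placeholders; run it with managed recovery stopped):

rman target / <<'EOF'
run {
  set newname for datafile 25 to '/u02/oradata/TOOLCDB1_SITE3/PDB2/system01.dbf';
  set newname for datafile 26 to '/u02/oradata/TOOLCDB1_SITE3/PDB2/sysaux01.dbf';
  restore datafile 25, 26 from service 'site1:1521/TOOLCDB1_SITE1';
  switch datafile all;
}
EOF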

Then, I need to stop the recovery, start it and stop it again, put the datafiles online, and finally restart the recovery.
These are the same steps used by Philippe in his blog post, just adapted to my taste 🙂

For the second part, I use this HEREDOC to online all offline datafiles:
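A minimal sketch of such a HEREDOC (file names are placeholders): it generates one ALTER DATABASE ... ONLINE per offline datafile and runs the generated script.

sqlplus -s / as sysdba <<'EOF'
set pages 0 feed off head off
spool /tmp/online_datafiles.sql
select 'alter database datafile '||file#||' online;'
from   v$datafile
where  status = 'OFFLINE';
spool off
@/tmp/online_datafiles.sql
EOF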

and finally:

Now I no longer have any offline datafiles on the standby:

I will not publish the steps for the second standby, they are exactly the same (same output as well).

At the end, for me it is important to highlight that monitoring the OFFLINE datafiles on the standby becomes a crucial point to guarantee the health of Data Guard in Multitenant. Relying on the Broker status or “PDB recovery disabled” is not enough.

On the bright side, it is nice to see that Cascade Standby configurations do not introduce any variation, so cascaded standbys can be treated the same as “direct” standby databases.

HTH

Ludovico

How to get the Data Guard broker configuration from a SQL query?

Everybody knows that you can get the Data Guard configuration in dgmgrl with the command:

but few know that this information is actually available in x$drc:

The table design is not very friendly, because it is a mix of ATTRIBUTE/VALUE pairs (hence many rows per object) and specific columns.

So, in order to get something usable to show the databases and their status, the only solution is to make use of PIVOT().
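The full query is not shown here; a rough sketch of the approach follows, where the column and attribute names in the IN list are assumptions that must be checked against your x$drc (for example with select distinct attribute from x$drc):

sqlplus -s / as sysdba <<'EOF'
select *
from (
  select object_id, attribute, value
  from   x$drc
)
pivot (
  max(value) for attribute in (
    'DATABASE'       as database_name,   -- attribute literals are assumptions:
    'INTENDED_STATE' as intended_state,  -- verify them against your x$drc content
    'ENABLED'        as enabled,
    'ROLE'           as db_role
  )
);
EOF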

To get it in a friendly format, I recommend using SQLcl and setting

This is what I get on a simple two databases configuration (primary/standby):

and with a real-time cascade standby:

(interesting to note the leading :RTC_ON in the ship_to attribute).

Although it is much easier to get this information from DGMGRL, getting it programmatically is safer and more flexible through the SQL interface, as you know exactly what you are retrieving, no matter how the dgmgrl syntax changes.

Looking forward to having REST APIs in a future version of Data Guard 🙂

Ludovico

Awk to format the files .ora (listener.ora, tnsnames.ora, etc)

Tired of formatting tnsnames.ora files to make them more readable, I have taken the nice awk examples from Jeremy Schneider (https://ardentperf.com/2008/11/28/parsing-listenerora-with-awk-and-sed/) and created a function to format all .ora files (lisp-like config files).

Example, before:

and after:

The AWK script:
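The full function is in the repo linked below; a minimal sketch of the approach (assuming gawk, which can split a line into single characters) is:

# Sketch only: re-indent an .ora file by parenthesis depth.
format_ora() {
  gawk '
  {
    sub(/^[ \t]+/, "")                  # drop leading whitespace
    n = split($0, ch, "")               # split the line into single characters
    for (i = 1; i <= n; i++) {
      c = ch[i]
      if (c == "(") {
        printf "\n"
        for (j = 0; j < depth; j++) printf "  "
        printf "("
        depth++
      } else if (c == ")") {
        depth--
        printf ")"
      } else if (c != "\t") {
        printf "%s", c
      }
    }
  }
  END { printf "\n" }' "$1"
}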

I have included the function in the COE github repo. More functions to come (hopefully).

Ludo

Multitenant Pills: Partial PDB cloning (and cleanup)

When consolidating to multitenant, there are several consolidation patterns.

  • Big, complex databases usually have special requirements for which it might be a good choice to go to single-tenant (a single PDB in one CDB)
  • Small, relatively easy databases are the best candidates for consolidation to multitenant
  • Schema consolidated databases require special attention, but in general there are several advantages to converting individual schemas (or groups of schemas) to individual PDBs

For the latter, there are some techniques to convert a schema into a PDB.

  • export/import (obviously), possibly with GoldenGate to do it online
  • Transportable tablespaces (if the schemas follow a strict 1-to-1 tablespace separation)
  • partial PDB cloning

We will focus on the last one for this blog post.

Situation

Here we have a PDB with some schemas; each of them has a dedicated tablespace but, accidentally, two of them also have some objects in a common tablespace.

This happens frequently when all the users have quota on the default database tablespace and they do not have necessarily a personal default tablespace.

This is the typical situation where transportable tablespaces become hard to achieve without some upfront segment movement, as tablespaces are not self-contained.

Thankfully, Oracle Multitenant allows us to clone a PDB from a remote one and specify only a subset of tablespaces.
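At its core this relies on the USER_TABLESPACES clause of CREATE PLUGGABLE DATABASE; a minimal sketch (the database link, names and paths are placeholders) is:

sqlplus / as sysdba <<'EOF'
CREATE PLUGGABLE DATABASE ABC FROM ABC@SOURCE_CDB_LINK
  USER_TABLESPACES=('ABC','DATA')
  FILE_NAME_CONVERT=('/u02/oradata/SRCCDB/ABC/','/u02/oradata/TOOLCDB1/ABC/');
ALTER PLUGGABLE DATABASE ABC OPEN;
EOF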

Here is a full example script with some checks and fancy parameters:

This is an example output:

If the clone process succeeds, at the end we should have the new ABC pluggable database with ABC and DATA tablespaces only.

Yeah!

Any Cleanup needed?

What happened to the users? Actually, they are all still there:

And the segments in the two skipped tablespaces are not there:

So the table definitions are also gone?

Not at all! The tables are still there and reference tablespaces that do not exist anymore. Possible?

Actually, the tablespace definitions are still there if we look at v$tablespace:

If we give a look at the DBA_TABLESPACES view definition, there are a few interesting filters:

What is their meaning?

So the first WHERE clause skips all the INVALID TABLESPACES (when you drop a tablespace it is still stored in ts$ with this state), the second skips the definition of TEMPORARY TABLESPACE GROUPS, the third one is actually our candidate.

Indeed, this is what we get from ts$ for these tablespaces:

So the two tablespaces are filtered out because of this new multitenant flag.

If we try to drop the tablespaces, it succeeds:

But the user GHI, who has no objects anymore, is still there.

So we need to drop it explicitly.

Automate the cleanup

This is an example PL/SQL that is aimed to automate the cleanup.

Actually:

  • Users that had segments in one of the excluded tablespaces but do not have any segments left are just LOCKED (for security reasons, you can guess why).
  • Tablespaces that meet the “excluded PDB” criteria are just dropped

This is the output for the clone procedure we have just seen:

The PL/SQL block can be quite slow depending on the segments and tablespaces, so it might be a good idea to have a custom script instead of this automated one.

What about user DEF?

The automated procedure has not locked the account DEF. Why?

Actually, the user DEF still has some segments in the DATA tablespace. Hence, the procedure cannot be sure what the original intention was: copy only the user ABC? The clone procedure only allows specifying the tablespaces, so this is the only possible result.

Promo: If you need to migrate to Multitenant and you need consulting/training, just contact me, I can help you 🙂

 

Multitenant Pills – Change DBID for an existing PDB

When you plug the same PDB many times you have to specify “AS CLONE” in the syntax:

Otherwise, you will get an error similar to:

There are cases, however, where you cannot do it. For example, if the existing PDB should have been the clone, or if you are converting a copy of the same database from non-CDB to PDB using autoupgrade (with autoupgrade you cannot modify the CREATE PLUGGABLE DATABASE statement).

In this case, the solution might be to change the DBID of the existing PDB, via unplug/plug:
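A sketch of the unplug/plug (PDB name and XML path are placeholders; plugging it back AS CLONE is what generates a new DBID/GUID):

sqlplus / as sysdba <<'EOF'
ALTER PLUGGABLE DATABASE PDB1 CLOSE IMMEDIATE;
ALTER PLUGGABLE DATABASE PDB1 UNPLUG INTO '/tmp/PDB1.xml';
DROP PLUGGABLE DATABASE PDB1 KEEP DATAFILES;
CREATE PLUGGABLE DATABASE PDB1 AS CLONE USING '/tmp/PDB1.xml' NOCOPY TEMPFILE REUSE;
ALTER PLUGGABLE DATABASE PDB1 OPEN;
EOF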

 

EDIT 2024-01-03: If you use Transparent Data Encryption, you have to export the key before you can unplug the PDB, or you will encounter:

Before closing the PDB, just run:
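Something along these lines, inside the PDB (the secret, export file and keystore password are placeholders; add FORCE KEYSTORE if you use an auto-login keystore):

sqlplus / as sysdba <<'EOF'
ALTER SESSION SET CONTAINER = PDB1;
ADMINISTER KEY MANAGEMENT EXPORT ENCRYPTION KEYS
  WITH SECRET "my_transport_secret"
  TO '/secure/exports/PDB1_keys.exp'
  IDENTIFIED BY "keystore_password";
EOF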

After opening the PDB, you’ll need to import the key again:
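Again inside the PDB, roughly (same placeholders as for the export):

sqlplus / as sysdba <<'EOF'
ALTER SESSION SET CONTAINER = PDB1;
ADMINISTER KEY MANAGEMENT IMPORT ENCRYPTION KEYS
  WITH SECRET "my_transport_secret"
  FROM '/secure/exports/PDB1_keys.exp'
  IDENTIFIED BY "keystore_password"
  WITH BACKUP;
EOF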

Don’t forget to keep your master key exports in a secure place.

Ludo

Parallel Oracle Catalog/Catproc creation with catpcat.sql

With Oracle 19c, Oracle has released a new script, annotated for parallel execution, to create the CATALOG and CATPROC in parallel at instance creation.

I have a customer who is in the process of migrating massively to Multitenant using many CDBs, so I decided to give it a try to check how much time they could save for the CDB creations.

I have run the tests on my laptop, on a VirtualBox VM with 4 vCPUs.

Test 1: catalog.sql + catproc.sql

In this test, I use the classic way (this is also the case when DBCA creates the scripts):
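A sketch of the classic run through catcon.pl, the way DBCA-generated scripts do it (the options shown are the ones I would expect; verify them with catcon.pl -h on your version):

$ORACLE_HOME/perl/bin/perl $ORACLE_HOME/rdbms/admin/catcon.pl \
  -d $ORACLE_HOME/rdbms/admin -l /tmp -n 1 -b catalog catalog.sql
$ORACLE_HOME/perl/bin/perl $ORACLE_HOME/rdbms/admin/catcon.pl \
  -d $ORACLE_HOME/rdbms/admin -l /tmp -n 1 -b catproc catproc.sql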

The catalog is created first on CDB$ROOT and PDB$SEED. Then the catproc is created.

Looking at the very first occurrence of BEGIN_RUNNING (start of catalog for CDB$ROOT) and the very last of END_RUNNING in the log (end of catproc in PDB$SEED), I can see that it took ~ 44 minutes to complete:

 

Test 2: catpcat.sql

In this test, I use the new catpcat.sql using catctl.pl, with a parallelism of 4 processes:
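A sketch of the invocation (I recall -n as the parallelism switch; verify with catctl.pl -h on your version):

cd $ORACLE_HOME/rdbms/admin
$ORACLE_HOME/perl/bin/perl catctl.pl -n 4 catpcat.sql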

This creates the catalog and catproc first on CDB$ROOT, then creates them on PDB$SEED. So, same steps but in a different order.

By running vmstat in the background, I noticed during the run that most of the time the creation was running serially, and when there was some parallelism, it was short and offset by a lot of process synchronization (waits, sleeps) done by catctl.pl.

At the end, the process took ~ 45 minutes to complete.

So the answer is no: it is not faster to run catpcat.sql with parallel degree 4 compared to running catalog.sql and catproc.sql serially.

HTH

Ludo

Oracle Home Management – part 7: Putting all together

Last part of the blog series… let’s see how to put everything together and have a single script that creates and provisions Oracle Home golden images:

Review of the points

The scripts will:

  • let you create a golden image based on the current Oracle Home
  • save the golden image metadata into a repository (an Oracle schema somewhere)
  • list the available golden images and display whether they are already deployed on the current host
  • let you provision an image locally (pull, not push), either with the default name or a new name

Todo:

  • Run as root in order to run root.sh automatically (or let you specify the sudo command or a root password)
  • Manage Grid Infrastructure homes

Assumptions

  • There is an available Oracle schema where the golden image metadata will be stored
  • There is an available NFS share that contains the working copies and golden images
  • Some variables must be set in the script according to the environment
  • The function setoh is defined in the environment (it might be copied inside the script)
  • The Instant Client is installed and “setoh ic” correctly sets its environment. This is required because there might be no sqlplus binaries available at the very first deploy
  • Oracle Home name and path’s basename are equal for all the Oracle Homes

Repository table

First we need a metadata table. Let’s keep it as simple as possible:
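A hypothetical minimal version of it (table and column names are my own placeholders, as is the repository connect string):

sqlplus repo_user/"${REPO_PWD}"@repodb <<'EOF'
CREATE TABLE oh_golden_images (
  image_name   VARCHAR2(64)  PRIMARY KEY,  -- e.g. 12_1_0_2_BP170718
  oh_name      VARCHAR2(64),               -- Oracle Home name in the inventory
  zip_path     VARCHAR2(512),              -- location of the zip on the NFS share
  description  VARCHAR2(200),
  created      DATE DEFAULT SYSDATE
);
EOF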

Helpers

The script has some functions that check stuff inside the central inventory.

For example, one helper checks whether a specific Oracle Home (name) is present in the central inventory. It is helpful to check, for every golden image in the metadata repository, whether it is already provisioned or not.
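The original function is not shown here; a hypothetical equivalent that parses ContentsXML/inventory.xml could be:

# returns 0 if an Oracle Home name is registered in the central inventory
oh_in_inventory() {
  local oh_name=$1
  local inv_loc
  inv_loc=$(awk -F= '/^inventory_loc=/{print $2}' /etc/oraInst.loc)
  grep -q "HOME NAME=\"${oh_name}\"" "${inv_loc}/ContentsXML/inventory.xml"
}

# usage: oh_in_inventory OraDB12Home2 && echo "already provisioned"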

Variables

Some variables must be changed, but in general you might want to adapt the whole script to fit your needs.

Image creation

The image creation would be as easy as creating a zip file, but there are some files that we do not want to include in the golden image, therefore we need to create a staging directory (working copy) to clean up everything:
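In shell, the core of it is something like this sketch (paths and the exclusion list are just examples, not the full cleanup):

SRC_OH=/u01/app/oracle/product/12.1.0.2/dbhome_1
STAGE=/tmp/oh_stage/12_1_0_2_BP170718
IMG_STORE=/nfs/golden_images

mkdir -p "$STAGE"
cp -rp "$SRC_OH"/. "$STAGE"/
# remove logs, audit files and other volatile content (non-exhaustive list)
rm -rf "$STAGE"/log/* "$STAGE"/rdbms/audit/* "$STAGE"/rdbms/log/* "$STAGE"/network/log/*
( cd "$STAGE" && zip -rq "$IMG_STORE/12_1_0_2_BP170718.zip" . )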

Home provisioning

Home provisioning requires, besides some checks, a runInstaller -clone command, possibly a relink, possibly a setasmgid, possibly some other tasks, but definitely a run of root.sh. This last task is not automated yet in my deployment script.
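The core commands look roughly like this (home name, paths and ORACLE_BASE are placeholders; root.sh still has to be run by hand afterwards):

NEW_OH=/u01/app/oracle/product/12_1_0_2_BP170718
mkdir -p "$NEW_OH"
unzip -q /nfs/golden_images/12_1_0_2_BP170718.zip -d "$NEW_OH"
# attach the home to the central inventory with the desired name
"$NEW_OH"/oui/bin/runInstaller -clone -silent -waitForCompletion \
  ORACLE_HOME="$NEW_OH" \
  ORACLE_HOME_NAME=12_1_0_2_BP170718 \
  ORACLE_BASE=/u01/app/oracle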

 

Usage

Examples

List installed homes:

Create a golden image 12_1_0_2_BP170718 from the Oracle Home named OraDB12Home2 (the latter having been installed manually without a naming convention):

List the new golden image from the metadata repository:

Reinstalling the same home with the new naming convention:

Installing the same home in a new path for manual patching from 170718 to 180116:

New home situation:

Manually patch the home named 12_1_0_2_BP180116 with the January bundle patch:

Create the new golden image from the home patched with January bundle patch:

Full source code

Full source code of ohctl

I hope you find it useful! The cool thing is that once you have the golden images ready in the golden image repository, the provisioning to all the servers is straightforward and requires just a couple of minutes, from nothing to a fully working and patched Oracle Home.

Why applying the patch manually?

If you read everything carefully, I automated the golden image creation and provisioning, but the patching is still done manually.

The aim of this framework is not to patch all the Oracle Homes with the same patch, but to install the patch ONCE and then deploy the patched home everywhere. Because each patch has different conflicts, bugs, etc., it might be convenient to install it manually the first time and then forget it. At least this is my opinion 🙂

Of course, patch download, conflict detection, etc. can also be automated (and it is a good idea, if you have the time to implement it carefully and bullet-proof).

In the addendum blog post, I will show some scripts made by Hutchison Austria and why I find them really useful in this context.

Blog posts in this series:

Oracle Home Management – part 1: Patch soon, patch often vs. reality
Oracle Home Management – part 2: Common patching patterns
Oracle Home Management – part 3: Strengths and limitations of Rapid Home Provisioning
Oracle Home Management – part 4: Challenges and opportunities of the New Release Model
Oracle Home Management – part 5: Oracle Home Inventory and Naming Conventions
Oracle Home Management – part 6: Simple Golden Image blueprint
Oracle Home Management – part 7: Putting all together
Oracle Home Management – Addendum: Managing and controlling the patch level (berx’s work)

Oracle Home Management – part 6: Simple Golden Image Blueprint

As I explained in the previous blog posts, from a manageability perspective, you should not change the patch level of a deployed Oracle Home, but rather install and patch a new Oracle Home.

With the same principle, Oracle Homes deployed on different hosts should have an identical patch level for the same name. For example, an Oracle Home /u01/app/oracle/product/EE12_1_0_2_BP171018 should have the same patch level on all the servers.

To guarantee the same binaries and patch levels everywhere, the simple solution that I am showing in this series is to store copies of the Oracle Homes somewhere and use them as golden images. (Another approach, really different and cool, is used by Ilmar Kerm: he explains it here https://ilmarkerm.eu/blog/2018/05/oracle-home-management-using-ansible/ )

For this, we will use a Golden Image store (that could be a NFS share mounted on the Oracle Database servers, or a remote host accessible with scp, or other) and a metadata store.

(figure: golden image store)
When all the software is deployed from golden images, there is the guarantee that all the Homes are equal; therefore the information about patches and bugfixes might be centralized in one place (golden image metadata).

A typical Oracle Home lifecycle:

  • Install the software manually the first time
  • Create automatically a golden image from the Oracle Home
  • Deploy automatically the golden image on the other servers

When a new patch is needed:

  • Deploy automatically the golden image to a new Oracle Home
  • Patch manually (or automatically!) the new Oracle Home
  • Create automatically the new golden image with the new name
  • Deploy automatically the new golden image to the other servers

The script that automates this lifecycle does just two main actions:

  • Automates the creation of a new golden image
  • Deploys an existing image to an Oracle Home (either with a new path or the default one)
  • (optional: uninstall an existing Home)

Let’s make a graphical example of the previously described steps:

(figure: Oracle Home management lifecycle)
Here, the script ohctl takes two actions: -c (creates a Golden Image) and -i (installs a Golden Image).

The create action does the following steps:

  • Copies the content to a working directory
  • Cleans up logs, audits, etc.
  • Creates the zip file
  • Stores the zip file in a shared NFS repository
  • Inserts the metadata of the new golden image in a repository

The install action does the following steps:

  • Checks if the image is already deployed (plus other security checks)
  • Creates the new path based on the name of the image or the new name passed as argument
  • Unzips the content in the new Oracle Home
  • Runs runInstaller -clone to attach the home in the central inventory and (optionally) set a new Home name
  • (optionally) Relinks the oracle binary with the RAC option
  • Runs setasmgid if found
  • Other environment-specific tasks (e.g. dealing with TNS_ADMIN links)

By following this pattern, Oracle Home names and paths are clean and the same everywhere. This facilitates the deployment and the patching.

You can find the Oracle Home cloning steps in the Oracle Database documentation:

Cloning an Oracle Home

In the next blog post I will explain parts of the ohctl source code and give some examples of how I use it (and publish a link to the full source code 🙂 )