Real-Time Cascade Standby Container Databases without Oracle Managed Files

OK, the title might not be the best… I just would like to add more detail to content you can already find in other blogs (E.g. this nice one from Philippe Fierens http://pfierens.blogspot.com/2020/04/19c-data-guard-series-part-iii-adding.html).

I have this Cascade Standby configuration:

Years ago I wrote this whitepaper about cascaded standbys:
https://fr.slideshare.net/ludovicocaldara/2014-603-caldarappr
While it is still relevant for non-CDBs, things have changed with Multitenant architecture.

In my config, the Oracle Database version is 19.7 and the databases are actually CDBs. No Grid Infrastructure, non-OMF datafiles.
It is important to highlight that a lot of things have changed since 12.1. And because 19c is the LTS version now, it does not make sense to try anything older.

First, I just want to make sure that my standbys are aligned.

Primary:

1st Standby alert log:

2nd Standby alert log:

Then, I create a pluggable database (from PDB$SEED):

On the first standby I get:

On the second:

So, yeah, not having OMF might get you some warnings like: WARNING: File being created with same name as in Primary
But it is good to know that the cascade standby deals well with new PDBs.

Of course, this is not of big interest as I know that the problem with Multitenant comes from CLONING PDBs from either local or remote PDBs in read-write mode.

So let’s try a relocate from another CDB:

This is what I get on the first standby:

and this is on the cascaded standby:

So absolutely the same behavior between the two levels of standby.
According to the documentation: https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/CREATE-PLUGGABLE-DATABASE.html#GUID-F2DBA8DD-EEA8-4BB7-A07F-78DC04DB1FFC
I quote what is specified for the parameter STANDBYS={ALL|NONE|…}:
“If you include a PDB in a standby CDB, then during standby recovery the standby CDB will search for the data files for the PDB. If the data files are not found, then standby recovery will stop and you must copy the data files to the correct location before you can restart recovery.”

“Specify ALL to include the new PDB in all standby CDBs. This is the default.”

Specify NONE to exclude the new PDB from all standby CDBs. When a PDB is excluded from all standby CDBs, the PDB’s data files are unnamed and marked offline on all of the standby CDBs. Standby recovery will not stop if the data files for the PDB are not found on the standby. […]”

So, in order to avoid the MRP to crash, I should have included STANDBYS=NONE
But the documentation is not up to date, because in my case the PDB is skipped automatically and the recovery process DOES NOT STOP:

However, the recovery is marked ENABLED for the PDB on the standby, while usind STANDBYS=NONE it would have been DISABLED.

So, another difference with the doc who states:
“You can enable a PDB on a standby CDB after it was excluded on that standby CDB by copying the data files to the correct location, bringing the PDB online, and marking it as enabled for recovery.”

This reflects the findings of Philippe Fierens in his blog (http://pfierens.blogspot.com/2020/04/19c-data-guard-series-part-iii-adding.html).

This behavior has been introduced probably between 12.2 and 19c, but I could not manage to find exactly when, as it is not explicitly stated in the documentation.
However, I remember well that in 12.1.0.2, the MRP process was crashing.

In my configuration, not on purpose, but interesting for this article, the first standby has the very same directory structure, while the cascaded standby has not.

In any case, there is a potentially big problem for all the customers implementing Multitenant on Data Guard:

With the old behaviour (MRP crashing), it was easy to spot when a PDB was cloned online into a primary database, because a simple dgmgrl “show configuration” whould have displayed a warning because of the increasing lag (following the MRP crash).

With the current behavior, the MRP keeps recovering and the “show configuration” displays “SUCCESS” despite there is a PDB not copied on the standby (thus not protected).

Indeed, this is what I get after the clone:

I can see that the Data Guard Broker is completely silent about the missing PDB. So I might think my PDB is protected while it is not!

I actually have to add a check on the standby DBs to check if I have any missing datafiles:

This check should be implemented and put under monitoring (custom metrics in OEM?)

The missing PDB is easy to spot once I know that I have to do it. However, for each PDB to recover (I might have many!), I have to prepare the rename of datafiles and creation of directory (do not forget I am using non-OMF here).

Now, the datafile names on the standby got changed to …/UNNAMEDnnnnn.

So I have to get the original ones from the primary database and do the same replace that db_file_name_convert would do:

and put this in a rman script (this will be for the second standby, the first has the same name so same PATH):

Then, I need to stop the recovery, start it and stopping again, put the datafiles online and finally restart the recover.
These are the same steps used my Philippe in his blog post, just adapted to my taste 🙂

For the second part, I use this HEREDOC to online all offline datafiles:

and finally:

Now, I do not have anymore any datafiles offline on the standby:

I will not publish the steps for the second standby, they are exactly the same (same output as well).

At the end, for me it is important to highlight that monitoring the OFFLINE datafiles on the standby becomes a crucial point to guarantee the health of Data Guard in Multitenant. Relying on the Broker status or “PDB recovery disabled” is not enough.

On the bright side, it is nice to see that Cascade Standby configurations do not introduce any variation, so cascaded standbys can be threated the same as “direct” standby databases.

HTH

Ludovico

Duplicating a DB and setting up Data Guard through CMAN and SSH tunnel

I am fascinated about the new Zero Downtime Migration tool that has been available since November 28th. Whilst I am still in the process of testing it, there is one big requirement that might cause some headache to some customers. It is about network connectivity:

Configuring Connectivity Between the Source and Target Database Servers

The source database server […] can connect to target database instance over target SCAN through the respecitve scan port and vice versa.
The SCAN of the target should be resolvable from the source database server, and the SCAN of the source should resolve from the target server.
Having connectivity from both sides, you can synchronize between the source database and target database from either side. […]

If you are taking cloud migrations seriously, you should have either a VPN site-to-site to the cloud, or a Fast Connect link. At CERN we are quite lucky to have a high bandwidth Fast Connect to OCI Frankfurt.

This requirement might be missing for many customers, so what is a possible solution to setup connectivity for database duplicates and Data Guard setups?

In the picture above you can see a classic situation, that usually has two problems that must be solved:

  • the SCAN addresses are private: not accessible from internet
  • there are multiple SCAN addresses, so tunneling through all of them might be complex

Is it possible to configure CMAN in front of the SCAN listener as a single IP entry and tunnel through SSH to this single IP?

I will show now how to achieve this configuration.

For sake of simplicity, I have put two single instances without SCAN and a CMAN installation on the database servers, but it will work with little modification using SCAN and RAC setups as well. Note that in a Cloud Infrastructure setup, this will require a correct setup of the TDE wallet on both the source and the destination.

Because I put everything on s single host, I have to setup CMAN to listen to another port, but having a separate host for CMAN is a better practice when it has to proxy to SCAN listeners.

Installing and configuring CMAN

The most important part of the whole setup is that the CMAN on the standby site must have a public IP address and open SSH port so that we can tunnel through it.

The on-premises CMAN must have open access to the standby CMAN port 22.

For both primary and standby site you can follow the instructions of my blog post: Install and configure CMAN 19c in the Oracle Cloud, step by step.

In my example, because I install CMAN on the same host of the DB, I configure CMAN to run on port 1522.

CMAN primary config:

CMAN standby config:

This configuration is not secure at all, you might want to secure it further in order to allow only the services needed for setting up Data Guard.

The registration of database services to CMAN through the the remote_listener parameter is optional, as I will register the entries statically in the listener and use a routed connection through CMAN.

Listener configuration

The listener must have a static entry for the database, so that duplicate and switchover work properly.

On primary add to listener.ora:

On standby:

In a RAC config, all the local listeners must be configured with the correct SID_NAME running on the host. Make sure to reload the listeners 😉

Creating the SSH tunnels

There must be two tunnels open: one that tunnels from on-premises to the cloud and the other that tunnels from the cloud to on-premises.

However, such tunnels can both be created from the on-premises CMAN host that has access to the cloud CMAN host:

in my case, the hostnames are:

Important: with CMAN on a host other than the DB server, the CMAN sshd must be configured to have GatewayPorts set to yes:

After the tunnels are open, any connections to the local CMAN server port 1523 will be forwarded to the remote CMAN port 1522.

Configuring the TNSNAMES to hop through CMAN and SSH tunnel

Both servers must have now one entry for the local database pointing to the actual SCAN (or listener for single instances) and one entry for the remote database pointing to local port 1523 and routing to the remote scan.

On-premises tnsnames.ora:

Cloud tnsnames.ora:

After copying the passwordfile and starting nomount the cloud database, it should be possible from both sides to connect as SYSDBA to both DB_CLOUD and DB_ONPREM.

This configuration is ready for both duplicate from active database and for Data Guard.
I still have to figure out if it works with ZDM, but I think it is a big step towards establishing connection between on-premises and the Oracle Cloud when no VPN or Fast Connect are available.

Duplicate from active database

If the setup is correct, this should work:

Setting up Data Guard

  • Configure broker config files
  • Add and clear the standby logs
  • Start the broker
  • Create the configuration:

    The static connect identifier here is better if it uses the TNSNAMES resolution because each database sees each other differently.

Checking the DG config

A validate first:

Than a switchover, back and forth:

Conclusion

Yes, it is possible to setup a Data Guard between two sites that have no connections except mono-directional SSH. The SSH tunnels allow SQL*Net communication to a remote endpoint. CMAN allows to proxy through a single endpoint to multiple SCAN addresses.

However, do not forget about the ultimate goal that is to migrate your BUSINESS to the cloud, not just the database. Therefore, having a proper communication to the cloud with proper performance, architecture and security is crucial. Depending on your target Cloud database, Zero Downtime Migration or MV2ADB should be the correct and supported solutions.

Install and configure CMAN 19c in the Oracle Cloud, step by step

Installing and configuring CMAN is a trivial activity, but having the steps in one place is better than reinventing the wheel.

Prepare for the install

Download the Oracle Client 19.3.0.0 in the Oracle Database 19c download page.

Choose this one: LINUX.X64_193000_client.zip (64-bit) (1,134,912,540 bytes) , not the one named “LINUX.X64_193000_client_home.zip” because it is a preinstalled home that does not contain the CMAN tools.

Access the OCI Console and create a new Compute instance. The default  configuration is OK, just make sure that it is Oracle Linux 7 🙂

Do not forget to add your SSH Public Key to access the VM via SSH!

Access the VM using

Copy the Oracle Client zip in /tmp using your favorite scp program.

Install CMAN

Follow these steps to install CMAN:

 

Basic configuration

This will create a CMAN configuration named cman-test. Beware that it is very basic and insecure. Please read the CMAN documentation if you want something more secure or sophisticated.

The advantage of having the TNS_ADMIN outside the Oracle Home is that if you need to patch CMAN, you can do it out-of-place without the need to copy the configuration files somewhere else.

The advantage of using IFILE inside cman.ora, is that you can manage easily different CMAN configurations in the same host without editing directly cman.ora, with the risk of messing it up.

Preparing the start/stop script

Create a file /u01/app/oracle/scripts/cman_service.sh with this content:

This is at the same time ORACLE_HOME agnostic and configuration agnostic.

Make it executable:

and try to start CMAN:

Stop should work as well:

Add the service in systemctl

Open firewall ports

By default, new OL7 images use firewalld. Just open the port 1521 from the public zone:

 

Bonus: have a smart environment!

Ludo

How to install and access Oracle Weblogic 12.2 in the Oracle Cloud Infrastructure

I put here the steps required to install and access Weblogic in the OCI (mostly for me in case I need to do it again 😉 ). The assumptions are:

  • you already have an account for the Oracle Cloud Infrastructure and you can access the OCI console
  • you already have a Compartment with a VCN and a subnet configured (for test purposes, a VCN created with the default values will be just fine)
  • you already have a keypair for your SSH client (id_rsa, id_rsa.pub)
  • you have an X server on your laptop (if you have Windows, I recommend MobaXTerm, but Xming or other servers are just fine)

Create the compute instance

  • Menu -> Core Infrastructure -> Compute -> Instances -> Create Instance
  • Choose a name for the Instance, all the other fields defaults are fine for test (Oracle Linux 7.6, VM.Standard2.1, etc.)
  • Paste your SSH public key
  • Optionally, under advanced/network, specify a different name for the VM
  • Click on Create to complete the creation

At some point you will have an instance “Green” ready to access:

Click on it and get the public address:

Using your SSH keypair, you can now access the instance with:

 

Setup sshd for SSH tunneling and X11 forwarding

Edit as root the sshd_config:

Modify it so that the following lines are present with these values:

Those values are required for X11 forwarding (required for the graphical installation) and for SSH tunneling (required to access  the Weblogic ports without exposing them over internet).

Then restart sshd:

Install the packages for X11 

At this point, it should be possible to forward X11. You can test by reconnecting with:

and then:

Create the oracle user

At this point, you can reconnect using oracle directly, so X11 forward will work for the oracle user without any additional setup:

 

Follow the canonical steps to install weblogic

If you do not know how to do that, follow this good tutorial by Tim Hall (oracle-base):

Oracle WebLogic Server (WLS) 12cR2 (12.2.1) Installation on Oracle Linux 6 and 7

 

Access the Weblogic console from outside Oracle Cloud

If you configured correctly sshd, once the Oracle Weblogic instance is configured and started, you can tunnel to the port (it should be 7001):

And be able to browse from your laptop using localhost:7001:

HTH

Ludovico

First draft of a Common Oracle Environment… for the Cloud Database (and not only)

I have just published on GitHub a draft of a common Oracle environment scripts that make the shell environment a little bit smarter than what it is by default. It uses some function and aliases that I have published during the past years.

You can start playing with:

Ideal for the Oracle Cloud Infrastructure

If you are new to the Oracle Cloud, probably you do not have environment scripts that makes it easy to interact with the database.

The environment scripts that I have published work out-of the box in the cloud (just make sure that you have rlwrap installed so that you can have a better CLI experience).

Actually, they work great as well on-premises, but I assume that you already have something automatic there.

Some examples

  • My famous Smart Prompt 😉 (including version, edition, exit code, etc)

  • u : gets the status of the databases

  • pmon: just displays the running pmon processes

  • db : sets the environment for a specific DB_NAME, DB_UNIQUE_NAME or SID

  • svcstat : shows the running services (and the corresponding pdb, host, etc) as I described in my previous post

  • s_ : smart alias for sqlplus: connects as sysdba/sysasm by default, or with any arguments that you pass:

  • adr_, dg_ rman_, cm_, lsn_ : aliases for common oracle binaries
  • genpasswd : generates random passwords (default length 30)

  • lsoh: lists the Oracle Homes attached to the inventory

  • setoh: sets the Oracle Home given its name in the inventory

 

You might want to install the same environment for oracle, grid (if you have role separation, it should be the case for Cloud DB Systems) and (eventually) root.

I am curious to know if it works well for your environment.

Cheers

Ludo

Oracle Clusterware Services Status at a glance, fast!

If you use Oracle Clusterware or you deploy your databases to the Oracle Cloud, you probably have some application services defined with srvctl for your database.

If you have many databases, services and nodes, it might be annoying, when doing maintenance or service relocation, to have a quick overview about how services are distributed across the nodes and what’s their status.

With srvctl (the official tool for that), it is a per-database operation:

If you have many databases, you have to run db by db.

It is also slow! For example, this database has 20 services. Getting the status takes 27 seconds:

Instead of operating row-by-row (get the status for each service), why not relying on the cluster resources with crsctl and get the big picture once?

crsctl stat res -f  returns a list of ATTRIBUTE_NAME=value for each service, eventually more than one if the service is not singleton/single instance  but uniform/multi instance.

By parsing them with some awk code can provide nice results!

STATE, INTERNAL_STATE and TARGET are useful in this case and might be used to display colours as well.

  • Green: Status ONLINE, Target ONLINE, STABLE
  • Black: Status OFFLINE, Target OFFLNE, STABLE
  • Red: Status ONLINE, Target OFFLINE, STABLE
  • Yellow: all other cases

Here’s the code:

Here’s what you can expect, for 92 services distributed on 4 nodes and a dozen of databases (the output is snipped and the names are masked):

I’d be curious to know if it works well for your environment, please comment here. 🙂

Thanks

Ludo

Oracle Grid Infrastructure 18c patching part 3: Executing out-of-place patching with the local-mode automaton

I wish I had more time to blog in the recent weeks. Sorry for the delay in this blog series 🙂

If you have not read the two previous blog posts, please do it now. I suppose here that you have the Independent Local-Mode Automaton already enabled.

What does the Independent Local-mode Automaton?

The automaton automates the process of moving the active Grid Infrastructure Oracle Home from the current one to a new one. The new one can be either at a higher patch level or at a lower one. Of course, you will probably want to patch your grid infrastructure, going then to a higher level of patching.

Preparing the new Grid Infrastructure Oracle Home

The GI home, starting from 12.2, is just a zip that is extracted directly in the new Oracle Home. In this blog post I suppose that you want to patch your Grid Infrastructure from an existing 18.3 to a brand new 18.4 (18.5 will be released very soon).

So, if your current OH is /u01/app/grid/crs1830, you might want to prepare the new home in /u01/app/grid/crs1840 by unzipping the software and then patching using the steps described here.

If you already have a golden image with the correct version, you can unzip it directly.

Beware of four important things: 

  1. You have to register the new Oracle home in the Central Inventory using the SW_ONLY install, as  described here.
  2. You must do it for all the nodes in the cluster prior to upgrading
  3. The response file must contain the same groups (DBA, OPER, etc) as the current active Home, otherwise errors will appear.
  4. You must relink by hand your Oracle binaries with the RAC option:
    $ cd /u01/app/grid/1crs1840/rdbms/lib
    $ make -f ins_rdbms.mk rac_on ioracle

In fact, after every attach to the central inventory the binaries are relinked without RAC option, so it is important to activate RAC again to avoid bad problems when upgrading the ASM with the new Automaton.

Executing the move gihome

If everything is correct, you should have now the current and new Oracle Homes, correctly registered in the Central Inventory, with the RAC option activated.

You can now do a first eval to check if everything looks good:

My personal suggestion at least at your first experiences with the automaton, is to move the Oracle Home on one node at a time. This way, YOU control the relocation of the services and resources before doing the actual move operation.

Here is the execution for the first node:

From this output you can see at line 15 that the cluster status is NORMAL, then the cluster is stopped on node 1 (lines 16 to 100), then the active version is modified in the oracle-ohasd.service file (line 101), then started back with the new version (lines 102 to 171). The cluster status now is ROLLING PATCH (line 172). The TFA and the node list are updated. 

Before continuing with the other(s) node(s), make sure that all the resources are up & running:

You might want as well to relocate manually your resources back to node 1 prior to continuing on node 2.

After that, node 2 can follow the very same procedure:

As you can see, there are two differencse here: the second node was in this case the last one, so the cluster status gets back to NORMAL, and the GIMR is patched with datapatch (lines 176-227).

At this point, the cluster has been patched. After some testing, you can safely remove the inactive version of Grid Infrastructure using the deinstall binary ($OLD_OH/deinstall/deinstall).

Quite easy, huh?

If you combine the Independent Local-mode Automaton with a home-developed solution for the creation and the provisioning of Grid Infrastructure Golden Images, you can easily achieve automated Grid Infrastructure patching of a big, multi-cluster environment.

Of course, Fleet Patching and Provisioning remains the Rolls-Royce: if you can afford it, GI patching and much more is completely automated and developed by Oracle, so you will have no headaches when new versions are released. But the local-mode automaton might be enough for your needs.

— 

Ludo

Oracle Grid Infrastructure 18c patching part 2: Independent Local-mode Automaton architecture and activation

The first important step before starting using the new Independent Local-mode Automaton is understanding which are its components inside a cluster.

Resources

Here’s the list of service that you will find when you install a Grid Infrastructure 18c:

As you can see, there are 4 components that are OFFLINE by default:

Three local resources (that are present on each node):

  • ora.MGMT.GHCHKPT.advm
  • ora.mgmt.ghchkpt.acfs
  • ora.helper

One cluster resource (active on only one server at a time, it can relocate):

  • ora.rhpserver

If you have ever worked with 12c Rapid Home Provisioning, those name should sound familiar.

The GHCHKPT filesystem (ant its relative volume), is used to store some data regarding the ongoing operations across the cluster during the GI home move.

The ora.helper is the process that actually does the operations. It is local because each node needs it to execute some actions at some point.

The rhpserver is the server process that coordinates the operations and delegates them to the helpers.

All those services compose the independent local-mode automaton, that is the default deployment. The full RHP framework (RHP Server and RHP Client) might be configured instead with some additional work.

Important note: Just a few weeks ago Oracle changed the name of Rapid Home Provisioning (RHP) to Fleet Patching and Provisioning (FPP). The name is definitely more appealing now, but it generates again some confusion about product names and acronyms, so beware that in this series sometimes I refer to RHP, sometimes to FPP, but actually it is the same thing.

Tomcat?

You might have noticed that tomcat is deployed now in the GI home, as there are patches specific to it (here I paste the 18.4 version):

 

Indeed Tomcat is registered in the inventory and patched just like any other product inside the OH:

 

Out of the box, Tomcat is used for the Quality of Services Management (ora.qosmserver resource):

But it is used for the Independent Local Mode Automaton as well, when it is started.

Enabling and starting the independent local-mode automaton

The resources are started using the following commands (as root, the order is quite important):

Before continuing with the rhpserver resource, you might want to check if the filesystem is mounted:

Now the rhpserver should start without problems, as oracle:

Please note that if you omit to activate the filesystem first, the rhpserver will fail to start.

As you can see, now both rhpserver and the helper are online:

Now all is set to start using it!

We’ll see how to use it in the next posts.

Ludo

 

Oracle Grid Infrastructure 18c patching part 1: Some history

Down the memory lane

Although sometimes I think I have been working with Oracle Grid Infrastructure since it exists, sometimes my memory does not work well. I still like to go through the Oracle RAC family history from time to time:

  • 8i -> no Oracle cluster did exist. RAC was leveraging 3rd party clusters (like Tru Cluster, AIX HACMP, Sun Cluster)…
  • 9i -> if I remember well, Oracle hired some developers of Tru Cluster after the acquisition of Compaq by HP. Oracle CRS was born and was quite similar to Tru Cluster. (The commands were almost the same: crs_stat instead of caa_stat, etc)
  • 10g -> Oracle re-branded CRS to Clusterware
  • 11g -> With the addition of ASM (and other components), Oracle created the concept of “Grid Infrastructure”, composed by Clusterware and additional products. All the new versions still use the name Grid Infrastructure and new products have been added through the years (ACFS, RHP, QoS …)

But I have missing souvenirs. For example, I cannot remember having ever upgraded an Oracle Cluster from 9i to 10g or from 10g to 11g. At that time I was working for several customers, and every new release was installed on new Hardware.

My first, real upgrade (as far as I can remember) was from 11gR2 to 12c, where the upgrade process was a nice, OUI-driven, out-of-place install.

The process was (still is 🙂 ) nice and smooth:

  • The installer copies, prepares and links the binaries on all the nodes in a new Oracle Home
  • The upgrade process is rolling: the first node puts the cluster in upgrade mode
  • The last node does the final steps and exists the cluster from the upgrade mode.

This is about Upgrading to a new release. But what about patching?

In-place patching

Patching of Grid Infrastructure has always been in-place and, I will not hide it, quite painful.

If you wanted to patch a Grid Infrastructure before release 12cR2, you had to:

  • read the documentation carefully and check for possible conflicts
  • backup the Grid Home
  • copy the patch on the host
  • evacuate all the services and databases from the cluster node that you want to patch
  • patch the binaries (depending on the versions and patches, this might be easy with opatchauto or quite painful with manual unlocking/locking and manual opatch steps)
  • restart/relocate the services back on the node
  • repeat the tasks for every node

The disadvantages of in-place patching are many:

  • Need to stage the patch on every node
  • Need to repeat the patching process for every node
  • No easy rollback (some bad problems might lead to deconfiguring the cluster from one node and then adding it back to the cluster)

Out-of-place patching

Out-of-place patching is proven to be much a better solution. I am doing it regularly since a while for Oracle Database homes and I am very satisfied with it. I am implementing it at CERN as well, and it will unlock new levels of server consolidation 🙂

I have written a blog series here, and presented about it a few times.

But out-of-place patching for Grid Infrastructure is VERY recent.

12cR2: opatchauto 

Oracle 12cR2 introduced out-of-place patching as a new feature of opatchauto.

This MOS document explains it quite in detail:

Grid Infrastructure Out of Place ( OOP ) Patching using opatchauto (Doc ID 2419319.1)

The process is the following:

  • a preparation process clones the active Oracle Home on the current node and patches it
  • a switch process switches the active Oracle Home from the old one to the prepared clone
  • those two phases are repeated for each node

12cr2-oop

The good thing is that the preparation can be done in advance on all the nodes and the switch can be triggered only if all the clones are patched successfully.

However, the staging of the patch, the cloning and patching must still happen on every node, making the concept of golden images quite useless for patching.

It is worth to mention, at this point, that Grid Infrastructure Golden Images ARE A THING, and that they have been introduced by Rapid Home Provisioning release 12cR2, where cluster automatic provisioning has been included as a new feature.

This Grid Infrastructure golden images have already been mentioned here and here.

I have discussed about Rapid Home provisioning itself here, but I will ad a couple of thoughts in the next paragraph.

18c and the brand new Independent local-mode Automaton

I have been early tester of the Rapid Home Provisioning product, when it has been released with Oracle 12.1.0.2. I have presented about it at UKOUG and as a RAC SIG webinar.
https://www.youtube.com/watch?v=vaB4RWjYPq0
http://www.ludovicocaldara.net/dba/rhp-presentation/

I liked the product A LOT, despite a few bugs due to the initial release. The concept of out-of-placing patching that RHP uses is the best one, in my opinion, to cope with frequent patches and upgrades.

Now, with Oracle 18c, the Rapid Home Provisioning Independent Local-mode Automaton comes to play. There is not that much documentation about it, even in the Oracle documentation, but a few things are clear:

  • The Independent local-mode automaton comes without additional licenses as it is not part of the RHP Server/Client infrastructure
  • It is 100% local to the cluster where it is used
  • Its main “job” is to allow moving Grid Infrastructure Homes from a non-patched version to an out-of-place patched one.

I will not disclore more here, as the rest of this blog series is focused on this new product 🙂

Stay tuned for details, examples and feedback from its usage at CERN 😉

Ludo

Port conflict with “Oracle Remote Method Invocation (ORMI)” during Grid Infrastructure install

After years of installing Grid Infrastructures, today I have got for the first time an error on something new:

Looking at the logs (which I do not have now as I removed them as part of the failed install cleanup 🙁 ), the error is generated by the cluster verification utility (CVU) on this check:

The components verified by the CVU can be found inside $ORACLE_HOME/cv/cvdata/. In my case, precisely:

This check is critical, so the install fails.

In my case the port was used by mcollectived.

The port has been taken dynamically, and previous runs of CVU did not encounter the problem.

A rare port conflict that might happen when configuring GI 🙂

Ludo