Oracle Grid Infrastructure 18c patching part 1: Some history

Down the memory lane

Although sometimes I think I have been working with Oracle Grid Infrastructure since it exists, sometimes my memory does not work well. I still like to go through the Oracle RAC family history from time to time:

  • 8i -> no Oracle cluster did exist. RAC was leveraging 3rd party clusters (like Tru Cluster, AIX HACMP, Sun Cluster)…
  • 9i -> if I remember well, Oracle hired some developers of Tru Cluster after the acquisition of Compaq by HP. Oracle CRS was born and was quite similar to Tru Cluster. (The commands were almost the same: crs_stat instead of caa_stat, etc)
  • 10g -> Oracle re-branded CRS to Clusterware
  • 11g -> With the addition of ASM (and other components), Oracle created the concept of “Grid Infrastructure”, composed by Clusterware and additional products. All the new versions still use the name Grid Infrastructure and new products have been added through the years (ACFS, RHP, QoS …)

But I have missing souvenirs. For example, I cannot remember having ever upgraded an Oracle Cluster from 9i to 10g or from 10g to 11g. At that time I was working for several customers, and every new release was installed on new Hardware.

My first, real upgrade (as far as I can remember) was from 11gR2 to 12c, where the upgrade process was a nice, OUI-driven, out-of-place install.

The process was (still is 🙂 ) nice and smooth:

  • The installer copies, prepares and links the binaries on all the nodes in a new Oracle Home
  • The upgrade process is rolling: the first node puts the cluster in upgrade mode
  • The last node does the final steps and exists the cluster from the upgrade mode.

This is about Upgrading to a new release. But what about patching?

In-place patching

Patching of Grid Infrastructure has always been in-place and, I will not hide it, quite painful.

If you wanted to patch a Grid Infrastructure before release 12cR2, you had to:

  • read the documentation carefully and check for possible conflicts
  • backup the Grid Home
  • copy the patch on the host
  • evacuate all the services and databases from the cluster node that you want to patch
  • patch the binaries (depending on the versions and patches, this might be easy with opatchauto or quite painful with manual unlocking/locking and manual opatch steps)
  • restart/relocate the services back on the node
  • repeat the tasks for every node

The disadvantages of in-place patching are many:

  • Need to stage the patch on every node
  • Need to repeat the patching process for every node
  • No easy rollback (some bad problems might lead to deconfiguring the cluster from one node and then adding it back to the cluster)

Out-of-place patching

Out-of-place patching is proven to be much a better solution. I am doing it regularly since a while for Oracle Database homes and I am very satisfied with it. I am implementing it at CERN as well, and it will unlock new levels of server consolidation 🙂

I have written a blog series here, and presented about it a few times.

But out-of-place patching for Grid Infrastructure is VERY recent.

12cR2: opatchauto 

Oracle 12cR2 introduced out-of-place patching as a new feature of opatchauto.

This MOS document explains it quite in detail:

Grid Infrastructure Out of Place ( OOP ) Patching using opatchauto (Doc ID 2419319.1)

The process is the following:

  • a preparation process clones the active Oracle Home on the current node and patches it
  • a switch process switches the active Oracle Home from the old one to the prepared clone
  • those two phases are repeated for each node

12cr2-oop

The good thing is that the preparation can be done in advance on all the nodes and the switch can be triggered only if all the clones are patched successfully.

However, the staging of the patch, the cloning and patching must still happen on every node, making the concept of golden images quite useless for patching.

It is worth to mention, at this point, that Grid Infrastructure Golden Images ARE A THING, and that they have been introduced by Rapid Home Provisioning release 12cR2, where cluster automatic provisioning has been included as a new feature.

This Grid Infrastructure golden images have already been mentioned here and here.

I have discussed about Rapid Home provisioning itself here, but I will ad a couple of thoughts in the next paragraph.

18c and the brand new Independent local-mode Automaton

I have been early tester of the Rapid Home Provisioning product, when it has been released with Oracle 12.1.0.2. I have presented about it at UKOUG and as a RAC SIG webinar.
https://www.youtube.com/watch?v=vaB4RWjYPq0
http://www.ludovicocaldara.net/dba/rhp-presentation/

I liked the product A LOT, despite a few bugs due to the initial release. The concept of out-of-placing patching that RHP uses is the best one, in my opinion, to cope with frequent patches and upgrades.

Now, with Oracle 18c, the Rapid Home Provisioning Independent Local-mode Automaton comes to play. There is not that much documentation about it, even in the Oracle documentation, but a few things are clear:

  • The Independent local-mode automaton comes without additional licenses as it is not part of the RHP Server/Client infrastructure
  • It is 100% local to the cluster where it is used
  • Its main “job” is to allow moving Grid Infrastructure Homes from a non-patched version to an out-of-place patched one.

I will not disclore more here, as the rest of this blog series is focused on this new product 🙂

Stay tuned for details, examples and feedback from its usage at CERN 😉

Ludo

Port conflict with “Oracle Remote Method Invocation (ORMI)” during Grid Infrastructure install

After years of installing Grid Infrastructures, today I have got for the first time an error on something new:

Looking at the logs (which I do not have now as I removed them as part of the failed install cleanup 🙁 ), the error is generated by the cluster verification utility (CVU) on this check:

The components verified by the CVU can be found inside $ORACLE_HOME/cv/cvdata/. In my case, precisely:

This check is critical, so the install fails.

In my case the port was used by mcollectived.

The port has been taken dynamically, and previous runs of CVU did not encounter the problem.

A rare port conflict that might happen when configuring GI 🙂

Ludo

Grid Infrastructure 18c: changes in gridSetup.sh -applyRU and -createGoldImage

Starting with release 12cR2, Grid Infrastructure binaries are no more shipped as an installer, but as a zip file that is uncompressed directly in the Oracle Home path.
This opened a few new possibilities including patching the software before the Grid Infrastructure configuration.
My former colleague Markus Flechtner wrote an excellent blog post about it, here: https://www.markusdba.net/?p=294

Now, with 18c, there are a couple of things that changed comparing to Markus blog.

The -applyRU switch replaces the -applyPSU

While it is possible to apply several sub-patches of a PSU one by one:

it was possible to do all at once with:

Now the switch is called, for consistency with the patch naming, -applyRU.

E.g.:

Still there are no options to avoid the run of the Setup Wizard, but it is safe to ignore the error as the patch has been applied successfully.

The -createGoldImage does not work anymore if the Home is not attached

I have tried to create the golden image as per Markus post, but I get this error:

To workaround the issue, there are two ways:

  1. Create a zip file manually, as all the content needed to install the patched version is right there. No need to touch anything as the software is not configured yet.
  2. Configure the software with CRS_SWONLY before creating the gold image:

 

HTH

Ludo

Converting SQL*Plus calls in shell scripts to ORDS calls

I develop a lot of shell scripts. I would not define myself an old dinosaur that keeps avoiding python or other modern languages. It is just that most of my scripts automate OS commands that I would normally run interactively in an interactive shell… tar, cp, expdp, rman, dgmgrl, etc… and of course, some SQL*Plus executions.

For database calls, the shell is not appropriate: no drivers, no connection, no statement, no resultset… that’s why I need to make SQL*Plus executions (with some hacks to make them work correctly), and that’s also why I normally use python or perl for data-related tasks.

Using SQL*Plus in shell scripts

For SQL*Plus executions within a shell scripts there are some hacks, as I have said, that allow to get the data correctly.

As example, let’s use this table (that you might have found in my recent posts):

In order to get, as example, the result of this query:

and assign the values to some variables (in a shell loop), it is common to do something like this:

As you can see, there are several hacks:

  • The credentials must be defined somewhere (I recommend putting them in a wallet)
  • All the output goes in a variable (or looping directly)
  • SQL*Plus formatting can be a problem (both sqlplus settings and concatenating fields)
  • Loop and get, for each line, the variables (using awk in my case)

It is not rock solid (unexpected data might compromise the results) and there are dependencies (sqlplus binary, credentials, etc.). But for many simple tasks, that’s more than enough.

Here’s the output:

 

Using ORDS instead

Recently I have come across a situation where I had no Oracle binaries but needed to get some data from a table. That is often a situation where I use python or perl, but even in these cases, I need compatible software and drivers!

So I used ORDS instead (that by chance, was already configured for the databases I wanted to query), and used curl and jq to get the data in the shell script.

First, I have defined the service in the database:

At this point, a direct call gives this:

How to parse the data?

jq is a command-line JSON processor that can be used in a pipeline.

I can get the items:

And I can produce a csv output:

But the best, is the shell formatter, that returns strings properly escaped for usage in shell commands:

At this point, the call to eval is a natural step 🙂

The output:

😉

Ludovico

Setting Grid Infrastructure 18c Oracle Home name during the install

A colleague has been struggling for some time in order to get the correct Oracle Home name for the Grid Infrastructure18.3.0 when running gridSetup.sh.

In the graphical Oracle Universal Installer there is no way (as far as we could find) to set the Home name. Moreover, it was our intention to automate the install of Grid Infrastructure.

The complete responsefile ($OH/inventory/response/oracle.crs_Complete.rsp) contains the parameter:

However, when using a responsefile with such parameter, gridSetup.sh fails with the error:

After some tries (and a SR), this happens to actually work:

  • strip the ORACLE_HOME_NAME parameter from the responsefile
  • pass it as a double-quoted parameter at the end of the gridSetup.sh command line

HTH

Oracle Database 18c and version numbers

The Oracle New Release Model is very young, and thus suffers of some small inconsistencies in the release naming.
Oracle already announced that 18c was a renaming of what was intended to be 12.2.0.2 in the original roadmap.
I though that 19c would have been 12.2.0.3, but now I have some doubts when looking at the local inventory contents.

I am consistently using my functions lsoh and setoh, as described in my posts:

Getting the Oracle Homes in a server from the oraInventory

and:

Oracle Home Management – part 5: Oracle Home Inventory and Naming Conventions

What I do, basically, is to get the list of attached Oracle Homes from the Central Inventory, and then get some details (like version and edition) from the local inventory of each Oracle Home.

But now that Oracle 18.3 is out, my function shows release 18.0.0.0.0 when I try to get it in the previous way.

The fact is that prior to 18c, the component version was showing the actual version (without patches):

but now, it shows the “base release” 18.0.0.0.0, whereas the ACT_INST_VER property shows the “Active” version:

You can see that ACT_INST_VER is 12.2.0.4.0! does it indicate that 18.3 was planned to be 12.2.0.4?

like …

12.2.0.2 -> 18.1

12.2.0.3 -> 18.2

12.2.0.4 -> 18.3

?

This is in contrast with MOS Doc ID 230.1 that states that 18c was a “sort of” 12.2.0.2, so probably I get it wrong.

My first reflex has been to search, in the local inventory, where the string 18.3.0  was written down, but with my surprise, it is just a description, not a “real value”:

Again, the ACT_INST_VER property reports 12.2.0.4.0.

So, where can we extract the version we would expect (18.3.0.0.0)?

Oracle 18c provides a new binary oraversion that gives us this information:

Note that 18.3.0.0.0 differs from the description:

which is, as far as I understand, just a bad way to use the old notation to give the idea of the release date of such release update.

Also note that baseVersion has always the format <MAJOR>.0.0.0.0.

In the future I expect that ACT_INST_VER will be consistent with the compositeVersion, but I cannot be sure.

Ludo

 

 

Oracle Home Management – part 7: Putting all together

Last part of the blog series… let’s see how to put everything together and have a single script that creates and provisions Oracle Home golden images:

Review of the points

The scripts will:

  • let create a golden image based on the current Oracle Home
  • save the golden image metadata into a repository (an Oracle schema somewhere)
  • list the avilable golden images and display whether they are already deployed on the current host
  • let provision an image locally (pull, not push), either with the default name or a new name

Todo:

  • Run as root in order to run root.sh automatically (or let specify the sudo command or a root password)
  • Manage Grid Infrastructure homes

Assumptions

  • There is an available Oracle schema where the golden image metadata will be stored
  • There is an available NFS share that contains the working copies and golden images
  • Some variables must be set accordingly to the environment in the script
  • The function setoh is defined in the environment (it might be copied inside the script)
  • The Instant Client is installed and “setoh ic” correctly sets its environment. This is required because there might be no sqlplus binaries available at the very first deploy
  • Oracle Home name and path’s basename are equal for all the Oracle Homes

Repository table

First we need a metadata table. Let’s keep it as simple as possible:

Helpers

The script has some functions that check stuff inside the central inventory.

e.g.

checks if a specific Oracle Home (name) is present in the central inventory. It is helpful to check, for every golden image in the matadata repository, if it is already provisioned or not:

Variables

Some variables must be changed, but in general you might want to adapt the whole script to fit your needs.

Image creation

The image creation would be as easy as creating a zip file, but there are some files that we do not want to include in the golden image, therefore we need to create a staging directory (working copy) to clean up everything:

Home provisioning

Home provisioning requires, beside some checks, a runInstaller -clone command, eventually a relink, eventually a setasmgid, eventually some other tasks, but definitely  run root.sh. This last task is not automated yet in my deployment script.

 

Usage

Examples

List installed homes:

Create a golden image 12_1_0_2_BP170718 from the Oracle Home named OraDB12Home2 (tha latter having been installed manually without naming convention):

List the new golden image from the metadata repository:

Reinstalling the same home with the new naming convention:

Installing the same home in a new path for manual patching from 170718 to 180116:

New home situation:

Patch manually the home named  12_1_0_2_BP180116 with the January bundle patch:

Create the new golden image from the home patched with January bundle patch:

Full source code

Full source code of ohctl

I hope you find it useful! The cool thing is that once you have the golden images ready in the golden image repository, then the provisioning to all the servers is striaghtforward and requires just a couple of minutes, from nothing to a full working and patched Oracle Home.

Why applying the patch manually?

If you read everything carefully, I automated the golden image creation and provisioning, but the patching is still done manually.

The aim of this framework is not to patch all the Oracle Homes with the same patch, but to install the patch ONCE and then deploy the patched home everywhere. Because each patch has different conflicts, bugs, etc, it might be convenient to install it manually the first time and then forget it. At least this is my opinion 🙂

Of course, patch download, conflict detection, etc. can also be automated (and it is a good idea, if you have the time to implement it carefully and bullet-proof).

In the addendum blog post, I will show some scripts made by Hutchison Austria and why I find them really useful in this context.

Blog posts in this series:

Oracle Home Management – part 1: Patch soon, patch often vs. reality
Oracle Home Management – part 2: Common patching patterns
Oracle Home Management – part 3: Strengths and limitations of Rapid Home Provisioning
Oracle Home Management – part 4: Challenges and opportunities of the New Release Model
Oracle Home Management – part 5: Oracle Home Inventory and Naming Conventions
Oracle Home Management – part 6: Simple Golden Image blueprint
Oracle Home Management – part 7: Putting all together
Oracle Home Management – Addendum: Managing and controlling the patch level (berx’s work)

Oracle Home Management – part 6: Simple Golden Image Blueprint

As I explained in the previous blog posts, from a manageability perspective, you should not change the patch level of a deployed Oracle Home, but rather install and patch a new Oracle Home.

With the same principle, Oracle Homes deployed on different hosts should have an identical patch level for the same name. For example, an Oracle Home /u01/app/oracle/product/EE12_1_0_2_BP171018 should have the same patch level on all the servers.

To guarantee the same binaries and patch levels everywhere, the simple solution that I am shoing in this series is to store copies of the Oracle Homes somewhere and use them as golden images. (Another approach, really different and cool, is used by Ilmar Kerm: he explains it here https://ilmarkerm.eu/blog/2018/05/oracle-home-management-using-ansible/ )

For this, we will use a Golden Image store (that could be a NFS share mounted on the Oracle Database servers, or a remote host accessible with scp, or other) and a metadata store.

golden-image-storeWhen all the software is deployed from golden images, there is the guarantee that all the Homes are equal; therefore the information about patches and bugfixes might be centralized in one place (golden image metadata).

A typical Oracle Home lifecycle:

  • Install the software manually the first time
  • Create automatically a golden image from the Oracle Home
  • Deploy automatically the golden image on the other servers

When a new patch is needed:

  • Deploy automatically the golden image to a new Oracle Home
  • Patch manually (or automatically!) the new Oracle Home
  • Create automatically the new golden image with the new name
  • Deploy automatically the new golden image to the other servers

The script that automates this lifecycle does just two main actions:

  • Automates the creation of a new golden image
  • Deploys an existing image to an Oracle Home (either with a new path or the default one)
  • (optional: uninstall an existing Home)

Let’s make a graphical example of the previously described steps:

oh-mgmt-lifecycleHere, the script ohctl takes two actions: -c (creates a Golden Image) and -i (installs a Golden Image)).

The create action does the following steps:

  • Copies the content to a working directory
  • Cleans up logs, audits, etc.
  • Creates the zip file
  • Stores the zip file in a shared NFS repository
  • Inserts the metadata of the new golden image in a repository

The install action does the following steps:

  • Checks if the image is already deployed (plus other security checks)
  • Creates the new path based on the name of the image or the new name passed as argument
  • Unzips the content in the new Oracle Home
  • Runs the runInstaller –clone to attach the home in the central inventory and (optionally) set a new Home name
  • (optionally) Relinks the oracle binary with the RAC option
  • Run setasmgid if found
  • Other environment-specific tasks (e.g. dealing with TNS_ADMIN links)

By following this pattern, Oracle Home names and paths are clean and the same everywhere. This facilitates the deployment and the patching.

You can find the Oracle Home cloning steps in the Oracle Database documentation:

Cloning an Oracle Home

In the next blog post I will explain parts of the ohctl source code and give some examples of how I use it (and publish a link to the full source code 🙂 )

 

Oracle Home Management – part 5: Oracle Home Inventory and Naming Conventions

Having the capability of managing multiple Oracle Homes is fundamental for the following reasons:

  • Out-of-place patching: cloning and patching a new Oracle Home usually takes less downtime than stopping the DBs and patching in-place
  • Better control of downtime windows: if the databases are consolidated on a single server, having multiple Oracle Homes allows moving and patching one database at a time instead of stopping everything and doing a “big bang” patch.

Make sure that you have a good set of scripts that help you to switch correctly from one environment to the other one. Personally, I recommend TVD-BasEnv, as it is very powerful and supports OFA and non-OFA environments, but for this blog series I will show my personal approach.

Get your Home information from the Inventory!

I wrote a blog post sometimes ago that shows how to get the Oracle Homes from the Central Inventory (Using Bash, OK, not the right tool to query XML files, but you get the idea):

Getting the Oracle Homes in a server from the oraInventory

With the same approach, you can have a script to SET your environment:

It uses a different approach from the oraenv script privided by Oracle, where you set the environment based on the ORACLE_SID variable and getting the information from the oratab. My setoh function gets the Oracle Home name as input. Although you can convert it easily to set the environment for a specific ORACLE_SID, there are some reason why I like it:

  • You can set the environment for an Oracle Home that it is not associated to any database (yet)
  • You can set the environment for an upgrade to a new release without changing (yet) the oratab
  • It works for OMS, Grid and Agent homes as well…
  • Most important, it will let you specify correctly the environment when you need to use a  fresh install (for patching it as well)

So, this is how it works:

In the previous example, there are two Database homes that have been installed without a specific naming convention (OraDb11g_home1, OraDB12Home1) and two that follow a specific one (12_1_0_2_BP170718_RON, 12_1_0_2_BP180116_OCW).

Naming conventions play an important role

If you want to achieve an effective Oracle Home management, it is important that you have everywhere the same ORACLE_HOME paths, names and patch levels.

The Oracle Home path should not include only the release number:

If we have many Oracle Homes with the same release, how shall we call the other ones? There are several variables that might influence the naming convention:

Edition (EE, SE), RAC Option or other options, the patch type (formerly PSU, BP: now RU and RUR), eventual additional one-off patches.

Some ideas might be:

The new release model will facilitate a lot the definition of a naming convention as we will have names like:

Of course, the naming convention is not universal and can be adapted depending on the customer (e.g., if you have only Enterprise Editions you might omit this information).

Replacing dots with underscores?

You will see, at the end of the series, that I use Oracle Home paths with underscores instead of dots:

Why?

From a naming perspective, there is no need to have the Home that corresponds to the release number. Release, version and product information can be collected through the inventory.

What is really important is to have good naming conventions and good manageability. In my ideal world, the Oracle Home name inside the central inventory and the basename of the Oracle Home path are the same: this facilitates tremendously the scripting of the Oracle Home provisioning.

Sadly, the Oracle Home name cannot contain dots, it is a limitation of the Oracle Inventory, here’s why I replaced them with underscores.

In the next blog post, I will show how to plan a framework for automated Oracle Home provisioning.

 

Oracle Home Management – part 4: Challenges and Opportunities of the New Release Model

Starting with the upcoming next release (18c), the Oracle Database will be a yearly release. (18c, 19c, etc). New yearly releases will contain only new features ready to go, and eventually some new features for performance improvements (plus bug fixes and security fixes from the previous version.)

Quarterly, instead of Patch Set Updates (PSU) and Bundle Patches (BP), there will be the new Release Updates (RU). They will contain critical fixes, optimizer changes, minor functional enhancements, bug fixes, security fixes. The new Release Updates will be equivalent to what we have now with Bundle Patches.

The Release Updates will be released during the whole lifetime of the feature release, according to the roadmap (2 years or 5 years depending on whether the release is in Long Term Support (LTS) or not). There will be a Long Term Support release every few years. The first two will probably be Oracle 19c and Oracle 23c (I am deliberately supposing that the c will still be relevant 🙂 ).

Beside Release Updates, there will be the new Release Update Revisions (RUR), that according to what I have read until now, will be released “at least” quarterly. Release Update Revisions will contain only regression fixes for bugs introduced by RUs and new security fixes, very close to what we have now with Patch Set Updates.

Release Update Revisions will cover ONLY 6 months, after that it will be necessary to upgrade to a newer Release Update or to a newer major release. Oracle introduced this change to reduce the complexity of their release management.

This leads to a few important things:

  • There will be no more than two RURs for each RU (e.g. 18.2 will have only 18.2.1 and 18.2.2)
  • If applying a RUR, after 6 months at latest, the DBs must be patched to a greater level of RU.
  • Applying the second RUR of each RU (e.g. 18.2.2 -> 18.3.2 -> 18.4.2) is the most conservative approach whilst keeping up to date with the latest critical fixes.

On top of that, one-off patches will still exist. For more information,  please read the note Release Update Introduction and FAQ (Doc ID 2285040.1)

new-release-modelHow will the new release model impact the patching strategy?

It is clear that it will be complex to keep the same major upgrade frequency as today (I expect it to increase). There have been from 3 to 5 years between each major release so far, and switching to a yearly release is a big change.

But the numbering will be easier: 18.3.2 is much more readable/maintainable than 12.2.0.3.BP180719 and, despite it does not contain an explicit date, it keeps it easy to understand the “distance” with the latest release.

So we will have, on one side, the need to upgrade more frequently. But on the other side, the upgrades might be easier than how they are now. One thing is sure, however: we will deal with many more Oracle Homes with different patch levels.

The new release model will bring us a unique opportunity to reinvent our procedures and scripts for Oracle Home management, to achieve a standardized and automated way to solve common problems like:

  • Multiple Oracle Homes coexistence (environment, naming conventions)
  • Automated binaries setup (via golden images or other automatic provisioning)
  • Database patches
  • Database upgrades

In the next post, I will show my idea of how Oracle Homes could be managed (with either the current or the new release model), making their coexistence easier for the DBAs.

Bonus: calculating the distance between releases

For a given release YY.x.z, the distance from its first release is ( x + z -1 ) quarters.

E.g.18.3.2 will be ( 3 + 2 – 1 ) = 4 quarters after the initial release date.

Across versions, assuming that each yearly release will be released in the same quarter, the distance between versions YY1.x1.z1 and YY2.x2.z2  is:

( YY2 – YY1 ) * 4 + ( x2 + z2 ) – ( x1 + z1 ) quarters

E.g. : between 18.4.1 and 20.1.2 the distance will be:

( 20 – 18 ) * 4 + ( 1 + 2 ) – ( 4 + 1 ) = 6 quarters