DBA survival BLOG

DBA stuff and Oracle Data Guard

Oracle Home Management – part 6: Simple Golden Image Blueprint

Posted on May 28, 2018 by Ludovico

As I explained in the previous blog posts, from a manageability perspective, you should not change the patch level of a deployed Oracle Home, but rather install and patch a new Oracle Home.

With the same principle, Oracle Homes deployed on different hosts should have an identical patch level for the same name. For example, an Oracle Home /u01/app/oracle/product/EE12_1_0_2_BP171018 should have the same patch level on all the servers.

To guarantee the same binaries and patch levels everywhere, the simple solution that I am showing in this series is to store copies of the Oracle Homes somewhere and use them as golden images. (Another approach, really different and cool, is used by Ilmar Kerm: he explains it here https://ilmarkerm.eu/blog/2018/05/oracle-home-management-using-ansible/ )

For this, we will use a Golden Image store (that could be an NFS share mounted on the Oracle Database servers, or a remote host accessible with scp, or other) and a metadata store.

When all the software is deployed from golden images, there is the guarantee that all the Homes are equal; therefore the information about patches and bugfixes can be centralized in one place (the golden image metadata).

A typical Oracle Home lifecycle:

Install the software manually the first time
Create automatically a golden image from the Oracle Home
Deploy automatically the golden image on the other servers

When a new patch is needed:

Deploy automatically the golden image to a new Oracle Home
Patch manually (or automatically!) the new Oracle Home
Create automatically the new golden image with the new name
Deploy automatically the new golden image to the other servers

The script that automates this lifecycle does just two main actions:

Automates the creation of a new golden image
Deploys an existing image to an Oracle Home (either with a new path or the default one)
(optional: uninstall an existing Home)

Let’s make a graphical example of the previously described steps:

Here, the script ohctl takes two actions: -c (creates a Golden Image) and -i (installs a Golden Image).

The create action does the following steps:

Copies the content to a working directory
Cleans up logs, audits, etc.
Creates the zip file
Stores the zip file in a shared NFS repository
Inserts the metadata of the new golden image in a repository
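
The create steps above can be sketched in a few lines of Bash. This is only an illustration under my own assumptions, not the ohctl code: the function name create_golden_image, the cleanup patterns and the flat file images.meta (standing in for the metadata repository) are hypothetical, and I use tar instead of zip only to keep the sketch free of extra dependencies.

```shell
#!/bin/bash
# Hypothetical sketch of the "create" action; names and layout are assumptions.
create_golden_image() {
    local oh_path=$1      # Oracle Home to capture
    local image_name=$2   # name of the new golden image
    local repo_dir=$3     # golden image store (e.g. an NFS mount)
    local workdir
    workdir=$(mktemp -d) || return 1

    # 1. copy the content to a working directory
    cp -a "$oh_path/." "$workdir/" || return 1

    # 2. clean up logs, audit files, traces (the patterns are examples only)
    find "$workdir" -type f \( -name '*.log' -o -name '*.aud' -o -name '*.trc' \) -delete

    # 3. + 4. create the archive directly in the shared repository
    #         (tar.gz here; the steps above describe a zip file)
    tar -czf "$repo_dir/$image_name.tar.gz" -C "$workdir" . || return 1

    # 5. record the metadata (a flat file stands in for the metadata repository)
    printf '%s;%s;%s\n' "$image_name" "$oh_path" "$(date +%Y-%m-%d)" >> "$repo_dir/images.meta"

    rm -rf "$workdir"
}
```

A call would look like create_golden_image /u01/app/oracle/product/EE12_1_0_2_BP171018 EE12_1_0_2_BP171018 /mnt/golden_images, where the repository path is again just an example.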

The install action does the following steps:

Checks if the image is already deployed (plus other security checks)
Creates the new path based on the name of the image or the new name passed as argument
Unzips the content in the new Oracle Home
Runs runInstaller -clone to attach the home to the central inventory and (optionally) set a new Home name
(optionally) Relinks the oracle binary with the RAC option
Runs setasmgid if found
Other environment-specific tasks (e.g. dealing with TNS_ADMIN links)
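
As a rough illustration of the install steps above, here is a minimal Bash sketch. It is not the ohctl code: the function name install_golden_image, its arguments and the tar.gz image format are my assumptions, and the runInstaller command is only printed, not executed. The exact clone arguments (ORACLE_BASE, for example) should be taken from the cloning documentation for your release.

```shell
#!/bin/bash
# Hypothetical sketch of the "install" action; the layout below is an assumption.
install_golden_image() {
    local image_name=$1                    # golden image to deploy
    local repo_dir=$2                      # golden image store
    local product_dir=$3                   # e.g. /u01/app/oracle/product
    local home_name=${4:-$image_name}      # optional new name, defaults to the image name
    local new_oh="$product_dir/$home_name"

    # check that the image exists and that the target is not already deployed
    [ -f "$repo_dir/$image_name.tar.gz" ] || { echo "image $image_name not found" >&2; return 1; }
    [ ! -e "$new_oh" ] || { echo "$new_oh already exists" >&2; return 1; }

    # create the new path and extract the image into it
    mkdir -p "$new_oh" || return 1
    tar -xzf "$repo_dir/$image_name.tar.gz" -C "$new_oh" || return 1

    # attach the home to the central inventory: printed here, to be reviewed and run by hand
    echo "$new_oh/oui/bin/runInstaller -clone -silent ORACLE_HOME=$new_oh ORACLE_HOME_NAME=$home_name"
}
```

Because the Home name defaults to the image name, the basename of the path and the name in the inventory stay identical unless a new name is passed explicitly.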

By following this pattern, Oracle Home names and paths are clean and the same everywhere. This facilitates the deployment and the patching.

You can find the Oracle Home cloning steps in the Oracle Database documentation:

Cloning an Oracle Home

In the next blog post I will explain parts of the ohctl source code and give some examples of how I use it (and publish a link to the full source code 🙂 )

Oracle Home Management – part 5: Oracle Home Inventory and Naming Conventions

Posted on May 20, 2018 by Ludovico

Having the capability of managing multiple Oracle Homes is fundamental for the following reasons:

Out-of-place patching: cloning and patching a new Oracle Home usually takes less downtime than stopping the DBs and patching in-place
Better control of downtime windows: if the databases are consolidated on a single server, having multiple Oracle Homes allows moving and patching one database at a time instead of stopping everything and doing a “big bang” patch.

Make sure that you have a good set of scripts that help you to switch correctly from one environment to the other one. Personally, I recommend TVD-BasEnv, as it is very powerful and supports OFA and non-OFA environments, but for this blog series I will show my personal approach.

Get your Home information from the Inventory!

I wrote a blog post some time ago that shows how to get the Oracle Homes from the Central Inventory (using Bash; OK, not the right tool to query XML files, but you get the idea):

Getting the Oracle Homes in a server from the oraInventory
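
The idea can be sketched as follows. This is a simplified illustration and not the script from that post: the function name list_homes and its optional argument (the path to an inventory.xml file, handy for testing) are my assumptions. It relies only on the fact that each Home is described in the central inventory by a line starting with <HOME NAME=.

```shell
#!/bin/bash
# Minimal sketch: list Oracle Home names and locations from a central inventory file.
# The optional argument (path to inventory.xml) is an assumption added for testing.
list_homes() {
    local inv_file=$1
    if [ -z "$inv_file" ]; then
        local inv_loc
        inv_loc=$(grep '^inventory_loc' /etc/oraInst.loc | awk -F= '{print $2}')
        inv_file="$inv_loc/ContentsXML/inventory.xml"
    fi
    # each home is a line like: <HOME NAME="..." LOC="..." TYPE="O" IDX="1"/>
    grep '<HOME NAME=' "$inv_file" 2>/dev/null | while IFS= read -r line; do
        name=$(echo "$line" | tr ' ' '\n' | grep '^NAME=' | awk -F'"' '{print $2}')
        loc=$(echo "$line" | tr ' ' '\n' | grep '^LOC=' | awk -F'"' '{print $2}')
        printf '%-27s %s\n' "$name" "$loc"
    done
}
```

Like the original approach, it splits each line on spaces, so it would not cope with Oracle Home paths that contain blanks.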

With the same approach, you can have a script to SET your environment:

setoh ()
{
    SEARCH=${1:-"_foo_"};
    if [ $SEARCH == "ic" ]; then
		# ic is a shortcut for the Instant Client...
        OH=/u01/app/oracle/sbin/instantclient_12_2
        export VERSION=12.2.0.1
        export ORACLE_HOME=$OH
        export LD_LIBRARY_PATH=$ORACLE_HOME
        export OH_NAME=instantclient_12_2
        export ORACLE_VERSION=$VERSION
        export PATH=$ORACLE_HOME:$DEFAULT_PATH
        echo ORACLE_SID = $ORACLE_SID
        echo ORACLE_VERSION = $ORACLE_VERSION
        echo ORACLE_HOME = $ORACLE_HOME
    else
        CENTRAL_ORAINV=`grep ^inventory_loc /etc/oraInst.loc | awk -F= '{print $2}'`;
        local IFS=$'\n';  # split the grep output on newlines only; local, so the caller's IFS is not changed
        found=0;
        for line in `grep "<HOME NAME=" ${CENTRAL_ORAINV}/ContentsXML/inventory.xml 2>/dev/null`;
        do
            if [ $found -eq 1 ]; then
                continue;
            fi;
            unset ORACLE_VERSION;
            unset ORAEDITION;
            OH=`echo $line | tr ' ' '\n' | grep ^LOC= | awk -F\" '{print $2}'`;
            OH_NAME=`echo $line | tr ' ' '\n' | grep ^NAME= | awk -F\" '{print $2}'`;
            if [ "$SEARCH" == "$OH_NAME" ]; then
                found=1;
                comp_file=$OH/inventory/ContentsXML/comps.xml;
                comp_xml=`grep "COMP NAME" $comp_file | head -1`;
                comp_name=`echo $comp_xml | tr ' ' '\n' | grep ^NAME= | awk -F\" '{print $2}'`;
                comp_vers=`echo $comp_xml | tr ' ' '\n' | grep ^VER= | awk -F\" '{print $2}'`;
                case $comp_name in
                    "oracle.crs")
                        ORACLE_VERSION=$comp_vers;
                        ORAEDITION=GRID
                    ;;
                    "oracle.sysman.top.agent")
                        ORACLE_VERSION=$comp_vers;
                        ORAEDITION=AGT
                    ;;
                    "oracle.server")
                        ORACLE_VERSION=`grep "PATCH NAME=\"oracle.server\"" $comp_file 2>/dev/null | tr ' ' '\n' | grep ^VER= | awk -F\" '{print $2}'`;
                        ORAEDITION="DBMS";
                        if [ -z "$ORACLE_VERSION" ]; then
                            ORACLE_VERSION=$comp_vers;
                        fi;
                        ORAMAJOR=`echo $ORACLE_VERSION |  cut -d . -f 1`;
                        case $ORAMAJOR in
                            11 | 12)
                                ORAEDITION="DBMS "`grep "oracle_install_db_InstallType" $OH/inventory/globalvariables/oracle.server/globalvariables.xml 2>/dev/null | tr ' ' '\n' | grep VALUE | awk -F\" '{print $2}'`
                            ;;
                            10)
                                ORAEDITION="DBMS "`grep "s_serverInstallType" $OH/inventory/Components21/oracle.server/*/context.xml 2>/dev/null | tr ' ' '\n' | grep VALUE | awk -F\" '{print $2}'`
                            ;;
                        esac
                    ;;
                esac;
                export VERSION=$ORACLE_VERSION;
                export ORACLE_HOME=$OH;
                export LD_LIBRARY_PATH=$ORACLE_HOME/lib;
                export OH_NAME;
                export ORACLE_VERSION;
                export PATH=$ORACLE_HOME/bin:$ORACLE_HOME/OPatch:$DEFAULT_PATH;
                echo ORACLE_SID = $ORACLE_SID;
                echo ORACLE_VERSION = $ORACLE_VERSION;
                echo ORACLE_HOME = $ORACLE_HOME;
                continue;
            fi;
        done;
        if [ $found -eq 0 ]; then
            echo "cannot find Oracle Home $1";
            false;
        else
            true;
        fi;
    fi
}

It uses a different approach from the oraenv script provided by Oracle, which sets the environment based on the ORACLE_SID variable and gets the information from the oratab. My setoh function gets the Oracle Home name as input. Although you can convert it easily to set the environment for a specific ORACLE_SID, there are some reasons why I like it:

You can set the environment for an Oracle Home that is not associated with any database (yet)
You can set the environment for an upgrade to a new release without changing (yet) the oratab
It works for OMS, Grid and Agent homes as well…
Most important, it will let you specify correctly the environment when you need to use a fresh install (for patching it as well)

So, this is how it works:

# [ oracle@myserver:/u01/app/oracle [11:23:18] [12.1.0.2.0 SID="not set"] 0 ] #
# lsoh

HOME                        LOCATION                                                VERSION      EDITION
--------------------------- ------------------------------------------------------- ------------ ---------
OraGI12Home1                /u01/app/grid/product/grid                              12.1.0.2.0   GRID
agent12c1                   /u01/app/oracle/product/agent12c/core/12.1.0.5.0        12.1.0.5.0   AGT
OraDb11g_home1              /u01/app/oracle/product/11.2.0.4                        11.2.0.4.0   DBMS EE
OraDB12Home1                /u01/app/oracle/product/12.1.0.2                        12.1.0.2.0   DBMS EE
12_1_0_2_BP170718_RON       /u01/app/oracle/product/12_1_0_2_BP170718_RON           12.1.0.2.0   DBMS EE
12_1_0_2_BP180116_OCW       /u01/app/oracle/product/12_1_0_2_BP180116_OCW           12.1.0.2.0   DBMS EE

# [ oracle@myserver:/u01/app/oracle [11:23:22] [12.1.0.2.0 SID="not set"] 0 ] #
# setoh 12_1_0_2_BP180116_OCW
ORACLE_SID =
ORACLE_VERSION = 12.1.0.2.0
ORACLE_HOME = /u01/app/oracle/product/12_1_0_2_BP180116_OCW

# [ oracle@myserver:/u01/app/oracle [11:23:25] [12.1.0.2.0 SID="not set"] 0 ] #
# opatch lspatches
26925218;OCW Patch Set Update : 12.1.0.2.180116 (26925218)
26925263;Database Bundle Patch : 12.1.0.2.180116 (26925263)
22243983;

OPatch succeeded.

In the previous example, there are two Database homes that have been installed without a specific naming convention (OraDb11g_home1, OraDB12Home1) and two that follow a specific one (12_1_0_2_BP170718_RON, 12_1_0_2_BP180116_OCW).

Naming conventions play an important role

If you want to achieve effective Oracle Home management, it is important to have the same ORACLE_HOME paths, names and patch levels everywhere.

The Oracle Home path should not include only the release number:

/u01/app/oracle/product/12.1.0.2

If we have many Oracle Homes with the same release, what shall we call the other ones? There are several variables that might influence the naming convention:

Edition (EE, SE), RAC Option or other options, the patch type (formerly PSU, BP: now RU and RUR), any additional one-off patches.

Some ideas might be:

/u01/app/oracle/product/EE12.1.0.2
/u01/app/oracle/product/EE12.1.0.2_BP171019
/u01/app/oracle/product/EE12.1.0.2_BP171019_v2

The new release model will make it much easier to define a naming convention, as we will have names like:

/u01/app/oracle/product/EE18.1.0
/u01/app/oracle/product/EE18.2.1
/u01/app/oracle/product/EE18.2.1_v2

Of course, the naming convention is not universal and can be adapted depending on the customer (e.g., if you have only Enterprise Editions you might omit this information).

Replacing dots with underscores?

You will see, at the end of the series, that I use Oracle Home paths with underscores instead of dots:

/u01/app/oracle/product/EE12_1_0_2
/u01/app/oracle/product/EE12_1_0_2_BP171019
/u01/app/oracle/product/EE12_1_0_2_BP171019_v2

Why?

From a naming perspective, there is no need for the Home path to correspond exactly to the release number. Release, version and product information can be collected through the inventory.

What is really important is to have good naming conventions and good manageability. In my ideal world, the Oracle Home name inside the central inventory and the basename of the Oracle Home path are the same: this facilitates tremendously the scripting of the Oracle Home provisioning.

Sadly, the Oracle Home name cannot contain dots: it is a limitation of the Oracle Inventory. That is why I replaced them with underscores.
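
A small helper makes the convention easy to apply consistently. This is just a sketch: the function oh_name and its three arguments (edition, release, optional patch label) are hypothetical and follow the example names used in this post.

```shell
#!/bin/bash
# Hypothetical helper: build a home name from edition, release and patch label,
# replacing dots with underscores so that it is valid as an inventory home name.
oh_name() {
    local edition=$1 release=$2 patch=$3
    local name="${edition}${release}"
    if [ -n "$patch" ]; then
        name="${name}_${patch}"
    fi
    echo "${name//./_}"
}

oh_name EE 12.1.0.2 BP171019      # prints EE12_1_0_2_BP171019
```

The same string can then be used both as the Home name in the inventory and as the basename of the path, e.g. /u01/app/oracle/product/$(oh_name EE 12.1.0.2 BP171019).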

In the next blog post, I will show how to plan a framework for automated Oracle Home provisioning.

Oracle Home Management – part 4: Challenges and Opportunities of the New Release Model

Posted on May 14, 2018 by Ludovico

Starting with the next release (18c), the Oracle Database will be released yearly (18c, 19c, etc.). New yearly releases will contain only new features that are ready to go, and possibly some new features for performance improvements (plus bug fixes and security fixes from the previous version).

Quarterly, instead of Patch Set Updates (PSU) and Bundle Patches (BP), there will be the new Release Updates (RU). They will contain critical fixes, optimizer changes, minor functional enhancements, bug fixes, security fixes. The new Release Updates will be equivalent to what we have now with Bundle Patches.

The Release Updates will be released during the whole lifetime of the feature release, according to the roadmap (2 years or 5 years depending on whether the release is in Long Term Support (LTS) or not). There will be a Long Term Support release every few years. The first two will probably be Oracle 19c and Oracle 23c (I am deliberately supposing that the c will still be relevant 🙂 ).

Besides Release Updates, there will be the new Release Update Revisions (RUR) that, according to what I have read until now, will be released “at least” quarterly. Release Update Revisions will contain only regression fixes for bugs introduced by RUs and new security fixes, very close to what we have now with Patch Set Updates.

Release Update Revisions will cover ONLY 6 months, after that it will be necessary to upgrade to a newer Release Update or to a newer major release. Oracle introduced this change to reduce the complexity of their release management.

This leads to a few important things:

There will be no more than two RURs for each RU (e.g. 18.2 will have only 18.2.1 and 18.2.2)
If you apply a RUR, after 6 months at the latest the DBs must be patched to a newer RU.
Applying the second RUR of each RU (e.g. 18.2.2 -> 18.3.2 -> 18.4.2) is the most conservative approach whilst keeping up to date with the latest critical fixes.

On top of that, one-off patches will still exist. For more information, please read the note Release Update Introduction and FAQ (Doc ID 2285040.1)

How will the new release model impact the patching strategy?

It is clear that it will be complex to keep the same major upgrade frequency as today (I expect it to increase). There have been from 3 to 5 years between each major release so far, and switching to a yearly release is a big change.

But the numbering will be easier: 18.3.2 is much more readable/maintainable than 12.2.0.3.BP180719 and, although it does not contain an explicit date, it makes it easy to understand the “distance” from the latest release.

So we will have, on one side, the need to upgrade more frequently. But on the other side, the upgrades might be easier than how they are now. One thing is sure, however: we will deal with many more Oracle Homes with different patch levels.

The new release model will bring us a unique opportunity to reinvent our procedures and scripts for Oracle Home management, to achieve a standardized and automated way to solve common problems like:

Multiple Oracle Homes coexistence (environment, naming conventions)
Automated binaries setup (via golden images or other automatic provisioning)
Database patches
Database upgrades

In the next post, I will show my idea of how Oracle Homes could be managed (with either the current or the new release model), making their coexistence easier for the DBAs.

Bonus: calculating the distance between releases

For a given release YY.x.z, the distance from its first release is ( x + z - 1 ) quarters.

E.g. 18.3.2 will be ( 3 + 2 - 1 ) = 4 quarters after the initial release date.

Across versions, assuming that each yearly release will be released in the same quarter, the distance between versions YY1.x1.z1 and YY2.x2.z2 is:

( YY2 - YY1 ) * 4 + ( x2 + z2 ) - ( x1 + z1 ) quarters

E.g. between 18.4.1 and 20.1.2 the distance will be:

( 20 - 18 ) * 4 + ( 1 + 2 ) - ( 4 + 1 ) = 6 quarters
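
The same arithmetic fits in a tiny Bash function. It is only a sketch: the name release_distance is made up, and it assumes releases written as YY.x.z and each yearly release shipping in the same quarter.

```shell
#!/bin/bash
# Quarters between two releases written as YY.x.z (e.g. 18.4.1 and 20.1.2),
# assuming that each yearly release ships in the same quarter.
release_distance() {
    local yy1 x1 z1 yy2 x2 z2
    IFS=. read -r yy1 x1 z1 <<< "$1"
    IFS=. read -r yy2 x2 z2 <<< "$2"
    echo $(( (yy2 - yy1) * 4 + (x2 + z2) - (x1 + z1) ))
}

release_distance 18.4.1 20.1.2   # prints 6
release_distance 18.1.0 18.3.2   # prints 4
```

The second call reproduces the single-release formula: counting from the first release 18.1.0 gives ( x + z - 1 ) quarters.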

Oracle Home Management – part 3: Strengths and limitations of Rapid Home Provisioning

Posted on May 11, 2018 by Ludovico

In the previous post I mentioned that having a central repository storing the Golden Images would be the best solution for the Oracle Home provisioning.

In this context, Oracle provides Rapid Home Provisioning: a product included in Oracle Grid Infrastructure that automates home provisioning and patching of Oracle Database and Grid Infrastructure Homes, databases and also generic software.

Oracle Rapid Home Provisioning simplifies tremendously the software provisioning: you can use it to create golden images starting from existing installations and then deploy them locally, across different nodes, on local or remote clusters, standalone servers, etc.

Having a central store with enforced naming conventions ensures software standardization across the whole Oracle farm, and makes patching easier with fewer risks. Also, it allows patching existing databases by moving them to Oracle Homes with a higher patch level, taking care of service draining and rolling upgrades when RAC or RAC One Node deployments exist. Multiple databases can be patched in a single batch using one single rhpctl command.

I will not explain the technical details of how Rapid Home Provisioning is implemented and operated. I already did a webinar a couple of years ago for the RAC SIG:

Burt Clouse, the RHP product manager, did a presentation as well about Rapid Home Provisioning 12c Release 2, that highlights some new features that the product was missing in the first release:

More details about the new features can be found here:

https://blogs.oracle.com/db_maintenance/whats-new-in-122-for-rapid-home-provisioning-and-maintenance

Close to be the perfect product, but…

If rapid home provisioning is so powerful, what makes it less appealing for most users?

In my opinion (read: very own personal opinion 🙂 ), there are two main factors:

First: The technology stack RHP is relying on is quite complex

Although Rapid Home Provisioning 12c Release 2 allows Oracle Home deployments on standalone servers (it was not the case with 12c Release 1), the Rapid Home Provisioning server itself relies on Oracle Grid Infrastructure 12cR2. That means that there must be skills in the company to manage the full stack: Clusterware, ASM, ACFS, NFS, GNS, SCAN, etc. as well as the RHP Server itself.

Second: remote provisioning requires Lifecycle Management Pack (extra-cost) option licensed on all the RHP targets

If Oracle Homes are deployed on the same cluster that hosts the RHP Server, the product can be used at no extra cost. But if you have many clusters, or use standalone servers for your Oracle databases, then RHP can become pricey very quickly: the price per processor for Lifecycle Management Pack is 12’000$, plus support (pricelist April 2018). So, buying this management pack just to introduce Rapid Home Provisioning in your company might be an excessive investment.

Of course, depending on your needs, you can evaluate it, leverage its full potential and get a bigger return on investment.

Or, you might explore if it is viable to configure each cluster as Rapid Home Provisioning Server: in this case it would be free, but it will have the additional complexity layer on all your clusters.

For small companies, simple architectures and especially where Standard Edition is deployed (no Management Pack for Standard Edition!), a self-made, simpler solution might be a better choice.

In the next post, before going into the details of a hypothetical self-made implementation, I will introduce my thoughts about the New Oracle Database Release Model.

Oracle Home Management – part 2: Common patching patterns

Posted on May 3, 2018 by Ludovico

(*) Multiple times in this blog post I refer to a problem with new Oracle Home installs and rollback scripts. The problem has been fixed with PSU Jan 2017, I did not notice it before, sorry. Thanks to Martin Berger for the information

Let’s see some common approaches to Oracle Home patching.

First, how patches are applied

No, I will not talk about how to use opatch 🙂 It is an overview of the “high-level” methods… when you have multiple servers and (possibly) multiple databases per server.

Worst approach (big bang)

1. Stop everything

2. In-place binaries patching

3. Database patching, “big bang” mode

4. Start everything

With this approach, you have a big downtime, a maintenance window hard to get (all applications are down at the same time), no control over a single database and no easy rollback in case your binaries get compromised/corrupted by the patch apply.

Another bad approach (new install and out-of-place patching)

1. Re-install binaries manually in a new path

2. Patch the new binaries

3. Stop, change OH, patch databases one by one

4. Decommission old binaries

This approach is much better than the previous one, but still has some pitfalls:

If you have many servers and environments: doing it frequently might be a challenge
Rollback scripts are not copied automatically: the datapatch will fail unless you copy them by hand (*)
New installs introduce potential human error, unless you use unattended install with your own scripts
Do you like to run opatch apply all the time, after all?

Better approach (software cloning)

This approach is very close to the previous one, with the exception that the new Oracle Home is not installed from scratch, but rather cloned from an existing one. This way, the rollback scripts used by the datapatch binary will be there and there will be no errors when patching the databases. (*)

The procedure for Oracle Home cloning is described in the Oracle Documentation, here.

Another cool thing is that you can clone Oracle Homes across different nodes, so that you might have the same patch level everywhere without repeating the tedious tasks of upgrading the opatch, patching the binaries, etc. etc.

But still, you have to identify which Oracle Home you have to clone and keep track of the latest version.

Best approach (Golden Images)

The best approach would consist in having a central repository for your software, where you store every version of your Oracle Homes, one for each patch level.

Having a central repository allows you to install the software ONCE and use a “clone, patch and store it” strategy. You can, for example, use only one server to do all the patching and then distribute your software images to the different database servers.

This is the concept of Golden Images used by Rapid Home Provisioning that will be in the scope of my next blog post.

Second, which patches are applied

Now that we have seen some Oracle Home patching approaches, it is worth knowing which patches are important in a patching strategy.

It is better that you get used to the differences between PSU/BP and RU/RUR, by reading this valuable post from Mike Dietrich:

https://mikedietrichde.com/2017/10/24/differences-psu-bp-ru-rur/

I will make the assumption that in every case, the critical patches should be applied quarterly, or at least once per year, in order to fix security bugs.

The conservative approach (stability and performance over improvements)

Prior to 12.2, in order to guarantee security and stability, the best approach was to apply only PSUs each quarter.

From 12.2, the most conservative approach is to apply the latest Release Update Revision on top of the oldest possible Release Update. Confusing? Things will be clearer when I write about the 18c New Release Model in a few days…

The cowboy approach (improvements over stability and performance)

Sometimes Bundle Patches and Release Updates contain cool backports from the new releases; sometimes they contain just more bug fixes than the PSUs and RURs; sometimes they fix important stuff like disabling bad transformations that lead to wrong result bugs or other annoying bugs.

Personally, I prefer to include such improvements in my patching strategy: I regularly apply RU for releases >=12.2 and BP for releases <=12.1. Don’t call me cowboy, however 🙂

The incumbent approach (or why you cannot avoid one-offs)

No matter how often you patch: sometimes you hit a bug, and the only solution is either to apply the one-off patch or the workaround, if available.

If you apply the one-off patch for a specific bug, from an Oracle Home maintenance point of view, it would be better to

apply the same one-off everywhere (read, all your Oracle Homes with the very same release), this makes your environment homogeneous.

use a clone of the Oracle Home with the one-off as basis to apply the release update and distribute it to the other servers.

Why?

Again, it is a problem with rollback scripts (*), with patch conflicts and also with the number of versions to maintain: fewer paths, less error-prone!

There is, however, the alternative to one-offs: implementing the workaround instead of applying the patch. Most of the time the workaround consists in disabling “something” through parameters, or worse, hidden parameters (the underscore parameters that the support says you should not set, but advises to set all the time as a workaround :-))

It might be a good idea to use the workaround instead of applying the patch if you already know that the bug will be fixed in the next Release Update (for example), or if the workaround is so easy to implement that it is not worth creating another version of the Oracle Home that will require special attention at the next quarter.

If you apply workarounds, anyway, be sure that you comment EXACTLY why, when and who, so you can decide to unset it at the next parameter review or maintenance… e.g.

alter system set "_px_groupby_pushdown"=off
  comment='Ludo, 03.05.16: W/A for bug 18499088' scope=both sid='*';

alter system set "_fix_control"='14033181:0','11843466:off','26664361:7','16732417:1','20243268:1' 
  comment='Ludo, 20.11.17: fixes of BP171017 + W/A bugs 21303294 24499054' scope=spfile sid='*';

Makes sense?

Oracle Home Management – part 1: “Patch soon, patch often” vs. reality

Posted on May 1, 2018 by Ludovico

With this post, I am starting a new blog series about Oracle Database home management, provisioning, patching… Best (and worst) practices, common practices and blueprints from my point of view as consultant and, sometimes, as operational DBA.

I hope to find the time to continue (and finish) it 🙂

How often should you upgrade/patch?

Database patching and upgrading is not an easy task, but it is really important.

Many companies do not have a clear patching strategy, for several reasons.

Patching is time consuming
It is complex
It introduces some risks
It is not always really necessary
It leads to human errors

Oracle, of course, recommends applying the patches quarterly, as soon as they are released. But the reality is that it is (still) very common to find customers that do not apply patches regularly.

Look at this:

$ opatch lspatches
26925218;OCW Patch Set Update : 12.1.0.2.180116 (26925218)
26925263;Database Bundle Patch : 12.1.0.2.180116 (26925263)
22243983;

OPatch succeeded.

$ cd $ORACLE_HOME/inventory
$ grep -r "bug description" * |  wc -l
1883
$ grep -r "bug description" * | grep -i "wrong result" | wc -l
56


With the January 2018 Bundle Patch, you can fix 1883 bugs, including 56 “wrong results” bugs! I hope to talk more about this kind of bug, but for now consider that if you are not patching often, you are taking serious risks, including putting your data consistency at risk.
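
The two grep pipelines above can be wrapped in a small function to repeat the count for other keywords. This is a sketch under assumptions: the function name count_bug_descriptions is made up here, and it relies only on the inventory files containing the literal string "bug description", as the commands above do.

```shell
#!/bin/sh
# count_bug_descriptions DIR [KEYWORD]
# Counts the lines containing "bug description" under DIR; with KEYWORD,
# counts only those that also match KEYWORD (case-insensitive).
count_bug_descriptions() {
    dir=$1
    keyword=${2:-}
    if [ -n "$keyword" ]; then
        grep -r "bug description" "$dir" | grep -ci -- "$keyword"
    else
        grep -r "bug description" "$dir" | grep -c ''
    fi
}

# Example (needs a real Oracle Home):
#   count_bug_descriptions "$ORACLE_HOME/inventory"
#   count_bug_descriptions "$ORACLE_HOME/inventory" "wrong result"
```

Unlike wc -l, grep -c prints the bare number without padding, which makes the result easier to reuse in a script.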

I will not talk about bugs, upgrade procedures, new releases here… For this, I recommend following Mike Dietrich’s blog: Upgrade your Database – NOW!

I would rather talk, as the title of this blog series states, about the approaches to maintaining the Oracle Homes across your Oracle server farm.

Common worst practices in maintaining homes

Maintaining a plethora of Oracle Homes across different servers requires thoughtful planning. This is a non-exhaustive list of bad practices that I see from time to time.

Installing by hand every new Oracle Home
Applying different patch levels on Oracle Homes with the same path
Not tracking the installed patches
Having Oracle Home paths hard-coded in the operational scripts
Not caring about the Oracle Home path naming convention
Not caring about Oracle Home internal names
Copying Oracle Homes without caring about the Central Inventory

All these worst practices lead to what I like to call “patching madness”… that monster that makes regular patching very difficult / impossible.
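
As an illustration of how to avoid hard-coded Oracle Home paths in operational scripts, the path can be looked up from the oratab file at run time. This is a sketch, assuming the usual SID:ORACLE_HOME:Y|N oratab layout and the Linux default location /etc/oratab; the function name home_for_sid is hypothetical.

```shell
#!/bin/sh
# home_for_sid SID [ORATAB]
# Prints the Oracle Home registered for SID; returns non-zero if SID is unknown.
home_for_sid() {
    sid=$1
    oratab=${2:-/etc/oratab}     # assumption: Linux default location
    awk -F: -v sid="$sid" '
        /^[[:space:]]*#/ { next }                    # skip comment lines
        $1 == sid { print $2; found = 1; exit }      # first matching entry wins
        END { exit !found }                          # status 1 when nothing matched
    ' "$oratab"
}

# Example: ORACLE_HOME=$(home_for_sid ORCL) || exit 1
```

When a database moves to a new Oracle Home, only the oratab entry changes and the scripts stay untouched.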

THIS IS A SITUATION THAT YOU NEED TO AVOID:

Server A
/u01/app/oracle/product/12.1.0            -> Home "OraHOme12C", contains clean 12.1.0.2

Server B
/u01/app/oracle/product/12.1.0.2          -> Home "OraHome1",   contains 12.1.0.2.PSU161018
/u01/app/oracle/product/12.1.0.2.BP170117 -> Home "OraHome2",   contains 12.1.0.2.BP170117

Server C
/u01/app/oracle/product/12.1.0            -> Home "OraHome1",   contains clean 12.1.0.1
/u01/app/oracle/product/12.1.0.2          -> Home "DBHome_1",   contains 12.1.0.2.BP170117


A better approach would be to start with some naming conventions, e.g.:

Server A
/u01/app/oracle/product/12.1.0.2           -> Home "Ora12cR2",           contains clean 12.1.0.2

Server B
/u01/app/oracle/product/12.1.0.2.PSU161018 -> Home "Ora12cR2_PSU161018", contains 12.1.0.2.PSU161018
/u01/app/oracle/product/12.1.0.2.BP170117  -> Home "Ora12cR2_BP170117",  contains 12.1.0.2.BP170117

Server C
/u01/app/oracle/product/12.1.0.1           -> Home "Ora12cR1",           contains clean 12.1.0.1
/u01/app/oracle/product/12.1.0.2.BP170117  -> Home "Ora12cR2_BP170117",  contains 12.1.0.2.BP170117
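
A convention like this is easy to enforce if the path and the home name are always derived by the same code. The following is a minimal sketch: the function names and the PRODUCT_DIR variable are assumptions, and the release label (e.g. Ora12cR2) is passed in rather than computed.

```shell
#!/bin/sh
# Hypothetical helpers that build Oracle Home paths and names from the
# version and the (optional) patch level, following the convention above.
PRODUCT_DIR=/u01/app/oracle/product

# oracle_home_path VERSION [PATCH_LEVEL]
oracle_home_path() {
    version=$1
    patch=${2:-}
    printf '%s/%s%s\n' "$PRODUCT_DIR" "$version" "${patch:+.$patch}"
}

# oracle_home_name LABEL [PATCH_LEVEL]
oracle_home_name() {
    label=$1
    patch=${2:-}
    printf '%s%s\n' "$label" "${patch:+_$patch}"
}

oracle_home_path 12.1.0.2 BP170117
oracle_home_name Ora12cR2 BP170117
```

With no patch level, both functions return the name of the clean home (for example /u01/app/oracle/product/12.1.0.1 and Ora12cR1).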


In the next blog post, I will talk about common patching patterns and their pitfalls.

DBMS_AUDIT_MGMT.CLEAN_AUDIT_TRAIL not working on 12c? Here’s why…

Posted on April 27, 2018 by Ludovico

It is bad to realize, after a few years, that my customer’s Audit Cleanup procedures are not working properly for every database…

NOTE: The post is based on standard audit, not unified audit.

My customer developed a quite nice procedure for database housekeeping (including diag dest, OS audit trail, recyclebin, DB audit…)

But after some performance problems, I have come across the infamous sql_id 4ztz048yfq32s:

SELECT TO_CHAR(current_timestamp AT TIME ZONE 'GMT', 'YYYY-MM-DD HH24:MI:SS TZD') AS curr_timestamp, COUNT(username) AS failed_count, TO_CHAR(MIN(timestamp), 'yyyy-mm-dd hh24:mi:ss') AS first_occur_time, TO_CHAR(MAX(timestamp), 'yyyy-mm-dd hh24:mi:ss') AS last_occur_time
FROM sys.dba_audit_session
WHERE returncode != 0 AND timestamp >= current_timestamp - TO_DSINTERVAL('0 0:30:00')


This SQL comes from the “Failed Logon Attempts” metric in Enterprise Manager.

I’ve checked the specific database, and the table SYS.AUD$ contained way too many rows, dating from before our purge time:

SQL> select min(timestamp) from dba_audit_session;

MIN(TIMESTAMP)
-------------------
04.02.2017 07:01:20

SQL>  select dbid, count(*) from aud$ group by dbid;

      DBID   COUNT(*)
---------- ----------
2416611527   35846477


The cleanup procedure does basically this:

SQL> begin
  2  dbms_audit_mgmt.set_last_archive_timestamp(audit_trail_type  => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD
  3                          ,last_archive_time => SYSTIMESTAMP-31);
  4  end;
  5  /

PL/SQL procedure successfully completed.

SQL> set timing on
SQL> begin
  2  dbms_audit_mgmt.clean_audit_trail(
  3    audit_trail_type => sys.dbms_audit_mgmt.AUDIT_TRAIL_AUD_STD,
  4    use_last_arch_timestamp => TRUE);
  5  end;
  6  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:38.34


But despite a retention window of 31 days, the rows are still there:

SQL> select min(timestamp) from dba_audit_session;

MIN(TIMESTAMP)
-------------------
04.02.2017 07:01:20

Elapsed: 00:00:29.06


(today is 27.04.2018, so the oldest records are more than 1 year old)

I’ve checked with ASH, the actual delete statement executed by the clean_audit_trail procedure is:

DELETE FROM SYS.AUD$ WHERE DBID = 2416611527 AND NTIMESTAMP# < to_timestamp('2017-02-04 05:01:10', 'YYYY-MM-DD HH24:MI:SS.FF') AND ROWNUM <= 140724603463440


So, the DBID clause is OK, but the NTIMESTAMP# clause is not!

Why?

Long story long (hint, it’s a bug: 19958239)
(Update 30.05.2018: the solution is explained in Doc 2068066.1, thanks John)

The cleanup metadata is stored into the view DBA_AUDIT_MGMT_LAST_ARCH_TS. Its structure in 11g was:

SQL> desc dba_audit_mgmt_last_arch_ts
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 AUDIT_TRAIL                                        VARCHAR2(20)
 RAC_INSTANCE                              NOT NULL NUMBER
 LAST_ARCHIVE_TS                                    TIMESTAMP(6) WITH TIME ZONE


But in 12c, there are 2 new columns:

SQL> desc dba_audit_mgmt_last_arch_ts
 Name                                  Null?    Type
 ------------------------------------- -------- ----------------------------
 AUDIT_TRAIL                                    VARCHAR2(20)
 RAC_INSTANCE                          NOT NULL NUMBER
 LAST_ARCHIVE_TS                                TIMESTAMP(6) WITH TIME ZONE
 DATABASE_ID                           NOT NULL NUMBER
 CONTAINER_GUID                        NOT NULL VARCHAR2(33)


When the database is upgraded from 11g to 12c, the two new columns are set to “0” by default.

SQL> select * from dba_audit_mgmt_last_arch_ts;

AUDIT_TRAIL                 RAC_INSTANCE LAST_ARCHIVE_TS                      DATABASE_ID CONTAINER_GUID
--------------------------- ------------ ------------------------------------ ----------- --------------------------------
STANDARD AUDIT TRAIL                   0 04-FEB-17 05.01.10.000000 AM +00:00            0 00000000000000000000000000000000
OS AUDIT TRAIL                         1 04-FEB-17 05.01.15.000000 AM +02:00            0 00000000000000000000000000000000


But when the procedure DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP is executed, the actual dbid is used, and new lines appear:

SQL> select * from dba_audit_mgmt_last_arch_ts;

AUDIT_TRAIL                 RAC_INSTANCE LAST_ARCHIVE_TS                      DATABASE_ID CONTAINER_GUID
--------------------------- ------------ ------------------------------------ ----------- --------------------------------
STANDARD AUDIT TRAIL                   0 04-FEB-17 05.01.10.000000 AM +00:00            0 00000000000000000000000000000000
OS AUDIT TRAIL                         1 04-FEB-17 05.01.15.000000 AM +02:00            0 00000000000000000000000000000000
STANDARD AUDIT TRAIL                   0 27-MAR-18 12.29.55.000000 PM +00:00   2416611527 4A2962517EF2316FE0532296780AE383
OS AUDIT TRAIL                         1 27-MAR-18 12.20.06.000000 PM +02:00   2416611527 4A2962517EF2316FE0532296780AE383


It is clear now that the DELETE statement is not constructed properly. It should get the LAST_ARCHIVE_TS of the actual DBID being purged… but it takes the other one.

According to my tests, it uses neither the correct timestamp for the DBID nor the oldest timestamp: instead, it uses the timestamp of the first record found by the clause “WHERE AUDIT_TRAIL='STANDARD AUDIT TRAIL'”, which depends on the physical location of the row in the table! Clearly a big mess… (PS: I am not 100% sure, but this is what I suppose.)

So, I have tried to modify the archive time for DBID 0:

SQL> begin
  2  dbms_audit_mgmt.set_last_archive_timestamp(audit_trail_type  => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD
  3                          ,last_archive_time => SYSTIMESTAMP-31
  4                          ,database_id => 0
  5                          ,container_guid => '00000000000000000000000000000000');
  6  end;
  7
  8  /

PL/SQL procedure successfully completed.

SQL> select database_id, audit_trail, last_archive_ts from dba_audit_mgmt_last_arch_ts;

DATABASE_ID AUDIT_TRAIL                   LAST_ARCHIVE_TS
----------- ----------------------------- ----------------------------------------
          0 STANDARD AUDIT TRAIL          27-MAR-18 12.37.22.000000 PM +00:00
          0 OS AUDIT TRAIL                04-FEB-17 05.01.15.000000 AM +02:00
 2416611527 STANDARD AUDIT TRAIL          27-MAR-18 12.29.55.000000 PM +00:00
 2416611527 OS AUDIT TRAIL                27-MAR-18 12.20.06.000000 PM +02:00


Executing the cleanup again now leads to a better timestamp:

DELETE FROM SYS.AUD$ WHERE DBID = 2416611527 AND NTIMESTAMP# < to_timestamp('2018-03-27 12:37:22', 'YYYY-MM-DD HH24:MI:SS.FF') AND ROWNUM <= 140724603463440


I have then tried to play a little bit with the DBA_AUDIT_MGMT_LAST_ARCH_TS view (and the underlying table DAM_LAST_ARCH_TS$).

First, I’ve faked the DBID:

SQL> update dba_audit_mgmt_last_arch_ts set database_id=2416611526 where database_id=0;

2 rows updated.

SQL> commit;

Commit complete.
SQL> select database_id, audit_trail, last_archive_ts from DBA_AUDIT_MGMT_LAST_ARCH_TS;

DATABASE_ID AUDIT_TRAIL                                                  LAST_ARCHIVE_TS
----------- ------------------------------------------------------------ ---------------------------------------------------------------------------
 2416611526 STANDARD AUDIT TRAIL                                         27-MAR-18 12.37.22.000000 PM +00:00
 2416611526 OS AUDIT TRAIL                                               04-FEB-17 05.01.15.000000 AM +02:00
 2416611527 STANDARD AUDIT TRAIL                                         27-MAR-18 12.29.55.000000 PM +00:00
 2416611527 OS AUDIT TRAIL                                               27-MAR-18 12.20.06.000000 PM +02:00


Then, I have tried to increase the retention timestamp (500 days):

SQL> begin
  2  dbms_audit_mgmt.set_last_archive_timestamp(audit_trail_type  => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD
  3                          ,last_archive_time => SYSTIMESTAMP-500
  4                          ,database_id => 2416611526
  5                          ,container_guid => '00000000000000000000000000000000');
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> select database_id, audit_trail, last_archive_ts from dba_audit_mgmt_last_arch_ts;

DATABASE_ID AUDIT_TRAIL                                                  LAST_ARCHIVE_TS
----------- ------------------------------------------------------------ ---------------------------------------------------------------------------
 2416611526 STANDARD AUDIT TRAIL                                         13-DEC-16 12.48.23.000000 PM +00:00
 2416611526 OS AUDIT TRAIL                                               04-FEB-17 05.01.15.000000 AM +02:00
 2416611527 STANDARD AUDIT TRAIL                                         27-MAR-18 12.29.55.000000 PM +00:00
 2416611527 OS AUDIT TRAIL                                               27-MAR-18 12.20.06.000000 PM +02:00


Finally, I have tried to purge the audit trail with both DBIDs:

SQL> begin
  2  dbms_audit_mgmt.clean_audit_trail(
  3    audit_trail_type => sys.dbms_audit_mgmt.AUDIT_TRAIL_AUD_STD,
  4    database_id =>   2416611526,
  5    use_last_arch_timestamp => TRUE);
  6  end;
  7  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:45.89

SQL> begin
  2   dbms_audit_mgmt.clean_audit_trail(
  3    audit_trail_type => sys.dbms_audit_mgmt.AUDIT_TRAIL_AUD_STD,
  4    database_id =>   2416611527,
  5     use_last_arch_timestamp => TRUE);
  6  end
  7  ;
  8  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:34.72


As I expected, in both cases the cleanup generated the delete with the timestamp of the fake DBID:

-- clean audit trail for dbid 2416611526 
DELETE FROM SYS.AUD$ WHERE DBID = 2416611526 AND NTIMESTAMP# < to_timestamp('2016-12-13 12:48:23', 'YYYY-MM-DD HH24:MI:SS.FF') AND ROWNUM <= 140724603463440

-- clean audit trail for dbid 2416611527
DELETE FROM SYS.AUD$ WHERE DBID = 2416611527 AND NTIMESTAMP# < to_timestamp('2016-12-13 12:48:23', 'YYYY-MM-DD HH24:MI:SS.FF') AND ROWNUM <= 140724603463440


Is it possible to delete the unwanted records from the view DBA_AUDIT_MGMT_LAST_ARCH_TS?

Not only is it possible, but I recommend it:

SQL> delete from dba_audit_mgmt_last_arch_ts where database_id=2416611526;

2 rows deleted.

SQL> commit;

Commit complete.

SQL>


Afterwards, the timestamp in the where condition is correct and remains correct after subsequent executions of DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP.

Conclusions, IMPORTANT FOR THE DATABASE OPERATIONS:

The upgrade causes the unwanted lines with DBID=0 in the DBA_AUDIT_MGMT_LAST_ARCH_TS view.

Moreover, any duplicate changes the DBID: any subsequent execution of DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP in the duplicated database will lead to additional lines in the view.

This is what I plan to do now:

Whenever I upgrade from 11g to 12c, I clean up the data in DBA_AUDIT_MGMT_LAST_ARCH_TS and schedule the cleanup for DBID 0 as well
Whenever I duplicate a database, I execute a DELETE (without clauses) from DBA_AUDIT_MGMT_LAST_ARCH_TS and a truncate of the table SYS.AUD$ (it is a duplicate, after all!)
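
To spot affected databases before the audit trail grows, a housekeeping script could list the rows of the view that do not belong to the current DBID. This is a sketch, not a tested procedure: the function name stale_arch_ts_query is an assumption, and it only prints the SQL so that it can be reviewed before being sent to SQL*Plus.

```shell
#!/bin/sh
# Hypothetical helper: prints a SQL*Plus script that lists the rows of
# DBA_AUDIT_MGMT_LAST_ARCH_TS whose DATABASE_ID differs from the current DBID
# (leftovers from an upgrade or from a duplicate).
stale_arch_ts_query() {
    printf '%s\n' \
        'set pagesize 100 linesize 200' \
        'select database_id, audit_trail, last_archive_ts' \
        '  from dba_audit_mgmt_last_arch_ts' \
        ' where database_id != (select dbid from v$database);' \
        'exit'
}

stale_arch_ts_query
```

On a database server it could be run as `stale_arch_ts_query | sqlplus -s "/ as sysdba"`; any row returned is a candidate for the DELETE described above.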

HTH

My own Dbvisit Replicate integration with Grid Infrastructure

Posted on October 30, 2017 by Ludovico

I am helping my customer with a PoC of Dbvisit Replicate as a logical replication tool. I will not discuss (at least, not in this post) the capabilities of the tool itself, its configuration or the caveats that you should be aware of when you do logical replication. Instead, I will concentrate on how we will likely integrate it in the current environment.

My role in this PoC is to make sure that the tool will be easy to operate from the operational point of view, and the database operations, here, are supported by Oracle Grid Infrastructure and cold failover clusters.

Note: there are official Dbvisit online resources about how to configure Dbvisit Replicate in a cluster. I aim to complement that information, not copy it.

Quick overview

If you know Dbvisit replicate, skip this paragraph.

There are three main components of Dbvisit Replicate: the FETCHER, the MINE and the APPLY processes. The FETCHER gets the redo stream from the source and sends it to the MINE process. MINE processes the redo stream and converts it into proprietary transaction log files (named plog). The APPLY process gets the plog files and applies the transactions on the destination database.

From an architectural point of view, MINE and APPLY do not need to run close to the databases that are part of the configuration. The FETCHER process, by contrast, needs to be local to the source database online log files (and archived logs).

Because the MINE process is the most resource intensive, it is not convenient to run it where the databases reside, as it might consume precious CPU resources that are licensed for Oracle Database. So, first step in this PoC: the FETCHER processes will run on the cluster, while MINE and APPLY will run on a dedicated Virtual Machine.

Clustering considerations

the FETCHER does NOT need to run on the server of the source database: having access to the online logs through the ASM instance is enough
to avoid SPoF, the fetcher should be a cluster resource that can relocate without problems
to simplify the configuration, the FETCHER configuration and the Dbvisit binaries should be on a shared filesystem (the FETCHER does not persist any data, just the logs)
the destination database might be literally anywhere: the APPLY connects via SQL*Net, so a correct name resolution and routing to the destination database are enough

so the implementation steps are:

create a shared filesystem
install dbvisit in the shared filesystem
create the Dbvisit Replicate configuration on the dedicated VM
copy the configuration files on the cluster
prepare an action script
configure the resource
test!

Convention over configuration: the importance of a strong naming convention

Before starting the implementation, I decided to put all the caveats related to the FETCHER resource relocation on paper:

Where will the configuration files reside? Dbvisit has an important variable: the Configuration Name. All the operations are done by passing a configuration file named /{PATH}/{CONFIG_NAME}/{CONFIG_NAME}-{PROCESS_TYPE}.ddc to the dbvrep binary. So, I decided to put ALL the configuration directories under the same path: given the Configuration Name, I will always be able to get the configuration file path.
How will the configuration files relocate from one node to the other? Easy here: they won’t. I will use an ACFS filesystem
How can I link the cluster resource with its configuration name? Easy again: I call my resources dbvrep.CONFIGNAME.PROCESS_TYPE. e.g. dbvrep.FROM_A_TO_B.fetcher
How will I manage the need to use a new version of dbvisit in the future? Old and new versions must coexist: Instead of using external configuration files, I will just use a custom resource attribute named DBVREP_HOME inside my resource type definition. (see later)
What port number should I use? Of course, many fetchers started on different servers should not have conflicts. This is something that might be either planned or made dynamic. I will opt for the first one. But instead of getting the port number inside the Dbvisit configuration, I will use a custom resource attribute: DBVREP_PORT.
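
The naming convention above means that the configuration file can always be computed from the resource name alone. A minimal sketch, assuming resource names of the form dbvrep.CONFIGNAME.PROCESS_TYPE with no dots inside CONFIGNAME; the function name ddc_file_for_resource is hypothetical.

```shell
#!/bin/sh
# All configuration directories live under the same shared path.
DBVREP_CONFIG_PATH=/u02/data/oracle/dbvisit

# ddc_file_for_resource RESOURCE_NAME
# e.g. dbvrep.FROM_A_TO_B.fetcher gives .../FROM_A_TO_B/FROM_A_TO_B-FETCHER.ddc
ddc_file_for_resource() {
    res_name=$1
    rest=${res_name#dbvrep.}       # strip the fixed prefix
    config_name=${rest%.*}         # everything before the last dot
    process_type=$(printf '%s' "${rest##*.}" | tr '[:lower:]' '[:upper:]')
    printf '%s/%s/%s-%s.ddc\n' \
        "$DBVREP_CONFIG_PATH" "$config_name" "$config_name" "$process_type"
}

ddc_file_for_resource dbvrep.FROM_A_TO_B.fetcher
```

Because nothing is looked up in an external file, adding a new replication only requires creating the directory and naming the resource correctly.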

Considerations on the FETCHER listen address

This requires a dedicated paragraph. The Dbvisit documentation suggests creating a VIP, binding to the VIP address and creating a dependency between the FETCHER resource and the VIP. Here is where my configuration will differ.

Having a separate VIP per FETCHER resource might, potentially, lead to dozens of VIPs in the cluster. Everything will depend on the success of the PoC and on how many internal clients will decide to ask for such implementation. Many VIPs == many interactions with network admins for address reservation, DNS configurations, etc. Long story short, it might slow down the creation and maintenance of new configurations.

Instead, each FETCHER will listen on the local server address, and the action script will take care of:

getting the current host name
getting the current ASM instance
changing the settings of the specific Dbvisit Replicate configuration (ASM instance and FETCHER listen address)
starting the FETCHER
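
The steps above can be sketched as two small functions: one that extracts the local ASM instance name from the process list, and one that produces the settings to feed to dbvrep before starting the FETCHER. The function names are assumptions made for illustration; the setting names are the ones used in the action script later in this post.

```shell
#!/bin/sh
# Reads `ps -eaf` output on stdin and prints the local ASM instance (e.g. +ASM1).
# The [a] in the pattern keeps this awk process itself from matching.
asm_instance_from_ps() {
    awk -F+ '/[a]sm_pmon_/ { print "+" $NF; exit }'
}

# fetcher_settings HOST PORT ASM_INSTANCE
# Prints the dbvrep commands that rebind the FETCHER to the current node.
fetcher_settings() {
    host=$1
    port=$2
    asm=$3
    printf 'set FETCHER.FETCHER_REMOTE_INTERFACE=%s:%s\n' "$host" "$port"
    printf 'set FETCHER.FETCHER_LISTEN_INTERFACE=%s:%s\n' "$host" "$port"
    printf 'set FETCHER.MINE_ASM=%s\n' "$asm"
    printf 'start FETCHER\n'
}

# Example: fetcher_settings "$(hostname)" 7901 "$(ps -eaf | asm_instance_from_ps)"
```

Since the settings are recomputed at every start, a relocation to another node needs no manual change to the configuration.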

Implementation

Now that all the caveats and steps are clear, I can show how I implemented it:

Create a shared filesystem

asmcmd volcreate -G ACFS -s 10G dbvisit --column 1
/sbin/mkfs -t acfs /dev/asm/dbvisit-293
sudo /u01/app/grid/product/12.1.0.2/grid/bin/srvctl add filesystem -d /dev/asm/dbvisit-293 -m /u02/data/oracle/dbvisit -u oracle -fstype ACFS -autostart ALWAYS
srvctl start filesystem -d /dev/asm/dbvisit-293


Install dbvisit in the shared filesystem

out of scope!


Create the Dbvisit Replicate configuration on the dedicated VM

out of scope!


Copy the configuration files from the Dbvisit VM to the cluster

scp /u02/data/oracle/dbvisit/FROM_A_TO_B/FROM_A_TO_B-FETCHER.ddc \
 cluster-scan:/u02/data/oracle/dbvisit/FROM_A_TO_B


Prepare an action script

$ cat dbvrep.sh
#!/bin/ksh
########################################
# Name   : dbvrep.sh
# Author : Ludovico Caldara, Trivadis AG

# the DBVISIT FETCHER process needs to know 2 attributes: DBVREP_HOME and DBVREP_PORT.
# If you want to call the action script directly set:
# _CRS_NAME=<resource name in format dbvrep.CONFIGNAME.fetcher>
# _CRS_DBVREP_HOME=<dbvrep installation path>
# _CRS_DBVREP_PORT=<listening port>

DBVREP_RES_NAME=${_CRS_NAME}
DBVREP_CONFIG_NAME=`echo $DBVREP_RES_NAME | awk -F. '{print $2}'`

# MINE, FETCHER or APPLY?
DBVREP_PROCESS_TYPE=`echo $DBVREP_RES_NAME | awk -F. '{print toupper($3)}'`

DBVREP_HOME=${_CRS_DBVREP_HOME}
DBVREP=${DBVREP_HOME}/dbvrep
DBVREP_PORT=${_CRS_DBVREP_PORT}
DBVREP_CONFIG_PATH=/u02/data/oracle/dbvisit

DBVREP_CONFIG_FILE=${DBVREP_CONFIG_PATH}/${DBVREP_CONFIG_NAME}/${DBVREP_CONFIG_NAME}-${DBVREP_PROCESS_TYPE}.ddc

function F_verify_dbvrep_up {
        ps -eaf | grep "[d]bvrep ${DBVREP_PROCESS_TYPE} $DBVREP_CONFIG_NAME" > /dev/null
        if [ $? -eq 0 ] ; then
                echo "OK"
        else
                echo "KO"
                exit 1
        fi
}

ACTION="${1}"
case "$ACTION" in

        'start')
        LOCAL_ASM="+"`ps -eaf | grep "[a]sm_pmon" | awk -F+ '{print $NF}'`;

        if [ "${DBVREP_PROCESS_TYPE}" == "FETCHER" ] ; then
                $DBVREP --daemon --ddcfile ${DBVREP_CONFIG_FILE} --silent <<EOF
set FETCHER.FETCHER_REMOTE_INTERFACE=${HOSTNAME}:${DBVREP_PORT}
set FETCHER.FETCHER_LISTEN_INTERFACE=${HOSTNAME}:${DBVREP_PORT}
set FETCHER.MINE_ASM=${LOCAL_ASM}
start FETCHER
EOF
        fi
;;

        'stop')
        $DBVREP --daemon --ddcfile ${DBVREP_CONFIG_FILE} shutdown ${DBVREP_PROCESS_TYPE}

;;

        'check')
        F_verify_dbvrep_up
;;

        'clean')
        sleep 1
        exit 0
;;

        *)
        # unknown action: print the supported ones and fail
        echo "Usage: $0 {start|stop|check|clean}" >&2
        exit 1
;;

esac


Configure the resource

$ cat dbvrep.type
ATTRIBUTE=ACTION_SCRIPT
DEFAULT_VALUE=/path_to_action_script/dbvrep.sh
TYPE=STRING
FLAGS=CONFIG

ATTRIBUTE=SCRIPT_TIMEOUT
DEFAULT_VALUE=120
TYPE=INT
FLAGS=CONFIG

ATTRIBUTE=DBVREP_PORT
DEFAULT_VALUE=
TYPE=INT
FLAGS=CONFIG

ATTRIBUTE=DBVREP_HOME
DEFAULT_VALUE=/u02/data/oracle/dbvisit/replicate
TYPE=STRING
FLAGS=CONFIG

ATTRIBUTE=SERVER_POOLS
DEFAULT_VALUE=*
TYPE=STRING
FLAGS=CONFIG|HOTMOD

ATTRIBUTE=START_DEPENDENCIES
DEFAULT_VALUE=hard() weak(type:ora.listener.type,global:type:ora.scan_listener.type) pullup()
TYPE=STRING
FLAGS=CONFIG

ATTRIBUTE=STOP_DEPENDENCIES
DEFAULT_VALUE=hard()
TYPE=STRING
FLAGS=CONFIG


ATTRIBUTE=RESTART_ATTEMPTS
DEFAULT_VALUE=2
TYPE=INT
FLAGS=CONFIG

ATTRIBUTE=CHECK_INTERVAL
DEFAULT_VALUE=60
TYPE=INT
FLAGS=CONFIG

ATTRIBUTE=FAILURE_THRESHOLD
DEFAULT_VALUE=2
TYPE=INT
FLAGS=CONFIG

ATTRIBUTE=UPTIME_THRESHOLD
DEFAULT_VALUE=8h
TYPE=STRING
FLAGS=CONFIG

ATTRIBUTE=FAILURE_INTERVAL
DEFAULT_VALUE=3600
TYPE=INT
FLAGS=CONFIG

$ crsctl add type dbvrep.type -basetype cluster_resource -file dbvrep.type
$ crsctl add resource dbvrep.FROM_A_TO_B.fetcher -type dbvrep.type \
  -attr "START_DEPENDENCIES=hard(db.source) pullup:always(db.source),STOP_DEPENDENCIES=hard(db.source),DBVREP_PORT=7901"

Test!

$ crsctl start res dbvrep.FROM_A_TO_B.fetcher
CRS-2672: Attempting to start 'dbvrep.FROM_A_TO_B.fetcher' on 'server1'
CRS-2676: Start of 'dbvrep.FROM_A_TO_B.fetcher' on 'server1' succeeded

..in the logs..
2017-10-30 15:24:34.992478 :    AGFW:1127589632: {1:30181:30166} Agent received the message: RESOURCE_START[dbvrep.FROM_A_TO_B.fetcher 1 1] ID 4098:5175912
2017-10-30 15:24:34.992512 :    AGFW:1127589632: {1:30181:30166} Preparing START command for: dbvrep.FROM_A_TO_B.fetcher 1 1
2017-10-30 15:24:34.992521 :    AGFW:1127589632: {1:30181:30166} dbvrep.FROM_A_TO_B.fetcher 1 1 state changed from: OFFLINE to: STARTING
2017-10-30 15:24:34.993195 :CLSDYNAM:1106577152: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30166} [start] Executing action script: dbvrep.ksh[start]
2017-10-30 15:24:41.254703 :CLSDYNAM:1106577152: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30166} [start] Variable FETCHER_REMOTE_INTERFACE set to server1:7901 for process
2017-10-30 15:24:41.254726 :CLSDYNAM:1106577152: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30166} [start] FETCHER.
2017-10-30 15:24:41.354916 :CLSDYNAM:1106577152: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30166} [start] Variable FETCHER_LISTEN_INTERFACE set to server1:7901 for process
2017-10-30 15:24:41.354935 :CLSDYNAM:1106577152: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30166} [start] FETCHER.
2017-10-30 15:24:41.405052 :CLSDYNAM:1106577152: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30166} [start] Variable MINE_ASM set to +ASM1 for process FETCHER.
2017-10-30 15:24:41.605423 :CLSDYNAM:1106577152: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30166} [start] Starting process FETCHER...started
2017-10-30 15:24:41.655660 :    AGFW:1106577152: {1:30181:30166} Command: start for resource: dbvrep.FROM_A_TO_B.fetcher 1 1 completed with status: SUCCESS
2017-10-30 15:24:41.656100 :CLSDYNAM:1081362176: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30166} [check] Executing action script: dbvrep.ksh[check]
2017-10-30 15:24:41.658242 :    AGFW:1127589632: {1:30181:30166} Agent sending reply for: RESOURCE_START[dbvrep.FROM_A_TO_B.fetcher 1 1] ID 4098:5175912
2017-10-30 15:24:41.908256 :CLSDYNAM:1081362176: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30166} [check] OK
2017-10-30 15:24:41.908440 :    AGFW:1127589632: {1:30181:30166} dbvrep.FROM_A_TO_B.fetcher 1 1 state changed from: STARTING to: ONLINE
2017-10-30 15:24:41.908486 :    AGFW:1127589632: {1:30181:30166} Started implicit monitor for [dbvrep.FROM_A_TO_B.fetcher 1 1] interval=60000 delay=60000
2017-10-30 15:24:41.908696 :    AGFW:1127589632: {1:30181:30166} Agent sending last reply for: RESOURCE_START[dbvrep.FROM_A_TO_B.fetcher 1 1] ID 4098:5175912


$ crsctl stop res dbvrep.FROM_A_TO_B.fetcher
CRS-2673: Attempting to stop 'dbvrep.FROM_A_TO_B.fetcher' on 'server1'
CRS-2677: Stop of 'dbvrep.FROM_A_TO_B.fetcher' on 'server1' succeeded

..in the logs..
2017-10-30 15:22:14.891730 :    AGFW:1127589632: {1:30181:30156} Agent received the message: RESOURCE_STOP[dbvrep.FROM_A_TO_B.fetcher 1 1] ID 4099:5175818
2017-10-30 15:22:14.891762 :    AGFW:1127589632: {1:30181:30156} Preparing STOP command for: dbvrep.FROM_A_TO_B.fetcher 1 1
2017-10-30 15:22:14.891772 :    AGFW:1127589632: {1:30181:30156} dbvrep.FROM_A_TO_B.fetcher 1 1 state changed from: ONLINE to: STOPPING
2017-10-30 15:22:14.892400 :CLSDYNAM:1091868416: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30156} [stop] Executing action script: dbvrep.ksh[stop]
2017-10-30 15:22:20.957375 :CLSDYNAM:1091868416: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30156} [stop] DDC loaded from database (458 variables).
2017-10-30 15:22:21.007939 :CLSDYNAM:1091868416: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30156} [stop] Dbvisit Replicate version 2.9.04
2017-10-30 15:22:21.007963 :CLSDYNAM:1091868416: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30156} [stop] Copyright (C) Dbvisit Software Limited. All rights reserved.
2017-10-30 15:22:21.007976 :CLSDYNAM:1091868416: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30156} [stop] DDC file
2017-10-30 15:22:21.007994 :CLSDYNAM:1091868416: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30156} [stop] /u02/data/oracle/dbvisit/FROM_A_TO_B/FROM_A_TO_B
2017-10-30 15:22:21.008005 :CLSDYNAM:1091868416: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30156} [stop] -FETCHER.ddc loaded.
2017-10-30 15:22:21.108340 :CLSDYNAM:1091868416: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30156} [stop] Dbvisit Replicate FETCHER process shutting down.
2017-10-30 15:22:21.108361 :CLSDYNAM:1091868416: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30156} [stop] OK-0: Completed successfully.
2017-10-30 15:22:45.747531 :    AGFW:1091868416: {1:30181:30156} Command: stop for resource: dbvrep.FROM_A_TO_B.fetcher 1 1 completed with status: SUCCESS
2017-10-30 15:22:45.747898 :    AGFW:1127589632: {1:30181:30156} Agent sending reply for: RESOURCE_STOP[dbvrep.FROM_A_TO_B.fetcher 1 1] ID 4099:5175818
2017-10-30 15:22:45.747902 :CLSDYNAM:1123387136: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30156} [check] Executing action script: dbvrep.ksh[check]
2017-10-30 15:22:45.949702 :CLSDYNAM:1123387136: [dbvrep.FROM_A_TO_B.fetcher]{1:30181:30156} [check] KO
2017-10-30 15:22:45.949913 :    AGFW:1127589632: {1:30181:30156} dbvrep.FROM_A_TO_B.fetcher 1 1 state changed from: STOPPING to: OFFLINE
2017-10-30 15:22:45.950014 :    AGFW:1127589632: {1:30181:30156} Agent sending last reply for: RESOURCE_STOP[dbvrep.dbvrep.FROM_A_TO_B.fetcher 1 1] ID 4098:5175818

The relocation also worked as expected. When the settings are modified through:

set FETCHER.FETCHER_REMOTE_INTERFACE=${HOSTNAME}:${DBVREP_PORT}
set FETCHER.FETCHER_LISTEN_INTERFACE=${HOSTNAME}:${DBVREP_PORT}
set FETCHER.MINE_ASM=${LOCAL_ASM}

the MINE process picks up the change dynamically, so there is no need to restart it.

Last consideration

Adding a hard dependency between the DB and the FETCHER means that you must either stop the DB with the force option or always stop the fetcher before the database. Also, starting the DB will pull up the FETCHER (pullup:always), and vice versa. We will consider further whether to keep this dependency or to manage it differently (e.g. through the action script).

The hard dependency, declared without the global keyword, will always start the fetcher on the server where the database runs. This is not required, but it might be nice to see the fetcher on the same node. Again, this is something that we will discuss further.
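
To make the two placement options concrete, here is a small sketch that only prints the crsctl command for each variant and executes nothing. The helper name and the print-only approach are my assumptions. The global: modifier is the Clusterware dependency keyword mentioned above, and the attribute string otherwise matches the one used when the resource was created.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the crsctl command, never runs it.
# $1 = fetcher resource, $2 = database resource, $3 = "local" or "global".
print_fetcher_deps() {
    res="$1"; db="$2"; scope="$3"
    if [ "$scope" = "global" ]; then
        # The dependency is satisfied wherever the DB runs in the cluster.
        target="global:${db}"
    else
        # Default: the dependency refers to the DB on the same node.
        target="${db}"
    fi
    printf 'crsctl modify resource %s -attr "START_DEPENDENCIES=hard(%s) pullup:always(%s),STOP_DEPENDENCIES=hard(%s)"\n' \
        "$res" "$target" "$target" "$target"
}

print_fetcher_deps dbvrep.FROM_A_TO_B.fetcher db.source local
print_fetcher_deps dbvrep.FROM_A_TO_B.fetcher db.source global
```

Review the printed command and test it on a non-production cluster before running it. Whether the global variant fits depends on where you want the fetcher to run.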

HTH

—

Ludovico

12.1.0.2 Bundle Patch 170718 breaks Data Guard and Duplicate from active database

Posted on September 14, 2017 by Ludovico

Recently my customer patched its 12.1.0.2 databases with Bundle Patch 170718 on the new servers (half of the customer’s environment). The old servers are still on Bundle Patch 161018.

We realized that we could no longer move the databases from the old servers to the new ones, because the duplicate from active database was failing with this error:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 09/11/2017 15:59:32
RMAN-05501: aborting duplication of target database
RMAN-03015: error occurred in stored script Memory Script
RMAN-03009: failure of backup command on prmy1 channel at 09/11/2017 15:59:32
ORA-17629: Cannot connect to the remote database server
ORA-17630: Mismatch in the remote file protocol version client 2 server 3

The last lines show the same error that Franck blogged about some months ago.

Oracle 12.2 introduced an incompatibility with previous releases in remote file transfer via SQL*Net. At least this is what it seems. According to Oracle, this is due to a bugfix present in Oracle 12.2.

Now, the bundle patch that we installed (BP 170718) contains the same bugfix (patch for bug 18633374).

So, the incompatibility happens now between databases of the same “Major Release” (12.1.0.2).

There are two possible workarounds:

Apply the same patch level on both sides (BP170718 in my case)
Apply just the patch 18633374 on top of your current PSU/DBBP (a merge might be necessary).

We used the second approach, and now we can set up Data Guard again to move our databases without downtime:

oracle@oldserver $ opatch lspatches
18633374;   <<<<<< FIX!
24340679;DATABASE BUNDLE PATCH: 12.1.0.2.161018 (24340679)

oracle@newserver $ opatch lspatches
22652097;
22243983;
25869760;DATABASE BUNDLE PATCH: 12.1.0.2.170718 (25869760)
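
When many homes are involved, this check can be scripted. Below is a minimal sketch that parses saved opatch lspatches output and tests for a standalone patch ID. The function names are hypothetical, and the sample text is copied from the two listings above without the annotation.

```shell
#!/bin/sh
# Print the sorted patch IDs found in "opatch lspatches" output read on stdin.
patch_ids() {
    awk -F';' '/^[0-9]+;/ {print $1}' | sort
}

# Return 0 if patch ID $1 is listed as its own line in the output on stdin.
has_patch() {
    patch_ids | grep -qx "$1"
}

# Sample outputs taken from the listings in this post.
old_home_patches='18633374;
24340679;DATABASE BUNDLE PATCH: 12.1.0.2.161018 (24340679)'

new_home_patches='22652097;
22243983;
25869760;DATABASE BUNDLE PATCH: 12.1.0.2.170718 (25869760)'

if printf '%s\n' "$old_home_patches" | has_patch 18633374; then
    echo "old home: standalone fix 18633374 present"
else
    echo "old home: standalone fix 18633374 missing"
fi
```

Note that a fix delivered inside a bundle patch does not appear as its own line. The new home does not list 18633374 separately even though BP 170718 contains it, so this check only detects fixes applied as one-off patches.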

HTH

—

Ludovico

Which Oracle Databases use most CPU on my server?

Posted on May 24, 2017 by Ludovico

Assumptions

You have many (hundreds of) instances and more than a couple of servers
One of your servers has a high CPU load
You have Enterprise Manager 12c, but the Database Load does not filter by server
You want a historical representation of the user CPU utilization, per instance

Getting the data from the EM Repository

With the following query, connected to the SYSMAN schema of your EM repository, you can get the hourly max() and/or avg() of user CPU by instance and time.

SELECT entity_name,
  ROUND(collection_time,'HH') AS colltime,
  ROUND(avg_value,2)/16*100   AS avgv, -- 16 is my number of CPU
  ROUND(max_value,2)/16*100   AS maxv  -- same here
FROM gc$metric_values_hourly mv
JOIN em_targets t
ON (t.target_name         =mv.entity_name)
WHERE t.host_name         ='myserver1'  -- myserver1 is the server that has high CPU Usage
AND mv.metric_column_name = 'user_cpu_time_cnt' -- let's get the user cpu time
AND collection_time>sysdate-14  -- for the last 14 days
ORDER BY entity_name,
  ROUND(collection_time,'HH');

Suppose you select just the max value: the result will be similar to this:

ENTITY_ COLLTIME          MAXV
------- ----------------  ------
mydbone	10.05.2017 16:00  0.3125
mydbone	10.05.2017 17:00  0.1875
mydbone	10.05.2017 18:00  0.1875
mydbone	10.05.2017 19:00  0.1875
mydbone	10.05.2017 20:00  0.25
mydbone	10.05.2017 21:00  0.125
mydbone	10.05.2017 22:00  0.125
mydbone	10.05.2017 23:00  0.125
mydbone	11.05.2017 00:00  0.1875
mydbone	11.05.2017 01:00  0.125
mydbone	11.05.2017 02:00  0.1875
mydbone	11.05.2017 03:00  0.1875
....                      
mydbone	23.05.2017 20:00  0.125
mydbone	23.05.2017 21:00  0.125
mydbone	23.05.2017 22:00  0.125
mydbone	23.05.2017 23:00  0.0625
mydbtwo	10.05.2017 16:00  0.3125
mydbtwo	10.05.2017 17:00  0.25
mydbtwo	10.05.2017 18:00  0.1875
mydbtwo	10.05.2017 19:00  0.1875
mydbtwo	10.05.2017 20:00  0.3125
mydbtwo	10.05.2017 21:00  0.125
mydbtwo	10.05.2017 22:00  0.125
mydbtwo	10.05.2017 23:00  0.125
.....                     
mydbtwo	14.05.2017 19:00  0.125
mydbtwo	14.05.2017 20:00  0.125
mydbtwo	14.05.2017 21:00  0.125
mydbtwo	14.05.2017 22:00  0.125
mydbtwo	14.05.2017 23:00  0.125
dbthree	10.05.2017 16:00  1.1875
dbthree	10.05.2017 17:00  0.6875
dbthree	10.05.2017 18:00  0.625
dbthree	10.05.2017 19:00  0.5625
dbthree	10.05.2017 20:00  0.8125
dbthree	10.05.2017 21:00  0.5
dbthree	10.05.2017 22:00  0.4375
dbthree	10.05.2017 23:00  0.4375
...

Putting it into Excel

There are a million ways to do something more reusable than Excel (rrdtool scripts, gnuplot, R, you name it), but Excel is just right for most people out there (including me when I feel lazy).
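
For those who prefer a script, here is a minimal sketch in shell and awk that turns query output like the sample above into CSV with one column per instance. That is the same shape a pivot table produces. The function name is hypothetical. The sketch assumes four whitespace-separated fields per row (instance, date, time, value) and keeps rows in input order instead of sorting the dates.

```shell
#!/bin/sh
# Pivot "instance date time value" rows read on stdin into CSV:
# one row per timestamp, one column per instance, in order of appearance.
pivot_cpu() {
    awk '
    NF == 4 {
        ts = $2 " " $3
        if (!(ts in seen_ts)) { seen_ts[ts] = 1; ts_order[++nts] = ts }
        if (!($1 in seen_db)) { seen_db[$1] = 1; db_order[++ndb] = $1 }
        val[ts, $1] = $4
    }
    END {
        line = "colltime"
        for (d = 1; d <= ndb; d++) line = line "," db_order[d]
        print line
        for (t = 1; t <= nts; t++) {
            line = ts_order[t]
            for (d = 1; d <= ndb; d++) {
                key = ts_order[t] SUBSEP db_order[d]
                # Leave the cell empty when an instance has no sample for that hour.
                line = line "," ((key in val) ? val[key] : "")
            }
            print line
        }
    }'
}

# A few rows copied from the sample output in this post.
sample_rows='mydbone 10.05.2017 16:00 0.3125
mydbone 10.05.2017 17:00 0.1875
mydbtwo 10.05.2017 16:00 0.3125
mydbtwo 10.05.2017 17:00 0.25
dbthree 10.05.2017 16:00 1.1875
dbthree 10.05.2017 17:00 0.6875'

printf '%s\n' "$sample_rows" | pivot_cpu
```

Rows that do not have exactly four fields, such as the header and separator lines of the query output, are skipped. The resulting CSV can be fed to gnuplot or any other plotting tool.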

Configure an Oracle Client and add the ODBC data source to the EM repository:

Open Excel, go to “Data” – “Connections” and add a new connection:
- Search…
- New Source
- DSN ODBC
Select your new ODBC data source, user, password
Uncheck “Connection to a specific table”
Give a name and click Finish
On the DSN -> Properties -> Definition, enter the SQL text I have provided previously

The result should be something similar (but much longer :-)):

Pivoting the results

Create a new sheet and name it “pivot”, click on “Create Pivot Table”, then select your data and your dimensions:

The result:

Creating the Graph

Now that the data is correctly formatted, it’s easy to add a graph: just select the entire pivot table and create a new stacked area graph.

The result will be similar to this:

With such a graph, it is easy to spot which databases consumed the most CPU on the system in a given period, and to track the progress if you start a “performance campaign”.

For example, you can see that the “green” and “red” databases were constantly consuming some CPU up to 17.05.2017, and then some magic solved the CPU problem for those instances.

It is also quite convenient for checking the results of new instance caging settings…

The resulting CPU total will not necessarily reach 100%: the SYS CPU time is not included, nor is the user CPU of all the other processes that are either not databases or not monitored with Enterprise Manager.

HTH

—

Ludovico