Setting Grid Infrastructure 18c Oracle Home name during the install

A colleague has been struggling for some time to set the correct Oracle Home name for Grid Infrastructure 18.3.0 when running gridSetup.sh.

In the graphical Oracle Universal Installer there is no way (as far as we could find) to set the Home name. Moreover, it was our intention to automate the install of Grid Infrastructure.

The complete responsefile ($OH/inventory/response/oracle.crs_Complete.rsp) contains the parameter:
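For reference, the entry looks like this (the Home name value below is just an example, not the one from the original responsefile):

    ORACLE_HOME_NAME=OraGI18Home1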

However, when using a responsefile containing this parameter, gridSetup.sh fails with the error:

After a few tries (and an SR), the following approach actually works:

  • strip the ORACLE_HOME_NAME parameter from the responsefile
  • pass it as a double-quoted parameter at the end of the gridSetup.sh command line
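A minimal sketch of the resulting command line (the -silent/-responseFile flags, the responsefile path and the Home name value are assumptions, not copied from the original run):

    ./gridSetup.sh -silent -responseFile /path/to/grid_install.rsp \
      "ORACLE_HOME_NAME=OraGI18Home1"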

HTH

My own Dbvisit Replicate integration with Grid Infrastructure

I am helping my customer with a PoC of Dbvisit Replicate as a logical replication tool. I will not discuss (at least, not in this post) the capabilities of the tool itself, its configuration or the caveats that you should beware of when doing logical replication. Instead, I will concentrate on how we will likely integrate it into the current environment.

My role in this PoC is to make sure that the tool will be easy to operate from an operational point of view; the database operations here rely on Oracle Grid Infrastructure and cold failover clusters.

Note: there are official Dbvisit online resources about how to configure Dbvisit Replicate in a cluster. I aim to complement that information, not copy it.

Quick overview

If you already know Dbvisit Replicate, skip this paragraph.

There are three main components in Dbvisit Replicate: the FETCHER, the MINE and the APPLY processes. The FETCHER gets the redo stream from the source and sends it to the MINE process. The MINE process processes the redo stream and converts it into proprietary transaction log files (named plogs). The APPLY process gets the plog files and applies the transactions on the destination database.

From an architectural point of view, MINE and APPLY do not need to run close to the databases that are part of the configuration. The FETCHER process, by contrast, needs to be local to the source database online log files (and archived logs).

Because the MINE process is the most resource-intensive, it is not convenient to run it where the databases reside, as it might consume precious CPU resources that are licensed for Oracle Database. So, the first step in this PoC: the FETCHER processes will run on the cluster, while MINE and APPLY will run on a dedicated Virtual Machine.

(Figure: Dbvisit Replicate and Grid Infrastructure integration overview)

Clustering considerations

  • the FETCHER does NOT need to run on the server of the source database: having access to the online logs through the ASM instance is enough
  • to avoid SPoF, the fetcher should be a cluster resource that can relocate without problems
  • to simplify the configuration, the FETCHER configuration and the Dbvisit binaries should be on a shared filesystem (the FETCHER does not persist any data, just the logs)
  • the destination database might be literally anywhere: the APPLY connects via SQL*Net, so correct name resolution and routing to the destination database are enough

So the implementation steps are:

  1. create a shared filesystem
  2. install dbvisit in the shared filesystem
  3. create the Dbvisit Replicate configuration on the dedicated VM
  4. copy the configuration files on the cluster
  5. prepare an action script
  6. configure the resource
  7. test!

Convention over configuration: the importance of a strong naming convention

Before starting the implementation, I decided to put on paper all the caveats related to the FETCHER resource relocation:

  • Where will the configuration files reside? Dbvisit has an important variable: the Configuration Name. All the operations are done by passing a configuration file named /{PATH}/{CONFIG_NAME}/{CONFIG_NAME}-{PROCESS_TYPE}.ddc to the dbvrep binary. So, I decided to put ALL the configuration directories under the same path: given the Configuration Name, I will always be able to get the configuration file path.
  • How will the configuration files relocate from one node to the other? Easy here: they won’t. I will use an ACFS filesystem.
  • How can I link the cluster resource with its configuration name? Easy again: I call my resources dbvrep.CONFIGNAME.PROCESS_TYPE. e.g. dbvrep.FROM_A_TO_B.fetcher
  • How will I manage the need to use a new version of dbvisit in the future? Old and new versions must coexist: Instead of using external configuration files, I will just use a custom resource attribute named DBVREP_HOME inside my resource type definition. (see later)
  • What port number should I use? Of course, fetchers started on different servers must not have port conflicts. This can either be planned or made dynamic; I will opt for the former. But instead of putting the port number inside the Dbvisit configuration, I will use another custom resource attribute: DBVREP_PORT (see the sketch after this list).
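A sketch of how such a resource type could be registered with the two custom attributes (the type name, defaults and attribute types are assumptions following the naming convention above; adapt the syntax to your Clusterware version):

    # register a custom resource type carrying the two extra attributes
    crsctl add type dbvrep.fetcher.type -basetype cluster_resource \
      -attr "ATTRIBUTE=DBVREP_HOME,TYPE=string,DEFAULT_VALUE=/u01/app/dbvisit/replicate" \
      -attr "ATTRIBUTE=DBVREP_PORT,TYPE=string,DEFAULT_VALUE=7890"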

Considerations on the FETCHER listen address

This requires a dedicated paragraph. The Dbvisit documentation suggests creating a VIP, binding to the VIP address and creating a dependency between the FETCHER resource and the VIP. Here is where my configuration differs.

Having a separate VIP per FETCHER resource might, potentially, lead to dozens of VIPs in the cluster. Everything will depend on the success of the PoC and on how many internal clients decide to ask for such an implementation. Many VIPs == many interactions with network admins for address reservation, DNS configuration, etc. Long story short, it might slow down the creation and maintenance of new configurations.

Instead, each FETCHER will listen to the local server address, and the action script will take care of:

  • getting the current host name
  • getting the current ASM instance
  • changing the settings of the specific Dbvisit Replicate configuration (ASM instance and FETCHER listen address)
  • starting the FETCHER

Implementation

Now that all the caveats and steps are clear, I can show how I implemented it:

Create a shared filesystem
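A possible sequence, assuming an ADVM volume in the DATA disk group mounted under /u01/app/dbvisit (disk group, size, volume device name and mount point are examples):

    # as the grid user: create the ADVM volume
    asmcmd volcreate -G DATA -s 10G DBVISIT
    asmcmd volinfo -G DATA DBVISIT            # note the generated volume device path

    # as root: create the ACFS filesystem on that volume device
    mkfs -t acfs /dev/asm/dbvisit-123

    # register it as a Clusterware-managed filesystem and start it
    srvctl add filesystem -device /dev/asm/dbvisit-123 -path /u01/app/dbvisit -user oracle
    srvctl start filesystem -device /dev/asm/dbvisit-123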

Install dbvisit in the shared filesystem

Create the Dbvisit Replicate configuration on the dedicated VM

Copy the configuration files from the Dbvisit VM to the cluster

Prepare an action script
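What follows is a skeleton of such an action script, covering the steps listed earlier. It is a sketch under assumptions, not the original script: the script agent normally exposes resource attributes as _CRS_<ATTRIBUTE> environment variables, and the actual dbvrep commands are left as placeholder comments to be filled in from the Dbvisit documentation.

    #!/bin/bash
    # Sketch of a script-agent action script for the dbvrep.<CONFIG_NAME>.fetcher resources.
    # Resource attributes are normally exposed as _CRS_<ATTRIBUTE> environment variables
    # (verify on your Clusterware version). The dbvrep steps are placeholder comments.

    DBVREP_HOME=${_CRS_DBVREP_HOME}
    DBVREP_PORT=${_CRS_DBVREP_PORT}
    CONFIG_NAME=$(echo "${_CRS_NAME}" | cut -d. -f2)    # resource name: dbvrep.CONFIGNAME.fetcher
    DDC_FILE="/u01/app/dbvisit/replicate/${CONFIG_NAME}/${CONFIG_NAME}-FETCHER.ddc"

    case "$1" in
      start)
        LOCAL_HOST=$(hostname -s)
        # ASM instance name on this node, e.g. +ASM1
        ASM_SID=$(ps -eo args | awk '/^asm_pmon_/ {sub("^asm_pmon_",""); print; exit}')
        # (placeholder) update the FETCHER settings in ${DDC_FILE} with LOCAL_HOST,
        # ASM_SID and DBVREP_PORT, then start the FETCHER using ${DBVREP_HOME}/dbvrep
        ;;
      stop|clean)
        # (placeholder) stop the FETCHER gracefully; as a last resort, kill its process
        pkill -f "dbvrep.*${CONFIG_NAME}-FETCHER" || true
        ;;
      check)
        pgrep -f "dbvrep.*${CONFIG_NAME}-FETCHER" > /dev/null || exit 1
        ;;
    esac
    exit 0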

Configure the resource
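A sketch of the resource registration, using the custom type and the naming convention defined above (paths and values are examples):

    crsctl add resource dbvrep.FROM_A_TO_B.fetcher \
      -type dbvrep.fetcher.type \
      -attr "ACTION_SCRIPT=/u01/app/dbvisit/replicate/scripts/fetcher_action.sh,DBVREP_HOME=/u01/app/dbvisit/replicate,DBVREP_PORT=7890"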

Test!
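The test can be as simple as starting the resource and relocating it between nodes (resource and node names are examples):

    crsctl start resource dbvrep.FROM_A_TO_B.fetcher
    crsctl status resource dbvrep.FROM_A_TO_B.fetcher
    crsctl relocate resource dbvrep.FROM_A_TO_B.fetcher -n node2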


The relocation also worked as expected when the settings were modified as follows:

The MINE process gets the change dynamically, so there is no need to restart it.

Last consideration

Adding a hard dependency between the DB and the FETCHER would require stopping the DB with the force option, or always stopping the fetcher before the database. Also, starting the DB would pull up the FETCHER (pullup:always), and vice versa. We will consider further whether to use this dependency or to manage it differently (e.g. through the action script).

A hard dependency declared without the global keyword will always start the fetcher on the server where the database runs. This is not required, but it might be nice to have the fetcher on the same node. Again, a consideration that we will discuss further.
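For reference, such a dependency would be declared with attributes along these lines (resource names are examples; add the global: modifier if co-location is not desired):

    crsctl modify resource dbvrep.FROM_A_TO_B.fetcher -attr \
      "START_DEPENDENCIES='hard(ora.mydb.db) pullup:always(ora.mydb.db)',STOP_DEPENDENCIES='hard(ora.mydb.db)'"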

HTH

Ludovico

Another problem with “KSV master wait” and “ASM file metadata operation”

My customer today tried to do a duplicate on a cluster. When preparing the auxiliary instance, she noticed that the startup nomount was hanging forever: nothing in the alert log, nothing in the trace files.

Because the database and the spfile were stored inside ASM, I’ve been quite suspicious…

The ASM trace files had the following entries:

The ASM instance had the following sessions waiting:
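A query along these lines lists them (this is a generic query, not the original output; the PROGRAM column is what pointed to the OMS session mentioned below):

    sqlplus -s / as sysasm <<'EOF'
    select inst_id, sid, serial#, program, event, seconds_in_wait
    from   gv$session
    where  event in ('ASM file metadata operation', 'KSV master wait');
    EOF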

OMS?

Around 12:38:56, another colleague in the office added a disk to one of the disk groups, through Enterprise Manager 12c!

But there were no rebalance operations:
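A rebalance in progress would show up in v$asm_operation, which was empty (again, a generic check rather than the original output):

    sqlplus -s / as sysasm <<'EOF'
    select group_number, operation, state, power, est_minutes
    from   v$asm_operation;
    EOF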

It’s not the first time that I have hit this type of problem. Sadly, it sometimes requires a full restart of the cluster or of ASM (because of different bugs).

This time, however, I have tried to kill only the foreground sessions waiting on “ASM file metadata operation”, starting with the one coming from the OMS.
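Killing the session on the ASM instance is a plain kill session (the sid and serial# below are placeholders, to be taken from the session query above):

    sqlplus -s / as sysasm <<'EOF'
    alter system kill session '123,45678' immediate;
    EOF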

Surprisingly, after killing that session, everything was fine again:

I never add disks via OMS (I’m a sqlplus guy ;-)), so I wonder what went wrong with it 🙂

Ludovico

DBMS_QOPATCH, datapatch, rollback, apply force

I am working for a customer on quite a big implementation of Cold Failover Cluster with Oracle Grid Infrastructure on Linux. I hope to have some material to publish about it soon! In this post, however, I will be talking about patching the database in a cold-failover environment.

DISCLAIMER: I make massive use of the scripts provided in this great blog post by Simon Pane:

https://www.pythian.com/blog/oracle-database-12c-patching-dbms_qopatch-opatch_xml_inv-and-datapatch/

Thank you Simon for sharing this 🙂

Intro

We are not yet in the process of doing out-of-place patching; at the moment the customer prefers to do in-place patching:

  • evacuate a node by relocating all the databases to other nodes
  • patch the binaries on that node
  • move the databases back and patch them with datapatch
  • do the same for the remaining nodes

I beg to disagree with this method, being a fan of having many patched golden copies distributed on all servers and patching the databases by just changing the ORACLE_HOME and running datapatch (like Rapid Home Provisioning does). But, this is the situation today, and we have to live with it.

Initial situation

  • Servers 1, 2 and 3: one-off patch 20139391 applied
  • New database created

When the DBCA creates a new database in 12.1.0.2, it does not run datapatch by default; thus, the database does not have any patches installed.

However, this specific one-off patch does not modify anything in the database (sql_patch=false).
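This kind of information can be checked from inside the database with DBMS_QOPATCH; a minimal query (not Simon Pane’s full script) looks like this:

    sqlplus -s / as sysdba <<'EOF'
    set long 2000000 pagesize 0
    -- the full OPatch inventory as seen from the database, rendered as text
    select xmltransform(dbms_qopatch.get_opatch_lsinventory, dbms_qopatch.get_opatch_xslt)
    from   dual;
    EOF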

Datapatch therefore runs without touching the DB:
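The datapatch run itself is the standard one:

    cd $ORACLE_HOME/OPatch
    ./datapatch -verbose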

Next step: I evacuate server 2 and patch it, then I relocate my database onto it.


Now the database is not at the same level as the binaries and needs to be patched:

The column CONSTITUENT is important here because it tells us what the parent patch_id is. This is the column that we have to check when we want to know whether the patch has been applied to the database.

Now the patch is visible inside the dba_registry_sqlpatch:
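For example, with a query like this (not the original output):

    sqlplus -s / as sysdba <<'EOF'
    select patch_id, patch_uid, action, status, action_time, description
    from   dba_registry_sqlpatch
    order by action_time;
    EOF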

Notice that the child patches are not listed in this view.

Rolling back

Now, one node is patched, but the others are not. What happens if I relocate the patched database to a non-patched node?


The patch is applied inside the database but not in the binaries!

If I run datapatch again, the patch is rolled back:

The patch has been rolled back according to the datapatch, and the action is shown in the dba_registry_sqlpatch:

But if I look at the logfile, the patch had some errors:

Indeed, the patch appears to still be there:

If I try to run it again, it either does nothing or fails, saying the patch is not there:

What does it say on the patched node?

Whaaat? datapatch there says that the patch IS in the registry and there’s nothing to do. Let’s try to force the apply again:
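The forced apply passes the patch id explicitly (patch id from this post; -apply and -force are documented datapatch switches):

    cd $ORACLE_HOME/OPatch
    ./datapatch -apply 20139391 -force -verbose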

Conclusion

I’m not sure whether it is safe to run the patched database in a non-patched Oracle Home. I guess it is time for a new SR 🙂

Meanwhile, we will try hard not to relocate the databases once they have been patched.

Cheers

Ludo

Recording of “Rapid Home Provisioning” webinar for the RAC SIG

Yesterday I presented the Oracle Rapid Home Provisioning technology for the RAC SIG; you can find the recording on YouTube:

Cheers

Ludo

Rapid Home Provisioning

In a few days I will give a presentation at UKOUG Tech15 about Rapid Home Provisioning; it will be the first time that I present this session in public.

I usually like to give the link to the material to my audience, so here we go:

Slides:

Demo:

Enjoy

Ludovico

Migrating Oracle RAC from SuSE to OEL (or RHEL) live

I have a customer that needs to migrate its Oracle RAC cluster from SuSE to OEL.

I know, I know, there is a paper from Dell and Oracle named:

How Dell Migrated from SUSE Linux to Oracle Linux

That explains how Dell migrated its many RAC clusters from SuSE to OEL. The problem is that they used a different strategy:

– back up the configuration of the nodes
– then, for each node, one at a time:
– stop the node
– reinstall the OS
– restore the configuration and the Oracle binaries
– relink
– restart

What I want to achieve instead is:
– add one OEL node to the SuSE cluster as a new node
– remove one SuSE node from the now-mixed cluster
– install/restore/relink the RDBMS software (RAC) on the new node
– move the RAC instances to the new node (taking care to NOT run more than the number of licensed nodes/CPUs at any time)
– repeat (for the remaining nodes)

because the customer will also migrate to new hardware.

In order to test this migration path, I’ve set up a SINGLE NODE cluster (if it works for one node, it will work for two or more).

I have to set up the new node addition carefully, mostly as I would do for a traditional node addition:

  • Add new ip addresses (public, private, vip) to the DNS/hosts
  • Install the new OEL server
  • Keep the same user and groups (uid, gid, etc)
  • Verify the network connectivity and setup SSH equivalence
  • Check that the multicast connection is ok
  • Add the storage, configure persistent naming (udev) and verify that the disks (major, minor, names) are the very same
  • The network cards also must be the very same

Once the new host is ready, cluvfy stage -pre nodeadd will likely fail due to:

  • Kernel release mismatch
  • Package mismatch
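The check itself is the standard pre-nodeadd stage (the node name is an example):

    cluvfy stage -pre nodeadd -n oel6node01 -verbose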

Here’s an example of output:

So the problem is not whether the check succeeds (it will not), but what fails.

Solving all the problems not related to the SuSE-OEL difference is crucial, because addNode.sh will fail with the same errors. I need to run it using the -ignorePrereqs and -ignoreSysPrereqs switches. Let’s see how it works:
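A sketch of the invocation (node names are examples; the two switches are the ones mentioned above; addNode.sh lives under the Grid Infrastructure home of an existing node, in oui/bin or addnode depending on the version):

    ./addNode.sh -silent -ignorePrereqs -ignoreSysPrereqs \
      "CLUSTER_NEW_NODES={oel6node01}" \
      "CLUSTER_NEW_VIRTUAL_HOSTNAMES={oel6node01-vip}"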

Then, as instructed by addNode.sh, I run root.sh and expect it to work:

Bingo! Let’s check if everything is up and running:
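The usual checks do the job, e.g.:

    crsctl stat res -t
    olsnodes -n -s -t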

So yes, it works, but remember that it’s not a supported long-term configuration.

In my case I expect to migrate the whole cluster from SLES to OEL in one day.

NOTE: using OEL6 as the new target is easy because the interface names do not change. OEL7 uses a new interface naming scheme; if you need to migrate without cluster downtime, you need to set up the new OEL7 nodes following this post: http://ask.xmodulo.com/change-network-interface-name-centos7.html

Otherwise, you need to configure a new interface name for the cluster with oifcfg.
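For reference, a sketch of the oifcfg change (interface name and subnet are examples):

    oifcfg getif
    oifcfg setif -global ens192/192.168.100.0:cluster_interconnect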

HTH

Ludovico

Grid Infrastructure 12c: Recovering the GRID Disk Group and recreating the GIMR

Losing the Disk Group that contains OCR and voting files has always been a challenge. It requires you to take regular backups of OCR, spfile and diskgroup metadata.

Since Oracle 12cR1, there are a few additional components you must take care of:

– The ASM password file (if you have Flex ASM it can be quite critical)

– The Grid Infrastructure Management Repository

Why is the ASM password file important? Well, you can read this good blog post from my colleague Robert Bialek: http://blog.trivadis.com/b/robertbialek/archive/2014/10/26/are-you-using-oracle-12c-flex-asm-if-yes-do-you-have-asm-password-file-backup.aspx

So the problem here is not whether you should back them up or not, but how you can restore them quickly.

Assumption: you regularly back up the following:

ASM parameter file:

Oracle Cluster Registry:

ASM Diskgroup Metadata:

ASM password file:
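A minimal sketch of how these four backups might be taken (paths, file names, the cluster name and the disk group are examples):

    # ASM parameter file ('asmcmd spget' shows its current location)
    asmcmd spbackup +GRID/cluster01/ASMPARAMETERFILE/registry.253.123456789 /backup/asmspfile.bak

    # Oracle Cluster Registry (as root)
    ocrconfig -manualbackup
    ocrconfig -export /backup/ocr.exp

    # ASM disk group metadata
    asmcmd md_backup /backup/grid_dg_metadata.bkp -G GRID

    # ASM password file
    asmcmd pwcopy +GRID/orapwASM /backup/orapwASM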

What about the GIMR?

According to the MOS Note: FAQ: 12c Grid Infrastructure Management Repository (GIMR) (Doc ID 1568402.1), there is no such need for the moment.

Weird, huh? The -MGMTDB itself contains, for the moment, just the Cluster Health Monitor repository, but expect to see its importance increase with the next versions of Oracle Grid Infrastructure.

If you REALLY want to back it up (even if not fundamental, it is not a bad idea, after all), you can do it.

The -MGMTDB is in noarchivelog by default. You need to either put it in archivelog mode (and set a recovery area, etc etc) or back it up while it is mounted.

Because the Cluster Health Monitor (ora.crf) depends on it, you have to stop it beforehand:
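ora.crf is one of the ohasd -init resources:

    # as root, on each node
    crsctl stop res ora.crf -init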

Then you can operate with -MGMTDB:
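A sketch of a backup of -MGMTDB taken while it is mounted (note the quoting of the instance name, which starts with a dash; the environment is assumed to be set to the Grid Infrastructure home, and the backup path is an example):

    srvctl stop mgmtdb
    export ORACLE_SID='-MGMTDB'
    rman target / <<'EOF'
    startup mount;
    backup database format '/backup/mgmtdb_%U';
    shutdown immediate;
    EOF
    srvctl start mgmtdb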

Now, imagine that you lose the GRID diskgroup (nowadays, with the ASM Filter Driver, it is harder to corrupt a device by mistake, but let’s assume that you do):

The cluster will not start anymore; you need to disable CRS, reboot, and start it in exclusive mode:
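The sequence is roughly:

    # as root
    crsctl disable crs
    reboot
    # after the reboot
    crsctl start crs -excl -nocrs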


Then you can recreate the GRID disk group and restore everything inside it:
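A condensed sketch of the restore sequence, reusing the backups taken above (all paths and names are examples; adapt the OCR backup location to what 'ocrconfig -showbackup' reports):

    # recreate the GRID disk group and its metadata from the md_backup file
    asmcmd md_restore /backup/grid_dg_metadata.bkp --full -G GRID

    # restore the OCR (as root) and the voting files
    ocrconfig -restore /u01/app/12.1.0/grid/cdata/cluster01/backup00.ocr
    crsctl replace votedisk +GRID

    # restore the ASM spfile and password file
    asmcmd spcopy /backup/asmspfile.bak +GRID/cluster01/spfileASM.ora
    asmcmd spset +GRID/cluster01/spfileASM.ora
    asmcmd pwcopy --asm /backup/orapwASM +GRID/orapwASM

    # then restart the stack normally
    crsctl stop crs -f
    crsctl enable crs
    crsctl start crs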

Finally, the last missing component: the GIMR.

You can recreate it or restore it (if you backed it up at some point in time).

Let’s see how to recreate it:

Conclusion

Recovering from a lost Disk Group / Cluster is not rocket science. Just practice it every now and then. If you do not have a test RAC, you can build your lab on your laptop using the RAC Attack instructions. If you want to test all the scenarios, the RAC SIG webcast: Oracle 11g Clusterware failure scenarios with practical demonstrations by Kamran Agayev is the best starting point, IMHO. Just keep in mind that Flex ASM and the GIMR add more complexity.

HTH

Ludovico

Another successful RAC Attack in Geneva!

Last week I hosted the second Swiss RAC Attack workshop at the Trivadis offices in Geneva. It was a great success, with 21 total participants: 5 ninjas, 4 alumni and 14 people actively installing or playing with RAC 12c on their laptops.

Last year I was surprised by a participant coming from Nanterre. This year two people came directly from Moscow, just for the workshop!

We had good pizza and special beers: Chimay, Vedett, Duvel, Andechs…

Last but not least, our friend Marc Fielding was visiting Switzerland last week, so he took the opportunity to join us and make the workshop even more interesting! 😀

Looking forward to organizing it again in a year! Thank you guys 🙂

Ludovico

RAC Attack! 12c is back to Geneva!

French version here.

After a great success in 2014, RAC Attack! comes back to Geneva!
Set up an Oracle Real Application Clusters 12c environment on your laptop, try advanced configurations, or simply take the opportunity to discuss Oracle technology with the best experts in Suisse Romande!
Experienced volunteers (ninjas) will help you address any related issues and guide you through the setup process.

Where? Trivadis office, Chemin Château-Bloch 11, CH1219 Geneva

When? Thursday September 17th, 2015, from 17h00 onwards

Cost? It is a FREE event! It is a community based, informal and enjoyable workshop. You just need to bring your own laptop and your desire to have fun!

Confirmed Ninjas:
Ludovico Caldara – Oracle ACE, RAC SIG Chair & RAC Attack co-author
Eric Grancher – OAK Table member & Senior DBA
Jacques Kostic – OCM 11g & Senior Consultant at Trivadis

Limited places! Reserve your seat and T-shirt now!

Agenda:
17.00 – Welcome and T-shirt distribution
17.30 – RAC Attack 12c part I
19.30 – Pizza and Beers! (sponsored by Trivadis)
20.00 – RAC Attack 12c part II
22.00 – Group photo and wrap-up!!

Still undecided? Look at what we did last year!

This event is sold out. No more seats available, sorry! Would you be interested in joining the event next year? Drop me an email!