July, 2021 - DBA survival BLOG

I have a customer trying to add a new node to a cluster using Fleet Patching and Provisioning.

The error in the command output is not very friendly:

[grid@fpps ~]$ rhpctl addnode gihome -workingcopy WC_gi19110_FPPC3 \
  -newnodes fppc3:fppc3-vip  -cred fppc-cred
fpps: Audit ID: 269
PRCT-1003 : failed to run "rhphelper" on node "fppc2"
PRCT-1014 : Internal error: RHPHELP_preNodeAddVal-05null

The “RHPHELP_preNodeAddVal” might already give an idea of the cause: something related to the “cluvfy stage -pre nodeadd” evaluation that we normally do when adding a node by hand. FPP does not really run cluvfy, but it calls the same primitives cluvfy is based on.

In FPP, when the error does not give any useful information, this is the flow to follow:

Use “rhpctl query audit” to get the date and time of the failing operation.
Open “rhpserver.log.0” and look for the operation log in that time frame.
Get the UID of the operation; in the following line it is “-1556344143” (note the leading minus sign):

[UID:-1556344143] [RMI TCP Connection(153)-192.168.1.151] [ 2021-07-27 00:25:20.741 KST ]
  [ServerCommon.processParameters:485]  before parsing: params = 
  {-methodName=addnodesWorkingCopy, -userName=grid, -version=19.0.0.0.0, -auditId=-1556344143,
  -auditCli=rhpctl addnode gihome -workingcopy WC_gi19110_FPPC3 -newnodes fppc3:fppc3-vip -cred cred_fppc,
  -plsnrPort=31605, -noun=gihome, -isSingleNodeProv=FALSE, -nls_lang=AMERICAN_AMERICA.AL32UTF8,
  -clusterName=fpps-cluster, -plsnrHost=fpps, -SA11204ClusterName=null,
  -lang=en_US, -clientNode=fpps, -verb=addnode, -ghopuid=-1556344143}

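Isolate the log for the operation, for example: grep -F "[UID:-1556344143]" rhpserver.log.0 > op_1556344143.log (do not store the value in a shell variable named UID: in bash, $UID is read-only and holds your numeric user ID).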
Locate the trace file of the rhphelper remote execution:

[UID:-1556344143] [RMI TCP Connection(153)-192.168.1.151] [ 2021-07-27 00:26:07.031 KST ] [RHPHELPERUtil.getTraceEnvs:4386] 
  TraceFileLocEnv is :RHPHELPER_TRACEFILE=/u01/app/grid/crsdata/fppc2/rhp/rhphelp_20210727002603.trc

Find the root cause in the rhphelper trace:

[main] [ 2021-07-27 00:27:02.600 KST ] [reflect.GeneratedMethodAccessor1.invoke:-1]  PRVG-11406 : API with node roles argument must be called for Flex Cluster
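The log-digging steps above can be sketched as a few small bash helpers. This is only a sketch: the function names are hypothetical (they are not part of FPP), and it assumes that each log entry sits on a single physical line in rhpserver.log.0, as in the excerpts shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the troubleshooting flow; not part of FPP.

# List the UIDs of operations whose recorded command line starts with $1.
# -e keeps grep from reading the leading "-" of the pattern as an option.
fpp_find_uids() {
  local cli=$1 logfile=$2
  grep -F -e "-auditCli=${cli}" "$logfile" |
    sed -n 's/^\[UID:\(-\{0,1\}[0-9][0-9]*\)\].*/\1/p' | sort -u
}

# Print every log line that belongs to one operation UID.
# The variable is called op_uid on purpose: $UID is reserved by bash.
fpp_op_log() {
  local op_uid=$1 logfile=$2
  grep -F "[UID:${op_uid}]" "$logfile"
}

# Read an operation log on stdin and print the rhphelper trace file paths.
fpp_trace_files() {
  sed -n 's/.*RHPHELPER_TRACEFILE=\([^ ]*\).*/\1/p' | sort -u
}

# Read a trace file on stdin and print the Oracle-style error messages
# (PRVG-nnnn, PRCT-nnnn, ...) found in it.
fpp_errors() {
  grep -oE 'PR[A-Z]{2}-[0-9]+ : .*'
}
```

For example, `fpp_op_log -1556344143 rhpserver.log.0 | fpp_trace_files` prints the trace file to open on the remote node, and `fpp_errors < rhphelp_20210727002603.trc` lists the messages it contains.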

In this case, the target cluster is a Flex Cluster, so the command must be run specifying the node_role.

The documentation is not clear (we will fix it soon):

rhpctl addnode gihome {-workingcopy workingcopy_name | -client cluster_name}
  -newnodes node_name:node_vip[:node_role][,node_name:node_vip[:node_role]...]

node_role must be specified for Flex Clusters, and it must be either HUB or LEAF.

With the corrected command line, the operation succeeded:

rhpctl addnode gihome -workingcopy WC_gi19110_FPPC3 \
 -newnodes fppc3:fppc3-vip:HUB  -cred fppc-cred
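To catch the missing role before FPP does, the -newnodes value can be checked up front. The following is a minimal bash sketch, not something rhpctl provides: the function name is hypothetical, and it assumes the role must be spelled exactly HUB or LEAF in upper case.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check for a Flex Cluster: every entry of the -newnodes
# value must look like node_name:node_vip:node_role with role HUB or LEAF.
check_newnodes_flex() {
  local spec=$1 entry name vip role rc=0
  local -a entries
  [ -n "$spec" ] || { echo "empty -newnodes value" >&2; return 1; }
  IFS=',' read -r -a entries <<< "$spec"
  for entry in "${entries[@]}"; do
    # The third field takes everything after the second colon, so any
    # extra field makes the role check fail.
    IFS=':' read -r name vip role <<< "$entry"
    if [ -z "$name" ] || [ -z "$vip" ]; then
      echo "entry '${entry}': expected node_name:node_vip:node_role" >&2
      rc=1
      continue
    fi
    case $role in
      HUB|LEAF) ;;
      *) echo "node '${name}': node_role must be HUB or LEAF (got '${role:-none}')" >&2
         rc=1 ;;
    esac
  done
  return "$rc"
}
```

With this helper, `check_newnodes_flex "fppc3:fppc3-vip"` returns a non-zero status and names the offending node, while `check_newnodes_flex "fppc3:fppc3-vip:HUB"` succeeds.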

HTH

—

Ludovico

When using FPP, you might experience the following error (PRVF-7546):

$ rhpctl add workingcopy -workingcopy WC_db_19_11_FPPC -image db_19_11 -path /u01/app/oracle/product/WC_db_19_11_FPPC -client fppc -oraclebase /u01/app/oracle
fpps01: Audit ID: 121
PRGO-1260 : Cluster Verification checks for database home provisioning  failed for the specified working copy WC_db_19_11_FPPC.
PRCR-1178 : Execution of command failed on one or more nodes
 
PRVF-7546 : The work directory "/tmp/CVU_19.0.0.0.0_oracle/" cannot be used on node "fppc02"

This often happens because the /tmp filesystem is mounted with the “noexec” option:

$ mount | grep /tmp
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noexec)
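The same check can be scripted, for instance to run it on every node of the target cluster. This is a small sketch with a hypothetical function name; it assumes the Linux `mount` output format shown above ("device on mountpoint type fstype (options)").

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `mount` output on stdin and succeed only if the
# given mount point is listed with the noexec option.
has_noexec() {
  local mountpoint=$1
  awk -v mp="$mountpoint" '
    $2 == "on" && $3 == mp {
      opts = $NF                 # last field, e.g. (rw,nosuid,nodev,noexec)
      gsub(/[()]/, "", opts)     # drop the surrounding parentheses
      n = split(opts, o, ",")
      for (i = 1; i <= n; i++)
        if (o[i] == "noexec") found = 1
    }
    END { exit (found ? 0 : 1) }'
}
```

Usage: `mount | has_noexec /tmp && echo "/tmp is mounted noexec"`.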

Although it is tempting to just remount the filesystem with “exec”, you might be in this situation because your systems are configured to adhere to the STIG recommendations:

The noexec option must be added to the /tmp partition (https://www.stigviewer.com/stig/red_hat_enterprise_linux_6/2016-12-16/finding/V-57569)

FPP 19.9 contains fix 30885598 that allows specifying the temporary location for FPP operations:

$ srvctl modify rhpserver  -tmploc <new_tmp>
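Before settling on a value for <new_tmp>, it may be worth confirming that the candidate directory does not have the same problem as /tmp. The sketch below is an assumption-laden helper, not an FPP tool: the function name is hypothetical, and it assumes that what matters is being able to write a file into the directory and execute it as the current user.

```shell
#!/usr/bin/env bash
# Hypothetical probe: succeed only if a script written to the directory can
# be executed from there (which fails on a noexec mount).
dir_allows_exec() {
  local dir=$1 probe rc
  if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
    return 1
  fi
  probe=$(mktemp "${dir%/}/exec_probe.XXXXXX") || return 1
  printf '#!/bin/sh\nexit 0\n' > "$probe"
  chmod u+x "$probe"
  "$probe" 2>/dev/null     # "Permission denied" here means exec is not allowed
  rc=$?
  rm -f "$probe"
  return "$rc"
}
```

Run it as the user that owns the FPP operations, on each node involved, for example `dir_allows_exec /u01/app/grid/tmp && echo usable` (the path is only an illustration), and then pass the chosen directory to the srvctl command above.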

After that, the operation should run smoothly:

fppc02: Successfully executed clone operation.
fppc02: Executing root script on nodes fppc01,fppc02.
fppc02: Successfully executed root script on nodes fppc01,fppc02.
fppc02: Working copy creation completed.
fppc02: Oracle home provisioned.
fpps01: Client-side action completed.

HTH

—

Ludo

rhpctl addnode gihome: specify HUB or LEAF when adding new nodes to a Flex Cluster

Changing FPP temporary directory (/tmp in noexec and other issues)