{"id":1880,"date":"2019-07-09T15:13:37","date_gmt":"2019-07-09T13:13:37","guid":{"rendered":"http:\/\/www.ludovicocaldara.net\/dba\/?p=1880"},"modified":"2021-05-04T10:54:55","modified_gmt":"2021-05-04T08:54:55","slug":"remove-add-node-rhp","status":"publish","type":"post","link":"https:\/\/www.ludovicocaldara.net\/dba\/remove-add-node-rhp\/","title":{"rendered":"FPP local-mode: Steps to remove\/add node from a cluster if RHP fails to move gihome"},"content":{"rendered":"<p>I am getting more and more experience with patching clusters with the local-mode automaton. The whole process would be very complex, but the local-mode automaton makes it really easy.<\/p>\n<p>I have had nevertheless<strong> a couple of clusters where the process did not work:<\/strong><\/p>\n<p><strong>#1: The very first cluster that I installed in 18c<\/strong><\/p>\n<p>This cluster has &#8220;kind of failed&#8221; patching the first node. Actually, the rhpctl command exited with an error:<\/p>\n<pre class=\"lang:plsql highlight:0 decode:true\">$ rhpctl move gihome -sourcehome \/u01\/crs\/crs1830 -desthome \/u01\/crs\/crs1860 -node server1\r\nserver1.cern.ch: Audit ID: 2\r\nserver1.cern.ch: verifying versions of Oracle homes ...\r\nserver1.cern.ch: verifying owners of Oracle homes ...\r\nserver1.cern.ch: verifying groups of Oracle homes ...\r\nserver1.cern.ch: starting to move the Oracle Grid Infrastructure home from \"\/u01\/crs\/crs1830\" to \"\/u01\/crs\/crs1860\" on server cluster \"AISTEST-RAC16\"\r\n[...]\r\n2019\/07\/08 09:45:06 CLSRSC-329: Replacing Clusterware entries in file 'oracle-ohasd.service'\r\nPRCG-1239 : failed to close a proxy connection\r\nConnection refused to host: server1.cern.ch; nested exception is:\r\n        java.net.ConnectException: Connection refused (Connection refused)\r\nPRCG-1079 : Internal error: ClientFactoryImpl-submitAction-error1\r\nPROC-32: Cluster Ready Services on the local node is not running Messaging error [gipcretConnectionRefused] 
[29]<\/pre>\n<p>But actually, the helper kept running and configured everything properly:<\/p>\n<pre class=\"lang:plsql highlight:0 decode:true\">$ tail -f \/ORA\/dbs01\/oracle\/crsdata\/server1\/crsconfig\/crs_postpatch_server1_2019-07-08_09-41-36AM.log\r\n2019-07-08 09:55:25:\r\n2019-07-08 09:55:25: Succeeded in writing the checkpoint:'ROOTCRS_POSTPATCH' with status:SUCCESS\r\n2019-07-08 09:55:25: Executing cmd: \/u01\/crs\/crs1860\/bin\/clsecho -p has -f clsrsc -m 672\r\n2019-07-08 09:55:25: Executing cmd: \/u01\/crs\/crs1860\/bin\/clsecho -p has -f clsrsc -m 672\r\n2019-07-08 09:55:25: Command output:\r\n&gt;  CLSRSC-672: Post-patch steps for patching GI home successfully completed.\r\n&gt;End Command output\r\n2019-07-08 09:55:25: CLSRSC-672: Post-patch steps for patching GI home successfully completed.\r\n<\/pre>\n<p>The cluster was OK on the first node, with the correct patch level. The second node, however, was failing with:<\/p>\n<pre class=\"lang:plsql highlight:0 decode:true \">$  rhpctl move gihome -sourcehome \/u01\/crs\/crs1830 -desthome \/u01\/crs\/crs1860 -node server2\r\nserver1.cern.ch: retrieving status of databases ...\r\nserver1.cern.ch: retrieving status of services of databases ...\r\nPRCT-1011 : Failed to run \"rhphelper\". Detailed error: &lt;HLP_EMSG&gt;,RHPHELP_procCmdLine-05,&lt;\/HLP_EMSG&gt;,&lt;HLP_VRES&gt;3&lt;\/HLP_VRES&gt;,&lt;HLP_IEEMSG&gt;,PRCG-1079 : Internal error: RHPHELP122_main-01,&lt;\/HLP_IEEMSG&gt;,&lt;HLP_ERES&gt;1&lt;\/HLP_ERES&gt;<\/pre>\n<p>I am not sure about the cause, but let&#8217;s assume it is irrelevant for the moment.<\/p>\n<p><strong>#2: A cluster with new GI home not properly linked with RAC<\/strong><\/p>\n<p>This was another funny case, where the first node patched successfully, but the second one failed in the middle of the upgrade process with a Java NullPointerException.
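<\/p>\n<p>In cases like these, before deciding what to do, I find it useful to check what the stack itself reports about the patch level and the upgrade state. One possible check (the exact output depends on the release and on the applied patches) is:<\/p>\n<pre class=\"lang:plsql highlight:0 decode:true\">(root)# crsctl query crs activeversion -f<\/pre>\n<p>The -f flag reports, along with the active version, the cluster upgrade state (e.g. NORMAL or ROLLING PATCH).<\/p>\n<p>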
We made a few unsuccessful attempts with prePatch and postPatch to fix it, but after that the second node of the cluster was in an inconsistent state: in ROLLING_UPGRADE mode and impossible to patch anymore.<\/p>\n<p><strong>Common solution: removing the node from the cluster and adding it back<\/strong><\/p>\n<p>In both cases we were in the following situation:<\/p>\n<ul>\n<li>one node was successfully patched to 18.6<\/li>\n<li>one node was not patched and could not be patched anymore (at least without heavy interventions)<\/li>\n<\/ul>\n<p>So, for me, the easiest solution was to remove the failing node and add it back with the new patched version.<\/p>\n<p><strong>Steps to remove the node<\/strong><\/p>\n<p>Although the steps are described here: <a href=\"https:\/\/docs.oracle.com\/en\/database\/oracle\/oracle-database\/18\/cwadd\/adding-and-deleting-cluster-nodes.html#GUID-8ADA9667-EC27-4EF9-9F34-C8F65A757F2A\">https:\/\/docs.oracle.com\/en\/database\/oracle\/oracle-database\/18\/cwadd\/adding-and-deleting-cluster-nodes.html#GUID-8ADA9667-EC27-4EF9-9F34-C8F65A757F2A<\/a>, there are a few differences that I will highlight.<\/p>\n<p>Stopping the cluster:<\/p>\n<pre class=\"lang:plsql highlight:0 decode:true\">(root)# crsctl stop crs<\/pre>\n<p>The documented procedure to remove a node asks you to deconfigure the databases and managed homes from the active cluster version.
But as we manage our homes with golden images, we do not need this; rather, we want to keep all the entries in the OCR so that when we add the node back, everything is in place.<\/p>\n<p>Once CRS was stopped, we deinstalled the CRS home on the failing node:<\/p>\n<pre class=\"lang:plsql highlight:0 decode:true\">(oracle)$ $OH\/deinstall\/deinstall -local<\/pre>\n<p>This complained that CRS was down, but it continued and asked for this script to be executed:<\/p>\n<pre class=\"lang:plsql highlight:0 decode:true\">\/u01\/crs\/crs1830\/crs\/install\/rootcrs.sh -force  -deconfig -paramfile \"\/tmp\/deinstall2019-07-08_11-37-20AM\/response\/deinstall_1830.rsp\"<\/pre>\n<p>We got errors from this script as well, but the removal process was fine after all.<\/p>\n<p>Then, from the surviving node:<\/p>\n<pre class=\"lang:plsql highlight:0 decode:true\">root # crsctl delete node -n server2\r\noracle $ srvctl stop vip -vip server2\r\nroot $ srvctl remove vip -vip server2<\/pre>\n<p><strong>Adding the node back<\/strong><\/p>\n<p>From the surviving node, we <strong>ran gridSetup.sh<\/strong> and followed the steps to add the node.<\/p>\n<p><strong>Wait before running root.sh<\/strong>.<\/p>\n<p>In our case, we had originally installed the cluster starting from a SW_ONLY install. This type of installation leaves some leftovers in the configuration files that prevent root.sh from configuring the cluster, so we had to modify rootconfig.sh:<\/p>\n<pre class=\"lang:plsql highlight:0 decode:true\">check\/modify \/u01\/crs\/crs1860\/crs\/config\/rootconfig.sh and change this:\r\n# before:\r\n# SW_ONLY=true\r\n# after:\r\nSW_ONLY=false<\/pre>\n<p>Then, after running root.sh and the configuration tools, everything was back as it was before removing the node from the cluster.<\/p>\n<p>For one of the clusters, both nodes were at the same patch level, but the cluster was still in ROLLING_PATCH mode.
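<\/p>\n<p>That can be verified by querying the patch level reported by each node, for example (node names as in the example above; the output format depends on the release):<\/p>\n<pre class=\"lang:plsql highlight:0 decode:true\">(root)# crsctl query crs softwarepatch server1\r\n(root)# crsctl query crs softwarepatch server2<\/pre>\n<p>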
So we have had to do a<\/p>\n<pre class=\"lang:plsql highlight:0 decode:true\">(root) # crsctl stop rollingpatch<\/pre>\n<p>&#8212;<\/p>\n<p>Ludo<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I am getting more and more experience with patching clusters with the local-mode automaton. The whole process would be very complex, but the local-mode automaton makes it really easy. I have had nevertheless a couple of clusters where the process &hellip; <a href=\"https:\/\/www.ludovicocaldara.net\/dba\/remove-add-node-rhp\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[321,333,327,326,3,308,330,149],"tags":[],"class_list":["post-1880","post","type-post","status-publish","format-standard","hentry","category-aced","category-oracle-fpp","category-oracle-maa","category-oracle","category-oracledb","category-oracle-database-18c","category-oracle-inst-upg","category-oracle-rac"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.ludovicocaldara.net\/dba\/wp-json\/wp\/v2\/posts\/1880","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ludovicocaldara.net\/dba\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ludovicocaldara.net\/dba\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ludovicocaldara.net\/dba\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ludovicocaldara.net\/dba\/wp-json\/wp\/v2\/comments?post=1880"}],"version-history":[{"count":4,"href":"https:\/\/www.ludovicocaldara.net\/dba\/wp-json\/wp\/v2\/posts\/1880\/revisions"}],"predecessor-version":[{"id":2048,"href":"https:\/\/www.ludovicocaldara.net\/dba\/wp-json\/wp\/v2\/posts\/1880\/revisions\/2048"}],"wp:attachment":[{"href":"https:\/\/www.ludovicocaldara.net\/dba\/wp-json\/wp\/v2\/media?parent=1880"}],"w
p:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ludovicocaldara.net\/dba\/wp-json\/wp\/v2\/categories?post=1880"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ludovicocaldara.net\/dba\/wp-json\/wp\/v2\/tags?post=1880"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}