Checking usage of HugePages by Oracle databases in Linux environments

Yesterday several databases on one server started logging errors in the alert log:

That means not enough contiguous free memory in the OS. The first thing that I have checked has been of course the memory, and the used huge pages:

The memory available (last column in the free command) was indeed quite low, but still plenty of space in the huge pages (86k pages free out of 180k).

The usage by Oracle instances:

You can get the code of mem.sh in this post.

Regarding pure shared memory usage, the situation was what I was expecting:

360G of shared memory usage, much more than what was allocated in the huge pages.

I have compared the situation with the other node in the cluster: it had more memory allocated by the databases (because of more load on it), more huge page usage and less 4k pages consumption overall.

So I was wondering if all the DBs were property allocating the SGA in huge pages or not.

This redhat page has been quite useful to create a quick snippet to check the huge page memory allocation per process:

It has been easy to spot the databases not using huge pages at all:

Indeed, after stopping them, the huge page usage has not changed:

But after starting them back I could see the new huge pages reserved/allocated:

The reason was that the server has been started without huge pages first, and after a few instances started, the huge pages has been set.

HTH

Ludovico

 

Parameter REMOTE_LISTENER pointing to a TNS alias? Beware of how it registers.

On an Oracle Database instance, if I set:

The instance tries to resolve the cluster-scan name to detect if it is a SCAN address.
So, after it solves, it stores all the addresses it gets and registers to them.
I can check which addresses there are with this query:

In this case, the instance registers to the three addresses discovered, which is OK: all three SCAN listeners will get service updates from the instance.

But if I have this TNS alias:

and I set:

I get:

the result is that the instance registers only at the first IP fot from the DNS, leaving the other SCANs without the service registration and thus random

This is in my opinion quite annoying, as my goal here was to have all the DBs set with:

in order to facilitate changes of ports, database migrations from different clusters, clones, etc.

So the solution is either to revert to the syntax “cluster-scan:port”, or specifying explicitly all the endpoints in the address list:

I am sure it is “working as designed”, but I wonder if it could be an enhancement to have the address expended fully also in case of TNS alias….
Or… do you know any way to do it from a TNS alias without having the full IP list?

Cheers

Ludo

Install and configure CMAN 19c in the Oracle Cloud, step by step

Installing and configuring CMAN is a trivial activity, but having the steps in one place is better than reinventing the wheel.

Prepare for the install

Download the Oracle Client 19.3.0.0 in the Oracle Database 19c download page.

Choose this one: LINUX.X64_193000_client.zip (64-bit) (1,134,912,540 bytes) , not the one named “LINUX.X64_193000_client_home.zip” because it is a preinstalled home that does not contain the CMAN tools.

Access the OCI Console and create a new Compute instance. The default  configuration is OK, just make sure that it is Oracle Linux 7 🙂

Do not forget to add your SSH Public Key to access the VM via SSH!

Access the VM using

Copy the Oracle Client zip in /tmp using your favorite scp program.

Install CMAN

Follow these steps to install CMAN:

 

Basic configuration

This will create a CMAN configuration named cman-test. Beware that it is very basic and insecure. Please read the CMAN documentation if you want something more secure or sophisticated.

The advantage of having the TNS_ADMIN outside the Oracle Home is that if you need to patch CMAN, you can do it out-of-place without the need to copy the configuration files somewhere else.

The advantage of using IFILE inside cman.ora, is that you can manage easily different CMAN configurations in the same host without editing directly cman.ora, with the risk of messing it up.

Preparing the start/stop script

Create a file /u01/app/oracle/scripts/cman_service.sh with this content:

This is at the same time ORACLE_HOME agnostic and configuration agnostic.

Make it executable:

and try to start CMAN:

Stop should work as well:

Add the service in systemctl

Open firewall ports

By default, new OL7 images use firewalld. Just open the port 1521 from the public zone:

 

Bonus: have a smart environment!

Ludo

Steps to remove/add node from a cluster if RHP fails to move gihome

I am getting more and more experience patching clusters with the local-mode automaton. The whole process would be very complex, but the local-mode automaton makes it really easy.

I have had nevertheless a couple of clusters where the process did not work:

#1: The very first cluster that I installed in 18c

This cluster has “kind of failed” patching the first node. Actually, the rhpctl command exited with an error:

But actually, the helper lept running and configured everything properly:

The cluster was OK on the first node, with the correct patch level. The second node, however, was filing with:

I am not sure about the cause, but let’s assume it is irrelevant for the moment.

#2: A cluster with new GI home not properly linked with RAC

This was another funny case, where the first node patched successfully, but the second one failed upgrading in the middle of the process with a java NullPointer exception. We did a few bad tries of prePatch and postPatch to solve, but after that the second node of the cluster was in an inconsistent state: in ROLLING_UPGRADE mode and not possible to patch anymore.

Common solution: removing the node from the cluster and adding it back

In both cases we were in the following situation:

  • one node was successfully patched to 18.6
  • one node was not patched and was not possible to patch it anymore (at least without heavy interventions)

So, for me, the easiest solution has been removing the failing node and adding it back with the new patched version.

Steps to remove the node

Although the steps are described here: https://docs.oracle.com/en/database/oracle/oracle-database/18/cwadd/adding-and-deleting-cluster-nodes.html#GUID-8ADA9667-EC27-4EF9-9F34-C8F65A757F2A, there are a few differences that I will highlight:

Stop of the cluster:

The actual procedure to remove a node asks to deconfigure the databases and managed homes from the active cluster version. But as we manage our homes with golden images, we do not need this; we rather want to keep all the entries in the OCR so that when we add it back, everything is in place.

Once stopped the CRS, we have deinstalled the CRS home on the failing node:

This  complained about the CRS that was down, but continued and ask for this script to be executed:

We’ve got errors also for this script, but the remove process was OK afterall.

Then, from the surviving node:

Adding the node back

From the surviving node, we ran gridSetup.sh and followed the steps to ad the node.

Wait before running root.sh.

In our case, we have originally installed the cluster starting with a SW_ONLY install. This type of installation keeps some leftovers in the configuration files that prevent the root.sh from configuring the cluster…we have had to modify rootconfig.sh:

then, after running root.sh and the config tools, everything was back as before removing the node form the cluster.

For one of the clusters , both nodes were at the same patch level, but the cluster was still in ROLLING_PATCH mode. So we have had to do a

Ludo

How to install and access Oracle Weblogic 12.2 in the Oracle Cloud Infrastructure

I put here the steps required to install and access Weblogic in the OCI (mostly for me in case I need to do it again 😉 ). The assumptions are:

  • you already have an account for the Oracle Cloud Infrastructure and you can access the OCI console
  • you already have a Compartment with a VCN and a subnet configured (for test purposes, a VCN created with the default values will be just fine)
  • you already have a keypair for your SSH client (id_rsa, id_rsa.pub)
  • you have an X server on your laptop (if you have Windows, I recommend MobaXTerm, but Xming or other servers are just fine)

Create the compute instance

  • Menu -> Core Infrastructure -> Compute -> Instances -> Create Instance
  • Choose a name for the Instance, all the other fields defaults are fine for test (Oracle Linux 7.6, VM.Standard2.1, etc.)
  • Paste your SSH public key
  • Optionally, under advanced/network, specify a different name for the VM
  • Click on Create to complete the creation

At some point you will have an instance “Green” ready to access:

Click on it and get the public address:

Using your SSH keypair, you can now access the instance with:

 

Setup sshd for SSH tunneling and X11 forwarding

Edit as root the sshd_config:

Modify it so that the following lines are present with these values:

Those values are required for X11 forwarding (required for the graphical installation) and for SSH tunneling (required to access  the Weblogic ports without exposing them over internet).

Then restart sshd:

Install the packages for X11 

At this point, it should be possible to forward X11. You can test by reconnecting with:

and then:

Create the oracle user

At this point, you can reconnect using oracle directly, so X11 forward will work for the oracle user without any additional setup:

 

Follow the canonical steps to install weblogic

If you do not know how to do that, follow this good tutorial by Tim Hall (oracle-base):

Oracle WebLogic Server (WLS) 12cR2 (12.2.1) Installation on Oracle Linux 6 and 7

 

Access the Weblogic console from outside Oracle Cloud

If you configured correctly sshd, once the Oracle Weblogic instance is configured and started, you can tunnel to the port (it should be 7001):

And be able to browse from your laptop using localhost:7001:

HTH

Ludovico

Oracle SW_ONLY install leads to relink with rac_off at every attachHome

OK, I really do not know what other title I should use for this post.

I have developed and presented a few times my personal approach to Oracle Home provisioning and patching. You can read more in this series.

With this approach:

  • I install the software (either GI or RDBMS) with the option SW_ONLY once
  • I patch it to the last version
  • I create a golden image that I evolve for the rest of the release lifecycle

When I need to install it, I just unzip the golden image and attach it to the Central Inventory.

I have discovered quite longtime ago that, every time I was attaching the home to the inventory, the binaries were relinked with rac_off, disregarding the fact that the home that I zipped actually had RAC enabled. This is quite annoying at my work at CERN, as all our databases are RAC.

So my solution to the problem is to detect if the server is on a cluster, and relink on the fly:

This is a simplified snippet of my actual code, but it gives the idea.

What causes the relink with rac_off?

I have discovered recently that the steps used by the runInstaller process to attach the Oracle Home are described in this file:

and in my case, for all my golden images, it contains:

So, it does not matter how I prepare my images: unless I change this file and put rac_on, the runInstaller keeps relinking with rac_off.

I have thought about changing the file, but then realized that I prefer to check and recompile at runtime, so I can reuse my images also for standalone servers (in case we need them).

Just to avoid surprises, it is convenient to check if a ORACLE_HOME is linked with RAC with this small function:

This is true especially for Grid Infrastructure golden images, as they have the very same behavior of RDBMS homes, with the exception that they might break out-of-place patching if RAC is not enabled: the second ASM instance will not mount because the first will be exclusively mounted without the RAC option.

 

HTH.

Ludovico

Oracle Grid Infrastructure 19c does not configure the local-mode automaton by default. How to add it?

I have been installing Grid Infrastructure 18c for a while, then switched to 19c when it became GA.

At the beginning I have been overly enthusiast by the shorter installation time:

The GIMR is now optional, that means that deciding to install it is a choice of the customer, and a customer might like to keep it or not, depending on its practices.

Not having the GIMR by default means not having the local-mode automaton. This is also not a problem at all. The default configuration is good for most customers and works really well.

This new simplified configuration reduces some maintenance effort at the beginning, but personally I use a lot the local-mode automaton for out-of-place patching of Grid Infrastructure (read my blog posts to know why I really love the local-mode automaton), so it is something that I definitely need in my clusters.

A choice that makes sense for Oracle and most customers

Oracle vision regarding Grid Infrastructure consists of a central management of clusters, using the Oracle Domain Services Cluster. In this kind of deployment, the Management Repository, TFA, and many other services, are centralized. All the clusters use those services remotely instead of having them configured locally. The local-mode automaton is no exception: the full, enterprise-grade version of Fleet Patching and Provisioning (FPP, formerly Rapid home provisioning or RHP) allows much more than just out-of-place patching of Grid Infrastructure, so it makes perfectly sense to avoid those configurations everywhere, if you use a Domain Cluster architecture. Read more here.

Again, as I said many times in the past, doing out-of-place patching is the best approach in my opinion, but if you keep doing in-place patching, not having the local-mode automaton is not a problem at all and the default behavior in 19c is a good thing for you.

I need local-mode automaton on 19c, what I need to do at install time?

If you have many clusters, you are not installing them by hand with the graphic interface (hopefully!). In the responseFile for 19c Grid Infrastructure installation, this is all you need to change comparing to a 18c:

as you can see, also Flex ASM is not part of the game by default in 19c.

Once you specify in the responseFile that you want GIMR, then the local-mode automaton is installed  as well by default.

I installed GI 19c without GIMR and local-mode automaton. How can I add them to my new cluster?

First, recreate the empty MGMTDB CDB by hand:

Then, configure the PDB for the cluster. Pay attention to the -local switch that is not documented (or at least it does not appear in the inline help):

After that, you might check that you have the PDB for your cluster inside the MGMTDB, I’ll skip this step.

Before creating the rhpserver (local-mode automaton resource), we need the volume and filesystem to make it work (read here for more information).

The volume:

The filesystem:

Finally, create the local-mode automaton resource:

Again, note that there is a -local switch that is not documented. Specifying it will create the resource as a local-mode automaton and not as a full FPP Server (or RHP Server, damn, this change of name gets me mad when I write blog posts about it 🙂 ).

HTH

Ludovico

(unsupported) DST_UPGRADE_STATE is DATAPUMP(1) but no data pump jobs are running. How to fix?

This blog post contains unsupported commands. Do not try to use them without the supervision of Oracle Support!

I have just run across an Oracle Database who had a broken configuration in database_properties.

The database was in process of being upgraded to 18c, but the DST upgrade step was not working because of wrong entries in the view DATABASE_PROPERTIES:

The MOS note Updating the RDBMS DST version in 12c Release 1 (12.1.0.1 and up) using DBMS_DST (Doc ID 1509653.1) states that I had to check note How To Cleanup Orphaned DataPump Jobs In DBA_DATAPUMP_JOBS ? (Doc ID 336014.1) to solve the problem.

In fact, there should have  been an orphan data pump job trying to import the timezone file. But in my case, no jobs at all, do data pump job tables.

Second, the secondary time zone being lower than the primary one was, to me, sign of an old upgrade went wrong.

Trying to begin a new prepare phase was failing with:

Trying to end the old one was failing as well:

Trying to unload the secondary was failing as well:

I double-checked ALL the notes to clean-up the situation and made sure that there was nothing actually running regarding a DST upgrade.

I am pretty evil trying unsupported stuff. So I have decided to check the underlying table:

Fixed tables are not writable, but sys.props$ is, and it was containing the same bad data:

So I did what I knew was wrong, after taking a guaranteed restore point. Do not try this at home without the supervision of Oracle Support!

Trying again:

The rest of the upgrade procedure went smoothly.

Ludovico

First draft of a Common Oracle Environment… for the Cloud Database (and not only)

I have just published on GitHub a draft of a common Oracle environment scripts that make the shell environment a little bit smarter than what it is by default. It uses some function and aliases that I have published during the past years.

You can start playing with:

Ideal for the Oracle Cloud Infrastructure

If you are new to the Oracle Cloud, probably you do not have environment scripts that makes it easy to interact with the database.

The environment scripts that I have published work out-of the box in the cloud (just make sure that you have rlwrap installed so that you can have a better CLI experience).

Actually, they work great as well on-premises, but I assume that you already have something automatic there.

Some examples

  • My famous Smart Prompt 😉 (including version, edition, exit code, etc)

  • u : gets the status of the databases

  • pmon: just displays the running pmon processes

  • db : sets the environment for a specific DB_NAME, DB_UNIQUE_NAME or SID

  • svcstat : shows the running services (and the corresponding pdb, host, etc) as I described in my previous post

  • s_ : smart alias for sqlplus: connects as sysdba/sysasm by default, or with any arguments that you pass:

  • adr_, dg_ rman_, cm_, lsn_ : aliases for common oracle binaries
  • genpasswd : generates random passwords (default length 30)

  • lsoh: lists the Oracle Homes attached to the inventory

  • setoh: sets the Oracle Home given its name in the inventory

 

You might want to install the same environment for oracle, grid (if you have role separation, it should be the case for Cloud DB Systems) and (eventually) root.

I am curious to know if it works well for your environment.

Cheers

Ludo

The new era of spam? Technical-related comments that defeat anti-spam filters

Part of the tasks of every blogger is to update the anti-spam filters (a WordPress plugin in my case, akismet), read the comments in quarantine and either approve them, delete them or mark them as spam (to that the anti-spam plugin can enrich the spam detection algorithms).

So far, the main factor to me to decide to approve or not, is to read the content of the message and find it appropriate or not for the blog post.

Until today.

Some time ago I have written this blog post about diagnosing problems in deleting the Audit Trail in Oracle 12c:

DBMS_AUDIT_MGMT.CLEAN_AUDIT_TRAIL not working on 12c? Here’s why…

Last week I have got this comment:

Technical. Talking about Oracle Audit (which is relevant to my blog post) and with  technical details that only an Oracle professional might know.

So I have approved it and read my blog post again to find a proper answer.

Except that my blog post was not talking about Unified Audit, relink of binaries or exports!

I have realized then that the “link” of the blogger was in fact a website selling sex toys… (yeah, writing this here will bring even more spam… I take the risk 🙂 )

So where does it come from? Actually, the comment has been copied from another blog:

https://petesdbablog.wordpress.com/2013/07/20/12c-new-feature-unified-auditing/

A new type of spam (maybe it exists since a while? I am not a spam expert).

The spammer (bot?) copies technical comments from blog posts with similar keywords and just changes the name and the link of the author.

Beware, bloggers.

Ludo