Checking usage of HugePages by Oracle databases in Linux environments

Yesterday several databases on one server started logging errors in the alert log:

That means not enough contiguous free memory in the OS. The first thing that I have checked has been of course the memory, and the used huge pages:

The memory available (last column in the free command) was indeed quite low, but still plenty of space in the huge pages (86k pages free out of 180k).

The usage by Oracle instances:

You can get the code of mem.sh in this post.

Regarding pure shared memory usage, the situation was what I was expecting:

360G of shared memory usage, much more than what was allocated in the huge pages.

I have compared the situation with the other node in the cluster: it had more memory allocated by the databases (because of more load on it), more huge page usage and less 4k pages consumption overall.

So I was wondering if all the DBs were property allocating the SGA in huge pages or not.

This redhat page has been quite useful to create a quick snippet to check the huge page memory allocation per process:

It has been easy to spot the databases not using huge pages at all:

Indeed, after stopping them, the huge page usage has not changed:

But after starting them back I could see the new huge pages reserved/allocated:

The reason was that the server has been started without huge pages first, and after a few instances started, the huge pages has been set.

HTH

Ludovico

 

Basic Vagrantfile for multiple groups of VMs

In case you want to prepare multiple sets of machines quickly using Vagrant, ready for different setups, this might be something for you:

The nice thing, (beside speeding up the creation and basic configuration) is the organization of the directories. The configuration at the beginning of the script will result in 5 virtual machines:

It is based, in part (but modified and simplified a lot), from  the RAC Attack automation scripts by Alvaro Miranda.

I have a more complex version that automates all the tasks for a full multi-cluster RAC environment, but if this is your requirement, I would rather check oravirt scripts on github (https://github.com/oravirt) . They are much more powerful and complete (and complex…) than my Vagrantfile. 🙂

Cheers

Bash tips & tricks [ep. 7]: Cleanup on EXIT with a trap

This is the seventh epidose of a small series.

Description:

Pipes, temporary files, lock files, processes spawned in background, rows inserted in a status table that need to be updated… Everything need to be cleaned up if the script exits, even when the exit condition is not triggered inside the script.

BAD:

The worst practice is, of course, to forget to cleanup the tempfiles, leaving my output and temporary directories full of files *.tmp, *.pipe, *.lck, etc. I will not show the code because the list of bad practices is quite long…

Better than forgiving to cleanup, but still very bad, is to cleanup everything just before triggering the exit command (in the following example, F_check_exit is a function that exits the script if the first argument is non-zero, as defined it in the previous episode):

A better approach, would be to put all the cleanup tasks in a Cleanup()  function and then call this function instead of duplicating all the code everywhere:

But still, I need to make sure that I insert this piece of code everywhere. Not optimal yet.

I may include the Cleanup function inside the F_check_exit function, but then I have two inconvenients:
1 – I need to define the Cleanup function in every script that includes my include file
2 – still there will be exit conditions that are not trapped

GOOD:

The good approach would be to trap the EXIT signal with the Cleanup function:

Much better! But what if my include script has some logic that also creates some temporary files?

I can create a global F_Cleanup function that eventually executes the local Cleanup function, if defined. Let me show this:

Include script:

Main script:

The Cleanup function will be executed only if defined.

No Cleanup function: no worries, but still the F_Cleanup function can do some global cleanup not specific to the main script.

Bash tips & tricks [ep. 6]: Check the exit code

This is the sixth epidose of a small series.

Description:

Every command in a script may fail due to external reasons. Bash programming is not functional programming! 🙂

After running a command, make sure that you check the exit code and either raise a warning or exit with an error, depending on how a failure can impact the execution of the script.

BAD:

The worst example is not to check the exit code at all:

Next one is better, but you may have a lot of additional code to type:

Again, Log_Close, eok, eerror, etc are functions defined using the previous Bash Tips & Tricks in this series.

GOOD:

Define once the check functions that you will use after every command:

 

Bash tips & tricks [ep. 5]: Write the output to a logfile

This is the fifth epidose of a small series.

Description:

Logging the output of the scripts to a file is very important. There are several ways to achieve it, I will just show one of my favorites.

BAD:

You can log badly either from the script to a log file:

or by redirecting badly the standard output of the script:

 GOOD:

My favorite solution is to automatically open a pipe that will receive from the standard output and redirect to the logfile. With this solution, I can programmatically define my logfile name inside the script (based on the script name and input parameters for example) and forget about redirecting the output everytime that I run a command.

(*) the functions edebug, einfo, etc, have to be created using the guidelines I have used in this post: Bash tips & tricks [ep. 4]: Use logging levels

The -Z parameter can be used to intentionally avoid logging.

Again, all this stuff (function definitions and variables) should be put in a global include file.

If I execute it:

 

Bash tips & tricks [ep. 4]: Use logging levels

This is the fourth epidose of a small series.

Description:

Support different logging levels natively in your scripts so that your code will be more stable and maintainable.

BAD:

 

 GOOD:

Nothing to invent, there are already a few blog posts around about the best practices for log messages. I personally like the one from Michael Wayne Goodman:

http://www.goodmami.org/2011/07/04/Simple-logging-in-BASH-scripts.html

I have reused his code in my scripts with very few modifications to fit my needs:

The edumpvar is handy to have the status of several variables at once:

If you couple the verbosity level with input parameters you can have something quite clever (e.g. -s for silent, -V for verbose, -G for debug). I’m putting everything into one single snippet just as example, but as you can imagine, you should seriously put all the fixed variables and functions inside an external file that you will systematically include in your scripts:

Example:

bash-colour-output-normal

bash-colour-output-verbose

bash-colour-output-debug

It does not take into account the output file. That will be part of the next tip 🙂

Bash tips & tricks [ep. 3]: Colour your terminal!

This is the third epidose of a small series.

Description:

The days of monochrome green-on-black screens are over, in a remote shell  terminal you can have something fancier!

BAD:

bash_prompt_nocolor

GOOD:

Define a series of variables as shortcuts for color escape codes, there are plenty of examples on internet.

Use them whenever you need to highlight the output of a script, and eventually integrate them in a smart prompt (like the one I’ve blogged about sometimes ago).bash_prompt_color

The echo builtin command requires -e in order to make the colours work. When reading files, cat works, less requires -r. vi may work with some hacking, but it’s not worth to spend too much time, IMHO.

Bash tips & tricks [ep. 2]: Have a smart environment for personal accounts

This is the second epidose of a small series.

Description:

The main technical account (oracle here) usually has the smart environment, with aliases, scripts avilable at fingertips, correct environment variables and functions.

When working with personal accounts, it may be boring to set the new environment at each login, copy it from a golden copy or reinvent the wheel everytime.

BAD:

 

GOOD:

Distribute a standard .bash_profile that calls a central profile script valid for all the users:

Make your common environment as smart as possible. If any commands need to be run differently depending on the user (oracle or not oracle), just use a simple if:

The goal of course is to avoid as many types as you can, and let all your colleagues profit of the smart environment.

Migrating Oracle RAC from SuSE to OEL (or RHEL) live

I have a customer that needs to migrate its Oracle RAC cluster from SuSE to OEL.

I know, I know, there is a paper from Dell and Oracle named:

How Dell Migrated from SUSE Linux to Oracle Linux

That explains how Dell migrated its many RAC clusters from SuSE to OEL. The problem is that they used a different strategy:

– backup the configuration of the nodes
– then for each node, one at time
– stop the node
– reinstall the OS
– restore the configuration and the Oracle binaries
– relink
– restart

What I want to achieve instead is:
add one OEL node to the SuSE cluster as new node
– remove one SuSE node from the now-mixed cluster
– install/restore/relink the RDBMS software (RAC) on the new node
– move the RAC instances to the new node (taking care to NOT run more than the number of licensed nodes/CPUs at any time)
– repeat (for the remaining nodes)

because the customer will also migrate to new hardware.

In order to test this migration path, I’ve set up a SINGLE NODE cluster (if it works for one node, it will for two or more).

I have to setup the new node addition carefully, mainly as I would do with a traditional node addition:

  • Add new ip addresses (public, private, vip) to the DNS/hosts
  • Install the new OEL server
  • Keep the same user and groups (uid, gid, etc)
  • Verify the network connectivity and setup SSH equivalence
  • Check that the multicast connection is ok
  • Add the storage, configure persistent naming (udev) and verify that the disks (major, minor, names) are the very same
  • The network cards also must be the very same

Once the new host ready, the cluvfy stage -pre nodeadd will likely fail due to

  • Kernel release mismatch
  • Package mismatch

Here’s an example of output:

So the problem is not if the check succeed or not (it will not), but what fails.

Solving all the problems not related to the difference SuSE-OEL is crucial, because the addNode.sh will fail with the same errors.  I need to run it using -ignorePrereqs and -ignoreSysPrereqs switches. Let’s see how it works:

Then, as stated by the addNode.sh, I run the root.sh and I expect it to work:

Bingo! Let’s check if everything is up and running:

So yes, it works, but remember that it’s not a supported long-term configuration.

In my case I expect to migrate the whole cluster from SLES to OEL in one day.

NOTE: using OEL6 as new target is easy because the interface names do not change. The new OEL7 interface naming changes, if you need to migrate without cluster downtime you need to setup the new OEL7 nodes following this post: http://ask.xmodulo.com/change-network-interface-name-centos7.html

Otherwise, you need to configure a new interface name for the cluster with oifcfg.

HTH

Ludovico