About Ludovico

Ludovico is an Oracle ACE Director, frequent speaker and community contributor, working as Senior Database Specialist for Trivadis, Switzerland.

My own Dbvisit Replicate integration with Grid Infrastructure

I am helping a customer with a PoC of Dbvisit Replicate as a logical replication tool. I will not discuss (at least, not in this post) the capabilities of the tool itself, its configuration or the caveats you should beware of when doing logical replication. Instead, I will concentrate on how we will likely integrate it into the current environment.

My role in this PoC is to make sure that the tool will be easy to operate from an operational point of view; database operations here are based on Oracle Grid Infrastructure and cold failover clusters.

Note: there are official Dbvisit online resources about how to configure Dbvisit Replicate in a cluster. I aim to complement that information, not copy it.

Quick overview

If you already know Dbvisit Replicate, skip this paragraph.

There are three main components in Dbvisit Replicate: the FETCHER, the MINE and the APPLY processes. The FETCHER gets the redo stream from the source and sends it to the MINE process. The MINE process parses the redo stream and converts it into proprietary transaction log files (named plog). The APPLY process gets the plog files and applies the transactions on the destination database.

From an architectural point of view, MINE and APPLY do not need to run close to the databases that are part of the configuration. The FETCHER process, by contrast, needs to be local to the source database online redo logs (and archived logs).

Because the MINE process is the most resource-intensive, it is not convenient to run it where the databases reside, as it might consume precious CPU resources on servers licensed for Oracle Database. So, first step in this PoC: the FETCHER processes will run on the cluster, while MINE and APPLY will run on a dedicated Virtual Machine.

[Figure: Dbvisit Replicate and Grid Infrastructure architecture overview]

Clustering considerations

  • the FETCHER does NOT need to run on the server of the source database: having access to the online logs through the ASM instance is enough
  • to avoid SPoF, the fetcher should be a cluster resource that can relocate without problems
  • to simplify the configuration, the FETCHER configuration and the Dbvisit binaries should be on a shared filesystem (the FETCHER does not persist any data, just the logs)
  • the destination database might be literally anywhere: the APPLY connects via SQL*Net, so correct name resolution and routing to the destination database are enough

So the implementation steps are:

  1. create a shared filesystem
  2. install dbvisit in the shared filesystem
  3. create the Dbvisit Replicate configuration on the dedicated VM
  4. copy the configuration files on the cluster
  5. prepare an action script
  6. configure the resource
  7. test!

Convention over configuration: the importance of a strong naming convention

Before starting the implementation, I decided to put on paper all the caveats related to the FETCHER resource relocation:

  • Where will the configuration files reside? Dbvisit has an important variable: the Configuration Name. All the operations are done by passing a configuration file named /{PATH}/{CONFIG_NAME}/{CONFIG_NAME}-{PROCESS_TYPE}.ddc to the dbvrep binary. So, I decided to put ALL the configuration directories under the same path: given the Configuration Name, I will always be able to get the configuration file path.
  • How will the configuration files relocate from one node to the other? Easy here: they won’t. I will use an ACFS filesystem
  • How can I link the cluster resource with its configuration name? Easy again: I call my resources dbvrep.CONFIGNAME.PROCESS_TYPE. e.g. dbvrep.FROM_A_TO_B.fetcher
  • How will I manage the need to use a new version of Dbvisit in the future? Old and new versions must coexist: instead of using external configuration files, I will just use a custom resource attribute named DBVREP_HOME inside my resource type definition. (see later)
  • What port number should I use? Of course, the various FETCHERs must not conflict with each other, whichever server they start on. This is something that might be either planned or made dynamic. I will opt for the former, but instead of putting the port number inside the Dbvisit configuration, I will use another custom resource attribute: DBVREP_PORT (see the sketch after this list).
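To give an idea, this is a minimal sketch of how such a resource type could be registered; the type name, the default values and the paths are my own assumptions, not Dbvisit or Oracle conventions:

```
# Sketch: a custom resource type carrying the Dbvisit home and the FETCHER port.
# Type name, defaults and paths are assumptions; multiple attributes can also
# be supplied with the -file option instead of a single -attr string.
crsctl add type dbvrep.fetcher.type -basetype cluster_resource \
  -attr "ATTRIBUTE=DBVREP_HOME,TYPE=string,DEFAULT_VALUE=/u01/app/dbvrep/replicate,ATTRIBUTE=DBVREP_PORT,TYPE=string,DEFAULT_VALUE=7890"
```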

Considerations on the FETCHER listen address

This requires a dedicated paragraph. The Dbvisit documentation suggests creating a VIP, binding the FETCHER on the VIP address and creating a dependency between the FETCHER resource and the VIP. Here is where my configuration will differ.

Having a separate VIP per FETCHER resource might potentially lead to dozens of VIPs in the cluster. Everything will depend on the success of the PoC and on how many internal clients decide to ask for such an implementation. Many VIPs == many interactions with network admins for address reservations, DNS configuration, etc. Long story short, it might slow down the creation and maintenance of new configurations.

Instead, each FETCHER will listen on the local server address, and the action script (sketched further below, under “Prepare an action script”) will take care of:

  • getting the current host name
  • getting the current ASM instance
  • changing the settings of the specific Dbvisit Replicate configuration (ASM instance and FETCHER listen address)
  • starting the FETCHER

Implementation

Now that all the caveats and steps are clear, I can show how I implemented it:

Create a shared filesystem
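The original commands are not reproduced here; this is a minimal sketch of an ACFS filesystem creation. The disk group, volume name, size and mount point are assumptions, and the exact srvctl syntax may vary slightly with the GI version:

```
# Sketch: create an ADVM volume, format it as ACFS and register it as a
# clusterware-managed filesystem (names, sizes and paths are assumptions).
asmcmd volcreate -G DATA -s 10G DBVREP
asmcmd volinfo -G DATA DBVREP        # note the volume device, e.g. /dev/asm/dbvrep-123
mkfs -t acfs /dev/asm/dbvrep-123
mkdir -p /u01/app/dbvrep
srvctl add filesystem -device /dev/asm/dbvrep-123 -volume DBVREP -diskgroup DATA \
  -path /u01/app/dbvrep -user oracle
srvctl start filesystem -device /dev/asm/dbvrep-123
```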

Install dbvisit in the shared filesystem
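Nothing fancy here; a sketch assuming a tar.gz distribution of Dbvisit Replicate (the archive name and the target directory are placeholders, and the actual installation procedure depends on the Dbvisit version):

```
# Sketch: unpack the Dbvisit Replicate distribution into the shared ACFS path
# (archive name and target directory are placeholders).
DBVREP_TARBALL=/tmp/dbvisit_replicate.tar.gz
cd /u01/app/dbvrep
tar xzf "$DBVREP_TARBALL"
ls -l /u01/app/dbvrep
```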

Create the Dbvisit Replicate configuration on the dedicated VM
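The configuration itself is created interactively with the Dbvisit setup wizard on the VM; the snippet below is only a reminder of the flow, and the configuration name FROM_A_TO_B and the paths are examples:

```
# Sketch: run the setup wizard on the dedicated VM (interactive session).
cd /u01/app/dbvrep/replicate
./dbvrep
# dbvrep> setup wizard
# ...answer the prompts (source, target, FETCHER host and port, ...);
# the wizard generates the FROM_A_TO_B-*.ddc files and the helper scripts.
```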

Copy the configuration files from the Dbvisit VM to the cluster
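A plain copy is enough, as long as the directory layout follows the naming convention described above (host names and paths are examples):

```
# Sketch: copy the generated configuration directory to the shared ACFS path.
scp -r /u01/app/dbvrep/conf/FROM_A_TO_B oracle@clunode1:/u01/app/dbvrep/conf/
```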

Prepare an action script
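I cannot publish the original script, so here is a reduced sketch of a Grid Infrastructure action script. The custom attributes are exposed to the script as environment variables prefixed with _CRS_; the dbvrep command lines and parameter names are placeholders (take the real ones from the scripts generated by the setup wizard):

```
#!/bin/bash
# Sketch of a script-agent action script for a Dbvisit FETCHER resource.
# Resource naming convention: dbvrep.{CONFIG_NAME}.fetcher
# The dbvrep invocations below are placeholders/assumptions.

CONFIG_NAME=$(echo "${_CRS_NAME}" | cut -d. -f2)
DDC_FILE=/u01/app/dbvrep/conf/${CONFIG_NAME}/${CONFIG_NAME}-FETCHER.ddc
DBVREP=${_CRS_DBVREP_HOME}/dbvrep
LOCAL_HOST=$(hostname -s)
# simplistic way to get the local ASM instance name, e.g. +ASM1
ASM_SID=$(ps -eo args | grep -o 'asm_pmon_+ASM[0-9]*' | cut -d_ -f3)

case "$1" in
  start)
    # adapt the listen address, port and ASM instance to the current node,
    # then start the FETCHER (parameter names are assumptions)
    ${DBVREP} --ddcfile ${DDC_FILE} set FETCHER_LISTEN_ADDRESS=${LOCAL_HOST}:${_CRS_DBVREP_PORT}
    ${DBVREP} --ddcfile ${DDC_FILE} set ASM_INSTANCE=${ASM_SID}
    ${DBVREP} --ddcfile ${DDC_FILE} start FETCHER
    ;;
  stop|clean)
    ${DBVREP} --ddcfile ${DDC_FILE} shutdown FETCHER
    ;;
  check)
    pgrep -f "dbvrep.*${CONFIG_NAME}-FETCHER" >/dev/null
    ;;
esac
exit $?
```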

Configure the resource
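Again a sketch: the ACFS resource name, the server list and the paths are assumptions, and the action script path refers to the sketch above:

```
# Sketch: register the FETCHER as a resource of the custom type defined earlier.
crsctl add resource dbvrep.FROM_A_TO_B.fetcher -type dbvrep.fetcher.type \
  -attr "ACTION_SCRIPT=/u01/app/dbvrep/scripts/fetcher_action.sh,DBVREP_HOME=/u01/app/dbvrep/replicate,DBVREP_PORT=7890,PLACEMENT=restricted,HOSTING_MEMBERS='clunode1 clunode2',START_DEPENDENCIES='hard(ora.data.dbvrep.acfs) pullup(ora.data.dbvrep.acfs)',STOP_DEPENDENCIES='hard(ora.data.dbvrep.acfs)'"
```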

Test!
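The basic tests are just the usual clusterware commands (the resource name follows the convention above):

```
# Sketch: start the resource, check it, and force a relocation to another node.
crsctl start resource dbvrep.FROM_A_TO_B.fetcher
crsctl status resource dbvrep.FROM_A_TO_B.fetcher -f
crsctl relocate resource dbvrep.FROM_A_TO_B.fetcher -f
crsctl status resource dbvrep.FROM_A_TO_B.fetcher -f
```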

 

The relocation also worked as expected: when the action script adjusts the settings (listen address and ASM instance) at startup, the MINE process gets the change dynamically, so there is no need to restart it.

Last consideration

Adding a hard dependency between the database and the FETCHER would require stopping the database with the force option, or always stopping the FETCHER before the database. Also, the start of the database would pull up the FETCHER (pullup:always) and vice versa. We will consider further whether to use this dependency or to manage it differently (e.g. through the action script).

A hard dependency declared without the global keyword will always start the FETCHER on the server where the database runs. This is not required, but it might be nice to have the FETCHER on the same node. Again, a consideration that we will discuss further.
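For reference, this is roughly what such a dependency would look like; the database resource name is an example, and the command overwrites any START/STOP_DEPENDENCIES set before, so the ACFS dependency would have to be repeated:

```
# Sketch: tie the FETCHER to the source database resource (example names).
crsctl modify resource dbvrep.FROM_A_TO_B.fetcher \
  -attr "START_DEPENDENCIES='hard(ora.srcdb.db) pullup:always(ora.srcdb.db)',STOP_DEPENDENCIES='hard(ora.srcdb.db)'"
```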

HTH

Ludovico

Get the Most out of Oracle Data Guard – The material

Here we go: as usual, the question I get after my talks (specifically, after the POUG High Five conference) is whether I will share my demo scripts and material.

Sadly, the demos I do for my presentation “Get the most out of Oracle Data Guard” are quite tied to an environment built for the purpose of the demos. So, do not expect to get scripts that are easy to use as-is, but rather to get some ideas beyond the demos themselves.

I hope they will help to get the whole picture.

Of course, if you need to implement a cloning strategy based on Data Guard or any other solution that I describe in this post, please feel free to contact me, I will be glad to help you implement it in your environment.

Slides

Demo 1

Video:

Scripts:

 

Demo 2

Video:


Scripts:

 

Demo 3

Video:

Scripts:

Preparation:

snap_acfs.pl

 

snap_databasae.pl

clone_from_snap.pl

Cheers

Ludovico

12.1.0.2 Bundle Patch 170718 breaks Data Guard and Duplicate from active database

Recently my customer patched their 12.1.0.2 databases with Bundle Patch 170718 on the new servers (half of the customer’s environment). The old servers are still on the 161018 Bundle Patch.

We realized that we could no longer move databases from the old servers to the new ones: the duplicate from active database was failing with the same error that Franck blogged about some months ago.

Oracle 12.2 introduced an incompatibility with previous releases in remote file transfer via SQL*Net, or at least this is what it looks like. According to Oracle, this is due to a bugfix present in Oracle 12.2.

Now, the Bundle Patch that we installed (BP 170718) contains the same bugfix (the patch for bug 18633374).

So, the incompatibility happens now between databases of the same “Major Release” (12.1.0.2).

There are two possible workarounds:

  1. Apply the same patch level on both sides (BP170718 in my case)
  2. Apply just the patch 18633374 on top of your current PSU/DBBP (a merge might be necessary); a sketch follows.
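A minimal sketch of the second approach; the staging path is an example, and you should always follow the patch README:

```
# Sketch: apply the one-off patch 18633374 on top of the current DBBP,
# then verify that it is in the inventory (paths are examples).
cd /u01/stage/18633374
$ORACLE_HOME/OPatch/opatch apply
$ORACLE_HOME/OPatch/opatch lspatches | grep 18633374
```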

We used the second approach, and now we can set up Data Guard again to move our databases without downtime.

HTH

Ludovico

 

 

Trivadis sessions at Oracle Open World 2017

This year Trivadis will be again at Oracle Open World (and Oak Table World!) in San Francisco, with a few sessions (including mine!)

If you are going to Oracle Open World and you want to say hello to the Trivadis speakers, make sure you attend their sessions!

Get the Most Out of Oracle Data Guard
Ludovico Caldara – ACE Director, Senior Consultant – Trivadis
When: Sunday, Oct 01, 12:45 PM
Where: Marriott Marquis (Yerba Buena Level) – Nob Hill A/B

EOUC Database ACES Share Their Favorite Database Things
Christian Antognini – ACE Director, OAK Table Member, Senior Principal Consultant, Partner – Trivadis
When: Sunday, Oct 01, 10:45 AM
Where: Marriott Marquis (Golden Gate Level) – Golden Gate C1/C2

Application Containers: Multitenancy for Database Applications
Markus Flechtner – Principal Consultant – Trivadis
When: Sunday, Oct 01, 2:45 PM
Where: Marriott Marquis (Yerba Buena Level) – Nob Hill A/B

TBA
Christian Antognini – ACE Director, OAK Table Member, Senior Principal Consultant, Partner – Trivadis
When: Monday Oct 02, 1:00 PM
Where: Oak Table World, Children Creativity Museum

Apache Kafka: Scalable Message Processing and More
Guido Schmutz – ACE Director, Senior Principal Consultant, Partner – Trivadis
When: Monday Oct 02, 4:30 PM
Where: Moscone West – Room 2004

You can find Trivadis’s sessions in the session catalog here.

See you there!

PostgreSQL Large Objects and space usage (part 3)

A blog post series would not be complete without a final post about vacuumlo.

In the previous post we saw that large objects are split into tuples of 2048 bytes each, and that each chunk behaves in the very same way as regular tuples.

What distinguishes large objects?
NOTE: in PostgreSQL it IS possible to store a large amount of data along with the table, thanks to the TOAST technology. Read about TOAST here.

Large objects are not inserted into application tables, but are treated in a different way. An application using large objects usually has a table with columns of type OID. When the application creates a new large object, a new OID number is assigned to it, and this number is inserted into the application table.
Now, a common mistake for people who come from other RDBMS (e.g. Oracle) is to think that a large object is unlinked automatically when the row that references it is deleted. It is not: we need to unlink it explicitly from the application.

Let’s see it with a simple example, starting with an empty pg_largeobject table:
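The original snippets are not reproduced here; the sketches below follow the same idea. The table definition, the column name doc and the file paths are my assumptions, and psql connects with its default settings, so adjust the database as needed:

```
# Sketch: an application table referencing large objects via an OID column,
# starting with an empty pg_largeobject.
psql <<'SQL'
CREATE TABLE t (id int PRIMARY KEY, doc oid);
SELECT count(*) FROM pg_largeobject;
SQL
```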

Let’s insert a new LOB and reference it in the table t:
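Something like this (the test file comes from the previous posts):

```
psql <<'SQL'
-- create a large object from a server-side file and reference it in t
INSERT INTO t VALUES (1, lo_import('/tmp/zero.dat'));
SELECT id, doc FROM t;
SQL
```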

Another one:
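For instance, reusing the random-padded test file:

```
psql -c "INSERT INTO t VALUES (2, lo_import('/tmp/random.dat'));"
```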

If we delete the first one, the chunks of its LOB are still there, valid:
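A quick check, consistent with the sketch above:

```
psql <<'SQL'
DELETE FROM t WHERE id = 1;
-- the chunks of the now-unreferenced large object are still in pg_largeobject
SELECT loid, count(*) FROM pg_largeobject GROUP BY loid ORDER BY loid;
SQL
```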

If we want to get rid of the LOB, we have to unlink it, either explicitly or by using triggers that unlink the LOB when a record in the application table is deleted.
Another way is to use the vacuumlo binary included in PostgreSQL.
It scans pg_largeobject_metadata and searches the tables that have OID columns to find out whether there are any references to the LOBs. The LOBs that are not referenced are unlinked.
ATTENTION: this means that if you reference LOBs in ways other than OID columns, vacuumlo might unlink LOBs that are still needed!
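A sketch of the vacuumlo run (it connects like psql, so the database name here defaults to the current user):

```
# Sketch: unlink the orphaned large objects of the database.
vacuumlo -v "${PGDATABASE:-$USER}"
```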

vacuumlo has indeed unlinked the first LOB, but the deleted tuples are not freed until a vacuum is executed:
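One way to verify it (a sketch):

```
psql <<'SQL'
-- only one large object is left in the metadata...
SELECT count(*) FROM pg_largeobject_metadata;
-- ...but pg_largeobject has not shrunk: the dead chunks are removed only by vacuum
SELECT pg_relation_size('pg_largeobject');
VACUUM VERBOSE pg_largeobject;
SQL
```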

So vacuumlo does not do any vacuuming on the pg_largeobject table.

PostgreSQL Large Objects and space usage (part 2)

In my previous post I showed how large objects use space inside the table pg_largeobject when inserted.

Let’s see something more. The table had 2 large objects (for a total of 1024 records):
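The original output is not reproduced; a sketch of the check:

```
psql <<'SQL'
SELECT count(DISTINCT loid) AS lobs, count(*) AS chunks FROM pg_largeobject;
SELECT pg_relation_size('pg_largeobject');
SQL
```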

Let’s try to add another random-padded file:
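Using the same test file as in the previous post (a sketch):

```
psql <<'SQL'
\lo_import /tmp/random.dat
SELECT pg_relation_size('pg_largeobject');
SQL
```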

As expected, because a random sequence of characters cannot be compressed, the size increased again by 171 blocks (see my previous post for the explanation)

If you read this nice series of blog posts by Frits Hoogland, you should know about the pageinspect extension and the t_infomask 16-bit mask.

Let’s install it and check the content of the pg_largeobject pages:
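A sketch of the inspection (pageinspect requires superuser privileges):

```
psql <<'SQL'
CREATE EXTENSION IF NOT EXISTS pageinspect;
-- page header of the first pages of pg_largeobject
SELECT n AS page, h.lower, h.upper
FROM   generate_series(0, 6) n,
       LATERAL page_header(get_raw_page('pg_largeobject', n)) h;
SQL
```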

We already know the mathematics, but we love having all the pieces come together 🙂

We know that the page header is 24 bytes and that the line pointers use 4 bytes for each tuple.

The first 4 pages have the lower offset at 452 bytes, which means (452-24)/4 = 107 tuples each.

The 5th page (page number 4) has the lower at 360: (360-24)/4 = 84 tuples.

The remaining pages have the lower at 36: (36-24)/4 = 3 tuples.

Let’s check if we are right:
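A sketch, counting the line pointers of a few representative pages:

```
psql <<'SQL'
SELECT count(*) FROM heap_page_items(get_raw_page('pg_largeobject', 0));  -- compressed page
SELECT count(*) FROM heap_page_items(get_raw_page('pg_largeobject', 4));  -- 5th page
SELECT count(*) FROM heap_page_items(get_raw_page('pg_largeobject', 5));  -- non-compressed page
SQL
```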

🙂
Now, let’s unlink one of the 1MB large objects and check the space again:
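A sketch (here I pick the oldest large object, which in my flow is the zero-padded one):

```
psql <<'SQL'
SELECT lo_unlink((SELECT min(oid) FROM pg_largeobject_metadata));
SELECT pg_relation_size('pg_largeobject');
SQL
```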

The space is still used and the tuples are still there.

However, we can check that the tuples are no longer used by checking the validity of their t_xmax. In fact, according to the documentation, if the XMAX is invalid the row is at the latest version:

[…] a tuple is the latest version of its row iff XMAX is invalid or t_ctid points to itself (in which case, if XMAX is valid, the tuple is either locked or deleted). […]
 (from htup_details.h lines 87-89).
We have to check the infomask against the 12th bit (2048, or 0x0800)
#define HEAP_XMAX_INVALID       0x0800  /* t_xmax invalid/aborted */
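A sketch of the check on the first page:

```
psql <<'SQL'
SELECT lp, t_xmin, t_xmax,
       (t_infomask & 2048) > 0 AS xmax_invalid
FROM   heap_page_items(get_raw_page('pg_largeobject', 0))
LIMIT  10;
SQL
```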

Here we go: the large objects are split into compressed chunks that internally behave the same way as regular rows!

If we import another lob we will see that the space is not reused:
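A sketch, reusing the random-padded file:

```
psql <<'SQL'
\lo_import /tmp/random.dat
SELECT pg_relation_size('pg_largeobject');   -- the relation grows again
SQL
```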

Flagging the tuples as reusable is the vacuum’s job:
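For example:

```
psql -c "VACUUM VERBOSE pg_largeobject;"
```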

The normal vacuum does not release the empty space, but it can be reused now:
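A sketch, importing the small (compressible) file again:

```
psql <<'SQL'
SELECT pg_relation_size('pg_largeobject');   -- unchanged right after the vacuum
\lo_import /tmp/zero.dat
SELECT pg_relation_size('pg_largeobject');   -- should not grow: the freed space can be reused
SQL
```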

If we unlink the lob again and we do a vacuum full, the empty space is released:
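A sketch (here unlinking the most recently created large object):

```
psql <<'SQL'
SELECT lo_unlink((SELECT max(oid) FROM pg_largeobject_metadata));
VACUUM FULL pg_largeobject;
SELECT pg_relation_size('pg_largeobject');
SQL
```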

PostgreSQL Large Objects and space usage (part 1)

PostgreSQL uses a nice, non-standard mechanism for big columns called TOAST (hopefully I will blog about it in the future) that can be compared to extended data types in Oracle (TOAST rows, by the way, can be much bigger). But traditional large objects exist and are still used by many customers.

If you are new to large objects in PostgreSQL, read here. For TOAST, read here.

Inside the application tables, the columns for large objects are defined as OIDs that point to data chunks inside the pg_largeobject table.

[Figure: application tables referencing large objects in pg_largeobject via OID columns]

Because large objects are created independently from the table columns that reference them, when you delete a row from the table that points to a large object, the large object itself is not deleted.

Moreover, pg_largeobject stores by design all the large objects that exist in the database.

This makes housekeeping and maintenance of this table crucial for database administration (we will see it in a future post).

How is space organized for large objects?

We will see it by examples. Let’s start with an empty database with empty pg_largeobject:

Just one block. Let’s see its file on disk:
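A sketch of how to locate the file (the data directory is environment-specific, and reading it requires a superuser):

```
LOBFILE=$(psql -At -c "SELECT current_setting('data_directory') || '/' || pg_relation_filepath('pg_largeobject')")
ls -l "$LOBFILE"
```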

First evidence: the file is empty, meaning that the first block is not created physically until there is some data in the table (like deferred segment creation in Oracle, except that the file exists).

Now, let’s create two 1MB files for our tests, one zero-padded and another random-padded:
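For example (the file names are arbitrary):

```
dd if=/dev/zero    of=/tmp/zero.dat   bs=1M count=1
dd if=/dev/urandom of=/tmp/random.dat bs=1M count=1
```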

Let’s import the zero-padded one:
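One way to do it (a sketch):

```
psql <<'SQL'
\lo_import /tmp/zero.dat
SELECT count(*) FROM pg_largeobject;
SQL
```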

The large objects are split into chunks of 2048 bytes each, hence we have 512 pieces. What about the physical size?
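A quick check:

```
psql -c "SELECT pg_relation_size('pg_largeobject');"
```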

Just 40k! This means that the chunks are compressed (like the TOAST pages). PostgreSQL uses the pglz_compress function; its algorithm is well explained in the source file src/common/pg_lzcompress.c.

What happens when we insert the random-padded file?
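Again a sketch:

```
psql <<'SQL'
\lo_import /tmp/random.dat
SELECT pg_relation_size('pg_largeobject');
SQL
```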

The segment increased by much more than 1MB! Precisely, 1441792-40960 = 1400832 bytes. Why?

The large object is split again into 512 data chunks of 2048 bytes each, and again PostgreSQL tries to compress them. But because a random string cannot be compressed, the pieces are still (on average) 2048 bytes big.

Now, a database block size is 8192 bytes. If we subtract the size of the block header, there is not enough space for 4 chunks of 2048 bytes. Every block will contain just 3 non-compressed chunks.

So, 512 chunks will be distributed over 171 blocks (CEIL(512/3.0)), which gives:
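The same arithmetic in the shell:

```
echo $(( (512 + 2) / 3 ))   # 171 blocks
echo $(( 171 * 8192 ))      # 1400832 bytes
```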

1400832 bytes!

Depending on the compression rate that we can apply to our large objects, we might expect much more or much less space used inside the pg_largeobject table.

Time for an additional RDBMS platform in this blog?

Since its creation (9 years ago), this blog has been almost exclusively Oracle-oriented. But during my career I have worked a lot with other RDBMS technologies: SQL Server, MySQL (and its forks), Sybase, PostgreSQL, Progress. Some posts in this blog prove it.

Especially in the last two years, I have worked a lot with PostgreSQL. In the last few months I have seen many friends and technologists becoming more curious about this product. So I think that I will, gently, start blogging about my experiences with PostgreSQL as well.

Stay tuned if you are interested!

Italian Oracle User Group: from nothing to something

I am pretty sure that every user group has had its own difficulties when starting. Since ITOUG started just a couple of years ago, I think the story is worth telling 🙂

Back in 2013 I was told about a team of Italian DBAs willing to create a new user group. I decided to join the team because I was starting my journey in the Oracle community.

We were coming from different cities (Lausanne, Vicenza, Milano, Roma)… the only solution was to meet up online, through Google Hangouts.

The first meetings were incredibly boring and inconclusive (sorry ITOUG guys if you read this :-)):

You cannot start something if you do not know what you want to achieve

Of course, everybody agreed that it would have been nice to become the new UKOUG of southern Europe. But being realistic: no budget, no spare time, nothing at all; we needed a starting point. We thought that the starting point was a website. But even something as simple as a basic website raises a lot of additional questions:

  • Should it allow publishing content?
  • Should it have a bulletin board?
  • What about user subscription?
  • What should be the content? Articles, webinars?

We created something with the idea to publish our content, but after a while it was mostly an empty container.

It took me a while to learn a first big lesson:

Democracy does not work for small User Groups

The original founder of the group decided to quit ITOUG because nothing concrete was happening. Everybody was putting ideas on the table, but nothing was really getting done.

When my friend Björn Rost proposed that I candidate Milano for the OTN Tour, I jumped on the train: it was the best way to start something concrete. I somehow “forced” the OTN Tour candidature to happen, telling my peers: “I will do it; if you support me, thanks, otherwise I will do it alone”.

And the response was great!

Do not wait for something to happen. If you want it, take the lead and make it.

This is the biggest lesson I have learned from my involvement with user groups. Sometimes people cannot contribute because they do not know what to do or how to do it, or simply because they do not have time: there are a gazillion things more important than user groups: family, work, health… even hobbies sometimes are more important 🙂

If you have an idea, it’s up to YOU to transform it into something concrete.

 

After the acceptance, we had to prepare the event. We proposed a few dates and Björn prepared the calendar taking into account the other user groups.

Organizing an event is not easy but not so complex either

  • Set a reasonable target number of participants
  • Fix a date when most contributors are available
  • Get offers from different venues (pricing for the venue and the catering)
  • Get an idea of the budget
  • Ask different companies to sponsor the event
  • If needed, ask Oracle 🙂
  • Once the sponsors are committed to pay, block the venue
  • Prepare and publish the Call for Paper
  • Possibly start some advertising
  • Select the papers
  • Prepare the agenda
  • Ask the speakers for confirmation about the proposed date/time
  • Prepare the event registration form
  • Publish the agenda
  • Broadcast it everywhere 🙂 (Social media, contacts, website, Oracle through their channels)
  • Interact with the hotel and the sponsor to have the proper setup, billing addresses, invoices, etc.
  • Host the event
  • Relax

Finding the sponsors is the most difficult part

It was easy to find a sponsor for the first two events (just a database stream, events held in Milano), but it was not the same for the last one.

Our aim was to run a double event (Milano + Roma) with two streams in each location (DB + BI & Analytics). In Roma we were unable to find a sponsor (if you read this AND your company may be interested in sponsoring such an event in Roma, please contact me :-)), so we decided to continue with the event in Milano only.

Finding the speakers is easier than you can imagine

Unless you want non-English sessions held by native speakers, there is a huge community of speakers willing to share their knowledge. For Oracle, the most obvious source of speakers is the ACE Program, and Twitter is probably the best channel for finding them.

This was the third time we organized an event, and every time we have been surprised by the good attendance and feedback.

A few images from the last event


Which Oracle Databases use most CPU on my server?

Assumptions

  • You have many (hundreds of) instances and more than a couple of servers
  • One of your servers has a high CPU load
  • You have Enterprise Manager 12c, but the Database Load page does not filter by server
  • You want a historical representation of the user CPU utilization, per instance

Getting the data from the EM Repository

With the following query, connected to the SYSMAN schema of your EM repository, you can get the hourly max() and/or avg() of user CPU by instance and time.
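The original query is not reproduced here; the following is a hedged sketch against the SYSMAN views. The metric filters, the host name and the SYSMAN_PWD environment variable are assumptions: list the CPU-related metrics available in your repository first and adjust the query accordingly.

```
# Sketch: hourly CPU metric per database instance running on a given host,
# from the EM repository (metric filters and host name are assumptions).
sqlplus -s sysman/"$SYSMAN_PWD"@EMREP <<'SQL'
-- first, check which CPU-related metrics are collected for your targets:
--   SELECT DISTINCT metric_name, metric_column FROM mgmt$metric_hourly
--   WHERE  target_type = 'oracle_database' AND lower(metric_column) LIKE '%cpu%';
SELECT m.rollup_timestamp,
       m.target_name,
       MAX(m.maximum) AS max_cpu,
       AVG(m.average) AS avg_cpu
FROM   mgmt$metric_hourly m
       JOIN mgmt$target t ON t.target_name = m.target_name
                         AND t.target_type = m.target_type
WHERE  t.target_type = 'oracle_database'
AND    t.host_name   = 'myserver01.domain.com'          -- the loaded server
AND    lower(m.metric_column) LIKE '%cpu%'               -- adjust to the exact metric
AND    m.rollup_timestamp > SYSDATE - 30
GROUP  BY m.rollup_timestamp, m.target_name
ORDER  BY m.rollup_timestamp, m.target_name;
SQL
```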

Suppose you select just the max value: the result will be a simple list of timestamps, instance names and CPU values.

 

Putting it into Excel

There are a million ways to do something more reusable than Excel (rrdtool scripts, gnuplot, R, you name it), but Excel is just right for most people out there (including me when I feel lazy).

  • Configure an Oracle Client and add the ODBC data source to the EM repository:

[Screenshot: ODBC data source pointing to the EM repository]

  • Open Excel, go to “Data” – “Connections” and add a new connection:
    • Search…
    • New Source
    • DSN ODBC
  • Select your new ODBC data source, user, password
  • Uncheck “Connection to a specific table”
  • Give a name and click Finish
  • On the DSN -> Properties -> Definition, enter the SQL text I have provided previously

[Screenshot: Excel connection properties using the ODBC data source]

The result should be something similar to this (but much longer :-)):

[Screenshot: the query result loaded in Excel]

Pivoting the results

Create a new sheet and name it “pivot”, click on “Create Pivot Table”, and select your data and your dimensions:

[Screenshot: pivot table configuration]

The result:

[Screenshot: the pivoted data]

Creating the Graph

Now that the data is correctly formatted, it’s easy to add a graph:

Just select the entire pivot table and create a new stacked area graph.

The result will be similar to this:

[Figure: stacked area graph of per-instance CPU usage over time]

With such a graph, it is easy to spot which databases consumed the most CPU on the system in a given period, and to track progress if you start a “performance campaign”.

For example, you can see that the “green” and “red” databases were constantly consuming some CPU up to 17.05.2017, and then some magic solved the CPU problem for those instances.

It is also quite convenient for checking the results of new instance caging settings…

The resulting CPU will not necessarily add up to 100%: the SYS CPU time is not included, nor is the user CPU of all the other processes that are either not databases or not monitored with Enterprise Manager.

HTH

Ludovico