About Ludovico

Ludovico is a member of the Oracle Database High Availability (HA), Scalability & Maximum Availability Architecture (MAA) Product Management team at Oracle. He focuses on Oracle Data Guard, Flashback technologies, and Cloud MAA.

Video: Where should I put the Observer in a Fast-Start Failover configuration?

The video explains best practices and different failure scenarios for different observer placements. It also shows how to configure high availability for the observer.

Here’s the summary:

  • Always try to put the observer(s) on an external site.
  • If you don’t have an external site, put the observer where the primary database is, and have another one ready to start on the secondary site after a role transition.
  • Don’t put the observer together with the standby database!
  • Configure multiple observers for high availability, and use the PreferredObserverHosts Data Guard member property to ensure you never run the observer where the standby database is.

 

Find Ludovico at Oracle Cloud World 2022!

Are you attending OCW, and do you want to find me to learn more about how to avoid downtime and data loss? Or how to optimize your application configuration to make the most of MAA technologies? Or about any other database or technology-related topic?

Or maybe you would prefer just to chat and discuss life? Over a coffee, or tea? (or maybe beer?)

👇This is where you can find me during OCW.👇

Monday, October 17, 2022

6:30 PM – 10:00 PM – Customer Appreciation Event

Where: Mandalay Bay Shark Reef

This is an invitation-only event. If you are one of the lucky customers who have an invitation, let’s meet there! It will be fun to discuss technology, business, and life while watching sharks and enjoying a drink together.

Tuesday, October 18, 2022

2:00 PM – 4:30 PM – Oracle Maximum Availability Architecture with Oracle RAC and Active Data Guard

Where: CloudWorld Hub, Database booth DB-01

Come by and ask anything about Data Guard, Active Data Guard, RAC, FPP, or High Availability! See some products in action, and get some insights from my colleagues and me. The booth will be open during the whole exhibition time, but I will certainly be there on Tuesday for these two hours.

4:00 PM – 5:30 PM – Protect Your Business Using Oracle Full Stack Disaster Recovery Service – Interactive Hands-On-Lab [HOL4089]

Where: Bellini 2003, The Venetian, Level 2

I will help my colleague Suraj Ramesh run the hands-on lab of this brand-new (actually, still to be released!) service for general-purpose Disaster Recovery in the cloud.

After HOL4089 until 7:00 PM – Welcome Reception

Where: CloudWorld Hub, Database booth DB-01

I will probably join to say hello during the Welcome Reception. Maybe you can spot me there 🙂

Wednesday, October 19, 2022

10:00 AM – 12:00 PM – Oracle Maximum Availability Architecture with Oracle RAC and Active Data Guard

Where: CloudWorld Hub, Database booth DB-01

I will be there once again to answer all your questions and show some fancy stuff 🙂

1:15 PM – 2:00 PM – Oracle Data Guard—Active, Autonomous, and Always Protective [LRN3528]

Where: San Polo 3403, The Venetian, Level 3

I will talk about Data Guard, Active Data Guard, and what I consider the most important features today. Come to the session to learn more!

3:00 PM – 4:30 PM – Protect Your Data with Oracle Active Data Guard – Interactive Hands-On-Lab [HOL4054]

Where: Bellini 2003, The Venetian, Level 2

I will run this hands-on lab. You will have an Active Data Guard 19c configuration in the cloud at your fingertips, and you will play with role changes, corruption detection and repair, and other features. I will be there to share insights, hints, and recommendations on how to implement it in your work environment.

Thursday, October 20, 2022

11:40 AM – 12:00 PM – The Least-Known Facts About Oracle Data Guard and Oracle Active Data Guard [LIT4029]

Where: Ascend Lounge, CloudWorld Hub, The Venetian

This will be great! I bet you will discover MANY things that you did not know about Data Guard and Active Data Guard. Come and learn more!

 

See you there!

Ludovico

Check, check… Does the mic still work? #JoelKallmanDay

Update PHP:
Update WordPress:
New content:

It has been almost six months since I last blogged. What a bad score!
It’s not a coincidence that I’m blogging today during #JoelKallmanDay.
A day that reminds the community how important it is to share. Knowledge, mostly. But also good and bad experiences, emotions…

A bittersweet day, at least for me.
On the bitter side: it reminds me of Joel, Pieter, and other friends who are no longer with us. It also reminds me that, as a Product Manager, I have big shoes to fill, and no matter how well I try to do, I will always feel that it’s not good enough for the high expectations that I set for myself. Guess what! Being a PM is way more complicated than I expected when I applied for the position two years ago. So many things to do or learn, so many requests, and so many customers! And being a PM at Oracle is probably twice as complicated because, no matter how well I (or we as a team) try to do, there will always be a portion of the community that picks on Oracle technology for one reason or another.

On the bright side: it reminds me that I am incredibly privileged to have this role, working in a great team and helping the most demanding customers to get the most out of incredible technology. I love sharing, teaching, giving constructive feedback, producing quality content, and improving the customer experience. This is the sweet part of the job, where I am still taking baby steps when comparing myself to the PM legends we have in our organization. They are always glad to explain our products to the community, the customers, and colleagues! And they are all excellent mentors, each with a different style, background, and personal life.

And knowing people personally is, at least for me, the best thing about being part of a community (outside Oracle) and a team (inside Oracle). We all strive for the best technical solutions, performance, developer experience, or uptime for the business. But we are human first of all. And this is what #JoelKallmanDay stands for, at least to me: trying to be a better human as a goal so that everything else comes naturally, including being a great colleague, community servant, or friend.

Far Sync and Fast-Start Failover Protection modes

Oracle advertises Far Sync as a solution for “Zero Data Loss at any distance”. This is because the primary sends its redo stream synchronously to the Far Sync, which relays it to the remote physical standby.

There are many reasons why Far Sync is an optimal solution for this use case, but that’s not the topic of this post 🙂

Some customers ask: Can I configure Far Sync to receive the redo stream asynchronously?

Although a direct standby receiving asynchronously would be a better idea, Far Sync can receive asynchronously as well.

And one reason might be to send asynchronously to one Far Sync member that redistributes locally to many standbys.

It is very simple to achieve: just change the RedoRoutes property on the primary.
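
As a minimal sketch of that change, the shell helper below only builds the DGMGRL statement and prints it; it does not run anything. The helper name redo_routes_cmd is made up for illustration, and the member names in the usage comment are placeholders to replace with your own.

```shell
#!/bin/sh
# Hypothetical helper: builds (but does not run) the DGMGRL statement that
# sets RedoRoutes on the primary so that it ships redo to a Far Sync member.
# Arguments: primary name, Far Sync name, transport mode (SYNC|FASTSYNC|ASYNC).
redo_routes_cmd() {
  primary=$1 farsync=$2 mode=$3
  case "$mode" in
    SYNC|FASTSYNC|ASYNC) ;;
    *) echo "unsupported transport mode: $mode" >&2; return 1 ;;
  esac
  printf "EDIT DATABASE %s SET PROPERTY RedoRoutes = '(LOCAL : %s %s)';\n" \
    "$primary" "$farsync" "$mode"
}

# Usage (placeholder member names); review the statement before passing it to dgmgrl:
#   redo_routes_cmd cdgsima_lhr1pq cdgsima_farsync1 ASYNC
```

Printing the statement instead of executing it keeps the sketch safe to try on any machine, even one without an Oracle client.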

This will work seamlessly. The v$dataguard_process view will show the async transport process:

 

What about Fast-Start Failover?

Up to and including 19c, ASYNC transport to Far Sync will not work with Fast-Start Failover (FSFO).

ASYNC redo transport mandates Maximum Performance protection mode, and FSFO supports that in conjunction with Far Sync only starting with 21c.

Before 21c, trying to enable FSFO with a Far Sync will fail with:

So if you want FSFO with Far Sync in 19c, it has to be Maximum Availability (with SYNC redo transport to the Far Sync).


If you don’t need FSFO, as we have seen, there is no problem. The only protection mode that will not work with Far Sync is Maximum Protection:

If FSFO is required, and you want Maximum Performance before 21c, or Maximum Protection, you have to remove Far Sync from the redo route.
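
The rules above can be summarized as a small decision function. This is only a sketch that encodes the statements made in this post for a configuration with a Far Sync in the redo route; the function name and the mode spellings are assumptions chosen for illustration, and it assumes a release that supports Far Sync in the first place.

```shell
#!/bin/sh
# Hypothetical helper: can FSFO be enabled when a Far Sync is in the redo route?
# Arguments: database major version (e.g. 19 or 21), protection mode.
# Prints "yes" or "no"; returns 1 for an unknown protection mode.
fsfo_with_farsync() {
  version=$1 mode=$2
  case "$mode" in
    MaxAvailability) echo yes ;;              # SYNC transport to the Far Sync
    MaxPerformance)                           # ASYNC transport: 21c onwards only
      if [ "$version" -ge 21 ]; then echo yes; else echo no; fi ;;
    MaxProtection) echo no ;;                 # never works with Far Sync
    *) echo "unknown protection mode: $mode" >&2; return 1 ;;
  esac
}

# Usage:
#   fsfo_with_farsync 19 MaxPerformance
```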

Ludovico

Can a physical standby database receive the redo SYNC if the Far Sync instance fails?

The answer is YES.

In the following configuration, cdgsima_lhr1pq (primary) sends synchronously to cdgsima_farsync1 (far sync), which forwards the redo stream asynchronously to cdgsima_lhr1bm (physical standby):

But if cdgsima_farsync1 is not available, I want the primary to send synchronously to the physical standby database. I accept a performance penalty, but I do not want to compromise my data protection.

I just need to set up the RedoRoutes as follows:

This is defined in the second part of the RedoRoutes rules:
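
As a sketch of what such a rule set can look like, the helper below prints two DGMGRL statements. It assumes the broker's PRIORITY attribute for grouped destinations in RedoRoutes, with the Far Sync preferred and the physical standby as the SYNC fallback. The helper name is made up, and the exact statements should be checked against the broker documentation for your release before use.

```shell
#!/bin/sh
# Hypothetical helper: prints (but does not run) DGMGRL statements for a primary
# that prefers the Far Sync (PRIORITY=1) and falls back to sending SYNC directly
# to the physical standby (PRIORITY=2) when the Far Sync is unavailable.
# Arguments: primary name, Far Sync name, physical standby name.
fallback_routes_cmds() {
  primary=$1 farsync=$2 standby=$3
  # Primary: one destination group, ordered by priority.
  printf "EDIT DATABASE %s SET PROPERTY RedoRoutes = '(LOCAL : (%s SYNC PRIORITY=1, %s SYNC PRIORITY=2))';\n" \
    "$primary" "$farsync" "$standby"
  # Far Sync: forward the primary's redo asynchronously to the standby.
  printf "EDIT FAR_SYNC %s SET PROPERTY RedoRoutes = '(%s : %s ASYNC)';\n" \
    "$farsync" "$primary" "$standby"
}

# Usage with the member names from this post:
#   fallback_routes_cmds cdgsima_lhr1pq cdgsima_farsync1 cdgsima_lhr1bm
```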

Let’s test. If I shut down the Far Sync instance with shutdown abort:

I can see the new SYNC destination being opened almost instantaneously (because the old destination fails immediately with ORA-03113):

Indeed, I can see the new NSS process (synchronous redo transport) being spawned at that time:

Ludo

Can I rename a PDB in a Data Guard configuration?

Someone asked me this question recently.

The answer is: yes!

Let’s see it in action.

On the primary I have:

And of course the same PDBs on the standby:

Let’s change the name of the PDB RED to TOBY.

The PDB rename operation is straightforward (but it requires a brief downtime). It has to be done on the primary:
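
As a hedged sketch of the usual rename sequence, the helper below prints the SQL statements without running them. It assumes a single-instance primary and a privileged session starting in the CDB root; the helper name is made up for illustration, and the statements are a generic outline rather than the exact commands used here.

```shell
#!/bin/sh
# Hypothetical helper: prints the SQL statements for renaming a PDB.
# The PDB is reopened in restricted mode, renamed from inside the PDB,
# then restarted normally (this is the brief downtime).
# Arguments: current PDB name, new PDB name.
pdb_rename_sql() {
  old=$1 new=$2
  cat <<EOF
ALTER PLUGGABLE DATABASE $old CLOSE IMMEDIATE;
ALTER PLUGGABLE DATABASE $old OPEN RESTRICTED;
ALTER SESSION SET CONTAINER = $old;
ALTER PLUGGABLE DATABASE RENAME GLOBAL_NAME TO $new;
ALTER PLUGGABLE DATABASE CLOSE IMMEDIATE;
ALTER PLUGGABLE DATABASE OPEN;
EOF
}

# Usage: review the output, then run it in SQL*Plus on the primary:
#   pdb_rename_sql RED TOBY
```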

On the standby, I can see that the PDB changed its name:

The PDB name change is propagated transparently with the redo apply.

Ludo

rhpctl addnode gihome: specify HUB or LEAF when adding new nodes to a Flex Cluster

I have a customer trying to add a new node to a cluster using Fleet Patching and Provisioning.

The error in the command output is not very friendly:

The “RHPHELP_preNodeAddVal” might already give an idea of the cause: something related to the “cluvfy stage -pre nodeadd” evaluation that we normally do when adding a node by hand. FPP does not really run cluvfy, but it calls the same primitives cluvfy is based on.

In FPP, when the error does not give any useful information, this is the flow to follow:

  • Use “rhpctl query audit” to get the date and time of the failing operation.
  • Open “rhpserver.log.0” and look for the operation log in that time frame.
  • Get the UID of the operation, e.g., in the following line it is “1556344143”:

  • Isolate the log for the operation: grep "$OP_UID" rhpserver.log.0 > "$OP_UID.log" (with OP_UID set to the operation UID; the name UID itself is a read-only variable in bash).
  • Locate the trace file of the rhphelper remote execution:

  • Find the root cause in the rhphelper trace:

In this case, the target cluster is a Flex Cluster, so the command must be run specifying the node_role.

The documentation is not clear (we will fix it soon):

node_role must be specified for Flex Clusters, and it must be either HUB or LEAF.

After using the correct command line, the command succeeded.
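
For reference, the log-isolation step of the troubleshooting flow above can be wrapped in a small shell function. This is a minimal sketch: the function name is made up, and it assumes you have already found the operation UID in rhpserver.log.0.

```shell
#!/bin/sh
# Hypothetical helper: copies every line that mentions one operation UID from
# an FPP server log into <uid>.log in the current directory.
# Arguments: operation UID, path to the server log (e.g. rhpserver.log.0).
isolate_op_log() {
  op_uid=$1 logfile=$2
  # -F matches the UID as a fixed string; "--" guards against odd values.
  # The variable is not called UID because bash reserves that name.
  grep -F -- "$op_uid" "$logfile" > "$op_uid.log"
}

# Usage:
#   isolate_op_log 1556344143 rhpserver.log.0
```

The function returns grep's exit status, so a non-zero result means that no line in the log mentioned that UID.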

HTH

Ludovico

Changing FPP temporary directory (/tmp in noexec and other issues)

When using FPP, you might experience the following error (PRVF-7546):

This is often because the /tmp filesystem is mounted with the “noexec” option:
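
One way to confirm this on Linux is to look at the mount options of /tmp. The sketch below uses two made-up helper names and reads /proc/mounts. It only looks at /tmp when it is its own mount point; if /tmp is just a directory on another filesystem, the lookup prints nothing and the parent filesystem has to be checked instead.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a comma-separated mount option list
# (the 4th field of /proc/mounts) contains the "noexec" option.
has_noexec() {
  case ",$1," in
    *,noexec,*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: prints the mount options of a mount point (Linux only).
# Prints an empty line if the path is not a mount point.
mount_opts() {
  awk -v mp="$1" '$2 == mp { opts = $4 } END { print opts }' /proc/mounts
}

# Usage:
#   if has_noexec "$(mount_opts /tmp)"; then echo "/tmp is mounted noexec"; fi
```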

Although it is tempting to just remount the filesystem with “exec”, you might be in this situation because your systems are configured to adhere to the STIG recommendations:

The noexec option must be added to the /tmp partition (https://www.stigviewer.com/stig/red_hat_enterprise_linux_6/2016-12-16/finding/V-57569)

FPP 19.9 contains fix 30885598 that allows specifying the temporary location for FPP operations:

After that, the operation should run smoothly:

HTH

Ludo

Why do PMs ask you to open Service Requests for almost EVERYTHING?

If you attend Oracle-related events or if you are active on Twitter or other social media used by technologists, you might know many of us Product Managers directly. If that is the case, you know that we are, in general, very easy to reach and always happy to help.

When you contact us directly, however, sometimes we answer “Please open a SR for that”. Somewhat irritating, huh? “We had chats and drinks together at conferences and now this bureaucracy?” This is understandable. Who likes opening SRs after all? Isn’t it just easier to forward that e-mail internally and get the answer first hand?

This is something that happened to me as well in the past, when I was not yet working for Oracle, and it still happens now (with the answer coming from me, as a PM).

Why? The first answer is “it depends on the question”. If it is anything that we can answer directly, we will probably do it.

It might be a question about a specific feature (“Does product X support Y?”, “Can you add this feature to your product?”), a known problem for which the PM already knows the bug (in that case, it is just a matter of looking up the bug number), or anything that is relatively easy to answer: “What are the best practices for X?”, “Do you have a paper explaining that?”, “Does this bug have a fix already?”

But there is a plethora of questions for which we need more information.

“I try this, but it does not work”. “I get this error and I think it is a bug”. “I have THIS performance problem”.

This is when I’d personally ask you to open a SR most of the time (unless I have a quick answer to give). And there are a few reasons:

Data protection

Oracle takes data protection very seriously. Oracle employees are trained to deal with potentially sensitive data and cannot forward customer information via e-mail: it could be exposed or forwarded to the wrong recipients by mistake. We don’t ask for TFA collections or logs via e-mail (even if sometimes customers send them to us anyway…).

Special privileges are required to access customer SRs, and that’s the only secure way we provide to transfer logs and protected information. The files uploaded into the SRs must be accessed through a specific application. All the checkouts and downloads are tracked. When we need to forward customer information internally, we just specify the SR number and let our colleagues access the information themselves. Sometimes we use SRs just as a placeholder to exchange data with customers, without having a support engineer working on them.

This is the single most important point that somehow makes the other points irrelevant. But still the remaining ones are good points.

Important pieces in the discussion do not get lost

The answer does not always come first-hand… it might take 3-4 hops (sometimes more), with analysis, comments, explanations, and discussions along the way.

E-mail is not a good tool for this. Long threads can split and include just part of the audience (the “don’t reply to all” effect). Attachments are deleted when replying instead of forwarding… and pieces get lost.

This is where you would use a Jira, or a trouble ticketing system. Guess which is the one that Oracle uses for its customers? 🙂

MOS has internal views to dig into TFA logs (that’s why it is a good idea to provide one, whenever it might be relevant), and all the attachments, comments and internal discussions are centralized there. But we need a SR to add information to!

Win-win: knowledge base, feedback, continuous improvement

If you discover something new from a technical discussion, what do you do? Do you share it or do you keep it for yourself? MOS is part of our knowledge base and it is a good idea to store important discussions in it. Support engineers can find solutions in SRs with similar cases. It is also a good opportunity for the support engineers themselves to be involved in one more interesting discussion, so next time they might have the answer at their fingertips.

To conclude, think about it as a win-win. You give us interesting problems that might help improve the product, and you get a Guardian Angel on your SR for free 😉

Ludo