DBA survival BLOG

DBA stuff and Oracle Data Guard

New views in Oracle Data Guard 23c

Posted on January 3, 2024 by Ludovico

Oracle Data Guard 23c comes with many nice improvements for observability, which greatly increase the usability of Data Guard in environments with a high level of automation.

For the 23c version, we have the following new views.

V$DG_BROKER_ROLE_CHANGE

This view tracks the last role transitions that occurred in the configuration. Example:

SQL> select * from v$dg_broker_role_change;

EVENT         STANDBY_TYPE    OLD_PRIMARY       NEW_PRIMARY       FS_FAILOVER_REASON    BEGIN_TIME                         END_TIME                              CON_ID
_____________ _______________ _________________ _________________ _____________________ __________________________________ __________________________________ _________
Switchover    Physical        adghol_53k_lhr    adghol_p4n_lhr                          18-DEC-23 10.40.12.000000000 AM    18-DEC-23 10.40.32.000000000 AM            0
Switchover    Physical        adghol_p4n_lhr    adghol_53k_lhr                          18-DEC-23 10.48.55.000000000 AM    18-DEC-23 10.49.15.000000000 AM            0

The event might be a Switchover, Failover, or Fast-Start Failover.

In the case of Fast-Start Failover, you will see the reason: typically “Primary Disconnected” if it comes from the observer, or whatever reason you passed to DBMS_DG.INITIATE_FS_FAILOVER.

No more need to analyze the logs to find out which database was primary at any moment in time!
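As a minimal sketch of that kind of lookup, the query below asks which database held the primary role at a given moment. It uses only the columns visible in the output above. The timestamp literal is a placeholder, and the query assumes that END_TIME marks the moment the new primary took over.

```sql
-- Sketch: which database was primary at a given point in time?
-- The timestamp literal is a placeholder; replace it with the moment of interest.
select new_primary as primary_db,
       event,
       end_time    as primary_since
from   v$dg_broker_role_change
where  end_time <= timestamp '2023-12-18 10:45:00'
order  by end_time desc
fetch  first 1 row only;
```

Against the two rows of the example, the latest transition completed before 10:45 is the first switchover, so the query should report adghol_p4n_lhr. If no row comes back, the moment predates the transitions that the view still tracks.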

V$DG_BROKER_PROPERTY

Before 23c, the only possible way to get a broker property from SQL was to use undocumented (unsupported) procedures in the fixed package DBMS_DRS. I’ve blogged about it in the past, before joining Oracle.

Now, it’s as easy as selecting from a view, where you can get the properties per member or per configuration:

SQL> select member, property, value from V$DG_BROKER_PROPERTY where value is not null;

MEMBER      PROPERTY                        VALUE
___________ _______________________________ _________
mydb        FastStartFailoverThreshold      180
mydb        OperationTimeout                30
mydb        TraceLevel                      USER
mydb        FastStartFailoverLagLimit       300
mydb        CommunicationTimeout            180
mydb        ObserverReconnect               0
mydb        ObserverPingInterval            0
mydb        ObserverPingRetry               0
mydb        FastStartFailoverAutoReinstate  TRUE
mydb        FastStartFailoverPmyShutdown    TRUE
...
mydb_site1  DGConnectIdentifier             mydb_site1
mydb_site1  FastStartFailoverTarget         mydb_site2
mydb_site1  LogShipping                     ON
mydb_site1  LogXptMode                      ASYNC
mydb_site1  DelayMins                       0
...
mydb_site1  StaticConnectIdentifier         (DESCRIPTION=<...>)))
mydb_site1  TopWaitEvents                   (monitor)
mydb_site1  SidName                         (monitor)
mydb_site2  DGConnectIdentifier             mydb_site2
mydb_site2  FastStartFailoverTarget         mydb_site1

The example selects just three columns, but the view is rich in detailing which properties apply to which situation (scope, valid_role):

SQL> set sqlformat json-formatted
SQL> select * from v$dg_broker_property where member='adghol_p4n_lhr' and upper(property) like '%REDO%';
{
  "results" : [
    {
    ...
      "items" : [
        {
          "member" : "adghol_p4n_lhr",
          "instance" : "N/A",
          "dataguard_role" : "PHYSICAL STANDBY",
          "property" : "PreferredObserverHosts",
          "property_type" : "CONFIGURABLE",
          "value" : "",
          "value_type" : "STRING",
          "scope" : "MEMBER",
          "valid_role" : "N/A",
          "con_id" : 0
        },
        {
          "member" : "adghol_p4n_lhr",
          "instance" : "N/A",
          "dataguard_role" : "PHYSICAL STANDBY",
          "property" : "RedoRoutes",
          "property_type" : "CONFIGURABLE",
          "value" : "",
          "value_type" : "STRING",
          "scope" : "MEMBER",
          "valid_role" : "N/A",
          "con_id" : 0
        },
        {
          "member" : "adghol_p4n_lhr",
          "instance" : "N/A",
          "dataguard_role" : "PHYSICAL STANDBY",
          "property" : "RedoCompression",
          "property_type" : "CONFIGURABLE",
          "value" : "DISABLE",
          "value_type" : "STRING",
          "scope" : "MEMBER",
          "valid_role" : "STANDBY",
          "con_id" : 0
        }
      ]
    }
  ]
}
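For a quick tabular look at the same metadata, a sketch like the following lists the configurable properties of one member together with their scope and valid role. It relies only on column names and values visible in the JSON output above; the member name is the one from the example.

```sql
-- Sketch: configurable properties of one member, with the role they apply to.
-- In SQLcl, switch back to tabular output first, e.g.: set sqlformat ansiconsole
-- Column names and the 'CONFIGURABLE' value are taken from the JSON output above.
select property, value, value_type, scope, valid_role
from   v$dg_broker_property
where  member = 'adghol_p4n_lhr'
and    property_type = 'CONFIGURABLE'
order  by scope, property;
```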

The monitorable properties can be read using DBMS_DG.GET_PROPERTY(). I’ll write a blog post about the new PL/SQL APIs in the upcoming weeks.

I wish I had this view when I was a DBA 🙂

V$FAST_START_FAILOVER_CONFIG

If you have a Fast-Start Failover configuration, this view will show its details:

SQL> SELECT fsfo_mode, status, current_target, threshold, observer_present, observer_host,
 2> protection_mode, lag_limit, auto_reinstate, observer_override, shutdown_primary FROM V$FAST_START_FAILOVER_CONFIG;

FSFO_MODE           STATUS                 CURRENT_TARGET THRESHOLD OBSERVE OBSERVER_HOST PROTECTION_MODE  LAG_LIMIT AUTO_ OBSER SHUTD
___________________ ______________________ ______________ _________ _______ _____________ ________________ _________ _____ _____ _____
POTENTIAL DATA LOSS TARGET UNDER LAG LIMIT mydb_site2           180 YES     mydb-obs      MaxPerformance         300 TRUE  FALSE TRUE

This view replaces some columns currently in v$database, which are therefore deprecated:

SQL> desc v$database

Name                            Null?    Type
_______________________________ ________ ________________
...
FS_FAILOVER_MODE                         VARCHAR2(19)
FS_FAILOVER_STATUS                       VARCHAR2(22)
FS_FAILOVER_CURRENT_TARGET               VARCHAR2(30)
FS_FAILOVER_THRESHOLD                    NUMBER
FS_FAILOVER_OBSERVER_PRESENT             VARCHAR2(7)
FS_FAILOVER_OBSERVER_HOST                VARCHAR2(512)
...
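If existing scripts still read the deprecated columns, a sketch like this one returns the same information from the new view under the old names. The column pairing is an assumption inferred from the names in the two listings above, so check it against the reference documentation before relying on it.

```sql
-- Sketch: read the new view using the deprecated v$database column names as aliases.
-- The old-to-new mapping is inferred from the column names, not confirmed by the post.
select fsfo_mode        as fs_failover_mode,
       status           as fs_failover_status,
       current_target   as fs_failover_current_target,
       threshold        as fs_failover_threshold,
       observer_present as fs_failover_observer_present,
       observer_host    as fs_failover_observer_host
from   v$fast_start_failover_config;
```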

V$FS_LAG_HISTOGRAM

This view is useful to calculate the optimal FastStartFailoverLagLimit.

SQL> select * from v$fs_lag_histogram;

   THREAD# LAG_TYPE      LAG_TIME  LAG_COUNT LAST_UPDATE_TIME         CON_ID                        
---------- ----------- ---------- ---------- -------------------- ----------                        
         1 APPLY                5        122 01/23/2023 10:46:07           0                        
         1 APPLY               10          5 01/02/2023 16:12:42           0                        
         1 APPLY               15          2 12/25/2022 12:01:23           0                        
         1 APPLY               30          0                               0                        
         1 APPLY               60          0                               0                        
         1 APPLY              120          0                               0                        
         1 APPLY              180          0                               0                        
         1 APPLY              300          0                               0                        
         1 APPLY            65535          0                               0

It shows the frequency of Fast-Start Failover lags and the most recent occurrence for each bucket.

LAG_TIME is the upper bound of the bucket, e.g.

5 -> between 0 and 5 seconds
10 -> between 5 and 10 seconds
etc.

It’s refreshed every minute, only when Fast-Start Failover is enabled (also in observe-only mode).
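To turn the buckets into something easier to reason about when choosing a lag limit, a sketch like the one below adds a cumulative percentage per thread and lag type. It uses only the columns shown in the output above; the analytic SUM is standard Oracle SQL.

```sql
-- Sketch: cumulative share of samples at or below each LAG_TIME bucket.
-- NULLIF avoids a division by zero when no sample has been recorded yet.
select thread#,
       lag_type,
       lag_time,
       lag_count,
       round(100 * sum(lag_count) over (partition by thread#, lag_type order by lag_time)
                 / nullif(sum(lag_count) over (partition by thread#, lag_type), 0), 1) as cumulative_pct
from   v$fs_lag_histogram
order  by thread#, lag_type, lag_time;
```

With the sample rows above, 122 of the 129 recorded samples fall in the first bucket, so about 94.6% of the observed apply lags were 5 seconds or less, and all of them were within 15 seconds.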

V$FS_FAILOVER_OBSERVERS

This view is not new, but its definition now contains more columns:

SQL> desc  v$fs_failover_observers
 Name                           Null?    Type
 ------------------------------ -------- -----------------
 NAME                                    VARCHAR2(513)
 REGISTERED                              VARCHAR2(4)
 HOST                                    VARCHAR2(513)
 ISMASTER                                VARCHAR2(4)
 TIME_SELECTED                           TIMESTAMP(9)
 PINGING_PRIMARY                         VARCHAR2(4)
 PINGING_TARGET                          VARCHAR2(4)
 CON_ID                                  NUMBER
 
 -- new in 23c:
 LAST_PING_PRIMARY                       NUMBER
 LAST_PING_TARGET                        NUMBER
 LOG_FILE                                VARCHAR2(513)
 STATE_FILE                              VARCHAR2(513)
 CURRENT_TIME                            TIMESTAMP(9)

This gives important additional information about the observers, for example, how long ago (in seconds) a specific observer was last able to ping the primary or the target.

Also, the paths of the log file and the runtime data file are available, making it easier to find them on the observer host in case of a problem.
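A simple health check built on these columns could look like the sketch below. It selects only columns from the description above and makes no assumption about the values they contain.

```sql
-- Sketch: one line per observer, with ping information and file locations.
select name,
       host,
       ismaster,
       pinging_primary,
       pinging_target,
       last_ping_primary,
       last_ping_target,
       log_file,
       state_file
from   v$fs_failover_observers
order  by name;
```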

Conclusion

These new views should greatly improve the experience when monitoring or diagnosing problems with Data Guard. But they are just a part of many improvements we introduced in 23c. Stay tuned for more 🙂

—

Ludovico

New in Data Guard 21c and 23c: Automatic preparation of the primary

Posted on December 22, 2023 by Ludovico

Oracle Data Guard 21c came with a new command:

prepare database for data guard
with db_unique_name is {db_unique_name}
db_recovery_file_dest_size is "{size}"
db_recovery_file_dest is "{dest}" ;

This command prepares a database to become primary in a Data Guard configuration.

It sets many recommended parameters:

DB_FILES                      = 1024
LOG_BUFFER                    = 256M
DB_BLOCK_CHECKSUM             = TYPICAL
DB_LOST_WRITE_PROTECT         = TYPICAL
DB_FLASHBACK_RETENTION_TARGET = 120
PARALLEL_THREADS_PER_CPU      = 1
STANDBY_FILE_MANAGEMENT       = AUTO
DG_BROKER_START               = TRUE

It also sets the RMAN archivelog deletion policy, enables flashback and force logging, creates the standby logs according to the online redo log configuration, and creates an spfile if the database is running with an init file.

If you tried this in 21c, you may have noticed that the database restarts automatically to set all the static parameters. If you weren’t expecting it, the sudden restart could feel a bit brutal.

In 23c, we added the keyword “restart” to specify that you are OK with the restart of the database. If you don’t specify it, the broker will complain that it cannot proceed without a restart:

DGMGRL> prepare database for data guard
> with db_unique_name is chol23c_hwq_lhr
> db_recovery_file_dest_size is "200g"
> db_recovery_file_dest is "/u03/app/oracle/fast_recovery_area"
> ;
Validating database "cdb1" before executing the command.
  DGM-17552: Primary database must be restarted after setting static initialization parameters.
  DGM-17327: Primary database must be restarted to enable archivelog mode.
Failed.
DGMGRL>

If you specify it, it will proceed with the restart:

DGMGRL> prepare database for data guard
>   with db_unique_name is chol23c_hwq_lhr
>   db_recovery_file_dest_size is "200g"
>   db_recovery_file_dest is "/u03/app/oracle/fast_recovery_area"
>   restart;
Validating database "chol23c_hwq_lhr" before executing the command.
Preparing database "chol23c_hwq_lhr" for Data Guard.
Initialization parameter DB_FILES set to 1024.
Initialization parameter LOG_BUFFER set to 268435456.
Primary database must be restarted after setting static initialization parameters.
Shutting down database "chol23c_hwq_lhr".
Database closed.
Database dismounted.
ORACLE instance shut down.
Starting database "chol23c_hwq_lhr" to mounted mode.
ORACLE instance started.
Database mounted.
Initialization parameter DB_FLASHBACK_RETENTION_TARGET set to 120.
Initialization parameter DB_LOST_WRITE_PROTECT set to 'TYPICAL'.
RMAN configuration archivelog deletion policy set to SHIPPED TO ALL STANDBY.
Initialization parameter DB_RECOVERY_FILE_DEST_SIZE set to '200g'.
Initialization parameter DB_RECOVERY_FILE_DEST set to '/u03/app/oracle/fast_recovery_area'.
LOG_ARCHIVE_DEST_n initialization parameter already set for local archival.
Initialization parameter LOG_ARCHIVE_DEST_2 set to 'location=use_db_recovery_file_dest valid_for=(all_logfiles, all_roles)'.
Initialization parameter LOG_ARCHIVE_DEST_STATE_2 set to 'Enable'.
Adding standby log group size 1073741824 and assigning it to thread 1.
Adding standby log group size 1073741824 and assigning it to thread 1.
Adding standby log group size 1073741824 and assigning it to thread 1.
Initialization parameter STANDBY_FILE_MANAGEMENT set to 'AUTO'.
Initialization parameter DG_BROKER_START set to TRUE.
Database set to FLASHBACK ON.
Database opened.
Succeeded.
DGMGRL>

Notice that if you already have these static parameters set, the broker will just set the missing dynamic parameters without the need for a restart:

DGMGRL> prepare database for data guard
>   with db_unique_name is chol23c_hwq_lhr
>   db_recovery_file_dest_size is "200g"
>   db_recovery_file_dest is "/u03/app/oracle/fast_recovery_area"
> ;
Validating database "chol23c_hwq_lhr" before executing the command.
Preparing database "chol23c_hwq_lhr" for Data Guard.
Initialization parameter DB_RECOVERY_FILE_DEST_SIZE set to '200g'.
Initialization parameter DB_RECOVERY_FILE_DEST set to '/u03/app/oracle/fast_recovery_area'.
LOG_ARCHIVE_DEST_n initialization parameter already set for local archival.
Initialization parameter LOG_ARCHIVE_DEST_1 set to 'location=use_db_recovery_file_dest valid_for=(all_logfiles, all_roles)'.
Initialization parameter LOG_ARCHIVE_DEST_STATE_1 set to 'Enable'.
Succeeded.

This new command greatly simplifies the preparation of a Data Guard configuration!

Before 21c, you had to do everything by hand.
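After the command has run, a few queries are enough to double-check the result from SQL. This is only a sketch of a possible verification, not something the broker requires; it reads standard dictionary views and the parameter names listed earlier in this post.

```sql
-- Sketch: verify some of the settings that PREPARE DATABASE FOR DATA GUARD takes care of.

-- Archivelog mode, force logging and flashback
select log_mode, force_logging, flashback_on
from   v$database;

-- Standby redo log groups
select group#, thread#, bytes/1024/1024 as size_mb
from   v$standby_log
order  by group#;

-- Some of the recommended parameters
select name, value
from   v$parameter
where  name in ('db_files', 'db_block_checksum', 'db_lost_write_protect',
                'standby_file_management', 'dg_broker_start')
order  by name;
```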

—

Ludo

Does FLASHBACK QUERY work across incarnations or after a Data Guard failover?

Posted on December 13, 2023 by Ludovico

Short answer: yes.

Let’s just see it in action.

First, I have a Data Guard configuration in place. On the primary database, the current incarnation has a single parent (the template from which it has been created):

SQL> select * from v$database_incarnation;

INCARNATION# RESETLOGS_CHANGE# RESETLOGS PRIOR_RESETLOGS_CHANGE# PRIOR_RES
------------ ----------------- --------- ----------------------- ---------
STATUS  RESETLOGS_ID PRIOR_INCARNATION# FLASHBACK_DATABASE_ALLOWED     CON_ID
------- ------------ ------------------ -------------------------- ----------
           1                 1 14-AUG-23                       0
PARENT    1144840863                  0 NO                                  0

           2           1343420 08-DEC-23                       1 14-AUG-23
CURRENT   1155034180                  1 NO                                  0

Just to make room for some undo, I increase the undo_retention. On a PDB, that requires LOCAL UNDO to be configured (I hope it’s the default everywhere nowadays).

SQL> alter session set container=PDB1;

Session altered.

SQL> alter system set undo_retention=86400;

System altered.

Then, I update some data to test flashback query:

SQL> alter session set current_schema=HR;

Session altered.

SQL> update hr.employees set HIRE_DATE=sysdate where employee_id=100;

1 row updated.

SQL> commit;

Commit complete.

At this point, I can see the current data, and the data as it was 1 hour ago:

SQL> select hire_date from hr.employees where employee_id=100;

HIRE_DATE
---------
13-DEC-23

SQL> select hire_date from hr.employees as of timestamp systimestamp-1/24 where employee_id=100;

HIRE_DATE
---------
17-JUN-03

Now, I kill the primary database and fail over to the standby database:

# on the primary:
[ primary ] bash-4.4$ ps -eaf | grep pmon
lcaldara 1485907       1  0 10:29 ?        00:00:00 ora_pmon_orcl
lcaldara 1486768 1484883  0 10:37 pts/0    00:00:00 grep pmon
[ primary ] bash-4.4$ kill -9 1485907

# on the standby:
DGMGRL> connect /
Connected to "orcl_site2"
Connected as SYSDG.
DGMGRL> failover to "orcl_site2";
2023-12-13T10:38:31.179+00:00
Performing failover NOW, please wait...

2023-12-13T10:38:37.728+00:00
Failover succeeded, new primary is "orcl_site2".

2023-12-13T10:38:37.729+00:00
Failover processing complete, broker ready.
DGMGRL>

After connecting to the new primary, I can see the new incarnation due to the open resetlogs after the failover.

SQL> select * from v$database_incarnation;

INCARNATION# RESETLOGS_CHANGE# RESETLOGS PRIOR_RESETLOGS_CHANGE# PRIOR_RES
------------ ----------------- --------- ----------------------- ---------
STATUS  RESETLOGS_ID PRIOR_INCARNATION# FLASHBACK_DATABASE_ALLOWED     CON_ID
------- ------------ ------------------ -------------------------- ----------
           1                 1 14-AUG-23                       0
PARENT    1144840863                  0 NO                                  0

           2           1343420 08-DEC-23                       1 14-AUG-23
PARENT    1155034180                  1 NO                                  0

           3           2704078 13-DEC-23                 1343420 08-DEC-23
CURRENT   1155465511                  2 NO                                  0

And I can still query the data as of a previous timestamp:

SQL> select hire_date from hr.employees where employee_id=100;

HIRE_DATE
---------
13-DEC-23

SQL> select hire_date from hr.employees as of timestamp systimestamp-1/24 where employee_id=100;

HIRE_DATE
---------
17-JUN-03

Or flash back the table, if required:

SQL> flashback table hr.employees to timestamp sysdate-1/24;
flashback table hr.employees to timestamp sysdate-1/24
                   *
ERROR at line 1:
ORA-08189: cannot flashback the table because row movement is not enabled


SQL> alter table hr.employees enable row movement;

Table altered.

SQL> flashback table hr.employees to timestamp sysdate-1/24;

Flashback complete.

SQL> select hire_date from hr.employees where employee_id=100;

HIRE_DATE
---------
17-JUN-03

So yes, that works. The caveat is still that you need to retain enough data in the undo tablespace to rebuild the rows in their previous state.
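To get a feeling for how far back the undo can realistically take you, one option is to look at the undo statistics of the container. The sketch below is an addition to the test above, not part of it; it reads V$UNDOSTAT, where TUNED_UNDORETENTION and MAXQUERYLEN are expressed in seconds.

```sql
-- Sketch: recent undo statistics, one row per collection interval.
-- tuned_undoretention = retention the database is currently tuning for (seconds)
-- maxquerylen         = longest query observed in the interval (seconds)
select begin_time,
       end_time,
       maxquerylen,
       tuned_undoretention
from   v$undostat
order  by begin_time desc
fetch  first 6 rows only;
```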

—

Ludo

When it comes to using Oracle, trust Oracle…

Posted on July 14, 2023 by Ludovico

A month ago, I saw this article published on the AWS architecture blog:

Disaster Recovery for Oracle Database on Amazon EC2 with Fast-Start Failover

I love seeing people suggesting Oracle Data Guard Fast-Start Failover for high availability. Nevertheless, there are a few problems with the architecture and steps proposed in the article.

I sent my comments via Disqus on the AWS blogging platform, but after a month, my comment was rejected, and the blog content hasn’t changed.

For this reason, I have no other place to post my comments than here…

The link to the setup procedure is from 2009.
We have official documentation that we keep up to date. The Fast-Start Failover part:
https://docs.oracle.com/en/database/oracle/oracle-database/19/dgbkr/using-data-guard-broker-to-manage-switchovers-failovers.html#GUID-D26D79F2-0093-4C0E-98CD-224A5C8CBFA4
and the Best Practices guide:
https://docs.oracle.com/en/database/oracle/oracle-database/19/haovw/oracle-data-guard-best-practices.html#GUID-C3A78B07-6584-4380-8D53-E5B831A5894C
The part about cascading standbys references a step-by-step guide from an external blog written many years ago for 11gR2.
The DBMS_SERVICE doc is from 12cR1, while other links are from the 21c or 19c docs. As of today, most customers run 19c, so that’s probably the version to reference.
https://docs.oracle.com/en/database/oracle/oracle-database/19/arpls/DBMS_SERVICE.html#GUID-C11449DC-EEDE-4BB8-9D2C-0A45198C1928
The steps used to create the database service do not include any HA property, which will make most efforts useless. (see Table 153-6 in the link above).
The article talks about TAF, but no steps exist to configure it. We haven’t recommended TAF since 12c anyway. Today (19c), the recommendation is TAC (Transparent Application Continuity).
https://www.oracle.com/docs/tech/application-checklist-for-continuous-availability-for-maa.pdf
But, most importantly, TAF (or Oracle connectivity in general) does NOT require a host IP change! There is no need to change the DNS when using the recommended connection string with multiple address_lists.
Some RedoRoutes examples are not correct. In this video I explain how they work and how to set them up:
https://www.youtube.com/watch?v=huG8JPu_s4Q
The diagram shows the master observer together with the standby database, which is a bad practice. I explain why and how here:
https://www.youtube.com/watch?v=e81UPLfnLi0

The central message is:

If you need to implement a complex architecture using a software solution, make sure that the practices suggested by the partner/integrator/3rd party match the ones from the software vendor. In the case of Oracle Data Guard, Oracle knows better 😉

Cheers

—

Ludovico

Video: Where should I put the Observer in a Fast-Start Failover configuration?

Posted on November 29, 2022 by Ludovico

The video explains best practices and different failure scenarios for different observer placements. It also shows how to configure high availability for the observer.

Here’s the summary:

Always try to put the observer(s) on an external site.
If you don’t have any, put it where the primary database is, and have one ready on the secondary site after the role transition.
Don’t put the observer together with the standby database!
Configure multiple observers for high availability, and use the PreferredObserverHosts Data Guard member property to ensure you never run the observer where the standby database is.

Video: The importance of Fast-Start Failover in an Oracle Data Guard configuration

Posted on November 29, 2022 by Ludovico

Why is Fast-Start Failover a crucial component for mission-critical Data Guard deployments?
The observer lowers the RTO in case of failure, and the Fast-Start Failover protection modes protect the database from split-brain and data loss.

Far Sync and Fast-Start Failover Protection modes

Posted on April 14, 2022 by Ludovico

Oracle advertises Far Sync as a solution for “Zero Data Loss at any distance”. This is because the primary sends its redo stream synchronously to the Far Sync, which relays it to the remote physical standby.

There are many reasons why Far Sync is an optimal solution for this use case, but that’s not the topic of this post 🙂

Some customers ask: Can I configure Far Sync to receive the redo stream asynchronously?

Although a direct standby receiving asynchronously would be a better idea, Far Sync can receive asynchronously as well.

One reason might be to send asynchronously to a single Far Sync member that redistributes the redo locally to many standbys.

It is very simple to achieve: just change the RedoRoutes property on the primary.

RedoRoutes = '(LOCAL : cdgsima_farsync1 ASYNC)'

This will work seamlessly. The v$dataguard_process will show the async transport process:

NAME PID TYP ACTION          CLIENT_PID CLIENT_ROLE GROUP# RESETLOG_ID THREAD# SEQUENCE# BLOCK#
TT02 440 KSV async ORL multi          0 none             2  1098480879       1       146    456

What about Fast-Start Failover?

Up to and including 19c, ASYNC transport to Far Sync will not work with Fast-Start Failover (FSFO).

ASYNC redo transport mandates Maximum Performance protection mode, and FSFO supports that in conjunction with Far Sync only starting with 21c.

Before 21c, trying to enable FSFO in Maximum Performance mode with a Far Sync in the redo route fails with:

effective redo transport mode is incompatible with the configuration protection mode

DGMGRL> show fast_start failover

Fast-Start Failover:  Disabled

  Protection Mode:    MaxPerformance
  Lag Limit:          30 seconds

  Threshold:          30 seconds
  Active Target:      (none)
  Potential Targets:  "cdgsima_lhr1bm"
    cdgsima_lhr1bm invalid - effective redo transport mode is incompatible with the configuration protection mode
  Observer:           (none)
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: (none)
  Observer Override:  FALSE

Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                  NO
    Datafile Write Errors          YES

  Oracle Error Conditions:
    (none)

So if you want FSFO with Far Sync in 19c, the protection mode has to be MaxAvailability (with SYNC redo transport to the Far Sync).

If you don’t need FSFO, as we have seen, there is no problem. The only protection mode that does not work with Far Sync is Maximum Protection.

If FSFO is required, and you want Maximum Performance before 21c, or Maximum Protection, you have to remove Far Sync from the redo route.

—

Ludovico

Can a physical standby database receive the redo SYNC if the Far Sync instance fails?

Posted on April 7, 2022 by Ludovico

The answer is YES.

In the following configuration, cdgsima_lhr1pq (primary) sends synchronously to cdgsima_farsync1 (far sync), which forwards the redo stream asynchronously to cdgsima_lhr1bm (physical standby):

DGMGRL> show configuration verbose

Configuration - cdgsima

  Protection Mode: MaxPerformance
  Members:
  cdgsima_lhr1pq   - Primary database
    cdgsima_farsync1 - Far sync instance
      cdgsima_lhr1bm   - Physical standby database
    cdgsima_lhr1bm   - Physical standby database (alternate of cdgsima_farsync1)

  Members Not Receiving Redo:
  cdgsima_farsync2 - Far sync instance

But if cdgsima_farsync1 is not available, I want the primary to send synchronously to the physical standby database. I accept a performance penalty, but I do not want to compromise my data protection.

I just need to set up the RedoRoutes as follows:

-- when primary is cdgsima_lhr1pq 
EDIT DATABASE 'cdgsima_lhr1pq' SET PROPERTY 'RedoRoutes' = '(LOCAL : (cdgsima_farsync1 SYNC PRIORITY=1, cdgsima_lhr1bm SYNC PRIORITY=2 ))';
EDIT FAR_SYNC 'cdgsima_farsync1' SET PROPERTY 'RedoRoutes' = '(cdgsima_lhr1pq : cdgsima_lhr1bm ASYNC)';

-- when primary is cdgsima_lhr1bm
EDIT DATABASE 'cdgsima_lhr1bm' SET PROPERTY 'RedoRoutes' = '(LOCAL : (cdgsima_farsync2 SYNC PRIORITY=1, cdgsima_lhr1pq SYNC PRIORITY=2 ))';
EDIT FAR_SYNC 'cdgsima_farsync2' SET PROPERTY 'RedoRoutes' = '(cdgsima_lhr1bm : cdgsima_lhr1pq ASYNC)';

This is defined in the second part of the RedoRoutes rule:

cdgsima_lhr1bm SYNC PRIORITY=2

Let’s test. If I shutdown abort the farsync instance:

$ rlwrap sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Sat Mar 26 10:55:31 2022
Version 19.13.0.0.0

Copyright (c) 1982, 2021, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c EE Extreme Perf Release 19.0.0.0.0 - Production
Version 19.13.0.0.0

SQL> shutdown abort
ORACLE instance shut down.
SQL>

I can see the new SYNC destination being opened almost instantaneously (because the old destination fails immediately with ORA-03113):

2022-03-26T10:55:35.581460+00:00
LGWR (PID:42101): Attempting LAD:2 network reconnect (3113)
LGWR (PID:42101): LAD:2 network reconnect abandoned
2022-03-26T10:55:35.602542+00:00
Errors in file /u01/app/oracle/diag/rdbms/cdgsima_lhr1pq/cdgsima/trace/cdgsima_lgwr_42101.trc:
ORA-03113: end-of-file on communication channel
LGWR (PID:42101): Error 3113 for LNO:3 to 'dgsima1.dbdgsima.misclabs.oraclevcn.com:1521/cdgsima_farsync1.dbdgsima.misclabs.oraclevcn.com'
2022-03-26T10:55:35.608691+00:00
LGWR (PID:42101): LAD:2 is UNSYNCHRONIZED
2022-03-26T10:55:36.610098+00:00
LGWR (PID:42101): Failed to archive LNO:3 T-1.S-141, error=3113
LGWR (PID:42101): Error 1041 disconnecting from LAD:2 standby host 'dgsima1.dbdgsima.misclabs.oraclevcn.com:1521/cdgsima_farsync1.dbdgsima.misclabs.oraclevcn.com'
2022-03-26T10:55:37.143448+00:00
LGWR (PID:42101): LAD:3 is UNSYNCHRONIZED
2022-03-26T10:55:37.143569+00:00
LGWR (PID:42101): LAD:2 no longer supports SYNCHRONIZATION
Starting background process NSS3
2022-03-26T10:55:37.227954+00:00
NSS3 started with pid=38, OS id=78251
2022-03-26T10:55:40.733905+00:00
Thread 1 advanced to log sequence 142 (LGWR switch),  current SCN: 8068734
  Current log# 1 seq# 142 mem# 0: /u03/app/oracle/redo/CDGSIMA_LHR1PQ/onlinelog/o1_mf_1_k251hfvk_.log
2022-03-26T10:55:40.781499+00:00
ARC0 (PID:42266): Archived Log entry 220 added for T-1.S-141 ID 0x9eb046ef LAD:1
2022-03-26T10:55:41.606175+00:00
ALTER SYSTEM SET log_archive_dest_state_3='ENABLE' SCOPE=MEMORY SID='*';
2022-03-26T10:55:43.747483+00:00
LGWR (PID:42101): LAD:3 is SYNCHRONIZED
2022-03-26T10:55:43.816978+00:00
Thread 1 advanced to log sequence 143 (LGWR switch),  current SCN: 8068743
  Current log# 2 seq# 143 mem# 0: /u03/app/oracle/redo/CDGSIMA_LHR1PQ/onlinelog/o1_mf_2_k251hfwz_.log
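
To follow such a transition in a longer alert log, it can help to keep only the lines where a destination changes its synchronization state. The following is a minimal sketch, not an Oracle tool: the function name lad_sync_changes is made up for illustration, and it assumes that every message is preceded by a timestamp on its own line, as in the excerpt above.

```shell
# Hypothetical helper: print each LAD synchronization state change found in
# an alert log, prefixed with the timestamp line that precedes it.
# Assumption: timestamps sit on their own line in ISO 8601 format.
lad_sync_changes() {
  # $1 = path to the alert log
  awk '
    # Remember the most recent timestamp line,
    # e.g. 2022-03-26T10:55:35.608691+00:00
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T/ { ts = $0; next }

    # Report destinations that gain or lose synchronization.
    /LAD:[0-9]+ is (UN)?SYNCHRONIZED/ ||
    /LAD:[0-9]+ no longer supports SYNCHRONIZATION/ {
      print ts " " $0
    }
  ' "$1"
}
```

Called with the path of the primary's alert log, it should reduce the excerpt above to the LAD:2 and LAD:3 state changes, which makes it easier to measure how long the primary ran without a synchronized destination.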

Indeed, I can see the new NSS process (synchronous redo transport) spawn at that time:

SQL> r
  1  select NAME
  2  ,PID
  3  ,TYPE
  4  ,ROLE ACTION
  5  ,CLIENT_PID
  6  ,CLIENT_ROLE
  7  ,GROUP#
  8  ,RESETLOG_ID
  9  ,THREAD#
 10  ,SEQUENCE#
 11  ,BLOCK#
 12* from v$dataguard_process where name like 'NSS%'

NAME  PID                      TYP ACTION                   CLIENT_PID CLIENT_ROLE          GROUP# RESETLOG_ID    THREAD#  SEQUENCE#     BLOCK#
----- ------------------------ --- ------------------------ ---------- ---------------- ---------- ----------- ---------- ---------- ----------
NSS2  54961                    KSB sync                              0 none                      0           0          0          0          0
NSS3  78251                    KSB sync                              0 none                      0           0          0          0          0

SQL> !ps -eaf | grep ora_nss
oracle   54961     1  0 Mar10 ?        00:00:55 ora_nss2_cdgsima
oracle   78251     1  0 10:55 ?        00:00:00 ora_nss3_cdgsima

—

Ludo

Can I rename a PDB in a Data Guard configuration?

Posted on November 21, 2021 by Ludovico

Someone asked me this question recently.

The answer is: yes!

Let’s see it in action.

On the primary I have:

----- PRIMARY
SQL> show pdbs;

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 RED                            READ WRITE NO
         4 SAND                           READ WRITE NO

And of course the same PDBs on the standby:

----- STANDBY
SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       MOUNTED
         3 RED                            MOUNTED
         4 SAND                           MOUNTED

Let’s rename the PDB RED to TOBY. The rename operation is straightforward, but it requires a brief downtime because the PDB must be reopened in restricted mode. On the primary:

SQL> alter pluggable database red close;

Pluggable database altered.

SQL> alter pluggable database red open restricted;

Pluggable database altered.

SQL> alter session set container=red;

Session altered.

SQL> alter pluggable database rename global_name to toby;

Pluggable database altered.

SQL> alter session set container=cdb$root;

Session altered.

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 TOBY                           READ WRITE YES
         4 SAND                           READ WRITE NO

SQL> alter pluggable database toby close;

Pluggable database altered.


SQL> alter pluggable database toby open;

Pluggable database altered.

SQL>

On the standby, I can see that the PDB changed its name:

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       MOUNTED
         3 TOBY                           MOUNTED
         4 SAND                           MOUNTED
SQL>

The PDB name change is propagated transparently with the redo apply.

—

Ludo

rhpctl addnode gihome: specify HUB or LEAF when adding new nodes to a Flex Cluster

Posted on July 29, 2021 by Ludovico

I have a customer trying to add a new node to a cluster using Fleet Patching and Provisioning.

The error in the command output is not very friendly:

[grid@fpps ~]$ rhpctl addnode gihome -workingcopy WC_gi19110_FPPC3 \
  -newnodes fppc3:fppc3-vip  -cred fppc-cred
fpps: Audit ID: 269
PRCT-1003 : failed to run "rhphelper" on node "fppc2"
PRCT-1014 : Internal error: RHPHELP_preNodeAddVal-05null

The “RHPHELP_preNodeAddVal” might already give an idea of the cause: something related to the “cluvfy stage -pre nodeadd” evaluation that we normally do when adding a node by hand. FPP does not really run cluvfy, but it calls the same primitives cluvfy is based on.

In FPP, when the error does not give any useful information, this is the flow to follow:

use “rhpctl query audit” to get the date and time of the failing operation
open the “rhpserver.log.0” and look for the operation log in that time frame
get the UID of the operation, e.g., in the following line it is “1556344143” (ignoring the leading minus sign):

[UID:-1556344143] [RMI TCP Connection(153)-192.168.1.151] [ 2021-07-27 00:25:20.741 KST ]
  [ServerCommon.processParameters:485]  before parsing: params = 
  {-methodName=addnodesWorkingCopy, -userName=grid, -version=19.0.0.0.0, -auditId=-1556344143,
  -auditCli=rhpctl addnode gihome -workingcopy WC_gi19110_FPPC3 -newnodes fppc3:fppc3-vip -cred cred_fppc,
  -plsnrPort=31605, -noun=gihome, -isSingleNodeProv=FALSE, -nls_lang=AMERICAN_AMERICA.AL32UTF8,
  -clusterName=fpps-cluster, -plsnrHost=fpps, -SA11204ClusterName=null,
  -lang=en_US, -clientNode=fpps, -verb=addnode, -ghopuid=-1556344143}

Isolate the log for the operation: grep $OPUID rhpserver.log.0 > $OPUID.log (store the UID in a variable with a name other than UID, which is read-only in bash)
Locate the trace file of the rhphelper remote execution:

[UID:-1556344143] [RMI TCP Connection(153)-192.168.1.151] [ 2021-07-27 00:26:07.031 KST ] [RHPHELPERUtil.getTraceEnvs:4386] 
  TraceFileLocEnv is :RHPHELPER_TRACEFILE=/u01/app/grid/crsdata/fppc2/rhp/rhphelp_20210727002603.trc

Find the root cause in the rhphelper trace:

[main] [ 2021-07-27 00:27:02.600 KST ] [reflect.GeneratedMethodAccessor1.invoke:-1]  PRVG-11406 : API with node roles argument must be called for Flex Cluster
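
The flow above can be sketched as three small shell functions. This is only an illustration under assumptions: the function names are hypothetical, each log record is assumed to be a single physical line (the excerpts above are wrapped for readability), and only grep, sed and sort are used. The rhphelper trace is presumably written on the node where rhphelper ran (fppc2 in this example), so the last function is meant to be run there.

```shell
# Hypothetical helpers for the triage flow; none of them is part of FPP.

# Write all records of one operation to <uid>.log in the current directory.
# $1 = rhpserver log file, $2 = operation UID digits (without the minus sign)
isolate_operation_log() {
  grep -F -- "$2" "$1" > "$2.log"
}

# Print the rhphelper trace file paths recorded in an operation log.
# $1 = operation log produced by isolate_operation_log
rhphelper_trace_files() {
  sed -n 's/.*RHPHELPER_TRACEFILE=\([^ ]*\).*/\1/p' "$1" | sort -u
}

# Print the PRxx-nnnnn messages (PRVG, PRCT, ...) found in a trace file.
# $1 = rhphelper trace file
trace_errors() {
  grep -E -o 'PR[A-Z]{2}-[0-9]+ : .*' "$1"
}
```

A typical sequence would be isolate_operation_log rhpserver.log.0 1556344143, then rhphelper_trace_files 1556344143.log on the FPP server, and finally trace_errors on the reported trace file on the target node.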

In this case, the target cluster is a Flex Cluster, so the command must be run specifying the node_role.

The documentation is not clear (we will fix it soon):

rhpctl addnode gihome {-workingcopy workingcopy_name | -client cluster_name}
  -newnodes node_name:node_vip[:node_role][,node_name:node_vip[:node_role]...]

node_role must be specified for Flex Clusters, and it must be either HUB or LEAF.

With the node role added to the command line, the command succeeded:

rhpctl addnode gihome -workingcopy WC_gi19110_FPPC3 \
 -newnodes fppc3:fppc3-vip:HUB  -cred fppc-cred
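
Since the error for a missing role is so unfriendly, a wrapper script could check the -newnodes value before calling rhpctl against a Flex Cluster. The function below is a hypothetical pre-check, not something FPP provides; it only verifies the node_name:node_vip:node_role shape described above.

```shell
# Hypothetical pre-check: succeed only if every comma-separated entry has the
# form node_name:node_vip:node_role and node_role is HUB or LEAF.
# $1 = value intended for -newnodes, e.g. fppc3:fppc3-vip:HUB
newnodes_have_roles() {
  printf '%s\n' "$1" | tr ',' '\n' | awk -F: '
    # Flag entries with a missing field or an unknown role.
    NF != 3 || ($3 != "HUB" && $3 != "LEAF") { bad = 1 }
    # The exit status of awk becomes the status of the function.
    END { exit bad + 0 }
  '
}
```

For example, newnodes_have_roles "fppc3:fppc3-vip" returns a non-zero status, while newnodes_have_roles "fppc3:fppc3-vip:HUB" succeeds.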

HTH

—

Ludovico