block corruptions - DBA survival BLOG

UPDATE: In the original version I was missing a few keywords: “incremental level 0” for the base backup and “resetlogs” at the database open. Thanks Gregorz for your comments.

Sorry for this “memories” post, but the technical solution at the end is worth the read, I hope 😉

Back in 2010, I was in charge of a quite complex project and faced some difficulties that led me to recover a database in a different manner. A few years have passed, but I used again the same procedure many times with full satisfaction… I think it’s worth to publish it now.

But first, let me introduce the project details and the problem.

Scope of the project

Transport a >1TB RAC database from AIX 5 on P6 to AIX 6 on P7, from a third-party datacenter in southern Italy to our main datacenter in northern Italy.
The Database featured >1000 datafiles and a huge table (800GB) partitioned by range and sub-partitioned by list (or the opposite, can’t remember).

Challenges

For budget containement, the project owner asked to avoid the use of HACMP (and thus, avoid the use of shared JFS2). I decided then to take the risk and migrate from JFS2 to ASM.

In order to avoid a few platform-related ASM bugs, I also had to upgrade from Oracle 10.2.0.3 to Oracle 10.2.0.4.

Constraints

I had no access to the source database that was 800km far from our datacenter, and I was granted only to ask for RMAN backups.

The total time of service disruption accepted was quite short (<30 minutes) considering the size and the distance of the database, and there was no direct connectivity between the sites (for political reasons).

Globally, the network throughput for sharing files over ftp was very poor.

First solution

This kind of move was very common to me, and because I was not grated to ask for a temporary Data Guard configuration, the easy solution for me was to ask:

1 – one RMAN ONLINE full backup physically sent on disk

2 – many RMAN archive backups sent over network (via ftp)

Then, on my side, restore the full backup, recover the archives sent over time and, at the date X, ask a final archive backup, ask to close the db and send the online redo logs to do a complete recovery on my side, then startup open upgrade.

Problem

I did a first “dry run” open resetlogs in order to test the procedure and make it faster, and also asked to test the application pointing to the destination database.

The very bad surprise was that the source database was doing a huge amount of nologging inserts leading to monster index corruptions after the recovery on the destination database.

ORA-26040: Data block was loaded using the NOLOGGING option

1	ORA-26040: Data block was loaded using the NOLOGGING option

According to the current database maintainer, setting the force logging on the source database was NOT an option because the SAN was not able to cope with the high redo rates.

Solution

By knowing the Oracle recovery mechanisms, I have proposed to the remote maintainer to change the recovery strategy, despite this solution was not clearly stated in the Oracle documentation:

1 – Take a first online incremental backup from the begin scn of the base full backup (thank God block change tracking was in place) and send it physically over disk

2 – Take other smaller online incremental backups, send them over ftp and apply them on the destination with “noredo”

3 – At the date X, shutdown the source, mount it and take a last incremental in mount state

4 – recover noredo the last incremental and open resetlogs the database.

According to the documentation, the “cold incremental strategy” applies if you take “cold full backups”. But from a technical point of view, taking a cold incremental and recovering it on top of a fuzzy online backup this is 100% equivalent of taking a full consistent backup in mount state.
Because all the blocks are consistent to a specific SCN, there are no fuzzy datafiles: they are recovered from incremental taken from a mounted database! This allows to do incremental recovery and open the databases without applying any single archived log and by shutting down the database only once.

Technical steps

First, take a full ONLINE backup on the source:

-- SOURCE
SQL> alter database backup controlfile to '/tmp/source/ludo.cf' reuse;

Database altered.

SQL> exit
$ rman target /
RMAN> backup incremental level 0 database as compressed backupset format '/tmp/source/%U';

-- SOURCE

SQL> alter database backup controlfile to '/tmp/source/ludo.cf' reuse;

Database altered.

SQL> exit

$ rman target /

RMAN> backup incremental level 0 database as compressed backupset format '/tmp/source/%U';

# SOURCE
scp -rp /tmp/source/ destsrv:/tmp/dest/
ludo.cf              100% |*************************************| 40944 KB    00:00
...

# SOURCE

scp -rp /tmp/source/ destsrv:/tmp/dest/

ludo.cf 100% |*************************************| 40944 KB 00:00

...

Then restore it on the destination (with no recovery):

# DEST
RMAN> restore controlfile from '/tmp/ludo.cf';

Starting restore at 11-AUG-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1058 device type=DISK

channel ORA_DISK_1: copied control file copy
output file name=/.../control01.ctl
output file name=/.../control02.ctl
Finished restore at 11-AUG-15

RMAN> alter database mount;

Statement processed
released channel: ORA_DISK_1

RMAN> catalog start with '/tmp/dest/';
...
RMAN> run
2> {
3> set newname for database to '+DATA';
4>
5> restore database;
6> }
...
Finished restore at 11-AUG-15
RMAN>

# DEST

RMAN> restore controlfile from '/tmp/ludo.cf';

Starting restore at 11-AUG-15

using target database control file instead of recovery catalog

allocated channel: ORA_DISK_1

channel ORA_DISK_1: SID=1058 device type=DISK

channel ORA_DISK_1: copied control file copy

output file name=/.../control01.ctl

output file name=/.../control02.ctl

Finished restore at 11-AUG-15

RMAN> alter database mount;

Statement processed

released channel: ORA_DISK_1

RMAN> catalog start with '/tmp/dest/';

...

RMAN> run

2> {

3> set newname for database to '+DATA';

5> restore database;

6> }

...

Finished restore at 11-AUG-15

RMAN>

Then, run a COLD incremental backup on the source:

-- SOURCE
SQL> shutdown immediate;
...
ORACLE instance shut down.

SQL> startup mount
ORACLE instance started.
...
Database mounted.
SQL> exit
$ rman target /
RMAN>  BACKUP AS COMPRESSED BACKUPSET INCREMENTAL LEVEL 1 
2> CUMULATIVE DATABASE format '/tmp/source/incr%U';
...
Finished backup at 11-AUG-15
RMAN> exit
$ scp -rp /tmp/source/incr* destsrv:/tmp/dest/

-- SOURCE

SQL> shutdown immediate;

...

ORACLE instance shut down.

SQL> startup mount

ORACLE instance started.

...

Database mounted.

SQL> exit

$ rman target /

RMAN> BACKUP AS COMPRESSED BACKUPSET INCREMENTAL LEVEL 1

2> CUMULATIVE DATABASE format '/tmp/source/incr%U';

...

Finished backup at 11-AUG-15

RMAN> exit

$ scp -rp /tmp/source/incr* destsrv:/tmp/dest/

And run the incremental recovery on the source (without redo):

# DEST
RMAN> catalog start with '/tmp/dest/incr';
...
RMAN> run {
2> recover database noredo;
3> }
...
channel ORA_DISK_1: starting incremental datafile backup set restore
...
Finished recover at 11-AUG-15
RMAN> exit
$ sqlplus / as sysdba
...
SQL> alter database disable block change tracking;
Database altered.
SQL> alter database flashback off;
Database altered.
SQL> alter database flashback on;
Database altered.
SQL> create restore point PREUPG guarantee flashback database;
Restore point created.
SQL> -- open resetlogs can be avoided if I copy the online redo logs
SQL> alter database open resetlogs upgrade;
Database altered.
...
-- run catupgrd here

# DEST

RMAN> catalog start with '/tmp/dest/incr';

...

RMAN> run {

2> recover database noredo;

3> }

...

channel ORA_DISK_1: starting incremental datafile backup set restore

...

Finished recover at 11-AUG-15

RMAN> exit

$ sqlplus / as sysdba

...

SQL> alter database disable block change tracking;

Database altered.

SQL> alter database flashback off;

Database altered.

SQL> alter database flashback on;

Database altered.

SQL> create restore point PREUPG guarantee flashback database;

Restore point created.

SQL> -- open resetlogs can be avoided if I copy the online redo logs

SQL> alter database open resetlogs upgrade;

Database altered.

...

-- run catupgrd here

That’s all!

This solution gave me the opportunity to move physically the whole >1TB nologging database from one region to another one with a minimal service disruption and without touching at all the source database.

I used it many times later on, even for bigger databases and on several platforms (yes, also Windows, sigh), it works like a charm.

HTH

—

Ludovico

DBA survival BLOG

Some DBA stuff and lots of Oracle Data Guard

Tag Archives: block corruptions

How cold incremental recovery saved me once