Database Name
|
PRIMARY
|
Primary Database (Non-RAC)
|
STANDBY
|
Standby Database (Non-RAC)
| |
Storage
|
ASM
| |
OS
|
RHEL 5,32 Bit
|
Scenario :
I have two databases, one of them is PRIMARY and the other is physical STANDBY database. Till now both the databases were running perfectly and in sync with each other. During some RnD on physical standby database I performed the below steps :
1. Stopped the real time recovery that applies the archive logs from the primary database to physical standby database.
2. Marked offline the datafile’s of USERS tablespace and then manually deleted the datafiles of USERS tablespace on standby database from the ASM storage using the rm command. Now these datafile’s no more exists.
3. Changed the password of SYS schema on both databases (Primary as well as Standby).
4. Changed the log_archive_dest_state_2 parameter on primary database from ENABLE to defer.
5. Accidently deleted the archive logs from the PRIMARY database which still were not applied to SYANDBY database.
Observations :
Then I observed that the archive log sequence on PRIMARY database has reached 823 and the STANDBY database is lagging behind on archive sequence 723. Reverted the every change except the changes mentioned in step 2. After checking the STANDBY database’s alert log I found that the recovery process is waiting for archive log sequence 824 which was not present on the PRIMARY and STANDBY database because all the logs were deleted accidently as mentioned.
Steps for Fixing the Issue/Problem :
There could be n number of steps to get rid of the scenario as discussed above. Below are the steps to fix the issue :
1. Check the current SCN of PRIMARY and STANDBY database using the below SQL statement :
SELECT MAX(NEXT_CHANGE#) + 1 UNTIL_SCN
FROM V$LOG_HISTORY LH, V$DATABASE DB
WHERE LH.RESETLOGS_CHANGE# = DB.RESETLOGS_CHANGE#
AND LH.RESETLOGS_TIME = DB.RESETLOGS_TIME;
Output of above statement :
SCN of PRIMARY database
|
561102
|
SCN of STANDBY database
|
554102
|
From the above output it’s clear that STANDBY database is lagging behind the PRIMARY database.
2. Start the media recovery process on the STANDBY database and check the log sequence it is waiting for using the below SQL statement :
SQL> alter database recover managed standby database disconnect from
session;
Database altered.
SQL> select process, status, sequence# from v$managed_standby where
process like 'MRP%';
PROCESS STATUS SEQUENCE#
--------- ------------ ----------
MRP0 WAIT_FOR_LOG 724
Output of above statement :
Log Sequence of PRIMARY database
|
823
|
Log Sequence of STANDBY database
|
724
|
3. We know that the PRIMARY and STANDBY datafiles and controlfiles are not consistent so take the incremental backup of the primary database including the control file. Do notice that I am using the SYANDBY keyword while taking the RMAN backup of controlfile of PRIMARY database. Because I am using standby clause/keyword during the RMAN backup of PRIMARY database’s controlfile while restoring the controlfile on STANDBY database I need not to specify the standby clause/keyword again. If I do not use the standby clause during the RMAN backup of controlfile of PRIMARY database I need to put the standby clause/keyword during the restoration of controlfile on STANDBY database side.
On PRIMARY database server connect to target database using the RMAN utility provided by oracle (inbuilt utility of oracle)
export ORACLE_SID=PRIMARY
rman target /
RMAN> backup tag STANDBY_DB incremental from SCN=554102 device type disk
format '/oraInfra/rman/standby%U' as compressed backupset database
include current controlfile for STANDBY;
Here in the above RMAN statement SCN is the current SCN the STANDBY database is running with. The RMAN backup is going to roll forward the STANDBY database from SCN-554102 to SCN-561102.
4. Once the database and the controlfile backup of PRIMARY database on PRIMARY database server is complete take the RMAN backup of PRIMARY database’s archivelogs from the sequence the STANDBY database is waiting for to the sequence PRIMARY database is currently running with even if the archivelogs are not present (deleted in earlier steps during RnD) on the PRIMARY database.
export ORACLE_SID=PRIMARY
rman target /
RMAN> sql "alter system archive log current";
RMAN>backup tag STANDBY_ARC device type disk format
'/oraInfra/rman/standby%U' as compressed backupset archivelog from
sequence 724;
5. Once the backups are over on PRIMARY database use SCP/FTP/SFTP or other safe way to transfer the backups from PRIMARY database server to STANDBY database server. For your safe side you can also take the STANDBY database’s full RMAN backup before proceeding further, I didn’t.
6. Bring down the STANDBY database using the shutdown command and start in nomount mode.
export ORACLE_SID=STANDBY
sqlplus / as sysdba
SQL>shutdown immediate
SQL>startup nomount
7. Take another session of STANDBY database server to run RMAN commands.
export ORACLE_SID=STANDBY
rman target /
RMAN>restore controlfile from ‘/oraInfra/rman/standby2gfryu’;
OR
RMAN>restore standby controlfile from ‘/oraInfra/rman/standby2gfryu’;
8. Mount the STANDBY database using alter database mount command.
RMAN>alter database mount;
9. Register the backup pieces with RMAN using the below command on STANDBY database server. When prompted for registering the backup pieces enter YES, it will then register the backup pieces with RMAN to recover and restore the database with.
Note* : Do remember to check the alert log of STANDBY database parallely to see if the process is going smoothly without any error. In my case error has come because I made the tablespaces/datafiles offline in the RnD steps. To fix that issue I explicitly used the commands to turn the offline tablespaces/datafiles to online.
RMAN>catalog start with ‘oraInfra/rman’;
10. After registering the backup pieces restore the STANDBY database as under :
http://www.oracle-base.com/articles/10g/rman-enhancements-10g.php
# Catalog specific backup piece. CATALOG BACKUPPIECE '/backup/MYSID/01dmsbj4_1_1.bcp'; # Catalog all files and the contents of directories which # begin with the pattern "/backup/MYSID/arch". CATALOG START WITH '/backup/MYSID/arch'; # Catalog all files in the current recovery area. CATALOG RECOVERY AREA NOPROMPT; # Catalog all files in the current recovery area. # This is an exact synonym of the previous command. CATALOG DB_RECOVERY_FILE_DEST NOPROMPT;
RMAN>restore database;
The restore process may take time, depends upon the amount of change in data. Could vary.
11. Once the restore process is over, we now need to recover the database as under.
RMAN>recover database;
Recover process has to fail with the below error, but this is only because it didn't find the current log. This doesn't matter here.
RMAN-00571: ===============================================
RMAN-00569: ========= ERROR MESSAGE STACK FOLLOWS =========
RMAN-00571: ===============================================
RMAN-03002: failure of recover command at 10/29/2011 20:20:51
RMAN-06054: media recovery requesting unknown log: thread 1 seq 824 lowscn 561102
12. Now start the media recovery again to get both PRIMARY and STANDBY database in sync.
SQL> alter database recover managed standby database disconnect from
session;
Database altered.
SQL> select process, status, sequence# from v$managed_standby where
process like 'MRP%';
PROCESS STATUS SEQUENCE#
--------- ------------ ----------
MRP0 WAIT_FOR_LOG 824
SQL> alter database recover managed standby database cancel;
Database altered.
If MRP* is started and waiting for log (and no error in alert.log), then everything is good.
Then if found the below error in STANDBY database’s alert log :
ORA-16191: Primary log shipping client not logged on standby.
There are several reason behind this error. But the error was resolved after recreating the password files on both PRIMARY and STANDBY database servers with ignorecase=Y option.
Command used to recreate the password file is as under :
orapwd file=$ORACLE_HOME/dbs/orapw$ORACLE_SID password=password ignorecase=Y force=Y