环境说明:
主备库均为asm方式建库。数据库版本为11.2.0.4。
主库:192.168.99.237 oracle1
备库:192.168.99.238 oracle2
实验内容:主要先关闭备库,在主库做归档丢失备库无法同步归档文件,备库产生GAP,然后增量备份恢复备库。
实验模拟gap故障:
1、先检查主备库的dg状态,dg状态正常:
主库:
SQL> select switchover_status,database_role,open_mode from v$database;
SWITCHOVER_STATUS DATABASE_ROLE OPEN_MODE
-------------------- ---------------- --------------------
TO STANDBY PRIMARY READ WRITE
SQL> select thread#,process,status,sequence#,DELAY_MINS from v$managed_standby;
THREAD# PROCESS STATUS SEQUENCE# DELAY_MINS
---------- --------- ------------ ---------- ----------
0 ARCH CONNECTED 0 0
0 ARCH CONNECTED 0 0
1 ARCH CLOSING 120 0
0 ARCH CONNECTED 0 0
0 RFS IDLE 0 0
1 RFS IDLE 121 0
0 RFS IDLE 0 0
1 MRP0 APPLYING_LOG 121 0
备库:
SQL> select switchover_status,database_role,open_mode from v$database;
SWITCHOVER_STATUS DATABASE_ROLE OPEN_MODE
-------------------- ---------------- --------------------
NOT ALLOWED PHYSICAL STANDBY READ ONLY WITH APPLY
2、备库操作:下边取消实时应用,也就是关掉了mpr进程,再把备库关闭。
SQL> alter database recover managed standby database cancel;
Database altered.
SQL> select thread#,process,status,sequence#,DELAY_MINS from v$managed_standby;
THREAD# PROCESS STATUS SEQUENCE# DELAY_MINS
---------- --------- ------------ ---------- ----------
0 ARCH CONNECTED 0 0
0 ARCH CONNECTED 0 0
1 ARCH CLOSING 120 0
0 ARCH CONNECTED 0 0
0 RFS IDLE 0 0
1 RFS IDLE 121 0
0 RFS IDLE 0 0
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
3、主库上操作:模拟归档丢失,这时备库已经关掉,RFS进程无法把主库归档传送过去。
3.1、主库切换产生归档
SQL> alter system switch logfile;
System altered.
SQL> /
System altered.
SQL> /
System altered.
SQL> select max(sequence#) from v$archived_log;
MAX(SEQUENCE#)
--------------
123
3.2、主库上查看刚才产生的归档日志文件:
3.3、备库上查看之前的归档文件信息:
3.4、删除主库中的122和123归档:
ASMCMD> rm -rf thread_1_seq_122.303.1082899809
ASMCMD> rm -rf thread_1_seq_123.304.1082899811
备注:这里让主库模拟丢失的本来是121和122和123,但是备库一直出现不了GAP的状态,后来一看备库日志说的是121已经再被运输中(in transit),所以说这里如果备库重新启动121是直接被应用的,如果把121号归档rm掉,是产生不了GAP的,所以要模拟rm掉归档 就rm 122号归档,也就是切换的第二个归档,这里需要注意下。
4、备库上:启动备库,查看GAP
4.1、启动备库,打开实时同步,检查gap信息:
SQL> alter database recover managed standby database using current logfile disconnect from session;
Database altered.
SQL> select thread#,process,status,sequence#,DELAY_MINS from v$managed_standby;
THREAD# PROCESS STATUS SEQUENCE# DELAY_MINS
---------- --------- ------------ ---------- ----------
0 ARCH CONNECTED 0 0
0 ARCH CONNECTED 0 0
0 ARCH CONNECTED 0 0
1 ARCH CLOSING 121 0
0 RFS IDLE 0 0
0 RFS IDLE 0 0
1 RFS IDLE 125 0
1 MRP0 WAIT_FOR_GAP 122 0
8 rows selected.
SQL> select * from v$archive_gap;
THREAD# LOW_SEQUENCE# HIGH_SEQUENCE#
---------- ------------- --------------
1 122 123
4.2、备库:查看alert日志信息,也能看到报错:
4.3、备库:在备库查看一下缺失的归档文件,发现和主库丢失的一样,所以就出现了gap错误。
5、主库:查询到122号归档的scn号,然后做增量备份,这里要做的是在主库上查询到122号归档的scn号,也就是在对122号归档做操作之后的记录,然后在备库增量恢复。
5.1、查询到24号归档之前的操作
SQL> select FIRST_CHANGE# from v$archived_log where SEQUENCE#=122;
FIRST_CHANGE#
-------------
1062975
5.2、查看当前的归档号
SQL> select current_scn from v$database;
CURRENT_SCN
-----------
1066169
6、模拟在test01表空间下创建2个数据文件:
SQL> alter tablespace test01 add datafile '+datadg' size 100M;
Tablespace altered.
SQL> alter tablespace test01 add datafile '+datadg' size 200M;
Tablespace altered.
7、主库上做基于1062975的备份(也就是SEQUENCE#=122号的scn号)。
RMAN> backup as compressed backupset incremental from SCN 1062975 database format '/home/oracle/standby_%d_%T_%U.bak' include current controlfile for standby filesperset=5 tag 'FOR STANDBY';
Starting backup at 10-SEP-21
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=130 device type=DISK
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=+DATADG/orcl/datafile/system.260.1082804591
input datafile file number=00002 name=+DATADG/orcl/datafile/sysaux.261.1082804595
input datafile file number=00008 name=+DATADG/orcl/datafile/test01.273.1082877021
input datafile file number=00004 name=+DATADG/orcl/datafile/users.264.1082804607
input datafile file number=00009 name=+DATADG/orcl/datafile/test01.274.1082904145
channel ORA_DISK_1: starting piece 1 at 10-SEP-21
channel ORA_DISK_1: finished piece 1 at 10-SEP-21
piece handle=/home/oracle/standby_ORCL_20210910_0r08njpo_1_1.bak tag=FOR STANDBY comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00003 name=+DATADG/orcl/datafile/undotbs1.262.1082804599
input datafile file number=00005 name=+DATADG/orcl/datafile/test01.266.1082807589
input datafile file number=00010 name=+DATADG/orcl/datafile/test01.275.1082904199
input datafile file number=00006 name=+DATADG/orcl/datafile/test01.267.1082807665
input datafile file number=00007 name=+DATADG/orcl/datafile/test01.272.1082876997
channel ORA_DISK_1: starting piece 1 at 10-SEP-21
channel ORA_DISK_1: finished piece 1 at 10-SEP-21
piece handle=/home/oracle/standby_ORCL_20210910_0s08njps_1_1.bak tag=FOR STANDBY comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including standby control file in backup set
channel ORA_DISK_1: starting piece 1 at 10-SEP-21
channel ORA_DISK_1: finished piece 1 at 10-SEP-21
piece handle=/home/oracle/standby_ORCL_20210910_0t08njpv_1_1.bak tag=FOR STANDBY comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 10-SEP-21
这里查看一下备份的文件:
[root@oracle1 oracle]# ll
-rw-r----- 1 oracle asmadmin 1409024 Sep 10 14:46 standby_ORCL_20210910_0r08njpo_1_1.bak
-rw-r----- 1 oracle asmadmin 1302528 Sep 10 14:46 standby_ORCL_20210910_0s08njps_1_1.bak
-rw-r----- 1 oracle asmadmin 1114112 Sep 10 14:46 standby_ORCL_20210910_0t08njpv_1_1.bak
8、主库:把主库备份集传输到备库/home/oracle下
[root@oracle1 oracle]# scp standby_ORCL_20210910_0* oracle@192.168.99.238:/home/oracle
oracle@192.168.99.238's password:
Permission denied, please try again.
oracle@192.168.99.238's password:
standby_ORCL_20210910_0r08njpo_1_1.bak 100% 1376KB 1.3MB/s 00:00
standby_ORCL_20210910_0s08njps_1_1.bak 100% 1272KB 1.2MB/s 00:00
standby_ORCL_20210910_0t08njpv_1_1.bak 100% 1088KB 1.1MB/s 00:00
9、备库:在备库上进行恢复scn号,首先恢复控制文件:
9.1、关闭备库,启动到nomount状态下:
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup nomount
ORACLE instance started.
Total System Global Area 2471931904 bytes
Fixed Size 2255752 bytes
Variable Size 620758136 bytes
Database Buffers 1828716544 bytes
Redo Buffers 20201472 bytes
9.2、先用备份集恢复控制文件,并把数据库启动到mount阶段:
RMAN> restore standby controlfile from '/home/oracle/standby_ORCL_20210910_0t08njpv_1_1.bak';
Starting restore at 10-SEP-21
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=131 device type=DISK
channel ORA_DISK_1: restoring control file
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
output file name=+DATADG/orcl2/controlfile/control01.ctl
output file name=+FLASHDG/orcl2/controlfile/control02.ctl
Finished restore at 10-SEP-21
RMAN> alter database mount;
database mounted
released channel: ORA_DISK_1
9.3、注册备库原有数据文件:
RMAN> catalog start with '+datadg';
Starting implicit crosscheck backup at 10-SEP-21
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=135 device type=DISK
Crosschecked 1 objects
Finished implicit crosscheck backup at 10-SEP-21
Starting implicit crosscheck copy at 10-SEP-21
using channel ORA_DISK_1
Crosschecked 2 objects
Finished implicit crosscheck copy at 10-SEP-21
searching for all files in the recovery area
cataloging files...
cataloging done
List of Cataloged Files
9.4、注册增量备文件:
RMAN> catalog start with '/home/oracle';
searching for all files that match the pattern /home/oracle
List of Files Unknown to the Database
=====================================
File Name: /home/oracle/standby_ORCL_20210910_0t08njpv_1_1.bak
File Name: /home/oracle/.bash_history
File Name: /home/oracle/.bashrc
File Name: /home/oracle/.bash_logout
File Name: /home/oracle/standby_ORCL_20210910_0r08njpo_1_1.bak
File Name: /home/oracle/.bash_profile
File Name: /home/oracle/.kshrc
File Name: /home/oracle/.ssh/known_hosts
File Name: /home/oracle/standby_ORCL_20210910_0s08njps_1_1.bak
File Name: /home/oracle/.viminfo
Do you really want to catalog the above files (enter YES or NO)? yes
cataloging files...
cataloging done
List of Cataloged Files
=======================
File Name: /home/oracle/standby_ORCL_20210910_0t08njpv_1_1.bak
File Name: /home/oracle/standby_ORCL_20210910_0r08njpo_1_1.bak
File Name: /home/oracle/standby_ORCL_20210910_0s08njps_1_1.bak
9.5、还原增量数据文件(如果没有新增数据文件 跳过):
主库查询增加的数据文件号:(上面模拟的是添加的9号和10号文件)
SQL> select file_name,file_id from dba_data_files;
FILE_NAME FILE_ID
------------------------------------------------------------ ----------
+DATADG/orcl/datafile/system.260.1082804591 1
+DATADG/orcl/datafile/sysaux.261.1082804595 2
+DATADG/orcl/datafile/undotbs1.262.1082804599 3
+DATADG/orcl/datafile/users.264.1082804607 4
+DATADG/orcl/datafile/test01.266.1082807589 5
+DATADG/orcl/datafile/test01.267.1082807665 6
+DATADG/orcl/datafile/test01.272.1082876997 7
+DATADG/orcl/datafile/test01.273.1082877021 8
+DATADG/orcl/datafile/test01.274.1082904145 9
+DATADG/orcl/datafile/test01.275.1082904199 10
恢复9号和10号数据文件:
RMAN> restore datafile 9;
Starting restore at 10-SEP-21
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=194 device type=DISK
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00009 to +DATADG/orcl2/datafile/test01.274.1082904145
channel ORA_DISK_1: reading from backup piece /home/oracle/standby_ORCL_20210910_0r08njpo_1_1.bak
channel ORA_DISK_1: piece handle=/home/oracle/standby_ORCL_20210910_0r08njpo_1_1.bak tag=FOR STANDBY
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
Finished restore at 10-SEP-21
RMAN> restore datafile 10;
Starting restore at 10-SEP-21
using channel ORA_DISK_1
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00010 to +DATADG/orcl2/datafile/test01.275.1082904199
channel ORA_DISK_1: reading from backup piece /home/oracle/standby_ORCL_20210910_0s08njps_1_1.bak
channel ORA_DISK_1: piece handle=/home/oracle/standby_ORCL_20210910_0s08njps_1_1.bak tag=FOR STANDBY
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:03
Finished restore at 10-SEP-21
9.6、recover问题:recover增量恢复备份文件,发现系统提示,需要把所有数据文件都要restore。
RMAN> recover database noredo;
Starting recover at 10-SEP-21
using channel ORA_DISK_1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 09/10/2021 15:06:58
RMAN-06094: datafile 1 must be restored
RMAN> restore datafile 1;
Starting restore at 10-SEP-21
using channel ORA_DISK_1
channel ORA_DISK_1: restoring datafile 00001
input datafile copy RECID=9 STAMP=1082904975 file name=+DATADG/orcl2/datafile/system.258.1082872823
destination for restore of datafile 00001: +DATADG/orcl2/datafile/system.260.1082804591
channel ORA_DISK_1: copied datafile copy of datafile 00001
output file name=+DATADG/orcl2/datafile/system.306.1082905653 RECID=0 STAMP=0
Finished restore at 10-SEP-21
RMAN> recover database noredo;
Starting recover at 10-SEP-21
using channel ORA_DISK_1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 09/10/2021 15:07:49
RMAN-06094: datafile 2 must be restored
9.7、把剩下所有数据文件全部restore:
RMAN> restore datafile 1;
RMAN> restore datafile 2;
RMAN> restore datafile 3;
RMAN> restore datafile 4;
RMAN> restore datafile 5;
RMAN> restore datafile 6;
RMAN> restore datafile 7;
RMAN> restore datafile 8;
9.8、再次recover增量恢复备份文件:
RMAN> recover database noredo;
Starting recover at 10-SEP-21
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=70 device type=DISK
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
destination for restore of datafile 00001: +DATADG/orcl2/datafile/system.306.1082905653
destination for restore of datafile 00002: +DATADG/orcl2/datafile/sysaux.307.1082905779
destination for restore of datafile 00004: +DATADG/orcl2/datafile/users.309.1082905809
destination for restore of datafile 00008: +DATADG/orcl2/datafile/test01.313.1082905851
channel ORA_DISK_1: reading from backup piece /home/oracle/standby_ORCL_20210910_0r08njpo_1_1.bak
channel ORA_DISK_1: piece handle=/home/oracle/standby_ORCL_20210910_0r08njpo_1_1.bak tag=FOR STANDBY
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
destination for restore of datafile 00003: +DATADG/orcl2/datafile/undotbs1.308.1082905795
destination for restore of datafile 00005: +DATADG/orcl2/datafile/test01.310.1082905821
destination for restore of datafile 00006: +DATADG/orcl2/datafile/test01.311.1082905833
destination for restore of datafile 00007: +DATADG/orcl2/datafile/test01.312.1082905841
channel ORA_DISK_1: reading from backup piece /home/oracle/standby_ORCL_20210910_0s08njps_1_1.bak
channel ORA_DISK_1: piece handle=/home/oracle/standby_ORCL_20210910_0s08njps_1_1.bak tag=FOR STANDBY
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
Finished recover at 10-SEP-21
9.9、备库:打开数据库,并启动数据库至mount状态下,打开闪回区:
SQL> alter database open;
Database altered.
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.
Database mounted.
SQL> alter database flashback on;
Database altered.
SQL> alter database open;
Database altered.
10、备库:打开MRP进程,并查看进程状态:
10.1、备库下打开MPR进程(MPR进程状态为APPLYING_LOG),查看到所有进程正常:
SQL> alter database recover managed standby database using current logfile disconnect from session;
Database altered.
SQL> select thread#,process,status,sequence#,DELAY_MINS from v$managed_standby;
THREAD# PROCESS STATUS SEQUENCE# DELAY_MINS
---------- --------- ------------ ---------- ----------
0 ARCH CONNECTED 0 0
0 ARCH CONNECTED 0 0
0 ARCH CONNECTED 0 0
1 ARCH CLOSING 128 0
0 RFS IDLE 0 0
0 RFS IDLE 0 0
1 RFS IDLE 129 0
1 MRP0 APPLYING_LOG 129 0
8 rows selected.
10.2、备库:查看gap状态为空,证明gap故障解决:
SQL> select * from v$archive_gap;
no rows selected
11、查询dg两个库的状态:
11.1、主库:
[oracle@oracle1 tmp]$ sqlplus / as sysdba
SQL> select switchover_status,database_role,open_mode from v$database;
SWITCHOVER_STATUS DATABASE_ROLE OPEN_MODE
-------------------- ---------------- --------------------
TO STANDBY PRIMARY READ WRITE
11.2、备库:
SQL> select switchover_status,database_role,open_mode from v$database;
SWITCHOVER_STATUS DATABASE_ROLE OPEN_MODE
-------------------- ---------------- --------------------
NOT ALLOWED PHYSICAL STANDBY READ ONLY WITH APPLY