前言

今天应用反馈有一套报表库查询有延时,几十分钟没有同步数据了,但是我检查 DG 同步状态之后发现一切正常,手动触发切换日志可以正常同步数据,但是无法实时查询主库的在线重做日志。

不是 DG 配置问题,不是 standby logfile 问题,不是网络问题,能是什么问题呢?

本文记录一下完整的分析过程以及解决方案。

检查 DG 状态

首先,先检查 DG 同步状态是否正常:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
set linesize 2222 heading on wrap on
column dest_name format a20
column status format a8
column database_mode format a15
column recovery_mode format a35
column protection_mode format a25
column destination format a15
column archived_seq# format 999999999
column applied_seq# format 999999999
column error format a10
column srl format a5
column db_unique_name format a15
column gap_status format a10

SELECT inst_id, dest_name, status, database_mode, protection_mode,
recovery_mode, gap_status, archived_seq#, applied_seq#, srl,
db_unique_name, destination, error
FROM gv$archive_dest_status
WHERE status <> 'INACTIVE'
AND type = 'PHYSICAL';

INST_ID DEST_NAME STATUS DATABASE_MODE PROTECTION_MODE RECOVERY_MODE GAP_STATUS ARCHIVED_SEQ# APPLIED_SEQ# SRL DB_UNIQUE_NAME DESTINATION ERROR
---------- -------------------- -------- --------------- ------------------------- ----------------------------------- ---------- ------------- ------------ ----- --------------- --------------- ----------
2 LOG_ARCHIVE_DEST_2 VALID OPEN_READ-ONLY MAXIMUM PERFORMANCE MANAGED REAL TIME APPLY NO GAP 123726 123725 YES orcldg orcldg
1 LOG_ARCHIVE_DEST_2 VALID OPEN_READ-ONLY MAXIMUM PERFORMANCE MANAGED REAL TIME APPLY NO GAP 123726 123725 YES orcldg orcldg

检查备库同步进程:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
set line2222 pages1000
select inst_id,pid,process,thread#,sequence#,status,delay_mins from gv$managed_standby;

INST_ID PID PROCESS THREAD# SEQUENCE# STATUS DELAY_MINS
---------- ---------- --------- ---------- ---------- ------------ ----------
1 10386 RFS 0 0 IDLE 0
1 3728 ARCH 2 123538 CLOSING 0
1 3730 ARCH 0 0 CONNECTED 0
1 3732 ARCH 2 123543 CLOSING 0
1 3921 ARCH 2 123537 CLOSING 0
1 4613 RFS 0 0 IDLE 0
1 4195 RFS 0 0 IDLE 0
1 4286 RFS 0 0 IDLE 0
1 10384 RFS 1 123728 IDLE 0
1 10381 RFS 2 123544 IDLE 0
1 10388 RFS 0 0 IDLE 0
1 4647 RFS 0 0 IDLE 0
1 5573 MRP0 1 123728 APPLYING_LOG 0

检查 DG 配置:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
-- 主库
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_1 string LOCATION=+ARCH VALID_FOR=(ALL_
LOGFILES,ALL_ROLES) DB_UNIQUE_
NAME=ORCL
log_archive_dest_2 string service=orcldg LGWR ASYNC vali
d_for=(online_logfiles,primary
_role) db_unique_name=orcldg
-- 备库
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_1 string
log_archive_dest_2 string service=orcl LGWR ASYNC valid_
for=(online_logfiles,primary_r
ole) db_unique_name=orcl

检查 standby logfile 状态:

1
2
3
4
5
6
7
8
   THREAD#     GROUP# MEMBER                                                                                               STATUS     ARC T2.BYTES/1024/1024
---------- ---------- ---------------------------------------------------------------------------------------------------- ---------- --- ------------------
1 5 /u01/app/oracle/oradata/redostd05.log UNASSIGNED NO 200
1 6 /u01/app/oracle/oradata/redostd06.log UNASSIGNED NO 200
1 9 /u01/app/oracle/oradata/redostd09.log UNASSIGNED NO 200
2 7 /u01/app/oracle/oradata/redostd07.log UNASSIGNED NO 200
2 8 /u01/app/oracle/oradata/redostd08.log UNASSIGNED NO 200
2 10 /u01/app/oracle/oradata/redostd10.log UNASSIGNED NO 200

通过以上查询可以发现,DG 配置以及同步都是正常的,应该不会出现无法实时查询的问题,那为什么会延时呢?

尝试在主库手动触发日志切换:

1
2
-- 主库执行
SQL> alter system archive log current;

发现数据可以正常同步过来,那就是说同步是没问题的。

检查 alert 日志

查看数据库 alert 日志看看是不是有报错:

1
2
3
4
5
## 在主库发现有报错
LNS: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (270)
LNS: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
Error 270 for archive log file 2 to 'orcldg'
ORA-00270: error creating archive log

oerr 看一下报错的意思:

1
2
3
4
5
6
[oracle@orcl:/home/oracle]$ oerr ora 270
00270, 00000, "error creating archive log %s"
// *Cause: An error was encountered when either creating or opening
// the destination file for archiving.
// *Action: Check that the archive destination is valid and that there
// is sufficient space on the destination device.

难道是 DG 的磁盘空间满了?

1
2
3
4
5
6
[oracle@orcldg:/]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_jxsemesdg01-lv_root
3.1T 2.6T 334G 89% /
tmpfs 63G 72K 63G 1% /dev/shm
/dev/sda1 477M 79M 373M 18% /boot

磁盘空间还没满,那为什么无法创建归档?检查一下归档目录位置:

1
2
3
4
5
6
7
SQL> archive log list
Database log mode Archive Mode
Automatic archival Enabled
Archive destination USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence 123726
Next log sequence to archive 0
Current log sequence 123727

没有设置归档目录,默认存放到闪回目录下,检查一下闪回配置:

1
2
3
4
5
6
7
SQL> show parameter db_recover

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest string /u01/app/oracle/fast_recovery_
area
db_recovery_file_dest_size big integer 100G

好的,破案了,闪回配置的空间上限设置了 100G,然后归档也是存放在闪回目录,当闪回目录达到上限之后,归档自然就无法写入了。

解决方案

解决方法也很简单,就是把归档日志从闪回目录独立出去:

1
2
3
4
[oracle@orcldg:/u01/app/oracle]$ mkdir archivelog
[oracle@orcldg:/u01/app/oracle]$ cd archivelog/
[oracle@orcldg:/u01/app/oracle/archivelog]$ pwd
/u01/app/oracle/archivelog

备库设置归档目录:

1
2
3
4
5
6
7
8
9
10
11
SQL> alter system set log_archive_dest_1='LOCATION=/u01/app/oracle/archivelog VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=orcldg';

System altered.

SQL> archive log list
Database log mode Archive Mode
Automatic archival Enabled
Archive destination /u01/app/oracle/archivelog
Oldest online log sequence 123726
Next log sequence to archive 0
Current log sequence 123727

主库重新激活同步:

1
2
3
4
5
6
7
SQL> alter system set log_archive_dest_state_2=defer;

System altered.

SQL> alter system set log_archive_dest_state_2=enable;

System altered.

再从检查备库 standby logfile 状态:

1
2
3
4
5
6
7
8
9
10
SQL> select t2.thread#,t1.group#,t1.member,t2.STATUS,t2.ARCHIVED,t2.bytes/1024/1024 from v$logfile t1,v$standby_log t2 where t1.group#=t2.group# order by 1,2;

THREAD# GROUP# MEMBER STATUS ARC T2.BYTES/1024/1024
---------- ---------- ---------------------------------------------------------------------------------------------------- ---------- --- ------------------
1 5 /u01/app/oracle/oradata/redostd05.log UNASSIGNED NO 200
1 6 /u01/app/oracle/oradata/redostd06.log UNASSIGNED NO 200
1 9 /u01/app/oracle/oradata/redostd09.log UNASSIGNED NO 200
2 7 /u01/app/oracle/oradata/redostd07.log ACTIVE YES 200
2 8 /u01/app/oracle/oradata/redostd08.log UNASSIGNED NO 200
2 10 /u01/app/oracle/oradata/redostd10.log UNASSIGNED NO 200

ADG 恢复实时查询,问题解决。

写在最后

一句话总结:闪回恢复区与归档日志共享空间时,必须做好空间规划和监控,否则会导致 DG 实时应用中断而难以察觉。生产环境建议将归档独立部署,避免空间竞争。