当前位置: 首页 > news >正文

恢复某个数据文件不适当,导致DataGuard无法open数据库

1、案例概述

同事反馈:一套11gR2的DataGuard环境,备库执行alter databases open时,一直hang住,数据库的alert日志也没有任何的报错信息。询问得知,由于备库的system数据文件损坏,导致DG环境中断,于是同事从主库备份了数据文件并在备库进行了恢复。目前,故障现象是执行alter databases open时一直hang住。

 

2、案例分析

2.1 建议同事收集10046事件 和进程的pstack 和 strace日志。

2.2 分析10046事件,摘取部分10046日志如下所示。

PARSING IN CURSOR #140049693225008 len=19 dep=0 uid=0 oct=35 lid=0 tim=1758607635447030 hv=684487124 ad='a3fe61a20' sqlid='1h50ks4ncswfn'
ALTER DATABASE OPEN
END OF STMT
PARSE #140049693225008:c=1165,e=5441,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=0,tim=1758607635447028
WAIT #140049693225008: nam='control file sequential read' ela= 21 file#=0 block#=1 blocks=1 obj#=-1 tim=1758607635448107
WAIT #140049693225008: nam='control file sequential read' ela= 9 file#=0 block#=39 blocks=1 obj#=-1 tim=1758607635448165

......

WAIT #140049693225008: nam='db file sequential read' ela= 24 file#=1 block#=1 blocks=1 obj#=-1 tim=1758607635502499
WAIT #140049693225008: nam='control file sequential read' ela= 14 file#=0 block#=413 blocks=1 obj#=-1 tim=1758607635502547
DDE: Problem Key 'ORA 1110' was flood controlled (0x1) (no incident)
ORA-01110: data file 1: '/u01/app/oracle/oradata/dg/datafile/system.271.1133385189'
Managed Recovery: Real Time Apply enabled.
WAIT #140049693225008: nam='control file sequential read' ela= 8 file#=0 block#=1 blocks=1 obj#=-1 tim=1758607635503330

......

WAIT #140049693225008: nam='rdbms ipc reply' ela= 1301 from_process=10 timeout=2147483647 p3=0 obj#=-1 tim=1758607926054331
ORA-10458: standby database requires recovery
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u01/app/oracle/oradata/dg/datafile/system.271.1133385189'
EXEC #140049693225008:c=6067521,e=290607626,p=6542,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=0,tim=1758607926054783
ERROR #140049693225008:err=10458 tim=1758607926054813
WAIT #140049693225008: nam='SQL*Net break/reset to client' ela= 34 driver id=1650815232 break?=1 p3=0 obj#=-1 tim=1758607926061084
WAIT #140049693225008: nam='SQL*Net break/reset to client' ela= 154 driver id=1650815232 break?=0 p3=0 obj#=-1 tim=1758607926061286
WAIT #140049693225008: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1758607926061311

*** 2025-09-23 14:12:18.084
WAIT #140049693225008: nam='SQL*Net message from client' ela= 12844805 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1758607938906164
CLOSE #140049693225008:c=13,e=13,dep=0,type=0,tim=1758607938906343
=====================
PARSING IN CURSOR #140049693263712 len=55 dep=0 uid=0 oct=42 lid=0 tim=1758607938906731 hv=2655499671 ad='0' sqlid='0kjg1c2g4gdcr'
ALTER SESSION SET EVENTS '10046 trace name context off'
END OF STMT

从10046事件的日志可以看出,alter database open时,虽然表面上是hang住,但实际上已经有报错信息了,提示file 1当前处于不一致的状态,需要继续恢复。

2.3 让同事重新发起alter database open,把数据库刚刚产生的alert日志发出来看看。显示如下:

Tue Sep 23 18:21:23 2025
alter database open read only
Beginning Standby Crash Recovery.
Serial Media Recovery started
Managed Standby Recovery starting Real Time Apply
Media Recovery Log /u01/app/oracle/oradata/fast_recovery_area/DG01/archivelog/2025_09_23/o1_mf_2_782523_nf4x17rc_.arc
Media Recovery Log /u01/app/oracle/oradata/fast_recovery_area/DG01/archivelog/2025_09_23/o1_mf_1_1545049_nf4x2jl5_.arc
Media Recovery Log /u01/app/oracle/oradata/fast_recovery_area/DG01/archivelog/2025_09_23/o1_mf_2_782524_nf4x34bc_.arc
Media Recovery Log /u01/app/oracle/oradata/fast_recovery_area/DG01/archivelog/2025_09_23/o1_mf_1_1545050_nf4x34s5_.arc
Media Recovery Log /u01/app/oracle/oradata/fast_recovery_area/DG01/archivelog/2025_09_23/o1_mf_2_782525_nf4x3hvb_.arc
Media Recovery Log /u01/app/oracle/oradata/fast_recovery_area/DG01/archivelog/2025_09_23/o1_mf_1_1545051_nf4x3j80_.arc
Tue Sep 23 18:21:33 2025
Media Recovery Log /u01/app/oracle/oradata/fast_recovery_area/DG01/archivelog/2025_09_23/o1_mf_2_782526_nf4x41vl_.arc
Media Recovery Log /u01/app/oracle/oradata/fast_recovery_area/DG01/archivelog/2025_09_23/o1_mf_1_1545052_nf4x4288_.arc
Media Recovery Waiting for thread 2 sequence 782527 (in transit)
Media Recovery of Online Log [Thread=2, Seq=782527]
Recovery of Online Redo Log: Thread 2 Group 10 Seq 782527 Reading mem 0
Mem# 0: /u01/app/oracle/oradata/dg/onlinelog/group_10.320.1133484345
Mem# 1: /u01/app/oracle/oradata/dg/onlinelog/group_10.4595.1133484345
Media Recovery Waiting for thread 1 sequence 1545053 (in transit)
Media Recovery of Online Log [Thread=1, Seq=1545053]
Recovery of Online Redo Log: Thread 1 Group 7 Seq 1545053 Reading mem 0
Mem# 0: /u01/app/oracle/oradata/dg/onlinelog/group_7.302.1133484315
Mem# 1: /u01/app/oracle/oradata/dg/onlinelog/group_7.4584.1133484315

可以看出,发起alter database open read only命令后,备库先应用归档日志,归档日志应用完毕后,又开始继续应用standby log,永无止境。所以前台发起的alter database open read only命令就一直hang住。

2.4 感觉还是那个当初出问题的system数据文件有问题。于是再次询问同事,当备库原始的那个system数据文件损坏后,为解决这个问题,是怎么重新将system数据文件恢复到备库的?得知:首先用rman的backup datafile 1..命令备份时报错,于是改成直接用cp命令了,最终,是现在的故障现象,备库没有任何报错,但alter database open read only命令就一直hang住。

2.5 主库在open的情况下,直接用cp命令将system数据文件复制到备库。这肯定不对。由于让同事把rman的backup datafile 1..命令备份时报错的日志发出来看看。backup命令的确是报错了,但其实system数据文件也已经备份成功,具体报错原因不在本案例进行详细说明。让同事用刚刚rman生成的备份集在备库进行恢复,最终故障处理完毕,备库成功open。

 

3、案例总结

怎么恢复备库的某个数据文件,步骤如下:
1) 在主库对该数据文件进行备份。
rman> backup datafile n format '/tmp/datafilen.bak';
2) 将主库刚生成的备份集复制至备库。
3) 在备库注册备份集。
rman> catalog backuppiece '/tmp/datafilen.bak';
4) 停止备库应用日志功能。
SQL> alter database recover managed standby database cancel;
5) 在备库restore数据文件。
rman> restore datafile n;
6) 启用备库应用日志功能。
SQL> alter database recover managed standby database using current logfile disconnect;

 

http://www.hskmm.com/?act=detail&tid=15091

相关文章:

  • Nginx 部署及配置
  • vite静态资源处理
  • 洛谷B4040 [GESP202409 四级] 黑白方块 题解
  • SerpApi:一站式搜索引擎数据抓取API完全指南
  • 补whk时的鲜花(持续更新)
  • css 使用记录 随笔
  • newDay02
  • 【OI 档案-2025】CSP 赛前集训记(初赛后+复赛)
  • Git 从零到一:以 Gitee 为例的实战与可视化指南
  • 代码随想录算法训练营第七天 |第454题.四数相加II、383. 赎金信、第15题. 三数之和
  • day06
  • 前沿速览:TrafficVLM、DeepSeek-Terminus、Qwen3-Omni、蚂蚁百灵、Wan2.2-Animate、Qianfan-VL
  • 代码随想录算法训练营第七天 | leetcode 454 383 15 18
  • 概率期望
  • Day2
  • 2025.9.23总结 - A
  • 8
  • 从3亿到48亿:NuGet周下载量跃迁背后的.NET生态演进与未来挑战(2019-2025)
  • 实用指南:PHP 使用说明
  • 9月23号
  • CF520E Pluses everywhere 题目分析
  • java里面的IO流分为哪几种,他们的区别是什么呢
  • ReLU函数及它的导数
  • 基础数论
  • 第一次个人编程作业-论文查重
  • 使用Claude代码子代理生成项目特定提交消息的技术实践
  • 走迷宫(BFS)
  • MyBatis分页的原理和分页插件的原理是什么
  • 达成度报告
  • 旋转图像-leetcode