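I ran into this a while ago and finally have time to write it down.
The situation was this: the customer's machine room hosting the Oracle RAC cluster had no UPS and the power supply was unstable, so power was frequently cut without warning. The first few times were fine, and the cluster recovered by itself once power came back. Eventually, though, something broke, and the symptom was that the instance would not start.
After several recoveries I found that the failures fall roughly into the following cases, described one by one below.
Case 1: Recovering directly from the redo logs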
[orasrv@db01 ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Mon Jul 11 10:29:04 2016
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup mount
ORACLE instance started.
Total System Global Area 1.0796E+11 bytes
Fixed Size 2266024 bytes
Variable Size 5.9861E+10 bytes
Database Buffers 4.8050E+10 bytes
Redo Buffers 50450432 bytes
Database mounted.
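SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kcrfr_read_5], [63], [674817], [], []
At this point the database mounts but cannot be opened and fails with ORA-00600. In effect, some of the last redo writes never completed when power was lost. Start by checking the current state of the redo logs: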
SQL> select group#,sequence#,status,first_time,next_change# from v$log;
    GROUP#  SEQUENCE# STATUS                           FIRST_TIME   NEXT_CHANGE#
---------- ---------- -------------------------------- ------------ ------------
         5         63 CURRENT                          28-MAY-16      2.8147E+14
         2         62 INACTIVE                         25-MAY-16        22437896
         1         61 INACTIVE                         22-MAY-16        21890487
         4         68 INACTIVE                         24-MAY-16        22317178
         6         69 CURRENT                          27-MAY-16      2.8147E+14
         3         67 INACTIVE                         21-MAY-16        21837803
6 rows selected.
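The query above does not show which RAC thread each group belongs to, or where its file lives. As a rough sketch, assuming the standard `v$log` and `v$logfile` views and an Oracle environment already set for the `sqlplus` call, a small shell helper can print a query that returns thread, group, sequence, status and member path in one pass. The function name `current_log_query` is my own and not part of any Oracle tool.

```shell
#!/bin/sh
# Print a query that lists every redo group with its thread and member file.
# The quoted heredoc delimiter stops the shell from expanding "$log".
current_log_query() {
  cat <<'SQL'
set linesize 200 pagesize 100
column member format a60
select l.thread#, l.group#, l.sequence#, l.status, f.member
  from v$log l
  join v$logfile f on f.group# = l.group#
 order by l.thread#, l.sequence#;
SQL
}

# Usage on the database host (assumption: ORACLE_HOME and ORACLE_SID are set):
#   current_log_query | sqlplus -S / as sysdba
```

Both views are read from the control file, so this should work while the database is only mounted.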
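Group 5 has the most recent FIRST_TIME (28-MAY-16). Next, look up the full path of the group 5 log file in ASM, because it has to be typed in by hand during recovery.
[grid@db01 ~]$ asmcmd
ASMCMD> ls DATA/<db_name>/ONLINELOG/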
group_1.261.898205699
group_2.262.898205705
group_3.267.898205911
group_4.268.898205917
group_5.263.898205713
group_6.269.898205925
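Typing the full ASM path by hand is error-prone, so here is a minimal sketch of a helper that builds it from the listing. It assumes the listing is fed on stdin exactly as `ls` printed it above. The function name `asm_log_path` is hypothetical, and `ORCL` stands in for the real database name, which the listing above does not show.

```shell
#!/bin/sh
# asm_log_path DISKGROUP DBNAME GROUP_NO
# Reads an ONLINELOG directory listing on stdin and prints the full ASM
# path of the file whose name starts with "group_<GROUP_NO>.".
asm_log_path() {
  awk -v dg="$1" -v db="$2" -v g="$3" '
    index($1, "group_" g ".") == 1 { print "+" dg "/" db "/ONLINELOG/" $1 }
  '
}

# Example with the listing shown above (ORCL is a placeholder name):
asm_log_path DATA ORCL 5 <<'EOF'
group_1.261.898205699
group_2.262.898205705
group_3.267.898205911
group_4.268.898205917
group_5.263.898205713
group_6.269.898205925
EOF
# prints +DATA/ORCL/ONLINELOG/group_5.263.898205713
```

Matching on `group_5.` with the trailing dot keeps a request for group 5 from also picking up a group 50.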
Good. Now use the log file we just located to recover the database:
SQL> recover database until cancel using backup controlfile;
ORA-00279: change 22771825 generated at 05/30/2016 11:21:44 needed for thread 2
ORA-00289: suggestion : /work/11.2.0.4/oracle/db/dbs/arch2_69_898205698.dbf
ORA-00280: change 22771825 for thread 2 is in sequence #69
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
+DATA/<db_name>/ONLINELOG/group_5.263.898205713
Log applied.
Media recovery complete.
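The recovery prompt names the thread and sequence it wants, and on RAC each thread has its own CURRENT group. The sketch below pulls those two numbers out of the ORA-00280 line and looks the sequence up in `v$log` rows. Both function names are my own, and the input is assumed to be the text exactly as printed above. With the rows shown earlier, this lookup returns group 6 for sequence 69, so it may be worth cross-checking the prompt against `v$log` before typing a path. If Oracle rejects a file with a sequence mismatch, the other thread's CURRENT group is the likely candidate.

```shell
#!/bin/sh
# needed_log: read recovery messages on stdin and print
# "<thread> <sequence>" taken from the ORA-00280 line.
needed_log() {
  sed -n 's/^ORA-00280: change [0-9]* for thread \([0-9]*\) is in sequence #\([0-9]*\).*/\1 \2/p'
}

# group_for_sequence SEQ: read v$log rows (group#, sequence#, ...) on
# stdin and print the group number whose sequence# equals SEQ.
group_for_sequence() {
  awk -v s="$1" '$2 == s { print $1 }'
}

# Example (input copied from the transcripts above):
#   printf '%s\n' 'ORA-00280: change 22771825 for thread 2 is in sequence #69' | needed_log
#   printf '%s\n' '5 63 CURRENT' '6 69 CURRENT' | group_for_sequence 69
```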
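We were lucky: the redo had simply not been applied, rather than being corrupted or lost, so the database can now be opened. Remember to add the resetlogs option.
SQL> alter database open resetlogs;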
Database altered.
SQL> quit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, Oracle Label Security, OLAP, Data Mining, Oracle Database Vault and Real Application Testing options
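Once the database opens, use the cluster commands to check its status; if something is still wrong, restart the database through the cluster as well.
crs_stat -t
srvctl start database -d <db_name>
That's all for today. Next time I will cover how to recover when the redo logs are corrupted or lost.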
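The check-then-restart step can be scripted. The sketch below filters the text printed by `srvctl status database -d <db_name>` for instances that are down. It assumes the usual 11.2 line format ("Instance X is running on node Y" / "Instance X is not running on node Y"); that format is an assumption and does not appear in the transcripts above. The function name `instances_down` is my own. Note that `crs_stat` is deprecated in 11.2, where `crsctl stat res -t` is the replacement.

```shell
#!/bin/sh
# instances_down: read "srvctl status database" output on stdin, print the
# lines for instances that are not running, and return 0 only when every
# instance is reported as running.
instances_down() {
  ! grep 'is not running'
}

# Usage on a cluster node (DB is a placeholder for the real database name):
#   crsctl stat res -t
#   srvctl status database -d "$DB" | instances_down \
#     || srvctl start database -d "$DB"
```

Running `srvctl start database` only when the filter finds a stopped instance avoids a redundant start request on a healthy cluster.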