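Today a development database would not start. The error that was sent over showed a corrupted redo log file (see the screenshot below), and then I got the backstory: the server lost power before the New Year holiday and the database had never come back up since. Nobody thought to deal with it until someone needed it today.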
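First, the general approach. If the damaged redo log is INACTIVE, meaning it is not needed for instance crash recovery, the fix is fairly easy: rebuild the log group with alter database clear logfile group #; or, if the group has not been archived yet, alter database clear unarchived logfile group #;. After rebuilding a log group, take a full database backup, especially after a forced clear of an unarchived group, because that leaves a gap in the archived log sequence. If the damaged redo log is ACTIVE or CURRENT, meaning it is needed for instance crash recovery, things are much harder: losing such a redo log means losing data.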
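The INACTIVE case can be sketched as follows. This is a minimal sketch, assuming the database is at least mounted; group number 1 is only a placeholder for whichever group v$log reports as damaged and INACTIVE.

```sql
-- List every redo log group with its status and whether it has been archived
SELECT group#, thread#, sequence#, archived, status
  FROM v$log
 ORDER BY group#;

-- Map each group to its member files, to match the file named in the error
SELECT group#, member
  FROM v$logfile
 ORDER BY group#;

-- Damaged group is INACTIVE and already archived (1 is a placeholder)
ALTER DATABASE CLEAR LOGFILE GROUP 1;

-- Damaged group is INACTIVE but not yet archived (1 is a placeholder)
ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP 1;
```

Only one of the two ALTER DATABASE statements is needed for a given group. The ARCHIVED column from the first query tells you which one applies.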
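The three redo log states:
CURRENT: the log group LGWR is currently writing to.
ACTIVE: no longer being written to, but still needed for instance crash recovery.
INACTIVE: no longer needed for instance crash recovery.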
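Because this development database had all sorts of other problems, its recovery ran into one complication after another. So here I use a database on a virtual machine to demonstrate how to recover when a CURRENT or ACTIVE redo log file is damaged.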
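1. Set up the scenario
Delete the rows of a table without committing, then run shutdown abort against the database from another session. After that, delete all the redo log files.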
# session 1
sys@ORCL> delete from zx;
2858 rows deleted.

# session 2
sys@ORCL> select group#, status from v$log;
    GROUP# STATUS
---------- ----------------
         1 INACTIVE
         2 ACTIVE
         3 CURRENT

sys@ORCL> shutdown abort;
ORACLE instance shut down.

# delete the redo log files
[oracle@rhel6 ~]$ cd /u02/app/oracle/oradata/orcl/
[oracle@rhel6 orcl]$ ls -l
total 1944992
-rw-r----- 1 oracle oinstall   9748480 Feb 24 23:56 control01.ctl
-rw-r----- 1 oracle oinstall   9748480 Feb 24 23:56 control02.ctl
-rw-r----- 1 oracle oinstall 328343552 Feb 24 23:54 example01.dbf
-rw-r----- 1 oracle oinstall  52429312 Feb 24 23:54 redo01.log
-rw-r----- 1 oracle oinstall  52429312 Feb 24 23:55 redo02.log
-rw-r----- 1 oracle oinstall  52429312 Feb 24 23:55 redo03.log
-rw-r----- 1 oracle oinstall 545267712 Feb 24 23:54 sysaux01.dbf
-rw-r----- 1 oracle oinstall 796925952 Feb 24 23:54 system01.dbf
-rw-r----- 1 oracle oinstall  30416896 Feb 24 13:58 temp01.dbf
-rw-r----- 1 oracle oinstall 110108672 Feb 24 23:54 undotbs01.dbf
-rw-r----- 1 oracle oinstall   5251072 Feb 24 23:54 users01.dbf
[oracle@rhel6 orcl]$ rm redo*log
[oracle@rhel6 orcl]$ ls -l
total 1791212
-rw-r----- 1 oracle oinstall   9748480 Feb 24 23:56 control01.ctl
-rw-r----- 1 oracle oinstall   9748480 Feb 24 23:56 control02.ctl
-rw-r----- 1 oracle oinstall 328343552 Feb 24 23:54 example01.dbf
-rw-r----- 1 oracle oinstall 545267712 Feb 24 23:54 sysaux01.dbf
-rw-r----- 1 oracle oinstall 796925952 Feb 24 23:54 system01.dbf
-rw-r----- 1 oracle oinstall  30416896 Feb 24 13:58 temp01.dbf
-rw-r----- 1 oracle oinstall 110108672 Feb 24 23:54 undotbs01.dbf
-rw-r----- 1 oracle oinstall   5251072 Feb 24 23:54 users01.dbf
2. Starting the database fails with an error
idle> startup
ORACLE instance started.

Total System Global Area 1603411968 bytes
Fixed Size                  2253664 bytes
Variable Size            1476398240 bytes
Database Buffers          117440512 bytes
Redo Buffers                7319552 bytes
Database mounted.
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/u02/app/oracle/oradata/orcl/redo02.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
3. Trying to rebuild the log group with clear also fails
idle> alter database clear logfile group 2;
alter database clear logfile group 2
*
ERROR at line 1:
ORA-01624: log 2 needed for crash recovery of instance orcl (thread 1)
ORA-00312: online log 2 thread 1: '/u02/app/oracle/oradata/orcl/redo02.log'

idle> alter database clear unarchived logfile group 2;
alter database clear unarchived logfile group 2
*
ERROR at line 1:
ORA-01624: log 2 needed for crash recovery of instance orcl (thread 1)
ORA-00312: online log 2 thread 1: '/u02/app/oracle/oradata/orcl/redo02.log'
The error messages show that log 2 is needed for instance crash recovery, so it cannot simply be cleared and rebuilt.
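4. In this situation, use the hidden parameter _allow_resetlogs_corruption: create a pfile and add *._allow_resetlogs_corruption=TRUE to it. Then mount the database with that pfile, force an incomplete recovery, and open with resetlogs.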
idle> create pfile='/home/oracle/initorcl.ora' from spfile;
File created.

[oracle@rhel6 orcl]$ vi /home/oracle/initorcl.ora

idle> shutdown immediate;
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.

idle> startup pfile='/home/oracle/initorcl.ora' mount;
ORACLE instance started.

Total System Global Area 1603411968 bytes
Fixed Size                  2253664 bytes
Variable Size            1476398240 bytes
Database Buffers          117440512 bytes
Redo Buffers                7319552 bytes
Database mounted.

idle> show parameter _allow_
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
_allow_resetlogs_corruption          boolean     TRUE

idle> recover database until cancel;
ORA-00279: change 1023441 generated at 02/24/2017 23:54:54 needed for thread 1
ORA-00289: suggestion : /u02/app/oracle/product/11.2.4/db1/dbs/arch1_2_936817668.dbf
ORA-00280: change 1023441 for thread 1 is in sequence #2

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u02/app/oracle/oradata/orcl/system01.dbf'

ORA-01112: media recovery not started

idle> alter database open resetlogs;
Database altered.

idle> select open_mode from v$database;
OPEN_MODE
--------------------
READ WRITE
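As you can see, the database is now open.
5. Check the table whose rows were deleted in step 1. All 2858 rows are still there: the delete was never committed, and its redo was discarded along with the log files. Nothing recorded in the CURRENT and ACTIVE logs can be applied once they are lost, so any change that existed only there is gone. That is why losing these logs means losing data.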
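The transcript above sets the parameter by editing a pfile by hand. As an alternative sketch, assuming the instance was started from an spfile and is in MOUNT state, the hidden parameter can also be set with ALTER SYSTEM and removed again once the database is open. This variant is an assumption on my part, not what the test above ran. Hidden parameter names must be double-quoted, and SHUTDOWN, STARTUP and RECOVER are SQL*Plus commands rather than SQL statements.

```sql
-- Record the hidden parameter in the spfile (read at the next startup)
ALTER SYSTEM SET "_allow_resetlogs_corruption" = TRUE SCOPE = SPFILE;

-- SQL*Plus commands: restart into MOUNT so the new value takes effect
SHUTDOWN IMMEDIATE
STARTUP MOUNT

-- Incomplete recovery: answer CANCEL at the "Specify log" prompt
RECOVER DATABASE UNTIL CANCEL

-- Open the database, discarding the lost online redo
ALTER DATABASE OPEN RESETLOGS;

-- Once the database is open, remove the parameter from the spfile again
ALTER SYSTEM RESET "_allow_resetlogs_corruption" SCOPE = SPFILE SID = '*';
```

Either way, the parameter should not stay set after the forced open, since it disables consistency checks that normally guard open resetlogs.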
idle> select count(*) from zx;
  COUNT(*)
----------
      2858
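That was the recovery process in the virtual machine test. Recovering the development database mentioned at the start was nowhere near as simple: every error I fixed was followed by a new one.
When opening with resetlogs after the incomplete recovery with _allow_resetlogs_corruption set, I hit ORA-01248: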
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01248: file 5 was created in the future of incomplete recovery
So I first took that file offline with offline drop:
SQL> alter database datafile 5 offline drop;
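Before taking a file offline it may help to compare the datafile headers with each other. The query below is a minimal sketch, assuming the database is mounted; it was not part of the original troubleshooting session. A file whose creation_change# is higher than the point where recovery stopped is the kind of file ORA-01248 complains about.

```sql
-- Read each datafile header: status, fuzzy flag, creation SCN and checkpoint SCN
SELECT file#,
       status,
       fuzzy,
       creation_change#,
       checkpoint_change#,
       name
  FROM v$datafile_header
 ORDER BY file#;
```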
Opening with resetlogs again then failed with ORA-00704 and ORA-01555:
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 5 with name "_SYSSMU5_4116806824$" too small
Process ID: 3396
Session ID: 573 Serial number: 51
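My skills are limited for now, and searching online turned up nothing that resolved this chain of errors. In the end I had no choice but to rebuild the database and re-import the data.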
In fact, while simulating this problem this morning, I also hit the classic ORA-600 [2662] error at open resetlogs. For that error, see eygle's blog: http://www.eygle.com/archives/2005/12/oracle_diagnostics_howto_deal_2662_error.html
Reference: http://iquicksandi.blog.163.com/blog/static/13228526220107642655204/