实验-比较批量提交与单独提交:
--构造环境
drop table t1 purge;
create table t1(x int);
drop table t2 purge;
create table t2(x int);
select * from v$mystat where rownum=1;
--开始测试
set timing on
--语句1
begin
for i in 1 .. 100000 loop
insert into t1 values (i);
commit;
end loop;
end;
/
--语句2
begin
for i in 1 .. 100000 loop
insert into t2 values (i);
end loop;
commit;
end;
/
--语句3(找出提交过于频繁的SESSION)
select t1.sid,t1.value,t2.name
from v$sesstat t1,v$statname t2
where t2.name like '%user commits%' --可以只选user commits,其他系统级的先不关心
and t1.STATISTIC# = t2.STATISTIC#
and value >= 10
order by value desc;
1. redo与相关latch:
redo alloction latch:用于管理log buffer内存空间分配
redo copy latch :用于从pga中的内存写入log buffer
数据从pga写入log buffer时,LGWR进程处于等待状态,直到进程拷贝完毕数据才把log buffer写出到磁盘。
查看保护数据复制的redo copy latch个数:
sys@HX> select x.ksppinm,y.ksppstvl,x.ksppdesc
from x$ksppi x,x$ksppcv y
where x.indx=y.indx
and x.ksppinm like '%_log_simultaneous_copies%';
KSPPINM |KSPPSTVL |KSPPDESC
------------------------------|----------|----------------------------------------------------------------------
_log_simultaneous_copies |2 |number of simultaneous copies into redo buffer(# of copy latches)
1 row selected.
oracle 8.1.3以后,这个数变成了cpu数的两倍,我这个是单cpu,所以_log_simultaneous_copies为2:
[~]$cat /proc/cpuinfo | grep processor
processor : 0
从v$latch视图查看redo copy latch信息:
sys@HX> select name,gets,immediate_gets,immediate_misses,spin_gets
from v$latch where name like 'redo copy%';
NAME | GETS|IMMEDIATE_GETS|IMMEDIATE_MISSES| SPIN_GETS
------------------------------|----------|--------------|----------------|----------
redo copy | 27| 406615| 165| 0
1 row selected.
从v$latch_children视图获取子latch信息:
sys@HX> select name,spin_gets
from v$latch_children where name like 'redo% copy';
NAME | GETS|IMMEDIATE_GETS|IMMEDIATE_MISSES| SPIN_GETS
------------------------------|----------|--------------|----------------|----------
redo copy | 5| 9903| 30| 0
redo copy | 1| 396712| 135| 0
2 rows selected.
一共两个。
server process进程获取redo copy latch后,紧接着需要获得Redo allocation latch,分配redo空间,分配完成后释放Redo allocation latch,接着把pga中的数据拷贝到redo log buffer,拷贝完成后redo copy latch释放。
数据拷贝完成后,server进程可能需要通知lgwr进程把数据写出(例如commit触发)。为了避免lgwr被不必要通知,server进程需要首先获得redo writting latch,以检查lgwr是否已经激活或者已经被通知。如果lgwr已经激活,那么redo writting latch释放。所以,如果用户提交过于频繁(因为要不断价差LGWR是否激活),可能会导致redo writting latch竞争。
sys@HX> l
select name,spin_gets
from v$latch where name like 'redo writing'
sys@HX> /
NAME | GETS|IMMEDIATE_GETS|IMMEDIATE_MISSES| SPIN_GETS
------------------------------|----------|--------------|----------------|----------
redo writing | 70942| 0| 0| 0
1 row selected.
在执行redo copy的过程中,server进程以log file sync等待事件处于等待状态。拷贝完毕后,进程苏醒,获得redo alloction latch检查相应redo是否已经写入redo log file,如果还没写出那就继续等待。
所以内存的分配(log buffer 获得)与redo写出(log buffer释放)都需要获得redo alloction latch。在繁忙的系统中,这个latch竞争也会很激烈。
2. 9iR2 redo 增强
9iR2中,oracle通过log_parallelism定义redo allocation并发级别,如果定义大于1,将分配多个共享redo log buffer区域,每个log buffer都有自己的 redo alloction latch进行保护,这提高了redo的并发性能,类似子缓冲池。9I中这种多 redo log buffer的机制成为PBRS(public redo strands),使用高端服务器,cpu数量多时,可以考虑启用并行redo。通过视图v$latch可以观察redo alloction latch竞争累计等待时间:
select name,misses,spin_gets
from v$latch
where name in ('redo allocation','redo copy')
或者
select substr(b.name,1,20),spin_gets
from v$latch a,v$latchname b
where b.name in ('redo alloction','redo copy') and a.latch#=b.latch#
NAME | GETS| MISSES|IMMEDIATE_GETS|IMMEDIATE_MISSES| SPIN_GETS
------------------------------|----------|----------|--------------|----------------|----------
redo copy | 42| 0| 408269| 165| 0
redo allocation | 250550| 10| 408273| 0| 0
2 rows selected.
如果misses/gets比率超过1%,或者IMMEDIATE_MISSES/(IMMEDIATE_MISSES+IMMEDIATE_GETS)超过1%,那么通常认为存在latch竞争。
如果主机cpu在16-64,通常log_parallelism可以设置为1-8个。9i中这个参数默认是1,缺省redo allocation latch也只有一个。
3. 10g redo增强
sys@HX> select x.ksppinm,x$ksppcv y
where x.indx=y.indx
and x.ksppinm like '%_log_para%';
KSPPINM |KSPPSTVL |KSPPDESC
------------------------------|----------|----------------------------------------------------------------------
_log_parallelism_max |1 |Maximum number of log buffer strands
_log_parallelism_dynamic |TRUE |Enable dynamic strands
2 rows selected.
10g以后引入参数_log_parallelism_max 和_log_parallelism_dynamic,_log_parallelism_dynamic参数默认被设置为true,表示并行度自动分配,但是不大于_log_parallelism_max。
相比9I 的PBRS,oracle 10g增加了PVRS(private redolog strands)机制,实际就是在共享池中增加了一些小的内存空间,每个空间约65-128k大小,这些内存空间通过各自的 redo allocation latch保护,这样重做日志就不再需要通过pga到log buffer 的拷贝过程,redo copy latch也就不再需要了。新机制下,进行redo写出时,lgwr将pbrs与pvrs中的内容写出。发生flush redo时,所有public redo alloction latch需要被获取,所有public strands 的redo copy latch 需要被检查,所有包含活动事务的private strands需要被持有。PVRS并不试用与RAC系统。
查看大小:
sys@HX> select * from v$sgastat where name like 'private%';
POOL |NAME | BYTES
------------|------------------------------|----------
shared pool |private strands | 3677184
所以,在10g中,可以看到多个的redo allocation latch;
sys@HX> l
select name,spin_gets
from v$latch_children
where name in ('redo allocation')
sys@HX> /
NAME | GETS| MISSES|IMMEDIATE_GETS|IMMEDIATE_MISSES| SPIN_GETS
------------------------------|----------|----------|--------------|----------------|----------
redo allocation | 1079| 0| 0| 0| 0
redo allocation | 1079| 0| 0| 0| 0
redo allocation | 1079| 0| 0| 0| 0
…………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………
redo allocation | 1089| 0| 0| 0| 0
redo allocation | 1677| 0| 0| 0| 0
redo allocation | 190555| 0| 0| 0| 0
redo allocation | 7885| 0| 0| 0| 0
redo allocation | 28013| 10| 409699| 0| 0
28 rows selected.
检查与PVRS相关的隐含参数:
sys@HX> select x.ksppinm,x$ksppcv y
where x.indx=y.indx
and x.ksppinm like '%_log_private%';
KSPPINM |KSPPSTVL |KSPPDESC
------------------------------|----------|----------------------------------------------------------------------
_log_private_parallelism_mul |10 |Active sessions multiplier to deduce number of private strands
_log_private_mul |5 |Private strand multiplier for log space preallocation
2 rows selected.
由于IMP与PVRS息息相关,可以查看IMU信息统计信息,注意到其中的redo allocation size部分。
sys@HX> select name,value from v$sysstat where upper(name) like '%IMU%';
NAME | VALUE
----------------------------------------------------------------------|----------
doubling up with imu segment | 0
IMU commits | 78802
IMU Flushes | 98
IMU contention | 15
IMU recursive-transaction flush | 3
IMU undo retention flush | 0
IMU ktichg flush | 1
IMU bind flushes | 0
IMU mbu flush | 0
IMU pool not allocated | 21513
IMU CR rollbacks | 0
IMU undo allocation size | 47199448
IMU Redo allocation size | 855624
IMU- Failed to get a private strand | 21513
cell simulated physical IO bytes eligible for predicate offload | 0
cell simulated physical IO bytes returned by predicate offload | 0
OS Maximum resident set size | 0
17 rows selected.
一旦内存到redo log file写出没完成,就可能会出现log file switch等待事件。
sys@HX> select name from v$event_name where upper(name) like '%STRAND%';
NAME
----------------------------------------------------------------------
log file switch (private strand flush incomplete)
4. 利用update语句分析改变向量与重做记录:
sys@HX> select * from test;
ID|NAME
----------|------------------------------
1|ok1
2|ok2
3|ok3
4|ok4
5|ok5
5 rows selected.
update t1 set x=10 where name='ok5';
修改步骤:
(1)找到name='ok5'的记录所在的块,读入buffer cache
(2)在回滚表空间的相应回滚事务表上分配事务槽,记录redo信息,这是第一个改变向量
(3)从回滚段读入buffer cache,创建id=5的前映象,这个也需要产生redo 信息,redo信息从pga复制到log buffer。
(4)修改x=10,这个也需要记录重做日志信息
(5)commit,redo log buffer中记录提交信息,并在回滚段标记事务为非激活,通知lgwr把信息写出到redo file。
sys@HX> alter system switch logfile;
sys@HX> select * from v$logfile where group#=(select group# from v$log where status='CURRENT');
GROUP#|STATUS |TYPE |MEMBER |IS_
----------|-------|-------|----------------------------------------|---
2| |ONLINE |/u01/oradata2/hx/redo02.log |NO
1 row selected.
发出更新操作:
sys@HX> update test set id=10 where name='ok5';
1 row updated.
sys@HX> commit;
Commit complete.
sys@HX> select sid,serial#,username from v$session where sid=(select sid from v$mystat where rownum<=1);
SID| SERIAL#|USERNAME
----------|----------|---------------
31| 459|HR
1 row selected.
在另外的session转储日志文件:
sys@HX> alter system dump logfile '/u01/oradata2/hx/redo02.log';
sys@HX> select value from v$diag_info where name like 'Default%';
VALUE
------------------------------------------------------------
/u01/diag/rdbms/hexel/hexel/trace/hexel_ora_6271.trc
分析转储文件:
找到update事务的相关redo信息:
这部分貌似和书上讲解的不太一样,待定:
5. 查看redo大小
(1)sqltrace:
hr@HX> set autotrace on;
hr@HX> insert into test values(11,'ok11');
Statistics
----------------------------------------------------------
1 recursive calls
22 db block gets
6 consistent gets
2 physical reads
1252 redo size
1132 bytes sent via sql*Net to client
1278 bytes received via sql*Net from client
4 sql*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1 rows processed
上面的redo size 部分就是这个操作产生的redo大小。
(2)记录当前session的redo size:
select a.name,b.value
from v$statname a,v$mystat b
where a.statistic#=b.statistic# and a.name='redo size';
NAME | VALUE
------------------------------|----------
redo size | 5144
(3)记录实例启动以来产生的redo:
hr@HX> select name,value from v$sysstat where name='redo size';
NAME | VALUE
------------------------------|----------
redo size | 181370372
(4)查询归档日志大小:
把数据库设置归档:
sys@HX> startup mount;
sys@HX> show parameter log_archive_dest_1 ;
NAME |TYPE |VALUE
------------------------------------|-----------|------------------------------
log_archive_dest_1 |string |LOCATION=USE_DB_RECOVERY_FILE_DEST
sys@HX> alter system set DB_RECOVERY_FILE_DEST_SIZE=4g;
System altered.
sys@HX> alter system set db_recovery_file_dest ='/u01/oradata2/archived';
System altered.
sys@HX> alter database archivelog;
Database altered.
sys@HX> archive log list;
Database log mode Archive Mode
Automatic archival Enabled
Archive destination USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence 57
Next log sequence to archive 59
Current log sequence 59
sys@HX> alter database open;
如果是9i,还需要设置参数log_archive_start,否则无法自动归档:
log_archive_start = true
查看所有的归档日志:
sys@HX> select name,completion_time,blocks*block_size/1024/1024 mb from v$archived_log where name is not null;
NAME |COMPLETION_TIME | MB
--------------------------------------------------|-------------------|----------
/u01/oradata2/archived/HEXEL/archivelog/2013_11_23|2013-11-23 12:07:42|.221191406
/o1_mf_1_59_990bjg94_.arc | |
/u01/oradata2/archived/HEXEL/archivelog/2013_11_23|2013-11-23 12:11:38|.009765625
/o1_mf_1_60_990bqt71_.arc | |
查看归档目录可用性,通常无法归档时由于尝试过度,相应dest的status可能会变为error:
SYS> select dest_id,dest_name,status from v$archive_dest;
DEST_ID|DEST_NAME |STATUS
----------|------------------------------|---------
1|LOG_ARCHIVE_DEST_1 |VALID
2|LOG_ARCHIVE_DEST_2 |INACTIVE
3|LOG_ARCHIVE_DEST_3 |INACTIVE
4|LOG_ARCHIVE_DEST_4 |INACTIVE
31 rows selected.
6. redo 写触发条件:
(1)每3s超时:
lgwr空闲时,处于rdbms ipc message等待,当3s超时时间到时,如果lgwr发现有redo写出,那么他将执行写出,这个时候log file paralel write等待事件出现:
可以使用10046事件跟踪跟踪;
hr@HX> select event#,event_id,name from v$event_name
where name='rdbms ipc message'
or name like '%log file parallel write%';;
EVENT#| EVENT_ID|NAME
----------|----------|------------------------------
7| 866018717|rdbms ipc message
134|3999721902|log file parallel write
2 rows selected.
hr@HX> alter session set events '10046 trace name context forever,level 12' ;
hr@HX> update test set id=1 where id=2;
hr@HX> alter session set events '10046 trace name context off' ;
(2)阈值达到
redo log buffer满1/3或者具有1m脏数据,
实际上是因为redo log buffer 达到隐含参数_log_io_size大小(这个参数缺省是1/3 log buffer大小,上限是1m)激活lgwr写出数据。注意log buffer size是以log block来衡量的。
sys@HX> select x.ksppinm,x$ksppcv y where x.indx=y.indx and x.ksppinm='_log_io_size';;
KSPPINM |KSPPSTVL |KSPPDESC
------------------------------|----------|----------------------------------------------------------------------
_log_io_size |0 |automatically initiate log write if this many redo blocks in buffer
1 row selected.
因此,实际上我们可以设置log buffer为3m,如果太小,则lgwr会过度被调用:
(3)用户提交:
当事务提交时,在redo stream中将会记录事务提交标识,在事务提交返回成功标识前必须等待lgwr写出数据,并且以log file sync等待事件开始休眠,所以,过度提交肯定会有更多的LOG FILE sync等待事件。可以设置参数,避免出现这个等待事件。但是这样无法保证事务恢复性,因为无法保证log 写出完成。
sys@HX> select x.ksppinm,x$ksppcv y
where x.indx=y.indx and x.ksppinm like '_wait_for_sync%';
KSPPINM |KSPPSTVL |KSPPDESC
------------------------------|----------|----------------------------------------------------------------------
_wait_for_sync |TRUE |wait for sync on commit MUST BE ALWAYS TRUE
1 row selected.
有变量用于记录redo线程需要同步的log block number,如果多个提交在唤醒LGWR之前发生,此变量记录最高 log block number,之前的所有redo都会被写入磁盘,这时候被成为组提交(group commit)。
(4)在dbwr写出之前:
在增量检查点中,如果dbwr要写出low rba与high rba之间的数据,必须要先通知lgwr去执行写出,即先写日志,再写数据。显然,未提交的数据是无法进行实例恢复的(需要回滚)。
7. redo log buffer大小设置
redo log buffer 是轻量级数据,缺省值是max(512kb,128*cpu_count),如果log buffer space等待事件出现并且很显著,可以考虑增大log_buffer较少竞争,log_buffer大小也要以granule为基础:
8. 日志文件状态:
(1)current:
当前使用的日志:
当前使用的,是实例崩溃恢复必须的日志文件。如果当前日志被删除,恢复稍微会麻烦一些。
(2)active:
活动的非当前文件:这种日志可能未完成归档,包含了尚未完成检查点的重做信息,也是实例崩溃恢复必须的。如果日志文件循环使用到这个文件,他将会被覆盖,这个时候需要执行检查点,从而产生log file switch 等待事件:
(3)inactive
非活动日志,不包含实例恢复需要的数据,但是包含介质恢复需要的数据,如果数据库运行在归档模式下,如果切换到这个文件,但是文件还未归档,也会发生log file switch 等待事件。
(4)unused
刚添加,未使用的日志就是这个状态(restlogs启动也会有文件在这个状态)。
9.日志块大小:
虽然log_buffer中的redo entries是以bytes为单位,但是lgwr仍是按照block把数据写出到磁盘。redo block size与操作系统相关,可以通过下面方法查看:
sys@HX> select max(lebsz) from x$kccle;
MAX(LEBSZ)
----------
512
1 row selected
粗略计算:
sys@HX> select name,value from v$sysstat where name in ('redo size','redo wastage','redo blocks written');
NAME | VALUE
------------------------------|----------
redo size | 13342316
redo wastage | 146300
redo blocks written | 27345
每个块头占用16bytes
sys@HX> select ceil((13342316+146300)/27345+16) from dual;
CEIL((13342316+146300)/27345+16)
--------------------------------
510
1 row selected.
[~]$dbfsize /u01/oradata2/hx/redo01.log
Database file: /u01/oradata2/hx/redo01.log
Database file type: file system
Database file size: 204800 512 byte blocks
Database file: /u01/oradata2/hx/system.dbf
Database file type: file system
Database file size: 126720 8192 byte blocks
可以转储重做文件头查看:
sys@HX> alter session set events 'immediate trace name redohdr level 10';
Session altered.
LOG FILE #1:
name #3: /u01/oradata2/hx/redo01.log
Thread 1 redo log links: forward: 2 backward: 0
siz: 0x32000 seq: 0x00000043 hws: 0x2 bsz: 512 nab: 0x13 flg: 0x1 dup: 1
10.重做日志文件相关操作:
(1)增加日志组:
由于日志不能改变大小,所以一般需要增加日志组:
sys@HX> select * from v$logfile;
GROUP#|STATUS |TYPE |MEMBER |IS_
----------|-------|-------|----------------------------------------|---
3| |ONLINE |/u01/oradata2/hx/redo03.log |NO
2| |ONLINE |/u01/oradata2/hx/redo02.log |NO
1| |ONLINE |/u01/oradata2/hx/redo01.log |NO3 rows selected.
sys@HX> alter database add logfile group 4 '/u01/oradata2/hx/redo04.log' size 20m;
Database altered.
sys@HX> alter database add logfile group 5 '/u01/oradata2/hx/redo05.log' size 20m;
Database altered.
sys@HX> select * from v$logfile;
GROUP#|STATUS |TYPE |MEMBER |IS_
----------|-------|-------|----------------------------------------|---
3| |ONLINE |/u01/oradata2/hx/redo03.log |NO
2| |ONLINE |/u01/oradata2/hx/redo02.log |NO
1| |ONLINE |/u01/oradata2/hx/redo01.log |NO
4| |ONLINE |/u01/oradata2/hx/redo04.log |NO
5| |ONLINE |/u01/oradata2/hx/redo05.log |NO
5 rows selected.
(2)删除inactive状态的日志文件:(需要先归档,但是11g下貌似只要切换日志就归档呢)
sys@HX> alter database drop logfile group 1;
Database altered.
sys@HX> select * from v$logfile;
GROUP#|STATUS |TYPE |MEMBER |IS_
----------|-------|-------|----------------------------------------|---
3| |ONLINE |/u01/oradata2/hx/redo03.log |NO
2| |ONLINE |/u01/oradata2/hx/redo02.log |NO
4| |ONLINE |/u01/oradata2/hx/redo04.log |NO
5| |ONLINE |/u01/oradata2/hx/redo05.log |NO
sys@HX> alter system archive log sequence 68;
alter system archive log sequence 68
*
ERROR at line 1:
ORA-16013: 日志 2 sequence# 68 不需要归档
sys@HX> ho rm /u01/oradata2/hx/redo02.log
重新初始化原来的日志文件,使用原来的group#:
sys@HX> alter database add logfile group 2 '/u01/oradata2/hx/redo02.log' size 20m reuse;
Database altered.
强制日志模式下,任何操作都会产生日志:
sys@HX> select force_logging from v$database;
FOR
---
NO
1 row selected.
sys@HX> alter database force logging;
Database altered.
sys@HX> select force_logging from v$database;
FOR
---
YES
1 row selected.
sys@HX> alter database no force logging;
Database altered.
11. oracle redo相关故障恢复;
(1)丢失非活动日志文件:
在数据库运行时候,如果删除了inactive状态的日志文件,切换到这个文件时后,数据库可以检测到错误,直接跳到下一个文件:
sys@HX> select a.group#,a.status,b.member from v$log a,v$logfile b where a.group#=b.group#;
GROUP#|STATUS |MEMBER
----------|----------------|----------------------------------------
3|INACTIVE |/u01/oradata2/hx/redo03.log
2|INACTIVE |/u01/oradata2/hx/redo02.log
4|CURRENT |/u01/oradata2/hx/redo04.log
5|INACTIVE |/u01/oradata2/hx/redo05.log
sys@HX> ho mv /u01/oradata2/hx/redo02.log /u01/oradata2/hx/redo02.log.bak
sys@HX> alter system switch logfile;
System altered.
对于inactive状态文件,由于已经完成检查点,不会丢失数据,采用清除方法,重新建立即可:
sys@HX> alter database clear logfile group 2;
Database altered.
clear并重新建立文件后,这个文件是unused状态,可以继续使用
sys@HX> select a.group#,v$logfile b where a.group#=b.group#;
GROUP#|STATUS |MEMBER
----------|----------------|----------------------------------------
3|INACTIVE |/u01/oradata2/hx/redo03.log
2|UNUSED |/u01/oradata2/hx/redo02.log
4|CURRENT |/u01/oradata2/hx/redo04.log
5|INACTIVE |/u01/oradata2/hx/redo05.log
4 rows selected.
如果是归档模式:
则需要使用命令:
sys@HX> alter database clear unarchived logfile group 2;
(2)丢失active和current状态的文件:
由于正常关闭,日志文件是inactive和current状态,实例不需要恢复,由于无法对current状态的文件执行clear操作,这个时候可以启动到mount状态,clear inactive状态的文件,再使用until cancel恢复,然后使用restlogs打开数据库:
sys@HX> select a.group#,v$logfile b where a.group#=b.group#;
GROUP#|STATUS |MEMBER
----------|----------------|----------------------------------------
2|UNUSED |/u01/oradata2/hx/redo02.log
5|INACTIVE |/u01/oradata2/hx/redo05.log
4|CURRENT |/u01/oradata2/hx/redo04.log
3|INACTIVE |/u01/oradata2/hx/redo03.log
4 rows selected.
sys@HX> ho rm /u01/oradata2/hx/redo* -rf
sys@HX> alter database open;
ERROR:
ORA-03114: 未连接到 ORACLE
alter database open
*
ERROR at line 1:
ORA-03113: 通信通道的文件结尾
进程 ID: 8542
会话 ID: 1 序列号: 5
实例会自己挂掉,查看告警日志文件:
alter database open
Block change tracking file is current.
Sun Nov 24 15:19:33 2013
Errors in file /u01/diag/rdbms/hexel/hexel/trace/hexel_lgwr_8486.trc:
ORA-00313: open Failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/u01/oradata2/hx/redo02.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /u01/diag/rdbms/hexel/hexel/trace/hexel_lgwr_8486.trc:
ORA-00313: open Failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/u01/oradata2/hx/redo02.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /u01/diag/rdbms/hexel/hexel/trace/hexel_ora_8542.trc:
ORA-00313: 无法打开日志组 1 (用于线程 ) 的成员
ORA-00312: 联机日志 2 线程 1: '/u01/oradata2/hx/redo02.log'
Sun Nov 24 15:19:34 2013
System state dump requested by (instance=1,osid=8542),summary=[abnormal instance termination].
System State dumped to trace file /u01/diag/rdbms/hexel/hexel/trace/hexel_diag_8476.trc
USER (ospid: 8542): terminating the instance due to error 313
Dumping diagnostic data in directory=[cdmp_20131124151934],requested by (instance=1,summary=[abnormal instance termination].
Instance terminated by USER,pid = 8542
重新启动到mount状态:进行恢复:
SYS> startup mount;
SYS> select a.group#,v$logfile b where a.group#=b.group#;
GROUP#|STATUS |MEMBER
----------|----------------|----------------------------------------
2|UNUSED |/u01/oradata2/hx/redo02.log
5|INACTIVE |/u01/oradata2/hx/redo05.log
4|CURRENT |/u01/oradata2/hx/redo04.log
3|INACTIVE |/u01/oradata2/hx/redo03.log
4 rows selected.
SYS> alter database clear logfile group 2;
Database altered.
SYS> alter database clear logfile group 5;
Database altered.
SYS> alter database clear logfile group 3;
Database altered.
[~]$ls /u01/oradata2/hx/redo* -lh
-rw-r----- 1 oracle oinstall 21M 11月 24 15:22 /u01/oradata2/hx/redo02.log
-rw-r----- 1 oracle oinstall 101M 11月 24 15:22 /u01/oradata2/hx/redo03.log
-rw-r----- 1 oracle oinstall 21M 11月 24 15:22 /u01/oradata2/hx/redo05.log
SYS> recover database until cancel;
Media recovery complete.
SYS> alter database open resetlogs;
数据库异常关闭时,active状态和current状态的日志文件都有实例恢复需要的数据。如果有备份,可以利用备份和归档日志文件,实现不完全恢复,然后使用resetlogs打开数据库,这样就只失去丢失的日志文件中的内容。如果没有备份,那就只能在破坏一致性前提下强制重置日志。使用参数_allow_restlog_corruption可以使所有数据文件以最旧的scn打开,这需要system表空间拥有最旧的scn。
SYS> shutdown abort;
ORACLE instance shut down.
SYS> startup mount;
SYS> ho rm /u01/oradata2/hx/redo* -rf
SYS> alter system set "_allow_resetlogs_corruption"=true scope=spfile;
SYS> select CHECKPOINT_CHANGE# from v$datafile;
CHECKPOINT_CHANGE#
------------------
15480905
15480905
15480905
15480905
15480905
15480905
15480905
7 rows selected.
SYS> recover database using backup controlfile until cancel;
ORA-00279: 更改 15480905 (在 11/24/2013 15:40:46 生成) 对于线程 1 是必需的
ORA-00289: 建议: /u01/oradata2/archived/HEXEL/archivelog/2013_11_24/o1_mf_1_6_%u_.arc
ORA-00280: 更改 15480905 (用于线程 1) 在序列 #6 中
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: 警告: RECOVER 成功但 OPEN RESETLOGS 将出现如下错误
ORA-01194: 文件 1 需要更多的恢复来保持一致性
ORA-01110: 数据文件 1: '/u01/oradata2/hx/system.dbf'
ORA-01112: 未启动介质恢复
观察日志:
ALTER DATABASE RECOVER database using backup controlfile until cancel
Media Recovery Start
Serial Media Recovery started
WARNING! Recovering data file 1 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 2 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 3 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 4 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 5 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 6 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 7 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
ORA-279 signalled during: ALTER DATABASE RECOVER database using backup controlfile until cancel ...
Sun Nov 24 15:52:34 2013
ALTER DATABASE RECOVER CANCEL
ORA-1547 signalled during: ALTER DATABASE RECOVER CANCEL ...
ALTER DATABASE RECOVER CANCEL
ORA-1112 signalled during: ALTER DATABASE RECOVER CANCEL ...
Sun Nov 24 15:52:39 2013
Checker run found 8 new persistent data failures
SYS> alter database open resetlogs;
Database altered.
继续观察日志(摘取):
Sun Nov 24 15:54:25 2013
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
下面是最后恢复到的change号,实际就是上次完成检查点的scn。
RESETLOGS after incomplete recovery UNTIL CHANGE 15480905
Errors in file /u01/diag/rdbms/hexel/hexel/trace/hexel_ora_9537.trc:
ORA-00313: 无法打开日志组 2 (用于线程 1) 的成员
ORA-00312: 联机日志 2 线程 1: '/u01/oradata2/hx/redo02.log'
ORA-27037: 无法获得文件状态
………………………………………………………………………………………………………………
最后,关闭参数,由于数据库以不一致状态打开,略去很多一致性检验,可能会遭遇一些ora600,所以最好逻辑备份,重新建库。
SYS> alter system set "_allow_resetlogs_corruption"=false scope=spfile;
System altered.
SYS> startup force;