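I had never run into a problem like this before, especially the gcs drm freeze in enter server mode wait event, which comes from the DRM (Dynamic Resource Mastering) feature. From the users' point of view, the frequently executed INSERT statements were very slow. I think there are two reasons. First, an insert has to read the data block that receives the new row. Second, it has to read the index blocks that receive the new keys, and those index blocks may also split. These blocks are then shipped back and forth between the nodes, which causes the waits.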
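One way to check that hypothesis is to see which tables and indexes generate the most global cache traffic. The query is a sketch against gv$segment_statistics. The statistic names are the ones I expect in 10g/11g and may differ between versions. The values are cumulative since each instance started.

```sql
-- Top 20 segments (tables and indexes) by global cache activity, per instance,
-- cumulative since instance startup. Classic pre-12c top-N with ROWNUM.
select *
from  (select inst_id, owner, object_name, object_type, statistic_name, value
       from   gv$segment_statistics
       where  statistic_name in ('gc buffer busy',
                                 'gc current blocks received',
                                 'gc cr blocks received')
       order  by value desc)
where rownum <= 20;
```

If the hot segments are the tables and indexes behind the INSERT statements, that supports the data-block/index-block explanation.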
Top 10 Foreground Events by Total Wait Time
Event | Waits | Total Wait Time (sec) | Wait Avg(ms) | % DB time | Wait Class |
---|---|---|---|---|---|
gc buffer busy acquire | 266,461 | 225.8K | 847 | 33.1 | Cluster |
DB CPU | | 71.6K | | 10.5 | |
gc cr block congested | 30,065 | 57K | 1897 | 8.4 | Cluster |
gcs drm freeze in enter server mode | 101,845 | 50.3K | 493 | 7.4 | Other |
enq: TX - row lock contention | 9,538 | 49.4K | 5179 | 7.2 | Application |
gc current block congested | 40,354 | 43.9K | 1087 | 6.4 | Cluster |
gc current grant busy | 958,406 | 35.8K | 37 | 5.3 | Cluster |
gc cr block 2-way | 531,320 | 26.1K | 49 | 3.8 | Cluster |
gc current block 2-way | 864,948 | 12.4K | 14 | 1.8 | Cluster |
gc buffer busy release | 24,042 | 11.8K | 491 | 1.7 | Cluster |
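The AWR figures above cover a single snapshot interval. To watch the same events live on every node, a query along these lines against gv$system_event can help. It is a sketch: the event list is copied from the table above. The counters are cumulative since each instance started, so compare two runs to get an interval delta.

```sql
-- Cumulative waits per instance since startup (not an AWR interval delta)
-- for the DRM and global cache events seen in the AWR report above.
select inst_id,
       event,
       total_waits,
       round(time_waited_micro / 1000000)                        as time_waited_s,
       round(time_waited_micro / 1000 / nullif(total_waits, 0))  as avg_wait_ms
from   gv$system_event
where  event in ('gcs drm freeze in enter server mode',
                 'gc buffer busy acquire',
                 'gc cr block congested',
                 'gc current block congested')
order  by inst_id, time_waited_micro desc;
```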
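Because we had to wait several days before the database could be restarted, the temporary workaround was to point WebLogic at a single instance, and that worked very well.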
In 10g, DRM can be disabled as follows. You can also disable just one of its components, object affinity or undo affinity:
--disable object affinity
alter system set "_gc_affinity_time"=0 scope=spfile ;
--disable undo affinity
alter system set "_gc_undo_affinity"=FALSE scope=spfile;
Then restart all instances at the same time for the change to take effect.
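Both settings above use scope=spfile, so nothing changes in the running instances until the restart. Before restarting, you can query v$spparameter to confirm that the entries actually landed in the spfile. This is a sketch and assumes all instances start from a shared spfile.

```sql
-- Pending spfile entries for the 10g DRM switches; SID '*' means "all instances".
select sid, name, value
from   v$spparameter
where  name in ('_gc_affinity_time', '_gc_undo_affinity')
and    isspecified = 'TRUE';
```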
If the instances cannot be restarted yet, the following commands disable DRM "in effect" (both parameters can be changed dynamically):
alter system set "_gc_affinity_limit"=10000000 sid='*';
alter system set "_gc_affinity_minimum"=10000000 sid='*';
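These two parameters change in memory immediately, so it is worth confirming the values each instance is actually running with. A common way to read hidden parameters is to join the undocumented x$ksppi (parameter names) and x$ksppsv (instance-wide values) fixed tables as SYS. Treat this as a sketch, and run it on every node, because x$ tables only show the local instance.

```sql
-- Run as SYS on each instance: current instance-level values of the 10g DRM parameters.
select a.ksppinm  as name,
       b.ksppstvl as value,
       b.ksppstdf as is_default
from   x$ksppi  a,
       x$ksppsv b
where  a.indx = b.indx
and    a.ksppinm in ('_gc_affinity_time', '_gc_undo_affinity',
                     '_gc_affinity_limit', '_gc_affinity_minimum');
```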
In 11g, DRM can be disabled in the same way, and I strongly recommend turning it off:
alter system set "_gc_policy_time"=0 scope=spfile;
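Then restart all instances at the same time for it to take effect. If you do not want to disable DRM completely but need to disable the read-mostly locking or reader bypass mechanism, use the following commands: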
--disable read-mostly locking
alter system set "_gc_read_mostly_locking"=false scope=spfile sid='*';
--disable reader-bypass
alter system set "_gc_bypass_readers"=false scope=spfile sid='*';
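After the restart, you can check which _gc parameters are actually in effect on an instance, for example _gc_policy_time, _gc_read_mostly_locking or _gc_bypass_readers. One way is to list the non-default ones from the x$ fixed tables. This is a sketch: it must run as SYS, the x$ tables are undocumented, and they only show the local instance, so run it on each node.

```sql
-- Run as SYS on each instance: every _gc* parameter whose value differs from the default.
select a.ksppinm  as name,
       b.ksppstvl as value
from   x$ksppi  a,
       x$ksppsv b
where  a.indx = b.indx
and    a.ksppinm like '\_gc\_%' escape '\'
and    b.ksppstdf = 'FALSE'
order  by a.ksppinm;
```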
Top 10 Foreground Events by Total Wait Time
Event | Waits | Total Wait Time (sec) | Wait Avg(ms) | % DB time | Wait Class |
---|---|---|---|---|---|
DB CPU | | 21.1K | | 79.0 | |
db file sequential read | 201,429 | 1541.2 | 8 | 5.8 | User I/O |
db file scattered read | 121,867 | 1017.8 | 8 | 3.8 | User I/O |
direct path read temp | 175,993 | 494.2 | 3 | 1.9 | User I/O |
SQL*Net message from dblink | 69,387 | 458.1 | 7 | 1.7 | Network |
enq: TX - row lock contention | 108 | 389.1 | 3602 | 1.5 | Application |
log file sync | 112,847 | 356.2 | 3 | 1.3 | Commit |
read by other session | 40,569 | 222.2 | 5 | .8 | User I/O |
db file parallel read | 11,173 | 157.7 | 14 | .6 | User I/O |
gc current block busy | 25,659 | 155 | 6 | .6 | Cluster |
The database in this incident had a 1 TB SGA, so some special parameters need to be set at installation time.
Best Practices and Recommendations for RAC databases with SGA size over 100GB (Doc ID 1619155.1)
APPLIES TO:
Oracle Database - Enterprise Edition - Version 11.2.0.3 and later
Information in this document applies to any platform.
PURPOSE
The goal of this note is to provide best practices and recommendations to users of Oracle Real Application Clusters (RAC) databases using very large SGA (e.g. 100GB) per instance (note that RAC assumes homogeneously sized SGAs across the cluster). This document is compiled and maintained based on Oracle's experience with its global RAC customer base.
This is not meant to replace or supplant the Oracle Documentation set, but rather, it is meant as a supplement to the same. It is imperative that the Oracle Documentation be read, understood, and referenced to provide answers to any questions that may not be clearly addressed by this note.
All recommendations should be carefully reviewed by your own operations group and should only be implemented if the potential gain as measured against the associated risk warrants implementation. Risk assessments can only be made with a detailed knowledge of the system, application, and business environment.
SCOPE
This article applies to all new and existing RAC implementations.
This is for RAC databases only, as most of the parameters listed here apply only to RAC databases.
DETAILS
Note that the recommendations presented in this note are a result of the experience from working on databases with SGA of 1 TB and 2.6 TB.
Databases with SGAs of 100 GB and 300 GB also benefited from the recommendations.
init.ora parameters:
Setting this will prevent some timeouts during reconfiguration and DRM. It's a static parameter and rolling restart is supported.
b. Set _ksmg_granule_size to 134217728
Setting this will cut down the time needed to locate the resource for a data block. It's a static parameter and rolling restart is supported.
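v$sgainfo reports the granule size an instance is currently using, which is useful before and after changing _ksmg_granule_size. The alter system line is a sketch of item b, following the same spfile/sid='*' pattern as the DRM commands earlier. Because the parameter is static, it only takes effect after a (rolling) restart.

```sql
-- Current granule size of this instance, in MB.
select name, bytes / 1024 / 1024 as mb
from   v$sgainfo
where  name = 'Granule Size';

-- Item b: 134217728 bytes = 128 MB; static, so it needs a restart to take effect.
alter system set "_ksmg_granule_size"=134217728 scope=spfile sid='*';
```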
c. Set shared_pool_size to 15% or larger of the total SGA size.
For example, if the SGA size is 1 TB, the shared pool size should be at least 150 GB. It's a dynamic parameter.
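To apply the 15% rule to a running instance, you can compute the floor from v$sga and compare it with the current shared pool. This is a sketch that treats the sum of v$sga as the total SGA size. With a 1 TB (1024 GB) SGA, 15% works out to about 153.6 GB, in line with the "at least 150 GB" figure.

```sql
-- Total SGA and the 15% floor recommended for shared_pool_size, in GB.
select round(sum(value) / power(1024, 3), 1)        as sga_gb,
       round(sum(value) * 0.15 / power(1024, 3), 1) as min_shared_pool_gb
from   v$sga;

-- Current shared pool size, in GB, for comparison.
select round(current_size / power(1024, 3), 1) as shared_pool_gb
from   v$sga_dynamic_components
where  component = 'shared pool';
```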
d. Set _gc_policy_minimum to 15000
There is no need to set _gc_policy_minimum if DRM is disabled by setting _gc_policy_time = 0. _gc_policy_minimum is a dynamic parameter; _gc_policy_time is a static parameter and rolling restart is not supported. To disable DRM, _lm_drm_disable should be used instead of _gc_policy_time, as it's dynamic.
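Because _gc_policy_minimum is dynamic, item d can be applied to all instances without a restart. The statement is a sketch following the note's value. The note does not give a value for _lm_drm_disable, so that parameter is deliberately not shown here.

```sql
-- Item d: dynamic; change it in memory and in the spfile on all instances.
alter system set "_gc_policy_minimum"=15000 scope=both sid='*';
```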
e. Set _lm_tickets to 5000
Default is 1000. Allocating more tickets (used for sending messages) avoids issues where we ran out of tickets during the reconfiguration. It's a static parameter and rolling restart is supported. When increasing the parameter, a rolling restart is fine, but a cold restart can be necessary when decreasing it.
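Item e is a static parameter, so the change goes into the spfile and is picked up by a rolling restart. This is a sketch using the value from the note.

```sql
-- Item e: raise _lm_tickets from the default 1000 to 5000; static, rolling restart needed.
alter system set "_lm_tickets"=5000 scope=spfile sid='*';
```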
f. Set gcs_server_processes to twice the default number of LMS processes that are allocated.
The default number of LMS processes depends on the number of CPUs/cores that the server has, so please refer to the gcs_server_processes init.ora parameter section in the Oracle Database Reference Guide for the default number of LMS processes for your server. Please make sure that the total number of LMS processes of all databases on the server is less than the total number of CPUs/cores on the server. Please refer to Document 558185.1. It's a static parameter and rolling restart is supported.
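Before doubling gcs_server_processes, you need the current inputs: the CPU count, the configured setting, and how many LMS processes are actually running. This is a sketch. It only covers the database you are connected to, so repeat it for every database on the server and add up the LMS counts before comparing them with the number of cores.

```sql
-- Inputs for item f on this instance: CPU count and configured LMS count.
select name, value
from   v$parameter
where  name in ('cpu_count', 'gcs_server_processes');

-- LMS background processes actually running on each instance of this database
-- (a non-'00' PADDR means the background process is started).
select inst_id, count(*) as lms_processes
from   gv$bgprocess
where  name like 'LMS%'
and    paddr <> '00'
group  by inst_id;
```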