Oracle 11g新特性direct path read引发的系统停运故障诊断处理2

前端之家收集整理的这篇文章主要介绍了Oracle 11g新特性direct path read引发的系统停运故障诊断处理2前端之家小编觉得挺不错的,现在分享给大家,也给大家做个参考。

1、故障现象

(1)一个业务系统输入用户名与密码后无法进入首页,表现为一直在运行等待,运行缓慢

(2)整个系统无法正常使用,接近停运状态

2、 故障解决方法

调整数据库参数alter system setevent='10949 trace name context forever,level 1'来关闭direct path read”(直接路径读)特性,使sql语句可以从缓存中查询数据,达到降低I/O读取量,使全表扫描的数据从缓存中读取,加快sql语句运行速度的目的。

3、 故障原因总结

(1)由于部分sql语句设计或编写效率低下,以及表缺少适应的索引,导致sql语句需要全表扫描,在表较小时,ORACLE数据库将数据读取到缓存后,后续虽然是全表扫描,但均是从缓存中读取,所以问题未体现出来

(2)在表的大小不断增大后,根据ORACLE 11g数据库的算法,在表达到db_cache_size(GB)的2%(默认值)以后,认为采用直接路径读(跳过缓存,直接从磁盘文件中全扫描读取)

(3)DX_T_XXVIATE表大小为1GB,在大量反复以direct pathread磁盘重复读取的情况下,消耗大量的I/O资源,将服务器I/O几乎耗尽

(4)在主机I/O耗尽的情况下,系统的读、写,均几乎处于瘫痪状态

(5)在关闭ORACLE 11G数据库的direct path read新特性功能后,读取方式恢复到从缓存中读取,磁盘读降到“0”,系统恢复正常

4、 改进建议

(1)优化访问DX_T_XXVIATE 相关的sql语句与设计合适的索引,避免大表全表扫描。

5、 故障原因分析

5.1 7月11日故障时段数据库服务器I/O等待严重


5.2 7月11日故障时段磁盘响应非常缓慢

5.3 对比故障当日(7月11日)与上周的I/O磁盘读取量,比上周大十倍

故障前、中、后磁盘读取量对比图:

上面高的蓝色线,是故障当日(2016年7月1日,周一)的磁盘Disk Read KB/s指标线

5.4 高度消耗I/O的sql语句。

sql_ID为b8m6wy846qgbksql语句,physical reads鹤立鸡群,可见此sql语句的影响最为严重。

5.5 全表扫描单次超过6秒的表与其sql语句统计

统计汇总时间:08:00—10:00

sql_ID

target

出现次数

b8m6wy846qgbk

DX_T_XXVIATE

22

6ut0tkd7hvagc

DX_T_XXVIATE

6

1wq3kmd62x4x8

DX_T_XXVIATE

2

b5ma9qfs62bdm

DX_T_XXVIATE

2

f5hujtmsh3pd1

DX_T_XXVIATE

2

65wk2453uk930

DX_T_XXVIATE

2

6yxk40mn2rnzj

DX_T_XXVIATE

2

8h1bzhfm6purf

DX_T_XXVIATE

2

a1r7xr03naax2

DX_T_XXVIATE

2

6yn7ccy6bbw4h

DX_T_XXVIATE

2

6ar757tgxd1sf

DX_T_XXVIATE

2

gu8cjd8c35kys

DX_T_XXVIATE

2

41j3xyugbn6ck

DX_T_XXVIATE

2

98jagc6q2zr4x

DX_T_XXVIATE

2

1a4rwwt36w33d

DX_T_XXVIATE

2

b4bq40ybpt7a8

DX_T_XXVIATE

1

30za43khyxz6t

DX_T_XXVIATE

1

66j38hs9a0yzh

DX_T_XXVIATE

1

25djwmppxq9qt

DX_T_XXVIATE

1

84pdvmjyb9103

DX_T_XXVIATE

1

b69frxfps0k4t

DX_T_XXVIATE

1

统计时间:08:00—10:00 单次扫描超过6秒的sql语句及时长详细清单:

sql_ID

target

elapsed_seconds

start_time

f5hujtmsh3pd1

DX_T_XXVIATE

24

2016/7/11 10:02

f5hujtmsh3pd1

DX_T_XXVIATE

26

2016/7/11 10:02

66j38hs9a0yzh

DX_T_XXVIATE

28

2016/7/11 9:51

b8m6wy846qgbk

DX_T_XXVIATE

19

2016/7/11 9:48

b4bq40ybpt7a8

DX_T_XXVIATE

30

2016/7/11 9:41

6ut0tkd7hvagc

DX_T_XXVIATE

23

2016/7/11 9:38

b8m6wy846qgbk

DX_T_XXVIATE

27

2016/7/11 9:37

b8m6wy846qgbk

DX_T_XXVIATE

25

2016/7/11 9:35

b8m6wy846qgbk

DX_T_XXVIATE

28

2016/7/11 9:35

b8m6wy846qgbk

DX_T_XXVIATE

37

2016/7/11 9:34

65wk2453uk930

DX_T_XXVIATE

38

2016/7/11 9:34

8h1bzhfm6purf

DX_T_XXVIATE

31

2016/7/11 9:33

65wk2453uk930

DX_T_XXVIATE

34

2016/7/11 9:33

8h1bzhfm6purf

DX_T_XXVIATE

30

2016/7/11 9:33

b8m6wy846qgbk

DX_T_XXVIATE

32

2016/7/11 9:32

b8m6wy846qgbk

DX_T_XXVIATE

33

2016/7/11 9:32

b8m6wy846qgbk

DX_T_XXVIATE

30

2016/7/11 9:32

b8m6wy846qgbk

DX_T_XXVIATE

31

2016/7/11 9:32

b8m6wy846qgbk

DX_T_XXVIATE

31

2016/7/11 9:31

b8m6wy846qgbk

DX_T_XXVIATE

45

2016/7/11 9:30

84pdvmjyb9103

DX_T_XXVIATE

39

2016/7/11 9:29

b8m6wy846qgbk

DX_T_XXVIATE

39

2016/7/11 9:25

6ut0tkd7hvagc

DX_T_XXVIATE

33

2016/7/11 9:24

b8m6wy846qgbk

DX_T_XXVIATE

36

2016/7/11 9:21

6yxk40mn2rnzj

DX_T_XXVIATE

40

2016/7/11 9:21

6yxk40mn2rnzj

DX_T_XXVIATE

31

2016/7/11 9:20

b5ma9qfs62bdm

DX_T_XXVIATE

36

2016/7/11 9:19

b5ma9qfs62bdm

DX_T_XXVIATE

39

2016/7/11 9:19

b8m6wy846qgbk

DX_T_XXVIATE

98

2016/7/11 9:10

6ut0tkd7hvagc

DX_T_XXVIATE

41

2016/7/11 9:09

6ut0tkd7hvagc

DX_T_XXVIATE

41

2016/7/11 9:08

gu8cjd8c35kys

DX_T_XXVIATE

48

2016/7/11 9:06

gu8cjd8c35kys

DX_T_XXVIATE

47

2016/7/11 9:05

30za43khyxz6t

DX_T_XXVIATE

96

2016/7/11 9:00

b69frxfps0k4t

DX_T_XXVIATE

77

2016/7/11 8:59

6ut0tkd7hvagc

DX_T_XXVIATE

26

2016/7/11 8:48

98jagc6q2zr4x

DX_T_XXVIATE

23

2016/7/11 8:46

98jagc6q2zr4x

DX_T_XXVIATE

22

2016/7/11 8:46

6ar757tgxd1sf

DX_T_XXVIATE

28

2016/7/11 8:44

6ar757tgxd1sf

DX_T_XXVIATE

23

2016/7/11 8:44

6ut0tkd7hvagc

DX_T_XXVIATE

22

2016/7/11 8:43

a1r7xr03naax2

DX_T_XXVIATE

19

2016/7/11 8:42

a1r7xr03naax2

DX_T_XXVIATE

23

2016/7/11 8:42

b8m6wy846qgbk

DX_T_XXVIATE

25

2016/7/11 8:40

1wq3kmd62x4x8

DX_T_XXVIATE

23

2016/7/11 8:39

1wq3kmd62x4x8

DX_T_XXVIATE

21

2016/7/11 8:39

b8m6wy846qgbk

DX_T_XXVIATE

45

2016/7/11 8:30

b8m6wy846qgbk

DX_T_XXVIATE

41

2016/7/11 8:30

b8m6wy846qgbk

DX_T_XXVIATE

27

2016/7/11 8:27

41j3xyugbn6ck

DX_T_XXVIATE

29

2016/7/11 8:27

41j3xyugbn6ck

DX_T_XXVIATE

48

2016/7/11 8:26

1a4rwwt36w33d

DX_T_XXVIATE

53

2016/7/11 8:21

1a4rwwt36w33d

DX_T_XXVIATE

54

2016/7/11 8:20

b8m6wy846qgbk

DX_T_XXVIATE

42

2016/7/11 8:19

b8m6wy846qgbk

DX_T_XXVIATE

52

2016/7/11 8:18

b8m6wy846qgbk

DX_T_XXVIATE

54

2016/7/11 8:16

6yn7ccy6bbw4h

DX_T_XXVIATE

118

2016/7/11 8:14

6yn7ccy6bbw4h

DX_T_XXVIATE

81

2016/7/11 8:12

b8m6wy846qgbk

DX_T_XXVIATE

84

2016/7/11 8:10

25djwmppxq9qt

DX_T_XXVIATE

49

2016/7/11 8:09

上面数据显示,08:00—10:00统计时间内,所有全表扫描超过6秒的表,全部是DX_T_XXVIATE这一张表,涉及到的sql语句有60多条,执行次数最多的数sql_ID为b8m6wy846qgbk的语句。

5.6 全表扫描最严重sql语句故障前、后、故障解决后磁盘读取数量对比

5.6.1 7月11日以前系统运行正常的情况下sql_ID为b8m6wy846qgbk的语句执行统计信息

--执行统计信息(buffer get很大,但是disk reads为0,判定数据基本从buffer中读取):

--执行计划(对DT_T_OBVIATE全表扫描,预计时间为5分钟30秒):


5.6.2 7月11日故障当日sql_ID为b8m6wy846qgbk的语句执行统计信息

--执行统计信息(buffer get和disk reads都一样的巨大,基本判定每次数据全是从磁盘读取到BUFFER):

--执行计划(对DT_T_OBVIATE全表扫描,预计时间为5分钟30秒,从执行计划的PHV和plan均看出执行计划在系统故障时与正常时,是保持一致的):


5.6.3 故障解决后(取7月12日数据)sql_ID为b8m6wy846qgbk的语句执行统计信息

--执行统计信息(故障解决后,PVH值不变,Disk Reads又恢复到了故障前的“0”,说明每次执行数据又是从BUFFER中读取的):


5.7 等待事件变化识别数据读取方式变化比较

sql_ID

日期

direct path read
发生次数

b8m6wy846qgbk

2016/7/12

0

b8m6wy846qgbk

2016/7/11

4141

b8m6wy846qgbk

2016/7/10

663

b8m6wy846qgbk

2016/7/9

0

b8m6wy846qgbk

2016/7/8

0

b8m6wy846qgbk

2016/7/7

0

看来,系统实际上在2016年7月10日(周日),sql语句的数据读取方式就发生了少量的direct path read,系统实际上已经处于间歇式缓慢状态,到了2016年7月11日(周一),问题特别严重,约99%左右的执行是direct path read,导致I/O耗尽,系统瘫痪。


本文作者:黎俊杰(网名:踩点),从事”系统架构、操作系统、存储设备、数据库、中间件、应用程序“六个层面系统性的性能优化工作

欢迎加入 系统性能优化专业群,共同探讨性能优化技术。群号:258187244

猜你在找的Oracle相关文章