1、故障现象
@H_403_2@(1)一个业务系统输入用户名与密码后无法进入首页,表现为一直在运行等待,运行缓慢 @H_403_2@(2)整个系统无法正常使用,接近停运状态2、 故障解决方法
@H_403_2@调整数据库参数alter system setevent='10949 trace name context forever,level 1'来关闭“direct path read”(直接路径读)特性,使sql语句可以从缓存中查询数据,达到降低I/O读取量,使全表扫描的数据从缓存中读取,加快sql语句运行速度的目的。3、 故障原因总结
(1)由于部分sql语句设计或编写效率低下,以及表缺少适应的索引,导致sql语句需要全表扫描,在表较小时,ORACLE数据库将数据读取到缓存后,后续虽然是全表扫描,但均是从缓存中读取,所以问题未体现出来
(2)在表的大小不断增大后,根据ORACLE 11g数据库的算法,在表达到db_cache_size(GB)的2%(默认值)以后,认为采用直接路径读(跳过缓存,直接从磁盘文件中全扫描读取)
(3)DX_T_XXVIATE表大小为1GB,在大量反复以direct pathread磁盘重复读取的情况下,消耗大量的I/O资源,将服务器I/O几乎耗尽
(4)在主机I/O耗尽的情况下,系统的读、写,均几乎处于瘫痪状态
@H_403_2@(5)在关闭ORACLE 11G数据库的direct path read新特性功能后,读取方式恢复到从缓存中读取,磁盘读降到“0”,系统恢复正常4、 改进建议
@H_403_2@(1)优化访问DX_T_XXVIATE 相关的sql语句与设计合适的索引,避免大表全表扫描。5、 故障原因分析
5.1 7月11日故障时段数据库服务器I/O等待严重
5.2 7月11日故障时段磁盘响应非常缓慢
5.3 对比故障当日(7月11日)与上周的I/O磁盘读取量,比上周大十倍
@H_403_2@故障前、中、后磁盘读取量对比图:
5.4 高度消耗I/O的sql语句。
5.5 全表扫描单次超过6秒的表与其sql语句统计。
@H_403_2@统计汇总时间:08:00—10:00@H_403_2@sql_ID | @H_403_2@target | @H_403_2@出现次数 |
@H_403_2@b8m6wy846qgbk | @H_403_2@DX_T_XXVIATE | 22 |
@H_403_2@6ut0tkd7hvagc | @H_403_2@DX_T_XXVIATE | 6 |
@H_403_2@1wq3kmd62x4x8 | @H_403_2@DX_T_XXVIATE | 2 |
@H_403_2@b5ma9qfs62bdm | @H_403_2@DX_T_XXVIATE | 2 |
@H_403_2@f5hujtmsh3pd1 | @H_403_2@DX_T_XXVIATE | 2 |
@H_403_2@65wk2453uk930 | @H_403_2@DX_T_XXVIATE | 2 |
@H_403_2@6yxk40mn2rnzj | @H_403_2@DX_T_XXVIATE | 2 |
@H_403_2@8h1bzhfm6purf | @H_403_2@DX_T_XXVIATE | 2 |
@H_403_2@a1r7xr03naax2 | @H_403_2@DX_T_XXVIATE | 2 |
@H_403_2@6yn7ccy6bbw4h | @H_403_2@DX_T_XXVIATE | 2 |
@H_403_2@6ar757tgxd1sf | @H_403_2@DX_T_XXVIATE | 2 |
@H_403_2@gu8cjd8c35kys | @H_403_2@DX_T_XXVIATE | 2 |
@H_403_2@41j3xyugbn6ck | @H_403_2@DX_T_XXVIATE | 2 |
@H_403_2@98jagc6q2zr4x | @H_403_2@DX_T_XXVIATE | 2 |
@H_403_2@1a4rwwt36w33d | @H_403_2@DX_T_XXVIATE | 2 |
@H_403_2@b4bq40ybpt7a8 | @H_403_2@DX_T_XXVIATE | 1 |
@H_403_2@30za43khyxz6t | @H_403_2@DX_T_XXVIATE | 1 |
@H_403_2@66j38hs9a0yzh | @H_403_2@DX_T_XXVIATE | 1 |
@H_403_2@25djwmppxq9qt | @H_403_2@DX_T_XXVIATE | 1 |
@H_403_2@84pdvmjyb9103 | @H_403_2@DX_T_XXVIATE | 1 |
@H_403_2@b69frxfps0k4t | @H_403_2@DX_T_XXVIATE | 1 |
@H_403_2@sql_ID | @H_403_2@target | @H_403_2@elapsed_seconds | @H_403_2@start_time |
@H_403_2@f5hujtmsh3pd1 | @H_403_2@DX_T_XXVIATE | 24 |
2016/7/11 10:02 |
@H_403_2@f5hujtmsh3pd1 | @H_403_2@DX_T_XXVIATE | 26 |
2016/7/11 10:02 |
@H_403_2@66j38hs9a0yzh | @H_403_2@DX_T_XXVIATE | 28 |
2016/7/11 9:51 |
@H_403_2@b8m6wy846qgbk | @H_403_2@DX_T_XXVIATE | 19 |
2016/7/11 9:48 |
@H_403_2@b4bq40ybpt7a8 | @H_403_2@DX_T_XXVIATE | 30 |
2016/7/11 9:41 |
@H_403_2@6ut0tkd7hvagc | @H_403_2@DX_T_XXVIATE | 23 |
2016/7/11 9:38 |
@H_403_2@b8m6wy846qgbk | @H_403_2@DX_T_XXVIATE | 27 |
2016/7/11 9:37 |
@H_403_2@b8m6wy846qgbk | @H_403_2@DX_T_XXVIATE | 25 |
2016/7/11 9:35 |
@H_403_2@b8m6wy846qgbk | @H_403_2@DX_T_XXVIATE | 28 |
2016/7/11 9:35 |
@H_403_2@b8m6wy846qgbk | @H_403_2@DX_T_XXVIATE | 37 |
2016/7/11 9:34 |
@H_403_2@65wk2453uk930 | @H_403_2@DX_T_XXVIATE | 38 |
2016/7/11 9:34 |
@H_403_2@8h1bzhfm6purf | @H_403_2@DX_T_XXVIATE | 31 |
2016/7/11 9:33 |
@H_403_2@65wk2453uk930 | @H_403_2@DX_T_XXVIATE | 34 |
2016/7/11 9:33 |
@H_403_2@8h1bzhfm6purf | @H_403_2@DX_T_XXVIATE | 30 |
2016/7/11 9:33 |
@H_403_2@b8m6wy846qgbk | @H_403_2@DX_T_XXVIATE | 32 |
2016/7/11 9:32 |
@H_403_2@b8m6wy846qgbk | @H_403_2@DX_T_XXVIATE | 33 |
2016/7/11 9:32 |
@H_403_2@b8m6wy846qgbk | @H_403_2@DX_T_XXVIATE | 30 |
2016/7/11 9:32 |
@H_403_2@b8m6wy846qgbk | @H_403_2@DX_T_XXVIATE | 31 |
2016/7/11 9:32 |
@H_403_2@b8m6wy846qgbk | @H_403_2@DX_T_XXVIATE | 31 |
2016/7/11 9:31 |
@H_403_2@b8m6wy846qgbk | @H_403_2@DX_T_XXVIATE | 45 |
2016/7/11 9:30 |
@H_403_2@84pdvmjyb9103 | @H_403_2@DX_T_XXVIATE | 39 |
2016/7/11 9:29 |
@H_403_2@b8m6wy846qgbk | @H_403_2@DX_T_XXVIATE | 39 |
2016/7/11 9:25 |
@H_403_2@6ut0tkd7hvagc | @H_403_2@DX_T_XXVIATE | 33 |
2016/7/11 9:24 |
@H_403_2@b8m6wy846qgbk | @H_403_2@DX_T_XXVIATE | 36 |
2016/7/11 9:21 |
@H_403_2@6yxk40mn2rnzj | @H_403_2@DX_T_XXVIATE | 40 |
2016/7/11 9:21 |
@H_403_2@6yxk40mn2rnzj | @H_403_2@DX_T_XXVIATE | 31 |
2016/7/11 9:20 |
@H_403_2@b5ma9qfs62bdm | @H_403_2@DX_T_XXVIATE | 36 |
2016/7/11 9:19 |
@H_403_2@b5ma9qfs62bdm | @H_403_2@DX_T_XXVIATE | 39 |
2016/7/11 9:19 |
@H_403_2@b8m6wy846qgbk | @H_403_2@DX_T_XXVIATE | 98 |
2016/7/11 9:10 |
@H_403_2@6ut0tkd7hvagc | @H_403_2@DX_T_XXVIATE | 41 |
2016/7/11 9:09 |
@H_403_2@6ut0tkd7hvagc | @H_403_2@DX_T_XXVIATE | 41 |
2016/7/11 9:08 |
@H_403_2@gu8cjd8c35kys | @H_403_2@DX_T_XXVIATE | 48 |
2016/7/11 9:06 |
@H_403_2@gu8cjd8c35kys | @H_403_2@DX_T_XXVIATE | 47 |
2016/7/11 9:05 |
@H_403_2@30za43khyxz6t | @H_403_2@DX_T_XXVIATE | 96 |
2016/7/11 9:00 |
@H_403_2@b69frxfps0k4t | @H_403_2@DX_T_XXVIATE | 77 |
2016/7/11 8:59 |
@H_403_2@6ut0tkd7hvagc | @H_403_2@DX_T_XXVIATE | 26 |
2016/7/11 8:48 |
@H_403_2@98jagc6q2zr4x | @H_403_2@DX_T_XXVIATE | 23 |
2016/7/11 8:46 |
@H_403_2@98jagc6q2zr4x | @H_403_2@DX_T_XXVIATE | 22 |
2016/7/11 8:46 |
@H_403_2@6ar757tgxd1sf | @H_403_2@DX_T_XXVIATE | 28 |
2016/7/11 8:44 |
@H_403_2@6ar757tgxd1sf | @H_403_2@DX_T_XXVIATE | 23 |
2016/7/11 8:44 |
@H_403_2@6ut0tkd7hvagc | @H_403_2@DX_T_XXVIATE | 22 |
2016/7/11 8:43 |
@H_403_2@a1r7xr03naax2 | @H_403_2@DX_T_XXVIATE | 19 |
2016/7/11 8:42 |
@H_403_2@a1r7xr03naax2 | @H_403_2@DX_T_XXVIATE | 23 |
2016/7/11 8:42 |
@H_403_2@b8m6wy846qgbk | @H_403_2@DX_T_XXVIATE | 25 |
2016/7/11 8:40 |
@H_403_2@1wq3kmd62x4x8 | @H_403_2@DX_T_XXVIATE | 23 |
2016/7/11 8:39 |
@H_403_2@1wq3kmd62x4x8 | @H_403_2@DX_T_XXVIATE | 21 |
2016/7/11 8:39 |
@H_403_2@b8m6wy846qgbk | @H_403_2@DX_T_XXVIATE | 45 |
2016/7/11 8:30 |
@H_403_2@b8m6wy846qgbk | @H_403_2@DX_T_XXVIATE | 41 |
2016/7/11 8:30 |
@H_403_2@b8m6wy846qgbk | @H_403_2@DX_T_XXVIATE | 27 |
2016/7/11 8:27 |
@H_403_2@41j3xyugbn6ck | @H_403_2@DX_T_XXVIATE | 29 |
2016/7/11 8:27 |
@H_403_2@41j3xyugbn6ck | @H_403_2@DX_T_XXVIATE | 48 |
2016/7/11 8:26 |
@H_403_2@1a4rwwt36w33d | @H_403_2@DX_T_XXVIATE | 53 |
2016/7/11 8:21 |
@H_403_2@1a4rwwt36w33d | @H_403_2@DX_T_XXVIATE | 54 |
2016/7/11 8:20 |
@H_403_2@b8m6wy846qgbk | @H_403_2@DX_T_XXVIATE | 42 |
2016/7/11 8:19 |
@H_403_2@b8m6wy846qgbk | @H_403_2@DX_T_XXVIATE | 52 |
2016/7/11 8:18 |
@H_403_2@b8m6wy846qgbk | @H_403_2@DX_T_XXVIATE | 54 |
2016/7/11 8:16 |
@H_403_2@6yn7ccy6bbw4h | @H_403_2@DX_T_XXVIATE | 118 |
2016/7/11 8:14 |
@H_403_2@6yn7ccy6bbw4h | @H_403_2@DX_T_XXVIATE | 81 |
2016/7/11 8:12 |
@H_403_2@b8m6wy846qgbk | @H_403_2@DX_T_XXVIATE | 84 |
2016/7/11 8:10 |
@H_403_2@25djwmppxq9qt | @H_403_2@DX_T_XXVIATE | 49 |
2016/7/11 8:09 |
5.6 全表扫描最严重sql语句故障前、后、故障解决后磁盘读取数量对比
5.6.1 7月11日以前系统运行正常的情况下sql_ID为b8m6wy846qgbk的语句执行统计信息
@H_403_2@--执行统计信息(buffer get很大,但是disk reads为0,判定数据基本从buffer中读取): @H_403_2@--执行计划(对DT_T_OBVIATE全表扫描,预计时间为5分钟30秒): @H_403_2@ |
5.6.2 7月11日故障当日sql_ID为b8m6wy846qgbk的语句执行统计信息
@H_403_2@--执行统计信息(buffer get和disk reads都一样的巨大,基本判定每次数据全是从磁盘读取到BUFFER): @H_403_2@--执行计划(对DT_T_OBVIATE全表扫描,预计时间为5分钟30秒,从执行计划的PHV和plan均看出执行计划在系统故障时与正常时,是保持一致的): @H_403_2@ |
5.6.3 故障解决后(取7月12日数据)sql_ID为b8m6wy846qgbk的语句执行统计信息
@H_403_2@--执行统计信息(故障解决后,PVH值不变,Disk Reads又恢复到了故障前的“0”,说明每次执行数据又是从BUFFER中读取的): @H_403_2@ |
5.7 等待事件变化识别数据读取方式变化比较
sql_ID |
日期 |
|
@H_403_2@b8m6wy846qgbk | 2016/7/12 |
0 |
@H_403_2@b8m6wy846qgbk | 2016/7/11 |
4141 |
@H_403_2@b8m6wy846qgbk | 2016/7/10 |
663 |
@H_403_2@b8m6wy846qgbk | 2016/7/9 |
0 |
@H_403_2@b8m6wy846qgbk | 2016/7/8 |
0 |
@H_403_2@b8m6wy846qgbk | 2016/7/7 |
0 |
@H_403_2@本文作者:黎俊杰(网名:踩点),从事”系统架构、操作系统、存储设备、数据库、中间件、应用程序“六个层面系统性的性能优化工作 @H_403_2@欢迎加入 系统性能优化专业群,共同探讨性能优化技术。群号:258187244