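1. I installed a production Oracle database for colleagues in Anhui. Recently the database has repeatedly become unreachable between 2:00 and 10:00. The Tomcat application log shows the following (the two Chinese messages read "version-update query failed" and "carousel-image query failed"):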
08:58:09 ERROR c.d.web.controller.DBAppController - 查询更新版本请求异常org.springframework.dao.DataAccessResourceFailureException:
### Error querying database. Cause: java.sql.SQLException: Io exception: Connection timed out
### The error may exist in file [/usr/local/tomcat/xx/WEB-INF/classes/mapper/DBAppMapper.xml]
### The error may involve com.dabay.web.dao.DBAppDao.selectProperties-Inline
### The error occurred while setting parameters
### SQL: SELECT KEY,VALUE,DESCRIPTION FROM APP_PROPERTIES WHERE KEY=? AND DATA_STATUS!='9'
### Cause: java.sql.SQLException: Io exception: Connection timed out
; SQL []; Io exception: Connection timed out; nested exception is java.sql.SQLException: Io exception: Connection timed out
08:58:09 ERROR c.d.web.controller.DBAppController - DGW_0922084243406:查询轮播图请求异常org.springframework.dao.DataAccessResourceFailureException:
### Error querying database. Cause: java.sql.SQLException: Io exception: Connection timed out
### The error may exist in file [/usr/local/tomcat/xx/WEB-INF/classes/mapper/DBAppMapper.xml]
### The error may involve defaultParameterMap
### The error occurred while setting parameters
### SQL: SELECT TITLE,URL,REMARKS,PNGURL FROM INFO_BANNER WHERE DATA_STATUS!='9' AND ROWNUM<6 ORDER BY ORDERDESC asc,CREATE_TIME desc
### Cause: java.sql.SQLException: Io exception: Connection timed out
; SQL []; Io exception: Connection timed out; nested exception is java.sql.SQLException: Io exception: Connection timed out
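To confirm that the failures really cluster in the 2:00 to 10:00 window, the application log can be bucketed by hour. The following is only a rough sketch: it assumes every error entry starts with an `HH:MM:SS ERROR` prefix as in the excerpt above, and the function name `timeouts_per_hour` is made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log excerpt: "HH:MM:SS ERROR logger - message".
ERROR_LINE = re.compile(
    r"^(\d{2}):\d{2}:\d{2} ERROR .*DataAccessResourceFailureException"
)


def timeouts_per_hour(lines):
    """Count DataAccessResourceFailureException error entries per hour of day."""
    counts = Counter()
    for line in lines:
        match = ERROR_LINE.match(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


sample = [
    "08:58:09 ERROR c.d.web.controller.DBAppController - 查询更新版本请求异常"
    "org.springframework.dao.DataAccessResourceFailureException:",
    "### The error occurred while setting parameters",
    "08:58:09 ERROR c.d.web.controller.DBAppController - DGW_0922084243406:"
    "查询轮播图请求异常org.springframework.dao.DataAccessResourceFailureException:",
]

for hour, count in sorted(timeouts_per_hour(sample).items()):
    print(f"{hour:02d}:00  {count} error(s)")
```

In practice the list would come from reading the real Tomcat log file line by line; hours with no errors simply do not appear in the result.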
2. Next I decided to check whether the Oracle database itself was healthy. A Baidu search turned up the following three checks.
First: check whether the database listener is running.

    lsnrctl status

Second: check the instance state and confirm that it is OPEN.

    select instance_name, status from v$instance;

Third: check the alert log for error messages.

    SQL> show parameter background_dump

    NAME                                 TYPE
    ------------------------------------ -----------
    VALUE
    ------------------------------
    background_dump_dest                 string
    /u01/app/oracle/diag/rdbms/just_test/test/trace

So there is an alert log. Its contents were as follows:

    db_recovery_file_dest_size of 3882 MB is 45.88% used. This is a
    user-specified limit on the amount of space that will be used by this
    database for recovery-related files, and does not reflect the amount of
    space available in the underlying filesystem or ASM diskgroup.
    Fri Sep 22 02:01:05 2017
    Starting background process CJQ0
    Fri Sep 22 02:01:05 2017
    CJQ0 started with pid=22, OS id=6797
    Fri Sep 22 02:06:05 2017
    Starting background process SMCO
    Fri Sep 22 02:06:05 2017
    SMCO started with pid=32, OS id=7393
    Fri Sep 22 04:21:10 2017
    Thread 1 cannot allocate new log, sequence 221
    Private strand flush not complete
      Current log# 1 seq# 220 mem# 0: /u01/app/oracle/oradata/hsrs_pro/redo01.log
    Thread 1 advanced to log sequence 221 (LGWR switch)
      Current log# 2 seq# 221 mem# 0: /u01/app/oracle/oradata/hsrs_pro/redo02.log
    Fri Sep 22 09:00:35 2017

The first thing I noticed was "Thread 1 cannot allocate new log, sequence 221", so I searched Baidu again and found the following explanation (quoted from http://blog.csdn.net/zonelan/article/details/7613519):

This is actually a fairly common message. When a redo log group fills up, Oracle switches to the next group, and the switch triggers a checkpoint: DBWR writes the dirty blocks in memory to the data files, and the log group is not released until that write has finished. If archive log mode is enabled, ARCH also has to archive the group. If redo is generated too quickly, LGWR can fill the remaining log groups and need to write to this group again while the checkpoint or the archiving is still incomplete. At that point there is a conflict and the database hangs, and messages like the one above keep being written to alert.log.

That led to the following steps. First, check the status of each log group:

    SQL> select group#, sequence#, bytes, members, status from v$log;

        GROUP#  SEQUENCE#      BYTES    MEMBERS STATUS
    ---------- ---------- ---------- ---------- ----------------
             1        220   52428800          1 INACTIVE
             2        221   52428800          1 CURRENT
             3        219   52428800          1 INACTIVE

INACTIVE means the group is idle; CURRENT is the group currently being written. Then add two larger log groups:

    SQL> alter database add logfile group 4 ('/u01/app/oracle/oradata/xx/redo04.log') size 500M;

    Database altered.

    SQL> alter database add logfile group 5 ('/u01/app/oracle/oradata/xx/redo05.log') size 500M;

    Database altered.
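The alert-log check above can be scripted so that every "cannot allocate new log" entry is listed with its timestamp, which makes it easier to see whether the waits line up with the application timeouts. This is a minimal sketch, not an Oracle tool: it assumes the alert log puts a timestamp such as `Fri Sep 22 04:21:10 2017` on its own line before each group of messages, as in the excerpt, and the name `find_log_waits` is hypothetical.

```python
from datetime import datetime

# Assumed timestamp layout, copied from the alert-log excerpt.
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def find_log_waits(lines):
    """Return (timestamp, message) for each 'cannot allocate new log' entry."""
    hits = []
    current_stamp = None
    for raw in lines:
        line = raw.strip()
        try:
            # A line that parses as a timestamp dates the messages after it.
            current_stamp = datetime.strptime(line, STAMP_FORMAT)
            continue
        except ValueError:
            pass  # not a timestamp line, so treat it as message text
        if "cannot allocate new log" in line:
            hits.append((current_stamp, line))
    return hits


sample = """Fri Sep 22 02:06:05 2017
SMCO started with pid=32, OS id=7393
Fri Sep 22 04:21:10 2017
Thread 1 cannot allocate new log, sequence 221
Private strand flush not complete
Thread 1 advanced to log sequence 221 (LGWR switch)
""".splitlines()

for stamp, message in find_log_waits(sample):
    print(stamp.isoformat(), message)
```

Against the real file, the `lines` argument would be the open alert log under the `background_dump_dest` directory; the file name itself is not given in this post, so it is left out here.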
Switch log groups and check the status again; this time the output looked different:

    SQL> alter system switch logfile;
    SQL> select group#, sequence#, bytes, members, status from v$log;

        GROUP#  SEQUENCE#      BYTES    MEMBERS STATUS
    ---------- ---------- ---------- ---------- ----------------
             1        220   52428800          1 INACTIVE
             2        221   52428800          1 ACTIVE
             3        219   52428800          1 INACTIVE
             4        222  524288000          1 ACTIVE
             5        223  524288000          1 CURRENT

After these steps I suddenly noticed the word "recovery" in the alert log, and the Tomcat application log also contained the word "recovery", so I searched Baidu once more and ran the following commands (without really understanding what they meant):

    SQL> select * from v$flash_recovery_area_usage;
    SQL> select * from v$recovery_file_dest;

The second one shows the actual size of the recovery area:

    NAME
    --------------------------------------------------------------------------------
    SPACE_LIMIT SPACE_USED SPACE_RECLAIMABLE NUMBER_OF_FILES
    ----------- ---------- ----------------- ---------------
    /u01/app/oracle/recovery_area
     4070572032 3926630400        2059067392              41

    SQL> select * from v$flash_recovery_area_usage
      2  ;

    FILE_TYPE            PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
    -------------------- ------------------ ------------------------- ---------------
    CONTROL FILE                          0                         0               0
    REDO LOG                              0                         0               0
    ARCHIVED LOG                          0                         0               0
    BACKUP PIECE                      53.96                     50.58              37
    IMAGE COPY                         42.5                         0               4
    FLASHBACK LOG                         0                         0               0
    FOREIGN ARCHIVED LOG                  0                         0               0

    7 rows selected.

In the end, this turned out to be the command I was looking for. It shows the current size limit of the recovery area:

    SQL> show parameter db_recovery_file_dest_size;

    NAME                                 TYPE
    ------------------------------------ -----------
    VALUE
    ------------------------------
    db_recovery_file_dest_size           big integer
    3882M

Make it a bit bigger?

    SQL> alter system set db_recovery_file_dest_size=5882M scope=spfile;

    System altered.
Checking the parameter again, the change seemed to have had no effect; the value was still the same:

    SQL> show parameter db_recovery_file_dest_size;

    NAME                                 TYPE
    ------------------------------------ -----------
    VALUE
    ------------------------------
    db_recovery_file_dest_size           big integer
    3882M

Fine, back to Baidu :) The following command seemed to work:

    SQL> alter system set db_recovery_file_dest_size=10G;

    System altered.

    SQL> show parameter db_recovery_file_dest_size;

    NAME                                 TYPE
    ------------------------------------ -----------
    VALUE
    ------------------------------
    db_recovery_file_dest_size           big integer
    10G
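The recovery-area figures above are easier to interpret once converted to percentages. The arithmetic below is only a worked check that uses the three byte counts reported by `v$recovery_file_dest`; the variable names are mine. It suggests that the "45.88% used" line in the alert log is the usage after subtracting reclaimable space, while the raw usage was already above 96% of the old 3882 MB limit.

```python
MB = 1024 * 1024

# Byte counts as reported by v$recovery_file_dest in this post.
space_limit = 4070572032
space_used = 3926630400
space_reclaimable = 2059067392


def pct(part, whole):
    """Percentage of whole, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


limit_mb = space_limit // MB                      # limit expressed in MB
used_pct = pct(space_used, space_limit)           # raw usage
reclaimable_pct = pct(space_reclaimable, space_limit)
net_used_pct = pct(space_used - space_reclaimable, space_limit)

print(limit_mb, used_pct, reclaimable_pct, net_used_pct)
# prints: 3882 96.46 50.58 45.88
```

The 96.46% raw figure equals the sum of the BACKUP PIECE (53.96) and IMAGE COPY (42.5) rows in `v$flash_recovery_area_usage`, and 50.58% matches the reclaimable share reported for BACKUP PIECE.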
I'll keep an eye on it for now. The application log no longer seems to show timeout errors around 10 o'clock. The end.
One addendum: the difference between the following two options.
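scope=both versus scope=spfile

The Oracle spfile (server parameter file) is the dynamic parameter file that holds Oracle's parameter settings. "Dynamic" means that you can change database parameters without shutting the database down and have the change recorded in the spfile. When you change a parameter there are three scope options; the scope is the range the change applies to.

    scope=spfile   Changes only the value recorded in the spfile, not the value in memory. The change does not take effect immediately; it takes effect at the next database startup. Some parameters can only be changed this way.
    scope=memory   Changes only the value in memory, not the spfile, so the change is lost at the next startup.
    scope=both     Changes both memory and the spfile. Omitting scope is equivalent to scope=both when the instance was started from an spfile.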
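The three scopes can be pictured with a toy model that keeps one dictionary for the spfile and one for the running instance. This is purely illustrative and is not an Oracle API: the class `ParameterStore` and its methods are invented here, and real Oracle additionally rejects scope=memory or scope=both for static parameters, which the sketch ignores.

```python
class ParameterStore:
    """Toy model of ALTER SYSTEM SET ... SCOPE=...; not an Oracle API."""

    def __init__(self, initial):
        self.spfile = dict(initial)  # persisted values, read at startup
        self.memory = dict(initial)  # values the running instance uses

    def alter_system_set(self, name, value, scope="both"):
        if scope not in ("memory", "spfile", "both"):
            raise ValueError(f"unknown scope: {scope}")
        if scope in ("memory", "both"):
            self.memory[name] = value
        if scope in ("spfile", "both"):
            self.spfile[name] = value

    def show_parameter(self, name):
        # Like SHOW PARAMETER, this reports the in-memory value.
        return self.memory[name]

    def restart(self):
        # A restart reloads memory from the spfile.
        self.memory = dict(self.spfile)


db = ParameterStore({"db_recovery_file_dest_size": "3882M"})

db.alter_system_set("db_recovery_file_dest_size", "5882M", scope="spfile")
after_spfile_only = db.show_parameter("db_recovery_file_dest_size")

db.alter_system_set("db_recovery_file_dest_size", "10G")  # default: both
after_both = db.show_parameter("db_recovery_file_dest_size")

print(after_spfile_only, after_both)
```

This mirrors what happened in the session above: the scope=spfile change left the displayed value at 3882M because only the persisted copy changed, while the later command without a scope updated the running instance as well.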