ORA-15080: synchronous I/O operation to a disk Failed
WARNING: Failed to write mirror side 1 of virtual extent 248 logical extent 0 of file 280 in group 1 on disk 1 allocation unit 986
Errors in file /u01/app/oracle/diag/rdbms/dbprod/DBPROD/trace/DBPROD_lgwr_24520.trc:
ORA-00345: redo log write error block 509314 count 2023
ORA-00312: online log 1 thread 1: '+DATA/dbprod/redo01.log'
ORA-15081: Failed to submit an I/O operation to a disk
ORA-15081: Failed to submit an I/O operation to a disk
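Corresponding errors appear in the kernel ring buffer and in /var/log/messages.
The drive array containing the import is a 10-disk SAS array in RAID 1+0 using 300GB 10k disks. The RAID controller is an LSI MegaRAID SAS 9260-8i. MegaCLI reports no disk or adapter errors.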
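>Is this a hardware problem?
>Is there any way to troubleshoot this? The RAID controller status is fine. The disks and logical drives report as healthy.
>Is this a Linux OS or tuning problem? I will try a different I/O scheduler. CFQ is the default.
Edit:
Other schedulers have been tried with the same result. There is a third-party (Vormetric) filesystem encryption module running in this setup. Removing it allows the import to complete. So now I'm wondering if this is a deficiency in the module or if it is triggering a bad condition in the LSI driver.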
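During the import, we hit 14,000 write IOPS.
In the most recent attempt, the system came to a complete stop with the following on the console.
Last output before the freeze: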
Jun 12 18:54:42 db1-test kernel: megasas: build_ld_io error, sge_count = 51
Jun 12 18:54:42 db1-test kernel: megasas: Err returned from build_and_issue_cmd
Jun 12 18:54:42 db1-test kernel: megasas: build_ld_io error, sge_count = 51
Jun 12 18:54:42 db1-test kernel: megasas: Err returned from build_and_issue_cmd
Jun 12 18:54:42 db1-test kernel: sd 0:2:1:0: timing out command, waited 360s
Jun 12 18:54:42 db1-test kernel: sd 0:2:1:0: Unhandled error code
Jun 12 18:54:42 db1-test kernel: sd 0:2:1:0: SCSI error: return code = 0x…000…
Jun 12 18:54:42 db1-test kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT,SUGGEST_OK
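To get a rough idea of how often the driver rejects requests, messages like the ones above can be tallied per sge_count value. This is only a sketch: the function name tally_sge_errors is made up for illustration, and the pattern assumes the message format shown in this post (with or without a space after the comma).

```shell
#!/bin/sh
# Hypothetical helper: reads syslog-style lines on stdin and prints
# "<occurrences> <sge_count>" for each distinct sge_count value found
# in megasas build_ld_io error messages.
tally_sge_errors() {
    sed -n 's/.*megasas: build_ld_io error, *sge_count = \([0-9][0-9]*\).*/\1/p' |
        sort -n | uniq -c | awk '{ print $1, $2 }'
}

# Log path taken from the post; skipped quietly if it is not readable.
log=/var/log/messages
if [ -r "$log" ]; then
    tally_sge_errors < "$log"
fi
```

A count that climbs only while the import is running would suggest the errors are tied to that workload rather than to a background condition.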
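Solution
First of all, you'll want to use the deadline I/O scheduler rather than CFQ. As the name implies, deadline tries to ensure that every I/O request completes in a timely manner.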
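A minimal sketch of checking and switching the scheduler through sysfs follows. The device name sda is an assumption (substitute the block device that backs the array), and active_scheduler_from is a made-up helper name. On newer multi-queue kernels the equivalent scheduler is usually listed as mq-deadline, so check what the file offers first.

```shell
#!/bin/sh
# Hypothetical helper: given the contents of a queue/scheduler file,
# e.g. "noop anticipatory deadline [cfq]", print the active scheduler,
# which the kernel marks with square brackets.
active_scheduler_from() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

dev=sda   # assumption: replace with the device backing the RAID volume
sched_file="/sys/block/$dev/queue/scheduler"

if [ -r "$sched_file" ]; then
    echo "active scheduler on $dev: $(active_scheduler_from "$(cat "$sched_file")")"
    # To switch at runtime (as root; does not persist across reboots):
    #   echo deadline > "$sched_file"
fi
```

The runtime change is lost on reboot. Making it permanent is typically done with a kernel boot parameter or a udev rule, depending on the distribution.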
Grab the event log from the MegaRAID card:
megacli -adpeventlog -getevents -f /tmp/megaraid-$(date +%F_%T) -aALL
Check the SMART data on the disks (you'll need to build a new smartmontools to make this work):
# megacli -pdlist -a0 | grep 'Device Id'
Device Id: 10
Device Id: 9
# smartctl -a /dev/sda -d megaraid,9
«…»
# smartctl -a /dev/sda -d megaraid,10
«…»
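With ten disks in the array, running smartctl by hand for each one gets tedious. The two commands above could be combined into a loop along these lines. Treat it as a sketch: device_ids is a made-up helper name, and the adapter number (-a0) and /dev/sda are simply carried over from the example above.

```shell
#!/bin/sh
# Hypothetical helper: reads `megacli -pdlist` output on stdin and prints
# one numeric device ID per line, taken from the "Device Id: N" lines.
device_ids() {
    awk -F': *' '/Device Id/ { print $2 + 0 }'
}

# Only attempt the real queries when both tools are installed.
if command -v megacli >/dev/null 2>&1 && command -v smartctl >/dev/null 2>&1; then
    megacli -pdlist -a0 | device_ids | while read -r id; do
        echo "=== SMART data for MegaRAID device $id ==="
        smartctl -a /dev/sda -d megaraid,"$id"
    done
fi
```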
If everything checks out, go ahead and try the latest driver from LSI.
There is a third-party (Vormetric) filesystem encryption module running in this setup. Removing it allows the import to complete. So now I’m wondering if this is a deficiency in the module or if it is triggering a bad condition in the LSI driver.
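The Vormetric module might be doing something incompatible, yes. I'd start by talking to them about how their module screws up the system under high load.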