linux - LSI RAID controller errors during database import - how to troubleshoot?

We are running a database dump import on an Oracle system (RHEL 5.9, 2.6.18-348.6.1.el5). The import does not complete and eventually errors out with:
ORA-15080: synchronous I/O operation to a disk failed
WARNING: failed to write mirror side 1 of virtual extent 248 logical extent 0 of file 280 in group 1 on disk 1 allocation unit 986
Errors in file /u01/app/oracle/diag/rdbms/dbprod/DBPROD/trace/DBPROD_lgwr_24520.trc:
ORA-00345: redo log write error block 509314 count 2023
ORA-00312: online log 1 thread 1: '+DATA/dbprod/redo01.log'
ORA-15081: failed to submit an I/O operation to a disk
ORA-15081: failed to submit an I/O operation to a disk

There are corresponding errors in the kernel ring buffer and in /var/log/messages.

The drive array holding the import is a 10-disk SAS array in RAID 1+0, built from 300GB 10k disks. The RAID controller is an LSI MegaRAID SAS 9260-8i. MegaCLI reports no disk or adapter errors.

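> Is this a hardware problem?
> Is there any way to troubleshoot it? The RAID controller status is fine. The disks and logical drives report OK.
> Is this a Linux OS or tuning issue? I am going to try a different I/O scheduler. CFQ is the default.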

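Edit:

Other schedulers were tried, with the same result. There is a third-party (Vormetric) filesystem encryption module running in this setup. Removing it allows the import to complete. So now I'm wondering if this is a deficiency in the module or if it is triggering a bad condition in the LSI driver.

During the import we hit 14,000 write IOPS.

On the most recent attempt, the system came to a complete stop with the following on the console.

Last output before the freeze: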

Jun 12 18:54:42 db1-test kernel: megasas: build_ld_io error, sge_count = 51
Jun 12 18:54:42 db1-test kernel: megasas: Err returned from build_and_issue_cmd
Jun 12 18:54:42 db1-test kernel: megasas: build_ld_io error, sge_count = 51
Jun 12 18:54:42 db1-test kernel: megasas: Err returned from build_and_issue_cmd
Jun 12 18:54:42 db1-test kernel: sd 0:2:1:0: timing out command, waited 360s
Jun 12 18:54:42 db1-test kernel: sd 0:2:1:0: Unhandled error code
Jun 12 18:54:42 db1-test kernel: sd 0:2:1:0: SCSI error: return code = 0x…000
Jun 12 18:54:42 db1-test kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT,SUGGEST_OK
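
For readers unfamiliar with the last two log lines: the hostbyte/driverbyte names are the kernel's decoding of the 32-bit SCSI result word, which on kernels of this era packs the driver byte into bits 24-31, the host byte into bits 16-23, the message byte into bits 8-15 and the status byte into bits 0-7. A rough sketch of that split follows. The function name decode_scsi_result is made up for illustration and is not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: split a 32-bit SCSI result word into its four bytes.
# Layout assumed: driver byte (24-31), host byte (16-23),
# message byte (8-15), status byte (0-7).
decode_scsi_result() {
    r=$(( $1 ))
    printf 'driver=0x%02x host=0x%02x msg=0x%02x status=0x%02x\n' \
        $(( (r >> 24) & 0xff )) \
        $(( (r >> 16) & 0xff )) \
        $(( (r >> 8) & 0xff )) \
        $(( r & 0xff ))
}

# Usage: pass the full "return code" value from the SCSI error line,
# e.g. decode_scsi_result 0x<value from the log>
```

A driver byte of 0x06 corresponds to DRIVER_TIMEOUT and a host byte of 0x00 to DID_OK, which is what the Result line above reports: the command timed out in the driver rather than being rejected by the device.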

Solution

In the end Sergey was right: this is a driver problem. But let's check a few things first:

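First of all, you should use the deadline I/O scheduler instead of CFQ. As the name suggests, deadline puts an expiry time on each request so that all I/O operations complete in a timely manner.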
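
A minimal sketch of checking and switching the scheduler at runtime through sysfs. The helper names active_scheduler and set_deadline are made up for illustration, and "sdb" in the usage comment is a placeholder for whichever block device sits on the MegaRAID logical drive.

```shell
#!/bin/sh
# Print the active scheduler given the contents of a sysfs "scheduler" file.
# The kernel marks the active entry with square brackets,
# e.g. "noop anticipatory deadline [cfq]".
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Switch one block device to the deadline scheduler (requires root).
set_deadline() {
    dev="$1"
    echo deadline > "/sys/block/$dev/queue/scheduler"
}

# Usage (as root), with a placeholder device name:
#   active_scheduler "$(cat /sys/block/sdb/queue/scheduler)"
#   set_deadline sdb
```

A change made this way does not survive a reboot. To make it the default for all devices, add elevator=deadline to the kernel command line.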

Grab the event log from the MegaRAID card:

megacli -adpeventlog -getevents -f /tmp/megaraid-$(date +%F_%T) -aALL

Check the SMART data on the disks (you will need to build a recent smartmontools for this to work):

# megacli -pdlist -a0 |grep 'Device Id'
Device Id: 10
Device Id: 9

# smartctl -a /dev/sda -d megaraid,9
«…»
# smartctl -a /dev/sda -d megaraid,10
«…»
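
With ten disks in the array, the two steps above can be chained instead of typing each ID by hand. This is only a sketch: device_ids and smart_all are made-up helper names, and the parsing assumes the "Device Id: N" line format shown in the output above.

```shell
#!/bin/sh
# Extract the numeric IDs from `megacli -pdlist` output read on stdin.
# Only lines starting with "Device Id:" are considered.
device_ids() {
    awk -F': *' '/^Device Id:/ { print $2 }'
}

# Run smartctl against every physical disk behind adapter 0.
# /dev/sda is the block device used in the commands above.
smart_all() {
    megacli -pdlist -a0 | device_ids | while read -r id; do
        echo "=== megaraid,$id ==="
        smartctl -a /dev/sda -d "megaraid,$id"
    done
}

# Usage (as root):
#   smart_all > /tmp/smart-all.txt
```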

If everything looks fine, go ahead and try the latest driver from LSI.
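
Before swapping the driver it helps to record which version is loaded now, so the before and after can be compared. A small sketch, assuming the module is megaraid_sas (the driver that logs with the "megasas:" prefix seen above). driver_version is a made-up helper name.

```shell
#!/bin/sh
# Pull the value of the "version:" field out of modinfo output on stdin.
# Matching the first field exactly keeps "srcversion:" from being picked up.
driver_version() {
    awk '$1 == "version:" { print $2; exit }'
}

# Usage:
#   modinfo megaraid_sas | driver_version
```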

There is a third-party (Vormetric) filesystem encryption module running in this setup. Removing it allows the import to complete. So now I’m wondering if this is a deficiency in the module or if it is triggering a bad condition in the LSI driver.

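The Vormetric module may well be doing something incompatible, yes. I would start by talking to them about how their module screws up the system under high load.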
