linux – How to interpret this smartctl (smartmon) data

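We have a Linux server that has been heavily used for 3 years. We run a number of virtualized servers on it, some of which are badly behaved, and for long periods the server's I/O capacity has been exceeded, causing nasty iowait. It has four 500GB Barracuda SATA drives connected to a 3com RAID controller. One drive holds the OS and the other three are set up as RAID-5.

We are now debating the condition of the drives and whether they are actively failing.

Here is part of the output for one of the four disks. They all have relatively similar statistics: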

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       169074425
  3 Spin_Up_Time            0x0003   095   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       26
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   077   060   030    Pre-fail  Always       -       200009354607
  9 Power_On_Hours          0x0032   069   069   000    Old_age   Always       -       27856
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       1
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       26
184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       1
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   071   060   045    Old_age   Always       -       29 (Lifetime Min/Max 26/37)
194 Temperature_Celsius     0x0022   029   040   000    Old_age   Always       -       29 (0 21 0 0)
195 Hardware_ECC_Recovered  0x001a   046   033   000    Old_age   Always       -       169074425
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

My interpretation is that we have no bad sectors and no other indication that any of the drives is actively failing.

However, the high Raw_Read_Error_Rate and Seek_Error_Rate values are being pointed to as a sign that the drives are dying.
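The "no bad sectors" reading can be checked mechanically. The following is a minimal Python sketch, not part of the original post: it parses a few rows copied from the output above and lists any sector-health counters whose raw value is non-zero. The function names and the choice of attribute IDs (5, 187, 197, 198) are assumptions made for illustration.

```python
# Rows copied from the smartctl output above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

# Attributes treated here as "bad sector" counters (an illustrative choice).
BAD_SECTOR_IDS = (5, 187, 197, 198)


def parse_attributes(text):
    """Parse smartctl attribute rows into a dict keyed by attribute ID."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and anything that is not an attribute row.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attrs[int(fields[0])] = {
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "when_failed": fields[8],
            # Only the leading number of RAW_VALUE; trailing notes are ignored.
            "raw": int(fields[9]),
        }
    return attrs


def nonzero_bad_sector_counters(attrs):
    """Return the names of bad-sector counters whose raw value is above zero."""
    return [
        attrs[i]["name"]
        for i in BAD_SECTOR_IDS
        if i in attrs and attrs[i]["raw"] > 0
    ]


if __name__ == "__main__":
    print(nonzero_bad_sector_counters(parse_attributes(SAMPLE)))
```

For the rows shown, the list is empty, which matches the reading that no sectors have been reallocated, are pending, or were reported uncorrectable.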

Solution

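In my experience, Seagate reports odd numbers for these two SMART attributes. When diagnosing a Seagate drive, I tend to ignore them and look more closely at other fields, such as Reallocated Sector Count. Of course, when in doubt, replace the drive, but even a brand-new Seagate will show high numbers for these attributes.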
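One way to see why the large raw numbers need not be alarming is to look at the normalized columns instead. The sketch below is illustrative only. `is_failing` applies a simplified form of the usual rule that an attribute counts as failed once its normalized VALUE drops to or below a non-zero THRESH. `split_seagate_raw` rests on a commonly repeated community interpretation, which neither this post nor vendor documentation confirms, that Seagate packs an error count into the upper 16 bits of the 48-bit raw value and an operation count into the lower 32 bits. Treat that split as an assumption.

```python
def is_failing(value, thresh):
    """Simplified check: failed when normalized VALUE <= THRESH (THRESH > 0)."""
    return thresh > 0 and value <= thresh


def split_seagate_raw(raw):
    """ASSUMPTION (unofficial): upper 16 bits = errors, lower 32 bits = operations."""
    errors = (raw >> 32) & 0xFFFF
    operations = raw & 0xFFFFFFFF
    return errors, operations


if __name__ == "__main__":
    # VALUE / THRESH / RAW_VALUE taken from the output in the question.
    rows = {
        "Raw_Read_Error_Rate": (118, 6, 169074425),
        "Seek_Error_Rate": (77, 30, 200009354607),
    }
    for name, (value, thresh, raw) in rows.items():
        print(name, is_failing(value, thresh), split_seagate_raw(raw))
```

With the values from the question, neither attribute is at or below its threshold. Under the assumed split, Seek_Error_Rate decodes to 46 errors over 2440858991 seeks and Raw_Read_Error_Rate to 0 errors, so the raw figures would mostly reflect how much work the drive has done rather than how often it has failed.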
