About half an hour after booting my computer, I got these errors in dmesg:
[ 1355.677957] EXT4-fs error (device sda2): htree_dirblock_to_tree: inode #1318420: (comm updatedb.mlocat) bad entry in directory: directory entry across blocks - block=5251700offset=0(0),inode=1802725748,rec_len=179136,name_len=32
[ 1355.677973] Aborting journal on device sda2-8.
[ 1355.678101] EXT4-fs (sda2): Remounting filesystem read-only
[ 1355.690144] EXT4-fs error (device sda2): htree_dirblock_to_tree: inode #1318416: (comm updatedb.mlocat) bad entry in directory: directory entry across blocks - block=5251699offset=0(0),inode=2194783952,rec_len=53280,name_len=152
[ 1356.864720] EXT4-fs error (device sda2): htree_dirblock_to_tree: inode #1312795: (comm updatedb.mlocat) bad entry in directory: directory entry across blocks - block=5251176offset=1460(13748),inode=1432317541,rec_len=208208,name_len=119
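To make the numbers in those messages easier to compare, here is a minimal Python sketch that pulls the fields out of each "bad entry in directory" line and flags entries whose rec_len is larger than one filesystem block. The names ENTRY_RE, parse_bad_entries, oversized, and ASSUMED_BLOCK_SIZE are made up for illustration, and the 4096-byte block size is an assumption that is not confirmed anywhere in this post.

```python
import re

# Matches the EXT4 "bad entry in directory" messages quoted above.
# The kernel prints "block=<n>offset=<n>" with no separator, so \s* allows for that.
ENTRY_RE = re.compile(
    r"inode #(?P<dir_inode>\d+):.*?block=(?P<block>\d+)\s*offset=(?P<offset>\d+)"
    r"\(\d+\),\s*inode=(?P<inode>\d+),\s*rec_len=(?P<rec_len>\d+),"
    r"\s*name_len=(?P<name_len>\d+)"
)

# Assumption: a 4096-byte block size (a common ext4 default); not confirmed here.
ASSUMED_BLOCK_SIZE = 4096


def parse_bad_entries(log_text):
    """Return one dict of integer fields per bad-directory-entry message."""
    return [
        {key: int(value) for key, value in match.groupdict().items()}
        for match in ENTRY_RE.finditer(log_text)
    ]


def oversized(entries, block_size=ASSUMED_BLOCK_SIZE):
    """Keep only the entries whose rec_len cannot fit inside a single block."""
    return [entry for entry in entries if entry["rec_len"] > block_size]


if __name__ == "__main__":
    sample = (
        "[ 1355.677957] EXT4-fs error (device sda2): htree_dirblock_to_tree: "
        "inode #1318420: (comm updatedb.mlocat) bad entry in directory: "
        "directory entry across blocks - block=5251700offset=0(0),"
        "inode=1802725748,rec_len=179136,name_len=32"
    )
    for entry in oversized(parse_bad_entries(sample)):
        print(entry["dir_inode"], entry["block"], entry["rec_len"])
```

Under that block-size assumption, all three reported rec_len values (179136, 53280, 208208) are far larger than a block, which is why the kernel complains about a directory entry "across blocks". It looks as if those directory blocks contain something other than directory data.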
/dev/sda is an SSD, and it uses the noop I/O scheduler.
/etc/fstab entry:
UUID=acb4eefa-48ff-4ee1-bb5f-2dccce7d011f / ext4 errors=remount-ro,noatime,discard,user_xattr 0 1
System information:
$ cat /proc/mounts | grep /dev/sd
/dev/sda1 /boot ext2 rw,errors=continue 0 0
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=10.04
DISTRIB_CODENAME=lucid
DISTRIB_DESCRIPTION="Ubuntu 10.04.3 LTS"
$ uname -a
Linux leetpad 2.6.35-30-generic-pae #61~lucid1-Ubuntu SMP Thu Oct 13 21:14:29 UTC 2011 i686 GNU/Linux
Output of smartctl -a:
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     STT_FTM28GX25H
Serial Number:    P637510-MIBY-706A009
Firmware Version: 1916
User Capacity:    128,035,676,160 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Nov 24 20:53:48 2011 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (   0) seconds.
Offline data collection
capabilities:                    (0x1d) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Abort Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x00) Error logging NOT supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   0) minutes.
Extended self-test routine
recommended polling time:        (   0) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0000   005   000   000    Old_age  Offline  In_the_past 0
  9 Power_On_Hours          0x0000   141   002   000    Old_age  Offline  -           0
 12 Power_Cycle_Count       0x0000   115   002   000    Old_age  Offline  -           0
184 Unknown_Attribute       0x0000   084   000   000    Old_age  Offline  In_the_past 0
195 Hardware_ECC_Recovered  0x0000   000   000   000    Old_age  Offline  FAILING_NOW 0
196 Reallocated_Event_Count 0x0000   000   000   000    Old_age  Offline  FAILING_NOW 0
197 Current_Pending_Sector  0x0000   000   000   000    Old_age  Offline  FAILING_NOW 0
198 Offline_Uncorrectable   0x0000   002   107   000    Old_age  Offline  -           21198
199 UDMA_CRC_Error_Count    0x0000   063   003   000    Old_age  Offline  -           26957
200 Multi_Zone_Error_Rate   0x0000   099   124   000    Old_age  Offline  -           446
201 Soft_Read_Error_Rate    0x0000   024   154   000    Old_age  Offline  -           328
202 TA_Increase_Count       0x0000   115   254   000    Old_age  Offline  -           115
203 Run_Out_Cancel          0x0000   247   245   000    Old_age  Offline  -           83
204 Shock_Count_Write_Opern 0x0000   000   000   000    Old_age  Offline  FAILING_NOW 0
205 Shock_Rate_Write_Opern  0x0000   016   039   000    Old_age  Offline  -           0
206 Flying_Height           0x0000   005   000   000    Old_age  Offline  In_the_past 0
207 Spin_High_Current       0x0000   055   015   000    Old_age  Offline  -           0
208 Spin_Buzz               0x0000   248   001   000    Old_age  Offline  -           0
209 Offline_Seek_Performnce 0x0000   095   000   000    Old_age  Offline  In_the_past 0
211 Unknown_Attribute       0x0000   000   000   000    Old_age  Offline  FAILING_NOW 0
212 Unknown_Attribute       0x0000   000   000   000    Old_age  Offline  FAILING_NOW 0
213 Unknown_Attribute       0x0000   000   000   000    Old_age  Offline  FAILING_NOW 0

Warning: device does not support Error Logging
Warning! SMART ATA Error Log Structure error: invalid SMART checksum.
SMART Error Log Version: 1
No Errors Logged

Warning! SMART Self-Test Log Structure error: invalid SMART checksum.
SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Device does not support Selective Self Tests/Logging
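I ran memtest for 7 hours, and it found no memory errors.
Are there any obvious ideas about what could be going wrong here? The most plausible explanation I can think of is that the SSD is silently dropping some write requests, which eventually leaves the EXT4 filesystem inconsistent (yet without any disk I/O errors being reported). How could that happen? Are there any relevant configuration options that I should make sure are set correctly?
Which tools should I use to diagnose the hardware failure? Is it possible to diagnose an SSD failure without overwriting the data on it?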
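To illustrate what I mean by a check that does not overwrite data, here is a minimal Python sketch of a read-only pass over a device or file. It is only a sketch under stated assumptions: the function name read_only_scan and the 1 MiB chunk size are made up for illustration, it relies on os.pread (so a Unix-like system), and reads may be served from the page cache rather than from the drive itself.

```python
import os

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def read_only_scan(path, chunk_size=CHUNK_SIZE):
    """Read `path` from start to end without writing anything.

    Returns (bytes_read, bad_offsets), where bad_offsets lists the starting
    offset of every chunk whose read raised an OSError.
    """
    bad_offsets = []
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)  # opened read-only, so nothing is modified
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # total size of the file or device
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError:
                # The read failed: record where, then skip past this chunk.
                bad_offsets.append(offset)
                offset += want
                continue
            if not data:  # unexpected end of data
                break
            bytes_read += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return bytes_read, bad_offsets


if __name__ == "__main__":
    import sys

    total, bad = read_only_scan(sys.argv[1])
    print("read", total, "bytes;", len(bad), "chunks failed")
```

It would be run with the device path as the only argument (for example /dev/sda, which normally requires root). A pass like this can only surface read errors that the drive actually reports. It would not by itself detect writes that were silently dropped, which is the failure mode I suspect.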