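Today I ran apt-get update && apt-get upgrade on a Linux Ubuntu 12.04.5 LTS server. Everything went fine. Four hours later, my monitoring tool alerted me that disk I/O was overloaded. On this 8-core system, I/O wait has reached 10-40% and the load average has risen from 1 to 20. The website has become very slow.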
It looks like a bad disk or other bad hardware, but I am not sure. Where should I dig? Any help is appreciated.
uname -a:
Linux p-de-www 3.2.0-77-generic #114-Ubuntu SMP Tue Mar 10 17:26:03 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
top:
top - 16:19:59 up 1:38, 3 users, load average: 11.54, 7.46, 5.76
Tasks: 217 total, 1 running, 216 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.3%us, 0.2%sy, 0.0%ni, 80.9%id, 17.6%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16126212k total, 4153684k used, 11972528k free, 193392k buffers
Swap: 8387568k total, 0k used, 8387568k free, 2281864k cached
There are a bunch of ACPI errors in syslog.
In /var/log/messages:
root@p-de-www:~# tail -n 100 /var/log/messages
Mar 19 15:51:01 p-de-www kernel: [ 4184.716158] ata1: hard resetting link
Mar 19 15:51:02 p-de-www kernel: [ 4185.763378] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 19 15:51:02 p-de-www kernel: [ 4185.882753] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
Mar 19 15:51:02 p-de-www kernel: [ 4185.882761] ACPI Error: Method parse/execution Failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8), AE_NOT_FOUND (20110623/psparse-536)
Mar 19 15:51:02 p-de-www kernel: [ 4185.883514] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
Mar 19 15:51:02 p-de-www kernel: [ 4185.883523] ACPI Error: Method parse/execution Failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8), AE_NOT_FOUND (20110623/psparse-536)
Mar 19 15:51:02 p-de-www kernel: [ 4185.883842] ata1.00: configured for UDMA/133
Mar 19 15:51:02 p-de-www kernel: [ 4185.883860] ata1: EH complete
Mar 19 15:52:19 p-de-www kernel: [ 4262.752244] ata1: hard resetting link
Mar 19 15:52:24 p-de-www kernel: [ 4268.109057] ata1: link is slow to respond, please be patient (ready=0)
Mar 19 15:52:26 p-de-www kernel: [ 4269.676180] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 19 15:52:26 p-de-www kernel: [ 4269.769475] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
Mar 19 15:52:26 p-de-www kernel: [ 4269.769483] ACPI Error: Method parse/execution Failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8), AE_NOT_FOUND (20110623/psparse-536)
Mar 19 15:52:26 p-de-www kernel: [ 4269.770244] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
Mar 19 15:52:26 p-de-www kernel: [ 4269.770251] ACPI Error: Method parse/execution Failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8), AE_NOT_FOUND (20110623/psparse-536)
Mar 19 15:52:26 p-de-www kernel: [ 4269.770483] ata1.00: configured for UDMA/133
Mar 19 15:52:26 p-de-www kernel: [ 4269.770496] ata1.00: retrying FLUSH 0xea Emask 0x4
Mar 19 15:52:26 p-de-www kernel: [ 4269.770587] ata1.00: device reported invalid CHS sector 0
Mar 19 15:52:26 p-de-www kernel: [ 4269.770604] ata1: EH complete
Mar 19 15:54:39 p-de-www kernel: [ 4402.577394] ata1.00: limiting speed to UDMA/100:PIO4
Mar 19 15:54:39 p-de-www kernel: [ 4402.577557] ata1: hard resetting link
Mar 19 15:54:44 p-de-www kernel: [ 4407.934367] ata1: link is slow to respond, please be patient (ready=0)
Mar 19 15:54:49 p-de-www kernel: [ 4412.579786] ata1: hard resetting link
Mar 19 15:54:51 p-de-www kernel: [ 4415.362269] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 19 15:54:51 p-de-www kernel: [ 4415.475792] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
Mar 19 15:54:51 p-de-www kernel: [ 4415.475800] ACPI Error: Method parse/execution Failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8), AE_NOT_FOUND (20110623/psparse-536)
Mar 19 15:54:51 p-de-www kernel: [ 4415.476645] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
Mar 19 15:54:51 p-de-www kernel: [ 4415.476653] ACPI Error: Method parse/execution Failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8), AE_NOT_FOUND (20110623/psparse-536)
Mar 19 15:54:51 p-de-www kernel: [ 4415.476905] ata1.00: configured for UDMA/100
Mar 19 15:54:51 p-de-www kernel: [ 4415.476934] ata1: EH complete
Mar 19 15:55:13 p-de-www kernel: [ 4436.542443] ata1: hard resetting link
Mar 19 15:55:15 p-de-www kernel: [ 4438.876963] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 19 15:55:15 p-de-www kernel: [ 4438.959075] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
Mar 19 15:55:15 p-de-www kernel: [ 4438.959084] ACPI Error: Method parse/execution Failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8), AE_NOT_FOUND (20110623/psparse-536)
Mar 19 15:55:15 p-de-www kernel: [ 4438.959905] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
Mar 19 15:55:15 p-de-www kernel: [ 4438.959914] ACPI Error: Method parse/execution Failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8), AE_NOT_FOUND (20110623/psparse-536)
Mar 19 15:55:15 p-de-www kernel: [ 4438.960212] ata1.00: configured for UDMA/100
Mar 19 15:55:15 p-de-www kernel: [ 4438.960235] ata1: EH complete
Mar 19 16:17:32 p-de-www kernel: [ 5774.861347] ata1: hard resetting link
Mar 19 16:17:33 p-de-www kernel: [ 5776.132497] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 19 16:17:33 p-de-www kernel: [ 5776.248345] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
Mar 19 16:17:33 p-de-www kernel: [ 5776.248353] ACPI Error: Method parse/execution Failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8), AE_NOT_FOUND (20110623/psparse-536)
Mar 19 16:17:33 p-de-www kernel: [ 5776.249163] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
Mar 19 16:17:33 p-de-www kernel: [ 5776.249172] ACPI Error: Method parse/execution Failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8), AE_NOT_FOUND (20110623/psparse-536)
Mar 19 16:17:33 p-de-www kernel: [ 5776.249441] ata1.00: configured for UDMA/100
Mar 19 16:17:33 p-de-www kernel: [ 5776.249445] ata1.00: retrying FLUSH 0xea Emask 0x4
Mar 19 16:17:33 p-de-www kernel: [ 5776.249538] ata1.00: device reported invalid CHS sector 0
Mar 19 16:17:33 p-de-www kernel: [ 5776.249547] ata1: EH complete
Mar 19 16:18:34 p-de-www kernel: [ 5836.778503] ata1: hard resetting link
Mar 19 16:18:37 p-de-www kernel: [ 5840.400297] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 19 16:18:37 p-de-www kernel: [ 5840.500401] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
Mar 19 16:18:37 p-de-www kernel: [ 5840.500409] ACPI Error: Method parse/execution Failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8), AE_NOT_FOUND (20110623/psparse-536)
Mar 19 16:18:37 p-de-www kernel: [ 5840.501223] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
Mar 19 16:18:37 p-de-www kernel: [ 5840.501231] ACPI Error: Method parse/execution Failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8), AE_NOT_FOUND (20110623/psparse-536)
Mar 19 16:18:37 p-de-www kernel: [ 5840.501468] ata1.00: configured for UDMA/100
Mar 19 16:18:37 p-de-www kernel: [ 5840.501481] ata1.00: retrying FLUSH 0xea Emask 0x4
Mar 19 16:18:37 p-de-www kernel: [ 5840.501589] ata1: EH complete
Mar 19 16:19:38 p-de-www kernel: [ 5900.742501] ata1: hard resetting link
Mar 19 16:19:40 p-de-www kernel: [ 5903.077048] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 19 16:19:40 p-de-www kernel: [ 5903.077537] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
Mar 19 16:19:40 p-de-www kernel: [ 5903.077546] ACPI Error: Method parse/execution Failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8), AE_NOT_FOUND (20110623/psparse-536)
Mar 19 16:19:40 p-de-www kernel: [ 5903.078334] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
Mar 19 16:19:40 p-de-www kernel: [ 5903.078342] ACPI Error: Method parse/execution Failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8), AE_NOT_FOUND (20110623/psparse-536)
Mar 19 16:19:40 p-de-www kernel: [ 5903.078579] ata1.00: configured for UDMA/100
Mar 19 16:19:40 p-de-www kernel: [ 5903.078582] ata1.00: retrying FLUSH 0xea Emask 0x4
Mar 19 16:19:40 p-de-www kernel: [ 5903.078679] ata1: EH complete
Mar 19 16:21:24 p-de-www kernel: [ 6006.666736] ata1.00: limiting speed to UDMA/33:PIO4
Mar 19 16:21:24 p-de-www kernel: [ 6006.666867] ata1: hard resetting link
Mar 19 16:21:29 p-de-www kernel: [ 6012.023734] ata1: link is slow to respond, please be patient (ready=0)
Mar 19 16:21:34 p-de-www kernel: [ 6016.669145] ata1: hard resetting link
Mar 19 16:21:39 p-de-www kernel: [ 6022.026105] ata1: link is slow to respond, please be patient (ready=0)
Mar 19 16:21:44 p-de-www kernel: [ 6026.671575] ata1: hard resetting link
Mar 19 16:21:46 p-de-www kernel: [ 6028.726319] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 19 16:21:46 p-de-www kernel: [ 6028.824829] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
Mar 19 16:21:46 p-de-www kernel: [ 6028.824836] ACPI Error: Method parse/execution Failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8), AE_NOT_FOUND (20110623/psparse-536)
Mar 19 16:21:46 p-de-www kernel: [ 6028.825575] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
Mar 19 16:21:46 p-de-www kernel: [ 6028.825579] ACPI Error: Method parse/execution Failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8), AE_NOT_FOUND (20110623/psparse-536)
Mar 19 16:21:46 p-de-www kernel: [ 6028.825811] ata1.00: configured for UDMA/33
Mar 19 16:21:46 p-de-www kernel: [ 6028.825815] ata1.00: retrying FLUSH 0xea Emask 0x4
Mar 19 16:21:46 p-de-www kernel: [ 6028.825918] ata1.00: device reported invalid CHS sector 0
Mar 19 16:21:46 p-de-www kernel: [ 6028.825925] ata1: EH complete
Mar 19 16:22:07 p-de-www kernel: [ 6049.650737] ata1: hard resetting link
Mar 19 16:22:12 p-de-www kernel: [ 6055.007538] ata1: link is slow to respond, please be patient (ready=0)
Mar 19 16:22:17 p-de-www kernel: [ 6059.652963] ata1: hard resetting link
Mar 19 16:22:22 p-de-www kernel: [ 6065.009914] ata1: link is slow to respond, please be patient (ready=0)
Mar 19 16:22:23 p-de-www kernel: [ 6065.849433] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 19 16:22:23 p-de-www kernel: [ 6065.978240] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
Mar 19 16:22:23 p-de-www kernel: [ 6065.978248] ACPI Error: Method parse/execution Failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8), AE_NOT_FOUND (20110623/psparse-536)
Mar 19 16:22:23 p-de-www kernel: [ 6065.979084] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
Mar 19 16:22:23 p-de-www kernel: [ 6065.979092] ACPI Error: Method parse/execution Failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8), AE_NOT_FOUND (20110623/psparse-536)
Mar 19 16:22:23 p-de-www kernel: [ 6065.979403] ata1.00: configured for UDMA/33
Mar 19 16:22:23 p-de-www kernel: [ 6065.979424] ata1: EH complete
Mar 19 16:22:51 p-de-www kernel: [ 6093.626046] ata1: hard resetting link
Mar 19 16:22:51 p-de-www kernel: [ 6094.113597] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 19 16:22:51 p-de-www kernel: [ 6094.226485] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
Mar 19 16:22:51 p-de-www kernel: [ 6094.226492] ACPI Error: Method parse/execution Failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8), AE_NOT_FOUND (20110623/psparse-536)
Mar 19 16:22:51 p-de-www kernel: [ 6094.227269] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
Mar 19 16:22:51 p-de-www kernel: [ 6094.227276] ACPI Error: Method parse/execution Failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff880405e726b8), AE_NOT_FOUND (20110623/psparse-536)
Mar 19 16:22:51 p-de-www kernel: [ 6094.227513] ata1.00: configured for UDMA/33
Mar 19 16:22:51 p-de-www kernel: [ 6094.227541] ata1: EH complete
There are 2 disks in software RAID1:
root@p-de-www:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sdb2[1] sda2[0]
      524276 blocks super 1.2 [2/2] [UU]
md0 : active raid1 sdb1[1] sda1[0]
      8387572 blocks super 1.2 [2/2] [UU]
md3 : active raid1 sdb4[1] sda4[0]
      1847608639 blocks super 1.2 [2/2] [UU]
md2 : active raid1 sdb3[1] sda3[0]
      1073740664 blocks super 1.2 [2/2] [UU]
iotop looks fine, with only occasional spikes like this one:
377 be/3 root 0.00 B/s 82.29 K/s 0.00 % 7.28 % [jbd2/md2-8]
Output of smartctl -a /dev/sda:
=== START OF INFORMATION SECTION === Device Model: ST3000DM001-1CH166 Serial Number: W1F1YLLX LU WWN Device Id: 5 000c50 05dd292d0 Firmware Version: CC24 User Capacity: 3,000,592,982,016 bytes [3.00 TB] Sector Sizes: 512 bytes logical,4096 bytes physical Device is: Not in smartctl database [for details use: -P showall] ATA Version is: 8 ATA Standard is: ATA-8-ACS revision 4 Local Time is: Thu Mar 19 16:55:48 2015 CET SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The prevIoUs self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: ( 584) seconds. Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 1) minutes. Extended self-test routine recommended polling time: ( 255) minutes. Conveyance self-test routine recommended polling time: ( 2) minutes. SCT capabilities: (0x3085) SCT Status supported. 
SMART Attributes Data Structure revision number: 10 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_Failed RAW_VALUE 1 Raw_Read_Error_Rate 0x000f 102 099 006 Pre-fail Always - 195880648 3 Spin_Up_Time 0x0003 094 094 000 Pre-fail Always - 0 4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 10 5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0 7 Seek_Error_Rate 0x000f 087 060 030 Pre-fail Always - 502482545 9 Power_On_Hours 0x0032 079 079 000 Old_age Always - 18486 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 10 183 Runtime_Bad_Block 0x0032 097 097 000 Old_age Always - 3 184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0 187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0 188 Command_Timeout 0x0032 097 095 000 Old_age Always - 197571510318 189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0 190 Airflow_Temperature_Cel 0x0022 070 061 045 Old_age Always - 30 (Min/Max 27/35) 191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0 192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 9 193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 877 194 Temperature_Celsius 0x0022 030 040 000 Old_age Always - 30 (0 20 0 0) 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 80 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 80 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0 240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 79985175971891 241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 32009289003 242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 178724571355 SMART Error Log Version: 1 No Errors Logged SMART Self-test log structure revision number 1 Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Extended offline Completed without error 00% 5 - SMART Selective self-test log data structure 
revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans,do NOT read-scan remainder of disk. If Selective self-test is pending on power-up,resume after 0 minute delay.
Output of smartctl -a /dev/sdb:
=== START OF INFORMATION SECTION === Device Model: ST3000DM001-1CH166 Serial Number: W1F1VM8Q LU WWN Device Id: 5 000c50 05dbdcafe Firmware Version: CC24 User Capacity: 3,4096 bytes physical Device is: Not in smartctl database [for details use: -P showall] ATA Version is: 8 ATA Standard is: ATA-8-ACS revision 4 Local Time is: Thu Mar 19 16:57:57 2015 CET SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The prevIoUs self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: ( 600) seconds. Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 1) minutes. Extended self-test routine recommended polling time: ( 255) minutes. Conveyance self-test routine recommended polling time: ( 2) minutes. SCT capabilities: (0x3085) SCT Status supported. 
SMART Attributes Data Structure revision number: 10 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_Failed RAW_VALUE 1 Raw_Read_Error_Rate 0x000f 118 099 006 Pre-fail Always - 178849088 3 Spin_Up_Time 0x0003 094 094 000 Pre-fail Always - 0 4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 10 5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0 7 Seek_Error_Rate 0x000f 087 060 030 Pre-fail Always - 498642529 9 Power_On_Hours 0x0032 079 079 000 Old_age Always - 18467 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 10 183 Runtime_Bad_Block 0x0032 099 099 000 Old_age Always - 1 184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0 187 Reported_Uncorrect 0x0032 082 082 000 Old_age Always - 18 188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0 189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0 190 Airflow_Temperature_Cel 0x0022 070 062 045 Old_age Always - 30 (Min/Max 26/35) 191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0 192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 9 193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 876 194 Temperature_Celsius 0x0022 030 040 000 Old_age Always - 30 (0 20 0 0) 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0 240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 44448616564768 241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 55043480738 242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 154979931141 SMART Error Log Version: 1 ATA Error Count: 18 (device log contains only the most recent five errors) CR = Command Register [HEX] FR = Features Register [HEX] SC = Sector Count Register [HEX] SN = Sector Number Register [HEX] CL = Cylinder Low Register [HEX] CH = Cylinder High 
Register [HEX] DH = Device/Head Register [HEX] DC = Device Command Register [HEX] ER = Error register [HEX] ST = Status register [HEX] Powered_Up_Time is measured from power on,and printed as DDd+hh:mm:SS.sss where DD=days,hh=hours,mm=minutes,SS=sec,and sss=millisec. It "wraps" after 49.710 days. Error 18 occurred at disk power-on lifetime: 18168 hours (757 days + 0 hours) When the command that caused the error occurred,the device was active or idle. After command completion occurred,registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 a0 7e 17 05 Error: UNC at LBA = 0x05177ea0 = 85425824 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 60 00 08 a0 7e 17 45 00 16d+20:43:03.906 READ FPDMA QUEUED ef 10 02 00 00 00 a0 00 16d+20:43:03.905 SET FEATURES [Reserved for Serial ATA] 27 00 00 00 00 00 e0 00 16d+20:43:03.905 READ NATIVE MAX ADDRESS EXT ec 00 00 00 00 00 a0 00 16d+20:43:03.905 IDENTIFY DEVICE ef 03 46 00 00 00 a0 00 16d+20:43:03.905 SET FEATURES [Set transfer mode] Error 17 occurred at disk power-on lifetime: 18168 hours (757 days + 0 hours) When the command that caused the error occurred,registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 a0 7e 17 05 Error: UNC at LBA = 0x05177ea0 = 85425824 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 60 00 08 a0 7e 17 45 00 16d+20:43:01.000 READ FPDMA QUEUED ef 10 02 00 00 00 a0 00 16d+20:43:01.000 SET FEATURES [Reserved for Serial ATA] 27 00 00 00 00 00 e0 00 16d+20:43:01.000 READ NATIVE MAX ADDRESS EXT ec 00 00 00 00 00 a0 00 16d+20:43:01.000 IDENTIFY DEVICE ef 03 46 00 00 00 a0 00 16d+20:43:01.000 SET FEATURES [Set transfer mode] Error 16 occurred at disk power-on lifetime: 18168 hours (757 days + 0 hours) When the command 
that caused the error occurred,registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 a0 7e 17 05 Error: UNC at LBA = 0x05177ea0 = 85425824 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 60 00 08 a0 7e 17 45 00 16d+20:42:58.104 READ FPDMA QUEUED ef 10 02 00 00 00 a0 00 16d+20:42:58.104 SET FEATURES [Reserved for Serial ATA] 27 00 00 00 00 00 e0 00 16d+20:42:58.104 READ NATIVE MAX ADDRESS EXT ec 00 00 00 00 00 a0 00 16d+20:42:58.104 IDENTIFY DEVICE ef 03 46 00 00 00 a0 00 16d+20:42:58.104 SET FEATURES [Set transfer mode] Error 15 occurred at disk power-on lifetime: 18168 hours (757 days + 0 hours) When the command that caused the error occurred,registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 a0 7e 17 05 Error: UNC at LBA = 0x05177ea0 = 85425824 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 60 00 08 a0 7e 17 45 00 16d+20:42:55.196 READ FPDMA QUEUED ef 10 02 00 00 00 a0 00 16d+20:42:55.196 SET FEATURES [Reserved for Serial ATA] 27 00 00 00 00 00 e0 00 16d+20:42:55.196 READ NATIVE MAX ADDRESS EXT ec 00 00 00 00 00 a0 00 16d+20:42:55.196 IDENTIFY DEVICE ef 03 46 00 00 00 a0 00 16d+20:42:55.196 SET FEATURES [Set transfer mode] Error 14 occurred at disk power-on lifetime: 18168 hours (757 days + 0 hours) When the command that caused the error occurred,registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 a0 7e 17 05 Error: UNC at LBA = 0x05177ea0 = 85425824 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 60 00 08 a0 7e 17 45 00 16d+20:42:52.257 READ FPDMA QUEUED ef 10 02 00 00 00 a0 00 16d+20:42:52.257 SET FEATURES [Reserved for 
Serial ATA] 27 00 00 00 00 00 e0 00 16d+20:42:52.257 READ NATIVE MAX ADDRESS EXT ec 00 00 00 00 00 a0 00 16d+20:42:52.256 IDENTIFY DEVICE ef 03 46 00 00 00 a0 00 16d+20:42:52.256 SET FEATURES [Set transfer mode] SMART Self-test log structure revision number 1 Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Extended offline Completed without error 00% 5 - SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans,resume after 0 minute delay.
Solution
You have not run a SMART self-test in a while. Try running smartctl -t long <device>.
It should take a few hours, and you can watch its progress in the output of smartctl -a:
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
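As a rough sketch of how that progress check could be scripted: while a test is running, smartctl normally reports a line of the form "NN% of test remaining" under the self-test execution status. The helper name selftest_remaining is made up for this example, /dev/sdX is a placeholder for the actual device, and the exact wording of the progress line is assumed from typical smartctl output rather than taken from this thread.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -c` or `smartctl -a` output on stdin
# and prints the percentage of the running self-test that is still remaining.
# Prints nothing when no "% of test remaining" line is present.
selftest_remaining() {
    sed -n 's/^[^0-9]*\([0-9][0-9]*\)% of test remaining.*/\1/p'
}

# Usage (as root; /dev/sdX is a placeholder, not a device from this thread):
#   smartctl -t long /dev/sdX                  # start the extended self-test
#   smartctl -c /dev/sdX | selftest_remaining  # poll the remaining percentage
#   smartctl -l selftest /dev/sdX              # read the self-test log afterwards
```

The test runs inside the drive firmware, so the server stays usable while polling, although I/O on an already struggling disk may get slower.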
If it does not complete without error, the way the last run did back when the drive was new:
SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error        00%                5  -
then just get rid of the drive.
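Since both disks are members of the md RAID1 arrays shown in the question, getting rid of a drive would usually start with failing and removing its partitions from each array. The sketch below only prints the mdadm commands for review and does not run them. The function name md_remove_plan is invented for this example, and which disk to pass in is an assumption the reader has to settle from the self-test results.

```shell
#!/bin/sh
# Hypothetical dry-run helper: reads /proc/mdstat-style text on stdin and
# prints (does not execute) the mdadm commands that would fail and remove
# every partition of the given disk from its array.
# Note: the disk name is matched as a prefix, so "sda" would also match "sdaa".
md_remove_plan() {
    disk=$1
    awk -v disk="$disk" '
        $2 == ":" && $1 ~ /^md/ {
            for (i = 3; i <= NF; i++) {
                part = $i
                sub(/\[.*$/, "", part)   # strip the "[0]" style suffix
                if (index(part, disk) == 1)
                    printf "mdadm /dev/%s --fail /dev/%s --remove /dev/%s\n", $1, part, part
            }
        }'
}

# Usage: review the printed plan before running anything by hand.
#   md_remove_plan sdX < /proc/mdstat
```

Printing a plan first is a deliberate choice here: failing the wrong member of a two-disk mirror leaves the array running on the bad drive alone.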
My guess is that @kasperd is right: the drive that shows SATA errors in its SMART log is broken.
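To compare the two drives quickly, one option is to filter the SMART attribute table for the counters that usually point at failing media. This is only a sketch: the helper name smart_warnings and the choice of four attributes are assumptions of this example, and it expects the standard ten-column attribute rows shown in the question.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -A` output on stdin and prints
# NAME=RAW_VALUE for selected attributes whose raw value is non-zero.
# Column 2 is the attribute name, column 10 the raw value.
smart_warnings() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Reported_Uncorrect|Current_Pending_Sector|Offline_Uncorrectable)$/ && $10 + 0 > 0 {
        print $2 "=" $10
    }'
}

# Usage (as root):
#   smartctl -A /dev/sda | smart_warnings
#   smartctl -A /dev/sdb | smart_warnings
```

Fed the attribute rows posted in the question, this would flag Current_Pending_Sector and Offline_Uncorrectable (both 80) on sda and Reported_Uncorrect (18) on sdb.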
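By the way, the link between the high load and the broken drive comes from how load is measured. Load is the number of processes waiting to run, and a process that is waiting for a drive to return data does count as waiting to run.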
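On Linux, processes blocked on disk I/O sit in uninterruptible sleep (state D in ps), and these are counted in the load average together with runnable processes. A minimal sketch for checking this on the affected server follows; the helper name count_blocked is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: reads headerless `ps -eo state=,pid=,comm=` output
# on stdin and prints how many processes are in uninterruptible sleep
# (state D), which is typically where I/O-blocked processes end up.
count_blocked() {
    awk '$1 ~ /^D/ { n++ } END { print n + 0 }'
}

# Usage:
#   ps -eo state=,pid=,comm= | count_blocked   # processes stuck waiting on I/O
#   cat /proc/loadavg                          # compare with the load average
```

If the count of D-state processes tracks the load average while the CPUs stay mostly idle, as in the top output in the question, the load is coming from I/O waits rather than from CPU demand.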