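I have an Ubuntu server set up with a 14-disk ZFS raidz2 pool.
About 80% of the time, after a reboot I end up with a degraded pool in which two disks are marked as faulted. The faulted drives are not always the same ones, but there are always exactly two. For example: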
    $ sudo zpool status
      pool: tank
     state: DEGRADED
    status: One or more devices could not be used because the label is missing or
            invalid. Sufficient replicas exist for the pool to continue
            functioning in a degraded state.
    action: Replace the device using 'zpool replace'.
       see: http://zfsonlinux.org/msg/ZFS-8000-4J
      scan: resilvered 4K in 0h0m with 0 errors on Sun Sep 30 23:08:51 2018
    config:

            NAME                      STATE     READ WRITE CKSUM
            tank                      DEGRADED     0     0     0
              raidz2-0                DEGRADED     0     0     0
                sde                   ONLINE       0     0     0
                sdc                   ONLINE       0     0     0
                sdd                   ONLINE       0     0     0
                sda                   ONLINE       0     0     0
                sdh                   ONLINE       0     0     0
                11521322863231878081  FAULTED      0     0     0  was /dev/sdf1
                15273938560620494453  FAULTED      0     0     0  was /dev/sdg1
                sdb                   ONLINE       0     0     0
                sdi                   ONLINE       0     0     0
                sdj                   ONLINE       0     0     0
                sdk                   ONLINE       0     0     0
                sdl                   ONLINE       0     0     0
                sdm                   ONLINE       0     0     0
                sdn                   ONLINE       0     0     0

    errors: No known data errors
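Since the faulted pair changes between reboots, it may help to record which device paths drop out each time. The following is only a minimal sketch: the function name `list_faulted` is hypothetical, and it assumes the `zpool status` layout shown above, where the second column is the device state and a faulted entry ends with `was <path>`.

```shell
#!/bin/sh
# Hypothetical helper: read `zpool status` output on stdin and print the
# former device path ("was /dev/...") of every entry whose state is FAULTED.
# Assumes the column layout shown above: NAME STATE READ WRITE CKSUM [was PATH].
list_faulted() {
    awk '$2 == "FAULTED" {
        # Scan the trailing fields for the "was" marker and print what follows it.
        for (i = 3; i < NF; i++)
            if ($i == "was") print $(i + 1)
    }'
}
```

It would be used as `sudo zpool status tank | list_faulted`, for example from a boot-time script that appends the result to a log so the pattern across reboots can be compared.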
I can export and re-import the pool, and the disks are no longer faulted. For example:
    $ sudo zpool export tank
    $ sudo zpool import tank
    $ sudo zpool status
      pool: tank
     state: ONLINE
    status: One or more devices has experienced an unrecoverable error. An
            attempt was made to correct the error. Applications are unaffected.
    action: Determine if the device needs to be replaced, and clear the errors
            using 'zpool clear' or replace the device with 'zpool replace'.
       see: http://zfsonlinux.org/msg/ZFS-8000-9P
      scan: resilvered 4K in 0h0m with 0 errors on Sun Sep 30 23:08:51 2018
    config:

            NAME          STATE     READ WRITE CKSUM
            tank          ONLINE       0     0     0
              raidz2-0    ONLINE       0     0     0
                sde       ONLINE       0     0     0
                sdc       ONLINE       0     0     0
                sdd       ONLINE       0     0     0
                sda       ONLINE       0     0     0
                sdh       ONLINE       0     0     0
                sdg       ONLINE       0     0     1
                sdf       ONLINE       0     0     0
                sdb       ONLINE       0     0     0
                sdi       ONLINE       0     0     0
                sdj       ONLINE       0     0     0
                sdk       ONLINE       0     0     0
                sdl       ONLINE       0     0     0
                sdm       ONLINE       0     0     0
                sdn       ONLINE       0     0     0

    errors: No known data errors
The HBA in use works fine in another server.
Is there anything else I can try to avoid these faulted drives on reboot? I have another HBA I could swap in.