md
) and, for one reason or another (mostly because you are a lazy admin, admit it!), both disks are reporting unreadable sectors, either through SMART or through actual failed read attempts.

So you installed a third, good disk, set it as a spare, then failed one of the two bad ones to initiate synchronisation onto the new disk. However, all hell breaks loose as you find out that your synchronisation doesn't complete (`/proc/mdstat` reports `U_` or `_U`). Instead of ignoring the unreadable sectors as it should, `md` decides that it cannot continue.

Worse, if you look at your `dmesg`, you find that it is being polluted by a continuous stream of:

```
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:sda1
 disk 1, wo:1, o:1, dev:sdb1
```

Help!!!!
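If you have several arrays, a quick way to see which ones are stuck in this state is to look for an underscore in the member-status string of `/proc/mdstat`. Here is a minimal POSIX `sh` sketch. The function name `degraded_arrays` is hypothetical, and it assumes the usual mdstat layout, where the status string (`[UU]`, `[U_]`...) appears after the `mdN : ...` line that names the array.

```shell
#!/bin/sh
# degraded_arrays (hypothetical helper): read /proc/mdstat-formatted text on
# stdin and print every array whose member-status string, such as [UU] or
# [U_], contains an underscore (a missing or not-yet-synced member).
degraded_arrays() {
  awk '
    /^md[0-9]+ :/ { array = $1 }      # remember which array we are looking at
    match($0, /\[[U_]+]/) {            # the status string, e.g. [U_]
      status = substr($0, RSTART, RLENGTH)
      if (status ~ /_/) print array, status
    }
  '
}
```

On a live system you would run it as `degraded_arrays < /proc/mdstat`.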
OK, first of all, since this information is quite hard to find, especially if you are in a hurry, here is what the abbreviations above mean:

- `wd`: working disks
- `rd`: RAID disks
- `wo`: write-only, i.e. the device is not in sync with the array (if set to 1 outside of a rebuild, this usually indicates a problem: the data is not fully duplicated on this device)
- `o`: online
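To spot the offending member quickly in a noisy `dmesg`, a small filter can pull out every device that the printout flags with `wo:1`. This is a minimal sketch: the function name `write_only_members` is hypothetical, and it assumes the line layout shown above (`disk N, wo:X, o:Y, dev:NAME`).

```shell
#!/bin/sh
# write_only_members (hypothetical helper): from "RAID1 conf printout" lines
# such as " disk 1, wo:1, o:1, dev:sdb1", print each device flagged wo:1
# (not in sync with the array), once, however often the printout repeats.
write_only_members() {
  sed -n 's/^.*disk [0-9][0-9]*, wo:1, o:[01], dev:\([^ ,]*\).*$/\1/p' | sort -u
}
```

Typical use would be `dmesg | write_only_members`.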
`wd:1`, as well as `wo:1` for the second disk, is not something we want to see. Why can't our good spare disk be added as R/W to the gorram array? Heck, the problematic disk now single-handedly holds our up-to-date data, so if it fails, we will be in big trouble. What's the point of providing redundancy, really, if `md` fails to synchronise as soon as there's one measly sector it cannot read?

It's a bird! It's a plane! No, it's `hdparm`!

Well, the sad truth of `md` on Linux (which may have improved in newer versions) is that it isn't resilient at all when it comes to unreadable sectors during sync. I guess the developers decided that, since the point of redundancy is to always have at least one good set of data, they didn't need to focus on situations where the "good" set of data may also have some corruption. They therefore never planned for anything but retrying an unreadable sector forever, until the disk magically repairs itself (right... fat chance!).

Now (and for the rest of this post I will mostly be following the excellent information provided by Bas on his blog), to compensate for that oversight, the trick is to make the problematic sectors readable again, one way or another, so that `md` can complete the synchronisation. This may sound easier said than done, but most of the time it shouldn't be an issue, as recent disks with SMART are engineered with a set of spare sectors, to be allocated in replacement of unreadable or unwritable ones for exactly this kind of situation. The issue, however, is that reallocation of sectors only occurs on write access.

What this means is that, while the disk has the technology to "fix" itself, as long as you are only attempting to read the problematic sectors, reallocation will not be triggered and you will continue to get read errors. Thus, you must manually issue a write to the problematic sector(s) to trigger the "recovery" mechanism. (NB: I'm using "fix" and "recovery" loosely: you cannot recover the data from these sectors once they are reallocated, so you will end up with some corrupted data.)
This can be confirmed by checking the `Offline_Uncorrectable` (#198) and `Reallocated_Sector_Ct` (#5) attributes reported by SMART:

```
# smartctl -A /dev/sda
smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       105
  2 Throughput_Performance  0x0026   054   054   000    Old_age   Always       -       2759
  3 Spin_Up_Time            0x0023   084   084   025    Pre-fail  Always       -       4989
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       10
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       11496
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       10
191 G-Sense_Error_Rate      0x0022   252   252   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   060   000    Old_age   Always       -       32 (Lifetime Min/Max 20/40)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   252   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       2
223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       10
```

If you see a zero in the RAW_VALUE column (the last one) for these attributes, but the disk still reports that it has trouble reading sectors, the sector reallocation process hasn't kicked in yet and needs to be triggered manually.
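Reading those two counters by eye gets tedious if you have to re-check them after every attempt. A small `awk` filter can pull the RAW_VALUE (the 10th field of each attribute line above) for a given attribute ID. This is a sketch: `smart_raw_value` and `show_counters` are hypothetical helper names, and it assumes the column layout printed by this version of `smartctl`.

```shell
#!/bin/sh
# smart_raw_value (hypothetical helper): read `smartctl -A` output on stdin and
# print the RAW_VALUE (10th field) of the attribute whose ID is $1.
smart_raw_value() {
  awk -v id="$1" '$1 == id { print $10; exit }'
}

# show_counters (hypothetical helper): print the two counters discussed above
# from a captured `smartctl -A` output passed as $1 ("?" if not found).
show_counters() {
  local realloc uncorr
  realloc=$(printf '%s\n' "$1" | smart_raw_value 5)
  uncorr=$(printf '%s\n' "$1" | smart_raw_value 198)
  echo "Reallocated_Sector_Ct=${realloc:-?} Offline_Uncorrectable=${uncorr:-?}"
}
```

For example: `show_counters "$(smartctl -A /dev/sda)"`.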
The first order of the day, then, is to find the address of the sector(s) we should trigger a write to. This is fairly easy, as all you need to do is run a long SMART self-test, with something like `smartctl -t long /dev/sda`, wait for it to complete, and write down the first sector address where a read error is reported:

```
# smartctl -a /dev/sda
(...)
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       60%     10864         293039329
(...)
```

Once we have that address, we could of course use `dd`, but an even simpler approach is to use a recent version of `hdparm`, as it adds easy support for reading/writing a single sector.

The first thing to try with `hdparm`, then, is to confirm that we have a problem accessing that sector:

```
# hdparm --read-sector 293039329 /dev/sda
/dev/sda: Input/Output error
```

This confirms what the SMART test reported. You can try a few more read attempts to validate that the sector is busted, and then issue a write so that the disk finally realizes it should reallocate that sector. Note that, because the operation obviously destroys the existing data, `hdparm` requires you to add a `--yes-i-know-what-i-am-doing` flag to issue the write, hence:

```
# hdparm --yes-i-know-what-i-am-doing --write-sector 293039329 /dev/sda
/dev/sda: re-writing sector 293039329: succeeded
```

You can then issue a read again, which will confirm that the sector has been reallocated:
```
# hdparm --read-sector 293039329 /dev/sda
/dev/sda: reading sector 293039329: succeeded
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
(...)
```

If you issue `smartctl -A` again, you should also see that the sector has been reallocated:

```
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       1
```

It's usually a good idea to use `hdparm` to read adjacent sectors as well, and correct them as needed, then repeat the operations above until the SMART self-test completes without error and you have smoked out all the problematic sectors. At this stage, if you issue a resync of the array with the new disk, it should complete successfully and redundancy will be restored. Time to order another replacement and check your data for corruption. But at least you are redundant again.

Addons
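The find, check, rewrite and re-check sequence described above can be wrapped in a few shell functions. This is only a sketch under several assumptions: the helper names (`first_error_lba`, `rewrite_if_unreadable`, `scan_around`) are hypothetical, the default radius of 8 sectors for the adjacent-sector scan is arbitrary, and it assumes `hdparm` exits with a non-zero status when a sector read fails. Remember that every rewrite destroys whatever the sector contained, so double-check the device and LBA before running it.

```shell
#!/bin/sh
# Hypothetical helpers automating the procedure above. Assumes hdparm exits
# non-zero when --read-sector fails. --write-sector overwrites the sector
# with zeros, so whatever it held is lost.

# first_error_lba: read `smartctl -l selftest` (or `smartctl -a`) output on
# stdin and print the LBA_of_first_error of the most recent test (# 1), if any.
first_error_lba() {
  awk '/^# *1 / { if ($NF ~ /^[0-9]+$/) print $NF; exit }'
}

# rewrite_if_unreadable DEV LBA: rewrite one sector only if it cannot be read,
# then read it back to check that the drive now returns data for it.
rewrite_if_unreadable() {
  local dev="$1" lba="$2"
  if hdparm --read-sector "$lba" "$dev" >/dev/null 2>&1; then
    echo "$dev: sector $lba is readable, leaving it alone"
    return 0
  fi
  echo "$dev: sector $lba is unreadable, rewriting it"
  if ! hdparm --yes-i-know-what-i-am-doing --write-sector "$lba" "$dev" >/dev/null 2>&1; then
    echo "$dev: rewriting sector $lba failed" >&2
    return 1
  fi
  if ! hdparm --read-sector "$lba" "$dev" >/dev/null 2>&1; then
    echo "$dev: sector $lba is still unreadable after rewriting" >&2
    return 1
  fi
  echo "$dev: sector $lba now reads fine"
}

# scan_around DEV LBA [RADIUS]: apply rewrite_if_unreadable to every sector
# from LBA-RADIUS to LBA+RADIUS (RADIUS defaults to an arbitrary 8).
scan_around() {
  local dev="$1" lba="$2" radius="${3:-8}" s end
  s=$((lba - radius))
  if [ "$s" -lt 0 ]; then s=0; fi
  end=$((lba + radius))
  while [ "$s" -le "$end" ]; do
    rewrite_if_unreadable "$dev" "$s" || return 1
    s=$((s + 1))
  done
}
```

For instance, `lba=$(smartctl -l selftest /dev/sda | first_error_lba)` followed by `scan_around /dev/sda "$lba"`, then another long self-test to see whether more bad sectors turn up.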
- To get details of your `md` array, you can use `mdadm --detail`, e.g.:

  ```
  # mdadm --detail /dev/md2
  /dev/md2:
          Version : 0.90
    Creation Time : Tue May 6 18:43:16 2008
       Raid Level : raid1
       Array Size : 130030016 (124.01 GiB 133.15 GB)
    Used Dev Size : 130030016 (124.01 GiB 133.15 GB)
     Raid Devices : 2
    Total Devices : 3
  Preferred Minor : 2
      Persistence : Superblock is persistent

      Update Time : Tue Jan 10 13:42:29 2012
            State : clean
   Active Devices : 2
  Working Devices : 3
   Failed Devices : 0
    Spare Devices : 1

             UUID : 0be47c81:ede086ae:0c460403:d81de298
           Events : 0.3658859

      Number   Major   Minor   RaidDevice State
         0       8        3        0      active sync   /dev/sda3
         1       8       19        1      active sync   /dev/sdb3

         2       8       35        -      spare   /dev/sdc3
  ```
- You are strongly encouraged to check your `syslog` or `messages` log for reports of I/O issues, especially if you want to locate the data that may have been affected.
- This method is not guaranteed to work! Sometimes a SMART test will report a read error, but a readout of the sector using `hdparm` will work fine, so you won't be able to get the disk to reallocate it. However, this shouldn't matter too much for the `md` resync, which is what we are interested in here.
- If your disk has a lot of unreadable sectors, you may run out of spare sectors for reallocation. It's hard to say how many spare sectors hard drive manufacturers provide, but I assume it isn't that many.
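- You may have a problem compiling a recent version of `hdparm` on some older Linux systems:

  ```
  fallocate.c: In function ‘do_fallocate_syscall’:
  fallocate.c:39: error: ‘__NR_fallocate’ undeclared (first use in this function)
  fallocate.c:39: error: (Each undeclared identifier is reported only once
  fallocate.c:39: error: for each function it appears in.)
  make: *** [fallocate.o] Error 1
  ```

  If that is the case, just add the following near the top of `fallocate.c` (285 is the `fallocate` syscall number on x86_64; other architectures use a different number):

  ```c
  /* Provide the x86_64 syscall number when the system headers lack it. */
  #ifndef __NR_fallocate
  #define __NR_fallocate 285
  #endif
  ```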
- Some disks seem to be smart enough (no pun intended) to do further correction once they have registered `Offline_Uncorrectable` sectors, so you may actually find that, after a few hours, the value of `Offline_Uncorrectable` falls back to zero, while the sectors can still be read and written and extended SMART tests no longer report any issue. Pretty neat, but I still wouldn't entirely trust the disk...
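Following the advice about checking `syslog` or `messages`, the kernel's I/O error lines are the natural starting point for locating affected data. A minimal sketch, assuming kernel messages that contain text of the form `I/O error, dev sda, sector 293039329` (as in `end_request: I/O error, ...`; the exact wording differs between kernel versions, so adjust the pattern to what your logs show). The function name `io_error_sectors` is hypothetical.

```shell
#!/bin/sh
# io_error_sectors (hypothetical helper): read syslog/messages text on stdin
# and print each distinct "device sector" pair from kernel lines containing
# "I/O error, dev <name>, sector <number>". The pattern is an assumption;
# check it against your own logs.
io_error_sectors() {
  sed -n 's/.*I\/O error, dev \([a-z0-9]*\), sector \([0-9]*\).*/\1 \2/p' | sort -u
}
```

For example: `io_error_sectors < /var/log/messages`.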
Great write-up!
http://smartmontools.sourceforge.net/badblockhowto.html#e2_example1 has some instructions on how to calculate which files are affected by mapping the LBAs to inodes.