2009-07-23

Why does Windows feel the need to modify the partition table at boot...

...or how I almost lost 1.5 TB of data today.

If you use software/fake RAID, this might happen to you too.

See, this morning I thought I would upgrade my HTPC's BIOS just for the heck of it. This is the PC that has a 2TB RAID5 array based on nVIDIA's MediaShield.
Now, despite what I had selected during the BIOS update, the BIOS settings and DMI data were reset after reboot, which means that the HDDs were back to being individual IDE-emulated drives, rather than members of the RAID array.

Normally, this wouldn't be a big deal, except that, before I cancelled the Windows boot, it was apparently able to look at the disks (using the MediaShield driver), find out that the capacity of the disk it was booting from (now a single 1TB IDE/AHCI HDD) was less than the capacity reported in the partition table, and re-write the partition table of HDD1 to reduce the dimensions of the last partition.
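
To make that mismatch concrete, here is a rough Python sketch that parses the four primary entries of an MBR partition table and lists the ones that end past the end of the disk. It only illustrates the kind of check described above; I have no idea what Windows actually does internally. The function names (`parse_partitions`, `overflowing_partitions`) are made up for this example, and it assumes classic MBR partitioning with 512-byte sectors.

```python
import struct

TABLE_OFFSET = 446   # the partition table starts at byte 446 of the MBR
ENTRY_SIZE = 16      # four entries of 16 bytes each

def parse_partitions(mbr):
    """Return (index, type, first LBA, sector count) for each used entry."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    entries = []
    for i in range(4):
        # Layout: status, CHS start, type, CHS end, LBA start, sector count
        _status, _chs_first, ptype, _chs_last, lba_start, num_sectors = \
            struct.unpack_from("<B3sB3sII", mbr, TABLE_OFFSET + i * ENTRY_SIZE)
        if ptype != 0:  # type 0 means the slot is unused
            entries.append((i + 1, ptype, lba_start, num_sectors))
    return entries

def overflowing_partitions(mbr, disk_sectors):
    """Partitions whose last sector lies beyond the given disk size."""
    return [e for e in parse_partitions(mbr) if e[2] + e[3] > disk_sectors]
```

Fed the first sector of an array member and the sector count of a single 1TB drive, `overflowing_partitions` would flag a 1.5 TB data partition as too big for the disk, which is exactly the situation a lone drive ends up in once the RAID settings are gone.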

Of course, rewriting a partition table without anybody asking you to is the shortest way to screw up a disk or RAID array, and screw up it did: as soon as I restored the RAID settings in the BIOS and booted Windows, my 1.5 TB data partition was identified as unformatted and gone! Talk about massive data loss...

No respectable O/S should ever modify a partition table without asking the user first. It's just common sense: the O/S is never, and I have to stress this part, NOT EVER, smarter than its user (no matter what the O/S developers might think, or how smart they think they are themselves). You do not modify a partition table without asking, EVER, it's really as simple as that!

Now, after much cursing, and some accidental good luck, I found that if the first drive was disconnected from the RAID5 array (which happened accidentally as I was trying to swap HDD#2 and HDD#3, since it originally looked like the BIOS upgrade had modified the SATA IDs), the rest of the array booted fine, albeit in degraded mode, and saw the old 1.5 TB data partition alright. That is consistent with the fact that Windows would of course only have modified the partition table of the boot drive while the HDDs were in IDE mode.
But of course, as soon as you remove one drive from your RAID5 array, and boot in degraded mode, the array will flag that drive as failed on the next reboot.

From there on, the solution is to re-add the drive to the array to resync. It takes a while, but if you trust your other disks not to fail during the super-lengthy resync, it's probably the safest solution.
Otherwise, it's probably a good idea to have a copy of the Master Boot Record (i.e. the first 512 bytes) of every single drive from your array, and restore it using a decent O/S like Linux. Plus, as experience will show you time and again, it's also always good practice to keep a copy of the MBRs of all your disks that contain important data, so that you can try to address any kind of partition formatting catastrophe.
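
For what it's worth, saving and restoring those 512 bytes takes only a few lines of Python. This is a minimal sketch rather than a polished tool: `backup_mbr` and `restore_mbr` are names I made up, the device path in the usage note is only an example, and reading or writing a raw device needs root on Linux. Restoring overwrites the first sector, so triple-check the target before running it.

```python
MBR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"  # last two bytes of a valid MBR

def backup_mbr(device_path, backup_path):
    """Copy the first 512 bytes of a disk (or disk image) to a file."""
    with open(device_path, "rb") as dev:
        mbr = dev.read(MBR_SIZE)
    if len(mbr) != MBR_SIZE:
        raise ValueError("could not read a full 512-byte sector")
    if mbr[510:512] != BOOT_SIGNATURE:
        raise ValueError("no 0x55AA signature, this does not look like an MBR")
    with open(backup_path, "wb") as out:
        out.write(mbr)
    return mbr

def restore_mbr(backup_path, device_path):
    """Write a saved MBR back over the first sector. DESTRUCTIVE."""
    with open(backup_path, "rb") as src:
        mbr = src.read()
    if len(mbr) != MBR_SIZE or mbr[510:512] != BOOT_SIGNATURE:
        raise ValueError("backup file is not a valid 512-byte MBR")
    # "r+b" opens for in-place update, so the rest of the disk is untouched
    with open(device_path, "r+b") as dev:
        dev.seek(0)
        dev.write(mbr)
```

Something like `backup_mbr("/dev/sda", "sda-mbr.bin")`, run once per array member while everything is healthy, gives you the files to restore from. Keep them somewhere other than the array itself.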
