...or how I almost lost 1.5 TB of data today.
If you use software/fake RAID, this might happen to you too.
See, this morning I thought I would upgrade my HTPC's BIOS just for the heck of it. This is the PC that has a 2TB RAID5 array based on nVIDIA's MediaShield.
Now, despite what I had selected during the BIOS update, the BIOS settings and DMI data were reset after reboot, which means that the HDDs were back to individual IDE emulated drives, rather than members of the RAID array.
Normally, this wouldn't be a big deal, except that, before I cancelled the Windows boot, Windows was apparently able to look at the disks (using the MediaShield driver), find out that the capacity of the disk it was booting from (now a single 1TB IDE/AHCI HDD) was less than the capacity reported in the partition table, and rewrite the partition table of HDD1 to shrink the last partition.
Of course, rewriting a partition table without anybody asking you to is the shortest way to screw up a disk or RAID array, and screw up it did: as soon as I restored the RAID settings in the BIOS and booted Windows, my 1.5 TB data partition was identified as unformatted and gone! Talk about massive data loss...
No respectable O/S should ever modify a partition table without asking the user first. It's just common sense: the O/S is never, and I have to stress that part, NOT EVER, smarter than its user (no matter what the O/S developers might think, or how smart they think they are themselves). You do not modify a partition table without asking, EVER. It's really as simple as that!
Now, after much cursing, and some accidental good luck, I found that if the first drive was disconnected from the RAID5 array (which happened accidentally as I was trying to swap HDD#2 and HDD#3, since it originally looked like the BIOS upgrade had modified the SATA IDs), the rest of the array booted fine, albeit in degraded mode, and saw the old 1.5 TB data partition alright. That is consistent with Windows having modified only the partition table of the boot drive while the HDDs were in IDE mode.
But of course, as soon as you remove one drive from your RAID5 array, and boot in degraded mode, the array will flag that drive as failed on the next reboot.
From there on, the solution is to re-add the drive to the array and let it resync. It takes a while, but if you trust your other disks not to fail during the super-lengthy resync, it is probably the safest solution.
Otherwise, it's probably a good idea to have a copy of the Master Boot Record (i.e. the first 512 bytes) of every single drive from your array, and restore it using a decent O/S like Linux. Plus, as experience will show you time and again, it's also always good practice to keep a copy of the MBRs of all your disks that contain important data, so that you can try to address any kind of partition formatting catastrophe.
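For what it's worth, here is a minimal sketch of what saving and restoring an MBR could look like with dd. The function names and file paths are made up for illustration, and the restore helper only writes back bytes 446 to 511 (the four partition entries plus the 0x55AA signature), so that the boot code is left alone. Treat it as a starting point rather than a tested recovery procedure, and triple-check the target device before writing anything.

```shell
#!/bin/sh
# Hypothetical helpers: the names are illustrative, only dd itself is real.

# backup_mbr DISK FILE: copy the first 512 bytes of DISK into FILE
backup_mbr() {
    dd if="$1" of="$2" bs=512 count=1 2>/dev/null
}

# restore_partition_table FILE DISK: write back only the partition
# entries and boot signature (bytes 446-511), leaving the boot code as is
restore_partition_table() {
    dd if="$1" of="$2" bs=1 skip=446 seek=446 count=66 conv=notrunc 2>/dev/null
}
```

Something like `backup_mbr /dev/sda /mnt/usb/sda.mbr` for each member drive, run as root and stored on a different disk, would be the typical use.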
2009-01-15
This script will save your life!
Or at least your data.
I have been unofficially sysadm'ing a bunch of HP DL380s running Linux over the past few years, and over the Christmas period, one of those, which had a complex mix of ReiserFS, SWAP and XFS partitions on a RAID5 Smart Array device (/dev/cciss/c0d0), found nothing better than to start overwriting the MBR!
What this meant of course was, bye bye partition table, and, as you might guess, I didn't find the need to back up the partition data, thinking: "Heck it's RAID5 - what's the worst that can happen?"
ADVICE #1: If you're running a server in PROD, ALWAYS save a copy of the output of fdisk someplace safe!
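A rough sketch of how that could be automated, assuming fdisk and sfdisk are both installed and you are running as root. The function names and the destination directory are hypothetical. `fdisk -l` gives the human-readable listing, while `sfdisk -d` writes a dump that sfdisk itself can read back later.

```shell
#!/bin/sh
# Hypothetical helpers; only fdisk -l and sfdisk -d are real commands.

# backup_name DISK: turn a device path into a flat file name
backup_name() {
    echo "$1" | sed -e 's|^/||' -e 's|/|_|g'
}

# save_partition_table DISK DEST_DIR: store both listings under DEST_DIR
save_partition_table() {
    base="$2/$(backup_name "$1")"
    fdisk -l "$1" > "$base.fdisk.txt" &&
    sfdisk -d "$1" > "$base.sfdisk.dump"
}
```

For example, `save_partition_table /dev/cciss/c0d0 /mnt/backup` would produce `dev_cciss_c0d0.fdisk.txt` and `dev_cciss_c0d0.sfdisk.dump`. Obviously, the destination should not live on the disk you are describing.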
Thus, there I was, with a non-bootable server and a blank partition table, but with data I very much wanted to get back. And the only tool I had at my disposal (through a remote console connection, because of course, the server had to be remote) was just a Slackware boot CD (because it detects cciss drives) running busybox.
ADVICE #2: If you have physical access to your server, and it's using a standard disk device, you can probably find a Linux rescue CD with gpart (but better use the latest Debian version, which is more up to date), which supports more partition types and will do a much better job. The script below is really for when you are in a hurry or have limited resources.
How difficult can it be to detect partitions with only a shell script then? Well not that difficult, as the script below will prove. But first let me be clear about what the script is meant to achieve:
- This script is meant to detect the POTENTIAL beginning of a partition only. It will tell you the first cylinder, but it will not tell you the size, so unless it's the last one on disk, you'll have to figure out where the next partition begins as well
- This script will detect primary partitions only. If you have extended partitions, you're on your own
- The script only detects EXT2/EXT3, ReiserFS, Swap (version 2) and XFS. It does not detect FAT or NTFS (because I had no use for it). However, if you know the Magic string and location for other partition types, you should be able to easily modify the script to add them
- Not counting the comments, I kept the script short and basic, because you're most likely to have to type it in by hand, so shorter is better
- Be mindful that there are likely to be false positives
- And of course, while I successfully tested this script and all the partition types on various systems, I am not responsible for any damage occurring from using it.
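To illustrate the first point, here is a small sketch of how a matched cylinder number could be turned into a start sector, and how a size could be guessed from the next match. It assumes the common 255 heads / 63 sectors-per-track geometry and a first partition starting right after the first track (sector 63). The function names are made up, and the size is only right if the partition really runs up to the next one.

```shell
#!/bin/sh
# Assumed geometry: check what fdisk reports for your own disk.
heads=255
sectors_per_track=63

# cyl_to_sector CYL: first sector of a partition starting on cylinder CYL
cyl_to_sector() {
    if [ "$1" -eq 1 ]; then
        # The first partition starts after the first track
        echo "$sectors_per_track"
    else
        echo $(( ($1 - 1) * $heads * $sectors_per_track ))
    fi
}

# partition_sectors START_CYL NEXT_CYL: size in sectors, assuming the
# partition fills all the space up to the next detected partition
partition_sectors() {
    echo $(( $(cyl_to_sector "$2") - $(cyl_to_sector "$1") ))
}
```

With matches on cylinders 1 and 100, for instance, this would suggest a first partition starting at sector 63 and spanning 1590372 sectors.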
#!/bin/sh
#
# gpart.sh v1.1 - Linux partitions detection script
# Copyright (C) 2009 >NIL:
# Based in part on gpart (C) 1999-2001 Michail Brzitwa et al.
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
#
## Set your disk parameters below
#
device="/dev/sda"
cyl_start=1
# To find your max cyl_end, use fdisk or cfdisk
cyl_end=35697
#
## FS Magic length, string and offset
## Uncomment only one of the sections below
#
## XFS
#bs=4
#magic="XFSB"
#magic_offset=0x0
## ReiserFS
#bs=4
#magic="ReIs"
#magic_offset=0x10034
## Swap space (v2 only!)
#bs=4
#magic="ACE2"
#magic_offset=0x0FFC
## Ext2 / Ext3
bs=2
## OK, here I have to curse the EXT FileSystem devs for
## not choosing a *PROPER* ASCII Magic like everyone else,
## but using 0x53 0xEF instead as this contains i-umlaut
## Thank God for the -e option of echo, which translates
## a "\0###" sequence into the character with that octal code
## (if your echo lacks -e, printf 'S\357' gives the same two bytes)
magic=`echo -ne 'S\0357'`
magic_offset=0x0438
# on almost any recent disk, a cylinder is
# 255(heads)*63(sectors/track)*512(bytes/sector) = 8225280
# If you're not sure, check what fdisk tells you
cyl_bytes=8225280
mbr_bytes=32256
#
## You shouldn't have to modify anything below this
#
# dd skips in multiples of bs, so we need to compute
# the cylinder size and magic_offset in bs blocks
cyl_blocks=$(($cyl_bytes / $bs))
mbr_blocks=$(($mbr_bytes / $bs))
magic_blocks=$(($magic_offset / $bs))
for i in $(seq $cyl_start $cyl_end); do
    # Use a numeric test: "==" is not valid in a plain sh [ ] test
    if [ "$i" -eq 1 ]; then
        # For the first cylinder, we need to skip the MBR as well
        skip=$(($magic_blocks + $mbr_blocks))
    else
        skip=$(( ($i - 1) * $cyl_blocks + $magic_blocks ))
    fi
    # Look for the magic block
    header=`dd if=$device bs=$bs count=1 skip=$skip 2>/dev/null`
    if [ "$header" = "$magic" ]; then
        echo "MATCH: cylinder $i ($header)"
    fi
done
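If you'd rather see the detection idea at work before pointing anything at a real disk, the sketch below tries it on a scratch file. It plants the XFS magic at the start of cylinder 2 and then looks for it with the same bs=4 block arithmetic as above. The helper names and the scratch file are made up, and the probe deliberately ignores the extra first-track offset that applies to cylinder 1.

```shell
#!/bin/sh
# Scratch-file dry run; helper names are illustrative only.
cyl_bytes=8225280

# make_scratch FILE: create FILE with "XFSB" at the start of cylinder 2
make_scratch() {
    printf 'XFSB' | dd of="$1" bs=1 seek="$cyl_bytes" conv=notrunc 2>/dev/null
}

# probe FILE CYL: succeed if the XFS magic sits at the start of cylinder CYL
# (valid for cylinders above 1, where no first-track offset is needed)
probe() {
    skip=$(( ($2 - 1) * $cyl_bytes / 4 ))
    [ "$(dd if="$1" bs=4 count=1 skip="$skip" 2>/dev/null)" = "XFSB" ]
}
```

After `make_scratch /tmp/scratch.img`, running `probe /tmp/scratch.img 2 && echo "MATCH: cylinder 2"` should report the match, while probing cylinder 3 should stay silent.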