Well, a few days ago I mentioned a problem I had with ‘yawl’ involving a blown hard drive. Fun this wasn’t, and unfortunately I was so swamped with work that I didn’t really have a chance to work on the machine, so it sadly sat, turned off, while I wrestled with the vagaries of Java and EJB 3.0.
With some slack time this weekend, I set about seeing what I could recover from the smoking ruin that was the 20gig drive in the machine. Booting the machine revealed only ‘Grub loading’ then ‘Error 17’. Many folks on the net have said this is a blown bootloader, usually happening after a failed upgrade. I knew I hadn’t done any upgrade, so this had to be something more serious.
But what to do about it? Since I couldn’t boot the machine, it was time to go for a repair CD. Fortunately, I had some experience using the Sys Rescue CD, an open source toolset that fits on a CD (in fact it’ll fit on a flash drive) and contains most of the tools an admin will need to repair or maintain a system that has had Something Bad happen to it.
One burned CD later, I had the machine booted. cfdisk happily reported “You have a nice 20gig partition that’s empty! Want to install anything to it?” Not an auspicious start.
I could not mount the faulty partition, so really the only thing to do was to hand it over to fsck and mutter a few incantations.
fsck had a grand old time with the filesystem repair. First indications were good – it actually found the partition and said there were files on it, though one of the two superblocks was completely missing (Linux filesystems keep a primary superblock and at least one backup – sort of the ‘master directory’ for the partition – for just this reason). Without the backup superblock, the entire filesystem would have been gone. Phew.
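For the curious, here is a rough sketch of where those backup superblocks live. It assumes an ext2/ext3 filesystem created with the default sparse_super layout and the default of 8 × block size blocks per group; the post does not say which filesystem or block size ‘yawl’ used, so treat the numbers as illustrative. Under those assumptions, backups sit at the start of group 1 and of every group whose number is a power of 3, 5 or 7. The function name and the 20 GB / 4 KiB figures are made up for the example.

```python
def backup_superblock_blocks(total_blocks, block_size=4096):
    """Return block numbers of backup superblocks for an ext2/ext3
    filesystem, assuming the sparse_super layout and default group size."""
    blocks_per_group = 8 * block_size  # one bitmap block covers this many blocks
    # With 1 KiB blocks the boot block occupies block 0, so groups start at 1.
    first_data_block = 1 if block_size == 1024 else 0
    # Ceiling division: number of block groups on the filesystem.
    group_count = -(-(total_blocks - first_data_block) // blocks_per_group)

    groups = {1}
    for base in (3, 5, 7):
        g = base
        while g < group_count:
            groups.add(g)
            g *= base

    return [g * blocks_per_group + first_data_block
            for g in sorted(groups) if g < group_count]


if __name__ == "__main__":
    # Hypothetical 20 GB partition with 4 KiB blocks.
    total = 20 * 10**9 // 4096
    print(backup_superblock_blocks(total))
```

A block number from that list is the kind of value you would hand to `e2fsck -b` when the primary superblock is unreadable. On a real system, `mke2fs -n` on the device reports the actual locations without writing anything, and that is a safer source than this arithmetic.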
A good 20 minutes later, after much gnashing, queries about whether I wanted to fix the deallocated blocks, and other fun filesystem issues, I had a mounted, readable filesystem. The Sys Rescue CD is a fully functional single-user Linux environment, so I could mount, manipulate, and archive the newly repaired filesystem. I don’t trust the repaired install to run on its own – the damage touched just about every file that was open on the machine at the time (including things like kernel modules), so I doubt the machine is stable. But I could bring up the network interface and copy off my ~/docs/ directory, where I keep all my business documents. I had a backup of it, but it was quite old.
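The post does not say which tools did the copying, so the following is only a minimal sketch of the "archive it before you trust it" step, using Python’s standard tarfile module. The function name and the paths in the usage comment are hypothetical. It packs a directory into a compressed tarball, then reads the archive back to list what actually landed in it.

```python
import tarfile
from pathlib import Path


def archive_directory(source_dir, archive_path):
    """Pack source_dir into a gzip-compressed tar file and return the
    sorted list of regular files stored in the archive."""
    source = Path(source_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths relative (docs/...) instead of absolute.
        tar.add(str(source), arcname=source.name)

    # Re-open the archive to confirm what was written.
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(m.name for m in tar.getmembers() if m.isfile())


# Hypothetical usage from a rescue environment, with the repaired
# partition mounted at /mnt/rescue:
#   archive_directory("/mnt/rescue/home/me/docs", "/tmp/docs-backup.tar.gz")
```

Listing the archive afterwards is a cheap sanity check. It does not prove the file contents survived the damage, only that every file made it into the tarball.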
I feel a lot better now that I’ve gotten my important documents off the machine. The next step will be determining what to do with the box. I’ve already received a replacement 160gig drive I’ll be installing (nothing like an 8x space increase!), and I’d like to archive some ‘less critical, but still nice to have copies of’ files, but for now, I just barely ducked that bullet.
NB: interestingly, this is the only mildly catastrophic hard disk failure I’ve -ever- had. The only other recent failure I can think of was dropping poor hunter while at band practice. It twitched the drive, which I replaced. But I consider laptops to be ‘volatile’ environments, and everything was backed up – no loss. I suppose I should be knocking wood everywhere, but I prefer to think I’m careful enough and don’t do Stupid Things with my machines.
Or maybe this is pure hubris. I gotta go run my backups.