2006-11-02

The Joy of Backups: dirvish, why it's beautiful, and why that once again seems to mean that I can't have it.

Trying to have good backups on Mac OS X is not easy, as you can read here.

Note that the best of the breed on that page, SuperDuper!, is a filesystem-oriented tool. That means that it can be used to back up full volumes only. Its incremental mode does not save "changes since...", but rather updates the current backup to the most recent state. That means that you can't restore to a point in time before a known corruption once that corruption has made it into the backup. That might be good enough for some home users, and it's in fact great for staging and deployment or software testing in organisations, but it doesn't fit my needs.

Also, for me it should be a little easier anyway, or so the theory goes, because I don't much care for preservation of resource forks, which really are the only metadata not found at all in other Unices. In practice, I might run into some inconveniences upon restore if I don't have the resource forks, but not lose a lot of important data. "Or so the theory goes" being the operative part of the sentence.

The tool I finally arrived at was dirvish, a disk-oriented backup system that
a) re-creates the directory structure of the path that should be backed up (I'll call this the source) in a directory that will be used for this backup run (I'll call this the target), which resides in a vault (a directory which holds all backups for a specific source) inside a bank (a directory that holds a collection of vaults)
b) then inspects each file in the source. If it hasn't changed since the last backup in the same vault, it'll be hard-linked to the current target, if it's been deleted it won't be created in the current target, if it's been newly created, it'll be copied from source to target.
c) Now here comes the kicker: If a previously existing file was changed, it'll be copied back from the previous backup in the same vault, and then synchronized using rsync, which means (almost) only the actual changes will be transferred over the network. If you use it locally, the changed files will simply be copied, but you still have a directory in which you can "cd back in time" (as advertised their snapshot feature with similar capabilities).
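
For the code-minded, steps a) to c) boil down to something like the following Python sketch. This is not how dirvish is implemented (it leaves the real work to rsync); the function names `snapshot` and `unchanged` are made up for illustration, and treating "same size and same modification time" as "unchanged" is my assumption. Symlinks and special files are ignored entirely.

```python
import os
import shutil


def unchanged(a, b):
    """Assumption: equal size and equal mtime (to the second) means 'not changed'."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def snapshot(source, previous, target):
    """Build `target` as a full-looking copy of `source`.

    Files that are unchanged relative to `previous` (the last snapshot in the
    same vault, or None for the first run) are hard-linked instead of copied.
    """
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        # a) re-create the directory structure in the target
        os.makedirs(os.path.normpath(os.path.join(target, rel)), exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            new = os.path.normpath(os.path.join(target, rel, name))
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old) and unchanged(src, old):
                # b) unchanged: another name for the same data, no extra space
                os.link(old, new)
            else:
                # new or changed: copy data plus timestamps
                shutil.copy2(src, new)
        # Deleted files need no handling: they are never visited in `source`.
```

Calling `snapshot(source, None, first_target)` once and `snapshot(source, first_target, second_target)` a day later gives two directories that each look like a full backup, while unchanged files occupy disk space only once.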

If you don't think that's really cool, you need to stop right here; it won't get any less geeky today.

If everything works right, dirvish means that if you start out with 40GB of data, and have no more than 2GB of additive change (that means the full new size of changed files, and new files) on average per day, you can store backups for 131 = 1 (* 40GB) + 130 (* 2 GB) days on a 300GB hard disk. All just a chdir()'s throw away. Dirvish is a beautiful thing indeed.
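
If you want to play with the numbers, the back-of-the-envelope calculation above fits in a few lines of Python. The function name `retention_days` is made up, and the model is deliberately crude: one full copy on day one, then a fixed amount of additive change per day, with no filesystem overhead.

```python
def retention_days(disk_gb, initial_gb, daily_change_gb):
    """Days of history that fit on the disk: one full backup plus
    one increment for every further day."""
    if initial_gb > disk_gb:
        return 0  # not even the first backup fits
    return 1 + int((disk_gb - initial_gb) // daily_change_gb)


print(retention_days(300, 40, 2))  # 131
```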

Now, if you read the stuff above carefully, you might notice that dirvish needs hard links. I talked about that. What, precisely, I talked about is that HFS+ doesn't do hard links, it just fakes them. Badly. Which is kind of a problem for something that calls itself Unix. As is widely understood outside Sunnyvale/Cupertino, Unix can't really do without hard links, and in fact Mac OS X comes with a lot of what I'll call "faux links" out of the box.

For dirvish, it means that all backups are full backups, which somewhat defeats the whole purpose of the exercise.
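
One rough way to see how much two backup runs actually share is to count the files that are the same file under both names. The sketch below is mine, not part of dirvish, and `shared_fraction` is a made-up name; it relies on `os.path.samefile`, which compares device and inode numbers. Whether identical inode numbers also mean shared storage on a given filesystem is an assumption, so comparing disk usage before and after a run is the more honest test.

```python
import os


def shared_fraction(previous, current):
    """Fraction of files in `current` that are hard links to the file at
    the same relative path in `previous`."""
    total = shared = 0
    for dirpath, _dirnames, filenames in os.walk(current):
        rel = os.path.relpath(dirpath, current)
        for name in filenames:
            total += 1
            old = os.path.join(previous, rel, name)
            new = os.path.join(dirpath, name)
            if os.path.isfile(old) and os.path.samefile(old, new):
                shared += 1
    return shared / total if total else 0.0
```

A value near 1.0 between two consecutive runs is what you'd hope for on a quiet day; a value of 0.0 means the new run shares nothing with the old one, i.e. it is a full backup.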

No problem, you can always use UFS (which can fake resource forks without a problem using simple dot-files); that one does hard links all right. I formatted my external hard drive where my dirvish bank resides with UFS, and voila: Suddenly my backups worked. For a few days.

Then I started encountering kernel panics. Those are the nice grey screens where your Mac tells you that it made a boo-boo right in its underoos and won't you please hold the power button until it sighs a little (really, the sound of the DVD drive on switch-off is creepy), and then switch it on again and hope your data is still there. No kidding: with FileVault, each panic is a new adventure.

Basically, it's what a blue screen is for Windows (both in severity and, lately for me, frequency), only it looks better.

When I retried the backup, dirvish fell over with I/O errors. I fsck'd the external drive; all seemed fine. Still I/O errors. Then I wanted to copy the FileVault to the external drive to "hdiutil convert" it back to the internal one (which generally fixes filesystem problems). That's where I found out that UFS on Mac only supports files of up to 4GB in size. Yay, welcome to the previous century.

I found out how to fix filevaults in-place (which I'll farm out to another post as it may interest people who don't like to read my rambling rants (you know Grampa Simpson? "We had onions on our belts, as was the fashion of the day"...)), and did it. Then I could do one backup in peace. During today's, kernel panic.

As of now, I'm without a good backup, and this whole platform is starting to piss me off.

Does anybody here know whether Ubuntu 6.06 includes decent support for the previous (2.16GHz Core Duo) MacBook Pros:
a) Regarding their graphics cards, including 3D
b) Power management, including suspend to HDD and RAM (preferably at the same time as 3D support)
c) If so, can I have it like Mac OS does it: Suspend to HDD, then to RAM - if you wake up while there's still power, you wake up from RAM, else you come up from the HDD.
d) Sounds, with multiple things playing at the same time (e.g., you hear Skype ring when you listen to music, oh and Skype should work in the first place)
e) Some music player with support for the iPod and Podcasts. MP3 is enough, I don't care about that OGG bullshit nobody uses or the DRM-infested crap that the ITMS sells.
f) All the rest of the hardware in the MBP
g) Especially the weird widescreen resolution that the display provides.

I'm not unhappy with the day-to-day functioning of my Mac per se, and it's true that it works pretty amazingly as long as you do everything their way, but right now, their way is not being able to make good incremental backups, and that I can't tolerate. Time Machine my ass; if it's going to be implemented by the people who brought us HFS, I might just as well delete my data myself and be done with it.