How BackInTime works

In Version control on May 19, 2010 by Matt Giuca

I’ve just become a fan of the BackInTime backup utility for Linux. I previously rolled my own backup utility which used rsync, but it didn’t do incremental backups and I’ve become increasingly worried about overwriting my only good backup with a dud. I’ll only trust such a tool if I understand how it works, and BackInTime is extremely simple. I’ve briefly perused the source code but mostly figured out how it works by reading the command-line output and examining the directory structure it creates. So here’s a brief rundown of how it works, for technical users:

  1. Test whether the drive has changed since the last snapshot (using rsync), and if not, quit.
  2. cp -al <previous snapshot> <new snapshot>. This creates a complete recursive copy, but every single file is hard-linked to the old snapshot.
  3. rsync -aEAX --delete-excluded --delete --chmod=Fa-w,D+w <current directory> <new snapshot>. This copies every modified file over the top of the existing hard link (replacing the hard link with a new file), and sets it to read-only.
  4. Create a simple CSV file fileinfo.bz2 which contains each file’s original permission bits, owner and group.
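
Steps 2 and 3 can be sketched as a small shell function. This is a minimal illustration rather than BackInTime's actual code: the name `take_snapshot` and its argument order are made up for this example, the `-EAX` and `--delete-excluded` flags are left out for brevity, and step 1 (the change test) and step 4 (`fileinfo.bz2`) are not shown.

```shell
# Hypothetical helper for illustration; not BackInTime's actual code.
# Usage: take_snapshot SOURCE_DIR PREVIOUS_SNAPSHOT NEW_SNAPSHOT
# Pass "" as PREVIOUS_SNAPSHOT when taking the very first snapshot.
take_snapshot() {
    src=$1
    prev=$2
    new=$3

    if [ -n "$prev" ] && [ -d "$prev" ]; then
        # Step 2: recursive copy in which every file is a hard link to the
        # corresponding file in the previous snapshot (GNU cp).
        cp -al "$prev" "$new"
    else
        mkdir -p "$new"
    fi

    # Step 3: sync the live directory into the new snapshot. rsync writes each
    # changed file to a temporary name and renames it into place, so the old
    # snapshot keeps its own copy. Files become read-only, directories stay
    # writable.
    rsync -a --delete --chmod=Fa-w,D+w "$src/" "$new/"
}
```

After two runs, `ls -li` on both snapshots should show unchanged files sharing an inode number, while modified files have different ones.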

That’s it. The important thing is I didn’t want my backup hidden away in a database or other opaque structure; I just want it to be files. Of course, the hard-linking makes the backups inexpensive, as only the modified files take up additional space in each snapshot.

You can restore a BackInTime backup, if you have to, just by copying the files over. However, they will be read-only, so it’s preferable to use BackInTime’s restore function (sadly and confusingly, this isn’t available from the command line, only the GUI). You can restore individual files or whole directories. The restore process is just:

  1. If the original file already exists, rename it to <filename>.backup.<date>.
  2. rsync the file over to the original location.
  3. Look up the mode/owner/group in fileinfo.bz2 and set it on the file.
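
As a rough sketch of those three steps for a single file: the function name `restore_file`, the date format in the `.backup.` suffix, and passing the mode as an argument (instead of parsing `fileinfo.bz2`, whose exact layout is not covered here) are all assumptions made for illustration. It also substitutes `cp -p` for rsync to avoid the dependency, and skips owner/group, which normally needs root.

```shell
# Hypothetical helper for illustration; not BackInTime's actual restore code.
# Usage: restore_file SNAPSHOT_FILE TARGET_PATH MODE
restore_file() {
    snap_file=$1
    target=$2
    mode=$3

    # Step 1: keep any existing file under a dated name (suffix format assumed).
    if [ -e "$target" ]; then
        mv "$target" "$target.backup.$(date +%Y%m%d)"
    fi

    # Step 2: copy the file out of the snapshot (BackInTime uses rsync here).
    cp -p "$snap_file" "$target"

    # Step 3: the snapshot copy is read-only, so put the original mode back.
    chmod "$mode" "$target"
}
```

For example, `restore_file /path/to/snapshot/notes.txt ~/notes.txt 640` would leave the previous `~/notes.txt` next to the restored one under a `.backup.<date>` name.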

My only concern with this is that if you restore the entire backup over an existing copy, you will end up with .backup.<date> files all over the place, which would be hell to clean up. Also the “Restore” button has no prompt :/

Features also include:

  • Command-line, Gnome or KDE4 GUI support (though as mentioned, restoration is GUI-only).
  • Browse any snapshot from the GUI or in a regular file browser.
  • Graphical diff between any files across snapshots, via Meld.
  • Optionally auto-remove snapshots according to some very smart rules: “Keep all snapshots from today and yesterday; keep one snapshot for the last week and one for two weeks ago; keep one snapshot per month for all previous months of the year; keep one snapshot per year for all previous years; delete the rest.” This seems like a perfectly fine “logarithmic backup” rule, which I have turned on (it is off by default, which I like).
  • Ability to set rsync “exclude” rules by regex.
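
The auto-remove rule quoted above can be read as a bucketing scheme: each snapshot falls into a bucket by age, and only the newest snapshot in each bucket survives. The sketch below is one possible reading, not BackInTime's implementation. The function name `select_keep`, the day-granular `YYYY-MM-DD` snapshot names, the exact bucket boundaries, and the choice to keep the newest snapshot per bucket are all assumptions, and it relies on GNU `date -d`.

```shell
# Hypothetical sketch of the quoted rule; not BackInTime's actual code.
# Usage: select_keep TODAY DATE...   (dates as YYYY-MM-DD; needs GNU date)
# Prints the snapshot dates to keep, newest first.
select_keep() {
    today=$1
    shift
    today_s=$(date -u -d "$today" +%s)
    seen=" "   # space-separated list of buckets that already have a snapshot

    for d in $(printf '%s\n' "$@" | sort -r); do
        # Age of this snapshot in whole days.
        age=$(( (today_s - $(date -u -d "$d" +%s)) / 86400 ))

        if   [ "$age" -le 1 ];  then key=""        # today and yesterday: keep all
        elif [ "$age" -le 7 ];  then key="week1"   # one for the last week
        elif [ "$age" -le 14 ]; then key="week2"   # one for two weeks ago
        elif [ "${d%%-*}" = "${today%%-*}" ]; then
            key="month:${d%-*}"                    # one per month of this year
        else
            key="year:${d%%-*}"                    # one per earlier year
        fi

        if [ -n "$key" ]; then
            case "$seen" in
                *" $key "*) continue ;;   # bucket already filled by a newer snapshot
            esac
            seen="$seen$key "
        fi
        echo "$d"
    done
}
```

Anything passed in but not printed would be a candidate for deletion, which is what makes the retained set thin out roughly logarithmically with age.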

OK guys, I am sold.

6 Responses to “How BackInTime works”

  1. So I guess the features I would like to see are:
    – Restore button prompts before restoration.
    – The prompt includes a checkbox (ticked by default) “make a copy of any file which is restored”.
    – Restore works on the command-line.

  2. Interesting. I just found that rsync has a --link-dest=DIR option which checks each file against DIR, and if it hasn’t changed, makes a hardlink with the existing file.

    This seems to replicate the core of BackInTime’s behaviour, so I don’t see why it needs to do a cp -al followed by an rsync (why not just do an rsync --link-dest?)

    Also this “step 1” (check if the folder has changed since the last backup) is really killing me … it takes about 30 minutes on my entire drive, and it’s a complete waste of time because there is 0% chance that my drive won’t have changed (at least my Bash history and Firefox cache will change every day). I wish there was some way to turn it off.
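
For comparison, the `--link-dest` approach raised in this comment might look like the sketch below. It is an illustration only, not what BackInTime does: `snapshot_link_dest` and its arguments are hypothetical names, and the `--chmod` option from step 3 is left out to keep the attribute comparison simple.

```shell
# Hypothetical helper for illustration; not BackInTime's actual code.
# Usage: snapshot_link_dest SOURCE_DIR PREVIOUS_SNAPSHOT NEW_SNAPSHOT
# Use absolute paths: a relative --link-dest is resolved against the destination.
snapshot_link_dest() {
    src=$1
    prev=$2
    new=$3

    mkdir -p "$new"
    if [ -n "$prev" ]; then
        # Unchanged files become hard links into $prev; changed files are copied.
        rsync -a --delete --link-dest="$prev" "$src/" "$new/"
    else
        # First snapshot: nothing to link against yet.
        rsync -a --delete "$src/" "$new/"
    fi
}
```

This does in one pass what `cp -al` followed by rsync does in two, since the new snapshot directory starts empty and rsync decides per file whether to link or copy.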

  3. Eliminating the *.backup.* files is actually trivial – not “hell to clean up”:

    find . -name "*.backup.*" -print0 | xargs -0 rm -f

    Just make sure you want to delete all of the files before you do this. To see these files instead, do:

    find . -name "*.backup.*" | less

    On the other hand, I’m not able to get backintime to recognize old snapshots. This is critical (in terms of backintime operation) as it isn’t possible to restore without recognizing old snapshots.

    • True, but I’m quite uncomfortable running a complicated command including “rm -f” over my entire drive. Even if there is (by definition) a backup. It would be best if BIT didn’t create those files at all (if told not to). The Restore feature *should* prompt, and on the prompt screen, there should be a check box “back up existing files” which is checked by default (to preserve existing behaviour).

      What do you mean recognize old snapshots? Old ones that it created itself, or ones that you created before that? If it can’t see its own snapshots, then it’s kind of useless (and it must be a bug). If you’re talking about backups you made before using BIT, then yes. It won’t be able to recognise them, because BIT snapshots contain that fileinfo.bz2 file.

  4. I like backintime, too. But there is a very important feature missing.

    Among other things, we do backups because a hardware failure may destroy our primary data location. If backups are stored over a long period of time, they could become corrupted due to hardware failure. Most likely this corruption remains unnoticed until you need the data most: when your primary data location is dead. If you think about it, this becomes a very important point.

    So what is missing in backintime/rsync is a method to record hashes for each file which is stored on your backup drive. The backup data needs to be checked against those hashes regularly so that backup corruption does not go unnoticed.

    To my knowledge, rsync does check if transmitted files are stored correctly using hashes. It should store those hashes along with these files to allow for regular verification.
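
The hash-recording idea in this comment can be approximated outside BackInTime with standard tools. The following is a minimal sketch under stated assumptions: `write_manifest` and `verify_manifest` are hypothetical helper names, SHA-256 is an arbitrary choice of hash, and the manifest is assumed to live outside the snapshot directory so it does not get hashed itself.

```shell
# Hypothetical helpers for illustration; not part of BackInTime or rsync.

# Record a SHA-256 hash for every regular file in a snapshot.
# Usage: write_manifest SNAPSHOT_DIR MANIFEST_FILE
write_manifest() {
    snapshot=$1
    manifest=$2
    ( cd "$snapshot" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$manifest"
}

# Re-hash the snapshot and compare against the manifest.
# Returns non-zero if any file is missing or its contents have changed.
# Usage: verify_manifest SNAPSHOT_DIR MANIFEST_FILE
verify_manifest() {
    snapshot=$1
    manifest=$2
    ( cd "$snapshot" && sha256sum -c - ) < "$manifest"
}
```

Running `verify_manifest` periodically (from cron, say) would flag silent corruption on the backup drive while the primary copy is still around to re-snapshot from.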
