Archive for the ‘Version control’ Category

Articles

How BackInTime works

In Version control on May 19, 2010 by Matt Giuca

I’ve just become a fan of the BackInTime backup utility for Linux. I previously rolled my own backup utility which used rsync, but it didn’t do incremental backups and I’ve become increasingly worried about overwriting my only good backup with a dud. I’ll only trust such a tool if I understand how it works, and BackInTime is extremely simple. I’ve briefly perused the source code but mostly figured out how it works by reading the command-line output and examining the directory structure it creates. So here’s a brief rundown of how it works, for technical users:

  1. Tests whether the drive has changed since the last snapshot (using rsync), and if not, quits.
  2. cp -al <previous snapshot> <new snapshot>. This creates a complete recursive copy, but every single file is hard-linked to the old snapshot.
  3. rsync -aEAX --delete-excluded --delete --chmod=Fa-w,D+w <current directory> <new snapshot>. This copies every modified file over the top of the existing hardlink (deleting the hardlink and creating a new file), and sets it to read-only.
  4. Create a simple CSV file fileinfo.bz2 which contains each file’s original permission bits, owner and group.

That’s it. The important thing is I didn’t want my backup hidden away in a database or other opaque structure; I just want it to be files. Of course, the hard-linking makes the backups inexpensive, as only the modified files take up additional space in each snapshot.
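To make steps 2 and 3 concrete, here is a minimal Python sketch of the hard-link trick. It is not BackInTime’s code: take_snapshot and demo are hypothetical names, the rsync call is replaced by a plain content comparison, and ownership, ACLs, symlinks and removed directories are ignored.

```python
import filecmp
import os
import shutil
import tempfile


def take_snapshot(source, previous, new):
    """Build `new` from hard links to `previous`, then bring it in line with `source`."""
    # Step 2 (like `cp -al`): copy the tree, hard-linking each file instead of copying it.
    shutil.copytree(previous, new, copy_function=os.link)

    # Step 3a (like rsync --delete): remove files that no longer exist in the source.
    for root, _dirs, files in os.walk(new):
        for name in files:
            snap_file = os.path.join(root, name)
            rel = os.path.relpath(snap_file, new)
            if not os.path.exists(os.path.join(source, rel)):
                os.remove(snap_file)

    # Step 3b: copy new or modified files, replacing the hard link with a fresh file.
    for root, _dirs, files in os.walk(source):
        rel_dir = os.path.relpath(root, source)
        dst_dir = os.path.normpath(os.path.join(new, rel_dir))
        os.makedirs(dst_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(dst_dir, name)
            if os.path.exists(dst):
                if filecmp.cmp(src, dst, shallow=False):
                    continue  # unchanged: the hard link to the old snapshot stays
                os.remove(dst)  # unlink first so the old snapshot keeps its content
            shutil.copy2(src, dst)
            os.chmod(dst, os.stat(dst).st_mode & ~0o222)  # like --chmod=Fa-w


def demo():
    """Snapshot a two-file tree twice and report which files share storage."""
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "source")
        os.mkdir(source)
        for name, text in [("a.txt", "one"), ("b.txt", "two")]:
            with open(os.path.join(source, name), "w") as f:
                f.write(text)
        snap1 = os.path.join(tmp, "snap1")
        shutil.copytree(source, snap1)  # the first snapshot is a plain copy
        with open(os.path.join(source, "a.txt"), "w") as f:
            f.write("changed")
        snap2 = os.path.join(tmp, "snap2")
        take_snapshot(source, snap1, snap2)
        unchanged_shared = os.path.samefile(
            os.path.join(snap1, "b.txt"), os.path.join(snap2, "b.txt"))
        changed_shared = os.path.samefile(
            os.path.join(snap1, "a.txt"), os.path.join(snap2, "a.txt"))
        with open(os.path.join(snap1, "a.txt")) as f:
            old_content = f.read()
        return unchanged_shared, changed_shared, old_content
```

In demo, the unchanged b.txt is one file on disk with two names, while the modified a.txt gets its own copy in the new snapshot and the old snapshot still holds the original text. That is why each extra snapshot only costs the space of the files that changed.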

You can restore a BackInTime backup, if you have to, just by copying the files over — however, they will be read-only, so it’s preferable if you use BackInTime’s restore function (sadly and confusingly this isn’t available from the command-line, only the GUI).  You can restore individual files or whole directories. The restore process is just:

  1. If the original file already exists, rename it to <filename>.backup.<date>.
  2. rsync the file over to the original location.
  3. Look up the mode/owner/group in fileinfo.bz2 and set it on the file.

My only concern with this is that if you restore the entire backup over an existing copy, you will end up with .backup.<date> files all over the place, which would be hell to clean up. Also the “Restore” button has no prompt :/
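The three restore steps can be sketched in a few lines of Python. This is only an illustration under assumptions: restore_file and demo are hypothetical names, the date format in the backup suffix is a guess, the mode is passed in directly rather than read from fileinfo.bz2, and owner/group are skipped because changing them normally needs root.

```python
import os
import shutil
import tempfile
import time


def restore_file(snapshot_file, target, mode):
    """Restore one file, keeping any existing copy as <target>.backup.<date>."""
    backup = None
    if os.path.exists(target):
        # Step 1: move the current file aside (date format is an assumption).
        backup = "%s.backup.%s" % (target, time.strftime("%Y%m%d"))
        os.rename(target, backup)
    # Step 2: copy the snapshot file back (stand-in for the rsync call).
    shutil.copy2(snapshot_file, target)
    # Step 3: reapply the original permission bits, as recorded at backup time.
    os.chmod(target, mode)
    return backup


def demo():
    """Restore a read-only snapshot file over a newer working copy."""
    with tempfile.TemporaryDirectory() as tmp:
        snapshot_file = os.path.join(tmp, "snapshot.txt")
        target = os.path.join(tmp, "notes.txt")
        with open(snapshot_file, "w") as f:
            f.write("old")
        os.chmod(snapshot_file, 0o444)  # snapshot files are stored read-only
        with open(target, "w") as f:
            f.write("new")
        backup = restore_file(snapshot_file, target, 0o644)
        with open(target) as f:
            restored = f.read()
        with open(backup) as f:
            kept = f.read()
        return restored, kept, os.stat(target).st_mode & 0o777
```

Step 3 is the reason a plain copy is not quite enough: without it the restored file would keep the read-only bits of the snapshot.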

Features also include:

  • Command-line, Gnome or KDE4 GUI support (though as mentioned, restoration is GUI-only).
  • Browse any snapshot from the GUI or in a regular file browser.
  • Graphical diff between any files across snapshots, via Meld.
  • Optionally auto-remove snapshots according to some very smart rules: “Keep all snapshots from today and yesterday; keep one snapshot for the last week and one for two weeks ago; keep one snapshot per month for all previous months of the year; keep one snapshot per year for all previous years; delete the rest.” This seems like a perfectly fine “logarithmic backup” rule, which I have turned on (it is off by default, which I like).
  • Ability to set rsync “exclude” rules by regex.
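The auto-remove rule quoted above is easy to express as code. The following Python sketch is my own reading of it, not BackInTime’s implementation: snapshots_to_keep is a hypothetical name, weeks are assumed to be calendar weeks starting on Monday, the newest snapshot in each bucket is the one kept, and older snapshots from the current month are treated like any other month.

```python
from datetime import datetime, timedelta


def snapshots_to_keep(snapshots, now):
    """Return the set of snapshot datetimes that survive the auto-remove rule."""
    today = now.date()
    yesterday = today - timedelta(days=1)
    this_monday = today - timedelta(days=today.weekday())
    last_monday = this_monday - timedelta(days=7)
    prior_monday = this_monday - timedelta(days=14)

    keep = set()
    newest_in_bucket = {}  # bucket label -> newest snapshot seen so far
    for snap in sorted(snapshots):
        day = snap.date()
        if day >= yesterday:
            keep.add(snap)  # today and yesterday: keep everything
            continue
        if last_monday <= day < this_monday:
            bucket = "last week"
        elif prior_monday <= day < last_monday:
            bucket = "two weeks ago"
        elif day.year == today.year:
            bucket = (day.year, day.month)  # one per month of this year
        else:
            bucket = day.year  # one per earlier year
        # Snapshots are visited oldest first, so later ones overwrite earlier ones.
        newest_in_bucket[bucket] = snap
    return keep | set(newest_in_bucket.values())
```

Anything not in the returned set would be deleted. The buckets get coarser the further back you go, which is what makes the rule feel “logarithmic”.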

OK guys, I am sold.



Why Git Ain’t Better Than X

In Version control on March 26, 2010 by Matt Giuca

I’ve been aware of the website Why Git is Better Than X for some time, and it’s always irritated me. This is my rebuttal.

Firstly, some context. I’m an avid Bazaar user. I say this upfront because when I talk about revision control, I’m always biased towards Bazaar. Having said that, since the website, made by Scott Chacon, claims to exist because “I seem to be spending a lot of time lately defending Gitsters against charges of fanboyism, bandwagonism and koolaid-thirst,” I feel it’s necessary to expose this website’s fanboyism, bandwagonism and koolaid-thirst. I’ll try to be objective, but I think this website does a great deal of damage to Bazaar’s reputation so I want to challenge it.

Here’s the deal: Distributed version control systems (DVCSes) are awesome. We all know that, and if you disagree, you’re living in the 90s. Why Git is Better Than X (WGBTX) makes a very good case for DVCSes, comparing the author’s favourite DVCS, Git, against the previous decade’s champion VCS, Subversion. Git works very differently to Subversion, and the site does a good job highlighting the differences. There’s also Perforce, which I don’t know much about, but I gather it’s a crappy proprietary centralised VCS which is worse than Subversion in pretty much every way. [Edit: I got a lot of heat for the Perforce comments. I admit, I know nothing about it so disregard my comments.] So WGBTX does a lot of Perforce bashing. Unfortunately, the two other major DVCSes, Bazaar (bzr) and Mercurial (hg) get a lot of heat too, and in my opinion, for no good reason other than that they don’t behave exactly like Git. (Perhaps the site should be titled “Why Git is more like Git than things that are not Git.”)

So WGBTX isn’t an entirely bad site. It makes a good case for DVCSes. I just think that its comparisons between Git and Bzr/Hg have almost no merit, and therefore the site should be called “Why distributed version control is better than centralised.” Not quite so catchy, though.

So here we go, a point-by-point rebuttal.

Cheap local branching

This is my biggest complaint. The website claims that Git and only Git has “cheap local branching”, while Bazaar, Mercurial, Subversion and Perforce do not. The fact is that cheap local branching is part of being a DVCS, and they all have ways of doing it. They just aren’t the same as Git’s. So I’ll assume that Scott was unaware of Bazaar’s “shared repository” feature (ignorance rather than malice).

In Git, the basic unit of revision control is the “repository”. When you do a git clone, you clone all or part of a remote repository (you get some or all of its branches). Branches live inside a repository. When you do a git branch within the repository, it’s very cheap because all of the branches inside a repo share their revision history. That’s all well and good, but it isn’t the only way to do it.

In Bazaar, the basic unit of revision control is the “branch”. When you do a bzr branch, you clone a single branch. Projects on Launchpad aren’t repositories, they’re branches (related branches are organised on Launchpad into projects, while on my personal computer, I organise them by directory — that’s outside of the VCS). The problem is that these “branches” don’t share data, so making a bzr branch, even on the same file system, copies all of the branch history. The solution in Bazaar is to create a “shared repository”. It’s very simple — bzr init-repo <dir> makes <dir> into a shared repository. Any branches created in a descendent of that directory share revision history with one another. I like this because Bazaar has no conceptual repository, it’s just a transparent cache. So I don’t need to mentally deal with repositories vs branches, I just store my large projects in a shared repo, and it’s all good.

So Bazaar does not have cheap local branching by default, but it’s very easy to achieve. Effectively, the only difference is that Git forces a shared repository on me for every branch, even if it’s tiny. I prefer Bazaar’s approach because I don’t want shared repositories by default, only on large projects where I am likely to have a lot of branches. On my PC, I probably have around 50 Bazaar branches for tiny projects. I have two major projects which have a shared repository each, and a number of branches under that.

[Edit: Jakub points out that the cost of the repository is only one factor — there’s also the cost of the working tree, since every Bazaar branch has a separate working tree as well. I explain in the comments below how to work around this in Bazaar, but I admit it would be nicer if Bazaar had a simpler way of handling this.]

I haven’t used Mercurial, but this page indicates that the basic “clone” command does a full expensive branch, while the “branch” command does a cheap local branch. So this argument only applies to non-distributed VCSes.

Everything is local

No issue — only an argument against non-distributed VCSes.

Git is fast

Yes it is, but... is this even worth mentioning? Local operations in any VCS are practically instantaneous. In my own experiments, I found that Git was about 10 times faster than Bazaar, with 10 times in this case meaning 0.02 seconds versus 0.2 seconds. I’m perfectly happy with a 0.2 second local commit time.

Remote operations are what take the real time, and these are governed by network latency, not the speed of the implementation. The claim that Subversion is so slow it isn’t worth measuring assumes that all work is done locally. Of course, even DVCS users need to run remote commands sometimes.

The real issue here is with the ridiculous times listed for Bazaar. Maybe this was measured a long time ago when Bazaar was slower (I gather it’s been improved a lot). But really, 14 seconds for a bzr status or bzr diff? What changeset was this run on?? The page gives no details.

I repeated all of the experiments myself as closely as I could (I used the same Django repository). My results are attached to this post. I tried, as Scott suggested, adding 2000 files for testing the add, status, diff and commit commands. For Bazaar, add took 1.5 seconds, status and diff took around 0.5 seconds, and commit took around 3 seconds. In Git, add took 2.8 seconds (as Scott noted, Git add is slow for some reason [Edit: Jakub points out that git add is actually copying the files into the repository, not just intending to add them later]), status took around 0.3 seconds, diff didn’t show the added files, and commit took around 0.4 seconds. So Bazaar is slower, but it really won’t affect your work.

Also, I suspect the branch figures suffer from the same problem described under “Cheap local branching”: I assume the git branch of 1.16 seconds was a cheap local branch, while the bzr branch of 82 seconds was an expensive copy, because Scott didn’t use the “shared repository” feature. So this is comparing apples with oranges. I ran a branching test: without using shared repositories, git clone took 5.5 seconds; bzr branch took 31 seconds (though I should note that one can also copy a non-shared bzr branch using cp -a, which took 3.2 seconds). For creating a branch within a shared repository, git took 0.01 seconds, while bzr took 7.3 seconds. Again, Git is still faster, but Bazaar is nowhere near as slow as Scott is reporting.

[Edit: The bzr branch command does more than Git; it creates a separate working tree. You can avoid this by running bzr branch --no-tree, and work in the branch by ‘switching’ some other checkout to it, if you like. That should be faster.]

Git is small

[Edit: Scott has removed “bzr” from this category, but still reports the old Bazaar figures.]

This figure may have changed since Scott reported it, but Bazaar now has better branching formats. When I ran this test, I found Bazaar to be (just) smaller than Git.

For Git, the Django repository is now 27MB (Scott reported 24MB, presumably a while ago). The entire directory is 53MB.

For Bazaar, the Django repository is 24MB (Scott reported 45MB). The entire directory is 50MB.

So Bazaar is now the title holder for smallest repository format.

I also wish to point out that because Bazaar offers a number of workflows, you can also use it in a “lightweight checkout” mode (i.e., non-distributed VCS). If you just want to do a quick checkout, you can use bzr checkout --lightweight, which creates a Subversion-like branch. You can do anything you can do with Bazaar, but like Subversion, any log, revert or commit actions are performed remotely. In this mode, the Bazaar metadata alone is 672KB, and the entire directory is 27MB.

The staging area

Here is another case of “Git is better than everything else because they don’t do it Git’s way.” Git has this “staging area” which means before you commit, you have to explicitly add each file (not just the first time you create a file, but every time you commit to a file). The advantage here is that you don’t have to commit your entire working changeset, you can just commit some of the files. You won’t commit changes to a file by accident, but then again, you could accidentally not commit a change you thought you were committing. Also handy is that you can stage only part of your changes to a file, so you don’t accidentally commit your debug prints, for example.

I personally think this is a very bad idea — I’m already prone to forgetting to add files I just created. I’m sure if I used git I would often forget to add every file I wanted to commit. But I can see why people like it. As Scott points out, you can opt-out using git commit -a. I like the idea of a) being able to selectively commit files, and b) being able to commit only part of a file’s changeset, but I think it should be opt-in, not opt-out.

Firstly, obviously, all VCSes, even CVS and Subversion, let you selectively commit files. You just have to explicitly list them on the ‘commit’ command-line. So that’s easy.

As for committing part of a file, Bazaar has a (relatively new) feature called the “shelf”. I can type “bzr shelve” which brings up an interactive screen, very similar to git add, which lets me say Y/N for each change to each file. Anything which I “shelve” is completely reverted (no longer in the physical file), but it’s stored on the “shelf” for later. So if I want to commit only part of the file, I write “bzr shelve”, shelve all the things I don’t want, commit, then write “bzr unshelve” to get them back. The unshelve is great because it does a proper merge with the file as it is now. This is more roundabout for Git users used to just selectively adding, but it’s more powerful, because I will often wish to shelve a change for a long time, perhaps even days. If I’ve got a “mini-feature” which doesn’t warrant a branch, but isn’t finished, I might just kick it out the way onto the shelf, do more work, then unshelve later (giving me a merge). In that sense, it’s almost like a mini-branch.

The point is, Git isn’t the only VCS with this feature, it’s just implemented differently elsewhere. Mercurial doesn’t seem to have shelving built into it, but there is a ShelveExtension which you can add to Mercurial to give it the same feature as Bazaar.

[Edit: I got some comments which show that the Git community sees partial commits as much more natural than shelve/commit/unshelve. I will state clearly that I strongly disagree with that view.

In my view committing something which is not *exactly* the current working tree is asking for trouble. If you run your code through a test suite and it passes, then you commit some but not all of your changes in the current working tree, then you are *committing untested code*. You may think that you are including only the important changes, but programs are complex. Some part of the code which you didn’t explicitly commit may actually be necessary for the other changes to work.

It’s not as easy, doing it the shelve/stash way, but this workflow is the only way to ensure you are committing tested code:
1. bzr shelve
2. Run test suite
3. bzr commit
4. bzr unshelve

Also, Git does have a shelve command as well – git stash. I would recommend that over partial commits.]

Distributed

No issue — only an argument against non-distributed VCSes.

Any workflow

This is probably my biggest gripe with the site. It only claims this is an advantage Git has over Subversion and Perforce, so it isn’t bashing Bazaar/Mercurial. But in my opinion, this is a weakness of Git. Git users have told me they are proud of their distributed-only model. In my experience, “Any workflow” is a major strength of Bazaar, which it wields over Git, Mercurial, Subversion and Perforce.

In the Bazaar manual is a list of workflows. This is the really cool thing about Bazaar. Basically, being distributed is awesome. Being able to work locally is great, being able to commit locally and send changes to a server later is great, not even having a server is great also. But those are just some of the workflows which I have in my everyday job. It turns out that, for me at least, most of the time I am using Bazaar like Subversion. I am working in a close team of a handful of people on a project, and we are all making close changes which could, from one minute to another, conflict.

We can’t afford to be each working on our own separate branches and occasionally push our changes to the server and see if they conflict. We all want to be working with the latest version at all times. That’s the good old fashioned Subversion model. Does this make me a bad DVCS citizen? I don’t think so… because at any time I can whip out a new branch, do a bunch of local commits, merge from trunk, then push. Or be working on the train, and do a merge when I get to the office. I do all of these things. With Bazaar, I can effortlessly switch between workflows, and I love it.

The basic feature Bazaar offers here is bound mode. If I enter bound mode (either by doing a “bzr checkout” instead of “bzr branch”, or by typing “bzr bind” at any time), my local branch is synched to a remote branch. I still have the full history locally, but if I do a commit, it will first check that my branch is up to date, then commit remotely first, and finally apply the commit locally. This “lock-step” development model is often perfect because we never have to do merges.
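The order of operations in a bound commit can be shown with a toy model. This Python sketch has nothing to do with Bazaar’s real internals: Branch, OutOfDate and bound_commit are hypothetical names, and a branch is reduced to a plain list of revision labels.

```python
class Branch:
    """A branch reduced to an ordered list of revision labels."""

    def __init__(self, revisions=None):
        self.revisions = list(revisions or [])


class OutOfDate(Exception):
    """Raised when the local branch lags behind the branch it is bound to."""


def bound_commit(local, remote, revision):
    """Commit in lock-step: check, then remote, then local."""
    # 1. Refuse to commit unless the local history matches the remote one.
    if local.revisions != remote.revisions:
        raise OutOfDate("update from the remote branch first")
    # 2. Commit to the remote branch first...
    remote.revisions.append(revision)
    # 3. ...and only then record the same revision locally.
    local.revisions.append(revision)


def demo():
    """Two developers bound to the same trunk commit one after the other."""
    trunk = Branch(["r1"])
    alice = Branch(trunk.revisions)
    bob = Branch(trunk.revisions)
    bound_commit(alice, trunk, "r2-alice")
    try:
        bound_commit(bob, trunk, "r2-bob")
        refused = False
    except OutOfDate:
        refused = True
    bob.revisions = list(trunk.revisions)  # bob updates, then retries
    bound_commit(bob, trunk, "r3-bob")
    return refused, trunk.revisions
```

Because the second developer’s commit is refused until they update, the shared history stays linear, which is the sense in which this model never needs a merge commit.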

Despite what WGBTX says, Git doesn’t really offer a “Subversion-style” workflow. If you want to work that way, you have to commit locally, then try a push, and if someone has pushed since you last pulled, you must merge and (as style dictates) rebase.

GitHub

GitHub is great, but every open VCS has its own free development communities. Bazaar has Launchpad (apparently Scott took Bazaar off the list of things Git is better than for this category because Launchpad has a large community — all of Ubuntu for starters). Mercurial has BitBucket. Subversion has heaps — SourceForge and Google Code for starters. Perforce is proprietary crap so who cares.

I know Scott has already come under fire about bashing Bitbucket, and he retracted some comments. But Hg is still listed as a “Git is better than” in this category. Apparently not because it’s easier to get Git hosting than Mercurial hosting, but because “This social aspect of GitHub [that it has a larger community] is the killer.”

I personally don’t see that as a significant advantage — if people want to develop for your project, they won’t care what hosting service you’re using. And it isn’t an advantage of Git at all.

Easy to learn

Apparently a reason why Git is better than Perforce.

I disagree, but I’m very biased. As someone who switched from Subversion to Bazaar, I must say it was extremely easy. Getting my head around the branching and so on was tricky, but at least Bazaar gave me the gentle learning curve, since I could operate in bound mode and it felt exactly like Subversion.

This learning curve should not be underestimated. If you’re working with Bazaar, it’s very easy for someone with Subversion skills to join your project. You can just tell them, “do a checkout, stay bound, and just do everything the same way as Subversion.”

This simply isn’t true with Git, because you have to learn all about branching, merging, rebasing, using SHA-1 hashes for revision IDs, etc, on your first day. Showing that Git has the same commands as Mercurial is a pathetic argument, since they behave quite differently.

That’s it

Apologies if some of these words are harsh. It’s just that I’ve spent a lot of time defending Bazaar against the Gitsters. It really annoyed me seeing this frankly quite ignorant, or at the very least, out of date, website. I find that most people who’ve used Git try to convince me of how much better it is than Subversion. I tell them that I agree, but I use Bazaar. The problem is, they’ve never tried it.

I haven’t really said much about why Bazaar is better than Git in this post. I plan to do some follow-ups which are hopefully more technical than argumentative.

Please comment if you think I’ve got anything wrong. I’m not a Git/Mercurial user, so I’d be happy if you taught me something.


ViewVC – Your personal Subversion viewer

In Version control on June 10, 2008 by Matt Giuca

If you’re like me, you have Subversion repositories everywhere. Every project you do starts with an svnadmin create, followed by an svn checkout. If you’re more organised than me, you might even have a shell script to do that for you (I wish I was more organised than me).

If you’re not like me, well damnit, you should be! It’s so easy to create a new svn repository and it saves you from screwing up your work (or in extreme cases deleting it entirely – yes this has happened to me, and a friend too).

(Apologies to those higher beings who’ve moved on from Subversion to even funkier revision systems).

But I’m not here to preach about Subversion. I’m here to show you a neat app you can install to make your life easier, assuming you already use Subversion.

ViewVC is a nifty Python-based web app which lets you browse your files, logs, diffs and history using a web browser. It’s designed for use in large team projects, and it’s automatically active on all SourceForge projects. (Example). But I’m going to show you how to set it up for your own personal projects on your localhost.

How to do it

First, you need Apache, Subversion, and ViewVC. Of course if you have Ubuntu, this is ridiculously easy:

sudo apt-get install apache2 subversion viewvc

If you’re not on Ubuntu, then you’re on your own for these steps. (ViewVC’s documentation is in the INSTALL file inside the source package itself).

Next, edit the ViewVC configuration file. By default, it’s /etc/viewvc/viewvc.conf.

  • Comment-out cvs_roots and svn_roots.
  • Uncomment root_parents and set it to “/some/path : svn”. eg: “root_parents = /home/matt/repos : svn”
  • Set address to your email address (not that you need to know your own email address, but you have to put something there).
  • Set root_as_url_component = 1. Makes nicer URLs (try it without it and see the difference).

That’s it! Now create the directory you referenced above (e.g. /home/matt/repos), and all you need to do is chuck your repositories in there.

Of course, what you really do is keep them wherever you like them, and make symlinks to them. This is quite easy. Whenever you make a new repository (say you make a repository in /home/matt/src/svn/awesome), link it like this:

ln -fs /home/matt/src/svn/awesome /home/matt/repos

(You can delete the link at any time as if it was a regular file).
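If you would rather script the linking step, here is a small Python sketch that does the same thing as the ln -fs command above. The names link_repository and demo are hypothetical, and the demo uses a temporary directory instead of a real home directory.

```python
import os
import tempfile


def link_repository(repo_path, repos_dir):
    """Symlink a repository into the directory ViewVC scans (like `ln -fs`)."""
    os.makedirs(repos_dir, exist_ok=True)
    name = os.path.basename(os.path.normpath(repo_path))
    link = os.path.join(repos_dir, name)
    if os.path.islink(link):
        os.remove(link)  # mimic -f: replace a link that is already there
    os.symlink(os.path.abspath(repo_path), link)
    return link


def demo():
    """Link a stand-in repository directory twice and inspect the result."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = os.path.join(tmp, "src", "svn", "awesome")
        os.makedirs(repo)
        repos_dir = os.path.join(tmp, "repos")
        link_repository(repo, repos_dir)
        link = link_repository(repo, repos_dir)  # second call replaces the link
        return (os.path.basename(link), os.path.islink(link),
                os.path.samefile(link, repo))
```

For the example in the text, the call would be link_repository("/home/matt/src/svn/awesome", "/home/matt/repos").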

Now, point your browser to http://localhost/cgi-bin/viewvc.cgi (the default place it is installed), and you will be able to browse all the repositories.

Bonus step

As a last step, we can make that URL a bit nicer. I’d like a URL like this: http://svn.localhost/.

You need to find your Apache config file. On Ubuntu, this is located in /etc/apache2/sites-enabled/000-default. Edit the file and add this to the very bottom:

<VirtualHost *>
    ServerName svn.localhost
    ScriptAlias / "/usr/lib/cgi-bin/viewvc.cgi/"
</VirtualHost>

With this in place, requests to your machine under the domain name svn.localhost are served by ViewVC, overriding whatever else you have set up. The last thing you need to do is make svn.localhost resolve to 127.0.0.1 (or the browser won’t be able to find the web server). To do this, edit the file /etc/hosts. Add the following line, near the top:

127.0.0.1    svn.localhost

Now you can bookmark http://svn.localhost/, for a quick and convenient way to browse all of your Subversion repositories. Excellent!