The Laws Have Changed

This week I got an invite to try out the GitHub beta. GitHub is a nice service that provides space and tools for hosting git repositories for open source projects.

Obviously, setting up HTTP (dumb) read-only git repos is doable on any box connected to the intertron by merely copying the repository; the nice bits of working with git, though, like pushing branches and tags, or fast cloning through the git: protocol, are somewhat harder to set up, so anything that relieves you of this kind of work is welcome.

GitHub not only provides you with that: it also gives you an incredibly nice web interface; a wiki; the ability to easily track other projects through RSS feeds; and the ability to easily set up a “fork” of another repository, which eases the pain of setting up personal development repositories for contributors.

I decided to try GitHub by moving JSON-GLib’s development git repository there, plus some branches and the release tags. The operation took me approximately ten minutes, including account creation, and it was all very easy and very well explained.

At the moment, GitHub is in beta ((If anyone is interested in trying it: my invites are now all gone)) and there are lots of projects starting out; if anything goes wrong with the service, every clone carries the full repository history, which is a nice feature of git, so I guess that side is covered as well. Bugs and feature requests are currently handled using wiki pages, so if I had to make a request to the guys at Logical Awesome it would be: dudes, move to a serious bug tracking system – wikis don’t scale for that (believe me: been there, done that).

All in all, it’s a great service – the way SourceForge should have been, if technology had allowed it at the time and if they hadn’t chosen to reimplement the damned BTS and mailing list archives with something that can only be described as “made of fail”; and if they plan to make me pay a reasonable fee, that’s fine by me: GitHub’s UI alone is worth it.

When the Levee Breaks

Yesterday I decided to start porting the Gtk2::SourceView Perl module to the new upstream API. For my convenience, and because I know I’ll probably screw up, I decided to use a local git repository so I can experiment with all the branches I want before hitting CVS. Yes, you read that right: the Perl GTK+/GNOME bindings still use CVS on SourceForge.net ((there has been talk about moving them to GNOME CVS, then to SVN, but in the end the maintenance burden would be too high, and some of the members of the team would need at least SVN accounts anyway)). Thus, I decided to import the whole gtk2-perl repository into a git one using git-cvsimport, and – lo and behold – after four hours of checkout, I got it on my machine, complete with full history.

The bindings are laid out as a single CVS module with all the Perl modules inside it; this is far from optimal with git ((or any other SCM software that is not CVS, for that matter)), so I proceeded to split each Perl module out into its own repository, with the help of git-filter-branch – a new command taken from the Cogito suite and added to the 1.5.3 release of git.

The filter-branch command is extremely powerful: it rewrites the history of a repository (which is a destructive operation) by running a filter over every commit. It comes with a set of predefined filters and contexts of operation, so all you need to do to split a sub-directory out into its own repository is call:

  $ git filter-branch --subdirectory-filter directory refspec

and after that all the files that were filtered out show up as new or modified, so you can use git reset --hard to get rid of them, and have the contents of your sub-directory as the only recognised content of the repository.
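To see what the sub-directory filter does without risking a real import, here is a small sketch that runs it on a throwaway repository. The directory names (ModuleA, ModuleB), the function name and the commit messages are made up for illustration; the FILTER_BRANCH_SQUELCH_WARNING variable only matters on recent git releases, which print a warning and pause before filter-branch starts.

```shell
#!/bin/sh
# Sketch: build a throwaway repository with two sub-directories, then keep
# only one of them with --subdirectory-filter. All names are hypothetical.
demo_subdirectory_filter() {
    repo=$(mktemp -d) || return 1
    (
        cd "$repo" &&
        git init -q . &&
        git config user.name "Demo" &&
        git config user.email "demo@example.invalid" &&

        # first commit touches both sub-directories
        mkdir ModuleA ModuleB &&
        echo "a" > ModuleA/a.txt &&
        echo "b" > ModuleB/b.txt &&
        git add . &&
        git commit -q -m "Add both modules" &&

        # second commit touches ModuleB only
        echo "b2" >> ModuleB/b.txt &&
        git commit -q -a -m "Touch ModuleB only" &&

        # Recent git versions warn and sleep before filter-branch runs;
        # this variable silences that. Older versions ignore it.
        FILTER_BRANCH_SQUELCH_WARNING=1 \
            git filter-branch --subdirectory-filter ModuleA HEAD >/dev/null
    ) || return 1
    # print the path of the rewritten repository
    echo "$repo"
}
```

Running `git -C "$(demo_subdirectory_filter)" log --stat` afterwards should show a history that only knows about a.txt, now at the top level of the tree; the commit that touched ModuleB alone is dropped by the rewrite.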

Unfortunately, you can’t really filter a direct import of a CVS repository: git-cvsimport stores branches and tags, and filtering will most likely create dangling objects; so, what I did was clone the original repository, to get rid of the local branches, and then remove all the tags:

  $ git clone --no-hardlinks /tmp/gtk2-perl Gtk2-SourceView.git
  $ cd Gtk2-SourceView.git
  $ for TAG in $(git tag); do git tag -d "${TAG}"; done

The --no-hardlinks switch is important for later – I have to thank Ricardo Signes for this tip; in short: it makes git use real copies instead of hardlinking files when cloning a local repository, and will make the garbage collection and pruning phases actually work and prune the unused objects from the git database.
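If you want to check that a clone really holds its own copies, one way is to look for object files whose link count is greater than one. This is only a sketch: the function name is made up, and it assumes a non-bare clone with its objects under .git/objects.

```shell
# Sketch: count the object files in a clone that are still hardlinked to
# another repository (link count greater than one).
# The function name is hypothetical.
count_hardlinked_objects() {
    find "$1/.git/objects" -type f -links +1 | wc -l
}
```

For a clone made with --no-hardlinks, `count_hardlinked_objects Gtk2-SourceView.git` should report zero.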

At this point, I just filter-branched and reset:

  $ git filter-branch --subdirectory-filter Gtk2-SourceView HEAD
  $ git reset --hard

and then called:

  $ git gc --aggressive
  $ git prune

and finally obtained my local git repository of the Gtk2-SourceView module from the original CVS repository – with all the history on HEAD preserved. The good part is that the entire set of operations is very repetitive, so it’s suitable for scripting ((I did write a small script which extracted every Perl module sub-directory into its own git repository – but it’s mostly 50 lines of sugarcoating around the core 5 lines of actual work)). Yay for git! :-)
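For the curious, here is a minimal sketch of what such a wrapper could look like. It is not the script mentioned above, just the same steps put in a loop; it assumes that every top-level directory of the import is one module, and the function names and the output layout are invented for the example.

```shell
#!/bin/sh
# Sketch: split every top-level sub-directory of an imported repository into
# its own git repository, using the steps described above.
# Assumptions (not from the original script): each top-level directory is one
# module; function names and output layout are hypothetical.

# Print the names of the top-level directories tracked at HEAD of repo $1.
list_modules() {
    git -C "$1" ls-tree -d --name-only HEAD
}

# Clone $1 into $2/<module>.git and reduce it to the history of module $3.
split_module() {
    src=$1 outdir=$2 module=$3
    dest="$outdir/$module.git"

    git clone -q --no-hardlinks "$src" "$dest" || return 1
    (
        cd "$dest" || exit 1
        # remove the tags carried over from the import
        for tag in $(git tag); do
            git tag -d "$tag" >/dev/null || exit 1
        done
        # the variable only silences a warning on recent git versions
        FILTER_BRANCH_SQUELCH_WARNING=1 \
            git filter-branch --subdirectory-filter "$module" HEAD >/dev/null &&
        git reset -q --hard &&
        git gc --quiet --aggressive &&
        git prune
    )
}

# Split every module of repository $1 into a repository below directory $2.
split_all() {
    list_modules "$1" | while read -r module; do
        # keep git away from the loop's stdin
        split_module "$1" "$2" "$module" </dev/null ||
            echo "failed: $module" >&2
    done
}
```

Called as `split_all /tmp/gtk2-perl /tmp/split` (the output directory is just an example), this would leave one repository per module under /tmp/split. On recent git versions filter-branch also keeps a backup of the old refs under refs/original/, so the gc step may not reclaim much space until those are removed; the git-filter-branch manual page has a checklist for that.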