When the Levee Breaks

Yesterday I decided to start working on the porting of the Gtk2::SourceView Perl module to the new upstream API. For my convenience, and because I know I’ll probably screw up, I decided to use a local git repository so I can experiment with all the branches I want before hitting CVS. Yes, you read that right: the Perl GTK+/GNOME bindings still use CVS on SourceForge.net ((there has been talk about moving them to GNOME CVS, then to SVN, but in the end the maintenance burden would be too high, and some of the members of the team would need at least SVN accounts anyway)). Thus, I decided to import the whole gtk2-perl repository into a git one using git-cvsimport, and – lo and behold – after four hours of checkout, I got it on my machine, complete of full history.

The layout of the bindings modules is composed of a single CVS module and all the Perl modules are inside it; this is far from optimal with git ((or any other SCM software that is not CVS, for that matter)), so I proceeded to split up each Perl module into its own repository, with the help of git-filter-branch – a new command taken from the Cogito suite and added to the 1.5.3 release of git.

The filter-branch command is extremely powerful: it rewrites the history of a repository (which is a destructive operation) by passing a filter function on it. It has a set of predefined filters and contexts of operations, so what you need to do to split out a sub-directory into its own repository is call:

  $ git filter-branch --subdirectory-filter directory refspec

and after that you get all the files filtered out marked as new or modified, so you can use git reset --hard to get rid of them, and have your sub-directory contents as the only recognised content of the repository.

Unfortunately, you can’t really filter out a direct import of a CVS repository: git-cvsimport stores branches and tags, and filtering will most likely create dangling objects; so, what I did was cloning the original repository, to get rid of the local branches, and remove all the tags:

  $ git clone --no-hardlinks /tmp/gtk2-perl Gtk2-SourceView.git
  $ for TAG in `git tag`; do git tag -f -d ${TAG}; done

The --no-hardlinks switch is important for later – I have to thank Ricardo Signes for this tip; in short: it makes git use real copies instead of hardlinking files when cloning a local repository, and will make the garbage collection and pruning phases actually work and prune the unused objects from the git database.

At this point, I just filter-branched and reset:

  $ git filter-branch --subdirectory-filter Gtk2-SourceView HEAD
  $ git reset --hard

and then called:

  $ git gc --aggressive
  $ git prune

and finally obtained my local git repository of the Gtk2-SourceView module from the original CVS repository – with all the history on HEAD preserved. The good part is that the entire set of operations is very repetitive, so it’s suitable for scripting ((I did write a small script which extracted every Perl module sub-directory into its own git repository – but it’s mostly 50 lines sugarcoating the core 5 lines of actual work)). Yey for git! :-)

5 Replies to “When the Levee Breaks”

  1. git-filter-branch rules. I used it a lot when splitting out gio from gvfs, and will use it to merge gio into subversion.

  2. I’ve wondered about reversing the split, and apparently git filter-branch does it too.

    This to move a subproject into a subdir:

    git filter-branch --index-filter \
    'git ls-files -s | sed "s-\t-&newsubdir/-" |
    GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
    git update-index --index-info &&
    mv $GIT_INDEX_FILE.new $GIT_INDEX_FILE' HEAD

    Some git pulls to import the subproject histories into a new repo.

    And some grafting:

    echo "$commit-id $graft-id" >> .git/info/grafts
    git filter-branch $graft-id..HEAD

    That’s untested though.

  3. Bindings for the upstream gtksourceview2 would be brilliant. If only because this release of the sourceview engine actually does a decent job at highlighting perl code.

  4. Just to let you know: the link points to a script with a type on the last-but-2 line: it says “&& it-prune” rather than “&& git prune”. Other than that, thanks for a useful script automating all the fiddly bits of extracting history :-)

Comments are closed.