Happenings in the VCS world

It has been a long time since my last blog post on VCSes. I am getting back into the swing of things and will be making a few more posts. Besides, Olav doesn’t have enough to do and he wants more of my long rambling posts to digest.

The VCS world is becoming more and more interesting, even if it is also more and more frustrating. I’ll briefly point out a few things I have seen happen in the last few months that look cool, making this VCS post a little bit different than my others.

cvs

Stinking stingy CVS refuses to die…it seems to prefer slowly petrifying over the years or something. It was great a number of years ago, but there are just so many better tools these days. However, there does appear to be a light at the end of the tunnel. The last place I am forced to use CVS (work) will finally be switching (to subversion) in a couple of months. Woohoo!

svn

I haven’t seen any big changes in subversion itself (only one bug fix release has occurred). However, it looks like they are making progress on finally implementing useful merge functionality. This is interesting on a number of levels. (1) The lack of this functionality was one of the big reasons subversion sometimes looks like a (very well polished) antique rather than a modern system; will incorporating it be enough to stave off some of the ongoing defections to other systems? (2) It may matter to those using bzr-svn, hgsvn, or git-svn: are users of such systems going to find it even easier to use their preferred tool? (3) The main reason svn’s dozen or so ugly renaming bugs (some of which essentially result in corrupted data) have gone almost completely unnoticed is that most are only triggered in merge operations, and subversion’s current merge functionality is so primitive and problematic that hardly anyone uses it. Further, svn’s roadmap clearly lists fixing the rename problems in a different release, after the merge fixes are included. Will the extra visibility the rename bugs receive once merging is fixed make subversion look more problematic or less? This will be fun to watch.

On a separate note, it is interesting to see that subversion developers are considering adopting some features of distributed VCSes, though only sometime in the distant future. An easy-to-miss but interesting nugget from that email is the following:

Fortunately, we’ve pretty much agreed, IIRC, that we’re willing to punt on subdirectory detachability in working copies in order to get performance improvements.

I have often seen svn and cvs proponents argue that subdirectory detachability is one of the big advantages of those systems, yet it looks like the svn developers are willing to drop it. Very interesting indeed.

hg

Mercurial version 0.9.5 was released since I did my last round of VCS blog posts and it is on my system. hg-0.9.5 has quite a number of improvements; the one that particularly caught my eye was support for subversion as a source SCM in its convert functionality. When I first looked at mercurial, they suggested people use git-svn and then convert from git to hg. To me, that seemed to push people to just use git. It looks like this has changed.

I have often found it somewhat strange that mercurial doesn’t have more active vocal proponents. Usually one hears from the git or bzr proponents, but not so much from mercurial. Yet it has always had many of the advantages of both (and in some ways it seems to have the most svn-like UI, which would make it a more natural transition for svn converts). I guess it’s a case where having most of the advantages or capabilities of other systems (even multiple other systems) yet not clearly standing out in one particular area will rob you of the active advocates that you could otherwise have. Of course, maybe it’s like the Linux Journal Readers’ Choice Awards phenomenon too; the noise or results that others hear may only be indicative of a certain small subset of the community.

bzr

A lot has happened in the Bazaar world. They had their big 1.0 release in mid-December and are now up to bzr-1.2. They have made impressive gains in performance, particularly with their adoption of the pack idea from git, and it appears they have at long last caught up to the leaders in the field in this area.

Near the end of last year, I corresponded with Ian Clatworthy about early versions of the “Main Competitors” writeups on the Why Choose Bazaar page. I pointed out some advantages of bzr he hadn’t included, mentioned how some bold claims had no accompanying proof, and pointed out some places where he seemed to be unaware of capabilities of other systems or where I disagreed with some of his claims. The final versions seem to have mixed results; part of my feedback was addressed (and more was addressed in follow-ups), but other parts were not. I’m particularly puzzled by the reluctance to investigate the existing capabilities of other systems and the willingness to claim features of bzr as advantages without determining whether they are actually unique. Regardless, though, while one does need to individually verify or discard each claim, the writeups are fairly impressive. I probably need to get back in touch with Ian again.

git

I’m so annoyed with Carl right now. He was the one who introduced me to git a number of years ago, and showed me some really cool things about it. I dropped it almost immediately at the time because it was way too hard to use. But, I’ve always been interested in it and made occasional attempts to tame the dragon ever since.

As many are aware, git has made huge strides towards usability in the 1.5 series, and has recently introduced automatic repacking in git-1.5.4. Because of all this work, I made diligent attempts to understand it over the last couple months. In doing so, I finally had the necessary epiphanies to feel I understand it. It turns out I was able to use it productively long before the uncomfortable feeling of I-don’t-really-understand-this-thing was finally expelled. The result? I found that there are several features of git not present in other systems that I am absolutely addicted to, but looking back on the journey I can’t say that it would be worth the effort for others to follow the same path, despite these awesome features. The thing is still too bloody hard to figure out.

One of my desires for my blog post series was to point out how horrible the git manpages (i.e. the built-in help system for git) are for new users, but I felt uncomfortable doing so until I actually understood them. I was not able to understand even the synopsis of the git-diff manpage until a couple of weeks ago. And I tried. Hard. Over days, weeks, and months. I read up on reflogs, the index, git’s storage format, the git tutorial and all kinds of other documentation. I feel stupid now, because I was just missing something simple and now seemingly obvious. But from what I can tell, little should-be-obvious-but-aren’t things like this are blocking lots of people from being able to use git.

Long story short: git has become far more usable…mere mortals can actually figure the system out (a big change from earlier versions) if they have an unusually large level of patience and motivation. git has some really awesome features, but I just can’t recommend it to others in its current state.


29 Responses to “Happenings in the VCS world”

  1. Jeremy Katz says:

    I think that your path with git mirrors that of many others. The problem is that once we become enlightened, it becomes harder to find and point out the little things that make it difficult for new users to pick up.

    And FWIW, before I came to grips with git, I was a pretty vocal mercurial proponent. Then I started hitting some of the limitations of hg as well as starting to find the features of git that are “can’t live without once you know about them” and, well… not so much of an hg advocate these days.

  2. Havoc says:

    git does have a lot of nice things, I’ve been using it lately even. Too bad it’s saddled with so much baggage. I wish I could figure out how to fix it and make productive suggestions; the problem is that so much of the underlying implementation “leaks,” both into docs, into how people use it, etc. It’s tough to chip away with small suggestions and end up with something several times less complicated to learn (and use), which is what’s needed.

    The solution might be a complete wrapper – one that did not leak the underlying usage of git _at all_, but was complete by itself. The wrapper would probably need to assume a few workflows; maybe a linux-kernel type, and a centralized-with-offline type. Maybe just copying one of the saner-UI systems but implement it with git?

    There are still lots of should-be-easy tasks with git that I just can’t figure out or don’t understand, unfortunately. But finally I can basically get through a day of coding with it.

    The man pages…. my god. Just tell me how to do the top 5 common tasks with each command, with examples! I’m with you, some of these I read a bunch of times and still don’t know what they are trying to say.

    Unlike a year ago I can now imagine using git, but I can’t ever imagine thinking it has a good design. It’s now in the autotools category for me: works, has some nice aspects, but way too many “dammit, what were they thinking” moments.

  3. Ali Sabil says:

    Thanks for this blogpost :) But could you please post the list of the git unique features ?

  4. Sean Kelley says:

    We have been using Mercurial (HG) with about 70 developers remotely deployed world wide. We have a lot of repos. In general we have been very pleased with HG. It just works. Its documentation is superior to the other distributed choices out there and it is something mere mortals coming from centralized Star Team or CVS can grasp. The reason you don’t hear these vocal supporters for HG as much is that ‘It just works’ and ‘works well’. That doesn’t mean there is not room for change and improvements – and those are being made.

    We do embedded Linux development with Mercurial in combination with Poky build system. We rely upon centralized repositories that are shared. We use patch submission and peer review for commits. We leverage the awesome ACL extension for Mercurial to gate push / pull access to repos down to the folder level. We have really fine tuned it to meet our needs.

    I could never imagine trying to support numerous developers spread out who have little or no experience with the GIT man pages or the git docs. GIT is improving but it strikes me as more of a hacker sort of work in progress. For example, why doesn’t GIT have a user list? Sure I could filter the GIT mailing list for patches. I have to agree with Havoc in that GIT reminds me of autotools – rather cryptic for new people but necessary for tasks.

    Sean

  5. otte says:

    FWIW, I’ve never had this feeling of not understanding git (if you don’t count the first week when I started with it). I also think the man pages are great. I’ve never had the need to look up any docs on the web after reading man pages. But I don’t think I understand what git does either. I guess I just got lucky when inventing my mental model of how git works.

    There’s one thing I keep telling people though: Learning git is like learning vi – it’s different from the VCS/text editors you know on a fundamental level. But once you’ve overcome that problem, you’ll not want to go back. Ever.

  6. mg says:

    Thank you for this series, I’ve had some problems wrapping my brain around git. Your post on the whole limbo in git helped me a lot.

    I know it’s your time and money, but if you ever write about git and your “aha” moments with it, that would be something I would like to read.

    Btw, bzr+avahi looks nice, would like to see avahi integration in git one day.

  7. Frej Soya says:

    @Otte, the text editor analogy points out the exact problem ;)

    I have been using vim for ~6 years and after 3-5 months I could do more in textmate than in vim (Yes I dropped os X again, just curious ;).

    I might be stupid, but at least i’m not the only one ;)

  8. The part about nobody seeing the problems with svn merge because merge is so hard that nobody knows it’s there reminds me a little of Paul Graham’s essay, “Beating the Averages”, where he talks about problems of language design (search for “Blub”, around the middle).

  9. Alex Turner says:

    I started poking around with git about six months ago and I haven’t looked back. Whenever I have had questions I have found folks on the IRC channel and the mailing list very willing to help. And commands like git filter-branch are just must-haves for me now. I’m a bit of an eager beaver, and sometimes commit stupid things into my repo, and being able to go back and remove them is just awesome. I don’t know how other systems work, but being able to flip back and forth between branches is just so great too. The only downside I see is a lack of a good windows client (I’m on vista, and cygwin in vista doesn’t work so great; I have 4GB RAM, and I get heap errors from time to time).

  10. behdad says:

    Like always, thanks a lot for writing this. Olav has been pushing me to followup with my git request on infra list, and this post helps a lot.

  11. One of the advantages of Mercurial over Git is that it has (supposedly) better documentation, which Carl Worth refused to steal (http://cworth.org/hgbook-git/) ;-), and that it is simpler (at the cost of functionality; cf. multiple branches per repository are second-class citizens in Mercurial). Git documentation is improving (see “Git User’s Manual”), but it still needs improvements.

    I’d like to know too: what unique (or nearly unique) features of Git you cannot work without?

  12. Eric says:

    The interesting thing about subdirectory detachability is that it might be a misfeature: My company recently had a developer copy (not svn copy) a directory to another place in the repository and start making massive modifications without realizing that the changes were ending up in the original location. Fortunately, this is one of the things version control is good at repairing…

    I hear you about git. I started using it in the days of cogito, mostly because I started hacking on a project already using it. Cogito did a good job of pointing out the important functionality, but was slightly buggy (I accidentally lost some changes when switching to a branch that lacked the modified file) and didn’t include everything I found useful.

    Unfortunately, the list of git commands is massive, including many things not meant for the average user. It has confusing things like both revert and reset, which do completely different things. It has a “clean” command that I almost tried when something went wrong, having come from a subversion background, before realizing that it’s completely different. I’m currently trying to understand fetch, pull, push, and remote, particularly how to get them working automatically.

    On the other hand, once I found git-show-branch, I would never look at svn merges the same way. For that matter, merging itself is downright simple. Then there’s git-describe, which I didn’t notice until just before it got even better. “git-add -p” is something I’ve occasionally dreamed of, without noticing that it was right there. I’m sure bzr and hg have some of these features, but I’m not sure how many, and I don’t know that I really want to learn another vcs right now…

    …but given that I can’t even clone a local repository without installing cpio, I just might.

  13. Elijah says:

    Alex: I thought that msysgit (http://code.google.com/p/msysgit/) had become production ready as a native win32 port of git (and had reached that status, say, a couple weeks ago). I admit I wasn’t paying real close attention, though, so I may have misunderstood. Are you following those efforts?

    Eric: Try ‘git clone file://repo copy’ instead of ‘git clone repo copy’. See http://kerneltrap.org/mailarchive/git/2007/10/1/326757 for details.

    Ali, Jakub: In short, speed[1], repository container[2], the index[3], and history rewriting[4]. (You may find some of those features in other systems, but usually with a less powerful implementation, and definitely not the combination.) Most of these are hard to describe until you experience them; they just don’t click at first. But I’ll give it a shot:

    [1] This has been well discussed everywhere; I need not rehash anything here. However, I will note that this edge does seem to be decreasing some; not that git is getting slower, but that others outside of mercurial had so much room for improvement that it was easier for them to get better.

    [2] I kind of covered this at http://blogs.gnome.org/newren/2007/11/24/local-caching-a-major-distinguishing-difference-between-vcses/. Carl focused on this feature heavily when explaining why he picked git (http://lists.freedesktop.org/archives/cairo/2006-February/006255.html). I didn’t really fully understand his post when I first read it, but suffice it to say that I’m sick of needing a dozen different directories to house different projects, or keeping dozens of patches around and applying and unapplying them, or some combination.

    [3] Sure the index, as currently implemented, bites users hard (see http://blogs.gnome.org/newren/2007/12/08/limbo-why-users-are-more-error-prone-with-git-than-other-vcses/), but there’s also great uses for it as I pointed out in that blog post. The funny thing is that although I conceptually understood it when I wrote that post, I didn’t yet realize just how useful it was. I use it *all* the time now. Being able to mark some changes as “ready for commit” and dividing diffs into “good” and “remainder” is extremely helpful. Being able to easily keep dirty changes in the tree comes in handy way more than you’d expect. git add -p is my best friend.
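    The “good” versus “remainder” split can be pictured with a toy model. This is only a sketch under simplifying assumptions: the file names and contents are made up, each snapshot is a plain dict, and staging works on whole files, whereas git add -p works on individual hunks.

```python
def changed(old, new):
    """Return the sorted paths whose content differs between two snapshots."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))


# Three snapshots, each mapping path -> content (hypothetical toy data).
head = {"parser.c": "v1", "debug.c": "v1"}            # last commit
index = dict(head)                                    # staging area starts equal to HEAD
work = {"parser.c": "v2", "debug.c": "v1 + printf"}   # working tree with two edits

# Stage only the finished change, roughly what `git add parser.c` does.
index["parser.c"] = work["parser.c"]

ready = changed(head, index)      # roughly what `git diff --cached` lists
remainder = changed(index, work)  # roughly what plain `git diff` lists

# Committing records the index, not the working tree,
# so the unfinished edit stays behind as a dirty change.
head = dict(index)
still_dirty = changed(head, work)

print(ready, remainder, still_dirty)
```

    In this model the commit picks up only parser.c, and debug.c remains a dirty change in the tree, which is the behaviour described above.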

    [4] Yes, it’s a really bad idea to rewrite history that has been published, but the number of cases where rewriting unpublished history (and even special cases of published history) comes in handy is far more than you’d realize. The sheer pain of having once-recorded-never-erased has literally brought my work to a crawl in the past because I wanted to make sure things were perfect. Another side-effect from this is that people tend to make far larger patches than necessary. But I really can’t explain it all that well to someone who hasn’t experienced it. Even svn has revert which really is the same idea (to less of an extreme). In git, from the simple commit --amend, to using reset, to retroactively creating a branch, to rebasing, to git-filter-branch, it’s just amazing the cool things you can do.
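    The simplest of these, commit --amend, can be sketched in a few lines. This is a toy model, not git’s implementation: the commit ids are hypothetical labels rather than hashes, and only the message changes, whereas a real amend usually records new content too. The point it illustrates is that nothing is edited in place; a new commit is created on the same parent and the branch pointer moves to it.

```python
# Toy commit store: id -> (parent id, message). The ids are hypothetical
# labels, not real git hashes.
commits = {
    "c0": (None, "initial import"),
    "c1": ("c0", "fix tpyo in README"),
}
branches = {"master": "c1"}


def amend(commits, branches, name, new_id, new_message):
    """Replace the tip of branch `name` with a new commit on the same parent."""
    old_tip = branches[name]
    parent, _old_message = commits[old_tip]
    commits[new_id] = (parent, new_message)  # new commit, same parent
    branches[name] = new_id                  # the branch pointer moves to it
    return old_tip                           # the old commit is merely unreferenced


old = amend(commits, branches, "master", "c1'", "fix typo in README")
print(branches["master"], commits[branches["master"]], old in commits)
```

    Because the old commit still exists and only loses its reference, rewriting unpublished history is cheap; the trouble with published history is that other people’s branches still point at the old commit.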

  14. gebi says:

    i’m a longtime mercurial user and switched to git because i hit the limits[1] of mercurial some time ago.

    Other problems with mercurial are the unpredictable tags (as mercurial tries to emulate an unversioned tag system on top of the versioned .hgtags file. this gets really funny with multiple heads and edits of .hgtags).
    And the future direction of mercurial’s branch support worries me too, in that one branch can have multiple heads.

    the beginning with git is _definitely_ not easy but in the long run it is really worth the effort.

    [1]: no internal extension api, no real branch support in one repository (you can not even cleanly delete a branch from the history), there were a few cases of repo corruption in our project, for extended features mq is needed but you are forced to 2 separate repositories.

  15. In my opinion, Mercurial is the SCM “for the adult in you”, while Git is like the swiss-army chainsaw.

    Moreover, as they share the same abstract history model, one could always develop the missing advanced tools for Mercurial as they are in Git or implement a more streamlined hg-like interface to Git.

  16. kalle says:

    For me the switch to git was never any problem. I was using BitKeeper for about 2 years, then the global killer was dropped, and I started using cogito (on top of git), and 1.5-2 years ago when git became userfriendly, I ditched the cogito wrapper. But then again, a friend of mine knows just about EVERYTHING about git, so when I stumble upon something I haven’t tried yet, I usually ask him for a short introduction.

  17. Runa says:

    Great post. I tried git yesterday, and so far it’s been going great. I hope it stays that way too :)

  18. Gé Weijers says:

    The issue with Git is that the model you need to have in your mind is too complicated for casual users, so I would hesitate introducing Git to a group of people who know only 4 CVS commands (checkout, update, commit, and status).

    I’d rather stick with Mercurial, it has some limitations compared to Git but it is a lot easier to explain. I use Git for kernel work, but Mercurial for everything else.

    On top of that: Mercurial is easier to use as a CVS replacement (you have to wean people off the centralized model slowly….)

  19. You neglect to mention Darcs, which with the release of Darcs2 beta has made significant leaps. I can’t get over how much Darcs is ignored in the VCS world.

  20. [...] Elijah’s latest VCS blogpost is on LWN. Noticed the following addition: On a related note, it appears that Emacs will be moving to Bzr, not for a specific technical reason, but because Bzr is becoming a GNU project. [...]

  21. Ali Sabil says:

    Thank you for your reply, here is my opinion about the points you pointed out:
    [1] I completely agree with you, but I would like to add one more point concerning slow networked operations. Mercurial and bzr are able to work out of the box over dumb network transports (http, ftp, rsync …), while GIT is not (last time I checked, you still had to install git on the remote server and enable a post push hook). This is quite unfortunate, because most people end up using dumb transports for bzr and mercurial, which makes them feel slow for networked operations. Besides that, I completely agree with you that GIT is very fast for non-networked operations.

    [2] I personally used to like this feature while using Monotone (long before git existed) but now I tend to think that from a user friendliness point of view having 1 folder == 1 branch is far easier for the user to grasp imho.

    [3] I personally think that hg, git and bzr all of them got it wrong, and I tend to prefer darcs UI for this, darcs is known for its interactive ui, which in my opinion is just the best for handling the Limbo files.

    [4] I agree that rewriting the history can be useful, and I don’t see what would stop you from doing this using either bzr or hg. Both of them support plugins and extensions, so just go ahead and hack one (there is already bzr-rebase among others), and there is also the uncommit command.

    I would like to add just one last point: to me, Git is just a mixture of C, perl, shell … code building on top of each other to add functionality (I am talking about the full git including its porcelain, not just the core), which makes it currently not really extensible (could you extend the existing commit, for example?). This is not the case with bzr and hg, where you have a true extension mechanism. For example, it would be perfectly feasible to add a plugin to bzr that adds an --interactive option to the existing commit command and displays a text UI asking which hunks are to be committed (in a similar fashion to darcs), or a plugin that adds a transport, or a repository format (bzr-svn adds support for svn repositories to bzr; unlike git-svn, this allows all the bzr commands, whether provided by the core or by an extension, to be directly usable on an svn repository…).

    Imho, since there is no real winner in this comparison, the one to be chosen ought to be the one that is the easiest to extend and improve, and not the fastest one, or the one with the most features, or ….

  22. Matthew Bassett says:

    Not to come out as a bzr fanboy or anything, but I find that ‘bzr shelve’ does almost the same thing (well kind of) that you talk about the git index, only with the opposite semantic sense, e.g.:

    ‘bzr commit’ does actually commit all changes

    ‘bzr shelve’ lets you step through the changes you have and temporarily ‘shelve’ them, thus removing them from the current changes awaiting commit (and also the current working copy, until you ‘unshelve’ them).

    This works well for me, since I often mistakenly work on several changes at the same time, in the same branch, and then at commit time realise I need to split these changes out into different commits.

    I also am liking the merge and history functions of bzr, but bear in mind I only have svn and rcs (!!!) experience to compare to…

  23. Nathan Myers says:

    I feel keenly the omission of Monotone from the list of evaluated alternatives. Is there any possibility of filling in the gap?

  24. Eric says:

    Matthew: The “bzr shelve” command sounds very similar to “git stash”. But I’m personally really impressed by stgit, which gives me integrated support for a “patch stack” in git: I can pop patches, push them, reorder them, etc. This is super-useful when sending patch series upstream.

  25. Govind says:

    @Havoc, about wrappers

    Much of git is designed for use by wrappers, I am currently working on a project to wrap git functionality in a friendlier way that mimics some of what hg and others do. You can find the initial announcement at http://pyrite.sophiasuchtig.com/

    My goal is to hide much of the complexity of git and facilitate certain workflows.

    It goes slowly but it goes. I am also planning on having a pygtk front end that can be used.

    If anyone is interested in helping, I would be glad to have it.

  26. Elijah says:

    Christopher Smith, Nathan Myers: I considered mentioning monotone and darcs (in particular darcs due to the work on darcs2), but I decided to stick with my previous choice to narrow the field to 5 (see http://blogs.gnome.org/newren/2007/11/17/adoption-of-various-vcses/).

    Matthew Bassett: bzr shelve is very cool. Definitely a neat tool…it has some features not present in git stash (picking out pieces of the changes to stash away, instead of just doing all changes). However, that’s not really like the index or its uses at all; there’s a small use case overlap, but not all that much. (If it was the same, git stash would be mere duplication).

    Govind: Looks very cool. Kinda sad though…we have some duplication now (more on that soon).

    Ali: I agree that one branch per repository is easier to grasp, and I don’t mind if the UI makes that the default and encourages it, but the system had better support multiple branches per repo or I don’t want to use it (and it appears that the one branch per repository assumption goes all the way to the core in bzr; hg at least has a good start here with named branches). I agree with you that the ideas in darcs are really cool and unique…but darcs loses on the most important criteria (see http://blogs.gnome.org/newren/2007/11/17/adoption-of-various-vcses/). And, sure, I could start hacking on a system to add more capabilities to make up the differences to git…but git already has it. Anyway, there’s lots of things to consider in the VCS world these days and there’s lots of valid rational conclusions to come to. You can decide based on easiest to extend, most widely adopted, has most active community, has feature x or y, or other reasons. You have a good grasp of the systems and I respect your opinions and conclusion, but I have come to a different one.

  27. Elijah: mercurial named branches are quite a bit different from GIT branches, and they are an alternative to having repositories in different directories for very long lived branches (e.g. stable vs. trunk).

    From my understanding of GIT, its branches are simply tags that point to a head of the history DAG and are automatically updated when one does a pull from the same origin.

    As a mercurial repository can store an arbitrary history DAG like GIT, one could easily work with multiple branches in the same repo simply using “hg update -C -r $REV” to switch branch, where $REV is the revision number or hash of the other head.

    If one prefers to use names instead of revnames (hashes or numbers), he can do something similar to GIT branches by simply using local tags, but he would need to manually move the tag when he pulls. Providing an extension that does this automatically would be quite trivial.

    In mercurial, named branches don’t need to have a single head and their name is recorded in the immutable history as it is a property of each changeset that belongs to them.

    This capability of recording the branch name in the history is something that GIT does not have. To be honest, GIT does not have even the ability to register simple tags in the history, while mercurial has both types of tags.

    In short, they have very different usecases. The GIT branches one is already doable with mercurial (just refer to the heads with their revnames and try to keep in your mind where they came from), while the mercurial named branches one is not available with GIT.
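    The named-branch model described above can be sketched roughly as follows. This is a simplified illustration, not Mercurial’s actual data structures: the Changeset class, the revision numbers and the branch_heads helper are all made up for the example. The branch name is stored on every changeset, and a head of a branch is taken to be a changeset with no child on that same branch, so one name can have several heads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Changeset:
    rev: int             # hypothetical local revision number
    branch: str          # the branch name is part of the changeset itself
    parents: tuple = ()  # revision numbers of the parent changesets


def branch_heads(changesets, branch):
    """Return the revs on `branch` that have no child on the same branch."""
    on_branch = [c for c in changesets if c.branch == branch]
    has_child = {p for c in on_branch for p in c.parents}
    return sorted(c.rev for c in on_branch if c.rev not in has_child)


history = [
    Changeset(0, "default"),
    Changeset(1, "stable", (0,)),
    Changeset(2, "stable", (1,)),  # one line of work on "stable"
    Changeset(3, "stable", (1,)),  # a second, unmerged line on "stable"
]

print(branch_heads(history, "stable"))
print(branch_heads(history, "default"))
```

    Here “stable” has two heads (revisions 2 and 3), and every changeset keeps its branch name no matter which repository it is pulled into.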

    BTW, your effort with EasyGIT and the one from pyrite are very very well appreciated! :)

  28. IMHO Mercurial simply got tags and [named in-repository] branches (with localbranch extension) *wrong*, and Git got it right. Tags and branches are pointers (references) to the nodes in DAG of revisions: tags are “constant” pointers, while branches are “live” pointers, points of growth. This design made it easy for me to understand what branches are; something CVS was terrible about (branching in CVS, bleh!)
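    The pointer view of tags and branches can be written down as a small sketch. The Repo class, its method names and the commit ids below are hypothetical, chosen only for illustration and not taken from git’s code: commits form a DAG through parent links, a branch is a name whose target advances on every commit, and a tag is a name that is set once and never moves.

```python
class Repo:
    """Toy revision DAG with 'live' branch pointers and 'constant' tag pointers."""

    def __init__(self):
        self.parents = {}   # commit id -> tuple of parent commit ids
        self.branches = {}  # branch name -> commit id (moves on every commit)
        self.tags = {}      # tag name -> commit id (never moves once set)
        self._counter = 0

    def commit(self, branch):
        """Add a commit on top of `branch` and advance the branch pointer."""
        parent = self.branches.get(branch)
        commit_id = f"c{self._counter}"
        self._counter += 1
        self.parents[commit_id] = (parent,) if parent is not None else ()
        self.branches[branch] = commit_id
        return commit_id

    def tag(self, name, commit_id):
        """Attach a constant pointer; refuse to move an existing tag."""
        if name in self.tags:
            raise ValueError(f"tag {name!r} already exists")
        self.tags[name] = commit_id


repo = Repo()
first = repo.commit("master")
repo.tag("v1.0", first)
second = repo.commit("master")

print(repo.branches["master"], repo.tags["v1.0"], repo.parents[second])
```

    After the second commit the branch has moved on while the tag still names the first commit, which is the “live” versus “constant” distinction.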

    I agree with gebi that using an in-tree .hgtags file for tags, which are (and must be) non-versioned, and doing the whole dance of taking the most recent version rather than the one from the manifest, etc., is just an error in design.

    BTW. I disagree that Bazaar is not good for multiple branch workflow: you can clone branch (repository) and have it share repository data with parent, which is something in between multi-branch Git repository and Git alternates for repository object database.

  29. Mercurial does support non-versioned tags with the “--local” switch (which puts them in .hg/localtags instead of .hgtags).
    Of course they are not pushed/pulled, since in this case it is not clear what to do when there are conflicts (two users have the same tag but pointing to different revisions).
    Having them in the history is useful to see when they are introduced, who introduced them, if they have been changed and to make sure no one messed with them (and if someone did, then it is very simple to revert these changes).
    Would just adding an option to push non-versioned tags make GIT users happy?