Adoption of various VCSes

There are a lot of Version Control Systems out there, and one of the biggest criteria in selecting one to use is who else uses it. I’ll try to quickly summarize what I have learned about the adoption of various VCSes. There are many people who know more than me, but here are some of the bits that I’ve picked up.

Perceived adoption from lots of reading

I have read many blog posts, comparisons, tutorials, news articles, reader comments (in blogs and at news sites), and emails (including various VCS project archives) about version control systems. In doing so, it is clear to me that some are frequently mentioned and deemed worthy of comparison by others, while many VCSes seem so obscure that they only appear in comparisons at sites that attempt to be exhaustive or completely objective (e.g. at wikipedia). Here are the ones I hear mentioned more frequently than others:

First rung: cvs, subversion, bazaar-ng, mercurial, tla/baz, and git.

Though bazaar perhaps belongs in a rung below (more on that in a minute). There are also several VCSes that are still mentioned often, but not as often as the ones above:

Second rung: svk, monotone, darcs, codeville, perforce, clearcase, and bitkeeper.

tla/baz died a few years ago (with both developers and users mostly abandoning it for other systems, though I hear tla got revived for maintenance-only changes). Also, bazaar-ng really straddles these two levels rather than being in the upper one, but I was one of the early adopters and it has relatively strong support in the GNOME community so it’s more relevant to me. Perforce, clearcase, and bitkeeper are proprietary and thus irrelevant to me (other than as a comparison point).

Adoption according to project records

Of the non-dead open source systems, here’s a list of links to who uses them plus some comments on the links:

  • bazaar-ng – WhoUsesBzr – wiki page name is inconsistent; it should be “ProjectsUsingBzr” (compare to wiki page names below) :-). The page is also slightly misleading; they claim drupal as a user but my searches show otherwise (turns out to just be a developer with an unofficial mirror). Hopefully there aren’t other cases like this.
  • codeville – NoPage – I wasn’t able to find any list of projects using codeville anywhere. In fact, I wasn’t able to find any projects claiming to use it either. It must have shown up in other people’s comparisons on the basis of its interesting merge algorithm.
  • cvs – NoPage – I don’t have a good reference page, and it’d likely go out-of-date quickly. However, while CVS is no longer developed and projects are switching from CVS in droves these days, it wasn’t very many years ago that cvs was ubiquitous and a near universal standard. Nearly everyone familiar with at least one vcs is familiar with cvs, making it a useful reference point. Also, it still has a pretty impressive installed base; I’m even forced to use it occasionally in the open source world as well as every day at work.
  • darcs – ProjectsUsingDarcs – I strongly appreciate the included list of projects that stopped using their VCS (and why). Bonus points to darcs for not hiding anything.
  • git – ProjectsUsingGit
  • mercurial – ProjectsUsingMercurial – I like how they make a separate list for projects with synchronized repositories (bzr and svk ought to adopt this practice, and maybe others)
  • monotone – ProjectsUsingMonotone – I really like the project stats provided.
  • subversion – open-source-projects-using-svn – wiki page name isn’t ProjectsUsingSvn; couldn’t they read everyone else’s minds and realize that they needed such a name to fit in with the standard naming scheme? 😉
  • svk – ProjectsUsingSVK – claims WINE, KDE, and Ruby on Rails as users; my simple searches showed otherwise (likely svk developers just knew of developers from those projects hosting their own unofficial svk mirrors). I don’t know if their other claimed users are accurate or not; I only checked these three.

Some adoption pages point to both the project home page and the project repositories, which is very helpful. The other adoption wiki pages should adopt that practice too, IMHO.

Adoption by “Big” users

Looking at the adoption pages listed above, each of the projects other than svk and codeville seems to have lots of users. Mostly small projects, but most projects probably are small and it is also easier for small projects to switch to a new VCS. The real test is whether VCSes are also capable of supporting large projects. I’d like to compare on that basis, but I’m unwilling to investigate how big each listed project is. So, I’ll instead compare based on (a) whether I’ve heard of the project before and know at least a little about it, and (b) whether I think of the project as big. This results in the following list of “big” users of various VCSes:

  • bazaar-ng – This is kind of surprising, but Ubuntu is the only case matching my definition above. As an added surprise, they aren’t in bzr’s list of users. (samba and drupal only have some unofficial users; and in the case of samba, I know they also have unofficial git users. Official adoption only for my comparison purposes; otherwise GNOME and KDE would be in lots of lists.)
  • codeville – none
  • cvs – Used to be used by virtually everything. Many projects still haven’t moved on yet.
  • darcs – none of the projects listed match my definition of “big” above
  • git – linux kernel (and many related projects), much of freedesktop.org (including Xorg, HAL, DBUS, cairo, compiz), OLPC, and WINE
  • mercurial – opensolaris, mozilla (update: apparently mozilla hasn’t converted quite yet)
  • monotone – tough case. I would have possibly said none here, noting gaim, er, pidgin, as the closest but their stats suggest two projects (Xaraya and OpenEmbedded) are big…and that pidgin is bigger than I realized. I guess I’m changing my rules due to their cool use of stats.
  • subversion – KDE, GNOME, GCC, Samba, Python, and others
  • svk – none

Brief notes about each system

As a quick additional comparison point for those considering adoption, I’ll add some very brief notes about each system that I’ve gathered from my reading or experience with the system. I’ll try to list both a good point and a bad point for each.

  • Free/Open source VCSes
    • bazaar-ng (bzr) – Developed and evangelized by Canonical (backers of the Ubuntu distribution). Designed to be easy to use and distributed, and often gets praise for those features. It received a bit of a black eye in the early days for being horribly slow (it made cvs look like a speed demon), though I hear that the speed issues have received lots of attention and changes (and brief recent usage seems to suggest that it’s a lot better). Annoyingly, it provides misleading and less-than-useful results when passing a date to diff (the implemented behavior is well documented and apparently intentional; it’s just crap).
    • codeville – Designed by Bram Cohen (inventor of bittorrent). People seem to find the merge algorithm introduced by codeville interesting. Doesn’t seem to have been adopted much, though, and it even appeared to have died for a while (going a year and a half between releases, with other updates hard to find as well). Seems to be picking back up again.
    • cvs – The VCS that all other VCSes compare to, both because of its recent ubiquity and because its well known flaws are easy to leverage in promoting new alternatives. The developers working on cvs decided its existing flaws could not be fixed without a rewrite, and thus created a new system called subversion. cvs is inherently centralized.
    • darcs – Really interesting research system, claimed to be easy to use, written by David Roundy (some physicist at OSU) and based on patches rather than source trees. I believe this allows, for example, merging between source trees that do not necessarily have common history (touted as an advanced cherry-picking algorithm that no other VCS can yet match). However, this design has an associated “doppelganger” bug that can cause darcs to become wedged and which requires care from the user to avoid. From the descriptions of this bug, it sounds like something any big project would trigger all the time (it’s an operation I’ve seen happen lots in my GNOME maintenance even on modestly sized projects like metacity). However, developers apparently can avoid this bug if they know about it and take steps to actively avoid triggering it. I think this is related to “the conflict bug”, which can cause darcs to be slow when merging large repositories, but am not sure.
    • git – Invented by Linus Torvalds (inventor of the linux kernel). It has amazed a lot of people (including me) with its speed, and there are many benchmarks out there that are pretty impressive. I’ve heard/seen people claim that it is at least an order of magnitude faster than all other VCSes they’ve tried (from people who then list most all the major VCSes people think of as fast among the list of VCSes they’ve tried). It also has lots of interesting advanced features. However, versions prior to 1.5 were effectively unusable, requiring superhuman ability to learn how to use. The UI warts are being hammered away and git 1.5 and later is much better usability-wise; it’s now becoming a usable system once users first learn and understand a few differences from other systems, despite its few remaining warts here and there. The online tutorials have transformed into something welcoming for new users, though the man pages (which double as the built-in “--help” system) still remind me more of academic research articles written for a community of existing experts than of user documentation. Also, no official port to windows (without cygwin) exists yet, though one is apparently getting close. Interestingly, git seems to be highly preferred as a VCS among those I consider low-level hackers.
    • GNU Arch (tla/baz) – Invented by Tom Lord (who also tried to replace libc with his own rewrite). Both tla and baz are dead now with developers and users having moved on, for the most part. Proponents of these systems (particularly Tom) loudly evangelized the merits of distributed version control systems, which probably backfired since tla/baz were so horribly awful in terms of usability, complexity, quirkiness, and speed that these particular distributed VCSes really didn’t have any redeeming qualities or even salvageable pieces. (baz was written as a fork designed to make a usable tla which was backward compatible with tla; the developers eventually gave up and switched to bzr since this was an impossible goal.) I really wish I had the part of my life back I wasted learning and using these systems. And no, I don’t care about impartiality when it comes to them.
    • mercurial (hg) – Written by Matt Mackall (linux kernel developer). Started two days after git, it was designed to replace bitkeeper as the VCS for the kernel. Thus, like git, it focused on speed. While not as fast as git in most benchmarks I’ve seen, it has received lots of praise for being easier to learn, having more accessible documentation, working on Windows, and still being faster than most other VCSes. The community behind mercurial seems to be a bit smaller, however: it doesn’t have nearly as many plugins as bzr or git (let alone cvs or svn). Also, it annoyingly doesn’t accept a date as an argument to diff, unlike all the other major VCSes.
    • monotone (mtn) – Maintained by Nathaniel Smith and Graydon Hoare (who I don’t know of from elsewhere). The main thing I hear about this system is its focus on authentication of history to verify repository contents and changes. These ideas influenced and were adopted by git and mercurial. On the con side, it appears getting an initial copy can take an extraordinarily large amount of time; for example, if you look at the developer site for pidgin you’ll note that they provide detailed steps on how to get a checkout of pidgin that involves bypassing monotone since it’s too slow to handle this on its own.
    • subversion (svn) – Designed by former cvs maintainers to “be a better cvs”. It doesn’t suffer from many of the same warts as CVS; e.g. commits are atomic, files can be renamed without messing up project history, changes are per-commit rather than per-commit-per-file, and a number of operations are much faster than in cvs. Most users (myself included) feel that it is much nicer than CVS. Like CVS, svn remains inherently centralized and has no useful merge feature. Unlike CVS, half the point of tagging is inherently broken in svn as far as I can tell[*] (you can’t pass a tag to svn diff; you have to search the log trying to find the revision from which the tag was created and then use whatever revision you think is right as the revision number in svn diff).
    • svk – Invented by Chia-liang Kao and now developed by Best Practical Solutions (some random company). Designed to use the subversion repository format but allow decentralized actions. I know little about their system and am hesitant to comment as I can’t think of any good comments I’ve heard (nor more than a couple bad ones.) However, on the light side of things, I absolutely love their SVKAntiFUD page. On that page, in response to the question “svk is built on top of subversion, isn’t it over-engineered and fragile?” an additional note to the answer (claimed to have been added in 2005) states that “Spaghetti code can certainly not be called over-engineered.” While the history page of their wiki suggests it has been there for at least a year, I’m guessing the maintainers don’t know about this comment and will remove it as soon as someone points it out to them.
  • Proprietary (i.e. included only for comparison purposes) VCSes
    • bitkeeper – A system developed by BitMover Inc., founded by Larry McVoy. Gained prominence from its usage for a few years by the linux kernel. “Free Use” (as in no monetary cost) of the system by open source projects was revoked when Andrew Tridgell started reverse engineering the protocol (by telnetting to a server and typing “help”). Most users of this system seem to like it technically, but the free/open source crowd understandably often disliked its proprietary nature. I haven’t used the system, but think of it as being similar to mercurial (though I don’t know for sure if that’s the best match).
    • clearcase – Developed by (the Rational Software division of) IBM. Clearcase is an exceptionally unusual VCS in that I’ve never heard anyone I know mention a positive word about it. Literally. They all seem to have stories about how it seems to hinder progress far more than it helps. There has to be someone out there that likes it (it seems to have quite a number of users for a proprietary VCS despite being exceptionally expensive), but for some reason I haven’t run across them. Very weird. I believe it is actually lock-based instead of either distributed or inherently centralized, meaning that only one person can edit any given file at a time on a given branch. Sounds mind-bogglingly crazy to me.
    • perforce – Developed by Perforce Software, Inc. It seems that users of the system generally like it technically, and it has a free-of-charge clause for open source software development. My rough feeling is that Perforce is like CVS or subversion, but has a number of speed optimizations over those two. It is apparently even worse than cvs or svn for offline working, making editing not-already-opened files in the working copy problematic and error-prone unless online.
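On the mercurial note above about diff not accepting a date: one workaround is to let hg log resolve the date to a changeset first, then hand that changeset to hg diff. Below is a rough sketch in Python, assuming your hg version supports the -d/--date option of hg log with a "<DATE" spec (meaning "at or before DATE"). The function names are hypothetical helpers of mine, not part of mercurial, and in a distributed system "the revision as of a date" is fuzzy, so treat the result as approximate.

```python
import subprocess


def hg_rev_before_cmd(date):
    """argv asking hg for the newest changeset at or before `date`.

    hg log lists newest changesets first, so limiting to one entry (-l 1)
    should pick the latest match. The template prints only the short hash.
    """
    return ["hg", "log", "-d", f"<{date}", "-l", "1",
            "--template", r"{node|short}\n"]


def hg_diff_since_cmd(rev):
    """argv diffing the working directory against changeset `rev`."""
    return ["hg", "diff", "-r", rev]


def diff_since_date(date, run=subprocess.run):
    """Resolve `date` to a changeset, then return the diff against it.

    `run` is injectable so the logic can be exercised without hg installed.
    """
    found = run(hg_rev_before_cmd(date),
                capture_output=True, text=True, check=True)
    rev = found.stdout.strip()
    if not rev:
        raise LookupError(f"no changeset at or before {date}")
    diff = run(hg_diff_since_cmd(rev),
               capture_output=True, text=True, check=True)
    return diff.stdout
```

Calling diff_since_date("2007-09-01") inside a repository would run the two hg commands in sequence; the date string is only an example.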

The major VCSes

Based on everything above, I consider the following VCSes to be the “major” ones:

cvs, svn, bzr, hg, and git.

I’ll add an “honorable mention” category for monotone and darcs (which bzr nearly belongs in as well, but passes based on the Canonical backing and much higher than average support by developers within the GNOME community). These five VCSes are the ones that I’ll predominantly be comparing between in my subsequent posts.

Update

[*] Kalle Vahlman in the comments points out that you can diff against a tag in svn, though it requires using atrocious syntax and a store of patience:

As much as I agree with [the claim] that SVN is just a prettier CVS, [it] isn’t really true. You can [run]:

svn diff http://svn.gnome.org/svn/metacity/tags/METACITY_2_21_1 http://svn.gnome.org/svn/metacity/trunk

to get differences between the tag and current trunk. If it looks horribly slow to you, it’s because you are on a very fast connection. IT IS SO SLOW IT MAKES LITTLE KITTENS WEEP. But it is possible anyway.
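The URL-based form quoted above is tedious to type, so it can be wrapped so that only the tag name changes. Here is a minimal sketch in Python, assuming the repository follows the conventional tags/ and trunk layout; svn_tag_diff_command is a hypothetical helper name of mine, not anything svn provides.

```python
import shlex


def svn_tag_diff_command(repo_root, tag, branch="trunk"):
    """Build the argv for `svn diff OLD_URL NEW_URL` between a tag and a branch.

    Assumes tags live under <repo_root>/tags/<tag> and the branch lives
    directly under <repo_root>; adjust if your layout differs.
    """
    root = repo_root.rstrip("/")
    old_url = f"{root}/tags/{tag}"
    new_url = f"{root}/{branch}"
    return ["svn", "diff", old_url, new_url]


if __name__ == "__main__":
    cmd = svn_tag_diff_command("http://svn.gnome.org/svn/metacity",
                               "METACITY_2_21_1")
    # Print the command rather than running it, so it can be inspected first.
    print(shlex.join(cmd))
```

This only saves typing; the slowness complained about above comes from svn itself and is not affected.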

There are a number of other good posts in the comments too, pointing out project adoption cases I potentially missed and noting additional issues with some systems that I won’t be comparing later.

27 thoughts on “Adoption of various VCSes”

  1. Very nice read indeed 🙂

    I’ll just point out that I’ve used ClearCase for 6 months at a Fortune 500 company. Basically, ClearCase uses a virtual file system, so you’ll have to “mount” parts of the source tree you want to edit before you can even access them. It is not distributed, it needs a central server to work properly. What’s more is that you need that server to be online full time.

    It also uses what’s called a ConfSpec file. Basically, it’s kind of like a jhbuild file where you tell it what you want from which branches, but inside _one_ source tree. While this is very flexible for merges (it’s like a file-based form of cherry picking), if developers aren’t careful, ConfSpec files can grow _very_ large and hinder development.

    Like CVS, commits are per-file. In fact, there’s no command to commit multiple files at the same time. I had to write bash aliases to do this.

    As for the lock based mechanism, I would say this is true, but the policy for the company I was at was to create branches for every feature you wanted to develop (like Torvalds advocates for git) and then merge back into the mainline later on, so I didn’t get to see any locking in action.

    Like SVN, branches are stored on the server, so you’ll need to make up your own naming conventions if you want to keep the noise down. Either way, the project I was working on had literally thousands of available branches.

    As for your comment about ClearCase slowing down work, I can only concur. So much so that I actually learned git to do my actual day-to-day work, using ClearCase as a backup medium every once in a while.

    My final word: it is now my personal policy to turn down job offers where ClearCase is used.

    Cheers

  2. I guess darcs finds most usage in the Haskell world. Outside of that, two projects in their users list stand out to me: REXML is the standard Ruby XML library, shipped with every Ruby installation, and Prototype is the default Javascript library shipped with at least Ruby on Rails and Pylons, the Python web framework. Even if you don’t know it, I wouldn’t be surprised if you had multiple copies of it in your browser’s cache 🙂

  3. “Unlike CVS, half the point of tagging is inherently broken in svn as far as I can tell (you can’t pass a tag to svn diff; you have to search the log trying to find the revision from which the tag was created and then use whatever revision you think is right as the revision number in svn diff).”

    As much as I agree with Linus that SVN is just a prettier CVS, this claim isn’t really true.

    You can actually do this:

    svn diff http://svn.gnome.org/svn/metacity/tags/METACITY_2_21_1 http://svn.gnome.org/svn/metacity/trunk

    to get differences between the tag and current trunk. If it looks horribly slow to you, it’s because you are on a very fast connection. IT IS SO SLOW IT MAKES LITTLE KITTENS WEEP.

    But it is possible anyway.

  4. When using Mercurial and you need diffing depending on date you usually do a hg log search based on date and feed the needed revisions to hg diff.
    Working with dates on DVCSs is quite fuzzy, as local/repo push dates are not as linear as on CVCSs. You can have a local branch that you develop for a long time (with pulls and merges from the canonical repo), but it can get integrated in a main repo much later… which dates are you referring to?

    I’d also add Xen as a really big project using hg.

    Furthermore, AFAICT, it’s replacing Teamware internally at Sun for all projects.

  5. Probably Telepathy can be considered a big project and it uses darcs.

    However, I have a lot of problems with darcs:
    – I had to delete my commit history at least two times to merge my branch with another branch
    – For other conflicts I had to manually fix things and it took hours!
    – Darcs sometimes fails to do even simple merges, for instance if two branches have an identical change
    – It’s possible to lead darcs into an inconsistent state where it thinks that there are conflicts even if everything is ok

  6. Could you expand on the point about ‘bzr diff’ with date arguments? You made me curious, but neither ‘bzr help diff’ nor ‘bzr help revisionspec’ is enough to figure out what it does and why it is crazy.

    (Also, why is there a backslash in front of the apostrophe in the title of this page?)

  7. Hey, long post but worth reading IMHO.

    But I think instead of reading about DVCSs on and on I just should pick one (which I mostly already did with hg) and just use it further, as I can’t find any downside point on that one in my everyday usage 🙂

  8. Great post.

    SVN doesn’t really have branches and tags, it just has a convention for where to check in your code to simulate them, hence being forced to use a URL for them instead of a short name. You also missed that in current subversion you have to remember the revision number of your last merge to merge a branch more than once. This bug is being fixed in 1.5 as well as more features being added: http://blogs.open.collab.net/svn/2007/05/the_subversion__1.html http://blogs.open.collab.net/svn/2007/07/subversion-15–.html There’s also a defence of SVN not being distributed here http://blog.red-bean.com/sussman/?p=79 but it’s focused on proprietary software companies, not open source projects.

    Google uses Perforce internally.

  9. Graydon Hoare is at Mozilla and did a lot of the work to hook up a generalized cycle collection algorithm to the reference counting used by most of Mozilla’s code, for your information.

    As for Mozilla using hg, it’s more than a few small projects — it’s lots of Mozilla 2 work, next-gen stuff, etc. that needs better merge algorithms. One big user is ActionMonkey, the project to merge Adobe’s Tamarin JIT with SpiderMonkey, the JavaScript engine used by Netscape, Mozilla, and Firefox since the early days of the web — it actually does a lot of merging from current SpiderMonkey code. The final switch to hg isn’t coming until after Firefox 3, or very slightly before depending on when the super-locked-down state happens. I wouldn’t have included it yet in the list of hg-using projects, for what it’s worth (or maybe included with a star or a note about “switching to”).

  10. Great comparison! I had to suffer ClearCase while working for Sony Ericsson and it wasn’t that bad IMHO. File locking is optional so you can check out files similar to how subversion etc works. The biggest selling point for me was the nice GUI:s and good windows integration. For example, the version tree browser made it very easy to interactively investigate a VOB, see the branch and merge points as a graph and point-and-click to see a diff and log for a particular commit.

    And it is not true that you have to be connected to the server all the time. ClearCase uses two different view modes: live and snapshot. In live mode, which requires the server to be online, you never need to update your view, so you are supposed to stay up-to-date all the time. Snapshot mode is equivalent to an svn checkout and doesn’t require the server to be online. To commit multiple files at once, you just select the files, choose commit and press the “Apply to All” button after you have written your log message.

    Server administration OTOH, must have been a real PITA. The server was down at least once a week for over a year and the only remedy the admins had was to reboot and pray it would work.

    All in all, I think CC has some very interesting features, but it felt way overcomplicated for ordinary work and, while it has a CLI, without the GUI:s it would have been impossible to work with.

  11. Regarding svk: I used it for about 10 months.

    Good thing is that for projects with many branches and tags a svk checkout with the whole history + your working copy can be less than the working copy of subversion for a single version. This is due to the compressed nature of svn repository file format.

    The math is as follows – in svn if the tree you checkout is 5 MB, your working copy will be about 10 MB. This is due to the hidden data in .svn directories.

    In svk – the working copy is pristine – it does not contain this meta data. In order for some part of your filesystem to be considered a svk working copy – it has to be specifically mentioned in the svk config file of the user.
    Now – svk can get all the revisions from a svn repo and then compress them in its own svn repo – which results in smaller diskspace taken. For projects with many tags and branches – compressability is very high.

    However – svk is VEEEERY slow – slower than svn. Additionally – on low RAM systems (128MB, something which I had to use) – the perl scripts are killing the machine.

    Another extremely annoying thing is that when you sync with the master repository (which is a SINGLE one – svk is centralized inherently, no interdeveloper syncs – only through the master repo) – you have to do that in a SINGLE patch. If you try to merge all the changes on a change-by-change basis, even though you merge everything, things get broken very soon as old revisions of files get recommitted between your local branch and the master repository. This is mentioned in the documentation of svk, but is plain wrong behavior.

    Regarding svn:
    Developers frequently forget that svn can be and is used for other things besides source control. I would think that this is in fact the single best thing about svn. You get a webdav resource with autoversioning – just mount it in Windows or Linux and you immediately get a versioned filesystem which you can use for managers who will not learn any VC system. The game Tribal Trouble uses svn for distributing updates.

  12. I find svk is great if you’re forced to use svn and you’re about to take a plane flight or suffer some other disconnection. It feels just like svn for that use case. Plane lands, you svk push, everything’s great.

    Beyond that, however, svk becomes cranky fast. Definitely don’t try to use it with any VCS other than svn. It claims to work with CVS and Perforce but those claims are woefully optimistic.

    Masukomi’s post sums up Perforce beautifully. http://weblog.masukomi.org/2007/8/31/dear-perforce-fuck-you If you can handle strong language, it’s a great read. Perforce is an antiquated disaster that requires full-time staff to administer. I pity Google for picking it.

  13. Regarding Juri Pakaste’s comment:
    Pylons switched from Darcs to Mercurial in September,
    but I think Darcs is heavily used by the Debian project.

  14. Work is being done by the darcs-developers to fix the misbehaviour that causes darcs to hang.
    Your definition of “big” is obviously flawed, as you also say. One “big” project that uses darcs is the Haskell Compiler GHC btw.

  15. I have used Clearcase as well, and do know some people who would vigorously defend it. It’s good when you have huge volumes of data under version control, because the data isn’t really copied (by default): the live view behaves like a (sometimes sluggish) versioned NFS file system; thus, new branches and views are nearly instantaneous.
    You mention “Perforce is like CVS or SVN.. “. Well, yes and no! Yes, it is very fast (I measured > 10x faster than CVS for checkouts, much for updates, diffs, etc.), but in addition, branching and merging work the way they should (unlike CVS or SVN, where you get into trouble if you merge a branch twice and didn’t take the precautions). I’d think that if a centralized VCS is OK for your model, Perforce is the best of those.

  16. I know of a few companies using svk mirrors internally. I think svk’s adoption is “hidden” under the covers of many projects using a centralized svn repository.

  17. One interesting question is how do the different systems handle line endings. That can be a major PITA when working on cross platform projects. SVN and hg seem to be the smartest, whereas others seem to have a that-is-the-user’s-problem kind of attitude.

  18. Game development companies, which often exceed 1 TB in the repository per project, swear by Perforce. Certainly nothing in the distributed VCS crowd approaches its abilities to handle huge files. (Given that Linus, the creator of Git, has gone on record to say “Files above 10 MB don’t belong in the repository”, this is hardly surprising.)

    If you think 100k KLOC is “big”, you need to get out more.

  19. Best Practical Solutions is the company responsible for “Request Tracker”, (package: rt3) a monstrous, perl-CGI ticket-tracking website.

    Unfortunately it’s a far sight more useful for general IT support ticketing than Bugzilla or Trac, which are very development-oriented.

  20. I am yet to see a distributed open-source VCS that handles line endings well. Lack of proper CRLF support is a showstopper to me, but Mercurial, Darcs, Bazaar and Monotone (the ones I’ve checked) are only starting to become aware of the issue. I’m sticking to svn until one of the above solves this.
