Many different kinds of revision specifiers

Version control systems each use their own method to refer to different versions (also known as ‘revisions’) of the repository. The choice of revision specification often reflects underlying data structures, and the choice of data structures often inhibits or enables various features for the system. Additionally, the methods of displaying and using revision specifiers can also affect the ease with which users can learn and use the new system.

Unfortunately, a full comparison is beyond the scope of this post. I will concentrate on simply introducing the basics and giving a flavor for how things are layed out, which itself is a long enough topic. While conclusions could be drawn with just the data and explanations presented here, I am intentionally avoiding doing so and leaving such to possible later posts. (Besides, bloody taxes and the brain-damaged US tax code have stripped me of any time that I would need to write such additional comparisons.)

Warning: My pictoral representations for each system will be crazier and more complex than usual (and even more lopsidedly complex for some systems than others) in order to keep things short while still showing what is possible.

cvs

Method

See cvs revision numbers and cvs branching basics, particularly figure 2.4 near the end of the branching basics section.

CVS has revision identifiers that are per-file, meaning that repositories at any given time are a combination of many different revisions (one for each file). Ignoring an ugly technical detail about the special revisions 1.1.1.1 and 1.1.1, the first version of a file is numbered 1.1. The next change to the file is recorded as 1.2, the next is 1.3, and so forth. If the user wants to create a branch, based on the 1.3 version of a file, then the branched version is 1.3.2.1. Changing and committing the file on the branch results in 1.3.2.2, then 1.3.2.3, etc. A second branch also created off of 1.3 would be numbered 1.3.4 instead of 1.3.2 (with actual commits numbered 1.3.4.x).

Note that branches are named by a revision with one less number (e.g. 1.4.2 is the name of the branch with commits numbered 1.4.2.x). As such, branch names refer to the beginning of the branch. Each file is branched separately, with per-file revision numbers (it is even possible to branch some files without branching others).

Tags are aliases for a specific version number. Since revisions are per-file, a given tag may refer to different revision numbers for different files (e.g. the ‘v1.0’ tag might refer to version 1.27 of foo.c, 1.36 of bar.h, and 1.218 of foobar.py)

Uniqueness of cvs revisions is not an issue since there is only one repository.

Picture

                       (etc)
                         |
             (etc)   1.4.4.3.2.2
               |         |
            1.4.4.5  1.4.4.3.2.1
               |         |
            1.4.4.4  (1.4.4.3.2)
               |     /
               |   /
            1.4.4.3
               |
  1.4.2.2   1.4.4.2
     |         |
  1.4.2.1   1.4.4.1
     |         |
  (1.4.2)   (1.4.4)
      \       /
       \     /
        \   /
         \ /
         1.4
          |
         1.3
          |
         1.2
          |
         1.1

svn

Method

See svn revisions and working with your branch, particularly figure 4.4 (the branching of one file’s history).

svn uses global revision identifiers, with the first revision being marked as 1, the second as 2, the third as 3, etc.

Branches have an unusual implementation in subversion; they are handled by a namespacing convention: a branch is the combination of revisions within the global repository that exist within a certain namespace. Creating a new branch is done by copying an existing set of files from one namespace to another, recorded as a revision itself.

Tags (an alias for a specific version in history) don’t exist in subversion. Instead, subversion again uses a namespacing convention identical to that done for branches (thus making tags and branches indistinguishable in subversion other than the chosen names), and users are merely discouraged from committing additional changes to files within a tag namespace.

Uniqueness of svn revisions is not an issue since there is only one repository.

Technically, a revision could simultaneously modify any combination of branches and tags by simply committing to all namespaces; however, this is typically discouraged and users only have a certain namespace checked out at a time.

Picture

  trunk   branches/proj-2-22  branches/proj-2-20  tags/RELEASE_2_22_2
   24
                                                        23
                 22
   21
   20
                 19
                                     18
                                     17
   16
   15
                 14
                 13
                 12
   11
                                     10
    9
    8
    7
                                      6
    5
                                      4
    3
    2
    1

bzr

Method

See understanding bzr revision numbers and specifying bzr revisions.

bzr, like svn, uses 1, 2, 3, etc. for revision numbers. However, the revision numbers are always consecutive in a branch. Merged in changes from other branches are given 3 numbers per revision. For example, if changes were merged from a repository that has changes relative to revision 2, the changes would come into the current branch numbered 2.1.1, 2.1.2, 2.1.3, etc. If changes from more than one branch are relative to the same commit, then the middle number is used to distinguish commits from the different branches. Thus one would see another set of changes relative to commit 2 numbered as 2.2.1, 2.2.2, 2.2.3, 2.2.4, etc. (Versions of bzr older than 1.2 used more than 3 numbers in certain cases, but that is no longer true of current versions.) See the picture below to make this clearer.

Branches in bzr are done by creating separate directories (typically with their own repository), though one can set up shared repositories. Each branch will have its own numbering scheme for the revisions it stores, recording the order that the revisions entered that repository. (See below about uniqueness issues.)

Tags in bzr are an alias for a commit, and are stored as part of a branch.

Note that bzr revision numbers are not unique. If you have the same revision in two different repositories, they will not necessarily have the same revision number in both. bzr does store unique identifiers for revisions, known as revid’s (an example of which looks like Matthieu.Moy@imag.fr-20051026185030-93c7cad63ee570df), though they are not shown by default. Users can obtain these unique identifiers by passing the –show-ids flag to bzr log, and these revids can be used in place of the simpler default revision specifiers when prefixed with “revid:”.

Picture

              12
              |
              11
            / | \
          /   |  \
        /     |   \
      10    4.1.5  4.2.2
       \   /  |      |
        \ /   |      |
         9    |    4.2.1
        / \   |   /
       /   \  |  /
       8    4.1.4
       |      |
       7    4.1.3
       | \    |
       |   \  |
       6    4.1.2
       |      |
       5    4.1.1
        \   /
          4
          |
          3
          |
          2
          |
          1

Note: The revision identifiers shown in this picture are dependent on merge order; the revisions 4.1.5, 4.2.1, and 4.2.2 could instead be numbered 4.2.1, 4.1.5 and 4.1.6 respectively if the merges done to obtain revision 11 were done in a different order.

git

Method

See Understanding git history: Commits, and naming git commits.

git uses cryptographic checksums (in particular, sha1sums) of repository contents as revision identifiers. These checksums are 40-character hexadecimal strings (e.g. 621ff6759414e2a723f61b6d8fc04b9805eb0c20). Each revision also knows which revision(s) it was derived from (known as the revision’s parent(s)).

Git can be used with one branch per directory like bzr or hg, but it is more common to have branches stored within the same directory/repository (thus the reason some refer to git as a ‘branch container’). In git, branches record the revision of the most recent commit for the branch; since each commit records its parent(s), a branch consists of its most recent commit plus all ancestors of that commit. When a new commit is made on a branch, the branch just records the new revision. Tags simply record a single revision, much like branches, but tags are not advanced when additional commits are made. tags are not stored as part of a branch or in a revision controlled file, though by default tags that point to commits that are downloaded are themselves downloaded as well.

git revisions are unique by design; if you have the same revision in two different repositories, the revision name for both will be the same.

git does provide more human-meaningful ways of referring to commits, in the form of simple suffixes used to count backwards in history from the tip of a branch (or backwards from a tag or commit). This includes methods for counting relative to different parents, making the suffixes have structural meaning. However, such methods are somewhat hidden; for example, they are not shown in the output of git log. This leaves many users unaware of how to take advantage of them, if they are aware of them at all. (A simple wrapper can get them to be shown, at the cost of a little time; they could be shown at negligible time cost with an integrated solution, but none exists to my knowledge.)

Picture

           650a6f...
              |
           caf806...
          /   |   \         719b9d...
        /     |     \       /
      /       |       \   /
 75cc2c...  147c0a... acac44...
      \       |         |
        \     |         |
         8f50e6...    8147be...
         /    |     /
       /      |   /
  9b39b2... 6e2cde...
    |         |
  01fa22... 1a9d90...
    |    \    |
    |      \  |
 46508c...  b6765c...
    |         |
 1c4e8d...  328638...
       \     /
       6627f7b...
          |
       754b42...
          |    \
          |      \
       d1879f...  fba5d0...
          |
       c962db...

hg

Method

See a hg tour through history, and section 2.4.1, “Changesets, revisions, and talking to other people”.

hg uses a method that may look like a mix of the methods used by git and bzr; it has two distinct methods of referring to each revision. Like git, hg uses sha1sums to refer to revisions (though it abbreviates them to fewer characters by default). Like bzr, hg uses the numbers 1, 2, 3, etc. to refer to revisions. Thus hg has one unique method to refer to revisions and another that is simple and easily manipulatable by users. Each revision (or “changeset” in mercurial’s vocabulary) is of the form revision-number:changeset-identifier (e.g. 3:ff5d7b70a2a9).

Like bzr, branches in hg are typically done by creating separate directories (typically with their own repository). However, it also has named branches for naming branches within a repository, which are somewhat similar to git. (I have been told there are important distinctions between hg named branches and git branches, but I do not fully understand all the details; maybe someone will explain in the comments.)

mercurial has both tags and local tags, with (normal) tags being stored in an .hgtags file that is version controlled, and local tags being stored in a file that is not version controlled nor shared (cloned/pulled/pushed/etc.). Like most other systems, tags in hg are an alias for a specific commit.

The (abbreviated) sha1sum portion of hg revisions (the “changeset identifier”) is unique by design; if you have the same revision in two different repositories, the changeset identifier for both will be the same. The simple number portion of hg revisions (the “revision number”) is not unique. If you have the same revision in two different repositories, they will not necessarily have the same revision number in both.

Picture

             19:c87f92...
                |
             18:650a6f...
               |      \
        15:caf806...   \
         /     |        \
       /       |         \
      /        |          \
13:75cc2c... 14:147c0a... 17:acac44...
      \        /           |
        \     /            |
       12:8f50e6...      16:8147be...
         /    |        /
       /      |      /
9:9b39b2... 11:6e2cde...
    |         |
8:01fa22... 10:1a9d90...
    |    \    |
    |      \  |
5:46508c... 7:b6765c...
    |         |
4:1c4e8d... 6:328638...
       \     /
      3:6627f7b...
          |
      2:754b42...
          |
          |
      1:d1879f...
          |
      0:c962db...

Final notes

Each system uses a different scheme, which have different advantages and disadvantages. Odds are that I am not aware of all the relative merits of these systems yet, though I do know some. Personally, I don’t think any of them are optimal (though I admit that optimality is a somewhat relative term given the inherent trade-offs involved). Unfortunately I’m going off-topic, as I said I wouldn’t be discussing advantages and disadvantages in this post, so I’ll shut my trap here…

Happenings in the VCS world

It has been a long time since my last blog post on VCSes. I am getting back into the swing of things and will be making a few more posts. Besides, Olav doesn’t have enough to do and he wants more of my long rambling posts to digest.

The VCS world is becoming more and more interesting, even if it is also more and more frustrating. I’ll briefly point out a few things I have seen happen in the last few months that look cool, making this VCS post a little bit different than my others.

cvs

Stinking stingy CVS refuses to die…it seems to prefer slowly petrifying over the years or something. It was great a number of years ago, but there’s just so many better tools these days. However, there does appear to be a light at the end of the tunnel. The last place I am forced to use CVS (work) will finally be switching (to subversion) in a couple months. Woohoo!

svn

I haven’t seen any big changes in subversion itself (only one bug fix release has occurred). However, it looks like they are making progress on finally implementing useful merge functionality. This is interesting on a number of levels: (1) this lack of functionality was one of the big reasons subversion sometimes looks like a (very well polished) antique rather than a modern system; will the incorporation of this feature be enough to stave off some of the ongoing defections to other systems?, (2) this may be interesting for those using bzr-svn, hgsvn, or git-svn — are users of such systems going to find it even easier to use their preferred tool?, (3) the main reason svn’s dozen or so ugly renaming bugs (some of which essentially result in corrupted data) have gone almost completely unnoticed is that most are only triggered in merge operations and subversion’s current merge functionality is so primitive and problematic that hardly anyone uses it. Further, svn’s roadmap clearly lists fixing the rename problems in a different release, after the merge fixes are included. Will the extra visibility that one problem will receive due to a different problem being fixed make subversion look more problematic or less? This will be fun to watch.

On a separate note, it is interesting to see that subversion developers are considering adopting some features of distributed VCSes — sometime in the distant future. An easy to miss but interesting nugget from that email is the following:

Fortunately, we’ve pretty much agreed, IIRC, that we’re willing to punt on subdirectory detachability in working copies in order to get performance improvements.

I have often seen svn and cvs proponents argue that as one of the big advantages to those systems, yet it looks like the svn developers are willing to drop it. Very interesting indeed.

hg

Mercurial version 0.9.5 was released since I did my last round of VCS blog posts and it is on my system. hg-0.9.5 has quite a number of improvements; the one that particularly caught my eye was support for subversion as a source SCM in its convert functionality. When I first looked at mercurial, they suggested people use git-svn and then convert from git to hg. To me, that seemed to push people to just use git. It looks like this has changed.

I have often found it somewhat strange that mercurial doesn’t have more active vocal proponents. Usually one hears from the git or bzr proponents, but not so much from mercurial. Yet it has always had many of the advantages of both (and, in some ways seems to have the most svn-like UI, and would seem a more natural transition for svn converts). I guess it’s a case where having most of the advantages or capabilities of other systems (even multiple other systems) yet not clearly standing out in one particular area will rob you of the active advocates that you could otherwise have. Of course, maybe it’s like the linuxjournal reader’s choice awards phenomenon too; the noise or results that others hear may only be indicative of a certain small subset of the community.

bzr

A lot has happened in the Bazaar world. They had their big 1.0 release in mid-December and are now up to bzr-1.2. They have made impressive gains in performance, particularly with their adoption of the pack idea from git, and it appears they have at long last caught up to the leaders in the field in this area.

Near the end of last year, I corresponded about early versions of the “Main Competitors” writeups of the Why Choose Bazaar page, with Ian Clatworthy. I pointed out some advantages of bzr he hadn’t included, mentioned how some bold claims had no accompanying proof, and pointed out some places where he seemed to be unaware of capabilities of other systems or where I disagreed with some of his claims. The final versions seem to have mixed results; part of my feedback was addressed (and more was addressed in follow-ups), but other parts were not. I’m particularly puzzled by the reticence to investigate the existing capabilities of other systems and the willingness to claim features of bzr as advantages without determining whether they are actually unique. Regardless, though, while one does need to individually verify or discard each claim, the writeups are fairly impressive. I probably need to get back in touch with Ian again.

git

I’m so annoyed with Carl right now. He was the one who introduced me to git a number of years ago, and showed me some really cool things about it. I dropped it almost immediately at the time because it was way too hard to use. But, I’ve always been interested in it and made occasional attempts to tame the dragon ever since.

As many are aware, git has made huge strides towards usability in the 1.5 series, and has recently introduced automatic repacking in git-1.5.4. Because of all this work, I made diligent attempts to understand it over the last couple months. In doing so, I finally had the necessary epiphanies to feel I understand it. It turns out I was able to use it productively long before the uncomfortable feeling of I-don’t-really-understand-this-thing was finally expelled. The result? I found that there are several features of git not present in other systems that I am absolutely addicted to, but looking back on the journey I can’t say that it would be worth the effort for others to follow the same path, despite these awesome features. The thing is still too bloody hard to figure out.

One of my desires for my blog posts series was to point out how horrible the git manpages (i.e. the built in help system for git) are for new users, but I felt uncomfortable doing so until I actually understood them. I was not able to understand even the synopsis of the git-diff manpage until a couple weeks ago. And I tried. Hard. Over days, weeks, and months. I read up on reflogs, the index, git’s storage format, the git tutorial and all kinds of other documentation. I feel stupid now, because I was just missing something simple and now seemingly obvious. But from what I can tell, little should-be-obvious-but-aren’t things like this are blocking lots of people from being able to use git.

Long story short: git has become far more usable…mere mortals can actually figure the system out (a big change from earlier versions) if they have an unusually large level of patience and motivation. git has some really awesome features, but I just can’t recommend it to others in its current state.