Many different kinds of revision specifiers

Version control systems each use their own method to refer to different versions (also known as ‘revisions’) of the repository. The choice of revision specification often reflects underlying data structures, and the choice of data structures often inhibits or enables various features for the system. Additionally, the methods of displaying and using revision specifiers can also affect the ease with which users can learn and use the new system.

Unfortunately, a full comparison is beyond the scope of this post. I will concentrate on simply introducing the basics and giving a flavor for how things are laid out, which itself is a long enough topic. While conclusions could be drawn with just the data and explanations presented here, I am intentionally avoiding doing so and leaving such to possible later posts. (Besides, bloody taxes and the brain-damaged US tax code have stripped me of any time that I would need to write such additional comparisons.)

Warning: My pictorial representations for each system will be crazier and more complex than usual (and even more lopsidedly complex for some systems than others) in order to keep things short while still showing what is possible.

cvs

Method

See cvs revision numbers and cvs branching basics, particularly figure 2.4 near the end of the branching basics section.

CVS has revision identifiers that are per-file, meaning that repositories at any given time are a combination of many different revisions (one for each file). Ignoring an ugly technical detail about the special revisions 1.1.1.1 and 1.1.1, the first version of a file is numbered 1.1. The next change to the file is recorded as 1.2, the next is 1.3, and so forth. If the user wants to create a branch, based on the 1.3 version of a file, then the branched version is 1.3.2.1. Changing and committing the file on the branch results in 1.3.2.2, then 1.3.2.3, etc. A second branch also created off of 1.3 would be numbered 1.3.4 instead of 1.3.2 (with actual commits numbered 1.3.4.x).

Note that branches are named by a revision with one less number (e.g. 1.4.2 is the name of the branch with commits numbered 1.4.2.x). As such, branch names refer to the beginning of the branch. Each file is branched separately, with per-file revision numbers (it is even possible to branch some files without branching others).
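
The numbering rules above are mechanical enough to sketch in a few lines of Python. This is only an illustration of the scheme as described here, not anything cvs itself exposes; the function names are my own, and the special 1.1.1 vendor-branch case is ignored.

```python
from typing import Optional

def next_revision(rev: str) -> str:
    """Revision that follows `rev` on the same line of development (1.3 -> 1.4)."""
    parts = rev.split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts)

def branch_number(base_rev: str, nth_branch: int) -> str:
    """Number of the nth branch (1-based) created off `base_rev`.

    Branches off a revision get even suffixes: 2, 4, 6, ...
    """
    return f"{base_rev}.{2 * nth_branch}"

def first_revision_on(branch: str) -> str:
    """First commit made on a branch (1.3.2 -> 1.3.2.1)."""
    return f"{branch}.1"

def branch_of(rev: str) -> Optional[str]:
    """Branch a revision lives on: drop the last number; None on the trunk."""
    parts = rev.split(".")
    return ".".join(parts[:-1]) if len(parts) > 2 else None

print(next_revision("1.3"))          # 1.4
print(branch_number("1.3", 2))       # 1.3.4
print(first_revision_on("1.3.2"))    # 1.3.2.1
print(branch_of("1.4.2.2"))          # 1.4.2
```

Remember that all of this is per-file: running the same operations on another file in the same repository generally produces different numbers.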

Tags are aliases for a specific version number. Since revisions are per-file, a given tag may refer to different revision numbers for different files (e.g. the ‘v1.0’ tag might refer to version 1.27 of foo.c, 1.36 of bar.h, and 1.218 of foobar.py).

Uniqueness of cvs revisions is not an issue since there is only one repository.

Picture

                       (etc)
                         |
             (etc)   1.4.4.3.2.2
               |         |
            1.4.4.5  1.4.4.3.2.1
               |         |
            1.4.4.4  (1.4.4.3.2)
               |     /
               |   /
            1.4.4.3
               |
  1.4.2.2   1.4.4.2
     |         |
  1.4.2.1   1.4.4.1
     |         |
  (1.4.2)   (1.4.4)
      \       /
       \     /
        \   /
         \ /
         1.4
          |
         1.3
          |
         1.2
          |
         1.1

svn

Method

See svn revisions and working with your branch, particularly figure 4.4 (the branching of one file’s history).

svn uses global revision identifiers, with the first revision being marked as 1, the second as 2, the third as 3, etc.

Branches have an unusual implementation in subversion; they are handled by a namespacing convention: a branch is the combination of revisions within the global repository that exist within a certain namespace. Creating a new branch is done by copying an existing set of files from one namespace to another, recorded as a revision itself.

Tags (an alias for a specific version in history) don’t exist in subversion. Instead, subversion again uses a namespacing convention identical to that done for branches (thus making tags and branches indistinguishable in subversion other than the chosen names), and users are merely discouraged from committing additional changes to files within a tag namespace.

Uniqueness of svn revisions is not an issue since there is only one repository.

Technically, a revision could simultaneously modify any combination of branches and tags by simply committing to all namespaces; however, this is typically discouraged and users only have a certain namespace checked out at a time.
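
To make the namespacing idea concrete, here is a toy model in Python. It is a sketch under simplifying assumptions, not how subversion is implemented: the ToyRepo class and its methods are hypothetical, each revision stores a full snapshot, and copies are made eagerly (real svn copies are cheap). The point is only that every commit, including the copy that creates a branch or tag, bumps one global counter.

```python
class ToyRepo:
    """Toy model: one global revision counter, branches/tags are just paths."""

    def __init__(self):
        # Revision 0 is the empty tree; each entry maps path -> content.
        self.revisions = [{}]

    @property
    def head(self):
        return len(self.revisions) - 1

    def commit(self, changes):
        """Record a new global revision containing `changes` (path -> content)."""
        tree = dict(self.revisions[-1])
        tree.update(changes)
        self.revisions.append(tree)
        return self.head

    def copy(self, src, dst):
        """Create a branch or tag: copy everything under `src` to `dst`.

        The copy is itself recorded as a revision.
        """
        tree = self.revisions[-1]
        copied = {dst + path[len(src):]: content
                  for path, content in tree.items()
                  if path.startswith(src + "/")}
        return self.commit(copied)


repo = ToyRepo()
repo.commit({"trunk/foo.c": "v1"})                    # revision 1
repo.copy("trunk", "branches/proj-2-20")              # revision 2 (a branch)
repo.commit({"trunk/foo.c": "v2"})                    # revision 3
repo.commit({"branches/proj-2-20/foo.c": "fix"})      # revision 4
repo.copy("branches/proj-2-20", "tags/RELEASE_2_22_2")  # revision 5 (a tag)
print(repo.head)                                      # 5
```

Nothing in the model distinguishes the tag from the branch except its name, which mirrors the situation described above.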

Picture

  trunk   branches/proj-2-22  branches/proj-2-20  tags/RELEASE_2_22_2
   24
                                                        23
                 22
   21
   20
                 19
                                     18
                                     17
   16
   15
                 14
                 13
                 12
   11
                                     10
    9
    8
    7
                                      6
    5
                                      4
    3
    2
    1

bzr

Method

See understanding bzr revision numbers and specifying bzr revisions.

bzr, like svn, uses 1, 2, 3, etc. for revision numbers. However, the revision numbers are always consecutive in a branch. Merged in changes from other branches are given 3 numbers per revision. For example, if changes were merged from a repository that has changes relative to revision 2, the changes would come into the current branch numbered 2.1.1, 2.1.2, 2.1.3, etc. If changes from more than one branch are relative to the same commit, then the middle number is used to distinguish commits from the different branches. Thus one would see another set of changes relative to commit 2 numbered as 2.2.1, 2.2.2, 2.2.3, 2.2.4, etc. (Versions of bzr older than 1.2 used more than 3 numbers in certain cases, but that is no longer true of current versions.) See the picture below to make this clearer.
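
As a small illustration of how to read these numbers, the following Python sketch splits a revision number into its parts. The function name and the returned field names are my own invention for this post, and it only handles the one-number and three-number forms used by current versions of bzr.

```python
def describe_revno(revno: str) -> dict:
    """Split a bzr-style revision number into its meaningful parts."""
    parts = [int(p) for p in revno.split(".")]
    if len(parts) == 1:
        # A plain number: a commit on this branch's own line of history.
        return {"kind": "mainline", "number": parts[0]}
    if len(parts) == 3:
        # base.branch.sequence: the sequence-th commit on the branch-th
        # line of merged-in development that started from revision `base`.
        base, branch, sequence = parts
        return {"kind": "merged", "base": base,
                "branch": branch, "sequence": sequence}
    raise ValueError(f"unexpected revision number: {revno!r}")


print(describe_revno("7"))
# {'kind': 'mainline', 'number': 7}
print(describe_revno("2.1.3"))
# {'kind': 'merged', 'base': 2, 'branch': 1, 'sequence': 3}
```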

Branches in bzr are done by creating separate directories (typically with their own repository), though one can set up shared repositories. Each branch will have its own numbering scheme for the revisions it stores, recording the order that the revisions entered that repository. (See below about uniqueness issues.)

Tags in bzr are an alias for a commit, and are stored as part of a branch.

Note that bzr revision numbers are not unique. If you have the same revision in two different repositories, they will not necessarily have the same revision number in both. bzr does store unique identifiers for revisions, known as revids (an example of which looks like Matthieu.Moy@imag.fr-20051026185030-93c7cad63ee570df), though they are not shown by default. Users can obtain these unique identifiers by passing the --show-ids flag to bzr log, and these revids can be used in place of the simpler default revision specifiers when prefixed with “revid:”.

Picture

              12
              |
              11
            / | \
          /   |  \
        /     |   \
      10    4.1.5  4.2.2
       \   /  |      |
        \ /   |      |
         9    |    4.2.1
        / \   |   /
       /   \  |  /
       8    4.1.4
       |      |
       7    4.1.3
       | \    |
       |   \  |
       6    4.1.2
       |      |
       5    4.1.1
        \   /
          4
          |
          3
          |
          2
          |
          1

Note: The revision identifiers shown in this picture are dependent on merge order; the revisions 4.1.5, 4.2.1, and 4.2.2 could instead be numbered 4.2.1, 4.1.5 and 4.1.6 respectively if the merges done to obtain revision 11 were done in a different order.

git

Method

See Understanding git history: Commits, and naming git commits.

git uses cryptographic checksums (in particular, sha1sums) of repository contents as revision identifiers. These checksums are 40-character hexadecimal strings (e.g. 621ff6759414e2a723f61b6d8fc04b9805eb0c20). Each revision also knows which revision(s) it was derived from (known as the revision’s parent(s)).
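
For a feel of where such an identifier comes from, here is a minimal Python sketch. Git hashes each stored object as a short header (the object type, the body length, and a NUL byte) followed by the body. The blob case below matches what git computes for file contents; the two "commit" bodies are a simplified stand-in of my own rather than git's real commit format, and are there only to show that changing the recorded parent changes the identifier.

```python
import hashlib

def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 of '<kind> <length>\\0' followed by the object body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

# The id of an empty file's contents, a value you will see in many repositories.
print(git_object_id("blob", b""))
# e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Simplified, hypothetical commit bodies: same message, different parent.
commit_a = b"parent 1111\n\nFix typo\n"
commit_b = b"parent 2222\n\nFix typo\n"
print(git_object_id("commit", commit_a) != git_object_id("commit", commit_b))
# True
```

Because the parent identifiers are part of what gets hashed, a revision identifier effectively covers the entire history leading up to it.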

Git can be used with one branch per directory like bzr or hg, but it is more common to have branches stored within the same directory/repository (thus the reason some refer to git as a ‘branch container’). In git, branches record the revision of the most recent commit for the branch; since each commit records its parent(s), a branch consists of its most recent commit plus all ancestors of that commit. When a new commit is made on a branch, the branch just records the new revision. Tags simply record a single revision, much like branches, but tags are not advanced when additional commits are made. Tags are not stored as part of a branch or in a revision controlled file, though by default tags that point to commits that are downloaded are themselves downloaded as well.
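
The "branch is just a pointer" idea can be sketched with a couple of dictionaries in Python. This is a toy model with made-up single-letter revision names instead of sha1sums, and the helper names are mine, not git's.

```python
# Each revision records its parent(s); 'e' is a merge of 'c' and 'd'.
parents = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"], "e": ["c", "d"]}
branches = {"master": "e", "topic": "d"}   # branch name -> most recent commit
tags = {"v1.0": "b"}                       # tag name -> fixed commit

def history(tip, parents):
    """All revisions reachable from `tip`: the tip plus every ancestor."""
    seen, stack = set(), [tip]
    while stack:
        rev = stack.pop()
        if rev not in seen:
            seen.add(rev)
            stack.extend(parents[rev])
    return seen

def commit_on(branch, new_rev):
    """Make a commit: record its parent, then advance the branch pointer."""
    parents[new_rev] = [branches[branch]]
    branches[branch] = new_rev

print(sorted(history(branches["topic"], parents)))   # ['a', 'b', 'd']
commit_on("topic", "f")
print(branches["topic"], tags["v1.0"])               # f b
```

The branch moved to the new commit while the tag stayed where it was, which is the whole difference between the two in this model.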

git revisions are unique by design; if you have the same revision in two different repositories, the revision name for both will be the same.

git does provide more human-meaningful ways of referring to commits, in the form of simple suffixes used to count backwards in history from the tip of a branch (or backwards from a tag or commit). This includes methods for counting relative to different parents, making the suffixes have structural meaning. However, such methods are somewhat hidden; for example, they are not shown in the output of git log. This leaves many users unaware of how to take advantage of them, if they are aware of them at all. (A simple wrapper can get them to be shown, at the cost of a little time; they could be shown at negligible time cost with an integrated solution, but none exists to my knowledge.)
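
To show what "structural meaning" looks like, here is a rough Python sketch that resolves such suffixes against a toy history. In git, ~N means "follow the first parent N times" and ^N means "take the Nth parent" (N defaults to 1 for both). The resolve function, the refs table, and the single-letter revision names are all hypothetical and only mimic that behaviour.

```python
import re

# Toy history: 'e' is a merge whose first parent is 'c' and second is 'd'.
parents = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"],
           "e": ["c", "d"], "f": ["e"]}
refs = {"HEAD": "f"}

SUFFIX = re.compile(r"([~^])(\d*)")

def resolve(spec, refs, parents):
    """Resolve a name followed by ~N / ^N suffixes to a revision."""
    name = re.match(r"[^~^]+", spec)
    if name is None:
        raise ValueError(f"no starting point in {spec!r}")
    rev = refs[name.group(0)]
    rest = spec[name.end():]
    pos = 0
    while pos < len(rest):
        token = SUFFIX.match(rest, pos)
        if token is None:
            raise ValueError(f"cannot parse {spec!r}")
        op, digits = token.groups()
        n = int(digits) if digits else 1
        if op == "~":
            for _ in range(n):          # walk back along first parents
                rev = parents[rev][0]
        elif n > 0:
            rev = parents[rev][n - 1]   # pick the nth parent (^0 is a no-op)
        pos = token.end()
    return rev

print(resolve("HEAD^", refs, parents))      # e
print(resolve("HEAD~2", refs, parents))     # c
print(resolve("HEAD^^2", refs, parents))    # d
print(resolve("HEAD~1^2~1", refs, parents)) # b
```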

Picture

           650a6f...
              |
           caf806...
          /   |   \         719b9d...
        /     |     \       /
      /       |       \   /
 75cc2c...  147c0a... acac44...
      \       |         |
        \     |         |
         8f50e6...    8147be...
         /    |     /
       /      |   /
  9b39b2... 6e2cde...
    |         |
  01fa22... 1a9d90...
    |    \    |
    |      \  |
 46508c...  b6765c...
    |         |
 1c4e8d...  328638...
       \     /
       6627f7b...
          |
       754b42...
          |    \
          |      \
       d1879f...  fba5d0...
          |
       c962db...

hg

Method

See a hg tour through history, and section 2.4.1, “Changesets, revisions, and talking to other people”.

hg uses a method that may look like a mix of the methods used by git and bzr; it has two distinct methods of referring to each revision. Like git, hg uses sha1sums to refer to revisions (though it abbreviates them to fewer characters by default). Like bzr, hg uses simple sequential integers (0, 1, 2, etc.) to refer to revisions. Thus hg has one unique method to refer to revisions and another that is simple and easily manipulatable by users. Each revision (or “changeset” in mercurial’s vocabulary) is of the form revision-number:changeset-identifier (e.g. 3:ff5d7b70a2a9).

Like bzr, branches in hg are typically done by creating separate directories (typically with their own repository). However, it also has named branches for naming branches within a repository, which are somewhat similar to git. (I have been told there are important distinctions between hg named branches and git branches, but I do not fully understand all the details; maybe someone will explain in the comments.)

mercurial has both tags and local tags, with (normal) tags being stored in an .hgtags file that is version controlled, and local tags being stored in a file that is not version controlled nor shared (cloned/pulled/pushed/etc.). Like most other systems, tags in hg are an alias for a specific commit.

The (abbreviated) sha1sum portion of hg revisions (the “changeset identifier”) is unique by design; if you have the same revision in two different repositories, the changeset identifier for both will be the same. The simple number portion of hg revisions (the “revision number”) is not unique. If you have the same revision in two different repositories, they will not necessarily have the same revision number in both.
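
A toy Python model makes the distinction easy to see. The ToyHgRepo class is hypothetical and ignores real-world details (such as parents needing to arrive before their children), and the changeset identifiers are made-up strings; it just numbers changesets in the order they arrive in each repository.

```python
class ToyHgRepo:
    """Toy model: local revision numbers reflect local arrival order."""

    def __init__(self):
        self.order = []   # changeset identifiers, in order of arrival

    def add(self, changeset_id):
        if changeset_id not in self.order:
            self.order.append(changeset_id)

    def label(self, changeset_id):
        """Format as revision-number:changeset-identifier."""
        return f"{self.order.index(changeset_id)}:{changeset_id}"


# Made-up identifiers standing in for abbreviated sha1sums.
alice, bob = ToyHgRepo(), ToyHgRepo()
for cid in ["aaa111", "bbb222", "ccc333"]:
    alice.add(cid)
for cid in ["aaa111", "ccc333", "bbb222"]:   # same changesets, other order
    bob.add(cid)

print(alice.label("bbb222"))   # 1:bbb222
print(bob.label("bbb222"))     # 2:bbb222
```

The part after the colon is the same in both repositories, so that is the part to use when talking to other people; the part before the colon is only a local convenience.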

Picture

             19:c87f92...
                |
             18:650a6f...
               |      \
        15:caf806...   \
         /     |        \
       /       |         \
      /        |          \
13:75cc2c... 14:147c0a... 17:acac44...
      \        /           |
        \     /            |
       12:8f50e6...      16:8147be...
         /    |        /
       /      |      /
9:9b39b2... 11:6e2cde...
    |         |
8:01fa22... 10:1a9d90...
    |    \    |
    |      \  |
5:46508c... 7:b6765c...
    |         |
4:1c4e8d... 6:328638...
       \     /
      3:6627f7b...
          |
      2:754b42...
          |
          |
      1:d1879f...
          |
      0:c962db...

Final notes

Each system uses a different scheme, each with its own advantages and disadvantages. Odds are that I am not aware of all the relative merits of these systems yet, though I do know some. Personally, I don’t think any of them are optimal (though I admit that optimality is a somewhat relative term given the inherent trade-offs involved). Unfortunately I’m going off-topic, as I said I wouldn’t be discussing advantages and disadvantages in this post, so I’ll shut my trap here…

9 thoughts on “Many different kinds of revision specifiers”

  1. The crypto ones would be less unwieldy if encoded as base-52 or something (use all upper and lowercase consonants and digits, no vowels so no words are spelled). Though, pretty hard to switch at this point.

  2. Love your work on this topic – keep it up!

    BTW, I was going to send you a link to Chris Ball’s “Bugs Everywhere,” but I see you’ve already commented there 🙂 Interesting stuff.

  3. For your description of Bazaar, you’ve got the terminology a bit wrong: a “revision number” is not a “revision”. Revisions in Bazaar branches are unique. On the other hand, a particular revision number may map to different revisions in different branches. A revision ID uniquely identifies a revision globally, similar to a git or hg commit checksum.

  4. Hg branches are just a nickname for a particular flow of history, when you create a new changeset, it will by default inherit the nickname of the parent.
    You have to keep in mind that this nickname is immutable and will stay in the history forever.

    Afaik a git branch is just a pointer to a particular head; no history is kept. There have been some talks in the Mercurial community to add “bookmarks” that would behave similarly.

    I think that the special naming for parents is quite nice; hg provides it with an extension (parentspec) for power users who want it.

    Another note about hg is the possibility to use any non-ambiguous prefix of the hash as an identifier. I don’t know if other SCMs use this, but ‘.’ can be used to identify the version you checked out (working copy version).

  5. First, descriptions of commits counting from branch tips backwards (e.g. HEAD^, or HEAD~5, or master~10^2~12) are transient; only those using tags are constant, but not all revisions have such a description. Second, you can get them in git-log output using git-name-rev as a filter (see documentation), or the --decorate option to git-log.

    BTW, there is yet another way to refer to a revision (besides a shortened sha-1), namely the one returned by git-describe, in the form of closest tag + forward counting + shortened sha-1.

    Note also that all numbering schemes either need central numbering authority (which repository we take those numbers from), or have to be local for a repository.

  6. Havoc, I disagree. Right now Git uses 4 bits/char = 40 chars. Even if you could encode 6 bits/char (base64), you’d still have a 28 character identifier. That’s still too long to type by hand.

    So, I don’t see any reason to introduce all the additional complexity of some sort of custom textual encoding. Either way, the sha1 is unreadable and you’ll never type more than the first 5 chars of it.

  7. Thomas: The leading one is just ugly legacy cruft, AFAICT. I don’t know exactly how it came about, but it is technically possible to get a leading 2 or 3 or whatever. If you google on ‘”Assigning revisions” cederqvist’ you’ll find some links talking about how it’s just convention and the -r option to cvs commit allows you to change the leading revision (though it apparently has some gotchas and I think I remember reading some people advising against it for various reasons). Another good read on the matter is http://www.eyrie.org/~eagle/notes/cvs/revisions.html.

    Havoc: I kind of like having cryptographic checksums around for unique identifiers (they seem cooler than bzr revid’s, IMO); I just think an easily usable and manipulatable identifier is _also_ needed and that it should be just as prominent. I think hg had the right idea here, though I’d rather have something more meaningful and structural than simple 0…n.

    Gabriel: Thanks. 🙂

    James: I modified the post to fix the mistake (changing a couple “revisions” to “revision numbers”); let me know if I still missed anything. I did specify that revids were unique in the original text; was I not clear enough? Thanks for pointing out the issues.

    tonfa: hg stores branch names under revision control too?!? Um, I guess that explains some comments I heard. Interesting; thanks for the explanation.

    Jakub:

    Cool, thanks for the pointer about git-describe. What does the shortened sha1 output by git-describe refer to? Also, I like how you put your numbering schemes comment; I should probably quote that in my post somewhere.

    Should I have said more about the commit descriptions in git? I’m quite familiar with the transiency issues and git-name-rev (see http://www.gnome.org/~newren/eg/git-eg-differences.html#log). In fact, I don’t see the transience issue as a problem at all but a benefit (it makes it a lot easier to refer to “2 commits ago” than can be done in other systems where I first have to find out the current revision number) and it’s really just an extension of the idea of hg and bzr to use easy-to-use revision numbers that are only valid locally.

    My issue with git here isn’t that git doesn’t have facilities for users to obtain useful information, it’s that it doesn’t bother helping users obtain it and learn the system. Personally, I don’t think the --decorate option to git-log is all that useful (except maybe in special circumstances), and even the documentation for git-name-rev suggests a non-optimal use. But even if --decorate was generally useful or the example command in the git-name-rev doc was the optimal one, the problem is that git passes up one of the best opportunities to give users easily manipulatable versions, teach them revision specifiers, and give them a small picture of the structure of the commits. While I’ve railed on git for being user-hostile, I really have found that git is quite close to being a friendly system like the others; it really doesn’t take that much to fix it up. A few small changes here and there (plus filling in a huge documentation gap) really does transform it from user-hostile to user-friendly.

  8. By the way, all graphical history viewers (gitk and qgit, probably also gitnub, gitview and giggle) show branches (heads) and tags markers along revision list (revision graph).

    Also, git-show-branch uses the “transient” counting-backwards-from-branches notation; unfortunately it is one of the lesser-known commands, and its output is a bit cryptic. It would be nice if git-log had a switch which would function as if git-name-rev was specified as a filter; the --decorate option only adds ref (heads and tags) markers in a way similar to gitk, qgit and gitweb.

    As to git-describe output: by default it looks like v1.5.5-rc2-1-ge95d75d, which means 1 commit since the version tagged v1.5.5-rc2, with the shortened sha1 being e95d75d (with 7 characters it should be unique in most repositories). The sha-1 is needed because with multiple branches there can be, and usually is, more than one commit (revision) that is 1 commit after a given tag.

    As to being user-friendly… git still shows its origin as a bunch of low level “plumbing” tools for hackers to use to build their own SCM scripts; it has much, much improved since.
