Posts Tagged ‘revisions’

Many different kinds of revision specifiers

Monday, March 31st, 2008

Version control systems each use their own method to refer to different versions (also known as ‘revisions’) of the repository. The choice of revision specification often reflects underlying data structures, and the choice of data structures often inhibits or enables various features for the system. Additionally, the methods of displaying and using revision specifiers can also affect the ease with which users can learn and use the new system.

Unfortunately, a full comparison is beyond the scope of this post. I will concentrate on simply introducing the basics and giving a flavor for how things are layed out, which itself is a long enough topic. While conclusions could be drawn with just the data and explanations presented here, I am intentionally avoiding doing so and leaving such to possible later posts. (Besides, bloody taxes and the brain-damaged US tax code have stripped me of any time that I would need to write such additional comparisons.)

Warning: My pictoral representations for each system will be crazier and more complex than usual (and even more lopsidedly complex for some systems than others) in order to keep things short while still showing what is possible.

cvs

Method

See cvs revision numbers and cvs branching basics, particularly figure 2.4 near the end of the branching basics section.

CVS has revision identifiers that are per-file, meaning that repositories at any given time are a combination of many different revisions (one for each file). Ignoring an ugly technical detail about the special revisions 1.1.1.1 and 1.1.1, the first version of a file is numbered 1.1. The next change to the file is recorded as 1.2, the next is 1.3, and so forth. If the user wants to create a branch, based on the 1.3 version of a file, then the branched version is 1.3.2.1. Changing and committing the file on the branch results in 1.3.2.2, then 1.3.2.3, etc. A second branch also created off of 1.3 would be numbered 1.3.4 instead of 1.3.2 (with actual commits numbered 1.3.4.x).

Note that branches are named by a revision with one less number (e.g. 1.4.2 is the name of the branch with commits numbered 1.4.2.x). As such, branch names refer to the beginning of the branch. Each file is branched separately, with per-file revision numbers (it is even possible to branch some files without branching others).

Tags are aliases for a specific version number. Since revisions are per-file, a given tag may refer to different revision numbers for different files (e.g. the ‘v1.0′ tag might refer to version 1.27 of foo.c, 1.36 of bar.h, and 1.218 of foobar.py)

Uniqueness of cvs revisions is not an issue since there is only one repository.

Picture

                       (etc)
                         |
             (etc)   1.4.4.3.2.2
               |         |
            1.4.4.5  1.4.4.3.2.1
               |         |
            1.4.4.4  (1.4.4.3.2)
               |     /
               |   /
            1.4.4.3
               |
  1.4.2.2   1.4.4.2
     |         |
  1.4.2.1   1.4.4.1
     |         |
  (1.4.2)   (1.4.4)
      \       /
       \     /
        \   /
         \ /
         1.4
          |
         1.3
          |
         1.2
          |
         1.1

svn

Method

See svn revisions and working with your branch, particularly figure 4.4 (the branching of one file’s history).

svn uses global revision identifiers, with the first revision being marked as 1, the second as 2, the third as 3, etc.

Branches have an unusual implementation in subversion; they are handled by a namespacing convention: a branch is the combination of revisions within the global repository that exist within a certain namespace. Creating a new branch is done by copying an existing set of files from one namespace to another, recorded as a revision itself.

Tags (an alias for a specific version in history) don’t exist in subversion. Instead, subversion again uses a namespacing convention identical to that done for branches (thus making tags and branches indistinguishable in subversion other than the chosen names), and users are merely discouraged from committing additional changes to files within a tag namespace.

Uniqueness of svn revisions is not an issue since there is only one repository.

Technically, a revision could simultaneously modify any combination of branches and tags by simply committing to all namespaces; however, this is typically discouraged and users only have a certain namespace checked out at a time.

Picture

  trunk   branches/proj-2-22  branches/proj-2-20  tags/RELEASE_2_22_2
   24
                                                        23
                 22
   21
   20
                 19
                                     18
                                     17
   16
   15
                 14
                 13
                 12
   11
                                     10
    9
    8
    7
                                      6
    5
                                      4
    3
    2
    1

bzr

Method

See understanding bzr revision numbers and specifying bzr revisions.

bzr, like svn, uses 1, 2, 3, etc. for revision numbers. However, the revision numbers are always consecutive in a branch. Merged in changes from other branches are given 3 numbers per revision. For example, if changes were merged from a repository that has changes relative to revision 2, the changes would come into the current branch numbered 2.1.1, 2.1.2, 2.1.3, etc. If changes from more than one branch are relative to the same commit, then the middle number is used to distinguish commits from the different branches. Thus one would see another set of changes relative to commit 2 numbered as 2.2.1, 2.2.2, 2.2.3, 2.2.4, etc. (Versions of bzr older than 1.2 used more than 3 numbers in certain cases, but that is no longer true of current versions.) See the picture below to make this clearer.

Branches in bzr are done by creating separate directories (typically with their own repository), though one can set up shared repositories. Each branch will have its own numbering scheme for the revisions it stores, recording the order that the revisions entered that repository. (See below about uniqueness issues.)

Tags in bzr are an alias for a commit, and are stored as part of a branch.

Note that bzr revision numbers are not unique. If you have the same revision in two different repositories, they will not necessarily have the same revision number in both. bzr does store unique identifiers for revisions, known as revid’s (an example of which looks like Matthieu.Moy@imag.fr-20051026185030-93c7cad63ee570df), though they are not shown by default. Users can obtain these unique identifiers by passing the –show-ids flag to bzr log, and these revids can be used in place of the simpler default revision specifiers when prefixed with “revid:”.

Picture

              12
              |
              11
            / | \
          /   |  \
        /     |   \
      10    4.1.5  4.2.2
       \   /  |      |
        \ /   |      |
         9    |    4.2.1
        / \   |   /
       /   \  |  /
       8    4.1.4
       |      |
       7    4.1.3
       | \    |
       |   \  |
       6    4.1.2
       |      |
       5    4.1.1
        \   /
          4
          |
          3
          |
          2
          |
          1

Note: The revision identifiers shown in this picture are dependent on merge order; the revisions 4.1.5, 4.2.1, and 4.2.2 could instead be numbered 4.2.1, 4.1.5 and 4.1.6 respectively if the merges done to obtain revision 11 were done in a different order.

git

Method

See Understanding git history: Commits, and naming git commits.

git uses cryptographic checksums (in particular, sha1sums) of repository contents as revision identifiers. These checksums are 40-character hexadecimal strings (e.g. 621ff6759414e2a723f61b6d8fc04b9805eb0c20). Each revision also knows which revision(s) it was derived from (known as the revision’s parent(s)).

Git can be used with one branch per directory like bzr or hg, but it is more common to have branches stored within the same directory/repository (thus the reason some refer to git as a ‘branch container’). In git, branches record the revision of the most recent commit for the branch; since each commit records its parent(s), a branch consists of its most recent commit plus all ancestors of that commit. When a new commit is made on a branch, the branch just records the new revision. Tags simply record a single revision, much like branches, but tags are not advanced when additional commits are made. tags are not stored as part of a branch or in a revision controlled file, though by default tags that point to commits that are downloaded are themselves downloaded as well.

git revisions are unique by design; if you have the same revision in two different repositories, the revision name for both will be the same.

git does provide more human-meaningful ways of referring to commits, in the form of simple suffixes used to count backwards in history from the tip of a branch (or backwards from a tag or commit). This includes methods for counting relative to different parents, making the suffixes have structural meaning. However, such methods are somewhat hidden; for example, they are not shown in the output of git log. This leaves many users unaware of how to take advantage of them, if they are aware of them at all. (A simple wrapper can get them to be shown, at the cost of a little time; they could be shown at negligible time cost with an integrated solution, but none exists to my knowledge.)

Picture

           650a6f...
              |
           caf806...
          /   |   \         719b9d...
        /     |     \       /
      /       |       \   /
 75cc2c...  147c0a... acac44...
      \       |         |
        \     |         |
         8f50e6...    8147be...
         /    |     /
       /      |   /
  9b39b2... 6e2cde...
    |         |
  01fa22... 1a9d90...
    |    \    |
    |      \  |
 46508c...  b6765c...
    |         |
 1c4e8d...  328638...
       \     /
       6627f7b...
          |
       754b42...
          |    \
          |      \
       d1879f...  fba5d0...
          |
       c962db...

hg

Method

See a hg tour through history, and section 2.4.1, “Changesets, revisions, and talking to other people”.

hg uses a method that may look like a mix of the methods used by git and bzr; it has two distinct methods of referring to each revision. Like git, hg uses sha1sums to refer to revisions (though it abbreviates them to fewer characters by default). Like bzr, hg uses the numbers 1, 2, 3, etc. to refer to revisions. Thus hg has one unique method to refer to revisions and another that is simple and easily manipulatable by users. Each revision (or “changeset” in mercurial’s vocabulary) is of the form revision-number:changeset-identifier (e.g. 3:ff5d7b70a2a9).

Like bzr, branches in hg are typically done by creating separate directories (typically with their own repository). However, it also has named branches for naming branches within a repository, which are somewhat similar to git. (I have been told there are important distinctions between hg named branches and git branches, but I do not fully understand all the details; maybe someone will explain in the comments.)

mercurial has both tags and local tags, with (normal) tags being stored in an .hgtags file that is version controlled, and local tags being stored in a file that is not version controlled nor shared (cloned/pulled/pushed/etc.). Like most other systems, tags in hg are an alias for a specific commit.

The (abbreviated) sha1sum portion of hg revisions (the “changeset identifier”) is unique by design; if you have the same revision in two different repositories, the changeset identifier for both will be the same. The simple number portion of hg revisions (the “revision number”) is not unique. If you have the same revision in two different repositories, they will not necessarily have the same revision number in both.

Picture

             19:c87f92...
                |
             18:650a6f...
               |      \
        15:caf806...   \
         /     |        \
       /       |         \
      /        |          \
13:75cc2c... 14:147c0a... 17:acac44...
      \        /           |
        \     /            |
       12:8f50e6...      16:8147be...
         /    |        /
       /      |      /
9:9b39b2... 11:6e2cde...
    |         |
8:01fa22... 10:1a9d90...
    |    \    |
    |      \  |
5:46508c... 7:b6765c...
    |         |
4:1c4e8d... 6:328638...
       \     /
      3:6627f7b...
          |
      2:754b42...
          |
          |
      1:d1879f...
          |
      0:c962db...

Final notes

Each system uses a different scheme, which have different advantages and disadvantages. Odds are that I am not aware of all the relative merits of these systems yet, though I do know some. Personally, I don’t think any of them are optimal (though I admit that optimality is a somewhat relative term given the inherent trade-offs involved). Unfortunately I’m going off-topic, as I said I wouldn’t be discussing advantages and disadvantages in this post, so I’ll shut my trap here…