States in Version Control Systems

Elijah has been writing an interesting series of articles comparing different version control systems. While the previous articles have been very informative, I think the latest one was a bit muddled. What follows is an expanded version of my comment on that article.

Elijah starts by making an analogy between text editors and version control systems, which I think is quite a useful analogy. When working with a text editor, there is a base version of the file on disk, and the version you are currently working on which will become the next saved version.

This does map quite well to the concepts of most VCS's. You have a working copy that starts out identical to a base tree from the branch you are editing. You make local changes and eventually commit, creating a new base tree for future edits.

In addition to these two "states", Elijah goes on to list three more states that are actually orthogonal to the original two. These additional states refer to certain categorisations of files within the working copy, rather than particular versions of files or trees.

Rather than simplifying things, I believe that mingling the two concepts together is more likely to cause confusion. I think this is evident from the fact that the additional states do not fit the analogy we started with.

Versioned and Unversioned Files

If you are going to use a version control system seriously, it is worth understanding how files within a working copy are managed. Rather than thinking of a flat list of possible states, I think it is helpful to think of a hierarchy of categories. The most basic categorisation is whether a file is versioned or not.

Versioned files are those whose state will be saved when committing a new version of the tree. Conversely, unversioned files exist in the working copy but are not recorded when committing new versions of the tree.

This concept does not map very well to the original text editor analogy.
If text editors did support such a feature, it would be the ability to add paragraphs to the document that do not get stored to disk when you save, but would persist inside the editor.

Types of Versioned Files

There are various ways to categorise versioned files, but here are some fairly generic ones that fit most VCS's:

- unchanged
- modified
- added
- removed

Each of these categorisations is relative to the base tree for the working copy. The modified category contains both files whose contents have changed and files whose metadata has changed (e.g. files that have been renamed).

The removed category is interesting because files in this category don't actually exist in the working copy. That said, the VCS knows that such files did exist, so it knows to delete the files when committing the next version of the tree.

Types of Unversioned Files

There are two primary categories for unversioned files:

- ignored
- unknown

The ignored category consists of unversioned files that the VCS knows the user does not want…
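
The hierarchy of categories described above can be sketched in a few lines of Python. This is only an illustration and not how any particular VCS implements it: the `classify` function, its arguments and the sample data are all hypothetical, the ignore rules are assumed to be simple glob patterns, and metadata changes such as renames are left out.

```python
import fnmatch


def classify(path, base_tree, versioned, on_disk, ignore_patterns):
    """Return (top-level category, sub-category) for one path.

    base_tree:       dict of path -> content in the base tree (hypothetical model)
    versioned:       set of paths the working copy currently tracks
    on_disk:         dict of path -> content for files present in the working copy
    ignore_patterns: glob patterns for files the user does not want versioned
    """
    # Removed files are in the base tree but no longer tracked.
    if path in base_tree and path not in versioned:
        return ("versioned", "removed")
    if path in versioned:
        if path not in base_tree:
            return ("versioned", "added")
        if on_disk.get(path) != base_tree[path]:
            return ("versioned", "modified")
        return ("versioned", "unchanged")
    # Everything else exists on disk but is not recorded on commit.
    if any(fnmatch.fnmatch(path, pattern) for pattern in ignore_patterns):
        return ("unversioned", "ignored")
    return ("unversioned", "unknown")


# Made-up sample data for illustration only.
base_tree = {"README": "v1", "main.c": "int x;", "old.c": "gone"}
versioned = {"README", "main.c", "new.c"}
on_disk = {"README": "v1", "main.c": "int y;", "new.c": "", "main.o": "", "notes.txt": ""}
ignore_patterns = ["*.o"]

for name in ["README", "main.c", "new.c", "old.c", "main.o", "notes.txt"]:
    print(name, classify(name, base_tree, versioned, on_disk, ignore_patterns))
```

Note that the first test is whether the file is versioned at all; the finer categories only make sense once that question has been answered, which is the point of treating them as a hierarchy rather than a flat list.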

Signed Revisions with Bazaar

One useful feature of Bazaar is the ability to cryptographically sign revisions. I was discussing this with Ryan on IRC, and thought I'd write up some of the details as they might be useful to others.

Anyone who remembers the past security incidents on GNOME and Debian servers should be able to understand the benefits of being able to verify the integrity of a source code repository after such an incident. Rather than requiring all revisions made since the last known safe backup to be examined, much of the verification could be done mechanically.

Turning on Revision Signing

The first thing you'll need to do is get a PGP key and configure GnuPG to use it. The GnuPG handbook is a good reference on doing this. As the aim is to provide some assurance that the revisions you publish were really made by you, it'd be good to get the key signed by someone.

Once that is done, it is necessary to configure Bazaar to sign new revisions. The easiest way to do this is to edit ~/.bazaar/bazaar.conf to look something like this:

    [DEFAULT]
    email = My Name <me@example.com>
    create_signatures = always

Now when you run "bzr commit", a signature for the new revision will be stored in the repository. With this configuration change, you will be prompted for your pass phrase when making commits. If you'd prefer not to enter it repeatedly, there are a few options available:

- install gpg-agent, and use it to remember your pass phrase in the same way you use ssh-agent.
- install the gnome-gpg wrapper, which lets you remember your pass phrase in your GNOME keyring. To use gnome-gpg, you will need to add an additional configuration value: "gpg_signing_command = gnome-gpg".

Signatures are transferred along with revisions when you push or pull a branch, perform merges, etc.

How Does It Work?

So what does the signature look like, and what does it cover? There is no command for printing out the signatures, but we can access them using bzrlib.
As an example, let's look at the signature on the head revision of one of my branches:

    >>> from bzrlib.branch import Branch
    >>> b = Branch.open('http://bazaar.launchpad.net/~jamesh/storm/reconnect')
    >>> b.last_revision()
    'james.henstridge@canonical.com-20070920110018-8e88x25tfr8fx3f0'
    >>> print b.repository.get_signature_text(b.last_revision())
    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA1

    bazaar-ng testament short form 1
    revision-id: james.henstridge@canonical.com-20070920110018-8e88x25tfr8fx3f0
    sha1: 467b78c3f8bfe76b222e06c71a8f07fc376e0d7b
    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.4.6 (GNU/Linux)

    iD8DBQFG8lMHAa+T2ZHPo00RAsqjAJ91urHiIcu4Bim7y1tc5WtR+NjvlACgtmdM
    9IC0rtNqZQcZ+GRJOYdnYpA=
    =IONs
    -----END PGP SIGNATURE-----
    >>>

If we save this signature to a file, we can verify it with a command like "gpg --verify signature.txt" to prove that it was made using my PGP key. Looking at the signed text, we see three lines:

- An identifier for the checksum algorithm. This is included to future-proof old signatures should the need arise to alter the checksum algorithm at a later date.
- The revision ID that the signature applies to. Note that this is the full globally unique identifier rather than the shorter numeric identifiers that are only unique in the context of an individual branch.
- The checksum, in SHA1 form. For the…
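
As a rough illustration of the three-line layout described above, the signed text can be split into its parts with a few lines of Python. The `parse_short_testament` helper is hypothetical and is not part of bzrlib; it only splits the text and does not check the PGP signature (that is still a job for "gpg --verify").

```python
def parse_short_testament(text):
    """Split the three-line signed text into its parts.

    Hypothetical helper for illustration; not a bzrlib API.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if len(lines) != 3:
        raise ValueError("expected exactly three non-empty lines")
    format_id, revision_line, checksum_line = lines
    rev_key, _, revision_id = revision_line.partition(": ")
    sum_key, _, checksum = checksum_line.partition(": ")
    if rev_key != "revision-id" or sum_key != "sha1":
        raise ValueError("unexpected field names")
    return {"format": format_id, "revision_id": revision_id, "sha1": checksum}


# The signed text from the example session above.
SIGNED_TEXT = """\
bazaar-ng testament short form 1
revision-id: james.henstridge@canonical.com-20070920110018-8e88x25tfr8fx3f0
sha1: 467b78c3f8bfe76b222e06c71a8f07fc376e0d7b
"""

fields = parse_short_testament(SIGNED_TEXT)
for key, value in fields.items():
    print(key, "=", value)
```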

Bazaar bundles as part of a review process

In my previous article, I outlined Bazaar's bundle feature. This article describes how the Bazaar developers use bundles as part of their development and code review process.

Proposed changes to Bazaar are generally posted as patches or bundles to the development mailing list. Each change is discussed on the mailing list (often going through a number of iterations), and ultimately approved or rejected by the core developers. To aid in managing these patches, Aaron Bentley (one of the developers) wrote a tool called Bundle Buggy.

Bundle Buggy watches messages sent to the mailing list, checking for messages containing patches or bundles. It then creates an entry on the web site displaying the patch, and lets developers add comments (which get forwarded to the mailing list).

Now while Bundle Buggy can track plain patches, a number of its time saving features only work for bundles:

- Automatic rejection of superseded patches: when working on a feature, it is common to go through a number of iterations. When going through the list of pending changes, the developers don't want to see all the old versions. Since a bundle describes a Bazaar branch, and it is trivial to check whether one branch is an extension of another, Bundle Buggy can tell which bundles are obsolete and remove them from the list.
- Automatically mark merged bundles as such: the canonical way to know that a patch has been accepted is for it to be merged to mainline. Each Bazaar revision has a globally unique identifier, so we can easily check to see if the head revision of the bundle is in the ancestry of mainline. When this happens, Bundle Buggy automatically marks the bundle as merged.

Using these techniques the list of pending bundles is kept under control.

Further Possibilities

Of course, these aren't the only things that can be done to save time in the review process. Another useful idea is to automatically try to merge pending bundles or branches to see if they can still be merged without conflicts.
This can be used as a way to put the ball back in the contributor's court, obligating them to fix the problem before the branch can be reviewed.

This sort of automation is not limited to projects using a mailing list for code review. The same techniques could be applied to a robot that scanned bug reports in the bug tracker (e.g. Bugzilla) for bundles, and updated their status accordingly.
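
The two ancestry checks that make the automatic bookkeeping possible can be sketched as follows. This is a minimal illustration that assumes the revision graph is available as a plain dictionary mapping each revision ID to its parent IDs; the function names and the sample graph are made up, and this is not Bundle Buggy's actual code.

```python
def ancestry(graph, head):
    """Return the set of revision IDs reachable from head via parent links."""
    seen = set()
    stack = [head]
    while stack:
        revision = stack.pop()
        if revision in seen:
            continue
        seen.add(revision)
        stack.extend(graph.get(revision, ()))
    return seen


def is_merged(graph, bundle_head, mainline_head):
    """A bundle counts as merged once its head is in mainline's ancestry."""
    return bundle_head in ancestry(graph, mainline_head)


def is_superseded(graph, old_head, new_head):
    """An older bundle is superseded if a newer bundle extends it."""
    return old_head != new_head and old_head in ancestry(graph, new_head)


# Hypothetical revision graph: revision ID -> list of parent revision IDs.
graph = {
    "base": [],
    "fix-1": ["base"],             # first iteration of a proposed change
    "fix-2": ["fix-1"],            # second iteration, built on the first
    "main-1": ["base"],            # mainline before the merge
    "main-2": ["main-1", "fix-2"], # mainline after merging the change
}

print(is_superseded(graph, "fix-1", "fix-2"))  # older bundle extended by newer one
print(is_merged(graph, "fix-2", "main-1"))     # not yet in mainline
print(is_merged(graph, "fix-2", "main-2"))     # merged
```

Both checks rely only on revision identifiers being globally unique, which is why they work for bundles but not for plain patches.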

Bazaar Bundles

This article follows on from the series of tutorials on using Bazaar that I have neglected for a while. This article is about the bundle feature of Bazaar. Bundles are to Bazaar branches what patches are to tarballs or plain source trees.

Context/unified diffs and the patch utility are arguably among the most important inventions that enable distributed development:

- The patch is a self contained text file, making it easy to send as an email attachment or attach to a bug report.
- The size of the patch is proportional to the size of the changes rather than the size of the source tree. So submitting a one line fix to the Linux kernel is as easy as a one line fix for a small one person project.
- Even if the destination source tree has moved forward since the patch was created, the patch utility does a decent job of applying the changes using heuristics to match the surrounding context. Human intervention is only needed if the edits are to the same section of code.
- As patches are human readable text files, they are a convenient form to review the code changes.

Of course, patches do have their limitations:

- The unified diff format doesn't convey file moves, instead showing the entire file content being removed and then added again. If the file was changed in addition to being moved, the change can easily be missed when reviewing the patch.
- Changes to binary files are omitted from the patch. While we can't expect such changes to be represented in a human readable form, it'd be nice for them to be represented in a way that they can be applied at the other end.
- The patch doesn't record any intermediate steps in the creation of the change. This can be worked around by sending a sequence of patches that each build on the previous one, but this requires a fair bit of attentiveness on the part of the patch creator.
- If the project in question is using some form of version control, the changes in the patch will likely be attributed to the person who applied the patch rather than the person who made the patch.

Using distributed version control solves these limitations, but simply publishing a branch and telling someone to pull from it does not provide all the benefits of a patch. For one, the person reviewing the changes needs to be online to merge the branch and evaluate the changes. Second, the contributor of the change needs somewhere to host the branch. Even though finding a place to host the branch may not be difficult (for example, anyone can host their branches on Launchpad), uploading the branch may be more effort than the contributor cares for (uploading a branch the size of the Linux kernel will take a while, for instance). That branch would need to remain available until the changes were accepted.

For Bazaar, bundles provide a solution to this problem. A bundle is effectively a "branch diff", which…

FM Radio in Rhythmbox – The Code

Previously, I posted about the FM radio plugin I was working on. I just posted the code to bug 168735. A few notes about the implementation:

- The code only supports Video4Linux 2 radio tuners (since that’s the interface my device supports, and the V4L1 compatibility layer doesn’t work for it). It should be possible to port it to support both protocols if someone is interested.
- It does not pass the audio through the GStreamer pipeline. Instead, you need to configure your mixer settings to pass the audio through (e.g. unmute the Line-in source and set the volume appropriately). It plugs in a GStreamer source that generates silence to work with the rest of the Rhythmbox infrastructure. This does mean that the volume control and visualisations won’t work.
- No properties dialog yet. If you want to set titles on the stations, you’ll need to edit rhythmdb.xml directly at the moment.
- The code assumes that the radio device is /dev/radio0.

Other than that, it all works quite well (I've been using it for the last few weeks).

Development

I developed this plugin in Bazaar using Jelmer's bzr-svn plugin. It produces a repeatable import, so I should be able to cross merge with anyone else producing branches with it. It is also possible to use bzr-svn to merge Bazaar branches back into the original Subversion repository through the use of a lightweight checkout.

For anyone wanting to play with my Bazaar branch, it is published in Launchpad and can be grabbed with the following command:

    bzr branch lp:~jamesh/rhythmbox/fmradio rhythmbox