Google’s Australian Election Tools

It is probably old news to some, but Google have put up an information page on the upcoming Australian Federal Election.

The most useful tool is the Google Maps overlay that provides information about the different electorates. At the moment it only has information about the sitting members, their margin and links to relevant news articles. Presumably more information will become available once the election actually gets called.

Presumably they are planning on offering similar tools for next year’s US elections and this is a beta. So even if you aren’t interested in Australian politics, it might be worth a peak to see what is provided.

Signed Revisions with Bazaar

One useful feature of Bazaar is the ability to cryptographically sign revisions. I was discussing this with Ryan on IRC, and thought I’d write up some of the details as they might be useful to others.

Anyone who remembers the past security of GNOME and Debian servers should be able to understand the benefits of being able to verify the integrity of a source code repository after such an incident. Rather than requiring all revisions made since the last known safe backup to be examined, much of the verification could be done mechanically.

Turning on Revision Signing

The first thing you’ll need to do is get a PGP key and configure GnuPG to use it. The GnuPG handbook is a good reference on doing this. As the aim is to provide some assurance that the revisions you publish were really made by you, it’d be good to get the key signed by someone.

Once that is done, it is necessary to configure Bazaar to sign new revisions. The easiest way to do this is to edit ~/.bazaar/bazaar.conf to look something like this:

[DEFAULT]
email = My Name <me@example.com>
create_signatures = always

Now when you run “bzr commit“, a signature for the new revision will be stored in the repository. With this configuration change, you will be prompted for your pass phrase when making commits. If you’d prefer not to enter it repeatedly, there are a few options available:

  1. install gpg-agent, and use it to remember your pass phrase in the same way you use ssh-agent.
  2. install the gnome-gpg wrapper, which lets you remember your pass phrase in your Gnome keyring. To use gnome-gpg, you will need to add an additional configuration value: “gpg_signing_command = gnome-gpg“.

Signatures are transferred along with revisions when you push or pull a branch, perform merges, etc.

How Does It Work?

So what does the signature look like, and what does it cover? There is no command for printing out the signatures, but we can access them using bzrlib. As an example, lets look at the signature on the head revision of one of my branches:

>>> from bzrlib.branch import Branch
>>> b = Branch.open('http://bazaar.launchpad.net/~jamesh/storm/reconnect')
>>> b.last_revision()
'james.henstridge@canonical.com-20070920110018-8e88x25tfr8fx3f0'
>>> print b.repository.get_signature_text(b.last_revision())
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

bazaar-ng testament short form 1
revision-id: james.henstridge@canonical.com-20070920110018-8e88x25tfr8fx3f0
sha1: 467b78c3f8bfe76b222e06c71a8f07fc376e0d7b
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFG8lMHAa+T2ZHPo00RAsqjAJ91urHiIcu4Bim7y1tc5WtR+NjvlACgtmdM
9IC0rtNqZQcZ+GRJOYdnYpA=
=IONs
-----END PGP SIGNATURE-----

>>>

If we save this signature to a file, we can verify it with a command like “gpg --verify signature.txt” to prove that it was made using my PGP key. Looking at the signed text, we see three lines:

  1. An identifier for the checksum algorithm. This is included to future proof old signatures should the need arise to alter the checksum algorithm at a later date.
  2. The revision ID that the signature applies to. Note that this is the full globally unique identifier rather than the shorter numeric identifiers that are only unique in the context of an individual branch.
  3. The checksum, in SHA1 form.

For the current signing algorithm, the checksum is made over the long form testament for the revision, which can easily be verified:

$ bzr branch http://bazaar.launchpad.net/~jamesh/storm/reconnect
$ cd reconnect
$ bzr testament --long > testament.txt
$ sha1sum testament.txt
467b78c3f8bfe76b222e06c71a8f07fc376e0d7b  testament.txt

Looking at the long form testament, we can see what the signature ultimately covers:

  1. The revision ID
  2. The name of the committer
  3. The date of the commit
  4. The parent revision IDs
  5. The commit message
  6. A list of the files that comprise the source tree for the revision, along with SHA1 sums of their contents
  7. Any revision properties

So if the revision testament matches the revision signature and the revision signature validates, you can be sure that you are looking at the same code as the person who made the signature.

It is worth noting that while the signature makes an assertion about the state of the tree at that revision — the only thing it tells you about the ancestry is the revision IDs of the parents. If you need assurances about those revisions, you will need to check their signatures separately. One of the reasons for this is that you might not know the full history of a branch if it has ghost revisions (as might happen when importing code from certain foreign version control systems).

Signing Past Revisions

If you’ve already been using Bazaar but had not enabled revision signing, it is likely that you’ve got a bunch of unsigned revisions lying around. If that is the case, you can sign the revisions in bulk using the “bzr sign-my-commits” command. It will go through all revisions in the ancestry, and generate signatures for all the commits that match your committer ID.

Verifying Signatures in Bulk

To verify all signatures found in a repository, John Arbash Meinel’s signing plugin can be used, which provides a “bzr verify-sigs” command. It can be installed with the following commands:

$ mkdir -p ~/.bazaar/plugins
$ bzr branch http://bzr.arbash-meinel.com/plugins/signing/ ~/.bazaar/plugins/signing

When the command is run it will verify the integrity of all the signatures, and give a summary of how many revisions each person has signed.

Schema Generation in ORMs

When Storm was released, one of the comments made was that it did not include the ability to generate a database schema from the Python classes used to represent the tables while this feature is available in a number of competing ORMs. The simple reason for this is that we haven’t used schema generation in any of our ORM-using projects.

Furthermore I’d argue that schema generation is not really appropriate for long lived projects where the data stored in the database is important. Imagine developing an application along these lines:

  1. Write the initial version of the application.
  2. Generate a schema from the code.
  3. Deploy one or more instances of the application in production, and accumulate some data.
  4. Do further development on the application, that involves modifications to the schema.
  5. Deploy the new version of the application.

In order to perform step 5, it will be necessary to modify the existing database to match the new schema. These changes might be in a number of forms, including:

  • adding or removing a table
  • adding or removing a column from a table
  • changing the way data is represented in a particular column
  • refactoring one table into two related tables or vice versa
  • adding or removing an index

Assuming that you want to keep the existing data, it isn’t enough to simply represent the new schema in the updated application: we need to know how that new schema relates to the old one in order to migrate the existing data.

For some changes like addition of tables, it is pretty easy to update the schema given knowledge of the new schema. For others it is more difficult, and will often require custom migration logic. So it is likely that you will need to write a custom script to migrate the schema and data.

Now we have two methods of building the database schema for the application:

  1. generate a schema from the new version of the application.
  2. generate a schema from the old version of the application, then run the migration script.

Are you sure that the two methods will result in the same schema? How about if we iterate the process another 10 times or so? As a related question, are you sure that the database environment your tests are running under match the production environment?

The approach we settled on with Launchpad development was to only deal with migration scripts and not generate schemas from the code. The migration scripts are formulated as a sequence of SQL commands to migrate the schema and data as needed. So to set up a new instance, a base schema is loaded then patched up to the current schema. Each patch leaves a record in the database that it has been applied so it is trivial to bring a database up to date, or check that an application is in sync with the database.

When the schema is not generated from the code, it also means that the code can be simpler. As far as Python ORM layer is concerned, does it matter what type of integer a field contains? Does the Python code care what indexes or constraints are defined for the table? By only specifying what is needed to effectively map data to Python objects, we end up with easy to understand code without annotations that probably can’t specify everything we want anyway.

Upgrading to Ubuntu Gutsy

I got round to upgrading my desktop system to Gutsy today. I’d upgraded my laptop the previous week, so was not expecting much in the way of problems.

I’d done the original install on my desktop back in the Warty days, and the root partition was a bit too small to perform the upgrade. As there was a fair bit of accumulated crud, I decided to do a clean install. Things mostly worked, but there were a few problems, which I detail below:

Dual Head Configuration

With previous releases, I was using the Radeon driver’s MergedFB mode, as it gives a better user experience than the traditional Xinerama code (3D acceleration on both heads, better performance, etc). After moving adding the MergedFB options to xorg.conf, I was just getting the same image cloned on both displays.

Looking at the X server log file, there was a message saying that MergedFB support had been removed in favour of RandR 1.2 support. And it was possible to get dual head working with the xrandr command line tool:

xrandr --output VGA-0 --right-of DVI-0

It was good to know that dual-head still worked, but I didn’t want to reconfigure this every time I restarted the machine. I didn’t find much information on how to configure the initial RandR configuration on the X.org website, but did find a useful guide on the Intel Linux Graphics website. While the guide was aimed at the Intel driver, it had enough information to get things configured for the Radeon driver. The main difference was in the naming of the outputs. Below is a an excerpt of my configuration file that configures things the way I had them previously:

Section "Device"
        Identifier      "ATI Technologies Inc RV280 [Radeon 9200 SE]"
        Driver          "ati"
        BusID           "PCI:1:0:0"
        Option          "monitor-DVI-0" "Sony SDM-S74 [1]"
        Option          "monitor-VGA-0" "Sony SDM-S74 [2]"
EndSection

Section "Monitor"
        Identifier      "Sony SDM-S74 [1]"
        Option          "DPMS"
        HorizSync       30-65
        VertRefresh     50-75
        Option          "LeftOf"        "Sony SDM-S74 [2]"
EndSection

Section "Monitor"
        Identifier      "Sony SDM-S74 [2]"
        Option          "DPMS"
        HorizSync       30-65
        VertRefresh     50-75
EndSection

Section "Screen"
        Identifier      "Default Screen"
        Device          "ATI Technologies Inc RV280 [Radeon 9200 SE]"
        Monitor         "Sony SDM-S74 [1]"
        DefaultDepth    16
        SubSection "Display"
                Modes           "1280x1024" "1024x768" "800x600" "640x480"
                Virtual         2560 1024
        EndSubSection
EndSection

I had originally tried setting the VGA monitor to be “RightOf” the monitor connected to the DVI, but that left me with the desktop in clone mode. The main difference I’ve noticed with this configuration compared to my previous one is that the GDM login prompt displays on the right hand head (VGA) rather than the left hand head (DVI).

Window Shadows Don’t Render

Desktop Effects were enabled by default after the install (and on the live CD). While some effects seemed to work, the shadows on the panel and drop down menus were rendered as opaque grey boxes around the windows. I ended up just disabling the effects to clear up the problem.

This bug had already been reported as bug 141304 (which may be the same as bug 116808).

Firefox Crashes on Startup

When I tried to start firefox, it would momentarily display a window and then crash. This appears to be bug 133124, and seems to only occur on AMD64 systems. The problem appears to be in the ubuntulooks theme engine, and switching to a different control theme makes the problem go away, but hopefully it’ll get fixed for the final release.

Problems Rendering Ligatures in Firefox

The problems rendering ligatures in firefox seem to be back again. This problem was never really fixed, but was worked around by removing the ligature table entries from the DejaVu fonts. With the ligature table entries back, the symptoms have returned. This is bug 37828.

Manic Times

When I got back from Florida, I found a copy of the Manic Times in the mail. It seems that I received the copy because I used to be subscribed to The Chaser back when it was a newspaper. The newspaper is being edited by Charles Firth, who was the US correspondent in the last series of The Chaser’s War on Everything.

The content is fairly different to what was published in The Chaser, in that they are nominally true. That said, they are written from a non-mainstream point of view. The issue I received seemed to focus on the recent APEC conference and related security measures (which appear to have been fairly poor).

I haven’t yet decided whether to subscribe, but the online subscription form thoughtfully includes “Governor-General”, “Deputy Vice-President”, “The Right Hon”, “Queen” and “Archbishop” in the dropdown list of titles to pick from – something that most likely inconveniences people on other web sites.

In Florida

This week I am in Florida for a Launchpad sprint. I was meant to arrive on Sunday night, but I fell asleep in the boarding lounge and missed the San Francisco → Orlando flight (the flight out of Perth was an early morning one, and I didn’t get enough sleep on the plane). The earliest alternative fligh was the same time the next day, so I ended up ariving on Monday night.

I can’t say I was impressed with United’s customer service though. I was directed to the customer service centre in the airport and queued up behind about 10 other people. After a short while, the one staff member at the desk announced that her shift was over and that her replacements would not be arriving for another hour. It seems like really bad management to leave the desk unattended for an hour, particularly when they knew that there were people waiting.

They had a bunch of check-in computers which were supposed to let you change your flight details, so I gave one of these a try.  Unfortunately, the computers directed me to pick up the phone to talk to a representative, and the representative ended up directing me to talk to someone at the customer service centre.  After waiting for the next shift, things got sorted out okay though, which was good.

This was also my first experience with SSSS screening.  In fact I got to experience it twice: once when checking in for the flight I missed, and again for the later flight.  On my way back to Australia, I’ll have two more flights leaving from US airports so it’ll be interesting to see what happens then.

Canonical Shop Open

The new Canonical Shop was opened recently which allows you to buy anything from Ubuntu tshirts and DVDs up to a 24/7 support contract for your server.

One thing to note is that this is the first site using our new Launchpad single sign-on infrastructure. We will be rolling this out to other sites in time, which should give a better user experience to the existing shared authentication system currently in place for the wikis.

Bazaar bundles as part of a review process

In my previous article, I outlined Bazaar‘s bundle feature. This article describes how the Bazaar developers use bundles as part of their development and code review process.

Proposed changes to Bazaar are generally posted as patches or bundles to the development mailing list. Each change is discussed on the mailing list (often going through a number of iterations), and ultimately approved or rejected by the core developers. To aide in managing these patches Aaron Bentley (one of the developers wrote a tool called Bundle Buggy.

Bundle Buggy watches messages sent to the mailing list, checking for messages containing patches or bundles. It then creates an entry on the web site displaying the patch, and lets developers add comments (which get forwarded to the mailing list).

Now while Bundle Buggy can track plain patches, a number of its time saving features only work for bundles:

  1. Automatic rejection of superseded patches: when working on a feature, it is common to go through a number of iterations. When going through the list of pending changes, the developers don’t want to see all the old versions. Since a bundle describes a Bazaar branch, and it is trivial to check if one branch is an extension of another though, Bundle Buggy can tell which bundles are obsolete and remove them from the list.
  2. Automatically mark merged bundles as such: the canonical way to know that a patch has been accepted is for it to be merged to mainline. Each Bazaar revision has a globally unique identifier, so we can easily check to see if the head revision of the bundle is in the ancestry of mainline. When this happens, Bundle Buggy automatically marks them as merged.

Using these techniques the list of pending bundles is kept under control.

Further Possibilities

Of course, these aren’t the only things that can be done to save time in the review process. Another useful idea is to automatically try and merge pending bundles or branches to see if they can still be merged without conflicts. This can be used as a way to put the ball back in the contributors court, obligating them to fix the problem before the branch can be reviewed.

This sort of automation is not only limited to projects using a mailing list for code review. The same techniques could be applied to a robot that scanned bug reports in the bug tracker (e.g. Bugzilla) for bundles, and updated their status accordingly.

Bazaar Bundles

This article follows on from the series of tutorials on using Bazaar that I have neglected for a while. This article is about the bundle feature of Bazaar. Bundles are to Bazaar branches what patches are to tarballs or plain source trees.

Context/unified diffs and the patch utility are arguably one of most important inventions that enable distributed development:

  • The patch is a self contained text file, making it easy to send as an email attachment or attach to a bug report.
  • The size of the patch is proportional to the size of the changes rather than the size of the source tree. So submitting a one line fix to the Linux kernel is as easy as a one line fix for a small one person project.
  • Even if the destination source tree has moved forward since the patch was created, the patch utility does a decent job of applying the changes using heuristics to match the surrounding context. Human intervention is only needed if the edits are to the same section of code.
  • As patches are human readable text files, they are a convenient form to review the code changes.

Of course, patches do have their limitations:

  • The unified diff format doesn’t convey file moves, instead showing the entire file content being removed and then added again. If the file was changed in addition to being moved, the change can easily be missed when reviewing the patch.
  • Changes to binary files are omitted from the patch. While we can’t expect such changes to be represented in a human readable form, it’d be nice for them to be represented in a way that they can be applied at the other end.
  • The patch doesn’t record any intermediate steps in the creation of the change. This can be worked around by sending a sequence of patches that each build on the previous one, but this requires a fair bit of attentiveness on the part of the patch creator.
  • If the project in question is using some form of version control, the changes in the patch will likely be attributed to the person who applied the patch rather than the person who made the patch.

Using distributed version control solves these limitations, but simply publishing a branch and telling someone to pull from it does not provide all the benefits of a patch. For one, the person reviewing the changes needs to be online to merge the branch and evaluate the changes.

Second, the contributor of the change needs somewhere to host the branch. Even though finding a place to host the branch may not be difficult (for example, anyone can host their branches on Launchpad), uploading the branch may be more effort than the contributor cares for (uploading a branch the size of the Linux kernel will take a while, for instance). That branch would need to remain available until the changes were accepted.

For Bazaar, bundles provide a solution to this problem. A bundle is effectively a “branch diff”, which can then be used to integrate a set of revisions into a repository assuming it contains the revisions from the target branch. At this point, those changes can be merged or pulled.

So how do we produce a bundle? Lets start by creating a branch of the project we want to contribute to. For this example, we’ll create a branch of Mailman to make our changes. As Mailman is using Launchpad to host its branches, I can use the shorthand implemented by the Launchpad Bazaar plugin to create my branch:

bzr branch lp:mailman mailman.jamesh
cd mailman.jamesh
# make my changes here
bzr commit

After I am happy with my changes, I can create a bundle of those changes:

bzr bundle > my-changes.diff

As mentioned earlier, a bundle is essentially a diff between two branches. As I did not specify any branch in the above command, Bazaar uses the parent branch, which in this case will be the upstream Mailman branch. If we look at my-changes.diff, we will see a text file with three general sections:

  1. A short header identifying the file as a bundle and giving the last commit message, author and date
  2. A unified diff made between the last common revision with the parent and the head of our branch (this bit is also convenient to review).
  3. Some extra book keeping data. If I’d made multiple commits, this would include data needed to reconstruct the other revisions in the bundle.

I can now submit this bundle in the same way that I’d submit a patch: as an email attachment or in the bug tracker.

To merge the bundle, a developer simply needs to save the bundle to disk and use “bzr merge” on it:

bzr merge my-changes.diff
bzr commit

This will have the same effect as if they merged a branch with those changes. The “bzr log” output will show the merged revisions and “bzr annotate” will credit the changes to the person who made them rather than the person who merged it.

So next time you want to submit a patch to a project that uses Bazaar, consider submitting a bundle instead.

Storm Released

This week at the EuroPython conference, Gustavo Niemeyer announced the release of Storm and gave a tutorial on using it.

Storm is a new object relational mapper for Python that was developed for use in some Canonical projects, and we’ve been working on moving Launchpad over to it. I’ll discuss a few of the nice features of the package:

Loose Binding Between Database Connections and Classes

Storm has a much looser binding between database connections and the classes used to represent records in particular tables. The standard way of querying the database uses a store object:

for obj in store.find(SomeClass, conditions):
    # do something with obj (which will be a SomeClass instance)

Some things to note about this syntax:

  • The class used to represent rows in the table is passed to find(), so it is possible to have multiple classes representing a single table. This can be useful with large tables where you are only interested in a few columns in some cases.
  • The class used to represent the table is not bound to a particular connection. So instances of it can come from different stores.

Lockstep Iteration

As well as iterating over a single table, a Storm result set can iterate over multiple tables together. For instance, if we have a table representing people and a table representing email addresses (where each person can have multiple email addresses), it is possible to iterate over them in lockstep:

for person, email in store.find((Person, Email), Person.id == Email.person):
    print person.name, email.address

Automatic Flushing Before Queries

One of the gotchas when using SQLObject was the way it locally cached updates to tables. This is a great way to reduce the number of updates sent to the database, but could result in unexpected results when performing subsequent SELECT queries. It was up to the programmer to remember to flush changes before doing a query.

With Storm, the store will flush pending changes automatically before performing the query.

Easy To Execute Raw SQL

An ORM can really help when developing a database driven application, but sometimes plain old SQL is a better fit. Storm makes it easy to execute raw SQL against a particular store with the store.execute() method. This method returns an object that you can iterate over to get the tuples from the result set. It also makes sure that any local changes have been flushed before executing the query.

Nice Clean Code

After working with SQLObject for a while, Storm has been a breath of fresh air. The internals are clean and nicely laid out, which makes hacking on it very easy. It was developed using test-driven development methodology, so there is an extensive test suite that makes it easy to validate changes.