Storm Released

This week at the EuroPython conference, Gustavo Niemeyer announced the release of Storm and gave a tutorial on using it.

Storm is a new object-relational mapper for Python that was developed for use in some Canonical projects, and we’ve been working on moving Launchpad over to it. I’ll discuss a few of the nice features of the package:

Loose Binding Between Database Connections and Classes

Storm has a much looser binding between database connections and the classes used to represent records in particular tables. The standard way of querying the database uses a store object:

for obj in store.find(SomeClass, conditions):
    # do something with obj (which will be a SomeClass instance)

Some things to note about this syntax:

  • The class used to represent rows in the table is passed to find(), so it is possible to have multiple classes representing a single table. This can be useful with large tables where you are sometimes only interested in a few of the columns.
  • The class used to represent the table is not bound to a particular connection. So instances of it can come from different stores.
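The idea can be approximated without Storm itself. In this sqlite3 sketch (the table, classes, and find() helper are all invented for illustration), the class passed to the query decides which columns are fetched, so two classes can share one table:

```python
import sqlite3

# Two classes representing the same table; the slim one only carries
# the columns we care about. Names are invented for this sketch.
class Person(object):
    columns = ("id", "name", "email")

class PersonName(object):
    columns = ("id", "name")

def find(conn, cls, where="1=1"):
    # Approximates store.find(SomeClass, conditions): the class passed
    # in decides which columns get selected from the shared table.
    sql = "SELECT %s FROM person WHERE %s" % (", ".join(cls.columns), where)
    for row in conn.execute(sql):
        obj = cls()
        for col, value in zip(cls.columns, row):
            setattr(obj, col, value)
        yield obj

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER, name TEXT, email TEXT)")
conn.execute("INSERT INTO person VALUES (1, 'Alice', 'alice@example.com')")
slim = next(find(conn, PersonName))
```

Since the classes are not tied to a connection, the same PersonName class could just as easily be used against a second conn object, mirroring Storm's instances-from-different-stores behaviour.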

Lockstep Iteration

As well as iterating over a single table, a Storm result set can iterate over multiple tables together. For instance, if we have a table representing people and a table representing email addresses (where each person can have multiple email addresses), it is possible to iterate over them in lockstep:

for person, email in store.find((Person, Email), Person.id == Email.person):
    print person.name, email.address
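A find() call like the one above compiles down to an inner join. Here is a sqlite3 sketch of the equivalent SQL (the schema is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE email (person INTEGER, address TEXT)")
conn.execute("INSERT INTO person VALUES (1, 'Alice')")
conn.execute("INSERT INTO email VALUES (1, 'alice@example.com')")
conn.execute("INSERT INTO email VALUES (1, 'alice@work.example')")

# Person.id == Email.person becomes the join condition, and each result
# row pairs a person with one of their email addresses.
rows = conn.execute(
    "SELECT person.name, email.address FROM person, email "
    "WHERE person.id = email.person").fetchall()
```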

Automatic Flushing Before Queries

One of the gotchas when using SQLObject was the way it locally cached updates to tables. This is a great way to reduce the number of updates sent to the database, but it could lead to surprising results from subsequent SELECT queries. It was up to the programmer to remember to flush changes before doing a query.

With Storm, the store will flush pending changes automatically before performing the query.
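The behaviour can be illustrated with a toy write-cache (this is just the idea, not Storm's real internals): updates are held locally and flushed before any query runs.

```python
class TinyStore(object):
    """Toy illustration of flush-before-query; not Storm's implementation."""

    def __init__(self):
        self.table = {}    # committed state, standing in for the database
        self.pending = {}  # locally cached, unflushed updates

    def update(self, key, value):
        self.pending[key] = value  # cached locally, not yet "in the db"

    def flush(self):
        self.table.update(self.pending)
        self.pending.clear()

    def find(self, key):
        self.flush()  # the automatic flush before every query
        return self.table.get(key)

store = TinyStore()
store.update("name", "Alice")
result = store.find("name")  # sees the update without an explicit flush
```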

Easy To Execute Raw SQL

An ORM can really help when developing a database driven application, but sometimes plain old SQL is a better fit. Storm makes it easy to execute raw SQL against a particular store with the store.execute() method. This method returns an object that you can iterate over to get the tuples from the result set. It also makes sure that any local changes have been flushed before executing the query.
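The shape of that API can be approximated with sqlite3, where executing raw SQL likewise returns an object you iterate over for result tuples:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'Alice')")

# The result object is iterable, yielding one tuple per row, which is
# the same shape as what store.execute() hands back.
rows = [row for row in conn.execute("SELECT id, name FROM t")]
```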

Nice Clean Code

After working with SQLObject for a while, Storm has been a breath of fresh air. The internals are clean and nicely laid out, which makes hacking on it very easy. It was developed using test-driven development methodology, so there is an extensive test suite that makes it easy to validate changes.

ZeroConf support for Bazaar

When at conferences and sprints, I often want to see what someone else is working on, or to let other people see what I am working on. Usually we end up pushing to a shared server and using that as a way to exchange branches. However, this can be quite frustrating when everyone is competing for outside bandwidth at a conference.

It is possible to share the branch from a local web server, but that still means you need to work out the addressing issues.

To make things easier, I wrote a simple Bazaar/Avahi plugin. It provides a command “bzr share”, which does the following:

  • Scan the directory for any Bazaar branches it contains.
  • Start up the Bazaar server to listen on a TCP port and share the given directory.
  • Advertise each of the branches via mDNS using Avahi. They are shared using the branch nickname.
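The first step can be sketched with a plain directory walk (real branch detection would go through bzrlib; this just looks for .bzr control directories):

```python
import os
import tempfile

def find_branches(root):
    """Find Bazaar branches under root by looking for .bzr control dirs."""
    branches = []
    for dirpath, dirnames, filenames in os.walk(root):
        if ".bzr" in dirnames:
            branches.append(dirpath)
            dirnames.remove(".bzr")  # don't descend into the control dir
    return branches

# Demonstrate on a throwaway directory tree.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "feature-branch", ".bzr"))
os.makedirs(os.path.join(root, "not-a-branch"))
found = find_branches(root)
```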

For the client side, the plugin implements a “bzr browse” command that will list the Bazaar branches being advertised on the local network (the name and the bzr:// URL). Using the two commands together, it is trivial to share branches locally or find what branches people are sharing.

I am not completely satisfied with how things work, and have a few ideas for how to improve things:

  1. Provide a dummy transport that lets people pull from branches by their advertised service name. This would essentially just redirect from scheme://$SERVICE/ to bzr://$HOST:$PORT/$PATH.
  2. Maybe provide more control over the names the branches get advertised with. Perhaps this isn’t so important though.
  3. Make “bzr share” start and stop advertising branches as they get added/removed, and handle branch nicknames changing (at this point, it is pretty much blue sky though).
  4. Perhaps some form of access control. I’m not sure how easy this is within the smart server protocol, but it should be possible to query the user over whether to accept a connection or not.
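Idea 1 could look something like this; the bzr+mdns:// scheme name and the in-memory services table are invented for the sketch (a real implementation would resolve the name via Avahi), while 4155 is the port the bzr smart server listens on by default:

```python
def resolve(url, services):
    """Resolve an advertised service name to a bzr:// URL.

    services maps an mDNS service name to (host, port, path), as it
    might be discovered from the advertisements "bzr share" publishes.
    """
    prefix = "bzr+mdns://"
    assert url.startswith(prefix)
    service = url[len(prefix):].rstrip("/")
    host, port, path = services[service]
    return "bzr://%s:%d%s" % (host, port, path)

services = {"my-feature": ("192.168.1.10", 4155, "/srv/branches/my-feature")}
url = resolve("bzr+mdns://my-feature/", services)
```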

It will be interesting to see how well this works at the next sprint or conference.

Python time.timezone / time.altzone edge case

While browsing the log of one of my Bazaar branches, I noticed that the commit messages were being recorded as occurring in the +0800 time zone even though WA had switched over to daylight saving.

Bazaar stores commit dates as a standard UNIX seconds since epoch value and a time zone offset in seconds. So the problem was with the way that time zone offset was recorded. The code in bzrlib that calculates the offset looks like this:

def local_time_offset(t=None):
    """Return offset of local zone from GMT, either at present or at time t."""
    # python2.3 localtime() can't take None
    if t is None:
        t = time.time()

    if time.localtime(t).tm_isdst and time.daylight:
        return -time.altzone
    else:
        return -time.timezone

Now the tm_isdst flag was definitely being set on the time value, so it must have something to do with one of the time module constants being used in the function. Looking at the values, I was surprised:

>>> time.timezone
-28800
>>> time.altzone
-28800
>>> time.daylight
0

So the time module thinks that I don’t have daylight saving, and the alternative time zone has the same offset as the main time zone (+0800). This seems a bit weird since time.localtime() says that the time value is in daylight saving time.

Looking at the Python source code, the way these variables are calculated on Linux systems goes something like this:

  1. Get the current time as seconds since the epoch.
  2. Round this down to a whole number of years (of 365 days plus 6 hours each, to be exact).
  3. Pass this value to localtime(), and record the tm_gmtoff value from the resulting struct tm.
  4. Add half a year to the rounded seconds since epoch, and pass that to localtime(), recording the tm_gmtoff value.
  5. The earlier of the two offsets is stored as time.timezone and the later as time.altzone. If these two offsets differ, then time.daylight is set to True.
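The steps above can be reproduced in Python. Since tm_gmtoff isn't available from the time module, this sketch stands in for it with a calendar.timegm() calculation; everything else follows the description:

```python
import calendar
import time

YEAR = (365 * 24 + 6) * 3600  # 365 days plus 6 hours

def gmtoff(t):
    # Stand-in for struct tm's tm_gmtoff: interpret the local time
    # struct as UTC and compare with the real epoch value.
    return calendar.timegm(time.localtime(t)) - int(t)

def derive_constants(now=None):
    if now is None:
        now = time.time()
    t1 = (int(now) // YEAR) * YEAR  # rounded down to a "year" boundary
    t2 = t1 + YEAR // 2             # half a year later
    # Standard time has the smaller UTC offset of the two samples --
    # which goes wrong when, as in Perth in 2006, both samples happen
    # to share the same offset.
    std_off, dst_off = sorted([gmtoff(t1), gmtoff(t2)])
    timezone, altzone = -std_off, -dst_off
    daylight = std_off != dst_off
    return timezone, altzone, daylight

tz, alt, dl = derive_constants()
```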

Unfortunately, the UTC offset used in Perth at the beginning of 2006 and the middle of 2006 was +0800, so +0800 gets recorded as the daylight saving time zone too. In the new year, the problem should correct itself, but this highlights the problem of relying on these constants.

Unfortunately, the time.localtime() function from the Python standard library does not expose tm_gmtoff, so there isn’t an easy way to correctly calculate this value.
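One portable workaround is to compare localtime() against the epoch value directly: calendar.timegm() interprets a struct_time as UTC, so feeding it local time and subtracting the real epoch seconds leaves the UTC offset, daylight saving included. A sketch:

```python
import calendar
import time

def utc_offset(t=None):
    """Local UTC offset in seconds at time t, avoiding time.timezone.

    localtime() applies whatever DST rule is in force at t, so the
    offset comes out right even when the module constants are wrong.
    """
    if t is None:
        t = time.time()
    return calendar.timegm(time.localtime(t)) - int(t)

offset = utc_offset()
```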

With the patch I did for pytz to parse binary time zone files, it would be possible to use the /etc/localtime zone file with the Python datetime module without much trouble, so that’s one option. It would be nice if the Python standard library provided an easy way to get this information though.

Recovering a Branch From a Bazaar Repository

In my previous entry, I mentioned that Andrew was actually publishing the contents of all his Bazaar branches with his rsync script, even though he was only advertising a single branch. Yesterday I had a need to actually do this, so I thought I’d detail how to do it.

As a refresher, a Bazaar repository stores the revision graph for the ancestry of all the branches stored inside it. A branch is essentially just a pointer to the head revision of a particular line of development. So if the branch has been deleted but the data is still in the repository, recovering it is a simple matter of discovering the identifier for the head revision.

Finding the head revision

Revisions in a Bazaar repository have string identifiers. While the identifiers can be almost arbitrary strings (there are some restrictions on the characters they can contain), the ones Bazaar creates when you commit are of the form “$email-$date-$random”. So if we know the person who committed the head revision and the date it was committed, we can narrow down the possibilities.

For this sort of low-level operation, it is easiest to use the Python bzrlib interface (this is the guts of Bazaar). Let’s say that we want to recover a head revision committed by foo@example.com on 2006-12-01. We can get all the matching revision IDs like so:

>>> from bzrlib.repository import Repository
>>> repo = Repository.open('repository-directory')
>>> possible_ids = [x for x in repo.all_revision_ids()
...                 if x.startswith('foo@example.com-20061201')]

Now if you’re working on multiple branches in parallel, it is likely that the matching revisions come from different lines of development. To help work out which revision ID we want, we can look at the branch-nick revision property, which is recorded in each commit. If the nickname wasn’t set explicitly for the branch we’re after, it defaults to the base directory name of the branch. We can easily loop through the revisions and print their nicknames:

>>> for rev_id in sorted(possible_ids):
...     rev = repo.get_revision(rev_id)
...     print rev_id
...     print rev.properties['branch-nick']

We can then take the last revision ID that has the nickname we are after. Since lexical sorting of these revision IDs will have sorted them in date order, it should be the last revision. We can check the log message on this revision to make sure:

>>> rev = repo.get_revision('head-revision-id')
>>> print rev.message

If it doesn’t look like the right revision, you can try some other dates (the dates in the revision identifiers are in UTC, so it might have recorded a different date to the one you remembered). If it is the right revision, we can proceed onto recovering the branch.

Recovering the branch

Once we know the revision identifier, recovering the branch is easy. First we create a new empty branch inside the repository:

$ cd repositorydir
$ bzr init branchdir

We can now use the pull command with a specific revision identifier to recover the branch:

$ cd branchdir
$ bzr pull -r revid:head-revision-id .

It may look a bit weird that we are pulling from a branch that contains no revisions into itself, but since the repository for this empty branch contains the given revision it does the right thing. And since bzr pull canonicalises the branch’s history, the new branch should have the same linear revision history as the original branch.

Recovering the branch from someone else’s repository

The above method assumes that you can create a branch in the repository. But what if the repository belongs to someone else, and you only have read-only access to the repository? You might want to do this if you are trying to recover one of the branches from Andrew’s Java GNOME repository 🙂

The easy way is to copy all the revisions from the read-only repository into one you control. First we’ll create a new repository:

$ bzr init-repo repodir

Then we can use the Repository.fetch() bzrlib routine to copy the revisions:

>>> from bzrlib.repository import Repository
>>> remote_repo = Repository.open('remote-repo-url')
>>> local_repo = Repository.open('repodir')
>>> local_repo.fetch(remote_repo)

When that command completes, you’ll have a local copy of all the revisions and can proceed as described above.

UTC+9


Daylight saving started yesterday: the first time for Western Australia since the summer of 1991/1992. The legislation finally passed the upper house on 21 November (12 days before the transition date), and updated tzdata packages were released on 27 November (6 days before the transition). So far, there hasn’t been an updated package released for Ubuntu (see bug 72125).

One thing brought up in the Launchpad bug was that not all applications use the system /usr/share/zoneinfo time zone database. Other places that might need updating include:

  • Evolution has a database in /usr/share/evolution-data-server-$version/zoneinfo/ that is in iCalendar VTIMEZONE format.
  • Java has a database in /usr/lib/jvm/java-$version/jre/lib/zi. This uses a different binary file format.
  • pytz (used by Zope 3 and Launchpad, among others) ships its database as generated Python source files.

All of the above time zone databases are derived from the same source time zone information, but each needs to be updated individually and in a different way.

In a way, this is similar to the zlib security problems from a few years back: the same problem duplicated in many packages and needing to be fixed over and over again. Perhaps the solution is the same too: get rid of the duplication so that in future only one package needs updating.

As a start, I put together a patch to pytz so that it uses the same format binary time zone files as found in /usr/share/zoneinfo (bug 71227). This still means it has its own time zone database, but it goes a long way towards being able to share the system time zone database. It’d be nice if the other applications and libraries with their own databases could make similar changes.
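For reference, the binary format in question starts with a small fixed header (see the tzfile(5) man page). Here is a sketch of parsing that header; it uses a synthetic all-zero-counts header rather than opening a real file from /usr/share/zoneinfo, so the example is self-contained:

```python
import struct

def parse_tzif_header(data):
    """Parse the fixed 44-byte header of a binary tzfile (tzfile(5))."""
    if data[:4] != b"TZif":
        raise ValueError("not a TZif file")
    # 4-byte magic, 1-byte version, 15 reserved bytes, then six
    # big-endian 32-bit counts describing the records that follow.
    names = ("isutcnt", "isstdcnt", "leapcnt",
             "timecnt", "typecnt", "charcnt")
    return dict(zip(names, struct.unpack(">6l", data[20:44])))

# A synthetic version-2 header with all record counts set to zero.
header = b"TZif2" + b"\x00" * 15 + struct.pack(">6l", 0, 0, 0, 0, 0, 0)
info = parse_tzif_header(header)
```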

For people using Windows, there is an update from Microsoft. Apparently you need to install one update now, and then a second update next year — I guess Windows doesn’t support multiple transition rules like Linux does. The page also lists a number of applications that will malfunction and not know about the daylight saving shift, so I guess that they have similar issues of some applications ignoring the system time zone database.