bzr-dbus hacking

When working on my bzr-avahi plugin, Robert asked me how it should fit in with his bzr-dbus plugin. The two plugins offer complementary features, and could share a fair bit of infrastructure code. Furthermore, without cooperation, there is a risk that the two plugins could break when both are installed together.

Given the dependencies of the two packages, it made more sense to put common infrastructure in bzr-dbus and have bzr-avahi depend on it. That said, bzr-dbus is a bit more difficult to install than bzr-avahi, since it requires installation of a D-Bus service activation file. After looking at the code, it seemed that there was room to simplify how bzr-dbus worked and improve its reliability at the same time.

The primary purpose of bzr-dbus is to send signals over the session bus whenever the head revision of a branch changes. This was implemented using a daemon that is started via D-Bus activation and sends out the signals in response to method calls made by short-lived bzr processes.

While this seems to be the design the dbus-python tutorial guides you to use, I don’t think it is the best fit for bzr-dbus. The approach I took was to do away with the daemon altogether: the D-Bus session bus does a pretty good job of broadcasting the signals on its own.

The code that previously asked the broadcast daemon to send the revision signal was changed to simply send the signal. The following helper made this pretty easy to do without having to write any extra classes to emit the signals:

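import dbus.lowlevel

def send_signal(bus, dbus_interface, signal_name, signature, *args):
    """Send a signal on the bus without defining a dbus.service.Object."""
    # The signal is emitted from object path '/' on the given interface.
    message = dbus.lowlevel.SignalMessage('/', dbus_interface, signal_name)
    # Marshal the arguments according to the D-Bus type signature, e.g. 'ss'.
    message.append(signature=signature, *args)
    bus.send_message(message)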

With these changes, the commit hook now only needs to connect to the session bus, fire off the signal and return. Previously it connected to the bus, got a proxy for the broadcast service (which might involve activating it), sent a method call message and waited for a method return message. The new code is faster, and if no one is listening for the signals, only the bus daemon is woken up.

Code consuming the signals had to switch to the bus.add_signal_receiver() method to register its callbacks, which lets you subscribe to a signal irrespective of its origin.
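
A minimal sketch of the consuming side, assuming dbus-python's add_signal_receiver(). The interface name org.example.bzr.Broadcast and the signal name Revision are hypothetical placeholders, not the real bzr-dbus names. Because the bus is passed in, the function itself needs no dbus import.

```python
# Hypothetical names: the real bzr-dbus interface and signal may differ.
REVISION_INTERFACE = "org.example.bzr.Broadcast"
REVISION_SIGNAL = "Revision"


def watch_revisions(bus, callback):
    """Call callback(*signal_args) whenever a revision signal is broadcast.

    bus is expected to be a dbus-python bus object such as dbus.SessionBus().
    No sender bus name or object path is given, so the match rule accepts
    the signal irrespective of which process emitted it.
    """
    return bus.add_signal_receiver(
        callback,
        signal_name=REVISION_SIGNAL,
        dbus_interface=REVISION_INTERFACE,
    )
```

In a real program you would first call DBusGMainLoop(set_as_default=True) from dbus.mainloop.glib, connect with dbus.SessionBus(), and then run a GLib main loop so that the callbacks actually get dispatched.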

The only missing feature with these changes was annotating the signals with additional URLs when the branch was being shared over the network. As these additional URLs are only really interesting when accessing the branch remotely, I moved the functionality to the “bzr lan-notify” command so that it annotates the revision announcements just before broadcasting them to the local network.

With all the changes applied, the D-Bus API consists entirely of signal emissions, which gives a looser coupling between the various components: each component will happily function in the absence of the others, which is great for reliability.

Once the patches are merged, I’ll have to look at porting bzr-avahi to this infrastructure. Together, these two plugins offer compelling features for local network collaboration.

Running Valgrind on Python Extensions

As most developers know, Valgrind is an invaluable tool for finding memory leaks. However, when debugging Python programs, the pymalloc allocator gets in the way.

There is a Valgrind suppression file distributed with Python that gets rid of most of the false positives, but it does not give particularly good diagnostics for memory allocated through pymalloc. To properly analyse leaks, you often need to recompile Python without pymalloc.

As I don’t like having to recompile Python I took a look at Valgrind’s client API, which provides a way for a program to detect whether it is running under Valgrind. Using the client API I was able to put together a patch that automatically disables pymalloc when appropriate. It can be found attached to bug 2422 in the Python bug tracker.

The patch still needs a bit of work before it will be mergeable with Python 2.6/3.0 (mainly autoconf fu). I also need to do a bit more benchmarking on the patch. If the overhead of turning on this patch is negligible, then it’d be pretty cool to have it enabled by default when Valgrind is available.

Honey Bock

Yesterday I bottled the honey bock that has been brewing over the last week. This one was made with the following ingredients:

  1. A Black Rock Bock beer kit.
  2. 1kg of honey
  3. 500g of Dextrose
  4. Caster sugar for carbonation

The only difference from the standard procedure was replacing part of the brewing sugar with honey. Before being added, the honey needs to be pasteurised, which involves heating it up to 80°C and keeping it at that temperature for half an hour or so. This kills off any wild yeasts or other undesirables that might spoil the brew.

I’ve used honey in a few other brews over the years but had not tried it with a dark beer, so it will be interesting to see how it turns out. The previous beers had a stronger honey flavour than commercial beers like Beez Neez, which is probably a good thing for a dark beer.  I guess I’ll find out after it matures for about a month.

Two‐Phase Commit in Python’s DB‐API

Marc uploaded a new revision of the Python DB-API 2.0 Specification yesterday that documents the new two-phase commit extension that I helped develop on the db-sig mailing list.

My interest in this started from the desire to support two-phase commit in Storm – without that feature there are far fewer occasions where its ability to talk to multiple databases can be put to use. As I was doing some work on psycopg2 for Launchpad, I initially put together a PostgreSQL-specific patch, which was (rightly) rejected by Federico.

He suggested that it would be better to try and standardise an API on the db-sig list, so that’s what I did. I looked over the APIs exposed by other database adapters that supported 2PC, and the 2PC APIs of the major free databases whose Python adapters lacked support for it (MySQL and PostgreSQL). The resulting API is a bit more complicated than my original PostgreSQL-only patch, but has the advantage of being implementable on other databases such as MySQL.

Below is a simple example of using the API directly (missing some of the error handling):

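# conn1 and conn2 are DB-API connections whose adapters support 2PC;
# DatabaseError is the adapter module's exception (e.g. psycopg2.DatabaseError).
# Begin a transaction on each database connection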
conn1.tpc_begin(conn1.xid(42, 'transaction ID', 'connection 1'))
conn2.tpc_begin(conn2.xid(42, 'transaction ID', 'connection 2'))
# Do stuff with both connections
...
try:
    conn1.tpc_prepare()
    conn2.tpc_prepare()
except DatabaseError:
    conn1.tpc_rollback()
    conn2.tpc_rollback()
else:
    conn1.tpc_commit()
    conn2.tpc_commit()
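
The same pattern generalises to any number of connections. The helper below is a hypothetical sketch, not part of the DB-API: it assumes each connection implements the tpc_* methods from the extension, and that the caller passes in the adapter's DatabaseError class. Like the examples above, it skips recovery from failures during the final commit.

```python
def run_two_phase(connections, format_id, gtrid, work, error=Exception):
    """Run work() inside one global transaction spanning all connections.

    Each connection must implement the DB-API 2.0 two-phase commit
    extension (xid, tpc_begin, tpc_prepare, tpc_commit, tpc_rollback).
    error should be the adapter's DatabaseError; Exception is only a
    permissive default.  Returns True on commit, False on rollback.
    """
    for number, conn in enumerate(connections, start=1):
        # Shared global transaction ID, distinct branch qualifier each.
        conn.tpc_begin(conn.xid(format_id, gtrid, "connection %d" % number))
    try:
        work()
        for conn in connections:
            conn.tpc_prepare()
    except error:
        # tpc_rollback() may be called before or after tpc_prepare().
        for conn in connections:
            conn.tpc_rollback()
        return False
    # A failure from here on would need recovery (e.g. via tpc_recover()),
    # which this sketch does not attempt.
    for conn in connections:
        conn.tpc_commit()
    return True
```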

Or alternatively, if you’ve got one connection supporting 2PC and the other only supporting one-phase commit, it could be structured as follows:

# Begin a two-phase transaction on conn1; conn2 uses an ordinary
# one-phase transaction
conn1.tpc_begin(conn1.xid(42, 'transaction ID', 'connection 1'))
# Do stuff with both connections
...
try:
    conn1.tpc_prepare()
    conn2.commit()
except DatabaseError:
    conn1.tpc_rollback()
    conn2.rollback()
else:
    conn1.tpc_commit()

While it is possible to use the 2PC API directly, it is expected that most applications will rely on a transaction manager, such as Zope’s transaction module, to coordinate global transactions.

The hope is that by offering a consistent API, Python application frameworks will be more likely to bother supporting this feature of databases. Hopefully you’ll be able to use the API with PostgreSQL and Storm soon.

Zeroconf Branch Sharing with Bazaar

At Canonical, one of the approaches taken to accelerate development is to hold coding sprints (otherwise known as hackathons, hackfests or similar). Certain things get done a lot quicker face to face than over mailing lists, IRC or VoIP.

When collaborating with someone at one of these sprints the usual way to let others look at my work would be to commit the changes so that they could be pulled or merged by others. With legacy version control systems like CVS or Subversion, this would generally result in me uploading all my changes to a server in another country only for them to be downloaded back to the sprint location by others.

In contrast, with a modern VCS like Bazaar we should be able to avoid this since the full history of the branch is available locally – enough information to let others pull or merge the changes. That said, we’ve often ended up using a server on the internet to exchange changes despite this. This is the same work flow we use when working from home, so I guess the pain of switching to a new work flow outweighs the potential productivity gains.

The Solution

Bazaar makes it easy to run a read only server locally:

bzr serve [--directory=DIR]

However, there is still the issue of others finding the branch. They’d need to know the IP address assigned to my computer at the sprint, and the path to the branch on the server. Ideally they’d just need to know the name of my branch. As it happens, we’ve got the technology to fix this.

Avahi makes it trivial to advertise and browse for services on the local network without having to worry about what IP addresses have been assigned or what people name their computers. So the solution is to hook Avahi and Bazaar together. This was fairly easy thanks to Avahi’s D-Bus interface and the dbus-python bindings.

The result is my bzr-avahi plugin. You can either download a tarball or install the latest version directly from Bazaar:

bzr branch lp:bzr-avahi ~/.bazaar/plugins/avahi

To use the plugin, you must have at least version 1.1 of Bazaar, the Python bindings for DBus and Avahi, and a working Avahi setup. Once the plugin is installed, it hooks into the standard “bzr serve” command to do the following:

  • scan the directory being served for branches that the user has asked to advertise.
  • ask Avahi to advertise said branches

You can ask to advertise a branch using the new “bzr advertise” command:

bzr advertise [BRANCH-NAME]

If no name is specified, the branch’s nickname is used. The advertise command sends a signal over the session bus to tell any running servers about the change, so there is no need to restart “bzr serve” to see the change.

At this point, the advertised branches should be visible with a service browser like avahi-discover, so that’s half the problem solved. From the client side two things are provided: a special redirecting transport and a command to list all advertised branches on the local network.

The transport allows you to access the branch by its advertised name with most Bazaar commands. For example, merging a branch is as simple as:

$ bzr merge local:BRANCH-NAME
local:BRANCH-NAME is redirected to bzr://hostname.local:4155/path/to/branch
…
All changes applied successfully.
$
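
Once the browse results are in hand, the redirect is a simple lookup. The sketch below is hypothetical: resolve_local_url and the shape of the advertised mapping are illustrative, not bzr-avahi's actual code, and it assumes Avahi has already resolved each advertised name to a host, port and path.

```python
LOCAL_PREFIX = "local:"


def resolve_local_url(url, advertised):
    """Translate local:BRANCH-NAME into a bzr:// URL.

    advertised maps branch names to (hostname, port, path) tuples, as might
    be collected from Avahi browse/resolve results (hypothetical format).
    Raises KeyError if no branch with that name is being advertised.
    """
    if not url.startswith(LOCAL_PREFIX):
        raise ValueError("not a local: URL: %r" % url)
    name = url[len(LOCAL_PREFIX):]
    hostname, port, path = advertised[name]
    if not path.startswith("/"):
        path = "/" + path
    return "bzr://%s:%d%s" % (hostname, port, path)
```

For example, with {"BRANCH-NAME": ("hostname.local", 4155, "/path/to/branch")} as the mapping, local:BRANCH-NAME resolves to the URL shown in the merge output above.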

If you want to get a list of all advertised branches on the network, the “bzr browse” command will print out a list of branch names and the URLs they translate to.

I believe using these tools together should offer a low enough overhead for direct sharing of branches at sprints that people would actually bother using it. It should be quite useful at the next sprint I go to.

Client Side OpenID

The following article discusses ideas that I wouldn’t even class as vapourware, as I am not proposing to implement them myself. That said, the ideas should still be implementable if anyone is interested.

One well-known security weakness in OpenID is its susceptibility to phishing attacks. An OpenID authentication request is initiated by the user entering their identifier into the Relying Party (RP), which then hands control to the user’s OpenID Provider (OP) through an HTTP redirect or form post. A malicious RP may instead forward the user to a site that looks like the user’s OP and record any information they enter. As the user provided their identifier, the RP knows exactly which site to forge.

Out Of Band Authorisation

One way around this is for the OP to authenticate the user and get authorisation out of band — just because the authentication message begins and ends with HTTP requests does not mean that the actual authentication/authorisation need be done through the web browser.

Possibilities include performing the authorisation via a Jabber message or SMS, or some special purpose protocol. Once authorisation is granted, the OP would need to send the OpenID response. Two ways for the web browser to detect this would be polling via AJAX, or using a server-push technique like Comet.

Using a Browser Extension

While the above method adds security, it takes the user outside of their web browser, which could be disconcerting. We should be able to provide an improved user experience by using a web browser extension. So what is the best way for the extension to know when to do its thing?

One answer is whenever the user visits the server URL of their OP. Reading through the specification there are no other times when the user is required to visit that URL. So if the web browser extension can intercept GET and POST requests to a particular URL, it should be able to reliably detect when an authentication request is being initiated.
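
The detection rule can be sketched as a URL comparison. This is an illustration under assumptions, not browser-extension code: op_endpoint stands for the user's configured OP server URL, and the extra check on openid.mode (the checkid_setup and checkid_immediate modes that start an authentication request) is a sanity check layered on top of matching the URL.

```python
from urllib.parse import parse_qs, urlsplit

# OpenID modes that start an authentication request via the user's browser.
CHECKID_MODES = {"checkid_setup", "checkid_immediate"}


def is_auth_request(request_url, op_endpoint, body=None):
    """Return True if a GET/POST to request_url starts an OpenID auth request.

    op_endpoint is the user's OP server URL (hypothetically configured in
    the extension); body is the urlencoded form body of a POST, if any.
    """
    req = urlsplit(request_url)
    ep = urlsplit(op_endpoint)
    # Only requests to the OP server URL itself are of interest.
    if (req.scheme, req.netloc, req.path) != (ep.scheme, ep.netloc, ep.path):
        return False
    params = parse_qs(req.query)
    if body:
        params.update(parse_qs(body))
    return params.get("openid.mode", [None])[0] in CHECKID_MODES
```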

At this point, the extension can take over up to the point where it redirects the user back to the RP. It will need to communicate with the OP in some way to get the response signed, but we have the option of using some previously established back channel.

Moving the OP Client Side

Using the browser extension from the previous section as a starting point, we’ve moved some of the processing to the client side. We might now ask how much work can be moved to the client, and how much work needs to remain on the server?

From the specification, there are three points at which the RP needs to make a direct connection to the OP (or a related server):

  1. When performing discovery, the RP needs to be able to read an HTML or XRDS file off some server.
  2. The associate request, used to generate an association that lets the RP verify authentication responses.
  3. The check_authentication request, used to verify a response in the case where an association was not provided in the request (or the OP said the association was invalid).

In all other cases, communication is mediated through the user’s browser (and so can be intercepted by the browser extension). Furthermore, these three cases should only occur after the user initiates an OpenID authentication request, which means that the browser extension should be active and talking to the server.

So one option would be to radically simplify the server side so that it simply proxies the associate and check_authentication requests to the browser extension via a secure channel. This way pretty much the entire OP implementation resides in the browser extension with no state being handled by the server.
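
A sketch of how thin the server side could become, assuming some secure channel to the extension already exists (modelled here by a hypothetical object with a send() method). Only the associate and check_authentication requests are forwarded; discovery can still be served as a static HTML or XRDS file.

```python
# Direct (RP-to-OP) requests that the thin server forwards to the extension.
PROXIED_MODES = {"associate", "check_authentication"}


def handle_direct_request(params, extension_channel):
    """Forward associate/check_authentication requests to the extension.

    params is the decoded POST body as a dict of strings; extension_channel
    is a hypothetical object whose send(params) returns the response dict.
    The server keeps no state of its own.
    """
    mode = params.get("openid.mode")
    if mode not in PROXIED_MODES:
        # OpenID 2.0 direct error responses carry "ns" and "error" fields.
        return {"ns": "http://specs.openid.net/auth/2.0",
                "error": "unsupported mode: %s" % mode}
    return extension_channel.send(params)
```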

Conclusion

So it certainly looks like it is possible to migrate almost everything to the client side. That still leaves open the question of whether you’d actually want to do this, since it effectively makes your identity unavailable when away from a computer with the extension installed (a similar problem to the use of self-asserted infocards with Microsoft’s CardSpace).

Perhaps the intermediate form that still performs most of the OP processing on the server is more useful, providing a level of phishing resistance that would be difficult to fake (not only does it prevent rogue RPs from capturing credentials, the “proxied OP” attack will fail to activate the extension altogether).

Re: Python factory-like type instances

Nicolas: Your metaclass example is a good example of when not to use metaclasses. I wouldn’t be surprised if it executes slightly differently from how you expect. Let’s look at how Foo is evaluated, starting with what’s written:

class Foo:
    __metaclass__ = FooMeta

This is equivalent to the following assignment:

Foo = FooMeta('Foo', (), {...})

As FooMeta has a __new__() method, the attempt to instantiate FooMeta will result in it being called. As the return value of __new__() is not a FooMeta instance, there is no attempt to call FooMeta.__init__(). So we could further simplify the code to:

Foo = {
    'linux2': LinuxFoo,
    'win32': WindowsFoo,
}.get(PLATFORM, None)
if Foo is None:
    # XXX: this should _really_ raise something other than Exception
    raise Exception('Platform not supported')

So the factory function is gone completely here, and it is clear that the decision about which class to use is being made at module import time rather than class instantiation time.

Now this isn’t to say that metaclasses are useless. In both implementations, the code responsible for selecting the class has knowledge of all implementations. To add a new implementation (e.g. for Solaris or MacOS X), the factory function needs to be updated. A better solution would be to provide a way for new implementations to register themselves with the factory. A metaclass could be used to make the registration automatic:

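class FooMeta(type):
    def __init__(cls, name, bases, attrs):
        # type.__init__ returns None, so there is nothing to capture here.
        super(FooMeta, cls).__init__(name, bases, attrs)
        # The base class leaves platform as None; only subclasses register.
        if cls.platform is not None:
            register_foo_implementation(cls.platform, cls)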

class Foo:
    __metaclass__ = FooMeta
    platform = None
    ...

class LinuxFoo(Foo):
    platform = 'linux2'

Now the simple act of defining a SolarisFoo class would be enough to have it registered and ready to use.

Allocated Seating at Greater Union

On the weekend, I had my first encounter with allocated seating at the Greater Union Innaloo cinemas.

As usual, we’d bought tickets separately. It wasn’t until going into the actual cinema that a staff member said that we were expected to sit in seats scattered around the cinema (one of which was on the very edge).

As the cinema wasn’t completely full, we did the only sensible thing: ignore the allocations and pick some seats next to each other. Looking around the cinema, it looked like a number of other people were ignoring the allocations (the seat I’d been allocated was taken by someone else in a group of about 5 people).

As far as I can understand, the reason for introducing this was to make the internet booking more compelling by letting you pick your seat. I guess they felt the need to do something, since the current system has never seemed worth it:

  • They charge an extra dollar per ticket for internet sales. This is despite the fact that they get the money earlier, and you might not even turn up (the tickets are sold on a no returns basis).
  • While there is a special queue for picking up internet sales tickets, there often isn’t anyone staffing it. I’ve only seen people in the queue a few times, and they needed to wait until one of the other ticket sellers was free.

Maybe they thought screwing with the majority of their customers’ experience would make the extra dollar worth it.

I sent a complaint to Greater Union, and in future plan to treat their seating allocations as a suggestion. It is a shame that so many other cinemas have been closing down over the years :(

urlparse considered harmful

Over the weekend, I spent a number of hours tracking down a bug caused by the cache in the Python urlparse module. The problem has already been reported as Python bug 1313119, but has not been fixed yet.

First a bit of background. The urlparse module does what you’d expect and parses a URL into its components:

>>> from urlparse import urlparse
>>> urlparse('http://www.gnome.org/')
('http', 'www.gnome.org', '/', '', '', '')

As well as accepting byte strings (which you’d be using at the HTTP protocol level), it also accepts Unicode strings (which you’d be using at the HTML or XML content level):

>>> urlparse(u'http://www.ubuntu.com/')
(u'http', u'www.ubuntu.com', u'/', '', '', '')

As the result is immutable, urlparse implements a cache of up to 20 previous results. Unfortunately, the cache does not distinguish between byte strings and Unicode strings, so parsing a byte string may return unicode components if the result is in the cache:

>>> urlparse('http://www.ubuntu.com/')
(u'http', u'www.ubuntu.com', u'/', '', '', '')
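
The root cause is a cache keyed on the argument's value alone. In Python 2, 'a' == u'a' and both hash identically, so the byte string and Unicode versions share a cache entry. Python 3's str and bytes never compare equal, so the sketch below uses 1 and 1.0, which do, as a stand-in. bounded_cache is an illustrative helper, not urlparse's actual code; in Python 3 the standard library's functools.lru_cache(typed=True) offers the same type-aware keying.

```python
def bounded_cache(func, maxsize=20, typed=False):
    """Memoize a one-argument func, clearing the cache when it fills up.

    With typed=False the key is just the argument, so arguments that compare
    equal and hash alike share an entry even if their types differ.
    typed=True adds the argument's type to the key, which avoids that.
    """
    cache = {}

    def wrapper(arg):
        key = (type(arg), arg) if typed else arg
        if key in cache:
            return cache[key]
        if len(cache) >= maxsize:
            cache.clear()  # crude bound on memory use
        result = cache[key] = func(arg)
        return result

    return wrapper


def type_name(value):
    return type(value).__name__


untyped = bounded_cache(type_name)
typed = bounded_cache(type_name, typed=True)
```

Calling untyped(1) and then untyped(1.0) returns 'int' both times, mirroring the stale Unicode result above, while typed(1.0) correctly returns 'float'.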

Combined with Python’s automatic promotion of byte strings to Unicode when they are concatenated with Unicode strings, this can really screw things up when you do want to work with byte strings. If you hit such a problem, the code may all look correct even though the problem was introduced by an earlier urlparse call, possibly many calls before. Even if your own code never passes in Unicode strings, one of the libraries you use might be doing so.

The problem affects more than just the urlparse function. The urljoin function from the same module is also affected since it uses urlparse internally:

>>> from urlparse import urljoin
>>> urljoin('http://www.ubuntu.com/', '/news')
u'http://www.ubuntu.com/news'

It seems safest to avoid the module altogether if possible, or at least until the underlying bug is fixed.

OpenID 2.0 Specification Approved

It looks like the OpenID Authentication 2.0 specification has finally been released, along with OpenID Attribute Exchange 1.0. While there are some questionable features in the new specification (namely XRIs), it seems like a worthwhile improvement over the previous specification. It will be interesting to see how quickly the new specification gains adoption.

While this is certainly an important milestone, there are still areas for improvement.

Best Practices For Managing Trust Relationships With OPs

The proposed Provider Authentication Policy Extension allows a Relying Party to specify what level of checking it wants the OpenID Provider to perform on the user (e.g. phishing resistant, multi-factor, etc). The OP can then tell the RP what level of checking was actually performed.

What the specification doesn’t cover is why the RP should believe the OP. I can easily set up an OP that performs no checking on the user but claims that it performed “Physical Multi-Factor Authentication” in its responses. Any RP that acted on that assertion would be buggy.

This isn’t to say that the extension is useless. If the entity running the RP also runs the OP, then they might have good reason to believe the responses and act on them. Similarly, they might decide that JanRain are quite trustworthy so believe responses from myOpenID.

What these situations have in common is a trust relationship between the OP and RP that exists outside of the protocol. As the specification gives no guidance on how to set up these relationships, they are likely to be ad hoc and result in some OpenIDs being more useful than others.
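
On the RP side, such an out-of-band relationship might amount to little more than an allowlist consulted before acting on policy claims. A hypothetical sketch: TRUSTED_OPS and believed_policies are illustrative names, and the policy values are whatever policy URIs the OP's response lists.

```python
# Hypothetical RP-side configuration: OP endpoints whose policy claims this
# RP has agreed, outside the protocol, to believe.
TRUSTED_OPS = {"https://op.example.com/server"}


def believed_policies(op_endpoint, asserted_policies, trusted_ops=TRUSTED_OPS):
    """Return the authentication policies the RP should act on.

    asserted_policies are the policy URIs the OP claims it applied.  Claims
    from an OP without a trust relationship are ignored, since nothing in
    the protocol stops an OP from asserting checks it never performed.
    """
    if op_endpoint in trusted_ops:
        return frozenset(asserted_policies)
    return frozenset()
```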

At a minimum, it’d be good to see some best practices document on how to handle this.

Trusted Attribute Exchange

As I mentioned in my previous article on OpenID Attribute Exchange, attribute values provided by the OP should be treated as self-asserted. So if the RP receives an email address or Jabber ID via attribute exchange, there is no guarantee that the user actually owns them. This is a problem if the RP wants to start emailing or instant messaging the user (e.g. OpenID-enabled mailing list management software). Assuming the RP doesn’t want to get users to revalidate their email address, what can it do?

One of the simplest solutions is to use a trust relationship with the OP. If the RP knows that the OP will only transfer email addresses that the user has previously verified, then it need not perform a second verification. This leaves us in the same situation as described in the previous section.

Another solution, proposed by Sxip, is to make the attribute values self-verifying. This entails making the attribute value contain both the desired information and a digital signature. Using the email example, if the email address has a valid digital signature and the RP trusts the signer to perform email address verification, then it can accept the email address without further verification.

This means that the RP only needs to manage trust relationships with the attribute signers rather than with every OP used by its user base. If there are fewer attribute signers than OPs, this is an obvious benefit to the RP. It also benefits users, since they are no longer limited to one of the “approved” OPs.
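
A sketch of the RP-side check, with one loud substitution: the proposal uses digital signatures, but to stay within the standard library this uses an HMAC as a stand-in, which means the RP would share a secret key with each signer. A real deployment would verify a public-key signature instead. The signer names, keys and function names are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical trust store: signer name -> key.  With real digital
# signatures this would hold the signer's public key, not a shared secret.
TRUSTED_SIGNERS = {"email-verifier.example": b"demo-shared-key"}


def sign_attribute(key, value):
    """Produce the tag a signer would attach to value (HMAC-SHA256 stand-in)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def accept_attribute(value, signer, tag, trusted_signers=TRUSTED_SIGNERS):
    """Return True if value carries a valid tag from a trusted signer.

    False means the RP should treat the value as self-asserted and fall
    back to its own verification (e.g. sending a confirmation email).
    """
    key = trusted_signers.get(signer)
    if key is None:
        return False
    return hmac.compare_digest(sign_attribute(key, value), tag)
```

Note that the RP only consults its list of trusted signers, never a list of OPs. A real scheme would presumably also bind the signed value to the user's identifier so that it cannot be replayed by someone else.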

Canonical IDs for URL Identifiers

I’ve stated previously that I think the support for identifier reuse with respect to URL identifiers is a bit lacking.  It’d be nice to see it expanded in a future specification revision.
