django-openid-auth

Last week, we released the source code to django-openid-auth.  This is a small library that can add OpenID-based authentication to Django applications.  It has been used for a number of internal Canonical projects, including the sprint scheduler Scott wrote for the last Ubuntu Developer Summit, so it is possible you’ve already used the code.

Rather than trying to cover all possible use cases of OpenID, it focuses on providing OpenID Relying Party support to applications using Django’s django.contrib.auth authentication system.  As such, it is usually enough to edit just two files in an existing application to enable OpenID login.

The library has a number of useful features:

  • As well as the standard method of prompting the user for an identity URL, you can configure a fixed OpenID server URL.  This is useful for deployments where OpenID is being used for single sign-on, and you always want users to log in using a particular OpenID provider.  Instead of being asked for their identity URL, users are sent directly to the provider.
  • It can be configured to automatically create accounts when new identity URLs are seen.
  • User names, full names and email addresses can be set on accounts based on data sent via the OpenID Simple Registration extension.
  • Support for Launchpad’s Teams OpenID extension, which lets you query membership of Launchpad teams when authenticating against Launchpad’s OpenID provider.  Team memberships are mapped to Django group membership.
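The team-to-group mapping in the last point can be pictured as a small set computation. This is a hypothetical sketch, not the library’s code: the function name sync_groups and the shape of the mapping (Launchpad team name to Django group name) are assumptions, and it works on plain group-name sets rather than Django Group objects.

```python
from typing import Dict, Iterable, Set


def sync_groups(current_groups: Set[str],
                user_teams: Iterable[str],
                teams_mapping: Dict[str, str]) -> Set[str]:
    """Return the user's group names after applying a team membership reply.

    Hypothetical helper: `teams_mapping` maps Launchpad team names to Django
    group names. Only groups that appear in the mapping are managed here;
    any other groups the user belongs to are left untouched.
    """
    managed = set(teams_mapping.values())
    # Groups the user should be in, based on the teams the provider confirmed.
    wanted = {teams_mapping[team] for team in user_teams if team in teams_mapping}
    # Drop stale managed groups, keep unmanaged ones, add the confirmed ones.
    return (set(current_groups) - managed) | wanted


if __name__ == "__main__":
    mapping = {"ubuntu-dev": "developers", "lp-admins": "admins"}
    print(sorted(sync_groups({"admins", "staff"}, ["ubuntu-dev"], mapping)))
    # prints ['developers', 'staff']
```

In this sketch, restricting the sync to mapped groups means groups assigned by hand inside Django survive the next login.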

While the code can be used for generic OpenID login, we’ve mostly been using it for single sign-on.  The hope is that it will help members of the Ubuntu and Launchpad communities reuse our authentication system in a secure fashion.

The source code can be downloaded using the following Bazaar command:

bzr branch lp:django-openid-auth

Documentation on how to integrate the library is available in the README.txt file.  The library includes some code written by Simon Willison for django-openid, and uses the same licensing terms (two-clause BSD) as that project.
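As a rough illustration of the settings side of the integration, here is a settings.py fragment. It is a sketch assuming the setting names described in README.txt (check there before copying); the server URL and the team and group names are placeholders. The other file to edit is urls.py, which routes a URL prefix to the library’s views.

```python
# settings.py fragment (sketch): names assumed from README.txt, values are placeholders.
INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django_openid_auth',
)

# Try OpenID first, then fall back to ordinary username/password accounts.
AUTHENTICATION_BACKENDS = (
    'django_openid_auth.auth.OpenIDBackend',
    'django.contrib.auth.backends.ModelBackend',
)

# Create a Django account the first time an unknown identity URL logs in.
OPENID_CREATE_USERS = True
# Copy nickname, full name and email from Simple Registration responses.
OPENID_UPDATE_DETAILS_FROM_SREG = True

# Single sign-on: send users straight to one provider (placeholder URL).
OPENID_SSO_SERVER_URL = 'https://openid.example.com/'

# Launchpad team name -> Django group name (placeholder names).
OPENID_LAUNCHPAD_TEAMS_MAPPING = {
    'example-team': 'example-group',
}

LOGIN_URL = '/openid/login/'
LOGIN_REDIRECT_URL = '/'
```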

Django support landed in Storm

Since my last article on integrating Storm with Django, I’ve merged my changes to Storm’s trunk.  This missed the 0.13 release, so you’ll need to use Bazaar to get the latest trunk or wait for 0.14.

The focus since the last post was to get Storm to cooperate with Django’s built-in ORM.  One of the reasons people choose Django is the range of existing components available for building a site.  These range from the included user management and administration code to full web shop implementations.  So even if you plan to use Storm for your Django application, your application will most likely use Django’s ORM for some things.

When I last posted about this code, it was possible to use both ORMs in a single app, but they would use separate database connections.  This had a number of disadvantages:

  • The two connections would be running separate transactions in parallel, so changes made by one connection would not be visible to the other connection until that transaction commits.  This is a problem when updating records in one table that reference rows that are being updated on the other connection.
  • When you have more than one connection, you introduce a new failure mode where one transaction may commit successfully while the other fails, leaving you with only half the changes recorded.  This can be fixed by using two-phase commit, but that is not supported by either Django or Storm at this point in time.
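The visibility problem in the first point is easy to reproduce with any two connections to the same database. The sketch below uses SQLite from Python’s standard library purely as a convenient stand-in for the PostgreSQL or MySQL connections Django and Storm would really hold; the table and variable names are arbitrary.

```python
import os
import sqlite3
import tempfile


def demo_visibility():
    """Show that one connection cannot see another's uncommitted changes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        django_conn = sqlite3.connect(path)  # stands in for Django's connection
        storm_conn = sqlite3.connect(path)   # stands in for Storm's connection
        try:
            django_conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY)")
            django_conn.commit()

            # The INSERT implicitly opens a transaction on django_conn.
            django_conn.execute("INSERT INTO item (id) VALUES (1)")
            # fetchall() consumes the result so storm_conn releases its read lock.
            before = storm_conn.execute("SELECT COUNT(*) FROM item").fetchall()[0][0]

            django_conn.commit()
            after = storm_conn.execute("SELECT COUNT(*) FROM item").fetchall()[0][0]
            return before, after
        finally:
            django_conn.close()
            storm_conn.close()


if __name__ == "__main__":
    print(demo_visibility())  # prints (0, 1)
```

Until django_conn commits, code reading through storm_conn sees a different version of the data, which is what makes cross-ORM updates within one request awkward.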

So it is desirable to have the two ORMs sharing a single connection.  The way I’ve implemented this is as a Django database engine backend that reuses the connection of a particular named per-thread Storm store and passes transaction commit or rollback requests through to the global transaction manager.  Configuration is as simple as:

DATABASE_ENGINE = 'storm.django.backend'
DATABASE_NAME = 'store-name'
STORM_STORES = {'store-name': 'database-uri'}

This will work for PostgreSQL or MySQL connections, but not yet for SQLite: Django requires some additional setup for SQLite connections that Storm doesn’t perform.
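The shape of that backend can be sketched without Django or Storm installed. Everything here is a hypothetical stand-in: GlobalTransactionManager plays the role of the Zope transaction manager, FakeStore the role of a named per-thread Storm store, and SharedConnectionWrapper the role of the Django database wrapper. What it illustrates is that the wrapper hands out the store’s existing connection and never commits it directly.

```python
class GlobalTransactionManager(object):
    """Hypothetical stand-in for a global (e.g. Zope) transaction manager."""

    def __init__(self):
        self.resources = []

    def join(self, resource):
        if resource not in self.resources:
            self.resources.append(resource)

    def commit(self):
        for resource in self.resources:
            resource.commit()

    def abort(self):
        for resource in self.resources:
            resource.rollback()


class FakeStore(object):
    """Hypothetical stand-in for a named per-thread Storm store."""

    def __init__(self, name):
        self.name = name
        self.connection = object()  # placeholder for the real DB connection
        self.commits = 0
        self.rollbacks = 0

    def commit(self):
        self.commits += 1

    def rollback(self):
        self.rollbacks += 1


class SharedConnectionWrapper(object):
    """Django-side wrapper that borrows the store's connection (hypothetical)."""

    def __init__(self, store, manager):
        # Reuse the store's connection rather than opening a second one.
        self.connection = store.connection
        self.manager = manager
        manager.join(store)

    def commit(self):
        # Don't commit the raw connection; let the global manager commit
        # every joined resource together.
        self.manager.commit()

    def rollback(self):
        self.manager.abort()
```

Because both ORMs issue SQL over one connection inside one transaction, the visibility and partial-commit problems listed earlier disappear for that store.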

Once this is configured, things mostly just work.  However, since Django and Storm each maintain their own cache of data retrieved from the database, accessing the same table through both ORMs could give unpredictable results.  My code doesn’t attempt to solve this problem, so it is probably best to access each table through only one of the two ORMs.

I suppose the next step here would be to implement something similar to Storm’s Reference class to represent links between objects managed by Storm and objects managed by Django and vice versa.

Transaction Management in Django

In my previous post about Django, I mentioned that I found the transaction handling strategy in Django to be a bit surprising.

Like most object relational mappers, it caches information retrieved from the database, since you don’t want to be constantly issuing SELECT queries for every attribute access. However, it defaults to committing after saving changes to each object. So a single web request might end up issuing many transactions:

Operation          Transaction
Change object 1    Transaction 1
Change object 2    Transaction 2
Change object 3    Transaction 3
Change object 4    Transaction 4
Change object 5    Transaction 5

Unless nobody else is accessing the database, other users could modify objects that the ORM has cached between one transaction and the next. This also makes it difficult to test your application in any meaningful way, since it is hard to predict what changes will occur at those boundaries. Django does provide a few ways to get better transactional behaviour.

The @commit_on_success Decorator

The first is a decorator that turns on manual transaction management for the duration of the function, then commits when the function returns normally or rolls back if it raises an exception. In the above example, if the middle three operations were performed inside a @commit_on_success function, the result would look something like this:

Operation          Transaction
Change object 1    Transaction 1
Change object 2    Transaction 2
Change object 3    Transaction 2
Change object 4    Transaction 2
Change object 5    Transaction 3

Note that the decorator is normally applied to view functions, so it will usually cover most of the request. That said, there are a number of cases where extra work might be done outside of the function, such as work done in middleware classes or in views that call other view functions.

The TransactionMiddleware class

Another alternative is to install the TransactionMiddleware class for the site. This turns on transaction management for the duration of each request, similar to what you’d see in other frameworks, giving results like this:

Operation          Transaction
Change object 1    Transaction 1
Change object 2    Transaction 1
Change object 3    Transaction 1
Change object 4    Transaction 1
Change object 5    Transaction 1

Combining @commit_on_success and TransactionMiddleware

At first, it would appear that these two approaches cover pretty much everything you’d want. But there are problems when you combine the two. If we use the @commit_on_success decorator as before and TransactionMiddleware, we get the following set of transactions:

Operation          Transaction
Change object 1    Transaction 1
Change object 2    Transaction 1
Change object 3    Transaction 1
Change object 4    Transaction 1
Change object 5    Transaction 2

The transaction for the @commit_on_success function has extended to cover the operations made beforehand. This also means that operations #1 and #5 are now in separate transactions despite the use of TransactionMiddleware. The problem also occurs with nested use of @commit_on_success, as reported in Django bug 2227.
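The grouping in the combined listing follows from one rule: a commit commits everything pending on the connection, no matter who started the work. The model below is a deliberately simplified, hypothetical reimplementation (not Django’s code) that treats TransactionMiddleware and @commit_on_success as two nested managed scopes, each committing when it finishes; running the combined scenario reproduces the two transactions.

```python
from contextlib import contextmanager


class Connection(object):
    """Records which operations end up in each committed transaction."""

    def __init__(self):
        self.pending = []       # operations not yet committed
        self.transactions = []  # committed groups of operations
        self.managed_depth = 0  # > 0 while some scope manages transactions

    def save(self, op):
        self.pending.append(op)
        if self.managed_depth == 0:
            self.commit()       # autocommit: commit after every save

    def commit(self):
        if self.pending:
            self.transactions.append(self.pending)
            self.pending = []

    def rollback(self):
        self.pending = []


@contextmanager
def managed_scope(conn):
    """Simplified model of both TransactionMiddleware and @commit_on_success:
    enter managed mode, then commit on success or roll back on error."""
    conn.managed_depth += 1
    try:
        yield
    except Exception:
        conn.rollback()
        raise
    else:
        conn.commit()  # commits *all* pending work, including the caller's
    finally:
        conn.managed_depth -= 1


def combined_scenario():
    conn = Connection()
    with managed_scope(conn):          # TransactionMiddleware
        conn.save(1)
        with managed_scope(conn):      # @commit_on_success view
            for op in (2, 3, 4):
                conn.save(op)
        conn.save(5)
    return conn.transactions


if __name__ == "__main__":
    print(combined_scenario())  # prints [[1, 2, 3, 4], [5]]
```

Dropping the outer scope and wrapping only operations 2 to 4 gives [[1], [2, 3, 4], [5]], matching the plain @commit_on_success listing.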

A better behaviour for nested transaction management would be something like this:

  1. On success, do nothing. The changes will be committed by the outside caller.
  2. On failure, do not abort the transaction, but instead mark it as uncommittable. This would have similar semantics to the Zope transaction.doom() function.

It is important that the nested call does not abort the transaction because that would cause a new transaction to be started by subsequent code: that should be left to the code that began the transaction.
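Those two rules can be sketched as a decorator that only commits or aborts when it is the outermost transaction scope. This is a hypothetical design, not an existing Django API: the names nested_commit_on_success, TransactionState and DoomedTransaction are invented for the example, and the doom flag follows the semantics described above rather than Zope’s implementation.

```python
import functools


class DoomedTransaction(Exception):
    """Raised when the outermost scope tries to commit a doomed transaction."""


class TransactionState(object):
    def __init__(self):
        self.depth = 0
        self.doomed = False
        self.log = []  # records "commit" / "abort" for inspection

    def commit(self):
        self.log.append("commit")

    def abort(self):
        self.log.append("abort")


def nested_commit_on_success(state):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            outermost = state.depth == 0
            if outermost:
                state.doomed = False  # a fresh transaction starts clean
            state.depth += 1
            try:
                result = func(*args, **kwargs)
            except Exception:
                if outermost:
                    state.abort()
                else:
                    state.doomed = True  # rule 2: mark, don't abort
                raise
            finally:
                state.depth -= 1
            if not outermost:
                return result  # rule 1: the outer caller commits later
            if state.doomed:
                state.abort()
                raise DoomedTransaction("a nested call failed")
            state.commit()
            return result
        return wrapper
    return decorator
```

Even if the outer function swallows an exception from a nested call, the outermost scope refuses to commit, so no new transaction is started behind the caller’s back.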

The @autocommit decorator

While the above interaction looks like a simple bug, the @autocommit decorator is another matter. It turns autocommit on for the duration of a function call, no matter what the transaction mode for the caller was. If we took the original example and wrapped the middle three operations with @autocommit and used TransactionMiddleware, we’d get 4 transactions: one for the first two operations, then one for each of the remaining operations.

I can’t think of a situation where it would make sense to use it, and I wonder if it was just added for completeness.

Conclusion

While the nesting bugs remain, my recommendation would be to go for the TransactionMiddleware and avoid use of the decorators (both in your own code and in third-party components). If you are writing reusable code that requires transactions, it is probably better to assert that django.db.transaction.is_managed() is true so that you get a failure for improperly configured systems while not introducing unwanted transaction boundaries.
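For reusable code, that check can be wrapped in a small decorator. This is a sketch: requires_managed_transaction is a hypothetical name, and the check is passed in as a parameter so the example runs without Django; in a Django project of this vintage you would pass django.db.transaction.is_managed.

```python
import functools


def requires_managed_transaction(is_managed):
    """Fail loudly instead of silently committing in the wrong place.

    `is_managed` is any zero-argument callable returning True when the
    caller has set up transaction management (e.g. TransactionMiddleware).
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not is_managed():
                raise RuntimeError(
                    "%s requires managed transactions; "
                    "is TransactionMiddleware installed?" % func.__name__)
            return func(*args, **kwargs)
        return wrapper
    return decorator
```

Raising an exception rather than using a bare assert statement keeps the check active when Python runs with -O.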

For the Storm integration work I’m doing, I’ve set it to use managed transaction mode to avoid most of the unwanted commits, but it still falls prey to the extra commits when the decorators are used. So it is still necessary to inspect the code you depend on for uses of the decorators. If anyone has other tips, I’d be glad to hear them.

Using Storm with Django

I’ve been playing around with Django a bit for work recently, and it has been interesting to see which choices its developers made differently from Zope 3.  There were a few things that surprised me:

  • The ORM and database layer defaults to autocommit mode rather than using transactions.  This seems like an odd choice given that all the major free databases support transactions these days.  While autocommit might work fine when a web application is under light use, it is a recipe for problems at higher loads.  By using transactions that last for the duration of the request, the testing you do is more likely to help with the high load situations.
  • While there is a middleware class to enable request-duration transactions, it only covers the database connection.  There is no global transaction manager to coordinate multiple DB connections or other resources.
  • The ORM appears to support only a single connection per request.  While this is the most common case and should be easy to code for, allowing an application to expand past this limit seems prudent.
  • The tutorial promotes schema generation from Python models, which I feel is the wrong choice for any application that is likely to evolve over time (i.e. pretty much every application).  I’ve written about this previously and believe that migration-based schema management is a more workable solution.
  • It poorly reinvents thread-local storage in a few places.  This isn’t too surprising for code that predates Python 2.4 (which introduced threading.local), and probably isn’t a problem for its default mode of operation.

Apart from these points, it looks like a nice framework.

Integrating Storm

I’ve been doing a bit of work to make it easy to use Storm with Django.  I posted some initial details on the mailing list.  The initial code has been published on Launchpad but is not yet ready to merge. Some of the main details include:

  • A middleware class that integrates the Zope global transaction manager (which requires just the zope.interface and transaction packages).  There doesn’t appear to be any equivalent functionality in Django, and this made it possible to reuse the existing integration code (the same approach taken to integrate Storm with Pylons).  It will also make it easier to take advantage of other future improvements (e.g. only committing stores that are used in a transaction, two-phase commit).
  • Stores can be configured through the application’s Django settings file, and are managed as long lived per-thread connections.
  • A simple get_store(name) function is provided for accessing per-thread stores within view code.
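The middleware and get_store() pieces described above can be sketched roughly as follows, written so that it runs without Django, Storm or the Zope packages installed. The class names ZopeTransactionMiddleware and StoreRegistry, and the injectable manager and factory parameters, are assumptions for illustration rather than the names used in the published branch.

```python
import threading


class ZopeTransactionMiddleware(object):
    """Old-style Django middleware giving each request one global transaction.

    Hypothetical sketch: `manager` is anything with begin/commit/abort, which
    by default is the module-level API of the zope `transaction` package.
    """

    def __init__(self, manager=None):
        if manager is None:
            import transaction as manager  # requires the `transaction` package
        self.manager = manager

    def process_request(self, request):
        self.manager.begin()
        return None  # continue normal request processing

    def process_exception(self, request, exception):
        self.manager.abort()
        return None  # let Django's normal error handling run

    def process_response(self, request, response):
        # After an abort nothing is pending, so this commits an empty transaction.
        self.manager.commit()
        return response


class StoreRegistry(object):
    """Long-lived stores, one per (thread, name) pair (hypothetical sketch)."""

    def __init__(self, uris, factory):
        self._uris = dict(uris)   # e.g. the STORM_STORES setting
        self._factory = factory   # e.g. lambda uri: Store(create_database(uri))
        self._local = threading.local()

    def get_store(self, name):
        stores = getattr(self._local, "stores", None)
        if stores is None:
            # First use in this thread: give it its own dictionary of stores.
            stores = self._local.stores = {}
        if name not in stores:
            stores[name] = self._factory(self._uris[name])
        return stores[name]
```

In a real project the registry would be created once from the settings file, and a module-level get_store(name) would simply delegate to it.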

What this doesn’t do yet is provide much integration with existing Django functionality (e.g. django.contrib.admin).  I plan to try and get some of these bits working in the near future.