Psycopg migrated to Bazaar

Last week we moved psycopg from Subversion to Bazaar.  I did the migration using Gustavo Niemeyer‘s svn2bzr tool with a few tweaks to map the old Subversion committer IDs to the email address form conventionally used by Bazaar.

The tool does a good job of following tree copies and create related Bazaar branches.  It doesn’t have any special handling for stuff in the tags/ directory (it produces new branches, as it does for other tree copies).  To get real Bazaar tags, I wrote a simple post-processing script to calculate the heads of all the branches in a tags/ directory and set them as tags in another branch (provided those revisions occur in its ancestry).  This worked pretty well except for a few revisions synthesised by a previous cvs2svn migration.  As these tags were from pretty old psycopg 1 releases I don’t know how much it matters.

As there is no code browsing set up on initd.org yet, I set up mirrors of the 2.0.x and 1.x branches on Launchpad to do this:

It is pretty cool having access to the entire revision history locally, and should make it easier to maintain full credit for contributions from non-core developers.

Psycopg2 2.0.7 Released

Yesterday Federico released version 2.0.7 of psycopg2 (a Python database adapter for PostgreSQL).  I made a fair number of the changes in this release to make it more usable for some of Canonical‘s applications.  The new release should work with the development version of Storm, and shouldn’t be too difficult to get everything working with other frameworks.

Some of the improvements include:

  • Better selection of exceptions based on the SQLSTATE result field.  This causes a number of errors that were reported as ProgrammingError to use a more appropriate exception (e.g. DataError, OperationalError, InternalError).  This was the change that broke Storm’s test suite as it was checking for ProgrammingError on some queries that were clearly not programming errors.
  • Proper error reporting for commit() and rollback(). These methods now use the same error reporting code paths as execute(), so an integrity error on commit() will now raise IntegrityError rather than OperationalError.
  • The compile-time switch that controls whether the display_size member of Cursor.description is calculated is now turned off by default.  The code was quite expensive and the field is of limited use (and not provided by a number of other database adapters).
  • New QueryCanceledError and TransactionRollbackError exceptions.  The first is useful for handling queries that are canceled by statement_timeout.  The second provides a convenient way to catch serialisation failures and deadlocks: errors that indicate the transaction should be retried.
  • Fixes for a few memory leaks and GIL misuses. One of the leaks was in the notice processing code that could be particularly problematic for long-running daemon processes.
  • Better test coverage and a driver script to run the entire test suite in one go.  The tests should all pass too, provided your database cluster uses unicode (there was a report just before the release of one test failing for a LATIN1 cluster).

If you’re using previous versions of psycopg2, I’d highly recommend upgrading to this release.

Future work will probably involve support for the DB-API two phase commit extension, but I don’t know when I’ll have time to get to that.

Two‐Phase Commit in Python’s DB‐API

Marc uploaded a new revision of the Python DB-API 2.0 Specification yesterday that documents the new two phase commit extension that I helped develop on the db-sig mailing list.

My interest in this started from the desire to support two phase commit in Storm – without that feature there are far fewer occasions where its ability to talk to multiple databases can be put to use. As I was doing some work on psycopg2 for Launchpad, I initially put together a PostgreSQL specific patch, which was (rightly) rejected by Federico.

He suggested that it would be better to try and standardise on an API on the db-sig list, so that’s what I did. I looked over the API exposed by other database adapters that supported 2PC, and the 2PC APIs of the major free databases that did not have support in their Python adapters (MySQL and PostgreSQL). The resulting API is a bit more complicated than my original PostgreSQL-only but has the advantage of being implementable on other databases such as MySQL.

Below is a simple example of using the API directly (missing some of the error handling):

# begin transactions for each database connection
conn1.tpc_begin(conn1.xid(42, 'transaction ID', 'connection 1'))
conn2.tpc_begin(conn2.xid(42, 'transaction ID', 'connection 2'))
# Do stuff with both connections
...
try:
    conn1.tpc_prepare()
    conn2.tpc_prepare()
except DatabaseError:
    conn1.tpc_rollback()
    conn2.tpc_rollback()
else:
    conn1.tpc_commit()
    conn2.tpc_commit()

Or alternatively, if you’ve got one connection supporting 2PC and the other only supporting one-phase commit, it could be structured as follows:

# begin transactions for each database connection
conn1.tpc_begin(conn1.xid(42, 'transaction ID', 'connection 1'))
# Do stuff with both connections
...
try:
    conn1.tpc_prepare()
    conn2.commit()
except DatabaseError:
    conn1.tpc_rollback()
    conn2.rollback()
else:
    conn1.tpc_commit()

While it is possible to use the 2PC API directly, it is expected that most applications will rely on a transaction manager to coordinate global transactions, such as Zope’s transaction module.

The hope is that by offering a consistent API, Python application frameworks will be more likely to bother supporting this feature of databases. Hopefully you’ll be able to use the API with PostgreSQL and Storm soon.