Extending geoalchemy through monkeypatching

I’ve been working on the data collection part of my cycle route modelling. I’m hoping that I can, as a first output, put together a map of where people are cycling in Melbourne. A crowd-sourced view of the best places to cycle, if you will. Given I will probably be running this in the cloud1, I thought it was best to actually store the data in a GIS database, rather than lots and lots of flat files.

A quick Google turned up GeoAlchemy, which are GIS extensions for SQLAlchemy. Provides lots of the standard things you want to do as methods on fields, but this is only a limited set of what you can do with PostGIS. Since I’m going to be wanting to do things like binning data, I thought it was worth figuring out how hard it was to call other PostGIS methods.

GeoAlchemy supports subclassing to create new dialects, but you have to subclass 3 classes, and it’s basically a pain in the neck when you just want to extend the functionality of the PostGIS dialect. Probably what I should do is submit a pull request with the rest of the PostGIS API as extensions, but I’m lazy. Henceforth, for the second time this week I am employing monkey patching to get the job done (and for the second time this week, kittens cry).

Functions in GeoAlchemy require two things, a method stub saying how we collect the arguments and the return (look at geoalchemy.postgis.pg_functions) and a mapping from this to the SQL function. Since we only care about one dialect, we can make this easier on ourselves by combining these two things. Firstly we monkeypatch in the method stubs:

from geoalchemy.functions import BaseFunction
from geoalchemy.postgis import pg_functions

@monkeypatchclass(pg_functions)
class more_pg_functions:
    """
    Additional functions to support for PostGIS
    """

    class length_spheroid(BaseFunction):
        _method = 'ST_Length_Spheroid'

Note the _method attribute which isn’t something used anywhere else. We can then patch in support for this:

from geoalchemy.dialect import SpatialDialect

@monkeypatch(SpatialDialect)
def get_function(self, function_cls):
    """
    Add support for the _method attribute
    """

    try:
        return function_cls._method
    except AttributeError:
        return self.__super__get_function(function_cls)

The monkeypatching functions look like this:

def monkeypatch(*args):
    """
    Decorator to monkeypatch a function into class as a method
    """

    def inner(func):
        name = func.__name__

        for cls in args:
            old = getattr(cls, name)
            setattr(cls, '__super__{}'.format(name), old)

            setattr(cls, name, func)

    return inner


def monkeypatchclass(cls):
    """
    Decorator to monkeypatch a class as a baseclass of @cls
    """

    def inner(basecls):
        cls.__bases__ += (basecls,)

        return basecls

    return inner

Finally we can do queries like this:

>>> track = session.query(Track).get(1)
>>> session.scalar(track.points.length_spheroid('SPHEROID["WGS 84",6378137,298.257223563]'))
6791.87502950043

Code on GitHub.

  1. your recommendations for cloud-based services please, must be able to run Flask and PostGIS and be super cheap []
Posted in Uncategorized | 4 Comments

scratching my own itch [or: why I still love open source]

I’m visiting my parents for the long weekend. Sitting in the airport I decided I should use my spare time to write some documentation, so sitting in the airport was the first time I’d tried to connect to work’s OpenVPN server. While it’s awesome that Network Manager can now import OpenVPN configs, it didn’t work because NM doesn’t support the crucial keysize parameter.

Rather than work around the problem, which some people have done, but would annoyingly break my other OpenVPNs, I used the fact that it’s open source to fix the problem properly.

My Dad asked if I was working. No, well, not really. I’m fixing the interface to my VPN client so I can connect to work’s VPN, I replied. Unglaublich! my father remarked. Not unbelievable, because it’s open source!

Posted in Uncategorized | 3 Comments

Review: GNOME 3 Application Development: Beginner’s Guide

GNOME 3 Application DevelopmentThe folk at Packt Publishing sent me an e-copy of GNOME 3 Application Development Beginners Guide a month or so ago.

I’ve been putting off this review because I don’t think this is an very good book and it’s hard to write bad reviews.

First off, the book’s Javascript sections use Seed. I think this is an unconventional choice given that the shell and most of GNOME uses gjs. It had been my experience with the Javascript bindings that gjs was significantly more mature, a view which is confirmed by the fact that Seed has had very little development in the last 18 months.

The book does not seem to use GTK+ best practice, like using Gtk.Grid or Gtk.Application and not using c_new constructor. It is full of things like use of Vala’s [CCode] pragma, but I don’t see why. I felt important and powerful facilities in GLib like properties were not properly explained, especially property binding. There was also a lack of understanding, for example, referring to Timeout objects, which don’t exist (the structure you’re looking for is a Source).

I do like that it uses Anjuta. It’s a shame that it requires unexplained hacks to get things building.

The Clutter section was very poor. Comparing Clutter to GTK+ is simply not reasonable. Clutter is a scene graph API, which doesn’t really have a comparison in the GTK+ stack, which goes from drawing layer to widget layer with no intermediate layer. I immediately noticed the Clutter examples hardcoded layout instead of using a layout manager.

The multimedia section had the user installing non-free codecs. Then it uses alsasink and not auto*sink. It spends a lot of time setting up GStreamer pipelines, rather than using decodebin and playbin, maybe this improves understanding, but I think it mostly will lead to the creation of very rigid apps.

I stopped reading and started skimming at this point. I did again notice weird things like creating JSON using append methods and not the handy JSON-GLib. The examples of HTML5 applications with WebKit perhaps would explain why Seed, except the wrapper is written in Vala, so there’s no problem of conflicting JS engines (I think it works fine anyway, right?). Similarly the application accesses applications by looking in /usr/share/applications rather than libgnome-menus. Again, this will lead to very rigid apps that don’t work very well and doesn’t teach beginners the best practice for GNOME development.

There’s stuff that’s just weird, the system requirements are significantly more powerful than my last computer, on which I was doing GNOME development just fine. There is discussion of how to switch to GNOME Shell, as if it’s required, whereas you can develop GNOME apps in Unity and XFCE just fine.

The typesetting of the book is poor. The source code is weirdly indented and I feel like it lacked readability (there’s great syntax highlighting available for printed text). There are grammatical mistakes that really should have been picked up in editing. The screenshots are blurry in the PDF (looks like some kind of busted bilinear filtering?). Also I can see the resize indicator on the mouse. Generally these serve to make the book look unprofessional.

Finally I don’t think the book really leads the new developer into the community as the best source to get help, which they will undoubtedly need. Of course, the community has already produced some excellent tutorials, which I think new developers would be much better off with.

All up I’m giving it 1 star.

GNOME 3 Application Development: Beginner’s Guide: ★☆☆☆☆

Posted in Uncategorized | 9 Comments

IdentitiesOnly + ssh-agent

I’m really hoping that someone can provide me with some enlightenment.

I have a lot of ssh keys. 6 by today’s count. On my desktop I have my ssh configured with IdentitiesOnly yes and an IdentityFile for each host. This works great.

I then forward my agent to my dev VM. I can see the keys with ssh-add -l. So far so good. If I then ssh into a host, I can see it trying every key from the agent in sequence, which is sometimes going to fail with too many keys tried. However, if I try IdentitiesOnly yes in my dev VM config, it doesn’t offer any keys, if I add IdentityFile it doesn’t work because I don’t have those key files on my VM.

So what’s the solution? What I want is to specify identities by their identifier in the agent, e.g. danni@github, however I can’t see config to do that. Anyone got a nifty solution?

Posted in Uncategorized | 8 Comments

generic lettuce steps for Django models

After I left the Bureau approximately a month ago I’ve taken up a new role with Infoxchange Australia. My first project here is working on a rewrite of an application using Django.

People here are really into behaviour driven testing, and we’re using Lettuce to do it (using a branch with better Django integration).

I sort of dislike this sort of testing, because it creates an annoying abstraction layer on top of the code, with a poorly defined, quasi-real language. It’s like a bad knock off of Applescript. Anyway, I got sick of defining steps per model, so I put together some generic steps for manipulating Django models (that I’ll have to contribute back).

Anyway they look like this (examples of the step in the docstrings):

# build a hash of model verbose names to models
# this is used by get_model()
def _models_generator():
    for model in get_models():
        yield (model._meta.verbose_name, model)
        yield (model._meta.verbose_name_plural, model)

MODELS = dict(_models_generator())


def get_model(model):
    """
    Convert a model's verbose name to the model class. This allows us to
    use the models verbose name in steps.
    """

    name = model.lower()
    model = MODELS.get(model, None)

    assert model, "Could not locate model by name '%s'" % name

    return model


def create_models(model, hashes):
    for hash_ in hashes:
        model.objects.create(**hash_)


def models_exist(model, hashes):
    for hash_ in hashes:
        assert \
            model.objects.filter(**hash_).exists(), \
            "Object does not exist"


@step(r'I have ([a-z][a-z0-9_ ]*) in the database:')
def create_models_generic(step, model):
    """
    And I have admin field values in the database:
    | name         | value   |
    | project_type | Twine   |

    The generic method can be overridden for a specific model by defining a
    function create_badgers(step), which creates the Badger model.
    """

    try:
        globals()['create_%s' % model](step)
    except KeyError:
        model = get_model(model)

        create_models(model, step.hashes)


@step(r'(?:Given|And|Then) ([A-Z][a-z0-9_ ]*) with ([a-z]+) "([^"]*)" has ([A-Z][a-z0-9_ ]*) in the database:')  # noqa
def create_models_for_relation(step, rel_model_name,
                               rel_key, rel_value, model):
    """
    And project with name "Ball Project" has goals in the database:
    | description                             |
    | To have fun playing with balls of twine |
    """

    lookup = {rel_key: rel_value}
    rel_model = get_model(rel_model_name).objects.get(**lookup)

    for hash_ in step.hashes:
        hash_['%s_id' % rel_model_name] = rel_model.id

    create_models_generic(step, model)


@step('(?:Given|And|Then) ([A-Z][a-z0-9_ ]*) should be present in the database')
def step_models_exist(step, model):
    """
    And objectives should be present in the database:
    | description      |
    | Make a mess      |
    """

    model = get_model(model)

    models_exist(model, step.hashes)


@step(r'There should be (\d+) ([a-z][a-z0-9_ ]*) in the database')
def model_count(step, count, model):
    """
    Then there should be 0 goals in the database
    """

    model = get_model(model)

    assert_equals(model.objects.count(), int(count))
Posted in python | Comments Off

Reviewing GNOME3 App Development Beginners Guide

GNOME 3 Application DevelopmentThe folk at Packt Publishing sent me an e-copy of GNOME 3 Application Development Beginners Guide the other day. Since I find myself with a couple of weeks off (more on that another time) I’m going to be reading it and writing a review.

The book weighs in at 366 pages and purports to cover GLib, GTK+, GStreamer, E-D-S, WebKit, desktop D-Bus APIs, i18n and unit testing in both Javascript (via Seed) and Vala.

Hopefully I will get it read in the next couple of weeks and get my thoughts jotted down. I am not getting anything except an e-copy of the book for my trouble so you can trust me to be brutally honest :-P

Posted in Uncategorized | 5 Comments

Generating JSON from SQLAlchemy objects

I had to put together a small web app the other day, using SQLAlchemy and Flask. Because I hate writing code multiple times, when I can do things using a better way, I wanted to be able to serialise SQLAlchemy ORM objects straight to JSON.

I decided on an approach where taking a leaf out of Javascript, I would optionally implement a tojson() method on a class, which I would attempt to call from my JSONEncoder1.

It turns out to be relatively simple to extend SQLAlchemy’s declarative base class to add additional methods (we can also use this as an excuse to implement a general __repr__().

from sqlalchemy.ext.declarative import declarative_base as real_declarative_base

# Let's make this a class decorator
declarative_base = lambda cls: real_declarative_base(cls=cls)

@declarative_base
class Base(object):
    """
    Add some default properties and methods to the SQLAlchemy declarative base.
    """

    @property
    def columns(self):
        return [ c.name for c in self.__table__.columns ]

    @property
    def columnitems(self):
        return dict([ (c, getattr(self, c)) for c in self.columns ])

    def __repr__(self):
        return '{}({})'.format(self.__class__.__name__, self.columnitems)

    def tojson(self):
        return self.columnitems

We can then define our tables in the usual way:

class Client(Base):
    __tablename__ = 'client'

    ...

You can obviously replace any of the methods in your subclass, if you don’t want to serialise the whole thing. Bonus points for anyone who wants to extend this to serialise one-to-many relationships.

And what about calling the tojson() method? That’s easy, we can just provide our own JSONEncoder.

import json

class JSONEncoder(json.JSONEncoder):
    """
    Wrapper class to try calling an object's tojson() method. This allows
    us to JSONify objects coming from the ORM. Also handles dates and datetimes.
    """

    def default(self, obj):
        if isinstance(obj, datetime.date):
            return obj.isoformat()

        try:
            return obj.tojson()
        except AttributeError:
            return json.JSONEncoder.default(self, obj)

Cutting edge Flask provides a way to replace the default JSON encoder, but the version I got out of pip does not. This is relatively easy to work around though by replacing jsonify with our own version.

from flask import Flask

app = Flask(__name__)

def jsonify(*args, **kwargs):
    """
    Workaround for Flask's jsonify not allowing replacement of the JSONEncoder
    in my version of Flask.
    """

    return app.response_class(json.dumps(dict(*args, **kwargs),
                                         cls=JSONEncoder),
                              mimetype='application/json')

If you do have a newer Flask, where you don’t have to replace jsonify, you can also inherit from Flask’s JSONEncoder, which already handles things like datetimes for you.

  1. The tojson() method actually returns a Python dict understandable by JSONEncoder []
Posted in python | 3 Comments

elevation data and APIs

So I found a bit of time to hack on my project today. Today’s task was to load and validate data coming from Geoscience Australia’s SRTM digital elevation model data, which I downloaded from their elevation data portal last week.1 The data is Creative Commons, so I might just upload it somewhere, if I can find a place for 2GB of elevation data.

This let me load elevation data in a 3 arcsecond (about 100m) grid, which I did using the ubiquitous GDAL via its Python API. Initial code is here. It doesn’t do anything super clever yet, like check and normalise the projection, because I don’t need to.2

Looking at plots of values can give you a gist of what’s what (oh look, it goes out to sea, and then the data is masked out) but it doesn’t really validate anything. I could do validation runs against my GPS tracks, but for a first pass, I decided it would be easier to validate using Google’s Elevation API. This is a pretty neat web service that you make a request to, and it gives you back some JSON (or XML). There are undoubtedly Python APIs to access this, but it’s pretty easy to do a simple call with urllib2 or httplib. I chose to reuse my httplib Client wrapper from my RunKeeper/HealthGraph API. I wrote it directly in the test.

For a real unit test, I would have probably calculated the residuals, and ensured they sat within some acceptable range, but I’m lazy, so instead I just plotted them together. Google’s data, you will notice, includes bathymetry, which is actually pretty neat.

SRTM v Google Elevation

  1. Note to the unwary, seems to be buggy in Chrome? []
  2. I did write a skeleton context manager for gdal.Open. I say skeleton because it doesn’t actually do anything smart like turning errors from GDAL into Exceptions, because I didn’t have any errors to handle. []
Posted in Uncategorized | Comments Off

Testing warnings with py.test

For those who use like to add warnings to your Python code, and want to test those warnings actually happen in your unit tests, here are two techniques to do so, both are based around fixtures/funcargs.

Firstly is the mechanism built into py.test using recwarn.

The second is to create a fixture that specifically enables warnings as exceptions and combined that with pytest.raises, for instance:

import warnings

@pytest.fixture
def warnings_as_errors(request):
    warnings.simplefilter('error')

    request.addfinalizer(lambda *args: warnings.resetwarnings())

def test_timers_warn(log, warnings_as_errors):

    log.start_timer('method')

    with pytest.raises(RuntimeWarning):
        log.start_timer('method')

The advantage of this second method is you can guarantee exactly what method call raises the warning without repeatedly having to check recwarn.

Posted in python | Comments Off

Investigating cycling speed anomalies

So as I’ve spent the last year learning Melbourne as a cyclist, there’s been a few times where I’ve found myself on an absolutely staggering hill, only to go down it again, and worse find there was another way that totally avoided the hill; or I’ve chosen routes that subject me to staggering head winds only to be told I should have taken another route instead.

This got me thinking, with everyone tracking their cycles on their smartphones, why couldn’t I feed all of this data into a model, along with some data like NASA’s elevation grids, or the Bureau of Meteorology’s wind observations. As something to keep me occupied over Christmas, I started a little project on the plane to Perth.

It turns out RunKeeper has this handy API that lets you access everything stored there. Unfortunately it seems that no one has really written a good Python API for this, so I put one together.

Throw in a bit of NumPy, PyProj (to convert to rectilinear coordinates) and Matplotlib (to plot it) and you can get a graph that looks like this (which thankfully looks a lot like RunKeeper’s graph):
Speed Anomalies
If we do some long window smoothing, we can get an idea of a cyclist’s average speed and then calculate a percentage anomaly from this average speed. This lets us compensate for different cyclists, how tired they are or if they’re riding with someone else1.

If we then do this for lots of tracks and grid the results based on whether the velocity vector at each point is headed towards or away from Melbourne2 we can get spatial plots that look like this (blue is -1 and red is 1):
Directional Speed Anomalies
If you squint at the graphs you can sort of see that there are many places where the blue/red are inverted, which is promising, it meant something was making us faster one way and slower the other (a hill or wind or the pub). You can also see that I still don’t really have enough data, I tend to always cycle the same routes. If I want to start considering factors that are highly temporally variable, like wind, I’m going to need a lot more data to keep the number of datapoints (always want to call this fold) high in my temporal bins.

The next step I suppose is to set up the RunKeeper download as a web service, so people can submit their RunKeeper tracks to me. This means I’m going to have to fix up some hard coded assumptions in the code, like the UTM zone for rectilinear projection, and what constitutes an inbound or an outbound route. Unsurprisingly this has become a lot more ambitious than a summer project.

If you feel like having a play, there is source code.

  1. This does have the side effect of reducing the signal from head/tail winds, especially on straight trips, I need to think about this more. []
  2. We need this, otherwise the velocity anomaly would average out depending on which direction we’re headed up/down a hill. []
Posted in Uncategorized | 2 Comments