scrambled tofu

IdentitiesOnly + ssh-agent

I’m really hoping that someone can provide me with some enlightenment.

I have a lot of SSH keys: six by today’s count. On my desktop I have SSH configured with IdentitiesOnly yes and an IdentityFile for each host. This works great.

I then forward my agent to my dev VM. I can see the keys with ssh-add -l. So far so good. If I then ssh into a host, I can see it trying every key from the agent in sequence, which sometimes fails because too many keys have been tried. However, if I set IdentitiesOnly yes in my dev VM’s config, it doesn’t offer any keys, and if I add IdentityFile it doesn’t work because I don’t have those key files on my VM.

So what’s the solution? What I want is to specify identities by their identifier in the agent, e.g. danni@github, but I can’t see a config option to do that. Anyone got a nifty solution?

generic lettuce steps for Django models

Approximately a month ago I left the Bureau and took up a new role with Infoxchange Australia. My first project here is a rewrite of an application using Django.

People here are really into behaviour driven testing, and we’re using Lettuce to do it (using a branch with better Django integration).

I somewhat dislike this kind of testing, because it creates an annoying abstraction layer on top of the code, with a poorly defined, quasi-real language. It’s like a bad knock-off of AppleScript. Anyway, I got sick of defining steps per model, so I put together some generic steps for manipulating Django models (which I’ll have to contribute back).

Anyway, they look like this (examples of each step are in the docstrings):

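# These imports assume an older Django (get_models() was removed in
# Django 1.9; newer versions use django.apps.apps.get_models()).
from django.db.models import get_models
from lettuce import step
from nose.tools import assert_equals


# build a hash of model verbose names to models
# this is used by get_model()
def _models_generator():
    for model in get_models():
        # lower-case the keys so they match the lower-cased lookup below
        yield (model._meta.verbose_name.lower(), model)
        yield (model._meta.verbose_name_plural.lower(), model)

MODELS = dict(_models_generator())


def get_model(model):
    """
    Convert a model's verbose name to the model class. This allows us to
    use the model's verbose name in steps.
    """

    name = model.lower()
    model = MODELS.get(name, None)

    assert model, "Could not locate model by name '%s'" % name

    return model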


def create_models(model, hashes):
    for hash_ in hashes:
        model.objects.create(**hash_)


def models_exist(model, hashes):
    for hash_ in hashes:
        assert \
            model.objects.filter(**hash_).exists(), \
            "Object does not exist"


@step(r'I have ([a-z][a-z0-9_ ]*) in the database:')
def create_models_generic(step, model):
    """
    And I have admin field values in the database:
    | name         | value   |
    | project_type | Twine   |

    The generic method can be overridden for a specific model by defining a
    function create_badgers(step), which creates the Badger model.
    """

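    # Look the override up explicitly: wrapping the call in
    # try/except KeyError would also swallow KeyErrors raised inside it.
    override = globals().get('create_%s' % model)
    if override is not None:
        override(step)
    else:
        model = get_model(model)

        create_models(model, step.hashes)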


@step(r'(?:Given|And|Then) ([A-Z][a-z0-9_ ]*) with ([a-z]+) "([^"]*)" has ([A-Z][a-z0-9_ ]*) in the database:')  # noqa
def create_models_for_relation(step, rel_model_name,
                               rel_key, rel_value, model):
    """
    And project with name "Ball Project" has goals in the database:
    | description                             |
    | To have fun playing with balls of twine |
    """

    lookup = {rel_key: rel_value}
    rel_model = get_model(rel_model_name).objects.get(**lookup)

    for hash_ in step.hashes:
        # the regex captures a capitalised name, but field names are lower case
        hash_['%s_id' % rel_model_name.lower()] = rel_model.id

    create_models_generic(step, model)


@step('(?:Given|And|Then) ([A-Z][a-z0-9_ ]*) should be present in the database')
def step_models_exist(step, model):
    """
    And objectives should be present in the database:
    | description      |
    | Make a mess      |
    """

    model = get_model(model)

    models_exist(model, step.hashes)


@step(r'There should be (\d+) ([a-z][a-z0-9_ ]*) in the database')
def model_count(step, count, model):
    """
    Then there should be 0 goals in the database
    """

    model = get_model(model)

    assert_equals(model.objects.count(), int(count))
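The per-model override in create_models_generic boils down to looking up a function by name and falling back to a generic one. A framework-free sketch of that dispatch, with hypothetical functions standing in for the lettuce step and the Django models (overrides take rows rather than a step here), plus one tweak of my own: spaces in the model name become underscores, so a multi-word name like "admin field values" could also be overridden as create_admin_field_values.

```python
CREATED = []  # records what each path "created", in place of a database


def _generic_create(model_name, rows):
    """Fallback, standing in for create_models(get_model(name), hashes)."""
    for row in rows:
        CREATED.append(('generic', model_name, row))


def create_badgers(rows):
    """Hypothetical model-specific override, found purely by its name."""
    for row in rows:
        CREATED.append(('badgers', dict(row, stripes=True)))


def dispatch_create(model_name, rows):
    # .get() instead of try/except KeyError, so a KeyError raised inside
    # an override is not mistaken for "no override defined".
    func_name = 'create_%s' % model_name.replace(' ', '_')
    override = globals().get(func_name)
    if override is not None:
        override(rows)
    else:
        _generic_create(model_name, rows)


dispatch_create('badgers', [{'name': 'Bill'}])
dispatch_create('goals', [{'description': 'Have fun'}])
```

The fallback deliberately doesn't start with create_, so it can never be picked up as an "override" for a model that happens to be called "generic".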

Reviewing GNOME 3 App Development Beginner’s Guide

The folk at Packt Publishing sent me an e-copy of GNOME 3 Application Development Beginner’s Guide the other day. Since I find myself with a couple of weeks off (more on that another time), I’m going to be reading it and writing a review.

The book weighs in at 366 pages and purports to cover GLib, GTK+, GStreamer, E-D-S, WebKit, desktop D-Bus APIs, i18n and unit testing in both JavaScript (via Seed) and Vala.

Hopefully I will get it read in the next couple of weeks and get my thoughts jotted down. I am not getting anything except an e-copy of the book for my trouble, so you can trust me to be brutally honest 😛

Generating JSON from SQLAlchemy objects

I had to put together a small web app the other day using SQLAlchemy and Flask. Because I hate writing the same code multiple times when there’s a better way, I wanted to be able to serialise SQLAlchemy ORM objects straight to JSON.

Taking a leaf out of JavaScript’s book (where JSON.stringify() calls an object’s toJSON() method), I decided to optionally implement a tojson() method on a class, which my JSONEncoder would attempt to call ((The tojson() method actually returns a Python dict understandable by JSONEncoder)).

It turns out to be relatively simple to extend SQLAlchemy’s declarative base class with additional methods (we can also use this as an excuse to implement a general __repr__()):

from sqlalchemy.ext.declarative import declarative_base as real_declarative_base

# Let's make this a class decorator
declarative_base = lambda cls: real_declarative_base(cls=cls)

@declarative_base
class Base(object):
    """
    Add some default properties and methods to the SQLAlchemy declarative base.
    """

    @property
    def columns(self):
        return [ c.name for c in self.__table__.columns ]

    @property
    def columnitems(self):
        return dict([ (c, getattr(self, c)) for c in self.columns ])

    def __repr__(self):
        return '{}({})'.format(self.__class__.__name__, self.columnitems)

    def tojson(self):
        return self.columnitems

We can then define our tables in the usual way:

class Client(Base):
    __tablename__ = 'client'

    ...

You can obviously replace any of the methods in your subclass, if you don’t want to serialise the whole thing. Bonus points for anyone who wants to extend this to serialise one-to-many relationships.

And what about calling the tojson() method? That’s easy, we can just provide our own JSONEncoder.

import datetime
import json

class JSONEncoder(json.JSONEncoder):
    """
    Wrapper class to try calling an object's tojson() method. This allows
    us to JSONify objects coming from the ORM. Also handles dates and datetimes.
    """

    def default(self, obj):
        if isinstance(obj, datetime.date):
            return obj.isoformat()

        try:
            return obj.tojson()
        except AttributeError:
            return json.JSONEncoder.default(self, obj)
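To see the encoder work without a database, here is a minimal sketch using a hypothetical Point class in place of an ORM object. It also swaps the try/except for a getattr() lookup, so an AttributeError raised inside a tojson() method isn’t silently treated as "no tojson() method"; that tweak is my suggestion, not part of the original.

```python
import datetime
import json


class JSONEncoder(json.JSONEncoder):
    """Call obj.tojson() if it exists; serialise dates as ISO 8601."""

    def default(self, obj):
        if isinstance(obj, datetime.date):  # also covers datetime.datetime
            return obj.isoformat()
        tojson = getattr(obj, 'tojson', None)
        if callable(tojson):
            return tojson()
        # Let the base class raise the usual TypeError
        return super().default(obj)


class Point:
    """Hypothetical stand-in for an ORM object with a tojson() method."""

    def __init__(self, x, y, seen):
        self.x, self.y, self.seen = x, y, seen

    def tojson(self):
        # The returned dict may itself contain dates; the encoder calls
        # default() again for those.
        return {'x': self.x, 'y': self.y, 'seen': self.seen}


payload = {'points': [Point(1, 2, datetime.date(2013, 1, 5))]}
print(json.dumps(payload, cls=JSONEncoder, sort_keys=True))
```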

Cutting-edge Flask provides a way to replace the default JSON encoder, but the version I got out of pip does not. This is relatively easy to work around, though, by replacing jsonify with our own version.

from flask import Flask

app = Flask(__name__)

def jsonify(*args, **kwargs):
    """
    Workaround for Flask's jsonify not allowing replacement of the JSONEncoder
    in my version of Flask.
    """

    return app.response_class(json.dumps(dict(*args, **kwargs),
                                         cls=JSONEncoder),
                              mimetype='application/json')

If you do have a newer Flask, where you don’t have to replace jsonify, you can also inherit from Flask’s JSONEncoder, which already handles things like datetimes for you.

elevation data and APIs

So I found a bit of time to hack on my project today. Today’s task was to load and validate data from Geoscience Australia’s SRTM digital elevation model, which I downloaded from their elevation data portal last week ((Note to the unwary: the portal seems to be buggy in Chrome.)). The data is Creative Commons licensed, so I might upload it somewhere, if I can find a place for 2GB of elevation data.

This let me load elevation data on a 3 arcsecond (about 100m) grid, which I did using the ubiquitous GDAL via its Python API. Initial code is here. It doesn’t do anything super clever yet, like checking and normalising the projection, because I don’t need to. ((I did write a skeleton context manager for gdal.Open. I say skeleton because it doesn’t actually do anything smart like turning errors from GDAL into exceptions, because I didn’t have any errors to handle.))

Looking at plots of values can give you the gist of what’s what (oh look, it goes out to sea, and then the data is masked out), but it doesn’t really validate anything. I could do validation runs against my GPS tracks, but for a first pass I decided it would be easier to validate against Google’s Elevation API. This is a pretty neat web service: you make a request and it gives you back some JSON (or XML). There are undoubtedly Python libraries for it, but it’s easy enough to make a simple call with urllib2 or httplib. I chose to reuse the httplib Client wrapper from my RunKeeper/HealthGraph API, and wrote the call directly in the test.

For a real unit test I would probably have calculated the residuals and checked they sat within some acceptable range, but I’m lazy, so instead I just plotted them together. Google’s data, you will notice, includes bathymetry, which is actually pretty neat.
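A residual check along those lines might look like the sketch below. Masked points are represented as None (a real GDAL read would more likely give a nodata value or a NumPy masked array), and the 10 m RMS tolerance is an arbitrary placeholder, not a value from this post. Skipping masked points also sidesteps Google’s bathymetry, since the local grid is masked out at sea.

```python
import math


def residuals(local, reference):
    """Differences local - reference, skipping points masked (None) in either."""
    return [loc - ref for loc, ref in zip(local, reference)
            if loc is not None and ref is not None]


def rms(values):
    """Root-mean-square of a non-empty sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def elevations_agree(local, reference, max_rms=10.0):
    """True if the RMS residual (in metres) is within max_rms."""
    res = residuals(local, reference)
    if not res:
        raise ValueError('no unmasked points in common')
    return rms(res) <= max_rms
```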

Testing warnings with py.test

For those who like to add warnings to their Python code and want to test that those warnings actually happen in their unit tests, here are two techniques, both based around fixtures (funcargs).

The first is the mechanism built into py.test: the recwarn fixture.

The second is to create a fixture that turns warnings into exceptions and combine that with pytest.raises, for instance:

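import warnings

import pytest

@pytest.fixture
def warnings_as_errors():
    # catch_warnings() saves the current filters and restores them on exit;
    # warnings.resetwarnings() would instead clear *every* filter, including
    # ones set up by py.test itself.
    with warnings.catch_warnings():
        warnings.simplefilter('error')
        yield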

def test_timers_warn(log, warnings_as_errors):

    log.start_timer('method')

    with pytest.raises(RuntimeWarning):
        log.start_timer('method')

The advantage of this second method is that you can pin down exactly which call raises the warning, without repeatedly having to check recwarn.
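py.test’s recwarn recorder is built on the standard library’s warnings.catch_warnings, so both techniques can be sketched without py.test at all. The Log class here is a hypothetical stand-in for the log fixture above, assumed to warn with RuntimeWarning when a timer is started twice.

```python
import time
import warnings


class Log:
    """Hypothetical stand-in for the `log` fixture: warns on double start."""

    def __init__(self):
        self.timers = {}

    def start_timer(self, name):
        if name in self.timers:
            warnings.warn('timer %r already started' % name, RuntimeWarning)
        self.timers[name] = time.monotonic()


def warns_on_second_start():
    """Recording approach, roughly what recwarn does under py.test."""
    log = Log()
    with warnings.catch_warnings(record=True) as caught:
        # 'always' stops the once-per-location rule from hiding repeats
        warnings.simplefilter('always')
        log.start_timer('method')
        after_first = len(caught)
        log.start_timer('method')
    return after_first, [w.category for w in caught]


def raises_on_second_start():
    """Warnings-as-errors approach, as in the warnings_as_errors fixture."""
    log = Log()
    with warnings.catch_warnings():
        warnings.simplefilter('error')
        log.start_timer('method')  # first start must not raise
        try:
            log.start_timer('method')
        except RuntimeWarning:
            return True
    return False
```

The recording version has to inspect the list afterwards to work out which call warned; the warnings-as-errors version fails at exactly the offending call.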

Investigating cycling speed anomalies

So, as I’ve spent the last year learning Melbourne as a cyclist, there have been a few times where I’ve found myself on an absolutely staggering hill, only to go down it again and, worse, find there was another way that totally avoided the hill; or I’ve chosen routes that subject me to staggering headwinds, only to be told I should have taken another route instead.

This got me thinking: with everyone tracking their rides on their smartphones, why couldn’t I feed all of this data into a model, along with data like NASA’s elevation grids or the Bureau of Meteorology’s wind observations? As something to keep me occupied over Christmas, I started a little project on the plane to Perth.

It turns out RunKeeper has this handy API that lets you access everything stored there. Unfortunately it seems that no one has really written a good Python API for this, so I put one together.

Throw in a bit of NumPy, PyProj (to convert to rectilinear coordinates) and Matplotlib (to plot it) and you can get a graph that looks like this (which thankfully looks a lot like RunKeeper’s graph):

If we do some long window smoothing, we can get an idea of a cyclist’s average speed and then calculate a percentage anomaly from this average speed. This lets us compensate for different cyclists, how tired they are or if they’re riding with someone else ((This does have the side effect of reducing the signal from head/tail winds, especially on straight trips, I need to think about this more.)).
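Both steps can be sketched with the standard library alone, assuming a track of (seconds, latitude, longitude) tuples, a format chosen for illustration rather than RunKeeper’s exact schema. Note the swap: this uses a haversine great-circle distance instead of projecting with PyProj, and the centred moving average and its window length are my assumptions, since the post doesn’t say which smoother was used. An anomaly of -1 means stopped and 0 means riding at the local average.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speeds_kmh(track):
    """Speed (km/h) for each segment of a time-sorted track.

    Segments with zero duration (duplicate timestamps) are skipped.
    """
    speeds = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt > 0:
            speeds.append(haversine_m(la0, lo0, la1, lo1) / dt * 3.6)
    return speeds


def moving_average(values, window):
    """Centred moving average; the window shrinks at the ends of the track."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half):i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def speed_anomaly(speeds, window=301):
    """Fractional anomaly (v - v_avg) / v_avg against the smoothed speed."""
    avg = moving_average(speeds, window)
    return [(v - a) / a if a > 0 else 0.0 for v, a in zip(speeds, avg)]
```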

If we then do this for lots of tracks and grid the results based on whether the velocity vector at each point is headed towards or away from Melbourne ((We need this, otherwise the velocity anomaly would average out depending on which direction we’re headed up/down a hill.)) we can get spatial plots that look like this (blue is -1 and red is 1):

If you squint at the graphs you can sort of see that there are many places where the blue/red are inverted, which is promising: it means something was making us faster one way and slower the other (a hill or wind or the pub). You can also see that I still don’t really have enough data; I tend to always cycle the same routes. If I want to start considering factors that vary a lot over time, like wind, I’m going to need a lot more data to keep the number of data points (I always want to call this fold) high in my temporal bins.
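The inbound/outbound split and the gridding described above can be sketched as follows. It assumes positions and velocities already projected to metres (e.g. UTM), a hypothetical `centre` point standing in for Melbourne, and an arbitrary 500 m cell size. A point counts as inbound when its velocity has a positive dot product with the vector towards the centre.

```python
from collections import defaultdict


def is_inbound(pos, vel, centre):
    """True if velocity points towards `centre`; all values are (x, y) metres."""
    to_centre = (centre[0] - pos[0], centre[1] - pos[1])
    return vel[0] * to_centre[0] + vel[1] * to_centre[1] > 0


def grid_anomalies(points, centre, cell=500.0):
    """Average anomaly per (inbound, column, row) bin.

    points: iterable of (pos, vel, anomaly).
    Returns {(inbound, col, row): (mean_anomaly, count)}.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for pos, vel, anomaly in points:
        key = (is_inbound(pos, vel, centre),
               int(pos[0] // cell), int(pos[1] // cell))
        sums[key][0] += anomaly
        sums[key][1] += 1
    return {k: (total / n, n) for k, (total, n) in sums.items()}
```

Keeping inbound and outbound in separate bins is what stops a hill’s slow climb and fast descent from averaging each other out in the same cell; the count alongside each mean is the per-bin "fold".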

The next step, I suppose, is to set up the RunKeeper download as a web service, so people can submit their RunKeeper tracks to me. This means I’m going to have to fix up some hard-coded assumptions in the code, like the UTM zone for the rectilinear projection, and what constitutes an inbound or an outbound route. Unsurprisingly, this has become a lot more ambitious than a summer project.

If you feel like having a play, there is source code.

Extending Selenium with jQuery

Last week I wrote about combining Selenium and py.test and I promised to also talk about my function find_elements_by_jquery().

Selenium by default can find elements by id, CSS selector and XPath, but I often find I already know the query as a jQuery selector, and so frequently it’s easiest just to use that.

We start by extending the Selenium webdriver. Since the webdriver is exposed through several classes (one per web browser), we do this in a particularly meta way: a factory function builds a new class combining our mixin with whichever browser class we pass in.

from selenium.webdriver.remote.webelement import WebElement
from selenium.common.exceptions import InvalidSelectorException

def MyWebDriver(base, **kwargs):
    return type('MyWebDriver', (_MyWebDriver, base), kwargs)

class _MyWebDriver(object):
    def create_web_element(self, element_id):
        return MyWebElement(self, element_id)

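    def find_elements_by_jquery(self, jq):
        # Pass the selector as a script argument rather than pasting it into
        # the JavaScript, so selectors containing quotes still work.
        return self.execute_script('return $(arguments[0]).get();', jq)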

    def find_element_by_jquery(self, jq):
        elems = self.find_elements_by_jquery(jq)
        if len(elems) == 1:
            return elems[0]
        else:
            raise InvalidSelectorException(
                "jQuery selector returned %i elements, expected 1" % len(elems))
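The MyWebDriver factory relies on type(name, bases, namespace) creating a class at runtime, with the mixin listed before the browser class so its methods come first in the method resolution order. A framework-free sketch, with a hypothetical FirefoxLike class standing in for webdriver.Firefox and a hypothetical extend() in place of MyWebDriver:

```python
class FirefoxLike:
    """Hypothetical stand-in for a browser driver such as webdriver.Firefox."""

    def __init__(self, name='firefox'):
        self.name = name

    def execute_script(self, script, *args):
        # A real driver would run the script in the browser
        return 'ran %r with %r on %s' % (script, args, self.name)


class _Extras:
    """Mixin whose methods are found before the browser class's."""

    def find_elements_by_jquery(self, jq):
        return self.execute_script('return $(arguments[0]).get();', jq)


def extend(base, **attrs):
    # Keyword arguments become class attributes of the generated class,
    # so one mixin can be applied to any number of driver classes.
    return type('Extended' + base.__name__, (_Extras, base), attrs)


Driver = extend(FirefoxLike, default_timeout=10)
d = Driver()
```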

We then do a similar implementation for the WebElement:

class MyWebElement(WebElement):
    def __repr__(self):
        """Return a pretty name for an element"""

        id_ = self.get_attribute('id')
        class_ = self.get_attribute('class')

        # get_attribute() may return None for a missing attribute
        if id_:
            return '#' + id_
        elif class_:
            return '.'.join([self.tag_name] + class_.split())
        else:
            return self.tag_name

    def find_elements_by_jquery(self, jq):
        # The element and the selector are both passed as script arguments
        return self.parent.execute_script(
            'return $(arguments[0]).find(arguments[1]).get();', self, jq)

    def find_element_by_jquery(self, jq):
        elems = self.find_elements_by_jquery(jq)
        if len(elems) == 1:
            return elems[0]
        else:
            raise InvalidSelectorException(
                "jQuery selector returned %i elements, expected 1" % len(elems))

We can now pass in jQuery selectors, for instance b.find_element_by_jquery('#region option:selected') or form.find_elements_by_jquery(':input'). This is especially powerful when all of your DOM manipulation already works in terms of jQuery selectors.

As an added bonus, subclassing lets us add functionality like Firebug-style element names (MyWebElement.__repr__), or wrap utilities like WebDriverWait into the webdriver, e.g.

from selenium.webdriver.support.ui import WebDriverWait as Wait
from selenium.common.exceptions import TimeoutException

class FrontendError(Exception):
    pass

# class _MyWebDriver...
    def wait(self, event, timeout=10):
        try:
            Wait(self, timeout).until(event)
        except (TimeoutException, FrontendError):
            # do we have an error dialog?
            dialogs = self.find_elements_by_id('error-dialog')
            if dialogs and dialogs[0].is_displayed():
                content = dialogs[0].find_element_by_id('error-dialog-content')
                raise FrontendError(content.text)
            # no dialog: re-raise the original exception with its traceback
            raise

Combining py.test and Selenium to test webapps

Recently I started adding unit and acceptance tests to a webapp using Selenium, integrated into the existing py.test framework that tests the backend code.

py.test fixtures make using Selenium, via its Python bindings, really straightforward. Here’s how I did it.

First I put all the Selenium related tests in a tests/selenium/ directory. I then created tests/selenium/conftest.py and wrote a fixture to allow tests to access a single instance of the webdriver for the entire session:

import os

import pytest
from selenium import webdriver

browsers = {
    'firefox': webdriver.Firefox,
    'chrome': webdriver.Chrome,
}

@pytest.fixture(scope='session',
                params=sorted(browsers))
def driver(request):
    if 'DISPLAY' not in os.environ:
        pytest.skip('Test requires display server (export DISPLAY)')

    b = browsers[request.param]()

    request.addfinalizer(lambda *args: b.quit())

    return b

Note that we’re able to parameterise the fixture so that it runs with multiple browsers. We then add a per-function fixture that sets up the session for an individual test:

@pytest.fixture
def b(driver, url):
    b = driver
    b.set_window_size(1200, 800)
    b.get(url)

    return b

A fixture can use other fixtures of the same or broader scope, so url is a session-scoped fixture that reads the optional --url command-line option:

def pytest_addoption(parser):
    parser.addoption('--url', action='store',
                     default='http://localhost/portal/portal.html')

@pytest.fixture(scope='session')
def url(request):
    return request.config.option.url

These fixtures are available for all tests in that package. Tests have the form:

def test_badger(b):
    # test goes here
    pass

We can also create per-module fixtures, which can build on our generic fixtures. Say, for example, we want to run a number of tests (e.g. for WCAG 2.0 compliance) on a number of parameterised instances of the set-up webapp. We might do this in test_wcag.py:

import pytest

@pytest.fixture(scope='module')
def wcag(driver, url):
    """
    Set up a single session for these tests.
    """

    b = driver
    b.set_window_size(1200, 800)
    b.get(url)

    # do stuff here with Selenium to set up webapp
    
    return b

We can now write tests ((find_elements_by_jquery() is a method I’ve added in an extension of Selenium’s webdriver, and is a topic for another post.)) in this module, e.g.

@pytest.mark.wcagF17
@pytest.mark.wcagF62
@pytest.mark.wcagF77
def test_unique_ids(wcag):
    """
    All ids in the document should be unique.
    """

    elems = wcag.find_elements_by_jquery('[id]')
    ids = [e.get_attribute('id') for e in elems]

    assert len(elems) >= 1  # sanity check
    assert len(set(ids)) == len(ids)  # no duplicate ids

Again, we can parameterise this fixture to set up the webapp in a number of different ways. Note that we have to use driver as our fixture, not b. This is because a fixture can only use fixtures of the same or broader scope: b is function-scoped, so the module-scoped wcag cannot depend on it.
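The WCAG marks also let you run a subset, e.g. py.test -m wcagF17. Newer versions of pytest warn about marks they don’t recognise, so a sketch of registering them in conftest.py with the pytest_configure hook and config.addinivalue_line() might look like this (the description text is my own placeholder):

```python
# conftest.py (sketch)
WCAG_MARKS = ('wcagF17', 'wcagF62', 'wcagF77')


def pytest_configure(config):
    """Register each custom WCAG mark with a short description."""
    for mark in WCAG_MARKS:
        # mark[4:] strips the 'wcag' prefix, e.g. 'wcagF17' -> 'F17'
        config.addinivalue_line(
            'markers', '%s: WCAG 2.0 failure technique %s' % (mark, mark[4:]))
```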

Finding Ada — Elaine Miles

October 16th is Ada Lovelace Day, a day that showcases women in engineering, maths, science and technology by profiling a woman technologist, scientist, engineer or mathematician on your blog.

This year I’m writing about Elaine Miles, a researcher at the Australian Bureau of Meteorology, who I met through mutual colleagues over lunch one day. She has since become one of my go-to people whenever I require a crash-course in something. She awesomely let me interview her for Ada Lovelace Day.

Miles is a physicist working at the Centre for Australian Weather and Climate Research (CAWCR), a joint project between the Bureau of Meteorology and the Commonwealth Scientific and Industrial Research Organisation (CSIRO), where she is investigating the use of dynamic models to predict sea level in the Western Pacific.

Miles studied Applied Mathematics and Physics at the University of Melbourne. She attributes her love of maths to primary school, where she recalls being chastised by her teacher for attempting the subtraction problems further on in the workbook before they had been taught subtraction. She says she would always lament when another class ran over and cut into the maths lesson. Her love of maths originally led her to enrol in Electrical Engineering, but she didn’t like the black-box thinking that engineering encourages, preferring to understand concepts from first principles.

Completing a Bachelor in Applied Mathematics, she went on to do Honours in Physics, working on a project in art conservation, which it turns out is an extremely technical field. She built a laser interferometer using off-the-shelf parts (laser, CCD camera and a laptop) to monitor canvas artworks and detect the problems caused to art by changes in microclimate.

After teaching English in Japan for a year, Miles returned to Melbourne, where she began her PhD (Miles is not related to Dr Elaine Miles the glass artist). Miles says she wanted to be learning or developing new things (plus she had unfinished business in art conservation), and so a PhD was the logical progression. Her PhD focused on two areas: the science of paint drying (literally watching paint dry, she says) and subsurface imaging. She had a focus on south-east Asia, where Western art production techniques are prevalent but unsuitable because of the different climate.

Miles spent 3 months working with galleries in the Philippines where she used her own laser speckle interferometers to study artwork hanging in the gallery, in-situ. As far as she’s aware, studying art in-situ had never been done before. This work allows conservators to determine best practice and a course of action for storing and restoring works of art.

With her PhD close to being submitted, Miles began work at CAWCR where she first worked on data assimilation of weather balloon observations into weather forecasting models. She then moved on to verifying rainfall prediction models and getting weather radar data assimilated into the model.

For the last 10 months she has worked with the Pacific-Australia Climate Change Science and Adaptation Planning Program (PACCSAP), where she investigates applying POAMA (Predictive Ocean Atmosphere Model for Australia), a dynamic, coupled ocean-atmosphere, multi-model ensemble global seasonal prediction model, to forecast global sea level anomalies 1-9 months in the future, specifically validating predictions with observations in the Western Pacific. This is the first time dynamic models have been used to predict medium-term sea level, and it forms an extremely important part of helping the Pacific adapt to the immediate effects of climate change.

As for the future, Miles looks forward to getting her PhD submitted, but would like to continue working with sea level modelling. She hopes to start leading projects in Australia and around the world.