Searching for documents with arrays of objects using Elasticsearch

Originally posted on ixa.io.

Elasticsearch is pretty nifty in that searching for documents that contain an array item requires no additional work compared to searching a flat document. Furthermore, searching for documents that contain an object with a given property in an array is just as easy.

For instance, given documents with a mapping like so:

{
    "properties": {
        "tags": {
            "properties": {
                "tag": {
                    "type": "string"
                },
                "tagtype": {
                    "type": "string"
                }
            }
        }
    }
}

We can do a term query to find all the documents containing a tag object whose tag is find me:

{
    "query": {
        "term": {
            "tags.tag": "find me"
        }
    }
}

Using a bool query we can extend this to find all documents with both a tag find me and a tagtype list.

{
    "query": {
        "bool": {
            "must": [
                {"term": {"tags.tag": "find me"},
                {"term": {"tags.tagtype": "list"}
            ]
        }
    }
}

But what if we only wanted documents that contain a tag object with both tagtype list *and* tag find me? While the above query would find them, and they would be scored higher for having two matches, what people often don’t expect is that it also matches documents where the terms are spread across separate tag objects: a document tagged with, say, {"tag": "hide me", "tagtype": "list"} and {"tag": "find me", "tagtype": "inline"} satisfies both term clauses, so these hide me lists are returned when you don’t want them to be.

This is especially surprising if you’re doing a filter instead of a query; and especially-especially surprising if you’re doing it via an abstraction API such as elasticutils and expecting Django-esque filtering.

How Elasticsearch stores objects

By default the object mapping type stores the values in a flat dotted structure. So:

{
    "tags": {
        "tag": "find me",
        "tagtype": "list"
    }
}

Becomes:

{
    "tags.tag": "find me",
    "tags.tagtype": "list"
}

And for lists:

{
    "tags": [
        {
            "tag": "find me",
            "tagtype": "list"
        }
    ]
}

Becomes:

{
    "tags.tag": ["find me"],
    "tags.tagtype": ["list"]
}

This saves a whole bunch of complexity (and memory and CPU) when implementing searching, but it’s no good when we want to find documents containing specific objects.

Enter: the nested type

The solution to finding what we’re looking for is to use the nested query and mark our object mappings up with the nested type. This preserves the objects and allows us to execute a query against the individual objects. Internally it maps them as separate documents and does a child query, but they’re hidden documents, and Elasticsearch takes care to keep the documents together to keep things fast.

So what does it look like? Our mapping only needs one additional property:

{
    "properties": {
        "tags": {
            "type": "nested",
            "properties": {
                "tag": {
                    "type": "string"
                },
                "tagtype": {
                    "type": "string"
                }
            }
        }
    }
}

We then make a nested query. The path is the dotted path of the array we’re searching; query is the query we want to execute inside the array, in this case the bool query from above. Because an individual sub-document has to match the subquery for the main query to match, this is now the and operation we are looking for.

{
    "query": {
        "nested": {
            "path": "tags",
            "query": {
                "bool": ...
            }
        }
    }
}
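
For completeness, here is the whole thing with the bool query from above spelled out. The two term clauses are unchanged; they are now just matched against each tag object individually:

{
    "query": {
        "nested": {
            "path": "tags",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"tags.tag": "find me"}},
                        {"term": {"tags.tagtype": "list"}}
                    ]
                }
            }
        }
    }
}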

Using nested with Elasticutils

If you are using elasticutils, unfortunately it doesn’t support nested out of the box, and calling query_raw or filter_raw breaks your Django-esque chaining. However, it’s pretty easy to add support using something like the following:

    # Add this to a subclass of elasticutils' S; it handles the custom
    # nested action and reuses S's own filter processing for the subfilter.
    def process_filter_nested(self, key, value, action):
        """
        Do a nested filter.

        Syntax is filter(path__nested=filter).
        """

        return {
            'nested': {
                'path': key,
                'filter': self._process_filters((value,)),
            }
        }

You can then use it something like this:

S().filter(tags__nested=F(**{
    'tags.tag': 'find me',
    'tags.tagtype': 'list',
}))

vim + pathogen + git submodules

As part of PyCon Au this weekend I did a lot of hacking on my laptop, which is not something I’ve done for a while, given how much I used to. I was frequently getting annoyed that my vim config wasn’t the same as what’s currently on my work desktop.

Back in uni, I used to keep my dotfiles in revision control on a machine I could connect to. All I needed was my ssh agent and I could get all my config.

Recently, whenever I’ve wanted to extend my vim config, people’s pages have told me to do it through pathogen, which I didn’t have set up.

Also there have been times when people have asked for my vim setup, which I wasn’t easily able to provide.

Given all of that, I decided it was time to rebuild my config using pathogen + git submodules from the ground up. As part of this, I updated quite a few plugins, and started using a few new things I hadn’t had available without pathogen, so it’s still a bit of a work in progress, but it’s here.

Installing new plugins is easy with git submodule add PATH bundle/MODULE_NAME.

If you’re a vim user, I strongly recommend this approach.

free as in gorgeous

I’m at PyCon Au.[1] I made it this year. There were no unexpected collisions in the week leading up to it.

I decided at the last moment to do a lightning talk on a piece of Django-tech I put together at Infoxchange, a thing called Crisper, which I use in a very form-heavy app I’m developing. Crisper is going to be up on Github, just as soon as I have time to extract it from the codebase it’s part of (which mostly consists of working out where it’s pointlessly coupled).

The title of my post relates to deciding to do this talk, and the amazing free and open graphic design assets I have at my fingertips to create an instant title slide. Five minutes in Inkscape, using the Tango palette and the Junction typeface (available in Fedora as tlomt-junction-fonts), to put together things like this:

[Image: Crisper title slide]

Anyway, some pretty good content today at DjangoCon Au. I especially enjoyed the food for thought about whether we should be targeting new development at Python 3. It has struck me that writing greenfields code on Python 2 is a dead end that will eventually result in my having to port it anyway. I’m thinking I will conduct a quick stocktake of the dependencies we’re using at work to evaluate a Python 3 port. I think generally we’re writing pretty compatible code using modern Python syntax, so from the point of view of our apps, a port at this stage would be fairly straightforward.

There was some interesting discussion too on the future of frameworks like Django. Is the future microframeworks plus rich JS on the frontend using Angular or Ember? How do things like Django adapt to that? Even though JS has nothing on Python, is the future node.js on the back and somethingJS on the front?[2]

~

As a nice aside, I’ve been working on a pure in-web app for visualising data from Redmine, using D3.js and Redmine’s RESTful API.[3] I intend to add configuration in localstorage (probably via Angular) once I have the visualisations looking good. It serves a practical purpose and doubles as an interesting experiment in writing web apps without a web server.

~

Almost back on topic, I had dinner with a bunch of cyclists tonight, and got to talking about my crowd-sourced cycle maps project,[4] which hasn’t gone anywhere because I still don’t have any hosting for it. I should try having dinner with cyclists who work for PaaS hosting providers.[5]

Finally, I’m trying to recruit talented web developers who have a love of Agile, test-driven design and Python. Also a technical tester. Come and talk to me.

  1. Thanks to Infoxchange, who sent me, and are also sponsoring the conference.
  2. It always cracked me up that my previous web-app was this HTML5-y app, but designed without the use of frameworks. Old and new both at once.
  3. https://github.com/danni/redmine-viz
  4. https://github.com/danni/cycle-router
  5. I need Postgres 9.2 and PostGIS 2.0. Come chat to me!

calling Melbourne Python and Perl programmers

For the last couple of months I’ve been working for Infoxchange, a not-for-profit that provides technology to other not-for-profits. I’ve been working in the webapp development team, where we mostly work on webapps in the health and community sector, using both Perl (for the older stuff) and Python/Django (for the newer stuff).

The government didn’t really work out for me, so this is a nice change. It’s relaxed; people wear jeans to work. We have fair trade tea and coffee and a delivery of CSA fruit every week.

It is busy though. We’ve got a lot going on, and not enough people to do it, so we’re looking for more. So if you’re a committed, talented Perl, Python or web/Javascript programmer or devops person who is in (or willing to move to) Melbourne and wants to make a difference, you should get in touch with me. The pay is good (you will not be working for peanuts) and the team is fun.

We love open source, we contribute upstream, we have an organisation Github account. You can run what you want on your desktop. Developers have technical ownership over their work. Development is Agile.

Some technologies we love are Perl, Mojolicious, Python, Django, Javascript, jQuery, Bootstrap, Less, NodeJS, AngularJS, Github, Puppet, Debian and PostgreSQL. If you love any of those too, you should totally get in touch. If you love more technologies, bring those along too.

First thoughts on RedHat OpenShift

I’m looking for a PaaS provider that isn’t going to cost me very much (or anything at all) and supports Flask and PostGIS. Based on J5’s recommendation on my blog the other day, I created an OpenShift account.

A free OpenShift account gives you three small gears,[1] which are individual containers you can run an app on. You can either run an app on a single gear or have it scale to multiple gears with load balancing. You then install the components you need, which OpenShift refers to by the pleasingly retro name of cartridges. For instance, Python 2.7 is one cartridge and PostgreSQL is another. You can either install all your cartridges on one gear or on separate gears, based on your resource needs.[2]

You choose your base platform cartridge (e.g. Python 2.6) and optionally give it a git URL to do an initial checkout from (which means you can deploy an app that is already arranged for OpenShift very quickly). The base cartridge sets up all the hooks for redeploying after a git push (you get a git remote that you can push to to redeploy your app). The two things you need are a root setup.py containing your pip requirements, and a wsgi/application file, which is a Python blob containing a WSGI object named application. For Python it uses virtualenv and all that awesome stuff. I assume for node.js you’d provide a package.json and it would use npm, and similarly RubyGems for Ruby, etc.
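
To illustrate, a minimal wsgi/application file could be as simple as the sketch below. The location and the name application are what OpenShift expects, as described above; the body of the app is just a placeholder (a real deployment would expose your Flask or Django WSGI object instead):

#!/usr/bin/env python
# OpenShift looks for a WSGI callable named `application` in the
# wsgi/application file of your repository.

def application(environ, start_response):
    # Placeholder response; replace with your real WSGI app
    # (a Flask instance is itself a WSGI callable, for example).
    body = 'Hello from OpenShift\n'
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]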

There’s a nifty command line tool written in Ruby (what happened to Python-only Red Hat?) that lets you do all the cloud managementy stuff, including reloading cartridges and gears, tailing app logs and SSHing into a gear. I think an equivalent of Django’s dbshell, driven by your DB cartridge, would be really useful, but it’s not a big deal.

There are also deploy hooks you can add to your git repo to do things like create your databases. I haven’t used them yet, but again they would make deploying your app very fast.

There are also quickstart scripts for deploying things like WordPress, Rails and a Jenkins server onto a new gear. Speaking of Jenkins there’s also a Jenkins client cartridge which I think warrants experimentation.

So what’s a bit crap? Why isn’t my app running on OpenShift yet? Basically because the available cartridges are a little antique. The supported Python is 2.6, which I could port my app to; there are also community-supported 2.7 and 3.3 cartridges, so that’s fine for me (TBH, I thought my app would run on 2.6) but maybe annoying for others. There is no Celery cartridge, which I would have expected, ideally so you can farm tasks out to other gears; and although you apparently can use it, there’s very little documentation I could find on how to get it running.

Really though, the big kick in the pants is that there is no cartridge for Postgres 9.2/PostGIS 2.0. There is a community cartridge you can use on your own instance of OpenShift Origin, but that defeats the purpose. So either I’m waiting for a newer Postgres to be made available on OpenShift, or backporting my code to Postgres 8.4.

Anyway, I’m going to keep an eye on it, so stay tuned.

  1. Small gears have 1GB of disk and 512MB of RAM allocated.
  2. I think if you have a load-balancing (scalable) application, your database needs to be on its own gear so all the other gears can access it.

scratching my own itch [or: why I still love open source]

I’m visiting my parents for the long weekend. Sitting in the airport, I decided I should use my spare time to write some documentation, which made this the first time I’d tried to connect to work’s OpenVPN server. While it’s awesome that Network Manager can now import OpenVPN configs, it didn’t work, because NM doesn’t support the crucial keysize parameter.

Rather than work around the problem, as some people have done (which would annoyingly break my other OpenVPNs), I used the fact that it’s open source to fix the problem properly.

My Dad asked if I was working. No, well, not really. I’m fixing the interface to my VPN client so I can connect to work’s VPN, I replied. Unglaublich! (unbelievable!) my father remarked. Not unbelievable, because it’s open source!

IdentitiesOnly + ssh-agent

I’m really hoping that someone can provide me with some enlightenment.

I have a lot of ssh keys: six by today’s count. On my desktop I have ssh configured with IdentitiesOnly yes and an IdentityFile for each host. This works great.

I then forward my agent to my dev VM. I can see the keys with ssh-add -l. So far so good. If I then ssh into a host, I can see it trying every key from the agent in sequence, which is sometimes going to fail with too many keys tried. However, if I set IdentitiesOnly yes in my dev VM’s config, it doesn’t offer any keys; and if I add IdentityFile entries, they don’t work, because I don’t have those key files on my VM.

So what’s the solution? What I want is to specify identities by their identifier in the agent, e.g. danni@github, but I can’t see any config option to do that. Anyone got a nifty solution?

elevation data and APIs

So I found a bit of time to hack on my project today. Today’s task was to load and validate the data from Geoscience Australia’s SRTM digital elevation model, which I downloaded from their elevation data portal last week.[1] The data is Creative Commons, so I might just upload it somewhere, if I can find a place for 2GB of elevation data.

This let me load elevation data in a 3 arc-second (about 100m) grid, which I did using the ubiquitous GDAL via its Python API. Initial code is here. It doesn’t do anything super clever yet, like checking and normalising the projection, because I don’t need it to.[2]
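
The gist of reading a tile with GDAL’s Python API is something like this (a sketch; the file name is hypothetical and the real code lives in the repo linked above):

from osgeo import gdal

# Open a DEM tile and read the elevation band into a NumPy array.
dataset = gdal.Open('srtm_tile.tif')  # hypothetical file name
band = dataset.GetRasterBand(1)
elevations = band.ReadAsArray()

# The geotransform maps pixel indices to georeferenced coordinates;
# the no-data value marks masked cells (e.g. out at sea).
origin_x, pixel_width, _, origin_y, _, pixel_height = dataset.GetGeoTransform()
nodata = band.GetNoDataValue()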

Looking at plots of the values can give you a gist of what’s what (oh look, it goes out to sea, and then the data is masked out), but it doesn’t really validate anything. I could do validation runs against my GPS tracks, but for a first pass I decided it would be easier to validate using Google’s Elevation API. This is a pretty neat web service: you make a request to it and it gives you back some JSON (or XML). There are undoubtedly Python APIs to access it, but it’s pretty easy to do a simple call with urllib2 or httplib. I chose to reuse my httplib Client wrapper from my RunKeeper/HealthGraph API, and wrote the call directly in the test.
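
A bare urllib2 version of such a call looks something like this (a sketch: the endpoint and parameters are Google’s Elevation API as documented at the time, and the coordinates are just an example point in Melbourne):

import json
import urllib2

# Ask Google's Elevation API for the elevation at a single lat,lng point.
url = ('http://maps.googleapis.com/maps/api/elevation/json'
       '?locations=-37.8136,144.9631&sensor=false')
data = json.load(urllib2.urlopen(url))
print data['results'][0]['elevation']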

For a real unit test I would probably have calculated the residuals and ensured they sat within some acceptable range, but I’m lazy, so instead I just plotted the two together. Google’s data, you will notice, includes bathymetry, which is actually pretty neat.

[Image: SRTM v Google Elevation]

  1. Note to the unwary: the portal seems to be buggy in Chrome.
  2. I did write a skeleton context manager for gdal.Open. I say skeleton because it doesn’t actually do anything smart, like turning errors from GDAL into Exceptions, because I didn’t have any errors to handle.

Investigating cycling speed anomalies

As I’ve spent the last year learning Melbourne as a cyclist, there have been a few times when I’ve found myself on an absolutely staggering hill, only to go down it again, or, worse, to find there was another way that totally avoided the hill; or I’ve chosen routes that subjected me to staggering headwinds, only to be told I should have taken another route instead.

This got me thinking: with everyone tracking their rides on their smartphones, why couldn’t I feed all of this data into a model, along with data like NASA’s elevation grids or the Bureau of Meteorology’s wind observations? As something to keep me occupied over Christmas, I started a little project on the plane to Perth.

It turns out RunKeeper has this handy API that lets you access everything stored there. Unfortunately it seems that no one has really written a good Python API for this, so I put one together.

Throw in a bit of NumPy, PyProj (to convert to rectilinear coordinates) and Matplotlib (to plot it) and you can get a graph that looks like this (which thankfully looks a lot like RunKeeper’s graph):
[Image: Speed Anomalies]
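
For reference, the projection step with PyProj looks something like this (a sketch; the coordinates are made up, and UTM zone 55 south, which covers Melbourne, is one of the hard-coded assumptions I mention below):

import numpy as np
import pyproj

# Example track coordinates in degrees; the real ones come from RunKeeper.
lons = np.array([144.96, 144.97, 144.98])
lats = np.array([-37.81, -37.82, -37.83])

# Project lon/lat onto rectilinear (UTM) coordinates so distances,
# and hence speeds, come out in metres.
utm = pyproj.Proj(proj='utm', zone=55, south=True, ellps='WGS84')
x, y = utm(lons, lats)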
If we do some long-window smoothing, we can get an idea of a cyclist’s average speed and then calculate a percentage anomaly from that average. This lets us compensate for different cyclists, how tired they are, or whether they’re riding with someone else.[1]
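
A sketch of that computation in NumPy (speeds is assumed to be an array of instantaneous speeds sampled along a track, and the window length is arbitrary):

import numpy as np

def percentage_anomaly(speeds, window=101):
    # A long-window moving average approximates the cyclist's typical speed.
    kernel = np.ones(window) / window
    average = np.convolve(speeds, kernel, mode='same')
    # Positive where the cyclist is faster than usual, negative where slower.
    return (speeds - average) / average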

If we then do this for lots of tracks, and grid the results based on whether the velocity vector at each point is headed towards or away from Melbourne,[2] we can get spatial plots that look like this (blue is -1 and red is 1):
[Image: Directional Speed Anomalies]
If you squint at the graphs you can sort of see that there are many places where the blue/red are inverted, which is promising: it means something was making us faster one way and slower the other (a hill, the wind, or the pub). You can also see that I still don’t really have enough data; I tend to always cycle the same routes. If I want to start considering factors that are highly temporally variable, like wind, I’m going to need a lot more data to keep the number of datapoints (I always want to call this fold) high in my temporal bins.

The next step, I suppose, is to set up the RunKeeper download as a web service, so people can submit their RunKeeper tracks to me. This means I’m going to have to fix some hard-coded assumptions in the code, like the UTM zone for the rectilinear projection, and what constitutes an inbound or an outbound route. Unsurprisingly, this has become a lot more ambitious than a summer project.

If you feel like having a play, there is source code.

  1. This does have the side effect of reducing the signal from head/tail winds, especially on straight trips; I need to think about this more.
  2. We need this, otherwise the velocity anomaly would average out depending on which direction we’re headed up/down a hill.

Finding Ada — Elaine Miles

October 16th is Ada Lovelace Day, a day that showcases women in engineering, maths, science and technology by profiling a woman technologist, scientist, engineer or mathematician on your blog.

[Image: Elaine Miles portrait]

This year I’m writing about Elaine Miles, a researcher at the Australian Bureau of Meteorology, who I met through mutual colleagues over lunch one day. She has since become one of my go-to people whenever I require a crash-course in something. She awesomely let me interview her for Ada Lovelace Day.

Miles is a physicist working at the Centre for Australian Weather and Climate Research (CAWCR), a joint project between the Bureau of Meteorology and the Commonwealth Scientific and Industrial Research Organisation (CSIRO), where she is investigating the use of dynamic models to predict sea level in the Western Pacific.

Miles studied Applied Mathematics and Physics at the University of Melbourne. She attributes her love of maths to primary school, where she recalls being chastised by her teacher for attempting the subtraction problems further on in the workbook before they had been taught subtraction. She says she would always lament when another class ran over and cut into the maths lesson. Her love of maths originally led her to enroll in Electrical Engineering, but she didn’t like the black box thinking that engineering encourages, preferring to understand concepts from first principles.

Completing a Bachelor in Applied Mathematics, she went on to do Honours in Physics, working on a project in art conservation, which it turns out is an extremely technical field. She built a laser interferometer using off-the-shelf parts (laser, CCD camera and a laptop) to monitor canvas artworks and detect the problems caused to art by changes in microclimate.

After teaching English in Japan for a year, Miles returned to Melbourne where she began her PhD (Miles is not related to Dr Elaine Miles the glass artist). Miles says she wanted to be learning or developing new things (plus she had unfinished business in art conservation), and so a PhD was the logical progression. Her PhD focused on two areas: the science of paint drying (literally watching paint dry, she says) and subsurface imaging. She had a focus on south-east Asia, where Western art production techniques are prevalent but unsuitable because of the different climate.

Miles spent 3 months working with galleries in the Philippines where she used her own laser speckle interferometers to study artwork hanging in the gallery, in-situ. As far as she’s aware, studying art in-situ had never been done before. This work allows conservators to determine best practice and a course of action for storing and restoring works of art.

With her PhD close to being submitted, Miles began work at CAWCR where she first worked on data assimilation of weather balloon observations into weather forecasting models. She then moved on to verifying rainfall prediction models and getting weather radar data assimilated into the model.

For the last 10 months she has worked with the Pacific-Australia Climate Change Science and Adaptation Planning Program (PACCSAPP), where she investigates applying POAMA (Predictive Ocean Atmosphere Model for Australia), a dynamic, coupled ocean-atmosphere, multi-model ensemble global seasonal prediction model, to forecast global sea level anomalies 1-9 months into the future, specifically validating predictions against observations in the Western Pacific. This is the first time dynamic models have been used to predict medium-term sea level, and it forms an extremely important part of helping the Pacific adapt to the immediate effects of climate change.

As for the future, Miles looks forward to getting her PhD submitted, but would like to continue working with sea level modelling. She hopes to start leading projects in Australia and around the world.

[Image: Elaine Miles verifying data]