James Henstridge – Random stuff

What is in the SafeWA QR Codes?

Post author:James Henstridge
Post published:23 December, 2020
Post category:Uncategorised
Post comments:0 Comments

Earlier this month, the Western Australian government introduced the SafeWA contact tracing app, which relies on users scanning a QR code at a venue or event in order to be added to the online register. The app doesn’t request location permission, so it is solely linking your SafeWA user account with the information in the QR code.

The QR codes were quite large, so I was kind of curious what data was held inside them. So I tried scanning one with a different barcode scanning app, which showed a standard URL-style QR code. Here is what was in a code displayed outside the Coles supermarket in Claremont:

https://safewa.health.wa.gov.au/qr-code/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ2ZW51ZUlkIjoiNWZkYzVkYjViMjdjNzY5MDJhN2NiMzU4Iiwic2NhbkxvY2F0aW9uSWQiOiI1ZmRjNWRiNWIyN2M3NjEyYzU3Y2IzNWEiLCJpYXQiOjE2MDgyNzc0MjksImV4cCI6MjIzODk5NzQyOX0.ruc9OkZ0KgjF8z00BBhMzUIh-kJb1DhxhW9nNqvVi5w?scanLocation=5fdc5db5b27c7612c57cb35a&venue=5fdc5db5b27c76902a7cb358

There are two query parameters in the URL holding what looks like hexadecimal encoded data: scanLocation and venue. The location identifier shares the prefix 5fdc5db5b27c76 with the venue ID. I’m not sure if that means the identifiers are encoding some common data, or whether they were generated using a UUID-style algorithm that generates IDs with a common authority prefix.

Before the query parameters, we have a large chunk of encoded data. Interestingly, it consists of three dot-separated chunks, with the first two starting with “ey”. That’s a strong indication that we’re looking at a JSON Web Token. Here, the header is:

{"alg":"HS256","typ":"JWT"}

And the payload is:

{"venueId":"5fdc5db5b27c76902a7cb358","scanLocationId":"5fdc5db5b27c7612c57cb35a","iat":1608277429,"exp":2238997429}

The last blob is an HMAC-SHA256 signature of the first two parts. So we’ve essentially got a signed duplicate of the query string parameters together with what look like issue and expiry time stamps. Assuming these are standard UNIX time stamps, the token was issued at 3:43pm on 18th December, Perth time (UTC+8). The expiry date has been set 7300 days after the issue date, which would be 20 years if every year was 365 days long.
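For anyone who wants to repeat the decoding, here is a minimal sketch using only the Python standard library. The token is the one from the URL above; the helper name b64url_decode and the fixed UTC+8 offset for Perth are my own choices for illustration.

```python
import base64
import json
from datetime import datetime, timedelta, timezone

# The three dot-separated chunks copied from the QR code URL.
TOKEN = (
    "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9."
    "eyJ2ZW51ZUlkIjoiNWZkYzVkYjViMjdjNzY5MDJhN2NiMzU4Iiwic2NhbkxvY2F0aW9uSWQiOiI1ZmRjNWRiNWIyN2M3NjEyYzU3Y2IzNWEiLCJpYXQiOjE2MDgyNzc0MjksImV4cCI6MjIzODk5NzQyOX0."
    "ruc9OkZ0KgjF8z00BBhMzUIh-kJb1DhxhW9nNqvVi5w"
)


def b64url_decode(segment):
    # JWT segments drop the trailing "=" padding, so restore it first.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


header_b64, payload_b64, signature_b64 = TOKEN.split(".")
header = json.loads(b64url_decode(header_b64))
payload = json.loads(b64url_decode(payload_b64))
signature = b64url_decode(signature_b64)  # raw HMAC-SHA256 bytes

# Interpret the time stamps as UNIX times, shown in Perth local time.
awst = timezone(timedelta(hours=8))
issued = datetime.fromtimestamp(payload["iat"], tz=awst)
lifetime_days = (payload["exp"] - payload["iat"]) / 86400

print(header)
print(payload)
print(issued.isoformat(), lifetime_days)
```

Note that this only decodes the token. Nothing here checks the signature, which needs the secret key.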

As they’re using an HMAC signature, presumably the SafeWA app has no way to verify the token: if it did, then anyone with a copy of the app could extract the key and generate their own signed JWT blobs. If the JWT is sent back to the server verbatim, I wonder how much trouble it would be to just check that the (venue, scanLocation) pair is valid?
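To illustrate why an HMAC signature can only be checked by someone holding the secret, here is a rough sketch of HS256 signing and verification. The key and payload are made up for the example, and this is not a claim about how the SafeWA server actually does it.

```python
import base64
import hashlib
import hmac
import json


def b64url(data):
    # URL-safe base64 without padding, as used by JWTs.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(payload, key):
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    mac = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(mac)


def verify_hs256(token, key):
    # Recompute the MAC over "header.payload" and compare in constant time.
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return hmac.compare_digest(b64url(expected), signature)


SERVER_KEY = b"hypothetical-server-secret"  # made up for this example
token = sign_hs256({"venueId": "example-venue"}, SERVER_KEY)

print(verify_hs256(token, SERVER_KEY))      # holder of the key can verify
print(verify_hs256(token, b"some-other-key"))  # anyone else cannot
```

The same key is used for signing and verifying, so any client that could verify a token could also mint new ones.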

So there are a few ways they could simplify the QR codes:

If the JWT signatures are actually necessary, dropping the query parameters would remove 20% of the data in the code.
If the signature is unnecessary, dropping the JWT would remove 68% of the data in the code.
Having the QR code take the form of a URL would be useful if the app was set up to claim that URL prefix, since it would allow you to start the check-in process through any barcode scanning app. They haven’t done that though, so the URL prefix could be removed for a shorter plain text QR code.
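The percentages above can be reproduced with a few lines of Python. This is just character counting on the URL from the QR code, with the leading “?” counted as part of the query string.

```python
from urllib.parse import urlsplit

URL = (
    "https://safewa.health.wa.gov.au/qr-code/"
    "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9."
    "eyJ2ZW51ZUlkIjoiNWZkYzVkYjViMjdjNzY5MDJhN2NiMzU4Iiwic2NhbkxvY2F0aW9uSWQiOiI1ZmRjNWRiNWIyN2M3NjEyYzU3Y2IzNWEiLCJpYXQiOjE2MDgyNzc0MjksImV4cCI6MjIzODk5NzQyOX0."
    "ruc9OkZ0KgjF8z00BBhMzUIh-kJb1DhxhW9nNqvVi5w"
    "?scanLocation=5fdc5db5b27c7612c57cb35a&venue=5fdc5db5b27c76902a7cb358"
)

parts = urlsplit(URL)
jwt = parts.path.rsplit("/", 1)[-1]  # last path component is the token
query = "?" + parts.query            # count the "?" separator too

# Share of the total URL length taken up by each piece, in percent.
query_share = round(100 * len(query) / len(URL))
jwt_share = round(100 * len(jwt) / len(URL))

print(query_share, jwt_share)
```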

Lastly, visiting the URL from the QR code directly in a web browser currently just redirects you to the SafeWA home page and tells you to install the app. It seems like a missed opportunity not to let people sign their attendance at that URL directly, in case they don’t have the app installed or if it is malfunctioning. It could open the door for people spoofing the Health Dept website, but it’s not clear that’s worse than the status quo where some venues still seem to be running their own contact registers.

Update: I played around with generating QR codes from modified versions of the URL to see what the app would accept. The app accepts a QR code with the query parameters stripped out, and fails if the JWT is stripped from the URL. So it is definitely using the JWT to determine the parameters. It also seems to accept tokens with the signature stripped off, so it seems possible that it doesn’t actually care about validity.

Converting BigBlueButton recordings to self-contained videos

Post author:James Henstridge
Post published:25 July, 2020
Post category:Uncategorised
Post comments:0 Comments

When the pandemic lockdowns started, my local Linux User Group started looking at video conferencing tools we could use to continue presenting talks and other events to members. We ended up adopting BigBlueButton: as well as being Open Source, its focus on education made it well suited for presenting talks. It has the concept of a presenter role, and built-in support for slides (it sends them to viewers as images, rather than another video stream). It can also record sessions for later viewing.

To view those recordings though, you need to use BBB’s web player. I wanted to make sure we could keep the recordings available should the BBB instance we were using go away. Ideally, we’d just be able to convert the recordings to self-contained video files that could be archived and published alongside our other recordings. There are a few tools intended to help with this:

bbb-recorder: screen captures Chrome displaying BBB’s web player to produce a video.
bbb-download: this one is intended to run on the BBB server, and combines slides, screen share and presentation audio using ffmpeg. Does not include webcam footage.

I really wanted something that would include both the camera footage and slides in one video, so decided to make my own. The result is bbb-render:

https://github.com/plugorgau/bbb-render

At present, it consists of two scripts. The first is download.py, which takes the URL of a public BBB recording and downloads all of its assets to a local folder. The second is make-xges.py, which assembles those assets so they’re ready to render.

The resources retrieved by the download script include:

video/webcams.webm: Video from the presenters’ cameras, plus the audio track for the presentation.
deskshare/deskshare.webm: Video for screen sharing segments of the presentation. This is the same length as the webcams video, with blank footage when nothing is being shared.
deskshare.xml: Timing information for when to show the screen share video, along with the aspect ratio for a particular share session.
shapes.svg: An SVG file with custom timing attributes that is used to present the slides and whiteboard scribbles. By following links in the SVG, we can download all the slide images.
cursor.xml: Mouse cursor position over time. This is used for the “red dot laser pointer” effect.
slides_new.xml: Not actually slides. For some reason, this is the text chat replay.

My first thought to combine the various parts was to construct a GStreamer pipeline that would play everything back together, using timers to bring slides in and out. This turned out to be easier said than done, so I started looking for something higher level.

It turns out GStreamer has that covered in the form of GStreamer Editing Services: a library intended to help write non-linear editing applications. That fits the problem really well: I’ve got a collection of assets and metadata, so just need to convert all the timing information into an appropriate edit list. I can put the webcam footage in the bottom right corner, ask for a particular slide image to display at a particular point on the timeline and go away at another point, display screen share footage, etc. It also made it easy to add a backdrop image to fill in the blank space around the slides and camera and add a bit of branding to the result.
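As a rough illustration of what “converting timing information into an edit list” means, here is a small sketch in plain Python. It is not the code from make-xges.py: the Clip class, the layer names and the shape of the slide timing data are all made up for the example. The one detail taken from GStreamer is that timeline positions are expressed in nanoseconds.

```python
from dataclasses import dataclass

SECOND = 1_000_000_000  # GStreamer time stamps are in nanoseconds


@dataclass
class Clip:
    asset: str     # file to show
    start: int     # position on the timeline, in nanoseconds
    duration: int  # how long it stays visible, in nanoseconds
    layer: str     # hypothetical layer name, e.g. "camera" or "slides"


def build_edit_list(slides, total_seconds):
    """Turn (image, in_seconds, out_seconds) tuples into timeline clips."""
    # The webcam footage runs for the whole presentation.
    clips = [Clip("video/webcams.webm", 0, round(total_seconds * SECOND), "camera")]
    for image, t_in, t_out in slides:
        clips.append(
            Clip(image, round(t_in * SECOND), round((t_out - t_in) * SECOND), "slides")
        )
    return clips


# Hypothetical slide timings, as might be read from shapes.svg.
slides = [("slide-1.png", 0.0, 12.5), ("slide-2.png", 12.5, 40.0)]
clips = build_edit_list(slides, total_seconds=40.0)

for clip in clips:
    print(clip)
```

With GES, each of these entries would become a clip added to a layer of the timeline rather than a plain Python object.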

On top of that, I can serialise that edit list to a file, rather than encoding the video directly. The ges-launch-1.0 utility can load the project to quickly play back the result without having to wait for the video to encode.

I can even load the project in Pitivi, a video editor built on top of GES.

This makes it very easy to scrub through the timeline to quickly verify that everything looks correct.

At this point, the scripts can produce a crisp 1080p video that should be good enough for most presentations. There are a few areas that could be improved though:

If there are multiple presenters with their webcam on, we still get a single webcam video with each presenter feed shown in a square grid. It would probably look better to try and stack each presenter vertically. This could probably be done by applying videocrop as an effect to extract each individual presenter, and including the video multiple times in the project.
The data in cursor.xml is ignored. It would be pretty easy to display a small red circle image at the correct times and positions.
Whiteboard scribbles are also ignored. This would be a bit trickier to implement. It would probably involve dissecting shapes.svg into a sequence of SVGs containing the elements visible at each point in time. Making matters more complicated, the JavaScript web player adjusts the viewBox when switching to/from slides and screen share, and that changes how the coordinates of the scribbles are interpreted.

As GUADEC is using BigBlueButton this year, hopefully this will help with processing the recordings into individual videos.

Using GAsyncResult APIs with Python’s asyncio

Post author:James Henstridge
Post published:7 October, 2019
Post category:Uncategorised
Post comments:0 Comments

With a GLib implementation of the Python asyncio event loop, I can easily mix asyncio code with GLib/GTK code in the same thread. The next step is to see whether we can use this to make any APIs more convenient to use. Good candidates are APIs that make use of GAsyncResult.

These APIs generally consist of one function call that initiates the asynchronous job and takes a callback. The callback will be invoked sometime later with a GAsyncResult object, which can be passed to a “finish” function to convert this to the result type relevant to the original call. This sort of API is a good candidate to convert to an asyncio coroutine.

We can do this by writing a ready callback that simply stores the result in a future, and then have our coroutine await that future after initiating the job. For example, the following will asynchronously connect to the session bus:

import asyncio
from gi.repository import GLib, Gio

async def session_bus():
    loop = asyncio.get_running_loop()
    bus_ready = loop.create_future()
    def ready_callback(obj, result):
        try:
            bus = Gio.bus_get_finish(result)
        except GLib.Error as exc:
            loop.call_soon_threadsafe(bus_ready.set_exception, exc)
            return
        loop.call_soon_threadsafe(bus_ready.set_result, bus)

    Gio.bus_get(Gio.BusType.SESSION, None, ready_callback)
    return await bus_ready

We’ve now got an API that is conceptually as simple to use as the synchronous Gio.bus_get_sync call, but won’t block other work the application might be performing.

Most of the code is fairly straightforward: the main wart is the two loop.call_soon_threadsafe calls. While everything is executing in the same thread, my asyncio-glib library does not currently wake the asyncio event loop when called from a GLib callback. The call_soon_threadsafe method does the trick by generating some dummy IO to cause a wake up.
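The same future-plus-callback pattern can be tried without GLib installed. The sketch below swaps Gio.bus_get for a made-up start_job function that invokes its callback from a timer thread, so unlike the GLib case the callback really does run in another thread here. Only the asyncio side matches the code above.

```python
import asyncio
import threading


def start_job(value, callback):
    """Hypothetical callback-style API: calls callback(result) a bit later."""
    timer = threading.Timer(0.01, callback, args=(value * 2,))
    timer.start()


async def run_job(value):
    loop = asyncio.get_running_loop()
    job_done = loop.create_future()

    def ready_callback(result):
        # Hand the result over to the event loop and wake it up.
        loop.call_soon_threadsafe(job_done.set_result, result)

    start_job(value, ready_callback)
    return await job_done


result = asyncio.run(run_job(21))
print(result)
```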

Cancellation

One feature we’ve lost with this wrapper is the ability to cancel the asynchronous job. On the GLib side, this is handled with the GCancellable object. On the asyncio side, tasks are cancelled by injecting an asyncio.CancelledError exception into the coroutine. We can propagate this cancellation to the GLib side fairly seamlessly:

async def session_bus():
    ...
    cancellable = Gio.Cancellable()
    Gio.bus_get(Gio.BusType.SESSION, cancellable, ready_callback)
    try:
        return await bus_ready
    except asyncio.CancelledError:
        cancellable.cancel()
        raise

It’s important to re-raise the CancelledError exception, so that it will propagate up to any calling coroutines and let them perform their own cleanup.
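Here is a small self-contained sketch of that cancellation flow. FakeCancellable is a stand-in I made up for Gio.Cancellable so the example runs without GLib, and the future is never resolved so that the only way out of the await is cancellation.

```python
import asyncio


class FakeCancellable:
    """Stand-in for Gio.Cancellable: just records that cancel() was called."""

    def __init__(self):
        self.cancelled = False

    def cancel(self):
        self.cancelled = True


async def wait_for_job(cancellable):
    loop = asyncio.get_running_loop()
    job_done = loop.create_future()  # never resolved in this sketch
    try:
        return await job_done
    except asyncio.CancelledError:
        cancellable.cancel()  # propagate cancellation to the "GLib" side
        raise                 # and keep propagating it on the asyncio side


async def main():
    cancellable = FakeCancellable()
    task = asyncio.create_task(wait_for_job(cancellable))
    await asyncio.sleep(0)  # let the task start and block on the future
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return cancellable.cancelled, task.cancelled()


cancel_propagated, task_cancelled = asyncio.run(main())
print(cancel_propagated, task_cancelled)
```

Because the exception is re-raised, the task itself ends up in the cancelled state rather than appearing to finish normally.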

By following this pattern I was able to build enough wrappers to let me connect to the D-Bus daemon and issue asynchronous method calls without needing to chain together large sequences of callbacks. The wrappers were all similar enough that it shouldn’t be too difficult to factor out the common code.

Exploring Github Actions

Post author:James Henstridge
Post published:6 September, 2019
Post category:Uncategorised
Post comments:0 Comments

To help keep myself honest, I wanted to set up automated test runs on a few personal projects I host on Github. At first I gave Travis a try, since a number of projects I contribute to use it, but it felt a bit clunky. When I found Github had a new CI system in beta, I signed up for the beta and was accepted a few weeks later.

While it is still in development, the configuration language feels lean and powerful. In comparison, Travis’s configuration language has obviously evolved over time with some features not interacting properly (e.g. matrix expansion only working on the first job in a workflow using build stages). While I’ve never felt like I had a complete grasp of the Travis configuration language, the single page description of the Actions configuration language feels complete.

The main differences I could see between the two systems are:

A Github workflow is composed of multiple jobs right from the start.
All jobs run in parallel by default. It is possible to serialise jobs (similar to Travis’s stages) by declaring dependencies between jobs.
Each job specifies which VM image it will run on, with a choice of Ubuntu, Windows, or MacOS versions. If you choose Ubuntu, you can also specify a Docker container to run your build in, giving access to other Linux build environments.
Each job can have a matrix attached, allowing the job to be duplicated according to a set of parameters.
Jobs are composed of a sequence of steps. Unlike Travis’s fixed set of build phases, these are generic.
Steps can consist of either code executed by the shell or a reference to an external action.
Actions are the primary extension mechanism, and are even used for basic tasks like checking out your repository. Actions are either implemented in JavaScript or as a Docker container. Only JavaScript actions are available for Windows and MacOS jobs.
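To make the matrix idea from the list above concrete, here is a toy model of the expansion in Python: one job per combination of parameter values. Actions does this itself from the workflow file; the parameter names and values below are only examples I picked.

```python
from itertools import product


def expand_matrix(matrix):
    """Return one dict of parameters per combination of matrix values."""
    keys = list(matrix)
    return [
        dict(zip(keys, combo))
        for combo in product(*(matrix[key] for key in keys))
    ]


# Example parameters: two operating systems by three Python versions.
jobs = expand_matrix({
    "os": ["ubuntu", "windows"],
    "python": ["3.6", "3.7", "3.8"],
})

for job in jobs:
    print(job)
```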

The first project I converted over was asyncio-glib, where I was using Travis to run the test suite on a selection of Python versions. My old Travis configuration can be seen here, and the new Actions workflow can be seen here. Both versions are roughly equivalent, although the actions/setup-python@v1 action doesn’t currently make beta releases of Python available. The result of a run of the workflow can be seen here.

For a second project (videowhisk), I am running the tests against the VM’s default Python image. For this project, I’m more interested in compatibility with the distro release’s GStreamer libraries than compatibility with different Python versions. I suppose I could extend this using the matrix feature to test on multiple Ubuntu versions, or containers for other Linux releases.

While I’ve just been using this to run the test suite, it looks like Actions can be used for a lot more. A project can have multiple workflows with different triggers, so it can also be used for automated triage of bugs or pull requests (e.g. request a review from a specific developer when a pull request is created that modifies files in a specific directory). It also looks like I could create a workflow to automatically publish to PyPI when I push a new tag to the repository that looks like a version number.

It will be interesting to see what this does to the larger ecosystem of “CI as a service” products built to work with Github. On the one hand having a choice is nice, but on the other hand it’s nice to have something well integrated. I really like Gitlab’s integrated CI system for projects I have hosted on various Gitlab instances, for example.

GLib integration for the Python asyncio event loop

Post author:James Henstridge
Post published:5 August, 2019
Post category:Uncategorised
Post comments:2 Comments

As an evening project, I’ve been working on a small library that integrates the GLib main loop with Python’s asyncio. I think I’ve gotten to the point where it might be useful to other people, so have pushed it up here:

https://github.com/jhenstridge/asyncio-glib

This isn’t the only attempt to integrate the two event loops, but the other I found (Gbulb) is unmaintained and seems to reimplement a fair bit of asyncio (e.g. it has its own transport classes). So I thought I’d see if I could write something smaller and more maintainable, reusing as much code from the standard library as possible.

My first step was writing an implementation of the selectors.BaseSelector interface in terms of the GLib main loop. The select() method just runs a GMainLoop with a custom source that will quit the loop if any of the file descriptors are ready, or the timeout is reached.

For the asyncio event loop, I was able to reuse the standard library asyncio.SelectorEventLoop with my new selector. In action, it looks something like this:

Let the GMainLoop spin until any asyncio events come in.
Return control to the asyncio event loop to process those events.
Repeat
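The way a custom selector plugs into asyncio.SelectorEventLoop can be shown without GLib. In this sketch, CountingSelector is a made-up stand-in that simply counts calls to select() and defers to the standard select-based selector. A GLib-based selector would run the GMainLoop at that point instead.

```python
import asyncio
import selectors


class CountingSelector(selectors.SelectSelector):
    """Stand-in selector: counts how often the event loop waits for events."""

    def __init__(self):
        super().__init__()
        self.select_calls = 0

    def select(self, timeout=None):
        # A GLib-backed selector would spin the GLib main loop here until a
        # file descriptor is ready or the timeout expires.
        self.select_calls += 1
        return super().select(timeout)


selector = CountingSelector()
loop = asyncio.SelectorEventLoop(selector)
try:
    result = loop.run_until_complete(asyncio.sleep(0.01, result="done"))
finally:
    loop.close()

print(result, selector.select_calls)
```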

As far as testing goes, the Python standard library comes with a suite of tests parameterised on an event loop implementation. So I’ve just reused that as the bulk of my test suite, and done the same with the selector tests. There are a handful of test failures I still need to diagnose, but for the most part things just work.

Making an asyncio application use this event loop is simple:

import asyncio
import asyncio_glib
asyncio.set_event_loop_policy(asyncio_glib.GLibEventLoopPolicy())

The main limitation of this code is that it relies on asyncio running the GLib main loop. If some other piece of code runs the main loop, asyncio callbacks will not be triggered, which will probably lead to busy looping. This isn’t a problem for my project (an asyncio server making use of GStreamer), but would be a problem for e.g. a graphical application calling gtk_dialog_run().
