Converting BigBlueButton recordings to self-contained videos

When the pandemic lockdowns started, my local Linux User Group started looking at video conferencing tools we could use to continue presenting talks and other events to members. We ended up adopting BigBlueButton: as well as being Open Source, its focus on education made it well suited for presenting talks. It has the concept of a presenter role, and built-in support for slides (it sends them to viewers as images, rather than another video stream). It can also record sessions for later viewing.

To view those recordings though, you need to use BBB's web player. I wanted to make sure we could keep the recordings available should the BBB instance we were using go away. Ideally, we'd just be able to convert the recordings to self-contained video files that could be archived and published alongside our other recordings. There are a few tools intended to help with this:

- bbb-recorder: screen captures Chrome displaying BBB's web player to produce a video.
- bbb-download: this one is intended to run on the BBB server, and combines slides, screen share and presentation audio using ffmpeg. Does not include webcam footage.

I really wanted something that would include both the camera footage and slides in one video, so decided to make my own. The result is bbb-render: https://github.com/plugorgau/bbb-render

At present, it consists of two scripts. The first is download.py, which takes the URL of a public BBB recording and downloads all of its assets to a local folder. The second is make-xges.py, which assembles those assets so they're ready to render. The resources retrieved by the download script include:

- video/webcams.webm: Video from the presenters' cameras, plus the audio track for the presentation.
- deskshare/deskshare.webm: Video for screen sharing segments of the presentation. This is the same length as the webcams video, with blank footage when nothing is being shared.
- deskshare.xml: Timing information for when to show the screen share video, along with the aspect ratio for a particular share session.
- shapes.svg: An SVG file with custom timing attributes that is used to present the slides and whiteboard scribbles. By following links in the SVG, we can download all the slide images.
- cursor.xml: Mouse cursor position over time. This is used for the "red dot laser pointer" effect.
- slides_new.xml: Not actually slides. For some reason, this is the text chat replay.

My first thought for combining the various parts was to construct a GStreamer pipeline that would play everything back together, using timers to bring slides in and out. This turned out to be easier said than done, so I started looking for something higher level. It turns out GStreamer has that covered in the form of GStreamer Editing Services: a library intended to help write non-linear editing applications. That fits the problem really well: I've got a collection of assets and metadata, so just need to convert all the timing information into an appropriate edit list. I can put the webcam footage in the bottom right corner, ask for a particular slide image to display at a…
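As a rough illustration of converting timing metadata into an edit list, here is a minimal sketch that reads screen share events and turns them into clip descriptions. It is not the make-xges.py code: the XML layout, the attribute names (start_timestamp, stop_timestamp, video_width, video_height) and the Clip class are assumptions made for the example, and a real edit list would be built with GStreamer Editing Services rather than plain Python objects.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Assumed shape of deskshare.xml; the element and attribute names are
# illustrative and may not match what a given BBB version writes.
SAMPLE = """<recording>
  <event start_timestamp="12.5" stop_timestamp="95.0"
         video_width="1280" video_height="720"/>
  <event start_timestamp="130.0" stop_timestamp="180.25"
         video_width="1024" video_height="768"/>
</recording>"""


@dataclass
class Clip:
    """Hypothetical edit list entry, with times in seconds."""
    start: float
    duration: float
    aspect_ratio: float


def deskshare_clips(xml_text):
    """Turn each screen share event into a clip with a start and duration."""
    root = ET.fromstring(xml_text)
    clips = []
    for event in root.findall('./event'):
        start = float(event.get('start_timestamp'))
        stop = float(event.get('stop_timestamp'))
        width = int(event.get('video_width'))
        height = int(event.get('video_height'))
        clips.append(Clip(start, stop - start, width / height))
    return clips


for clip in deskshare_clips(SAMPLE):
    print(clip)
```

Because the screen share video is the same length as the webcam video, each clip's start time doubles as the offset into the source file, so one pair of numbers is enough to place it on a timeline.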

Using GAsyncResult APIs with Python’s asyncio

With a GLib implementation of the Python asyncio event loop, I can easily mix asyncio code with GLib/GTK code in the same thread. The next step is to see whether we can use this to make any APIs more convenient to use. A good candidate is APIs that make use of GAsyncResult.

These APIs generally consist of one function call that initiates the asynchronous job and takes a callback. The callback will be invoked sometime later with a GAsyncResult object, which can be passed to a "finish" function to convert it to the result type relevant to the original call. This sort of API is a good candidate to convert to an asyncio coroutine.

We can do this by writing a ready callback that simply stores the result in a future, and then have our coroutine await that future after initiating the job. For example, the following will asynchronously connect to the session bus:

We've now got an API that is conceptually as simple to use as the synchronous Gio.bus_get_sync call, but won't block other work the application might be performing.

Most of the code is fairly straightforward: the main wart is the two loop.call_soon_threadsafe calls. While everything is executing in the same thread, my asyncio-glib library does not currently wake the asyncio event loop when called from a GLib callback. The call_soon_threadsafe method does the trick by generating some dummy IO to cause a wake up.

Cancellation

One feature we've lost with this wrapper is the ability to cancel the asynchronous job. On the GLib side, this is handled with the GCancellable object. On the asyncio side, tasks are cancelled by injecting an asyncio.CancelledError exception into the coroutine. We can propagate this cancellation to the GLib side fairly seamlessly:

It's important to re-raise the CancelledError exception, so that it will propagate up to any calling coroutines and let them perform their own cleanup.
By following this pattern I was able to build enough wrappers to let me connect to the D-Bus daemon and issue asynchronous method calls without needing to chain together large sequences of callbacks. The wrappers were all similar enough that it shouldn't be too difficult to factor out the common code.
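The common shape of these wrappers can be sketched without GLib at all. The following is a minimal illustration of the pattern, not the code from this post: call_async is a hypothetical helper, and fake_start and fake_finish are stand-ins for a GLib "async" and "finish" function pair. The cancellable argument only needs a cancel() method, which Gio.Cancellable provides.

```python
import asyncio
import threading


async def call_async(start, finish, cancellable=None):
    """Run a callback-style job and return its finished result.

    start(cancellable, callback) kicks off the job, and callback(raw) is
    invoked later. finish(raw) converts the raw result, in the same way a
    GLib "finish" function converts a GAsyncResult.
    """
    loop = asyncio.get_running_loop()
    future = loop.create_future()

    def resolve(raw):
        # Runs inside the event loop; ignore results for abandoned jobs.
        if future.cancelled():
            return
        try:
            future.set_result(finish(raw))
        except Exception as exc:
            future.set_exception(exc)

    def ready(raw):
        # May be called outside the loop's control, so wake the loop up.
        loop.call_soon_threadsafe(resolve, raw)

    start(cancellable, ready)
    try:
        return await future
    except asyncio.CancelledError:
        # Pass the cancellation on to the job, then let it propagate.
        if cancellable is not None:
            cancellable.cancel()
        raise


# Hypothetical stand-ins for a GLib-style API, used only for this demo.
def fake_start(cancellable, callback):
    threading.Timer(0.01, callback, args=("session bus",)).start()


def fake_finish(raw):
    return raw.upper()


def demo():
    return asyncio.run(call_async(fake_start, fake_finish))
```

Keeping the start and finish functions as parameters is one way the shared code could be factored out: each wrapper then reduces to a single call naming the two functions it pairs.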

Exploring GitHub Actions

To help keep myself honest, I wanted to set up automated test runs on a few personal projects I host on GitHub. At first I gave Travis a try, since a number of projects I contribute to use it, but it felt a bit clunky. When I found GitHub had a new CI system in beta, I signed up for the beta and was accepted a few weeks later.

While it is still in development, the configuration language feels lean and powerful. In comparison, Travis's configuration language has obviously evolved over time, with some features not interacting properly (e.g. matrix expansion only working on the first job in a workflow using build stages). While I've never felt like I had a complete grasp of the Travis configuration language, the single-page description of the Actions configuration language feels complete.

The main differences I could see between the two systems are:

- A GitHub workflow is composed of multiple jobs right from the start.
- All jobs run in parallel by default. It is possible to serialise jobs (similar to Travis's stages) by declaring dependencies between jobs.
- Each job specifies which VM image it will run on, with a choice of Ubuntu, Windows, or macOS versions. If you choose Ubuntu, you can also specify a Docker container to run your build in, giving access to other Linux build environments.
- Each job can have a matrix attached, allowing the job to be duplicated according to a set of parameters.
- Jobs are composed of a sequence of steps. Unlike Travis's fixed set of build phases, these are generic.
- Steps can consist of either code executed by the shell or a reference to an external action.
- Actions are the primary extension mechanism, and are even used for basic tasks like checking out your repository. Actions are either implemented in JavaScript or as a Docker container. Only JavaScript actions are available for Windows and macOS jobs.

The first project I converted over was asyncio-glib, where I was using Travis to run the test suite on a selection of Python versions.
My old Travis configuration can be seen here, and the new Actions workflow can be seen here. Both versions are roughly equivalent, although the actions/setup-python@v1 action doesn't currently make beta releases of Python available. The result of a run of the workflow can be seen here.

For a second project (videowhisk), I am running the tests against the VM's default Python image. For this project, I'm more interested in compatibility with the distro release's GStreamer libraries than compatibility with different Python versions. I suppose I could extend this using the matrix feature to test on multiple Ubuntu versions, or containers for other Linux releases.

While I've just been using this to run the test suite, it looks like Actions can be used for a lot more. A project can have multiple workflows with different triggers, so it can also be used for automated triage of bugs or pull requests (e.g. request a review from a specific developer when…

GLib integration for the Python asyncio event loop

As an evening project, I've been working on a small library that integrates the GLib main loop with Python's asyncio. I think I've gotten to the point where it might be useful to other people, so have pushed it up here: https://github.com/jhenstridge/asyncio-glib

This isn't the only attempt to integrate the two event loops, but the other I found (Gbulb) is unmaintained and seems to reimplement a fair bit of asyncio (e.g. it has its own transport classes). So I thought I'd see if I could write something smaller and more maintainable, reusing as much code from the standard library as possible.

My first step was writing an implementation of the selectors.BaseSelector interface in terms of the GLib main loop. The select() method just runs a GMainLoop with a custom source that will quit the loop if any of the file descriptors are ready, or the timeout is reached.

For the asyncio event loop, I was able to reuse the standard library asyncio.SelectorEventLoop with my new selector. In action, it looks something like this:

1. Let the GMainLoop spin until any asyncio events come in.
2. Return control to the asyncio event loop to process those events.
3. Repeat.

As far as testing goes, the Python standard library comes with a suite of tests parameterised on an event loop implementation. So I've just reused that as the bulk of my test suite, and done the same with the selector tests. There are a handful of test failures I still need to diagnose, but for the most part things just work.

Making an asyncio application use this event loop is simple:

    import asyncio
    import asyncio_glib
    asyncio.set_event_loop_policy(asyncio_glib.GLibEventLoopPolicy())

The main limitation of this code is that it relies on asyncio running the GLib main loop. If some other piece of code runs the main loop, asyncio callbacks will not be triggered, which will probably lead to busy looping. This isn't a problem for my project (an asyncio server making use of GStreamer), but would be a problem for e.g. a graphical application calling gtk_dialog_run().
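The hook that makes this approach possible is that asyncio.SelectorEventLoop accepts any selectors.BaseSelector implementation. Here is a minimal sketch of that hook, assuming a stand-in selector: the hypothetical DelegatingSelector below just forwards to the standard library's default selector and counts calls, where the real library would run a GMainLoop inside select().

```python
import asyncio
import selectors


class DelegatingSelector(selectors.BaseSelector):
    """Stand-in selector that forwards every call to the default selector."""

    def __init__(self):
        super().__init__()
        self._inner = selectors.DefaultSelector()
        self.select_calls = 0

    def register(self, fileobj, events, data=None):
        return self._inner.register(fileobj, events, data)

    def unregister(self, fileobj):
        return self._inner.unregister(fileobj)

    def modify(self, fileobj, events, data=None):
        return self._inner.modify(fileobj, events, data)

    def select(self, timeout=None):
        # A GLib-backed selector would spin the main loop here until a
        # file descriptor is ready or the timeout expires.
        self.select_calls += 1
        return self._inner.select(timeout)

    def get_map(self):
        return self._inner.get_map()

    def close(self):
        self._inner.close()


def demo():
    """Run a short coroutine on a loop driven by the custom selector."""
    selector = DelegatingSelector()
    loop = asyncio.SelectorEventLoop(selector)
    try:
        loop.run_until_complete(asyncio.sleep(0.01))
    finally:
        loop.close()
    return selector.select_calls


print(demo())
```

Everything else (transports, timers, the self-pipe used for wake ups) comes from the standard library loop, which is what keeps this design small.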

Extracting BIOS images and tools from ThinkPad update ISOs

With my old ThinkPad, Lenovo provided BIOS updates in the form of Windows executables or ISO images for a bootable CD. Since I had wiped the Windows partition, the first wasn't an option. The second didn't work either, since it expected me to be using the drive in the base I hadn't bought. Luckily I was able to just copy the needed files out of the ISO image to a USB stick that had been set up to boot DOS.

When I got my new ThinkPad, I had hoped to do the same thing but found that the update ISO images appeared to be empty when mounted. It seems that the update is handled entirely from an El Torito emulated hard disk image (as opposed to using the image only to bootstrap the drivers needed to access the CD).

So I needed some way to extract that boot image from the ISO. After a little reading of the spec, I put together the following Python script that does the trick:

    import struct
    import sys

    SECTOR_SIZE = 2048

    def find_image(fp):
        # el-torito boot record descriptor
        fp.seek(0x11 * SECTOR_SIZE)
        data = fp.read(SECTOR_SIZE)
        assert data[:0x47] == b'\x00CD001\x01EL TORITO SPECIFICATION' + b'\x00' * 41
        boot_catalog_sector = struct.unpack('<L', data[0x47:0x4B])[0]

        # check the validation entry in the catalog
        fp.seek(boot_catalog_sector * SECTOR_SIZE)
        data = fp.read(0x20)
        assert data[0:1] == b'\x01'
        assert data[0x1e:0x20] == b'\x55\xAA'
        assert sum(struct.unpack('<16H', data)) % 0x10000 == 0

        # Read the initial/default entry
        data = fp.read(0x20)
        (bootable, image_type, load_segment, system_type, sector_count,
         image_sector) = struct.unpack('<BBHBxHL', data[:12])

        image_offset = image_sector * SECTOR_SIZE
        if image_type == 1:
            # 1.2MB floppy
            image_size = 1200 * 1024
        elif image_type == 2:
            # 1.44MB floppy
            image_size = 1440 * 1024
        elif image_type == 3:
            # 2.88MB floppy
            image_size = 2880 * 1024
        elif image_type == 4:
            # Hard disk image. Read the MBR partition table to locate file system
            fp.seek(image_offset)
            data = fp.read(512)
            # Read the first partition entry
            (bootable, part_type, part_start, part_size) = struct.unpack_from(
                '<BxxxBxxxLL', data, 0x1BE)
            assert bootable == 0x80  # is partition bootable?
            image_offset += part_start * 512
            image_size = part_size * 512
        else:
            raise AssertionError('unhandled image format: %d' % image_type)

        fp.seek(image_offset)
        return fp.read(image_size)

    if __name__ == '__main__':
        with open(sys.argv[1], 'rb') as iso, open(sys.argv[2], 'wb') as img:
            img.write(find_image(iso))

It isn't particularly pretty, but does the job and spits out a 32MB FAT disk image when run on the ThinkPad X230 update ISOs. It is then a pretty easy task to copy those files onto the USB stick and run the update as before. Hopefully owners of similar laptops find this useful.

There appears to be an EFI executable in there too, so it is possible that the firmware update could also be run from the EFI system partition. I haven't had the courage to try that though.
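To make the struct format strings in the script a little less opaque, here is a small sketch that decodes a synthetic boot catalog entry and checks a synthetic validation entry. The sample bytes are made up for illustration (they do not come from a real ThinkPad ISO), and the helper names are hypothetical.

```python
import struct

# boot flag, media type, load segment, system type, pad byte,
# sector count, start sector of the image
ENTRY_FORMAT = '<BBHBxHL'
ENTRY_FIELDS = ('bootable', 'image_type', 'load_segment',
                'system_type', 'sector_count', 'image_sector')


def parse_initial_entry(entry):
    """Decode the first 12 bytes of a 32-byte initial/default entry."""
    return dict(zip(ENTRY_FIELDS, struct.unpack(ENTRY_FORMAT, entry[:12])))


def checksum_ok(validation_entry):
    """The sixteen little-endian words must sum to zero modulo 2**16."""
    return sum(struct.unpack('<16H', validation_entry)) % 0x10000 == 0


def make_validation_entry():
    """Build a made-up validation entry with a correct checksum word."""
    entry = bytearray(32)
    entry[0] = 0x01                    # header ID
    entry[0x1e:0x20] = b'\x55\xAA'     # key bytes
    total = sum(struct.unpack('<16H', bytes(entry)))
    struct.pack_into('<H', entry, 0x1c, (-total) % 0x10000)
    return bytes(entry)


# Made-up entry: bootable, hard disk emulation (type 4), image at sector 27.
sample_entry = struct.pack(ENTRY_FORMAT, 0x88, 4, 0, 0, 1, 27).ljust(32, b'\x00')
parsed = parse_initial_entry(sample_entry)
print(parsed['image_type'], parsed['image_sector'])
print(checksum_ok(make_validation_entry()))
```

The `x` in the format string skips the unused byte between the system type and the sector count, which is why six values come out of twelve bytes.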