Converting BigBlueButton recordings to self-contained videos

When the pandemic lockdowns started, my local Linux User Group started looking at video conferencing tools we could use to continue presenting talks and other events to members. We ended up adopting BigBlueButton: as well as being Open Source, its focus on education made it well suited for presenting talks. It has the concept of a presenter role, and built-in support for slides (it sends them to viewers as images, rather than as another video stream). It can also record sessions for later viewing.

To view those recordings though, you need to use BBB’s web player. I wanted to make sure we could keep the recordings available should the BBB instance we were using go away. Ideally, we’d just be able to convert the recordings to self-contained video files that could be archived and published alongside our other recordings. There are a few tools intended to help with this:

  • bbb-recorder: screen captures Chrome displaying BBB’s web player to produce a video.
  • bbb-download: this one is intended to run on the BBB server, and combines slides, screen share and presentation audio using ffmpeg. Does not include webcam footage.

I really wanted something that would include both the camera footage and slides in one video, so decided to make my own. The result is bbb-render:

https://github.com/plugorgau/bbb-render

At present, it consists of two scripts. The first is download.py, which takes the URL of a public BBB recording and downloads all of its assets to a local folder. The second is make-xges.py, which assembles those assets so they’re ready to render.

The resources retrieved by the download script include:

video/webcams.webm:
Video from the presenters’ cameras, plus the audio track for the presentation.
deskshare/deskshare.webm:
Video for screen sharing segments of the presentation. This is the same length as the webcams video, with blank footage when nothing is being shared.
deskshare.xml:
Timing information for when to show the screen share video, along with the aspect ratio for each share session.
shapes.svg:
An SVG file with custom timing attributes that is used to present the slides and whiteboard scribbles. By following links in the SVG, we can download all the slide images.
cursor.xml:
Mouse cursor position over time. This is used for the “red dot laser pointer” effect.
slides_new.xml:
Not actually slides. For some reason, this is the text chat replay.
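
Most of the interesting timing data lives in shapes.svg. As a rough illustration (the snippet below is a simplified stand-in for a real recording, not verbatim BBB output), extracting the slide timeline looks something like this:

```python
import xml.etree.ElementTree as ET

# Each <image> in shapes.svg carries "in" and "out" attributes giving the
# time range (in seconds) during which that slide is visible.  This SVG
# snippet is a simplified stand-in for a real recording's shapes.svg.
SHAPES = """\
<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <image xlink:href="presentation/deck/slide-1.png" in="0.0" out="95.2"/>
  <image xlink:href="presentation/deck/slide-2.png" in="95.2" out="254.7"/>
</svg>
"""

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
SVG_IMAGE = "{http://www.w3.org/2000/svg}image"

def slide_timeline(svg_text):
    """Return a list of (href, start, end) tuples, one per slide."""
    root = ET.fromstring(svg_text)
    return [
        (img.get(XLINK_HREF), float(img.get("in")), float(img.get("out")))
        for img in root.iter(SVG_IMAGE)
    ]

for href, start, end in slide_timeline(SHAPES):
    print(f"{href}: {start}s to {end}s")
```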

My first thought to combine the various parts was to construct a GStreamer pipeline that would play everything back together, using timers to bring slides in and out. This turned out to be easier said than done, so I started looking for something higher level.

It turns out GStreamer has that covered in the form of GStreamer Editing Services: a library intended to help write non-linear editing applications. That fits the problem really well: I’ve got a collection of assets and metadata, so just need to convert all the timing information into an appropriate edit list. I can put the webcam footage in the bottom right corner, ask for a particular slide image to display at a particular point on the timeline and go away at another point, display screen share footage, etc. It also made it easy to add a backdrop image to fill in the blank space around the slides and camera and add a bit of branding to the result.
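
The edit-list idea can be sketched without any GStreamer at all. The Clip type and layer numbers below are illustrative, not bbb-render’s actual data structures:

```python
# A GStreamer-free sketch of the edit-list idea: each asset with timing
# metadata becomes a clip entry saying what to show, when, and on which
# layer (higher layers are drawn on top).
from dataclasses import dataclass

@dataclass
class Clip:
    asset: str       # file to display
    start: float     # position on the timeline, in seconds
    duration: float
    layer: int

def build_edit_list(slide_times, webcam_file, length):
    clips = [
        # Webcam footage runs for the whole recording on the top layer
        # (it would be scaled into a corner when rendered).
        Clip(webcam_file, 0.0, length, layer=2),
    ]
    for href, start, end in slide_times:
        clips.append(Clip(href, start, end - start, layer=1))
    return clips

clips = build_edit_list(
    [("slide-1.png", 0.0, 95.2), ("slide-2.png", 95.2, 254.7)],
    "video/webcams.webm", 254.7)
for clip in clips:
    print(clip)
```

GES then takes a list like this and turns each entry into a clip on a timeline layer, which can be rendered or serialised as a project file.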

On top of that, I can serialise that edit list to a file, rather than encoding the video directly. The ges-launch-1.0 utility can load the project to quickly play back the result without having to wait for the video to encode.

I can even load the project in Pitivi, a video editor built on top of GES:

screenshot of Pitivi video editor

This makes it very easy to scrub through the timeline to quickly verify that everything looks correct.

At this point, the scripts can produce a crisp 1080p video that should be good enough for most presentations. There are a few areas that could be improved though:

  • If there are multiple presenters with their webcam on, we still get a single webcam video with each presenter feed shown in a square grid. It would probably look better to try and stack each presenter vertically. This could probably be done by applying videocrop as an effect to extract each individual presenter, and include the video multiple times in the project.
  • The data in cursor.xml is ignored. It would be pretty easy to display a small red circle image at the correct times and positions.
  • Whiteboard scribbles are also ignored. This would be a bit trickier to implement. It would probably involve dissecting shapes.svg into a sequence of SVGs containing the elements visible at each point in time. Making matters more complicated, the JavaScript web player adjusts the viewBox when switching to/from slides and screen share, and that changes how the coordinates of the scribbles are interpreted.
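
For the cursor case, the parsing side is simple. This sketch assumes a simplified version of cursor.xml’s structure (timestamped events carrying fractional x/y positions, with a negative position meaning the cursor is hidden); the real file may differ in detail:

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for cursor.xml: each event has a timestamp and a
# cursor position given as fractions of the slide area.
CURSOR = """\
<recording>
  <event timestamp="11.0"><cursor>0.40 0.55</cursor></event>
  <event timestamp="12.5"><cursor>0.42 0.60</cursor></event>
  <event timestamp="14.0"><cursor>-1.0 -1.0</cursor></event>
</recording>
"""

def cursor_events(xml_text):
    """Yield (timestamp, x, y) tuples; negative coords mean hidden."""
    root = ET.fromstring(xml_text)
    for event in root.iter("event"):
        x, y = (float(v) for v in event.find("cursor").text.split())
        yield float(event.get("timestamp")), x, y

events = list(cursor_events(CURSOR))
```

Each event would translate to showing (or hiding) a small red circle image at the corresponding point on the timeline.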

As GUADEC is using BigBlueButton this year, hopefully these scripts will help with processing the recordings into individual videos.

Exploring Github Actions

To help keep myself honest, I wanted to set up automated test runs on a few personal projects I host on Github.  At first I gave Travis a try, since a number of projects I contribute to use it, but it felt a bit clunky.  When I found Github had a new CI system in beta, I signed up and was accepted a few weeks later.

While it is still in development, the configuration language feels lean and powerful.  In comparison, Travis’s configuration language has obviously evolved over time with some features not interacting properly (e.g. matrix expansion only working on the first job in a workflow using build stages).  While I’ve never felt like I had a complete grasp of the Travis configuration language, the single page description of Actions configuration language feels complete.

The main differences I could see between the two systems are:

  1. A Github workflow is composed of multiple jobs right from the start.
  2. All jobs run in parallel by default.  It is possible to serialise jobs (similar to Travis’s stages) by declaring dependencies between jobs.
  3. Each job specifies which VM image it will run on, with a choice of Ubuntu, Windows, or MacOS versions.  If you choose Ubuntu, you can also specify a Docker container to run your build in, giving access to other Linux build environments.
  4. Each job can have a matrix attached, allowing the job to be duplicated according to a set of parameters.
  5. Jobs are composed of a sequence of steps.  Unlike Travis’s fixed set of build phases, these are generic.
  6. Steps can consist of either code executed by the shell or a reference to an external action.
  7. Actions are the primary extension mechanism, and are even used for basic tasks like checking out your repository.  Actions are either implemented in JavaScript or as a Docker container.  Only JavaScript actions are available for Windows and MacOS jobs.
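
Most of those differences show up in even a tiny workflow. The following configuration is illustrative (not one of my actual workflows), showing matrix expansion, a job dependency, and action-based steps:

```yaml
# Hypothetical workflow: a matrix-expanded test job, plus a second job
# that is serialised behind it via "needs".
name: Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python: [3.6, 3.7]
    steps:
      - uses: actions/checkout@v1
      - uses: actions/setup-python@v1
        with:
          python-version: ${{ matrix.python }}
      - run: python -m unittest discover
  report:
    needs: test        # runs only after every matrix job has passed
    runs-on: ubuntu-latest
    steps:
      - run: echo "all test jobs passed"
```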

The first project I converted over was asyncio-glib, where I was using Travis to run the test suite on a selection of Python versions.  My old Travis configuration can be seen here, and the new Actions workflow can be seen here.  Both versions are roughly equivalent, although the actions/setup-python@v1 action doesn’t currently make beta releases of Python available. The result of a run of the workflow can be seen here.

For a second project (videowhisk), I am running the tests against the VM’s default Python image.  For this project, I’m more interested in compatibility with the distro release’s GStreamer libraries than compatibility with different Python versions.  I suppose I could extend this using the matrix feature to test on multiple Ubuntu versions, or containers for other Linux releases.

While I’ve just been using this to run the test suite, it looks like Actions can be used for a lot more.  A project can have multiple workflows with different triggers, so it can also be used for automated triage of bugs or pull requests (e.g. request a review from a specific developer when a pull request is created that modifies files in a specific directory). It also looks like I could create a workflow to automatically publish to PyPI when I push a new tag to the repository that looks like a version number.

It will be interesting to see what this does to the larger ecosystem of “CI as a service” products built to work with Github.  On the one hand having a choice is nice, but on the other hand it’s nice to have something well integrated.  I really like Gitlab’s integrated CI system for projects I have hosted on various Gitlab instances, for example.

Seeking in Transcoded Streams with Rygel

When looking at various UPnP media servers, one of the features I wanted was the ability to play back my music collection through my PlayStation 3.  The complicating factor is that most of my collection is encoded in Vorbis format, which is not yet supported by the PS3 (at this point, it doesn’t seem likely that it ever will).

Both MediaTomb and Rygel could handle this to an extent, transcoding the audio to raw LPCM data to send over the network.  This doesn’t require much CPU power on the server side, and only requires 1.4 Mbit/s of bandwidth, which is manageable on most home networks.  Unfortunately the only playback controls enabled in this mode are play and stop: if you want to pause, fast forward or rewind then you’re out of luck.

Given that Rygel has a fairly simple code base, I thought I’d have a go at fixing this.  The first solution I tried was the one I’ve mentioned a few times before: with uncompressed PCM data file offsets can be easily converted to sample numbers, so if the source format allows time based seeking, we can easily satisfy byte range requests.
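
For CD-quality LPCM (44.1 kHz, 16-bit stereo, which is where the 1.4 Mbit/s figure comes from), the byte/time conversion is simple arithmetic. A sketch, not Rygel’s actual code:

```python
# Byte offset <-> time conversion for uncompressed LPCM.  The parameters
# match CD-quality audio: 44100 samples/s, 2 channels, 2 bytes/sample.
RATE, CHANNELS, BYTES_PER_SAMPLE = 44100, 2, 2
FRAME_SIZE = CHANNELS * BYTES_PER_SAMPLE   # bytes per sample frame

def time_to_offset(seconds):
    # Round down to a whole frame so a sample is never split across
    # channels.
    return int(seconds * RATE) * FRAME_SIZE

def offset_to_time(offset):
    return offset / FRAME_SIZE / RATE

# Sanity check: one second of audio is 176400 bytes, i.e. ~1.4 Mbit/s.
print(time_to_offset(1.0))        # 176400
print(time_to_offset(1.0) * 8)    # 1411200 bits
```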

I got a basic implementation of this working, but it was a little bit jumpy and not as stable as I’d like.  Before fully debugging it, I started looking at the mysterious DLNA options I’d copied over to get things working.  One of those was the “DLNA operation”, which was set to “range” mode.  Looking at the GUPnP header files, I noticed there was another value named “timeseek”.  When I picked this option, the HTTP requests from the PS3 changed:

GET /... HTTP/1.1
Host: ...
User-Agent: PLAYSTATION 3
Connection: Keep-Alive
Accept-Encoding: identity
TimeSeekRange.dlna.org: npt=0.00-
transferMode.dlna.org: Streaming

The pause, rewind and fast forward controls were now active, although only the pause control actually worked properly. After fast forwarding or rewinding, the PS3 would issue another HTTP request with the TimeSeekRange.dlna.org header specifying the new offset, but the playback position would reset to the start of the track when the operation completed. After a little more experimentation, I found that the playback position didn’t reset if I included TimeSeekRange.dlna.org in the response headers. Of course, I was still sending back the beginning of the track at this point but the PS3 acted as though it was playing from the new point in the song.
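
Parsing the npt value out of that header is straightforward. This sketch (not Rygel’s actual code) handles both the plain-seconds form and the hours:minutes:seconds form, ignoring any end position:

```python
import re

# Parse the start position out of a TimeSeekRange.dlna.org header value.
# Normal play time (npt) can be plain seconds ("npt=132.5-") or
# hours:minutes:seconds ("npt=0:02:12.5-").
def parse_npt(value):
    match = re.match(r"npt=([^-]+)-", value)
    if match is None:
        raise ValueError("not a time-seek range: %r" % value)
    start = match.group(1)
    if ":" in start:
        hours, minutes, seconds = start.split(":")
        return int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    return float(start)

print(parse_npt("npt=0.00-"))       # 0.0
print(parse_npt("npt=0:02:12.5-"))  # 132.5
```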

It wasn’t much more work to update the GStreamer calls to seek to the requested offset before playback and things worked pretty much as well as for non-transcoded files.  And since this solution didn’t involve byte offsets, it also worked for Rygel’s other transcoders.  It even worked to an extent with video files, but the delay before playback was a bit too high to make it usable — fixing that would probably require caching the GStreamer pipeline between HTTP requests.

Thoughts on DLNA

While it can be fun to reverse engineer things like this, it was a bit annoying to only be able to find out about the feature by reading header files written by people with access to the specification.  I can understand having interoperability and certification requirements to use the DLNA logo, but that does not require that the specifications be private.

As well as keeping the specification private, it feels like some aspects have been intentionally obfuscated, using bit fields represented in both binary and hexadecimal string representations inside the resource’s protocol info.  This might seem reasonable if it was designed for easy parsing, but you need to go through two levels of XML processing (the SOAP envelope and then the DIDL payload) to get to these flags.  Furthermore, the attributes inherited from the UPnP MediaServer specifications are all human readable so it doesn’t seem like an arbitrary choice.
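
As an example of what decoding those flags involves, here is a sketch of picking apart the DLNA.ORG_OP parameter as I understand it from reverse-engineered sources (so treat the semantics as an assumption): its two digits indicate support for time-based and byte-range seeking respectively.

```python
# Sketch of decoding the DLNA.ORG_OP flags from the fourth field of a
# resource's protocolInfo.  The interpretation of the two digits is my
# understanding from unofficial sources, not the private specification.
def parse_op_flags(fourth_field):
    flags = {}
    for param in fourth_field.split(";"):
        name, _, value = param.partition("=")
        flags[name] = value
    op = flags.get("DLNA.ORG_OP", "00")
    return {"time_seek": op[0] == "1", "byte_range": op[1] == "1"}

info = "DLNA.ORG_PN=LPCM;DLNA.ORG_OP=10;DLNA.ORG_CI=1"
print(parse_op_flags(info))   # {'time_seek': True, 'byte_range': False}
```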

On the bright side, I suppose we’re lucky they didn’t use cryptographic signatures to lock things down like Apple has with some of their protocols and file formats.

Watching iView with Rygel

One of the features of Rygel that I found most interesting was the external media server support.  It looked like an easy way to publish information on the network without implementing a full UPnP/DLNA media server (i.e. handling the UPnP multicast traffic, transcoding to a format that the remote system can handle, etc).

As a small test, I put together a server that exposes the ABC’s iView service to UPnP media renderers.  The result is a bit rough around the edges, but the basic functionality works.  The source can be grabbed using Bazaar:

bzr branch lp:~jamesh/+junk/rygel-iview

It needs Python, Twisted, the Python bindings for D-Bus and rtmpdump to run.  The program exports the guide via D-Bus, and uses rtmpdump to stream the shows via HTTP.  Rygel then publishes the guide via the UPnP media server protocol and provides MPEG2 versions of the streams if clients need them.

There are still a few rough edges though.  The video from iView comes as 640×480 with a 16:9 aspect ratio so has a 4:3 pixel aspect ratio, but there is nothing in the video file to indicate this (I am not sure if flash video supports this metadata).
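
The required pixel aspect ratio follows from the two aspect ratios (display aspect divided by storage aspect); a quick check with Python’s Fraction:

```python
from fractions import Fraction

# Pixel aspect ratio needed to display 640x480 anamorphic video at 16:9:
# PAR = display aspect / storage aspect.
storage = Fraction(640, 480)   # 4:3
display = Fraction(16, 9)
par = display / storage
print(par)   # 4/3
```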

Getting Twisted and D-Bus to cooperate

Since I’d decided to use Twisted, I needed to get it to cooperate with the D-Bus bindings for Python.  The first step here was to get both libraries using the same event loop.  This can be achieved by setting Twisted to use the glib2 reactor, and enabling the glib mainloop integration in the D-Bus bindings.

Next was enabling asynchronous D-Bus method implementations.  There is support for this in the D-Bus bindings, but it has quite a different (and less convenient) API compared to Twisted.  A small decorator was enough to overcome this impedance mismatch:

from functools import wraps

import dbus.service
from twisted.internet import defer

def dbus_deferred_method(*args, **kwargs):
    def decorator(function):
        function = dbus.service.method(*args, **kwargs)(function)
        @wraps(function)
        def wrapper(*args, **kwargs):
            # The D-Bus bindings pass these two callables in when the
            # method declares asynchronous callbacks (see the
            # _dbus_async_callbacks attribute below).
            dbus_callback = kwargs.pop('_dbus_callback')
            dbus_errback = kwargs.pop('_dbus_errback')
            # maybeDeferred lets the wrapped method return either a
            # plain value or a Deferred.
            d = defer.maybeDeferred(function, *args, **kwargs)
            d.addCallbacks(
                dbus_callback, lambda failure: dbus_errback(failure.value))
        wrapper._dbus_async_callbacks = ('_dbus_callback', '_dbus_errback')
        return wrapper
    return decorator

This decorator could then be applied to methods in the same way as the @dbus.service.method decorator, but it would correctly handle the case where the method returns a Deferred. Unfortunately it can’t be used in conjunction with @defer.inlineCallbacks, since the D-Bus bindings don’t handle varargs functions properly. You can of course call another function or method that uses @defer.inlineCallbacks though.

The iView Guide

After coding this, it became pretty obvious why it takes so long to load up the iView flash player: it splits the guide data over almost 300 XML files.  This might make sense if it relied on most of these files remaining unchanged and stored in cache, however it also uses a cache-busting technique when requesting them (adding a random query component to the URL).

Most of these files are series description files (some for finished series with no published programs).  These files contain a title, a short description, the URL for a thumbnail image and the IDs for the programs belonging to the series.  To find out about those programs, you need to load all the channel guide XML files until you find which one contains the program.  Going in the other direction, if you’ve got a program description from the channel guide and want to know about the series it belongs to (e.g. to get the thumbnail), you need to load each series description XML file until you find the one that contains the program.  So there aren’t many opportunities to delay loading of parts of the guide.
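
The lookup problem can be summarised in a few lines: with the guide split up this way, mapping a program back to its series means scanning every series file, whereas an inverted index built once answers both directions immediately. The data shapes below are made up for illustration, not iView’s actual schema:

```python
# Hypothetical guide data: each series description lists the programs
# that belong to it, but programs don't point back at their series.
series_files = {
    "series-1": {"title": "Some Show", "programs": ["p1", "p2"]},
    "series-2": {"title": "Other Show", "programs": ["p3"]},
}

# Build the reverse mapping once, so program -> series lookups no longer
# require scanning every series description file.
program_to_series = {
    program_id: series_id
    for series_id, series in series_files.items()
    for program_id in series["programs"]
}

print(program_to_series["p3"])   # series-2
```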

Startup would be a lot faster if this information was collapsed down to a smaller number of larger XML files.

More Rygel testing

In my last post, I said I had trouble getting Rygel’s tracker backend to function and assumed that it was expecting an older version of the API.  It turns out I was incorrect and the problem was due in part to Ubuntu specific changes to the Tracker package and the unusual way Rygel was trying to talk to Tracker.

The Tracker packages in Ubuntu remove the D-Bus service activation file for the “org.freedesktop.Tracker” bus name so that if the user has not chosen to run the service (or has killed it), it won’t be automatically activated.  Unfortunately, instead of just calling a Tracker D-Bus method, Rygel was trying to manually activate Tracker via a StartServiceByName() call.  This would fail even if Tracker was running, hence my assumption that it was a tracker API version problem.

This problem will be fixed in the next Rygel release: it will call a method on Tracker directly to see if it is available.  With that problem out of the way, I was able to try out the backend.  It was providing a lot more metadata to the PS3 so more files were playable, which was good.  Browsing folders was also much quicker than the folder back end.  There were a few problems though:

  1. Files are exposed in one of three folders: “All Images”, “All Music” or “All Videos”.  With even a moderately sized music collection, this is unmanageable.  It wasn’t clear what order the files were being displayed in either.
  2. There was quite a long delay before video playback starts.

When the folder back end fixes the metadata and speed issues, I’d be inclined to use it over the tracker back end.

Video Transcoding

Getting video transcoding working turned out to require a newer GStreamer (0.10.23), the “unstripped” ffmpeg libraries and the “bad” GStreamer plugins package from multiverse.  With those installed, things worked pretty well.  With these dependencies encoded in the packaging, it’d be pretty painless to get it set up.  Certainly much easier than setting things up in MediaTomb’s configuration file.