Converting BigBlueButton recordings to self-contained videos

When the pandemic lockdowns started, my local Linux User Group started looking at video conferencing tools we could use to continue presenting talks and other events to members. We ended up adopting BigBlueButton: as well as being Open Source, its focus on education made it well suited for presenting talks. It has the concept of a presenter role, and built-in support for slides (it sends them to viewers as images, rather than as another video stream). It can also record sessions for later viewing.

To view those recordings though, you need to use BBB’s web player. I wanted to make sure we could keep the recordings available should the BBB instance we were using go away. Ideally, we’d just be able to convert the recordings to self-contained video files that could be archived and published alongside our other recordings. There are a few tools intended to help with this:

  • bbb-recorder: screen captures Chrome displaying BBB’s web player to produce a video.
  • bbb-download: this one is intended to run on the BBB server, and combines slides, screen share and presentation audio using ffmpeg. Does not include webcam footage.

I really wanted something that would include both the camera footage and slides in one video, so I decided to make my own. The result is bbb-render:

https://github.com/plugorgau/bbb-render

At present, it consists of two scripts. The first is download.py, which takes the URL of a public BBB recording and downloads all of its assets to a local folder. The second is make-xges.py, which assembles those assets so they’re ready to render.
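
Usage is roughly as follows (the argument order is from memory, so check the project README for the exact invocation; the URL is just a placeholder):

./download.py https://bbb.example.com/playback/presentation/2.0/playback.html?meetingId=abc123 outdir
./make-xges.py outdir presentation.xges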

The resources retrieved by the download script include:

video/webcams.webm:
Video from the presenters’ cameras, plus the audio track for the presentation.
deskshare/deskshare.webm:
Video for screen sharing segments of the presentation. This is the same length as the webcams video, with blank footage when nothing is being shared.
deskshare.xml:
Timing information for when to show the screen share video, along with the aspect ratio for each share session.
shapes.svg:
An SVG file with custom timing attributes that is used to present the slides and whiteboard scribbles. By following links in the SVG, we can download all the slide images.
cursor.xml:
Mouse cursor position over time. This is used for the “red dot laser pointer” effect.
slides_new.xml:
Not actually slides. For some reason, this is the text chat replay.

My first thought to combine the various parts was to construct a GStreamer pipeline that would play everything back together, using timers to bring slides in and out. This turned out to be easier said than done, so I started looking for something higher level.

It turns out GStreamer has that covered in the form of GStreamer Editing Services: a library intended to help write non-linear editing applications. That fits the problem really well: I’ve got a collection of assets and metadata, so I just need to convert all the timing information into an appropriate edit list. I can put the webcam footage in the bottom right corner, ask for a particular slide image to display at a particular point on the timeline and go away at another point, display screen share footage, etc. It also made it easy to add a backdrop image to fill in the blank space around the slides and camera, and add a bit of branding to the result.
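
To give a feel for the GES API, here is a trimmed-down sketch of the kind of edit list make-xges.py builds. The file names, sizes and timings are placeholders rather than the real values:

import gi
gi.require_version('Gst', '1.0')
gi.require_version('GES', '1.0')
from gi.repository import Gst, GES

Gst.init(None)
GES.init()

timeline = GES.Timeline.new_audio_video()
# The first layer appended sits on top in the composited output, so the
# webcam overlay goes on its own layer above the slides.
webcam_layer = timeline.append_layer()
slides_layer = timeline.append_layer()

# Webcam footage (which also carries the presentation audio) runs for the
# whole recording; assume 30 seconds here.
webcams = GES.UriClipAsset.request_sync('file:///path/to/video/webcams.webm')
clip = webcam_layer.add_asset(webcams, 0, 0, 30 * Gst.SECOND,
                              GES.TrackType.UNKNOWN)
# Shrink it and park it in the bottom right corner of a 1080p frame.
clip.set_child_property('posx', 1440)
clip.set_child_property('posy', 810)
clip.set_child_property('width', 480)
clip.set_child_property('height', 270)

# Show one slide image from the 5 second mark to the 20 second mark.
slide = GES.UriClipAsset.request_sync('file:///path/to/presentation/slide-1.png')
slides_layer.add_asset(slide, 5 * Gst.SECOND, 0, 15 * Gst.SECOND,
                       GES.TrackType.VIDEO)

# Serialise the edit list as a project instead of rendering straight away.
timeline.save_to_uri('file:///path/to/presentation.xges', None, True)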

On top of that, I can serialise that edit list to a file, rather than encoding the video directly. The ges-launch-1.0 utility can load the project to quickly play back the result without having to wait for the video to encode.
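
For example, with a project file named presentation.xges (the output file name is just an illustration), playback and rendering look something like this:

ges-launch-1.0 --load presentation.xges
ges-launch-1.0 --load presentation.xges --outputuri=file:///tmp/presentation.webm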

I can even load the project in Pitivi, a video editor built on top of GES:

screenshot of Pitivi video editor

This makes it very easy to scrub through the timeline to quickly verify that everything looks correct.

At this point, the scripts can produce a crisp 1080p video that should be good enough for most presentations. There are a few areas that could be improved though:

  • If there are multiple presenters with their webcams on, we still get a single webcam video with each presenter’s feed shown in a square grid. It would probably look better to stack the presenters vertically. This could probably be done by including the webcam video multiple times in the project and applying videocrop as an effect to extract each individual presenter (see the sketch after this list).
  • The data in cursor.xml is ignored. It would be pretty easy to display a small red circle image at the correct times and positions.
  • Whiteboard scribbles are also ignored. This would be a bit trickier to implement. It would probably involve dissecting shapes.svg into a sequence of SVGs containing the elements visible at each point in time. Making matters more complicated, the JavaScript web player adjusts the viewBox when switching to/from slides and screen share, and that changes how the coordinates of the scribbles are interpreted.
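
As a rough illustration of the first idea, continuing from the GES sketch above (the 2×1 grid layout, feed sizes and positions are assumptions; real footage would need measuring), each presenter could be cropped out of the combined webcam video like this:

# Add the combined webcam video once per presenter, each on its own layer,
# cropping each copy down to one cell of the webcam grid.
webcams = GES.UriClipAsset.request_sync('file:///path/to/video/webcams.webm')
for index in range(2):
    layer = timeline.append_layer()
    clip = layer.add_asset(webcams, 0, 0, 30 * Gst.SECOND, GES.TrackType.VIDEO)
    # Assume a 2x1 grid of 320x240 feeds: keep only presenter number `index`.
    crop = GES.Effect.new('videocrop left=%d right=%d top=0 bottom=0'
                          % (index * 320, (1 - index) * 320))
    clip.add(crop)
    # Stack the cropped feeds vertically down the right hand edge.
    clip.set_child_property('posx', 1600)
    clip.set_child_property('posy', index * 240)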

As GUADEC is using BigBlueButton this year, hopefully these scripts will help with processing the recordings into individual videos.

GLib integration for the Python asyncio event loop

As an evening project, I’ve been working on a small library that integrates the GLib main loop with Python’s asyncio. I think I’ve gotten to the point where it might be useful to other people, so I have pushed it up here:

https://github.com/jhenstridge/asyncio-glib

This isn’t the only attempt to integrate the two event loops, but the other one I found (Gbulb) is unmaintained and seems to reimplement a fair bit of asyncio (e.g. it has its own transport classes). So I thought I’d see if I could write something smaller and more maintainable, reusing as much code from the standard library as possible.

My first step was writing an implementation of the selectors.BaseSelector interface in terms of the GLib main loop. The select() method just runs a GMainLoop with a custom source that will quit the loop if any of the file descriptors are ready, or the timeout is reached.
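
A rough sketch of that idea looks like the following. This is not the actual asyncio-glib code (which uses a custom GSource; also, selectors._BaseSelectorImpl is technically a private class), but it shows the shape of the select() method:

import selectors

from gi.repository import GLib


class GLibSelector(selectors._BaseSelectorImpl):
    # _BaseSelectorImpl supplies register()/unregister()/get_map();
    # only the waiting logic needs to know about GLib.

    def select(self, timeout=None):
        loop = GLib.MainLoop()
        ready = []

        def on_event(fd, condition, key):
            events = 0
            if condition & (GLib.IOCondition.IN | GLib.IOCondition.HUP):
                events |= selectors.EVENT_READ
            if condition & GLib.IOCondition.OUT:
                events |= selectors.EVENT_WRITE
            ready.append((key, events & key.events))
            loop.quit()
            return True  # keep the watch; it is removed explicitly below

        def on_timeout():
            loop.quit()
            return True

        # Watch every registered file descriptor.
        source_ids = []
        for key in self.get_map().values():
            condition = GLib.IOCondition.HUP | GLib.IOCondition.ERR
            if key.events & selectors.EVENT_READ:
                condition |= GLib.IOCondition.IN
            if key.events & selectors.EVENT_WRITE:
                condition |= GLib.IOCondition.OUT
            source_ids.append(GLib.io_add_watch(
                key.fd, GLib.PRIORITY_DEFAULT, condition, on_event, key))

        # Quit after the timeout even if nothing becomes ready.
        if timeout is not None:
            source_ids.append(GLib.timeout_add(int(timeout * 1000), on_timeout))

        loop.run()
        for source_id in source_ids:
            GLib.source_remove(source_id)
        return ready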

For the asyncio event loop, I was able to reuse the standard library asyncio.SelectorEventLoop with my new selector. In action, it looks something like this:

  1. Let the GMainLoop spin until any asyncio events come in.
  2. Return control to the asyncio event loop to process those events.
  3. Repeat
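
Concretely, the reuse amounts to constructing the standard event loop with the custom selector (GLibSelector here refers to the sketch above; the actual class names in asyncio-glib may differ):

import asyncio

loop = asyncio.SelectorEventLoop(GLibSelector())
asyncio.set_event_loop(loop)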

As far as testing goes, the Python standard library comes with a suite of tests parameterised on an event loop implementation. So I’ve just reused that as the bulk of my test suite, and done the same with the selector tests. There are a handful of test failures I still need to diagnose, but for the most part things just work.

Making an asyncio application use this event loop is simple:

import asyncio
import asyncio_glib
asyncio.set_event_loop_policy(asyncio_glib.GLibEventLoopPolicy())

The main limitation of this code is that it relies on asyncio running the GLib main loop. If some other piece of code runs the main loop, asyncio callbacks will not be triggered, which will probably lead to busy looping. This isn’t a problem for my project (an asyncio server making use of GStreamer), but would be a problem for e.g. a graphical application calling gtk_dialog_run().

ThinkPad Infrared Camera

One of the options available when configuring my ThinkPad was an infrared camera, with “Windows Hello” facial recognition login as the main selling point. While I wasn’t planning on keeping Windows on the system, I was curious to see what I could do with it under Linux. Hopefully this is of use to anyone else trying to get it working.

The camera is manufactured by Chicony Electronics (probably a CKFGE03 or similar), and shows up as two USB devices:

04f2:b5ce Integrated Camera
04f2:b5cf Integrated IR Camera

Both devices are claimed by the uvcvideo driver, showing up as separate video4linux devices. Interestingly, the IR camera seems to be assigned /dev/video0, so it generally gets picked by apps in preference to the colour camera. Unfortunately, the image it produces comes up garbled:

So it wasn’t going to be quite so easy to get things working. Looking at the advertised capture modes, the camera supports Motion-JPEG and YUYV raw mode. So I tried capturing a few JPEG frames with the following GStreamer pipeline:

gst-launch-1.0 v4l2src device=/dev/video0 num-buffers=10 ! image/jpeg ! multifilesink location="frame-%02d.jpg"

Unlike in raw mode, the red illumination LEDs started flashing when in JPEG mode, which resulted in frames having alternating exposures. Here’s one of the better exposures:

What is interesting is that the JPEG frames have a different aspect ratio to the raw version: a more normal 640×480 rather than 400×480. So to start, I captured a few raw frames:

gst-launch-1.0 v4l2src device=/dev/video0 num-buffers=10 ! "video/x-raw,format=(string)YUY2" ! multifilesink location="frame-%02d.raw"

The illumination LEDs stayed on constantly while recording in raw mode. The contents of the raw frames show something strange:

00000000  11 48 30 c1 04 13 44 20  81 04 13 4c 20 41 04 13  |.H0...D ...L A..|
00000010  40 10 41 04 11 40 10 81  04 11 44 00 81 04 12 40  |@.A..@....D....@|
00000020  00 c1 04 11 50 10 81 04  12 4c 10 81 03 11 44 00  |....P....L....D.|
00000030  41 04 10 48 30 01 04 11  40 10 01 04 11 40 10 81  |A..H0...@....@..|
...

The advertised YUYV format encodes two pixels in four bytes, so you would expect any repeating patterns to occur at a period of four bytes. But the data in these frames seems to repeat at a period of five bytes.

Looking closer, it is actually repeating at a period of 10 bits, or four packed values for every five bytes. Furthermore, the 800-byte rows work out to 640 pixels when interpreted as packed 10-bit values (800 × 8 ÷ 10 = 640, rather than the advertised 400 pixels), which matches the width of the JPEG mode.

The following Python code can unpack the 10-bit pixel values:

def unpack(data):
    """Unpack little-endian 10-bit pixel values: four pixels per five bytes."""
    result = []
    for i in range(0, len(data), 5):
        # Combine five bytes into one 40-bit little-endian value.
        block = (data[i] |
                 data[i+1] << 8 |
                 data[i+2] << 16 |
                 data[i+3] << 24 |
                 data[i+4] << 32)
        # Extract the four 10-bit pixels from the 40-bit block.
        result.append((block >> 0) & 0x3ff)
        result.append((block >> 10) & 0x3ff)
        result.append((block >> 20) & 0x3ff)
        result.append((block >> 30) & 0x3ff)
    return result

After adjusting the brightness while converting to 8-bit greyscale, I get a usable image. Compare the frame interpreted as the advertised YUYV format with the decoded version:
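
As a rough sketch of that final step (the dimensions come from the analysis above; Pillow and the simple two-bit shift are my choices here, and real frames may want a more aggressive brightness adjustment):

from PIL import Image

WIDTH, HEIGHT = 640, 480  # per the analysis above

def raw_frame_to_png(raw_path, png_path):
    # One frame: 480 rows of 800 bytes, i.e. 640 pixels packed 4-per-5-bytes.
    with open(raw_path, 'rb') as f:
        data = f.read(WIDTH * HEIGHT * 10 // 8)
    pixels = unpack(data)  # the unpack() function shown above
    # Crude 10-bit to 8-bit conversion.
    grey = bytes(value >> 2 for value in pixels)
    Image.frombytes('L', (WIDTH, HEIGHT), grey).save(png_path)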

I suppose this logic could be wrapped up in a GStreamer element to get usable infrared video capture.

I’m still not clear why the camera would lie about the pixel format it produces. My best guess is that they wanted to use the standard USB Video Class driver on Windows, and this let them get at the raw data to process in user space.