GStreamer continuous testing (Part 1)

History so far

For the past 6-9 months, as part of some of the tasks I’ve been handling at Collabora, I’ve been working on setting up a continuous build and testing system for GStreamer. For those who’ve been following GStreamer for long enough, you might remember we had a buildbot instance back around 2005-2006, which continuously built and ran checks on every commit. And when it failed, it would also notify the developers on IRC (in more or less polite terms) that they’d broken the build.

The result was that master (sorry, I mean main, we were using CVS back then) was guaranteed to always be in a buildable state and tests always succeeded. Great, no regressions, no surprises.

At some point in time (around 2007 I think ?) the buildbot was no longer used/maintained… And eventually subtle issues crept in, you wouldn’t be guaranteed checkouts always compile, tests eventually broke, you’d need to track what introduced a regression (git bisect makes that easier, but avoiding it in the first place is even better), etc…

What to test

Fast-forward to 2013, after talking so much about it, it was time to bring back such a system in place. Quite a few things have changed since:

  • There’s a lot more code. In 2005, when 0.10 was released, the GStreamer project was around 400kLOC. We’re now around 1.5MLOC ! And I’m not even taking into account all the dependency code we use in cerbero, the system for building binary SDK releases.
  • There are more usages that we didn’t have back then. New modules (rtsp-server, editing-services, orc now under the GStreamer project umbrella, ..)
  • We provide binary releases for Windows, MacOSX, iOS, Android, …

The problems to tackle were “What do we test ? How do we spot regressions ? How to make it as useful as possible to developers ?”.

In order for a CI system to be useful, you want to limit the Signal-To-Noise ratio as much as possible. Just enabling a massive bunch of tests/use-cases with millions of things to fix is totally useless. Not only is it depressing to see millions of failed tests, but also you can’t spot regressions easily and essentially people don’t care anymore (it’s just noise). You want the system to become a simple boolean (Either everything passes, or something failed. And if it failed, it was because of that last commit(s)). In order to cope with that, you gradually activate/add items to do and check. The bare minimum was essentially testing whether all of GStreamer compiled on a standard linux setup. That serves as a reference point. If someone breaks the build, it becomes useful, you’ve spotted a regression, you can fix it. As time goes by, you start adding other steps and builds (make check passes on gstreamer core, activate that, passes on gst-plugins-base, activate that, cerbero builds fully/cleanly on debian, activate that, etc…).

The other important part is that you want to know as quickly as possible whether a regression was introduced. If you need to wait 3 hours for the CI system to report a regression … that person will have gone to sleep or be taken up by something else. If you know within 10-15mins, then it’s still fresh in their head, they are most likely still online, and you can correct the issue as quickly as possible.

Finally, what do we test ? GStreamer has gotten huge. in that sentence GStreamer is actually not just one module, but a whole collection (GStreamer core, gst-plugins*, but also ORC, gst-rtsp-server, gnonlin, gst-editing-services, ….). Whatever we produce for every release … must be covered. So this now includes the binary releases (formerly from gstreamer.com, but which are handled by the GStreamer project itself since 1.x). So we also need to make sure nothing breaks on all the platforms we target (Linux, Android, OSX, iOS, Windows, …).

To summarize

  1. CI system must be set-up progressively (to detect regressions)
  2. CI system must be fast (so person who introduced the regression can fix it ASAP)
  3. CI system must cover all our offering (including cerbero binary builds)

The result is here (yes, I know, we’re working on fixing the certificates once it moves to the final namespace).

How this was implemented, and what challenges were encountered and handled will be covered in a next post.