Automated testing of GNOME Shell

Automated testing is important to ensure software continues to behave as it is intended and it’s part of more or less all modern software projects, including GNOME Shell and many of the dependencies it builds upon. However, as with most testing, we can always do better to get more complete testing. In this post, we’ll dive into how we recently improved testing in GNOME Shell, and what this unlocks in terms of future testability.

Already existing testing

GNOME Shell already performs testing as part of its continuous integration pipeline (CI), but tests have been limited to unit testing, meaning testing selected components in isolation ensuring they behave as expected, but due to of the nature of the functionalities that Shell implements, the amount of testing one can do as unit testing is rather limiting. Primarily, in something like GNOME Shell, it is just as important to test how things behave when used in their natural environment, i.e. instead of testing specific functionality in isolation, the whole Shell instance needs to be executed with all bits and pieces running as a whole, as if it was a real session.

In other words, what we need is being able running all of GNOME Shell as if it was installed and logged in into on a real system.

Test Everything

As discussed, to actually test enough things, we need to run all of GNOME Shell with all its features, as if it was a real session. What this also means is that we don’t necessarily have the ability to set up actual test cases filled with asserts as one does with unit testing; instead we need mechanisms to verify the state of the compositor in a way that looks more like regular usage. Enter “perf tests“.

Since many years back, GNOME Shell has had automated performance tests, that would measure how well the Shell performed doing various tasks. Each test is a tiny JavaScript function that performs a few operations, while making sure all the performed operations actually happened, and when it finishes, the Shell instance is terminated. For example, a “perf test” could look like

Open overview
Open notifications
Close notifications
Leave overview

As is it turns out, this infrastructure fits rather neatly with the kind of testing we want to add here – tests that that perform various tasks that exercise user facing functionality.

There are, however, more ways to verify that things behave as expected other than triggering these operations and ensuring that they executed correctly. The most immediate next step is to ensure that there were no warnings logged during the whole test run. This is useful in part due to the fact that GNOME Shell is largely written in JavaScript, as this means the APIs provided by lower level components such as Mutter and GLib tend to have runtime input validation in introspected API entry points. Consequently, if an API is misused by some JavaScript code, it tends to result in warnings being logged. We can be more confident that a particular change won’t introduce regressions when it runs GNOME Shell completely without warnings.

This, however, is easier said than done, for two main reasons: we’ll be running in a container, and the complications that comes with mixing memory management models of different programming languages.

Running GNOME Shell in a container

For tests to be useful, they need to run in CI. Running in CI means running in a container, and that is not all that straightforward when it comes to compositors. The containerized environment is rather different than running on a regularly installed and setup Linux distribution; it lack many services that are expected to be running, and provide important functionality needed to build a desktop environment, like service and session management (e.g. logging out), system management (e.g. rebooting), dealing with network connectivity, and so on.

Running with most of these services missing is possible, but results in many warnings, and a partially broken session. To get any useful testing done, we need to eliminate all of these warnings, without just silencing them. Enter service mocking.

Mocked D-Bus Services

In the world of testing, “mocking” involves creating an implementation of an API, without the actual real world API implementation sitting behind it. Often these mocked services provide a limited pre-defined subset of functionality, for example hard coding results of API operations given a pre-defined set of possible input arguments. Sometimes, mocked APIs can simply only be there to pretend a service available, and nothing more is needed unless the functionality it provides needs to be actively triggered.

As part of CI testing in Mutter, the basic building blocks for mocking services needed to run a display server in CI have been implemented, but GNOME Shell needs many more compared to plain Mutter. As of this writing, in addition to the few APIs Mutter relies on, GNOME Shell also needs the following:

org.freedesktop.Accounts (accountsservice) – For e.g. the lock screen
org.freedesktop.UPower (upower) – E.g. battery status
org.freedesktop.NetworkManager (NetworkManager) – Manage internet
org.freedesktop.PolicyKit1 (polkit) – Act as a PolKit agent
net.hadess.PowerProfiles (power-profiles-daemon) – Power profiles management
org.gnome.DisplayManage (gdm) – Registering with GDM
org.freedesktop.impl.portal.PermissionStore (xdg-permission-store) – Permission checking
org.gnome.SessionManager (gnome-session) – Log out / Reboot / …
org.freedesktop.GeoClue2 (GeoClue) – Geolocation control
org.gnome.Shell.CalendarServer (gnome-shell-calendar-server) – Calendar integration

The mock services used by Mutter are implemented using python-dbusmock, and Mutter conveniently installs its own service mocking implementations. Building on top of this, we can easily continue mocking API after API until all the needed ones are provided.

As of now, either upstream python-dbusmock or GNOME Shell have mock implementations of all the mentioned services. All but one, org.freedesktop.Accounts, either existed or needed a trivial implementation. In the future, for further testing that involves interacting with the system, e.g. configuring Wi-Fi, we will need expand what these mocked API implementations can do, but for what we’re doing initially, it’s good enough.

Terminating GNOME Shell

Mixing JavaScript, a garbage collected language, and C, with all its manual memory management, has its caveats, and this is especially true during tear down. In the past the Mutter context was terminated, later followed by the JavaScript context. Terminating the JavaScript context last prevented Clutter and Mutter objects from being destroyed, as JavaScript may still have references to these objects. If you ever wondered why there tends to be warnings in journal when logging out, this is why. All of these warnings and potential crashes mean any tests that rely on zero warnings would fail. We can’t have that!

To improve this situation, we have to shuffle things around a bit. In rough terms, we now terminate the JavaScript context first, ensuring there are no references held by JavaScript, before tearing down the backend and the Mutter context. To make this possible without introducing even more issues, this meant tearing down the whole UI tree on shut-down, making sure the actual JavaScript context disposal more or less only involves cleaning up defunct JavaScript objects.

In the past, this has been complicated too, since not all components can easily handle bits and pieces of the Shell getting destroyed in a rather arbitrary order, as it means signals get emitted when they were not expected to, e.g. when parts of the shell that was expected to still exist has already been cleaned up. A while ago, a new door was opened making it possible to handle rather conveniently: enter the signal tracker, a helper that makes it possible to write code using signal handlers that automatically disconnects signal handlers on shutdown.

With the signal tracker in place and in use, a few smaller final fixes here, and the aforementioned reversed order we tear down the JavaScript context and the Mutter bits, we can now terminate without any warnings being logged.

And as a result, the tests pass!

Enabled in CI

Right now we’re running the “basic” perf test on each merge request in GNOME Shell. It performs some basic operations, including opening the quick settings menu, handles an incoming notification, opens the overview and application grid. A screen recording of what it does can be seen below.

What’s Next

More Tests

Testing more functionality than basic.js. There are some more existing “perf tests” that could potentially be used, but tests that aim for testing specific functionality, for example window management, or configuring the Wi-Fi, that isn’t related to performance don’t really exist yet. This will become easier after the port to standard JavaScript modules, when tests no longer have to be included in the gnome-shell binary itself.

Input Events

So far, widgets are triggered programmatically. Using input events via virtual input devices means we get more fully tested code paths. Better test infrastructure for things related to input is being worked on for Mutter, and can hopefully be reused in GNOME Shell’s tests.

Running tests from Mutter’s CI

GNOME Shell provides a decent sanity test for Clutter, Mutter’s compositing library, so ensuring that it runs successfully and without warnings is useful to make sure changes there doesn’t introduce regressions.

Screenshot-based Tests

Using so called reference screenshots, test will be able to ensure there were no actual visual changes unless so was intended. The basic infrastructure exist in and can be exposed by Mutter, but for something like GNOME Shell, we probably need a way other than in-tree reference images for storage as is done in Mutter, in order to not make the gnome-shell git repository grow out of hand.

Multi-monitor

Currently the tests use a single fixed resolution virtual monitor, but this should be expanded to involve multi monitor and hotplugging. Mutter has ways to create virtual monitors, but does not yet export this via by GNOME Shell consumable API.

GNOME Shell Extensions

Not only GNOME Shell itself needs testing, running tests specifically for extensions, or running GNOME Shell’s own tests as part of testing extensions would have benefits as well.