Automated testing is important to ensure software continues to behave as it is intended and it’s part of more or less all modern software projects, including GNOME Shell and many of the dependencies it builds upon. However, as with most testing, we can always do better to get more complete testing. In this post, we’ll dive into how we recently improved testing in GNOME Shell, and what this unlocks in terms of future testability.
Already existing testing
GNOME Shell already performs testing as part of its continuous integration pipeline (CI), but tests have been limited to unit testing, meaning testing selected components in isolation ensuring they behave as expected, but due to of the nature of the functionalities that Shell implements, the amount of testing one can do as unit testing is rather limiting. Primarily, in something like GNOME Shell, it is just as important to test how things behave when used in their natural environment, i.e. instead of testing specific functionality in isolation, the whole Shell instance needs to be executed with all bits and pieces running as a whole, as if it was a real session.
In other words, what we need is being able running all of GNOME Shell as if it was installed and logged in into on a real system.
As discussed, to actually test enough things, we need to run all of GNOME Shell with all its features, as if it was a real session. What this also means is that we don’t necessarily have the ability to set up actual test cases filled with asserts as one does with unit testing; instead we need mechanisms to verify the state of the compositor in a way that looks more like regular usage. Enter “perf tests“.
- Open overview
- Open notifications
- Close notifications
- Leave overview
As is it turns out, this infrastructure fits rather neatly with the kind of testing we want to add here – tests that that perform various tasks that exercise user facing functionality.
This, however, is easier said than done, for two main reasons: we’ll be running in a container, and the complications that comes with mixing memory management models of different programming languages.
Running GNOME Shell in a container
For tests to be useful, they need to run in CI. Running in CI means running in a container, and that is not all that straightforward when it comes to compositors. The containerized environment is rather different than running on a regularly installed and setup Linux distribution; it lack many services that are expected to be running, and provide important functionality needed to build a desktop environment, like service and session management (e.g. logging out), system management (e.g. rebooting), dealing with network connectivity, and so on.
Running with most of these services missing is possible, but results in many warnings, and a partially broken session. To get any useful testing done, we need to eliminate all of these warnings, without just silencing them. Enter service mocking.
Mocked D-Bus Services
In the world of testing, “mocking” involves creating an implementation of an API, without the actual real world API implementation sitting behind it. Often these mocked services provide a limited pre-defined subset of functionality, for example hard coding results of API operations given a pre-defined set of possible input arguments. Sometimes, mocked APIs can simply only be there to pretend a service available, and nothing more is needed unless the functionality it provides needs to be actively triggered.
As part of CI testing in Mutter, the basic building blocks for mocking services needed to run a display server in CI have been implemented, but GNOME Shell needs many more compared to plain Mutter. As of this writing, in addition to the few APIs Mutter relies on, GNOME Shell also needs the following:
org.freedesktop.Accounts(accountsservice) – For e.g. the lock screen
org.freedesktop.UPower(upower) – E.g. battery status
org.freedesktop.NetworkManager(NetworkManager) – Manage internet
org.freedesktop.PolicyKit1(polkit) – Act as a PolKit agent
net.hadess.PowerProfiles(power-profiles-daemon) – Power profiles management
org.gnome.DisplayManage(gdm) – Registering with GDM
org.freedesktop.impl.portal.PermissionStore(xdg-permission-store) – Permission checking
org.gnome.SessionManager(gnome-session) – Log out / Reboot / …
org.freedesktop.GeoClue2(GeoClue) – Geolocation control
org.gnome.Shell.CalendarServer(gnome-shell-calendar-server) – Calendar integration
The mock services used by Mutter are implemented using python-dbusmock, and Mutter conveniently installs its own service mocking implementations. Building on top of this, we can easily continue mocking API after API until all the needed ones are provided.
As of now, either upstream python-dbusmock or GNOME Shell have mock implementations of all the mentioned services. All but one,
org.freedesktop.Accounts, either existed or needed a trivial implementation. In the future, for further testing that involves interacting with the system, e.g. configuring Wi-Fi, we will need expand what these mocked API implementations can do, but for what we’re doing initially, it’s good enough.
Terminating GNOME Shell
In the past, this has been complicated too, since not all components can easily handle bits and pieces of the Shell getting destroyed in a rather arbitrary order, as it means signals get emitted when they were not expected to, e.g. when parts of the shell that was expected to still exist has already been cleaned up. A while ago, a new door was opened making it possible to handle rather conveniently: enter the signal tracker, a helper that makes it possible to write code using signal handlers that automatically disconnects signal handlers on shutdown.
And as a result, the tests pass!
Enabled in CI
Right now we’re running the “basic” perf test on each merge request in GNOME Shell. It performs some basic operations, including opening the quick settings menu, handles an incoming notification, opens the overview and application grid. A screen recording of what it does can be seen below.
So far, widgets are triggered programmatically. Using input events via virtual input devices means we get more fully tested code paths. Better test infrastructure for things related to input is being worked on for Mutter, and can hopefully be reused in GNOME Shell’s tests.
Running tests from Mutter’s CI
GNOME Shell provides a decent sanity test for Clutter, Mutter’s compositing library, so ensuring that it runs successfully and without warnings is useful to make sure changes there doesn’t introduce regressions.
Using so called reference screenshots, test will be able to ensure there were no actual visual changes unless so was intended. The basic infrastructure exist in and can be exposed by Mutter, but for something like GNOME Shell, we probably need a way other than in-tree reference images for storage as is done in Mutter, in order to not make the
gnome-shell git repository grow out of hand.
Currently the tests use a single fixed resolution virtual monitor, but this should be expanded to involve multi monitor and hotplugging. Mutter has ways to create virtual monitors, but does not yet export this via by GNOME Shell consumable API.
GNOME Shell Extensions
Not only GNOME Shell itself needs testing, running tests specifically for extensions, or running GNOME Shell’s own tests as part of testing extensions would have benefits as well.