Understanding GNOME Shell’s focus stealing prevention

Focus stealing prevention exists for two main reasons. One is security: we need to prevent rogue apps from deceiving users into e.g. typing their password into another window. If apps can silently claim keyboard focus and open their own window over the currently focused one, this enables phishing and similar attacks. The other is user experience: even if an app isn’t maliciously taking over your focus, it’s annoying to have a new window pop up while you’re typing and have half your sentence end up in the wrong app.

At the same time there are cases where you want apps to be able to request focus, for example when clicking a link in a chat app and wanting it to open in the browser. In this case you want the focus to move to the browser window.

This is why our compositor library mutter implements focus stealing prevention mechanisms, which allow the currently focused app to request that a specific other app be allowed to claim focus now.

<App> is ready??

Most users have probably seen an “<App> is ready” notification in GNOME Shell at some point. Unfortunately this notification doesn’t really explain why it’s being shown and what’s happening, which may cause confusion.

Because of this there have been proposals to disable focus stealing prevention until it works better (mutter issue 673), and there are a number of GNOME Shell extensions that disable it.

Screenshot of a GNOME Shell notification showing that Telegram Desktop Media viewer is ready

These are the main cases where the notification is shown:

  • A new window is opened and either the launching app or the launched app doesn’t implement the XDG Activation protocol or the startup notification specification
  • An app requests focus for one of its windows, but was not activated in a valid way (e.g. because it wasn’t started by a user action)
  • An app requests focus for a new window, but it’s slow to start and in the meantime there are additional user interactions. In this case we don’t want to interrupt, and show the notification so people can switch at their convenience.
  • An app is launched from an environment that isn’t able to use the XDG Activation protocol (e.g. a terminal)

The protocol responsible for this, XDG Activation (the Wayland equivalent of the X11-specific startup notification spec), was introduced relatively recently (2020) and needs to be adopted by UI toolkits. GNOME 46 and 47 saw a few fixes and some polish both on the client toolkit side (GTK and xdg-desktop-portal) and in the compositor implementation (Mutter), but there are still cases where XDG Activation isn’t hooked up properly.

How XDG activation works

XDG Activation flow for moving focus between two existing windows

The way the protocol works is that the currently focused app asks the compositor to create a token linked to the focused window (Wayland surface) and the most recent user interaction (an input event serial associated with a seat).

This token is then used by the app that should receive focus when it requests to be activated. In GNOME Shell, activation means that the window receives focus and is placed on top of other windows. An activation token may still be rejected, for example if the window linked to the token doesn’t have focus or if the linked user interaction isn’t recent enough.

In addition to handling focus, GNOME Shell also tracks app launching. Until the new app window is actually shown, GNOME Shell uses a “loading spinner” mouse cursor to indicate to the user that the app is loading. If the app doesn’t implement the XDG Activation protocol, the loading indicator only disappears after a timeout because GNOME Shell doesn’t know that the application finished loading and has presented the target window.

The protocol doesn’t define how tokens are given to the target app. One reason for this is that it depends on how the app is started. The main options are:

  • Setting the XDG_ACTIVATION_TOKEN environment variable
  • D-Bus Activation using the platform-data field, which contains the activation token
  • XDG portals that will launch an app (e.g. the OpenURI or OpenFile portals)

The target app then needs to collect the token and use it to have its window activated to receive focus and to signal to the compositor that it started successfully.
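
As a rough sketch of the two halves of that handoff in GJS (the command line and the environment handling here are illustrative; in a real GTK app the launch context would come from Gdk.Display.get_app_launch_context(), which asks the compositor for a token):

    import Gio from 'gi://Gio';
    import GLib from 'gi://GLib';

    // Launcher side: launch via a GAppLaunchContext so activation
    // data can be passed along to the new process.
    const app = Gio.AppInfo.create_from_commandline(
        'firefox', null, Gio.AppInfoCreateFlags.NONE);
    app.launch([], new Gio.AppLaunchContext());

    // Target side: collect a token handed over via the environment.
    const token = GLib.getenv('XDG_ACTIVATION_TOKEN');
    if (token !== null) {
        // Toolkits use the token to activate the new window, then
        // unset it so it doesn't leak to child processes.
        GLib.unsetenv('XDG_ACTIVATION_TOKEN');
    }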

Not smart enough

When I started looking into how our focus prevention mechanism works to investigate the issues mentioned above, I was initially pretty confused. There were a lot of cases where the window focus switch worked fine, but other times it wouldn’t. I quickly realized that with existing windows the “<App> is ready” notification is shown, but new windows would get focus immediately.

This struck me as odd: Why are new windows allowed to do whatever, but existing windows are restricted in the way they can take over focus?

I first thought this was some sort of bug, but then I discovered that the behavior was by design: Mutter has a GSettings key called focus-new-windows that controls the focus stealing prevention mechanism. It can be set to strict or smart (the latter being the default).

  • smart means that in most cases new windows get focus (even without asking for it) and are raised to the top of the window stack
  • strict means they get focus (are “activated”, in technical terms) only when they are actually supposed to

The smart mode exists in part because there are some cases where our current focus prevention system does not work well. These issues include:

  • Launching apps via terminal (vte issue #2788). The main issue is that the terminal executing a command does not know whether that process will present a window or not. For example, if you launch vim there’s no new window, but if you launch firefox there is.
  • Launching apps via Run a Command in GNOME Shell (gnome-shell issue #7704), which shares similar issues with running apps from the terminal
  • Apps launched via custom keyboard shortcut (e.g. set up in Settings > Keyboard > Keyboard Shortcuts)
  • The lack of implementation of the appropriate protocols in apps or toolkits

Because the cases where a new window is opened are a significant percentage of the overall cases where focus prevention is triggered, this smart mode is making it appear as though apps actually implement the XDG Activation protocol, even if they don’t. While it does somewhat reduce annoyance for users, it gives developers the false impression that they don’t have to do anything.

It also makes it harder to debug issues where something doesn’t work as expected or is missing the correct implementation. For example, even in GTK4 focus transfer is broken in some cases, and this took a long time to be discovered (gtk issue #6711).

Security implications

Unfortunately the current situation with smart as the default means that we’re not getting most of the benefits of focus stealing prevention. Apps are able to spawn a new window over your current one and grab keyboard focus, because the smart mode just gives the new window focus, circumventing the safety measures. This is trivial to exploit by malicious apps: All they need to do is open a new window, and focus stealing prevention doesn’t apply.

Next steps

While some people have asked for focus stealing prevention to be disabled completely until it’s implemented by most apps and toolkits, I’m not sure this is the best way forward. If we did that, nobody would notice which apps don’t implement it, so there’d be no reason for toolkits to do so.

On the other hand, there are some remaining issues around terminal applications and similar use cases that we don’t have a plan for yet, so just switching to strict to flush out app bugs isn’t ideal either at the moment.

There is currently no consensus in the team as to how to proceed. The two main directions we could take are:

  • Switch to strict mode by default (mutter issue #3486) once a few remaining issues are resolved, perhaps with a “flag day” deadline so apps have time to implement it.
  • Slowly make the smart mode stricter over time.

Either way we need to raise more awareness of the issue to get app and toolkit developers interested in improving things in this area, which this blogpost is a part of 🙂

It’d also be helpful if more people (especially developers) turned on strict mode on their system, so we get more testing of which apps work and which don’t. This is the relevant gsetting:

gsettings set org.gnome.desktop.wm.preferences focus-new-windows 'strict'

Thanks

Thanks to the Sovereign Tech Fund for allowing me to take the time to properly work through this as part of my broader effort around improving notifications. Thanks also to Sonny Piers and Tobias Bernard for organizing the STF project, Florian Müllner, Sebastian Wick, Carlos Garnacho, and the rest of the GNOME Shell team for reviewing my MRs, and Jonas Dreßler and Jonas Ådahl for reviewing the blogpost.

Notifications in 46 and beyond

One of the things we’re tackling as part of the STF infrastructure initiative is improving notifications. Other platforms have advanced significantly in this area over the past decade, while we still have more or less the same notifications we’ve had since the early GNOME 3 days, both in terms of API and feature set. There’s plenty to do here 🙂

The notification drawer on GNOME 45

Modern needs

As part of the effort to port GNOME Shell to mobile, Jonas looked into the delta between what we currently support and what we’d need for a more modern notification experience. Some of these limitations are specific to GNOME’s implementation, while others are relevant to all desktops.

Tie notifications to apps

As of GNOME 45, notification bubbles don’t clearly identify which app sent them. Sometimes it’s hard to tell where a notification is coming from, which can be annoying when managing notifications in Settings. This also has potential security implications, since the lack of identification makes it trivial to impersonate other apps.

We want all notifications to be clearly identified as coming from a specific app.

Global notification sounds

GNOME Shell can’t play notification sounds in all cases, depending on the API the app is using (see below). Apps not primarily targeting GNOME Shell tend to play sounds themselves because they can’t rely on the system always doing it (it’s an optional feature of the XDG Notification API which different desktops handle differently). This works, but it’s messy for app developers because it’s hard to test and they have to implement a fallback sound played by the app. From a user perspective it’s annoying that you can’t always tell where sounds are coming from, because they’re not necessarily tied to a notification bubble. There’s also no central place to manage notification behavior, and it doesn’t respect Do Not Disturb.

Notification grouping

Currently all notifications are just added to a single chronological list, which gets messy very quickly. In order to limit the length of the list we only keep the latest 3 notifications for every app, so notifications can disappear before you have a chance to act on them.

Other platforms solve this by grouping notifications by app, or even by message thread, but we don’t have anything like this at the moment.

Notifications grouped by app on the iOS lock screen

Expand media support

Currently each notification bubble can only contain one (small) image. It’s mostly used for user avatars (for messages, emails, and the like), but sometimes also for actual content (e.g. a thumbnail for the image someone sent).

Ideally what we want is to be able to show larger images in addition to avatars, as the actual content of the notification.

As of GNOME 45 we only have a single slot for images on notifications, and it’s too small for actual content.
Other platforms have multiple slots (app icon, user avatar, and content image), and media can be expanded to much larger sizes.

There’s also currently no way to include descriptive text for images in notifications, so they are inaccessible to screen readers. This isn’t as big a deal with the current icons since they’re small and mostly used for ornamental purposes, but will be important when we add larger images in the body.

Updating notification content

It’s not possible for apps to update the content of notifications they sent earlier. This is needed to show progress bars in notifications, or to update the text if a chat message was modified.

How do we get there?

Unfortunately, it turns out that improving notifications is not just a matter of standardizing a few new features and implementing them in GNOME Shell. The way notifications work today has grown organically over the years and the status quo is messy. There are three different APIs used by apps today: XDG Notification, Gio.Notification, and XDG Portal.

How different notification APIs are used today

XDG Notification

This is the Freedesktop specification for a D-Bus interface that apps use to send notifications to the system. It’s the oldest notification API still in use. Other desktops mostly use this API, e.g. KDE’s KNotification implements this spec.

Somewhat confusingly, this standard has never actually been finalized and is still marked as a draft today, despite not having seen significant changes in the past decade.
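
For illustration, sending a notification over this interface from GJS looks roughly like this (the strings are placeholder values; note that app_name is whatever the caller claims it is):

    import Gio from 'gi://Gio';
    import GLib from 'gi://GLib';

    // org.freedesktop.Notifications.Notify() returns a uint32 id
    // that can later be passed as replaces_id to replace the bubble.
    const reply = Gio.DBus.session.call_sync(
        'org.freedesktop.Notifications',
        '/org/freedesktop/Notifications',
        'org.freedesktop.Notifications',
        'Notify',
        new GLib.Variant('(susssasa{sv}i)', [
            'my-app',              // app_name, self-reported
            0,                     // replaces_id, 0 = new notification
            'dialog-information',  // app_icon
            'Hello',               // summary
            'Sent via the XDG notification spec',  // body
            [],                    // actions
            {},                    // hints
            -1,                    // expire_timeout, -1 = server default
        ]),
        null, Gio.DBusCallFlags.NONE, -1, null);
    const [id] = reply.deepUnpack();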

Gio.Notification

This is an API in GLib/Gio to send notifications, used primarily by GTK apps. It abstracts over different OS notification APIs: primarily the XDG one mentioned above, a private GNOME Shell API, the portal API, and Cocoa (macOS).

The backend primarily being used is the private D-Bus interface with GNOME Shell. This API was introduced in the early GNOME 3 days because the XDG standard API was deemed too complicated and was missing some features (in particular, notifications were not tied to a specific app).

When using Gio.Notification, apps can’t know which backend is used, or how a notification will be displayed and behave. For example, notifications can only persist after the app is closed if the private GNOME Shell API is used. These differences are specific to GNOME Shell, since the private API is only implemented there.
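
For comparison, a minimal Gio.Notification example, with a hypothetical application ID; which of the backends described above ends up handling it is invisible to the app:

    import Gio from 'gi://Gio';

    const app = new Gio.Application({application_id: 'org.example.MyApp'});
    app.connect('activate', () => {
        const notification = Gio.Notification.new('Download finished');
        notification.set_body('photo.png has been saved');
        // Sending again with the same id replaces the notification
        // instead of adding a second one.
        app.send_notification('download-done', notification);
        app.hold();  // whether the notification outlives the app
                     // depends on the backend in use
    });
    app.run([]);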

XDG Portal

XDG portals are secure, standardized system APIs for the Linux desktop. They were introduced as part of the push for app sandboxing around Flatpak, but can (and should) be used by non-sandboxed apps as well.

The XDG notification portal was inspired by the private GNOME Shell API, with some additional features from the XDG API mixed in.

XDG portals consist of a frontend and a backend. In the case of the notification portal, apps talk to the frontend using the portal API, while the backend talks to the system notification API. Backends are specific to the desktop environment, e.g. GNOME or KDE. On GNOME, the backend uses the private GNOME Shell API when possible.
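
At the D-Bus level the frontend call is small. A sketch of it from GJS (the id and strings are placeholders; client libraries such as libportal wrap this call):

    import Gio from 'gi://Gio';
    import GLib from 'gi://GLib';

    // AddNotification(s id, a{sv} notification); the portal derives
    // the sender's app identity itself instead of trusting a
    // self-reported app_name.
    Gio.DBus.session.call_sync(
        'org.freedesktop.portal.Desktop',
        '/org/freedesktop/portal/desktop',
        'org.freedesktop.portal.Notification',
        'AddNotification',
        new GLib.Variant('(sa{sv})', [
            'download-done',  // notification id, chosen by the app
            {
                title: new GLib.Variant('s', 'Download finished'),
                body: new GLib.Variant('s', 'photo.png has been saved'),
            },
        ]),
        null, Gio.DBusCallFlags.NONE, -1, null);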

The plan

From the GNOME Shell side we have the XDG API (used by non-GNOME apps) and the private API (used via Gio.Notification by GNOME apps). From the app side we additionally have the XDG portal API. None of these can easily supersede the others, because they all have different feature sets and are widely used. This makes improving our notifications tricky, because it’s not obvious which of the APIs we should extend.

After several discussions over the past few months we now have consensus that it makes the most sense to invest in the XDG portal API. Portals are the future of system APIs on the free desktop, and enable app sandboxing. Neither of the other APIs can fill this role.

Our plan for notification APIs going forward: Focus on the portal API

This requires work in a number of different modules, including the XDG portal spec, the XDG portal backend for GNOME, GNOME Shell, and client libraries such as Gio.Notification (in GLib), libportal, libnotify, and ashpd.

In the XDG portal spec, we are adding support for a number of missing features:

  • Tying notifications to apps
  • Grouping by message thread
  • Larger images in the notification body
  • Special notifications for e.g. calls and alarms
  • Clearing up some instances of undefined behavior (e.g. markup in the body, playing sounds, whether to show notifications on the lock screen, etc.)

This is the draft XDG desktop portal proposal for the spec changes.

On the GNOME Shell side, these are the primary things we’re doing (some already done in 46):

  • Cleanups and refactoring to make the code easier to work on
  • Improve keyboard navigation and screen reader accessibility
  • Header with app name and icon
  • Show full notification body and buttons in the drawer
  • Larger notification icons (e.g. user avatars on chat notifications)
  • Group notifications from the same app as a stack
  • Allow message threads to be grouped in a single notification bubble
  • Larger images in the notification body

Mockups of what we’d ideally want, including grouping by app, threading, etc.

There are also animated mockups for some of this, courtesy of Jakub Steiner.

The long-term goal is for apps to switch to the portal API and deprecate both of the others as application-facing APIs. Internally we will still need something to communicate between the portal backend and GNOME Shell, but this isn’t public API so we’re much more flexible here. We might expand either the XDG API or the private GNOME Shell protocol for this purpose, but it has not been decided yet how we’ll do this.

What we did in GNOME 46

When we started the STF project late last year we thought we could just pull the trigger on a draft proposal Jonas had written for an API with the new capabilities needed for mobile. However, as we started discussing things in more detail we realized that this was the wrong place to start. GNOME Shell already didn’t implement a number of features that are in the XDG notification spec, so standardizing new features was not the main blocker.

The code around notifications in GNOME Shell has grown organically and has seen multiple major UI redesigns since GNOME 3.0. Additional complexity comes from the fact that we try to avoid breaking extensions, which means it’s difficult to e.g. change function names or signatures. Over time this has resulted in technical debt, such as weird anachronistic structures and names. It was also not using many of the more recent GJS features, which didn’t exist yet when this code was originally written.

Anyone remember that notifications used to be on the bottom? This is what they looked like in GNOME 3.6 (2012).

As a first step we restructured and cleaned up legacy code, ported it to the most recent GJS features, updated the coding style, and so on. This unfortunately means extensions need to be updated, but it puts us on much firmer ground for the future.

With this out of the way we added the first batch of features from our list above, namely adding notification headers, expanding notifications in the drawer, larger icons, and some style fixes to icons. We also fixed a very annoying issue with “App is ready” notifications not working as expected when clicking a notification (!3198 and !3199).

We also worked on a few other things that didn’t make it in time for 46, most notably grouping notifications by app (which there’s a draft MR for), and additionally grouping them by thread (prototype only).

Throughout the cycle we also continued to discuss the portal spec, as mentioned above. There are MRs against XDG desktop portal and the libportal client library implementing the spec changes. There’s also a draft implementation for the GTK portal backend.

Future work

With all the groundwork laid in GNOME 46 and the spec draft mostly ready we’re in a good position to continue iterating on notifications in 47 and beyond. In GNOME 47 we want to add some of the first newly spec’d features, in particular notification sounds, markup support in the body, and display hints (e.g. showing on the lock screen or not).

We also want to continue work on the UI to unlock even more improvements in the future. In particular, grouping by app will allow us to drop the “only keep 3 notifications per app” behavior and will generally make notifications easier to manage, e.g. by allowing all notifications from a given app to be dismissed at once. We’re also planning to work on improving keyboard navigation and ensuring all content is accessible to screen readers.

Due to the complex nature of the UI for grouping by app, and the many moving parts involved in moving the spec forward, it’s unclear if we’ll be able to do more than this in the scope of STF and within the 47 cycle. This means that additional features that require the new spec and/or lots of UI work, such as grouping by thread and custom UI for call or alarm notifications, will probably be 48+ material.

Conclusion

As we hope this post has illustrated, notifications are way more complex than they might appear. Improving them requires untangling decades of legacy stuff across many different components, coordinating with other projects, and engaging with standards bodies. That complexity has made this hard to work on for volunteers, and there has not been any recent corporate interest in the area, which is why it has been stagnant for some time.

The Sovereign Tech Fund investment has allowed us to take the time to properly work through the problem, clean up technical debt, and make a plan for the future. We hope to leverage this momentum over the coming releases, for a best-in-class notification experience on the free desktop. Stay tuned 🙂

Extensions in GNOME 45

By now it is probably no longer news to many: GNOME Shell moved from GJS’ own custom imports system to standard JavaScript modules (ESM).

Imports? ESM?

JavaScript originated in web browsers to add a bit of interactivity to otherwise static pages. There was no need to split up small code snippets into multiple files, so the language did not provide a mechanism for that.

This did become an issue when people started writing bigger programs in JavaScript, so environments like node.js and GJS added their own import systems to organize code into multiple files. As a consequence, developers and tooling had a hard time transitioning from one environment to another.

That changed in 2015 when ECMAScript 6 standardized modules, resulting in a well-defined syntax supported by all major JavaScript engines. GJS has supported ESModules since 2021, but porting GNOME Shell was a much bigger task that had to be done all at once.

So? Why should I care?

Well, there is a teeny tiny drawback: Modules and legacy imports are incompatible in practice.

Modules are loaded differently than scripts, and some statements — namely import and export — are only valid in modules. That means that trying to import a module with the legacy system will result in a syntax error if the module uses one of those statements (about as likely as a pope being Catholic).

Modules also hide anything from the outside that isn’t explicitly exported. So while it is technically possible to import a script as a module, it is about as useful as importing an empty file.
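
In concrete terms, the same import looks like this under the two systems (the two styles cannot be mixed within one file):

    // Legacy GJS imports (GNOME 44 and earlier):
    const Main = imports.ui.main;

    // Standard JavaScript modules (GNOME 45 and later):
    import * as Main from 'resource:///org/gnome/shell/ui/main.js';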

What does this mean for extensions?

Extensions that target older GNOME versions will not work in GNOME 45. Likewise, extensions that are adapted to work with GNOME 45 will not work in older versions.

You can still support more than one GNOME version, but you will have to upload different versions to extensions.gnome.org for pre- and post-45 support.

There is a porting guide with detailed information. The two most important changes (that will be enough for many extensions!) are:

  1. Use standard syntax to import modules from gnome-shell:

    import * as Main from 'resource:///org/gnome/shell/ui/main.js';
    
    Main.notify('Loaded!');
    
  2. Export a default class with enable() and disable() methods from your extension.js.

    You may want to extend the new Extension class that replaces the convenience API from the old ExtensionUtils module.

    import {Extension, gettext as _} from 'resource:///org/gnome/shell/extensions/extension.js';
    
    export default class MyTestExtension extends Extension {
        enable() {
            console.log(_('%s is now enabled').format(this.uuid));
        }
    
        disable() {
            console.log(_('%s is now disabled.').format(this.uuid));
        }
    }
    

Last but not least, you can always find friendly people on Matrix and Discourse who will be happy to help with any porting issues.

Summary
  • Moving from GJS’s custom import system to industry-standard ECMAScript modules will cause every extension to break. The move does mean we are following proper standards rather than home-grown ones, allowing greater compatibility with the wider JavaScript ecosystem.
  • Legacy imports are still supported on extensions.gnome.org, but you will need to upload separate pre- and post-GNOME 45 versions in order to support both LTS and regular distributions.

For GNOME Extension Developers:
There is an active extension community on Matrix and Discourse who can help you quickly port to the new import system.

You can test your extensions by downloading the latest GNOME OS and trying your extension there.

To the GNOME Community:
Please file bugs against your favorite extensions or have a friendly conversation with your extension developers so that we can help minimize the impact of this change. Ideally, you could help with the port and provide a pull or merge request to help maintainers.

GNOME Shell styling changes: A PSA for theme authors

TL;DR: The gnome-shell-sass repository is no longer getting updated upstream.

Background

As gnome-shell’s CSS grew more complex, designers needed something more expressive, so they started compiling the stylesheet from SASS. The sources were moved to a subproject, so they could be shared between the regular stylesheet and the GNOME Classic stylesheet.

Fast-forward to the present day, gnome-shell now includes a light variant to support the global color-scheme option.

GNOME Shell in light style

GNOME Classic has been updated to use that built-in support instead of compiling a separate stylesheet on its own.

That means that the gnome-shell-sass repository is no longer needed by the gnome-shell-extensions project, and will therefore no longer be updated upstream.

If you are building your own theme from those sources, you will either have to get them from the gnome-shell repository yourself, or coordinate with other 3rd party theme authors to keep the subproject updated.

Vivid colors in Brno

Co-authored by Sebastian Wick & Jonas Ådahl.

From April 24 to 26, Red Hat invited people working on compositors and display drivers to come together to collaborate on bringing the Linux graphics stack to the next level. There were three high-level topics that were discussed at length: Color Management, High Dynamic Range (HDR), and Variable Refresh Rate (VRR). This post will go through the discussions that took place, and the occasional rough consensus reached among the people who attended.

The event itself aimed to be as inclusive and engaging as possible, meaning participants could attend either in person, at the Red Hat office in Brno, Czech Republic, or remotely via a video link. The format of the event was structured to give remote and physical attendees an equal opportunity to participate in discussions. While the hallway track can be a great way to collaborate, discussions accessible remotely were prioritized by having two available rooms with their own video links.

This meant that if the main room wanted to continue on the same topic, while some wanted to do a breakout session, they could go to the other room, and anyone attending remotely could tag along by connecting to the other video link. In the end, the break out room became the room where people collaborated on various things in a less structured manner, leaving the main room to cover the main topics. A reason for this is that the microphones in both rooms were a bit too good, effectively catching any conversation anyone had anywhere in the room. Making one of the rooms a bit more chaotic, while the other focused, also allowed for both ways of collaborating.

For the kernel side, people working on AMD, Intel and NVIDIA drivers were among the attendees, and for user space there was representation from gamescope, GNOME, KDE, smithay, Wayland, weston and wlroots. Some of those people are community contributors and some of them were attending on behalf of Red Hat, Canonical, System76, sourcehut, Collabora, Blue Systems, Igalia, AMD, Intel, Google, and NVIDIA. We had a lot of productive discussion, ending up in total with a 20 (!) page document of notes.

Discussion with remote attendees during the hackfest

Color management & HDR

Wayland

Color management in the Linux graphics stack is shifting in the way it is implemented, away from the style used in X11, where the display server (X.org) takes a hands-off approach and the end result is dependent on individual client capabilities, to an architecture where the Wayland display server takes an active role in ensuring that all clients, be they color aware or not, show up on screen correctly.

Pekka Paalanen and Sebastian Wick gave a summary of the current state of digital color on Linux and Wayland. For full details, see the Color and HDR documentation repository.

They described the in-development color-representation and color-management Wayland protocols. The color-representation protocol lets clients describe the way color channels are encoded, and the color-management protocol lets clients describe the color channels’ meaning, together completely describing the appearance of surfaces. The latter also gives clients information about the target monitor’s capabilities, so they can optimize their content to minimize the color transformations needed in the compositor.

Another key aspect of the Wayland color protocols in development is that compositors will be able to choose what they want to support. This makes it possible, for example, to implement HDR without involving ICC workflows.

There is already a broad consensus that this type of active color management aligns with the Wayland philosophy and while work is needed in compositors and client toolkits alike, the protocols in question are ready for prototyping and review from the wider community.

Colors in kernel drivers & compositors

There are two parts to HDR and color management for compositors. The first is creating content from different SDR and HDR sources using color transformations. The second is signaling the monitor to enter the desired mode. Given the current state of kernel API capabilities, compositors are in general required to handle all of their color transformations using shaders during composition. For the short term we will focus on removing the last blockers for HDR signaling, and in the long term on making it possible to offload color space conversions to the display hardware, which should ideally make it possible to power down the GPU while playing e.g. a movie.

Short term

Entering HDR mode is done by setting the colorimetry (KMS Colorspace property) and overriding the transfer characteristics (KMS HDR_OUTPUT_METADATA property).

Unfortunately the design of the Colorspace property does not mix well with the current broader KMS design where the output format is an implementation detail of the driver. We’re going to tweak the behavior of the Colorspace property such that it doesn’t directly control the InfoFrame but lets the driver choose the correct variant and transparently convert to YCC using the correct matrix if required. This should allow AMD to support HDR signaling upstream as well.

The HDR_OUTPUT_METADATA property is a bit weird as well and should be documented. Changing it might require a mode set, and changing the transfer characteristics part of the blob will make monitors glitch, while changing other parameters must not require a mode set and must not glitch.

Both landing support upstream for the AMD driver, and improvements to the documentation should happen soon, enabling proper upstream HDR signaling.

Vendor specific uAPI for color pipelines

Recently a proposal for adding vendor-specific properties for exposing hardware color pipelines via KMS was posted, and while it is great to see work being done to improve the situation in the Linux kernel, there are concerns that this opens the door to per-vendor APIs that compositors end up having to implement, effectively reintroducing per-vendor GPU drivers in user space outside of Mesa.

Still, upstream support in the kernel has its upsides, as it for example makes it much easier to experiment. A way forward discussed is to propose that vendor-specific color pipeline properties be handled with care, by requiring them to be clearly documented as experimental, and disabled by default behind both a build configuration option and an off-by-default module parameter.

A proposal for this will be sent by Harry Wentland to the relevant kernel mailing lists.

Color pipelines in KMS

Long term, KMS should support color pipelines without any experimental flags, and there is a wide agreement that it should be done with a vendor agnostic API. To achieve this, a proposal was discussed at length, but to summarize it, the goal is to introduce a new KMS object for color operations. A color operation object exposes a low level mathematical function (e.g. Matrix multiplication, 1D or 3D look up tables) and a link to the next operation. To declare a color pipeline, drivers construct a linked list of these operations, for example 1D LUT → Matrix → 1D LUT to describe the current DEGAMMA_LUT → CTM → GAMMA_LUT KMS pipeline.
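
As a purely conceptual sketch (the real proposal is KMS uAPI, and the operation names here are made up for illustration), such a pipeline is a linked list along these lines:

    // Each color operation exposes a mathematical function and a
    // link to the next operation; the driver advertises the head of
    // the list as the plane's color pipeline.
    const gamma   = {op: '1D LUT', values: new Float32Array(1024), next: null};
    const ctm     = {op: '3x3 matrix', values: [1, 0, 0, 0, 1, 0, 0, 0, 1], next: gamma};
    const degamma = {op: '1D LUT', values: new Float32Array(1024), next: ctm};
    // degamma → ctm → gamma mirrors the current
    // DEGAMMA_LUT → CTM → GAMMA_LUT KMS pipeline.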

The discussions primarily focused on per plane color pipelines for the pre-blending stage, but the same concept should be reusable for the post blending stage on the CRTC.

Eventually this work should also make it possible to cleanly separate KMS properties which change the colors (i.e. color operations) from properties changing the mode and signaling to sinks, such as Broadcast RGB, Colorspace, max_bpc.

It was also agreed that user space needs more control over the output format, i.e. what is transmitted over the wire. Right now this is a driver implementation detail and chosen such that the bandwidth requirements of the selected mode will be satisfied. In particular making it possible to turn off YCC subsampling, specifying the minimum bit depth and specifying the compression strength for DCC seems to have consensus.

There are a lot more details that handle all the quirks that hardware may have. For more details and further discussion about the color pipeline proposal, head over to the RFC that Simon Ser just sent to the relevant mailing lists.

Testing & VKMS

Testability of color pipelines and KMS in general was a topic that was brought up as well, with two areas of interest: testing compositors and the generic DRM layer in the kernel using VKMS, and testing actual kernel drivers.

The state of VKMS is to some degree problematic; it currently lacks a large enough pool of established contributors who can take on maintainership responsibilities, i.e. reviewing and landing code, but at the same time there is an urge to make it a more central part of GPU driver development in general, where it can take a more active role in ensuring cross-driver conformance. Ways to create more incentive for both kernel developers and compositor developers to help out were discussed, and while the ability to test compositors is a relatively good incentive, an idea discussed was to require new DRM properties to always get a VKMS implementation in order to land. This is, however, not easy, since a significant amount of bootstrapping is needed to make that viable. Some ideas were thrown around, and hopefully something will come out of it; keep an eye on the relevant mailing lists for something related to this area.

For testing actual drivers, the usage of Chamelium was discussed, and while everyone agreed it’s something that is definitely nice to have, it takes a significant amount of resources to maintain wired up CI runners for the community to rely on. Ideally a setup that can be shared across the different compositors and GPU drivers would be great, but it’s a significant task to handle.

Variable Refresh Rate

Smoothing out refresh rate changes

Variable Refresh Rate monitors driven at a certain mode have a minimum and maximum refresh cycle duration and the actual duration can be chosen for every refresh cycle. One problem with most existing VRR monitors however is that when the refresh duration changes too quickly, they tend to produce visible glitches. They appear as brightness changes for a fraction of a second and can be very jarring. To avoid them, each refresh cycle must change the duration only up to some fixed amount. The amount however varies between monitors, with some having no restriction at all.

A VESA certification program is currently being rolled out, aiming to certify monitors where any change in the refresh cycle duration does not result in glitches. For all other monitors, the increase and decrease in duration which does not result in glitches is unknown unless provided by optional EDID/DisplayID data blocks.

Driving monitors glitch-free without machine-readable information therefore requires another approach. One idea is to make the limits configurable. Requiring all users to tweak and fiddle to make things work well enough, however, is not very user friendly, so another idea that was discussed is to maintain a database, similar to the one used by libinput but in libdisplay-info, that contains the required information about monitors even when no such information is made available by the vendor.

With all of the required information, the smoothing of refresh rate changes still needs to happen somewhere. It was debated whether this should be handled transparently by the kernel, or whether it should be completely up to user space. There are pros and cons to both, for example better timing ability in the kernel, but less black-box magic if handled by user space. In the end, the conclusion was for user space components (i.e. compositors) to handle this themselves first, and then reconsider at some point in the future whether that is enough or whether new kernel uAPI is needed.

Low Framerate Compensation

The refresh rates that a VRR monitor can achieve typically do not cover a number of commonly used low frame rates, such as 30, 25, or 24 Hz. To still be able to show such content without stutter, the display can be driven at a multiple of the target frame rate, presenting new content on every n-th refresh cycle (for example, 24 Hz content on a 48–120 Hz panel can be driven at 72 Hz, with new content every third cycle).

Right now this Low Framerate Compensation (LFC) feature is built into the kernel driver, and when VRR is enabled, user space can transparently present content at refresh rates even lower than what the display supports. While this seems like a good idea, there are problems with this approach. For example the cursor can only be updated when there is a content update, making it very sluggish because of the low rate of content updates even though the screen refreshes multiple times. This either requires a special KMS commit which does not result in an immediate page flip but ends up on the refresh cycles inserted by LFC, or implementing LFC in user space instead. Like with the refresh rate change smoothing talked about earlier, moving LFC to user space might be possible but also might require some help from the kernel to be able to time page flips well enough.

Wayland

For VRR to work, applications need to provide content updates on a surface in a semi-regular interval. GUI applications for example often only draw when something changed which makes the updates irregular, driving VRR to its minimum refresh rate until e.g. an animation is playing and VRR is ramping up the refresh rate over multiple refresh cycles. This results in choppy mouse cursor movements and animations for some time. GUI applications sometimes do provide semi-regular updates, e.g. during animations or video playback. Some applications, like games, always provide semi-regular updates.

Currently there is no¹ Wayland protocol letting applications advertise that a surface works with VRR at a given moment, or at all. There is also no way for a compositor to automatically determine whether an app or a surface is suitable for VRR. For Wayland-native applications a protocol to communicate this information could be created, but there are a lot of applications out there which would work fine with VRR but will not get updated to support it.

Maintaining a database similar to the one mentioned above, but for applications, was discussed, but there is no clear winner for how to do so, or where to store the data. Maintaining a list is cumbersome, and complicates the ability for applications to work with VRR on release, or on distributions with out-of-date databases. Another idea was a desktop file entry stating support, but this too has its downsides. All in all, there is no clear path forward for how to enable VRR for applications transparently without causing issues.

¹ Except for a protocol proposal.

Wrap-up

Brno, Czech Republic

The hackfest was a huge success! Not only was this a good opportunity to get everyone up to speed and learn about what everyone is doing, having people with different backgrounds in the discussions made it possible to discuss problems, ideas and solutions spanning all the way from clients over compositors, to drivers and hardware. Especially on the color and HDR topics we came up with good, actionable consensus and a clear path to where we want to go. For VRR we managed to pin-point the remaining issues and know which parts require more experimentation.

For GNOME, color management, HDR, and VRR are all topics that are being actively worked on, and the future is both bright and dynamic, not only when it comes to luminance and color intensity, but also when it comes to the rate at which monitors present all these intense colors.

Dor Askayo, who has been working on bringing VRR to GNOME, attended part of the hackfest, and together we can hopefully bring experimental VRR to GNOME soon. There will be more work needed to iron out the overall experience, as covered above, but getting the fundamental building blocks in place is a critical first step.

For HDR, work has been going on to attach color state information to the scene graph, and at the hackfest Georges Basile Stavracas, Sebastian Wick, and Jonas Ådahl sat down and sketched out a new Clutter rendering API that aims to replace the current Clutter paint nodes API used in Mutter and GNOME Shell, and will make color transformations a first-class citizen. We will initially focus on using shaders for everything, but down the road the goal is to utilize the future color pipeline KMS uAPI for both performance and power consumption improvements.

We’d like to thank Red Hat for organizing and hosting the hackfest and for allowing us to work on these interesting topics, Red Hat and Collabora for sponsoring food and refreshments, and especially Carlos Soriano Sanchez and Tomas Popela for actually doing all the work making the event happen. It was great. Also thanks to Jakub Steiner for the illustration, and Carlos Soriano Sanchez for the photo from the hackfest.

For another great hackfest write-up, head over to Simon Ser’s blog post.

Ensuring steady frame rates with GPU-intensive clients

On Wayland, a surface is the basic primitive used to build what users refer to as a “window”. Wayland clients define their contents by attaching buffers to surfaces. This turns the contents of the buffer into the current surface contents. Wayland clients are free to attach a new buffer to a surface anytime. When a Wayland compositor like Mutter starts working on a new output frame, it picks the latest available buffer for each visible surface. This is called “mailbox semantics” (the buffers are metaphorical letters falling into a mailbox, the visible “letter” is the last one on top).

Problem

With hardware accelerated drawing, a client normally attaches a new buffer to a surface right after it finished calling OpenGL/Vulkan/<insert your favourite drawing API> APIs to define the contents of the buffer. When the compositor processes the protocol requests attaching the buffer to the surface, the GPU generally hasn’t finished drawing to the buffer yet.

Since the contents of the compositor’s output frame depend on the contents of each visible surface, the former cannot complete before the GPU finishes drawing to each of the picked surface buffers (and subsequently to the compositor’s own output buffer, in the general case).

If the GPU does not finish drawing in time for the next display refresh cycle, the compositor’s output frame misses that cycle and is delayed by at least the duration of one refresh cycle. This can be noticeable as judder/stutter, because the compositor’s frame rate is reduced, and the contents of some frames are not consistent with the timing when they become visible.

The likelihood of that happening depends largely on the clients, mainly on how long it takes the GPU to draw their buffer contents and how much time lies between when a client starts drawing to its buffer and when the compositor starts working on its resulting output frame.

In summary, a Wayland compositor can miss a display refresh cycle because the GPU failed to finish drawing to a client buffer in time.

This diagram visualizes a normal and problematic case:

Left side: normal case, right side: problematic case

Solution

Basic idea

The basic idea is simple: the compositor considers a client buffer “available” per the mailbox semantics only once the GPU finishes drawing to it. Until then, it picks the previously available buffer.

Complications

Now if it were as simple as that might sound, there would be no need to write a >1000-word article about it. 🙂

The main thing that makes this more complicated is that, together with attaching a new buffer, various other surface states can be modified in the same commit. All state changes in the same commit must be applied atomically, i.e. the user must see either all or none of them (per Wayland’s “every frame is perfect” motto). For example, there are various states which affect how a Wayland surface is scaled for display. Attaching a new buffer and changing the scaling state in the same commit ensures that the surface always appears consistently. If the buffer size and scaling state were to change independently, the surface might intermittently appear in the wrong size.

As if that wasn’t complicated enough, Wayland has so-called synchronized sub-surfaces. State changes for a synchronized sub-surface are not applied immediately, but only the next time any state changes are applied for its parent surface. Conceptually, one can think of the committed sub-surface state becoming part of the parent surface’s state commit. Again, all state combined like this between sub-surfaces (which can be nested, i.e. a sub-surface can be the parent of another sub-surface) and their parents must be applied atomically, all or nothing, to ensure that sub-surfaces and their parents always appear consistently as a whole.

This means that the compositor cannot simply wait for the GPU to finish drawing to client buffers, while applying other corresponding surface state immediately. It needs to stage the committed state changes somehow, and actually apply them only once the GPU has finished drawing to all new buffers attached in the same combined state commit.

Enter transactions

The idea for “stage somehow” is to introduce the concept of a transaction, which combines a set of state changes for one or multiple (sub-)surfaces. When a client commits a set of state changes for a surface, they are inserted into an appropriate transaction; either a new one or an existing one, depending on circumstances.

When the committed state changes should get applied per Wayland protocol semantics, the transaction is committed and inserted into a queue of committed transactions. The queue is ordered such that, for any given surface, state commits are applied in the same order as they were committed by the client. This ensures that the contents of a surface never appear to “move backwards” because one transaction affecting the surface managed to “overtake” another one.

A transaction is considered ready to be applied only once both of these conditions are true:

  1. It’s the oldest (closest to the queue head) transaction in the queue for all surfaces it carries state for.
  2. The GPU has finished drawing to all client buffers attached in the transaction.

Once both of these conditions are true, the transaction is applied atomically. From that point on, the compositor uses the state in the transaction for its output frames.
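
In pseudocode (with hypothetical names; the real implementation is C code in Mutter), the readiness check amounts to something like:

    // A transaction may be applied only when it is the oldest queued
    // transaction for every surface it touches, and the GPU has
    // finished drawing to every buffer it attaches.
    function transactionIsReady(transaction, queue) {
        const isOldestEverywhere = transaction.surfaces.every(
            surface => queue.oldestTransactionFor(surface) === transaction);
        const buffersIdle = transaction.buffers.every(
            buffer => buffer.gpuDone);
        return isOldestEverywhere && buffersIdle;
    }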

Results

I implemented the solution described above in Mutter merge request !1880, which was merged for the GNOME 44 release. While it went under the radar of news outlets, I hope that many of you will notice the benefits!

One situation where the benefits of transactions can be noticed is with interactive OpenGL applications such as games, with “vsync” disabled (e.g. for better input → output latency): you should be less likely to see stuttering due to Mutter missing a display refresh cycle, in particular in fullscreen and if Mutter can use direct scanout of client buffers.

If the GPU & drivers support true high priority EGL contexts which can preempt lower priority ones (as of this writing, this is true e.g. with “not too old” Intel GPUs), Mutter can now sustain full frame rate even if clients are GPU-bound to lower frame rates, as demonstrated in this video:

Even if the GPU & drivers do not support this, Mutter should now get bogged down less by such heavy clients; in particular, the mouse cursor should remain smoother.

It’s effective for X clients running via Xwayland as well, not only for native Wayland clients.

Long term, all major Wayland compositors will want to do something like this. gamescope already does.

Thanks

It took almost two years (on and off, not full-time) from having the initial idea, deciding to try implementing it myself, until finally getting it ready to be merged. I wasn’t very familiar with the Mutter code or Wayland protocol semantics when I started, so I couldn’t have done it without a lot of help from many Mutter and Wayland developers. I am deeply grateful to all of you.

Thanks to Jakub Steiner for the featured image and to Niels De Graef for the diagram of this post.

I would also like to thank Red Hat for giving me the opportunity to work on this, even though “Mutter developer” isn’t really a central part of my job description.

Automated testing of GNOME Shell

Automated testing is important to ensure software continues to behave as it is intended and it’s part of more or less all modern software projects, including GNOME Shell and many of the dependencies it builds upon. However, as with most testing, we can always do better to get more complete testing. In this post, we’ll dive into how we recently improved testing in GNOME Shell, and what this unlocks in terms of future testability.

Already existing testing

GNOME Shell already performs testing as part of its continuous integration (CI) pipeline, but tests have been limited to unit testing, i.e. testing selected components in isolation to ensure they behave as expected. Due to the nature of the functionality the Shell implements, the amount of testing one can do as unit testing is rather limited. In something like GNOME Shell, it is just as important to test how things behave when used in their natural environment: instead of testing specific functionality in isolation, the whole Shell instance needs to be executed with all bits and pieces running as a whole, as if it were a real session.

In other words, what we need is the ability to run all of GNOME Shell as if it were installed and logged into on a real system.

Test Everything

As discussed, to actually test enough things, we need to run all of GNOME Shell with all its features, as if it were a real session. This also means we don’t necessarily have the ability to set up actual test cases filled with asserts as one does with unit testing; instead we need mechanisms to verify the state of the compositor in a way that looks more like regular usage. Enter “perf tests”.

For many years, GNOME Shell has had automated performance tests that measure how well the Shell performs doing various tasks. Each test is a tiny JavaScript function that performs a few operations, while making sure the performed operations actually happened; when it finishes, the Shell instance is terminated. For example, a “perf test” could look like this:

  1. Open overview
  2. Open notifications
  3. Close notifications
  4. Leave overview
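
In code, such a script is roughly of the following shape; this is a sketch modeled on the Shell’s scripting helpers, not a verbatim copy of an existing test:

    import * as Main from 'resource:///org/gnome/shell/ui/main.js';
    import * as Scripting from 'resource:///org/gnome/shell/ui/scripting.js';

    export async function run() {
        // Enter and leave the overview, waiting for the compositor
        // to settle in between; the harness records metrics, and the
        // run fails if any warnings were logged along the way.
        Main.overview.show();
        await Scripting.waitLeisure();
        Main.overview.hide();
        await Scripting.waitLeisure();
    }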

As it turns out, this infrastructure fits rather neatly with the kind of testing we want to add here – tests that perform various tasks exercising user-facing functionality.

There are, however, more ways to verify that things behave as expected than triggering these operations and ensuring that they executed correctly. The most immediate next step is to ensure that no warnings are logged during the whole test run. This is useful in part because GNOME Shell is largely written in JavaScript, which means the APIs provided by lower-level components such as Mutter and GLib tend to have runtime input validation in introspected API entry points. Consequently, if an API is misused by some JavaScript code, it tends to result in warnings being logged. We can be more confident that a particular change won’t introduce regressions when GNOME Shell runs completely without warnings.

This, however, is easier said than done, for two main reasons: we’ll be running in a container, and there are complications that come with mixing the memory management models of different programming languages.

Running GNOME Shell in a container

For tests to be useful, they need to run in CI. Running in CI means running in a container, and that is not all that straightforward when it comes to compositors. The containerized environment is rather different from a regularly installed and set up Linux distribution; it lacks many services that are expected to be running and that provide important functionality needed to build a desktop environment, like service and session management (e.g. logging out), system management (e.g. rebooting), dealing with network connectivity, and so on.

Running with most of these services missing is possible, but results in many warnings, and a partially broken session. To get any useful testing done, we need to eliminate all of these warnings, without just silencing them. Enter service mocking.

Mocked D-Bus Services

In the world of testing, “mocking” involves creating an implementation of an API without the actual real-world implementation sitting behind it. Often these mocked services provide a limited pre-defined subset of functionality, for example hard-coding the results of API operations for a pre-defined set of possible input arguments. Sometimes a mocked API only needs to be there to make a service appear available, and nothing more is needed unless the functionality it provides must be actively triggered.

As part of CI testing in Mutter, the basic building blocks for mocking services needed to run a display server in CI have been implemented, but GNOME Shell needs many more compared to plain Mutter. As of this writing, in addition to the few APIs Mutter relies on, GNOME Shell also needs the following:

  • org.freedesktop.Accounts (accountsservice) – For e.g. the lock screen
  • org.freedesktop.UPower (upower) – E.g. battery status
  • org.freedesktop.NetworkManager (NetworkManager) – Manage internet
  • org.freedesktop.PolicyKit1 (polkit) – Act as a PolKit agent
  • net.hadess.PowerProfiles (power-profiles-daemon) – Power profiles management
  • org.gnome.DisplayManager (gdm) – Registering with GDM
  • org.freedesktop.impl.portal.PermissionStore (xdg-permission-store) – Permission checking
  • org.gnome.SessionManager (gnome-session) – Log out / Reboot / …
  • org.freedesktop.GeoClue2 (GeoClue) – Geolocation control
  • org.gnome.Shell.CalendarServer (gnome-shell-calendar-server) – Calendar integration

The mock services used by Mutter are implemented using python-dbusmock, and Mutter conveniently installs its own service mocking implementations. Building on top of this, we can easily continue mocking API after API until all the needed ones are provided.

As of now, between upstream python-dbusmock and GNOME Shell itself, there are mock implementations of all the mentioned services. All but one either already existed or needed only a trivial implementation; the exception was org.freedesktop.Accounts. In the future, for further testing that involves interacting with the system, e.g. configuring Wi-Fi, we will need to expand what these mocked API implementations can do, but for what we’re doing initially, it’s good enough.

Terminating GNOME Shell

Mixing JavaScript, a garbage-collected language, and C, with all its manual memory management, has its caveats, and this is especially true during tear down. In the past, the Mutter context was terminated first, followed by the JavaScript context. Terminating the JavaScript context last prevented Clutter and Mutter objects from being destroyed, as JavaScript could still hold references to them. If you ever wondered why there tend to be warnings in the journal when logging out, this is why. All of these warnings and potential crashes mean any test that relies on zero warnings would fail. We can’t have that!

To improve this situation, we had to shuffle things around a bit. In rough terms, we now terminate the JavaScript context first, ensuring there are no references held by JavaScript, before tearing down the backend and the Mutter context. Making this possible without introducing even more issues meant tearing down the whole UI tree on shutdown, so that the actual JavaScript context disposal more or less only involves cleaning up defunct JavaScript objects.

In the past this was complicated too, since not all components can easily handle bits and pieces of the Shell getting destroyed in a rather arbitrary order: it means signals get emitted when they are not expected, e.g. when parts of the Shell that were assumed to still exist have already been cleaned up. A while ago, a new door opened that makes this rather convenient to handle: enter the signal tracker, a helper that automatically disconnects signal handlers on shutdown.
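In GNOME Shell’s JavaScript this is exposed as the connectObject()/disconnectObject() convenience API. Roughly (a sketch; the handler method names here are made up), usage looks like this:

    // Handlers are registered against an owner object instead of
    // being tracked manually with connect()/disconnect():
    global.display.connectObject(
        'notify::focus-window', () => this._onFocusWindowChanged(),
        'in-fullscreen-changed', () => this._onFullscreenChanged(),
        this);

    // One call cleans up every handler registered with that owner,
    // e.g. on shutdown; for actors this also happens automatically
    // when the owner is destroyed:
    global.display.disconnectObject(this);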

With the signal tracker in place and in use, a few smaller final fixes, and the aforementioned reversal of the order in which the JavaScript context and the Mutter bits are torn down, we can now terminate without any warnings being logged.

And as a result, the tests pass!

Enabled in CI

Right now we’re running the “basic” perf test on each merge request in GNOME Shell. It performs some basic operations, including opening the quick settings menu, handling an incoming notification, and opening the overview and the application grid. A screen recording of what it does can be seen below.

What’s Next

More Tests

Testing more functionality than basic.js covers. There are some more existing “perf tests” that could potentially be used, but tests aiming at specific functionality unrelated to performance, for example window management or configuring the Wi-Fi, don’t really exist yet. This will become easier after the port to standard JavaScript modules, when tests no longer have to be included in the gnome-shell binary itself.

Input Events

So far, widgets are triggered programmatically. Using input events via virtual input devices means we get more fully tested code paths. Better test infrastructure for things related to input is being worked on for Mutter, and can hopefully be reused in GNOME Shell’s tests.
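As an illustration of what that could look like, Clutter already provides virtual input devices; a test could drive the pointer through the real input pipeline roughly like this (a sketch; the coordinates are arbitrary and this assumes it runs inside the shell’s test environment):

    import Clutter from 'gi://Clutter';
    import GLib from 'gi://GLib';

    const seat = Clutter.get_default_backend().get_default_seat();
    const pointer = seat.create_virtual_device(
        Clutter.InputDeviceType.POINTER_DEVICE);

    // Move over the target widget and click it, exercising picking,
    // event delivery and the widget's handlers along the way
    const now = GLib.get_monotonic_time();
    pointer.notify_absolute_motion(now, 400, 300);
    pointer.notify_button(now, Clutter.BUTTON_PRIMARY,
        Clutter.ButtonState.PRESSED);
    pointer.notify_button(now, Clutter.BUTTON_PRIMARY,
        Clutter.ButtonState.RELEASED);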

Running tests from Mutter’s CI

GNOME Shell provides a decent sanity test for Clutter, Mutter’s compositing library, so ensuring that it runs successfully and without warnings is useful to make sure changes there don’t introduce regressions.

Screenshot-based Tests

Using so-called reference screenshots, tests will be able to ensure there are no unintended visual changes. The basic infrastructure exists in Mutter and can be exposed to GNOME Shell, but we probably need to store the reference images somewhere other than in-tree, as is done in Mutter, to keep the gnome-shell git repository from growing out of hand.

Multi-monitor

Currently the tests use a single fixed-resolution virtual monitor, but this should be expanded to cover multi-monitor setups and hotplugging. Mutter has ways to create virtual monitors, but does not yet export this via an API GNOME Shell can consume.

GNOME Shell Extensions

It’s not only GNOME Shell itself that needs testing; running tests specifically for extensions, or running GNOME Shell’s own tests as part of testing extensions, would have benefits as well.

GNOME Shell on mobile: An update

It’s been a while since the last update on GNOME Shell mobile, but there’s been a huge amount of progress during that time, which culminated in a very successful demo at the Prototype Fund Demo Day last week.

The current state of the project is that we have branches with all the individual patches for GNOME Shell and Mutter, which together comprise a pretty complete mobile shell experience. This includes all the basics we set out to cover during the Prototype Fund project (navigation gestures, screen size detection, app grid, on-screen keyboard, etc.) and some additional things we ended up adding along the way.

The heart of the mobile shell experience is the sophisticated 2D gesture navigation: The gestures to go to the overview vertically and switch horizontally between apps are fluid, interruptible, and multi-dimensional. This allows for navigation which is not only quick and ergonomic, but also intuitive thanks to an incredibly simple spatial model.

While the overall gesture paradigm we use is quite similar to what iOS and Android have, there’s one important difference: We have a single overview for both launching and switching, instead of two separate screens on iOS (home screen and multitasking) and three separate screens on Android (home screen, app drawer, multitasking).

This allows us to avoid the awkward “swipe, stop, and wait” gesture to go to multitasking that other systems rely on, as well as the confusing spatial model, where apps live both within the app icon and next to the home screen, and sometimes show up from the left when swiping… up?

Our overview is always a single swipe away, and allows instant access to both open apps and the app grid, without having to choose between launching and switching.

In case you’re wondering where the “overview” state with just the multitasking cards (like we had in previous iterations) went: after some experimentation and informal user research we realized that it wasn’t really adding any value over the row of thumbnails in the app grid state. The smaller thumbnails are more than large enough to interact with, and more useful because you can see more of them at the same time.

We ported the shell search experience to a single-column layout for the narrower screen, which coincidentally is a direction we’re also exploring for the desktop search layout.

We completely replaced the on-screen keyboard gesture input, applying several tricks that OSKs on other mobile OSes employ, e.g. releasing the currently pressed key when another one is pressed. The heuristics for when the keyboard shows up are a lot more intuitive now and more in line with other mobile OSes.

The keyboard layout was adapted to the narrower size and the emoji keyboard got a redesign. There’s also a very fancy new gesture for hiding the keyboard, and it automatically hides when scrolling the view.

The app grid layout was adapted to portrait sizes, including a new style for folders and lots of spacing and padding tweaks to make it work well for the phone use case. All the advanced re-ordering and organizing features the app grid already had before are of course available.

Luckily for us, Florian independently implemented the new Quick Settings this cycle. These work great on the phone layout, but on top of that we also added notifications to that same menu, to get a unified system menu you can open with a swipe from the top. This is not as mature as other parts of the mobile shell yet and needs further work, which we’ll hopefully get to soon as part of the planned notifications overhaul.

One interesting new feature here is that notifications can be swiped away horizontally to close, and notification bubbles can be swiped up to hide them.

Next steps

From a development perspective the next steps are primarily upstreaming all of the work done so far, starting with the new gesture API, which is used by many different parts of the mobile shell and will bring huge improvements to gestures on desktop as well. This upstreaming effort is going to require many separate merge requests that depend on each other, and will likely take most of the 44 cycle.

Beyond upstreaming what already exists there are many additional things we want or need to work on to make the mobile experience really awesome, including:

  • Calls on the lock screen (i.e. an API for apps to draw over the lock screen)
  • Emergency calls
  • Haptic feedback
  • PIN Unlock
  • Adapt terminal keyboard layout for mobile, more custom keyboard layouts e.g. for URLs
  • Notifications revamp, including grouping and better actions
  • Flashlight quick settings toggle
  • Workspace reordering in the overview

There are also a few rough edges visually which need lower-level changes to fix:

  • Rounded thumbnails in the overview
  • Transparent panel
  • A way for apps to draw behind the top and bottom bars and the keyboard (to allow for glitch-free keyboard showing/hiding)

Help with any of the above would be highly appreciated!

How to try it

In addition to further development work there’s also the question of getting testing images. While the current version is definitely still work in progress, it’s quite usable overall, so we feel it would make sense to start having experimental GNOME OS Nightly images with it. There’s also postmarketOS, who are working to add builds of the mobile shell to their repositories.

The hardware question

The main question we’re being asked by everyone is “What device do I have to get to start using this?”, which at this stage is especially important for development. Unfortunately there’s not a great answer to this right now.

So far we’ve used a Pinephone Pro sponsored by the GNOME Foundation for testing, but unfortunately its hardware enablement is nowhere near ready and it’s unclear when it will be.

The original Pinephone is much further along in hardware enablement, but the hardware is too weak to be realistically usable. The Librem 5 is probably the best option in both hardware support and performance, but it still takes a long time to ship. There are a number of Android phones that sort of work, but there unfortunately isn’t one that’s fully mainlined, performant enough, and easy to buy.

Thanks to the Prototype Fund

All of this work was possible thanks to the Prototype Fund, a grant program supporting public interest software by the German Ministry of Education (BMBF).

Towards GNOME Shell on mobile

As part of the design process for what ended up becoming GNOME 40 the design team worked on a number of experimental concepts, a few of which were aimed at better support for tablets and other smaller devices. Ever since then, some of us have been thinking about what it would take to fully port GNOME Shell to a phone form factor.

GNOME Shell mockup from 2020, showing a tiling-first tablet shell overview and two phone-sized screens
Concepts from early 2020, based on the discussions at the hackfest in The Hague

It’s an intriguing question because post-GNOME 40, there’s not that much missing for GNOME Shell to work on phones, even if not perfectly. A few of the most difficult pieces you need for a mobile shell are already in place today:

  • Fully customizable app grid with pagination, folders, and drag-and-drop re-ordering
  • “Stick-to-finger” horizontal workspace gestures, which are pretty close to what we’d want on mobile for switching apps
  • Swipe up gesture for navigating to overview and app grid, which is also pretty close to what we’d want on mobile

On top of that, many of the things we’re currently working towards for desktop are also relevant for mobile, including quick settings, the notifications redesign, and an improved on-screen keyboard.

Possible thanks to the Prototype Fund

Given all of this synergy, we felt this is a great moment to actually give mobile GNOME Shell a try. Thanks to the Prototype Fund, a grant program supporting public interest software by the German Ministry of Education (BMBF), we’ve been working on mobile support for GNOME Shell for the past few months.

Scope

We’re not expecting to complete every aspect of making GNOME Shell a daily driveable phone shell as part of this grant project. That would be a much larger effort because it would mean tackling things like calls on the lock screen, PIN code unlock, emergency calls, a flashlight quick toggle, and other small quality-of-life features.

However, we think the basics of navigating the shell, launching apps, searching, using the on-screen keyboard, etc. are doable in the context of this project, at least at a prototype stage.

Three phone-sized UI mockups, one showing the shell overview with multitasking cards, the second showing the app grid with tiny multitasking cards on top, and the third showing quick toggles with notifications below.
Mockups for some of the main GNOME Shell views on mobile (overview, app grid, system status area)

Of course, making a detailed roadmap for this kind of effort is hard and we will keep adjusting it as things progress and become more concrete, but these are the areas we plan to work on in roughly the order we want to do them:

  • New gesture API: Technical groundwork for the two-dimensional navigation gestures (done)
  • Screen size detection: A way to detect the shell is running on a phone and adjust certain parts of the UI (done)
  • Panel layout: Using the former, add a separate mobile panel layout, with a different top panel and a new bottom panel for gestures (in progress)
  • Workspaces and multitasking: Make every app a fullscreen “workspace” on mobile (in progress)
  • App Grid layout: Adapt the app grid to the phone portrait screen size, ideally as part of a larger effort to make the app grid work better at various resolutions (in progress)
  • On-screen keyboard: Add a narrow on-screen keyboard mode for mobile portrait
  • Quick settings: Implement the new quick settings designs

Current Progress

One of the main things we want to unlock with this project is the fully semantic two-dimensional navigation gestures we’ve been working towards since GNOME 40. This required reworking gesture recognition at a fairly basic level, which is why most of the work so far has been focused on unlocking this. We introduced a new gesture tracker and had to rewrite a fair amount of the input handling fundamentals in Clutter.

Designing a good API around this took a lot of iterations and there’s a lot of interesting details to get into, but we’ll cover that in a separate deep-dive blogpost about touch gesture recognition in the near future.

Based on the gesture tracking rework, we were able to implement two-dimensional gestures and to improve the experience on touchscreens quite a bit in general. For example, the on-screen keyboard now behaves a lot more like you’re used to from your smartphone.

Here’s a look at what this currently looks like on laptops (highly experimental, the second bar would only be visible on phones):

Some other things that already work or are in progress:

  • Detecting that we’re running on a phone, and disabling/adjusting UI elements based on that
  • A more compact app grid layout that can fit on a mobile portrait screen
  • A bottom bar that can act as a handle for gesture navigation; we’ll definitely need this for mobile, but it’s also a potentially interesting future direction for larger screens

Taken together, here’s what all of this looks like on actual phone hardware right now:

Most of this work is not merged into Mutter and GNOME Shell yet, but there are already a few open MRs in case you’d like to dive into the details:

Next Steps

There’s a lot of work ahead, but going forward progress will be faster and more visible because it will be work on the actual UI, rather than on internal APIs. Now that some of the basics are in place we’re also excited to do more testing and development on actual phone hardware, which is especially important for tweaking things like the on-screen keyboard.

Photo of the app grid on a Pinephone Pro leaning against a wood panel.
The current prototype running on a Pinephone Pro sponsored by the GNOME Foundation

An Eventful Instant

Artists, gamers, rejoice! GNOME Shell 42 will let applications handle input events at the full input device rate.

It’s a long story

Traditionally, GNOME Shell has been compressing pointer motion events so that their handling is synchronized to the monitor refresh rate. This means applications would typically see approximately 60 events per second (or 144 if you follow the trends).
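In pseudocode terms (with made-up helper names; Mutter’s real implementation is C and more involved), event compression boils down to something like this:

    let queuedMotion = null;

    // Called for every incoming input event
    function queueEvent(event) {
        if (event.type() === Clutter.EventType.MOTION) {
            // Overwrite any not-yet-delivered motion event: only the
            // most recent pointer position survives
            queuedMotion = event;
            return;
        }
        flushMotionEvent(); // preserve ordering around other events
        deliverEvent(event);
    }

    // Called once per frame, right before painting
    function flushMotionEvent() {
        if (queuedMotion)
            deliverEvent(queuedMotion);
        queuedMotion = null;
    }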

This trait, inherited from the early days of Clutter, was not just a shortcut: handling motion events implies looking up the actor that is beneath the pointer (mainly so we know which actor to send the event to), and that was an expensive enough operation that it made sense to do it as infrequently as possible. If you are a regular reader of this blog you might remember how this area got great improvements in the past.

But that alone is not enough: motion events can also end up being handled in JS land, and it is in the best interest of GNOME Shell (and people complaining about frame loss) that we don’t jump into the JavaScript machinery too often in the course of a frame. This, again, makes sense to keep to a minimum.

Who wants it different?

Applications typically don’t care much about motion events beyond keeping up with the frame rate. Others, however, rely on motion event data strongly enough that this event compression is suboptimal for them.

Some examples where sending input events at the device rate matters:

  • Applications that use device input for velocity/direction/acceleration calculations (e.g. a drawing app applying a brush effect) want as much granularity as possible; compressing events smooths out values and tampers with those calculations.
  • Applications that render more often than the frame rate (e.g. games with vsync off) may spend multiple frames without seeing a motion event. Many of those are also timing sensitive, and not only want as much granularity as possible, but also want the events to be delivered as fast as possible.

How crazy is crazy?

As mentioned, events are now sent at the input device rate, but… what rate is that? It starts at tens of events per second on cheap devices, goes up to a hundred or so on your regular laptop touchpad, and reaches the low hundreds on drawing tablets.

But enter the gamer: high-end gaming mice have an input frequency of 1000Hz, which means there are approximately 16 events per frame (in the typical case of a 60Hz display) that must get through to the application ASAP. This use case is significantly more demanding than the others, and not by a small margin.

A look under the hood

Having to look up the actor beneath the pointer 1000 times a second (16x as often) means it doesn’t suffice to avoid GPU-based picking in favor of SIMD operations; there has to be a very aggressive form of caching as well.

To keep the required calculations to a minimum, Mutter now caches a set of rectangles that approximates the visible, uncovered area of the actor beneath the pointer. These are in the same coordinate space as input events, so comparisons are direct. If the pointer moves outside the expressed region, or the cache is dropped by other means (e.g. a relayout), the actor is looked up again and the new area cached.
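Sketched out (again with hypothetical helper names; the real code lives in Mutter’s C internals), the idea is:

    let cachedActor = null;
    let cachedRects = []; // approximated visible, uncovered area

    function actorAt(x, y) {
        // Fast path: the pointer is still inside the cached region,
        // so we already know which actor is underneath
        if (cachedActor && cachedRects.some(r => rectContains(r, x, y)))
            return cachedActor;

        // Slow path: full scene pick, then rebuild the cache; a
        // relayout elsewhere would also invalidate cachedRects
        cachedActor = fullScenePick(x, y);
        cachedRects = computeUncoveredRects(cachedActor);
        return cachedActor;
    }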

This is of course most effective when the actors are big, with pointer picking virtually dropping to zero on e.g. fullscreen clients, but it helps even when blazing your pointer across many actors on the screen. Crossing a button from left to right can take a surprising number of events.

But what about JavaScript? Would it maybe trigger a thousand times a second? Absolutely not: events as handled within Clutter (and GNOME Shell actors) are still motion compressed. This unthrottled event delivery only applies in the direction of Wayland clients.

There were other areas that got indirectly stressed by the many additional events; a number of optimizations across the board ensure things don’t turn bad even when Mutter is doing so much more.

How does it feel?

This is something you’d have to check for yourself. But we can show you how this looks!

Green is good.

This is a quick and dirty test application that displays timing information about the received events. Some takeaways from this:

  • Due to human limitations, it is next to impossible to produce a steady 1000Hz input rate for a full second. Moving the mouse left and right wastes precious milliseconds decelerating to accelerate again; even drawing the most perfect circle is too slow to need one event per millisecond. The devices are capable of shorter 1000Hz bursts, though.
  • The separation between events (i.e. the time difference between the current and last events as received by the client) is predominantly sub-frame. There is only some larger separation when Mutter is busy putting images onscreen.
  • The event latency (time elapsed between emission by the hw/kernel and reception by the application) is <2ms in most cases. There are surely a few events that take longer to reach the application, but it is essentially noise.

Gamers (and other people that care about responsiveness) should notice this as “less janky”.

Didn’t drawing tablets have this before?

Yes and no. Mutter skipped motion compression altogether for drawing tablets, since the applications interested in those really preferred the extra events despite the drawbacks. With these changes in place, drawing tablet users will purely benefit from the improved performance.

Why so loooong

If you have been following GNOME Shell development, you might have heard about this change before. Why did it take so long to get this merged?

The showstopper was probably what you would suspect the least: applications that are not handling events. If an application is not reading events in time (because it is temporarily blocking the main loop, frozen, slow, stopped at a breakpoint, …), these events will queue up.

But this queue is not infinite; the client would eventually be shut down by the compositor. With these input devices, that could take a long… less than half a second. Clearly, there had to be a solution in place before we rolled this in.

There’s been some back and forth here, and several proposed solutions. The applied fix is robust, but unfortunately still temporary; a better solution is being proposed at the Wayland library level, but it’s unlikely to be ready before GNOME 42. In the meantime, users can happily shake their input devices without thinking about how many times a second is enough.

Until the next adventure!