Understanding GNOME Shell’s focus stealing prevention

Focus stealing prevention exists for two main reasons: One is security, since we need to prevent rogue apps from deceiving users into e.g. typing their password into another window. If apps can silently claim keyboard focus and open their own window over the currently focused one, this enables phishing and other similar attacks. The other is user experience: Even if an app isn’t maliciously taking over your focus, it can be annoying to have a new window popping up while you’re typing something and have half your sentence end up in the wrong app.

At the same time there are cases where you want apps to be able to request focus, for example when clicking a link in a chat app and wanting it to open in the browser. In this case you want the focus to move to the browser window.

This is why our compositor library mutter implements focus stealing prevention mechanisms, which allow the currently focused app to request that a specific other app be allowed to claim focus now.

<App> is ready??

Most users have probably seen an “<App> is ready” notification in GNOME Shell at some point. Unfortunately this notification doesn’t really explain why it’s being shown and what’s happening, which may cause confusion.

Because of this there have been proposals to disable focus stealing prevention until it works better (mutter issue 673), and a number of GNOME Shell extensions).

Screenshot of a GNOME Shell notification showing that Telegram Desktop Media viewer is ready

These are the main cases where the notification is shown:

  •  A new window is opened and either the launcher app, or the launched app doesn’t implement the XDG Activation protocol or the startup notification specification
  •  An app requests focus for one of its windows, but was not activated in a valid way (e.g. because it wasn’t started by a user action)
  • An app requests focus for a new window, but it’s slow to start and in the meantime there are additional user interactions. In this case we don’t want to interrupt, and show the notification so people can switch at their convenience.
  • An app is launched from an environment that isn’t able to use the XDG Activation protocol (e.g. a terminal)

The protocol responsible for this, XDG Activation, the Wayland equivalent to the X11-specific startup notification spec was introduced somewhat recently (2020), and needs to be adopted by UI toolkits. GNOME 46 and 47 saw a few fixes and the feature was polished both in the client toolkit side (GTK and xdg-desktop-portal, as well as in the compositor implementation mutter, but there are still cases where XDG activation isn’t hooked up properly.

How XDG activation works

Flow xdg activation protocol.
XDG activation flow for moving focus between two existing windows

The way the protocol works is that the currently focused app asks the compositor to create a token linked to the focused window (Wayland surface) and the most recent user interaction (an input event serial associated with a seat).

This token is then used by the app that should receive focus when it requests to be activated. In GNOME Shell, activation means that the the window receives focus and is placed on top of other windows. An activation token may still be rejected, for example if the window linked to the token doesn’t have focus or when the linked user interaction isn’t recent enough.

In addition to handling focus, GNOME Shell also tracks app launching. Until the new app window is actually shown, GNOME Shell uses a “loading spinner” mouse cursor to indicate to the user that the app is loading. If the app doesn’t implement the XDG Activation protocol, the loading indicator only disappears after a timeout because GNOME Shell doesn’t know that the application finished loading and has presented the target window.

The protocol doesn’t define how tokens are given to the target app. One reason for this is because it depends on how the app is started. The main options are:

  • Setting the XDG_ACTIVATION_TOKEN environment variable
  • D-Bus Activation using the platform-data field, which contains the activation token
  • XDG portals that will launch an app (e.g. the OpenURI or OpenFile portals)

The target app then needs to collect the token and use it to have its window activated to receive focus and to signal to the compositor that it started successfully.

Not smart enough

When I started looking into how our focus prevention mechanism works to investigate the issues mentioned above, I was initially pretty confused. There were a lot of cases where the focus window switch worked fine, but other times it wouldn’t. I realized quickly that with existing windows, the “<App> is ready” notification is shown, but new window would get focus immediately.

This struck me as odd: Why are new windows allowed to do whatever, but existing windows are restricted in the way they can take over focus?

I first thought this was some sort of bug, but then I discovered that the behavior was by design: Mutter has a gsettings property called focus-new-windows that controls the focus stealing prevention mechanism. This property can be strict or smart (the latter being the default).

  • smart means that in most cases new windows get focus (even without asking for it) and are raised to the top of the window stack
  • strict means they get focus (are “activated”, in technical terms) only when they are actually supposed to

The smart mode exists in part because there are some cases where our current focus prevention system does not work well. These issues include:

  • Launching apps via terminal (vte issue #2788). The main issue is that the terminal executing a command does not know whether that process will present a window or not. For example, if you launch vim there’s no new window, but if you launch firefox there is.
  • Launching apps via Run a Command in GNOME Shell (gnome-shell issue #7704) shares similar issues as running apps from the terminal
  • Apps launched via custom keyboard shortcut (e.g. set up in Settings > Keyboard > Keyboard Shortcuts)
  • The lack of implementation of the appropriate protocols in apps or toolkits

Because the cases where a new window is opened are a significant percentage of the overall cases where focus prevention is triggered, this smart mode is making it appear as though apps actually implement the XDG Activation protocol, even if they don’t. While it does somewhat reduce annoyance for users, it gives developers the false impression that they don’t have to do anything.

It also makes it harder to debug issues where something doesn’t work as expected or is missing the correct implementation. For example, even in GTK4 the focus transferring is broken in some cases and took a long time to be discovered (gtk issue #6711).

Security implications

Unfortunately the current situation with smart as the default means that we’re not getting most of the benefits of focus stealing prevention. Apps are able to spawn a new window over your current one and grab keyboard focus, because the smart mode just gives the new window focus, circumventing the safety measures. This is trivial to exploit by malicious apps: All they need to do is open a new window, and focus stealing prevention doesn’t apply.

Next steps

While some people have asked for focus stealing prevention to be disabled completely until it’s implemented by most apps and toolkits, I’m not sure this is the best way forward. If we did that, nobody would notice which apps don’t implement it, so there’d be no reason for toolkits to do so.

On the other hand, there are some remaining issues around terminal applications and similar use cases that we don’t have a plan for yet, so just switching to strict to flush out app bugs isn’t ideal either at the moment.

  • There is currently no consensus in the team as to how to proceed. The two main directions we could take are:
  • Switch to strict mode by default (mutter issue #3486) once a few remaining issues are resolved, perhaps with a “flag day” deadline so apps have time to implement it.
  • Slowly make the smart mode stricter over time.

Either way we need to raise more awareness of the issue to get app and toolkit developers interested in improving things in this area, which this blogpost is a part of 🙂

It’d also be helpful if more people (especially developers) turn on strict mode on their system, so we get more testing for which apps work and which don’t. This is the relevant gsetting:

gsettings set org.gnome.desktop.wm.preferences focus-new-windows 'strict'

Thanks

Thanks to the Sovereign Tech Fund for allowing me to take the time to properly work through this as part of my broader effort around improving notifications. Thanks also to Sonny Piers and Tobias Bernard for organizing the STF project, Florian Müllner, Sebastian Wick, Carlos Garnacho, and the rest of the GNOME Shell team for reviewing my MRs, and Jonas Dreßler and Jonas Ådahl for reviewing the blogpost.

Notifications in 46 and beyond

One of the things we’re tackling as part of the STF infrastructure initiative is improving notifications. Other platforms have advanced significantly in this area over the past decade, while we still have more or less the same notifications we had since the early GNOME 3 days, both in terms of API and feature set. There’s plenty to do here 🙂

The notification drawer on GNOME 45

Modern needs

As part of the effort to port GNOME Shell to mobile Jonas looked into the delta between what we currently support and what we’d need for a more modern notification experience. Some of these limitations are specific to GNOME’s implementation, while others are relevant to all desktops.

Tie notifications to apps

As of GNOME 45 there’s no clear identification on notification bubbles which app they were sent by. Sometimes it’s hard to tell where a notification is coming from, which can be annoying when managing notifications in Settings. This also has potential security implications, since the lack of identification makes it trivial to impersonate other apps.

We want all notifications to be clearly identified as coming from a specific app.

Global notification sounds

GNOME Shell can’t play notification sounds in all cases, depending on the API the app is using (see below). Apps not primarily targeting GNOME Shell directly tend to play sounds themselves because they can’t rely on the system always doing it (it’s an optional feature of the XDG Notification API which different desktops handle differently). This works, but it’s messy for app developers because it’s hard to test and they have to implement a fallback sound played by the app. From a user perspective it’s annoying that you can’t always tell where sounds are coming from because they’re not necessarily tied to a notification bubble. There’s also no central place to manage the notification behavior and it doesn’t respect Do Not Disturb.

Notification grouping

Currently all notifications are just added to a single chronological list, which gets messy very quickly. In order to limit the length of the list we only keep the latest 3 notifications for every app, so notifications can disappear before you have a chance to act on them.

Other platforms solve this by grouping notifications by app, or even by message thread, but we don’t have anything like this at the moment.

Notifications grouped by app on the iOS lock screen

Expand media support

Currently each notification bubble can only contain one (small) image. It’s mostly used for user avatars (for messages, emails, and the like), but sometimes also for actual content (e.g. a thumbnail for the image someone sent).

Ideally what we want is to be able to show larger images in addition to avatars, as the actual content of the notification.

As of GNOME 45 we only have a single slot for images on notifications, and it’s too small for actual content.
Other platforms have multiple slots (app icon, user avatar, and content image), and media can be expanded to much larger sizes.

There’s also currently no way to include descriptive text for images in notifications, so they are inaccessible to screen readers. This isn’t as big a deal with the current icons since they’re small and mostly used for ornamental purposes, but will be important when we add larger images in the body.

Updating notification content

It’s not possible for apps to update the content inside notifications they sent earlier. This is needed to show progress bars in notifications, or updating the text if a chat message was modified.

How do we get there?

Unfortunately, it turns out that improving notifications is not just a matter of standardizing a few new features and implementing them in GNOME Shell. The way notifications work today has grown organically over the years and the status quo is messy. There are three different APIs used by apps today: XDG Notification, Gio.Notification, and XDG Portal.

How different notification APIs are used today

XDG Notification

This is the Freedesktop specification for a DBus interface for apps to send notifications to the system. It’s the oldest notification API still in use. Other desktops mostly use this API, e.g. KDE’s KNotification implements this spec.

Somewhat confusingly, this standard has never actually been finalized and is still marked as a draft today, despite not having seen significant changes in the past decade.

Gio.Notification

This is an API in GLib/Gio to send notifications, so it’s only used by GTK apps. It abstracts over different OS notification APIs, primarily the XDG one mentioned above, a private GNOME Shell API, the portal API, and Cocoa (macOS).

The primary one being used is the private DBus interface with GNOME Shell. This API was introduced in the early GNOME 3 days because the XDG standard API was deemed too complicated and was missing some features (in particular notifications were not tied to a specific app).

When using Gio.Notification apps can’t know which backend is used, and how a notification will be displayed or behave. For example, notifications can only persist after the app is closed if the private GNOME Shell API is used. These differences are specific to GNOME Shell, since the private API is only implemented there.

XDG Portal

XDG portals are secure, standardized system APIs for the Linux desktop. They were introduced as part of the push for app sandboxing around Flatpak, but can (and should) be used by non-sandboxed apps as well.

The XDG notification portal was inspired by the private GNOME Shell API, with some additional features from the XDG API mixed in.

XDG portals consist of a frontend and a backend. In the case of the notification portal, apps talk to the frontend using the portal API, while the backend talks to the system notification API. Backends are specific to the desktop environment, e.g. GNOME or KDE. On GNOME, the backend uses the private GNOME Shell API when possible.

The plan

From the GNOME Shell side we have the XDG API (used by non-GNOME apps), and the private API (used via Gio.Notification by GNOME apps). From the app side we additionally have the XDG portal API. Neither of these can easily supersede the others, because they all have different feature sets and are widely used. This makes improving our notifications tricky, because it’s not obvious which of the APIs we should extend.

After several discussions over the past few months we now have consensus that it makes the most sense to invest in the XDG portal API. Portals are the future of system APIs on the free desktop, and enable app sandboxing. Neither of the other APIs can fill this role.

Our plan for notification APIs going forward: Focus on the portal API

This requires work in a number of different modules, including the XDG portal spec, the XDG portal backend for GNOME, GNOME Shell, and client libraries such as Gio.Notification (in GLib), libportal, libnotify, and ashpd.

In the XDG portal spec, we are adding support for a number of missing features:

  • Tying notifications to apps
  • Grouping by message thread
  • Larger images in the notification body
  • Special notifications for e.g. calls and alarms
  • Clearing up some instances of undefined behavior (e.g. markup in the body, playing sounds, whether to show notifications on the lock screen, etc.)

This is the draft XDG desktop portal proposal for the spec changes.

On the GNOME Shell side, these are the primary things we’re doing (some already done in 46):

  • Cleanups and refactoring to make the code easier to work on
  • Improve keyboard navigation and screen reader accessibility
  • Header with app name and icon
  • Show full notification body and buttons in the drawer
  • Larger notification icons (e.g. user avatars on chat notifications)
  • Group notifications from the same app as a stack
  • Allow message threads to be grouped in a single notification bubbles
  • Larger images in the notification body
Mockups of what we’d ideally want, including grouping by app, threading, etc.

There are also animated mockups for some of this, courtesy of Jakub Steiner.

The long-term goal is for apps to switch to the portal API and deprecate both of the others as application-facing APIs. Internally we will still need something to communicate between the portal backend and GNOME Shell, but this isn’t public API so we’re much more flexible here. We might expand either the XDG API or the private GNOME Shell protocol for this purpose, but it has not been decided yet how we’ll do this.

What we did in GNOME 46

When we started the STF project late last year we thought we could just pull the trigger on a draft proposal Jonas for an API with the new capabilities needed for mobile. However, as we started discussing things in more detail we realized that this was the the wrong place to start. GNOME Shell already didn’t implement a number of features that are in the XDG notification spec, so standardizing new features was not the main blocker.

The code around notifications in GNOME Shell has grown historically and has seen multiple major UI redesigns since GNOME 3.0. Additional complexity comes from the fact that we try to avoid breaking extensions, which means it’s difficult to e.g. change function names or signatures. Over time this has resulted in technical debt, such as weird anachronistic structures and names. It was also not using many of the more recent GJS features which didn’t exist yet when this code was written originally.

Anyone remember that notifications used to be on the bottom? This is what they looked like in GNOME 3.6 (2012).

As a first step we restructured and cleaned up legacy code, ported it to the most recent GJS features, updated the coding style, and so on. This unfortunately means extensions need to be updated, but it puts us on much firmer ground for the future.

With this out of the way we added the first batch of features from our list above, namely adding notification headers, expanding notifications in the drawer, larger icons, and some style fixes to icons. We also fixed a very annoying issue with “App is ready” notifications not working as expected when clicking a notification (!3198 and !3199).

We also worked on a few other things that didn’t make it in time for 46, most notably grouping notifications by app (which there’s a draft MR for), and additionally grouping them by thread (prototype only).

Throughout the cycle we also continued to discuss the portal spec, as mentioned above. There are MRs against against XDG desktop portal and the libportal client library implementing the spec changes. There’s also a draft implementation for the GTK portal backend.

Future work

With all the groundwork laid in GNOME 46 and the spec draft mostly ready we’re in a good position to continue iterating on notifications in 47 and beyond. In GNOME 47 we want to add some of the first newly spec’d features, in particular notification sounds, markup support in the body, and display hints (e.g. showing on the lock screen or not).

We also want to continue work on the UI to unlock even more improvements in the future. In particular, grouping by app will allow us to drop the “only keep 3 notifications per app” behavior and will generally make notifications easier to manage, e.g. allowing to dismiss all notifications from a given app. We’re also planning to work on improving keyboard navigation and ensuring all content is accessible to screen readers.

Due to the complex nature of the UI for grouping by app and the many moving parts with moving forward on the spec it’s unclear if we’ll be able to do more than this in the scope of STF and within the 47 cycle. This means that additional features that require the new spec and/or lots of UI work, such as grouping by thread and custom UI for call or alarm notifications will probably be 48+ material.

Conclusion

As we hope this post has illustrated, notifications are way more complex than they might appear. Improving them requires untangling decades of legacy stuff across many different components, coordinating with other projects, and engaging with standards bodies. That complexity has made this hard to work on for volunteers, and there has not been any recent corporate interest in the area, which is why it has been stagnant for some time.

The Sovereign Tech Fund investment has allowed us to take the time to properly work through the problem, clean up technical debt, and make a plan for the future. We hope to leverage this momentum over the coming releases, for a best-in-class notification experience on the free desktop. Stay tuned 🙂