Ensuring steady frame rates with GPU-intensive clients

On Wayland, a surface is the basic primitive used to build what users refer to as a “window”. Wayland clients define their contents by attaching buffers to surfaces. This turns the contents of the buffer into the current surface contents. Wayland clients are free to attach a new buffer to a surface at any time. When a Wayland compositor like Mutter starts working on a new output frame, it picks the latest available buffer for each visible surface. This is called “mailbox semantics” (the buffers are metaphorical letters falling into a mailbox; the visible “letter” is the last one, on top).
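As a toy illustration of mailbox semantics (the names below are made up for this post, not Mutter or Wayland API), a newly attached buffer simply supersedes any earlier one that hasn’t been displayed yet:

```python
class Surface:
    """Minimal mailbox: only the newest attached buffer matters."""

    def __init__(self):
        self.current = None

    def attach(self, buf):
        # A newer "letter" lands on top of the mailbox; any buffer
        # attached earlier in the same frame interval is superseded.
        self.current = buf

surface = Surface()
surface.attach("buffer A")
surface.attach("buffer B")            # attached again before the next frame
assert surface.current == "buffer B"  # the compositor picks the latest
```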

Problem

With hardware accelerated drawing, a client normally attaches a new buffer to a surface right after it has finished calling the OpenGL/Vulkan/<insert your favourite drawing API> APIs which define the contents of the buffer. When the compositor processes the protocol requests attaching the buffer to the surface, the GPU generally hasn’t finished drawing to the buffer yet.

Since the contents of the compositor’s output frame depend on the contents of each visible surface, the former cannot complete before the GPU finishes drawing to each of the picked surface buffers (and subsequently to the compositor’s own output buffer, in the general case).

If the GPU does not finish drawing in time for the next display refresh cycle, the compositor’s output frame misses that cycle and is delayed by at least the duration of one refresh cycle. This can be noticeable as judder/stutter, because the compositor’s frame rate is reduced, and the contents of some frames are not consistent with the timing when they become visible.

The likelihood of that happening depends largely on the clients, mainly on how long it takes the GPU to draw their buffer contents and how much time lies between when a client starts drawing to its buffer and when the compositor starts working on its resulting output frame.

In summary, a Wayland compositor can miss a display refresh cycle because the GPU failed to finish drawing to a client buffer in time.
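A toy timeline makes the cost of a miss concrete, assuming a 60 Hz display (~16.7 ms per refresh cycle); the function name is made up for illustration:

```python
import math

REFRESH_MS = 1000 / 60  # one refresh cycle at 60 Hz, ~16.7 ms

def frame_presented_at(gpu_finish_ms):
    # The output frame can only become visible at the first refresh
    # boundary after the GPU has finished drawing; finishing even
    # slightly too late costs a whole extra cycle.
    return math.ceil(gpu_finish_ms / REFRESH_MS) * REFRESH_MS

assert frame_presented_at(10.0) == REFRESH_MS      # made the first cycle
assert frame_presented_at(20.0) == 2 * REFRESH_MS  # missed it: one cycle late
```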

This diagram visualizes a normal and problematic case:

Left side: normal case, right side: problematic case

Solution

Basic idea

The basic idea is simple: the compositor considers a client buffer “available” per the mailbox semantics only once the GPU finishes drawing to it. Until then, it picks the previously available buffer.
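Sketched in code (hypothetical names, not Mutter’s actual implementation), the pending buffer only becomes the picked one once the GPU signals it is done; until then the compositor keeps using the previous buffer:

```python
class Buffer:
    def __init__(self, name):
        self.name = name
        self.gpu_done = False  # set once the GPU finishes drawing

class Surface:
    def __init__(self):
        self.current = None  # last buffer the GPU finished drawing to
        self.pending = None  # newest attached buffer, maybe still in flight

    def attach(self, buf):
        self.pending = buf

    def pick_for_frame(self):
        # The pending buffer counts as "available" per mailbox
        # semantics only once the GPU has finished drawing to it.
        if self.pending is not None and self.pending.gpu_done:
            self.current = self.pending
            self.pending = None
        return self.current

surface = Surface()
done = Buffer("A")
done.gpu_done = True
surface.attach(done)
assert surface.pick_for_frame().name == "A"

in_flight = Buffer("B")                        # GPU still drawing
surface.attach(in_flight)
assert surface.pick_for_frame().name == "A"    # keep the previous buffer

in_flight.gpu_done = True
assert surface.pick_for_frame().name == "B"    # now it's available
```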

Complications

Now if it was as simple as that might sound, there would be no need to write a >1000-word article about it. 🙂

The main thing which makes things more complicated is that, together with attaching a new buffer, various other surface states can be modified in the same commit. All state changes in the same commit must be applied atomically, i.e. the user must either see all or none of them (per Wayland’s “every frame is perfect” motto). For example, there are various states which affect how a Wayland surface is scaled for display. Attaching a new buffer and changing the scaling state in the same commit ensures that the surface always appears consistently. If the buffer size and scaling state were to change independently, the surface might intermittently appear in the wrong size.
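This is the familiar double-buffered-state pattern: changes accumulate as “pending” state and become visible all at once on commit. A minimal sketch (illustrative names, reduced to just a buffer and a scale):

```python
import copy

class SurfaceState:
    def __init__(self):
        self.buffer = None
        self.scale = 1

class Surface:
    def __init__(self):
        self.current = SurfaceState()  # what the user sees
        self.pending = SurfaceState()  # accumulated, not yet visible

    def commit(self):
        # Everything committed together becomes visible together:
        # the user sees either all of the new state or none of it.
        self.current = self.pending
        self.pending = copy.copy(self.current)

surface = Surface()
surface.pending.buffer = "HiDPI buffer"
surface.pending.scale = 2
# Before the commit, nothing is visible yet:
assert (surface.current.buffer, surface.current.scale) == (None, 1)
surface.commit()
# After it, buffer and scale changed as one unit:
assert (surface.current.buffer, surface.current.scale) == ("HiDPI buffer", 2)
```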

As if that wasn’t complicated enough, Wayland has so-called synchronized sub-surfaces. State changes for a synchronized sub-surface are not applied immediately, but only the next time any state changes are applied for its parent surface. Conceptually, one can think of the committed sub-surface state becoming part of the parent surface’s state commit. Again, all state combined like this between sub-surfaces (which can be nested, i.e. a sub-surface can be the parent of another sub-surface) and their parents must be applied atomically, all or nothing, to ensure that sub-surfaces and their parents always appear consistently as a whole.
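One way to picture this (a simplified sketch, not Mutter code; real sub-surface semantics have more nuance) is that a synchronized sub-surface parks its committed state in a “cached” slot, which is only applied, recursively, when the parent’s state is applied:

```python
class Surface:
    def __init__(self, parent=None, synchronized=False):
        self.parent = parent
        self.synchronized = synchronized
        self.children = []
        self.pending = {}  # state committed by the client
        self.cached = {}   # sub-surface state parked until the parent applies
        self.current = {}  # applied, visible state
        if parent is not None:
            parent.children.append(self)

    def commit(self):
        if self.synchronized and self.parent is not None:
            # Synchronized sub-surface: park the state; it rides along
            # with the parent's next applied commit.
            self.cached.update(self.pending)
        else:
            self._apply(self.pending)
        self.pending = {}

    def _apply(self, state):
        self.current.update(state)
        # Applying a surface's state also applies the cached state of
        # its synchronized sub-surfaces (recursively, for nesting).
        for child in self.children:
            if child.synchronized:
                cached, child.cached = child.cached, {}
                child._apply(cached)

parent = Surface()
sub = Surface(parent=parent, synchronized=True)

sub.pending["buffer"] = "sub buffer"
sub.commit()
assert sub.current == {}  # not visible yet

parent.pending["buffer"] = "parent buffer"
parent.commit()           # applies both, atomically
assert sub.current == {"buffer": "sub buffer"}
assert parent.current == {"buffer": "parent buffer"}
```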

This means that the compositor cannot simply wait for the GPU to finish drawing to client buffers, while applying other corresponding surface state immediately. It needs to stage the committed state changes somehow, and actually apply them only once the GPU has finished drawing to all new buffers attached in the same combined state commit.

Enter transactions

The idea for “stage somehow” is to introduce the concept of a transaction, which combines a set of state changes for one or multiple (sub-)surfaces. When a client commits a set of state changes for a surface, they are inserted into an appropriate transaction; either a new one or an existing one, depending on circumstances.

When the committed state changes should get applied per Wayland protocol semantics, the transaction is committed and inserted into a queue of committed transactions. The queue is ordered such that for any given surface, state commits are applied in the same order as they were committed by the client. This ensures that the contents of a surface never appear to “move backwards” because one transaction affecting the surface managed to “overtake” another one.

A transaction is considered ready to be applied only once both of these conditions are true:

  1. It’s the oldest (closest to the queue head) transaction in the queue for all surfaces it carries state for.
  2. The GPU has finished drawing to all client buffers attached in the transaction.

Once both of these conditions are true, the transaction is applied atomically. From that point on, the compositor uses the state in the transaction for its output frames.
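The queue logic can be sketched like this (a deliberately simplified model with dicts and lists; names and data structures are invented for this post, and Mutter’s real implementation differs). Transactions for disjoint surfaces may complete independently, but on any one surface a later transaction may never overtake an earlier one:

```python
class Transaction:
    def __init__(self, states, buffers):
        self.states = states    # {surface name: new state}
        self.buffers = buffers  # buffers attached in this transaction

    def is_ready(self, queue):
        # Condition 1: it's the oldest queued transaction for every
        # surface it carries state for.
        for other in queue:
            if other is self:
                break
            if self.states.keys() & other.states.keys():
                return False
        # Condition 2: the GPU has finished drawing to all buffers
        # attached in this transaction.
        return all(buf["gpu_done"] for buf in self.buffers)

def apply_ready(queue, current):
    # Repeatedly apply any transaction satisfying both conditions;
    # each is applied atomically (here: one dict update).
    progress = True
    while progress:
        progress = False
        for txn in list(queue):
            if txn.is_ready(queue):
                queue.remove(txn)
                current.update(txn.states)
                progress = True

slow = {"gpu_done": False}
a = Transaction({"surf1": "old"}, [slow])
b = Transaction({"surf1": "new"}, [{"gpu_done": True}])
queue, current = [a, b], {}

apply_ready(queue, current)
assert current == {}  # b must not overtake a on the same surface

slow["gpu_done"] = True
apply_ready(queue, current)
assert current == {"surf1": "new"} and queue == []
```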

Results

I implemented the solution described above in Mutter merge request !1880, which was merged for the GNOME 44 release. While it went under the radar of news outlets, I hope that many of you will notice the benefits!

One situation where the benefits of transactions can be noticed is interactive OpenGL applications such as games with “vsync” disabled (e.g. for better input → output latency): you should be less likely to see stuttering due to Mutter missing a display refresh cycle, in particular in fullscreen and if Mutter can use direct scanout of client buffers.

If the GPU & drivers support true high priority EGL contexts which can preempt lower priority ones (as of this writing, this is true e.g. with “not too old” Intel GPUs), Mutter can now sustain full frame rate even if clients are GPU-bound to lower frame rates, as demonstrated in this video:

Even if the GPU & drivers do not support this, Mutter should now get bogged down less by such heavy clients; in particular, the mouse cursor should remain smoother.

It’s effective for X clients running via Xwayland as well, not only for native Wayland clients.

Long term, all major Wayland compositors will want to do something like this. gamescope already does.

Thanks

It took almost two years (on and off, not full-time) from having the initial idea, deciding to try implementing it myself, until finally getting it ready to be merged. I wasn’t very familiar with the Mutter code or Wayland protocol semantics when I started, so I couldn’t have done it without a lot of help from many Mutter and Wayland developers. I am deeply grateful to all of you.

Thanks to Jakub Steiner for the featured image and to Niels De Graef for the diagram of this post.

I would also like to thank Red Hat for giving me the opportunity to work on this, even though “Mutter developer” isn’t really a central part of my job description.