Threaded input adventures

Come around and gather, in this article we will talk about how Mutter got an input thread in the native backend.

A trip down memory lane

Mutter wasn’t always a self-contained compositor toolkit, in the past it used to rely on Clutter and Cogl libraries for all the benefits usually brought by toolkits: Being able to draw things on screen, and being able to receive input.

In the rise of Wayland, that reliance on an external toolkit drove many of the design decisions around input management, usually involving adding support in the toolkit, and the necessary hooks so Mutter could use or modify the behavior. It was unavoidable that both sides were involved.

Later on, Mutter merged its own copies of Clutter and Cogl, but the API barrier stayed essentially the same at first. Slowly over time, and still ongoing, we’ve been refactoring Mutter so all the code that talks to the underlying layers of your OS lives together in src/backends, taking this code away from Clutter and Cogl.

A quick jump to the near past

However, in terms of input, the Clutter API barrier did still exist for the most part, it was still heavily influenced by X11 design, and was pretty much used as it was initially designed. Some examples, no special notoriety or order:

We still forwarded input axes in a compact manner, that requires querying the input device to decode event->motion.axes[3] positions into CLUTTER_INPUT_AXIS_PRESSURE. This space-saving peculiarity comes straight from XIEvent and XIQueryDevice.
Pointer constraints were done by hooking a function that could impose the final pointer position.
Emission of wl_touch.cancel had strange hooks into libinput event handling, as the semantics of CLUTTER_TOUCH_CANCEL varied slightly.

Polishing these interactions so the backend code stays more self-contained has been a very significant part of the work involved.

Enter the input thread

The main thread context is already a busy place, in the worst case (and grossly simplified) we:

Dispatch several libinput events, convert them to ClutterEvents
Process several ClutterEvents across the stage actors, let them queue state changes
Process the frame clock
- Relayout
- Repaint
Push the Framebuffer changes

All in the course of a frame. The input thread takes the first step out of that process. For this to work seamlessly the thread needs a certain degree of independence, it needs to produce ClutterEvents and know where will the pointer end up without any external agents. For example:

Device configuration
Pointer barriers
Pointer locks/constraints

The input thread takes over all these. There is of course some involvement from the main thread (e.g. specifying what barriers or constraints are in effect, or virtual input), but these synchronization points are either scarce, or implicitly async already.

The main goal of the input thread is to provide the main thread with ClutterEvents, with the fact that they are produced in a distinct thread being irrelevant. In order to do so, all the information derived from them must be independent of the input thread state. ClutterInputDevice and ClutterInputDeviceTool (representing input devices and drawing tablet tools) are consequently morphing into immutable objects, all changes underneath (e.g. configuration) are handled internally in the input thread, and abstracted away in the emitted events.

“The Dark Side of the Loom” by aldoaldoz is licensed under CC BY-NC-SA 2.0

What it brings today

Having a thread always ready to dispatch libinput may sound like a small part in the complexity involved to give you a new frame, but it does already bring some benefits:

Libinput events are always dispatched ASAP, so this will mean less “client bug: event processing lagging behind by XXms” messages in the journal.
Input handling not being possibly stalled by the rest of the operations in the main thread means fewer awkward situations where we don’t process events in time (e.g. a key release stopping key repeat, at least in the compositor side).
With the cursor logical position being figured alone by the input thread, updating the cursor plane position to reflect the most up-to-date position for the next frame does simply require asking the input thread for it.
Generally, a tidier organization of input code where fewer details leak outside the backend domain.

What it does not bring (yet)

Code is always halfways to a better place, the merged work does not achieve yet everything that could be achieved. Here’s some things you shouldn’t expect to see fixed yet:

The main thread is still in charge of KMS, and updating the cursor plane buffer and position. This means the pointer cursor will still freeze if the main thread stalled, despite the input thread handling events underneath. In the future, There would be another separate thread handling atomic KMS operations, so it’d be possible for the input and KMS threads to talk between them and bypassing any main thread stalls.
The main thread still has some involvement in handling of Ctrl+Alt+Fn, should you need to switch to another TTY while hard-locked. Making it fully handled in the input thread would be a small nicety for developers, perhaps a future piece of work.
Having an input handling that is unblocked by almost anything else is a prerequisite for handling 1000Hz mice and other high-frequency input devices. But the throttling behavior towards those is unchanged, a better behavior should be expected in the short term.

Conclusions

We’ve so far been working really hard in making Mutter as fast and lock free as possible. This is the first step towards a next level in design that is internally protective against stall situations.