Input Focus

The details of behind input focus and X/GTK+ have always confused the
hell out of me. Its all fine and dandy when you only have to think
about GTK+ focus, but whenever I had to think about the interaction
between GTK+, the window manager and what’s actually happening at the
Xlib/Xserver level, by brain used to go to mush. I’d barely figure
out the bits neccessary to fix whatever bug I was up against and
promptly forget it all again five minutes later.

Well, this morning I have to get focus handling working with Xnest
embedded in a GTK+ window. So, I figure I’m really going to have to
understand it this time. Here’s some of the details:

In order for any X window to receive events of a certain type,
you must call XSelectInput() on that window with the
appropriate event mask.
When a key event is generated, the Xserver tries to find a
client and window to deliver the event to. It starts with the
window which contains the pointer and recurses up through its
ancestors until it finds a window with that event selected.
X has the notion of “the keyboard focus window”. This is set
using XSetInputFocus(). When a key event is generated,
the event is propopagated as normal if the focus window contains
the pointer, but propogation stops at the focus window. If the
focus window doesn’t contain the pointer, the event is
delivered directly to the focus window.
What’s important here is this has nothing to do with GTK+
keyboard focus. Its more about which toplevel window is
currently focused by the window manager, rather than which
widget is focused within the application. The XEmbed
spec more or less redefines this as the window’s “activation
state” – i.e. if a toplevel or its descendants is the current
keyboard focus window then the toplevel is said to be active.
None of this really reflects the way modern desktops and
toolkits work. What happens in reality is that applications
never focus themselves (i.e. XSetInputFocus()) unless
the window manager tells it to using the WM_TAKE_FOCUS ICCCM
ClientMessage.
On receipt of this message GTK+ makes a 1 pixel square window,
located just outside the visible area of the toplevel
window, be the keyboard focus window. That causes all
KeyPresses to always go straight to this window (the window
doesn’t have any descendants which can contain the pointer).
When this window receives an X KeyPress event, GTK+ then
generates a GTK key press event (with the toplevel as the
target window) and puts that on the GTK event queue.
At this point the event is entirely in the hands of GTK+. X
has wiped its hands of the whole affair.
Each toplevel GtkWindow knows which widget within the window
is currently focused. All the toplevel now needs to do is
send that event onto the currently focused widget.

One last little interesting detail is how the window manager
implements click-to-focus:

The WM establishes a pasive grab on each unfocused toplevel
window using XGrabButton()
A passive grab is where events are delivered as normal until a
specific key or button combination is pressed and an active grab
is established causing the event (and following events) to be
delived to the grabbing client.
The WM passes GrabModeSync to XGrabButton()
which causes all event delivery to freeze when the specific
key/button combination is pressed.
So, when a user clicks on an unfocused window, all subsequent
events are queued in the Xserver, the WM gets the ButtonPress,
focuses the toplevel of the window which was clicked in and
releases the event queue again using XAllowEvents()

In case its not obvious, I’m only really writing this down so there’s
less chance of me forgetting it all again 🙂