programs won’t start

February 4, 2016

So recently I got pointed to an aging blocker bug that needed attention, since it negatively affected some rawhide users: they weren’t able to launch certain applications. Three known broken applications were gnome-terminal, nautilus, and gedit. Other applications worked, and even these three applications worked under Wayland, just not under Xorg. The applications failed with messages like:

Gtk-WARNING **: cannot open display:

and

org.gnome.Terminal[2246]: Failed to parse arguments: Cannot open display:

left in the log. These messages mean that the programs are unable to create a connection to the X server. There are only a few reasons this error message could get displayed:

    — The socket associated with the X server has become unavailable. In the old days this could happen if, for instance, the socket file got deleted from /tmp. Adam Jackson fixed the X server a number of years ago, to also listen on abstract sockets to avoid that problem. This could also happen if SELinux was blocking access to the socket, but users reported seeing the problem even with SELinux put in permissive mode.
    — The X server isn’t running. In our case, clearly the X server is running, since the user can see their desktop and launch other programs.
    — The X server doesn’t allow the user to connect because that user wasn’t given access, or that user isn’t providing credentials. These programs are getting run as the same user who started the session, so that user definitely has access, and GDM doesn’t require users to provide separate credentials to use the X server, so that’s not it either.
    — $DISPLAY isn’t set, so the client doesn’t know which X server to connect to. This is the only likely cause of the problem. Somehow $DISPLAY isn’t getting put in the environment of these programs.
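The checklist above can be roughed out in a few lines of Python. This is only a sketch, not something from the original investigation: `diagnose_display` is a made-up helper name, it only understands local display names like `:0`, and it assumes the conventional socket location `/tmp/.X11-unix/X<n>`. It can't see abstract sockets or access-control problems.

```python
import os
import re

# Local display names look like ":0" or ":1.0" (display number, optional screen).
_LOCAL_DISPLAY = re.compile(r"^:(\d+)(?:\.\d+)?$")


def diagnose_display(env, path_exists=os.path.exists):
    """Return a short guess at why an X client might fail to connect.

    `env` is a mapping such as os.environ. `path_exists` is injectable so
    the function can be exercised without a running X server.
    """
    display = env.get("DISPLAY")
    if not display:
        # The case this post is about: the client has no idea where to connect.
        return "DISPLAY is not set"
    match = _LOCAL_DISPLAY.match(display)
    if match is None:
        return "DISPLAY is set to a non-local or unrecognized value: " + display
    # Assumption: the conventional filesystem socket path for local displays.
    socket_path = "/tmp/.X11-unix/X" + match.group(1)
    if not path_exists(socket_path):
        # The server may still be reachable through an abstract socket,
        # which never shows up in the filesystem.
        return "socket file missing: " + socket_path
    return "DISPLAY and socket file look fine: " + socket_path
```

Calling `diagnose_display(os.environ)` from inside one of the broken programs' environments would presumably have reported the first case.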

So the next question is, what makes these applications “special”? Why isn’t $DISPLAY set for them, when other applications work fine? Every application has a .desktop file associated with it, which is a small config file giving information about the application (name, icon, how to run it, etc.). When a program is run by gnome-shell, gnome-shell uses the desktop file of that program to figure out how to run it. Most of the malfunctioning programs have this in their desktop files:


DBusActivatable=true

That means that the shell shouldn’t try to run the program directly; instead, it should ask the dbus-daemon to run the program on the shell’s behalf. Incidentally, the dbus-daemon then asks systemd to run the program on the dbus-daemon’s behalf. That has lots of nice advantages, like automatically integrating program output into the journal, and putting each service in its own cgroup for resource management. More and more programs are becoming D-Bus activatable because it’s an important step toward integrating systemd’s session management features into the desktop (we’re not fully there yet, but that initiative should become a priority at some point in the near-to-mid future). So clearly the issue is that the dbus-daemon doesn’t have $DISPLAY in its activation environment, and so programs that rely on D-Bus activation aren’t able to open a display connection to the X server. But why?
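As an aside, checking whether a given desktop file opts into D-Bus activation is easy to script. The following is a minimal sketch using Python's standard `configparser`; `is_dbus_activatable` and the sample entry are made up for illustration, and this is not how gnome-shell itself reads desktop files.

```python
import configparser


def is_dbus_activatable(desktop_text):
    """Return True if the desktop entry text sets DBusActivatable=true."""
    # interpolation=None keeps field codes like "%U" in Exec= from being
    # treated as configparser interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # desktop entry keys are case-sensitive
    parser.read_string(desktop_text)
    value = parser.get("Desktop Entry", "DBusActivatable", fallback="false")
    return value.strip() == "true"


# Hypothetical desktop entry, not copied from any real application.
SAMPLE = """\
[Desktop Entry]
Name=Example Terminal
Exec=example-terminal %U
Type=Application
DBusActivatable=true
"""
```

With the sample above, `is_dbus_activatable(SAMPLE)` is true, while an entry without the key falls back to false, meaning the shell would launch it directly.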

When a user logs in, GDM will start a dbus-daemon for that user before it starts the user session. It explicitly makes sure that DISPLAY is in the environment when it starts the dbus-daemon, so things should be square. They’re obviously not, though, so I decided to try to reproduce the problem. I turned off my Wayland session and instead started up an Xorg session (actually I used a livecd, since I knew for sure the livecd could reproduce the problem) and then looked at a process listing for the dbus-daemon:


/usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation

This wasn’t run by GDM! GDM uses different command line arguments than these when it starts the dbus-daemon. Okay, so if it wasn’t getting started by GDM, it had to be getting started by systemd during the PAM conversation right before GDM starts the session. I knew this because there isn’t really anything other than systemd that runs after the user hits enter at the login screen and before GDM starts the user’s session. Also, the command line arguments above in the dbus-daemon instance say ‘--systemd-activation’, which is pretty telling. Furthermore, if a dbus-daemon is already running, GDM will avoid starting a second one, so this all adds up. I was surprised that we were already using the so-called “user bus” instead of the session bus in rawhide. But, indeed, running


$ systemctl --user status dbus.service
● dbus.service - D-Bus User Message Bus
Loaded: loaded (/usr/lib/systemd/user/dbus.service; static; vendor preset: enabled)
Active: active (running) since Tue 2016-02-02 15:04:41 EST; 2 days ago

shows we’re clearly starting the dbus-daemon before GDM starts the session. Of course, this poses a problem. The dbus-daemon can’t possibly have DISPLAY set in its environment if it’s started before the X server is started. Even if it “wanted” to set DISPLAY, it couldn’t know what value to use, since there’s no X server running yet to tell us the DISPLAY!
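One way to confirm this on a live system is to look at the environment a process was started with, which Linux exposes as NUL-separated `KEY=value` entries in `/proc/<pid>/environ`. The helper names below are made up for illustration. Note that this shows only the environment the process was launched with, so it says nothing about changes made later through the bus itself.

```python
def parse_environ(raw):
    """Parse the NUL-separated bytes of /proc/<pid>/environ into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # skip the empty chunk after the trailing NUL
        key, sep, value = entry.partition(b"=")
        if sep:
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def process_has_display(pid):
    """Return True if the process with this pid was started with DISPLAY set."""
    # Reading another process's environ requires owning that process (or root).
    with open("/proc/%d/environ" % pid, "rb") as handle:
        return "DISPLAY" in parse_environ(handle.read())
```

Running `process_has_display()` with the pid of the dbus-daemon from the process listing would be expected to return false in the broken setup.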

So what’s the solution? Many years ago I added a feature to D-Bus to allow a client to change the environment of future programs started by the dbus-daemon. This D-Bus method call, UpdateActivationEnvironment, takes a list of key-value pairs that are just environment variables which get put in the environment of programs before they’re activated. So the fix is simple: GDM just needs to update the bus activation environment to include DISPLAY as soon as it has a DISPLAY to include.
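To make the method call concrete, here is a small sketch that builds a `gdbus call` command line for UpdateActivationEnvironment. It is not GDM's actual fix (GDM is written in C and talks to the bus directly). The helper names are made up, and the sketch assumes the method takes a string-to-string dictionary that `gdbus` accepts in GVariant text form.

```python
def _gvariant_string(text):
    """Quote a Python string as a single-quoted GVariant text string."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def update_activation_environment_argv(variables):
    """Build a gdbus command that pushes variables into the activation environment."""
    pairs = ", ".join(
        "%s: %s" % (_gvariant_string(key), _gvariant_string(value))
        for key, value in sorted(variables.items())
    )
    return [
        "gdbus", "call", "--session",
        "--dest", "org.freedesktop.DBus",
        "--object-path", "/org/freedesktop/DBus",
        "--method", "org.freedesktop.DBus.UpdateActivationEnvironment",
        "{" + pairs + "}",  # dictionary of environment variables
    ]
```

Inside a running session, something like `subprocess.run(update_activation_environment_argv({"DISPLAY": os.environ["DISPLAY"]}), check=True)` would issue the call. Only programs activated afterwards would see the new value.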

Special thanks to Sebastian Keller, who figured out the problem before I got around to investigating the issue.

2 Responses to “programs won’t start”

  1. Michael Catanzaro Says:

    Keep blogging!

  2. Cole Robinson Says:

    Nice write up!

