When is an exit code not an exit code?

TL;DR: I found an interesting bug in flatpak-spawn which taught me that there is a difference between the exit code you pass to exit(), the exit status reported by waitpid(), and the shell variable $?.

One of the goals of Flatpak is to isolate applications from the host system; they can normally only directly run external programs supplied by the Flatpak platform they are built against, rather than whatever executables happen to be installed on the host. But some developer tools do need to be able to run commands on the host system. One example is GNOME Builder, which allows you to compile software on the host; another is flatpak-builder which uses this to build flatpak:s from within a flatpak. (For my part, I’m occasionally working on making Bustle run pkexec dbus-monitor --system on the host, to allow reading all messages on the system bus (a privileged operation) from an unprivileged, sandboxed application. More on this in a future blog post.)

Flatpak’s session helper provides a D-Bus API to do this: a HostCommand method that launches a given command outside the sandbox and returns its process ID; and a HostCommandExited signal which is emitted when the process exists, with its exit status as a uint32. Apps can use this D-Bus API directly, but recent versions of the common runtimes include a wrapper command which is much easier to adapt existing code to use: just replace cat /etc/passwd with flatpak-spawn --host cat /etc/passwd.

In theory, flatpak-spawn --host propagates the exit status from the command it runs, but I found that in practice, it did not. For example, false is a program which does nothing, unsuccessfully:

$ false; echo exit status: $?
1

But when run via flatpak-spawn --host, its exit status is 0:

$ flatpak run --env='PS1=sandbox$ ' \
> --talk-name=org.freedesktop.Flatpak \
> --command=bash org.freedesktop.Sdk//1.6
sandbox$ flatpak-spawn --host false; echo exit status: $?
0

If you care whether the command you launched succeeded, this is problematic! The first clue to what’s going on is in the output of flatpak-spawn --verbose:

sandbox$ flatpak-spawn --verbose --host false; echo exit status: $?
F: child_pid: 18066
F: child exited 18066: 256
exit status: 0

Here’s the code, from the HostCommandExited signal handler:

g_variant_get (parameters, "(uu)", &client_pid, &exit_status);
g_debug ("child exited %d: %d", client_pid, exit_status);

if (child_pid == client_pid)
  exit (exit_status);

So exit_status is 256, even though false actually returns 1. If you read man 3 exit, you will learn:

void exit(int status);

The exit() function causes normal process termination and the value of status & 0377 is returned to the parent (see wait(2)).

256 == 0x0100 and 0377 == 0x00ff; so exit_status & 0377 == 0. Now we know why flatpak-spawn returns 0, but why is exit_status equal to 256 rather than 1 in the first place?

It comes from a g_child_watch_add_full() callback. The g_child_watch_add_full() docs tell us:

In many programs, you will want to call g_spawn_check_exit_status() in the callback to determine whether or not the child exited successfully.

Following the link, we learn:

On Unix, [the exit status] is guaranteed to be in the same format waitpid() returns.

And reading the waitpid() documentation, we finally learn that the exit status is an opaque integer which must be inspected with a set of macros. On Linux, the layout is, roughly:

  • When a process calls exit(x), the exit status is ((x & 0xff) << 8); the low byte is 0. This explains why the exit_status for false is 256.
  • When a process is killed by signal y, the exit status is stored in the low byte, with its high bit (0x80) set if the process dumped core. So a process which segfaults and dumps core will have exit status 11 | 0x80 == 11 + 128 == 139

What’s funny about this is that, if the subprocess segfaults and dumps core, when testing from the shell flatpak-spawn --host appears to work.

host$ /home/wjt/segfault; echo exit status: $?
Segmentation fault (core dumped)
exit status: 139
sandbox$ flatpak-spawn --verbose --host /home/wjt/segfault; echo exit status: $?
F: child_pid: 20256
F: child exited 20256: 139
exit status: 139

But there’s a difference between this and a process which actually exits 139:

sandbox$ flatpak-spawn --verbose --host /bin/sh -c 'exit 139'; echo exit status: $?
F: child_pid: 20481
F: child exited 20481: 35584
exit status: 0

I always thought these two were the same. Actually, mapping the signal that killed a process to $? = 128 + signum is just shell convention.

To fix flatpak-spawn, we need to inspect the exit status and recover the exit code or signal. For normal termination, we can pass the exit code to exit(). For signals, the options are:

  • Reset all signal() handlers to SIG_DFL, then send the signal to ourselves and hope we die
  • Follow the shell convention and exit(128 + signal number)

I think the former sounds scary and unreliable, so I implemented the latter. Imperfect, but it’ll do.

Computer discoveries from February 2016

I found a text file named TIL.md lying around on my computer, with one section dated 17th February 2016. Apparently I’d planned to keep a log of the weird or interesting computer things I learned each day, but forgot after a day. I’d also forgotten all the facts in the file and was surprised afresh. Maybe you’ll be surprised too:

  • Windows’ shell and user interface do not support filenames with trailing spaces, so if you have a directory called worstever.christmas˽ (where ˽ represents a space) on your Unix fileserver, and serve it to Windows over SMB, you’ll see a filename like CQHNYI~0. I think this is the DOS-style 8.3 compatibility filename but I’m not sure where it gets generated in this case – Samba?
  • TIFF files can contain multiple images.
  • If you have a multi-subfile TIFF, multi.tiff, and run convert multi.tiff multi.jpeg, you will not get back a file called multi.jpeg; convert will silently assume you meant convert multi.tiff multi-%d.jpeg and give you back multi-0.jpeg, multi-1.jpeg, etc.

For some context: at the time, I was trying to work out why a script that imported a few tens of thousands of photographs into pan.do/ra – which doesn’t like TIFFs – had skipped some photographs, and imported others as a blank white rectangle; and why a Windows application pointed at the same fileserver showed a different number of photographs again. This was also the first time I encountered an inadvertent homoglyph attack: x.jpg and х.jpg are indistinguishable in most fonts.

Moving on

Yesterday was my last day at Collabora. It’s been a fun five years of working with smart and friendly people (the best kind) on interesting problems. I’ve learnt a lot, created many things I’m proud to have been a part of, and made a lot of friends all over the globe; and now I’ve decided to take a break, then try my hand at something different.

I think it’s notable that quite a few of those smart and friendly people I’m thinking of were neither colleagues nor clients. It’s been a privilege to work predominantly in the open, alongside others with the shared goal of advancing the causes of free software, open platforms and open communication systems. I’m not planning to disappear from the GNOME community any time soon, so I’m looking forward to running into a lot of familiar names, faces and IRC nicks in the future. 🙂

Thanks to Rob, Philippe and everyone I’ve worked with at Collabora over the last half-decade. It’s been great! (Oh, hey, also, Collabora is hiring. I’d recommend working there. Maybe they’ll get an application from Guybrush soon…)

I know that it’s not a party if it happens every night

I think I’ve just about caught up on sleep, four days after getting back from A Coruña. This year’s GUADEC was pretty great. One highlight was the bumper crop of interns’ lightning talks. In general, I’m a huge fan of the lightning talk format, because good talks are just as good when they’re three minutes long, and bad talks are only three minutes long. In this session, I didn’t have to invoke that second clause: the quality was really consistently high, the speakers had prepared well, and the talks kept me interested for the duration. Change-overs were smooth, and a few truncated-slide hiccups didn’t trip anyone up. It’s great to see so many people excited about contributing in all manner of ways. Congratulations all round.

🐙 🐙 🐙 🐙 🐙 🐙 🐙 🐙 🐙 🐙 🐙 🐙 🐙 🐙 🐙 🐙 🐙 🐙

Michael Meeks informed and amused ((slide 35 is a highlight)) as ever. Discussion about Telepathy’s historically patchy support for IRC during the Empathy BOF pushed me into a drive-by release of the IRC backend. Adam Dingle and Jim Nelson’s keynote also stood out—free software business models are a tricky matter, and it was interesting to hear their thoughts on sustaining the dream. I learnt a lot from Owen’s talk on smooth animations, and particularly enjoyed the un-dramatic reveal in Neil and Robert’s talk on Wayland-ifying the Shell, where they switched Pinpoint out of fullscreen to reveal their demo: an apparently-unremarkable Gnome Shell running both X and Wayland applications, including the presentation itself.

Outside the conference itself, my poor scheduling meant I missed the GNOME OS BOF, to my chagrin, in favour of spending a beautiful day exploring A Coruña. I fell into my usual trap of trying to visit museums on Monday (when they are generally closed), but the Torre de Hércules happened to be both open and free ((how appropriate)). Well worth a visit, if you’re ever there.

For me, chatting to old and new friends about GNOME, music, and everything in between are the best part of GUADEC, and this year was no exception. ((We didn’t have an official party this time around, but the nightly Collabora beach party welcomed many wonderful people, including tens of colleagues I rarely get the opportunity to see in person.)) Of course, over the week I also saw a lot of Pulpo a la Gallega. I felt a bit like this cat in the third panel.

Maps and clocks and contact locations

Once upon a time, three intrepid individuals made Empathy publish your location to your contacts, and show your contacts’ locations on a map. Today, I noticed that the Location tab is missing from Preferences—I guess Debian’s Empathy is built without GeoClue support for some reason—and as a result the map looks rather forlorn, what with none of my contacts publishing their location:

Empathy's empty Contact Map View window

A map is an obvious demo to build, but I don’t think it’s that useful (even when it had more than zero contacts on it, I never looked at it). ((Top designers agree! To quote Allan Day, “I could live without contacts on a map ;)”.)) So what would be more useful? For starters, here’s some “relevant art” from Skype, showing a contact’s local time in their tooltip:

Raúl's Skype tooltip shows it's 6am where he is.

Adding that to Empathy might be a useful first step. But unlike Skype, it’s possible to use this information outside the IM app. So, if I spend a lot of time chatting to friends in Melbourne and New York, why not automatically add those timezones to GNOME Clocks? (The last two mock-ups in that section look particularly bare—perhaps the names of some contacts could show up in the space where “local time” does for Boston.)

For this to be useful, of course, someone would have to fix the publishing of location information in the first place. But if fixing it produced a more compelling feature than a map, it would not be such a thankless task.

Chat account group Shell extension

I have a lot of IM accounts ((32, since you ask)). I often want to turn groups of them on and off: for instance, when I’m not at work I turn off my Collabora accounts, and when testing IM-related stuff I need to turn on my test accounts. I got bored of finding the Messaging and VoIP Accounts window, searching for my work accounts, clicking on each one in turn and toggling them on and off, so I wrote a little GNOME Shell extension which gives you little switches in your panel to enable and disable (groups of) accounts.

Chat account groups menu screenshot

Out of the box it just shows you one slider per account; and it comes with a really terrible application for configuring groups. You can get it from GitHub ((“rah rah why GitHub?” Call it an experiment.)). I’m pretty sure it doesn’t conform to the approval requirements for extensions.gnome.org so it’s not available there; and the configuration application could really use some love and caring. But it does work! If you like it, hooray; if you don’t, I’d love a patch. (Pre-emptively: if it doesn’t work on 3.4, that’s probably because I’m on 3.2, and I’d love a patch.)

Spreadsheet party

I spent most of last week holed up in a meeting room at Collabora Towers with Michael Meeks (of SUSE) and Eike Rathke (of Red Hat), working on a prototype of collaborative spreadsheet editing in LibreOffice Calc, using Telepathy tubes to start editing sessions with your IM contacts. Michael’s got a much more extensive and eloquoent post about where we got to, and here’s a quick screencast of the prototype in action!

Looking around Wayland

My new adventure at Collabora involves Wayland, like all the cool kids. I was distraught to learn that, since Wayland only provides clients with pointer position information to the surface currently under the pointer, and only relative to that surface, xeyes no longer works. We’ll see about that…

A bunch of eyes following the mouse in Weston

Watch a phone-cam video of the eyes in action ((with unintentional soundtrack by Marco, Jo and Daniel)) in your choice of WebM or freedom-hating H.264! (I apologise for the shakes, but it yielded smoother results than the GNOME Shell screencast thing.)

The pointer’s position is provided to clients which request it, relative to a surface of their choosing. Thanks to the way surface transformations work in Weston, the eyes still work when rotated without any further effort:

A bunch of ROTATED eyes following the mouse in Weston

Ready for the desktop!

Joking aside, I don’t really expect my branch to be merged any time soon, not least because it’s very much a proof-of-concept and is pretty easy to break. But it was a useful exercise in learning my way around the Wayland and Weston code-bases. The work involved was actually pretty small in the end:

  • Implement a pair of eyes which only work when the cursor is over them;
  • Define a protocol extension allowing clients to ask to track the pointer position relative to a surface;
  • Plumb it into the compositor and client.

Now onto something a little more useful…

The end of Chromium Notes

Alas, Evan Martin’s excellent series of blog posts from the Chrome-on-Linux salt mines has come to an end. His sabbatical apparently didn’t relieve his general malaise, which he explains thusly:

Before we’d jokingly say “year of Linux on the desktop!” and laugh about how it would never happen, but my smiles had become bitter. A short way to put it is that writing high-quality software is not really a goal of the platform; stuff that doesn’t matter like continuously rewriting atop ever-changing platforms is. The scrappiness and free software spirit is what makes me love Linux as a hacker but I recognize now a deeper doom, that it will only ever broadly succeed by removing that spirit (e.g. Android).

I disagree that “writing high-quality software is not really a goal of the platform”, but there is an argument to be made that incrementally developing a high-quality platform (to enable writing high-quality software) makes life harder for third-party developers. It’s easy for free desktop developers—myself included—to underestimate the impact that tweaking the platform has on others, even if the changes make the platform more coherent in the long term. A common justification for churn is “the work is done by volunteers who wouldn’t necessarily spend their time on other things instead of this”, but that tends to ignore the other volunteers, caught up in dealing with unrelated changes, who would rather spend their time on other things.

This is not to say that platform-wide changes should be avoided at all cost: one of the great merits of the free software ecosystem is that it’s possible to make such changes. Nor am I claiming that volunteers cleaning up stagnant code bases is to be discouraged—quite the opposite. Nor is this an anti-GNOME 3 post, lest I be misinterpreted as thinking that Gtk+ 3, GObject Introspection and other leaps forward were a mistake. But taking advantage of this excellent new technology in applications does carry a cost in the short term.

My favourite Empathy feature no-one knows about

Picture the scene. You’re painstakingly composing a lengthy message…

…when you manage to close the tab by accident (maybe you hit Ctrl-W after too many years using a terminal? or maybe you thought your browser was focused?). You hammer your keyboard in frustration: your beautiful prose is gone forever. But wait! What’s this lurking in the Tabs menu?

Not only does it reopen the tab you just closed, but the message you were composing is remembered, too. Crisis averted!

The keyboard shortcut is Shift-Ctrl-T, just like in many web browsers (which inspired this feature). As a secret bonus feature, the keyboard shortcut works in the contact list, too, to rescue you when you’ve closed the very last tab by accident. Empathy remembers the last few tabs, not just the most recent.

Of course, if you don’t know the super-secret contact list shortcut, you can go find the contact in your contact list again: Empathy should still have remembered the message you were typing. (Right now it doesn’t persist across sessions; a patch to add that would be most welcome!)

Undo Close Tab has been around for a while; remembering the half-written message was added in Empathy 3.1.2, so it’s coming soon to a GNOME 3.2 near you! Thanks, Jonny.