Everybody’s Gone To The GUADEC

It’s been ten days since I came back from GUADEC 2018, and I’ve finally caught up enough to find the time to write about it. As ever, it was a pleasure to see familiar faces from around the community, put some new faces to familiar names, and learn some entirely new names and faces! Some talk highlights:

  • In “Patterns of refactoring C to Rust”, Federico Mena Quintero pulled off the difficult trick of giving a very source code-centric talk without losing the audience. (He said afterwards that the style he used is borrowed from a series of talks he referenced in his slides, but the excellent delivery was certainly a large part of why it worked.)
  • Christian Hergert and Corentin Noël’s talk on “What’s happening in Builder?” left me feeling good about the future of cross-architecture and cross-device GNOME app development. Developing OS and platform components in a desktop-containerised world is not a fully-solved problem; between upcoming plans for Builder and Philip Chimento’s Flapjack, I think we’re getting there.
  • I’m well-versed in Flatpak but know very little about Snap, so Robert Ancell’s talk on “Snap Package support in GNOME” was enlightening. It’s heartening that much of the user-facing infrastructure to solve problems common to Snap and Flatpak (such as GNOME Software and portals) is shared, and it was interesting to learn about some of the unique features of Snap which make it attractive to ISVs.

I couldn’t get to Almería until the Friday evening; I’m looking forward to checking out video recordings of some of the talks I missed. (Shout-out to the volunteers editing these videos! Update: the videos are now mostly published; I’ve added links to the three talks above.)

One of the best bits of any conference is the hallway track, and GUADEC did not disappoint. Fellow Endlesser Carlo Caione and I caught up with Javier Martinez Canillas from Red Hat to discuss some of the boot-loader questions shared between Endless OS and Silverblue, like the downstream Boot Loader Specification module for GRUB, and how to upgrade GRUB itself—which lives outside the atomic world of OSTree—in as robust and crash-proof a manner as is feasible.

On the bus to the campus on Sunday, I had an interesting discussion with Robert Ancell about acquiring domain expertise too late in a project to fix the design decisions made earlier on (which has happened to me a fair few times). While working on LightDM, he avoided this trap by building a thorough integration test suite early on; this allowed him to refactor with confidence as he discovered murky corners of display management. As I understand it (sorry if I’ve misremembered from the noisy bus ride!), he wrote a library which essentially shims every syscall. This made it easier to mock and exercise all the complicated interactions the display manager has with many different parts of the OS via many IPC mechanisms. I always regret it when I procrastinate on test coverage; I’ll keep this discussion in mind as extra ammunition to do the right thing.

My travel to and from Almería was kindly sponsored by the GNOME Foundation. Thank you!


When is an exit code not an exit code?

TL;DR: I found an interesting bug in flatpak-spawn which taught me that there is a difference between the exit code you pass to exit(), the exit status reported by waitpid(), and the shell variable $?.

One of the goals of Flatpak is to isolate applications from the host system; they can normally only directly run external programs supplied by the Flatpak platform they are built against, rather than whatever executables happen to be installed on the host. But some developer tools do need to be able to run commands on the host system. One example is GNOME Builder, which allows you to compile software on the host; another is flatpak-builder, which uses this to build Flatpaks from within a Flatpak. (For my part, I’m occasionally working on making Bustle run pkexec dbus-monitor --system on the host, to allow reading all messages on the system bus (a privileged operation) from an unprivileged, sandboxed application. More on this in a future blog post.)

Flatpak’s session helper provides a D-Bus API to do this: a HostCommand method that launches a given command outside the sandbox and returns its process ID; and a HostCommandExited signal which is emitted when the process exits, with its exit status as a uint32. Apps can use this D-Bus API directly, but recent versions of the common runtimes include a wrapper command which is much easier to adapt existing code to use: just replace cat /etc/passwd with flatpak-spawn --host cat /etc/passwd.

In theory, flatpak-spawn --host propagates the exit status from the command it runs, but I found that in practice, it did not. For example, false is a program which does nothing, unsuccessfully:

$ false; echo exit status: $?
exit status: 1

But when run via flatpak-spawn --host, its exit status is 0:

$ flatpak run --env='PS1=sandbox$ ' \
> --talk-name=org.freedesktop.Flatpak \
> --command=bash org.freedesktop.Sdk//1.6
sandbox$ flatpak-spawn --host false; echo exit status: $?
exit status: 0

If you care whether the command you launched succeeded, this is problematic! The first clue to what’s going on is in the output of flatpak-spawn --verbose:

sandbox$ flatpak-spawn --verbose --host false; echo exit status: $?
F: child_pid: 18066
F: child exited 18066: 256
exit status: 0

Here’s the code, from the HostCommandExited signal handler:

g_variant_get (parameters, "(uu)", &client_pid, &exit_status);
g_debug ("child exited %d: %d", client_pid, exit_status);

if (child_pid == client_pid)
  exit (exit_status);

So exit_status is 256, even though false actually returns 1. If you read man 3 exit, you will learn:

void exit(int status);

The exit() function causes normal process termination and the value of status & 0377 is returned to the parent (see wait(2)).

256 == 0x0100 and 0377 == 0x00ff; so exit_status & 0377 == 0. Now we know why flatpak-spawn returns 0, but why is exit_status equal to 256 rather than 1 in the first place?
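To make that arithmetic concrete, here is a minimal C sketch of the truncation that exit() applies. The helper name code_seen_by_parent is made up for illustration; it is not part of flatpak-spawn or libc.

```c
/* The truncation described in exit(3): only the low 8 bits of the value
 * passed to exit() reach the parent. code_seen_by_parent() is a
 * hypothetical helper for illustration only. */
int
code_seen_by_parent (int status)
{
  return status & 0377;   /* 0377 octal == 0xff */
}
```

Here code_seen_by_parent (256) is 0, which is the bogus success reported for false, while code_seen_by_parent (1) is 1.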

It comes from a g_child_watch_add_full() callback. The g_child_watch_add_full() docs tell us:

In many programs, you will want to call g_spawn_check_exit_status() in the callback to determine whether or not the child exited successfully.

Following the link, we learn:

On Unix, [the exit status] is guaranteed to be in the same format waitpid() returns.

And reading the waitpid() documentation, we finally learn that the exit status is an opaque integer which must be inspected with a set of macros. On Linux, the layout is, roughly:

  • When a process calls exit(x), the exit status is ((x & 0xff) << 8); the low byte is 0. This explains why the exit_status for false is 256.
  • When a process is killed by signal y, the signal number is stored in the low 7 bits, and the next bit (0x80) is set if the process dumped core. So a process which segfaults (SIGSEGV is signal 11) and dumps core will have exit status 11 | 0x80 == 11 + 128 == 139.
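One way to poke at this layout yourself is to fork a child, have it exit or die from a signal, and look at what waitpid() hands back. The following is a rough sketch, assuming a POSIX system; the function names raw_status_after_exit and raw_status_after_signal are hypothetical, and error handling is kept to a bare minimum.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that calls _exit(code); return the raw waitpid() status,
 * or -1 if fork() or waitpid() fails. */
int
raw_status_after_exit (int code)
{
  int status;
  pid_t pid = fork ();

  if (pid < 0)
    return -1;
  if (pid == 0)
    _exit (code);
  if (waitpid (pid, &status, 0) < 0)
    return -1;
  return status;
}

/* Fork a child that sends itself signal sig; return the raw waitpid()
 * status, or -1 if fork() or waitpid() fails. */
int
raw_status_after_signal (int sig)
{
  int status;
  pid_t pid = fork ();

  if (pid < 0)
    return -1;
  if (pid == 0)
    {
      raise (sig);
      _exit (0);   /* only reached if sig did not terminate the child */
    }
  if (waitpid (pid, &status, 0) < 0)
    return -1;
  return status;
}
```

On Linux, raw_status_after_exit (1) should come back as 256, matching the flatpak-spawn --verbose output above. Portable code should not rely on the raw number, though: it should go through WIFEXITED(), WEXITSTATUS(), WIFSIGNALED() and WTERMSIG().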

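What’s funny about this is that if the subprocess segfaults and dumps core, flatpak-spawn --host appears to work when tested from the shell. The raw status 139 fits in the low byte, so it survives the & 0377 in exit() unchanged, and it happens to match what the shell reports for a segfault: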

host$ /home/wjt/segfault; echo exit status: $?
Segmentation fault (core dumped)
exit status: 139
sandbox$ flatpak-spawn --verbose --host /home/wjt/segfault; echo exit status: $?
F: child_pid: 20256
F: child exited 20256: 139
exit status: 139

But there’s a difference between this and a process which actually exits 139:

sandbox$ flatpak-spawn --verbose --host /bin/sh -c 'exit 139'; echo exit status: $?
F: child_pid: 20481
F: child exited 20481: 35584
exit status: 0

I always thought these two were the same. Actually, mapping the signal that killed a process to $? = 128 + signum is just shell convention.

To fix flatpak-spawn, we need to inspect the exit status and recover the exit code or signal. For normal termination, we can pass the exit code to exit(). For signals, the options are:

  • Reset all signal() handlers to SIG_DFL, then send the signal to ourselves and hope we die
  • Follow the shell convention and exit(128 + signal number)

I think the former sounds scary and unreliable, so I implemented the latter. Imperfect, but it’ll do.
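For illustration, the second option might look something like the sketch below. This is not the actual patch: exit_code_from_wait_status is a hypothetical name, and the fallback value of 255 is an arbitrary choice made for this example. In flatpak-spawn, the result would be passed to exit() in place of the raw exit_status.

```c
#include <sys/wait.h>

/* Map a raw waitpid()-style status to a value that is safe to pass to
 * exit(). The function name is hypothetical, and the 255 fallback is an
 * arbitrary choice for this sketch. */
int
exit_code_from_wait_status (int status)
{
  if (WIFEXITED (status))
    return WEXITSTATUS (status);        /* normal termination */

  if (WIFSIGNALED (status))
    return 128 + WTERMSIG (status);     /* shell convention */

  return 255;                           /* neither: should not happen */
}
```

With the Linux layout described above, the three raw statuses seen earlier (256, 139 and 35584) map to 1, 139 and 139 respectively. Whatever launched flatpak-spawn will still see a normal exit rather than death by signal, which is the imperfect part.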

Computer discoveries from February 2016

I found a text file named TIL.md lying around on my computer, with one section dated 17th February 2016. Apparently I’d planned to keep a log of the weird or interesting computer things I learned each day, but forgot after a day. I’d also forgotten all the facts in the file and was surprised afresh. Maybe you’ll be surprised too:

  • Windows’ shell and user interface do not support filenames with trailing spaces, so if you have a directory called worstever.christmas˽ (where ˽ represents a space) on your Unix fileserver, and serve it to Windows over SMB, you’ll see a filename like CQHNYI~0. I think this is the DOS-style 8.3 compatibility filename but I’m not sure where it gets generated in this case – Samba?
  • TIFF files can contain multiple images.
  • If you have a multi-subfile TIFF, multi.tiff, and run convert multi.tiff multi.jpeg, you will not get back a file called multi.jpeg; convert will silently assume you meant convert multi.tiff multi-%d.jpeg and give you back multi-0.jpeg, multi-1.jpeg, etc.

For some context: at the time, I was trying to work out why a script that imported a few tens of thousands of photographs into pan.do/ra – which doesn’t like TIFFs – had skipped some photographs, and imported others as a blank white rectangle; and why a Windows application pointed at the same fileserver showed a different number of photographs again. This was also the first time I encountered an inadvertent homoglyph attack: x.jpg and х.jpg are indistinguishable in most fonts.

Moving on

Yesterday was my last day at Collabora. It’s been a fun five years of working with smart and friendly people (the best kind) on interesting problems. I’ve learnt a lot, created many things I’m proud to have been a part of, and made a lot of friends all over the globe; and now I’ve decided to take a break, then try my hand at something different.

I think it’s notable that quite a few of those smart and friendly people I’m thinking of were neither colleagues nor clients. It’s been a privilege to work predominantly in the open, alongside others with the shared goal of advancing the causes of free software, open platforms and open communication systems. I’m not planning to disappear from the GNOME community any time soon, so I’m looking forward to running into a lot of familiar names, faces and IRC nicks in the future. 🙂

Thanks to Rob, Philippe and everyone I’ve worked with at Collabora over the last half-decade. It’s been great! (Oh, hey, also, Collabora is hiring. I’d recommend working there. Maybe they’ll get an application from Guybrush soon…)

I know that it’s not a party if it happens every night

I think I’ve just about caught up on sleep, four days after getting back from A Coruña. This year’s GUADEC was pretty great. One highlight was the bumper crop of interns’ lightning talks. In general, I’m a huge fan of the lightning talk format, because good talks are just as good when they’re three minutes long, and bad talks are only three minutes long. In this session, I didn’t have to invoke that second clause: the quality was really consistently high, the speakers had prepared well, and the talks kept me interested for the duration. Change-overs were smooth, and a few truncated-slide hiccups didn’t trip anyone up. It’s great to see so many people excited about contributing in all manner of ways. Congratulations all round.

🐙 🐙 🐙 🐙 🐙 🐙 🐙 🐙 🐙 🐙 🐙 🐙 🐙 🐙 🐙 🐙 🐙 🐙

Michael Meeks informed and amused (slide 35 is a highlight) as ever. Discussion about Telepathy’s historically patchy support for IRC during the Empathy BOF pushed me into a drive-by release of the IRC backend. Adam Dingle and Jim Nelson’s keynote also stood out: free software business models are a tricky matter, and it was interesting to hear their thoughts on sustaining the dream. I learnt a lot from Owen’s talk on smooth animations, and particularly enjoyed the un-dramatic reveal in Neil and Robert’s talk on Wayland-ifying the Shell, where they switched Pinpoint out of fullscreen to reveal their demo: an apparently-unremarkable GNOME Shell running both X and Wayland applications, including the presentation itself.

Outside the conference itself, my poor scheduling meant I missed the GNOME OS BOF, to my chagrin, in favour of spending a beautiful day exploring A Coruña. I fell into my usual trap of trying to visit museums on Monday (when they are generally closed), but the Torre de Hércules happened to be both open and free (how appropriate). Well worth a visit, if you’re ever there.

For me, chatting to old and new friends about GNOME, music, and everything in between is the best part of GUADEC, and this year was no exception. (We didn’t have an official party this time around, but the nightly Collabora beach party welcomed many wonderful people, including tens of colleagues I rarely get the opportunity to see in person.) Of course, over the week I also saw a lot of Pulpo a la Gallega. I felt a bit like this cat in the third panel.

Maps and clocks and contact locations

Once upon a time, three intrepid individuals made Empathy publish your location to your contacts, and show your contacts’ locations on a map. Today, I noticed that the Location tab is missing from Preferences—I guess Debian’s Empathy is built without GeoClue support for some reason—and as a result the map looks rather forlorn, what with none of my contacts publishing their location:

Empathy's empty Contact Map View window

A map is an obvious demo to build, but I don’t think it’s that useful (even when it had more than zero contacts on it, I never looked at it). (Top designers agree! To quote Allan Day, “I could live without contacts on a map ;)”.) So what would be more useful? For starters, here’s some “relevant art” from Skype, showing a contact’s local time in their tooltip:

Raúl's Skype tooltip shows it's 6am where he is.

Adding that to Empathy might be a useful first step. But unlike Skype, it’s possible to use this information outside the IM app. So, if I spend a lot of time chatting to friends in Melbourne and New York, why not automatically add those timezones to GNOME Clocks? (The last two mock-ups in that section look particularly bare—perhaps the names of some contacts could show up in the space where “local time” does for Boston.)

For this to be useful, of course, someone would have to fix the publishing of location information in the first place. But if fixing it produced a more compelling feature than a map, it would not be such a thankless task.

Chat account group Shell extension

I have a lot of IM accounts (32, since you ask). I often want to turn groups of them on and off: for instance, when I’m not at work I turn off my Collabora accounts, and when testing IM-related stuff I need to turn on my test accounts. I got bored of finding the Messaging and VoIP Accounts window, searching for my work accounts, clicking on each one in turn and toggling them on and off, so I wrote a little GNOME Shell extension which gives you little switches in your panel to enable and disable (groups of) accounts.

Chat account groups menu screenshot

Out of the box it just shows you one slider per account; and it comes with a really terrible application for configuring groups. You can get it from GitHub (“rah rah why GitHub?” Call it an experiment). I’m pretty sure it doesn’t conform to the approval requirements for extensions.gnome.org so it’s not available there; and the configuration application could really use some love and caring. But it does work! If you like it, hooray; if you don’t, I’d love a patch. (Pre-emptively: if it doesn’t work on 3.4, that’s probably because I’m on 3.2, and I’d love a patch.)

Spreadsheet party

I spent most of last week holed up in a meeting room at Collabora Towers with Michael Meeks (of SUSE) and Eike Rathke (of Red Hat), working on a prototype of collaborative spreadsheet editing in LibreOffice Calc, using Telepathy tubes to start editing sessions with your IM contacts. Michael’s got a much more extensive and eloquent post about where we got to, and here’s a quick screencast of the prototype in action!

Looking around Wayland

My new adventure at Collabora involves Wayland, like all the cool kids. I was distraught to learn that xeyes no longer works, since Wayland only provides pointer position information to the client whose surface is currently under the pointer, and only relative to that surface. We’ll see about that…

A bunch of eyes following the mouse in Weston

Watch a phone-cam video of the eyes in action (with unintentional soundtrack by Marco, Jo and Daniel) in your choice of WebM or freedom-hating H.264! (I apologise for the shakes, but it yielded smoother results than the GNOME Shell screencast thing.)

The pointer’s position is provided to clients which request it, relative to a surface of their choosing. Thanks to the way surface transformations work in Weston, the eyes still work when rotated without any further effort:

A bunch of ROTATED eyes following the mouse in Weston

Ready for the desktop!

Joking aside, I don’t really expect my branch to be merged any time soon, not least because it’s very much a proof-of-concept and is pretty easy to break. But it was a useful exercise in learning my way around the Wayland and Weston code-bases. The work involved was actually pretty small in the end:

  • Implement a pair of eyes which only work when the cursor is over them;
  • Define a protocol extension allowing clients to ask to track the pointer position relative to a surface;
  • Plumb it into the compositor and client.

Now onto something a little more useful…

The end of Chromium Notes

Alas, Evan Martin’s excellent series of blog posts from the Chrome-on-Linux salt mines has come to an end. His sabbatical apparently didn’t relieve his general malaise, which he explains thusly:

Before we’d jokingly say “year of Linux on the desktop!” and laugh about how it would never happen, but my smiles had become bitter. A short way to put it is that writing high-quality software is not really a goal of the platform; stuff that doesn’t matter like continuously rewriting atop ever-changing platforms is. The scrappiness and free software spirit is what makes me love Linux as a hacker but I recognize now a deeper doom, that it will only ever broadly succeed by removing that spirit (e.g. Android).

I disagree that “writing high-quality software is not really a goal of the platform”, but there is an argument to be made that incrementally developing a high-quality platform (to enable writing high-quality software) makes life harder for third-party developers. It’s easy for free desktop developers—myself included—to underestimate the impact that tweaking the platform has on others, even if the changes make the platform more coherent in the long term. A common justification for churn is “the work is done by volunteers who wouldn’t necessarily spend their time on other things instead of this”, but that tends to ignore the other volunteers, caught up in dealing with unrelated changes, who would rather spend their time on other things.

This is not to say that platform-wide changes should be avoided at all cost: one of the great merits of the free software ecosystem is that it’s possible to make such changes. Nor am I claiming that volunteers cleaning up stagnant code bases is to be discouraged—quite the opposite. Nor is this an anti-GNOME 3 post, lest I be misinterpreted as thinking that Gtk+ 3, GObject Introspection and other leaps forward were a mistake. But taking advantage of this excellent new technology in applications does carry a cost in the short term.