The devil makes work for idle processes

TLDR: in Endless OS, we switched the IO scheduler from CFQ to BFQ, and set the IO priority of the threads doing Flatpak downloads, installs and upgrades to “idle”; this makes the interactive performance of the system while doing Flatpak operations indistinguishable from when the system is idle.

At Endless, we’ve been vaguely aware for a while that trying to use your computer while installing or updating apps is a bit painful, particularly on spinning-disk systems, because of the sheer volume of IO performed by the installation/update process. This was never particularly high priority, since app installations are user-initiated, and until recently, so were app updates.

But, we found that users often never updated their installed apps, so earlier this year, in Endless OS 3.3.10, we introduced automatic background app updates to help users take advantage of “new features and bug fixes” (in the generic sense you so often see in iOS/Android app release notes). This fixed the problem of users getting “stuck” on old app versions, but made the previous problem worse: now, your computer becomes essentially unusable at arbitrary times when app updates happen. It was particularly bad when users unboxed a system with an older version of Endless OS (and hence a hundred or so older apps) pre-installed, received an automatic OS update, then rebooted into a system that’s unusable until all those apps have been updated.

At first, I looked for logic errors in (our versions of) GNOME Software and Flatpak that might cause unneccessary IO during app updates, without success. We concluded that heavy IO load when updating a large app or runtime is largely unavoidable, ((modulo Umang’s work mentioned in the coda)) so I switched to looking at whether we could mitigate this by tweaking the IO scheduler.

The BFQ IO scheduler is supposed to automatically prioritize interactive workloads over bulk workload, which is pretty much exactly what we’re trying to do. The specific example its developers give is watching a video, without hiccups, while copying a huge file in the background. I spent some time with the BFQ developers’ own suite of benchmarks on two test systems: a Lenovo Yoga 900 (with an Intel i5-6200U @ 2.30GHz and a consumer-grade M.2 SSD) and an Endless Mission One (an older system with a Celeron CPU and a laptop-class spinning disk). Neither JP nor I were able to reproduce any interesting results for the dropped-frames benchmark: with either BFQ or CFQ (the previous default IO scheduler), the Yoga essentially never dropped frames, whereas the IO workloads immediately rendered the Mission totally unusable. I had rather more success with a benchmark which measures the time to launch LibreOffice:

  • On the Yoga, when the system was idle, the mean launch time went from 2.838s under CFQ to 2.98s under BFQ (a slight regression), but with heavy background IO, the mean launch time went from 16s with CFQ (standard deviation 0.11) to 3s with BFQ (standard deviation 0.51).
  • On the Mission, with modest background IO, the mean launch time was 108 seconds under BFQ, which sounds awful; but under CFQ, I gave up waiting for LibreOffice to start after 8 minutes!

Emboldened by these results, I went on to look at how the same “time to launch LibreOffice” benchmark fared when the background IO load is “installing and uninstalling a Lollipop Flatpak bundle in a loop”. I also looked at using ionice -c3 to set the IO priority of the install/uninstall loop to idle, which does what its name suggests: BFQ essentially will never serve IO at the idle priority if there is IO pending at any higher priority. You can see some raw data or look at some extended discussion copied from our internal issue tracker to a Flatpak pull request, but I suggest just looking at this chart:

What does it all mean?

  • The coloured bars represent median launch time in seconds for LibreOffice, across 15/30 trials for Yoga/Mission respectively.
  • The black whiskers show the minimum and maximum launch times observed. I know this should have been a box-and-whiskers or violin plot, but I realised too late that multitime does not give enough information to draw those.
  • “unloaded” refers to the performance when the system is otherwise idle.
  • “shell-loop” refers to running while true; do flatpak install -y /home/wjt/Downloads/org.gnome.Lollypop.flatpak; flatpak uninstall -y org.gnome.Lollypop/x86_64/stable; done; “long-lived” refers to performing the same operations with the Flatpak API in a long-lived process. I tried this because I understood that BFQ gives new processes a slight performance boost, but on a real system the GNOME Software and Flatpak system helper processes are long-lived. As you can see, the behaviour under BFQ is actually the other way around in the worst case, and identical for CFQ and in the median case.
  • The “ionice-” prefix means the Flatpak operation was run under ionice -c3.
  • Switching from CFQ to BFQ makes the worst case a little worse at the default IO priority, but the median case much better.
  • Setting the IO priority of the Flatpak process(es) to idle erases that worst-case regression under BFQ, and dramatically improves the median case under CFQ.
  • In combination, the time to launch LibreOffice while performing Flatpak operations in the background on the Mission went from 24 seconds to 12 seconds by switching to BFQ & setting the IO priority to idle.

So, by switching to BFQ and setting IO priorities appropriately, the system’s interactive performance while performing background updates is now essentially indistinguishable from when the system is idle. To implement this in practice, Rob McQueen wrote some patches to set the IO priority of the Flatpak system helper and GNOME Software’s worker threads to idle (both changes are upstream) and changed Endless OS’s default IO scheduler to BFQ where available. As Matthias put it on when shown this chart and that first link: “not bad for a 1-line change”.

Of course, this means apps take a bit longer to install, even on a mostly-idle system. No, I don’t have numbers on how big the impact is: this work happened months ago and it’s taken me this long to write it up because I couldn’t find the time to collect more data. But my colleague Umang is working on eliminating up to half of the disk IO performed during Flatpak installations so that should more than make up for it!

Using Vundle from the Vim Flatpak

I (mostly) use Vim, and it’s available on Flathub. However: my plugins are managed with Vundle, which shells out to git to download and update them. But git is not in the org.freedesktop.Platform runtime that the Vim Flatpak uses, so that’s not going to work!

If you’ve read any of my recent posts about Flatpak, you’ll know my favourite hammer. I allowed Vim to use flatpak-spawn by launching it as:

flatpak run --talk-name=org.freedesktop.Flatpak org.vim.Vim

I saved the following file to /tmp/git:

#!/bin/sh
exec flatpak-spawn --host git "$@"

then ran the following Vim commands to make it executable, add it to the path, then fire up Vundle:

:r !chmod +x /tmp/git
:let $PATH = '/tmp:/app/bin:/usr/bin'
:VundleInstall

This tricks Vundle into running git outside the sandbox. It worked!

I’m posting this partly as a note to self for next time I want to do this, and partly to say “can we do better?”. In this specific case, the Vim Flatpak could use org.freedesktop.Sdk as its runtime, like many other editors do. But this only solves the problem for tools like git which are included in the relevant SDK. What if I’m writing Python and want to use pyflakes, which is not in the SDK?

Everybody’s Gone To The GUADEC

It’s been ten days since I came back from GUADEC 2018, and I’ve finally caught up enough to find the time to write about it. As ever, it was a pleasure to see familiar faces from around the community, put some new faces to familiar names, and learn some entirely new names and faces! Some talk highlights:

  • In “Patterns of refactoring C to Rust”, Federico Mena Quintero pulled off the difficult trick of giving a very source code-centric talk without losing the audience. (He said afterwards that the style he used is borrowed from a series of talks he referenced in his slides, but the excellent delivery was certainly a large part of why it worked.)
  • Christian Hergert and Corentin Noël’s talk on “What’s happening in Builder?” left me feeling good about the future of cross-architecture and cross-device GNOME app development. Developing OS and platform components in a desktop-containerised world is not a fully-solved problem; between upcoming plans for Builder and Philip Chimento’s Flapjack, I think we’re getting there.
  • I’m well-versed in Flatpak but know very little about Snap, so Robert Ancell’s talk on “Snap Package support in GNOME” was enlightening. It’s heartening that much of the user-facing infrastructure to solve problems common to Snap and Flatpak (such as GNOME Software and portals) is shared, and it was interesting to learn about some of the unique featues of Snap which make it attractive to ISVs.

I couldn’t get to Almería until the Friday evening; I’m looking forward to checking out video recordings of some of the talks I missed. (Shout-out to the volunteers editing these videos! Update: the videos are now mostly published; I’ve added links to the three talks above.)

One of the best bits of any conference is the hallway track, and GUADEC did not disappoint. Fellow Endlesser Carlo Caione and I caught up with Javier Martinez Canillas from Red Hat to discuss some of the boot-loader questions shared between Endless OS and Silverblue, like the downstream Boot Loader Specification module for GRUB, and how to upgrade GRUB itself—which lives outside the atomic world of OSTree—in as robust and crash-proof a manner as is feasible.

On the bus to the campus on Sunday, I had an interesting discussion with Robert Ancell about acquiring domain expertise too late in a project to fix the design decisions made earlier on (which has happened to me a fair few times). While working on LightDM, he avoided this trap by building a thorough integration test suite early on; this allowed him to refactor with confidence as he discovered murky corners of display management. As I understand it (sorry if I’ve misremembered from the noisy bus ride!), he wrote a library which essentially shims every syscall. This made it easier to mock and exercise all the complicated interactions the display manager has with many different parts of the OS via many IPC mechanisms. I always regret it when I procrastinate on test coverage; I’ll keep this discussion in mind as extra ammunition to do the right thing.

My travel to and from Almería was kindly sponsored by the GNOME Foundation. Thank you!

Sponsored by GNOME Foundation

Bustle 0.7.1: jumping the ticket barrier

Bustle 0.7.1 is out now and supports monitoring the system bus, without requiring any prior system configuration. It also lets you monitor any other bus by providing its address, which I’ve already used to spy on ibus traffic.

Screenshot: Bustle monitoring IBus messages.

Bustle used to try to intercept all messages by adding one match rule per message type, with the eavesdrop=true flag set. This is how dbus-monitor also worked when Bustle was created. This works okay on the session bus, but for obvious reasons, a regular user being able to snoop on all messages on the system bus would be undesirable; even if you were root, you had to edit the policy to allow you to do this. So, the Bustle UI didn’t have any facility for monitoring the system bus; instead, the web page had some poorly-specified instructions suggesting you remove all restrictive policies from the system bus, reboot, do your monitoring, then reimpose the security policies. Not very stetic!

D-Bus 1.5.6 ((or version 0.18 of the spec if you’re into that)) added a BecomeMonitor method explicitly designed for debugging tools: if the bus considers you privileged enough, it will send you a copy of every future message that passes through the bus until you disconnect. Privileged enough is technically implementation defined; the reference implementation is that if you are root, or have the same UID as the bus itself, you can become a monitor.

Three panels: woman smiling while looking at computer screen; mouse pointer pointing at 'Become a fan' button from an old version of Facebook; woman has been replaced by a desk fan.

Bustle, which runs as a regular user, needs to outsource the task of monitoring the system bus to some other privileged process. The normal way to do this kind of thing – as used by tools like sysprof – is to have a system service which runs as root and uses polkit to determine whether to allow the request. The D-Bus daemon itself is not about to talk to polkit (itself a D-Bus service), though, and when distributed with Flatpak it’s not possible for Bustle to install its own system service.

I decided to cheat and assume that pkexec – the polkit equivalent of sudo – and dbus-monitor are both installed on the host system. A few years ago, dbus-monitor grew the ability to dump out raw D-Bus messages in pcap format, which by pure coincidence is the on-disk format Bustle uses. So Bustle builds a command line as follows:

  • If running inside a Flatpak, append flatpak-spawn --host
  • If trying to monitor the system bus, pkexec
  • dbus-monitor --pcap [--session | --system | --address ADDRESS]

It launches this subprocess, then reads the stream of messages from its stdout. In the system bus case, polkit will prompt the user to authenticate and approve the action:

Screenshot: Authentication is needed to run /usr/bin/dbus-monitor as the super user.

Assuming you authenticate successfully, the messages will flow:

Screenshot: Bustle recording messages on the system bus.

There are a few more fiddly details:

  • If the dbus-monitor subprocess quits unexpectedly, we want to show an error; but if the user hits cancel on the polkit authentication dialog, we don’t. pkexec signals this with exit code 126. Now you know why I care about flatpak-spawn propagating exit codes correctly: without that, Bustle can’t readily distinguish these two cases.
  • Bustle’s old internal monitor code did an initial state dump of names and their owners when it connected to the bus. dbus-monitor doesn’t do this yet. For now, Bustle waits until it knows the monitor is running, then makes its own connection to the same bus and performs the same state dump, which will be captured by the monitor and end up in the same stream. This means that Bustle in Flatpak still needs access to the session and system buses.
  • Once started, dbus-monitor only stops when the bus exits, it tries and fails to write to its stdout pipe, or it is killed by a signal. But sending SIGKILL from the unprivileged Bustle process to the privileged dbus-monitor process has no effect. Weirdly, if I run pkexec dbus-monitor --system in a terminal, I can hit Ctrl+C and kill it just fine. (Is this something to do with controlling terminals? Please tell me if you know how I could make this work.) So instead we just close the pipe that dbus-monitor is writing to, and assume that at some point in the future it will try and fail to log a new message to it. ((I’ve realised while writing this post that I can bring “some point in the future” forward by just connecting to the bus again.))
  • Why bother with the pcap framing when Bustle immediately takes it back apart? It means messages are timestamped in the dbus-monitor process, so they’re marginally more likely to be accurate than they would be if the Bustle UI process (which might be doing some rendering) was adding the timestamps.

On the one hand, I’m pleased with this technique, because it allows me to implement a feature I’ve wanted Bustle to have since I started it in 2008(!). On the other hand, this isn’t going to cut it in the general case of a Flatpaked development tool like sysprof, which needs a system helper but can’t reasonably assume that it’s already installed on the host system. Of course there’s nothing to stop you doing something like flatpak-spawn --host pkexec flatpak run --command=my-system-helper com.example.MyApp ... I suppose…

When is an exit code not an exit code?

TL;DR: I found an interesting bug in flatpak-spawn which taught me that there is a difference between the exit code you pass to exit(), the exit status reported by waitpid(), and the shell variable $?.

One of the goals of Flatpak is to isolate applications from the host system; they can normally only directly run external programs supplied by the Flatpak platform they are built against, rather than whatever executables happen to be installed on the host. But some developer tools do need to be able to run commands on the host system. One example is GNOME Builder, which allows you to compile software on the host; another is flatpak-builder which uses this to build flatpak:s from within a flatpak. (For my part, I’m occasionally working on making Bustle run pkexec dbus-monitor --system on the host, to allow reading all messages on the system bus (a privileged operation) from an unprivileged, sandboxed application. More on this in a future blog post.)

Flatpak’s session helper provides a D-Bus API to do this: a HostCommand method that launches a given command outside the sandbox and returns its process ID; and a HostCommandExited signal which is emitted when the process exists, with its exit status as a uint32. Apps can use this D-Bus API directly, but recent versions of the common runtimes include a wrapper command which is much easier to adapt existing code to use: just replace cat /etc/passwd with flatpak-spawn --host cat /etc/passwd.

In theory, flatpak-spawn --host propagates the exit status from the command it runs, but I found that in practice, it did not. For example, false is a program which does nothing, unsuccessfully:

$ false; echo exit status: $?
1

But when run via flatpak-spawn --host, its exit status is 0:

$ flatpak run --env='PS1=sandbox$ ' \
> --talk-name=org.freedesktop.Flatpak \
> --command=bash org.freedesktop.Sdk//1.6
sandbox$ flatpak-spawn --host false; echo exit status: $?
0

If you care whether the command you launched succeeded, this is problematic! The first clue to what’s going on is in the output of flatpak-spawn --verbose:

sandbox$ flatpak-spawn --verbose --host false; echo exit status: $?
F: child_pid: 18066
F: child exited 18066: 256
exit status: 0

Here’s the code, from the HostCommandExited signal handler:

g_variant_get (parameters, "(uu)", &client_pid, &exit_status);
g_debug ("child exited %d: %d", client_pid, exit_status);

if (child_pid == client_pid)
  exit (exit_status);

So exit_status is 256, even though false actually returns 1. If you read man 3 exit, you will learn:

void exit(int status);

The exit() function causes normal process termination and the value of status & 0377 is returned to the parent (see wait(2)).

256 == 0x0100 and 0377 == 0x00ff; so exit_status & 0377 == 0. Now we know why flatpak-spawn returns 0, but why is exit_status equal to 256 rather than 1 in the first place?

It comes from a g_child_watch_add_full() callback. The g_child_watch_add_full() docs tell us:

In many programs, you will want to call g_spawn_check_exit_status() in the callback to determine whether or not the child exited successfully.

Following the link, we learn:

On Unix, [the exit status] is guaranteed to be in the same format waitpid() returns.

And reading the waitpid() documentation, we finally learn that the exit status is an opaque integer which must be inspected with a set of macros. On Linux, the layout is, roughly:

  • When a process calls exit(x), the exit status is ((x & 0xff) << 8); the low byte is 0. This explains why the exit_status for false is 256.
  • When a process is killed by signal y, the exit status is stored in the low byte, with its high bit (0x80) set if the process dumped core. So a process which segfaults and dumps core will have exit status 11 | 0x80 == 11 + 128 == 139

What’s funny about this is that, if the subprocess segfaults and dumps core, when testing from the shell flatpak-spawn --host appears to work.

host$ /home/wjt/segfault; echo exit status: $?
Segmentation fault (core dumped)
exit status: 139
sandbox$ flatpak-spawn --verbose --host /home/wjt/segfault; echo exit status: $?
F: child_pid: 20256
F: child exited 20256: 139
exit status: 139

But there’s a difference between this and a process which actually exits 139:

sandbox$ flatpak-spawn --verbose --host /bin/sh -c 'exit 139'; echo exit status: $?
F: child_pid: 20481
F: child exited 20481: 35584
exit status: 0

I always thought these two were the same. Actually, mapping the signal that killed a process to $? = 128 + signum is just shell convention.

To fix flatpak-spawn, we need to inspect the exit status and recover the exit code or signal. For normal termination, we can pass the exit code to exit(). For signals, the options are:

  • Reset all signal() handlers to SIG_DFL, then send the signal to ourselves and hope we die
  • Follow the shell convention and exit(128 + signal number)

I think the former sounds scary and unreliable, so I implemented the latter. Imperfect, but it’ll do.