Chromium on Flathub

In December 2020, Chromium reached the Flathub stable channel. Assuming you have Flatpak 1.8.2 or newer, and your kernel is configured to allow unprivileged user namespaces, you can download it now.

Screenshot of Chromium showing the Chromium page on flathub.org

History

Endless OS is based on Debian, but rather than releasing as a bunch of .debs, it is released as an immutable OSTree snapshot, with apps added and removed using Flatpak.

For many years, we maintained a branch of Chromium as a traditional OS package which was built into the OS itself, and updated it together with our monthly OS releases. This did not match up well with Chromium, which has a new major version every 6 weeks and typically 2–4 patch versions in between. It’s a security-critical component, and those patch versions invariably fix some rather serious vulnerability. In some ways, web browsers are the best possible example of apps that should be updated independently of the OS. (In a nice parallel, it seems that the Chrome OS folks are also working on separating OS updates from browser updates on Chrome OS.)

Browsers are also the best possible example of apps which should use elaborate sandboxing techniques to limit the impact of security vulnerabilities, and Chromium is indeed a pioneer in this space. Flatpak applies much the same tools to sandbox applications, which ironically made it harder to ship Chromium as a Flatpak: when running in the Flatpak sandbox, it can’t use those same sandboxing APIs provided by the kernel to sandbox itself further.

Flatpak provides its own API for sandboxed applications to launch new instances of themselves with tighter sandboxing; what’s needed is a way to make Chromium use that…

Solution

Ryan Gonzalez has had a long-running project with us to enable Chromium-based apps to work well as Flatpaks. The first targets were apps built with Electron: his zypak project provides an LD_PRELOAD-able library that redirects Chromium’s sandbox to use Flatpak’s sub-sandboxing API. This avoids the need to modify the (often proprietary) apps themselves, and is now used by dozens of Electron apps on Flathub which would otherwise not be usable with Flatpak. There’s also a version of Chrome in the Flathub beta channel using this technique.

For Chromium, we can take a different approach. It’s open-source code, being compiled by Flathub, so Ryan prepared some patches to teach it to use the Flatpak sandboxing APIs directly, for better performance and robustness.

Once the sandbox integration was done, there was a long list of other changes needed to make the Chromium Flatpak work at least as well as our previous built-in version, which André Moreira Magalhães from Endless worked through with Ryan.

Some of these came from the old Endless OS package, such as using a royalty-free implementation of AAC, splitting encumbered codecs to a separate package so they can be excluded as needed for distribution, and discarding background tabs when the system is under memory pressure (which is useful on systems with limited RAM, but is disabled by default on desktop Linux builds).

Others were specific to Flatpak, such as dealing with udev not being available in the sandbox, restoring the ability to create app launchers for websites, integrating with Flatpak’s network proxy portal, and allowing Chromium policy files to be provided by the host system.

Over in Endless OS, we also needed to update users’ existing file associations and migrate their Chromium profiles to its new home.

Impact

Chart of 30 days of Chromium downloads, with three large spikes of around 20,000 daily downloads

The chart above is the Flathub download statistics for Chromium in the past 30 days. Counting the points between 14th March (when the most recent update was pushed) and 21st March, there have been nearly 60,000 downloads. The majority of these will be Endless OS users: our 3.9.2 release in January 2021 rolled this change out to all users, and Endless OS has automatic updates enabled by default. But Flathub has a broader reach than just Endless OS! I believe that users of System76’s Pop!_OS have been migrated from a .deb of Chromium to this Flatpak, and surely there are many users on other distributions, too. It’s also been used as the basis for other apps on Flathub, including ungoogled-chromium.

As an added bonus, the Flatpak is wired up to flatpak-external-data-checker, which now automatically opens a pull request when a new Chromium release is published. Typically, new major releases need manual intervention to refresh the Flatpak patches, but minor releases often build without issue: for these, one can just smoke-test the test build from the pull request, and then merge it, reducing what used to be days of effort rebasing the package in Endless OS to the work of minutes. I love it when a plan comes together.

A quick glance at the issues on the flathub/org.chromium.Chromium repo will show that there is always more work to be done. We would love to see other distributions getting involved, reducing the duplicated work of maintaining Chromium packages for each distro, and making it easier for users of long-term stable branches to get important browser updates quickly and easily.

Bustleman’s Holiday

I recently had a few weeks off work, and mostly did a good job of staying away from my computer. However, I couldn’t resist putting in a few hours to release Bustle 0.8.0, someone’s favourite D-Bus profiler. Get it on Flathub today!

Screenshot of Bustle 0.8.0's About dialog. “Someone's favourite D-Bus profiler”
I never have clarified whose favourite D-Bus profiler it is…

The main user-facing change is a great new icon, designed and drawn by Tobias Bernard. (Unfortunately the Flathub website still shows the old one for CDN cache reasons, but you can feast your eyes above or in your favourite app centre.)

I also rewrote some parts of the app from Haskell to C. In particular, it no longer uses the Haskell D-Bus implementation and libpcap binding, instead using GDBus and GVariant (via some minimal hand-written FFI bindings) and some mostly-existing C code, respectively. Me circa 2008 would be horrified to learn that I’ve done this, and me circa 2020 is sad to be introducing more unsafe code, but the net result is a smaller app with way fewer transitive dependencies.

Why can’t it use GDBus via GObject Introspection, you ask? GLib didn’t have a D-Bus implementation in 2008, and there was no GObject Introspection support for Haskell. These days, bindings for Gio etc. generated via GObject Introspection exist, but migrating Bustle to these would amount to a rewrite, and I don’t have the time or stomach for that. (These libraries all have my name on them because I made a small start at the generator back in 2011 with the express goal of using them in Bustle, but I’d be amazed if any of my work is still present.)

I’ve been fortunate to have had a dozen or so contributors to Bustle over the years, but it’s certainly true that my choice of implementation language has been a big barrier to entry for other contributors. I maintain that it was not a bad choice in and of itself for a weekend project: in 2008, a time before GObject Introspection was widely available, the Haskell GTK and Cairo bindings were unusually thorough and well-written and allowed me to write concise code and iterate quickly. But going against the grain of the community has a cost: the intersection of “Haskell programmer”, “D-Bus expert” and “GNOME enthusiast” is a very small set, and I know several people personally who were interested in contributing until they saw the language.

Why C rather than Rust, you ask? If you want to try to glue Cabal, Cargo and C together, be my guest.

(In case the joke in the title doesn’t translate outside the UK: busman’s holiday on Wiktionary.)

Vanilla is a complex and delicious flavour

Last week, Tobias Bernard published a thought-provoking article, There is no “Linux” Platform (Part 1), based on a talk at LAS 2019. (Unfortunately I couldn’t make it to LAS, and I haven’t found the time to watch a recording of the talk, so I’m going solely from the blog post here.) The article makes some interesting observations, and I found a fair few things to agree with. But I want to offer a counterpoint to this paragraph of the final section, “The Wrong Incentives”:

The Endless OS shell is a great example of this. They started out with vanilla GNOME Shell, but then added ever more downstream patches in order to address issues found in in-house usability tests. This means that they end up having to do huge rebases every release, which is a lot of work. At the same time, the issues that prompted the changes do not get fixed upstream (Endless have recently changed their strategy and are working upstream much more now, so hopefully this will get better in the future).

If we’re looking at the code shipping in Endless OS today, then yes, our desktop is vanilla GNOME Shell with a few hundred patches on top, and yes, as a result, rebasing onto new GNOME releases is a lot of work. But the starting point for Endless OS was not “what’s wrong with GNOME?” but “what would the ideal desktop look like for a new category of users?”.

When Endless began, the goal was to create a new desktop computing product, targeting new computer users in communities which were under-served by existing platforms and products. The company conducted extensive field research, and designed a desktop user interface for those users. Prototypes were made using various different components, including Openbox, but ultimately the decision was made to base the desktop on GNOME, because GNOME provided a collection of components closest to the desired user experience. The key point here is that basing the Endless desktop on GNOME was an implementation detail, made because the GNOME stack is a robust, feature-rich and flexible base for a desktop.

Over time, the strategy shifted away from being based solely around first-party hardware, towards distributing our software to a broader set of users using standard desktop and laptop hardware. Around the same time, Endless made the switch from first- and third-party apps packaged as a combination of Debian packages and an in-house system towards using Flatpak for apps, and contributed towards the establishment of Flathub. Part of the motivation for this switch was to get Endless out of the business of packaging other people’s applications, and instead to enable app developers to directly target desktop Linux distributions including, but not limited to, Endless OS.

A side-effect of this change is that our user experience has become somewhat less consistent because we have chosen not to theme apps distributed through Flathub, with the exception of minimize/maximize window controls and a different UI font; and, of course, Flathub offers apps built with many different toolkits. This is still a net positive: our users have access to many more applications than they would have done if we had continued distributing everything ourselves.

As the prototypal Endless OS user moved closer to the prototypal GNOME user, we have focused more on finding ways to converge with the GNOME user experience. In some cases, we’ve simply removed functionality which we don’t think is necessary for our current crop of users. For example, Endless OS used to target users whose display was a pre-digital TV screen, with a 720×480 resolution. I think persuading the upstream maintainers of GNOME applications to support this resolution would have been a hard sell in 2014, let alone in 2019!

Some other changes we’ve made can and have been simply be proposed upstream as they are, but the bulk of our downstream functionality forms a different product to GNOME, which we feel is still valuable to our users. We are keen to both improve GNOME, and reduce the significant maintenance burden which Tobias rightly refers to, so we’re incrementally working out which functionality could make sense in both Endless and GNOME in some form, working out what that form could be, and implementing it. This is a big project because engaging constructively with the GNOME community involves more thought and nuance than opening a hundred code-dump merge requests and sailing away into the sunset.

If you are building a product whose starting point is “GNOME, but better”, then I encourage you to seriously consider whether you can work upstream first. I don’t think this is a groundbreaking idea in our community! However, that was not the starting point for Endless OS, and even today, we are aiming for a slightly different product to GNOME.

Back out to the big picture that is the subject of Tobias’ article: I agree that desktop fragmentation is a problem for app developers. Flatpak and Flathub are, in my opinion, a major improvement on the status quo: app developers can target a common environment, and have a reasonable expectation of their apps working on all manner of distributions, while we as distro maintainers need not pretend that we know best how to package a Java IDE. As the maintainer of a niche app written using esoteric tools, Flathub allowed me – for the first time since I wrote the first version in 2008 – to distribute a fully-functional, easy-to-install application directly to users without burdening distribution developers with the chore of packaging bleeding-edge versions of Haskell libraries. It gave me a big incentive to spend some of my (now very limited) free time on some improvements to the app that I had been putting off until I had a way to get them to users (including myself on Endless OS) in a timely manner.

On the other hand, we shouldn’t underestimate the value of GNOME – and distros like Debian – being a great base for products that look very different to GNOME itself: it enables experimentation, exploration, and reaching a broader base of users than GNOME alone could do, while pooling the bulk of our resources. (On a personal level, I owe pretty much my entire career in free software to products based on Debian and the GNOME stack!)

Some caveats: I joined Endless in mid-2016, midway through the story above, so I am relying on my past and current colleagues’ recollections of the early days of the company. Although today I am our Director of Platform, I am still just one person in the team! We’re not a hive mind, and I’m sure you’ll find different opinions on some of these points if you ask around.

Flatpak External Data Checker

(This post is a slightly longer version of a lightning talk I gave at GUADEC 2019.)

Many non-free applications’ binaries cannot be redistributed (particularly not in modified form), so they cannot be included directly in a Flatpak. To work around this, Flatpak supports the concept of “extra data”: files which will be downloaded and unpacked from a third-party URI when the app is installed. The URI is accompanied by a checksum and a size, to provide some hope that the data unpacked on the user’s system is the same as what the packager tested. This is used by, for example, the Dropbox Flatpak.

Of course, the Flatpak needs to be kept up to date when new versions of the app are released. At best, the old URL will still point to the same file, so at least the old version of the app will continue to be installed; in some cases, however, vendors publish new versions of the app at the same URL, which means the Flatpak cannot be installed until it is updated.

Some time ago, Joaquim Rocha started work on Flatpak External Data Checker to periodically check a Flatpak manifest and report when it needs updating. As well as just checking that a URL is reachable and has the expected size and checksum, it also knows how to follow a redirect to a stable URI for the latest version (a helpful pattern some apps use), or to find the latest package in an apt repository. I subsequently taught it how to determine the new app’s version, update the AppData file, commit the necessary changes to Git, and send a pull request (like this one).

I tried moderately hard to preserve YAML and XML comments and formatting. For JSON, I gave up trying to preserve formatting (let alone json-glib’s non-standard extensions); the output is at least deterministic, so once it’s reformatted the JSON, the diffs will be smaller in future.

At Endless, we run this for a short list of apps on Flathub (and a few on Endless’s Flatpak repo). If you want to get PRs for an app you maintain, add the necessary metadata to your Flathub application’s manifest, then send a pull request to update the list of repos we check. I hope that in the medium term we could move this over to Flathub’s build infrastructure and run it on every repo (with some way to opt out).

There are a fair few open issues – PRs, suggestions and bug reports all very welcome!

Age rating data for Flathub apps

OARS (Open Age Ratings Service) defines a scheme to include content rating information in apps’ AppData/AppStream file. GNOME Software and similar tools use this metadata to show age ratings for applications. In Endless OS, we also support restricting which applications a given user can install based on this data – see this page, and the reports it links to, for a bit more information about this feature and its future.

Screenshot: “Age Rating: 7. The application was rated this way because it features: Characters in aggressive conflict easily distinguishable from reality”

Every new app on Flathub should include OARS metadata, but there are many existing apps which don’t have this data, so it’s not (yet) enforced at build time. Edit: Bartłomiej tells me that it has been enforced at build time for a little over a month. (See App Requirements and AppData Guidelines on the Flathub wiki for some more information on what’s required and recommended; this branch of appstream-glib is modified to enforce Flathub’s policies.) My colleague Andre Magalhaes crunched the data and opened a tracker task for Flathub apps without OARS metadata. ((We have a similar list for our in-house apps.)) This information is much more useful if it can be relied upon to be present.

If you’re familiar with an app on this list, generating the OARS data is a simple process: open this generator in your browser, answer some questions about the app, and receive some XML. The next step is to put that OARS data into the AppData. ((If you’re the upstream maintainer for the app, you probably already know how to do all this, and can stop reading here!)) Take a look at the app’s Flathub repo and check whether it has an .appdata.xml file.

If it doesn’t, then the app’s AppData must be maintained upstream. Great! Find the upstream repository for the project, and send a merge request there. (Here’s one I sent earlier, for D-Feet.) You can either add the same patch to the Flathub packaging (as I did for D-Feet) or wait for a new upstream release and then update the Flathub packaging to that version (as I also did for D-Feet, a couple of days later).

If the appdata is maintained in the Flathub repo, make the relevant changes there directly. (Here’s a PR I opened for Tux, of Math Command while writing this post.) Ideally, the appdata would make its way upstream, but there are a fair few apps on Flathub which do not have active upstreams.

You might well find that the appdata requirements have become more strict about things other than OARS since the app was last updated, and these will have to be fixed too. In both the example cases above, I had to add release information, which has become mandatory for Flathub apps since these were last updated.

Yes, it was something to do with controlling terminals

Back in June, I wrote about the difficulty of killing the dbus-monitor subprocess when Bustle is recording the system bus:

But sending SIGKILL from the unprivileged Bustle process to the privileged dbus-monitor process has no effect. Weirdly, if I run pkexec dbus-monitor --system in a terminal, I can hit Ctrl+C and kill it just fine. (Is this something to do with controlling terminals? Please tell me if you know how I could make this work.)

Greetings, past me! I’m just like you, only 6 months older, and I’m here to tell you how to make this work.

When you run a process in a terminal (emulator), its controlling TTY is a pseudoterminal (hereafter PTY) slave ((not my choice of terminology)) device. When you press Ctrl+C, the terminal doesn’t send SIGINT to the running process; it writes the ^C character '\x03' into the corresponding PTY master. This is intercepted by the kernel’s TTY layer, which eats this character and sends SIGINT to the (foreground) process (group) attached to the PTY slave. It doesn’t matter if that process (group) is owned by another user: being able to write to its controlling terminal is the capability you need.

Armed with this knowledge, and some further reading on how to set the controlling TTY, it was reasonably easy to wire this up in Bustle:

You may wonder how the second step works in the Flatpak case, where the pkexec dbus-monitor process is not spawned directly by Bustle, but by the Flatpak session helper daemon. Well, we don’t get a chance to run our own code between fork() and exec(), but that’s okay because the session helper does exactly the same thing for us. (I think this is why Ctrl+C works in a host shell running within Flatpak:d GNOME Builder.)

The devil makes work for idle processes

TLDR: in Endless OS, we switched the IO scheduler from CFQ to BFQ, and set the IO priority of the threads doing Flatpak downloads, installs and upgrades to “idle”; this makes the interactive performance of the system while doing Flatpak operations indistinguishable from when the system is idle.

At Endless, we’ve been vaguely aware for a while that trying to use your computer while installing or updating apps is a bit painful, particularly on spinning-disk systems, because of the sheer volume of IO performed by the installation/update process. This was never particularly high priority, since app installations are user-initiated, and until recently, so were app updates.

But, we found that users often never updated their installed apps, so earlier this year, in Endless OS 3.3.10, we introduced automatic background app updates to help users take advantage of “new features and bug fixes” (in the generic sense you so often see in iOS/Android app release notes). This fixed the problem of users getting “stuck” on old app versions, but made the previous problem worse: now, your computer becomes essentially unusable at arbitrary times when app updates happen. It was particularly bad when users unboxed a system with an older version of Endless OS (and hence a hundred or so older apps) pre-installed, received an automatic OS update, then rebooted into a system that’s unusable until all those apps have been updated.

At first, I looked for logic errors in (our versions of) GNOME Software and Flatpak that might cause unneccessary IO during app updates, without success. We concluded that heavy IO load when updating a large app or runtime is largely unavoidable, ((modulo Umang’s work mentioned in the coda)) so I switched to looking at whether we could mitigate this by tweaking the IO scheduler.

The BFQ IO scheduler is supposed to automatically prioritize interactive workloads over bulk workload, which is pretty much exactly what we’re trying to do. The specific example its developers give is watching a video, without hiccups, while copying a huge file in the background. I spent some time with the BFQ developers’ own suite of benchmarks on two test systems: a Lenovo Yoga 900 (with an Intel i5-6200U @ 2.30GHz and a consumer-grade M.2 SSD) and an Endless Mission One (an older system with a Celeron CPU and a laptop-class spinning disk). Neither JP nor I were able to reproduce any interesting results for the dropped-frames benchmark: with either BFQ or CFQ (the previous default IO scheduler), the Yoga essentially never dropped frames, whereas the IO workloads immediately rendered the Mission totally unusable. I had rather more success with a benchmark which measures the time to launch LibreOffice:

  • On the Yoga, when the system was idle, the mean launch time went from 2.838s under CFQ to 2.98s under BFQ (a slight regression), but with heavy background IO, the mean launch time went from 16s with CFQ (standard deviation 0.11) to 3s with BFQ (standard deviation 0.51).
  • On the Mission, with modest background IO, the mean launch time was 108 seconds under BFQ, which sounds awful; but under CFQ, I gave up waiting for LibreOffice to start after 8 minutes!

Emboldened by these results, I went on to look at how the same “time to launch LibreOffice” benchmark fared when the background IO load is “installing and uninstalling a Lollipop Flatpak bundle in a loop”. I also looked at using ionice -c3 to set the IO priority of the install/uninstall loop to idle, which does what its name suggests: BFQ essentially will never serve IO at the idle priority if there is IO pending at any higher priority. You can see some raw data or look at some extended discussion copied from our internal issue tracker to a Flatpak pull request, but I suggest just looking at this chart:

What does it all mean?

  • The coloured bars represent median launch time in seconds for LibreOffice, across 15/30 trials for Yoga/Mission respectively.
  • The black whiskers show the minimum and maximum launch times observed. I know this should have been a box-and-whiskers or violin plot, but I realised too late that multitime does not give enough information to draw those.
  • “unloaded” refers to the performance when the system is otherwise idle.
  • “shell-loop” refers to running while true; do flatpak install -y /home/wjt/Downloads/org.gnome.Lollypop.flatpak; flatpak uninstall -y org.gnome.Lollypop/x86_64/stable; done; “long-lived” refers to performing the same operations with the Flatpak API in a long-lived process. I tried this because I understood that BFQ gives new processes a slight performance boost, but on a real system the GNOME Software and Flatpak system helper processes are long-lived. As you can see, the behaviour under BFQ is actually the other way around in the worst case, and identical for CFQ and in the median case.
  • The “ionice-” prefix means the Flatpak operation was run under ionice -c3.
  • Switching from CFQ to BFQ makes the worst case a little worse at the default IO priority, but the median case much better.
  • Setting the IO priority of the Flatpak process(es) to idle erases that worst-case regression under BFQ, and dramatically improves the median case under CFQ.
  • In combination, the time to launch LibreOffice while performing Flatpak operations in the background on the Mission went from 24 seconds to 12 seconds by switching to BFQ & setting the IO priority to idle.

So, by switching to BFQ and setting IO priorities appropriately, the system’s interactive performance while performing background updates is now essentially indistinguishable from when the system is idle. To implement this in practice, Rob McQueen wrote some patches to set the IO priority of the Flatpak system helper and GNOME Software’s worker threads to idle (both changes are upstream) and changed Endless OS’s default IO scheduler to BFQ where available. As Matthias put it on #flatpak when shown this chart and that first link: “not bad for a 1-line change”.

Of course, this means apps take a bit longer to install, even on a mostly-idle system. No, I don’t have numbers on how big the impact is: this work happened months ago and it’s taken me this long to write it up because I couldn’t find the time to collect more data. But my colleague Umang is working on eliminating up to half of the disk IO performed during Flatpak installations so that should more than make up for it!

Using Vundle from the Vim Flatpak

I (mostly) use Vim, and it’s available on Flathub. However: my plugins are managed with Vundle, which shells out to git to download and update them. But git is not in the org.freedesktop.Platform runtime that the Vim Flatpak uses, so that’s not going to work!

If you’ve read any of my recent posts about Flatpak, you’ll know my favourite hammer. I allowed Vim to use flatpak-spawn by launching it as:

flatpak run --talk-name=org.freedesktop.Flatpak org.vim.Vim

I saved the following file to /tmp/git:

#!/bin/sh
exec flatpak-spawn --host git "$@"

then ran the following Vim commands to make it executable, add it to the path, then fire up Vundle:

:r !chmod +x /tmp/git
:let $PATH = '/tmp:/app/bin:/usr/bin'
:VundleInstall

This tricks Vundle into running git outside the sandbox. It worked!

I’m posting this partly as a note to self for next time I want to do this, and partly to say “can we do better?”. In this specific case, the Vim Flatpak could use org.freedesktop.Sdk as its runtime, like many other editors do. But this only solves the problem for tools like git which are included in the relevant SDK. What if I’m writing Python and want to use pyflakes, which is not in the SDK?

Everybody’s Gone To The GUADEC

It’s been ten days since I came back from GUADEC 2018, and I’ve finally caught up enough to find the time to write about it. As ever, it was a pleasure to see familiar faces from around the community, put some new faces to familiar names, and learn some entirely new names and faces! Some talk highlights:

  • In “Patterns of refactoring C to Rust”, Federico Mena Quintero pulled off the difficult trick of giving a very source code-centric talk without losing the audience. (He said afterwards that the style he used is borrowed from a series of talks he referenced in his slides, but the excellent delivery was certainly a large part of why it worked.)
  • Christian Hergert and Corentin Noël’s talk on “What’s happening in Builder?” left me feeling good about the future of cross-architecture and cross-device GNOME app development. Developing OS and platform components in a desktop-containerised world is not a fully-solved problem; between upcoming plans for Builder and Philip Chimento’s Flapjack, I think we’re getting there.
  • I’m well-versed in Flatpak but know very little about Snap, so Robert Ancell’s talk on “Snap Package support in GNOME” was enlightening. It’s heartening that much of the user-facing infrastructure to solve problems common to Snap and Flatpak (such as GNOME Software and portals) is shared, and it was interesting to learn about some of the unique featues of Snap which make it attractive to ISVs.

I couldn’t get to Almería until the Friday evening; I’m looking forward to checking out video recordings of some of the talks I missed. (Shout-out to the volunteers editing these videos! Update: the videos are now mostly published; I’ve added links to the three talks above.)

One of the best bits of any conference is the hallway track, and GUADEC did not disappoint. Fellow Endlesser Carlo Caione and I caught up with Javier Martinez Canillas from Red Hat to discuss some of the boot-loader questions shared between Endless OS and Silverblue, like the downstream Boot Loader Specification module for GRUB, and how to upgrade GRUB itself—which lives outside the atomic world of OSTree—in as robust and crash-proof a manner as is feasible.

On the bus to the campus on Sunday, I had an interesting discussion with Robert Ancell about acquiring domain expertise too late in a project to fix the design decisions made earlier on (which has happened to me a fair few times). While working on LightDM, he avoided this trap by building a thorough integration test suite early on; this allowed him to refactor with confidence as he discovered murky corners of display management. As I understand it (sorry if I’ve misremembered from the noisy bus ride!), he wrote a library which essentially shims every syscall. This made it easier to mock and exercise all the complicated interactions the display manager has with many different parts of the OS via many IPC mechanisms. I always regret it when I procrastinate on test coverage; I’ll keep this discussion in mind as extra ammunition to do the right thing.

My travel to and from Almería was kindly sponsored by the GNOME Foundation. Thank you!

Sponsored by GNOME Foundation

Bustle 0.7.1: jumping the ticket barrier

Bustle 0.7.1 is out now and supports monitoring the system bus, without requiring any prior system configuration. It also lets you monitor any other bus by providing its address, which I’ve already used to spy on ibus traffic.

Screenshot: Bustle monitoring IBus messages.

Bustle used to try to intercept all messages by adding one match rule per message type, with the eavesdrop=true flag set. This is how dbus-monitor also worked when Bustle was created. This works okay on the session bus, but for obvious reasons, a regular user being able to snoop on all messages on the system bus would be undesirable; even if you were root, you had to edit the policy to allow you to do this. So, the Bustle UI didn’t have any facility for monitoring the system bus; instead, the web page had some poorly-specified instructions suggesting you remove all restrictive policies from the system bus, reboot, do your monitoring, then reimpose the security policies. Not very stetic!

D-Bus 1.5.6 ((or version 0.18 of the spec if you’re into that)) added a BecomeMonitor method explicitly designed for debugging tools: if the bus considers you privileged enough, it will send you a copy of every future message that passes through the bus until you disconnect. Privileged enough is technically implementation defined; the reference implementation is that if you are root, or have the same UID as the bus itself, you can become a monitor.

Three panels: woman smiling while looking at computer screen; mouse pointer pointing at 'Become a fan' button from an old version of Facebook; woman has been replaced by a desk fan.

Bustle, which runs as a regular user, needs to outsource the task of monitoring the system bus to some other privileged process. The normal way to do this kind of thing – as used by tools like sysprof – is to have a system service which runs as root and uses polkit to determine whether to allow the request. The D-Bus daemon itself is not about to talk to polkit (itself a D-Bus service), though, and when distributed with Flatpak it’s not possible for Bustle to install its own system service.

I decided to cheat and assume that pkexec – the polkit equivalent of sudo – and dbus-monitor are both installed on the host system. A few years ago, dbus-monitor grew the ability to dump out raw D-Bus messages in pcap format, which by pure coincidence is the on-disk format Bustle uses. So Bustle builds a command line as follows:

  • If running inside a Flatpak, append flatpak-spawn --host
  • If trying to monitor the system bus, pkexec
  • dbus-monitor --pcap [--session | --system | --address ADDRESS]

It launches this subprocess, then reads the stream of messages from its stdout. In the system bus case, polkit will prompt the user to authenticate and approve the action:

Screenshot: Authentication is needed to run /usr/bin/dbus-monitor as the super user.

Assuming you authenticate successfully, the messages will flow:

Screenshot: Bustle recording messages on the system bus.

There are a few more fiddly details:

  • If the dbus-monitor subprocess quits unexpectedly, we want to show an error; but if the user hits cancel on the polkit authentication dialog, we don’t. pkexec signals this with exit code 126. Now you know why I care about flatpak-spawn propagating exit codes correctly: without that, Bustle can’t readily distinguish these two cases.
  • Bustle’s old internal monitor code did an initial state dump of names and their owners when it connected to the bus. dbus-monitor doesn’t do this yet. For now, Bustle waits until it knows the monitor is running, then makes its own connection to the same bus and performs the same state dump, which will be captured by the monitor and end up in the same stream. This means that Bustle in Flatpak still needs access to the session and system buses.
  • Once started, dbus-monitor only stops when the bus exits, it tries and fails to write to its stdout pipe, or it is killed by a signal. But sending SIGKILL from the unprivileged Bustle process to the privileged dbus-monitor process has no effect. Weirdly, if I run pkexec dbus-monitor --system in a terminal, I can hit Ctrl+C and kill it just fine. (Is this something to do with controlling terminals? Please tell me if you know how I could make this work.) So instead we just close the pipe that dbus-monitor is writing to, and assume that at some point in the future it will try and fail to log a new message to it. ((I’ve realised while writing this post that I can bring “some point in the future” forward by just connecting to the bus again.))
  • Why bother with the pcap framing when Bustle immediately takes it back apart? It means messages are timestamped in the dbus-monitor process, so they’re marginally more likely to be accurate than they would be if the Bustle UI process (which might be doing some rendering) was adding the timestamps.

On the one hand, I’m pleased with this technique, because it allows me to implement a feature I’ve wanted Bustle to have since I started it in 2008(!). On the other hand, this isn’t going to cut it in the general case of a Flatpaked development tool like sysprof, which needs a system helper but can’t reasonably assume that it’s already installed on the host system. Of course there’s nothing to stop you doing something like flatpak-spawn --host pkexec flatpak run --command=my-system-helper com.example.MyApp ... I suppose…