GNOME 3.36 / Endless OS 3.8

Endless OS 3.8.0 has just been released, which brings GNOME 3.36 to our users. There’s a particularly big overlap between “improvements in Endless OS” and “improvements in GNOME” this cycle, so I wanted to take a minute to look back over what the Endless team worked on in GNOME 3.36.

Login & Unlock Screen

Allan Day has already written about the improvements to the login and unlock experience in GNOME 3.36, so I won’t retread this ground in too much detail. As he (and Nick Richards, in his trip report for Endless OS 3.8.0) mentioned, this change has been anticipated for a long time, so I’m particularly glad that Georges Stavracas and Umang Jain (together with Florian Müllner from Red Hat) could make this happen for this release. The first thing I interact with when I sit down at my computer is the login screen or the lock screen, and the refreshed design is a pleasure to use. (My daughter is sad that Granny’s cat is no longer visible on the lock screen, though.)

GNOME unlock dialog, with Will Thompson's name and face, and password “Tremendousdangerouslookingyak” visible

Peek Password

One improvement that’s perhaps most visible in the redesigned lock screen is the inline “eye” icon to reveal the text in the password field, which was implemented by Umang Jain independently of his work on the lock screen itself. The motivation for this change was actually another system dialogue: the Wi-Fi password dialogue.

During the development of the Hack product – a game-like platform for self-directed learning built atop Endless OS – the team ran many playtesting sessions. While the emphasis of these sessions was on Hack itself, the test users – typically younger teens – would often run through initial setup on a freshly-installed OS. Within a few clicks of turning on the computer, you select your Wi-Fi network and enter its password, which turned out to be a big stumbling block for many users. Wi-Fi passwords are long strings of randomly-generated characters, and on many occasions users simply couldn’t enter the password correctly. The entry has always had a Show Text option in the right-click menu, but right-clicking is itself an unfamiliar operation for younger users more familiar with mobile devices.

Parental Controls, redux

For a year or so, Endless OS has included a parental controls feature, which operates along a couple of axes:

  • Specific installed apps can be disabled for particular users. As a special case, all general-purpose web browsers are controlled by a separate toggle.
  • Not-yet-installed apps visible in GNOME Software — which we rebrand as App Center — can be filtered based on their OARS content rating metadata.
  • Users can be prevented from installing apps at all.

In past releases, this feature was hard to discover and use. At a superficial level, the UI to control it was buried in Settings → Details → Users → (select a non-administrator user) → (scroll down) → (frame within frame within frame). But the real issue was that many Endless OS systems have the child as the primary, administrator user, created through Initial Setup when the machine is unboxed. To meaningfully use parental controls, you’d need to create a separate parent user, then downgrade the child’s account, neither of which is a particularly discoverable operation.

In autumn last year, we met with Allan Day, Richard Hughes and Matthias Clasen from Red Hat to talk through this problem space. Following that, Robin Tafel, Philip Withnall and Matthew Leeds designed and implemented a new flow for parental controls. The key changes are:

  1. Parental controls can be enabled during initial setup. Check a box, choose some options, and specify a parent password.
  2. Once initial setup is complete, there is a dedicated Parental Controls app.

Screenshot of “About You” page from GNOME Initial Setup, showing “Set up parental controls for this user” checkbox (checked)

Screenshot of Parental Controls page of GNOME Initial Setup, showing options to restrict which applications can be installed or used

Screenshot of GNOME Initial Setup “Set a Parent Password” page, with two password fields and one password hint field

Screenshot of Parental Controls application, showing options to restrict which apps a user can install or run

There are a few downstream bits and bobs outstanding, such as a cross-reference from GNOME Settings’ Users panel, but the bulk of this feature is available upstream in GNOME Initial Setup, Software, and Shell 3.36. Parental controls needs close integration with the application management infrastructure, and Flatpak upstream has the necessary hooks. On Endless OS, supporting Flatpak apps — plus Chromium as a special case — is good enough, since that is the sole mechanism for installing applications. It would be great to see support in Malcontent for other package and app managers.

Special thanks to Jakub Steiner for creating a great icon at very short notice.

Malcontent icon: Silhoutte of parent and child holding hands

Renaming Folders

One of the biggest differences between vanilla GNOME and Endless OS is the app grid, which in Endless is on the desktop and fully under the user’s control. Georges Stavracas has been incrementally chipping away at this, and support for renaming folders landed in GNOME 3.36.

Screenshot of renaming a folder titled “Jeux”

The Long Tail

Besides highly-visible new features and redesigns, much (perhaps even most?) of the work of maintaining a desktop is in the parts you don’t see: improving libraries and plumbing, incremental tweaks to user interfaces, and dealing with the wide variety of hardware, software and users that interact with GNOME. Spelunking through the commit histories of various projects, I see many names of colleagues present and past, including André Moreira Magalhães and Philip Chimento respectively. Jian-Hong Pan from the Endless kernel team makes an appearance in GNOME Settings, as does a feature from erstwhile Endless kernel hacker Carlo Caione dating back to 2018.

Umang Jain, Philip Withnall and Matthew Leeds have put a lot of work into improving the robustness of GNOME Software and Flatpak, and there’s more landing as we speak. I’m particularly glad that Matthew has been tracking down missing Flatpak app updates in GNOME Software – bugs which hide information can be the trickiest ones to spot. And Philip is solving the latest Mystery of the Missing Progress Bar when installing Flatpak apps in GNOME Software.

I’m certain I’ve missed many great contributions. Please forgive me, fellow Endlessers.

A Broad Church

Perhaps my favourite part of being involved in GNOME is collaborating with great people from organisations who, in a different world, might be bitter rivals. All of the work I’ve described was a joint effort with others from the GNOME community; and, just as other distributors share the fruits of our labour, we and our users share the fruits of theirs. This is the latest in a long line of great GNOME releases – long may this trend continue.

Vanilla is a complex and delicious flavour

Last week, Tobias Bernard published a thought-provoking article, There is no “Linux” Platform (Part 1), based on a talk at LAS 2019. (Unfortunately I couldn’t make it to LAS, and I haven’t found the time to watch a recording of the talk, so I’m going solely from the blog post here.) The article makes some interesting observations, and I found a fair few things to agree with. But I want to offer a counterpoint to this paragraph of the final section, “The Wrong Incentives”:

The Endless OS shell is a great example of this. They started out with vanilla GNOME Shell, but then added ever more downstream patches in order to address issues found in in-house usability tests. This means that they end up having to do huge rebases every release, which is a lot of work. At the same time, the issues that prompted the changes do not get fixed upstream (Endless have recently changed their strategy and are working upstream much more now, so hopefully this will get better in the future).

If we’re looking at the code shipping in Endless OS today, then yes, our desktop is vanilla GNOME Shell with a few hundred patches on top, and yes, as a result, rebasing onto new GNOME releases is a lot of work. But the starting point for Endless OS was not “what’s wrong with GNOME?” but “what would the ideal desktop look like for a new category of users?”.

When Endless began, the goal was to create a new desktop computing product, targeting new computer users in communities which were under-served by existing platforms and products. The company conducted extensive field research, and designed a desktop user interface for those users. Prototypes were made using various different components, including Openbox, but ultimately the decision was made to base the desktop on GNOME, because GNOME provided a collection of components closest to the desired user experience. The key point here is that basing the Endless desktop on GNOME was an implementation detail, made because the GNOME stack is a robust, feature-rich and flexible base for a desktop.

Over time, the strategy shifted away from being based solely around first-party hardware, towards distributing our software to a broader set of users using standard desktop and laptop hardware. Around the same time, Endless made the switch from first- and third-party apps packaged as a combination of Debian packages and an in-house system towards using Flatpak for apps, and contributed towards the establishment of Flathub. Part of the motivation for this switch was to get Endless out of the business of packaging other people’s applications, and instead to enable app developers to directly target desktop Linux distributions including, but not limited to, Endless OS.

A side-effect of this change is that our user experience has become somewhat less consistent because we have chosen not to theme apps distributed through Flathub, with the exception of minimize/maximize window controls and a different UI font; and, of course, Flathub offers apps built with many different toolkits. This is still a net positive: our users have access to many more applications than they would have done if we had continued distributing everything ourselves.

As the prototypal Endless OS user moved closer to the prototypal GNOME user, we have focused more on finding ways to converge with the GNOME user experience. In some cases, we’ve simply removed functionality which we don’t think is necessary for our current crop of users. For example, Endless OS used to target users whose display was a pre-digital TV screen, with a 720×480 resolution. I think persuading the upstream maintainers of GNOME applications to support this resolution would have been a hard sell in 2014, let alone in 2019!

Some other changes we’ve made can and have been simply be proposed upstream as they are, but the bulk of our downstream functionality forms a different product to GNOME, which we feel is still valuable to our users. We are keen to both improve GNOME, and reduce the significant maintenance burden which Tobias rightly refers to, so we’re incrementally working out which functionality could make sense in both Endless and GNOME in some form, working out what that form could be, and implementing it. This is a big project because engaging constructively with the GNOME community involves more thought and nuance than opening a hundred code-dump merge requests and sailing away into the sunset.

If you are building a product whose starting point is “GNOME, but better”, then I encourage you to seriously consider whether you can work upstream first. I don’t think this is a groundbreaking idea in our community! However, that was not the starting point for Endless OS, and even today, we are aiming for a slightly different product to GNOME.

Back out to the big picture that is the subject of Tobias’ article: I agree that desktop fragmentation is a problem for app developers. Flatpak and Flathub are, in my opinion, a major improvement on the status quo: app developers can target a common environment, and have a reasonable expectation of their apps working on all manner of distributions, while we as distro maintainers need not pretend that we know best how to package a Java IDE. As the maintainer of a niche app written using esoteric tools, Flathub allowed me – for the first time since I wrote the first version in 2008 – to distribute a fully-functional, easy-to-install application directly to users without burdening distribution developers with the chore of packaging bleeding-edge versions of Haskell libraries. It gave me a big incentive to spend some of my (now very limited) free time on some improvements to the app that I had been putting off until I had a way to get them to users (including myself on Endless OS) in a timely manner.

On the other hand, we shouldn’t underestimate the value of GNOME – and distros like Debian – being a great base for products that look very different to GNOME itself: it enables experimentation, exploration, and reaching a broader base of users than GNOME alone could do, while pooling the bulk of our resources. (On a personal level, I owe pretty much my entire career in free software to products based on Debian and the GNOME stack!)

Some caveats: I joined Endless in mid-2016, midway through the story above, so I am relying on my past and current colleagues’ recollections of the early days of the company. Although today I am our Director of Platform, I am still just one person in the team! We’re not a hive mind, and I’m sure you’ll find different opinions on some of these points if you ask around.

Flatpak External Data Checker

(This post is a slightly longer version of a lightning talk I gave at GUADEC 2019.)

Many non-free applications’ binaries cannot be redistributed (particularly not in modified form), so they cannot be included directly in a Flatpak. To work around this, Flatpak supports the concept of “extra data”: files which will be downloaded and unpacked from a third-party URI when the app is installed. The URI is accompanied by a checksum and a size, to provide some hope that the data unpacked on the user’s system is the same as what the packager tested. This is used by, for example, the Dropbox Flatpak.

Of course, the Flatpak needs to be kept up to date when new versions of the app are released. At best, the old URL will still point to the same file, so at least the old version of the app will continue to be installed; in some cases, however, vendors publish new versions of the app at the same URL, which means the Flatpak cannot be installed until it is updated.

Some time ago, Joaquim Rocha started work on Flatpak External Data Checker to periodically check a Flatpak manifest and report when it needs updating. As well as just checking that a URL is reachable and has the expected size and checksum, it also knows how to follow a redirect to a stable URI for the latest version (a helpful pattern some apps use), or to find the latest package in an apt repository. I subsequently taught it how to determine the new app’s version, update the AppData file, commit the necessary changes to Git, and send a pull request (like this one).

I tried moderately hard to preserve YAML and XML comments and formatting. For JSON, I gave up trying to preserve formatting (let alone json-glib’s non-standard extensions); the output is at least deterministic, so once it’s reformatted the JSON, the diffs will be smaller in future.

At Endless, we run this for a short list of apps on Flathub (and a few on Endless’s Flatpak repo). If you want to get PRs for an app you maintain, add the necessary metadata to your Flathub application’s manifest, then send a pull request to update the list of repos we check. I hope that in the medium term we could move this over to Flathub’s build infrastructure and run it on every repo (with some way to opt out).

There are a fair few open issues – PRs, suggestions and bug reports all very welcome!

γυαδεκ? χκπτγεδ?

GUADEC in Thessaloniki was a great experience, as ever. Thank you once again to the GNOME Foundation for sponsoring my attendence!

Sponsored by GNOME Foundation

Some personal highlights, in no particular order:

  • A lot of useful and informative discussion at the GNOME Advisory Board meeting on Thursday – we ran out of time, which seems like a good sign.
  • After Benjamin Berg and Iain Lane’s great talk on Managing GNOME Sessions with Systemd, Benjamin and I discussed the special-case they had to make to run GNOME Initial Setup’s “copy worker” early in the user session, and whether we might be able to improve this and various other aspects by launching Initial Setup in a different way.
  • Via Matthias’ talk on Portals, I got thinking about the occasional requests for an “is this app installed?” portal, and I realised that you can actually fake it with existing machinery in some cases. If you care about a specific app, you probably want to be able to talk to it, so you specify --talk-name=org.example.Foo; at which point you can call org.freedesktop.DBus.ListActivatableNames() and check whether org.example.Foo is in the returned list.
  • The Intern Lightning Talks were inspiring: it’s great to see what has caught the interest of new contributors. This year, I was inspired by Srestha Srivastava’s work on Boxes to send a merge request to osinfo-db to generate the necessary XML for Endless OS. This in turn led to a great discussion with Fabiano and Felipe, and to some more issues and merge requests.
  • Alex Larsson was a tough act to follow at the lightning talks, but based on hallway discussion, my bit on Flatpak External Data Checker was of interest. (I taught it how to update appdata on the flight home. The person sitting next to me told me that writing code on flights is a young-person thing, which I took as a compliment.)
  • Not one, but two talks on user testing! One thing I took away is that while it’s possible to conduct remote usability testing, you’ll miss out on body language cues from the test subjects, and in the specific case of GNOME you’ll either bias the sample towards people who already use GNOME, or you’ll introduce the additional variable of whatever remote access tool the user uses. Not ideal!

On the Endless front, the launch of the Coding Education Challenge, and the various talks from my esteemed colleagues about varied activities, were all great to see.

There were lots of clashes for me, so I’m grateful to the AV team for their great work on recording all the talks. (Unfortunately, one of the talks I couldn’t make it to, on GDPR, was not recorded, to avoid distributing what could be construed as legal advice. Alas!) Many thanks to the local team and the GNOME Foundation staff and volunteers who made the event run so smoothly.

Age rating data for Flathub apps

OARS (Open Age Ratings Service) defines a scheme to include content rating information in apps’ AppData/AppStream file. GNOME Software and similar tools use this metadata to show age ratings for applications. In Endless OS, we also support restricting which applications a given user can install based on this data – see this page, and the reports it links to, for a bit more information about this feature and its future.

Screenshot: “Age Rating: 7. The application was rated this way because it features: Characters in aggressive conflict easily distinguishable from reality”

Every new app on Flathub should include OARS metadata, but there are many existing apps which don’t have this data, so it’s not (yet) enforced at build time. Edit: Bartłomiej tells me that it has been enforced at build time for a little over a month. (See App Requirements and AppData Guidelines on the Flathub wiki for some more information on what’s required and recommended; this branch of appstream-glib is modified to enforce Flathub’s policies.) My colleague Andre Magalhaes crunched the data and opened a tracker task for Flathub apps without OARS metadata.1 This information is much more useful if it can be relied upon to be present.

If you’re familiar with an app on this list, generating the OARS data is a simple process: open this generator in your browser, answer some questions about the app, and receive some XML. The next step is to put that OARS data into the AppData.2 Take a look at the app’s Flathub repo and check whether it has an .appdata.xml file.

If it doesn’t, then the app’s AppData must be maintained upstream. Great! Find the upstream repository for the project, and send a merge request there. (Here’s one I sent earlier, for D-Feet.) You can either add the same patch to the Flathub packaging (as I did for D-Feet) or wait for a new upstream release and then update the Flathub packaging to that version (as I also did for D-Feet, a couple of days later).

If the appdata is maintained in the Flathub repo, make the relevant changes there directly. (Here’s a PR I opened for Tux, of Math Command while writing this post.) Ideally, the appdata would make its way upstream, but there are a fair few apps on Flathub which do not have active upstreams.

You might well find that the appdata requirements have become more strict about things other than OARS since the app was last updated, and these will have to be fixed too. In both the example cases above, I had to add release information, which has become mandatory for Flathub apps since these were last updated.

  1. We have a similar list for our in-house apps. []
  2. If you’re the upstream maintainer for the app, you probably already know how to do all this, and can stop reading here! []

Rebasing downstream translations

At Endless, we maintain downstream translations for an number of GNOME projects, such as gnome-software, gnome-control-center and gnome-initial-setup. These are projects where our (large) downstream modifications introduce new user-facing strings. Sometimes, our translation for a string differs from the upstream translation for the same string. This may be due to:

  • a deliberate downstream style choice – such as tú vs. usted in Spanish
  • our fork of the project changing the UI so that the upstream translation does not fit in the space available in our UI – “Suspend” was previously translated into German as „In Bereitschaft versetzen“, which we changed to „Bereitschaft“ for this reason
  • the upstream translation being incorrect
  • the whim of a translator

Whenever we update to a new version of GNOME, we have to reconcile our downstream translations with the changes from upstream. We want to preserve our intentional downstream changes, and keep our translations for strings that don’t exist upstream; but we also want to pull in translations for new upstream strings, as well as improved translations for existing strings. Earlier this year, the translation-rebase baton was passed to me. My predecessor would manually reapply our downstream changes for a set of officially-supported languages, but unlike him, I can pretty much only speak English, so I needed something a bit more mechanical.

I spoke to various people from other distros about this problem.1 A common piece of advice was to not maintain downstream translation changes: appealing, but not really an option at the moment. I also heard that Ubuntu follows a straightforward rule: once the translation for a string has been changed downstream, all future upstream changes to the translation for that string are ignored. The assumption is that all downstream changes to a translation must have been made for a reason, and should be preserved. This is essentially a superset of what we’ve done manually in the past.

I wrote a little tool to implement this logic, pomerge translate-o-tron 3000 (or “t3k” for short).2 Its “rebase” mode takes the last common upstream ancestor, the last downstream commit, and a working copy with the newest downstream code. For each locale, for each string in the translation in the working copy, it compares the old upstream and downstream translations – if they differ, it merges the latter into the working copy. For example, Endless OS 3.5.x was based on GNOME 3.26; Endless OS 3.6.x is based on GNOME 3.32. I might rebase the translations for a module with:

$ cd src/endlessm/gnome-control-center

# The eos3.6 branch is based on the upstream 3.32.1 tag
$ git checkout eos3.6

# Update the .pot file
$ ninja -C build meson-gnome-control-center-2.0-pot

# Update source strings in .po files
$ ninja -C build meson-gnome-control-center-2.0-update-po

# The eos3.6 branch is based on the upstream 3.26.1 tag;
# merge downstream changes between those two into the working copy
$ t3k rebase `pwd` 3.26.1 eos3.5

# Optional: Python's polib formats .po files slightly differently to gettext;
# reformat them back. This has no semantic effect.
$ ninja -C build meson-gnome-control-center-2.0-update-po

$ git commit -am 'Rebase downstream translations'

It also has a simpler “copy” mode which copies translations from one .po file to another, either when the string is untranslated in the target (the default) or for all strings. In some cases, we’ve purchased translations for strings which have not been translated upstream; I’ve used this to submit some of those upstream, such as Arabic translations of the myriad OARS categories, and hope to do more of that in the future now I can make a computer do the hard work.

$ t3k copy \
    ~/src/endlessm/gnome-software/po/ar.po \
    ~/src/gnome/gnome-software/po/ar.po
  1. I’d love to credit the individuals I spoke to but my memory is awful. Please let me know if you remember being on the other side of these conversations and I’ll update this post! []
  2. Thanks to Alexandre Franke for pointing out the existence of at least one existing tool called “pomerge”. In my defence, I originally wrote this script on a Eurostar with no internet connection, so couldn’t search for conflicts at the time. []

Yes, it was something to do with controlling terminals

Back in June, I wrote about the difficulty of killing the dbus-monitor subprocess when Bustle is recording the system bus:

But sending SIGKILL from the unprivileged Bustle process to the privileged dbus-monitor process has no effect. Weirdly, if I run pkexec dbus-monitor --system in a terminal, I can hit Ctrl+C and kill it just fine. (Is this something to do with controlling terminals? Please tell me if you know how I could make this work.)

Greetings, past me! I’m just like you, only 6 months older, and I’m here to tell you how to make this work.

When you run a process in a terminal (emulator), its controlling TTY is a pseudoterminal (hereafter PTY) slave1 device. When you press Ctrl+C, the terminal doesn’t send SIGINT to the running process; it writes the ^C character '\x03' into the corresponding PTY master. This is intercepted by the kernel’s TTY layer, which eats this character and sends SIGINT to the (foreground) process (group) attached to the PTY slave. It doesn’t matter if that process (group) is owned by another user: being able to write to its controlling terminal is the capability you need.

Armed with this knowledge, and some further reading on how to set the controlling TTY, it was reasonably easy to wire this up in Bustle:

You may wonder how the second step works in the Flatpak case, where the pkexec dbus-monitor process is not spawned directly by Bustle, but by the Flatpak session helper daemon. Well, we don’t get a chance to run our own code between fork() and exec(), but that’s okay because the session helper does exactly the same thing for us. (I think this is why Ctrl+C works in a host shell running within Flatpak:d GNOME Builder.)

  1. not my choice of terminology []

The devil makes work for idle processes

TLDR: in Endless OS, we switched the IO scheduler from CFQ to BFQ, and set the IO priority of the threads doing Flatpak downloads, installs and upgrades to “idle”; this makes the interactive performance of the system while doing Flatpak operations indistinguishable from when the system is idle.

At Endless, we’ve been vaguely aware for a while that trying to use your computer while installing or updating apps is a bit painful, particularly on spinning-disk systems, because of the sheer volume of IO performed by the installation/update process. This was never particularly high priority, since app installations are user-initiated, and until recently, so were app updates.

But, we found that users often never updated their installed apps, so earlier this year, in Endless OS 3.3.10, we introduced automatic background app updates to help users take advantage of “new features and bug fixes” (in the generic sense you so often see in iOS/Android app release notes). This fixed the problem of users getting “stuck” on old app versions, but made the previous problem worse: now, your computer becomes essentially unusable at arbitrary times when app updates happen. It was particularly bad when users unboxed a system with an older version of Endless OS (and hence a hundred or so older apps) pre-installed, received an automatic OS update, then rebooted into a system that’s unusable until all those apps have been updated.

At first, I looked for logic errors in (our versions of) GNOME Software and Flatpak that might cause unneccessary IO during app updates, without success. We concluded that heavy IO load when updating a large app or runtime is largely unavoidable,1 so I switched to looking at whether we could mitigate this by tweaking the IO scheduler.

The BFQ IO scheduler is supposed to automatically prioritize interactive workloads over bulk workload, which is pretty much exactly what we’re trying to do. The specific example its developers give is watching a video, without hiccups, while copying a huge file in the background. I spent some time with the BFQ developers’ own suite of benchmarks on two test systems: a Lenovo Yoga 900 (with an Intel i5-6200U @ 2.30GHz and a consumer-grade M.2 SSD) and an Endless Mission One (an older system with a Celeron CPU and a laptop-class spinning disk). Neither JP nor I were able to reproduce any interesting results for the dropped-frames benchmark: with either BFQ or CFQ (the previous default IO scheduler), the Yoga essentially never dropped frames, whereas the IO workloads immediately rendered the Mission totally unusable. I had rather more success with a benchmark which measures the time to launch LibreOffice:

  • On the Yoga, when the system was idle, the mean launch time went from 2.838s under CFQ to 2.98s under BFQ (a slight regression), but with heavy background IO, the mean launch time went from 16s with CFQ (standard deviation 0.11) to 3s with BFQ (standard deviation 0.51).
  • On the Mission, with modest background IO, the mean launch time was 108 seconds under BFQ, which sounds awful; but under CFQ, I gave up waiting for LibreOffice to start after 8 minutes!

Emboldened by these results, I went on to look at how the same “time to launch LibreOffice” benchmark fared when the background IO load is “installing and uninstalling a Lollipop Flatpak bundle in a loop”. I also looked at using ionice -c3 to set the IO priority of the install/uninstall loop to idle, which does what its name suggests: BFQ essentially will never serve IO at the idle priority if there is IO pending at any higher priority. You can see some raw data or look at some extended discussion copied from our internal issue tracker to a Flatpak pull request, but I suggest just looking at this chart:

What does it all mean?

  • The coloured bars represent median launch time in seconds for LibreOffice, across 15/30 trials for Yoga/Mission respectively.
  • The black whiskers show the minimum and maximum launch times observed. I know this should have been a box-and-whiskers or violin plot, but I realised too late that multitime does not give enough information to draw those.
  • “unloaded” refers to the performance when the system is otherwise idle.
  • “shell-loop” refers to running while true; do flatpak install -y /home/wjt/Downloads/org.gnome.Lollypop.flatpak; flatpak uninstall -y org.gnome.Lollypop/x86_64/stable; done; “long-lived” refers to performing the same operations with the Flatpak API in a long-lived process. I tried this because I understood that BFQ gives new processes a slight performance boost, but on a real system the GNOME Software and Flatpak system helper processes are long-lived. As you can see, the behaviour under BFQ is actually the other way around in the worst case, and identical for CFQ and in the median case.
  • The “ionice-” prefix means the Flatpak operation was run under ionice -c3.
  • Switching from CFQ to BFQ makes the worst case a little worse at the default IO priority, but the median case much better.
  • Setting the IO priority of the Flatpak process(es) to idle erases that worst-case regression under BFQ, and dramatically improves the median case under CFQ.
  • In combination, the time to launch LibreOffice while performing Flatpak operations in the background on the Mission went from 24 seconds to 12 seconds by switching to BFQ & setting the IO priority to idle.

So, by switching to BFQ and setting IO priorities appropriately, the system’s interactive performance while performing background updates is now essentially indistinguishable from when the system is idle. To implement this in practice, Rob McQueen wrote some patches to set the IO priority of the Flatpak system helper and GNOME Software’s worker threads to idle (both changes are upstream) and changed Endless OS’s default IO scheduler to BFQ where available. As Matthias put it on #flatpak when shown this chart and that first link: “not bad for a 1-line change”.

Of course, this means apps take a bit longer to install, even on a mostly-idle system. No, I don’t have numbers on how big the impact is: this work happened months ago and it’s taken me this long to write it up because I couldn’t find the time to collect more data. But my colleague Umang is working on eliminating up to half of the disk IO performed during Flatpak installations so that should more than make up for it!

  1. modulo Umang’s work mentioned in the coda []

Wandering in the symlink forest forever

Last week, Philip Withnall told me that Meson has built-in support for generating code coverage reports: just configure with -Db_coverage=true, run your tests with ninja test, then run ninja coverage-{text,html,xml} to generate the report in the format of your choice. The XML format is compatible with Cobertura’s output, which is convenient since Endless’s Jenkins is already configure to consume Cobertura XML generated by Autotools projects using our EOS_COVERAGE_REPORT macro. So it was a simple matter of adding gcovr to the build enviroment, running ninja coverage-xml after the tests, and moving the report to the right place for Jenkins to find it. It worked well on the projects I tested, so I decided to enable it for all Meson projects built in our CI. Sure, I thought, it’s not so useful for our forks of GNOME and third-party projects, but it’s harmless and saves adding per-project config, right?

Fast-forward to yesterday, when someone noticed that a systemd build had been stuck on the ninja coverage-xml step for 16 hours. Uh oh.

It turns out that gcovr follows symlinks when scanning for coverage files, but didn’t check for cycles. systemd’s test suite generates a fake sysfs tree, with many circular references via symlinks. For example, there are 64 self-referential ttyX trees:

$ ls -l build/test/sys/devices/virtual/tty/tty1
total 12
-rw-r--r-- 1 wjt wjt    4 Oct  9 12:16 dev
drwxr-xr-x 2 wjt wjt 4096 Oct  9 12:16 power
lrwxrwxrwx 1 wjt wjt   21 Oct  9 12:16 subsystem -> ../../../../class/tty
-rw-r--r-- 1 wjt wjt   16 Oct  9 12:16 uevent
$ ls -l build/test/sys/devices/virtual/tty/tty1/subsystem/tty1
lrwxrwxrwx 1 wjt wjt 30 Oct  9 12:16 build/test/sys/devices/virtual/tty/tty1/subsystem/tty1 -> ../../devices/virtual/tty/tty1
$ readlink -f build/test/sys/devices/virtual/tty/tty1/subsystem/tty1
/home/wjt/src/endlessm/systemd/build/test/sys/devices/virtual/tty/tty1

And, worse, all other ttyY trees are accessible via the symlinks from each ttyX tree. The kernel caps the number of symlinks per path to 40 before lookups fail with ELOOP, but that’s still 6440 paths to resolve, just for the fake ttys. Quite a big number!

The fix is straightforward: maintain a set of visited (st_dev, st_ino) pairs while walking the tree, and prune subtrees we’ve already visited. I tried adding a similar highly self-referential symlink graph to the gcovr test suite, so that it would run in reasonable time if the fix works and essentially never terminate if it does not. Unfortunately, pytest has exactly the same bug: while searching for tests to run, it gets lost wandering in the symlink forest forever.

This bug is a good metaphor for my habit of starting supposedly-quick side-projects.

They should have called it Mirrorball

TL;DR: there’s now an rsync server at rsync://images-dl.endlessm.com/public from which mirror operators can pull Endless OS images, along with an instance of Mirrorbits to redirect downloaders to their nearest—and hopefully fastest!—mirror. Our installer for Windows and the eos-download-image tool baked into Endless OS both now fetch images via this redirector, and from the next release of Endless OS our mirrors will be used as BitTorrent web seeds too. This should improve the download experience for users who are near our mirrors.

If you’re interested in mirroring Endless OS, check out these instructions and get in touch. We’re particularly interested in mirrors in Southeast Asia, Latin America and Africa, since our mission is to improve access to technology for people in these areas.

Big thanks to Niklas Edmundsson, who administers the mirror at Academic Computer Club, Umeå University, who recommended Mirrorbits and provided the nudge needed to get this work going, and to dotsrc.org and Mythic Beasts who are also mirroring Endless OS already.

Read on if you are interested in the gory details of setting this up.


We’ve received a fair few offers of mirroring over the years, but without us providing an rsync server, mirror operators would have had to fetch over HTTPS using our custom JSON manifest listing the available images: extra setup and ongoing admin for organisations who are already generously providing storage and bandwidth. So, let’s set up an rsync server! One problem: our images are not stored on a traditional filesystem, but in Amazon S3. So we need some way to provide an rsync server which is backed by S3.

I decided to use an S3-backed FUSE filesystem to mount the bucket holding our images. It needs to provide a 1:1 mapping from paths in S3 to paths on the mounted filesystem (preserving the directory hierarchy), perform reasonably well, and ideally offer local caching of file contents. I looked at two implementations (out of the many that are out there) which have these features:

  • s3fs-fuse, which is packaged for Debian as s3fs. Debian is the predominant OS in our server infrastructure, as well as the base for Endless OS itself, so it’s convenient to have a package.1
  • goofys, which claims to offer substantially better performance for file metadata than s3fs.

I went with s3fs first, but it is a bit rough around the edges:

  • Our S3 bucket name contains dots, which is not uncommon. By default, if you try to use one of these with s3fs, you’ll get TLS certificate errors. This turns out to be because s3fs accesses S3 buckets as $NAME.s3.amazonaws.com, and the certificate is for *.s3.amazonaws.com, which does not match foo.bar.s3.amazonaws.com. s3fs has a -o use_path_request_style flag which avoids this problem by putting the bucket name into the request path rather than the request domain, but this use of that parameter is only documented in a GitHub Issues comment.
  • If your bucket is in a non-default region, AWS serves up a redirect, but s3fs doesn’t follow it. Once again, there’s an option you can use to force it to use a different domain, which once again is documented in a comment on an issue.
  • Files created with s3fs have their permissions stored in an x-amz-meta-mode header. Files created by other tools (which is to say, all our files) do not have this header, so by default get mode 0000 (ie unreadable by anybody), and so the mounted filesystem is completely unusable (even by root, with FUSE’s default settings).

There are two ways to fix this last problem, short of adding this custom header to all existing and future files:

  1. The -o complement_stat option forces files without the magic header to have mode 0400 (user-readable) and directories 0500 (user-readable and -searchable).
  2. The -o umask=0222 option (from FUSE) makes the files and directories world-readable (an improvement on complement_stat in my opinion) at the cost of marking all files executable (which they are not)

I think these are all things that s3fs could do by itself, by default, rather than requiring users to rummage through documentation and bug reports to work out what combination of flags to add. None of these were showstoppers; in the end it was a catastrophic memory leak (since fixed in a new release) that made me give up and switch to goofys.

Due to its relaxed attitude towards POSIX filesystem semantics where performance would otherwise suffer, goofys’ author refers to it as a “Filey System”.2 In my testing, throughput is similar to s3fs, but walking the directory hierarchy is orders of magnitude faster. This is due to goofys making more assumptions about the bucket layout, not needing to HEAD each file to get its permissions (that x-amz-meta-mode thing is not free), and having heuristics to detect traversals of the directory tree and optimize for that case.3

For on-disk caching of file contents, goofys relies on catfs, a separate FUSE filesystem by the same author. It’s an interesting design: catfs just provides a write-through cache atop any other filesystem. The author has data showing that this arrangement performs pretty well. But catfs is very clearly labelled as alpha software (“Don’t use this if you value your data.”) and, as described in this bug report with a rather intemperate title, it was not hard to find cases where it DOSes itself or (worse) returns incorrect data. So we’re running without local caching of file data for now. This is not so bad, since this server only uses the file data for periodic synchronisation with mirrors: in day-to-day operation serving redirects, only the metadata is used.

With this set up, it’s plain sailing: a little rsync configuration generator that uses the filter directive to only publish the last two releases (rather than forcing terabytes of archived images onto our mirrors) and setting up Mirrorbits. Our Mirrorbits instance is configured with some extra “mirrors” for our CloudFront distribution so that users who are closer to a CloudFront-served region than any real mirrors are directed there; it could also have been configured to make the European mirrors (which they all are, as of today) only serve European users, and rely on its fallback mechanism to send the rest of the world to CloudFront. It’s a pretty nice piece of software.

If you’ve made it this far, and you operate a mirror in Southeast Asia, South America or Africa, we’d love to hear from you: our mission is to improve access to technology for people in these areas.

  1. Do not confuse s3fs-fuse with fuse-s3fs, a totally separate project packaged in Fedora. It uses its own flattened layout for the S3 bucket rather than mapping S3 paths to filesystem paths, so is not suitable for what we’re doing here. []
  2. This inspired a terrible “joke”. []
  3. I’ve implemented a similar optimization elsewhere in our codebase: since we have many "directories", it takes many less requests to ask S3 for the full contents of the bucket and transform that list into a tree locally than it does to list each directory individually. []