On Flatpak disk usage and deduplication

There is a blog post doing the rounds asserting that Flatpak Is Not The Future. The post is really long, and it seems unlikely that I and the author will ever agree on this topic, so I’m only going to talk about a couple of paragraphs about disk usage and sharing of runtimes between apps which caught my eye. This is highly relevant to my day job because all apps on Endless OS are Flatpaks—for example, the English downloadable version has 58 Flatpak apps pre-installed, and 13 runtimes—and I’ve had and answered some of the same questions discussed in the post.

They claim that they deduplicate runtimes. I question how much can really be shared between different branches when everything is recompiled.

This question is really easy to answer using du, which does not double-count files which are hardlinked together. Let’s compare the 20.08 and 21.08 versions of the freedesktop runtime:

wjt@camille:~$ du -sh /var/lib/flatpak/runtime/org.freedesktop.Platform/x86_64/20.08
674M	/var/lib/flatpak/runtime/org.freedesktop.Platform/x86_64/20.08
wjt@camille:~$ du -sh /var/lib/flatpak/runtime/org.freedesktop.Platform/x86_64/21.08
498M	/var/lib/flatpak/runtime/org.freedesktop.Platform/x86_64/21.08
wjt@camille:~$ du -sh /var/lib/flatpak/runtime/org.freedesktop.Platform/x86_64/20.08 /var/lib/flatpak/runtime/org.freedesktop.Platform/x86_64/21.08
674M	/var/lib/flatpak/runtime/org.freedesktop.Platform/x86_64/20.08
385M	/var/lib/flatpak/runtime/org.freedesktop.Platform/x86_64/21.08
wjt@camille:~$ echo $(( 498 - 385 ))
113

113 MB (out of 498 MB for the smaller, more up-to-date 21.08 runtime) is shared between these two runtimes.
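
The subtraction above works because du counts a hardlinked file only once per invocation, against the first directory in which it appears. Here is a minimal Python sketch of the same idea. The function names usage_per_tree and shared_bytes are made up for illustration, and the sketch sums apparent file sizes (st_size) rather than the allocated blocks that du reports, so its figures will not match du exactly.

```python
import os
import stat


def usage_per_tree(paths):
    """Return the total size in bytes of regular files under each path.

    Like du, a file reachable through several hard links is counted once,
    against the first tree in which it is seen.
    """
    seen = set()  # (device, inode) pairs that have already been counted
    totals = []
    for root in paths:
        total = 0
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                st = os.lstat(os.path.join(dirpath, name))
                if not stat.S_ISREG(st.st_mode):
                    continue  # ignore symlinks, sockets and so on
                key = (st.st_dev, st.st_ino)
                if key in seen:
                    continue  # a hard link to something already counted
                seen.add(key)
                total += st.st_size
        totals.append(total)
    return totals


def shared_bytes(first, second):
    """Bytes of `second` that are hard links to files already in `first`."""
    alone = usage_per_tree([second])[0]
    after_first = usage_per_tree([first, second])[1]
    return alone - after_first
```

Called on the 20.08 and 21.08 directories from the transcript, shared_bytes() would play the role of the 498 - 385 calculation.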

How about the GNOME 41 runtime, which is derived from the 21.08 freedesktop runtime?

wjt@camille:~$ du -sh /var/lib/flatpak/runtime/org.freedesktop.Platform/x86_64/21.08
498M	/var/lib/flatpak/runtime/org.freedesktop.Platform/x86_64/21.08
wjt@camille:~$ du -sh /var/lib/flatpak/runtime/org.gnome.Platform/x86_64/41
715M	/var/lib/flatpak/runtime/org.gnome.Platform/x86_64/41
wjt@camille:~$ du -sh /var/lib/flatpak/runtime/org.freedesktop.Platform/x86_64/21.08 /var/lib/flatpak/runtime/org.gnome.Platform/x86_64/41
498M	/var/lib/flatpak/runtime/org.freedesktop.Platform/x86_64/21.08
327M	/var/lib/flatpak/runtime/org.gnome.Platform/x86_64/41
wjt@camille:~$ echo $(( 715 - 327 ))
388

388 MB (out of 715 MB) of the GNOME 41 runtime is shared with the 21.08 runtime.

I can’t imagine what system updates will be like in the future when you have a few dozen apps storing tens of gigabytes of runtimes that all want to be kept up to date.

There is no need to imagine! I have 163 Flatpak apps on my Endless OS system. Let’s see how many runtimes I have, how big they are, and how many apps use each one:

wjt@camille:~$ flatpak list --app --columns=runtime | sort | uniq -c | wc -l
18
wjt@camille:~$ flatpak list --app --columns=runtime | sort | uniq -c | sort -n
      1 com.endlessm.apps.Platform/x86_64/6
      1 org.freedesktop.Platform/x86_64/18.08
      1 org.gnome.Platform/x86_64/3.34
      1 org.gnome.Sdk/x86_64/41
      1 org.kde.Platform/x86_64/5.14
      2 com.endlessm.Platform/x86_64/eos3.2
      2 org.gnome.Platform/x86_64/3.28
      3 org.freedesktop.Platform/x86_64/19.08
      3 org.freedesktop.Sdk/x86_64/21.08
      3 org.gnome.Platform/x86_64/3.38
      3 org.kde.Platform/x86_64/5.15-21.08
      5 org.gnome.Platform/x86_64/3.36
      9 org.kde.Platform/x86_64/5.15
     10 org.freedesktop.Platform/x86_64/20.08
     24 com.endlessm.apps.Platform/x86_64/5
     28 org.freedesktop.Platform/x86_64/21.08
     30 org.gnome.Platform/x86_64/40
     36 org.gnome.Platform/x86_64/41
wjt@camille:~$ cd /var/lib/flatpak/runtime; flatpak list --app --columns=runtime | sort | uniq | xargs du -sh --total
918M	com.endlessm.apps.Platform/x86_64/5
907M	com.endlessm.apps.Platform/x86_64/6
2.0G	com.endlessm.Platform/x86_64/eos3.2
632M	org.freedesktop.Platform/x86_64/18.08
211M	org.freedesktop.Platform/x86_64/19.08
569M	org.freedesktop.Platform/x86_64/20.08
385M	org.freedesktop.Platform/x86_64/21.08
782M	org.freedesktop.Sdk/x86_64/21.08
26M	org.gnome.Platform/x86_64/3.28
329M	org.gnome.Platform/x86_64/3.34
264M	org.gnome.Platform/x86_64/3.36
263M	org.gnome.Platform/x86_64/3.38
198M	org.gnome.Platform/x86_64/40
277M	org.gnome.Platform/x86_64/41
264M	org.gnome.Sdk/x86_64/41
436M	org.kde.Platform/x86_64/5.14
231M	org.kde.Platform/x86_64/5.15
215M	org.kde.Platform/x86_64/5.15-21.08
8.7G	total

I have 18 runtimes, totalling 8.7 GB of storage (deduplicated), not “tens of gigabytes”. The top 5 most-used runtimes on my system cover 128 of the 163 apps. (I am ignoring the .Locale extensions of each runtime: the English and French translations of the 21.08 runtime total 17 MB, compared to 498 MB for the runtime itself, so I think this is a reasonable simplification for rough numbers.) As for updates? GNOME Software applies them automatically and silently. I don’t think about them at all.
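
The counts quoted above can be rechecked from the uniq -c listing. The short Python sketch below copies the per-runtime app counts from that listing and recomputes the number of runtimes, the total number of apps, and how many apps the five most-used runtimes cover.

```python
# Apps per runtime, copied from the `uniq -c` listing above.
apps_per_runtime = {
    "com.endlessm.apps.Platform/x86_64/6": 1,
    "org.freedesktop.Platform/x86_64/18.08": 1,
    "org.gnome.Platform/x86_64/3.34": 1,
    "org.gnome.Sdk/x86_64/41": 1,
    "org.kde.Platform/x86_64/5.14": 1,
    "com.endlessm.Platform/x86_64/eos3.2": 2,
    "org.gnome.Platform/x86_64/3.28": 2,
    "org.freedesktop.Platform/x86_64/19.08": 3,
    "org.freedesktop.Sdk/x86_64/21.08": 3,
    "org.gnome.Platform/x86_64/3.38": 3,
    "org.kde.Platform/x86_64/5.15-21.08": 3,
    "org.gnome.Platform/x86_64/3.36": 5,
    "org.kde.Platform/x86_64/5.15": 9,
    "org.freedesktop.Platform/x86_64/20.08": 10,
    "com.endlessm.apps.Platform/x86_64/5": 24,
    "org.freedesktop.Platform/x86_64/21.08": 28,
    "org.gnome.Platform/x86_64/40": 30,
    "org.gnome.Platform/x86_64/41": 36,
}

total_apps = sum(apps_per_runtime.values())
top_five = sorted(apps_per_runtime.values(), reverse=True)[:5]

# prints: 18 163 128
print(len(apps_per_runtime), total_apps, sum(top_five))
```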

People may disagree about whether the numbers above are large or small, compared to the upsides that Flatpak does or does not bring. But the numbers themselves are readily accessible, as is much of the past and ongoing work that has gone into making them as small/large as they are.

Personally, I think the trade-off is absolutely worth it for me and for Endless OS users, particularly since going all-in on Flatpak means that the base, immutable Endless OS install is just 4.2 GB. Of course there is room for improvement, and years ago I wrote a quick hack to help study exactly which files would ideally be shared between two runtimes but are not. At the time, the primary cause was non-reproducible builds. Since then, the Flatpak ecosystem has moved over to BuildStream which should help a lot, though I haven’t rerun the experiments except for what you see above. Automated statistics about apps using obsolete runtimes might be useful for the Flathub community, as might automated runtime updates, and some further work on understanding if and how the derived runtimes (GNOME & KDE) could share more with the freedesktop runtime version they are based on. And, would widespread use of filesystems that support block-level deduplication (like btrfs) help?
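
To make that kind of comparison concrete, here is a minimal Python sketch. It is not the hack mentioned above; the function name classify_common_files and its three categories are assumptions chosen for illustration. For every regular file found at the same relative path in two trees, it reports whether the two copies are already hardlinked, byte-identical but stored separately, or different in content. The last category is where the effects of non-reproducible builds would be expected to show up.

```python
import filecmp
import os


def classify_common_files(tree_a, tree_b):
    """Classify regular files present at the same relative path in both trees.

    Returns a dict with three sorted lists of relative paths:
      "linked":    same inode, so the data is already stored once
      "identical": same bytes but separate inodes, so stored twice
      "different": contents differ, so the files cannot be shared as-is
    """
    result = {"linked": [], "identical": [], "different": []}
    for dirpath, _dirnames, filenames in os.walk(tree_a):
        for name in filenames:
            path_a = os.path.join(dirpath, name)
            rel = os.path.relpath(path_a, tree_a)
            path_b = os.path.join(tree_b, rel)
            # Only compare regular files; skip symlinks and missing paths.
            if os.path.islink(path_a) or os.path.islink(path_b):
                continue
            if not (os.path.isfile(path_a) and os.path.isfile(path_b)):
                continue
            if os.path.samefile(path_a, path_b):
                result["linked"].append(rel)
            elif filecmp.cmp(path_a, path_b, shallow=False):
                result["identical"].append(rel)
            else:
                result["different"].append(rel)
    for paths in result.values():
        paths.sort()
    return result
```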

Endless Orange Week: GNOME on WSL

The week of 8th–12th November was Endless Orange Week, a program where the entire Endless OS Foundation team engaged in projects designed to grow our collective learning related to our skills, work and mission. My project was to explore running a complete GNOME desktop in a window on Windows, via Windows Subsystem for Linux.

Screenshot of Windows Remote Desktop Connection window, containing a GNOME desktop running Hack and GNOME Settings, in turn showing Virtualization: wsl

Why?!

We’ve long faced the challenge of getting Endless OS into the hands of existing PC users, whether to use it themselves or to try it out with a view to a larger deployment. Most people don’t know what an OS is and, even if they have a spare PC, find the process of replacing the OS technically challenging. Over the years, we’ve tried various approaches: live USBs, an ultra-simple standalone installer (as seen in GNOME OS), dual-booting with Windows (with a 3-click installer app), virtual machine images, and so on. These have been modestly successful – 5% of our download users are using a dual-boot system, for example – but there’s still room for improvement. (I have a personal interest in this problem space because it’s what I joined Endless to work on in 2016!)

In the last few years, it’s become possible to run Linux executables on Windows, using Windows Subsystem for Linux (WSL). Installing the Debian app from the Windows Store gives you a command-line environment which works pretty much like a normal Debian system, running atop a Microsoft-supplied Linux kernel. Most recently, unmodified applications can present X11 or Wayland windows and play audio through PulseAudio; behind the scenes, a Microsoft-supplied distribution with their branches of Weston and PulseAudio exports each Wayland or X11 window & its audio over RDP to the host system, where it appears as a free-standing window like any other. There is also support upstream in Mesa for using the host system’s GPU, with DirectX commands forwarded to the host via a miraculous kernel interface.

This raises an interesting question: rather than individual apps installed and launched from a command line, could the whole desktop be run as a window, packaged up into an easy-to-use launcher, and published in the Windows Store? If so, this would be a nice improvement on the other installation methods we’ve tried to date!

Proofs of concept

I spent last week researching this. Yes, you can indeed run a complete GNOME desktop under WSL. I tried two approaches, which each have strengths and weaknesses. I worked with Debian Bookworm, which has GNOME 41 and an up-to-date Mesa with the Direct3D backend. Imagine telling someone 20 years ago that Debian would one day include development headers for DirectX in the main repository!

I packaged up my collection of scripts and hacks into something which can build a suitable rootfs to import into WSL and launch either demo with a single command. There may be things that I got working in my “pet” container that don’t quite work in this replicable “cattle” container, but I did my best.

GNOME desktop as X11 app

GNOME Shell can be run as a so-called “nested session”, with the entire desktop appearing as a window inside your existing session. Thanks to my team-mate Georges Stavracas for his help understanding this mode, and particularly the surprising (to me) detail that the nested session can only be run as an X11 window, not a Wayland window, which explained some baffling errors I saw when I first tried to get this going.

Once you’ve got enough of the environment GNOME expects running (more on this below), you can indeed just run it with a carefully-crafted set of environment variables and command-line arguments, and it appears on your Windows system, resplendent with Weston’s window decorations:

GNOME desktop running a game, in a window on Windows

Apps can even emit sound over PulseAudio as normal, or at least they could once I fixed an edge case in Flatpak’s handling of the PulseAudio socket. It’s hard to show audio in a screenshot, but you can see the sounds being emitted by Sidetrack (the WebKitWebViewProcess) as well as Hack somewhere in the background (the python3 process), both of which are Flatpak apps, and the RDP sink as output.

So on the face of it this seems quite promising! But Shell’s nested mode is primarily intended for development, with the window size fixed at launch by an environment variable with DEBUG in its name.

I found WSLg to be quite fragile. WSLg’s Xwayland often fell over for unknown reasons. I had to go out of my way to install a newer Intel graphics driver than would automatically be used, to get the vGPU support needed for Mesa’s Direct3D backend to work. But having done this, on one of my machines, the driver would just crash with SIGILL – apparently the driver unconditionally uses AVX instructions even if the CPU doesn’t support them. On my higher-end machine, most apps worked fine, but Shell on this stack would display just a few frames and then hang. In both cases, I could work around the problem by forcing Mesa to use software rendering, but this means losing one of the key advantages of WSLg!

Another weird anecdote: using the D3D12 backend, the GTK demo app’s shadertoy demo works fine, but its gears demo doesn’t render anything – and nor does glxgears! Apparently D3D12 just doesn’t like gears⁈

Thanks to Daniel Stone at Collabora for his patient help navigating Mesa and D3D12 passthrough.

GNOME desktop exported over RDP

In the past few GNOME releases, it’s become possible to access the desktop remotely using RDP. This is Windows’ native remote-access protocol, and is also the mechanism used by WSLg to export windows and audio to the host.

So another approach is to launch GNOME Shell in its headless mode, and then explicitly connect to it with Windows’ RDP client. A nice touch in WSL is that, by the magic of binfmt_misc and an automatic mount of the host system’s drive, you can invoke Windows executables on the host from within the WSL environment, so a single launch script can bring up GNOME, then spawn the Windows RDP client on the host with the correct parameters. (This is how WSLg works too.)
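
As a rough illustration of the second half of such a launch script, the Python sketch below builds a command line for mstsc.exe, Windows’ Remote Desktop client, and spawns it from inside WSL. This is not the actual launch script: the host, port and window size are placeholder assumptions, and the sketch assumes that mstsc.exe can be found on the PATH inside the WSL environment.

```python
import subprocess


def rdp_client_command(host="localhost", port=3389, width=1280, height=720):
    """Build an argument list for Windows' Remote Desktop client.

    /v: names the server to connect to; /w: and /h: set the window size.
    The default values here are placeholders, not taken from the prototype.
    """
    return ["mstsc.exe", f"/v:{host}:{port}", f"/w:{width}", f"/h:{height}"]


def launch_rdp_client(**kwargs):
    """Spawn the client on the Windows host; only meaningful inside WSL."""
    return subprocess.Popen(rdp_client_command(**kwargs))
```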

Not pictured below are the ugly authentication and certificate warning dialogs during the connection flow:

GNOME desktop, accessed over RDP, showing GTK 4's Shadertoy GL demo

Here, GNOME Shell is (AFAICT) rendering using Mesa’s accelerated D3D12 backend, as are GL applications running on it (the GTK 4 Shadertoy demo). But in this model we lose WSLg’s PulseAudio forwarding, and its use of shared memory to send the pixel contents of the desktop to the client. Both of these are solvable problems, though. GNOME Remote Desktop uses the same RDP library, FreeRDP, as WSLg, and all the other supporting code on the Linux side is open-source. GNOME Remote Desktop uses PipeWire rather than PulseAudio, and Mutter rather than Weston, so WSLg’s RDP plugins for PulseAudio and Weston could not be used directly, but audio forwarding over RDP seems a desirable feature to support for normal remoting use-cases. Supporting the shared-memory transport for RDP in GNOME Remote Desktop is perhaps a harder sell, but in principle it could be done.

Just like the nested session, the dimensions of a headless GNOME Shell session are currently fixed on startup. But again I think this would be desirable to solve anyway: this already works well for regular virtual machines, and when connecting to Windows RDP servers.

Rough edges

When you start a WSL shell, PID 1 is an init process provided by WSL, and that’s pretty much all you have: no systemd, no D-Bus system or session bus, nothing. GNOME requires, at least, a functioning system and session bus with various services on them. So for this prototype I used genie, which launches systemd in its own PID namespace and gives you shells within. This works OK, once you change the default target to not try to bring up a full graphical session, disable features not supported by the WSL kernel, and deal with something trampling on WSLg’s X11 sockets. (I thought it was systemd-tmpfiles, but I tried masking the x11.conf file with no success, so I hacked around it for now.) It may be easier to manually launch the D-Bus system bus and session bus without systemd, and run gnome-session in its non-systemd mode, but I expect over time that running GNOME without a systemd user instance will be an increasingly obscure configuration.

Speaking of X11 sockets: both my demos launch GNOME Shell as a pure Wayland compositor, without X11 support. This is because Mutter requires the X11 socket directory to have the sticky bit set, for security reasons, and refuses to start Xwayland if this is not true. But on WSLg /tmp/.X11-unix is a symlink; it is not possible to set the sticky bit on a symlink, and Mutter uses lstat() to explicitly check the symlink’s permissions rather than its target. I think it would be safe to check the symlink’s target instead, provided that Mutter also checked that /tmp had the sticky bit (preventing the symlink from being replaced), but I haven’t fully thought this through.
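
The difference between the two checks is easy to demonstrate. The Python sketch below is not Mutter’s code: has_sticky_bit and socket_dir_looks_safe are hypothetical names, and the second function only illustrates the relaxed check floated above, without any claim that it is actually safe.

```python
import os
import stat


def has_sticky_bit(path, follow_symlinks):
    """Check the sticky bit with stat() (follow) or lstat() (do not follow)."""
    st = os.stat(path, follow_symlinks=follow_symlinks)
    return bool(st.st_mode & stat.S_ISVTX)


def socket_dir_looks_safe(path):
    """Accept a sticky directory, or a symlink to one inside a sticky parent.

    Illustrative only: the symlink case is accepted when the parent directory
    is sticky (so other users cannot replace the link) and the link's target
    is sticky too.
    """
    if has_sticky_bit(path, follow_symlinks=False):
        return True  # the strict, lstat()-style check passes
    if not os.path.islink(path):
        return False
    parent = os.path.dirname(os.path.abspath(path))
    return (has_sticky_bit(parent, follow_symlinks=False)
            and has_sticky_bit(path, follow_symlinks=True))
```

For a symlink such as WSLg’s /tmp/.X11-unix, has_sticky_bit(path, follow_symlinks=False) is false even when the target directory is sticky, which matches the behaviour described above.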

WSL is non-trivial to set up. The first time you try to run a WSL distro installed from the Windows Store, you have to follow a link to a support article which tells you how to install WSL, which involves a trip deep into the Settings app followed by a download and reboot. As mentioned above, I also had issues with the vGPU support in Intel’s driver on both my systems, which I had to go out of my way to install in the first place, and WSLg’s Xwayland session was somewhat unstable. So I fear it may not be much easier for a non-technical user than our existing installation methods. Perhaps this will change over time.

GNOME Remote Desktop’s RDP backend needs some manual set-up at present. You have to manually generate a TLS key and certificate, set up a new username & password combo, and set the session to be read-write. You also have to arrange for the GNOME Keyring to be unlocked in your headless session, which is a nice chicken-and-egg problem. Once you’ve done all this (and remembered to install a PipeWire session manager in your minimal container) it works rather nicely, and I know that design and engineering work is ongoing to make the set-up easier.

Conclusions

Although I got the desktop running, and there are some obvious bits of follow-up work that could be done to make the experience better, I don’t think this is currently a viable approach for making it easier to try GNOME or Endless OS. There is too much manual set-up required; while it might be possible to bundle some of the Linux side of this into the WSL wrapper app, installing WSL itself is still quite cumbersome, if not exactly rocket science. The many moving parts are still rather new, and I hit various crashes and strange behaviour. Even with software rendering, the performance was fine on my relatively high-end developer laptop, but performance was pretty bad on my normal Windows machine, a lower-spec device that might be more representative of computers in general.

I do still think this general approach of running the desktop windowed in a container on a foreign OS is an interesting one to keep an eye on and re-evaluate periodically. Chrome OS is another potential target, since it also supports running arbitrary Linux containers with Wayland forwarding, though my understanding is that it also involves rather a lot of manual set-up and is not supported on managed devices or when parental controls are in use…

I was happy to experience first-hand the progress GNOME has made in supporting RDP. This kind of functionality may not be important to most GNOME developers and enthusiasts but it’s really important in some contexts. I used to work in an environment where I needed remote access to my desktop, and RDP was the only permitted protocol. Back in 2014, the tools to do this on my Linux system were truly dire, particularly if you wanted to access a normal desktop remotely rather than a virtualised desktop; by contrast, accessing my Windows system from a Windows, Linux or macOS client worked really well. GNOME Remote Desktop has made some big strides in the right direction, with better integration with the desktop and fewer fragile hacks. I’ll keep watching this space with interest.

WSL itself is also an impressive technical achievement, and the entire Linux side of it is free software. There is nice integration with the host system – for example, you can run Visual Studio Code on the host by running code in WSL, and everything works transparently.

On a personal level, I learnt many new things during the course of Endless Orange Week. Besides learning about WSL, I also learnt how to break on a syscall in gdb (catch syscall 16 for ioctl() on x86_64) and inspect the parameter registers; how Mesa chooses its backend (fun fact: most of the modules in /usr/lib/x86_64-linux-gnu/dri/ are hardlinks of one another); the importance of a PipeWire session manager; more about how PID and mount namespaces work; and so on. It was a nice change from my usual day-to-day work, and I think the research is valuable, even if it doesn’t immediately translate into a production project.

Chromium on Flathub

In December 2020, Chromium reached the Flathub stable channel. Assuming you have Flatpak 1.8.2 or newer, and your kernel is configured to allow unprivileged user namespaces, you can download it now.
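
If you are unsure about the second requirement, the Python sketch below reads the two sysctls that most commonly gate unprivileged user namespaces. It is a best-effort check under stated assumptions, not something Flatpak itself does: the function name is made up, kernel.unprivileged_userns_clone exists only on kernels carrying the Debian-family patch, and other restrictions (a security module, for instance) are not detected.

```python
from pathlib import Path


def unprivileged_userns_allowed(proc_sys=Path("/proc/sys")):
    """Best-effort check of the sysctls that commonly gate user namespaces."""

    def read_int(relative):
        try:
            return int((proc_sys / relative).read_text().strip())
        except (OSError, ValueError):
            return None  # sysctl absent or unreadable on this kernel

    # Upstream limit: a value of 0 disables user namespaces entirely.
    if read_int("user/max_user_namespaces") == 0:
        return False
    # Debian-family switch: a value of 0 disables unprivileged use.
    if read_int("kernel/unprivileged_userns_clone") == 0:
        return False
    return True
```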

Screenshot of Chromium showing the Chromium page on flathub.org

History

Endless OS is based on Debian, but rather than releasing as a bunch of .debs, it is released as an immutable OSTree snapshot, with apps added and removed using Flatpak.

For many years, we maintained a branch of Chromium as a traditional OS package which was built into the OS itself, and updated it together with our monthly OS releases. This did not match up well with Chromium, which has a new major version every 6 weeks and typically 2–4 patch versions in between. It’s a security-critical component, and those patch versions invariably fix some rather serious vulnerability. In some ways, web browsers are the best possible example of apps that should be updated independently of the OS. (In a nice parallel, it seems that the Chrome OS folks are also working on separating OS updates from browser updates on Chrome OS.)

Browsers are also the best possible example of apps which should use elaborate sandboxing techniques to limit the impact of security vulnerabilities, and Chromium is indeed a pioneer in this space. Flatpak applies much the same tools to sandbox applications, which ironically made it harder to ship Chromium as a Flatpak: when running in the Flatpak sandbox, it can’t use those same sandboxing APIs provided by the kernel to sandbox itself further.

Flatpak provides its own API for sandboxed applications to launch new instances of themselves with tighter sandboxing; what’s needed is a way to make Chromium use that…

Solution

Ryan Gonzalez has had a long-running project with us to enable Chromium-based apps to work well as Flatpaks. The first targets were apps built with Electron: his zypak project provides an LD_PRELOAD-able library that redirects Chromium’s sandbox to use Flatpak’s sub-sandboxing API. This avoids the need to modify the (often proprietary) apps themselves, and is now used by dozens of Electron apps on Flathub which would otherwise not be usable with Flatpak. There’s also a version of Chrome in the Flathub beta channel using this technique.

For Chromium, we can take a different approach. It’s open-source code, being compiled by Flathub, so Ryan prepared some patches to teach it to use the Flatpak sandboxing APIs directly, for better performance and robustness.

Once the sandbox integration was done, there was a long list of other changes needed to make the Chromium Flatpak work at least as well as our previous built-in version, which André Moreira Magalhães from Endless worked through with Ryan.

Some of these came from the old Endless OS package, such as using a royalty-free implementation of AAC, splitting encumbered codecs to a separate package so they can be excluded as needed for distribution, and discarding background tabs when the system is under memory pressure (which is useful on systems with limited RAM, but is disabled by default on desktop Linux builds).

Others were specific to Flatpak, such as dealing with udev not being available in the sandbox, restoring the ability to create app launchers for websites, integrating with Flatpak’s network proxy portal, and allowing Chromium policy files to be provided by the host system.

Over in Endless OS, we also needed to update users’ existing file associations and migrate their Chromium profiles to its new home.

Impact

Chart of 30 days of Chromium downloads, with three large spikes of around 20,000 daily downloads

The chart above is the Flathub download statistics for Chromium in the past 30 days. Counting the points between 14th March (when the most recent update was pushed) and 21st March, there have been nearly 60,000 downloads. The majority of these will be Endless OS users: our 3.9.2 release in January 2021 rolled this change out to all users, and Endless OS has automatic updates enabled by default. But Flathub has a broader reach than just Endless OS! I believe that users of System76’s Pop!_OS have been migrated from a .deb of Chromium to this Flatpak, and surely there are many users on other distributions, too. It’s also been used as the basis for other apps on Flathub, including ungoogled-chromium.

As an added bonus, the Flatpak is wired up to flatpak-external-data-checker, which now automatically opens a pull request when a new Chromium release is published. Typically, new major releases need manual intervention to refresh the Flatpak patches, but minor releases often build without issue: for these, one can just smoke-test the test build from the pull request, and then merge it, reducing what used to be days of effort rebasing the package in Endless OS to the work of minutes. I love it when a plan comes together.

A quick glance at the issues on the flathub/org.chromium.Chromium repo will show that there is always more work to be done. We would love to see other distributions getting involved, reducing the duplicated work of maintaining Chromium packages for each distro, and making it easier for users of long-term stable branches to get important browser updates quickly and easily.

GNOME 3.36 / Endless OS 3.8

Endless OS 3.8.0 has just been released, which brings GNOME 3.36 to our users. There’s a particularly big overlap between “improvements in Endless OS” and “improvements in GNOME” this cycle, so I wanted to take a minute to look back over what the Endless team worked on in GNOME 3.36.

Login & Unlock Screen

Allan Day has already written about the improvements to the login and unlock experience in GNOME 3.36, so I won’t retread this ground in too much detail. As he (and Nick Richards, in his trip report for Endless OS 3.8.0) mentioned, this change has been anticipated for a long time, so I’m particularly glad that Georges Stavracas and Umang Jain (together with Florian Müllner from Red Hat) could make this happen for this release. The first thing I interact with when I sit down at my computer is the login screen or the lock screen, and the refreshed design is a pleasure to use. (My daughter is sad that Granny’s cat is no longer visible on the lock screen, though.)

GNOME unlock dialog, with Will Thompson's name and face, and password “Tremendousdangerouslookingyak” visible

Peek Password

One improvement that’s perhaps most visible in the redesigned lock screen is the inline “eye” icon to reveal the text in the password field, which was implemented by Umang Jain independently of his work on the lock screen itself. The motivation for this change was actually another system dialogue: the Wi-Fi password dialogue.

During the development of the Hack product – a game-like platform for self-directed learning built atop Endless OS – the team ran many playtesting sessions. While the emphasis of these sessions was on Hack itself, the test users – typically younger teens – would often run through initial setup on a freshly-installed OS. Within a few clicks of turning on the computer, you select your Wi-Fi network and enter its password, which turned out to be a big stumbling block for many users. Wi-Fi passwords are long strings of randomly-generated characters, and on many occasions users simply couldn’t enter the password correctly. The entry has always had a Show Text option in the right-click menu, but right-clicking is itself an unfamiliar operation for younger users more familiar with mobile devices.

Parental Controls, redux

For a year or so, Endless OS has included a parental controls feature, which operates along a few axes:

  • Specific installed apps can be disabled for particular users. As a special case, all general-purpose web browsers are controlled by a separate toggle.
  • Not-yet-installed apps visible in GNOME Software — which we rebrand as App Center — can be filtered based on their OARS content rating metadata.
  • Users can be prevented from installing apps at all.

In past releases, this feature was hard to discover and use. At a superficial level, the UI to control it was buried in Settings → Details → Users → (select a non-administrator user) → (scroll down) → (frame within frame within frame). But the real issue was that many Endless OS systems have the child as the primary, administrator user, created through Initial Setup when the machine is unboxed. To meaningfully use parental controls, you’d need to create a separate parent user, then downgrade the child’s account, neither of which is a particularly discoverable operation.

In autumn last year, we met with Allan Day, Richard Hughes and Matthias Clasen from Red Hat to talk through this problem space. Following that, Robin Tafel, Philip Withnall and Matthew Leeds designed and implemented a new flow for parental controls. The key changes are:

  1. Parental controls can be enabled during initial setup. Check a box, choose some options, and specify a parent password.
  2. Once initial setup is complete, there is a dedicated Parental Controls app.

Screenshot of “About You” page from GNOME Initial Setup, showing “Set up parental controls for this user” checkbox (checked)

Screenshot of Parental Controls page of GNOME Initial Setup, showing options to restrict which applications can be installed or used

Screenshot of GNOME Initial Setup “Set a Parent Password” page, with two password fields and one password hint field

Screenshot of Parental Controls application, showing options to restrict which apps a user can install or run

There are a few downstream bits and bobs outstanding, such as a cross-reference from GNOME Settings’ Users panel, but the bulk of this feature is available upstream in GNOME Initial Setup, Software, and Shell 3.36. Parental controls needs close integration with the application management infrastructure, and Flatpak upstream has the necessary hooks. On Endless OS, supporting Flatpak apps — plus Chromium as a special case — is good enough, since that is the sole mechanism for installing applications. It would be great to see support in Malcontent for other package and app managers.

Special thanks to Jakub Steiner for creating a great icon at very short notice.

Malcontent icon: Silhouette of parent and child holding hands

Renaming Folders

One of the biggest differences between vanilla GNOME and Endless OS is the app grid, which in Endless is on the desktop and fully under the user’s control. Georges Stavracas has been incrementally chipping away at this, and support for renaming folders landed in GNOME 3.36.

Screenshot of renaming a folder titled “Jeux”

The Long Tail

Besides highly-visible new features and redesigns, much (perhaps even most?) of the work of maintaining a desktop is in the parts you don’t see: improving libraries and plumbing, incremental tweaks to user interfaces, and dealing with the wide variety of hardware, software and users that interact with GNOME. Spelunking through the commit histories of various projects, I see many names of colleagues present and past, including André Moreira Magalhães and Philip Chimento respectively. Jian-Hong Pan from the Endless kernel team makes an appearance in GNOME Settings, as does a feature from erstwhile Endless kernel hacker Carlo Caione dating back to 2018.

Umang Jain, Philip Withnall and Matthew Leeds have put a lot of work into improving the robustness of GNOME Software and Flatpak, and there’s more landing as we speak. I’m particularly glad that Matthew has been tracking down missing Flatpak app updates in GNOME Software – bugs which hide information can be the trickiest ones to spot. And Philip is solving the latest Mystery of the Missing Progress Bar when installing Flatpak apps in GNOME Software.

I’m certain I’ve missed many great contributions. Please forgive me, fellow Endlessers.

A Broad Church

Perhaps my favourite part of being involved in GNOME is collaborating with great people from organisations who, in a different world, might be bitter rivals. All of the work I’ve described was a joint effort with others from the GNOME community; and, just as other distributors share the fruits of our labour, we and our users share the fruits of theirs. This is the latest in a long line of great GNOME releases – long may this trend continue.

Vanilla is a complex and delicious flavour

Last week, Tobias Bernard published a thought-provoking article, There is no “Linux” Platform (Part 1), based on a talk at LAS 2019. (Unfortunately I couldn’t make it to LAS, and I haven’t found the time to watch a recording of the talk, so I’m going solely from the blog post here.) The article makes some interesting observations, and I found a fair few things to agree with. But I want to offer a counterpoint to this paragraph of the final section, “The Wrong Incentives”:

The Endless OS shell is a great example of this. They started out with vanilla GNOME Shell, but then added ever more downstream patches in order to address issues found in in-house usability tests. This means that they end up having to do huge rebases every release, which is a lot of work. At the same time, the issues that prompted the changes do not get fixed upstream (Endless have recently changed their strategy and are working upstream much more now, so hopefully this will get better in the future).

If we’re looking at the code shipping in Endless OS today, then yes, our desktop is vanilla GNOME Shell with a few hundred patches on top, and yes, as a result, rebasing onto new GNOME releases is a lot of work. But the starting point for Endless OS was not “what’s wrong with GNOME?” but “what would the ideal desktop look like for a new category of users?”.

When Endless began, the goal was to create a new desktop computing product, targeting new computer users in communities which were under-served by existing platforms and products. The company conducted extensive field research, and designed a desktop user interface for those users. Prototypes were made using various different components, including Openbox, but ultimately the decision was made to base the desktop on GNOME, because GNOME provided a collection of components closest to the desired user experience. The key point here is that basing the Endless desktop on GNOME was an implementation detail, made because the GNOME stack is a robust, feature-rich and flexible base for a desktop.

Over time, the strategy shifted away from being based solely around first-party hardware, towards distributing our software to a broader set of users using standard desktop and laptop hardware. Around the same time, Endless made the switch from first- and third-party apps packaged as a combination of Debian packages and an in-house system towards using Flatpak for apps, and contributed towards the establishment of Flathub. Part of the motivation for this switch was to get Endless out of the business of packaging other people’s applications, and instead to enable app developers to directly target desktop Linux distributions including, but not limited to, Endless OS.

A side-effect of this change is that our user experience has become somewhat less consistent because we have chosen not to theme apps distributed through Flathub, with the exception of minimize/maximize window controls and a different UI font; and, of course, Flathub offers apps built with many different toolkits. This is still a net positive: our users have access to many more applications than they would have done if we had continued distributing everything ourselves.

As the prototypal Endless OS user moved closer to the prototypal GNOME user, we have focused more on finding ways to converge with the GNOME user experience. In some cases, we’ve simply removed functionality which we don’t think is necessary for our current crop of users. For example, Endless OS used to target users whose display was a pre-digital TV screen, with a 720×480 resolution. I think persuading the upstream maintainers of GNOME applications to support this resolution would have been a hard sell in 2014, let alone in 2019!

Some other changes we’ve made can be, and have been, proposed upstream as they are, but the bulk of our downstream functionality forms a different product to GNOME, which we feel is still valuable to our users. We are keen to both improve GNOME, and reduce the significant maintenance burden which Tobias rightly refers to, so we’re incrementally working out which functionality could make sense in both Endless and GNOME in some form, working out what that form could be, and implementing it. This is a big project because engaging constructively with the GNOME community involves more thought and nuance than opening a hundred code-dump merge requests and sailing away into the sunset.

If you are building a product whose starting point is “GNOME, but better”, then I encourage you to seriously consider whether you can work upstream first. I don’t think this is a groundbreaking idea in our community! However, that was not the starting point for Endless OS, and even today, we are aiming for a slightly different product to GNOME.

Back out to the big picture that is the subject of Tobias’ article: I agree that desktop fragmentation is a problem for app developers. Flatpak and Flathub are, in my opinion, a major improvement on the status quo: app developers can target a common environment, and have a reasonable expectation of their apps working on all manner of distributions, while we as distro maintainers need not pretend that we know best how to package a Java IDE. As the maintainer of a niche app written using esoteric tools, Flathub allowed me – for the first time since I wrote the first version in 2008 – to distribute a fully-functional, easy-to-install application directly to users without burdening distribution developers with the chore of packaging bleeding-edge versions of Haskell libraries. It gave me a big incentive to spend some of my (now very limited) free time on some improvements to the app that I had been putting off until I had a way to get them to users (including myself on Endless OS) in a timely manner.

On the other hand, we shouldn’t underestimate the value of GNOME – and distros like Debian – being a great base for products that look very different to GNOME itself: it enables experimentation, exploration, and reaching a broader base of users than GNOME alone could do, while pooling the bulk of our resources. (On a personal level, I owe pretty much my entire career in free software to products based on Debian and the GNOME stack!)

Some caveats: I joined Endless in mid-2016, midway through the story above, so I am relying on my past and current colleagues’ recollections of the early days of the company. Although today I am our Director of Platform, I am still just one person in the team! We’re not a hive mind, and I’m sure you’ll find different opinions on some of these points if you ask around.

Flatpak External Data Checker

(This post is a slightly longer version of a lightning talk I gave at GUADEC 2019.)

Many non-free applications’ binaries cannot be redistributed (particularly not in modified form), so they cannot be included directly in a Flatpak. To work around this, Flatpak supports the concept of “extra data”: files which will be downloaded and unpacked from a third-party URI when the app is installed. The URI is accompanied by a checksum and a size, to provide some hope that the data unpacked on the user’s system is the same as what the packager tested. This is used by, for example, the Dropbox Flatpak.

Of course, the Flatpak needs to be kept up to date when new versions of the app are released. At best, the old URL will still point to the same file, so at least the old version of the app will continue to be installed; in some cases, however, vendors publish new versions of the app at the same URL, which means the Flatpak cannot be installed until it is updated.

Some time ago, Joaquim Rocha started work on Flatpak External Data Checker to periodically check a Flatpak manifest and report when it needs updating. As well as just checking that a URL is reachable and has the expected size and checksum, it also knows how to follow a redirect to a stable URI for the latest version (a helpful pattern some apps use), or to find the latest package in an apt repository. I subsequently taught it how to determine the new app’s version, update the AppData file, commit the necessary changes to Git, and send a pull request (like this one).
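The size-and-checksum part of that check is simple to picture. Here is a minimal sketch, not the checker's actual code: the function name and the sample payload are made up for illustration, and a real run would first fetch the bytes from the URL in the manifest.

```python
import hashlib


def extra_data_matches(data: bytes, expected_size: int, expected_sha256: str) -> bool:
    """Return True if the downloaded bytes match the size and SHA-256 in the manifest."""
    if len(data) != expected_size:
        return False
    return hashlib.sha256(data).hexdigest() == expected_sha256.lower()


# Hypothetical payload standing in for a vendor's download.
payload = b"pretend this is a vendor tarball"
recorded_size = len(payload)
recorded_sha256 = hashlib.sha256(payload).hexdigest()

# The file is unchanged since the manifest was written.
print(extra_data_matches(payload, recorded_size, recorded_sha256))          # True
# The vendor published a new file at the same URL.
print(extra_data_matches(payload + b"!", recorded_size, recorded_sha256))   # False
```

A mismatch like the second case is exactly the situation where the Flatpak can no longer be installed until its manifest is updated.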

I tried moderately hard to preserve YAML and XML comments and formatting. For JSON, I gave up trying to preserve formatting (let alone json-glib’s non-standard extensions); the output is at least deterministic, so once it’s reformatted the JSON, the diffs will be smaller in future.

At Endless, we run this for a short list of apps on Flathub (and a few on Endless’s Flatpak repo). If you want to get PRs for an app you maintain, add the necessary metadata to your Flathub application’s manifest, then send a pull request to update the list of repos we check. I hope that in the medium term we could move this over to Flathub’s build infrastructure and run it on every repo (with some way to opt out).

There are a fair few open issues – PRs, suggestions and bug reports all very welcome!

Age rating data for Flathub apps

OARS (Open Age Ratings Service) defines a scheme to include content rating information in apps’ AppData/AppStream file. GNOME Software and similar tools use this metadata to show age ratings for applications. In Endless OS, we also support restricting which applications a given user can install based on this data – see this page, and the reports it links to, for a bit more information about this feature and its future.

Screenshot: “Age Rating: 7. The application was rated this way because it features: Characters in aggressive conflict easily distinguishable from reality”

Every new app on Flathub should include OARS metadata, but there are many existing apps which don’t have this data, so it’s not (yet) enforced at build time. Edit: Bartłomiej tells me that it has been enforced at build time for a little over a month. (See App Requirements and AppData Guidelines on the Flathub wiki for some more information on what’s required and recommended; this branch of appstream-glib is modified to enforce Flathub’s policies.) My colleague Andre Magalhaes crunched the data and opened a tracker task for Flathub apps without OARS metadata. ((We have a similar list for our in-house apps.)) This information is much more useful if it can be relied upon to be present.

If you’re familiar with an app on this list, generating the OARS data is a simple process: open this generator in your browser, answer some questions about the app, and receive some XML. The next step is to put that OARS data into the AppData. ((If you’re the upstream maintainer for the app, you probably already know how to do all this, and can stop reading here!)) Take a look at the app’s Flathub repo and check whether it has an .appdata.xml file.

If it doesn’t, then the app’s AppData must be maintained upstream. Great! Find the upstream repository for the project, and send a merge request there. (Here’s one I sent earlier, for D-Feet.) You can either add the same patch to the Flathub packaging (as I did for D-Feet) or wait for a new upstream release and then update the Flathub packaging to that version (as I also did for D-Feet, a couple of days later).

If the appdata is maintained in the Flathub repo, make the relevant changes there directly. (Here’s a PR I opened for Tux, of Math Command while writing this post.) Ideally, the appdata would make its way upstream, but there are a fair few apps on Flathub which do not have active upstreams.

You might well find that the appdata requirements have become more strict about things other than OARS since the app was last updated, and these will have to be fixed too. In both the example cases above, I had to add release information, which has become mandatory for Flathub apps since these were last updated.

Rebasing downstream translations

At Endless, we maintain downstream translations for a number of GNOME projects, such as gnome-software, gnome-control-center and gnome-initial-setup. These are projects where our (large) downstream modifications introduce new user-facing strings. Sometimes, our translation for a string differs from the upstream translation for the same string. This may be due to:

  • a deliberate downstream style choice – such as tú vs. usted in Spanish
  • our fork of the project changing the UI so that the upstream translation does not fit in the space available in our UI – “Suspend” was previously translated into German as „In Bereitschaft versetzen“, which we changed to „Bereitschaft“ for this reason
  • the upstream translation being incorrect
  • the whim of a translator

Whenever we update to a new version of GNOME, we have to reconcile our downstream translations with the changes from upstream. We want to preserve our intentional downstream changes, and keep our translations for strings that don’t exist upstream; but we also want to pull in translations for new upstream strings, as well as improved translations for existing strings. Earlier this year, the translation-rebase baton was passed to me. My predecessor would manually reapply our downstream changes for a set of officially-supported languages, but unlike him, I can pretty much only speak English, so I needed something a bit more mechanical.

I spoke to various people from other distros about this problem. ((I’d love to credit the individuals I spoke to but my memory is awful. Please let me know if you remember being on the other side of these conversations and I’ll update this post!)) A common piece of advice was to not maintain downstream translation changes: appealing, but not really an option at the moment. I also heard that Ubuntu follows a straightforward rule: once the translation for a string has been changed downstream, all future upstream changes to the translation for that string are ignored. The assumption is that all downstream changes to a translation must have been made for a reason, and should be preserved. This is essentially a superset of what we’ve done manually in the past.

I wrote a little tool to implement this logic: translate-o-tron 3000 (or “t3k” for short), originally named pomerge. ((Thanks to Alexandre Franke for pointing out the existence of at least one existing tool called “pomerge”. In my defence, I originally wrote this script on a Eurostar with no internet connection, so couldn’t search for conflicts at the time.)) Its “rebase” mode takes the last common upstream ancestor, the last downstream commit, and a working copy with the newest downstream code. For each locale, for each string in the translation in the working copy, it compares the old upstream and downstream translations – if they differ, it merges the latter into the working copy. For example, Endless OS 3.5.x was based on GNOME 3.26; Endless OS 3.6.x is based on GNOME 3.32. I might rebase the translations for a module with:

$ cd src/endlessm/gnome-control-center

# The eos3.6 branch is based on the upstream 3.32.1 tag
$ git checkout eos3.6

# Update the .pot file
$ ninja -C build meson-gnome-control-center-2.0-pot

# Update source strings in .po files
$ ninja -C build meson-gnome-control-center-2.0-update-po

# The eos3.5 branch is based on the upstream 3.26.1 tag;
# merge downstream changes between those two into the working copy
$ t3k rebase `pwd` 3.26.1 eos3.5

# Optional: Python's polib formats .po files slightly differently to gettext;
# reformat them back. This has no semantic effect.
$ ninja -C build meson-gnome-control-center-2.0-update-po

$ git commit -am 'Rebase downstream translations'
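The rule that “rebase” mode applies to each string can be sketched in a few lines. This is an illustration rather than t3k's real code: plain dicts stand in for parsed .po files (the real tool uses polib), the function name is hypothetical, and the “Power Off” translations are invented to show an upstream improvement flowing through.

```python
def rebase_translations(old_upstream, old_downstream, working_copy):
    """Re-apply downstream translation changes on top of new upstream translations.

    Each argument maps a source string (msgid) to its translation (msgstr)
    for a single locale.
    """
    merged = dict(working_copy)
    for msgid in working_copy:
        downstream = old_downstream.get(msgid)
        # Keep the downstream translation only where it was deliberately
        # different from what upstream had at the time.
        if downstream and downstream != old_upstream.get(msgid):
            merged[msgid] = downstream
    return merged


old_upstream = {"Suspend": "In Bereitschaft versetzen", "Power Off": "Abschalten"}
old_downstream = {"Suspend": "Bereitschaft", "Power Off": "Abschalten"}
# Working copy: new upstream translations, after updating the .po files.
working_copy = {"Suspend": "In Bereitschaft versetzen", "Power Off": "Ausschalten"}

merged = rebase_translations(old_upstream, old_downstream, working_copy)
print(merged["Suspend"])    # Bereitschaft (downstream change preserved)
print(merged["Power Off"])  # Ausschalten (upstream improvement kept)
```

Strings which exist only downstream are covered by the same comparison: upstream had no translation for them, so any downstream translation counts as different and is carried forward.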

It also has a simpler “copy” mode which copies translations from one .po file to another, either when the string is untranslated in the target (the default) or for all strings. In some cases, we’ve purchased translations for strings which have not been translated upstream; I’ve used this to submit some of those upstream, such as Arabic translations of the myriad OARS categories, and hope to do more of that in the future now I can make a computer do the hard work.

$ t3k copy \
    ~/src/endlessm/gnome-software/po/ar.po \
    ~/src/gnome/gnome-software/po/ar.po
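In the same spirit, here is a rough sketch of what “copy” mode does, again with dicts standing in for .po files. The function name, the `overwrite` parameter and the placeholder strings are assumptions for illustration, not t3k's actual interface.

```python
def copy_translations(source, target, overwrite=False):
    """Copy translations from source into target for strings present in target.

    By default only untranslated target strings are filled in; with
    overwrite=True every string translated in source is replaced.
    """
    result = dict(target)
    for msgid, translation in source.items():
        if msgid not in result or not translation:
            continue  # nothing to copy, or the target has no such string
        if overwrite or not result[msgid]:
            result[msgid] = translation
    return result


# Placeholder strings, not real translations.
source = {"Cartoon Violence": "<purchased translation>", "Gambling": "<ours>"}
target = {"Cartoon Violence": "", "Gambling": "<upstream>"}

print(copy_translations(source, target))
# {'Cartoon Violence': '<purchased translation>', 'Gambling': '<upstream>'}
print(copy_translations(source, target, overwrite=True))
# {'Cartoon Violence': '<purchased translation>', 'Gambling': '<ours>'}
```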

The devil makes work for idle processes

TLDR: in Endless OS, we switched the IO scheduler from CFQ to BFQ, and set the IO priority of the threads doing Flatpak downloads, installs and upgrades to “idle”; this makes the interactive performance of the system while doing Flatpak operations indistinguishable from when the system is idle.

At Endless, we’ve been vaguely aware for a while that trying to use your computer while installing or updating apps is a bit painful, particularly on spinning-disk systems, because of the sheer volume of IO performed by the installation/update process. This was never particularly high priority, since app installations are user-initiated, and until recently, so were app updates.

But, we found that users often never updated their installed apps, so earlier this year, in Endless OS 3.3.10, we introduced automatic background app updates to help users take advantage of “new features and bug fixes” (in the generic sense you so often see in iOS/Android app release notes). This fixed the problem of users getting “stuck” on old app versions, but made the previous problem worse: now, your computer becomes essentially unusable at arbitrary times when app updates happen. It was particularly bad when users unboxed a system with an older version of Endless OS (and hence a hundred or so older apps) pre-installed, received an automatic OS update, then rebooted into a system that’s unusable until all those apps have been updated.

At first, I looked for logic errors in (our versions of) GNOME Software and Flatpak that might cause unnecessary IO during app updates, without success. We concluded that heavy IO load when updating a large app or runtime is largely unavoidable, ((modulo Umang’s work mentioned in the coda)) so I switched to looking at whether we could mitigate this by tweaking the IO scheduler.

The BFQ IO scheduler is supposed to automatically prioritize interactive workloads over bulk workloads, which is pretty much exactly what we’re trying to do. The specific example its developers give is watching a video, without hiccups, while copying a huge file in the background. I spent some time with the BFQ developers’ own suite of benchmarks on two test systems: a Lenovo Yoga 900 (with an Intel i5-6200U @ 2.30GHz and a consumer-grade M.2 SSD) and an Endless Mission One (an older system with a Celeron CPU and a laptop-class spinning disk). Neither JP nor I was able to reproduce any interesting results for the dropped-frames benchmark: with either BFQ or CFQ (the previous default IO scheduler), the Yoga essentially never dropped frames, whereas the IO workloads immediately rendered the Mission totally unusable. I had rather more success with a benchmark which measures the time to launch LibreOffice:

  • On the Yoga, when the system was idle, the mean launch time went from 2.838s under CFQ to 2.98s under BFQ (a slight regression), but with heavy background IO, the mean launch time went from 16s with CFQ (standard deviation 0.11) to 3s with BFQ (standard deviation 0.51).
  • On the Mission, with modest background IO, the mean launch time was 108 seconds under BFQ, which sounds awful; but under CFQ, I gave up waiting for LibreOffice to start after 8 minutes!

Emboldened by these results, I went on to look at how the same “time to launch LibreOffice” benchmark fared when the background IO load is “installing and uninstalling a Lollypop Flatpak bundle in a loop”. I also looked at using ionice -c3 to set the IO priority of the install/uninstall loop to idle, which does what its name suggests: BFQ essentially will never serve IO at the idle priority if there is IO pending at any higher priority. You can see some raw data or look at some extended discussion copied from our internal issue tracker to a Flatpak pull request, but I suggest just looking at this chart:

What does it all mean?

  • The coloured bars represent median launch time in seconds for LibreOffice, across 15/30 trials for Yoga/Mission respectively.
  • The black whiskers show the minimum and maximum launch times observed. I know this should have been a box-and-whiskers or violin plot, but I realised too late that multitime does not give enough information to draw those.
  • “unloaded” refers to the performance when the system is otherwise idle.
  • “shell-loop” refers to running while true; do flatpak install -y /home/wjt/Downloads/org.gnome.Lollypop.flatpak; flatpak uninstall -y org.gnome.Lollypop/x86_64/stable; done; “long-lived” refers to performing the same operations with the Flatpak API in a long-lived process. I tried this because I understood that BFQ gives new processes a slight performance boost, but on a real system the GNOME Software and Flatpak system helper processes are long-lived. As you can see, the behaviour under BFQ is actually the other way around in the worst case, and identical for CFQ and in the median case.
  • The “ionice-” prefix means the Flatpak operation was run under ionice -c3.
  • Switching from CFQ to BFQ makes the worst case a little worse at the default IO priority, but the median case much better.
  • Setting the IO priority of the Flatpak process(es) to idle erases that worst-case regression under BFQ, and dramatically improves the median case under CFQ.
  • In combination, the time to launch LibreOffice while performing Flatpak operations in the background on the Mission went from 24 seconds to 12 seconds by switching to BFQ & setting the IO priority to idle.

So, by switching to BFQ and setting IO priorities appropriately, the system’s interactive performance while performing background updates is now essentially indistinguishable from when the system is idle. To implement this in practice, Rob McQueen wrote some patches to set the IO priority of the Flatpak system helper and GNOME Software’s worker threads to idle (both changes are upstream) and changed Endless OS’s default IO scheduler to BFQ where available. As Matthias put it on #flatpak when shown this chart and that first link: “not bad for a 1-line change”.
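For anyone wanting to try the same experiment with their own bulk job, the “ionice-” variants in the benchmark amount to prefixing the command with ionice -c3. Here is a minimal sketch of doing that from Python; the helper name is made up, and note that the upstream patches set the priority inside the processes themselves rather than through a wrapper like this.

```python
import shlex


def with_idle_io_priority(command):
    """Prefix a command so that it runs in the idle IO scheduling class."""
    return ["ionice", "-c3", *command]


argv = with_idle_io_priority(
    ["flatpak", "install", "-y", "org.gnome.Lollypop.flatpak"]
)
print(shlex.join(argv))
# ionice -c3 flatpak install -y org.gnome.Lollypop.flatpak
```

The resulting list can be passed to subprocess.run() as-is; the child process and anything it spawns inherit the idle IO priority.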

Of course, this means apps take a bit longer to install, even on a mostly-idle system. No, I don’t have numbers on how big the impact is: this work happened months ago and it’s taken me this long to write it up because I couldn’t find the time to collect more data. But my colleague Umang is working on eliminating up to half of the disk IO performed during Flatpak installations so that should more than make up for it!

Wandering in the symlink forest forever

Last week, Philip Withnall told me that Meson has built-in support for generating code coverage reports: just configure with -Db_coverage=true, run your tests with ninja test, then run ninja coverage-{text,html,xml} to generate the report in the format of your choice. The XML format is compatible with Cobertura’s output, which is convenient since Endless’s Jenkins is already configured to consume Cobertura XML generated by Autotools projects using our EOS_COVERAGE_REPORT macro. So it was a simple matter of adding gcovr to the build environment, running ninja coverage-xml after the tests, and moving the report to the right place for Jenkins to find it. It worked well on the projects I tested, so I decided to enable it for all Meson projects built in our CI. Sure, I thought, it’s not so useful for our forks of GNOME and third-party projects, but it’s harmless and saves adding per-project config, right?

Fast-forward to yesterday, when someone noticed that a systemd build had been stuck on the ninja coverage-xml step for 16 hours. Uh oh.

It turns out that gcovr followed symlinks when scanning for coverage files, but didn’t check for cycles. systemd’s test suite generates a fake sysfs tree, with many circular references via symlinks. For example, there are 64 self-referential ttyX trees:

$ ls -l build/test/sys/devices/virtual/tty/tty1
total 12
-rw-r--r-- 1 wjt wjt    4 Oct  9 12:16 dev
drwxr-xr-x 2 wjt wjt 4096 Oct  9 12:16 power
lrwxrwxrwx 1 wjt wjt   21 Oct  9 12:16 subsystem -> ../../../../class/tty
-rw-r--r-- 1 wjt wjt   16 Oct  9 12:16 uevent
$ ls -l build/test/sys/devices/virtual/tty/tty1/subsystem/tty1
lrwxrwxrwx 1 wjt wjt 30 Oct  9 12:16 build/test/sys/devices/virtual/tty/tty1/subsystem/tty1 -> ../../devices/virtual/tty/tty1
$ readlink -f build/test/sys/devices/virtual/tty/tty1/subsystem/tty1
/home/wjt/src/endlessm/systemd/build/test/sys/devices/virtual/tty/tty1

And, worse, all other ttyY trees are accessible via the symlinks from each ttyX tree. The kernel caps the number of symlinks per path to 40 before lookups fail with ELOOP, but that’s still 64^40 paths to resolve, just for the fake ttys. Quite a big number!

The fix is straightforward: maintain a set of visited (st_dev, st_ino) pairs while walking the tree, and prune subtrees we’ve already visited. I tried adding a similar highly self-referential symlink graph to the gcovr test suite, so that it would run in reasonable time if the fix works and essentially never terminate if it does not. Unfortunately, pytest has exactly the same bug: while searching for tests to run, it gets lost wandering in the symlink forest forever.
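The idea looks roughly like this. It is a sketch of the technique built on os.walk, not the patch that went into gcovr; the function name and the tiny test tree (one fake tty whose subsystem link points back at the root) are made up for illustration.

```python
import os
import tempfile


def walk_without_cycles(root):
    """Yield directories under root, following symlinks but visiting each
    real directory (identified by device and inode) at most once."""
    visited = set()
    for dirpath, dirnames, _filenames in os.walk(root, followlinks=True):
        st = os.stat(dirpath)  # follows symlinks, so aliases share a key
        key = (st.st_dev, st.st_ino)
        if key in visited:
            dirnames[:] = []  # prune: do not descend into this subtree again
            continue
        visited.add(key)
        yield dirpath


with tempfile.TemporaryDirectory() as root:
    # A miniature version of the fake sysfs tree: tty1/subsystem -> root
    os.mkdir(os.path.join(root, "tty1"))
    os.symlink(root, os.path.join(root, "tty1", "subsystem"))
    found = sorted(os.path.relpath(p, root) for p in walk_without_cycles(root))

print(found)  # ['.', 'tty1']
```

Without the visited set, the same walk would keep descending through tty1/subsystem/tty1/subsystem/… until the kernel's symlink limit stops it.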

This bug is a good metaphor for my habit of starting supposedly-quick side-projects.