Release (semi-)automation

The time I have available to maintain GNOME Initial Setup is very limited, as anyone who has looked at the commit history will have noticed. I’d love more eyes & hands on this important but easy-to-overlook component, particularly to guide it kindly but firmly into the modern age of GTK 4 and the refreshed HIG.

I found that making a batch of 1–3 releases across different GNOME branches every few months was surprisingly time-consuming and error-prone, even with the pretty comprehensive release process checklist on the GNOME Wiki, so I’ve been periodically trying to automate bits of it away.

Philip Withnall’s gitlab-changelog script makes writing the NEWS file a lot quicker. I taught it to output the human-readable names of each updated translation (a nice additional contribution would be to also include the name of the human who updated the translation) and made it a little smarter about guessing the Git commit range to scan.

Beyond that, I added a Meson run target, maintainer-upload-release pointing at a script which performs some rudimentary coherence checks on the version number, tags the release (using git-evtag if available), atomically pushes the branch and that tag to GNOME GitLab, then copies the source tarball to master.gnome.org. (Apparently it has been almost 12 years since I did something similar in telepathy-gabble, building on the make maintainer-upload-release target that Simon McVittie added in 2008, which is where I borrowed the name.) Maybe other module maintainers may find this script useful too – it’s quite generic.

Putting these together, the release flow looks like this:

git switch gnome-42
git pull
../pwithnall/gitlab-changelog/gitlab-changelog.py GNOME/gnome-initial-setup
# Manually edit NEWS to incorporate the changelog, adjusted as needed
# Manually check the version in meson.build
git commit -am 'NEWS for 42.Y'
ninja -C _build dist maintainer-upload-release

Another release-related quality-of-life improvement is to make GitLab CI not only build and test the project (in the vain hope that there might actually be tests!) but also check that the install and gnome-initial-setup-pot targets both work. (At one point or another both have failed at or around release time; now they never will again, famous last words.)

I know none of this is rocket science, but I find it all makes the process quicker and less cumbersome, and it’s stopped me from repeating errors like uploading the wrong version on a few tired evenings. Obviously this could all be taken further: perhaps a manually-invoked CI pipeline that does all this stuff, more checks, etc. But while I’m on this train of thought:

Why do we release GNOME modules one-by-one at all?

The workflow we use to release Endless OS is a bit different to GNOME. Once we merge a change to some module’s Git repository, such as eos-updater or our shrinking branch of GNOME Software, that change embarks on a scenic automated journey that takes it to the next nightly build of the entire OS, both as an OSTree update and as fresh installation media. I use these nightly builds for my daily work, safe in the knowledge that I can roll back to the previous build if necessary.

We don’t make releases of individual modules: instead, when it comes time to release the OS, we trigger a pipeline that (among many other things) pushes the already-built OS update to the production repo, and creates Release_x.y.z tags on each Git repo.

This was quite an adjustment for me at first, compared to lovingly hand-crafting NEWS files and coming up with funny/esoteric release names, but now that I’m used to it it’s hard to go back. Why can’t GNOME do the same?

At this point in the post, we are straying into territory that I have limited first-hand knowledge of. Caveat lector! But here goes:

Thanks to GNOME OS, GNOME already has nightly builds of the entire desktop and apps: so rather than having to build everything yourself, or wait for a development release of GNOME, you can just update & reboot your GNOME OS VM and test the change right there. gnome-build-meta knows how to build every GNOME module; and if you can build the code, it seems a conceptually small step to run ninja dist and the stuff above to publish tags and tarballs for each module.

So you could well imagine on 43.beta release day, someone in the release team could boot the latest GNOME OS nightly, declare it to be Good, and push a button that tags every relevant GNOME module & builds and uploads all the tarballs, and then go back to their day, rather than having to chase down module owners who haven’t quite got around to making the release, fix random build breakages, and so on.

To make this work reliably, I think you’d need every module’s CI to be run through gnome-build-meta, building that MR against the rest of the project, so that g-b-m build failures would be caught before (not after) the offending change lands in the module in question. Seems doable – in Endless we have the equivalent thing managed by a jenkins-job-builder template, the GitHub Pull Request Builder plugin, and a gnarly script.

Continuous integration and deployment are becoming the norm throughout the software industry, for good reasons laid out quite well in articles like Shipping Fast Changes Your Life: the smaller the gap between making a change and it reaching a user, the faster the feedback, and the less costly it is to fix a bug or change course.

The free software movement has historically been ahead of the curve on this, with the “release early, release often” philosophy. And GNOME in particular has used a time-based release process for two decades, allowing major distros to align their schedules to GNOME and get updates into the hands of users quickly, which went some way towards overcoming the fact that GNOME does not own the full pipeline from source code to end users.

Havoc Pennington’s June 2002 email proposing this model has aged rather well, in my opinion, and places a heavy emphasis on the development branch being usable:

The unstable branch must always be dogfood-quality. If testers can’t test it by using it daily, they can’t make the jump. If the unstable branch becomes too unstable, we can’t release it on a reliable schedule, so we have to start breaking the stable branch as a stopgap.

Interestingly the time-based release schedule wiki page states that the schedule should contain:

Regular test release dates, approximately every 2 weeks.

These days, GNOME releases are closer to monthly. In the context of the broader industry where updates reach users multiple times a day, this is starting to look a little less forward-thinking! Of course, continuously deploying an entire OS to production is rather harder than continuously deploying web apps or apps in app stores, if only because the stakes are higher: you need a really robust automatic rollback mechanism to save your users’ plant-based bacon substitute if a new OS build fails to boot, or worse, contains an updater bug that prevents future updates being applied! Still, I believe that a bit of automation would go a long way in allowing module maintainers and the release team alike to spend their scarce mental energy on other things, and allow the project to increase the frequency of releases. What am I missing?

Chromium on Flathub

In December 2020, Chromium reached the Flathub stable channel. Assuming you have Flatpak 1.8.2 or newer, and your kernel is configured to allow unprivileged user namespaces, you can download it now.

Screenshot of Chromium showing the Chromium page on flathub.org

History

Endless OS is based on Debian, but rather than releasing as a bunch of .debs, it is released as an immutable OSTree snapshot, with apps added and removed using Flatpak.

For many years, we maintained a branch of Chromium as a traditional OS package which was built into the OS itself, and updated it together with our monthly OS releases. This did not match up well with Chromium, which has a new major version every 6 weeks and typically 2–4 patch versions in between. It’s a security-critical component, and those patch versions invariably fix some rather serious vulnerability. In some ways, web browsers are the best possible example of apps that should be updated independently of the OS. (In a nice parallel, it seems that the Chrome OS folks are also working on separating OS updates from browser updates on Chrome OS.)

Browsers are also the best possible example of apps which should use elaborate sandboxing techniques to limit the impact of security vulnerabilities, and Chromium is indeed a pioneer in this space. Flatpak applies much the same tools to sandbox applications, which ironically made it harder to ship Chromium as a Flatpak: when running in the Flatpak sandbox, it can’t use those same sandboxing APIs provided by the kernel to sandbox itself further.

Flatpak provides its own API for sandboxed applications to launch new instances of themselves with tighter sandboxing; what’s needed is a way to make Chromium use that…

Solution

Ryan Gonzalez has had a long-running project with us to enable Chromium-based apps to work well as Flatpaks. The first targets were apps built with Electron: his zypak project provides an LD_PRELOAD-able library that redirects Chromium’s sandbox to use Flatpak’s sub-sandboxing API. This avoids the need to modify the (often proprietary) apps themselves, and is now used by dozens of Electron apps on Flathub which would otherwise not be usable with Flatpak. There’s also a version of Chrome in the Flathub beta channel using this technique.

For Chromium, we can take a different approach. It’s open-source code, being compiled by Flathub, so Ryan prepared some patches to teach it to use the Flatpak sandboxing APIs directly, for better performance and robustness.

Once the sandbox integration was done, there was a long list of other changes needed to make the Chromium Flatpak work at least as well as our previous built-in version, which André Moreira Magalhães from Endless worked through with Ryan.

Some of these came from the old Endless OS package, such as using a royalty-free implementation of AAC, splitting encumbered codecs to a separate package so they can be excluded as needed for distribution, and discarding background tabs when the system is under memory pressure (which is useful on systems with limited RAM, but is disabled by default on desktop Linux builds).

Others were specific to Flatpak, such as dealing with udev not being available in the sandbox, restoring the ability to create app launchers for websites, integrating with Flatpak’s network proxy portal, and allowing Chromium policy files to be provided by the host system.

Over in Endless OS, we also needed to update users’ existing file associations and migrate their Chromium profiles to its new home.

Impact

Chart of 30 days of Chromium downloads, with three large spikes of around 20,000 daily downloads

The chart above is the Flathub download statistics for Chromium in the past 30 days. Counting the points between 14th March (when the most recent update was pushed) and 21st March, there have been nearly 60,000 downloads. The majority of these will be Endless OS users: our 3.9.2 release in January 2021 rolled this change out to all users, and Endless OS has automatic updates enabled by default. But Flathub has a broader reach than just Endless OS! I believe that users of System76’s Pop!_OS have been migrated from a .deb of Chromium to this Flatpak, and surely there are many users on other distributions, too. It’s also been used as the basis for other apps on Flathub, including ungoogled-chromium.

As an added bonus, the Flatpak is wired up to flatpak-external-data-checker, which now automatically opens a pull request when a new Chromium release is published. Typically, new major releases need manual intervention to refresh the Flatpak patches, but minor releases often build without issue: for these, one can just smoke-test the test build from the pull request, and then merge it, reducing what used to be days of effort rebasing the package in Endless OS to the work of minutes. I love it when a plan comes together.

A quick glance at the issues on the flathub/org.chromium.Chromium repo will show that there is always more work to be done. We would love to see other distributions getting involved, reducing the duplicated work of maintaining Chromium packages for each distro, and making it easier for users of long-term stable branches to get important browser updates quickly and easily.

They should have called it Mirrorball

TL;DR: there’s now an rsync server at rsync://images-dl.endlessm.com/public from which mirror operators can pull Endless OS images, along with an instance of Mirrorbits to redirect downloaders to their nearest—and hopefully fastest!—mirror. Our installer for Windows and the eos-download-image tool baked into Endless OS both now fetch images via this redirector, and from the next release of Endless OS our mirrors will be used as BitTorrent web seeds too. This should improve the download experience for users who are near our mirrors.

If you’re interested in mirroring Endless OS, check out these instructions and get in touch. We’re particularly interested in mirrors in Southeast Asia, Latin America and Africa, since our mission is to improve access to technology for people in these areas.

Big thanks to Niklas Edmundsson, who administers the mirror at Academic Computer Club, Umeå University, who recommended Mirrorbits and provided the nudge needed to get this work going, and to dotsrc.org and Mythic Beasts who are also mirroring Endless OS already.

Read on if you are interested in the gory details of setting this up.


We’ve received a fair few offers of mirroring over the years, but without us providing an rsync server, mirror operators would have had to fetch over HTTPS using our custom JSON manifest listing the available images: extra setup and ongoing admin for organisations who are already generously providing storage and bandwidth. So, let’s set up an rsync server! One problem: our images are not stored on a traditional filesystem, but in Amazon S3. So we need some way to provide an rsync server which is backed by S3.

I decided to use an S3-backed FUSE filesystem to mount the bucket holding our images. It needs to provide a 1:1 mapping from paths in S3 to paths on the mounted filesystem (preserving the directory hierarchy), perform reasonably well, and ideally offer local caching of file contents. I looked at two implementations (out of the many that are out there) which have these features:

  • s3fs-fuse, which is packaged for Debian as s3fs. Debian is the predominant OS in our server infrastructure, as well as the base for Endless OS itself, so it’s convenient to have a package. ((Do not confuse s3fs-fuse with fuse-s3fs, a totally separate project packaged in Fedora. It uses its own flattened layout for the S3 bucket rather than mapping S3 paths to filesystem paths, so is not suitable for what we’re doing here.))
  • goofys, which claims to offer substantially better performance for file metadata than s3fs.

I went with s3fs first, but it is a bit rough around the edges:

  • Our S3 bucket name contains dots, which is not uncommon. By default, if you try to use one of these with s3fs, you’ll get TLS certificate errors. This turns out to be because s3fs accesses S3 buckets as $NAME.s3.amazonaws.com, and the certificate is for *.s3.amazonaws.com, which does not match foo.bar.s3.amazonaws.com. s3fs has a -o use_path_request_style flag which avoids this problem by putting the bucket name into the request path rather than the request domain, but this use of that parameter is only documented in a GitHub Issues comment.
  • If your bucket is in a non-default region, AWS serves up a redirect, but s3fs doesn’t follow it. Once again, there’s an option you can use to force it to use a different domain, which once again is documented in a comment on an issue.
  • Files created with s3fs have their permissions stored in an x-amz-meta-mode header. Files created by other tools (which is to say, all our files) do not have this header, so by default get mode 0000 (ie unreadable by anybody), and so the mounted filesystem is completely unusable (even by root, with FUSE’s default settings).

There are two ways to fix this last problem, short of adding this custom header to all existing and future files:

  1. The -o complement_stat option forces files without the magic header to have mode 0400 (user-readable) and directories 0500 (user-readable and -searchable).
  2. The -o umask=0222 option (from FUSE) makes the files and directories world-readable (an improvement on complement_stat in my opinion) at the cost of marking all files executable (which they are not)

I think these are all things that s3fs could do by itself, by default, rather than requiring users to rummage through documentation and bug reports to work out what combination of flags to add. None of these were showstoppers; in the end it was a catastrophic memory leak (since fixed in a new release) that made me give up and switch to goofys.

Due to its relaxed attitude towards POSIX filesystem semantics where performance would otherwise suffer, goofys’ author refers to it as a “Filey System”. ((This inspired a terrible “joke”.)) In my testing, throughput is similar to s3fs, but walking the directory hierarchy is orders of magnitude faster. This is due to goofys making more assumptions about the bucket layout, not needing to HEAD each file to get its permissions (that x-amz-meta-mode thing is not free), and having heuristics to detect traversals of the directory tree and optimize for that case. ((I’ve implemented a similar optimization elsewhere in our codebase: since we have many "directories", it takes many less requests to ask S3 for the full contents of the bucket and transform that list into a tree locally than it does to list each directory individually.))

For on-disk caching of file contents, goofys relies on catfs, a separate FUSE filesystem by the same author. It’s an interesting design: catfs just provides a write-through cache atop any other filesystem. The author has data showing that this arrangement performs pretty well. But catfs is very clearly labelled as alpha software (“Don’t use this if you value your data.”) and, as described in this bug report with a rather intemperate title, it was not hard to find cases where it DOSes itself or (worse) returns incorrect data. So we’re running without local caching of file data for now. This is not so bad, since this server only uses the file data for periodic synchronisation with mirrors: in day-to-day operation serving redirects, only the metadata is used.

With this set up, it’s plain sailing: a little rsync configuration generator that uses the filter directive to only publish the last two releases (rather than forcing terabytes of archived images onto our mirrors) and setting up Mirrorbits. Our Mirrorbits instance is configured with some extra “mirrors” for our CloudFront distribution so that users who are closer to a CloudFront-served region than any real mirrors are directed there; it could also have been configured to make the European mirrors (which they all are, as of today) only serve European users, and rely on its fallback mechanism to send the rest of the world to CloudFront. It’s a pretty nice piece of software.

If you’ve made it this far, and you operate a mirror in Southeast Asia, South America or Africa, we’d love to hear from you: our mission is to improve access to technology for people in these areas.