Endless contributions to GNOME 44

The GNOME 44 release is rushing towards us like an irate pangolin! Here is a quick roundup of some of the Endless OS Foundation team’s contributions over its six-month development cycle.

Software

As in the previous cycle, our team has been a key contributor to GNOME Software 44. Based on a very unscientific analysis of the Git commit log, about 30% of non-merge commits and 75% of merge commits to GNOME Software during this cycle came from our team. Co-maintainer Philip Withnall has continued his work to refactor Software’s internal threading model to improve its reliability. He’s also contributed a number of fixes in both GNOME Software and Flatpak for issues related to app updates, such as not leaking large temporary directories when an update fails. Dan Nicholson fixed an issue in Flatpak which would cause Software to remove an app rather than updating it when its ID changes.
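For anyone who wants to repeat this kind of crude count, here is a minimal shell sketch. The commit_share helper, the revision range and the author pattern are illustrative assumptions, not the exact method behind the figures above; the work is done by git rev-list --count with its standard --author, --merges and --no-merges filters.

```shell
# commit_share RANGE AUTHOR_REGEX [--no-merges|--merges]
# Hypothetical helper: prints the integer percentage of commits in RANGE
# whose author matches AUTHOR_REGEX (git matches it against "Name <email>").
commit_share() {
    range=$1
    pattern=$2
    kind=${3:---no-merges}
    total=$(git rev-list --count "$kind" "$range")
    ours=$(git rev-list --count "$kind" --author="$pattern" "$range")
    # Avoid dividing by zero when the range has no commits of this kind.
    if [ "$total" -eq 0 ]; then
        echo 0
        return
    fi
    echo $(( ours * 100 / total ))
}
```

For example, commit_share 42.0..main '@endlessos\.org' --merges run in a gnome-software checkout, where the tag name and email domain are assumptions. Like the original count, it says nothing about translation updates or the size of each commit.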

Georges Stavracas added some sysprof instrumentation which allowed him to quickly pinpoint the main cause of slow loading of category pages. To our collective surprise, the culprit was… loading remote icons for apps! Georges fixed this issue by downloading icons asynchronously. A side-by-side comparison is really quite striking:

As we came closer to the release of Endless OS 5, we realised we needed some improvements to the handling of OS updates in GNOME Software, such as showing a Learn More link for major upgrades, distinguishing between major upgrades and minor updates, and using the distro’s icon when showing a minor update. Like Endless OS, GNOME OS uses eos-updater, but these improvements will not fully take effect there yet, because GNOME OS does not currently set any OS version metadata on its updates, nor a logo in os-release.
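To see whether a given system’s os-release file sets a logo, a short shell sketch like this should do. It relies on os-release(5) being a shell-compatible file that may be sourced, with LOGO= naming an icon, and on the documented fallback from /etc/os-release to /usr/lib/os-release. The os_release_logo helper is hypothetical and is not something GNOME Software or eos-updater actually runs.

```shell
# os_release_logo [FILE]
# Hypothetical helper: prints the LOGO= value (an icon name) from an
# os-release file, or an empty line if the field is unset.
os_release_logo() {
    file=$1
    if [ -z "$file" ]; then
        # os-release(5): prefer /etc/os-release, fall back to /usr/lib/os-release.
        if [ -r /etc/os-release ]; then
            file=/etc/os-release
        else
            file=/usr/lib/os-release
        fi
    fi
    # The file is shell-compatible, so source it in a subshell to keep the
    # caller's environment clean. FILE should be an absolute path.
    (
        LOGO=
        . "$file" || exit 1
        printf '%s\n' "$LOGO"
    )
}
```

An empty result corresponds to the situation described above for GNOME OS.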

The GNOME Software updates page, showing a minor update for Endless OS with the Endless logo beside it.

Of course, we’ve also contributed to the ongoing maintenance of Software, and other functional improvements such as displaying release urgency levels for firmware updates.

Looking ahead, Joana Filizola has spearheaded a series of user studies on topics like how navigation within Software works, discoverability of search, and the name ‘Software’ itself: we hope these will bear fruit in future GNOME cycles.

Shell

As well as ongoing maintenance of Shell and Mutter, Georges Stavracas contributed improvements to the quick settings pills, adding subtitles to improve the information density. This went hand-in-hand with work to improve GNOME’s handling of Flatpak apps that are running in the background (i.e. without a visible window). Previously this was rather crude: if a Flatpak app ran without a window for some period of time, you would get a decontextualized dialog box asking if you want to allow the app to keep running. Choosing the “wrong” option would kill the app and forbid it from running in the background in future – breaking core functionality for certain apps. In GNOME 44, background apps are instead listed within the quick settings popover, and those apps that do use the background portal API to ask nicely to run in the background are allowed to do so without user interaction.

We also supported the design team’s experiments around how window focus is communicated.

GLib

Philip Withnall has, as in many previous cycles, contributed tens of hours of ongoing maintenance to this library that underpins the entire desktop. This has included a number of GVariant security fixes (like this one), GApplication security fixes, GDBus refcounting fixes, and more. Philip also added g_free_sized() and g_aligned_free_sized(), mirroring similar functions in C23, so that applications can start using these without needing to check for (or wait for) C23 support in the toolchain.

Initial Setup

I spent somewhat fewer hours—but not zero!—on general maintenance of Initial Setup. Georges fixed a regression that meant that privacy policies could not be viewed from within Initial Setup; I fixed the display of a shortlist of keyboard layouts, and of non-ASCII characters in location names after switching locale; and Cassidy James Blaede refreshed the design of the password page to use Adwaita widgets & styling.

Password page of GNOME Initial Setup. The password fields have inline icons to edit the text and reveal the password.

…and more

Every quarter, the engineering teams at Endless OS Foundation have an “intermission week”, where the team sets aside our normal priorities to focus on addressing tech debt, wishlist items, innovative or experimental ideas, and learning. Some of the items above came out of the last couple of intermission weeks! On top of that, Philip has spent some time experimenting with APIs to allow apps’ state to be saved and restored; and João Paulo Rechi Vita explored making the GNOME Online Accounts daemon quit when idle, saving a small but non-zero amount of RAM. Neither of these is quite in a production-ready state, but as they say: there’s always another release!

Meanwhile, we’ve been working on extending the set of web apps offered in GNOME Software on Endless OS, using more expansive criteria than the list shipped by GNOME Software by default, and a different delivery mechanism for the catalogue. More on this in a future post!

Recovering a truncated Zoom meeting recording on Endless OS

One of my colleagues was recording a Zoom meeting. The session ended in such a way that the recording was left unfinished, named video2013876469.mp4.tmp. Trying to play it in Videos just gave the error “This file is invalid and cannot be played”. ffplay gave the more helpful error “moov atom not found”. It wasn’t too hard to recover the file, but it did involve a bit of podman knowledge, so I’m writing it up here for posterity.

A bit of DuckDuckGo-ing tells us that the moov atom is, roughly, the index of the video. Evidently Zoom writes it at the end of the file when the recording is stopped cleanly; so if that doesn’t happen, this metadata is missing and the file cannot be played. A little more DuckDuckGo-ing tells us that untrunc can be used to recover such a truncated recording, given an untruncated recording from the same source. Basically it uses the metadata from the complete recording to infer what the metadata should be for the truncated recording, and fill it in.
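To confirm the diagnosis without ffplay, you can walk the file’s top-level boxes (“atoms”). MP4 is based on the ISO base media file format, where each box starts with a 32-bit big-endian size followed by a four-character type; a size of 1 means a 64-bit size follows the type, and 0 means the box runs to the end of the file. This POSIX shell sketch, with hypothetical function names, lists those types using only od and dd, so a missing moov box stands out.

```shell
# mp4_top_level_boxes FILE
# Hypothetical helper: prints the four-character type of each top-level box.
mp4_top_level_boxes() {
    file=$1
    total=$(( $(wc -c < "$file") ))
    offset=0
    while [ "$offset" -lt "$total" ]; do
        # Each box header: 32-bit big-endian size, then a 4-byte type.
        set -- $(od -An -tu1 -j "$offset" -N 4 "$file")
        [ "$#" -eq 4 ] || break
        size=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
        box_type=$(dd if="$file" bs=1 skip=$((offset + 4)) count=4 2>/dev/null)
        printf '%s\n' "$box_type"
        if [ "$size" -eq 0 ]; then
            # Size 0: the box extends to the end of the file.
            break
        elif [ "$size" -eq 1 ]; then
            # Size 1: a 64-bit big-endian size follows the type field.
            set -- $(od -An -tu1 -j $((offset + 8)) -N 8 "$file")
            [ "$#" -eq 8 ] || break
            size=0
            for byte in "$@"; do
                size=$(( (size << 8) | byte ))
            done
        fi
        # A size below the 8-byte header is malformed; stop rather than loop.
        [ "$size" -ge 8 ] || break
        offset=$(( offset + size ))
    done
}

# has_moov FILE: succeeds if FILE has a top-level moov box.
has_moov() {
    mp4_top_level_boxes "$1" | grep -qx moov
}
```

For instance, has_moov ~/tmp/video2013876469.mp4.tmp || echo "no moov box: probably truncated".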

I built untrunc with podman as follows:

git clone https://github.com/ponchio/untrunc.git
cd untrunc
podman build -t untrunc .

I made a fresh directory in my home directory and placed the truncated video and a successful Zoom recording into it:

mkdir ~/tmp
# Make the directory world-writable (see later note)
chmod 777 ~/tmp
cp ~/Downloads/video2013876469.mp4.tmp ~/tmp
cp ~/Downloads/zoom_0.mp4 ~/tmp

Now I ran the untrunc container, mounting $HOME/tmp from my host system to /files in the container, and passing paths relative to that:

podman run --rm -it -v "$HOME/tmp:/files" \
    localhost/untrunc:latest \
    /files/zoom_0.mp4 \
    /files/video2013876469.mp4.tmp

and off it went:

Reading: /files/zoom_0.mp4
Repair: /files/video2013876469.mp4.tmp
Mdat not found!

Trying a different approach to locate mdat start
Repair: /files/video2013876469.mp4.tmp
Backtracked enough!

Trying a different approach to locate mdat start
Repair: /files/video2013876469.mp4.tmp
Mdat not found!

[…]

Found 71359 packets.
Found 39648 chunks for mp4a
Found 31711 chunks for avc1
Saving to: /files/video2013876469.mp4_fixed.mp4

The fixed file is now at ~/tmp/video2013876469.mp4_fixed.mp4.

Tinkering with the directory permissions was needed because untrunc’s Dockerfile creates a user within the container and runs the entrypoint as that user, rather than running as root. I believe that in the Docker world this is considered good practice, because root in a Docker container is root on the host, too. However, I’m using podman as an unprivileged user, so UID 0 in the container is my user on the host, and the untrunc user in the container ends up mapped to UID 166535 on the host, which doesn’t have write access to the directory I mounted into the container. Honestly, I don’t know how you’re supposed to manage this, so I just made the directory world-writable and moved on with my life.
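The arithmetic behind that host UID is worth spelling out. In a rootless podman container, UID 0 maps to your own user, and container UIDs from 1 upwards map onto the subordinate range allocated to you in /etc/subuid, so container UID n lands on start + n - 1. Assuming the common default range starting at 165536, the numbers above imply the untrunc user is UID 1000 inside the container; that is an inference, not something checked against the Dockerfile. A small sketch, with hypothetical helper names:

```shell
# subuid_start USER [FILE]
# Hypothetical helper: prints the first subordinate UID allocated to USER
# (FILE lines look like user:start:count).
subuid_start() {
    awk -F: -v user="$1" '$1 == user { print $2; exit }' "${2:-/etc/subuid}"
}

# container_uid_to_host CONTAINER_UID SUBUID_START
# Hypothetical helper for the default rootless mapping: container UID 0 is
# the invoking user; UID n >= 1 maps to SUBUID_START + n - 1.
container_uid_to_host() {
    if [ "$1" -eq 0 ]; then
        id -u
    else
        echo $(( $2 + $1 - 1 ))
    fi
}
```

With that mapping in mind, podman unshare chown 1000:1000 ~/tmp should give the directory to whichever host UID container UID 1000 maps to, instead of making it world-writable; again, 1000 is an assumed UID and GID for the untrunc user.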

GNOME 43: Endless’s Part In Its Creation

GNOME 43 is out, and as always there is lots of good stuff in there. (Me circa 2014 would be delighted to see the continuous improvements in GNOME’s built-in RDP support.) During this cycle, the OS team at Endless OS Foundation spent a big chunk of our time on other initiatives, such as bringing Endless Key to more platforms and supporting the Endless Laptop programme. Even so, we made some notable contributions to this GNOME release. Here are a few of them!

App grid pagination improvements

The Endless OS desktop looks a bit different to GNOME, most notably in that the app grid lives on the wallpaper, not behind it. But once you’re at the app grid, it behaves the same in both desktops. Endless OS computers typically have hundreds of apps installed, so it’s normal to have 2, 3, or more pages of apps.

We’ve learned from Endless OS users and partners that the row of dots at the bottom of the grid did not provide enough of a clue that there are more pages than the first. And when given a hint that more pages are available, indicated by those dots, users rarely discovered that they can switch with the scroll wheel or a swipe: they would instead click on those tiny dots. Tricky even for an accomplished mouse user!

GNOME 40 introduced an effect where moving the mouse to the edges of the screen would cause successive pages of apps to “peek” in. As we’ve carried out user testing on our GNOME 41-based development branch (more on this another time) we found that this was not enough: if you don’t know the other pages are there, there’s no reason to deliberately move your mouse pointer to the empty space at the edges of the screen.

So, we proposed for GNOME something similar to what we designed and shipped in Endless OS 4: always-visible pagination arrows. What we ended up implementing & shipping in GNOME 43 is a bit different to what we’d proposed, after several rounds of iteration with the GNOME design team, and I think it’s even better for it. Implementing this was also an opportunity to fix up many existing bugs in the grid, particularly when dragging and dropping between pages.

GNOME 43 app grid, showing a pagination arrow to the right-hand side

GNOME 43 app grid, showing next page peeking in while dragging-and-dropping an app

GNOME Software

43% of the code-changing commits between GNOME Software 42.0 and the tip of gnome-software main as of 29th September came from Endless – not bad, but still no match for Red Hat’s Milan Crha, who single-handedly wrote 46% of the commits in that range! (Counting commits is a crude measure, and excluding translation updates and merge commits overlooks significant, essential work; even with those caveats, I still think the number is striking.)

Many of our contributions in this cycle were part of the ongoing threading rework that Philip Withnall spoke about at GUADEC 2022, with the goals of improving performance, reducing memory usage, and eliminating hangs due to thread-pool exhaustion. Along the way, this included some improvements to the way that featured and Editor’s Choice apps are retrieved.

Several patches bearing Joaquim Rocha’s name and an Endless email address landed in this cycle, improving Software’s handling of apps queued for installation, despite Joaquim having moved on from Endless 3 years ago. These originally come from Endless’s fork of GNOME Software and date back to 2018, and made their way upstream as part of our quest to converge our fork with upstream. In related news, we recently rebased the Endless OS branch of Software onto the gnome-43 branch; we are down from 200+ patches a few years ago to 19. Nearly there!

At the start of this year, Phaedrus Leeds was contracted by the GNOME Foundation (funded by Endless Network) to reintroduce the ability to install and manage web apps with Software, even when GNOME Web is installed with Flatpak. This work was not quite ready for the GNOME 42 feature freeze, and landed in GNOME 43. I personally did a trivial amount of work to enable this feature in GNOME OS and add a few sites to the curated list, but as I write this post I have realised that these additions were not actually shipped in the GNOME Software 43.0 tarball. I did a bit of research into how we can expand this curated list without creating a tonne of extra work for our community of volunteers, but haven’t had a chance to write this up just yet.

Five "Picks from the Web" In GNOME Software 43

Looking to the future, Georges Stavracas has recently spent some time improving Software’s sysprof integration to help understand where Software is spending its time, and hence improve its perceived responsiveness. One of the first discoveries is that the majority of the delay before a category page becomes responsive is spent downloading app icons; making this asynchronous will make Software feel much snappier. Alas, the current approach for fixing this will change Software’s plugin API, so will have to wait for GNOME 44. I’m sure that with decent profiler integration and enough eyes on the profiler, we’ll be able to find more cases like this.

GTK 4-flavoured Initial Setup

Serial GTK 4 porter Georges Stavracas ported Initial Setup to GTK 4. Since Initial Setup uses libmalcontent-ui to implement its parental controls pages, he also ported the Parental Controls app to GTK 4.

"About You" page in GNOME Initial Setup 43. Full name: Michael Banyan. Username: bovine poet laureate

Parental controls in GNOME 43

This port was a direct update of the existing UI to a new toolkit version, only adopting new widgets like GtkPasswordEntry and AdwPreferencesPage where it was trivial to do so. Designs exist for a refreshed Initial Setup interface – anyone interested in picking this up?

Initial Setup has a remarkably large dependency graph, which made this update trickier than it might otherwise have been. I made a start back in January for GNOME 42 and dealt with some of the easier library changes, but more traps awaited:

  • Initial Setup depends on goa-backend-1.0, the bit of GNOME Online Accounts that actually has a user interface (which uses WebKit). This is, for now, GTK 3 only. The solution Georges used here was the same as he used in GNOME Settings: move it into a separate process.
  • Next up, Initial Setup itself uses WebKit (to show the Mozilla Location Service terms of service and abrt privacy policy). The GTK 4 port of WebKit was not widely available in distros at the time of the port. As a result, Initial Setup’s GitLab CI switched from Fedora to Arch. It also means that Initial Setup has a transitive dependency on libsoup 3…
  • Malcontent uses libflatpak; until recently, libflatpak had a hard dependency on libsoup 2.4 with no libsoup 3 port in evidence. So Initial Setup would transitively link to both libsoup 2.4 and 3, and abort on startup, if parental controls were enabled. Happily, a libcurl backend appeared in libflatpak 1.14, and libostree already had a libcurl backend, so if your distro configures both of those to use libcurl then parental controls can be safely enabled in Initial Setup 43.
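To check whether a particular binary ends up loading both, the dynamic linker’s view is the quickest test, since ldd lists transitive dependencies too. This sketch, with a hypothetical helper name, reads ldd output on stdin and looks for both sonames, libsoup-2.4.so and libsoup-3.0.so.

```shell
# links_both_libsoups
# Hypothetical helper: reads ldd output on stdin and succeeds if both the
# libsoup 2.4 and libsoup 3 shared libraries appear in it.
links_both_libsoups() {
    deps=$(cat)
    printf '%s\n' "$deps" | grep -q 'libsoup-2\.4\.so' &&
        printf '%s\n' "$deps" | grep -q 'libsoup-3\.0\.so'
}
```

For example, ldd /usr/libexec/gnome-initial-setup | links_both_libsoups && echo "both libsoups loaded", where the binary’s path varies between distros.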

It’s interesting to me that libsoup’s API changes have caused several GNOME-adjacent projects not to migrate to the new API, but to de-facto move away from libsoup. Hindsight is 20/20, etc. There is a draft pull request to build ostree against libsoup 3, so perhaps they will return to the soup tureen in due course.

Behind the scenes / friends of GNOME

Not all heroes wear capes, and not all contributions are as visible as others. Our team continues to co-maintain countless GNOME and GNOME-adjacent modules, and fix tricky problems at their source, such as this file-descriptor leak Georges caught in libostree.

We’ve been involved in GNOME design discussions, with Cassidy James Blaede (who joined Endless earlier this year) joining the Design team. We helped reach consensus for the new Quick Settings design, and are continuing to be involved in future design initiatives.

Someone recently asked Cassidy on Twitter whether it is true that GNOME OS is “basically just a modified version of Endless OS”, as they had heard. It’s not! But you could probably consider them second cousins once removed, and they have enough in common that improvements flow both ways. GNOME OS uses eos-updater, our libostree-based daemon that downloads and installs OS updates (and does some other stuff that GNOME OS doesn’t use). A while back, Dan Nicholson taught eos-updater how to not lose changes in /etc in the time between an update being installed, and it being booted. (Which can be pretty bad! Entire users can be lost this way!) But we found that this libostree feature interacted poorly with the way /boot is automounted on systems that use systemd-boot, so the change was disabled on such systems. More recently, Dan fixed libostree to work correctly in this case, so eos-updater can now correctly preserve changes in /etc. GNOME OS uses systemd-boot, so in due course the fixed libostree and eos-updater will appear there and this problem you probably didn’t know you have will be fixed.

And since my last post, Jian-Hong Pan updated TurtleBlocks on Flathub to the GNOME 42 runtime, dealing with another of the long tail of Flathub apps on end-of-lifed runtime versions. Sadly it fails to build on the GNOME 43 runtime due to an apparent setuptools regression, but 42 has another 6 months in it yet.

I could go on, and I’m sure there are things my fickle memory has overlooked, but for now: onwards!

Creating Windows installation media on Linux

Every so often I need to install Windows, most recently for my GNOME on WSL experiments, and to do this I need to write the Windows installer ISO to a USB stick. Unlike most Linux distro ISOs, these are true, pure ISO 9660 images—not hybrid images that can also be treated as a DOS/MBR disk image—so they can’t just be written directly to the disk. Microsoft’s own tool is only available for Windows, of course.
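One rough way to tell the two kinds of image apart: a hybrid ISO carries an MBR, whose last two bytes (offsets 510 and 511) hold the boot signature 0x55 0xAA, whereas a pure ISO 9660 image typically leaves that system area unused. The sketch below, with a hypothetical helper name, checks only that signature, so treat it as a heuristic rather than proof that an image will boot when written directly to a disk.

```shell
# has_mbr_signature IMAGE
# Hypothetical helper: succeeds if bytes 510-511 of IMAGE are 0x55 0xAA.
has_mbr_signature() {
    sig=$(od -An -tx1 -j 510 -N 2 "$1" | tr -d ' \n')
    [ "$sig" = "55aa" ]
}
```

For example, has_mbr_signature windows.iso || echo "not hybrid: copy the files instead of writing the image".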

I’m sure there are other ways, but this is what I do, and I’m writing it down so I can easily find the instructions next time! (Edit: check the comments for an approach which involves 2 partitions and a little more careful copying, but no special tools.)

The basic process is quite simple:

  • Download an ISO 9660 disk image from Microsoft
  • Partition the USB drive with a single basic data partition, formatted as FAT32
  • Mount the ISO image – on GNOME, you should just be able to double-click it to mount it with Disk Image Mounter
  • Copy all the files from the mounted ISO image to the USB drive

But there is a big catch with that last step: at least one of the .wim files in the ISO is too large for a FAT32 partition.

The trick is to first copy all the files to a writeable directory on internal storage, then use a tool called wimlib-imagex split from wimlib to split the large .wim file into a number of smaller .swm files before copying them to the FAT32 partition. I think I compiled it from source, in a toolbox container, but you could also use this OCI container image whose README helpfully provides these instructions:

find . -size +4294967000c -iname '*.wim' -print | while read -r wimpath; do
  wimbase="$(basename "$wimpath" '.wim')"
  wimdir="$(dirname "$wimpath")"
  echo "splitting ${wimpath}"
  # No --interactive/--tty: stdin here is the pipe from find, so an
  # interactive container would refuse to start ("the input device is not
  # a TTY") or swallow the remaining file names.
  docker run \
    --rm \
    --volume "$(pwd):/work" \
    "backplane/wimlib-imagex" \
      split "$wimpath" "${wimdir}/${wimbase}.swm" 4000
done

Now you can copy all those files, minus the too-large .wim, onto the FAT32 drive, and then boot from it.
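Before copying, it is worth double-checking that nothing left in the staging directory still exceeds FAT32’s maximum file size of 2^32 - 1 = 4,294,967,295 bytes. For comparison, the 4000 passed to wimlib-imagex split above is a part size in MiB, i.e. 4,194,304,000 bytes, comfortably under the limit. A minimal sketch, with a hypothetical helper name:

```shell
# oversized_for_fat32 DIR
# Hypothetical helper: lists regular files under DIR that are larger than
# FAT32's maximum file size of 4294967295 bytes (2^32 - 1).
oversized_for_fat32() {
    find "$1" -type f -size +4294967295c -print
}
```

If this prints only the original .wim, everything else can go onto the FAT32 drive.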

This all assumes that you only care about a modern system with EFI firmware. I have no idea about creating a BIOS-bootable Windows installer on Linux, and fortunately I have never needed to do this: to test stuff on a BIOS Windows installation, I have used the time-limited virtual machines that Microsoft publishes for testing stuff in old versions of Internet Explorer.

I was inspired to resurrect this old draft post by a tweet by Ross Burton.

Release (semi-)automation

The time I have available to maintain GNOME Initial Setup is very limited, as anyone who has looked at the commit history will have noticed. I’d love more eyes & hands on this important but easy-to-overlook component, particularly to guide it kindly but firmly into the modern age of GTK 4 and the refreshed HIG.

I found that making a batch of 1–3 releases across different GNOME branches every few months was surprisingly time-consuming and error-prone, even with the pretty comprehensive release process checklist on the GNOME Wiki, so I’ve been periodically trying to automate bits of it away.

Philip Withnall’s gitlab-changelog script makes writing the NEWS file a lot quicker. I taught it to output the human-readable names of each updated translation (a nice additional contribution would be to also include the name of the human who updated the translation) and made it a little smarter about guessing the Git commit range to scan.

Beyond that, I added a Meson run target, maintainer-upload-release pointing at a script which performs some rudimentary coherence checks on the version number, tags the release (using git-evtag if available), atomically pushes the branch and that tag to GNOME GitLab, then copies the source tarball to master.gnome.org. (Apparently it has been almost 12 years since I did something similar in telepathy-gabble, building on the make maintainer-upload-release target that Simon McVittie added in 2008, which is where I borrowed the name.) Other module maintainers may find this script useful too – it’s quite generic.
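The core steps of such a script might be sketched as follows. The function names, the version pattern (GNOME’s scheme since 40 of N.alpha, N.beta, N.rc and N.M), the origin remote name, and the assumption that version: sits on its own line inside project() in meson.build are all mine; git evtag sign, git tag -s and git push --atomic are the real commands doing the work.

```shell
# meson_project_version [FILE]
# Hypothetical helper: prints the first "version: '...'" line's value, which
# is normally the one inside project() when it sits on its own line.
meson_project_version() {
    sed -n "s/^[[:space:]]*version[[:space:]]*:[[:space:]]*'\([^']*\)'.*/\1/p" \
        "${1:-meson.build}" | head -n 1
}

# is_gnome_version VERSION
# Accepts e.g. 42.alpha, 42.beta, 42.rc, 42.0, 42.1.
is_gnome_version() {
    printf '%s\n' "$1" | grep -Eqx '[0-9]+\.(alpha|beta|rc|[0-9]+)'
}

# tag_and_push VERSION
# Tags HEAD (with git-evtag if installed), then pushes branch and tag in
# a single atomic push so neither lands without the other.
tag_and_push() {
    version=$1
    branch=$(git symbolic-ref --short HEAD) || return 1
    if command -v git-evtag >/dev/null 2>&1; then
        git evtag sign "$version" || return 1
    else
        git tag -s -m "Version $version" "$version" || return 1
    fi
    git push --atomic origin "$branch" "$version"
}
```

Chained together: v=$(meson_project_version) && is_gnome_version "$v" && tag_and_push "$v", after which the tarball that ninja dist left in _build/meson-dist/ can be uploaded.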

Putting these together, the release flow looks like this:

git switch gnome-42
git pull
../pwithnall/gitlab-changelog/gitlab-changelog.py GNOME/gnome-initial-setup
# Manually edit NEWS to incorporate the changelog, adjusted as needed
# Manually check the version in meson.build
git commit -am 'NEWS for 42.Y'
ninja -C _build dist maintainer-upload-release

Another release-related quality-of-life improvement is to make GitLab CI not only build and test the project (in the vain hope that there might actually be tests!) but also check that the install and gnome-initial-setup-pot targets both work. (At one point or another both have failed at or around release time; now they never will again, famous last words.)

I know none of this is rocket science, but I find it all makes the process quicker and less cumbersome, and it’s stopped me from repeating errors like uploading the wrong version on a few tired evenings. Obviously this could all be taken further: perhaps a manually-invoked CI pipeline that does all this stuff, more checks, etc. But while I’m on this train of thought:

Why do we release GNOME modules one-by-one at all?

The workflow we use to release Endless OS is a bit different to GNOME. Once we merge a change to some module’s Git repository, such as eos-updater or our shrinking branch of GNOME Software, that change embarks on a scenic automated journey that takes it to the next nightly build of the entire OS, both as an OSTree update and as fresh installation media. I use these nightly builds for my daily work, safe in the knowledge that I can roll back to the previous build if necessary.

We don’t make releases of individual modules: instead, when it comes time to release the OS, we trigger a pipeline that (among many other things) pushes the already-built OS update to the production repo, and creates Release_x.y.z tags on each Git repo.

This was quite an adjustment for me at first, compared to lovingly hand-crafting NEWS files and coming up with funny/esoteric release names, but now that I’m used to it it’s hard to go back. Why can’t GNOME do the same?

At this point in the post, we are straying into territory that I have limited first-hand knowledge of. Caveat lector! But here goes:

Thanks to GNOME OS, GNOME already has nightly builds of the entire desktop and apps: so rather than having to build everything yourself, or wait for a development release of GNOME, you can just update & reboot your GNOME OS VM and test the change right there. gnome-build-meta knows how to build every GNOME module; and if you can build the code, it seems a conceptually small step to run ninja dist and the stuff above to publish tags and tarballs for each module.

So you could well imagine on 43.beta release day, someone in the release team could boot the latest GNOME OS nightly, declare it to be Good, and push a button that tags every relevant GNOME module & builds and uploads all the tarballs, and then go back to their day, rather than having to chase down module owners who haven’t quite got around to making the release, fix random build breakages, and so on.

To make this work reliably, I think you’d need every module’s CI to be run through gnome-build-meta, building that MR against the rest of the project, so that g-b-m build failures would be caught before (not after) the offending change lands in the module in question. Seems doable – in Endless we have the equivalent thing managed by a jenkins-job-builder template, the GitHub Pull Request Builder plugin, and a gnarly script.

Continuous integration and deployment are becoming the norm throughout the software industry, for good reasons laid out quite well in articles like Shipping Fast Changes Your Life: the smaller the gap between making a change and it reaching a user, the faster the feedback, and the less costly it is to fix a bug or change course.

The free software movement has historically been ahead of the curve on this, with the “release early, release often” philosophy. And GNOME in particular has used a time-based release process for two decades, allowing major distros to align their schedules to GNOME and get updates into the hands of users quickly, which went some way towards overcoming the fact that GNOME does not own the full pipeline from source code to end users.

Havoc Pennington’s June 2002 email proposing this model has aged rather well, in my opinion, and places a heavy emphasis on the development branch being usable:

The unstable branch must always be dogfood-quality. If testers can’t test it by using it daily, they can’t make the jump. If the unstable branch becomes too unstable, we can’t release it on a reliable schedule, so we have to start breaking the stable branch as a stopgap.

Interestingly the time-based release schedule wiki page states that the schedule should contain:

Regular test release dates, approximately every 2 weeks.

These days, GNOME releases are closer to monthly. In the context of the broader industry where updates reach users multiple times a day, this is starting to look a little less forward-thinking! Of course, continuously deploying an entire OS to production is rather harder than continuously deploying web apps or apps in app stores, if only because the stakes are higher: you need a really robust automatic rollback mechanism to save your users’ plant-based bacon substitute if a new OS build fails to boot, or worse, contains an updater bug that prevents future updates being applied! Still, I believe that a bit of automation would go a long way in allowing module maintainers and the release team alike to spend their scarce mental energy on other things, and allow the project to increase the frequency of releases. What am I missing?

Small steps towards a GTK 4-based Initial Setup

Over the Christmas holidays, I was mostly occupied with the literal care and feeding of small humans, but I found a bit of time for the metaphorical care and feeding of Initial Setup for GNOME 42 as well. Besides a bit of review and build and CI housekeeping, I wrote some patches to update it for API changes in libgnome-desktop (merged) and libgweather (pending). The net result is an app which looks and works exactly the same, complete with a copy of the widget formerly known as GWeatherLocationEntry (RIP) with its serial numbers filed off.

Of course, my ultimate goal was to port Initial Setup to GTK 4. I made some other tiny steps in that direction, such as removing a redundant use of GtkFrame that becomes actively harmful with the removal of the shadow-type property in GTK 4, and now have a proof-of-concept port of just the final page which both compiles and runs!

Screenshots of "All done!" page of Initial Setup

But, I will not have time to complete this port in time for the GNOME 42 UI freeze on 12th February. If you are reading this and feel inspired to pick this up, even just a page or two, more hands would be much appreciated.

γυαδεκ? χκπτγεδ?

GUADEC in Thessaloniki was a great experience, as ever. Thank you once again to the GNOME Foundation for sponsoring my attendance!

Sponsored by GNOME Foundation

Some personal highlights, in no particular order:

  • A lot of useful and informative discussion at the GNOME Advisory Board meeting on Thursday – we ran out of time, which seems like a good sign.
  • After Benjamin Berg and Iain Lane’s great talk on Managing GNOME Sessions with Systemd, Benjamin and I discussed the special-case they had to make to run GNOME Initial Setup’s “copy worker” early in the user session, and whether we might be able to improve this and various other aspects by launching Initial Setup in a different way.
  • Via Matthias’ talk on Portals, I got thinking about the occasional requests for an “is this app installed?” portal, and I realised that you can actually fake it with existing machinery in some cases. If you care about a specific app, you probably want to be able to talk to it, so you specify --talk-name=org.example.Foo; at which point you can call org.freedesktop.DBus.ListActivatableNames() and check whether org.example.Foo is in the returned list.
  • The Intern Lightning Talks were inspiring: it’s great to see what has caught the interest of new contributors. This year, I was inspired by Srestha Srivastava’s work on Boxes to send a merge request to osinfo-db to generate the necessary XML for Endless OS. This in turn led to a great discussion with Fabiano and Felipe, and to some more issues and merge requests.
  • Alex Larsson was a tough act to follow at the lightning talks, but based on hallway discussion, my bit on Flatpak External Data Checker was of interest. (I taught it how to update appdata on the flight home. The person sitting next to me told me that writing code on flights is a young-person thing, which I took as a compliment.)
  • Not one, but two talks on user testing! One thing I took away is that while it’s possible to conduct remote usability testing, you’ll miss out on body language cues from the test subjects, and in the specific case of GNOME you’ll either bias the sample towards people who already use GNOME, or you’ll introduce the additional variable of whatever remote access tool the user uses. Not ideal!
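The ListActivatableNames check from the Portals bullet above might look like this sketch, which uses gdbus from GLib on the session bus. The helper names are hypothetical and org.example.Foo is a placeholder; inside a Flatpak sandbox it assumes the app was granted --talk-name for that name, as described. The parsing is split out so it can be exercised without a bus.

```shell
# name_in_gdbus_list NAME
# Hypothetical helper: succeeds if NAME appears as a quoted string in the
# GVariant text that gdbus prints, e.g. (['org.freedesktop.DBus', ...],)
name_in_gdbus_list() {
    grep -qF "'$1'"
}

# is_activatable NAME
# Asks the session bus which names it can activate, then searches the reply.
is_activatable() {
    gdbus call --session \
        --dest org.freedesktop.DBus \
        --object-path /org/freedesktop/DBus \
        --method org.freedesktop.DBus.ListActivatableNames |
        name_in_gdbus_list "$1"
}
```

For example, is_activatable org.example.Foo && echo "Foo is installed".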

On the Endless front, the launch of the Coding Education Challenge, and the various talks from my esteemed colleagues about varied activities, were all great to see.

There were lots of clashes for me, so I’m grateful to the AV team for their great work on recording all the talks. (Unfortunately, one of the talks I couldn’t make it to, on GDPR, was not recorded, to avoid distributing what could be construed as legal advice. Alas!) Many thanks to the local team and the GNOME Foundation staff and volunteers who made the event run so smoothly.

Using Vundle from the Vim Flatpak

I (mostly) use Vim, and it’s available on Flathub. However: my plugins are managed with Vundle, which shells out to git to download and update them. But git is not in the org.freedesktop.Platform runtime that the Vim Flatpak uses, so that’s not going to work!

If you’ve read any of my recent posts about Flatpak, you’ll know my favourite hammer. I allowed Vim to use flatpak-spawn by launching it as:

flatpak run --talk-name=org.freedesktop.Flatpak org.vim.Vim

I saved the following file to /tmp/git:

#!/bin/sh
exec flatpak-spawn --host git "$@"

then ran the following Vim commands to make it executable, add it to the path, then fire up Vundle:

:r !chmod +x /tmp/git
:let $PATH = '/tmp:/app/bin:/usr/bin'
:VundleInstall

This tricks Vundle into running git outside the sandbox. It worked!

I’m posting this partly as a note to self for next time I want to do this, and partly to say “can we do better?”. In this specific case, the Vim Flatpak could use org.freedesktop.Sdk as its runtime, like many other editors do. But this only solves the problem for tools like git which are included in the relevant SDK. What if I’m writing Python and want to use pyflakes, which is not in the SDK?
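The /tmp/git script could be generalised into one multi-call wrapper. The C sketch below is hypothetical (command_name, build_host_argv and exec_on_host are illustrative names, not part of Flatpak or Vundle): it takes the command name from the basename of argv[0], so a single binary could be symlinked as /tmp/git, /tmp/pyflakes and so on, and execs flatpak-spawn --host with the remaining arguments. It assumes the app has D-Bus access to org.freedesktop.Flatpak, as granted above, and it only helps for tools that are installed on the host, so it does not settle the “can we do better?” question.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: the basename of argv[0], e.g. "git" for "/tmp/git". */
const char *
command_name (const char *argv0)
{
  const char *slash = strrchr (argv0, '/');
  return slash != NULL ? slash + 1 : argv0;
}

/* Build { "flatpak-spawn", "--host", <name>, argv[1], ..., argv[argc-1], NULL }.
 * The strings are borrowed from argv; only the array itself is allocated. */
char **
build_host_argv (int argc, char *argv[])
{
  /* 3 prefix entries + (argc - 1) arguments + NULL terminator = argc + 3. */
  char **host_argv = calloc ((size_t) argc + 3, sizeof *host_argv);
  if (host_argv == NULL)
    return NULL;

  host_argv[0] = "flatpak-spawn";
  host_argv[1] = "--host";
  host_argv[2] = (char *) command_name (argv[0]);
  for (int i = 1; i < argc; i++)
    host_argv[i + 2] = argv[i];
  host_argv[argc + 2] = NULL;
  return host_argv;
}

/* Replace this process with flatpak-spawn; only returns if exec fails. */
int
exec_on_host (int argc, char *argv[])
{
  char **host_argv = build_host_argv (argc, argv);
  if (host_argv == NULL)
    return 127;
  execvp (host_argv[0], host_argv);
  free (host_argv);
  return 127;
}
```

A main() for this would just be return exec_on_host (argc, argv);. Using execvp() rather than fork() keeps the wrapper transparent: flatpak-spawn replaces it, so whatever exit status flatpak-spawn produces reaches the caller directly.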

Everybody’s Gone To The GUADEC

It’s been ten days since I came back from GUADEC 2018, and I’ve finally caught up enough to find the time to write about it. As ever, it was a pleasure to see familiar faces from around the community, put some new faces to familiar names, and learn some entirely new names and faces! Some talk highlights:

  • In “Patterns of refactoring C to Rust”, Federico Mena Quintero pulled off the difficult trick of giving a very source code-centric talk without losing the audience. (He said afterwards that the style he used is borrowed from a series of talks he referenced in his slides, but the excellent delivery was certainly a large part of why it worked.)
  • Christian Hergert and Corentin Noël’s talk on “What’s happening in Builder?” left me feeling good about the future of cross-architecture and cross-device GNOME app development. Developing OS and platform components in a desktop-containerised world is not a fully-solved problem; between upcoming plans for Builder and Philip Chimento’s Flapjack, I think we’re getting there.
  • I’m well-versed in Flatpak but know very little about Snap, so Robert Ancell’s talk on “Snap Package support in GNOME” was enlightening. It’s heartening that much of the user-facing infrastructure to solve problems common to Snap and Flatpak (such as GNOME Software and portals) is shared, and it was interesting to learn about some of the unique features of Snap which make it attractive to ISVs.

I couldn’t get to Almería until the Friday evening; I’m looking forward to checking out video recordings of some of the talks I missed. (Shout-out to the volunteers editing these videos! Update: the videos are now mostly published; I’ve added links to the three talks above.)

One of the best bits of any conference is the hallway track, and GUADEC did not disappoint. Fellow Endlesser Carlo Caione and I caught up with Javier Martinez Canillas from Red Hat to discuss some of the boot-loader questions shared between Endless OS and Silverblue, like the downstream Boot Loader Specification module for GRUB, and how to upgrade GRUB itself—which lives outside the atomic world of OSTree—in as robust and crash-proof a manner as is feasible.

On the bus to the campus on Sunday, I had an interesting discussion with Robert Ancell about acquiring domain expertise too late in a project to fix the design decisions made earlier on (which has happened to me a fair few times). While working on LightDM, he avoided this trap by building a thorough integration test suite early on; this allowed him to refactor with confidence as he discovered murky corners of display management. As I understand it (sorry if I’ve misremembered from the noisy bus ride!), he wrote a library which essentially shims every syscall. This made it easier to mock and exercise all the complicated interactions the display manager has with many different parts of the OS via many IPC mechanisms. I always regret it when I procrastinate on test coverage; I’ll keep this discussion in mind as extra ammunition to do the right thing.

My travel to and from Almería was kindly sponsored by the GNOME Foundation. Thank you!


When is an exit code not an exit code?

TL;DR: I found an interesting bug in flatpak-spawn which taught me that there is a difference between the exit code you pass to exit(), the exit status reported by waitpid(), and the shell variable $?.

One of the goals of Flatpak is to isolate applications from the host system; they can normally only directly run external programs supplied by the Flatpak platform they are built against, rather than whatever executables happen to be installed on the host. But some developer tools do need to be able to run commands on the host system. One example is GNOME Builder, which allows you to compile software on the host; another is flatpak-builder, which uses this to build flatpaks from within a flatpak. (For my part, I’m occasionally working on making Bustle run pkexec dbus-monitor --system on the host, so that an unprivileged, sandboxed application can read all messages on the system bus, which is a privileged operation. More on this in a future blog post.)

Flatpak’s session helper provides a D-Bus API to do this: a HostCommand method that launches a given command outside the sandbox and returns its process ID, and a HostCommandExited signal which is emitted when the process exits, with its exit status as a uint32. Apps can use this D-Bus API directly, but recent versions of the common runtimes include a wrapper command, flatpak-spawn, which makes it much easier to adapt existing code: just replace cat /etc/passwd with flatpak-spawn --host cat /etc/passwd.

In theory, flatpak-spawn --host propagates the exit status from the command it runs, but I found that in practice, it did not. For example, false is a program which does nothing, unsuccessfully:

$ false; echo exit status: $?
exit status: 1

But when run via flatpak-spawn --host, its exit status is 0:

$ flatpak run --env='PS1=sandbox$ ' \
> --talk-name=org.freedesktop.Flatpak \
> --command=bash org.freedesktop.Sdk//1.6
sandbox$ flatpak-spawn --host false; echo exit status: $?
exit status: 0

If you care whether the command you launched succeeded, this is problematic! The first clue to what’s going on is in the output of flatpak-spawn --verbose:

sandbox$ flatpak-spawn --verbose --host false; echo exit status: $?
F: child_pid: 18066
F: child exited 18066: 256
exit status: 0

Here’s the code, from the HostCommandExited signal handler:

g_variant_get (parameters, "(uu)", &client_pid, &exit_status);
g_debug ("child exited %d: %d", client_pid, exit_status);

if (child_pid == client_pid)
  exit (exit_status);

So exit_status is 256, even though false actually returns 1. If you read man 3 exit, you will learn:

void exit(int status);

The exit() function causes normal process termination and the value of status & 0377 is returned to the parent (see wait(2)).
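A small fork() experiment confirms the masking described in that excerpt. This is a sketch assuming a POSIX system, and exit_code_seen_by_parent() is an illustrative name. The child uses _exit(), which truncates its argument in the same way as exit() but skips atexit handlers and stdio flushing, the usual choice in a forked child.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`, and return the exit code the parent
 * recovers with WEXITSTATUS(). Returns -1 if fork() or waitpid() fails. */
int
exit_code_seen_by_parent (int code)
{
  pid_t pid = fork ();
  if (pid < 0)
    return -1;
  if (pid == 0)
    _exit (code);  /* only code & 0377 is made available to the parent */

  int status;
  if (waitpid (pid, &status, 0) != pid || !WIFEXITED (status))
    return -1;
  return WEXITSTATUS (status);
}
```

Calling it with 256 yields 0 and with 257 yields 1, matching the status & 0377 arithmetic.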

256 == 0x0100 and 0377 == 0x00ff; so exit_status & 0377 == 0. Now we know why flatpak-spawn returns 0, but why is exit_status equal to 256 rather than 1 in the first place?

It comes from a g_child_watch_add_full() callback. The g_child_watch_add_full() docs tell us:

In many programs, you will want to call g_spawn_check_exit_status() in the callback to determine whether or not the child exited successfully.

Following the link, we learn:

On Unix, [the exit status] is guaranteed to be in the same format waitpid() returns.

And reading the waitpid() documentation, we finally learn that the exit status is an opaque integer which must be inspected with a set of macros. On Linux, the layout is, roughly:

  • When a process calls exit(x), the exit status is ((x & 0xff) << 8); the low byte is 0. This explains why the exit_status for false is 256.
  • When a process is killed by signal y, the signal number is stored in the low seven bits of the exit status, with the high bit of the low byte (0x80) set if the process dumped core. So a process which segfaults and dumps core will have exit status 11 | 0x80 == 11 + 128 == 139.
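The layout above can be written down as a pair of encoding functions. This is a model for illustration only, assuming Linux with glibc: POSIX specifies the W* macros but not the bit layout, so real code should always decode with the macros rather than with masks like these. model_exited_status() and model_signaled_status() are hypothetical names.

```c
#include <stdbool.h>
#include <sys/wait.h>

/* Model of the status for a process that called exit(code) on Linux. */
int
model_exited_status (int code)
{
  return (code & 0xff) << 8;  /* exit code in the second byte, low byte 0 */
}

/* Model of the status for a process killed by `signum` on Linux. */
int
model_signaled_status (int signum, bool core_dumped)
{
  return (signum & 0x7f) | (core_dumped ? 0x80 : 0);
}
```

On glibc, the real macros agree with this model: WEXITSTATUS(256) is 1, WTERMSIG(139) is 11, WCOREDUMP(139) is non-zero, and WEXITSTATUS(35584) is 139.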

What’s funny about this is that if the subprocess segfaults and dumps core, flatpak-spawn --host appears to work when tested from the shell:

host$ /home/wjt/segfault; echo exit status: $?
Segmentation fault (core dumped)
exit status: 139
sandbox$ flatpak-spawn --verbose --host /home/wjt/segfault; echo exit status: $?
F: child_pid: 20256
F: child exited 20256: 139
exit status: 139

But there’s a difference between this and a process which actually exits 139:

sandbox$ flatpak-spawn --verbose --host /bin/sh -c 'exit 139'; echo exit status: $?
F: child_pid: 20481
F: child exited 20481: 35584
exit status: 0

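I always thought these two were the same. Actually, mapping the signal that killed a process to $? = 128 + signum is just a shell convention. The segfault case above only appeared to work because the core-dump flag 0x80 is numerically 128: the raw status 11 | 0x80 == 139 passes through exit()’s & 0377 mask unchanged, and happens to match the shell’s 128 + 11 for SIGSEGV.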

To fix flatpak-spawn, we need to inspect the exit status and recover the exit code or signal. For normal termination, we can pass the exit code to exit(). For signals, the options are:

  • Reset all signal() handlers to SIG_DFL, then send the signal to ourselves and hope we die
  • Follow the shell convention and exit(128 + signal number)

I think the former sounds scary and unreliable, so I implemented the latter. Imperfect, but it’ll do.
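The second option amounts to a small mapping from the raw status to an exit code. This is a sketch, not the actual flatpak-spawn patch: exit_code_for_wait_status() is a hypothetical name, and returning 255 for any other state (which should not be reported for a terminated child) is an assumption.

```c
#include <sys/wait.h>

/* Map a raw waitpid()-format status, such as the one carried by
 * HostCommandExited, to a value suitable for passing to exit(). */
int
exit_code_for_wait_status (int status)
{
  if (WIFEXITED (status))
    return WEXITSTATUS (status);       /* normal exit: recover the real code */
  if (WIFSIGNALED (status))
    return 128 + WTERMSIG (status);    /* shell convention for fatal signals */
  return 255;                          /* assumption: any other state is an error */
}
```

In the HostCommandExited handler quoted earlier, exit (exit_status) would become exit (exit_code_for_wait_status ((int) exit_status)). A child killed by SIGSEGV with a core dump (raw status 139) and one that calls exit(139) (raw status 35584) then both map to 139, which is the same ambiguity the shell convention already has.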