Will Thompson – Will Thompson

Removing Online Accounts from GNOME Initial Setup 46

It’s exciting to see the ongoing work to revitalise GNOME Online Accounts. One section of this post jumped out to me:

Another big step to improving reliability and maintainability is updating to the latest dependencies and industry conventions. This meant first supporting OAuth 2.0 in a standard browser so, for example, when you log into a Google account you’re doing it with your preferred, trusted browser.

This poses a problem for GNOME Initial Setup, which currently uses GNOME Online Accounts so that you can configure those accounts, and pick up your name and avatar from them, while setting up your computer. But by design, you can’t open a web browser (or any other application) in the Initial Setup session.

After some discussion with Andy and others, we concluded that we should remove Online Accounts support from Initial Setup. The removal is tracked by this issue and I started a Discourse thread to raise awareness (along with this post). One suggestion is to add a page to GNOME Tour linking to the Online Accounts section of Settings, as a way to keep this feature prominently visible; but another alternative is to assume that people will find it if/when they use applications that use GOA. The latter is the path of least resistance since it needs no new code!

Endless OS’s privacy-preserving metrics system

Endless OS contains an optional anonymous telemetry system, where installations of the OS collect usage data and periodically send it back to us at Endless OS Foundation. Although the system is open-source, has existed in various forms for close to a decade, and individual pieces of it are documented, I don’t think we’ve ever written much about how it all fits together. So here I will try to correct that by describing the current incarnation of this system in its entirety, and why we have it.

Let’s stare directly at the elephant in the room: usage metrics – telemetry, analytics, or whatever other terms one might like to use – is something that free software community members are often opposed to, with some good justification. A lot has changed in the 10 years since the first commit on this project in September 2013. The Facebook–Cambridge Analytica data scandal and other similar events have raised public awareness of the potential for abuse of personal data, and there have been massive regulatory improvements in this space, too. Of course, I have my own opinions about data collection, which you can probably guess from my having spent most of my adult life working on free software! When I joined Endless, I wasn’t comfortable with previous incarnations of our metrics system, but after it was reworked into the form described below, I am confident that it strikes a good balance.

I can’t hope to change every reader’s mind about telemetry, but I do hope to make a more optimistic case for privacy-respecting metrics that are a net positive for free software users and communities alike.

We are only human, so there may be things we have overlooked, whether conceptual or in the implementation. And, we have very limited time to work on this stack at present. If you have feedback or suggestions; if you’re interested in contributing ideas, documentation, code, or analysis; if you want to adopt or adapt these tools in your own distro or projects: please do get in touch!

Goals

Before we go into the details of this system, let’s talk about why an open-source project might want such a system in the first place. After all, the easiest metrics system to build and maintain is no metrics system at all! For Endless OS Foundation, there are two primary reasons to have this system: to demonstrate the impact our work has, and to guide our ongoing work to improve our tools.

Endless OS Foundation is a non-profit, funded by private philanthropic grants. To justify our continued funding, which enables us to keep developing tools like Endless OS and contributing to open-source projects such as Kolibri, GNOME, and Flathub, we need to be able to demonstrate the impact that our work has had in the world. We work with other social-impact organisations who have the same need to justify to their own sponsors why they are collaborating with us and investing their time and money in putting our tools in people’s hands. One way that we do this is through qualitative research, such as end-user interviews; but particularly as a small, geographically-disparate team, we can only do so much of this, so we seek to back this up with quantitative data. Quantity has a quality of its own.

In addition, we are not making software in a vacuum: we are trying to address the digital divide by designing technology for people underserved by other platforms. So at a very basic level, we wish to understand: how do people use our tools? What apps and features of the desktop do people use, or not use? Do people actually use their computer at all, or are they too intimidated by technology to even take the computer out of its box? (This is not a hypothetical example!)

Having such information about if and how people are using our software helps us to make informed decisions about how to improve it, rather than basing our actions on assumption and guesswork. In some scenarios, user testing & interviews are the best way to make such decisions; but when considering usage of a whole desktop over time rather than in a single 1-hour testing session, it’s rarely practical to carry out enough such studies to be confident that you’re not just generalising from a handful of anecdotes.

Commercial organisations have similar needs, which is why basically every website, online store, app and platform today outside the free software space has some kind of analytics capability. In Telemetry Is Not Your Enemy, Glyph Lefkowitz described the FLOSS community’s reflexive rejection of telemetry in any form as “a massive and continuing own-goal”, and imagined how the open-source process could yield telemetry that is “thoughtful, respectful of user privacy, and designed with the principle of least privilege in mind”. I tend to agree. The structure and transparency of FLOSS gives us the opportunity to gather actionable data, transparently, in order to make better decisions and improve our software without betraying our users; and by doing so we could make better use of our limited time and attention.

We also have a very important non-goal: we do not believe people’s individual usage of technology should be tracked. We don’t want or need to know what a particular person does with their computer, or any personal data about them. We are a non-profit organisation with a philanthropic mission to give people access to, and control over, technology. We do not wish to sell data about our users, deliver behaviourally-targeted advertisements to users, and so on.

OK, time to dive in!

Gory details

On Endless OS, applications use a D-Bus API (via a small C library, eos-metrics) to record metrics events locally on the device.

This API is implemented by a system-wide service, named eos-metrics-event-recorder or eos-event-recorder-daemon (no, I don’t know why it has two different names either), which buffers those events in memory, and periodically submits them anonymously to a server, Azafea, which ingests them into a PostgreSQL database (after a short layover in a Redis queue). If the computer is offline – often the case for Endless OS systems! – events are persisted to a size-limited ring buffer on disk, and submitted when the computer is online. (At one point we had a tool for exporting events to external storage on an offline system, and submitting them from a separate online system, but this has bit-rotted.)

In some cases, the events are recorded by individual applications. For example, eos-updater records an event when an OS update fails, and we have patched GNOME Shell in Endless OS to record aggregate app usage duration. There is also another system-wide service, eos-metrics-instrumentation, which records the aggregate duration of user sessions, and information about the system’s hardware.

When stored in the remote database on our server, events carry an event-specific payload (such as the updater error which occurred); a timestamp; the OS version (e.g. “5.0.4”) on which the event occurred; and the channel the system belongs to. The channel is defined by the identifier of the OS image that the system was originally installed from (such as eos-eos5.0-amd64-amd64.230510-144309.fr, which is the downloadable build of 5.0.4 with French content preinstalled), whether this is a live USB, dual-boot install or standalone install, plus an empty-by-default dictionary describing the deployment site (which may be used to distinguish different deployments of the same OS image).

The crucial point here is that the same OS image ID is reported by many systems, so while we can slice the data to a particular cohort of users, we cannot identify the specific computer that any given event came from, or even that two different events came from the same computer. Since we typically have a different OS image for each deployment partner, we are still able to filter and share a summary of only the data that relates to that partner, which might be anywhere between 50 computers and thousands of computers.

There are two types of events. Singular events are buffered and submitted to the server as-is. Aggregate events are, as the name suggests, aggregated on the local system by calendar day and calendar month, and submitted after the current day or month ends. As well as the event payload, they carry an integer counter, which is currently used only for durations (e.g. “Chromium was open for 4 hours on 20^th June 2023”), but in principle could also be used to reflect multiple occurrences of the same event. One reason you might want this is to distinguish “OS update failed once due to filesystem corruption on 10,000 different computers” from “factoid actualy just statistical error. Average updater fails 0 times per year. Updaters Georg, who lives in cave & fails to update due to filesystem corruption over 10,000 times each day, is an outlier adn should not have been counted”.

You can view the complete list of events reported by the OS & accepted by the server in the Azafea documentation. (The “Deprecated Events” section documents, for posterity, events that were once recorded at some point in the history of Endless OS, but are not recorded by current OS versions.)

Since no personal data is collected, the metrics system is enabled by default. Defaults are powerful, so having the defaults this way around makes the data more representative of the user base than an opt-in system would; it also means we have a strong need to respect our users’ privacy. A page in Initial Setup allows the user to disable the metrics system; and until Initial Setup is complete, events are buffered locally but not submitted to the server. This means that events from early in the first boot process are not lost if the default is not changed, but that nothing is submitted until the device owner has been informed & given the chance to opt out. (If they do opt out, the buffered events are erased and no further events are buffered unless they later opt back in.)

There is one more component, eos-phone-home, which on the client side is completely separate, though the information flows into the same PostgreSQL database on the server. This is inspired by a system designed for Ubuntu (though I’m not sure if it is in use) and is conceptually somewhat similar to the DNF ‘countme’ system in Fedora. It sends to Endless OS Foundation:

A one-time activation message the first time a device goes online, with channel, hardware vendor & model (e.g. “Dell Inc.”, “XPS 13 9380” for my laptop), and the current OS version (e.g. “5.0.4”). The server performs a GeoIP lookup against an offline database, stores the country code (e.g. “GB”) and location rounded to the nearest 1° of latitude & longitude, then discards the IP address without logging it.
A ping from each online device (except live USBs), at most once every 24 hours, with channel, hardware vendor & model, current OS version, and the number of previous successful pings from this device. This allows measuring retention over time, without identifying any specific machine. The server determines a country code (and nothing more) via offline GeoIP, then discards the IP address without logging or storing it. The ping also includes a boolean for whether the user has enabled or disabled the system above. Roughly 90% of users have the full metrics system enabled.

For context, 1° of latitude is roughly 111km. 1° of longitude varies between 111km at the equator and 0km at the poles; at the very north of the UK it’s about 55km. This is a 1° by 1° area over Chicago, a city of 2.7 million people:

A rectangle superimposed on a map of Chicago. It covers the bulk of the Chicago metropolitan area, as well as some nearby less-populated areas and a body of water.

And here is São Paulo, where somewhere between 12 and 33 million people live, depending on how you count:

A rectangle superimposed on a map of São Paulo. It completely covers the city and its surrounding metropolitan area, extending almost as far as Campinas, a nearby city.

Here in the UK we sometimes describe large areas in terms of the area of Wales, so here is a 1° by 1° area that is roughly one-third of the size of Wales:

A rectangle superimposed on a map of Wales. It covers a bit less than half of the country.

Every component described above is open-source and auditable. For completeness: it’s not an integral part of the stack, but we use a self-hosted instance of Metabase, an open-core business intelligence platform, to visualise the data stored in the PostgreSQL database. The Endless OS Foundation team has access to Metabase, and thence to the data stored by Azafea; nobody else has access to the raw data. Metabase allows us to make read-only charts and dashboards visible to the public. Here is a visualisation of the total RAM in Endless OS users’ machines in the past 24 hours at the time of writing, rounded to the nearest half-gigabyte (or rather, for my fellow pedants, gibibyte), taken from this event. You can see the live chart here. And if you mouse over one of the segments in the live chart, you can infer from the totals that there are somewhere north of 17,000 computers running Endless OS 4 or newer. (This is an underestimate of our total user count because it only considers the computers in use in a particular 24-hour window, which have not opted out of metrics, and which are not part of the ~20% of systems running an older OS version.)

Chart: “How much RAM do our users have?”

2,048 MB: 1.48%
3.584 MB: 64.23%
5,632 MB: 6.97%
7.680 MB: 23.32%
Other: 3.99%

And what have we actually learned?

As you can see above, at the time of writing, a large majority of Endless OS users have around 4GB of RAM. Most of the rest have rather more, but around 1.5% of our users are clinging on with just 2GB. This justifies our periodic work on better handling of low-memory conditions, and reminds us that RAM is not so plentiful as our own devices might have us believe. Perhaps if we could reduce the OS’s RAM usage, we could retain more users on such lower-end hardware.

We have a tool called eos-gates that intercepts users trying to run Windows software or install deb or RPM packages, and (where possible) guides them to the same app, or an equivalent, on Flathub. At the time of writing, by a very wide margin, the most popular Windows app our users try to install is Roblox. Sadly we can’t really recommend a straightforward way to install that particular game, but this data has helped us to find other apps that we should add to the mapping.

After the release of Endless OS 4, a few users came to our forum to report abnormally long boot times. We were able to confirm from our startup-timing metric that this was not an isolated problem, and address the issue in an OS update by moving a repository migration job that was extremely slow on spinning-disk systems with many apps installed to later in the boot process. We saw the improvement in the startup-timing data for subsequent OS releases.

We have also used this data to support decisions outside Endless OS Foundation. For a recent example, the team that develops the freedesktop SDK – used by every one of the 2,000+ apps on Flathub – were considering whether to require a CPU that supports the SSE 4.2 instructions. Because we collect anonymous hardware metrics, we were able to provide a snapshot of the CPUs in use on a given day by Endless OS users, and the freedesktop-sdk team was able to determine what proportion of those systems would be unable to run software from Flathub if this change were made. The other dataset under consideration was the Steam hardware survey, but Endless OS systems are a better representation of “regular people’s computers”, rather than gamers who you would expect to skew towards higher-end systems. (This case demonstrated that recording the CPU model name, such as Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz, is not really adequate. CPU flags would be more useful. In this case, Jordan laboriously mapped the model names back to the relevant data.)

Future work

There is of course more that we could do on this system. In terms of more instrumentation: a common hypothesis is that most desktop Linux systems are single-user, in the sense that they only have one Unix user account (which may or may not be shared between multiple humans). This is supported by anecdote but it would be useful to be able to quantify this. There are many more questions like this: are multi-user systems multi-lingual? Do many users have multiple keyboard layouts configured? How common are different monitor configurations?

It would be interesting to look at ways to further anonymise the reported data. One possible approach is randomised response, where before submitting a boolean data point, you flip a coin. If it comes up heads, you report true. If it comes up tails, you report the actual boolean flag. Then, if 20% of reports are false, you can assume that the true ratio is 40%, without having to know which specific data points are valid. There are related techniques for numerical data where you bias the data with some known probability.

You could use this system as part of an opt-in user studies programme. For example, some tool could explicitly prompt a sample of users for their consent to record identifiable data over (say) a week, to perform some targeted behavioural study, perhaps in combination with a Shell extension implementing some experimental change. The site ID mechanism could be used to tag an individual system for the duration of the study, or which half of an A/B test the system fell into; and further events could be recorded for that week. After the week, the user would be prompted for some written feedback, then the ID would be removed and the additional events disabled.

While we cannot share the raw underlying data publicly – lest someone smarter than me figure out a way to de-anonymise it – I would love to see us publish dashboards of aggregated data, similar to the Steam hardware survey, Mozilla’s Hardware Across The Web, Flathub’s statistics, and so on. This hasn’t happened yet only for want of having time to do it. Please do get in touch if there is specific data of public interest.

And finally: Endless OS is just a moderate slice of the open-source desktop pie. We have somewhere in the low-to-mid tens-of-thousands of active online users – this is not insubstantial, but my guess is that it is one or two orders of magnitude smaller than larger platforms like Fedora, Ubuntu, and Steam OS. I believe the tools Endless has built are pretty good – shout-out to Kurt von Laven, Philip Chimento, Devin Ekins, Cosimo Cecchi, Mathieu Bridon, Guillaume Ayoub, Antoine Michon, and many other contributors over nearly a decade – and it would be great to see them adopted elsewhere in the desktop Linux space. Even more than that, I would love to see privacy-respecting telemetry accepted more widely in the desktop Linux space, so that we as a community can make better-informed, data-driven decisions.

Endless contributions to GNOME 44

The GNOME 44 release is rushing towards us like an irate pangolin! Here is a quick roundup of some of the Endless OS Foundation team’s contributions over its six-month development cycle.

Software

As in the previous cycle, our team has been a key contributor to GNOME Software 44. Based on a very unscientific analysis of the Git commit log, about 30% of non-merge commits and 75% of merge commits to GNOME Software during this cycle came from our team. Co-maintainer Philip Withnall has continued his work to refactor Software’s internal threading model to improve its reliability. He’s also contributed a number of fixes in both GNOME Software and Flatpak to fix issues related to app updates, such as not leaking large temporary directories when an update fails. Dan Nicholson fixed an issue in Flatpak which would cause Software to remove an app rather than updating it when its ID changes.

Georges Stavracas added some sysprof instrumentation which allowed him to quickly pinpoint the main cause of slow loading of category pages. To our collective surprise, the culprit was… loading remote icons for apps! Georges fixed this issue by downloading icons asynchronously. A side-by-side comparison is really quite striking:

As we came closer to the release of Endless OS 5, we realised we needed some improvements to the handling of OS updates in GNOME Software, such as showing a Learn More link for major upgrades, distinguishing between major upgrades and minor updates, and using the distro’s icon when showing a minor update. Like Endless OS, GNOME OS uses eos-updater, although these improvements will not kick in fully there right now, since it currently does not set any OS version metadata on its updates, or a logo in os-release.

The GNOME Software updates page, showing a minor update for Endless OS with the Endless logo beside it.

Of course, we’ve also contributed to the ongoing maintenance of Software, and other functional improvements such as displaying release urgency levels for firmware updates.

Looking ahead, Joana Filizola has spearheaded a series of user studies on topics like how navigation within Software works, discoverability of search, and the name ‘Software’ itself: we hope these will bear fruit in future GNOME cycles.

Shell

As well as ongoing maintenance of Shell and Mutter, Georges Stavracas contributed improvements to the quick settings pills, adding subtitles to improve the information density. This went hand-in-hand with work to improve GNOME’s handling of Flatpak apps that are running in the background (i.e. without a visible window). Previously this was rather crude: if a Flatpak app ran without a window for some period of time, you would get a decontextualized dialog box asking if you want to allow the app to keep running. Choosing the “wrong” option would kill the app and forbid it from running in the background in future – breaking core functionality for certain apps. In GNOME 44, background apps are instead listed within the quick settings popover, and those apps that do use the background portal API to ask nicely to run in the background are allowed to do so without user interaction.

We also supported the design team’s experiments around how window focus is communicated.

GLib

Philip Withnall has, as in many previous cycles, contributed tens of hours of ongoing maintenance to this library that underpins the entire desktop. This has included a number of GVariant security fixes (like this one), GApplication security fixes, GDBus refcounting fixes, and more. Philip also added g_free_sized() and g_aligned_free_sized(), mirroring similar functions in C23, so that applications can start using these without needing to check for (or wait for) C23 support in the toolchain.

Initial Setup

I spent somewhat fewer hours—but not zero!—on general maintenance of Initial Setup. Georges fixed a regression that meant that privacy policies could not be viewed from within Initial Setup; I fixed the display of a shortlist of keyboard layouts, and of non-ASCII characters in location names after switching locale; and Cassidy James Blaede refreshed the design of the password page to use Adwaita widgets & styling.

Password page of GNOME Initial Setup. The password fields have inline icons to edit the text and reveal the password.

…and more

Every quarter, the engineering teams at Endless OS Foundation have an “intermission week”, where the team sets aside our normal priorities to focus on addressing tech debt, wishlist items, innovative or experimental ideas, and learning. Some of the items above came out of the last couple of intermission weeks! On top of that, Philip has spent some time experimenting with APIs to allow apps’ state to be saved and restored; and João Paulo Rechi Vita explored making the GNOME Online Accounts daemon quit when idle, saving a small but non-zero amount of RAM. Neither of these are quite in a production-ready state, but as they say: there’s always another release!

Meanwhile, we’ve been working on extending the set of web apps offered in GNOME Software on Endless OS, using more expansive criteria than the list shipped by GNOME Software by default, and a different delivery mechanism for the catalogue. More on this in a future post!

Using the ‘glab’ CLI tool with GNOME GitLab

I like to use the glab command-line tool, which used to be a third-party project but which has apparently now been adopted by GitLab themselves. In particular, the glab mr family of commands to interact with merge requests are invaluable for checking out branches from contributors’ forks.

Since October 2022, GNOME’s GitLab instance now has a somewhat unusual configuration where the SSH hostname (ssh.gitlab.gnome.org) is different to the web/API hostname (gitlab.gnome.org). To make old checkouts continue to work, I have the following configuration in my ~/.ssh/config:

Host gitlab.gnome.org
   Hostname ssh.gitlab.gnome.org

But whether you set the SSH hostname in this way, or use the new hostname in Git remote URLs, glab will complain:

none of the git remotes configured for this repository points to a known GitLab host. Please use `glab auth login` to authenticate and configure a new host for glab

To get this to work, set the GitLab hostname to ssh.gitlab.gnome.org and the API hostname to gitlab.gnome.org. In ~/.config/glab-cli/config.yml, this looks like this:

hosts:
    ssh.gitlab.gnome.org:
        token: redacted
        api_host: gitlab.gnome.org
        git_protocol: ssh
        api_protocol: https

With this configuration, glab auth status shows incorrect API URLs, but the tool actually works:

$ glab auth status
ssh.gitlab.gnome.org
  ✓ Logged in to ssh.gitlab.gnome.org as wjt (/home/wjt/.config/glab-cli/config.yml)
  ✓ Git operations for ssh.gitlab.gnome.org configured to use ssh protocol.
  ✓ API calls for ssh.gitlab.gnome.org are made over https protocol
  ✓ REST API Endpoint: https://ssh.gitlab.gnome.org/api/v4/
  ✓ GraphQL Endpoint: https://ssh.gitlab.gnome.org/api/graphql/

I’m posting this because I spent a while trying to find a way to override the SSH hostname, before finding this issue which explains that you do it the other way around, by overriding the API hostname.

Recovering a truncated Zoom meeting recording on Endless OS

One of my colleagues was recording a Zoom meeting. The session ended in such a way that the recording was left unfinished, named video2013876469.mp4.tmp. Trying to play it in Videos just gave the error “This file is invalid and cannot be played”. ffplay gave the more helpful error “moov atom not found”. It wasn’t too hard to recover the file, but it did involve a bit of podman knowledge, so I’m writing it up here for posterity.

A bit of DuckDuckGo-ing tells us that the moov atom is, roughly, the index of the video. Evidently Zoom writes it at the end of the file when the recording is stopped cleanly; so if that doesn’t happen, this metadata is missing and the file cannot be played. A little more DuckDuckGo-ing tells us that untrunc can be used to recover such a truncated recording, given an untruncated recording from the same source. Basically it uses the metadata from the complete recording to infer what the metadata should be for the truncated recording, and fill it in.

I built untrunc with podman as follows:

git clone https://github.com/ponchio/untrunc.git
cd untrunc
podman build -t untrunc .

I made a fresh directory in my home directory and placed the truncated video and a successful Zoom recording into it:

mkdir ~/tmp
# Make the directory world-writable (see later note)
chmod 777 ~/tmp
cp ~/Downloads/video2013876469.mp4.tmp ~/tmp
cp ~/Downloads/zoom_0.mp4 ~/tmp

Now I ran the untrunc container, mounting $HOME/tmp from my host system to /files in the container, and passing paths relative to that:

podman run --rm -it -v $HOME/tmp:files \
    localhost/untrunc:latest \
    /files/zoom_0.mp4 \
    /files/video2013876469.mp4.tmp

and off it went:

Reading: /files/zoom_0.mp4
Repair: /files/video2013876469.mp4.tmp
Mdat not found!

Trying a different approach to locate mdat start
Repair: /files/video2013876469.mp4.tmp
Backtracked enough!

Trying a different approach to locate mdat start
Repair: /files/video2013876469.mp4.tmp
Mdat not found!

[…]

Found 71359 packets.
Found 39648 chunks for mp4a
Found 31711 chunks for avc1
Saving to: /files/video2013876469.mp4_fixed.mp4

The fixed file is now at ~/tmp/video2013876469.mp4_fixed.mp4.

Tinkering with the directory permissions was needed because untrunc’s Dockerfile creates a user within the container and runs the entrypoint as that user, rather than running as root. I believe in the Docker world this is considered good practice because root in a Docker container is root on the host, too. However I’m using podman as an unprivileged user; so UID 0 in the container is my user on the host; and the untrunc user in the container ends up mapped to UID 166535 on the host, which doesn’t have write access to the directory I mounted into the container. Honestly I don’t know how you’re supposed to manage this, so I just made the directory world-writable and moved on with my life.

GNOME 43: Endless’s Part In Its Creation

GNOME 43 is out, and as always there is lots of good stuff in there. (Me circa 2014 would be delighted to see the continuous improvements in GNOME’s built-in RDP support.) During this cycle, the OS team at Endless OS Foundation spent a big chunk of our time on other initiatives, such as bringing Endless Key to more platforms and supporting the Endless Laptop programme. Even so, we made some notable contributions to this GNOME release. Here are a few of them!

App grid pagination improvements

The Endless OS desktop looks a bit different to GNOME, most notably in that the app grid lives on the wallpaper, not behind it. But once you’re at the app grid, it behaves the same in both desktops. Endless OS computers typically have hundreds of apps installed, so it’s normal to have 2, 3, or more pages of apps.

We’ve learned from Endless OS users and partners that the row of dots at the bottom of the grid did not provide enough of a clue that there are more pages than the first. And when given a hint that more pages are available, indicated by those dots, users rarely discovered that they can switch with the scroll wheel or a swipe: they would instead click on those tiny dots. Tricky even for an accomplished mouse user!

GNOME 40 introduced an effect where moving the mouse to the edges of the screen would cause successive pages of apps to “peek” in. As we’ve carried out user testing on our GNOME 41-based development branch (more on this another time) we found that this was not enough: if you don’t know the other pages are there, there’s no reason to deliberately move your mouse pointer to the empty space at the edges of the screen.

So, we proposed for GNOME something similar to what we designed and shipped in Endless OS 4: always-visible pagination arrows. What we ended up implementing & shipping in GNOME 43 is a bit different to what we’d proposed, after several rounds of iteration with the GNOME design team, and I think it’s even better for it. Implementing this was also an opportunity to fix up many existing bugs in the grid, particularly when dragging and dropping between pages.

GNOME Software

43% of the code-changing commits between GNOME Software 42.0 and the tip of gnome-software main as of 29th September came from Endless – not bad, but still no match for Red Hat’s Milan Crha, who single-handedly wrote 46% of the commits in that range! (Counting commits is a crude measure, and excluding translation updates and merge commits overlooks significant, essential work; even with those caveats, I still think the number is striking.)

Many of our contributions in this cycle were part of the ongoing threading rework that Philip Withnall spoke about at GUADEC 2022, with the goals of improving performance, reducing memory usage, and eliminating hangs due to thread-pool exhaustion. Along the way, this included some improvements to the way that featured and Editor’s Choice apps are retrieved.

Several patches bearing Joaquim Rocha’s name and an Endless email address landed in this cycle, improving Software’s handling of apps queued for installation, despite Joaquim having moved on from Endless 3 years ago. These originally come from Endless’s fork of GNOME Software and date back to 2018, and made their way upstream as part of our quest to converge our fork with upstream. In related news, we recently rebased the Endless OS branch of Software onto the gnome-43; we are down from 200+ patches a few years ago to 19. Nearly there!

At the start of this year, Phaedrus Leeds was contracted by the GNOME Foundation (funded by Endless Network) to reintroduce the ability to install and manage web apps with Software, even when GNOME Web is installed with Flatpak. This work was not quite ready for the GNOME 42 feature freeze, and landed in GNOME 43. I personally did a trivial amount of work to enable this feature in GNOME OS and add a few sites to the curated list, but as I write this post I have realised that these additions were not actually shipped in the GNOME Software 43.0 tarball. I did a bit of research into how we can expand this curated list without creating a tonne of extra work for our community of volunteers, but haven’t had a chance to write this up just yet.

Looking to the future, Georges Stavracas has recently spent some time improving Software’s sysprof integration to help understand where Software is spending its time, and hence improve its perceived responsiveness. One of the first discoveries is that the majority of the delay before a category page becomes responsive is spent downloading app icons; making this asynchronous will make Software feel much snappier. Alas, the current approach for fixing this will change Software’s plugin API, so will have to wait for GNOME 44. I’m sure that with decent profiler integration and enough eyes on the profiler, we’ll be able to find more cases like this.

GTK 4-flavoured Initial Setup

Serial GTK 4 porter Georges Stavracas ported Initial Setup to GTK 4. Since Initial Setup uses libmalcontent-ui to implement its parental controls pages, he also ported the Parental Controls app to GTK 4.

This port was a direct update of the existing UI to a new toolkit version, only adopting new widgets like GtkPasswordEntry and AdwPreferencesPage where it was trivial to do so. Designs exist for a refreshed Initial Setup interface – anyone interested in picking this up?

Initial Setup has a remarkably large dependency graph, which made this update trickier than it might otherwise have been. I made a start back in January for GNOME 42 and dealt with some of the easier library changes, but more traps awaited:

Initial Setup depends on goa-backend-1.0, the bit of GNOME Online Accounts that actually has a user interface (which uses WebKit). This is, for now, GTK 3 only. The solution Georges used here was the same as he used in GNOME Settings: move it into a separate process.
Next up, Initial Setup itself uses WebKit (to show the Mozilla Location Service terms of service and abrt privacy policy). The GTK 4 port of WebKit was not widely available in distros at the time of the port. As a result, Initial Setup’s GitLab CI switched from Fedora to Arch. It also means that Initial Setup has a transitive dependency on libsoup 3…
Malcontent uses libflatpak; until recently, libflatpak had a hard dependency on libsoup 2.4 with no libsoup 3 port in evidence. So Initial Setup would transitively link to both libsoup 2.4 and 3, and abort on startup, if parental controls were enabled. Happily, a libcurl backend appeared in libflatpak 1.14, and libostree already had a libcurl backend, so if your distro configures both of those to use libcurl then parental controls can be safely enabled in Initial Setup 43.

It’s interesting to me that libsoup’s API changes have caused several GNOME-adjacent projects not to migrate to the new API, but to de-facto move away from libsoup. Hindsight is 20/20, etc. There is a draft pull request to build ostree against libsoup 3, so perhaps they will return to the soup tureen in due course.

Behind the scenes / friends of GNOME

Not all heroes wear capes, and not all contributions are as visible as others. Our team continues to co-maintain countless GNOME and GNOME-adjacent modules, and fix tricky problems at their source, such as this file-descriptor leak Georges caught in libostree.

We’ve been involved in GNOME design discussions, with Cassidy James Blaede (who joined Endless earlier this year) joining the Design team. We helped reach consensus for the new Quick Settings design, and are continuing to be involved in future design initiatives.

Someone recently asked Cassidy on Twitter whether it is true that GNOME OS is “basically just a modified version of Endless OS”, as they had heard. It’s not! But you could probably consider them second cousins once removed, and they have enough in common that improvements flow both ways. GNOME OS uses eos-updater, our libostree-based daemon that downloads and installs OS updates (and does some other stuff that GNOME OS doesn’t use). A while back, Dan Nicholson taught eos-updater how to not lose changes in /etc in the time between an update being installed, and it being booted. (Which can be pretty bad! Entire users can be lost this way!) But we found that this libostree feature interacted poorly with the way /boot is automounted on systems that use systemd-boot, so the change was disabled on such systems. More recently, Dan fixed libostree to work correctly in this case, so eos-updater can now correctly preserve changes in /etc. GNOME OS uses systemd-boot, so in due course the fixed libostree and eos-updater will appear there and this problem you probably didn’t know you have will be fixed.

And since my last post, Jian-Hong Pan updated TurtleBlocks on Flathub to the GNOME 42 runtime, dealing with another of the long tail of Flathub apps on end-of-lifed runtime versions. Sadly it fails to build on the GNOME 43 runtime due to an apparent setuptools regression, but 42 has another 6 months in it yet.

I could go on, and I’m sure I’ve there are things my fickle memory has overlooked, but for now: onwards!

Fitting Endless OS images on small disks

Last week I read Jorge Castro’s article On “Wasting disk space” with interest, and not only because it cites one of my own articles 😉. Jorge encouraged me to write up the conversation we had off the back of it, so here we go!

People like to fixate on the disk space used by installing a calculator app as a Flatpak when you don’t have any other Flatpak apps installed. For example, on my system GNOME Calculator takes up 9.3 MB for itself, plus 803.1 MB for the GNOME 42 runtime it depends on. Regular readers will not be surprised when I say that that 803.1 MB figure looks rather different when you realise that Calculator is just one of 70 apps on my system that use that runtime; 11.5 MB of runtime per app feels a lot more reasonable.

But I do have one app installed which depends on the GNOME 3.34 runtime, which has been unsupported since August 2020, and the GNOME 3.34 runtime only shares 102 MB of its files with the GNOME 42 runtime, leaving 769 MB installed solely for this one 11 MB app. Not such a big deal on my laptop with a half-terabyte drive, but this gets to a point Jorge makes near the end of his article:

Yes, they take up more room, but it’s not by much, and unless you’re on an extremely size-restrained system (like say a 64GB Chromebook you are repurposing) then for most systems it’s a wash.

Part of the insight behind Endless OS is that it can be more practical and cost-effective to fill a large hard disk with apps and educational resources than to get access to high-bandwidth connectivity in remote or disadvantaged communities, and the devices we and our deployment partners use generally have plentiful storage. But inexpensive, lower-end devices with 64 GB storage are still quite common, and having runtimes installed for one or two apps consumes valuable space that could be used for more offline content. So this can be a real problem for us, if a small handful of apps are bloating an image to the point where it doesn’t fit, or leaves little space for documents & updates.

So, our OS image builder has a mode to tabulate the apps that will be preinstalled in a given image configuration, grouped by the runtime they use, with their approximate sizes. I added this mode when I was trying to make an Endless OS ISO fit on a 4.7 GB DVD a few years back, packing in as much content as possible. It’s a bit rough and ready, and tends to overestimate the disk space needed (because it does not take into account deduplication identical files between different apps and runtimes, or installing only a subset of translations for an app or runtime) but it is still useful to get a sense of where the space is going.

I ran this for the downloadable English configuration of Endless OS earlier today. I’ll spare you the full output here but if you are curious it’s in this gist. It shows that, like on my system, TurtleBlocks is the only reason the GNOME 3.34 runtime is preinstalled, and so removing that app or updating it to a more modern runtime would probably save somewhere between 1 GB and 2 GB in the image! Normally after seeing this I would go and see if I can take a few minutes to send an update to the app, but in this case someone has beaten me to it. (If you are a Python expert interested in block-based programming tools for kids, why not lend a hand getting this over the line?) It also shows that a couple of our unmaintained and sadly closed-source first-party apps, Resumé and My Budget, are stuck on an even more prehistoric runtime.

Running it on a deployment partner’s custom image that they reported is too big for the 64 GB target hardware showed that the Othman Quran browser is also using an older runtime; once more, someone else has already noticed in the last couple of days, and if you are an expert on GTK’s font selection your input would be welcome on that pull request.

This tool is how it came to pass that I updated the runtimes of gbrainy and Genius, two apps I have never used, last month, and Klavaro back in May, among others.

There tends to be a flurry of community activity every 6-12 months to update all the apps that depend on newly end-of-lifed runtimes, and many of these turn out not to be rocket science, though it is not (yet!) something that flatpak-external-data-checker can do automatically. Then we are left with a long tail of more lightly-maintained apps where the update is more cumbersome to do, as in the case of gbrainy, where it took a bit of staring at the unfamiliar error messages produced by the Mono C# compiler to figure out where the problem might lie. If you are good at debugging build failures, this kind of thing is a great way to contribute to Flathub; and if someone can remind me of the URL of the live-updating TODO list of apps on obsolete runtimes I once saw, I’ll add the link here.

While I’ve focused on one of the problems that apps depending on obsolete runtimes can cause, it’s not all bad news. If you really love, or need, an app that is abandoned or has not been updated in a while, the Flatpak model you can still install & use that app and its old runtime without your distribution having to keep around some obsolete version of the libraries, or you having to stay on an old version of the distro. As and when that app does get an update, the unused and end-of-lifed runtime on your system will be uninstalled automatically by modern versions of Flatpak.

Creating Windows installation media on Linux

Every so often I need to install Windows, most recently for my GNOME on WSL experiments, and to do this I need to write the Windows installer ISO to a USB stick. Unlike most Linux distro ISOs, these are true, pure ISO 9660 images—not hybrid images that can also be treated as a DOS/MBR disk image—so they can’t just be written directly to the disk. Microsoft’s own tool is only available for Windows, of course.

I’m sure there are other ways but this is what I do. Edit: check the comments for an approach which involves 2 partitions and a little more careful copying, but no special tools. I’m writing it down so I can easily find the instructions next time!

The basic process is quite simple:

Download an ISO 9660 disk image from Microsoft
Partition the USB drive with a single basic data partition, formatted as FAT32
Mount the ISO image – on GNOME, you should just be able to double-click it to mount it with Disk Image Mounter
Copy all the files from the mounted ISO image to the USB drive

But there is a big catch with that last step: at least one of the .wim files in the ISO is too large for a FAT32 partition.

The trick is to first copy all the files to a writeable directory on internal storage, then use a tool called wimlib-imagex split from wimlib to split the large .wim file into a number of smaller .swm files before copying them to the FAT32 partition. I think I compiled it from source, in a toolbox container, but you could also use this OCI container image whose README helpfully provides these instructions:

find . -size +4294967000c -iname '*.wim' -print | while read -r wimpath; do
  wimbase="$(basename "$wimpath" '.wim')"
  wimdir="$(dirname "$wimpath")"
  echo "splitting ${wimpath}"
  docker run \
    --rm \
    --interactive \
    --tty \
    --volume "$(pwd):/work" \
    "backplane/wimlib-imagex" \
      split "$wimpath" "${wimdir}/${wimbase}.swm" 4000
done

Now you can copy all those files, minus the too-large .wim, onto the FAT32 drive, and then boot from it.

This all assumes that you only care about a modern system with EFI firmware. I have no idea about creating a BIOS-bootable Windows installer on Linux, and fortunately I have never needed to do this: to test stuff on a BIOS Windows installation, I have used the time-limited virtual machines that Microsoft publishes for testing stuff in old versions of Internet Explorer.

I was inspired to resurrect this old draft post by a tweet by Ross Burton.

How many Flathub apps reuse other package formats?

Today I read Comparison of Fedora Flatpaks and Flathub remotes by Hari Rana, who is an active and valued member of the Flatpak community. The article is a well-researched and well-written overview of how these two Flatpak ecosystems differ, and contains the following remark about one major difference (emphasis mine):

Flathub is open with what source a Flatpak application (re)uses, whereas Fedora Flatpaks strictly reuses the RPM format.

As such, Flathub has tons of applications that reuse other package formats.

When this article was discussed in the Flatpak Matrix channel, several people wondered whether “tons” is a fair assessment. Let’s find out!

The specific examples given in the article are of apps which reuse a .deb (to which I will add .rpm), AppImage, Snap package, or binary .tar.gz archive. It’s not so easy to distinguish a binary tarball from a source tarball, so as a substitute I will look for apps which use the extra-data to download external sources at install time rather than at build time.

I have cloned every repo from the Flathub GitHub organisation with this script I had lying around. There are 2,220 such repositories. This is a bigger number than the 1,518 apps cited in the blog post, because it includes many thing which are not apps, such as 258 GTK themes and 60 digital audio workstation plugins. I also believe that the 1,518 number does not include end-of-lifed apps, whereas my methodology does. This post will also ignore the existence of OBS Studio and Firefox, where those projects build the Flatpak from source on their own infrastructure and push the result into Flathub.

Now I’m just going to grep them all for the offending strings:

$ (for i in */
do
    if git -C $i grep --quiet -E '(\.(deb|rpm|AppImage|snap)\&gt;)|(extra-data)'
    then
        echo $i
    fi
done) | wc -l
237

(Splitting apart the search terms, we have 141 repos matching .deb, 10 for .rpm, 23 for .AppImage, 6 for .snap, and 110 for extra-data. These numbers don’t sum to 237 because the same repo can use multiple formats, and these binary files are often used by extra-data apps.)

So by my back-of-an-envelope calculation, 237 out of 2220 repos on Flathub repackage other binary formats. This is a little under 11%. Of those 237, 51 are GTK themes, specifically variations of the Mint, Pop and Yaru themes. If we assume that all the other 186 are apps, and that none of them are EOLed, then 186 divided by 1,518 gives us a little more than 12% of apps on Flathub that are repackaged from other binary formats. (I believe this is a slight overestimate but I have run out of time this morning.)

Is that a big number? It’s roughly what I expected. Is it “ton[ne]s”? Compared to Fedora’s Flatpak repo, where everything is built from source, it certainly is: indeed, it’s more than the total number of apps in the Fedora Flatpak repo!

If it is valuable for Flathub to provide proprietary apps like Slack whose publishers do not currently wish to support Flatpak (which I believe it is) then it’s unavoidable that some apps repackage other binary formats. OK, time for one last bit of data: what if we exclude extra-data apps?

$ (for i in */
do
    if ! git -C $i grep --quiet extra-data && \
       git -C $i grep --quiet -E '\.(deb|rpm|AppImage|snap)\>'
    then
        echo $i
    fi
done )| wc -l
127

So (ignoring non-extra-data apps which use binary tarballs, if any such apps exist) that’s something like 76 apps and 51 GTK themes which probably could be built from source by Flathub, but aren’t. It may be hard to build some of these apps from source (perhaps the upstream build system requires network access) but the rewards would include support for aarch64 and any other architectures Flathub may add, and arguably greater transparency in how the app is built.

If you want to do your own research in this vein, you may be interested in gasinvein‘s Flatpak remote metadata fetcher, which would let you generate and analyse a 200 MiB JSON file rather than by cloning and grep-ing 4.6 GiB of Git repositories. His analysis using this data yields 174 apps, quite close to my 186 estimate above.

./flatpak-remote-metadata.py -u https://dl.flathub.org/repo flathub | \
    jq -r '.[] | select(
        .manifest | objects | .modules[] | recurse(.modules | arrays | .[]) |
        .sources | arrays | .[] | .url | strings | test(".*.(deb|rpm|snap|AppImage)$")
    ) | .metadata.Application.name // .metadata.Runtime.name' | \
    sort -u | wc -l

Release (semi-)automation

The time I have available to maintain GNOME Initial Setup is very limited, as anyone who has looked at the commit history will have noticed. I’d love more eyes & hands on this important but easy-to-overlook component, particularly to guide it kindly but firmly into the modern age of GTK 4 and the refreshed HIG.

I found that making a batch of 1–3 releases across different GNOME branches every few months was surprisingly time-consuming and error-prone, even with the pretty comprehensive release process checklist on the GNOME Wiki, so I’ve been periodically trying to automate bits of it away.

Philip Withnall’s gitlab-changelog script makes writing the NEWS file a lot quicker. I taught it to output the human-readable names of each updated translation (a nice additional contribution would be to also include the name of the human who updated the translation) and made it a little smarter about guessing the Git commit range to scan.

Beyond that, I added a Meson run target, maintainer-upload-release pointing at a script which performs some rudimentary coherence checks on the version number, tags the release (using git-evtag if available), atomically pushes the branch and that tag to GNOME GitLab, then copies the source tarball to master.gnome.org. (Apparently it has been almost 12 years since I did something similar in telepathy-gabble, building on the make maintainer-upload-release target that Simon McVittie added in 2008, which is where I borrowed the name.) Maybe other module maintainers may find this script useful too – it’s quite generic.

Putting these together, the release flow looks like this:

git switch gnome-42
git pull
../pwithnall/gitlab-changelog/gitlab-changelog.py GNOME/gnome-initial-setup
# Manually edit NEWS to incorporate the changelog, adjusted as needed
# Manually check the version in meson.build
git commit -am 'NEWS for 42.Y'
ninja -C _build dist maintainer-upload-release

Another release-related quality-of-life improvement is to make GitLab CI not only build and test the project (in the vain hope that there might actually be tests!) but also check that the install and gnome-initial-setup-pot targets both work. (At one point or another both have failed at or around release time; now they never will again, famous last words.)

I know none of this is rocket science, but I find it all makes the process quicker and less cumbersome, and it’s stopped me from repeating errors like uploading the wrong version on a few tired evenings. Obviously this could all be taken further: perhaps a manually-invoked CI pipeline that does all this stuff, more checks, etc. But while I’m on this train of thought:

Why do we release GNOME modules one-by-one at all?

The workflow we use to release Endless OS is a bit different to GNOME. Once we merge a change to some module’s Git repository, such as eos-updater or our shrinking branch of GNOME Software, that change embarks on a scenic automated journey that takes it to the next nightly build of the entire OS, both as an OSTree update and as fresh installation media. I use these nightly builds for my daily work, safe in the knowledge that I can roll back to the previous build if necessary.

We don’t make releases of individual modules: instead, when it comes time to release the OS, we trigger a pipeline that (among many other things) pushes the already-built OS update to the production repo, and creates Release_x.y.z tags on each Git repo.

This was quite an adjustment for me at first, compared to lovingly hand-crafting NEWS files and coming up with funny/esoteric release names, but now that I’m used to it it’s hard to go back. Why can’t GNOME do the same?

At this point in the post, we are straying into territory that I have limited first-hand knowledge of. Caveat lector! But here goes:

Thanks to GNOME OS, GNOME already has nightly builds of the entire desktop and apps: so rather than having to build everything yourself, or wait for a development release of GNOME, you can just update & reboot your GNOME OS VM and test the change right there. gnome-build-meta knows how to build every GNOME module; and if you can build the code, it seems a conceptually small step to run ninja dist and the stuff above to publish tags and tarballs for each module.

So you could well imagine on 43.beta release day, someone in the release team could boot the latest GNOME OS nightly, declare it to be Good, and push a button that tags every relevant GNOME module & builds and uploads all the tarballs, and then go back to their day, rather than having to chase down module owners who haven’t quite got around to making the release, fix random build breakages, and so on.

To make this work reliably, I think you’d need every module’s CI to be run through gnome-build-meta, building that MR against the rest of the project, so that g-b-m build failures would be caught before (not after) the offending change lands in the module in question. Seems doable – in Endless we have the equivalent thing managed by a jenkins-job-builder template, the GitHub Pull Request Builder plugin, and a gnarly script.

Continuous integration and deployment are becoming the norm throughout the software industry, for good reasons laid out quite well in articles like Shipping Fast Changes Your Life: the smaller the gap between making a change and it reaching a user, the faster the feedback, and the less costly it is to fix a bug or change course.

The free software movement has historically been ahead of the curve on this, with the “release early, release often” philosophy. And GNOME in particular has used a time-based release process for two decades, allowing major distros to align their schedules to GNOME and get updates into the hands of users quickly, which went some way towards overcoming the fact that GNOME does not own the full pipeline from source code to end users.

Havoc Pennington’s June 2002 email proposing this model has aged rather well, in my opinion, and places a heavy emphasis on the development branch being usable:

The unstable branch must always be dogfood-quality. If testers can’t test it by using it daily, they can’t make the jump. If the unstable branch becomes too unstable, we can’t release it on a reliable schedule, so we have to start breaking the stable branch as a stopgap.

Interestingly the time-based release schedule wiki page states that the schedule should contain:

Regular test release dates, approximately every 2 weeks.

These days, GNOME releases are closer to monthly. In the context of the broader industry where updates reach users multiple times a day, this is starting to look a little less forward-thinking! Of course, continuously deploying an entire OS to production is rather harder than continuously deploying web apps or apps in app stores, if only because the stakes are higher: you need a really robust automatic rollback mechanism to save your users’ plant-based bacon substitute if a new OS build fails to boot, or worse, contains an updater bug that prevents future updates being applied! Still, I believe that a bit of automation would go a long way in allowing module maintainers and the release team alike to spend their scarce mental energy on other things, and allow the project to increase the frequency of releases. What am I missing?