How many Flathub apps reuse other package formats?

Today I read Comparison of Fedora Flatpaks and Flathub remotes by Hari Rana, who is an active and valued member of the Flatpak community. The article is a well-researched and well-written overview of how these two Flatpak ecosystems differ, and contains the following remark about one major difference (emphasis mine):

Flathub is open with what source a Flatpak application (re)uses, whereas Fedora Flatpaks strictly reuses the RPM format.

As such, Flathub has tons of applications that reuse other package formats.

When this article was discussed in the Flatpak Matrix channel, several people wondered whether “tons” is a fair assessment. Let’s find out!

The specific examples given in the article are of apps which reuse a .deb (to which I will add .rpm), AppImage, Snap package, or binary .tar.gz archive. It’s not so easy to distinguish a binary tarball from a source tarball, so as a substitute I will look for apps which use the extra-data mechanism to download external sources at install time rather than at build time.

I have cloned every repo from the Flathub GitHub organisation with this script I had lying around. There are 2,220 such repositories. This is a bigger number than the 1,518 apps cited in the blog post, because it includes many things which are not apps, such as 258 GTK themes and 60 digital audio workstation plugins. I also believe that the 1,518 number does not include end-of-lifed apps, whereas my methodology does. This post will also ignore the existence of OBS Studio and Firefox, whose projects build the Flatpak from source on their own infrastructure and push the result into Flathub.
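If you want to do the same, here is a minimal sketch of how one might clone the lot with the GitHub API, curl and jq. (This is not the script I used; the page count is an assumption based on the repository count, and unauthenticated API requests are rate-limited.)

for page in $(seq 1 23)    # 23 pages × 100 repos covers all 2,220 repositories
do
    curl -s "https://api.github.com/orgs/flathub/repos?per_page=100&page=$page" |
        jq -r '.[].clone_url'
done | xargs -L 1 git clone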

Now I’m just going to grep them all for the offending strings:

$ (for i in */
do
    if git -C "$i" grep --quiet -E '(\.(deb|rpm|AppImage|snap)\>)|(extra-data)'
    then
        echo "$i"
    fi
done) | wc -l
237

(Splitting apart the search terms, we have 141 repos matching .deb, 10 for .rpm, 23 for .AppImage, 6 for .snap, and 110 for extra-data. These numbers don’t sum to 237 because the same repo can use multiple formats, and these binary files are often used by extra-data apps.)
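If you want to reproduce that breakdown, one way is to run the same loop once per search term – a quick sketch along the lines of the command above:

for term in '\.deb\>' '\.rpm\>' '\.AppImage\>' '\.snap\>' 'extra-data'
do
    echo -n "$term: "    # print each format's repo count on its own line
    (for i in */
    do
        git -C "$i" grep --quiet -E "$term" && echo "$i"
    done) | wc -l
done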

So by my back-of-an-envelope calculation, 237 out of 2,220 repos on Flathub repackage other binary formats. This is a little under 11%. Of those 237, 51 are GTK themes, specifically variations of the Mint, Pop and Yaru themes. If we assume that all the other 186 are apps, and that none of them are EOLed, then 186 divided by 1,518 gives us a little more than 12% of apps on Flathub that are repackaged from other binary formats. (I believe this is a slight overestimate but I have run out of time this morning.)
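For completeness, here’s the arithmetic:

$ echo 'scale=4; 237/2220*100' | bc
10.6700
$ echo 'scale=4; 186/1518*100' | bc
12.2500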

Is that a big number? It’s roughly what I expected. Is it “ton[ne]s”? Compared to Fedora’s Flatpak repo, where everything is built from source, it certainly is: indeed, it’s more than the total number of apps in the Fedora Flatpak repo!

If it is valuable for Flathub to provide proprietary apps like Slack whose publishers do not currently wish to support Flatpak (which I believe it is), then it’s unavoidable that some apps repackage other binary formats. OK, time for one last bit of data: what if we exclude extra-data apps?

$ (for i in */
do
    if ! git -C "$i" grep --quiet extra-data && \
       git -C "$i" grep --quiet -E '\.(deb|rpm|AppImage|snap)\>'
    then
        echo "$i"
    fi
done) | wc -l
127

So (ignoring non-extra-data apps which use binary tarballs, if any such apps exist) that’s something like 76 apps and 51 GTK themes which probably could be built from source by Flathub, but aren’t. It may be hard to build some of these apps from source (perhaps the upstream build system requires network access) but the rewards would include support for aarch64 and any other architectures Flathub may add, and arguably greater transparency in how the app is built.

If you want to do your own research in this vein, you may be interested in gasinvein’s Flatpak remote metadata fetcher, which would let you generate and analyse a 200 MiB JSON file rather than cloning and grepping 4.6 GiB of Git repositories. His analysis using this data yields 174 apps, quite close to my 186 estimate above.

$ ./flatpak-remote-metadata.py -u https://dl.flathub.org/repo flathub | \
    jq -r '.[] | select(
        .manifest | objects | .modules[] | recurse(.modules | arrays | .[]) |
        .sources | arrays | .[] | .url | strings | test(".*\\.(deb|rpm|snap|AppImage)$")
    ) | .metadata.Application.name // .metadata.Runtime.name' | \
    sort -u | wc -l

Release (semi-)automation

The time I have available to maintain GNOME Initial Setup is very limited, as anyone who has looked at the commit history will have noticed. I’d love more eyes & hands on this important but easy-to-overlook component, particularly to guide it kindly but firmly into the modern age of GTK 4 and the refreshed HIG.

I found that making a batch of 1–3 releases across different GNOME branches every few months was surprisingly time-consuming and error-prone, even with the pretty comprehensive release process checklist on the GNOME Wiki, so I’ve been periodically trying to automate bits of it away.

Philip Withnall’s gitlab-changelog script makes writing the NEWS file a lot quicker. I taught it to output the human-readable names of each updated translation (a nice additional contribution would be to also include the name of the human who updated the translation) and made it a little smarter about guessing the Git commit range to scan.

Beyond that, I added a Meson run target, maintainer-upload-release, pointing at a script which performs some rudimentary coherence checks on the version number, tags the release (using git-evtag if available), atomically pushes the branch and that tag to GNOME GitLab, then copies the source tarball to master.gnome.org. (Apparently it has been almost 12 years since I did something similar in telepathy-gabble, building on the make maintainer-upload-release target that Simon McVittie added in 2008, which is where I borrowed the name.) Other module maintainers may find this script useful too – it’s quite generic.
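In outline, the script does something like the following – a simplified sketch rather than the script itself; the tarball path, the exact coherence check and the fallback behaviour are all assumptions:

version="$1"
# Rudimentary coherence check: does meson.build agree with the version being tagged?
grep -Fq "version: '$version'" meson.build || { echo "version mismatch" >&2; exit 1; }
# Tag the release, preferring git-evtag's stronger checksum if it is installed
git evtag sign "$version" || git tag -s "$version"
# Atomically push the branch and the tag together
git push --atomic origin HEAD "refs/tags/$version"
# Copy the source tarball to master.gnome.org, ready to be installed
scp "_build/meson-dist/gnome-initial-setup-$version.tar.xz" master.gnome.org: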

Putting these together, the release flow looks like this:

git switch gnome-42
git pull
../pwithnall/gitlab-changelog/gitlab-changelog.py GNOME/gnome-initial-setup
# Manually edit NEWS to incorporate the changelog, adjusted as needed
# Manually check the version in meson.build
git commit -am 'NEWS for 42.Y'
ninja -C _build dist maintainer-upload-release

Another release-related quality-of-life improvement is to make GitLab CI not only build and test the project (in the vain hope that there might actually be tests!) but also check that the install and gnome-initial-setup-pot targets both work. (At one point or another both have failed at or around release time; now they never will again, famous last words.)
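Those extra checks amount to running something like this in a CI job after the build – a sketch of the idea, not the actual .gitlab-ci.yml:

meson setup _build
ninja -C _build
# Catch install-time breakage without touching the host system
DESTDIR=$(mktemp -d) ninja -C _build install
# Catch broken POTFILES and the like before translators or release day do
ninja -C _build gnome-initial-setup-pot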

I know none of this is rocket science, but I find it all makes the process quicker and less cumbersome, and it’s stopped me from repeating errors like uploading the wrong version on a few tired evenings. Obviously this could all be taken further: perhaps a manually-invoked CI pipeline that does all this stuff, more checks, etc. But while I’m on this train of thought:

Why do we release GNOME modules one-by-one at all?

The workflow we use to release Endless OS is a bit different to GNOME’s. Once we merge a change to some module’s Git repository, such as eos-updater or our shrinking branch of GNOME Software, that change embarks on a scenic automated journey that takes it to the next nightly build of the entire OS, both as an OSTree update and as fresh installation media. I use these nightly builds for my daily work, safe in the knowledge that I can roll back to the previous build if necessary.

We don’t make releases of individual modules: instead, when it comes time to release the OS, we trigger a pipeline that (among many other things) pushes the already-built OS update to the production repo, and creates Release_x.y.z tags on each Git repo.

This was quite an adjustment for me at first, compared to lovingly hand-crafting NEWS files and coming up with funny/esoteric release names, but now that I’m used to it, it’s hard to go back. Why can’t GNOME do the same?

At this point in the post, we are straying into territory that I have limited first-hand knowledge of. Caveat lector! But here goes:

Thanks to GNOME OS, GNOME already has nightly builds of the entire desktop and apps: so rather than having to build everything yourself, or wait for a development release of GNOME, you can just update & reboot your GNOME OS VM and test the change right there. gnome-build-meta knows how to build every GNOME module; and if you can build the code, it seems a conceptually small step to run ninja dist and the stuff above to publish tags and tarballs for each module.

So you could well imagine that on 43.beta release day, someone in the release team could boot the latest GNOME OS nightly, declare it to be Good, and push a button that tags every relevant GNOME module & builds and uploads all the tarballs, and then go back to their day, rather than having to chase down module owners who haven’t quite got around to making the release, fix random build breakages, and so on.

To make this work reliably, I think you’d need every module’s CI to be run through gnome-build-meta, building that MR against the rest of the project, so that g-b-m build failures would be caught before (not after) the offending change lands in the module in question. Seems doable – in Endless we have the equivalent thing managed by a jenkins-job-builder template, the GitHub Pull Request Builder plugin, and a gnarly script.

Continuous integration and deployment are becoming the norm throughout the software industry, for good reasons laid out quite well in articles like Shipping Fast Changes Your Life: the smaller the gap between making a change and it reaching a user, the faster the feedback, and the less costly it is to fix a bug or change course.

The free software movement has historically been ahead of the curve on this, with the “release early, release often” philosophy. And GNOME in particular has used a time-based release process for two decades, allowing major distros to align their schedules to GNOME and get updates into the hands of users quickly, which went some way towards overcoming the fact that GNOME does not own the full pipeline from source code to end users.

Havoc Pennington’s June 2002 email proposing this model has aged rather well, in my opinion, and places a heavy emphasis on the development branch being usable:

The unstable branch must always be dogfood-quality. If testers can’t test it by using it daily, they can’t make the jump. If the unstable branch becomes too unstable, we can’t release it on a reliable schedule, so we have to start breaking the stable branch as a stopgap.

Interestingly, the time-based release schedule wiki page states that the schedule should contain:

Regular test release dates, approximately every 2 weeks.

These days, GNOME releases are closer to monthly. In the context of the broader industry, where updates reach users multiple times a day, this is starting to look a little less forward-thinking!

Of course, continuously deploying an entire OS to production is rather harder than continuously deploying web apps or apps in app stores, if only because the stakes are higher: you need a really robust automatic rollback mechanism to save your users’ plant-based bacon substitute if a new OS build fails to boot, or worse, contains an updater bug that prevents future updates from being applied! Still, I believe that a bit of automation would go a long way in allowing module maintainers and the release team alike to spend their scarce mental energy on other things, and would allow the project to increase the frequency of releases. What am I missing?