Thoughts on Flatpak after four months of Epiphany Technology Preview

May 27, 2018

It’s been four months since I announced Epiphany Technology Preview — which I’ve been using as my main browser ever since — and five months since I announced the availability of a stable channel via Flatpak. For the most part, it’s been a good experience. Having the latest upstream development code for everything is wonderful and makes testing very easy. Any user can painlessly download and install either the latest stable version or the bleeding-edge development version on any Linux system, regardless of host dependencies, either via a couple clicks in GNOME Software or one command in the terminal. GNOME Software keeps it updated, so I always have a recent version. Thanks to this, I’m often noticing problems shortly after they’re introduced, rather than six months later, as was so often the case for me in the past. Plus, other developers can no longer complain that there’s a problem with my local environment when I report a bug they can’t reproduce, because Epiphany Technology Preview is a canonical distribution environment, a ground truth of sorts.

There have been some rough patches where Epiphany Technology Preview was not working properly — sometimes for several days — due to various breaking changes, and the long time required to get a successful SDK build when it’s failing. For example, multimedia playback was broken for all of last week, due to changes in how the runtime is built. H.264 video is still broken, since the relevant Flatpak extension is only compatible with the 3.28 runtime, not with master. Opening files was broken for a while due to what turned out to be a bug in mutter that was causing the OpenURI portal to crash. I just today found another bug where closing a portal while visiting Slack triggered a gnome-shell crash. For the most part, these sorts of problems are expected by testers of unstable nightly software, though I’m concerned about the portal bugs because these affect stable users too. Anyway, these are just bugs, and all software has bugs: they get fixed, nothing special.

So my impression of Flatpak is still largely positive. Flatpak does not magically make our software work properly in all host environments, but it hugely reduces the number of things that can go wrong on the host system. In recent years, I’ve seen users badly break Epiphany in various ways, e.g. by installing custom mimeinfo or replacing the network backend. With Flatpak, either of these would require an incredible amount of dedicated effort. Without a doubt, Flatpak distribution is more robust to user error. Another advantage is that we get the latest versions of OS dependencies, like GStreamer, libsoup, and glib-networking, so we can avoid the many bugs in these components that have been fixed in the years since our users’ LTS distros froze the package versions. I appreciate the desire of LTS distros to provide stability for users, but at the same time, I’m not impressed when users report issues with the browser that we fixed two years ago in one dependency or another. Flatpak is an excellent compromise solution to this problem: the LTS distro retains an LTS core, but specific applications can use newer dependencies from the Flatpak runtime.

But there is one huge downside to using Flatpak: we lose crash reports. It’s at best very difficult, and often totally impossible, to investigate crashes when using Flatpak, and that’s frankly more important than any of the gains I mention above. For example, today Epiphany Technology Preview is crashing pretty much constantly. It’s surely a bug in WebKit, but that’s all I can figure out. The way to get a backtrace from a crashing app in Flatpak is to use coredumpctl to manually dump the core dump to disk, then launch a bash shell in the Flatpak environment and manually load it up in gdb. The process is manual, old-fashioned, primitive, and far too frustrating for me, so I wrote a little pexpect script to automate it for Epiphany, thinking I could eventually generalize it into a tool that would be useful for other developers. It’s a horrible hack, but it worked pretty well the day I tested it. I haven’t seen it work since. Debuginfo seems to be constantly broken, so I only see a bunch of ???s in my backtraces, and how are we supposed to debug that? So I have no way to debug or fix the WebKit bug, because I can’t get a backtrace. The broken, inconsistent, or otherwise unreliable debuginfo is probably just some bug that will be fixed eventually (and which I half suspect may be related to our recent freedesktop SDK upgrade. Update: Alex has debugged the debuginfo problem and it looks like that’s on track to be solved), but even once it is, we’re back to square one: it’s still too much effort to get the backtrace, relative to developing on the host system, and that’s a hard problem to solve. It requires tools that do not exist, and which we have no plans to create, or even any idea of how to create.
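To make the manual steps concrete, here is a minimal sketch of the two commands involved, written as a small Python helper that only builds the argument lists. It is not the script mentioned above. The function name and the example values (PID, app ID, executable path, core file path) are hypothetical, and it assumes the app's SDK and debug extensions are installed so that gdb and symbols are available under --devel.

```python
import os


def manual_backtrace_commands(pid, app_id, exe_in_sandbox, core_path):
    """Build argv lists for the two manual steps: dump the core, then open it in gdb.

    All values are supplied by the caller; nothing is detected automatically.
    """
    core_path = os.path.abspath(core_path)
    core_dir = os.path.dirname(core_path)

    # Step 1: ask coredumpctl to write the stored core for this PID to disk.
    dump_cmd = ["coredumpctl", "dump", str(pid), f"--output={core_path}"]

    # Step 2: start gdb inside the app's sandbox.
    # --devel runs against the SDK runtime (assumed here to provide gdb);
    # --filesystem exposes the directory holding the core file to the sandbox.
    gdb_cmd = [
        "flatpak", "run", "--devel",
        f"--filesystem={core_dir}",
        "--command=gdb",
        app_id,
        exe_in_sandbox, core_path,
    ]
    return dump_cmd, gdb_cmd
```

Calling manual_backtrace_commands(12345, "org.example.App", "/app/bin/example", "/tmp/cores/example.core") returns the two lists, which could be passed to subprocess.run one after the other. Even scripted like this, the PID, app ID, and executable path still have to be looked up by hand, which is exactly the friction described above.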

This isn’t working. I need to be able to effortlessly get a backtrace out of my application, with little or no more effort than running coredumpctl gdb as I would without Flatpak in the way. Every time I see Epiphany or WebKit crash, knowing I can do nothing to debug or investigate, I’m very sorely tempted to switch back to using Fedora’s Epiphany, or good old JHBuild. (I can’t promote BuildStream here, because BuildStream has the same problem.)

So the developer experience is just not good, but set that aside: the main benefits of Flatpak are for users, not developers, after all. Now, what if users run into a crash? How can they report the bug? Crash reports are next to useless without a backtrace, and wise developers refuse to look at crash reports until a quality backtrace has been posted. So first we need to fix the developer experience to work properly, but even then, it’s not enough: we need an automatic crash reporter, along the lines of ABRT or apport, to make reporting crashes realistically achievable for users, as it already is for distro-packaged apps. But this is a much harder problem to solve. Such a tool will require integration with coredumpctl, and I have not the faintest clue how we could go about making coredumpctl support container environments. Yet without this, we’re asking application developers to give up their most valuable data, crash reports, in order to use Flatpak.

Eventually, if we don’t solve crash reporting, Epiphany’s experiment with Flatpak will have to come to an end, because that’s more important to me than the (admittedly tremendous) benefits of Flatpak. I’m still hopeful that the ingenuity of the Flatpak community will find some solutions. We’ll see how this goes.

Comments

4 responses to “Thoughts on Flatpak after four months of Epiphany Technology Preview”

May 27, 2018

John McHugh

There is https://github.com/tsondergaard/raven-crashdump which might help a bit. Sentry also integrates well with GitLab, so there is that.
May 28, 2018

Adrián Pérez

Many systemd components already have some degree of support for containers. For example “systemd-journald” will send logs to the host, and “journalctl -M” can then show logs for a particular guest virtual machine (or container). I think these features require the guest to be registered with “systemd-machined”, which seems a bit too much for application containers (like the case of Flatpak). Now, I don’t know whether “systemd-coredump” can grab core dumps from containers today, but I am sure that we can engage with the systemd team to find some middle-ground solution which does not require complete registration of guests while still getting core dumps saved.
  May 28, 2018
  
  Michael Catanzaro
  
  There is a bug report to add container support to systemd-coredump, https://github.com/systemd/systemd/issues/4791, but they’re talking about an -M flag so I assume that’s going to require registration with systemd-machined, yes. I’m not sure why that would be a problem for flatpak, though. A flatpak really is a container, after all. That’s what “machine” means in the context of systemd.
May 29, 2018

Zbigniew Jędrzejewski-Szmek

systemd-coredump *does* have some support for containers, mostly at the low level. Unfortunately the tooling (in particular coredumpctl) doesn’t really take advantage of this.

I just made a “crash” (kill -SEGV $$) in a container, and ‘journalctl -o verbose -n1 MESSAGE_ID=fc2e22bc6ee647b6b90729ab34a250b1’ contains additional information:
…
COREDUMP_UNIT=machine-rawhide.scope
COREDUMP_HOSTNAME=rawhide
COREDUMP_CONTAINER_CMDLINE=systemd-nspawn -M rawhide --bind=/home
COREDUMP_PROC_MOUNTINFO=990 961 253:1 /var/lib/machines/rawhide / rw,relatime shared:382 master:1
…

For a flatpak app:
COREDUMP_UNIT=user@1000.service
COREDUMP_USER_UNIT=flatpak-com.spotify.Client-18778.scope
COREDUMP_PID=18802
COREDUMP_UID=1000
COREDUMP_GID=1000
COREDUMP_SIGNAL=6
COREDUMP_RLIMIT=18446744073709551615
COREDUMP_COMM=spotify
COREDUMP_EXE=/newroot/app/extra/share/spotify/spotify
COREDUMP_OWNER_UID=1000
COREDUMP_SLICE=user-1000.slice
COREDUMP_CMDLINE=/app/extra/share/spotify/spotify
COREDUMP_CGROUP=/user.slice/user-1000.slice/user@1000.service/flatpak-com.spotify.Client-18778.scope
COREDUMP_PROC_CGROUP=0::/user.slice/user-1000.slice/user@1000.service/flatpak-com.spotify.Client-18778.scope
COREDUMP_PROC_MOUNTINFO=795 794 253:1 /var/lib/flatpak/runtime/org.freedesktop.Platform/x86_64/1.6/efe5101028887f5bf41c61751305f1fbc4d>
796 794 253:1 /var/lib/flatpak/app/com.spotify.Client/x86_64/stable/37fb3d48a9a1c13981ab8a3fddc73002c6cdbe6b51>
797 795 253:1 /var/lib/flatpak/runtime/org.freedesktop.Platform.Locale/x86_64/1.6/ab59ec512ad70138d0ce5d3c95fa>
…

So the information is there. And if something is missing, we’ll be happy to add it.
I think it should be doable to make ‘coredumpctl gdb’ do something smart in case of chrooted/containerized applications. The first steps would be to figure out if all interesting information is stored in the journal entry, and how gdb should be invoked to provide the best debugging experience. As systemd upstream, we certainly want this to work.
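As a rough illustration of that first step, the sketch below reads KEY=VALUE fields like the ones quoted in this comment and pulls out what a tool would need before invoking gdb: the Flatpak app ID and the executable path as seen inside the sandbox. The function names are hypothetical. The unit pattern (flatpak-APPID-PID.scope) and the /newroot prefix are assumptions copied from the single example entry above, not documented guarantees, and may not hold for other Flatpak or systemd versions.

```python
import re

# Assumption: scope naming as seen in the example entry (flatpak-<app id>-<pid>.scope).
_FLATPAK_SCOPE = re.compile(r"^flatpak-(?P<app_id>.+)-(?P<pid>\d+)\.scope$")
_FIELD_NAME = re.compile(r"[A-Z_][A-Z0-9_]*")


def parse_fields(text):
    """Turn 'KEY=VALUE' lines, as printed by journalctl -o verbose, into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        # Skip continuation lines and anything that is not a journal field.
        if sep and _FIELD_NAME.fullmatch(key):
            fields[key] = value
    return fields


def flatpak_app_from_entry(fields):
    """Return (app_id, exe_path_in_sandbox), or None if this is not a Flatpak scope."""
    match = _FLATPAK_SCOPE.match(fields.get("COREDUMP_USER_UNIT", ""))
    if match is None:
        return None
    exe = fields.get("COREDUMP_EXE", "")
    # Assumption: the recorded path carries a /newroot prefix, as in the example.
    if exe.startswith("/newroot/"):
        exe = exe[len("/newroot"):]
    return match.group("app_id"), exe
```

Fed the Spotify entry above, flatpak_app_from_entry(parse_fields(entry_text)) yields the app ID com.spotify.Client and the path /app/extra/share/spotify/spotify. For the nspawn entry it returns None, since that entry has no Flatpak user unit.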