On Flatpak Nightlies

Here’s a little timeline of some fun we had with the GNOME master Flatpak runtime last week:

  • Tuesday, July 10: a bad runtime build is published.  Trying to start any application results in error while loading shared libraries: libdw.so.1: cannot open shared object file: No such file or directory. Problem is the library is present in org.gnome.Sdk instead of org.gnome.Platform, where it is required.
  • Thursday, July 12:  the bug is reported on WebKit Bugzilla (since it broke Epiphany Technology Preview)
  • Saturday, July 14: having returned from GUADEC, I notice the bug report and bisect the issue to a particular runtime build. Mathieu Bridon fixes the issue in the freedesktop SDK and opens a merge request.
  • Monday, July 16: Mathieu’s fix is committed. We now have to wait until Tuesday for the next build.
  • Tuesday, Wednesday, and Thursday: we deal with various runtime build failures. Each day, we get a new build log and try to fix whatever build failure is reported. Then, we wait until the next day and see what the next failure is. (I’m not aware of any way to build the runtime locally. No doubt it’s possible somehow, but there are no instructions for doing so.)
  • Friday, July 20: we wait. The build has succeeded and the log indicates the build has been published, but it’s not yet available via flatpak update
  • Saturday, July 21: the successful build is now available. The problem is fixed.

As far as I know, it was not possible to run any nightly applications during this two week period, except developer applications like Builder that depend on org.gnome.Sdk instead of the normal org.gnome.Platform. If you used Epiphany Technology Preview and wanted a functioning web browser, you had to run arcane commands to revert to the last good runtime version.

This multi-week response time is fairly typical for us. We need to improve our workflow somehow. It would be nice to be able to immediately revert to the last good build once a problem has been identified, for instance.

Meanwhile, even when the runtime is working fine, some apps have been broken for months without anyone noticing or caring. Perhaps it’s time for a rethink on how we handle nightly apps. It seems likely that only a few apps, like Builder and Epiphany, are actually being regularly used. The release team has some hazy future plans to take over responsibility for the nightly apps (but we have to take over the runtimes first, since those are more important), and we’ll need to somehow avoid these issues when we do so. Having some form of notifications for failed builds would be a good first step.

P.S. To avoid any possible misunderstandings: the client-side Flatpak technology itself is very good. It’s only the server-side infrastructure that is problematic here. Clearly we have a lot to fix, but it won’t require any changes in Flatpak.

Thank you, address sanitizer developers

I don’t often write useless blog posts, but today will be an exception. The address sanitizer (asan) is a ludicrously good tool. The developers deserve a huge thank you.

There are other good sanitizers too, but asan is what I have been using recently.

Thank you!

Security vulnerability in Epiphany Technology Preview

If you use Epiphany Technology Preview, please update immediately and ensure you have revision 3.29.2-26 or newer. We discovered and resolved a vulnerability that allowed websites to access internal Epiphany features and thereby exfiltrate passwords from the password manager. We apologize for this oversight.

The unstable Epiphany 3.29.2 release is the only affected release. Epiphany 3.29.1 is not affected. Stable releases, including Epiphany 3.28, are also not affected.

There is no reason to believe that the issue was discovered or exploited by any attackers, but you might wish to change your passwords if you are concerned.

Thoughts on Flatpak after four months of Epiphany Technology Preview

It’s been four months since I announced Epiphany Technology Preview — which I’ve been using as my main browser ever since — and five months since I announced the availability of a stable channel via Flatpak. For the most part, it’s been a good experience. Having the latest upstream development code for everything is wonderful and makes testing very easy. Any user can painlessly download and install either the latest stable version or the bleeding-edge development version on any Linux system, regardless of host dependencies, either via a couple clicks in GNOME Software or one command in the terminal. GNOME Software keeps it updated, so I always have a recent version. Thanks to this, I’m often noticing problems shortly after they’re introduced, rather than six months later, as was so often the case for me in the past. Plus, other developers can no longer complain that there’s a problem with my local environment when I report a bug they can’t reproduce, because Epiphany Technology Preview is a canonical distribution environment, a ground truth of sorts.

There have been some rough patches where Epiphany Technology Preview was not working properly — sometimes for several days — due to various breaking changes, and the long time required to get a successful SDK build when it’s failing. For example, multimedia playback was broken for all of last week, due to changes in how the runtime is built. H.264 video is still broken, since the relevant Flatpak extension is only compatible with the 3.28 runtime, not with master. Opening files was broken for a while due to what turned out to be a bug in mutter that was causing the OpenURI portal to crash. I just today found another bug where closing a portal while visiting Slack triggered a gnome-shell crash. For the most part, these sorts of problems are expected by testers of unstable nightly software, though I’m concerned about the portal bugs because these affect stable users too. Anyway, these are just bugs, and all software has bugs: they get fixed, nothing special.

So my impression of Flatpak is still largely positive. Flatpak does not magically make our software work properly in all host environments, but it hugely reduces the number of things that can go wrong on the host system. In recent years, I’ve seen users badly break Epiphany in various ways, e.g. by installing custom mimeinfo or replacing the network backend. With Flatpak, either of these would require an incredible amount of dedicated effort. Without a doubt, Flatpak distribution is more robust to user error. Another advantage is that we get the latest versions of OS dependencies, like GStreamer, libsoup, and glib-networking, so we can avoid the many bugs in these components that have been fixed in the years since our users’ LTS distros froze the package versions. I appreciate the desire of LTS distros to provide stability for users, but at the same time, I’m not impressed when users report issues with the browser that we fixed two years ago in one dependency or another. Flatpak is an excellent compromise solution to this problem: the LTS distro retains an LTS core, but specific applications can use newer dependencies from the Flatpak runtime.

But there is one huge downside to using Flatpak: we lose crash reports. It’s at best very difficult — and often totally impossible — to investigate crashes when using Flatpak, and that’s frankly more important than any of the gains I mention above. For example, today Epiphany Technology Preview is crashing pretty much constantly. It’s surely a bug in WebKit, but that’s all I can figure out. The way to get a backtrace from a crashing app in flatpak is to use coredumpctl to manually dump the core dump to disk, then launch a bash shell in the flatpak environment and manually load it up in gdb. The process is manual, old-fashioned, primitive, and too frustrating for me by a lot, so I wrote a little pyexpect script to automate this process for Epiphany, thinking I could eventually generalize it into a tool that would be useful for other developers. It’s a horrible hack, but it worked pretty well the day I tested it. I haven’t seen it work since. Debuginfo seems to be constantly broken, so I only see a bunch of ???s in my backtraces, and how are we supposed to figure out how to debug that? So I have no way to debug or fix the WebKit bug, because I can’t get a backtrace. The broken, inconsistent, or otherwise-unreliable debuginfo is probably just some bug that will be fixed eventually (and which I half suspect may be related to our recent freedesktop SDK upgrade. Update: Alex has debugged the debuginfo problem and it looks like that’s on track to be solved), but even once it is, we’re back to square one: it’s still too much effort to get the backtrace, relative to developing on the host system, and that’s a hard problem to solve. It requires tools that do not exist, and for which we have no plans to create, or even any idea of how to create them.

This isn’t working. I need to be able to effortlessly get a backtrace out of my application, with no or little more effort than running coredumpctl gdb as I would without Flatpak in the way. Every time I see Epiphany or WebKit crash, knowing I can do nothing to debug or investigate, I’m very sorely tempted to switch back to using Fedora’s Epiphany, or good old JHBuild. (I can’t promote BuildStream here, because BuildStream has the same problem.)

So the developer experience is just not good, but set that aside: the main benefits of Flatpak are for users, not developers, after all. Now, what if users run into a crash, how can they report the bug? Crash reports are next to useless without a backtrace, and wise developers refuse to look at crash reports until a quality backtrace has been posted. So first we need to fix the developer experience to work properly, but even then, it’s not enough: we need an automatic crash reporter, along the lines of ABRT or apport, to make reporting crashes realistically-achievable for users, as it already is for distro-packaged apps. But this is a much harder problem to solve. Such a tool will require integration with coredumpctl, and I have not the faintest clue how we could go about making coredumpctl support container environments. Yet without this, we’re asking application developers to give up their most valuable data — crash reports — in order to use Flatpak.

Eventually, if we don’t solve crash reporting, Epiphany’s experiment with Flatpak will have to come to an end, because that’s more important to me than the (admittedly-tremendous) benefits of Flatpak. I’m still hopeful that the ingenuity of the Flatpak community will find some solutions. We’ll see how this goes.

On setenv() and Explosions

Skim over this abbreviated and badly-formatted function definition, from glibpacrunner.c:

static void
handle_method_call (GDBusConnection *connection, /* ... */)
{
/* ... */

if (/* ... */)
{
gchar *libproxy_url = g_strdup_printf ("pac+%s", pac_url);
g_setenv ("http_proxy", libproxy_url, TRUE);
g_free (libproxy_url);
}
else
g_setenv ("http_proxy", "wpad://", TRUE);

g_proxy_resolver_lookup_async (/* ... */);
}

Note:

  1. This is a GDBus method call handler, so you know GDBus is currently running, and you know this is a multithreaded application.
  2. The code calls setenv().

Because you’re a careful programmer, you check the manpage before using a function, and read all the way to the bottom, so you know that setenv() is not threadsafe. Unless your application is trivial and links to no libraries, you can be pretty confident that it is running secondary threads, and that, somewhere, those threads call getenv(), because getenv() is used pervasively in libraries. (A frequent culprit for internationalized applications is gettext(), conventionally defined as the underscore function _().)

So, what does this mean in practice? It means that if any other thread is calling getenv() at the same time that you call setenv(), one or the other call will blow up. This is a timing-dependent race that will only happen occasionally. Thanks to Murphy, you can expect it to only occur on the hardware that your important customer is running, and never for your developers when they attempt to reproduce the issue.

The safest solution is to ensure your application only ever calls setenv() at the very top of main(), before your first secondary thread is initialized. It can be quite hard to know which library calls will initialize secondary threads. (What about if you use GOptionContext? Or GApplication? If you need to check a setting before changing the environment, how many threads does a GSettings object spawn when you create it? Even if it was zero — and it’s not — what if a custom GIO module was in use?) Your best bet is to restrict yourself to using setenv() at the very, very top of main(), where you know it’s safe. This applies to unsetenv(), as well.

This is not some theoretical problem. In practice, we have documented:

  • Unsafe use of setenv() in gnome-session would cause gnome-session to crash, bringing down the entire desktop session and returning the user to the login screen.
  • In a downstream product using a modified version of gnome-initial-setup, unsafe use of setenv() to change LANG on the language selection page would cause the first boot experience to crash, leaving the sufficiently-unlucky user staring at an empty unusable desktop after turning on her new computer.

Both of these issues were rare races that were not possible to reproduce.

In a sense, setenv() is not really special, in that lots of POSIX functions modify global state and are unsafe to use across threads. The problem with setenv() is that, once such an issue is identified, it can be very difficult to fix. Sometimes there is an alternative to setting an environment variable, but often environment variables are the only way to configure important settings. If you have no other choice but to set an environment variable, then there is not really much you can do other than risk randomly crashing. Lesson: never make an environment variable the only way for an application to configure an important setting at runtime. Your inconsiderate mandatory environment variable might not be a problem for your own code, but it will be a problem for unforeseen future code that needs to use your code. Mandatory environment variables have caused many problems in GNOME, some unfixable on platforms without library constructors, and some frankly ludicrous. (libproxy, you are not batting very well right now.) The Linux desktop is already stuck with many mandatory environment variables that we cannot get rid of; we do not need to keep adding more. Please, please, please: learn from our mistakes.

I have no clue how to fix glib-pacrunner. It’s already a separate D-Bus service just to allow it to use unsetenv() (which it does safely) to trick libproxy. I suppose fixing this would require spawning a second helper process in order to run setenv() before the proxy lookup. More likely, we’ll just leave it broken, since to my knowledge, nobody has reported a crash here yet.

Software is wonderful!

On Compiling WebKit (now twice as fast!)

Are you tired of waiting for ages to build large C++ projects like WebKit? Slow headers are generally the problem. Your C++ source code file #includes a few headers, all those headers #include more, and those headers #include more, and more, and more, and since it’s C++ a bunch of these headers contain lots of complex templates to slow down things even more. Not fun.

It turns out that much of the time spent building large C++ projects is effectively spent parsing the same headers again and again, over, and over, and over, and over, and over….

There are three possible solutions to this problem:

  • Shred your CPU and buy a new one that’s twice as fast.
  • Use C++ modules: import instead of #include. This will soon become the best solution, but it’s not standardized yet. For WebKit’s purposes, we can’t use it until it works the same in MSVCC, Clang, and three-year-old versions of GCC. So it’ll be quite a while before we’re able to take advantage of modules.
  • Use unified builds (sometimes called unity builds).

WebKit has adopted unified builds. This work was done by Keith Miller, from Apple. Thanks, Keith! (If you’ve built WebKit before, you’ll probably want to say that again: thanks, Keith!)

For a release build of WebKitGTK+, on my desktop, our build times used to look like this:

real 62m49.535s
user 407m56.558s
sys 62m17.166s

That was taken using WebKitGTK+ 2.17.90; build times with any 2.18 release would be similar. Now, with trunk (or WebKitGTK+ 2.20, which will be very similar), our build times look like this:

real 33m36.435s
user 214m9.971s
sys 29m55.811s

Twice as fast.

The approach is pretty simple: instead of telling the compiler to build the original C++ source code files that developers see, we instead tell the compiler to build unified source files that look like this:

// UnifiedSource1.cpp
#include "CSSValueKeywords.cpp"
#include "ColorData.cpp"
#include "HTMLElementFactory.cpp"
#include "HTMLEntityTable.cpp"
#include "JSANGLEInstancedArrays.cpp"
#include "JSAbortController.cpp"
#include "JSAbortSignal.cpp"
#include "JSAbstractWorker.cpp"

Since files are included only once per translation unit, we now have to parse the same headers only once for each unified source file, rather than for each individual original source file, and we get a dramatic build speedup. It’s pretty terrible, yet extremely effective.

Now, how many original C++ files should you #include in each unified source file? To get the fastest clean build time, you would want to #include all of your C++ source files in one, that way the compiler sees each header only once. (Meson can do this for you automatically!) But that causes two problems. First, you have to make sure none of the files throughout your entire codebase use conflicting variable names, since the static keyword and anonymous namespaces no longer work to restrict your definitions to a single file. That’s impractical in a large project like WebKit. Second, because there’s now only one file passed to the compiler, incremental builds now take as long as clean builds, which is not fun if you are a WebKit developer and actually need to make changes to it. Unifying more files together will always make incremental builds slower. After some experimentation, Apple determined that, for WebKit, the optimal number of files to include together is roughly eight. At this point, there’s not yet much negative impact on incremental builds, and past here there are diminishing returns in clean build improvement.

In WebKit’s implementation, the files to bundle together are computed automatically at build time using CMake black magic. Adding a new file to the build can change how the files are bundled together, potentially causing build errors in different files if there are symbol clashes. But this is usually easy to fix, because only files from the same directory are bundled together, so random unrelated files will never be built together. The bundles are always the same for everyone building the same version of WebKit, so you won’t see random build failures; only developers who are adding new files will ever have to deal with name conflicts.

To significantly reduce name conflicts, we now limit the scope of using statements. That is, stuff like this:

using namespace JavaScriptCore;
namespace WebCore {
//...
}

Has been changed to this:

namespace WebCore {
using namespace JavaScriptCore;
// ...
}

Some files need to be excluded due to unsolvable name clashes. For example, files that include X11 headers, which contain lots of unnamespaced symbols that conflict with WebCore symbols, don’t really have any chance. But only a few files should need to be excluded, so this does not have much impact on build time. We’ve also opted to not unify most of the GLib API layer, so that we can continue to use conventional GObject names in our implementation, but again, the impact of not unifying a few files is minimal.

We still have some room for further performance improvement, because some significant parts of the build are still not unified, including most of the WebKit layer on top. But I suspect developers who have to regularly build WebKit will already be quite pleased.

On Python Shebangs

So, how do you write a shebang for a Python program? Let’s first set aside the python2/python3 issue and focus on whether to use env. Which of the following is correct?

#!/usr/bin/env python
#!/usr/bin/python

The first option seems to work in all environments, but it is banned in popular distros like Fedora (and I believe also Debian, but I can’t find a reference for this). Using env in shebangs is dangerous because it can result in system packages using non-system versions of python. python is used in so many places throughout modern systems, it’s not hard to see how using #!/usr/bin/env in an important package could badly bork users’ operating systems if they install a custom version of python in /usr/local. Don’t do this.

The second option is broken too, because it doesn’t work in BSD environments. E.g. in FreeBSD, python is installed in /usr/local/bin. So FreeBSD contributors have been upstreaming patches to convert #!/usr/bin/python shebangs to #!/usr/bin/env python. Meanwhile, Fedora has begun automatically rewriting #!/usr/bin/env python to #!/usr/bin/python, but with a warning that this is temporary and that use of #!/usr/bin/env python will eventually become a fatal error causing package builds to fail.

So obviously there’s no way to write a shebang that will work for both major Linux distros and major BSDs. #!/usr/bin/env python seems to work today, but it’s subtly very dangerous. Lovely. I don’t even know what to recommend to upstream projects.

Next problem: python2 versus python3. By now, we should all be well-aware of PEP 394. PEP 394 says you should never write a shebang like this:

#!/usr/bin/env python
#!/usr/bin/python

unless your python script is compatible with both python2 and python3, because you don’t know what version you’re getting. Your python script is almost certainly not compatible with both python2 and python3 (and if you think it is, it’s probably somehow broken, because I doubt you regularly test it with both). Instead, you should write the shebang like this:

#!/usr/bin/env python2
#!/usr/bin/python2
#!/usr/bin/env python3
#!/usr/bin/python3

This works as long as you only care about Linux and BSDs. It doesn’t work on macOS, which provides /usr/bin/python and /usr/bin/python2.7, but still no /usr/bin/python2 symlink, even though it’s now been six years since PEP 394. It’s hard to understate how frustrating this is.

So let’s say you are WebKit, and need to write a python script that will be truly cross-platform. How do you do it? WebKit’s scripts are only needed (a) during the build process or (b) by developers, so we get a pass on the first problem: using /usr/bin/env should be OK, because the scripts should never be installed as part of the OS. Using #!/usr/bin/env python — which is actually what we currently do — is unacceptable, because our scripts are python2 and that’s broken on Arch, and some of our developers use that. Using #!/usr/bin/env python2 would be dead on arrival, because that doesn’t work on macOS. Seems like the option that works for everyone is #!/usr/bin/env python2.7. Then we just have to hope that the Python community sticks to its promise to never release a python2.8 (which seems likely).

…yay?

Announcing Epiphany Technology Preview

If you use macOS, the best way to use a recent development snapshot of WebKit is surely Safari Technology Preview. But until now, there’s been no good way to do so on Linux, short of running a development distribution like Fedora Rawhide.

Enter Epiphany Technology Preview. This is a nightly build of Epiphany, on top of the latest development release of WebKitGTK+, running on the GNOME master Flatpak runtime. The target audience is anyone who wants to assist with Epiphany development by testing the latest code and reporting bugs, so I’ve added the download link to Epiphany’s development page.

Since it uses Flatpak, there are no host dependencies asides from Flatpak itself, so it should work on any system that can run Flatpak. Thanks to the Flatpak sandbox, it’s far more secure than the version of Epiphany provided by your operating system. And of course, you enjoy automatic updates from GNOME Software or any software center that supports Flatpak.

Enjoy!

(P.S. If you want to use the latest stable version instead, with all the benefits provided by Flatpak, get that here.)

Epiphany Stable Flatpak Releases

The latest stable version of Epiphany is now available on Flathub. Download it here. You should be able to double click the flatpakref to install it in GNOME Software, if you use any modern GNOME operating system not named Ubuntu. But, in my experience, GNOME Software is extremely buggy, and it often as not does not work for me. If you have trouble, you can use the command line:

flatpak install --from https://flathub.org/repo/appstream/org.gnome.Epiphany.flatpakref

This has actually been available for quite a while now, but I’ve delayed announcing it because some things needed to be fixed to work well under Flatpak. It’s good now.

I’ve also added a download link to Epiphany’s webpage, so that you can actually, you know, download and install the software. That’s a useful thing to be able to do!

Benefits

The obvious benefit of Flatpak is that you get the latest stable version of Epiphany (currently 3.26.5) and WebKitGTK+ (currently 2.18.3), no matter which version is shipped in your operating system.

The other major benefit of Flatpak is that the browser is protected by Flatpak’s top-class bubblewrap sandbox. This is, of course, a UI process sandbox, which is different from the sandboxing model used in other browsers, where individual browser tabs are sandboxed from each other. In theory, the bubblewrap sandbox should be harder to escape than the sandboxes used in other major browsers, because the attack surface is much smaller: other browsers are vulnerable to attack whenever IPC messages are sent between the web process and the UI process. Such vulnerabilities are mitigated by a UI process sandbox. The disadvantage of this approach is that tabs are not sandboxed from each other, as they would be with a web process sandbox, so it’s easier for a compromised tab to do bad things to your other tabs. I’m not sure which approach is better, but clearly either way is much better than having no sandbox at all. (I still hope to have a web process sandbox working for use when WebKit is used outside of Flatpak, but that’s not close to being ready yet.)

Problems

Now, there are a couple of loose ends. We do not yet have desktop notifications working under Flatpak, and we also don’t block the screen from turning off when you’re watching fullscreen video, so you’ll have to wiggle your mouse every five minutes or so when you’re watching YouTube to keep the lights on. These should not be too hard to fix; I’ll try to get them both working soon. Also, drag and drop does not work. I’m not nearly brave enough to try fixing that, though, so you’ll just have to live without drag and drop if you use the Flatpak version.

Also, unfortunately the stable GNOME runtimes do not receive regular updates. So while you get the latest version of Epiphany, most everything else will be older. This is not good. I try to make sure that WebKit gets updated, so you’ll have all the latest security updates there, but everything else is generally stuck at older versions. For example, the 3.26 runtime uses, for the most part, whatever software versions were current at the time of the 3.26.1 release, and any updates newer than that are just not included. That’s a shame, but the GNOME release team does not maintain GNOME’s Flatpak runtimes: we have three other other redundant places to store the same build information (JHBuild, GNOME Continuous, BuildStream) that we need to take care of, and adding yet another is not going to fly. Hopefully this situation will change soon, though, since we should be able to use BuildStream to replace the current JSON manifest that’s used to generate the Flatpak runtimes and keep everything up to date automatically. In the meantime, this is a problem to be aware of.

Product review: WASD V2 Keyboard

A new blog on Planet GNOME often means an old necropost for us residents of the future to admire.

I, too, bought a custom keyboard from WASD. It is quite nice to be able to customize the printing using an SVG file. Yes, my keyboard has GNOME feet on the super keys, and a Dvorak layout, and, oh yes, Cantarell font. Yes, Cantarell was silly, and yes, it means bad kerning, but it is kind of cool to know I’m probably the only person on the planet to have a Cantarell keyboard.

It was nice for a little under one year. Then I noticed that the UV printing on some of the keys was beginning to wear off. WASD lets you purchase individual keycaps at a reasonable price, and I availed myself of that option for a couple keys that needed it, and then a couple more. But now some of the replacement keycaps need to be replaced, and I’ve owned the keyboard for just over a year and a half. It only makes sense to purchase a product this expensive if it’s going to last.

I discovered that MAX Keyboard offers custom keyboard printing using SVG files, and their keycaps are compatible with WASD. I guess it’s a clone of WASD’s service, because I’ve never heard of MAX before, but I don’t actually know which came first. Anyway, you can buy just the keycaps without the keyboard, for a reasonable price. But they apparently use a UV printing process, which is what WASD does, so I have no clue if MAX will hold up any better or not. I decided not to purchase it. (At least, not now. Who knows what silly things I might do in the future.) Instead, I purchased a blank PBT keycap set from them. It arrived yesterday, and it seems nice. It’s a slightly different shade of black than WASD’s keycaps, but that’s OK. Hopefully these will hold up better, and I won’t need to replace the entire keyboard. And hopefully I don’t find I need to look at the keys to find special characters or irregularly-used functions like PrintScreen and media keys. We’ll see.