Category: Fedora

WebKitGTK API Versions Demystified
WebKitGTK has a bunch of different confusing API versions. Here are the three API versions that are currently supported by upstream:
- webkitgtk-6.0: This is WebKitGTK for GTK 4 (and libsoup 3), introduced in WebKitGTK 2.40. This is what’s built by default if you build WebKit with -DPORT=GTK.
- webkit2gtk-4.1: This is WebKitGTK for GTK 3 and libsoup 3, introduced in WebKitGTK 2.32. Get this by building with -DPORT=GTK -DUSE_GTK3=ON.
- webkit2gtk-4.0: This is WebKitGTK for GTK 3 and libsoup 2, introduced in WebKitGTK 2.6. Get this by building with -DPORT=GTK -DUSE_GTK3=ON -DUSE_SOUP2=ON.
webkitgtk-6.0 contains a bunch of API changes, and all deprecated APIs were removed. If you’re upgrading to webkitgtk-6.0, then you’re also upgrading your application from GTK 3 to GTK 4, and have to adapt to much bigger GTK API changes anyway, so this seemed like a good opportunity to break compatibility and fix old mistakes for the first time in a very long time.

webkit2gtk-4.1 is exactly the same as webkit2gtk-4.0, except for the libsoup API version that it links against.

webkit2gtk-4.0 is — remarkably — mostly API stable since it was released in September 2014. Some particular C APIs have been deprecated and don’t work properly anymore, but no stable C APIs have been removed during this time, and library ABI is maintained. (Caveat: WebKitGTK used to have unstable DOM APIs, some of which were removed before the DOM API was eventually stabilized. Nowadays, the DOM APIs are all stable but deprecated in webkit2gtk-4.1 and webkit2gtk-4.0, and removed in webkitgtk-6.0.)

If you are interested in history, here are the older API versions that do not matter anymore:
- webkit2gtk-5.0: This was an unstable API version used during development of the GTK 4 API, intended for WebKit developers rather than application developers. It was obsoleted by webkitgtk-6.0.
- webkit2gtk-3.0: original API version of WebKitGTK for GTK 3 and libsoup 2, obsoleted by webkit2gtk-4.0. This was the original “WebKit 2” API version, which was only used by a few applications before it was removed one decade ago (history).
- webkitgtk-3.0: note the missing “2”, this is “WebKit 1” (predating the modern multiprocess architecture) for GTK 3. This API version was widely-used on Linux, and its removal one decade ago precipitated a security crisis which I called The Great API Break. (This crisis was worth it. Modern WebKit’s multiprocess architecture is far more secure than the old single-process architecture.)
- webkitgtk-1.0: the original WebKitGTK API version, this is “WebKit 1” for GTK 2. This API version was also widely-used on Linux before it was removed in The Great API Break.
Fedora and RHEL users, are you confused by all the different confusing downstream package names? Here is your map:
- webkitgtk6.0, webkit2gtk4.1, and webkit2gtk4.0: This is the current binary package naming in Fedora, corresponding precisely to the WebKitGTK API version to reduce confusion.
- webkit2gtk3: old name for webkit2gtk4.0, still used in RHEL 9 and RHEL 8
- webkitgtk4: even older name for webkit2gtk4.0, still used in RHEL 7
- webkitgtk3: this is the webkitgtk-3.0 API version, still used in RHEL 7
- webkitgtk: this is webkitgtk-1.0, used in RHEL 6
Notably, webkit2gtk4.0, webkit2gtk3, and webkitgtk4 are all the same thing.
April 28, 2025
Safety on Matrix
Hello Matrix users, please review the following:
- Update from Matrix Trust & Safety team regarding attacks on the Matrix network
- GNOME’s new Matrix safety guidelines
Be cautious with unexpected private message invites. Do not accept private message invites from users you do not recognize. If you want to talk to somebody who has rejected your invite, contact them in a public room first.

Fractal users: Fractal 11.rc will block media preview by default in public rooms, and can be configured to additionally block media preview in private rooms if desired.

Matrix server administrators: please review the Matrix.org Foundation announcement Introducing Policy Servers.
April 21, 2025
I Entered My GitHub Credentials into a Phishing Website!
We all think we’re smart enough to not be tricked by a phishing attempt, right? Unfortunately, I know for certain that I’m not, because I entered my GitHub password into a lookalike phishing website a year or two ago. Oops! Fortunately, I noticed right away, so I simply changed my unique, never-reused password and moved on. But if the attacker were smarter, I might have never noticed. (This particular attack website was relatively unsophisticated and proxied only an unauthenticated view of GitHub, a big clue that something was wrong. Update: I want to be clear that it would have been very easy for the attacker to simply redirect me to the real github.com after stealing my credentials, in which case I would not have noticed the attack and would not have known to change my password.)

You might think multifactor authentication is the best defense against phishing. Nope. Although multifactor authentication is a major security improvement over passwords alone, and the particular attack that tricked me did not attempt to subvert multifactor authentication, it’s actually unfortunately pretty easy for phishers to defeat most multifactor authentication if they wish to do so:
- Multifactor authentication based on SMS is extremely insecure (because SS7 is insecure)
- Multifactor authentication based on phone calls is also insecure (because SIM swapping isn’t going away; determined attackers will steal your phone number if it’s an obstacle to them)
- Multifactor authentication based on authenticator apps (using TOTP or HOTP) is much better in general, but still fails against phishing. When you paste your one-time access code into a phishing website, the phishing website can simply “proxy” the access code you kindly provided to them by submitting it to the real website. This only allows authenticating once, but once is usually enough.
Fortunately, there is a solution: passkeys. Based on FIDO2 and WebAuthn, passkeys resist phishing because the authentication process depends on the domain of the service that you’re actually connecting to. If you think you’re visiting https://example.com, but you’re actually visiting a copycat website with a Cyrillic а instead of Latin a, then no worries: the authentication will fail, and the frustrated attacker will have achieved nothing.

The most popular form of passkey is local biometric authentication running on your phone, but any hardware security key (e.g. YubiKey) is also a good bet.

target.com Is More Secure than Your Bank!

I am not joking when I say that target.com is more secure than your bank (which is probably still relying on SMS or phone calls, and maybe even allows you to authenticate using easily-guessable security questions):

target.com is introducing passkeys!

Good job for supporting passkeys, Target.

It’s probably perfectly fine for Target to support passkeys alongside passwords indefinitely. Higher-security websites that want to resist phishing (e.g. your employer’s SSO service) should consider eventually allowing only passkeys.

No Passkeys in WebKitGTK

Unfortunately for GNOME users, WebKitGTK does not yet support WebAuthn, so passkeys will not work in GNOME Web (Epiphany). That’s my browser of choice, so I’ve never touched a passkey before and don’t actually know how well they work in practice. Maybe do as I say and not as I do? If you require high security, you will unfortunately need to use Firefox or Chrome instead, at least for the time being.

Why Was Michael Visiting a Fake github.com?

The fake github.com appeared higher than the real github.com in the DuckDuckGo search results for whatever I was looking for at the time. :(
August 5, 2024
How to Get Hacked by North Korea
Good news: exploiting memory safety vulnerabilities is becoming more difficult. Traditional security vulnerabilities will remain a serious threat, but attackers prefer to take the path of least resistance, and nowadays that is to attack developers rather than the software itself. Once the attackers control your computer, they can attempt to perform a supply chain attack and insert backdoors into your software, compromising all of your users at once.

If you’re a software developer, it’s time to start focusing on the possibility that attackers will target you personally. Yes, you. If you use Linux, macOS, or Windows, take a moment to check your home directory for a hidden .n2 folder. If it exists, alas! You have been hacked by the North Koreans. (Future malware campaigns will presumably be more stealthy than this.)

Attackers who target developers are currently employing two common strategies:
- Fake job interview: you’re applying to job postings and the recruiter asks you to solve a programming problem of some sort, which involves installing software from NPM (or PyPI, or another language ecosystem’s package manager).
- Fake debugging request: you receive a bug report and the reporter helpfully provides a script for reproducing the bug. The script may have dependencies on packages in NPM (or PyPI, or another language ecosystem’s package manager) to make it harder to notice that it’s malware. I saw a hopefully innocent bug report that was indistinguishable from such an attack just last week.
But of course you would never execute such a script without auditing the dependencies thoroughly! (Insert eyeroll emoji here.)

By now, most of us are hopefully already aware of typosquatting attacks, and the high risk that untrustworthy programming language packages may be malicious. But you might not have considered that attackers might target you personally. Exercise caution whenever somebody asks you to install packages from these sources. Take the time to start a virtual machine before running the code. (An isolated podman container probably works too?) Remember, attackers will target the path of least resistance. Virtualization or containerization raises the bar considerably, and the attacker will probably move along to an easier target.
July 29, 2024
GNOME 45 Core Apps Update
It’s been a few months since I last reviewed the state of GNOME core apps. For GNOME 45, we have implemented the changes proposed in the “Imminent Core App Changes” section of that blog post:
- Loupe enters core as GNOME’s new image viewer app, developed by Christopher Davis and Sophie Herold. Loupe will be branded as Image Viewer and replaces Eye of GNOME, which will no longer use the Image Viewer branding. Eye of GNOME will continue to be maintained by Felix Riemann, and contributions are still welcome there.
- Snapshot enters core as GNOME’s new camera app, developed by Maximiliano Sandoval and Jamie Murphy. Snapshot will be branded as Camera and replaces Cheese. Cheese will continue to be maintained by David King, and contributions are still welcome there.
- GNOME Photos has been removed from core without replacement. This application could have been retained if more developers were interested in it, but we have made the decision to remove it due to lack of volunteers interested in maintaining it. Photos will likely be archived eventually, unless a new maintainer volunteers to save it.
GNOME 45 beta will be released imminently with the above changes. Testing the release and reporting bugs is much appreciated.

We are also looking for volunteers interested in helping implement future core app changes. Specifically, improvements are required for Music to remain in core, and improvements are required for Geary to enter core. We’re also not quite sure what to do with Contacts. If you’re interested in any of these projects, consider getting involved.
August 17, 2023
GNOME Core Apps Update
It’s been a while since my big core app reorganization for GNOME 3.22. Here is a history of core app changes since then:
- GNOME 3.26 (September 2017) added Music, To Do (which has since been renamed to Endeavor), and Document Scanner (simple-scan). (I blogged about this at the time, then became lazy and stopped blogging about core app updates, until now.)
- To Do was removed in GNOME 3.28 (March 2018) due to lack of consensus over whether it should really be a core app. As a result of this, we improved communication between GNOME release team and design team to ensure both teams agree on future core app changes. Mea culpa.
- Documents was removed in GNOME 3.32 (March 2019).
- A new Developer Tools subcategory of core was created in GNOME 3.38 (September 2020), adding Builder, dconf Editor, Devhelp, and Sysprof. These apps are only interesting for software developers and are not intended to be installed by default in general-purpose operating systems like the rest of GNOME core.
- GNOME 41 (September 2021) featured the first larger set of changes to GNOME core since GNOME 3.22. This release removed Archive Manager (file-roller), since Files (nautilus) is now able to handle archives, and also removed gedit (formerly Text Editor). It added Connections and a replacement Text Editor app (gnome-text-editor). It also added a new Mobile subcategory of core, for apps intended for mobile-focused operating systems, featuring the dialer app Calls. (To date, the Mobile subcategory has not been very successful: so far Calls is the only app included there.)
- GNOME 42 (March 2022) featured a second larger set of changes. Screenshot was removed because GNOME Shell gained a built-in screenshot tool. Terminal was removed in favor of Console (kgx). We also moved Boxes to the Developer Tools subcategory, to recommend that it no longer be installed by default in general purpose operating systems.
- GNOME 43 (September 2022) added D-Spy to Developer Tools.
OK, now we’re caught up on historical changes. So, what to expect next?

New Process for Core Apps Changes

Although most of the core app changes have gone smoothly, we ran into some trouble replacing Terminal with Console. Console provides a fresher and simpler user interface on top of vte, the same terminal backend used by Terminal, so Console and Terminal share much of the same underlying functionality. This means work of the Terminal maintainers is actually key to the success of Console. Using a new terminal app rather than evolving Terminal allowed for bigger changes to the default user experience without upsetting users who prefer the experience provided by Terminal. I think Console is generally nicer than Terminal, but it is missing a few features that Fedora Workstation developers thought were important to have before replacing Terminal with Console. Long story short: this core app change was effectively rejected by one of our most important downstreams. Since then, Console has not seen very much development, and accordingly it is unlikely to be accepted into Fedora Workstation anytime soon. We messed up by adding the app to core before downstreams were comfortable with it, and at this point it has become unclear whether Console should remain in core or whether we should give up and bring back Terminal. Console remains for now, but I’m not sure where we go from here. Help welcome.

To prevent this situation from happening again, Chris and Sophie developed a detailed and organized process for adding or removing core apps, including a new Incubator category designed to provide notice to downstreams that we are considering adding new apps to GNOME core. The new Incubator is much more structured than my previous short-lived Incubator attempt in GNOME 3.22. When apps are added to Incubator, I’ve been proactively asking other Fedora Workstation developers to provide feedback to make sure the app is considered ready there, to avoid a repeat of the situation with Console. Other downstreams are also welcome to watch the Incubator/Submission project and provide feedback on newly-submitted apps, which should allow plenty of heads-up so downstreams can let us know sooner rather than later if there are problems with Incubator apps. Hopefully this should ensure apps are actually adopted by downstreams when they enter GNOME core.

Imminent Core App Changes

Currently there are two apps in Incubator. Loupe is a new image viewer app developed by Chris and Sophie to replace Image Viewer (eog). Snapshot is a new camera app developed by Maximiliano and Jamie to replace Cheese. These apps are maturing rapidly and have received primarily positive feedback thus far, so they are likely to graduate from Incubator and enter GNOME core sooner rather than later. The time to provide feedback is now. Don’t be surprised if Loupe is included in core for GNOME 45.

In addition to Image Viewer and Cheese, we are also considering removing Photos. Photos is one of our “content apps” designed to allow browsing an entire collection of files independently of their filesystem locations. Historically, the other two content apps were Documents and Music. The content app strategy did not work very well for Documents, since a document browser doesn’t really offer many advantages over a file browser, but Photos and Music are both pretty decent at displaying your collection of pictures or songs, assuming you have such a collection. We have been discussing what to do with Photos and the other content apps for a very long time, at least since 2015. It took a very long time to reach some rough consensus, but we have finally agreed that the design of Photos still makes sense for GNOME: having a local app for viewing both local and cloud photos is still useful. However, Photos is no longer actively maintained. Several basic functionality bugs imperiled timely release of Fedora 37 last fall, and the app is less useful than previously because it no longer integrates with cloud services like Google Photos. (The Google integration depends on libgdata, which was removed from GNOME 44 because it did not survive the transition to libsoup 3.) Photos has failed the new core app review process due to lack of active maintenance, and will be soon be removed from GNOME core unless a new maintainer steps up to take care of it. Volunteers welcome.

Future Core App Changes

Lastly, I want to talk about some changes that are not yet planned, but might occur in the future. Think of this entire section as brainstorming rather than any concrete plans.

Like Photos, we have also been discussing the status of Music. The popularity of DRM-encumbered cloud music services has increased, and local music storage does not seem to be as common as it used to be. If you do have local music, Music is pretty decent at handling it, but there are prominent bugs and missing features (like the ability to select which folders to index) detracting from the user experience. We do not have consensus on whether having a core app to play local music files still makes sense, since most users probably do not have a local music collection anymore. But perhaps all that is a moot point, because Videos (totem) 3.38 removed support for opening audio files, leaving us with no core apps capable of playing audio for the past 2.5 years. Previously, our default music player was Videos, which was really weird, and now we have none; Music can only play audio files that you’ve navigated to using Music itself, so it’s impossible for Music to be our default music player. My suggestion to rename Videos to Media Player and handle audio files again has not been well-received, so the most likely solution to this conundrum is to teach Music how to open audio files, likely securing its future in core. A merge request exists, but it does not look close to landing. Fedora Workstation is still shipping Rhythmbox rather than Music specifically due to this problem. My opinion is this needs to be resolved for Music to remain in core.

It would be nice to have an email client in GNOME core, since everybody uses email and local clients are much nicer than webmail. The only plausible candidate here is Geary. (If you like Evolution, consider that you might not like the major UI changes and many, many feature removals that would be necessary for Evolution to enter GNOME core.) Geary has only one active maintainer, and adding a big application that depends on just one person seems too risky. If more developers were interested in maintaining Geary, it would feel like a safer addition to GNOME core.

Contacts feels a little out of place currently. It’s mostly useful for storing email addresses, but you cannot actually do anything with them because we have no email application in core. Like Photos, Contacts has had several recent basic functionality bugs that imperiled timely Fedora releases, but these seem to have been largely resolved, so it’s not causing urgent problems. Still, for Contacts to remain in the long term, we’re probably going to need another maintainer here too. And perhaps it only makes sense to keep if we add Geary.

Finally, should Maps move to the Mobile category? It seems clearly useful to have a maps app installed by default on a phone, but I wonder how many desktop users really prefer to use Maps rather than a maps website.

GNOME 44 Core Apps

I’ll end this blog post with an updated list of core apps as of GNOME 44. Here they are:
- Main category (26 apps):
  - Calculator
  - Calendar
  - Characters
  - Cheese
  - Clocks
  - Connections
  - Console (kgx)
  - Contacts
  - Disks (gnome-disk-utility)
  - Disk Usage Analyzer (baobab)
  - Document Scanner (simple-scan)
  - Document Viewer (evince)
  - Files (nautilus)
  - Fonts (gnome-font-viewer)
  - Help (yelp)
  - Image Viewer (eog)
  - Logs
  - Maps
  - Music
  - Photos
  - Software
  - System Monitor
  - Text Editor
  - Videos (totem)
  - Weather
  - Web (epiphany)
- Developer Tools (6 apps):
  - Boxes
  - Builder
  - dconf Editor
  - Devhelp
  - D-Spy
  - sysprof
- Mobile (1 app):
  - Calls
May 10, 2023
WebKitGTK API for GTK 4 Is Now Stable
With the release of WebKitGTK 2.40.0, WebKitGTK now finally provides a stable API and ABI for GTK 4 applications. The following API versions are provided:
- webkit2gtk-4.0: this API version uses GTK 3 and libsoup 2. It is obsolete and users should immediately port to webkit2gtk-4.1. To get this with WebKitGTK 2.40, build with -DPORT=GTK -DUSE_SOUP2=ON.
- webkit2gtk-4.1: this API version uses GTK 3 and libsoup 3. It contains no other changes from webkit2gtk-4.0 besides the libsoup version. With WebKitGTK 2.40, this is the default API version that you get when you build with -DPORT=GTK. (In 2.42, this might require a different flag, e.g. -DUSE_GTK3=ON, which does not exist yet.)
- webkitgtk-6.0: this API version uses GTK 4 and libsoup 3. To get this with WebKitGTK 2.40, build with -DPORT=GTK -DUSE_GTK4=ON. (In 2.42, this might become the default API version.)
WebKitGTK 2.38 had a different GTK 4 API version, webkit2gtk-5.0. This was an unstable/development API version and it is gone in 2.40, so applications using it will break. Fortunately, that should be very few applications. If your operating system ships GNOME 42, or any older version, or the new GNOME 44, then no applications use webkit2gtk-5.0 and you have no extra work to do. But for operating systems that ship GNOME 43, webkit2gtk-5.0 is used by gnome-builder, gnome-initial-setup, and evolution-data-server:
- For evolution-data-server 3.46, use this patch which applies on evolution-data-server 3.46.4.
- For gnome-initial-setup 43, use this patch which applies on gnome-initial-setup 43.2. (Update: for your convenience, this patch will be included in gnome-initial-setup 43.3.)
- For gnome-builder 43, all required changes are present in version 43.7.
Remember, patching is only needed for GNOME 43. Other versions of GNOME will have no problems with WebKitGTK 2.40.

There is no proper online documentation yet, but in the meantime you can view the markdown source for the migration guide to help you with porting your applications. Although the API is now stable and it is close to feature parity with the GTK 3 version, there are some problems to be aware of:
- No support for accessibility. The GTK and WebKitGTK developers are collaborating on a plan to fix this, and I hope this will be ready for WebKitGTK 2.42.
- Multiple NVIDIA proprietary graphics driver users report that it doesn’t work at all. Nobody currently plans to work on this. The WebKitGTK developers do not use NVIDIA graphics, and it will have to be debugged by somebody who does.
- Various smaller regressions here and there. The GTK 4 version is almost as stable as GTK 3, and now is a good time to start using it. As with any interesting new software, there is still some work to do.
Big thanks to everyone who helped make this possible.
March 21, 2023
Stop Using QtWebKit
Today, WebKit in Linux operating systems is much more secure than it used to be. The problems that I previously discussed in this old, formerly-popular blog post are nowadays a thing of the past. Most major Linux operating systems now update WebKitGTK and WPE WebKit on a regular basis to ensure known vulnerabilities are fixed. (Not all Linux operating systems include WPE WebKit. It’s basically WebKitGTK without the dependency on GTK, and is the best choice if you want to use WebKit on embedded devices.) All major operating systems have removed older, insecure versions of WebKitGTK (“WebKit 1”) that were previously a major security problem for Linux users. And today WebKitGTK and WPE WebKit both provide a webkit_web_context_set_sandbox_enabled() API which, if enabled, employs Linux namespaces to prevent a compromised web content process from accessing your personal data, similar to Flatpak’s sandbox. (If you are a developer and your application does not already enable the sandbox, you should fix that!)

Unfortunately, QtWebKit has not benefited from these improvements. QtWebKit was removed from the upstream WebKit codebase back in 2013. Its current status in Fedora is, unfortunately, representative of other major Linux operating systems. Fedora currently contains two versions of QtWebKit:
- The qtwebkit package contains upstream QtWebKit 2.3.4 from 2014. I believe this is used by Qt 4 applications. For avoidance of doubt, you should not use applications that depend on a web engine that has not been updated in eight years.
- The newer qt5-qtwebkit contains Konstantin Tokarev’s fork of QtWebKit, which is de facto the new upstream and without a doubt the best version of QtWebKit available currently. Although it has received occasional updates, most recently 5.212.0-alpha4 from March 2020, it’s still based on WebKitGTK 2.12 from 2016, and the release notes bluntly state that it’s not very safe to use. Looking at WebKitGTK security advisories beginning with WSA-2016-0006, I manually counted 507 CVEs that have been fixed in WebKitGTK 2.14.0 or newer.
These CVEs are mostly (but not exclusively) remote code execution vulnerabilities. Many of those CVEs no doubt correspond to bugs that were introduced more recently than 2.12, but the exact number is not important: what’s important is that it’s a lot, far too many for backporting security fixes to be practical. Since qt5-qtwebkit is two years newer than qtwebkit, the qtwebkit package is no doubt in even worse shape. And because QtWebKit does not have any web process sandbox, any remote code execution is game over: an attacker that exploits QtWebKit gains full access to your user account on your computer, and can steal or destroy all your files, read all your passwords out of your password manager, and do anything else that your user account can do with your computer. In contrast, with WebKitGTK or WPE WebKit’s web process sandbox enabled, attackers only get access to content that’s mounted within the sandbox, which is a much more limited environment without access to your home directory or session bus.

In short, it’s long past time for Linux operating systems to remove QtWebKit and everything that depends on it. Do not feed untrusted data into QtWebKit. Don’t give it any HTML that you didn’t write yourself, and certainly don’t give it anything that contains injected data. Uninstall it and whatever applications depend on it.

Update: I forgot to mention what to do if you are a developer and your application still uses QtWebKit. You should ensure it uses the most recent release of QtWebEngine for Qt 6. Do not use old versions of Qt 6, and do not use QtWebEngine for Qt 5.
November 4, 2022

Best Practices for Build Options

Build options are sometimes tricky to get right. Here’s my take on best practices. The golden rule is to set good upstream defaults. Everything else follows from this.

Rule 1: Choose Good Upstream Defaults

Occasionally I see upstream developers complain that a downstream operating system has built their software “incorrectly,” generally because some important dependency or feature has been disabled. Sometimes downstreams really do mess up, but more often poor upstream defaults are to blame. Upstreams must set good defaults because upstream software developers know far more about their projects than downstream packagers do. Upstreams generally have a good idea of how they expect software to be built by downstreams, whereas downstreams generally do not. Accordingly, do the thinking upstream whenever possible. When you set good defaults, it becomes easier for downstreams to build your software the way you expect, because active effort is required for downstreams to mess things up.

For example, say a project has the following two build options:

Option Name	Default Value
`--enable-thing-you-usually-want-enabled`	`false`
`--disable-thing-you-rarely-want-disabled`	`true`

The thing you usually want enabled is not enabled by default, and the thing you rarely want disabled is disabled by default. Sad. Unfortunately, this pattern used to be extremely common with Autotools build systems, because in the real world, the names of the options are more subtle than this, and also because nobody likes squinting at configure.ac files to audit whether the options make sense. Meson build systems tend to be somewhat better because meson_options.txt is separate from the rest of the build definitions, making it easier to review all your options and check to ensure their defaults are set appropriately. However, there are still a few more subtle ways you can mess up your Meson build system, which I’ll discuss below.

Rule 2: Prefer Upstream Defaults Downstream

Conversely, downstreams should not second-guess upstream defaults unless you have a good reason to do so and really know what you’re doing.

For example, glib-networking’s Meson build system provides you with two different TLS backend options: OpenSSL or GnuTLS. The GnuTLS backend is enabled by default (well, sort of, see the next section on auto dependencies) while the OpenSSL backend is disabled by default. There’s a good reason for this: the OpenSSL backend is half-baked, partially due to bugs in glib-networking, and partially because OpenSSL just cannot do certain things that GnuTLS can. The OpenSSL backend is provided because some environments really do require it for license reasons, but it’s not the right choice for general-purpose operating systems. It may be tempting to think that you can pick whichever library you prefer, but you really should not.

Another example: WebKitGTK’s CMake build system provides a USE_WPE_RENDERER build option, which is enabled by default. This option controls which graphics rendering stack is used: if enabled, rendering uses libwpe and wpebackend-fdo, whereas if disabled, rendering uses a legacy internal Wayland compositor. The option is provided because libwpe and wpebackend-fdo are newer dependencies that are expected to be unavailable on older (pre-2020) operating systems, so older operating systems legitimately need to be able to disable it. But this configuration receives little serious testing and the upstream developers do not notice when it breaks, so you really should not be using it unless you have to. This recently caused rendering bugs that appeared to be distribution-specific, which upstream developers were not willing to investigate because upstream developers could not reproduce the issue.

Sticking with upstream defaults is generally safest. Sometimes you really need to override them. If so, go ahead. Just be careful.

Rule 3: Handle Auto Dependencies and Features with Care

The worst default ever is “build with feature enabled only if dependency xyz is installed; otherwise, disable it.” This is called an auto dependency. If using CMake or Autotools, auto dependencies are almost never permissible, and in this case “handle with care” means repent and fix it. Auto dependencies are acceptable only if you are using the Meson build system.

The theory behind auto dependencies is that it’s convenient for people casually building the software to do so with the fewest number of build errors possible, which is true. Problem is, this screws over serious production builds of your software by requiring your downstreams to possess magical knowledge of what dependencies are required to build your software properly. Users generally expect most features to be enabled at build time, but if upstream uses auto dependencies, the result is a build dependencies lottery: your feature will be enabled or disabled due to luck, based on which downstream build dependencies transitively depend on which other build dependencies. Even if it’s built properly today, that could easily change tomorrow when some other dependency changes in some other package. Just say no. Do not expect downstreams to look at your build system at all, let alone study the possible build options and make accurate judgments about which build dependencies are required to enable them. Avoiding auto dependencies is part of setting good upstream defaults.

Look at this example from WebKit’s OptionsGTK.cmake:

if (ENABLE_SPELLCHECK)
    find_package(Enchant)
    if (NOT PC_ENCHANT_FOUND)
        message(FATAL_ERROR "Enchant is needed for ENABLE_SPELLCHECK")
    endif ()
endif ()

ENABLE_SPELLCHECK is ON by default. If you don’t have enchant installed, the build will fail unless you manually disable it by passing -DENABLE_SPELLCHECK=OFF". This makes it hard to mess up: downstreams have to make an intentional choice to build with spellchecking disabled. It cannot happen by accident.

Many projects would instead write it like this:

if (ENABLE_SPELLCHECK)
    find_package(Enchant)
    if (NOT PC_ENCHANT_FOUND)
        set(ENABLE_SPELLCHECK OFF)
    endif ()
endif ()

But this is an auto dependency, which results in downstream build dependency lottery. If you write your build system like this, you cannot complain when the feature winds up disabled by mistake in downstream builds. Don’t do this.

Exception: if you use Meson, auto dependencies are acceptable if you use the feature option type and set the default to auto. Although auto features are silently enabled or disabled by default depending on whether the required dependency is present, you can easily override this behavior for serious production builds by passing -Dauto_features=enabled, which enables all the auto features and will result in build failures if dependencies are missing. All major Linux operating systems do this when building Meson packages, so Meson’s auto features should not cause problems.

Rule 4: Be Very Careful with Meson’s Build Types

Let’s categorize software builds into production builds or non-production builds. A production build is intended to be either distributed to end users or else run production workloads, whereas a non-production build is intended for testing or development and might have extra debug features enabled, like slow assertions. (These are more commonly called release builds or debug builds, but that terminology would be very confusing in the context of this discussion, as you’re about to see.)

The CMake and Meson build systems give us more than just two build types. Compare CMake build types to the corresponding Meson build types:

CMake Build Type	Meson Build Type	Meson `debug` Option	Production Build? (excludes Windows)
`Release`	`release`	`false`	Yes
`Debug`	`debug`	`true`	No
`RelWithDebInfo`	`debugoptimized`	`true`	Yes, be careful!
`MinSizeRel`	`minsize`	`true`	Yes, be careful!
N/A	`plain`	`false`	Yes

To simplify, let’s exclude Windows from the discussion for now. (We’ll come back to Windows in a bit.) Now, notice the nomenclature difference between CMake’s RelWithDebInfo (“release with debuginfo”) build type versus Meson’s debugoptimized build type. This build type functions exactly the same for both Meson and CMake, but CMake’s name is better because it clearly indicates that this is a release or production build type, whereas the Meson name seems to indicate it is a debug or non-production build type, and Meson’s debug option is set to true. In fact, it is an optimized production build with debuginfo enabled, the same style of build that almost all Linux operating systems use for their packages (although operating systems use the plain build type instead). The same problem exists for Meson’s minsize build type. This is another production build type where debug is true.

The Meson build type name accurately reflects that the debug option is enabled, but this is very confusing because for most platforms, that option only controls whether debuginfo is generated. Looking at the table above, you can see that you must never use the debug option alone to decide whether you have a production build or a non-production build. As the table indicates, the only non-production build type is the vanilla debug build type, which you can detect by checking the combination of the debug and optimization options. You have a non-production (debug) build if debug is true and if optimization is 0 or g; otherwise, you have a production build. I wrote this in bold because it is important and not at all obvious. (However, before applying this rule in a cross-platform project, keep reading below to see the huge caveat regarding Windows.)

Here’s an example of what not to do in your meson.build:

# Use debug/optimization flags to determine whether to enable debug or disable
# cast checks
gtk_debug_cflags = []
debug = get_option('debug')
optimization = get_option('optimization')
if debug
  gtk_debug_cflags += '-DG_ENABLE_DEBUG'
  if optimization in ['0', 'g']
    gtk_debug_cflags += '-DG_ENABLE_CONSISTENCY_CHECKS'
  endif
elif optimization in ['2', '3', 's']
  gtk_debug_cflags += ['-DG_DISABLE_CAST_CHECKS', '-DG_DISABLE_ASSERT']
endif

This is from GTK’s meson.build. The code based only on the optimization option is OK, but the code that sets -DG_ENABLE_DEBUG is looking only at the debug option. What the code really wants to do is set G_ENABLE_DEBUG if this is a non-production build, but instead it is tied to debuginfo, which is not the desired result. Downstreams are forced to scratch their heads as to what they should do. Impassioned build engineers have held spirited debates about this particular meson.build snippet. Don’t do this! (I will submit a merge request to improve this.)

Here’s a much better, although still not perfect, example of how to do the same thing, this time from GLib’s meson.build:

# Use debug/optimization flags to determine whether to enable debug or disable
# cast checks
glib_debug_cflags = []
glib_debug = get_option('glib_debug')
if glib_debug.enabled() or (glib_debug.auto() and get_option('debug'))
  glib_debug_cflags += ['-DG_ENABLE_DEBUG']
  message('Enabling various debug infrastructure')
elif get_option('optimization') in ['2', '3', 's']
  glib_debug_cflags += ['-DG_DISABLE_CAST_CHECKS']
  message('Disabling cast checks')
endif

if not get_option('glib_assert')
  glib_debug_cflags += ['-DG_DISABLE_ASSERT']
  message('Disabling GLib asserts')
endif

if not get_option('glib_checks')
  glib_debug_cflags += ['-DG_DISABLE_CHECKS']
  message('Disabling GLib checks')
endif

Notice how GLib provides explicit build options that allow downstreams to decide whether debug should be enabled or not. Using explicit build options here was a good idea! The defaults for glib_assert and glib_checks are intentionally set to true to encourage their use in production builds, while G_DISABLE_CAST_CHECKS is based only on the optimization level. But sadly, if not explicitly configured, GLib sets the value of the glib_debug_cflags option automatically, based on only the value of the debug option. This is actually an OK use of an auto feature, because it is a carefully-considered attempt to provide good default behavior for downstreams, but it fails here because it assumes that debug means “non-production build,” which we have previously established cannot be determined without checking optimization as well. (I will submit a merge request to improve this.)

Here’s another helpful table that shows how the various build types correspond to CFLAGS:

CMake/Meson Build Type	CMake `CFLAGS`	Meson `CFLAGS`
`Release`/`release`	`-O3 -DNDEBUG`	`-O3`
`Debug`/`debug`	`-g`	`-O0 -g`
`RelWithDebInfo`/`debugoptimized`	`-O2 -g -DNDEBUG`	`-O2 -g`
`MinSizeRel`/`minsize`	`-Os -DNDEBUG`	`-Os -g`

Notice Meson’s minsize build type includes debuginfo, while CMake’s does not. Since debuginfo requires a huge amount of space, CMake’s behavior seems better here. We’ll discuss NDEBUG momentarily.

OK, so that all makes sense, right? Well I thought so too, until I ran a draft of this blog post past Jussi, who pointed out that the Meson build types function completely differently on Windows than they do on other platforms. Unfortunately, whereas on most platforms the debug option only controls debuginfo generation, on Windows it instead controls whether the C library enables extra runtime debugging checks. So while debugoptimized and minsize are production build types on Linux and have nice corresponding CMake build types, they are non-production build types on Windows. This is a Meson defect. The point to remember is that the debug option is completely different on Windows than it is on other platforms, so my otherwise-nice rule for detecting production builds does not work properly on Windows. Cross-platform projects need to be especially careful with the debug option. There are various ways this could be fixed in Meson in the future: a nice simple proposal would be to add a new debuginfo option separate from debug, then deprecate the debugoptimized build type and replace it with releasewithdebuginfo.

CMake dodges all these problems and avoids any ambiguity because its build types are named differently: “RelWithDebInfo” and “MinSizeRel” leave no doubt that you are dealing with a release (production) build.

Rule 5: Think about NDEBUG

The other behavior difference visible in the table above is that CMake defines NDEBUG for its production build types, whereas Meson has a separate option bn_debug that controls whether to define NDEBUG. NDEBUG controls whether the C and C++ assert() macro is enabled: if this value is defined, asserts are disabled. CMake is the only build system that defines NDEBUG for you automatically. You really need to think about this: if your software is performance-sensitive and contains slow assertions, the consequences of messing this up are severe, e.g. see this historical mesa bug where Fedora’s mesa package suffered a 10x slowdown because mesa upstream accidentally enabled assertions by default. Again, please, do not blame downstreams for bad upstream defaults: downstreams are (usually) not experts on upstream software, and cannot possibly be expected to pick better defaults than upstream’s.

Meson allows developers to explicitly choose whether to enable assertions in production builds. Assertions are enabled in production by default, the opposite of CMake’s behavior. Some developers prefer that all asserts be disabled in production builds to optimize speed as far as possible, but this is usually not the best choice: having assertions enabled in production provides valuable confidence that your code is actually functioning as intended, and often improves security by converting many code execution exploits into denial of service. Most assertions do not have noticeable performance impact, so I prefer to leave most assertions enabled by default in production, and disable only asserts that are slow. Hence, I like Meson’s default behavior. But many engineers disagree with me, and some projects really do need assertions disabled in production; in particular, everyone agrees that performance-sensitive assertions should not be running in production builds. If you’re using Meson and want assertions disabled in production builds, you’re in theory supposed to use b_ndebug=if-release, but it doesn’t actually work because it only disables assertions if your build type is release or plain, while leaving assertions enabled for debugoptimized and minsize builds. We’ve already established that these are both production build types, so sadly that behavior is broken. Instead, it’s better to manually define NDEBUG except in non-production builds. Again, you have a non-production (debug) build when debug is true and if optimization is 0 or g; otherwise, you have a production build (except on Windows).

Rule 6: plain Means “Production Build,” Not “No Flags”

The GNOME release team recently had an exciting debate about the meaning of Meson’s plain build type. It is impressive how build engineers can be so enthusiastic about build options!

I asked Jussi to explain the plain build type. His response was: “Meson does not, by itself, add compiler flags,” emphasis mine. It does not mean your project should not add its own compiler flags, and it certainly does not mean it’s OK to set bad defaults as long as they are vanilla-flavored. It is a production build type, and you should ensure that it receives defaults in line with the other production build types. You’ll be fine if you follow the same rule we already established: you have a non-production (debug) build if debug is true and if optimization is 0 or g; otherwise, you have a production build (except on Windows).

The plain build type exists because it makes it easier for downstreams to implement their own compiler flags. Downstreams have to pass -O2 -g via CFLAGS because CMake and Meson are the only build systems that can do this automatically, and it’s easier to let downstreams disable this functionality than to force downstreams to set different CFLAGS separately for each supported build system.

Rule 7: Don’t Forget Hardening Flags

Sadly, by default all build systems generate insecure, unhardened binaries that should never be used in production. This is true of Autotools, CMake, Meson, and likely also whatever other build system you are thinking of. You must manually add your own hardening flags or your builds will be insecure. Unfortunately this is a little complicated to do. Fedora and RHEL’s recommended compiler flags are documented here. The freedesktop-sdk and GNOME Flatpak runtimes use these recommendations as the basis for their compiler flags, and by default, so do Flatpak applications based on these runtimes. It’s actually not very easy to replicate the same hardening flags since libraries and executables require different flags, so naively setting CFLAGS is not possible. Fedora and RHEL use GCC spec files to achieve this, whereas freedesktop-sdk relies on building GCC itself with a non-default configuration (yes, second-guessing upstream defaults). The good news is that all major downstreams have figured this out one way or another, so you only need to worry about it if you’re doing your own production builds.

Conclusion

That’s all you need to know. Remember, upstreams know their software better than downstreams do, so the hard thinking should happen upstream. We can minimize mistakes and trouble if upstreams carefully set good defaults, and if downstreams deviate from those defaults only when truly necessary. Keep these rules in mind to avoid unnecessary bug reports from dissatisfied users.

History

I updated this blog post on August 3, 2022 to change the primary guidance to “You have a non-production (debug) build if debug is true and if optimization is 0 or g; otherwise, you have a production build.” Originally, I failed to consider g. -Og means “optimize debugging experience” and it is supposedly a better choice than -O0 for debugging according to gcc(1). It’s definitely not actually, but at least that’s the intent.

Jussi responded to this blog post on August 13, 2022 to discuss why Meson’s build types don’t work so well. Read his response.

July 15, 2022

Creating Quality Backtraces for Crash Reports
Hello Linux users! Help developers help you: include a quality backtrace taken with gdb each and every time you create an issue report for a crash. If you don’t, most developers will request that you provide a backtrace, then ignore your issue until you manage to figure out how to do so. Save us the trouble and just provide the backtrace with your initial report, so everything goes smoother. (Backtraces are often called “stack traces.” They are the same thing.)

Don’t just copy the lower-quality backtrace you see in your system journal into your issue report. That’s a lot better than nothing, but if you really want the crash to be fixed, you should provide the developers with a higher-quality backtrace from gdb. Don’t know how to get a quality backtrace with gdb? Read on.

(Note: this blog post is occasionally updated to maintain relevance and remove historical information. Last update: October 2023)

Modern Crash Reporting

Here are instructions for getting a quality backtrace for a crashing process on Fedora 35, or any other Linux-based OS that enables coredumpctl and debuginfod:
```
$ coredumpctl gdb
(gdb) bt full
```
Enter ‘c’ (continue) when required. Enter ‘y’ when prompted to enable debuginfod. When it’s done printing, press ‘q’ to quit. That’s it! That’s all you need to know. You’re done. Two points of note:
- When a process crashes, a core dump is caught by systemd-coredump and stored for future use. The coredumpctl gdb command opens the most recent core dump in gdb. systemd-coredump has been enabled by default in Fedora since Fedora 26.
- After opening the core dump, gdb uses debuginfod to automatically download all required debuginfo packages, ensuring the generated backtrace is useful. debuginfod has been enabled by default in Fedora since Fedora 35.
Quality Linux operating systems ought to configure both debuginfod and systemd-coredump for you, so that they are running out-of-the-box. If you’re missing debuginfod or systemd-coredump, then read on to learn how to take a backtrace without these tools. It will be more complicated, of course.

The steps above will not work if the crashing application uses Flatpak. If you’re trying to take a backtrace for an application that uses Flatpak and already have systemd-coredump working, then go ahead and skip ahead to the section below on Flatpak. If you don’t have systemd-coredump working yet, read on.

systemd-coredump

If your operating system enables systemd-coredump by default, then congratulations! This makes reporting crashes much easier because you can easily retrieve a core dump for any recent crash using the coredumpctl command. For example, coredumpctl alone will list all available core dumps. coredumpctl gdb will open the core dump of the most recent crash in gdb. coredumpctl gdb 1234 will open the core dump corresponding to the most recent crash of a process with pid 1234. It doesn’t get easier than this.

Core dumps are stored under /var/lib/systemd/coredump. systemd-coredump will automatically delete core dumps that exceed configurable size limits (2 GB by default). It also deletes core dumps if your free disk space falls below a configurable threshold (15% free by default). Additionally, systemd-tmpfiles will delete core dumps automatically after some time has passed (three days by default). This ensures your disk doesn’t fill up with old core dumps. Although most of these settings seem good to me, the default 2 GB size limit is way too low in my opinion, as it causes systemd to immediately discard crashes of any application that uses WebKit. I recommend raising this limit to 20 GB by creating an /etc/systemd/coredump.conf.d/50-coredump.conf drop-in containing the following:
```
[Coredump]
ProcessSizeMax=20G
ExternalSizeMax=20G
```
The other settings are likely sufficient to prevent your disk from filling up with core dumps.

Sadly, although systemd-coredump has been around for a good while now and many Linux operating systems have it enabled by default, many still do not. Most notably, the Debian and Ubuntu ecosystems are still not yet on board. To check if systemd-coredump is enabled on your system:
```
$ cat /proc/sys/kernel/core_pattern
```
If you see systemd-coredump, then you’re good.

To enable it in Debian or Ubuntu, just install it:
```
# apt install systemd-coredump
```
Ubuntu users, note this will cause apport to be uninstalled, since it is currently incompatible. Also note that I switched from $ (which indicates a normal prompt) to # (which indicates a root prompt).

In other operating systems, you may have to manually enable it:
```
# echo "kernel.core_pattern=|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h" > /etc/sysctl.d/50-coredump.conf
# /usr/lib/systemd/systemd-sysctl --prefix kernel.core_pattern
```
Note the exact core pattern to use changes occasionally in newer versions of systemd, so these instructions may not work everywhere.

Detour: Manual Core Dump Handling

If you don’t want to enable systemd-coredump, life is harder and you should probably reconsider, but it’s still possible to debug most crashes. First, enable core dump creation by removing the default 0-byte size limit on core files:
```
$ ulimit -c unlimited
```
This change is temporary and only affects the current instance of your shell. For example, if you open a new tab in your terminal, you will need to set the ulimit again in the new tab.

Next, run your program in the terminal and try to make it crash. A core file will be generated in the current directory. Open it by starting the program that crashed in gdb and passing the filename of the core file that was created. For example:
```
$ gdb gnome-chess ./core
```
This is downright primitive, though:
- You’re going to have a hard time getting backtraces for services that are crashing, for starters. If starting the service normally, how do you set the ulimit? I’m sure there’s a way to do it, but I don’t know how! It’s probably easier to start the service manually, but then what command line flags are needed to properly do so? It will be different for each service, and you have to figure this all out for yourself.
- Special situations become very difficult. For example, if a service is crashing only when run early during boot, or only during an initial setup session, you are going to have an especially hard time.
- If you don’t know how to reproduce a crash that occurs only rarely, it’s inevitably going to crash when you’re not prepared to manually catch the core dump. Sadly, not all crashes will occur on demand when you happen to be running the software from a terminal with the right ulimit configured.
- Lastly, you have to remember to delete that core file when you’re done, because otherwise it will take up space on your disk space until you do. You’ll probably notice if you leave core files scattered in your home directory, but you might not notice if you’re working someplace else.
Seriously, just enable systemd-coredump. It solves all of these problems and guarantees you will always have easy access to a core dump when something crashes, even for crashes that occur only rarely.

Debuginfo Installation

Now that we know how to open a core dump in gdb, let’s talk about debuginfo. When you don’t have the right debuginfo packages installed, the backtrace generated by gdb will be low-quality. Almost all Linux software developers deal with low-quality backtraces on a regular basis, because most users are not very good at installing debuginfo. Again, if you’re using Fedora 35 or newer, you don’t have to worry about this anymore because debuginfod will take care of everything for you. I would be thrilled if other Linux operating systems would quickly adopt debuginfod so we can put the era of low-quality crash reports behind us. But if you’re using an operating system that does not provide a debuginfod server, you’ll need to learn how to install debuginfo manually.

As an example, I decided to force gnome-chess to crash using the command killall -SEGV gnome-chess, then I ran coredumpctl gdb to open the resulting core dump in gdb. After a bunch of spam, I saw this:
```
Missing separate debuginfos, use: dnf debuginfo-install gnome-chess-40.1-1.fc34.x86_64
--Type <RET> for more, q to quit, c to continue without paging--
Core was generated by `/usr/bin/gnome-chess --gapplication-service'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fa23d8b55bf in __GI___poll (fds=0x5636deb06930, nfds=2, timeout=2830)
    at ../sysdeps/unix/sysv/linux/poll.c:29
29  return SYSCALL_CANCEL (poll, fds, nfds, timeout);
[Current thread is 1 (Thread 0x7fa23ca0cd00 (LWP 140177))]
(gdb)
```
If you are using Fedora, RHEL, or related operating systems, the line “missing separate debuginfos” is a good hint that debuginfo is missing. It even tells you exactly which dnf debuginfo-install command to run to remedy this problem! But this is a Fedora ecosystem feature, and you won’t see this on most other operating systems. Usually, you’ll need to manually locate the right debuginfo packages to install. Debian and Ubuntu users can do this by searching for and installing -dbg or -dbgsym packages until each frame in the backtrace looks good. You’ll just have to manually guess the names of which debuginfo packages you need to install based on the names of the libraries in the backtrace. Look here for instructions for popular operating systems.

How do you know when the backtrace looks good? When each frame has file names, line numbers, function parameters, and local variables! Here is an example of a bad backtrace, if I continue the gnome-chess example above without properly installing the required debuginfo:
```
(gdb) bt full
#0 0x00007fa23d8b55bf in __GI___poll (fds=0x5636deb06930, nfds=2, timeout=2830)
    at ../sysdeps/unix/sysv/linux/poll.c:29
        sc_ret = -516
        sc_cancel_oldtype = 0
#1 0x00007fa23eee648c in g_main_context_iterate.constprop () at /lib64/libglib-2.0.so.0
#2 0x00007fa23ee8fc03 in g_main_context_iteration () at /lib64/libglib-2.0.so.0
#3 0x00007fa23e4b599d in g_application_run () at /lib64/libgio-2.0.so.0
#4 0x00005636dd7b79a2 in chess_application_main ()
#5 0x00007fa23d7e7b75 in __libc_start_main (main=0x5636dd7aaa50 <main>, argc=2, argv=0x7fff827b6438, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff827b6428)
    at ../csu/libc-start.c:332
        self = <optimized out>
        result = <optimized out>
        unwind_buf = 
              {cancel_jmp_buf = {{jmp_buf = {94793644186304, 829313697107602221, 94793644026480, 0, 0, 0, -829413713854928083, -808912263273321683}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x2, 0x7fff827b6438}, data = {prev = 0x0, cleanup = 0x0, canceltype = 2}}}
        not_first_call = <optimized out>
#6 0x00005636dd7aaa9e in _start ()
```
This backtrace has six frames, which shows where the code was during program execution when the crash occurred. You can see line numbers for frame #0 (poll.c:29) and #5 (libc-start.c:332), and these frames also show the values of function parameters and variables on the stack, which are often useful for figuring out what went wrong. These frames have good debuginfo because I already had debuginfo installed for glibc. But frames #1 through #4 do not look so useful, showing only function names and the library and nothing else. This is because I’m using Fedora 34 rather than Fedora 35, so I don’t have debuginfod yet, and I did not install proper debuginfo for libgio, libglib, and gnome-chess. (The function names are actually only there because programs in Fedora include some limited debuginfo by default. In many operating systems, you will see ??? instead of function names.) A developer looking at this backtrace is not going to know what went wrong.

Now, let’s run the recommended debuginfo-install command:
```
# dnf debuginfo-install gnome-chess-40.1-1.fc34.x86_64
```
When the command finishes, we’ll start gdb again, using coredumpctl gdb just like before. This time, we see this:
```
Missing separate debuginfos, use: dnf debuginfo-install avahi-libs-0.8-14.fc34.x86_64 colord-libs-1.4.5-2.fc34.x86_64 cups-libs-2.3.3op2-7.fc34.x86_64 fontconfig-2.13.94-2.fc34.x86_64 glib2-2.68.4-1.fc34.x86_64 graphene-1.10.6-2.fc34.x86_64 gstreamer1-1.19.1-2.1.18.4.fc34.x86_64 gstreamer1-plugins-bad-free-1.19.1-3.1.18.4.fc34.x86_64 gstreamer1-plugins-base-1.19.1-2.1.18.4.fc34.x86_64 gtk4-4.2.1-1.fc34.x86_64 json-glib-1.6.6-1.fc34.x86_64 krb5-libs-1.19.2-2.fc34.x86_64 libX11-1.7.2-3.fc34.x86_64 libX11-xcb-1.7.2-3.fc34.x86_64 libXfixes-6.0.0-1.fc34.x86_64 libdrm-2.4.107-1.fc34.x86_64 libedit-3.1-38.20210714cvs.fc34.x86_64 libepoxy-1.5.9-1.fc34.x86_64 libgcc-11.2.1-1.fc34.x86_64 libidn2-2.3.2-1.fc34.x86_64 librsvg2-2.50.7-1.fc34.x86_64 libstdc++-11.2.1-1.fc34.x86_64 libxcrypt-4.4.25-1.fc34.x86_64 llvm-libs-12.0.1-1.fc34.x86_64 mesa-dri-drivers-21.1.8-1.fc34.x86_64 mesa-libEGL-21.1.8-1.fc34.x86_64 mesa-libgbm-21.1.8-1.fc34.x86_64 mesa-libglapi-21.1.8-1.fc34.x86_64 nettle-3.7.3-1.fc34.x86_64 openldap-2.4.57-5.fc34.x86_64 openssl-libs-1.1.1l-1.fc34.x86_64 pango-1.48.9-2.fc34.x86_64
```
Yup, Fedora ecosystem users will need to run dnf debuginfo-install twice to install everything required, because gdb doesn’t list all required packages until the second time. Next, we’ll run coredumpctl gdb one last time. There will usually be a few more debuginfo packages that are still missing because they’re not available in the Fedora repositories, but now you’ll probably have enough to get a quality backtrace:
```
(gdb) bt full
#0  0x00007fa23d8b55bf in __GI___poll (fds=0x5636deb06930, nfds=2, timeout=2830)
    at ../sysdeps/unix/sysv/linux/poll.c:29
        sc_ret = -516
        sc_cancel_oldtype = 0
#1  0x00007fa23eee648c in g_main_context_poll
    (priority=, n_fds=2, fds=0x5636deb06930, timeout=, context=0x5636de7b24a0)
    at ../glib/gmain.c:4434
        ret = 
        errsv = 
        poll_func = 0x7fa23ee97c90 
        max_priority = 2147483647
        timeout = 2830
        some_ready = 
        nfds = 2
        allocated_nfds = 2
        fds = 0x5636deb06930
        begin_time_nsec = 30619110638882
#2  g_main_context_iterate.constprop.0
    (context=context@entry=0x5636de7b24a0, block=block@entry=1, dispatch=dispatch@entry=1, self=)
    at ../glib/gmain.c:4126
        max_priority = 2147483647
        timeout = 2830
        some_ready = 
        nfds = 2
        allocated_nfds = 2
        fds = 0x5636deb06930
        begin_time_nsec = 30619110638882
#3  0x00007fa23ee8fc03 in g_main_context_iteration
    (context=context@entry=0x5636de7b24a0, may_block=may_block@entry=1) at ../glib/gmain.c:4196
        retval = 
#4  0x00007fa23e4b599d in g_application_run
    (application=0x5636de7ae260 [ChessApplication], argc=-2105843004, argv=)
    at ../gio/gapplication.c:2560
        arguments = 0x5636de7b2400
        status = 0
        context = 0x5636de7b24a0
        acquired_context = 
        __func__ = "g_application_run"
#5  0x00005636dd7b79a2 in chess_application_main (args=0x7fff827b6438, args_length1=2)
    at src/gnome-chess.p/gnome-chess.c:5623
        _tmp0_ = 0x5636de7ae260 [ChessApplication]
        _tmp1_ = 0x5636de7ae260 [ChessApplication]
        _tmp2_ = 
        result = 0
...
```
I removed the last two frames because they are triggering a strange WordPress bug, but that’s enough to get the point. It looks much better! Now the developer can see exactly where the program crashed, including filenames, line numbers, and the values of function parameters and variables on the stack. This is as good as a crash report is normally going to get. In this case, it crashed when running poll() because gnome-chess was not actually doing anything at the time of the crash, since we crashed it by manually sending a SIGSEGV signal. Normally the backtrace will look more interesting.

debuginfod for Debian Users

Debian users can use debuginfod, but it has to be enabled manually:
```
$ DEBUGINFOD_URLS=https://debuginfod.debian.net/ gdb
```
See here for more information. This requires Debian 11 “bullseye” or newer. If you’re using Ubuntu or other operating systems derived from Debian, you’ll need to wait until a debuginfod server for your operating system is available.

Flatpak

If your application uses Flatpak, you can use the flatpak-coredumpctl script to open core dumps in gdb. For most runtimes, including those distributed by GNOME or Flathub, you will need to manually install (a) the debug extension for your app, (b) the SDK runtime corresponding to the platform runtime that you are using, and (c) the debug extension for the SDK runtime. For example, to install everything required to debug Epiphany 40 from Flathub, you would run:
```
$ flatpak install org.gnome.Epiphany.Debug//stable
$ flatpak install org.gnome.Sdk//40
$ flatpak install org.gnome.Sdk.Debug//40
```
(flatpak-coredumpctl will fail to start if you don’t have the correct SDK runtime installed, but it will not fail if you’re missing the debug extensions. You’ll just wind up with a bad backtrace.)

The debug extensions need to exactly match the versions of the app and runtime that crashed, so backtrace generation may be unreliable after you install them for the very first time, because you would have installed the latest versions of the extensions, but your core dump might correspond to an older app or runtime version. If the crash is reproducible, it’s a good idea to run flatpak update after installing to ensure you have the latest version of everything, then reproduce the crash again.

Once your debuginfo is installed, you can open the backtrace in gdb using flatpak-coredumpctl. You just have to tell flatpak-coredumpctl the app ID to use:
```
$ flatpak-coredumpctl org.gnome.Epiphany
```
You can pass matches to coredumpctl using -m. For example, to open the core dump corresponding to a crashed process with pid 1234:
```
$ flatpak-coredumpctl -m 1234 org.gnome.Epiphany
```
Thibault Saunier wrote flatpak-coredumpctl because I complained about how hard it used to be to debug crashed Flatpak applications. Clearly it is no longer hard. Thanks Thibault!

On newer versions of Debian and Ubuntu, flatpak-coredumpctl is included in the libflatpak-dev subpackage rather than the base flatpak package, so you’ll have to install libflatpak-dev first.

Fedora Flatpaks

Flatpaks distributed by Fedora are different than those distributed by GNOME or by Flathub because they do not have debug extensions. Historically, this has meant that debugging crashes was impractical. The best solution was to give up.

Good news! Fedora’s Flatpaks are compatible with debuginfod, which means debug extensions will no longer be missed. You do still need to manually install the org.fedoraproject.Sdk runtime corresponding to the version of the org.fedoraproject.Platform runtime that the application uses, because this is required for flatpak-coredumpctl to work, but nothing else is required. For example, to get a backtrace for Fedora’s Epiphany Flatpak using a Fedora 35 host system, I ran:
```
$ flatpak install org.fedoraproject.Sdk//f34
$ flatpak-coredumpctl org.gnome.Epiphany
(gdb) bt full
```
(The f34 is not a typo. Epiphany currently uses the Fedora 34 runtime regardless of what host system you are using.)

That’s it!

Miscellany

At this point, you should know enough to obtain a high-quality backtrace on most Linux systems. That will usually be all you really need, but it never hurts to know a little more, right?

Alternative Types of Backtraces

At the top of this blog post, I suggested using bt full to take the backtrace because this type of backtrace is the most useful to most developers. But there are other types of backtraces you might occasionally want to collect:
- bt on its own without full prints a much shorter backtrace without stack variables or function parameters. This form of the backtrace is more useful for getting a quick feel for where the bug is occurring, because it is much shorter and easier to read than a full backtrace. But because there are no stack variables or function parameters, it might not contain enough information to solve the crash. I sometimes like to paste the first few lines of a bt backtrace directly into an issue report, then submit the bt full version of the backtrace as an attachment, since an entire bt full backtrace can be long and inconvenient if pasted directly into an issue report.
- thread apply all bt prints a backtrace for every thread. Normally these backtraces are very long and noisy, so I don’t collect them very often, but when a threadsafety issue is suspected, this form of backtrace will sometimes be required.
- thread apply all bt full prints a full backtrace for every thread. This is what automated bug report tools generally collect, because it provides the most information. But these backtraces are usually huge, and this level of detail is rarely needed, so I normally recommend starting with a normal bt full.
If in doubt, just use bt full like I showed at the top of this blog post. Developers will let you know if they want you to provide the backtrace in a different form.

gdb Logging

You can make gdb print your session to a file. For longer backtraces, this may be easier than copying the backtrace from a terminal:
```
(gdb) set logging enabled
```
Memory Corruption

While a backtrace taken with gdb is usually enough information for developers to debug crashes, memory corruption is an exception. Memory corruption is the absolute worst. When memory corruption occurs, the code will crash in a location that may be far removed from where the actual bug occurred, rendering gdb backtraces useless for tracking down the bug. As a general rule, if you see a crash inside a memory allocation routine like malloc() or g_slice_alloc(), you probably have memory corruption. If you see magazine_chain_pop_head(), that’s called by g_slice_alloc() and is a sure sign of memory corruption. Similarly, crashes in GTK’s CSS machinery are almost always caused by memory corruption somewhere else.

Memory corruption is generally impossible to debug unless you are able to reproduce the issue under valgrind. valgrind is extremely slow, so it’s impractical to use it on a regular basis, but it will get to the root of the problem where gdb cannot. As a general rule, you want to run valgrind with --track-origins=yes so that it shows you exactly what went wrong:
```
$ valgrind --track-origins=yes my_app
```
If you cannot reproduce the issue under valgrind, you’re usually totally out of luck. Memory corruption that only occurs rarely or under unknown conditions will lurk in your code indefinitely and cause occasional crashes that are effectively impossible to fix.

Another good tool for debugging memory corruption is address sanitizer (asan), but this is more complicated to use. Experienced users who are comfortable with rebuilding applications using special compiler flags may find asan very useful. However, because it can be very difficult to use, I recommend sticking with valgrind if you’re just trying to report a bug.

Apport and ABRT

There are two popular downstream bug reporting tools: Ubuntu has Apport, and Fedora has ABRT. These tools are relatively easy to use — no command line knowledge required — and produce quality crash reports. Unfortunately, while the tools are promising, the crash reports go to downstream packagers who are generally either not watching bug reports, or else not interested or capable of fixing upstream software problems. Since downstream reports are very often ignored, it’s better to report crashes directly to upstream if you want your issue to be seen by the right developers and actually fixed. Of course, only report issues upstream if you’re using a recent software version. Fedora and Arch users can pretty much always safely report directly to upstream, as can Ubuntu users who are using the very latest version of Ubuntu. If you are an Ubuntu LTS user, you should stick with reporting issues to downstream only, or at least take the time to verify that the issue still occurs with a more recent software version.

There are a couple more problems with these tools. As previously mentioned, Ubuntu’s apport is incompatible with systemd-coredump. If you’ve read this far, you know you really want systemd-coredump enabled, so I recommend disabling apport until it learns to play ball with systemd-coredump.

The technical design of Fedora’s ABRT is currently better because it actually retrieves your core dumps from systemd-coredump, so you don’t have to choose between one or the other. Unfortunately, ABRT has many serious user experience bugs and warts. I can’t recommend it for this reason, but it if it works well enough for you, it does create some great downstream crash reports. Whether a downstream package maintainer will look at those reports is hit or miss, though.

What is a crash, really?

Most developers consider crashes on Unix systems to be program termination via a Unix signal that triggers creation of a core dump. The most common of these are SIGSEGV (segmentation fault, “invalid memory reference”) or SIBABRT (usually an intentional crash due to an assertion failure). Less-common signals are SIGBUS (“bad memory access”) or SIGILL (“illegal instruction”). Sandboxed applications might occasionally see SIGSYS (“bad system call”). See the manpage signal(7) for a full list. These are cases where you can get a backtrace to help with tracking down the issues.

What is not a crash? If your application is hanging or just not behaving properly, that is not a crash. If your application is killed using SIGTERM or SIGKILL — this can happen when systemd-oomd determines you are low on memory, or when a service is taking too long to stop — this is also not a crash in the usual sense of the word, because you’re not going to be able to get a backtrace for it. If a website is slow or unavailable, the news might say that it “crashed,” but it’s obviously not the same thing as what we’re talking about here. The techniques in this blog post are no use for these sorts of “crashes.”

Conclusion

If you have systemd-coredump enabled and debuginfod installed and working, most crash reports will be simple. Memory corruption is a frustrating exception. Encourage your operating system to enable systemd-coredump and debuginfod if it doesn’t already. Happy crash reporting!
September 18, 2021

Category: Fedora

target.com Is More Secure than Your Bank!

No Passkeys in WebKitGTK

Why Was Michael Visiting a Fake github.com?

New Process for Core Apps Changes

Imminent Core App Changes

Future Core App Changes

GNOME 44 Core Apps

Rule 1: Choose Good Upstream Defaults

Rule 2: Prefer Upstream Defaults Downstream

Rule 3: Handle Auto Dependencies and Features with Care

Rule 4: Be Very Careful with Meson’s Build Types

Rule 5: Think about NDEBUG

Rule 6: plain Means “Production Build,” Not “No Flags”

Rule 7: Don’t Forget Hardening Flags

Conclusion

History

Modern Crash Reporting

systemd-coredump

Detour: Manual Core Dump Handling

Debuginfo Installation

debuginfod for Debian Users

Flatpak

Fedora Flatpaks

Miscellany

Alternative Types of Backtraces

gdb Logging

Memory Corruption

Apport and ABRT

What is a crash, really?

Conclusion