Month: July 2022

Common GLib Programming Errors
Let’s examine four mistakes to avoid when writing programs that use GLib, or, alternatively, four mistakes to look for when reviewing code that uses GLib. Experienced GNOME developers will find the first three mistakes pretty simple and basic, but nevertheless they still cause too many crashes. The fourth mistake is more complicated.

These examples will use C, but the mistakes can happen in any language. In unsafe languages like C, C++, and Vala, these mistakes usually result in security issues, specifically use-after-free vulnerabilities.

Mistake 1: Failure to Disconnect Signal Handler

Every time you connect to a signal handler, you must think about when it should be disconnected to prevent the handler from running at an incorrect time. Let’s look at a contrived but very common example. Say you have an object A and wish to connect to a signal of object B. Your code might look like this:
```
static void
some_signal_cb (gpointer user_data)
{
  A *self = user_data;
  a_do_something (self);
}

static void
some_method_of_a (A *self)
{
  B *b = get_b_from_somewhere ();
  g_signal_connect (b, "some-signal", (GCallback)some_signal_cb, a);
}
```
Very simple. Now, consider what happens if the object B outlives object A, and Object B emits some-signal after object A has been destroyed. Then the line a_do_something (self) is a use-after-free, a serious security vulnerability. Drat!

If you think about when the signal should be disconnected, you won’t make this mistake. In many cases, you are implementing an object and just want to disconnect the signal when your object is disposed. If so, you can use g_signal_connect_object() instead of the vanilla g_signal_connect(). For example, this code is not vulnerable:
```
static void
some_method_of_a (A *self)
{
  B *b = get_b_from_somewhere ();
  g_signal_connect_object (b, "some-signal", (GCallback)some_signal_cb, a, 0);
}
```
g_signal_connect_object() will disconnect the signal handler whenever object A is destroyed, so there’s no longer any problem if object B outlives object A. This simple change is usually all it takes to avoid disaster. Use g_signal_connect_object() whenever the user data you wish to pass to the signal handler is a GObject. This will usually be true in object implementation code.

Sometimes you need to pass a data struct as your user data instead. If so, g_signal_connect_object() is not an option, and you will need to disconnect manually. If you’re implementing an object, this is normally done in the dispose function:
```
// Object A instance struct (or priv struct)
struct _A {
  B *b;
  gulong some_signal_id;
};

static void
some_method_of_a (A *self)
{
  B *b = get_b_from_somewhere ();
  g_assert (a->some_signal_id == 0);
  a->some_signal_id = g_signal_connect (b, "some-signal", (GCallback)some_signal_cb, a, 0);
}

static void
a_dispose (GObject *object)
{
  A *a = (A *)object;
  g_clear_signal_handler (&a->some_signal_id, a->b);
  G_OBJECT_CLASS (a_parent_class)->dispose (object);
}
```
Here, g_clear_signal_handler() first checks if &a->some_signal_id is 0. If not, it disconnects and sets &a->some_signal_id to 0. Setting your stored signal ID to 0 and checking whether it is 0 before disconnecting is important because dispose may run multiple times to break reference cycles. Attempting to disconnect the signal multiple times is another common programmer error!

Instead of calling g_clear_signal_handler(), you could equivalently write:
```
if (a->some_signal_id != 0) {
  g_signal_handler_disconnect (a->b, a->some_signal_id);
  a->some_signal_id = 0;
}
```
But writing that manually is no fun.

Yet another way to mess up would be to use the wrong integer type to store the signal ID, like guint instead of gulong.

There are other disconnect functions you can use to avoid the need to store the signal handler ID, like g_signal_handlers_disconnect_by_data(), but I’ve shown the most general case.

Sometimes, object implementation code will intentionally not disconnect signals if the programmer believes that the object that emits the signal will never outlive the object that is connecting to it. This assumption may usually be correct, but since GObjects are refcounted, they may be reffed in unexpected places, leading to use-after-free vulnerabilities if this assumption is ever incorrect. Your code will be safer and more robust if you disconnect always.

Mistake 2: Misuse of GSource Handler ID

Mistake #2 is basically the same as Mistake #1, but using GSource rather than signal handlers. For simplicity, my examples here will use the default main context, so I don’t have to show code to manually create, attach, and destroy the GSource. The default main context is what you’ll want to use if (a) you are writing application code, not library code, and (b) you want your callbacks to execute on the main thread. (If either (a) or (b) does not apply, then you need to carefully study GMainContext to ensure you do not mess up; see Mistake #4.)

Let’s use the example of a timeout source, although the same style of bug can happen with an idle source or any other type of source that you create:
```
static gboolean
my_timeout_cb (gpointer user_data)
{
  A *self = user_data;
  a_do_something (self);
  return G_SOURCE_REMOVE;
}

static void
some_method_of_a (A *self)
{
  g_timeout_add (42, (GSourceFunc)my_timeout_cb, a);
}
```
You’ve probably guessed the flaw already: if object A is destroyed before the timeout fires, then the call to a_do_something() is a use-after-free, just like when we were working with signals. The fix is very similar: store the source ID and remove it in dispose:
```
// Object A instance struct (or priv struct)
struct _A {
  gulong my_timeout_id;
};

static gboolean
my_timeout_cb (gpointer user_data)
{
  A *self = user_data;
  a_do_something (self);
  a->my_timeout_id = 0;
  return G_SOURCE_REMOVE;
}

static void
some_method_of_a (A *self)
{
  g_assert (a->my_timeout_id == 0);
  a->my_timeout_id = g_timeout_add (42, (GSourceFunc)my_timeout_cb, a);
}

static void
a_dispose (GObject *object)
{
  A *a = (A *)object;
  g_clear_handle_id (&a->my_timeout_id, g_source_remove);
  G_OBJECT_CLASS (a_parent_class)->dispose (object);
}
```
Much better: now we’re not vulnerable to the use-after-free issue.

As before, we must be careful to ensure the source is removed exactly once. If we remove the source multiple times by mistake, GLib will usually emit a critical warning, but if you’re sufficiently unlucky you could remove an innocent unrelated source by mistake, leading to unpredictable misbehavior. This is why we need to write a->my_timeout_id = 0; before returning from the timeout function, and why we need to use g_clear_handle_id() instead of g_source_remove() on its own. Do not forget that dispose may run multiple times!

We also have to be careful to return G_SOURCE_REMOVE unless we want the callback to execute again, in which case we would return G_SOURCE_CONTINUE. Do not return TRUE or FALSE, as that is harder to read and will obscure your intent.

Mistake 3: Failure to Cancel Asynchronous Function

When working with asynchronous functions, you must think about when it should be canceled to prevent the callback from executing too late. Because passing a GCancellable to asynchronous function calls is optional, it’s common to see code omit the cancellable. Be suspicious when you see this. The cancellable is optional because sometimes it is really not needed, and when this is true, it would be annoying to require it. But omitting it will usually lead to use-after-free vulnerabilities. Here is an example of what not to do:
```
static void
something_finished_cb (GObject      *source_object,
                       GAsyncResult *result,
                       gpointer      user_data)
{
  A *self = user_data;
  B *b = (B *)source_object;
  g_autoptr (GError) error = NULL;

  if (!b_do_something_finish (b, result, &error)) {
    g_warning ("Failed to do something: %s", error->message);
    return;
  }

  a_do_something_else (self);
}

static void
some_method_of_a (A *self)
{
  B *b = get_b_from_somewhere ();
  b_do_something_async (b, NULL /* cancellable */, a);
}
```
This should feel familiar by now. If we did not use A inside the callback, then we would have been able to safely omit the cancellable here without harmful effects. But instead, this example calls a_do_something_else(). If object A is destroyed before the asynchronous function completes, then the call to a_do_something_else() will be a use-after-free.

We can fix this by storing a cancellable in our instance struct, and canceling it in dispose:
```
// Object A instance struct (or priv struct)
struct _A {
  GCancellable *cancellable;
};

static void
something_finished_cb (GObject      *source_object,
                       GAsyncResult *result,
                       gpointer      user_data)
{
  B *b = (B *)source_object;
  A *self = user_data;
  g_autoptr (GError) error = NULL;

  if (!b_do_something_finish (b, result, &error)) {
    if (!g_error_matches (error, G_IO_ERROR, G_IO_ERROR_CANCELLED))
      g_warning ("Failed to do something: %s", error->message);
    return;
  }
  a_do_something_else (self);
}

static void
some_method_of_a (A *self)
{
  B *b = get_b_from_somewhere ();
  b_do_something_async (b, a->cancellable, a);
}

static void
a_init (A *self)
{
  self->cancellable = g_cancellable_new ();
}

static void
a_dispose (GObject *object)
{
  A *a = (A *)object;

  g_cancellable_cancel (a->cancellable);
  g_clear_object (&a->cancellable);

  G_OBJECT_CLASS (a_parent_class)->dispose (object);
}
```
Now the code is not vulnerable. Note that, since you usually do not want to print a warning message when the operation is canceled, there’s a new check for G_IO_ERROR_CANCELLED in the callback.

Update 1: I managed to mess up this example in the first version of my blog post. The example above is now correct, but what I wrote originally was:
```
if (!b_do_something_finish (b, result, &error) &&
    !g_error_matches (error, G_IO_ERROR, G_IO_ERROR_CANCELLED)) {
  g_warning ("Failed to do something: %s", error->message);
  return;
}
a_do_something_else (self);
```
Do you see the bug in this version? Cancellation causes the asynchronous function call to complete the next time the application returns control to the main context. It does not complete immediately. So when the function is canceled, A is already destroyed, the error will be G_IO_ERROR_CANCELLED, and we’ll skip the return and execute a_do_something_else() anyway, triggering the use-after-free that the example was intended to avoid. Yes, my attempt to show you how to avoid a use-after-free itself contained a use-after-free. You might decide this means I’m incompetent, or you might decide that it means it’s too hard to safely use unsafe languages. Or perhaps both!

Update 2: My original example had an unnecessary explicit check for NULL in the dispose function. Since g_cancellable_cancel() is NULL-safe, the dispose function will cancel only once even if dispose runs multiple times, because g_clear_object() will set a->cancellable = NULL. Thanks to Guido for suggesting this improvement in the comments.

Mistake 4: Incorrect Use of GMainContext in Library or Threaded Code

My fourth common mistake is really a catch-all mistake for the various other ways you can mess up with GMainContext. These errors can be very subtle and will cause functions to execute at unexpected times. Read this main context tutorial several times. Always think about which main context you want callbacks to be invoked on.

Library developers should pay special attention to the section “Using GMainContext in a Library.” It documents several security-relevant rules:
- Never iterate a context created outside the library.
- Always remove sources from a main context before dropping the library’s last reference to the context.
- Always document which context each callback will be dispatched in.
- Always store and explicitly use a specific GMainContext, even if it often points to some default context.
- Always match pushes and pops of the thread-default main context.
If you fail to follow all of these rules, functions will be invoked at the wrong time, or on the wrong thread, or won’t be called at all. The tutorial covers GMainContext in much more detail than I possibly can here. Study it carefully. I like to review it every few years to refresh my knowledge. (Thanks Philip Withnall for writing it!)

Properly-designed libraries follow one of two conventions for which main context to invoke callbacks on: they may use the main context that was thread-default at the time the asynchronous operation started, or, for method calls on an object, they may use the main context that was thread-default at the time the object was created. Hopefully the library explicitly documents which convention it follows; if not, you must look at the source code to figure out how it works, which is not fun. If the library documentation does not indicate that it follows either convention, it is probably unsafe to use in threaded code.

Conclusion

All four mistakes are variants on the same pattern: failure to prevent a function from being unexpectedly called at the wrong time. The first three mistakes commonly lead to use-after-free vulnerabilities, which attackers abuse to hack users. The fourth mistake can cause more unpredictable effects. Sadly, today’s static analyzers are probably not smart enough to catch these mistakes. You could catch them if you write tests that trigger them and run them with an address sanitizer build, but that’s rarely realistic. In short, you need to be especially careful whenever you see signals, asynchronous function calls, or main context sources.

Sequel

The adventure continues in Common GLib Programming Errors, Part Two: Weak Pointers.
July 27, 2022

Best Practices for Build Options

Build options are sometimes tricky to get right. Here’s my take on best practices. The golden rule is to set good upstream defaults. Everything else follows from this.

Rule 1: Choose Good Upstream Defaults

Occasionally I see upstream developers complain that a downstream operating system has built their software “incorrectly,” generally because some important dependency or feature has been disabled. Sometimes downstreams really do mess up, but more often poor upstream defaults are to blame. Upstreams must set good defaults because upstream software developers know far more about their projects than downstream packagers do. Upstreams generally have a good idea of how they expect software to be built by downstreams, whereas downstreams generally do not. Accordingly, do the thinking upstream whenever possible. When you set good defaults, it becomes easier for downstreams to build your software the way you expect, because active effort is required for downstreams to mess things up.

For example, say a project has the following two build options:

Option Name	Default Value
`--enable-thing-you-usually-want-enabled`	`false`
`--disable-thing-you-rarely-want-disabled`	`true`

The thing you usually want enabled is not enabled by default, and the thing you rarely want disabled is disabled by default. Sad. Unfortunately, this pattern used to be extremely common with Autotools build systems, because in the real world, the names of the options are more subtle than this, and also because nobody likes squinting at configure.ac files to audit whether the options make sense. Meson build systems tend to be somewhat better because meson_options.txt is separate from the rest of the build definitions, making it easier to review all your options and check to ensure their defaults are set appropriately. However, there are still a few more subtle ways you can mess up your Meson build system, which I’ll discuss below.

Rule 2: Prefer Upstream Defaults Downstream

Conversely, downstreams should not second-guess upstream defaults unless you have a good reason to do so and really know what you’re doing.

For example, glib-networking’s Meson build system provides you with two different TLS backend options: OpenSSL or GnuTLS. The GnuTLS backend is enabled by default (well, sort of, see the next section on auto dependencies) while the OpenSSL backend is disabled by default. There’s a good reason for this: the OpenSSL backend is half-baked, partially due to bugs in glib-networking, and partially because OpenSSL just cannot do certain things that GnuTLS can. The OpenSSL backend is provided because some environments really do require it for license reasons, but it’s not the right choice for general-purpose operating systems. It may be tempting to think that you can pick whichever library you prefer, but you really should not.

Another example: WebKitGTK’s CMake build system provides a USE_WPE_RENDERER build option, which is enabled by default. This option controls which graphics rendering stack is used: if enabled, rendering uses libwpe and wpebackend-fdo, whereas if disabled, rendering uses a legacy internal Wayland compositor. The option is provided because libwpe and wpebackend-fdo are newer dependencies that are expected to be unavailable on older (pre-2020) operating systems, so older operating systems legitimately need to be able to disable it. But this configuration receives little serious testing and the upstream developers do not notice when it breaks, so you really should not be using it unless you have to. This recently caused rendering bugs that appeared to be distribution-specific, which upstream developers were not willing to investigate because upstream developers could not reproduce the issue.

Sticking with upstream defaults is generally safest. Sometimes you really need to override them. If so, go ahead. Just be careful.

Rule 3: Handle Auto Dependencies and Features with Care

The worst default ever is “build with feature enabled only if dependency xyz is installed; otherwise, disable it.” This is called an auto dependency. If using CMake or Autotools, auto dependencies are almost never permissible, and in this case “handle with care” means repent and fix it. Auto dependencies are acceptable only if you are using the Meson build system.

The theory behind auto dependencies is that it’s convenient for people casually building the software to do so with the fewest number of build errors possible, which is true. Problem is, this screws over serious production builds of your software by requiring your downstreams to possess magical knowledge of what dependencies are required to build your software properly. Users generally expect most features to be enabled at build time, but if upstream uses auto dependencies, the result is a build dependencies lottery: your feature will be enabled or disabled due to luck, based on which downstream build dependencies transitively depend on which other build dependencies. Even if it’s built properly today, that could easily change tomorrow when some other dependency changes in some other package. Just say no. Do not expect downstreams to look at your build system at all, let alone study the possible build options and make accurate judgments about which build dependencies are required to enable them. Avoiding auto dependencies is part of setting good upstream defaults.

Look at this example from WebKit’s OptionsGTK.cmake:

if (ENABLE_SPELLCHECK)
    find_package(Enchant)
    if (NOT PC_ENCHANT_FOUND)
        message(FATAL_ERROR "Enchant is needed for ENABLE_SPELLCHECK")
    endif ()
endif ()

ENABLE_SPELLCHECK is ON by default. If you don’t have enchant installed, the build will fail unless you manually disable it by passing -DENABLE_SPELLCHECK=OFF". This makes it hard to mess up: downstreams have to make an intentional choice to build with spellchecking disabled. It cannot happen by accident.

Many projects would instead write it like this:

if (ENABLE_SPELLCHECK)
    find_package(Enchant)
    if (NOT PC_ENCHANT_FOUND)
        set(ENABLE_SPELLCHECK OFF)
    endif ()
endif ()

But this is an auto dependency, which results in downstream build dependency lottery. If you write your build system like this, you cannot complain when the feature winds up disabled by mistake in downstream builds. Don’t do this.

Exception: if you use Meson, auto dependencies are acceptable if you use the feature option type and set the default to auto. Although auto features are silently enabled or disabled by default depending on whether the required dependency is present, you can easily override this behavior for serious production builds by passing -Dauto_features=enabled, which enables all the auto features and will result in build failures if dependencies are missing. All major Linux operating systems do this when building Meson packages, so Meson’s auto features should not cause problems.

Rule 4: Be Very Careful with Meson’s Build Types

Let’s categorize software builds into production builds or non-production builds. A production build is intended to be either distributed to end users or else run production workloads, whereas a non-production build is intended for testing or development and might have extra debug features enabled, like slow assertions. (These are more commonly called release builds or debug builds, but that terminology would be very confusing in the context of this discussion, as you’re about to see.)

The CMake and Meson build systems give us more than just two build types. Compare CMake build types to the corresponding Meson build types:

CMake Build Type	Meson Build Type	Meson `debug` Option	Production Build? (excludes Windows)
`Release`	`release`	`false`	Yes
`Debug`	`debug`	`true`	No
`RelWithDebInfo`	`debugoptimized`	`true`	Yes, be careful!
`MinSizeRel`	`minsize`	`true`	Yes, be careful!
N/A	`plain`	`false`	Yes

To simplify, let’s exclude Windows from the discussion for now. (We’ll come back to Windows in a bit.) Now, notice the nomenclature difference between CMake’s RelWithDebInfo (“release with debuginfo”) build type versus Meson’s debugoptimized build type. This build type functions exactly the same for both Meson and CMake, but CMake’s name is better because it clearly indicates that this is a release or production build type, whereas the Meson name seems to indicate it is a debug or non-production build type, and Meson’s debug option is set to true. In fact, it is an optimized production build with debuginfo enabled, the same style of build that almost all Linux operating systems use for their packages (although operating systems use the plain build type instead). The same problem exists for Meson’s minsize build type. This is another production build type where debug is true.

The Meson build type name accurately reflects that the debug option is enabled, but this is very confusing because for most platforms, that option only controls whether debuginfo is generated. Looking at the table above, you can see that you must never use the debug option alone to decide whether you have a production build or a non-production build. As the table indicates, the only non-production build type is the vanilla debug build type, which you can detect by checking the combination of the debug and optimization options. You have a non-production (debug) build if debug is true and if optimization is 0 or g; otherwise, you have a production build. I wrote this in bold because it is important and not at all obvious. (However, before applying this rule in a cross-platform project, keep reading below to see the huge caveat regarding Windows.)

Here’s an example of what not to do in your meson.build:

# Use debug/optimization flags to determine whether to enable debug or disable
# cast checks
gtk_debug_cflags = []
debug = get_option('debug')
optimization = get_option('optimization')
if debug
  gtk_debug_cflags += '-DG_ENABLE_DEBUG'
  if optimization in ['0', 'g']
    gtk_debug_cflags += '-DG_ENABLE_CONSISTENCY_CHECKS'
  endif
elif optimization in ['2', '3', 's']
  gtk_debug_cflags += ['-DG_DISABLE_CAST_CHECKS', '-DG_DISABLE_ASSERT']
endif

This is from GTK’s meson.build. The code based only on the optimization option is OK, but the code that sets -DG_ENABLE_DEBUG is looking only at the debug option. What the code really wants to do is set G_ENABLE_DEBUG if this is a non-production build, but instead it is tied to debuginfo, which is not the desired result. Downstreams are forced to scratch their heads as to what they should do. Impassioned build engineers have held spirited debates about this particular meson.build snippet. Don’t do this! (I will submit a merge request to improve this.)

Here’s a much better, although still not perfect, example of how to do the same thing, this time from GLib’s meson.build:

# Use debug/optimization flags to determine whether to enable debug or disable
# cast checks
glib_debug_cflags = []
glib_debug = get_option('glib_debug')
if glib_debug.enabled() or (glib_debug.auto() and get_option('debug'))
  glib_debug_cflags += ['-DG_ENABLE_DEBUG']
  message('Enabling various debug infrastructure')
elif get_option('optimization') in ['2', '3', 's']
  glib_debug_cflags += ['-DG_DISABLE_CAST_CHECKS']
  message('Disabling cast checks')
endif

if not get_option('glib_assert')
  glib_debug_cflags += ['-DG_DISABLE_ASSERT']
  message('Disabling GLib asserts')
endif

if not get_option('glib_checks')
  glib_debug_cflags += ['-DG_DISABLE_CHECKS']
  message('Disabling GLib checks')
endif

Notice how GLib provides explicit build options that allow downstreams to decide whether debug should be enabled or not. Using explicit build options here was a good idea! The defaults for glib_assert and glib_checks are intentionally set to true to encourage their use in production builds, while G_DISABLE_CAST_CHECKS is based only on the optimization level. But sadly, if not explicitly configured, GLib sets the value of the glib_debug_cflags option automatically, based on only the value of the debug option. This is actually an OK use of an auto feature, because it is a carefully-considered attempt to provide good default behavior for downstreams, but it fails here because it assumes that debug means “non-production build,” which we have previously established cannot be determined without checking optimization as well. (I will submit a merge request to improve this.)

Here’s another helpful table that shows how the various build types correspond to CFLAGS:

CMake/Meson Build Type	CMake `CFLAGS`	Meson `CFLAGS`
`Release`/`release`	`-O3 -DNDEBUG`	`-O3`
`Debug`/`debug`	`-g`	`-O0 -g`
`RelWithDebInfo`/`debugoptimized`	`-O2 -g -DNDEBUG`	`-O2 -g`
`MinSizeRel`/`minsize`	`-Os -DNDEBUG`	`-Os -g`

Notice Meson’s minsize build type includes debuginfo, while CMake’s does not. Since debuginfo requires a huge amount of space, CMake’s behavior seems better here. We’ll discuss NDEBUG momentarily.

OK, so that all makes sense, right? Well I thought so too, until I ran a draft of this blog post past Jussi, who pointed out that the Meson build types function completely differently on Windows than they do on other platforms. Unfortunately, whereas on most platforms the debug option only controls debuginfo generation, on Windows it instead controls whether the C library enables extra runtime debugging checks. So while debugoptimized and minsize are production build types on Linux and have nice corresponding CMake build types, they are non-production build types on Windows. This is a Meson defect. The point to remember is that the debug option is completely different on Windows than it is on other platforms, so my otherwise-nice rule for detecting production builds does not work properly on Windows. Cross-platform projects need to be especially careful with the debug option. There are various ways this could be fixed in Meson in the future: a nice simple proposal would be to add a new debuginfo option separate from debug, then deprecate the debugoptimized build type and replace it with releasewithdebuginfo.

CMake dodges all these problems and avoids any ambiguity because its build types are named differently: “RelWithDebInfo” and “MinSizeRel” leave no doubt that you are dealing with a release (production) build.

Rule 5: Think about NDEBUG

The other behavior difference visible in the table above is that CMake defines NDEBUG for its production build types, whereas Meson has a separate option bn_debug that controls whether to define NDEBUG. NDEBUG controls whether the C and C++ assert() macro is enabled: if this value is defined, asserts are disabled. CMake is the only build system that defines NDEBUG for you automatically. You really need to think about this: if your software is performance-sensitive and contains slow assertions, the consequences of messing this up are severe, e.g. see this historical mesa bug where Fedora’s mesa package suffered a 10x slowdown because mesa upstream accidentally enabled assertions by default. Again, please, do not blame downstreams for bad upstream defaults: downstreams are (usually) not experts on upstream software, and cannot possibly be expected to pick better defaults than upstream’s.

Meson allows developers to explicitly choose whether to enable assertions in production builds. Assertions are enabled in production by default, the opposite of CMake’s behavior. Some developers prefer that all asserts be disabled in production builds to optimize speed as far as possible, but this is usually not the best choice: having assertions enabled in production provides valuable confidence that your code is actually functioning as intended, and often improves security by converting many code execution exploits into denial of service. Most assertions do not have noticeable performance impact, so I prefer to leave most assertions enabled by default in production, and disable only asserts that are slow. Hence, I like Meson’s default behavior. But many engineers disagree with me, and some projects really do need assertions disabled in production; in particular, everyone agrees that performance-sensitive assertions should not be running in production builds. If you’re using Meson and want assertions disabled in production builds, you’re in theory supposed to use b_ndebug=if-release, but it doesn’t actually work because it only disables assertions if your build type is release or plain, while leaving assertions enabled for debugoptimized and minsize builds. We’ve already established that these are both production build types, so sadly that behavior is broken. Instead, it’s better to manually define NDEBUG except in non-production builds. Again, you have a non-production (debug) build when debug is true and if optimization is 0 or g; otherwise, you have a production build (except on Windows).

Rule 6: plain Means “Production Build,” Not “No Flags”

The GNOME release team recently had an exciting debate about the meaning of Meson’s plain build type. It is impressive how build engineers can be so enthusiastic about build options!

I asked Jussi to explain the plain build type. His response was: “Meson does not, by itself, add compiler flags,” emphasis mine. It does not mean your project should not add its own compiler flags, and it certainly does not mean it’s OK to set bad defaults as long as they are vanilla-flavored. It is a production build type, and you should ensure that it receives defaults in line with the other production build types. You’ll be fine if you follow the same rule we already established: you have a non-production (debug) build if debug is true and if optimization is 0 or g; otherwise, you have a production build (except on Windows).

The plain build type exists because it makes it easier for downstreams to implement their own compiler flags. Downstreams have to pass -O2 -g via CFLAGS because CMake and Meson are the only build systems that can do this automatically, and it’s easier to let downstreams disable this functionality than to force downstreams to set different CFLAGS separately for each supported build system.

Rule 7: Don’t Forget Hardening Flags

Sadly, by default all build systems generate insecure, unhardened binaries that should never be used in production. This is true of Autotools, CMake, Meson, and likely also whatever other build system you are thinking of. You must manually add your own hardening flags or your builds will be insecure. Unfortunately this is a little complicated to do. Fedora and RHEL’s recommended compiler flags are documented here. The freedesktop-sdk and GNOME Flatpak runtimes use these recommendations as the basis for their compiler flags, and by default, so do Flatpak applications based on these runtimes. It’s actually not very easy to replicate the same hardening flags since libraries and executables require different flags, so naively setting CFLAGS is not possible. Fedora and RHEL use GCC spec files to achieve this, whereas freedesktop-sdk relies on building GCC itself with a non-default configuration (yes, second-guessing upstream defaults). The good news is that all major downstreams have figured this out one way or another, so you only need to worry about it if you’re doing your own production builds.

Conclusion

That’s all you need to know. Remember, upstreams know their software better than downstreams do, so the hard thinking should happen upstream. We can minimize mistakes and trouble if upstreams carefully set good defaults, and if downstreams deviate from those defaults only when truly necessary. Keep these rules in mind to avoid unnecessary bug reports from dissatisfied users.

History

I updated this blog post on August 3, 2022 to change the primary guidance to “You have a non-production (debug) build if debug is true and if optimization is 0 or g; otherwise, you have a production build.” Originally, I failed to consider g. -Og means “optimize debugging experience” and it is supposedly a better choice than -O0 for debugging according to gcc(1). It’s definitely not actually, but at least that’s the intent.

Jussi responded to this blog post on August 13, 2022 to discuss why Meson’s build types don’t work so well. Read his response.

July 15, 2022

Month: July 2022

Common GLib Programming Errors

Mistake 1: Failure to Disconnect Signal Handler

Mistake 2: Misuse of GSource Handler ID

Mistake 3: Failure to Cancel Asynchronous Function

Mistake 4: Incorrect Use of GMainContext in Library or Threaded Code

Conclusion

Sequel

Best Practices for Build Options

Rule 1: Choose Good Upstream Defaults

Rule 2: Prefer Upstream Defaults Downstream

Rule 3: Handle Auto Dependencies and Features with Care

Rule 4: Be Very Careful with Meson’s Build Types

Rule 5: Think about NDEBUG

Rule 6: plain Means “Production Build,” Not “No Flags”

Rule 7: Don’t Forget Hardening Flags

Conclusion

History