Common GLib Programming Errors

July 27, 2022

Let’s examine four mistakes to avoid when writing programs that use GLib, or, alternatively, four mistakes to look for when reviewing code that uses GLib. Experienced GNOME developers will find the first three mistakes pretty simple and basic, but nevertheless they still cause too many crashes. The fourth mistake is more complicated.

These examples will use C, but the mistakes can happen in any language. In unsafe languages like C, C++, and Vala, these mistakes usually result in security issues, specifically use-after-free vulnerabilities.

Mistake 1: Failure to Disconnect Signal Handler

Every time you connect to a signal handler, you must think about when it should be disconnected to prevent the handler from running at an incorrect time. Let’s look at a contrived but very common example. Say you have an object A and wish to connect to a signal of object B. Your code might look like this:

static void
some_signal_cb (gpointer user_data)
{
  A *self = user_data;
  a_do_something (self);
}

static void
some_method_of_a (A *self)
{
  B *b = get_b_from_somewhere ();
  g_signal_connect (b, "some-signal", (GCallback)some_signal_cb, a);
}

Very simple. Now, consider what happens if the object B outlives object A, and Object B emits some-signal after object A has been destroyed. Then the line a_do_something (self) is a use-after-free, a serious security vulnerability. Drat!

If you think about when the signal should be disconnected, you won’t make this mistake. In many cases, you are implementing an object and just want to disconnect the signal when your object is disposed. If so, you can use g_signal_connect_object() instead of the vanilla g_signal_connect(). For example, this code is not vulnerable:

static void
some_method_of_a (A *self)
{
  B *b = get_b_from_somewhere ();
  g_signal_connect_object (b, "some-signal", (GCallback)some_signal_cb, a, 0);
}

g_signal_connect_object() will disconnect the signal handler whenever object A is destroyed, so there’s no longer any problem if object B outlives object A. This simple change is usually all it takes to avoid disaster. Use g_signal_connect_object() whenever the user data you wish to pass to the signal handler is a GObject. This will usually be true in object implementation code.

Sometimes you need to pass a data struct as your user data instead. If so, g_signal_connect_object() is not an option, and you will need to disconnect manually. If you’re implementing an object, this is normally done in the dispose function:

// Object A instance struct (or priv struct)
struct _A {
  B *b;
  gulong some_signal_id;
};

static void
some_method_of_a (A *self)
{
  B *b = get_b_from_somewhere ();
  g_assert (a->some_signal_id == 0);
  a->some_signal_id = g_signal_connect (b, "some-signal", (GCallback)some_signal_cb, a, 0);
}

static void
a_dispose (GObject *object)
{
  A *a = (A *)object;
  g_clear_signal_handler (&a->some_signal_id, a->b);
  G_OBJECT_CLASS (a_parent_class)->dispose (object);
}

Here, g_clear_signal_handler() first checks if &a->some_signal_id is 0. If not, it disconnects and sets &a->some_signal_id to 0. Setting your stored signal ID to 0 and checking whether it is 0 before disconnecting is important because dispose may run multiple times to break reference cycles. Attempting to disconnect the signal multiple times is another common programmer error!

Instead of calling g_clear_signal_handler(), you could equivalently write:

if (a->some_signal_id != 0) {
  g_signal_handler_disconnect (a->b, a->some_signal_id);
  a->some_signal_id = 0;
}

But writing that manually is no fun.

Yet another way to mess up would be to use the wrong integer type to store the signal ID, like guint instead of gulong.

There are other disconnect functions you can use to avoid the need to store the signal handler ID, like g_signal_handlers_disconnect_by_data(), but I’ve shown the most general case.

Sometimes, object implementation code will intentionally not disconnect signals if the programmer believes that the object that emits the signal will never outlive the object that is connecting to it. This assumption may usually be correct, but since GObjects are refcounted, they may be reffed in unexpected places, leading to use-after-free vulnerabilities if this assumption is ever incorrect. Your code will be safer and more robust if you disconnect always.

Mistake 2: Misuse of GSource Handler ID

Mistake #2 is basically the same as Mistake #1, but using GSource rather than signal handlers. For simplicity, my examples here will use the default main context, so I don’t have to show code to manually create, attach, and destroy the GSource. The default main context is what you’ll want to use if (a) you are writing application code, not library code, and (b) you want your callbacks to execute on the main thread. (If either (a) or (b) does not apply, then you need to carefully study GMainContext to ensure you do not mess up; see Mistake #4.)

Let’s use the example of a timeout source, although the same style of bug can happen with an idle source or any other type of source that you create:

static gboolean
my_timeout_cb (gpointer user_data)
{
  A *self = user_data;
  a_do_something (self);
  return G_SOURCE_REMOVE;
}

static void
some_method_of_a (A *self)
{
  g_timeout_add (42, (GSourceFunc)my_timeout_cb, a);
}

You’ve probably guessed the flaw already: if object A is destroyed before the timeout fires, then the call to a_do_something() is a use-after-free, just like when we were working with signals. The fix is very similar: store the source ID and remove it in dispose:

// Object A instance struct (or priv struct)
struct _A {
  gulong my_timeout_id;
};

static gboolean
my_timeout_cb (gpointer user_data)
{
  A *self = user_data;
  a_do_something (self);
  a->my_timeout_id = 0;
  return G_SOURCE_REMOVE;
}

static void
some_method_of_a (A *self)
{
  g_assert (a->my_timeout_id == 0);
  a->my_timeout_id = g_timeout_add (42, (GSourceFunc)my_timeout_cb, a);
}

static void
a_dispose (GObject *object)
{
  A *a = (A *)object;
  g_clear_handle_id (&a->my_timeout_id, g_source_remove);
  G_OBJECT_CLASS (a_parent_class)->dispose (object);
}

Much better: now we’re not vulnerable to the use-after-free issue.

As before, we must be careful to ensure the source is removed exactly once. If we remove the source multiple times by mistake, GLib will usually emit a critical warning, but if you’re sufficiently unlucky you could remove an innocent unrelated source by mistake, leading to unpredictable misbehavior. This is why we need to write a->my_timeout_id = 0; before returning from the timeout function, and why we need to use g_clear_handle_id() instead of g_source_remove() on its own. Do not forget that dispose may run multiple times!

We also have to be careful to return G_SOURCE_REMOVE unless we want the callback to execute again, in which case we would return G_SOURCE_CONTINUE. Do not return TRUE or FALSE, as that is harder to read and will obscure your intent.

Mistake 3: Failure to Cancel Asynchronous Function

When working with asynchronous functions, you must think about when it should be canceled to prevent the callback from executing too late. Because passing a GCancellable to asynchronous function calls is optional, it’s common to see code omit the cancellable. Be suspicious when you see this. The cancellable is optional because sometimes it is really not needed, and when this is true, it would be annoying to require it. But omitting it will usually lead to use-after-free vulnerabilities. Here is an example of what not to do:

static void
something_finished_cb (GObject      *source_object,
                       GAsyncResult *result,
                       gpointer      user_data)
{
  A *self = user_data;
  B *b = (B *)source_object;
  g_autoptr (GError) error = NULL;

  if (!b_do_something_finish (b, result, &error)) {
    g_warning ("Failed to do something: %s", error->message);
    return;
  }

  a_do_something_else (self);
}

static void
some_method_of_a (A *self)
{
  B *b = get_b_from_somewhere ();
  b_do_something_async (b, NULL /* cancellable */, a);
}

This should feel familiar by now. If we did not use A inside the callback, then we would have been able to safely omit the cancellable here without harmful effects. But instead, this example calls a_do_something_else(). If object A is destroyed before the asynchronous function completes, then the call to a_do_something_else() will be a use-after-free.

We can fix this by storing a cancellable in our instance struct, and canceling it in dispose:

// Object A instance struct (or priv struct)
struct _A {
  GCancellable *cancellable;
};

static void
something_finished_cb (GObject      *source_object,
                       GAsyncResult *result,
                       gpointer      user_data)
{
  B *b = (B *)source_object;
  A *self = user_data;
  g_autoptr (GError) error = NULL;

  if (!b_do_something_finish (b, result, &error)) {
    if (!g_error_matches (error, G_IO_ERROR, G_IO_ERROR_CANCELLED))
      g_warning ("Failed to do something: %s", error->message);
    return;
  }
  a_do_something_else (self);
}

static void
some_method_of_a (A *self)
{
  B *b = get_b_from_somewhere ();
  b_do_something_async (b, a->cancellable, a);
}

static void
a_init (A *self)
{
  self->cancellable = g_cancellable_new ();
}

static void
a_dispose (GObject *object)
{
  A *a = (A *)object;

  g_cancellable_cancel (a->cancellable);
  g_clear_object (&a->cancellable);

  G_OBJECT_CLASS (a_parent_class)->dispose (object);
}

Now the code is not vulnerable. Note that, since you usually do not want to print a warning message when the operation is canceled, there’s a new check for G_IO_ERROR_CANCELLED in the callback.

Update 1: I managed to mess up this example in the first version of my blog post. The example above is now correct, but what I wrote originally was:

if (!b_do_something_finish (b, result, &error) &&
    !g_error_matches (error, G_IO_ERROR, G_IO_ERROR_CANCELLED)) {
  g_warning ("Failed to do something: %s", error->message);
  return;
}
a_do_something_else (self);

Do you see the bug in this version? Cancellation causes the asynchronous function call to complete the next time the application returns control to the main context. It does not complete immediately. So when the function is canceled, A is already destroyed, the error will be G_IO_ERROR_CANCELLED, and we’ll skip the return and execute a_do_something_else() anyway, triggering the use-after-free that the example was intended to avoid. Yes, my attempt to show you how to avoid a use-after-free itself contained a use-after-free. You might decide this means I’m incompetent, or you might decide that it means it’s too hard to safely use unsafe languages. Or perhaps both!

Update 2: My original example had an unnecessary explicit check for NULL in the dispose function. Since g_cancellable_cancel() is NULL-safe, the dispose function will cancel only once even if dispose runs multiple times, because g_clear_object() will set a->cancellable = NULL. Thanks to Guido for suggesting this improvement in the comments.

Mistake 4: Incorrect Use of GMainContext in Library or Threaded Code

My fourth common mistake is really a catch-all mistake for the various other ways you can mess up with GMainContext. These errors can be very subtle and will cause functions to execute at unexpected times. Read this main context tutorial several times. Always think about which main context you want callbacks to be invoked on.

Library developers should pay special attention to the section “Using GMainContext in a Library.” It documents several security-relevant rules:

Never iterate a context created outside the library.
Always remove sources from a main context before dropping the library’s last reference to the context.
Always document which context each callback will be dispatched in.
Always store and explicitly use a specific GMainContext, even if it often points to some default context.
Always match pushes and pops of the thread-default main context.

If you fail to follow all of these rules, functions will be invoked at the wrong time, or on the wrong thread, or won’t be called at all. The tutorial covers GMainContext in much more detail than I possibly can here. Study it carefully. I like to review it every few years to refresh my knowledge. (Thanks Philip Withnall for writing it!)

Properly-designed libraries follow one of two conventions for which main context to invoke callbacks on: they may use the main context that was thread-default at the time the asynchronous operation started, or, for method calls on an object, they may use the main context that was thread-default at the time the object was created. Hopefully the library explicitly documents which convention it follows; if not, you must look at the source code to figure out how it works, which is not fun. If the library documentation does not indicate that it follows either convention, it is probably unsafe to use in threaded code.

Conclusion

All four mistakes are variants on the same pattern: failure to prevent a function from being unexpectedly called at the wrong time. The first three mistakes commonly lead to use-after-free vulnerabilities, which attackers abuse to hack users. The fourth mistake can cause more unpredictable effects. Sadly, today’s static analyzers are probably not smart enough to catch these mistakes. You could catch them if you write tests that trigger them and run them with an address sanitizer build, but that’s rarely realistic. In short, you need to be especially careful whenever you see signals, asynchronous function calls, or main context sources.

Sequel

The adventure continues in Common GLib Programming Errors, Part Two: Weak Pointers.

Comments

4 responses to “Common GLib Programming Errors”

July 28, 2022

Sébastien Wilmet

I’ll read this in the coming days I think, even though I think I no longer do these mistakes.

If you do a second post for this kind of things (this post would have been too long anyway), I would add weak pointers/references handling: for example if we forget to call g_object_remove_weak_pointer().

Another thing, that may silently “work”: when overriding a vfunc, calling the wrong parent vfunc (for example in dispose, call the parent finalize, or the reverse). I wrote a simple static analysis tool to detect that mistake: https://gitlab.gnome.org/swilmet/gdev-c-utils/-/tree/main/src/gcu-check-chain-ups
1. July 29, 2022
  
  Michael Catanzaro
  
  I wish I had included a section on weak pointers. Now I’m not sure whether to edit it and republish or just do a follow-up post.
  
  Chaining up to the wrong vfunc, or not chaining up at all, is another good unfortunate mistake, although not very likely to lead to the same sort of issues that I’ve focused on here.
August 1, 2022

Guido

Thanks for the nice summary. weak pointers and vfunc would indeed be great additions (I recently managed to forget to remove a weak pointer `w_to_b` (contained in GObject `a`) pointing to object b. So when `b` went away and `a` was already disposed the nulling corrupted arbitrary memory – some preexisiting unit tests fortunately helped me to track this down.

I think
“`
if (a->cancellable) {
g_cancellable_cancel (a->cancellable);
g_clear_object (&a->cancellable);
}
“`

can be shortened to

“`
g_cancellable_cancel (a->cancellable);
g_clear_object (&a->cancellable);
“`

as ` g_cancellable_cancel` handles `NULL` gracefully.
1. August 3, 2022
  
  Michael Catanzaro
  
  I updated this blog post to use your suggested improvement. Thanks.