Asynchronous State Machines with Fibers

Writing state machines gets a bit of a bad reputation because they are often implemented in complex manners which are specific to the problem domain. I think that makes people shy away from writing them when they are truly beneficial, including myself.

Where they often go awry is when you have some sort of work that needs to be done asynchronously. This is exceedingly common in UI programming like GTK applications but just as easily found in daemons.

Because of this, I see people explicitly avoiding the state machine, or worse, implicitly avoiding its correctness by open-coding a solution across a dozen callbacks.

With DexLimiter and DexFiber I find I can write these state machines better.

You can use the limiter with a max-concurrency of 1 to get an “asynchronous Mutex” of sorts. No lock management necessary.

static void
password_daemon_init (PasswordDaemon *daemon)
{
  daemon->limiter = dex_limiter_new (1);
}

Imagine, if you will, that a limiter is a mutex plus a callback/closure which fires as soon as a slot is free. That means we need a little state to send to our transition callback.

/* Define our closure state for a transition */
DEX_DEFINE_CLOSURE_TYPE (StateTransition, state_transition,
  DEX_DEFINE_CLOSURE_OBJECT (PasswordDaemon, daemon),
  DEX_DEFINE_CLOSURE_VALUE (PasswordDaemonMode, target))

That is a nice wrapper around defining a struct with a new and free function.

Now we can request a transition of the state machine. Since our DexLimiter is an asynchronous mutex (with a single runnable slot), the fiber will not be spawned until it is the highest priority.

DexFuture *
password_daemon_transition (PasswordDaemon     *daemon,
                            PasswordDaemonMode  mode)
{
  StateTransition *transition;

  transition = state_transition_new ();
  transition->daemon = g_object_ref (daemon);
  transition->target = mode;

  return dex_limiter_run (daemon->limiter, NULL, 0,
                          password_daemon_transition_fiber,
                          transition,
                          state_transition_free);
}

That makes our transition code very clean when you combine the fiber with g_autoptr() and dex_await() to await the completion of futures. So a state machine might look like the following:

static DexFuture *
password_daemon_transition_fiber (gpointer user_data)
{
  TransitionState *state = user_data;
  GError *error = NULL;

  switch (state->target)
    {
     case PASSWORD_DAEMON_MODE_HANDOFF:
       if (state->daemon->mode != PASSWORD_DAEMON_MODE_INITIAL)
         return invalid_transition (state->daemon->mode,
                                    state->target);

       if (!password_daemon_enter_handoff (self, &error))
         return dex_future_new_for_error (&error);

       break;

      case PASSWORD_DAEMON_MODE_LOCKED:
        ...

      case PASSWORD_DAEMON_MODE_UNLOCKED:
        ...
    }

  return dex_future_new_enum (state->daemon->mode);
}

static gboolean
password_daemon_enter_handoff (PasswordDaemon  *daemon,
                               GError         **error)
{
  GSocket *control;

  if (!(control = dex_await_object (create_socket (), error)))
    return FALSE;

  ...
}

What I find nice about this is enter/leave transition components can be customized for the state machine transition. That leaves room for transitions between states which require specialization for correctness.

This is much cleaner than ad-hoc callbacks chained together because you can await in the transition fiber for asynchronous work to complete and the state machine itself is preserved. No shoving temporary state in your class instance. No testing hell to see if you caught all the failure cases. No pain with sequencing or order of main loop processing.

Hopefully that shows you can use libdex to write more correct and cleaner state machines by keeping the majority of the implementation in one place.

Testing Keyboard Input Latency

I occasionally see people go through great effort to do end-to-end testing of keyboard input latency. That is fantastic but it requires hardware and patience I don’t, nor will ever, have.

Here is a much simpler way to get about 90% of the value. For example, everything but driver/interrupt handler latency and display link scanout/monitor visibility latency and of course your app side (but you could theoretically rig this up to do that too, inside your app). Not that those aren’t important, but they definitely fall into the category of things I personally cannot control for you.

Keyspeed is a very simple GTK application which uses /dev/uinput to synthesize keypresses. Since it knows the time of provenance, it can compare that to when it gets the event back from compositor delivery.

Wrap all that data up in Sysprof capture marks, pull in some from the compositor (GNOME Shell/Mutter support this), tie in some callgraphs/flamegraphs, and you have a very good overview of what is going on during your keypress.

Run it like this (but remember to chmod back when you’re done less you have attack surface available).

$ sudo chmod 660 /dev/uinput
$ git clone https://gitlab.gnome.org/chergert/keypress
$ sudo dnf install sysprof-devel libinput-devel gtk4-devel
$ make
$ sysprof-cli --gtk --gnome-shell capture.syscap -- ./keyspeed
$ sysprof capture.syscap

a screenshot of keypress. key being press/release on left, time it took on the right.

Currently, this only shows you keypress send to receive in GTK, but if someone cared enough, you could make it take the next GtkFrameTimings and use that to get the presentation time. I don’t need that for what I’m doing, so it doesn’t.

If you go to the marks section, you can dive in to a specific keypress/release cycle. Zoom in on just that section, switch back to callgraph/flamegraph profiler and see what was going on.

Pretty simple, no special hardware needed.

You can see how long it took, where time was spent, and more importantly, how much time was empty between things that matter.

A screenshot of sysprof showing the marks section with timing information for minutia happening across GTK/Mutter for full event delivery timing.

A screenshot of sysprof showing the marks section with timing information for minutia happening across GTK/Mutter for full event delivery timing.

A Data Layer for GTK applications

Gom is a very old object mapper I wrote to bridge GObject to SQLite. It made a lot of assumptions about the world based on when it was prototyped.

The past couple years had me using it again for the documentation search in Manuals. Typically, I would have just built Manuals to parse all the XML files on disk and hold them in memory. That’s how both Devhelp and Builder always did things. Once we started supporting Flatpak SDKs that was no longer realistic. You could have numerous SDKs all with copies of the overlapping data and it just became easier to have a query model.

One of the more performance critical limitations was the locking model. When gom-1.0 was written, it was not common for distributions to compile SQLite with locking support. So you just created a single thread and did your work over there.

Bolting fulltext search and many other missing features onto the old ABI just wasn’t realistic. Especially when I’ve wanted to make the thing properly async for years. One of my other projects, Libdex is just right over there and perfect for this sort of problem.

The landscape changed and so do our horizons.

A new informed ABI

In the years after Gom was prototyped, I worked at a commercial database company and learned a great deal about implementing the internals of both that database and more traditional RDBMS. That left a certain cringe on my mouth whenever looking at my code predating it. Knowing how things get done inside the database allows for building better APIs to interact with it.

This time everything is async. Queries are modeled like you do with a compiler. Lowered into the back-end specific implementation. There can be an entity map and real transactions which allows you to read back the same instance despite which query inflated it.

The Center

Your early stage objects are the GomRepository, GomDriver, and GomRegistry.

The registry describes the entities that can exist within the repository. This is handy because it allows us to pre-compile information into a model that is both immutable and fast at runtime. Compare that to methods like g_object_class_list_properties() which is a performance bottleneck of its own.

The driver is very obvious. It is our abstraction layer for the database engines. Currently we have support for SQLite and PostgreSQL.

The repository is the center of the center. It is how you query, insert, update, delete, transact, and more. It is likely your application instance owns one of these unless of course you use Gom as your file format in which case you’ll have one per “document”.

Two Access Models

This new version of Gom can support either the entity mapping you’re used to or; optionally, raw access to relations/projections via the GomCursor.

As the cursor moves through the resulting rows you will have access to all the projections requested in the query. Though it holds enough information to allow you to gom_cursor_materialize() the row into an GomEntity subclass.

If you want a snapshot of that cursor row without materializing, you can use GomRecord which can also conveniently be used in GListModel for integration into GTK applications.

Most of the time, you’ll use materialization. And even then it is likely to happen through automated collections rather than with a cursor directly. More on that later.

Sessions

As I mentioned, there was no concept of transactions previously.

In this iteration we have GomSession. It is your standard identity-map layer with transaction-scoping. If you perform multiple queries for the same record, the session will ensure you get the same instance back. That is essential when you do local mutations on an instance and what to see that reflected in followup queries.

Additionally, it makes it nice to have multiple views of an object with an editor or listview and needing them to stay in sync.

Relationship Modeling

Support for relationships was adhoc previously. We had some functions named in ways that made you think you could, but I assure you, they were not well tested.

This time around you can model your GomEntity with 1:1, 1:M, M:M, inverse, self-referencing, all while handling proper delete rules. Combing this with the session support mentioned previously is crucial.

So now you should be able to show related models easily in GtkListView while keeping the paginated-and-lazy model beneath it transactional.

Migrations

In the previous version migrations were dynamic, but largely controlled by Gom itself. Very inflexible.

This time around we have things broken down into Migrator and Migration.

You can use built-in implementations like the EntityMigrator or implement your own. CustomMigrator makes that easy. Especially since you can inject your own migrations at just the right point.

Internally, libgom-2 can snapshot your GomRegistry at specific versions based on the provided metadata. Then it performs a diff between two versions of the registry to determine what migration work must be done.

You can just as easily use a SqlMigration with custom SQL scripts. This stuff is all highly composable now to get exactly what you need.

Live List Models

I’ve written many ways to get live SQLite results into GTK over the past two decades. I think one of the first was a GtkTreeModel implementation for GTK 2 which could do it. With that in mind, it was still rather annoying when making Manuals so I set off to make that convenient.

We have GomRecordListModel, GomEntityListModel, GomRelatedListModel, GomQueryModel all of which have practical uses based on application needs.

But in short, most of those are lazy and support transaction-backed stable identities for entities. Very useful when you have a list of items and an editor loaded in another frame, both of which must reflect the same data.

Expression Trees

This time around I implemented proper expression trees. They model the query, relations, and projections in a manner that allows the driver to lower into a query much more accurately.

You can model things like function-calls cleanly all of which required writing manual SQL before. If you did anything outside of what gom could generate previously, it became madness to maintain.

Vectors

This version of libgom embeds the vec1 extension for SQLite. That means we can store vectors in your records and query them. GomVector makes that easier to manage as a property within your application entity.

I can think of a few things this will be useful for, maybe you can too.

Profiling

This version of libgom has profiling support with another project of mine, Sysprof. The whole library emits profiler marks about what is going on so that it is easy for you to figure out why something might be slow in your application.

Since we’ve already done the integration of Sysprof into GLib/GObject, GTK, Pango, Libdex, and GNOME Shell/Mutter you can very quickly get an idea with details of what is going on in your application. Click record, select the problem area, zoom, and it is often pretty clear. You can have flamegraphs, callgraphs, and timing marks all in one place.

A screenshot of Sysprof in the marks section. A zoomed in timeline across the top, which marks in rows sorted by group/category/name. The timeline boxes show the time region they occupied. In the boxes are the message associated with the mark.

Local First with Sync Coordination

One of my personal motivations for this is around building a native sync protocol for applications I’m building. I wrote numerous SQLite-based sync protocols for the now defunct catch.com before they were acquired by apple. That means I know multiple wrong ways to do it.

This time around, I want to put it right in the data-mapper at the point where you have the most insight. So libgom has the right abstractions in place to build that. The GomSyncCoordinator manages the process and GomSyncTransport is the abstraction-point for service integration.

You work with GomDelta at this layer. The application can provide you with a GomMergePolicy to help make decisions which allow for contextually doing the right thing.

This part is still very new. I’m still building the other side of it but landing the shape early allows me to mock and test things comprehensively before committing to the ABI.

My goal is building a practical, robust, and correct implementation for personal local first features.

A small personal note: as I wrote in my recent update from France, I am no longer employed by Red Hat. Work like this is currently self-funded, out of pocket, while my family and I settle into a new chapter. If you find it useful, a note of encouragement or a contribution means a lot right now. It helps make it possible to keep improving the free software infrastructure many of us rely on.

Stackless Coroutines in Libdex

Fibers are always a nice way to keep your async C code clean while using Libdex. However, occasionally you may want a lighter option which doesn’t require a stack or saving registers for work doing little more than coordinating futures.

I’ve added Stackless Coroutines for this which still allows writing future-coordinating code. Though this will suspend/resume your coroutine by re-entering the function and jumping to the next position. Your threads stack is reused. State is saved in your closure state.

This isn’t a new concept. It is really old just like fibers. What is useful is that this style of continuation passing may still be represented as a DexFuture and therefore composed like the others.

You can place these stackless coroutines in DexTaskGroup alongside fibers, threadpool work, and others. Cancellation will propagate to a clean exit point of the coroutine just like it would with a fiber.

Overhead is a bit lower than fibers in synthetic benchmarks depending on use. I was actually impressed our fiber implementation performed as well as it did head-to-head.

To make building your coroutine continuation easier, libdex provides a handy macro to create your typedef struct, _new(), and _free() helpers in a single macro expansion using DEX_DEFINE_CLOSURE_TYPE().

You use it like this:

DEX_DEFINE_CLOSURE_TYPE (MyTaskState, my_task_state,
  DEX_DEFINE_CLOSURE_VALUE (gsize, bytes),
  DEX_DEFINE_CLOSURE_POINTER (GBytes *, bytes_obj, g_bytes_unref),
  DEX_DEFINE_CLOSURE_OBJECT (GSocketConnection, conn))

Coroutines cannot use the exact syntax that fibers do for awaiting, which is a bummer, but a side-effect of trying to make something that works across Linux, Windows, FreeBSD, Solaris, macOS, etc. Particularly because the implementation must use switch/case to stay portable without address-of-label support on MSVC nor clang-cl.exe.

So awaiting is a bit more clear you’re suspending/resuming the stackless coroutine.

DEX_DEFINE_CLOSURE_TYPE (LoadState, load_state,
  DEX_DEFINE_CLOSURE_OBJECT (GFile, file),
  DEX_DEFINE_CLOSURE_OBJECT (GFileInputStream, input),
  DEX_DEFINE_CLOSURE_OBJECT (GFileInfo, info),
  DEX_DEFINE_CLOSURE_VALUE (int, io_priority))

static DexFuture *
do_something (DexCoroutineContext *context,
              gpointer             user_data)
{
  LoadState *state = user_data;
  g_autoptr(GError) error = NULL;

  DEX_COROUTINE_BEGIN (context);

  DEX_COROUTINE_SUSPEND_OBJECT (
    &state->input, &error,
    dex_file_read (state->file, state->io_priority));

  if (error != NULL)
    return dex_future_new_for_error (g_steal_pointer (&error));

  DEX_COROUTINE_SUSPEND_OBJECT (
    &state->info, &error,
    dex_file_input_stream_query_info (
      state->input,
      G_FILE_ATTRIBUTE_STANDARD_SIZE,
      state->io_priority));

  if (error != NULL)
    return dex_future_new_for_error (g_steal_pointer (&error));

  /* maybe do something useful here */

  return dex_future_new_int64 (g_file_info_get_size (state->info));

  DEX_COROUTINE_END;
}

You do need to be careful about placing things on the stack, because they wont be there on the other side of that DEX_COROUTINE_SUSPEND_* macro expansion. That is because when the scheduler jumps back into your stackless coroutine, it will use a switch/case to jump to the next bit of code. Don’t fear though, just add your state to your continuation which we’ve established is easy to do now.

If you don’t like these macros, you can do things the manual way using dex_coroutine_context_suspend() and dex_coroutine_context_resume() who’s APIs are not terrible either. They do require you make up your own program-counter regime though which for the macro case is basically just __COUNTER__.

You can spawn your coroutine using dex_scheduler_spawn_coroutine() or as part of a work-queue in DexLimiter with dex_limiter_run_coroutine().

dex_scheduler_spawn_coroutine (
  dex_thread_pool_scheduler_get_default (),
  my_coroutine,
  my_coroutine_new (),
  (GDestroyNotify) my_coroutine_free);

I hope you've enjoyed this attempt to make another 1970s technology useful in a modern world.

Libdex Improvements

libdex 1.2 is still in pre-alpha phase but it is also far enough along that it is worth talking about the direction: libdex is growing from a library of future and fiber helpers into a more complete concurrency toolkit.

The most important 1.2 theme is that applications can now describe not just what work should happen concurrently, but how that work should be bounded and owned. DexLimiter lets a workload run with a fixed concurrency budget, with dex_limiter_run() handling the common fiber case by acquiring a permit before work starts and releasing it after the fiber completes. For larger workflows, DexTaskGroup gives related futures a structured scope that can be closed, awaited, or cancelled as one unit.

That combination makes cleanup much easier to reason about when a workflow has many moving pieces. A loader can start many subtasks, keep only a useful number active at once, and return a single future representing the whole operation. If the window closes, the project changes, or the operation times out, the group gives the application one place to cleanly shut the work down.

static DexFuture *
load_many_files (GPtrArray *files)
{
  g_autoptr(DexTaskGroup) group = dex_task_group_new (0);
  g_autoptr(DexLimiter) limiter = dex_limiter_new (8);

  for (guint i = 0; i < files->len; i++)
    {
      GFile *file = g_ptr_array_index (files, i);

      dex_task_group_add (group,
                          dex_limiter_run (limiter,
                                           NULL,
                                           0,
                                           load_one_file,
                                           g_object_ref (file),
                                           g_object_unref));
    }

  return dex_future_with_timeout_seconds (dex_task_group_close (group), 10);
}

There is also a new DexThreadPool for the cases that are not naturally fiber-shaped. Fibers and schedulers are still the right fit for cooperative async work, but many applications need to integrate blocking libraries, database clients, filesystem helpers, or other foreign code. A fixed pool of reusable OS threads, dex_thread_pool_submit(), and asynchronous dex_thread_pool_close() give that integration story a bounded queue and an explicit shutdown path.

Deadlines are another practical piece of the same story. The new timeout wrappers, including dex_future_with_timeout_seconds() and dex_future_with_deadline(), turn time limits into ordinary future composition. Instead of open-coded timeout state spread across an application, a future can resolve normally, reject normally, or reject with DEX_ERROR_TIMED_OUT when the deadline wins.

On the I/O side, 1.2 continues filling in the operations that make responsiveness easier to preserve. dex_aio_open() and dex_aio_close() matter because even operations that look small can stall when they touch the kernel, storage, or network-backed filesystems. Keeping those calls in libdex’s file-descriptor AIO model makes it easier to keep them off the UI thread, using io_uring where it is available and the fallback AIO backend elsewhere.

The broader GIO coverage is intentionally less surprising, but still important. More app launching, GFile, stream, socket, resolver, proxy, TLS, DTLS, permission, subprocess, and Unix-facing APIs now have future-first wrappers. That is the kind of coverage people should expect from libdex over time: not every wrapper needs a release headline, but each one reduces the pressure to leave the future model for common GNOME application work.