Concurrency, Parallelism, I/O Scheduling, Thread Pooling, and Work-Stealing

Around 15 years ago I worked on some interesting pieces of software which are unfortunately still not part of my daily toolbox in GNOME and GTK programming. At the time, the world was going through changes in how we would write thread pools, particularly with regard to wait-free programming.

New trends like work-stealing were becoming a big thing, and machines with multiple CPUs and multiple NUMA nodes were showing up among easy-to-acquire computers. We were all learning that CPU frequency was going to stall and that non-heterogeneous CPUs were going to be the “Next Big Thing”.

To handle those changes gracefully, we were told that we needed to write software differently. Intel pushed that forward with Threading Building Blocks (TBB). Python had been doing things with Twisted, which had an ergonomic API, and of course “Stackless Python” and similar projects were a thing. Apple eventually came out with Grand Central Dispatch. Microsoft Research had the Concurrency and Coordination Runtime (CCR) which I think came out of robotics work.

Meanwhile, we had GThreadPool which honestly hasn’t changed that much since. Eventually the _async/_finish pairing we’re familiar with today emerged, followed by GTask to provide a more ergonomic API on top of it.

A bit before the GTask we all know today, I had written libgtask which was more of a Python Twisted-style API which provided “deferreds” and nice ways to combine them together. That didn’t come across into GLib unfortunately. To further the pain there, what we got in the form of GTask has some serious side-effects which make it unsuitable as a general construct in my humble opinion.

After realizing libgtask was eclipsed by GLib itself, I set off on another attempt in the form of libiris. That took a different approach that tried to merge the mechanics of CCR (message passing, message ports, coordination arbiters, actors), the API ergonomics of Python Twisted’s Deferred, and Apple’s NSOperation. It provided a wait-free work-stealing scheduler to boot. But it shared a major drawback with GLib’s GTask (beyond the correctness bugs that plague it today): thread pools can only process the work queue, so if you need to combine poll() or GSources that attach to a GMainContext, code-flow has to repeatedly bounce between threads.

This is because you can simplify a thread pool worker to while (has_work()) do_work ();. Any GSource or I/O either needs to bounce to the main thread where the application’s GMainContext exists or to another I/O worker thread if doing synchronous I/O. On Linux, for a very long time, synchronous I/O was the “best” option if you wanted to actually use the page cache provided by the kernel, so that’s largely what GLib and GIO do.

The reason we couldn’t do something else is that removing an item from the global work queue required acquiring a GMutex and blocking until an item was available. On Linux at least, we didn’t have APIs to wait on a futex while also poll()ing a set of file-descriptors. (As a note for posterity, FUTEX_FD was a thing for a short while, but it was never really usable.)
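A minimal sketch of that constraint, using plain pthreads instead of GLib so it stays self-contained. The names work_queue, work_item, worker_thread, and add_one are hypothetical and are not taken from GThreadPool. The point is that while a worker sits in pthread_cond_wait() it has no way to also watch file-descriptors:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* One unit of work; a NULL func is used here as a "stop" sentinel. */
typedef struct work_item {
  struct work_item *next;
  void (*func) (void *data);
  void *data;
} work_item;

/* A global FIFO protected by a mutex, with a condition to sleep on. */
typedef struct {
  pthread_mutex_t mutex;
  pthread_cond_t cond;
  work_item *head;
  work_item *tail;
} work_queue;

static void
work_queue_init (work_queue *q)
{
  pthread_mutex_init (&q->mutex, NULL);
  pthread_cond_init (&q->cond, NULL);
  q->head = q->tail = NULL;
}

static void
work_queue_push (work_queue *q, work_item *item)
{
  assert (item != NULL);
  item->next = NULL;
  pthread_mutex_lock (&q->mutex);
  if (q->tail != NULL)
    q->tail->next = item;
  else
    q->head = item;
  q->tail = item;
  pthread_cond_signal (&q->cond); /* wake one sleeping worker */
  pthread_mutex_unlock (&q->mutex);
}

static work_item *
work_queue_pop (work_queue *q)
{
  work_item *item;

  pthread_mutex_lock (&q->mutex);
  /* The worker is stuck here until work arrives. It cannot poll()
   * on file-descriptors or iterate a main context in the meantime. */
  while (q->head == NULL)
    pthread_cond_wait (&q->cond, &q->mutex);
  item = q->head;
  q->head = item->next;
  if (q->head == NULL)
    q->tail = NULL;
  pthread_mutex_unlock (&q->mutex);
  return item;
}

/* The "while (has_work()) do_work ();" loop, suitable for pthread_create(). */
static void *
worker_thread (void *user_data)
{
  work_queue *q = user_data;

  for (;;)
    {
      work_item *item = work_queue_pop (q);
      if (item->func == NULL)
        break;
      item->func (item->data);
    }
  return NULL;
}

/* Example work function: increments the int that data points to. */
static void
add_one (void *data)
{
  (*(int *) data)++;
}
```

Any I/O a work item wants to wait on has to be handed to some other thread that owns a poll() loop, which is the thread bouncing described above.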

In the coming years, we got a lot of new features in Linux that allowed improvements to the stack. We got signalfd to be able to poll() on incoming Unix signals. We got eventfd() which allowed a rather low-overhead way to notify coordinating code with a poll()able file-descriptor. Then EFD_SEMAPHORE was added so that we can implement sem_t behavior with a file-descriptor. It even supports O_NONBLOCK.

The EFD_SEMAPHORE case is so interesting to me because it provides the ability to do something similar to what IRIX did 20+ years ago, which is a pollable semaphore! Look for usnewpollsema() if you’re interested.
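As a rough illustration, here is how a pollable semaphore might be built on eventfd(). The pollsema_* helper names are made up for this sketch and are not part of any library; only eventfd(), read(), write(), and poll() are real APIs, and the code is Linux-only.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Create a semaphore with a count of zero, backed by a file-descriptor.
 * Returns the descriptor, or -1 on failure. */
static int
pollsema_new (void)
{
  return eventfd (0, EFD_SEMAPHORE | EFD_NONBLOCK | EFD_CLOEXEC);
}

/* Add `count` units to the semaphore. Returns 0 on success, -1 on error. */
static int
pollsema_post (int fd, uint64_t count)
{
  assert (fd >= 0);
  return write (fd, &count, sizeof count) == (ssize_t) sizeof count ? 0 : -1;
}

/* Try to take one unit without blocking.
 * Returns 1 if a unit was taken, 0 if none was available, -1 on error. */
static int
pollsema_try_wait (int fd)
{
  uint64_t value;

  assert (fd >= 0);
  /* With EFD_SEMAPHORE, each successful read decrements the count by 1. */
  if (read (fd, &value, sizeof value) == (ssize_t) sizeof value)
    return 1;
  return errno == EAGAIN ? 0 : -1;
}

/* Wait up to timeout_ms for the count to become non-zero.
 * Returns 1 if the descriptor is readable, 0 otherwise. */
static int
pollsema_poll (int fd, int timeout_ms)
{
  struct pollfd pfd = { .fd = fd, .events = POLLIN };

  assert (fd >= 0);
  return poll (&pfd, 1, timeout_ms) > 0 && (pfd.revents & POLLIN) != 0;
}
```

Because the semaphore is just a descriptor, a worker can put it in the same poll() set as its other I/O sources. Posting a count of 4 allows exactly four reads to succeed before the descriptor stops being readable.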

There was even some work in GLib to support epoll(), but that seems to have stalled out. And honestly, making it use io_uring might be the smarter option now.

After finishing the GTK 4 port of GNOME Builder I realized how much code I’ve been writing in the GAsyncReadyCallback style. I don’t particularly love it and I can’t muster the energy to write more code in that style. I feel like I’m writing top-half/bottom-half interrupt handlers yet I lack the precision to pin things to a thread as well as having to be very delicate with ownership to tightly control object finalization. That last part is so bad we basically don’t use GLib’s GTask in Builder in favor of IdeTask which is smart about when and where to release object references to guarantee finalization on a particular thread.

One thing that all these previous projects and many hundreds of thousands of lines of async-oriented C code have taught me is that all these components are interlinked. Trying to solve one of them without the others is restrictive.

That brings me to 2022, where I’m foolishly writing another C library that solves this for the ecosystem I care most about, GTK. Its goal is to provide the pieces I need for both applications and toolkit authoring. For example, if I were writing another renderer for GTK, this time I’d probably build it on something like this. Given the requirements, that means that some restrictions exist.

  • I know that I need GMainContext to work on thread pool threads if I have any hope of intermixing workloads I care about on worker threads.
  • I 100% don’t care about solving distributed computing workloads or HTTP socket servers. I refuse to race for requests-per-second at the cost of API ergonomics or power usage.
  • I know that I want work stealing between worker threads and that it should be wait-free to avoid lock contention.
  • Worker threads should try to pin similar work to their own thread to avoid bouncing between NUMA nodes. This increases data cacheline hits as well as reduces chances of allocations and frees moving between threads (something you want minimized).
  • I know that if I have 32 thread pool threads and 4 jobs are added to the global queue, I don’t want 32 threads waking up from poll() to try to take those work items.
  • The API needs to be simple, composable, and obvious. There is certainly a lot of inspiration to be taken from things like std::future, JavaScript’s Promise, and Python’s work on deferred execution.
  • GObject has a lot of features and because of that it goes to great lengths to provide correctness. That comes at great cost for things you want to feel like a new “primitive”, so avoiding it makes sense. We can still use GTypeInstance though, following in the footsteps of GStreamer and GTK’s Scene Kit.
  • Cancellation is critical and not only should it cause the work you created to cancel, that cancellation should propagate to the things your work depended on unless non-cancelled work also depends on it.
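The cancellation rule in the last item can be sketched with a simple count of live dependents. This is a hypothetical illustration only: the future struct and the future_init() and future_cancel() functions are invented here and do not describe how DexFuture is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical future that awaits at most one other future. */
typedef struct future {
  struct future *dependency; /* the future this one awaits, or NULL */
  unsigned n_dependents;     /* non-cancelled futures awaiting this one */
  bool cancelled;
} future;

static void
future_init (future *f, future *dependency)
{
  assert (f != NULL);
  f->dependency = dependency;
  f->n_dependents = 0;
  f->cancelled = false;
  if (dependency != NULL)
    dependency->n_dependents++;
}

/* Cancel `f`. The cancellation only travels down to the dependency
 * once no other non-cancelled future still awaits it. */
static void
future_cancel (future *f)
{
  assert (f != NULL);
  if (f->cancelled)
    return;
  f->cancelled = true;
  if (f->dependency != NULL && --f->dependency->n_dependents == 0)
    future_cancel (f->dependency);
}
```

With two futures awaiting the same I/O future, cancelling the first leaves the I/O running for the second, and cancelling the second then cancels the I/O as well.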

I’ve built most of the base of this already. The primary API you interact with is DexFuture and there are many types of futures. You have futures for IO (using io_uring). Futures for Unix signals. A polled semaphore where dex_semaphore_wait() is a future. It can wrap _async/_finish pairs and provide the result as a future. Thread pools are efficient in waiting on work (so staying dormant until necessary and minimal thread wake-ups) while also coordinating to complete work items.

There is still a lot of work to go, and so far I’ve only focused on the abstractions and Linux implementations. But I feel like there is promise (no pun intended) and I’m hoping to port swaths of code in Builder to this in the coming months. If you’d like to help, I’d be happy to have you, especially if you’d like to focus on alternate DexAioBackends, DexSemaphore using something other than eventfd() on BSD/Solaris/macOS/Windows, and additional future types. Additionally, work on GLib to support GMainContext directly using io_uring would be appreciated.

You can find the code here, but it will likely change in the near future.

Builder GTK 4 Porting, Part IV

This week was a little slower as I was struggling with an adjustment to my new medication. Things progress nonetheless.

Text Editor

I spent a little time this week triaging some incoming Text Editor issues and feature requests. I’d really like this application to get into maintenance mode soon because I have plenty of other projects to maintain.

  • Added support for gnome-text-editor - to open a file from standard input, even if you’re communicating with a single-instance application from a terminal.
  • Branch GNOME 42 so we can add new strings.
  • Fix a no-data-loss crash during shutdown.

Template-GLib

  • Fix template evaluation on macOS.
  • Make boolean expression precedence more predictable.
  • Cleanup output of templates with regards to newlines.

libpanel

  • Propagate modified page state to tabs
  • Some action tweaks to make things more keyboard shortcut friendly.

Builder

  • Merged support for configuration editing from Georges.
  • Add lots of keybindings using our new keybinding engine.
  • Track down and triage that shortcut controllers do not capture/bubble to popovers. Added workarounds for common popovers in Builder.
  • Teach Builder to load keybindings from plugins again and auto-manage them.
  • Lots of tweaks to the debugger UI and where widgetry is placed.
  • Added syntax highlighting for debugger disassembly.
  • Added menus and toggles for various logging and debugger hooks. You can get a breakpoint on g_warning() or g_critical() by checking a box.
  • Ability to select a build target as the default build target finally.
  • More menuing fixes all over the place, particularly with treeviews and sourceviews.
  • Fix keyboard navigation and activation for the new symbol-tree
  • Port the find-other-file plugin to the new workspace design which no longer requires using global search.
  • GTK 4 doesn’t appear to scroll to cells in textview as reliably as I’d like, so I dropped the animation code in Builder and we jump straight to the target before showing popovers.
  • Various work on per-language settings overrides by project.
  • Drop the Rust rls plugin as we can pretty much just rely on rust-analyzer now.
  • Lots of CSS tweaks to make things fit a bit better with upcoming GNOME styling.
  • Fix broken dialog which prevented SDK updates from occurring with other dependencies.

A screenshot of Builder's find-other-file plugin

A screenshot of Builder's debugger

A screenshot showing the build target selection dialog

A screenshot of the run menu

A screenshot of the logging menu

Builder GTK 4 Porting, Part II

Another week of work towards porting Builder to GTK 4. Since I can’t add to TWIG from IRC, I’ll try harder to drop some occasional updates here.

GtkSourceView

  • Merged fixes for highlighting unicode literals for C
  • Improved parsing of language values in snippet bundles
  • GtkSourceGutter will now correctly mark prelit and selection quarks within GtkSourceGutterLines.
  • Fixed a bunch of little mouse pointer annoyances when using GtkSourceHover interactive tooltips.
  • GtkSourceGutterRenderers can now opt-out of signal emission for GtkSourceGutterRenderer::query_data(). Signal emission with GObject is rather slow, so avoiding it on every line can be helpful. Just set the virtual method pointer to NULL. The signal was only ever added to make porting easier anyway.

libpanel

  • Merged fixes to be used as a subproject with static libraries only
  • CSS styling matches recent changes in libadwaita, particularly around making navigation tabs and panel frame headers more unified.
  • PanelWidget no longer uses a GtkBinLayout so that it’s easier for subclasses to integrate with popovers from size_allocate() to call gtk_popover_present().

Builder

  • Georges did a live coding stream where they ported a bunch of the “buildui” plugin. That is merged to the GTK 4 port now. It brings a number of features back to the UI including the build terminal, run terminal, build panel (with pipeline stages, warnings, and errors), and project information in the omnibar popover.
  • Günther did a bunch of work porting our old snippet files to the new XML-based snippet bundle format upstream in GtkSourceView. Along with that came porting of the snippets plugin for Builder’s new GTK 4-based editor.
  • Workspace windows have a bit better predictability when restoring sizes.
  • The project creation workflow was ported, although the redesign still needs to be implemented.
  • Lots of resiliency fixes for clang and symbol-tree plugins to improve life-cycle management.
  • The Valgrind plugin was ported to C. This was also back-ported to main because it fixed a number of oddities (crashes) occurring in PyGObject.
  • Builder’s “omni-gutter” GtkSourceGutterRenderer was ported to GTK 4 and got a lot of cleanups along the way. I believe there are still some outstanding things to fix such as handling rendering for symbolic icons as I’m pretty sure that’s not correct yet.
  • The “debuggerui” plugin has been ported to GTK 4 and appears to be working well now. This plugin is responsible for bridging the internal IdeDebugger interfaces to the UI interface.
  • Style schemes were updated for GtkSourceView 5
  • Tabs are now the default navigation interface for pages. There is likely still a lot to do around how we want empty frames to look and behave though.
  • The project-tree is now mostly ported, and with some workarounds to allow making GtkPopover work you can actually display popovers and activate menu items.
  • A long-standing plugin ordering issue has been fixed so that gtk/menus.ui embedded in plugin resources have menu-merging in the proper order.
  • Some incremental work landed to support per-project configuration of languages.
  • The “ls” plugin (directory views) supports “humanized” timestamps again and columns are resizable once more.
  • You can place panels in the right-side-bar now.

Upcoming

There are a bunch of foundational things to still get landed before I feel I can get Builder flipped over to our Nightly builds. In particular we need to land support for things like:

  • Keyboard shortcuts using GtkShortcutController. This was all done with libdazzle previously.
  • Allow plugins to define custom keyboard shortcuts and merge them into the controller.
  • Port “shellcmd” plugin and integrate keyboard shortcuts to apply those commands.
  • Finish rewrite of the search popover. I’m trying to delay this until GtkListView with sections is supported, as it would provide us a much greater path for performance.
  • A lot of our problems would be simpler if we could make GtkActionMuxer use an alternate action muxer parent from another (non-descendant) widget.
  • Configuration editing in the project configuration window. This is a new design so the port is not 1:1.

A screenshot of Builder with the project-tree context menu and debugger on display

A screenshot of Builder with various panels moved around to non-standard locations

A GTK 4 based Text Editor

It started as an application for me to verify the correctness of the GtkSourceView 5 API (which targets GTK 4). After that it helped me implement JIT support for GtkSourceView languages. Once that was done it became my test case while I wrote the GTK 4 macOS backend and revamped the GL renderer.

It is a simple and humble text editor. It does not have all the corner cases you’d expect from a text editor yet. It does not have aspirations to be a programmer’s text editor.

Now that you know this is very much a technology preview release only, you might be tempted to keep your important data away from it.

What it can do

  • Simple interface designed by the GNOME design team. You can find the mockups in the traditional places
  • Search and Replace
  • Typical GtkSourceView features
  • Quick access to recent documents
  • Multiple windows
  • Automatic discovery of .editorconfig and modelines settings
  • Light and dark mode
  • Automatically save files to drafts, restored in case of crash
  • Printing
  • Can be run from within a Flatpak sandbox and uses document portal for access to host files

What it cannot do

  • It doesn’t protect you from trying to open really large files
  • Support your custom GTK 4 theme
  • Auto-completion or snippets
  • Plugins
  • Custom file encodings
  • Spell check
  • Change style schemes beyond light and dark
  • Translations or Help of any kind

Building

Here is a release tarball.

If you’d like to test it out, one way is to clone the repository from GNOME Builder and click Run. Additionally, you can find a Flatpak in the gnome-nightly Flatpak repository.

GTK 4 got a new macOS backend (now with OpenGL)

I’ve been busy the past few months writing a new GDK backend for macOS when not maintaining my other projects. Historically our macOS performance wasn’t something to rave about. But it’s getting better in GTK 4.

The new backend can do both software rendering with Cairo and hardware-based OpenGL rendering using the same OpenGL renderer as we use on GNU/Linux.

This was a fairly substantial “greenfield” rewrite of the backend because so much of it had bit-rotted during the development of GTK 4. GDK hardly looks the same as it did in previous releases and that is a good thing. It’s much easier to write a new backend these days.

I tried to polish it off a bit too, teaching it to do CSD edge-snapping and more. If you’re unfortunate enough to be using the software renderer, it does have some tricks to make drawing a bit faster than in the past. We dropped our use of the quartz Cairo backend in favor of the image backend because, well, it’s faster. Additionally we get a bit clever with opaque regions to speed up CSD compositing.

It also uses the CVDisplayLink to get presentation timing information from the display server to drive our frame clock.

A screenshot of the macOS backend

Thanks again to my employer, Red Hat, for funding this work so we can all benefit from having our applications reach more users.

GtkSourceView Next

Earlier this year I started a branch to track GTK 4 development which is targeted for release by end-of-year. I just merged it which means that our recently released gtksourceview-4-8 branch is going to be our LTS for GTK 3. As you might remember from the previous maintainer, GtkSourceView 4.x is the continuation of the GtkSourceView 3.x API with all the deprecated API removed and a number of API improvements.

Currently, GtkSourceView.Next is 5.x targeting the GTK 4.x API. It’s a bit of an unfortunate number clash, but it’s been fine for WebKit so we’ll see how it goes.

It’s really important that we start getting solid testing because GtkSourceView is used all over the place and is one of those “must have” dependencies when moving to a new GTK major ABI.

Preparations in GTK 4

Since I also spend time contributing to GTK, I decided to help revamp GtkTextView for GTK 4. My goal was to move various moving parts into GtkTextView directly so that we could make them more resilient.

Undo Support

One feature was undo support. GTK 4 now has native support for undo by implementing text history in a compact form within GTK itself. You can now set the enable-undo properties to TRUE on GtkTextView, GtkEditable widgets like GtkText or GtkEntry, and others.

GPU Rendered Text (sort of)

Matthias Clasen and I sat down one afternoon last year and wrote a new PangoRenderer for GSK using render nodes and the texture atlas provided by the OpenGL and Vulkan renderers. Since then, GtkTextView gained a GtkTextLineDisplay cache so that we can keep these immutable render nodes around across multiple snapshots.

Text is still rendered on the CPU into a texture atlas, which is uploaded to the GPU and re-used when possible. Maybe someday things like pathfinder will provide a suitable future.

GtkTextView and Widgets

Previously, the gutters for GtkTextView were simply a GdkWindow which could be rendered to with Cairo. This didn’t fit well into the “everything should be a widget” direction for GTK 4. So now you can pack a widget into each of the 4 gutters around the edges of a GtkTextView. This means you can handle input better too using GtkGesture and GtkEventControllers. More importantly, though, it means you can improve performance of gutter rendering using snapshots and cached render nodes when it makes sense to do so.

Changes in GtkSourceView Next

Moving to a new major ABI is a great time to do cleanups too as it will cause the least amount of friction. So I took this opportunity to revamp much of the GtkSourceView code. We follow more modern GObject practices and have bumped our compiler requirements to closely match GTK 4 itself. This still means no g_autoptr() usage from within GtkSourceView sadly thanks to MSVC being … well the worst C compiler still in wide use.

GtkSourceGutterRenderer is now a GtkWidget

Now that we have margins which can contain widgets and contribute to the render node tree, both GtkSourceGutter and GtkSourceGutterRenderer are GtkWidget. This will mean you need to change custom gutter renderers a bit, but in practice it means a lot less code than they previously contained. It also makes supporting HiDPI much easier.

GtkSourceCompletion Revamp

I spent a lot of time making completion a pleasing experience in GNOME Builder and that work has finally made it upstream. To improve performance and simplicity of implementation, this has changed the GtkSourceCompletionProvider and GtkSourceCompletionProposal interfaces in significant ways.

GtkSourceCompletionProposal is now a mostly superfluous type used to denote a specialized GObject. It doesn’t have any functions in the vtable nor any properties currently and the goal is to avoid adding them. Simply G_IMPLEMENT_INTERFACE (GTK_SOURCE_TYPE_COMPLETION_PROPOSAL, NULL) when defining your proposal object GType.

This is because all of the completion provider implementation can now be performed from GtkSourceCompletionProvider. This interface focuses on using interfaces like GListModel (like the rest of GTK 4) and on how to asynchronously generate and refine the results with additional key-presses.

The completion window has been revamped and now allows proposals to fill a number of columns including an icon, return-type (Left Hand Side), Typed Text, and supplementary text. It resizes with content and ensures that we only inflate the number of GObjects necessary to view the current set. A fixed number of widgets are also created to reduce CSS and measurement costs.

Further, proposals may now have “alternates”. This lets providers keep all of the DoSomething() proposals, with 20 overloaded forms for each base type in whatever language of the day is being used, from clogging up the suggestions.

The new GtkSourceCompletionCell widget is a generic container used throughout completion for everything from icons and text to custom widgetry for the completion details popover.

Completion Preview

GtkSourceGutterLines

A new abstraction, GtkSourceGutterLines, was added to help reduce overhead in generation of content in the gutter. The design of gutters led to an exorbitant amount of measurement work on every frame. This was actually the biggest hurdle in making GTK 3 applications scroll smoothly. The new design allows for all the renderers to collect information about lines in one pass (along with row height measurements) and then snapshot in their second pass. Combined with the ability to cache render nodes, gutter renderers should have what they need to remain fast even in HiDPI environments.

The implementation of this also has a few nice details to further reduce overhead, but I’ll leave that to those interested in reading the code.

GtkSourceBuffer::cursor-moved

GtkSourceBuffer now has a cursor-moved signal. This seemed to be something implemented all over the place so we might as well have it upstream.

Reduce signal emission overhead

Signal emission overhead has been reduced in a number of places, especially in property notifications.

Spaces Drawing

The GtkSourceSpaceDrawer now caches render nodes for drawing spaces. This should improve the performance in the vast majority of cases. However, one case still could be improved upon: tabs when the tab width changes (generally when used after text or spaces).

New Features

Snippets

A new snippet engine has landed based on a much improved version from GNOME Builder. You can provide bundles using an XML snippets file. You can also create them dynamically from your application and insert them into the GtkSourceView. In fact, many completion providers are expected to do this.

The snippet language is robust and shares many features and implementation details from GNOME Builder.

Assistants

A new subsystem, GtkSourceAssistant is used to provide accessory information in a GtkSourceView. Currently this type is private and an implementation detail. However, GtkSourceCompletion and GtkSourceSnippet build upon it to provide some of their features. In the long term, we expect hover providers to also take advantage of this subsystem.

Sysprof Support

GtkSourceView now uses the Sysprof collector API just like GTK 4 does (among many other GNOME projects). This means you can get profiling information about renderings right in the Sysprof visualizer alongside other data.

Future Work

PCRE2

With GRegex on the chopping block for deprecation, it’s time to start moving to PCRE2 much like VTE did. Doing so will not only make us more deprecation safe, but ensure that we can actually use the JIT feature of the regex engine. With how much regexes are used by the highlighting engine, this should be a fairly sizable improvement.

This has now been implemented.

Hover Providers

In GNOME Builder, we added an abstraction for “Hover Providers”. This is also a thing in the Language Server Protocol realm. Nothing exists upstream in GtkSourceView for this and that should probably change. Otherwise all the trickiness in making transient popovers work is put on application authors.

Style Schemes

I would like to remove or revamp some of our default style schemes. They do not handle the world of dynamic GTK themes so well and become a constant source of bug reports by applications that want a “one size fits all” style scheme. I’m not sure yet on the complete right answer long term here, but my expectation is that we’d want to move toward a default style scheme that is mostly font changes rather than color changes which eventually fall apart on the more … interesting themes.

Anyway, that’s all for now!

GtkSourceView Snippets

I’m trying to blog about once a week this year, so here we go again.

The past week I’ve been pushing hard on finishing up the snippets work for the GTK 4 port. It’s always quite a bit more work to push something upstream because you have to be so much more complete while being generic at the same time.

I think at this point though I can move on to other features and projects as the branch seems to be in good shape. I’ve fixed a number of bugs in the GTK 4 port along the way and made tests, documentation, robustness fixes, style-scheme integration, a completion provider, file-format and parser, and support for layering snippet files the same way style-schemes and language-specs work.

As part of the GTK 4 work I’ve spent a great deal of time modernizing the code-base. Now that we can depend on the same things that GTK 4 will depend on, we can use some more modern compiler features. Additionally, GObject has matured so much since most of the library was written and we can use that to our advantage.

Sysprof Developments

This week I spent a little time fixing up a number of integration points with Sysprof and our tooling.

The libsysprof-capture-3.a static library is now licensed under the BSD 2-clause plus patent to make things easier to consume from all sorts of libraries and applications.

We have an MR for GJS to switch to libsysprof-capture-3.a and improve plumbing so Sysprof can connect automatically.

We also have a number of patches for GLib and GTK that improve the chances we can get useful stack-traces when unwinding from the Linux kernel (which perf_event_open does).

An MR for GNOME Shell automatically connects the GJS profiler which is required as libgjs is being used as a library here. The previous GJS patches only wire things up when the gjs binary is used.

With that stuff in place, you can get quite a bit of data correlated now.

# Logout, Switch to VT2
sysprof-cli -c "gnome-shell --wayland --display-server" --gjs --gnome-shell my-capture.syscap

If you don’t want mixed SpiderMonkey and perf stack-traces, you can use --no-perf. You can’t really rely on sample rates between two systems at the same time anyway.

With that in place, you can start correlating more frame data.

Flatpaking Terminals

One thing Builder has done for a long time is make terminals work seamlessly even if distributed using container technologies. Because pseudo-terminals are steeped in esoteric UNIX history, it can be non-obvious how to make this work.

I’m in a place to help you not have to deal with that pain because I’ve already gone through it. So I created some utility code and a demo application that can be packaged with Flatpak. If it detects it’s running under Flatpak it will use a few techniques to get a user-preferred shell executed on the host with a PTY controlled by the application.

Check out the code.

Edit: The flatterm repository has been updated to use the brand new VTE_PTY_NO_CTTY flag that was added in response to this blog post. Users of Vte from git (what will be 0.58) get to enjoy writing even less code.

GtkSourceView moved to Meson

The master branch of GtkSourceView (what will become 4.4) has moved to meson for development. I branched gtksourceview-4-2 for patch releases of 4.2.x which will remain autotools. Today’s release of gtksourceview-4.3.1 contains both autotools and meson. However 4.3.2 will remove autotools entirely.

I also landed some code to speed up line number drawing which was a non-trivial amount of the render cost while kinetic scrolling.