Keeping your fast code fast

Over the past few weeks I’ve been finishing up various projects for 3.36. None of this is surprising for those who follow me on Twitter, but sadly I find it hard to blog as often as I should.

One of the projects I completed before the end of the cycle is a memory allocation tracker for Sysprof. It’s basically a modern port of the Memprof code from 20 years ago, but tied into Sysprof and using fancier techniques to move data quickly between processes. It uses an LD_PRELOAD to override many of the weak memory symbols in glibc such as malloc() and free(). When those functions are reached, a stack trace is captured directly into an mmap()’d ring buffer shared with Sysprof. We create a new one of these per-thread so that no locking is necessary between threads. Sysprof will mux all the data together for us.
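To make the per-thread idea concrete, here is a minimal sketch in C. It is not the Sysprof code: the real tracker interposes malloc() and free() through LD_PRELOAD and writes stack traces into an mmap()’d buffer, while this sketch only records an address and a size into a plain thread-local array. The names AllocEvent, AllocRing, traced_malloc(), traced_free() and the capacity of 256 are all made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record: one entry per allocation event. */
typedef struct {
    uintptr_t addr;  /* address returned by malloc(), or the freed address */
    size_t    size;  /* requested size, 0 for a free() */
} AllocEvent;

#define RING_CAPACITY 256  /* assumed size; must be a power of two */

/* One ring per thread, so writers never contend and need no lock. */
typedef struct {
    AllocEvent events[RING_CAPACITY];
    size_t     head;  /* total number of events ever written */
} AllocRing;

static _Thread_local AllocRing thread_ring;

/* Overwrite the oldest slot once the ring is full. */
static void ring_push(AllocRing *ring, uintptr_t addr, size_t size)
{
    AllocEvent *ev = &ring->events[ring->head & (RING_CAPACITY - 1)];

    ev->addr = addr;
    ev->size = size;
    ring->head++;
}

/* Stand-in for an interposed malloc(): allocate, then record the event. */
void *traced_malloc(size_t size)
{
    void *ptr = malloc(size);

    if (ptr != NULL)
        ring_push(&thread_ring, (uintptr_t)ptr, size);
    return ptr;
}

/* Stand-in for an interposed free(): record the event, then release. */
void traced_free(void *ptr)
{
    if (ptr != NULL)
        ring_push(&thread_ring, (uintptr_t)ptr, 0);
    free(ptr);
}

/* Number of events recorded so far by the calling thread. */
size_t traced_event_count(void)
{
    return thread_ring.head;
}
```

Because each thread only ever touches its own ring, the hot path is a couple of stores and an increment. Anything that reads the rings from another thread or process would still need its own synchronization, which this sketch leaves out.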

Below is a quick example running gtk4-widget-factory. We show similar callgraphs as we do when doing CPU profiling, but ordered by the amount of memory allocated. This simple tool and less than 20 minutes of effort found many allocations we could completely avoid across both GTK and Clutter.

A callgraph of memory allocations

I just want to mention how refreshing it is to have memory allocation tracking while the application still starts in what feels like an instant. It took quite a bit of tweaking to get that level of performance and I’m thrilled with the result.

Additionally, I spent some time looking at what sort of things cause temporary lockups in GNOME Shell during active use. With a fio script in hand, I had what I needed to exhaust the buffer cache and force many applications’ working sets out of memory. That usually does the trick to cause short lockups.

But what is going on when things stall? Does the GPU driver get bogged down? Does the Shell get blocked on GC? Is there some sort of blocking API involved?

To answer this I put together a scrappy little LD_PRELOAD tool called “iobt” which will write out a Sysprof capture file when some blocking operations are called. This found a very peculiar bug where GNOME Shell could end up blocking on the compositor thread when it thought it was doing all async I/O operations.

Furthermore, I found a number of other I/O operations happening on the main thread which will easily lock things up under heavy writeback scenarios. Patches for all of these are upstream, half of them are merged at this point, and some even backported to 3.28 for various distros.

There are still some things to do going forward, like use cgroupsv2 to help enforce CPU and Memory availability and other priorities. I’m also looking for pointers from GPU people on how to debug what is going on during long blocking eglSwapBuffers() calls as I’ve seen under memory pressure.

I’m always inspired by what the Shell developers build and I’m honored to get to help polish it even more.

GtkSourceView Branched

I’ve branched GtkSourceView for 4.6 (gtksourceview-4-6) which you should be using instead of master for your application’s Nightly Flatpak builds. I will land the GTK 4 port on master early next week. A message to gnome-announce-list has been sent and will hopefully make it into distribution packagers’ inboxes shortly.

Long story short is that the 4.6 series will be our long-term (and last) series for GTK 3 applications. I expect this to be maintained for many years. Master will become the beginning of our transition to GTK 4 and the place we land lots of upstream features for Next.0.

GtkSourceView Snippets

I’m trying to blog about once a week this year, so here we go again.

The past week I’ve been pushing hard on finishing up the snippets work for the GTK 4 port. It’s always quite a bit more work to push something upstream because you have to be so much more complete while being generic at the same time.

I think at this point though I can move on to other features and projects as the branch seems to be in good shape. I’ve fixed a number of bugs in the GTK 4 port along the way and made tests, documentation, robustness fixes, style-scheme integration, a completion provider, file-format and parser, and support for layering snippet files the same way style-schemes and language-specs work.

As part of the GTK 4 work I’ve spent a great deal of time modernizing the code-base. Now that we can depend on the same things that GTK 4 will depend on, we can use some more modern compiler features. Additionally, GObject has matured so much since most of the library was written and we can use that to our advantage.

GtkSourceView branching

Branching

We’re currently finishing up the cycle towards GNOME 3.36, which means it’s almost time to start branching and thinking about what we want to land early in the 3.37 development cycle. My goal is to branch gtksourceview-4-6 which will be our long-term stable branch for gtksourceview-4.x (similar to how the gnome-3-24 branch is our long-term stable for the gtksourceview-3.x series). After that, master will move to GTK 4 as we start to close in on GTK 4 development. The misalignment in version numbers is an unfortunate reality, but a reality I inherited so we’ll keep on keepin’ on.

That means if you are not setting a branch in your flatpak manifests, you will want to start doing that when we branch (probably in the next couple of weeks) or your builds will start to fail. Presumably, this only will affect your Nightly builds, because who targets upstream master in production builds, not you surely!
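For reference, pinning a branch in a flatpak-builder manifest looks roughly like the module below. Treat it as a sketch: the module name, build system, and repository URL are my assumptions about a typical setup, and the branch name is the one mentioned above. Adjust it to whatever your manifest already uses.

```json
{
    "name": "gtksourceview",
    "buildsystem": "meson",
    "sources": [
        {
            "type": "git",
            "url": "https://gitlab.gnome.org/GNOME/gtksourceview.git",
            "branch": "gtksourceview-4-6"
        }
    ]
}
```

Without the "branch" key the git source follows the repository’s default branch, which is why unpinned builds would suddenly pick up the GTK 4 port.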

Snippets

I’ve started moving some features from Builder into GtkSourceView. However, I’m limiting those to the GTK 4 port because I don’t particularly want to add new ABI right before putting a branch into long-term stability mode. The first to be uplifted into GtkSourceView is Builder’s snippet engine. It also went through quite a bit of rewrite and simplification as part of this process to make it more robust. Furthermore, having moved undo/redo into GtkTextBuffer directly has done wonders from a correctness standpoint. The snippet engine used to easily be confused by the undo/redo engine.

The most difficult part of the snippet engine is dealing with adjacent GtkTextMarks, in particular when each snippet focus position is wrapped in marks. Adjacent, empty mark ranges can end up overlapping each other and you have to be particularly diligent to prevent that. But the code in the branch has a pretty good handle on that, much better than what I had done in the past inside of Builder.
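A small model in plain C shows why adjacent empty ranges are awkward. This is not the GtkSourceView implementation and uses no GtkTextMark at all; FocusRange and both functions are hypothetical names, and offsets stand in for marks. If three empty focus positions sit at the same offset and every range simply grew whenever text was inserted at its boundary, all three would claim the typed text. The sketch avoids that by letting only the active range grow and shifting the ranges after it.

```c
#include <stddef.h>

/* Hypothetical stand-in for a pair of marks around one focus position. */
typedef struct {
    size_t begin;
    size_t end;  /* exclusive */
} FocusRange;

/*
 * Apply an insertion of len characters at offset pos, typed into
 * ranges[active]. Ranges are assumed to be sorted and non-overlapping.
 * Returns 0 on success, -1 if pos lies outside the active range.
 */
int focus_ranges_insert(FocusRange *ranges, size_t n_ranges,
                        size_t active, size_t pos, size_t len)
{
    if (active >= n_ranges ||
        pos < ranges[active].begin || pos > ranges[active].end)
        return -1;

    /* Only the active range absorbs the new text. */
    ranges[active].end += len;

    /* Every later range moves right, even if it started at the same offset. */
    for (size_t i = active + 1; i < n_ranges; i++) {
        ranges[i].begin += len;
        ranges[i].end += len;
    }

    /* Earlier ranges are left alone, so they stay empty and in front. */
    return 0;
}

/* Returns 1 if any range starts before the previous one ends. */
int focus_ranges_overlap(const FocusRange *ranges, size_t n_ranges)
{
    for (size_t i = 1; i < n_ranges; i++)
        if (ranges[i].begin < ranges[i - 1].end)
            return 1;
    return 0;
}
```

With three empty ranges at offset 5, typing three characters into the middle one leaves the first at 5..5, the middle at 5..8, and the last at 8..8. Deciding by position in the snippet rather than by offset alone is the part that keeps them from overlapping.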

The bits I still need to do to finish up the snippet engine:

  • Land the GTK 4 port on master.
  • Add various style tags to bundled style schemes.
  • Documentation of course.
  • A new snippet manager, file format, and parser. We’ll probably switch to XML for this so it matches language-spec and style-scheme instead of our ad hoc format from Builder.

Future work

Completion

After the GTK 4 port and snippet engine have landed, I’ll probably turn my work towards updating the completion APIs to take advantage of various GLib/GIO advancements from recent years (similar to what we did in Builder). We can also do a bit of style refreshing there based on what Builder did too.

Indenters

If we can find a nice API for it, I’d like to land a basic indenter API as well. The one we have in Builder has worked, but it’s not exactly simple to write new indenters. Then we can start having contributions upstream which can be tied to a specific language-spec. Doing language specific stuff is always nice.

Movements Engine

How you move through a GtkTextView is rather simplistic. There are a number of keyboard accelerators for common movements but they are far more restrictive than what you’d expect from a code editor. In Builder we’ve had custom signals to do more robust movements which the VIM, Emacs, and Sublime emulation builds off.

We can make this more robust in the future using GTK 4’s widget actions. I plan to move a lot of the custom movements from Builder into GtkSourceView so that we can eventually have keyboard shortcut emulation upstream.

Event Controllers

One of the trickiest (and dirtiest) bits of code in Builder is our VIM keybindings emulation for the editor. It always was meant to be a hack to get us to GTK 4, and it did that fine. But we should let it end with that as the constraints from a gtkrc-based (GTK 2.x) system ported to CSS (GTK 3.x) are simply too much pain to bear.

In GTK 4 we have event controllers, especially those for handling keyboard input. I think it is possible for us to move much of a VIM emulation layer (for input) into an event controller (or GtkGesture even). Given that this would use C code instead of CSS, I’d have a much easier time dealing with all the hundreds of corner cases where VIM is internally inconsistent but that inconsistency is the expected behavior.

Hover Providers

The Language Server Protocol has had success at abstracting a number of things, including the concept of “Hover Providers”. These are essentially interactive tooltips. Builder has support for them built upon transient GtkPopover. This seems like a possible candidate to move up to GtkSourceView too.

Other Possibilities

Some other possibilities, given enough interest, could be our OmniGutterRenderer from Builder (with integrated debug breakpoint and diagnostic integration), line-change gutter renderers (which can be connected to Git, SVN, etc), reformatting, semantic highlighting, and multiple cursors. However that last one is incredibly difficult to do from a completeness standpoint as it might need some level of plumbing down in GtkTextView for robustness sake.

GtkSourceView on GTK 4

I spent some time this cycle porting GtkSourceView to GTK 4. It was a good opportunity to help me catch up on how GTK 4’s internals have changed into something modern. It gave me a chance to fix a few pot-holes along the way too.

One of the pot-holes was one I left in GtkTextView years ago. When I plumbed the pixelcache into GTK 3’s TextView I had only cached the primary text content. It seemed fine at the time because the gutters (used for line numbers) are just not that many pixels. So if we have to re-generate that every frame, so be it.

However, in a HiDPI world and 4k monitors on our laps things start to get… warm. So while changing the drawing model in GtkTextView we decided to make the GtkTextView gutters real widgets. Doing so means that GtkSourceGutterRenderer instances will be real GtkWidgets going forward and can do all sorts of neat stuff widgets can do.

But to address the speed of rendering we needed a better way to avoid walking the text btree linearly so many times while rendering the gutter. I’ve added a new class GtkSourceGutterLines to allow collecting information about the text buffer in one-pass. The renderers can then use that information when creating render nodes to avoid further tree scans.
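The one-pass idea can be sketched without any GTK types. The code below is only an illustration of the pattern, not the GtkSourceGutterLines API: GutterLines, gutter_lines_collect() and gutter_lines_get() are hypothetical names, a flat string stands in for the text btree, and the 128-line cap is an arbitrary assumption. The buffer is scanned once for the visible line range, and each renderer then looks lines up in constant time instead of walking the text again.

```c
#include <stddef.h>

#define MAX_VISIBLE_LINES 128  /* assumed upper bound for this sketch */

/* Per-line facts gathered in a single scan of the visible range. */
typedef struct {
    size_t first_line;                 /* first line collected */
    size_t n_lines;                    /* number of lines collected */
    size_t start[MAX_VISIBLE_LINES];   /* offset where each line starts */
    size_t length[MAX_VISIBLE_LINES];  /* line length without the newline */
} GutterLines;

/* Walk the text once and record lines first_line..last_line (inclusive). */
void gutter_lines_collect(GutterLines *lines, const char *text,
                          size_t first_line, size_t last_line)
{
    size_t line = 0;
    size_t line_start = 0;

    lines->first_line = first_line;
    lines->n_lines = 0;

    for (size_t i = 0; ; i++) {
        if (text[i] != '\n' && text[i] != '\0')
            continue;

        if (line >= first_line && line <= last_line &&
            lines->n_lines < MAX_VISIBLE_LINES) {
            lines->start[lines->n_lines] = line_start;
            lines->length[lines->n_lines] = i - line_start;
            lines->n_lines++;
        }

        /* Stop at the end of the text or once the range is complete. */
        if (text[i] == '\0' || line >= last_line)
            break;

        line++;
        line_start = i + 1;
    }
}

/* Constant-time lookup used by each renderer; returns 0 if not collected. */
int gutter_lines_get(const GutterLines *lines, size_t line,
                     size_t *start, size_t *length)
{
    if (line < lines->first_line ||
        line - lines->first_line >= lines->n_lines)
        return 0;

    *start = lines->start[line - lines->first_line];
    *length = lines->length[line - lines->first_line];
    return 1;
}
```

Several renderers (line numbers, change bars, and so on) can share one collected GutterLines per frame, so the cost of the scan is paid once no matter how many of them draw.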

I have some other plans for what I’d like to see before a 5.0 of GtkSourceView. I’ve already written a more memory-compact undo/redo engine for GTK’s GtkTextView, GtkEntry, GtkText, and friends which allowed me to delete that code from the GtkSourceView port. Better yet, you get undo/redo in all the places you would, well, expect it.

In particular I would like to see the async+GListModel based API for completion from Builder land upstream. Builder also has a robust snippet engine which could be reusable from GtkSourceView as that is a fairly useful thing across editors. Perhaps we could extract Builder’s indenter APIs and movements engine too. These are used by Builder’s Vim emulation quite heavily, for example.

If you like following development of stuff I’m doing you can always get that fix here on Twitter given my blogging infrequency.

Introducing Bonsai

TL;DR: Pair your Linux devices, developer APIs to share files, create object graphs with partial sync between devices, transactions, secondary indexes, rebasing, and more built upon GVariant and LMDB. Tooling to build cloudless multi-device services.

A picture of a Bonsai tree and a gnome

I’ve been spending a great deal of time thinking about what types of products I’d like to see in GNOME and what is missing to make that happen.

One observation is that I want access to my files and application data on all my computing devices but I don’t want to store that data on other people’s computers. I have computers, they have internet access, I shouldn’t have to use a multi-tenancy cloud if I’m running as much Free Software as I do. But if that is going to be competitive it needs to be easier than the alternatives.

But to build this I need a few fundamental layers to build applications atop. I’ll need access to files using all the GIO file APIs we love (GFile, GFileEnumerator, GIOStream, etc). I’ll also need the ability to read and write application data in a way that can be shared between devices which may not always be connected to my home Wi-Fi. In particular, we need to give developers great tools to make applications that natively support device synchronization.

What I’ve built to experiment with this all is Bonsai. It is very much an experiment at this phase but it is getting interesting enough to collaborate with others who would like to join me.

Bonsai consists of a daemon that you run on your “mostly connected” computer, although that could easily be a Raspberry Pi-class computer in your home. That computer hosts the “upstream” storage space for files and application content.

Other devices like laptops, phones, or IoT can be paired with that primary device. They communicate using TLS connections using pinned self-signed certificates with point-to-point D-Bus serialization on top. The D-Bus serialization makes it convenient to use gdbus-codegen to generate proxies and services.

One service available to devices is the storage service. It can be consumed from libbonsai-storage to allow applications to browse, create, move, modify and stream file content.

Applications are much better when they can communicate between devices. So a Data-Access-Object library, aptly named libbonsai-dao, provides serializable object storage built upon GVariant and LMDB. It supports primary and secondary indexes, queries, cursors, transactions, and incremental sync between devices. It has the ability to rebase local changes atop changes pulled from the primary Bonsai device.

That last bit is neat because it means if an application is running on two devices which create new content they don’t clobber each other’s history. The primary issue here is dealing with merge conflicts but libbonsai-dao provides some design for data objects to do the right thing.

Bonsai could also serve as a base to build interesting services like backup, VPN, media sharing and casting, news, notes, calendars, contacts, and more. But honestly, it can only do that if people are actually interested in something like this. If so, let me know and see if you can lend your time or ideas for what you’d want this to become.

Sysprof Updates

I just uploaded the sysprof-3.33.4 tarball as we progress towards 3.34. This alpha release has some new features that some of you may find interesting as you continue your quests to improve the performance of your system by improving the software running upon it.

An image of Sysprof with various performance graphs

For a while, I’ve been wondering about various ways to move GtkTextView forward in GTK 4. It’s of particular interest to me because I spent some time in the GTK 3 days making it faster for smooth scrolling. However, the designs that were employed there work better on the traditional Xorg setup than they do on GTK 3’s Wayland backend. Now that GTK 4 can have a proper GL pipeline, there is a lot of room for improvement.

Thanks to the West Coast Hackfest, I had a chance to sit down with Matthias and work through that design. GtkLabel was already using some accelerated text rendering so we started by making that work for GtkTextView. Then we extended the GSK PangoRenderer to handle the rest of the needs of GtkTextView and Matthias re-implemented some features to avoid cairo fallbacks.

After the hackfest I also found time to implement layout caching of PangoLayout. It helps reduce some of the CPU overhead in calculating line layouts.

As we start using the GPU more it becomes increasingly important to keep the CPU usage low. If we don’t it’s very likely to raise overall energy usage. To help keep us honest, I’ve added some RAPL energy statistics to Sysprof.
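For a sense of the arithmetic behind an energy statistic, here is a small sketch. It does not show how Sysprof gathers its numbers. It assumes energy readings in microjoules from a counter that wraps, such as the energy_uj and max_energy_range_uj files the Linux powercap interface exposes for RAPL domains. The function names are hypothetical and the wrap handling is approximate.

```c
#include <stdint.h>

/*
 * Energy consumed between two counter readings, in microjoules.
 * Assumes the counter wrapped at most once between the readings and
 * that max_range is the counter's range (e.g. max_energy_range_uj).
 */
uint64_t rapl_delta_uj(uint64_t before, uint64_t after, uint64_t max_range)
{
    if (after >= before)
        return after - before;

    /* The counter wrapped: count up to the top, then from zero. */
    return (max_range - before) + after;
}

/* Average power in watts over an interval given in seconds. */
double rapl_average_watts(uint64_t delta_uj, double seconds)
{
    if (seconds <= 0.0)
        return 0.0;

    return ((double)delta_uj / 1e6) / seconds;
}
```

Sampling the counter at a fixed interval and plotting the resulting watts next to CPU usage is enough to see whether a rendering change trades CPU time for energy or actually saves both.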

Sysprof design work

Since my last post, I’ve been working on a redesign of Sysprof (among other things) to make it a bit more useful and friendly to newcomers.

Many years ago, I worked on a small profiler project called “Perfkit” that never really went anywhere. I had already done most of my UI research for this years ago, so it was pretty much just a matter of applying that design to the Sysprof code-base.

Now you can individually show extra detail rows and much more. Same great Sysprof 3-part callgraph breakdown.

A screenshot of Sysprof

I’ll get some delayed 3.33.3 tarballs out this week.

GTK 3 Frame Profiler

I back-ported the GTK 4 frame-profiler by Matthias to GTK 3 today. Here is an example of a JavaScript application using GJS and GTK 3. The data contains mixed native and JS stack-traces along with compositor frame information from gnome-shell.

What is going on here is that we have binary streams (although usually captured into a memfd) that come from various systems. GJS is delivered a FD via GJS_TRACE_FD environment variable. GTK is delivered a FD via GTK_TRACE_FD. We get data from GNOME Shell using the org.gnome.Sysprof3.Profiler D-Bus service exported on the org.gnome.Shell peer. Stack-traces come from GJS using SIGPROF and the SpiderMonkey profiler API. Native stack traces come from the Linux kernel’s perf_event_open system. Various data like CPU frequency and usage, memory and such come from /proc.
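On the receiving side, picking up a descriptor handed over through an environment variable might look like the sketch below. I am assuming the variable holds a plain decimal descriptor number; that assumption and the helper names parse_trace_fd() and trace_fd_from_env() are mine, not taken from GJS or GTK.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a file descriptor number from an environment variable value.
 * Returns the descriptor, or -1 if the value is missing or malformed.
 * Assumption: the value is a non-negative decimal integer.
 */
int parse_trace_fd(const char *value)
{
    char *end = NULL;
    long fd;

    if (value == NULL || *value == '\0')
        return -1;

    errno = 0;
    fd = strtol(value, &end, 10);

    /* Reject overflow, trailing garbage, and out-of-range numbers. */
    if (errno != 0 || *end != '\0' || fd < 0 || fd > INT_MAX)
        return -1;

    return (int)fd;
}

/* Convenience wrapper, e.g. trace_fd_from_env("GTK_TRACE_FD"). */
int trace_fd_from_env(const char *name)
{
    return parse_trace_fd(getenv(name));
}
```

A real consumer would also want to check that the descriptor is actually open and writable before streaming capture data into it.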

Muxing all the data streams uses sysprof_capture_writer_cat() which knows how to read data frames from supplemental captures and adjust counter IDs, JIT-mappings, and other file-specific data into a joined capture.
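The ID adjustment is the interesting part of joining captures, so here is a toy model of it. This is not the implementation of sysprof_capture_writer_cat() and does not use the Sysprof capture format. Capture, CounterFrame, capture_cat() and the fixed capacity are hypothetical, and counters are assumed to be numbered from zero within each capture.

```c
#include <stddef.h>

#define CAPTURE_MAX_FRAMES 64  /* assumed fixed capacity for this sketch */

/* Hypothetical frame: a sample for one counter. */
typedef struct {
    unsigned counter_id;
    long     value;
} CounterFrame;

/* Hypothetical capture: ids in use are 0 .. n_counters - 1. */
typedef struct {
    CounterFrame frames[CAPTURE_MAX_FRAMES];
    size_t       n_frames;
    unsigned     n_counters;
} Capture;

/*
 * Append every frame of src to dest, shifting counter ids so they do
 * not collide with ids already used in dest.
 * Returns 0 on success, -1 if dest has no room.
 */
int capture_cat(Capture *dest, const Capture *src)
{
    unsigned base = dest->n_counters;

    if (dest->n_frames + src->n_frames > CAPTURE_MAX_FRAMES)
        return -1;

    for (size_t i = 0; i < src->n_frames; i++) {
        CounterFrame frame = src->frames[i];

        /* Source id 0 becomes base, id 1 becomes base + 1, and so on. */
        frame.counter_id += base;
        dest->frames[dest->n_frames++] = frame;
    }

    dest->n_counters += src->n_counters;
    return 0;
}
```

Each supplemental stream can number its own counters independently, and the joined capture still ends up with unique ids. The same offsetting idea would apply to any other per-file identifiers.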

A quick reminder that we have a Platform Profiling Initiative in case you’re interested in helping out on this stuff.