It can be very handy to store things you might do as meta programming in your GObjectClass‘s private data (See G_TYPE_CLASS_GET_PRIVATE()).
Doing so is perfectly fine, but you need to be aware of how GTypeInstance initialization works. Each of your parent classes instance init functions are called before your subclasses instance init (and in order of the type hierarchy). What might seem non-obvious though is that the GTypeInstance.g_class pointer is updated as each successive _init() function is called.
That means if you have my_widget_init() and your parent class is GtkWidget, the gtk_widget_init() does not know it’s instantiating a subclass. Further more, GTK_WIDGET_GET_CLASS() called from gtk_widget_init() will get you the base classes GtkWidgetClass, not the subclasses GtkWidgetClass.
There are ways around this if you don’t use G_DEFINE_TYPE(), but honestly, who wants to do that.
One technique around this, which I used in Bonsai’s DAO, is to use a single-linked list where the head is in each subclass, but the tail exists in each of the parent classes. That way you share all the parent structures, but the subclasses can access all of theirs. You’ll still want to defer most setup work until constructed() though so you can get the full class information of the subclass and hierarchy.
First off, before using Sysprof to improve the performance of a particular piece of software, make sure you’re compiling with flags that allow us to have enough information to unwind stack frames. Sysprof will use libunwind in some cases, but a majority of our stack unwinding is done by the Linux kernel which can currently only follow eh_frame (exception handling) information.
I generally disable the G_SLICE allocator because it isn’t really all that helpful on modern Linux systems using glibc and can also make it more difficult to track down leaks. Furthermore, it can get in the way of releasing memory back to the system in the form of malloc_trim() should we start doing that in the future. (Hint, I’d like to).
Finding code run often on the system
Sysprof, at it’s core, is a “whole system” profiler. That means it is not designed to profile just your single program, but instead all the processes on the system. This is very useful in a desktop scenario where we have lots of interconnected components.
At this point, excercise your system to try to bring out the behavior you want to optimize. Then click “Stop” to stop recording and view the results.
You’ll notice a lot of time in gnome-software there. It turns out I’m on a F32 alpha install and there was a behavior change in libcurl that has screwed up a number of previously valid use cases. But if I didn’t know that already, this would point me where to start looking. You’ll notice that I hadn’t compiled libcurl or gnome-software from source, so the stack traces are not as detailed as they would be otherwise.
On the right side is a callgraph starting from “[Everything]”. It is split out by process and then by the callstack you see in that program. On the top-left side, is a list of all functions that were collected (and decoded). On the bottom-left side is a list of callers for the selected function above it. This is useful when you want to backtrack to all the places a function was called. (Note that this is a sampling-based profiler, so there is no guarantee all functions were intercepted).
Use this information to find the relevant code within a particular project. Tweak some things, try again, test…
Tracking down extraneous allocations
One of the things that can slow down your application is doing memory allocations in the hot paths. Allocating memory is still pretty expensive compared to all of the other things your application could be doing.
In 3.36, Sysprof gained support for tracking memory allocations with a LD_PRELOAD. However, it must spawn the application directly.
At this point run your application to exercise the targeted behavior. Then press “Stop” and you’ll be presented with the recording. Usually the normal callgraph is selected by default. Select the “Memory Allocations” row and you’ll see the memory callgraph.
This time you’ll see memory allocation size next to the function. Explore a bit, and look for things that seem out of place. In the following image, I notice a lot of transforms being allocated. After a quick discussion with Benjamin, he landed a small patch to make those go away. So sometimes you don’t even have to write code yourself!
A variant of this patch went into Mutter’s copy of Clutter for a healthy memory improvement too.
Finding main loop slow downs
In Sysprof master, we have a “Speedtrack” aid that can help you find various long running operations such as fsync(). I used this late in the 3.36 cycle to fix a bunch of I/O happening on GNOME Shell’s compositor thread. Select the “Speedtrack” aid, and disable the “Callgraph” as that will clash with speedtrack currently. This also uses an LD_PRELOAD so you’ll have to spawn the application just like for memory tracking.
The aid will give you callgraphs of various things that happened in your main thread that you might want to avoid doing. Stuff like fsync(), read() and more. It also creates marks for the duration of these calls so you can track down how long they ran for.
You can also see how long some operations have taken. Here we see g_main_context_iteration() took 22 milliseconds. On a 60hz system, that can’t be good because we either missed a frame or took too long to do something to be able to submit our frame in time. You can select the time range by activating this row. In the future we want this to play better with callgraphs so you can see what was sampled during that timespan.
Anyway, I hope that gives you some insight into how to use things!
I spent some time this cycle porting GtkSourceView to GTK 4. It was a good opportunity to help me catch up on how GTK 4’s internals have changed into something modern. It gave me a chance to fix a few pot-holes along the way too.
One of the pot-holes was one I left in GtkTextView years ago. When I plumbed the pixelcache into GTK 3’s TextView I had only cached the primary text content. It seemed fine at the time because the gutters (used for line numbers) is just not that many pixels. So if we have to re-generate that every frame, so be it.
However, in a HiDPI world and 4k monitors on our laps things start to get… warm. So while changing the drawing model in GtkTextView we decided to make the GtkTextView gutters real widgets. Doing so means that GtkSourceGutterRenderer will be real GtkWidget‘s going forward and can do all sorts of neat stuff widgets can do.
But to address the speed of rendering we needed a better way to avoid walking the text btree linearly so many times while rendering the gutter. I’ve added a new class GtkSourceGutterLines to allow collecting information about the text buffer in one-pass. The renderers can then use that information when creating render nodes to avoid further tree scans.
I have some other plans for what I’d like to see before a 5.0 of GtkSourceView. I’ve already written a more memory-compact undo/redo engine for GTK’s GtkTextView, GtkEntry, GtkText, and friends which allowed me to delete that code from the GtkSourceView port. Better yet, you get undo/redo in all the places you would, well, expect it.
In particular I would like to see the async+GListModel based API for completion from Builder land upstream. Builder also has a robust snippet engine which could be reusable from GtkSourceView as that is a fairly useful thing across editors. Perhaps we could extract Builder’s indenter APIs and movements engine too. These are used by Builder’s Vim emulation quite heavily, for example.
If you like following development of stuff I’m doing you can always get that fix here on Twitter given my blogging infrequency.
I just uploaded the sysprof-3.33.4 tarball as we progress towards 3.34. This alpha release has some interesting new features that some of you may find interesting as you continue your quests to improve the performance of your system by improving the software running upon it.
For a while, I’ve been wondering about various ways to move GtkTextView forward in GTK 4. It’s of particular interest to me because I spent some time in the GTK 3 days making it faster for smooth scrolling. However, the designs that were employed there work better on the traditional Xorg setup than they do on GTK 3’s Wayland backend. Now that GTK 4 can have a proper GL pipeline, there is a lot of room for improvement.
Thanks to the West Coast Hackfest, I had a chance to sit down with Matthias and work through that design. GtkLabel was already using some accelerated text rendering so we started by making that work for GtkTextView. Then we extended the GSK PangoRenderer to handle the rest of the needs of GtkTextView and Matthias re-implemented some features to avoid cairo fallbacks.
After the hackfest I also found time to implement layout caching of PangoLayout. It helps reduce some of the CPU overhead in calculating line layouts.
As we start using the GPU more it becomes increasingly important to keep the CPU usage low. If we don’t it’s very likely to raise overall energy usage. To help keep us honest, I’ve added some RAPL energy statistics to Sysprof.
Since my last post, I’ve been working on a redesign of Sysprof (among other things) to make it a bit more useful and friendly to newcomers.
Many years ago, I worked on a small profiler project called “Perfkit” that never really went anywhere. I had already done most of my UI research for this years ago, so it was pretty much just a matter of applying that design to the Sysprof code-base.
Now you can individually show extra detail rows and much more. Same great Sysprof 3-part callgraph breakdown.
I’ll get some delayed 3.33.3 tarballs out this week.
Our first 3.33 release has landed as we move towards 3.34. There is a lot to do this cycle in case you’re interested in contributing. The best way to get started is to dive into the code. We can help you with that on IRC.
Lots of this release is code behind the scenes, so screenshots won’t do them justice. But there are some visible goodies too.
We got a D-Bus Inspector inspired by D-feet. The long term goal is to merge that new code into D-feet itself.
Occasionally we got issues filed about not being able to half-tile Builder. The smaller your screen size, the more unrealistic of an expectation that is, but we can certainly do a little better. The new search box takes up less space and animates in as necessary.
We gained some initial Podman support so long as your system has support for podman exec --preserve-fds. You can even debug inside the containers if they have gdb in the container.
Some users were surprised by how Builder auto-downloaded SDKs and Runtimes to build projects. So we’ve added a confirmation dialog to that process so the user can make informed decisions.
I’ve also moved Git integration out of process. Both the Git and Flatpak plugins previously used git in process which had a number of drawbacks. We created some new API and the gnome-builder-git daemon which provides access to common git operations using libgit2-glib and D-Bus serialization binary format over stdin/stdout. We have push, stage, and signed commit support, but that currently lacks UI in this release.
For a subset of GNOME modules, I’m still using jhbuild. I also spend a great deal of time tracking down memory bugs in various libraries. So it is very handy to have libasan.so working with meson -Db_sanitize=address.
To make things work, you currently need to:
dnf install libasan liblsan (or similar for your OS).