I obsess over battery life. So having a working Sysprof in Fedora 39 with actually useful frame-pointers has been lovely. I heard it asked at an All Systems Go talk whether enabling frame-pointers has led to any large performance improvements, and that probably deserves addressing.
The answer to that is quite simply yes. Sometimes it’s directly a side-effect of me and others sending performance patches (such as Shell search performance or systemd-oomd patches). Sometimes it just prevents the issues from showing up on people’s systems to begin with. Basically all the new code I write now is done in tandem with Sysprof to visualize how things ran. Misguided choices often stick out earlier.
I think it’s also important to recognize that in addition to gaining performance improvements we’ve not seen people complain about performance regressions. That means we can have visibility to improve things without a significant burden in exchange.
Here is a little gem that I would have been unlikely to find without system-wide frame-pointers. Basically, API contract validation needs to do a couple of lookups for flags on the TypeNode for GTypeInstance. I’ll remind the reader that GTypeInstance is what underlies GObject and GskRenderNode, and is likely to be our “performance escape hatch” from GObject.
Those checks, in particular for G_TYPE_IS_ABSTRACT() and G_TYPE_IS_DEPRECATED(), were easily taking up nearly a percent of samples in some tight loop tests (like creating thousands of GTK render nodes). It turns out that both g_type_create_instance() and g_type_free_instance() were doing these checks. Additionally, g_value_unset() on a GBoxed type can do this too (via g_boxed_free()). That gets used all the time for closure invocations such as through the g_signal_* API.
A quick peek with Sysprof, thanks to those frame-pointers, shows the common code paths which hit this. It looks like the flags for abstract and deprecated are stored on an accessory object for the TypeNode. This is a vestige of a time when we must have thought it prudent to be very tight about memory consumption in TypeNodes. Unfortunately, accessing that accessory data requires acquiring the read side of a GRWLock because the type system is mutable. As it happens, there is space to cache these bits in the TypeNode directly and the patch linked above does just that.
Combining the above patch with this patch from Emmanuele does wonders for g_type_create_instance() performance. It basically drops things down to the cost of your malloc() implementation, which is much closer to ideal.
All of this was only on my radar because I was fixing up a few performance issues in GTK’s OpenGL renderer. Getting extraneous TypeNode checks out of hot code paths and moving them to consumer API boundaries instead is always a win for performance.
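As a rough illustration of that pattern (again with made-up `Demo*` names rather than anything from GLib or GTK), the public entry point validates its argument once, and the inner loop calls an unchecked helper:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a type that may not be instantiable. */
typedef struct {
  bool   is_abstract;
  size_t n_created;
} DemoType;

/* Internal hot path: trusts its caller. The assert() is a debug-only
 * precondition and compiles away when NDEBUG is defined. */
static inline void
demo_create_unchecked (DemoType *type)
{
  assert (type != NULL);
  type->n_created++;
}

/* Public API boundary: validate once, then run the tight loop. */
bool
demo_create_many (DemoType *type,
                  size_t    count)
{
  if (type == NULL || type->is_abstract)
    return false;

  for (size_t i = 0; i < count; i++)
    demo_create_unchecked (type);

  return true;
}
```

The contract is still enforced, just once per call from the consumer instead of once per instance.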
This is just one example of many. And thankfully, with Sysprof and frame-pointers on Fedora, many more people are capable of casually improving performance rather than relying on someone like me.