I obsess over battery life. So having a working Sysprof in Fedora 39 with actually useful frame-pointers has been lovely. At an All Systems Go talk, someone asked whether enabling frame-pointers has led to any large performance improvements, and that probably deserves an answer.
The answer to that is quite simply yes. Sometimes it’s directly a side-effect of me and others sending performance patches (such as Shell search performance or systemd-oomd patches). Sometimes it just prevents the issues from showing up on people’s systems to begin with. Basically all the new code I write now is done in tandem with Sysprof to visualize how things ran. Misguided choices often stick out earlier.
I think it’s also important to recognize that, in addition to gaining performance improvements, we’ve not seen people complain about performance regressions. That means we get the visibility needed to improve things without paying a significant cost in exchange.
Here is a little gem that I would have been unlikely to find without system-wide frame-pointers. Basically, API contract validation needs to do a couple of lookups for flags on the GTypeInstance. I’ll remind the reader that GTypeInstance is what underlies GskRenderNode, and is likely to be our “performance escape hatch” from
Those checks, in particular for G_TYPE_IS_DEPRECATED(), were easily taking up nearly a percent of samples in some tight loop tests (like creating thousands of GTK render nodes). It turns out that both
g_type_free_instance() were doing these checks. Additionally, g_value_unset() on a GBoxed type can do this too (via g_boxed_free()). That gets used all the time for closure invocations such as through the
A quick peek with Sysprof, thanks to those frame-pointers, shows the common code paths which hit this. It looks like the flags for deprecated are stored on an accessory object for the TypeNode. This is a vestige of a time when we must have thought it prudent to be very tight about memory consumption in TypeNodes. Unfortunately, accessing that accessory data requires acquiring the read side of a GRWLock because the type system is mutable. As it happens, there is space to cache these bits in the TypeNode directly and the patch linked above does just that.
Combining the above patch with this patch from Emmanuele does wonders for g_type_create_instance() performance. It basically drops things down to the cost of your malloc() implementation, which is much closer to ideal.
All of this was only on my radar because I was fixing up a few performance issues in GTK’s OpenGL renderer. Getting extraneous TypeNode checks out of hot code paths and moving them to consumer API boundaries instead is always a win for performance.
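
As a rough illustration of that last point, the sketch below validates once at a public entry point and keeps the per-item helper free of checks. It is a generic pattern and not code from GTK or GLib; DemoType, DemoNode, and both function names are made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag; not a real GLib value. */
enum { DEMO_TYPE_FLAG_ABSTRACT = 1 << 0 };

typedef struct {
  unsigned flags;
} DemoType;

typedef struct {
  const DemoType *type;
  int value;
} DemoNode;

/* Internal hot path: no validation, the caller guarantees the contract. */
static inline void
demo_node_init_unchecked (DemoNode *node, const DemoType *type, int value)
{
  node->type = type;
  node->value = value;
}

/* Public boundary: check the contract once, then run the tight loop. */
static bool
demo_nodes_init (DemoNode *nodes, size_t n_nodes, const DemoType *type)
{
  if (nodes == NULL || type == NULL)
    return false;

  /* Abstract types cannot be instantiated; reject them up front. */
  if (type->flags & DEMO_TYPE_FLAG_ABSTRACT)
    return false;

  for (size_t i = 0; i < n_nodes; i++)
    demo_node_init_unchecked (&nodes[i], type, (int) i);

  return true;
}
```

With this shape, creating thousands of nodes costs one flag check in total instead of one per node, while external callers still get the contract validation.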
This is just one example of many. And thankfully, with Sysprof and frame-pointers on Fedora, many more people are now able to casually improve performance rather than relying on someone like me.