First off, before using Sysprof to improve the performance of a particular piece of software, make sure you’re compiling with flags that give us enough information to unwind stack frames. Sysprof will use libunwind in some cases, but the majority of our stack unwinding is done by the Linux kernel, which can currently only follow eh_frame (exception handling) information.
In my ~/.config/jhbuildrc, I have the following:
os.environ['CFLAGS'] = '-ggdb -O2 -fno-omit-frame-pointer'
os.environ['G_SLICE'] = 'always-malloc'
I generally disable the G_SLICE allocator because it isn’t really all that helpful on modern Linux systems using glibc and can also make it more difficult to track down leaks. Furthermore, it can get in the way of releasing memory back to the system via malloc_trim(), should we start doing that in the future. (Hint: I’d like to.)
Finding code run often on the system
Sysprof, at its core, is a “whole system” profiler. That means it is not designed to profile just your single program, but instead all the processes on the system. This is very useful in a desktop scenario where we have lots of interconnected components.
At this point, exercise your system to try to bring out the behavior you want to optimize. Then click “Stop” to stop recording and view the results.
You’ll notice a lot of time in gnome-software there. It turns out I’m on an F32 alpha install and there was a behavior change in libcurl that has screwed up a number of previously valid use cases. But if I didn’t know that already, this would point me where to start looking. You’ll notice that I hadn’t compiled libcurl or gnome-software from source, so the stack traces are not as detailed as they would be otherwise.
On the right side is a callgraph starting from “[Everything]”. It is split out by process and then by the callstack you see in that program. On the top-left side is a list of all functions that were collected (and decoded). On the bottom-left side is a list of callers for the selected function above it. This is useful when you want to backtrack to all the places a function was called from. (Note that this is a sampling-based profiler, so there is no guarantee that every function call was captured.)
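To make the three panes a little more concrete, here is a minimal sketch of how a sampling profiler could turn captured stacks into a function list and a callers list. It is not Sysprof’s actual implementation; the sample data and the summarize() helper are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical samples: each tuple is one captured stack, outermost frame first.
SAMPLES = [
    ("main", "g_main_context_iteration", "draw", "render_text"),
    ("main", "g_main_context_iteration", "draw", "render_text"),
    ("main", "g_main_context_iteration", "load_file", "read"),
    ("main", "idle_cb", "render_text"),
]


def summarize(samples):
    """Return (self_counts, total_counts, callers) for a list of stacks."""
    self_counts = Counter()         # samples where the function was the innermost frame
    total_counts = Counter()        # samples where the function appeared anywhere
    callers = defaultdict(Counter)  # function -> counts of its immediate callers
    for stack in samples:
        self_counts[stack[-1]] += 1
        for func in set(stack):     # count each function once per sample
            total_counts[func] += 1
        for caller, callee in zip(stack, stack[1:]):
            callers[callee][caller] += 1
    return self_counts, total_counts, callers


self_counts, total_counts, callers = summarize(SAMPLES)
print(total_counts["render_text"])   # 3
print(dict(callers["render_text"]))  # {'draw': 2, 'idle_cb': 1}
```

A function that never happened to be on the stack when a sample was taken simply does not show up in any of these tables, which is the caveat about sampling mentioned above.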
Use this information to find the relevant code within a particular project. Tweak some things, try again, test…
Tracking down extraneous allocations
One of the things that can slow down your application is doing memory allocations in the hot paths. Allocating memory is still pretty expensive compared to all of the other things your application could be doing.
In 3.36, Sysprof gained support for tracking memory allocations with an LD_PRELOAD. However, it must spawn the application directly.
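The reason for the spawn requirement is that LD_PRELOAD is read by the dynamic linker when a process starts, so it cannot be applied to a program that is already running. As a rough sketch of what “spawning with a preload” amounts to (the library path and the preload_env() helper are hypothetical, not Sysprof’s real names):

```python
import os


def preload_env(library, base_env=None):
    """Return a copy of the environment with `library` prepended to LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # ld.so accepts a colon-separated (or space-separated) list of libraries.
    env["LD_PRELOAD"] = f"{library}:{existing}" if existing else library
    return env


# Hypothetical path; the real library name and location are not given here.
env = preload_env("/path/to/hypothetical-preload.so", {"PATH": "/usr/bin"})
print(env["LD_PRELOAD"])
# A child started with this environment, e.g. subprocess.Popen(argv, env=env),
# has the library loaded by the dynamic linker before the program itself runs.
```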
At this point run your application to exercise the targeted behavior. Then press “Stop” and you’ll be presented with the recording. Usually the normal callgraph is selected by default. Select the “Memory Allocations” row and you’ll see the memory callgraph.
This time you’ll see memory allocation size next to the function. Explore a bit, and look for things that seem out of place. In the following image, I notice a lot of transforms being allocated. After a quick discussion with Benjamin, he landed a small patch to make those go away. So sometimes you don’t even have to write code yourself!
A variant of this patch went into Mutter’s copy of Clutter for a healthy memory improvement too.
Finding main loop slow downs
In Sysprof master, we have a “Speedtrack” aid that can help you find various long-running operations such as fsync(). I used this late in the 3.36 cycle to fix a bunch of I/O happening on GNOME Shell’s compositor thread. Select the “Speedtrack” aid, and disable the “Callgraph” as that will clash with Speedtrack currently. This also uses an LD_PRELOAD, so you’ll have to spawn the application just like for memory tracking.
The aid will give you callgraphs of various things that happened in your main thread that you might want to avoid doing. Stuff like fsync(), read() and more. It also creates marks for the duration of these calls so you can track down how long they ran for.
You can also see how long some operations have taken. Here we see g_main_context_iteration() took 22 milliseconds. On a 60 Hz system, that can’t be good because we either missed a frame or took too long to do something to be able to submit our frame in time. You can select the time range by activating this row. In the future we want this to play better with callgraphs so you can see what was sampled during that timespan.
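The arithmetic behind that 22 millisecond concern is simple enough to sketch. The helper names below are made up for illustration, and the frame count assumes the iteration started right at a frame boundary:

```python
import math


def frame_budget_ms(refresh_hz):
    """Time available per frame, in milliseconds, at the given refresh rate."""
    return 1000.0 / refresh_hz


def frames_spanned(duration_ms, refresh_hz):
    """Number of frame intervals an operation of this duration occupies."""
    return math.ceil(duration_ms / frame_budget_ms(refresh_hz))


budget = frame_budget_ms(60)
print(round(budget, 2))        # 16.67
print(22 > budget)             # True
print(frames_spanned(22, 60))  # 2
```

Anything on the main thread that runs longer than the budget of roughly 16.67 ms is a candidate for the kind of mark shown above.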
Anyway, I hope that gives you some insight into how to use things!
> make sure you’re compiling with flags that allow us to have enough information to unwind stack frames
Oh, so that’s why I could never get it to show useful info 🙂 How much do these flags slow things down? Maybe some of them could be enabled in org.gnome.Platform//master so it’s not jhbuild-only.
Well, my opinion is that -fomit-frame-pointer is not really of much help on x86_64. It can shrink your binary size a bit, but unless you’re doing database benchmarks, it’s negligible and can make debugging production software a giant pain. On x86 it was helpful because compilers could avoid having to spill arguments to the stack.