How to use Sysprof to…

First off, before using Sysprof to improve the performance of a particular piece of software, make sure you’re compiling with flags that allow us to have enough information to unwind stack frames. Sysprof will use libunwind in some cases, but a majority of our stack unwinding is done by the Linux kernel which can currently only follow eh_frame (exception handling) information.

In my ~/.config/jhbuildrc, I have the following:

os.environ['CFLAGS'] = '-ggdb -O2 -fno-omit-frame-pointer'
os.environ['G_SLICE'] = 'always-malloc'

I generally disable the G_SLICE allocator because it isn’t really all that helpful on modern Linux systems using glibc and can also make it more difficult to track down leaks. Furthermore, it can get in the way of releasing memory back to the system in the form of malloc_trim() should we start doing that in the future. (Hint, I’d like to).

Finding code run often on the system

Sysprof, at it’s core, is a “whole system” profiler. That means it is not designed to profile just your single program, but instead all the processes on the system. This is very useful in a desktop scenario where we have lots of interconnected components.

Ensure the “Callgraph” aid is selected and click “Record”.

At this point, excercise your system to try to bring out the behavior you want to optimize. Then click “Stop” to stop recording and view the results.

You’ll be presented with a callgraph like the following after it has completed recording and loaded the information.

You’ll notice a lot of time in gnome-software there. It turns out I’m on a F32 alpha install and there was a behavior change in libcurl that has screwed up a number of previously valid use cases. But if I didn’t know that already, this would point me where to start looking. You’ll notice that I hadn’t compiled libcurl or gnome-software from source, so the stack traces are not as detailed as they would be otherwise.

On the right side is a callgraph starting from “[Everything]”. It is split out by process and then by the callstack you see in that program. On the top-left side, is a list of all functions that were collected (and decoded). On the bottom-left side is a list of callers for the selected function above it. This is useful when you want to backtrack to all the places a function was called. (Note that this is a sampling-based profiler, so there is no guarantee all functions were intercepted).

Use this information to find the relevant code within a particular project. Tweak some things, try again, test…

Tracking down extraneous allocations

One of the things that can slow down your application is doing memory allocations in the hot paths. Allocating memory is still pretty expensive compared to all of the other things your application could be doing.

In 3.36, Sysprof gained support for tracking memory allocations with a LD_PRELOAD. However, it must spawn the application directly.

Start by toggling “Launch Application” and set arguments for the application you want to profile. Select “Track Allocations”.

At this point run your application to exercise the targeted behavior. Then press “Stop” and you’ll be presented with the recording. Usually the normal callgraph is selected by default. Select the “Memory Allocations” row and you’ll see the memory callgraph.

This time you’ll see memory allocation size next to the function. Explore a bit, and look for things that seem out of place. In the following image, I notice a lot of transforms being allocated. After a quick discussion with Benjamin, he landed a small patch to make those go away. So sometimes you don’t even have to write code yourself!

A variant of this patch went into Mutter’s copy of Clutter for a healthy memory improvement too.

Finding main loop slow downs

In Sysprof master, we have a “Speedtrack” aid that can help you find various long running operations such as fsync(). I used this late in the 3.36 cycle to fix a bunch of I/O happening on GNOME Shell’s compositor thread. Select the “Speedtrack” aid, and disable the “Callgraph” as that will clash with speedtrack currently. This also uses an LD_PRELOAD so you’ll have to spawn the application just like for memory tracking.

The aid will give you callgraphs of various things that happened in your main thread that you might want to avoid doing. Stuff like fsync(), read() and more. It also creates marks for the duration of these calls so you can track down how long they ran for.

Deep in Pango, various files are being loaded on demand which can mean expensive read() during the main loop operations.

You can also see how long some operations have taken. Here we see g_main_context_iteration() took 22 milliseconds. On a 60hz system, that can’t be good because we either missed a frame or took too long to do something to be able to submit our frame in time. You can select the time range by activating this row. In the future we want this to play better with callgraphs so you can see what was sampled during that timespan.

Anyway, I hope that gives you some insight into how to use things!

2 thoughts on “How to use Sysprof to…”

  1. > make sure you’re compiling with flags that allow us to have enough information to unwind stack frames

    Oh, so that’s why I could never get it to show useful info 🙂 How much do these flags slow things down? Maybe some of them could be enabled in org.gnome.Platform//master so it’s not jhbuild-only.

    1. Well my opinion is that -fomit-frame-pointer is not really of much help on x86_64. It can shrink your binary size a bit, but unless you’re doing database benchmarks, it’s negligible and can make debugging production software a giant pain. On x86 it was helpful because compilers could avoid having to spill arguments to the stack.

Comments are closed.