GTK 3 Frame Profiler

I back-ported the GTK 4 frame-profiler by Matthias to GTK 3 today. Here is an example of a JavaScript application using GJS and GTK 3. The data contains mixed native and JS stack-traces along with compositor frame information from gnome-shell.

What is going on here is that we have binary streams (although usually captured into a memfd) that come from various systems. GJS is delivered a FD via GJS_TRACE_FD environment variable. GTK is delivered a FD via GTK_TRACE_FD. We get data from GNOME Shell using the org.gnome.Sysprof3.Profiler D-Bus service exported on the org.gnome.Shell peer. Stack-traces come from GJS using SIGPROF and the SpiderMonkey profiler API. Native stack traces come from the Linux kernel’s perf_event_open system. Various data like CPU frequency and usage, memory and such come from /proc.

Muxing all the data streams uses sysprof_capture_writer_cat() which knows how to read data frames from supplemental captures and adjust counter IDs, JIT-mappings, and other file-specific data into a joined capture.

A quick reminder that we have a Platform Profiling Initiative in case you’re interested in helping out on this stuff.

Sysprof Developments

This week I spent a little time fixing up a number of integration points with Sysprof and our tooling.

The libsysprof-capture-3.a static library is now licensed under the BSD 2-clause plus patent to make things easier to consume from all sorts of libraries and applications.

We have a MR for GJS to switch to libsysprof-capture-3.a and improve plumbing so Sysprof can connect automatically.

We also have a number of patches for GLib and GTK that improve the chances we can get useful stack-traces when unwinding from the Linux kernel (which perf_event_open does).

A MR for GNOME Shell automatically connects the GJS profiler which is required as libgjs is being used as a library here. The previous GJS patches only wire things up when the gjs binary is used.

With that stuff in place, you can get quite a bit of data correlated now.

# Logout, Switch to VT2
sysprof-cli -c "gnome-shell --wayland --display-server" --gjs --gnome-shell my-capture.syscap

If you don’t want mixed SpiderMonkey and perf stack-traces, you can use --no-perf. You can’t really rely on sample rates between two systems at the same time anyway.

With that in place, you can start correlating more frame data.

Sysprof Developments

Earlier this month, Matthias and I teamed up to push through some of our profiling tooling for GTK and GNOME. We took the occasional work I had done on Sysprof over the past few years and integrated that into the GTK-4.x tree.

Sysprof uses a binary log file to store information about execution in a manner that is easy to write-buffer and read-back using positioned reads. It helps keep the sampling overhead of sysprof low. But it’s too detail oriented for each application supporting the format to write. To make this stuff reusable I created a libsysprof-capture-3.a static library we embed from various layers of the platform.

GTK-4.x is now using this. Builder itself uses it to log internal statistics, tracing data, and counters for troubleshooting. I’ve also put forward patches for GJS to integrate with it. Georges revamped and pushed forward a prototype by Jonas to integrate with Mutter/Shell and get us frame timings and Cogl pipeline data. With some work we can finish off the i915 data sources that Eric Anholt did to correlate GPU commands too.

What this means for developers is that soon we’ll be able to capture system information from various layers in the stack and correlate them using similar clocks. We’re only scratching the surface right now, but it’s definitely promising. It’s already useful to quantify the true performance improvements of merge-requests in Mutter and Shell.

To help achieve this goal during the 3.34 cycle, I’ve started the GNOME Profiling Initiative to track integration of various problem spaces. If you’re an application developer, you can use sysprof_capture_writer_new_for_env() to create a new SysprofCaptureWriter if Sysprof is profiling you (otherwise you’ll get NULL back). Use that to write marks, logs, metadata, files, or anything else you would like to capture.

If you’re interested in helping to write more data collectors, that would be appreciated. Data sources like battery voltage/wattage consumption seem like good candidates so that we can better optimize our platform for battery-based devices.

I have a Sysprof Copr repository for Rawhide and F30 if you’d like to try stuff out and submit issues.

Many thanks to Red Hat for sponsoring all the work I do on GNOME and my amazing manager Matthias for visiting Portland to make this stuff happen even sooner.

As always, follow my grumpy ramblings on birdsite for early previews of my work.

Flatpak at SCaLE 15x

A decade ago I lived on the west side of Los Angeles. One of my favorite conferences was Southern California Linux Expo. Much like Karen, this is the conference where I performed my first technical talk. It’s also where I met and became friends with great people like Jono, Ted, Jeff, the fantastic organizing staff, and so many more.

I was happy to come back again this year and talk about what I’ve been working on in GNOME. The combination of Flatpak and Builder (and Sysprof).

The event was live streamed on youtube, and you can watch it here. I expect the rooms to be cut and uploaded as individual talks but I don’t know the timeline for that. I’ll update this if/when I discover that so you can youtube-dl if you prefer.

Profiling Flatpak’d applications

One of the great powers of namespace APIs on Linux (mount namespaces, user namespaces, etc) is that you can create a new view into the world of your computer that is very different from the host. This can make traditional profiling tools difficult.

To begin with, we need to ensure that we have access to ptrace or perf infrastructure. Easy enough, just don’t drop those privileges before calling execve(). This is the --allow=devel option to flatpak run. But after that, we need to do the detailed phase of translating instruction pointers to a function name.

Generally, the translation between an instruction pointer and a function name requires looking up what file is mapped over the address. Then you open that file with an ELF reader and locate the file containing debug information (which may be the same file). Then open that file and locate what function contains that instruction pointer. (We subtract the beginning of the map from the instruction pointer to get a relative offset).

Now, here is the trouble with mount namespaces. The “path” of the map might be something like “/newroot/usr/lib/libgtk-3.so“. Of course, “/newroot/” doesn’t actually exist.

So… what to do.

Well, we can find information about the mounts in the process by looking at /proc/$pid/mountinfo. Just look for the longest common prefix to get the translation from the mount namespace into the host. One caveat though. The translated path is relative to the root of the partition mounted. So the next step is to locate the file-system mount on the host (via /proc/mounts).

If you put this all together, you can profile and extract symbols from applications running inside of containers.

Now, most applications aren’t going to ship debug symbols with them. But we’ll need those for proper symbol extraction. So if you install the .Debug variant of a Flatpak runtime (org.gnome.Sdk.Debug for example) then we can setup symbol resolution specifically for /usr/lib/debug. Builder now does this.

I’m not sure, but I think this makes Builder and Sysprof one of the first (if not the first) profiler on Linux to support symbol extraction while profiling containerized applications from outside the container.

I’ve only tested with Flatpak, but I don’t see why this code can’t work with other tooling using mount namespaces.

Sysprof Plans for 3.24

The 3.24 cycle is just getting started, and I have a few plans for Sysprof to give us a more polished profiling experience in Builder. The details can be found on the mailing list.

In particular, I’d love to land support for visualizers. I expect this to happen soon, since there is just a little bit more to work through to make that viable. This will enable us to get a more holistic view of performance and allow us to drill into callgraphs during a certain problematic period of the profile.

Once we have visualizer support, we can start doing really cool things like extracting GPU counters, gdk/clutter frame-clock timing, dbus/cpu/network monitors and whatever else you come up with.

A CPU visualizer displayed above the callgraph to provide additional context to the profiled execution.
A CPU visualizer displayed above the callgraph to provide additional context to the profiled execution.

Additionally we have some work to do around getting access to symbols when we are running in binary stripped environments. This means we can upload a stripped binary to your IoT/low-power device to profile, but have the instruction-pointer-to-symbol resolver happen on the developers workstation.

As I just alluded to, I’d love to see remote profiling happen too. There is some plumbing that needs to occur here, but in general it shouldn’t be terribly complicated.

Builder Nightly Flatpak

First off, I’ll be in Portland at the first ever LAS giving demos and starting on Builder features for 3.24.

For a while now, you’ve been able to get Builder from the gnome-apps-nightly Flatpak repository. Until now, it had a few things that made it difficult to use. We care a whole lot about making our tooling available via Flatpak because it is going to allow us to get new code into users hands quicker, safer, and more stable.

So over the last couple of weeks I’ve dug in and really started polishing things up. A few patches in Flatpak, a few patches in Builder, and a few patches in Sysprof start getting us towards something refreshing.

Python Jedi should be working now. So you can autocomplete in Python (including GObject Introspection) to your hearts delight.

Jhbuild from within the Flatpak works quite well now. So if you have a jhbuild environment on your host, you can use the Flatpak nightly and still target your jhbuild setup.

One of the tricks to being a module maintainer is getting other people to do your work. Thankfully the magnificent Patrick Griffis came to the rescue and got polkit building inside of Flatpak. Combined with some additional Sysprof patches, we have a profiler that can run from Flatpak.

Another pain point was that the terminal was inside of a pid/mount/network namespace different than that of the host. This meant that /usr/lib was actually from the Flatpak runtime, not your system /usr/lib. This has been fixed using one of the new developer features in Flatpak.

Flatpak now supports executing programs on the host (a sandbox breakout) for applications that are marked as developer tools. For those of you building your own Flatpaks, this requires --allow=devel when running the flatpak build-finish command. Of course, one could expect UI/UX flows to make this known to the user so that it doesn’t get abused for nefarious purposes.

Now that we have access to execute a command on the host using the HostCommand method of the org.freedesktop.Flatpak.Development interface, we can piggy back to execute our shell.

The typical pty dance, performed in our program.

/* error handling excluded */
int master_fd = vte_pty_get_fd (pty);
grantpt (master_fd);
unlockpt (master_fd);
char *name = ptsname (master_fd);
int pty_fd = open (name, O_RDWR, 0);

Then when executing the HostCommand method we simply pass pty_fd as a mapping to stdin (0), stdout (1), and stderr (2).

On the Flatpak side, it will check if any of these file-descriptors are a tty (with the convenient isatty() function. If so, it performs the necessary ioctl() to make our spawned process the controller of the pty. Lovely!

So now we have a terminal, whose process is running on the host, using a pty from inside our Flatpak.

Sysprof + Builder

After the GNOME 3.20 cycle completed I started revamping Sysprof. More here, here, and here. The development went so smoothly that I did a 3.20 release a couple of weeks later.

A primary motivation of that work was rebuilding Sysprof into a set of libraries for building new tools. In particular, I wanted to integrate Sysprof with Builder as our profiler of choice.

On my flight back from GUADEC I laid the groundwork to integrate these two projects. As of Builder 3.21.90 (released yesterday) you can now profile your project quite easily. There are  more corner cases we need to handle but I consider those incremental bugs now.

Some of our upcoming work will be to integrate the Sysprof data collectors with Python and Gjs. The Gjs implementation is written, it just needs polish and integration with upstream. I think it will be fantastic once we have a compelling profiling story weather you are writing C, C++, Vala, Python, or Gjs.

We’ve also expanded the architectures supported by Sysprof. So I expect by time 3.22 is released, Sysprof will support POWER8, ARM, ARM64, mips, and various others as long as you have an up to date Linux kernel. That is an important part of our future plans to support remote profiling (possibly over USB or TCP). If you’re interested in working on this, contact me! The plumbing is there, we just need someone with time and ambition to lead the charge.

Selecting the Profiler

Builder's Callgraph View

How to Sysprof

So now that a new Sysprof release is shipped, lets pick on an unsuspecting library to see what it is like to improve performance in a real-world scenario. Today we’ll pick on GtkSourceView. They shouldn’t feel bad though, GtkSourceView is an absolutely wonderful library and like any piece of software, it can be improved.

GtkSourceView has a lovely helper program to test things out in the tests/ directory. If you are an app or library developer, please do this! It makes things much easier.

So lets run ./test-widget in our jhbuild environment, and start-up Sysprof. Often times, you’ll want to see how your program affects the whole system. But for this test, I want to focus on test-widget, so we will limit our capture to samples in that process. Do this by turning off the Profile my entire system switch and then selecting your target process from the list.

1-setup-profiler

Next, click record. You might be prompted to authorize your user to access performance counters based on your system configuration and user permissions.

2-click-record

After your profiling session has started, switch back to the test application and exercise the crap out of it. In this case, I turned on some features in the test widget like line numbers (something I always have enabled in Gedit and Builder) and started scrolling like crazy. I did this until I had about 30,000 samples recorded. Sysprof will tell you how many callchains have been recorded in the upper right corner of the window.

Then press the Stop button. Depending on the size of your capture, it might take a couple seconds, but the callgraph will then be generated. It has to crack open all of the linked libraries and extract symbol information from them, so it can take a second or two.

3-view-callgraph

Now the mysterious part. Start diving into the descendants tree following the most expensive cumulative times. We want to find something that looks “out of place”. Getting good at that takes practice. If your callchain gets too deep, just hit enter on the row and it will focus in on that item.

In the image below, you’ll see I jumped past main, various main loop junk until i got to gtk_main_do_event(). This is the crux of event dispatch in GTK+. If we keep diving down by the most expensive callchain, we get to a peculiar function, center_on(). It seems to be calling into gtk_text_view_get_iter_location() a bunch, I wonder why.

4-center_on

So lets go find the code. It is clearly called by GtkSourceGutterRendererText, so that is where we will start.

In the code below, it looks like the text gutter renderer (what draws line numbers next to your code) needs to either place the text in the middle of the row, the top of the row (in the case of line wrapping), or the bottom of the row (also in case of line wrapping).

5-find-relevant-code

In Builder, shamefully, we don’t even allow line wrapping today. So clearly a shortcut can be taken. If wrapping is disabled, we know that we will always be centering our text to the entire height of the cell. So lets cook up a quick patch to avoid the center_on() calls altogether.

5.5-patch-the-code

Now we build, and repeat our profiling session to compare the results. Originally the gutter_renderer_text_draw() was in about 33% of our collected callchains. Now, if you look below we are down to less than 20% of our collected callchains, and center_on() is nowhere to be seen!

6-compare-results

So the moral of the story is that in about half an hour, you can profile, learn something about a code-base, and make measurable improvements. So go ye forth and make the F in Free Software stand for Fast.