Accelerators, a history lesson

It’s good to dive into our shared history every now and again to learn something new. We want to build a customized shortcut engine for Builder and that means we need to have a solid understanding of all the ways to activate shortcuts in GTK+. So the following is a list of what I’ve found, as of GTK+ 3.22. I’ve included some pros and cons of each based on my experience using them in particularly large applications.

GtkAccelGroup and GtkAccelMap

The GtkAccelGroup class is remnant from the days before GtkUIManager was deprecated. It is a structure that is attached to a top-level GtkWindow and maps “paths” to closures. It’s purpose was to be used with a GtkAccelMap to ultimately bind accelerators (such as “<control>q”) to a closure (such as gtk_main_quit).

The GtkAccelMap is a singleton that maps accelerators to paths. When activating a keybinding, the accelerator machinery will try to match the path with one registered in the GtkAccelGroup on the window.

Pros

  • Widgets can be activated no matter what the focus widget. However, the window must be focused.
  • It is simple to display keyboard accelerators next to menu items.
  • Creating an accelerator editor is straight forward. Lookup accel path, map to accelerator.

Cons

  • Not deprecated, but much of the related machinery is deprecated.
  • Fairly complex API usage. It doesn’t seem to be designed to be used by application developers.

Signal Actions

Signal actions are GObject signals that contain the G_SIGNAL_ACTION bit set. By marking this bit, we allow the GTK+ machinery to map an accelerator to a particular signal on the GtkWidget (or any ancestor widget in the hierarchy if it matches).

In the GTK+ 2.x days, we would control these keybindings using a gtkrc file. But today, we use GTK+ CSS to create binding sets and attach them to widgets using selectors. This is how the VIM implementation in Builder is done.

It should be noted that signal actions (and binding them from gtkrc/css) were meant to provide a way to provide key themes to GTK+ such as an emacs mode. It was not meant to be the generic way that applications should use to activate keybindings.

Pros

  • You can use CSS selectors to apply binding sets (groups of accelerators) to widgets.
  • Accelerators can include parameters to signals, meaning you can be quite flexible in what gets activated. With enough bong hits you’ll eventually land at vim.css.

Cons

  • Only the highest-priority CSS selector can attach binding sets to the widget. So the highest-priority needs to know about all binding sets that are necessary. That reduces the effectiveness of this as a generic solution.
  • This relies on the focus chain for activation. That means you can’t have a signal activate if another widget focused such as a something in the header bar.
  • It feels like a layer violation. I could make arguments on both side of the fence for this, which is sort of indicative of the point.
  • We have to “unbind” various default accelerators on widgets that collide with our key themes. This means that “key themes” are not only additive, but destructive as well.

GAction

Years back, we wanted the ability to activate actions inside of a program in a structured way. This allowed for external menu items as well as simplifying the process of single-instance applications. This was integrated into the GtkWidget hierarchy as well. Using gtk_application_set_accels_for_action() you can map an accelerator to a GAction. Using the current widget focus, the GTK+ machinery will work its way up the widget hierarchy until it finds a matching action. For example, if a widget has a “foo” GActionGroup associated with it, and that group contains an action named “bar”, then mapping “<control>b” to “foo.bar” would activate that action.

Pros

  • Actions can have state, meaning that we can attach them to things like toggle buttons.
  • Application level actions can be activated no matter what window is focused.
  • It’s fairly easy to reason about what actions are discoverable and activatable from a node in the widget tree. Especially with GTK+ Inspector integration.
  • Theoretically we could rebuild a11y on top of GAction and fix a lot of synchronous IPC to be async similar to how GMenu integrates with external processes. This would require a lot of external collaboration on various stacks though, so it isn’t an immediate “Pro”.

Cons

  • Action activation requires the focus hierarchy to activate the action. This means to get something activatable from the entire window we need to attach the actions to the top-level. This can be a bit inconvenient, especially in multi-document scenarios which do not share widgetry.
  • Actions are rather verbose to implement. Doing them cleanly means implementing proper API to perform the action, and then wrapping those using GActionEntry or similar. Additionally, to propagate state, the base API needs to know about the action so it can update action state. Coming up with a strategy for who is the “owner” of state is challenging for newcomers.
  • The way to attach accelerators is via gtk_application_set_accels_for_action() which is a bit weird for widgets that would want to provide accelerators. It means the applications are given more freedom to control how accelerators work at the cost of having to setup a lot of their own mechanics for activation.

What do we need in Builder?

Builder certainly lies on the “more complex” side of things in terms of what is needed from the toolkit for accelerators. Let’s go over a few of those necessities to help create a wishlist for whatever code gets written.

VIM and Emacs Keybindings

These will never be 100% perfect, but we need reasonable implementations that allow Emacs and VIM users not scream at their computer while trying to switch to a new tool. That means we need to hit the 80% really well, even if that last 20% is diminishing returns. Today, we’ve found a way to mostly do this using G_SIGNAL_ACTION and a lot of custom GTK+ CSS. The con mentioned above of GTK+ CSS requiring that the -gtk-key-bindings: CSS property know about all binding sets that need to be applied from the selector make it unrealistic for us to keep pushing this further.

Custom Accelerators

We want the user to be able to start from a basic “key theme” such as Gedit, VIM, or Emacs and apply their own overrides. This means that our keybinding registration cannot be static, but flexible to account for changes while the application is running. We also need to know when there are collisions with other accelerators so that we can let the user know what they are doing may have side-effects.

GtkShortcutsWindow Integration

In Builder we do not account for the “key theme” in the shortcuts window. This can be confusing to users of Emacs and Vim mode as what they see may not actually activate the desired action. We would like a way to map the accelerators available (grouped and with proper sections) to be automatically reflected in our GtkShortcutsWindow.

Elevating Accelerator Visibility

The shortcuts window is nice, but making the accelerators available to the user while they are exploring the interface is far more beneficial (based on my experiences). So whatever we end up implementing needs to make it fairly trivial to display accelerators in the UI. Today, we do that manually by hard-coding these accelerators into the GtkBuilder UI files. That is less than ideal.

Window-wide Activation

If I’m focused on the source code editor, I may want to activate an accelerator from a panel that is visible. For example, <control><shift>F to activate the documentation search.

Using actions to do this today would require that all panel plugins register their actions on the top-level as well as keybindings using gtk_application_set_accels_for_action(). While this is acceptable for the underlying machinery, it’s almost certainly not the API we want to expose to plugin developers in Builder. It’s tedious and prone to breakage when multiple “key themes” are in play. So we need an abstraction layer here that can use the appropriate strategy for what the plugin has in mind and also allows for different accelerators based on the “key theme”.

Re-usability

There are other large applications in the GNOME eco-system that I care about. GIMP is one of those projects that has a hard time moving to GTK+ 3.x due to a few reasons, some of which are out of our control upstream. The sheer size of the application makes it difficult. Graphical tools often are full of custom widgetry which necessarily dives into implementation details. Another reason is how important accelerators are to an immersive, creativity-based application. I would like whatever we create to be useful to other projects (after a few API iterations). GIMP’s GTK+ 3 port is one obvious possibility.

Builder Happenings

Over the last couple of weeks I’ve started implementing Run support for Builder. This is somewhat tricky business since we care about complicated manners. Everything from autotools support to profiler/debugger integration to flatpak and jhbuild runtime support. Each of these complicates and contorts the problem in various directions.

Discovering the “target binary” in your project via autotools is no easy task. We have a few clever tricks but more are needed. One thing we don’t support yet is discovering bin_SCRIPTS. So launching targets like python or gjs applications is not ready yet. Once we discover .desktop files that should start working. Pretty much any other build system would be easier to implement this.

So much work behind the scenes for a cute little button.

Screenshot from 2016-07-17 22-09-21

While running, you can click stop to kill the spawned application’s process group.

Screenshot from 2016-07-17 22-10-38

But that is not all that is going on. Matthew Leeds has been working on search and replace recently (as well as a whole bunch of paper cuts). That work has landed and it is looking really good!

Screenshot from 2016-07-17 22-12-45

Also exciting is that thanks to Sebastien Lafargue we have a fancy new color picker that integrated with Builder’s source editor. You can visualize colors in your source files and change them using the dropper or numerous sliders based on how you’d like to tweak the color.

Screenshot from 2016-07-17 22-11-38

I still have a bunch of things I want to land before GUADEC, so back to the hacker den for me.

Builder Designs

Thanks to the wonderful design skill of Allan, Builder got a bunch of new designs this last month. Last week, after arriving home from the Toronto hackfest, I started reshaping Builder to match.

We’ve simplified the greeter to remove an interstitial page and raise the discoverability of some of Builder’s features.

greeter

Creating a project has been simplified and we made some features more discoverable by removing the need to dive into a GtkComboBox. Clearly, we still have a bunch of templates to land for 3.22.

new-project

Cloning a project from git is functionally the same, but we give a bit more context to what is going on.

clone-project

Those that came to my talk at FOSDEM got to hear me complain about how we have a sidebar in Builder, but I didn’t like it and wanted something different going forward. Well, we finally have landed that.

The perspective selector is in the top left corner, and clicking on it will also show you some shortcuts to help you move through Builder faster. As we grow the application, we hope to provide a bunch more useful keybindings for folks. Some of this is going to require rethinking how keybindings work in modern gtk+.

perspective-selector

One thing that was relatively buried in Builder that we wanted to improve was the Build support. We want to make it easy to build your project without getting overwhelmed by panels. So we’ve introduced a new center widget in the header called the OmniBar. It provides a high-level overview of the project and information on the current build progress, if any.

omnibar

I hope to see some more iteration on this before 3.22, and possibly API to allow plugins to supplement information for the OmniBar.

Allan is still working on even more designs this cycle, so I have no doubt that 3.22 will be our best release yet.

Matthew Leeds has been working diligently on our new search and replace engine (in addition to many Vim compatibility improvements). I’m looking forward to having that land soon (and some screenshots to go with it)!

Builder 3.20.4

Those of us happy hackers in #gnome-builder have been diligently preparing 3.20.4 for you. I expect that most people will end up using this version during the 3.20 life-cycle as the big distros are starting to ship 3.20. We might do another 3.20 release, but I haven’t decided. There are lots of stability and performance improvements, and I’m pretty happy with where things are going.

Now that this release is out, it is probably time to start pushing hard on our 3.22 features. I’m happy to have Fangwen Yu working on Builder this summer on our search and replace engine. We have some great mockups in the works and I have no doubt Fangwen is going to do a great things with the code-base.

A few screenshots, because that’s what I’m known for.

Screenshot from 2016-05-06 18-17-10

Screenshot from 2016-05-06 18-20-29

Screenshot from 2016-05-06 18-22-38

Tarballs can be found on downloads.gnome.org.

Build System Fallbacks

It’s no secret that I focus my build system efforts in Builder around Autotools. I’m happy to include support for other build systems, so long as I’m not the person writing it.

Sometimes the right piece of code falls together so easily you wonder how the hell you didn’t think of it before. Today was such a day.

If you are using Builder from git (such as via jhbuild) or from the gnome-builder-3-20 branch (what will become 3.20.4) you can use Builder with the fallback build system. This is essentially our “NULL” build system and has been around forever. But today, these branches learned something so stupidly obvious I’m ashamed I didn’t do it 6 months ago when implementing Build Configurations.

If you go into the Build Configuration panel (the middle button in the sidebar) you can specify environment flags. Builder will use CFLAGS, CXXFLAGS, and VALAFLAGS now to prime those compilers in the absence of Autotools (where we have discovered these dynamically).

build-flags

And now semantic language features should work.

examples

How to Sysprof

So now that a new Sysprof release is shipped, lets pick on an unsuspecting library to see what it is like to improve performance in a real-world scenario. Today we’ll pick on GtkSourceView. They shouldn’t feel bad though, GtkSourceView is an absolutely wonderful library and like any piece of software, it can be improved.

GtkSourceView has a lovely helper program to test things out in the tests/ directory. If you are an app or library developer, please do this! It makes things much easier.

So lets run ./test-widget in our jhbuild environment, and start-up Sysprof. Often times, you’ll want to see how your program affects the whole system. But for this test, I want to focus on test-widget, so we will limit our capture to samples in that process. Do this by turning off the Profile my entire system switch and then selecting your target process from the list.

1-setup-profiler

Next, click record. You might be prompted to authorize your user to access performance counters based on your system configuration and user permissions.

2-click-record

After your profiling session has started, switch back to the test application and exercise the crap out of it. In this case, I turned on some features in the test widget like line numbers (something I always have enabled in Gedit and Builder) and started scrolling like crazy. I did this until I had about 30,000 samples recorded. Sysprof will tell you how many callchains have been recorded in the upper right corner of the window.

Then press the Stop button. Depending on the size of your capture, it might take a couple seconds, but the callgraph will then be generated. It has to crack open all of the linked libraries and extract symbol information from them, so it can take a second or two.

3-view-callgraph

Now the mysterious part. Start diving into the descendants tree following the most expensive cumulative times. We want to find something that looks “out of place”. Getting good at that takes practice. If your callchain gets too deep, just hit enter on the row and it will focus in on that item.

In the image below, you’ll see I jumped past main, various main loop junk until i got to gtk_main_do_event(). This is the crux of event dispatch in GTK+. If we keep diving down by the most expensive callchain, we get to a peculiar function, center_on(). It seems to be calling into gtk_text_view_get_iter_location() a bunch, I wonder why.

4-center_on

So lets go find the code. It is clearly called by GtkSourceGutterRendererText, so that is where we will start.

In the code below, it looks like the text gutter renderer (what draws line numbers next to your code) needs to either place the text in the middle of the row, the top of the row (in the case of line wrapping), or the bottom of the row (also in case of line wrapping).

5-find-relevant-code

In Builder, shamefully, we don’t even allow line wrapping today. So clearly a shortcut can be taken. If wrapping is disabled, we know that we will always be centering our text to the entire height of the cell. So lets cook up a quick patch to avoid the center_on() calls altogether.

5.5-patch-the-code

Now we build, and repeat our profiling session to compare the results. Originally the gutter_renderer_text_draw() was in about 33% of our collected callchains. Now, if you look below we are down to less than 20% of our collected callchains, and center_on() is nowhere to be seen!

6-compare-results

So the moral of the story is that in about half an hour, you can profile, learn something about a code-base, and make measurable improvements. So go ye forth and make the F in Free Software stand for Fast.

Designing APIs for multiple languages

Designing software that is both fast and available to higher level languages generally means you end up writing C. There are guiding principles you should follow when doing so to ensure that you give your software the best chance for success.

Design Failures

Lets start with a look into my past. When I was employed at MongoDB a few years back, I was tasked with writing the modern, fast, C client library. The secondary goal was to speed up the other drivers that used bits of C “for performance reasons”. However, the performance gain from the C components was a meager 1-2x faster than just implementing it in the higher-level language. This is what happens when we fail to see the big picture, which is the first step in understanding.

The cost of a thunk in and out of the language runtime is reasonably fast these days. But when you do lots of them they quickly add up. (A thunk is simply a wrapper around calling another function that possibly has setup/teardown and possibly marshaling to perform).

In the example above, the reason for the meager gains in performance from C was simple. It was encoding/decoding each individual BSON document by calling into C (and then back up into python, ruby, etc) rather than as a set. Imagine if you get a result from the server containing 1000 documents. In this case you’d cross the language barrier at least 1000 times. Now what if while decoding those documents you have to create structures that are owned by the language runtime (calling back into the runtime to allocate). Now your 1000 could have just turned into 3000 at best, and more likely, many times worse.

However, if you simply dive into C once to decode the whole stream, you cut a large number of thunks out of the equation. If instead you move the whole database client, socket handling, encryption, etc into C, you can avoid even more thunks. This is why wrapping the libmongoc C library in python was closer to 10-15x faster than the native python version compared to the meager 1-2x faster with per-document decoding.

By maximizing the time you are in C, you give yourself the largest potential for performance improvement. Where you draw your language boundary is equally important to the data-structures you choose.

GObject

We use GObject across the board in GNOME. And for a living piece of software that is nearly old enough to drink, that is a good thing. Like all type systems designed in the 1990s, it has some warts. But generally, it gets the job done and provides the inter-language features we want with very little effort.

But you need to be careful when designing APIs if you intend for them to be accessible from multiple languages. For example, if your API relies on gsignal (what other languages often call “events”), you should at least think about the costs.

For example, imagine that the callback connected to your signal is in python. Your C code knows nothing of python and therefore likely does not hold the GIL (global interpreter lock). That means that when your signal fires, and it tries to thunk to the python callback, it must first marshal parameters (possibly copying), and then acquire the python GIL (generally fine). Now imagine you do this many times per second because your design emits signal everywhere (GtkWidget, for example). Now all of a sudden you are entering/exiting the language barrier many times in rapid succession. The thunks add up.

A very similar but equally important thing is the use of main loop timeouts. In GLib-based code, we generally use some form of g_idle_add_full() that registers a new GSource. First off, for every one of these we have to wake up the main loop, mutate data structures, detect level-triggered poll events, and possibly destroy it at the end of the main loop cycle (for one-shot sources). And that doesn’t even include the callback into your language runtime. Now imagine you do this on every frame of an animation. Now imagine that for every frame of the animation you update multiple actors in your scene graph. All of a sudden your thunk costs went through the roof, and you haven’t done any actual work yet.

Designing for success

So, how do we design APIs that don’t suffer from these issues? Well first off, really consider whether the use of gsignal is beneficial.

  • Avoid gsignal when simply a single callback function will suffice. gsignal synchronizes all emissions via the global lock used to locate signal information. Obviously we can optimize this, but I’m not sure it changes anything.
  • Design APIs that can be setup from dynamic languages, but execute purely from C. For example, create your animation structure from JavaScript or Python, but the tweener itself should not involve thunks back into dynamic languages. See EggAnimation as an example.
  • If you find yourself calling into C functions in a tight loop, stop and think about what you are doing.

GNOME Shell Profiling

Now that we have a Gjs profiler we can start looking at doing some fun things with it.

Today I wrote a couple line patch to GNOME Shell to toggle on and off the profiler using SIGUSR2. So if you build Gjs and gnome-shell with the appropriate patches, you can do something like:

gnome-shell --wayland
# .. inside shell
kill -SIGUSR2 <pid>
# .. exercise shell a bit
kill -SIGUSR2 <pid>
# .. now look at /tmp/gjs-profile-$pid

If you open that file up with my Sysprof improvements, you can browse around the profile information containing JavaScript stacks.

It looks something like this.

Happy bug hunting!

A profiler of our own

So now that you are all aware that I’ve been working to modernize Sysprof, you might not be surprised to read that I decided to push things in a bit more interesting of a direction.

I have a very opinionated stance on languages. Which is, in general, they’re all terrible. Only to be made more terrible by their implementations. Which is to say, we’re all screwed and how do these computers work anyway?

But alas, succumbing to the numbness of existence is not how I want to spend my life, so time to pull up my figurative bootstraps and improve what we’ve got.

So what’s in my cross-hairs this week? Well, quite frankly, those JavaScripts. I’ve worked on a handful of language runtimes, but modern JavaScript engines really take the cake for turning a Von Neumann machine into something completely foreign and utterly non-debuggable, non-profile-able, generally unpleasant for anyone to come in and improve things other than the original author. (Same could be said about C. But wait, I said I *wasn’t* succumbing to the numbness).

Well, that isn’t completely true, given that you only write JavaScript within the context of a browser. The tooling in browsers these days is rapidly approaching something useful. Which is saying a lot when you have a language that is almost exclusively using nameless functions attached to prototype fields.

So here we go, off into the land of mystery. libmozjs24 is our path, with only headers and a few sources to guide us.

So how do profilers work anyway?

Lets simplify this to a static C-like language for the moment.

Say your program is happily running along, maybe processing things from a main loop. Along comes a signal from the kernel. The kernel decides it will deliver the signal to your main thread (see man pthread_sigmask for information on how signals are delivered and how to block them). What happens next is that on top of your current executing stack, your registered signal handler (see man sigaction) is executed.

At this point, you can do a couple of things. Generally, you might use something like libunwind to unwind the stack (past your signal handler) and record each of the instruction pointers for each frame. Using that, you can later look at what library was mapped into the region containing the instruction pointer, and resolve the function name by reading the library ELF (and possibly demangling if it is a C++ function).

If you remember your UNIX handbook, the way to spawn a process (simplified) is to fork(), followed by an exec() of the new process. So how do we get our signal handler into that process?

POSIX timers

In the 2008 version of POSIX, timers were introduced (See man timer_create for more information). They allow you to deliver a signal on an registered interval. Even better, POSIX timers follow through exec but not fork. Exactly what we want. We can fork(), setup the timer, and then exec() our target.

It looks something like this:

struct sigevent sev = { 0 };
sev.sigev_signo = SIGPROF;                   /* Send us SIGPROF */
sev.sigev_notify = SIGEV_THREAD_ID;          /* Linux extension! */
sev._sigev_un._tid = syscall (__NR_gettid);  /* See man gettid */
timer_create (CLOCK_MONOTONIC, &sev, &timer);

The above code creates a timer that we can enable/disable to deliver SIGPROF on a given interval. Note that the above code uses a Linux extension that allows you to specify the thread to receive the signal. Without something like that, you would need to either create a thread to use pthread_kill() to send SIGPROF, or mask SIGPROF from all other threads, lest they receive the signal handler instead of your target thread.

So why is the thread that the profiler runs on important? We’ll get to that later when we start looking at JITd languages.

To activate the timer, we setup our interval (frequency) to sample.

struct itimerspec its = { 0 };
its.it_interval.tv_sec = 0;
its.it_interval.tv_nsec = NSEC_PER_SEC / SAMPLES_PER_SEC;
its.it_value.tv_sec = 0;
its.it_value.tv_nsec = NSEC_PER_SEC / SAMPLES_PER_SEC;
timer_settime (self->timer, 0, &its, &old_its);

Extracting Samples

Now say we used sigaction(2) to setup a signal handler for SIGPROF. And now it is being called via signal delivery of SIGPROF. Our stack might look something like:

our_sigprof_handler()
< signal handler invoked >
some_worker()
main_loop_dispatch()
main_loop_iterate()
main()

We can unwind to get to the root of the stack, stash each of those addresses somewhere, and then return from our signal handler (allowing the program to continue executing).

However, the devils in the details there. We are in a signal handler, which basically means, you can’t do much, safely. Why? Well avoiding deadlocks mostly.

Say that SIGPROF got delivered while you were in malloc()? It might look something like:

our_sigprof_handler()
< signal handler invoked >
do_alloc()
_int_lock()
malloc()
some_worker()
main_loop_dispatch()
main_loop_iterate()
main()

Now, if you try to call malloc(), you’ll deadlock on _int_lock(). Basically, the only thing you can do is write to some memory you have pre-allocated. Buffer space, if you will.

Thankfully, you can do one more thing, which is write() to a file-descriptor. That is handy in case our buffer gets too full.

But you need to be very careful here. Even things like g_assert() could potentially cause a printf() or similar which might internally malloc. So you might need to -DG_DISABLE_ASSERT for production (which you should be doing anyway).

The complexities of a JIT

JITs truly are the magic of our times. They are the meta of the computer to improve itself faster than we can. That said, like your smart friends, they can be super annoying.

Discussing what a JIT is, the added complexity of tracing JITs, and their almost always pairing with garbage collection is out of scope for this. However, there are some excellent implementations you can go read (See mono as one such example of quality engineering).

In this case, the meaning of an address (the function in question, to say) could be the same between two samples, but be part of two completely different functions. Suppose that the JIT regenerated and reused the existing memory.

If you simply went by the address and did not account for the change, then you’d get wildly inaccurate results.

How SPSProfiler comes to the rescue

Thankfully, this work was building on top of SPSProfiler from libmozjs. While we have to provide our own sampler (which we’ve described the basics of above), it provides the necessary JIT integration in a moderately fast manner.

js::ProfileEntry

What mozjs gives us is a way to ask the runtime to deliver the current JS stack into a static buffer that we can read from. Each time JS dives into or out of a function, that “shadow stack” is updated.

However, since libmozjs has a garbage collector, if we were to access this stack information from any other thread, we’d be racing the GCing of strings pointed to by that shadow stack. So we need to ensure we interrupt the executing stack (hence SIGPROF, timers, etc).

Racing with a GC doesn’t matter in a C like language because, well there is no GC, and we can resolve symbol names after the fact because of ELF and DWARF debug information.

volatile uint32_t stack_depth = 0;
js::ProfileEntry stack[1024];
js::SetRuntimeProfilingStack (js_runtime,
                              stack,
                              &stack_depth,
                              G_N_ELEMENTS (stack));
js::EnableRuntimeProfilingStack (self->runtime, true);

Deduplicating Functions

To get the function name for JavaScript functions, we can simply read the string pointed to by the js::ProfileEntry. Something like:

if (entry->js()) {
  const gchar *name = entry->label() ?: "-- javascript --"
}

Now, if we had to copy that string for every instruction pointer in the stack, on every sample, we’d have pretty significant overhead in our profiler. So we do a bit of a deduplication hack. We have something that looks like a chunked allocator (GStringChunk) except it can’t be an allocator, because oh wait, in a signal handler.

So instead, we just have a fixed buffer size to store a single instance of the string, and a closed hashtable (no allocations required) to help with our deduplication.

While we fill the deduplicated function buffer, we hand out monotonic address identifiers for each new string. We steal some high bits in the address to indicate that this is a JITd function. (0xE << __wordsize-8). Normally this would colide with kernel addresses, but not in practice.

So now we can just replace our JITd addresses with these and store them like we would normally for C-like stacks.

What’s missing?

While SPSProfiler will put JS information (including native code) for functions before the current JavaScript frame, it doesn’t seem to give info after the current JS frame. So if your JS code calls into C code, which doesn’t result in a callback back into JavaScript code, getting reliable stack addresses is non-trivial. Maybe I’m missing something here though.

I need to go read some more code, because I’m likely to believe there is a way to deal with this, but might require newer libmozjs. Anyway, future blog post and all.

Capture formats

My new sysprof implementation uses a new binary capture format I’ve put together. Basically something really simple that lets me get data into the buffer quickly without too much fuss, alignment safe (allowing dereference of integers from the buffer with out use of memcpy()) and without the super annoying reality of Linux’s perf event stream which has dynamic trailing data based on what options were enabled when sampling. Seriously, don’t do that.

I still have some things to add, but this is the first step towards making more interesting captures (like in-app performance counters, VBlank information, compositor FPS, GTK+ frame timings, etc).

So what does that even mean?

Well, I guess you can see for yourself. Here is a simple capture I did to pick on GNOME documents (no reason really, just the first JavaScript/C hybrid app that came to mind) showing Sysprof reading our JavaScript profiler output.

Gjs Profiler

There is still some work to be done. I need to get the Gjs patches in a format suitable for inclusion. That means fixing up some boring signal handler code to be more safe. Ray Strode has already provided some good feedback here.

Additionally, I need to get my sysprof2 repository grafted into upstream Sysprof. This week I swear. We also need to get patches into some programs that use libgjs directly without the use of gjs-console so that they too can be profiled. gnome-shell is the obvious culprit here.

Come hack on Sysprof with me? Please?

If any of this interests you, I’d love to have some people come help work on modernizing Sysprof into something we all love. It’s still missing one thing I loved while working on PerfKit years ago. Pretty graphs and charts of runtime information. We can start sampling cpu, memory, network, kms/drm information, and more. With that data available, we can build some pretty compelling tooling for GNOME and GNU/Linux in general. (BSDs obviously too, but today Sysprof is Linux only due to the Linux kernel integration).

No doubt, there are gaps and missing information in this blog post. It’s hard to capture all the details after you do the work. For the nitty-gritty details, best to go look at the code.

  • https://github.com/chergert/sysprof2
  • https://github.com/chergert/gjs
  • Build/install the code above.
  • gjs-console --profile-output=foo.syscap /usr/bin/gnome-documents
  • You might need to let it exit gracefully, I don’t have periodic sample flushes implemented yet.
  • sysprof foo.capture
  • Explore!

Thanks to Red Hat for letting me dive head first into this. They really are the best employer I’ve ever had.