Sysprof + Builder

After the GNOME 3.20 cycle completed I started revamping Sysprof. More here, here, and here. The development went so smoothly that I did a 3.20 release a couple of weeks later.

A primary motivation of that work was rebuilding Sysprof into a set of libraries for building new tools. In particular, I wanted to integrate Sysprof with Builder as our profiler of choice.

On my flight back from GUADEC I laid the groundwork to integrate these two projects. As of Builder 3.21.90 (released yesterday) you can now profile your project quite easily. There are  more corner cases we need to handle but I consider those incremental bugs now.

Some of our upcoming work will be to integrate the Sysprof data collectors with Python and Gjs. The Gjs implementation is written, it just needs polish and integration with upstream. I think it will be fantastic once we have a compelling profiling story weather you are writing C, C++, Vala, Python, or Gjs.

We’ve also expanded the architectures supported by Sysprof. So I expect by time 3.22 is released, Sysprof will support POWER8, ARM, ARM64, mips, and various others as long as you have an up to date Linux kernel. That is an important part of our future plans to support remote profiling (possibly over USB or TCP). If you’re interested in working on this, contact me! The plumbing is there, we just need someone with time and ambition to lead the charge.

Selecting the Profiler

Builder's Callgraph View

Builder Happenings

Over the last couple of weeks I’ve started implementing Run support for Builder. This is somewhat tricky business since we care about complicated manners. Everything from autotools support to profiler/debugger integration to flatpak and jhbuild runtime support. Each of these complicates and contorts the problem in various directions.

Discovering the “target binary” in your project via autotools is no easy task. We have a few clever tricks but more are needed. One thing we don’t support yet is discovering bin_SCRIPTS. So launching targets like python or gjs applications is not ready yet. Once we discover .desktop files that should start working. Pretty much any other build system would be easier to implement this.

So much work behind the scenes for a cute little button.

Screenshot from 2016-07-17 22-09-21

While running, you can click stop to kill the spawned application’s process group.

Screenshot from 2016-07-17 22-10-38

But that is not all that is going on. Matthew Leeds has been working on search and replace recently (as well as a whole bunch of paper cuts). That work has landed and it is looking really good!

Screenshot from 2016-07-17 22-12-45

Also exciting is that thanks to Sebastien Lafargue we have a fancy new color picker that integrated with Builder’s source editor. You can visualize colors in your source files and change them using the dropper or numerous sliders based on how you’d like to tweak the color.

Screenshot from 2016-07-17 22-11-38

I still have a bunch of things I want to land before GUADEC, so back to the hacker den for me.

Builder Designs

Thanks to the wonderful design skill of Allan, Builder got a bunch of new designs this last month. Last week, after arriving home from the Toronto hackfest, I started reshaping Builder to match.

We’ve simplified the greeter to remove an interstitial page and raise the discoverability of some of Builder’s features.

greeter

Creating a project has been simplified and we made some features more discoverable by removing the need to dive into a GtkComboBox. Clearly, we still have a bunch of templates to land for 3.22.

new-project

Cloning a project from git is functionally the same, but we give a bit more context to what is going on.

clone-project

Those that came to my talk at FOSDEM got to hear me complain about how we have a sidebar in Builder, but I didn’t like it and wanted something different going forward. Well, we finally have landed that.

The perspective selector is in the top left corner, and clicking on it will also show you some shortcuts to help you move through Builder faster. As we grow the application, we hope to provide a bunch more useful keybindings for folks. Some of this is going to require rethinking how keybindings work in modern gtk+.

perspective-selector

One thing that was relatively buried in Builder that we wanted to improve was the Build support. We want to make it easy to build your project without getting overwhelmed by panels. So we’ve introduced a new center widget in the header called the OmniBar. It provides a high-level overview of the project and information on the current build progress, if any.

omnibar

I hope to see some more iteration on this before 3.22, and possibly API to allow plugins to supplement information for the OmniBar.

Allan is still working on even more designs this cycle, so I have no doubt that 3.22 will be our best release yet.

Matthew Leeds has been working diligently on our new search and replace engine (in addition to many Vim compatibility improvements). I’m looking forward to having that land soon (and some screenshots to go with it)!

LAS, hosted by GNOME

Sri and many members of our community have spearheaded a wonderful new conference named Libre Application Summit. It’s hosted by the GNOME Foundation and has aspirations to bring together a wide spectrum of contributors.

This conference is meant to bridge a gap in Free Software communication. We need a place for application and game developers, desktop developers, systems implementers, distributions, hardware producers, and driver developers to communicate and solve problems face to face. There are so many moving parts to a modern operating system that it is very rare to have all of these passionate people in the same room.

This will be a great place to learn about how to contribute to these technologies as well. It seems likely that I’ll do tutorial workshops and other training for participants at LAS.

I’m very excited to see where this conference goes and hope to see you in Portland come September!

Secure from whom

Side-channel attacks are a thing, this is true. But they also cost a lot of time and money to develop. If you want something that can be applied to more than just a single target, that cost explodes. That is why the two most common places where side-channel attacks are developed are nation states and universities specializing in that research.

What is not helpful, beyond informing people of the existence of them, is to simply state that side-channel attacks exist and therefore nothing is secure. Even more so without demonstrating how they are real-word applicable and how that information should alter the direction of development.

Security is a nebulous word and is almost always used as an incomplete sentence. It lacks an important qualifier. Secure from whom.

Creating a side-channel attack almost always requires knowing a bit about your target. Doubly so for something as delicate as timing attacks. Also, don’t forget to take into account development time for said attacks. If the software changes at a rate faster than you can develop your exploit, well, that’s note worthy.

Making it more difficult for an application to extract information from outside the containment zone does in fact protect the user from practical attacks which do not require a nation state to develop. It also most certainly cannot protect you from everything. Such is the reality of existence. I’m not safe from a meteorite hitting me but my risk assessment shows everything is going fine and it is not worth the mental stress to worry about.

So in summation, I’m far more interested in focusing on our ability to get security fixes out to users in a timely fashion. Herd immunity can work for software too.

Builder 3.20.4

Those of us happy hackers in #gnome-builder have been diligently preparing 3.20.4 for you. I expect that most people will end up using this version during the 3.20 life-cycle as the big distros are starting to ship 3.20. We might do another 3.20 release, but I haven’t decided. There are lots of stability and performance improvements, and I’m pretty happy with where things are going.

Now that this release is out, it is probably time to start pushing hard on our 3.22 features. I’m happy to have Fangwen Yu working on Builder this summer on our search and replace engine. We have some great mockups in the works and I have no doubt Fangwen is going to do a great things with the code-base.

A few screenshots, because that’s what I’m known for.

Screenshot from 2016-05-06 18-17-10

Screenshot from 2016-05-06 18-20-29

Screenshot from 2016-05-06 18-22-38

Tarballs can be found on downloads.gnome.org.

Build System Fallbacks

It’s no secret that I focus my build system efforts in Builder around Autotools. I’m happy to include support for other build systems, so long as I’m not the person writing it.

Sometimes the right piece of code falls together so easily you wonder how the hell you didn’t think of it before. Today was such a day.

If you are using Builder from git (such as via jhbuild) or from the gnome-builder-3-20 branch (what will become 3.20.4) you can use Builder with the fallback build system. This is essentially our “NULL” build system and has been around forever. But today, these branches learned something so stupidly obvious I’m ashamed I didn’t do it 6 months ago when implementing Build Configurations.

If you go into the Build Configuration panel (the middle button in the sidebar) you can specify environment flags. Builder will use CFLAGS, CXXFLAGS, and VALAFLAGS now to prime those compilers in the absence of Autotools (where we have discovered these dynamically).

build-flags

And now semantic language features should work.

examples

How to Sysprof

So now that a new Sysprof release is shipped, lets pick on an unsuspecting library to see what it is like to improve performance in a real-world scenario. Today we’ll pick on GtkSourceView. They shouldn’t feel bad though, GtkSourceView is an absolutely wonderful library and like any piece of software, it can be improved.

GtkSourceView has a lovely helper program to test things out in the tests/ directory. If you are an app or library developer, please do this! It makes things much easier.

So lets run ./test-widget in our jhbuild environment, and start-up Sysprof. Often times, you’ll want to see how your program affects the whole system. But for this test, I want to focus on test-widget, so we will limit our capture to samples in that process. Do this by turning off the Profile my entire system switch and then selecting your target process from the list.

1-setup-profiler

Next, click record. You might be prompted to authorize your user to access performance counters based on your system configuration and user permissions.

2-click-record

After your profiling session has started, switch back to the test application and exercise the crap out of it. In this case, I turned on some features in the test widget like line numbers (something I always have enabled in Gedit and Builder) and started scrolling like crazy. I did this until I had about 30,000 samples recorded. Sysprof will tell you how many callchains have been recorded in the upper right corner of the window.

Then press the Stop button. Depending on the size of your capture, it might take a couple seconds, but the callgraph will then be generated. It has to crack open all of the linked libraries and extract symbol information from them, so it can take a second or two.

3-view-callgraph

Now the mysterious part. Start diving into the descendants tree following the most expensive cumulative times. We want to find something that looks “out of place”. Getting good at that takes practice. If your callchain gets too deep, just hit enter on the row and it will focus in on that item.

In the image below, you’ll see I jumped past main, various main loop junk until i got to gtk_main_do_event(). This is the crux of event dispatch in GTK+. If we keep diving down by the most expensive callchain, we get to a peculiar function, center_on(). It seems to be calling into gtk_text_view_get_iter_location() a bunch, I wonder why.

4-center_on

So lets go find the code. It is clearly called by GtkSourceGutterRendererText, so that is where we will start.

In the code below, it looks like the text gutter renderer (what draws line numbers next to your code) needs to either place the text in the middle of the row, the top of the row (in the case of line wrapping), or the bottom of the row (also in case of line wrapping).

5-find-relevant-code

In Builder, shamefully, we don’t even allow line wrapping today. So clearly a shortcut can be taken. If wrapping is disabled, we know that we will always be centering our text to the entire height of the cell. So lets cook up a quick patch to avoid the center_on() calls altogether.

5.5-patch-the-code

Now we build, and repeat our profiling session to compare the results. Originally the gutter_renderer_text_draw() was in about 33% of our collected callchains. Now, if you look below we are down to less than 20% of our collected callchains, and center_on() is nowhere to be seen!

6-compare-results

So the moral of the story is that in about half an hour, you can profile, learn something about a code-base, and make measurable improvements. So go ye forth and make the F in Free Software stand for Fast.

Designing APIs for multiple languages

Designing software that is both fast and available to higher level languages generally means you end up writing C. There are guiding principles you should follow when doing so to ensure that you give your software the best chance for success.

Design Failures

Lets start with a look into my past. When I was employed at MongoDB a few years back, I was tasked with writing the modern, fast, C client library. The secondary goal was to speed up the other drivers that used bits of C “for performance reasons”. However, the performance gain from the C components was a meager 1-2x faster than just implementing it in the higher-level language. This is what happens when we fail to see the big picture, which is the first step in understanding.

The cost of a thunk in and out of the language runtime is reasonably fast these days. But when you do lots of them they quickly add up. (A thunk is simply a wrapper around calling another function that possibly has setup/teardown and possibly marshaling to perform).

In the example above, the reason for the meager gains in performance from C was simple. It was encoding/decoding each individual BSON document by calling into C (and then back up into python, ruby, etc) rather than as a set. Imagine if you get a result from the server containing 1000 documents. In this case you’d cross the language barrier at least 1000 times. Now what if while decoding those documents you have to create structures that are owned by the language runtime (calling back into the runtime to allocate). Now your 1000 could have just turned into 3000 at best, and more likely, many times worse.

However, if you simply dive into C once to decode the whole stream, you cut a large number of thunks out of the equation. If instead you move the whole database client, socket handling, encryption, etc into C, you can avoid even more thunks. This is why wrapping the libmongoc C library in python was closer to 10-15x faster than the native python version compared to the meager 1-2x faster with per-document decoding.

By maximizing the time you are in C, you give yourself the largest potential for performance improvement. Where you draw your language boundary is equally important to the data-structures you choose.

GObject

We use GObject across the board in GNOME. And for a living piece of software that is nearly old enough to drink, that is a good thing. Like all type systems designed in the 1990s, it has some warts. But generally, it gets the job done and provides the inter-language features we want with very little effort.

But you need to be careful when designing APIs if you intend for them to be accessible from multiple languages. For example, if your API relies on gsignal (what other languages often call “events”), you should at least think about the costs.

For example, imagine that the callback connected to your signal is in python. Your C code knows nothing of python and therefore likely does not hold the GIL (global interpreter lock). That means that when your signal fires, and it tries to thunk to the python callback, it must first marshal parameters (possibly copying), and then acquire the python GIL (generally fine). Now imagine you do this many times per second because your design emits signal everywhere (GtkWidget, for example). Now all of a sudden you are entering/exiting the language barrier many times in rapid succession. The thunks add up.

A very similar but equally important thing is the use of main loop timeouts. In GLib-based code, we generally use some form of g_idle_add_full() that registers a new GSource. First off, for every one of these we have to wake up the main loop, mutate data structures, detect level-triggered poll events, and possibly destroy it at the end of the main loop cycle (for one-shot sources). And that doesn’t even include the callback into your language runtime. Now imagine you do this on every frame of an animation. Now imagine that for every frame of the animation you update multiple actors in your scene graph. All of a sudden your thunk costs went through the roof, and you haven’t done any actual work yet.

Designing for success

So, how do we design APIs that don’t suffer from these issues? Well first off, really consider whether the use of gsignal is beneficial.

  • Avoid gsignal when simply a single callback function will suffice. gsignal synchronizes all emissions via the global lock used to locate signal information. Obviously we can optimize this, but I’m not sure it changes anything.
  • Design APIs that can be setup from dynamic languages, but execute purely from C. For example, create your animation structure from JavaScript or Python, but the tweener itself should not involve thunks back into dynamic languages. See EggAnimation as an example.
  • If you find yourself calling into C functions in a tight loop, stop and think about what you are doing.

Sysprof 3.20.0

Previously, previously, previously, and previously.

The past couple of weeks went by so fast working on Sysprof that I think we can actually release it as a (late) 3.20 application. We don’t have many translations, but on the other hand, we didn’t have any before.

There is so much we could do with Sysprof going forward, but I’ve got an IDE to write. If some of you are interested in working on the application, I can push you in the right direction. I have some visualization prototypes and ideas for more data collectors. I’d love to hand that off to newcomer(s) who are proficient in C and as nuts as I am about performance.

TarballsNEWS