Moving clang out of process

For the past couple of weeks, Builder from git master has come with a new gnome-builder-clang subprocess. Instead of including libclang in the UI process, we now proxy all of that work to the subprocess. This should have a very positive effect on memory usage within the UI process. It will also simplify the process of using valgrind/ASAN and obtaining useful results. In the future, we’ll teach the subprocess supervisor to recycle subprocesses if they consume too much memory.

That is rather important because compilers wreak havoc on the memory allocator. However, moving the work out of process presents a series of challenges too.

Once you move all of that stuff out of process, you need an efficient way to communicate between processes. The design I came up with is plain GVariant messages delivered over a stdin/stdout pipe to the subprocess. This is very similar to the Language Server Protocol, but lets us cheat in an important way.

We sometimes need to pass large messages between processes, including changes to unsaved buffers or completion results. Clang prefers to do client-side filtering of completion results, so the list often includes some 25,000+ results. Not exactly the type of thing you want to parse into lots of tiny JSON objects and strings. By using GVariant as our message format, we ensure that we only have a single memory allocation for the entire result set (and can use C strings embedded in the variant too!).
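To make that concrete, here is a minimal sketch (not Builder’s actual wire format) of how a GVariant message might be framed over the stdin/stdout pipe; the 4-byte length prefix and the write_message() helper are assumptions for illustration only:

#include <gio/gio.h>

static gboolean
write_message (GOutputStream  *stream,
               GVariant       *message,
               GError        **error)
{
  g_autoptr(GVariant) normalized = g_variant_get_normal_form (message);
  guint32 length = GUINT32_TO_LE ((guint32) g_variant_get_size (normalized));

  /* Hypothetical framing: a little-endian length prefix followed by the
   * serialized variant. The whole payload is one contiguous buffer, so a
   * large completion result set stays a single allocation on both sides. */
  if (!g_output_stream_write_all (stream, &length, sizeof length, NULL, NULL, error))
    return FALSE;

  return g_output_stream_write_all (stream,
                                    g_variant_get_data (normalized),
                                    g_variant_get_size (normalized),
                                    NULL, NULL, error);
}

On the receiving side, the same buffer can be handed to g_variant_new_from_data() without copying, which is where the single-allocation property pays off.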

By altering some of our APIs to use interfaces like GListModel, we end up lazier about object creation, use less memory, and fragment memory significantly less. One area where this will change in the near future is our new completion engine. It is based on what we’ve learned from other async APIs in Builder, along with using GListModel for efficiency.

One thing that is not well known is that g_variant_get_child_value() is O(1) due to how arrays are laid out in GVariant. This couples well with g_list_model_get_item().
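As a rough illustration of how those two pieces fit together, here is a sketch of a GListModel get_item() implementation backed by a GVariant array; MyResults and my_result_item_new_take() are hypothetical names, not Builder’s actual types:

static gpointer
my_results_get_item (GListModel *model,
                     guint       position)
{
  MyResults *self = MY_RESULTS (model);   /* hypothetical container object */
  GVariant *child;

  /* O(1): the offsets of array elements are encoded in the serialized
   * GVariant, so no per-item parsing or allocation happened up front. */
  child = g_variant_get_child_value (self->results, position);

  /* Only the row actually requested gets a wrapper object. */
  return my_result_item_new_take (child); /* hypothetical, takes ownership */
}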

This has been a big step forward, but there is more to do.

More compiler fun

Yesterday’s post about -fstack-protector challenges leads us to today’s post. If you haven’t read it, go back and read it first.

Basically, the workaround I had at the time was to just disable -fstack-protector for the get_type() functions. It certainly made things faster, but it was a compromise. The get_type() functions can have user-provided code inserted into them via macros like G_DEFINE_TYPE_EXTENDED() and friends.

A real solution should return the hot path to its pre-stack-protector performance without sacrificing the protection gained by using it.

So I spent some time today making that happen. In bugzilla #795180 I’ve added some patches which break the get_type() functions into two: one containing the hot path, and a second doing the full type registration, protected by our g_once_init_enter(). If we add the magic __attribute__((noinline)) to the function doing the full type registration, it can’t be inlined into the fast path, allowing the fast path to pass the stack-protector sniff test (and therefore not incur the stack-checking wrath).
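For a feel of the shape this takes, here is a hand-written sketch of the split; the real macros in gtype.h expand to something more involved, and Foo, foo_class_init, and foo_init are placeholder names:

#include <glib-object.h>

static GType foo_get_type_once (void) __attribute__((noinline));

GType
foo_get_type (void)
{
  static gsize static_g_define_type_id = 0;

  /* Hot path: after registration this is just a load, a test, and a return.
   * With the cold path forced out of line, the stack protector has no
   * reason to instrument this function. */
  if (g_once_init_enter (&static_g_define_type_id))
    {
      GType g_define_type_id = foo_get_type_once ();
      g_once_init_leave (&static_g_define_type_id, g_define_type_id);
    }

  return static_g_define_type_id;
}

static GType
foo_get_type_once (void)
{
  /* Cold path: full registration, including any user-provided code expanded
   * from G_DEFINE_TYPE_EXTENDED() and friends. Runs exactly once. */
  return g_type_register_static_simple (G_TYPE_OBJECT,
                                        g_intern_static_string ("Foo"),
                                        sizeof (FooClass),
                                        (GClassInitFunc) foo_class_init,
                                        sizeof (Foo),
                                        (GInstanceInitFunc) foo_init,
                                        0);
}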

The best part is that applications and libraries only need to be recompiled to get the speedup (due to macro expansion).

Not a bad experience diving into some compiler bits this week.

Compiler complexities

The other day I found myself perusing some disassembly to get an idea of the code’s complexity. I do that occasionally because I find it the quickest way to determine if something is out of whack.

While I was there, I noticed a rather long _get_type() function. More importantly, I only saw one exit point (a retq instruction on x86_64).

That piqued my interest because _get_type() functions are expected to be fast. In the fast path (when the type has already been registered), I’d expect them to check for a non-zero static GType type_id and, if so, return it. That is just a handful of instructions at most, so what gives?

The first thing that came to mind is that I use -O0 -g -fno-omit-frame-pointer on my local builds so that I get good debug symbols and ensure that Linux-perf can unwind the stack when profiling. So let’s disable that.

Now I’ve got everything rebuilt with the defaults (-O2 -fomit-frame-pointer). I see a couple of exit points, but still what appears to be too many for the fast path. And what is this __stack_chk_fail@plt I see?

A quick search yields some information about -fstack-protector, a recent (well, half-decade-old) compiler feature that employs various tricks to detect stack corruption. Distributions seem to enable it by default using -fstack-protector-strong, which tries to add stack checks only to code it thinks is accessing stack-allocated data.
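For context, the kind of code -fstack-protector-strong instruments looks roughly like the following; this is purely illustrative (use_name() is a made-up consumer), not anything from GObject:

#include <glib.h>

void
copy_name (const char *src)
{
  char buf[64];                    /* stack-allocated array: the compiler
                                    * inserts a canary and a check that calls
                                    * __stack_chk_fail() if it was clobbered */

  g_strlcpy (buf, src, sizeof buf);
  use_name (buf);                  /* hypothetical consumer */
}

A get_type() fast path has no such buffer, which is what made the instrumentation surprising; as noted below, it is the type-registration code expanded from macros that trips the heuristic.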

So, a quick recompile with -fno-stack-protector to disable the security feature and, sure enough, a proper fast path emerges. We drop from 15 instructions (with 2 conditional jumps) to 5 instructions (with 1 conditional jump).

So the next question is: “Well okay, is this worth making faster?”

That’s a complicated question. The code was certainly faster before that feature was enabled by default. The more instructions you execute, the more code has to be loaded into the instruction cache (which is very small). Loading instructions means pushing others out. So even if this code isn’t much faster on its own, keeping it small helps keep the surrounding code in cache and therefore fast.

But furthermore, we do a lot of _get_type() calls. They are used when doing function precondition checks, virtual methods, signals, type checks, interface lookups, checking a type for conformance, altering the reference count, marshaling data, accessing properties, … you get the idea.

So I mucked through about 5 different approaches trying to see if I could make things faster without disabling the stack protector, without much luck. The way types are registered accesses some local data via macros. Nothing seemed to get me any closer to those magic 5 instructions.

GCC, since version 4.4, has allowed you to disable the stack-protector on a per-function basis. Just add __attribute__((optimize("no-stack-protector"))) to the function prototype.

#if G_GNUC_CHECK_VERSION(4, 4)
# define G_GNUC_NO_STACK_PROTECTOR \
  __attribute__((optimize("no-stack-protector")))
#else
# define G_GNUC_NO_STACK_PROTECTOR
#endif

GType foo_get_type (void) G_GNUC_NO_STACK_PROTECTOR;

Now we get back to our old (faster) version of the code.

 48 8b 05 b9 15 20 00    mov    0x2015b9(%rip),%rax   
 48 85 c0                test   %rax,%rax
 74 0c                   je     400ac8
 48 8b 05 ad 15 20 00    mov    0x2015ad(%rip),%rax   
 c3                      retq   

But what’s the difference you ask?

I put together a test that I can run on a number of machines. It was faster in every case (as expected), some by as much as 50% (likely due to various caches).

Arch     OS             Machine                      Speedup
ARM      Ubuntu 14.04   Odroid X2                    +24%
x86_64   Fedora 28      X1 Carbon Gen3               +25.5%
x86_64   Fedora 28      i7 gen7 NUC                  +12.25%
x86_64   Fedora 28      Surface Book 2 i7 (gen8)     +12.5%
x86_64   Fedora 27      Onda Tablet                  +50.6%
x86      Debian         netbook                      +12.5%

It’s my opinion that one place where it makes sense to keep things very fast (and reduce instruction-cache blow-out) is a type system. That code gets run a lot, interleaved with all the code you really care about.

GTask and Threaded Workers

GTask is super handy, but it’s important to be very careful with it when threading is involved. For example, the normal threaded use case might be something like this:

/* Capture everything the worker needs up front, on the calling thread */
state = g_slice_new0 (State);
state->frob = get_frob_state (self);
state->baz = get_baz_state (self);

/* Hand that state to the task; the worker only touches task_data */
task = g_task_new (self, cancellable, callback, user_data);
g_task_set_task_data (task, state, state_free);
g_task_run_in_thread (task, state_worker_func);

The idea here is that you create your state upfront and pass that state to the worker thread so that you don’t race accessing self->fields from multiple threads. The “shared-nothing” approach, if you will.

However, even this isn’t safe if self has thread-usage requirements. For example, if self is a GtkWidget or some other object that is expected to only be used from the main thread, there is a chance your object could be finalized on a worker thread.

Furthermore, the task_data you set could also be finalized in the thread. If your task data also holds references to objects which have thread requirements, those too can be unref’d from the thread (thereby cascading through the object graph should you hit this undesirable race).

Such can happen when you call g_task_return_pointer() or any of the other return variants from the worker thread. That call will queue the result to be dispatched to the GMainContext that created the task. If the CPU switches to the thread running that GMainContext before the worker thread has released its reference, you risk the worker thread holding the last reference to the task.

In that situation self and task_data will both be finalized in that worker thread.
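To see where the race shows up in the example above, here is a sketch of the worker function; compute_result() and result_free() are hypothetical helpers:

static void
state_worker_func (GTask        *task,
                   gpointer      source_object,
                   gpointer      task_data,
                   GCancellable *cancellable)
{
  State *state = task_data;
  gpointer result;

  /* Do the expensive work using only `state`; never touch source_object. */
  result = compute_result (state);                /* hypothetical */

  /* This queues completion on the GMainContext that created the task. If
   * that context dispatches the callback (and drops its reference) before
   * the reference held for this thread is released, the final unref, and
   * thus the finalization of self and task_data, happens on this worker
   * thread. */
  g_task_return_pointer (task, result, result_free);
}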

Addressing this in Builder

We already have various thread pools in Builder for work items, so it would be nice if we could both fix the issue in our usage and unify the thread pools. Additionally, there are cases where it would be nice to “chain task results” to avoid doing duplicate work when two subsystems request the same work to be performed.

So now Builder has IdeTask, which is very similar in API to GTask but provides some additional guarantees that would be very difficult to introduce back into the GTask implementation (without breaking semantics). We do this by passing the result and the thread’s last ownership reference to the IdeTask back to the GMainContext at the same time, ensuring the last unref happens in the expected context.
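The general idea, sketched below with a hypothetical MyTask type (this is not IdeTask’s actual implementation), is to hand the worker’s reference and the result to the main context together, so the final unref can only happen there:

typedef struct
{
  MyTask   *task;    /* the worker thread's reference, transferred here */
  gpointer  result;
} Completion;

static gboolean
complete_in_main (gpointer data)
{
  Completion *c = data;

  my_task_complete (c->task, c->result);   /* hypothetical */
  g_object_unref (c->task);                /* unref happens on the main context */
  g_free (c);

  return G_SOURCE_REMOVE;
}

static void
finish_from_worker (MyTask   *task,
                    gpointer  result)
{
  Completion *c = g_new0 (Completion, 1);

  c->task = task;        /* take over the worker's reference; do not unref here */
  c->result = result;

  g_main_context_invoke_full (my_task_get_main_context (task),  /* hypothetical */
                              G_PRIORITY_DEFAULT,
                              complete_in_main, c, NULL);
}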

While I was at it, I added a bunch of debugging tools for myself which caught some bugs in my previous usage of GTask. Bugs were filed, GTask has been improved, yadda yadda.

But I anticipate the threading situation will remain in GTask, and you should be aware of it if you’re writing new code using GTask.

Device Integration

I’ve been working on some groundwork features to sneak into Builder 3.28 so that we can build upon them for 3.30. In particular, I’ve started to land device abstractions. The goal around this is to make it easier to do cross-architecture development as well as support devices like phones, tablets, and IoT.

For every class and kind of device we want to support, some level of integration work will be required for a great experience. But a lot of it shares some common abstractions. Builder 3.28 will include some of that plumbing.

For example, we can detect various Qemu configurations and provide cross-architecture building for Flatpak. This isn’t using a cross-compiler, but rather a GCC compiled for aarch64 as part of the Flatpak SDK. This is much slower than a proper cross-compiler toolchain, but it does help prove some of our abstractions are working correctly (and that was the goal for this early stage).

Some of the big things we need to address as we head towards 3.30 include running the app on a remote device as well as hooking up gdb. I’d like to see Flatpak gain the support for providing a cross-compiler via an SDK extension too.

I’d like to gain support for simulators in 3.30 as well. My personal effort is going to be based around GNU-based systems. However, if others join in and want to provide support for other sorts of devices, I’d be happy to get the contributions.

Anyway, Builder 3.28 can do some interesting things if you know where to find them. So for 3.30 we’ll polish them up and make them useful to vastly more people.

Introducing deviced

Over the past couple of weeks I’ve been heads down working on a new tool along with Patrick Griffis. The purpose of this tool is to make it easier to integrate IDEs and other tooling with GNU-based gadgets like phones, tablets, infotainment, and IoT devices.

Years ago I was working on a GNOME-integrated home router with davidz, which sadly we never finished. One thing that was obvious to me at the time was that I would not do another large-scale project until I had better tooling. That was Builder’s genesis, and device integration is what will make it truly useful to myself and others who love playing with GNU-friendly gadgets.

Now, building an IDE is a long process. There is a ton of code to write, trade-offs to work through, and persistence beyond what any reasonable programmer would voluntarily sign up for. But the ends justify the slog.

So what we’ve created is uninterestingly called “deviced”. It currently has three components. A deviced daemon lives on the target device that we’re interested in writing software for. A GObject-based libdeviced library provides access to discover and connect to devices and do interesting things on them. Lastly, devicectl is a readline-based command line tool that allows you to interact with these devices without having to write a program using libdeviced.

The APIs in libdeviced are appropriately abstracted so that we can provide different transports in the future. Currently we only have network-based communication, but we will implement a USB transport in the not-too-distant future. Other transports, such as SSH or custom micro-controllers, could be added too, although something like SSH is more complex because it combines a protocol with knowledge of how to run commands to get the intended effect, which is non-portable. It will be possible to support devices that do not run deviced, but that is currently out of scope.

To allow devices to be discoverable, deviced broadcasts its presence using mDNS on the networks it is configured to listen on (based on NetworkManager connection UUID). Long term, my goal is that you can configure deviced access in Control Center, similar to “Sharing and Privacy”. The network protocol is rather simple: JSON-RPC over TLS with self-signed certificates. When a client connects to the daemon, a gnome-shell notification is presented allowing you to accept the connection. At that point, the client certificate is saved for future validation.

Our libdeviced library is GObject introspectable and should therefore work with a number of languages.

Right now, only Flatpak applications are supported, but we have abstractions to allow contributions supporting additional application layers like Docker or plain old .desktop files. Currently you can push Flatpak applications and runtimes to the device, install them, and run them. If you have a new enough Flatpak, you can do delta updates.

It can even bridge multiple PTY devices for a shell. That isn’t really meant to be an SSH replacement, but rather a single abstraction we can use to control a debugger and its inferior from the IDE tooling.

There are still lots of little bugs to shake out and more bits to implement, but this is a pretty sweet 2-week proof of concept.

https://gitlab.gnome.org/chergert/deviced/

Here is a 20 second demo running on a single machine. It’s the same when using multiple machines except you get the notification on the programmable device rather than on your workstation. Obviously for IoT devices we’d need to create some sort of freedesktop notification bridge or alternate notification mechanism.

Anytime you work on a new project people will inevitably ask “why not just use XYZ”. In this case, I would expect both SSH and ADB to fall into that category. Most importantly, libdeviced is going to be about providing a single “remote device” abstraction for us in Builder. So it’s reasonable that we could abstract both of those systems from libdeviced. But neither of those provide the work-flow I envision for out-of-box experience, hence the deviced daemon. In the ADB case, it will be very difficult to get code upstream and released to distributions as it is increasingly unlikely our use-case is interesting to upstream. There were experimental patches to ADB a couple years ago to support flatpak so we didn’t take on this effort without considering our options. Ultimately, this prototype was to see the feasibility of making something that solves our problems while not locking us out of supporting other systems in the future.

Builder happenings for January

I’ve been very busy with Builder since returning from the holidays. As mentioned previously, we’ve moved to GitLab. I’m very happy about it. I can see how this is going to improve engagement and communication within our existing community and help us retain new contributors.

I made two releases of Builder so far this month. That included both a new stable build (which flatpak users are already using) and a new snapshot for those on developer operating systems like Fedora Rawhide.

The vast majority of my work this month has been on stabilization efforts. Builder is already a very large project. Every moving part we add makes this Rube Goldberg machine just a bit more difficult to maintain. I’ve tried to focus my time on things that are brittle and either improve or replace the designs. I’ve also fixed a good number of memory leaks and safety issues. However, the memory overhead of clang sort of casts a large shadow on all that work. We really need to get clang out of process one of these days.

Over the past couple years, our coding style evolved thanks to new features like g_autoptr() and friends. Every time I come across old-style code during my bug hunts, I clean it up.

Builder learned how to automatically install Flatpak SDK Extensions. These can save you a bunch of time when building your application if you have a complex stack. Things like Rust and Mono can just be pulled in and copied into your app rather than compiled from source on your machine. In doing so, every app that uses the technology can share objects in the OSTree repository, saving disk space and network transfer.

That allowed me to create a new template, a GNOME C♯ template. It uses the Mono SDK extension and gtk-sharp for 3.x. If you want to help here, work on an omni-sharp language server plugin for us!

A new C++ template using Gtkmm was added. Given that I don’t have a lot of recent experience with Gtkmm, it’d be nice to have someone from that community come in and make sure things are in good shape.

I also did some cleanup on our code-indexer to avoid threading in our API. Creating plugins on threads turned out to be rather disastrous, so now we try extra hard to keep things on the main thread with the typical async/finish function pairs.

I created a new messages panel to elevate warnings to the user without them having to run Builder from a terminal. If you want an easy project to work on, we need to go find interesting calls to g_warning() and use ide_context_warning() instead.

Our flatpak plugin now tries extra hard to avoid downloads. Those were really annoying for people when opening Builder. It took some troubleshooting in flatpak-builder, and that is fixed now.

In the process of fixing the extraneous downloading, I realized we could start bundling flatpak-builder with Builder. After a couple of fixes to flatpak-builder, Builder Nightly no longer requires flatpak-builder on the host. That’s one less thing to go wrong for people going through the newcomers work-flow.

We just landed the beginning of a go-langserver plugin. It seems like the language server for Go is pretty new though. We only have a symbol resolver thus far.

I found a fun bug in Vala that caused const gchar * const * parameters to async functions to turn into gchar **, int. It was promptly fixed upstream for us (thanks Rico).

Some 350 commits have landed this month so far, most of them around stabilizing Builder. It’s a good time to start playing with the Nightly branch if you’re into that.

Oh, and after some 33 years on Earth, I finally needed glasses. So I look educated now.

Musings on bug trackers

Over the past couple of weeks we migrated jsonrpc-glib, template-glib, libdazzle, and gnome-builder all to gitlab.gnome.org. It’s been a pretty smooth process, thanks to a lot of hard work by a few wonderfully accommodating people.

I love bugzilla, I really do. I’ve used it nearly my entire career in free software. I know it well, I like the command line tool integration. But I’ve never had a day in bugzilla where I managed to resolve/triage/close nearly 100 issues. I managed to do that today with our gitlab instance and I didn’t even mean to.

So I guess that’s something. Sometimes modern tooling can have a drastic effect rather immediately.

g_object_ref and -Wincompatible-pointer-types

I proposed a change to GObject that was merged not too long ago that uses __typeof__ on GCC/Clang to propagate the pointer type from the parameter. For example, the old prototype was something like:

gpointer g_object_ref (gpointer instance);

That meant you got a cast for free, even if you made a mistake such as:

FooBar *bar = foo_bar_new ();
FooFrob *frob = g_object_ref (bar);

But now, we propagate the type of the parameter as the return value with a macro that is essentially:

#define g_object_ref(Obj)  ((__typeof__(Obj)) (g_object_ref) (Obj))

Now, the code example above produces an -Wincompatible-pointer-types warning from GCC/Clang. The mistake isn’t a common one (people often rely on this implicit cast when using inheritance), but when you do make it, it’s rather annoying to track down. I’ve certainly made the mistake before when too-quickly copying lines of code, and I know I’m not the only one.

So, how do you fix the warnings in your code base if you were relying on the implicit cast? Both of these will work, I have a preference for the former.

FooFrob *frob = g_object_ref (FOO_FROB (bar));
FooFrob *frob = FOO_FROB (g_object_ref (bar));

Builder 3.27 Progress (Again)

As usual, I’ve been busy since our last update. Here are a few feature highlights, in addition to all the bug fixes.

Recursive Directory Monitors

Builder now creates a recursive directory monitor so that your project’s source tree can be updated in case of external modification, such as from a terminal. If you need a recursive directory monitor, the implementation can be found in libdazzle.

Project Tree Drag-n-Drop

The project tree now supports basic drag-n-drop. You can drag within the tree as well as from external programs supporting text/uri-list into the project tree. Nautilus is one such example.

VCS Status in Project Tree

The project tree can now query the VCS backend (git) to provide status about added and changed files in your project.

vcs status for project tree

Editor Grid Drag-n-Drop

You can drag text/uri-list drag sources onto the editor grid and place them as you like. Drag to the top or bottom to create new above/below splits. Drag to the edges for left/right splits.

Build Pipeline Stages

Sometimes you might want to peek into the build pipeline to get a bit more insight. Expand the “Build Details” to see the pipeline stages. They’ll update as the build progresses.

displaying build pipeline entries

Updating Dependencies

We want to ensure we’re doing less work when Builder starts up, which means that before long we won’t auto-update dependencies at startup. Instead, you’ll choose to update your dependencies when it makes sense. We might as well make that easy, so here is a button to do that. It currently supports Flatpak and Cargo.

update dependencies button

Hamburger menu has gone away

We now focus more on “contextual” menus rather than stashing things in the window menu. So much so that we’ve been able to remove the “hamburger” menu by default. It will automatically appear should any enabled plugin use it.

hamburger menu is gone

Lots of bug fixes too, but those don’t have pretty pictures. So that’s it for now!