Eternal Vigilance!

I’ve spent a lot of time during the years fixing nautilus memory use. I noticed the other day that it seemed to be using a lot of memory again, doing nothing but displaying the desktop:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND          
14315 alex      20   0  487m  46m  15m S  0.3  1.2   0:00.86 nautilus

So, its time for another round of de-bloating. I fired up massif to see what used so much memory, and it turns out that there is a cache in GnomeBG that caches the original desktop background image. We don’t really need that since we keep around the final pixmap for the background.

It turns out that my desktop image is 2560×1600, which means the unscaled pixbuf uses 12 megs of memory. Just fixing this makes things a bit better:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND          
16129 alex      20   0  538m  33m  15m S  4.9  0.8   0:00.87 nautilus

However, looking at the actual allocations in massif its obvious that we’re not actually using this much memory. For a short time when creating the desktop background pixmap we do several large temporary allocations, but these are quickly freed. So, it seems we’re suffering from the heap growing and then not being returned to the OS due to fragmentation.

It is ‘well known’ that glibc uses mmap for large (> 128k by default) allocations and that such allocations should be returned to the OS directly when freed. However, this doesn’t seem to happen for some reason. Lots of research follows…

It turns out that this isn’t true anymore, since about 2006. Glibc now uses a dynamic threshold for when to start using mmap for allocations. It uses the size of freed mmaped memory chunks to update the threshold, and this is causing problems for nautilus which has a behaviour where almost all allocations are small or medium sized, but there are a few large allocations when handling the desktop background. This is leading to several large temporary allocations going to the heap, never to be returned to the OS.

Enter mallopt(), with lets us set a static mmap limit. So, we set this back to the old value of 128k:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND          
 4971 alex      20   0  479m  26m  15m S  0.0  0.7   0:00.90 nautilus

Not bad. Dropped 20 meg of resident size with a few hours of work. Now nautilus isn’t leading the pack anymore but comes after apps like gnome-panel, gnome-screensaver and gnome-settings-daemon.

How to remove flicker from Gtk+

In between spending time taking care of a sick kid, a sick wife and being sick myself I have slowly been working on the remaining issues in the client-side-windows branch of Gtk+. The initial and main interest in having client side windows is that it lets us emulate all thats needed for widgets to work without any server side windows, which lets us do things like put Gtk+ widgets inside clutter, etc. However another interesting, and not entierly obvious advantage of client side windows is that it allows us to remove flicker. This post will describe how this works and show the effects.

Gtk+ already does quite a lot of things to avoid flicker. For instance, all drawing in expose events is automatically double buffered so that you never see partially drawn results. The remaining flickering is related to the effect of moving or resizing server side subwindows. Although even these are minimized by Gtk+, since many widgets don’t use such windows or only use input-only windows which don’t cause any visual effects. However, there are still some areas where subwindows are used, mostly in cases where scrolling is involved.

Lets start with an example on how scrolling works:

Evince

This is a regular Evince window showing a pdf, and we want to scroll down. This happens in several steps. First we copy the bottom area of the window to the top of the window:

Evince 2

Then we mark the newly scrolled in area at the bottom as invalid:

Evince scrolling 3

As a result of this Gtk+ will call the application to redraw the invalid region as soon as it has finished handling the incomming events:

scrolling-4

Voila! We have scrolled. (In reality more happened above, the scrollbar area was marked invalid and repainted also, but lets ignore that for now.)

This example also makes it easy to see where flicker comes from. The drawing of the newly exposed area is double buffered, so the newly drawn area is replaced atomically, however the initial copy is not done with the Gtk+ drawing system, instead its done with a XCopyArea directly on the window (not a subwindow move, but with similar effect). So, the xserver will display that immediately, while there might be some delay before the expose of the scrolled in area is drawn causing visual tearing.

Another common problem is widget resizing/move that can be seen in my previous blog entry.  In this case what happens is that a widget with a subwindow is moved and/or resized and it ends up over another widget. The window move operation is done immediately in the server and results in a copy similar to the above, and then there is some delay before the widgets are redrawn to match that.

Now, client side windows don’t by itself fix this, but the copies above and all rendering is now under control of the client (i.e. the app) so we have the tools to do something about it. The solution is to delay the copying until we’re ready to draw everything that will be drawn, so we never show any partial results. Whenever some region of a window is copied we just record the area to be copied and by how much. When we’re handling the expose events for the invalid area we handle the expose up to the point of drawing everything in the double buffer. At this point we replay all the copies we recorded, except we don’t care about copying anything that will draw into the area which will be drawn by the expose. Then we blit out the final result of the expose event.

Furthermore, in practice it often happens that we do several moves/scrolls of the same region before it its drawn. This works with the above approach, but is a bit wasteful as some regions are copied twice. So, instead of just simply keeping track of all copies being made we try to combine such double copies into a single copy, thus minimizing the actual copies we make in the end.

So, how does this look in the end? Its kind of hard to capture this kind of flicker with a screengrabber, so here is a video I took with my phone:

[vimeo]http://www.vimeo.com/3148023[/vimeo]

Can you tell which one uses the standard Gtk+?