Archive for the ‘Gnome’ Category

Sorting Icons Theme Mess

Thursday, April 14th, 2011

In my long-running series on why themes are evil, I bring you the newest installment.

Consider the GTK+ stock icon GTK_STOCK_SORT_ASCENDING, which is supposed to represent sorting elements into increasing order, typically numerical or alphabetical. The icon for such an action is supposed to somehow convey what happens when it is pressed, all in, say, 24×24 pixels.

Take a look at different themes and how they implement the icon:

eog `find /usr/share/icons/ -print | grep sort-ascending`

This command will show you the icon images with some duplication due to multiple sizes.

Observations:

  • Some show an up-arrow, others show a down-arrow. Yet others show a diagonal arrow which isn’t as bad as it sounds because such arrows are annotated.
  • Some arrows have no annotations, some are annotated with “1..9”, and yet others are annotated with “a..z”.

Officially this is a mess[tm]. When annotations are present they hint at either numerical or alphabetical ordering which may or may not match what the application does. That’s minor. But when no annotations are present, the situation is far worse: my sort-ascending button looks like someone else’s sort-descending simply because of theme differences!

I don’t know how this mess came about, but it ought to be resolved. I suggest that when the icons look like vertical arrows, sort-ascending should point down because the elements of a list will then be increasing in the direction of the arrow.

Hunting Leaks in GTK+ Applications

Thursday, April 7th, 2011

Hunting leaks in GTK+ applications used to be fairly simple: you would run your application under Purify (or, later, Valgrind) and the leak reports would pretty much tell you what needed plugging.

That was a long time ago. In the meantime, GTK+ has gotten more complex over its iterations, with more caches, more inter-object links, and deliberately-unfreed objects. On top of that, Valgrind and Purify are not particularly well suited for finding the cause of leaks: by design they will tell you the backtrace of the call that allocated the memory that was never freed. In a ref-counted world that information is often quite insufficient: the leaked widget was allocated by the GUI builder. Oh goodie! What you really want to know is who holds the extra ref.

Enter the GObject debugger, first introduced by Danielle. After some major internal work, it has become mature.

I used this for Gnumeric, which was already one of the strictest leak-policed applications in GTK+ land. We leaked a number of GtkTreeModel/GtkListStore objects, for example. Easily fixed. Also, when touching print code or file choosers we leaked massively: one object per printer and/or two objects per file in the current directory. A sequence of bug reports (646815, 646462, 646446, 646461, 646460, 646458, 646457, and 645483) later, GTK+ is now behaving much better. All but the last of these have been fixed. With this, we are down to leaking about 20 Gtk-related objects: the recent-documents manager, im-module objects, the default icon factory, and theme engines. Basically stuff that GTK+ does not want to release.

Please try this on your applications, especially if they are long-running. I still have to kill gnome-terminal, banshee, metacity, and gvfsd from time to time when they grow to absurd sizes. That doesn’t have to be caused by GObject leaks, but it might very well be. (I know some of these examples are obsolete; I am not naive enough to believe their replacements would fare any better.)

This might be a good time to remind people that g_object_get and gtk_tree_model_get will give you a reference when you use them to retrieve GObjects. You need to unref when done. The problem is that it is not immediately clear from the g_object_get/gtk_tree_model_get call whether existing code is getting objects, so a certain knowledge of the code is needed.

GHashTable Memory Requirements

Saturday, March 26th, 2011

Someone threw an 8-million-cell CSV file at Gnumeric. We handle it, but barely. Barely is better than LibreOffice and Excel, if you don’t mind that it takes 10 minutes to load. And if you have lots of memory.

I looked at the memory consumption and, quite surprisingly, the GHashTable that we use for the cells in a sheet is at the top of the list: a GHashTable with 8 million entries uses 600MB!

Here’s why:

  • We are on a 64-bit platform so each GHashNode takes 24 bytes including four bytes of padding.
  • At a little less than 8 million entries the hash table is resized to about 16 million entries.
  • While resizing, we therefore have 24 million GHashNodes around.
  • 24 × 24,000,000 bytes is 576,000,000 bytes, roughly 600MB, all of which is being accessed during the resize.
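To make the arithmetic above concrete, here is a small C model. The struct is a hypothetical stand-in shaped to match the description (two pointers plus a 32-bit hash, padded to 24 bytes on a 64-bit platform), not GLib's own definition, and the entry counts are the round figures used above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one hash-table node as described above: a key
 * pointer, a value pointer, and a 32-bit cached hash.  On a 64-bit platform
 * that is 8 + 8 + 4 bytes, padded to 24 for pointer alignment. */
struct node_model {
	void *key;
	void *value;
	unsigned int key_hash;
};

/* While resizing, the old and the new node arrays are both alive, so the
 * peak is the sum of the two array sizes. */
size_t
resize_peak_bytes (size_t old_slots, size_t new_slots)
{
	/* Guard against size_t overflow in the multiplication. */
	assert (old_slots + new_slots <= SIZE_MAX / sizeof (struct node_model));
	return (old_slots + new_slots) * sizeof (struct node_model);
}
```

On a 64-bit machine, resize_peak_bytes (8000000, 16000000) comes to 576,000,000 bytes, which is the "roughly 600MB" above.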

So what can be done about it? Here are a few things:

  • Some GHashTables have identical keys and values. For such tables, there’s no need to store both.
  • If the hash function is cheap, there’s no need to keep the hash values around. This gets a little tricky with the unused/tombstone special pseudo-hash values used by the current implementation. It can be done, though.

I wrote a proof of concept which skips things like key destructors because I don’t need them. This uses 1/3 the memory of GHashTable. It might be possible to lower this further if an in-place resize algorithm could be worked out.
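The two savings listed above can be sketched as a key-only hash set in C. This is a minimal sketch under stated assumptions, not the author's proof of concept and not GLib's algorithm: keys are non-NULL pointers that double as values, the hash function is cheap enough to recompute whenever it is needed, collisions are resolved by linear probing, and there is no deletion (hence no tombstones) and no key destructor. All KeySet and keyset_ names are hypothetical. Each slot holds a single pointer, 8 bytes on a 64-bit platform against 24 for a node that also stores a value and a hash, which is where a factor of three comes from.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned (*KeySetHashFunc) (const void *key);
typedef int (*KeySetEqualFunc) (const void *a, const void *b);

/* Hypothetical key-only set: no value pointer and no cached hash per slot. */
typedef struct {
	const void **slots;    /* CAPACITY slots; NULL marks an empty slot */
	size_t capacity;       /* always a power of two */
	size_t count;          /* number of keys stored */
	KeySetHashFunc hash;
	KeySetEqualFunc equal;
} KeySet;

/* Linear probing.  The hash is recomputed here instead of being cached in
 * the slot.  Terminates because the load factor is kept at or below 1/2. */
static size_t
keyset_find_slot (const KeySet *set, const void *key)
{
	size_t mask = set->capacity - 1;
	size_t i = set->hash (key) & mask;

	while (set->slots[i] != NULL && !set->equal (set->slots[i], key))
		i = (i + 1) & mask;
	return i;
}

int
keyset_init (KeySet *set, size_t capacity,
	     KeySetHashFunc hash, KeySetEqualFunc equal)
{
	assert (capacity >= 2 && (capacity & (capacity - 1)) == 0);
	set->slots = calloc (capacity, sizeof *set->slots);
	set->capacity = capacity;
	set->count = 0;
	set->hash = hash;
	set->equal = equal;
	return set->slots != NULL;
}

/* Doubles the table, rehashing every key (cheap by assumption). */
static int
keyset_grow (KeySet *set)
{
	KeySet bigger = *set;
	size_t i;

	bigger.capacity = set->capacity * 2;
	bigger.slots = calloc (bigger.capacity, sizeof *bigger.slots);
	if (bigger.slots == NULL)
		return 0;
	for (i = 0; i < set->capacity; i++)
		if (set->slots[i] != NULL)
			bigger.slots[keyset_find_slot (&bigger, set->slots[i])] = set->slots[i];
	free (set->slots);
	*set = bigger;
	return 1;
}

/* Returns the stored key equal to KEY, or NULL if there is none. */
const void *
keyset_lookup (const KeySet *set, const void *key)
{
	return set->slots[keyset_find_slot (set, key)];
}

/* Adds KEY (which must not be NULL) unless an equal key is already present.
 * Returns 0 only if growing the table fails. */
int
keyset_insert (KeySet *set, const void *key)
{
	assert (key != NULL);
	if (keyset_lookup (set, key) != NULL)
		return 1;
	if (2 * (set->count + 1) > set->capacity && !keyset_grow (set))
		return 0;
	set->slots[keyset_find_slot (set, key)] = key;
	set->count++;
	return 1;
}

void
keyset_free (KeySet *set)
{
	free (set->slots);
	set->slots = NULL;
	set->capacity = set->count = 0;
}

/* Example callbacks for NUL-terminated string keys (a djb2-style hash). */
unsigned
keyset_str_hash (const void *key)
{
	const unsigned char *p = key;
	unsigned h = 5381;

	while (*p != '\0')
		h = h * 33 + *p++;
	return h;
}

int
keyset_str_equal (const void *a, const void *b)
{
	return strcmp (a, b) == 0;
}
```

Growing still allocates the new array while the old one is alive, so the resize peak shrinks by the same factor but does not go away; that would take an in-place resize.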

On Profiling and Sharks

Tuesday, November 16th, 2010

Be careful in applying simplified models to complicated systems.

For example, Federico has been a persistent proponent of using profiling as the major (only?) guide to where performance should be improved: Profiling A+B. And that is great as far as it goes. But no further.

If you were studying sharks on the decks of the fishing boat that caught them you might describe sharks as nearly blind, having no sense of smell, and unable to acquire enough oxygen by themselves. That is not a very accurate description of how they appear in their natural environment.

The point is that the A+B model argument applies to a system where only one program is running. The model has been simplified to the point where there is no interaction with the outside, so it is not well suited to describing the program’s behaviour in a realistic system. For example, if B is very I/O intensive, then it might well be the right place to concentrate performance efforts.

I/O is not the only way to hit other programs hard. Memory usage is another — especially if the program is long running.

Code Quality, Part II

Wednesday, September 29th, 2010

I have been known to complain loudly when I see code that I feel should have been better before seeing the light of day. But what about my own code? Divinely inspired and bug free from day one? Not a chance!

With Gnumeric as the example, here is what we do to keep the bug count down.

  • Testing for past classes of errors. For example, we found errors in Gnumeric’s function help texts, such as referring to arguments that do not exist or not describing all the arguments. The solution was not only to fix the problems we found, but also to write a test that checks all the function help texts for this kind of error. Sure enough, there were several more. They are gone now, and new ones will not creep in. We do not like to make the same mistake twice!
  • Use static code checkers. This means that we keep the warning count from “gcc -Wall” down so we know nothing serious is being ignored. We have looked at clang and Coverity output and fixed the apparent problems. (Those tools have pretty high false report rates, though.) We occasionally use sparse too and have a handful of dumb Perl scripts looking for things like GObject destroy/finalize/etc handlers that fail to chain up to the parent class.
  • Use run-time code checkers. Gnumeric has been run through Valgrind and Purify any number of times. It is part of the test suite, so it happens regularly. This is regrettably getting harder because newer versions of Gtk+ and the libraries upon which it is built hold on to more and more memory with no way of forcing release. Glib has a built-in checker for some memory problems. We use that too.
  • Automated tests of as many parts of the program as we have found time to write. The key word here is “automated”. I used to be somewhat scared of changing the format string (number rendering) code, because there was basically no way of making sure no new errors were introduced in that hairy piece of code. With the extensive test suite, I have no such reservations anymore.
  • Fuzzing, i.e., deliberately throwing garbled input at the program. I wrote tools to do this subtly for xml and files inside a zip archive in such a way that the files are still syntactically correct xml or zip files — otherwise you end up only testing the xml/zip parser which is fine, but not sufficient.
  • Google for Gnumeric. Not everyone will report problems to us, but they might discuss issues with others. Google seems to be pretty good at finding such occurrences.
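The fuzzing idea in the list above, garbling a file while keeping it syntactically valid, can be illustrated for XML with a short C sketch. This is not the Gnumeric tool: fuzz_xml_text and next_rand are hypothetical names, the PRNG is the standard xorshift32, and the scanner assumes simple XML without comments, CDATA sections, or '>' inside attribute values. It only rewrites character data, keeping digits as digits and letters as letters, and leaves tags, attributes, and entity references alone, so the parser accepts the result and the code behind it gets exercised.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* xorshift32: a tiny deterministic PRNG, so a failing case can be replayed
 * from its seed. */
static uint32_t
next_rand (uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	return *state = x;
}

/* Rewrites, in place, about one in RATE alphanumeric characters of XML
 * character data.  Markup (from '<' to '>') and entity references such as
 * "&amp;" are skipped; digits become digits and letters become letters of
 * the same case, so the document stays well-formed.  Returns the number of
 * characters rewritten (a rewrite may pick the same character). */
size_t
fuzz_xml_text (char *xml, uint32_t seed, unsigned rate)
{
	uint32_t state = seed != 0 ? seed : 1;  /* xorshift state must be non-zero */
	int in_markup = 0, in_entity = 0;
	size_t changed = 0;
	char *p;

	assert (rate >= 1);
	for (p = xml; *p != '\0'; p++) {
		unsigned char ch = (unsigned char) *p;

		if (in_markup) {
			if (ch == '>')
				in_markup = 0;
		} else if (ch == '<') {
			in_markup = 1;
		} else if (in_entity) {
			if (ch == ';')
				in_entity = 0;
		} else if (ch == '&') {
			in_entity = 1;
		} else if (isalnum (ch) && next_rand (&state) % rate == 0) {
			uint32_t r = next_rand (&state);

			if (isdigit (ch))
				*p = (char) ('0' + r % 10);
			else if (isupper (ch))
				*p = (char) ('A' + r % 26);
			else
				*p = (char) ('a' + r % 26);
			changed++;
		}
	}
	return changed;
}
```

Because the PRNG is seeded explicitly, any input that triggers a crash can be regenerated from the original file and the seed.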

The take-home message from this is that code quality is work. Lots of work. And yet we still let mistakes through. I blame that on the lack of a proper QA department.

Code Quality

Tuesday, August 31st, 2010

The recently released GLib 2.25.15 contains a new class for dealing with dates: gdatetime.c. With apologies to Pauli: That code is not right. It is not even wrong.

The code basically claims to handle date+time+timezone. Such code, in principle, makes a lot of sense in Glib. But even a cursory scan through the code reveals numerous grave problems:

  • It reimplements the date handling code from GDate, badly and with bugs: even fairly simple things such as advancing by one day do not work. Actually, advancing by zero days does not work either.
  • Code like g_date_time_difference and g_date_time_get_hour, as well as the representation of time-of-day, make it clear that the code does not and will not handle timezones properly. The author does not understand things like daylight saving time and the fact that some days are not 24 hours long under that regime.
  • Code like g_date_time_printf makes it clear that the author does not understand UTF-8. Here is an outline:

    for (i = 0; i < utf8len; i++)
    {
        const char *tmp =
            g_utf8_offset_to_pointer (format, i);
        char c = g_utf8_get_char (tmp);
        [...]
    }

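    That has got to be the worst way to traverse a UTF-8 string seen in the wild: g_utf8_offset_to_pointer walks forward from the start of the format on every iteration, so the loop is quadratic in the length of the string. And note how it mangles characters with code points outside the range of “char”.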

  • There is no error handling and the API as-is will not allow it.
  • The code obviously was not tested well.
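For contrast with the outline above, a UTF-8 string can be traversed in one linear pass. With GLib, the usual idiom is to step through the string with g_utf8_next_char and decode each character with g_utf8_get_char into a gunichar. The C sketch below does the same without any dependency so that it stands alone; utf8_next and count_code_points are hypothetical helpers and, like g_utf8_get_char, they assume the input is valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes the code point starting at *P and advances *P to the next one.
 * Assumes valid UTF-8; the assert only catches a pointer into the middle
 * of a character. */
static uint32_t
utf8_next (const char **p)
{
	const unsigned char *s = (const unsigned char *) *p;
	uint32_t c;
	int extra, k;

	assert ((s[0] & 0xC0) != 0x80);
	if (s[0] < 0x80)      { c = s[0];        extra = 0; }  /* 0xxxxxxx */
	else if (s[0] < 0xE0) { c = s[0] & 0x1F; extra = 1; }  /* 110xxxxx */
	else if (s[0] < 0xF0) { c = s[0] & 0x0F; extra = 2; }  /* 1110xxxx */
	else                  { c = s[0] & 0x07; extra = 3; }  /* 11110xxx */

	for (k = 1; k <= extra; k++)
		c = (c << 6) | (s[k] & 0x3F);  /* 10xxxxxx continuation bytes */
	*p = (const char *) (s + extra + 1);
	return c;
}

/* One front-to-back pass over FORMAT: counts code points and '%' signs.
 * Each character is kept as a 32-bit code point, never squeezed into a char. */
size_t
count_code_points (const char *format, size_t *n_percent)
{
	const char *p = format;
	size_t n = 0;

	*n_percent = 0;
	while (*p != '\0') {
		uint32_t c = utf8_next (&p);

		if (c == '%')
			(*n_percent)++;
		n++;
	}
	return n;
}
```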

Why does code like that make it into GLib? The code was reviewed by Glib maintainer Matthias Clasen. I do not think he did a very good job. (He is busy asking for patches, but not busy applying patches. Certainly he avoids talking about substance. In any case, the code does not need patches, it needs to be taken out back.)


* * * * *

The bigger question is how you control code quality in a large project like GLib/GTK+. It is a simple question with a very complicated answer probably involving test suites and automated tools. I do not have anything to say about test suites here.

In the free software world the automated tools mostly come down to the compiler, sparse, and valgrind. (Let me know if I have missed anything substantial.)

  • “gcc -Wall” or some variant thereof. GLib and Gtk+ use this and use it reasonably well.
  • “Sparse”. There are signs that GLib/Gtk+ have not been run through sparse for a very long time. Gio, for example, appears never to have been checked with it.
  • “Valgrind”. Valgrind is probably being used on GLib/Gtk+ regularly, but each new release seems to be putting new roadblocks in the way of making effective use of Valgrind. In modern versions you cannot make Gtk+ release its objects, so Pango will not release its stuff and the font libraries in turn will not release theirs. Do not get me wrong: exit(2) is a very efficient way of releasing resources, but not being able to (optionally) release resources manually means that you do not know if your memory handling works right.

In short: Glib and Gtk+ are slowly moving away from automated code quality checks beyond the compiler.

I used to run GLib/Gtk+ through Sparse and Purify. Over time I got the message that bug reports based on that were not particularly welcome.

Goodbye F-Spot, Hello Picasa

Tuesday, June 2nd, 2009

I am giving up on F-Spot.

It was a really promising application, but it has never been able to follow up on that great start. The worst thing is that it is sluggish. Operations that should be instant, like displaying the next image, are not: they take half a second. (Getting a new camera did not help there!) I used to think it was just my old laptop, but with a new laptop that excuse does not fly anymore.

I have now tried Picasa. It is crazy-fast! For now, I am going to use that. The biggest problem was migrating the F-Spot tags. That problem was solved with help from one Robert Brown on Google’s help forums.

Basically, you need this script to create album files and then blow away Picasa’s database to force a regeneration.

The Gtk+ File Chooser Dialog, Take II

Monday, February 23rd, 2009

OpenSuSE 11.1 updated the Gnome File Chooser and it now looks like this:

[Screenshot: the updated Gnome File Chooser dialog]

Recall that my premise is that the file chooser’s function is to help the user choose files. By my count, the area used for files is 29% in the above dialog, including header and scroll bar. That simply is not right! Why do the “Places” section (which I rarely use) and its buttons take up that much space? Further, what does the path bar give me that adding the path into the location entry and putting “..” into the file list does not give me?

Things get a lot worse if a file preview is added. Here’s how uploading the above image looked in Mozilla:

[Screenshot: the file chooser with a preview, uploading the above image in Mozilla]

There is room for an incredible 2-4 letters of the file names! And the full “Places” and path bar sections, of course.

Could someone please defend the Gnome File Chooser so I do not have to suggest taking it out back and having it shot!

(I do not take comments at my blog, but you can probably find an email address somewhere.)

The Gtk+ File Chooser Dialog

Wednesday, January 21st, 2009

Whenever I update my OpenSuSE installation, the Gtk+ File Chooser Dialog gets worse. This is how it looks for me on OpenSuSE 11.1 when used from Gnumeric. It looks more or less the same from Gedit and Mozilla.

[Screenshot: Gtk+ File Chooser Dialog]

I hope I am not breaking new ground when I claim that the purpose of the file chooser is to help the user choose a file. How is that going to happen when the area used for files is less than the size of one button?

I really hope other people are seeing something sane, but this is with a vanilla install, I think. (Note: I mention OpenSuSE 11.1 for reference, not as an assignment of blame.)

Applications

Wednesday, July 16th, 2008

In my view, computers are here to get certain jobs done. That means it is all about applications, not eye candy: bouncing icons, themes, semi-transparent windows. My real-life work desk is not transparent, and I do not use semi-transparent paper.

Producing large applications is a lot of work, so when I write a piece of (hopefully) well-designed code, I want that code to stay written. I do not want next week’s GTK+ deprecation to come along and, effectively, cause my code to bitrot. (And I really do not want to write two different pieces of code for the job: one for “old” GTK+ and one for “new” GTK+.)

Moving from GTK+ 1.x to GTK+ 2.x was painful. I do not need anything like that again. Talk of breaking the API every 3-4 years, and advice like “Stay up to date, adapt your application code early” (and, by implication, often), are a clear indication that keeping applications running is likely to mean spending much time cleaning up after someone with an attention span of a few years.

Maintaining code like GTK+ is not hard. Calling it hard because you want to play with some new toy is deceptive. Maintaining can be tedious, but if you do not want to maintain, please do not start writing new GTK+ code. You will surely abandon that prematurely too, so you have no business writing library code. Instead, go write a useful application: if you abandon that, I probably do not have to care.