GNOME Performance Love Day just ended. Bugs were filed, patches applied, applications profiled.
A little step in the right direction for making GNOME faster.
Thanks to everyone involved: you rock.
Sunday will be the GNOME Performance Love Day!
The idea was first proposed on the #performance channel, after we noticed that some of the initial performance-related issues were simply lacking a bit of man-power, and weren't that difficult even for a beginner; at this point, that mainly means lazy loading of shared modules inside the platform libraries, and the integrity checks inside private functions.

Another issue when doing performance evaluation is the lack of profiling data on various machines and architectures; if we can gather enough data on a GNOME release, on multiple platforms, we can actually become aware of regressions (in the worst case) or brag about being faster in the next release (in the best case). As Federico wrote, the problem with performance is that you can't simply make ${FOO} faster: you need to know how fast it is going now, and how much your patches are actually making ${FOO} faster – or whether they do at all.
So, if on Sunday you feel in the mood for profiling, hacking and learning more about the GNOME platform, jump aboard!
As I said on the gnome-utils mailing list, the GNOME Dictionary codebase sucks.
Well, it sucks to the very end of it.
It really shows its age (it's more than five years old), and suffers from what I'm used to calling design by accretion. This particular technique of software design works like a black hole: particles are attracted to the singularity and form the "accretion disk", which is nothing but a big mass of… er… mass, gravitating around, attracting other mass, and accelerating – thus getting hotter and hotter – until it eventually hits the event horizon.

In software, it works the same way: new code is added, and it becomes relevant up to the point of becoming critical. The code becomes a little bit ugly, and it attracts other ugly code in order to make things work and to add new features. Time passes, and the codebase becomes an uglier mess at each iteration. At one point, everything simply collapses, because one of the "oh-so-hackish" portions of code passes the point of being understandable by any human being – including the original author. The overall entropy of the software also increases, because no one is smart enough to understand the code, let alone sanitize it; this is what I call the "design by accretion runaway syndrome". At this point, the only option for a developer is to toss everything out of the window in disgust and begin from scratch; which means that the time, effort, and skills needed for features and bug-fixing are instead spent on rewriting stuff.

One way to stop "design by accretion" before it reaches its "runaway syndrome" is to check for hacks, kludges and ugly portions of code, and exterminate them at each iteration. Hacks are what they are: they might be clever, well thought out, or a demonstration of coding skills; but they are hacks nonetheless, and the shorter they live, the better the codebase will be in the end.
The dictionary application and applet are the best result of this particular software design "technique": not only is the high-level code a collection of circular references and inclusions, but the low-level implementation of the dictionary protocol has also become hackish enough to include at least two APIs, one of which is an almost complete implementation of RFC 2229, while the other looks like a remnant of a previous implementation.
I tried to put a thin GObject layer around it, in order to avoid having to write my own implementation of the dictionary protocol; so far, the results are discouraging. Ergo, the best solution is to throw away the low-level stuff, create a new, GObject-oriented implementation of the dictionary protocol, and build up from there.
In the meantime, I'll have to pass a couple of exams, and begin porting the BookmarkFile and RecentManager/RecentChooser code under GLib and GTK; also, the FileChooser code should be adapted to support recently used files in OPEN mode, and the shortcuts should be saved using the BookmarkFile object inside a default location – something like $XDG_DATA_HOME/desktop-bookmarks/gtk-shortcuts.xbel.
With almost everyone in Boston, both the mailing lists and IRC channels are really quiet; feels a little weird – but I think it’s mostly envy for not being able to go to the other side of the pond and hack away.
Thus, staying here means choosing between (1) cleaning the house with Marta and (2) hacking on the dictionary applet. While the first option would significantly improve my life, the second option is the most tempting. I'll be a good boy, though, and clean up, now that I finally found out what I'm allergic to (you wouldn't believe it: absinthe pollen; a potential career as a "damned poet" thrown out of the window). Tomorrow I'm going to spend eight hours at the university campus, so I'll end up working on the dictionary applet anyway…
As a side project, I'll have to update the desktop bookmarks specification. The new revision should take into account not only the storage format, but also the file locations for recently used resources, desktop places and shortcuts; by defining locations and launch behaviour, we will be able to create a standard way to access bookmarks to resources on a desktop box. You could access your favourite locations inside your browser, FileChooser dialogs and Nautilus; also, your system administrator could define system-wide locations, in order to access default locations on each machine and on the network – no matter which environment you are using.
Here we go again, profiling GMarkup.
Finally, I’ve been able to compile sysprof, and launch a profiling session of the BookmarkFile parser under it. This time, I’ve been using a 1.6MB file, with 3360 bookmark elements. Just to shed a bit more light on the storage format, here’s a single entry:
```xml
<?xml version="1.0" encoding="utf-8"?>
<xbel version="1.0"
      xmlns:bookmark="http://www.freedesktop.org/standards/desktop-bookmarks"
      xmlns:mime="http://www.freedesktop.org/standards/shared-mime-info">
  <bookmark href="file:///tmp/test-xbel/test-1.txt"
            added="20051005T22:15:04Z"
            modified="20051005T22:15:13Z"
            visited="20051005T22:15:04Z">
    <info>
      <metadata owner="http://freedesktop.org">
        <mime:mime-type type="text/plain"/>
        <bookmark:applications>
          <bookmark:application name="populate-recent"
                                exec="populate-recent --info %u"
                                timestamp="1128550513"
                                count="2"/>
        </bookmark:applications>
      </metadata>
    </info>
  </bookmark>
</xbel>
```
This is a valid bookmark expressed using XBEL version 1.0. The important bit is contained in the href attribute of the bookmark element: the URI of the bookmark. Then we have three timestamps, expressed using the ISO 8601 format (time relative to UTC). Everything else is our meta-data for the bookmark entry: the MIME type, the registering applications, etc. The non-mandatory bits which are missing are the document's name and description, the "private" hint, and the groups (or meta-classes) to which the entry belongs.
Now, picture three thousand of these entries, and you'll have the needed background.
As a profiling thought experiment, we could already define plausible hot spots. First of all, since GMarkup is an event-driven parser, we will spend most of our time inside the callbacks invoked when the start of an element is found, when the element ends, and for the text nodes inside the element itself; we will also have hot spots when dealing with attributes and with data parsing – like the conversion from ISO 8601 timestamps to something usable under Un*x.
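To make the event-driven model concrete, here is a minimal sketch of how such a parser is wired up with GLib's GMarkup API; the callback names mirror the ones in the profile below, but the bodies are placeholders, not the real BookmarkFile code.

```c
/* Minimal sketch of an event-driven GMarkup parser, assuming GLib.
 * The callback names mirror the profile below; the bodies are
 * placeholders, not the real BookmarkFile implementation. */
#include <glib.h>

static void
start_element_raw (GMarkupParseContext  *context,
                   const gchar          *element_name,
                   const gchar         **attribute_names,
                   const gchar         **attribute_values,
                   gpointer              user_data,
                   GError              **error)
{
  /* hot spot: called once per opening tag; drives the state machine */
}

static void
end_element_raw (GMarkupParseContext  *context,
                 const gchar          *element_name,
                 gpointer              user_data,
                 GError              **error)
{
  /* called once per closing tag */
}

static void
text_raw_cb (GMarkupParseContext  *context,
             const gchar          *text,
             gsize                 text_len,
             gpointer              user_data,
             GError              **error)
{
  /* called for the text nodes between tags */
}

static const GMarkupParser parser = {
  start_element_raw,
  end_element_raw,
  text_raw_cb,
  NULL, /* passthrough */
  NULL, /* error */
};

gboolean
parse_buffer (const gchar *buffer, gssize length, GError **error)
{
  GMarkupParseContext *context =
    g_markup_parse_context_new (&parser, 0, NULL, NULL);
  gboolean res = g_markup_parse_context_parse (context, buffer, length, error)
              && g_markup_parse_context_end_parse (context, error);
  g_markup_parse_context_free (context);
  return res;
}
```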
So much for thought experiments; here's some real data.
| Function | % |
|---|---|
| … | … |
| main | 92.56 |
| egg_bookmark_file_load_from_file | 91.14 |
| egg_bookmark_file_load_from_data | 90.16 |
| start_element_raw | 37.81 |
| is_element_full | 34.24 |
| end_element_raw | 22.94 |
| timestamp_from_iso8601 | 9.58 |
| egg_bookmark_file_add_item | 9.23 |
| text_raw_cb | 1.36 |
| egg_bookmark_file_free | 1.01 |
| egg_bookmark_item_free | 0.94 |
| … | … |
These numbers come from a sysprof session during a thousand runs of the testbookmarkfile program (included in CVS) on a custom file containing ~3300 bookmark entries. The testbookmarkfile program is really simple: it loads an XBEL file and prints the number of bookmarks that have been successfully parsed, before quitting. So, having most of the time spent inside the egg_bookmark_file_load_* functions is pretty much understandable. As I said in the first profile run, the file is loaded into memory using a buffered load, 4096 bytes at a time; the I/O, on a pretty decent machine, takes just 0.98% of the total time – which means that it's basically drowned inside the system's I/O latency.
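For illustration, a buffered load feeding GMarkup incrementally could look like the following sketch; plain stdio is an assumption here (the real loader may use different I/O primitives), and the function name is mine.

```c
/* Sketch of a 4096-byte buffered load feeding the parser in chunks.
 * Plain stdio is an assumption; the real loader may differ. */
#include <stdio.h>
#include <glib.h>

gboolean
load_from_file_sketch (const GMarkupParser *parser,
                       const gchar         *filename,
                       GError             **error)
{
  FILE *fp = fopen (filename, "r");
  if (fp == NULL)
    return FALSE;

  GMarkupParseContext *context =
    g_markup_parse_context_new (parser, 0, NULL, NULL);

  gchar buffer[4096];  /* 4096-byte chunks, as described above */
  gsize len;
  gboolean res = TRUE;

  while (res && (len = fread (buffer, 1, sizeof (buffer), fp)) > 0)
    res = g_markup_parse_context_parse (context, buffer, len, error);

  if (res)
    res = g_markup_parse_context_end_parse (context, error);

  g_markup_parse_context_free (context);
  fclose (fp);
  return res;
}
```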
Then, we have the three callbacks: start_element_raw_cb, end_element_raw_cb and text_raw_cb. The first two control the state machine used to recognize the element hierarchy and context, while the third gets the text stored inside the text nodes. This means that these functions are big "if… else if… else" blocks, where the actual state is checked and changed according to the hierarchy rules defined by the storage format.
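The shape of those blocks is roughly like this toy sketch; the state names and the hierarchy rules are invented for the example and only hint at the real format.

```c
/* Toy sketch of the "big if...else if...else" state machine inside
 * the element callbacks; states and hierarchy invented for the example. */
#include <string.h>
#include <glib.h>

typedef enum {
  STATE_TOP,
  STATE_XBEL,
  STATE_BOOKMARK,
  STATE_INFO
} ParserState;

static gboolean
push_state (ParserState *state, const gchar *element)
{
  if (*state == STATE_TOP && strcmp (element, "xbel") == 0)
    *state = STATE_XBEL;
  else if (*state == STATE_XBEL && strcmp (element, "bookmark") == 0)
    *state = STATE_BOOKMARK;
  else if (*state == STATE_BOOKMARK && strcmp (element, "info") == 0)
    *state = STATE_INFO;
  else
    return FALSE;  /* element not allowed here: a parse error */

  return TRUE;
}
```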
The first unknown entry, at this point, is the is_element_full function: what does it do? It checks whether an element's name is what we want it to be, in order to change the state of the FSM. Then why does it take ~34% of the computation time? Because of the xmlns:bookmark and xmlns:mime attributes you find inside the xbel element. These attributes are special: they are XML namespace declarations – that is, they say that whenever an element name is prepended with a string matching the part after the colon of the xmlns attribute, the element belongs to the namespace defined by the URI inside the relative xmlns attribute. Using namespaces, we are able to include elements other than the ones defined for a specific document, avoiding clashes, etc.; the custom meta-data we use inside the metadata tag is possible only because we use namespaced elements. All you have to do when you parse this stream is "resolve" the namespace before checking the element's name – and that is exactly what is_element_full does: it resolves an element to a full element, that is, an element prepended with the namespace URI, and then checks it against a URI and an element name we pass to it.

Obviously, even though it should be a fast operation (actually, it's pretty much a constant-time operation), it must be done for each element, and for both the start and end element callbacks, so we end up in there many times; fortunately, we provide a bunch of fast paths in it, and we also store namespaces inside a hash table, so that it takes just a third of the total time. I was a bit skeptical about its reliability, though – and always felt that the check was a little too "sensible"; so far, is_element_full has proved me wrong – but I'll always feel restless, and await the dreaded moment when someone breaks it.
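A hedged sketch of what such a check could look like, assuming a hash table mapping prefixes to declared namespace URIs; the real is_element_full in libegg differs, particularly in its fast paths.

```c
/* Hedged sketch of a namespace-resolving element check, assuming a
 * hash table mapping prefixes (e.g. "bookmark") to namespace URIs.
 * The real is_element_full() differs, notably in its fast paths. */
#include <string.h>
#include <glib.h>

static gboolean
is_element_full_sketch (GHashTable  *namespaces,
                        const gchar *element,   /* e.g. "bookmark:application" */
                        const gchar *ns_uri,    /* expected namespace URI */
                        const gchar *name)      /* expected local name */
{
  const gchar *colon = strchr (element, ':');

  if (colon == NULL)   /* no prefix: compare the bare name */
    return ns_uri == NULL && strcmp (element, name) == 0;

  /* resolve the prefix to its declared namespace URI */
  gchar *prefix = g_strndup (element, colon - element);
  const gchar *resolved = g_hash_table_lookup (namespaces, prefix);
  g_free (prefix);

  return resolved != NULL
      && strcmp (resolved, ns_uri) == 0
      && strcmp (colon + 1, name) == 0;
}
```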
As another big offender, we find the timestamp_from_iso8601 function. As its name implies, it transforms an ISO 8601 timestamp into a Un*x one – that is, seconds from the Epoch (00:00:00 UTC, 1 January 1970). I've taken part of the function's logic from Mr. Project's code, while the inverse function (timestamp_to_iso8601) was taken from Drivel's XML-RPC code. I adapted them both, and made them a little bit stronger on the payload constraints. Anyway, while producing an ISO 8601 formatted date is quite easy (just call strftime), the other way round is really not for the faint of heart. I was considering the option of producing a patch to make it a GLib function – many web-based projects would benefit from it, since the ISO 8601 format should be the standard for expressing dates on the intarweb. As you can see, every bookmark element has three timestamps – so this function must be run three times per bookmark entry. That is probably why it accounts for 9% of the overall time.
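To give an idea of the conversion (and of why the full job is hairy), here is a minimal sketch handling only the fixed UTC form used in the XBEL entry above; a real parser must also cope with timezone offsets, fractional seconds and the extended format. timegm() is a GNU/BSD extension, not standard C.

```c
/* Minimal sketch: convert a timestamp like "20051005T22:15:04Z" into
 * seconds since the Epoch. Handles only the UTC ("Z") basic form;
 * timezone offsets and fractional seconds are left out. */
#define _GNU_SOURCE   /* for timegm() on glibc */
#include <stdio.h>
#include <time.h>

static time_t
timestamp_from_iso8601_sketch (const char *iso)
{
  struct tm tm = { 0 };

  if (sscanf (iso, "%4d%2d%2dT%2d:%2d:%2dZ",
              &tm.tm_year, &tm.tm_mon, &tm.tm_mday,
              &tm.tm_hour, &tm.tm_min, &tm.tm_sec) != 6)
    return (time_t) -1;

  tm.tm_year -= 1900;  /* struct tm counts years from 1900 */
  tm.tm_mon  -= 1;     /* and months from 0 */

  return timegm (&tm); /* interpret the broken-down time as UTC */
}
```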
Another 9% of the time is spent adding a new item to the list held by the BookmarkFile object. This is because the bookmarks are stored inside a GList, and indexed using a hash table for faster lookup times. I've just noticed that I've used g_list_append in order to store a new BookmarkItem structure; we could shave off some time by using g_list_prepend instead, since we really don't care about how the bookmarks are sorted – both on disk and in memory; actually, the only way to get at the bookmarks is to get the list of URIs, and then access each bookmark by its URI. So, thanks sysprof for this. (While I was writing this, I made the change; now the egg_bookmark_file_add_item time has dropped to just ~3% of the total time – yay.)
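The change is tiny but the complexity argument is real: g_list_append walks the whole list on every call (O(n) per insert, O(n²) overall), while g_list_prepend is constant time. A sketch of the shape of the change, with a stand-in struct instead of the real BookmarkItem:

```c
/* Sketch of the append-to-prepend change; BookmarkItem here is a
 * stand-in for the real structure. Lookups go through the hash
 * table anyway, so the list order does not matter. */
#include <glib.h>

typedef struct { gchar *uri; } BookmarkItem;  /* stand-in */

static void
add_item (GList **items, GHashTable *items_by_uri, BookmarkItem *item)
{
  /* before: *items = g_list_append (*items, item);   O(n) per call */
  *items = g_list_prepend (*items, item);           /* O(1) per call */
  g_hash_table_insert (items_by_uri, item->uri, item);
}
```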
The lowest offenders are the egg_bookmark_*_free functions – which do nothing except… well… free the data. Since we have a couple of lists, we end up having to cycle through them in order to free the data.
The point is that, surprisingly enough, BookmarkFile is performing quite well – and I say "surprisingly" because I didn't expect it to at this point in time (it is a second-iteration parser, written as a replacement for a libxml2-based parser); most of the time is spent in places where we could shave off something by micro-optimizing stuff (inlining some functions might yield better timings, as well as marking some unlikely paths and letting the compiler do the dirty work).
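For instance, GLib already ships branch-prediction hint macros for exactly that kind of micro-optimization; this fragment is purely illustrative, and the function is made up.

```c
/* Illustrative use of GLib's branch-prediction hint macros; the
 * function itself is invented for the example. */
#include <glib.h>

static gint
checked_parse (const gchar *buffer)
{
  if (G_UNLIKELY (buffer == NULL))  /* error path: hint it as unlikely */
    return -1;

  /* the hot path continues here */
  return 0;
}
```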
This allows the code inside libegg/recentchooser to appear quite fast – or, at least, to put the blame for the performance issues of the RecentChooser widgets on GTK instead of on my parser code – heh.
These profiling sessions wouldn't have been possible without the great job of Søren Sandmann, who wrote sysprof, and – of course – Federico Mena Quintero, our profiling overlord. I've learned a great deal about how to correctly profile my code, and it really paid off in terms of speed and reliability.
Today I had a couple of hours to spare at the university, so I decided to implement a couple of items on my "to do" list.
The first is the creation of a simple function for adding items to the RecentManager – ideally, it should require just a URI and retrieve the rest of the data where possible. For the MIME type of the resource, I used the xdgmime library, which is also used by GTK, in order to interface the RecentManager code with the FreeDesktop.Org shared MIME information; the application_name, application_exec tuple is filled with the default program name (as supplied by GLib). This should cut back some of the API complexity, and be useful for the local file case. If you are dealing with remote resources, you should really use gnome-vfs, and fill these meta-data fields by yourself.
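A minimal sketch of what such a convenience call could look like; the function name and the g_print placeholder are mine, not the real libegg API, and the one-argument xdg_mime_get_mime_type_for_file signature is an assumption (later xdgmime versions take an extra struct stat parameter).

```c
/* Hedged sketch of the URI-only convenience call described above.
 * The function name and the xdgmime signature are assumptions;
 * only the local file case is handled, as noted in the text. */
#include <glib.h>

/* from the xdgmime library; signature assumed from the era */
extern const char *xdg_mime_get_mime_type_for_file (const char *file_name);

static gboolean
recent_manager_add_sketch (const gchar *uri)
{
  gchar *filename = g_filename_from_uri (uri, NULL, NULL);
  if (filename == NULL)
    return FALSE;  /* not a local file: use gnome-vfs and fill the fields yourself */

  const char  *mime_type = xdg_mime_get_mime_type_for_file (filename);
  const gchar *app_name  = g_get_prgname ();  /* default program name from GLib */

  /* a real implementation would register the item here */
  g_print ("adding %s (%s) for %s\n", uri, mime_type, app_name);

  g_free (filename);
  return TRUE;
}
```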
The other item on my list was thumbnailing support. James Cape did a wonderful job implementing the fd.o thumbnailing specification inside a libegg module (for his IconChooser widget), so I just linked his code, and added an appropriate function to the RecentInfo object.
Profiling GMarkup – take 2. After profiling GMarkup with a medium-sized file (under 1000 bookmark entries, approx. 400KB), I've set off to profile it using a >1MB file, in order to get some more experimental data on the edges of GMarkup. During the last GTK meeting, Federico suggested that GTK could read the GConf XML and retrieve some of the preferences from there – instead of making GTK depend on GConf (which would mean making it depend on ORBit2, libxml2 and others). I've expressed some doubts about this operation, mainly because reading the recently used resources file once in a while is not so intensive: after all, we are talking about one thousand entries at most (I do clean my recent documents once in a while), while a common $HOME/.gconf directory typically spans more than 1 MB on disk. The preliminary figures I gathered last night show that parsing a 1.5 MB file with GMarkup takes at most ~2.5 seconds with a cold cache and ~1 second with a hot cache (on a test with 10000 sequential runs). Federico is trying to shave off tenths of a second from the FileChooser – a 2 second delay just for reading the GConf database is simply wrong; also, what if the configuration is not local or not user-defined? We can't read (let alone modify) some of the settings if the system administrators locked them down. So, in the end, I think that this is just not the way to go.
Jody Goldberg proposed the creation of a KeyFile-based configuration system just for GTK; this would bring GNOME into the exciting nineties, right at the side of Windows 3.1 with its .INI files.
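For reference, GLib already ships a key-file parser, GKeyFile, which would presumably be the basis of such a system; here is what it looks like in use, with group and key names invented for the example.

```c
/* Small illustration of the .INI-style key-file format under
 * discussion, using GLib's GKeyFile. The group and key names
 * are made up for the example. */
#include <glib.h>

int
main (void)
{
  GKeyFile *kf = g_key_file_new ();
  GError *error = NULL;
  const gchar *data =
    "[Settings]\n"
    "show-hidden=false\n"
    "sort-order=name\n";

  if (!g_key_file_load_from_data (kf, data, -1, G_KEY_FILE_NONE, &error))
    {
      g_printerr ("parse error: %s\n", error->message);
      g_error_free (error);
      return 1;
    }

  gboolean hidden = g_key_file_get_boolean (kf, "Settings", "show-hidden", NULL);
  g_print ("show-hidden: %s\n", hidden ? "true" : "false");

  g_key_file_free (kf);
  return 0;
}
```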
While I wholeheartedly agree with Federico and Jody that GTK should be able to save some of its state data, I also feel that neither GConf nor an INI-like solution is the best fit for this issue. I don't know what to propose instead, though, so I'll just shut up for the time being.