Profiling GMarkup – Take 2

Here we go again profiling GMarkup.

Finally, I’ve been able to compile sysprof and launch a profiling session of the BookmarkFile parser under it. This time I’ve been using a 1.6 MB file with 3360 bookmark elements. Just to shed a bit more light on the storage format, here’s a single entry:


<?xml version="1.0" encoding="utf-8"?>
<xbel version="1.0"
      xmlns:bookmark="http://www.freedesktop.org/standards/desktop-bookmarks"
      xmlns:mime="http://www.freedesktop.org/standards/shared-mime-info">
  <bookmark href="file:///tmp/test-xbel/test-1.txt"
            added="20051005T22:15:04Z"
            modified="20051005T22:15:13Z"
            visited="20051005T22:15:04Z">
    <info>
      <metadata owner="http://freedesktop.org">
        <mime:mime-type type="text/plain"/>
        <bookmark:applications>
          <bookmark:application name="populate-recent"
                                exec="populate-recent --info %u"
                                timestamp="1128550513"
                                count="2"/>
        </bookmark:applications>
      </metadata>
    </info>
  </bookmark>
</xbel>

This is a valid bookmark expressed using XBEL version 1.0. The important bit is the href attribute of the bookmark element, which holds the URI of the bookmark; then we have three timestamps, expressed using the ISO 8601 format (time relative to UTC). Everything else is our meta-data for the bookmark entry: the MIME type, the registering applications, etc. The non-mandatory bits which are missing are the document’s name and description, the “private” hint, and the groups (or meta-classes) to which the entry belongs.

Now, picture three thousand of these entries, and you’ll have the needed background.

As a profiling thought experiment, we can already guess at the plausible hot spots. First of all, since GMarkup is an event-driven parser, we will spend most of our time inside the callbacks invoked when the start of an element is found, when the element ends, and when a text node is found inside the element itself; we will also have hot spots when dealing with attributes and with data parsing – like the conversion of the ISO 8601 timestamps into something usable under Un*x.

Some numbers…

After the thought experiment, here’s some real data.

Function                                %
main                                92.56
egg_bookmark_file_load_from_file    91.14
egg_bookmark_file_load_from_data    90.16
start_element_raw                   37.81
is_element_full                     34.24
end_element_raw                     22.94
timestamp_from_iso8601               9.58
egg_bookmark_file_add_item           9.23
text_raw_cb                          1.36
egg_bookmark_file_free               1.01
egg_bookmark_item_free               0.94

These numbers come from a sysprof session during a thousand runs of the testbookmarkfile program (included in CVS) on a custom file containing ~3300 bookmark entries.

… Many words…

The testbookmarkfile program is really simple: it loads an XBEL file and prints the number of bookmarks that have been successfully parsed, before quitting. So, having most of the time spent inside the egg_bookmark_file_load_* functions is pretty much understandable. As I said in the first profiling run, the file is loaded into memory using a buffered load, 4096 bytes at a time; the I/O, on a pretty decent machine, takes just 0.98% of the total time – which means that it’s basically drowned inside the system’s I/O latency.
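
Just to show the shape of it, here’s a minimal sketch of a chunked load feeding GMarkup; the function and parameter names are made up for the example (this is not the actual egg_bookmark_file_load_* code), and error reporting for the fopen() failure is omitted:

#include <glib.h>
#include <stdio.h>

/* Hypothetical sketch: feed an XBEL file to a GMarkupParseContext in
 * 4096-byte chunks instead of slurping the whole file; the `parser'
 * struct with the start/end/text callbacks is assumed to be defined
 * elsewhere. */
static gboolean
load_bookmarks_chunked (const GMarkupParser *parser,
                        gpointer             parse_data,
                        const gchar         *filename,
                        GError             **error)
{
  GMarkupParseContext *context;
  gchar buffer[4096];
  gsize bytes_read;
  FILE *fp;

  fp = fopen (filename, "r");
  if (fp == NULL)
    return FALSE;

  context = g_markup_parse_context_new (parser, 0, parse_data, NULL);

  /* each fread() hands at most 4096 bytes to the parser, so the whole
   * file never has to sit in memory at once */
  while ((bytes_read = fread (buffer, 1, sizeof (buffer), fp)) > 0)
    {
      if (!g_markup_parse_context_parse (context, buffer, bytes_read, error))
        {
          g_markup_parse_context_free (context);
          fclose (fp);
          return FALSE;
        }
    }

  fclose (fp);

  if (!g_markup_parse_context_end_parse (context, error))
    {
      g_markup_parse_context_free (context);
      return FALSE;
    }

  g_markup_parse_context_free (context);

  return TRUE;
}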

Then we have the three callbacks: start_element_raw_cb, end_element_raw_cb and text_raw_cb. The first two drive the state machine used to recognize the element hierarchy and context, while the third collects the text stored inside the text nodes. This means that these functions are big “if… else if… else” blocks, where the current state is checked and changed according to the hierarchy rules defined by the storage format.
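
To give an idea of the shape (and only the shape – the real parser tracks many more states than this), a condensed sketch of such a state machine could look like this:

#include <string.h>
#include <glib.h>

/* Illustrative only: the real code tracks states for info, metadata,
 * applications and so on, but the pattern is the same: a switch on the
 * current state, with transitions guarded by the element name. */
typedef enum
{
  STATE_ROOT,
  STATE_XBEL,
  STATE_BOOKMARK
} ParseState;

typedef struct
{
  ParseState state;
} ParseData;

static void
start_element_cb (GMarkupParseContext  *context,
                  const gchar          *element_name,
                  const gchar         **attribute_names,
                  const gchar         **attribute_values,
                  gpointer              user_data,
                  GError              **error)
{
  ParseData *data = user_data;

  if (data->state == STATE_ROOT && strcmp (element_name, "xbel") == 0)
    data->state = STATE_XBEL;
  else if (data->state == STATE_XBEL && strcmp (element_name, "bookmark") == 0)
    data->state = STATE_BOOKMARK;   /* href/added/modified/visited live here */
}

static void
end_element_cb (GMarkupParseContext  *context,
                const gchar          *element_name,
                gpointer              user_data,
                GError              **error)
{
  ParseData *data = user_data;

  if (data->state == STATE_BOOKMARK && strcmp (element_name, "bookmark") == 0)
    data->state = STATE_XBEL;
  else if (data->state == STATE_XBEL && strcmp (element_name, "xbel") == 0)
    data->state = STATE_ROOT;
}

static const GMarkupParser parser = {
  start_element_cb,
  end_element_cb,
  NULL,  /* text */
  NULL,  /* passthrough */
  NULL   /* error */
};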

The first unknown entry at this point is the is_element_full function: what does it do? It checks whether an element’s name is what we want it to be, in order to change the state of the FSM. Then why does it take ~34% of the computation time? Because of the xmlns:bookmark and xmlns:mime attributes you find inside the xbel element. These attributes are special: they are XML namespace declarations – that is, they say that whenever an element name is prefixed with the string after the colon of the xmlns attribute, the element belongs to the namespace defined by the URI of that xmlns attribute. Using namespaces we are able to include elements other than the ones defined for a specific document, avoid clashes, etc.; the custom meta-data we use inside the metadata tag is possible only because we use namespaced elements. All you have to do when you parse this stream is to “resolve” the namespace before checking the element’s name – and that is exactly what is_element_full does: it resolves an element into a full element, that is, an element prepended with its namespace URI, and then checks it against the URI and element name we pass to it.
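
A rough sketch of what that resolution could look like, assuming the xmlns declarations have already been collected into a prefix-to-URI hash table (the signature and the fast path here are illustrative, not the actual EggBookmarkFile code):

#include <string.h>
#include <glib.h>

/* Illustrative resolution of a possibly-prefixed element name against
 * the namespaces declared on the xbel element (prefix -> URI). */
static gboolean
is_element_full (GHashTable  *namespaces,    /* prefix -> URI */
                 const gchar *element_name,  /* as seen by GMarkup */
                 const gchar *uri,           /* expected namespace URI */
                 const gchar *name)          /* expected local name */
{
  const gchar *colon, *resolved_uri, *local_name;
  gchar *prefix;

  colon = strchr (element_name, ':');

  /* fast path: no prefix means no namespace to resolve */
  if (colon == NULL)
    return uri == NULL && strcmp (element_name, name) == 0;

  /* look up the URI bound to the prefix before the colon */
  prefix = g_strndup (element_name, colon - element_name);
  resolved_uri = g_hash_table_lookup (namespaces, prefix);
  g_free (prefix);

  if (resolved_uri == NULL)
    return FALSE;

  local_name = colon + 1;

  return strcmp (resolved_uri, uri) == 0 &&
         strcmp (local_name, name) == 0;
}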

Obviously, even though it should be a fast operation (actually, it is pretty much a constant-time operation), it must be done for every element, in both the start and end element callbacks, so we end up in there many times; fortunately, we provide a bunch of fast paths inside it, and we also store the namespaces inside a hash table, so that it takes only about a third of the total time. I was a bit skeptical about its reliability, though – I always felt that the check was a little too fragile; so far, is_element_full has proved me wrong – but I’ll always feel restless, awaiting the dreaded moment when someone breaks it.

Another big offender is the timestamp_from_iso8601 function. As its name implies, it transforms an ISO 8601 timestamp into a Un*x one – that is, seconds since the Epoch (00:00:00 UTC, January 1st 1970). I took part of the function’s logic from Mr. Project’s code, while the inverse function (timestamp_to_iso8601) was taken from Drivel’s XML-RPC code. I adapted them both, and made them a little stricter about the payload constraints. Anyway, while producing an ISO 8601 formatted date is quite easy (just call strftime), the other way round is really not for the faint of heart. I was considering the option of producing a patch to make it a GLib function – many web-based projects would benefit from it, since ISO 8601 should be the standard format for expressing dates on the intarweb. As you can see, every bookmark element has three timestamps – so this function must run three times per bookmark entry. That is probably why it accounts for 9% of the overall time.
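
To give an idea, here’s a bare-bones sketch that handles only the restricted, UTC-only form used in the file above; the real function has to cope with many more variants (dashes, fractional seconds, numeric timezone offsets), and timegm() is a glibc/BSD extension rather than standard C:

#define _GNU_SOURCE    /* for timegm() on glibc */
#include <stdio.h>
#include <time.h>

/* Illustrative conversion of a "20051005T22:15:04Z" timestamp into
 * seconds since the Epoch; returns (time_t) -1 on malformed input. */
static time_t
timestamp_from_iso8601_utc (const char *iso_date)
{
  struct tm tm = { 0, };

  if (sscanf (iso_date, "%4d%2d%2dT%2d:%2d:%2dZ",
              &tm.tm_year, &tm.tm_mon, &tm.tm_mday,
              &tm.tm_hour, &tm.tm_min, &tm.tm_sec) != 6)
    return (time_t) -1;

  tm.tm_year -= 1900;   /* struct tm counts years from 1900... */
  tm.tm_mon  -= 1;      /* ...and months from 0 */

  return timegm (&tm);  /* interpret the broken-down time as UTC */
}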

Another 9% of the time is spent adding a new item to the list held by the BookmarkFile object. This is because the bookmarks are stored inside a GList, and indexed using a hash table for faster lookups. I’ve just noticed that I used g_list_append in order to store a new BookmarkItem structure; we could shave off some time by using g_list_prepend instead, since we really don’t care how the bookmarks are sorted – either on disk or in memory; actually, the only way to get a list of bookmarks is to get the URIs, and then access each bookmark by its URI. So, thanks sysprof for this. (While I was writing this, I made the change; now egg_bookmark_file_add_item takes just ~3% of the total time – yay.)
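
The change itself is a one-liner; in a hypothetical add_item function (the structures below are stand-ins, not the real EggBookmarkFile internals) it boils down to this:

#include <glib.h>

/* stand-in structures, for illustration only */
typedef struct
{
  gchar *uri;
} BookmarkItem;

typedef struct
{
  GList      *items;         /* list of BookmarkItem */
  GHashTable *items_by_uri;  /* uri -> BookmarkItem, for fast lookups */
} BookmarkFile;

static void
bookmark_file_add_item (BookmarkFile *file,
                        BookmarkItem *item)
{
  /* g_list_append() walks to the tail on every call, so filling the list
   * is O(n^2); g_list_prepend() is O(1), and the order does not matter
   * because lookups go through the URI hash table anyway */
  file->items = g_list_prepend (file->items, item);

  g_hash_table_insert (file->items_by_uri, item->uri, item);
}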

The lowest offenders are the egg_bookmark_*_free functions – which do nothing except… well… free the data. Since we have a couple of lists, we end up having to cycle through them in order to free the data.
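
Continuing with the stand-in structures from the previous sketch, the cleanup is little more than a walk over the list:

static void
free_item_cb (gpointer data, gpointer user_data)
{
  BookmarkItem *item = data;

  g_free (item->uri);
  g_free (item);
}

static void
bookmark_file_free (BookmarkFile *file)
{
  /* one pass over the list to free each item... */
  g_list_foreach (file->items, free_item_cb, NULL);
  g_list_free (file->items);

  /* ...the hash table only borrows the items' URIs, so it just goes away */
  g_hash_table_destroy (file->items_by_uri);

  g_free (file);
}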

… And a point

The point is that, surprisingly enough, BookmarkFile is performing quite well – and I say “surprisingly” because I didn’t expect it to at this point in time (it is a second-iteration parser, written as a replacement for a libxml2-based parser); most of the time is spent in places where we could shave off something by micro-optimizing (inlining some functions might yield better timings, as well as marking some unlikely paths and letting the compiler do the dirty work).
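
For instance, GLib’s G_UNLIKELY macro (a thin wrapper around GCC’s __builtin_expect) could be used to mark error paths as cold – purely illustrative code, not something that is in the parser today:

#include <glib.h>

/* Hypothetical validation helper: the error path is marked as unlikely,
 * so the compiler keeps the common path straight. */
static gboolean
check_attribute (const gchar  *value,
                 GError      **error)
{
  if (G_UNLIKELY (value == NULL || *value == '\0'))
    {
      g_set_error (error, G_MARKUP_ERROR, G_MARKUP_ERROR_INVALID_CONTENT,
                   "expected a non-empty attribute value");
      return FALSE;
    }

  return TRUE;
}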

This makes the code inside libegg/recentchooser look quite fast – or, at least, it lets me put the blame for the performance issues of the RecentChooser widgets on Gtk rather than on my parser code – heh.

The End (for now)

These profiling sessions wouldn’t have been possible without the great work of Søren Sandmann, who wrote sysprof, and – of course – Federico Mena Quintero, our profiling overlord. I’ve learned a great deal about how to profile my code correctly, and it has really paid off in terms of speed and reliability.