Profiling GMarkup – Take 2

Here we go again profiling GMarkup.

Finally, I’ve been able to compile sysprof, and launch a profiling session of the BookmarkFile parser under it. This time, I’ve been using a 1.6MB file, with 3360 bookmark elements. Just to shed a bit more light on the storage format, here’s a single entry:


<?xml version="1.0" encoding="utf-8"?>
<xbel version="1.0"
      xmlns:bookmark="http://www.freedesktop.org/standards/desktop-bookmarks"
      xmlns:mime="http://www.freedesktop.org/standards/shared-mime-info">
  <bookmark href="file:///tmp/test-xbel/test-1.txt"
            added="20051005T22:15:04Z"
            modified="20051005T22:15:13Z"
            visited="20051005T22:15:04Z">
    <info>
      <metadata owner="http://freedesktop.org">
        <mime:mime-type type="text/plain"/>
        <bookmark:applications>
          <bookmark:application name="populate-recent"
                                exec="populate-recent --info %u"
                                timestamp="1128550513"
                                count="2"/>
        </bookmark:applications>
      </metadata>
    </info>
  </bookmark>
</xbel>

This is a valid bookmark expressed using XBEL version 1.0; the important bit is contained in the href attribute of the bookmark element, and it’s the URI of the bookmark; then, we have three timestamps, expressed using the ISO8601 format (time relative to UTC). Everything else, is our meta-data for the bookmark entry: the MIME type, the registering applications, etc. The non-mandatory bits which are missing are the document’s name and description, the “private” hint, and the groups (or meta-classes) to which the entry belongs.

Now, picture three thousands of these entries, and you’ll have the needed background.

As a profiling thought experiment, we could already define plausible hot spots; first of all, since GMarkup is an event-driven parser, we will spend most of our time inside the callbacks defined for when the start of an element is found, for the end of the element and for the text node inside the element itself; we will also have hot spots when we have to deal with attributes and with data parsing – like the conversion from the ISO8601 timestamps to something usable under Un*x.

Some numbers…

After some thought profiling, here’s some real data.

Function %
main 92.56
egg_bookmark_file_load_from_file 91.14
egg_bookmark_file_load_from_data 90.16
start_element_raw 37.81
is_element_full 34.24
end_element_raw 22.94
timestamp_from_iso8601 9.58
egg_bookmark_file_add_item 9.23
text_raw_cb 1.36
egg_bookmark_file_free 1.01
egg_bookmark_item_free 0.94

These numbers come from a sysprof session during a thousand runs of the testbookmarkfile program (included in CVS) on a custom file containing ~3300 bookmark entries.

… Many words…

The testbookmarkfile program is really simple: it loads an XBEL file and prints the number of bookmarks that has been successfully parsed, before quitting. So, having the most of the time spent inside the egg_bookmark_file_load_* functions is really pretty much understandable. As I said in the first profile run, the file is loaded into memory using a buffered load, 4096 bytes at a time; the I/O, on a pretty decent machine, takes just .98% of the total time – which means that it’s basically drowned inside the system’s I/O latency.

Then, we have the three callbacks: start_element_raw_cb, end_element_raw_cb and text_raw_cb. The first one and the second one control the state machine used in order to recognize the element hierarchy and context, while the third gets the text stored inside the text nodes. This means that these functions are big “if…else if… else” blocks, where the actual state is checked and changed accordingly to the hierarchy rules defined by the storage format.

The first unknown entry, at this point, is the is_element_full function: what does it do? It checks if an element’s name is what we want it to be – in order to change the state of the FSM. Then why does it take ~34% of the computation time? Because of the xmlns:bookmark and xmlns:mime attributes you find inside the xbel element. These attributes are special: they are XML Namespace Declarations – that is, they say that whenever an element is prepended with a string matching the part after the colon of the xmlns attribute, then the element belongs to the namespace defined by the URI inside the relative xmlns attribute. Using namespaces we are able to include other elements than the one defined for a specific document, avoiding clashes, etc.; the custom meta-data we use inside the metadata tag is possible only because we use namespaced elements. All you have to do when you parse this stream, is to “resolve” the namespace before checking the element’s name – and that exactly is what is_element_full does: it resolves an element to a full element, that is an element prepended with the namespace URI, and then checks it against a URI and an element name we pass to it.

Obviously, even though it should be a fast operation (actually, is pretty much a constant-time operation), it must be done for each element, and for both start and end element callbacks, so we end up in there many times; fortunately, we provide a bunch of fast paths in it, and we also store namespaces inside an hashtable, so that it does take just a third of the total time. I was a bit skeptic on its reliability, though – and always felt that the check was a little to “sensible”; so far, is_element_full has proved me wrong – but I’ll always feel restless, and await the dreaded moment when someone breaks it.

As another big offender, we find the timestamp_from_iso8601 function. As its name implies, it transforms a ISO8601 timestamp into a Un*x one – that is, seconds from the Epoch (00:00:01 1.1.1970). I’ve taken part of the function’s logic from Mr. Project’s code, while the inverse function (timestamp_to_iso8601) was taken from Drivel’s XML:RPC code. I adapted them both, and made them a little bit stronger on the payload constraints. Anyway, while producing a ISO8601 formatted date is quite easy (just call strftime), the other way round is really not for the faint of heart. I was considering the option of producing a patch for making it a GLib function – many web-based projects would benefit from it, since the ISO8601 format should be the standard for expressing dates on the intarweb. As you can see, every bookmark element has three timestamps – so this function must be run three times per bookmark entry. That is probably why it’s accountable for the 9% of the overall time.

Another 9% of the time is spent adding a new item inside the list held by the BookmarkFile object. This is because the bookmarks are stored inside a GList, and indexed using an hashtable for faster lookup times. I’ve just noticed that I’ve used g_list_append in order to store a new BookmarkItem structure; we could shave of some time by using g_list_prepend instead, since we really don’t care of how the bookmarks are sorted – both on disk and in memory; actually, the only way to get a list of bookmarks is to get the URIs, and then access them one at a time, by its URI. So, thanks sysprof for this. (While I was writing this, I did the change; now the egg_bookmark_file_add_item time has become just ~3% of the total time – yay).

The lowest offenders are the egg_bookmark_*_free functions – which do nothing except… well… free the data. Since we have a couple of lists, we end up having to cycle through them in order to free the data.

… And a point

The point is that, surprisingly enough, BookmarkFile is performing quite well – and I say “suprisingly” because I didn’t expect it to at this point in time (it is a second iteration parser, written as a replacement for a libxml2-based parser); most of the time is spent in places where we could shave off something by micro-optimizing stuff (inlining some functions might yield better timings, as well as marking some unlikely path and letting the compiler do the dirty work).

This allows the code inside libegg/recentchooser to appear quite fast – or, at least, to put the blame on the performance issues of the RecentChooser widgets on Gtk instead of my parser code – heh.

The End (for now)

This profiling sessions wouldn’t have been possible without the great job of Søren Sandmann, who wrote sysprof, and – of course – Federico Mena Quintero, our profiling overlord. I’ve learned a great deal of things in order to correctely profile my code, and it really paid off in terms of speed and reliability.

Thumbnails

Today I had a couple of hours to spare at the university, so I decided to implement a couple of items in my “to do” list.

The first is the creation of a simple function for adding items to the RecentManager – ideally, it should require just a URI and retrieve the data where possible. For the MIME of the resource, I used the xdgmime library, which is also used by GTK in order to interface the RecentManager code to the FreeDesktop.Org shared MIME informations; the application_name, application_exec tuple is filled with the default program name (as supplied by GLib). This should cut back some of the API complexity, and be useful just for the local file case. If you are dealing with remotely-placed resources, you should really use gnome-vfs, and fill up these meta-data fields by yourself.

The other item in my list was the thumbnailing support. James Cape did a wonderful job, implementing the fd.o thumbnailing specification inside a libegg module (for his IconChooser widget), so I just linked his code, and added an appropriate function for the RecentInfo object.

recent documents thumbnails
Now, both the RecentChooserMenu and the RecentChooserWidget can display thumbnails of the resources.

Profiling GMarkup – take 2. After having profiled GMarkup with a medium sized file (under 1000 bookmark entries, approx. 400KB), I’ve set off to profile it using a >1MB file, in order to get some more experimental data on the edges of GMarkup. During the last GTK meeting, Federico suggested that GTK could read the GConf XML and retrieve some of the preferences from there – instead of making GTK depending on GConf (which would mean making it depending on ORBit2, libxml2 and others). I’ve expressed some doubts about this operation; mainly because reading the recently used resources file one in a while is not so intensive: after all, we are talking about one thousand entries at most (I do clean my recent documents once in a while), while a common $HOME/.gconf directory spans typically more than 1 MB on disk. The preliminary figures I gathered last night are that parsing a 1.5 MB file with GMarkup will take up at most ~2.5 seconds with cold cache and ~1 second with hot cache (on a test with 10000 sequential runs). Federico is trying to shave off decimals of seconds from the FileChooser – a 2 seconds delay just for reading the GConf database is just wrong; also, what if the configuration is not local or user-defined? We can’t read (let alone modify) some of the settings if the system administrators locked them down. So, in the end, I think that this is just not the way to go.

Jody Goldberg proposed the creation of a KeyFile-based configuration system just for GTK; this would bring GNOME into the exciting nineties, right at the side of Windows 3.1 with its .INI files.

While I wholeheartedly agree with Federico and Jody that GTK should be able to save some of its state data, I also feel that both GConf and a INI-like solutions are not the best one for this issue. I don’t know what to propose in place, though, so I’ll just shut up for the time being.

Gtk2::SourceView

Finally, after a more then a year of testing inside CVS, Gtk2::SourceView made it to the public.

This Perl module is a wrapper around gtksourceview, the C library used primarily by Gedit. I’ve began writing it in January 2004, after working on Gnome2::Print (libgnomeprint(ui) Perl bindings) and Gnome2::GConf (GConf bindings). After the 1.0 release of the C library, I left it bit-rot inside CVS; this August, Torsten kaffee Schoenfeld did a major overhaul in order to update the build system with the latest and gratest from Glib/Gtk2, and also wrapped many of the remaining objects and methods.

Yesterday evening, I was asked for a release on -perl, and – while at it – I’ve added the accessor methods for the Gtk2::SourceView::TagStyle object, which were left unbound. I’ll do another round in order to wrap the remaining methods and add some more tests inside HEAD.

This also marks my first release on CPAN, so yay for me

Requirements

Using advogato (and my log), since I’m not on the Planet (yet).

I’ve just read Daniel’s post, and since shame is a powerful drive, here’s my non-requested answer.

I’m one guy – quite possibly, the only one – that switched from a full-fledged XML parser like libxml2 to GMarkup. Believe you me: I would have rather stayed within the blissfulness of DOM, within the ease of development of a complex and powerful parser, within the safety of one of the best XML parser around the F/OSS world.

I would have used libxml2 (and in fact, I did begin using it) – because of the work that DV (and every other developer involved) put into libxml2; I state that again: it’s a wonderful library, and it’s great to have it, and for it I’ll have to buy Daniel enough beer to knock him unconscious until Gnome 2.14. :-)

That said, I switched to GMarkup because libxml2 is also a heavy dependency for Gtk+. It’s a 1M+ library, and a dependency some devices can’t afford to have on the chain – I think specifically of embedded devices.

Supporting a platform standard like the storage for recent files and bookmarks only on desktop boxes, because they can afford to have libxml2 pre-installed is not an option.

I remember discussing on the XDG list with Daniel and others about the desktop-bookmark spec; the spec started as a GMarkup format, and then I was convinced to use XBEL. It was a good idea – and once properly standardised, it could lead to data sharing between various environments; a goal that the previous recent-files spec missed badly.

Having GMarkup to parse a valid XBEL stream, even with every limitation it has (UTF-8 only, a bit shaky XML:NS support, etc.), has been possible (even thinkable) just because I had beside my open Gedit window another window with a XBEL parser written using libxml2, reminding me how a full-fledged XML parser should work.

So: thanks Daniel for libxml2 – your great work has been and it’s still really appreciated and useful. Sadly, there are requirements, in this world, and many times they collide with what we would like.

EggBookmarkFile

hacking: That is, the GMarkup-based XBEL parser that should be used to parse “desktop bookmarks” (recent files, filechooser’s bookmarks, default locations, etc.)

Today I’ve worked hard on the namespace parsing mechanism, and even though I feel like it’s a little too fragile, and it doesn’t cover every conceivable XML namespace declaration, it’s a start. At the moment, it parser the XBEL streams that I pass to it, resulting from libxml2.

I’ll tighten up the namespace marking routines, in order for it to be XML:NS compliant, and hope that nobody messes up with his own bookmarks. :-)

life: Tonight, I made bread with Marta. On Sunday there will be a party at her parents’, and she’s going to cater for 25+ people. We had to go shopping for two days – and believe me: it has been very tiresome.

exams: tomorrow, a C exam.

Update 2005-09-16@09:39: the EggBookmarkFile code hit CVS tonight, and so the code in EggRecentManager using it. Profiling is still on, so it’ll require auto-foo patching, but everything works nicely at the moment. I’ll do more work in the API and more profiling on the widgets, as soon as I can compile sysprof.

Profiling GMarkup

Since I’ve been re-writing the XBEL parser in order to use GMarkup in place of libxml2, I’ve also felt the need to get some profile going.

I’ve always felt that one of the biggest bottlenecks in the recent-files was its use of GMarkup for the recent files storage; it appears that I was wrong – at least, as far as GMarkup is concerned.

I’ve added some profile code in my local development tree (I’ve yet to upload it to libegg, since it’s not finished or polished enough), and here’s some results:

P: [ load_from_file:000.000000] - loading from file 'test-file.xbel'
P:  [ load_from_data:000.000000] - begin loading from data
P:   [     parse_data:000.000000] - beginning parse_data
P:  [     parse_data:000.163825] - finished parsing
P: [ load_from_data:000.164151] - finished loading data
P:[ load_from_file:000.167050] - finished loading from file 'test-file.xbel'
782 bookmarks loaded successfully.

(numbers are seconds and microseconds timestamps, as returned by gettimeofday(3), launching from a cold cache state hot cache (the cache wasn’t too cold, as later profiling showed) – with the file and libraries in cache the times are pretty much halved). As you can see, test-file.xbel is an XBEL compliant stream containing 782 bookmarks. I know this is a Poor Man’s Profiling(tm), but better than nothing anyway.

As you cans ee, the parsing of the file with GMarkup takes up ~0.16 seconds, mostly due to the parsing (obviously); I’ve implemented the loading from a file using a buffered read (4KB a time, which means that the read cycle does ~400 iterations in order to store the test-file.xbel file); then I feed the buffer to a function that parses everything from a string in memory.

Next run, I’ll break down the parse_data section into the per-element parsing for the bookmark, mime-type and application elements, which should be the most time-consuming, parsing-wise:

P: [ load_from_file:000.000000] - loading from file 'test-file.xbel'
P:  [ load_from_data:000.000000] - begin loading from data
P:   [     parse_data:000.000000] - beginning parse_data
P:    [ parse_bookmark:000.000000] - begin bookmark element
P:   [ parse_bookmark:000.000486] - end bookmark element
P:    [     parse_mime:000.000000] - begin mime element
P:   [     parse_mime:000.000094] - end mime element
P:    [parse_application:000.000000] - begin application element
P:   [parse_application:000.000124] - end application element
...

As you can see, the bookmark element is the biggest offender, here; this because it has four attributes (href, added, modified and visited), three of which hold a timestamp stored is the ISO8601 format (“YYYYMMDDTHH:MM:SSZ”, where T and Z are fixed characters) – which must be parsed too. The application element holds another four attributes (name, exec, count and timestamp), so it’s the second place where the parser spends most of its time. GMarkup holds the attributes names and values passed to the callbacks in two string arrays, so scanning them is a O(n) operation; better would be accessing them using a attribute_value = g_markup_parse_context_get_attribute (context, element, "attribute_name") which internally uses an hash table: this would make the attribute parsing O(1) – but it would also break API; we could provide a flag for activating a new callback signature:

void (*start_element) (GMarkupParseContext  *context,
		       const gchar          *element_name,
		       GError              **error);
void (*end _element)  (GMarkupParseContext  *context,
		       const gchar          *element_name,
		       GError              **error);

...

G_CONST_RETURN gchar *
g_markup_parse_context_get_attribute (GMarkupParseContext  *context,
				      const gchar          *attribute_name,
				      GError              **error);

This would be neat, even though, given the numbers, this could be seen as micro-optimization.

Consideration:: most time, as shown by cold-cache runs, is spent in I/O; thus, GMarkup shows virtually no bottleneck, even with parsing a bit more intensive than plain dumb 0 == strcmp (element_name, element)) checking.

Surgeon

Today, Marta went to Trento to visit some relatives of hers, so I spent the entire day alone. I’ve studied for a while (I have a OS exam monday, mostly Linux-based on the case studies, even though the kernel shown is a 2.0 at most), and then I spent the entire afternoon rewriting the recent files storage parser using just GMarkup, in order to drop the libxml2 dependency.

Now, after looking at the resulting 3000+ LOC I really need my eyeballs to be removed and cleaned. I had to put tugether a state machine in order to validate the incoming data, and in order to remove part of the if (...) { ... } else if (...) { ... } else { ... } madness resulting of having to deal with an event-based parser.

Anyway, seems that it worked out pretty well.

RecentChooserWidget/1

The EggRecentChooserWidget code is in CVS since a bunch of minutes, complete with a context menu allowing the removal of items from the list, the ability to add a custom RecentManager instance instead of using its own (still requires some polishing, but the interesting bits are there), and filtering based on a custom function. Also, I’ve fixed a nasty bug in the parser library that left garbage at the end of the storage file when removing items.

The next step is to allow the creation of RecentManager objects (maybe using runtime loadable modules) so that Gnome could use a GnomeVFS enabled resource monitoring in order to track location/file changes. This would require the implementation of RecentManager as a GInterface, as a plausible approach.

Anyway, now I’ll work on an applet/technology demonstrator, in order to have more to show to the Gtk people.

RecentChooserWidget

I’ve pretty much finished my EggRecentChooserWidget widget, which (I noticed) is roughly modelled on the Document History window of the Gimp:

RecentChooserWidget

Also, as you can see, I’ve enable the filtering code, so that you can now filter the contents of the recently used resources list on the fly:

RecentChooserWidget
Display all recently used resources.
RecentChooserWidget
Display today’s recently used resources.

The EggRecentChooserWidget will also have a dialog window (EggRecentChooserDialog) and a button widget (EggRecentChooserButton), much in the same philosophy of the GtkFileChooser widget set. In the source tree I’ve put a simple test program which basically recreates the same Open Recent menu of the Gimp (except for the little preview, which is replaced by the icon bound to the file’s MIME type).

There are some rough edges I’d like to address before committing the code, but I plan to have the widget and the dialog inside CVS by this week.

On a related news, I’ve began porting the panel to the RecentChooser and RecentManager code; other than changing the Open Recent menu widget with the EggRecentChooserMenu one (complete with a menu item launching the EggRecentChooserDialog dialog in order to display all the recently used resources), I plan to add a list of recently used applications inside the Run Application dialog; using this list as a default, we could hide the currently used entry.

Now boarding…

holidays: I’m leaving for a week in Amsterdam, with Marta.

I’ve already been in Netherlands, years ago with my family, so I already know the city; it’ll be fun to track down what changed, and to see new things.

work: As of july 26th I’m officially unemployed – or, better, in recovery (in a Scary Devil Monastery sense). And, by the way, if you are hiring, and you need a C/Perl developer with sysadmin skills, please feel free to contact me.

hacking: Well, the infamous RecentChooserWidget is finally here:

RecentChooserWidget

Still lacking the filter and drag and drop on the list, but everything else is here.

I’m planning to write an applet soon as I return from Amsterdam.