Cleaning up

life: after the “exam craze” (two in five days) that took the best part of the week, Marta and I have resumed cleaning up the house. Since I moved in with her, there has been so much work to do that we haven’t really felt the whole “moving in” part; but I assume it will get us, sooner or later. Next week we’re going to visit my parents, and get some of my books/DVDs/whatever that I left behind.

hacking: after moving the RecentManager to the new BookmarkFile parser/writer object, I resumed working on the actual separation of the recently used resources viewer widgets (implementing the RecentChooser interface) and the list controller (the RecentManager object). Last week I removed every trace of sorting and limit handling code involving the manager object from the widgets, and recoded them directly into the widgets themselves. Thus, it’s now possible to have a single manager instance and multiple viewers bound to it, instead of creating a manager object per widget:

RecentChooserWidgets
Two widgets, one manager

Sorting, filtering and the list limit property are still present in the RecentManager object; this way you could have custom sorting and filtering functions set on a manager, and then multiple views attached to it.

Separation was a job that the current EggRecentModel/EggRecentView objects didn’t handle well enough; basically, you could feed your own model to an EggRecentViewGtk widget, but every other widget using the same model would display the same data, with the same sorting, filtering and list size. This behaviour doesn’t really match the MVC paradigm; for instance, using the same TreeModel you could set up many TreeView widgets, each of them showing only some part of the data stored in the model. Filtering, sorting and size are handled by the TreeView code, not by the model.
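To make the "one manager, many viewers" idea concrete, here is a minimal sketch in plain C. It is not the libegg code: RecentItem, Manager, View and view_get_items are hypothetical names made up for illustration. The point is only that each view keeps its own sort function and limit, while the shared list is never modified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins, not the libegg types. */
typedef struct {
  const char *uri;
  long        visited;   /* last visit, seconds since the epoch */
} RecentItem;

typedef struct {
  const RecentItem *items;    /* the single, shared list */
  size_t            n_items;
} Manager;

typedef struct {
  const Manager *manager;     /* shared, never modified by the view */
  int          (*compare) (const void *a, const void *b);  /* may be NULL */
  size_t         limit;       /* 0 means "no limit" */
} View;

/* Comparison functions receive pointers to the array elements,
 * which are themselves pointers to items. */
static int
by_most_recent (const void *a, const void *b)
{
  const RecentItem *ia = *(const RecentItem * const *) a;
  const RecentItem *ib = *(const RecentItem * const *) b;

  return (ia->visited < ib->visited) - (ia->visited > ib->visited);
}

static int
by_uri (const void *a, const void *b)
{
  const RecentItem *ia = *(const RecentItem * const *) a;
  const RecentItem *ib = *(const RecentItem * const *) b;

  return strcmp (ia->uri, ib->uri);
}

/* Fills out[] (room for manager->n_items pointers) with the items this
 * view displays, in this view's order; returns how many are shown. */
static size_t
view_get_items (const View *view, const RecentItem **out)
{
  size_t n, i;

  assert (view != NULL && out != NULL);

  n = view->manager->n_items;
  for (i = 0; i < n; i++)
    out[i] = &view->manager->items[i];

  if (view->compare != NULL)
    qsort (out, n, sizeof *out, view->compare);

  if (view->limit > 0 && view->limit < n)
    n = view->limit;

  return n;
}
```

Two View structs pointing at the same Manager can then show, say, the two most recent items and the full list sorted by URI, without either one affecting the other.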

The same should happen with the recently used resources list, also because this could lead to the creation of a default singleton instance of the manager object, in order to reduce memory and locking issues due to the on-disk storage.

Requirements

Using advogato (and my log), since I’m not on the Planet (yet).

I’ve just read Daniel’s post, and since shame is a powerful drive, here’s my non-requested answer.

I’m one guy – quite possibly, the only one – that switched from a full-fledged XML parser like libxml2 to GMarkup. Believe you me: I would have rather stayed within the blissfulness of DOM, within the ease of development of a complex and powerful parser, within the safety of one of the best XML parsers around the F/OSS world.

I would have used libxml2 (and in fact, I did begin using it) – because of the work that DV (and every other developer involved) put into libxml2; I state that again: it’s a wonderful library, and it’s great to have it, and for it I’ll have to buy Daniel enough beer to knock him unconscious until Gnome 2.14. :-)

That said, I switched to GMarkup because libxml2 is also a heavy dependency for Gtk+. It’s a 1M+ library, and a dependency some devices can’t afford to have on the chain – I think specifically of embedded devices.

Supporting a platform standard like the storage for recent files and bookmarks only on desktop boxes, because they can afford to have libxml2 pre-installed, is not an option.

I remember discussing the desktop-bookmark spec on the XDG list with Daniel and others; the spec started as a GMarkup format, and then I was convinced to use XBEL. It was a good idea – and once properly standardised, it could lead to data sharing between various environments; a goal that the previous recent-files spec missed badly.

Having GMarkup parse a valid XBEL stream, even with every limitation it has (UTF-8 only, a bit shaky XML:NS support, etc.), has been possible (even thinkable) just because I had, beside my open Gedit window, another window with an XBEL parser written using libxml2, reminding me how a full-fledged XML parser should work.

So: thanks Daniel for libxml2 – your great work has been, and still is, really appreciated and useful. Sadly, there are requirements, in this world, and many times they collide with what we would like.

EggBookmarkFile

hacking: That is, the GMarkup-based XBEL parser that should be used to parse “desktop bookmarks” (recent files, filechooser’s bookmarks, default locations, etc.)

Today I’ve worked hard on the namespace parsing mechanism, and even though I feel like it’s a little too fragile, and it doesn’t cover every conceivable XML namespace declaration, it’s a start. At the moment, it parses the XBEL streams that I pass to it, which were produced with libxml2.

I’ll tighten up the namespace marking routines, in order for it to be XML:NS compliant, and hope that nobody messes with their own bookmarks. :-)
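For the curious, the first step of namespace handling on top of an event-based parser looks roughly like the sketch below. It is plain C, not the EggBookmarkFile code, and both function names are hypothetical. It only recognises xmlns declarations and splits a qualified name into prefix and local name; mapping each prefix to its URI, and scoping declarations per element, is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if attribute_name declares a namespace, returns the
 * declared prefix ("" for the default namespace); otherwise returns NULL. */
static const char *
namespace_declaration_prefix (const char *attribute_name)
{
  assert (attribute_name != NULL);

  if (strcmp (attribute_name, "xmlns") == 0)
    return "";

  if (strncmp (attribute_name, "xmlns:", 6) == 0 && attribute_name[6] != '\0')
    return attribute_name + 6;

  return NULL;
}

/* Hypothetical helper: splits "prefix:local" at the first colon.
 * Returns a pointer to the local name and stores the prefix length
 * (0 when there is no prefix) in *prefix_len. */
static const char *
split_qualified_name (const char *qname, size_t *prefix_len)
{
  const char *colon;

  assert (qname != NULL && prefix_len != NULL);

  colon = strchr (qname, ':');
  if (colon == NULL)
    {
      *prefix_len = 0;
      return qname;
    }

  *prefix_len = (size_t) (colon - qname);
  return colon + 1;
}
```

With these, an element like mime:mime-type splits into the prefix "mime" and the local name "mime-type"; the prefix can then be looked up among the declarations seen so far.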

life: Tonight, I made bread with Marta. On Sunday there will be a party at her parents’, and she’s going to cater for 25+ people. We had to go shopping for two days – and believe me: it has been very tiresome.

exams: tomorrow, a C exam.

Update 2005-09-16@09:39: the EggBookmarkFile code hit CVS tonight, and so did the code in EggRecentManager using it. Profiling is still on, so it’ll require auto-foo patching, but everything works nicely at the moment. I’ll do more work on the API and more profiling on the widgets, as soon as I can compile sysprof.

Profiling GMarkup

Since I’ve been rewriting the XBEL parser in order to use GMarkup in place of libxml2, I’ve also felt the need to get some profiling going.

I’ve always felt that one of the biggest bottlenecks in the recent-files code was its use of GMarkup for the recent files storage; it appears that I was wrong – at least, as far as GMarkup is concerned.

I’ve added some profiling code in my local development tree (I’ve yet to upload it to libegg, since it’s not finished or polished enough), and here are some results:

P: [ load_from_file:000.000000] - loading from file 'test-file.xbel'
P:  [ load_from_data:000.000000] - begin loading from data
P:   [     parse_data:000.000000] - beginning parse_data
P:  [     parse_data:000.163825] - finished parsing
P: [ load_from_data:000.164151] - finished loading data
P:[ load_from_file:000.167050] - finished loading from file 'test-file.xbel'
782 bookmarks loaded successfully.

(numbers are timestamps in seconds and microseconds, as returned by gettimeofday(3), launching from a hot cache (the cache wasn’t too cold, as later profiling showed) – with the file and libraries in cache the times are pretty much halved). As you can see, test-file.xbel is an XBEL-compliant stream containing 782 bookmarks. I know this is Poor Man’s Profiling(tm), but it’s better than nothing anyway.
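In case anyone wants to play along at home, this kind of profiling needs very little code. The following is a sketch, not the code in my tree: profile_elapsed and profile_format are hypothetical names, and the format string only roughly reproduces the lines shown above (it does not do the nesting indentation). The only real API used is gettimeofday(3).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/time.h>

/* Hypothetical helper: time elapsed since *start, normalised so that
 * 0 <= tv_usec < 1000000. */
static struct timeval
profile_elapsed (const struct timeval *start)
{
  struct timeval now, diff;

  assert (start != NULL);

  gettimeofday (&now, NULL);
  diff.tv_sec = now.tv_sec - start->tv_sec;
  diff.tv_usec = now.tv_usec - start->tv_usec;
  if (diff.tv_usec < 0)
    {
      /* borrow one second */
      diff.tv_sec -= 1;
      diff.tv_usec += 1000000;
    }

  return diff;
}

/* Hypothetical helper: formats one "P: [function:sec.usec] - message"
 * line into buf; returns what snprintf() returns. */
static int
profile_format (char                 *buf,
                size_t                len,
                const char           *function,
                const struct timeval *elapsed,
                const char           *message)
{
  assert (buf != NULL && elapsed != NULL);

  return snprintf (buf, len, "P: [%15s:%03ld.%06ld] - %s",
                   function,
                   (long) elapsed->tv_sec,
                   (long) elapsed->tv_usec,
                   message);
}
```

Call gettimeofday() once when a section begins, then print profile_format() with the result of profile_elapsed() when it ends. Note that gettimeofday() follows the wall clock, so a clock adjustment in the middle of a run would skew the numbers.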

As you can see, loading the file with GMarkup takes up ~0.16 seconds, mostly due to the parsing (obviously); I’ve implemented the loading from a file using a buffered read (4KB at a time, which means that the read cycle does ~400 iterations in order to store the test-file.xbel file); then I feed the buffer to a function that parses everything from a string in memory.
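The buffered read is nothing fancy. A minimal sketch of the idea, using only the C standard library and a hypothetical function name (the real code lives in the parser object and reports errors differently), could look like this:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 4096   /* 4KB at a time, as described above */

/* Hypothetical helper: reads the whole stream in CHUNK_SIZE pieces into a
 * NUL-terminated buffer allocated with malloc(). Returns NULL on error;
 * on success stores the number of bytes read in *length_out. */
static char *
read_stream_chunked (FILE *stream, size_t *length_out)
{
  char chunk[CHUNK_SIZE];
  char *data = NULL;
  size_t length = 0, capacity = 0, n_read;

  assert (stream != NULL && length_out != NULL);

  while ((n_read = fread (chunk, 1, sizeof chunk, stream)) > 0)
    {
      /* grow the buffer, leaving room for the trailing NUL */
      if (length + n_read + 1 > capacity)
        {
          size_t new_capacity = capacity ? capacity * 2 : 2 * CHUNK_SIZE;
          char *tmp;

          while (new_capacity < length + n_read + 1)
            new_capacity *= 2;

          tmp = realloc (data, new_capacity);
          if (tmp == NULL)
            {
              free (data);
              return NULL;
            }
          data = tmp;
          capacity = new_capacity;
        }

      memcpy (data + length, chunk, n_read);
      length += n_read;
    }

  if (ferror (stream))
    {
      free (data);
      return NULL;
    }

  if (data == NULL)   /* empty stream: still return a valid empty string */
    {
      data = malloc (1);
      if (data == NULL)
        return NULL;
    }

  data[length] = '\0';
  *length_out = length;
  return data;
}
```

The returned buffer is then handed, in one go, to the function that parses from a string in memory; the caller frees it with free().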

Next run, I’ll break down the parse_data section into the per-element parsing for the bookmark, mime-type and application elements, which should be the most time-consuming, parsing-wise:

P: [ load_from_file:000.000000] - loading from file 'test-file.xbel'
P:  [ load_from_data:000.000000] - begin loading from data
P:   [     parse_data:000.000000] - beginning parse_data
P:    [ parse_bookmark:000.000000] - begin bookmark element
P:   [ parse_bookmark:000.000486] - end bookmark element
P:    [     parse_mime:000.000000] - begin mime element
P:   [     parse_mime:000.000094] - end mime element
P:    [parse_application:000.000000] - begin application element
P:   [parse_application:000.000124] - end application element
...

As you can see, the bookmark element is the biggest offender here; this is because it has four attributes (href, added, modified and visited), three of which hold a timestamp stored in the ISO8601 format (“YYYYMMDDTHH:MM:SSZ”, where T and Z are fixed characters) – which must be parsed too. The application element holds another four attributes (name, exec, count and timestamp), so it’s the second place where the parser spends most of its time. GMarkup holds the attribute names and values passed to the callbacks in two string arrays, so scanning them is an O(n) operation; it would be better to access them using something like attribute_value = g_markup_parse_context_get_attribute (context, element, "attribute_name"), which would internally use a hash table: this would make the attribute lookup O(1) – but it would also break API; we could provide a flag for activating a new callback signature:

void (*start_element) (GMarkupParseContext  *context,
		       const gchar          *element_name,
		       GError              **error);
void (*end_element)   (GMarkupParseContext  *context,
		       const gchar          *element_name,
		       GError              **error);

...

G_CONST_RETURN gchar *
g_markup_parse_context_get_attribute (GMarkupParseContext  *context,
				      const gchar          *attribute_name,
				      GError              **error);

This would be neat, even though, given the numbers, this could be seen as micro-optimization.
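For reference, this is roughly what the per-attribute work looks like today. The sketch below is plain C rather than the actual parser code, and find_attribute, Timestamp and parse_timestamp are hypothetical names; the two NULL-terminated parallel arrays mirror how GMarkup hands attribute names and values to the start_element callback. The timestamp parser accepts only the fixed layout mentioned above and is a bit lenient, since sscanf() will skip leading whitespace and accept a sign.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Linear scan over two NULL-terminated parallel arrays: O(n) in the
 * number of attributes. Returns the value, or NULL if not found. */
static const char *
find_attribute (const char **names, const char **values, const char *wanted)
{
  size_t i;

  assert (names != NULL && values != NULL && wanted != NULL);

  for (i = 0; names[i] != NULL; i++)
    {
      if (strcmp (names[i], wanted) == 0)
        return values[i];
    }

  return NULL;
}

/* Hypothetical container for the parsed fields. */
typedef struct {
  int year, month, day, hour, minute, second;
} Timestamp;

/* Parses "YYYYMMDDTHH:MM:SSZ"; returns 1 on success, 0 on failure.
 * *out is only written on success. */
static int
parse_timestamp (const char *text, Timestamp *out)
{
  Timestamp t;
  int consumed = 0;

  assert (text != NULL && out != NULL);

  /* %n is only reached if the trailing 'Z' matched */
  if (sscanf (text, "%4d%2d%2dT%2d:%2d:%2dZ%n",
              &t.year, &t.month, &t.day,
              &t.hour, &t.minute, &t.second, &consumed) != 6)
    return 0;

  if (consumed == 0 || text[consumed] != '\0')
    return 0;

  if (t.month < 1 || t.month > 12 || t.day < 1 || t.day > 31 ||
      t.hour < 0 || t.hour > 23 || t.minute < 0 || t.minute > 59 ||
      t.second < 0 || t.second > 60)
    return 0;

  *out = t;
  return 1;
}
```

With four attributes per element the linear scan is cheap, which is consistent with the numbers above: the timestamp parsing, not the lookup, is the more expensive part.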

Consideration: most of the time, as shown by cold-cache runs, is spent in I/O; thus, GMarkup shows virtually no bottleneck, even with parsing a bit more intensive than plain dumb 0 == strcmp (element_name, element) checking.

Surgeon

Today, Marta went to Trento to visit some relatives of hers, so I spent the entire day alone. I’ve studied for a while (I have an OS exam on Monday, with case studies mostly based on Linux, even though the kernel shown is a 2.0 at most), and then I spent the entire afternoon rewriting the recent files storage parser using just GMarkup, in order to drop the libxml2 dependency.

Now, after looking at the resulting 3000+ LOC I really need my eyeballs to be removed and cleaned. I had to put together a state machine in order to validate the incoming data, and in order to remove part of the if (...) { ... } else if (...) { ... } else { ... } madness resulting from having to deal with an event-based parser.
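To give an idea of what I mean by a state machine here, this is a heavily simplified sketch in plain C. It is not the real 3000+ LOC: all the names are hypothetical, and it knows only a handful of XBEL elements. The allowed parent/child relations sit in a table, and a small stack remembers where to go back to when an element ends.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
  STATE_START,
  STATE_XBEL,
  STATE_BOOKMARK,
  STATE_TITLE,
  STATE_INFO,
  STATE_METADATA
} ParserState;

typedef struct {
  ParserState  from;      /* state we must be in... */
  const char  *element;   /* ...when this element starts... */
  ParserState  to;        /* ...to end up in this state */
} Transition;

/* Simplified subset of the XBEL structure, for illustration only. */
static const Transition transitions[] = {
  { STATE_START,    "xbel",     STATE_XBEL     },
  { STATE_XBEL,     "bookmark", STATE_BOOKMARK },
  { STATE_BOOKMARK, "title",    STATE_TITLE    },
  { STATE_BOOKMARK, "info",     STATE_INFO     },
  { STATE_INFO,     "metadata", STATE_METADATA }
};

#define N_TRANSITIONS (sizeof transitions / sizeof transitions[0])
#define MAX_DEPTH     16

typedef struct {
  ParserState stack[MAX_DEPTH];   /* stack[depth - 1] is the current state */
  size_t      depth;
} Parser;

static void
parser_init (Parser *parser)
{
  assert (parser != NULL);

  parser->stack[0] = STATE_START;
  parser->depth = 1;
}

/* To be called from the start-element callback: returns 1 if the element
 * is allowed here, 0 otherwise (the state is left untouched). */
static int
parser_start_element (Parser *parser, const char *element)
{
  ParserState current = parser->stack[parser->depth - 1];
  size_t i;

  if (parser->depth == MAX_DEPTH)
    return 0;

  for (i = 0; i < N_TRANSITIONS; i++)
    {
      if (transitions[i].from == current &&
          strcmp (transitions[i].element, element) == 0)
        {
          parser->stack[parser->depth++] = transitions[i].to;
          return 1;
        }
    }

  return 0;
}

/* To be called from the end-element callback: pops back to the parent
 * state; returns 0 if there is nothing left to close. */
static int
parser_end_element (Parser *parser)
{
  if (parser->depth <= 1)
    return 0;

  parser->depth--;
  return 1;
}
```

Each callback then becomes a table lookup plus a switch on the current state, instead of a chain of string comparisons that has to guess where in the document it is.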

Anyway, seems that it worked out pretty well.

xavier

Originally uploaded by ebassi.

A couple of weeks ago, Marta and I had some spare time on our hands, and since I had “retired” my old server (wolverine) before leaving for Amsterdam, I needed a new Linux box doing all the dirty stuff for my home network.

I also had a personal project of doing a case mod, so I decided that this was the perfect occasion to try and do something neat and geeky.

I retrieved two boxes full of various bricks of Lego®, an old motherboard with a P2@233MHz and 128 MB of RAM, the old 15GB hdd from wolverine[1] and the ISO image of the first Debian Sarge disk.

After a full afternoon for building the case, and another day for configuring the whole thing, here’s xavier, the new server for my home LAN.

[1] which got upgraded to an 80GB disk and declared a “workstation” when I switched from a poorly updated Debian Unstable to a brand new Ubuntu Hoary Hedgehog.

GoogleTalk

Seems that this is the new hotness.

I’ve had a GMail account for some time now, and GMail really changed the way I use email every day – at least until Google added POP/SMTP support; then I went back to using a MUA (at least when I can use either my notebook or my workstation). Now, with GoogleTalk supporting Jabber, I registered a new account on Gaim, and here it is:

ebassi@gmail.com

(which, incidentally, is also my main email address for all the development-related issues)

RecentChooserWidget/1

The EggRecentChooserWidget code has been in CVS for a few minutes, complete with a context menu allowing the removal of items from the list, the ability to add a custom RecentManager instance instead of using its own (still requires some polishing, but the interesting bits are there), and filtering based on a custom function. Also, I’ve fixed a nasty bug in the parser library that left garbage at the end of the storage file when removing items.

The next step is to allow the creation of RecentManager objects (maybe using runtime loadable modules) so that Gnome could use a GnomeVFS enabled resource monitoring in order to track location/file changes. This would require the implementation of RecentManager as a GInterface, as a plausible approach.

Anyway, now I’ll work on an applet/technology demonstrator, in order to have more to show to the Gtk people.

RecentChooserWidget

I’ve pretty much finished my EggRecentChooserWidget widget, which (I noticed) is roughly modelled on the Document History window of the Gimp:

RecentChooserWidget

Also, as you can see, I’ve enabled the filtering code, so that you can now filter the contents of the recently used resources list on the fly:

RecentChooserWidget
Display all recently used resources.
RecentChooserWidget
Display today’s recently used resources.

The EggRecentChooserWidget will also have a dialog window (EggRecentChooserDialog) and a button widget (EggRecentChooserButton), much in the same spirit as the GtkFileChooser widget set. In the source tree I’ve put a simple test program which basically recreates the same Open Recent menu of the Gimp (except for the little preview, which is replaced by the icon bound to the file’s MIME type).

There are some rough edges I’d like to address before committing the code, but I plan to have the widget and the dialog inside CVS by this week.

In related news, I’ve begun porting the panel to the RecentChooser and RecentManager code; other than replacing the Open Recent menu widget with the EggRecentChooserMenu one (complete with a menu item launching the EggRecentChooserDialog dialog in order to display all the recently used resources), I plan to add a list of recently used applications inside the Run Application dialog; using this list as a default, we could hide the currently used entry.

Wild life

You can’t claim to have done a good deed unless you’ve saved a week-old cat from death.