Gtk2::SourceView

Finally, after more than a year of testing inside CVS, Gtk2::SourceView made it to the public.

This Perl module is a wrapper around gtksourceview, the C library used primarily by Gedit. I began writing it in January 2004, after working on Gnome2::Print (libgnomeprint(ui) Perl bindings) and Gnome2::GConf (GConf bindings). After the 1.0 release of the C library, I left it to bit-rot inside CVS; this August, Torsten kaffee Schoenfeld did a major overhaul to update the build system with the latest and greatest from Glib/Gtk2, and also wrapped many of the remaining objects and methods.

Yesterday evening, I was asked for a release on -perl, and – while at it – I’ve added the accessor methods for the Gtk2::SourceView::TagStyle object, which were left unbound. I’ll do another round in order to wrap the remaining methods and add some more tests inside HEAD.

This also marks my first release on CPAN, so yay for me

Scary Voice Over

Since last Tuesday, I’ve done little to nothing on the “hack” side; I’ve begun a little project in Perl that should result in a backup manager (it’s called Hudson), but mostly I’ve done some code polishing on the libegg/recentchooser code, following the API review notes sent to me by Matthias Clasen.

During a chat on IRC, he also raised the point of the Windows MRU system, and how to make it work with our recently used resources list.

So, tonight, with a little help from Christopher Lee, I sent an email with a strategy to the gtk-devel mailing list. I hope to get some feedback from the win32 people.

university: lessons have begun. I’ve already missed one class (the bus arrived late, and built up a considerable delay on the way), but I stayed until 16:00 to follow a Network Security class – which began with the teacher asking us to read, and discuss next time, the six dumbest ideas in computer security – which I had already read, because it was linked on Planet GNOME some time ago.

Yarrr!

    T' me,
    Yo, Ho, Yo, Ho,
    It's "Talk Like A Pirate" Day!
    That time in September when sea dogs remember
    That grown-ups still know how ta play!
    When wenches are curvy and dogs are all scurvy
    And a soft-wear patch covers your eye,
    Ta hell with our jobs, for one day we're all swabs
    And buccaneers all till we die!

    So hoist up the mainsils and shut down your brain cells,
    They only would get in the way,
    Avast there, me hearty, we're havin' a party,
    It's "Talk... Like... A Pirate" Day!

Talk Like a Pirate Day ’05

Cleaning up

life: after the “exam craze” (two in five days) that took the best part of the week, Marta and I have resumed cleaning up the house. Since I moved in with her, there has been so much work to do that we haven’t really felt the whole “moving in” part; but I assume it will get us, sooner or later. Next week we’re going to visit my parents, and get some of my books/DVDs/whatever that I left behind.

hacking: after moving the RecentManager to the new BookmarkFile parser/writer object, I resumed working on the actual separation of the recently used resources viewer widgets (implementing the RecentChooser interface) and the list controller (the RecentManager object). Last week I removed every trace of sorting and limit handling code involving the manager object from the widgets, and recoded it directly into the widgets themselves. Thus, it’s now possible to have a single manager instance and multiple viewers bound to it, instead of creating a manager object per widget:

[Screenshot: RecentChooserWidgets – two widgets, one manager]

Sorting, filtering and the list limit property are still present in the RecentManager object; this way you could have custom sorting and filtering functions set on a manager, and then multiple views attached to it.

Separation was a job that the current EggRecentModel/EggRecentView objects didn’t handle well enough; basically, you could feed your own model to an EggRecentViewGtk widget, but every other widget using the same model would display the same data, with the same sorting, filtering and list size. This behaviour doesn’t really match the MVC paradigm; for instance, using the same TreeModel you could set up many TreeView widgets, each of them showing only some part of the data stored in the model. Filtering, sorting and size are handled by the TreeView code, not by the model.

The same should happen with the recently used resources list; also because this could lead to the creation of a default singleton instance of the manager object, in order to reduce memory and locking issues due to the on disk storage.

Requirements

Using advogato (and my log), since I’m not on the Planet (yet).

I’ve just read Daniel’s post, and since shame is a powerful drive, here’s my non-requested answer.

I’m one guy – quite possibly, the only one – that switched from a full-fledged XML parser like libxml2 to GMarkup. Believe you me: I would have rather stayed within the blissfulness of DOM, within the ease of development of a complex and powerful parser, within the safety of one of the best XML parsers around in the F/OSS world.

I would have used libxml2 (and in fact, I did begin using it) – because of the work that DV (and every other developer involved) put into libxml2; I state that again: it’s a wonderful library, and it’s great to have it, and for it I’ll have to buy Daniel enough beer to knock him unconscious until Gnome 2.14. :-)

That said, I switched to GMarkup because libxml2 is also a heavy dependency for Gtk+. It’s a 1M+ library, and a dependency some devices can’t afford to have on the chain – I think specifically of embedded devices.

Supporting a platform standard like the storage for recent files and bookmarks only on desktop boxes, because they can afford to have libxml2 pre-installed, is not an option.

I remember discussing on the XDG list with Daniel and others about the desktop-bookmark spec; the spec started as a GMarkup format, and then I was convinced to use XBEL. It was a good idea – and once properly standardised, it could lead to data sharing between various environments; a goal that the previous recent-files spec missed badly.

Having GMarkup parse a valid XBEL stream, even with every limitation it has (UTF-8 only, somewhat shaky XML:NS support, etc.), has been possible (even thinkable) only because, beside my open Gedit window, I had another window with an XBEL parser written using libxml2, reminding me how a full-fledged XML parser should work.

So: thanks Daniel for libxml2 – your great work has been and it’s still really appreciated and useful. Sadly, there are requirements, in this world, and many times they collide with what we would like.

EggBookmarkFile

hacking: That is, the GMarkup-based XBEL parser that should be used to parse “desktop bookmarks” (recent files, filechooser’s bookmarks, default locations, etc.)

Today I’ve worked hard on the namespace parsing mechanism, and even though I feel like it’s a little too fragile, and it doesn’t cover every conceivable XML namespace declaration, it’s a start. At the moment, it parses the XBEL streams that I pass to it, which were generated with libxml2.

I’ll tighten up the namespace marking routines, in order for it to be XML:NS compliant, and hope that nobody messes up with his own bookmarks. :-)

life: Tonight, I made bread with Marta. On Sunday there will be a party at her parents’, and she’s going to cater for 25+ people. We had to go shopping for two days – and believe me: it has been very tiresome.

exams: tomorrow, a C exam.

Update 2005-09-16@09:39: the EggBookmarkFile code hit CVS tonight, and so did the code in EggRecentManager that uses it. Profiling is still on, so it’ll require auto-foo patching, but everything works nicely at the moment. I’ll do more work on the API and more profiling on the widgets, as soon as I can compile sysprof.

Profiling GMarkup

Since I’ve been re-writing the XBEL parser in order to use GMarkup in place of libxml2, I’ve also felt the need to get some profiling going.

I’ve always felt that one of the biggest bottlenecks in the recent-files was its use of GMarkup for the recent files storage; it appears that I was wrong – at least, as far as GMarkup is concerned.

I’ve added some profiling code in my local development tree (I’ve yet to upload it to libegg, since it’s not finished or polished enough), and here are some results:

P: [ load_from_file:000.000000] - loading from file 'test-file.xbel'
P:  [ load_from_data:000.000000] - begin loading from data
P:   [     parse_data:000.000000] - beginning parse_data
P:  [     parse_data:000.163825] - finished parsing
P: [ load_from_data:000.164151] - finished loading data
P:[ load_from_file:000.167050] - finished loading from file 'test-file.xbel'
782 bookmarks loaded successfully.

(numbers are timestamps in seconds and microseconds, as returned by gettimeofday(3), measured from a not-quite-cold cache: it wasn’t as cold as I assumed, as later profiling showed. With the file and libraries fully in cache the times are pretty much halved.) As you can see, test-file.xbel is an XBEL compliant stream containing 782 bookmarks. I know this is Poor Man’s Profiling(tm), but it’s better than nothing anyway.
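For illustration only, here is a minimal sketch of how timestamps like these could be produced with gettimeofday(3). The helper names profile_elapsed and profile_mark are hypothetical; this is not the profiling code sitting in my tree, just the general idea.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/time.h>

/* Elapsed time between two gettimeofday() samples, in seconds. */
static double
profile_elapsed (const struct timeval *start,
                 const struct timeval *end)
{
  assert (start != NULL && end != NULL);

  return (double) (end->tv_sec - start->tv_sec)
       + (double) (end->tv_usec - start->tv_usec) / 1000000.0;
}

/* Print a "P: [ function:sss.uuuuuu] - message" line on stderr,
 * with the time measured relative to the given start sample. */
static void
profile_mark (const struct timeval *start,
              const char           *function,
              const char           *message)
{
  struct timeval now;

  gettimeofday (&now, NULL);
  fprintf (stderr, "P: [%15s:%010.6f] - %s\n",
           function, profile_elapsed (start, &now), message);
}
```

A caller would take one gettimeofday() sample when entering a function and then call profile_mark() with that sample at each point of interest.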

As you can see, loading the file with GMarkup takes ~0.16 seconds, nearly all of it spent parsing (obviously); I’ve implemented the loading from a file using a buffered read (4KB at a time, which means that the read cycle does ~400 iterations in order to store the test-file.xbel file); then I feed the buffer to a function that parses everything from a string in memory.
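The buffered read can be sketched roughly as follows, using only the C standard library. The function name read_all and the CHUNK_SIZE constant are assumptions made for this example; the actual loader uses GLib types and its own error reporting.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 4096   /* 4KB per read, as in the text above */

/* Read a whole stream into a NUL-terminated heap buffer, CHUNK_SIZE
 * bytes at a time. Returns NULL on failure; the caller frees the
 * result. If length is not NULL, it receives the number of bytes read. */
static char *
read_all (FILE   *stream,
          size_t *length)
{
  char chunk[CHUNK_SIZE];
  char *data = NULL;
  size_t used = 0;
  size_t n;

  assert (stream != NULL);

  while ((n = fread (chunk, 1, sizeof chunk, stream)) > 0)
    {
      /* grow the buffer, leaving room for the trailing NUL */
      char *grown = realloc (data, used + n + 1);

      if (grown == NULL)
        {
          free (data);
          return NULL;
        }

      data = grown;
      memcpy (data + used, chunk, n);
      used += n;
    }

  if (ferror (stream))
    {
      free (data);
      return NULL;
    }

  if (data == NULL)   /* empty stream: return an empty string */
    {
      data = malloc (1);
      if (data == NULL)
        return NULL;
    }

  data[used] = '\0';
  if (length != NULL)
    *length = used;

  return data;
}
```

The returned string and its length are what would then be handed to the in-memory parsing function.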

Next run, I’ll break down the parse_data section into the per-element parsing for the bookmark, mime-type and application elements, which should be the most time-consuming, parsing-wise:

P: [ load_from_file:000.000000] - loading from file 'test-file.xbel'
P:  [ load_from_data:000.000000] - begin loading from data
P:   [     parse_data:000.000000] - beginning parse_data
P:    [ parse_bookmark:000.000000] - begin bookmark element
P:   [ parse_bookmark:000.000486] - end bookmark element
P:    [     parse_mime:000.000000] - begin mime element
P:   [     parse_mime:000.000094] - end mime element
P:    [parse_application:000.000000] - begin application element
P:   [parse_application:000.000124] - end application element
...

As you can see, the bookmark element is the biggest offender here; this is because it has four attributes (href, added, modified and visited), three of which hold a timestamp stored in the ISO8601 format (“YYYYMMDDTHH:MM:SSZ”, where T and Z are fixed characters) – which must be parsed too. The application element holds another four attributes (name, exec, count and timestamp), so it’s the second place where the parser spends most of its time. GMarkup holds the attribute names and values passed to the callbacks in two string arrays, so scanning them is an O(n) operation; it would be better to access them using something like attribute_value = g_markup_parse_context_get_attribute (context, element, "attribute_name"), which would internally use a hash table: this would make the attribute lookup O(1) – but it would also break API; we could provide a flag for activating a new callback signature:

void (*start_element) (GMarkupParseContext  *context,
                       const gchar          *element_name,
                       GError              **error);
void (*end_element)   (GMarkupParseContext  *context,
                       const gchar          *element_name,
                       GError              **error);

...

G_CONST_RETURN gchar *
g_markup_parse_context_get_attribute (GMarkupParseContext  *context,
				      const gchar          *attribute_name,
				      GError              **error);

This would be neat, even though, given the numbers, this could be seen as micro-optimization.
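For comparison, this is roughly what the current O(n) lookup looks like on the callback side: a linear scan over the two parallel, NULL-terminated arrays of attribute names and values. The sketch uses plain C strings instead of gchar so it stands alone, and find_attribute is a hypothetical helper name, not a GLib function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Look up an attribute value by name in two parallel,
 * NULL-terminated arrays; returns NULL if the name is not found.
 * The cost grows linearly with the number of attributes. */
static const char *
find_attribute (const char **names,
                const char **values,
                const char  *wanted)
{
  size_t i;

  assert (names != NULL && values != NULL && wanted != NULL);

  for (i = 0; names[i] != NULL; i++)
    {
      if (strcmp (names[i], wanted) == 0)
        return values[i];
    }

  return NULL;
}
```

With only four attributes per element the scan is short, which is consistent with the numbers above: the lookup itself is unlikely to be where the time goes.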

Consideration: most of the time, as shown by cold-cache runs, is spent in I/O; thus, GMarkup shows virtually no bottleneck, even with parsing a bit more intensive than plain dumb 0 == strcmp (element_name, element) checking.

Surgeon

Today, Marta went to Trento to visit some relatives of hers, so I spent the entire day alone. I’ve studied for a while (I have an OS exam on Monday, with mostly Linux-based case studies, even though the kernel shown is a 2.0 at most), and then I spent the entire afternoon rewriting the recent files storage parser using just GMarkup, in order to drop the libxml2 dependency.

Now, after looking at the resulting 3000+ LOC, I really need my eyeballs to be removed and cleaned. I had to put together a state machine in order to validate the incoming data, and in order to remove part of the if (...) { ... } else if (...) { ... } else { ... } madness resulting from having to deal with an event-based parser.
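To give an idea of the approach, here is a heavily simplified sketch of a table-driven state machine for an event-based parser. It is not the code I wrote: the state names, the next_state helper and the reduced element nesting (xbel, bookmark, info, metadata) are assumptions picked for the example, and end-element handling is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
  STATE_START,
  STATE_XBEL,
  STATE_BOOKMARK,
  STATE_INFO,
  STATE_METADATA,
  STATE_ERROR
} ParserState;

typedef struct {
  ParserState  from;      /* state we must be in...           */
  const char  *element;   /* ...when this element is opened... */
  ParserState  to;        /* ...to move into this state        */
} Transition;

/* Allowed parent -> child nesting, one row per valid transition. */
static const Transition transitions[] = {
  { STATE_START,    "xbel",     STATE_XBEL     },
  { STATE_XBEL,     "bookmark", STATE_BOOKMARK },
  { STATE_BOOKMARK, "info",     STATE_INFO     },
  { STATE_INFO,     "metadata", STATE_METADATA },
};

/* Called from the start-element callback: returns the new state,
 * or STATE_ERROR if the element is not valid in the current state. */
static ParserState
next_state (ParserState  current,
            const char  *element_name)
{
  size_t i;

  assert (element_name != NULL);

  for (i = 0; i < sizeof transitions / sizeof transitions[0]; i++)
    {
      if (transitions[i].from == current &&
          strcmp (transitions[i].element, element_name) == 0)
        return transitions[i].to;
    }

  return STATE_ERROR;
}
```

The point of the table is that validation becomes a single lookup, so the callbacks no longer need a chain of nested if/else branches for every combination of parent and child element.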

Anyway, it seems that it worked out pretty well.