Profiling GMarkup – Take 3

Yesterday, I was attending the GTK developers meeting and I was asked by Federico what performances the BookmarkFile parser yields. I replied:

  ebassi:  f_lunch, I/O plays a big role. a file with 3000 bookmarks gets
           parsed in 2 seconds, more or less (cold cache).
  ebassi:  f_lunch, which, I think, is as good as it gets
  ebassi:  (3000 bookmarks == ~1.5 MB on disk)
  f_lunch: ebassi: any estimates for a 120 KB file?  my ~/.recently-used
           is 120 KB right now
  f_lunch: that would be 0.2 sec
  f_lunch: too slow :)

I did some more tests, with my profiling code turned off, just in order to measure raw timings (using /usr/bin/time), and it seems that I was wrong (and with this being my code and my brain-child I’m somewhat concerned on the state of my memory). The numbers returned by /usr/bin/time are 270% better (with cold cache):

  $ /usr/bin/time ./testbookmarkfile
  3360 bookmarks loaded successfully.
  0.74user 0.01system 0:00.83elapsed 91%CPU (0avgtext+0avgdata 0maxresident)k
  0inputs+0outputs (0major+1297minor)pagefaults 0swaps

And, in a warm cache case it yields timings that are 500% better than what I presumed:

  $ /usr/bin/time ./testbookmarkfile
  3360 bookmarks loaded successfully.
  0.35user 0.01system 0:00.43elapsed 86%CPU (0avgtext+0avgdata 0maxresident)k
  0inputs+0outputs (0major+1298minor)pagefaults 0swaps

The file-size is:

  $ ls -lk test-file.xbel | awk '{ print $5; }'

Which is ten times the size of Federico’s ~/.recently-used file. And this, I think, is as good as it gets. Further optimizations would be moving the namespace resolution inside GMarkup, and also the constant time accessing to an element’s attributes. These optimizations would remove the O(n) (where n := number of attributes) scans of the attributes list that the GBookmarkFile parser does when checking the attributes payload and when building the namespaces resolution hashtable.

Further optimization could target the I/O which, as I said on -devel, plays a big role for reading files bigger than the size of a page of VM; I’d like to try using mmap() when loading files, and check what timings this yields. Also, using mmap() and g_file_set_contents() would remove the need of file locking when reading/writing; right now, I’m using lockf(), which means that I must use O_RD | O_WR when I want to lock a file when reading it – this supposedly generates spurious write events in file monitoring systems like FAM.