Yesterday, I was attending the GTK developers meeting and I was asked by Federico what performances the BookmarkFile parser yields. I replied:
ebassi: f_lunch, I/O plays a big role. a file with 3000 bookmarks gets parsed in 2 seconds, more or less (cold cache). ebassi: f_lunch, which, I think, is as good as it gets ebassi: (3000 bookmarks == ~1.5 MB on disk) f_lunch: ebassi: any estimates for a 120 KB file? my ~/.recently-used is 120 KB right now f_lunch: that would be 0.2 sec f_lunch: too slow :)
I did some more tests, with my profiling code turned off, just in order to measure raw timings (using /usr/bin/time
), and it seems that I was wrong (and with this being my code and my brain-child I’m somewhat concerned on the state of my memory). The numbers returned by /usr/bin/time
are 270% better (with cold cache):
$ /usr/bin/time ./testbookmarkfile 3360 bookmarks loaded successfully. 0.74user 0.01system 0:00.83elapsed 91%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+1297minor)pagefaults 0swaps
And, in a warm cache case it yields timings that are 500% better than what I presumed:
$ /usr/bin/time ./testbookmarkfile 3360 bookmarks loaded successfully. 0.35user 0.01system 0:00.43elapsed 86%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+1298minor)pagefaults 0swaps
The file-size is:
$ ls -lk test-file.xbel | awk '{ print $5; }' 1565
Which is ten times the size of Federico’s ~/.recently-used file. And this, I think, is as good as it gets. Further optimizations would be moving the namespace resolution inside GMarkup, and also the constant time accessing to an element’s attributes. These optimizations would remove the O(n) (where n := number of attributes) scans of the attributes list that the GBookmarkFile parser does when checking the attributes payload and when building the namespaces resolution hashtable.
Further optimization could target the I/O which, as I said on #gtk-devel, plays a big role for reading files bigger than the size of a page of VM; I’d like to try using mmap() when loading files, and check what timings this yields. Also, using mmap() and g_file_set_contents() would remove the need of file locking when reading/writing; right now, I’m using lockf()
, which means that I must use O_RD | O_WR
when I want to lock a file when reading it – this supposedly generates spurious write events in file monitoring systems like FAM.
[development]
[gtk+gnome]
[xbel]
[hacking]