I got sysprof up and running tonight along with Federico’s latest patches to Pango. Just looking at the shaper, it’s getting much harder to find things to optimize. Still, we had a great time trying. Sysprof is really fun to use, and it was fun hanging out on gimpnet #performance.
My test case was pango-language-profile.c from Federico’s last blog post, using --lang=es. I spent some time on pangocairo-fcfont.c where pango_font_get_glyph_extents was sitting at 7% or so. Here was the initial output from sysprof:
With gcc-4.0.2 and -Os, the compiler inserted the memcpy() call in order to copy some PangoRectangle structures. I expanded those copies to assign the individual fields instead, and the time basically went to 0. While I suspect this was just a side-effect of -Os, expanding the copy is easy and I don’t think it hurts the readability of the code. Maybe we should consider it.
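To make the change concrete, here is a minimal sketch of the two ways of copying such a rectangle. This is not the actual Pango patch: `Rect`, `copy_struct` and `copy_fields` are made-up names, and `Rect` only mirrors the four integer fields of a PangoRectangle. Whether the whole-struct assignment really turns into a memcpy() call depends on the compiler, its version, and the flags.

```c
/* Hypothetical stand-in for PangoRectangle: four ints. */
typedef struct {
    int x;
    int y;
    int width;
    int height;
} Rect;

/* Whole-struct assignment. The compiler may lower this to a call to
 * memcpy(), which is what appeared to happen with -Os in the post. */
static void copy_struct(Rect *dst, const Rect *src)
{
    *dst = *src;
}

/* Field-by-field copy: four plain integer stores, no library call
 * needed. Same result, just spelled out. */
static void copy_fields(Rect *dst, const Rect *src)
{
    dst->x      = src->x;
    dst->y      = src->y;
    dst->width  = src->width;
    dst->height = src->height;
}
```

Both functions leave `*dst` equal to `*src`; the only difference is what code the compiler is likely to emit for them.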
The more interesting optimization was with the hash table used to cache the glyph extents calculated through Cairo. It starts at a size of 11 and grows by primes, using the glyph index as the hash.
My proposal is to instead use a large fixed-size hash table, and instead of growing by primes, just give it a power-of-two size so the glyph index can be masked straight into a slot. I figure characters in any given sentence will mostly be from the same alphabet, clumped in some area of the Unicode space, so they should rarely collide. I’m also assuming that many languages start on 256-glyph aligned chunks or so, but I figure this is less important. For my patch, I chose 1024 as a reasonable size for the array per font, giving lots of room without being much of a memory hog over the original. Here’s the result:
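For readers who want the idea in code rather than profiler output, here is a rough sketch of a fixed-size, power-of-two glyph extents cache. It is not the patch itself: all of the names (`GlyphCache`, `cache_lookup`, `cache_store`, and so on) are invented for illustration, and simply overwriting a slot on collision is my assumption for the sketch, not something taken from Pango.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_SIZE 1024u               /* power of two, per font */
#define CACHE_MASK (CACHE_SIZE - 1u)

/* Hypothetical extents record: four ints, like a PangoRectangle. */
typedef struct {
    int x, y, width, height;
} Extents;

typedef struct {
    uint32_t glyph;    /* glyph index currently held in this slot */
    int      valid;    /* 0 until the slot has been filled */
    Extents  extents;
} CacheEntry;

typedef struct {
    CacheEntry entries[CACHE_SIZE];
} GlyphCache;

/* The glyph index is the hash; a mask replaces the modulo-by-prime. */
static unsigned cache_slot(uint32_t glyph)
{
    return glyph & CACHE_MASK;
}

/* Returns the cached extents, or NULL on a miss. */
static const Extents *cache_lookup(const GlyphCache *cache, uint32_t glyph)
{
    const CacheEntry *e = &cache->entries[cache_slot(glyph)];
    return (e->valid && e->glyph == glyph) ? &e->extents : NULL;
}

/* Stores extents, evicting whatever glyph shared the slot
 * (assumed policy for this sketch). */
static void cache_store(GlyphCache *cache, uint32_t glyph, const Extents *ext)
{
    CacheEntry *e = &cache->entries[cache_slot(glyph)];
    e->glyph   = glyph;
    e->valid   = 1;
    e->extents = *ext;
}
```

A `GlyphCache` has to start out zeroed so that every slot reads as empty. On a miss the caller would compute the extents through Cairo and call `cache_store`. Two glyphs whose indices differ by a multiple of 1024 share a slot and evict each other, which is the trade-off the clumping assumption is betting on.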