BOLT’ing Libraries

I did a little experimenting with BOLT today to optimize libraries post-link.

I’m not an expert on it or anything, but it seems to allow you to reorder functions in your executable/library based on feedback from perf record and some special post-processing. You can merge multiple runs together in case you have different workloads you’d like to optimize for. But in the end, hot functions get placed near each other to reduce instruction cache pressure.

In all, it says you can expect gains up-to about 7% which fits in line with my experiment. For example, I open gnome-text-editor with a large C file, the overview map enabled, and syntax highlighting on. Then hold down Page Down to the bottom, Page Up to the top, and then Page Down back to the bottom.

The first pass through the source code is usually a little slower because you’re doing the incremental syntax-highlighting process.

After using BOLT on Pango, I saw roughly a 6% reduction in time spent measuring text (which is one of the most expensive parts of the overview map).

To test this out, I did have to play with my CFLAGS to have -Wl,--emit-relocs linker option. After that and a meson setup --wipe $SRCDIR things seem to work as expected.

Trying it For Yourself

sudo dnf install llvm-bolt perf

perf record -e cycles:u -j any,u -o perf.data -- gnome-text-editor

# you can do this for any of the binaries
perf2bolt -p perf.data -o perf.fdata ~/.jhbuild/lib/libpango-1.0.so

llvm-bolt ~/.jhbuild/lib/libpango-1.0.so -o libpango-1.0.so.bolt -data=perf.fdata -reorder-blocks=ext-tsp -reorder-functions=hfsort -split-functions -split-all-cold -split-eh -dyno-stats

mv libpango-1.0.so.bolt ~/.jhbuild/lib/libpango-1.0.so

Rinse and repeat.