Some recent changes to the compositor made it feel kinda slow and laggy. Not in a noticable way but more in a subconscious something feels wrong way. I tested xfwm and it seemed fine running as a compositor, and given that Metacity’s compositor has the same heritage as xfwm’s and I’ve been checking the xfwm code to see how they do things, I thought that was suggesting that there was something wrong with the Metacity one. I couldn’t see anything abnormal or strange looking at/comparing the codebase, so I turned to valgrind and oprofile. Running callgrind didn’t really show anything about the compositor code though, but as always with callgrind stuff gets lost in amongst the small functions like g_type_check_instance_is_a and the wonderfully mysterious <cycle 8>
My expectations were that some of the compositor functions would be topping the profile, given that the redrawing functions need to run anytime something changes on the desktop (modulo that they’re called from an idle handler so some redraw events are merged together) but what Oprofile said was
samples % symbol name
13908 21.8501 pos_eval_helper
10471 16.4504 pos_tokenize
7621 11.9729 meta_color_spec_render
5408 8.4962 compositor_idle_cb
3785 5.9464 fix_region
1559 2.4493 meta_draw_op_draw_with_env
1511 2.3738 free_tokens
1480 2.3251 meta_parse_position_expression
Eh? What are pos_eval_helper, pos_tokenize and meta_color_spec_render and why are they so high up the profile? Well, it turns out they’re part of the theme parsing code (Metacity has a very powerful, but complex theme format dontcha know?), but some more debugging showed that the theme expressions were being reparsed and re-evaluated every time Metacity had to draw a window frame. The impressive screenshot for this is:
Just look how many times pos_tokenize and pos_eval_helper are called when a draw op is drawn. That is one complicated function call chart.
Looking through the code I saw that draw operations were stored as their string format, like “2 * width + Bmin / height % 7″. These expressions were first tokenised (in pos_tokenize) into constituent parts (‘2′ ‘*’ ‘width’ etc) and they were then evaluated. Thing is that these strings cannot be changed at all so they can be tokenised when loading the theme rather than every time they are evaluated.
meta_color_spec_render is a similar problem. Colours can be defined as a basic colour, a GTK theme colour, a shade of a colour, or from two colours blended together. With shaded and blended colours they were generated every time Metacity had to use that colour, which is an obvious waste of time as the colour specs cannot be changed after they are created and so the colours should just be generated whenever the spec is created and then it becomes a simple function to get the correct colours.
All this is filed as bug #500279 and the patch attached to the bug fixes these two issues.
pos_eval_helper is the function that calculates the theme expressions from their tokenised state and it still gets called every time the theme needs redrawn, which sort of makes sense, as the variables may have changed since the last time, but it looks to me like the variables are all related to the width/height of the window, so it may be that we can make it so that the expressions are only re-evaluated when the width/height changes. I’ll have to look into it.
Anyway, with the patch from bug #500279 applied this is what the profile looks like now…
9355 32.4815 compositor_idle_cb
7129 24.7526 pos_eval_helper
1070 3.7151 meta_parse_size_expression
1020 3.5415 event_callback
604 2.0971 meta_compositor_process_event
597 2.0728 .plt
542 1.8819 win_extents
531 1.8437 pos_eval
519 1.8020 meta_frame_style_draw
495 1.7187 meta_draw_op_draw_with_env
The compositor functions are up where they should be…getting rid of that pos_eval_helper would be nice though, as evinced by this screenshot of the same function as above, but with the patch applied. The function is a lot less busy, which is a good thing, but the three sets of calls to pos_eval_helper could possibly be reduced somehow.
Anyway, the compositor seems faster, and I didn’t even touch that code this time :)