Disclaimer: Dear Phoronix, this post is totally for you.
First, a graph
This graph is a comparison of relative performance of the xlib backend on various computers with an Intel GPU. What it shows is the relative performance of using the X server compared to just using the computer’s CPU for rendering the real-world benchmarks we Cairo people compiled. The red bar in the center is the time the CPU renderer takes. If the bar goes to the left, using the X server is slower, if it goes to the right, using the X server is faster. Colors represent the different computers the benchmark ran on (see the legend on the left side), and the numbers tell you how much slower or faster the test ran. So the first number “-3.5” means that the test ran 3.5x slower using the X server than it ran just using the CPU. And the graph shows very impressively that the GPU is somewhere between 13 times slower and 5 times faster than the CPU depending on the task you do. And even for the exact same task, it can be between 7 times slower and 2 times faster depending on the computer it’s running on. So should you use the GPU for what you do?
Well, what’s actually benchmarked here?
The benchmarks are so-called traces recorded using
cairo-trace APPLICATION. This tool will be available with Cairo 1.10. It works a bit like
strace in that it intercepts all cairo calls of your application. It then records them to a file and that file can later be played back for benchmarking rendering performance. We’ve so far taken various test cases we considered important and added them to our test-suite. They contain everything from watching a Video on Youtube (swfdec-youtube) over running Mozilla’s Talos testsuite to scrolling in gnome-terminal. The cairo-traces README explains them all. The README also explains how to contribute traces. And we’d love to have more useful traces.
Yes, OpenGL is apparently the solution to all problems with rendering performance. Which is why I worked hard in recent weeks to improve Cairo’s GL backend. The same graph as above, just using GL instead of Xlib looks like this:
What do we learn? Yes, it’s equally confusing, from 30 times slower to 3 times faster. And we have twice as fast and 1.5x slower on the same test depending on hardware. So it’s the same mess all over again. We cannot even say if using the CPU is better or worse than using the GPU on the same machine. Take the EeePC results for the firefox-planet-gnome test: Xlib 2.5x faster, GL is 4.5x slower. Yay.
But that’s only Intel graphics
Yeah, we’re all working on Intel chips in Cairo development. Which might be related to the fact that we all either work for Intel or get laptops with Intel GPUs. Fortunately, Chris had a Netbook with an nvidia chip where he ran the tests on – once with the nvidia binary driver and once with the Open Source nouveau driver. Here’s the Xlib results:
Right, that looks as unconclusive as before. Maybe a tad worse than on Intel machines. And nouveau seems to be faster than the binary nvidia driver. Good work! But surely for GL nvidia GPUs will rule the world, in particular with the nvidia binary driver. Let’s have a look:
Impressive, isn’t it? GL on nvidia is slow. There’s not a single test where using the CPU would be noticably slower. And this is on a Netbook, where CPUs are slow. And the common knowledge that the binary drivers are better than nouveau aren’t true either: nouveau is significantly better. But then, we had a good laugh because we convinced ourselves that Intel GPUs seem to be a far better choice than nVidia GPUs currently.
That’s not enough data!
Now here’s the problem: To have a really really good laugh, we need to be sure that Intel is better than nVidia and ATI. And we don’t have enough data points to prove that conclusively yet. So if you have a bunch of spare CPU cycles and are not scared of building development code, you could run these benchmarks on your machine and send them to me. That way, you help the Cairo team getting a better overview of where we stand and a good idea of what areas need improvement the most. Here’s what you need to do:
- Make sure your distro is up to date. Recent releases – like Fedora 13 or Ubuntu Lucid – are necessary. If you run development trees of everything, that’s even better.
- Start downloading http://people.freedesktop.org/~company/stuff/cairo-traces.tar.bz2 – it’s a 103MB file that contains the benchmark traces from the cairo-traces repository.
- Install the development backages required to build Cairo. There’s commands like
apt-get build-dep cairoor
yum-builddep cairothat do that.
- Install development packages for OpenGL.
- Check out current Cairo from git using
git clone git://anongit.freedesktop.org/cairo
./autogen.sh --enable-glin the checked out cairo dir. In the summary output at the end, ensure that the Xlib and the GL backends will be built.
- change into the perf/ directory.
- Unpack the cairo-traces.tar.bz2 that hopefully finished downloading by now using
tar xjf cairo-traces.tar.bz2. This will create a benchmark directory containing the traces.
CAIRO_TEST_TARGET=image,xlib,gl ./cairo-perf-trace -r -v -i3 benchmark/* > `git describe`-`date '+%Y%m%d'`.`hostname`.perf. Ideally, you don’t want to use the computer while the benchmark runs. It will output statistics to the terminal while it runs and generate a .perf file, named something like “1.9.8-56-g1373675-20100624.lvs.perf”.
- Attach the file into an email and send it to email@example.com, preferrably with information on your CPU, GPU and which distribution and drivers you were running the test on.
Note: The test runs will take a while. Depending on your drivers and CPU, it’ll take between 15 minutes and a day – it takes ~half an hour on my 6months old X200 laptop.
What’s going to happen with all the benchmarks?
Depending on the feedback I get for this, I intend to either weep into a pillow because noone took the time to run the benchmarks, write another blog post with pretty graphics or even do a presentation about it at Linux conferences. Because most of all I intend to make us developers learn from it so that we can improve drivers and libraries so that the tests are 10x faster when you upgrade your distro the next time.
Note: I originally read the nvidia graph wrong and assumed the binary drivers were faster for OpenGL. I double-checked my data: This is not true. nouveau is faster for the Cairo workloads than the nvidia driver.