Thread debugging should be easy; there’s advanced tools like helgrind and chess, so it’s a solved problem, right?
Once upon a time, FFmpeg merged the mt-branch, which allowed frame-level multi-threading. While one CPU core decodes frame 1, the next CPU core will decode frame 2 in parallel (and so on for any other CPU cores you have). This might sound somewhat odd, because don’t most video codecs use motion vectors to access data in previously coded reference frames? Yes, they do, but we can simply add a condition variable so that thread 2 waits for the relevant data in the reference frame (concurrently decoded by thread 1) to have finished reconstructing that data, and all works fine. Although this might seem to destroy the whole point of concurrency, it works well in practice (because motion vectors tend to not cross a whole frame).
Heisenbugs and their tools
Like any other software feature, this feature contained bugs. Threading bugs have the funny name “heisenbugs”: by virtue of the scheduling of instructions on your 2 CPU cores not being identical between different runs, the interaction between 2 threads will not be identical between 2 runs of exact the same commandline. In FFmpeg, we use an elaborate framework knows as FATE to test for video decoder regressions, and we set up some stations to specifically test various multithreading configurations. As you’d expect with heisenbugs, some of these would occasionally fail a test, but otherwise run OK. So how do you debug this?
Let me start with chess. Chess is actually an extension to MSVC, so I actually first had to port FFmpeg to MSVC (which was also useful for Chrome). With that problem out of the way, this should be easy right? Last release 5 years ago, forum dead as of 2011, right… Anyway, what chess attempts to do, is to settle a fixed scheduling path between your different threads, such that they will interact in the same way between multiple runs, thus allowing you to consistently reproduce the same bug for debugging purposes. That’s incredibly helpful, but I never tried it out at the end. I’m looking forward to this appearing in some next version of MSVC.
So, helgrind. FATE actually has a helgrind station, and it sucks, reporting 1000s of potential races for files that have never failed decoding (that is, they are pixel-perfect every single time). Is there a race? Who knows, maybe. But I’m not interested in debugging theoretical races, I want a tool that helps me debug stuff that is happening. Imagine how infuriating asan, valgrind or gdb would be if they told us about stuff that might crash instead of the crash we’re investigating. (Now, post-hoc, it turns out that helgrind did indeed identify one of the bugs causing the heisenbugs in ffmpeg-mt, but it was lost in the noise.)
So now that all our best tools are not all that helpful, what to do? I ended up doing it the brute-force way (In this example, I’m debugging the h264-conformance-cama2_vtc_b FATE test in FFmpeg):
$ make THREADS=2 V=1 fate-h264-conformance-cama2_vtc_b [..] ffmpeg -nostats -threads 2 -thread_type frame -i cama2_vtc_b.avc -f framecrc -
Note that it didn’t fail! So now that we know what commandline it’s executing, let’s change that into something that brute-forces a heisenbug out of its hiding. First, let’s generate a known-good reference:
$ ./ffmpeg -threads 1 -i cama2_vtc_b.avc -f md5 -nostats -v 0 - MD5=ec33975ec4d2fccc55485da3f37a755b
Note that that used only 1 thread, since it serves as our known-good reference. Lastly, let’s see how (and how often) we can make that fail by running it as often as it takes until it fails:
$ cat test.sh i=0 while [ true ]; do MD5=$(./ffmpeg -threads 2 -thread_type frame \ -i cama2_vtc_b.avc -f md5 -nostats -v 0 -) if [ "$MD5" != "MD5=ec33975ec4d2fccc55485da3f37a755b" ]; then echo "$i failed! $MD5" else printf "$i\r" fi ((i++)) done $ bash test.sh 2731 failed! MD5=9cdbf390e5aed1e723c7c3a2def96377 3681 failed! MD5=64a112a2cfc61610a5f75c65293bbbbc 5892 failed! MD5=10224e406d4a2451c60e642a24fc3dce
And we have a reproducible failing testcase! One problem with thread debugging is failures are hard to reproduce, and another is that we may be looking at different failures at the same time (as is demonstrated by the different outputs for the 2 shown failures). However, we’d like to focus on runs that fail in one particular type of way (assuming that the cause for identical-output failures is consistent), thus taking the heisen- out of the bug. We can adjust the script slightly to focus on any one of our choosing (it turned out that all failures for this particular FATE test were caused by the same bug, displaying itself in slightly different ways).
$ cat test2.sh i=0 while [ true ]; do MD5=$(./ffmpeg -threads 2 -thread_type frame \ -i cama2_vtc_b.avc -f md5 -nostats \ -v 0 - -y -f yuv4mpegpipe out.y4m) if [ "$MD5" != "MD5=64a112a2cfc61610a5f75c65293bbbbc" ]; then echo "$i failed! $MD5" break elif [ "$MD5" != "MD5=ec33975ec4d2fccc55485da3f37a755b" ]; then echo "$i failed (the wrong way): $MD5" else printf "$i\r" fi ((i++)) done $ bash test2.sh 2201 failed (the wrong way): MD5=9cdbf390e5aed1e723c7c3a2def96377 9587 failed! MD5=64a112a2cfc61610a5f75c65293bbbbc
And with the heisen-part out of the way, we can now start debugging this as any other bug (printf debugging is easy this way, but you could even get fancy and try to attach to gdb when a particular situation occurs). Below is a comparison of ref.y4m (left, decoded with -threads 1) and out.y4m (right, delta from left with enhanced contrast). The differences are the 3 thin horizontal black/white lines towards the top of the frame. Further research by focussing more narrowly on the decoding process for these specific blocks (using the same technique) led to this fix, and the same technique was also used to fix two other heisenbugs.
How much time do 2731 iterations take? Can this be automated / integrated into FATE?
Excuse me being pedantic, but heisenbugs are actually bugs that do not occur when you try to investigate them. Although threading bugs are often heisenbugs because you modify the parameters of the race or something when you try to debug them, they are not all heisenbugs and not all heisenbugs are threading bugs =).
@pete it depends very much on the length of the individual fate test – this particular one runs at about 5-10 iterations/second on my machine, so 2731 iterations takes a couple of minutes.