23.10.2006 Beast and unit testing

See: http://blogs.testbit.eu/timj/2006/10/23/23102006-beast-and-unit-testing/ (page moved)

There’s been quite some hacking going on in the Beast tree recently. Stefan Westerfeld kindly wrote up a new development summary which is published on the Beast front page.

In particular, we’ve been hacking on the unit tests and tried to get make check invocations to run much faster. To paraphrase Michael C. Feathers from his very interesting book Working Effectively with Legacy Code on unit tests:

Unit tests should run fast – a test taking 1/10th of a second is a slow unit test.

Most of the tests we executed during make check took much longer. Beast has some pretty sophisticated test features nowadays: for instance, it can render BSE files to WAV files offline (in a test harness), extract certain audio features from the WAV files and compare those against saved feature sets. In other places, we use tests that loop through all possible input/output values of a function in a brute-force manner and assert correctness over the full value range. On top of that, we have performance tests that may repeatedly call the same functions (often thousands or millions of times) in order to measure their performance and print out measurements.
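As an aside, a brute-force value-range test of that kind boils down to something like the following sketch; the conversion functions here are hypothetical stand-ins for illustration, not actual Beast code.

	/* Brute-force value-range test sketch; int16_to_float()/float_to_int16()
	 * are hypothetical stand-ins, not actual Beast code. */
	#include <assert.h>
	#include <stdint.h>
	
	static float
	int16_to_float (int16_t v)
	{
	  return v / 32768.0f;
	}
	
	static int16_t
	float_to_int16 (float f)
	{
	  return (int16_t) (f * 32768.0f);
	}
	
	int
	main (void)
	{
	  /* loop over every possible 16 bit input value and assert the round trip is exact */
	  for (int32_t i = -32768; i <= 32767; i++)
	    assert (float_to_int16 (int16_to_float ((int16_t) i)) == i);
	  return 0;
	}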

These kinds of tests are nice to have for broad correctness testing, especially around release time. However, we ran into the problem that make check became less likely to be executed before commits, because running the tests was too slow to bother with. That of course somewhat defeats the purpose of having a test harness. Another problem we ran into was the intermixing of correctness/accuracy tests with performance benchmarks. These often sit in the same test program or even the same function and are hard to spot that way in the full output of a check run.

To solve the outlined problems, we changed the Beast tests as follows:

* All makefiles support the (recursive) rules: check, slowcheck, perf, report (this is easily implemented by including a common makefile).

* Tests added to TESTS are run as part of check (automake standard).

* Tests added to SLOWTESTS are run as part of slowcheck with --test-slow.

* Tests added to PERFTESTS are run as part of perf with --test-perf.

* make report runs all of check, slowcheck and perf and captures the output into a file report.out.

* We use special test initialization functions (e.g. sfi_init_test(argc,argv)) which do argument parsing to handle --test-slow and --test-perf.

* Performance measurements are always reported through treport_maximized(perf_testname,amount,unit) or its treport_minimized() variant, depending on whether the measured quantity should be maximized or minimized. These functions are defined in birnettests.h and print out quantities with a magic prefix that allows grepping for performance results (a short usage sketch follows after this list).

* make distcheck enforces a successful run of make report.
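
Here is a minimal sketch of how a test program can use this scheme. sfi_init_test() and the treport functions are the helpers named above, but their exact signatures, the test_slow/test_perf flag variables and the stand-in implementations below are simplified assumptions made for this illustration, not the real birnettests.h code.

	/* Sketch of a test program using the scheme described above.
	 * The stand-in helpers are simplified assumptions, not the real
	 * birnettests.h / SFI code. */
	#include <assert.h>
	#include <stdbool.h>
	#include <stdio.h>
	#include <string.h>
	#include <time.h>
	
	static bool test_slow = false;   /* assumed flag, set by --test-slow */
	static bool test_perf = false;   /* assumed flag, set by --test-perf */
	
	static void
	sfi_init_test (int argc, char **argv)    /* assumed signature */
	{
	  for (int i = 1; i < argc; i++)
	    {
	      test_slow |= strcmp (argv[i], "--test-slow") == 0;
	      test_perf |= strcmp (argv[i], "--test-perf") == 0;
	    }
	}
	
	static void
	treport_minimized (const char *perf_testname, double amount, const char *unit)
	{
	  /* magic prefix so performance results can be grepped from report.out */
	  printf ("#TBENCH=mini: %25s: %+15.6f %s\n", perf_testname, amount, unit);
	}
	
	static void
	treport_maximized (const char *perf_testname, double amount, const char *unit)
	{
	  printf ("#TBENCH=maxi: %25s: %+15.6f %s\n", perf_testname, amount, unit);
	}
	
	static int
	iabs (int x)          /* hypothetical function under test */
	{
	  return x < 0 ? -x : x;
	}
	
	int
	main (int argc, char **argv)
	{
	  sfi_init_test (argc, argv);
	
	  /* correctness loop: short by default, full range only under make slowcheck */
	  int upper = test_slow ? 1000000 : 1000;
	  for (int i = 0; i < upper; i++)
	    assert (iabs (-i) == i);
	
	  /* performance loop: only measured under make perf */
	  if (test_perf)
	    {
	      const int runs = 10000000;
	      volatile unsigned int sink = 0;
	      clock_t start = clock ();
	      for (int i = 0; i < runs; i++)
	        sink += (unsigned int) iabs (-i);
	      double seconds = (clock () - start) / (double) CLOCKS_PER_SEC;
	      if (seconds <= 0)
	        seconds = 1e-9;   /* guard against clock() resolution */
	      treport_minimized ("Iabs-Call", 1e9 * seconds / runs, "nSeconds");
	      treport_maximized ("Iabs-Calls", runs / seconds / 1e6, "MCalls/Second");
	    }
	  return 0;
	}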

Together, these changes have allowed us to easily tweak our tests, keeping the default test loops fast (if (!test_slow)) and guarding the lengthy performance loops (if (test_perf)). So make check is pleasingly fast now, while make slowcheck still runs all the brute-force and lengthy tests we’ve come up with. Performance results are now just a grep away:

	$ make report
	[...]
	$ grep '^#TBENCH=' report.out
	#TBENCH=mini:         Direct-AutoLocker:      +83.57            nSeconds 
	#TBENCH=mini:         Birnet-AutoLocker:     +104.574           nSeconds 
	#TBENCH=maxi:  CPU Resampling FPU-Up08M:     +260.4562325006    Streams 
	#TBENCH=maxi:  CPU Resampling FPU-Up16M:     +184.19598452754   Streams 
	#TBENCH=maxi:  CPU Resampling SSE-Up08M:     +399.04229848364   Streams 
	#TBENCH=maxi:  CPU Resampling SSE-Up16M:     +338.5240352065    Streams 

The results are tailored to be parsable by performance statistics scripts. So writing scripts that present differences between performance reports and compare them across releases is now on the TODO list. ;-)