I’ve recently become interested in codecs. We all know the brilliant FFmpeg project, which has provided free (as per the FSF definition) implementations of a variety of popular codecs, such as Windows Media or MPEG/H.26x.
I’ve started to study one member of the Windows Media Audio family, the WMAVoice codec. I’m studying an integer version of it, which means that multiplications are done by mult/shift pairs. For example, if you imagine a (32-bit) integer with 1 sign bit (+/-), 1 bit for the integer part (the digit before the binary point) and the other 30 bits being fractional, then the number 1.0 would be encoded as 0x3FFFFFFF. To multiply 1.0 * 1.0 (which should give 1.0 as result) in this integer scheme, you’d use:
#define mulFrac30(a, b) (((int64_t) (a) * (int64_t) (b)) >> 30)
By changing the “30” in this macro, you can multiply numbers with different numbers of fractional bits. Those trying it themselves will notice that mulFrac30(0x3FFFFFFF, 0x3FFFFFFF) results in 0x3FFFFFFE (i.e. 0.999999998), so the result is not precise. But it’s close enough to be useful on embedded devices where floating-point operations are expensive. (It’s true that I’m basically demonstrating (n * n) / ((n + 1) * (n + 1)) here, i.e. you can get around it for this particular example by encoding 1.0 as 0x40000000 instead of 0x3FFFFFFF, but similar behaviour would show up for other numbers.)
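To see it in action, here’s a tiny standalone program (the scaffolding around the macro is mine; the values are the ones discussed above):

#include <stdint.h>
#include <stdio.h>

#define mulFrac30(a, b) (((int64_t) (a) * (int64_t) (b)) >> 30)

int main(void)
{
    /* "almost 1.0" squared drops the low bit: prints 0x3FFFFFFE */
    printf("0x%08X\n", (unsigned) mulFrac30(0x3FFFFFFF, 0x3FFFFFFF));
    /* encoding 1.0 as 0x40000000 instead gives an exact result */
    printf("0x%08X\n", (unsigned) mulFrac30(0x40000000, 0x40000000));
    return 0;
}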
This “rounding error” becomes more prominent as you decrease the number of fractional bits. For example, 1.0 * 1.0 in frac-16 integers (where 1.0 is 0xFFFF) = 0xFFFE, which is 0.99997. It also gets worse as you convert between fractional types; e.g. going from a frac-16 1.0 (0xFFFF) to a frac-30 1.0 would be done by:
#define convertFrac16ToFrac30(x) ((x) << 14)
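For example (a quick standalone check; the scaffolding is mine):

#include <stdio.h>

#define convertFrac16ToFrac30(x) ((x) << 14)

int main(void)
{
    /* frac-16 1.0 converted up: prints 0x3FFFC000, not a full 0x40000000 */
    printf("0x%08X\n", (unsigned) convertFrac16ToFrac30(0xFFFF));
    return 0;
}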
So 1.0 in frac-16 (0xFFFF) comes out as 0x3FFFC000, slightly short of a full frac-30 1.0. Why is that a problem? Well, look at this particular piece of code present in the codec:
void function(int array1[], int v, int size)
{
    int stp = mulFrac31(v, 0xCCCCCC   /* 0.00625 */),
        min = mulFrac31(v, 0x189374   /* 0.00075 */),
        max = mulFrac31(v, 0x3FE76C8B /* 0.49925 */), n;
    int array2[size], array3[size];

    /* convert to the scale given by v */
    for (n = 0; n < size; n++)
        array2[n] = array3[n] = mulFrac31(array1[n], v);

    /* clamp: minimum first entry, minimum step between entries,
     * maximum last entry */
    array3[0] = MAX(array3[0], min);
    for (n = 1; n < size; n++)
        array3[n] = MAX(array3[n - 1] + stp, array3[n]);
    array3[size - 1] = MIN(array3[size - 1], max);

    /* convert changed entries back to the original scale */
    for (n = 0; n < size; n++)
        if (array2[n] != array3[n])
            array1[n] = divFrac31(array3[n], v);
}
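(The snippet uses mulFrac31 and divFrac31, which I haven’t shown. By analogy with mulFrac30 above, they would presumably look something like the following, though the actual codec’s rounding may differ in the low bits:)

#define mulFrac31(a, b) (((int64_t) (a) * (int64_t) (b)) >> 31)
#define divFrac31(a, b) ((int) (((int64_t) (a) << 31) / (b)))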
This loosely translates to the simpler float version:
void function(float array1[], int v, int size)
{
    array1[0] = MAX(array1[0], 0.00075);
    for (int n = 1; n < size; n++)
        array1[n] = MAX(array1[n - 1] + 0.00625, array1[n]);
    array1[size - 1] = MIN(array1[size - 1], 0.49925);
}
But there's a catch. The original code converts from the scale given by the input array1 to a scale given by the magnitude of the "v" variable, whatever "v" may be. It turns out that "v" is the sample rate. For this type of codec, the sample rate is generally low, e.g. 8000 Hz. Therefore, if a value in array1 is e.g. 0.25 (0x1FFFFFFF), then in array2/3 this same value would be mulFrac31(0x1FFFFFFF, 8000) = 1999. Converting this back would bring it to divFrac31(1999, 8000) = 0x1FFBE76E. We just chopped off 19 bits of our number! Needless to say, the quality of the audio is affected by this kind of behaviour (stddev is the root-mean-square difference between samples decoded with and without this behaviour):
stddev: 0.15 PSNR:112.41 bytes: 264014/ 264014
stddev: 0.56 PSNR:101.29 bytes: 264014/ 264014
stddev: 13.73 PSNR: 73.57 bytes: 527906/ 527906
stddev: 2.35 PSNR: 88.87 bytes: 2538240/ 2538240
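For the curious, here’s a minimal standalone sketch of that round trip, using my presumed mulFrac31/divFrac31 from above (the exact low bits may differ from the real codec’s rounding):

#include <stdint.h>
#include <stdio.h>

#define mulFrac31(a, b) (((int64_t) (a) * (int64_t) (b)) >> 31)
#define divFrac31(a, b) ((int) (((int64_t) (a) << 31) / (b)))

int main(void)
{
    int v = 8000;                     /* sample rate */
    int x = 0x1FFFFFFF;               /* 0.25 in frac-31 */
    int scaled = mulFrac31(x, v);     /* 1999 */
    int back = divFrac31(scaled, v);  /* ~0x1FFBE76C: the low bits are gone */
    printf("0x%08X -> %d -> 0x%08X\n", (unsigned) x, scaled, (unsigned) back);
    return 0;
}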
Which makes me wonder: why do they do that?