Formatting Numbers

I have spent a few evenings working on Gnumeric‘s number formatting,
i.e., the process that takes a value (3.14, “xyz”, TRUE, …) and
a format (an object initialised from a string like “[red]0.00”)
and use them to produce the string displayed in a spreadsheet cell.

Format strings are, if the user gets near them, an unmitigated GUI
disaster. How about this beau?

  dd-mmmm-yyyy[$-40b]/dd-mmmm-yyyy[Whitestone"76]*;;0/128[Blue]

(Which means typeset a non-negative number, representing a date, twice, once with month in the current langugage and once in Finnish. If there is room leftover in the cell, fill on the right side with semicolons. Oh,
and make it all white. Negative numbers, however, should be written in blue as the nearest 128th, without the minus. Non-numbers should be left as-is.)

Excel actually exposes hexadecimal numbers there! And
the parsing rules are really complicated and very much undocumented.
Well, it is documented in a variety of places, but the documentation is always combinations of wrong and incomplete.
I doubt anyone currently at Microsoft knows the details at this point
in time, but they can at least look at the source code. And format
strings can be translated (back and forth) in undocumented ways too.
Ick.

Anyway, I have been compiling a test workbook for formats. It uses the TEXT function which conveniently exposes most of the formatting logic. (Note: you must run in the US locale as many tests depend on that.)
Think of the file as a collection of horrors.
With my (unpublished) code, the score is:

Gnumeric: Pass: 606; Fail: 0
Excel: Pass 594; Fail: 12
OOo: Pass: 221; Fail: 69788

It is important to understand somethings here:

  • Excel can be wrong even though it is nominally defining the semantics. Most of the failures are avoidable overflows in fraction formats.
  • The workbook was not written to make Gnumeric look good. It was written as a tool to help Gnumeric become good. And, in fact, if you loaded the file in older Gnumerics, you would see less than stellar results. Prior to version 1.7.7, Gnumeric would even read memory beyond the end of strings and thus possibly crash or, more likely, produce bogus results.
  • The workbook was not written to make OO look bad. The fact that Gnumeric appears better is not only that I fixed Gnumeric, but also that I can only test the things I can think of. There might very well be formats that OO handles and Gnumeric does not. That is the problem with a basically undocumented language. Further, one problem might very well result in five or ten tests failing — things are not independent.
  • The weird failure count for OO comes from array formulas that OO cannot handle. 🙂 At least one failure comes from incorrectly loading the constant to check against.