Code Quality, Part II

I have been known to complain loudly when I see code that I feel should have been better before seeing the light of day. But what about my own code? Divinely inspired and bug free from day one? Not a chance!

With Gnumeric as the example, here is what we do to keep the bug count down.

  • Testing for past classes of errors. For example, we found errors in Gnumeric’s function help texts, such as referring to arguments that do not exist or not describing all the arguments. The solution was not only to fix the problems we found, but also to write a test that checks all the function help texts for this kind of error. Sure enough, there were several more. They are gone now, and new ones will not creep in. We do not like to make the same mistake twice!
  • Use static code checkers. This means that we keep the warning count from “gcc -Wall” down so we know nothing serious is being ignored. We have looked at clang and Coverity output and fixed the apparent problems. (Those tools have pretty high false report rates, though.) We occasionally use sparse too and have a handful of dumb perl scripts looking for things like GObject destroy/finalize/etc. handlers that fail to chain up to the parent class.
  • Use run-time code checkers. Gnumeric has been run through Valgrind and Purify any number of times. It is part of the test suite, so it happens regularly. This is regrettably getting harder because newer versions of Gtk+ and the libraries upon which it is built hold on to more and more memory with no way of forcing release. Glib has a built-in checker for some memory problems. We use that too.
  • Automated tests of as many parts of the program as we have found time to write. The key word here is “automated”. I used to be somewhat scared of changing the format string (number rendering) code, because there was basically no way of making sure no new errors were introduced in that hairy piece of code. With the extensive test suite, I have no such reservations anymore.
  • Fuzzing, i.e., deliberately throwing garbled input at the program. I wrote tools to do this subtly for xml and files inside a zip archive in such a way that the files are still syntactically correct xml or zip files — otherwise you end up only testing the xml/zip parser which is fine, but not sufficient.
  • Google for Gnumeric. Not everyone will report problems to us, but they might discuss issues with others. Google seems to be pretty good at finding such occurrences.
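
The fuzzing idea in the list above, garbling content while keeping the file syntactically valid, can be sketched roughly as follows. This is not the actual Gnumeric tooling: the function name fuzz_xml_text, the tiny random generator, and the choice to overwrite only letters and digits with digits are all assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Tiny deterministic generator so a failing run can be reproduced from its seed. */
static unsigned next_random(unsigned *state)
{
    *state = *state * 1103515245u + 12345u;
    return (*state >> 16) & 0x7fffu;
}

/*
 * Hypothetical helper: garble the character data of an XML buffer in place.
 * Only letters and digits that follow a '>' and precede the next '<' are
 * touched, entity references such as &amp; are skipped, and replacements
 * are always digits, so tag names and markup characters stay as they were.
 * Roughly one in `rate` eligible bytes is picked for replacement.
 * Returns the number of bytes that actually changed.
 */
size_t fuzz_xml_text(char *xml, unsigned seed, unsigned rate)
{
    size_t changed = 0;
    int in_text = 0;    /* nonzero after a '>' and before the next '<' */
    int in_entity = 0;  /* nonzero between '&' and ';' */

    assert(xml != NULL && rate > 0);
    for (char *p = xml; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;

        if (c == '<') {
            in_text = 0;
            in_entity = 0;
        } else if (c == '>') {
            in_text = 1;
        } else if (!in_text) {
            continue;               /* inside a tag: leave it alone */
        } else if (c == '&') {
            in_entity = 1;
        } else if (c == ';') {
            in_entity = 0;
        } else if (!in_entity && isalnum(c) &&
                   next_random(&seed) % rate == 0) {
            char repl = (char)('0' + next_random(&seed) % 10u);
            if (repl != *p) {
                *p = repl;
                changed++;
            }
        }
    }
    return changed;
}
```

Feeding the result to the importer then exercises the code behind the parser, since the parser itself still sees well-formed input. A real tool would presumably also mutate attribute values and numbers in ways tailored to the file format.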

The take-home message from this is that code quality is work. Lots of work. And yet we still let mistakes through. I blame that on the lack of a proper QA department.

ODF Plus Five Years

Five years ago I strongly criticized the OpenDocument standard for being critically incomplete for spreadsheets since it left out the syntax and semantics of formulas. As a consequence it was unusable as a basis for creating interoperable spreadsheets.

Off the record, several ODF participants agreed. The explanation for the sorry state of the matter was that there was heavy pressure to get the ODF standard out of the door early. The people working on the text document part of the standard were not willing to wait for the spreadsheet part to be completed.

That was then and this is now. Five years have passed and there have been no relevant updates to the standard. However, one thing that has happened is that Microsoft started exporting ODF documents that highlight the problems I pointed out. ODF supporters cried foul when it turned out that those spreadsheets did not work with OpenOffice. In my humble opinion, those same loud ODF supporters should look for the primary culprit at the nearest mirror. You were warned; the problem was obvious for anyone dealing with programming language semantics; you did nothing.

So given the state of the standard, where does that leave ODF support in spreadsheets? Microsoft took the least-work approach and just exported formulas with their own (existing) syntax and semantics. Of course they knew that it would not play well with anyone else, but that was clearly not a priority in Redmond. Anyone else at this point realizes that ODF for spreadsheets is not defined by the standard, but by what part OpenOffice happens to implement. Just like XLS is whatever Excel says it is.

One implication is that ODF changes whenever OpenOffice does. For example, OpenOffice has changed formula syntax at least once, a change that broke Gnumeric’s import. If you follow that link, you can see that OpenOffice did precisely the same thing that Microsoft did: introduce a new formula namespace. Compare the reactions. For the record, in Gnumeric the work involved in supporting those two new namespaces was about the same.

For Gnumeric the situation remains that we will support ODF as any other spreadsheet file format. Until and unless the deficiencies are fixed, ODF is not suitable as the native format for Gnumeric or any other spreadsheet. (There are other known problems with ODF, but those are somewhat technical and not appropriate here.)

Note: I want to make clear that the above relates to spreadsheets only. I know of no serious problems with ODF and text documents, nor do I have reason to believe that there are any.

It’s a Python Bug

I am not perfect. Therefore the code I write is not perfect. Every once in a (rare!) while I have been known to write code that people of evil mind could use. I deserve blame for that. I do not deserve this kind of blame.

What we have here is a Python bug: when embedding Python, we (and half a dozen other applications) use PySys_SetArgv according to spec. Python, in its wisdom, uses that as a clue to start loading python files in unexpected places. The right thing to do would be to fix Python as well as any code that might depend on the bug. That was not done.

Somehow it was chosen instead to file this against Gnumeric and half a dozen other applications. As a design error no less. That is simply offensive! Let us hope no problem is found in libc’s malloc, because dealing with bug reports for all users of that would take some effort.

Why does Python get this kind of reputation protection? They screwed up, so let them take the blame and have them fix the bug. If that breaks other applications, well so be it – the Python people will eloquently explain that to their community.

Applications

In my optics, computers are here to get certain jobs done. That means it is all about applications, not eye candy: bouncing icons, themes, semi-transparent windows. My real-life work desk is not transparent, and I do not use semi-transparent paper.

Producing large applications is a lot of work, so when I write a piece of (hopefully) well-designed code, I want that code to stay written. I do not want next week’s GTK+ deprecation to come along and, effectively, cause my code to bitrot. (And I really do not want to write two different pieces of code for the job: one for “old” GTK+ and one for “new” GTK+.)

Moving from GTK+ 1.x to GTK+ 2.x was painful. I do not need anything like that again. Talk of breaking API every 3-4 years and advice like “Stay up to date, adapt your application code early” (and, by implication, often) are a clear indication that keeping applications running is likely to mean spending much time cleaning up after someone with an attention span of a few years.

Maintaining code like GTK+ is not hard. Calling it hard because you want to play with some new toy is deceptive. Maintaining can be tedious, but if you do not want to maintain, please do not start writing new GTK+ code. You will surely abandon that prematurely too, so you have no business writing library code. Instead, go write a useful application: if you abandon that, I probably do not have to care.

Themes Are Evil, Part II

In a previous post, I showed how a GTK+ theme engine can corrupt memory of any application unfortunate enough to be used with it.

In today’s edition, our guest star is the Qt theme engine. It does not, as far as I know, corrupt your memory or otherwise make your innocent application crash.[*] Instead it changes how your program works. For example, for Gnumeric it changes how imported numbers are handled.

If you import the number “8,5” in a decimal-comma locale then you would hope to get eight-and-a-half, right? Well, with the Qt theme you get eight and we, the Gnumeric team, look incompetent. The problem arises because the Qt theme, quite reasonably, initializes the Qt library. During that, less reasonably, the following code gets executed:

setlocale( LC_ALL, "" ); // use correct char set mapping
setlocale( LC_NUMERIC, "C" ); // make sprintf()/scanf() work

I am not kidding. The Qt library thinks it should change your locale. What on Earth have the Trolls been drinking? Impure home-distilled booze in large quantities?

This problem in various disguises has had us puzzled for quite a while and only very recently was the Qt theme identified as the triggering factor. Once that happened, it was not too hard to locate, but before that we had spent maybe 40 hours looking for this bug. The workaround is to set up a one-shot idle handler that resets the locale properly when the gui comes up. (Repeat this for every GTK+ program that displays or accepts floating-point values.)
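
To see why forcing LC_NUMERIC to “C” turns “8,5” into eight, here is a minimal sketch using only standard C library calls. The helper name parse_in_locale is made up for this illustration and is not Gnumeric code; it switches the numeric locale temporarily, parses with strtod(), and puts the previous locale back.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse `text` with strtod() while LC_NUMERIC is
 * temporarily set to `locale_name`, then restore the previous setting.
 * Returns 0 on success and -1 if the locale could not be selected.
 * Like any use of setlocale(), this is not safe in a threaded program.
 */
int parse_in_locale(const char *locale_name, const char *text, double *out)
{
    const char *current;
    char *saved;

    assert(locale_name != NULL && text != NULL && out != NULL);

    /* The returned string may be overwritten by the next setlocale() call,
       so keep a private copy of the current setting. */
    current = setlocale(LC_NUMERIC, NULL);
    saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return -1;
    strcpy(saved, current);

    if (setlocale(LC_NUMERIC, locale_name) == NULL) {
        free(saved);
        return -1;      /* locale not installed on this system */
    }

    /* strtod() stops at the first character that is not part of a number
       in the active locale; in "C" the comma is such a character. */
    *out = strtod(text, NULL);

    setlocale(LC_NUMERIC, saved);
    free(saved);
    return 0;
}
```

With the “C” locale, parsing “8,5” stops at the comma and yields 8. With a decimal-comma locale (for example de_DE.UTF-8, assuming it is installed) the same call would be expected to yield 8.5, which is the behaviour the Qt initialization silently takes away.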

The Qt theme people never caught this. If they are mostly “theme” people I can understand, but if they are mostly “Qt” people they really should have known. In either case, it is another exhibit for the case that the GTK+ theme model is seriously flawed.

[*] Well, if you use threads it might. The Qt library calls setlocale to change locale and that’s not allowed in a threaded program.

OOXML vs ODF

I had a look at “OOXML is defective by design” and, quite frankly, I am not impressed.

On the surface it is a comparison of OOXML and ODF and it comes out as a landslide victory to ODF. But anyone who has worked with spreadsheet file formats will easily see that it was written by someone who, intentionally or otherwise, is deaf, dumb, and blind to the shortfalls of ODF. And if that is where you start, then what is the point?

For example, OOcalc suffers from exactly the same rounding issue that Excel does. How could it be any different when both are based on floating point numbers? (And in neither case is that a file format issue, but rather an implementation issue.)
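
The rounding issue is a property of binary floating point itself, not of any particular spreadsheet. A minimal sketch, assuming IEEE 754 doubles (which is what practically every current platform uses); the function names are invented for this example:

```c
#include <assert.h>
#include <math.h>

/* Returns nonzero when a + b, rounded to double, compares equal to expected.
   The volatile forces the sum to be stored as a plain double first. */
int sums_exactly(double a, double b, double expected)
{
    volatile double sum = a + b;
    return sum == expected;
}

/* Returns nonzero when a + b lies within `tolerance` of expected. */
int sums_within(double a, double b, double expected, double tolerance)
{
    assert(tolerance >= 0.0);
    return fabs((a + b) - expected) <= tolerance;
}
```

Here sums_exactly(0.1, 0.2, 0.3) is false because none of the three values has an exact binary representation, while sums_exactly(0.5, 0.25, 0.75) is true because all three do. Printed with 17 significant digits, 0.1 + 0.2 comes out as 0.30000000000000004. Any program built on doubles inherits this, whatever file format it reads or writes.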

For example, the reason that he can happily declare that ODF has backwards compatibility is that he chooses a graph sample. And OOcalc’s graphing system has not, shall we say, seen a lot of improvement since the version he tried with.

Don’t complain that “ECMA 376 documents just do not exist” when the same can be said for ODF. As of version 1.1 of the specification there still seems to be no syntax for 2+2.

One could also ask a question such as “how well can legacy spreadsheet files be represented in either format?” A very reasonable question, in my humble opinion, given the number of sheets out there. Of course, since ODF doesn’t actually have non-trivial formulas, we should probably just interpret it with respect to OOcalc’s format. I do not think ODF would fare well here.

Disclaimer: I have not, and I probably will not, read the full OOXML spec.

Formatting Numbers

I have spent a few evenings working on Gnumeric’s number formatting,
i.e., the process that takes a value (3.14, “xyz”, TRUE, …) and
a format (an object initialised from a string like “[red]0.00”)
and uses them to produce the string displayed in a spreadsheet cell.

Format strings are, if the user gets near them, an unmitigated GUI
disaster. How about this beauty?

  dd-mmmm-yyyy[$-40b]/dd-mmmm-yyyy[Whitestone"76]*;;0/128[Blue]

(Which means typeset a non-negative number, representing a date, twice, once with month in the current language and once in Finnish. If there is room left over in the cell, fill on the right side with semicolons. Oh, and make it all white. Negative numbers, however, should be written in blue as the nearest 128th, without the minus. Non-numbers should be left as-is.)

Excel actually exposes hexadecimal numbers there! And
the parsing rules are really complicated and very much undocumented.
Well, they are documented in a variety of places, but the documentation is always some combination of wrong and incomplete.
I doubt anyone currently at Microsoft knows the details at this point
in time, but they can at least look at the source code. And format
strings can be translated (back and forth) in undocumented ways too.
Ick.

Anyway, I have been compiling a test workbook for formats. It uses the TEXT function which conveniently exposes most of the formatting logic. (Note: you must run in the US locale as many tests depend on that.)
Think of the file as a collection of horrors.
With my (unpublished) code, the score is:

Gnumeric: Pass: 606; Fail: 0
Excel: Pass: 594; Fail: 12
OOo: Pass: 221; Fail: 69788

It is important to understand some things here:

  • Excel can be wrong even though it is nominally defining the semantics. Most of the failures are avoidable overflows in fraction formats.
  • The workbook was not written to make Gnumeric look good. It was written as a tool to help Gnumeric become good. And, in fact, if you loaded the file in older Gnumerics, you would see less than stellar results. Prior to version 1.7.7, Gnumeric would even read memory beyond the end of strings and thus possibly crash or, more likely, produce bogus results.
  • The workbook was not written to make OO look bad. The fact that Gnumeric appears better is not only that I fixed Gnumeric, but also that I can only test the things I can think of. There might very well be formats that OO handles and Gnumeric does not. That is the problem with a basically undocumented language. Further, one problem might very well result in five or ten tests failing — things are not independent.
  • The weird failure count for OO comes from array formulas that OO cannot handle. :-) At least one failure comes from incorrectly loading the constant to check against.
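
As an illustration of the fixed-denominator fraction formats mentioned above (the “0/128” section of the sample format) and of how an overflow can be avoided, here is a rough sketch. It is neither Gnumeric’s nor Excel’s implementation; the name format_fraction and the exact rounding behaviour are assumptions. The point is only that keeping the numerator as a double, instead of squeezing it into a fixed-size integer, lets large values be detected rather than silently wrapped.

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>

/*
 * Hypothetical helper: write |value| as "numerator/denom" into buf, with
 * the numerator rounded to a whole number by printf's "%.0f".
 * The numerator stays a double throughout, so nothing wraps around for
 * large inputs. Returns 0 on success, or -1 if the value is not finite
 * or the text does not fit in the buffer.
 */
int format_fraction(double value, unsigned denom, char *buf, size_t size)
{
    double numerator;
    int len;

    assert(buf != NULL && size > 0 && denom > 0);

    numerator = fabs(value) * denom;    /* drop the sign, as the format asks */
    if (!isfinite(numerator))
        return -1;

    len = snprintf(buf, size, "%.0f/%u", numerator, denom);
    if (len < 0 || (size_t)len >= size)
        return -1;                      /* too wide: report it, do not truncate */
    return 0;
}
```

For example, -1.5 with a denominator of 128 renders as 192/128, while something like 1e300 is rejected for a small buffer instead of producing a bogus numerator. What to display in the rejected case (hash marks, scientific notation, and so on) is a separate design decision not covered here.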

10x+ Better Compression Than Gzip

I wanted to create an archive of all released Gnumeric versions.
Gnumeric’s CVS tree saw a lot of hacking on the ,v files so neither
CVS nor the derived SVN tree is useful for reconstructing past
releases. They are useful for tracking a given file’s history
minus the renames it went through.

So I hacked up a script to create a git archive for me. (You cannot actually run that script, though: it hits a “tar” bug — ick! And after hacking that, beware that it takes a long, long time to run.)

Total size of 172 tar files: 1508026377 bytes.
Total size of git archive: 139733921 bytes
Ratio: 10.8

Not too shabby, eh? Even if the corpus is pretty special.

Seeing what changed between releases is as
simple as git diff -u GNUMERIC_1_7_0..GNUMERIC_1_7_1
and very fast.

Testing is not an Option!

I released Gnumeric 1.7.3
only to discover that a little too much editing killed evaluation in
very common situations. Bad me! 1.7.4 is out.

That is not going to happen again.

I sat down and spent a few hours automating most of our tests. Then
I added a valgrind run and the beginning of tests of our importers.
It is part of “make distcheck”, so testing is now mandatory and automatic.

The workhorse of these tests is ssconvert, our handy little command-line utility that converts from one format to another.
By forcing evaluation of all cells between import and export,
we end up exercising quite a large part of the core. As well
as a few importers and exporters. No GUI tests are currently performed, but I suspect we can add that
too somehow.

A Bugfix A Day…

It has been a while since I have poured some water out of my
ears here, but I have been busy. A couple of months ago I decided
to fix a Gnumeric bug per day. And I have by and large kept to that,
and our NEWS has been growing like weeds.

Note that there are huge differences in the amount of work behind
the items listed. “Allow ={+42}” was a trivial one-liner, while
“Introduce top-level expressions” was a massive and
intrusive patch.

But I am running out of little issues to fix on lazy days and I
rarely have any significant amount of time during the week.