Spreadsheet Function Semantics

Anyone who has spent time with Excel spreadsheets knows that Excel has a number of really strange behaviours. I am having a look at criteria functions.

Criteria functions come in two flavours: DCOUNT/DSUM/etc and COUNTIF/SUMIF/etc. The former lets you treat a range as a kind of database from which you can filter rows based on whatever criteria you have in mind and then compute some aggregate function on the filtered rows. For example, compute the average of the “Age” column for those records where the “State” column is either Maine or Texas. The COUNTIF group is a much older set of functions that more or less the same thing, but restricted to a single column. For example, count all positive entries in a range.

In either case, criteria are in play. 12, “>0”, “<=12.5", "=Oink", and "Foo*bar" are examples. The quotes here denote strings. This is already messed up. A syntax like “>0” is fine because the value is an integer. It is fine for a string too. However, the syntax is really crazy when the value is a floating-point number, a boolean or a date because now you just introduced a locale dependency for no good reason — mail the spreadsheet to Germany and get different results. Bravo. And for floating-point there is the question of whether precision was lost in turning the number into a string and back.

Excel being Excel there are, of course, special cases. “=” does not mean to look for empty strings. Instead it means to look for blank cells. And strings that can be parsed as numbers, dates, booleans, or whatever are equivalent to searching for such values. These are all just examples of run-of-the-mill Excel weirdness.

The thing that really makes me suspect that Excel designers were under the influence of potent psycho-active substances is that, for no good reason, pattern matching criteria like “foo*bar” mean something different for the two flavours of functions. For the “D” functions it means /^foo.*bar/ in grep terms, whereas for the “if” functions it means /^foo.*bar$/. Was that really necessary?

The thing is that there really is no good alternative to implementing the weird behaviour in any spreadsheet program that has similarly named functions. People have come to rely of the details and changing the semantics just means 3 or 4 sets of arbitrary rules instead of 2. That is not progress.

I noticed this while writing tests for Gnumeric. We now pass those tests, although I suspect there are more problems waiting there as I extend the test file. I do not know if LibreOffice has the intent of matching Excel with respect to these functions but, for the record, it does not. In fact, it fails in a handful of different ways: anchoring for “D” functions, strictness for DCOUNT, wildcards in general, and the array formula used in my sheet to count failures. (As well as anything having to do with booleans which localc does not support.)

Change

We learn from Matthias that the right way to describe what happened with recent Gtk+ releases is that it changed.

And provided you are thinking of source code, that is not an unreasonable nomenclature: before it worked one way, now it works a different way — it changed. And source code that has to interact with Gtk+ used to do it one way, but now needs to do it another way — it needs to change.

But what if you are thinking of binaries? That is, existing, already-distributed binaries sitting on users’ machines. With the installation of the new Gtk+, such binaries changed from working to non-working. Such a binary evidently needs to change itself. Now, I have been known to prefer to make changes by editing binaries directly (interestingly, arguably thereby turning the binary into source code in the eyes of the GPL) but it is generally not a convenient way of making changes and as a Gnumeric developer I do not expect my users to do this. So how are the binaries on users’ machines going to change from non-working to working? I have no means of reaching users. I can and I will release changed source code, but binaries from that will not reach users anytime soon. Change is not a reasonable description for this; break is. Gtk+ broke Gnumeric. Again. And note, that some of the changes appear to be completely gratuitous.

Emmanuele is rather adamant that these changes were happening to API that was pre-announced to be unstable. I think he is mistaken in the sense that while it might have been decided that this API was unstable, I do not think it was announced. At least I do not seem to be able to find it. Despite prodding, Emmanuele does not seem to be able to come up with a URL for such an announcement, and certainly not an announcement in a location directed at Gtk+ application writers. It may exist, but if it does then it is not easy to find. I looked in the obvious places: The API documentation was not changed to state that the API was subject to change. The release announcements were not changed to state that the API was subject to change. The application development mailing list was not changed by sending a message warning that the API was subject to change. Sitting around a table and agreeing on something is not an announcement. If you want to announce something to application developers then you need to use a channel or channels aimed at application developers.

The situation seems to lend itself to Douglas Adams quotes. I have already used the destruction-of-Earth situation, so here is the earlier one involving the destruction of Arthur Dent’s house:

“But the plans were on display...”
“On display? I eventually had to go down to the cellar to find them.”
“That’s the display department.”
“With a flashlight.”
“Ah, well the lights had probably gone.”
“So had the stairs.”
“But look, you found the notice didn’t you?”
“Yes,” said Arthur, “yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard’.”

Another GTK+ ABI Break

It is a familiar situation: a distribution updates Gtk+ to a supposedly-compatible version and applications, here Gnumeric, break.

This time I am guessing that it is incompatible changes to widget theming that renders Gnumeric impossible to use.

I would estimate that this has happened 15-20 times within the GTK+ 3.x series. One or two of those are probably Gnumeric accidentally relying on a GTK+ bug that got fixed, but the vast majority of cases is simply that perfectly fine, existing code stops working.

Imagine the C library changing the behaviour of a handful of functions every release. I suspect GTK+ maintainers would be somewhat upset over that. Nevertheless, that is what is presented to GTK+ application writers.

The question of whether GTK+ applications can be written remains open with a somewhat negative outlook.

Floating-Point Accuracy For Scaling Numbers

Formulas used in computation of special function often contain constant factors that cannot be precisely represented in floating point formats such as the typical “double”. For example, the base-10 log function:

double log10 (double x)
{
  return log(x)/log(10);
}

Chances are that you should be using the library version of this function; I am just using this simple function for illustration.

Several things affect how accurate result we will get out of this formula. Most importantly, we need an accurate log function. That is provided by the C library (in glibc, at least). Here, however, I want to talk about the last part, namely the scaling by log(10).

Clearly log(10) is irrational, so it will not have an exact representation as a double. In fact, this constant is just about a worst case for double. Its value is, 2*1.00100…10101|1000001… (where “|” marks the 53-bit cutoff for double), i.e., the value is just a hair above the midpoint between the two nearest representable numbers. If we use that value, we must live with it having a relative error of about 9.4e-17.

But we’re scaling with that number and there are two ways of doing that: dividing by the value directly or multiplying by its inverse 1/log(10). The latter value, when computed with higher accuracy that double allows, has the value (1/4)*1.10111…01110|0011001… which gives us a relative representation error of 2.5e-17, i.e., only about a quarter of the error in the direct case.

double log10 (double x)
{
  static double l10i = 0.4342944819032518276511289;
  return log(x) * l10i;
}

In practice this gives log10 several extra correct bits. I noticed this when using test values for Gnu Scientific Library’s complex log10 function for testing Gnumeric’s ditto.

I cannot possibly be the first to look into this, but I don’t recall ever having read about it. I have done the analysis for a set of selected constants and the results are:

For these constants, the direct value is best: pi, EulerGamma, log10(2).

For these constants, the inverse value is best: e, log(2), log(10), sqrt(5), sqrt(pi), sqrt(2pi).

For these constants, it’s a tie: sqrt(2) and sqrt(3). Any integer or half-integer power of two will cause a tie because the mantissa of the power and its inverse will be identical.

Note, that the decision is tied to “double”. For “long double” or “float” the results are going to be different.

ODF Plus Ten Years

It’s time for another five-year update on ODF for spreadsheets. Read the initial post from 2005 and the 2010 update for context. Keep in mind that I only have an opinion on ODF for spreadsheets, not text documents.

TL;DR: Better, but ODF still not suitable for spreadsheets.

So what’s new? Well, basically one thing: we now have a related standard for formulas in ODF spreadsheets! This is something that obviously occurred 5-10 years too late, but better late than never. The Wikipedia article on OpenFormula is a fairly amusing example of the need to justify and rationalize mistakes that seems to surround the OpenDocument standard.

OpenFormula isn’t bad as standards go. It has a value system, operators, and a long list of functions, for example. Nice Where it does have problems is in the many choices it allows implementations. For example, it allows a choice whether logical values are numbers or their own distinct type. That would not have been necessary if spreadsheets had been considered in the original standard — at that time OO could have bitten the bullet and aligned with everyone else.

Back to the standard proper. What has happened in the past five years? In a word, nothing. We still have a standard whose aim was to facilitate interoperability, but isn’t achieving it.

There are actually two flavours of the standard: strict and extended. “Strict” has a well-defined syntax complete with an xml schema. Extended is strict with add-your-own tags and attributes. No-one uses strict because there are common things that cannot be represented using it. Error values, for example. A simple line graph with a regression line and a legend, for example.

When the Gnumeric team needs to add something outside “strict” we first look to see if, say, LO has already defined a syntax would can use. We only invent our own when we have to and we try to read any LO extension that we can.

The OO/LO approach, however, appears to be to ignore any other producer and define a new extension. This is part of the “ODS by definition is what we write” mindset. The result is that we end up with multiple extensions for the same things.

So extensions are a free-for-all mess. In fact it is so big a mess that the schema for Gnumeric’s extensions that was hacked up a week ago appears to be the first. Let me rephrase that: for the past ten years no-one in the ODS world has been performing even basic document validation on the documents produced. There are document checkers out there, but they basically work by discarding anything non-strict and validating what is left.

There are also inherent performance problems with ODF. Many spreadsheets contain large areas of identical formulas. (“Identical” does not mean “textually identical” in ODF syntax but rather in the R1C1 syntax where “the cell to the left of this” always has the same name.) ODF has no concept of shared formulas. That forces reparsing of different strings that produce identical formulas over and over again. Tens of thousands of times is common. That is neither good for load times nor for file sizes.

A more technical problem with ODF is that the size of the sheet is not stored. One consequence is that you can have two different spreadsheets that compute completely different things but save to identical ODF files. At least one of them will be corrupted on load. That is mostly a theoretical concern, but the lack of size information also makes it harder to defend against damaged (deliberately or otherwise) input. For example, if a file says to colour cell A12345678 red we have no way of telling whether you have a damaged file or a very tall spreadsheet.

Gnumeric continues to support ODF, but we will not be making it the primary format.

Strace Service Message

Just a service reminder: application writers should run their applications under strace from time to time.

I just did for Gnumeric and discovered that on startup we were creating a large number of files in /tmp like this:

open("/tmp/gdkpixbuf-xpm-tmp.XAXESX", O_RDWR|O_CREAT|O_EXCL, 0600) = 9

I tracked this down to embedding icons in xpm format. The gdk-pixbuf loader for xpm is unable to load from memory so it creates a temporary file and loads from that. Ick! The solution is to fix and deploy the loader (impractical), not use xpm (possible), or to use preprocess='to-pixdata' when embedding.

Writing Tests is Humbling

I have spent some time recently writing import/export tests cases for Gnumeric. It is what you do when you see a mistake that it should not have been possible to make, in this case a hang when writing certain strings to the obsolete xls/biff7 format.

Writing these tests has been a very humbling experience. Highly recommended.

A lot of the code being subjected to tests is quite old: 10-15 years old. You would think that by now it would have had any obvious bugs beating out of it. No such luck. Not only were there ancient bugs — such as the direction of diagonal borders being flipped on load+save — but there were also bugs accidentally introduced when fixing other things.

Bugs happen, even where you think you are being careful. And that is not a problem. The problem arises when the bugs are not caught and make it into releases. This is where a brutal test suite is needed. Our test suite clearly was not evil enough for import and export of various formats, so I have been adding something I call round-trip tests for ods, xls/biff7, xls/biff8, and xlsx.

A round-trip test is a test that when we convert to a given format and back to our own Gnumeric format, nothing changes. Or, if something does change, then only parts we understand change. If the format is deficient, there is not much we can do. For example: ods cannot store patterned background or the sheet size; xlsx cannot store solver parameters; xls/biff7 cannot store arbitrary unicode; and xls/biff8 has a fixed sheet size of 64k-by-256. Note, that by itself a round-trip test does not guarantee that what we produce is correct. We could, hypothetically, have swapped division and multiplication are still gotten a perfect round-trip. To test that the generated files are correct one has to load the resulting sheet in Excel or LibreOffice (which, despite claims to the contrary, are what really defines xls/xlsx and ods formats). Unfortunately, I do not know how to script that so it is not automatic.

As a result of all the new tests, the recently released Gnumeric 1.12.12 should interact better with other spreadsheets.

Note: some of these new tests were probably a decade overdue. My excuse is that The Gnumeric Team is fairly small. I do hope that LO/OO already have an evil test suite, but I am not optimistic. I ran a few of my test sheets through LO and saw things like truncated strings.

Spreadsheets and the Command Line

Spreadsheets are not the most obvious type of document to manipulate from the command line. They are essentially a visual tool meant for interactively exploring data. Experience has shown, however, that there are certain spreadsheet tasks for which the command line is very useful and Gnumeric supplies a number of command line tools for this.

  • ssconvert converts spreadsheets from one format to another, for example from xls to ods. That sounds fairly simple — load one, save the other — and it pretty much is although there are things such as merging several files or extracting parts of files that add a little complication. Since Gnumeric can save as pdf, this tool also allows command-line printing of spreadsheets.
  • ssgrep is like grep is for text files. And it has about the same set of options.
  • ssindex is used by things like tracker and beagle to find of pieces of text in spreadsheet files.
  • ssdiff is a new tool in the upcoming release. It compares two spreadsheets and outputs a list of differences between them. There are three output modes so far: (1) a text format, (2) an xml format, and (3) a mode that outputs a copy of one of the input files with differing cells marked in neon yellow.

None of these programs are big: 300-1100 lines of C code, ssdiff being the largest only because of its three output modes.

Sorting Icons Theme Mess

In my long-running series on why themes are evil, I bring you the newest installment.

Consider the gtk stock icon GTK_STOCK_SORT_ASCENDING which is supposed to represent sorting elements to make them increasing according to some order, typically numerically or alphabetically. The icon for such an action is supposed to somehow convey what happens when it is pressed, all in, say, 24×24 pixels.

Take a look at different themes and how they implement the icon:

eog `find /usr/share/icons/ -print | grep sort-ascending`

This command will show you the icon images with some duplication due to multiple sizes.

Observations:

  • Some show an up-arrow, others show a down-arrow. Yet others show a diagonal arrow which isn’t as bad as it sounds because such arrows are annotated.
  • Some arrows have no annotations, some are annotated by “1..9”, and yet others are annotated to “a..z”.

Officially this is a mess[tm]. When annotations are present they hint at either numerical or alphabetical ordering which may or may not match what the application does. That’s minor. But when no annotations are present, the situation is far worse: my sort-ascending button looks like someone else’s sort-descending simply because of theme differences!

I don’t know how this mess came about, but it ought to be resolved. I suggest that when the icons look like vertical arrows, sort-ascending should point down because the elements of a list will then be increasing in the direction of the arrow.