ODF Plus Ten Years

It’s time for another five-year update on ODF for spreadsheets. Read the initial post from 2005 and the 2010 update for context. Keep in mind that I only have an opinion on ODF for spreadsheets, not text documents.

TL;DR: Better, but ODF still not suitable for spreadsheets.

So what’s new? Well, basically one thing: we now have a related standard for formulas in ODF spreadsheets! This is something that obviously occurred 5-10 years too late, but better late than never. The Wikipedia article on OpenFormula is a fairly amusing example of the need to justify and rationalize mistakes that seems to surround the OpenDocument standard.

OpenFormula isn’t bad as standards go. It has a value system, operators, and a long list of functions, for example. Nice Where it does have problems is in the many choices it allows implementations. For example, it allows a choice whether logical values are numbers or their own distinct type. That would not have been necessary if spreadsheets had been considered in the original standard — at that time OO could have bitten the bullet and aligned with everyone else.

Back to the standard proper. What has happened in the past five years? In a word, nothing. We still have a standard whose aim was to facilitate interoperability, but isn’t achieving it.

There are actually two flavours of the standard: strict and extended. “Strict” has a well-defined syntax complete with an xml schema. Extended is strict with add-your-own tags and attributes. No-one uses strict because there are common things that cannot be represented using it. Error values, for example. A simple line graph with a regression line and a legend, for example.

When the Gnumeric team needs to add something outside “strict” we first look to see if, say, LO has already defined a syntax would can use. We only invent our own when we have to and we try to read any LO extension that we can.

The OO/LO approach, however, appears to be to ignore any other producer and define a new extension. This is part of the “ODS by definition is what we write” mindset. The result is that we end up with multiple extensions for the same things.

So extensions are a free-for-all mess. In fact it is so big a mess that the schema for Gnumeric’s extensions that was hacked up a week ago appears to be the first. Let me rephrase that: for the past ten years no-one in the ODS world has been performing even basic document validation on the documents produced. There are document checkers out there, but they basically work by discarding anything non-strict and validating what is left.

There are also inherent performance problems with ODF. Many spreadsheets contain large areas of identical formulas. (“Identical” does not mean “textually identical” in ODF syntax but rather in the R1C1 syntax where “the cell to the left of this” always has the same name.) ODF has no concept of shared formulas. That forces reparsing of different strings that produce identical formulas over and over again. Tens of thousands of times is common. That is neither good for load times nor for file sizes.

A more technical problem with ODF is that the size of the sheet is not stored. One consequence is that you can have two different spreadsheets that compute completely different things but save to identical ODF files. At least one of them will be corrupted on load. That is mostly a theoretical concern, but the lack of size information also makes it harder to defend against damaged (deliberately or otherwise) input. For example, if a file says to colour cell A12345678 red we have no way of telling whether you have a damaged file or a very tall spreadsheet.

Gnumeric continues to support ODF, but we will not be making it the primary format.