Archive for the ‘Gnumeric’ Category

Namespaces

Thursday, August 25th, 2005

Gnumeric’s solver was broken in HEAD and while fixing it, I
updated to the latest version of lp_solve.

Let me tell you, lp_solve is a prime example of how not to make
a library! It looks like there used to be a program and that it
was made into a library by removing main.

There is no concept of namespaces there. When you include the
relevant header file, you get everything used anywhere internally:
EQ, gcd, MALLOC, TRUE, is_int, and about 400-600
other identifiers.

You cannot isolate that problem to just where you use the header,
by the way, as static is practically usused.

I decided to throw a perl script at the problem and combine everything into one
gaint C file. All 44186 lines of it after pruning about 5000 lines.
The script adds tons of statics in the process,
renames the relevant part of the API, and extracts
that API. Extra points for you if you can read the perl script
without losing your breakfast.

The Cat’s Out…

Thursday, August 18th, 2005

…and so is Gnumeric 1.5.3. (Complete with a big ugly, but harmless,
error message on xls save — oops! At least we got it fixed in time
for Debian and the Win32 build.)

Home;
Source;
Release notes;
Changes.

Common Subexpressions

Tuesday, August 2nd, 2005

It turns out that it is moderately common to have large number of
VLOOKUPs (or HLOOKUPs) in the same spreadsheet. Gnumeric is embarrassingly
slow
for this. There are several reasons for this.

Profiling where the time
is spent points the blame at g_utf8_collate.

Thinking about the problem, however, suggests a different cause, namely that we are evaluating collate keys for the table every
once for every VLOOKUP. That is simple, easy to understand, and not
prone to obscure problems, but evidently it is not good enough.
Luckily it should be quite easy to add some kind of cache for this.

If I was redesigning the evaluation engine from the ground up, I would
probably compile expressions into some kind of byte code with common
subexpressions explicitly taken care of. But I am not, so the above
cache will have to do for now. That should also handle the case where
the subexpressions are not statically common, but the result of
something like INDIRECT.

INDIRECT, btw., is the single most ugly
feature of spreadsheet semantics. It turns the result of an expression into a cell or name reference and if I was designing
a proposed standard
formula syntax and semantics
for spreadsheets I would think
long and hard about INDIRECT and its consequences. But I am not.
(Interestingly, most uses of INDIRECT that I have seen would be
far better handled as INDEX calls.)

Back to g_utf8_collate. It works by converting
both strings, in their entirety, to a normalized format and then
comparing those. In a language like C, as opposed to Haskell, that
is quite wasteful in two ways:

  • The comparison is done character-by-character from the
    beginning on the strings. That means that it is very common to
    only look at the first few characters of the normalized format. In that case, why was the whole thing normalized?
  • The normalization process allocates space for the normalized format in the form of a GString. That is slow and not needed at
    all since the comparison just needs a single character at a time.

It gets even sillier if you want to do the comparison while ignoring
letter case. Then you first get to case fold the strings in their
entirety before you can call g_utf8_collate.

On Bugs Reports

Monday, July 11th, 2005

For entertainment purposes only:

Evidently someone dumped a feature request for Gnumeric on
comp.os.linux.advocacy and is mildly upset that after a year it has not been implemented.
(Not even an interesting feature, mind you.)

In my humble opinion that is like tattooing an Excel feature request on
your butt, flying off to Redmond and mooning the Microsoft campus on
a dark and rainy winter evening. That is likely to work, for sure.

Bugzilla might not be the world’s best bug reporting interface,
but it sure beats mailing lists and irc in terms on probability
of the bug report not being forgotten forever.

OpenDocument for Spreadsheets

Thursday, June 16th, 2005

From time to time I am asked if Gnumeric supports or will support the
OpenDocument standard. I am sad to report that the question is
meaningless. The purpose of a standard ought to be interoperability;
this standard is not a step in that direction.

Executive summary: The OpenDocument Standard v1.0 completely ignores semantics of spreadsheets, thus virtually ensuring
that if two implementations of the standard existed then they could
not use each other’s spreadsheets. That, in turn, renders the alleged
standard pointless for spreadsheets.

The OpenDocument standard v1.0 is 708 pages long. You would think that in such
an elephant they could find the space to hammer out a usable spec for
spreadsheets. But no, they do not.

Formulas (or formulae if you are feeling learned) is an important part
of what goes into spreadsheets. What does the alleged standard have
to say about those? Basically you go to page 184-186 and find the
two (equivalent) formula examples that it has:

  =sum([.A1:.A5])
  =sum([.A1];[.A2];[.A3];[.A4];[.A5])

Pedantics — that would be me — would point out that both these
fail to comply with the formula requirement “Every formula should begin with a namespace prefix specifying the syntax and semantics used within the formula.” Hmmm… Or maybe “=” is that prefix and
anyone seeing it will magically understand the syntax and semantics.

Regardless, we learn

  • the existence of a function named sum. I hope you can make good use of it, because it is
    the only function we get to know about. (There is a handful more
    in contexts unrelated to formulas.)
  • that there exists operators. Which ones seems to be
    a secret, apart from the fact that some of them are logical.
  • that numbers can be part of formulas. We will have to assume that regular C-like syntax for them is used.
  • that strings can be part of formulas. There is no clue about
    how to write them; I will guess that you put them in quotes,
    but what
    about embedded quotes, then?

There is what appears to be the beginning of a value system on pages 184-185. You get numbers, dates, times, booleans, and strings.
Notice the absence of error values and “empty”, let alone arrays.
Excel has those, Gnumeric has them, and OOo-calc has at least error values. Thus there
is no chance that someone following the OpenDocument standard can
approximate Excel semantics, nor for that matter is there any chance
that OOo-calc can implement the standard with its current value
system.

It gets worse. Let us assume that a proper value system was in
place. If you wanted to have any hope of achieving interoperability
you would them have to define function (and operator) semantics
for a set of functions large enough to be useful. Details matter.
For example, for our friend sum above, Excel will
completely ignore strings like “42″ (with the quotes) in referenced
cells, but it will convert them to numbers and include them in the
sum when given as direct arguments to SUM. (Yes, that is sick.) In
technical terms, Excel does not have referential integrity.
OOo-calc lacks referential integrity too and in the direct-argument case produces an error, i.e., it is sick in a different way. (For completeness, Gnumeric ignores all strings for SUM; this might change.) As a cuter example, OOo-calc manages to evaluate =”1″+1 to
TRUE!

So there. As far as spreadsheets are concerned, the OpenDocument Standard v1.0 is the equivalent of giving precise punctuation rules for sentences without telling if it is for English, German, French, or
something else.

Only vaguely related is the question of whether Gnumeric should be
able to read OOo-calc files. The answer is yes, and we do have some
support for it. In practice it is probably easier to interoperate
via the xls format.

Gnumeric 1.5.2

Tuesday, June 14th, 2005

The weekend saw the release of Gnumeric version 1.5.2.
It should probably have a catchy name, but it does not.

This is a lots-of-little-things release that ought to make for a smoother ride.

Dependencies Re-Redux

Thursday, June 9th, 2005

I finally did manage to hack the dependency tracking code into a
state where invalidation of a sheet (done because the user performed the
undo-able removal of the sheet) is distinct from the destruction of
the sheet.

That sounds technical but the nice results are: (1) we no longer
have to clear the undo/redo queues when someone deletes a sheet;
(2) we now track and undo changes to other sheets’ cells when a
sheet is removed; (3) we track and undo changes to global names’
definitions when a sheet is removed.

These are not things you notice when they work, but you do when they
do not work. Chalk one up for user experience.

Multihead

Monday, May 23rd, 2005

Applications that support multiple windows should support multihead
displaying, i.e., it should be possible to put some of the windows
on a separate screen. If you are blessed with two screens it is
a very nice productivity improvement to use that extra screen for
reference materials relating to what you work on. Applications
that support this are cool.

Gnumeric is cool, of course. Emacs is cool. Mozilla is not cool.

Unfortunately multihead support is a bit of a late add-on for GTK+.
The duct tape is visible here and there. For example, every time you
create a widget for the non-default screen, you in effect build it
for the default screen with all the setting that relate to that,
then warp it over to the other screen and redo all the settings.
Anything that relies on settings — icon sizes, fonts, etc. — needs
to be done or redone when the widget is realized.

But GTK+ is better than the rest of the Gnome stack. There are
still places that build X cursors for the wrong display, fail to
catch screen-changed signals, or grab the cursor on the wrong
display. The results range from the cute (popups and menus show on
the wrong screen) over the irritating (application crash) to the
worse (grabbed X cursor).