Namespaces

Gnumeric’s solver was broken in HEAD and while fixing it, I
updated to the latest version of lp_solve.

Let me tell you, lp_solve is a prime example of how not to make
a library! It looks like there used to be a program and that it
was made into a library by removing main.

There is no concept of namespaces there. When you include the
relevant header file, you get everything used anywhere internally:
EQ, gcd, MALLOC, TRUE, is_int, and about 400-600
other identifiers.

You cannot isolate that problem to just where you use the header,
by the way, as static is practically unused.

I decided to throw a perl script at the problem and combine everything into one
giant C file. All 44186 lines of it after pruning about 5000 lines.
The script adds tons of statics in the process,
renames the relevant part of the API, and extracts
that API. Extra points for you if you can read the perl script
without losing your breakfast.
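The renaming trick can be pictured with a tiny made-up fragment of such an amalgamated file: internal helpers get static (internal linkage), and only the prefixed API keeps external linkage. All the names below are invented for the illustration; they are not lp_solve's actual API.

```c
#include <assert.h>

/* Made-up fragment of an amalgamated single-file library.  Internal
   helpers become static, so a host program's own gcd(), EQ or MALLOC
   cannot collide with them. */
static int internal_gcd(int a, int b)
{
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Only the renamed, prefixed API keeps external linkage.
   (lp_solve_gcd is an invented name for this sketch.) */
int lp_solve_gcd(int a, int b)
{
    if (a < 0) a = -a;
    if (b < 0) b = -b;
    return internal_gcd(a, b);
}
```

With everything in one translation unit, the linker only ever sees the handful of prefixed symbols, which is the whole point of the exercise.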

Utility Functions

Dom,
it is probably not that they are being inefficient or behaving illogically. It is more likely that they are optimizing a utility
function somewhat different from the one you would naïvely expect.

(A mathematician takes a walk and comes by a house on fire; he calls
the fire department and they come and put out the fire. The next
day he comes by a house that is not on fire; he sets it on fire and
walks on after thus having reduced the problem to a previously solved
one.)

Common Subexpressions

It turns out that it is moderately common to have a large number of
VLOOKUPs (or HLOOKUPs) in the same spreadsheet. Gnumeric is
embarrassingly slow for this, and there are several reasons why.

Profiling where the time
is spent points the blame at g_utf8_collate.

Thinking about the problem, however, suggests a different cause, namely that we are computing collate keys for the table
once for every VLOOKUP. That is simple, easy to understand, and not
prone to obscure problems, but evidently it is not good enough.
Luckily it should be quite easy to add some kind of cache for this.
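A sketch of what such a cache could look like, in plain C with strxfrm standing in for g_utf8_collate_key (strcmp on transformed strings agrees with strcoll on the originals). The helper names are made up; this is the idea, not Gnumeric's code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* strxfrm plays the role of g_utf8_collate_key: the expensive
   transform is done once per table entry instead of once per lookup. */
static char *make_collate_key(const char *s)
{
    size_t n = strxfrm(NULL, s, 0) + 1;   /* size of transformed form */
    char *key = malloc(n);
    strxfrm(key, s, n);
    return key;
}

/* Build the key cache once for the whole lookup column... */
static char **build_key_cache(const char *const *table, size_t n)
{
    char **keys = malloc(n * sizeof *keys);
    for (size_t i = 0; i < n; i++)
        keys[i] = make_collate_key(table[i]);
    return keys;
}

/* ...so each of the many VLOOKUP-style probes is a cheap strcmp. */
static int lookup(char *const *keys, size_t n, const char *needle)
{
    char *nk = make_collate_key(needle);
    int hit = -1;
    for (size_t i = 0; i < n; i++)
        if (strcmp(keys[i], nk) == 0) {
            hit = (int)i;
            break;
        }
    free(nk);
    return hit;
}
```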

If I were redesigning the evaluation engine from the ground up, I would
probably compile expressions into some kind of byte code with common
subexpressions explicitly taken care of. But I am not, so the above
cache will have to do for now. That should also handle the case where
the subexpressions are not statically common, but the result of
something like INDIRECT.
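A toy illustration of the sharing idea, assuming (sub)expressions are keyed by their source text: each distinct one gets a slot and is evaluated at most once, however many formulas mention it. Everything here is invented for the sketch; it is nothing like Gnumeric's actual evaluator.

```c
#include <assert.h>
#include <string.h>

#define MAX_EXPRS 64

static const char *expr_keys[MAX_EXPRS];
static double      expr_vals[MAX_EXPRS];
static int         expr_count;
static int         eval_count;    /* number of real evaluations done */

/* Stand-in for real (expensive) expression evaluation. */
static double slow_eval(const char *expr)
{
    eval_count++;
    return (double)strlen(expr);
}

static double eval_shared(const char *expr)
{
    /* Reuse a previously computed value if the text matches. */
    for (int i = 0; i < expr_count; i++)
        if (strcmp(expr_keys[i], expr) == 0)
            return expr_vals[i];

    /* First sighting: evaluate once and remember the result. */
    expr_keys[expr_count] = expr;
    expr_vals[expr_count] = slow_eval(expr);
    return expr_vals[expr_count++];
}
```

A real engine would key on expression structure after parsing rather than raw text, and would invalidate slots on recalculation, but the accounting is the same.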

INDIRECT, btw., is the single ugliest
feature of spreadsheet semantics. It turns the result of an expression into a cell or name reference, and if I were designing
a proposed standard
formula syntax and semantics
for spreadsheets I would think
long and hard about INDIRECT and its consequences. But I am not.
(Interestingly, most uses of INDIRECT that I have seen would be
far better handled as INDEX calls.)
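For instance, picking the cell in column A whose row number sits in B1 can be written either way; the hypothetical pair below uses ordinary spreadsheet syntax:

```
=INDIRECT("A" & B1)    builds the text "A5" (say) and re-parses it as a reference
=INDEX(A:A, B1)        the same cell, reached via a plain, analyzable range
```

The INDEX form keeps the dependency visible to the recalculation engine; the INDIRECT form hides it inside a string.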

Back to g_utf8_collate. It works by converting
both strings, in their entirety, to a normalized format and then
comparing those. In a language like C, as opposed to Haskell, that
is quite wasteful in two ways:

  • The comparison is done character-by-character from the
    beginning of the strings, so it is very common that only the
    first few characters of the normalized format are ever needed.
    In that case, why was the whole thing normalized?
  • The normalization process allocates space for the normalized format in the form of a GString. That is slow and not needed at
    all since the comparison just needs a single character at a time.

It gets even sillier if you want to do the comparison while ignoring
letter case. Then you first get to case fold the strings in their
entirety before you can call g_utf8_collate.
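The lazy alternative can be sketched in a few lines: fold and compare one character at a time, stopping at the first difference, with no allocation at all. Plain ASCII tolower stands in for real Unicode case folding and collation here; fold_compare is a made-up name:

```c
#include <assert.h>
#include <ctype.h>

/* Case-insensitive comparison, one character at a time: no
   normalized copies of the strings are ever built, and we stop as
   soon as the result is known. */
static int fold_compare(const char *a, const char *b)
{
    for (;; a++, b++) {
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb)
            return ca < cb ? -1 : 1;
        if (ca == '\0')
            return 0;    /* both strings ended: equal */
    }
}
```

Doing the same with a full Unicode collation is more work per character, but the structure — fold lazily, bail out early — carries over.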

FileChooser

Federico is bringing up the file chooser’s lack of speed again. Good.

The first step, IMHO, should be to get rid of reloading the folder
on widget mapping. It is wrong to do non-widget, expensive, and externally-visible actions in a widget mapping handler. If someone
switches to another virtual screen and back (for example to peek at something),
do we really want to reload the folder? Do we lose the selections
in the process? If The Gimp wants that behaviour, I say it can
install a handler and trigger it itself.

Would things appear to be faster if we installed a single-shot
idle-handler that created a file chooser and threw it away?
(That would be a work-around more than a fix, of course.)

There is more to your item 7 than just performance, btw. It should
not stat() all those parent directories because it may not be allowed
to do so. If you just succeeded in stat("/foo/bar/baz") then
it should not be necessary to check that "/foo" and "/foo/bar" are
directories.
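The point can be sketched with a single POSIX stat call; leaf_exists is a made-up helper. Path resolution itself fails with ENOTDIR if any component of the path is not a directory, so one successful call on the leaf already vouches for every parent:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* One stat() on the full path.  If it succeeds, POSIX path
   resolution has already verified that every parent component is a
   directory -- no separate stat("/foo") and stat("/foo/bar")
   needed, and none attempted on directories we may not be allowed
   to inspect. */
static int leaf_exists(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0;
}
```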

Nat is Crazy

Nat, that is crazy.
You are not exactly 25 years old anymore.

Ok, I have done worse, i.e., I have done 200 miles a couple of times.
That takes about 12 hours, all breaks included. Some things you ought
to know in advance:

  • Your route looks somewhat dangerous, traffic-wise. Wear something
    with screaming colours.
  • Your behind will be as sore as your jaw was recently.
  • Your ability to control various muscles you did not even know you
    had will be temporarily affected. Do not drive a car yourself for a day.
  • If you get wet, say around Quincy, the trip is not going to be
    any fun. You will end up with a mixture of water and road dirt thrown from your own wheels all over yourself. It will make an unpleasant sound on your teeth. (Been there, done that.)

On Bug Reports

For entertainment purposes only:

Evidently someone dumped a feature request for Gnumeric on
comp.os.linux.advocacy and is mildly upset that after a year it has not been implemented.
(Not even an interesting feature, mind you.)

In my humble opinion that is like tattooing an Excel feature request on
your butt, flying off to Redmond and mooning the Microsoft campus on
a dark and rainy winter evening. That is likely to work, for sure.

Bugzilla might not be the world’s best bug reporting interface,
but it sure beats mailing lists and IRC in terms of the probability
of the bug report not being forgotten forever.

_A_

Nat,
here is your chance to recycle a few select quotes from The Incredibles.

_A_

I do not recall having a graduation ceremony after my own time in
pre-K.

More Burgers

Robert, I am not at all
disputing the claim that electricity costs are slow to adapt. I will
take your word for that.

But the claim you proxied was different, namely that slowness in
adaptation was somehow related to a lack of international trading. Yet
the slowest-adapting price you mention is for a perfectly fine,
internationally traded commodity, complete with a futures market. The
scientific solution here is to discard, or at least amend, the
hypothesis that slow adaptation is related to whether there is
international trading. Wearing my mathematician’s hat, I do not
accept proof by “widely considered”.

For the record, and without thinking too deeply about it, I would
guess that electricity prices adapt slowly mainly because the
building of power plants and distribution networks starts with a
huge initial capital expense which has to be earned back (or not,
depending on your economic system 🙂) over a long time.

I do recall you implying that without a minimum wage there would be no
unemployment. That is pure Reaganomics, a.k.a. nonsense. Do a fact
check, i.e., look for times or places with no minimum wage and see if
unemployment occurs. For example, during the Great Depression there was
widespread unemployment but no minimum wage. (You could find newer
examples, but then you would have to look abroad.) Science: the
hypothesis has been proven wrong and gets discarded. Politics: let’s
hope no-one notices, and draw a cat.

Totally unrelated, I agree that seizing people’s property, even with economic
compensation, really ought to be based on something a bit more
essential than the love of strip malls.