Valgrinding Gnome

A few months back, Daniel ran a full Gnome session under Valgrind. He did
so with a Fedora 3 system, but without debuginfo packages installed. So, I
used his instructions on my cvs build. I found out that it was a royal
pain to sort through all the log files that valgrind generates when
you’re done, so I didn’t do it again for a long time.

I finally got off my lazy duff and wrote a script to assist in
valgrinding Gnome. The script makes it easy to set things up so that my
next sessions run under valgrind, to parse all the generated logs
(chucking out the bad ones) and generate a pretty statistics report, and
to clean things up afterwards so that subsequent logins run without
valgrind.

I know, I know. I should have been working on the focus bugs (sorry for
my slowness, Christian, in looking at your Epiphany patch; I’ll get on it
soon) or updating the modules that use gnome-desktop to launch apps
but…I’m a volunteer and I do whatever I want. 🙂
Anyway, take a look at the
statistics
and the linked-to logs (with line-numbers!) and see if
you can fix anything. You’d be my hero if you could fix the
uninitialized warnings for Metacity, especially the ones that include
the meta_frames_ensure_layout function. 😉

Bugzilla tidbits

So far this month, more than 1000 more bugs have been closed than have
been reported (3859 versus 2822). It’s usually pretty
rare for us to even keep up, let alone do a massive cleaning like
this. So, if you closed any bugs this month, give yourself a pat on
the back. This was a joint effort by lots and lots of people.

Looking closer into the large number of bugs closed, there are 10
people who are on track to close over a thousand bugs by the end of
the year if they can keep up their current rate. (In particular,
Sebastien has well over 500 already! Go Sebastien!) For comparison
purposes, there were only 12 people who were able to close over a
thousand bugs when
2003 and 2004 bugstats
were combined. Granted, it’s very unlikely
this crazy pace will continue, but it’s still awesome to look back on
the month.

And the icing on the cake is that Metacity is no longer on the weekly
bug summary. It’s just not buggy enough anymore, I guess. 😉

GNOME-KDE relations

Luis, I have to
respectfully disagree with your comment on GNOME-KDE relations; I
don’t think it’s that bad. In addition to .desktop files, we have
also at least agreed on the Extended Window Manager Hints (though we
suck at getting them up on the web in some form other than in a cvs
repository…). And the EWMH works too, because I’ve seen reports of
people successfully using KWin under Gnome (or using Openbox or
WindowMaker or other WMs). In particular, I’d like to point out that
Lubos Lunak (the KWin maintainer) has been particularly helpful to
(and patient with) me as I’ve been trying to get involved on the
wm-spec-list. He’s been very cool.

Random tidbits

Jamin: Here’s some google juice for
you. 😉

Got yet another solution to the floating point problem, this one from
Ingo Luetkebohle. He pointed out the existence of DBL_MANT_DIG and
frexp(), so one can simply check whether the difference in exponents
is greater than the number of digits in the mantissa. While it’s not
as short a solution as using the volatile keyword, this is still a
cool solution and I’m sure it can come in handy for a variety of other
situations. 🙂
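
For concreteness, here is a minimal sketch of that check as I understand it (my own illustration with made-up names, not Ingo’s actual code): if the binary exponents reported by frexp() differ by more than DBL_MANT_DIG, the smaller value is less than half an ulp of the larger one, so adding it cannot change the result.

/* check_negligible.c -- compile with something like:
 *   gcc -Wall check_negligible.c -o check_negligible -lm
 */
#include <float.h>
#include <math.h>
#include <stdio.h>

/* Returns 1 if adding 'small' to 'big' is guaranteed to leave 'big'
 * unchanged after rounding, 0 if the sum might differ from 'big'.
 * Assumes both values are positive, finite, and normal. */
static int
addition_is_negligible (double big, double small)
{
  int exp_big, exp_small;

  frexp (big, &exp_big);
  frexp (small, &exp_small);

  return (exp_big - exp_small) > DBL_MANT_DIG;
}

int
main (void)
{
  /* 4e-16 is far below the spacing of doubles near 10; 4e-15 is not. */
  printf ("adding 4e-16 to 10 might change it:  %s\n",
          addition_is_negligible (10.0, 4e-16) ? "no" : "maybe");
  printf ("adding 4e-15 to 10 might change it:  %s\n",
          addition_is_negligible (10.0, 4e-15) ? "no" : "maybe");
  return 0;
}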

The continuing floating point saga

So, I had a couple of people email me about my floating point posts
(which required them to google for my email address, though that isn’t
difficult given the uniqueness of my name). It appears that I probably
would have gotten even more comments had my blog been set up to accept
them. Does anyone know if pyblosxom can be set up to accept comments?
Any pointers?

In particular, Laurent Boulard went to a lot of trouble to help me track
the problem down and find a solution/workaround. Very cool. Also,
solnul AT gmx de just sent me an alternate solution which is rather
ingenious: use the volatile keyword. That’s so cute it just
made me smile. Very clever.
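
To make the trick concrete, here is a minimal sketch of how I read the suggestion (my own illustration, not solnul’s actual code): declaring the temporary volatile forces gcc to store the sum to memory, which rounds it to 64-bit double precision before the comparison, instead of leaving it in an extended-precision register.

/* volatile_compare.c -- compile with something like:
 *   gcc -Wall -O2 volatile_compare.c -o volatile_compare
 */
#include <stdio.h>

int
main (void)
{
  int i = 10;
  double a = 4e-16;

  /* The volatile qualifier forces the sum out of any extended-precision
   * register and into a real 64-bit double before it is compared. */
  volatile double sum = i + a;

  printf ("sum > i : %s\n", (sum > i) ? "TRUE" : "FALSE");
  return 0;
}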


Miguel: very awesome document. Many thanks for pointing it out to
me. It’s similar to other chapters and documents I had read before,
but is more thorough. I especially liked this paragraph on page 82
(or page 252 depending on which counting method you trust in the
document). It was in reference to a nearly trivial 8-line program
which tends to produce different results depending on whether
optimizations are used:

Thus, on these systems, we can’t predict the behavior of the program
simply by reading its source code and applying a basic understanding
of IEEE 754 arithmetic. Neither can we accuse the hardware or the
compiler of failing to provide an IEEE 754 compliant environment: the
hardware has delivered a correctly rounded result to each destination,
as it is required to do, and the compiler has assigned some
intermediate results to destinations that are beyond the user’s
control, as it is allowed to do.

That was exactly the problem I was being bitten by. The “destinations” of
the floating point expressions in my program were put beyond my control,
and thus I couldn’t force my program to provide consistent or expected
behavior. Luckily for me, gcc has an -ffloat-store option which
allows me to force the destination to be a certain desired precision,
thus allowing me to put to use my knowledge of IEEE 754. However, I’m
tempted to spurn that solution and go with the volatile
solution (which effectively does the same thing but is localized)
instead. 🙂

PEBKAC problems, and blaming something else

I hate it when I screw up, don’t realize my mistake, think that it was
something or someone else that was in error, and then publicly blame
the party I believe to be responsible. As you may have guessed, I
figured out what I did wrong in my floating point calculations that I
blamed on gcc. It turns out that I forgot about -ffloat-store (I had
read about it before, but ignored it, despite all those classes that covered
floating point arithmetic…). That option is not the default
behavior, which I was assuming/expecting. (Actually, I may have had a
secondary problem at one point in that I didn’t store floating point
expressions in temporaries before comparing them, which is required
when using -ffloat-store in order to obtain correct behavior.)

To make sure I never forget this again, here’s a simple program
demonstrating my woes of this past day:

/* Run this program like this:
 *
 *   bash$ gcc -Wall -O2 precision.c -o precision
 *   bash$ echo "10 4e-16" | ./precision
 *
 * and then again like this
 *
 *   bash$ gcc -Wall -O2 -ffloat-store precision.c -o precision
 *   bash$ echo "10 4e-16" | ./precision
 *
 * Output from this program when compiled under gcc with the former
 * optimization level (depends on gcc version and platform,
 * apparently, so this may not be what you get) is:
 *
 *   i + a > i                 :  TRUE
 *   is_greater (i+a,i)        :  FALSE
 *   inline_is_greater (i+a,i) :  TRUE
 *
 * Output from this program when compiled under gcc using the latter
 * "optimization" level (this should be 100% reliable) is:
 *
 *   i + a > i                 :  FALSE
 *   is_greater (i+a,i)        :  FALSE
 *   inline_is_greater (i+a,i) :  FALSE
 *
 * Thus, this program only reliably behaves correctly if -ffloat-store
 * is used.
 */

#include "stdio.h"

inline double
inline_is_greater (double a, double b)
{
  return a > b;
}

double
is_greater (double a, double b)
{
  return a > b;
}

int
main ()
{
  int i;
  double a;
  double b;

  scanf ("%d %lf", &i, &a);

  b = i+a;

  printf ("i + a > i                 :  %s\n",
          (b > i)                    ? "TRUE" : "FALSE");
  printf ("is_greater (i+a,i)        :  %s\n",
          is_greater (i+a,i)         ? "TRUE" : "FALSE");
  printf ("inline_is_greater (i+a,i) :  %s\n",
          inline_is_greater (i+a,i)  ? "TRUE" : "FALSE");

  return 0;
}

Now, I just need to decide whether to turn on -ffloat-store for a
small section of my code (and how to do so; I’m fearing ugly Makefile
hacks and being forced to split up certain files that I don’t really
like splitting), or to leave it on for my entire code. My code runs
slightly faster if I don’t turn it on and it doesn’t hurt to have that
extra accuracy in places other than one or two critical sections, but
it does make it harder to compare results if the code is run across
multiple architectures (and even on the same one with different levels
of optimization)…

It’s not a bitwise issue either

Havoc, I’m not
worried about bitwise equality, and in fact it is not what I want to
test. I know that floating point representations are non-unique (one
can shift the mantissa right and increment the exponent to get the
exact same floating point number…). I’m merely worried about
whether i+a and i, as floating point values (i.e. after roundoff), are
considered unequal (and, if so, whether (i+a)-i is positive). The
compiler is apparently doing optimizations, because it reports that
(i+a) > i is true, but that double_compare(i+a,i) is false. It
seems to be either not rounding the i+a before subtracting the i
(which I believe is possible with special floating point instructions
that use extended precision and is something that I might assume would
come from -ffast-math), or else making incorrect rearrangements by
assuming the associativity property holds.

Unsafe optimizations

I guess I’ll break from my standard practice and blog on something
non-Gnome. 🙂

I really need to find some kind of documentation on what a compiler is
allowed to optimize away and what it isn’t. Apparently, my
assumptions were totally wrong. If anyone has some good pointers on
this, I’d love to hear about them. In particular…

I have some code where I need to know whether

i + a > i

Now, gcc (using the “safe”, or so I thought, level of optimization of
“-O2 -funroll-loops”) appears to optimize this as

a > 0

which is not the same. (i may be an int, but a is a double and I’m
not working with an infinite precision arithmetic package or anything
like that.) In fact, the latter can cause my program to fail horribly
(luckily it results in a situation that I can catch with an assert,
though aborting the program is pretty sucky behavior too).

I tried a couple of different variations to try to trick the compiler out
of incorrectly optimizing things away: (i+a)-i > 0 had the same
problem. I tried the trickier i+a > i+1e-50 (not that I know whether
it is safe to assume |a|>1e-50, but it at least seemed fairly
reasonable). The compiler apparently optimized this to a > 1e-50 and
thus also failed. I tried sticking both i+a and i into variables and
then comparing the variables. That failed, unless I used the variables
elsewhere, such as in a printf statement (i.e. if I’m trying
to debug the code it works, otherwise it doesn’t). To fix this, I had
to make a function:

#include <stdbool.h>  /* needed for the bool type */

bool
double_compare (double val1, double val2)
{
  return val1 > val2;
}

Then, calling double_compare(i_plus_a,i) would work (yes, i_plus_a is
a variable equal to i+a). Finally, something that works. However,
this isn’t very safe. It only works because gcc doesn’t yet do
aggressive inlining of functions (given how short double_compare is, it
would be an obvious candidate for aggressive inlining).

I would have thought that such unsafe optimizations would have only
been done with -O3 or -ffast-math or something similar. Can anyone
tell me why these optimizations are considered okay for the compiler
to do at -O2, when they obviously produce incorrect results? What do
I do: depend on the assert to warn the person running the program that
they need to fix my code to outwit the “smart” compiler?

Update: Yes, I know how floating point arithmetic works. I am
aware that the compiler is transforming what I have to:

(double)i + (double)a > (double)i

and is then probably transforming this to the equivalent expression of

((double)i + (double)a) - ((double)i) > 0

The compiler is then probably either using extended precision
arithmetic to evaluate this (I want it to round after adding i and a
before moving on), or else assuming that addition and subtraction are
associative (which is NOT true for floating point numbers) to change
this to:

((double)i - (double)i) + (double)a > 0

If this were true, then the subtraction of i from itself could just be
removed. The problem is that I have situations where i is about 10 or
so, and a is 4e-16, and in these circumstances i+a is identically
equal to i in floating point arithmetic. I need to know whether i+a
and i are considered to be different floating point values (and, if
they are, whether their difference is positive).
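
For what it’s worth, those numbers are easy to check directly: the spacing between adjacent doubles near 10 is about 1.8e-15, and 4e-16 is less than half of that, so once the sum is rounded to a plain double it must equal i. Here is a tiny sketch illustrating this (my own illustration, not code from the program in question):

/* ulp_demo.c -- compile with something like:
 *   gcc -Wall ulp_demo.c -o ulp_demo -lm
 */
#include <math.h>
#include <stdio.h>

int
main (void)
{
  double i = 10.0;
  double a = 4e-16;
  double gap = nextafter (i, 11.0) - i;  /* spacing of doubles near 10 */

  printf ("spacing of doubles near 10     : %g\n", gap);
  printf ("a                              : %g\n", a);
  printf ("a is below half that spacing?  : %s\n",
          (a < gap / 2) ? "yes, so i+a rounds back to i" : "no");
  return 0;
}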

A few productive weeks

The past couple of weeks have been pretty interesting.

Bugzilla and the Bugsquad

Luis is back and diving through loads of bugs. Andrew got things
set up so that new versions will automatically and intelligently
(i.e. no 2.9.1 if you only want 2.9.x) be added when you upload a new
tarball to the ftp servers. Both of them are now assisting with the
Evolution import that Gerardo has been working hard on, and making
noise about
cleaning up unneeded keywords
. Olav has been fixing bugs like crazy.
Bugzilla statistics
for 2003 and 2004
are up, the list of bugs that bug-buddy displays
is now meaningful (though they need better summaries), and lots of people have
gone nuts triaging bugs
(3 people with over 100 bugs closed in the
last week!).

Metacity

I’ve been trying to knock the Metacity bug count from the weekly bug
summary
down below 150. We’re getting close, but we’re not quite
there yet. This also involved fixing up a few loose ends in
other modules.

Sebastien apparently found a whole bunch of focus stealing prevention
bugs that I was totally unaware of. That was really cool. I’ve fixed
the two he reported, and am awaiting the reports for the other
issues…

I also went and put together a getting started
guide
for those who might want to try their hand at hacking
Metacity. So, if this is something you’d be interested in, take a
look at the new docs and tackle some of the tasks pointed out. 🙂