Change

We learn from Matthias that the right way to describe what happened with recent Gtk+ releases is that it changed.

And provided you are thinking of source code, that is not an unreasonable nomenclature: before it worked one way, now it works a different way — it changed. And source code that has to interact with Gtk+ used to do it one way, but now needs to do it another way — it needs to change.

But what if you are thinking of binaries? That is, existing, already-distributed binaries sitting on users’ machines. With the installation of the new Gtk+, such binaries changed from working to non-working. Such a binary evidently needs to change itself. Now, I have been known to prefer to make changes by editing binaries directly (interestingly, arguably thereby turning the binary into source code in the eyes of the GPL) but it is generally not a convenient way of making changes and as a Gnumeric developer I do not expect my users to do this. So how are the binaries on users’ machines going to change from non-working to working? I have no means of reaching users. I can and I will release changed source code, but binaries from that will not reach users anytime soon. Change is not a reasonable description for this; break is. Gtk+ broke Gnumeric. Again. And note, that some of the changes appear to be completely gratuitous.

Emmanuele is rather adamant that these changes were happening to API that was pre-announced to be unstable. I think he is mistaken in the sense that while it might have been decided that this API was unstable, I do not think it was announced. At least I do not seem to be able to find it. Despite prodding, Emmanuele does not seem to be able to come up with a URL for such an announcement, and certainly not an announcement in a location directed at Gtk+ application writers. It may exist, but if it does then it is not easy to find. I looked in the obvious places: The API documentation was not changed to state that the API was subject to change. The release announcements were not changed to state that the API was subject to change. The application development mailing list was not changed by sending a message warning that the API was subject to change. Sitting around a table and agreeing on something is not an announcement. If you want to announce something to application developers then you need to use a channel or channels aimed at application developers.

The situation seems to lend itself to Douglas Adams quotes. I have already used the destruction-of-Earth situation, so here is the earlier one involving the destruction of Arthur Dent’s house:

“But the plans were on display...”
“On display? I eventually had to go down to the cellar to find them.”
“That’s the display department.”
“With a flashlight.”
“Ah, well the lights had probably gone.”
“So had the stairs.”
“But look, you found the notice didn’t you?”
“Yes,” said Arthur, “yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard’.”

Another GTK+ ABI Break

It is a familiar situation: a distribution updates Gtk+ to a supposedly-compatible version and applications, here Gnumeric, break.

This time I am guessing that it is incompatible changes to widget theming that render Gnumeric impossible to use.

I would estimate that this has happened 15-20 times within the GTK+ 3.x series. One or two of those are probably Gnumeric accidentally relying on a GTK+ bug that got fixed, but the vast majority of cases is simply that perfectly fine, existing code stops working.

Imagine the C library changing the behaviour of a handful of functions every release. I suspect GTK+ maintainers would be somewhat upset over that. Nevertheless, that is what is presented to GTK+ application writers.

The question of whether GTK+ applications can be written remains open with a somewhat negative outlook.

Strace Service Message

Just a service reminder: application writers should run their applications under strace from time to time.

I just did for Gnumeric and discovered that on startup we were creating a large number of files in /tmp like this:

open("/tmp/gdkpixbuf-xpm-tmp.XAXESX", O_RDWR|O_CREAT|O_EXCL, 0600) = 9

I tracked this down to embedding icons in xpm format. The gdk-pixbuf loader for xpm is unable to load from memory so it creates a temporary file and loads from that. Ick! The solution is to fix and deploy the loader (impractical), not use xpm (possible), or to use preprocess='to-pixdata' when embedding.
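To make this kind of check repeatable, one could scan saved strace output for the tell-tale temporary files. The following is only a sketch: the function names are made up for illustration, and it assumes the strace output has already been split into lines and that the temporary files keep the /tmp/gdkpixbuf-xpm-tmp. prefix seen above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if an strace output line looks like the creation of a
 * gdk-pixbuf xpm temporary file, 0 otherwise.  The path prefix is taken
 * from the trace line quoted in the text; everything else is illustrative. */
static int is_xpm_tmp_create(const char *line)
{
    assert(line != NULL);
    return strstr(line, "/tmp/gdkpixbuf-xpm-tmp.") != NULL
        && strstr(line, "O_CREAT") != NULL;
}

/* Count the matching lines in an array of strace output lines. */
static size_t count_xpm_tmp_creates(const char *const *lines, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        hits += (size_t)is_xpm_tmp_create(lines[i]);
    return hits;
}
```

A non-zero count after startup would suggest that some embedded icons are still going through the xpm loader.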

How Does One Create A Gtk+ Application?

How does one go about creating a working Gtk+ application? Is it even possible?

Seriously.

TL;DR: The Gtk+ ABI is broken so often that distribution-supplied binaries rarely work.

* * *

Imagine it is the time when Gtk+ 3.0 was released. You have a beautiful application with no bugs. For the sake of argument. Distributions ship it pre-compiled and life is good.

Then distributions update Gtk+ and everything based on GtkGrid breaks. You work around that in your source code, but distributions do not release new versions of your program until its next release.

In the meantime, Gtk+ breaks ABI compatibility for mouse wheel scrolling. Distributions update that and your program ceases to work with mouse wheels. You work around that in your source code, but distributions do not release new versions of your program until its next release.

In the meantime, Gtk+ breaks ABI compatibility of scrolled windows. Windows that used to have sane sizes now have near-zero size and when distributions update Gtk+, users of your application are not impressed. You work around that in your source code, but distributions do not release new versions of your program until its next release.

In the meantime, Gtk+ breaks ABI compatibility of redrawing. Parts of the gui that used to render correctly now stop updating at all. When distributions update Gtk+, your program ceases to work. You work around that in your source code, but distributions do not release new versions of your program until its next release.

Somewhere in the middle of this, Ubuntu decides to break scrollbars using a Gtk+ plugin. Your first hint that this has happened is when Ubuntu users start filing bug reports.

In the meantime, the layout rules for GtkGrid change again. When distributions update Gtk+, your program looks awful. You work around that in your source code, but distributions do not release new versions of your program until its next release.

Your program works with multiple screens. Or rather, it used to work with multiple screens. Then Gtk+ dropped support for it without notice.

Now I hear we are in for another round of breaking rendering because of some Wayland deficiency. It sounds like something that will require a runtime version check to deal with. In the meantime, if any distribution ships with updated Gtk+ but without your program updated, well, things will be broken.
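For what it is worth, a runtime version check could look roughly like the sketch below. It is self-contained and does not call Gtk+ at all: in a real application the running version would presumably come from gtk_get_major_version(), gtk_get_minor_version(), and gtk_get_micro_version(). The struct, the function names, and the 3.9.0 threshold are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* A library version as three components.  In a real program the values
 * would be filled in from gtk_get_major_version() and friends. */
struct version {
    unsigned major, minor, micro;
};

/* Return 1 if "have" is the same as or newer than "want", else 0. */
static int version_at_least(const struct version *have,
                            const struct version *want)
{
    assert(have != NULL && want != NULL);
    if (have->major != want->major)
        return have->major > want->major;
    if (have->minor != want->minor)
        return have->minor > want->minor;
    return have->micro >= want->micro;
}

/* Hypothetical decision: use the workaround code path when running
 * against a library at or past a (made-up) 3.9.0 threshold. */
static int needs_rendering_workaround(const struct version *running)
{
    const struct version threshold = { 3, 9, 0 };
    return version_at_least(running, &threshold);
}
```

The point of checking at run time rather than compile time is that the binary may well have been built against an older Gtk+ than the one the distribution later installs.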

* * *

The sum of all the above is that your application will have serious issues for anyone using the distribution supplied binary. And it is not because of anything you did wrong!

How does one shield oneself from this, i.e., how does one ensure that the binary compiled (say) three years (or months) ago continues to work reasonably? I don’t know. As far as I know, Gtk+ does not support parallel installations of 3.0, 3.2, …; if Gtk+ does support it, then none of the distributions do it. I’m sure it would be painful. Note, that using static copies of Gtk+ is not a viable solution because the binaries are created by distributions. They really have no way of knowing what Gtk+ version to use for any given application and they probably would not like to deal with the security implications of static linking.

(Note: the time ordering of the above is probably off here and there. There are probably also more ABI breaks that I do not remember right now.)

No, I am the CADT

Sorry, Luis, I am the CADT. I believe you have your timing wrong.

At the time, bugs.gnome.org was run out of some server Miguel had set up in Mexico. It was some buggy, early version of Debian’s bug system that rolled over and died when someone shipped binary data. I.e., all the time.

It was also low on disk space. Consequently, in order to keep it running, I wrote scripts to mass close (and therefore let expire) thousands of bugs. It was that or not having a running bug system. Owen Taylor was most unhappy about the expiration (can’t really fault him) and, I believe, brought in the current Bugzilla-based system served by Red Hat.

There was something about screensaver bugs having jwz’s name on them that caused him to get more than his fair share of the resulting emails. I forget the details of that.

Writing Tests is Humbling

I have spent some time recently writing import/export test cases for Gnumeric. It is what you do when you see a mistake that it should not have been possible to make, in this case a hang when writing certain strings to the obsolete xls/biff7 format.

Writing these tests has been a very humbling experience. Highly recommended.

A lot of the code being subjected to tests is quite old: 10-15 years old. You would think that by now it would have had any obvious bugs beaten out of it. No such luck. Not only were there ancient bugs (such as the direction of diagonal borders being flipped on load+save) but there were also bugs accidentally introduced when fixing other things.

Bugs happen, even where you think you are being careful. And that is not a problem. The problem arises when the bugs are not caught and make it into releases. This is where a brutal test suite is needed. Our test suite clearly was not evil enough for import and export of various formats, so I have been adding something I call round-trip tests for ods, xls/biff7, xls/biff8, and xlsx.

A round-trip test checks that when we convert to a given format and back to our own Gnumeric format, nothing changes. Or, if something does change, then only parts we understand change. If the format is deficient, there is not much we can do. For example: ods cannot store patterned background or the sheet size; xlsx cannot store solver parameters; xls/biff7 cannot store arbitrary unicode; and xls/biff8 has a fixed sheet size of 64k-by-256. Note, that by itself a round-trip test does not guarantee that what we produce is correct. We could, hypothetically, have swapped division and multiplication and still gotten a perfect round-trip. To test that the generated files are correct one has to load the resulting sheet in Excel or LibreOffice (which, despite claims to the contrary, are what really defines xls/xlsx and ods formats). Unfortunately, I do not know how to script that so it is not automatic.
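The idea can be illustrated with a toy model. This is not Gnumeric's test code: the cell structure, the "value;bold" text format, and all function names below are invented for the sketch. The made-up format cannot store the background pattern, so that loss is declared as expected, and any other difference fails the test.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy cell; a real spreadsheet model is far richer. */
struct cell {
    double value;
    int bold;
    int pattern;   /* background pattern; the toy format cannot store it */
};

enum { DIFF_VALUE = 1, DIFF_BOLD = 2, DIFF_PATTERN = 4 };

/* Export to a made-up text format "value;bold" that drops the pattern. */
static int toy_export(const struct cell *c, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%.17g;%d", c->value, c->bold);
    return n > 0 && (size_t)n < len;
}

/* Import from the made-up format; the pattern comes back as 0. */
static int toy_import(const char *buf, struct cell *c)
{
    c->pattern = 0;
    return sscanf(buf, "%lf;%d", &c->value, &c->bold) == 2;
}

/* Bit mask of the fields that differ between two cells. */
static unsigned cell_diff(const struct cell *a, const struct cell *b)
{
    unsigned d = 0;
    if (a->value != b->value)     d |= DIFF_VALUE;
    if (a->bold != b->bold)       d |= DIFF_BOLD;
    if (a->pattern != b->pattern) d |= DIFF_PATTERN;
    return d;
}

/* The round trip passes when every difference is an expected one. */
static int round_trip_ok(const struct cell *orig, unsigned expected_loss)
{
    char buf[64];
    struct cell back;

    assert(orig != NULL);
    if (!toy_export(orig, buf, sizeof buf) || !toy_import(buf, &back))
        return 0;
    return (cell_diff(orig, &back) & ~expected_loss) == 0;
}
```

Calling round_trip_ok with DIFF_PATTERN as the expected loss accepts a patterned cell, while passing 0 flags the same cell as a failure. As noted above, a pass only shows that export and import agree with each other, not that the intermediate file is right.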

As a result of all the new tests, the recently released Gnumeric 1.12.12 should interact better with other spreadsheets.

Note: some of these new tests were probably a decade overdue. My excuse is that The Gnumeric Team is fairly small. I do hope that LO/OO already have an evil test suite, but I am not optimistic. I ran a few of my test sheets through LO and saw things like truncated strings.

Spreadsheets and the Command Line

Spreadsheets are not the most obvious type of document to manipulate from the command line. They are essentially a visual tool meant for interactively exploring data. Experience has shown, however, that there are certain spreadsheet tasks for which the command line is very useful and Gnumeric supplies a number of command line tools for this.

  • ssconvert converts spreadsheets from one format to another, for example from xls to ods. That sounds fairly simple — load one, save the other — and it pretty much is although there are things such as merging several files or extracting parts of files that add a little complication. Since Gnumeric can save as pdf, this tool also allows command-line printing of spreadsheets.
  • ssgrep is to spreadsheets what grep is to text files. And it has about the same set of options.
  • ssindex is used by things like tracker and beagle to find pieces of text in spreadsheet files.
  • ssdiff is a new tool in the upcoming release. It compares two spreadsheets and outputs a list of differences between them. There are three output modes so far: (1) a text format, (2) an xml format, and (3) a mode that outputs a copy of one of the input files with differing cells marked in neon yellow.

None of these programs are big: 300-1100 lines of C code, ssdiff being the largest only because of its three output modes.

Sorting Icons Theme Mess

In my long-running series on why themes are evil, I bring you the newest installment.

Consider the gtk stock icon GTK_STOCK_SORT_ASCENDING which is supposed to represent sorting elements to make them increasing according to some order, typically numerically or alphabetically. The icon for such an action is supposed to somehow convey what happens when it is pressed, all in, say, 24×24 pixels.

Take a look at different themes and how they implement the icon:

eog `find /usr/share/icons/ -print | grep sort-ascending`

This command will show you the icon images with some duplication due to multiple sizes.

Observations:

  • Some show an up-arrow, others show a down-arrow. Yet others show a diagonal arrow which isn’t as bad as it sounds because such arrows are annotated.
  • Some arrows have no annotations, some are annotated by “1..9”, and yet others are annotated by “a..z”.

Officially this is a mess[tm]. When annotations are present they hint at either numerical or alphabetical ordering which may or may not match what the application does. That’s minor. But when no annotations are present, the situation is far worse: my sort-ascending button looks like someone else’s sort-descending simply because of theme differences!

I don’t know how this mess came about, but it ought to be resolved. I suggest that when the icons look like vertical arrows, sort-ascending should point down because the elements of a list will then be increasing in the direction of the arrow.

Hunting Leaks in GTK+ Applications

Hunting leaks in GTK+ applications used to be fairly simple: you would run your application under Purify (or, later, Valgrind) and the leak reports would pretty much tell you where to go plug.

That was a long time ago. In the meantime, GTK+ has gotten more complex over its iterations with more caches, more inter-object links, and deliberately-unfreed objects. On top of that, Valgrind and Purify are not particularly well suited for finding the cause of leaks: by design they will tell you the backtrace of the call that allocated the memory which was never freed. In a ref-counted world that information is often quite insufficient: the leaked widget was allocated by the gui builder. Oh goodie! What you really want to know is who holds the extra ref.

Enter the gobject debugger first introduced by Danielle here. After some major internal work, it has become mature.

I used this for Gnumeric, which was already one of the strictest leak-policed applications in GTK+ land. We leaked a number of GtkTreeModel/GtkListStore objects, for example. Easily fixed. Also, when touching print code or file choosers we leaked massively: one object per printer and/or two objects per file in the current directory. A sequence of bug reports (646815, 646462, 646446, 646461, 646460, 646458, 646457, and 645483) later, GTK+ is now behaving much better. All but the last of these have been fixed. With this, we are down to leaking about 20 Gtk-related objects: the recent-documents manager, im-module objects, the default icon factory, and theme engines. Basically stuff that GTK+ does not want to release.

Please try this on your applications, especially if they are long-running. I still have to kill gnome-terminal, banshee, metacity, and gvfsd from time to time when they grow to absurd sizes. That doesn’t have to be caused by GObject leaks, but it might very well. (I know some of these samples are obsolete; I am not naive enough to believe their replacements would fare any better.)

This might be a good time to remind people that g_object_get and gtk_tree_model_get will give you a reference when you use them to retrieve GObjects. You need to unref when done. The problem is that it is not immediately clear from the g_object_get/gtk_tree_model_get call whether existing code is getting objects, so a certain knowledge of the code is needed.
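The ownership rule can be shown with a toy ref-counted object. This is deliberately not GLib code: every type and function below is a made-up stand-in, with toy_holder_get_child playing the role of g_object_get on an object-valued property and toy_unref playing the role of g_object_unref.

```c
#include <assert.h>
#include <stddef.h>

/* Toy ref-counted object; a stand-in for a GObject, not the GLib API. */
struct toy_object {
    int ref_count;
};

/* Something that owns a reference to a child object. */
struct toy_holder {
    struct toy_object *child;
};

static struct toy_object *toy_ref(struct toy_object *o)
{
    assert(o != NULL);
    o->ref_count++;
    return o;
}

static void toy_unref(struct toy_object *o)
{
    assert(o != NULL && o->ref_count > 0);
    o->ref_count--;
}

/* Mimics g_object_get for object-valued properties: the caller receives
 * a NEW reference and is responsible for dropping it. */
static struct toy_object *toy_holder_get_child(struct toy_holder *h)
{
    assert(h != NULL);
    return toy_ref(h->child);
}

/* Leaky: looks at the child and forgets to drop the reference. */
static int leaky_peek(struct toy_holder *h)
{
    struct toy_object *c = toy_holder_get_child(h);
    return c->ref_count > 0;          /* missing toy_unref(c) */
}

/* Careful: drops the reference it was handed. */
static int careful_peek(struct toy_holder *h)
{
    struct toy_object *c = toy_holder_get_child(h);
    int alive = c->ref_count > 0;
    toy_unref(c);                     /* the g_object_unref step */
    return alive;
}
```

Each call to leaky_peek leaves the count one higher than before, which is exactly the kind of extra ref that keeps an object alive long after everyone thinks it is gone.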