Security From Whom?

Secure from whom? I was asked after my recent post questioning the positioning of Mir/Wayland as a security improvement.

Excellent question — I am glad you asked! Let us take a look at the whos and compare.

To take advantage of the X11 protocol issues, you need to be able to speak X11 to the server. Assuming you haven’t misconfigured something (ssh or your file permissions) so other users’ software can talk to your server, that means causing you to run evil X11 protocol code like XEvilTeddy. Who can do that? Well, there are probably a few thousand people who can. That is a lot, but most of them are application developers or maintainers who have to sneak the changes in via source form. That is possible, but it is slow, has a high risk of discovery, and has problems with deniability. And choosing X11 as a mechanism is just plain silly. Just contact a command-and-control server and download the evil payload instead. There are also a smaller number of people who can attack via binaries, either because distributions take binaries directly from them or because they can change and re-sign binary packages. That would mean your entire distribution is compromised and choosing the X11 attack is really silly again.

Now, let us look at the who of a side-channel attack. This requires the ability to run code on your machine, but it does not have to be code that can speak X11 to your X server equivalent. It can be sand-boxed code such as JavaScript even when the sand-box is functioning as designed. Who can do that? Well, anyone who controls a web server you visit; plus any adserver network used by such web servers; plus anyone buying ads from such adserver networks. In short, just about anyone. And tracking the origin of such code created by an evil advertiser would be extremely hard.

So to summarize: attacking the X11 protocol is possible by a relatively small group of people who have much better methods available to them; attacking via side-channel can be done by a much wider group who probably do not have better methods. The former threat is so small as to be irrelevant in the face of the second.

Look, it is not that I think of security in black and white terms. I do not. But if improved security is your motivation then looking at a Linux laptop and deciding that pouring man-decades into a partial replacement for the X server is what needs doing is a bad engineering decision when there are so many more important concerns, i.e., you are doing it wrong. And selling said partial X server replacement as a security improvement is at best misleading and uninformed.

On the other hand, if you are working on Mir/Wayland because that kind of thing floats your boat, then fine. But please do not scream “security!” when you break, say, my colour picker.


Recently, Matthew Garrett wrote about the abysmal X inter-app security situation. I.e., the total lack of a security situation. It came with an interesting proof-of-concept application, XEvilTeddy, demonstrating the ability to steal passwords and upload them elsewhere. Everybody knew such an application was possible; the interesting part was exhibiting one.

All good and fine, but one thing has been bothering me. Matthew wrote “if you’re using Snap packages on Mir (ie, Ubuntu mobile) then there’s a genuine improvement in security.” But is that really true?

Now, getting rid of X means that an application no longer can simply ask the X server to get all the keystrokes and that would seem to be an obvious improvement in security. It is, however, only an actual improvement in security if it is the only way of getting the keystrokes. It is not.

Recent years have seen a slew of side-channel attacks on, say, gpg. For example, see here and here. Basically, the cpu leaks information about the program it is running in the form of timing, current use, sound(!), electromagnetic radiation, etc. Some of these are observable from another process on the same machine, others from a laptop in the next room. If there is a direction in the field, my take on it is that attacks running on the same machine are considered a bit too easy nowadays.

It is hard to avoid side-channel leakage. gpg gets hardened every time an attack is discovered, but (say) firefox and gtk+ almost certainly leak like crazy.

“But such an attack is hard,” I hear you say. Maybe, but I do not think so. The thinking used to be that exploiting overflow of stack-based variables was hard, but all it took was one explanatory article and that cat was out of the sack.

If I was not such an incurably lazy person I would create XReallyEvilTeddy to demonstrate this. I am, so I have not. But it would be naïve to believe such applications do not exist. And it would therefore be naïve to believe that Mir and Wayland really do have better security.


We learn from Matthias that the right way to describe what happened with recent Gtk+ releases is that it changed.

And provided you are thinking of source code, that is not an unreasonable nomenclature: before it worked one way, now it works a different way — it changed. And source code that has to interact with Gtk+ used to do it one way, but now needs to do it another way — it needs to change.

But what if you are thinking of binaries? That is, existing, already-distributed binaries sitting on users’ machines. With the installation of the new Gtk+, such binaries changed from working to non-working. Such a binary evidently needs to change itself. Now, I have been known to prefer to make changes by editing binaries directly (interestingly, arguably thereby turning the binary into source code in the eyes of the GPL) but it is generally not a convenient way of making changes and as a Gnumeric developer I do not expect my users to do this. So how are the binaries on users’ machines going to change from non-working to working? I have no means of reaching users. I can and I will release changed source code, but binaries from that will not reach users anytime soon. Change is not a reasonable description for this; break is. Gtk+ broke Gnumeric. Again. And note, that some of the changes appear to be completely gratuitous.

Emmanuele is rather adamant that these changes were happening to API that was pre-announced to be unstable. I think he is mistaken in the sense that while it might have been decided that this API was unstable, I do not think it was announced. At least I do not seem to be able to find it. Despite prodding, Emmanuele does not seem to be able to come up with a URL for such an announcement, and certainly not an announcement in a location directed at Gtk+ application writers. It may exist, but if it does then it is not easy to find. I looked in the obvious places: The API documentation was not changed to state that the API was subject to change. The release announcements were not changed to state that the API was subject to change. No message was sent to the application development mailing list warning that the API was subject to change. Sitting around a table and agreeing on something is not an announcement. If you want to announce something to application developers then you need to use a channel or channels aimed at application developers.

The situation seems to lend itself to Douglas Adams quotes. I have already used the destruction-of-Earth situation, so here is the earlier one involving the destruction of Arthur Dent’s house:

“But the plans were on display...”
“On display? I eventually had to go down to the cellar to find them.”
“That’s the display department.”
“With a flashlight.”
“Ah, well the lights had probably gone.”
“So had the stairs.”
“But look, you found the notice didn’t you?”
“Yes,” said Arthur, “yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard’.”

Another GTK+ ABI Break

It is a familiar situation: a distribution updates Gtk+ to a supposedly-compatible version and applications, here Gnumeric, break.

This time I am guessing that it is incompatible changes to widget theming that render Gnumeric impossible to use.

I would estimate that this has happened 15-20 times within the GTK+ 3.x series. One or two of those are probably Gnumeric accidentally relying on a GTK+ bug that got fixed, but the vast majority of cases is simply that perfectly fine, existing code stops working.

Imagine the C library changing the behaviour of a handful of functions every release. I suspect GTK+ maintainers would be somewhat upset over that. Nevertheless, that is what is presented to GTK+ application writers.

The question of whether GTK+ applications can be written remains open with a somewhat negative outlook.

Floating-Point Accuracy For Scaling Numbers

Formulas used in the computation of special functions often contain constant factors that cannot be precisely represented in floating point formats such as the typical “double”. For example, the base-10 log function:

double log10 (double x)
{
  return log(x)/log(10);
}

Chances are that you should be using the library version of this function; I am just using this simple function for illustration.

Several things affect how accurate a result we will get out of this formula. Most importantly, we need an accurate log function. That is provided by the C library (in glibc, at least). Here, however, I want to talk about the last part, namely the scaling by log(10).

Clearly log(10) is irrational, so it will not have an exact representation as a double. In fact, this constant is just about a worst case for double. Its value is 2*1.00100…10101|1000001… (where “|” marks the 53-bit cutoff for double), i.e., the value is just a hair above the midpoint between the two nearest representable numbers. If we use that value, we must live with it having a relative error of about 9.4e-17.

But we’re scaling with that number and there are two ways of doing that: dividing by the value directly or multiplying by its inverse 1/log(10). The latter value, when computed with higher accuracy than double allows, has the value (1/4)*1.10111…01110|0011001… which gives us a relative representation error of 2.5e-17, i.e., only about a quarter of the error in the direct case.

double log10 (double x)
{
  /* 1/log(10); the compiler rounds this literal to the nearest double */
  static const double l10i = 0.4342944819032518276511289;
  return log(x) * l10i;
}

In practice this gives log10 several extra correct bits. I noticed this when using test values for the GNU Scientific Library’s complex log10 function for testing Gnumeric’s ditto.
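The two representation errors quoted above can be checked with a short sketch. It assumes a platform where long double carries more mantissa bits than double (true for x86 Linux, not for every compiler); where the two types coincide, both errors simply come out as zero. The helper names are made up for illustration, and the reference digits are the usual decimal expansions of ln(10) and 1/ln(10).

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Reference values with more digits than long double can hold. */
#define LN10_REF      2.302585092994045684017991454684364L
#define INV_LN10_REF  0.434294481903251827651128918916605L

/* Relative error incurred by rounding a positive reference value to double.
   Hypothetical helper, not part of any library. */
long double
rel_repr_error (long double ref)
{
  assert (ref > 0);
  long double err = (long double)(double)ref - ref;
  if (err < 0)
    err = -err;
  return err / ref;
}

/* Print both errors so they can be compared with the figures in the text. */
void
report_scaling_errors (void)
{
  printf ("log(10):   %.3Le\n", rel_repr_error (LN10_REF));
  printf ("1/log(10): %.3Le\n", rel_repr_error (INV_LN10_REF));
}
```

Calling report_scaling_errors () from main on a machine with an extended long double should print values close to the 9.4e-17 and 2.5e-17 mentioned above. The same helper can be pointed at any other constant by swapping in its decimal expansion.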

I cannot possibly be the first to look into this, but I don’t recall ever having read about it. I have done the analysis for a set of selected constants and the results are:

For these constants, the direct value is best: pi, EulerGamma, log10(2).

For these constants, the inverse value is best: e, log(2), log(10), sqrt(5), sqrt(pi), sqrt(2pi).

For these constants, it’s a tie: sqrt(2) and sqrt(3). Any integer or half-integer power of two will cause a tie because the mantissa of the power and its inverse will be identical.
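The tie for sqrt(2) can be seen directly: 1/sqrt(2) equals sqrt(2)/2, and halving a double only changes the exponent. A minimal sketch, assuming the compiler rounds decimal literals correctly (the variable and function names are made up for illustration):

```c
#include <assert.h>
#include <stdio.h>

/* Decimal expansions of sqrt(2) and 1/sqrt(2); the compiler rounds
   each one to the nearest double. */
const double sqrt2     = 1.41421356237309504880168872420969808;
const double inv_sqrt2 = 0.70710678118654752440084436210484904;

/* Returns 1 when small is exactly half of big, i.e. the two doubles
   share the same mantissa bits and differ only in the exponent. */
int
same_mantissa_up_to_factor_two (double big, double small)
{
  assert (big > 0 && small > 0);
  return small * 2.0 == big;   /* multiplying by 2 is exact */
}
```

Since the mantissas are identical, both constants are rounded by the same relative amount, so neither way of scaling has an advantage.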

Note, that the decision is tied to “double”. For “long double” or “float” the results are going to be different.

ODF Plus Ten Years

It’s time for another five-year update on ODF for spreadsheets. Read the initial post from 2005 and the 2010 update for context. Keep in mind that I only have an opinion on ODF for spreadsheets, not text documents.

TL;DR: Better, but ODF still not suitable for spreadsheets.

So what’s new? Well, basically one thing: we now have a related standard for formulas in ODF spreadsheets! This is something that obviously occurred 5-10 years too late, but better late than never. The Wikipedia article on OpenFormula is a fairly amusing example of the need to justify and rationalize mistakes that seems to surround the OpenDocument standard.

OpenFormula isn’t bad as standards go. It has a value system, operators, and a long list of functions, for example. Nice. Where it does have problems is in the many choices it allows implementations. For example, it allows a choice whether logical values are numbers or their own distinct type. That would not have been necessary if spreadsheets had been considered in the original standard; at that time OO could have bitten the bullet and aligned with everyone else.

Back to the standard proper. What has happened in the past five years? In a word, nothing. We still have a standard whose aim was to facilitate interoperability, but isn’t achieving it.

There are actually two flavours of the standard: strict and extended. “Strict” has a well-defined syntax complete with an xml schema. Extended is strict with add-your-own tags and attributes. No-one uses strict because there are common things that cannot be represented using it. Error values, for example. A simple line graph with a regression line and a legend, for example.

When the Gnumeric team needs to add something outside “strict” we first look to see if, say, LO has already defined a syntax we can use. We only invent our own when we have to and we try to read any LO extension that we can.

The OO/LO approach, however, appears to be to ignore any other producer and define a new extension. This is part of the “ODS by definition is what we write” mindset. The result is that we end up with multiple extensions for the same things.

So extensions are a free-for-all mess. In fact it is so big a mess that the schema for Gnumeric’s extensions that was hacked up a week ago appears to be the first. Let me rephrase that: for the past ten years no-one in the ODS world has been performing even basic document validation on the documents produced. There are document checkers out there, but they basically work by discarding anything non-strict and validating what is left.

There are also inherent performance problems with ODF. Many spreadsheets contain large areas of identical formulas. (“Identical” does not mean “textually identical” in ODF syntax but rather in the R1C1 syntax where “the cell to the left of this” always has the same name.) ODF has no concept of shared formulas. That forces reparsing of different strings that produce identical formulas over and over again. Tens of thousands of times is common. That is neither good for load times nor for file sizes.
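To illustrate what “identical in R1C1 syntax” means, here is a small sketch that prints a relative reference the R1C1 way. It is not Gnumeric’s code; the function name and the zero-based coordinates are assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the R1C1-style relative reference from the cell at (row, col)
   to the cell at (ref_row, ref_col) into buf.  Coordinates are
   zero-based.  Hypothetical helper for illustration only. */
void
relative_r1c1 (int row, int col, int ref_row, int ref_col,
               char *buf, size_t len)
{
  int dr = ref_row - row;
  int dc = ref_col - col;
  char rpart[32] = "R";
  char cpart[32] = "C";

  /* A zero offset is written as a bare "R" or "C". */
  if (dr != 0)
    snprintf (rpart, sizeof rpart, "R[%d]", dr);
  if (dc != 0)
    snprintf (cpart, sizeof cpart, "C[%d]", dc);
  snprintf (buf, len, "%s%s", rpart, cpart);
}
```

A reference to A1 from B1 and a reference to A2 from B2 both come out as RC[-1], so a column of formulas like =A1+1, =A2+1, ... is one formula in this notation even though every string in the file is different.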

A more technical problem with ODF is that the size of the sheet is not stored. One consequence is that you can have two different spreadsheets that compute completely different things but save to identical ODF files. At least one of them will be corrupted on load. That is mostly a theoretical concern, but the lack of size information also makes it harder to defend against damaged (deliberately or otherwise) input. For example, if a file says to colour cell A12345678 red we have no way of telling whether you have a damaged file or a very tall spreadsheet.
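The following sketch shows the kind of check a stored sheet size would make possible. The parser, the function names and the limits passed in are all hypothetical; the point is only that the reader needs something to pass as max_cols and max_rows, and ODF does not supply it.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Parse an A1-style cell name such as "A12345678" into zero-based
   column and row numbers.  Returns false for malformed or absurdly
   large input.  ASCII letters are assumed. */
bool
parse_cell_name (const char *s, long *col, long *row)
{
  long c = 0, r = 0;

  if (!isalpha ((unsigned char)*s))
    return false;
  for (; isalpha ((unsigned char)*s); s++) {
    if (c > 100000)              /* arbitrary cap, avoids overflow */
      return false;
    c = c * 26 + (toupper ((unsigned char)*s) - 'A' + 1);
  }

  if (!isdigit ((unsigned char)*s) || *s == '0')
    return false;
  for (; isdigit ((unsigned char)*s); s++) {
    if (r > 100000000L)          /* arbitrary cap, avoids overflow */
      return false;
    r = r * 10 + (*s - '0');
  }

  if (*s != '\0')
    return false;
  *col = c - 1;
  *row = r - 1;
  return true;
}

/* True when the named cell lies inside a sheet of the given size. */
bool
cell_in_sheet (const char *name, long max_cols, long max_rows)
{
  long col, row;
  return parse_cell_name (name, &col, &row)
    && col < max_cols && row < max_rows;
}
```

With a 65536-row sheet, cell_in_sheet ("A12345678", 256, 65536) rejects the reference as damage; with a 16777216-row sheet the same reference is perfectly valid. Without the size in the file, the loader cannot tell which case it is looking at.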

Gnumeric continues to support ODF, but we will not be making it the primary format.

Strace Service Message

Just a service reminder: application writers should run their applications under strace from time to time.

I just did for Gnumeric and discovered that on startup we were creating a large number of files in /tmp like this:

open("/tmp/gdkpixbuf-xpm-tmp.XAXESX", O_RDWR|O_CREAT|O_EXCL, 0600) = 9

I tracked this down to embedding icons in xpm format. The gdk-pixbuf loader for xpm is unable to load from memory so it creates a temporary file and loads from that. Ick! The solution is to fix and deploy the loader (impractical), not use xpm (possible), or to use preprocess='to-pixdata' when embedding.

How Does One Create A Gtk+ Application?

How does one go about creating a working Gtk+ application? Is it even possible?


TL;DR: The Gtk+ ABI is broken so often that distribution-supplied binaries rarely work.

* * *

Imagine it is the time when Gtk+ 3.0 was released. You have a beautiful application with no bugs. For the sake of argument. Distributions ship it pre-compiled and life is good.

Then distributions update Gtk+ and everything based on GtkGrid breaks. You work around that in your source code, but distributions do not release new versions of your program until its next release.

In the meantime, Gtk+ breaks ABI compatibility for mouse wheel scrolling. Distributions update that and your program ceases to work with mouse wheels. You work around that in your source code, but distributions do not release new versions of your program until its next release.

In the meantime, Gtk+ breaks ABI compatibility of scrolled windows. Windows that used to have sane sizes now have near-zero size and when distributions update Gtk+, users of your application are not impressed. You work around that in your source code, but distributions do not release new versions of your program until its next release.

In the meantime, Gtk+ breaks ABI compatibility of redrawing. Parts of the gui that used to render correctly now stop updating at all. When distributions update Gtk+, your program ceases to work. You work around that in your source code, but distributions do not release new versions of your program until its next release.

Somewhere in the middle of this, Ubuntu decides to break scrollbars using a Gtk+ plugin. Your first hint that this has happened is when Ubuntu users start filing bug reports.

In the meantime, the layout rules for GtkGrid change again. When distributions update Gtk+, your program looks awful. You work around that in your source code, but distributions do not release new versions of your program until its next release.

Your program works with multiple screens. Or rather, it used to work with multiple screens. Then Gtk+ dropped support for it without notice.

Now I hear we are in for another round of breaking rendering because of some Wayland deficiency. It sounds like something that will require a runtime version check to deal with. In the meantime, if any distribution ships with updated Gtk+ but without your program updated, well, things will be broken.
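A runtime version check differs from the compile-time GTK_CHECK_VERSION macro in that it looks at the library actually loaded, which is what matters for an already-distributed binary. A minimal sketch of the comparison, kept free of Gtk+ so it stands alone: in a real Gtk+ 3 program the first three arguments would come from gtk_get_major_version (), gtk_get_minor_version () and gtk_get_micro_version (). The function name and the 3.16 threshold used below are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when the running version major.minor.micro is at least
   req_major.req_minor.req_micro.  Hypothetical helper. */
bool
version_at_least (unsigned major, unsigned minor, unsigned micro,
                  unsigned req_major, unsigned req_minor,
                  unsigned req_micro)
{
  if (major != req_major)
    return major > req_major;
  if (minor != req_minor)
    return minor > req_minor;
  return micro >= req_micro;
}
```

An application would then branch on something like version_at_least (maj, min, mic, 3, 16, 0) and pick the old or the new workaround accordingly. That only helps binaries built after the break was known, of course.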

* * *

The sum of all the above is that your application will have serious issues for anyone using the distribution supplied binary. And it is not because of anything you did wrong!

How does one shield oneself from this, i.e., how does one ensure that the binary compiled (say) three years (or months) ago continues to work reasonably? I don’t know. As far as I know, Gtk+ does not support parallel installations of 3.0, 3.2, …; if Gtk+ does support it, then none of the distributions do it. I’m sure it would be painful. Note, that using static copies of Gtk+ is not a viable solution because the binaries are created by distributions. They really have no way of knowing what Gtk+ version to use for any given application and they probably would not like to deal with the security implications of static linking.

(Note: the time ordering of the above is probably off here and there. There are probably also more ABI breaks that I do not remember right now.)

No, I am the CADT

Sorry, Luis, I am the CADT. I believe you have your timing wrong.

At the time, the bug system was run out of some server Miguel had set up in Mexico. It was some buggy, early version of Debian’s bug system that rolled over and died when someone shipped binary data. I.e., all the time.

It was also low on disk space. Consequently, in order to keep it running, I wrote scripts to mass close (and therefore let expire) thousands of bugs. It was that or not having a running bug system. Owen Taylor was most unhappy about the expiration (can’t really fault him) and, I believe, brought in the current Bugzilla-based system served by Red Hat.

There was something about screensaver bugs having jwz’s name on them that caused him to get more than his fair share of the resulting emails. I forget the details of that.