January 2006 – Christian Neumair

The following analysis is wrong. The callback organization is done in a not very obvious way and not very well documented but does NOT POSE ANY RISK OF LOSING DATA! DON’T PICK UP THIS STORY!

This is really a horrific story about design glitches which should remind you of double-checking that imperative programming forces you to take extremely care of what you’re doing.

GnomeVFS has an async file operation API, which ensures that your application does not block until the operation is finished. It offers you two callback hooks, one of them is called “synchronous” and the other one “asynchronous”. since 2001, the code works like:
For some transfer phases, both the sync and the async callbacks are called, and for others, whether only the sync callback is called or both depends on whether the async call was already called within a particular period of time. The callbacks may flag what further action should be taken in the Xfer process. Its interpretation depends on the overall state and progress of the xfer machinery.

It turns out that the original idea probably was that the sync callback is used for the really important stuff, being called on every single change of the xfer state machine, while the async callback was mainly meant to be used for expensive user interface updates. Unfortunately, the code is used in a way (cf. call_progress_often_internal and call_progress_uri) which does incorporate both the async and the sync callback’s return value, but the async callback is called after the sync callback, and so takes precedence over the sync retval. Calling the async callback after the sync one may make sense in some situations, like informing the user that something just changed in the sync callback, but the fact that the sync callback’s return value is overwritten by the async callback’s retval can really be a problem, considering that – as proven by call_progress_often_internal – it can’t always be predicted inside the sync callback whether the async callback will be called right afterwards or not.

In short, to work around this design glitch, you’ll be forced to either use only a sync or an async callback, or use both and let the async callback be aware of the sync callback’s last retval through some internal variable, so that it can overwrite the callback’s retval if desired. Notice that each time the async callback is called, the sync callback was called right before, so either the sync retval can be returned again or some intentionally override value.

Shockingly, the current Nautilus Xfer code does all its error handling in the async callback, which is not aware of the sync callback’s response, resulting in random behavior depending on the time since the last async invocation and the actual XFer state.

With some good luck, the async code is called each time the transfer does something important, but the GnomeVFS XFer copy_file, copy_directory code and some more rather important ones use the callback invocations which only have the async callback invocated if it wasn’t done already within a particular period, which is probably due to performance considerations.

Under some circumstances the async code is not invoked, and the Nautilus sync code blindly returns 0 in some situations where it is absolutely not desirable.
No error handling is done in Nautilus’ sync callback, it’s response does not depend on the state of the state machine, and whether its return value is used or not depends on the period since the last invocation.

Conclusions
a) GnomeVFS has design flaws
b) Nautilus has design flaws
c) fixing a partially broken or unintuitive API concept is very hard if not impossible, even if the API itself is powerful
d) Data loss is no good

Consequences
a) improve GnomeVFS docs, maybe change async/sync callback handling
b) fix Nautilus by making the async callback sync-aware and moving important stuff into the sync callback (done with some luck, needs testing)
c) write proper GnomeVFSXfer API documentation (TODO)

Update

I’m not so sure whether I got the whole sync/async process right anymore. According to the GnomeVFSProgressCallbackState, the async callback is called “periodically every few hundred miliseconds and whenever user interaction is needed”. I don’t like that architecture at all, and am inclined to modify it, so that the user always has to specify a sync callback, and the async callback would be optional with its retval being ignored.

Maybe Christian Kellner also was right some months ago when he concluded that a new GnomeVFS async file operation API is needed.

Month: January 2006

studies, USB mounts

When things go really bad…and nobody notices it for years

Debian/GFDL – could they do better?