Archive for February, 2006

ioctl, fsync – how to flush block device buffers?

Saturday, February 11th, 2006

Maybe somebody with deep kernel knowledge could help me to sort out the following problem I mentioned some time ago (bug report):

When ejecting USB mass storage, the volume monitor notifies the file manager that an unmount happened, although the USB stick is not yet ejected, but just unmounted. The difference in this particular case is that the buffers on the USB stick are still not yet flushed, i.e. not all data has definitly been written to the stick.

Because of the said notification, the USB stick icon is removed from the screen and the user is suggested that it is OK to remove the stick, although the buffer is not yet flushed and the “eject” command wasn’t even run.

We have multiple ways of resolving this:

a) block unmount signal, i.e. delay it until the whole ejection process is over. Problems: The ejection runs in a different thread, which does not know anything about the volume, but just about its URI. We’d have to introduce mutex-protected hash tables and some extra glue code. Not particularly attractive.

b) tell the underlying operating (linux in my case) that it should flush the storage buffer before the unmount takes place, thus ensuring a clean unmount experience.
Unfortunately, I was unable to figure out the right syscall, although I fiddled around with the kernel for a very notable amount of time.

candidate I: fsync(filedes)

POSIX claims:

“The fsync() function shall request that all data for the open file
descriptor named by fildes is to be transferred to the storage device
associated with the file described by fildes.”

Oh well, it is blocking and returns, but /dev/foo isn’t flushed, no matter what parameter combination I try out. I wonder whether the interpretation of “transferred to the storage device” should be interpreted as of “data is consistent” or as of “data was piped through some connection”.

candidate II: sync()

Works, flushes buffers for all block devices. Problematic, though: It will sync even unrelated block devices, which may be a huge problem with many devices, maybe unmounted concurrently (at least to the user).

candidate III: ioctl(filedes, BLKFLSBUF, 0)

This one sounds promising. When investigating the issue I hoped this would work since e2fsprogs also uses this. However, I was wrong. While according to sys/mount.h it is part of “(t)he read-only stuff” the actual kernel code checks for the user having admin caps, and sets errno to EACCESS even if the user has rw permissions to the block device file.

For non-generic block devices, patches were submitted by others as mentioned in a comment in the said bug report, which reduce the amount of permissions required to flush a buffer. I wonder whether there is a traditional way of dealing with unflushed buffers in UNIX, because this has a lot to do with permission models which is where things get compilicated.

It would really be nice to have a simple portable way of achieving what I want: flushed buffers for a distinct block device.

Anybody? :)