Thumbnail Followup (#2)

You may remember my posting regarding thumbnail caching.

The problem is that all applications using GnomeThumbnail will read the entries ~/.thumbnails/normal or ~/.thumbnails/large [and ~/.thumbnails/failed] synchronously, as the first thumbnail request is made, which means a notable time [more than a few seconds] for a few hundred MB of thumbnails. There is a long-standing bug report about that issue.

Why the cache, in the first place?

The idea was to keep around an in-memory list (a cache) of all available thumbnails, and additionally to store the previously-requested thumbnails in memory, as it is likely that previously-requested thumbnails are used again.

The problematic aspect of the cache is that ~/.thumbnails may contain hundrets of megabytes, and thus be very big. This implies that the cache refresh is very expensive.

First idea: Get rid of cache

My initial idea was to solve this by removing the in-memory cache altogether. While users reported that this solved their problems and didn’t cause any performance impact, this may not be generalized: Currently, the user-visible thumbnail loading time is dominated by the time to synchronously read the thumbs, not by the cache-lookup time. This will not be true anymore as we have asynchronous thumbnail loading in Nautilus [which is not written yet, but should heavily improve performance].

Second idea: Multi-threaded cache voodoo

So the next idea was to use a multi-threaded solution, which would refresh the cache in a worker thread, and circumvent it entirely during refresh when requesting/generating a thumbnail. While it sounds good, it gets really nasty as you realize that POSIX doesn’t specify what happens as a directory changes during readdir(). Assuming you refresh the cache from disk [worker thread], and at the same generate a new thumbnail without using the cache [main thread], you’ll end up with a modified file system and you cannot be sure about the validity of any entries you read – you’d have to reopen the directory and reread it entirely. This means no cache hits during thumbnail generation, i.e. if you open two directories simultanously with Nautilus and the thumbs for one are generated, and the thumbs for the others aren’t, you’ll get no cache hits at all. Maybe we could use file change notification, but it is platform-dependant and we don’t want to explicitly write code against Linux or some other UNIX and have gazillions of #ifdefs in the code for dnotify, inotify etc.

Third idea: (to-be-written?) on-demand in-memory file systems

I think the smartest solution [that happens to mean no work for us, i.e. just drop our cache, cf. first idea] involves finding an on-demand in-memory file system, which forms an ideal cache for many applications. Dear lazyweb, do you know of any FS that does the following:

  • Store the entire FS contents in a file if it’s not mounted
  • Load the entire FS directory structure into memory as it is mounted
  • Load the file contents into memory as it is requested
  • Store the file contents into memory and back into the image as it is written
  • Live in user-space, possibly using shared memory, as a bonus you’d tell it anytime to prefetch the FS structure at any given point [in out case as the session starts]

You’d just mount it to ~/.thumbnails and let the voodoo happen. Yes, this is also platform-dependant, but it can be implemented for all platforms and doesn’t make us depend on the platforms file system capabilities, falling back to a performance-reduced but not memory-intense behavior.

Thanks to the guys at the freenode ##c channel for the fruitful discussion, especially wobster for the POSIX and RAMFS hints.

Comments?