How much space are your thumbnails eating?

Thumbnails are created by applications and, thanks to a proposed draft specification, are shared among desktops. But that does not mean that every thumbnail stored in your home directory still serves the purpose it was created for. Some of them point to a file that no longer exists, some are broken images, and some were created by applications that do not respect the proposed draft.

Basically there are two sizes of thumbnails: normal (128×128 pixels) and large (256×256 pixels). Each thumbnail must contain at least two key/value pairs: one is the URI of the original file and the other is the last time the file was modified.
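
As a rough illustration of those two key/value pairs, here is a minimal sketch that pulls the text chunks out of a PNG thumbnail using only the standard library. It assumes the pairs are stored as uncompressed tEXt chunks under the keys Thumb::URI and Thumb::MTime, as the thumbnail specification describes; the function names are made up for this example, and the sketch does not verify chunk checksums or handle compressed text chunks.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_text_chunks(data):
    """Return the tEXt key/value pairs found in raw PNG bytes."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pairs = {}
    pos = len(PNG_SIGNATURE)
    # Each chunk is: 4-byte big-endian length, 4-byte type, body, 4-byte CRC.
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            # A tEXt body is "keyword NUL text", both Latin-1 encoded.
            key, _, value = body.partition(b"\x00")
            pairs[key.decode("latin-1")] = value.decode("latin-1")
        elif chunk_type == b"IEND":
            break
        pos += 8 + length + 4  # skip header, body and CRC (CRC is not checked)
    return pairs


def thumbnail_metadata(path):
    """Return (uri, mtime) as strings, or None for a missing key.

    Assumption: the keys are named Thumb::URI and Thumb::MTime.
    """
    with open(path, "rb") as handle:
        pairs = read_text_chunks(handle.read())
    return pairs.get("Thumb::URI"), pairs.get("Thumb::MTime")
```

A thumbnail for which thumbnail_metadata() returns None for either value does not carry the required pairs, and read_text_chunks() raises ValueError when the file is not a PNG at all.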

To get the file name of a thumbnail, an MD5 sum must be computed over the URI of the original file. If you move the file to a new location, the name of the thumbnail (and its metadata) must be updated as well.
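
The naming rule can be sketched in a few lines of Python. This is only an illustration: the helper names are invented here, the "normal" and "large" subdirectories are taken from the thumbnail specification rather than from this post, and the URI escaping done by urllib may differ in corner cases from what Nautilus produces.

```python
import hashlib
import os
from urllib.parse import quote


def file_uri(path):
    """Build a file: URI from an absolute path (escaping is an approximation)."""
    return "file://" + quote(path)


def thumbnail_name(uri):
    """File name of the thumbnail: hex MD5 of the URI plus '.png'."""
    return hashlib.md5(uri.encode("utf-8")).hexdigest() + ".png"


def thumbnail_paths(uri, cache_dir="~/.thumbnails"):
    """Where the normal and large thumbnails for this URI would live.

    Assumption: the cache has 'normal' and 'large' subdirectories.
    """
    base = os.path.expanduser(cache_dir)
    name = thumbnail_name(uri)
    return {size: os.path.join(base, size, name) for size in ("normal", "large")}


if __name__ == "__main__":
    old = file_uri("/home/user/Pictures/holidays/beach.jpg")
    new = file_uri("/home/user/Pictures/summer/beach.jpg")
    print(thumbnail_name(old))
    print(thumbnail_name(new))  # a different name for the very same picture
```

Because the name depends only on the URI, the same picture gets a completely different thumbnail name as soon as its path changes.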

When you delete a file through Nautilus, the file is moved to the Trash folder, so its thumbnail must be updated too. Nautilus does this right, which is good. But when you empty the Trash, only the original file is deleted, not the thumbnail, which is bad but easy to fix.

On the other hand, when you rename a folder, the next time you visit it (now under its new name) the thumbnails will be regenerated, because no thumbnail is associated with any of the new URIs. Now you have two thumbnails stored for the same file, but only one is valid. If you repeat this step often, your .thumbnails folder will get polluted with useless thumbnails.

Instead of renaming the folder, you can create a new folder, move the files there, and finally delete the old one. In this case Nautilus will not regenerate the thumbnails; it will update them correctly. At least at the first level of the hierarchy (I have not tested it thoroughly).

The worst case happens when the files are moved or deleted by an application that is not freedesktop compliant (or only partly compliant), let’s say the shell. The thumbnails associated with those files will not be updated or deleted (inotify to the rescue?).

The average normal thumbnail takes about 25 KB of space, while a large one takes about 75 KB. If you keep a lot of pictures over a long period of time (with all the file management involved), you have probably wasted a fair amount of space on useless thumbnails.

At least, I had. And I have the feeling that some other people have, too. A time to live for thumbnails was requested, as filed in Bugzilla bug #150483.

Instead of deleting all my old thumbnails, I prefer to delete only the useless ones (in the sense of my first paragraph). So I wrote a little script in Python (shorter than my comment) that estimates how much space I am wasting on useless thumbnails.
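
The script itself is not reproduced here, so the following is not its code, only a minimal sketch of the kind of check it performs. It assumes you already have, for each thumbnail, its size on disk plus the URI and modification time recorded in it; the category names and both function names are invented for this example.

```python
import os
from urllib.parse import unquote, urlparse


def classify(uri, recorded_mtime):
    """Classify a thumbnail by looking at the file its URI points to."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        return "external"   # e.g. sftp://..., cannot be checked locally
    path = unquote(parsed.path)
    try:
        info = os.stat(path)
    except OSError:
        return "orphan"     # the original file is gone
    if int(info.st_mtime) != int(recorded_mtime):
        return "outdated"   # the original changed after thumbnailing
    return "valid"


def summarize(entries):
    """Sum thumbnail sizes per category.

    entries: iterable of (thumbnail_size_in_bytes, uri, recorded_mtime).
    """
    totals = {}
    for size, uri, recorded_mtime in entries:
        category = classify(uri, recorded_mtime)
        totals[category] = totals.get(category, 0) + size
    return totals
```

Whether "external" or "outdated" thumbnails count as wasted space is a policy decision, so this sketch only reports the totals per category and leaves the deleting to you.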

17 Responses to “How much space are your thumbnails eating?”

  1. Jon Cooper says:

    Excellent idea – have tried running it myself, however, being a python noob…

    jcooper@m2:~$ python Desktop/thumbnail-checker.py
    Traceback (most recent call last):
      File "Desktop/thumbnail-checker.py", line 181, in ?
        locale.setlocale(locale.LC_ALL, lang_code)
      File "/usr/lib/python2.4/locale.py", line 381, in setlocale
        return _setlocale(category, locale)
    locale.Error: unsupported locale setting

    This is on a Dapper machine that should, in theory, have the python locales configured correctly. Any advice to offer?

    Thanks in advance if you do :)

    Jon

  2. zdzichu says:

    Nice! A tool for removing thumbnails is a lot better than
    find ~/.thumbnails/ -mtime +365 -exec rm {} +
    run from cron.

  3. Thomas says:

    In fact, using an MD5 sum here is probably overkill, because the only thing needed is to hash the URI with a low collision rate into a thumbnail filename of at most 255 chars. You don’t need to make it “hard” to guess the corresponding URI.

    Independently of the hash chosen, a nice idea would be to have two files for each thumbnail:
    - “[hashed-URI].thumb” containing the thumbnail itself
    - “[hashed-URI].URI” containing the URI of the file

    That would allow for some background tool that would clean up stale thumbnail files.
    The TTL value could depend on the URI type (e.g. longer for files that exist, shorter for HTTP URLs).

    Just food for thought…

  4. Franck says:

    That’s just amazing: according to your script I have 164 MB of orphan thumbnails! Which is probably true, because I have 315 MB in my .thumbnails directory…

    Could you just add a delete option to your script?

    By the way, you did a very good job.

    Franck

  5. Erik says:

    I also ran into the set locale problem and according to http://docs.python.org/lib/module-locale.html it looks like
    locale.setlocale(locale.LC_ALL, lang_code)
    should be
    locale.setlocale(locale.LC_ALL, (lang_code, encoding))
    or just localte.setlocate(localte.LC_ALL, locale.getdefaultlocalte())
    Both of these worked for me on Dapper. Other than that, nice tool. Of course it would be even better if there were a preview frame and a way to delete the thumbnails from within the app itself, but knowing no Python, I have no idea how to do this.

  6. Erik says:

    Yeah, and I can’t spell the word locale, t’s like to slip in there quite a bit…

  7. Troy says:

    There is a dialog in Comix which lets you clean up (part of) your thumbnail cache. Go to Edit->Manage Thumbnails.

    A package called Gofoto contains a file called thumbcache.py which creates the thumbnails. It works, but needs a bit of work to adhere to the proposal.

  8. locale.setlocale(locale.LC_ALL, "") is simpler and does the same thing (and is probably more reliable).

    I also would like a button to delete useless thumbnails.

  9. Erik says:

    So I went ahead and added in support for viewing and deleting thumbnails. Patch is located here:
    http://alumni.imsa.edu/~eryanv/thumbnail_patch
    Be nice, it’s the first Python programming I’ve ever done in my life, so I’m sure it violates lots of Python conventions.
    For deleting you can either select a row entry and use the delete key or just use the Remove All button.
    I’ve tried to do a moderate amount of testing on this, however I didn’t have any “thumbnails” of the External or Invalid type.

  10. Jon, I’m also using Ubuntu Dapper and it works for me :-P
    Anyway, I’ll fix it soon. Thanks,

  11. Thomas,

    In fact, the MD5 sum is calculated only for the URI, not for the original file. But if the URI changes, the filename of the thumbnail also changes.

  12. Erik, thanks for your patch. The UI needs some love, but it is good enough for a first try in PyGTK.

    “External” means any URI with a scheme other than ‘file:’, for instance fonts:///, burn:///, sftp:///, etc.

  13. Zack Cerza says:

    183MB wasted, 174 of which are orphans.

    The total size of ~/.thumbnails is 297MB.

  14. Hi Germán, very nice application, and thank you for the contribution you have made (and with a patch to delete them, even better).

    Other times I have had to delete the whole .thumbnails directory and let everything be generated again; now, with this application, I was able to delete 59 MiB of thumbnails for files that no longer exist :)

    Greetings from Argentina.

  15. Sven Neumann says:

    The script could be even shorter and perhaps would run faster if it was using libgimpthumb. But wait, there are no Python bindings for libgimpthumb yet. We would very much welcome a patch for pygimp that adds those (against CVS please, there have been lots of changes in pygimp lately).

  16. hungerburg says:

    the script works nicely, also after applying the patch (cleanly) – great, this needs more exposure!

  17. Boris says:

    GThumb used to have this feature, and I don’t know why it has been dropped.

    I often have a huge amount of pictures to rename/resize before burning, and I use shell scripts with imagick’s convert, but I often open folders with Nautilus to check. Thus my .thumbnails is *always* dirty, and I usually delete it by hand. Your tool is exactly what I needed :)

    BUT there should be some sort of policy for this folder… It shouldn’t be allowed to grow & eat all your disk space without notice.