200 lines of C to make a duplicate file finder and remover with Libgda

To illustrate the use of virtual connections in Libgda, I wrote a small program of about 200 lines of code that can:

  • show all the duplicate files in a directory (based on a comparison of the actual file contents, not just an MD5 hash)
  • show the duplicates of a given file
  • show and delete the duplicates of a given file (though it should not be used on production machines, as there are probably still some bugs).

All of this is done by running some SQL code on a list of files exposed as a data model.
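To make "actual file contents comparison" concrete, here is a minimal sketch in plain C, independent of Libgda. It is not the code from the sample program (which does its work through SQL on the data model); the function name files_identical and the 4096-byte buffer size are assumptions chosen for illustration, and the sketch is only meant for regular files.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Libgda or of the sample program.
 * Returns 1 if both files have identical contents, 0 if they differ,
 * and -1 if either file cannot be opened or read. */
int files_identical(const char *path_a, const char *path_b)
{
    FILE *fa = fopen(path_a, "rb");
    if (fa == NULL)
        return -1;

    FILE *fb = fopen(path_b, "rb");
    if (fb == NULL) {
        fclose(fa);
        return -1;
    }

    unsigned char buf_a[4096], buf_b[4096];
    int result = 1;

    for (;;) {
        size_t na = fread(buf_a, 1, sizeof buf_a, fa);
        size_t nb = fread(buf_b, 1, sizeof buf_b, fb);

        if (ferror(fa) || ferror(fb)) {
            result = -1;            /* read error on either file */
            break;
        }
        /* For regular files a short read means end of file, so a
         * length mismatch here means the files differ in size. */
        if (na != nb || memcmp(buf_a, buf_b, na) != 0) {
            result = 0;
            break;
        }
        if (na == 0)
            break;                  /* both files ended together */
    }

    fclose(fa);
    fclose(fb);
    return result;
}
```

Comparing bytes directly avoids treating two different files as duplicates because of a hash collision; a hash or a file size check can still be used first to rule out most pairs cheaply.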

I thought it would be slow, but in the end it’s quite quick: it takes only 5s to search for duplicates in GLib’s 1000 (not compiled) source files and 9s for GTK+’s 3500 files. The duplicates there seem to be images from the documentation, but also, for example, gdk/win32/gdkspawn-win32.c and gdk/quartz/gdkspawn-quartz.c.

The code requires the SVN trunk version of Libgda and can be found in the http://svn.gnome.org/viewcvs/libgda/trunk/samples/DirDataModel directory. Make sure to read the README file for more information.

The next step is to write a small program to help repair an F-Spot database when the user has moved files around (as I should not have done).
