I’m currently working on Tracker to enable connectivity with remote services like Flickr, Facebook, Twitter, Identi.ca etc. The aim of this work is to keep in Tracker all the metadata concerning your online data to allow presenting all your documents in a unified way, no matter where they’re stored. Actually, we can describe pretty much anything using RDF, so we can index photos, messages, documents, friendship connections, etc.

For those who don’t know how Tracker works (the others can safely skip this paragraph), we basically have a central RDF store (tracker-store) and a set of programs to import data into the store. These programs are called miners. Currently, Tracker ships with two miners, one for files and one for applications.
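To make the miner idea a bit more concrete, here is a minimal sketch of what a web miner does once it has fetched one item from a remote service: it turns the item into an RDF description, expressed as a SPARQL update that could be sent to the store. The function names and the dictionary layout are hypothetical, and the choice of `nmm:Photo`, `nie:title` and `nie:url` from the Nepomuk ontologies is my assumption about how a photo would be described, not the actual miner code.

```python
def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def photo_to_sparql(photo: dict) -> str:
    """Build a SPARQL INSERT describing one remote photo.

    `photo` is a hypothetical dict with "url" and "title" keys, standing in
    for whatever a service API returns. The ontology terms are assumptions.
    """
    url = photo["url"]
    title = escape_literal(photo["title"])
    return (
        "INSERT {\n"
        f"  <{url}> a nmm:Photo ;\n"
        f'    nie:title "{title}" ;\n'
        f'    nie:url "{url}" .\n'
        "}"
    )


if __name__ == "__main__":
    # Example item, as a miner might receive it from a photo service.
    print(photo_to_sparql({
        "url": "http://example.org/photos/1",
        "title": "Sunset",
    }))
```

The sketch only builds the update string; actually sending it to tracker-store is left out on purpose, since that part depends on the client library you use.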

As part of this year’s GSoC, I introduced a new class of miners, the web miners. As you can guess, those miners connect to some popular social sites to import data into Tracker. Most of the popular so-called social websites provide an API which allows accessing their data in a pretty comprehensive way. However, not all of them share the same license terms when it comes to using the information you get from them.

For the miners, Flickr, Twitter, Identi.ca and PicasaWeb are not problematic: we can download the information and keep it in the database. For Facebook however, it’s not so simple. The problem is section III 2. of their policy, which states that “You must not give data you receive from us to any third party, including ad networks”. While it’s pretty clear that you don’t want your data sold to ad networks, I think this also prevents us from “redistributing” the data via Tracker to other software that would access the RDF database, which would make the Facebook miner violate the terms of use.

Of course, I’m no legal expert and may be misreading the Facebook policy. In the meantime, we’ll disable the Facebook miner by default when it is merged into Tracker master.