Reblogged from the Lanedo GmbH blog
Every 6 months or so we produce a new stable release, and Tracker 1.2 introduces some exciting new work. For those who don’t know Tracker, it is a semantic data storage and search engine for desktop and mobile devices. Tracker is a central repository of user information that provides two big benefits for the user: data shared between applications, and information that is related to other information (for example, mixing contacts with files, locations, activities and so on).
Providing your own data
Earlier in the year a client came to Lanedo and to the community asking for help integrating Tracker into their embedded platforms. What did they want? Well, they wanted to take full advantage of the Tracker project’s feature set, but they also wanted to use it on a bigger scale, not just for local files or content on removable USB keys. They wanted to be able to seamlessly query across all devices on a LAN and any cloud content that was plugged into Tracker. This is not too dissimilar to the gnome-online-miners project, which has similar goals.
Before Tracker 1.2.0, files and folders came by way of a GFile and GFileInfo, which were found using the GFileEnumerator API that GLib offers. Underneath all of this, the GFile* interfaces are backed by the GLocalFile* classes, which make the system calls (like lstat()) to crawl the file system.
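To make the crawling step concrete, here is a minimal sketch in Python rather than GLib's C API. It is not Tracker's code: the `crawl` function is a hypothetical stand-in that walks a directory tree and collects the same pieces of information (path, mtime, whether the entry is a directory) using an lstat()-style call that does not follow symlinks.

```python
import os
import stat


def crawl(root):
    """Yield (path, mtime, is_dir) for everything below root.

    Hypothetical illustration only; Tracker itself does this in C
    through GFileEnumerator.
    """
    pending = [root]
    while pending:
        directory = pending.pop()
        with os.scandir(directory) as entries:
            for entry in entries:
                # follow_symlinks=False gives lstat() semantics.
                info = entry.stat(follow_symlinks=False)
                is_dir = stat.S_ISDIR(info.st_mode)
                yield entry.path, info.st_mtime, is_dir
                if is_dir:
                    # Descend into real directories only, never symlinks.
                    pending.append(entry.path)
```

Calling `list(crawl("/some/dir"))` returns one tuple per file or folder found below that directory.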
Why do we need this? Well, on top of TrackerCrawler (which calls the GLib API) sit TrackerFileNotifier and TrackerFileSystem, which essentially report content up the stack (and ignore other content depending on rules). The rules come from a TrackerIndexingTree class, which knows what to blacklist and what to whitelist. On top of all of this is TrackerMinerFS (which is now inaccurately named), which handles the queues and processing of ALL content. For example, DELETED event queues are handled before CREATED event queues. It also gives status updates, handles INSERT retries when the system is busy, and so on.
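The queue ordering described above can be sketched as follows. This is an illustration in Python, not TrackerMinerFS itself: the `EventQueues` class and its method names are hypothetical, and the only ordering taken from the text is that DELETED is drained before CREATED.

```python
from collections import deque

# Assumed priority order; only "DELETED before CREATED" comes from the post.
PRIORITY = ("DELETED", "CREATED")


class EventQueues:
    """Hypothetical per-event-type queues drained in priority order."""

    def __init__(self):
        self._queues = {kind: deque() for kind in PRIORITY}

    def push(self, kind, uri):
        self._queues[kind].append(uri)

    def pop(self):
        """Return the next (kind, uri), or None when every queue is empty."""
        for kind in PRIORITY:
            if self._queues[kind]:
                return kind, self._queues[kind].popleft()
        return None
```

Even if a CREATED event arrives first, a later DELETED event is still handed out before it, while events of the same type keep their arrival order.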
To make sure that we take advantage of existing technology and process information correctly, we have to plug in at the level of the TrackerCrawler class.
Essentially we have a simple interface, TrackerDataProvider, for handling the open and close cases when iterating a container (or directory), with a TrackerFileDataProvider implementation for the default, existing local file system case.
That is followed up with an interface for enumerating that container (or directory). It is called TrackerEnumerator, and of course there is a TrackerFileEnumerator class that implements the functionality that existed previously.
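The split between a provider that opens and closes a container and an enumerator that walks it can be sketched like this. It is a Python analogy, not the libtracker-miner C API: `DataProvider`, `Enumerator`, `Entry`, `InMemoryProvider`, `crawl_provider` and the `cloud://` URLs are all hypothetical names invented for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    mtime: float
    is_container: bool = False


class Enumerator(ABC):
    @abstractmethod
    def next(self):
        """Return the next Entry, or None when the container is exhausted."""


class DataProvider(ABC):
    @abstractmethod
    def open(self, url):
        """Start iterating a container and return an Enumerator."""

    @abstractmethod
    def close(self, enumerator):
        """Finish iterating and release any resources."""


class ListEnumerator(Enumerator):
    def __init__(self, entries):
        self._entries = iter(entries)

    def next(self):
        return next(self._entries, None)


class InMemoryProvider(DataProvider):
    """Stand-in for a cloud backend: a dict mapping container URL to entries."""

    def __init__(self, tree):
        self._tree = tree
        self.open_count = 0

    def open(self, url):
        self.open_count += 1
        return ListEnumerator(self._tree.get(url, []))

    def close(self, enumerator):
        self.open_count -= 1


def crawl_provider(provider, root):
    """Walk every container reachable from root using only the two interfaces."""
    found, pending = [], [root]
    while pending:
        url = pending.pop()
        enumerator = provider.open(url)
        try:
            while (entry := enumerator.next()) is not None:
                child = f"{url}/{entry.name}"
                found.append((child, entry.mtime))
                if entry.is_container:
                    pending.append(child)
        finally:
            provider.close(enumerator)
    return found
```

The crawler never touches the file system directly, so swapping `InMemoryProvider` for a local-file or network-backed provider would not change `crawl_provider` at all, which is the point of plugging in at this level.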
So why not just implement our own GFile backend and make use of existing interfaces in GLib? Actually, I did look into this but the work involved seemed much larger and I was conscious of breaking existing use cases of GFile in other classes in libtracker-miner.
How do I use it?
So now it’s possible to provide your own data provider implementation for a cloud-based solution to feed Tracker. But what are the minimum requirements? Well, Tracker requires a few things to function. Those include providing a real GFile and GFileInfo with an adequate name and mtime. The libtracker-miner framework requires the mtime to check whether there have been updates compared to the database. The TrackerDataProvider-based implementation is given as an argument when the TrackerMiner object is created, and is called by the TrackerCrawler class when indexing starts. The locations that will be indexed by the TrackerDataProvider are given to the TrackerIndexingTree, and you can use the TRACKER_DIRECTORY_FLAG_NO_STAT flag for non-local content.
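To show why the mtime matters, here is a minimal sketch of the update check. It is written in Python with hypothetical names (`needs_update`, `stored_mtimes`) and assumes that any difference between the stored and reported mtime triggers reindexing; the exact comparison libtracker-miner performs is not described in this post.

```python
def needs_update(stored_mtimes, uri, current_mtime):
    """Decide whether a resource should be (re)indexed.

    stored_mtimes is a hypothetical stand-in for the database:
    a dict mapping URI to the mtime recorded at the last index.
    """
    stored = stored_mtimes.get(uri)
    # Unknown resources are new; known ones are reindexed on any mtime change.
    return stored is None or stored != current_mtime
```

A data provider that reports no mtime, or a constant one, would leave a check like this unable to notice changed content, which is why the mtime is a minimum requirement.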
Crash aware Extractor
In Tracker 1.0.0, the Extractor (the ‘tracker-extract’ process), which is used to extract metadata from files, was changed to be passive. Passive means that the Extractor only extracts content from files already added to the database. Before that, content from the Extractor was concatenated with that of the file system miner and inserted into the database collectively.
Sadly with 1.0.0, any files that caused crashes or serious system harm resulting in the termination of ‘tracker-extract’ were subsequently retried on each restart of the Extractor. In 1.2.0 these failures are noted and files are not retried.
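The idea of noting failures so they are not retried can be sketched as below. This is only an illustration in Python: the post does not say how tracker-extract records failed files, so the `FailureLog` class and its JSON file are assumptions, not Tracker's mechanism.

```python
import json
import os


class FailureLog:
    """Hypothetical persistent record of files that crashed the extractor."""

    def __init__(self, path):
        self._path = path
        self._failed = set()
        if os.path.exists(path):
            with open(path) as fh:
                self._failed = set(json.load(fh))

    def mark_failed(self, uri):
        """Record the failure on disk so it survives an extractor restart."""
        self._failed.add(uri)
        with open(self._path, "w") as fh:
            json.dump(sorted(self._failed), fh)

    def should_extract(self, uri):
        return uri not in self._failed
```

After a restart, a fresh `FailureLog` opened on the same path still knows which files to skip, so a problematic file is attempted once rather than on every start.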
Thanks to work from Bastien Hadess, a number of extractors have been added for electronic books and comic books. If your format isn’t supported yet, let us know!
Updated Preferences Dialog
Often we get questions like:
- Can Tracker index numbers?
- How can I disable indexing file content?
To address these, the preferences dialog has been updated with another tab called “Control”, which allows users to change options that existed previously but were not presented in a user interface.
In addition to this, changing an option that requires a reindex or restart of Tracker will prompt the user upon clicking Apply.
What else changed?
Of course there were many other fixes and improvements besides the things mentioned here. For the full list, see the release announcement.
Looking for professional services?
If you or someone you know is looking to make use of Open Source technology and wants professional services to assist in that, get in touch with us at Lanedo to see how we can help!
What does “indexing numbers” mean? From that dialog I would have absolutely no idea what checking/unchecking that box would do.