What’s new in Tracker 1.2?

Reblogged from Lanedo GmbH. blog

Every 6 months or so we produce a new stable release, and Tracker 1.2 introduces some exciting new work. For those who don’t know Tracker, it is a semantic data storage and search engine for desktop and mobile devices. Tracker is a central repository of user information that provides two big benefits for the user: data shared between applications, and information that is related to other information (for example, mixing contacts with files, locations, activities, etc.).

Providing your own data

Earlier in the year a client came to Lanedo and to the community asking for help integrating Tracker into their embedded platforms. What did they want? Well, they wanted to take full advantage of the Tracker project’s feature set, but they also wanted to be able to use it on a bigger scale, not just for local files or content on removable USB keys. They wanted to be able to seamlessly query across all devices on a LAN and cloud content that was plugged into Tracker. This is not too dissimilar to the gnome-online-miners project, which has similar goals.

The problem

Before Tracker 1.2.0, files and folders came by way of a GFile and GFileInfo which were found using the GFileEnumerator API that GLib offers. Underneath all of this the GFile* relates to GLocalFile* classes which do the system calls (like lstat()) to crawl the file system.

Why do we need this? Well, on top of TrackerCrawler (which calls the GLib API) are TrackerFileNotifier and TrackerFileSystem; these essentially report content up the stack (and ignore other content depending on rules). The rules come from a TrackerIndexingTree class, which knows what to blacklist and what to whitelist. On top of all of this is TrackerMinerFS (which is now inaccurately named), which handles queues and processing of ALL content. For example, DELETED event queues are handled before CREATED event queues. It also gives status updates, handles INSERT retries when the system is busy, and so on.
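As a rough illustration of the queue ordering just described, the sketch below drains DELETED events before CREATED ones. It is a shell sketch of the idea only, not TrackerMinerFS code: the order_events function and the one-event-per-line "EVENT path" input format are assumptions made up for this example.

```shell
# Hypothetical input: one "EVENT path" pair per line on stdin.
# This only illustrates the ordering; it is not how TrackerMinerFS is implemented.
order_events() {
        events=$(cat)
        # Drain the DELETED queue first, then the CREATED queue.
        # Arrival order is preserved within each queue.
        for queue in DELETED CREATED; do
                printf '%s\n' "$events" | grep "^$queue " || true
        done
}

# Example:
#   printf 'CREATED /a\nDELETED /b\nCREATED /c\n' | order_events
# lists "DELETED /b" first, followed by the two CREATED lines.
```

Handling deletions first means that stale entries are gone before new content for the same locations is processed.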

To make sure that we take advantage of existing technology and process information correctly, we have to plug in at the level of the TrackerCrawler class.

The solution

Essentially we have a simple interface, called TrackerDataProvider, for handling the open and close cases when iterating a container (or directory), along with a TrackerFileDataProvider implementation for the default (existing) local file system case.

That is followed up with an enumerator interface for enumerating that container (or directory). It is called TrackerEnumerator, and of course there is a TrackerFileEnumerator class that implements the functionality that existed previously.

So why not just implement our own GFile backend and make use of existing interfaces in GLib? Actually, I did look into this but the work involved seemed much larger and I was conscious of breaking existing use cases of GFile in other classes in libtracker-miner.

How do I use it?

So now it’s possible to provide your own data provider implementation for a cloud-based solution to feed Tracker. But what are the minimum requirements? Well, Tracker requires a few things to function: a real GFile and a GFileInfo with an adequate name and mtime. The libtracker-miner framework requires the mtime to check whether there have been updates compared to the database. The TrackerDataProvider-based implementation is given as an argument when the TrackerMiner object is created, and is called by the TrackerCrawler class when indexing starts. The locations that will be indexed by the TrackerDataProvider are given to the TrackerIndexingTree, and you can use the TRACKER_DIRECTORY_FLAG_NO_STAT flag for non-local content.
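To make the mtime requirement a little more concrete, here is a minimal shell sketch of the kind of comparison involved. The needs_update function and its arguments are hypothetical, and treating any difference between the stored and current mtime as a change is an assumption for illustration, not a description of libtracker-miner internals.

```shell
# needs_update STORED_MTIME CURRENT_MTIME
# Both values are seconds since the epoch; STORED_MTIME is empty when the
# item is not in the database yet. (Hypothetical helper for illustration.)
needs_update() {
        stored=$1
        current=$2
        # Items the database has never seen always need indexing.
        [ -z "$stored" ] && return 0
        # Assumption: any difference in mtime counts as an update.
        [ "$stored" -ne "$current" ]
}

# Example:
#   if needs_update "" 1400000000; then echo "index it"; fi
```

A data provider for cloud content that cannot supply a meaningful mtime would leave a check like this with nothing to compare, which is why the mtime is part of the minimum requirements.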

Crash aware Extractor

In Tracker 1.0.0, the Extractor (the ‘tracker-extract’ process), which extracts metadata from files, was changed to be passive. Passive means that the Extractor only extracts content from files already added to the database. Before that, content from the Extractor was concatenated with content from the file system miner and inserted into the database collectively.

Sadly with 1.0.0, any files that caused crashes or serious system harm resulting in the termination of ‘tracker-extract’ were subsequently retried on each restart of the Extractor. In 1.2.0 these failures are noted and files are not retried.
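The idea of noting failures so that files are not retried can be sketched in shell as follows. This is only an illustration of the concept: the extract_once wrapper, the FAILED_LIST file and the printed status words are all hypothetical, and the real bookkeeping in tracker-extract may work quite differently.

```shell
# Hypothetical file that remembers which inputs made the extractor fail.
FAILED_LIST=${FAILED_LIST:-./failed-files.txt}

# extract_once FILE COMMAND [ARGS...]
# Runs COMMAND on FILE unless FILE already failed in an earlier run.
extract_once() {
        file=$1
        shift
        # Skip anything already recorded as a failure.
        if [ -f "$FAILED_LIST" ] && grep -Fxq -e "$file" "$FAILED_LIST"; then
                echo "skipped $file"
                return 0
        fi
        if "$@" "$file" > /dev/null 2>&1; then
                echo "extracted $file"
        else
                # A non-zero exit (including death by signal) is noted for next time.
                printf '%s\n' "$file" >> "$FAILED_LIST"
                echo "failed $file"
        fi
}
```

Calling extract_once /tmp/bad.pdf false twice reports a failure the first time and a skip the second time, which is the behaviour change between 1.0.0 and 1.2.0 in miniature.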

New extractors?

Thanks to work from Bastien Hadess, a number of extractors have been added for electronic books and comic books. If your format isn’t supported yet, let us know!

Updated Preferences Dialog

Often we get questions like:

  • Can Tracker index numbers?
  • How can I disable indexing file content?

To address these, the preferences dialog has been updated to provide another tab called “Control” which allows users to change options that have existed previously but not been presented in a user interface.

[Screenshot: tracker-preferences-1.2]

In addition to this, changing an option that requires a reindex or restart of Tracker will prompt the user upon clicking Apply.

What else changed?

Of course there were many other fixes and improvements in addition to the things mentioned here. For a full list, see the announcement.

Looking for professional services?

If you or someone you know is looking to make use of Open Source technology and wants professional services to assist in that, get in touch with us at Lanedo to see how we can help!

tracker-search gets colour & snippets!

Recently Carlos added FTS4 and snippet support to Tracker. We merged that to master after doing some tests and have reduced the database size on disk by doing this. I released 0.15.2 yesterday with the FTS4 work, and today I decided to add a richer experience to tracker-search.

Below you can see me searching for passport and sue found in some of the documents indexed on my machine. The colour there is quite nice to separate hits and snippets/contexts where the terms were found. This search without any arguments really will search ALL resources in the database:

[Screenshot: tracker-search-snippets]

This second screenshot shows a search for love restricted to music. So you can use this for all areas of tracker-search:

[Screenshot: tracker-search-snippets2]

With any luck, we will be releasing 0.16.0 in time for the next GNOME release with all of this available!

Tracker Needle with improved tagging

Given there have been a number of improvements to tracker-needle recently, I thought I would make a video to highlight some of them. A quick summary:

  • Searching for “foo” now finds files tagged with “foo”
  • Searches are limited to 500 items per category/query (to avoid abusing the GtkTreeView mainly)
  • A tag list is now available to show all hits by tags
  • Tags can be edited by the context menu per item (planned to be improved later)

Really nice to have tagging supported properly in tracker-needle now.

Improved Tracker Preferences for Indexed Locations

Something I have been meaning to do for a long time is to update the preferences dialog for Tracker so that it is easy to add locations which are special user directories (as per the GUserDirectory locations).

I wanted to do this in such a way that:

  • It was really easy to toggle locations as recursive or not
  • The file chooser was only necessary for non-standard locations
  • Better use of the space was made by integrating the two previously separate lists for single directory and recursive directory indexing
  • I could fix a few issues which had been reported when it came to saving using the special symbols (e.g. &DESKTOP for G_USER_DIRECTORY_DESKTOP, etc.) when one or more user directories evaluated to the same location

The result is this (now in master and 0.12.2 when it is released):

Tracker extensions for Firefox & Thunderbird

Recently Adrien Bustany blogged about the Firefox extension for Tracker and has yet to blog about his Thunderbird extension work.

As you would expect, the Firefox extension syncs bookmarks to Tracker (in that direction only for now) and the Thunderbird extension sends email to Tracker to be indexed (even full text content of emails which our Evolution miner doesn’t do because of the system stress it causes). This is really quite superb work from Adrien and tracker-needle already supports bookmarks and emails so it all just works after a make install (into the $prefix where Firefox/Thunderbird are installed). Currently the Thunderbird extension requires version >= 5.0 (works with betas too), and the Firefox extension requires version >= 4.0 (and supports 5.0).

These works have been imported using a pretty cool tool after I felt more comfortable using that to import Adrien’s subtrees into Tracker’s git repository. I did read up on the coolest merge ever from Linus but it felt more like a hack to me to do it that way. Still, I guess Linus knows what he is doing 🙂

So now we have both plugins imported with full history into git. The thunderbird branch was merged to master today and the firefox branch will be merged this week hopefully pending Adrien’s review. Great stuff!

GtkSearchEngineTracker

GTK+ has had support for Tracker for a while as a backend search engine used in the GtkFileChooser. At GUADEC this year, the Tracker team were asked to update the backend at the GTK+ team meeting. I found time this week to add support and push my changes to the tracker-with-libtracker-sparql branch.

For now, I have dropped support for all older versions of Tracker because it really is a mess to maintain and GTK+ 3.0 should really be using the latest and greatest APIs anyway. The other change I made was to support searching by filename rather than by file content. There is a #define in the .c file (FTS_MATCHING) which allows switching between FTS (Full Text Search) and filename matching (filenames are usually part of an FTS search anyway). For me, finding a file based on the name itself seems more intuitive for the GtkFileChooser and tends to yield the results I am really looking for better than FTS matching does. In most cases, I don’t want to find a file based on some content when choosing a file. I would appreciate any comments on this.

A demonstration of the new functionality:

Tracker Needle

What is it?
In short it is a replacement for tracker-search-tool. I know the name is a tad lame, but I used that for lack of a better alternative at this point.

Why?
We have quite a few people asking for things which are simply not available in tracker-search-tool and I wanted a good excuse to learn Vala. So, tracker-needle was born out of my desire to learn Vala.

Where?
Currently, it lives in a branch on GNOME’s Tracker GIT repository. I am hoping we can merge this to master before we start doing stable releases (which is going to be quite soon).

What next?
Well, right now, it is fairly basic and I plan to add more polish and more “views”. I have in mind to add some sort of photo/icon view for images only and perhaps also some sort of category chooser type view. Possibly some way of displaying and setting tags nicely would be good too. Any ideas appreciated. Note, the idea here isn’t to replace Nautilus or Zeitgeist, this is purely a tool for users to try Tracker with and use occasionally. Ultimately many of the Tracker team believe Tracker should be integrated with applications, not a separate application to search and I tend to agree.

Video?
Eye candy for anyone interested 🙂

Tracker: branches branches branches

Recently, there has been so much work going into Tracker master. For a while now we have been averaging between 1 and 2 branches a week being merged into master. So I thought I would highlight some of the sweet work going into Tracker at the moment:

Dropping libinotify
For some years, we have been using an imported version of libinotify in our source tree to do the things not available in GIO’s monitoring API. One of the main reasons we didn’t move to GIO’s API was that the model we were using didn’t fit the model GIO used. In Tracker, if you monitored a directory and it moved to another location, we moved the monitor to that location. With GIO, if you monitor a directory it doesn’t move, which makes sense. Thanks to Aleksander Morgado, we have now merged his drop-inotify branch into master. It is so nice to be able to remove that imported library now.

D-Bus with file descriptors
We are always trying to reduce the memory footprint of Tracker. Recently Adrien Bustany finished implementing support for DBUS_TYPE_UNIX_FD in Tracker. The nice thing about this is that we no longer copy masses of memory from one place to another just to push data between two processes. Adrien and Philip have previously blogged about this, but more recently, Adrien completed the work by also implementing it for the communication between tracker-miner-fs and tracker-extract. Effectively, the same data is transported between those two as between tracker-miner-fs and tracker-store, with the difference that tracker-store also receives file-specific information appended to the SPARQL message (like size, modified dates, etc.).

To use this you need D-Bus 1.3.1. It is nice to see this sort of performance improvement in Tracker. Great work Adrien, thanks!

Direct access
Bastien reported a bug not so long ago about adding support for direct access to the databases via a library API. This week, we started a branch to get this work under way. While we do this, we are considering re-writing the libtracker-client API using Vala and improving the old API substantially.

Git branch management
Due to the high number of branches we create, I decided to do some sort of clean up. I created a script to list all the branches and relevant information about them to be able to email the mailing list and check if everyone was happy with removing old branches. I thought this might be useful to other projects. Here is the script I used:

#!/bin/sh
# List every remote branch (except master) with the age and author of its last commit.

# Bail out early when not run inside a git repository.
if ! git rev-parse --git-dir > /dev/null; then
        echo "This is not a git directory"
        exit 1
fi

# The remote defaults to "origin" unless one is given as the first argument.
remote=${1:-origin}

# git ls-remote prints "<commit><TAB><ref>" per line; read splits the two fields.
git ls-remote "$remote" | while read -r commit name; do
        if [ -z "$name" ]; then
                continue
        fi

        case $name in
        refs/heads/master)
                continue
                ;;
        refs/heads/*)
                # Keep only the last path component of the ref as the branch name.
                shortname=$(echo "$name" | sed 's@.*/@@')
                if ! git log --max-count=1 --pretty=format:"Branch '$shortname' -- last commit was %ar by %an (%h)" "$commit" 2>/dev/null; then
                        echo
                        echo "Your checkout doesn't contain commit $(echo "$commit" | cut -c1-7) for branch $shortname"
                        echo
                        exit 1
                fi
                ;;
        esac
done

This produces output like:

Branch 'album-art-to-libtracker-extract' -- last commit was 3 months ago by Martyn Russell (d1f1384)
Branch 'albumart-quill' -- last commit was 8 months ago by Philip Van Hoof (a397a0f)
Branch 'anonymous-file-nodes' -- last commit was 5 months ago by Carlos Garnacho (60658be)
Branch 'async-queries' -- last commit was 2 months ago by Carlos Garnacho (88358dd)
Branch 'async-queries-due' -- last commit was 10 weeks ago by Jürg Billeter (52634ce)
...

Thanks to Sven Herzberg for some of the improvements to the original script, most importantly the use of git ls-remote. This makes sure we do not list local branches that may already have been removed from the origin repository.
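For anyone adapting the script, the following sketch isolates the parsing step so it can be tried without a repository. git ls-remote prints one commit and one ref per line, separated by a tab; the list_heads function name is made up for illustration, and this is a simplified variant rather than a replacement for the script above.

```shell
# list_heads: read "git ls-remote" style lines on stdin and print
# "<branch> <short-commit>" for every branch head except master.
# (Hypothetical helper; it does not call git itself.)
list_heads() {
        while read -r commit name; do
                case $name in
                refs/heads/master)
                        continue
                        ;;
                refs/heads/*)
                        # Strip the prefix but keep any slashes in the branch name.
                        printf '%s %s\n' "${name#refs/heads/}" "$(printf '%s' "$commit" | cut -c1-7)"
                        ;;
                esac
        done
}

# Example:
#   git ls-remote origin | list_heads
```

One design difference worth noting: the sed expression in the script keeps only the last path component of a ref, whereas this variant keeps the full branch name, which matters if branch names contain slashes.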

Tracker + Totem

Bastien has been complaining that the Tracker plugin for Totem doesn’t work any more since 0.6. So I decided to see how quickly I could update it today. All in all, it only took me a few hours and here it is. You will have to excuse the crappy file naming and video tests I have to play with – normal users probably title these a bit better I think 🙂

On another note, we released 0.7.1 on Friday gone with some really nice fixes since the first release. We plan on doing another release this Friday too.