Tracker Needle Update

What is it? Why? Where?
These questions were all covered in my previous blog post about tracker-needle.

What’s changed
Well, a number of small changes have gone in, based on user requests and on making sure all of tracker-search-tool's old features are covered. Specifically:

  • History is saved using a nice new editable combo box
  • Tags can now be seen (though that's about it for now; more work is needed here)
  • A progress spinner has been added for queries that take longer than the user might expect
  • Support for emails has been added

You can see all these new features in the latest video:

What’s next?
I would love to get tags working better, allowing tags to be added, removed and searched by. Searching by tags in itself isn't hard; what is hard is making the search categories include tags and associating them with files. It can be done, but doing it quickly is not quite so trivial. I already have a patch in progress for this.

Tracker Needle

What is it?
In short, it is a replacement for tracker-search-tool. I know the name is a tad lame, but I used it for lack of a better alternative for now.

Why?
Quite a few people have been asking for things which are simply not available in tracker-search-tool, and I wanted a good excuse to learn Vala. So tracker-needle was born.

Where?
Currently, it lives in a branch in GNOME's Tracker Git repository. I am hoping we can merge it into master before we start doing stable releases (which is going to be quite soon).

What next?
Well, right now it is fairly basic, and I plan to add more polish and more "views". I have in mind adding some sort of photo/icon view for images only, and perhaps also a category chooser type of view. Some way of displaying and setting tags nicely would be good too. Any ideas are appreciated. Note that the idea here isn't to replace Nautilus or Zeitgeist; this is purely a tool for users to try Tracker with and use occasionally. Ultimately, many of the Tracker team believe Tracker should be integrated into applications rather than being a separate application for searching, and I tend to agree.

Video?
Eye candy for anyone interested 🙂

Tracker: Direct Access branch merged in master

A while back, Bastien filed bug 613255, "Read-only, non-DBus, store access". We have been working on this for the past 5 or 6 weeks. Initially the idea was just to add direct access, but once we got started, we realised that the libtracker-client API wasn't really good enough and we wanted to extend it. We didn't want to keep the old API around either, so we came up with a new library, libtracker-sparql, to supersede libtracker-client. For now we ship both, but all functions in libtracker-client are now marked as deprecated.

So what do we have now? Fundamentally we have ONE API for different backends using different technologies. To summarise:

1. D-Bus (libtracker-bus, backend)
2. Direct Access (libtracker-direct, backend)

D-Bus – Read/Write Access
Depending on the D-Bus version, we either use FD passing (requires D-Bus 1.3.1 or later) to avoid copious memory copies, or we fall back to dbus-glib marshalling, which gives the worst performance you can get from Tracker (though it is still usable).

Direct Access – Read Only Access
This is built on libtracker-data, a library that used to be internal to Tracker. To make this happen we merged some things into it (like libtracker-db), but generally, libtracker-direct sits on top of libtracker-data.

Plugins?
The backends are dynamically loaded at run time depending on the client’s needs (i.e. if you only ever do SELECT type queries, you’ll use the direct-access backend).
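As a rough illustration of that idea (not how libtracker-sparql actually does it), here is a minimal C sketch of how a client could decide which backend a query needs. The `Backend` enum and `choose_backend()` are hypothetical names; the sketch assumes the read-only query forms are SELECT, ASK, CONSTRUCT and DESCRIBE, and sends everything else to the D-Bus backend.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical backend identifiers */
typedef enum {
    BACKEND_DIRECT, /* read-only: query the database in-process */
    BACKEND_BUS     /* read/write: go through tracker-store over D-Bus */
} Backend;

static const char *skip_space(const char *s) {
    while (isspace((unsigned char) *s))
        s++;
    return s;
}

/* Case-insensitive: does s start with the whole keyword kw? */
static int starts_with_keyword(const char *s, const char *kw) {
    size_t i;

    for (i = 0; kw[i] != '\0'; i++) {
        if (tolower((unsigned char) s[i]) != tolower((unsigned char) kw[i]))
            return 0;
    }
    /* "SELECTED" must not count as "SELECT" */
    return !isalnum((unsigned char) s[i]);
}

Backend choose_backend(const char *sparql) {
    static const char *const read_only_forms[] = { "SELECT", "ASK", "CONSTRUCT", "DESCRIBE" };
    const char *p = skip_space(sparql);
    size_t i;

    /* Skip a prologue such as "PREFIX ex: <http://example.org/>" */
    while (starts_with_keyword(p, "PREFIX") || starts_with_keyword(p, "BASE")) {
        const char *end = strchr(p, '>');
        if (end == NULL)
            return BACKEND_BUS; /* malformed: let the store report the error */
        p = skip_space(end + 1);
    }

    for (i = 0; i < sizeof read_only_forms / sizeof read_only_forms[0]; i++) {
        if (starts_with_keyword(p, read_only_forms[i]))
            return BACKEND_DIRECT;
    }
    return BACKEND_BUS; /* INSERT, DELETE, etc. need write access */
}
```

Deciding per query keeps the cheap in-process path for the common case while anything that writes still goes through the single process that owns the database.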

How does the API look for libtracker-sparql?
The idea here was to cover everything the old API was used for, plus some new needs. We wanted less API bloat, and we wanted to bring into libtracker-sparql a number of things that were spread out across multiple libraries in the code base. These include:

  • Connections – used in libtracker-client, we wanted some common way to get a connection to Tracker regardless of what backend was in use.
  • Cursors – used in libtracker-db and wanted as a public API for some time, but not possible without WAL (Write Ahead Logging) in SQLite 3.7. Now we share the same API internally and externally.
  • Builder – used in tracker-extract for building SPARQL queries for selecting/inserting data.
  • Utilities – used in tracker-extract, the miners, etc. for escaping text used in SPARQL queries and some other common functionality.
  • Example
    So this is what you might expect with the new API:

    
    #include <glib/gi18n.h>
    #include <libtracker-sparql/tracker-sparql.h>
    
    static void
    print_writeback_classes (void)
    {
    	TrackerSparqlConnection *connection;
    	TrackerSparqlCursor *cursor;
    	GError *error = NULL;
    	const gchar *query = "SELECT ?class WHERE { ?class tracker:writeback true }";
    
    	connection = tracker_sparql_connection_get (&error);
    
    	if (!connection) {
    		g_printerr ("%s: %s\n", _("Could not establish a connection to Tracker"), error ? error->message : _("No error given"));
    		g_clear_error (&error);
    		return;
    	}
    
    	/* The NULL below is the GCancellable */
    	cursor = tracker_sparql_connection_query (connection, query, NULL, &error);
    
    	if (error) {
    		g_printerr ("%s: %s\n", _("Could not query classes"), error->message);
    		g_error_free (error);
    		g_object_unref (connection);
    		return;
    	}
    
    	if (!cursor) {
    		g_print ("%s\n", _("No classes were found"));
    	} else {
    		/* Columns are 0-based: column 0 is ?class, the only variable selected */
    		while (tracker_sparql_cursor_next (cursor, NULL, NULL)) {
    			g_print ("%s\n", tracker_sparql_cursor_get_string (cursor, 0, NULL));
    		}
    
    		g_object_unref (cursor);
    	}
    
    	g_object_unref (connection);
    }
    

    So now we have direct access. I have ported my experimental tracker-needle application to it and it seems faster than tracker-search-tool (which it aims to supersede). I will blog about tracker-needle later, but for now, direct access is available in master for anyone interested in trying it out! This has been a huge team effort involving Jürg, Philip, Aleksander Morgado and myself. Try it out; any comments about the API are appreciated, and it is documented quite nicely too.
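The Utilities item in the list above mentions escaping text before it goes into a SPARQL query. Here is a minimal stand-alone sketch of that in C. `sparql_escape_string()` is a hypothetical name, not the library's own utility, and the sketch assumes the result is placed inside a double-quoted string literal, using the backslash escapes from the SPARQL grammar (\t, \b, \n, \r, \f, \", \' and \\).

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of text that is safe to put inside a
 * double-quoted SPARQL string literal, or NULL on allocation failure.
 * The caller frees the result. */
char *sparql_escape_string(const char *text) {
    /* Worst case: every byte becomes a two-byte escape sequence */
    char *out = malloc(strlen(text) * 2 + 1);
    char *o = out;
    const char *p;

    if (out == NULL)
        return NULL;

    for (p = text; *p != '\0'; p++) {
        char esc = 0;

        switch (*p) {
        case '\t': esc = 't'; break;
        case '\b': esc = 'b'; break;
        case '\n': esc = 'n'; break;
        case '\r': esc = 'r'; break;
        case '\f': esc = 'f'; break;
        case '"':  esc = '"'; break;
        case '\'': esc = '\''; break;
        case '\\': esc = '\\'; break;
        }

        if (esc != 0) {
            *o++ = '\\';
            *o++ = esc;
        } else {
            *o++ = *p;
        }
    }
    *o = '\0';
    return out;
}
```

For example, a title containing a quote and a trailing newline, such as Say "hi", comes back as Say \"hi\" followed by \n, so it can no longer terminate the literal early.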

Tracker: branches branches branches

Recently, a lot of work has been going into Tracker master. For a while now we have been merging an average of 1 to 2 branches a week. So I thought I would highlight some of the sweet work going into Tracker at the moment:

Dropping libinotify
For some years, we have been using an imported copy of libinotify in our source tree to do the things not available in GIO's monitoring API. One of the main reasons we didn't move to GIO's API was that our model didn't fit GIO's. In Tracker, if you monitored a directory and it moved to another location, we moved the monitor to that location. With GIO, a monitor does not follow the directory when it moves, which makes sense. Thanks to Aleksander Morgado, we have now merged his drop-inotify branch into master. It is so nice to be able to remove that imported library now.

D-Bus with file descriptors
We are always trying to reduce the memory footprint of Tracker. Recently Adrien Bustany finished implementing support for DBUS_TYPE_UNIX_FD in Tracker. The nice thing about this is that we no longer copy masses of memory from one place to another just to push data between two processes. Adrien and Philip have blogged about this before, but more recently Adrien completed the work by also implementing it for the communication between tracker-miner-fs and tracker-extract. Effectively the same data is transported between those two as between tracker-miner-fs and tracker-store, except that tracker-store also receives file-specific information appended to the SPARQL message (like size, modification dates, etc.).

To use this you need D-Bus 1.3.1 or later. It is nice to see these sorts of performance improvements in Tracker. Great work, Adrien, thanks!
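Under the hood, passing a file descriptor over D-Bus relies on the SCM_RIGHTS mechanism of Unix domain sockets. The C sketch below shows that mechanism on its own, without D-Bus: a "client" creates a pipe and hands the write end to a forked "store" process, which then streams its results straight into the pipe instead of packing them into a message. This is an illustration of the general technique, not Tracker's code; `send_fd()`, `recv_fd()` and `fetch_results_over_pipe()` are hypothetical, and the code assumes a POSIX system such as Linux.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Control-message buffer sized and aligned for exactly one fd */
typedef union {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
} FdControl;

/* Point msg at a one-byte payload (SCM_RIGHTS needs some real data)
 * and at an fd-sized control buffer. */
static void init_msg(struct msghdr *msg, struct iovec *iov, char *byte, FdControl *ctrl) {
    memset(msg, 0, sizeof *msg);
    memset(ctrl, 0, sizeof *ctrl);
    iov->iov_base = byte;
    iov->iov_len = 1;
    msg->msg_iov = iov;
    msg->msg_iovlen = 1;
    msg->msg_control = ctrl->buf;
    msg->msg_controllen = sizeof ctrl->buf;
}

/* Send fd over a connected Unix domain socket; returns 0 on success. */
static int send_fd(int sock, int fd) {
    char byte = 'F';
    struct iovec iov;
    struct msghdr msg;
    FdControl ctrl;
    struct cmsghdr *cmsg;

    init_msg(&msg, &iov, &byte, &ctrl);
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive an fd sent by send_fd(); returns -1 on failure. */
static int recv_fd(int sock) {
    char byte;
    struct iovec iov;
    struct msghdr msg;
    FdControl ctrl;
    struct cmsghdr *cmsg;
    int fd = -1;

    init_msg(&msg, &iov, &byte, &ctrl);
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg != NULL && cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS)
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* "Client": make a pipe, hand its write end to a forked "store" process and
 * read the results from the read end. Returns bytes read, or -1 on error. */
ssize_t fetch_results_over_pipe(char *buf, size_t len) {
    const char *results = "urn:file:1\nurn:file:2\n";
    int sv[2], pfd[2], out;
    ssize_t total = 0, n;
    pid_t pid;

    if (len == 0 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if ((pid = fork()) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        /* "store": it can only reach the pipe through the fd it receives */
        close(sv[0]);
        out = recv_fd(sv[1]);
        n = out >= 0 ? write(out, results, strlen(results)) : -1;
        _exit(n == (ssize_t) strlen(results) ? 0 : 1);
    }
    close(sv[1]);
    if (pipe(pfd) < 0) {
        close(sv[0]); /* the store sees EOF and exits */
        waitpid(pid, NULL, 0);
        return -1;
    }
    send_fd(sv[0], pfd[1]);
    close(pfd[1]); /* only the store holds the write end now, so EOF ends the loop */
    while (total < (ssize_t) len - 1 &&
           (n = read(pfd[0], buf + total, len - 1 - (size_t) total)) > 0)
        total += n;
    buf[total] = '\0';
    close(pfd[0]);
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return total;
}
```

The message itself only carries one byte plus the descriptor; the result data never goes through the socket at all, which is the property that avoids the big memory copies.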

Direct access
Not so long ago, Bastien reported a bug about adding support for direct access to the databases via a library API. This week, we started a branch to get this work under way. Along the way, we are considering rewriting the libtracker-client API in Vala and substantially improving the old API.

Git branch management
Due to the high number of branches we create, I decided to do some clean-up. I wrote a script to list all the branches with relevant information about them, so I could email the mailing list and check that everyone was happy with removing old branches. I thought this might be useful to other projects. Here is the script I used:

#!/bin/sh
# List every branch on a remote (except master) with the age, author and
# short hash of its last commit.

if ! git rev-parse --git-dir > /dev/null 2>&1; then
        echo "This is not a git directory" >&2
        exit 1
fi

# Use the remote given as the first argument, or "origin" by default
remote=${1:-origin}

# git ls-remote prints "<commit><TAB><ref>" for each ref on the remote
git ls-remote "$remote" | while read -r commit name; do
        if [ -z "$name" ]; then
                continue
        fi

        case $name in
        refs/heads/master)
                continue
                ;;
        refs/heads/*)
                shortname=${name#refs/heads/}
                # tformat: ends each entry with a newline; format: would leave it unterminated
                if ! git log --max-count=1 --pretty=tformat:"Branch '$shortname' -- last commit was %ar by %an (%h)" "$commit" 2>/dev/null; then
                        echo >&2
                        echo "Your checkout doesn't contain commit $(echo "$commit" | cut -c1-7) for branch $shortname" >&2
                        echo >&2
                        exit 1
                fi
                ;;
        esac
done

This produces output like:

Branch 'album-art-to-libtracker-extract' -- last commit was 3 months ago by Martyn Russell (d1f1384)
Branch 'albumart-quill' -- last commit was 8 months ago by Philip Van Hoof (a397a0f)
Branch 'anonymous-file-nodes' -- last commit was 5 months ago by Carlos Garnacho (60658be)
Branch 'async-queries' -- last commit was 2 months ago by Carlos Garnacho (88358dd)
Branch 'async-queries-due' -- last commit was 10 weeks ago by Jürg Billeter (52634ce)
...

Thanks to Sven Herzberg for some of the improvements to the original script, most importantly the use of git ls-remote. This means the list comes from the remote itself, so stale local branches that have since been removed from the origin repository are not included.

Tracker Release Candidate 1

0.7.28
Today we released 0.7.28. We consider this our last unstable 0.7 release before 0.8. As long as there are no major regressions, we hope to have our first stable release out this time next week, with the super shiny stuff we have been working on for over 6 months.

Using tracker-sparql
Recently I added support for listing the classes we send change notifications for. This is generally quite useful and a common question on IRC:

$ tracker-sparql --list-notifies
Notifies: 23

Tracker 0.7.20 Released

Managed to get 0.7.20 out the door. It won't be long now before we start the 0.8 releases; I want to start doing these within the next few weeks if possible.

Tracker is looking great right now though. The core team has been exemplary in recent weeks.

Roll on 0.8 🙂

Tracker Update

libtracker-miner

So Carlos and I have been working on libtracker-miner for the last few months. Since tracker-store (formerly known as trackerd) now handles all reads from and writes to the database, and does so much faster than ever before with a much more expressive query language (SPARQL), we had to merge the old tracker-indexer and parts of trackerd from the 0.6 branch into one binary that could crawl the file system, insert file-specific metadata and call tracker-extract for file-type metadata (that is, non-"file" data describing the content, like image height, width, etc.).

As we had to do this anyway, we took the opportunity to refactor the parts we were unhappy with and to make libtracker-miner a library which other "data miners" could use. This gives us:

  • D-Bus integration for free
  • An API to find other miners both available and running
  • An API to get/set status, progress, name and description for each miner
  • An API to pause/resume each miner
  • Signals to know when all miners or specific miners start, stop, pause, resume, report an error or make progress
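To make the API list above a little more concrete, here is a minimal C sketch of the state such a miner object might carry. It is not the libtracker-miner API (which is GObject-based, with real signals): `Miner`, `miner_set_progress()`, `miner_pause()` and friends are hypothetical, and the plain callback stands in for a signal. Counting pause requests is an assumed design choice, so that two independent pause requests (say, from the status icon and from another application) don't undo each other.

```c
#include <stddef.h>

typedef struct Miner Miner;

/* Stands in for a GObject "progress" signal */
typedef void (*MinerProgressFunc)(Miner *miner, void *user_data);

struct Miner {
    const char *name;
    const char *description;
    const char *status;            /* human-readable, e.g. "Idle" or "Crawling" */
    double progress;               /* 0.0 (nothing done) to 1.0 (finished) */
    int pause_count;               /* the miner is paused while this is > 0 */
    MinerProgressFunc on_progress; /* may be NULL */
    void *user_data;
};

/* Update status and progress, clamping progress to [0, 1], then notify. */
void miner_set_progress(Miner *m, const char *status, double progress) {
    if (progress < 0.0)
        progress = 0.0;
    else if (progress > 1.0)
        progress = 1.0;
    m->status = status;
    m->progress = progress;
    if (m->on_progress != NULL)
        m->on_progress(m, m->user_data);
}

void miner_pause(Miner *m) {
    m->pause_count++;
}

/* Undo one pause request; returns -1 if the miner was not paused. */
int miner_resume(Miner *m) {
    if (m->pause_count == 0)
        return -1;
    m->pause_count--;
    return 0;
}

int miner_is_paused(const Miner *m) {
    return m->pause_count > 0;
}
```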

More recently, Adrien Bustany has been working on "bridges", which work on the same principle: they are data miners, but for web applications like:

  • Facebook
  • Flickr
  • Twitter
  • etc.

We are working together to integrate this into the "miner" framework we already have set up in master, and it is quite exciting to see integration in areas other than just desktop applications.

Additionally, Philip is making Evolution use the same miner API, so out of the box we will have 3 standard miners, for:

  • Email data
  • File data
  • Application data

tracker-status-icon

Formerly known as tracker-applet, this was recently refactored by Carlos to work with the new miner API too, so now you can see a list of miners and their state/progress (much like the NetworkManager applet). It also allows pausing/resuming ALL miners or a single miner at a time, which is very useful.

tracker-preferences

The tracker-preferences application was also really out of date. The whole configuration system has changed since 0.6, so we decided to use Vala and GtkBuilder to build the new dialog. This dialog only covers tracker-miner-fs preferences right now, because they are really the only settings that make any difference to the user at this point. Some polish is still needed here, but it looks good so far:

[Screenshot: tracker-preferences dialog]

0.7 Development Release

The current roadmap is now mostly done, with a few exceptions we have decided not to worry about for the 0.7 release. We plan to make this release next Friday: most of the UIs are in a reasonable state, and now that all the big features have been integrated, people should be able to start using it normally. It has already been put off by 2 weeks, but we don't want to delay it any further. So look out for a new version of Tracker next week!

Tracker Update

Roadmap to 0.7

While I was at the Desktop Summit, I decided to come up with a roadmap so we all had something to work towards for the 0.7 unstable release, which we are hoping to do soon. The roadmap is on live.gnome.org here:

http://live.gnome.org/Tracker/Roadmap

As you can see, it is progressing nicely.

Config

The configuration system in Tracker has always consisted of a single TrackerConfig object (a GObject subclass) that loads and saves settings using the GKeyFile API. The problem we found is that we really want configuration to be more fine-grained and specific to each binary; otherwise some options (like log verbosity) apply to ALL binaries that use the config. So now libtracker-common has TrackerConfigFile as a base class, with tracker-object-keyfile.[ch] providing some utility functions, and every binary that wants its own TrackerConfig with object properties now inherits from TrackerConfigFile. This is quite nice because it reduces the code duplication we had, and we now have a nice set of separate config files in $HOME/.config/tracker/.
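The base-class arrangement described above can be sketched in plain C with struct embedding, the same idiom GObject uses for inheritance. This is a simplified, hypothetical model rather than the real TrackerConfigFile: `ConfigFile`, `MinerFsConfig` and `config_file_init()` are illustrative names, and the `<binary>.cfg` file naming is an assumption. It shows each binary getting its own file, and its own verbosity, under the tracker directory.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical "base class": the state every per-binary config shares */
typedef struct {
    char path[512]; /* <config_home>/tracker/<binary>.cfg (naming assumed) */
    int verbosity;  /* log verbosity, now per binary rather than global */
} ConfigFile;

/* Hypothetical "subclass" for one binary. Embedding the base as the first
 * member lets a MinerFsConfig * be used wherever a ConfigFile * is expected. */
typedef struct {
    ConfigFile parent;
    int throttle;                /* example binary-specific settings */
    int index_removable_devices;
} MinerFsConfig;

/* Fill in the shared part. config_home would normally be $HOME/.config.
 * Returns 0 on success, -1 if the path does not fit. */
int config_file_init(ConfigFile *cfg, const char *config_home, const char *binary_name) {
    int n;

    memset(cfg, 0, sizeof *cfg);
    n = snprintf(cfg->path, sizeof cfg->path, "%s/tracker/%s.cfg", config_home, binary_name);
    return (n < 0 || (size_t) n >= sizeof cfg->path) ? -1 : 0;
}
```

Code that only cares about the shared settings (the file path, verbosity) can take a `ConfigFile *`, while each binary adds its own fields without duplicating the loading logic.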

With 0.6 we also had the concept of "modules", one for each type of data we wanted to track. We had "files", "applications", "email", and some others. These modules also have configuration describing how to index their data, such as globs for including and ignoring certain files. There are also options to make sure data isn't indexed too often (which was needed for some content that was constantly updating). All of this is in the process of being revised and merged with the TrackerConfig machinery, although this mostly applies to the "files" module. The module config and module code (which was a complex GModule implementation) are both going to be simplified now that we have separate binaries for mining each type of data we are interested in.
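Include/ignore globs like the ones described above can be checked with POSIX `fnmatch()`. The sketch below is a hypothetical helper, not Tracker's code: `should_index()` and the example patterns are made up for illustration, and it assumes ignore patterns win over include patterns and that an empty include list means "index everything not ignored".

```c
#include <fnmatch.h>
#include <stddef.h>

/* Returns 1 if name matches any glob in the NULL-terminated patterns list. */
static int matches_any_glob(const char *name, const char *const *patterns) {
    for (; *patterns != NULL; patterns++) {
        if (fnmatch(*patterns, name, 0) == 0)
            return 1;
    }
    return 0;
}

/* Decide whether a file (by basename) should be indexed.
 * Ignore patterns take priority; an empty include list accepts everything. */
int should_index(const char *name, const char *const *include, const char *const *ignore) {
    if (matches_any_glob(name, ignore))
        return 0;
    return include[0] == NULL || matches_any_glob(name, include);
}
```

For example, with the ignore list { "*~", "*.o", "*.tmp", NULL } and an empty include list, notes.txt is indexed but the backup file notes.txt~ is not.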

Album Art

This was quite a mess before, with the code spread over different places. Over the past week or so I have cleaned this up too. Now all album art downloading and extraction happens in the tracker-extract binary (called from the MP3 and GStreamer extractors when they see media with such content). Previously we requested thumbnails for the new art from tracker-extract, but tracker-extract is inherently less stable (it dynamically loads modules that use 3rd party APIs whose stability we can't guarantee), so whenever it crashed we risked losing thumbnail requests before they were queued with the thumbnail daemon. We only send thumbnail requests AFTER all indexing has completed; if we don't, we suffer severe performance problems. Now all thumbnail requests are made from one place, tracker-miner-fs, and the album art functions are no longer spread across libtracker-common and tracker-extract; they live only in tracker-extract.
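The "only send thumbnail requests AFTER indexing" rule boils down to collecting requests while indexing and handing them over as one batch at the end. Here is a minimal, hypothetical C sketch of such a queue; `ThumbnailQueue`, `thumbnail_queue_add()` and `thumbnail_queue_take()` are illustrative names, and the actual call to the thumbnail daemon is left out.

```c
#include <stdlib.h>
#include <string.h>

/* Requests collected while indexing is in progress */
typedef struct {
    char **uris;
    size_t len;
    size_t cap;
} ThumbnailQueue;

/* Remember a URI for later instead of contacting the thumbnailer now.
 * Returns 0 on success, -1 on allocation failure. */
int thumbnail_queue_add(ThumbnailQueue *q, const char *uri) {
    char *copy;

    if (q->len == q->cap) {
        size_t cap = q->cap ? q->cap * 2 : 8;
        char **tmp = realloc(q->uris, cap * sizeof *tmp);
        if (tmp == NULL)
            return -1;
        q->uris = tmp;
        q->cap = cap;
    }
    copy = malloc(strlen(uri) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, uri);
    q->uris[q->len++] = copy;
    return 0;
}

/* Call once indexing has finished: returns the whole batch (the caller frees
 * each string and the array) and leaves the queue empty for the next run. */
char **thumbnail_queue_take(ThumbnailQueue *q, size_t *n_uris) {
    char **batch = q->uris;

    *n_uris = q->len;
    q->uris = NULL;
    q->len = 0;
    q->cap = 0;
    return batch;
}
```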

Volume Support

Over the past few days Carlos re-added volume support to Tracker, so now, using a simple query, you can find out whether the data on that MMC card you just inserted or removed is available.

So, to get a list of ALL data objects and their availability (which is true or false depending on whether the media is mounted) you can use:

$ tracker-sparql -q "SELECT ?do ?av WHERE {
                            ?do a nie:DataObject ;
                            tracker:available ?av }"

You can also get a list of all data objects which are NOT available. To make things faster, we don't store tracker:available for EVERY item, only for items that are available, which makes updating the tables a lot faster. As a result, the query for files that are not mounted is a bit more complex, because it has to look for items where tracker:available is not set at all:

$ tracker-sparql -q "SELECT ?do WHERE {
                            ?do a nie:DataObject .
                     OPTIONAL {
                            ?do tracker:available ?av } .
                     FILTER (! BOUND(?av)) }"

Of course, the most common use case is "tell me which files are available", which can be done with:

$ tracker-sparql -q "SELECT ?do WHERE {
                            ?do a nie:DataObject ;
                            tracker:available true }"

We are still fine-tuning the volume work to make it faster, but things are coming along swimmingly!