What’s new in Tracker 1.2?

Reblogged from the Lanedo GmbH blog

Every 6 months or so we produce a new stable release, and Tracker 1.2 introduces some exciting new work. For those who don’t know Tracker, it is a semantic data storage and search engine for desktop and mobile devices. Tracker is a central repository of user information that provides two big benefits for the user: data shared between applications, and information that is related to other information (for example, mixing contacts with files, locations, activities and so on).

Providing your own data

Earlier in the year a client came to Lanedo and to the community asking for help with integrating Tracker into their embedded platforms. What did they want? Well, they wanted to take full advantage of the Tracker project’s feature set, but they also wanted to be able to use it on a bigger scale, not just for local files or content on removable USB keys. They wanted to be able to seamlessly query across all devices on a LAN and any cloud content that was plugged into Tracker. This is not too dissimilar to the gnome-online-miners project, which has similar goals.

The problem

Before Tracker 1.2.0, files and folders came by way of a GFile and GFileInfo which were found using the GFileEnumerator API that GLib offers. Underneath all of this the GFile* relates to GLocalFile* classes which do the system calls (like lstat()) to crawl the file system.

Why do we need this? Well, on top of TrackerCrawler (which calls the GLib API) are TrackerFileNotifier and TrackerFileSystem, which essentially report content up the stack (and ignore other content depending on rules). The rules come from a TrackerIndexingTree class which knows what to blacklist and what to whitelist. On top of all of this is TrackerMinerFS (which is now inaccurately named), which handles queues and processing of ALL content. For example, DELETED event queues are handled before CREATED event queues. It also gives status updates, handles INSERT retries when the system is busy and so on.
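To make the queue ordering concrete, here is a minimal sketch in plain C (no GLib) of draining event queues by priority. Only the "DELETED before CREATED" rule comes from the description above; the UPDATED kind, the type names and the function names are assumptions made up for illustration and are not the TrackerMinerFS API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds, declared in the order they should be drained.
 * Only "DELETED before CREATED" comes from the text; UPDATED is assumed. */
typedef enum {
	EVENT_DELETED = 0,
	EVENT_CREATED,
	EVENT_UPDATED,
	EVENT_N_KINDS
} EventKind;

/* Number of pending items per queue, indexed by EventKind. */
typedef struct {
	size_t pending[EVENT_N_KINDS];
} EventQueues;

/* Return the kind of the next event to process, or -1 if every queue is
 * empty. Queues are scanned in enum order, so deletions always win. */
static int
event_queues_next_kind (const EventQueues *queues)
{
	assert (queues != NULL);

	for (int kind = 0; kind < EVENT_N_KINDS; kind++) {
		if (queues->pending[kind] > 0) {
			return kind;
		}
	}

	return -1;
}

/* Take one item from the highest-priority non-empty queue and return its
 * kind, or -1 if there was nothing to take. */
static int
event_queues_pop (EventQueues *queues)
{
	int kind = event_queues_next_kind (queues);

	if (kind >= 0) {
		queues->pending[kind]--;
	}

	return kind;
}
```

With one DELETED and two CREATED items pending, repeated calls to event_queues_pop() return the deletion first and only then the creations, so a file that was removed is never indexed after the fact.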

To make sure that we take advantage of existing technology and process information correctly, we have to plug in at the level of the TrackerCrawler class.

The solution

Essentially, we have a simple interface called TrackerDataProvider that handles the open and close cases for iterating a container (or directory), along with a TrackerFileDataProvider implementation for the default (existing) local file system case.

That is followed up with an enumerator interface for enumerating that container (or directory). It is called TrackerEnumerator, and of course there is a TrackerFileEnumerator class that implements the functionality that existed previously.

So why not just implement our own GFile backend and make use of existing interfaces in GLib? Actually, I did look into this but the work involved seemed much larger and I was conscious of breaking existing use cases of GFile in other classes in libtracker-miner.

How do I use it?

So now it’s possible to provide your own data provider implementation for a cloud-based solution to feed Tracker. But what are the minimum requirements? Well, Tracker requires a few things to function: a real GFile and GFileInfo with an adequate name and mtime. The libtracker-miner framework requires the mtime to check whether there have been updates compared to the database. The TrackerDataProvider-based implementation is given as an argument when the TrackerMiner object is created, and is called by the TrackerCrawler class when indexing starts. The locations that will be indexed by the TrackerDataProvider are given to the TrackerIndexingTree, and you can use the TRACKER_DIRECTORY_FLAG_NO_STAT flag for non-local content.

Crash aware Extractor

In Tracker 1.0.0, the Extractor (the ‘tracker-extract’ process) used to extract metadata from files was upgraded to be passive. Passive meaning, the Extractor was only extracting content from files already added to the database. Before that, content was concatenated from the Extractor to the file system miner and inserted into the database collectively.

Sadly with 1.0.0, any files that caused crashes or serious system harm resulting in the termination of ‘tracker-extract’ were subsequently retried on each restart of the Extractor. In 1.2.0 these failures are noted and files are not retried.
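As a rough illustration of the idea (not of how tracker-extract actually records failures, which is not described here), the plain C sketch below keeps a list of URIs that previously caused a failure and skips them afterwards. The FailureLog type, its fixed sizes and the function names are all hypothetical, and the list lives in memory only, whereas a real implementation would have to persist it across restarts.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_FAILURES 16
#define MAX_URI_LEN  256

/* Hypothetical in-memory record of files that made the extractor fail. */
typedef struct {
	char   uris[MAX_FAILURES][MAX_URI_LEN];
	size_t count;
} FailureLog;

/* Return true if this URI was already noted as a failure. */
static bool
failure_log_contains (const FailureLog *log,
                      const char       *uri)
{
	for (size_t i = 0; i < log->count; i++) {
		if (strcmp (log->uris[i], uri) == 0) {
			return true;
		}
	}

	return false;
}

/* Note a failure. Returns false if the log is full or the URI is too long. */
static bool
failure_log_add (FailureLog *log,
                 const char *uri)
{
	assert (log != NULL && uri != NULL);

	if (failure_log_contains (log, uri)) {
		return true; /* Already noted, nothing to do. */
	}

	if (log->count >= MAX_FAILURES || strlen (uri) >= MAX_URI_LEN) {
		return false;
	}

	strcpy (log->uris[log->count++], uri);

	return true;
}

/* On restart, only extract from files that have not failed before. */
static bool
should_extract (const FailureLog *log,
                const char       *uri)
{
	return !failure_log_contains (log, uri);
}
```

The point of the design is that one bad file costs a single crash: once it is in the log, the extractor moves on to the rest of the queue instead of hitting the same file on every restart.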

New extractors?

Thanks to work from Bastien Hadess, a number of extractors have been added for electronic books and comic books. If your format isn’t supported yet, let us know!

Updated Preferences Dialog

Often we get questions like:

  • Can Tracker index numbers?
  • How can I disable indexing file content?

To address these, the preferences dialog has been updated to provide another tab called “Control” which allows users to change options that have existed previously but not been presented in a user interface.

In addition to this, changing an option that requires a reindex or restart of Tracker will prompt the user upon clicking Apply.

What else changed?

Of course there were many other fixes and improvements besides the things mentioned here. For the full list, see the announcement.

Looking for professional services?

If you or someone you know is looking to make use of Open Source technology and wants professional services to assist in that, get in touch with us at Lanedo to see how we can help!

Tracker: Direct Access branch merged in master

A while back we had this bug from Bastien: 613255 – “Read-only, non-DBus, store access”. For the past 5 or 6 weeks we have been working on this. Initially the idea was just to do direct access, but once we got started, we realised that the libtracker-client API wasn’t really good enough and we would like to extend it. But we didn’t want the old API there either, so we came up with this new library to supersede libtracker-client. For now we package both, but all functions in libtracker-client are marked as deprecated at this point.

So what do we have now? Fundamentally we have ONE API for different backends using different technologies. To summarise:

1. D-Bus (libtracker-bus, backend)
2. Direct Access (libtracker-direct, backend)

D-Bus – Read/Write Access
Depending on the D-Bus version, we either use FD passing (requires D-Bus > 1.3.1) to avoid copious memory copies OR we use D-Bus GLib marshalling, which represents the worst performance you can get from Tracker (though it is still usable).

Direct Access – Read Only Access
This is based on a library we had internally in Tracker called libtracker-data. We merged some things to make this happen (like libtracker-db) but generally, we sit on top of this library in libtracker-direct.

Plugins?
The backends are dynamically loaded at run time depending on the client’s needs (i.e. if you only ever do SELECT type queries, you’ll use the direct-access backend).

How does the API look for libtracker-sparql?
The idea here was to cover all the old API needs and some new ones. What we wanted was less API bloat, and to pull some of the things that were spread across multiple libraries in the code base into libtracker-sparql. These things include:

  • Connections – used in libtracker-client, we wanted some common way to get a connection to Tracker regardless of what backend was in use.
  • Cursors – used in libtracker-db and wanted as a public API for some time, but not possible without WAL (Write Ahead Logging) in SQLite 3.7. Now we share the same API internally and externally.
  • Builder – used in tracker-extract for building SPARQL queries for selecting/inserting data.
  • Utilities – used in tracker-extract, the miners, etc. for escaping text used in SPARQL queries and some other common functionality.
Example

So this is what you might expect with the new API:

    
    /* Needs <libtracker-sparql/tracker-sparql.h>; _() comes from <glib/gi18n.h> */
    TrackerSparqlConnection *connection;
    TrackerSparqlCursor *cursor;
    GError *error = NULL;
    const gchar *query = "SELECT ?class WHERE { ?class tracker:writeback true }";
    
    connection = tracker_sparql_connection_get (&error);
    
    if (!connection) {
    	g_printerr ("%s: %s\n", _("Could not establish a connection to Tracker"), error ? error->message : _("No error given"));
    	g_clear_error (&error);
    	return;
    }
    
    /* The NULL below is the GCancellable */
    cursor = tracker_sparql_connection_query (connection, query, NULL, &error);
    
    if (error) {
    	g_printerr ("%s, %s\n", _("Could not query classes"), error->message);
    	g_error_free (error);
    	g_object_unref (connection);
    	return;
    }
    
    if (!cursor) {
    	g_print ("%s\n", _("No classes were found"));
    } else {
    	/* Columns are 0-indexed and the query selects a single column (?class) */
    	while (tracker_sparql_cursor_next (cursor, NULL, NULL)) {
    		g_print ("%s\n", tracker_sparql_cursor_get_string (cursor, 0, NULL));
    	}
    
    	g_object_unref (cursor);
    }
    
    g_object_unref (connection);
    

So, now we have direct access. I have ported my experimental tracker-needle application to it and it seems faster than tracker-search-tool (which it aims to supersede). I will blog about tracker-needle later, but for now, direct access is available in master for anyone interested in trying it out! This has been a huge team effort involving Jürg, Philip, Aleksander Morgado and myself. Try it out; any comments about the API are appreciated, and it is documented quite nicely too.
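On the Utilities item mentioned earlier: escaping matters whenever user-supplied text ends up inside a quoted SPARQL literal. The plain C sketch below only illustrates the kind of work involved. The function name is hypothetical, the set of escaped characters (double quote, backslash, newline, carriage return, tab) is an assumption, and in a real client the library’s own escaping helper should be preferred over anything hand-rolled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Escape text so it can be placed between double quotes in a SPARQL query.
 * Hypothetical helper, not the libtracker-sparql implementation.
 * Returns a newly allocated string the caller must free(), or NULL if the
 * allocation fails. */
static char *
escape_sparql_literal (const char *text)
{
	char *out, *p;

	assert (text != NULL);

	/* Worst case: every input character becomes two output characters. */
	out = malloc (strlen (text) * 2 + 1);

	if (!out) {
		return NULL;
	}

	p = out;

	for (; *text != '\0'; text++) {
		switch (*text) {
		case '"':  *p++ = '\\'; *p++ = '"';  break;
		case '\\': *p++ = '\\'; *p++ = '\\'; break;
		case '\n': *p++ = '\\'; *p++ = 'n';  break;
		case '\r': *p++ = '\\'; *p++ = 'r';  break;
		case '\t': *p++ = '\\'; *p++ = 't';  break;
		default:   *p++ = *text;             break;
		}
	}

	*p = '\0';

	return out;
}
```

Without this step, a title containing a double quote would end the literal early and either break the query or change its meaning.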

Desktop Summit, Lanedo & Imendio and Tracker

Desktop Summit

Wanted to say thank you to everyone at the Desktop Summit this year. It was superb and it was good to see everyone again!

So it became quite obvious to me at this year’s Desktop Summit in Gran Canaria that no one really knows what is going on with regard to Lanedo and my involvement in projects. This is primarily because I haven’t been blogging enough. I have decided to change this.

After speaking to various people (Bastien, Lucas, etc) I was surprised to hear some of the questions about Lanedo. I thought my initial blog covered it. But clearly not.

Lanedo & Imendio

In December 2008, Micke spoke to us all in Imendio and said that he was going to shut down the company. Of course this came as a huge surprise to everyone given our success over the years, and the economic climate was not the reason for his decision. The reason was stress. If I really think back, I can see how Micke was trying to change things internally to alleviate this by offloading some of his responsibility to others. This happened probably for a year or more. In the end, I think it was just too much. Towards the end of Imendio, you could tell how stressed Micke was by his demeanor. Nowadays, he is much happier and everyone can see the change.

Richard decided not to continue with Imendio either. As such, Tim and I (who were effectively managing projects internally at Imendio) decided to start a new company if everyone (except Micke and Richard) wanted to continue. The consensus was that they did, so in January 2009, Lanedo GmbH was formed in Hamburg. We took on some of Imendio’s contracts and now we are continuing the work under our own steam.

Tracker

This year Tracker was in the spotlight somewhat. As a project it has grown considerably in the last 12 months. In the early part of last year, Carlos and I started working on it full time. More and more people got involved, like Jürg Billeter, Philip Van Hoof, Ivan Frade and Mikael Ottela. These are the core developers. We refactored a lot of it to produce the 0.6.9x releases. Jamie has been providing feedback about direction and ideas and working on one of the most important features: the SQLite module we use for Full Text Search (FTS).

About 3-6 months ago, Jürg, Philip and Ivan started looking into the 0.7 work, and at the moment Jürg is leading the development there while I maintain bug fixes for the 0.6 branch. Our roles in the project are all quite well defined (I would say at least) and it is a really fun project to work on with some really brilliant people contributing. Right now, this is how it looks:

I handle the file system monitoring, crawling and database connection management. I also do the 0.6.9x releases and have been doing project management in coordination with Urho Konttori.

Carlos maintains the indexing of the data, the extensions (or modules) which know what to do with the data we extract.

Philip works on the thumbnailing and has a really good appetite for creating specifications and working with new technologies to provide ideas about how to improve areas.

Ivan is our ontology guru, not to mention that he added the GLib unit tests to Tracker, which is a huge benefit.

Jürg has been working on completely refactoring the databases and the higher level API that sits on top of them (libtracker-data). Jürg is also leading development of the 0.7 (master) branch right now.

Mikael is our extractor expert. Mikael has been constantly improving the MP3/GStreamer/JPEG/etc extractors to get better performance for each release.

It should be added that ALL of us do general project maintenance, so we all contribute to each other’s areas. These are also just some of the more notable areas which we are each involved in. It is a large project and there are a lot of things not mentioned here.

So right now Tracker is looking really good and it is an exciting project to be involved in, especially with Zeitgeist being interested in using it and other components in BOTH desktops too.

I plan to blog much more about features we add, crap we remove, etc.

The calm before the storm

Baby Coming!

Sue is about to have our baby (expected date is the 18th of March). She really can’t wait for it to be born now and neither can I! Right now I am just trying to get as much sleep as possible in preparation 🙂
We don’t yet know if it is a boy or a girl so there is an added excitement after waiting 9 months not knowing. Sue thinks it is a boy, I think it is a girl.

Tracker Release

Yesterday I released Tracker 0.6.91, which follows the recent 0.6.90 release that we did after 12 months of solid development on the project. I say we because there is quite a huge team working on this project now, including Carlos Garnacho, Ivan Frade, Jürg Billeter, Philip Van Hoof, Mikael Ottela, Urho Konttori and many more. We have a preliminary roadmap (as mentioned here) for Tracker too. This recent release and possibly one more will be the last before 0.7, which will include Jürg’s vstore branch (which we have been working on in parallel for months now). We also had a discussion about the current architecture of the project and decided to change some of the roles around regarding what the indexer and daemon currently do, to make things more efficient. With this all in mind, I am expecting some seriously good fun on this project in the next 3 months.

Accidentally marking blog comments as spam

Anyone know how to revert this? Thomas Thurman sent me a comment and I accidentally pressed the “mark-as-spam” link in the email I received and I cannot find a way to revert that. It also seems he can’t post his entry again :/

The comment was about my image and how it would be a good GDM greeter screen. I agree and had this in mind at the time of taking the picture. So I uploaded it and it is pending on art.gnome.org.

Initng

Recently I tried Initng on my Ubuntu distribution using some handy pages from the Ubuntu forums.

To say the least, I was amazed.
My boot time on Ubuntu Breezy was 58 seconds and now it is 29 seconds (that is from Grub to GDM).

Give it a try. There are only a few steps to follow and it takes 5 minutes.

Servelocity & Cambridge

Signed up with Servelocity and now have a new domain (http://bytejunky.net). They provide a really good service compared to the last hosted solution used.

Took the opportunity to look into original which has been used in many places already for some of the 450 odd images I took in New Zealand. They will be there soon 😉

Spent the day in Cambridge botanic gardens recently for Sue’s Birthday. Took the digital camera along for the ride 🙂 and using the power of original I can now bring to you some of the superb scenery that I was able to capture.

Half Life 2 with Cedega

Another Birthday

25 years old!

I think it will be a nice night in with the family and some good food.

My brothers asked what I wanted, and I couldn’t resist 😀

Gnome Blog 0.8

Saw Seth’s announcement for gnome-blog 0.8 and thought I would try it, so this post is my first post with it. Very simple to use, needs a little more in the formatting department perhaps, but great nonetheless.

Tried out www.blogger.com too, it seems very comprehensive.

Half Life 2 with Cedega

A friend of mine recently bought Half Life 2 and I have been meaning to try it out but have been busy with Gossip. Now that I have set up my new GeForce 6600 GT, I thought I would try and get it running under Cedega. Transgaming’s site boasts it working fully out of the box and I got it going on my Fedora Core 3 distribution with no problems – oh and the game is superb too 😉

Also – as I didn’t have the DVD to install it, I used Steam’s installer to download it all which works well.

Ubuntu, Streamtuner and VMWare

New Stuff
Recently looked into Streamtuner, which I have to say is superb! Look into it if you get the chance.

Also, I thought I would try to get Windows XP going with VMware for Linux. Again, it worked flawlessly, and I am quite impressed with it.

Gossip Transports
Still plodding along with this. Things are going really well at the moment actually. Mikael and I spoke about what will happen over the next few weeks and we will probably merge our branches then and get things cleaned up ready for a new Gossip release.

Fine tuning the wizard last week meant that now people no longer need to wait for a list of servers they can use. It is much more efficient now.

Ubuntu anyone?
I have been trying Ubuntu Linux these last few weeks and I love it! Previously I have used mainly Red Hat and occasionally Mandrake. The best thing about it has to be the eye candy 😀