My current job implies working with Tracker, for the first time not as a developer but as a user. This is quite a cool change, as I can now be on the side of those bitching when things don’t work as they should 🙂

As you probably know now, Tracker is a RDF database (and a set of programs to exploit it). However, it is a bit special for various reasons:
1. Tracker’s ontologies are fixed (changes are supported in a limited way), which means you should stick to the installed data schemas (ontologies), as opposed to being allowed to store any triple in the database.
2. Tracker uses SQLite. On the good side it means Tracker is rather lite on the system resources (it usually idles at around 4MB RSS, and can go maybe up to 25-30MB when running a big query). On the bad side, it means that not every operation is fast, since SQLite is an on-disk database. As my job implies using Tracker on devices with not so much memory or CPU power, it is very important to know what is fast/expensive and what is not. And it is precisely what this post is about.

OPTIONAL blocks and predicate functions

OPTIONAL blocks are one of the very costly operations in Tracker. Let’s say you want to query all music resources, and their title. You could run this query:
SELECT ?urn ?title WHERE { ?urn a nmm:MusicPiece; nie:title ?title } [1]

However, this query will only return resources that do have a title. Resources without title will not match the query. To also get resources without a title, we can write:
SELECT ?urn ?title WHERE { ?urn a nmm:MusicPiece OPTIONAL { ?urn nie:title ?title } } [2]

Now, we have an OPTIONAL block, which makes our query slower. On this very precise example, the speed difference might be negligible, but I’ve already seen 10x speedups on some queries optimized to use as few OPTIONAL blocks as possible.

The faster solution is to use “predicate functions”, a non-standard SPARQL feature that allows us to use predicate as functions on the query variables. The query [2] rewritten to use predicate functions would be:
SELECT ?urn nie:title(?urn) WHERE { ?urn a nmm:MusicPiece } [3]

In that case, the second columns in our results would be an empty string when there is no title. If this is faster, you might wonder why Tracker does not convert internally OPTIONAL blocks to predicate functions. The answer is, OPTIONAL blocks allow you to do more things, that are not always possible with predicate functions. When using an optional block, you define a sparql variable (?title in our example), which you can reuse in other patterns. This is not the case when using predicate functions.

You can chain predicate functions. If you also want to get the album title of each music resource, you can write:
SELECT ?urn nie:title(?urn) nie:title(nie:musicAlbum(?urn)) WHERE { ?urn a nmm:MusicPiece } [4]

If you use a predicate function on a predicate that can have more than on value, values will be joined with a user defined separator (by default, “,”):
SELECT ?urn nie:keyword(?urn) WHERE { ?urn a nmm:MusicPiece } [5]

However, predicate functions don’t work on lists. That means if you have a chain of predicate functions p1(p2(…pn(?variable))), the query will only be valid if p2, p3…pn are single valued.

If one of the predicates is not single valued, you will either have to use an OPTIONAL block… Or wait until the next blog post, where I’ll present an alternative solution 🙂

One Response to “Optimizing SPARQL queries for Tracker, part 1”

  1. […] Adrien Bustany Posted by Bez kategorii Subscribe to RSS feed […]

Comments are closed.