Optimizing SPARQL queries for Tracker, part 2

20/01/2011

In the last post, we learnt a way to avoid OPTIONAL blocks in queries. However, we also raised a problem that occurs when you want to fetch an optional resource and the predicate chain linking that resource to the subject includes a multi-valued predicate.

To illustrate this problem, let’s imagine we want to retrieve all music resources, along with their tags if they have any. The straightforward way to write this query would be:
SELECT ?m ?tagLabel WHERE { ?m a nmm:MusicPiece OPTIONAL { ?m nao:hasTag ?tag . ?tag nao:prefLabel ?tagLabel } } [1]

If you read the previous article, and forgot about multi-valued predicates, you might try something like:
SELECT ?m nao:prefLabel(nao:hasTag(?m)) WHERE { ?m a nmm:MusicPiece } [2]

… but that query is not valid. Why? Because nao:hasTag is not single-valued: a resource can have several tags. If a predicate function yields several results, Tracker concatenates them using a separator character, “,” by default. So the query
SELECT ?m nao:hasTag(?m) WHERE { ?m a nmm:MusicPiece } [3]
could return a line like:
urnOfMusicPiece urnOfTag1,urnOfTag2,urnOfTag3
So what you get in the second column is not actually the identifier of a resource, but a string with several URNs encoded inside it. There is no way the nao:prefLabel “function” can work on that.
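
To make the failure concrete, here is a small Python sketch that mimics the behaviour described above with plain dictionaries. It is only a toy model, not how Tracker is implemented: the TAGS and LABELS data and the has_tag and pref_label helpers are invented for this illustration.

```python
# Toy in-memory stand-in for the store; all names and data here are made up.
TAGS = {
    "urnOfMusicPiece": ["urnOfTag1", "urnOfTag2", "urnOfTag3"],
}
LABELS = {
    "urnOfTag1": "tagLabel1",
    "urnOfTag2": "tagLabel2",
    "urnOfTag3": "tagLabel3",
}


def has_tag(subject, separator=","):
    """Mimic a predicate function on a multi-valued predicate: join all values."""
    return separator.join(TAGS.get(subject, []))


def pref_label(subject):
    """Mimic a predicate function on a single-valued predicate: one value or None."""
    return LABELS.get(subject)


joined = has_tag("urnOfMusicPiece")
print(joined)              # urnOfTag1,urnOfTag2,urnOfTag3
print(pref_label(joined))  # None: the joined string is not the URN of any resource
```

The lookup only succeeds when it is given a single URN, such as pref_label("urnOfTag2"), which is exactly what the nested call cannot provide.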

There is an alternative solution though: the so-called scalar selects. A scalar select is a SELECT block returning one line and one column, so its result can be treated as a scalar. That type of select can be added to our query’s projections:
SELECT ?m (SELECT GROUP_CONCAT(nao:prefLabel(?tag), ":") WHERE { ?m nao:hasTag ?tag }) WHERE { ?m a nmm:MusicPiece } [4]

Yes, this does look a bit like black magic. But we’ll break it down into pieces. First, if you remove the scalar select, you get back to our most basic query, selecting all music resources. Now let’s analyse the scalar select itself, first without the GROUP_CONCAT:
SELECT nao:prefLabel(?tag) WHERE { ?m nao:hasTag ?tag } [5]

Query 5 has nothing really special about it; the only detail is that ?m is not defined in the scalar select itself, since its definition comes from the “main” one. Scalar selects in projections are evaluated after the WHERE pattern, which means you can use values from the “main” select in a scalar select in the projections, but not the other way around.
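
The evaluation order can be pictured with a short Python sketch. It is a toy model under made-up data (MUSIC_PIECES, TAG_LABELS) and hypothetical helper names, not a description of Tracker’s internals: the outer pattern produces one binding of ?m per row, and the inner select is then evaluated with that binding already fixed.

```python
# Hypothetical data: the second piece has no tags at all.
MUSIC_PIECES = ["urnOfMusicPiece1", "urnOfMusicPiece2"]
TAG_LABELS = {"urnOfMusicPiece1": ["tagLabel1", "tagLabel2"]}


def main_where():
    """Outer WHERE { ?m a nmm:MusicPiece }: yields one binding of ?m per row."""
    for m in MUSIC_PIECES:
        yield m


def inner_select(m):
    """Inner select, evaluated with ?m already bound by the outer query."""
    return TAG_LABELS.get(m, [])


# The inner select is computed once per outer row, after the row exists.
rows = [(m, inner_select(m)) for m in main_where()]
for row in rows:
    print(row)
```

Note that the piece without tags still produces a row, because the inner select cannot remove rows from the outer query; it can only fill in a column.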

Now on to GROUP_CONCAT: if our resource ?m happens to have several tags, our scalar select will return more than one line, and the additional results will be discarded (Tracker implicitly adds “LIMIT 1” to scalar selects). Not good. GROUP_CONCAT takes all the results and concatenates them using a given separator. In our case, we get a list of tag labels separated by “:”. So, a result line from query 4 might look like:
urnOfMusicPiece tagLabel1:tagLabel2:tagLabel3
If there are no tags, the second column will simply be empty. Of course, this approach requires a bit of string splitting on the application side, but this is usually much cheaper than the OPTIONAL block. If you’re really going to use this kind of query, choosing a better separator than “:” is a good idea: ASCII has some control characters, like 0x1E (record separator), that are much less likely to appear in tag labels. You can write it as \u001E in SPARQL.
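
The application-side splitting could look like the following Python sketch. It assumes the query was written with \u001E as the GROUP_CONCAT separator and that each result row arrives as a (URN, concatenated labels) pair; the rows list and the split_labels helper are invented for the example, and how your Tracker binding represents an empty column (empty string or None) is also an assumption, so both are handled.

```python
SEPARATOR = "\x1e"  # ASCII 0x1E, the character written as \u001E in the query


def split_labels(column, separator=SEPARATOR):
    """Turn a GROUP_CONCAT column into a list of labels.

    An empty or missing column means the resource has no tags.
    """
    if not column:
        return []
    return column.split(separator)


# Hypothetical result rows, shaped like the output of query 4.
rows = [
    ("urnOfMusicPiece1", "tagLabel1\x1etagLabel2\x1etagLabel3"),
    ("urnOfMusicPiece2", ""),
    ("urnOfMusicPiece3", None),
]

tags_by_piece = {urn: split_labels(column) for urn, column in rows}
for urn, labels in tags_by_piece.items():
    print(urn, labels)
```

Checking for an empty column before splitting matters: in Python, "".split(separator) returns [""], which would otherwise look like a single tag with an empty label.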

PS. To answer the question “How do I know if a predicate is single or multi-valued?”, you can read the ontology reference documentation and look for the “cardinality” property of the predicates you’re using.

Update: the ASCII control character I wanted to mention was not 0x2E but 0x1E.

Posted by Adrien Bustany
Tags: Sparql, Tracker


4 Responses to “Optimizing SPARQL queries for Tracker, part 2”

Damian Says:

20/01/2011 at 6:04 pm
I’m not sure the syntax is correct here. In SPARQL 1.1 you can’t have SELECTs in that position.

I think the query would be:

SELECT ?m (GROUP_CONCAT(nao:prefLabel(?tag), ":") AS ?tags)
WHERE {
  ?m a nmm:MusicPiece ;
     nao:hasTag ?tag .
} GROUP BY ?m

but don’t take my word for it. (Thanks for the work on tracker and SPARQL, btw)
- Adrien Bustany Says:
  
  20/01/2011 at 6:47 pm
  Hello Damian,
  
  You are probably right about the SPARQL 1.1 correctness of my query; however, Tracker introduces a few syntax additions, as described in [1], and scalar selects are among them. Also, I’m not sure your version of the query would match subjects with no nao:hasTag predicate (which is the point of using scalar selects here).
  
  In general, Tracker bends RDF’s corners when needed to stay fast. So standard SPARQL queries should work in Tracker, but the converse might not always be true.
  
  As a side note, you can also use scalar selects in a WHERE pattern, refer to the doc for some examples.
  
  [1] http://live.gnome.org/Tracker/Documentation/SparqlFeatures
  - Damian Says:
    
    20/01/2011 at 7:11 pm
    Oh yes, I forgot the original point of this 🙂
    
    Anyway, yes, you’d need an OPTIONAL in there. Sorry.


Experiments in GNOMEland