New SPARQL Parser merged

Up to now, Tracker has used Rasqal to parse SPARQL queries received from applications. We imported a copy of Rasqal into Tracker’s source tree as we use some SPARQL extensions that are not yet fully implemented in Rasqal upstream. For the last couple of weeks I’ve been writing a new SPARQL parser, a hand-written recursive descent parser.

The motivation was to speed up (large) queries and to fix a few corner cases with OPTIONAL graph patterns that Rasqal gets wrong. Rasqal does not perform very well for large queries as it always builds a full abstract syntax tree of the query, even if it’s a very simple but long INSERT statement. The hand-written parser processes queries on the fly, trying to keep only as much state around as necessary.
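To illustrate the difference, here is a minimal sketch in Python of the on-the-fly idea. It is not the actual Tracker parser: the toy grammar (`INSERT { s p o . ... }` only), the `StreamingInsertParser` class, and the `on_triple` callback are all assumptions made for this example. The point is only that each triple is handed off as soon as it is recognized, so no tree of the whole statement is ever built.

```python
import re
from typing import Callable, Iterator, Optional, Tuple

# Toy token set: <iri>, "literal", prefixed names/keywords, and punctuation.
TOKEN_RE = re.compile(r'\s*(<[^>]*>|"[^"]*"|[A-Za-z_][\w:-]*|[{}.])')

Triple = Tuple[str, str, str]


def tokenize(text: str) -> Iterator[str]:
    """Yield tokens lazily; the full token list is never materialized."""
    pos = 0
    end = len(text.rstrip())
    while pos < end:
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input near offset {pos}")
        pos = match.end()
        yield match.group(1)


class StreamingInsertParser:
    """Recursive descent parser for: INSERT { (term term term .)* }"""

    def __init__(self, text: str, on_triple: Callable[[Triple], None]) -> None:
        self._tokens = tokenize(text)
        self._on_triple = on_triple
        self._current: Optional[str] = None
        self._advance()

    def _advance(self) -> None:
        # One token of lookahead is the only parser state kept around.
        self._current = next(self._tokens, None)

    def _expect(self, expected: str) -> None:
        if self._current is None or self._current.upper() != expected:
            raise SyntaxError(f"expected {expected!r}, got {self._current!r}")
        self._advance()

    def _parse_term(self) -> str:
        token = self._current
        if token is None or token in {"{", "}", "."}:
            raise SyntaxError(f"expected a term, got {token!r}")
        self._advance()
        return token

    def _parse_triple(self) -> None:
        subject = self._parse_term()
        predicate = self._parse_term()
        obj = self._parse_term()
        # Hand the triple off immediately instead of storing it in an AST.
        self._on_triple((subject, predicate, obj))

    def parse(self) -> int:
        """Parse the whole statement and return the number of triples seen."""
        self._expect("INSERT")
        self._expect("{")
        count = 0
        while self._current != "}":
            self._parse_triple()
            self._expect(".")
            count += 1
        self._expect("}")
        if self._current is not None:
            raise SyntaxError(f"unexpected trailing token {self._current!r}")
        return count
```

A caller would pass a callback that does the real work (for example, writing to the store), so memory use stays roughly constant no matter how many triples the statement contains. A real SPARQL parser of course needs far more grammar rules than this, but the structure (one function per rule, one token of lookahead) stays the same.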

The new parser has now been merged into master and should improve our SPARQL support in both speed and conformance. We’ve tested it with test cases from DAWG and it appears to work at least as well as the old parser, but I’m sure we’ve missed some bugs, so let us know if you find any issues with SPARQL in Tracker master.

6 thoughts on “New SPARQL Parser merged”

  1. I would have been happy to talk with you about this and performance. INSERT is a non-standard part of SPARQL and is implemented in Rasqal as experimental code, which is likely not optimal for large queries as you describe. flex+yacc does not deal with that well, and the full abstract syntax tree was by design, appropriate for standard SPARQL without inserts.

    The Rasqal API and SPARQL work is under active development, and the query algebra is a lot more compliant than before (test-case driven). You might not need all of that for your use cases.

  2. @Dave: I understand that building the AST was by design and it’s a perfectly valid design in general. However, for our purposes the AST is not necessary, and we want to avoid the memory and performance overhead as Tracker will be used on embedded devices with limited resources.

    We did not contact you as the AST-less parser wouldn’t fit into Rasqal and code from the existing yacc parser couldn’t have been reused, as far as I can tell from my knowledge of the Rasqal codebase. I hope I haven’t missed an opportunity for collaboration.

  3. Awesome! Right at the time when I get some crazy ideas about extending SPARQL to allow ContextKit properties and GConf/GSettings values to parameterize Tracker queries. That would be used for a “Context Cron” daemon that would be excellent at live queries and at running actions whenever the result changes.

    This is probably best done outside of Tracker, and we would need a SPARQL parser of our own for that. We would parse the queries, plug in the property and setting values, and hand the result to Tracker.

    So, do you think your parser can be easily used to transform one SPARQL query into another?

    (This old Lisp head here considers all parsing-related things to be accidental complexity, actually, and has to cry a bit every time things go pear-shaped because some language needs to be invented or some values need to be treated abstractly, but hey, this is the real world… 🙂)

    (Also, hand-written recursive descent parsers are the way to go for any non-toy parser, so kudos for this choice!)

  4. Hi juergbi! Nice work, really. Do you plan to support query optimization sometime in the future?
