TL;DR: $TITLE, and a call for distributors to make it easily available in stable distros, more about that at the bottom.
Sometime this week (or last, depending on how you count), Tracker 2.99.1 was released. Sam has been doing a fantastic series of blog posts documenting the progress. With my blogging frequency I’m in no danger of stealing his thunder :). Still, I will add some retrospective here to highlight how important a milestone this is.
First of all, let’s give an idea of the magnitude of the changes so far:
[carlos@irma tracker]$ git diff origin/tracker-2.3..origin/master --stat -- docs examples src tests utils |tail -n 1
788 files changed, 20475 insertions(+), 66384 deletions(-)
[carlos@irma tracker-miners]$ git diff origin/tracker-miners-2.3..origin/master --stat -- data docs src tests | tail -n 1
354 files changed, 39422 insertions(+), 6027 deletions(-)
What happened there? A little more than half of the insertions in tracker-miners (and the corresponding deletions in tracker) can be attributed to the code from libtracker-miner, libtracker-control and the corresponding tests moving to tracker-miners. Those libraries are no longer public, but given that they are either unused or easily replaceable, that’s not even the most notable change :).
Globally, the changes could be described as “things falling into place”. Tracker got more cohesive, versatile and tested than it ever was; we put a lot of care and attention to detail into it, and we hope you like the result. Let’s break down the highlights.
Understanding SPARQL
Sometime a couple of years ago, I got fed up after several failed attempts at implementing support for property paths, which wound up in a rewrite of the SPARQL parser. That was part of Tracker 2.2.0 and brought its own benefits; ancient history by now.
Getting to the point: having the expression tree in the new parser closely modeled after the SPARQL 1.1 grammar definition gave us a perfect snapshot of what we didn’t do, what we didn’t do correctly, and what we did extra. The parser was made to accept all correct SPARQL, and we had an `_unimplemented()` define in place to error out when interpreting the expression tree.
That also gave me something to grep through and sigh at. It turned into many further reads of the SPARQL 1.1 specs, and a number of ideas about how to tackle the gaps, or about whether we were restricted by compatibility concerns, as for some things we were limited by our own database structure.
Fast forward to today: the define is gone. Tracker covers the SPARQL 1.1 language in its entirety, warts and all. The spec is from 2013; we just got there 7 years late :). Most notably, there’s:
- Graphs: In a triple store, the aptly named triples consist of subject/predicate/object, and they belong within graphs. The object may point to elements in other graphs.
In prior versions we “supported graphs” in the language, but a graph was more a property of the triple’s object. The change in semantics looks slight but is fundamental: e.g. previously no two graphs could hold the same triple, and ownership of a triple was backwards if subject and object were in different graphs.
Now the implementation of graphs perfectly matches the description, and a graph becomes a good isolated unit to grant access to in the case of sandboxing.
We also support the ADD/MOVE/CLEAR/LOAD/DROP operations on whole graphs, to ease their management.
- Services: The SERVICE syntax lets you federate portions of your query graph pattern to external services, and operate transparently on that data as if it were local. This is not exactly new in Tracker 2.99.x, but it now supports D-Bus services in addition to HTTP ones. More notes on why this is key further down.
- New query forms, DESCRIBE/CONSTRUCT: This syntax sits alongside SELECT. DESCRIBE is a simple form to get the RDF triples fully describing a resource; CONSTRUCT is a more powerful data extraction clause that allows serializing arbitrary portions of the triple set, even all of it, and even across RDF schemas.
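To make the new query forms concrete, here is a minimal sketch; the resource URI and graph name are made-up examples, not anything Tracker ships:

```sparql
# DESCRIBE: fetch all RDF triples fully describing one resource
DESCRIBE <urn:example:some-resource>

# CONSTRUCT: serialize an arbitrary portion of the triple set,
# here everything inside one named graph
CONSTRUCT { ?s ?p ?o }
WHERE {
  GRAPH <urn:example:mygraph> {
    ?s ?p ?o
  }
}
```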
Of the 11 documents in the SPARQL 1.1 recommendations, we are essentially only missing support for HTTP endpoints to fully pass for a SPARQL 1.1 store. We obviously don’t mean to compete with enterprise-level databases, but we are completionists and will get to implementing the full recommendations someday :).
There is no central store
The tracker-store service got stripped of everything that made it special. You were already able to create private stores; making those public via D-Bus is now one API call away. And its simple D-Bus API to perform/restore backups is now superseded by the CONSTRUCT and LOAD syntax.
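As a sketch of what backup and restore look like in plain SPARQL now (the file URI below is hypothetical):

```sparql
# Backup: CONSTRUCT can serialize the entire triple set...
CONSTRUCT WHERE { ?s ?p ?o }

# ...and restore becomes a LOAD of previously serialized RDF data
LOAD <file:///backups/tracker-backup.ttl>
```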
We have essentially democratized triple stores. In this picture (and in a sandboxed world) it does not make sense to have a singleton default store, so the tracker-store process itself is no more. Each miner (Filesystem, RSS) has its own store, made public on its D-Bus name. The TrackerSparqlConnection constructors let you specifically create a local store, or connect to a specific D-Bus/HTTP service.
No central service? New paradigm!
Did you use to store/modify data in tracker-store? There’s some bad news: it’s no longer yours to touch, scram from our lawn!
You are still very welcome to create your own private store; there you can do as you please, even rolling something other than Nepomuk.
But wait, how can you keep your own store and still consume data indexed by the Tracker miners? Here the SERVICE syntax comes into play, allowing you to deal with miner data and your own altogether. A simple hypothetical example:
# Query favorite files
SELECT ?u {
  SERVICE <dbus:org.freedesktop.Tracker3.Miner.Files> {
    ?u a nfo:FileDataObject
  }
  ?u mylocaldata:isFavorite true
}
As per the grammar definition, the SERVICE syntax can only be used in Query forms, not Update ones. This is essentially the language conspiring to keep a clear ownership model, where other services’ data is not yours to modify.
If you are only interested in accessing one service, you can use tracker_sparql_connection_bus_new and perform queries directly against the remote service.
A web presence
It’s all about appearance these days; that’s why newscasters don’t switch the half of the suit they wear. A long time ago we used to have the tracker-project.org domain; the domain expired and eventually got squatted.
That sucks in itself; for us it was a bit of a pickle, as RDF (and our own ontologies) stands largely on URIs. That means live software producing links out of our control, links going into pastes/bugs/forums all over the internet. Luckily for us, tracker-project.org is a terrible choice of name for a porn site.
We couldn’t simply change them either; in many regards those links were ABI. With 3.x on the way, ABI was no longer a problem. Sam did things properly, so we now have a site, and a proper repository of ontologies.
Nepomuk is dead, long live Nepomuk
Nepomuk is a dead project. Despite its site being currently alive, it’s been dead for extended periods of time over the last 2 years. That’s 11.5M EUR of your European taxpayer money slowly fading away.
We no longer think we should consider it “an upstream”, so we have decided to go our own way. After some minor sanitization and URI rewriting, the Nepomuk ontology is preserved mostly as-is, under our own control.
But remember, Nepomuk is just our “reference” ontology, a swiss army knife for whatever might need to be stored in a desktop. You can always roll your own.
Tracker-miner-fs data layout
For sandboxing to be any useful, there must be some actual data separation. The tracker-miner-fs service now stores things in several graphs:
- tracker:FileSystem
- tracker:Audio
- tracker:Video
- tracker:Documents
- tracker:Software
The service also commits further to the separation between “Data Objects” (e.g. files) and “Information Elements” (e.g. what their content represents). Both aspects of a “file” still reference each other, but they simply used to be the same resource.
The tracker:FileSystem graph is the backbone of file system data: it contains all file Data Objects, and folders. All other graphs store the related Information Elements (e.g. a song in a FLAC file).
Resources are interconnected between graphs, depending on the graphs you have access to, you will get a partial (yet coherent) view of the data.
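A sketch of how such a cross-graph query might look under the layout above: the song (an Information Element in the tracker:Audio graph) links back to its file (a Data Object in the tracker:FileSystem graph). The class and property names come from the Nepomuk ontologies; treat the exact shape as illustrative.

```sparql
SELECT ?song ?file {
  GRAPH tracker:Audio {
    ?song a nmm:MusicPiece ;
          nie:isStoredAs ?file
  }
  GRAPH tracker:FileSystem {
    ?file a nfo:FileDataObject
  }
}
```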
CLI improvements
We have been doing some changes around our CLI tools. With tracker shifting its scope to being a good SPARQL triple store, the base set of CLI tools revolves around that, and can be seen as the equivalent of the sqlite3 CLI command.
We also have some SPARQL-specific sugar, like tracker endpoint, which lets you create transient SPARQL services.
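A minimal sketch of standing up such a transient endpoint; the D-Bus name and ontology path are made-up examples:

```
tracker3 endpoint --dbus-service org.example.Example --ontology-path ~/my-test-ontology/
```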
All miner-specific subcommands, and those that relied implicitly on miner details, moved to the tracker-miners repo; the tracker3 command is extensible to allow this.
Documentation
In case this was not clear, we want to be a general-purpose data storage solution. We spent quite some time improving and extending the developer and ontology documentation, adding migration notes… there’s even an incipient SPARQL tutorial!
There is a sneak preview of the API documentation at our site. It’s nice being able to tell that again!
Better tests
Tracker additionally ships a small helper Python library to make it easy to write tests against the Tracker infrastructure. There are many new and deeper tests all over the place, e.g. around the new syntax support.
Up next…
You’ve seen some talk about sandboxing, but nothing about sandboxing itself. That’s right, support for it is in a branch and will probably be part of 2.99.2. Now the path is paved for it to be transparent.
We are currently starting the race to update users. Sam got some nice progress on nautilus, and I just got started at shaving a yak on a cricket.
The porting is not completely straightforward. With a few nice exceptions, a good amount of the Tracker code around is stuck in some time-frozen, “as long as it works”, cargo-culted state. This sounds like a good opportunity to modernize queries and introduce the usage of compiled statements. We are optimistic that we’ll get most major players ported in time, and we made 3.x able to install and run in parallel in case we miss the goal.
A call to application developers
We are no longer just “that indexer thingy”. If you need to store data with more depth than a table, if you missed your database design and relational algebra classes, or don’t miss them at all: we’ve got to talk :). Come visit us at #tracker.
A call to distributors
We made tracker and tracker-miners 3.x able to install and run in parallel to tracker 2.x, and we expect users to get updated to it over time.
Given that it will get reflected in the nightly Flatpaks, and that Tracker miners are host services, we recommend that tracker3 development releases be made available or easy to install in current stable distribution releases. Early testers and we ourselves will thank you.
It’s impressive how you have been working on so many fronts. Tracker 3 is going to be a remarkable milestone. Congratulations, Carlos and Sam!
Thank you Antonio :). We do hope so!
I’d love to have more CLI examples for “tracker sparql” and “tracker sql”. Will it be possible to “CREATE DATABASE” and actions like that. Targeted at novice users rather than full-fledged gurus 😉
Thank you @amano for your interest :).
The tracker-sparql manpage has some examples; it could definitely welcome some additions/updates though. The “tracker sql” command is a low-level tool; using it takes some knowledge of the database structure.
That said, the user experience of our CLI tools could definitely get a facelift, a shell mode at least.
In Tracker, the database structure is defined by the ontology; we don’t allow database structure changes that don’t come through it. There are some docs on writing one (again, subject to extension), and trying things out should be easy now with e.g. “tracker3 endpoint --dbus-service org.example.Example --ontology-path ~/my-test-ontology/”
There’s indeed a learning curve with SPARQL, and there’s not a lot of generic internet documentation that is not simple slides or dense papers. That is the main motivation behind our own SPARQL tutorial :).
Oh, OK. My misconception was that the SQL “layer” was for the “common Joes” who come with a bit of basic SQL syntax knowledge, while it is the other way round and “tracker sql” is a bad starting point to get into the SPARQL concept.
I would like to have a list of easy examples to get into the thing:
Real world examples like:
Assume you have all Beatles albums in your Music folder and already indexed by tracker:
Now manually query those things from the DB:
All files in Music/Beatles/
All files with an Album tag
File from the White album
Things like that.