What am I doing with Tracker?

“Colored net”by Chris Vees (priorité maison) is licensed under CC BY-NC-ND 2.0

Some years ago I was asked to come up with some support for sandboxed apps wrt indexed data. This drummed up into Tracker 2.0 and domain ontologies, allowing those sandboxed apps to keep their own private data and collection of Tracker services to populate it.

Fast forward to today and… this is still largely unused, Tracker-using flatpak applications still whitelist org.freedesktop.Tracker, and are thus allowed to read and change content there. Despite I’ve been told it’s been mostly lack of time… I cannot blame them, domain ontologies offer the perfect isolation at the cost of the perfect duplication. It may do the job, but is far from optimal.

So I got asked again “we have a credible story for sandboxed tracker?”. One way or another, seems we don’t, back to the drawing board.

Somehow, the web world seems to share some problems with our case, and seems to handle it with some degree of success. Let’s have a look at some excerpts of the Sparql 1.1 recommendation (emphasis mine):

RDF is often used to represent, among other things, personal information, social networks, metadata about digital artifacts, as well as to provide a means of integration over disparate sources of information.

A Graph Store is a mutable container of RDF graphs managed by a single service. […] named graphs can be added to or deleted from a Graph Store. […] a Graph Store can keep local copies of RDF graphs defined elsewhere […] independently of the original graph.

The execution of a SERVICE pattern may fail due to several reasons: the remote service may be down, the service IRI may not be dereferenceable, or the endpoint may return an error to the query. […] Queries may explicitly allow failed SERVICE requests with the use of the SILENT keyword. […] (SERVICE pattern) results are returned to the federated query processor and are combined with results from the rest of the query.

So according to Sparql 1.1, we have multiple “Graph Stores” that manage multiple RDF graphs. They may federate queries to other endpoints with disparate RDF formats and whose availability may vary. This remote data is transparent, and may be used directly or processed for local storage.

Let’s look back at Tracker, we have a single Graph Store, which really is not that good at graphs. Responsibility of keeping that data updated is spread across multiple services, and ownership of that data is equally scattered.

It snapped me, if we transpose those same concepts from the web to the network of local services that your session is, we can use those same mechanisms to cut a number of drawbacks short:

  • Ownership is clear: If a service wants to store data, it would get its own Graph Store instead of modifying “the one”. Unless explicitly supported, Graph Stores cannot be updated from the outside.
  • So is lifetime: There’s been debate about whether data indexed “in Tracker” is permanent data or a cache. Everyone would get to decide their best fit, unaffected by everyone else’s decisions. The data from tracker-miners would totally be a cache BTW :).
  • Increases trustability: If Graph Stores cannot be tampered with externally, you can trust their content to represent the best effort of their only producer, instead of the minimum common denominator of all services updating “the Graph Store”.
  • Gives a mechanism for data isolation: Graph Stores may choose limiting the number of graphs seen on queries federated from other services.
  • Is sandboxing friendly: From inside a sandbox, you may get limited access to the other endpoints you see, or to the graphs offered. Updates are also limited by nature.
  • But works the same without a sandbox. It also has some benefits, like reducing data duplication, and make for smaller databases.

Domain ontologies from Tracker 2.0 also handle some of those differently, but very very roughly. So the first thing to do to get to that RDF nirvana was muscling up that Sparql support in Tracker, and so I did! I already had some “how could it be possible to do…” plans in my head to tackle most of those, but unfortunately they require changes to the internal storage format.

As it seems the time to do one (FTR, storage format has been “unchanged” since 0.15) I couldn’t just do the bare minimum work, it seemed too much of a good opportunity to miss, instead of maybe making future changes for leftover Sparql 1.1 syntax support.

Things ended up escalating into https://gitlab.gnome.org/GNOME/tracker/commits/wip/carlosg/sparql1.1, where It can be said that Tracker supports 100% of the Sparql 1.1 syntax. No buts, maybe bugs.

Some notable additions are:

  • Graphs are fully supported there, along with all graph management syntax.
  • Support for query federation through SERVICE {}
  • Data dumping through DESCRIBE and CONSTRUCT query forms.
  • Data loading through LOAD update form.
  • The pesky negated property path operator.
  • Support for rdf:langString and rdf:List
  • All missing builtin functions

This is working well, and is almost drop-in (One’s got to mind the graph semantics), making it material for Gnome 3.34 starts to sound realistic.

As Sparql 1.1 is a recommendation finished in 2013, and no other newer versions seem to be in the works, I think it can be said Tracker is reaching maturity. Only HTTP Graph Store Protocol (because why not) remains the big-ish item to reasonably tell we implement all 11 documents. Note that Tracker’s bet for RDF and Sparql started at a time when 1.0 was the current document and 1.1 just an early draft.

And sandboxing support? You might guess already the features it’ll draw from. It’s coming along, actually using Tracker as described above will go a bit deeper than the required query language syntax, more on that when I have the relevant pieces in place. I just thought I’d stop a moment to announce this huge milestone :).

2 thoughts on “What am I doing with Tracker?”

  1. A (small) part of the reason nothing has been done with the previous/current sandboxing support is because it doesn’t work if you run an app from an installation that doesn’t do exports, such as running apps from Builder. Since nothing gets exported, the private Tracker instance won’t be started, unless you also have the same app actually installed, and since we have different app ids for stable and nightly/devel, it has to be nightly.

    Since a lot of apps using Tracker are a part of newcomers program, it’s expected that the Builder workflow always works, so having to install nightly build of an app you want to run in Builder is more or less a dealbreaker. 🙂

  2. The application could still detect that environment and choose not to do the tracker_sparql_connection_set_domain() call that gets domain ontologies rolling *shrug*.

    Anyway, too late, I’m overengineering 8).

Leave a Reply

Your email address will not be published. Required fields are marked *