Tracker ORM: Step 0

02/05/2010

So I’ve been accepted for the second year as a Google Summer of Code student. I want to thank all the people who supported my proposals (yes, there were two of them, obviously only one got accepted), and of course Google for sponsoring Open Source work.

This summer, my job will be to build an ORM on top of Tracker. Those who have read other articles on this blog know that I’ve been working on Tracker for quite some time now, so it’s no surprise I’ll continue working on the same project 🙂

Introducing Hormiga

So far, I’m in the early planning stage of the project, mainly looking at how others do it, both in SPARQL and in SQL. What I know for sure is that this project will be called “Hormiga”. “Hormiga” has “ORM” inside, and means “ant” in Spanish. An ant is lightweight but strong, two qualities that would be nice to have in the final product.

What is it for?

Hormiga aims to lower the bar when it comes to working with SPARQL and RDF databases. Some operations are complex or tedious, and Hormiga aims to make them easier. However, I also intend to leave open the possibility of making direct SPARQL requests, for those who know what they are doing.

How will it work?

I’m not totally sure yet how Hormiga will be used. Most ORM solutions I have seen for RDF rely on reflection in the language they target (Java, Python, Ruby). I don’t want to restrict Hormiga that much, so reflection is ruled out for now. I’m thinking more of a workflow along the lines of Apache Cayenne, which generates a simple proxy that can be subclassed to implement custom behaviors.

With such an approach, the workflow would be:

  1. Write a mapping file, possibly in JSON or XML. The mapping file(s) define the (RDF) classes and properties to map, and how to map them.
  2. Run the ORM generator on the mapping file to produce proxies for each mapped class (I’m thinking about targeting Vala at first, which gives us C for free, but Python or JavaScript backends could be added in the future).
  3. Subclass (if needed) the produced proxies. The generated proxies should not be modified by hand; in fact, generation should be part of the build process.
  4. Use the produced classes as if they were normal objects.
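
To make the four steps above a bit more concrete, here is a rough sketch in Python (not Vala, and in no way Hormiga’s actual design, which does not exist yet). The JSON mapping format, the `generate_proxy` function and the `MusicPieceProxy` / `MusicPiece` class names are all assumptions made up for illustration; the RDF class and property names are only sample values.

```python
import json

# Step 1: a mapping file. This JSON layout is a made-up assumption,
# not a format defined by Hormiga.
MAPPING = json.loads("""
{
  "name": "MusicPiece",
  "class": "nmm:MusicPiece",
  "properties": {
    "title": "nie:title",
    "duration": "nfo:duration"
  }
}
""")


def generate_proxy(mapping):
    """Return Python source code for a proxy class described by `mapping`."""
    lines = [
        f"class {mapping['name']}Proxy:",
        f"    RDF_CLASS = {mapping['class']!r}",
        f"    PROPERTIES = {mapping['properties']!r}",
        "",
        "    def __init__(self, uri, **values):",
        "        self.uri = uri",
    ]
    # One plain attribute per mapped RDF property.
    for attr in mapping["properties"]:
        lines.append(f"        self.{attr} = values.get({attr!r})")
    return "\n".join(lines) + "\n"


# Step 2: run the generator. A real build step would write the source to a
# file; here it is executed directly to keep the sketch self-contained.
namespace = {}
exec(generate_proxy(MAPPING), namespace)
MusicPieceProxy = namespace["MusicPieceProxy"]


# Step 3: subclass the generated proxy instead of editing it.
class MusicPiece(MusicPieceProxy):
    def duration_minutes(self):
        return self.duration / 60


# Step 4: use the result like a normal object.
song = MusicPiece("urn:song:1", title="Example", duration=180)
```

Because custom code lives only in the subclass, the proxy can be regenerated on every build without losing anything.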

We will probably have an entity manager, as most ORMs do, to do queries and update objects (from/to the DB).

Things I’d like to get right from the beginning

  • Lazy loading: the more we can defer the actual work, the snappier the application will be
  • Doing the maximum on the SPARQL side: do as much as possible of the filtering on Tracker’s side, and only load what’s needed
  • Stay flexible: always allow the user to fall back to SPARQL if the API is not complete enough
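
The three goals above could fit together roughly as follows. This is only a sketch under heavy assumptions: `FakeStore` is a stand-in that records queries instead of talking to Tracker, and `LazyQuery` with its `filter` method is a hypothetical API, not something Hormiga provides.

```python
class FakeStore:
    """Stand-in for a SPARQL endpoint; it records the queries it receives."""

    def __init__(self, rows):
        self.rows = rows
        self.queries = []

    def query(self, sparql):
        self.queries.append(sparql)
        return list(self.rows)


class LazyQuery:
    """Builds a SPARQL query and only runs it when results are iterated."""

    def __init__(self, store, rdf_class, filters=()):
        self.store = store
        self.rdf_class = rdf_class
        self.filters = tuple(filters)

    def filter(self, prop, var, condition):
        # Return a new query object; nothing is sent to the store yet.
        new_filters = self.filters + ((prop, var, condition),)
        return LazyQuery(self.store, self.rdf_class, new_filters)

    def to_sparql(self):
        # The filtering is expressed in SPARQL, so the store does the work.
        patterns = [f"?s a {self.rdf_class} ."]
        for prop, var, condition in self.filters:
            patterns.append(f"?s {prop} {var} . FILTER({condition})")
        return "SELECT ?s WHERE { " + " ".join(patterns) + " }"

    def __iter__(self):
        # Lazy loading: the query runs only here.
        return iter(self.store.query(self.to_sparql()))


store = FakeStore(["urn:song:1"])
query = LazyQuery(store, "nmm:MusicPiece").filter(
    "nie:title", "?title", 'REGEX(?title, "^A")'
)
queries_before = len(store.queries)  # still 0: building a query is free
results = list(query)                # the single SPARQL query runs now

# Staying flexible: raw SPARQL can always be sent to the store directly.
raw_results = store.query("SELECT ?s WHERE { ?s a nmm:MusicPiece }")
```

Building the query object costs nothing, the store receives one query containing the filter, and the raw `store.query` call remains available when the API falls short.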

I know there aren’t many concrete things in this blog post, but it should get better in the weeks to come 🙂 Meanwhile, if you have any thoughts about Hormiga, feel free to share them in the comments!