Oh, the security!


Under public domain

There’s been lots of fuss lately around Tracker as a security risk, and as the de-facto maintainer of Tracker I feel obliged to comment. I’ll comment purely on the Tracker bits; I will not comment on other topics that, OTOH, were not as debated but are similarly affected, like thumbnailing, previewing, autodownloading, or the state of maintenance of gstreamer-plugins-bad.

First of all, I’m glad to report that Tracker now sandboxes its extractors, so its only point of exposure to exploits is now much more constrained, leaving very little room for malicious code to do anything harmful. This fix has been backported to 1.10 and 1.8, and new tarballs have been rolled. Everyone rejoice.

Now, the original post raising the dust storm certainly achieved its dramatic effect. Despite Tracker not doing anything insecure besides calling a closed, well-known set of 3rd party libraries (which after all are most often installed from the same trusted sources that Tracker comes from), it’s been in the “security” spotlight across several bugs/MLs/sites, with different levels of accuracy. I’ll publicly comment on some of the assertions I’ve seen in the last few days.

This is a design flaw in Tracker!

Tracker has always performed metadata extraction in a separate process for stability reasons, which means we already count on this process possibly crashing and burning away.

Tracker was indeed optimistic about the possible reasons why that might happen, but precisely thanks to Tracker’s design it’s been a breeze to isolate the involved parts. A ~200-line change hardly counts as a redesign.
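To illustrate why a separate extraction process buys stability, here is a minimal sketch in Python (Tracker itself is written in C, and none of these names come from its code). The `CHILD_CODE` snippet is a made-up stand-in for an extractor that aborts on a "bad" file; the parent only ever sees an exit status and carries on.

```python
import subprocess
import sys

# Hypothetical stand-in for an extractor: it aborts hard on any path
# containing "evil", the way a buggy third-party parser might.
CHILD_CODE = """
import os, sys
path = sys.argv[1]
if "evil" in path:
    os.abort()  # simulate the parser crashing
print("title=" + path)
"""


def extract_in_child(path):
    """Run the extractor in a throwaway process.

    Returns the child's output, or None if the child crashed.
    """
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, path],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None  # the child burned; the caller is unaffected
    return result.stdout.strip()
```

Calling `extract_in_child` on a well-behaved file returns its metadata line, while a crashing file just yields `None` and the caller moves on to the next one.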

All Tracker daemons are inherently insecure! Or its funnier cousin: Tracker leaks all collected information to the outside world!

This security concern has only arisen because of the use of 3rd party parsers (well, in the case of the GStreamer vulnerability in question, decoders; why a parsing facility like GstDiscoverer triggers decoding is another question worth asking), and this parsing of content happens in exactly one place in your common setup: tracker-extract.

Let’s dissect the Tracker daemons’ functionality a bit:

  • tracker-store: The manager of your user Tracker database. It connects to the session bus and gets read-write access to a database in ~/.cache. It also sends notifications of changes in the database through the user bus.
  • tracker-miner-fs: The process watching for changes in the filesystem, and filling in the basic information that can be extracted from shared-mime-info sniffing (which usually involves matching some bytes inside the file, with few conditionals involved), struct dirent and struct stat.
  • tracker-extract: Guilty as charged! It receives the notification of changes, and is basically a loop that picks the next unprocessed file, runs it through 3rd party parsers, sends a series of insert clauses over D-Bus, and picks the next file. Wash, rinse, repeat.
  • tracker-miner-applications: A very simplified version of tracker-miner-fs that just parses the keyfiles in various .desktop file locations.
  • tracker-miner-rss: This might be another potential candidate, as it parses “arbitrary” content through libgrss. However, it must be configured by the user; otherwise it has no RSS feeds to read from. I consider the possibility of hijacking famous blogs and news sites to hack through tracker-miner-rss remote enough to fix it after a breather.
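The tracker-extract loop described in the list can be sketched roughly as follows. This is an illustration in Python, not Tracker's actual C code: `fake_parser` stands in for the 3rd party libraries, `send` stands in for the D-Bus call, and the `ex:` property prefix is invented for the example.

```python
from collections import deque


def fake_parser(path):
    """Hypothetical stand-in for a third-party parser library."""
    return {"title": path.rsplit("/", 1)[-1]}


def to_insert_clause(path, metadata):
    """Build one insert clause; the ex: properties are illustrative only.

    No escaping is done here, which a real implementation would need.
    """
    triples = " ; ".join(f'ex:{k} "{v}"' for k, v in sorted(metadata.items()))
    return f"INSERT {{ <file://{path}> {triples} }}"


def extract_loop(pending, parse, send):
    """Pick the next unprocessed file, parse it, send the result, repeat."""
    while pending:
        path = pending.popleft()
        try:
            metadata = parse(path)
        except Exception:
            continue  # a failed parse only skips this one file
        send(to_insert_clause(path, metadata))
```

The point of the sketch is how little the loop itself does: all the risky work sits behind `parse`, which is exactly the part that lives in external libraries.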

So, per-parser specifics aside, Tracker consists of one database stored under 0600 permissions, with information being added to it through requests on the D-Bus session bus, and being read by apps from a read-only handle created by libtracker-sparql. The read and write channels can be independently isolated.
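As a rough illustration of that split between one private writer and read-only readers, here is a sketch using Python's `sqlite3` module as a stand-in. The `meta` table and both function names are invented for the example and do not reflect Tracker's schema or the libtracker-sparql API.

```python
import os
import sqlite3


def create_private_db(path):
    """Create an owner-only (0600) database file and return a writer handle."""
    # Create the file with restrictive permissions before SQLite touches it.
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    os.close(fd)
    os.chmod(path, 0o600)  # be explicit in case the file already existed
    writer = sqlite3.connect(path)
    writer.execute("CREATE TABLE IF NOT EXISTS meta (uri TEXT, title TEXT)")
    writer.commit()
    return writer


def open_readonly(path):
    """Return a handle through which SQLite rejects any write."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)
```

A reader opened this way can run queries, but any attempt to insert through it raises `sqlite3.OperationalError`, so the write path stays confined to whoever holds the writer handle.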

If you are really terrified by your user information being stored inside your homedir, or can’t sleep thinking of your session bus as a dark alley, you certainly want to run all your applications in a sandbox; they won’t be able to poke at org.freedesktop.Tracker1.Store or sniff around ~/.cache that way.

But again, there is nothing that makes Tracker as a whole inherently insecure, at least not more than the average session bus service, or the average application storing data in your homedir. Everything that could be distrusted is down to specific parsers, and that is anything but inherent in Tracker.

Tracker-extract runs plugins and is thus unsafe!

No, tracker-extract has a modular design, but is not extensible itself. It reads a closed set of modules implemented by Tracker from a folder that should be in /usr/lib/tracker-1.0 if your setup is right. The API of these modules is private and subject to change. If anything manages to add or modify modules there, you’ve got way worse concerns.

Now, one of these extractor modules uses GStreamer, which to my knowledge is still the go-to library if you want anything multimedia on Linux, and it happens to open an arbitrary list of plugins itself; that is beyond Tracker’s control or reach.

It should be written in Rust!

What do we gain from that? As said, tracker-extract is in essence a very simple loop; all the scary stuff is handled by external libraries that will still be implemented in “unsafe languages”. Rust would be about as useful as gift paper to wrap this.

Extraction should be further isolated into another process!

There are good reasons not to do that. Having two separate processes running completely interlocked tasks (one process can’t do anything until the other is finished) is pretty much a worst case for scheduling, context switching, performance and battery life at once.

Furthermore, such a tertiary service would need exactly the same whitelisted syscalls and exactly the same number of ways out of the process. So I think I won’t attract the “Tracker is heavy/slow” zealots this time… There is a throwaway process, and it is tracker-extract.

The silver linings

Tracker is already more secure; now let’s silence the remaining noise. Quite certainly one area of improvement is Flatpak integration, so that sandboxed applications can launch isolated Tracker instances that run under the same sandboxed environment, with extracted data only visible within the sandbox.

This is achievable with the current Tracker design. However, the “Tracker as a service” approach sounds excessive under this status quo: Tracker needs to adapt to being usable as a local store, and before that it needs to become more of a generic SPARQL endpoint.

But this is just adapting to the new times. Flatpak is relatively young and Tracker is slow moving, so they haven’t met yet. But there is a pretty clear roadmap, and we’ll get there.

8 thoughts on “Oh, the security!”

  1. @tobias, what you did is very human, fears lose their foundation but the fear remains. That’s why screeching sounds like nails on a blackboard make our skin crawl.

    I will however be clear here: Tracker shall not change its defaults. The problem has been addressed directly, and doing anything additional on top is succumbing to unfounded fears. Changing the defaults implicitly acknowledges that the sandboxing effort is not entirely trustworthy, while it’s meant to be the first barrier of defense. I will not send such a mixed message.

    Malicious files don’t often announce themselves. You might move one together with other legitimate files to an indexed folder, or might even be fooled into opening it! Is your video player sandboxed? Truth is that every other app in your system uses the same exploitable codecs; making Tracker not index ~/Downloads is just as effective a protection barrier as lightly closing the bedroom door to stop the burglar in your living room.

  2. Carlos, you handled this one exquisitely well. Both technically with the seccomp.h stuff (I only knew after reading the release notes, so I reviewed/looked at the commit only yesterday – out of curiosity), and communicatively.

    I think it’s fair to say that you are not just Tracker’s de-facto maintainer. You are its maintainer.

  3. >Now, one of these extractor modules uses GStreamer, which to my knowledge is still the go-to library if you want anything multimedia on linux

    No, that would be ffmpeg.

    The big flaw here is that someone thought it was acceptable to use badly maintained code written with performance in mind to process all files on a system to some extent. Why were sandboxing precautions not taken earlier? The big deal here is that tracker and GStreamer developers were not proactive in trying to mitigate security concerns, someone had to point them out first for a snarky rebuttal blog post to happen.

  4. >No, that would be ffmpeg.

    ffmpeg gives me metadata of raw image formats? jpeg2000? png? xcf? ogg? flac? Are you willing to provide patches?

    >The big flaw here is that someone thought it was acceptable to use badly maintained code written with performance in mind to process all files on a system to some extent. Why were sandboxing precautions not taken earlier? The big deal here is that tracker and GStreamer developers were not proactive in trying to mitigate security concerns, someone had to point them out first for a snarky rebuttal blog post to happen.

    The big deal has been just hot air since the GStreamer devs fixed the real issue really quickly after that snarky blog post, which was actually the first word of notice. After that, no matter how slow or “not proactive enough” you deem our response, there’s never been a real risk.

    And now what? After the issue is fixed we have to put up with “you should have known better” snarky comments? I won’t even bother approving or commenting on your other comment; go whine to LWN.

  5. @fratti: A few years ago, when we rearchitected Tracker, we did so with the explicit aim of splitting the whole thing up into separate processes.

    The main reason why tracker-extract was taken out of what is now called tracker-miner-fs (used to be called tracker-indexer or something) was that a crash of the extractor code would mean loss of state of the miner process. That code depends heavily on external libraries, and we are very thankful for those libraries, for otherwise we would have had to write huge amounts of file format metadata extraction code. But it was also because of the possibility of further isolating, in the future, the process that deals with external input (the files with metadata that must be extracted and mined).

    Process isolation technology like seccomp was released in 2005, but in 2009 it still had only one user: Arcangeli’s CPUShare. After that, Google started using it and it became popular. This means that it’s actually a fairly recent possibility to use infrastructure like seccomp. Agreed that Tracker could have immediately jumped on it, but as Carlos mentioned: are you prepared to write patches? You can compare the years that the Tracker project was funded by Nokia for the Maemo/MeeGo devices to the current years. You’ll notice quite clearly that there aren’t that many people who spend their time on maintaining the project anymore.

    Carlos does what he can do. And now he added seccomp support. This is a good thing. Say thank you, instead of blaming people without contributing yourself.

    Code speaks louder than words, fratti.

    Kind regards,

    Philip

  6. Don’t bother with fratti, Philip. As long as he has nothing constructive to add, every later message is going straight to the trash.
