Seeking in Transcoded Streams with Rygel

When looking at various UPnP media servers, one of the features I wanted was the ability to play back my music collection through my PlayStation 3.  The complicating factor is that most of my collection is encoded in Vorbis format, which is not yet supported by the PS3 (at this point, it doesn’t seem likely that it ever will).

Both MediaTomb and Rygel could handle this to an extent, transcoding the audio to raw LPCM data to send over the network.  This doesn’t require much CPU power on the server side, and only requires 1.4 Mbit/s of bandwidth, which is manageable on most home networks.  Unfortunately the only playback controls enabled in this mode are play and stop: if you want to pause, fast forward or rewind then you’re out of luck.

Given that Rygel has a fairly simple code base, I thought I’d have a go at fixing this.  The first solution I tried was the one I’ve mentioned a few times before: with uncompressed PCM data file offsets can be easily converted to sample numbers, so if the source format allows time based seeking, we can easily satisfy byte range requests.

I got a basic implementation of this working, but it was a little bit jumpy and not as stable as I’d like.  Before fully debugging it, I started looking at the mysterious DLNA options I’d copied over to get things working.  One of those was the “DLNA operation”, which was set to “range” mode.  Looking at the GUPnP header files, I noticed there was another value named “timeseek”.  When I picked this option, the HTTP requests from the PS3 changed:

GET /... HTTP/1.1
Host: ...
User-Agent: PLAYSTATION 3
Connection: Keep-Alive
Accept-Encoding: identity
TimeSeekRange.dlna.org: npt=0.00-
transferMode.dlna.org: Streaming

The pause, rewind and fast forward controls were now active, although only the pause control actually worked properly. After fast forwarding or rewinding, the PS3 would issue another HTTP request with the TimeSeekRange.dlna.org header specifying the new offset, but the playback position would reset to the start of the track when the operation completed. After a little more experimentation, I found that the playback position didn’t reset if I included TimeSeekRange.dlna.org in the response headers. Of course, I was still sending back the beginning of the track at this point but the PS3 acted as though it was playing from the new point in the song.

It wasn’t much more work to update the GStreamer calls to seek to the requested offset before playback and things worked pretty much as well as for non-transcoded files.  And since this solution didn’t involve byte offsets, it also worked for Rygel’s other transcoders.  It even worked to an extent with video files, but the delay before playback was a bit too high to make it usable — fixing that would probably require caching the GStreamer pipeline between HTTP requests.

Thoughts on DLNA

While it can be fun to reverse engineer things like this, it was a bit annoying to only be able to find out about the feature by reading header files written by people with access to the specification.  I can understand having interoperability and certification requirements to use the DLNA logo, but that does not require that the specifications be private.

As well as keeping the specification private, it feels like some aspects have been intentionally obfuscated, using bit fields represented in both binary and hexadecimal string representations inside the resource’s protocol info.  This might seem reasonable if it was designed for easy parsing, but you need to go through two levels of XML processing (the SOAP envelope and then the DIDL payload) to get to these flags.  Furthermore, the attributes inherited from the UPnP MediaServer specifications are all human readable so it doesn’t seem like an arbitrary choice.

On the bright side, I suppose we’re lucky they didn’t use cryptographic signatures to lock things down like Apple has with some of their protocols and file formats.

Watching iView with Rygel

One of the features of Rygel that I found most interesting was the external media server support.  It looked like an easy way to publish information on the network without implementing a full UPnP/DLNA media server (i.e. handling the UPnP multicast traffic, transcoding to a format that the remote system can handle, etc).

As a small test, I put together a server that exposes the ABC‘s iView service to UPnP media renderers.  The result is a bit rough around the edges, but the basic functionality works.  The source can be grabbed using Bazaar:

bzr branch lp:~jamesh/+junk/rygel-iview

It needs Python, Twisted, the Python bindings for D-Bus and rtmpdump to run.  The program exports the guide via D-Bus, and uses rtmpdump to stream the shows via HTTP.  Rygel then publishes the guide via the UPnP media server protocol and provides MPEG2 versions of the streams if clients need them.

There are still a few rough edges though.  The video from iView comes as 640×480 with a 16:9 aspect ratio so has a 4:3 pixel aspect ratio, but there is nothing in the video file to indicate this (I am not sure if flash video supports this metadata).

Getting Twisted and D-Bus to cooperate

Since I’d decided to use Twisted, I needed to get it to cooperate with the D-Bus bindings for Python.  The first step here was to get both libraries using the same event loop.  This can be achieved by setting Twisted to use the glib2 reactor, and enabling the glib mainloop integration in the D-Bus bindings.

Next was enabling asynchronous D-Bus method implementations.  There is support for this in the D-Bus bindings, but has quite a different (and less convenient) API compared to Twisted.  A small decorator was enough to overcome this impedence:

from functools import wraps

import dbus.service
from twisted.internet import defer

def dbus_deferred_method(*args, **kwargs):
    def decorator(function):
        function = dbus.service.method(*args, **kwargs)(function)
        @wraps(function)
        def wrapper(*args, **kwargs):
            dbus_callback = kwargs.pop('_dbus_callback')
            dbus_errback = kwargs.pop('_dbus_errback')
            d = defer.maybeDeferred(function, *args, **kwargs)
            d.addCallbacks(
                dbus_callback, lambda failure: dbus_errback(failure.value))
        wrapper._dbus_async_callbacks = ('_dbus_callback', '_dbus_errback')
        return wrapper
    return decorator

This decorator could then be applied to methods in the same way as the @dbus.service.method method, but it would correctly handle the case where the method returns a Deferred. Unfortunately it can’t be used in conjunction with @defer.inlineCallbacks, since the D-Bus bindings don’t handle varargs functions properly. You can of course call another function or method that uses @defer.inlineCallbacks though.

The iView Guide

After coding this, it became pretty obvious why it takes so long to load up the iView flash player: it splits the guide data over almost 300 XML files.  This might make sense if it relied on most of these files remaining unchanged and stored in cache, however it also uses a cache-busting technique when requesting them (adding a random query component to the URL).

Most of these files are series description files (some for finished series with no published programs).  These files contain a title, a short description, the URL for a thumbnail image and the IDs for the programs belonging to the series.  To find out about those programs, you need to load all the channel guide XML files until you find which one contains the program.  Going in the other direction, if you’ve got a program description from the channel guide and want to know about the series it belongs to (e.g. to get the thumbnail), you need to load each series description XML file until you find the one that contains the program.  So there aren’t many opportunities to delay loading of parts of the guide.

The startup time would be a lot easier if this information was collapsed down to a smaller number of larger XML files.

More Rygel testing

In my last post, I said I had trouble getting Rygel’s tracker backend to function and assumed that it was expecting an older version of the API.  It turns out I was incorrect and the problem was due in part to Ubuntu specific changes to the Tracker package and the unusual way Rygel was trying to talk to Tracker.

The Tracker packages in Ubuntu remove the D-Bus service activation file for the “org.freedesktop.Tracker” bus name so that if the user has not chosen to run the service (or has killed it), it won’t be automatically activated.  Unfortunately, instead of just calling a Tracker D-Bus method, Rygel was trying to manually activate Tracker via a StartServiceByName() call.  This would fail even if Tracker was running, hence my assumption that it was a tracker API version problem.

This problem will be fixed in the next Rygel release: it will call a method on Tracker directly to see if it is available.  With that problem out of the way, I was able to try out the backend.  It was providing a lot more metadata to the PS3 so more files were playable, which was good.  Browsing folders was also much quicker than the folder back end.  There were a few problems though:

  1. Files are exposed in one of three folders: “All Images”, “All Music” or “All Videos”.  With even a moderate sized music collection, this is unmangeable.  It wasn’t clear what order the files were being displayed in either.
  2. There was quite a long delay before video playback starts.

When the folder back end fixes the metadata and speed issues, I’d be inclined to use it over the tracker back end.

Video Transcoding

Getting video transcoding working turned out to require a newer GStreamer (0.10.23), the “unstripped” ffmpeg libraries and the “bad” GStreamer plugins package from multiverse.  With those installed, things worked pretty well.  With these dependencies encoded in the packaging, it’d be pretty painless to get it set up.  Certainly much easier than setting things up in MediaTomb’s configuration file.

Ubuntu packages for Rygel

I promised Zeeshan that I’d have a look at his Rygel UPnP Media Server a few months back, and finally got around to doing so.  For anyone else who wants to give it a shot, I’ve put together some Ubuntu packages for Jaunty and Karmic in a PPA here:

Most of the packages there are just rebuilds or version updates of existing packages, but the Rygel ones were done from scratch.  It is the first Debian package I’ve put together from scratch and it wasn’t as difficult as I thought it might be.  The tips from the “Teach me packaging” workshop at the Canonical All Hands meeting last month were quite helpful.

After installing the package, you can configure it by running the “rygel-preferences” program.  The first notebook page lets you configure the transcoding support, and the second page lets you configure the various media source plugins.

I wasn’t able to get the Tracker plugin working on my system, which I think is due to Rygel expecting the older Tracker D-Bus API.  I was able to get the folder plugin working pretty easily though.

Once things were configured, I ran Rygel itself and an extra icon showed up on my PlayStation 3.  Getting folder listings was quite slow, but apparently this is limited to the folder back end and is currently being worked on.  It’s a shame I wasn’t able to test the more mature Tracker back end.

With LPCM transcoding enabled, I was able to successfully play a Vorbis file on the PS3.  With transcoding disabled, I wasn’t able to play any music — even files in formats the PS3 could handle natively.  This was apparently due to the folder backend not providing the necessary metadata.  I didn’t have any luck with MPEG2 transcoding for video.

It looks like Rygel has promise, but is not yet at a stage where it could replace something like MediaTomb.  The external D-Bus media source support looks particularly interesting.  I look forward to trying out version 0.4 when it is released.

Streaming Vorbis files from Ubuntu to a PS3

One of the nice features of the PlayStation 3 is the UPNP/DLNA media renderer.  Unfortunately, the set of codecs is pretty limited, which is a problem since most of my music is encoded as Vorbis.  MediaTomb was suggested to me as a server that could transcode the files to a format the PS3 could understand.

Unfortunately, I didn’t have much luck with the version included with Ubuntu 8.10 (Intrepid), and after a bit of investigation it seems that there isn’t a released version of MediaTomb that can send PCM audio to the PS3.  So I put together a package of a subversion snapshot in my PPA which should work on Intrepid.

With the newer package, it was pretty easy to get things working:

  1. Install the mediatomb-daemon package
  2. Edit the /etc/mediatomb/config.xml file and make the following changes:
    • Change the <protocolInfo/> line to set extend="yes".
    • In the <extension-mimetype> section, uncomment the line to map “avi” to “video/divx”.  This will get a lot of videos to play without problem.
    • In the <mimetype-upnpclass> section, add a line to map “application/ogg” to “object.item.audioItem.musicTrack”.  This is needed for the vorbis files to be recognised as music.
    • In the <mimetype-contenttype> section add a line to map “audio/L16” to “pcm”.
    • On the <transcoding> element, change the enabled attribute to “yes”.
    • Add the settings from here to the <transcoding> section.
  3. Edit the /etc/default/mediatomb script and set INTERFACE to the network interface you want to advertise on.
  4. Restart the mediatomb daemon.
  5. Go to the web UI (try opening /var/lib/mediatomb/mediatomb.html in a web browser), and add the directories you want to export.
  6. Test things on the PS3.

Things aren’t perfect though.  As MediaTomb is simply piping the transcoded audio to the PS3, it doesn’t implement seeking on such files, and it seems that the PS3 won’t even let you pause a stream that doesn’t allow seeking.  With a less generalised transcoding backend, it seems like it should be trivial to support seeking in an uncompressed PCM stream though, since the byte offsets can be trivially mapped to sample numbers.

The other problem I found was that none of the recent music I’d ripped showed up.  It seems that they’d been ripped with the .oga file extension rather than .ogg.  This change appears to have been made in bug 543306, but the reasoning seems suspect: the guidelines from Xiph indicate that the files generated by this encoding profile should continue to use the .ogg file extension.

I tried adding some extra mappings to the MediaTomb configuration file to recognise the files without success, but eventually decided to just rename them and fix the encoding profile locally.

A Perfect Media Server

While MediaTomb mostly works for me, it doesn’t do everything I’d like.  A few of the things I’d like out of a media server include:

  1. No need to configure things via a web UI.  In fact, I could do without a web UI all together – something nicely integrated into the desktop would be nice.
  2. No need to set model specific settings in the configuration file.  Ideally it would know how to talk to common players by default.
  3. Supports transcoding and seeking within transcoded files.  Preferably knows what needs transcoding for common players.
  4. Picks up new files in real time.  So something inotify based rather than periodic reindexing.
  5. A virtual folder tree for music based on artist/album metadata. A plain folder tree for other media would be fine.
  6. Cached video thumbnails would be nice too.  The build of MediaTomb in my PPA includes support for thumbnails (needs to be enabled in the config file), but they aren’t cached so are slow to appear.

Perhaps Zeeshan‘s media server will be worth trying out at some point.