Seeking in Transcoded Streams with Rygel

When looking at various UPnP media servers, one of the features I wanted was the ability to play back my music collection through my PlayStation 3.  The complicating factor is that most of my collection is encoded in Vorbis format, which is not yet supported by the PS3 (at this point, it doesn’t seem likely that it ever will be).

Both MediaTomb and Rygel could handle this to an extent, transcoding the audio to raw LPCM data to send over the network.  This doesn’t require much CPU power on the server side, and only requires 1.4 Mbit/s of bandwidth, which is manageable on most home networks.  Unfortunately the only playback controls enabled in this mode are play and stop: if you want to pause, fast forward or rewind then you’re out of luck.

Given that Rygel has a fairly simple code base, I thought I’d have a go at fixing this.  The first solution I tried was the one I’ve mentioned a few times before: with uncompressed PCM data, file offsets can be easily converted to sample numbers, so if the source format allows time-based seeking, we can satisfy byte range requests.
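To make the offset arithmetic concrete, here is a minimal sketch in Python (not Rygel's actual Vala code).  It assumes CD-quality LPCM (44.1 kHz, 16-bit, stereo), which the post does not state but which matches the 1.4 Mbit/s figure above; the function name is made up for illustration.

```python
# Assumed stream parameters (not stated in the post): CD-quality LPCM.
SAMPLE_RATE = 44100       # frames per second
CHANNELS = 2              # stereo
BYTES_PER_SAMPLE = 2      # 16-bit samples

BYTES_PER_FRAME = CHANNELS * BYTES_PER_SAMPLE
BYTES_PER_SECOND = SAMPLE_RATE * BYTES_PER_FRAME  # 176400 bytes/s = 1411200 bit/s


def byte_offset_to_seek(offset):
    """Map a byte offset in the PCM stream to (frame, seconds, skip_bytes).

    A range request may start in the middle of a frame, so the server would
    seek the source to `seconds` and then discard `skip_bytes` of output.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    frame, skip_bytes = divmod(offset, BYTES_PER_FRAME)
    return frame, frame / SAMPLE_RATE, skip_bytes
```

With these assumptions, a request starting at byte 176400 corresponds to frame 44100, exactly one second into the track.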

I got a basic implementation of this working, but it was a little bit jumpy and not as stable as I’d like.  Before fully debugging it, I started looking at the mysterious DLNA options I’d copied over to get things working.  One of those was the “DLNA operation”, which was set to “range” mode.  Looking at the GUPnP header files, I noticed there was another value named “timeseek”.  When I picked this option, the HTTP requests from the PS3 changed:

GET /... HTTP/1.1
Host: ...
User-Agent: PLAYSTATION 3
Connection: Keep-Alive
Accept-Encoding: identity
TimeSeekRange.dlna.org: npt=0.00-
transferMode.dlna.org: Streaming

The pause, rewind and fast forward controls were now active, although only the pause control actually worked properly. After fast forwarding or rewinding, the PS3 would issue another HTTP request with the TimeSeekRange.dlna.org header specifying the new offset, but the playback position would reset to the start of the track when the operation completed. After a little more experimentation, I found that the playback position didn’t reset if I included TimeSeekRange.dlna.org in the response headers. Of course, I was still sending back the beginning of the track at this point but the PS3 acted as though it was playing from the new point in the song.
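As a rough illustration of handling that request header, the sketch below parses a value such as the "npt=0.00-" shown above into a start and optional end time in seconds.  It is Python rather than Rygel's Vala, the function names are hypothetical, and accepting an h:mm:ss form as well as plain seconds is an assumption on my part.  What value the response header should carry is not covered here, so the sketch stops at parsing.

```python
def parse_npt_time(text):
    """Convert an npt time, '93.5' or '0:01:33.5', to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_time_seek_range(value):
    """Parse a TimeSeekRange.dlna.org value such as 'npt=93.50-'.

    Returns (start_seconds, end_seconds), where end_seconds is None
    when the range is open-ended.
    """
    unit, equals, span = value.strip().partition("=")
    if unit != "npt" or not equals:
        raise ValueError("expected an 'npt=' range: %r" % value)
    start, dash, end = span.partition("-")
    if not dash or not start:
        raise ValueError("expected 'start-' or 'start-end': %r" % value)
    return parse_npt_time(start), (parse_npt_time(end) if end else None)
```

The start time returned here is what would be passed on to the seek before streaming begins.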

It wasn’t much more work to update the GStreamer calls to seek to the requested offset before playback and things worked pretty much as well as for non-transcoded files.  And since this solution didn’t involve byte offsets, it also worked for Rygel’s other transcoders.  It even worked to an extent with video files, but the delay before playback was a bit too high to make it usable — fixing that would probably require caching the GStreamer pipeline between HTTP requests.

Thoughts on DLNA

While it can be fun to reverse engineer things like this, it was a bit annoying to only be able to find out about the feature by reading header files written by people with access to the specification.  I can understand having interoperability and certification requirements to use the DLNA logo, but that does not require that the specifications be private.

As well as keeping the specification private, it feels like some aspects have been intentionally obfuscated, using bit fields encoded as binary and hexadecimal strings inside the resource’s protocol info.  This might seem reasonable if it were designed for easy parsing, but you need to go through two levels of XML processing (the SOAP envelope and then the DIDL payload) to get to these flags.  Furthermore, the attributes inherited from the UPnP MediaServer specifications are all human readable so it doesn’t seem like an arbitrary choice.
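To show what those flags look like in practice, here is a small Python sketch that pulls them out of a protocol info string.  The field names DLNA.ORG_OP and DLNA.ORG_FLAGS, the ordering of the two operation digits, and the sample value are my understanding of what servers emit rather than anything taken from the specification, so treat them all as assumptions.

```python
def parse_protocol_info(protocol_info):
    """Split 'protocol:network:format:additional' and parse the last field."""
    protocol, network, content_format, additional = protocol_info.split(":", 3)
    params = {}
    for item in additional.split(";"):
        key, equals, value = item.partition("=")
        if equals:
            params[key] = value
    return content_format, params


def decode_operations(op):
    """Decode the two-digit binary string (assumed order: timeseek, range)."""
    if len(op) != 2 or set(op) - {"0", "1"}:
        raise ValueError("expected two binary digits: %r" % op)
    return {"timeseek": op[0] == "1", "range": op[1] == "1"}


# Illustrative value only, not captured from a real server.
EXAMPLE = ("http-get:*:audio/L16;rate=44100;channels=2:"
           "DLNA.ORG_PN=LPCM;DLNA.ORG_OP=10;"
           "DLNA.ORG_FLAGS=01700000000000000000000000000000")

content_format, params = parse_protocol_info(EXAMPLE)
operations = decode_operations(params["DLNA.ORG_OP"])
# Assumption: only the first eight hex digits carry flag bits.
flag_bits = int(params["DLNA.ORG_FLAGS"][:8], 16)
```

Under these assumptions, "10" would correspond to the “timeseek” mode and "01" to the “range” mode mentioned earlier.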

On the bright side, I suppose we’re lucky they didn’t use cryptographic signatures to lock things down like Apple has with some of their protocols and file formats.

Streaming Vorbis files from Ubuntu to a PS3

One of the nice features of the PlayStation 3 is the UPnP/DLNA media renderer.  Unfortunately, the set of codecs is pretty limited, which is a problem since most of my music is encoded as Vorbis.  MediaTomb was suggested to me as a server that could transcode the files to a format the PS3 could understand.

Unfortunately, I didn’t have much luck with the version included with Ubuntu 8.10 (Intrepid), and after a bit of investigation it seems that there isn’t a released version of MediaTomb that can send PCM audio to the PS3.  So I put together a package of a Subversion snapshot in my PPA which should work on Intrepid.

With the newer package, it was pretty easy to get things working:

  1. Install the mediatomb-daemon package
  2. Edit the /etc/mediatomb/config.xml file and make the following changes:
    • Change the <protocolInfo/> line to set extend="yes".
    • In the <extension-mimetype> section, uncomment the line to map “avi” to “video/divx”.  This will get a lot of videos to play without problem.
    • In the <mimetype-upnpclass> section, add a line to map “application/ogg” to “object.item.audioItem.musicTrack”.  This is needed for the Vorbis files to be recognised as music.
    • In the <mimetype-contenttype> section add a line to map “audio/L16” to “pcm”.
    • On the <transcoding> element, change the enabled attribute to “yes”.
    • Add the settings from here to the <transcoding> section.
  3. Edit the /etc/default/mediatomb script and set INTERFACE to the network interface you want to advertise on.
  4. Restart the mediatomb daemon.
  5. Go to the web UI (try opening /var/lib/mediatomb/mediatomb.html in a web browser), and add the directories you want to export.
  6. Test things on the PS3.

Things aren’t perfect though.  As MediaTomb is simply piping the transcoded audio to the PS3, it doesn’t implement seeking on such files, and it seems that the PS3 won’t even let you pause a stream that doesn’t allow seeking.  With a less generalised transcoding backend, it seems like it should be straightforward to support seeking in an uncompressed PCM stream though, since the byte offsets can be trivially mapped to sample numbers.

The other problem I found was that none of the recent music I’d ripped showed up.  It seems that they’d been ripped with the .oga file extension rather than .ogg.  This change appears to have been made in bug 543306, but the reasoning seems suspect: the guidelines from Xiph indicate that the files generated by this encoding profile should continue to use the .ogg file extension.

I tried adding some extra mappings to the MediaTomb configuration file to recognise the files without success, but eventually decided to just rename them and fix the encoding profile locally.
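For anyone in the same position, the renaming step could be scripted along these lines.  This is only a sketch: the function name and the dry-run default are my own choices, and it assumes every .oga file under the given directory really is Vorbis audio (the extension can also be used for other Ogg audio codecs).

```python
from pathlib import Path


def rename_oga_to_ogg(root, dry_run=True):
    """Rename *.oga files under root to *.ogg.

    Returns the (source, target) pairs that were, or would be, renamed.
    Files whose .ogg name already exists are left untouched.
    """
    renamed = []
    for source in sorted(Path(root).rglob("*.oga")):
        target = source.with_suffix(".ogg")
        if target.exists():
            continue  # never overwrite an existing file
        if not dry_run:
            source.rename(target)
        renamed.append((source, target))
    return renamed
```

Calling it first with the default dry_run=True lists what would change; passing dry_run=False performs the renames.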

A Perfect Media Server

While MediaTomb mostly works for me, it doesn’t do everything I’d like.  A few of the things I’d like out of a media server include:

  1. No need to configure things via a web UI.  In fact, I could do without a web UI altogether; something integrated into the desktop would be nicer.
  2. No need to set model specific settings in the configuration file.  Ideally it would know how to talk to common players by default.
  3. Supports transcoding and seeking within transcoded files.  Preferably knows what needs transcoding for common players.
  4. Picks up new files in real time.  So something inotify based rather than periodic reindexing.
  5. A virtual folder tree for music based on artist/album metadata. A plain folder tree for other media would be fine.
  6. Cached video thumbnails would be nice too.  The build of MediaTomb in my PPA includes support for thumbnails (needs to be enabled in the config file), but they aren’t cached so are slow to appear.

Perhaps Zeeshan’s media server will be worth trying out at some point.