Seeking in Transcoded Streams with Rygel

When looking at various UPnP media servers, one of the features I wanted was the ability to play back my music collection through my PlayStation 3.  The complicating factor is that most of my collection is encoded in Vorbis format, which is not yet supported by the PS3 (at this point, it doesn’t seem likely that it ever will).

Both MediaTomb and Rygel could handle this to an extent, transcoding the audio to raw LPCM data to send over the network.  This doesn’t require much CPU power on the server side, and only requires 1.4 Mbit/s of bandwidth, which is manageable on most home networks.  Unfortunately the only playback controls enabled in this mode are play and stop: if you want to pause, fast forward or rewind then you’re out of luck.

Given that Rygel has a fairly simple code base, I thought I’d have a go at fixing this.  The first solution I tried was the one I’ve mentioned a few times before: with uncompressed PCM data file offsets can be easily converted to sample numbers, so if the source format allows time based seeking, we can easily satisfy byte range requests.

I got a basic implementation of this working, but it was a little bit jumpy and not as stable as I’d like.  Before fully debugging it, I started looking at the mysterious DLNA options I’d copied over to get things working.  One of those was the “DLNA operation”, which was set to “range” mode.  Looking at the GUPnP header files, I noticed there was another value named “timeseek”.  When I picked this option, the HTTP requests from the PS3 changed:

GET /... HTTP/1.1
Host: ...
User-Agent: PLAYSTATION 3
Connection: Keep-Alive
Accept-Encoding: identity
TimeSeekRange.dlna.org: npt=0.00-
transferMode.dlna.org: Streaming

The pause, rewind and fast forward controls were now active, although only the pause control actually worked properly. After fast forwarding or rewinding, the PS3 would issue another HTTP request with the TimeSeekRange.dlna.org header specifying the new offset, but the playback position would reset to the start of the track when the operation completed. After a little more experimentation, I found that the playback position didn’t reset if I included TimeSeekRange.dlna.org in the response headers. Of course, I was still sending back the beginning of the track at this point but the PS3 acted as though it was playing from the new point in the song.

It wasn’t much more work to update the GStreamer calls to seek to the requested offset before playback and things worked pretty much as well as for non-transcoded files.  And since this solution didn’t involve byte offsets, it also worked for Rygel’s other transcoders.  It even worked to an extent with video files, but the delay before playback was a bit too high to make it usable — fixing that would probably require caching the GStreamer pipeline between HTTP requests.

Thoughts on DLNA

While it can be fun to reverse engineer things like this, it was a bit annoying to only be able to find out about the feature by reading header files written by people with access to the specification.  I can understand having interoperability and certification requirements to use the DLNA logo, but that does not require that the specifications be private.

As well as keeping the specification private, it feels like some aspects have been intentionally obfuscated, using bit fields represented in both binary and hexadecimal string representations inside the resource’s protocol info.  This might seem reasonable if it was designed for easy parsing, but you need to go through two levels of XML processing (the SOAP envelope and then the DIDL payload) to get to these flags.  Furthermore, the attributes inherited from the UPnP MediaServer specifications are all human readable so it doesn’t seem like an arbitrary choice.

On the bright side, I suppose we’re lucky they didn’t use cryptographic signatures to lock things down like Apple has with some of their protocols and file formats.

Watching iView with Rygel

One of the features of Rygel that I found most interesting was the external media server support.  It looked like an easy way to publish information on the network without implementing a full UPnP/DLNA media server (i.e. handling the UPnP multicast traffic, transcoding to a format that the remote system can handle, etc).

As a small test, I put together a server that exposes the ABC‘s iView service to UPnP media renderers.  The result is a bit rough around the edges, but the basic functionality works.  The source can be grabbed using Bazaar:

bzr branch lp:~jamesh/+junk/rygel-iview

It needs Python, Twisted, the Python bindings for D-Bus and rtmpdump to run.  The program exports the guide via D-Bus, and uses rtmpdump to stream the shows via HTTP.  Rygel then publishes the guide via the UPnP media server protocol and provides MPEG2 versions of the streams if clients need them.

There are still a few rough edges though.  The video from iView comes as 640×480 with a 16:9 aspect ratio so has a 4:3 pixel aspect ratio, but there is nothing in the video file to indicate this (I am not sure if flash video supports this metadata).

Getting Twisted and D-Bus to cooperate

Since I’d decided to use Twisted, I needed to get it to cooperate with the D-Bus bindings for Python.  The first step here was to get both libraries using the same event loop.  This can be achieved by setting Twisted to use the glib2 reactor, and enabling the glib mainloop integration in the D-Bus bindings.

Next was enabling asynchronous D-Bus method implementations.  There is support for this in the D-Bus bindings, but has quite a different (and less convenient) API compared to Twisted.  A small decorator was enough to overcome this impedence:

from functools import wraps

import dbus.service
from twisted.internet import defer

def dbus_deferred_method(*args, **kwargs):
    def decorator(function):
        function = dbus.service.method(*args, **kwargs)(function)
        @wraps(function)
        def wrapper(*args, **kwargs):
            dbus_callback = kwargs.pop('_dbus_callback')
            dbus_errback = kwargs.pop('_dbus_errback')
            d = defer.maybeDeferred(function, *args, **kwargs)
            d.addCallbacks(
                dbus_callback, lambda failure: dbus_errback(failure.value))
        wrapper._dbus_async_callbacks = ('_dbus_callback', '_dbus_errback')
        return wrapper
    return decorator

This decorator could then be applied to methods in the same way as the @dbus.service.method method, but it would correctly handle the case where the method returns a Deferred. Unfortunately it can’t be used in conjunction with @defer.inlineCallbacks, since the D-Bus bindings don’t handle varargs functions properly. You can of course call another function or method that uses @defer.inlineCallbacks though.

The iView Guide

After coding this, it became pretty obvious why it takes so long to load up the iView flash player: it splits the guide data over almost 300 XML files.  This might make sense if it relied on most of these files remaining unchanged and stored in cache, however it also uses a cache-busting technique when requesting them (adding a random query component to the URL).

Most of these files are series description files (some for finished series with no published programs).  These files contain a title, a short description, the URL for a thumbnail image and the IDs for the programs belonging to the series.  To find out about those programs, you need to load all the channel guide XML files until you find which one contains the program.  Going in the other direction, if you’ve got a program description from the channel guide and want to know about the series it belongs to (e.g. to get the thumbnail), you need to load each series description XML file until you find the one that contains the program.  So there aren’t many opportunities to delay loading of parts of the guide.

The startup time would be a lot easier if this information was collapsed down to a smaller number of larger XML files.

More Rygel testing

In my last post, I said I had trouble getting Rygel’s tracker backend to function and assumed that it was expecting an older version of the API.  It turns out I was incorrect and the problem was due in part to Ubuntu specific changes to the Tracker package and the unusual way Rygel was trying to talk to Tracker.

The Tracker packages in Ubuntu remove the D-Bus service activation file for the “org.freedesktop.Tracker” bus name so that if the user has not chosen to run the service (or has killed it), it won’t be automatically activated.  Unfortunately, instead of just calling a Tracker D-Bus method, Rygel was trying to manually activate Tracker via a StartServiceByName() call.  This would fail even if Tracker was running, hence my assumption that it was a tracker API version problem.

This problem will be fixed in the next Rygel release: it will call a method on Tracker directly to see if it is available.  With that problem out of the way, I was able to try out the backend.  It was providing a lot more metadata to the PS3 so more files were playable, which was good.  Browsing folders was also much quicker than the folder back end.  There were a few problems though:

  1. Files are exposed in one of three folders: “All Images”, “All Music” or “All Videos”.  With even a moderate sized music collection, this is unmangeable.  It wasn’t clear what order the files were being displayed in either.
  2. There was quite a long delay before video playback starts.

When the folder back end fixes the metadata and speed issues, I’d be inclined to use it over the tracker back end.

Video Transcoding

Getting video transcoding working turned out to require a newer GStreamer (0.10.23), the “unstripped” ffmpeg libraries and the “bad” GStreamer plugins package from multiverse.  With those installed, things worked pretty well.  With these dependencies encoded in the packaging, it’d be pretty painless to get it set up.  Certainly much easier than setting things up in MediaTomb’s configuration file.

Ubuntu packages for Rygel

I promised Zeeshan that I’d have a look at his Rygel UPnP Media Server a few months back, and finally got around to doing so.  For anyone else who wants to give it a shot, I’ve put together some Ubuntu packages for Jaunty and Karmic in a PPA here:

Most of the packages there are just rebuilds or version updates of existing packages, but the Rygel ones were done from scratch.  It is the first Debian package I’ve put together from scratch and it wasn’t as difficult as I thought it might be.  The tips from the “Teach me packaging” workshop at the Canonical All Hands meeting last month were quite helpful.

After installing the package, you can configure it by running the “rygel-preferences” program.  The first notebook page lets you configure the transcoding support, and the second page lets you configure the various media source plugins.

I wasn’t able to get the Tracker plugin working on my system, which I think is due to Rygel expecting the older Tracker D-Bus API.  I was able to get the folder plugin working pretty easily though.

Once things were configured, I ran Rygel itself and an extra icon showed up on my PlayStation 3.  Getting folder listings was quite slow, but apparently this is limited to the folder back end and is currently being worked on.  It’s a shame I wasn’t able to test the more mature Tracker back end.

With LPCM transcoding enabled, I was able to successfully play a Vorbis file on the PS3.  With transcoding disabled, I wasn’t able to play any music — even files in formats the PS3 could handle natively.  This was apparently due to the folder backend not providing the necessary metadata.  I didn’t have any luck with MPEG2 transcoding for video.

It looks like Rygel has promise, but is not yet at a stage where it could replace something like MediaTomb.  The external D-Bus media source support looks particularly interesting.  I look forward to trying out version 0.4 when it is released.

Sansa Fuze

On my way back from Canada a few weeks ago, I picked up a SanDisk Sansa Fuze media player.  Overall, I like it.  It supports Vorbis and FLAC audio out of the box, has a decent amount of on board storage (8GB) and can be expanded with a MicroSDHC card.  It does use a proprietary dock connector for data transfer and charging, but that’s about all I don’t like about it.  The choice of accessories for this connector is underwhelming, so a standard mini-USB connector would have been preferable since I wouldn’t need as many cables.

The first thing I tried was to copy some music to the device using Rhythmbox.  This appeared to work, but took longer than expected.  When I tried to play the music, it was listed as having an unknown artist and album name.  Looking at the player’s filesystem, the reason for this was obvious: Rhythmbox had transcoded the music to MP3 and lost the tags.  Copying the ogg files directly worked a lot better: it was quicker and preserved the metadata.

Of course, getting Rhythmbox to do the right thing would be preferable to telling people not to use it.  Rhythmbox depends on information about the device provided by HAL, so I had a look at the relevant FDI files.  There was one section for Sansa Clip and Fuze players which didn’t list Vorbis support, and another section for “Sansa Clip version II”.  The second section was a much better match for the capabilities of my device.  As all Clip and Fuze devices support the extra formats when running the latest firmware, I merged the two sections (hal bug 20616, ubuntu bug 345249).  With the updated FDI file in place, copying music with Rhythmbox worked as expected.

The one downside to this change is that if you have a device with old firmware, Rhythmbox will no longer transcode music to a format the device can play.  There doesn’t seem to be any obvious way to tell if a device has a new enough firmware via USB IDs or similar, so I’m not sure how to handle it automatically.  That said, it is pretty easy to upgrade the firmware following the instructions from their forum, so it is probably best to just do that.

PulseAudio

It seems to be a fashionable to blog about experiences with PulseAudio, I thought I’d join in.

I’ve actually had some good experiences with PulseAudio, seeing some tangible benefits over the ALSA setup I was using before.  I’ve got a cheapish surround sound speaker set connected to my desktop.  While it gives pretty good sound when all the speakers are used together, it sounds like crap if only the front left/right speakers are used.

ALSA supports multi-channel audio with the motherboard’s sound card alright, but apps producing stereo sound would only play out of the front two speakers.  There are some howtos on the internet for setting up a separate ALSA device that routes stereo audio to all the speakers in the right way, but that requires that I know in advance what sort of audio an application is going to generate: something like Totem could produce mono, stereo or surround output depending on the file I want to play.  This is more effort than I was usually willing to do, so I ended up flicking a switch on the amplifier to duplicate the front left/right channels to the rear.

With PulseAudio, I just had to edit the /etc/pulse/daemon.conf file and set default-sample-channels to 6, and it took care of converting mono and stereo output from apps to play on all the speakers while still letting apps producing surround output play as expected.  This means I automatically get the best result without any special effort on my part.

I’m not too worried that I had to tell PulseAudio how many speakers I had, since it is possible to plug in a number of speaker configurations and I don’t think the card is capable of sensing what has been attached (the manual documents manually selecting the speaker configuration in the Windows driver).  It might be nice if there was a way to configure this through the GUI though.

I’m looking forward to trying the “flat volume” feature in future versions of PulseAudio, as it should get the best quality out of the sound hardware (if I understand things right, 50% volume with current PulseAudio releases means you only get 15 bits of quantisation on a 16-bit sound card).  I just hope that it manages to cope with the mixers my sound card exports: one two-channel mixer for the front speakers, one two-channel mixer for the rear two speakers and two single channel mixers for the center and LFE channels.

Using Twisted Deferred objects with gio

The gio library provides both synchronous and asynchronous interfaces for performing IO.  Unfortunately, the two APIs require quite different programming styles, making it difficult to convert code written to the simpler synchronous API to the asynchronous one.

For C programs this is unavoidable, but for Python we should be able to do better.  And if you’re doing asynchronous event driven code in Python, it makes sense to look at Twisted.  In particular, Twisted’s Deferred objects can be quite helpful.

Deferred

The Twisted documentation describes deferred objects as “a callback which will be put off until later”.  The deferred will eventually be passed the result of some operation, or information about how it failed.

From the consumer side, you can register one or more callbacks that will be run:

def callback(result):
    # do stuff
    return result

deferred.addCallback(callback)

The first callback will be called with the original result, while subsequent callbacks will be passed the return value of the previous callback (this is why the above example returns its argument). If the operation fails, one or more errbacks (error callbacks) will be called:

def errback(failure):
    # do stuff
    return failure

deferred.addErrback(errback)

If the operation associated with the deferred has already been completed (or already failed) when the callback/errback is added, then it will be called immediately. So there is no need to check if the operation is complete before hand.

Using Deferred objects with gio

We can easily use gio’s asynchronous API to implement a new API based on deferred objects.  For example:

import gio
from twisted.internet import defer

def file_read_deferred(file, io_priority=0, cancellable=None):
    d = defer.Deferred()
    def callback(file, async_result):
        try:
            in_stream = file.read_finish(async_result)
        except gio.Error:
            d.errback()
        else:
            d.callback(in_stream)
    file.read_async(callback, io_priority, cancellable)
    return d

def input_stream_read_deferred(in_stream, count, io_priority=0,
                               cancellable=None):
    d = defer.Deferred()
    def callback(in_stream, async_result):
        try:
            bytes = in_stream.read_finish(async_result)
        except gio.Error:
            d.errback()
        else:
            d.callback(bytes)
    # the argument order seems a bit weird here ...
    in_stream.read_async(count, callback, io_priority, cancellable)
    return d

This is a fairly simple transformation, so you might ask what this buys us. We’ve gone from an interface where you pass a callback to the method to one where you pass a callback to the result of the method. The answer is in the tools that Twisted provides for working with deferred objects.

The inlineCallbacks decorator

You’ve probably seen code examples that use Python’s generators to implement simple co-routines. Twisted’s inlineCallbacks decorator basically implements this for generators that yield deferred objects. It uses the enhanced generators feature from Python 2.5 (PEP 342) to pass the deferred result or failure back to the generator. Using it, we can write code like this:

@defer.inlineCallbacks
def print_contents(file, cancellable=None):
    in_stream = yield file_read_deferred(file, cancellable=cancellable)
    bytes = yield input_stream_read_deferred(
        in_stream, 4096, cancellable=cancellable)
    while bytes:
        # Do something with the data.  For this example, just print to stdout.
        sys.stdout.write(bytes)
        bytes = yield input_stream_read_deferred(
            in_stream, 4096, cancellable=cancellable)

Other than the use of the yield keyword, the above code looks quite similar to the equivalent synchronous implementation.  The only thing that would improve matters would be if these were real methods rather than helper functions.

Furthermore, the inlineCallbacks decorator causes the function to return a deferred that will fire when the function body finally completes or fails. This makes it possible to use the function from within other asynchronous code in a similar fashion. And once you’re using deferred results, you can mix in the gio calls with other Twisted asynchronous calls where it makes sense.

Metrics for success of a DVCS

One thing that has been mentioned in the GNOME DVCS debate was that it is as easy to do “git diff” as it is to do “svn diff” so the learning curve issue is moot.  I’d have to disagree here.

Traditional Centralised Version Control

With traditional version control systems  (e.g. CVS and Subversion) as used by Free Software projects like GNOME, there are effectively two classes of users that I will refer to as “committers” and “patch contributors”:

Centralised VCS Users

Patch contributors are limited to read only access to the version control system.  They can check out a working copy to make changes, and then produce a patch with the “diff” command to submit to a bug tracker or send to a mailing list.  This is where new contributors start, so it is important that it be easy to get started in this mode.

Once a contributor is trusted enough, they may be given write access to the repository moving them to the committers group. They now have access to more functionality from the VCS, including the ability to checkpoint changes into focused commits, possibly on branches.  The contributor may still be required to go through patch review before committing, or may be given free reign to commit changes as they see fit.

Some problems with this arrangement include:

  • New developers are given a very limited set of tools to do their work.
  • If a developer goes to the trouble of learning the advanced features of the version control system, they are still limited to the read only subset if they decide to start contributing to another project.

Distributed Workflow

A DVCS allows anyone to commit to their own branches and provides the full feature set to all users.  This splits the “committers” class into two classes:

Distributed VCS Users

The social aspect of the “committers” group now becomes the group of people who can commit to the main line of the project – the core developers. Outside this group, we have people who make use of the same features of the VCS as the core developers but do not have write access to the main line: their changes must be reviewed and merged by a core developer.

I’ve left the “patch contributor” class in the above diagram because not all contributors will bother learning the details of the VCS.  For projects I’ve worked on that used a DVCS, I’ve still seen people send simple patches (either from the “xxx diff” command, or as diffs against a tarball release) and I don’t think that is likely to change.

Measuring Success

Making the lives of core developers better is often brought up as a reason to switch to a DVCS (e.g. through features like offline commits, local cache of history, etc).  I’d argue that making life easier for non core contributors is at least as important.  One way we can measure this is by looking at whether such contributors are actually using VCS features beyond what they could with a traditional centralised setup.

By looking at the relative numbers of contributors who submit regular patches and those that either publish branches or submit changesets we can get an idea of how much of the VCS they have used.

It’d be interesting to see the results of a study based on contributions to various projects that have already adopted DVCS.  Although I don’t have any reliable numbers, I can guess at two things that might affect the results:

  1. Familiarity for existing developers.  There is a lot of cross pollination in Free Software, so it isn’t uncommon for a new contributor to have worked on another project before hand.  Using a VCS with a familiar command set can help here (or using the same VCS).
  2. A gradual learning curve.  New contributors should be able to get going with a small command set, and easily learn more features as they need them.

I am sure that there are other things that would affect the results, but these are the ones that I think would have the most noticeable effects.

DVCS talks at GUADEC

Yesterday, a BoF was scheduled for discussion of distributed version control systems with GNOME.  The BoF session did not end up really discussing the issues of what GNOME needs out of a revision control system, and some of the examples Federico used were a bit snarky.

We had a more productive meeting in the session afterwards where we went over some of the concrete goals for the system.  The list from the blackboard was:

  • Contributor collaboration (i.e. let anyone use the tool rather than just core developers).
  • Distro ⇔ distro and distro ⇔ upstream collaboration.
  • Host GNOME source code repositories
  • Code review
  • Server side hooks
  • Translators: what to do?
  • Enforced checks
  • Offline operations
  • Documentation authors?
  • Support Win32/Mac (important for GTK)

The sys admin tasks were broken down to:

  • MAINTAINERS file syntax checking
  • PO file syntax checking
  • CIA integration.
  • Commits mailing list
  • Check that commit messages are not empty
  • Trigger updates from commits (e.g. the web site module).
  • Release notes tarballs
  • Damned Lies support

It was clear from the discussion that neither Git or Bazaar satisfied all of the criteria.

The Playground

John Carr did a great job setting up Bazaar mirrors of all the GNOME modules.  This provided an easy way for people to see play around with Bazaar.  However, it only gave you half the experience since it didn’t provide a way to publish code and collaborate.

To aid in this, we have set up the bzr-playground.gnome.org machine, which any GNOME developer should be able to use to publish branches based on John’s imports.  Instructions on getting set up can be found on the wiki.  I hope that we will get a lot of people trying out this infrastructure.

We gave a presentation today on some of the things Bazaar provides that could be useful when hacking on GNOME.  Demoing bzr-playground was a bit problematic due to the internet connection problems at the venue, but I think we still showed some useful tools for local collaboration, searching and code review.

Meanwhile, Robert Collins has been working on some of the GNOME sysadmin features that Bazaar was lacking.  Among other things, he got Damned Lies working with both Subversion and Bazaar, with a test installation on the playground machine.

Prague

I arrived in Prague yesterday for the Ubuntu Developer Summit.  Including time spent in transit in Singapore and London, the flights took about 30 hours.

As I was flying on BA, I got to experience Heathrow Terminal 5. It wasn’t quite as bad as some of the horror stories I’d heard.  There were definitely aspects that weren’t forgiving of mistakes.  For example, when taking the train to the “B” section there was a sign saying that if you accidentally got on the train when you shouldn’t have it would take 40 minutes to get back to the “A” section.

It is also quite difficult to find water fountains in the terminal, which is inexcusable given that they don’t let people bring their own water bottles.

I had been a bit worried that they’d lose my bag, but it arrived okay in Prague.  Jonathan was not so lucky.

As well as the Ubuntu and Canonical folks, there are a bunch of Gnome developers here, including Ryan, Murray, Olav, David and Lennart.  It will be an interesting week.