Improved handling of files with multiple tracks in GStreamer

Thanks to Sebastian Dröge there is a new thing in GStreamer called streamid. It basically gives every stream inside a given file a unique id, making files with multiple streams a lot easier to deal with. This streamid is also supported by the GStreamer discoverer object. So once you have identified the contents of a file with discoverer, you can be sure to grab the exact stream you want coming out of (uri)decodebin by checking the pad for the streamid. The most common use case for this is of course files with multiple audio streams in different languages.

From the output of Discoverer the stream id is really easy to get.
On the stream object you get out of Discoverer you just call:

stream.get_stream_id()
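
To put that call in context, here is a rough sketch of how the stream ids of all audio streams in a file could be collected with Discoverer, and how one of them could then be picked by language. The helper names list_audio_streams and pick_stream_id are made up for this example, and it assumes the language tags stored in the file match the string you ask for. Treat it as an illustration, not as what Transmageddon does.

```python
def list_audio_streams(uri, timeout_seconds=5):
    """Return (stream_id, language) pairs for every audio stream in uri.

    GStreamer is imported here so the rest of the example can be read
    (and tried) without it being installed.
    """
    import gi
    gi.require_version("Gst", "1.0")
    gi.require_version("GstPbutils", "1.0")
    from gi.repository import Gst, GstPbutils

    Gst.init(None)
    # The Discoverer timeout is given in nanoseconds
    discoverer = GstPbutils.Discoverer.new(timeout_seconds * Gst.SECOND)
    info = discoverer.discover_uri(uri)
    return [(stream.get_stream_id(), stream.get_language())
            for stream in info.get_audio_streams()]


def pick_stream_id(streams, language):
    """Pick the id of the first stream tagged with language.

    Falls back to the first stream if no tag matches, and to None if
    the file has no audio streams at all.
    """
    for stream_id, stream_language in streams:
        if stream_language == language:
            return stream_id
    return streams[0][0] if streams else None
```

With a file on disk this could be used as pick_stream_id(list_audio_streams("file:///path/to/file.mkv"), "en"), where the path is of course only a placeholder.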

On the pad you get from decodebin or uridecodebin the path is a bit more convoluted, but not too hard once you know how (there might be some kind of convenience API added for this at some point).

Before you link the pad you get from the bin, you attach a pad probe to it like this:

src_pad.add_probe(Gst.PadProbeType.EVENT_DOWNSTREAM, self.padprobe, None)


Then in the function you define you can extract the stream id with the parse_stream_start call as seen below:

def padprobe(self, pad, probeinfo, userdata):
       event = probeinfo.get_event()
       # The stream id travels downstream in the STREAM_START event
       if event.type == Gst.EventType.STREAM_START:
           streamid = event.parse_stream_start()
       # Let the event continue on its way
       return Gst.PadProbeReturn.OK
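
To show how the probe can be tied back to the id found by Discoverer, here is a small sketch of a probe that only reacts to one particular stream. The names make_stream_id_probe, wanted_id and on_match are invented for this example, and passing the Gst module in as the first argument is just a choice made here so the function can be tried without GStreamer installed.

```python
def make_stream_id_probe(gst, wanted_id, on_match):
    """Build a pad probe callback that reports the pad carrying wanted_id.

    gst       -- the imported Gst module (from gi.repository import Gst)
    wanted_id -- stream id obtained earlier, for example from Discoverer
    on_match  -- hypothetical callback, called with the pad when ids match
    """
    def probe(pad, probeinfo, userdata):
        event = probeinfo.get_event()
        if event.type == gst.EventType.STREAM_START:
            if event.parse_stream_start() == wanted_id:
                on_match(pad)
        # Never drop the event, just observe it
        return gst.PadProbeReturn.OK
    return probe
```

It would be attached in the same way as above, for instance src_pad.add_probe(Gst.PadProbeType.EVENT_DOWNSTREAM, make_stream_id_probe(Gst, wanted_id, matched_pads.append), None), with matched_pads being a plain list you keep around.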

I have been using this code in my local copy of Transmageddon to start implementing support for files with multiple audio streams (also supporting multiple video streams would be easy, but I am not sure how useful it would be). There is a screenshot of my current development snapshot below, but I am still trying to figure out what would be a nice way to present it. The current setup will look quite crap if the incoming file has more than a few audio streams. Suggestions welcome :)


Transmageddon multistream development snapshot

GStreamer, Python and videomixing

One feature that would be of interest to us in the Empathy Video Conference client is the ability to record conversations. Due to that I have been putting together a simple prototype Python test application in free moments to verify that everything works as expected, before any effort is put into doing any work inside Empathy.

The sample code below requires two webcams to be connected to your system to work. It basically takes the two camera video streams, puts one of them through an encode/RTP/decode process (to roughly emulate what happens in a video call) and puts a text overlay onto the video to let the conference participant know the call is being recorded. The two video streams are then mixed together and displayed. In the actual application the combined stream would of course be saved to disk instead, and audio would also be captured and mixed.

Whether we ever get around to working on this feature is an open question, but at least we can now assume that it is likely to work. Of course getting one stream in over the network over RTP is very different from what this sample does, so that might uncover some bugs.

The sample also works with Python3, so even though it is only a prototype it already fulfils the GNOME Goal :)

import sys
import gi
# Ask for the 1.0 API before importing Gst
gi.require_version('Gst', '1.0')
from gi.repository import Gst
from gi.repository import GObject
GObject.threads_init()
Gst.init(None)

class VideoBox():
   def __init__(self):
       mainloop = GObject.MainLoop()
       # Create the video mixing pipeline
       self.pipeline = Gst.Pipeline()


       self.v4lsrc1 = Gst.ElementFactory.make('v4l2src', None)
       self.v4lsrc1.set_property("device", "/dev/video0")
       self.pipeline.add(self.v4lsrc1)

       self.v4lsrc2 = Gst.ElementFactory.make('v4l2src', None)
       self.v4lsrc2.set_property("device", "/dev/video1")
       self.pipeline.add(self.v4lsrc2)

       camera1caps = Gst.Caps.from_string("video/x-raw, width=320,height=240")
       self.camerafilter1 = Gst.ElementFactory.make("capsfilter", "filter1") 
       self.camerafilter1.set_property("caps", camera1caps)
       self.pipeline.add(self.camerafilter1)

       self.videoenc = Gst.ElementFactory.make("theoraenc", None)
       self.pipeline.add(self.videoenc)

       self.videodec = Gst.ElementFactory.make("theoradec", None)
       self.pipeline.add(self.videodec)

       self.videortppay = Gst.ElementFactory.make("rtptheorapay", None)
       self.pipeline.add(self.videortppay)

       self.videortpdepay = Gst.ElementFactory.make("rtptheoradepay", None)
       self.pipeline.add(self.videortpdepay)

       self.textoverlay = Gst.ElementFactory.make("textoverlay", None)
       self.textoverlay.set_property("text","Talk is being recorded")
       self.pipeline.add(self.textoverlay)

       camera2caps = Gst.Caps.from_string("video/x-raw, width=320,height=240")
       self.camerafilter2 = Gst.ElementFactory.make("capsfilter", "filter2") 
       self.camerafilter2.set_property("caps", camera2caps)
       self.pipeline.add(self.camerafilter2)

       self.videomixer = Gst.ElementFactory.make('videomixer', None)
       self.pipeline.add(self.videomixer)

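       # A negative "left" value adds a border instead of cropping. Here it adds
       # 320 transparent pixels so camera 1 ends up to the right of camera 2.
       self.videobox1 = Gst.ElementFactory.make('videobox', None)
       self.videobox1.set_property("border-alpha",0)
       self.videobox1.set_property("top",0)
       self.videobox1.set_property("left",-320)
       self.pipeline.add(self.videobox1)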

       self.videoformatconverter1 = Gst.ElementFactory.make('videoconvert', None)
       self.pipeline.add(self.videoformatconverter1)

       self.videoformatconverter2 = Gst.ElementFactory.make('videoconvert', None)
       self.pipeline.add(self.videoformatconverter2)

       self.videoformatconverter3 = Gst.ElementFactory.make('videoconvert', None)
       self.pipeline.add(self.videoformatconverter3)

       self.videoformatconverter4 = Gst.ElementFactory.make('videoconvert', None)
       self.pipeline.add(self.videoformatconverter4)

       self.xvimagesink = Gst.ElementFactory.make('xvimagesink',None)
       self.pipeline.add(self.xvimagesink)

       # Branch 1: camera 1 gets the text overlay and the videobox border
       self.v4lsrc1.link(self.camerafilter1)
       self.camerafilter1.link(self.videoformatconverter1)
       self.videoformatconverter1.link(self.textoverlay)
       self.textoverlay.link(self.videobox1)
       self.videobox1.link(self.videomixer)

       # Branch 2: camera 2 goes through Theora encoding, RTP payloading,
       # depayloading and decoding to roughly emulate a video call
       self.v4lsrc2.link(self.camerafilter2)
       self.camerafilter2.link(self.videoformatconverter2)
       self.videoformatconverter2.link(self.videoenc)
       self.videoenc.link(self.videortppay)
       self.videortppay.link(self.videortpdepay)
       self.videortpdepay.link(self.videodec)
       self.videodec.link(self.videoformatconverter3)
       self.videoformatconverter3.link(self.videomixer)

       # The mixed output is converted and displayed
       self.videomixer.link(self.videoformatconverter4)
       self.videoformatconverter4.link(self.xvimagesink)
       self.pipeline.set_state(Gst.State.PLAYING)
       mainloop.run()
   
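if __name__ == "__main__":
    import signal
    # Restore the default SIGINT handler so Ctrl+C ends the program. This has
    # to happen before VideoBox() is created, because its constructor runs
    # the main loop and does not return.
    signal.signal(signal.SIGINT, signal.SIG_DFL)
    app = VideoBox()
    sys.exit(0)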
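
For comparison, the same topology can also be written as a gst-launch style pipeline description and handed to Gst.parse_launch, which saves most of the element-by-element boilerplate. The sketch below is only an illustration: the function names build_description and run are invented here, and it assumes the same two webcams and the same elements as the sample above.

```python
def build_description(device1="/dev/video0", device2="/dev/video1",
                      width=320, height=240):
    """Return a gst-launch style description of the two-camera mixer."""
    caps = "video/x-raw,width={0},height={1}".format(width, height)
    # Camera 1: text overlay, then a transparent border on the left so the
    # picture ends up beside camera 2 in the mixed output
    branch1 = (
        'v4l2src device={dev} ! {caps} ! videoconvert ! '
        'textoverlay text="Talk is being recorded" ! '
        'videobox border-alpha=0 top=0 left=-{width} ! mix.'
    ).format(dev=device1, caps=caps, width=width)
    # Camera 2: encode/RTP/decode round trip to emulate a call
    branch2 = (
        'v4l2src device={dev} ! {caps} ! videoconvert ! '
        'theoraenc ! rtptheorapay ! rtptheoradepay ! theoradec ! '
        'videoconvert ! mix.'
    ).format(dev=device2, caps=caps)
    output = 'videomixer name=mix ! videoconvert ! xvimagesink'
    return ' '.join([output, branch1, branch2])


def run(description):
    """Parse the description and play it until Ctrl+C (needs GStreamer 1.0)."""
    import signal
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import GLib, Gst

    Gst.init(None)
    signal.signal(signal.SIGINT, signal.SIG_DFL)
    pipeline = Gst.parse_launch(description)
    pipeline.set_state(Gst.State.PLAYING)
    GLib.MainLoop().run()
```

Calling run(build_description()) should then behave much like the class above, though I would still keep the explicit version when individual elements need to be reached from code later.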

The long journey towards good free video conferencing

One project we have been working on here at Red Hat Brno is to make sure we have nicely working voice and video calling with Empathy in Fedora 18. The project is being spearheaded by Debarshi Ray with me trying to help out with the testing. We are still not there, but we are making good progress thanks to the help of people like Brian Pepple, Sjoerd Simons, Olivier Crete, Guillaume Desmottes and more.

But having been involved with open source multimedia for so long I thought it could be interesting for people to know why free video calling has taken so long to get right and why we still have a little bit to go. So I decided to do this write-up of some of the challenges involved. Be aware though that this article mostly discusses the general historical challenges of getting free VoIP up and running, but I will try to tie that into the specific issues we are currently trying to resolve where relevant.

Protocols

The first challenge that had to be overcome was the challenge of protocols. VoIP and video calling have been around for a while (which an application like Ekiga is proof of), but they have been hampered by a jungle of complex standards, closed protocols, lack of interoperability and so on. Some of the older standards also require non-free codecs to operate. The open standard that has started to turn this around is XMPP, which is the protocol that came out of the Jabber project. Originally it was just an open text chat network, but thanks to ongoing work it now features voice and video conferencing too. It also got a boost when Google chose it as the foundation for their GTalk offering, ensuring that anyone with a Gmail address was suddenly available to chat or call. That said, like any developing protocol it has its challenges, and some slight differences in behaviour between a Google Jabber server and most others are causing us some pain with video calls currently, which is one of the issues we are trying to figure out how to resolve.

Codecs and interoperability

The other thing that has hounded us is the combination of non-free codecs and the need for interoperability. For a video calling system to be interesting to use, you need to be able to use it to contact at least a substantial subset of your friends and family. For the longest time this meant using a non-free codec, because if you relied solely on free codecs no widely used client out there would be able to connect with you. But the effort of Xiph.org to create first the Speex audio codec and most recently the Opus audio codec, together with the adoption of Speex by Google, has at least mostly resolved things on the audio side. On the video side things are still not 100% there. We have the Theora video codec from Xiph.org, but unfortunately when the RTP specification for that codec was written, the primary use case in mind was RTSP streaming and not video conferencing, making the Theora RTP a bit hairy to use for video conferencing. The other bigger issue with Theora is that outside the Linux world nobody adopted Theora for video calling, so once again you are not likely to be able to use it to call a very large subset of your friends and family unless they are all on Linux systems.
There might be a solution on the way though in the form of the new kid on the block, VP8. VP8 is a video codec that Google released as part of their WebM HTML5 video effort. The RTP specification for VP8 is still under development, so adoption is limited, but the hope and expectation is that Google will support VP8 in their GTalk client once the RTP specification is stable, and thus we should have a good set of free codecs for both audio and video available and in the hands of a large user base.

Frameworks

Video calling is quite a complex technical issue, with a lot of components needing to work together: audio and video acquisition on your local machine, integrating with your address book, negotiating the call between the parties involved, putting everything into RTP packets on one side and unpacking and displaying them on the other side, all while taking into account the network, firewalls and audio and video sync. So in order for a call to work you will need (among others) ALSA, PulseAudio, V4L2, GStreamer, Evolution Data Server, Farstream, libnice, the XMPP server, Telepathy and Empathy to work together across two different systems. And if you want to interoperate with a 3rd party system like GTalk the list of components that all need to work perfectly with each other grows further.

A lot of this software has been written in parallel with each other and in parallel with evolving codecs and standards, and it tries to interoperate with as many 3rd party systems as possible. This has come at the cost of stability, which of course has turned people off from using and testing the video call functionality of Empathy. But we believe that we have reached a turning point now where the pieces are in place, which is why we are now trying to help stabilize and improve the experience to make VoIP and video conferencing calls work nicely out of the box on Fedora 18.

Missing pieces

In addition to the nitty gritty of protocols and codecs there are other pieces that have been lacking to give users a really good experience. The most critical one is good echo cancellation. This is required in order to avoid an ugly echo effect when trying to use your laptop's built-in speakers and microphone for a call. So people have been forced to use a headset to make things work reasonably well. This was quite a hard issue to solve, as there was neither any great open source code available which implemented echo cancellation nor a good way to hook it into the system. To start addressing this issue, while I was working for Collabora Multimedia we reached out to the Dutch non-profit NLnet Foundation, who sponsored us to have Wim Taymans work on creating an echo cancellation framework for PulseAudio. The goal was to create the framework within PulseAudio to support pluggable echo cancellation modules, turn two existing open source echo cancellation solutions into plugins for this framework as examples and proof of concept, and hope that the availability of such a framework would encourage other groups or individuals to release better echo cancellation modules going forward.
When we started this work the best existing open source echo cancellation system was SpeexDSP. Unfortunately SpeexDSP had a lot of limitations; for instance it could not work well with two sound cards, which meant using your laptop speakers for output and a USB microphone for input would not work. Although we can claim no direct connection, as things would have it Google ended up releasing a quite good echo cancellation algorithm as part of their WebRTC effort. This was quickly turned into a library and plugin for PulseAudio by Arun Raghavan. And this combined PulseAudio and WebRTC echo cancellation system is what we will have packaged and available in Fedora 18.

Summary

So I have outlined a few of the challenges around having a good quality VoIP and video conferencing solution shipping out of the box on a Linux distribution. Some of the items, like the video codec situation and general stack stability, are not 100% there yet. There are also quite a few bugs in Empathy in terms of behaviour, but Debarshi is already debugging those and with the help of the Telepathy and Empathy teams we should hopefully get those issues patched and merged before Fedora 18 ships. Our goal is to get Empathy up to a level where people want to be using it to make VoIP and video calls, as that is also the best way to ensure things stay working going forward.

In addition to Debarshi, another key person helping us with this effort in the Fedora community is Brian Pepple, who is making sure we are getting releases and updates of GStreamer, Telepathy, Farstream, libnice and so on packaged for Fedora 18 almost on the day. This is making testing and verifying bugfixes a lot easier for us.

Future plans

There are also some nice-to-have items we want to look at going forward after having stabilized the current functionality. For instance Red Hat and Xiph.org codec guru Monty Montgomery suggested we add a video noise reduction plugin to the GStreamer pipeline inside Empathy in order to improve quality and performance when using a low quality built-in web camera. [Edit: Sjoerd just told me the Gst 0.10 version of the code had such a plugin available, so this might not be too hard to resolve.]
Debarshi is also interested in seeing if we can help move the multiparty chat feature forward. But we are not expecting to be able to work on these issues before Fedora 18 is released.

GStreamer 1.x documentation getting a boost

GStreamer maintainer Wim Taymans decided that having a brand new GStreamer 1.x series was only worth the effort if we also had some nice up-to-date documentation for GStreamer 1.0. So over the last week he has been going over the GStreamer Application Development manual, making sure it is up to date, fixing all the code examples and even adding new chapters. So if you want to get into GStreamer development our introduction manual should now be a good starting point again!

Transmageddon 0.24 is out

So I finally released an official new Transmageddon release today, 0.24. In addition to the changes from the 0.23 test release (GTK3 and GStreamer 1.0), this new version uses Python 3 and has some fixes for improved handling of missing codecs. Let me know if it's giving you any trouble.

Next step is looking into how to implement some or all of the new interface design from Gendre Sébastien.

GStreamer 1.0 Released

So this news is a couple of days old now, but I wanted to write a blog entry about the exciting release of GStreamer 1.0. When we released GStreamer 0.10 about 7 years ago we did not expect or plan the 0.10 series to last as long as it did; I think if we had, it would have been called 1.0 instead of 0.10. Our caution back then came from 0.10 being quite a revolutionary version, with the core of GStreamer extensively re-designed around effective use of threads and thread safety. The new GStreamer 1.0 is more of an incremental improvement, cleaning up the API and making the things modern systems expect easier and more straightforward to do. I think a lot of the work that went into 1.0 could be said to be based on cleaning up the awkward APIs that can evolve when you are not able to change anything existing, just add new stuff that does not affect the old.

That said there are a lot of major improvements to be seen too, with the list that Tim-Philipp Müller put together for the GStreamer website catching the major items:

  • more flexible memory handling
  • extensible and negotiable metadata for buffers
  • caps negotiation and renegotiation mechanisms, decoupled from buffer allocation
  • improved caps renegotiation
  • automatic re-sending of state for dynamic pipelines
  • reworked and more fine-grained pad probing
  • simpler and more descriptive audio and video caps
  • more efficient allocation of buffers, events and other mini objects
  • improved timestamp handling
  • support for gobject-introspection-based language bindings

The list can seem a bit technical and dry, but there are a lot of things that will benefit users here once plugins and applications start taking advantage of them. For instance the more flexible memory handling will improve performance when running GStreamer on ARM boards like a Panda board or a Raspberry Pi. The changes will also make it a lot easier to write and use plugins on PC platforms which use the GPU for decoding or encoding and OpenGL for rendering the playback. These things were possible with 0.10, but they required a bit more special casing and more code in the plugins and a bit more overhead in setting up the pipeline. If you want a great introduction to what is in this 1.0 release I recommend the keynote by Wim Taymans about GStreamer 1.0 and the GStreamer Status report keynote by Tim-Philipp Müller, which both talk a lot about the new features and the possibilities they open.

I think that the biggest change for a lot of developers though will be if they are using a language binding for their application. GStreamer 1.0 offers GObject introspection bindings support, which means that most bindings from now on will be using that. For Python developers like myself that is quite a big change in API. On the other hand it also means that porting to Python 3 has become very simple. For Transmageddon I simply ran it through the Python 2to3 script and it just worked.

There are a lot of contributors to this release, and I would love to thank them all, but I think Wim and Tim also deserve a special mention as I don’t think there would be a GStreamer 1.0 without them. Wim has of course been the GStreamer maintainer for a long while and he shepherded this release from the beginning when he was the only developer working on it. Tim has been the GStreamer release manager for a long while now and is doing a wonderful job, fixing a gazillion bugs and making sure regressions are not allowed to creep into GStreamer releases. He ended up doing a lot of heavy lifting to get the final GStreamer 0.11 test releases and the final 1.0 out the door. So a big big thanks to both of them.

Another person I want to give extra kudos to is Bastien Nocera, who has done a lot of work porting GNOME applications to GStreamer 1.0. There are and were of course a lot of other people involved in that process too, but Bastien did an incredible job working through that list and writing patches to port many of them over.

What I am really looking forward to now is the release of Fedora 18 as I think it will be the first chance for a lot of users and developers to try out all the great work that has been done on GStreamer 1.0 and porting applications over to it. I am personally targeting Fedora 18 with Transmageddon, doing my current testing and development on a F18 system, making sure things like the PackageKit integration are working smoothly.

Transmageddon test release available

I have made a first test release of Transmageddon today, 0.23. It is the first release depending on GStreamer 0.11/1.0 and GTK3. It also features a little bit of GNOME Shell and Unity integration in the form of a notification message when it's done transcoding, and the menu has been moved into the shell. You can find the test release in the pre-release directory on linuxrising.org. Let me know if you have any issues so I can try to fix them before I make a final stable release when GStreamer 1.0 is released. Be aware that you want the GStreamer 0.11 releases from yesterday to try this release.

New GTK3 and GNOME shell friendly Transmageddon

The above screenshot is taken on my Fedora 18 test system!

Does GStreamer scale?

Sometimes we get questions about whether GStreamer can scale to handle really complex pipelines. Well, thanks to Kipp Cannon and his talk at this year's GStreamer Conference we now have the answer.
He is part of a research project called LIGO which is researching gravitational waves, which can be described as ripples in the fabric of space-time. Kipp is part of the LIGO Data Analysis Software Working Group, which has developed what they call gstlal.

P.S. You might want to download and view the following with an image viewer application as the browser struggles to render it in any detail :)
Anyway, to make a long story short, their pipeline looks something like this image (created with the built in graph dump functionality of GStreamer using graphviz).

If you have ever seen a bigger pipeline please let me know :) Also it should alleviate any concerns you might have about GStreamer's scalability.
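
If you want to produce a similar graph of your own pipeline, the built-in dump functionality can be reached from Python too. Below is a minimal sketch: the helper names enable_dot_dumps, dump_pipeline and render_command are made up for this example, and it assumes the graphviz dot tool is installed for turning the .dot file into an image.

```python
import os


def enable_dot_dumps(directory):
    """Tell GStreamer where to write .dot files.

    GStreamer only writes dumps when GST_DEBUG_DUMP_DOT_DIR is set, and
    it has to be set before Gst.init() is called.
    """
    os.makedirs(directory, exist_ok=True)
    os.environ["GST_DEBUG_DUMP_DOT_DIR"] = directory
    return directory


def dump_pipeline(pipeline, name):
    """Write <name>.dot for the given pipeline into the dump directory."""
    from gi.repository import Gst
    Gst.debug_bin_to_dot_file(pipeline, Gst.DebugGraphDetails.ALL, name)


def render_command(directory, name, image_format="png"):
    """Return the graphviz command line that renders the dump to an image."""
    dot_file = os.path.join(directory, name + ".dot")
    image_file = os.path.join(directory, name + "." + image_format)
    return ["dot", "-T" + image_format, dot_file, "-o", image_file]
```

The list returned by render_command can be passed straight to subprocess.call. Dumping once the pipeline has reached PLAYING tends to give the most informative graph, since the negotiated caps are then included.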

Gearing up Transmageddon for release

As we count down towards the GStreamer 1.0 release I have been spending some free moments preparing the new Transmageddon version. Today I re-enabled multipass encoding, removed the redundant xvid option (as we already do MPEG4) and polished the notification message a little. I tried to run through with as many codecs as I can, and so far they all seem to work perfectly with GStreamer 1.0. So hopefully a new Transmageddon release soon :)

I think I got everything that was there before the 1.0 port back again, but I wouldn’t be surprised if I have managed to miss a feature somewhere :)