Category Archives: GStreamer

The long journey towards good free video conferencing

One project we have been working on here at Red Hat Brno is making sure we have nicely working voice and video calling with Empathy in Fedora 18. The project is being spearheaded by Debarshi Ray, with me trying to help out with the testing. We are still not there, but we are making good progress thanks to the help of people like Brian Pepple, Sjoerd Simons, Olivier Crete, Guillaume Desmottes and more.

But having been involved with open source multimedia for so long, I thought it could be interesting for people to know why free video calling has taken so long to get right and why we still have a little bit to go. So I decided to do this write-up of some of the challenges involved. Be aware though that this article mostly discusses the general historical challenges of getting free VoIP up and running, but I will try to tie that into the specific issues we are currently trying to resolve where relevant.

Protocols

The first challenge that had to be overcome was the challenge of protocols. VoIP and video calling have been around for a while (which an application like Ekiga is proof of), but they have been hampered by a jungle of complex standards, closed protocols, lack of interoperability and so on. Some of the older standards also require non-free codecs to operate. The open standard that has started to turn this around is XMPP, which is the protocol that came out of the Jabber project. Originally it was just an open text chat network, but thanks to ongoing work it now features voice and video conferencing too. It also got a boost when Google chose it as the foundation for their GTalk offering, ensuring that anyone with a Gmail address was suddenly available to chat or call. That said, like any developing protocol it has its challenges, and some slight differences in behaviour between a Google Jabber server and most others are currently causing us some pain with video calls, which is one of the issues we are trying to figure out how to resolve.

Codecs and interoperability

The other thing that has hounded us is the combination of non-free codecs and the need for interoperability. For a video calling system to be interesting to use, you need to be able to use it to contact at least a substantial subset of your friends and family. For the longest time this meant using a non-free codec, because if you relied solely on free codecs no widely used client out there would be able to connect with you. But the effort of Xiph.org to create first the Speex audio codec and most recently the Opus audio codec, together with the adoption of Speex by Google, has at least mostly resolved things on the audio side. On the video side things are still not 100% there. We have the Theora video codec from Xiph.org, but unfortunately when the RTP specification for that codec was written, the primary use case in mind was RTSP streaming and not video conferencing, making the Theora RTP a bit hairy to use for video conferencing. The other, bigger issue with Theora is that outside the Linux world nobody adopted it for video calling, so once again you are not likely to be able to use it to call a very large subset of your friends and family unless they are all on Linux systems.
There might be a solution on the way though in the form of the new kid on the block, VP8. VP8 is a video codec that Google released as part of their WebM HTML5 video effort. The RTP specification for VP8 is still under development, so adoption is limited, but the hope and expectation is that Google will support VP8 in their GTalk client once the RTP specification is stable, and thus we should have a good set of free codecs for both audio and video available and in the hands of a large user base.
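
To make the interoperability problem above concrete, here is a minimal sketch of how two clients could pick a shared codec from the lists they each support. The client codec lists and the function names are made up for illustration; real clients advertise their codecs during call setup, and the actual sets differ between clients and versions.

```python
from typing import Dict, Optional, Sequence, Set

# Hypothetical codec lists, in each client's order of preference.
FREE_ONLY_CLIENT = ["Opus", "Speex", "VP8", "Theora"]
PROPRIETARY_CLIENT = ["H.264", "G.729"]
MIXED_CLIENT = ["H.264", "Speex", "VP8"]

AUDIO_CODECS = {"Opus", "Speex", "G.729"}
VIDEO_CODECS = {"VP8", "Theora", "H.264"}


def pick_common(local: Sequence[str], remote: Sequence[str],
                kind: Set[str]) -> Optional[str]:
    """Return the first codec of the given kind, in local preference
    order, that the remote side also supports (or None)."""
    remote_set = set(remote)
    for codec in local:
        if codec in kind and codec in remote_set:
            return codec
    return None


def negotiate(local: Sequence[str],
              remote: Sequence[str]) -> Dict[str, Optional[str]]:
    """Pick one audio and one video codec both sides can handle."""
    return {
        "audio": pick_common(local, remote, AUDIO_CODECS),
        "video": pick_common(local, remote, VIDEO_CODECS),
    }
```

With these made-up lists, a free-codecs-only client can talk to the mixed client using Speex and VP8, but gets no audio or video match at all against the proprietary-only client, which is the situation free clients were in for a long time.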

Frameworks

Video calling is quite a complex technical issue, with a lot of components needing to work together: audio and video acquisition on your local machine, integrating with your address book, negotiating the call between the parties involved, putting everything into RTP packets on one side and unpacking and displaying them on the other side, all while taking into account the network, firewalls and audio and video sync. So in order for a call to work you will need (among others) ALSA, PulseAudio, V4L2, GStreamer, Evolution Data Server, Farstream, libnice, the XMPP server, Telepathy and Empathy to work together across two different systems. And if you want to interoperate with a 3rd party system like GTalk, the list of components that all need to work perfectly with each other grows further.

A lot of this software has been written in parallel with each other and in parallel with evolving codecs and standards, and it tries to interoperate with as many 3rd party systems as possible. This has come at the cost of stability, which of course has turned people off from using and testing the video call functionality of Empathy. But we believe that we have reached a turning point now where the pieces are in place, which is why we are now trying to help stabilize and improve the experience so that VoIP and video conferencing calls work nicely out of the box on Fedora 18.

Missing pieces

In addition to the nitty gritty of protocols and codecs, there are other pieces that have been lacking to give users a really good experience. The most critical one is good echo cancellation. This is required in order to avoid an ugly echo effect when trying to use your laptop's built-in speakers and microphone for a call. So people have been forced to use a headset to make things work reasonably well. This was quite a hard issue to solve, as there was neither any great open source code available that implemented echo cancellation nor a good way to hook it into the system. To start addressing this issue, while I was working for Collabora Multimedia we reached out to the Dutch non-profit NLnet Foundation, who sponsored us to have Wim Taymans work on creating an echo cancellation framework for PulseAudio. The goal was to create the framework within PulseAudio to support pluggable echo cancellation modules, turn two existing open source echo cancellation solutions into plugins for this framework as examples and proof of concept, and hope that the availability of such a framework would encourage other groups or individuals to release better echo cancellation modules going forward.
When we started this work the best existing open source echo cancellation system was SpeexDSP. Unfortunately SpeexDSP had a lot of limitations; for instance it could not work well with two soundcards, which meant using your laptop speakers for output and a USB microphone for input would not work. Although we can claim no direct connection, as things would have it Google ended up releasing a quite good echo cancellation algorithm as part of their WebRTC effort. This was quickly turned into a library and plugin for PulseAudio by Arun Raghavan. And this combined PulseAudio and WebRTC echo cancellation system is what we will have packaged and available in Fedora 18.
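
For readers wondering what an echo canceller actually does, here is a minimal sketch of the textbook normalized least mean squares (NLMS) adaptive filter. To be clear, this is not the SpeexDSP or WebRTC algorithm, which are far more sophisticated; it only illustrates the basic idea of learning how the loudspeaker signal shows up in the microphone and subtracting that estimate. The function names, the parameters and the synthetic "room" (a 3 sample delay with attenuation) are all assumptions made up for the demo.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of far_end from mic.

    far_end is what we play on the speakers, mic is what the microphone
    records. Returns the residual (mic minus the estimated echo).
    """
    weights = [0.0] * taps   # current estimate of the echo path
    history = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate
        # Normalizing by the input energy keeps the update stable.
        norm = sum(h * h for h in history) + eps
        weights = [w + step * error * h / norm
                   for w, h in zip(weights, history)]
        residual.append(error)
    return residual


def energy(samples):
    """Sum of squared samples."""
    return sum(s * s for s in samples)


# Synthetic demo: the microphone only hears a delayed, quieter copy of
# the loudspeaker signal (no local speech, no noise).
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
echo = [0.0] * 3 + [0.6 * s for s in far_end[:-3]]
residual = nlms_echo_cancel(far_end, echo)
```

Once the filter has adapted, the residual carries only a tiny fraction of the echo energy. A real system also has to cope with someone talking at the near end at the same time, with much longer echo paths and, as mentioned above, with speakers and microphone running on different sound card clocks.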

Summary

So I have outlined a few of the challenges around having a good quality VoIP and video conferencing solution shipping out of the box on a Linux distribution. Some of the items, like the video codec situation and general stack stability, are not 100% there yet. There are also quite a few bugs in Empathy in terms of behaviour, but Debarshi is already debugging those, and with the help of the Telepathy and Empathy teams we should hopefully get those issues patched and merged before Fedora 18 ships. Our goal is to get Empathy up to a level where people want to use it to make VoIP and video calls, as that is also the best way to ensure things stay working going forward.

In addition to Debarshi, another key person helping us with this effort in the Fedora community is Brian Pepple, who is making sure we are getting releases and updates of GStreamer, Telepathy, Farstream, libnice and so on packaged for Fedora 18 almost on the day. This is making testing and verifying bugfixes a lot easier for us.

Future plans

There are also some nice-to-have items we want to look at going forward after having stabilized the current functionality. For instance Red Hat and Xiph.org codec guru Monty Montgomery suggested we add a video noise reduction plugin to the GStreamer pipeline inside Empathy in order to improve quality and performance when using a low quality built-in web camera. [Edit: Sjoerd just told me the Gst 0.10 version of the code had such a plugin available, so this might not be too hard to resolve.]
Debarshi is also interested in seeing if we can help move the multiparty chat feature forward. But we are not expecting to be able to work on these issues before Fedora 18 is released.

GStreamer 1.x documentation getting a boost

GStreamer maintainer Wim Taymans decided that having a brand new GStreamer 1.x series was only worth the effort if we also had some nice up to date documentation for GStreamer 1.0. So over the last week he has been going over the GStreamer Application Development manual making sure it is up to date and fixing all the code examples and adding new chapters even. So if you want to get into GStreamer development our introduction manual should now be a good starting point again!

Transmageddon 0.24 is out

So I finally released an official new Transmageddon release today, 0.24. In addition to the changes from the 0.23 test release (GTK3 and GStreamer 1.0), this new version uses Python 3 and got some fixes for improved handling of missing codecs. Let me know if it's giving you any trouble.

Next step is looking into how to implement some or all of the new interface design from Gendre Sébastien.

GStreamer 1.0 Released

So this news is a couple of days old now, but I wanted to write a blog entry about the exciting release of GStreamer 1.0. When we released GStreamer 0.10 about 7 years ago we did not expect or plan for the 0.10 series to last as long as it did; I think if we had, it would have been called 1.0 instead of 0.10. Our caution back then came from 0.10 being a quite revolutionary version, with the core of GStreamer extensively re-designed around effective use of threads and thread safety. The new GStreamer 1.0 is more of an incremental improvement, cleaning up the API and making the things modern systems expect easier and more straightforward to do. I think a lot of the work that went into 1.0 could be said to be about cleaning up the awkward APIs that evolve when you are not able to change anything existing and can only add new things that do not affect the old.

That said, there are a lot of major improvements to be seen too, with the list that Tim-Philipp Müller put together for the GStreamer website catching the major items:

  • more flexible memory handling
  • extensible and negotiable metadata for buffers
  • caps negotiation and renegotiation mechanisms, decoupled from buffer allocation
  • improved caps renegotiation
  • automatic re-sending of state for dynamic pipelines
  • reworked and more fine-grained pad probing
  • simpler and more descriptive audio and video caps
  • more efficient allocation of buffers, events and other mini objects
  • improved timestamp handling
  • support for gobject-introspection-based language bindings

The list can seem a bit technical and dry, but there are a lot of things that will benefit users here once plugins and applications start taking advantage of them. For instance, the more flexible memory handling will improve performance when running GStreamer on ARM boards like a PandaBoard or a Raspberry Pi. The changes will also make it a lot easier to write and use plugins on PC platforms which use the GPU for decoding or encoding and OpenGL for rendering the playback. These things were possible with 0.10, but they required a bit more special casing and more code in the plugins and a bit more overhead in setting up the pipeline. If you want a great introduction to what is in this 1.0 release I recommend the keynote by Wim Taymans about GStreamer 1.0 and the GStreamer status report keynote by Tim-Philipp Müller, which both talk a lot about the new features and the possibilities they open.

I think that the biggest change for a lot of developers though will come if they are using a language binding for their application. GStreamer 1.0 offers GObject introspection bindings support, which means that most bindings from now on will be using that. For Python developers like myself that is quite a big change in API. On the other hand it also means that porting to Python 3 has become very simple. For Transmageddon I simply ran it through the Python 2to3 script and it just worked.
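
To give a feel for what the introspection-based API looks like from Python 3, here is a minimal sketch that runs a throwaway test pipeline until it finishes. This is not Transmageddon code; the helper names `describe_pipeline` and `run_pipeline` are made up for this example, and the function simply returns None if the GStreamer 1.0 introspection bindings are not installed.

```python
def describe_pipeline(elements):
    """Join element descriptions into gst-launch style syntax."""
    return " ! ".join(elements)


def run_pipeline(description):
    """Run a pipeline until end-of-stream or error.

    Returns True on end-of-stream, False on error and None when the
    GStreamer 1.0 introspection bindings are not available.
    """
    try:
        import gi
        gi.require_version("Gst", "1.0")
        from gi.repository import Gst
    except (ImportError, ValueError):
        return None

    Gst.init(None)
    # With the old static 0.10 bindings these calls were spelled
    # gst.parse_launch(...) and gst.STATE_PLAYING after "import gst".
    pipeline = Gst.parse_launch(description)
    pipeline.set_state(Gst.State.PLAYING)
    bus = pipeline.get_bus()
    message = bus.timed_pop_filtered(
        Gst.CLOCK_TIME_NONE,
        Gst.MessageType.EOS | Gst.MessageType.ERROR,
    )
    pipeline.set_state(Gst.State.NULL)
    return message.type == Gst.MessageType.EOS
```

Calling `run_pipeline(describe_pipeline(["audiotestsrc num-buffers=50", "audioconvert", "fakesink"]))` should run to end-of-stream on a system with the GStreamer 1.0 base plugins installed. The main thing to notice is that constants and functions now live under namespaced enums such as `Gst.State` and `Gst.MessageType`, which is where most of the porting work comes from.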

There are a lot of contributors to this release, and I would love to thank them all, but I think Wim and Tim deserve a special mention as I don’t think there would be a GStreamer 1.0 without them. Wim has of course been the GStreamer maintainer for a long while and he shepherded this release from the beginning, when he was the only developer working on it. Tim has been the GStreamer release manager for a long while now and is doing a wonderful job, fixing a gazillion bugs and making sure regressions are not allowed to creep into GStreamer releases. He ended up doing a lot of heavy lifting to get the final GStreamer 0.11 test releases and the final 1.0 out the door. So a big big thanks to both of them.

Another person I want to give extra kudos to is Bastien Nocera, who has done a lot of work porting GNOME applications to GStreamer 1.0. There are and were of course a lot of other people involved in that process too, but Bastien did an incredible job working through that list and writing patches to port many of them over.

What I am really looking forward to now is the release of Fedora 18, as I think it will be the first chance for a lot of users and developers to try out all the great work that has been done on GStreamer 1.0 and on porting applications over to it. I am personally targeting Fedora 18 with Transmageddon, doing my current testing and development on an F18 system and making sure things like the PackageKit integration are working smoothly.

Transmageddon test release available

I have made a first test release of Transmageddon today, 0.23. It is the first release depending on GStreamer 0.11/1.0 and GTK3. It also features a little bit of GNOME Shell and Unity integration in the form of a notification message when it's done transcoding, and the menu has been moved into the shell. You can find the test release in the pre-release directory on linuxrising.org. Let me know if you have any issues so I can try to fix them before I make a final stable release when GStreamer 1.0 is released. Be aware that you need the GStreamer 0.11 releases from yesterday to try this release.

New GTK3 and GNOME shell friendly Transmageddon

The above screenshot is taken on my Fedora 18 test system!

Does GStreamer scale?

Sometimes we get questions about whether GStreamer can scale to handle really complex pipelines. Well, thanks to Kipp Cannon and his talk at this year's GStreamer Conference we now have the answer.
He is part of a research project called LIGO, which is researching gravitational waves, which can be described as ripples in the fabric of space-time. Kipp is part of the LIGO Data Analysis Software Working Group, which has developed what they call gstlal.

Anyway, to make a long story short, their pipeline looks something like this image (created with the built-in graph dump functionality of GStreamer using graphviz).
P.S. You might want to download and view the image with an image viewer application, as the browser struggles to render it in any detail :)

If you have ever seen a bigger pipeline please let me know :) Also, it should alleviate any concerns you might have about GStreamer's scalability.
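
If you want to produce a graph like that for your own pipeline, here is a minimal sketch of how the graph dump functionality can be used from Python. The `GST_DEBUG_DUMP_DOT_DIR` environment variable and `Gst.debug_bin_to_dot_file()` are part of GStreamer itself, while the helper names below are made up for this example. It assumes the GStreamer 1.0 introspection bindings and the Graphviz `dot` tool are installed; the two helpers that do not touch GStreamer work without either.

```python
import os
import subprocess


def enable_dot_dumps(directory):
    """Tell GStreamer where to write .dot files.

    The variable needs to be set before GStreamer is initialized.
    """
    os.makedirs(directory, exist_ok=True)
    os.environ["GST_DEBUG_DUMP_DOT_DIR"] = directory


def dot_render_command(dot_file, output_format="png"):
    """Build the Graphviz command line that renders a .dot file."""
    base, _ = os.path.splitext(dot_file)
    output = base + "." + output_format
    return ["dot", "-T" + output_format, dot_file, "-o", output]


def dump_and_render(pipeline, name, directory):
    """Dump the graph of a running Gst.Pipeline and render it to PNG.

    Call enable_dot_dumps(directory) before Gst.init() for this to work.
    """
    from gi.repository import Gst

    # Writes <directory>/<name>.dot
    Gst.debug_bin_to_dot_file(pipeline, Gst.DebugGraphDetails.ALL, name)
    dot_file = os.path.join(directory, name + ".dot")
    subprocess.run(dot_render_command(dot_file), check=True)
    return dot_file
```

For a really large pipeline, passing `output_format="svg"` to `dot_render_command` will probably give you something easier to zoom around in than a huge PNG.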

Gearing up Transmageddon for release

As we count down towards the GStreamer 1.0 release I have been spending some free moments preparing the new Transmageddon version. Today I re-enabled multipass encoding, removed the redundant xvid option (as we already do MPEG4) and polished the notification message a little. I tried to run through with as many codecs as I can, and so far they all seem to work perfectly with GStreamer 1.0. So hopefully a new Transmageddon release soon :)

I think I got everything that was there before the 1.0 port back again, but I wouldn’t be surprised if I have managed to miss a feature somewhere :)

The challenges of Desktop Linux

So Miguel de Icaza posted a blog entry with his opinion about why Desktop Linux has not become a huge success. The core of his argument seems to be that the lack of ABI stability was the main reason we didn’t get a significant share of the desktop market. Personally I think this argument doesn’t hold water at all, and I find the comparison with MacOS X a bit random.

So I think there are a lot of contributing factors to our struggle in the desktop market like:

  • We are trying to compete with a near monopoly (Windows)
  • Companies tend to depend on a myriad of applications to run their business, and just a couple of them not running under Linux
    would be enough to derail a transition to Linux desktops
  • We were competing not only with other operating systems, but with an Office productivity application monopoly
  • We are trying to compete by supporting an unlimited range of hardware options
  • We divided our efforts into multiple competing APIs (GNOME vs KDE)
  • There was never a clear method of distributing software on Linux outside the distro specific package system
  • Many of our underlying systems were a bit immature
  • Software patents on multimedia codecs made it hard to create a good out of the box experience for multimedia
  • Competing with free applications is never a tempting proposition for 3rd party vendors
  • We never reached a critical mass where porting to desktop Linux tended to make sense
  • An impression was created that Linux users would not pay for any software
  • The different update cycles of the distributions made it hard to know when a new API would be available ‘everywhere’
  • Success in other areas drained resources away from the desktop

The Apple Myth
So how did Apple succeed? Well, first of all the question needs to be asked whether they have succeeded. When Steve Jobs came back to Apple I think their global market share for personal computers was down to just below 5%, if my memory serves me correctly. According to Wikipedia (not the best proof of anything, but let's assume they are in the ballpark) their market share is now about 7.5%. So in other words, on the back of being the media darling and of record breaking products such as the iPod, iPhone and iPad, they have managed to increase their share of the PC market by 2.5 percentage points. I think that speaks volumes about the challenges posed by the first two items in my list above. Another thing that is both an advantage and a disadvantage for Apple is that they have their own hardware. In the advantage column, that means that their developers had a very limited set of hardware configurations to support and they could ensure MacOS X ran well on those configurations. We on the other hand have been struggling with trying to support basically any random configuration out there, which means ensuring a problem free experience for everyone is next to impossible. Of course I think only supporting your own hardware also sometimes makes things harder for Apple, because a company considering a switch to MacOS X would have to throw away all their existing hardware, which I am sure makes a lot of companies think twice.

Apple were also able to build on their old market share when launching MacOS X, which means they have had a profitable ecosystem all the way. So for instance porting games has provided enough income to support companies in continuing to do so. For Linux, on the other hand, porting has often been a proposition of trying to build a market rather than serve an existing one.

Conclusions
So I could go into great detail for each of my bullet points, but I think they are quite self-explanatory. My general point is that when I ask myself whether our market share would be significantly higher if our ABI stability had been even better, the answer is no. I am not saying it has had no impact at all; I am sure examples exist of ABI breakage or distro fragmentation having caused 3rd party software developers to shy away. But I don’t really believe Linux would have had, for instance, a 10% market share today if only our ABI stability had been better over the last 10 years. Maybe it would have added another 0.2% or something in that range.

But as I said in an earlier blog post, I am not negative about the future of Linux and open source on the desktop. I just think it is a much slower slog to get there than we hoped for, and I do honestly feel that we have a much more compelling product to offer today than we did 10 years ago in comparison with Windows and MS Office. But the challenges in my bullet point list remain, and overcoming them has been and will continue to be something we have to chip away at, one step at a time. And in the meantime Linux and open source software are still doing extremely well in a lot of other end user facing market segments where the competition was not so strongly entrenched, like mobile phones, tablets, TVs, set top boxes, in-flight entertainment systems, in-vehicle entertainment systems, home appliances and so on.

Back from GStreamer Conference 2012

I came back last evening from the GStreamer Conference and I am now back in Cambridge for the weekend. The GStreamer Conference was a lot of fun this year and it was great seeing everyone again. I think the mixture of talks we had this year was really good and I think everyone attending enjoyed themselves. For those who missed the conference this year, Phoronix and LWN.net posted articles from the conference. The talks were also recorded and will soon be available at the Ubicast GStreamer Conference website. We did try to get livestreaming going this year, but due to technical problems it didn’t work out. Maybe next year.

A big thank you again to our Gold Sponsor Collabora and our Silver Sponsors Entropy Wave, Fluendo, Igalia and Google. Thanks also go to LWN.net, Phoronix and Ubicast for making sure the talks and sessions at the GStreamer Conference can reach as wide an audience as possible. And last but not least a big thanks to all our conference speakers who took the time and effort to prepare presentations for this year's GStreamer Conference.

For me personally the GStreamer Conference this year also marks the end of my life in Cambridge, UK. Starting from next week I will have completed my period of commuting to Brno, and will instead be living in Brno, Czech Republic on a permanent basis. Which reminds me, we are looking to hire more members for our Brno desktop engineering team, so I will be posting a blog entry soon outlining what kind of experience we are looking for.