Category Archives: Streaming

GStreamer Conference 2013 – Haggis edition

So we are quickly approaching the time for GStreamer Conference 2013. This year we will be in Edinburgh, Scotland and the conference will be hosted at the Edinburgh International Conference Centre alongside the Embedded Linux Conference Europe and the Automotive Linux Summit.

The GStreamer Conference 2013 Schedule is now live and as you can see there are a lot of great talks this year too, ranging from OpenGL integration, embedded hardware, new codecs and more. As always the GStreamer Conference is the best place to be to discuss the challenges of multimedia.

The conference will be held on the 22nd and 23rd of October so I strongly recommend you get yourself registered. If you want to attend ELCE or the Autotmotive summit make sure to register for those too and extend your stay in Edinburgh to also cover the 24th and 25th of October.

Looking forward to seeing you all there.

And last but not least, a big thank you to this years sponsors of the GStreamer Conference, especially Platinum sponsor Collabora. Also a big thank you to sponsors Fluendo, Google and for the first time this year Cisco.

And as always a big thank you to Ubicast who as always will be recording and publishing the conference for us. Be sure to check out their recordings from earlier years to find out what the GStreamer Conference is about.

GStreamer, Python and videomixing

One feature that would be of interest to us in the Empathy Video Conference client is the ability to record conversations. Due to that I have been putting together a simple prototype Python test application in free moments to verify that everything works as expected, before any effort is put into doing any work inside Empathy.

The sample code below requires two webcams to be connected to your system to work. It basically takes the two camera video streams, puts one of them through a encode/rtp/decode process (to roughly emulate what happens in a video call) and puts a text overlay onto the video to let the conference participant know the call is being recorded. The two video streams are then mixed together and displayed. In the actual application the combined stream would be saved to disk instead of course and also audio captured and mixed.

If we ever get around to working on this feature is an open question, but at least we can now assume that it is likely to work. Of course getting one stream in over the network over RTP is very different from what this sample does, so that might uncover some bugs.

The sample also works with Python3, so even though it is only a prototype it already fulfils the GNOME Goal :)

import sys
from gi.repository import Gst
from gi.repository import GObject
GObject.threads_init()
Gst.init(None)

import os

class VideoBox():
   def __init__(self):
       mainloop = GObject.MainLoop()
       # Create transcoding pipeline
       self.pipeline = Gst.Pipeline()


       self.v4lsrc1 = Gst.ElementFactory.make('v4l2src', None)
       self.v4lsrc1.set_property("device", "/dev/video0")
       self.pipeline.add(self.v4lsrc1)

       self.v4lsrc2 = Gst.ElementFactory.make('v4l2src', None)
       self.v4lsrc2.set_property("device", "/dev/video1")
       self.pipeline.add(self.v4lsrc2)

       camera1caps = Gst.Caps.from_string("video/x-raw, width=320,height=240")
       self.camerafilter1 = Gst.ElementFactory.make("capsfilter", "filter1") 
       self.camerafilter1.set_property("caps", camera1caps)
       self.pipeline.add(self.camerafilter1)

       self.videoenc = Gst.ElementFactory.make("theoraenc", None)
       self.pipeline.add(self.videoenc)

       self.videodec = Gst.ElementFactory.make("theoradec", None)
       self.pipeline.add(self.videodec)

       self.videortppay = Gst.ElementFactory.make("rtptheorapay", None)
       self.pipeline.add(self.videortppay)

       self.videortpdepay = Gst.ElementFactory.make("rtptheoradepay", None)
       self.pipeline.add(self.videortpdepay)

       self.textoverlay = Gst.ElementFactory.make("textoverlay", None)
       self.textoverlay.set_property("text","Talk is being recorded")
       self.pipeline.add(self.textoverlay)

       camera2caps = Gst.Caps.from_string("video/x-raw, width=320,height=240")
       self.camerafilter2 = Gst.ElementFactory.make("capsfilter", "filter2") 
       self.camerafilter2.set_property("caps", camera2caps)
       self.pipeline.add(self.camerafilter2)

       self.videomixer = Gst.ElementFactory.make('videomixer', None)
       self.pipeline.add(self.videomixer)

       self.videobox1 = Gst.ElementFactory.make('videobox', None)
       self.videobox1.set_property("border-alpha",0)
       self.videobox1.set_property("top",0)
       self.videobox1.set_property("left",-320)
       self.pipeline.add(self.videobox1)

       self.videoformatconverter1 = Gst.ElementFactory.make('videoconvert', None)
       self.pipeline.add(self.videoformatconverter1)

       self.videoformatconverter2 = Gst.ElementFactory.make('videoconvert', None)
       self.pipeline.add(self.videoformatconverter2)

       self.videoformatconverter3 = Gst.ElementFactory.make('videoconvert', None)
       self.pipeline.add(self.videoformatconverter3)

       self.videoformatconverter4 = Gst.ElementFactory.make('videoconvert', None)
       self.pipeline.add(self.videoformatconverter4)

       self.xvimagesink = Gst.ElementFactory.make('xvimagesink',None)
       self.pipeline.add(self.xvimagesink)

       self.v4lsrc1.link(self.camerafilter1)
       self.camerafilter1.link(self.videoformatconverter1)
       self.videoformatconverter1.link(self.textoverlay)
       self.textoverlay.link(self.videobox1)
       self.videobox1.link(self.videomixer)

       self.v4lsrc2.link(self.camerafilter2)
       self.camerafilter2.link(self.videoformatconverter2)
       self.videoformatconverter2.link(self.videoenc)
       self.videoenc.link(self.videortppay)
       self.videortppay.link(self.videortpdepay)
       self.videortpdepay.link(self.videodec)
       self.videodec.link(self.videoformatconverter3)
       self.videoformatconverter3.link(self.videomixer)

       self.videomixer.link(self.videoformatconverter4)
       self.videoformatconverter4.link(self.xvimagesink)
       self.pipeline.set_state(Gst.State.PLAYING)
       mainloop.run()
   
if __name__ == "__main__":
    app = VideoBox()
    signal.signal(signal.SIGINT, signal.SIG_DFL)
    exit_status = app.run(sys.argv)
    sys.exit(exit_status)

The long journey towards good free video conferencing

One project we been working on here at Red Hat Brno is to make sure we have a nicely working voice and video calling with Empathy in Fedora 18. The project is being spearheaded by Debarshi Ray with me trying to help out with the testing. We are still not there, but we are making good progress thanks to the help of people like Brian Pepple, Sjoerd Simons, Olivier Crete and Guillaume Desmottes and more.

But having been involved with open source multimedia for so long I thought it could be interesting for people to know why free video calling have taken so long to get right and why we still have a little bit to go. So I decided to do this write up of some of the challenges involved. Be aware though that this article is mostly discuss the general historical challenges of getting free VoIP up and running, but I will try to tie that into the specific issues we are trying to resolve currently where relevant.

Protocols

The first challenge that had to be overcome was the challenge of protocols. VoIP and video calling has been around for a while (which an application like Ekiga is proof of), but it has been hampered by a jungle of complex standards, closed protocols, lack of interoperability and so on. Some of the older standards also require non-free codecs to operate. The open standard that has started to turn this around is XMPP which is the protocol that came out of the Jabber project. Originally it was just an open text chat network, but thanks to ongoing work it now features voice and video conferencing too. It also got a boost as Google choose it as the foundation for their GTalk offering ensuring that anyone with a gmail address suddenly was available to chat or call. That said like any developing protocol it has its challenges, and some slight differences in behaviour between a Google jabber server and most others is causing us some pain with video calls currently, which is one of the issues we are trying to figure out how to resolve.

Codecs and interoperability

The other thing that has hounded us is the combination of non-free codecs and the need for interoperability. For a video calling system to be interesting to use you would need to be able to use it to contact at least a substantial subset of your friends and family. For the longest time this either meant using a non-free codec, because if you relied solely on free codecs no widely used client out there would be able to connect with you. But thanks to the effort of first Xiph.org to create the Speex audio codec and now most recently the Opus audio codec, and later the adoption of Speex by Google has at least mostly resolved things on the audio side of things. On the video side things are still not 100% there. We have the Theora video codec from Xiph.org, but unfortunately when the RTP specification for that codec was written, the primary usecase in mind was RTSP streaming and not video conferencing, making the Theora RTP a bit hairy to use for video conferencing. The other bigger issue with Theora is that outside the Linux world nobody adopted Theora for video calling, so once again you are not likely able to use it to call a very large subset of your friends and family unless they are all on Linux systems.
There might be a solution on the way though in the form of new kid on the block, VP8. VP8 is a video codec that Google released as part of their WebM HTML5 video effort. The RTP specification for VP8 is still under development, so adoption is limited, but the hope and expectation is that Google will support VP8 in their GTalk client once the RTP specification is stable and thus we should have a good set of free codecs for both Audio and Video available and in the hands of a large user base.

Frameworks

Video calling is a quite complex technical issue, with a lot of components needing to work together from audio and video acquisition on your local machine, integrating with your address book, negotiating the call between the parties involved, putting everything into RTP packets on one side and unpacking and displaying them on the other side, taking into account the network, firewalls and and audio and video sync. So in order for a call to work you will need (among others) ALSA, PulseAudio, V4L2, GStreamer, Evolution Data Server, Farstream, libnice, the XMPP server, Telepathy and Empathy to work together across two different systems. And if you want to interoperate with a 3rd party system like GTalk the list of components that all need to work perfectly with each other grows further.

A lot of this software has been written in parallel with each other, written in parallel with evolving codecs and standards, and it tries to interoperate with as many 3rd party systems as possible. This has come at the cost of stability, which of course has turned people of from using and testing the video call functionality of Empathy. But we believe that we have reached a turning point now where the pieces are in place, which is why we are now trying to help stabilize and improve the experience to make doing VoIP and video conferencing calls work nicely out of the box on Fedora 18.

Missing pieces

In addition to the nitty gritty of protocols and codecs there are other pieces that has been lacking to give users a really good experience. The most critical one is good echo cancellation. This is required in order to avoid having an ugly echo effect when trying to use your laptop built-in speakers and microphone for a call. So people have been forced to use a headset to make things work reasonably well. This was a quite hard issue to solve as there was neither any great open source code available which implemented echo cancellation or a good way to hook it into the system. To start addressing this issue while I was working for Collabora Multimedia we reached out to the Dutch non-profit NLnet Foundation who sponsored us to have Wim Taymans work on creating an echo cancellation framework for PulseAudio. The goal was to create the framework within PulseAudio to support pluggable echo cancellation modules, turn two existing open source echo cancellation solutions into plugins for this framework as examples and proof of concept, and hope that the availability of such a framework would encourage other groups or individuals to release better echo cancellation modules going forward.
When we started this work the best existing open source echo cancellation system was Speex DSP. Unfortunately SpeexDSP had a lot of limitations, for instance it could not work well with two soundcards, which meant using your laptop speakers for output and a USB microphone for input would not work. Although we can claim no direct connection as things would have it Google ended up releasing a quite good echo cancellation algorithm as part of their WebRTC effort. This was quickly turned into a library and plugin for PulseAudio by Arun Raghavan. And this combined PulseAudio and WebRTC echo cancellation system is what we will have packaged and available in Fedora 18.

Summary

So I outlined a few of the challenges around having a good quality VoIP and video conferencing solution shipping out of the box on a Linux Distribution. And some of the items like the Video Codec situation and general stack stability is not 100% there yet. There also is quite a few bugs in Empathy in terms of behaviour, but Debarshi are already debugging those and with the help of the Telepathy and Empathy teams we should hopefully get those issues patched and merged before Fedora 18 is shipping. Our goal is to get Empathy up to a level where people want to be using it to make VoiP and Video calls, as that is also the best way to ensure things stay working going forward.

In addition to Debarshi, another key person helping us with this effort in the Fedora community is Brian Pepple, who are making sure we are getting releases and updates of GStreamer, Telpathy, Farstream, libnice and so on packaged for Fedora 18 almost on the day. This is making testing and verifying bugfixes a lot easier for us.

Future plans

There are also some nice to have items we want to look at going forward after having stabilized the current functionality. For instance Red Hat and Xiph.org codec guru Monty Montgomery suggested we add a video noise reduction video to the GStreamer pipeline inside Empathy in order to improve quality and performance when using a low quality built in web camera. [Edit: Sjoerd just tolm me the Gst 0.10 version of the code had such a plugin available, so this might not be to hard to resolve.]
Debarshi is also interested in seeing if we can help move the multiparty chat feature forward. But we are not expecting to be able to work on these issues before Fedora 18 is released.

Back from GStreamer Conference 2012

Came back last evening from the GStreamer Conference and I am now back in Cambridge for the weekend. The GStreamer Conference was a lot of fun this year and it was great seeing everyone again. I think the mixture of talks we had this year was really good and I think everyone attending enjoyed themselves. For those who missed the conference this year then Phoronix and Lwn.net posted articles from the Conference. The talks where also recorded and will soon be available at the Ubicast GStreamer Conference website. We did try to get livestreaming going this year, but due to technical problems it didn’t work out, but maybe next year.

A big thank you again to our Gold Sponsor Collabora and our Silver Sponsors Entropy Wave, Fluendo, Igalia and Google. Thanks also goes to LWN.net, Phoronix and Ubicast for making sure the talks and sessions at the GStreamer Conference can reach a wide an audience as possible. And last but not least a big thanks to all our conference speakers who took the time and effort to prepare presentations for this years GStreamer Conference.

For me personally the GStreamer Conference this year also marks the end of my life in Cambridge, UK. Starting from next week I will have completed my period of comuting to Brno, and will instead be living in Brno, Czech Republic on a permanent basis. Which reminds me, we are looking to hire more members to our Brno desktop engineering team, so I will be posting a blog soon outlining what kind of experience we are looking for.

Interview with Arun Raghavan about PulseAudio

With all the talk generated by Arun Raghavans blog post comparing PulseAudio and Audioflinger I thought it would be good to follow up with an interview with Arun about the latest developments in PulseAudio and the way forward for the project. You can find the PulseAudio interview here. I also made a new page listing all the Collabora developer interviews done so far. Enjoy :)

Farstream and libnice, an interview with Youness Alaoui

After the success of the GStreamer interview with Wim Taymans I thought we follow up with another great interview with a Collabora developer.

This time we are talking with Youness Alaoui who is one of the maintainers of Farstream, the audio and video conferencing framework built on top of GStreamer. We also cover another of Youness Alaoui projects, libnice, the NAT traversal library. So if you want to know what is happening with audio and video conferencing on Linux be sure to read the full interview with Youness Alaoui here.

Dynamic Adaptive Streaming over HTTP standard, ST Microelectronics, Fraunhofer and GStreamer

Edward Hervey pointed out to me this morning that there are some nice articles online about an effort between ST Microelectronics and Frauenhofer, around the 3GP DASH (Dynamic Adaptive Streaming over HTTP) standard. There is for instance this article in Thinq magazine and this article on TMCnet. What you might not know and which is even cooler is that Emanuele Quacchio will be speaking this GStreamer based DASH implementation at the the GStreamer Conference later this Month. So if you haven’t signed up for the GStreamer Conference already, then maybe this is the time to do it :)

I am really excited about this years GStreamer Conference as we have a lot of ongoing efforts about to come to fruits. From Collabora we got Wim Taymans will be talking about GStreamer 1.0 effort, which we expect to have out before years end and Tim-Philipp Müller will speak about a lot of the other incredible advances we made over the last year. Being in the middle of it I think its easy to go a bit blind due to the gradual process, but things like the new parsing libraries that Thibault Saunier have been working on, which will enable much quicker and better support for things like libva and vdpau plugins in GStreamer. Or the new baseclasses that Mark Nauwelaerts have ported most of our plugins over to now, which in one fell swoop improved our plugin quality by leaps and bounds. And of course there are things like the GStreamer Editing services (GES), discoverer and encodebin which Edward Hervey created, which will make applications like Transmageddon video transcoder and remuxer and the PiTiVi video editor a lot easier to develop.

We will also be doing some real cool demonstrations of stuff we have been working on at Collabora at the Linux Con Showcase on Thursday. Thanks to GES we have a great demo of a mobile editing solution using either QML or HTML5. We have HTML5 video calling using Telepathy and we have Video calling using Telepathy from the Media explorer media center solution.

Another talk that I will be sure not to miss is Jan Schmidt who will be talking about Blu-Ray playback with GStreamer. In addition to being technically interesting Jans talks are always fun, like last year he did his presentation using GStreamer instead of something like LibreOffice, having created his slides as a DVD menu through a small program he wrote to turn SVG files into DVD menus :)

Media Explorer

Discovered a new piece of software using GStreamer the other day called Media Explorer. It is a nice media center type solution for the desktop and Meego devices. The system has been developed by mostly Intel engineers, but they have now made it freely available. I tried it on my Fedora system last evening and it seems to have a bit of a MeeGo bias currently, as it complained about Connman being installed and also didn’t look for media in the desktop Media directories, but I am sure those are smaller issues that can quickly be sorted out. Thomas Wood did this nice blog entry about it a couple of weeks ago, with some more details.

Anyway if you are looking for a linux based media center UI this might be what you are looking for, personally I will try to see if I can get it going on my little Panda board at home.

Update: Also noticed there is a nice article about Media Explorer on linux.com.

GStreamer Conference 2011 update

Just sent out this little email with some updates on the GStreamer 2011 Conference. Planning is progressing and a sponsorship leaflet is now available for those interested in sponsoring the conference. The call for paper deadline is also slowly but surely approaching, so anyone who wants to do a talk please send in an abstract before the 1st of July this year.

For everyone else, just register for the conference and set aside October for GStreamer and Prague :) As always details can be found on the GStreamer Conference 2011 website.

Software and Copyright

Had an interesting conversation today with Wim about copyright and software. We are currently working on integrating some RTSP patches into GStreamer and it turns out these patches contain some code originating in Xine which means we need to get a relicense on the code to the LGPL or replace the GPL parts.

Part of the problem is that people in the free software community tend to be rather sloppy with copyright statements when copying code around. So when people take code from a GPL project to put into another GPL project they often don’t copy all the needed copyright headers over as well, and also a lot of people when making significant changes don’t add their names to the copyright header of the file.

On top of that there are issues like the hard to grasp definition of what exactly constitutes a copyrightable contribution and the general problems with programming languages not being very expressive by nature, meaning that there isn’t that many (non stupid) ways of doing something.

Neither of us being a lawyer probably doesn’t help either :)

Anyway all of this makes it a quite extensive task to a) identify all (potential) copyright holders on a specific piece of code. b) track all these people down. c) get their permission to relicense d) if anyone says no, identify exactly what is their contribution and figure how that can be removed. Figuring out what to do when things are not easy to rewrite in any intelligent way, as a rewrite would basically look like the original and so on.

In terms of rewrites being identical to the original copyright law addresses this with rules saying that only the expressive elements of code is copyrightable, but its not easy for a layman to have a clear feeling on what that translates into in terms of a specific piece of code. In many cases you can never really know, even with a good lawyer to help you, before a judge gives final verdict on it. And having a judge give final verdict is what you probably want to avoid in the first place as that process costs a lot of time and money.

What is certain is that it do seem clear to me that copyright law was not made with gigantic collaborative efforts, which open source software is an example of, in mind. (ok, I am not expecting any awards for spotting the obvious :)