GStreamer Conference 2013 – Haggis edition

So we are quickly approaching the time for GStreamer Conference 2013. This year we will be in Edinburgh, Scotland and the conference will be hosted at the Edinburgh International Conference Centre alongside the Embedded Linux Conference Europe and the Automotive Linux Summit.

The GStreamer Conference 2013 Schedule is now live and as you can see there are a lot of great talks this year too, ranging from OpenGL integration, embedded hardware, new codecs and more. As always the GStreamer Conference is the best place to be to discuss the challenges of multimedia.

The conference will be held on the 22nd and 23rd of October so I strongly recommend you get yourself registered. If you want to attend ELCE or the Autotmotive summit make sure to register for those too and extend your stay in Edinburgh to also cover the 24th and 25th of October.

Looking forward to seeing you all there.

And last but not least, a big thank you to this years sponsors of the GStreamer Conference, especially Platinum sponsor Collabora. Also a big thank you to sponsors Fluendo, Google and for the first time this year Cisco.

And as always a big thank you to Ubicast who as always will be recording and publishing the conference for us. Be sure to check out their recordings from earlier years to find out what the GStreamer Conference is about.

GStreamer, Python and videomixing

One feature that would be of interest to us in the Empathy Video Conference client is the ability to record conversations. Due to that I have been putting together a simple prototype Python test application in free moments to verify that everything works as expected, before any effort is put into doing any work inside Empathy.

The sample code below requires two webcams to be connected to your system to work. It basically takes the two camera video streams, puts one of them through a encode/rtp/decode process (to roughly emulate what happens in a video call) and puts a text overlay onto the video to let the conference participant know the call is being recorded. The two video streams are then mixed together and displayed. In the actual application the combined stream would be saved to disk instead of course and also audio captured and mixed.

If we ever get around to working on this feature is an open question, but at least we can now assume that it is likely to work. Of course getting one stream in over the network over RTP is very different from what this sample does, so that might uncover some bugs.

The sample also works with Python3, so even though it is only a prototype it already fulfils the GNOME Goal :)

import sys
from gi.repository import Gst
from gi.repository import GObject
GObject.threads_init()
Gst.init(None)

import os

class VideoBox():
   def __init__(self):
       mainloop = GObject.MainLoop()
       # Create transcoding pipeline
       self.pipeline = Gst.Pipeline()


       self.v4lsrc1 = Gst.ElementFactory.make('v4l2src', None)
       self.v4lsrc1.set_property("device", "/dev/video0")
       self.pipeline.add(self.v4lsrc1)

       self.v4lsrc2 = Gst.ElementFactory.make('v4l2src', None)
       self.v4lsrc2.set_property("device", "/dev/video1")
       self.pipeline.add(self.v4lsrc2)

       camera1caps = Gst.Caps.from_string("video/x-raw, width=320,height=240")
       self.camerafilter1 = Gst.ElementFactory.make("capsfilter", "filter1") 
       self.camerafilter1.set_property("caps", camera1caps)
       self.pipeline.add(self.camerafilter1)

       self.videoenc = Gst.ElementFactory.make("theoraenc", None)
       self.pipeline.add(self.videoenc)

       self.videodec = Gst.ElementFactory.make("theoradec", None)
       self.pipeline.add(self.videodec)

       self.videortppay = Gst.ElementFactory.make("rtptheorapay", None)
       self.pipeline.add(self.videortppay)

       self.videortpdepay = Gst.ElementFactory.make("rtptheoradepay", None)
       self.pipeline.add(self.videortpdepay)

       self.textoverlay = Gst.ElementFactory.make("textoverlay", None)
       self.textoverlay.set_property("text","Talk is being recorded")
       self.pipeline.add(self.textoverlay)

       camera2caps = Gst.Caps.from_string("video/x-raw, width=320,height=240")
       self.camerafilter2 = Gst.ElementFactory.make("capsfilter", "filter2") 
       self.camerafilter2.set_property("caps", camera2caps)
       self.pipeline.add(self.camerafilter2)

       self.videomixer = Gst.ElementFactory.make('videomixer', None)
       self.pipeline.add(self.videomixer)

       self.videobox1 = Gst.ElementFactory.make('videobox', None)
       self.videobox1.set_property("border-alpha",0)
       self.videobox1.set_property("top",0)
       self.videobox1.set_property("left",-320)
       self.pipeline.add(self.videobox1)

       self.videoformatconverter1 = Gst.ElementFactory.make('videoconvert', None)
       self.pipeline.add(self.videoformatconverter1)

       self.videoformatconverter2 = Gst.ElementFactory.make('videoconvert', None)
       self.pipeline.add(self.videoformatconverter2)

       self.videoformatconverter3 = Gst.ElementFactory.make('videoconvert', None)
       self.pipeline.add(self.videoformatconverter3)

       self.videoformatconverter4 = Gst.ElementFactory.make('videoconvert', None)
       self.pipeline.add(self.videoformatconverter4)

       self.xvimagesink = Gst.ElementFactory.make('xvimagesink',None)
       self.pipeline.add(self.xvimagesink)

       self.v4lsrc1.link(self.camerafilter1)
       self.camerafilter1.link(self.videoformatconverter1)
       self.videoformatconverter1.link(self.textoverlay)
       self.textoverlay.link(self.videobox1)
       self.videobox1.link(self.videomixer)

       self.v4lsrc2.link(self.camerafilter2)
       self.camerafilter2.link(self.videoformatconverter2)
       self.videoformatconverter2.link(self.videoenc)
       self.videoenc.link(self.videortppay)
       self.videortppay.link(self.videortpdepay)
       self.videortpdepay.link(self.videodec)
       self.videodec.link(self.videoformatconverter3)
       self.videoformatconverter3.link(self.videomixer)

       self.videomixer.link(self.videoformatconverter4)
       self.videoformatconverter4.link(self.xvimagesink)
       self.pipeline.set_state(Gst.State.PLAYING)
       mainloop.run()
   
if __name__ == "__main__":
    app = VideoBox()
    signal.signal(signal.SIGINT, signal.SIG_DFL)
    exit_status = app.run(sys.argv)
    sys.exit(exit_status)

The long journey towards good free video conferencing

One project we been working on here at Red Hat Brno is to make sure we have a nicely working voice and video calling with Empathy in Fedora 18. The project is being spearheaded by Debarshi Ray with me trying to help out with the testing. We are still not there, but we are making good progress thanks to the help of people like Brian Pepple, Sjoerd Simons, Olivier Crete and Guillaume Desmottes and more.

But having been involved with open source multimedia for so long I thought it could be interesting for people to know why free video calling have taken so long to get right and why we still have a little bit to go. So I decided to do this write up of some of the challenges involved. Be aware though that this article is mostly discuss the general historical challenges of getting free VoIP up and running, but I will try to tie that into the specific issues we are trying to resolve currently where relevant.

Protocols

The first challenge that had to be overcome was the challenge of protocols. VoIP and video calling has been around for a while (which an application like Ekiga is proof of), but it has been hampered by a jungle of complex standards, closed protocols, lack of interoperability and so on. Some of the older standards also require non-free codecs to operate. The open standard that has started to turn this around is XMPP which is the protocol that came out of the Jabber project. Originally it was just an open text chat network, but thanks to ongoing work it now features voice and video conferencing too. It also got a boost as Google choose it as the foundation for their GTalk offering ensuring that anyone with a gmail address suddenly was available to chat or call. That said like any developing protocol it has its challenges, and some slight differences in behaviour between a Google jabber server and most others is causing us some pain with video calls currently, which is one of the issues we are trying to figure out how to resolve.

Codecs and interoperability

The other thing that has hounded us is the combination of non-free codecs and the need for interoperability. For a video calling system to be interesting to use you would need to be able to use it to contact at least a substantial subset of your friends and family. For the longest time this either meant using a non-free codec, because if you relied solely on free codecs no widely used client out there would be able to connect with you. But thanks to the effort of first Xiph.org to create the Speex audio codec and now most recently the Opus audio codec, and later the adoption of Speex by Google has at least mostly resolved things on the audio side of things. On the video side things are still not 100% there. We have the Theora video codec from Xiph.org, but unfortunately when the RTP specification for that codec was written, the primary usecase in mind was RTSP streaming and not video conferencing, making the Theora RTP a bit hairy to use for video conferencing. The other bigger issue with Theora is that outside the Linux world nobody adopted Theora for video calling, so once again you are not likely able to use it to call a very large subset of your friends and family unless they are all on Linux systems.
There might be a solution on the way though in the form of new kid on the block, VP8. VP8 is a video codec that Google released as part of their WebM HTML5 video effort. The RTP specification for VP8 is still under development, so adoption is limited, but the hope and expectation is that Google will support VP8 in their GTalk client once the RTP specification is stable and thus we should have a good set of free codecs for both Audio and Video available and in the hands of a large user base.

Frameworks

Video calling is a quite complex technical issue, with a lot of components needing to work together from audio and video acquisition on your local machine, integrating with your address book, negotiating the call between the parties involved, putting everything into RTP packets on one side and unpacking and displaying them on the other side, taking into account the network, firewalls and and audio and video sync. So in order for a call to work you will need (among others) ALSA, PulseAudio, V4L2, GStreamer, Evolution Data Server, Farstream, libnice, the XMPP server, Telepathy and Empathy to work together across two different systems. And if you want to interoperate with a 3rd party system like GTalk the list of components that all need to work perfectly with each other grows further.

A lot of this software has been written in parallel with each other, written in parallel with evolving codecs and standards, and it tries to interoperate with as many 3rd party systems as possible. This has come at the cost of stability, which of course has turned people of from using and testing the video call functionality of Empathy. But we believe that we have reached a turning point now where the pieces are in place, which is why we are now trying to help stabilize and improve the experience to make doing VoIP and video conferencing calls work nicely out of the box on Fedora 18.

Missing pieces

In addition to the nitty gritty of protocols and codecs there are other pieces that has been lacking to give users a really good experience. The most critical one is good echo cancellation. This is required in order to avoid having an ugly echo effect when trying to use your laptop built-in speakers and microphone for a call. So people have been forced to use a headset to make things work reasonably well. This was a quite hard issue to solve as there was neither any great open source code available which implemented echo cancellation or a good way to hook it into the system. To start addressing this issue while I was working for Collabora Multimedia we reached out to the Dutch non-profit NLnet Foundation who sponsored us to have Wim Taymans work on creating an echo cancellation framework for PulseAudio. The goal was to create the framework within PulseAudio to support pluggable echo cancellation modules, turn two existing open source echo cancellation solutions into plugins for this framework as examples and proof of concept, and hope that the availability of such a framework would encourage other groups or individuals to release better echo cancellation modules going forward.
When we started this work the best existing open source echo cancellation system was Speex DSP. Unfortunately SpeexDSP had a lot of limitations, for instance it could not work well with two soundcards, which meant using your laptop speakers for output and a USB microphone for input would not work. Although we can claim no direct connection as things would have it Google ended up releasing a quite good echo cancellation algorithm as part of their WebRTC effort. This was quickly turned into a library and plugin for PulseAudio by Arun Raghavan. And this combined PulseAudio and WebRTC echo cancellation system is what we will have packaged and available in Fedora 18.

Summary

So I outlined a few of the challenges around having a good quality VoIP and video conferencing solution shipping out of the box on a Linux Distribution. And some of the items like the Video Codec situation and general stack stability is not 100% there yet. There also is quite a few bugs in Empathy in terms of behaviour, but Debarshi are already debugging those and with the help of the Telepathy and Empathy teams we should hopefully get those issues patched and merged before Fedora 18 is shipping. Our goal is to get Empathy up to a level where people want to be using it to make VoiP and Video calls, as that is also the best way to ensure things stay working going forward.

In addition to Debarshi, another key person helping us with this effort in the Fedora community is Brian Pepple, who are making sure we are getting releases and updates of GStreamer, Telpathy, Farstream, libnice and so on packaged for Fedora 18 almost on the day. This is making testing and verifying bugfixes a lot easier for us.

Future plans

There are also some nice to have items we want to look at going forward after having stabilized the current functionality. For instance Red Hat and Xiph.org codec guru Monty Montgomery suggested we add a video noise reduction video to the GStreamer pipeline inside Empathy in order to improve quality and performance when using a low quality built in web camera. [Edit: Sjoerd just tolm me the Gst 0.10 version of the code had such a plugin available, so this might not be to hard to resolve.]
Debarshi is also interested in seeing if we can help move the multiparty chat feature forward. But we are not expecting to be able to work on these issues before Fedora 18 is released.

Back from GStreamer Conference in Prague

I am back home for about a day today after spending a week in Prague for the GStreamer Conference and LinuxCon Europe.

Had an absolute blast and I am really happy the GStreamer Conference again turned out to be a big success. A big thanks to the platinum sponsor Collabora, and the two silver sponsors Fluendo and Google who made it all possible. Also a big thanks to Ubicast who was there onsite recording all talks. They aim to have all the talks online within a Month.

While I had to run a bit back and forth to make sure things was running smoothly, I did get to some very interesting talks, like Monty Montgomery from Xiph.org talking about the new Opus audio codec they are working on with the IETF and the strategies they are working on to fend of bogus patent claims.

On a related note I saw that Apple released their lossless audio codec ALAC as free software under the Apache license. Always nice to see such things even if ALAC for the most part has failed to make any headway against the already free FLAC codec. If Apple now would join the effort around WebM things would really start looking great in the codec space.

We did a Collabora booth during the LinuxCon and Embedded Linux days that followed the GStreamer Conference. Our demos showcasing a HTML5 video editing UI using GStreamer and the GStreamer Editing Services and video conferencing using Telepathy through HTML5 was a great success and our big screen TV running the Media Explorer media center combined with Telepathy based video conferencing provided us with a steady stream of people to our booth. For those who missed the conference all the tech demos can be grabbed from this Prague-demo Ubuntu PPA.

So as you might imagine I was quite tired by the time Friday was almost done, but thanks to Tim Bird and Sony I got a really nice end to the week as I won a Sony Tablet S through the Elinux Wiki editing competition. The tablet is really nice and it was the first tablet I ever wanted, so winning one was really great. The feature set is really nice with built in DLNA support, it can function as a TV remote and it has support for certain Playstation 1 titles. The ‘folded magazine’ shape makes it really nice to hold and I am going to try to use it as an e-book reader as I fly off to Lahore tomorrow morning for my sister-in-laws wedding.

Dynamic Adaptive Streaming over HTTP standard, ST Microelectronics, Fraunhofer and GStreamer

Edward Hervey pointed out to me this morning that there are some nice articles online about an effort between ST Microelectronics and Frauenhofer, around the 3GP DASH (Dynamic Adaptive Streaming over HTTP) standard. There is for instance this article in Thinq magazine and this article on TMCnet. What you might not know and which is even cooler is that Emanuele Quacchio will be speaking this GStreamer based DASH implementation at the the GStreamer Conference later this Month. So if you haven’t signed up for the GStreamer Conference already, then maybe this is the time to do it :)

I am really excited about this years GStreamer Conference as we have a lot of ongoing efforts about to come to fruits. From Collabora we got Wim Taymans will be talking about GStreamer 1.0 effort, which we expect to have out before years end and Tim-Philipp Müller will speak about a lot of the other incredible advances we made over the last year. Being in the middle of it I think its easy to go a bit blind due to the gradual process, but things like the new parsing libraries that Thibault Saunier have been working on, which will enable much quicker and better support for things like libva and vdpau plugins in GStreamer. Or the new baseclasses that Mark Nauwelaerts have ported most of our plugins over to now, which in one fell swoop improved our plugin quality by leaps and bounds. And of course there are things like the GStreamer Editing services (GES), discoverer and encodebin which Edward Hervey created, which will make applications like Transmageddon video transcoder and remuxer and the PiTiVi video editor a lot easier to develop.

We will also be doing some real cool demonstrations of stuff we have been working on at Collabora at the Linux Con Showcase on Thursday. Thanks to GES we have a great demo of a mobile editing solution using either QML or HTML5. We have HTML5 video calling using Telepathy and we have Video calling using Telepathy from the Media explorer media center solution.

Another talk that I will be sure not to miss is Jan Schmidt who will be talking about Blu-Ray playback with GStreamer. In addition to being technically interesting Jans talks are always fun, like last year he did his presentation using GStreamer instead of something like LibreOffice, having created his slides as a DVD menu through a small program he wrote to turn SVG files into DVD menus :)

Instant messaging hackfest at Collabora office

This week the Collabora office has been filled with a great group of people trying to make sure the instant messaging in GNOME Shell among other things works nicely. For those of us who use GNOME shell, like with latest Fedora, the integration into the shell is quite nice, but it also has some irritating behavioural issues. To resolve these issues some of our top Collabora coders working on Empathy and Telepathy has joined forces with coders from Red Hat, Intel and the community who work on messaging and/or the GNOME shell, to iron out the remaining issues and define any new APIs that are needed. The full agenda and attendee list can be found on the IM hackfest page.

To give you all an idea of the event I took this photo of the group sitting in our meeting room today:

Hackfest at Collabora office

For day to day reporting I suggest following the blog of Bastien Nocera who has been making daily posts from the hackfest. You would also want to read the update from the hackfest from Collaboras own Danielle Madeley.