Further progress on multistream Transmageddon

As mentioned in my previous blog entry I am working on multistream handling in Transmageddon. Not been a lot of changes, but I have been able to put in a little time here and there. The changes needed to accommodate this have also cleaned up the codebase quite a bit in my opinion, moving from a forest of variables to a list of python dictionaries. This change makes keeping track of whats happening in the codepath a lot easier as I can now just print the dictionary from the list to see what all relevant values are at a given point. Anyway a little screenshot below to show where I am at:

In-progress support for multiple audio streams.

Still quite a bit of work to do to clean up the codebase and decide how certain things are to be handled (or not handled), but it is getting there. Screenshot above actually demonstrates one thing I haven’t decided on yet, which is how to deal with combining a device preset with a multistream file.

The biggest blocker currently for finishing this work is that the GStreamer encodebin element does not have an API yet for dealing with selecting encoding settings for multiple streams as detailed in this bug report. If anyone got the inclination to cook up a patch for encodebin which adds support for this that would be much appreciated.

Anyway, once I have this completed I think my next step will be to try to add some kind of DVD ripping support to Transmageddon and some basic metadata checking/editing and move the video flipping support into a special menu and add support for enabling/disabling deinterlacing in that same special menu. I trying to figure out as I go along how I can keep the user interface simple and straightforward and add requested features. The question that I continuously ask myself is what features do belong in Transmageddon and what features are of a level where people should go to something like PiTiVi instead.

Improved handling of files with multiple tracks in GStreamer

Thanks to Sebastian Dröge there is a new thing in GStreamer called streamid. It basically gives all streams inside a given file a unique id, making files with multiple streams a lot easier to deal with. This streamid is also supported by the GStreamer discoverer object. So once you identified the contents of a file with discoverer you can be sure to grab the exact stream you want coming out of (uri)decodebin by checking the pad for the streamid. The most common usecase for this is of course files with multiple audio streams in different languages.

From the output of Discoverer the stream id is really easy to get:
On the stream object you get out of Discoverer you just run a:

stream.get_stream_id()

On the pad you get from decodebin or uridecodebin the patch is a bit more convoluted, but not
to hard once you know how (there might be some kind of convenience API added for this at some point).

Before you connect the pad you get from the bin you attach a pad to it like this:

src_pad.add_probe(Gst.PadProbeType.EVENT_DOWNSTREAM, self.padprobe, None)


Then you in the function you define you can extract the stream_id with the parse_stream_start call as seen below:

def padprobe(self, pad, probeinfo, userdata):
       event = probeinfo.get_event()
       eventtype=event.type
       if eventtype==Gst.EventType.STREAM_START:
           streamid = event.parse_stream_start() 
       return Gst.PadProbeReturn.OK

I been using this code in my local copy of Transmageddon to start implementing support for files with multiple audio streams (also supporting multiple video streams would be easy, but I am not sure how useful it would be). Got a screenshot of my current development snapshot below, but I am still trying to figure out what would be a nice way to present it. The current setup will look quite crap if the incoming file got more than a few audio streams. Suggestions welcome :)

Transmageddon multistream  devshot

Transmageddon multistream development snapshot

GStreamer, Python and videomixing

One feature that would be of interest to us in the Empathy Video Conference client is the ability to record conversations. Due to that I have been putting together a simple prototype Python test application in free moments to verify that everything works as expected, before any effort is put into doing any work inside Empathy.

The sample code below requires two webcams to be connected to your system to work. It basically takes the two camera video streams, puts one of them through a encode/rtp/decode process (to roughly emulate what happens in a video call) and puts a text overlay onto the video to let the conference participant know the call is being recorded. The two video streams are then mixed together and displayed. In the actual application the combined stream would be saved to disk instead of course and also audio captured and mixed.

If we ever get around to working on this feature is an open question, but at least we can now assume that it is likely to work. Of course getting one stream in over the network over RTP is very different from what this sample does, so that might uncover some bugs.

The sample also works with Python3, so even though it is only a prototype it already fulfils the GNOME Goal :)

import sys
from gi.repository import Gst
from gi.repository import GObject
GObject.threads_init()
Gst.init(None)

import os

class VideoBox():
   def __init__(self):
       mainloop = GObject.MainLoop()
       # Create transcoding pipeline
       self.pipeline = Gst.Pipeline()


       self.v4lsrc1 = Gst.ElementFactory.make('v4l2src', None)
       self.v4lsrc1.set_property("device", "/dev/video0")
       self.pipeline.add(self.v4lsrc1)

       self.v4lsrc2 = Gst.ElementFactory.make('v4l2src', None)
       self.v4lsrc2.set_property("device", "/dev/video1")
       self.pipeline.add(self.v4lsrc2)

       camera1caps = Gst.Caps.from_string("video/x-raw, width=320,height=240")
       self.camerafilter1 = Gst.ElementFactory.make("capsfilter", "filter1") 
       self.camerafilter1.set_property("caps", camera1caps)
       self.pipeline.add(self.camerafilter1)

       self.videoenc = Gst.ElementFactory.make("theoraenc", None)
       self.pipeline.add(self.videoenc)

       self.videodec = Gst.ElementFactory.make("theoradec", None)
       self.pipeline.add(self.videodec)

       self.videortppay = Gst.ElementFactory.make("rtptheorapay", None)
       self.pipeline.add(self.videortppay)

       self.videortpdepay = Gst.ElementFactory.make("rtptheoradepay", None)
       self.pipeline.add(self.videortpdepay)

       self.textoverlay = Gst.ElementFactory.make("textoverlay", None)
       self.textoverlay.set_property("text","Talk is being recorded")
       self.pipeline.add(self.textoverlay)

       camera2caps = Gst.Caps.from_string("video/x-raw, width=320,height=240")
       self.camerafilter2 = Gst.ElementFactory.make("capsfilter", "filter2") 
       self.camerafilter2.set_property("caps", camera2caps)
       self.pipeline.add(self.camerafilter2)

       self.videomixer = Gst.ElementFactory.make('videomixer', None)
       self.pipeline.add(self.videomixer)

       self.videobox1 = Gst.ElementFactory.make('videobox', None)
       self.videobox1.set_property("border-alpha",0)
       self.videobox1.set_property("top",0)
       self.videobox1.set_property("left",-320)
       self.pipeline.add(self.videobox1)

       self.videoformatconverter1 = Gst.ElementFactory.make('videoconvert', None)
       self.pipeline.add(self.videoformatconverter1)

       self.videoformatconverter2 = Gst.ElementFactory.make('videoconvert', None)
       self.pipeline.add(self.videoformatconverter2)

       self.videoformatconverter3 = Gst.ElementFactory.make('videoconvert', None)
       self.pipeline.add(self.videoformatconverter3)

       self.videoformatconverter4 = Gst.ElementFactory.make('videoconvert', None)
       self.pipeline.add(self.videoformatconverter4)

       self.xvimagesink = Gst.ElementFactory.make('xvimagesink',None)
       self.pipeline.add(self.xvimagesink)

       self.v4lsrc1.link(self.camerafilter1)
       self.camerafilter1.link(self.videoformatconverter1)
       self.videoformatconverter1.link(self.textoverlay)
       self.textoverlay.link(self.videobox1)
       self.videobox1.link(self.videomixer)

       self.v4lsrc2.link(self.camerafilter2)
       self.camerafilter2.link(self.videoformatconverter2)
       self.videoformatconverter2.link(self.videoenc)
       self.videoenc.link(self.videortppay)
       self.videortppay.link(self.videortpdepay)
       self.videortpdepay.link(self.videodec)
       self.videodec.link(self.videoformatconverter3)
       self.videoformatconverter3.link(self.videomixer)

       self.videomixer.link(self.videoformatconverter4)
       self.videoformatconverter4.link(self.xvimagesink)
       self.pipeline.set_state(Gst.State.PLAYING)
       mainloop.run()
   
if __name__ == "__main__":
    app = VideoBox()
    signal.signal(signal.SIGINT, signal.SIG_DFL)
    exit_status = app.run(sys.argv)
    sys.exit(exit_status)

LibreOffice coding at Red Hat

At Red Hat we are involved with a lot of cool open source projects. One of these is the popular LibreOffice productivity Suite, where we are putting in a lot of effort to make sure Red Hat customers and the community in general have a dependable and feature rich Office Suite available.

In addition to of course doing work to add features requested by Red Hat customers, the team focuses on helping build the upstream project and making sure we help push desktop integration forward.

In fact the work done by Caolán McNamara, David Tardon, Stephan Bergmann, Michael Stahl and Eike Rathke is making Red Hat a major contributor to LibreOffice. So to celebrate the success of our team so far we wanted to have some nice t-shirts made for this years
LibreOffice conference in Berlin to give the team. It would have added a nice little touch to a conference where Caolan did a talk about his cool widget layout work (*1), Michael did a talk about the migration of LibreOffice to gbuild, Stephan did a talk about API stability and Eike did a talk about collaborative editing.

Unfortunately the t-shirts came back late from the printer and thus missed the conference, but I will be sending them out to the team today so that they have them ready for the next LibreOffice event :)

RedHat LibreOffice team t-shirt

Anyway a big thank you from me to the team, they have been a pleasure working with since I joined Red Hat and I am looking forward to seeing what we will achieve over the next years.

The long journey towards good free video conferencing

One project we been working on here at Red Hat Brno is to make sure we have a nicely working voice and video calling with Empathy in Fedora 18. The project is being spearheaded by Debarshi Ray with me trying to help out with the testing. We are still not there, but we are making good progress thanks to the help of people like Brian Pepple, Sjoerd Simons, Olivier Crete and Guillaume Desmottes and more.

But having been involved with open source multimedia for so long I thought it could be interesting for people to know why free video calling have taken so long to get right and why we still have a little bit to go. So I decided to do this write up of some of the challenges involved. Be aware though that this article is mostly discuss the general historical challenges of getting free VoIP up and running, but I will try to tie that into the specific issues we are trying to resolve currently where relevant.

Protocols

The first challenge that had to be overcome was the challenge of protocols. VoIP and video calling has been around for a while (which an application like Ekiga is proof of), but it has been hampered by a jungle of complex standards, closed protocols, lack of interoperability and so on. Some of the older standards also require non-free codecs to operate. The open standard that has started to turn this around is XMPP which is the protocol that came out of the Jabber project. Originally it was just an open text chat network, but thanks to ongoing work it now features voice and video conferencing too. It also got a boost as Google choose it as the foundation for their GTalk offering ensuring that anyone with a gmail address suddenly was available to chat or call. That said like any developing protocol it has its challenges, and some slight differences in behaviour between a Google jabber server and most others is causing us some pain with video calls currently, which is one of the issues we are trying to figure out how to resolve.

Codecs and interoperability

The other thing that has hounded us is the combination of non-free codecs and the need for interoperability. For a video calling system to be interesting to use you would need to be able to use it to contact at least a substantial subset of your friends and family. For the longest time this either meant using a non-free codec, because if you relied solely on free codecs no widely used client out there would be able to connect with you. But thanks to the effort of first Xiph.org to create the Speex audio codec and now most recently the Opus audio codec, and later the adoption of Speex by Google has at least mostly resolved things on the audio side of things. On the video side things are still not 100% there. We have the Theora video codec from Xiph.org, but unfortunately when the RTP specification for that codec was written, the primary usecase in mind was RTSP streaming and not video conferencing, making the Theora RTP a bit hairy to use for video conferencing. The other bigger issue with Theora is that outside the Linux world nobody adopted Theora for video calling, so once again you are not likely able to use it to call a very large subset of your friends and family unless they are all on Linux systems.
There might be a solution on the way though in the form of new kid on the block, VP8. VP8 is a video codec that Google released as part of their WebM HTML5 video effort. The RTP specification for VP8 is still under development, so adoption is limited, but the hope and expectation is that Google will support VP8 in their GTalk client once the RTP specification is stable and thus we should have a good set of free codecs for both Audio and Video available and in the hands of a large user base.

Frameworks

Video calling is a quite complex technical issue, with a lot of components needing to work together from audio and video acquisition on your local machine, integrating with your address book, negotiating the call between the parties involved, putting everything into RTP packets on one side and unpacking and displaying them on the other side, taking into account the network, firewalls and and audio and video sync. So in order for a call to work you will need (among others) ALSA, PulseAudio, V4L2, GStreamer, Evolution Data Server, Farstream, libnice, the XMPP server, Telepathy and Empathy to work together across two different systems. And if you want to interoperate with a 3rd party system like GTalk the list of components that all need to work perfectly with each other grows further.

A lot of this software has been written in parallel with each other, written in parallel with evolving codecs and standards, and it tries to interoperate with as many 3rd party systems as possible. This has come at the cost of stability, which of course has turned people of from using and testing the video call functionality of Empathy. But we believe that we have reached a turning point now where the pieces are in place, which is why we are now trying to help stabilize and improve the experience to make doing VoIP and video conferencing calls work nicely out of the box on Fedora 18.

Missing pieces

In addition to the nitty gritty of protocols and codecs there are other pieces that has been lacking to give users a really good experience. The most critical one is good echo cancellation. This is required in order to avoid having an ugly echo effect when trying to use your laptop built-in speakers and microphone for a call. So people have been forced to use a headset to make things work reasonably well. This was a quite hard issue to solve as there was neither any great open source code available which implemented echo cancellation or a good way to hook it into the system. To start addressing this issue while I was working for Collabora Multimedia we reached out to the Dutch non-profit NLnet Foundation who sponsored us to have Wim Taymans work on creating an echo cancellation framework for PulseAudio. The goal was to create the framework within PulseAudio to support pluggable echo cancellation modules, turn two existing open source echo cancellation solutions into plugins for this framework as examples and proof of concept, and hope that the availability of such a framework would encourage other groups or individuals to release better echo cancellation modules going forward.
When we started this work the best existing open source echo cancellation system was Speex DSP. Unfortunately SpeexDSP had a lot of limitations, for instance it could not work well with two soundcards, which meant using your laptop speakers for output and a USB microphone for input would not work. Although we can claim no direct connection as things would have it Google ended up releasing a quite good echo cancellation algorithm as part of their WebRTC effort. This was quickly turned into a library and plugin for PulseAudio by Arun Raghavan. And this combined PulseAudio and WebRTC echo cancellation system is what we will have packaged and available in Fedora 18.

Summary

So I outlined a few of the challenges around having a good quality VoIP and video conferencing solution shipping out of the box on a Linux Distribution. And some of the items like the Video Codec situation and general stack stability is not 100% there yet. There also is quite a few bugs in Empathy in terms of behaviour, but Debarshi are already debugging those and with the help of the Telepathy and Empathy teams we should hopefully get those issues patched and merged before Fedora 18 is shipping. Our goal is to get Empathy up to a level where people want to be using it to make VoiP and Video calls, as that is also the best way to ensure things stay working going forward.

In addition to Debarshi, another key person helping us with this effort in the Fedora community is Brian Pepple, who are making sure we are getting releases and updates of GStreamer, Telpathy, Farstream, libnice and so on packaged for Fedora 18 almost on the day. This is making testing and verifying bugfixes a lot easier for us.

Future plans

There are also some nice to have items we want to look at going forward after having stabilized the current functionality. For instance Red Hat and Xiph.org codec guru Monty Montgomery suggested we add a video noise reduction video to the GStreamer pipeline inside Empathy in order to improve quality and performance when using a low quality built in web camera. [Edit: Sjoerd just tolm me the Gst 0.10 version of the code had such a plugin available, so this might not be to hard to resolve.]
Debarshi is also interested in seeing if we can help move the multiparty chat feature forward. But we are not expecting to be able to work on these issues before Fedora 18 is released.

GStreamer 1.x documentation getting a boost

GStreamer maintainer Wim Taymans decided that having a brand new GStreamer 1.x series was only worth the effort if we also had some nice up to date documentation for GStreamer 1.0. So over the last week he has been going over the GStreamer Application Development manual making sure it is up to date and fixing all the code examples and adding new chapters even. So if you want to get into GStreamer development our introduction manual should now be a good starting point again!

3 Months at Red Hat in Brno

So I got an email yesterday telling me I hit the 3 Month milestone at Red Hat. It has been an incredible experience so far and it has been a joy meeting all the talented people working for Red Hat here in Brno. Been a lot of ramping up for me, learning and understanding the Red Hat release processes and the tools we use. I think one of the things I appreciated the most during my time here is seeing how Red Hat even now as a large billion dollar company has stayed so true to their values, and how you see the belief in the open source model is not just something you see among the engineers and technical staff, but even among the top manager of the company, going all the way up to the CEO.

There are also a lot of really nice efforts and programs being run here in Brno with the company having an impressive level of interaction and collaboration with the local universities. Red Hat staff do lectures, mentor bachelor and master projects and let a large amount of students work with us as interns. And more often than not those internships leads to a job with Red Hat.

Yesterday we had a big party here in Brno to celebrate the opening of our new office building, as having reached a headcount of 500 here in Brno, the current building was starting to feel quite cramped. The bottom floors of the current building will be emptied out now and rebuilt, so that they are ready to house the next batch of new hires. We have a nice space set aside for the Brno desktop team in the new building, so this Friday we will all be moving there which I think will help us work even better together.

On a personal front my wife and daughter arrived last week after we finally manage to get my wife a visa and this morning the moving truck arrived with our stuff from the UK. Next step is a raid to IKEA to fill in the missing pieces. So all set for the next Months and years here :)

Intersted in Python, Ruby or Javascript?

There is a great conference here in Brno called Rupy, which will be held on November 16th to 18th this year which focuses on state of the art programming languages and related topics. A lot of people from Red Hat here in Brno will attend, but also a lot of people travelling in from abroad. So if you are into programming languages and compilers, be sure to come. And you also get to try some great food and beers here in Brno!

Transmageddon 0.24 is out

So I finally released an official new Transmageddon release today, 0.24. In addition to the changes from the 0.23 test release (GTK3 and GStreamer 1.0), this new version uses Python 3 and got some fixes for improved handling of missing codecs. Let me know if its giving you any trouble.

Next step is looking into how to implement some or all of the new interface design from Gendre Sébastien.

GStreamer 1.0 Released

So this news is a couple of days old now, but I wanted to write a blog entry about the exciting release of GStreamer 1.0. When we released GStreamer 0.10 about 7 years ago we did not expect or plan the 0.10 series to last as long as it did, I think if we had it would have been called 1.0 instead of 0.10. Our caution back then was that 0.10 was a quite revolutionary version with the core of GStreamer extensively re-designed around effective use of threads and thread safety. The new GStreamer 1.0 is more of incremental improvement, cleaning up the API and making doing things modern systems expect easier and more straightforward. I think a lot of the work that went into 1.0 could be said to be based on cleaning up the awkward APIs that can evolve as you are not able to change anything existing, just add new stuff that does not affect the old.

That said there are a lot of major improvements to be seen too, with the list that Tim-Phillip Muller put together for the GStreamer website catching the major items:

  • more flexible memory handling
  • extensible and negotiable metadata for buffers
  • caps negotiation and renegotiation mechanisms, decoupled from buffer allocation
  • improved caps renegotiation
  • automatic re-sending of state for dynamic pipelines
  • reworked and more fine-grained pad probing
  • simpler and more descriptive audio and video caps
  • more efficient allocation of buffers, events and other mini objects
  • improved timestamp handling
  • support for gobject-inspection-based language bindings

The list can seem a bit technical and dry, but there are a lot of things that will benefit users here once plugins and applications start taking advantage of them. For instance the more flexible memory handling will improve performance when for instance running GStreamer on ARM boards like a Panda board or a Raspberry Pi. The changes will also make it a lot easier to write plugins and use plugins on PC platforms which use the GPU for decoding or encoding and that use OpenGL for rendering the playback. These things where possible with 0.10, but they required a bit more special casing and more code in the plugins and a bit more overhead in setting up the pipeline. If you want a great introduction to what is in this 1.0 release I recommend the keynote by Wim Taymans about GStreamer 1.0 and the GStreamer Status report keynote by Tim-Phillip Müller which both talk alot about the new features and the possibilities they open.

I think that the biggest change for a lot of developers though will be if they are using a language binding for their application, GStreamer 1.0 is offering Gobject introspection bindings support, which means that most bindings from now on will be using that. For Python developers like myself that is a quite big change in API. On the other hand it also does mean that porting to Python3 has become very simple. For Transmageddon I simply ran it through the Python 2to3 script and it just worked.

There are a lot of contributors to this release, and I would love to thank them all, but I think Wim and Tim also deserve a special mention as I don’t think there would be a GStreamer 1.0 without them. Wim has of course been the GStreamer maintainer for a long while and he shepherded this release from the beginning when he was the only developer working on it. Tim has been the GStreamer release manager for a long while now and are doing a wonderful job, fixing a gazillion bugs and making sure we regressions are not allowed to creep into GStreamer releases. He ended up doing a lot of heavy lifting to get the final GStreamer 0.11 test releases and the final 1.0 out the door. So a big big thanks to both of them.

Another person I want to give an extra kudos to is Bastien Nocera who have done a lot of working porting GNOME applications to GStreamer 1.0, there is and was of course a lot of other people involved in that process too, but Bastien did an incredible job working through that list and writing patches to port many of them over.

What I am really looking forward to now is the release of Fedora 18 as I think it will be the first chance for a lot of users and developers to try out all the great work that has been done on GStreamer 1.0 and porting applications over to it. I am personally targeting Fedora 18 with Transmageddon, doing my current testing and development on a F18 system, making sure things like the PackageKit integration is working smoothly.