One thing we are doing here at Red Hat Brno is maintain Firefox for Fedora and RHEL. The job is mostly focused on making sure we have Firefox available on all RHEL versions with all the latest security fixes, but it also gives our great team of Martin Stransky and Jan Horak some time to work on adding new features to Firefox to make sure it feels like a more integrated part of your desktop. They are currently working on 3 such features that you will hopefully be able to enjoy soon. The first is a patch to inhibit the screensaver when you are watching HTML5 or Flash content fullscreen. So if you are annoyed by having to move your mouse every 3 minutes to avoid the screen dimming when watching The Daily Show this is the fix for you. The second item they are working on is enabling the GStreamer backend in Firefox on Fedora. Which means that if you install for instance H264 support for Totem you will also have H264 support for HTML5 in Firefox. And finally there is also ongoing work on adding support for GIO in Firefox to make sure that any setup that works with GIO in terms of remote file access also works with Firefox, this latest task is taking some time though as it is currently blocking on some code refactoring in Firefox.
As mentioned in my previous blog entry I am working on multistream handling in Transmageddon. Not been a lot of changes, but I have been able to put in a little time here and there. The changes needed to accommodate this have also cleaned up the codebase quite a bit in my opinion, moving from a forest of variables to a list of python dictionaries. This change makes keeping track of whats happening in the codepath a lot easier as I can now just print the dictionary from the list to see what all relevant values are at a given point. Anyway a little screenshot below to show where I am at:
Still quite a bit of work to do to clean up the codebase and decide how certain things are to be handled (or not handled), but it is getting there. Screenshot above actually demonstrates one thing I haven’t decided on yet, which is how to deal with combining a device preset with a multistream file.
The biggest blocker currently for finishing this work is that the GStreamer encodebin element does not have an API yet for dealing with selecting encoding settings for multiple streams as detailed in this bug report. If anyone got the inclination to cook up a patch for encodebin which adds support for this that would be much appreciated.
Anyway, once I have this completed I think my next step will be to try to add some kind of DVD ripping support to Transmageddon and some basic metadata checking/editing and move the video flipping support into a special menu and add support for enabling/disabling deinterlacing in that same special menu. I trying to figure out as I go along how I can keep the user interface simple and straightforward and add requested features. The question that I continuously ask myself is what features do belong in Transmageddon and what features are of a level where people should go to something like PiTiVi instead.
Thanks to Sebastian Dröge there is a new thing in GStreamer called streamid. It basically gives all streams inside a given file a unique id, making files with multiple streams a lot easier to deal with. This streamid is also supported by the GStreamer discoverer object. So once you identified the contents of a file with discoverer you can be sure to grab the exact stream you want coming out of (uri)decodebin by checking the pad for the streamid. The most common usecase for this is of course files with multiple audio streams in different languages.
From the output of Discoverer the stream id is really easy to get:
On the stream object you get out of Discoverer you just run a:
On the pad you get from decodebin or uridecodebin the patch is a bit more convoluted, but not
to hard once you know how (there might be some kind of convenience API added for this at some point).
Before you connect the pad you get from the bin you attach a pad to it like this:
src_pad.add_probe(Gst.PadProbeType.EVENT_DOWNSTREAM, self.padprobe, None)
Then you in the function you define you can extract the stream_id with the parse_stream_start call as seen below:
def padprobe(self, pad, probeinfo, userdata): event = probeinfo.get_event() eventtype=event.type if eventtype==Gst.EventType.STREAM_START: streamid = event.parse_stream_start() return Gst.PadProbeReturn.OK
I been using this code in my local copy of Transmageddon to start implementing support for files with multiple audio streams (also supporting multiple video streams would be easy, but I am not sure how useful it would be). Got a screenshot of my current development snapshot below, but I am still trying to figure out what would be a nice way to present it. The current setup will look quite crap if the incoming file got more than a few audio streams. Suggestions welcome
One feature that would be of interest to us in the Empathy Video Conference client is the ability to record conversations. Due to that I have been putting together a simple prototype Python test application in free moments to verify that everything works as expected, before any effort is put into doing any work inside Empathy.
The sample code below requires two webcams to be connected to your system to work. It basically takes the two camera video streams, puts one of them through a encode/rtp/decode process (to roughly emulate what happens in a video call) and puts a text overlay onto the video to let the conference participant know the call is being recorded. The two video streams are then mixed together and displayed. In the actual application the combined stream would be saved to disk instead of course and also audio captured and mixed.
If we ever get around to working on this feature is an open question, but at least we can now assume that it is likely to work. Of course getting one stream in over the network over RTP is very different from what this sample does, so that might uncover some bugs.
The sample also works with Python3, so even though it is only a prototype it already fulfils the GNOME Goal
import sys from gi.repository import Gst from gi.repository import GObject GObject.threads_init() Gst.init(None) import os class VideoBox(): def __init__(self): mainloop = GObject.MainLoop() # Create transcoding pipeline self.pipeline = Gst.Pipeline() self.v4lsrc1 = Gst.ElementFactory.make('v4l2src', None) self.v4lsrc1.set_property("device", "/dev/video0") self.pipeline.add(self.v4lsrc1) self.v4lsrc2 = Gst.ElementFactory.make('v4l2src', None) self.v4lsrc2.set_property("device", "/dev/video1") self.pipeline.add(self.v4lsrc2) camera1caps = Gst.Caps.from_string("video/x-raw, width=320,height=240") self.camerafilter1 = Gst.ElementFactory.make("capsfilter", "filter1") self.camerafilter1.set_property("caps", camera1caps) self.pipeline.add(self.camerafilter1) self.videoenc = Gst.ElementFactory.make("theoraenc", None) self.pipeline.add(self.videoenc) self.videodec = Gst.ElementFactory.make("theoradec", None) self.pipeline.add(self.videodec) self.videortppay = Gst.ElementFactory.make("rtptheorapay", None) self.pipeline.add(self.videortppay) self.videortpdepay = Gst.ElementFactory.make("rtptheoradepay", None) self.pipeline.add(self.videortpdepay) self.textoverlay = Gst.ElementFactory.make("textoverlay", None) self.textoverlay.set_property("text","Talk is being recorded") self.pipeline.add(self.textoverlay) camera2caps = Gst.Caps.from_string("video/x-raw, width=320,height=240") self.camerafilter2 = Gst.ElementFactory.make("capsfilter", "filter2") self.camerafilter2.set_property("caps", camera2caps) self.pipeline.add(self.camerafilter2) self.videomixer = Gst.ElementFactory.make('videomixer', None) self.pipeline.add(self.videomixer) self.videobox1 = Gst.ElementFactory.make('videobox', None) self.videobox1.set_property("border-alpha",0) self.videobox1.set_property("top",0) self.videobox1.set_property("left",-320) self.pipeline.add(self.videobox1) self.videoformatconverter1 = Gst.ElementFactory.make('videoconvert', None) self.pipeline.add(self.videoformatconverter1) self.videoformatconverter2 = Gst.ElementFactory.make('videoconvert', None) self.pipeline.add(self.videoformatconverter2) self.videoformatconverter3 = Gst.ElementFactory.make('videoconvert', None) self.pipeline.add(self.videoformatconverter3) self.videoformatconverter4 = Gst.ElementFactory.make('videoconvert', None) self.pipeline.add(self.videoformatconverter4) self.xvimagesink = Gst.ElementFactory.make('xvimagesink',None) self.pipeline.add(self.xvimagesink) self.v4lsrc1.link(self.camerafilter1) self.camerafilter1.link(self.videoformatconverter1) self.videoformatconverter1.link(self.textoverlay) self.textoverlay.link(self.videobox1) self.videobox1.link(self.videomixer) self.v4lsrc2.link(self.camerafilter2) self.camerafilter2.link(self.videoformatconverter2) self.videoformatconverter2.link(self.videoenc) self.videoenc.link(self.videortppay) self.videortppay.link(self.videortpdepay) self.videortpdepay.link(self.videodec) self.videodec.link(self.videoformatconverter3) self.videoformatconverter3.link(self.videomixer) self.videomixer.link(self.videoformatconverter4) self.videoformatconverter4.link(self.xvimagesink) self.pipeline.set_state(Gst.State.PLAYING) mainloop.run() if __name__ == "__main__": app = VideoBox() signal.signal(signal.SIGINT, signal.SIG_DFL) exit_status = app.run(sys.argv) sys.exit(exit_status)
One project we been working on here at Red Hat Brno is to make sure we have a nicely working voice and video calling with Empathy in Fedora 18. The project is being spearheaded by Debarshi Ray with me trying to help out with the testing. We are still not there, but we are making good progress thanks to the help of people like Brian Pepple, Sjoerd Simons, Olivier Crete and Guillaume Desmottes and more.
But having been involved with open source multimedia for so long I thought it could be interesting for people to know why free video calling have taken so long to get right and why we still have a little bit to go. So I decided to do this write up of some of the challenges involved. Be aware though that this article is mostly discuss the general historical challenges of getting free VoIP up and running, but I will try to tie that into the specific issues we are trying to resolve currently where relevant.
The first challenge that had to be overcome was the challenge of protocols. VoIP and video calling has been around for a while (which an application like Ekiga is proof of), but it has been hampered by a jungle of complex standards, closed protocols, lack of interoperability and so on. Some of the older standards also require non-free codecs to operate. The open standard that has started to turn this around is XMPP which is the protocol that came out of the Jabber project. Originally it was just an open text chat network, but thanks to ongoing work it now features voice and video conferencing too. It also got a boost as Google choose it as the foundation for their GTalk offering ensuring that anyone with a gmail address suddenly was available to chat or call. That said like any developing protocol it has its challenges, and some slight differences in behaviour between a Google jabber server and most others is causing us some pain with video calls currently, which is one of the issues we are trying to figure out how to resolve.
Codecs and interoperability
The other thing that has hounded us is the combination of non-free codecs and the need for interoperability. For a video calling system to be interesting to use you would need to be able to use it to contact at least a substantial subset of your friends and family. For the longest time this either meant using a non-free codec, because if you relied solely on free codecs no widely used client out there would be able to connect with you. But thanks to the effort of first Xiph.org to create the Speex audio codec and now most recently the Opus audio codec, and later the adoption of Speex by Google has at least mostly resolved things on the audio side of things. On the video side things are still not 100% there. We have the Theora video codec from Xiph.org, but unfortunately when the RTP specification for that codec was written, the primary usecase in mind was RTSP streaming and not video conferencing, making the Theora RTP a bit hairy to use for video conferencing. The other bigger issue with Theora is that outside the Linux world nobody adopted Theora for video calling, so once again you are not likely able to use it to call a very large subset of your friends and family unless they are all on Linux systems.
There might be a solution on the way though in the form of new kid on the block, VP8. VP8 is a video codec that Google released as part of their WebM HTML5 video effort. The RTP specification for VP8 is still under development, so adoption is limited, but the hope and expectation is that Google will support VP8 in their GTalk client once the RTP specification is stable and thus we should have a good set of free codecs for both Audio and Video available and in the hands of a large user base.
Video calling is a quite complex technical issue, with a lot of components needing to work together from audio and video acquisition on your local machine, integrating with your address book, negotiating the call between the parties involved, putting everything into RTP packets on one side and unpacking and displaying them on the other side, taking into account the network, firewalls and and audio and video sync. So in order for a call to work you will need (among others) ALSA, PulseAudio, V4L2, GStreamer, Evolution Data Server, Farstream, libnice, the XMPP server, Telepathy and Empathy to work together across two different systems. And if you want to interoperate with a 3rd party system like GTalk the list of components that all need to work perfectly with each other grows further.
A lot of this software has been written in parallel with each other, written in parallel with evolving codecs and standards, and it tries to interoperate with as many 3rd party systems as possible. This has come at the cost of stability, which of course has turned people of from using and testing the video call functionality of Empathy. But we believe that we have reached a turning point now where the pieces are in place, which is why we are now trying to help stabilize and improve the experience to make doing VoIP and video conferencing calls work nicely out of the box on Fedora 18.
In addition to the nitty gritty of protocols and codecs there are other pieces that has been lacking to give users a really good experience. The most critical one is good echo cancellation. This is required in order to avoid having an ugly echo effect when trying to use your laptop built-in speakers and microphone for a call. So people have been forced to use a headset to make things work reasonably well. This was a quite hard issue to solve as there was neither any great open source code available which implemented echo cancellation or a good way to hook it into the system. To start addressing this issue while I was working for Collabora Multimedia we reached out to the Dutch non-profit NLnet Foundation who sponsored us to have Wim Taymans work on creating an echo cancellation framework for PulseAudio. The goal was to create the framework within PulseAudio to support pluggable echo cancellation modules, turn two existing open source echo cancellation solutions into plugins for this framework as examples and proof of concept, and hope that the availability of such a framework would encourage other groups or individuals to release better echo cancellation modules going forward.
When we started this work the best existing open source echo cancellation system was Speex DSP. Unfortunately SpeexDSP had a lot of limitations, for instance it could not work well with two soundcards, which meant using your laptop speakers for output and a USB microphone for input would not work. Although we can claim no direct connection as things would have it Google ended up releasing a quite good echo cancellation algorithm as part of their WebRTC effort. This was quickly turned into a library and plugin for PulseAudio by Arun Raghavan. And this combined PulseAudio and WebRTC echo cancellation system is what we will have packaged and available in Fedora 18.
So I outlined a few of the challenges around having a good quality VoIP and video conferencing solution shipping out of the box on a Linux Distribution. And some of the items like the Video Codec situation and general stack stability is not 100% there yet. There also is quite a few bugs in Empathy in terms of behaviour, but Debarshi are already debugging those and with the help of the Telepathy and Empathy teams we should hopefully get those issues patched and merged before Fedora 18 is shipping. Our goal is to get Empathy up to a level where people want to be using it to make VoiP and Video calls, as that is also the best way to ensure things stay working going forward.
In addition to Debarshi, another key person helping us with this effort in the Fedora community is Brian Pepple, who are making sure we are getting releases and updates of GStreamer, Telpathy, Farstream, libnice and so on packaged for Fedora 18 almost on the day. This is making testing and verifying bugfixes a lot easier for us.
There are also some nice to have items we want to look at going forward after having stabilized the current functionality. For instance Red Hat and Xiph.org codec guru Monty Montgomery suggested we add a video noise reduction video to the GStreamer pipeline inside Empathy in order to improve quality and performance when using a low quality built in web camera. [Edit: Sjoerd just tolm me the Gst 0.10 version of the code had such a plugin available, so this might not be to hard to resolve.]
Debarshi is also interested in seeing if we can help move the multiparty chat feature forward. But we are not expecting to be able to work on these issues before Fedora 18 is released.
GStreamer maintainer Wim Taymans decided that having a brand new GStreamer 1.x series was only worth the effort if we also had some nice up to date documentation for GStreamer 1.0. So over the last week he has been going over the GStreamer Application Development manual making sure it is up to date and fixing all the code examples and adding new chapters even. So if you want to get into GStreamer development our introduction manual should now be a good starting point again!
So I finally released an official new Transmageddon release today, 0.24. In addition to the changes from the 0.23 test release (GTK3 and GStreamer 1.0), this new version uses Python 3 and got some fixes for improved handling of missing codecs. Let me know if its giving you any trouble.
Next step is looking into how to implement some or all of the new interface design from Gendre Sébastien.
So this news is a couple of days old now, but I wanted to write a blog entry about the exciting release of GStreamer 1.0. When we released GStreamer 0.10 about 7 years ago we did not expect or plan the 0.10 series to last as long as it did, I think if we had it would have been called 1.0 instead of 0.10. Our caution back then was that 0.10 was a quite revolutionary version with the core of GStreamer extensively re-designed around effective use of threads and thread safety. The new GStreamer 1.0 is more of incremental improvement, cleaning up the API and making doing things modern systems expect easier and more straightforward. I think a lot of the work that went into 1.0 could be said to be based on cleaning up the awkward APIs that can evolve as you are not able to change anything existing, just add new stuff that does not affect the old.
That said there are a lot of major improvements to be seen too, with the list that Tim-Phillip Muller put together for the GStreamer website catching the major items:
- more flexible memory handling
- extensible and negotiable metadata for buffers
- caps negotiation and renegotiation mechanisms, decoupled from buffer allocation
- improved caps renegotiation
- automatic re-sending of state for dynamic pipelines
- reworked and more fine-grained pad probing
- simpler and more descriptive audio and video caps
- more efficient allocation of buffers, events and other mini objects
- improved timestamp handling
- support for gobject-inspection-based language bindings
The list can seem a bit technical and dry, but there are a lot of things that will benefit users here once plugins and applications start taking advantage of them. For instance the more flexible memory handling will improve performance when for instance running GStreamer on ARM boards like a Panda board or a Raspberry Pi. The changes will also make it a lot easier to write plugins and use plugins on PC platforms which use the GPU for decoding or encoding and that use OpenGL for rendering the playback. These things where possible with 0.10, but they required a bit more special casing and more code in the plugins and a bit more overhead in setting up the pipeline. If you want a great introduction to what is in this 1.0 release I recommend the keynote by Wim Taymans about GStreamer 1.0 and the GStreamer Status report keynote by Tim-Phillip Müller which both talk alot about the new features and the possibilities they open.
I think that the biggest change for a lot of developers though will be if they are using a language binding for their application, GStreamer 1.0 is offering Gobject introspection bindings support, which means that most bindings from now on will be using that. For Python developers like myself that is a quite big change in API. On the other hand it also does mean that porting to Python3 has become very simple. For Transmageddon I simply ran it through the Python 2to3 script and it just worked.
There are a lot of contributors to this release, and I would love to thank them all, but I think Wim and Tim also deserve a special mention as I don’t think there would be a GStreamer 1.0 without them. Wim has of course been the GStreamer maintainer for a long while and he shepherded this release from the beginning when he was the only developer working on it. Tim has been the GStreamer release manager for a long while now and are doing a wonderful job, fixing a gazillion bugs and making sure we regressions are not allowed to creep into GStreamer releases. He ended up doing a lot of heavy lifting to get the final GStreamer 0.11 test releases and the final 1.0 out the door. So a big big thanks to both of them.
Another person I want to give an extra kudos to is Bastien Nocera who have done a lot of working porting GNOME applications to GStreamer 1.0, there is and was of course a lot of other people involved in that process too, but Bastien did an incredible job working through that list and writing patches to port many of them over.
What I am really looking forward to now is the release of Fedora 18 as I think it will be the first chance for a lot of users and developers to try out all the great work that has been done on GStreamer 1.0 and porting applications over to it. I am personally targeting Fedora 18 with Transmageddon, doing my current testing and development on a F18 system, making sure things like the PackageKit integration is working smoothly.
I have made a first test release of Transmageddon today, 0.23. It is the first release depending on GStreamer 0.11/1.0 and GTK3. It also features a little bit of GNOME Shell and Unity integration in the form of a notification message when its done transcoding and the menu has been moved into the shell. You can find the test release in the pre-release directory on linuxrising.org. Let me know if you have any issues so I can try to fix them before I make a final stable release when GStreamer 1.0 is released. Be aware that you want the GStreamer 0.11 releases from yesterday to try this release.
The above screenshot is taken on my Fedora 18 test system!
So thanks to Ubicast we now have the recordings of this years talks available online. A couple of talks got lost due to technical issues, but most of them are there and I hope you enjoy them. We are collecting the slides at the GStreamer Conference 2012 wiki page, but be aware that we are having some issues with the wiki this morning.