Improved handling of files with multiple tracks in GStreamer

Thanks to Sebastian Dröge there is a new feature in GStreamer called the stream id. It basically gives every stream inside a given file a unique id, making files with multiple streams a lot easier to deal with. The stream id is also supported by the GStreamer discoverer object, so once you have identified the contents of a file with discoverer you can be sure to grab the exact stream you want coming out of (uri)decodebin by checking its pads for the stream id. The most common use case for this is of course files with multiple audio streams in different languages.

From the output of Discoverer the stream id is really easy to get. On the stream object you get out of Discoverer you just call:

stream.get_stream_id()
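In case you have not used Discoverer before, here is a minimal sketch of how those stream objects are obtained in the first place; the five second timeout and the uri variable are assumptions for illustration, not requirements:

# Minimal sketch, assuming 'uri' is something like "file:///home/user/clip.ogv"
from gi.repository import Gst, GstPbutils
Gst.init(None)

discoverer = GstPbutils.Discoverer.new(5 * Gst.SECOND)
info = discoverer.discover_uri(uri)
for stream in info.get_audio_streams():
    # Every stream reports its own unique stream id
    print(stream.get_stream_id())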

On the pad you get from decodebin or uridecodebin the path is a bit more convoluted, but not too hard once you know how (there might be some kind of convenience API added for this at some point).

Before you connect the pad you get from the bin you attach a probe to it like this:

src_pad.add_probe(Gst.PadProbeType.EVENT_DOWNSTREAM, self.padprobe, None)


Then, in the callback function you defined, you can extract the stream id with the parse_stream_start call as seen below:

def padprobe(self, pad, probeinfo, userdata):
    event = probeinfo.get_event()
    if event.type == Gst.EventType.STREAM_START:
        # parse_stream_start() hands back the stream id for this pad
        streamid = event.parse_stream_start()
    return Gst.PadProbeReturn.OK
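To tie the pieces together: decodebin announces new pads through its "pad-added" signal, so the natural place to attach the probe is in that handler. The sketch below is my own illustration; the handler name and the saved self.wanted_streamid attribute are assumptions, not required API:

# Sketch: connect this during pipeline setup, e.g.
#   self.decodebin.connect("pad-added", self.on_pad_added)

def on_pad_added(self, element, src_pad):
    # Attach the probe before linking the pad any further
    src_pad.add_probe(Gst.PadProbeType.EVENT_DOWNSTREAM, self.padprobe, None)

Inside padprobe you can then compare the id returned by parse_stream_start() with the stream id you saved from the Discoverer output, and only link the pad onward when they match.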

I have been using this code in my local copy of Transmageddon to start implementing support for files with multiple audio streams (supporting multiple video streams too would be easy, but I am not sure how useful it would be). There is a screenshot of my current development snapshot below, but I am still trying to figure out a nice way to present the streams. The current setup will look quite bad if the incoming file has more than a few audio streams. Suggestions welcome :)


Transmageddon multistream development snapshot

GStreamer, Python and videomixing

One feature that would be of interest to us in the Empathy video conference client is the ability to record conversations. To that end I have been putting together a simple prototype Python test application in free moments, to verify that everything works as expected before any effort is put into doing work inside Empathy.

The sample code below requires two webcams connected to your system to work. It takes the two camera video streams, puts one of them through an encode/RTP/decode process (to roughly emulate what happens in a video call) and adds a text overlay to the video to let the conference participant know the call is being recorded. The two video streams are then mixed together and displayed. In the actual application the combined stream would of course be saved to disk instead, and audio would be captured and mixed as well.

Whether we ever get around to working on this feature is an open question, but at least we now know it is likely to work. Of course, receiving one stream over the network via RTP is very different from what this sample does, so that might still uncover some bugs.

The sample also works with Python3, so even though it is only a prototype it already fulfils the GNOME Goal :)

import signal

from gi.repository import Gst
from gi.repository import GObject
GObject.threads_init()
Gst.init(None)

class VideoBox():
   def __init__(self):
       self.mainloop = GObject.MainLoop()
       # Create the video mixing pipeline
       self.pipeline = Gst.Pipeline()


       self.v4lsrc1 = Gst.ElementFactory.make('v4l2src', None)
       self.v4lsrc1.set_property("device", "/dev/video0")
       self.pipeline.add(self.v4lsrc1)

       self.v4lsrc2 = Gst.ElementFactory.make('v4l2src', None)
       self.v4lsrc2.set_property("device", "/dev/video1")
       self.pipeline.add(self.v4lsrc2)

       camera1caps = Gst.Caps.from_string("video/x-raw, width=320,height=240")
       self.camerafilter1 = Gst.ElementFactory.make("capsfilter", "filter1") 
       self.camerafilter1.set_property("caps", camera1caps)
       self.pipeline.add(self.camerafilter1)

       self.videoenc = Gst.ElementFactory.make("theoraenc", None)
       self.pipeline.add(self.videoenc)

       self.videodec = Gst.ElementFactory.make("theoradec", None)
       self.pipeline.add(self.videodec)

       self.videortppay = Gst.ElementFactory.make("rtptheorapay", None)
       self.pipeline.add(self.videortppay)

       self.videortpdepay = Gst.ElementFactory.make("rtptheoradepay", None)
       self.pipeline.add(self.videortpdepay)

       self.textoverlay = Gst.ElementFactory.make("textoverlay", None)
       self.textoverlay.set_property("text","Talk is being recorded")
       self.pipeline.add(self.textoverlay)

       camera2caps = Gst.Caps.from_string("video/x-raw, width=320,height=240")
       self.camerafilter2 = Gst.ElementFactory.make("capsfilter", "filter2") 
       self.camerafilter2.set_property("caps", camera2caps)
       self.pipeline.add(self.camerafilter2)

       self.videomixer = Gst.ElementFactory.make('videomixer', None)
       self.pipeline.add(self.videomixer)

       self.videobox1 = Gst.ElementFactory.make('videobox', None)
       self.videobox1.set_property("border-alpha",0)
       self.videobox1.set_property("top",0)
       self.videobox1.set_property("left",-320)
       self.pipeline.add(self.videobox1)

       self.videoformatconverter1 = Gst.ElementFactory.make('videoconvert', None)
       self.pipeline.add(self.videoformatconverter1)

       self.videoformatconverter2 = Gst.ElementFactory.make('videoconvert', None)
       self.pipeline.add(self.videoformatconverter2)

       self.videoformatconverter3 = Gst.ElementFactory.make('videoconvert', None)
       self.pipeline.add(self.videoformatconverter3)

       self.videoformatconverter4 = Gst.ElementFactory.make('videoconvert', None)
       self.pipeline.add(self.videoformatconverter4)

       self.xvimagesink = Gst.ElementFactory.make('xvimagesink',None)
       self.pipeline.add(self.xvimagesink)

       self.v4lsrc1.link(self.camerafilter1)
       self.camerafilter1.link(self.videoformatconverter1)
       self.videoformatconverter1.link(self.textoverlay)
       self.textoverlay.link(self.videobox1)
       self.videobox1.link(self.videomixer)

       self.v4lsrc2.link(self.camerafilter2)
       self.camerafilter2.link(self.videoformatconverter2)
       self.videoformatconverter2.link(self.videoenc)
       self.videoenc.link(self.videortppay)
       self.videortppay.link(self.videortpdepay)
       self.videortpdepay.link(self.videodec)
       self.videodec.link(self.videoformatconverter3)
       self.videoformatconverter3.link(self.videomixer)

       self.videomixer.link(self.videoformatconverter4)
       self.videoformatconverter4.link(self.xvimagesink)
       self.pipeline.set_state(Gst.State.PLAYING)

   def run(self):
       # Run the main loop until the process is interrupted
       self.mainloop.run()
   
if __name__ == "__main__":
    # Restore default SIGINT handling so Ctrl+C stops the application
    signal.signal(signal.SIGINT, signal.SIG_DFL)
    app = VideoBox()
    app.run()
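As a hint of what the recording variant could look like, the tail of the pipeline would swap the display sink for an encoder, muxer and file sink. The sketch below uses the standard vp8enc/webmmux/filesink elements and an invented output path; it is an illustration of the idea, not part of the prototype:

# Hedged sketch: replace xvimagesink with an encode/mux/save chain.
# vp8enc, webmmux and filesink are standard GStreamer 1.0 elements;
# the output location is an assumption for illustration.
self.videoenc2 = Gst.ElementFactory.make("vp8enc", None)
self.pipeline.add(self.videoenc2)

self.muxer = Gst.ElementFactory.make("webmmux", None)
self.pipeline.add(self.muxer)

self.filesink = Gst.ElementFactory.make("filesink", None)
self.filesink.set_property("location", "recording.webm")
self.pipeline.add(self.filesink)

self.videomixer.link(self.videoformatconverter4)
self.videoformatconverter4.link(self.videoenc2)
self.videoenc2.link(self.muxer)   # link() will request a sink pad on the muxer
self.muxer.link(self.filesink)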

Python and libcheese, the simple way of dealing with camera devices

GStreamer makes assembling advanced video applications quite easy, in fact so easy that even I can write such an application in Python :) What I have had a lot more issues with is understanding how to deal with things like USB cameras. Luckily the developers of Cheese realized this and created libcheese to help. Today libcheese is used by Cheese itself of course, but also by Empathy for its camera handling.

Since I have been thinking about adding some kind of video recording support to Transmageddon, I wanted to test libcheese from Python. Unfortunately there were no Python examples available anywhere online, so I had to write my own :) With some pointers from David King I managed to put the following Python code together.

import sys
from gi.repository import Cheese
from gi.repository import Clutter
from gi.repository import Gst
Gst.init(None)
Clutter.init(sys.argv)

class VideoBox():
   def __init__(self):
    self.stage = Clutter.Stage()
    self.stage.set_size(400, 400)
    self.layout_manager = Clutter.BoxLayout()
    self.textures_box = Clutter.Actor(layout_manager=self.layout_manager)
    self.stage.add_actor(self.textures_box)

    self.video_texture = Clutter.Texture.new()

    self.video_texture.set_keep_aspect_ratio(True)
    self.video_texture.set_size(400,400)
    self.layout_manager.pack(self.video_texture, expand=False, x_fill=False, y_fill=False, x_align=Clutter.BoxAlignment.CENTER, y_align=Clutter.BoxAlignment.CENTER)

    self.camera = Cheese.Camera.new(self.video_texture, None, 100, 100)
    Cheese.Camera.setup(self.camera, None)
    Cheese.Camera.play(self.camera)

    def added(monitor, device):
        # The callback receives the device monitor and the new camera device
        uuid = device.get_uuid()
        node = device.get_device_node()
        print("uuid is " + str(uuid))
        print("node is " + str(node))
        self.camera.set_device_by_device_node(node)
        self.camera.switch_camera_device()

    device_monitor = Cheese.CameraDeviceMonitor.new()
    device_monitor.connect("added", added)
    device_monitor.coldplug()

    self.stage.show()
    Clutter.main()
if __name__ == "__main__":
    app = VideoBox()

The application creates a simple Clutter window to host the stream from the webcam. So when you run the application it should display the video from the system webcam. If you then plug a second webcam into a USB port, it will switch the video feed to that stream. Not a very useful application in itself, but hopefully enough to get you started on using libcheese from Python. You can find the libcheese API docs here; they are for C, but the Python API from GObject Introspection follows it so closely that you should be able to find the right calls. And remember, for figuring out exact API names ipython is your friend :)
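As a small extra, the device monitor also emits a "removed" signal when a camera is unplugged. This addition is a sketch of my own, not something the example above relies on:

def removed(monitor, device):
    # Handler for the CheeseCameraDeviceMonitor "removed" signal
    print("camera removed: " + str(device.get_device_node()))

device_monitor.connect("removed", removed)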

P.S. You need Cheese 3.6 installed to be able to use libcheese from Python; this version will be in Fedora starting with Fedora 18.