Why Phonon is a broken wheel

So I attended the OSDL Desktop Summit in Mainz, Germany from Sunday to Tuesday evening this week. The goal of the meeting was to follow up on the tasks done at the earlier meeting in Portland and to look for new areas that could do with some work, multimedia being the one pulling me in.

My goal, and that of the GStreamer community, in such a scenario is of course to advocate the use of GStreamer as free software’s answer to the advanced media frameworks on other platforms. We do believe we have the best and most extensive framework available and that with the work currently being done in the community this is not likely to change anytime soon.

In the discussion, the approach taken by the Phonon abstraction layer, which the Phonon project is advocating for inclusion in KDE4, also came up. I have held back blogging about Phonon for some time to avoid flamewars, but I don’t want to have efforts like OSDL delayed due to setups like Phonon being promoted or thought of as a workable solution for the issues faced. Let me start off with a brief introduction to the area of multimedia frameworks.

First of all, multimedia frameworks are very complex systems that handle very hard technical problems: issues inherited from the analog past (for some technical information in this field I recommend fourcc.org), a forest of media formats with different traits, a host of features and deficiencies in hardware, and a wide range of other software to interact with, like sound servers and legacy systems. On top of this you add performance and latency requirements, network protocols and multiplatform issues. In the end you have a problem space where even those who have worked on the issues for many years need to keep their heads clear when designing the framework.

Multimedia frameworks are also by their nature abstraction layers themselves, trying to abstract away all the demuxers, muxers, decoders, encoders, cameras, soundcards, sound systems, network protocols and various types of filters into a coherent and user-friendly API.

With all this complexity the frameworks are struggling not only to bring you a coherent API, but also to offer some high-level APIs that are useful for application developers. What we discovered in GStreamer is that while application developers initially ask for a ‘play this file’ API, which is what we offer through our ‘playbin’ component, they often end up pushing playbin to its limits and sometimes not using it at all in the end, as they want to do fancy stuff which demands manipulating the pipelines more directly.
In many cases application authors want to do something which demands that a special plugin or filter be written.

Now to get back to why I think Phonon is conceptually broken. First of all it is destined to fall into one of two traps. Either its API becomes so high level and limited that application developers will shun it due to a lack of features, meaning that you have an API useful for doing ‘ding’ sounds in standard applications, but anyone wanting more powerful operations will feel it’s a bigger hindrance than a help. On the other hand, if they actually try to implement a feature set that is big enough to satisfy at least a subset of, for instance, music player writers, then they will be forced into accessing things so deep in the frameworks that the operations become framework specific. Generalizing those operations away into a common API will at best be a kludge and at worst produce broken behaviour that changes depending on the framework chosen.
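To make the two traps a bit more concrete, here is a minimal Python sketch. The backend names and feature sets are entirely made up for illustration and say nothing about what any real framework or Phonon itself supports. It only shows the arithmetic of the argument: an API that works identically everywhere is limited to the intersection of the backends' features, and anything beyond that behaves differently depending on which backend is chosen.

```python
# Hypothetical feature sets; these names are illustrative assumptions,
# not claims about any real framework.
BACKEND_FEATURES = {
    "backend_a": {"play", "pause", "seek", "crossfade", "custom_filter"},
    "backend_b": {"play", "pause", "seek", "equalizer"},
    "backend_c": {"play", "pause"},
}


def portable_features(backends):
    """Trap one: only the features that every backend supports."""
    return set.intersection(*(features for features in backends.values()))


def backend_dependent_features(backends):
    """Trap two: features offered by some backends but not by all of them."""
    everything = set.union(*backends.values())
    return everything - portable_features(backends)


def supports(backends, backend_name, feature):
    """What an application finds out when it asks for a feature at runtime."""
    return feature in backends[backend_name]
```

With these made-up sets, the portable API shrinks to play and pause, while something like crossfading works or fails depending on which backend the user happens to have selected.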

Phonon also falls short in many other areas. For instance the stated goal is to let application writers write their applications against one API and have them work with a host of media frameworks. The reasoning is that no framework ‘does it all’ so having this flexibility is a good thing. This logic falls down quickly when you start thinking about it. While the general statement that no framework supports all formats or all features is true, the converse claim, that a combination of multiple frameworks ‘does it all’, is equally untrue. An application developer who wants to add support for a new or rare audio format, for instance, would very often need to write the plugin or library to support it him/herself anyway. Targeting an API with, for instance, 5 backends doesn’t make that job easier; it actually means that an application developer who wants to ensure all users can play this format needs to repeat the job five times.

So if you choose to standardize on one framework, like GNOME has done with GStreamer, then there is a chance that the application developer wants to support a format that GStreamer doesn’t currently support. But at least the developer has a clear idea where to add support for this format.

Also, in a usability context, telling users to make hit-and-miss framework changes based on which media file they are trying to play is simply broken, and I have a hard time believing that this is the user experience anyone wants to present to users of KDE4.

The counter argument to this is that Phonon does allow application developers to force or at least strongly suggest a specific backend for the user to use. This does solve the problem of the application developer knowing where to add support for something, but it also means that another stated goal of Phonon, to avoid enforcing such a ‘heavy’ dependency as GStreamer, might very well be replaced by enforcing 5 different media frameworks to be bundled with KDE4 as a whole. And if you think one framework is a heavy dependency then I promise you that five is not less. It also reduces the synergy effect of the KDE community a lot, as it means that the work done by one music player author to add support for a new format will not be automatically available to the other KDE music players.

What we have been advocating for a long time from the GStreamer corner is that if both GNOME and KDE share a multimedia framework the synergy effect for both desktops will be huge.

My final objection to Phonon is that even if they manage to prove me wrong on their ability to provide a truly useful limited cross-framework API, and demonstrate that having a menu option offering your grandma to play her music using framework X, Y or Z actually solves more problems than it creates, I still think that it falls short, because it wouldn’t provide an API for writing applications like Pitivi, Diva, Jokosher, Buzztard, Flumotion and so on, which I think is where we want to be today in order to provide a competitive desktop. MacOS X and Windows Vista are showing us that this is the role that the desktop is heading towards.

One scenario I know has been contemplated is using Phonon, but at the same time saying that GStreamer is the recommended framework if you want to do something outside of the scope of Phonon. But my opinion is that in this use case focusing on Qt-style bindings for GStreamer is a better solution, a much easier thing to do, and would result in something more useful for developers and users alike.

So I hope that interested people in the KDE community agree with my analysis and start working on Qt-style bindings for GStreamer, and that as a result Phonon falls by the wayside. If not, well, hopefully we will at a minimum be able to cooperate on some of the lower level issues in the desktop, like improved driver handling through HAL.

All this said, people from the GStreamer community will of course try to help out people developing the GStreamer Phonon backend, for instance. We do our best to help anyone using GStreamer, even when they do something whose viability or direction we don’t believe in. Zaheer Merali, for instance, has already volunteered to mentor or co-mentor anyone interested in working on Phonon-GStreamer integration as part of the Google Summer of Code, and as far as I know there were multiple proposals submitted for that.


#1 Anonymous troll on 05.13.06 at 23:14

IMHO why worry about whatever KDE selects? KDE is a dead desktop anyway

#2 Mike Hearn on 05.14.06 at 02:55

Am I the only one who finds it hilarious that “ABI stability” is being touted as a plus for Phonon? Hello – Phonon is going to be a C++ library and the binary compatibility issues surrounding C++ are so complicated that anybody who goes anywhere near them quickly gets their asses bitten.

From the perspective of somebody who needs a stable ABI because they are distributing software in binary form, the C/GObject based GStreamer API wins, every time, hands down.

The main problem with GStreamer right now is that it can be hard to tell what elements will actually be available … the gst-plugins-good/bad/ugly split is a good idea but distributors seem to routinely *change* their contents, so in practice an application that dlopens libgstreamer will actually have no clue what the framework on the user’s system can do! This problem affects any media framework of course. Also it’s not really clear what happens if a program wishes to redistribute the Fluendo MP3 plugin – right now people have to register to get it, so I guess you’d have to work through the licensing with Fluendo then find a way to temporarily register an element shipped privately with the app (and manage conflicts).

thanks -mike

#3 Mike Hearn on 05.14.06 at 03:02

(that’s “the main problem with respect to binary compatibility” obviously … I don’t know what multimedia app developers would identify as the main problem :)

It’d be nice if GStreamer went “1.0” at some point, the current status makes it look like a 0.12 will appear at some point.

#4 Georgi Chulkov on 05.14.06 at 09:31

I welcome the idea of having the choice to switch frameworks easily and this is the goal behind Phonon. Besides, consider the following situation: GStreamer supports formats A, B, and C. Xine supports B, C, and D. If Phonon allows me to choose a different back-end per format, then I can support all four without any extra effort (of learning a new API and doing Phonon’s integration work myself)!

On a broader note, there’s a reason why today we have so many application choices for some tasks. KDE vs Gnome, Firefox vs Konqueror, among others. I’m reasonably confident that GStreamer will not go away, but it may make some major choices that I do not agree with. (I am intentionally unspecific here. If I were to mention where Gnome, or Firefox, or whatever other app went wrong for me personally, suddenly it would be a post about *that*, and not the big picture.)

Let KDE go in a conceptually different way with Phonon from GNOME with GStreamer. Let’s see how each turns out. Then, we can all choose and use what we like better. Diversity is very important, as any problems one of the two communities may face will warn the other, and in the end improve both. If we stick to one philosophy, we’re automatically ditching all those potential advantages of the other, which we would never discover.

I love choice. I also love the way Amarok will let me choose Xine or GStreamer, and would love to see this idea implemented globally within KDE.

#5 Bongani Hlope on 05.14.06 at 18:00

This thread is just silly. Phonon is an API to be used by KDE applications for their multimedia needs using different MM backends. GStreamer is a MM backend. If it is wrong to write an API to use for different backends, then let’s get rid of glibc and concentrate on the API provided by BSD or Linux.

This whole post is just useless arguments from a MM backend writer thinking that abstracting his backend will hurt his project… KDE would like to “use your backend” but they don’t like the API you expose; is it wrong for them to “use your backend”, but have an API wrapper? Stop being childish, no one is taking your candy yet.

If at the end your backend sucks, fix it or it can easily be replaced. All you need to do is look at what developers are unhappy about, then try to work on a solution. Whining that your backend will be abstracted away (NOTE: Phonon is not even attempting to replace you) is just plain being silly.

#6 Heitzso on 05.15.06 at 20:09

I prefer gnome simplicity over kde eye-candy. That said, I understand some folks prefer kde eye-candy over gnome simplicity. That’s just fine. People are diverse. That’s a good thing. But I do, personally, prefer gnome, so …

If KDE wants to shoot themselves in the foot with phonon the same way they did with arts, hey, let them. Won’t jiggle my gnome desktop. phonon will have to eventually reinvent the complexity of gstreamer. and, like arts, won’t have the developers to pull it off. So, hey, go ahead with phonon.

Then, 5 years from now, we’ll have a brand shiny new kde proposal for a replacement for unmaintained phonon, the same as we have a brand shiny new kde proposal for a replacement for arts. Heaven forbid they should embrace somebody else’s extensive library.

#7 Frank Earl on 05.16.06 at 03:37

Re: Everyone’s comments… Abstraction layers are a necessary evil. You use only as much as you need to use. My largest concern with Phonon, while it’s a “nifty” idea, is that it’s almost too much abstraction just for the sake of making KDE’s multimedia apps not depend on a specific API or to “reinvent the wheel” on hooking into playback engines. It’s all well and good to say this, but how many media players do you need, really? How is Phonon going to provide help to a developer writing a game using the Miles sound engine? Unless it’s the baseline sound API for all of Linux (and not just for KDE), it’s NOT going to help them. Same goes for OpenAL and SDL apps as well. In fact, it’s going to probably conflict (Because engines like ARTS and OSD take over the sound card interface in a way that blocks anything except a caller to their APIs- and unless they’re “the thing” in sound/media access for Linux and considered part of the baseline, they’re not going to get a Miles or OpenAL hook for them included in any games to speak of.) and now you’ve actually jarred the user experience.

Here’s the reality of things. What Linux needs is a standard way of people being able to capture sound to a raw stream (sort of done…), capture video to a raw stream (not really done- I should be able to use IEEE 1394 video sources as well as web cams and capture cards identically with no special abstraction APIs. It’s video, for goodness sake!), feed multiple sound sources from multiple apps transparently to the sound card (ALSA sort of does this, but it interferes with JACK and a few other solutions…), and stream raw video frames in an accelerated manner to a pane on the screen (Also sort of done… Too many twisted ways to accomplish this to count- but all of them are available after a fashion). Anything else is playback engines doing the decoding, index search, etc. I could care less about that, so long as the above is taken care of so that I can mix and match all my choices of things like GStreamer, etc. without anything stomping on the others.

#8 Bongani Hlope on 05.16.06 at 15:38

@Frank Earl

It looks like no one bothered to read the motivation of Phonon: if you want to write games and media-rich applications you _must_ use a multimedia backend of your choice.

Why should sound notifications, presentations etc. developers know about how GStreamer works? Abstract it away and give them a simple API to play sound.


and then post your comments

#9 Ari Tilli on 06.05.06 at 08:40

“If KDE wants to shoot themselves in the foot with phonon the same way they did with arts, hey, let them. Won’t jiggle my gnome desktop..”

So why are U then posting. The thread is so silly anyway..
A Person: “I have developed XXXX, it is sooo good, use it !!!”
Others: “Well, we’ll use it, but write an abstraction layer.”
A Person: “U fools, XXXX is soooooooo good.”
Others: “Yes yes,we will use it , and give users also other choices.”
A Person & funboys: “Noooooo no choice , ours is sooooooo good…. Don’t let the stupid users to choose…… Choice is bad !! Users are studpiiiiiiid”

..and it is silly since all good apps use jackd anyway
It is sooooooo goooooooooood !!!!!!! :):)