So I attended the OSDL Desktop Summit in Mainz, Germany from Sunday to Tuesday evening this week. The goal of the meeting was to follow up on the tasks done at the earlier meeting in Portland and to look for new areas that could do with some work, multimedia being the one pulling me in.
The goal of myself and the GStreamer community in such a scenario is of course to advocate the use of GStreamer as the response free software can offer to advanced media frameworks on other platforms. We do believe we have the best and most extensive framework available and that with the work currently being done in the community this is not likely to change anytime soon.
In the discussion the approach taken by the Phonon abstraction layer which the Phonon project is advocating for inclusion in KDE4 also came up. I have held back blogging about Phonon for some time to avoid flamewars, but I don’t want to have efforts like OSDL delayed due to setups like Phonon being promoted or thought of as a workable solution for the issues faced. Let me start of with a brief introduction to the area of multimedia frameworks.
First of all multimedia frameworks are very complex systems, handling very hard technical problems in order to cater for issues ranging from problems from dealing with the analog past (for some technical information in this field I recommend fourcc.org), a forest of media formats with different traits, a host of features/deficiencies in hardware and a wide range of other software solutions to interact with like sound servers and legacy systems. On top of this you add performance and latency requirements, network protocols and multiplatform issues. In the end you have a problem space where even those who have worked on the issues for many years need to keep their head clear when designing the framework.
Multimedia frameworks are also by their nature abstraction layers themselves, trying to abstract away all the demuxers, muxers, decoders, encoders, cameras, soundcards, sound systems, network protocols and various types of filters into a coherent and userfriendly API.
With all this complexity the frameworks are struggling to not only bring you a coherent API, but also try to offer developers using them some high-level API’s that are useful for application developers. What we discovered in GStreamer is that while application developers initially ask for a ‘play this file’ API, which is what we offer through our ‘playbin’ component, they often end up pushing playbin to its limits and sometimes not use it at all in the end as they want to do fancy stuff which demands manipulating the pipelines more directly.
In many cases application authors want to do something which demands that a special plugin or filter to be written.
Now to get back to why I think Phonon is conceptually broken. First of all it is destined to fall into one of two traps. Either its API become so high level and limited that application developers will shun it due to a lack of features, meaning that you have an API useful for doing ‘ding’ sounds in standard applications, but anyone wanting more powerful operations will feel its a bigger hindrance than a help. On the other hand if they actually try to implement a feature set that is big enough to at least satisfy a subset, of for instance music player writers, then they will be forced into accessing things so deep in the frameworks that the operations become so framework specific that generalizing them away into a common API will at best be a kludge and at worst produce various broken behaviour changing depending on framework chosen.
Phonon also falls short in many other areas. For instance the stated goal is to let application writers write their applications against one API and have them work with a host of media frameworks. The reasoning is that no framework ‘does it all’ so having this flexibility is a good thing. This logic falls down quickly when you start thinking about it. While the general statement that no framework supports all formats or all features is true, the opposite, that a combination of multiple frameworks ‘do it all’, is equally untrue. An application developer who wants to add support for a new or rare audio format for instance would very often need to write the plugin or library to support it him/herself anyway, targeting an API with for instance 5 backends doesn’t make that job easier, it actually means that if you as an application developer want to ensure all your users can play this format he/she need to repeat the job five times.
So if you choose to standardize on one framework like GNOME has done with GStreamer then there is a chance that the application developer wants to support a format that GStreamer doesn’t currently support. But at least the developer have a clear idea where to add support for this format.
Also in a usability context telling the users to do hit and miss framework changes based on which media file they are trying to play is simply broken and I have a hard time believing that this is the user experience anyone wants to present users with for KDE4.
The counter argument to this is that Phonon do allow application developers to force or at least strongly suggest a specific backend for the user to use. This do solve the problem of the application developer knowing where to add support for something, but it also means that another stated goal of Phonon, to avoid enforcing such a ‘heavy’ dependency as GStreamer, might very well be replaced by enforcing 5 different mediaframeworks to be bundled with KDE4 as a whole. And if you think one framework is a heavy dependency then I promise you than five is not less. It also reduces the synergy effect of the KDE community a lot as it means that the work done by one music player author to add support for a new format will not be automatically available to the other KDE music players.
What we have been advocating for a long time from the GStreamer corner is that if both GNOME and KDE share a multimedia framework the synergy effect for both desktops will be huge.
My final objection to Phonon is that even if they manage to prove me wrong on their ability to provide a truly useful limited cross framework API and demonstrates that having a menu option offering your grandma to play her music using framework X,Y or Z actually solves more problems that it creates, I still think that it falls short. Because it wouldn’t provide an API to do applications like Pitivi, Diva, Jokosker, Buzztard, Flumotion and so on which I think is where we want to be at today in order to provide a competitive desktop. MacOS X and Windows Vista are showing us that this is the role that the desktop is heading towards.
One scenario I know have been contemplated is using Phonon, but at the same time saying that GStreamer is the recommended framework if you want to do something outside of the scope of Phonon. But my opinion is that in this use case focusing on Qt-style bindings for GStreamer is a better solution and a much easier thing to do and would result in something more useful for developers and users alike.
So I hope that interested people in the KDE community agrees with my analysis and starts working on Qt-style bindings for GStreamer, and as a result Phonon falls by the wayside. If not, well hopefully we will be able to cooperate on some of the lower level issues in the desktop, like improved driver handling through HAL for instance as the minimum.
All this said, people from the GStreamer community will of course try to help out people developing the GStreamer phonon backend for instance. We do try our best to try to help anyone using GStreamer, even when they do something we don’t believe in the viability or direction of. Zaheer Merali for instance has already volunteered to mentor or co-mentor anyone interested in working on Phonon-GStreamer integration as part of the Google Summer of code and as far as I know there where multiple proposals submitted for that.