Talks at FOSS.in 2012

Let me recap the talks held at FOSS.in a bit. It’s a bit late, I’m sorry for that, but the festive season was a bit demanding, timewise.

FOSS.IN

The conference started off smoothly with a nice Indian breakfast, coffee and good chats. The introductory talk by Atul went well and by far not as long as we expected it to be. Atul was obviously not as energetic as he used to be. I think he grew old and does visibly suffer from his illness. So a big round of applause and a bigger bucket of respect for pulling this event off nonetheless.

The first talk of the day was given by Gopal and he talked about “Big Data”. He started off with a definition and by claiming that what is considered to be big data now, is likely not to be considered big data in the future. We should think about 1GB RAM now in our laptops. Everybody ran 1GB or more in their laptops. But 10 years ago that would not have been the case. The only concept, he said, that survived was “Divide and Conquer”. That is to break up a problem into smaller sub problems which then can be run on many processing units in parallel. Hence distributed data and distributed processing was very important.

The prime example of big data was to calculate the count of unique items in a large set, i.e. compare the vocabulary of two books. You split up the books into words to find the single words and then count every one of them to find out how often it was present. You could also preprocess the words with a “stemming filter” to get rid of forms and flexions. If your data was big enough, “sort | uniq” wouldn’t do it, because “sort” would use up all your memory. To do it successfully anyway, you can split your data up, do the sorting and then merge the sort result. He was then explaining how to split up various operations and merge them together. Basically, it was important to split and merge every operation possible to scale well. And that was exactly what “Hadoop” does. In fact, it’s got several components that facilitate dealing with all that: “splitter”, “mapper”, “combiner”, “partitioner” , “shuffle fetch” and a “reducer”. However, getting data into Hadoop, was painful, he said.

Lydia from KDE talked about “Wikidata – The foundation to build your apps on“. She introduced her talk with a problem: “Which drugs are approved for pregnancy in the US?”. She said, that the Wikipedia couldn’t really answer this question easily, because maintaining such a list would be manual labour which is not really fascinating. One would have to walk through every article about a drug and try to find the information whether it was approved or not and then condense it to a list. She was aiming at, I guess, Wikipedia not really storing sematic data.

Wikidata wants to be similar to Wikimedia Commons, but for data of the world’s knowledge. It seems to that missing semantic storage which is also able to store information about the sources of the information that confirm correctness. Something like the GDP of a country or length of a river would be prime examples of use cases for Wikidata. Eventually this will increase the number of editors because the level to contribute will be lowered significantly. Also every Wikipedia language can profit immediately because it can be easily hooked up.

I just had a quick peek at Drepper’s workshop on C++11, because it was very packed. Surprisingly many people wanted to listen to what he had to say about the new C++. Since I was not really present I can’t really provide details on the contents.

Lenny talked about politics in Free Software projects. As the title was “Pushing Big Changes“, the talk revolved around issues around acquiring and convincing people to share your vision and have your project accepted by the general public. He claimed that the Internet is full of haters and that one needed a thick skin to survive the flames on the Internet. Very thick in fact.

An interesting point he made was, that connections matter. Like personal relationships with relevant people and being able to influence them. And he didn’t like it. That, and the talk in general, was interesting, because I haven’t really heard anyone talking about that so openly. Usually, everybody praises Free Software communities as being very open, egalitarian and what not. But not only rumour has it, that this is rarely the case. Anyway, The bigger part of the talk was quite systemd centric though and I don’t think it’s applicable to many other projects.

A somewhat unusual talk was given by Ben & Daniel, talking about how to really use Puppet. They do it at Mozilla at a very large scale and wanted to share some wisdom they gained.

They had a few points to make. Firstly: Do not store business data (as opposed to business logic) in Puppet modules. Secondly: Put data in “PuppetDB” or use “Hiera”. Thirdly: Reuse modules from either the “PuppetForge” or Github. About writing your own modules, they recommended to write generic enough code with parametrised classes to support many more configurations. Also, they want you to stick to the syntax style guide.

Sebastian from the KDE fame talked about KDE Plasma and how to make us succeed on mobile targets such as mobile phones or tablets. Me, not knowing “Plasma” at all, was interested to learn that Plasma was “a technology that makes it easy to build modern user interfaces”. He briefly mentioned some challenges such as running on multiple devices with or without touchscreens. He imagines the operating system to be provided by Mer and then run Plasma on top. He said that there was a range of devices that were supported at the moment. The developer story was also quite good with “Plasma Quick” and the Mer SDK.

He tried to have devices manufactured by Chinese companies and told some stories about the problems involved. One of them being that “Freedom” (probably as in Software Freedom) was not in their vocabulary. So getting free drivers was a difficult, if not impossible, task. Another issue was the size of orders, so you can’t demand anything with a order of a size of 10000 units, he said. But they seem to be able to pull it off anyway! I’m very eager to see their devices.

The last talk, which was the day’s keynote, went quite well and basically brought art and code together. He introduced us to Processing, some interesting programming IDE to produce mainly visual arts. He praised how Free Software (although he referred to it as Open Source) made everybody more creative and how the availability of art transformed the art landscape. It was interesting to see how he used computers to express his creativity and unfortunately, his time was up quite quickly.

Drepper, giving quite a few talks, also gave a talk about parallel programming. The genesis of problem was the introduction of multiple processors into a machine. It got worse when threads were introduced where they share the address space. It allowed for easy data sharing between threads but also made corrupting other threads very very easy. Also in subtle ways that you would not anticipate like that all threads share one working directory and if one thread changed it, it would be changed for all the threads of the process. Interestingly, he said that threads are not something that the end user shall use, but rather a tool for the system to exploit parallelism. The system shall provide better means for the user to use parallelism.

He praised Haskell for providing very good means for using threads. It is absolutely side effect free and even stateful stuff is modelled side effect free. So he claimed that it is a good research tool, but that it is not as efficient as C or C++. He also praised Futures (with OpenMP) where the user doesn’t have to care about the details about the whole threading but leave it up to the system. You only specify what can run in parallel and the system does it for you. Finally, he introduced into C++11 features that help using parallelism. There are various constructs in the language that make it easy to use futures, including anonymous functions and modelling thread dependencies. I didn’t like them all too much, but I think it’s cool that the language allows you to use these features.

There was another talk from Mozilla’s IT given by Shyam and he talked about DNSSec. He started with a nice introduction to DNSSec. It was a bit too much, I feel, but it’s a quite complicated topic so I appreciate all the efforts he made. The main point that I took away was to not push the DS too soon, because if you don’t have signed zones yet, resolvers don’t trust your answers and your domain is offline.

Olivier talked about GStreamer 1.0. He introduced into the GStreamer technology by telling that its concept is around elements, which are put in bins and that elements have source and sink pads that you connect. New challenges were DSPs, different processing units like GPUs. The new 1.0 included various new features better locking support that makes it easier for languages like Python or better memory management with GstBufferPool.

I couldn’t really follow the rest of the talks as I was giving one myself and was busy talking to people afterwards. It’s really amazing how interested people are and to see the angle they ask questions from.

Leave a Reply

Your email address will not be published. Required fields are marked *

Creative Commons Attribution-ShareAlike 3.0 Unported
This work by Muelli is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported.