The value of open source (for customers)

freesoftware, General, openstack, red hat, work 1 Comment

Over at community.redhat.com, the Red Hat community blog, I have posted an article detailing some of the value I see to customers of companies who support and build on free software. The article is basically notes from a presentation I will be giving next Wednesday at the Red Hat Summit, “Community Catalysts: The Value of Open Source Community Development”. The problem statement?

It’s not always obvious, however, what the value of that is to our customers. The four freedoms of the free software definition which personify open source software – the freedom to use, study, modify and share modified copies of the software – at first glance appear to benefit only participants in open source communities. If you are a customer of a company like Red Hat, does it really matter that you have access to the source code, or that you can share the software with others? Aren’t customers, in some sense, paying us to “just take care of all of that stuff?”

This line of thought is not original, but it’s one I’ve had for a long time – and others such as Simon Phipps have given voice to similar insights in the past. Hopefully I can give it a fresh treatment for Red Hat Summit attendees next week!

Joining Red Hat

community, freesoftware, work 30 Comments

I’m joining Red Hat on May 2nd, where I’ll be working with the Open Source and Standards team. We’ll be working hard to help all of the projects that Red Hat contributes to kick ass at growing community. I have known several of the team from years gone past, and interviewing for the position was frankly a pleasure.

I’d like to thank Karsten Wade for thinking of me and making the connections back in February. When my future boss described the position and the team to me around then, and asked me whether I might be interested, I couldn’t help myself from saying “I think you just described my dream job”. I know you’re supposed to show restraint and play hard to get in these situations, but I got carried away.

Red Hat is one of the few companies out there that could tempt me away from independent consulting. They have a range of projects covering the server, desktop, middleware, cloud services and virtualisation. They are the top, or one of the top, corporate contributors to dozens of projects I use every day. I love the Red Hat philosophy of working with communities to make great Free and Open Source software.

Of course, that doesn’t mean that everything is roses. There are projects within Red Hat (or that Red Hat contributes heavily to) which need to improve their community processes, that could do a better job of promoting themselves, or that have hung on to former business models post-acquisition, at the expense of community growth. And it’ll be our job to fix those issues. It’ll be challenging, it’ll be a slow, incremental process. But I have no doubt that it will be very rewarding. I’m looking forward to it!

Connecting Apache httpd to Tomcat with mod_jk: The bare minimum

freesoftware, General, work 3 Comments

Earlier this week, I wrote:

I hate docs that tell you what to do, but not why. As soon as a package name or path changes, you’re dust. This is maybe the 4th time I’ve been configuring Apache to delegate stuff to Tomcat using mod_jk, and every time is just like the first.

For those who don’t know, mod_jk is a module inplementing the wire protocol AJP/13, which allows a normal HTTP web server to forward on certain requests to a second server. In this case, we want to forward requests for JSP pages and servlets to Tomcat 6. This allows you to do neat things like serve static content with Apache and only forward on the dynamic Java stuff to Tomcat. The user sees a convenient URL (no port :8080 on the hostname) and the administrator gets to serve multiple web scripting languages on the same server, or load balance requests for Java server resources across several hosts.

I have spent enough time on it at this point, I think, that I understand all of the steps in the process, and have stripped it down to the bare minimum that one would need to do in terms of configuration to get things working. And so I’m putting my money where my mouth is, and this is my attempt to write a nice explanation of how mod_jk does its thing, and how to avoid some of the common mistakes I had.

First, a remark: Apache is one of those pieces of software that has gotten harder, rather than easier, to configure as time has gone on. Distributions each package it differently, with different “helpful” mechanisms that make common tasks like enabling a module easier, and to enable convenient packaging of modules like PHP, independent of the core package. But the overall effect is that a lot of magic done by distributions makes it much harder to follow the upstream documentation. Config files are called different names, or stored in different places. Different distributions handle the inclusion of config file snippets differently. And so on.

This is not to say that Apache, Tomcat and mod_jk don’t have some nice docs – they do, but often the docs don’t correspond to the distros, or haven’t been updated in a while, and often they don’t explain why you have to do something, putting emphasis instead on what you need to do. After all my reading, I finally found the Holy Grail I was looking for – the simple document of how to configure mod_jk – but even this has its shortcomings. The article doesn’t mention Tomcat, for example, which left me digging around for information on the configuration I needed to do to Tomcat, which led me to this, which led me to over-write the sample workers.properties file in the simple set-up document.

But if you understand the First Principles, you can figure out what’s going on with any organisation of configuration. That’s what I’m hoping to get across here.

How does mod_jk do its thing?

The first issue I had trouble getting my head around was how, exactly, all this was supposed to work. In particular, I didn’t quite understand how the configuration worked on the Tomcat size of things.

As I understand it, here’s what happens:

  • A GET request comes in to httpd for http://localhost/examples/jsp/num/numguess.jsp
  • Apache processes the request, and find a matching pattern for the URL among JkMount directives
  • Apache then reads the file specified by the JkWorkersFile option to figure out what to do with the request. Let’s say that config file says to forward to localhost:8009 using the protocol ajp13
  • Tomcat has a Connector listening on port 8009, with the protocol AJP/13, which handles the request and replies on the wire. Apache httpd sends the reply back to the client

Apache httpd configuration

There are two steps to configuring Apache:

  1. Enabling the module
  2. Configuring mod_jk

Apache provides a handy utility called “a2enmod” which will enable a module for you, once it’s installed. What happens behind the scenes for modules depends on the distribution. On Ubuntu, module load instructions are put in a file called /etc/apache2/mods-available/<module>.load optionally alongside a sample configuration file /etc/apache2/mods-available/<module>.conf. To enable the module, you create a symlink to the .load file in /etc/apache2/mods-enabled.

On my Ubuntu laptop, my jk.load contains:

LoadModule jk_module /usr/lib/apache2/modules/mod_jk.so

On OpenSuse, on the other hand, a line similar to this is explicitly added to the file /etc/apache2/sysconfig.d/loadmodule by sysconfig, based on the contents of a field in the configuration file /etc/sysconfig/apache2 – remember how I said that distro packaging makes things harder? If you added the line directly to the loadmodule file, the change would be lost the next time Apache restarts.

In both cases, these files (on Ubuntu, the mods-available/*.load files, and on OpenSuse the sysconfig.d/* files) are loaded by the main Apache config file (httpd.conf) at start-up.

Configuring mod_jk

The minimum configuration that mod_jk needs is a pointer to a Workers definition config file (JkWorkersFile). Other useful configuration options are a path to a log file (JkLogFile – which should be writable by the user ID which owns the httpd process) and a desired log level – I set JkLogLevel to “debug” while getting things set up. On OpenSuse, I also needed to set JkShmFile, since for the default file location (/srv/www/logs/jk-runtime-status) the directory didn’t exist and wasn’t writable by wwwrun, the user that owns the httpd process.

This configuration, and the configuration of paths below, is usually in a separate config file – in both Ubuntu and OpenSuse, it’s jk.conf in /etc/apache2/conf.d (files ending in .conf in this directory are automatically parsed at start-up). To avoid errors in the case where mod_jk is not present or loaded, you can surround all Jk directives with an “<IfModule mod_jk.c>…</IfModule>” check if you’d like.

The JkMount directive configures what will get handled by which worker (more on workers later). It takes two arguments: a path, and the name of the worker to handle requests matching the path. Unix wildcards (globs) are accepted, so

JkMount /examples/*.jsp ajp13_worker

will match all files under /examples ending in .jsp and will pass them off to the ajp13_worker worker.

If you want Apache to serve any static content under your webapps, you’ll also need either a Directory or Alias entry to handle them. Putting together with the previous section, the following (from Ubuntu) was the jk.conf file I used to pass the handling of JSPs and servlets off to Tomcat, and serves static stuff through Apache:

<IfModule mod_jk.c>

JkWorkersFile   /etc/libapache2-mod-jk/workers.properties
JkLogFile       /var/log/apache2/mod_jk.log
JkLogLevel      debug
Alias /examples /usr/share/tomcat6-examples/examples
JkMount /examples/*.jsp ajp13_worker
JkMount /examples/servlets/* ajp13_worker

</IfModule>

I should use Directory to prevent Apache from serving anything it shouldn’t, like Tomcat config files under WEB-INF – I could also just use “JkMount /examples/* ajp13_worker” to have everything handled by Tomcat.

Now that Apache’s config is done, we need to configure mod_jk itself, via the workers.properties file we set in the JkWorkersFile parameter.

workers.properties

Sample workers.properties files contain a lot of stuff you probably don’t need. The basic, unavoidable parameters you will need are the name of a worker (which you’ve already used as the 2nd argument for JkMount above), and a hostname and port to send requests to, and a protocol type (there are several options for worker type besides AJP/1.3 – “lb” for “load balancer” is the most important to read up on). For the above jk.conf, the simplest possible workers.properties file is:

worker.list=ajp13_worker
worker.ajp13_worker.port=8009
worker.ajp13_worker.host=localhost
worker.ajp13_worker.type=ajp13

And that’s it! The last step is to set up Tomcat to handle AJP 1.3 requests on port 8009.

Configuring Tomcat

In principle, Tomcat doesn’t need to know anything about mod_jk.It just needs to know that requests are coming in on a given port, with a given protocol.

Typically, an AJP 1.3 connector is already defined in te default server.xml (in /usr/tomcat6 on both Ubuntu and OpenSuse) when you install Tomcat. The format of the connector configuration is:

<Connector port=”8009″ protocol=”AJP/1.3″ redirectPort=”8443″ />

I am pretty sure that this will work without the redirectPort option, but I haven’t tried it. It basically allows requests received with security constraints specifying encryption to be handled over SSL, rather than unencrypted.

In addition to this, Tomcat does provide a facility to auto-create the appropriate mod_jk configuration on the fly. To do so, you need to specify an ApacheConfig in the Tomcat connector, and point it at the workers.properties file. This facility looks pretty straightforward, but I know I found it confusing in the past when I lost edits to the jk.conf file – I prefer manual configuration myself.

Gotchas

I have had quite a few gotchas while figuring all this out – I may as well share for the benefit of future people having the same problems.

  • All the documentation for mod_jk installedd with the packages refers to Tomcat5 paths – for example, on OpenSuse, in the readme, I was asked to copy workers.config into /etc/tomcat5/base – a directory which doesn’t exist (even when you change the 5 to a 6)
  • If your apache web server uses virtual hosts (and, on Ubuntu, it does by default) then JkMounts are not picked up from the global configuration file! You need to either add “JkMountCopy true” to the VirtualHost section, or have JkMounts per VirtualHost. If you used Alias as I did above, and you try to run a servlet, the error message is just a 404. If you try to load a JSP, you will see the source.
  • If you make a mistake in your workers.property file (I had a typo “workers.list=ajp13_worker” for several hours) and your worker name is not found in a “worker.list” entry, you will see no error message at all with warnings set to error or info. With the warning level set to debug, you will see the error message “jk_translate::mod_jk.c (3542): no match for /examples/ found” The chances are you have a typo in either your jk.conf file (check that the name of the worker corresponds to the name you use in workers.properties), or you have a typo somewhere in your workers.properties file (is it really work.list? Does the worker name match? Is it the same as the worker name in the .host, .port and .type configuration?
  • Make sure you get Tomcat working correctly first and working perfectly on port 8080 – or you won’t know whether errors you’re seeing are Tomcat errors, Apache errors or mod_jk errors.

I’m sure I’ve made mistakes and forgotten important stuff – I’m happy to get feedback in the comments.

Community Software Development training course

community, work Comments Off

For the past few months, I have been offering a new service – a training course tailored to helping a team be effective working with community projects – whether that is engaging an existing community, or growing a community around new code. Details of the topics I cover are up on my site. Developing software in community is as much a social activity as it is a technical activity – and engaging an existing community, like moving into a new neighbourhood or starting at a new school, can be very daunting indeed. This course covers not just the technical issues of community development, but also the social, management and strategic issues involved. Some of the questions that I try to help answer are:

  • What are the tools and communication norms?
  • How can I get answers to my questions?
  • Is there a trick to writing patches that get reviewed quickly?
  • How do I figure out who’s in charge?
  • How much will it cost me to open source some code/to work with an existing project?
  • How does managing volunteers work?
  • Is there anything I can do to help my developers be more vocal upstream?
  • What legal issues should my developers be aware of?

All of these things, in my experience, are challenges that organisations have to overcome when they start engaging with community projects like Apache, GNOME or the Linux kernel.

If you’re having trouble with these issues, or some subset of them, and are interested in a training seminar, please contact me, and we’ll talk.

What community?

community, General, gnome, maemo, work 5 Comments

With the announcement of Tizen (pronounced, I learned, tie-zen, not tea-zen or tizz-en) recently, I headed over to the website to find out who the project was aimed at. I read this on the “Community” page:

The Tizen community is made up of all of the people who collectively work on or with Tizen:

  • Product contributors: kernel/distribution developers, release managers, quality assurance, localization, etc.
  • Application developers: people who write applications to run on top of Tizen
  • Users: people who run Tizen on their device and provide feedback
  • Vendors: companies who create products based on Tizen
  • Other contributors: promotion, documentation, and much more

Anyone can contribute by:

  • Submitting patches
  • Filing bugs
  • Developing applications
  • Helping with wiki documentation
  • Participating in other community efforts and programs

Wow! That’s a diverse target audience, and a very wide ranging list of ways you can help out. But is it really helpful to scope the project so wide, and try to cater to such a wide range of use-cases from the start? And is the project at a stage where it even makes sense to advertise itself to some of these different types of users?

I have talked about the different meanings of “maintainer” before, depending on whether you’re maintaining a code project or are a package maintainer for a distribution. I have also talked about the different types of community that build up around a project, and how each of them needs their own identity – particularly in the context of the MeeGo trademark. I particularly like Simon Phipps’s analysis of the four community types as a way to clarify what you’re talking about.

For Tizen, I see between three and five different types of community, each with different needs, and each of which can form at different stages in the life-cycle of the project. Trying to “sell” the project to one type of community before the project is ready for them will result in disappointment and frustration all round – managing the expectations of people approaching Tizen will be vital to its long-term success, even if it opens you up to short-term criticism. Unless each of these communities is targeted individually and separately, and at the right time, I am sceptical about the results.

“Upstream” software developers

The first and most identifiably “Open Source” family of communities will be the software developers working on components and applications which will end up in the core of Tizen. For the most part, these communities exist already, and Samsung and Intel engineers are working with them. These are the projects we commonly call “upstreams” – projects you don’t control, but from whom code flows into your product.

In other cases, code will originate from Intel and/or Samsung. In the same way that Buteo, oFono and the various applications which were developed for the MeeGo Netbook UX were very closely associated with MeeGo, there will be similar projects (sometimes the same projects) which will have a close association with Tizen. Each of these projects will have their own personality, their own maintainers, roadmaps, specs – and each of them should have their own identity, and space to collaborate and communicate.

Communities form around programming projects not because of the code, but because of a shared vision and values. Each project will attract different people – the people who are interested in metadata and search are not the same as the people who will be passionate about system-wide contact integration. Each project needs its own web space, maintainers, bug tracker, mailing list, and wiki space. Of course, many projects can share the same infrastructure, and a lot of the same community processes (for things like code governance), and for projects closely related to Tizen, we can provide common space to help create a Tizen developer community in the same way there’s a GNOME developer community. But each community around each component will have its own personality and will need its own space.

At the level of Tizen, we could start with an architecture diagram, perhaps – and for each component on the architecture diagram, link to the project’s home page – many of the links will point to places like kernel.org, gnome.org, freedesktop.org and so on. For Tizen-specific projects, there could be a link to the project home page, with a list of stuff that needs to be done before the component is “ready”.

Core platform packagers, testers, integrators

Once we have a set of components which are working well together, we get to the heart of what I think will be Tizen’s early activity – bringing those components together into a cohesive whole. Tizen will be, basically, a set of distributions aimed at different form factors. And the deliverable in a distribution is not code or a Git tag, it’s a complete, integrated stack.

The engineering skills, resources and processes required to integrate a distribution are different to those of a code project. Making a great integrated Linux platform is obviously difficult – otherwise Red Hat would not be making money, and Ubuntu would not have had the opportunity to capture so much mind-share. Both Red Hat and Canonical do something right which others failed at before them.

Distributions attract a different type of contributor than code projects, and need a different set of tools and infrastructure to allow people to collaborate.At the distribution level, it is more likely you will be debating whether or not to integrate a particular package or its competitor than it is to debate whether to implement a feature in a specific package. Of course, it is possible to influence upstream projects to get specific features implemented, not least by providing developer resources, and there will be a need for some ambassadors to bridge the gap to upstream projects. And it is possible for a distribution to carry patches to upstream packages if that community disagrees. But in general, not much code gets written in distributions.

What the distro community needs and expects is infrastructure for continuous integration, bug tracking software, a way to submit and build software packages, good release engineering, an easy way to find out what packages need a maintainer (see Debian’s WNPP list  or Ubuntu’s “need-packaging” list for examples) and a way to influence what packages or features are included in future releases (see Fedora or Ubuntu for examples). They also want tools to allow packaging, testing and  deploying the integrated distribution – for an embedded distro, that might mean an emulator and an image creator, perhaps.

Vendors and carriers

Communities of companies are worth a special mention. Companies have very different ways of working together and agreeing on things than communities of individuals. I was tempted to just roll vendors into the “Platform integrators” community type, but they are sufficiently different to be considered another type of community. Vendors have different constraints and motivations than individual contributors to the platform, and we should be aware of those.

Vendors like to have a business relationship – some written agreement that shows where everyone stands. They have a direct relationship with people who buy their hardware, and have an interest (potentially in conflict with other communities) in owning the user relationship – through branded application stores, UI and support forums, for example. And since vendors are typically working on hardware development in parallel with software development, they care a lot about a reliable release schedule and quality level from the stack. Something that companies care about which individuals usually don’t are legal concerns around working with the process – do they have patent rights to the code they ship? Are they giving up any of their own potential patent claims?

3rd party application developers

Application developers don’t care, in general, whether the platform is open source or closed, or developed collaboratively or by one party (witness the popularity of Android and iOS with application developers). What they do care about are developer tools, documentation, and the ability to share their work with device users and other application developers. Some application developers will want to develop their applications as free software, and it is possible to enable that, but I think the most important thing for application developers is that it’s easy to do things with your platform, that there are good tools for developing, testing and deploying your application, that your platforms APIs are enabling the developer to do what he wants, and that you are providing a channel for those developers to get their apps to users of your platform.

An application developer doesn’t want to have to ship his software to 5 different app stores on every release – in contrast to vendors, he would like a single channel to his market. Other things he cares about are being able to form a relationship with his users – so app stores need to be social, allow user ratings and comments, and allow the author to interact with his users. Clear terms of engagement are vital here too – especially for commercial application developers. And application developers are also another type of community – they will want to share tips and tricks, code, and their thoughts on the project leaders in some kind of app developer knowledge base.

Device users

There is another potential community which I should mention, and that is users of your platform – typically, these will be users of devices running your platform. It should be possible for engaged users to share information, opinions, tips & tricks, and interesting hacks among each other. It should also be possible to rate and recommend applications easily – this is in the interests of both your user community and your application developer ecosystem.

OK, so what?

Each of these community types is different, and they don’t mix well. They mature at different rates. There is no point in trying to build a user platform until there are devices running your platform on the market, for example

So each type of community needs a separate space to work. There is no point in catering to a 3rd party application developer until you have developer tools and a platform for him to develop against. Vendors will commit to products when they see a viable integrated platform. And so on.

What is vital is to be very clear, for each type of community, what the rules of engagement are. As an example, one company can control the integration of a platform and the development of many of its components (as is the case for Android) and everyone is relatively happy, because they know where they stand and what they’re getting into. But if you advertise as an open and transparent project, and a small group of people announce the decisions of what components are included or excluded from the stack (as was the case in MeeGo), then in spite of being vastly more open, people who have engaged with the project will end up unhappy, because of a mismatch between the message and the practice in the project.

So what about Tizen? I think it is a mistake to announce the projects as a place to “submit patches, report bugs and develop applications” when there is no identifiable code base, no platform to try, and no published SDK to develop against. By announcing that Tizen is an Open Source platform, Intel and Samsung have set an expectation for people – and these are people who have gone through the move to MeeGo under two years ago, and who have seen Nokia drop the project earlier this year. If they are disappointed by the project’s beginnings because the expectations around the project have been set wrong from the offset, it could take a long time to recover.

Personally, I would start low-key by announcing an architecture diagram and concentrating on code and features that need writing, then ramp up the integrator community with some alpha images and tools to allow people to roll their own; finally, when the platform stabilises roll out the developer SDK and app store and start building up an application developer community. But by aiming too big with the messaging, Tizen runs the risk of scaring some people away early. Time will tell.

 

Humanitarian Software – Using technology to help humanity

community, freesoftware, marketing, work Comments Off

Tomorrow, Friday September 23rd, the Humanitarian FOSS track at the Open World Forum will bring together leaders from some of the most important humanitarian software projects and case studies of the impact these projects are having on people’s lives around the world. I’m happy to have been allowed to chair the track, and I am humbled by the quality of the presenters and the impact that their work is having.

In addition to the Humanitarian track, we are also honoured to have Laura Walker Hudson from FrontlineSMS give a keynote presentation on the overarching theme of “Humanitarian FOSS – serving humanity” in the main auditorium at 17:15. Laura will give an overview of the myriad ways that free and open source software is saving and helping people’s lives.

The Humanitarian track will have two core themes:

  • Crisis Management– how Free and Open Source Software plays a role in extreme events
    • The Sahana project, born in Sri Lanka after the 2004 tsunami in the Indian Ocean, helps NGOs and citizens caught in a crisis by crowd-sourcing missing persons reports, co-ordinate different NGOs working in the same place, and track incident reports and volunteer co-ordination.
    • Tashiro Shuichi from Japan will present the ways that Open Source software helped during the tsunami disaster in Japan.
    • Syrine Tlili from the Tunisian Ministry of Communication Technologies will tell us how Open Source was used by citizens during the Arab Spring revolutions
    • Sigmah is a project that enables project management for NGOs
  • Sustainable Development– once the crisis is over, what are the projects that help with systemic problems like education, health-care, sanitation, and documenting human rights violations?
    • SMS is the killer app for communication in the developing world. Most villages in Africa, Asia and South America have cellphone connectivity, but unreliable power grid, Internet and no phone lines. FrontlineSMS enables you to send and receive SMS messages from any computer, using a cheap phone or GSM modem. It is at the heart of every prominent humanitarian software project.
    • Sugar is an operating system which was designed from the ground up to meet the needs of educators in developing countries, as part of the OLPC (One Laptop Per Child) project to revolutionize the use of technology in education. Sean Daly from the Sugar project will show us a deployment of Sugar and OLPC in a secondary school in a small town in Madagascar.
    • Martus, a project created by Benetech, allows the secure recording and storage of testimony relating to human rights violations. Testimony collected with Martus has been used to successfully prosecute police officers for murder in Guatemala.
    • Mifos, which was developed by Grameen Bank, the pre-cursor of micro-financing, provides a micro-financing platform for financial institutions.
    • Akvo help connect doers and donors to transform communities in some of the poorest parts of the world, funding water, sanitation, and health-care projects around the world.
    • The Open Bank Project promotes financial transparency and provides tools to allow people to fight corruption in banking.

Coders for Social Good

There are dozens of amazing Free/Open Source Software projects working to improve the lives of people around the world. For example, Literacy Bridge provides talking books to communities in Africa, and OpenMRS enables the gathering of medical information from regional clinics to reduce child mortality by improving resource allocation.

Many Open Source developers are developing software in communities because they want to make the world a better place. Working on a humanitarian project provides a unique opportunity to combine the social good of Open Source community projects and the public good of helping people in need. Social Coding 4 Good is a new initiative from Benetech which puts willing volunteers in contact with humanitarian projects in need of resources.

The schedule for the track is available on the Open World Forum website. For any press or interview requests, please contact me by email dave@neary-consulting.com or my cellphone +33 6 77 01 92 13.

 

Where do we go from here?

community, freesoftware, maemo, meego, work 14 Comments

The post-Elopocalypse angst has been getting me down over the past few days. It’s against my nature to spend a lot of time worrying about things that are decided, done, dusted. It was Democritus, I think, who said that only a fool worries about things over which he has no control, and I definitely identify with that. It seems that a significant number of people on mailing lists I’m subscribed to don’t share this character trait.

I prefer to roll with the punches, to ask, “where do we go from here?” – we have a new landscape, with Nokia potentially being a lot less involved in MeeGo over the coming months. Will they reduce their investment in 3rd party developers? Perhaps. I expect them to. Will they lay some people off? I bet that there will be a small layoff in MeeGo Devices, but I’d wager that there will be bigger cuts in external contracts. In any case, this is something over which I have no control.

First up – what next for MeeGo? While MeeGo is looking a lot less attractive for application developers now, I still think there’s a great value proposition for hardware vendors to get behind it in vertical markets. Intel seem committed, and MeeGo (even with Nokia reducing investment) is much broader than one company now. A lot of people are betting the bank on it being a viable platform. So I think it will be, and soon.

Will I continue contributing time & effort to MeeGo? My reasons for contributing to MeeGo were not dependent on Nokia’s involvement, so yes, but I will be carefully eyeing business opportunities as well. I’d be lying if I said that I didn’t expect to get some business from a vibrant MeeGo ecosystem, and now I will need to explore other avenues. But the idea of collaborating on a core platform and building a set of free software form-factor specific UIs is still appealing. And I really do like the Maemo/MeeGo community a lot.

Luckily, the time to market difficulties that Nokia experienced are, in my opinion, issues of execution rather than inherent problems in working with free software. Companies have a clear choice between embracing proprietary-style development and treating upstream as “free code” (as Google have with Android), or embracing community-style development and working “The Open Source Way” (as Red Hat have learned to do). Nokia’s problems came from the hybrid approach of engage-but-keep-something-back, which prevented them from leveraging community developers as co-developers, while at the same time imposing all the costs of growing and supporting a large community.

I expect lots of companies to try to learn from this experience and start working smarter with communities – and since that’s where I can help them, I’m not too worried about the medium term.

I would bet on Nokia partners and subcontractors battening down the hatches right now until the dust settles, and potentially looking for revenue sources outside the MeeGo world. If I had a team of people working for me that’s what I’d do. If some Nokia work kept coming my way, I’d be glad of it, but right now I’d be planning a life without Nokia in the medium term.

For any companies who have followed Nokia from Symbian to MeeGo, my advice would be to stick to Linux, convert to an Android strategy, and start building some Windows Phone skills in case Nokia’s bet works out, but don’t bet the bank on it. And working effectively with community developed software projects is a key skill for the next decade that you should be developing (a small plug for my services there).

For anyone working on MeeGo within Nokia, the suspense over who might lose their jobs is worse than the fall, let me reassure you. Having been through a re-org or two in my time, I know that the wait can last weeks or months, and even when the cuts come, there’s always an itching suspicion of another one around the corner. Nothing is worse for morale in a team than wondering who will still be there next month. But you have learned valuable and sought-after skills working on MeeGo, and they are bankable on the market right now. If I were working on MeeGo inside Nokia right now, I think I’d ignore the possibility of a lay-off and get on with trying to make the MeeGo phone as great as possible. If I got laid off, I’d be happy to have a redundancy package worthy of Finland, and would be confident in my ability to find a job as a Linux developer very quickly.

For community members wondering whether to stick with MeeGo or jump ship, I’d ask, why were you hanging out around MeeGo in the first place? Has anything in the past week changed your motivations? If you wanted to have a shiny free-software-powered Nokia phone, you should have one by the end of the year. If you wanted to hack on any of the components that make up MeeGo, you can still do that. If you were hoping to make money off apps, that’s probably not going to happen with MeeGo on handsets any time soon. If you’re not convinced by the market potential of MeeGo apps on tablets, I’d jump ship to Android quick (in fact, why aren’t you there already?).

Qt users and developers are probably worried too. I don’t think that Qt is immediately threatened. The biggest danger for Qt at this point would be Intel & others deciding that Qt was a bad choice and moving to something else. That would be a massive strategic blunder – on a par with abandoning the GTK+ work which had been done before moblin 2 to move to Qt. Rewriting user interfaces is hard and I don’t think that Intel are ready to run the market risk of dropping Qt – which means that they’re pot-committed at this point. If Nokia ever did decide to drop Qt, Intel would probably be in the market to buy it. Then again, I can also see how Qt’s management might try to do an LMBO and bring the company private again. Either way, there will be a demand for Qt, and Qt developers, for some time to come.

No-one likes the guy giving unwanted advice to everyone, so this seems like a good place to stop. My instinct when something like this happens is to take a step back, see what’s inherently changed, and try to see what the landscape looks like from different perspectives. From my perspective, the future is definitely more challenging than it was a week ago, but it’s not like the Elopocalypse wiped out my livelihood. In fact, I have been thinking about life without Nokia since MeeGo was first announced last year, when I guessed that Nokia would prefer working through the Linux Foundation for an independent eye.

But even if Nokia were my only client, and they were going away tomorrow, I think I could probably find other clients, or get a job, quickly enough. It’s important to put these things in perspective.

Drawing up a roadmap

community, freesoftware, General, gimp, gnome, maemo, openwengo, work 6 Comments

One of the most important documents a project can have is some kind of elaboration of what the maintainers want to see happen in the future. This is the concrete expression of the project vision – it allows people to adhere to the vision, and gives them the opportunity to contribute to its realisation. This is the document I’ll be calling a roadmap.

Sometimes the word “roadmap” is used to talk about other things, like branching strategies and release schedules. To me, a release schedule and a roadmap are related, but different documents. Releasing is about ensuring users get to use what you make. The roadmap is your guiding light, the beacon at the end of the road that lets you know what you’re making, and why.

Too many projects fall into the trap of having occasional roadmap planning processes, and then posting a mighty document which stays, unchanged, until the next time the planning process gets done. Roadmaps like these end up being historical documents – a shining example of how aspirations get lost along the way of product development.

Other projects are under-ambitious. Either there is no roadmap at all, in which case the business as usual of making software takes over – developers are interrupt-driven, fixing bugs, taking care of user requests, and never taking a step back to look at the bigger picture. Or your roadmap is something you use to track tasks which are already underway, a list of the features which developers are working on right now. It’s like walking in a forest at night with a head-light – you are always looking at your feet avoiding tree-roots, yet you have no idea where you’re going.

When we drew up the roadmap for the GIMP for versions 2.0 and 2.2 in 2003, we committed some of these mistakes. By observing some projects like Inkscape (which has a history of excellent roadmapping) and learning from our mistakes, I came up with a different method which we applied to the WengoPhone from OpenWengo in 2006, and which served us well (until the project became QuteCom, at least). Here are some of the techniques I learned, which I hope will be useful to others.

Time or features?

One question with roadmaps is whether hitting a date for release should be included as an objective. Even though I’ve said that release plans and roadmaps are different documents, I think it is important to set realistic target dates on way-points. Having a calendar in front of you allows you to keep people focussed on the path, and avoid falling into the trap of implementing one small feature that isn’t part of your release criteria. Pure time-based releases, with no features associated, don’t quite work either. The end result is often quite tepid, a product of the release process rather than any design by a core team.

I like Joel’s scheduling technique: “If you have a bunch of wood blocks, and you can’t fit them into a box, you have two choices: get a bigger box, or remove some blocks.” That is, you can mix a time-based and feature-based schedule. You plan features, giving each one a priority. You start at the top and work your way down the list. At the feature freeze date, you run a project review. If a feature is finished, or will be finished (at a sufficient quality level) in time for release, it’s in. If it won’t realistically be finished in time for the release date, it’s bumped. That way, you stick to your schedule (mostly), and there is a motivation to start working on the biggest wood blocks (the most important features) first.

A recent article on lessons learned over years of Bugzilla development by Max Kanat-Alexander made an interesting suggestion which makes a lot of sense to me – at the point you decide to feature freeze and bump features, it may be better to create a release branch for stabilisation work, and allow the trunk to continue in active development. The potential cost of this is a duplication of work merging unfinished features and bug fixes into both branches, the advantage is it allows someone to continue working on a bumped feature while the team as a whole works towards the stable release.

Near term, mid term, long term

The Inkscape roadmap from 2005 is a thing of beauty. The roadmap mixes beautifully long-term goals with short-term planning. Each release has a by-line, a set of one or two things which are the main focus of the release. Some releases are purely focussed on quality. Others include important features. The whole thing feels planned. There is a vision.

But as you come closer and closer to the current work, the plans get broken down, itemised further. The BHAGs of a release in 2 years gets turned into a list of sub-features when it’s one year away, and each of those features gets broken down further as a developer starts planning and working on it.

The fractal geometer in me identifies this as a scaling phenomenon – coding software is like zooming in to a coastline and measuring its length. The value you get when measuring with a 1km long ruler is not the same as with a 1m ruler. And as you get closer and closer to writing code, you also need to break down bigger tasks into smaller tasks, and smaller tasks into object design, then coding the actual objects and methods. Giving your roadmap this sense of scope allows you to look up and see in the distance every now and again.

Keep it accurate

A roadmap is a living document. The best reason to go into no detail at all for  future releases beyond specifying a theme is that you have no idea yet how long things will take to do when you get there. If you load up the next version with features, you’re probably aiming for a long death-march in the project team.

The inaccurate roadmap is an object of ridicule, and a motivation killer. If it becomes clear that you’re not going to make a date, change the date (and all the other dates in consequence). That might also be a sign that the team has over-committed for the release, and an opportunity to bump some features.

Leave some empty seats

In community projects, new contributors often arrive who would like to work on features, but they don’t know where to start. There is an in-place core team who are claiming features for the next release left & right, and the new guy doesn’t know what to do. “Fix some bugs” or “do some documentation” are common answers for many projects including GNOME (with the gnome-love keyword in Bugzilla) and LibreOffice (with the easy hacks list). Indeed, these do allow you to get to know the project.

But, as has often been said, developers like to develop features, and sometimes it can be really hard what features are important to the core team. This is especially true with commercial software developers. The roadmap can help.

In any given release, you can include some high priority features – stuff that you would love to see happen – and explicitly marked as “Not taken by the core team”. It should be clear that patches of a sufficiently high standard implementing the feature would be gratefully accepted. This won’t automatically change a new developer into a coding ninja, nor will it prevent an ambitious hacker from biting off more than he can chew, but it will give experienced developers an easy way to prove themselves and earn their place in the core team, and it will also provide some great opportunities for mentoring programs like the Google Summer of Code.

The Subversion roadmap, recently updated by the core team, is another example of best practice in this area. In addition to a mixed features & time based release cycle, they maintain a roadmap which has key goals for a release, but also includes a separate list of high priority features.

The end result: Visibility

The end result of a good roadmap process is that your users know where they stand, more or less, at any given time. Your developers know where you want to take the project, and can see opportunities to contribute. Your core team knows what the release criteria for the next release are, and you have agreed together mid-term and long-term goals for the project that express your common vision. As maintainer, you have a powerful tool to explain your decisions and align your community around your ideas. A good roadmap is the fertile soil on which your developer community will grow.

The Ladybird Guide to Business Intelligence

work 6 Comments

Recently I have found myself frustrated by the lack of a very simple overview for Business Intelligence explaining what problems it solves, and how.

For example, the Pentaho BI platform FAQ has a promising first question: “What is a business intelligence (BI) platform?” The answer is typical of BI overviews I have seen:

A comprehensive development and runtime environment for building complete solutions to business intelligence problems. The Pentaho BI Platform is the infrastructure and core services that integrate business intelligence components to complete the BI Suite. This includes the infrastructure necessary to build, deploy, execute and support applications.

I don’t know about you, but that gives me more questions than answers. What type of problems are business intelligence problems? What are the core services provided by a BI platform? What are BI components, and a complete BI suite? In short, what does it do?

The wikipedia article on business intelligence is a bit better, but still gets into heavy acronyms quite early.

I think I have figured out what Don Norman calls a conceptual model which is about right, so for those who have struggled as I have recently, here is the Ladybird guide to Business Intelligence.

What problems does BI solve?

Let’s say you are the CEO of a company, and you want to track what the costs of the company are, across payroll, purchasing, marketing and sales, overall and by division. You also want to track revenues by division, product line, market and month. For each variable, you’d like to drill down when you see a figure that looks odd. Payroll in Asia increased 20% this year – did we buy a company? Are there savings to be made?

All of this information spans dozens of different computer systems, applications, databases. What you want is one application to rule them all, from which you can get nice graphical clickable data.

Let’s say you’re a free software project manager or community manager. You have lots of infrastructure for people working on the project – source control, mailing lists, forums, translation infrastructure, documentation, bug tracking, downloads, …

You want to know if your community is growing, shrinking or stagnant. You’d like to know if translators are up, and spot when something is up – we lost 3 Thai translators last cycle, does the Thai translation team have a problem? Is there a problem with wiki spam? A correlation between people active on the forums and commits to the project? Some of these questions span different applications, systems, and databases. What you want is one application to rule them all, where you can get a quick overview of what’s happening in the community, and click on something to drill down into the data, or create complex queries to spot correlations and patterns across different apps.

BI software is ideally suited to helping in both of these situations.

How does it work?

Very simply, a BI platform is a web application that allows you to create queries and visualise the results across a variety of data sources. At its simplest, you bring big lumps of data together and extract some useful numbers from it. If you’ve ever used a pivot table in a spreadsheet application, you’ve written a BI query.

Now we get into the acronyms and the jargon. Here’s a quick lexicon of commonly used BI terms:

ETL
Extract/Transform/Load – An ETL module allows you to script and automate the extraction of data from a funky data source (say, CSV files on a server, an auto-generated spreadsheet, or screen-scraping data from a HTTP query, or just an SQL database), and transform it into some other format (typically basic transformations like joins, mapping inputs to database fields, or applying simple arithmetic to convert to an agreed unit), and then store the result in a database.
OLAP
Online Analytical Processing – a fancy name for “queries”. There is a de facto standard query format called MDX and the database needs to be optimised for “multidimensional queries” (aka joins – like pivot tables in a spreadsheet).
Data Warehouse
A fancy name for database.
Reporting
The presentation of the results of queries in a graphical way.

In brief, then, a BI suite provides you with a way to suck in data from a variety of sources, store the data (if you need to) in a custom database which is optimised for querying across different data sources, a nice way to define the queries in which you are interested, and then present the results of those queries in a nice graphical way.

If you don’t need to do any transformation of data, and you can operate directly on SQL databases, then you can typically provide the BI platform access to them directly. If you have any unusual data sources, or want to transform data, you will need an ETL module. If you are dealing with a lot of data and want to optimise query time, you might need a specific OLAP server. A query editor will help you create queries to get the information you want out of your data. You will need a reporting module to convert query results from raw tabular form to pie charts, bar charts and the like. And the BI server provides hooks for all of these various modules to work together, sucking in, storing, manipulating and presenting data in interesting ways.

Is this all right?

I would love to know if my mental model is flawed – so if I’m missing anything important, or I’ve said something which is a pile of rubbish, please do add a comment and let me know.

I know how hard it can be to cut through the jargon in an area where it’s ubiquitous and the first step in enterprise software is usually the hardest, so hopefully this will be useful to someone other than myself.

Follow-up to “Shy Developer Syndrome”

community, freesoftware, maemo, work 4 Comments

Reposted from neary-consulting.com

My article on “Shy Developer Syndrome” a few weeks ago garnered quite a bit of interest, and useful feedback. Since a lot of it adds valuable perspectives to the problem, I thought I should share some of my favourite responses.

Here on gnome.org, Rodney Dawes argued that developers tend to stay away from mailing lists because the more public lists are very noisy:

For me, mailing lists are a huge risk vs. low return problem. They can become a time sink easily, and it’s quite often that pointless arguments get started on them, as offshoots of the original intent of the thread. Web Forums also have this problem. And, to really get much of anything out of a list, you must subscribe to it, as not everyone who replies, is going to put you specifically in the recipients headers. That means, you’re now suddenly going to get a lot more mail than you normally would for any highly active project. And for anyone trying to get involved in an open source community, 99% of the mail on that list is probably going to be totally irrelevant to them. It will just make tracking the conversation they are trying to have, much harder.

I agree with Rodney that dealing with a new level of volume of email is one of the trickiest things for new contributors. I still remember when I signed up to lkml for an afternoon in college, only to find 200 new emails 3 hours later. I panicked, unsubscribed, and gave up that day on being a Linux kernel hacker.

Since then, however, I have learned some email habits which are shared by other free software hackers I know. Everyone I know has their own tricks for working with medium or high volume mailing lists, and some combination of them may make things livable for you, allowing you to hear the signal without being drowned out by the noise. LifeHacker is a good source of tips.

Rob Staudinger says something similar, pointing the finger at bikeshed discussions as a big problem with many community lists:

Will the zealots go and suggest postgresql’s process model was poor, or samba’s memory allocator sucks? Unlikely, but they will tell you your GUI was bad or that you’re using a package format they don’t like, just because it’s so easy to engage on that superficial level.

Over at LWN, meanwhile, Ciaran O’Riordan makes a good point. Many developers working on free software want to separate their work and personal lives.

When I leave the office at 6pm, my work should have no more relevance until the following morning. Same when I quit a company. I might choose to tell people where I work/worked, but it should be a choice, and I should be able to choose how much I tell people about my work. Having mailing list posts and maybe even cvs commits might be too detailed. Maybe waaay too detailed.

Finally, over at neary-consulting.com, MJ Ray suggested that asking individuals to respond to a request can backfire:

Publicly referring to individuals on a mailing list is a double-edged sword. It might bolster the confidence of the named individual, but it also reduces the confidence of other people who might have answered the question. In general, I feel it’s best not to personalise comments on-list. Some e-democracy groups require all messages to be addressed to a (fictional or powerless) chair or editor, similar to the letters pages of The Times.

While I agree with MJ in situations where the answer is accessible to the wider community, but often only developers working for you, the manager, are in a position to reply – at that point, you have a choice: get the information off your developer and answer yourself, or ask him to answer the question. and I’ve found that asking on the list has the positive side-effects I mentioned.

« Previous Entries