December 9, 2011
freesoftware, General, work
Earlier this week, I wrote:
I hate docs that tell you what to do, but not why. As soon as a package name or path changes, you’re dust. This is maybe the 4th time I’ve been configuring Apache to delegate stuff to Tomcat using mod_jk, and every time is just like the first.
For those who don’t know, mod_jk is a module inplementing the wire protocol AJP/13, which allows a normal HTTP web server to forward on certain requests to a second server. In this case, we want to forward requests for JSP pages and servlets to Tomcat 6. This allows you to do neat things like serve static content with Apache and only forward on the dynamic Java stuff to Tomcat. The user sees a convenient URL (no port :8080 on the hostname) and the administrator gets to serve multiple web scripting languages on the same server, or load balance requests for Java server resources across several hosts.
I have spent enough time on it at this point, I think, that I understand all of the steps in the process, and have stripped it down to the bare minimum that one would need to do in terms of configuration to get things working. And so I’m putting my money where my mouth is, and this is my attempt to write a nice explanation of how mod_jk does its thing, and how to avoid some of the common mistakes I had.
First, a remark: Apache is one of those pieces of software that has gotten harder, rather than easier, to configure as time has gone on. Distributions each package it differently, with different “helpful” mechanisms that make common tasks like enabling a module easier, and to enable convenient packaging of modules like PHP, independent of the core package. But the overall effect is that a lot of magic done by distributions makes it much harder to follow the upstream documentation. Config files are called different names, or stored in different places. Different distributions handle the inclusion of config file snippets differently. And so on.
This is not to say that Apache, Tomcat and mod_jk don’t have some nice docs – they do, but often the docs don’t correspond to the distros, or haven’t been updated in a while, and often they don’t explain why you have to do something, putting emphasis instead on what you need to do. After all my reading, I finally found the Holy Grail I was looking for – the simple document of how to configure mod_jk – but even this has its shortcomings. The article doesn’t mention Tomcat, for example, which left me digging around for information on the configuration I needed to do to Tomcat, which led me to this, which led me to over-write the sample workers.properties file in the simple set-up document.
But if you understand the First Principles, you can figure out what’s going on with any organisation of configuration. That’s what I’m hoping to get across here.
How does mod_jk do its thing?
The first issue I had trouble getting my head around was how, exactly, all this was supposed to work. In particular, I didn’t quite understand how the configuration worked on the Tomcat size of things.
As I understand it, here’s what happens:
- A GET request comes in to httpd for http://localhost/examples/jsp/num/numguess.jsp
- Apache processes the request, and find a matching pattern for the URL among JkMount directives
- Apache then reads the file specified by the JkWorkersFile option to figure out what to do with the request. Let’s say that config file says to forward to localhost:8009 using the protocol ajp13
- Tomcat has a Connector listening on port 8009, with the protocol AJP/13, which handles the request and replies on the wire. Apache httpd sends the reply back to the client
Apache httpd configuration
There are two steps to configuring Apache:
- Enabling the module
- Configuring mod_jk
Apache provides a handy utility called “a2enmod” which will enable a module for you, once it’s installed. What happens behind the scenes for modules depends on the distribution. On Ubuntu, module load instructions are put in a file called /etc/apache2/mods-available/<module>.load optionally alongside a sample configuration file /etc/apache2/mods-available/<module>.conf. To enable the module, you create a symlink to the .load file in /etc/apache2/mods-enabled.
On my Ubuntu laptop, my jk.load contains:
LoadModule jk_module /usr/lib/apache2/modules/mod_jk.so
On OpenSuse, on the other hand, a line similar to this is explicitly added to the file /etc/apache2/sysconfig.d/loadmodule by sysconfig, based on the contents of a field in the configuration file /etc/sysconfig/apache2 – remember how I said that distro packaging makes things harder? If you added the line directly to the loadmodule file, the change would be lost the next time Apache restarts.
In both cases, these files (on Ubuntu, the mods-available/*.load files, and on OpenSuse the sysconfig.d/* files) are loaded by the main Apache config file (httpd.conf) at start-up.
The minimum configuration that mod_jk needs is a pointer to a Workers definition config file (JkWorkersFile). Other useful configuration options are a path to a log file (JkLogFile – which should be writable by the user ID which owns the httpd process) and a desired log level – I set JkLogLevel to “debug” while getting things set up. On OpenSuse, I also needed to set JkShmFile, since for the default file location (/srv/www/logs/jk-runtime-status) the directory didn’t exist and wasn’t writable by wwwrun, the user that owns the httpd process.
This configuration, and the configuration of paths below, is usually in a separate config file – in both Ubuntu and OpenSuse, it’s jk.conf in /etc/apache2/conf.d (files ending in .conf in this directory are automatically parsed at start-up). To avoid errors in the case where mod_jk is not present or loaded, you can surround all Jk directives with an “<IfModule mod_jk.c>…</IfModule>” check if you’d like.
The JkMount directive configures what will get handled by which worker (more on workers later). It takes two arguments: a path, and the name of the worker to handle requests matching the path. Unix wildcards (globs) are accepted, so
JkMount /examples/*.jsp ajp13_worker
will match all files under /examples ending in .jsp and will pass them off to the ajp13_worker worker.
If you want Apache to serve any static content under your webapps, you’ll also need either a Directory or Alias entry to handle them. Putting together with the previous section, the following (from Ubuntu) was the jk.conf file I used to pass the handling of JSPs and servlets off to Tomcat, and serves static stuff through Apache:
Alias /examples /usr/share/tomcat6-examples/examples
JkMount /examples/*.jsp ajp13_worker
JkMount /examples/servlets/* ajp13_worker
I should use Directory to prevent Apache from serving anything it shouldn’t, like Tomcat config files under WEB-INF – I could also just use “JkMount /examples/* ajp13_worker” to have everything handled by Tomcat.
Now that Apache’s config is done, we need to configure mod_jk itself, via the workers.properties file we set in the JkWorkersFile parameter.
Sample workers.properties files contain a lot of stuff you probably don’t need. The basic, unavoidable parameters you will need are the name of a worker (which you’ve already used as the 2nd argument for JkMount above), and a hostname and port to send requests to, and a protocol type (there are several options for worker type besides AJP/1.3 – “lb” for “load balancer” is the most important to read up on). For the above jk.conf, the simplest possible workers.properties file is:
And that’s it! The last step is to set up Tomcat to handle AJP 1.3 requests on port 8009.
In principle, Tomcat doesn’t need to know anything about mod_jk.It just needs to know that requests are coming in on a given port, with a given protocol.
Typically, an AJP 1.3 connector is already defined in te default server.xml (in /usr/tomcat6 on both Ubuntu and OpenSuse) when you install Tomcat. The format of the connector configuration is:
<Connector port=”8009″ protocol=”AJP/1.3″ redirectPort=”8443″ />
I am pretty sure that this will work without the redirectPort option, but I haven’t tried it. It basically allows requests received with security constraints specifying encryption to be handled over SSL, rather than unencrypted.
In addition to this, Tomcat does provide a facility to auto-create the appropriate mod_jk configuration on the fly. To do so, you need to specify an ApacheConfig in the Tomcat connector, and point it at the workers.properties file. This facility looks pretty straightforward, but I know I found it confusing in the past when I lost edits to the jk.conf file – I prefer manual configuration myself.
I have had quite a few gotchas while figuring all this out – I may as well share for the benefit of future people having the same problems.
- All the documentation for mod_jk installedd with the packages refers to Tomcat5 paths – for example, on OpenSuse, in the readme, I was asked to copy workers.config into /etc/tomcat5/base – a directory which doesn’t exist (even when you change the 5 to a 6)
- If your apache web server uses virtual hosts (and, on Ubuntu, it does by default) then JkMounts are not picked up from the global configuration file! You need to either add “JkMountCopy true” to the VirtualHost section, or have JkMounts per VirtualHost. If you used Alias as I did above, and you try to run a servlet, the error message is just a 404. If you try to load a JSP, you will see the source.
- If you make a mistake in your workers.property file (I had a typo “workers.list=ajp13_worker” for several hours) and your worker name is not found in a “worker.list” entry, you will see no error message at all with warnings set to error or info. With the warning level set to debug, you will see the error message “jk_translate::mod_jk.c (3542): no match for /examples/ found” The chances are you have a typo in either your jk.conf file (check that the name of the worker corresponds to the name you use in workers.properties), or you have a typo somewhere in your workers.properties file (is it really work.list? Does the worker name match? Is it the same as the worker name in the .host, .port and .type configuration?
- Make sure you get Tomcat working correctly first and working perfectly on port 8080 – or you won’t know whether errors you’re seeing are Tomcat errors, Apache errors or mod_jk errors.
I’m sure I’ve made mistakes and forgotten important stuff – I’m happy to get feedback in the comments.
November 18, 2011
community, freesoftware, General
A couple of days ago, I tweeted this: “Insight of the day: All “community norms” documents come down to one word: Empathy. Think how others will feel before you act.” I think that’s worth developing on.
As I was growing up, empathy meant "Deanna Troi" to me
Here are a few examples:
- GNOME Code of Conduct: “If someone asks for help it is because they need it. Do politely suggest specific documentation or more appropriate venues where appropriate, but avoid aggressive or vague responses such as “RTFM”.” – think how it feels to need help, and to take the step of asking for it. I want to know that someone *understands* what I’m going through, feels my frustration, and is looking for a way to help relieve it. Recently, when confronted with a technical issue that prevented me from using my scanner, the advice I received on an IRC channel was to upgrade or change my Linux distribution! How un-empathic can you get!
- Koha patch rules (along with every other development project in the world): “The patch must apply to the current HEAD of the master branch of the code”: Put yourself in the place of a project maintainer who receives a patch proposal. He tries to apply it to his source tree, but the merge fails. Why? Because the patch was created against a year-old release of the project, and he’s since reworked internals to solve a different, unrelated issue. The maintainer is faced with a number of unsavory choices now: Spend time reading the patch, understanding what it does, and “forward-porting” it to master; check out the old branch, apply the patch, review, and test it there, and not commit to master; or drop the patch. What would you do in that situation? Someone is giving you a gift, which is going to make work for you. Is it worth an hour or two of your time to work on it to get it to just apply to your work? You still need to review the patch after that – which you would have to do anyway. In that situation, most people will ask the original patch proposer to do the initial grunt work and get the patch working on the tip of master.
- Now flip things around. You found a bug in software you use – the bug was really annoying. You took the time to get the source code, identify, qualify and fix the buig, open a bug report, which was confirmed, and attach a patch which you have checked fixes the problem. And what answer do you get? “Not good enough – work on it more”. How would that make you feel? That depends on how it is communicated. If it’s a stock answer, like a sheet of paper handed over a tax office counter, with a list of prerequisites, then I bet that would make me angry, resentful and frustrated. I poured time and effort into that patch, and this is how you treat it? If the criticism is of the core of the patch – it doesn’t fix the problem, or should do so differently, then the criticism might be easier to take. But if it’s issues which potentially add many hours of effort on top of time already spent, with no benefit to the proposer (check out the latest code, upgrade half a dozen dependencies without breaking my old version, compile it, and then forward-port the patch), chances are he won’t do it. An empathic response might be to make someone aware of the guidelines and their reason for being, but help him with the forward-porting on IRC by asking him to explain the patch, what it does and why.
All of those guidelines on indentation and whitespace, commenting code, including test cases, updating documentation, and ensuring code compiles at the tip of the master branch are designed to help patch proposers make patches which are easy for maintainers to apply. And in this context, an empathic patch proposer can understand them much better. Miguel de Icaza did a great job of framing this right.
When I was in college, I went to the Netherlands one Summer for a working holiday. At one point, I had a job offer for short-term work, but needed to be registered for tax to start, and I needed a bank account to get paid. So I went to the tax office, and the lady behind the counter very resignedly handed me a piece of paper with prerequisites, and told me to come back when I had fulfilled them. Then she looked over my shoulder and said “Next!” One of the prerequisites was a permanent address – I explained that I was living in a camp-site for the summer, and would that address do? Of course not! No proposed solution, no consideration of the situation I was in, no empathy.
Then I went to the bank, where I was told that I needed a permanent address to open an account. Same sense that the person I was dealing with didn’t care about me. So with my friend Barry, we looked into short term accommodation options. Landlords required (among other things) an employment agreement and a bank account before they would rent us accommodation – even if only by the week! In the end, we had to pass up that job, and work “on the black” for less than minimum wage to survive the Summer – but what choice did we have?
How frustrating! Just imagine how we felt. That’s empathy.
Again and again in community guidelines, whether it’s guidelines for people who are approaching a community to report a bug or propose a patch or feature, or guidelines for community members dealing with each other and people outside the community, this idea “think how this would make you feel if the roles were reversed” is pervasive, but unwritten. I think that it should be.
It reminds me of a social experiment I heard about recently:
Rats are placed in a box with a lever. When you pull the lever, a food pellet is distributed. The rats quickly learn to pull the lever. After a while, you change the configuration of the cage. Now, when the lever is pulled, the food pellet is distributed, but a cold shower drenches all the rats in the box. After a while, the rats learn not to pull the level, and start to punish rats who do as a group.
The second generation starts when a new rat is introduced into the box. as he approaches the leverl, the older rats all jump on top of him, to prevent him from touching it. In effect, they are teaching him the rule “don’t touch the lever”, without explaining why the rule exists. As time goes on, new rats are added, and old rats removed from the box.
The third generation happens when there are none of the oiriginal rats left in the box. None of the rats have experienced the cold shower after the lever was pressed. At that point, you can turn off the cold shower function – you will be sure that no rat will ever touch the level, because the community rules forbid it. Ask any of the rats why, though, and they will not be able to give you a better answer than “because that’s the way it is”. If rats could talk, of course.
Community guidelines which are purely written documents, but which neglect the empathic side of the equation, and don’t explain how not following the guidelines affect other people, can be similar. It’s important to include the reasons for guidelines,so that we don’t forget how breaking the rules makes others feel. It’s also important to have sufficient flexibility and adaptability in dealing with new community members – like the rat experiment, circumstances change all the time. Back when everyone was using CVS, performing merges was a complicated and time-consuming process. Nowadays, rebasing and merging with Git, Bazaar or Mercurial is so easy that some of the coding guidelines we used to have may no longer have the same impact. Likewise, email technology has moved on, and with cheap and copious bandwidth, email norms have evolved – netiquette community norms move on with them.
In general, as Bill & Ted famously said, “Be excellent to each other”. Think about how your actions & statements will be received. Be empathic.
October 6, 2011
community, General, gnome, maemo, work
With the announcement of Tizen (pronounced, I learned, tie-zen, not tea-zen or tizz-en) recently, I headed over to the website to find out who the project was aimed at. I read this on the “Community” page:
The Tizen community is made up of all of the people who collectively work on or with Tizen:
- Product contributors: kernel/distribution developers, release managers, quality assurance, localization, etc.
- Application developers: people who write applications to run on top of Tizen
- Users: people who run Tizen on their device and provide feedback
- Vendors: companies who create products based on Tizen
- Other contributors: promotion, documentation, and much more
Anyone can contribute by:
- Submitting patches
- Filing bugs
- Developing applications
- Helping with wiki documentation
- Participating in other community efforts and programs
Wow! That’s a diverse target audience, and a very wide ranging list of ways you can help out. But is it really helpful to scope the project so wide, and try to cater to such a wide range of use-cases from the start? And is the project at a stage where it even makes sense to advertise itself to some of these different types of users?
I have talked about the different meanings of “maintainer” before, depending on whether you’re maintaining a code project or are a package maintainer for a distribution. I have also talked about the different types of community that build up around a project, and how each of them needs their own identity – particularly in the context of the MeeGo trademark. I particularly like Simon Phipps’s analysis of the four community types as a way to clarify what you’re talking about.
For Tizen, I see between three and five different types of community, each with different needs, and each of which can form at different stages in the life-cycle of the project. Trying to “sell” the project to one type of community before the project is ready for them will result in disappointment and frustration all round – managing the expectations of people approaching Tizen will be vital to its long-term success, even if it opens you up to short-term criticism. Unless each of these communities is targeted individually and separately, and at the right time, I am sceptical about the results.
“Upstream” software developers
The first and most identifiably “Open Source” family of communities will be the software developers working on components and applications which will end up in the core of Tizen. For the most part, these communities exist already, and Samsung and Intel engineers are working with them. These are the projects we commonly call “upstreams” – projects you don’t control, but from whom code flows into your product.
In other cases, code will originate from Intel and/or Samsung. In the same way that Buteo, oFono and the various applications which were developed for the MeeGo Netbook UX were very closely associated with MeeGo, there will be similar projects (sometimes the same projects) which will have a close association with Tizen. Each of these projects will have their own personality, their own maintainers, roadmaps, specs – and each of them should have their own identity, and space to collaborate and communicate.
Communities form around programming projects not because of the code, but because of a shared vision and values. Each project will attract different people – the people who are interested in metadata and search are not the same as the people who will be passionate about system-wide contact integration. Each project needs its own web space, maintainers, bug tracker, mailing list, and wiki space. Of course, many projects can share the same infrastructure, and a lot of the same community processes (for things like code governance), and for projects closely related to Tizen, we can provide common space to help create a Tizen developer community in the same way there’s a GNOME developer community. But each community around each component will have its own personality and will need its own space.
At the level of Tizen, we could start with an architecture diagram, perhaps – and for each component on the architecture diagram, link to the project’s home page – many of the links will point to places like kernel.org, gnome.org, freedesktop.org and so on. For Tizen-specific projects, there could be a link to the project home page, with a list of stuff that needs to be done before the component is “ready”.
Core platform packagers, testers, integrators
Once we have a set of components which are working well together, we get to the heart of what I think will be Tizen’s early activity – bringing those components together into a cohesive whole. Tizen will be, basically, a set of distributions aimed at different form factors. And the deliverable in a distribution is not code or a Git tag, it’s a complete, integrated stack.
The engineering skills, resources and processes required to integrate a distribution are different to those of a code project. Making a great integrated Linux platform is obviously difficult – otherwise Red Hat would not be making money, and Ubuntu would not have had the opportunity to capture so much mind-share. Both Red Hat and Canonical do something right which others failed at before them.
Distributions attract a different type of contributor than code projects, and need a different set of tools and infrastructure to allow people to collaborate.At the distribution level, it is more likely you will be debating whether or not to integrate a particular package or its competitor than it is to debate whether to implement a feature in a specific package. Of course, it is possible to influence upstream projects to get specific features implemented, not least by providing developer resources, and there will be a need for some ambassadors to bridge the gap to upstream projects. And it is possible for a distribution to carry patches to upstream packages if that community disagrees. But in general, not much code gets written in distributions.
What the distro community needs and expects is infrastructure for continuous integration, bug tracking software, a way to submit and build software packages, good release engineering, an easy way to find out what packages need a maintainer (see Debian’s WNPP list or Ubuntu’s “need-packaging” list for examples) and a way to influence what packages or features are included in future releases (see Fedora or Ubuntu for examples). They also want tools to allow packaging, testing and deploying the integrated distribution – for an embedded distro, that might mean an emulator and an image creator, perhaps.
Vendors and carriers
Communities of companies are worth a special mention. Companies have very different ways of working together and agreeing on things than communities of individuals. I was tempted to just roll vendors into the “Platform integrators” community type, but they are sufficiently different to be considered another type of community. Vendors have different constraints and motivations than individual contributors to the platform, and we should be aware of those.
Vendors like to have a business relationship – some written agreement that shows where everyone stands. They have a direct relationship with people who buy their hardware, and have an interest (potentially in conflict with other communities) in owning the user relationship – through branded application stores, UI and support forums, for example. And since vendors are typically working on hardware development in parallel with software development, they care a lot about a reliable release schedule and quality level from the stack. Something that companies care about which individuals usually don’t are legal concerns around working with the process – do they have patent rights to the code they ship? Are they giving up any of their own potential patent claims?
3rd party application developers
Application developers don’t care, in general, whether the platform is open source or closed, or developed collaboratively or by one party (witness the popularity of Android and iOS with application developers). What they do care about are developer tools, documentation, and the ability to share their work with device users and other application developers. Some application developers will want to develop their applications as free software, and it is possible to enable that, but I think the most important thing for application developers is that it’s easy to do things with your platform, that there are good tools for developing, testing and deploying your application, that your platforms APIs are enabling the developer to do what he wants, and that you are providing a channel for those developers to get their apps to users of your platform.
An application developer doesn’t want to have to ship his software to 5 different app stores on every release – in contrast to vendors, he would like a single channel to his market. Other things he cares about are being able to form a relationship with his users – so app stores need to be social, allow user ratings and comments, and allow the author to interact with his users. Clear terms of engagement are vital here too – especially for commercial application developers. And application developers are also another type of community – they will want to share tips and tricks, code, and their thoughts on the project leaders in some kind of app developer knowledge base.
There is another potential community which I should mention, and that is users of your platform – typically, these will be users of devices running your platform. It should be possible for engaged users to share information, opinions, tips & tricks, and interesting hacks among each other. It should also be possible to rate and recommend applications easily – this is in the interests of both your user community and your application developer ecosystem.
OK, so what?
Each of these community types is different, and they don’t mix well. They mature at different rates. There is no point in trying to build a user platform until there are devices running your platform on the market, for example
So each type of community needs a separate space to work. There is no point in catering to a 3rd party application developer until you have developer tools and a platform for him to develop against. Vendors will commit to products when they see a viable integrated platform. And so on.
What is vital is to be very clear, for each type of community, what the rules of engagement are. As an example, one company can control the integration of a platform and the development of many of its components (as is the case for Android) and everyone is relatively happy, because they know where they stand and what they’re getting into. But if you advertise as an open and transparent project, and a small group of people announce the decisions of what components are included or excluded from the stack (as was the case in MeeGo), then in spite of being vastly more open, people who have engaged with the project will end up unhappy, because of a mismatch between the message and the practice in the project.
So what about Tizen? I think it is a mistake to announce the projects as a place to “submit patches, report bugs and develop applications” when there is no identifiable code base, no platform to try, and no published SDK to develop against. By announcing that Tizen is an Open Source platform, Intel and Samsung have set an expectation for people – and these are people who have gone through the move to MeeGo under two years ago, and who have seen Nokia drop the project earlier this year. If they are disappointed by the project’s beginnings because the expectations around the project have been set wrong from the offset, it could take a long time to recover.
Personally, I would start low-key by announcing an architecture diagram and concentrating on code and features that need writing, then ramp up the integrator community with some alpha images and tools to allow people to roll their own; finally, when the platform stabilises roll out the developer SDK and app store and start building up an application developer community. But by aiming too big with the messaging, Tizen runs the risk of scaring some people away early. Time will tell.
September 1, 2011
The Blue Advocado (formerly Board Café) is one of my favourite newsletters – there’s always at least one article I want to read, and the area of non-profit management is such a broad and deep one that there’s always something to learn.
This month’s issue has some great tips on how to get more out of conferences:
- Choose the sessions you know the least about
- Instead of listening for good ideas (only), listen for things you can quote in your next grant proposal or monthly report
- Skip at least one session. Go outside.
- Fail-proof way to meet someone: get to a session early; other introverts will be sitting there playing solitaire on their cell phones. Sit near one of them, then lean over and say, “Would you mind if I introduced myself? I’m supposed to meet at least five new people at this conference and I haven’t met any so far!”
- If you’re bored or irritated by a session, walk out.
- Put up a sign on the bulletin board to meet people. If no-one turns up, no-one will ever know.
Certainly, tips 1, 2 and 6 are things I have not done before, and would be interested in trying.
June 10, 2011
I’m happy to be co-ordinating the “Humanitarian Free/Open Source Software” track at this year’s Open World Forum. The call for participation is now open.
I’ve already been in contact with some great organisations including Ushahidi, CrisisCommons and Akvo, and I plan to contact Sahana and Benetech about the conference too. I’d love to get more stories from around the world of how free software & open data are making the world a better place and saving lives.
June 8, 2011
Interesting article from Gartner which has some relevance to my recent proposal for a gnome-design mailing list: Gartner Identifies Five Collaboration Myths.
Myth 1. The right tools will make us collaborative
Technology can make it easier to collaborate when applications mirror a more intuitive, fluid work style, but selecting a tool without addressing roles, processes, metrics and the organization’s workplace climate is putting the cart before the horse.
June 3, 2011
As many have probably heard, Oracle and IBM jointly proposed that OpenOffice.org become an Apache project this week. The news surprised me, and made me think about what’s in it for each of the parties involved.
What does it mean for OpenOffice.org?
Apache project status brings with it a number of prerequisites which will affect the way the OpenOffice.org project runs pretty profoundly. The most obvious one will be a licence change. The copyright holder(s) – in this case Oracle must agree to release the code under the Apache Public Licence. In addition, there is a commitment to certain social norms, and to use the infrastructure provided by the Apache project. The Apache incubation process docs have all the info.
Among the items that need to be checked off before OpenOffice.org graduates to top-level project is a diverse developer base, and non-dependence on one company for support. It will be interesting to see how things evolve on this front in the coming months.
What’s in it for Oracle?
This is easy – Oracle off-loads OpenOffice.org, for which it has no further use, without damaging its relationship with IBM and other commercial OOo partners. They lose any revenues involved, but apparently they were resigned to losing those anyway. So for Oracle this is all up-side.
What’s in it for IBM?
IBM can continue to develop Symphony, with a licence it’s happy with. However, some of the criteria for membership will require IBM to put significant resources into developer and community recruitment, and development. Without Oracle or the LibreOffice supporters, there will be a void to fill. Rob Weir talks about this: “In particular, we need to attract a wide variety of project specialists. This includes C++ programmers (on Linux, Mac and Windows), QA (also on all platforms), help/documentation, UI/UCD, translation/globalization, accessibility, install, etc.”
What’s in it for Apache?
This is the one I really don’t get. The Apache Foundation doesn’t have any irons in the desktop software fire, so this is a departure for them. Somehow the Eclipse Foundation feels like it might have been a better match. Plus, by getting involved in the LibreOffice/OpenOffice war, Apache is running a risk.
But the Apache people I know are all fiercely proud of their open, inclusive and transparent operation – so there is no particular reason for them to reject an application for inclusion as an incubated project. In fact, an outright rejection would have been unprecedented. It remains to be seen whether the project will get through the incubation process, but with the sponsors and mentors involved, it’s likely that it will.
Anyway, the answer to the question right now, as far as I can see, is “not much”. At best, they will prove, if OpenOffice does eventually regain the momentum from LibreOffice, that the APL is better for building community than LGPL.
What does this mean for LibreOffice?
Well, at one level, not much has changed. LibreOffice is “pure” copyleft, and APL 2.0 is a GPL compatible non-copyleft licence, so LibreOffice can continue to integrate patches and features from OpenOffice, but the reverse remains impossible (except now for a different reason).
At another level, a lot has changed. Before this week, LibreOffice could legitimately claim to the the community bazaar project to Oracle’s cathedral. But moving to Apache gives some community cred to OpenOffice. Jeremy Allison’s comment on Rob Weir’s blog sums it up nicely: “This is about copyleft vs. non-copyleft licensing”. Some developers will prefer Apache’s philosophy of “do what you want with it”, and others will prefer the LGPL’s philosophy of “feel free to take, but if you build on it, share”. This is the first time I can think of that we will see Apache and GPL forks of the same project competing head to head for community and commercial developer mindshare. The results will be fascinating.
Inevitably, the project that succeeds in growing the more diverse developer base will win.
May 31, 2011
community, freesoftware, General, gimp, gnome
I’ve been thinking a lot recently about mentoring programs, what works, what doesn’t, and what the minimum amount of effort needed to bootstrap a program might be.
With the advent of Google Summer of Code and Google Code-In, more and more projects are formalising mentoring and thinking about what newcomers to the project might be able to do to learn the ropes and integrate themselves into the community. These programs led to other organised programs like GNOME’s Women Summer Outreach Program. Of course, these initiatives weren’t the first to encourage good mentoring, but they have helped make the idea of mentors taking new developers under their wing much more widespread.
In addition to these scheduled and time-constrained programs, many projects have more informal “always-on” mentoring programs – Drupal Dojo and GNOME Love come to mind. Others save smaller easier tasks for newcomers to cut their teeth on, like the “easy hacks” in LibreOffice. Esther Schindler wrote a fantastic article a few years ago documenting best and worst practices in mentoring among different projects.
Most mentoring programs I have seen and participated in don’t have a very good success rate, though. In this article, I look at why that is the case, and what can be put in place to increase the success rate when recruiting new developers.
Why most mentoring fails
Graham Percival, a GNU/LilyPond developer, decided in 2008 to run an experiment. At one point, Graham decided that he would quit the project, but felt guilty about doing so in one go. So he started the “Great Documentation Project” to recruit a replacement documentation team to follow on after his departure. He then spent 12 months doing nothing but mentoring newcomers to get them involved in the project, and documented his results. Over the course of a year, he estimates that he spent around 800 hours mentoring newcomers to the project.
His conclusions? The net result for the project was somewhere between 600-900 hours of productivity, and at the end of the year, 0 new documentation team members. In other words, Graham would have been better off doing everything himself.
Graham found that “Only 1 in 4 developers was a net gain for the project” – that is, for every 4 apprentices that Graham spent time mentoring, only 1 hung around long enough for the project to recoup the time investment he put into mentoring. A further 1 in 4 were neither gain or loss – their contribution roughly equalled the mentor time that they took up. And the remainder were a net loss for the project, with much more time spent mentoring than the results could justify.
The GNOME Women’s Summer Outreach Program in 2006 had 6 participants. In 2009, the GNOME Journal ran a “Where are they now?” follow-up article. Of the 6 original participants, only one is still involved in free software, and none are involved in GNOME. Murray Stokely did a follow-up in 2008 to track the 57 alumni of Summer of Code who had worked on FreeBSD. Of these, 10 students went on to get full commit access, and a further 4 students were still contributing to FreeBSD or OpenBSD after the project. Obey Arthur Liu also did a review of Debian participants in 2008. Of 11 students from 2008 who had no previous Debian developer experience, he found that 4 remained active in the project one year later.
From my own experience as a replacement mentor and administrator in the Summer of Code for the GIMP in 2006, we had 6 projects, most of which were considered a success by the end of the summer, yet of the participating students, none have made any meaningful contribution to the GIMP since.
I feel safe in saying that the majority of mentoring projects fail – and Graham’s 1 in 4 sounds about right as an overall average success/failure rate. This begs the question: why?
Most mentored projects take too long
What might take a mentor a couple of hours working on his own could well take an apprentice several days or weeks. All of the experience that allows you to hit the ground running isn’t there. The most important part of the mentoring experience is getting the student to the point where he can start working on the problem. To help address this point, many projects now require Summer of Code applicants to compile the project and propose a trivial patch before they are accepted for the program, but understanding the architecture of a project and reading enough code to get a handle on coding conventions will take time. It will also take mentor time. It takes longer to teach a newcomer to your project than to do the work yourself, as anyone who has ever had a Summer intern will attest.
When you set a trainee task which you estimate to be about 4 hours work, that will end up costing a few weeks of volunteer effort for your apprentice, and 8 to 10 hours mentoring time for you during that time. Obviously, this is a big investment on both sides, and can lead to the apprentice giving up, or the mentor running out of patience. I remember in the first year of Summer of Code, projects were taking features off their wishlists that had not been touched for years, and expecting students new to the project to come in and work full time implementing them perfectly over the course of 12 weeks. The reality that first year was that most of the time was spent getting a working environment set up, and getting started on their task.
Mentoring demands a lot of mentors
As a free software developer, you might not have a lot of time to work on your project. Josh Berkus, quoted in Schindler’s article, says “being a good mentor requires a lot of time, frequently more time than it would take you to do the development yourself”. According to the Google Summer of Code FAQ, “5 hours a week is a reasonable estimate” for the amount of time you would need to dedicate to mentoring. Federico Mena Quintero suggests that you will need to set aside “between 30 and 60 minutes a day“.
When you only have 10 hours a week to contribute to a project, giving up half of it to help someone else is a lot. It is easy to see how working on code can get a higher priority than checking in with your apprentice to make sure everything is on track.
More mentoring projects fail for lack of communication than for any other reason.
Apprentices may expect their mentors to check in on them, while mentors expect apprentices to ask questions if they have any. Perhaps newcomers to the project are not used to working on mailing lists, or are afraid of asking stupid questions, preferring to read lots of documentation or search Google for answers. In the absence of clear guidelines on when and how parties will talk to each other, communication will tend towards “not enough” rather than “too much”.
No follow through
Many mentoring programs stop when your first task is complete. The relationship between the mentor and the apprentice lasts until the end of the task, and then either the apprentice goes off and starts a new task, with a new mentor, or that is the end of their relationship with the project. I would be really interested to hear how many Summer of Code mentors maintained a relationship with their students after the end of the Summer, and helped them out with further projects. I suspect that many mentors invest a lot of time during the program, and then spend most of their time catching up with what they wanted to do.
In her OSCON keynote in 2009, Skud talked about the creation of a welcoming and diverse community as a prerequisite for recruiting new developers. Sometimes, your project culture just doesn’t match newcomers to the project. If this happens regularly, then perhaps the project’s leaders need to work on changing the culture, but this is easier said than done. As Chris di Bona says in this video, “the brutality of open source is such that people will learn to work with others, or they will fail”. While many think that this kind of trial-by-fire is fine, the will not be the environment for everyone. It is really up to each project and its leaders to decide how “brutal” or forgiving they want to be. There is a trade-off: investing time in apprentices who will contribute little is a waste of time, but being too dismissive of a potential new developer could cost your project in the long run.
Mentoring best practices
Is all the effort worth it? If mentoring programs are so much hassle, why go to the bother?
Mentoring programs are needed to ensure that your project is long-term sustainable. As Graham says in his presentation: “Core developers do most of the work. Losing core developers is bad. Projects will lose core developers.” Do you need any other reason to start actively recruiting new blood?
There are a few simple things that you can put in place to give your mentoring program a better chance of success.
Mentored tasks should be small, bite-sized, and allow the apprentice to succeed or fail fast. This has a number of advantages: The apprentice who won’t stick around, or who will accomplish nothing, has not wasted a lot of your mentor’s time. The apprentice who will stay around gets a quick win, gets his name in the ChangeLog, and gains assurance in his ability to contribute. And the quick feedback loop is incredibly rewarding for the mentor, who sees his apprentice attack new tasks and increase his productivity in short order. Graham implies that a 10 minute task is the right size, with the expectation that the apprentice might take 1 hour to accomplish the task.
A ten minute task might even take longer to identify and list than it would to do. You can consider this cost the boot-strapping cost of the mentoring program. Some tasks that are well suited to this might include:
- Write user documentation for 1 feature
- Get the source code, compile it, remove a compiler warning, and submit a patch
- Critique 1 unreviewed patch in Bugzilla
- Fix a trivial bug (a one line/local change)
Of course, the types of tasks on your list will change from one project to the next.
Mentoring is management
Just as not everyone is suited to being a manager, not everyone is suited to being a mentor. The skills needed to be a good mentor are also those of a good manager – keeping track of someone else’s activity while making them feel responsible for what they’re working on, communicating well and frequently, and reading between the lines to diagnose problems before it’s too late to do anything about them.
When you think of it in this way, there is an opportunity for developers who would like to gain management experience to do so as a mentor in a free software project. Continually recruiting mentors is just as important as recruiting developers. Since mentoring takes a lot of time, it’s important that mentors get time off and new mentors are coming in in their place.
Pair apprentices with mentors, not tasks
An apprentice should have the same mentor from the day he enters the mentoring program until he no longer needs or wants the help. The relationship will ideally continue until the apprentice has himself become a mentor. Free software communities are built on relationships, and the key point to a mentoring program is to help the creation of a new relationship. Mentoring relationships can be limited in time also, 6 months or a year seem like good time limits. The time needed to mentor will, hopefully, go down over this period.
Regular meeting times
Mentors and apprentices should ensure that there is a time on their calendar for a “one on one” at regular times. How regularly will depend on the tasks, and the amount of time you can spend on it. Weekly, fortnightly or monthly are all reasonable in different situations. This meeting should be independent of any other communication you have with the person – it is too easy for the general business of a project to swallow up a newbie and prevent their voice from being heard. Rands said it well when he said “this chatter will bury the individual voice unless someone pays attention.”
Convert apprentices into mentors
Never do you understand the pain of the initial learning curve better than when you have just gone through it. The people best suited to helping out newcomers to the project are those who have just come through the mentoring program themselves.
This is a phenomenon that I have seen in the Summer of Code. Those students who succeed and stay with the project are often eager to become mentors the following year. And they will, in general, be among the best mentors in the project.
For all involved, it’s useful to have some idea of the issues newcomers have – ensure that documenting solutions is part of what you ask. It’s also useful to know how successful your mentoring program is. Can you do better than the 1 in 4 success rate of LilyPond? Keeping track of successes and failures encourages new mentors, and gives you data to address any problems you run into.
Manage the mentors
All of this work has overhead. In a small project with 1 or 2 core developers, it’s easy enough to have each core developer take an apprentice under their wing, and co-ordinate on the mailing list. In bigger projects, keeping track of who is a mentor, and who is mentoring who, and inviting new mentors, and ensuring that no-one falls through the cracks when a mentor gets too busy, is a job of itself. If your mentoring program goes beyond more than ~5 mentors or so, you might want to consider nominating someone to lead the program (or see who steps up to do the job). This is the idea behind the Summer of Code administrator, and it’s a good one.
Go forth and multiply
Developer attrition is a problem in open source, and recruitment and training of new developers is the only solution. Any project which is not bringing new developers up to positions where they can take over maintainership is doomed to failure. A good mentoring program, however, with a retention rate around 25%, organised continuously, should ensure that your project continues to grow and attract new developers.
Replenishing your stock of mentor tasks and recruiting new mentors will take effort, and continual maintenance of someone putting in a few hours a week. If you execute well, then you will have helped contribute to the long term diversity and health of your project.
February 7, 2011
community, freesoftware, General, gimp, gnome, maemo, openwengo, work
One of the most important documents a project can have is some kind of elaboration of what the maintainers want to see happen in the future. This is the concrete expression of the project vision – it allows people to adhere to the vision, and gives them the opportunity to contribute to its realisation. This is the document I’ll be calling a roadmap.
Sometimes the word “roadmap” is used to talk about other things, like branching strategies and release schedules. To me, a release schedule and a roadmap are related, but different documents. Releasing is about ensuring users get to use what you make. The roadmap is your guiding light, the beacon at the end of the road that lets you know what you’re making, and why.
Too many projects fall into the trap of having occasional roadmap planning processes, and then posting a mighty document which stays, unchanged, until the next time the planning process gets done. Roadmaps like these end up being historical documents – a shining example of how aspirations get lost along the way of product development.
Other projects are under-ambitious. Either there is no roadmap at all, in which case the business as usual of making software takes over – developers are interrupt-driven, fixing bugs, taking care of user requests, and never taking a step back to look at the bigger picture. Or your roadmap is something you use to track tasks which are already underway, a list of the features which developers are working on right now. It’s like walking in a forest at night with a head-light – you are always looking at your feet avoiding tree-roots, yet you have no idea where you’re going.
When we drew up the roadmap for the GIMP for versions 2.0 and 2.2 in 2003, we committed some of these mistakes. By observing some projects like Inkscape (which has a history of excellent roadmapping) and learning from our mistakes, I came up with a different method which we applied to the WengoPhone from OpenWengo in 2006, and which served us well (until the project became QuteCom, at least). Here are some of the techniques I learned, which I hope will be useful to others.
Time or features?
One question with roadmaps is whether hitting a date for release should be included as an objective. Even though I’ve said that release plans and roadmaps are different documents, I think it is important to set realistic target dates on way-points. Having a calendar in front of you allows you to keep people focussed on the path, and avoid falling into the trap of implementing one small feature that isn’t part of your release criteria. Pure time-based releases, with no features associated, don’t quite work either. The end result is often quite tepid, a product of the release process rather than any design by a core team.
I like Joel’s scheduling technique: “If you have a bunch of wood blocks, and you can’t fit them into a box, you have two choices: get a bigger box, or remove some blocks.” That is, you can mix a time-based and feature-based schedule. You plan features, giving each one a priority. You start at the top and work your way down the list. At the feature freeze date, you run a project review. If a feature is finished, or will be finished (at a sufficient quality level) in time for release, it’s in. If it won’t realistically be finished in time for the release date, it’s bumped. That way, you stick to your schedule (mostly), and there is a motivation to start working on the biggest wood blocks (the most important features) first.
A recent article on lessons learned over years of Bugzilla development by Max Kanat-Alexander made an interesting suggestion which makes a lot of sense to me – at the point you decide to feature freeze and bump features, it may be better to create a release branch for stabilisation work, and allow the trunk to continue in active development. The potential cost of this is a duplication of work merging unfinished features and bug fixes into both branches, the advantage is it allows someone to continue working on a bumped feature while the team as a whole works towards the stable release.
Near term, mid term, long term
The Inkscape roadmap from 2005 is a thing of beauty. The roadmap mixes beautifully long-term goals with short-term planning. Each release has a by-line, a set of one or two things which are the main focus of the release. Some releases are purely focussed on quality. Others include important features. The whole thing feels planned. There is a vision.
But as you come closer and closer to the current work, the plans get broken down, itemised further. The BHAGs of a release in 2 years gets turned into a list of sub-features when it’s one year away, and each of those features gets broken down further as a developer starts planning and working on it.
The fractal geometer in me identifies this as a scaling phenomenon – coding software is like zooming in to a coastline and measuring its length. The value you get when measuring with a 1km long ruler is not the same as with a 1m ruler. And as you get closer and closer to writing code, you also need to break down bigger tasks into smaller tasks, and smaller tasks into object design, then coding the actual objects and methods. Giving your roadmap this sense of scope allows you to look up and see in the distance every now and again.
Keep it accurate
A roadmap is a living document. The best reason to go into no detail at all for future releases beyond specifying a theme is that you have no idea yet how long things will take to do when you get there. If you load up the next version with features, you’re probably aiming for a long death-march in the project team.
The inaccurate roadmap is an object of ridicule, and a motivation killer. If it becomes clear that you’re not going to make a date, change the date (and all the other dates in consequence). That might also be a sign that the team has over-committed for the release, and an opportunity to bump some features.
Leave some empty seats
In community projects, new contributors often arrive who would like to work on features, but they don’t know where to start. There is an in-place core team who are claiming features for the next release left & right, and the new guy doesn’t know what to do. “Fix some bugs” or “do some documentation” are common answers for many projects including GNOME (with the gnome-love keyword in Bugzilla) and LibreOffice (with the easy hacks list). Indeed, these do allow you to get to know the project.
But, as has often been said, developers like to develop features, and sometimes it can be really hard what features are important to the core team. This is especially true with commercial software developers. The roadmap can help.
In any given release, you can include some high priority features – stuff that you would love to see happen – and explicitly marked as “Not taken by the core team”. It should be clear that patches of a sufficiently high standard implementing the feature would be gratefully accepted. This won’t automatically change a new developer into a coding ninja, nor will it prevent an ambitious hacker from biting off more than he can chew, but it will give experienced developers an easy way to prove themselves and earn their place in the core team, and it will also provide some great opportunities for mentoring programs like the Google Summer of Code.
The Subversion roadmap, recently updated by the core team, is another example of best practice in this area. In addition to a mixed features & time based release cycle, they maintain a roadmap which has key goals for a release, but also includes a separate list of high priority features.
The end result: Visibility
The end result of a good roadmap process is that your users know where they stand, more or less, at any given time. Your developers know where you want to take the project, and can see opportunities to contribute. Your core team knows what the release criteria for the next release are, and you have agreed together mid-term and long-term goals for the project that express your common vision. As maintainer, you have a powerful tool to explain your decisions and align your community around your ideas. A good roadmap is the fertile soil on which your developer community will grow.
November 23, 2010
Novell will be bought by a North American group called Attachmate that appears to be made up of financiers buying assets as investments. As someone who has seen one acquisition of a company by financiers up-close, and an acquisition of legacy products, where they languished as cash cows, I feel partly qualified to guess what might happen to Novell post acquisition – although I am often wrong about these things, so take all this with a pinch of salt.
The first hint is that Novell is being split into two groups – traditional Novell activities (mostly identity, security, systems & resource management) and Linux business (Suse Linux – presumably including things like Mono, the desktop group, OBS, Suse Studio and other related interesting stuff). In summary, looking at last quarter’s results (PDF link), old stuff that still generates a lot of revenue but little growth, and new & growing business, which just recently became profitable.
If you are a Dark Hand type of guy, the financier who wants a return on investment and doesn’t really care about innovation or changing the world, then your goal is to buy assets, perhaps sell a subsidiary or two to recoup some of the costs of the deal, perhaps change the management team, and keep the profitable business for a 5 year horizon before selling it on to make a profit. Your anticiated ROI for this type of deal would need to be around 8% to 10% per year.
So you sell on some patents & copyrights that you’re not really interested in (presumably with a free license to use said patents for a period of time), you split your business up into the cash cow moneymaker (Old Novell) and the new, growing business that can sell at a high valuiation relative to its earnings (Suse Linux), and you line up a buyer for the speculative Linux business. With $450M for patents and perhaps $800m for the Linux business, you get old, profitable business with limited growth potential, but with regular earnings (~$600M for the last financial year, as far as I can tell, in legacy revenues, with an operating net margin of >10%) and $300M cash on hand (after subtracting liabilities & deferred revenues from cash on hand).
Let’s do the sums, then: let’s say, for arguments sake, that Suse ends up being worth $800m (not unreasonable given annual revenues in the $300m range, with great growth prospects). This represents probably a 3x valuation of (Suse + Ximian), given that Suse was bought in 2003 for $210m – certainly not unreasonable given the growth of Suse and Linux since then, this might even be on the low side. Add in the $450m for patents, and $300m cash assets that they’re getting as part of the deal.
That means Attachmate will be getting all of Novell’s legacy business for $650m, around one year’s revenue. With an annual return of >10% per year on revenues. Presumably, there will be some cost cutting to increase that margin further, and some growth will be expected, so I’m sure that Attachmate are confident that they will find a buyer for Novell after a few years for around the same price, giving them that 50% return in around 4 or 5 years.
I’m sure that some people here more familiar with the financial markets, SEC filings and annual reports, and generally “the way things work” will point out the half-dozen flaws in my thinking here, but this is what I expect to happen – a lot of people in non-core areas will be laid off in an effort to reduce costs and “streamline” the company (ie. make it a more attractive acquisition target), Suse will be sold on, and Novell will be kept as a cash cow.
To all the friends I have working with Novell, I wish you well. Acquisitions are uncertain times, and morale sapping at the best of times. The dust will settle soon.
« Previous Entries Next Entries »