Attention! Speed bump ahead!
September 14, 2012 8:47 pm

This week I was reminded that the first step in an Open Source project is often the hardest. We’ve been using MediaWiki for a project at work, with the “ConfirmAccount” extension to deal with spammers – a very nice extension indeed! It adds account creation requests to a queue where they can be handled by members of the bureaucrat group.
We had one wishlist item. We wanted to have a notification email sent out to every member of the group when a new request was received. There is an existing feature to send a notification email to a configurable email address, but not to the whole bureaucrat group.
So I said to myself, how hard can that be? I rolled up my sleeves and set aside an afternoon to make the change and submit it upstream. After a few false starts that had nothing to do with MediaWiki, I got down to it.
First task: upgrade to the latest version of MediaWiki – the version on Fedora 17 is MediaWiki 1.16.5, and the latest stable version is 1.19.2. So I downloaded the latest version and followed the upgrade instructions to overwrite the system MediaWiki install in /usr/share/mediawiki with the upstream version. Unfortunately, the way that $IP (the Install Path) is set changed in version 1.17, in a way that took a little time to work around.
Once that was done, I downloaded the HEAD of the trunk branch from SVN which was linked from the old version of the extension home page, and got the extension working. That needed a few additional modules, and some configuration to get the email notifications working locally, but eventually, I was good to go.
I got to work making my change. It took a while, but once I figured out how to turn on debugging and the general idiom for database queries, it was easy enough. After a couple of hours hacking and testing, I was happy with the result.
The first date
I headed back to the extension home page to figure out how to submit a patch. At the same time, out of habit, I joined the project IRC channel, #mediawiki on Freenode, reasoning that if I got lost I could ask for help there. No indication on the Extension page, but a web search showed me that MediaWiki uses Bugzilla. So I registered for Yet Another Bugzilla Account, and confirmed my email address a few minutes later. Then I created a bug on Bugzilla, and attached my patch to the report.
Simultaneously, on IRC, I was asking for help and was told by a very nice community guy called Dereckson that the preferred way to submit patches was through Gerrit. It turned out that the extension home page should have been pointing to the more recent Git repository all along, and I had been developing against the wrong version of MediaWiki. Dereckson updated the extension page with the right repository information as soon as he discovered it hadn’t been updated before. No big deal, I cloned the Git repository, and tried to apply the patch from svn to git. Unfortunately, it didn’t work – some other changes related to translatable strings had changed code in the same area, and I had to re-do the change, but that was pretty easy.
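The SVN-to-Git port boiled down to a plain `git apply` of the old unified diff. Here it is rehearsed in a scratch repo, so the steps run anywhere – the clone URL in the comment is from memory of the era and the patch name is made up:

```shell
# In real life the first step was cloning the extension, roughly:
#   git clone https://gerrit.wikimedia.org/r/p/mediawiki/extensions/ConfirmAccount.git
repo=$(mktemp -d); cd "$repo"; git init -q
printf 'original line\n' > ConfirmAccount.php
git add . && git -c user.email=me@example.com -c user.name=me commit -qm "import"

# A unified diff from the old SVN checkout (generated on the spot here
# to stand in for the real patch):
printf 'extra line\n' >> ConfirmAccount.php
git diff > ../notify-bureaucrats.patch
git checkout -q -- ConfirmAccount.php

# git apply takes plain unified diffs, so an svn-generated patch works
# as long as the surrounding context lines still match; when they have
# drifted (as happened to me), you re-do the change by hand.
git apply ../notify-bureaucrats.patch
tail -n1 ConfirmAccount.php   # → extra line
```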
I did try to submit a patch by following the procedure in the git workflow document, but without an account on Gerrit it didn’t work, of course. Dereckson convinced me to apply for developer access. After some initial resistance – I really didn’t want to become a MediaWiki developer just to submit a drive-by patch – I requested developer access with the comment “I just wanted to submit one patch to an extension I use, Extension:ConfirmAccount.” Half an hour later, jeremyb approved my request with the comment “That’s what you think now!” 🙂
Then I went to the documentation on getting access and followed the instructions until I was directed to upload my ssh key to labs – a resource I did not have access to. Thanks again to Dereckson (once more to the rescue!) on IRC, I found my way through getting set up for git and Gerrit, and got an SSH key registered with Gerrit. Then I went back to the instructions for submitting a patch, and one quick “git review” later, I had submitted my first patch ever to Gerrit.
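For anyone following the same trail, the setup I eventually pieced together looks roughly like this – the names and email are placeholders, and the git-review steps are shown as comments because they need the tool installed and a Gerrit account:

```shell
# Generate a key to paste into the Gerrit web UI (Settings > SSH Public Keys).
ssh-keygen -t rsa -N "" -f ./gerrit_key -C "you@example.com"

# Gerrit matches commits to you by this identity:
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

# Inside the cloned extension, git-review does the remaining wiring:
#   git review -s    # installs the Change-Id commit hook, checks the remote
#   git review       # pushes the current branch to Gerrit for review
```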
Pretty quickly, the first couple of reviews came in. First comment: “There’s some whitespace issues here.” Gee, thanks. Second comment, from Dereckson (again!) started with a “Thank you”, said the idea was a good one, and then gave me an example of the project norms for commit comments, and made one comment on the code suggesting I use an option.
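In fairness, that first comment points at something plain git can catch before anything is pushed. A quick demonstration in a throwaway repo:

```shell
# git diff --check flags trailing whitespace and space-before-tab,
# and exits non-zero when it finds any.
repo=$(mktemp -d); cd "$repo"; git init -q
printf 'clean line\n' > file.txt
git add file.txt
git -c user.email=me@example.com -c user.name=me commit -qm "init"

printf 'clean line\ntrailing space \n' > file.txt
git diff --check || echo "whitespace problems found"
```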
As a first-time user of Gerrit, I noticed a few issues with it for newbies. It’s not at all clear to me how to distinguish “important” review comments from trivial-to-change things like whitespace issues. It’s also not clear whether a -1 blocks a commit, or how to have a discussion with someone about the approach taken in a patch. And it was unclear how you are supposed to update a patch and propose a new, improved version. On my first try, I made some changes, committed them as a new revision on my local branch, and pushed the lot for review (the normal git workflow, I daresay). Unfortunately, this was not correct. I ended up squashing the two commits with a “rebase -i”, and since then I have been using “commit --amend”.
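The workflow I eventually settled on, rehearsed in a scratch repo: the fix has to replace the original commit rather than sit on top of it, because Gerrit groups patchsets by the Change-Id trailer in the commit message. The file names here are made up.

```shell
repo=$(mktemp -d); cd "$repo"; git init -q
printf 'first attempt\n' > fix.txt
git add fix.txt
git -c user.email=me@example.com -c user.name=me commit -qm "Notify bureaucrats"

# Review feedback arrives; fold the correction into the same commit:
printf 'second attempt\n' > fix.txt
git add fix.txt
git -c user.email=me@example.com -c user.name=me commit -q --amend --no-edit

git rev-list --count HEAD    # → 1 (still a single commit)
# "git review" then uploads the amended commit as the next patchset.
# "git rebase -i" with squash/fixup gets you to the same place if the
# correction was already committed separately, as mine was.
```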
After a few more rounds of comments (another whitespace comment, and a suggestion to avoid hard-coding the group name), I am currently on the 5th patchset, which I think does what it says it should and will pass review muster, when someone gets around to reviewing it. I’ve been told that the review time for a small patch like this can be up to 5 or 10 days, and I don’t know exactly how I will know that the process is over and the patch is good to merge.
Sunk costs
The end result is that for a small change to a fairly simple MediaWiki extension, I spent about 2 hours coding, and about 4 or 5 hours (a full afternoon) going through the various hoops involved in submitting the change for review upstream.
I’m aware that this is a one-off cost – that now that I have a Bugzilla account, and a git and Gerrit account, it will be easier next time. Now that I have spent the time reading the MediaWiki coding conventions, git workflow, and have spent time understanding how to use Gerrit, I won’t have these issues again. The next patch will only take a few minutes to submit, and I won’t be wondering if I did something wrong if I don’t get a review in the first 10 minutes.
But along with some installation and firewall issues, I ended up spending slightly more than a full day on this. In hindsight, I’m saying to myself “was it really worth a full day of work to avoid maintaining this 20 line patch over time?”
I think it’s important that projects make newcomers jump through some hoops when joining – the tools you use and the community processes you follow are an important part of your culture. Sometimes, however, the initial investment that the first-time use of a tool requires – an investment that regular contributors never see any more – is big.
If you’ve never used Bugzilla, git, Gerrit, or SSH, how long would it take you to submit a first patch to a project? How many hurdles does someone have to jump through to submit a patch for your project? Is there a way to ease people into it? I could imagine something like an email based process for newcomers, and only after they’ve made a few contributions, insist that all of the community’s preferred tools and processes be used. Or having true single sign-on, where you have only one account-creation process for all of your interactions with the project, so that you don’t end up creating a wiki account, a Bugzilla account, a Labs account and configuring a Gerrit account.
I want to make clear – I am not picking on MediaWiki here. I rate the project well above average in the speed and friendliness with which I was helped at every turn. But they, like every project, have adopted tools to make it easier for regular contributors, and to help ensure that no patches get dropped on the floor because of poor processes. Here’s the $64,000 question: are the tools and processes which make it easier for regular contributors making it harder for first-time contributors?
September 14th, 2012 at 10:39 pm
In an alternate universe you downloaded any one of a million mailer daemons, to which you forwarded that single mail. That daemon then sent out emails to everyone on the list. You were done in less than half an hour, and in the end you had a more robust and well-tested interface for maintaining the list. 😉
September 15th, 2012 at 12:34 am
This is one of the problems that GitHub solves. It’s a win especially for “long tail” small open source projects. But even large projects probably get more casual contributors if they’re on GitHub.
September 15th, 2012 at 12:44 am
Whitespace consistency is important in a project, because fixing it up later makes things harder to find in version control history, so it is better to remove inconsistent stuff before it hits the version control repo.
September 15th, 2012 at 10:48 am
Dave: I agree! It would have been nice for the first commenter to be a little more precise (as the second whitespace commenter was), and perhaps to also look at the code and say whether it looked good. Also, doesn’t Gerrit provide a means to propose replacement patches as well as comments? Might have been nice to change the whitespace for the newbie – and/or point to the relevant coding conventions document. I’m just saying: “This patch has whitespace issues.” isn’t the nicest review comment to get for one’s first patch proposal.
September 15th, 2012 at 12:23 pm
@Havoc: I do wonder, though, if drive by contributors are enough for established and big open source projects. Yes, there is the chance that a one off will turn into a continuing contribution, but in my experience that only happens if there already is the potential interest for that in the first place.
A lot of the process is in place to avoid working on the same old stuff, or to optimise the path for continuing contributors.
Yes, we should optimise the path for the casual contributor, but maybe we need a separate path, instead of changing the established one.
September 15th, 2012 at 1:45 pm
I think there are (at least) two very different kinds of contributors to a project:
1. Those that already have a fix that works for them, and are just trying to “do the right thing” by contributing back upstream. The goal here is to make contributing the patch back *easier* for the submitter than just maintaining the delta indefinitely. Most such contributors will have a fairly “take it or leave it” attitude – the ideal thing (IMO) is to provide an easy way for them to *publish* their change, in a way that someone else can take it over and do the polishing needed to add it to the project. In CPython, this means posting patches to the issue tracker, in other projects it may mean sending them to a mailing list, on GitHub it means submitting a pull request. In all cases, the important point is that the existence of the problem and the patch that solves it are made *public*, rather than the problem remaining unreported and the patch sitting behind the author’s firewall.
2. Those that have a general interest in a project and *want* to be highly invested in it. These are the people that will actually be reviewing patches and shepherding them through to incorporation (as well as working on their own interests).
Now, sometimes people in the first category will “catch the bug” and migrate into the second category. Other times, someone will be looking for a new hobby (for whatever reason), and decide to get more involved in a project they find interesting.
I *do* think it’s important for a project to have a mechanism for people to “lob patches over the fence”. The expectation needs to be set clearly that such patches will often remain unreviewed and unincorporated, but that’s better than not providing a venue for publishing them at all. I also think that it’s important for a project to encourage people to make the move into the second category – senior developers on projects often end up with other demands on their time and stop being as actively involved, so it’s important to have new contributors starting fairly continuously.
For CPython, the PSF board actually paid Brett Cannon to spend 3 months bringing our developer guide up to scratch, and Jesse Noller and I created the “core-mentorship” mailing list as a space for people to ask questions about the mechanics of contribution without intruding upon the inboxes of the far larger number of participants in the main python-dev mailing list.
As far as GitHub and BitBucket and their ilk go, I see the cries of “Please move your hosting to GitHub” as a complete betrayal of the promise of DVCS. Even though GH in particular has done wonders in terms of making it easy to submit fixes using a similar workflow across multiple projects, it’s still placing way too much power in the hands of a single vendor.
September 16th, 2012 at 8:53 am
Having a single sign-on for the whole project is hard. We did it for Mageia, and there are lots of small issues (I did a talk on it at FOSDEM), because most web tools are not designed for this.
It also requires a team of good sysadmins, and that’s not easy to find, since everybody thinks they are good enough for the job.
September 19th, 2012 at 4:53 pm
Hi there! I love this post, because it’s a great statement of the current state of things. I think WMF is doing a lot to make it better, but we could certainly do more.
Emailing patches seems like an OK first-shot implementation of that, but really there needs to be a better way to make patches as a new contributor. I’m pretty sure that allowing registration on Gerrit, and defaulting to having no privileges for review (but all the freedom to make patches) would solve the problem. There may be other, non-Gerrit ways to do this, but this seems like it’s a nice, consolidated solution.
I’d love to help work with the Gerrit/labs admins on this. I’m marktraceur in IRC and am always willing to poke the right people for a task 🙂 I may even find time to work on this if you catch me at the right time!
Thanks for the great article.
September 21st, 2012 at 4:26 pm
The barrier to entry for drive-by patches is far too high for many projects. There must be a dozen patches that I gave up trying to submit to various projects over the years.
October 8th, 2012 at 7:08 pm
I loved this post! I think many people share the same frustrations and you made it very clear what obstacles there are for newbie contributors.
I only recently started contributing to MediaWiki and I had to keep a few notes for personal use, since the Git workflow page (which you link to above) has become huge (and so has the tutorial).
I was therefore quite pleased when I found out that someone set out to create a “one-printed-page” cheatsheet (aptly called Git/TLDR) that would be useful to newcomers and experts (well, intermediates) alike.
I’ve been trying to improve it however I can and spread the word about it. You might want to check it out and perhaps even link to it from your post 🙂
October 12th, 2012 at 7:12 pm
Well, there are certainly multiple ways to improve the experience:
– At LibreOffice you can email your patch to gerrit@libreoffice.org and it should be picked up to gerrit for you
– needing any form of approval to be able to “apply for developer access” or to upload your SSH keys is of course a bad and needless speed bump.
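The LibreOffice route described above works with stock git tooling. A sketch, rehearsed in a scratch repo – the address is the one from the comment, and everything else is illustrative:

```shell
repo=$(mktemp -d); cd "$repo"; git init -q
printf 'fix\n' > f.txt; git add f.txt
git -c user.email=me@example.com -c user.name=me commit -qm "Fix the thing"

# Turn the last commit into a mailable patch file:
git format-patch -1 HEAD     # writes 0001-Fix-the-thing.patch
ls 0001-*.patch

# Sending it needs git-email and SMTP configuration, so only as a comment:
#   git send-email --to=gerrit@libreoffice.org 0001-*.patch
```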
October 16th, 2012 at 2:02 pm
My job is to mentor developers in the art of contributing upstream, and I would give Dave the highest possible grade if he were my student. He clearly jumped through all the hoops with Buddhist patience, and it looks like the patch will be accepted.
Regarding the number and difficulty of the hoops: even the best community-oriented projects have them. If the project is small, the hoops mostly involve talking to the right people, as there is probably little to no documentation on what the guidelines are (and what documentation exists might be out of date, as in MediaWiki’s case). If the project is big (and healthy), there probably is a lot of documentation, but then the problem becomes learning it all and getting through the review process.
Github makes the technical aspects of contributing easier, but I second @Nick’s opinion that this puts too much power in the hands of one vendor. For a fully free software alternative, I suggest people look at .
January 11th, 2013 at 5:24 pm
As always, a thought-provoking post, Dave.
I do, however, have trouble with what you and several of the other commenters call “process”.
A sequence of steps which is not just un-documented, but actually mis-documented, and which requires trial and error to achieve repeatability, is not a process by any definition I’ve ever stumbled across.
I think a lot of open source project administrators would help build their communities by studying “process” and applying some of the repeatability principles that the rest of the world has documented over many decades of research.
The software development library of any given university is chock full of “process” textbooks, published especially in the 70s, 80s and 90s, written on software-specific best practices such as how to maintain a “product” over its lifecycle, how to document test cases, etc.
A study of “business process” will provide more general principles that would apply to an open source community.
For example, Wikipedia lists some goals for a “business process” (http://en.wikipedia.org/wiki/Business_process):
* Definability
* Order
* Customer
* Value-adding
* Embeddedness
* Cross-functionality
Definability and Order are definitely missing from the MediaWiki “process” you described.
Large open source organizations are often myopic toward maintaining any aspect of open source software outside the code.
Build scripts, user documentation and tests (automated and/or documented manual tests) are rarely maintained by large open source organizations, even in an age when developers frequently complain about the lack thereof (or the complexity thereof) in other open source projects.
Creating and maintaining process documentation is often overlooked, too. Not to pick on them, but it seems the MediaWiki developers don’t even have process documentation on their radar.
Maybe it’s asking for too much, but I would love to see a follow-up article on open source organizations that *do* maintain process documentation well, and how they go about keeping it up to date. GNU is probably one that tries (http://www.gnu.org/help/help.html#helpgnu), although its individual umbrella projects are very hit and miss. Undoubtedly there are others who do a good job of maintaining community process docs – maybe Red Hat?
In any case thanks for the interesting article!