The Lifecycle of a Patch (or: Working Upstream)

January 14, 2011 3:45 pm Dave Neary community, freesoftware, gimp, gnome, inkscape, maemo, meego

Yesterday I looked into what it means to be a maintainer of a package. Today, I’m going to examine how to affect change in a distribution like MeeGo, and what it means to work upstream. To do so, we’re going to look at how code gets from a developer’s brain into the hands of a user.

So – how can you make a change in a Linux-based distribution? Here’s what happens when everything works as it should:

You open a bug report for the feature against your distribution
You identify the module or modules you need to change to implement the new feature
You open bug reports for each of the modules concerned, detailing the feature and the changes needed in that module for the feature
You write a patch to implement the feature, and propose it (appropriately cut up for ease of review) to the maintainers of those modules
Once the code has gone through the appropriate review process, it will be committed to the source control of the module(s)
Some time later, the maintainer of each module will include that code in a stable release of the module
Some time after that, the new stable versions will be packaged and uploaded to MeeGo
Your code will be included in the next release of the distribution following the upload.

When people talk about “working upstream” in MeeGo or Linaro, this is what they mean.

To simplify matters for our analysis, let’s consider that the feature we want to implement is self-contained in one module (or related modules which release together). There are two different scenarios we’ll consider:

The module is maintained by people not associated with your distribution (for example, a GNU or GNOME project)
The module is maintained by people closely related to your distribution (for example, Unity in Ubuntu, or oFono in MeeGo)

We will also look at a third situation, where you find and fix a bug in the software you are using – that is, a released version of a distribution (the proverbial “scratching an itch”).

For each case, I will try to pick a representative feature/patch and follow it from developer through to distribution to Real Users.

What if your code changes different projects?

If your code touches several modules (for example, if you are proposing some new API in GTK+ which you want to use in the GIMP) then things can get complicated – you will need a stable version of GTK+ to be released before you can ship a stable release of the GIMP which depends on it.

This issue of staggered releases is one that Andrew Cowie pointed out a few years ago for language bindings. To avoid making bindings on shifting sands, he preferred to package new APIs once they had been included in a stable GNOME release. In turn, Java GNOME developers rarely depend on development release bindings, and they would wait for the new API to be included in a stable bindings release. For example, the gtk_orientable_get_orientation, added to GTK+ at the end of September 2008, was released in GTK+ 2.16, in March 2009. The first version of Java-GNOME which depended on GTK+ 2.16 was version 4.0.13, released in August 2009. That was packaged in distributions in Autumn 2009, and so most users would not have access to the newer bindings for a few months after that – perhaps early 2010 – at which point, the API was written 18 months beforehand.

And that is when you have a regular release schedule you can rely on! Pity the developer who wants to release a GIMP plug-in which depends on some API included in GIMP 2.8 – the last stable GIMP release, 2.6, came out in October 2008, and over two years later, 2.8 still has not released. And when you combine unreliable release schedules for distributions and applications, the results are cumulative: users of the stable Debian distribution are still using GIMP 2.4 releases. The GIMP 2.4 released in October 2007. Features added to the GIMP in late 2007 are still not in the hands of users of stable Debian distributions.

Getting features to users

It is difficult to generalise when users upgrade their Linux distributions, or even to say what proportion of Linux users are new users at any given time. It would be over-simplifying to say that developers use bleeding-edge distributions, power users upgrade early to the latest and greatest, new users install the latest distributions available, but will only upgrade every 18 months or so afterwards, and conservative users stick with “Long term service” or stable distributions. Most developers I know use their computer for work (and thus want a stable distribution) and only install the latest versions of various dependencies they need to work on their project. But let’s generalise and say that this is roughly the case. So (guesstimating) about 10% of your users will be upgrading to the latest distribution very quickly after its release, a further 20% in the months after when the bugs are shaken out, and the rest will follow along in their own time, perhaps 12 or 18 months later.

To make this concrete, let’s follow the life of a single patch. This is complete anecdata, but in my defence, the patch has been chosen by random, from a project which I know has good community processes and release management in place. The patch we’re going to follow adds an extension to Inkscape to render objects along triangular paths.

Bug #226001 opened on 2008-05-03 by inductiveload, with a description of the feature to be added, and proposed code to implement it. The code, as an extension, may have a lower bar for acceptance than code which is core to a project.
Patch submission reviewed on 2008-05-03, minor comments, but patch is accepted (note: This was not the authors first submission to Inkscape)
Patch corrected to respond to comments and committed on 2008-05-03 (did I mention these guys had good community processes!?!)
Inkscape 0.47-pre0, containing the Triangle extension, released on 2009-07-02
Inkscape 0.47-pre4 included in Ubuntu 9.10

So for a feature developed in mid 2008, most Inkscape users will still not have the feature by the end of 2009, 18 months later. This is both a typical and atypical example: in many projects, patch proposals lay unreviewed for days, weeks, sometimes months, but the 0.47 release cycle was a particularly long one for Inkscape. However, I think the lag from code written to presence on user’s hard drives of ~12 to 18 months is about correct.

Does it have to be this hard?

If this were the only way to get features into a distribution, trying to improve MeeGo by contributing upstream would be a very frustrating experience. Happily, there are ways to accelerate the process. Taking the MeeGo kernel as an example, where Greg Kroah-Hartman recently threw in the towel on persuading people to propose patches upstream; the process is supposed to work like this:

Propose a patch for inclusion upstream. This patch will then ship in a future stable kernel release (let’s say 2.6.38).
After peer review, when the code has been accepted for inclusion in the kernel upstream, propose a backport for inclusion in the MeeGo kernel. The back-ported patch will be maintained across the next MeeGo release, and will be dropped when the kernel version included in the MeeGo project catches up with 2.6.38.

The overhead here is reduced basically to the peer review process of the upstream project, and the cumulative cost of merging a patch over the course of 6 months.

As a distributor (or a developer working on a specific distribution), this allows you to get code to everyone, eventually, and have that code included in your distribution as soon as you are sure that it is up to the standard expected by the community. Currently in MeeGo, the trend seems to be more towards submitting patches concurrently upstream and to MeeGo kernel maintainers (or even submitting them upstream once they have been accepted into the MeeGo kernel). In the case that a patch requires substantial modifications, or is rejected outright, upstream, the kernel maintainers are then left carrying a patch indefinitely in the distribution. For one patch, this might not be a big deal, but for thousands of patches, the maintenance and integration burden of these patches adds up.

It is also not unusual for kernel developers to maintain their own git branches for a long time. Three examples that come to mind are inotify, which Robert Love maintained for over a year for both Novell and in the kernel before it was accepted into the mainline, ReiserFS, which was maintained for several years out-of-tree before being shipped with the Linux kernel in 2001, and the fast desktop patchset which Con Kolivas maintained for almost five years on the -ck kernel branch. Distributions will occasionally ship a substantial diff to upstream if there is a maintainer committed to getting the code upstream eventually. Allocating someone to work over a long period to make everyone happy and comfortable with your code may enable you to ship a big patch to upstream, but this will not be sustainable long term.

To summarise: when working upstream, as a distribution, you should only ship with patches which have been accepted in a development version of upstream already, if you can help it.

Meetings in telephone boxes

Sometimes, however, when upstream and downstream coincide, you can simplify things considerably, while also adding a small measure of risk.

In MeeGo, to continue with that example, the distribution architects have a pretty good idea when they can expect emergency telephony to be ready for oFono and the MeeGo telephony stack, because they’re writing it. By co-ordinating the upstream release management with downstream packaging, you can make promises as a distribution which you can’t with community-developed software.

When upstream and downstream are co-ordinating each other, we cut out the middleman. The workflow becomes:

Report a bug/feature request against a component of the distribution
Develop a patch which implements the feature, and submit it directly to the distribution bug tracker
Once it has been reviewed and accepted, you know that your patch will be included in the next version of the distribution.

This gives a distribution much more control, both over what gets done, and when, and explains both the Ayatana and MeeGo UX development projects. However, being able to plan around the release is no guarantee that the release will happen on time: GNOME has in the past been stung by planning during the 2.6 development cycle to depend on a new version of GTK+, only to find that the release was delayed. In the end, the GTK+ release shipped in time for the 2.6 release at the end of March.

Scratch scratch

The other patch lifecycle I’d like to mention, because it is so relevant to distributions, was pointed out to me by Federico Mena Quintero yesterday. What happens to a patch that someone makes and submits to a distribution when they find a bug in stable released software? This is one of the key advantages of free software – if you find a bug in the software you use, and you have the wherewithall, you can fix the bug and share that fix with everyone else.

However, as we have seen, there is typically a lag of several months from the time that software is released and the time it is being used by large numbers of users through distributions. With releases of Red Hat Enterprise Linux, Novell Suse Linux Desktop and Ubuntu LTS being supported for up to 5 years, it is possible that important bugs will be fixed in these stable versions for years after the original developers have moved on, and are no longer maintaining older stable versions.

Let’s say I find and fix a bug in Rhythmbox 0.12.5, which ships with Ubuntu 9.10. I open a bug report on Launchpad, attach a fix to the source .deb there, and I update my local copy. As a user, I’m happy – I have fixed my problem and shared the solution with others. If I’m particularly conscientious, I might open a bug on gnome.org against Rhythmbox and attach my patch there, but since the development version is now 0.13.2, the best you can hope for is that the patch applies cleanly to the master branch, and will be included in the next release. It is very unlikely that the upstream maintainers will release another update to the 0.12 series at this point.

Now imagine that you are a maintainer for Suse, and someone reports the same bug against a long-term service release.In practice, there are several different versions being maintained by different distributions, and no good way to know if the same bug has been reported and fixed by someone else. You end up searching for a fix in upstream bug trackers, and in the bug trackers of each of the other main distributions. According to Federico at the time:

Patches for old versions are traded in the black market. You have friends in another distro? You ask them first, “did you guys already fix this?” Those patches don’t ever manage to reach CVS, where everyone would be able to get them.

Ideally, you could collaborate ahead of time with other distributions to ensure that you are all using the same branch of upstream modules, and are committing patches upstream. The Linux kernel is moving to this model, and there are also discussions underway in GNOME to co-ordinate this type of activity. Mark Shuttleworth has also pushed for something similar by encouraging projects in the core Linux platform to have a regular cadence of releases, so that everyone can synchronise their longer term service offerings every couple of years.

But at the moment, the best you can hope for is that your patch will be included in an upcoming release for your distribution, and which point other users of the distro can avail of it, and that upstream will patch their development version and latest stable versions, and get your patch to everyone in a few months.

Working upstream

The goal of this article is to explain what working upstream actually means, and how to make that more palatable for a distribution that wants to get features written and included in their next release. Hopefully, by pointing out some of the shortcomings of the way patches circulate from developers to users, some of these issues can be addressed.

In any case, one thing is clear – if you are carrying a patch as a distribution without ever submitting it upstream, you are making a costly mistake. You will be carrying code that others won’t, and bearing all of the merge and maintenance burden for that code for years to come. The path to maximum happiness is to co-ordinate with other distributions and with upstream to ensure that everyone is working in the same place, and sharing work as much as possible.

5 Responses

Safe as Milk » Blog Archive » What’s involved in maintaining a package? Says:
January 14th, 2011 at 3:46 pm
[…] happens, in all its gory details, is the next instalment in this series of at least 2 articles: The Lifecycle of a Patch (or: Working Upstream). 6 […]
Tweets that mention Safe as Milk » Blog Archive » The Lifecycle of a Patch -- Topsy.com Says:
January 14th, 2011 at 4:36 pm
[…] This post was mentioned on Twitter by Irish LinuxUG, Zuissi. Zuissi said: Planet Gnome: Dave Neary: The Lifecycle of a Patch: Reposted from Neary Consulting (or: Working Upstream) Yester… http://bit.ly/i1N4r4 […]
I’m looking for a Linux distro with fast browsing and a fast boot. Any suggestions? » Answer Says:
January 14th, 2011 at 8:12 pm
[…] Safe as Milk » Blog Archive » The Lifecycle of a Patch (or … […]
Bob Says:
January 15th, 2011 at 6:09 pm
Any chance to automate this process?
Given everybody talks about coordination, but no one has time to dispose and manage it, would it be possible to engage continous automatic scraping into distributions’ bugtrackers and/or distributions’ source packages to spot patches which have not been published to the referring projects?
Рекомендации по подготовке кода, направляемого для включения в состав Linux-ядра | AllUNIX.ru – Всероссийский портал о UNIX-системах Says:
January 17th, 2011 at 1:17 am
[…] к теме продвижения патчей в upstream, можно отметить заметку Дэйва Нири (Dave Neary), входившего в совет директоров […]

Safe as Milk