## Estimating merge costs

After commenting on Mal Minhas’s “cost of non-participation” paper (PDF), I’ve been thinking about the cost of performing a merge back to a baseline, and I think I have something to work with.

First, this might be obvious, but worth stating: Merging a branch which has changed and a branch which has not changed is trivial, and has zero cost.

So merging only has a cost if we have a situation where the two trees concerned with the merge have changed.

We can also make another observation: If we are only adding new function points to a branch, and the mainline branch does not change the API, there is a very small cost to merging (almost zero). There may be some cost if functions with similar names, performing similar functions, have been added to the mainline branch, but we can trivially merge even a large diff if we are not touching any of the baseline code, and only adding new files, objects, or functions.

With that said, let’s get to the nuts & bolts of the analysis:

Let’s say that a code tree has n function points. A vendor takes a branch and makes a series of modifications which affect x function points in the program. The community develops the mainline, and changes y function points in the original program. Both vendor and community add new function points to extend functionality, but we’re assuming that merging these has an almost zero cost.

The probability of conflicts obviously grows as x and y grow, and it grows very quickly. Let’s assume that every time a given function point has been modified by both the vendor and the community, there is a conflict which must be manually resolved (1). If we assume that changes are independently distributed across the codebase (2), we can work out that the probability of at least one conflict is 1 - [(n-x)!(n-y)!] / [n!(n-x-y)!], if I haven’t messed up my maths (thanks to derf on #maemo for the help!).

So if we have 20 functions, and one function gets modified on the mainline and another on the vendor branch, we have a 5% chance of a conflict, but if we modify 5 each, the probability goes up to over 80%. This is the same phenomenon which lets you show that if you have 23 people in a room, chances are that at least two of them will share a birthday.
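As a quick check of those two figures, here is a minimal Python sketch of the formula, written with binomial coefficients instead of raw factorials (the two forms are algebraically the same). The function name `conflict_probability` is my own, and the sketch relies on the same assumption as the text: each function point is equally likely to be touched, independently on each side.

```python
from math import comb


def conflict_probability(n: int, x: int, y: int) -> float:
    """Probability that at least one function point is changed on both branches.

    n: function points in the original tree
    x: function points changed on the vendor branch
    y: function points changed on the mainline
    """
    if not (0 <= x <= n and 0 <= y <= n):
        raise ValueError("x and y must lie between 0 and n")
    # P(no overlap) = C(n - x, y) / C(n, y), which equals
    # (n-x)!(n-y)! / (n!(n-x-y)!). comb() returns 0 when y > n - x,
    # so an unavoidable overlap correctly gives a probability of 1.
    return 1 - comb(n - x, y) / comb(n, y)


print(f"{conflict_probability(20, 1, 1):.3f}")  # 0.050
print(f"{conflict_probability(20, 5, 5):.3f}")  # 0.806
```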

We can also calculate the expected number of conflicts, and thus the expected cost of the merge, if we assume the cost of each of these conflicts is a constant cost C (3). However, the maths to do that is outside the scope of my skillz right now. Anyone else care to give it a go & put it in the comments?
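One possible answer, under the same assumptions as above (changes spread uniformly and independently, constant cost C per conflict): each of the x function points changed by the vendor is also changed on the mainline with probability y/n, so by linearity of expectation the expected number of conflicts is xy/n, and the expected merge cost is C·xy/n. The Python sketch below computes this and checks it against a small simulation. The function names are hypothetical, and this is an illustration rather than a validated cost model.

```python
import random


def expected_conflicts(n: int, x: int, y: int) -> float:
    """Expected number of function points changed on both branches (x*y/n)."""
    return x * y / n


def expected_merge_cost(n: int, x: int, y: int, cost_per_conflict: float) -> float:
    """Expected merge cost, assuming a constant cost C per conflict."""
    return cost_per_conflict * expected_conflicts(n, x, y)


def simulated_conflicts(n: int, x: int, y: int,
                        trials: int = 20_000, seed: int = 0) -> float:
    """Average overlap between two random sets of changed function points."""
    rng = random.Random(seed)
    points = range(n)
    total = 0
    for _ in range(trials):
        vendor = set(rng.sample(points, x))   # vendor's changed function points
        community = rng.sample(points, y)     # mainline's changed function points
        total += sum(1 for p in community if p in vendor)
    return total / trials


print(expected_conflicts(20, 5, 5))   # 1.25
print(simulated_conflicts(20, 5, 5))  # should land close to 1.25
```

So in the 20-function example with 5 changes on each side, we would expect a little over one conflict per merge on average, even though the chance of hitting at least one is about 80%.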

We have a bunch of data we can analyse to calculate the cost of merges in quantitative terms (for example, Nokia’s merge of Hildon work from GTK+ 2.6 to 2.10), to estimate C, and of course we can quite easily measure n and y over time from the database of source code we have available to us, so it should be possible to give a very basic estimate metric for cost of merge with the public data.

Footnotes:

(1) It’s entirely possible to have automatic merges happen within a single function, and the longer the function, the more likely this is to happen if the patches are short.

(2) A poor assumption, since changes tend to be disproportionately concentrated in a  few key functions.

(3) I would guess that the cost is usually proportional to the number of lines in the function, perhaps to the square of the number of lines – resolving a conflict in a 40 line function is probably more than twice as easy as resolving a conflict in an 80 line function. This is slightly at odds with footnote (1), so overall the assumption of constant cost seems reasonable to me.

## The value of engagement

(Reposted from Neary Consulting)

Mal Minhas of the LiMo Foundation announced and presented a white paper at OSiM World called “Mobile Open Source Economic Analysis” (PDF link). Mal argues that by forking off a version of a free software component to adjust it to your needs, run intensive QA, and ship it in a device (a process which can take up to 2 years), you are leaving money on the table, by way of what he calls “unleveraged potential” – you don’t benefit from all of the features and bug fixes which have gone into the software since you forked off it.

While this is true, it is also not the whole story. Trying to build a rock-solid software platform on shifting sands is not easy. Many projects do not commit to regular stable releases of their software. In the not too distant past, the FFMpeg project, universally shipped in Linux distributions, had never had a stable or unstable release. The GIMP went from version 1.2.0 in December 1999 to 2.0.0 in March 2004 in unstable mode, with only bug-fix releases on the 1.2 series.

In these circumstances, getting both the stability your customers need, and the latest & greatest features, is not easy. Time-based releases, pioneered by the GNOME project in 2001, and now almost universally followed by major free software projects, mitigate this. They give you periodic sync points where you can get software which meets a certain standard of feature stability and robustness. But no software release is bug-free, and this is true for both free and proprietary software. In The Mythical Man-Month, Fred Brooks described the difficulties of system integration, and estimated that 25% of the time in a project would be spent integrating and testing relationships between components which had already been planned, written and debugged. Building a system or a Linux distribution, then, takes a lot longer than just throwing the latest stable version of every project together and hoping it all works.

By participating actively in the QA process of the project leading up to the release, and by maintaining automated test suites and continuous integration, you can both mitigate the effects of the shifting sands of unstable development versions and reduce the integration overhead once you have a stable release. At some stage, you must draw a line in the sand, and start preparing for a release. In the GNOME project, we freeze modules progressively: the API & ABI of the platform, the features to be included in existing modules, new module proposals, strings and user interface changes, before finally we have a complete code freeze pre-release. Similarly, distributors decide early what versions of components they will include on their platforms, and while occasional slippages may be tolerated, moving to a new major version of a major component of the platform would cause integration testing to return more or less to zero – the overhead is enormous.

The difficulty, then, is what to do once this line is drawn. Serious bugs will be fixed in the stable branch, and they can be merged into your platform easily. But what about features you develop to solve problems specific to your device? Typically, free software projects expect new features to be built and tested on the unstable branch, but you are building your platform on the stable version. You have three choices at this point, none pleasant – never merge, merge later, or merge now:

• Develop the feature you want on your copy of the stable branch, resulting in a delta which will be unique to your code-base, which you will have to maintain separately forever. In addition, if you want to benefit from the features and bug fixes added to later versions of the component, you will incur the cost of merging your changes into the latest version, a non-negligible amount of time.
• Once you have released your product and your team has more time, propose the features you have worked on piecemeal to the upstream project, for inclusion in the next stable version. This solution has many issues:
  • If the period is long enough, your feature additions will be long removed from the codebase as it has evolved, and merging your changes into the latest unstable tree will be a major task.
  • You may be redundantly solving problems that the community has already addressed, in a different or incompatible way.
  • Feature requests may need substantial re-writing to meet community standards. This problem is doubly so if you have not consulted the community before developing the feature, to see how it might best be integrated.
  • In the worst case, you may have built a lot of software on an API which is only present in your copy of the component’s source tree, and if your features are rejected, you are stuck maintaining the component, or re-writing substantial amounts of code to work with upstream.
• Develop your feature on the unstable branch of the project, submit it for inclusion (with the overhead that implies), and back-port the feature to your stable branch once included. This guarantees a smaller delta from the next stable version to your branch, and ensures your work gets upstream as soon as possible, but adds a time & labour overhead to the creation of your software platform.

In all of these situations there is a cost: the time & effort of developing software within the community and back-porting, the maintenance cost (and related unleveraged potential) of maintaining your own branch of a major component, or the huge cost of integrating a large delta back to the community-maintained version many months after the code has been written.

Intuitively, it feels like the long-term cheapest solution is to develop, where possible, features in the community-maintained unstable branch, and back-port them to your stable tree when you are finished. While this might be nice in an ideal world, feature proposals have taken literally years to get to the point where they have been accepted into the Linux kernel, and you have a product to ship – sometimes the only choice you have is to maintain the feature yourself out-of-tree, as Robert Love did for over a year with inotify.

While addressing the raw value of the code produced by the community in the interim, Mal does not quantify the costs associated with these options. Indeed, it is difficult to do so. In some cases, there is not only a cost in terms of time & effort, but also in terms of goodwill and standing of your engineers within the community – this is the type of cost which it is very hard to put a dollar value on. I would like to see a way to do so, though, and I think that it would be possible to quantify, for example, the community overhead (as a mean) by looking at the average time for patch acceptance and/or number of lines modified from initial proposal to final mainline merge.

Anyone have any other thoughts on ways you could measure the cost of maintaining a big diff, or the cost of merging a lot of code?

## Frustration

I wonder if it was a mistake to adopt the “evaluate as they come in” method for Maemo Summit presentations. As we received proposals, we evaluated each one on its merits and said yes, no or maybe. If you were a yes, you were added to the schedule. A no got a nice email. A maybe stayed in the queue.

We set a deadline for submissions of September 13th, but this was a deadline for us to finish the schedule, not for people who wanted to give presentations to submit. I said as much in the call for content: “The final deadline for submissions will be September 13th but the sooner you submit your proposal the better chances you will have to get a slot”.

After Nokia World, a bunch of people came out of the woodwork to propose quality presentations, and after reviewing pending proposals last week, we now have an agenda which is almost full – there are 5 open slots and about 8 open lightning talk slots, about half of which are potentially taken already.

So it’s slightly frustrating to see 16 new submissions come in over the past 2 days as people saw the deadline arriving and the schedule filling up. If they were all there before, our choices might have been different, but now we will unfortunately be obliged to reject otherwise great presentations, simply because the proposers waited too long to ask for a slot.

It’s a tough problem to solve, though – if we had set an earlier deadline, we would not have received many of those presentations, or they would have been vague proposals like “can’t say much yet, but this’ll be a cool presentation about something related to Fremantle”. Approving presentations early allowed the council to have better information for travel subsidies and allowed people to book travel earlier and thus cheaper. But we’re going to miss out on some presentations I think would be pretty good. Pity.

## Ton Roosendaal to keynote the Summit community days

Ton Roosendaal (on the left)

I’m very pleased to share that the opening keynote for the community days during the Maemo Summit will be Ton Roosendaal of the Blender Foundation. It’s been my pleasure to know Ton, mostly from afar, for the past few years, and he is one of the most amazing people I know in the free software world.

Ton is one of those people who has a sense of doing things big, and doing them right. Over the past few years, Ton has raised money to hire artists and developers to work on commercial quality films and games, resulting in Project Orange, Project Apricot and Project Peach, with Project Durian now in pre-production (you can buy your copy now and get your name on the credits!). The goal is to show off what Blender can do and to make the program better by working closely with artists to see what needs work. The results are truly impressive, and the amount of foresight and hard work which went into getting each of these projects off the ground and completed is amazing.

The Blender community is also amazing. Ton has continually given passionate users a reason to stay around, and ways to help the project, and that has been rewarded by a diverse community of artists and developers working together. The BlenderNation fan/news site is a testament to the creativity and passion of the community.

I’m really looking forward to hearing Ton speak.

## Maemo Community Council elections Q3 09: Nominations open

The nomination period for candidatures for the Q3 2009 Maemo community council election is now open.

Candidates eligible for election according to the rules of the council can be nominated by anyone in the community. If a maemo.org community member nominates someone other than themselves, the nomination must be accepted by the nominee before it is official.

Nominations may be made before 23:59 UTC, September 20th, at which time a voting period of one week will open, by sending an email to the maemo-community mailing list with the subject “Council Nomination:” followed by the name of the nominee. Nominations can be confirmed by the nominee replying to this email.

I encourage anyone who would like to be on the council to nominate themselves early, and I would encourage all community members to be forthcoming with questions for the candidates.

Important election dates:

• Sept 7: Nominations open for Maemo Community Council elections
• Sept 21: Nominations close, voting opens
• Sept 28: Voting closes, provisional results declared
• Oct 5: If no challenges are upheld, results for elections are final

Good luck to all!