I recently read a peer-reviewed academic paper from a couple years ago that analyzed the contributions of different companies to WebKit. The authors didn’t bother to account for individuals using non-corporate email addresses, since that’s hard work, and did not realize that most Google developers contribute to the project using @chromium.org email addresses, resulting in Google’s contributions being massively undercounted. There were other serious mistakes in the paper too, but this is the one that came to mind when reading The FOSS Post’s article Insights On Companies/Developers Behind Wayland.
The FOSS Post didn’t bother to account for where some big developers work, incorrectly trusting that all employees use corporate emails when contributing to open source projects. The article contains some interesting claims, like “Clearly, Samsung and the individual ‘Bryce Harrington’ are almost doing the same work [on Wayland build tools]” and “75% of the code [in libinput] is written by Peter Hutterer. Followed by 10% for a group of individuals and 5% by Red Hat.” I have only very passing familiarity with the Wayland project, but I do know that Bryce works for Samsung, and that Peter works for Red Hat. Suggesting that Red Hat contributed only 5% of the code to libinput, when the real number looks more like 80%, does not speak well of the quality of The FOSS Post’s insights. Also notably, Kristian Høgsberg’s massive contributions to the project were not classed as contributions from Intel, where he was working at the time.
You don’t have to be an expert on the community to take the time to account for people not using corporate emails before publishing an analysis. This is why it’s important to understand the community you are analyzing at least somewhat before publishing such “insights.”
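As a rough illustration of what accounting for this could look like, here is a minimal Python sketch, not anything The FOSS Post actually used. It checks a hand-maintained override table of known developers before falling back to the email domain. The table contents come from the examples above; the function names, the `example.org` addresses, and the sample commit list are all hypothetical.

```python
from collections import Counter

# Hand-maintained overrides for developers known to contribute from
# non-corporate addresses. Entries are taken from the examples in this post.
AFFILIATION_OVERRIDES = {
    "Bryce Harrington": "Samsung",
    "Peter Hutterer": "Red Hat",
    "Kristian Høgsberg": "Intel",
}

# Fallback mapping from email domain to company. Note that a domain does
# not have to match the company name (chromium.org is the example above).
DOMAIN_TO_COMPANY = {
    "chromium.org": "Google",
    "redhat.com": "Red Hat",
    "samsung.com": "Samsung",
    "intel.com": "Intel",
}


def affiliation(name, email):
    """Return the company credited for a commit by this author."""
    if name in AFFILIATION_OVERRIDES:
        return AFFILIATION_OVERRIDES[name]
    domain = email.rsplit("@", 1)[-1].lower()
    return DOMAIN_TO_COMPANY.get(domain, "Unknown")


def count_by_affiliation(commits):
    """Count commits per company from (author name, author email) pairs."""
    return Counter(affiliation(name, email) for name, email in commits)


# Hypothetical sample data; the addresses are placeholders, not real ones.
commits = [
    ("Peter Hutterer", "peter@example.org"),
    ("Peter Hutterer", "peter@example.org"),
    ("Bryce Harrington", "bryce@example.org"),
    ("Some Developer", "dev@chromium.org"),
    ("Another Developer", "someone@gmail.com"),
]

for company, total in count_by_affiliation(commits).most_common():
    print(company, total)
```

A table like this still has to be built by someone who knows the community, and it ignores that people change employers over time, so a real analysis would also need date ranges per developer.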
Update: The FOSS Post’s article was completely updated with new charts to address this issue.
Update #2: Jonas reports in the comments below that the charts are still completely wrong.
FWIW, the graphs are still completely wrong. I know several @gmail.com and @gnome.org contributors (me included) who do so on behalf of some company.
Well, that’s your fault then, isn’t it? Use your corporate mail address when working on company time and your private one when you’re not paid. Either that or create some “corporation affiliation” database.
Because proper research is overrated :)
@carlos Is that really practical though? It assumes that you’re able to track down each and every contributor to a large project and are able to ascertain exactly which company they worked for at the time. Also, I would imagine the contributors would understandably be more than a little creeped out if someone were quietly doing research about them personally on the internet versus just scraping information from a git log.
It’s impossible to tell when a commit is done on paid time and when it’s done in free time.
If developers are not willing to use proper identification for observers, it’s their own fault when their employer does not get proper credit.
Exactly this, working for a company and contributing on their behalf is different than contributing as an individual, no matter if you work there or not.