Code review in open source projects: Influential factors and actions

Coming from “Prioritizing volunteer contributions in free software development”, the Wikimedia Foundation allowed me to spend time on research about code review (CR) earlier in 2016. The theses and bullet points below incorporate random literature and comments from numerous people.
While the results might also be interesting for other free and open source software projects, they might not apply to your project for various reasons.

In Wikimedia we would like to review and merge better code faster. Especially patches submitted by volunteers. Code Review should be a tool and not an obstacle.
Benefits of Code Review are knowledge transfer, increased team awareness, and finding alternative solutions. Good debates help to get to a higher standard of coding and drives quality.^[A1]

I see three dimensions of potential influential factors and potential actions (that often cannot be cleanly separated):

3 aspects: social, technical, organizational.
2 roles: contributor, reviewer.
3 factors: Patch-Acceptance/Positivity-Likeliness, Patch-Time-to-review/merge, Contributor onboarding (not covered here).

In general, “among the factors we studied, non-technical (organizational and personal) ones are betters predictors” (means: possible factors that might affect the outcome and interval of the code review process) “compared to traditional metrics such as patch size or component, and bug priority.”^[S1]

Note that Wikimedia plans to migrate its code review infrastructure from Gerrit to Phabricator Differential at some point.

Unstructured review approach

An unstructured review approach potentially demotivates first patch contributors, but fast and structured feedback is crucial for keeping them engaged.

Set up and document a multi-phase, structured patch review process for reviewers: Three steps proposed by Sarah Sharp for maintainers / reviewers^[A2], quoting:

Fast feedback whether it is wanted: Is the idea behind the contribution sound? / Do we want this? Yes, no. If the contribution isn’t useful or it’s a bad idea, it isn’t worth reviewing further. Or “Thanks for this contribution! I like the concept of this patch, but I don’t have time to thoroughly review it right now. Ping me if I haven’t reviewed it in a week.” The absolute worst thing you can do during phase one is be completely silent.^[A2]
Architecture: Is the contribution architected correctly? Squash the nit-picky, perfectionist part of yourself that wants to comment on every single grammar mistake or code style issue. Instead, only include a sentence or two with a pointer to coding style documentation, or any tools they will need to run their contribution through.^[A2]
Polishing: Is the contribution polished? Get to comment on the meta (non-code) parts of the contribution. Correct any spelling or grammar mistakes, suggest clearer wording for comments, and ask for any updated documentation for the code^[A2]

Lack of enough skillful, available, confident reviewers and mergers

Not enough skillful or available reviewers and potential lack of confident reviewers^[W1]? Not enough reviewers with rights to actually merge into the codebase?

Capacity building: Discuss consider handing out code review rights to more (trusted) volunteers by recognizing active users who mark patches as good-to-go or needs-improvement (based on statistics)? Encourage them to become habitual and trusted reviewers; actively nominate to become maintainers^[W2]? Potentially recognize people not executing their code review rights anymore. Again this requires statistics (to identify very active reviewers) and stakeholders (to decide on nominations).
Review current code review patch approval handout practice (see Wikimedia’s related documentation about +2 rights in Gerrit).
Consider establishing prestigious roles for people, like “Reviewers”?^[W3]
“we recommend including inexperienced reviewers so that they can gain the knowledge and experiences required to provide useful comments to change authors”^[S2]; Reviewers who have prior experience give more useful comments as they have more knowledge about design constraints and implementation.^[S2]

Under-resourced or unclear responsibilities

Lack of repository owners / maintainers, or under-resourced or unclear responsibilities when everyone expects someone else to review. (For the MediaWiki core code repository specifically, see related tasks T115852 and T1287.)

“Changes failing to capture a reviewer’s interest remain unreviewed”^[S3] due to self-selecting process of reviewers, or everybody expects another person in the team to review. “When everyone is responsible for something, nobody is responsible”^[W4].

Have better statistics (on proposed patches waiting for review for a long time) to identify unmaintained areas within a codebase or codebases with unclear maintenance responsibilities.
Define a role to “Assign reviews that nobody selects.”^[S3] (There might be (old) code areas that only one or zero developers understand.) Might need an overall “Code Review wrangler” position similar to a Bugwrangler/Bugmaster.
Clarify and centrally document which Engineering/Development/Product teams are responsible for which codebases, and Team/Maintainer ⟷ Codebase/Repository relations (Example: “How Wikimedia Foundation’s Reading team manages extensions”)
Actively outreach to volunteers for unmaintained codebases via Requesting repository ownership? Might need an overall “Code Review wrangler” position similar to a Bugwrangler/Bugmaster.
Advertise a monthly “Project in need of a maintainer” campaign on a technical mailing list and/or blog posts?

Hard to identify good reviewer candidates

Hard for new contributors to identify and add good reviewers.

“choice of reviewers plays an important role on reviewing time. More active reviewers provide faster responses” but “no correlation between the amount of reviewed patches on the reviewer positivity”.^[S1]

Check “owners” tool in Phabricator “for assigning reviewers based on file ownership”^[W5] so reviewers get notified of patches in their areas of interest. In Gerrit this exists but is limited.
Encourage people to become project members/watchers.^[W6]
Organization specific: Either have automated updating of outdated manual list of Developers/Maintainers, or replace individual names on the list of Developers/Maintainers by links to Phabricator project description pages.
In the vague technical future, automatic reviewer suggestion systems could help^[S2], like automatically listing people who lately touched code in a code repository or related tasks in an issue tracking system and the length of their current review queue. (Proofs of concept have been published in scientific papers but code is not always made available.)

Unhelpful reviewer comments

Due to unhelpful reviewer comments, contributors spend time on creating many revisions/iterations before successful merge.

Make sure documentation for reviewers states:
- Reviewers’ CR comments considered useful by contributors: identifying functional issues; identifying corner cases potentially not covered; suggestions for APIs/designs/code conventions to follow.^[S2]
- Reviewers’ CR comments considered somewhat useful by contributors: coding guidelines; identifying alternative implementations or refactoring^[S2]
- Reviewers’ CR comments considered not useful by contributors: Authors consider reviewers praising on code segments, reviewers asking questions to understand the implementation, and reviewers pointing out future issues not related to the specific code (should be filed as tasks) as not useful.^[S2]
- Avoid negativity and ask the right questions the right way. As a reviewer, ask questions instead of making demands to foster a technical discussion: “What do you think about…?” “Did you consider…?” “Can you clarify…?” “Why didn’t you just…” provides a judgement, putting people on the defensive. Be positive.^[A1]
- If you learned something or found something particular well, give compliments. (As code review is often about critical feedback only.)^[A1]
- Tool specific: Agree and document how to use Gerrit’s negative review (CR-1): “Some people tend to use it in an ‘I don’t like this but go ahead and merge if you disagree’ sense which usually does not come across well. OTOH just leaving a comment makes it very hard to keep track – I have been asked in the past to -1 if I don’t like something but don’t consider it a big deal, because that way it shows up in Gerrit as something that needs more work.”^[W7]
- Stakeholders with different expertise areas to review aspects need to split reviewing parts of a larger patch.

Weak review culture

Prioritization / weak review culture: more pressure to write new code than to review patches contributed? Code review “application is inconsistent and enforcement uneven.”^[W8]

Introduce and foster routine and habit across developers to spend a certain amount of time each day for reviewing patches (or part of standup), and team peer review on complex patches^[A1].
Write code to display “a prominent indicator of whether or not you’ve pushed more changesets than you’ve reviewed”^[W9]?
Technical: Allow finding / explicitly marking first contributions by listing recent first contributions and their time to review on korma’s code_contrib_new_gone in T63563. Someone responsible to ping, follow up, and (with organizational knowledge) to add potential reviewers to such first patches. Might need an overall “Code Review wrangler” position similar to a Bugwrangler/Bugmaster.
Organization specific: Contact the WMF Team Practices Group about their thoughts how this can be fostered?

Workload of existing reviewers

Workload of existing reviewers; too many items on their list already.

Reviewer’s Queue Length: “the shorter the queue, the more likely the reviewer is to do a thorough review and respond quickly” and the longer the more likely it takes longer but “better chance of getting in” (due to more sloppy review?)^[S1].

Code review tool support to propose reviewers or display on how many unreviewed patches a reviewer is already added so the author can choose other reviewers. Proposal to add reviewers to patches^[W2] but this requires already good knowledge of the community members as otherwise it just creates more noise.
Potentially document that “two reviewers find an optimal number of defects – the cost of adding more reviewers isn’t justified […]”^[S3]
- Documentation for reviewers: “we should encourage people to remove themselves from reviewers when they are certain they won’t review the patch. A lot of noise and wasted time is created by the fact that people are unable to keep their dashboards clean”^[WA]
Tool specific: Gerrit’s negative review (CR-1) gets lost when a reviewer removes themselves (bug report) hence Gerrit lists (more) items which look unreviewed. Check if same problem exists in Phabricator Differential?
Tool specific: Agree whether ‘Patch cannot be merged due to conflicts; needs rebasing’ should be a reason to give CR-1^[WB] in order to get a ‘cleaner’ list? (But depending on the Continuous Integration infrastructure tools of your project, such rejection via static analysis might happen automatically anyway.)

Poor quality of contributors’ patches

Due to poor quality of contributors’ patches, reviewers spend time on reviewing many revisions/iterations before successful merge. Might make reviewers ignore instead of reviewing again and again giving yet another negative CR-1 review.

Make sure documentation for contributors states:
- Small, independent, complete patches are more likely to be accepted.^[S4]
- “[I]f there are more files to review [in your patch], then a thorough review takes more time and effort”^[S2] and “review effectiveness decreases with the number of files in the change set.”^[S2]
- Small patches (a maximum of 4 lines changed) “have a higher chance to be accepted than average, while large patches are less likely to be accepted” (probability) but “one cannot determine that the patch size has a significant influence on the time until a patch is accepted” (time)^[S5]
- Patch Size: “Review time [is] weakly correlated to the patch size” but “Smaller patches undergo fewer rounds of revisions”^[S1]
- Reasons for rejecting a patch (not all are equally decisive; “less decisive reasons are usually easier to judge” when it comes to costs explaining rejections):^[S6]
  - Problematic implementation or solution: Compilation errors; Test failures; Incomplete fix; Introducing new bugs; Wrong direction; Suboptimal solution works but there is a more simple or efficient way); Solution too aggressive for end users; Performance; Security
  - Difficult to read or maintain: Including unnecessary changes (to split into separate patch); Violating coding style guidelines; Bad naming (e.g. variable names); Patch size too large (but rarely matters as it’s ambiguous – if necessary it’s not a problem); Missing docs; Inconsistent or misleading docs; No accompanied test cases (When should “No accompanied test cases” be a reason for a negative review? In which cases do we require unit tests?^[W4] This should be more deterministic); Integration conflicts with existing code; Duplication; Misuse of API; risky changes to internal APIs; not well isolated
  - Deviating from the project focus or scope: Idea behind is not of core interest; irrelevant or obsolete
  - Affecting the development schedule / timing: Freeze; low urgency; Too late
  - Lack of communication or trust: Unresponsive patch authors; no discussion prior to patch submission; patch authors’ expertise and reputation^[S6]
  - cf. Reasons of the Phabricator developers why patches can get rejected
- There is a mismatch of judgement: Patch reviewers consistently consider test failures, incomplete fix, introducing new bugs, suboptimal solution, inconsistent docs way more decisive for rejecting than authors.^[S6]
- Propose guidelines for writing acceptable patches:^[S6]
  - Authors should make sure that patch is in scope and relevant before writing patch
  - Authors should be careful to not introduce new bugs instead of only focussing on the target
  - Authors should not only care if the patch works well but also whether it’s an optimal solution
  - Authors should not include unnecessary changes and should check that corner cases are covered
  - Authors should update or create related documentation^[S6] (for Wikimedia, see Development policy)
- Patch author experience is relevant: Be patient and grow. “more experienced patch writers receive faster responses” plus more positive ones. In WebKit, contributors’ very first patch is likely to get positive feedback while for their 3rd to 6th patch it is harder.^[S1]
Agree on who is responsible for testing and document responsibility. (Tool specific: Phabricator Differential can force patch authors to fill out a test plan.)^[W7]

Likeliness of patch acceptance depends on: Developer experience, patch maturity; Review time impacted by submission time, number of code areas affected, number of suggested reviewers, developer experience.^[S7]

Hard to realize a repository is unmaintained

Hard to realize how (in)active a repository is for a potential contributor.

Implement displaying “recent activity” information somewhere in the code repository browser and code review tool, to communicate expectations.
Have documentation that describe steps how to ask for help and/or take over maintainership, to allow contributors to act if interested in the code repository. For Wikimedia these docs are located at Requesting repository ownership.

No culture to improve changesets by other contributors

Changesets are rarely picked up by other developers^[WB]. After merging, “it is very difficult to revert it or to get original developers to help fix some broken aspect of a merged change”^[WB] regarding followup fixing culture.

Document best practices to amend a change written by another contributor if you are interested in bringing the patch forward.

Hard to find related patches

Hard to find existing “related” patches in a certain code area when working on your own patch in that area, or when reviewing several patches in the same code area. (Hence there might also be some potential rebase/merge conflicts^[WB] to avoid if possible.)

Phabricator Differential offers “Recent Similar Open Revisions”.^[WC] Gerrit might have such a feature in a newer version.^[WD]

Lack of synchronization between teams

Lack of synchronization between developer teams: team A stuck because team B doesn’t review their patches?

Organization specific: Wikimedia has regular “Scrum of Scrum” meetings of all scrum masters across teams, to communicate when the work of a team is blocked by another team.

Comment which important factors that you have experienced are missing!

References

Papers:
- [S1] Olga Baysal, Oleksii Kononenko, Reid Holmes, Michael W. Godfrey: “The influence of non-technical factors on code review” in: 20th Working Conference on Reverse Engineering (WCRE 2013), pages 122-131.
- [S2] Amiangshu Bosu, Michaela Greiler, and Christian Bird: “Characteristics of useful code reviews: an empirical study at Microsoft” in: Proceedings of the 12th Working Conference on Mining Software Repositories (MSR 2015), pages 146-156.
- [S3] Peter C. Rigby, Brendan Cleary, Frederic Painchaud, Margaret-Anne Storey, and Daniel M. German: “Contemporary Peer Review Action: Lessons from Open Source Development”, 2012
- [S4] Peter C. Rigby, Daniel M. German, and Margaret-Anne Storey: “Open source software peer review practices: a case study of the apache server” in: Proceedings of the 30th international conference on Software engineering (ICSE 2008), pages 541-550.
- [S5] Peter Weißgerber, Daniel Neu, and Stephan Diehl: “Small patches get in!” in: Proceedings of the international working conference on Mining software repositories (MSR 2008), pages 67-76.
- [S6] Yida Tao, Donggyun Han, and Sunghun Kim: “Writing Acceptable Patches: An Empirical Study of Open Source Project Patches” in: IEEE International Conference on Software Maintenance and Evolution (ICSME 2014), pages 271-280.
- [S7] Yujuan Jiang, Bram Adams, and Daniel M. German: “Will my patch make it? and how fast?: case study on the Linux kernel” in: Proceedings of the 10th Working Conference on Mining Software Repositories (MSR 2013), pages 101-110.
Talks and blogposts:
- [A1] Derek Prior: “Cultivating a Code Review Culture” at RailsConf 2015
- [A2] Sarah Sharp: “The Gentle Art of Patch Review”
Discussions in Wikimedia:

2 Responses to Code review in open source projects: Influential factors and actions

skierpage says:

2016-11-28 at 16:29

(Hi!) Interesting grab-bag of ideas.
`git blame` identifies possible reviewers and indicates if a particular area is inactive. It can falsely suggest people making global changes, you need to use `git log` as well.

A great commit message can encourage review by stating what it fixes, how to see the problem, how to verify the fix, and what the committer tested. All this can be replaced by updates to tests as part of the patch. Could submitting a test case that fails be a separate precursor commit (or would that break CI?)? More tests are good and have a lower bar for review, and encourage review of the follow-up patch that fixes the test

“Reviewers’ CR comments considered somewhat useful by contributors: ” bullet is repeated.
- aklapper says:
  
  2016-11-29 at 14:25
  
  S! Thanks for your comments and sharing your thoughts!

Comments are closed.