Code review at scale: what I learned running it for three years

After enough years running code review, I stopped thinking the main question was whether I caught the bug.

Bugs matter. But if bug-catching is the entire value of review, the team is leaving most of the value on the table. Over three years of leading reviews at PickYourTrail, my view of what code review is actually for shifted substantially — from quality filter to quality system. That shift changed how I review, what I comment on, and how I think about review culture as a team infrastructure problem.

What review is actually for

The early version of my review practice was mostly a filter. Something comes in, I inspect it, I approve it or send it back. That is not wrong — catching problems before they merge is genuinely important — but it is incomplete as a mental model.

In a real team, review is also where people learn what “good” looks like in this codebase. How we name things. How we structure logic. What tradeoffs we accept and what risks we do not. That knowledge is not transmitted by documentation or architecture diagrams. It is transmitted through the ongoing conversation of review, where abstract standards become concrete in specific code. If review is only gatekeeping, it slows the team. If review is a quality and alignment mechanism, it helps the team scale.

What took time to internalize: review has a teaching surface completely separate from its gate function. When I leave a comment, I am not only acting on this specific pull request. I am shaping how the author approaches the next ten decisions they make without my involvement. A comment that explains the risk or principle behind a requested change creates capability. A comment that just flags the problem without explaining why creates dependency.

The review comments I am embarrassed about

I used to write comments that were technically correct but not very useful. They were sharper than they needed to be, and they optimized more for demonstrating what I had seen than for helping the engineer improve their thinking.

Good review is not performance. The reviewer who sounds the sharpest is not necessarily the one creating the most value. A comment that makes the author feel stupid while technically identifying a real problem has mixed value at best. A comment that makes the author understand something they did not understand before — about the system, about the pattern, about the cost of the choice they made — has compounding value.

There is a real difference between “this should be in the service layer” and “the reason we put this kind of logic in the service layer is that it keeps the controller from knowing about domain rules, which lets us test this behavior in isolation without spinning up an HTTP context.” Both might produce the same code change. Only one produces a better engineer.

Turnaround time and the flow problem

This took me longer to learn than it should have. A brilliant review that arrives two days late can still damage team momentum significantly.

Turnaround time matters because review is not only a quality step — it is also a throughput step. When pull requests sit waiting, engineers context-switch to other work, branches drift from main and accumulate merge risk, and the feedback arrives in a context that is no longer live in the author’s mind. Worse, when review becomes a persistent bottleneck, the team’s relationship with the process degrades. Review starts feeling like something done to the team rather than something the team does for itself.

The calibration I settled on: better to review more quickly and slightly less exhaustively than to be thorough on a schedule that breaks the team’s flow. That is a tradeoff, and it means some things get through that a longer look would have caught. But a review culture the team learns to resent costs more in the long run.
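To make turnaround visible rather than anecdotal, one option is to track the wait between a review request and the first review. The following is a minimal sketch, not a description of any particular tool: the function names, the `(requested_at, first_review_at)` data shape, and the sample timestamps are all assumptions for illustration, and the waits are calendar hours rather than working hours.

```python
from datetime import datetime
from statistics import median


def first_response_waits(prs):
    """Return the wait between review request and first review for each PR.

    `prs` is a list of (requested_at, first_review_at) pairs (a hypothetical
    shape). PRs still waiting for a review carry None and are skipped.
    """
    return [reviewed - requested for requested, reviewed in prs if reviewed is not None]


def median_wait_hours(prs):
    """Median calendar hours to first review, or None if nothing was reviewed."""
    waits = first_response_waits(prs)
    if not waits:
        return None
    return median(w.total_seconds() for w in waits) / 3600


# Illustrative data only: three reviewed PRs and one still waiting.
SAMPLE_PRS = [
    (datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 11, 0)),    # 2 hours
    (datetime(2024, 1, 8, 14, 0), datetime(2024, 1, 10, 14, 0)),  # 48 hours
    (datetime(2024, 1, 9, 10, 0), datetime(2024, 1, 9, 14, 0)),   # 4 hours
    (datetime(2024, 1, 9, 16, 0), None),                          # not reviewed yet
]

if __name__ == "__main__":
    print(median_wait_hours(SAMPLE_PRS))
```

The median is used here instead of the mean so that a single two-day outlier does not hide an otherwise healthy pattern, though the outliers are usually worth looking at on their own.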

What I ask for versus what I require

One of the most useful distinctions I made explicit over time: the difference between feedback that should block a merge and feedback that is a suggestion or preference.

If everything becomes a blocker, review gets political and exhausting. Engineers start treating any comment as adversarial, because any comment might become the thing standing between them and shipping. If nothing becomes a blocker, review loses its enforcement function. The middle ground is explicit signaling about what kind of feedback each comment is.

Correctness, security, significant performance risk, or maintainability problems serious enough to harm the system: those block. Naming preferences, alternative structures, or things to consider in a follow-up should be clearly marked as non-blocking. A reviewer who conflates these two categories creates a poor review experience. An author who does not know whether a comment is blocking is stuck guessing, which produces slower cycles and worse conversations.

This seems like a small process detail. In practice it made reviews noticeably less stressful and more productive.
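One lightweight way to signal the distinction is a prefix on each comment. The sketch below assumes a hypothetical team convention (`blocking:`, `nit:`, `suggestion:`, `follow-up:`); the prefixes and function names are illustrative rather than a standard, and the point is only that unlabeled comments can be surfaced instead of left for the author to guess at.

```python
from collections import Counter

BLOCKING = "blocking"
NON_BLOCKING = "non-blocking"
UNLABELED = "unlabeled"

# Hypothetical prefixes; a team would pick its own vocabulary.
BLOCKING_PREFIXES = ("blocking:",)
NON_BLOCKING_PREFIXES = ("nit:", "suggestion:", "follow-up:")


def classify(comment: str) -> str:
    """Classify a review comment by its leading label, ignoring case."""
    text = comment.strip().lower()
    if text.startswith(BLOCKING_PREFIXES):
        return BLOCKING
    if text.startswith(NON_BLOCKING_PREFIXES):
        return NON_BLOCKING
    return UNLABELED


def summarize(comments):
    """Count comments per category so unlabeled ones stand out."""
    return Counter(classify(c) for c in comments)


# Illustrative comments only.
SAMPLE_COMMENTS = [
    "Blocking: this query is built by string concatenation from user input.",
    "nit: `data` could be named after what it holds.",
    "suggestion: this might read better as two functions.",
    "Why is this here?",
]

if __name__ == "__main__":
    print(summarize(SAMPLE_COMMENTS))
```

A comment like "Why is this here?" lands in the unlabeled bucket, which is exactly the kind of comment that leaves an author unsure whether they can merge.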

PR shape is a review multiplier

A large, unfocused PR is not just a reviewer inconvenience. It is a quality risk. Reviews of large PRs tend to be shallower than reviews of small ones, not because reviewers are lazy, but because attention is finite and large PRs demand context reconstruction that small PRs do not. The problems that get through tend to be the ones buried in the middle.

Small, well-scoped PRs are a review multiplier in both directions. The reviewer can be more thorough because less context reconstruction is required. The author gets feedback that is more precise and more useful. The commit history becomes more intelligible. And the review cycle is faster because a focused change is easier to understand and easier to approve.

Getting the team to genuinely write smaller PRs is harder than it sounds. It requires changing habits that are often deeply set. But the quality improvement is real and visible, and it is one of the highest-leverage review culture interventions I know.
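A gentle nudge toward smaller PRs can come from measuring size before review is requested. The sketch below sums the added and deleted lines reported by `git diff --numstat`, whose output is one tab-separated line per file with `-` in both count columns for binary files. The 400-line limit, the function names, and the file paths in the sample are assumptions for illustration; line count is also only a rough proxy for how focused a change is.

```python
# Hypothetical threshold; the right number depends on the team and codebase.
MAX_CHANGED_LINES = 400


def changed_lines(numstat: str) -> int:
    """Sum added and deleted lines from `git diff --numstat` output.

    Each line has three tab-separated fields: added, deleted, path.
    Binary files report "-" for both counts and are skipped.
    """
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # blank or malformed line
        added, deleted = parts[0], parts[1]
        if added == "-" or deleted == "-":
            continue  # binary file, no line counts
        total += int(added) + int(deleted)
    return total


def is_oversized(numstat: str, limit: int = MAX_CHANGED_LINES) -> bool:
    """True when the change touches more lines than the chosen limit."""
    return changed_lines(numstat) > limit


# Illustrative numstat output; the paths are made up.
SAMPLE_NUMSTAT = (
    "120\t30\tsrc/booking/service.py\n"
    "-\t-\tdocs/diagram.png\n"
    "15\t5\ttests/test_service.py\n"
)

if __name__ == "__main__":
    print(changed_lines(SAMPLE_NUMSTAT), is_oversized(SAMPLE_NUMSTAT))
```

A check like this works better as a prompt to split the change than as a hard gate, since some large diffs (generated files, mechanical renames) are perfectly reviewable.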

Review as an architecture signal

One of the things I value most about sustained review practice is that it creates visibility into architecture problems before they become crises.

Patterns accumulate in the comment history. Every feature requires the same awkward workaround. One module attracts confusing logic repeatedly. A shared abstraction keeps getting bypassed in ways that suggest it is not modeling the right thing. Individual PRs can look fine while collectively pointing at a structural problem.

I started treating recurring review comments as architecture data. If I keep leaving the same kind of comment — about a leaky boundary, about logic that keeps ending up in the wrong layer, about a configuration pattern that propagates in ways it should not — that is a signal worth acting on systemically rather than just per-PR. The right response eventually is not a better comment but a design conversation about why the system keeps producing that pressure.
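Treating comments as data can be as simple as tagging each one with a theme and counting how many distinct PRs each theme shows up in. This is a minimal sketch under stated assumptions: the `(pr_id, theme)` shape, the theme names, the PR numbers, and the three-PR threshold are all hypothetical, and the tagging itself is assumed to be done by hand or by some other process.

```python
from collections import defaultdict


def recurring_themes(tagged_comments, min_prs=3):
    """Return themes that appear in at least `min_prs` distinct PRs.

    `tagged_comments` is an iterable of (pr_id, theme) pairs. Repeated
    comments on the same PR count once, so one noisy PR cannot create
    a "pattern" by itself. Results are sorted most widespread first.
    """
    prs_by_theme = defaultdict(set)
    for pr_id, theme in tagged_comments:
        prs_by_theme[theme].add(pr_id)
    counts = {
        theme: len(prs)
        for theme, prs in prs_by_theme.items()
        if len(prs) >= min_prs
    }
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Illustrative data only: PR ids and theme tags are made up.
SAMPLE_TAGS = [
    (101, "logic-in-controller"),
    (101, "logic-in-controller"),
    (104, "logic-in-controller"),
    (109, "logic-in-controller"),
    (104, "naming"),
    (112, "leaky-boundary"),
    (115, "leaky-boundary"),
]

if __name__ == "__main__":
    print(recurring_themes(SAMPLE_TAGS))
```

Counting distinct PRs rather than raw comments matches the idea in the text: the signal is that many individually reasonable changes keep running into the same pressure.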

What three years of this taught me

Code review is not a final exam for the author. It is a quality conversation for the team, and the team’s ability to have that conversation well is itself a technical capability worth investing in.

Review culture is not something that happens automatically when you set up a required-review policy. It is shaped by the quality of the comments people leave, the norms that develop around turnaround time and PR size, the distinction between blocking and non-blocking feedback, and most importantly by whether the team has internalized review as something that helps them rather than something done to them.

Getting there takes time. It also requires the most senior people in the review system to model the behavior explicitly — to write the kind of comments they want others to write, to respond quickly, to be clear about what they require versus what they suggest. Review culture propagates from what the highest-influence participants actually do, not from policy statements about what review should be.