The effect of AI tools on coding
Joel Becker et al., "Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity", METR 7/10/2025:
Despite widespread adoption, the impact of AI tools on software development in the wild remains understudied. We conduct a randomized controlled trial (RCT) to understand how AI tools at the February–June 2025 frontier affect the productivity of experienced open-source developers. 16 developers with moderate AI experience complete 246 tasks in mature projects on which they have an average of 5 years of prior experience. Each task is randomly assigned to allow or disallow usage of early-2025 AI tools. When AI tools are allowed, developers primarily use Cursor Pro, a popular code editor, and Claude 3.5/3.7 Sonnet. Before starting tasks, developers forecast that allowing AI will reduce completion time by 24%. After completing the study, developers estimate that allowing AI reduced completion time by 20%. Surprisingly, we find that allowing AI actually increases completion time by 19%—AI tooling slowed developers down. This slowdown also contradicts predictions from experts in economics (39% shorter) and ML (38% shorter). To understand this result, we collect and evaluate evidence for 20 properties of our setting that a priori could contribute to the observed slowdown effect—for example, the size and quality standards of projects, or prior developer experience with AI tooling. Although the influence of experimental artifacts cannot be entirely ruled out, the robustness of the slowdown effect across our analyses suggests it is unlikely to primarily be a function of our experimental design.
(See also this version…)
A graph of their results:
This Swedish thesis confirmed those survey results, but did not test actual development time — the METR results show that users' opinions about productivity are by no means always accurate. Of course those METR results were based on Claude 3.5/3.7 Sonnet — Claude 4 might be different. Or might not be. And maybe making coders feel good is worth a 19% productivity decline…
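As a back-of-the-envelope sketch (my arithmetic, not the study's), it's worth noting what those percentages imply when translated into throughput, and just how wide the gap is between what developers perceived and what was measured:

```python
# Illustrative arithmetic for the METR percentages (not from the paper itself).

# If AI-allowed tasks take 19% longer, tasks completed per unit time
# fall by about 16%, since throughput is the reciprocal of time:
baseline_time = 1.0                # normalized time per task without AI
ai_time = baseline_time * 1.19     # the measured 19% slowdown
throughput_drop = 1 - baseline_time / ai_time
print(f"throughput drop: {throughput_drop:.1%}")   # -> 16.0%

# Developers perceived a 20% time reduction; the measured result was a
# 19% increase — a 39-percentage-point gap between belief and reality:
perceived, measured = -0.20, 0.19
print(f"perception gap: {measured - perceived:.0%}")  # -> 39%
```

The throughput figure is just a framing point: a 19% slowdown in time per task is a ~16% loss in tasks per hour, so the effect size depends on which ratio you quote.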
(Here's someone who's really enthusiastic about using Claude Code — I assume the latest version — but again, it's opinion and not productivity measurement.)
Articles like "Generative AI is Turning Publishing Into a Swamp of Slop" (7/10/2025) suggest that LLMs are enhancing "productivity" in certain corners of the publishing industry. So it would be interesting to understand (beyond the obvious reasons) why coding is different, and what the implications are for other applications.
The METR discussion includes some attempts to "very roughly gesture at some salient important differences", which would apply in other fields. My own concern, based on considerable experience, is that the motives of the administrators (and consultants) responsible for tool choice are pretty clearly not always aligned with productivity improvements. Or user satisfaction, for that matter…
Ron Stieger said,
July 13, 2025 @ 8:23 am
I don’t think coding is different from publishing in this regard. The key to this result is “experienced developers”. There have been other results that AI does increase productivity for junior developers. But the senior developers know how to recognize slop and won’t accept it, so they end up with more iterations. That’s certainly been my experience: at times it’s an incredible shortcut but other times it goes down a rabbit hole of wrongness, so overall a wash at best. I would expect a similar result from serious writers – it would slow them down even though it accelerates the production of crappy writing.
A more interesting question perhaps: why were expectations/perceptions so far off from the measured reality?