AI software development has grown rapidly in popularity, but how is code quality impacted? Not well, according to new research, with issues including code churn – code added but deleted shortly after – and a higher proportion of repeated code.
The research comes from developer analytics company GitClear, and is based on data gathered on 150 million changed lines of code, two-thirds from private corporations opted into anonymized data sharing, and one-third from open source projects, mostly from Google, Facebook and Microsoft. The study looks at code added, updated, deleted, copied or moved, and excludes what GitClear defines as “noise”, such as the same code committed in multiple branches, blank lines, and other non-meaningful lines.
GitHub’s Copilot, which kick-started the AI coding wave when introduced in beta in June 2021, has more than 1 million developers with paid subscriptions according to CEO Thomas Dohmke, who also said that developers complete tasks 55 percent faster and that 46 percent of code was completed by Copilot in files where it was enabled.
GitClear throws the spotlight on code quality rather than quantity, as well as observing that AI assistants tend to give “suggestions for added code, but never suggestions for updating, moving, or deleting code.” The researchers also propose that “code suggestion algorithms are incentivized to propose suggestions most likely to be accepted,” which seems sensible until one considers the importance of code that is concise and readable.
Measuring code quality is not easy. The researchers do, though, identify some trends: the amount of code being added, deleted, updated and copy/pasted has never been higher, while instances of code being moved have declined. They also see an increase in churn, now at 7.1 percent compared to just 3.3 percent in 2020.
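To make the churn figure concrete, here is a minimal sketch of how such a rate could be computed. It is not GitClear's method: the `LineChange` record, the `churn_rate` function and the 14-day window are all assumptions made for illustration, and the sample data is invented.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class LineChange:
    """Hypothetical record of one added line of code."""
    added_on: date
    removed_on: Optional[date] = None  # None means the line is still present


def churn_rate(changes: List[LineChange], window_days: int = 14) -> float:
    """Share of added lines removed within `window_days` of being added.

    The 14-day default is an assumption for this sketch.
    """
    if not changes:
        return 0.0
    window = timedelta(days=window_days)
    churned = sum(
        1
        for c in changes
        if c.removed_on is not None and c.removed_on - c.added_on <= window
    )
    return churned / len(changes)


# Invented sample: four added lines, one of which is deleted shortly after.
sample = [
    LineChange(date(2023, 3, 1), date(2023, 3, 5)),  # removed after 4 days: churn
    LineChange(date(2023, 3, 1), date(2023, 6, 1)),  # removed months later: not churn
    LineChange(date(2023, 3, 2)),                    # still present
    LineChange(date(2023, 3, 3)),                    # still present
]

print(f"{churn_rate(sample):.1%}")  # prints 25.0%
```

A real measurement would also need to decide what counts as a "line" and how to treat updates rather than outright deletions, which this sketch ignores.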
Code is moved when developers are restructuring it, so moved code can be an indicator of refactoring, which means improving the design and structure of code without altering its behavior. A decline in moved code therefore suggests that less refactoring is taking place.
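As a small illustration of what refactoring looks like in practice, the sketch below starts with two copy/pasted functions that share the same discount logic, then moves that logic into a single helper. All function names and the discount rule are hypothetical; the point is only that the structure changes while the behavior does not.

```python
# Before: the same discount-and-rounding logic is copy/pasted into two functions.
def invoice_total_before(prices):
    total = sum(prices)
    if total > 100:
        total = total * 0.9
    return round(total, 2)


def quote_total_before(prices):
    total = sum(prices)
    if total > 100:
        total = total * 0.9
    return round(total, 2)


# After: the shared logic is moved into one helper that both callers reuse.
def discounted_total(prices):
    """Sum the prices and apply a 10 percent discount above 100."""
    total = sum(prices)
    if total > 100:
        total = total * 0.9
    return round(total, 2)


def invoice_total(prices):
    return discounted_total(prices)


def quote_total(prices):
    return discounted_total(prices)


# Behavior is unchanged: old and new versions agree on the same inputs.
for prices in ([10, 20], [60, 70], []):
    assert invoice_total(prices) == invoice_total_before(prices)
    assert quote_total(prices) == quote_total_before(prices)
```

With the duplicated version, a change to the discount rule has to be made in two places; after the refactoring it is made once.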
The reasons for these trends are open to speculation, though the researchers believe they are related to the growing use of AI coding techniques. They are scathing about the impact of more copy/pasted code, saying there is “no greater scourge to long-term code maintainability.”
Excessive use of copy/paste is not a new problem. Developers may do it because at the time it seems quicker and easier than working out how to reuse existing code, or because multiple developers working on a project are not communicating well, or because too much is copied from samples or from question and answer coding sites.
The GitClear researchers do not say much about how to fix the issues identified, falling back on “questions for follow up research,” though they do suggest that engineering leaders should “monitor incoming data and consider its implications for future product maintenance.”
AI coding assistants are not going away, though they may improve, and as with all newish tools, developers will learn how to optimize their use.
In some ways this research may be reassuring for developers who fear replacement by AI tools. A recent study on AI refactoring, from code analysis company CodeScene, concluded that “AI is nowhere near replacing humans in a coding context; today’s AI is simply too error-prone, and far from a point where it is able to securely modify existing code.”