GitHub research reports high Copilot satisfaction from enterprise devs… but others doubt productivity gains

A GitHub report on research conducted with developers at global technology services company Accenture attempted to quantify the “enterprise impact of Copilot,” and found that it made coding 55 percent faster as well as more fulfilling. A much-discussed post by another developer, however, reported no productivity gains.

The GitHub trial divided Accenture developers into two randomized groups, one with access to Copilot and one without, and analyzed the differences. It also looked at telemetry from Copilot users.

According to the research, those using Copilot submitted 8.69 percent more pull requests, with a 15 percent higher merge rate. The report claims that these metrics are “an excellent measure of code quality as seen through the eyes of a maintainer or coworker.” Even more impressive was an 84 percent increase in successful builds.

Developers in the trial accepted around 30 percent of Copilot suggestions, and 90 percent committed code including some suggestions. Satisfaction also increased, with 95 percent of developers stating that they enjoyed coding more when using Copilot.

GitHub reports that most developers find Copilot useful; but other thoughtful comments show some complex issues with AI coding

Although it is clear that Copilot-generated code is passing reviews and ending up in production, this kind of research does not answer questions about potential increased code verbosity or long-term maintainability.

Another developer, Yuxuan Shui, from London, posted earlier this month about a year of using Copilot. He used it mainly for an open source project, an X11 compositor written in C. Shui said that the unique nature of his project meant that “large-scale reasoning of the codebase with Copilot is out of the window,” which, if true, suggests that the ability of the AI assistant to help developers understand pre-existing code is limited.

Shui nevertheless found Copilot quite good for mundane tasks like writing repetitive code for parsing, glue functions, and similar. Since these were for him the “least fun part of programming,” there was a lot of potential here.

The developer, though, identified two problems. First, he found Copilot inconsistent. It would write one function based on comments and get it right, but then for a second function generate a “chunk of gibberish.” There was no way of predicting which would appear, making this a burden.

Second, he found Copilot slow. “I would wait at least 2-3 seconds to get any suggestion from Copilot,” he wrote, and much longer if it was going to spit out a large chunk of code.

He concluded that currently “I do not think Copilot will improve my productivity,” though he allowed that it may improve in future and tip the balance in its favour.

This is not necessarily inconsistent with Accenture’s results. A C developer writing intricate graphics code may get worse results, for example, than a Java developer writing line-of-business applications.

A lengthy discussion of the post on Hacker News, though, shows that opinions are mixed. “The worst, most obscure bugs I’ve had to debug in the last year were all in Copilot-written code. It looks plausible, but it makes extremely subtle mistakes,” said one contributor, expressing a particularly alarming potential weakness in AI coding.

Another said that Copilot is “pretty great for boilerplate code… It’s a tossup for anything much more complex.”

Developer time is precious, though, and even small gains can be worthwhile. “I’m paying the $10/mo out of my own pocket, and yesterday – at least to me – it paid for itself for the month in just … two examples. I find it delightfully surprising,” said another. “I use it with my Python side-projects, and it’s truly amazing how much time it saves me,” was a comment from someone else.

The general impression is that Copilot (and other AI coding assistants) can be helpful, but not in the pervasive way that their marketing implies. As with other tools, developers have to learn how to get value from them, and what not to do, to avoid polluting rather than improving their code.