OCaml maintainers reject massive AI-generated pull request

The maintainers of OCaml, an open-source functional programming language, rejected an AI-generated pull request (PR) of more than 13,000 lines of code, citing copyright concerns, a lack of review resources, and a development approach that did not align with the project's maintenance practices – raising key questions about how AI and open source will interact in future.

OCaml has several compilers, including ocamlc, which emits bytecode, and ocamlopt, which builds standalone native executables. Typically, OCaml developers use the bytecode compiler for developing and testing their code, but may use the native compiler for production. There are built-in tools for debugging OCaml bytecode, but debugging natively compiled code is more limited.

Developer Joel Reymont, frustrated by the lack of DWARF debugging information when using the native compiler, turned to Anthropic’s Claude Code to add it. DWARF is a standard format used by many debuggers, including lldb (the LLVM debugger) and gdb (the GNU debugger). Reymont described the process on his blog, saying, “I did not write a single line of code but carefully shepherded AI over the course of several days … my work was just directing, shaping, cajoling and reviewing.” He then submitted the code as a pull request to the OCaml GitHub repository.

A curious aspect of the submission was that some of the source code was credited to Mark Shinwell at Jane Street Europe, a financial trading company whose research includes an open-source project called OxCaml, which is described as a safer and more performant version of OCaml and includes DWARF debugging support.

Many files in the AI-generated code credit Mark Shinwell as the author – though the AI also insisted that no code was copied

“This seems to be largely a copy of the work done in OxCaml,” remarked OCaml contributor Tim McGilchrist, who is also working on the project. Asked why some of the files credited Shinwell as the author, Reymont said, “Beats me. AI decided to do so and I didn’t question it.”

Shinwell also consulted AI regarding the copyright; it responded, “I conclude that no code was copied from oxcaml,” and gave reasons. Unconvinced, maintainer Gabriel Scherer said, “the fact that the tool that produced the code attributes its copyright to a real human is a clear sign that something is an issue.”

Other issues identified included the lack of a design discussion about the feature, the difficulty of reviewing such a “humungous amount of code,” the future maintenance burden, and the fact that others were working on a similar feature that was not yet ready for upstreaming – this being the DWARF support in OxCaml.

Three years ago, Scherer posted that OCaml “maintainers have been constantly complaining that there are more people willing to submit changes/PRs than people willing to review them, creating a bottleneck on the reviewing side.” This situation has not improved, and people submitting “very large relatively-low-effort PRs creates a real risk of bringing the Pull-Request system to a halt,” Scherer said in a comment on Reymont’s PR. He added that, in his experience, reviewing AI-generated code is more taxing than reviewing human-written code.

Following some discussion, Scherer closed the pull request, stating that “none of the people who could plausibly be interested in reviewing and supporting a DWARF-support PR seem willing to consider doing the work … so I think that this has no chance of being merged.” He said it was not a value judgement on the code, but that the development approach did not align with that of the project. “Currently github/ocaml as it is organized is not able to work in this way.”

A claim by Reymont that the AI had “deep understanding” of the code was challenged by another developer as showing a lack of knowledge about how LLMs (large language models) work.

Despite the issues with this particular PR, the fact that Claude Code was able, with human oversight, to come up with a working feature of such intricacy is in itself impressive – though the reliability and quality of the code are not known.

Scherer remarked in another comment that “getting good native-debugging support in the OCaml compiler would be very nice,” adding that the difficulty was to find a way of doing this that had a reasonable maintenance burden. He also said the OCaml project lacked a “clear policy for what we expect regarding AI-assisted code contributions.”

Considering the strong push for AI coding from industry giants including Microsoft, Google and AWS, it is inevitable that the volume of PRs submitted to open source projects that are coded wholly or partly using AI will increase. AI optimists may believe that AI itself can review these PRs satisfactorily, but there are obvious risks in this, given that LLM output is not deterministic and problems such as hallucination and prompt injection remain unsolved.