Code Intelligence (CI) has introduced CI Spark, an AI assistant built on LLMs (Large Language Models) for creating tests, which the company claims makes test creation 15 times more productive than writing tests manually.
The world of unit tests has evolved since its early days, when developers wrote tests to check that a given input to their code produced the expected output. While such tests remain important, a more powerful way to discover security issues is to test with unexpected, randomized input rather than the inputs the developers had in mind when writing the code. This technique, called fuzzing, has unearthed many security issues that would otherwise have gone undetected.
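The idea can be illustrated with a toy harness (a minimal sketch, not any vendor's implementation): a hypothetical `parse_record` routine that expects well-formed `key=value` input is bombarded with random byte strings, and any input that triggers an unexpected exception is recorded.

```python
import random

def parse_record(data: bytes) -> tuple[str, str]:
    """Hypothetical routine under test: expects input like b"key=value"."""
    key, value = data.split(b"=")  # raises ValueError without exactly one "="
    return key.decode("ascii"), value.decode("ascii")

def fuzz(rounds: int = 1000, seed: int = 42) -> list[bytes]:
    """Feed random byte strings to parse_record; collect crashing inputs."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(rounds):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(1, 16)))
        try:
            parse_record(data)
        except Exception:  # any unexpected failure is a finding
            crashes.append(data)
    return crashes

crashes = fuzz()
print(f"{len(crashes)} crashing inputs found")
```

Real fuzzers such as libFuzzer or Jazzer are far more sophisticated (coverage-guided mutation rather than blind randomness), but the principle is the same: the harness, not the developer, decides what input to try.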
In August, Google posted about adding LLM-based AI to its OSS-Fuzz project for continuous fuzz testing of open source software. A fuzz introspector tool identifies under-tested areas of the code and creates prompts to the LLM requesting new fuzz targets. If the code generated by the LLM fails to compile, the tool generates a new prompt asking for a fix. “At first, the code generated from our prompts wouldn’t compile; however, after several rounds of prompt engineering and trying out the new fuzz targets, we saw projects gain between 1.5% and 31% code coverage,” the Google team reported.
CI is both a provider of enterprise testing tools and the maintainer of an open source fuzzing tool for JVM (Java Virtual Machine) languages called Jazzer, which has been used by OSS-Fuzz since 2021. Given that collaboration, it is no surprise that CI is now introducing CI Spark, which it says takes a similar approach but targets enterprise software development. While “the main purpose of this new OSS-Fuzz addition is to keep securing open-source,” the company said, “CI Spark will soon be rolled out for commercial projects as well.”
That said, it is early days for the tool and the roadmap includes support for different LLMs, a new evaluation framework to validate the quality of a test, static analysis to improve the identification of candidates for testing, and support for additional programming languages.
The commercial pressure to deliver features can mean that testing – whether unit testing or fuzz testing – can be neglected, resulting in software that is less resilient, less secure, and harder to maintain. Applying AI to this problem is an obvious solution. GitHub’s Copilot coding assistant has support for unit test generation and, earlier this year, Tabnine introduced AI-powered unit testing. Applying AI to fuzz testing has been talked about for years – such as in this Microsoft Research post from 2017 – but it now looks set to become mainstream.