Can you trust AI to test AI-generated code? DevClass talks AI coding with Tabnine

Peter Guagenti, president and CMO of Tabnine, recently spoke with DevClass about AI versus search, how to assess an AI coding assistant, and whether developers can trust AI-generated tests to validate AI-generated code.

GitHub’s Copilot is the best-known AI coding assistant – but it was not the first. Copilot was first previewed in 2021, while Tabnine, originally a Canadian company, was founded in 2018. In 2019 it was acquired by Israeli firm Codota, which had been working on AI coding since its own foundation in 2015. Adding to the confusion, Codota renamed itself Tabnine in 2021, but the combined entity is among the earliest in the field.

“We’ve been sort of overshadowed by Copilot, and that’s just Microsoft’s reach,” Guagenti told us – though he added that “we’ve held our own as a solid number two.”

The product is now one of many, though, competing with Amazon Q, Google’s Gemini Code Assist, Codeium, JetBrains AI Assistant, and more. In such a crowded market, how can developers assess which is best for them?

“There’s a cohort of us who have effectively the same capabilities,” lamented Guagenti. “So for individual developers, some of the stuff is going to be: do I like how the UX (user experience) works? Do I like how it responds to me? The separation in evaluation is when you go into teams and look at: what are the models underneath it, and do they meet my regulatory, license, and compliance expectations?”

According to Guagenti, a highlight of Tabnine is that it can optionally run as a “fully private deployment” with “zero access to Tabnine infrastructure, so we never see any of your data.”

AI coding assistants are now good at writing code for common scenarios, but when it comes to asking for help with coding problems, they begin to look a lot like search. Is the industry honest about what is AI, and what is just a new take on search?

“I would argue the most useful function for generative AI right now is effectively search,” explained Guagenti. “I spent a year working at Google and I got to see under the hood what they were doing with AI then for search results … I remember the first time I saw it and it was obvious to me that this was the future of finding information. I don’t want to read a page of search results, I want [the] vetted, validated clarity of a single answer – because no one wants to go onto StackOverflow and pore over other people’s posts for three hours.”

“It’s not a question of is it generative AI or is it search,” declared Guagenti, putting it rather as the ability to accumulate knowledge and parse that information back to the developer.

We put it to Guagenti, though, that the problem with summarized content is that its source and authority become opaque. That, combined with the known tendency of AI assistants to be occasionally wrong, makes developers wary.

“I think the only way these tools are ever going to work is if we trust them,” replied Guagenti, retreating somewhat from his earlier answer. “Coding assistants are really focused on what is the training data, and bigger is not always better. We think models [should] start getting smaller and more focused,” he explained, talking up the value of trusted sources. He also noted that coding assistants are getting better at provenance. “Explain your source. Where did you find that information? That’s something we have in beta today.”

Another issue is with software testing. Developers are encouraged to test AI-generated code, because it cannot yet be entirely trusted – but vendors are also pitching AI-generated tests. Can we trust AI to test AI?

“There is definitely a risk in there,” conceded Guagenti. “But the risk also exists with humans – it’s not just an AI risk. The person writing the code is also writing the tests and is motivated for the test to pass. So I don’t think it is a new problem.”

Developers should understand, Guagenti told us, that even if a study shows an 80 percent reduction in effort with automated testing, “you still have to put in the 20 percent. It doesn’t absolve you of your responsibility.”

That said, part of the answer is to have different systems. “You are going to have separate systems that are focused on code generation, and separate systems that are focused on code validation,” Guagenti argued. “We’re thinking different models, different prompts, different context … we believe that the different AI agents can keep each other in check. We think we can do this with AI more effectively than humans are doing today.”
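The generate-then-validate pattern Guagenti describes can be sketched in a few lines. This is a minimal illustration, not Tabnine’s implementation: the two agent functions below are hypothetical stand-ins (a real system would call two different models with different prompts and context), and the validator here simply runs independently derived test cases against the generated code.

```python
def generator_agent(task: str) -> str:
    # Hypothetical stand-in for a code-generation model with its own
    # prompt and context; a real system would call an LLM here.
    if task == "absolute value":
        return "def f(x):\n    return x if x >= 0 else -x"
    return "def f(x):\n    return x"

def validator_agent(task: str, code: str) -> bool:
    # Hypothetical stand-in for a *separate* validation system: it checks
    # the generated code against independently derived test cases rather
    # than trusting the generator's own judgment.
    cases = {"absolute value": [(-3, 3), (0, 0), (5, 5)]}.get(task)
    if not cases:
        return False  # unknown task: refuse rather than rubber-stamp
    namespace: dict = {}
    exec(code, namespace)
    f = namespace["f"]
    return all(f(x) == expected for x, expected in cases)

def generate_with_validation(task: str, max_attempts: int = 3):
    # The two agents keep each other in check: code only ships once the
    # independent validator accepts it.
    for _ in range(max_attempts):
        code = generator_agent(task)
        if validator_agent(task, code):
            return code
    return None

print(generate_with_validation("absolute value") is not None)  # prints True
```

The point of the separation is that the validator’s test cases do not come from the generator, so a generator “motivated for the test to pass” cannot simply write tests that rubber-stamp its own output.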