Hey GitHub! Universe event introduces speech-to-code Copilot experiment, new Code Search

Developer Advocate Rizèl Scarlett demonstrates Hey GitHub!

GitHub used its Universe event, in San Francisco and online, to introduce new features for its code repository and DevOps platform. The key news includes:

  • GitHub Copilot for business, coming next month, introduces team licensing and management for the Copilot AI coding service
  • Hey GitHub!, billed as an experiment, enables speech control of Copilot
  • Codespaces, ephemeral virtual machines for compiling and debugging code, has new plans offering up to 60 hours a month free for individual developers. In addition, the JetBrains IDEs are now supported for use with Codespaces, and the JupyterLab notebook IDE is in public beta.
  • A redesigned Code Search service, currently in preview, is based on a new search engine with better performance as well as a new user interface.
  • GitHub Actions Importer is in private preview, for migrating CI/CD (Continuous Integration/Continuous Delivery) pipelines from platforms such as CircleCI, Jenkins, or even Azure DevOps, to GitHub Actions.
  • GitHub Accelerator will fund 20 open source maintainers with a “full stipend and mentorship,” according to GitHub CEO Thomas Dohmke.

Copilot is the target of a lawsuit claiming that the service copies code without proper license attribution, but this did not deter the Microsoft-owned organization from promoting it heavily at Universe. GitHub cited research including one study in which 40 percent of the code written by developers was synthesized by Copilot, and another claiming a 55 percent productivity improvement. These are big numbers, and even if normal coding scenarios deliver only a fraction of these claims, the cost of developer time is such that the service could still be worthwhile.

Developer Advocate Rizèl Scarlett showed at Universe how Copilot could go one step further by accepting speech commands to generate or amend code. On stage she both wrote and executed a simple application using speech and Copilot alone. The stated reason is “to bring the benefits of GitHub Copilot to even more developers, including developers who have difficulty typing using their hands,” according to Dohmke.

Party trick or genuinely useful? Reactions are varied. “There are several reasons this wouldn’t work,” observed one developer, citing problems with code review, debugging, editing, and the imprecision of speech recognition. Tasks like removing a comma can be arduous using regular speech recognition engines, but are critical in programming. On the other hand, “this will be a generational paradigm change in how to write code… if it works,” said a more optimistic comment on Hacker News.

The new code search may bring more immediate benefit. A Universe session presented by Timothy Clem, staff software engineer, answered the question “Why build a search engine from scratch?” Clem explained that “code search is uniquely different from text search … code is designed to be understood by machines.” Early experiments with Elasticsearch “took months to index the code,” Clem said.

The new engine, called Blackbird, uses a variety of techniques to improve performance. It can now deliver millions of search results across all of GitHub’s public code in less than a second, and can build a complete index in around 14 hours. One of the insights was that there is a lot of duplicate code on GitHub, so that 76TB of code becomes 22TB once duplicates are removed. The code search supports regular expressions, is currently in private preview, and will likely become the default search on GitHub.
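
To illustrate why duplicate code shrinks the amount of data to index, here is a minimal Python sketch of content-based deduplication. It is not a description of how Blackbird works: the approach (hashing each file with SHA-256 and keeping one copy per digest), the function names, and the sample files are all assumptions made for illustration. Only the 76TB and 22TB figures come from the session.

```python
import hashlib


def deduplicate(files):
    """Group file paths by the SHA-256 digest of their content."""
    by_digest = {}
    for path, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        by_digest.setdefault(digest, []).append(path)
    return by_digest


def unique_size(files, by_digest):
    """Total bytes when each distinct content is stored only once."""
    return sum(len(files[paths[0]]) for paths in by_digest.values())


# Hypothetical sample: two repositories contain an identical file.
files = {
    "repo-a/main.py": b"print('hi')\n",
    "repo-b/main.py": b"print('hi')\n",
    "repo-c/util.py": b"x = 1\n",
}

by_digest = deduplicate(files)
raw = sum(len(content) for content in files.values())
unique = unique_size(files, by_digest)
print(f"raw: {raw} bytes, after deduplication: {unique} bytes")

# The figures quoted in the session: 76TB of code becomes 22TB.
reduction = 1 - 22 / 76
print(f"reduction quoted for GitHub: {reduction:.0%}")
```

On the quoted figures, removing duplicates cuts the data to be indexed by roughly 71 percent.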

Code Search internals

GitHub also presented its latest “Octoverse” statistics, showing that 94M developers are now on GitHub (it was 73M in 2021 and 31M in 2018), and that the fastest growing language is not Python or JavaScript but HCL (HashiCorp Configuration Language), reflecting the advance of infrastructure as code.

The growth of GitHub overall will not be a comfort to those who feel that GitHub combined with Visual Studio Code gives Microsoft too much sway over the developer ecosystem.

More details are in Dohmke’s post here.