Apache Lucene goes full steam ahead on performance with 9.0 release

By Julia Schmidt

December 13, 2021

The team behind search engine Apache Lucene has recently made version 9.0 of the open source project available for download, bringing performance improvements and first steps towards Java module system support to its user base.

Lucene, which serves as the basis for projects such as Elasticsearch and MongoDB Atlas’ full-text search, tries to keep up with the times in version 9.0 by looking into ways of supporting new usage scenarios and Java features. It is, for instance, the first release to provide JARs with automatically generated module names, which the team behind the engine hopes will help to enable work with the Java module system somewhere along the line.

For v9.0, the Lucene team has also been busy exploring the indexing of high-dimensionality numeric vectors to perform nearest-neighbor search. The resulting implementation uses the Hierarchical Navigable Small World graph algorithm and has been added to answer growing demand from data scientists working in the field of machine learning, who want to index documents containing vectors.
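To illustrate what nearest-neighbor search over vectors means, here is a minimal sketch in plain Java. It is not Lucene's implementation and does not use any Lucene API: it scans every vector and returns the exact closest one, whereas a Hierarchical Navigable Small World graph is an approximate technique designed to avoid such a full scan. The class and method names are made up for this example.

```java
class BruteForceNearestNeighbor {

    // Squared Euclidean distance between two vectors of equal length.
    static double squaredDistance(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Returns the index of the stored vector closest to the query,
    // or -1 if there are no stored vectors. Cost grows linearly with
    // the number of vectors, which is what graph-based indexes try to avoid.
    static int nearest(float[][] vectors, float[] query) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < vectors.length; i++) {
            double distance = squaredDistance(vectors[i], query);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical two-dimensional "document" vectors.
        float[][] vectors = { {0f, 0f}, {1f, 1f}, {5f, 5f} };
        float[] query = {0.9f, 1.2f};
        System.out.println("closest vector index: " + nearest(vectors, query));
    }
}
```

Real embedding vectors typically have hundreds of dimensions and collections hold millions of them, which is why an index structure is used instead of a linear scan like this one.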

However, the focus of the new major release seems to have been largely placed on performance, as the update’s announcement highlights speed-ups in areas like taxonomy faceting, sorting, and indexing of multi-dimensional points. And there’s still more to come, as it also includes some foundational work to take system statistics into account when running queries concurrently, which looks to make the most out of the resources available.

Apart from that, Lucene comes with reworked ConcurrentMergeScheduler settings, which assume modern I/O to improve indexing performance and prevent systems from running into seemingly random JDK issues. RegExp queries have become stricter, following the Java Pattern policy for rejecting illegal syntax, and now know how to handle \w, \W, \d, \D, \s, and \S expressions.
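For readers unfamiliar with the Java Pattern behaviour referred to here, the following sketch uses the JDK's own java.util.regex package, not Lucene's RegExp class, whose syntax differs in other respects. It shows the shorthand character classes in use and how Pattern rejects an undefined escape such as \q instead of silently accepting it. The class and helper names are made up for this example.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexShorthandDemo {

    // True if the entire input matches the expression.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True if java.util.regex accepts the expression's syntax.
    static boolean isValid(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // \d matches a digit, \w a word character, \s whitespace;
        // the upper-case forms match the complement of each class.
        System.out.println(fullMatch("\\d{4}", "2021"));
        System.out.println(fullMatch("\\w+\\s\\w+", "Apache Lucene"));
        System.out.println(fullMatch("\\D+", "9.0"));

        // A backslash before a letter with no defined meaning is an error.
        System.out.println(isValid("\\q"));
    }
}
```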

With the new release, the Lucene team decided to update the project to use version 2.0 of Snowball, a processing language for stemming algorithms. Thanks to the change, users now have analysers for Serbian, Nepali, and Tamil at their disposal. Lucene 9.0 is also the first release to provide a minimal stemmer for Swedish (more complex versions have been available already), as well as a JapaneseCompletionFilter for Input Method-aware auto-completion.

To make the new version work, developers need to have JDK 11 or newer installed. Under-the-hood changes in component handling also mean that authors of custom analysis factories need to fit those with a default constructor implementation to keep the factories functional. It’s also generally recommended to check imports, as Lucene 9 doesn’t use split packages anymore and has therefore renamed some non-core JARs.
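The default constructor requirement can be pictured with a plain Java sketch. It assumes that factories are discovered through a mechanism that needs a no-argument constructor, as the JDK's ServiceLoader does for provider classes; the article does not spell out the mechanism, so treat that as an assumption. The factory classes below are hypothetical stand-ins and do not extend any Lucene type.

```java
import java.util.Map;

class DefaultConstructorDemo {

    // Hypothetical factory that can only be built from an argument map.
    static class ArgsOnlyFactory {
        ArgsOnlyFactory(Map<String, String> args) {
        }
    }

    // Hypothetical factory that additionally declares a default constructor.
    // Assumption: the constructor exists only to satisfy the loading
    // mechanism, so it refuses to build a usable instance.
    static class SpiFriendlyFactory {
        SpiFriendlyFactory(Map<String, String> args) {
        }

        SpiFriendlyFactory() {
            throw new UnsupportedOperationException("use the Map constructor instead");
        }
    }

    // True if the class declares a no-argument constructor.
    static boolean hasDefaultConstructor(Class<?> type) {
        try {
            type.getDeclaredConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasDefaultConstructor(ArgsOnlyFactory.class));
        System.out.println(hasDefaultConstructor(SpiFriendlyFactory.class));
    }
}
```

A check like this can be run against custom factories before upgrading, though the project's migration notes remain the authority on what the constructor should do.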

More details on renamings and other backwards incompatibilities can be found in the project’s changelog.
