Apache Cassandra 4.0 crosses the finish line with virtual tables, additional logging tools

Six years after the last major release, Apache Cassandra 4.0 has finally been pushed into the open. The 4.0 release of the database was initially planned for earlier this month, but was put on hold over a last-minute backward compatibility issue. With that now resolved, teams can finally start using a release marked as stable and try out new features such as audit logging, full query logging, virtual tables, and transient replication.

While regular tables manage and store data as SSTables, virtual tables are backed by an API. They are created in special keyspaces only, namely system_virtual_schema and system_views, aren’t replicated, and exist only locally.

Virtual tables have made their way into the release for use cases such as exposing metrics through CQL or surfacing YAML configuration information, both of which can help in understanding a cluster's health and status. In earlier versions, such data could only be accessed through a Java Management Extensions (JMX) connection, which meant a lot of configuration work on nodes and firewalls, so Cassandra is clearly going for convenience with this addition.
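As a rough illustration of that convenience, the Python sketch below reads configuration values from the system_views.settings virtual table over plain CQL instead of JMX. It assumes a session object with an execute(query) method that returns rows exposing name and value attributes, which is how the DataStax Python driver (cassandra-driver) returns rows by default. The helper name read_settings is hypothetical.

```python
from typing import Any, Dict, Iterable, Protocol


class CqlSession(Protocol):
    """Anything with execute(query) returning rows, e.g. a cassandra-driver
    Session (assumption: rows expose .name and .value attributes)."""

    def execute(self, query: str) -> Iterable[Any]: ...


# Virtual table exposing the node's effective cassandra.yaml settings.
SETTINGS_QUERY = "SELECT name, value FROM system_views.settings"


def read_settings(session: CqlSession, names: Iterable[str]) -> Dict[str, str]:
    """Return {setting name: value} for the requested settings.

    The whole table is fetched in one query and filtered client side,
    keeping only the names the caller asked for.
    """
    wanted = set(names)
    return {
        row.name: row.value
        for row in session.execute(SETTINGS_QUERY)
        if row.name in wanted
    }
```

With the driver installed, read_settings(Cluster(["127.0.0.1"]).connect(), ["cluster_name", "num_tokens"]) would return those two settings. Because virtual tables exist only locally, the values come from whichever node the driver routes the query to, so comparing nodes means querying each node separately.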

Organisations looking for more insight into the operations performed on a given Cassandra database can turn to the new audit logging feature. It records successful and failed login attempts as well as every database command request made through the Cassandra Query Language (CQL), records that are largely useful for compliance purposes.
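Audit logging can be switched on per node with nodetool enableauditlog or through audit_logging_options in cassandra.yaml. The Python sketch below assumes entries have been rendered as human-readable, pipe-separated key:value lines like the sample in the Cassandra documentation (for instance via the auditlogviewer tool), and counts failed logins per user as a typical compliance check. The exact field layout and both helper names are assumptions, not a documented parsing API.

```python
from collections import Counter
from typing import Dict, Iterable


def parse_audit_line(line: str) -> Dict[str, str]:
    """Split one human-readable audit log line into its key:value fields.

    Assumed layout (modelled on the documented sample):
    user:...|host:...|source:...|port:...|timestamp:...|type:...|category:...|operation:...
    """
    line = line.strip()
    prefix = "LogMessage: "
    if line.startswith(prefix):
        line = line[len(prefix):]
    entry: Dict[str, str] = {}
    parts = line.split("|")
    for i, part in enumerate(parts):
        # Split on the first ':' only, so "host:10.0.0.1:7000" keeps its port.
        key, _, value = part.partition(":")
        if key == "operation":
            # The CQL text comes last and may itself contain '|'.
            entry[key] = "|".join([value] + parts[i + 1:])
            break
        entry[key] = value
    return entry


def failed_logins_by_user(lines: Iterable[str]) -> Counter:
    """Count LOGIN_ERROR entries (category AUTH) per user."""
    counts: Counter = Counter()
    for line in lines:
        entry = parse_audit_line(line)
        if entry.get("type") == "LOGIN_ERROR":
            counts[entry.get("user", "?")] += 1
    return counts
```

Treating the operation field as "everything after operation:" is a deliberate choice: CQL text can contain the separator character, while the other fields are short identifiers.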

Teams more interested in improving their system's performance or tracking down issues are well advised to look at the newly added, and reportedly production-safe, full query log (FQL). It records every query invoked, the approximate time it was called, any parameters needed to bind wildcard values, and all query options.

To make use of FQL, it must be configured, either in cassandra.yaml or via nodetool, and then enabled with nodetool enablefullquerylog. It is disabled again with nodetool disablefullquerylog. There is also a nodetool resetfullquerylog command, which needs to be used carefully since it deletes all log files.
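Because resetfullquerylog is destructive, it can help to wrap the nodetool calls in a small helper that refuses to reset without explicit confirmation. This Python sketch only builds argument lists. The helper names are hypothetical, and --path and --roll-cycle are assumed to be the nodetool enablefullquerylog options for the log directory and the roll cycle (MINUTELY, HOURLY or DAILY).

```python
from typing import List, Optional

# Roll cycles listed for FQL in the Cassandra docs (assumption: HOURLY is the default).
ROLL_CYCLES = {"MINUTELY", "HOURLY", "DAILY"}


def enable_fql_cmd(path: Optional[str] = None,
                   roll_cycle: Optional[str] = None) -> List[str]:
    """Build the argv for 'nodetool enablefullquerylog'.

    Options left as None are simply not passed, leaving the node to use
    its configured values (assumed to come from cassandra.yaml).
    """
    cmd = ["nodetool", "enablefullquerylog"]
    if path is not None:
        cmd += ["--path", path]
    if roll_cycle is not None:
        cycle = roll_cycle.upper()
        if cycle not in ROLL_CYCLES:
            raise ValueError(f"unknown roll cycle: {roll_cycle!r}")
        cmd += ["--roll-cycle", cycle]
    return cmd


def disable_fql_cmd() -> List[str]:
    """Build the argv for 'nodetool disablefullquerylog'."""
    return ["nodetool", "disablefullquerylog"]


def reset_fql_cmd(confirm: bool = False) -> List[str]:
    """resetfullquerylog deletes the log files, so demand explicit consent."""
    if not confirm:
        raise PermissionError(
            "resetfullquerylog deletes all FQL log files; pass confirm=True")
    return ["nodetool", "resetfullquerylog"]
```

On a node with nodetool on the PATH, subprocess.run(enable_fql_cmd("/var/lib/cassandra/fql"), check=True) would then turn logging on (the directory is only an example), and the resulting binary logs can be inspected with the fqltool dump command that ships with 4.0.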

A still-experimental addition is the transient replication feature. It is said to allow configuring a subset of replicas to only replicate data that hasn’t been incrementally repaired, for cases in which full replicas aren’t available. During the repair process, the transient replicas stream their data to full replicas until it is fully replicated.
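In the Cassandra 4.0 documentation, transient replicas are declared in a keyspace's replication factor as total/transient, so '3/1' means three replicas, one of which is transient, and the feature must first be switched on with enable_transient_replication: true in cassandra.yaml. The Python sketch below validates such factors and renders a CREATE KEYSPACE statement. The keyspace and data center names are hypothetical, and the rule that at least one replica must stay full is an assumption built into the check.

```python
from typing import Dict, Tuple


def parse_replication_factor(rf: str) -> Tuple[int, int]:
    """Parse '3' or '3/1' into (total replicas, transient replicas).

    Assumed rule: transient replicas must be fewer than the total,
    so at least one full replica always remains.
    """
    total_text, sep, transient_text = rf.partition("/")
    total = int(total_text)
    transient = int(transient_text) if sep else 0
    if total < 1 or transient < 0:
        raise ValueError(f"invalid replication factor: {rf!r}")
    if transient >= total:
        raise ValueError(f"{rf!r} leaves no full replica")
    return total, transient


def create_keyspace_cql(keyspace: str, dc_factors: Dict[str, str]) -> str:
    """Render a NetworkTopologyStrategy keyspace using per-DC factors."""
    for rf in dc_factors.values():
        parse_replication_factor(rf)  # raises ValueError on bad input
    opts = ", ".join(
        ["'class': 'NetworkTopologyStrategy'"]
        + [f"'{dc}': '{rf}'" for dc, rf in dc_factors.items()]
    )
    return f"CREATE KEYSPACE {keyspace} WITH replication = {{{opts}}};"
```

For example, create_keyspace_cql("ks_hypothetical", {"DC1": "3/1"}) returns CREATE KEYSPACE ks_hypothetical WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3/1'}; which describes two full replicas and one transient replica in DC1.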

Other enhancements include support for Java 11, zero-copy streaming of SSTables, and an enum to identify stream operation types. Internode messaging has been improved as well: IPAddressAndPort is now sent only when a session is initiated, messaging uses non-blocking I/O, strict resource limits are enforced on the number of queued outbound messages, and messages are grouped into a single logical payload.
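To make the last two messaging changes more concrete, here is a toy Python model of a per-peer outbound queue that caps how many messages may wait and drains several of them into one payload. It is a conceptual sketch only, not Cassandra's implementation, and the class name and the max_pending and max_batch limits are hypothetical.

```python
from collections import deque
from typing import Deque, List


class OutboundQueue:
    """Toy model of a per-peer outbound queue (not Cassandra's code).

    - enqueue() refuses new messages once max_pending are waiting,
      putting a hard bound on what is held for a slow peer.
    - next_frame() drains up to max_batch queued messages at once, so
      several small messages can go out as one payload.
    """

    def __init__(self, max_pending: int, max_batch: int) -> None:
        if max_pending < 1 or max_batch < 1:
            raise ValueError("limits must be positive")
        self.max_pending = max_pending
        self.max_batch = max_batch
        self._pending: Deque[bytes] = deque()

    def enqueue(self, message: bytes) -> bool:
        """Queue a message; return False if the limit is already reached."""
        if len(self._pending) >= self.max_pending:
            return False  # over the limit: caller must drop or fail the request
        self._pending.append(message)
        return True

    def next_frame(self) -> List[bytes]:
        """Remove and return up to max_batch messages in FIFO order."""
        n = min(self.max_batch, len(self._pending))
        return [self._pending.popleft() for _ in range(n)]
```

The design choice illustrated here is that a full queue rejects new work instead of growing without bound, which keeps memory use predictable when a peer is slow or unreachable.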

Under the hood, the Cassandra team has been busy putting infrastructure in place to make sure the project's quality is up to scratch for high-profile users such as Netflix, Apple, GitHub, and Spotify. Amongst other things, the project has gained tooling for performance, fuzz, upgrade, and replay testing, as well as fault injection, alongside expanded unit test coverage.

These additions are also meant to help lift the project's recently lacklustre release frequency. The project has promised a shift to a yearly release cycle, with each release supported for three years.