Apache Kafka 3.0 prepares project for big clean-up and Raft metadata mode

Apache Kafka 3.0 is now generally available, paving the way for the event streaming platform to leave the ZooKeeper project behind once and for all.

It therefore isn’t too surprising that the most highlighted improvements in the release concern Kafka’s consensus mechanism KRaft, which is still in preview. The ZooKeeper successor now allows KRaft controllers and brokers to “generate, replicate, and load snapshots for the metadata topic partition named __cluster_metadata”, which is where a Kafka cluster stores and replicates its metadata.

To make the switch to KRaft (once it is marked stable) a smooth one, the Kafka team reworked the tool’s metadata record types and made the Kafka Controller responsible for generating producer IDs in both ZooKeeper and KRaft mode. The team also agreed that producers should offer the strongest message delivery guarantee by default, which is why producer instances now ship with idempotence and acknowledgement of delivery by all replicas enabled.
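For illustration, here is a minimal sketch of what those new defaults correspond to in producer configuration; the broker address and serializers are placeholders:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerDefaults {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        // As of 3.0 these two settings are the defaults; they are spelled out
        // here only to make the new behaviour explicit.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // send records as usual; retries can no longer introduce
            // duplicates within a partition, and every write is acknowledged
            // by all in-sync replicas
        }
    }
}
```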

Apart from that, the Kafka team tuned the platform for current use cases that were troublesome to realise before. Monitoring tasks, for instance, often produced overhead since fetching offsets for multiple consumer groups in a single request wasn’t an option; version 3.0 rectifies this in the OffsetFetch API. Meanwhile, an improvement to AdminClient.listOffsets will help users test a partition’s liveness: it now offers the option to query the offset and timestamp of the record with the highest timestamp in a partition.
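A short sketch of that new option: OffsetSpec.maxTimestamp() requests the record with the highest timestamp, while the topic name and broker address below are assumptions for illustration.

```java
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult.ListOffsetsResultInfo;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.common.TopicPartition;

public class MaxTimestampLookup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        try (Admin admin = Admin.create(props)) {
            TopicPartition partition = new TopicPartition("events", 0); // hypothetical topic

            // OffsetSpec.maxTimestamp() targets the record with the highest
            // timestamp in the partition (new in 3.0, alongside earliest/latest)
            ListOffsetsResultInfo info = admin
                    .listOffsets(Map.of(partition, OffsetSpec.maxTimestamp()))
                    .partitionResult(partition)
                    .get();

            System.out.printf("offset=%d, timestamp=%d%n", info.offset(), info.timestamp());
        }
    }
}
```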

Kafka Streams’ TaskMetadata interface sports three new methods to query committedOffsets, endOffsets, and timeCurrentIdlingStarted, which should also help keep track of a system’s health. Improvements in timestamp synchronization prompted the addition of a new method to the Kafka Consumer API: currentLag returns “the consumer lag of a specific partition if it is known locally and without contacting the Kafka Broker”.
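A minimal sketch of currentLag in use, assuming a hypothetical topic and consumer group; the method returns an empty OptionalLong when the lag is not yet known locally:

```java
import java.time.Duration;
import java.util.List;
import java.util.OptionalLong;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class LagCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "monitoring-demo");          // hypothetical group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("events")); // hypothetical topic
            consumer.poll(Duration.ofSeconds(1));  // join the group and fetch positions

            for (TopicPartition tp : consumer.assignment()) {
                // no broker round trip: the lag is computed from local metadata
                OptionalLong lag = consumer.currentLag(tp);
                System.out.println(tp + " lag: " + (lag.isPresent() ? lag.getAsLong() : "unknown"));
            }
        }
    }
}
```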

Noteworthy enhancements can also be found in the data integration hub Kafka Connect. It now includes the option to “restart either all or only the failed of a connector’s Connector and Task instances with a single call” and enables connector client overrides and connector log contexts in the Connect Log4j configuration by default.
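The single-call restart is exposed through query parameters on Connect’s existing REST restart endpoint; a sketch using Java’s built-in HTTP client, with the worker URL and connector name as placeholders:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestartConnector {
    public static void main(String[] args) throws Exception {
        // Placeholder worker URL and connector name. includeTasks=true restarts
        // the Task instances as well; onlyFailed=true limits the restart to
        // instances in the FAILED state.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors/my-connector/restart"
                        + "?includeTasks=true&onlyFailed=true"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```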

As this is a major release, there are also some changes that could lead to old code not working properly. Amongst those is the deprecation of the Streams APIs that relied on a 24-hour default grace period; they are superseded by explicit new methods that either set said period to zero or accept a custom value for its duration.
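A short sketch of what the migration looks like for a stream-stream join window; the durations are arbitrary examples:

```java
import java.time.Duration;

import org.apache.kafka.streams.kstream.JoinWindows;

public class GraceMigration {
    public static void main(String[] args) {
        // Before 3.0, JoinWindows.of(...) silently carried a 24-hour grace
        // period; that factory method is now deprecated.

        // The replacements force an explicit decision:
        JoinWindows noGrace = JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(5));
        JoinWindows withGrace = JoinWindows.ofTimeDifferenceAndGrace(
                Duration.ofMinutes(5), Duration.ofMinutes(1));

        System.out.println(noGrace.gracePeriodMs() + " / " + withGrace.gracePeriodMs());
    }
}
```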

The Kafka team also got started on some clean-up work, so users should be prepared for warnings. Support for Java 8 and Scala 2.12 has been deprecated in Kafka 3.0 and will be dropped completely in v4.0. The same goes for the first version of the replication tool MirrorMaker, though additional configuration options in MirrorMaker 2 could serve as an incentive to make the switch anyway.

Other than that, the maintainers also decided to deprecate message formats v0 and v1, meaning new data will be written using v2 from now on. The plan is to realise backward compatibility for replicated v0 and v1 data through conversion once the old formats are removed. Since that conversion comes at a performance cost, upgrading is recommended.

More details are available in the Kafka blog.