Kafka takes on under-replication in 2.3 release

Distributed streaming platform Apache Kafka has reached version 2.3, a release that promises improvements in areas such as resilience and failure handling.

Preventing and debugging replication issues has become easier thanks to the new topic partition category AtMinIsr. It covers partitions whose number of in-sync replicas has dropped to exactly the configured minimum, which places it between UnderReplicated and UnderMinIsr and lets operations teams be warned about partitions at risk of becoming unavailable. In addition, the new --under-min-isr-partitions option of the describe topics command shows users which topic partitions need attention because they have fallen below the configured minimum number of in-sync data copies.
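How the three categories relate to each other can be sketched in a few lines of Python. This is an illustrative model only, not Kafka code: the helper name classify_partition and its parameters are made up for the example, and it assumes the usual definitions (under-replicated when the in-sync replica set is smaller than the assigned replica set, at the minimum when its size equals min.insync.replicas, under the minimum when it is smaller still).

```python
from typing import List


def classify_partition(replicas: int, isr: int, min_isr: int) -> List[str]:
    """Return the categories a partition falls into (hypothetical helper).

    replicas: number of assigned replicas
    isr:      number of replicas currently in sync
    min_isr:  assumed value of min.insync.replicas for the topic
    """
    categories = []
    if isr < replicas:
        categories.append("UnderReplicated")
    if isr == min_isr:
        # One more replica dropping out would push the partition under the minimum.
        categories.append("AtMinIsr")
    if isr < min_isr:
        categories.append("UnderMinIsr")
    return categories


print(classify_partition(replicas=3, isr=3, min_isr=2))  # healthy
print(classify_partition(replicas=3, isr=2, min_isr=2))  # early warning
print(classify_partition(replicas=3, isr=1, min_isr=2))  # needs addressing
```

In this model AtMinIsr acts as the early warning: the partition is still at its minimum, but has no margin left.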

The design of ReplicaFetcher has been improved so that, should a partition fail, the thread concerned stops tracking the failed partition and carries on with the ones that are still healthy. Previously, such a failure often brought replication to an abrupt halt for every partition the thread was responsible for, which in turn led to under-replicated partitions.
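The idea of isolating a failed partition rather than losing the whole fetch loop can be shown with a toy model. The sketch below is not Kafka's implementation; the class ToyFetcher, the function flaky_fetch and the partition names are all hypothetical and exist only to illustrate the behaviour described above.

```python
class ToyFetcher:
    """Toy stand-in for a replica fetcher thread (not Kafka's real class)."""

    def __init__(self, partitions, fetch):
        self.active = set(partitions)  # partitions still being replicated
        self.failed = set()            # partitions parked after an error
        self.fetch = fetch             # callable: partition name -> data

    def run_once(self):
        fetched = {}
        for partition in sorted(self.active):
            try:
                fetched[partition] = self.fetch(partition)
            except Exception:
                # Park the broken partition instead of letting the
                # error end replication for every other partition.
                self.failed.add(partition)
        self.active -= self.failed
        return fetched


def flaky_fetch(partition):
    # Simulated failure for one partition only.
    if partition == "orders-1":
        raise IOError("simulated partition failure")
    return "data from " + partition


fetcher = ToyFetcher(["orders-0", "orders-1", "orders-2"], flaky_fetch)
result = fetcher.run_once()
print(sorted(result))          # partitions that were still replicated
print(sorted(fetcher.failed))  # partitions taken out of rotation
```

Without the try/except, the first exception would end the loop, which mirrors the old behaviour in which a single failure stalled every partition owned by the thread.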

To make denial-of-service attacks harder, there is now a limit to the total number of connections that can be active on a broker. To make SocketServer processors fairer, priority is also no longer given to new connection requests but to connections that already exist.

A more compliance-related feature has landed in the form of max.compaction.lag.ms, which sets the maximum amount of time a log segment can stay uncompacted. This helps to make sure data is deleted in a timely manner where specific regulations require it.
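Since the setting is expressed in milliseconds, a small helper can keep the arithmetic readable. The following is a minimal sketch: the function name compaction_lag_config and the seven-day limit are assumptions chosen for illustration, not values taken from the release.

```python
from datetime import timedelta


def compaction_lag_config(max_lag: timedelta) -> str:
    """Build a 'key=value' config entry (hypothetical helper)."""
    millis = int(max_lag.total_seconds() * 1000)
    return "max.compaction.lag.ms=" + str(millis)


# Assumed requirement for the example: compact within seven days.
entry = compaction_lag_config(timedelta(days=7))
print(entry)  # max.compaction.lag.ms=604800000
```

The resulting string is in the form in which topic configuration entries are usually written.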

The current release also brings changes to Kafka Connect and Kafka Streams. Connect, a framework for integrating Kafka with other systems, no longer pauses worker threads during rebalancing. Additional context in log messages should help in understanding a connector's behaviour over time.

To help with things like range queries, Kafka now also includes an in-memory implementation of the window store (as well as the session store) for Streams. In addition, timestamps are now written into the state store, and flatTransformValues() has been added. The latter allows processor-API-aware computations “that return multiple records for each input record without changing the message key and causing a repartition”, as the announcement states.
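What "multiple records for each input record without changing the message key" means can be modelled in plain Python. The sketch below is not the Kafka Streams API (which is Java); the function flat_transform_values and the sample records are hypothetical and only mirror the semantics quoted above.

```python
def flat_transform_values(records, transformer):
    """Yield one (key, new_value) pair per value the transformer returns.

    The key is passed through untouched, which is why no repartition
    would be needed in the real system.
    """
    for key, value in records:
        for new_value in transformer(value):
            yield key, new_value


# Hypothetical input: one record per user, several words per value.
records = [("user-1", "login click"), ("user-2", "logout")]
output = list(flat_transform_values(records, str.split))
print(output)
```

Each input record fans out into as many output records as the transformer returns values, while every output keeps the key of the record it came from.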

A complete list of changes, including a new API in the AdminClient which makes incremental config changes easier, can be found in the project’s release notes.