A cage in search of a bird: With streaming on an all-time high, Kafka trims latencies, adds security

AI/ML

By Team Devclass

April 16, 2020

A cage in search of a bird: With streaming on an all-time high, Kafka trims latencies, adds security

In a particularly timely release, distributed data streaming platform Apache Kafka – used by companies including Netflix, Coursera and Slack – has unleashed version 2.5, aiming for a more stable, less laggy experience.

One of the problems the Kafka team tackled in the new release is the aggregation of multiple streams into a larger object, which was a bit of a stolid process before. To accommodate such use cases better, the tool’s DSL now comes with a new operator called cogroup(). Instead of creating state stores for each stream and a chain of ValueJoiners to keep them together, it uses a single state store for the new object. Besides the better representation, this approach is also said to improve performance, which is always helpful in data heavy scenarios.

Another enhancement is an attempt to reduce latencies for applications using multiple partitions on a single instance. Instead of letting the appropriate get function run through a number of stores when a stream instance is calling for a particular key, the platform now allows fetching the key from a single partition. The implementation of this was mainly made possible by the team’s work on incremental rebalance to work around node partitions being unavailable, since this helped to expose information on which key belongs to which partition.

To make the platform a bit more secure, Kafka 2.5 is able to make use of TLS 1.3, though v1.2 is still the default. Earlier versions have been disabled due to security vulnerabilities. Should a setup still need those protocols for some reason, they can however be enabled via configuration options ssl.protocol and ssl.enabled.protocols.

Resilience was another item on the 2.5 todo list, which is why default values for zookeeper.session.timeout.ms and replica.lag.time.max.ms have been increased from 6s to 18s and 10s to 30s respectively.

When making the switch to the new version, users should be aware that UsePreviousTimeOnInvalidTimestamp has been replaced by UsePartitionTimeOnInvalidTimeStamp, and KafkaStreams.store(StoreQueryParameters) has taken over for KafkaStreams.store(String, QueryableStoreType), which means some adjustments in older code might be necessary. Additional details can be found in the release notes.

Apache Kafka was initially developed at job networking site LinkedIn, which is now part of Microsoft. It is meant to help with handling real-time data feeds, though lately more and more machine learning engineers seem to have developed a taste for it as a building block for complex ML systems.

A cage in search of a bird: With streaming on an all-time high, Kafka trims latencies, adds security

Microsoft shovels extra Copilot features into VS Code amid dev complaints of 'more AI bloat'

Docker adds AI agents to Compose along with GPU-powered cloud Offload service

Microsoft SQL Server MCP tool: 'Leap in data interaction' or limited and frustrating?

Google positions itself for 'next decade' of AI as Gemini CLI arrives with generous free tier

CloudBees opens MCP server so agents can infiltrate DevOps

AI is generating code at scale – but human scale code review can’t keep up

Redefining identity security in the age of agentic AI

GitLab warms up investors for winter release of agentic AI flavoured Duo Workflow

New Relic aims to crack open MCP servers

Shadow AI in the enterprise: managing risk without slowing progress

Cursor AI editor hits 1.0 milestone, including BugBot and high-risk background agents

Node.js frustrating and inefficient? OpenAI rewrites AI coding tool in Rust

ABOUT US

FOLLOW US