Apache Samza 1.4 aims at better performance and state monitoring

Apache Samza, the asynchronous computational framework for stream processing that is used at Slack, among others, has hit version 1.4, bringing improvements to state monitoring and the SQL API.

To help with the former, Samza has been fitted with a metric that tracks the maximum serialised value size written to RocksDB. This is important for users of Kafka-backed stores: since the streaming platform has configurable message size limits, insight into a record’s size can help prevent errors. There’s now also a null check before the bytesSerialized metric is incremented, since null values could previously cause processes to fail.
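The idea behind the two metric changes can be illustrated with a minimal sketch. This is not Samza's implementation: the class SerializedSizeTracker and its method names are hypothetical, and only the general technique (a running maximum plus a null guard before counting bytes) is taken from the description above.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration, not Samza's actual metrics code.
class SerializedSizeTracker {
    private final AtomicLong bytesSerialized = new AtomicLong();
    private final AtomicLong maxRecordSizeBytes = new AtomicLong();

    /** Records one serialized value; null input is skipped instead of throwing. */
    void record(byte[] serialized) {
        if (serialized == null) {
            return; // null check: nothing to measure, and no NullPointerException
        }
        bytesSerialized.addAndGet(serialized.length);
        // Keep the largest size seen so far.
        maxRecordSizeBytes.accumulateAndGet(serialized.length, Math::max);
    }

    long totalBytes() {
        return bytesSerialized.get();
    }

    long maxRecordSizeBytes() {
        return maxRecordSizeBytes.get();
    }

    /** True if the largest record seen would not fit under the given limit. */
    boolean exceeds(long limitBytes) {
        return maxRecordSizeBytes.get() > limitBytes;
    }

    public static void main(String[] args) {
        SerializedSizeTracker tracker = new SerializedSizeTracker();
        tracker.record(new byte[10]);
        tracker.record(null); // ignored safely
        tracker.record(new byte[3]);
        System.out.println("total=" + tracker.totalBytes()
                + " max=" + tracker.maxRecordSizeBytes());
    }
}
```

Comparing the tracked maximum against the broker's configured message size limit, as exceeds() does here, is one way such a metric could be used to spot oversized records before they cause errors.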

Another improvement addresses the long time the application master could take to save job metadata. This was caused by the JobModelManager using CoordinatorStreamStore.put(), which flushes after every message and can degrade performance considerably, especially when a remote server is under heavy load. With the update, the step uses batch processing and calls flush only after put/putAll/delete has been called in related classes.
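To see why batching helps, consider the following sketch, which counts flushes for both approaches. CountingStore is a hypothetical in-memory stand-in, not Samza's CoordinatorStreamStore, and the method names are assumptions made for illustration; in a real deployment each flush would be a round trip to a remote server.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of per-message flushing versus a single batched flush.
class BatchedFlushSketch {

    /** In-memory stand-in for a remote-backed metadata store. */
    static class CountingStore {
        private final Map<String, String> pending = new LinkedHashMap<>();
        private final Map<String, String> durable = new LinkedHashMap<>();
        private int flushCount = 0;

        void put(String key, String value) {
            pending.put(key, value);
        }

        void putAll(Map<String, String> entries) {
            pending.putAll(entries);
        }

        /** Simulates the expensive step: pushing buffered writes out. */
        void flush() {
            durable.putAll(pending);
            pending.clear();
            flushCount++;
        }

        int flushCount() {
            return flushCount;
        }

        int durableSize() {
            return durable.size();
        }
    }

    /** Old behaviour as described: one flush per written message. */
    static int writeWithFlushPerMessage(CountingStore store, Map<String, String> metadata) {
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            store.put(entry.getKey(), entry.getValue());
            store.flush();
        }
        return store.flushCount();
    }

    /** Batched behaviour: write everything, then flush once. */
    static int writeBatched(CountingStore store, Map<String, String> metadata) {
        store.putAll(metadata);
        store.flush();
        return store.flushCount();
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("task-0", "container-0");
        metadata.put("task-1", "container-0");
        metadata.put("task-2", "container-1");

        System.out.println("per-message flushes: "
                + writeWithFlushPerMessage(new CountingStore(), metadata));
        System.out.println("batched flushes: "
                + writeBatched(new CountingStore(), metadata));
    }
}
```

Both variants end with the same data stored; the difference is only in how many times the costly flush is triggered, which grows with the number of entries in the first case and stays constant in the second.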

Regarding the SQL API, the new release allows the dynamic addition of jars in ReflectionUdfResolver and fixes several issues, including problems with SQL statements using trailing semicolons, the display of UDF names, and subqueries in joins.

To make the framework work better with Azure’s blob storage, Samza now comes with a native SystemProducer to send records to the cloud service. The team also added extra environment variables to allow running an application master in isolated mode. Since Samza also needs an isolating classloader to run the job coordinator in that mode, this feature has been implemented as well.

Internal autosizing-related configs can now be used by external controllers, provided job.autosizing.enabled is set.
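How an external controller's value might take effect only when the flag is on can be sketched as follows. This is an assumption-laden illustration rather than Samza's resolution logic: job.autosizing.enabled comes from the release notes and job.container.count is Samza's regular container count setting, but the key job.autosizing.container.count and the class AutosizingConfigSketch are used here purely as assumed examples and should be checked against the project's configuration reference.

```java
import java.util.Properties;

// Hypothetical illustration of a flag-gated config override.
class AutosizingConfigSketch {
    static final String AUTOSIZING_ENABLED = "job.autosizing.enabled";
    static final String CONTAINER_COUNT = "job.container.count";
    // Assumed key name, for illustration only.
    static final String AUTOSIZED_CONTAINER_COUNT = "job.autosizing.container.count";

    /** Returns the autosized count only if autosizing is enabled and a value is present. */
    static int effectiveContainerCount(Properties config) {
        int configured = Integer.parseInt(config.getProperty(CONTAINER_COUNT, "1").trim());
        boolean enabled = Boolean.parseBoolean(config.getProperty(AUTOSIZING_ENABLED, "false").trim());
        String autosized = config.getProperty(AUTOSIZED_CONTAINER_COUNT);
        if (enabled && autosized != null) {
            return Integer.parseInt(autosized.trim());
        }
        return configured;
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty(CONTAINER_COUNT, "2");
        config.setProperty(AUTOSIZED_CONTAINER_COUNT, "4");
        config.setProperty(AUTOSIZING_ENABLED, "true");
        System.out.println("effective container count: " + effectiveContainerCount(config));
    }
}
```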

Those looking to update to the new version should note that the project’s autoscaling module has been removed, since it’s no longer supported. Other than that, upgrading should be relatively painless. The next release, however, might bring backward-incompatible changes to Samza job submission, which are part of the project’s efforts to simplify its job runner. Users who want to start preparing for that change are advised to check the project’s wiki.