data Artisans’ Streaming Ledger sprinkles ACID on Machine Learning


Apache Flink originator data Artisans has introduced the Streaming Ledger library for processing event streams across multiple shared states and tables. With ACID semantics for transactional consistency, it could be a fit for the financial sector and real-time machine learning.

Up until now, data stream processing projects such as Flink and Spark only offered ways to operate on a single key in a single operator at a time. The ACID transactions of the Ledger change that through operations on multiple keys across multiple tables, with options to share tables between event streams, while promising performance and consistency.

ACID stands for atomicity, consistency, isolation, and durability. For transactions within Streaming Ledger, this means that a transaction applies either all of its modifications to the rows it touches or none of them, and takes tables from one consistent state to another. Transactions are serialisable, so each one behaves as if it were the only one operating on the tables. The changes they make are durable through the use of persistent sources and checkpoints, the same mechanism that applications built on the stream processing framework Apache Flink rely on.
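The all-or-nothing behaviour described above can be illustrated with a minimal Java sketch. It does not use the Streaming Ledger API: the class `AtomicTableSketch`, its nested `Tx` write buffer, and the `execute` method are hypothetical names chosen for illustration. Writes are staged and only applied if the transaction body completes, and a `synchronized` method stands in for serialisable isolation. Durability is out of scope here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical illustration only; this is not the Streaming Ledger API.
class AtomicTableSketch {
    private final Map<String, Long> rows = new HashMap<>();

    /** Write buffer handed to a transaction; the table is untouched until commit. */
    static final class Tx {
        private final Map<String, Long> committed;
        private final Map<String, Long> pending = new HashMap<>();

        Tx(Map<String, Long> committed) {
            this.committed = committed;
        }

        /** Reads the staged value if there is one, else the committed value (0 if absent). */
        long read(String key) {
            Long staged = pending.get(key);
            return staged != null ? staged : committed.getOrDefault(key, 0L);
        }

        void write(String key, long value) {
            pending.put(key, value);
        }
    }

    /** Applies all staged writes if the body returns normally, none if it throws. */
    synchronized boolean execute(Consumer<Tx> body) {
        Tx tx = new Tx(rows);
        try {
            body.accept(tx);
        } catch (RuntimeException e) {
            return false; // staged writes are simply dropped
        }
        rows.putAll(tx.pending);
        return true;
    }

    synchronized long get(String key) {
        return rows.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        AtomicTableSketch table = new AtomicTableSketch();
        table.execute(tx -> tx.write("alice", 100L));

        // Overdrawing fails halfway through, so neither row changes.
        boolean ok = table.execute(tx -> {
            tx.write("alice", tx.read("alice") - 150L);
            if (tx.read("alice") < 0) {
                throw new IllegalStateException("insufficient funds");
            }
            tx.write("bob", tx.read("bob") + 150L);
        });
        System.out.println(ok + " " + table.get("alice") + " " + table.get("bob"));
        // prints: false 100 0
    }
}
```

Running transactions one at a time is the simplest way to obtain serialisable behaviour, though it says nothing about how the library itself achieves it at scale.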

Architecture and use cases

Streaming Ledger builds on Flink, which also stems from data Artisans, and uses Flink’s state to store tables. Besides tables, the core building blocks of Streaming Ledger applications are transaction event streams, transaction functions, and optional result streams. The state of an application is maintained in one or several key/value tables, which are transactionally updated by transaction functions. Keys and values can be of any type. Tables are persisted as part of checkpoints and can, depending on the configuration, be stored in memory or in RocksDB.

Transactions are triggered by so-called transaction events, which flow in parallel streams. Each transaction event stream has one transaction function, which contains the transaction’s business logic written in Java or Scala. Though transaction functions are similar to Flink’s transformation functions, there’s a difference when it comes to table access: transaction functions can share access to the same tables, modifying multiple rows or keys at a time while maintaining consistency. If a transaction function emits events, for example to signal the success of a transaction or to send updated values elsewhere, they are put into a result stream.
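To make these building blocks more concrete, here is a conceptual Java sketch of two transaction functions sharing two key/value tables and writing to a result stream. All names (`SharedTablesSketch`, `onDeposit`, `onTransfer`, the `accounts` and `bookEntries` tables) are assumptions made for illustration, and plain in-memory maps and a list stand in for Flink state and streams. It shows the shape of the idea, not the library’s actual interfaces.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch with hypothetical names; it does not use the Streaming Ledger API.
class SharedTablesSketch {
    // Two key/value tables shared by both transaction functions below.
    private final Map<String, Long> accounts = new HashMap<>();
    private final Map<String, String> bookEntries = new HashMap<>();
    // Stand-in for the optional result stream.
    private final List<String> results = new ArrayList<>();

    /** Transaction function for a stream of deposit events. */
    synchronized void onDeposit(String account, long amount) {
        accounts.merge(account, amount, Long::sum);
        results.add("DEPOSIT_OK " + account);
    }

    /** Transaction function for a stream of transfer events: three rows, two tables. */
    synchronized void onTransfer(String id, String from, String to, long amount) {
        long balance = accounts.getOrDefault(from, 0L);
        if (amount <= 0 || balance < amount) {
            // Validation happens before any write, so a rejection changes nothing.
            results.add("TRANSFER_REJECTED " + id);
            return;
        }
        accounts.put(from, balance - amount);
        accounts.merge(to, amount, Long::sum);
        bookEntries.put(id, from + "->" + to + ":" + amount);
        results.add("TRANSFER_OK " + id);
    }

    synchronized long balance(String account) {
        return accounts.getOrDefault(account, 0L);
    }

    synchronized String bookEntry(String id) {
        return bookEntries.get(id);
    }

    synchronized List<String> results() {
        return new ArrayList<>(results);
    }

    public static void main(String[] args) {
        SharedTablesSketch ledger = new SharedTablesSketch();
        ledger.onDeposit("alice", 100L);
        ledger.onTransfer("t1", "alice", "bob", 30L);
        ledger.onTransfer("t2", "bob", "alice", 500L); // rejected: bob only holds 30
        System.out.println(ledger.results());
        System.out.println(ledger.balance("alice") + " " + ledger.balance("bob"));
    }
}
```

In this sketch a single lock serialises both functions. The point of the library, as described above, is to offer the same consistency for functions running in parallel streams.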

Most use cases data Artisans presents for its new product come from the world of financial transactions, for example wiring money between accounts and ledger entries with an indication of whether the transaction was successful. Machine learning could be another way to make use of the library. For example, models that classify events need feature sets attached to those events. Computing these features in real time and assembling them into vectors could be realised with Streaming Ledger.
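As a rough illustration of the feature-assembly idea, the following Java sketch updates two tables per incoming event and returns a feature vector for it. The choice of features (the event amount, a per-user event count, a per-merchant running total) and every name in it are assumptions for the sake of the example. It neither reflects the whitepaper’s use cases nor uses the Streaming Ledger API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: the feature choice and all names are assumptions.
class FeatureVectorSketch {
    // Table 1: number of events seen per user.
    private final Map<String, Long> eventsPerUser = new HashMap<>();
    // Table 2: running total of amounts per merchant.
    private final Map<String, Double> totalPerMerchant = new HashMap<>();

    /** Updates both tables for one event and returns that event's feature vector. */
    synchronized double[] onEvent(String user, String merchant, double amount) {
        long count = eventsPerUser.merge(user, 1L, Long::sum);
        double total = totalPerMerchant.merge(merchant, amount, Double::sum);
        return new double[] { amount, count, total };
    }

    public static void main(String[] args) {
        FeatureVectorSketch features = new FeatureVectorSketch();
        System.out.println(Arrays.toString(features.onEvent("u1", "m1", 20.0)));
        System.out.println(Arrays.toString(features.onEvent("u1", "m2", 5.0)));
    }
}
```

Because both tables are updated in one step, the vector attached to an event reflects a consistent view of the two tables, which is the property a classifier downstream would depend on.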

The data Artisans Streaming Ledger library for single streams can be found on GitHub. The component needed to process multiple streams in parallel is patent pending and only available in the River Edition of the company’s commercial platform, with pricing available on request. More information and detailed use cases for the Streaming Ledger can be found in a newly released whitepaper.