Waiting for a streaming data warehouse? It just Materialize-d

Materialize has officially launched its eponymous streaming data warehouse, making the code available on GitHub and opening a waiting list for a cloud-based version.

In a blog post announcing the product, co-founders Arjun Narayan and Frank McSherry said that despite the onset of cloud-native infrastructure, “data still doesn’t move as fast as it should”.

Their answer, unsurprisingly perhaps, lies in streaming architectures, with data processing pivoting from “a query-based ‘polling’ design – with staleness built in – to a reactive model that responds to changes the moment they happen. It also bypasses repeated work on unchanged data, which allows it to scale to substantially larger volumes of work.”
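To make the distinction concrete, here is a minimal Python sketch of the two styles. It is purely illustrative and is not how Materialize itself is implemented: the names `poll_totals` and `IncrementalTotals` are invented for this example, and the "view" is a simple per-key sum. The polling function rescans every row each time it is asked, while the incremental version does a constant amount of work per change and leaves unchanged keys alone.

```python
from collections import defaultdict


def poll_totals(rows):
    """Polling style: recompute the whole answer from every row on each call."""
    totals = defaultdict(int)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)


class IncrementalTotals:
    """Reactive style: keep the answer up to date as individual changes arrive."""

    def __init__(self):
        self._sums = defaultdict(int)
        self._counts = defaultdict(int)

    def apply(self, key, amount, diff=1):
        """Apply one change: diff=+1 for an inserted row, diff=-1 for a retracted one."""
        self._sums[key] += diff * amount
        self._counts[key] += diff
        if self._counts[key] == 0:
            # No rows remain for this key, so it disappears from the view.
            del self._sums[key]
            del self._counts[key]

    def snapshot(self):
        """Return the current contents of the maintained view."""
        return dict(self._sums)


rows = [("a", 5), ("b", 3), ("a", 2)]
view = IncrementalTotals()
for key, amount in rows:
    view.apply(key, amount)

# Both approaches agree, but the incremental one never rescans old rows.
print(poll_totals(rows) == view.snapshot())
```

The cost of the polling version grows with the total amount of data, while the cost of the incremental version grows only with the number of changes, which is the property the founders are pointing at.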

“Many people hoped that event-streaming itself would be the revolution,” they continue. “Cobbled together with free software, streaming is indeed an exciting development, but today requires huge sacrifices in interoperability, flexibility, and ease of use.”

Materialize, they argue, is “the first Streaming Data Warehouse”, which will connect with existing event streaming architecture, while to the client, “it walks and quacks like Postgres.”

The product is written in Rust, and based on the McSherry-driven Timely Dataflow model, according to the project’s GitHub page. Right now it reads Avro, Protobuf, JSON, and newline-delimited text and can read data from Kafka topics or tail local files.

The developers say support for AWS Kinesis streams and Azure Event Hub, for reading ORC and Parquet files on object storage, and for “getting data in from arbitrary HTTP endpoints” is coming soon.

Once data is in, the page adds, users can “define views and perform reads via the PostgreSQL protocol” and can use “any PostgreSQL-compatible driver in any language/environment to make SELECT queries against your views”. Alternatively, Materialize can be configured to stream results to a Kafka topic.
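As a rough sketch of what that workflow might look like from a client, the Python below defines a view and reads from it through any DB-API connection. Everything specific here is an assumption rather than something taken from the announcement: the source `orders`, the view `order_totals`, the column names, and the connection settings (local host, port 6875, user and database `materialize`) are placeholders, and `psycopg2` is simply one PostgreSQL-compatible driver among many.

```python
# Hypothetical view over an assumed, already-created source named "orders".
CREATE_VIEW_SQL = """
CREATE MATERIALIZED VIEW order_totals AS
SELECT customer_id, sum(amount) AS total
FROM orders
GROUP BY customer_id
"""

# An ordinary SELECT against the view, as any Postgres client would issue it.
READ_VIEW_SQL = "SELECT customer_id, total FROM order_totals ORDER BY customer_id"


def define_and_read(connect):
    """Define the view, then read it back.

    `connect` is any zero-argument callable returning a DB-API connection,
    which keeps this sketch independent of a particular driver.
    """
    conn = connect()
    try:
        cur = conn.cursor()
        cur.execute(CREATE_VIEW_SQL)
        cur.execute(READ_VIEW_SQL)
        return cur.fetchall()
    finally:
        conn.close()


def connect_with_psycopg2():
    """One possible connector; host, port, user and dbname are assumed values."""
    import psycopg2  # third-party driver, imported lazily so the sketch loads without it

    conn = psycopg2.connect(
        host="localhost", port=6875, user="materialize", dbname="materialize"
    )
    conn.autocommit = True
    return conn
```

With a server running and the assumed source in place, `define_and_read(connect_with_psycopg2)` would return the current rows of the view; repeating the SELECT later would reflect whatever had arrived in the meantime.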

The company describes the product as “source available” under the BSL 1.1, converting to the open-source Apache 2.0 license after four years. It will be “free forever on a single node”, with additional features such as high availability coming under a paid cloud service, for which there is currently a waiting list.

The company was launched by former Cockroach Labs engineer Arjun Narayan, who is CEO, and Microsoft Research alum Frank McSherry, who is chief scientist. The top rank is rounded out by Cuong Do, a Cockroach and Dropbox veteran who is head of engineering, and Craig Breslawski, who is head of sales and marketing.