Google’s Dataproc lights up Spark on Kubernetes

Google has announced a Kubernetes-flavoured version of its Cloud Dataproc Hadoop and Spark service, giving customers an alternative to working with Yarn.

In a blog post accompanying the announcement, James Malone, Product Manager, Google Cloud, wrote: “With this announcement, we are bringing enterprise-grade support, management, and security to Apache Spark jobs running on GKE clusters.”

Another blog post detailing the service added: “By extending the Cloud Dataproc Jobs API to GKE, you can package all the various dependencies of your job into a single Docker container. This Docker container allows you to integrate Spark jobs directly into the rest of your software development pipelines.”

The service, currently in alpha, will also support the Apache Flink stream processing framework, and support for Presto and Apache Druid is in the pipeline.

While Dataproc is a Google service, Malone said Google’s Anthos platform opened up the possibility of running jobs across hybrid or multiple clouds. “Dataproc becomes that one pane of glass,” Malone told Devclass, taking care of monitoring, security and the like wherever jobs are running.

While Malone couldn’t give a timeline for when Anthos support might appear, experience suggests that three to six months is a typical lag between Google saying something seems like a good idea and that thing actually appearing.

Malone also said that companies were asking for Hadoop support on Kubernetes. “That’s a lot more complicated [and will take] a lot more time to figure out.”

In the meantime, the company had to be aware of which other open source projects were gaining traction, to ensure that Kubernetes support for them came earlier and more easily. “Capturing the new developments now is the most useful [thing to do].”

Malone said the announcement meant that, for now, there would be two versions of Dataproc, one supporting Yarn and one supporting Kubernetes.

“Yarn will be around for the foreseeable future,” he said, and it was possible that Yarn would in time run on Kubernetes too.