Google's Dataproc lights up Spark on Kubernetes • DEVCLASS

Google’s Dataproc lights up Spark on Kubernetes

By Team Devclass

September 11, 2019


Google has announced a Kubernetes-flavoured version of its Cloud Dataproc Hadoop and Spark service, giving customers an alternative to working with Yarn.

In a blog post accompanying the announcement, James Malone, Product Manager, Google Cloud, wrote: “With this announcement, we are bringing enterprise-grade support, management, and security to Apache Spark jobs running on GKE clusters.”

Another blog post detailing the service added: “By extending the Cloud Dataproc Jobs API to GKE, you can package all the various dependencies of your job into a single Docker container. This Docker container allows you to integrate Spark jobs directly into the rest of your software development pipelines.”
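To make the packaging idea concrete, here is a minimal sketch of what a Spark job request pointing at a container image might look like. The field names `placement.clusterName`, `sparkJob.mainClass`, `sparkJob.jarFileUris` and `sparkJob.properties` follow the general shape of the Dataproc Jobs API, and `spark.kubernetes.container.image` is a standard Apache Spark on Kubernetes property. Whether the alpha GKE service accepts the image this way is an assumption, and the cluster name, image name and jar path are hypothetical placeholders. The sketch only builds and prints the request body; it does not call any Google API.

```python
import json


def build_spark_job(cluster_name, main_class, jar_uris, container_image):
    """Return a dict shaped like a Dataproc Jobs API job resource.

    Assumption: the container image is passed through the Spark property
    `spark.kubernetes.container.image`; the alpha service may differ.
    """
    return {
        "placement": {"clusterName": cluster_name},
        "sparkJob": {
            "mainClass": main_class,
            "jarFileUris": list(jar_uris),
            "properties": {
                # Standard Spark-on-Kubernetes setting naming the image
                # that bundles the job's dependencies.
                "spark.kubernetes.container.image": container_image,
            },
        },
    }


if __name__ == "__main__":
    job = build_spark_job(
        cluster_name="my-gke-dataproc-cluster",        # hypothetical
        main_class="org.apache.spark.examples.SparkPi",
        jar_uris=["local:///opt/spark/examples/jars/spark-examples.jar"],  # hypothetical path inside the image
        container_image="gcr.io/my-project/my-spark-job:1.0",  # hypothetical
    )
    # A submit request wraps the job resource in a top-level "job" key.
    print(json.dumps({"job": job}, indent=2))
```

Because the image tag is the only thing that changes between builds, a CI pipeline could in principle rebuild the container, update the tag in this request, and resubmit, which is the kind of integration the blog post describes.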

The service, currently in alpha, will also support the Apache Flink stream processing framework, with support for Presto and Apache Druid in the pipeline.

While Dataproc is a Google service, Malone said Google’s Anthos platform opened up the possibility of running jobs across hybrid or multiple clouds. “Dataproc becomes that one pane of glass,” Malone told Devclass, taking care of monitoring, security etc, wherever jobs are running.

While Malone couldn’t give a timeline for when Anthos support might appear, experience suggests that three to six months is a typical time lag from the point when Google says something seems like a good idea to something actually appearing.

Malone also said that companies were asking for Hadoop support on Kubernetes. “That’s a lot more complicated [and will take] a lot more time to figure out.”

In the meantime, the company had to be aware of what other open source projects were gaining traction to ensure that Kubernetes support came earlier and easier: “Capturing the new developments now is the most useful [thing to do].”

Malone said that the announcement meant that for now, there will be two versions of Dataproc, supporting Yarn and Kubernetes.

“Yarn will be around for the foreseeable future,” he said, and it was possible that Yarn would in time run on Kubernetes too.
