Cloudera has released the universal resource scheduler YuniKorn into the wild, bringing a unified approach to scheduling stateless batch workloads and stateful services in big data scenarios to cloud-native as well as more traditional environments.
The project was born from the observation that the scheduler ecosystem is fragmented. The Cloudera team couldn’t find a good tool that supported both batch jobs, with the fairness and high throughput they demand, and long-running services, which call for persistent volumes and complex placement constraints.
YuniKorn tackles these needs by offering a common scheduler interface that defines communication protocols and decouples the scheduler from underlying resource management platforms such as YARN and Kubernetes.
The project’s core decides on the placement of each request and sends the resulting allocation decisions to the resource management systems. A scheduler shim sits between the two and translates in both directions, making sure the platforms actually receive those decisions and the core can collect information about system resources. Users can gain insight into the current and historical scheduler status through a web interface.
The current state of the Apache License 2.0 open source project lets ops folks work with hierarchical pools and queues with min/max resource quotas when scheduling batch jobs or long-running services. YuniKorn also supports resource fairness between queues, users, and apps, fairness-based cross-queue preemption, and custom resource types, and it can automatically map incoming container requests to queues using placement policies.
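A hierarchical queue setup with min/max quotas might be configured roughly along these lines. This is a hedged sketch: the queue names and concrete values are made up, and the exact keys may differ between YuniKorn versions.

```yaml
# Illustrative YuniKorn-style queue configuration (names and values invented)
partitions:
  - name: default
    queues:
      - name: root
        queues:
          - name: batch            # Spark/Flink-style batch jobs
            resources:
              guaranteed:          # the "min" quota: capacity reserved for this queue
                memory: 4096
                vcore: 4
              max:                 # the "max" quota: hard ceiling for this queue
                memory: 16384
                vcore: 16
          - name: services         # long-running services
            resources:
              guaranteed:
                memory: 2048
                vcore: 2
    placementrules:                # map incoming requests to queues automatically
      - name: tag
        value: namespace           # e.g. derive the queue from the app's namespace
        create: true
```

The `guaranteed`/`max` pair is what enables fairness between queues: siblings share headroom above their guarantees, and preemption can reclaim resources down to them.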
Since it can be deployed on Kubernetes and used as a scheduler there, the system knows about predicates such as pod affinity and node selectors, understands persistent volumes and the corresponding claims, and, once set up, dynamically loads its scheduler configuration from a ConfigMap.
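On Kubernetes, pods typically opt in to an alternative scheduler via the standard `schedulerName` field in the pod spec. A sketch of what that could look like follows; the scheduler name, labels, and queue path are assumptions for illustration, not guaranteed to match a given YuniKorn deployment.

```yaml
# Illustrative pod spec handing scheduling over to YuniKorn
apiVersion: v1
kind: Pod
metadata:
  name: spark-driver
  labels:
    applicationId: spark-job-0001  # hypothetical label grouping pods into one application
    queue: root.batch              # hypothetical target queue (could also come from placement rules)
spec:
  schedulerName: yunikorn          # standard k8s field: bypass the default scheduler
  nodeSelector:                    # ordinary predicates like this are still honored
    disktype: ssd
  containers:
    - name: driver
      image: apache/spark:latest
      resources:
        requests:
          memory: "2Gi"
          cpu: "1"
```

Because the scheduler swap happens per pod, a cluster can run the default scheduler and YuniKorn side by side while workloads migrate over.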
Initially, YuniKorn only supports YARN and Kubernetes, which is also reflected in the project’s name (Y for YARN, K for Kubernetes, uni for unified, and orn for…you know, to make it into an actual word). Over the next couple of months the team behind the project plans to focus mainly on supporting big data workloads like those you come across when using Apache Spark or Flink – not surprising given the company’s big data background.
Other than that, the roadmap contains a cluster overview page for the web UI to report cluster and application status, plus support for Helm charts and security in the Kubernetes shim. The core is still missing things like Grafana and Prometheus integration, functions to prioritise apps, and a workload simulator; since those are already being looked into, however, the next releases could get interesting.