PingCap wraps Chaos Mesh around Kubernetes

If Kubernetes seems too easy to work with, you might be glad to hear that database company PingCap has open sourced a chaos engineering platform built specifically for the fast-growing container orchestrator.

Chaos engineering is the art of injecting faults into (production) infrastructure to test a system’s resilience. Netflix developed its Chaos Monkey tool back in 2011, and the original monkey has spawned a whole army of simian-monikered tools.

Now China-originated, US-registered cloud database firm PingCap has open sourced its Chaos Mesh under the Apache 2.0 license. According to PingCap’s Chengwen Yin, Chaos Mesh promises “all-around fault injection methods for complex systems on Kubernetes, covering faults in Pod, network, file system, and even the kernel.”

The tool can inject a range of faults: Pods being killed or failing; network problems including delay, loss, duplication and corruption; and I/O delays and errors.

Chaos objects are defined using CustomResourceDefinitions (CRDs). “Instead of defining all types of fault injections in a unified CRD object, we allow flexible and separate CRD objects for different types of fault injection,” the PingCap team writes. This means “you can manipulate CRD objects directly through the Kubernetes API.” There are currently three chaos objects: PodChaos, NetworkChaos and IOChaos.

The system is made up of three components: a controller-manager, a chaos-daemon and a sidecar. Chaos objects are created or updated through the Kubernetes API, using either a YAML file or a Kubernetes client. Chaos Mesh uses the API server to watch the objects and manage the lifecycle of the chaos experiments, with the three components working together to inject errors.
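To give a sense of what such a chaos object might look like, here is a minimal sketch in Python that builds a PodChaos manifest as a plain dictionary and prints it as JSON, which kubectl accepts alongside YAML. The helper name `pod_kill_manifest`, the experiment name, the `demo` namespace and the `app: web` label are all made up for illustration. The `apiVersion` and the `spec` field names follow the Chaos Mesh documentation as we understand it and may differ between releases, so check them against the CRDs installed in your cluster.

```python
import json


def pod_kill_manifest(name, namespace, target_labels):
    """Build a PodChaos object as a plain dict (hypothetical helper)."""
    return {
        # Assumption: the API group/version varies by Chaos Mesh release;
        # verify it against the CRDs installed in your cluster.
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "PodChaos",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "action": "pod-kill",  # kill the selected Pod
            "mode": "one",         # pick a single matching Pod
            "selector": {
                "namespaces": [namespace],
                "labelSelectors": target_labels,
            },
        },
    }


# Illustrative values only: nothing here comes from PingCap's announcement.
manifest = pod_kill_manifest("kill-one-web-pod", "demo", {"app": "web"})
print(json.dumps(manifest, indent=2))
```

Saving the output to a file and passing it to `kubectl apply -f` would be one way to submit the object. Because each fault type has its own CRD, a NetworkChaos or IOChaos experiment would be a separate object with its own `kind` and fields.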

The PingCap team said they are “in the process of supporting a wider range of fault types of finer granularity”. Specifically, they’re looking at injecting errors at the system call and kernel levels, and injecting specific error types at the application function and statement levels.

At the same time, they’re looking to improve the Chaos Mesh Dashboard, and to develop a fault orchestration interface.

PingCap says there are no special dependencies, meaning Chaos Mesh can be deployed directly to Kubernetes, and no modifications are needed to the deployment logic of the system being tested. Right now, the system runs on Kubernetes v1.12 or later, and requires a Helm version between v2.8.2 and v3.0.0.
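As a rough illustration of those version requirements, the following Python sketch checks version strings against the stated ranges. The function names are hypothetical, and the sketch assumes both ends of the Helm range are inclusive, which the announcement does not spell out.

```python
def parse_version(text):
    """Turn a string such as 'v1.12.3' or '2.16.1-rc.1' into (major, minor, patch)."""
    core = text.strip().lstrip("v").split("-")[0].split("+")[0]
    parts = [int(p) for p in core.split(".")]
    while len(parts) < 3:  # treat 'v1.12' as 'v1.12.0'
        parts.append(0)
    return tuple(parts[:3])


def kubernetes_supported(version):
    """Kubernetes v1.12 or later."""
    return parse_version(version) >= (1, 12, 0)


def helm_supported(version):
    """Helm between v2.8.2 and v3.0.0 (assumed inclusive at both ends)."""
    return (2, 8, 2) <= parse_version(version) <= (3, 0, 0)


# Example inputs; in practice these strings would come from the output of
# `kubectl version` and `helm version`.
print(kubernetes_supported("v1.16.3"), helm_supported("v2.16.1"))
```

Comparing tuples of integers avoids the pitfall of plain string comparison, where "1.9" would sort after "1.12".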