Netflix shares Mantis to get to the bottom of system downtime

One of the tech teams behind video streaming service Netflix has open sourced a platform for building stream processing applications. Mantis is designed to provide better insight into complex distributed systems while keeping the cost of observing and operating them to a minimum.

It can serve as a foundation for real-time applications that quickly identify issues, trigger alerts, and apply remediations. This capability stems from the four guiding principles established during Mantis’ development: access to raw events, real-time access to those events, the ability to ask new questions without changing instrumentation, and cost-effectiveness.

To realise them, the project moves away from metrics and logs and works with untransformed events instead. Having access to these events as they arrive allows applications to react quickly to anomalies while still offering enough information to adapt as problems change. To keep costs low, the platform uses an “on-demand, reactive model, where you don’t pay the cost for these events until something is subscribed to their stream.” It also reuses the same data for equivalent subscribers, saving even more resources.
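The on-demand model and the serving of equivalent subscribers from the same data can be illustrated with a small Python sketch. It is a hypothetical toy, not Mantis’ actual API: the `OnDemandSource` class, its methods and the `field=value` query format are invented for illustration. Events are dropped at the source while nobody is subscribed, and subscribers asking the same query share a single filter evaluation per event.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
Callback = Callable[[Event], None]


def matches(query: str, event: Event) -> bool:
    # Hypothetical query format "field=value"; compares the event field as a string.
    key, value = query.split("=", 1)
    return str(event.get(key)) == value


class OnDemandSource:
    """Hypothetical event source (not Mantis' API): does no work without
    subscribers and serves equivalent queries from one shared evaluation."""

    def __init__(self) -> None:
        # Query string -> callbacks of every subscriber that asked that query.
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)
        self.evaluations = 0  # number of filter evaluations actually performed

    def subscribe(self, query: str, callback: Callback) -> None:
        self._subscribers[query].append(callback)

    def unsubscribe(self, query: str, callback: Callback) -> None:
        callbacks = self._subscribers.get(query, [])
        if callback in callbacks:
            callbacks.remove(callback)
        if not callbacks:
            self._subscribers.pop(query, None)  # last subscriber gone: stop paying

    def emit(self, event: Event) -> None:
        if not self._subscribers:
            return  # nobody is subscribed, so the event costs nothing
        for query, callbacks in self._subscribers.items():
            self.evaluations += 1  # one evaluation per distinct query, not per subscriber
            if matches(query, event):
                for callback in callbacks:
                    callback(event)


src = OnDemandSource()
src.emit({"status": "500"})  # no subscribers: dropped without evaluation
a: List[Event] = []
b: List[Event] = []
src.subscribe("status=500", a.append)
src.subscribe("status=500", b.append)  # equivalent query shares the evaluation
src.emit({"status": "500"})
src.emit({"status": "200"})
print(src.evaluations, len(a), len(b))  # prints: 2 1 1
```

Grouping subscribers by identical query is one simple way to avoid redundant work; a real system would also need to decide when two different queries count as equivalent, which this sketch does not attempt.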

With over four years of production use under its belt, Mantis can be considered battle-tested, so gathering feedback isn’t the main reason for the release. In a blog post, development team member Jeff Chao attributed the step to Netflix’s belief that its challenges “are not necessarily unique”, so sharing the platform’s code is meant to benefit the broader community.

In Mantis’ case, Netflix wanted to reduce the long time needed to accurately process metrics while improving the operational health of an increasingly complex system used by a growing number of customers. The result is described as “a robust, scalable platform that is ideally suited for high volume, low latency use cases like anomaly detection and alerting” and is used in the company’s real-time monitoring and health check setup.

Mantis is now freely available via GitHub. Its source code is licensed under the Apache License 2.0.

The project isn’t Netflix’s first foray into open source. Other popular software from the company includes the continuous delivery platform Spinnaker and the chaos engineering tool Chaos Monkey. Open sourcing a project doesn’t always secure its longevity, however, as the latency and fault tolerance library Hystrix, which is no longer in active development, demonstrates.