Crisis? What crisis? Netflix open sources incident management framework

Netflix has open sourced its crisis management framework, giving organisations a standardised way to manage incidents and get back to a state of chill.

The video giant says the framework, which it has dubbed Dispatch, encompasses “All of the ad-hoc things you’re doing to manage incidents today, done for you, and a bunch of other things you should’ve been doing, but have not had the time!”

The framework was originally designed to help Netflix handle security incidents, but the company said it is applicable to any type of crisis. As Netflix rather brutally puts it, “there are quite a few steps to managing an incident and much of it is typically handled on an ad-hoc basis by a human.”

So, to minimise the chance of the human screwing up (even more), Dispatch will “focus on creating resources, assembling participants, sending out notifications, tracking tasks, and assisting with post-incident reviews; allowing you to focus on actually fixing the issue!”

It is deeply integrated with existing tools such as Slack, GSuite and Jira, leveraging “the existing familiarity of these tools to provide orchestration instead of introducing another tool.”

The framework encompasses a UI, API and database. When an incident is reported by a user via the UI, or automatically via the API, an incident flow is created, so that familiar tools like PagerDuty and Google Docs can be tapped for alerting colleagues, informing customers and suppliers, managing post-incident reviews, and the like.

This relieves the incident commander of the stress of managing access to resources and data, and of managing communications. At the same time, all relevant data is recorded, while key tasks – including post-incident reports – are tracked, and “owners are reminded if they’re not completed on time.”
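To make the reminder idea concrete, here is a minimal Python sketch of how unfinished, overdue tasks might be picked out for a nudge. It is not taken from Dispatch: the `Task` class, its field names and the `overdue_reminders` helper are all hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Task:
    # Hypothetical fields for illustration; not Dispatch's actual schema.
    description: str
    owner: str
    due: datetime
    done: bool = False


def overdue_reminders(tasks: List[Task], now: datetime) -> List[str]:
    """Return one reminder message per task that is unfinished and past due."""
    return [
        f"Reminder for {t.owner}: '{t.description}' was due {t.due:%Y-%m-%d %H:%M}"
        for t in tasks
        if not t.done and t.due < now
    ]


# Example data: one overdue task, one not yet due, one already completed.
now = datetime(2020, 2, 25, 12, 0)
tasks = [
    Task("Write post-incident review", "alice", now - timedelta(hours=2)),
    Task("Rotate credentials", "bob", now + timedelta(hours=4)),
    Task("Notify suppliers", "carol", now - timedelta(days=1), done=True),
]

reminders = overdue_reminders(tasks, now)
for message in reminders:
    print(message)
```

In this sketch only the first task produces a reminder, since the second is not yet due and the third is already marked done. A real system would presumably run such a check on a schedule and hand the messages to a chat or email integration.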

Under the covers, the framework is built on Python 3.7 with FastAPI, with a Vue.js UI and a Postgres database. Plugins are provided for GSuite, Jira, PagerDuty and Slack, but “the plugin architecture allows for integrations with whatever tools your organization is already using.”

What it doesn’t have is a reliance on AWS, with no AWS APIs used at all, though Netflix promises, “in addition to all of the built-in integrations, Dispatch provides multiple integration points that allow it to fit into just about any existing environment.”

Crises and incidents are hardly new. However, as companies increasingly rely on the web to engage with customers and conduct business, the boundary between a purely internal problem and something that can crater the entire business has become blurred.

Atlassian, for its part, said traditional enterprise service management platforms were not up to the job when it hoovered up OpsGenie and launched its own Jira Ops platform in 2018.