GitHub outages? It’s the database stupid... • DEVCLASS


GitHub outages? It’s the database stupid…

By Team Devclass

March 2, 2020


GitHub has cast a sliver of light on the cause of the outages that have plagued the code hosting platform in recent weeks.

CEO Nat Friedman was forced to take to Twitter last week to apologise for the outages, after the Microsoft-owned platform took two substantial lie downs in a matter of days. However, while he said “we take reliability very seriously”, he gave no reason for the company’s failure to deliver the same.

On Friday, GitHub SVP for engineering Keith Ballinger added his apology to the mix, before going some way to explaining what the actual problem was.

“These incidents were distinct events, but have a common theme of uncovering new challenges in scaling our database tier,” he said. “Specifically, increased load on our largest database cluster contributed to degradations across multiple services.”

Ballinger promised “a more in-depth and technical report of these events and the work we are doing to improve the scalability and performance of our backend systems.”

By way of reassurance, he added, “We have several data partitioning initiatives already in progress, and we’ll be rolling out some of this work very soon. You can follow our status page for updates about the availability of our systems.”

Yep, that’s it. It’s all the database’s fault. Ballinger doesn’t go into depth about exactly which database is at fault. However, back in 2018, in the wake of a 24-hour outage, a lengthy mea culpa referred to problems with the MySQL clusters underpinning the service.

At the time, it said it would adjust the configuration of Orchestrator, which it used to manage the MySQL clusters, while a pre-existing effort “to support serving GitHub traffic from multiple data centers in an active/active/active design…to tolerate the full failure of a single data center failure without user impact” was given added urgency. It also pledged to use more chaos engineering to envisage likely failure scenarios, and to improve its reporting.

It’s fair to say it’s delivered on at least one of those: it was much easier for users to confirm it was indeed GitHub that was the problem last week... and the week before that.
