+1 for chaos engineering: AWS gets fault injection simulator, adds hosted monitoring services

AWS re:Invent is still running, and this week the traditional keynote by Amazon’s CTO Dr Werner Vogels looked at some new and upcoming services geared towards developers who want to build dependable systems.

With CNCF Technical Oversight Committee chair Liz Rice recently pinpointing chaos engineering as an approach to watch in 2021, Vogels’ announcement of an AWS Fault Injection Simulator (FIS) got quite a bit of interest on social media. The fully managed chaos engineering service is meant to become available next year and looks to give developers a hand at discovering their apps’ weaknesses before they become an issue.

It isn’t the first time that chaos engineering has come up at re:Invent, however. In 2017 Nora Jones, who back then worked as a senior software engineer at Netflix, took to the stage to explain the benefits of integrating some experimentation into the development process. In order to build confidence in a system’s behaviour, chaos engineering practitioners define a system output that signals normal behaviour, introduce some variables reflecting real world events that could influence that state, and start experimenting with those to check for issues. 
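To make that loop more concrete, here is a minimal Python sketch of a steady-state experiment. Everything in it is hypothetical and purely illustrative: the simulated service, the 200 ms timeout, the 5% error-rate threshold, and the function names are assumptions, not part of FIS or of any Netflix tooling.

```python
import random
from typing import Dict, List


def steady_state_ok(error_rate: float, threshold: float = 0.05) -> bool:
    """Steady-state hypothesis (assumed): the error rate stays below the threshold."""
    return error_rate < threshold


def simulated_request(latency_ms: float, timeout_ms: float = 200.0) -> bool:
    """Hypothetical service call: succeeds only if it answers within the timeout."""
    return latency_ms <= timeout_ms


def measure_error_rate(injected_latency_ms: float, requests: int = 1000, seed: int = 42) -> float:
    """Send simulated requests with extra latency injected and return the failure ratio."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(requests):
        base_latency = rng.uniform(20, 120)  # assumed normal latency range in ms
        if not simulated_request(base_latency + injected_latency_ms):
            failures += 1
    return failures / requests


def run_experiment(injected_latencies_ms: List[float]) -> Dict[float, bool]:
    """For each injected fault level, check whether the steady state still holds."""
    return {
        level: steady_state_ok(measure_error_rate(level))
        for level in injected_latencies_ms
    }


if __name__ == "__main__":
    for level, healthy in run_experiment([0, 50, 500]).items():
        print(f"+{level} ms latency -> steady state holds: {healthy}")
```

In a real setup the simulated request would be replaced by observations of the actual system, and the injected variable would mirror a real-world event such as a slow dependency.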

The notion of testing in production back then still felt foreign to most, so the recent surge of interest seems to indicate that people are warming up to the idea – especially with more and more projects relying on outside resources that introduce new potential for failure. AWS Fault Injection Simulator could be useful to those interested in the concept but afraid of risking system-wide consequences, as it promises “controls and guardrails” such as rollbacks and configurable stopping points for experiments.
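How FIS implements its guardrails has not been detailed, but the general idea of a stopping point combined with a rollback can be sketched in a few lines of Python. The function below is a hypothetical illustration, not the FIS API: the callables, the threshold, and the in-memory "system" are all assumptions.

```python
from typing import Callable, Dict, List, Tuple


def run_with_guardrails(
    intensities: List[int],
    inject: Callable[[int], float],
    rollback: Callable[[], None],
    stop_threshold: float,
) -> Tuple[str, List[int]]:
    """Ramp up a fault step by step and halt once the observed metric crosses the threshold.

    The rollback always runs, whether the experiment completes, stops early, or raises.
    """
    applied: List[int] = []
    try:
        for level in intensities:
            observed = inject(level)  # apply the fault and read back a health metric
            applied.append(level)
            if observed > stop_threshold:
                return "stopped", applied
        return "completed", applied
    finally:
        rollback()


# Hypothetical in-memory stand-in for a system under test.
system: Dict[str, int] = {"fault_level": 0}


def demo_inject(level: int) -> float:
    """Set the fault level and pretend the error rate is proportional to it."""
    system["fault_level"] = level
    return level / 100


def demo_rollback() -> None:
    """Remove the injected fault again."""
    system["fault_level"] = 0


if __name__ == "__main__":
    print(run_with_guardrails([10, 30, 60, 90], demo_inject, demo_rollback, 0.5))
```

Putting the rollback into a `finally` block is a deliberate choice here: the fault is removed even if the experiment itself crashes halfway through.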

To get started, the service is said to provide “pre-built templates that generate the desired disruptions, such as server latency or database error”, which can be run in parallel or sequentially to harden a system and can even be integrated into a CD pipeline for continuous testing. FIS is also advertised as helping teams find monitoring blind spots, so that they can make their setup more comprehensive.

Monitoring was quite central to the keynote anyway, as AWS announced previews of managed services for cornerstone monitoring tools Grafana (AMG) and Prometheus (AMP). In cooperation with Grafana developers Grafana Labs, the new offer is meant to take hosting and tool management off dev teams’ hands, add some additional security features, and integrate with all sorts of AWS data sources for insight collection. 

While some community members felt slightly baffled by the suggestion that the tools are hard to manage, the service is probably mainly aimed at meeting organisations where they are, as Vogels put it, which happens to be AWS in some cases. This can help make the service more accessible for some and offers Grafana Labs a way to present a direct taster of what their tool can do. In the best-case scenario, this could entice a company or two to invest in an Enterprise subscription, which in turn would help to finance the open source project, or even win some developers over to help move the project forward.

Out of the box, AMP includes support for Amazon’s Elastic Kubernetes Service (EKS) and Elastic Container Service (ECS) only, though it is said that it can also be used to monitor self-managed Kubernetes clusters, both in the cloud and on premises.

While pricing information for AWS Fault Injection Simulator isn’t available yet, AMG is said to be free of charge until 15 February 2021, after which it goes up to $9 per active user per workspace per month, or $5 for view-only access. AMP, for now, only charges “for usage of metrics ingested”. This changes on 15 March, “at which point preview customers will be charged for stored and queried metrics”, with costs starting at $0.002 per 10,000 samples for the first 2 billion samples plus storage and processing charges.
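For a rough feel of what those figures mean, the following Python sketch runs the arithmetic. It rests on a few assumptions: that the AMG prices apply per active user, per workspace, per month, that every listed user is active, and that the $0.002 rate applies to samples up to the 2 billion mark. Rates beyond that tier, as well as storage and processing charges, are not given in the announcement and are deliberately left out. The function names are made up for illustration.

```python
from decimal import Decimal

# Figures quoted in the announcement (USD).
AMG_EDITOR_PRICE = Decimal("9")      # per active user, per workspace, per month
AMG_VIEWER_PRICE = Decimal("5")      # view-only access
AMP_PRICE_PER_10K = Decimal("0.002")  # per 10,000 samples
AMP_FIRST_TIER_LIMIT = 2_000_000_000  # first 2 billion samples


def amg_monthly_cost(editors: int, viewers: int, workspaces: int = 1) -> Decimal:
    """Monthly AMG cost, assuming the same active users in every workspace."""
    return workspaces * (editors * AMG_EDITOR_PRICE + viewers * AMG_VIEWER_PRICE)


def amp_sample_cost(samples: int) -> Decimal:
    """Per-sample AMP charge within the first tier; later tiers are not covered."""
    if samples > AMP_FIRST_TIER_LIMIT:
        raise ValueError("rates beyond the first 2 billion samples are not given")
    return Decimal(samples) / Decimal(10_000) * AMP_PRICE_PER_10K


if __name__ == "__main__":
    print("AMG, 5 editors + 20 viewers:", amg_monthly_cost(5, 20))
    print("AMP, 100 million samples:", amp_sample_cost(100_000_000))
```

Under these assumptions, a team of five editors and twenty viewers in one workspace would pay $145 a month for AMG, and 100 million samples would cost $20 in AMP before storage and processing are added.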

Other than that, Vogels introduced AWS CloudShell, a browser-based shell which should now be available via a new icon in the AWS Management Console. According to AWS chief evangelist Jeff Barr, the new tool is meant to allow easy access to AWS resources and was added to reduce friction and complexity for those who want to use the AWS Command Line Interface but “don’t want to deal with client applications, public keys, AWS credentials, tooling, and so forth”.

To make this less of an issue, CloudShell works with the credentials used to log into the management console and comes with the AWS CLI pre-installed. The free offer also includes things like Python and Node runtimes, bash, and git, as well as 1GB of persistent storage per region for customisation measures.