With DevOps, security is everybody’s responsibility. OK, so what’s next?

Sponsored For some time now, DevSecOps has looked like the solution to the puzzle of how to get cloud applications out the door without throwing security on to the fire. Instead of trying to retrofit complex security policies to work in a hierarchical development pipeline, security becomes a distributed problem that every coder is asked to address.

DevSecOps solutions are by nature designed to be preventative. The idea is to remove complexity by baking robust security methodologies into software development from the earliest stages. Get it right from the outset, and reactive firefighting is greatly reduced.

Conveniently, this model – “shifting security left” to the coder rather than the expert in a fixed hierarchy – also makes sense when developing on cloud platforms that assume rapid deployment and collaboration. There is no development team, security team, or IT deployment team because they are one and the same person. In theory, that’s how security misconfigurations can be caught before they do harm. However, when it comes to cloud development, “shift left” is more talked about than practised. This situation has crept up on organisations that haven’t realised how programming culture has changed rapidly in the cloud era.

“There is a lack of control in this model. With the shift into cloud development and the fact that coders can always get a better answer off Stack Overflow and GitHub, it’s become practically impossible to track the supply chain. It’s a governance problem,” says Guy Eisenkot, the co-founder and vice president of product for a new Israeli-US start-up, Bridgecrew. Security is now becoming a focal point of continuous development, he says, and this is a work in progress alongside priorities such as performance and features. The problem is that it is not enough to rely on tools alone.

Eisenkot uses the example of an Elasticsearch cluster that has inadvertently been left in a publicly accessible state. Under today’s hierarchical model, security checking will flag this, and the issue is sent to a ticketing system where it will probably languish for hours until someone with the necessary access spots it. It would be far more efficient to focus on prevention, with the misconfiguration check running as part of a CI or CD pipeline that immediately pushes the issue to the team that made the error. Armed with the alert, the team can remediate at compile time rather than at runtime, when a fix would mean scrambling back from a potentially serious mistake.
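The kind of pipeline check described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Bridgecrew’s implementation: the resource dictionaries, the `allowed_cidrs` field and the function names are hypothetical stand-ins for whatever a real parser would extract from a template.

```python
# Hypothetical, simplified view of resources parsed from an IaC template.
# Field names such as "allowed_cidrs" are assumptions for illustration only.
RESOURCES = [
    {"name": "search-cluster", "type": "elasticsearch", "allowed_cidrs": ["0.0.0.0/0"]},
    {"name": "internal-db", "type": "database", "allowed_cidrs": ["10.0.0.0/16"]},
]

# Address ranges that mean "reachable from anywhere".
OPEN_TO_WORLD = {"0.0.0.0/0", "::/0"}


def find_public_resources(resources):
    """Return the names of resources whose access list includes an open range."""
    return [
        r["name"]
        for r in resources
        if OPEN_TO_WORLD.intersection(r.get("allowed_cidrs", []))
    ]


def check(resources):
    """Print one line per finding and return an exit code for the pipeline step."""
    public = find_public_resources(resources)
    for name in public:
        print(f"FAIL: {name} is publicly accessible")
    # A CI step would typically treat a non-zero code as a failed build.
    return 1 if public else 0


exit_code = check(RESOURCES)
```

Because the result is an exit code rather than a ticket, the finding would surface in the build report of whoever made the change, which is the point of the example.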

“It’s not simply that the tools are not enough, it’s the fact that too many of them are reporting alerts and sending them to Slack channels, Jira tickets and other queues instead of going to developers’ desktops and CI tools,” says Eisenkot.

This is where Infrastructure as Code (IaC), a methodology that makes it easier for security to be “everyone’s business”, comes into play. However, “easier” does not necessarily mean “perfect”. Let’s take a closer look.

Interrogating IaC

The issue of configuration has been brought into sharp focus by the rise of IaC and its implementation, using tools such as Terraform. IaC’s promise is to remove the need for manual configuration and a jumble of individual console tools, turning provisioning of cloud and data centre services into machine-readable code which is bundled into automated workflows or templates.

IaC configuration for a cloud platform such as AWS can follow either a pull model, in which agents on each server fetch their configuration from a central configuration server (the usual arrangement with tools such as Chef or Puppet), or a push model, in which a tool such as Terraform sends changes directly to the provider’s infrastructure. In most cases, Terraform treats resources as ‘immutable’, that is, it wipes a resource and starts again rather than making a configuration change by fiddling with an existing setup.

In the cloud era, where developers use multiple platforms and timescales have shrunk to hours, DevSecOps still asks a lot of developers, who are expected to get it right first time, argues Eisenkot.

“It suddenly doesn’t take six months to deploy a change to an application, it takes a few minutes. This speed is a blessing in disguise because it creates challenges when we deploy software very fast. Where do we make the compromises?”

As he explains in a separate article on the company’s approach to Infrastructure as Code (IaC) misconfiguration, just having security tools to spot and remediate problems is not the same as managing development risk. The first problem is that anyone with the job of managing compliance and risk cannot easily see what’s going on in an environment where developers are often handed amazing amounts of freedom.

“There are blind spots,” says Eisenkot. “How do you ensure you know what’s going on in your environment? The idea that we know everything that’s going on in our architecture is an assumption we can’t make any more.”

At least four in ten IaC templates contain a misconfiguration, according to Bridgecrew’s analysis, and these often slip through to become runtime vulnerabilities that expose resources. The tools to provision, and also to misconfigure, cloud security have been successfully decentralised, but the expertise to manage the problem has remained centralised, where it is becoming ineffective. The upshot is that many organisations have adopted IaC while lacking the means to manage it securely.

Bridgecrew’s approach to this problem is to build a new type of platform that oversees the entire development pipeline and its effect on security. In essence the system sanity-checks what developers are doing and issues pull requests before problems arise. This is achieved through overlapping capabilities that integrate with Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP), and that flag violations in CloudFormation, Kubernetes and Terraform.

The first element is the ability to scan for IaC and other cloud misconfiguration and policy violations – ideally during build time so that errors don’t make it to deployed infrastructure. This process can also generate compliance reports to meet standards such as PCI. Importantly, uncovering misconfiguration and unauthorised changes at this stage makes fixing them much quicker and easier.

“Bridgecrew is deployed by running a CloudFormation or Terraform template on the subject cloud account. The template creates a read-only role on the subject IaaS account. This role enables it to perform scheduled scans of configuration states and continuously evaluate their latest states,” says Eisenkot.

For some organisations, this can turn into an almost archaeological task that involves pulling out problems in templates dating back years. Eisenkot uses the example of encryption by default, which wasn’t enabled on some database packages until three years ago. Uncovering those issues is an easy win that can happen on day one.

The basic element of Bridgecrew’s fix strategy is playbooks, automation and remediation scripts that make it easy to address common problems across different cloud platforms that might otherwise require several manual steps. A useful starting point, yes, but they have limitations in real-world development, Eisenkot acknowledges. Today, cloud deployment is highly automated, which means there is a risk that changes made by a playbook can easily be undone by a subsequent change.
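The risk of a fix being undone can be shown with a toy model. The sketch below assumes a deliberately simplified pipeline in which every automated deployment overwrites the live state with whatever the template says; the `deploy` function and the `encrypted` setting are hypothetical and do not represent any particular tool.

```python
def deploy(template):
    """Simulate an automated deployment: live state becomes a copy of the template."""
    return dict(template)


# The template in source control still carries the old, insecure setting.
template = {"encrypted": False}
live = deploy(template)

# A runtime playbook patches only the live resource.
live["encrypted"] = True
after_playbook = live["encrypted"]

# The next automated deployment re-applies the unchanged template,
# silently reverting the playbook's fix.
live = deploy(template)
after_redeploy = live["encrypted"]

# Fixing the template itself makes the change survive later deployments.
template["encrypted"] = True
live = deploy(template)
after_template_fix = live["encrypted"]

print(after_playbook, after_redeploy, after_template_fix)
```

In this model a fix holds only once it reaches the template, which is why the pull-request route discussed later in the article matters.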

The company has also put time into open source tools such as Checkov, an IaC static analysis tool which can be used to spot misconfigurations in Terraform, CloudFormation, and Kubernetes. Other Bridgecrew open source tools include the AWS Least Privilege Terraformer AirIAM, which generates a template designed to limit permissions for identity and access management to those which are strictly necessary; and Terragoat, “a vulnerable-by-design learning tool for experimenting with building secure cloud configuration pipelines”.

If scanning IaC templates sounds straightforward, Eisenkot is candid about the challenges of building a platform that can do this without creating its own headaches.

“It becomes extremely complicated when trying to build a data model that depicts how configuration will eventually manifest based on the prescribed plan. It is imperative to extract its variables and dependencies against the entire code base. Without this step, scans could result in both false positives in the form of configurations that don’t extract all their dependencies or false negatives that are not identified due to misinterpretation of IaC plan,” he says.
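A small Python sketch shows why resolving variables matters before a scan. It is an illustration only: the `${var.name}` syntax is a simplified stand-in for template interpolation, and the `VARIABLES`, `RESOURCE`, `resolve` and `is_public` names are hypothetical.

```python
import re

# Hypothetical template fragment: the ingress range is supplied by a variable
# rather than written out literally on the resource.
VARIABLES = {"allowed_cidr": "0.0.0.0/0"}
RESOURCE = {"name": "search-cluster", "ingress_cidr": "${var.allowed_cidr}"}

VAR_PATTERN = re.compile(r"\$\{var\.(\w+)\}")


def resolve(value, variables):
    """Replace ${var.name} references with their values; unknown names are left as-is."""
    return VAR_PATTERN.sub(lambda m: variables.get(m.group(1), m.group(0)), value)


def is_public(cidr):
    """True if the range allows traffic from any address."""
    return cidr in {"0.0.0.0/0", "::/0"}


# Scanning the raw text misses the problem: a false negative.
naive_result = is_public(RESOURCE["ingress_cidr"])

# Scanning after resolving the variable catches it.
resolved_result = is_public(resolve(RESOURCE["ingress_cidr"], VARIABLES))

print(naive_result, resolved_result)  # prints: False True
```

Real templates add modules, defaults and cross-file references, so resolution is far harder than this single substitution suggests.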

Ultimately, handing developers a new platform to manage bad decisions turned out to be only part of the solution. “We learned that it’s not only about giving developers the panic button, but we also had to educate them on making the right decisions when building new infrastructure in the public cloud.” More fundamental still was to educate developers to avoid bad configurations before they happened.

“By making sure those end up at a developer’s pull-request or build report, we were able to provide actionable advice on how to troubleshoot bad configurations and prevent them from getting into production in the first place.”

In a decade, the cloud has gone from powerful resource to invisible problem and, more recently, a risky liability. Having acquired a new set of tools, developers have learned that they must impose some discipline on how they use them. The solution, then, is not more tools to fix security problems but better integration of security as an ethos.

This article is sponsored by Bridgecrew.