Talking to people about the ‘next big thing’ often leads to a discussion about Serverless Computing at the moment. As an advocate for the containerisation movement, Red Hat’s Michael Hausenblas already knows what kind of questions are likely to come up in business evaluation processes and how to address them. In his book “Serverless Ops – A Beginner’s Guide For Serverless Operations” he therefore focuses on one of the most talked-about issues of the Serverless movement – the role of operations – and approaches the paradigm from this particular perspective.
DevClass: For anyone new to the topic, could you give a brief explanation of what Serverless means?
Michael Hausenblas: Serverless has become a sort of umbrella term for a class of technologies that includes Function-as-a-Service (FaaS), where the developer doesn’t have to deal with the provisioning of servers or the like. For example, if you’re using containers, you’re still involved with building images and so on. With Serverless or FaaS however you just provide your code, upload it to a platform, define some triggers and the platform executes that code whenever a trigger goes off.
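The "just provide your code, define some triggers" model can be sketched with a minimal handler in the style AWS Lambda uses for Python functions (an `event` payload plus a runtime `context`); the event contents here are made up for illustration:

```python
# A minimal sketch of the FaaS model described above: the platform
# invokes the handler whenever a trigger goes off; the developer only
# supplies this function and never provisions a server.
def handler(event, context):
    # 'event' carries the trigger payload; 'context' holds runtime metadata.
    name = event.get("name", "world")
    return {"statusCode": 200, "body": f"Hello, {name}!"}

# Locally you can invoke it directly; on the platform, the trigger does this.
print(handler({"name": "serverless"}, None))
```

On a real platform the function would be packaged and uploaded, and the trigger wiring (an HTTP endpoint, a queue message, a storage event) would be configured on the platform side rather than in code.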
The whole movement started out in 2014, when AWS introduced AWS Lambda. They have a big head start there and were two years ahead of everyone else. That’s why they now have a market share of around 70 per cent in production – depending on where you look those numbers up of course. They really went all in and made Lambda the glue code of everything that’s done in AWS.
As with other new technologies, Serverless Computing isn’t really one-size-fits-all; its usefulness depends on your specific project. In which cases would you say Serverless is a good choice, and are there situations where you would advise going with a different approach?
In general one would use Serverless in a couple of situations – essentially, when you have event-driven systems. An example of a great fit would be uploading an image to an S3 bucket and transforming or resizing it when that event happens.
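The upload-and-resize pattern can be sketched as a handler that receives an S3-style notification event. The event layout below follows the S3 notification format; the actual resize step is left as a comment, since it would pull in libraries such as boto3 and Pillow, and the bucket and key names are invented for illustration:

```python
# Sketch of the event-driven S3 example: the platform invokes this
# function with an event describing the object that was just uploaded.
def resize_handler(event, context):
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    # Here one would download the object, resize it, and upload the result.
    return f"would resize s3://{bucket}/{key}"

# A trimmed-down sample event in the shape S3 notifications use:
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "photos"}, "object": {"key": "cat.png"}}}
    ]
}
print(resize_handler(sample_event, None))
```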
For many long-running things Serverless currently is not that great, in Machine Learning for example, where there are a lot of iterations and long-running processes. But we see some adoption when it comes to applications in the Internet of Things (IoT) for example, where you also want a reaction to certain events, so that’s a good place to start.
If you have a monolith and are thinking about switching to either a containerised setup/microservices, which you commonly deploy in containers, or Serverless, you have to remember that breaking the service down into microservices is only one part of the job. You also need a couple of other things, like a container registry, a CI/CD pipeline to build the container images, a container runtime and an orchestrator like Kubernetes.
With Serverless on the other hand you typically don’t look at any of that but just provide the code. Since those functions tend to be quite small, you’ll normally end up with a lot of them – think 200 instead of your 30 microservices, for example – and you have to ask yourself how they should be orchestrated. It’s a pretty new space with not much there yet, and if you look at typical deployments, the services are pretty small, with only a handful of functions.
If you want to start getting into Serverless, maybe start with smaller things, like replacing a cron job or trying it on a greenfield project, where there aren’t that many external dependencies. Once you get a handle on that, you can tackle the bigger issues.
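Replacing a cron job is a good concrete starting point: instead of a crontab entry on a server you own, a scheduled trigger (e.g. a cron expression configured on the platform) invokes a function. A sketch of such a job, with placeholder cleanup logic:

```python
import datetime

# Sketch of the "replace a cron job" starter mentioned above. A
# platform-side schedule invokes this function; nothing here runs on
# a server you manage. The 30-day retention window is an example value.
def nightly_cleanup(event, context):
    cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=30)
    # In a real job: delete records or objects older than 'cutoff'.
    return {"deleted_before": cutoff.isoformat()}
```

The appeal over cron is that there is no host to patch or keep alive between runs; the trade-off is that the schedule now lives in platform configuration rather than a crontab.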
Though Serverless is quite developer centric, ops people will be the ones having to adjust the most it seems. What do they have to prepare for?
Obviously Serverless is nice for developers, since they don’t have to worry about provisioning something – they just provide code which is then executed. But there’s also an implication that to a certain extent exists in the containerised world, too: the question of who is on call at the end of the day.
Imagine you have a setup that is using AWS Lambda – you don’t have any access to the administrator’s space since everything is done by AWS. Nevertheless you need a strategy in case anything goes wrong in that function and needs to be fixed. You probably have to get your developers themselves on call or introduce a dedicated role of an application operator or whatever you want to call that role, someone who actually looks after the function, carries a pager, monitors alerts and does whatever is necessary to keep things running on an application level.
On AWS or any other public cloud you don’t have any administrators. But one pattern is to deploy on Kubernetes, and there are over ten frameworks to do so. In those cases you can still have a cluster on premises and therefore need people to look after everything, installing and maintaining this piece of framework software, which could be Apache OpenWhisk, for example. They probably won’t be the same people as those who have to look after your functions, as they work on a totally different level; that’s why you still have to think about who is looking after the actual function and what can go wrong once it’s deployed.
Some companies like Expedia seem to be quite happy to put their developers on call, but that only works if you have a couple of functions. If you have a large enterprise application with a couple of hundred functions, you probably need a proper on-call strategy.
Security and compliance are often flagged up as some of the pressing issues of Serverless at the moment – what are your thoughts on that?
On the one hand security is pretty straightforward since from an attacker’s point of view there isn’t much you can do. The attack surface is quite small and in a public cloud setup you always have a huge organisation behind it that looks at things on that level, so you don’t really have a lot of ways to exploit that.
If you run your own Serverless framework on a Kubernetes cluster for example you have to take care of both levels, the Serverless and the Kubernetes one. If you’re using the defaults, there are a couple of things you need to be aware of in Kubernetes, but in general I would argue it’s a pretty secure setup, where you don’t need to worry too much about security issues in that context – from a developer’s perspective at least.
Quite a few people seem to fear vendor lock-in and want to sidestep it via a multi-cloud approach. Can you tell us anything about the problems that go along with such scenarios and how you think they are going to be tackled?
In the Cloud Native Computing Foundation there’s a Serverless Working Group now, and one of the things they’re doing is standardising so-called CloudEvents. This specification will essentially allow you to make the triggers I mentioned earlier portable and accessible across cloud platforms.
The part that isn’t solved yet is the integration part. Functions are stateless and short-running, so imagine you want to store things somewhere. There’s no standard way yet to do so in a database, a message queue or whatever, but at least the trigger side is covered.
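To make the trigger standardisation concrete, here is a sketch of what a CloudEvents-style event might look like, based on the attribute names in the CloudEvents specification (`specversion`, `type`, `source` and `id` are the required context attributes; `data` carries the payload). The concrete values are invented for illustration:

```python
# A CloudEvents-shaped event as a plain dict: the context attributes
# are what the specification standardises, so any compliant platform
# can route it, regardless of which cloud emitted it.
event = {
    "specversion": "1.0",
    "type": "com.example.object.uploaded",
    "source": "/storage/photos",
    "id": "1234-5678",
    "datacontenttype": "application/json",
    "data": {"bucket": "photos", "key": "cat.png"},
}

# A consumer can check the required attributes before dispatching:
required = {"specversion", "type", "source", "id"}
assert required <= event.keys()
```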
Looking at the cloud-native ecosystem you just mentioned, developers who have been on the job for a while could wonder why problems such as Continuous Integration are getting investigated just now. Why is that and what should one consider in this context?
The question of Continuous Integration in the context of Serverless is interesting, because on the one hand you have way smaller artifacts to deal with. On the other hand, if you have many of these functions, things tend to be a little bit tricky, because there isn’t a concept of a registry, and service discovery and other things are very much platform-dependent – there’s nothing portable on the horizon yet.
Having unit tests is fine, but if you want to see how certain functions work together – which function calls which, whether functions need input from several others, etc. – that’s still in its early days. You’d need to do a lot of the work yourself, since there isn’t much tooling to help nowadays.
As a bit of inspiration: Do you have any suggestion for technologies or projects that teams with a taste for experiments should look into?
Especially if you’re already using a containerised setup, I’d suggest having a look at service meshes like Istio or Conduit. They essentially allow you to outsource things like traffic management between services, failure injection and observability to a sidecar container – rather than handling them on a case-by-case basis – and build a control plane that works across the entire cluster. This lets application developers focus on their own code without having to worry about exposing metrics to monitoring and the like.
Another thing that’s worth a look are data meshes such as Dotmesh. They provide you with a cross-platform snapshot of your data which can be used in different scenarios such as debugging databases, for data science or the versioning of data sets.
Do you have any favourite open source projects you’d like to share?
My favourite open source project is certainly Kubernetes, and I think it will stay that way for quite some time. But there are a lot of tools in the container and Serverless space that are super-interesting. I personally also contribute to the data mesh project Dotmesh because I see a lot of potential there. Depending on where you look there are a lot of interesting, sometimes also smaller, tools and projects. If you’re interested in containers, the Cloud Native Computing Foundation with its projects is generally a good starting place. Most of them are looking for contributors, if you’d like to get a foot in the door there.
And while we’re on the topic of sharing, were there any talks that left you inspired in 2018, something you think others could profit from as well?
One speaker that stands out to me is Sarah Wells from the Financial Times (FT). I saw her at KubeCon Europe, where she essentially walked us through the journey that the FT had in terms of containers, their lessons learned, how they handled the versioning and upgrading and also talked about culture.
These end-user stories are very important for techies like me, since we tend to focus on features and how cool this and that is, but we don’t very often take into account that someone who deploys something into production and who is on a timeline to meet certain goals isn’t always necessarily interested in features but more in the way a setup solves a certain problem across different teams and platforms. So her talk gave a very useful insight. I think we definitely need more in that space.
Thank you for taking the time to sit down with us.