Figma software engineering manager Ian VonSeggern has described the rationale and outcome of the company’s migration from AWS ECS (Elastic Container Service) to AWS EKS (Elastic Kubernetes Service). Figma runs a service for collaborative design and prototyping of web applications.
Although Kubernetes is often perceived as a platform for microservices, VonSeggern said that “we’re not a microservices company, and we don’t plan to become one.”
Why migrate from ECS? ECS is an orchestration service for containerized applications and is potentially simpler to manage than Kubernetes. According to AWS, “using Amazon ECS decreases the number of decisions customers must make around compute, network, and security configurations, without sacrificing scale or features.”
Figma, though, found that ECS was adding to rather than reducing complexity, particularly as the company introduced services that are expected to run on Kubernetes. One example was etcd, a distributed key-value store used for configuration data, which normally runs on Kubernetes and uses a Kubernetes feature called StatefulSets. Figma wanted the benefits of etcd while running on ECS, and for this the team created custom code that proved “fragile and hard to maintain.” Similarly, teams at Figma wanted to use Helm charts for deploying standard packages, but Helm charts also expect to find Kubernetes.
In other words, Figma found itself swimming against the stream by not using what has become an industry-standard platform. “Running on ECS meant we were missing out on all the open source technology in the Cloud Native Computing Foundation (CNCF) ecosystem,” said VonSeggern, as the team also looked enviously at projects including KEDA for autoscaling and Envoy as a service mesh.
Figma believes that Kubernetes and its ecosystem will receive “significantly more investment than Amazon will be able to put into ECS.”
Other factors mentioned are avoiding lock-in and the relative ease of hiring people with Kubernetes expertise compared with ECS expertise.
Figma took a year or so to migrate most of its services to EKS. The company chose to run three separate EKS clusters to increase resiliency. The main causes of failures, VonSeggern implies, are bugs and operator errors, and he says that the three-cluster setup has reduced the “blast radius” of such issues on multiple occasions. One such occasion was when “an operator accidentally performed an action which destroyed and recreated CoreDNS on one of our production clusters.”
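The “blast radius” idea can be illustrated with a minimal Python sketch. It assumes, hypothetically, that traffic is split evenly across clusters and that an incident takes down one whole cluster; the function name `blast_radius` and the even split are illustrative assumptions, not a description of how Figma actually routes traffic.

```python
from fractions import Fraction


def blast_radius(total_clusters: int, failed_clusters: int = 1) -> Fraction:
    """Return the fraction of traffic affected when some clusters fail.

    Assumption (hypothetical): traffic is spread evenly across all
    clusters, so each cluster carries 1/total_clusters of the load.
    """
    if total_clusters <= 0:
        raise ValueError("total_clusters must be positive")
    if not 0 <= failed_clusters <= total_clusters:
        raise ValueError("failed_clusters must be between 0 and total_clusters")
    return Fraction(failed_clusters, total_clusters)


# Compare a single cluster with a three-cluster layout when one cluster fails.
single = blast_radius(total_clusters=1)
triple = blast_radius(total_clusters=3)
print(f"one cluster:    {float(single):.0%} of traffic affected")
print(f"three clusters: {float(triple):.0%} of traffic affected")
```

Under these assumptions, an operator error confined to one of three clusters touches roughly a third of traffic rather than all of it.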
Can EKS be cheaper to run than ECS? VonSeggern does not give figures, but says that there are cost efficiencies with EKS. “For our ECS on EC2 services, we simply over-provisioned our services so we had enough machines to surge up during a deploy,” he wrote, whereas on EKS the open source Karpenter project is used to scale the number of nodes (the VMs used by EKS) dynamically. Figma has also focused on migrating its most expensive service to ARM-based Graviton processors, which are more efficient, though Graviton VMs are also available for ECS.
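The difference between over-provisioning and dynamic scaling can be shown with some rough arithmetic. The Python sketch below uses entirely hypothetical numbers (10 baseline nodes, 20 nodes during a deploy surge, two surge hours per day) and does not model how Karpenter actually makes decisions; it only compares node-hours when capacity is held at peak all day against capacity that rises only during the surge.

```python
def static_node_hours(peak_nodes: int, hours: int) -> int:
    """Node-hours when enough machines for the peak run the whole time."""
    return peak_nodes * hours


def dynamic_node_hours(baseline_nodes: int, peak_nodes: int,
                       surge_hours: int, hours: int) -> int:
    """Node-hours when nodes scale up to the peak only during the surge."""
    if not 0 <= surge_hours <= hours:
        raise ValueError("surge_hours must be between 0 and hours")
    return baseline_nodes * (hours - surge_hours) + peak_nodes * surge_hours


# Hypothetical figures, chosen only for illustration.
BASELINE_NODES = 10   # nodes needed in steady state
PEAK_NODES = 20       # nodes needed while a deploy is surging
SURGE_HOURS = 2       # hours per day spent deploying
HOURS_PER_DAY = 24

over_provisioned = static_node_hours(PEAK_NODES, HOURS_PER_DAY)
scaled = dynamic_node_hours(BASELINE_NODES, PEAK_NODES, SURGE_HOURS, HOURS_PER_DAY)
print(f"over-provisioned: {over_provisioned} node-hours per day")
print(f"dynamic scaling:  {scaled} node-hours per day")
```

With these made-up inputs the over-provisioned setup uses 480 node-hours per day against 260 for the dynamically scaled one; real savings depend on workload shape and how quickly nodes can be added and removed.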
Another example of a cost saving, made after rather than during the migration, was the introduction of Datadog’s open source Vector, a Kubernetes sidecar project built in Rust, for processing and forwarding logs. This was more streamlined than the previous approach, which wrote logs to the AWS CloudWatch service, then processed them on the serverless Lambda platform, and then wrote them to the third-party Datadog and Snowflake analytics services. “The intermediate storage on Cloudwatch was getting expensive,” said VonSeggern.