Airbnb’s Dynamic Kubernetes Cluster Scaling Journey

Introduction

At Airbnb, adjusting cloud spending to match demand is crucial. Traffic fluctuates daily, so cloud resources need to scale up and down accordingly. To achieve this, Airbnb uses Kubernetes, a container orchestration system, along with OneTouch, a service configuration tool.

In this post, we’ll discuss how Airbnb dynamically sizes its clusters using the Kubernetes Cluster Autoscaler, and highlight the improvements it has contributed back to the community.

Airbnb has moved most online services from manual setups to Kubernetes over the past few years. They now operate thousands of nodes across nearly a hundred clusters. This transition happened gradually, with their Kubernetes setup evolving through three stages.

Stage 1: Homogenous Clusters, Manual Scaling

Before adopting Kubernetes, each service instance ran on its own machine and was manually scaled to handle traffic spikes. Capacity management varied from team to team, and capacity was rarely deallocated once demand dropped.

Initially, Airbnb’s Kubernetes setup was basic: a few clusters, each with a single node type and configuration, running only stateless online services. As services moved to Kubernetes, containerized services ran in a multi-tenant environment, reducing wasted resources. Capacity management was centralized at the Kubernetes control plane, streamlining operations. Clusters were still scaled manually during this phase, but it was a significant improvement over the previous approach.

Figure 1: EC2 Nodes vs Kubernetes Nodes

Stage 2: Multiple Cluster Types, Independently Autoscaled

In the second stage of cluster configuration, more diverse workload types, each with different requirements, sought to run on Kubernetes. To accommodate them, Airbnb introduced a cluster type abstraction, ensuring uniform configurations across clusters of the same type.

The proliferation of cluster types resulted in an unsustainable manual capacity management approach. To resolve this, Airbnb integrated Kubernetes Cluster Autoscaler into each cluster. This component dynamically adjusts cluster sizes based on pod requests, launching new nodes when capacity is exceeded and removing underutilized nodes. Implementing this solution significantly improved efficiency, saving approximately 5% of their total cloud expenditure and eliminating the operational burden of manual scaling.

A “cluster type” defines the underlying configuration for a cluster, meaning that all clusters of a given type are identical, from node type down to the settings of individual cluster components.

Figure 2: Kubernetes Cluster Types

Stage 3: Heterogeneous Clusters, Autoscaled

As Airbnb transitioned almost all online computing to Kubernetes, the number of cluster types surpassed 30, spread across more than 100 clusters. Managing this expansion became cumbersome, particularly during cluster upgrades, which had to be tested on each cluster type individually.

In the third phase, Airbnb aimed to streamline cluster management by creating heterogeneous clusters capable of accommodating diverse workloads under a single Kubernetes control plane. This consolidation significantly reduced overhead by minimizing the number of configurations to test. Moreover, with the majority of computing now on Kubernetes, efficiency in each cluster became crucial for cost reduction.

Figure 3: A heterogeneous Kubernetes cluster

Cluster Autoscaler Improvements

Custom gRPC Expander

Airbnb’s key enhancement to Cluster Autoscaler was a new method for deciding which node groups to scale. Internally, Cluster Autoscaler maintains a list of node groups; when pods are pending, it simulates scheduling them against each node group and filters out the groups that cannot satisfy the pods’ scheduling requirements. The remaining node groups are then passed to an Expander component, which applies a further round of filtering based on operational requirements.
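
Conceptually, the Expander is the pluggable decision point at the end of that pipeline. A minimal sketch of the contract in Go, simplified for illustration (the upstream interface uses different names and carries extra scheduling context):

package expander

// Option describes one node group that survived the scheduling
// simulation and could satisfy the pending pods. This is a simplified
// stand-in for the real cluster-autoscaler type.
type Option struct {
	NodeGroupID string // which node group would be grown
	NodeCount   int    // how many nodes would be added
}

// Strategy picks the best option(s) among the candidates. Built-in
// strategies include random, least-waste, and priority; Airbnb's
// custom gRPC expander plugs in at this seam.
type Strategy interface {
	BestOptions(options []Option) []Option
}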

Figure 4: Cluster Autoscaler and Expander

As Airbnb moved to heterogeneous clusters, they found that the default Expanders could not express their business requirements around cost and instance type selection. For example, the default priority expander only lets users specify distinct tiers of node groups, leaving no way to express a weighted priority strategy.

To address these limitations and operational concerns, Airbnb set out requirements for a new Expander type in Cluster Autoscaler: it should be extensible and usable by others, deployable independently of Cluster Autoscaler, and compatible with the Kubernetes Cluster Autoscaler ecosystem. This led to the design of a pluggable “custom Expander” built on gRPC, consisting of two components.

The first component, integrated into Cluster Autoscaler, is a gRPC client: it transforms information about the valid node groups into the protobuf schema below and calls the gRPC server, which returns the final list of options for scaling up the cluster. This design allows for more flexibility in responding to changing business needs while contributing back to the Kubernetes Cluster Autoscaler ecosystem.

service Expander {
  rpc BestOptions (BestOptionsRequest) returns (BestOptionsResponse) {}
}

message BestOptionsRequest {
  // Node groups that passed the scheduling simulation
  repeated Option options = 1;
  // Node information per node group, keyed by node group ID
  map<string, k8s.io.api.core.v1.Node> nodeInfoMap = 2;
}

message BestOptionsResponse {
  // Final list of node groups to scale up
  repeated Option options = 1;
}

message Option {
  // ID of node to uniquely identify the nodeGroup
  string nodeGroupId = 1;
  // Number of nodes to add
  int32 nodeCount = 2;
  // Human-readable debug information
  string debug = 3;
  // Pending pods that this option would help schedule
  repeated k8s.io.api.core.v1.Pod pod = 4;
}

The second component, the gRPC server, is designed to be user-written and run as a separate application or service. It facilitates running complex expansion logic when selecting which node group to scale up, based on information passed from the client. Currently, the protobuf messages exchanged over gRPC are slightly modified versions of those passed to the Expander in Cluster Autoscaler.
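
As an illustration, a skeleton of such a server in Go might look like the following. The generated-code import path is a placeholder, and the selection policy is deliberately trivial; this is a sketch of the shape of the server, not Airbnb’s implementation:

package main

import (
	"context"
	"log"
	"net"

	"google.golang.org/grpc"

	// Hypothetical import path for code generated from the schema above.
	pb "example.com/expander/protos"
)

// expanderServer implements the Expander service from the schema above.
type expanderServer struct {
	pb.UnimplementedExpanderServer
}

// BestOptions receives the node-group options that passed Cluster
// Autoscaler's scheduling simulation and returns the subset to scale up.
func (s *expanderServer) BestOptions(ctx context.Context, req *pb.BestOptionsRequest) (*pb.BestOptionsResponse, error) {
	if len(req.Options) == 0 {
		return &pb.BestOptionsResponse{}, nil
	}
	// Placeholder policy: return the first option. Real logic would
	// weigh cost, instance types, priority tiers, and so on.
	return &pb.BestOptionsResponse{Options: req.Options[:1]}, nil
}

func main() {
	lis, err := net.Listen("tcp", ":50051")
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}
	s := grpc.NewServer()
	pb.RegisterExpanderServer(s, &expanderServer{})
	log.Fatal(s.Serve(lis))
}

Because the server is a standalone process, its policy can be changed and redeployed without rebuilding or restarting Cluster Autoscaler itself.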

For instance, a weighted random priority expander can be implemented by having the server read a priority tier list and weighted percentages from a configmap and make selections accordingly. This setup lets users tailor the expansion logic to their specific needs.
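
A hedged sketch of that weighted random selection in Go (the tier structure, the helper signature, and how weights are sourced are all illustrative, not Airbnb’s actual configuration):

package expander

import "math/rand"

// Tier pairs a node-group ID prefix with a selection weight; in practice
// both would be read from a configmap.
type Tier struct {
	Prefix string  // node groups whose ID starts with this prefix
	Weight float64 // relative chance of picking this tier
}

// PickTier draws a tier at random, proportionally to its weight, from
// the tiers that actually have at least one viable scale-up option.
func PickTier(tiers []Tier, hasOption func(prefix string) bool) *Tier {
	var candidates []Tier
	var total float64
	for _, t := range tiers {
		if hasOption(t.Prefix) {
			candidates = append(candidates, t)
			total += t.Weight
		}
	}
	if len(candidates) == 0 || total <= 0 {
		return nil
	}
	r := rand.Float64() * total
	for i := range candidates {
		if r < candidates[i].Weight {
			return &candidates[i]
		}
		r -= candidates[i].Weight
	}
	return &candidates[len(candidates)-1] // guard against float rounding
}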

Figure 5: Cluster Autoscaler and Custom gRPC Expander

The implementation also includes a failsafe: multiple Expanders can be passed as arguments to Cluster Autoscaler, so that if the gRPC server fails or becomes unreachable, Cluster Autoscaler can still expand the cluster using a fallback Expander.

Operating as a separate application, the expansion logic can be developed independently of Cluster Autoscaler. Moreover, the customizable gRPC server allows users to tailor it to their specific needs, making the solution extensible and beneficial to the wider community.

Internally, Airbnb has successfully utilized this new solution to scale all clusters since the beginning of 2022. It enables dynamic expansion of node groups to meet business needs, fulfilling the initial goal of developing an extensible custom expander.

Additionally, the custom Expander has been accepted into the upstream Cluster Autoscaler and will be available in the v1.24.0 release.

Other Autoscaler Improvements

During the transition to diverse Kubernetes clusters, several improvements were identified for Cluster Autoscaler:

  • Early abort for AWS ASGs with no capacity: This enhancement lets Cluster Autoscaler find out quickly when an AWS Auto Scaling Group has no available capacity, so it can fall back to scaling a different node group instead of leaving users waiting through a long timeout.
  • Caching AWS launch templates: Storing AWS ASG launch templates in a cache reduces the number of calls to the AWS API, preventing throttling when scanning large numbers of empty ASGs (a minimal sketch of the idea follows this list).
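
The caching idea, sketched minimally in Go; the type names and the fetch hook are assumptions for illustration, not the actual Cluster Autoscaler code:

package aws

import "sync"

// LaunchTemplate holds the fields Cluster Autoscaler needs from an ASG
// launch template (simplified for illustration).
type LaunchTemplate struct {
	Name         string
	InstanceType string
}

// templateCache memoizes launch templates by name so that scanning many
// (possibly empty) ASGs does not issue one AWS API call per scan.
type templateCache struct {
	mu    sync.Mutex
	cache map[string]LaunchTemplate
	fetch func(name string) (LaunchTemplate, error) // e.g. wraps an EC2 describe call
}

// Get returns the cached template, fetching it from AWS only on a miss.
func (c *templateCache) Get(name string) (LaunchTemplate, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if lt, ok := c.cache[name]; ok {
		return lt, nil // cache hit: no API request
	}
	lt, err := c.fetch(name)
	if err != nil {
		return LaunchTemplate{}, err
	}
	c.cache[name] = lt
	return lt, nil
}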

Conclusion

In the last four years, Airbnb has made significant progress in their Kubernetes Cluster setup. Centralizing most of their compute resources on a single platform has enhanced efficiency. Now, they’re prioritizing a more standardized cluster setup, emphasizing scalability and consistency.

Through the implementation of a refined expander in Cluster Autoscaler and addressing minor issues, Airbnb has successfully tailored their scaling strategy to align with business needs, focusing on cost efficiency and diverse instance types. Additionally, they’ve contributed valuable features to the community, enriching the Kubernetes ecosystem.

Reference

https://medium.com/airbnb-engineering/dynamic-kubernetes-cluster-scaling-at-airbnb-d79ae3afa132
