PinCompute: Pinterest’s Kubernetes-Backed Platform for Versatile Computing Needs

Overview

Pinterest is enhancing its compute platform with PinCompute, a fully managed compute API designed for various use cases. Built on Kubernetes, PinCompute simplifies infrastructure management and embraces cloud-native principles. This article explores its architecture and impact on innovation and efficiency at Pinterest.

Architecture

PinCompute is a regional Platform-as-a-Service (PaaS) built on Kubernetes. A host cluster runs the control plane and tracks workloads, while multiple zonal member clusters execute them. Member clusters align with cloud provider-defined failure domains for fault isolation. All clusters share a standard setup that supports diverse workloads and hardware selections. The platform is multi-tenant: it hosts different workload types securely while keeping them isolated from one another.

Users interact with Compute APIs to manage workloads on the platform. Custom Resources define supported workload types, accommodating batch jobs and long-running services. Workload data is stored in the host cluster’s Kubernetes API upon submission. The federation control plane oversees regional tasks like quota enforcement and workload distribution. Workload segments are distributed to member clusters for execution, managed by a mix of in-house and open source operators. The federation control plane aggregates execution statuses, accessible via PinCompute APIs.

PinCompute Primitives

PinCompute introduces three foundational primitives tailored to accommodate a wide array of workloads at Pinterest: PinPod, PinApp, and PinScaler.

PinPod: Enhances Kubernetes Pods with Pinterest-specific features such as per-container updates, managed sidecars, and failover mechanisms, which streamlines application deployment and maintenance.
PinApp: Manages long-running applications. It builds on PinPod and adds built-in orchestration for distributed applications, covering deployment and scaling.
PinScaler: Scales applications automatically based on configurable metrics and integrates with Pinterest’s metrics dashboard, so applications can handle varying load.

Together, these primitives streamline workload management and improve scalability across Pinterest’s infrastructure, letting teams deploy, manage, and scale their applications with less operational effort.
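To make metric-based scaling concrete, here is a minimal sketch of a proportional scaling rule. The source does not describe PinScaler's algorithm; this example borrows the formula used by the Kubernetes Horizontal Pod Autoscaler, and the function name, parameters, and defaults are assumptions for illustration only.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=100, tolerance=0.1):
    """Hypothetical scaling rule, modeled on the Kubernetes HPA formula:
    desired = ceil(current_replicas * current_metric / target_metric).
    """
    if current_replicas <= 0 or target_metric <= 0:
        raise ValueError("current_replicas and target_metric must be positive")

    ratio = current_metric / target_metric

    # Skip scaling when the metric is close enough to the target,
    # which avoids flapping on small fluctuations.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)

    # Clamp the result to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, four replicas averaging 90% of some metric against a 60% target would be scaled to six under this rule.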

Beyond PinPod, PinApp, and PinScaler, PinCompute provides further primitives so that it can support a range of workloads:

General purpose compute and service deployment: PinApp and PinScaler enable rapid deployment and scaling of stateless services, while PinPod supports versatile computing tasks like Jupyter Notebook for Pinterest developers.
Run-to-finish jobs: PinterestJobSet, PinterestTrainingJob, and PinterestCronJob handle parallel processes, distributed training, and scheduled tasks using frameworks like TFJob and PyTorchJob.
Infrastructure services: PinterestDaemon and PinterestSideCar support different deployment modes, ensuring efficient resource utilization for infrastructure services.

These primitives empower Pinterest developers to focus on business logic while streamlining service delivery and operational tasks.

Accessing PinCompute

Users access PinCompute primitives through PinCompute’s Platform Interfaces, which include APIs, client layers, and supporting services.

PinCompute API

The PinCompute API is the gateway through which users interact with the platform. It offers three categories of APIs: workload, operation, and insight. Workload APIs perform CRUD actions on compute workloads. Operation APIs support debugging and troubleshooting through features like log streaming and container shell access. Insight APIs expose runtime information, such as application state changes and internal system events, so users can monitor their workloads effectively.

Why PinCompute API

PinCompute API, built on Kubernetes APIs, offers streamlined access to the platform across multiple clusters, optimizes Kubernetes API calls with caching, and ensures a consistent user experience across backend services.

Integrating With Pinterest Infrastructure

This layer integrates Pinterest’s infrastructure capabilities to simplify Kubernetes API usage and provide a stable interface. PinCompute API applies rate limiting for reliability and uses Pinterest’s security primitives for authentication, authorization, and auditing, which supports security and compliance.
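The source does not say how PinCompute API implements rate limiting. As one common approach, a token bucket can be sketched as follows; the class name, parameters, and the injectable clock are all assumptions made for illustration.

```python
import time


class TokenBucket:
    """Hypothetical token-bucket limiter: tokens refill at a steady rate,
    and each request spends tokens until the bucket is empty."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec      # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.clock = clock            # injectable so the limiter is testable
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self, cost=1.0):
        """Return True if the request may proceed, False if it is throttled."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A gateway would typically keep one bucket per caller identity and reject or delay a request when `allow()` returns False.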

Enhanced API Semantics

PinCompute API enhances the Kubernetes data model, simplifying it for Pinterest developers. This streamlining reduces the learning curve and improves data efficiency, cutting API call data size by up to 50%. The APIs are designed to be descriptive and intuitive for actions like pause, stop, and restart-container. PinCompute offers OpenAPI documentation, auto-generated clients, and SDKs for self-service application development.

PinCompute SDK

An SDK for clients is being developed to streamline access to PinCompute. It encapsulates best practices such as error handling, retry with backoff, logging, and metrics, so that they are applied consistently across clients. SDKs are versioned and come with clear development guidance, and the team engages with users to promote adoption of the latest versions.
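Retry with backoff is the kind of practice an SDK can centralize. The sketch below shows exponential backoff with full jitter; it is not the PinCompute SDK, and `TransientError`, `call_with_retry`, and every parameter are hypothetical names chosen for this example.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry."""


def call_with_retry(fn, max_attempts=4, base_delay=0.1, max_delay=2.0,
                    sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on TransientError with exponential backoff.

    The delay ceiling doubles after each failure and is capped at max_delay.
    The actual wait is a random fraction of that ceiling (full jitter),
    which spreads out retries from many clients.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(rng() * ceiling)
```

The `sleep` and `rng` parameters exist only so the behavior can be exercised deterministically; callers would normally leave them at their defaults.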

Managing Resources in PinCompute

Resource Model

PinCompute offers three resource tiers, Reserved, OnDemand, and Preemptible, each with its own quota and provisioning strategy. Reserved resources come from fixed-size pools, OnDemand resources are drawn from shared capacity, and the upcoming Preemptible tier will make opportunistic use of otherwise unused capacity. Clusters also hold buffer capacity to absorb workload fluctuations.
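One way to picture the three tiers is as separate admission checks. The following is a simplified sketch under stated assumptions: quotas are tracked in CPU units only, Reserved quota is per project, OnDemand is a single shared pool, and Preemptible is admitted only against currently idle capacity. None of these names or rules come from the source.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    RESERVED = "reserved"
    ON_DEMAND = "on_demand"
    PREEMPTIBLE = "preemptible"


@dataclass
class Quota:
    limit_cpu: float
    used_cpu: float = 0.0

    def try_charge(self, cpu):
        """Charge the request against this quota if it fits."""
        if self.used_cpu + cpu > self.limit_cpu:
            return False
        self.used_cpu += cpu
        return True


@dataclass
class QuotaBook:
    reserved: dict      # project name -> Quota (fixed-size pool per project)
    on_demand: Quota    # one pool shared by all projects

    def admit(self, project, tier, cpu, idle_cpu=0.0):
        """Hypothetical admission rule for a workload requesting `cpu`."""
        if tier is Tier.RESERVED:
            quota = self.reserved.get(project)
            return quota is not None and quota.try_charge(cpu)
        if tier is Tier.ON_DEMAND:
            return self.on_demand.try_charge(cpu)
        # Preemptible: runs only on capacity that is idle right now
        # and is not charged to any quota in this sketch.
        return cpu <= idle_cpu
```

In this model a project without a Reserved pool can still run on the shared OnDemand pool until that pool is exhausted.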

Scheduling Architecture

PinCompute employs a dual-layered scheduling approach for efficient workload management. At the cluster level, clusters are selected based on filters and scoring mechanisms. At the node level, workloads are assigned to individual nodes within clusters using Kubernetes’s scheduler framework augmented with proprietary plugins. This ensures precise workload placement and resource optimization across PinCompute infrastructure.
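The cluster-level step, filtering and then scoring, can be sketched as below. The filters (free CPU, hardware type, allowed zones) and the score (remaining headroom after placement) are assumptions picked for illustration; the source does not list the actual criteria, and node-level placement is left to the Kubernetes scheduler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    zone: str
    free_cpu: float
    total_cpu: float
    hardware: frozenset   # hardware types offered, e.g. {"cpu", "gpu"}


def pick_cluster(clusters, cpu, hardware, allowed_zones=None):
    """Hypothetical cluster-level scheduler: filter, score, pick the best."""
    # Filter: drop clusters that cannot run the workload at all.
    feasible = [
        c for c in clusters
        if c.free_cpu >= cpu
        and hardware in c.hardware
        and (allowed_zones is None or c.zone in allowed_zones)
    ]
    if not feasible:
        return None

    # Score: fraction of the cluster still free after placement.
    def score(c):
        return (c.free_cpu - cpu) / c.total_cpu

    # Highest score wins; ties are broken by name for determinism.
    return sorted(feasible, key=lambda c: (-score(c), c.name))[0]
```

Preferring the cluster with the most headroom spreads load across zones; a bin-packing policy would invert the score instead.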

PinCompute Cost Efficiency

PinCompute prioritizes cost efficiency while maintaining user experience. It promotes multi-tenancy, streamlines workload submission, and moves GPU usage from costly options to more economical ones. Together, these efforts optimize resource usage, prevent oversubscription, and reduce GPU costs, as illustrated by a diagram in the original article, supporting business growth.

Link to the Article

https://medium.com/pinterest-engineering/pincompute-a-kubernetes-backed-general-purpose-compute-platform-for-pinterest-8ad408df2d6f