A “Krispr” Approach to Kubernetes Infrastructure: Keeping Pods Fresh and Rolling Out Updates Smoothly
Introduction In the demanding world of modern service-oriented architectures, maintaining fresh and up-to-date infrastructure is crucial for optimal performance and security. Airbnb, with its hundreds of services relying on Kubernetes, faced challenges in efficiently updating shared infrastructure components within their platform. Their existing approach, heavily dependent on service owner upgrades, led to version fragmentation, complexity,…

Scaling Kubernetes to 2,500 Nodes for Deep Learning at OpenAI
OpenAI, a pioneer in artificial intelligence, pushes the boundaries of Kubernetes by scaling it to manage massive deep learning workloads. While managing bare VMs remains an option for the largest tasks, Kubernetes shines for its rapid iteration cycles, reasonable scalability, and reduced development overhead. This blog dives into OpenAI’s journey building a 2,500-node Kubernetes cluster…