Engineering Blog

Kubernetes Load Balancer – On-Premises & Bare Metal

Managing a Kubernetes load balancer has always been challenging. In the past, setting one up meant a time-consuming process of submitting tickets and coordinating between network and Linux engineers. In the cloud, on-demand load balancer services are now available instantly for Kubernetes, simplifying the work of DevOps engineers. However, replicating this ease of use in on-premises…
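As a quick aside (not from the post itself), here is a minimal sketch of what the cloud makes effortless: requesting a load balancer by creating a Service of type LoadBalancer with the official Kubernetes Python client. On bare metal the same object simply sits in Pending until an implementation such as MetalLB is installed to hand out an external IP. The names used here (demo-lb, the app: demo selector, the default namespace) are illustrative.

```python
# Hypothetical sketch: request a cloud-style load balancer from Kubernetes.
# On bare metal this Service stays "Pending" unless an implementation such
# as MetalLB is installed to allocate an external IP.
from kubernetes import client, config

config.load_kube_config()  # uses the current kubectl context
v1 = client.CoreV1Api()

service = client.V1Service(
    metadata=client.V1ObjectMeta(name="demo-lb"),
    spec=client.V1ServiceSpec(
        type="LoadBalancer",
        selector={"app": "demo"},
        ports=[client.V1ServicePort(port=80, target_port=8080)],
    ),
)
v1.create_namespaced_service(namespace="default", body=service)

# An external address appears in the status on cloud providers;
# it remains empty on bare metal until a load balancer implementation responds.
status = v1.read_namespaced_service(name="demo-lb", namespace="default").status
print(status.load_balancer.ingress)
```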

Introducing ingress2gateway; Simplifying Upgrades to Gateway API

In Kubernetes, networking is crucial for exposing services effectively. The Ingress API, familiar to many Kubernetes users, is key to managing external access to services within the cluster. Despite its usefulness, Ingress has limitations that can become bottlenecks as applications grow in complexity and clusters face increased demands. Here are…
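To make the Ingress-to-Gateway mapping concrete, here is a rough sketch (not from the post, and not the ingress2gateway tool itself) of how a simple host/path Ingress rule corresponds to a Gateway API HTTPRoute, written as plain Python dicts. The resource names (web, example-gateway, cart-svc) are made up for the example.

```python
# Illustrative only: map one simple Ingress rule onto a Gateway API HTTPRoute.
ingress_rule = {
    "host": "shop.example.com",
    "path": "/cart",
    "service": "cart-svc",
    "port": 8080,
}

http_route = {
    "apiVersion": "gateway.networking.k8s.io/v1",
    "kind": "HTTPRoute",
    "metadata": {"name": "web"},
    "spec": {
        # parentRefs plays the role of the ingress class: the route attaches to a Gateway.
        "parentRefs": [{"name": "example-gateway"}],
        "hostnames": [ingress_rule["host"]],
        "rules": [
            {
                "matches": [{"path": {"type": "PathPrefix", "value": ingress_rule["path"]}}],
                "backendRefs": [{"name": ingress_rule["service"], "port": ingress_rule["port"]}],
            }
        ],
    },
}
```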

PinCompute: Pinterest’s Kubernetes-Backed Platform for Versatile Computing Needs

Pinterest is enhancing its compute platform with PinCompute, a fully managed compute API designed for various use cases. Built on Kubernetes, PinCompute simplifies infrastructure management and embraces cloud-native principles. This article explores its architecture and impact on innovation and efficiency at Pinterest. Architecture: PinCompute is a regional Platform-as-a-Service (PaaS) on Kubernetes, with a host…

Railyard: Accelerating Machine Learning Model Training Using Kubernetes

Stripe leverages machine learning in services like Radar and Billing, handling millions of daily predictions across diverse models trained on billions of data points. To simplify model training, they developed Railyard, an API and job manager on Kubernetes that lets teams train models independently and at scale. Railyard’s API prioritizes flexibility and ease of use, supporting Python workflows…

Does containerization affect the performance of databases?

The trend of containerizing databases is growing. With databases and analytics playing a significant role in technology, a common question arises: does containerization affect database performance? If so, what factors are involved, and how can we address the performance and stability challenges it introduces? Advantages and technical principles of containerization: Containerization…

Rubix: Palantir’s Move to Kubernetes

In January 2017, Palantir commenced the Rubix project, aimed at rebuilding the cloud architecture around Kubernetes. With the majority of cloud instances dedicated to computation, the core objective was to establish a secure, scalable, and intelligent scheduling and execution engine for Spark and other distributed compute frameworks. Rubix has now been successfully rolled out to…

Airbnb’s Dynamic Kubernetes Cluster Scaling Journey

At Airbnb, it’s crucial to adjust cloud spending according to demand. Their traffic varies daily, so they need their cloud resources to scale up and down as needed. To achieve this, they use Kubernetes, a system for managing containers, along with OneTouch, a tool for configuring services. In this post, we’ll discuss how they…

Troubleshooting Kubernetes Secrets and Identifying Reasons for Pod Startup Failure

Amazon Web Services (AWS) provides AWS Secrets Manager, a service that makes it easier to securely store and manage the sensitive data used in modern applications, such as database credentials and API keys. By allowing developers to centralize storage and automate rotation, it simplifies the handling of sensitive data. Many people could…
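As a hedged illustration (not taken from the post), fetching a credential from Secrets Manager with boto3 looks roughly like this; the secret name and region are hypothetical.

```python
# Rough sketch: read a secret from AWS Secrets Manager with boto3.
# "prod/db-credentials" is a hypothetical secret name.
import json
import boto3

client = boto3.client("secretsmanager", region_name="us-east-1")
response = client.get_secret_value(SecretId="prod/db-credentials")

# Secrets come back as a string (often JSON) or as binary.
credentials = json.loads(response["SecretString"])
print(credentials["username"])
```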

Scaling Kubernetes to 7,500 nodes

Scaling a Kubernetes cluster to this magnitude (7,500 nodes) is a rare feat that demands careful consideration, but it offers the benefit of a simple infrastructure, empowering OpenAI’s machine learning teams to scale rapidly without altering their code. Following its previous update on scaling to 2,500 nodes, OpenAI has further developed its infrastructure, imparting valuable lessons…

Connecting Kernel Panics to Kubernetes Pods: Keeping Track of Lost Nodes at Netflix

With a dedicated effort to enhance user experience on the Titus container platform, Netflix delved into the issue of “orphaned” pods – those that end up without a clear final status. Although this may not be a concern for Service job owners, it holds significant importance for Batch users. This blog post provides insights into our…