Effortless Scalability: Orchestrating Large Language Model Inference with Kubernetes
In the dynamic landscape of AI/ML, deploying and orchestrating large open-source inference models on Kubernetes has become paramount. This talk delves into the intricacies of automating the deployment of heavyweight models like Falcon and Llama 2, leveraging Kubernetes Custom Resource Definitions (CRDs) to manage large model files seamlessly through container images. The deployment is streamlined…
Building Meta’s GenAI Infrastructure
Meta has announced two 24k-GPU clusters, marking a significant investment in its AI future. The post shares details on hardware, network, storage, design, performance, and software, all crucial for high throughput and reliability across AI workloads. These clusters, used for Llama 3 training, underscore Meta’s strong commitment to open compute and open source, built atop platforms like…
CNCF Survey Reveals: Half of Companies Overspend with Kubernetes, Primarily Due to Overprovisioning
CNCF’s recent microsurvey on cloud-native FinOps and CFM revealed insights into Kubernetes’ impact on cloud spending. Nearly half of respondents reported increased costs, while others saw no change or savings post-migration. The main factors contributing to overspending include overprovisioning, lack of awareness, and resource sprawl. The survey sought details on Kubernetes spending and overall cloud…
KrakenD CE v2.6 released with OpenTelemetry
KrakenD Community Edition v2.6 is out. Enhancements include improved observability, plugin development tools, and JWT validation, with OpenTelemetry integration as the headline change. Developers also get a new testing command for plugins and access to additional data for more creative plugin development. Introducing OpenTelemetry: KrakenD has relied on OpenCensus or its native component for…
Kubernetes Load Balancer – On-Premises & Bare Metal
Managing a Kubernetes Load Balancer has always been challenging. In the past, setting it up involved a time-consuming process of submitting tickets and collaborating between network and Linux engineers. Now, with cloud technology, on-demand load balancer services are available instantly for Kubernetes, simplifying tasks for DevOps engineers. However, replicating this ease of use in on-premises…
Concerned about Serverless Lock-in? Consider Patterns!
Design patterns have been enhancing software design for years. In the cloud, they can also cut switching costs. It’s like magic! Design patterns are essential for software design, offering developers a technology-agnostic framework to address common challenges and trade-offs. They can be implemented using standard language constructs or integrated into platforms. Despite fluctuations in interest…
Introducing ingress2gateway; Simplifying Upgrades to Gateway API
In the dynamic realm of Kubernetes, networking is crucial for effective service exposure. The Ingress API, familiar to many Kubernetes users, is key for managing external access to services within the cluster. Despite its usefulness, Ingress has limitations, which can become bottlenecks as applications grow in complexity and Kubernetes clusters face increased demands. Here are…
Booking.com Doubles Delivery Performance Using DORA Metrics and Micro Frontends
Booking.com’s fintech team improved both its backend and frontend, doubling delivery performance as measured by DORA metrics. The team also used Micro Frontends to split the monolithic frontend into separately deployable apps. In mid-2022, Booking.com formed a new engineering team to manage finance processes. This team inherited an architecture with a monolithic frontend (Perl/JavaScript with the Vue framework) and a Java…
PinCompute: Pinterest’s Kubernetes-Backed Platform for Versatile Computing Needs
Overview: Pinterest is enhancing its compute platform with PinCompute, a fully managed compute API designed for various use cases. Built on Kubernetes, PinCompute simplifies infrastructure management and embraces cloud-native principles. This article explores its architecture and impact on innovation and efficiency at Pinterest. Architecture: PinCompute is a regional Platform-as-a-Service (PaaS) on Kubernetes, with a host…
Railyard: Accelerating Machine Learning Model Training Using Kubernetes
Stripe uses machine learning in services like Radar and Billing, handling millions of daily predictions across diverse models trained with billions of data points. To simplify model training, they developed Railyard, an API and job manager on Kubernetes that lets teams train models independently and at scale. Railyard’s API prioritizes flexibility and ease of use, supporting Python workflows…
