Engineering Blog

                            

AWS Lambda Under the Hood

Mike Danilov covers how Lambda is built and how its architecture had to be modified to support 10 GiB payloads. AWS Lambda Overview: Lambda is a serverless computing system where you can run your code on demand without managing servers. It supports various programming languages, scales rapidly to match demand, and is used by millions of users…

Generative AI Tools for Infrastructure as Code

Learn how to use generative AI to generate, interpret and debug code and accelerate your workflows. Infrastructure as Code (IaC) aids engineers in managing data, applications, and infrastructure in dynamic IT environments. Through a GitOps-driven approach, it ensures standardization, security, and operational consistency across various environments. Generative artificial intelligence (AI) has emerged as a game-changer…

Cilium and eBPF, with Bill Mulligan

This episode's guest is Bill Mulligan, a Community Pollinator at Isovalent who works on Cilium and eBPF. We learned how to pronounce Isovalent properly and what the name actually means. We also spoke in depth about eBPF, Cilium, network functions in Kubernetes, and more. Link to the podcast: https://kubernetespodcast.com/episode/217-cilium-ebpf/

Rubix: Palantir’s Move to Kubernetes

In January 2017, Palantir commenced the Rubix project, aimed at rebuilding the cloud architecture around Kubernetes. With the majority of cloud instances dedicated to computation, the core objective was to establish a secure, scalable, and intelligent scheduling and execution engine for Spark and other distributed compute frameworks. Rubix has now been successfully rolled out to…

Airbnb’s Dynamic Kubernetes Cluster Scaling Journey

Introduction: At Airbnb, it’s crucial to adjust cloud spending according to demand. Their traffic varies daily, so they need their cloud resources to scale up and down as needed. To achieve this, they use Kubernetes, a system for managing containers, along with OneTouch, a tool for configuring services. In this post, we’ll discuss how they…

Troubleshooting Kubernetes Secrets and Identifying Reasons for Pod Startup Failure

Amazon Web Services (AWS) provides a tool called AWS Secrets Manager that makes it easier to securely store and manage sensitive data, such as database credentials and API keys, that are utilized in modern applications. By allowing developers to consolidate storage and automate rotation operations, it simplifies the handling of sensitive data. Many people could…

StrimziCon 2024: Learn & Connect with the Strimzi Community

StrimziCon, the first virtual conference dedicated to Strimzi and Apache Kafka, is coming this May! It is aimed at developers, DevOps engineers, and solution architects, and the linked article has details on how to participate. Share this with your network and join the Strimzi community! Link to the article: https://www.cncf.io/blog/2024/02/07/welcome-strimzicon-2024/

Improving and Assessing DevEx in Your Organization

Developer experience (DevEx), encompassing the entirety of the environment, tools, practices, and culture encountered by software developers in their daily tasks, significantly impacts your capacity to attract and retain top talent. Elements such as development environments, workflows, tools, processes, and overall work culture are pivotal not just in fostering developer satisfaction but also in enhancing…

A Simple Framework for Architectural Decisions

Software engineers face critical decisions like choosing between Python and Java for a microservice or determining the repository structure. Establishing a clear decision-making framework is crucial, defining team autonomy and aligning with business goals and culture. This framework influences leadership perception, managerial roles, and empowerment levels. While implementation varies, key components like Tech Radar, Technology…

Scaling Kubernetes to 7,500 nodes

Scaling a Kubernetes cluster to this magnitude (7,500 nodes) is a rare feat that demands careful consideration, but it offers the benefit of a simple infrastructure, empowering OpenAI’s machine learning teams to scale rapidly without altering their code. Following OpenAI’s previous update on scaling to 2,500 nodes, OpenAI has further developed its infrastructure, imparting valuable lessons…

Connecting Kernel Panics to Kubernetes Pods: Keeping Track of Lost Nodes at Netflix

With a dedicated effort to enhance user experience on the Titus container platform, Netflix delved into the issue of “orphaned” pods – those left incomplete without a clear final status. Although this may not be a concern for Service job owners, it holds significant importance for Batch users. This blog post provides insights into our…

Slack’s journey to reliable and scalable cron execution at scale

Slack started with the classic “one box, one crontab” approach for cron jobs. Initially, it worked fine, but as the platform grew, so did the number of scripts and their processing demands. This led to several issues. Building a Better Way: Introducing Chronos. Facing these challenges, Slack opted for a custom solution: Chronos. Here’s a…