
Don’t Let Your Users Disconnect! Achieve True Zero-Downtime with Kubernetes

This article explains how to achieve true zero-downtime deployments in Kubernetes, focusing on the brief disruptions during rolling updates that can break client connections.

The Challenge: Rolling Updates and Downtime

While Kubernetes offers rolling updates for seamless application upgrades, these updates can still introduce brief downtime windows. This downtime, typically ranging from milliseconds to a few seconds, might be negligible for low-traffic applications, but for critical services like payment gateways, every millisecond counts.

Understanding Rolling Updates and Downtime Causes

  • Rolling Updates Explained: During a rolling update, Kubernetes replaces pods incrementally with newer versions: new pods are launched while older pods are shut down (a minimal Deployment strategy sketch follows this list).
  • Downtime During Pod Lifecycle: Downtime can occur during both the pod startup and shutdown phases, because traffic routing and pod state changes do not happen at exactly the same moment.
    • Pod Startup: Without a readiness probe, new pods might receive traffic before they’re fully initialized, leading to instability.
    • Pod Shutdown: When a pod is deleted (during a rolling update or manually), Kubernetes removes it from the Service’s endpoints. However, it takes some time for kube-proxy to update the network traffic rules (iptables) on every node, so traffic may still be routed to the terminating pod, resulting in connection errors.
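
As a point of reference, here is a minimal Deployment sketch (the name, image, and replica count are placeholders) showing the rolling update strategy that drives this incremental replacement; maxUnavailable: 0 keeps an old pod running until its replacement is ready:

YAML

        # Hypothetical Deployment excerpt illustrating the rolling update strategy
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: payment-gateway                 # placeholder name
        spec:
          replicas: 3
          strategy:
            type: RollingUpdate
            rollingUpdate:
              maxSurge: 1                       # start one extra pod at a time
              maxUnavailable: 0                 # never remove an old pod before a new one is ready
          selector:
            matchLabels:
              app: payment-gateway
          template:
            metadata:
              labels:
                app: payment-gateway
            spec:
              containers:
                - name: app
                  image: example/payment-gateway:v2   # placeholder image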

The Solution: Pre-Stop Hooks for Graceful Shutdowns

Since the core issue is the time gap between pod termination and the iptables updates, we can add a pre-stop hook to the deployment configuration. The kubelet runs this hook before sending SIGTERM to the container, delaying the shutdown for a specified duration. This delay gives the network rules time to be updated, ensuring new connections are directed to healthy pods.

Implementation Steps:

  1. Readiness Probes: Ensure your deployment configuration includes a readiness probe. This probe verifies a pod’s readiness before it receives traffic, preventing issues during startup (a minimal probe sketch follows the pre-stop example below).
  2. Pre-Stop Hook: Add a pre-stop hook to your deployment using the lifecycle.preStop.exec section. This hook defines a command to execute before pod termination. Here’s an example with a 20-second delay:

YAML

        # Placed inside spec.containers[*] of the Deployment
        lifecycle:
          preStop:
            exec:
              command: ["/bin/bash", "-c", "sleep 20"]
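
For step 1, a minimal readiness probe sketch that would sit alongside the lifecycle block in the container spec might look like the following (the /healthz path and port 8080 are placeholders for whatever health endpoint your application exposes):

YAML

        readinessProbe:
          httpGet:
            path: /healthz          # placeholder health endpoint
            port: 8080              # placeholder container port
          initialDelaySeconds: 5    # wait before the first check
          periodSeconds: 5          # re-check every 5 seconds
          failureThreshold: 3       # mark the pod unready after 3 consecutive failures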

Important Considerations:

  • Pre-Stop Hook Duration: Set the sleep duration in the pre-stop hook to a value lower than terminationGracePeriodSeconds (30 seconds by default). If the hook plus the application’s own shutdown takes longer than that, the kubelet terminates the container forcefully (a combined sketch follows this list).
  • Graceful Shutdown: While the pre-stop hook delays container termination, existing connections to the terminating pod can complete gracefully during the delay. New connections, however, are routed to healthy pods once the network rules are updated.
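
Putting both considerations together, a sketch of the relevant pod template spec fields (values are illustrative) shows the 20-second delay fitting well inside the default 30-second grace period:

YAML

        spec:
          terminationGracePeriodSeconds: 30     # default; countdown to forceful termination
          containers:
            - name: app
              image: example/payment-gateway:v2 # placeholder image
              readinessProbe:
                httpGet:
                  path: /healthz                # placeholder health endpoint
                  port: 8080
              lifecycle:
                preStop:
                  exec:
                    command: ["/bin/bash", "-c", "sleep 20"]   # keep the sleep below the grace period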

Conclusion

By incorporating readiness probes and pre-stop hooks, you can significantly reduce downtime during rolling deployments in Kubernetes. This ensures a smoother user experience even during frequent application updates.

Happy Automating!

Reference: kunmidevOpstories

Follow us for more updates!
