The trend of containerizing databases is growing, as seen in Fig. 1. Because databases and analytics play such a significant role in technology, a common question arises: does containerization affect database performance? If so, what factors are involved, and how can we address the performance and stability challenges that containerization causes?
Advantages and technical principles of containerization
Containerization is like putting your app and all its parts into a neat, self-contained box that you can easily move around. That box, the container, makes it much easier to package, deploy, and manage applications. Tools like Docker or containerd create and manage these containers.
Kubernetes (K8s) is the dominant platform for managing containers. It acts as an organizer that takes care of everything from placing containers where they need to go to making sure they have the resources they need to run smoothly. It handles deploying, scaling, managing, and scheduling containers so you can focus on building your app rather than on the infrastructure.
Advantages of containerization
- Flexibility and portability
- Resource isolation and scalability
- More user-friendly scheduling strategies
Technical principles and categories of containerization
Virtualization abstracts and isolates computing resources, allowing multiple virtual instances to run on one physical server through a software layer called a hypervisor. Containerization is a lightweight virtualization technique that creates isolated environments for applications using OS-level virtualization. Technologies include:
- Standard Containers (e.g., Docker/Containerd): Adhere to OCI standards, use runC runtime, ideal for K8s workloads.
- User-Space Kernel Containers (e.g., gVisor): Meet OCI standards, use runsc runtime for enhanced isolation and security, suitable for less demanding tasks.
- MicroVM Containers (e.g., Firecracker, Kata Containers): Run containers inside lightweight virtual machines managed by a hypervisor, provide balanced security, isolation, and performance.
- Virtual Machines (e.g., KVM, Xen, VMware): Foundational for cloud servers, operate at a more fundamental level than containers.
Mainstream containerization technologies complying with OCI standards include:
- runC: Uses Linux namespaces and cgroups for isolation, with minimal performance impact.
- Kata Containers: Offers secure compartments by merging virtual machine monitors with container runtimes, slightly slower than classic runtimes.
- gVisor: Enhances security by emulating the OS interfaces within the container, which may add syscall and I/O overhead.
- Firecracker: Uses micro-VMs for serverless computing, ensuring higher security and isolation but potentially higher overhead for syscalls and I/O operations.
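To make the namespace part of runC-style isolation concrete, here is a minimal sketch that lists the namespaces a process belongs to by reading the symlinks under `/proc/<pid>/ns` on Linux. The helper names `parse_ns_link` and `list_namespaces` are hypothetical and chosen for this illustration; on a system without `/proc` the function simply returns an empty dict.

```python
import os
import re
from typing import Dict, Tuple

# Namespace symlink targets look like "pid:[4026531836]".
NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target: str) -> Tuple[str, int]:
    """Split a namespace symlink target into (kind, inode number)."""
    match = NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group("kind"), int(match.group("inode"))


def list_namespaces(pid: str = "self", proc_root: str = "/proc") -> Dict[str, int]:
    """Return {entry name: namespace inode} for a process.

    Returns an empty dict when the ns directory cannot be read
    (for example on a non-Linux system).
    """
    ns_dir = os.path.join(proc_root, pid, "ns")
    result: Dict[str, int] = {}
    try:
        entries = sorted(os.listdir(ns_dir))
    except OSError:
        return result
    for name in entries:
        try:
            _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        except (OSError, ValueError):
            continue  # skip entries we cannot read or parse
        result[name] = inode
    return result
```

Two processes share a namespace when the inode numbers match. Running `list_namespaces()` inside a container and on the host should therefore show different inodes for the namespaces the runtime has isolated.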
Comparing the fundamentals
Table 1. Overview of virtualization and isolation implementations in containerization
– | Containerd-RunC | Kata-Container | gVisor | FireCracker-Containerd |
---|---|---|---|---|
Isolation Mechanisms | Namespace + Cgroup | Guest Kernel | Sandboxed Kernel | microVM |
OCI Runtime | RunC | Clear Container + runv | runsc | RunC |
Virtualization | Namespace | QEMU/Cloud Hypervisor+KVM | Rule-Based Execution | rust-VMM + KVM |
vCPU | Cgroup | Cgroup | Cgroup | Cgroup |
Memory | Cgroup | Cgroup | Cgroup | Cgroup |
Syscall | Host | Guest + Host | Sentry | Guest + Host |
Disk I/O | Host | virtio | Gofer | virtio |
Network I/O | Host + veth | tc + veth | netstack | tap + virtio-net |
How K8s and containerization impact databases
Containerization enhances databases by simplifying deployment and management, providing a consistent and isolated runtime environment. It enables easy deployment and flexible migration across diverse settings, along with standardized version control. Additionally, with Kubernetes support, database roles and components can be seamlessly integrated.
The challenges containerization presents to databases
Combining K8s and containerization poses significant challenges for databases due to their unique operational characteristics:
- Databases comprise multiple roles, each serving specific functions, necessitating precise management during operations like creation, restart, and backup. Managing data dependencies across containers remains unresolved.
- Databases require robust data persistence and consistency that containers alone do not provide, so production-level workloads often rely on additional components like CSI and PersistentVolume.
- Databases have diverse performance needs, spanning CPU, memory, network, and storage, varying greatly depending on the database type and query workload.
- Databases mandate stringent security measures for environment isolation, access control, and auditing to safeguard critical data.
Overall, running databases on containerized platforms like K8s presents formidable challenges for both databases and the container+K8s system. KubeBlocks offers comprehensive solutions to address these challenges. For more information, visit http://kubeblocks.io. Now, let’s delve into a detailed examination of how containerization impacts database performance.
How K8s and containerization affect database performance
Database performance hinges on CPU, memory, storage, and network. This section explores how Kubernetes (K8s) and containerization can impact database performance across these dimensions. K8s also offers scheduling strategies, but because they are not directly tied to containerization, they are outside the scope of this discussion.
The following sections dissect the performance of applications, particularly databases, concerning the factors mentioned. Drawing from industry research and recent tests, these sections uncover underlying causes and discrepancies in the data. Additionally, specific areas previously overlooked, such as the influence of K8s’ Container Network Interface (CNI) on network efficiency, are examined.
CPU
The experiments detailed in the research paper [1] show a negligible difference in CPU performance across the various container technologies. However, the CPU restrictions imposed by cgroups cause a modest performance dip of about 4%, which affects the utilization of containerized environments.
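To see what CPU restriction a cgroup actually imposes, one option is to read the cgroup v2 interface files `cpu.max` and `cpu.stat`. The sketch below only parses their text content, so it can be tried on any machine; the function names are hypothetical, and locating the files for a given container (typically somewhere under `/sys/fs/cgroup`) is left as an assumption about your environment.

```python
from typing import Dict, Optional


def parse_cpu_max(text: str) -> Optional[float]:
    """Return the CPU limit in cores from cgroup v2 cpu.max content.

    The file holds "<quota> <period>" in microseconds. A quota of "max"
    means no limit, in which case None is returned.
    """
    parts = text.split()
    quota = parts[0]
    # 100000 us is the kernel default period; used only if the field is missing.
    period = int(parts[1]) if len(parts) > 1 else 100000
    if quota == "max":
        return None
    return int(quota) / period


def parse_cpu_stat(text: str) -> Dict[str, int]:
    """Parse the "key value" lines of cgroup v2 cpu.stat into a dict."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(stats: Dict[str, int]) -> float:
    """Fraction of enforcement periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats.get("nr_throttled", 0) / periods
```

For example, `parse_cpu_max("200000 100000")` returns `2.0`, meaning the cgroup may use two cores' worth of CPU time per period. A `throttled_ratio` that keeps rising under load suggests the database is hitting its CPU limit rather than running out of work.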
Memory
Memory performance remains consistent across various solutions, with minimal impact observed. Memory access is largely unaffected by containerization, as syscalls like mmap and brk, which typically influence memory performance, play a minor role in these tests.
Disk I/O
Tests indicate negligible effects on sequential read and write performance with K8s + containerization, except for Kata-QEMU, which exhibits a notable performance drop. The virtio-9p file system in Kata-QEMU is identified as the source of this impact, highlighting the importance of optimized file systems in virtual environments.
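For a rough feel of sequential write throughput on a given volume, a small micro-benchmark like the one below can be run on the host and inside a container against the same kind of storage. This is only a sketch and not the benchmark used in the tests cited above; the function name `sequential_write_mbps` and the default sizes are assumptions made for illustration.

```python
import os
import tempfile
import time


def sequential_write_mbps(directory: str, total_mb: int = 16, block_kb: int = 1024) -> float:
    """Write total_mb of zeros sequentially in block_kb chunks and return MB/s.

    The file is flushed and fsync'ed before the clock stops, so the result
    includes the time needed to push the data out of the page cache.
    """
    block = b"\0" * (block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # always clean up the temporary file
    return (blocks * block_kb / 1024) / elapsed
```

Point `directory` at the mount you want to measure, for example the data directory backed by a PersistentVolume. Results from a memory-backed file system such as tmpfs say nothing about disk performance, and a single short run is noisy, so repeat the measurement before drawing conclusions.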
Network I/O
While K8s + containerization minimally affects runC and Kata-QEMU, gVisor experiences significant degradation in Redis performance due to its syscall interception mechanism and internal network stack overhead. TCP stream throughput tests also reveal poorer network performance with gVisor compared to other solutions.
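A TCP stream throughput test boils down to pushing a fixed amount of data through a connection and timing it. The following sketch does this over the loopback interface with the standard library only; it is not the tool used in the tests above, and the name `tcp_stream_throughput` is hypothetical. To compare runtimes in practice, the sender and receiver would run in separate Pods or hosts rather than in one process.

```python
import socket
import threading
import time
from typing import Tuple


def tcp_stream_throughput(total_bytes: int = 8 * 1024 * 1024,
                          chunk: int = 64 * 1024) -> Tuple[int, float]:
    """Send total_bytes over a loopback TCP connection.

    Returns (bytes received, throughput in MB/s).
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    received = [0]

    def sink() -> None:
        # Accept one connection and count bytes until the peer closes it.
        conn, _ = server.accept()
        with conn:
            while True:
                data = conn.recv(chunk)
                if not data:
                    break
                received[0] += len(data)

    worker = threading.Thread(target=sink)
    worker.start()
    payload = b"x" * chunk
    start = time.perf_counter()
    with socket.create_connection(("127.0.0.1", port)) as client:
        sent = 0
        while sent < total_bytes:
            n = min(chunk, total_bytes - sent)
            client.sendall(payload[:n])
            sent += n
    worker.join()  # wait until the receiver has drained the stream
    elapsed = time.perf_counter() - start
    server.close()
    return received[0], received[0] / (1024 * 1024) / elapsed
```

Because every `send` and `recv` is a syscall, a runtime that intercepts syscalls and runs its own network stack, as described for gVisor above, pays its overhead on exactly this path.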
CNI Network
Comparisons between legacy host-routing with iptables and eBPF-based host-routing show significant improvements in network efficiency with the latter, particularly evident in Redis benchmark tests. eBPF-based routing eliminates the substantial performance gap between Pod and host networks, making it a favorable option for network-sensitive applications like Redis.
In summary, runC exhibits performance closest to bare metal, making it a preferred choice for running database workloads. Kata Containers, while slightly behind runC in speed, offers enhanced security and isolation. gVisor, prioritizing security features, shows poorer performance due to its syscall implementation. However, ongoing improvements in newer versions aim to address these limitations.
Common database performance issues
Disk I/O Hang: Intense I/O activity can lead to CPU throttling and accumulation of dirty pages, causing disk I/O hang-ups, particularly evident in environments with shared local disks.
Out of Memory (OOM): Memory isolation via Cgroup poses challenges in memory allocation and reclamation, often resulting in OOM errors and performance degradation.
Too Many Connections: Multi-process database models face memory limitations due to the overhead of per-connection structures, requiring strategies like connection pooling or HugePages to alleviate the strain.
TCP Retransmissions: Networking issues, including latency and bandwidth limitations, can impact database availability and stability, necessitating careful monitoring and optimization of network performance.
CPU Schedule Wait: VM-based containerization solutions may introduce additional scheduling wait time, affecting database performance, which can be mitigated through workload reduction or VM CPU affinity configuration.
Lock & Latch: Lock and latch mechanisms safeguard resources but may introduce scalability limitations, particularly evident in multi-process database models.
Various Performance Bottlenecks: Each storage engine, like InnoDB for MySQL or WiredTiger for MongoDB, faces unique performance bottlenecks related to disk I/O, I/O unit, process model, and connection limits, requiring tailored optimization strategies.
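The OOM and Too Many Connections items above are linked: the cgroup memory limit has to cover both the shared memory of the database and the per-connection overhead. The sketch below reads a cgroup v2 `memory.max` value and estimates how many connections fit under it. The function names, the 10% reserve, and the per-connection figure in the usage note are hypothetical inputs for illustration, not measured values for any particular database.

```python
from typing import Optional


def parse_memory_max(text: str) -> Optional[int]:
    """Return the cgroup v2 memory.max limit in bytes, or None when unlimited ("max")."""
    value = text.strip()
    return None if value == "max" else int(value)


def max_connections(limit_bytes: int,
                    shared_bytes: int,
                    per_conn_bytes: int,
                    reserve_percent: int = 10) -> int:
    """Estimate how many connections fit under a memory limit.

    limit_bytes     -- cgroup memory limit of the container
    shared_bytes    -- memory used regardless of connections (e.g. buffer pool)
    per_conn_bytes  -- assumed memory cost of one connection
    reserve_percent -- share of the limit kept free as a safety margin
    """
    usable = limit_bytes * (100 - reserve_percent) // 100 - shared_bytes
    if usable <= 0 or per_conn_bytes <= 0:
        return 0
    return usable // per_conn_bytes
```

With a 4 GiB limit, 2 GiB of shared memory, and an assumed 10 MiB per connection, `max_connections(4 * 1024**3, 2 * 1024**3, 10 * 1024**2)` returns `163`. Setting the connection limit of the database above such an estimate risks an OOM kill once enough clients connect.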
Overall, understanding and addressing these common database performance issues are crucial for maintaining optimal database performance in containerized environments.
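As one example of monitoring such an issue, TCP retransmissions can be tracked from the `Tcp:` counters in `/proc/net/snmp` on Linux, which are reported for the network namespace of the reading process. The sketch below parses the file content and computes the retransmission ratio between two samples; the function names are hypothetical, and what ratio counts as unhealthy depends on the environment.

```python
from typing import Dict


def parse_tcp_counters(snmp_text: str) -> Dict[str, int]:
    """Pair the two "Tcp:" lines of /proc/net/snmp (names, then values) into a dict."""
    tcp_lines = [line.split()[1:]
                 for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    if len(tcp_lines) != 2:
        raise ValueError("expected a Tcp: header line and a Tcp: value line")
    names, values = tcp_lines
    return {name: int(value) for name, value in zip(names, values)}


def retransmission_ratio(before: Dict[str, int], after: Dict[str, int]) -> float:
    """Retransmitted segments divided by segments sent between two samples."""
    sent = after["OutSegs"] - before["OutSegs"]
    retrans = after["RetransSegs"] - before["RetransSegs"]
    return retrans / sent if sent > 0 else 0.0
```

Sampling the file twice, a few seconds apart, and passing both results to `retransmission_ratio` gives the ratio for that interval rather than since boot, which is usually the more useful signal when investigating a stability problem.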