Linux kernel 7.0 adds network namespace support to AF_VSOCK. This closes an isolation gap in how virtual sockets behave inside isolated environments such as containers. It’s exciting for anyone working with virtualization!
What is AF_VSOCK?
Let’s start simple. In normal networking, programs use sockets to talk over the internet or local networks. These use address families like AF_INET (for IP addresses, e.g., 192.168.1.1).
AF_VSOCK (Address Family for Virtual Sockets) is a special socket type designed for virtualized environments. It’s like a private communication channel between:
- A host machine (the main computer) and its guest VMs (virtual computers running inside it, like with KVM or QEMU).
- Or between different VMs on the same host.
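Why use it? In virtualization, you might not want to rely on network cards or IP addresses for host-guest chats. VSOCK needs no network configuration in the guest, and its traffic stays off the regular IP network, which keeps the setup simple and the exposed surface small.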
Key parts:
- Addresses: Instead of IPs, VSOCK uses Context IDs (CIDs), simple numbers that identify “who” you’re talking to.
  - CID 0: Reserved (hypervisor, but rarely used).
  - CID 1: Loopback (talk to yourself, like localhost in IP world).
  - CID 2: The host machine.
  - Higher numbers (e.g., CID 42): Assigned to guest VMs.
- Ports: Like TCP/UDP, you add a port number (e.g., 1234) for specific services.
- Transports: The “highway” for data. Common ones:
  - vhost-vsock: For host-to-guest (H2G) communication in tools like KVM.
  - vsock-loopback: A loopback transport for testing on a single machine (load it with sudo modprobe vsock_loopback).
  - Others like virtio-vsock (guest-side) or Hyper-V/VMCI (for specific hypervisors).
Example: To listen for connections on VSOCK, you might use a tool like nc (netcat): nc --vsock -l 1234. To connect: nc --vsock 42 1234 (to CID 42, port 1234).
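The same listen/connect pair can also be written straight against the socket API. Here is a minimal sketch using Python’s standard socket module, which exposes AF_VSOCK on Linux. The helper names (describe_cid, vsock_listen_once, vsock_send) are made up for this post, and the socket calls assume a VSOCK transport is loaded.

```python
import socket

# Well-known Context IDs from the list above; anything else is treated as a guest.
WELL_KNOWN_CIDS = {
    0: "hypervisor (reserved)",
    1: "loopback",
    2: "host",
}


def describe_cid(cid: int) -> str:
    """Return a human-readable role for a CID."""
    return WELL_KNOWN_CIDS.get(cid, f"guest VM (CID {cid})")


def vsock_listen_once(port: int) -> bytes:
    """Accept one connection on `port` and return the first chunk the peer sends."""
    with socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) as server:
        # A VSOCK address is a (cid, port) tuple; VMADDR_CID_ANY accepts any local CID.
        server.bind((socket.VMADDR_CID_ANY, port))
        server.listen(1)
        conn, (peer_cid, peer_port) = server.accept()
        with conn:
            print(f"connection from {describe_cid(peer_cid)}, port {peer_port}")
            return conn.recv(4096)


def vsock_send(cid: int, port: int, payload: bytes) -> None:
    """Connect to (cid, port) and send `payload`."""
    with socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) as client:
        client.connect((cid, port))
        client.sendall(payload)
```

Calling vsock_listen_once(1234) on one side and vsock_send(42, 1234, b"hello") on the other mirrors the two nc commands.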
Before this update, VSOCK was “global”—it ignored isolation, which leads us to the problem.
The Problem: Why Network Namespaces Matter
Next layer: Network namespaces in Linux are like invisible walls that separate network resources. They’re a core feature for containerization (e.g., Docker, Podman) and security.
- What they do: Each namespace is its own “mini-network universe.”
  - Separate interfaces (e.g., one namespace has its own eth0).
  - Separate IPs, routes, firewalls.
  - Even separate sockets.
- Why use them? For isolation. Example: Two containers on one server, one running a web app and the other a database. Namespaces ensure they can’t accidentally interfere with each other’s network (unless you allow it).
- Creating one: Use commands like unshare --net (temporary, lives as long as the process) or ip netns add myns (persistent, named).
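A quick way to see these walls from code is to ask whether two processes live in the same network namespace. The sketch below compares the device and inode numbers behind /proc/<pid>/ns/net, which is the check described in namespaces(7). The function names and the proc_root parameter are my own additions for illustration, and reading another user’s process may need root.

```python
import os


def netns_identity(pid="self", proc_root="/proc"):
    """Return the (device, inode) pair that identifies the network namespace of `pid`."""
    # os.stat() follows the /proc/<pid>/ns/net symlink to the namespace object itself.
    info = os.stat(os.path.join(proc_root, str(pid), "ns", "net"))
    return (info.st_dev, info.st_ino)


def same_netns(pid_a, pid_b, proc_root="/proc"):
    """True if both processes are in the same network namespace."""
    return netns_identity(pid_a, proc_root) == netns_identity(pid_b, proc_root)
```

For example, same_netns("self", 1) tells you whether your shell shares a namespace with PID 1. Inside a process started with unshare --net it would be expected to return False.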
The issue with old VSOCK (before Linux 7.0):
- VSOCK sockets were always global—they didn’t respect namespaces.
- Problems:
  - No isolation: A VM in Container A (with CID 42) could be reached via VSOCK from Container B, even if B had no other network access to A. This breaks security. Imagine sensitive data leaking between isolated apps!
  - No CID reuse: You couldn’t assign the same CID (e.g., 42) to VMs in different namespaces. It was like all phone numbers had to be unique across the whole world, not just per country.
- This made VSOCK tricky for containerized VMs (e.g., running QEMU inside Docker).
Think of it as: Before, VSOCK was like a shared party line phone—anyone could pick up and listen. Now, it’s getting private lines.
The Solution: Adding Namespace Support to AF_VSOCK
The big fix in Linux 7.0: VSOCK now understands namespaces! It’s backward compatible (old behavior is default), but you can opt into isolation.
- Two Modes per Namespace:
| Mode | Description | When to Use |
| --- | --- | --- |
| Global (default) | VSOCK acts like before: Shared across all namespaces. CIDs are unique globally, and communication crosses namespaces. | For legacy setups or when you want everything connected. |
| Local | Isolated! Each namespace has its own VSOCK world. No cross-talk, and you can reuse CIDs (e.g., multiple VMs with CID 42 in different namespaces). | For secure, containerized environments. |
- How to Control It: Two sysctls (kernel settings exposed as files under /proc/sys) control the mode.
  - child_ns_mode (writable, set in the parent namespace): Decides the mode for NEW child namespaces.
    - Example: echo local | sudo tee /proc/sys/net/vsock/child_ns_mode
    - This makes all future child namespaces “local” by default.
  - ns_mode (read-only): Reports the current namespace’s mode.
    - cat /proc/sys/net/vsock/ns_mode → Outputs “global” or “local”.
  - Key: The mode is set when the namespace is created and can’t change later. It’s like choosing a house foundation: you can’t swap it after building.
- Which Transports Support It?
  - Yes: vhost-vsock (H2G, for KVM/QEMU) and vsock-loopback (testing).
  - No (yet): Guest-to-host (G2H) like virtio-vsock in the guest OS. These always act “global” and can’t reach “local” namespaces. Future work needed here.
Result: In “local” mode, VSOCK traffic stays inside its namespace—perfect isolation.
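To make the two modes concrete, here is a toy model of the visibility rule as this post describes it. It is not the kernel’s implementation: the NetNs class, the can_communicate function, and the namespace names are hypothetical, and the rule assumes that namespaces only see each other when both are global.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetNs:
    name: str
    mode: str  # "global" or "local", fixed when the namespace is created


def can_communicate(a: NetNs, b: NetNs) -> bool:
    """Toy rule: the same namespace always works; across namespaces both must be global."""
    if a == b:
        return True
    return a.mode == "global" and b.mode == "global"


init_ns = NetNs("init", "global")
legacy = NetNs("legacy-container", "global")
sandbox = NetNs("sandbox", "local")

for left, right in [(init_ns, legacy), (init_ns, sandbox), (sandbox, sandbox)]:
    print(f"{left.name} <-> {right.name}: {can_communicate(left, right)}")
# init <-> legacy-container: True
# init <-> sandbox: False
# sandbox <-> sandbox: True
```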
Hands-On Examples: Testing It Out
Now, let’s walk through examples from the post. These assume you’re on Linux 7.0+ with VSOCK tools installed (e.g., nc from the nmap-ncat package). I’ll explain each step.
Example 1: Basic Loopback (No VMs Needed)
Use vsock-loopback for local testing.
- Global Mode:
  1. Ensure the default (global) mode: don’t change anything.
  2. Create a new namespace and start a listener in it: unshare --user --net nc --vsock -l 1234 & (runs as a background job).
  3. From your main shell (init namespace), connect with nc --vsock 1 1234 and type something. It works! (CID 1 is loopback, but in global mode it crosses namespaces.)
  - Why? No isolation.
- Local Mode:
  1. Set: echo local | sudo tee /proc/sys/net/vsock/child_ns_mode
  2. Repeat step 2 above (the new namespace is now local).
  3. Connect: nc --vsock 1 1234 → Fails with “Connection reset by peer.”
  - Why? Isolation: loopback is now per-namespace.
Check mode in a namespace: sudo ip netns add myns; sudo ip netns exec myns cat /proc/sys/net/vsock/ns_mode
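If you script these checks, the two sysctl files can be read and written like ordinary files. A small sketch, assuming the paths given in this post. The function names and the sysctl_dir parameter are my own, the latter added so the helpers can be pointed at a scratch directory. Writing child_ns_mode needs root.

```python
from pathlib import Path

VSOCK_SYSCTL_DIR = Path("/proc/sys/net/vsock")
VALID_MODES = ("global", "local")


def read_ns_mode(sysctl_dir: Path = VSOCK_SYSCTL_DIR) -> str:
    """Return the VSOCK mode of the current network namespace."""
    mode = (sysctl_dir / "ns_mode").read_text().strip()
    if mode not in VALID_MODES:
        raise ValueError(f"unexpected ns_mode value: {mode!r}")
    return mode


def set_child_ns_mode(mode: str, sysctl_dir: Path = VSOCK_SYSCTL_DIR) -> None:
    """Choose the mode that namespaces created from now on will get."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    (sysctl_dir / "child_ns_mode").write_text(mode + "\n")
```

Run inside a namespace (for example under sudo ip netns exec myns), read_ns_mode() reports the same value as the cat command.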
Example 2: With Containers (Podman)
Podman creates namespaces automatically.
- Build image: podman build -t fedora-ncat - <<< $'FROM fedora\nRUN dnf -y install nmap-ncat'
- Global Mode:
  - Listener container: podman run --rm --init -d fedora-ncat sh -c "echo hello | nc --vsock -l 1234"
  - Connector: podman run --rm --init -it fedora-ncat nc --vsock 1 1234 → Outputs “hello” (cross-container talk).
- Local Mode (after setting child_ns_mode to local):
  - Same commands → Connector fails. Isolation wins!
Example 3: With VMs (QEMU)
Assume you have QEMU installed and a VM image (e.g., a basic Linux ISO).
- Create namespaces: run sudo ip netns add vsock_ns_global while child_ns_mode is still global, then set child_ns_mode to local and run sudo ip netns add vsock_ns_local. The order matters because a namespace’s mode is fixed at creation.
- Global:
  - Run QEMU in the global ns: sudo ip netns exec vsock_ns_global qemu-system-x86_64 … -device vhost-vsock-pci,guest-cid=42 (add your VM details).
  - In the VM: nc --vsock -l 1234
  - From the host: nc --vsock 42 1234 → Works.
- Local:
  - Run QEMU in the local ns (same command, with vsock_ns_local).
  - From the host: nc --vsock 42 1234 → Fails.
  - From the same ns: sudo ip netns exec vsock_ns_local nc --vsock 42 1234 → Works.
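If you launch several VMs this way, a small helper keeps the namespace and CID wiring in one place. This is only a sketch: qemu_in_netns is a hypothetical name, and vm_args stands in for your own VM options, which the post leaves out. The guard reflects the reserved CIDs 0 to 2 listed earlier.

```python
import shlex


def qemu_in_netns(netns: str, guest_cid: int, vm_args: list[str]) -> list[str]:
    """Build the argv for running QEMU with a vhost-vsock device inside `netns`."""
    if guest_cid < 3:
        raise ValueError("CIDs 0-2 are reserved; use 3 or higher for a guest")
    return [
        "ip", "netns", "exec", netns,
        "qemu-system-x86_64", *vm_args,
        "-device", f"vhost-vsock-pci,guest-cid={guest_cid}",
    ]


# "-m 1G" is a placeholder, not a complete VM configuration.
print(shlex.join(qemu_in_netns("vsock_ns_local", 42, ["-m", "1G"])))
# ip netns exec vsock_ns_local qemu-system-x86_64 -m 1G -device vhost-vsock-pci,guest-cid=42
```

Prefix the result with sudo, or pass it to subprocess.run from a root shell, to actually start the VM.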
Example 4: CID Reuse
In local mode: Two QEMU instances with CID 42 in different local namespaces run side by side with no conflict. In global mode: The second one fails because CID 42 is already in use.
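The bookkeeping behind this can be pictured as one shared CID table for all global namespaces plus a private table per local namespace. The class below is a toy model of that idea, not how the kernel stores CIDs. CidTable and the namespace names are hypothetical.

```python
class CidTable:
    """Toy CID allocator: one shared table for global namespaces, one per local namespace."""

    def __init__(self) -> None:
        self._global: set[int] = set()
        self._local: dict[str, set[int]] = {}

    def assign(self, netns: str, mode: str, cid: int) -> None:
        if mode == "global":
            table, scope = self._global, "the global table"
        else:
            table, scope = self._local.setdefault(netns, set()), f"namespace {netns}"
        if cid in table:
            raise ValueError(f"CID {cid} is already in use in {scope}")
        table.add(cid)


table = CidTable()
table.assign("ns_a", "local", 42)   # fine
table.assign("ns_b", "local", 42)   # fine: a different local namespace
table.assign("ns_c", "global", 42)  # fine: first user of 42 in the global table
try:
    table.assign("ns_d", "global", 42)
except ValueError as exc:
    print(exc)  # CID 42 is already in use in the global table
```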