Kubernetes Pause Container

KANS Study Week 2–1

Sigrid Jin
36 min read · Sep 7, 2024
https://creboring.net/blog/how-docker-divide-image-layer/

Docker’s Container Execution Layer Structure

Docker’s architecture is a multi-layered system designed for efficient container management:

  1. Docker Client: This is the primary interface for users to interact with Docker. It communicates with the Docker daemon through a socket or API, translating user commands into API calls.
  2. Docker Daemon (dockerd): The heart of Docker, this daemon processes Docker API requests and manages Docker objects such as images, containers, networks, and volumes. It communicates with the container runtime via gRPC calls.
  3. containerd (High-Level Runtime): This runtime manages container lifecycles and image management: it pulls and unpacks images into OCI runtime bundles, then invokes runc to execute the containers.
  4. containerd-shim: This lightweight process sits between containerd and runc. It allows containerd to manage container lifecycles without being blocked by container processes. Importantly, it ensures containers continue running even if containerd restarts or crashes.
  5. runc (Low-Level Runtime): The reference implementation of the OCI container runtime specification. It’s responsible for the actual creation and execution of containers at the lowest level.
  6. Application: The actual application running inside the container.
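
To see this layered chain on a live host, here is a quick sketch (the container name layertest is arbitrary; PIDs and IDs will differ on your machine):

# Start a throwaway container, then inspect the process ancestry
docker run -d --name layertest nginx:alpine
# dockerd and containerd run as daemons; each container gets its own shim
ps -ef | grep -E 'dockerd|containerd-shim' | grep -v grep
# The shim is the parent of the container's main process (nginx here)
pstree -a -p $(pgrep -f containerd-shim | head -1)
# Clean up
docker rm -f layertest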

Kubelet’s Core Functionality: The Sync Loop

https://alibaba-cloud.medium.com/getting-started-with-kubernetes-kubernetes-container-runtime-interface-5a6348d4d438

At the heart of Kubelet’s operation is the Sync Loop. This continuous process ensures that the actual state of the node matches the desired state as defined in the Kubernetes control plane. The Sync Loop is responsible for:

  1. Watching for changes in pod specifications
  2. Creating or updating containers as needed
  3. Ensuring that pod resources are available
  4. Reporting the status of pods and the node back to the API server

Kubelet communicates with the high-level container runtime (like containerd) using gRPC over the Container Runtime Interface (CRI).

  1. Kubelet acts as the gRPC client.
  2. The container runtime (e.g., containerd) acts as the gRPC server.

The actual container operations are then passed from the high-level runtime (containerd) to the low-level runtime (like runc) via the OCI standard.
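
Because CRI is plain gRPC over a Unix socket, you can stand in for the kubelet with crictl; a sketch, assuming containerd's default socket path:

# Talk to the same CRI endpoint the kubelet uses
crictl --runtime-endpoint unix:///run/containerd/containerd.sock version
# List pod sandboxes and containers (RunPodSandbox et al. underneath)
crictl --runtime-endpoint unix:///run/containerd/containerd.sock pods
crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps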

Installing Kind

Kind is a tool for running local Kubernetes clusters using Docker container “nodes”. It’s particularly useful for development and testing environments. The setup process involves installing Kind and creating a cluster with control plane and worker nodes.

Kind creates a separate Docker network for the Kubernetes cluster, isolating it from other Docker networks on the host. Within this network, the Kubernetes nodes (which are Docker containers) can communicate with each other.

The control plane node’s API server is made accessible outside the Docker network through port forwarding, allowing tools on the host (like kubectl) to interact with the cluster, while the CNI (kindnet) handles pod-to-pod networking within the cluster.

# For AMD64 / x86_64
[ $(uname -m) = x86_64 ] && curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.24.0/kind-linux-amd64
# For ARM64
[ $(uname -m) = aarch64 ] && curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.24.0/kind-linux-arm64
chmod +x ./kind
sudo mv ./kind /usr/local/bin/kind

kind creates and manages local Kubernetes clusters using Docker container 'nodes'

Usage:
kind [command]

Available Commands:
build Build one of [node-image]
completion Output shell completion code for the specified shell (bash, zsh or fish)
create Creates one of [cluster]
delete Deletes one of [cluster]
export Exports one of [kubeconfig, logs]
get Gets one of [clusters, nodes, kubeconfig]
help Help about any command
load Loads images into nodes
version Prints the kind CLI version

Flags:
-h, --help help for kind
--loglevel string DEPRECATED: see -v instead
-q, --quiet silence all stderr output
-v, --verbosity int32 info log verbosity, higher value produces more output
--version version for kind

Use "kind [command] --help" for more information about a command.
sigridjineth@sigridjineth-Z590-VISION-G:~$

In this scenario, a Kind cluster named “myk8s” has been created with two nodes: a control plane node (myk8s-control-plane) and a worker node (myk8s-worker). These nodes are actually Docker containers simulating Kubernetes nodes.

Kind creates a dedicated Docker network for the Kubernetes cluster. In this case, it’s a bridge network named “kind” with the following characteristics, which allows the Docker containers (acting as Kubernetes nodes) to communicate with each other and with the host system.

  • Network ID: ae9342d89db1
  • IPv4 subnet: 172.19.0.0/16
  • IPv6 subnet: fc00:f853:ccd:e793::/64
  • Gateway: 172.19.0.1
[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-kind:N/A)]$ docker network ls
NETWORK ID NAME DRIVER SCOPE
f81d60963e03 bridge bridge local
2a79290d1214 docker-cuda_default bridge local
7e8218e64901 host host local
ae9342d89db1 kind bridge local
eeadb4969a76 none null local
[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-kind:N/A)]$ docker inspect kind | jq
[
  {
    "Name": "kind",
    "Id": "ae9342d89db1b8c5b314957c661134fa71fb37e1b511f8aab20b3bd47139e882",
    "Created": "2024-09-07T11:40:43.607443509+09:00",
    "Scope": "local",
    "Driver": "bridge",
    "EnableIPv6": true,
    "IPAM": {
      "Driver": "default",
      "Options": {},
      "Config": [
        {
          "Subnet": "fc00:f853:ccd:e793::/64"
        },
        {
          "Subnet": "172.19.0.0/16",
          "Gateway": "172.19.0.1"
        }
      ]
    },
    "Internal": false,
    "Attachable": false,
    "Ingress": false,
    "ConfigFrom": {
      "Network": ""
    },
    "ConfigOnly": false,
    "Containers": {
      "716d84f308986cda6bab4d0805e8bf7bdd5171fb9e72b7d3dd658d18b6442d47": {
        "Name": "kind-control-plane",
        "EndpointID": "8bfcb38e383d73a55317ed551f6447fe96aa7bb0e2e045b5cf06175ce317b73a",
        "MacAddress": "02:42:ac:13:00:02",
        "IPv4Address": "172.19.0.2/16",
        "IPv6Address": "fc00:f853:ccd:e793::2/64"
      }
    },
    "Options": {
      "com.docker.network.bridge.enable_ip_masquerade": "true",
      "com.docker.network.driver.mtu": "1500"
    },
    "Labels": {}
  }
]

The cluster configuration defines two nodes: one control-plane and one worker.
  • The kind create cluster command sets up the cluster named "myk8s" using this configuration.
  • It goes through several steps, including preparing nodes, writing configuration, starting the control-plane, installing CNI (Container Network Interface), installing StorageClass, and joining worker nodes.
  • The kubectl context is set to kind-myk8s, allowing immediate interaction with the new cluster.
The output of kubectl cluster-info below confirms the cluster endpoints.

The Kubernetes control plane is accessible at https://127.0.0.1:46013. This indicates that the Kubernetes API server is running locally and is port-forwarded to this address. CoreDNS, the cluster’s DNS service, is also accessible through this same IP and port.

When the cluster is created, a random host port is forwarded to the control plane’s kube-apiserver (container port 6443). This is how the local kubectl command communicates with the cluster — it connects to this forwarded port, which routes requests to the API server running inside the Kind container.
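
Both sides of that mapping are easy to verify; a quick sketch (the host port differs per cluster):

# Host port that Docker mapped to the API server's container port 6443
docker port myk8s-control-plane
# The kubeconfig points kubectl at the same forwarded address
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'; echo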

[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-myk8s:N/A)]$ kind get nodes --name myk8s
myk8s-worker
myk8s-control-plane
[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-myk8s:N/A)]$ kubectl cluster-info
Kubernetes control plane is running at https://127.0.0.1:46013
CoreDNS is running at https://127.0.0.1:46013/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-myk8s:N/A)]$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1265c1f0cdd9 kindest/node:v1.31.0 "/usr/local/bin/entr…" About a minute ago Up About a minute myk8s-worker
5df6ff6f7d5b kindest/node:v1.31.0 "/usr/local/bin/entr…" About a minute ago Up About a minute 127.0.0.1:46013->6443/tcp myk8s-control-plane
716d84f30898 kindest/node:v1.31.0 "/usr/local/bin/entr…" 3 minutes ago Up 3 minutes 127.0.0.1:33817->6443/tcp kind-control-plane
65677d6f056b konuu/llm_ready:20240904 "/opt/nvidia/nvidia_…" 35 hours ago Up 35 hours 0.0.0.0:8099->8080/tcp, :::8099->8080/tcp sigrid-test
e2798f340496 docker-cuda-llamafactory "/opt/nvidia/nvidia_…" 3 weeks ago Up 3 weeks 0.0.0.0:7860->7860/tcp, :::7860->7860/tcp, 6006/tcp, 8888/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp llamafactory
75329215b6bf a70e34c547e6 "scripts/app-start.sh" 3 weeks ago Up 3 weeks 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp thirsty_burnell
1ce17f355b70 korammar-api "python3" 6 weeks ago Up 3 weeks 0.0.0.0:8889->8889/tcp, :::8889->8889/tcp korammar
[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-myk8s:N/A)]$
$ kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
myk8s-control-plane Ready control-plane 84s v1.31.0 172.19.0.4 <none> Debian GNU/Linux 12 (bookworm) 6.5.0-44-generic containerd://1.7.18
myk8s-worker Ready <none> 73s v1.31.0 172.19.0.3 <none> Debian GNU/Linux 12 (bookworm) 6.5.0-44-generic containerd://1.7.18

$ kubectl get pod -A -owide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system coredns-6f6b679f8f-bvz24 1/1 Running 0 81s 10.244.0.2 myk8s-control-plane <none> <none>
kube-system coredns-6f6b679f8f-mzr6l 1/1 Running 0 81s 10.244.0.4 myk8s-control-plane <none> <none>
kube-system etcd-myk8s-control-plane 1/1 Running 0 88s 172.19.0.4 myk8s-control-plane <none> <none>
kube-system kindnet-gnvz6 1/1 Running 0 80s 172.19.0.3 myk8s-worker <none> <none>
kube-system kindnet-mfb4x 1/1 Running 0 82s 172.19.0.4 myk8s-control-plane <none> <none>
kube-system kube-apiserver-myk8s-control-plane 1/1 Running 0 88s 172.19.0.4 myk8s-control-plane <none> <none>
kube-system kube-controller-manager-myk8s-control-plane 1/1 Running 0 88s 172.19.0.4 myk8s-control-plane <none> <none>
kube-system kube-proxy-k9scz 1/1 Running 0 82s 172.19.0.4 myk8s-control-plane <none> <none>
kube-system kube-proxy-vz6kg 1/1 Running 0 80s 172.19.0.3 myk8s-worker <none> <none>
kube-system kube-scheduler-myk8s-control-plane 1/1 Running 0 88s 172.19.0.4 myk8s-control-plane <none> <none>
local-path-storage local-path-provisioner-57c5987fd4-5499m 1/1 Running 0 81s 10.244.0.3 myk8s-control-plane <none> <none>
docker exec -it myk8s-control-plane sh -c 'apt update && apt install tree jq psmisc lsof wget bridge-utils tcpdump htop git nano -y'
docker exec -it myk8s-worker sh -c 'apt update && apt install tree jq psmisc lsof wget bridge-utils tcpdump htop git nano -y'

The network is named “kind” and uses the bridge driver. It has both IPv4 (172.19.0.0/16) and IPv6 (fc00:f853:ccd:e793::/64) subnets configured. The gateway for the IPv4 subnet is 172.19.0.1. This network is local in scope, meaning it’s only accessible on the host machine.

[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-myk8s:N/A)]$ kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
standard (default) rancher.io/local-path Delete WaitForFirstConsumer false 112s

[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-myk8s:N/A)]$ kubectl get deploy -n local-path-storage
NAME READY UP-TO-DATE AVAILABLE AGE
local-path-provisioner 1/1 1 1 119s

The local-path StorageClass is a storage solution installed in this Kubernetes cluster that leverages the local storage of the nodes. It’s implemented through the local-path provisioner, which automates the management of persistent volumes without requiring manual specification of host paths. This provisioner dynamically creates and manages volumes on the node’s local filesystem, abstracting away the complexities of storage allocation and lifecycle management.
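
As a quick sketch of the provisioner in action (the claim name and size here are arbitrary):

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: localpath-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: standard
  resources:
    requests:
      storage: 1Gi
EOF
# WaitForFirstConsumer: the claim stays Pending until a pod that mounts it is scheduled
kubectl get pvc localpath-pvc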

The cluster has two nodes: myk8s-control-plane and myk8s-worker. CoreDNS is deployed with two replicas on the control-plane node, using the 10.244.0.0/16 pod network.
  • The control-plane node has static pod manifests for etcd, kube-apiserver, kube-controller-manager, and kube-scheduler in /etc/kubernetes/manifests/.
  • The worker node does not have any static pod manifests, which is expected as these core components run only on the control plane.
  • The output of crictl ps shows two containers running on the worker node: kindnet-cni (the CNI plugin) and kube-proxy.
[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-myk8s:N/A)]$ docker exec -it myk8s-control-plane grep staticPodPath /var/lib/kubelet/config.yaml
staticPodPath: /etc/kubernetes/manifests
[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-myk8s:N/A)]$ docker exec -it myk8s-control-plane tree /etc/kubernetes/manifests/
/etc/kubernetes/manifests/
|-- etcd.yaml
|-- kube-apiserver.yaml
|-- kube-controller-manager.yaml
`-- kube-scheduler.yaml

1 directory, 4 files
[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-myk8s:N/A)]$ docker exec -it myk8s-worker tree /etc/kubernetes/manifests/
/etc/kubernetes/manifests/

0 directories, 0 files

Static Pods are located in /etc/kubernetes/manifests/ on the control-plane node, which includes the core components: etcd, kube-apiserver, kube-controller-manager, and kube-scheduler. They are managed directly by the kubelet, not by the Kubernetes control plane. The worker node, however, has no static pods, as expected.

  • Rules with the comment “kube-system/kube-dns:dns” indicate DNS service routing.
  • The cluster IP range appears to be 10.96.0.0/12, which is standard for Kubernetes services.
  • KUBE-MARK-MASQ and KUBE-SVC chains are used for network address translation (NAT) and load balancing of Kubernetes services.
  • Some rules use --probability 0.50000000000 for even distribution of traffic across multiple endpoints; these can be inspected directly on a node, as sketched below.
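
A sketch of inspecting those NAT rules from inside a node (the generated KUBE-SVC-* chain suffixes will differ per cluster):

# Service entry rules, including kube-system/kube-dns:dns
docker exec myk8s-worker iptables -t nat -S KUBE-SERVICES | grep kube-dns
# statistic-mode rules that spread traffic across endpoints
docker exec myk8s-worker iptables -t nat -S | grep probability | head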
# Find the static pod manifest location
docker exec -it myk8s-control-plane grep staticPodPath /var/lib/kubelet/config.yaml
staticPodPath: /etc/kubernetes/manifests

# Check static pod info: these containers are deployed by the kubelet directly, not managed via kubectl or the control plane
docker exec -it myk8s-control-plane tree /etc/kubernetes/manifests/
/etc/kubernetes/manifests/
├── etcd.yaml
├── kube-apiserver.yaml
├── kube-controller-manager.yaml
└── kube-scheduler.yaml

docker exec -it myk8s-worker tree /etc/kubernetes/manifests/
...

# Enter the worker node (container) with bash
docker exec -it myk8s-worker bash
---------------------------------
whoami

# Check kubelet status
systemctl status kubelet

# Check the containers
docker ps
crictl ps

# Check kube-proxy
pstree
pstree -p
ps afxuwww |grep proxy
iptables -t filter -S
iptables -t nat -S
iptables -t mangle -S
iptables -t raw -S
iptables -t security -S

# Check TCP listening ports
ss -tnlp

# Exit the node shell
exit
---------------------------------

root@myk8s-worker:/# ss -tnlp
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 4096 127.0.0.11:42031 0.0.0.0:*
LISTEN 0 4096 127.0.0.1:35519 0.0.0.0:* users:(("containerd",pid=111,fd=10))
LISTEN 0 4096 127.0.0.1:10248 0.0.0.0:* users:(("kubelet",pid=259,fd=17))
LISTEN 0 4096 127.0.0.1:10249 0.0.0.0:* users:(("kube-proxy",pid=420,fd=12))
LISTEN 0 4096 *:10250 *:* users:(("kubelet",pid=259,fd=18))
LISTEN 0 4096 *:10256 *:* users:(("kube-proxy",pid=420,fd=15))
  • The command grep staticPodPath /var/lib/kubelet/config.yaml reveals that static pod manifests are located at /etc/kubernetes/manifests/ on the control-plane node.
  • Static pods are a critical component of the Kubernetes control plane, managed directly by the kubelet without involvement from the API server or controller manager.
  • The control-plane node hosts four essential static pods: etcd (distributed key-value store), kube-apiserver (API server), kube-controller-manager (controller processes), and kube-scheduler (pod scheduling).
  • The worker node, as expected, does not have any static pod manifests, as these core components run exclusively on control-plane nodes.
  • The ss -tnlp command output reveals several important listening ports on the worker node (probed in the sketch after this list):
    a. 127.0.0.11:42031 - likely Docker's embedded DNS server.
    b. 127.0.0.1:35519 - containerd's API endpoint.
    c. 127.0.0.1:10248 - kubelet's healthz endpoint (for liveness probes).
    d. 127.0.0.1:10249 - kube-proxy's metrics endpoint.
    e. *:10250 - kubelet's API server (used for pod exec, log streaming, and port-forward).
    f. *:10256 - kube-proxy's healthz endpoint.
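
A sketch of probing these endpoints from inside the worker node (e.g., via docker exec -it myk8s-worker bash):

# kubelet healthz (loopback only)
curl -s http://127.0.0.1:10248/healthz; echo
# kube-proxy healthz (listens on all interfaces)
curl -s http://127.0.0.1:10256/healthz; echo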

Two pods were deployed: “netpod” (using nicolaka/netshoot image) and “nginx” (using nginx:alpine image).

# Create the pods
cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: netpod
spec:
  containers:
  - name: netshoot-pod
    image: nicolaka/netshoot
    command: ["tail"]
    args: ["-f", "/dev/null"]
  terminationGracePeriodSeconds: 0
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx-pod
    image: nginx:alpine
  terminationGracePeriodSeconds: 0
EOF
kubectl get pod -owide

# The successful curl command from netpod to nginx demonstrates functioning inter-pod communication within the cluster.
kubectl exec -it netpod -- curl -s $(kubectl get pod nginx -o jsonpath={.status.podIP}) | grep -o "<title>.*</title>"
[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-myk8s:N/A)]$ kubectl exec -it netpod -- curl -s $(kubectl get pod nginx -o jsonpath={.status.podIP}) | grep -o "<title>.*</title>"
<title>Welcome to nginx!</title>
[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-myk8s:N/A)]$

As mentioned, the cluster is set up in a DinD configuration, where Kubernetes nodes are Docker containers on the host. Kind creates a single-node cluster by default; this single node serves dual roles as both control plane and worker, which differs from production Kubernetes deployments.

In standard Kubernetes clusters, control plane nodes have a taint (node-role.kubernetes.io/control-plane:NoSchedule) to prevent regular pod scheduling. However, the absence of this taint in the Kind setup (Taints: <none>) allows pods to be scheduled on the control plane node. This configuration improves resource utilization in single-node setups but does not reflect production best practices.

The control plane node shows multiple veth interfaces, indicating connections to pods. The IP addressing scheme, 10.244.0.0/16 for pods and 172.19.0.0/16 for nodes, is typical for Kind setups.

The presence of veth interfaces (veth8cbec9db@if2, vetha547a260@if2, vethece8989a@if2) shows how containers are connected to the node’s network namespace, while /etc/resolv.conf is configured to use 172.19.0.1, the Docker bridge network gateway, as the nameserver. Additionally, the ndots:0 option is noteworthy, as it differs from the Kubernetes default of ndots:5, potentially affecting DNS resolution behavior.
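
A sketch of comparing the node’s resolver configuration with a pod’s (using the netpod created earlier):

# Node (Docker container): nameserver 172.19.0.1, ndots:0
docker exec myk8s-control-plane cat /etc/resolv.conf
# Pod: the cluster DNS service IP and the Kubernetes default ndots:5
kubectl exec netpod -- cat /etc/resolv.conf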

docker ps | grep myk8s
1265c1f0cdd9 kindest/node:v1.31.0 "/usr/local/bin/entr…" 10 minutes ago Up 10 minutes myk8s-worker
5df6ff6f7d5b kindest/node:v1.31.0 "/usr/local/bin/entr…" 10 minutes ago Up 10 minutes 127.0.0.1:46013->6443/tcp myk8s-control-plane
 docker inspect myk8s-control-plane | jq | grep Entrypoint -A 3
"Entrypoint": [
"/usr/local/bin/entrypoint",
"/sbin/init"
],
[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-myk8s:N/A)]$ docker exec -it myk8s-control-plane bash
root@myk8s-control-plane:/# arch
x86_64
root@myk8s-control-plane:/# whoami
root
root@myk8s-control-plane:/# ip -br -c -4 addr
ip -c route
cat /etc/resolv.conf
lo UNKNOWN 127.0.0.1/8
veth8cbec9db@if2 UP 10.244.0.1/32
vetha547a260@if2 UP 10.244.0.1/32
vethece8989a@if2 UP 10.244.0.1/32
eth0@if127 UP 172.19.0.4/16
default via 172.19.0.1 dev eth0
10.244.0.2 dev veth8cbec9db scope host
10.244.0.3 dev vetha547a260 scope host
10.244.0.4 dev vethece8989a scope host
10.244.1.0/24 via 172.19.0.3 dev eth0
172.19.0.0/16 dev eth0 proto kernel scope link src 172.19.0.4
# Generated by Docker Engine.
# This file can be edited; Docker Engine will not make further changes once it
# has been modified.

nameserver 172.19.0.1
search localdomain
options edns0 trust-ad ndots:0

# Based on host file: '/etc/resolv.conf' (internal resolver)
# ExtServers: [host(127.0.0.53)]
# Overrides: []
# Option ndots from: internal
root@myk8s-control-plane:/#
  • The cluster uses containerd as the container runtime, as evidenced by the systemctl status containerd command.
  • PID 1 is /sbin/init, which is unusual for containers but makes sense in this DinD setup where containers simulate full nodes.
  • The crictl images command reveals a comprehensive set of images used in the Kubernetes cluster, including Kind-specific images (kindnetd, local-path-provisioner), core Kubernetes components, CoreDNS, and the pause container, which is crucial for pod networking.
# Check the Entrypoint
cat /usr/local/bin/entrypoint

# Check processes: PID 1 is /sbin/init
ps -ef

# Check the container runtime
systemctl status containerd

# Inspect the DinD container using crictl
crictl version
crictl info
crictl ps -o json | jq -r '.containers[] | {NAME: .metadata.name, POD: .labels["io.kubernetes.pod.name"]}'
crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD
ff3d3a53905fd 9d6767b714bf1 12 minutes ago Running nginx-pod 0 20328fe63d512 nginx
bebe6b14d1ab3 eead9e442471d 13 minutes ago Running netshoot-pod 0 28cd918f0561a netpod
...

root@myk8s-control-plane:/# crictl images
IMAGE TAG IMAGE ID SIZE
docker.io/kindest/kindnetd v20240813-c6f155d6 12968670680f4 36.8MB
docker.io/kindest/local-path-helper v20230510-486859a6 be300acfc8622 3.05MB
docker.io/kindest/local-path-provisioner v20240813-c6f155d6 3a195b56ff154 19.4MB
registry.k8s.io/coredns/coredns v1.11.1 cbb01a7bd410d 18.2MB
registry.k8s.io/etcd 3.5.15-0 2e96e5913fc06 56.9MB
registry.k8s.io/kube-apiserver-amd64 v1.31.0 4f8c99889f8e4 95.2MB
registry.k8s.io/kube-apiserver v1.31.0 4f8c99889f8e4 95.2MB
registry.k8s.io/kube-controller-manager-amd64 v1.31.0 7e9a7dc204d9d 89.4MB
registry.k8s.io/kube-controller-manager v1.31.0 7e9a7dc204d9d 89.4MB
registry.k8s.io/kube-proxy-amd64 v1.31.0 af3ec60a3d89b 92.7MB
registry.k8s.io/kube-proxy v1.31.0 af3ec60a3d89b 92.7MB
registry.k8s.io/kube-scheduler-amd64 v1.31.0 418e326664bd2 68.4MB
registry.k8s.io/kube-scheduler v1.31.0 418e326664bd2 68.4MB
registry.k8s.io/pause 3.10 873ed75102791 320kB

root@myk8s-control-plane:/# kubectl get node -v6
I0907 02:56:01.384657 2722 loader.go:395] Config loaded from file: /etc/kubernetes/admin.conf
I0907 02:56:01.390377 2722 round_trippers.go:553] GET https://myk8s-control-plane:6443/api/v1/nodes?limit=500 200 OK in 3 milliseconds
NAME STATUS ROLES AGE VERSION
myk8s-control-plane Ready control-plane 12m v1.31.0
myk8s-worker Ready <none> 12m v1.31.0

Let us set up a multi-node cluster using Kind. A two-node cluster (one control plane, one worker) is created. The configuration uses extraPortMappings to expose ports 31000 and 31001 from the worker node to the host system, enabling external access to services.

The control-plane node is automatically tainted with node-role.kubernetes.io/control-plane:NoSchedule, preventing regular workloads from being scheduled on it. The worker node has no taints, allowing it to run all workloads.
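
When a workload does need to land on the control plane, it can tolerate the taint; a minimal sketch (the pod name cp-pod is hypothetical):

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: cp-pod
spec:
  nodeSelector:
    kubernetes.io/hostname: myk8s-control-plane
  tolerations:
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
  containers:
  - name: app
    image: nginx:alpine
EOF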

cat <<EOT> kind-2node.yaml
# two-node (one worker) cluster config
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
  extraPortMappings:
  - containerPort: 31000
    hostPort: 31000
    listenAddress: "0.0.0.0" # Optional, defaults to "0.0.0.0"
    protocol: tcp # Optional, defaults to tcp
  - containerPort: 31001
    hostPort: 31001
EOT
kind delete cluster --name myk8s

CLUSTERNAME=myk8s
kind create cluster --config kind-2node.yaml --name $CLUSTERNAME

# Verify the deployment
kind get clusters
kind get nodes --name $CLUSTERNAME

# Check the nodes
kubectl get nodes -o wide

# Check node taints
kubectl describe node $CLUSTERNAME-control-plane | grep Taints
Taints: node-role.kubernetes.io/control-plane:NoSchedule

kubectl describe node $CLUSTERNAME-worker | grep Taints
Taints: <none>

[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-myk8s:N/A)]$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
myk8s-control-plane Ready control-plane 32s v1.31.0 172.19.0.4 <none> Debian GNU/Linux 12 (bookworm) 6.5.0-44-generic containerd://1.7.18
myk8s-worker Ready <none> 21s v1.31.0 172.19.0.3 <none> Debian GNU/Linux 12 (bookworm) 6.5.0-44-generic containerd://1.7.18
[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-myk8s:N/A)]$ kubectl describe node $CLUSTERNAME-control-plane | grep Taints
Taints: node-role.kubernetes.io/control-plane:NoSchedule
[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-myk8s:N/A)]$ kubectl describe node $CLUSTERNAME-worker | grep Taints
Taints: <none>
[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-myk8s:N/A)]$ kind get nodes --name $CLUSTERNAME
myk8s-worker
myk8s-control-plane
[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-myk8s:N/A)]$ kind get clusters
kind
myk8s
[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-myk8s:N/A)]$
# Check the containers: count and names
# As defined in the kind YAML port mappings, connecting to port 31000 on your host PC forwards to TCP port 31000 on the worker node (actually a container)
# In other words, with a NodePort service on TCP 31000 on the worker node, it is reachable from your host PC!

[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-myk8s:N/A)]$ docker port $CLUSTERNAME-worker
31000/tcp -> 0.0.0.0:31000
31001/tcp -> 0.0.0.0:31001

# The cluster uses the 172.19.0.0/16 range for node IPs and 10.244.0.0/16 for pod IPs.
# Virtual Ethernet (veth) interfaces are used to connect pods to the node network namespace.

[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-myk8s:N/A)]$ docker exec -it $CLUSTERNAME-control-plane ip -br -c -4 addr
lo UNKNOWN 127.0.0.1/8
vethf5e85852@if2 UP 10.244.0.1/32
vethc185ffde@if2 UP 10.244.0.1/32
veth1ad0d5d3@if2 UP 10.244.0.1/32
eth0@if139 UP 172.19.0.4/16
[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-myk8s:N/A)]$ docker exec -it $CLUSTERNAME-worker ip -br -c -4 addr
lo UNKNOWN 127.0.0.1/8
eth0@if137 UP 172.19.0.3/16

The Kind configuration and service setup allow accessing both kube-ops-view and Nginx from the host system at localhost:31000 and localhost:31001 respectively; the NodePort services are what make the applications reachable from outside the cluster.

# kube-ops-view
# helm show values geek-cookbook/kube-ops-view
helm repo add geek-cookbook https://geek-cookbook.github.io/charts/
helm install kube-ops-view geek-cookbook/kube-ops-view --version 1.2.2 --set service.main.type=NodePort,service.main.ports.http.nodePort=31000 --set env.TZ="Asia/Seoul" --namespace kube-system

# Verify the installation
[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-myk8s:N/A)]$ kubectl get deploy,pod,svc,ep -n kube-system -l app.kubernetes.io/instance=kube-ops-view
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/kube-ops-view 1/1 1 1 15s

NAME READY STATUS RESTARTS AGE
pod/kube-ops-view-657dbc6cd8-p5rgn 1/1 Running 0 15s

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kube-ops-view NodePort 10.96.238.144 <none> 8080:31000/TCP 15s

NAME ENDPOINTS AGE
endpoints/kube-ops-view 10.244.1.2:8080 15s

# Check the kube-ops-view access URL (1.5x or 2x scale)
[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-myk8s:N/A)]$ echo -e "KUBE-OPS-VIEW URL = http://localhost:31000/#scale=1.5"
KUBE-OPS-VIEW URL = http://localhost:31000/#scale=1.5

[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-myk8s:N/A)]$ curl -v http://localhost:31000/#scale=1.5
* Trying 127.0.0.1:31000...
* Connected to localhost (127.0.0.1) port 31000 (#0)
> GET / HTTP/1.1
> Host: localhost:31000
> User-Agent: curl/7.81.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=utf-8
< Content-Length: 1341
< Date: Sat, 07 Sep 2024 03:01:40 GMT
<
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Kubernetes Operational View 20.4.0</title>
<link rel="shortcut icon" href="static/favicon.ico">
<style>* {padding: 0; margin: 0} body { color: #aaaaff; background: #000; }</style>
<style>
/* latin */
@font-face {
font-family: 'ShareTechMono';
font-style: normal;
font-weight: 400;
/* ShareTechMono-Regular.ttf: Copyright (c) 2012, Carrois Type Design, Ralph du Carrois (www.carrois.com post@carrois.com), with Reserved Font Name 'Share'
License: SIL Open Font License, 1.1 */
src: local('Share Tech Mono'), local('ShareTechMono-Regular'), url(static/sharetechmono.woff2) format('woff2');
unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02C6, U+02DA, U+02DC, U+2000-206F, U+2074, U+20AC, U+2212, U+2215, U+E0FF, U+EFFD, U+F000;
}
</style>
</head>
<body>
<!-- make sure the font is loaded -->
<div id="loading" style="font-family: ShareTechMono">Loading..</div>
<script src="static/build/app-1577af7fd97589e7285e.js"></script>
<script>document.getElementById('loading').style.display = 'none'; const app = new App({"node_link_url_template": null, "pod_link_url_template": null}); app.run()</script>
</body>
* Connection #0 to host localhost left intact
# Deploy the Deployment and Service
cat <<EOF | kubectl create -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy-websrv
spec:
  replicas: 2
  selector:
    matchLabels:
      app: deploy-websrv
  template:
    metadata:
      labels:
        app: deploy-websrv
    spec:
      terminationGracePeriodSeconds: 0
      containers:
      - name: deploy-websrv
        image: nginx:alpine
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: deploy-websrv
spec:
  ports:
  - name: svc-webport
    port: 80
    targetPort: 80
    nodePort: 31001
  selector:
    app: deploy-websrv
  type: NodePort
EOF

[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-myk8s:N/A)]$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7a00ab0d4385 kindest/node:v1.31.0 "/usr/local/bin/entr…" 4 minutes ago Up 4 minutes 0.0.0.0:31000-31001->31000-31001/tcp myk8s-worker
b8798f958835 kindest/node:v1.31.0 "/usr/local/bin/entr…" 4 minutes ago Up 4 minutes 127.0.0.1:33179->6443/tcp myk8s-control-plane
716d84f30898 kindest/node:v1.31.0 "/usr/local/bin/entr…" 21 minutes ago Up 21 minutes 127.0.0.1:33817->6443/tcp kind-control-plane
65677d6f056b konuu/llm_ready:20240904 "/opt/nvidia/nvidia_…" 35 hours ago Up 35 hours 0.0.0.0:8099->8080/tcp, :::8099->8080/tcp sigrid-test
e2798f340496 docker-cuda-llamafactory "/opt/nvidia/nvidia_…" 3 weeks ago Up 3 weeks 0.0.0.0:7860->7860/tcp, :::7860->7860/tcp, 6006/tcp, 8888/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp llamafactory
75329215b6bf a70e34c547e6 "scripts/app-start.sh" 3 weeks ago Up 3 weeks 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp thirsty_burnell
1ce17f355b70 korammar-api "python3" 6 weeks ago Up 3 weeks 0.0.0.0:8889->8889/tcp, :::8889->8889/tcp korammar
[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-myk8s:N/A)]$ kubectl get deploy,svc,ep deploy-websrv
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/deploy-websrv 2/2 2 2 10s

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/deploy-websrv NodePort 10.96.142.208 <none> 80:31001/TCP 10s

NAME ENDPOINTS AGE
endpoints/deploy-websrv 10.244.1.3:80,10.244.1.4:80 10s

open http://localhost:31001
curl -s localhost:31001 | grep -o "<title>.*</title>"
<title>Welcome to nginx!</title>

kubectl delete deploy,svc deploy-websrv

Pause Containers and Network Namespace Creation

https://velog.io/@nigasa12/pod%EC%97%90-%EB%8C%80%ED%95%9C-%EA%B3%A0%EC%B0%B0-feat-sandbox-pod%EC%99%80-pause-container

The Pause container plays a crucial role in Kubernetes networking. It creates and manages the network namespace that is shared by all containers within a Pod. Pause containers hold the network namespace and other shared resources for the pod. They ensure that pod-level resources persist even if application containers crash and restart.

Within the Pod’s network namespace there are two key interfaces: lo (loopback), used for localhost communication within the Pod, and eth0, the primary network interface for external communication.

These namespaces, NET (network), IPC (inter-process communication), and UTS (Unix timesharing system), are then shared with all other containers in the Pod, creating a unified networking environment.

Pods can contain multiple containers, enabling patterns like sidecars within a single network namespace. All containers in a Pod are scheduled on the same node and share the same lifecycle: when a Pod is deleted, all its containers are terminated. Containers within a Pod share the same IP address and can communicate over localhost, distinguished by different ports.

Each Pod gets a unique IP address within the cluster, and this IP is different from the node IP. Pods can communicate across nodes without NAT, using their Pod IPs. Containers in a Pod can mount the same volumes, which enables file-based communication between containers.
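
A minimal sketch of that file-based pattern, using an emptyDir volume shared by two containers (names and images are illustrative):

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: sharevol
spec:
  terminationGracePeriodSeconds: 0
  volumes:
  - name: shared
    emptyDir: {}
  containers:
  - name: writer
    image: busybox
    command: ["sh", "-c", "while true; do date > /data/now; sleep 5; done"]
    volumeMounts:
    - name: shared
      mountPath: /data
  - name: reader
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: shared
      mountPath: /data
EOF
# The reader sees the writer's file through the shared volume
kubectl exec sharevol -c reader -- cat /data/now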

Signal Handling

sigdown: Handles termination signals (SIGINT, SIGTERM).

sigreap: Reaps child processes to prevent zombie processes.

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

static void sigdown(int signo) {
  psignal(signo, "Shutting down, got signal");
  exit(0);
}

static void sigreap(int signo) {
  while (waitpid(-1, NULL, WNOHANG) > 0)
    ;
}

Main Function

  • The main function sets up signal handlers and enters an infinite loop.
  • The pause() system call suspends the process until a signal is received.
  • When the Pause container starts, it initializes the network namespace for the Pod — setting up the lo and eth0 interfaces.
  • The IP address assigned to the Pod is actually assigned to the network namespace held by the Pause container; other containers in the Pod inherit this IP address.
  • The Pause container’s simple design (infinite loop with signal handling) ensures it stays alive as long as the Pod exists. This maintains the network namespace for the entire lifecycle of the Pod.
int main(int argc, char **argv) {
  // ... (signal handling setup)
  for (;;)
    pause();
  // ...
}

PID 1 Role:

  • Checks if it's running as PID 1, which is crucial for its role in the Pod.
if (getpid() != 1)
  fprintf(stderr, "Warning: pause should be the first process\n");
  • Before starting a Pod, kubelet calls RuntimeService.RunPodSandbox. The PodSandbox is the isolated environment created for each pod; it is set up before the actual containers are created and includes crucial steps like network setup.
Kubelet              KubeletGenericRuntimeManager             RemoteRuntime
   +                            +                                   +
   |                            |                                   |
   +---------SyncPod----------->+                                   |
   |                            |                                   |
   |                            +------ Create PodSandbox --------->+
   |                            |                                   |
   |                            |                      NetworkPlugin.SetupPod
   |                            |                                   |
   |                            +<----------------------------------+
   |                            +------ Pull image1 --------------->+
   |                            +<----------------------------------+
   |                            +------ Create container1 --------->+
   |                            +<----------------------------------+
   |                            +------ Start container1 ---------->+
   |                            +<----------------------------------+
   |                            |                                   |
   |                            +------ Pull image2 --------------->+
   |                            +<----------------------------------+
   |                            +------ Create container2 --------->+
   |                            +<----------------------------------+
   |                            +------ Start container2 ---------->+
   |                            +<----------------------------------+
   |                            |                                   |
   +<--------Success------------+                                   |
   |                            |                                   |
   +                            +                                   +

The setup — for the sake of understanding the behaviour of Pause Containers — involves creating a Kind cluster with one control-plane node and one worker node.

# Deploy a 'one control-plane, one worker node' cluster: set up port mappings to reach pods
cat <<EOT> kind-2node.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
  extraPortMappings:
  - containerPort: 30000
    hostPort: 30000
  - containerPort: 30001
    hostPort: 30001
EOT

kind create cluster --config kind-2node.yaml --name myk8s

docker exec -it myk8s-control-plane sh -c 'apt update && apt install tree jq psmisc lsof wget bridge-utils tcpdump htop git nano -y'
docker exec -it myk8s-worker sh -c 'apt update && apt install tree jq psmisc lsof wget bridge-utils tcpdump htop -y'

[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-myk8s:N/A)]$ kubectl get nodes -o wide
docker ps
docker port myk8s-worker
docker exec -it myk8s-control-plane ip -br -c -4 addr
docker exec -it myk8s-worker ip -br -c -4 addr
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
myk8s-control-plane Ready control-plane 58s v1.31.0 172.19.0.2 <none> Debian GNU/Linux 12 (bookworm) 6.5.0-44-generic containerd://1.7.18
myk8s-worker Ready <none> 47s v1.31.0 172.19.0.3 <none> Debian GNU/Linux 12 (bookworm) 6.5.0-44-generic containerd://1.7.18
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ab986c0487d9 kindest/node:v1.31.0 "/usr/local/bin/entr…" About a minute ago Up About a minute 127.0.0.1:45453->6443/tcp myk8s-control-plane
9a1cc012a541 kindest/node:v1.31.0 "/usr/local/bin/entr…" About a minute ago Up About a minute 0.0.0.0:30000-30001->30000-30001/tcp myk8s-worker
65677d6f056b konuu/llm_ready:20240904 "/opt/nvidia/nvidia_…" 35 hours ago Up 35 hours 0.0.0.0:8099->8080/tcp, :::8099->8080/tcp sigrid-test
e2798f340496 docker-cuda-llamafactory "/opt/nvidia/nvidia_…" 3 weeks ago Up 3 weeks 0.0.0.0:7860->7860/tcp, :::7860->7860/tcp, 6006/tcp, 8888/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp llamafactory
75329215b6bf a70e34c547e6 "scripts/app-start.sh" 3 weeks ago Up 3 weeks 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp thirsty_burnell
1ce17f355b70 korammar-api "python3" 6 weeks ago Up 3 weeks 0.0.0.0:8889->8889/tcp, :::8889->8889/tcp korammar
30000/tcp -> 0.0.0.0:30000
30001/tcp -> 0.0.0.0:30001
lo UNKNOWN 127.0.0.1/8
veth5ff5466c@if2 UP 10.244.0.1/32
veth03284ca8@if2 UP 10.244.0.1/32
veth849a5a76@if2 UP 10.244.0.1/32
eth0@if141 UP 172.19.0.2/16
lo UNKNOWN 127.0.0.1/8
eth0@if143 UP 172.19.0.3/16

helm repo add geek-cookbook https://geek-cookbook.github.io/charts/
helm install kube-ops-view geek-cookbook/kube-ops-view --version 1.2.2 --set service.main.type=NodePort,service.main.ports.http.nodePort=30000 --set env.TZ="Asia/Seoul" --namespace kube-system


# Verify the installation
kubectl get deploy,pod,svc,ep -n kube-system -l app.kubernetes.io/instance=kube-ops-view

# Check the kube-ops-view access URL (1.5x or 2x scale): for macOS users
echo -e "KUBE-OPS-VIEW URL = http://localhost:30000/#scale=1.5"

The output below demonstrates how the pause container and application containers share namespaces. The pause container creates network, UTS, and IPC namespaces, which are then shared with the application container (in this case, the python3 process for kube-ops-view). The application container has its own mount, PID, and cgroup namespaces.

The pstree output provides a detailed view of the processes running on the worker node, including containerd, kubelet, and various pods.

The lsns command outputs show the namespaces for different processes, revealing how containers are isolated.

[sigridjineth@sigridjineth-Z590-VISION-G ~ (⎈|kind-myk8s:N/A)]$ docker exec -it myk8s-worker bash

root@myk8s-worker:/# crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD
2b9c27be23ead a645de6a07a3d 26 seconds ago Running kube-ops-view 0 3691d12c8e8e5 kube-ops-view-657dbc6cd8-b2fsn
0cf367e375c0f 12968670680f4 About a minute ago Running kindnet-cni 0 577a47bf1458e kindnet-5hf7q
69c54a6270ecb af3ec60a3d89b About a minute ago Running kube-proxy 0 0513f602f6d0e kube-proxy-kds58

root@myk8s-worker:/# pstree -aln
systemd
|-systemd-journal
|-containerd
| `-24*[{containerd}]
|-kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///run/containerd/containerd.sock --node-ip=172.19.0.3 --node-labels= --pod-infra-container-image=registry.k8s.io/pause:3.10 --provider-id=kind://docker/myk8s/myk8s-worker --runtime-cgroups=/system.slice/containerd.service
| `-23*[{kubelet}]
|-containerd-shim -namespace k8s.io -id 0513f602f6d0e58def532bba509c0fa88ac85e45e57eaa20e6b4d2aecc74cafb -address /run/containerd/containerd.sock
| |-11*[{containerd-shim}]
| |-pause
| `-kube-proxy --config=/var/lib/kube-proxy/config.conf --hostname-override=myk8s-worker
| `-17*[{kube-proxy}]
|-containerd-shim -namespace k8s.io -id 577a47bf1458e2a6d86298a44fd12e1db4e421279e125e943c0436172b4c7e01 -address /run/containerd/containerd.sock
| |-11*[{containerd-shim}]
| |-pause
| `-kindnetd
| `-15*[{kindnetd}]
`-containerd-shim -namespace k8s.io -id 3691d12c8e8e596519e6b13ec678592a5e934812f0689c6e718bc5534743d4dc -address /run/containerd/containerd.sock
|-11*[{containerd-shim}]
|-pause
`-python3 -m kube_ops_view
`-2*[{python3}]

root@myk8s-worker:/# pstree -aclnpsS
systemd,1
|-systemd-journal,96
|-containerd,111
| |-{containerd},112
| |-{containerd},113
| |-{containerd},114
| |-{containerd},115
| |-{containerd},116
| |-{containerd},117
| |-{containerd},118
| |-{containerd},119
| |-{containerd},121
| |-{containerd},122
| |-{containerd},123
| |-{containerd},124
| |-{containerd},125
| |-{containerd},126
| |-{containerd},127
| |-{containerd},128
| |-{containerd},129
| |-{containerd},130
| |-{containerd},131
| |-{containerd},356
| |-{containerd},379
| |-{containerd},380
| |-{containerd},393
| `-{containerd},394
|-kubelet,259 --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///run/containerd/containerd.sock --node-ip=172.19.0.3 --node-labels= --pod-infra-container-image=registry.k8s.io/pause:3.10 --provider-id=kind://docker/myk8s/myk8s-worker --runtime-cgroups=/system.slice/containerd.service
| |-{kubelet},260
| |-{kubelet},261
| |-{kubelet},262
| |-{kubelet},263
| |-{kubelet},264
| |-{kubelet},265
| |-{kubelet},266
| |-{kubelet},267
| |-{kubelet},268
| |-{kubelet},269
| |-{kubelet},270
| |-{kubelet},271
| |-{kubelet},272
| |-{kubelet},273
| |-{kubelet},275
| |-{kubelet},276
| |-{kubelet},277
| |-{kubelet},278
| |-{kubelet},279
| |-{kubelet},283
| |-{kubelet},284
| |-{kubelet},369
| `-{kubelet},673
|-containerd-shim,313 -namespace k8s.io -id 0513f602f6d0e58def532bba509c0fa88ac85e45e57eaa20e6b4d2aecc74cafb -address /run/containerd/containerd.sock
| |-{containerd-shim},314
| |-{containerd-shim},315
| |-{containerd-shim},316
| |-{containerd-shim},317
| |-{containerd-shim},318
| |-{containerd-shim},319
| |-{containerd-shim},320
| |-{containerd-shim},321
| |-pause,358,ipc,mnt,pid
| |-{containerd-shim},367
| |-kube-proxy,413,ipc,mnt,pid --config=/var/lib/kube-proxy/config.conf --hostname-override=myk8s-worker
| | |-{kube-proxy},432
| | |-{kube-proxy},433
| | |-{kube-proxy},434
| | |-{kube-proxy},435
| | |-{kube-proxy},436
| | |-{kube-proxy},437
| | |-{kube-proxy},438
| | |-{kube-proxy},439
| | |-{kube-proxy},440
| | |-{kube-proxy},441
| | |-{kube-proxy},442
| | |-{kube-proxy},443
| | |-{kube-proxy},444
| | |-{kube-proxy},445
| | |-{kube-proxy},446
| | |-{kube-proxy},447
| | `-{kube-proxy},448
| |-{containerd-shim},431
| `-{containerd-shim},664
|-containerd-shim,339 -namespace k8s.io -id 577a47bf1458e2a6d86298a44fd12e1db4e421279e125e943c0436172b4c7e01 -address /run/containerd/containerd.sock
| |-{containerd-shim},340
| |-{containerd-shim},341
| |-{containerd-shim},342
| |-{containerd-shim},343
| |-{containerd-shim},344
| |-{containerd-shim},345
| |-{containerd-shim},346
| |-{containerd-shim},347
| |-pause,366,ipc,mnt,pid
| |-{containerd-shim},382
| |-kindnetd,600,cgroup,ipc,mnt,pid
| | |-{kindnetd},619
| | |-{kindnetd},620
| | |-{kindnetd},621
| | |-{kindnetd},622
| | |-{kindnetd},623
| | |-{kindnetd},624
| | |-{kindnetd},625
| | |-{kindnetd},626
| | |-{kindnetd},627
| | |-{kindnetd},654
| | |-{kindnetd},655
| | |-{kindnetd},656
| | |-{kindnetd},657
| | |-{kindnetd},658
| | `-{kindnetd},659
| |-{containerd-shim},612
| `-{containerd-shim},670
`-containerd-shim,1124 -namespace k8s.io -id 3691d12c8e8e596519e6b13ec678592a5e934812f0689c6e718bc5534743d4dc -address /run/containerd/containerd.sock
|-{containerd-shim},1125
|-{containerd-shim},1126
|-{containerd-shim},1127
|-{containerd-shim},1128
|-{containerd-shim},1129
|-{containerd-shim},1130
|-{containerd-shim},1131
|-{containerd-shim},1132
|-pause,1144,ipc,mnt,net,pid,uts
|-{containerd-shim},1150
|-python3,1206,cgroup,ipc,mnt,net,pid,uts -m kube_ops_view
| |-{python3},1226
| `-{python3},1227
|-{containerd-shim},1218
`-{containerd-shim},1228
root@myk8s-worker:/#

lsns -p 1
lsns -p $$
NS TYPE NPROCS PID USER COMMAND
4026531834 time 15 1 root /sbin/init
4026531837 user 15 1 root /sbin/init
4026533402 mnt 9 1 root /sbin/init
4026533403 uts 13 1 root /sbin/init
4026533404 ipc 9 1 root /sbin/init
4026533405 pid 9 1 root /sbin/init
4026533406 net 13 1 root /sbin/init
4026533471 cgroup 13 1 root /sbin/init
NS TYPE NPROCS PID USER COMMAND
4026531834 time 15 1 root /sbin/init
4026531837 user 15 1 root /sbin/init
4026533402 mnt 9 1 root /sbin/init
4026533403 uts 13 1 root /sbin/init
4026533404 ipc 9 1 root /sbin/init
4026533405 pid 9 1 root /sbin/init
4026533406 net 13 1 root /sbin/init
4026533471 cgroup 13 1 root /sbin/init
# The pause container in this pod holds 5 namespaces distinct from the host: mnt/pid are used only by pause itself; net/uts/ipc are created up front for the app containers
lsns -p 1797

# The app container (metrics-server here) holds 6 namespaces distinct from the host: mnt/pid/cgroup are its own; net/uts/ipc are shared from those the pause container created
pgrep python3
lsns -p $(pgrep python3)
pgrep metrics-server
lsns -p $(pgrep metrics-server)
# Check the containerd socket file
ls -l /run/containerd/containerd.sock

# Check which processes are using a specific socket file
root@myk8s-worker:/# ls -l /run/containerd/containerd.sock
srw-rw---- 1 root root 0 Sep 7 03:13 /run/containerd/containerd.sock
root@myk8s-worker:/# lsof /run/containerd/containerd.sock
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
container 111 root 9u unix 0xffff8a749e4dfb40 0t0 7444391 /run/containerd/containerd.sock type=STREAM (LISTEN)
container 111 root 11u unix 0xffff8a745e20e600 0t0 7466450 /run/containerd/containerd.sock type=STREAM (CONNECTED)
container 111 root 12u unix 0xffff8a753154d0c0 0t0 7465364 /run/containerd/containerd.sock type=STREAM (CONNECTED)
container 111 root 13u unix 0xffff8a7449aa0880 0t0 7459741 /run/containerd/containerd.sock type=STREAM (CONNECTED)
kubelet 259 root 20u unix 0xffff8a788f1d8000 0t0 7462568 /var/lib/kubelet/pod-resources/1371288139 type=STREAM (LISTEN)

# Check listening unix sockets
ss -xl | egrep 'Netid|containerd'
Netid State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
u_str LISTEN 0 4096 /run/containerd/s/dbff367de6707d471ba410291b01bc91e8d8d4b51ea8f3b9f83734f8899c221e 7481729 * 0
u_str LISTEN 0 4096 /run/containerd/containerd.sock.ttrpc 7444388 * 0
u_str LISTEN 0 4096 /run/containerd/containerd.sock 7444391 * 0
u_str LISTEN 0 4096 /run/containerd/s/862f24a132e23854837ff59db92293f9b05b5b1007ab35bb21a64c8086b7486b 7460816 * 0
u_str LISTEN 0 4096 /run/containerd/s/c4bc147a4720ce5b3134ba9974317bb3ebe250170f128bbfe6b17120999091c7 7465405 * 0

# Check mounts and the cgroup filesystem
findmnt -A
findmnt -t cgroup2
grep cgroup /proc/filesystems
stat -fc %T /sys/fs/cgroup/

root@myk8s-worker:/# findmnt -A
findmnt -t cgroup2
grep cgroup /proc/filesystems
stat -fc %T /sys/fs/cgroup/
TARGET SOURCE FSTYPE OPTIONS
/ overlay overlay rw,relatime,l
|-/kind/product_uuid overlay[/kind/product_uuid] overlay rw,relatime,l
|-/proc proc proc rw,nosuid,nod
| `-/proc/sys/fs/binfmt_misc systemd-1 autofs rw,relatime,f
|-/dev tmpfs tmpfs rw,nosuid,siz
| |-/dev/hugepages hugetlbfs hugetlbfs rw,relatime,p
| |-/dev/console devpts[/0] devpts rw,nosuid,noe
| |-/dev/pts devpts devpts rw,nosuid,noe
| |-/dev/mqueue mqueue mqueue rw,nosuid,nod
| `-/dev/shm shm tmpfs rw,nosuid,nod
|-/sys sysfs sysfs ro,nosuid,nod
| |-/sys/devices/virtual/dmi/id/product_name overlay[/kind/product_name] overlay ro,relatime,l
| |-/sys/devices/virtual/dmi/id/product_uuid overlay[/kind/product_uuid] overlay ro,relatime,l
| | `-/sys/devices/virtual/dmi/id/product_uuid overlay[/kind/product_uuid] overlay ro,relatime,l
| |-/sys/kernel/debug debugfs debugfs rw,nosuid,nod
| |-/sys/kernel/tracing tracefs tracefs rw,nosuid,nod
| |-/sys/fs/fuse/connections fusectl fusectl rw,nosuid,nod
| |-/sys/kernel/config configfs configfs rw,nosuid,nod
| `-/sys/fs/cgroup cgroup cgroup2 rw,nosuid,nod
|-/run tmpfs tmpfs rw,nosuid,nod
| |-/run/lock tmpfs tmpfs rw,nosuid,nod
| |-/run/credentials/systemd-sysctl.service ramfs ramfs ro,nosuid,nod
| |-/run/credentials/systemd-sysusers.service ramfs ramfs ro,nosuid,nod
| |-/run/credentials/systemd-tmpfiles-setup-dev.service ramfs ramfs ro,nosuid,nod
| |-/run/containerd/io.containerd.grpc.v1.cri/sandboxes/0513f602f6d0e58def532bba509c0fa88ac85e45e57eaa20e6b4d2aecc74cafb/shm
| | shm tmpfs rw,nosuid,nod
| |-/run/containerd/io.containerd.runtime.v2.task/k8s.io/0513f602f6d0e58def532bba509c0fa88ac85e45e57eaa20e6b4d2aecc74cafb/rootfs
| | overlay overlay rw,relatime,l
| |-/run/containerd/io.containerd.grpc.v1.cri/sandboxes/577a47bf1458e2a6d86298a44fd12e1db4e421279e125e943c0436172b4c7e01/shm
| | shm tmpfs rw,nosuid,nod
| |-/run/containerd/io.containerd.runtime.v2.task/k8s.io/577a47bf1458e2a6d86298a44fd12e1db4e421279e125e943c0436172b4c7e01/rootfs
| | overlay overlay rw,relatime,l
| |-/run/containerd/io.containerd.runtime.v2.task/k8s.io/69c54a6270ecb1a845b6d6d1317d56db7404d95e4b34328a7cc52ff554ebbfe2/rootfs
| | overlay overlay rw,relatime,l
| |-/run/containerd/io.containerd.runtime.v2.task/k8s.io/0cf367e375c0f7d8d92ff4d9cd650058a41e02baa09897c404e5a95f0c9fff60/rootfs
| | overlay overlay rw,relatime,l
| |-/run/containerd/io.containerd.grpc.v1.cri/sandboxes/3691d12c8e8e596519e6b13ec678592a5e934812f0689c6e718bc5534743d4dc/shm
| | shm tmpfs rw,nosuid,nod
| |-/run/netns/cni-28887e40-76be-1ae2-c009-2236a0e332a3 nsfs[net:[4026533716]] nsfs rw
| |-/run/containerd/io.containerd.runtime.v2.task/k8s.io/3691d12c8e8e596519e6b13ec678592a5e934812f0689c6e718bc5534743d4dc/rootfs
| | overlay overlay rw,relatime,l
| `-/run/containerd/io.containerd.runtime.v2.task/k8s.io/2b9c27be23ead48ccb30ec660663e3da89bc1db6065dc07efe5cf7bf6210b8d1/rootfs
| overlay overlay rw,relatime,l
|-/tmp tmpfs tmpfs rw,nosuid,nod
|-/var /dev/nvme0n1p2[/var/lib/docker/volumes/ce8db2462b09e4fd25a3ac0b16d1133f1e2a618ebb762d8aed28b56106e714f2/_data]
| ext4 rw,relatime,e
| |-/var/lib/kubelet/pods/4b39e47c-565f-423d-a43a-c20de4e6e8f9/volumes/kubernetes.io~projected/kube-api-access-hvdgm
| | tmpfs tmpfs rw,relatime,s
| |-/var/lib/kubelet/pods/0e710f03-13f3-4b97-b550-f27c955aa524/volumes/kubernetes.io~projected/kube-api-access-l9g7f
| | tmpfs tmpfs rw,relatime,s
| `-/var/lib/kubelet/pods/9e9a22b6-fc03-4016-8099-b4b60fc1ebe4/volumes/kubernetes.io~projected/kube-api-access-l6h95
| tmpfs tmpfs rw,relatime,s
|-/usr/lib/modules /dev/nvme0n1p2[/usr/lib/modules] ext4 ro,relatime,e
|-/etc/resolv.conf /dev/nvme0n1p2[/var/lib/docker/containers/9a1cc012a541fb75a6f72a7d82b4beac3569d875ee86acb4c11556627737bfc5/resolv.conf]
| ext4 rw,relatime,e
|-/etc/hostname /dev/nvme0n1p2[/var/lib/docker/containers/9a1cc012a541fb75a6f72a7d82b4beac3569d875ee86acb4c11556627737bfc5/hostname]
| ext4 rw,relatime,e
`-/etc/hosts /dev/nvme0n1p2[/var/lib/docker/containers/9a1cc012a541fb75a6f72a7d82b4beac3569d875ee86acb4c11556627737bfc5/hosts]
ext4 rw,relatime,e
TARGET SOURCE FSTYPE OPTIONS
/sys/fs/cgroup cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot
nodev cgroup
nodev cgroup2
cgroup2fs
root@myk8s-worker:/#

The lsns -p $$ command shows the namespaces of the current shell process, which is part of the host system. All namespaces (time, user, mnt, uts, ipc, pid, net, cgroup) are associated with PID 1 (/sbin/init), which indicates that the process is using the root namespaces of the system.

root@myk8s-worker:/# lsns -p $$
NS TYPE NPROCS PID USER COMMAND
4026531834 time 15 1 root /sbin/init
4026531837 user 15 1 root /sbin/init
4026533402 mnt 9 1 root /sbin/init
4026533403 uts 13 1 root /sbin/init
4026533404 ipc 9 1 root /sbin/init
4026533405 pid 9 1 root /sbin/init
4026533406 net 13 1 root /sbin/init
4026533471 cgroup 13 1 root /sbin/init

The lsns -p $(pgrep python3) output shows the namespaces of the Python process running in a container. The time and user namespaces are shared with the host (PID 1), which is common in container setups.

Network (net), UTS, and IPC namespaces are associated with PID 1144, which is the pause container. This demonstrates how the pause container creates and holds these namespaces for the pod.

Mount (mnt), PID, and Cgroup namespaces: These are unique to the Python process (PID 1206). This isolation provides the container with its own file system view, process tree, and resource control group.

While sharing some namespaces with the pause container, the Python process has its own mount, PID, and cgroup namespaces. This allows for file system and process isolation from other containers and the host, while still enabling necessary communication within the pod.

root@myk8s-worker:/# lsns -p $(pgrep python3)
NS TYPE NPROCS PID USER COMMAND
4026531834 time 15 1 root /sbin/init # host process ns
4026531837 user 15 1 root /sbin/init # host process ns
4026533716 net 2 1144 65535 /pause # uses the ns created by the pause container
4026533776 uts 2 1144 65535 /pause # uses the ns created by the pause container
4026533777 ipc 2 1144 65535 /pause # uses the ns created by the pause container
4026533779 mnt 1 1206 1000 python3 -m kube_ops_view # its own ns
4026533780 pid 1 1206 1000 python3 -m kube_ops_view # its own ns
4026533781 cgroup 1 1206 1000 python3 -m kube_ops_view # its own ns
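
The same sharing can be confirmed by comparing namespace inodes under /proc, using the PIDs from the output above (1144 for pause, 1206 for python3; they change on every run):

# Identical net inodes: pause and python3 share one network namespace
readlink /proc/1144/ns/net /proc/1206/ns/net
# Different mnt inodes: each keeps its own mount namespace
readlink /proc/1144/ns/mnt /proc/1206/ns/mnt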

Let us create a pod (myweb2) with two containers, nginx and netshoot. These application containers share the network namespace created by the pause container, as the outputs below demonstrate.
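
The myweb2 manifest itself is not shown here; a sketch reconstructed from the container list and process output below (the exact lab manifest may differ):

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: myweb2
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: myweb2-nginx
    image: nginx
  - name: myweb2-netshoot
    image: nicolaka/netshoot
    command: ["/bin/bash", "-c", "while true; do sleep 5; curl localhost; done"]
EOF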

sigridjineth@sigridjineth-Z590-VISION-G:~$ kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
myweb 1/1 Running 0 116m 10.244.1.3 myk8s-worker <none> <none>
myweb2 2/2 Running 0 115m 10.244.1.4 myk8s-worker <none> <none>
sigridjineth@sigridjineth-Z590-VISION-G:~$ kubectl get pod myweb2 -o jsonpath='{range .status.containerStatuses[*]}{"Container Name: "}{.name}{"\nContainer ID: "}{.containerID}{"\nImage: "}{.image}{"\n\n"}{end}'
Container Name: myweb2-netshoot
Container ID: containerd://70215de6c6931310ba8aa723ec35a6e1b44534bdeff530a882031be54815c6b5
Image: docker.io/nicolaka/netshoot:latest

Container Name: myweb2-nginx
Container ID: containerd://ab84723eb21e7557d7d51ba67311b8900d1f774dcec89f6f54ab08ac40541e32
Image: docker.io/library/nginx:latest

Both the nginx and netshoot containers show the same IP address (10.244.1.4) when queried using ip addr or ifconfig. This confirms that containers within the same pod share the same network namespace.

kubectl exec myweb2 -c myweb2-netshoot -- ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host proto kernel_lo
valid_lft forever preferred_lft forever
2: eth0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 0e:a8:7a:08:ed:c8 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.244.1.4/24 brd 10.244.1.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::ca8:7aff:fe08:edc8/64 scope link proto kernel_ll
valid_lft forever preferred_lft forever

sigridjineth@sigridjineth-Z590-VISION-G:~$ kubectl exec myweb2 -c myweb2-nginx -- ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.244.1.4 netmask 255.255.255.0 broadcast 10.244.1.255
inet6 fe80::ca8:7aff:fe08:edc8 prefixlen 64 scopeid 0x20<link>
ether 0e:a8:7a:08:ed:c8 txqueuelen 0 (Ethernet)
RX packets 774 bytes 9535550 (9.0 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 635 bytes 43856 (42.8 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 16692 bytes 2510755 (2.3 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 16692 bytes 2510755 (2.3 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

kubectl exec myweb2 -c myweb2-netshoot -it -- zsh
dP dP dP
88 88 88
88d888b. .d8888b. d8888P .d8888b. 88d888b. .d8888b. .d8888b. d8888P
88' `88 88ooood8 88 Y8ooooo. 88' `88 88' `88 88' `88 88
88 88 88. ... 88 88 88 88 88. .88 88. .88 88
dP dP `88888P' dP `88888P' dP dP `88888P' `88888P' dP

Welcome to Netshoot! (github.com/nicolaka/netshoot)
Version: 0.13

myweb2  ~  ifconfig
eth0 Link encap:Ethernet HWaddr 0E:A8:7A:08:ED:C8
inet addr:10.244.1.4 Bcast:10.244.1.255 Mask:255.255.255.0
inet6 addr: fe80::ca8:7aff:fe08:edc8/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:774 errors:0 dropped:0 overruns:0 frame:0
TX packets:635 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:9535550 (9.0 MiB) TX bytes:43856 (42.8 KiB)

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:16764 errors:0 dropped:0 overruns:0 frame:0
TX packets:16764 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2521585 (2.4 MiB) TX bytes:2521585 (2.4 MiB)

myweb2  ~  ss -tnlp

State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 511 0.0.0.0:80 0.0.0.0:*
LISTEN 0 511 [::]:80 [::]:*

myweb2  ~  curl localhost # not the nginx container, yet localhost responds because tcp 80 is listening in the shared network namespace
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

ps -ef # the nginx processes are not visible: each container has its own PID namespace
PID USER TIME COMMAND
1 root 0:00 /bin/bash -c while true; do sleep 5; curl localhost; done
2808 root 0:00 zsh
2901 root 0:00 sleep 5
2904 root 0:00 ps -ef
kubectl logs -f myweb2 -c myweb2-nginx

::1 - - [07/Sep/2024:15:49:44 +0000] "GET / HTTP/1.1" 200 615 "-" "curl/8.7.1" "-"
::1 - - [07/Sep/2024:15:49:49 +0000] "GET / HTTP/1.1" 200 615 "-" "curl/8.7.1" "-"
::1 - - [07/Sep/2024:15:49:54 +0000] "GET / HTTP/1.1" 200 615 "-" "curl/8.7.1" "-"
Note that the access logs show ::1 as the client address: the curl loop in the netshoot container reaches nginx over the pod's shared loopback interface.

# Check container info: both containers of myweb2 show the same POD ID and POD name

sigridjineth@sigridjineth-Z590-VISION-G:~$ docker exec -it myk8s-worker bash
root@myk8s-worker:/# crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD
70215de6c6931 27b858cdcd8ac 2 hours ago Running myweb2-netshoot 0 31bd0af24b570 myweb2
ab84723eb21e7 39286ab8a5e14 2 hours ago Running myweb2-nginx 0 31bd0af24b570 myweb2
4c98f0dd6af12 c7b4f26a7d93f 2 hours ago Running myweb-container 0 089fc095b649c myweb
2b9c27be23ead a645de6a07a3d 13 hours ago Running kube-ops-view 0 3691d12c8e8e5 kube-ops-view-657dbc6cd8-b2fsn
0cf367e375c0f 12968670680f4 13 hours ago Running kindnet-cni 0 577a47bf1458e kindnet-5hf7q
69c54a6270ecb af3ec60a3d89b 13 hours ago Running kube-proxy 0 0513f602f6d0e kube-proxy-kds58
root@myk8s-worker:/#
root@myk8s-worker:/# ps -ef | grep 'nginx -g' | grep -v grep
root 7802 7711 0 13:51 ? 00:00:00 nginx: master process nginx -g daemon off;
root 8007 7929 0 13:52 ? 00:00:00 nginx: master process nginx -g daemon off;
root@myk8s-worker:/# ps -ef | grep 'curl' | grep -v grep
root 8143 7929 0 13:52 ? 00:00:00 /bin/bash -c while true; do sleep 5; curl localhost; done

The lsns command outputs show that while the mount (mnt), PID (pid), and cgroup namespaces are separate for each container, the network (net), UTS, and IPC namespaces are shared among containers in the same pod.

Time and user namespaces are not isolated; they are the same as the host's, so containers in a pod run in the host's time and user contexts. Mount (mnt), PID, and cgroup namespaces are isolated for each container within a pod, which provides separation of filesystems and process IDs between containers.

IPC, UTS, and Network Namespaces are shared among all containers within a pod — IPC (Inter-Process Communication) allows containers to use shared memory, semaphores, and message queues. Shared UTS namespace means containers in a pod have the same hostname. Shared network namespace means all containers in a pod use the same IP address and port space.
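To see the shared IPC namespace in action, you can create a System V shared-memory segment in one container and list it from the other; a sketch assuming both images ship util-linux's ipcmk and ipcs:

# Create a 64-byte shared-memory segment from the netshoot container...
kubectl exec myweb2 -c myweb2-netshoot -- ipcmk -M 64
# ...then list shared-memory segments from the nginx container: the segment is visible there too.
kubectl exec myweb2 -c myweb2-nginx -- ipcs -m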

The role of the pause container is to create and maintain the shared namespaces (IPC, Network, UTS); the other containers in the pod then join these namespaces. If a user-created container terminates abnormally, the shared namespaces are unaffected, because the pause container that created them keeps running and thereby preserves the pod's network identity and IPC resources.
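This resilience is easy to verify: stop the nginx master so its container exits, and watch kubelet restart it inside the same sandbox while the pod IP stays put (a sketch; the restart count increments under the default restartPolicy: Always):

kubectl get pod myweb2 -owide                         # note the IP (10.244.1.4)
kubectl exec myweb2 -c myweb2-nginx -- nginx -s stop  # the nginx container exits
kubectl get pod myweb2 -owide                         # RESTARTS has increased; the IP is unchanged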

root@myk8s-worker:/# NGINXPID=$(ps -ef | grep 'nginx -g' | grep -v grep | awk '{print $2}')
root@myk8s-worker:/# echo $NGINXPID
7802 8007
# note: two PIDs are captured because nginx is running in both the myweb and myweb2 pods on this node

root@myk8s-worker:/# NETSHPID=$(ps -ef | grep 'curl' | grep -v grep | awk '{print $2}')
root@myk8s-worker:/# echo $NETSHPID
8143
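The grep-based capture above is fragile, as the two nginx PIDs show. An alternative is to ask the runtime for each container's PID directly; a sketch assuming containerd's crictl inspect JSON schema (the .info.pid and .status.metadata.name fields) and go-template output support:

# Print each running container's host PID and name straight from the runtime metadata.
for cid in $(crictl ps -q); do
  pid=$(crictl inspect -o go-template --template '{{.info.pid}}' "$cid")
  name=$(crictl inspect -o go-template --template '{{.status.metadata.name}}' "$cid")
  echo "$pid  $name"
done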


# Check the namespace info of each container within the pod
## the time and user namespaces are not isolated; they match the host's
## the mnt, pid, and cgroup namespaces are isolated per container
## the ipc, uts, and net namespaces are shared between the containers in a pod (IPC: container processes can share System V IPC objects such as shared memory and semaphores, and POSIX message queues)
## the pause container creates and holds the IPC, Network, and UTS namespaces -> the remaining containers join and use them
## this prevents an abnormal exit of a user container from breaking the namespaces shared across the whole pod

root@myk8s-worker:/# lsns -p $NGINXPID
lsns: --task is mutually exclusive with <namespace>
# fails because $NGINXPID holds two PIDs ("7802 8007"), so lsns parses the second
# one as a namespace argument

# check the namespaces of the netshoot process (note the /pause entries):
root@myk8s-worker:/# lsns -p $NETSHPID
NS TYPE NPROCS PID USER COMMAND
4026531834 time 56 1 root /sbin/init
4026531837 user 56 1 root /sbin/init
4026533849 net 20 7948 65535 /pause
4026533909 uts 20 7948 65535 /pause
4026533910 ipc 20 7948 65535 /pause
4026533915 mnt 2 8143 root /bin/bash -c while true; do sleep 5; curl localhost; done
4026533916 pid 2 8143 root /bin/bash -c while true; do sleep 5; curl localhost; done
4026533917 cgroup 2 8143 root /bin/bash -c while true; do sleep 5; curl localhost; done
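
Next the author inspects a pause process by PID. Note that PID 1144 is the sandbox of a different pod (its nsenter output further below shows IP 10.244.1.2, not myweb2's 10.244.1.4); myweb2's pause is PID 7948, as the /pause entries above show. To fetch a specific pod's pause PID from the runtime instead of picking it by hand, a sketch assuming containerd's crictl inspectp schema:

# Resolve the myweb2 sandbox (pause) PID via the pod sandbox ID.
PODID=$(crictl pods --name myweb2 -q)
PAUSEPID=$(crictl inspectp -o go-template --template '{{.info.pid}}' "$PODID")
echo "$PAUSEPID"   # expected to be 7948 on this node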

root@myk8s-worker:/# PAUSEPID=1144
root@myk8s-worker:/# lsns -p $PAUSEPID
NS TYPE NPROCS PID USER COMMAND
4026531834 time 56 1 root /sbin/init
4026531837 user 56 1 root /sbin/init
4026533471 cgroup 18 1 root /sbin/init
4026533716 net 2 1144 65535 /pause
4026533775 mnt 1 1144 65535 /pause
4026533776 uts 2 1144 65535 /pause
4026533777 ipc 2 1144 65535 /pause
4026533778 pid 1 1144 65535 /pause
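
As a cross-check, you can compare the namespace symlinks under /proc directly; identical inode numbers mean a shared namespace. A sketch using myweb2's pause (7948) and netshoot (8143) PIDs captured above:

# net/uts/ipc should report the same inode for both PIDs; mnt/pid should differ.
for pid in 7948 8143; do
  echo "== PID $pid"
  readlink /proc/$pid/ns/net /proc/$pid/ns/uts /proc/$pid/ns/ipc /proc/$pid/ns/mnt /proc/$pid/ns/pid
done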

root@myk8s-worker:/# crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD
70215de6c6931 27b858cdcd8ac 2 hours ago Running myweb2-netshoot 0 31bd0af24b570 myweb2
ab84723eb21e7 39286ab8a5e14 2 hours ago Running myweb2-nginx 0 31bd0af24b570 myweb2
4c98f0dd6af12 c7b4f26a7d93f 2 hours ago Running myweb-container 0 089fc095b649c myweb
2b9c27be23ead a645de6a07a3d 13 hours ago Running kube-ops-view 0 3691d12c8e8e5 kube-ops-view-657dbc6cd8-b2fsn
0cf367e375c0f 12968670680f4 13 hours ago Running kindnet-cni 0 577a47bf1458e kindnet-5hf7q
69c54a6270ecb af3ec60a3d89b 13 hours ago Running kube-proxy 0 0513f602f6d0e kube-proxy-kds58
root@myk8s-worker:/# crictl ps -q
70215de6c6931310ba8aa723ec35a6e1b44534bdeff530a882031be54815c6b5
ab84723eb21e7557d7d51ba67311b8900d1f774dcec89f6f54ab08ac40541e32
4c98f0dd6af12e8944a81de5192cd86aa3f2db49684cbb61f54ef7caa8a63eaa
2b9c27be23ead48ccb30ec660663e3da89bc1db6065dc07efe5cf7bf6210b8d1
0cf367e375c0f7d8d92ff4d9cd650058a41e02baa09897c404e5a95f0c9fff60
69c54a6270ecb1a845b6d6d1317d56db7404d95e4b34328a7cc52ff554ebbfe2

# Run a command in each individual container: confirm the IPs are identical
root@myk8s-worker:/# crictl exec -its 70215de6c6931310ba8aa723ec35a6e1b44534bdeff530a882031be54815c6b5 ifconfig
eth0 Link encap:Ethernet HWaddr 0E:A8:7A:08:ED:C8
inet addr:10.244.1.4 Bcast:10.244.1.255 Mask:255.255.255.0
inet6 addr: fe80::ca8:7aff:fe08:edc8/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:774 errors:0 dropped:0 overruns:0 frame:0
TX packets:636 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:9535550 (9.0 MiB) TX bytes:43926 (42.8 KiB)

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:17220 errors:0 dropped:0 overruns:0 frame:0
TX packets:17220 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2590175 (2.4 MiB) TX bytes:2590175 (2.4 MiB)


root@myk8s-worker:/# crictl exec -its ab84723eb21e7557d7d51ba67311b8900d1f774dcec89f6f54ab08ac40541e32 ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.244.1.4 netmask 255.255.255.0 broadcast 10.244.1.255
inet6 fe80::ca8:7aff:fe08:edc8 prefixlen 64 scopeid 0x20<link>
ether 0e:a8:7a:08:ed:c8 txqueuelen 0 (Ethernet)
RX packets 774 bytes 9535550 (9.0 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 636 bytes 43926 (42.8 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 17244 bytes 2593785 (2.4 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 17244 bytes 2593785 (2.4 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

# Check the pause processes' net-namespace PIDs and their IP info

root@myk8s-worker:/# lsns -t net
NS TYPE NPROCS PID USER NETNSID NSFS COMMAND
4026533406 net 16 1 root unassigned /sbin/init
4026533716 net 2 1144 65535 1 /run/netns/cni-28887e40-76be-1ae2-c009-2236a0e332a3 /pause
4026533782 net 18 7731 65535 2 /run/netns/cni-64870bb7-efc1-ff31-113b-6181abbef0d6 /pause
4026533849 net 20 7948 65535 3 /run/netns/cni-b46c2dcd-d273-d6cc-e720-562526556ae1 /pause

root@myk8s-worker:/# nsenter -t $PAUSEPID -n ip -c addr
# PID 1144 is a different pod's sandbox: note the IP 10.244.1.2
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether d2:d9:9f:26:77:86 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.244.1.2/24 brd 10.244.1.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::d0d9:9fff:fe26:7786/64 scope link
valid_lft forever preferred_lft forever

root@myk8s-worker:/# nsenter -t $NGINXPID -n ip -c addr
nsenter: failed to execute 8007: No such file or directory
# fails because $NGINXPID holds two PIDs ("7802 8007"): nsenter takes 7802 as the
# target and then tries to execute "8007" as the command

root@myk8s-worker:/# nsenter -t $NETSHPID -n ip -c addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 0e:a8:7a:08:ed:c8 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.244.1.4/24 brd 10.244.1.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::ca8:7aff:fe08:edc8/64 scope link
valid_lft forever preferred_lft forever
root@myk8s-worker:/#

The crictl inspect outputs below show that both containers (nginx and netshoot) share the same IPC, UTS, and network namespaces: the paths for these namespaces are identical and point at the pause process (/proc/7948/ns/ipc, /proc/7948/ns/uts, /proc/7948/ns/net), which created and maintains them. The null entries are the PID, mount, and cgroup namespaces; in the OCI runtime spec, a namespace entry without a path tells the runtime to create a fresh namespace for the container rather than join an existing one.

Distroless containers are minimal images that contain only the application and its runtime dependencies, making them smaller and potentially more secure, but they typically ship no shell or debugging tools. Because all containers in a pod share namespaces (especially the network namespace), you can attach an ephemeral debugging container to a running pod, which allows debugging without baking those tools into the production image.
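A hedged sketch with kubectl debug: attach a netshoot ephemeral container to the pod, targeting the nginx container so the debugger also shares its process view:

kubectl debug -it myweb2 --image=nicolaka/netshoot --target=myweb2-nginx -- zsh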

root@myk8s-worker:/# crictl inspect ab84723eb21e7557d7d51ba67311b8900d1f774dcec89f6f54ab08ac40541e32 | jq '.info.runtimeSpec.linux.namespaces[] | .path'
null
"/proc/7948/ns/ipc"
"/proc/7948/ns/uts"
null
"/proc/7948/ns/net"
null
root@myk8s-worker:/# crictl inspect 70215de6c6931310ba8aa723ec35a6e1b44534bdeff530a882031be54815c6b5 | jq '.info.runtimeSpec.linux.namespaces[] | .path'
null
"/proc/7948/ns/ipc"
"/proc/7948/ns/uts"
null
"/proc/7948/ns/net"
null
