The beauty of Flannel CNI

KANS Study Week 2–2

Sigrid Jin
33 min read · Sep 7, 2024

The Kubernetes network model imposes a few fundamental requirements, and on top of them there are four networking problems every cluster must solve.

Requirements:

  • Pods can communicate with all other pods without NAT.
  • Node agents (kubelet, system daemons) can communicate with all pods on their node.
  • Pods using host networking can communicate with all other pods without NAT.
  • Service cluster IP ranges must not overlap with pod IP ranges.

Problems to solve:

  • Inter-container communication within a pod (via loopback).
  • Pod-to-pod communication across the cluster.
  • Internal cluster communication through Services.
  • External access to Services from outside the cluster.

CNI is a crucial component in solving these networking problems: it is responsible for creating network interfaces, assigning IP addresses, and applying network policies. Its significance in the Kubernetes ecosystem cannot be overstated.

The plugins are responsible for creating network interfaces, such as virtual Ethernet devices, for containers. These interfaces are then attached to the container’s network namespace, providing connectivity to the pod. Another primary function of CNI plugins is to assign IP addresses to those interfaces, which means managing IP address pools and ensuring unique address allocation across the cluster. CNI plugins often implement IPAM (IP Address Management) functionality to handle this task efficiently.

https://ikcoo.tistory.com/101

Additionally, CNI plugins play a vital role in implementing network policies within Kubernetes clusters. These policies define rules for controlling traffic flow between pods, namespaces, and external networks, enhancing security and isolation. To accommodate diverse networking requirements and topologies, CNI plugins support multiple network modes. These may include bridge networks, VLANs, overlay networks, and more, providing flexibility in network design and implementation.
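To make the policy idea concrete, below is a minimal NetworkPolicy manifest (the names and labels are illustrative). Note that Flannel itself does not enforce NetworkPolicy; a policy-capable plugin such as Calico or Cilium is needed for a rule like this to take effect.

cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend   # illustrative name
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: backend                  # illustrative label
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
EOF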

CNI follows a plugin-based architecture, which allows for modular and extensible network configurations. When specific events occur within the Kubernetes cluster, the orchestration tool invokes the appropriate CNI plugin to perform the necessary network setup. Each CNI plugin is defined using a JSON-formatted configuration file. This configuration file specifies the plugin’s behavior, including details such as the network type, subnet information, and any plugin-specific parameters. The JSON format allows for easy modification and version control of network configurations.
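As a quick illustration of how an orchestrator drives a plugin, you can invoke one of the reference plugins by hand: the JSON configuration is passed on stdin and the CNI parameters as environment variables. The sketch below (adapted from the host-local plugin’s documented usage; the subnet and paths are illustrative) asks the host-local IPAM plugin to allocate an address.

echo '{ "cniVersion": "0.3.1", "name": "examplenet", "ipam": { "type": "host-local", "subnet": "203.0.113.0/24", "dataDir": "/tmp/cni-example" } }' \
  | CNI_COMMAND=ADD CNI_CONTAINERID=example CNI_NETNS=/dev/null \
    CNI_IFNAME=eth0 CNI_PATH=/opt/cni/bin /opt/cni/bin/host-local
# The plugin replies with a JSON result containing the allocated IP,
# which the orchestrator would then apply to the pod's interface.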

There are numerous CNI plugins available, each designed to address specific networking requirements.

https://blog.laputa.io/kubernetes-flannel-networking-6a1cb1f8ec7c
  • Flannel: A simple and easy-to-use overlay network plugin, ideal for basic Kubernetes deployments.
  • Calico: Emphasizes network policy and security, excelling in network isolation and policy enforcement.
  • Weave: Provides automatic mesh networking and service discovery capabilities.
  • Cilium: Offers high-performance networking and security based on eBPF technology.
  • Multus: Enables the use of multiple CNI plugins, allowing pods to have multiple network interfaces.
https://white-polarbear.tistory.com/84

Flannel is a lightweight overlay network plugin designed to provide network connectivity within Kubernetes clusters. It creates a flat network in which each pod has a unique, routable IP address. It typically uses VXLAN (Virtual Extensible LAN) to create its overlay network. VXLAN encapsulates Layer 2 Ethernet frames within UDP packets, using port 8472 for inter-node communication. This tunneling technique allows pods to communicate across nodes as if they were on the same local network.

UDP port 8472 is commonly associated with VXLAN (Virtual Extensible LAN) tunneling. VXLAN is a network virtualization technology that encapsulates Layer 2 Ethernet frames within Layer 3 IP/UDP packets, allowing communication between virtual machines or pods across different physical networks. VXLAN uses UDP for transport because UDP provides efficient, low-latency delivery of data without the overhead of connection management (as in TCP).

In a Flannel setup, each pod’s eth0 interface is paired with a vethY (virtual ethernet of worker node) interface in the host namespace. The vethY interface is then connected to a bridge interface (typically named cni0) on the host. This bridge facilitates communication between pods on the same node.
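For intuition, here is a rough, hand-rolled sketch of what the delegated bridge plugin does for each pod. The interface names, namespace name, and address are illustrative, and it assumes a cni0 bridge already exists on the node:

# create a network namespace standing in for a pod
ip netns add demo-pod
# create a veth pair and move one end into the "pod" namespace
ip link add veth-demo type veth peer name veth-pod netns demo-pod
# attach the host-side end to the cni0 bridge and bring it up
ip link set veth-demo master cni0
ip link set veth-demo up
# give the pod-side end an address from the node's pod subnet and bring it up
ip netns exec demo-pod ip addr add 10.244.9.5/24 dev veth-pod
ip netns exec demo-pod ip link set veth-pod up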

Flannel creates a VTEP interface (usually named flannel.1) on each node. This interface handles the encapsulation and decapsulation of packets for inter-node communication. The flannel.1 interface, one per node, uses the host’s primary network interface (eth0) to transmit encapsulated packets between cluster nodes.
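Under the hood this is plain Linux VXLAN. A rough sketch of what Flannel creates on each node, using iproute2 directly (device name and address here are illustrative):

# create a VXLAN device with VNI 1 that tunnels over eth0 on UDP port 8472
ip link add vxlan-demo type vxlan id 1 dev eth0 dstport 8472 nolearning
# give it the node's flannel address and bring it up
ip addr add 10.244.9.0/32 dev vxlan-demo
ip link set vxlan-demo up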

Flannel assigns each node a subnet from which it can allocate IP addresses to pods. This subnet information is distributed to all nodes in the cluster, either through etcd or the Kubernetes API server. Each node maintains a routing table with this information, enabling cluster-wide pod-to-pod communication.

When a pod on one node needs to communicate with a pod on another node, the packet is first routed to the local flannel.1 interface. Flannel then encapsulates the packet in a VXLAN header and sends it to the destination flannel.1 interface via the host network. On the receiving node, the packet is decapsulated and routed to the appropriate pod.
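You can observe this routing decision on a node once the lab below is running; ip route get asks the kernel which interface a packet to a given destination would leave through (the destination pod IP here is illustrative):

docker exec -it myk8s-worker ip route get 10.244.2.5
# should resolve via flannel.1, e.g.
# 10.244.2.5 via 10.244.2.0 dev flannel.1 src 10.244.1.0 ...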

When a pod communicates with another pod, the following process occurs:

a) Intra-node communication (pods on the same node):

  • Traffic goes through the veth pair to the cni0 bridge.
  • The cni0 bridge forwards the traffic to the destination pod’s veth interface.

b) Inter-node communication (pods on different nodes):

  • Traffic from the source pod goes to the cni0 bridge.
  • The node’s routing table directs traffic for other pod CIDRs to the flannel.1 interface.
  • Flannel encapsulates the packet using VXLAN.
  • The encapsulated packet is sent over the physical network to the destination node.
  • On the destination node, Flannel decapsulates the packet.
  • The packet is then forwarded to the appropriate pod via the cni0 bridge and veth pair.

In conclusion, you can explain the network overlay as the following:

  1. Node Level:
  • eth0 (in node): This is the physical network interface of the (worker) node. It’s connected to the external network and other nodes.
  • flannel.1: This is a virtual interface created by Flannel. It handles inter-node pod communication using an overlay network (often VXLAN).
  • cni0: This is a bridge interface created by the Container Network Interface (CNI) plugin. It acts as a virtual switch for pods within the node.

2. Pod Level:

  • eth0 (in pod): This is the pod’s network interface. It’s actually one end of a veth pair.

3. Connecting Pods to Nodes:

  • veth pairs: These are virtual Ethernet devices that always come in pairs. One end is in the pod’s network namespace (appearing as eth0 to the pod), and the other end is in the host’s network namespace (attached to the cni0 bridge).
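A handy trick for matching a pod to its host-side veth is to read the peer interface index from inside the pod and look it up on the node. A sketch using the lab pods created later in this post (pod-1 on myk8s-worker):

# inside the pod: print the peer's ifindex on the host (e.g. 5)
kubectl exec pod-1 -- cat /sys/class/net/eth0/iflink
# on the node: find the veth with that index (5 is just the example above)
docker exec -it myk8s-worker sh -c "ip -o link | grep '^5:'"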

Kubernetes Network Structure as Air Travel:

Inter-Pod Communication (Same Node): This is like a domestic direct flight. For example, flying from ICN (Incheon, Korea) to CJU (Jeju, Korea).

Passenger (data) -> ICN Gate (Pod1 eth0) -> ICN Terminal (veth1) -> ICN Runway (cni0 bridge) -> CJU Terminal (veth2) -> CJU Gate (Pod2 eth0)

Inter-Pod Communication (Different Nodes): This resembles an international flight with a layover. Let us say that you are flying from ICN (Incheon) to JFK (New York) via SFO (San Francisco).

Passenger (data) -> ICN Gate (Pod1 eth0) -> ICN Terminal (veth1) -> ICN International Terminal (cni0 bridge) -> Asiana Airlines plane (flannel.1) -> ICN Runway (node1 eth0) -> [Pacific Ocean Crossing] -> SFO Runway (node2 eth0) -> Asiana Airlines plane (flannel.1) -> SFO International Terminal (cni0 bridge) -> SFO Domestic Terminal (veth2) -> San Jose Airport Gate (Pod2 eth0)

Pod to External Network: This is similar to a direct international flight to a remote destination.

For example, a special charter flight from ICN (Incheon) to ASU (Asunción, Paraguay). Passenger (data) -> ICN Gate (Pod eth0) -> ICN Terminal (veth) -> ICN International Terminal (cni0 bridge) -> Passport Control & Visa Issuance (NAT) -> ICN Runway (node eth0) -> ASU Silvio Pettirossi International Airport (External Network)

cat <<EOF> kind-cni.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  labels:
    mynode: control-plane
  extraPortMappings:
  - containerPort: 30000
    hostPort: 30000
  - containerPort: 30001
    hostPort: 30001
  - containerPort: 30002
    hostPort: 30002
  kubeadmConfigPatches:
  - |
    kind: ClusterConfiguration
    controllerManager:
      extraArgs:
        bind-address: 0.0.0.0
    etcd:
      local:
        extraArgs:
          listen-metrics-urls: http://0.0.0.0:2381
    scheduler:
      extraArgs:
        bind-address: 0.0.0.0
  - |
    kind: KubeProxyConfiguration
    metricsBindAddress: 0.0.0.0
- role: worker
  labels:
    mynode: worker
- role: worker
  labels:
    mynode: worker2
networking:
  disableDefaultCNI: true
EOF

# kind create cluster --config kind-cni.yaml --name myk8s --image kindest/node:v1.30.4

sigridjineth@sigridjineth-Z590-VISION-G:~$ docker exec -it myk8s-control-plane ip -br -c -4 addr
lo UNKNOWN 127.0.0.1/8
eth0@if149 UP 172.19.0.4/16
sigridjineth@sigridjineth-Z590-VISION-G:~$ docker exec -it myk8s-worker ip -br -c -4 addr
lo UNKNOWN 127.0.0.1/8
eth0@if145 UP 172.19.0.2/16
sigridjineth@sigridjineth-Z590-VISION-G:~$ docker exec -it myk8s-worker2 ip -br -c -4 addr
lo UNKNOWN 127.0.0.1/8
eth0@if147 UP 172.19.0.3/16

# docker exec -it myk8s-control-plane sh -c 'apt update && apt install tree jq psmisc lsof wget bridge-utils tcpdump iputils-ping htop git nano -y'
# docker exec -it myk8s-worker sh -c 'apt update && apt install tree jq psmisc lsof wget bridge-utils tcpdump iputils-ping -y'
# docker exec -it myk8s-worker2 sh -c 'apt update && apt install tree jq psmisc lsof wget bridge-utils tcpdump iputils-ping -y'
docker exec -it myk8s-control-plane bash

apt install golang -y
git clone https://github.com/containernetworking/plugins
cd plugins
chmod +x build_linux.sh

root@myk8s-control-plane:/plugins# ./build_linux.sh
Building plugins
bandwidth
firewall
portmap
sbr
tuning
vrf
bridge
dummy
host-device
ipvlan
loopback
macvlan
ptp
tap
vlan
dhcp
host-local
static

root@myk8s-control-plane:/plugins# ls -l bin
total 77160
-rwxr-xr-x 1 root root 4172790 Sep 7 18:31 bandwidth
-rwxr-xr-x 1 root root 4559683 Sep 7 18:31 bridge
-rwxr-xr-x 1 root root 10628276 Sep 7 18:31 dhcp
-rwxr-xr-x 1 root root 4192997 Sep 7 18:31 dummy
-rwxr-xr-x 1 root root 4617977 Sep 7 18:31 firewall
-rwxr-xr-x 1 root root 4081411 Sep 7 18:31 host-device
-rwxr-xr-x 1 root root 3448194 Sep 7 18:31 host-local
-rwxr-xr-x 1 root root 4215186 Sep 7 18:31 ipvlan
-rwxr-xr-x 1 root root 3544931 Sep 7 18:31 loopback
-rwxr-xr-x 1 root root 4249960 Sep 7 18:31 macvlan
-rwxr-xr-x 1 root root 3962584 Sep 7 18:31 portmap
-rwxr-xr-x 1 root root 4376198 Sep 7 18:31 ptp
-rwxr-xr-x 1 root root 3765855 Sep 7 18:31 sbr
-rwxr-xr-x 1 root root 3102390 Sep 7 18:31 static
-rwxr-xr-x 1 root root 4274060 Sep 7 18:31 tap
-rwxr-xr-x 1 root root 3636463 Sep 7 18:31 tuning
-rwxr-xr-x 1 root root 4209007 Sep 7 18:31 vlan
-rwxr-xr-x 1 root root 3938832 Sep 7 18:31 vrf

exit

--------

sigridjineth@sigridjineth-Z590-VISION-G:~$ docker cp -a myk8s-control-plane:/plugins/bin/bridge .
Successfully copied 4.56MB to /home/sigridjineth/.

sigridjineth@sigridjineth-Z590-VISION-G:~$ ls -l bridge
-rwxr-xr-x 1 sigridjineth sigridjineth 4559683 9月 8 03:31 bridge
watch -d kubectl get pod -A -owide

sigridjineth@sigridjineth-Z590-VISION-G:~$ kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
namespace/kube-flannel created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created

sigridjineth@sigridjineth-Z590-VISION-G:~$ kubectl get ns --show-labels
NAME STATUS AGE LABELS
default Active 9m49s kubernetes.io/metadata.name=default
kube-flannel Active 15s k8s-app=flannel,kubernetes.io/metadata.name=kube-flannel,pod-security.kubernetes.io/enforce=privileged
kube-node-lease Active 9m49s kubernetes.io/metadata.name=kube-node-lease
kube-public Active 9m49s kubernetes.io/metadata.name=kube-public
kube-system Active 9m49s kubernetes.io/metadata.name=kube-system
local-path-storage Active 9m46s kubernetes.io/metadata.name=local-path-storage
sigridjineth@sigridjineth-Z590-VISION-G:~$ kubectl get ds,pod,cm -n kube-flannel
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/kube-flannel-ds 3 3 3 3 3 <none> 19s

NAME READY STATUS RESTARTS AGE
pod/kube-flannel-ds-hrx88 1/1 Running 0 19s
pod/kube-flannel-ds-lbljd 1/1 Running 0 19s
pod/kube-flannel-ds-tbct7 1/1 Running 0 19s

NAME DATA AGE
configmap/kube-flannel-cfg 2 19s
configmap/kube-root-ca.crt 1 19s

sigridjineth@sigridjineth-Z590-VISION-G:~$ kubectl describe cm -n kube-flannel kube-flannel-cfg
Name: kube-flannel-cfg
Namespace: kube-flannel
Labels: app=flannel
k8s-app=flannel
tier=node
Annotations: <none>

Data
====
cni-conf.json:
----
{
"name": "cbr0", # the information of bridge interface
"cniVersion": "0.3.1",
"plugins": [
{
"type": "flannel",
"delegate": {
"hairpinMode": true,
"isDefaultGateway": true
}
},
{
"type": "portmap",
"capabilities": {
"portMappings": true
}
}
]
}

net-conf.json:
----
{
"Network": "10.244.0.0/16", # ip range for pod allocation.
"EnableNFTables": false,
"Backend": {
"Type": "vxlan" # flannel network type: vxlan
}
}


BinaryData
====

Events: <none>

sigridjineth@sigridjineth-Z590-VISION-G:~$ kubectl describe ds -n kube-flannel kube-flannel-ds
Name: kube-flannel-ds
Selector: app=flannel
Node-Selector: <none>
Labels: app=flannel
k8s-app=flannel
tier=node
Annotations: deprecated.daemonset.template.generation: 1
Desired Number of Nodes Scheduled: 3
Current Number of Nodes Scheduled: 3
Number of Nodes Scheduled with Up-to-date Pods: 3
Number of Nodes Scheduled with Available Pods: 3
Number of Nodes Misscheduled: 0
Pods Status: 3 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: app=flannel
tier=node
Service Account: flannel
Init Containers:
install-cni-plugin:
Image: docker.io/flannel/flannel-cni-plugin:v1.5.1-flannel2
Port: <none>
Host Port: <none>
Command:
cp
Args:
-f
/flannel
/opt/cni/bin/flannel
Environment: <none>
Mounts:
/opt/cni/bin from cni-plugin (rw)
install-cni:
Image: docker.io/flannel/flannel:v0.25.6
Port: <none>
Host Port: <none>
Command:
cp
Args:
-f
/etc/kube-flannel/cni-conf.json
/etc/cni/net.d/10-flannel.conflist
Environment: <none>
Mounts:
/etc/cni/net.d from cni (rw)
/etc/kube-flannel/ from flannel-cfg (rw)
Containers:
kube-flannel:
Image: docker.io/flannel/flannel:v0.25.6
Port: <none>
Host Port: <none>
Command:
/opt/bin/flanneld
Args:
--ip-masq
--kube-subnet-mgr
Requests:
cpu: 100m
memory: 50Mi
Environment:
POD_NAME: (v1:metadata.name)
POD_NAMESPACE: (v1:metadata.namespace)
EVENT_QUEUE_DEPTH: 5000
Mounts:
/etc/kube-flannel/ from flannel-cfg (rw)
/run/flannel from run (rw)
/run/xtables.lock from xtables-lock (rw)
Volumes:
run:
Type: HostPath (bare host directory volume)
Path: /run/flannel
HostPathType:
cni-plugin:
Type: HostPath (bare host directory volume)
Path: /opt/cni/bin
HostPathType:
cni:
Type: HostPath (bare host directory volume)
Path: /etc/cni/net.d
HostPathType:
flannel-cfg:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kube-flannel-cfg
Optional: false
xtables-lock:
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType: FileOrCreate
Priority Class Name: system-node-critical
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 55s daemonset-controller Created pod: kube-flannel-ds-lbljd
Normal SuccessfulCreate 55s daemonset-controller Created pod: kube-flannel-ds-tbct7
Normal SuccessfulCreate 55s daemonset-controller Created pod: kube-flannel-ds-hrx88

sigridjineth@sigridjineth-Z590-VISION-G:~$ kubectl exec -it ds/kube-flannel-ds -n kube-flannel -c kube-flannel -- ls -l /etc/kube-flannel
total 0
lrwxrwxrwx 1 root root 20 Sep 7 18:32 cni-conf.json -> ..data/cni-conf.json
lrwxrwxrwx 1 root root 20 Sep 7 18:32 net-conf.json -> ..data/net-conf.json

# failed to find plugin "bridge" in path [/opt/cni/bin]
sigridjineth@sigridjineth-Z590-VISION-G:~$ kubectl get pod -A -owide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-flannel kube-flannel-ds-hrx88 1/1 Running 0 83s 172.19.0.3 myk8s-worker2 <none> <none>
kube-flannel kube-flannel-ds-lbljd 1/1 Running 0 83s 172.19.0.4 myk8s-control-plane <none> <none>
kube-flannel kube-flannel-ds-tbct7 1/1 Running 0 83s 172.19.0.2 myk8s-worker <none> <none>
kube-system coredns-7db6d8ff4d-d7zs4 0/1 ContainerCreating 0 10m <none> myk8s-worker2 <none> <none>
kube-system coredns-7db6d8ff4d-s62dj 0/1 ContainerCreating 0 10m <none> myk8s-worker2 <none> <none>
kube-system etcd-myk8s-control-plane 1/1 Running 0 10m 172.19.0.4 myk8s-control-plane <none> <none>
kube-system kube-apiserver-myk8s-control-plane 1/1 Running 0 10m 172.19.0.4 myk8s-control-plane <none> <none>
kube-system kube-controller-manager-myk8s-control-plane 1/1 Running 0 10m 172.19.0.4 myk8s-control-plane <none> <none>
kube-system kube-proxy-8b558 1/1 Running 0 10m 172.19.0.2 myk8s-worker <none> <none>
kube-system kube-proxy-b9f58 1/1 Running 0 10m 172.19.0.4 myk8s-control-plane <none> <none>
kube-system kube-proxy-qkr6c 1/1 Running 0 10m 172.19.0.3 myk8s-worker2 <none> <none>
kube-system kube-scheduler-myk8s-control-plane 1/1 Running 0 10m 172.19.0.4 myk8s-control-plane <none> <none>
local-path-storage local-path-provisioner-7d4d9bdcc5-z2l7s 0/1 ContainerCreating 0 10m <none> myk8s-worker2 <none> <none>

kubectl describe pod -n kube-system -l k8s-app=kube-dns
Warning FailedCreatePodSandBox 73s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "f19b5fd5a1bf399647422b8e4277f9e8c9347f931aeb99e9e389788b3c0abe04": plugin type="flannel" failed (add): failed to delegate add: failed to find plugin "bridge" in path [/opt/cni/bin]
Warning FailedCreatePodSandBox 67s (x4 over 70s) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "9e8d15aef99d6be6614c27b1111e04ad7c7fd58186e0c169b0187aa4763440b2": plugin type="flannel" failed (add): failed to delegate add: failed to find plugin "bridge" in path [/opt/cni/bin]

Next, looking at the kube-dns pods, you can see that they are stuck in ContainerCreating because of the FailedCreatePodSandBox error above. In other words, the pause container could not be created due to the network (CNI) issue.

To resolve this, copy the bridge binary built earlier into /opt/cni/bin on each node.

#
docker cp bridge myk8s-control-plane:/opt/cni/bin/bridge
docker cp bridge myk8s-worker:/opt/cni/bin/bridge
docker cp bridge myk8s-worker2:/opt/cni/bin/bridge

docker exec -it myk8s-control-plane chmod 755 /opt/cni/bin/bridge
docker exec -it myk8s-worker chmod 755 /opt/cni/bin/bridge
docker exec -it myk8s-worker2 chmod 755 /opt/cni/bin/bridge

#
sigridjineth@sigridjineth-Z590-VISION-G:~$ docker exec -it myk8s-control-plane ls -l /opt/cni/bin/
total 23208
-rwxr-xr-x 1 1000 1000 4559683 Sep 7 18:31 bridge
-rwxr-xr-x 1 root root 2834432 Sep 7 18:32 flannel
-rwxr-xr-x 1 root root 3698580 Aug 13 21:34 host-local
-rwxr-xr-x 1 root root 3773711 Aug 13 21:34 loopback
-rwxr-xr-x 1 root root 4261435 Aug 13 21:34 portmap
-rwxr-xr-x 1 root root 4627600 Aug 13 21:34 ptp

sigridjineth@sigridjineth-Z590-VISION-G:~$ docker exec -it myk8s-worker ls -l /opt/cni/bin/
total 23208
-rwxr-xr-x 1 1000 1000 4559683 Sep 7 18:31 bridge
-rwxr-xr-x 1 root root 2834432 Sep 7 18:32 flannel
-rwxr-xr-x 1 root root 3698580 Aug 13 21:34 host-local
-rwxr-xr-x 1 root root 3773711 Aug 13 21:34 loopback
-rwxr-xr-x 1 root root 4261435 Aug 13 21:34 portmap
-rwxr-xr-x 1 root root 4627600 Aug 13 21:34 ptp

sigridjineth@sigridjineth-Z590-VISION-G:~$ docker exec -it myk8s-worker2 ls -l /opt/cni/bin/
total 23208
-rwxr-xr-x 1 1000 1000 4559683 Sep 7 18:31 bridge
-rwxr-xr-x 1 root root 2834432 Sep 7 18:32 flannel
-rwxr-xr-x 1 root root 3698580 Aug 13 21:34 host-local
-rwxr-xr-x 1 root root 3773711 Aug 13 21:34 loopback
-rwxr-xr-x 1 root root 4261435 Aug 13 21:34 portmap
-rwxr-xr-x 1 root root 4627600 Aug 13 21:34 ptp

sigridjineth@sigridjineth-Z590-VISION-G:~$ for i in myk8s-control-plane myk8s-worker myk8s-worker2; do echo ">> node $i <<"; docker exec -it $i ls /opt/cni/bin/; echo; done
>> node myk8s-control-plane <<
bridge flannel host-local loopback portmap ptp

>> node myk8s-worker <<
bridge flannel host-local loopback portmap ptp

>> node myk8s-worker2 <<
bridge flannel host-local loopback portmap ptp

#
kubectl get pod -A -owide
sigridjineth@sigridjineth-Z590-VISION-G:~$ kubectl get pod -A -owide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-flannel kube-flannel-ds-hrx88 1/1 Running 0 3m44s 172.19.0.3 myk8s-worker2 <none> <none>
kube-flannel kube-flannel-ds-lbljd 1/1 Running 0 3m44s 172.19.0.4 myk8s-control-plane <none> <none>
kube-flannel kube-flannel-ds-tbct7 1/1 Running 0 3m44s 172.19.0.2 myk8s-worker <none> <none>
kube-system coredns-7db6d8ff4d-d7zs4 1/1 Running 0 13m 10.244.2.3 myk8s-worker2 <none> <none>
kube-system coredns-7db6d8ff4d-s62dj 1/1 Running 0 13m 10.244.2.4 myk8s-worker2 <none> <none>
kube-system etcd-myk8s-control-plane 1/1 Running 0 13m 172.19.0.4 myk8s-control-plane <none> <none>
kube-system kube-apiserver-myk8s-control-plane 1/1 Running 0 13m 172.19.0.4 myk8s-control-plane <none> <none>
kube-system kube-controller-manager-myk8s-control-plane 1/1 Running 0 13m 172.19.0.4 myk8s-control-plane <none> <none>
kube-system kube-proxy-8b558 1/1 Running 0 12m 172.19.0.2 myk8s-worker <none> <none>
kube-system kube-proxy-b9f58 1/1 Running 0 13m 172.19.0.4 myk8s-control-plane <none> <none>
kube-system kube-proxy-qkr6c 1/1 Running 0 12m 172.19.0.3 myk8s-worker2 <none> <none>
kube-system kube-scheduler-myk8s-control-plane 1/1 Running 0 13m 172.19.0.4 myk8s-control-plane <none> <none>
local-path-storage local-path-provisioner-7d4d9bdcc5-z2l7s 1/1 Running 0 13m 10.244.2.2 myk8s-worker2 <none> <none>

Flannel creates an overlay network using the 10.244.0.0/16 CIDR block. Each node in the cluster is assigned a /24 subnet from this range for pod IPs. As more pods are deployed, the iptables rules grow more complex.

sigridjineth@sigridjineth-Z590-VISION-G:~$ kubectl describe cm -n kube-flannel kube-flannel-cfg
Name: kube-flannel-cfg
Namespace: kube-flannel
Labels: app=flannel
k8s-app=flannel
tier=node
Annotations: <none>

Data
====
cni-conf.json:
----
{
"name": "cbr0",
"cniVersion": "0.3.1",
"plugins": [
{
"type": "flannel",
"delegate": {
"hairpinMode": true,
"isDefaultGateway": true
}
},
{
"type": "portmap",
"capabilities": {
"portMappings": true
}
}
]
}

net-conf.json:
----
{
"Network": "10.244.0.0/16",
"EnableNFTables": false,
"Backend": {
"Type": "vxlan"
}
}


BinaryData
====

Events: <none>
sigridjineth@sigridjineth-Z590-VISION-G:~$ for i in filter nat mangle raw ; do echo ">> IPTables Type : $i <<"; docker exec -it myk8s-control-plane iptables -t $i -S ; echo; done
>> IPTables Type : filter <<
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-N FLANNEL-FWD
-N KUBE-EXTERNAL-SERVICES
-N KUBE-FIREWALL
-N KUBE-FORWARD
-N KUBE-KUBELET-CANARY
-N KUBE-NODEPORTS
-N KUBE-PROXY-CANARY
-N KUBE-PROXY-FIREWALL
-N KUBE-SERVICES
-A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes load balancer firewall" -j KUBE-PROXY-FIREWALL
-A INPUT -m comment --comment "kubernetes health check service ports" -j KUBE-NODEPORTS
-A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
-A INPUT -j KUBE-FIREWALL
-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes load balancer firewall" -j KUBE-PROXY-FIREWALL
-A FORWARD -m comment --comment "kubernetes forwarding rules" -j KUBE-FORWARD
-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
-A FORWARD -m comment --comment "flanneld forward" -j FLANNEL-FWD
-A OUTPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes load balancer firewall" -j KUBE-PROXY-FIREWALL
-A OUTPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -j KUBE-FIREWALL
-A FLANNEL-FWD -s 10.244.0.0/16 -m comment --comment "flanneld forward" -j ACCEPT
-A FLANNEL-FWD -d 10.244.0.0/16 -m comment --comment "flanneld forward" -j ACCEPT
-A KUBE-FIREWALL ! -s 127.0.0.0/8 -d 127.0.0.0/8 -m comment --comment "block incoming localnet connections" -m conntrack ! --ctstate RELATED,ESTABLISHED,DNAT -j DROP
-A KUBE-FORWARD -m conntrack --ctstate INVALID -j DROP
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

>> IPTables Type : nat <<
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
-N DOCKER_OUTPUT
-N DOCKER_POSTROUTING
-N FLANNEL-POSTRTG
-N KUBE-KUBELET-CANARY
-N KUBE-MARK-MASQ
-N KUBE-NODEPORTS
-N KUBE-POSTROUTING
-N KUBE-PROXY-CANARY
-N KUBE-SEP-BGFZBX5GRCLSKMIC
-N KUBE-SEP-DLP2S2N3HX5UKLVP
-N KUBE-SEP-RJHMR3QLYGJVBWVL
-N KUBE-SEP-RT6NLFOJTZIVAVHH
-N KUBE-SEP-TFTZVOJFQDTMM5AB
-N KUBE-SEP-ZH42AN6Z2PIZT2OV
-N KUBE-SEP-ZHICQ2ODADGCY7DS
-N KUBE-SERVICES
-N KUBE-SVC-ERIFXISQEP7F7OF4
-N KUBE-SVC-JD5MR3NA4I4DYORP
-N KUBE-SVC-NPX46M4PTMTKRN6Y
-N KUBE-SVC-TCOU7JCQXEZGVUNU
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -d 172.19.0.1/32 -j DOCKER_OUTPUT
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -d 172.19.0.1/32 -j DOCKER_OUTPUT
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -d 172.19.0.1/32 -j DOCKER_POSTROUTING
-A POSTROUTING -m comment --comment "flanneld masq" -j FLANNEL-POSTRTG
-A DOCKER_OUTPUT -d 172.19.0.1/32 -p tcp -m tcp --dport 53 -j DNAT --to-destination 127.0.0.11:38073
-A DOCKER_OUTPUT -d 172.19.0.1/32 -p udp -m udp --dport 53 -j DNAT --to-destination 127.0.0.11:52841
-A DOCKER_POSTROUTING -s 127.0.0.11/32 -p tcp -m tcp --sport 38073 -j SNAT --to-source 172.19.0.1:53
-A DOCKER_POSTROUTING -s 127.0.0.11/32 -p udp -m udp --sport 52841 -j SNAT --to-source 172.19.0.1:53
-A FLANNEL-POSTRTG -m mark --mark 0x4000/0x4000 -m comment --comment "flanneld masq" -j RETURN
-A FLANNEL-POSTRTG -s 10.244.0.0/24 -d 10.244.0.0/16 -m comment --comment "flanneld masq" -j RETURN
-A FLANNEL-POSTRTG -s 10.244.0.0/16 -d 10.244.0.0/24 -m comment --comment "flanneld masq" -j RETURN
-A FLANNEL-POSTRTG ! -s 10.244.0.0/16 -d 10.244.0.0/24 -m comment --comment "flanneld masq" -j RETURN
-A FLANNEL-POSTRTG -s 10.244.0.0/16 ! -d 224.0.0.0/4 -m comment --comment "flanneld masq" -j MASQUERADE --random-fully
-A FLANNEL-POSTRTG ! -s 10.244.0.0/16 -d 10.244.0.0/16 -m comment --comment "flanneld masq" -j MASQUERADE --random-fully
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-POSTROUTING -m mark ! --mark 0x4000/0x4000 -j RETURN
-A KUBE-POSTROUTING -j MARK --set-xmark 0x4000/0x0
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -j MASQUERADE --random-fully
-A KUBE-SEP-BGFZBX5GRCLSKMIC -s 10.244.2.4/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-BGFZBX5GRCLSKMIC -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 10.244.2.4:53
-A KUBE-SEP-DLP2S2N3HX5UKLVP -s 10.244.2.3/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-DLP2S2N3HX5UKLVP -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 10.244.2.3:53
-A KUBE-SEP-RJHMR3QLYGJVBWVL -s 10.244.2.4/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ
-A KUBE-SEP-RJHMR3QLYGJVBWVL -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 10.244.2.4:53
-A KUBE-SEP-RT6NLFOJTZIVAVHH -s 172.19.0.4/32 -m comment --comment "default/kubernetes:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-RT6NLFOJTZIVAVHH -p tcp -m comment --comment "default/kubernetes:https" -m tcp -j DNAT --to-destination 172.19.0.4:6443
-A KUBE-SEP-TFTZVOJFQDTMM5AB -s 10.244.2.3/32 -m comment --comment "kube-system/kube-dns:metrics" -j KUBE-MARK-MASQ
-A KUBE-SEP-TFTZVOJFQDTMM5AB -p tcp -m comment --comment "kube-system/kube-dns:metrics" -m tcp -j DNAT --to-destination 10.244.2.3:9153
-A KUBE-SEP-ZH42AN6Z2PIZT2OV -s 10.244.2.4/32 -m comment --comment "kube-system/kube-dns:metrics" -j KUBE-MARK-MASQ
-A KUBE-SEP-ZH42AN6Z2PIZT2OV -p tcp -m comment --comment "kube-system/kube-dns:metrics" -m tcp -j DNAT --to-destination 10.244.2.4:9153
-A KUBE-SEP-ZHICQ2ODADGCY7DS -s 10.244.2.3/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ
-A KUBE-SEP-ZHICQ2ODADGCY7DS -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 10.244.2.3:53
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:metrics cluster IP" -m tcp --dport 9153 -j KUBE-SVC-JD5MR3NA4I4DYORP
-A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SERVICES -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
-A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS
-A KUBE-SVC-ERIFXISQEP7F7OF4 ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp -> 10.244.2.3:53" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-DLP2S2N3HX5UKLVP
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp -> 10.244.2.4:53" -j KUBE-SEP-BGFZBX5GRCLSKMIC
-A KUBE-SVC-JD5MR3NA4I4DYORP ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:metrics cluster IP" -m tcp --dport 9153 -j KUBE-MARK-MASQ
-A KUBE-SVC-JD5MR3NA4I4DYORP -m comment --comment "kube-system/kube-dns:metrics -> 10.244.2.3:9153" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-TFTZVOJFQDTMM5AB
-A KUBE-SVC-JD5MR3NA4I4DYORP -m comment --comment "kube-system/kube-dns:metrics -> 10.244.2.4:9153" -j KUBE-SEP-ZH42AN6Z2PIZT2OV
-A KUBE-SVC-NPX46M4PTMTKRN6Y ! -s 10.244.0.0/16 -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https -> 172.19.0.4:6443" -j KUBE-SEP-RT6NLFOJTZIVAVHH
-A KUBE-SVC-TCOU7JCQXEZGVUNU ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns -> 10.244.2.3:53" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-ZHICQ2ODADGCY7DS
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns -> 10.244.2.4:53" -j KUBE-SEP-RJHMR3QLYGJVBWVL

>> IPTables Type : mangle <<
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
-N KUBE-IPTABLES-HINT
-N KUBE-KUBELET-CANARY
-N KUBE-PROXY-CANARY

>> IPTables Type : raw <<
-P PREROUTING ACCEPT
-P OUTPUT ACCEPT

We can confirm this by examining the subnet.env file on each node.

  • The overall network CIDR is 10.244.0.0/16
  • Each node gets a /24 subnet (e.g., 10.244.0.0/24, 10.244.1.0/24, 10.244.2.0/24)
  • MTU is set to 1450, leaving 50 bytes of headroom for the VXLAN encapsulation overhead on a standard 1500-byte link
  • IP masquerading is enabled for external communication
sigridjineth@sigridjineth-Z590-VISION-G:~$ for i in myk8s-control-plane myk8s-worker myk8s-worker2; do echo ">> node $i <<"; docker exec -it $i cat /run/flannel/subnet.env ; echo; done
>> node myk8s-control-plane <<
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24 # the network range from which pods on this node are assigned IPs
FLANNEL_MTU=1450 # the MTU used for the overlay
FLANNEL_IPMASQ=true # pods use the node's masquerading (SNAT) for external (internet) traffic

>> node myk8s-worker <<
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.1.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

>> node myk8s-worker2 <<
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.2.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

The output below shows the pod CIDR assignment: each node is assigned a /24 subnet from the 10.244.0.0/16 CIDR block for pod allocation.

sigridjineth@sigridjineth-Z590-VISION-G:~$ kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}' ;echo
10.244.0.0/24 10.244.1.0/24 10.244.2.0/24
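To see at a glance which node owns which subnet, a custom-columns query works as well:

kubectl get nodes -o custom-columns='NODE:.metadata.name,PODCIDR:.spec.podCIDR'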

Flannel is using VXLAN (Virtual Extensible LAN) as its backend type. Each node has a unique VTEP (VXLAN Tunnel Endpoint) MAC address and a public IP for inter-node communication.

sigridjineth@sigridjineth-Z590-VISION-G:~$ kubectl describe node | grep -A3 Annotations
Annotations: flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"f6:2d:77:84:77:6c"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 172.19.0.4
--
Annotations: flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"92:2e:91:2c:81:b3"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 172.19.0.2
--
Annotations: flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"9e:f2:78:e7:96:c2"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 172.19.0.3

The lsns output shows that flanneld and kube-proxy run in the host's network namespace (they use hostNetwork), while keeping their own mount, PID, and cgroup namespaces. Sharing the host's network namespace is what lets them program the node's interfaces and iptables rules directly.

  • flannel.1: This is the VXLAN interface created by Flannel on each node. The interface has an IP of 10.244.X.0/32 (where X is the node number).
  • cni0 has an IP of 10.244.X.1/24. This is the bridge interface that connects all pods on a node. It's only present when there are pods scheduled on the node.

ARP and FDB Tables map the Flannel IPs to their corresponding VTEP MAC addresses and node IPs, facilitating VXLAN encapsulation and routing. The successful ping tests to 10.244.0.0, 10.244.1.0, and 10.244.2.0 demonstrate that the VXLAN overlay network is functioning correctly, allowing communication between pods on different nodes.

The NAT table implements masquerading for traffic from pods to external destinations, while allowing direct, un-NATed communication between pods within the cluster.

sigridjineth@sigridjineth-Z590-VISION-G:~$ docker exec -it myk8s-worker        bash
root@myk8s-worker:/# lsns -p 1
NS TYPE NPROCS PID USER COMMAND
4026531834 time 12 1 root /sbin/init
4026531837 user 12 1 root /sbin/init
4026533092 mnt 8 1 root /sbin/init
4026533093 uts 12 1 root /sbin/init
4026533094 ipc 8 1 root /sbin/init
4026533095 pid 8 1 root /sbin/init
4026533096 net 12 1 root /sbin/init
4026533403 cgroup 11 1 root /sbin/init

root@myk8s-worker:/# lsns -p $(pgrep flanneld)
NS TYPE NPROCS PID USER COMMAND
4026531834 time 12 1 root /sbin/init
4026531837 user 12 1 root /sbin/init
4026533093 uts 12 1 root /sbin/init
4026533096 net 12 1 root /sbin/init
4026533579 ipc 2 978 65535 /pause
4026533584 mnt 1 1173 root /opt/bin/flanneld --ip-masq --kube-subnet-mgr
4026533585 pid 1 1173 root /opt/bin/flanneld --ip-masq --kube-subnet-mgr
4026533586 cgroup 1 1173 root /opt/bin/flanneld --ip-masq --kube-subnet-mgr

root@myk8s-worker:/# lsns -p $(pgrep kube-proxy)
NS TYPE NPROCS PID USER COMMAND
4026531834 time 12 1 root /sbin/init
4026531837 user 12 1 root /sbin/init
4026533093 uts 12 1 root /sbin/init
4026533096 net 12 1 root /sbin/init
4026533403 cgroup 11 1 root /sbin/init
4026533569 ipc 2 325 65535 /pause
4026533573 mnt 1 357 root /usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.conf --hostname-override=myk8s-worker
4026533574 pid 1 357 root /usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.conf --hostname-override=myk8s-worker

# Routing Table: this shows routes to other nodes' pod CIDRs via the flannel.1 interface, enabling inter-node pod communication.
root@myk8s-worker:/# ip -c link | grep -E 'flannel|cni|veth' -A1
2: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default
link/ether 92:2e:91:2c:81:b3 brd ff:ff:ff:ff:ff:ff

root@myk8s-worker:/# ip -c -d addr show cni0
Device "cni0" does not exist.

# cni0 appears only once at least one pod with its own (isolated) network namespace has been scheduled on the node
sigridjineth@sigridjineth-Z590-VISION-G:~$ docker exec -it myk8s-worker2 bash
root@myk8s-worker2:/# ip -c -d addr show cni0
3: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
link/ether c6:89:bf:70:b6:3c brd ff:ff:ff:ff:ff:ff promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535
bridge forward_delay 1500 hello_time 200 max_age 2000 ageing_time 30000 stp_state 0 priority 32768 vlan_filtering 0 vlan_protocol 802.1Q bridge_id 8000.c6:89:bf:70:b6:3c designated_root 8000.c6:89:bf:70:b6:3c root_port 0 root_path_cost 0 topology_change 0 topology_change_detected 0 hello_timer 0.00 tcn_timer 0.00 topology_change_timer 0.00 gc_timer 114.90 vlan_default_pvid 1 vlan_stats_enabled 0 vlan_stats_per_port 0 group_fwd_mask 0 group_address 01:80:c2:00:00:00 mcast_snooping 1 no_linklocal_learn 0 mcast_vlan_snooping 0 mcast_router 1 mcast_query_use_ifaddr 0 mcast_querier 0 mcast_hash_elasticity 16 mcast_hash_max 4096 mcast_last_member_count 2 mcast_startup_query_count 2 mcast_last_member_interval 100 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000 mcast_startup_query_interval 3124 mcast_stats_enabled 0 mcast_igmp_version 2 mcast_mld_version 1 nf_call_iptables 0 nf_call_ip6tables 0 nf_call_arptables 0 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 524280 tso_max_segs 65535 gro_max_size 65536
inet 10.244.2.1/24 brd 10.244.2.255 scope global cni0
valid_lft forever preferred_lft forever
inet6 fe80::c489:bfff:fe70:b63c/64 scope link
valid_lft forever preferred_lft forever

root@myk8s-worker:/# ip -c -d addr show flannel.1
2: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether 92:2e:91:2c:81:b3 brd ff:ff:ff:ff:ff:ff promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535
vxlan id 1 local 172.19.0.2 dev eth0 srcport 0 0 dstport 8472 nolearning ttl auto ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 524280 tso_max_segs 65535 gro_max_size 65536
inet 10.244.1.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
inet6 fe80::902e:91ff:fe2c:81b3/64 scope link
valid_lft forever preferred_lft forever

root@myk8s-worker2:/# brctl show
bridge name bridge id STP enabled interfaces
cni0 8000.c689bf70b63c no veth117e3c5e
veth2a1be3db
veth8fd05849

# Check the routing table: routes for the other nodes' pod CIDRs (podCIDR) have been installed
root@myk8s-worker2:/# ip -c route
default via 172.19.0.1 dev eth0
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
10.244.2.0/24 dev cni0 proto kernel scope link src 10.244.2.1
172.19.0.0/16 dev eth0 proto kernel scope link src 172.19.0.3

# Check the ARP table entries on the flannel.1 interface: the flannel.1 IPs and MAC addresses of the other nodes
root@myk8s-worker2:/# ip -c neigh show dev flannel.1
10.244.1.0 lladdr 92:2e:91:2c:81:b3 PERMANENT
10.244.0.0 lladdr f6:2d:77:84:77:6c PERMANENT

# The bridge FDB maps each remote VTEP MAC to the underlay IP (eth0 in this kind lab) of the node that owns it
root@myk8s-worker2:/# bridge fdb show dev flannel.1
92:2e:91:2c:81:b3 dst 172.19.0.2 self permanent
f6:2d:77:84:77:6c dst 172.19.0.4 self permanent

# Ping the flannel.1 interfaces of the other nodes: the traffic travels over the VXLAN overlay
ping -c 1 10.244.0.0
ping -c 1 10.244.1.0
ping -c 1 10.244.2.0
PING 10.244.0.0 (10.244.0.0) 56(84) bytes of data.
64 bytes from 10.244.0.0: icmp_seq=1 ttl=64 time=0.063 ms

--- 10.244.0.0 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.063/0.063/0.063/0.000 ms
PING 10.244.1.0 (10.244.1.0) 56(84) bytes of data.
64 bytes from 10.244.1.0: icmp_seq=1 ttl=64 time=0.064 ms

--- 10.244.1.0 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.064/0.064/0.064/0.000 ms
PING 10.244.2.0 (10.244.2.0) 56(84) bytes of data.
64 bytes from 10.244.2.0: icmp_seq=1 ttl=64 time=0.025 ms

--- 10.244.2.0 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.025/0.025/0.025/0.000 ms
root@myk8s-worker2:/#

# Check the iptables filter table: traffic between pods in the 10.244.0.0/16 range may be forwarded on every node
iptables -t filter -S | grep 10.244.0.0
-A FLANNEL-FWD -s 10.244.0.0/16 -m comment --comment "flanneld forward" -j ACCEPT
-A FLANNEL-FWD -d 10.244.0.0/16 -m comment --comment "flanneld forward" -j ACCEPT

# Check the iptables NAT table: traffic between addresses inside 10.244.0.0/16 is not masqueraded,
# while traffic from 10.244.0.0/16 to anything outside that range (except the multicast range 224.0.0.0/4) is masqueraded on its way out

iptables -t nat -S | grep 'flanneld masq' | grep -v '! -s'
-A POSTROUTING -m comment --comment "flanneld masq" -j FLANNEL-POSTRTG
-A FLANNEL-POSTRTG -m mark --mark 0x4000/0x4000 -m comment --comment "flanneld masq" -j RETURN
-A FLANNEL-POSTRTG -s 10.244.2.0/24 -d 10.244.0.0/16 -m comment --comment "flanneld masq" -j RETURN
-A FLANNEL-POSTRTG -s 10.244.0.0/16 -d 10.244.2.0/24 -m comment --comment "flanneld masq" -j RETURN
-A FLANNEL-POSTRTG -s 10.244.0.0/16 ! -d 224.0.0.0/4 -m comment --comment "flanneld masq" -j MASQUERADE --random-fully

When pods are created on worker nodes, Flannel automatically sets up the necessary network interfaces.

  • cni0 bridge: This is a Linux bridge created on each node to connect all pods on that node. It acts as a virtual switch for pods on the same node.
  • veth pairs: For each pod, a veth (virtual Ethernet) pair is created. One end of the pair is placed in the pod’s network namespace, while the other end is attached to the cni0 bridge.

Each worker node is assigned a subnet from the overall Flannel network CIDR (10.244.0.0/16).

  • myk8s-worker: 10.244.1.0/24
  • myk8s-worker2: 10.244.2.0/24

Pods on each node receive IP addresses from their respective subnet:

  • pod-1 on myk8s-worker: 10.244.1.2
  • pod-2 on myk8s-worker2: 10.244.2.5

On myk8s-worker2, we can see:

  • cni0 bridge interface (3: cni0)
  • Multiple veth interfaces (4: veth8fd05849, 5: veth117e3c5e, 6: veth2a1be3db, 7: vetha73ba852)
cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: pod-1
  labels:
    app: pod
spec:
  nodeSelector:
    kubernetes.io/hostname: myk8s-worker
  containers:
  - name: netshoot-pod
    image: nicolaka/netshoot
    command: ["tail"]
    args: ["-f", "/dev/null"]
  terminationGracePeriodSeconds: 0
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-2
  labels:
    app: pod
spec:
  nodeSelector:
    kubernetes.io/hostname: myk8s-worker2
  containers:
  - name: netshoot-pod
    image: nicolaka/netshoot
    command: ["tail"]
    args: ["-f", "/dev/null"]
  terminationGracePeriodSeconds: 0
EOF
sigridjineth@sigridjineth-Z590-VISION-G:~$ kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod-1 1/1 Running 0 66s 10.244.1.2 myk8s-worker <none> <none>
pod-2 1/1 Running 0 66s 10.244.2.5 myk8s-worker2 <none> <none>

docker exec -it myk8s-worker bash
docker exec -it myk8s-worker2 bash

root@myk8s-worker2:/# ip link | egrep 'cni|veth' ;echo; brctl show cni0
3: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default qlen 1000
4: veth8fd05849@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP mode DEFAULT group default
link/ether a2:38:82:4e:16:d8 brd ff:ff:ff:ff:ff:ff link-netns cni-f04aef26-5f3f-4245-2e26-ba7ec64ac27a
5: veth117e3c5e@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP mode DEFAULT group default
link/ether 96:ba:d8:83:5f:69 brd ff:ff:ff:ff:ff:ff link-netns cni-78b7fe9e-bc4a-34b8-91f0-3f74ebdcf046
6: veth2a1be3db@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP mode DEFAULT group default
link/ether 3a:a0:90:c1:4d:8e brd ff:ff:ff:ff:ff:ff link-netns cni-409648c5-dcb8-2d23-3b23-e777c545fdcb
7: vetha73ba852@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP mode DEFAULT group default
link/ether a2:8d:dd:e8:4d:a3 brd ff:ff:ff:ff:ff:ff link-netns cni-a6003db6-7517-873b-17a6-55139f54c349

bridge name bridge id STP enabled interfaces
cni0 8000.c689bf70b63c no veth117e3c5e
veth2a1be3db
veth8fd05849
vetha73ba852

Let us explore how Flannel creates a flat network topology in Kubernetes, enabling seamless communication between pods regardless of their physical location in the cluster. (The 172.16.x.x addresses in the following walkthrough are example addresses; the kind lab in this post uses the 10.244.0.0/16 range.)

  1. Pod-to-Pod Communication within a Single Node:
  • Pods 1 (172.16.2.2) and 2 (172.16.2.3) are on the same node.
  • Communication path: Pod 1’s eth0 -> veth -> cni0 bridge (172.16.2.1) -> veth -> Pod 2’s eth0
  • The cni0 bridge acts as a local switch, directly handling communication between pods on the same node.
  • In this case, traffic doesn’t pass through the flannel.1 interface.

2. Pod to External Communication:

  • Pod 1 (172.16.2.2) is communicating with an external internet destination.
  • Communication path: Pod 1’s eth0 -> veth -> cni0 bridge -> flannel.1 (172.16.2.0) -> enp0s3 (10.0.2.15) -> External network
  • Traffic from the pod goes through the node’s network stack and is masqueraded (NAT) using the node’s IP address before leaving for the external network.

3. Pod-to-Pod Communication across Different Nodes:

  • Pod 1 (172.16.1.3) on Node 1 is communicating with Pod 2 (172.16.2.4) on Node 2.
  • Communication path: (A) Pod 1’s eth0 -> veth -> cni0 bridge on Node 1; (B) flannel.1 on Node 1 encapsulates the packet using VXLAN; (C) the packet travels over the physical network (enp0s8) to Node 2; (D) flannel.1 on Node 2 decapsulates the packet; (E) cni0 bridge on Node 2 -> veth -> Pod 2’s eth0.
  • Flannel uses VXLAN encapsulation to create an overlay network, allowing pods to communicate across nodes as if they were on the same local network.

Let us start by executing the following commands inside pod-1.

  • ping -c 1 <pod-2 IP>: Verifies communication with a pod deployed on a different node.
  • ping -c 1 8.8.8.8: Confirms connectivity to an external internet IP address.
  • curl -s wttr.in/Seoul: Checks connectivity to an external internet domain by accessing weather information for Seoul.
root@myk8s-worker2:/# kubectl exec -it pod-1 -- zsh^C
root@myk8s-worker2:/# exit
exit
sigridjineth@sigridjineth-Z590-VISION-G:~$ kubectl exec -it pod-1 -- zsh
dP dP dP
88 88 88
88d888b. .d8888b. d8888P .d8888b. 88d888b. .d8888b. .d8888b. d8888P
88' `88 88ooood8 88 Y8ooooo. 88' `88 88' `88 88' `88 88
88 88 88. ... 88 88 88 88 88. .88 88. .88 88
dP dP `88888P' dP `88888P' dP dP `88888P' `88888P' dP

Welcome to Netshoot! (github.com/nicolaka/netshoot)
Version: 0.13



pod-1  ~  ip -c addr show eth0

2: eth0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
link/ether 2a:a5:3f:0c:33:04 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.244.1.2/24 brd 10.244.1.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::28a5:3fff:fe0c:3304/64 scope link proto kernel_ll
valid_lft forever preferred_lft forever

pod-1  ~  ip -c route
default via 10.244.1.1 dev eth0
10.244.0.0/16 via 10.244.1.1 dev eth0
10.244.1.0/24 dev eth0 proto kernel scope link src 10.244.1.2

pod-1  ~  route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.244.1.1 0.0.0.0 UG 0 0 0 eth0
10.244.0.0 10.244.1.1 255.255.0.0 UG 0 0 0 eth0
10.244.1.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0

pod-1  ~  ping -c 1 10.244.1.1 <GW IP>
PING 10.244.1.1 (10.244.1.1) 56(84) bytes of data.
64 bytes from 10.244.1.1: icmp_seq=1 ttl=64 time=0.038 ms

--- 10.244.1.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.038/0.038/0.038/0.000 ms

pod-1  ~  ping -c 1
ping: usage error: Destination address required

pod-1  ~  ping -c 1 10.244.2.5 <pod-2 IP>
PING 10.244.2.5 (10.244.2.5) 56(84) bytes of data.
64 bytes from 10.244.2.5: icmp_seq=1 ttl=62 time=0.112 ms

--- 10.244.2.5 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.112/0.112/0.112/0.000 ms

pod-1  ~  ping -c 8.8.8.8
ping: invalid argument: '8.8.8.8'

pod-1  ~  curl -s wttr.in/Seoul
Weather report: Seoul

Mist
_ - _ - _ - 21 °C
_ - _ - _ ↗ 4 km/h
_ - _ - _ - 4 km
0.0 mm
┌─────────────┐
┌──────────────────────────────┬───────────────────────┤ Sun 08 Sep ├───────────────────────┬──────────────────────────────┐
│ Morning │ Noon └──────┬──────┘ Evening │ Night │
├──────────────────────────────┼──────────────────────────────┼──────────────────────────────┼──────────────────────────────┤
│ \ / Sunny │ \ / Sunny │ \ / Sunny │ \ / Clear │
│ .-. +26(27) °C │ .-. +31(32) °C │ .-. +29(30) °C │ .-. +27(29) °C │
│ ― ( ) ― ← 4 km/h │ ― ( ) ― ↖ 2 km/h │ ― ( ) ― → 12-15 km/h │ ― ( ) ― → 3-5 km/h │
│ `-’ 10 km │ `-’ 10 km │ `-’ 10 km │ `-’ 10 km │
│ / \ 0.0 mm | 0% │ / \ 0.0 mm | 0% │ / \ 0.0 mm | 0% │ / \ 0.0 mm | 0% │
└──────────────────────────────┴──────────────────────────────┴──────────────────────────────┴──────────────────────────────┘
┌─────────────┐
┌──────────────────────────────┬───────────────────────┤ Mon 09 Sep ├───────────────────────┬──────────────────────────────┐
│ Morning │ Noon └──────┬──────┘ Evening │ Night │
├──────────────────────────────┼──────────────────────────────┼──────────────────────────────┼──────────────────────────────┤
│ \ / Partly Cloudy │ \ / Partly Cloudy │ \ / Partly Cloudy │ _`/"".-. Patchy rain ne…│
│ _ /"".-. +26(27) °C │ _ /"".-. +29(31) °C │ _ /"".-. +31(34) °C │ ,\_( ). +28(31) °C │
│ \_( ). ← 5-7 km/h │ \_( ). ← 6-7 km/h │ \_( ). ↗ 7-10 km/h │ /(___(__) → 5-8 km/h │
│ /(___(__) 10 km │ /(___(__) 10 km │ /(___(__) 10 km │ ‘ ‘ ‘ ‘ 10 km │
│ 0.0 mm | 0% │ 0.0 mm | 0% │ 0.0 mm | 0% │ ‘ ‘ ‘ ‘ 0.0 mm | 89% │
└──────────────────────────────┴──────────────────────────────┴──────────────────────────────┴──────────────────────────────┘
┌─────────────┐
┌──────────────────────────────┬───────────────────────┤ Tue 10 Sep ├───────────────────────┬──────────────────────────────┐
│ Morning │ Noon └──────┬──────┘ Evening │ Night │
├──────────────────────────────┼──────────────────────────────┼──────────────────────────────┼──────────────────────────────┤
│ \ / Sunny │ \ / Sunny │ \ / Sunny │ \ / Clear │
│ .-. +28(31) °C │ .-. +32(35) °C │ .-. +31(32) °C │ .-. +28(29) °C │
│ ― ( ) ― ↗ 2 km/h │ ― ( ) ― → 9-10 km/h │ ― ( ) ― ↘ 10-14 km/h │ ― ( ) ― ↘ 3-5 km/h │
│ `-’ 10 km │ `-’ 10 km │ `-’ 10 km │ `-’ 10 km │
│ / \ 0.0 mm | 0% │ / \ 0.0 mm | 0% │ / \ 0.0 mm | 0% │ / \ 0.0 mm | 0% │
└──────────────────────────────┴──────────────────────────────┴──────────────────────────────┴──────────────────────────────┘
Location: 서울특별시, 대한민국 [37.5666791,126.9782914]

Follow @igor_chubin for wttr.in updates

pod-1  ~  ip -c neigh
10.244.1.1 dev eth0 lladdr e2:3d:cb:15:d1:ba REACHABLE

Let’s analyze the network traffic from different perspectives in a Kubernetes cluster, focusing on cni0, flannel.1, and eth0 interfaces. We’ll examine both pod-to-pod and pod-to-external communications.

cni0 Perspective

a) Pod 1 -> Pod 2:

  • Direct ICMP traffic between 10.244.x.x IP addresses is observed.
  • This represents the starting point of inter-pod communication within the cluster.

b) Pod 1 -> 8.8.8.8 (External):

  • ICMP traffic from the pod’s IP (10.244.x.x) to 8.8.8.8 is seen.
  • This traffic will be NAT’ed later in the network stack.

flannel.1 Perspective

a) Pod 1 -> Pod 2:

  • VXLAN-encapsulated traffic between pods is observed.
  • Original ICMP packets are encapsulated in UDP.

b) Pod 1 -> 8.8.8.8 (External):

  • External traffic typically doesn’t pass through flannel.1.
  • No relevant traffic should be observed here for external communication.

eth0 Perspective

a) Pod 1 -> Pod 2:

  • VXLAN-encapsulated traffic (UDP) between node IPs (e.g., 172.19.x.x) is seen.
  • This represents inter-node communication for pod-to-pod traffic.

b) Pod 1 -> 8.8.8.8 (External):

  • NAT’ed traffic from the node’s IP to 8.8.8.8 is observed.
  • The source IP has been changed from the pod’s IP to the node’s IP.

VXLAN Analysis (eth0, UDP port 8472):

  • Capture file shows VXLAN-encapsulated traffic on UDP port 8472.
  • This confirms the use of VXLAN for pod-to-pod communication across nodes.
# On worker nodes 1 and 2
docker exec -it myk8s-worker bash
docker exec -it myk8s-worker2 bash

# Packet capture commands
tcpdump -i cni0 -nn icmp
tcpdump -i flannel.1 -nn icmp
tcpdump -i eth0 -nn icmp
tcpdump -i eth0 -nn udp port 8472 -w /root/vxlan.pcap

# Check connection tracking
conntrack -L | grep -i icmp

# Copy and analyze VXLAN traffic
docker cp myk8s-worker:/root/vxlan.pcap .
wireshark vxlan.pcap
tcpdump -i cni0 -nn icmp
tcpdump -i flannel.1 -nn icmp
tcpdump -i eth0 -nn icmp   # captures nothing

Even though Pod 1 and Pod 2 are communicating, the last capture on eth0 shows no ICMP packets. By the time the traffic leaves the node it has been encapsulated in VXLAN, so on eth0 it appears only as UDP datagrams on port 8472, not as ICMP.
tcpdump -i eth0 -nn udp port 8472 -w /root/vxlan.pcap

From pod-1, start a ping to pod-2 (or to an external address such as 8.8.8.8), and on node 1 capture the traffic on the eth0 interface, filtering on UDP port 8472.

Open the UDP dump in Wireshark to observe the packet flow; you will see that the ICMP communication between the pods is encapsulated in VXLAN, confirming the encapsulated traffic between them.

The result shows the output of a packet capture using tshark, focusing on VXLAN (Virtual Extensible LAN) traffic between two nodes in a Kubernetes cluster that uses Flannel as the network overlay.

  1. Protocol: The capture shows UDP packets. VXLAN uses UDP for encapsulation.
  2. Source and Destination IPs: We see traffic between 172.19.0.2 and 172.19.0.3. These are likely the IP addresses of two different nodes in the Kubernetes cluster.
  3. Ports: The source port is 50300, and the destination port is 8472. Port 8472 is the Linux kernel’s default VXLAN port, which Flannel uses (the IANA-assigned VXLAN port is 4789).
  4. Length: Each packet has a length of 106 bytes.
  5. Timing: The capture shows multiple packets being exchanged back and forth between the two IP addresses, indicating ongoing communication.

What this capture represents:

  • When a pod on one node tries to communicate with a pod on another node (or with an external IP like 8.8.8.8), Flannel encapsulates this traffic using VXLAN.
  • The original packet (which could be ICMP for ping, in this case) is wrapped inside a UDP packet.
  • This UDP packet is then sent from one node to another using the nodes’ IP addresses (172.19.0.2 and 172.19.0.3).
  • The receiving node’s Flannel process then unwraps the VXLAN packet and forwards the original packet to the appropriate pod.

This encapsulation allows pods to communicate across nodes as if they were on the same local network, even though they’re physically on different machines. It’s a key feature of how Flannel creates an overlay network for Kubernetes clusters.

The consistent packet size (106 bytes) and the regular back-and-forth pattern suggest this is indeed ping (ICMP) traffic that has been encapsulated for inter-node communication.
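If you prefer the terminal to Wireshark, tshark can decode the inner packets directly. A sketch (the capture file name matches the one written above; the decode-as mapping tells tshark that UDP port 8472 carries VXLAN):

tshark -r vxlan.pcap -d udp.port==8472,vxlan -Y icmp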
