Notes on EKS Networking with the AWS VPC CNI
Study notes from the second week of the PKOS study group.
In Kubernetes, pod networking can be achieved using Container Network Interface (CNI) plugins. Two popular types of CNI plugins are overlay networks and Layer 3 implementations.
Overlay networks, such as Flannel and Weave, create a virtual network on top of an existing layer 3 network by using tunnels to route pod traffic among worker nodes. Layer 3 implementations, on the other hand, use routing techniques to achieve the same goal. The AWS VPC CNI plugin is an alternative to these and is the default CNI plugin on EKS, AWS's managed Kubernetes service. It is crucial to evaluate the pros and cons of different CNI plugins to determine the best fit for your specific use case.
AWS VPC CNI
The AWS VPC CNI (Container Network Interface) plugin is an alternative approach to implementing pod networking for Kubernetes clusters running on AWS, and it is the approach we will use in this post. Unlike overlay networks, the VPC CNI plugin does not use tunnels to route traffic among worker nodes. Instead, it allocates IP addresses for pods from the VPC address space, using additional elastic network interfaces (ENIs) on the EC2 instances. This allows pods to communicate directly with each other over the VPC network fabric, without an overlay network or additional route configuration.
As shown in the illustration above, this approach lets two worker nodes, each with a primary and a secondary ENI and their associated IP addresses, communicate seamlessly with all pods, even those running on other nodes. This eliminates the need for tunnels or route distribution mechanisms, resulting in a flat network that can be easily integrated with other AWS services or external networks.
The AWS VPC CNI plugin takes a unique approach by assigning pods IP addresses from the VPC address space. Each EC2 instance type limits the number of ENIs and secondary IP addresses it may use. A t3.medium instance, for example, supports up to three ENIs, each with six IP addresses (one primary and five secondary). This caps the number of usable pod IP addresses at 17 (3 * 5 + 2). When pods communicate with IP addresses outside the VPC, the primary IP address of the ENI is used as the source IP address.
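The max-pods arithmetic above can be sketched in a few lines of shell. The ENI and per-ENI IP counts below are the published limits for a t3.medium; substitute the values for your own instance type.

```shell
#!/bin/sh
# Per-instance-type limits (t3.medium): up to 3 ENIs,
# each supporting 6 IPv4 addresses (1 primary + 5 secondary).
ENIS=3
IPS_PER_ENI=6

# One IP per ENI is the primary address and is not handed out to pods;
# the +2 accounts for host-network pods such as aws-node and kube-proxy.
MAX_PODS=$(( ENIS * (IPS_PER_ENI - 1) + 2 ))
echo "max pods: $MAX_PODS"
```

Running this prints `max pods: 17`, matching the limit stated above.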
Additionally, it is important to note that certain pods, such as kube-proxy and aws-node, use the host's IP directly.
There are two kinds of network namespaces used to separate processes' resources: the host (root) namespace and the per-pod namespace. In this context, the principal NIC (eth0) of the virtual machine is in the root network namespace, whereas containers (ctr*) inside pods are in their own network namespaces. Each pod is equipped with its own network stack and eth0 interface. ENIs on t3.medium instances are limited to a maximum of six IP addresses each; for instance, ENI0 and ENI1 may each have one primary IP address and five secondary private IP addresses. In addition, the coredns pod is connected to the host through a veth pair that links the eniY@ifN interface on the host to the eth0 interface in the pod.
Every Pod in Kubernetes has its own network namespace and eth0 interface. Virtual Ethernet (veth) devices provide communication between the Pod network namespace and the root network namespace. A veth device is a pair of virtual interfaces that can span network namespaces: one end of the pair is placed in the root network namespace and the other end in the Pod's network namespace. This enables the transmission of packets between Pods and the root namespace.
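The veth plumbing described above can be reproduced by hand with the `ip` tool. The sketch below only prints the commands instead of executing them, since creating namespaces requires root; the names (pod1, veth0, ceth0) and the address 10.0.0.10 are made up for illustration and are not what a real CNI plugin would choose.

```shell
#!/bin/sh
# Illustrative sketch of how a CNI plugin conceptually wires a pod namespace
# to the root namespace. Commands are printed, not executed (they need root).
run() { echo "$@"; }

run ip netns add pod1                                # create the pod's network namespace
run ip link add veth0 type veth peer name ceth0      # create the veth pair
run ip link set ceth0 netns pod1                     # move one end into the pod namespace
run ip netns exec pod1 ip link set ceth0 name eth0   # rename it to eth0, as pods expect
run ip netns exec pod1 ip addr add 10.0.0.10/24 dev eth0
run ip netns exec pod1 ip link set eth0 up
run ip link set veth0 up                             # host-side end stays in the root namespace
```

The host-side veth0 is what shows up as eniY@ifN on the node in the earlier description.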
In the given scenario above, Linux Ethernet bridges, often named cbr0, are used to ease connectivity between Pods. These L2 network devices link different network segments and employ the Address Resolution Protocol (ARP) to determine the MAC address associated with a specific IP address.
When a packet must be transferred from Pod 1 to Pod 2 on the same node, for instance, Pod 1 first transmits the packet through its own eth0 interface. This interface is linked to the root namespace by a veth pair, in this case the veth0 interface. Once the packet is on the bridge, ARP is used to determine which network segment the packet should be sent to, in this instance veth1. When the packet reaches veth1, it is delivered immediately to the eth0 interface of Pod 2's network namespace. Throughout this exchange, the cbr0 bridge acts as the intermediary that forwards packets between the two network namespaces.
SNAT (Source Network Address Translation) is the mechanism Kubernetes uses to allow pod-to-external-address communication. It replaces the pod's internal source IP and port with the IP and port of the host's eth0 interface. When the host receives the response packet, it rewrites the destination back to the pod's IP and port and forwards the packet to the pod that sent the original request. Because this operation is transparent, the pod is unaware that any address translation took place.
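A generic node-level SNAT rule of this kind can be expressed with iptables. The sketch below prints the command rather than running it (iptables changes need root), and the CIDRs are hypothetical; the real AWS VPC CNI installs its own AWS-SNAT-CHAIN-* rules, which differ in detail.

```shell
#!/bin/sh
# Simplified illustration of node-level SNAT for pod egress traffic.
# Printed, not executed. CIDRs are placeholder values, not from a real cluster.
POD_CIDR="192.168.0.0/16"   # assumed pod address range
VPC_CIDR="10.0.0.0/16"      # assumed VPC range; traffic staying inside it is not translated

run() { echo "$@"; }

# Masquerade pod traffic that leaves the VPC: rewrite the source address to the
# node ENI's primary IP so external hosts can route replies back to the node.
run iptables -t nat -A POSTROUTING -s "$POD_CIDR" ! -d "$VPC_CIDR" -j MASQUERADE
```

The MASQUERADE target performs exactly the source-IP/port substitution described above, and conntrack handles the reverse rewrite on the response.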
How about allowing external-to-pod communication? For external requests to reach workloads inside a Kubernetes cluster, a service must be created to expose the desired pods to external network traffic. Without a service, the pods' virtual, private IP addresses are not reachable from outside the cluster. This can be demonstrated by attempting to access the IP address of a frontend pod from an external server without a corresponding service in place.
To expose a FrontEnd service to external traffic, a NodePort service may be established. This sort of service assigns a port within a certain range (by default 30000–32767) on every cluster node, and that port is then used to proxy FrontEnd service traffic. The allocated port number can be found in the .spec.ports[*].nodePort field of the service. External clients may then contact the FrontEnd service at <anyClusterNode>:<nodePort>. If a particular port number is required, it may be supplied in the nodePort field; however, care must be taken to verify that the intended port number is within the permitted range and does not conflict with other services.
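As a concrete sketch, a NodePort Service manifest might look like the following. The names (frontend) and port numbers are placeholders chosen for illustration, not taken from any particular cluster.

```shell
#!/bin/sh
# Write a minimal NodePort Service manifest. Service name, selector,
# and ports are hypothetical examples.
cat > frontend-svc.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: frontend
spec:
  type: NodePort
  selector:
    app: frontend
  ports:
    - port: 80          # cluster-internal service port
      targetPort: 8080  # container port on the pods
      nodePort: 30080   # must fall in the default 30000-32767 range
EOF
echo "wrote frontend-svc.yaml"
```

Applying this with `kubectl apply -f frontend-svc.yaml` would make the service reachable at <anyClusterNode>:30080; omitting the nodePort field lets Kubernetes pick a free port from the range instead.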
Load Balancer Service Type in k8s
The Local external traffic policy on a LoadBalancer service type is used by applications that receive a significant volume of external traffic and want to reduce latency by eliminating unnecessary network hops. Nodes in a Kubernetes cluster are not directly exposed to the external network; only the load balancer is publicly accessible. This means external clients can only connect to the load balancer and learn nothing about the internal nodes. With the externalTrafficPolicy: Local setting, the load balancer distributes load only among nodes where matching pods are situated, and the iptables rules on each node restrict connections to pods on that same node.
DNAT (Destination Network Address Translation) happens twice: first when the client's traffic enters and leaves the load balancer, and second when the node's iptables rules forward the connection to the pod's IP. In Kubernetes clusters it is also possible to preserve the external client's source IP. If the target of an AWS NLB is an instance, the client IP is retained; the same effect can be achieved with the externalTrafficPolicy: Local setting, since it avoids the extra node-to-node hop that would otherwise rewrite the source address.
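A minimal manifest combining these settings is sketched below. The service name, selector, and ports are placeholders; the aws-load-balancer-type: nlb annotation shown here is the one historically honored by the in-tree AWS cloud provider, and your cluster's load balancer controller may expect different annotations.

```shell
#!/bin/sh
# Write a minimal LoadBalancer Service with externalTrafficPolicy: Local.
# Names, ports, and selector are hypothetical examples.
cat > frontend-lb.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: frontend-lb
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  type: LoadBalancer
  # Keep traffic on the node that owns the pod and preserve the
  # client's source IP by skipping the node-to-node hop.
  externalTrafficPolicy: Local
  selector:
    app: frontend
  ports:
    - port: 80
      targetPort: 8080
EOF
echo "wrote frontend-lb.yaml"
```

One trade-off worth noting: with Local, nodes without a matching pod fail the load balancer's health checks and receive no traffic, so pod spread across nodes matters more than with the default Cluster policy.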