Diving Deep Into Kubernetes Networking
AUTHORS
Adrian Goins
Alena Prokharchyk
Murali Paluru
TABLE OF CONTENTS
Introduction
Goals of This Book
Container-to-Container Communication
Flannel Backends
Conclusion
The reader is expected to have a basic understanding of containers, Kubernetes, and operating system fundamentals.
An Introduction to Networking with Docker
Docker follows a unique approach to networking that is very different from the Kubernetes approach. Understanding how Docker works helps later in understanding the Kubernetes approach.

Networking is bundled with Docker, and all containers on the same host can communicate with one another. We can change this by defining the network to which the container should connect, either by creating a custom user-defined network or by using a network provider plugin. The network providers are pluggable using drivers. We connect a Docker container to a particular network by using the --net switch when launching it.

The following command launches a container from the busybox image and joins it to the host network. This container prints its IP address and then exits.
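A command along these lines matches that description (the original shows a screenshot, so the exact invocation may differ):

docker run --rm --net=host busybox ip addr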
A. Host Networking: The container shares the same IP address and network namespace as that of the host. Services
running inside of this container have the same network capabilities as services running directly on the host.
B. Bridge Networking: The container runs in a private network internal to the host. Communication is open to other
containers in the same network. Communication with services outside of the host goes through network address
translation (NAT) before exiting the host. (This is the default mode of networking when the --net option isn't specified)
C. Custom bridge network: This is the same as Bridge Networking but uses a bridge explicitly created for this (and other)
containers. An example of how to use this would be a container that runs on an exclusive "database" bridge network.
Another container can have an interface on the default bridge and the database bridge, enabling it to communicate with
both networks.
D. Container-defined Networking: A container can share the address and network configuration of another container. This
type enables process isolation between containers, where each container runs one service but where services can still
communicate with one another on the localhost address.
Host Networking
The host mode of networking allows the Docker container to share the same IP address as that of the host and disables the network isolation otherwise provided by network namespaces. The container's network stack is mapped directly to the host's network stack. All interfaces and addresses on the host are visible within the container, and all communication possible to or from the host is possible to or from the container.
If you run the command ip addr on a host (or ifconfig -a if your host doesn't have the ip command available), you will see information about the network interfaces.
If you run the same command from a container using host networking, you will see the same information.
Bridge Networking
In a standard Docker installation, the Docker daemon creates a bridge on the host with the name of docker0. When a container launches, Docker then creates a virtual ethernet device for it. This device appears within the container as eth0 and on the host with a name like vethxxx, where xxx is a unique identifier for the interface. The vethxxx interface is added to the docker0 bridge, and this enables communication with other containers on the same host that also use the default bridge.
To demonstrate using the default bridge, run the following command on a host with Docker installed. Since we are not specifying the network, the container will connect to the default bridge when it launches.
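One way to do this, assuming the busybox image as above (the original shows a screenshot rather than the literal command):

docker run -it --rm busybox /bin/sh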
Run the ip addr and ip route commands inside of the container. You will see the IP address of the container with the eth0
interface:
In another terminal connected to the host, run the ip addr command. You will see the corresponding interface created for the container. In our example it was named veth5dd2b68@if9; yours will be different.
Although Docker mapped the container IPs on the bridge, network services running inside of the container are not visible outside
of the host. To make them visible, the Docker Engine must be told when launching a container to map ports from that container to
ports on the host. This process is called publishing. For example, if you want to map port 80 of a container to port 8080 on the host,
then you would have to publish the port as shown in the following command:
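A sketch of that command, using nginx purely as an example image (the original shows a screenshot instead of the literal command line):

docker run --rm -d -p 8080:80 nginx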
By default, the Docker container can send traffic to any destination. The Docker daemon creates a rule within Netfilter that
modifies outbound packets and changes the source address to be the address of the host itself. The Netfilter configuration allows
inbound traffic via the rules that Docker creates when initially publishing the container's ports.
You can inspect the Netfilter rules that Docker creates when it publishes a container's ports directly on the host.
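For example, both the nat and filter tables carry a DOCKER chain on a standard installation:

sudo iptables -t nat -nL DOCKER      # the DNAT rule for the published port lands here
sudo iptables -t filter -nL DOCKER   # the corresponding ACCEPT rule lands here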
Custom Bridge Network
Using a custom bridge network instead of the default bridge offers several advantages:
• All containers in a custom bridge can communicate with the ports of other containers on that bridge. This means
that you do not need to publish the ports explicitly. It also ensures that the communication between them is
secure. Imagine an application in which a backend container and a database container need to communicate and
where we also want to make sure that no external entity can talk to the database. We do this with a custom bridge
network in which only the database container and the backend containers reside. You can explicitly expose the
backend API to the rest of the world using port publishing.
• The same is true with environment variables - environment variables in a bridge network are shared by all
containers on that bridge.
• Network configuration options such as MTU can differ between applications. By creating a bridge, you can
configure the network to best suit the applications connected to it.
To create a custom bridge network and two containers that use it, run the following commands:
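A sketch of those commands; the network name and images are illustrative:

docker network create --driver bridge mybridge
docker run -d --net=mybridge --name db redis
docker run -d --net=mybridge --name backend nginx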
Container-Defined Network
A specialized case of custom networking is when a container joins the network of another container. This is similar to how a Pod
works in Kubernetes.
The following commands launch two containers that share the same network namespace and thus share the same IP address.
Services running on one container can talk to services running on the other via the localhost address.
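A sketch with illustrative names: the second container joins the first container's network namespace, so both see the same IP address:

docker run -d --name web nginx
docker run -it --rm --net=container:web busybox ip addr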
No Networking
This mode is useful when the container does not need to communicate with other containers or with the outside world. It is not
assigned an IP address, and it cannot publish any ports.
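For example, a container started with --net=none has only a loopback interface, which is a quick way to verify the mode:

docker run --rm --net=none busybox ip addr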
CONTAINER-TO-CONTAINER COMMUNICATION
How do two containers on the same bridge network talk to one another?
Two containers running on the same host connect via the docker0 bridge. If the first container (172.17.0.6) wants to send a request to the second (172.17.0.7), the packets move as follows:
1. A packet leaves the container via eth0 and lands on the corresponding vethxxx interface.
2. The vethxxx interface connects to the vethyyy interface via the docker0 bridge.
3. The docker0 bridge forwards the packet to the vethyyy interface.
4. The packet moves to the eth0 interface within the destination container.
We can see this in action by using ping and tcpdump. Create two containers and inspect their network configuration with ip
addr and ip route. The default route for each container is via the eth0 interface.
Ping one container from the other, and let the command run so that we can inspect the traffic. Run tcpdump on the docker0
bridge on the host machine. You will see in the output that the traffic moves between the two containers via the docker0 bridge.
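A sketch of the demonstration; container names and addresses are illustrative:

docker run -dit --name c1 busybox
docker run -dit --name c2 busybox
docker exec c2 ip addr               # note c2's address, e.g. 172.17.0.3
docker exec c1 ping 172.17.0.3       # leave this running
sudo tcpdump -ni docker0 icmp        # on the host, in another terminal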
The overlay network functionality built into Docker is called Swarm. When you connect a host to a swarm, the Docker engine on
each host handles communication and routing between the hosts.
Other overlay networks exist, such as IPVLAN, VxLAN, and MACVLAN. More solutions are available for Kubernetes.
For more information on pure-Docker networking implementations for cross-host networking (including Swarm mode and
libnetwork), please refer to the documentation available at the Docker website.
Interlude: Netfilter and iptables rules

Filter Table
Rules in the Filter table control if a packet is allowed or denied. Packets which are allowed are forwarded, whereas packets which are denied are either rejected or silently dropped.
Raw Table
This table marks packets to bypass the iptables stateful connection tracking.
Security Table
This table sets the SELinux security context marks on packets. Setting the marks affects how SELinux (or systems that can
interpret SELinux security contexts) handle the packets. The rules in this table set marks on a per-packet or per-connection basis.
Netfilter organizes the rules in a table into chains. Chains are the means by which Netfilter hooks in the kernel intercept packets as
they move through processing. Packets flow through one or more chains and exit when they match a rule.
A rule defines a set of conditions, and if the packet matches those conditions, an action is taken. The universe of actions is diverse. The action that a rule takes is called a target, and examples include accepting, dropping, or forwarding the packet.
The system comes with five default chains that match different phases of a packet’s journey through processing: PREROUTING,
INPUT, FORWARD, OUTPUT, and POSTROUTING. Users and programs may create additional chains and inject rules into the system
chains to forward packets to a custom chain for continued processing. This architecture allows the Netfilter configuration to follow
a logical structure, with chains representing groups of related rules.
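For example, a user-defined chain can be created and attached to the FORWARD chain like this (the chain name is illustrative):

sudo iptables -N MY-CHAIN
sudo iptables -A FORWARD -j MY-CHAIN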
Docker creates several chains, and it is the actions of these chains that handle communication between containers, the host, and
the outside world.
An Introduction to Kubernetes Networking
Kubernetes networking builds on top of the Docker and Netfilter constructs to tie multiple components together into applications. Kubernetes resources have specific names and capabilities, and we want to understand those before exploring their inner workings.

Pods
In the simplest definition, a Pod encapsulates one or more containers. Containers in the same Pod always run on the same host. They share resources such as the network namespace and storage.
Each Pod has a routable IP address assigned to it, not to the containers running within it. Having a shared network space for all containers means that the containers inside can communicate with one another over the localhost address, a feature not present in traditional Docker networking.
The most common use of a Pod is to run a single container. Situations where different processes work on the same shared resource, such as content in a storage volume, benefit from having multiple containers in a single Pod. Some projects inject containers into running Pods to deliver a service. An example of this is the Istio service mesh, which uses this injected container as a proxy for all communication.
Because a Pod is the basic unit of deployment, we can map it to a single instance of an application. For example, a three-tier application that runs a user interface (UI), a backend, and a database would model the deployment of the application on Kubernetes with three Pods. If one tier of the application needed to scale, the number of Pods in that tier could scale accordingly.
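As a minimal illustration of containers sharing a Pod's network namespace (names and images are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar
spec:
  containers:
  - name: web
    image: nginx
    ports:
    - containerPort: 80
  - name: sidecar
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    # from inside this container, wget http://localhost:80 reaches the nginx container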
Production applications with users run more than one instance of the application. This enables fault tolerance, where if one instance goes down, another handles the traffic so that users don't experience a disruption to the service. In a traditional model that doesn't use Kubernetes, these types of deployments require that an external person or software monitors the application and acts accordingly.

Kubernetes recognizes that an application might have unique requirements. Does it need to run on every host? Does it need to handle state to avoid data corruption? Can all of its pieces run anywhere, or do they need special scheduling consideration? To accommodate those situations where a default structure won't give the best results, Kubernetes provides abstractions for different workload types.

REPLICASET
The ReplicaSet maintains the desired number of copies of a Pod running within the cluster. If a Pod or the host on which it's running fails, Kubernetes launches a replacement. In all cases, Kubernetes works to maintain the desired state of the ReplicaSet.

DEPLOYMENT
A Deployment manages a ReplicaSet. Although it's possible to launch a ReplicaSet directly or to use a ReplicationController, the use of a Deployment gives more control over the rollout strategies of the Pods that the ReplicaSet controller manages. By defining the desired states of Pods through a Deployment, users can perform updates to the image running within the containers and maintain the ability to perform rollbacks.

DAEMONSET
A DaemonSet runs one copy of the Pod on each node in the Kubernetes cluster. This workload model provides the flexibility to run daemon processes such as log management, monitoring, storage providers, or network providers that handle Pod networking for the cluster.

STATEFULSET
A StatefulSet controller ensures that the Pods it manages have durable storage and persistent identity. StatefulSets are appropriate for situations where Pods have a similar definition but need a unique identity, ordered deployment and scaling, and storage that persists across Pod rescheduling.
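A minimal Deployment sketch tying these ideas together (labels and image are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:1.15

Changing the image tag and re-applying the manifest triggers a rolling update, which can later be rolled back.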
POD NETWORKING
The Pod is the smallest unit in Kubernetes, so it is essential to first understand Kubernetes networking in the context of
communication between Pods. Because a Pod can hold more than one container, we can start with a look at how communication
happens between containers in a Pod. Although Kubernetes can use Docker for the underlying container runtime, its approach to
networking differs slightly and imposes some basic principles:
• Any Pod can communicate with any other Pod without the use of network address translation (NAT). To facilitate
this, Kubernetes assigns each Pod an IP address that is routable within the cluster.
• A Pod's awareness of its address is the same as how other resources see the address. The host's address doesn't
mask it.
These principles give a unique and first-class identity to every Pod in the cluster. Because of this, the networking model is more
straightforward and does not need to include port mapping for the running container workloads. By keeping the model simple,
migrations into a Kubernetes cluster require fewer changes to the container and how it communicates.
The pause container was initially designed to act as the init process within a PID namespace shared by all containers in the Pod. It
performed the function of reaping zombie processes when a container died. PID namespace sharing is now disabled by default, so
unless it has been explicitly enabled in the kubelet, all containers run their process as PID 1.
If we launch a Pod running Nginx, we can inspect the Docker container running within the Pod.
When we do so, we see that the container does not have the network settings provided to it. The pause container which runs as
part of the Pod is the one which gives the networking constructs to the Pod.
Note: Run the commands below on the host where the nginx Pod is scheduled.
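Commands along these lines show the relationship (the exact commands in the original are screenshots; container IDs and Pod names will differ):

kubectl run nginx --image=nginx                # schedule the Pod
docker ps | grep -E 'nginx|pause'              # on that node: the pause container runs alongside nginx
docker inspect --format '{{.HostConfig.NetworkMode}}' <nginx-container-id>
# prints container:<pause-container-id>, showing that nginx joins the pause container's network namespace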
Kubernetes Service
Pods are ephemeral. The services that they provide may be critical, but because Kubernetes can terminate Pods at any time, they are
unreliable endpoints for direct communication. For example, the number of Pods in a ReplicaSet might change as the Deployment scales
it up or down to accommodate changes in load on the application, and it is unrealistic to expect every client to track these changes while
communicating with the Pods. Instead, Kubernetes offers the Service resource, which provides a stable IP address and balances traffic
across all of the Pods behind it. This abstraction brings stability and a reliable mechanism for communication between microservices.
Services which sit in front of Pods use a selector and labels to find the Pods they manage. All Pods with a label that matches the selector
receive traffic through the Service. Like a traditional load balancer, the service can expose the Pod functionality at any port, irrespective of
the port in use by the Pods themselves.
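A minimal ClusterIP Service sketch showing the selector and the port mapping (names and ports are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: backend
spec:
  selector:
    app: myapp
    role: backend
  ports:
  - protocol: TCP
    port: 80          # port exposed by the Service
    targetPort: 8080  # port the Pods listen on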
KUBE-PROXY
The kube-proxy daemon that runs on all nodes of the cluster allows the Service to map traffic from one port to another.
This component configures the Netfilter rules on all of the nodes according to the Service's definition in the API server. From Kubernetes 1.9 onward it can instead use the netlink interface to create IPVS rules. Either way, these rules direct traffic to the appropriate Pod.
A service definition specifies the type of Service to deploy, with each type of Service having a different set of capabilities.
CLUSTERIP
This type of Service is the default and exists on an IP that is only visible within the cluster. It enables cluster resources to reach one another via a known address while maintaining the security boundaries of the cluster itself. For example, a database used by a backend application does not need to be visible outside of the cluster, so using a service of type ClusterIP is appropriate. The backend application would expose an API for interacting with records in the database, and a frontend application or remote clients would consume that API.

NODEPORT
A Service of type NodePort exposes the same port on every node of the cluster. The range of available ports is a cluster-level configuration item, and the Service can either choose one of the ports at random or have one designated in its configuration. This type of Service automatically creates a ClusterIP Service as its target, and the ClusterIP Service routes traffic to the Pods.
External load balancers frequently use NodePort services. They receive traffic for a specific site or address and forward it to the cluster on that specific port.

LOADBALANCER
When working with a cloud provider for whom support exists within Kubernetes, a Service of type LoadBalancer creates a load balancer in that provider's infrastructure. The exact details of how this happens differ between providers, but all create the load balancer asynchronously and configure it to proxy the request to the corresponding Pods via NodePort and ClusterIP Services that it also creates.
In a later section, we explore Ingress Controllers and how to use them to deliver a load balancing solution for a cluster.
DNS
As we stated above, Pods are ephemeral, and because of this, their IP addresses are not reliable endpoints for communication.
Although Services solve this by providing a stable address in front of a group of Pods, consumers of the Service still want to avoid
using an IP address. Kubernetes solves this by using DNS for service discovery.
The default internal domain name for a cluster is cluster.local. When you create a Service, it assembles a subdomain of
namespace.svc.cluster.local (where namespace is the namespace in which the service is running) and sets its name as the
hostname. For example, if the service was named nginx and ran in the default namespace, consumers of the service would be able
to reach it as nginx.default.svc.cluster.local. If the service's IP changes, the hostname remains the same. There is no
interruption of service.
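A quick way to see this resolution from inside the cluster (the image and Service name are illustrative):

kubectl run -it --rm --restart=Never dns-test --image=busybox -- nslookup nginx.default.svc.cluster.local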
The default DNS provider for Kubernetes is KubeDNS, but it’s a pluggable component. Beginning with Kubernetes 1.11 CoreDNS is
available as an alternative. In addition to providing the same basic DNS functionality within the cluster, CoreDNS supports a wide
range of plugins to activate additional functionality.
NETWORK POLICY
In an enterprise deployment of Kubernetes the cluster often supports multiple projects with different goals. Each of these projects
has different workloads, and each of these might require a different security policy.
Pods, by default, do not filter incoming traffic. There are no firewall rules for inter-Pod communication. Instead, this responsibility
falls to the NetworkPolicy resource, which uses a specification to define the network rules applied to a set of Pods.
The network policies are defined in Kubernetes, but the CNI plugins that support network policy implementation do the actual
configuration and processing. In a later section, we look at CNI plugins and how they work.
The policy defined below states that the database Pods can only receive traffic from the Pods with the labels app=myapp
and role=backend. It also defines that the backend Pods can only receive traffic from Pods with the labels app=myapp and
role=web.
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: backend-access-ingress
spec:
  podSelector:
    matchLabels:
      app: myapp
      role: backend
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: myapp
          role: web

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: db-access-ingress
spec:
  podSelector:
    matchLabels:
      app: myapp
      role: db
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: myapp
          role: backend
With this network policy in place, Kubernetes blocks communication between the web
and database tiers.
PODSELECTOR
This field tells Kubernetes how to find the Pods to which this policy applies. Multiple network policies can select the same set of Pods, and the ingress rules are applied sequentially. The field is not optional, but if the manifest defines a key with no value, it applies to all Pods in the namespace.

POLICYTYPES
This field defines the direction of network traffic to which the rules apply. If missing, Kubernetes interprets the rules and only applies them to ingress traffic unless egress rules also appear in the rules list. This default interpretation simplifies the manifest's definition by having it adapt to the rules defined later.
EGRESS
Rules defined under this field apply to egress traffic from the selected Pods to destinations defined in the rule. Destinations can be an IP block (ipBlock), one or more Pods (podSelector), one or more namespaces (namespaceSelector), or a combination of both podSelector and namespaceSelector.

The following rule permits traffic from the Pods to any address in 10.0.0.0/24 and only on TCP port 5978:

egress:
- to:
  - ipBlock:
      cidr: 10.0.0.0/24
  ports:
  - protocol: TCP
    port: 5978
The next rule permits outbound traffic for Pods with the labels app=myapp and role=backend
to any host on TCP or UDP port 53:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-egress-denyall
spec:
  podSelector:
    matchLabels:
      app: myapp
      role: backend
  policyTypes:
  - Egress
  egress:
  - ports:
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP
Egress rules work best to limit a resource’s communication to the other resources on which it
relies. If those resources are in a specific block of IP addresses, use the ipBlock selector to
target them, specifying the appropriate ports:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-egress-denyall
spec:
  podSelector:
    matchLabels:
      app: myapp
      role: backend
  policyTypes:
  - Egress
  egress:
  - ports:
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP
  - to:
    - ipBlock:
        cidr: 10.0.0.0/24
    ports:
    - protocol: TCP
      port: 3306
INGRESS
Rules listed in this field apply to traffic that is inbound to the selected Pods. If the field is empty, all inbound traffic will be blocked. The example below permits inbound access from any address in 172.17.0.0/16 unless it's within 172.17.1.0/24. It also permits traffic from any Pod in the namespace myproject.

(Note the subtle distinction in how the rules are listed. Because namespaceSelector is a separate item in the list, it matches with an or value. Had namespaceSelector been listed as an additional key in the first list item, it would permit traffic that came from the specified ipBlock and was also from the namespace myproject.)

ingress:
- from:
  - ipBlock:
      cidr: 172.17.0.0/16
      except:
      - 172.17.1.0/24
  - namespaceSelector:
      matchLabels:
        project: myproject
  - podSelector:
      matchLabels:
        role: frontend
  ports:
  - protocol: TCP
    port: 6379

The next policy permits access to the Pods labeled app=myapp and role=web from all sources, external or internal.

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: web-allow-all-access
spec:
  podSelector:
    matchLabels:
      app: myapp
      role: web
  ingress:
  - from: []

Consider, however, that this allows traffic to any port on those Pods. Even if no other ports are listening, the principle of least privilege states that we only want to expose what we need to expose for the services to work. The following modifications to the NetworkPolicy take this rule into account by only allowing inbound traffic to the ports where our Service is running.

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: web-allow-all-access-specific-port
spec:
  podSelector:
    matchLabels:
      app: myapp
      role: web
  ingress:
  - ports:
    - port: 8080
    from: []
Apart from opening incoming traffic on certain ports, you can also enable all traffic from a set of Pods inside the cluster. This enables a few trusted applications to reach the application on all ports and is especially useful when workloads in a cluster communicate with each other over many random ports. The opening of traffic from certain Pods is achieved by using labels as described in the policy below:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: web-allow-internal-port80
spec:
  podSelector:
    matchLabels:
      app: "myapp"
      role: "web"
  ingress:
  - ports:
    - port: 8080
    from:
    - podSelector:
        matchLabels:
          app: "mytestapp"
          role: "web-test-client"
Even if a Service listens on a different port than where the Pod’s containers listen, use the container ports in the network policy.
Ingress rules affect inter-Pod communication, and the policy does not know about the abstraction of the service.
CONTAINER NETWORKING INTERFACE (CNI)
The CNI specification requires that providers implement their plugin as a binary executable that the container engine invokes. Kubernetes does this via the Kubelet process running on each node of the cluster.
The CNI specification expects the container runtime to create a new network namespace before invoking the CNI plugin. The
plugin is then responsible for connecting the container’s network with that of the host. It does this by creating the virtual Ethernet
devices that we discussed earlier.
Kubernetes natively supports the CNI model. It gives its users the freedom to choose the network provider or product best suited
for their needs.
To use the CNI plugin, pass --network-plugin=cni to the Kubelet when launching it. If your environment is not using the default
configuration directory (/etc/cni/net.d), pass the correct configuration directory as a value to --cni-conf-dir. The Kubelet
looks for the CNI plugin binary at /opt/cni/bin, but you can specify an alternative location with --cni-bin-dir.
The CNI plugin provides IP address management for the Pods and builds routes for the virtual interfaces. To do this, the plugin
interfaces with an IPAM plugin that is also part of the CNI specification. The IPAM plugin must also be a single executable that the
CNI plugin consumes. The role of the IPAM plugin is to provide to the CNI plugin the gateway, IP subnet, and routes for the Pod.
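To make the relationship concrete, here is a minimal configuration for the reference bridge plugin with the host-local IPAM plugin, of the kind found in /etc/cni/net.d (all values are examples):

{
  "cniVersion": "0.3.1",
  "name": "mynet",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.22.0.0/16",
    "routes": [ { "dst": "0.0.0.0/0" } ]
  }
}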
Networking with Flannel
Flannel is one of the most straightforward network providers for Kubernetes. It operates at Layer 3 and offloads the actual packet forwarding to a backend such as VxLAN or IPSec. It assigns a large network to all hosts in the cluster and then assigns a portion of that network to each host. Routing between containers on a host happens via the usual channels, and Flannel handles routing between hosts using one of its available options.

Flannel uses etcd to store the map of what network is assigned to which host. The target can be an external deployment of etcd or the one that Kubernetes itself uses.

RUNNING FLANNEL WITH KUBERNETES
Flannel Pods roll out as a DaemonSet, with one Pod assigned to each host. To deploy it within Kubernetes, use the kube-flannel.yaml manifest from the Flannel repository on GitHub.
Once Flannel is running, it is not possible to change the network address space or the backend communication format without cluster downtime.
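Assuming the manifest has been downloaded from that repository, deployment is a single command:

kubectl apply -f kube-flannel.yaml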
FLANNEL BACKENDS
VxLAN
VxLAN is the simplest of the officially supported backends for Flannel. Encapsulation happens within the kernel, so there is no
additional overhead caused by moving data between the kernel and user space.
The VxLAN backend creates a Flannel interface on every host. When a container on one node wishes to send traffic to a different
node, the packet goes from the container to the bridge interface in the host’s network namespace. From there the bridge forwards
it to the Flannel interface because the kernel route table designates that this interface is the target for the non-local portion of the
overlay network. The Flannel network driver wraps the packet in a UDP packet and sends it to the target host.
Once it arrives at its destination, the process flows in reverse, with the Flannel driver on the destination host unwrapping the
packet, sending it to the bridge interface, and from there the packet finds its way into the overlay network and to the destination
Pod.
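The backend is selected in Flannel's network configuration (stored in etcd or in the kube-flannel ConfigMap); a typical VxLAN configuration looks roughly like this, with the CIDR being an example value:

{
  "Network": "10.244.0.0/16",
  "Backend": {
    "Type": "vxlan"
  }
}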
Networking with Calico
Calico operates at Layer 3 and assigns every workload a routable IP address. It prefers to operate by using BGP without an overlay network for the highest speed and efficiency, but in scenarios where hosts cannot directly communicate with one another, it can utilize an overlay solution such as VxLAN or IP-in-IP.

Calico supports network policies for protecting workloads and nodes from malicious activity or aberrant applications.

The Calico networking Pod contains a CNI container, a container that runs an agent that tracks Pod deployments and registers addresses and routes, and a daemon that announces the IP and route information to the network via the Border Gateway Protocol (BGP). The BGP daemons build a map of the network that enables cross-host communication.

The final piece of a Calico deployment is the controller. Although presented as a single object, it is a set of controllers that run as a control loop within Kubernetes to manage policy, workload endpoints, and node changes.
• The Policy Controller watches for changes in the defined network policies and translates them into Calico network policies.
• The Profile Controller watches for the addition or removal of namespaces and programs Calico objects called Profiles.
• Calico stores Pod information as workload endpoints. The Workload Endpoint Controller watches for updates to labels on the Pod and updates the workload endpoints.

Users can manage Calico objects within the Kubernetes cluster via the command-line tool calicoctl. The tool's only requirement is that it can reach the Calico datastore.
Launch a Pod and note the IP address and the eth0 interface within it. On the node where the Pod runs, the routing table indicates that a local interface (cali106d129118f in this example) handles traffic for the IP address of the Pod. The calico-node Pod creates this interface and propagates the routes to other nodes in the cluster.
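Commands along these lines show the same information (the Pod name is a placeholder, and the container image must include the ip utility):

kubectl exec -it <pod-name> -- ip addr show eth0   # the Pod's address
ip route                                           # on the node: look for the cali* interface entry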
Kubernetes scheduled our Pod to run on k8s-n-1. If we look at the route table on the other two nodes, we see that each directs
192.168.2.0/24 to 70.0.80.117, which is the address of k8s-n-1.
Before we can use a route reflector, we first have to disable the default node-to-node BGP peering in the Calico configuration.
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: false
  asNumber: 63400
Next, use calicoctl to show the autonomous system number (ASN) for each node in the Kubernetes cluster.
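A sketch of that command (output columns can vary between calicoctl versions):

calicoctl get nodes --output=wide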
The calico-node Pods use one of two methods to build the peering relationship with external peers: global peering or per-node
peering.
Route Reflector
If the network has a device that we want to have all of the nodes peer with, we can create a global BGPPeer resource within the
cluster. Doing it this way assures that we only have to create the configuration once for it to be applied correctly everywhere.
Use the ASN retrieved above and the IP of the external peer.
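A sketch of a global BGPPeer; the peer address is illustrative, and the AS number matches the configuration shown earlier:

apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: global-peer
spec:
  peerIP: 192.0.2.1
  asNumber: 63400

Apply it with calicoctl apply -f <file>.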
You can view the current list of BGP Peers with the following:
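calicoctl get bgpPeer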
To create a network topology where only a subset of nodes peers with certain external devices, we create a per-node BGPPeer
resource within the cluster.
As before, use the ASN for the Calico network and the IP of the BGP peer. Specify the node to which this configuration applies.
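A sketch using the node name from the example above; the peer address is illustrative:

apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: k8s-n-1-peer
spec:
  node: k8s-n-1
  peerIP: 192.0.2.1
  asNumber: 63400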
You can remove a per-node BGP peer or view the current per-node configuration with calicoctl:
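For example, assuming the peer defined in the sketch above:

calicoctl get bgpPeer k8s-n-1-peer --output=yaml
calicoctl delete bgpPeer k8s-n-1-peer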
USING IP-IN-IP
If we’re unable to use BGP, perhaps because we’re using a cloud provider or another environment where we have limited control
over the network or no permission to peer with other routers, Calico's IP-in-IP mode encapsulates packets before sending them to
other nodes.
To enable this mode, define the ipipMode field on the IPPool resource:
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: project1IPPool
spec:
  cidr: 10.11.12.0/16
  ipipMode: CrossSubnet
  natOutgoing: true
After activating IP-in-IP, Calico wraps inter-Pod packets in a new packet with headers that indicate the source of the packet is the
host with the originating Pod, and the target of the packet is the host with the destination Pod. The Linux kernel performs this
encapsulation and then forwards the packet to the destination host where it is unwrapped and delivered to the destination Pod.
For the CrossSubnet mode to work, each Calico node must use the IP address and subnet mask for the host. For more information
on this, see the Calico documentation for IP-in-IP.
Combining Flannel and Calico (Canal)
For some time an effort to integrate Flannel's easy overlay networking engine and Calico's network policy enforcement ran
under the project name Canal. The maintainers deprecated it as a separate project, and instead, the Calico documentation
contains instructions on deploying Flannel and Calico together.
They only abandoned the name and status; the result remains the same. Flannel provides an overlay network using
one of its backends, and Calico provides granular access control to the running workloads with its network policy
implementation.
Calico & Flannel Networking BGP IPIP Native UDP VXLAN ...
Load Balancers and Ingress Controllers
SSL/TLS Termination
The overhead of encrypting and decrypting data can impact the performance of a backend, so deployments frequently move this
work to the load balancer. Encrypted traffic lands on the load balancer, which decrypts it and forwards it to a backend. By operating
with a decrypted data stream, the load balancer can make informed decisions about how to route the data because it’s now able to
see more than the basic metadata present in the flow.
LAYER 4
A load balancer that works at Layer 4 only routes traffic based on the TCP or UDP port. It does not look inside the packets or the
data stream to make any decisions.
A Kubernetes Service of the type LoadBalancer creates a Layer 4 load balancer outside of the cluster, but it only does this if the
cluster knows how. External load balancers require that the cluster use a supported cloud provider in its configuration and that the
configuration for the cloud provider includes the relevant access credentials when required.
Once created, the Status field of the service shows the address of the external load balancer.
The following manifest creates an external Layer 4 load balancer:

kind: Service
apiVersion: v1
metadata:
  name: my-service
spec:
  selector:
    app: MyApp
  ports:
  - protocol: TCP
    port: 80
    targetPort: 9376
  clusterIP: 10.0.171.239
  loadBalancerIP: 78.11.24.19
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - ip: 146.148.47.155

Because a Layer 4 load balancer does not look into the packet stream, it only has basic capabilities. If a site runs multiple applications, every one of them requires an external load balancer. Escalating costs make that scenario inefficient.
Furthermore, because the LoadBalancer Service type requires a supported external cloud provider, and because Kubernetes only supports a small number of providers, many sites instead choose to run a Layer 7 load balancer inside of the cluster.
LAYER 7
The Kubernetes resource that handles load balancing at Layer 7 is called an Ingress, and the component that creates Ingresses is known as an Ingress Controller.

The Ingress Resource
The Ingress resource defines the rules and routing for a particular application. Any number of Ingresses can exist within a cluster, each using a combination of host, path, or other rules to send traffic to a Service and then on to the Pods.

The following manifest defines an Ingress for the site foo.bar.com, sending /foo to the s1 Service and /bar to the s2 Service:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: test
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: foo.bar.com
    http:
      paths:
      - path: /foo
        backend:
          serviceName: s1
          servicePort: 80
      - path: /bar
        backend:
          serviceName: s2
          servicePort: 80
An Ingress Controller is a daemon, deployed as a Kubernetes pod, that listens for requests to create or modify Ingresses within the
cluster and converts the rules in the manifests into configuration directives for a load balancing component. That component is
either a software load balancer such as Nginx, HAProxy, or Traefik, or it’s an external load balancer such as an Amazon ALB or an F5
Big/IP.
When working with an external load balancer the Ingress Controller is a lightweight component that translates the Ingress
resource definitions from the cluster into API calls that configure the external piece.
In the case of internal software load balancers, the Ingress Controller combines the management and load balancing components
into one piece. It uses the instructions in the Ingress resource to reconfigure itself.
A Nginx Ingress Controller typically works within the cluster as a DaemonSet, with an instance running on each node.
Kubernetes uses annotations to control the behavior of the Ingress Controller. Each controller has a list of accepted annotations,
and their use activates advanced features such as canary deployments, default backends, timeouts, redirects, CORS configuration,
and more.
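For instance, with the Nginx Ingress Controller, annotations such as these adjust its behavior for a particular Ingress (the values shown are illustrative):

metadata:
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "60"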
Conclusion