K3S Supports CNI and Flannel Plugin

Posted by Henry Du on Monday, November 16, 2020

K3S Supports Container Network Interface (CNI) and Flannel

Introduction

The Kubernetes network model provides the following guarantees:

  • Pods can communicate directly with all other pods on all nodes (no NAT).
  • Agents on all nodes can communicate with pods on all nodes (no NAT).
  • The IP that a container sees itself as is the same IP that others see it as.

This article explains how every pod is assigned an IP address managed by K3S.

K3S

K3S is a lightweight Kubernetes distribution built for IoT and edge computing, provided by Rancher. A brief "how it works" diagram is available at k3s.io.

Container Network Interface (CNI)

The Container Network Interface (CNI) defines how executable plugins are used to configure network interfaces for Linux application containers. The CNI git repository describes the specification and its implementation in more detail. Infoblox also contributed to the project some years ago.
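
To make the plugin contract concrete, here is a minimal Python sketch of how a runtime might invoke a CNI plugin under the CNI specification: the network configuration is passed as JSON on stdin, and the call parameters are passed as CNI_* environment variables. The helper names (build_cni_env, invoke_plugin) and the sample container ID, namespace path, and plugin directory are illustrative assumptions, not values taken from K3S.

```python
import json
import os
import subprocess


def build_cni_env(command, container_id, netns, ifname, cni_path):
    """Return the CNI_* environment variables defined by the CNI spec."""
    return {
        "CNI_COMMAND": command,           # ADD, DEL, CHECK or VERSION
        "CNI_CONTAINERID": container_id,  # ID of the pod sandbox container
        "CNI_NETNS": netns,               # path to the pod's network namespace
        "CNI_IFNAME": ifname,             # interface to create inside the pod
        "CNI_PATH": cni_path,             # where plugin executables are searched
    }


def invoke_plugin(plugin_binary, net_conf, cni_env):
    """Run a plugin executable with the config on stdin and parse its JSON result.

    Hypothetical helper: calling it needs a real plugin binary and root privileges.
    """
    proc = subprocess.run(
        [plugin_binary],
        input=json.dumps(net_conf).encode(),
        env={**os.environ, **cni_env},
        capture_output=True,
        check=True,
    )
    return json.loads(proc.stdout)


# Example values below are assumptions for illustration only.
env = build_cni_env("ADD", "example-container-id", "/var/run/netns/example",
                    "eth0", "/opt/cni/bin")
print(env["CNI_COMMAND"], env["CNI_IFNAME"])
```

Only build_cni_env is exercised here; invoke_plugin is left uncalled because it would modify host networking.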

Flannel

Flannel is a very simple overlay network that satisfies the Kubernetes networking requirements. It is one of the projects originally developed by CoreOS. Even though CoreOS has stopped maintaining it, Flannel remains one of the CNI options for Kubernetes networking. K3S integrates Flannel and runs it as the CNI by default, with VXLAN as the default backend. Flannel runs as a background goroutine when K3S starts.

When Flannel is running, it creates a network device, flannel.1, which acts as the VTEP (VXLAN Tunnel End Point).

flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue state UNKNOWN group default
    link/ether aa:3b:fb:2a:3b:57 brd ff:ff:ff:ff:ff:ff promiscuity 0
    vxlan id 1 local 172.28.5.61 dev eth0 srcport 0 0 dstport 8472 nolearning ageing 300
    inet 10.42.0.0/32 brd 10.42.0.0 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::a83b:fbff:fe2a:3b57/64 scope link
       valid_lft forever preferred_lft forever
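
The MTU of 8951 in this output is consistent with VXLAN encapsulation overhead. The sketch below works through the arithmetic. It assumes an IPv4 underlay and an eth0 MTU of 9001 bytes; the eth0 MTU is not shown in the output above, so that value is an assumption.

```python
# Headers that VXLAN adds around each pod frame on the underlay network.
OUTER_IPV4 = 20      # outer IPv4 header
OUTER_UDP = 8        # outer UDP header (dstport 8472 in the output above)
VXLAN_HEADER = 8     # VXLAN header carrying the VXLAN id
INNER_ETHERNET = 14  # Ethernet header of the encapsulated pod frame

VXLAN_OVERHEAD = OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET


def vxlan_mtu(underlay_mtu):
    """Largest inner packet that fits in one underlay packet."""
    return underlay_mtu - VXLAN_OVERHEAD


ASSUMED_ETH0_MTU = 9001  # assumption: not shown in the article's output
print(VXLAN_OVERHEAD)               # 50
print(vxlan_mtu(ASSUMED_ETH0_MTU))  # 8951
```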

Invoke Procedures

Kubelet Calls CRI

When a pod is scheduled on a node, the kubelet calls the container runtime through the CRI plugin. The CRI plugin calls into containerd, and containerd starts the runtime shim, containerd-shim, which creates the pod sandbox ID and the pod network namespace.

containerd-shim

According to Michael Crosby, the shim allows for daemonless containers. It basically sits as the parent of the container’s process to facilitate a few things.

  1. It allows the runtimes, i.e. docker-runc, to exit after it starts the container.
  2. It keeps the STDIO and other FDs open for the container in case containerd and/or docker both die.
  3. It allows the container’s exit status to be reported back to a higher-level tool like docker without having to be the actual parent of the container’s process and do a wait4.
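
The third point can be illustrated with a minimal Python sketch of a parent process that exists only to wait on a child and hand its exit status to a caller. This is an analogy under stated assumptions, not containerd-shim's implementation: the function name run_and_report is hypothetical, and a short-lived Python child stands in for the container process.

```python
import subprocess
import sys


def run_and_report(argv):
    """Start a child, wait for it as its parent, and return its exit status."""
    child = subprocess.Popen(argv)
    # The parent blocks here, much like a shim waiting on the container process.
    return child.wait()


# The stand-in "container" simply exits with status 3.
status = run_and_report([sys.executable, "-c", "import sys; sys.exit(3)"])
print("child exited with status", status)
```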

The following ps output shows how docker-containerd and docker-containerd-shim work together.

$ ps fxa | grep containerd
 1247 ?        Ssl   92:42  \_ docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --shim docker-containerd-shim --runtime docker-runc
19543 ?        Sl     0:01      \_ docker-containerd-shim a73febba6926a03298160f95cbd7fca3bc4f750d0f6aeba9d69e09cd3b8b935d /var/run/docker/libcontainerd/a73febba6926a03298160f95cbd7fca3bc4f750d0f6aeba9d69e09cd3b8b935d docker-runc
19595 ?        Sl     0:01      \_ docker-containerd-shim 2a701f0fa57a88eb63c69b4fd90bde1a959ba310aaf604a202bf887b7f838333 /var/run/docker/libcontainerd/2a701f0fa57a88eb63c69b4fd90bde1a959ba310aaf604a202bf887b7f838333 docker-runc

Containerd-shim calls CNI

When containerd-shim starts running, it checks whether the node config sets Flannel as the default CNI.

CNI Bridge Config File

If the node config sets Flannel as the default CNI, K3S creates the CNI config file shown below. The CRI plugin creates a network namespace for the pod and calls the CNI plugin with this config file.

$ cat /var/lib/rancher/k3s/agent/etc/cni/net.d/10-flannel.conflist
{
  "name":"cbr0",
  "cniVersion":"0.3.1",
  "plugins":[
    {
      "type":"flannel",
      "delegate":{
        "hairpinMode":true,
        "forceAddress":true,
        "isDefaultGateway":true
      }
    },
    {
      "type":"portmap",
      "capabilities":{
        "portMappings":true
      }
    }
  ]
}

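As the configuration shows, the plugin chain starts with the flannel plugin, which by default delegates to the bridge plugin; the delegate section holds options that are passed on to that plugin.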
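
As a small illustration of how such a conflist is structured, the following Python sketch parses the file content shown above and lists the plugins in the order in which they are chained. The function name plugin_chain is hypothetical; the JSON is copied from the article.

```python
import json

CONFLIST = """
{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    {"type": "flannel",
     "delegate": {"hairpinMode": true, "forceAddress": true,
                  "isDefaultGateway": true}},
    {"type": "portmap", "capabilities": {"portMappings": true}}
  ]
}
"""


def plugin_chain(conflist_text):
    """Return the network name and the plugin types in chain order."""
    conf = json.loads(conflist_text)
    return conf["name"], [plugin["type"] for plugin in conf["plugins"]]


name, chain = plugin_chain(CONFLIST)
print(name, chain)  # cbr0 ['flannel', 'portmap']
```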

Flannel Subnet Config

When Flannel starts running, it fetches the node's podCIDR and other cluster network metadata from the apiserver and writes them to the subnet.env file.

$ cat /run/flannel/subnet.env

FLANNEL_NETWORK=10.42.0.0/16
FLANNEL_SUBNET=10.42.0.1/24
FLANNEL_MTU=8951
FLANNEL_IPMASQ=true
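
To show how these values relate to each other, here is a minimal Python sketch that parses the file content above and checks that the node subnet lies inside the cluster network. The parser parse_subnet_env is a hypothetical helper written for this example, not part of Flannel.

```python
import ipaddress

SUBNET_ENV = """\
FLANNEL_NETWORK=10.42.0.0/16
FLANNEL_SUBNET=10.42.0.1/24
FLANNEL_MTU=8951
FLANNEL_IPMASQ=true
"""


def parse_subnet_env(text):
    """Parse KEY=VALUE lines into a dict, skipping blank lines."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key] = value
    return values


env = parse_subnet_env(SUBNET_ENV)
cluster = ipaddress.ip_network(env["FLANNEL_NETWORK"])
# FLANNEL_SUBNET is written as a host address plus prefix, so parse it as an interface.
node = ipaddress.ip_interface(env["FLANNEL_SUBNET"])

print(node.network)                     # 10.42.0.0/24
print(node.network.subnet_of(cluster))  # True
print(node.network.num_addresses)       # 256
```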

Flannel Host-local CNI Config

The Flannel CNI plugin generates a configuration for the bridge CNI plugin and then calls it. The generated bridge CNI plugin configuration:

$ cat /var/lib/cni/flannel/354d4bfb4bf04ca26bd8005cc8df2f4d9d52cd137b6ad850663a6a0a5d45fc09 | jq
{
  "cniVersion": "0.3.1",
  "forceAddress": true,
  "hairpinMode": true,
  "ipMasq": false,
  "ipam": {
    "routes": [
      {
        "dst": "10.42.0.0/16"
      }
    ],
    "subnet": "10.42.0.0/24",
    "type": "host-local"
  },
  "isDefaultGateway": true,
  "isGateway": true,
  "mtu": 8951,
  "name": "cbr0",
  "type": "bridge"
}
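
The generated configuration can be understood as a merge of the conflist's delegate section with the values from subnet.env. The Python sketch below is a simplified reconstruction inferred from the files shown in this article; the function name build_delegate_conf is hypothetical, and the real plugin handles more cases than this.

```python
import ipaddress

NETCONF = {
    "name": "cbr0",
    "cniVersion": "0.3.1",
    "plugins": [
        {"type": "flannel",
         "delegate": {"hairpinMode": True, "forceAddress": True,
                      "isDefaultGateway": True}},
        {"type": "portmap", "capabilities": {"portMappings": True}},
    ],
}
SUBNET_ENV = {
    "FLANNEL_NETWORK": "10.42.0.0/16",
    "FLANNEL_SUBNET": "10.42.0.1/24",
    "FLANNEL_MTU": "8951",
    "FLANNEL_IPMASQ": "true",
}


def build_delegate_conf(netconf, subnet_env):
    """Combine the flannel delegate options with subnet.env values."""
    flannel = next(p for p in netconf["plugins"] if p["type"] == "flannel")
    delegate = dict(flannel.get("delegate", {}))
    delegate["name"] = netconf["name"]
    delegate["cniVersion"] = netconf["cniVersion"]
    delegate.setdefault("type", "bridge")  # bridge is the default delegate
    # When Flannel already masquerades, the bridge plugin should not do it again.
    delegate.setdefault("ipMasq", subnet_env["FLANNEL_IPMASQ"] != "true")
    delegate.setdefault("mtu", int(subnet_env["FLANNEL_MTU"]))
    if delegate["type"] == "bridge":
        delegate.setdefault("isGateway", True)
    node_subnet = ipaddress.ip_interface(subnet_env["FLANNEL_SUBNET"]).network
    delegate["ipam"] = {
        "type": "host-local",
        "subnet": str(node_subnet),
        "routes": [{"dst": subnet_env["FLANNEL_NETWORK"]}],
    }
    return delegate


bridge_conf = build_delegate_conf(NETCONF, SUBNET_ENV)
print(bridge_conf["type"], bridge_conf["ipMasq"], bridge_conf["ipam"]["subnet"])
```

With the inputs from this article, the result matches the bridge configuration shown above, including "ipMasq": false alongside FLANNEL_IPMASQ=true.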

The Final Network Config

The final network configuration for one container is stored in the cache folder /var/lib/cni/cache.

$ cat cbr0-da976c72d909d073027ba3ca13a9674c4152f63798ad9a75583f58073f31bca4-eth0 | jq
{
  "cniVersion": "0.3.1",
  "interfaces": [
    {
      "name": "cni0",
      "mac": "d2:02:bf:7f:30:95"
    },
    {
      "name": "vetha6b08aad",
      "mac": "3e:ed:92:7a:75:9c"
    },
    {
      "name": "eth0",
      "mac": "4e:d3:26:a5:27:f8",
      "sandbox": "/proc/4502/ns/net"
    }
  ],
  "ips": [
    {
      "version": "4",
      "interface": 2,
      "address": "10.42.0.24/24",
      "gateway": "10.42.0.1"
    }
  ],
  "routes": [
    {
      "dst": "10.42.0.0/16"
    },
    {
      "dst": "0.0.0.0/0",
      "gw": "10.42.0.1"
    }
  ],
  "dns": {}
}
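
In this result, each entry in "ips" refers to an interface by its index in the "interfaces" list, and only the interface inside the pod carries a "sandbox" path. The following Python sketch resolves those references for the data shown above; the helper name pod_addresses is hypothetical, and the dictionary is a compact copy of the article's output.

```python
import ipaddress

RESULT = {
    "cniVersion": "0.3.1",
    "interfaces": [
        {"name": "cni0", "mac": "d2:02:bf:7f:30:95"},
        {"name": "vetha6b08aad", "mac": "3e:ed:92:7a:75:9c"},
        {"name": "eth0", "mac": "4e:d3:26:a5:27:f8",
         "sandbox": "/proc/4502/ns/net"},
    ],
    "ips": [
        {"version": "4", "interface": 2,
         "address": "10.42.0.24/24", "gateway": "10.42.0.1"},
    ],
    "routes": [{"dst": "10.42.0.0/16"},
               {"dst": "0.0.0.0/0", "gw": "10.42.0.1"}],
    "dns": {},
}


def pod_addresses(result):
    """Return (interface name, address, gateway) for each assigned IP."""
    resolved = []
    for ip in result["ips"]:
        iface = result["interfaces"][ip["interface"]]
        resolved.append((iface["name"], ip["address"], ip.get("gateway")))
    return resolved


addresses = pod_addresses(RESULT)
# Interfaces without a "sandbox" path live in the host network namespace.
host_side = [i["name"] for i in RESULT["interfaces"] if "sandbox" not in i]
pod_ip = ipaddress.ip_interface(addresses[0][1])

print(addresses)       # [('eth0', '10.42.0.24/24', '10.42.0.1')]
print(host_side)       # ['cni0', 'vetha6b08aad']
print(pod_ip.network)  # 10.42.0.0/24
```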

According to the configuration above, the bridge cni0 is created.

cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue state UP group default qlen 1000
    link/ether 52:9b:f3:72:2c:93 brd ff:ff:ff:ff:ff:ff promiscuity 0
    bridge forward_delay 1500 hello_time 200 max_age 2000 ageing_time 30000 stp_state 0 priority 32768 vlan_filtering 0 vlan_protocol 802.1Q
    inet 10.42.0.1/24 brd 10.42.0.255 scope global cni0
       valid_lft forever preferred_lft forever
    inet6 fe80::509b:f3ff:fe72:2c93/64 scope link
       valid_lft forever preferred_lft forever

Conclusion

K3S does not run a separate flanneld daemon in the background. Instead, it embeds the Flannel implementation and starts it as a background goroutine after containerd-shim starts. K3S manages both the CNI plugin config file and the final generated network config file.