K8S Runtime: With or Without Docker

Posted by Henry Du on Saturday, December 12, 2020

K8S Container Runtime Evolution

With Docker

In the Kubernetes v1.20 release notes, a major change is the deprecation of dockershim. Deprecation does not remove it immediately; it means that a future K8S release will stop using Docker as its container runtime. The Kubernetes community has written a blog post about this in detail.

The Docker runtime is just one component of the Docker suite. Developers can still use Docker to build images, and still use Docker Hub as an image registry to store them.

Before Kubernetes v1.20, the most popular integration between K8S and Docker worked as follows.

When it receives a request to create a container, kubelet calls dockershim through CRI (Container Runtime Interface) over gRPC. kubelet is the CRI client and dockershim is the CRI server.

After dockershim receives the request, it forwards it to the Docker daemon to create a container. The component that actually handles the request is containerd.

After containerd receives the request, it creates a containerd-shim process. This process becomes the parent process of the container, so containers do not have to be restarted when containerd itself restarts.

Following the OCI (Open Container Initiative) standard, runC handles namespace creation, cgroup setup, loading the root file system, and so on. The containerd-shim calls runC to create the container. After the container is created, runC exits, which leaves containerd-shim as the parent process of the container.
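The parent-process hand-off described above can be sketched with a toy process table. This is only an illustration under simplifying assumptions: `ProcessTable`, `create_container`, and the process names are hypothetical, and re-parenting is approximated by handing orphans to the exiting process's own parent (real Linux hands them to the nearest subreaper or to init).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Proc:
    """A toy process: a name and the name of its parent (None for init)."""
    name: str
    parent: Optional[str]


class ProcessTable:
    """Hypothetical, simplified process table used only for illustration."""

    def __init__(self) -> None:
        self.procs: Dict[str, Proc] = {"init": Proc("init", None)}

    def spawn(self, name: str, parent: str) -> None:
        self.procs[name] = Proc(name, parent)

    def exit(self, name: str) -> None:
        # Assumption: orphans are re-parented to the exiting process's parent.
        gone = self.procs.pop(name)
        for proc in self.procs.values():
            if proc.parent == name:
                proc.parent = gone.parent


def create_container(table: ProcessTable, name: str) -> str:
    """Replay the containerd -> containerd-shim -> runC steps from the text."""
    shim = f"containerd-shim[{name}]"
    table.spawn(shim, "containerd")  # containerd creates a shim for the container
    table.spawn("runc", shim)        # the shim calls runC
    table.spawn(name, "runc")        # runC sets up the container and starts it
    table.exit("runc")               # runC exits; the shim becomes the parent
    return shim


table = ProcessTable()
table.spawn("containerd", "init")
shim = create_container(table, "nginx")
parent_after_create = table.procs["nginx"].parent

# Simulate a containerd restart: the shim and the container both survive.
table.exit("containerd")
```

In this sketch the container stays in the table after containerd exits, which mirrors why the shim exists at all: the container's lifetime is decoupled from the daemon's.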

Without Docker

OCI and CRI

CRI and OCI are the two abstraction layers in the K8S runtime architecture.

Orchestration API -> Container API (cri-runtime) -> Kernel API (oci-runtime)

CRI is a set of gRPC interfaces, which can be found in kubelet/apis/cri/services.go. It contains:

  • ContainerManager interface: create, stop containers
  • ImageManagerService interface: pull or delete images
  • PodSandboxManager interface: PodSandbox is an abstraction of a pod. When kubelet creates a pod, it first creates an infra container that holds the namespaces and cgroups shared inside the pod. Kubelet then creates each container within that pod sandbox environment.
  • etc.
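To make the division of work between these interfaces concrete, here is a minimal in-memory sketch. The recorded call names mirror the CRI RPCs `RunPodSandbox`, `PullImage`, `CreateContainer`, and `StartContainer`; everything else (`FakeCriRuntime`, `sync_pod`, the Python signatures, the ID formats) is hypothetical. The real CRI is a gRPC API with protobuf messages, not Python methods.

```python
from typing import Dict, List, Set


class FakeCriRuntime:
    """In-memory stand-in for a CRI server (illustrative, not a real client)."""

    def __init__(self) -> None:
        self.calls: List[str] = []                 # RPC names, in call order
        self.images: Set[str] = set()
        self.sandboxes: Dict[str, List[str]] = {}  # sandbox id -> container ids
        self.running: Set[str] = set()

    # PodSandboxManager
    def run_pod_sandbox(self, pod_name: str) -> str:
        self.calls.append("RunPodSandbox")
        sandbox_id = f"sandbox-{pod_name}"
        self.sandboxes[sandbox_id] = []
        return sandbox_id

    # ImageManagerService
    def pull_image(self, image: str) -> None:
        self.calls.append("PullImage")
        self.images.add(image)

    # ContainerManager
    def create_container(self, sandbox_id: str, name: str, image: str) -> str:
        if sandbox_id not in self.sandboxes:
            raise KeyError(f"unknown sandbox: {sandbox_id}")
        if image not in self.images:
            raise ValueError(f"image not pulled: {image}")
        self.calls.append("CreateContainer")
        container_id = f"{sandbox_id}/{name}"
        self.sandboxes[sandbox_id].append(container_id)
        return container_id

    def start_container(self, container_id: str) -> None:
        self.calls.append("StartContainer")
        self.running.add(container_id)


def sync_pod(rt: FakeCriRuntime, pod_name: str, containers: Dict[str, str]) -> List[str]:
    """Kubelet-like flow: sandbox first, then each container inside it."""
    sandbox_id = rt.run_pod_sandbox(pod_name)
    started = []
    for name, image in containers.items():
        rt.pull_image(image)
        container_id = rt.create_container(sandbox_id, name, image)
        rt.start_container(container_id)
        started.append(container_id)
    return started


rt = FakeCriRuntime()
started = sync_pod(rt, "web", {"app": "nginx:1.19"})
```

The point of the sketch is the ordering: the sandbox exists before any container, and every container is created against a sandbox ID.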

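There are two important specifications defined by OCI.

  • The image specification (ImageSpec), which defines the format of a container image.
  • The runtime specification, which defines the operations a container runtime must support, including create, start, stop (the kill operation in the runtime specification) and delete.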
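The runtime operations listed above behave like a small state machine. The sketch below is a toy model, assuming the status names used by the OCI runtime specification (`created`, `running`, `stopped`); the class `ToyOciRuntime` and `LifecycleError` are hypothetical, and `kill` is simplified to stop the container immediately instead of delivering a signal.

```python
from typing import Dict, Set


class LifecycleError(Exception):
    """Raised when an operation is not valid in the container's current status."""


class ToyOciRuntime:
    """Toy model of an oci-runtime's lifecycle rules (not runC itself)."""

    def __init__(self) -> None:
        self._status: Dict[str, str] = {}  # container id -> status

    def _require(self, cid: str, allowed: Set[str]) -> None:
        status = self._status.get(cid)
        if status not in allowed:
            raise LifecycleError(f"{cid}: status {status!r} not in {sorted(allowed)}")

    def create(self, cid: str) -> None:
        if cid in self._status:
            raise LifecycleError(f"{cid}: already exists")
        self._status[cid] = "created"

    def start(self, cid: str) -> None:
        self._require(cid, {"created"})
        self._status[cid] = "running"

    def kill(self, cid: str) -> None:
        # Simplification: the process is assumed to exit as soon as it is signalled.
        self._require(cid, {"created", "running"})
        self._status[cid] = "stopped"

    def delete(self, cid: str) -> None:
        self._require(cid, {"stopped"})
        del self._status[cid]

    def state(self, cid: str) -> str:
        self._require(cid, {"created", "running", "stopped"})
        return self._status[cid]


runtime = ToyOciRuntime()
runtime.create("c1")
runtime.start("c1")
status_while_running = runtime.state("c1")

# Deleting a running container is rejected; it has to be stopped first.
try:
    runtime.delete("c1")
    delete_rejected = False
except LifecycleError:
    delete_rejected = True

runtime.kill("c1")
status_after_kill = runtime.state("c1")
runtime.delete("c1")
```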

Containerd and CRI-O

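Since containerd 1.1, the CRI plugin is integrated into the containerd process itself, so kubelet can call containerd directly without going through dockershim and the Docker daemon.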

However, there is an arguably better cri-runtime: CRI-O, which exists purely as a K8S runtime that connects CRI and OCI.

In CRI-O, conmon takes the place of containerd-shim. Compared with the containerd solution, CRI-O is cleaner and more straightforward. However, most production clusters still use containerd as the K8S runtime.
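The three call paths discussed so far can be written down side by side. The hop lists below are taken from the descriptions in this post; the dictionary `PATHS` and the helper functions are just a hypothetical way to compare them.

```python
from typing import Dict, List

# Component chain for each integration, from kubelet down to the oci-runtime.
PATHS: Dict[str, List[str]] = {
    "docker": [
        "kubelet", "dockershim", "Docker daemon",
        "containerd", "containerd-shim", "runC",
    ],
    "containerd": ["kubelet", "containerd (CRI plugin)", "containerd-shim", "runC"],
    "cri-o": ["kubelet", "CRI-O", "conmon", "runC"],
}


def hops(name: str) -> int:
    """Number of hand-offs between components on the path."""
    return len(PATHS[name]) - 1


def render(name: str) -> str:
    """Render a path as an arrow-separated chain."""
    return " -> ".join(PATHS[name])


for name in PATHS:
    print(f"{name:<10} {hops(name)} hops: {render(name)}")
```

Counting hops this way shows what dropping dockershim buys: the Docker path has two more hand-offs than either the containerd or the CRI-O path.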

New OCI Implementation: Kata Runtime

The default oci-runtime is runC. All containers created by runC share the same host kernel. To provide multi-tenancy in K8S, each tenant's containers should use only their own resources, without sharing them with other tenants, and a shared kernel cannot fully guarantee that. Stronger isolation also increases security for each tenant.

Kata Containers provides a solution: run containers inside a lightweight VM, so that they get their own guest kernel.

With Kata Containers, RunPodSandbox creates a VM, and the pod's containers are then created inside that VM.
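The difference in isolation can be illustrated with a toy model of the sandbox step. This is a sketch under stated assumptions: `ToyRuntime`, `Sandbox`, and the kernel labels are hypothetical names, and the "VM" is only a string label, not a real virtual machine.

```python
import itertools
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sandbox:
    """A pod sandbox and the kernel its containers run on."""
    pod: str
    kernel: str
    containers: List[str] = field(default_factory=list)


class ToyRuntime:
    """Toy runtime whose sandbox behaviour depends on the oci-runtime in use."""

    def __init__(self, handler: str) -> None:
        if handler not in ("runc", "kata"):
            raise ValueError(f"unknown handler: {handler}")
        self.handler = handler
        self._vm_ids = itertools.count(1)

    def run_pod_sandbox(self, pod: str) -> Sandbox:
        if self.handler == "kata":
            # Kata: each sandbox is backed by its own VM with a guest kernel.
            kernel = f"guest-kernel-vm{next(self._vm_ids)}"
        else:
            # runC: every sandbox runs directly on the shared host kernel.
            kernel = "host-kernel"
        return Sandbox(pod=pod, kernel=kernel)

    def create_container(self, sandbox: Sandbox, name: str) -> str:
        """Add a container to the sandbox and return the kernel it runs on."""
        sandbox.containers.append(name)
        return sandbox.kernel


def kernels_for_tenants(handler: str) -> List[str]:
    """Create one single-container pod per tenant and report each kernel."""
    rt = ToyRuntime(handler)
    kernels = []
    for tenant in ("tenant-a", "tenant-b"):
        sandbox = rt.run_pod_sandbox(f"{tenant}-pod")
        kernels.append(rt.create_container(sandbox, "app"))
    return kernels


runc_kernels = kernels_for_tenants("runc")
kata_kernels = kernels_for_tenants("kata")
```

With the `runc` handler both tenants end up on the same kernel label, while with `kata` each pod gets its own, which is the multi-tenancy property the section describes.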