K8S Container Runtime Evolution
With Docker
The major change in the Kubernetes v1.20 release notes is the deprecation of dockershim, which means K8S will eventually stop using Docker as its container runtime (dockershim was later removed in v1.24). The Kubernetes community has written a blog post about this in detail.
The Docker runtime is just one component of the Docker suite. Developers can still use Docker to build images, and use Docker Hub as an image repository to store them.
Before Kubernetes v1.20, the most popular integration between K8S and Docker was the one shown in the following diagram.
When it receives a request to create a container, kubelet calls dockershim through CRI (Container Runtime Interface) over gRPC. kubelet is the CRI client and dockershim is the CRI server.
After dockershim receives the request, it forwards it to the Docker daemon to create a container. The process that actually handles this request is containerd.
After containerd receives the request, it creates a containerd-shim process. This process acts as the parent process of the container runtime, so a restart of containerd does not force all containers to restart.
Following the OCI (Open Container Initiative) standard, runC handles namespace creation, cgroups setup, loading the root file system, and so on. containerd-shim actually calls runC to create the container. Once the container is created, runC exits, so containerd-shim becomes the parent process of the container.
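The parent/child hand-off described above can be sketched with a toy process tree. This is only an illustration, not how containerd is implemented: the Proc class and create_container helper are hypothetical names, and "exit" here simply hands the children of a process to its own parent.

```python
class Proc:
    """A toy process node (not a real OS process)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def exit(self):
        """Leave the tree and hand our children over to our parent."""
        for child in self.children:
            child.parent = self.parent
            if self.parent is not None:
                self.parent.children.append(child)
        self.children = []
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None


def create_container(containerd, container_name):
    """Model the flow: containerd -> containerd-shim -> runC -> container."""
    shim = Proc("containerd-shim", parent=containerd)
    runc = Proc("runC", parent=shim)
    container = Proc(container_name, parent=runc)
    runc.exit()  # runC exits once the container has been created
    return shim, container


init = Proc("init")
containerd = Proc("containerd", parent=init)
shim, container = create_container(containerd, "nginx")
print(container.parent.name)  # the shim is now the container's parent

containerd.exit()  # simulate containerd going away for a restart
print(container.parent.name, "<-", shim.parent.name)
```

In this model the container never has containerd as its direct parent, which is why stopping containerd leaves the container attached to its shim.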
Without Docker
OCI and CRI
CRI and OCI are the two abstraction layers in the K8S runtime architecture.
Orchestration API -> Container API (cri-runtime) -> Kernel API (oci-runtime)
CRI is a set of gRPC interfaces, which we can find in kubelet/apis/cri/services.go. It contains:
- ContainerManager interface: create and stop containers
- ImageManagerService interface: pull or delete images
- PodSandboxManager interface: PodSandbox is an abstraction of a pod. When kubelet creates a pod, it first creates an infra container that holds all namespaces and cgroups of the pod. Kubelet then creates each container based on the pod sandbox environment.
- etc.
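To make the order of calls more concrete, here is a minimal in-memory sketch. It is not the real CRI, which is defined as gRPC services: FakeRuntime and sync_pod are hypothetical names, and the methods are simplified, snake_case stand-ins for the RunPodSandbox, PullImage, CreateContainer and StartContainer calls.

```python
import itertools


class FakeRuntime:
    """In-memory stand-in for a CRI server; for illustration only."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.images = set()
        self.sandboxes = {}   # sandbox id -> list of container ids
        self.containers = {}  # container id -> state

    # PodSandboxManager
    def run_pod_sandbox(self, pod_name):
        sandbox_id = f"sandbox-{next(self._ids)}"
        self.sandboxes[sandbox_id] = []
        return sandbox_id

    # ImageManagerService
    def pull_image(self, image):
        self.images.add(image)
        return image

    # ContainerManager
    def create_container(self, sandbox_id, name, image):
        if sandbox_id not in self.sandboxes:
            raise KeyError(f"unknown sandbox {sandbox_id}")
        if image not in self.images:
            raise ValueError(f"image {image} has not been pulled")
        container_id = f"{name}-{next(self._ids)}"
        self.sandboxes[sandbox_id].append(container_id)
        self.containers[container_id] = "created"
        return container_id

    def start_container(self, container_id):
        self.containers[container_id] = "running"


def sync_pod(runtime, pod_name, containers):
    """Kubelet-side order of calls: sandbox first, then each container."""
    sandbox_id = runtime.run_pod_sandbox(pod_name)
    started = []
    for name, image in containers:
        runtime.pull_image(image)
        container_id = runtime.create_container(sandbox_id, name, image)
        runtime.start_container(container_id)
        started.append(container_id)
    return sandbox_id, started


runtime = FakeRuntime()
print(sync_pod(runtime, "web", [("nginx", "nginx:latest"), ("sidecar", "busybox")]))
```

The point of the sketch is the ordering: every container is created inside a sandbox that already exists, so the sandbox call always comes first.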
There are two important specifications defined in OCI.
- It defines the image specification, ImageSpec.
- It also defines the instructions that a container recognizes, including create, start, stop and delete.
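These instructions can be read as a small state machine. The sketch below is a simplified model under our own assumptions: the state names "absent" and "deleted" and the Container class are made up for illustration, and the operation names follow the list above (in the OCI runtime specification itself, stopping is expressed through a kill operation).

```python
# Allowed transitions: (current state, operation) -> next state.
TRANSITIONS = {
    ("absent", "create"): "created",
    ("created", "start"): "running",
    ("created", "stop"): "stopped",
    ("running", "stop"): "stopped",
    ("stopped", "delete"): "deleted",
}


class Container:
    """Toy container that only tracks its lifecycle state."""

    def __init__(self, container_id):
        self.id = container_id
        self.state = "absent"

    def apply(self, operation):
        key = (self.state, operation)
        if key not in TRANSITIONS:
            raise ValueError(
                f"cannot {operation} a container in state {self.state!r}"
            )
        self.state = TRANSITIONS[key]
        return self.state


container = Container("demo")
for operation in ("create", "start", "stop", "delete"):
    print(operation, "->", container.apply(operation))
```

Any runtime that accepts this same set of operations can be plugged in underneath the cri-runtime, which is what makes runC replaceable.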
Containerd and CRI-O
In containerd 1.1, the CRI plugin was integrated into the containerd process, so kubelet can call containerd directly through CRI.
However, there is a more focused cri-runtime: CRI-O, built purely as a K8S runtime that connects CRI and OCI.
In CRI-O, conmon replaces containerd-shim. Compared with the containerd solution, CRI-O is cleaner and more straightforward. However, most production environments still use containerd as the K8S runtime.
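As a rough summary, the call chains described in this article can be written down as plain lists and compared. The chains are our reading of the text above, and removed_components is a hypothetical helper, not part of any tool.

```python
# The original path through Docker, as described in the "With Docker" section.
DOCKERSHIM_CHAIN = [
    "kubelet", "dockershim", "Docker daemon",
    "containerd", "containerd-shim", "runC",
]

# The two paths that do not involve Docker.
CHAINS = {
    "containerd with CRI plugin": ["kubelet", "containerd", "containerd-shim", "runC"],
    "CRI-O": ["kubelet", "CRI-O", "conmon", "runC"],
}


def removed_components(chain):
    """Components of the dockershim path that the given chain no longer needs."""
    return [c for c in DOCKERSHIM_CHAIN if c not in chain]


for name, chain in CHAINS.items():
    print(f"{name}: {' -> '.join(chain)}")
    print(f"  no longer needed: {', '.join(removed_components(chain))}")
```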
New OCI Implementation: Kata Runtime
The default oci-runtime is runC. All containers created by runC share the same kernel. If we want to provide multi-tenancy in K8S, each tenant’s containers should use only that tenant’s resources, without sharing them with other tenants. This isolation also improves security for each tenant.
Kata Containers provides a solution: run containers inside a VM, so that they no longer share the host kernel.
With Kata Containers, RunPodSandbox first creates a VM, and the containers are then added to that VM.
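The difference in isolation can be sketched with two toy runtimes. Neither class reflects the real Kata or runC code: VM, KataLikeRuntime and RuncLikeRuntime are hypothetical names, and a "kernel" is just a string here.

```python
class VM:
    """Toy stand-in for a lightweight VM with its own guest kernel."""

    def __init__(self, name):
        self.kernel = f"guest-kernel-of-{name}"
        self.containers = []


class KataLikeRuntime:
    """Each pod sandbox is backed by its own VM."""

    def __init__(self):
        self.sandboxes = {}

    def run_pod_sandbox(self, pod_name):
        self.sandboxes[pod_name] = VM(pod_name)
        return pod_name

    def create_container(self, sandbox_id, name):
        vm = self.sandboxes[sandbox_id]
        vm.containers.append(name)  # the container is added to the VM
        return vm.kernel


class RuncLikeRuntime:
    """Every container runs directly on the shared host kernel."""

    HOST_KERNEL = "host-kernel"

    def run_pod_sandbox(self, pod_name):
        return pod_name

    def create_container(self, sandbox_id, name):
        return self.HOST_KERNEL


for runtime in (RuncLikeRuntime(), KataLikeRuntime()):
    kernels = set()
    for tenant in ("tenant-a", "tenant-b"):
        sandbox = runtime.run_pod_sandbox(tenant)
        kernels.add(runtime.create_container(sandbox, "app"))
    print(type(runtime).__name__, "distinct kernels:", len(kernels))
```

In the toy model, two tenants on the runC-like runtime end up on one kernel, while the Kata-like runtime gives each sandbox a kernel of its own.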