Cgroups: Container Resource Limitation

Posted by Henry Du on Saturday, November 20, 2021

Container Resource Limitation

Cgroups

A container runs as an ordinary process on the host. If multiple containers run on the same host, one container may consume all of the CPU while the others starve. It is therefore better to set resource limits, so that no single container can monopolize host resources such as CPU and memory.

Linux Cgroups is the Linux kernel feature that limits the resources a process can use. Engineers at Google (primarily Paul Menage and Rohit Seth) started work on this feature in 2006 under the name “process containers”.

Cgroups is short for Linux Control Groups. Its main purpose is to set an upper bound on the resources a group of processes can use, including CPU, memory, disk I/O, and network I/O. In addition, Cgroups provides process priority control, resource usage accounting, and the ability to suspend and resume processes.

Users configure Cgroups through a file system interface mounted at /sys/fs/cgroup. For example, inside a container based on an Ubuntu image, mount -t cgroup lists every resource type (subsystem) exposed through this file interface.
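The per-subsystem directories used in this post belong to the cgroup v1 layout. As a rough sketch for checking which layout a host uses, the file system type reported for /sys/fs/cgroup can be classified: cgroup2fs indicates the unified v2 hierarchy, while v1 hosts typically report tmpfs. The helper name cgroup_layout is made up for illustration.

```shell
# Hypothetical helper: map a file system type name to a cgroup layout.
cgroup_layout() {
  case "$1" in
    cgroup2fs) echo "v2" ;;       # unified hierarchy
    tmpfs)     echo "v1" ;;       # per-subsystem directories under a tmpfs
    *)         echo "unknown" ;;
  esac
}

# Usage on a Linux host with GNU stat:
#   cgroup_layout "$(stat -fc %T /sys/fs/cgroup)"
```

On a v2 host the file names differ from the ones shown in this post, so the cat commands here would need to be adapted.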

Let’s run an Nginx container with memory and CPU limits.

docker run -m 512m --cpu-period=100000 --cpu-quota=20000 nginx

Then we open an interactive shell in the container to list its Cgroups.

> docker exec -ti 31cd0694017a bash

root@31cd0694017a:/# mount -t cgroup
cpuset on /sys/fs/cgroup/cpuset type cgroup (ro,nosuid,nodev,noexec,relatime,cpuset)
cpu on /sys/fs/cgroup/cpu type cgroup (ro,nosuid,nodev,noexec,relatime,cpu)
cpuacct on /sys/fs/cgroup/cpuacct type cgroup (ro,nosuid,nodev,noexec,relatime,cpuacct)
blkio on /sys/fs/cgroup/blkio type cgroup (ro,nosuid,nodev,noexec,relatime,blkio)
memory on /sys/fs/cgroup/memory type cgroup (ro,nosuid,nodev,noexec,relatime,memory)
devices on /sys/fs/cgroup/devices type cgroup (ro,nosuid,nodev,noexec,relatime,devices)
freezer on /sys/fs/cgroup/freezer type cgroup (ro,nosuid,nodev,noexec,relatime,freezer)
net_cls on /sys/fs/cgroup/net_cls type cgroup (ro,nosuid,nodev,noexec,relatime,net_cls)
perf_event on /sys/fs/cgroup/perf_event type cgroup (ro,nosuid,nodev,noexec,relatime,perf_event)
net_prio on /sys/fs/cgroup/net_prio type cgroup (ro,nosuid,nodev,noexec,relatime,net_prio)
hugetlb on /sys/fs/cgroup/hugetlb type cgroup (ro,nosuid,nodev,noexec,relatime,hugetlb)
pids on /sys/fs/cgroup/pids type cgroup (ro,nosuid,nodev,noexec,relatime,pids)
rdma on /sys/fs/cgroup/rdma type cgroup (ro,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/systemd type cgroup (ro,nosuid,nodev,noexec,relatime,name=systemd)

Some of the most commonly used subsystems are:

  • blkio: limits I/O on block devices, such as disks.
  • cpuset: assigns a set of CPU cores and memory nodes to the process.
  • memory: limits the memory usage of the process.

Since we set the memory limit to 512m, which equals 512 x 1024 x 1024 = 536870912 bytes, the memory.limit_in_bytes file contains exactly this value.

root@31cd0694017a:/sys/fs/cgroup/memory# cat memory.limit_in_bytes
536870912
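The conversion from the -m argument to the byte count can be sketched as follows. This is a minimal illustration that assumes binary multiples (1k = 1024 bytes) and handles only the b, k, m, and g suffixes; size_to_bytes is a hypothetical helper name, not part of Docker.

```shell
# Hypothetical helper: convert a Docker-style size such as 512m to bytes.
size_to_bytes() {
  local value="${1%[bkmgBKMG]}"   # numeric part, suffix stripped
  local unit="${1#"$value"}"      # trailing unit letter, if any
  case "$value" in
    ""|*[!0-9]*) echo "not a valid size: $1" >&2; return 1 ;;
  esac
  case "$unit" in
    k|K) echo $(( value * 1024 )) ;;
    m|M) echo $(( value * 1024 * 1024 )) ;;
    g|G) echo $(( value * 1024 * 1024 * 1024 )) ;;
    *)   echo "$value" ;;         # no suffix or "b": already bytes
  esac
}

size_to_bytes 512m   # prints 536870912
```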

We can also check the cfs_period and cfs_quota settings in the Cgroups CPU subsystem. Together, they limit the process to cfs_quota microseconds of CPU time within each period of cfs_period microseconds.

root@6e63e2bea3ac:/sys/fs/cgroup/cpu# cat cpu.cfs_period_us
100000
root@6e63e2bea3ac:/sys/fs/cgroup/cpu# cat cpu.cfs_quota_us
20000

This means that in every 100-millisecond period, the container's processes may use at most 20 milliseconds of CPU time. In other words, the container is limited to 20% of one CPU's bandwidth.
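The 20% figure is simply the quota divided by the period. A minimal sketch of that arithmetic is shown below; cpu_limit_percent is a made-up helper name, and the handling of negative values assumes the cgroup v1 convention that a quota of -1 means no limit.

```shell
# Hypothetical helper: CPU limit as a percentage of one core.
# $1 = value of cpu.cfs_quota_us, $2 = value of cpu.cfs_period_us
cpu_limit_percent() {
  local quota="$1"
  local period="$2"
  if [ "$quota" -lt 0 ]; then
    echo "unlimited"              # -1 means no quota is enforced
    return 0
  fi
  echo $(( quota * 100 / period ))
}

cpu_limit_percent 20000 100000   # prints 20
```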

Let’s run an infinite-loop shell command that tries to drive CPU usage to 100%.

while true; do : ; done &
[1] 532

As htop shows, the infinite loop takes only about 20% of one core. In summary, Linux Cgroups exposes a file system interface in which each subsystem has its own group of resource definition files.
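Besides htop, the effect of the quota can be observed in the cpu.stat file of the cgroup v1 CPU subsystem, which counts elapsed periods (nr_periods) and the periods in which the group ran out of quota (nr_throttled). The sketch below computes the share of throttled periods; throttled_percent is a hypothetical helper name, and the default path assumes the v1 layout shown earlier.

```shell
# Hypothetical helper: percentage of CFS periods in which the group was throttled.
# $1 = path to a cgroup v1 cpu.stat file
throttled_percent() {
  awk '
    $1 == "nr_periods"   { periods = $2 }
    $1 == "nr_throttled" { throttled = $2 }
    END {
      if (periods > 0) printf "%d\n", throttled * 100 / periods
      else print 0
    }
  ' "$1"
}

# Only runs where the v1 file actually exists.
if [ -r /sys/fs/cgroup/cpu/cpu.stat ]; then
  throttled_percent /sys/fs/cgroup/cpu/cpu.stat
fi
```

With the infinite loop running under a 20% quota, this value would be expected to climb, because the loop exhausts its quota in nearly every period.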

The following two sections give examples of resource limit definitions for Docker Compose and Kubernetes.

Docker-Compose

The following docker-compose YAML definition gives the Nginx service a limit of half a CPU and 512 megabytes of memory, plus a memory reservation of 128 megabytes.

services:
  nginx:
    image: nginx
    mem_limit: 512m
    mem_reservation: 128m
    cpus: 0.5
    ports:
      - "80:80"
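A fractional cpus value maps back to the CFS files seen earlier: with Docker's default period of 100000 microseconds, cpus: 0.5 corresponds to a quota of 50000 microseconds. The sketch below shows that arithmetic; cpus_to_quota is a hypothetical helper name, and awk is used only because shell arithmetic has no floating point.

```shell
# Hypothetical helper: fractional CPU count -> cpu.cfs_quota_us.
# $1 = CPU count (e.g. 0.5), $2 = period in microseconds (default 100000)
cpus_to_quota() {
  local period="${2:-100000}"
  # Add 0.5 before truncating so floating-point error cannot round down.
  awk -v cpus="$1" -v period="$period" 'BEGIN { printf "%d\n", cpus * period + 0.5 }'
}

cpus_to_quota 0.5   # prints 50000
```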

K8S Deployment

The following excerpt from a K8S Deployment YAML definition gives the Nginx container a request of 0.25 CPU and 64 MiB of memory, and a limit of 0.5 CPU and 128 MiB of memory.

spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
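On nodes that use cgroup v1, CPU requests are commonly described as being translated into cpu.shares (1024 shares per full CPU), and CPU limits into a CFS quota over a 100-millisecond period. The sketch below assumes that mapping and ignores the minimum values and other details a real kubelet applies; both helper names are hypothetical.

```shell
# Hypothetical helpers; the constants follow the commonly documented
# cgroup v1 mapping and are assumptions, not kubelet source code.

# CPU request in millicores (e.g. 250m) -> cpu.shares, at 1024 shares per CPU.
millicpu_to_shares() {
  local milli="${1%m}"
  echo $(( milli * 1024 / 1000 ))
}

# CPU limit in millicores (e.g. 500m) -> cpu.cfs_quota_us for a given period.
millicpu_to_quota() {
  local milli="${1%m}"
  local period="${2:-100000}"
  echo $(( milli * period / 1000 ))
}

millicpu_to_shares 250m   # prints 256
millicpu_to_quota 500m    # prints 50000
```

Under these assumptions, the limit of 500m lands on the same quota as cpus: 0.5 in the Docker Compose example, while the request affects only the relative weight of the container when CPUs are contended.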

Reference

Deep Dive Kubernetes: Lei Zhang, a TOC member of CNCF.