Migrating Kubernetes from Docker to Containerd

Overview

Reece has operated multiple on-premise and cloud-hosted K8s clusters for many years, and we have heavily utilised docker as the container runtime for our master and worker nodes.

As most readers will be aware by now, the Kubernetes 1.20 release also announced the deprecation and future removal of the much loved docker runtime support (dockershim).

This post documents our journey from docker to a suitable replacement option.

Options

The two most obvious alternatives are cri-o and containerd. As containerd is the default for many cloud-based K8s environments and was already being used behind the scenes by our K8s docker layer anyway, the choice was quite easy.

Changes required

The main change (for K8s 1.19.5) was to install containerd instead of dockerd and then start kubelet with the additional --container-runtime=remote --container-runtime-endpoint=/run/containerd/containerd.sock command-line options.
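
For reference, on a kubeadm-provisioned CentOS node these flags can be passed via the kubelet environment file; the exact file and variable depend on how kubelet is installed, so treat the below as a sketch only.

# /etc/sysconfig/kubelet (Debian-based installs typically use /etc/default/kubelet instead)
KUBELET_EXTRA_ARGS=--container-runtime=remote --container-runtime-endpoint=/run/containerd/containerd.sock

A systemctl restart kubelet on each node then picks up the new options.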

The dedicated /var/lib/docker volume has been renamed and remounted as /var/lib/containerd instead. We also added dedicated volumes for /var/lib/kubelet and /var/log, as disk space usage for these directories increased somewhat after the migration.
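
For illustration only, the resulting mounts on a worker could look roughly like the /etc/fstab entries below; the device paths and filesystem types are placeholders rather than our actual layout.

# /etc/fstab (illustrative - device names and fs types are placeholders)
/dev/vg_data/lv_containerd  /var/lib/containerd  xfs  defaults  0 0
/dev/vg_data/lv_kubelet     /var/lib/kubelet     xfs  defaults  0 0
/dev/vg_data/lv_varlog      /var/log             xfs  defaults  0 0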

A new version of crictl was required, as well as changes to /etc/crictl.yaml to set the default CRI runtime via runtime-endpoint: unix:///run/containerd/containerd.sock.
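
A minimal /etc/crictl.yaml along these lines does the job; the image-endpoint, timeout and debug settings below are common defaults added for completeness, not something the migration strictly requires.

# /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
# the following values are assumed/common defaults
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false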

Rather than running docker system prune -a -f on each worker periodically, we are now using the below script on each new containerd node to remove exited containers and prune unused images.

for id in $(crictl ps -a | grep -i exited | awk '{print $1}'); do
  crictl rm "$id"
done
crictl rmi --prune

For CentOS 7 kernels, an additional kernel parameter was required, as we experienced random problems such as cannot allocate memory errors when kubelet was starting up new pods, especially for Kubernetes cronjobs. This led to quite a few pods hanging in the ContainerCreating state, which is obviously not ideal. Adding the cgroup.memory=nokmem option to the kernel command line fixed the issue for us.
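
One way to apply the parameter on CentOS 7 is via grubby, followed by a reboot; adjust to your own boot tooling as needed.

# add cgroup.memory=nokmem to all installed kernels (CentOS 7 / grub2)
grubby --update-kernel=ALL --args="cgroup.memory=nokmem"
# verify the argument is present, then reboot for it to take effect
grubby --info=ALL | grep nokmem
reboot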

Logging

Our logging pipeline has changed a bit over the years, and the solution prior to migrating to containerd was a modified version of the https://github.com/looplab/logspout-logstash daemonset. Each pod read logs from journald, enriched them with cluster and docker metadata, and forwarded them to logstash running on each of our Elasticsearch servers. This solution was not perfect: logspout-logstash sometimes lost network connectivity to logstash without recovering, and the combination of docker-ce and journald added considerable extra load to each worker.

The new logging solution with containerd employs fluent-bit to tail container logs from /var/log/containers/ and send them, including K8s labels, straight to Elasticsearch. Fluent-bit also filters out some unnecessary logging, such as health checks.
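
The snippet below is a minimal sketch of such a fluent-bit setup rather than our production configuration; the Elasticsearch hostname, the health-check pattern and the use of the bundled cri parser (which may need to be defined in parsers.conf on older fluent-bit versions) are assumptions.

# fluent-bit.conf (illustrative sketch only)
[SERVICE]
    Parsers_File  parsers.conf

[INPUT]
    # containerd writes container logs in the CRI format under /var/log/containers/
    Name          tail
    Path          /var/log/containers/*.log
    Parser        cri
    Tag           kube.*
    Mem_Buf_Limit 5MB

[FILTER]
    # enrich records with pod metadata and K8s labels
    Name          kubernetes
    Match         kube.*
    Merge_Log     On
    Labels        On

[FILTER]
    # drop noisy health-check requests (field name and pattern are assumptions)
    Name          grep
    Match         kube.*
    Exclude       message /healthz

[OUTPUT]
    Name            es
    Match           kube.*
    Host            elasticsearch.example.com
    Port            9200
    Logstash_Format On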

Familiarity

After years of using familiar docker commands, we suddenly found ourselves learning and using ctr and crictl. We introduced a temporary docker shell-script wrapper which runs the equivalent ctr or crictl commands for troubleshooting tasks such as docker images, docker ps or docker rm, to name a few.
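
A stripped-down sketch of that wrapper is shown below; the real script covers more subcommands, and the mappings are rough equivalents rather than exact drop-in replacements.

#!/bin/sh
# Minimal "docker" wrapper sketch forwarding a few familiar subcommands
# to their rough crictl equivalents - not a complete or exact translation.
cmd="$1"
[ "$#" -gt 0 ] && shift
case "$cmd" in
  images) exec crictl images "$@" ;;
  ps)     exec crictl ps "$@" ;;
  rm)     exec crictl rm "$@" ;;
  rmi)    exec crictl rmi "$@" ;;
  logs)   exec crictl logs "$@" ;;
  pull)   exec crictl pull "$@" ;;
  *)
    echo "docker $cmd is not wrapped - use crictl or ctr directly" >&2
    exit 1
    ;;
esac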

Other side effects

The average worker load has decreased considerably, most likely due to migrating away from journald. Having pod logs in the local /var/log/containers directory has also made debugging (especially of the logging pipeline) somewhat easier.

Conclusion

The actual changes required were quite small; however, the migration forced quite a big change to our logging infrastructure and also required additional monitoring on each worker in order to be fit for production workloads.

Links and further reading

https://kubernetes.io/blog/2020/12/02/dont-panic-kubernetes-and-docker/
https://acloudguru.com/blog/engineering/kubernetes-is-deprecating-docker-what-you-need-to-know