Kubernetes - Utilising tmpfs volumes

There are multiple choices for volume types in Kubernetes land, from persistent storage volumes that can be backed by a cloud service like EFS, to non-persistent storage types like ephemeral volumes that disappear with your pod or the worker node backing it.

‘tmpfs is a file system which keeps all files in virtual memory’ (kernel.org)

If you’re after fast, volatile storage, then tmpfs might be the way to go, but some caveats should be considered before utilising it for your projects. The most obvious one is that tmpfs won’t persist across restarts or crashes, which is slightly different from disk-backed ephemeral volumes in Kubernetes, which persist until the end of the pod’s lifecycle.

A typical use case for such fast, volatile volumes, and the one we at Reecetech investigated, is backing a database’s temp space, which is often used for temporary tables created during complex queries. While we didn’t necessarily get the results we were after from that use case, we did learn a bit about how tmpfs volumes are created and how they function within Kubernetes pods.

Preparing tmpfs in Kubernetes

A tmpfs volume can be created explicitly through the emptyDir volume type by setting the medium field to Memory. This field defines what kind of storage on the scheduled node will back the emptyDir volume.

volumes:
- name: tmpfs-volume
  emptyDir:
    medium: Memory # Memory-backed (tmpfs) storage

One of the less obvious caveats of a tmpfs volume in Kubernetes is how the size of this volume relates to the memory available on the node the pod is scheduled on.

Firstly, the maximum storage space that can be allocated to this tmpfs volume is determined by the memory limit set for the container in the pod specification. This is crucial to remember, as it defines the upper boundary for the tmpfs volume size and isn’t defined as explicitly as it is for other volume types. Exceeding this limit can lead to Out of Memory (OOM) kills, restarting your pod and clearing the contents of the tmpfs volume.

Secondly, it’s important to note that the memory used for this tmpfs volume is shared with the container’s processes and OS, not specifically allocated or restricted to the volume. If the volume grows too large, it will starve the other processes running in the container, including the OS itself, leading to OOM issues. This shared usage means the memory consumed by the tmpfs volume has a direct impact on the container’s overall performance and stability, requiring you to find the right balance between the application and the allocated volume.
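
Before settling on a tmpfs size, it can help to sanity-check how much memory is actually allocatable on your nodes and what limit a container is running with. The commands below are one quick way to do that; <pod-name> is a placeholder for whichever pod you want to inspect.

# Memory the scheduler can actually hand out on each node
❯ kubectl get nodes -o custom-columns=NAME:.metadata.name,ALLOCATABLE_MEMORY:.status.allocatable.memory
# Memory limit of the first container in a running pod
❯ kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].resources.limits.memory}'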

Within the pod specification where we define the volume, there is another key we can set on our tmpfs volume called sizeLimit. As the name suggests, it lets us cap how large this tmpfs volume can get, essentially preventing the volume from consuming more memory than we expect.

volumes:
- name: tmpfs-volume
  emptyDir:
    medium: Memory # Memory-backed (tmpfs) storage
    sizeLimit: 200Mi

Note, however, that the memory backing the volume is still shared with the OS; from the operating system’s perspective there is no separation implied by sizeLimit, other than a cap on the size of the volume itself.

This means that files created in this tmpfs space still take memory from the OS and can still starve other processes of resources. Likewise, processes running in the container can still utilise all memory up to the container memory limit defined in the Kubernetes YAML, including the space nominally set aside for the tmpfs volume.

Now let’s take a look at how to get a deployment running in a Kubernetes environment with a container that mounts tmpfs storage, so we have something to play around with.

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-super-cool-app
  labels:
    app: 'my-super-cool-app'
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-super-cool-app
  template:
    metadata:
      labels:
        app: 'my-super-cool-app'
    spec:
      containers:
        - name: my-super-cool-app-container
          command:
            - "sh"
            - "-c"
            - "while true; do sleep 6000; done"
          image: busybox
          volumeMounts:
            - mountPath: /tmpfs-storage 
              name: tmpfs-storage # Referencing the tmpfs volume
          resources:
            limits:
              cpu: 1
              memory: 400Mi       # Defining a maximum
            requests:
              cpu: 0.1
              memory: 400Mi
      volumes:
        - emptyDir: 
            medium: Memory        # tmpfs storage
            sizeLimit: 200Mi      # Restricting storage size
          name: tmpfs-storage

We’ve now got a deployment YAML set up so that a busybox pod will spin up with a 400Mi memory limit and a tmpfs volume attached at the directory /tmpfs-storage.

Using a local Kubernetes cluster (like Docker Desktop’s built-in Kubernetes node), let’s deploy it!

❯ kubectl apply -f tmpfs-deployment.yaml
deployment.apps/my-super-cool-app created
❯ kubectl get pods
NAME                                 READY   STATUS    RESTARTS   AGE
my-super-cool-app-6bd6bd7bfb-g87k9   1/1     Running   0          41s

So now that we’re up and running, we can exec into the container and see what the memory limit and current usage look like.

❯ kubectl exec -it --tty my-super-cool-app-6bd6bd7bfb-g87k9 -- sh
/ # cd tmpfs-storage/
/tmpfs-storage # cat /sys/fs/cgroup/memory.current
1097728
/tmpfs-storage # cat /sys/fs/cgroup/memory.max
419430400

Depending on which cgroup version your node is running, the filenames might be different. On cgroup v1, for example, the current and maximum memory usage live at /sys/fs/cgroup/memory/memory.usage_in_bytes and /sys/fs/cgroup/memory/memory.limit_in_bytes respectively.
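
If you’re not sure which cgroup version a node is running, a quick check from inside the container (a minimal sketch using the same busybox shell) is to test for the cgroup v2 unified hierarchy’s cgroup.controllers file:

/tmpfs-storage # [ -f /sys/fs/cgroup/cgroup.controllers ] && echo "cgroup v2" || echo "cgroup v1"
cgroup v2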

The results line up with what was defined in the Kubernetes deployment YAML: the 400Mi memory limit for the container (419430400 bytes), and barely any memory in use at the moment.

tmpfs in Action

Let’s simulate a process using some memory with the following head command and quickly check the contents of the current memory usage file.

/tmpfs-storage # head -c 250m /dev/zero | tail &
/tmpfs-storage # ps -ef
PID   USER     TIME  COMMAND
    1 root      0:00 sh -c while true; do sleep 6000; done
   13 root      0:00 sh
   41 root      0:02 head -c 250m /dev/zero
   42 root      0:00 tail
   43 root      0:00 ps -ef
/tmpfs-storage # cat /sys/fs/cgroup/memory.current
233738240
/tmpfs-storage # cat /sys/fs/cgroup/memory.current
248315904

Now we have a quick way to generate a process with specific memory usage. How about a quick way to allocate space and mock some files being created in our new tmpfs space?

/tmpfs-storage # cat /sys/fs/cgroup/memory.current
1052672
/tmpfs-storage # fallocate -l 100m /tmpfs-storage/example
/tmpfs-storage # ls -al
total 102404
drwxrwxrwt    2 root     root            60 Dec 27 23:33 .
drwxr-xr-x    1 root     root          4096 Dec 27 04:55 ..
-rw-r--r--    1 root     root     104857600 Dec 27 23:33 example
/tmpfs-storage # cat /sys/fs/cgroup/memory.current
106004480

We’ve allocated some space on our tmpfs file system to a file called example, and when we check the current memory usage within the container it has jumped from the usual idle of a couple of megabytes to over 100MB, matching the usage of our tmpfs space.

We can see that by using the df command as well:

/tmpfs-storage # df /tmpfs-storage/
Filesystem           1K-blocks      Used Available Use% Mounted on
tmpfs                   204800    102400    102400  50% /tmpfs-storage

Since we set the limit of our tmpfs space to 200Mi, when we try to allocate another file that would take us over that limit, we should be blocked from doing so, right?

/tmpfs-storage # cat /sys/fs/cgroup/memory.current
106004480
/tmpfs-storage # fallocate -l 150m /tmpfs-storage/example2
fallocate: fallocate '/tmpfs-storage/example2': No space left on device

As expected from the sizeLimit we defined in the Kubernetes YAML, we get an error preventing us from allocating more space than we defined. Pretty straight-forward so far: we can see the relationship between tmpfs, the limit we set on the volume with sizeLimit, and the limit we set for the entire container.
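
As a side note, because the file’s contents live entirely in memory, deleting it should hand that memory straight back to the container. Running the following in the same session, you’d expect memory.current to drop back to the idle couple of megabytes we saw earlier:

/tmpfs-storage # rm /tmpfs-storage/example
/tmpfs-storage # cat /sys/fs/cgroup/memory.current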

Now let’s flip the script and imagine a memory-intensive container with a tmpfs volume, where we try allocating space on the volume while memory is already running low.

/tmpfs-storage # cat /sys/fs/cgroup/memory.current
1060864
/tmpfs-storage # head -c 300m /dev/zero | tail &
/tmpfs-storage # fallocate -l 200m /tmpfs-storage/example
/tmpfs-storage # cat /sys/fs/cgroup/memory.current
377839616
[1]+  Killed                     head -c 300m /dev/zero | tail

The memory usage climbs as our test process runs; once the container hits its memory limit, the process is killed, because the pages backing the tmpfs files take priority over process memory and can’t simply be reclaimed.

You can imagine that if this were an integral process in the pod, it could fail its liveness check in Kubernetes and trigger a restart of the entire container. Managing this takes some effort: you need to work out the balance between the memory requirements of your processes, the OS and the tmpfs space within your container so that they don’t conflict and cause the pod to be killed due to out-of-memory issues.
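
One way to approach that balancing act is to treat the container memory limit as a budget split between the application’s working set, the tmpfs sizeLimit and some headroom. The fragment below re-uses the resources block from our deployment, annotated with one possible split; the ~150Mi working-set figure is an assumption for illustration, not a measurement:

resources:
  limits:
    cpu: 1
    memory: 400Mi   # ≈ assumed app working set (~150Mi) + tmpfs sizeLimit (200Mi) + headroom (~50Mi)
  requests:
    cpu: 0.1
    memory: 400Mi   # matching the limit so the scheduler reserves the full budget up front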

What did we learn?

Setting up a tmpfs volume is fairly straight-forward, but there are some caveats to consider when using it in a Kubernetes environment, particularly with production workloads. If there are any takeaways from this, it’s these few points: