Terminating pods can still receive traffic via a service
Based on searching The Internet and asking GitHub Copilot, I have concluded that it is a little-known fact that terminating pods can still receive traffic from a Kubernetes Service (on a sufficiently recent release of Kubernetes).
What do I mean by sufficiently recent? I mean Kubernetes v1.26 (December 2022) or beyond.
Conventional wisdom
Conventional wisdom is that a Service routes traffic to the pods listed in the subsets of an Endpoints object.
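For reference, a populated Endpoints object looks something like this (a minimal sketch; the names and address are hypothetical, borrowed from later in this post):

apiVersion: v1
kind: Endpoints
metadata:
  name: the-endpoint
  namespace: the-namespace
subsets:
- addresses:
  - ip: 10.104.88.233
    targetRef:
      kind: Pod
      name: the-app-5968b5d75f-5jdjs
      namespace: the-namespace
  ports:
  - name: http
    port: 5000
    protocol: TCP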
However, since Kubernetes v1.19 (August 2020) this is no longer the case! A Service actually routes traffic to pods listed in EndpointSlice objects! In case you missed it, it’s there in the CHANGELOG for v1.19.
I’ve taken it out of context, but this statement in the CHANGELOG rings true to me:
mostly be an invisible change
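You can list the EndpointSlice objects behind a Service by selecting on the kubernetes.io/service-name label, which the EndpointSlice controller sets for you (a sketch, reusing the hypothetical names from this post):

# kubectl get endpointslices -n the-namespace -l kubernetes.io/service-name=the-app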
If you don’t recall this “mostly invisible” change from v1.19, you might be surprised to find traffic flowing to terminating pods, despite the output of kubectl looking somewhat like this:
# kubectl get pods -n the-namespace | grep the-app
the-app-5968b5d75f-5jdjs   2/2   Terminating   0   3h49m
the-app-5968b5d75f-7sfpk   2/2   Terminating   0   124m
the-app-5968b5d75f-8fmpm   2/2   Terminating   0   4h58m
the-app-5968b5d75f-92qdb   2/2   Terminating   0   14h
the-app-5968b5d75f-bxg95   2/2   Terminating   0   4h16m
the-app-5968b5d75f-dzgxc   2/2   Terminating   0   14h
the-app-5968b5d75f-k6684   2/2   Terminating   0   100m
# kubectl get ep -n the-namespace the-endpoint
NAME           ENDPOINTS   AGE
the-endpoint   <none>      47d
Endpoints are <none>, but traffic is flowing? Better have a look at the YAML output to double-check:
# kubectl get ep -n the-namespace the-endpoint -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  creationTimestamp: "2024-03-13T05:04:51Z"
  name: the-endpoint
  namespace: the-namespace
  resourceVersion: "2137011176"
  uid: f5ae4db1-3764-406a-8352-d7270bdf412d
Yup, there is definitely not a subsets array. “Mostly invisible” is right, for all the wrong reasons! However, our trusty curl requests to the Service are definitely working, so what is going on?
Ready vs Serving
EndpointSlices introduced a serving condition in Kubernetes v1.20 (December 2020). The documentation for serving has a description that reminds me of the “mostly invisible” description of EndpointSlices themselves:
The serving condition is almost identical to the ready condition
Great, what is the difference then? The difference is that terminating pods can never be ready, but they can be serving. The documentation goes on to say:
The difference is that consumers of the EndpointSlice API should check the serving condition if they care about pod readiness while the pod is also terminating.
Aha! So that seems interesting. What would care about pod readiness whilst terminating? All signs pointed to kube-proxy.
kube-proxy
Trusty old kube-proxy is the component that creates iptables rules (in EKS) or IPVS rules (in our self-hosted clusters) to route Service traffic to pods. And now we know that since Kubernetes v1.19, it uses EndpointSlices to determine which pods to route traffic to. It seems that kube-proxy also cares about the serving condition.
To confirm that, I increased the verbosity of kube-proxy from --v=2 to --v=9. To create a surprising example of something I would not expect to work, but does, I set up a pod that was both not ready and terminating. Then I transitioned the pod back to ready (whilst still terminating) and found the following logs:
I0430 06:41:58.927894 1 config.go:133] "Calling handler.OnEndpointSliceUpdate"
I0430 06:41:58.928139 1 endpointslicecache.go:373] "Setting endpoints for service port name" portName="the-namespace/the-app:http" endpoints=["10.104.88.233:5000"]
I0430 06:41:58.928153 1 endpointslicecache.go:373] "Setting endpoints for service port name" portName="the-namespace/the-app:http" endpoints=["10.104.88.233:5000"]
I0430 06:41:58.928172 1 proxier.go:796] "Syncing iptables rules"
My whole reality was shattered! I had always assumed that kube-proxy would only route traffic to ready pods and that terminating pods could never be ready. But here we are, with a terminating (but ready) pod receiving traffic from a Service.
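If you want to reproduce a terminating-but-ready pod yourself, a pod with a long preStop sleep works: deleting it leaves it Terminating whilst the kubelet keeps running readiness checks. A minimal sketch (the image, port, probe endpoint, and timings are hypothetical):

apiVersion: v1
kind: Pod
metadata:
  name: slow-terminator
  namespace: the-namespace
  labels:
    app: the-app                       # so the Service still selects this pod
spec:
  terminationGracePeriodSeconds: 600   # allow up to 10 minutes of Terminating
  containers:
  - name: app
    image: the-app:latest              # hypothetical image serving HTTP on port 5000
    ports:
    - containerPort: 5000
      name: http
    readinessProbe:
      httpGet:
        path: /healthz                 # hypothetical endpoint; make it fail then pass to flip readiness
        port: 5000
      periodSeconds: 5
    lifecycle:
      preStop:
        exec:
          command: ["sleep", "570"]    # hold the pod in Terminating while probes continue

Deleting this pod starts the preStop sleep: the pod shows Terminating, but its readinessProbe keeps being evaluated, so the serving condition tracks the probe result.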
I was still puzzled about why kube-proxy would do this, but a fresh search of The Internet turned up KEP-1669 and a Kubernetes blog post that said the behaviour changed in Kubernetes v1.26 (December 2022):

Starting in Kubernetes v1.26, kube-proxy enables the ProxyTerminatingEndpoints feature by default, which adds automatic failover and routing to terminating endpoints in scenarios where the traffic would otherwise be dropped.
Fail open
Now I get it! From Kubernetes v1.26, kube-proxy has a fail-open mechanism: it tries not to drop traffic if it can help it. This is reminiscent of the behaviour of AWS ELBv2 (better known as ALB and NLB). If kube-proxy has pods that are running and ready, it will send traffic to those pods. However, if there are zero pods in the running state, but some terminating pods are still passing readiness checks, it will send traffic to those pods instead of dropping the traffic.
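You can demonstrate the fail open yourself by scaling a deployment to zero (so every pod is terminating) and curling the Service from inside the cluster whilst the pods drain. A sketch, assuming the hypothetical names from this post and pods that take a while to terminate:

# kubectl scale deployment -n the-namespace the-app --replicas=0
# kubectl get pods -n the-namespace | grep the-app
the-app-5968b5d75f-5jdjs   2/2   Terminating   0   3h49m
# curl http://the-app.the-namespace.svc.cluster.local:5000/

The curl is still answered, even though every pod backing the Service is terminating.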
This can be confusing when you see that the EndpointSlice object for a terminating pod has the ready condition set to false, but you need to keep in mind that kube-proxy cares about the serving condition instead.
Inspecting an EndpointSlice object for a terminating pod that is still ready shows these conditions:
# kubectl get endpointslice -n the-namespace the-app-qgvs7 -o yaml
apiVersion: discovery.k8s.io/v1
endpoints:
- addresses:
  - 10.104.88.233
  conditions:
    ready: false
    serving: true
    terminating: true
Inspecting an EndpointSlice object for a terminating pod that is not ready shows these conditions:
# kubectl get endpointslice -n the-namespace the-app-qgvs7 -o yaml
addressType: IPv4
apiVersion: discovery.k8s.io/v1
endpoints:
- addresses:
  - 10.104.88.233
  conditions:
    ready: false
    serving: false
    terminating: true
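If you only want the conditions and not the full YAML, a jsonpath query works too (a sketch; the slice name is the hypothetical one above):

# kubectl get endpointslice -n the-namespace the-app-qgvs7 \
    -o jsonpath='{range .endpoints[*]}{.addresses[0]} ready={.conditions.ready} serving={.conditions.serving} terminating={.conditions.terminating}{"\n"}{end}'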
Things change
So there you go! Over the years 2020 to 2022, the behaviour of Kubernetes Services, EndpointSlices and kube-proxy changed. It confused me and my team for a while, and it took longer than I expected to find the answer to why traffic was arriving at terminating pods. This blog post is an attempt to distribute the knowledge of this behaviour, and I hope it gets picked up into the indexes of search engines to shortcut the learning process for others!
Minutiae
A few other random facts (as at May 2024!), which you can verify with the commands sketched after the list:
- Terminating pods do not receive livenessProbe checks
- Terminating pods do receive readinessProbe checks
- Terminating pods can transition from not ready to ready if readinessProbe checks pass
- Terminating pods do not receive service traffic if they are not ready (readinessProbe checks failing)
- Terminating pods do not receive service traffic if other pods are Running and ready
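To watch these transitions yourself, delete a pod without waiting and keep an eye on both the pod and the EndpointSlice conditions (a sketch, reusing the hypothetical names from this post):

# kubectl delete pod -n the-namespace the-app-5968b5d75f-5jdjs --wait=false
# kubectl get pods -n the-namespace -w | grep the-app
# kubectl get endpointslices -n the-namespace -l kubernetes.io/service-name=the-app -o yaml -w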