Kubernetes pod graceful shutdown

Because Pods represent processes running on nodes in the cluster, it is important to allow those processes to gracefully terminate when they are no longer needed (rather than being abruptly stopped with a KILL signal and having no chance to clean up).
Typically, the container runtime sends a TERM signal to the main process in each container. Many container runtimes respect the STOPSIGNAL value defined in the container image and send this instead of TERM. Once the grace period has expired, the KILL signal is sent to any remaining processes, and the Pod is then deleted from the API Server. If the kubelet or the container runtime's management service is restarted while waiting for processes to terminate, the cluster retries from the start including the full original grace period.
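To make use of the grace period, the main process in the container has to react to the TERM signal itself. The following is a minimal sketch in Python, assuming the application is a simple polling loop; the names `install_handlers`, `run`, `work_step`, and `cleanup` are hypothetical and only serve the illustration.

```python
import signal
import threading

# Set once the container runtime has asked the process to stop.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Only record the request; the main loop performs the cleanup."""
    shutdown_requested.set()


def install_handlers():
    # TERM is the usual stop signal; an image's STOPSIGNAL may name another one.
    signal.signal(signal.SIGTERM, handle_sigterm)


def run(work_step, cleanup, poll_seconds=0.1):
    """Call work_step until shutdown is requested, then run cleanup once."""
    while not shutdown_requested.is_set():
        work_step()
        # Sleep between steps, waking early if the signal arrives.
        shutdown_requested.wait(poll_seconds)
    cleanup()
```

The handler does as little as possible and leaves the real work to the main loop, so cleanup happens at a well-defined point. Cleanup must still finish before the grace period expires; otherwise the process is stopped with KILL.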

Example flow

kubectl delete pod ftp-rest-service-84599d54fd-m56bj

This command deletes a specific Pod using the default grace period (30 seconds).

If you use kubectl describe to check on the Pod you're deleting, that Pod shows up as "Terminating". On the node where the Pod is running: as soon as the kubelet sees that a Pod has been marked as terminating (a graceful shutdown duration has been set), the kubelet begins the local Pod shutdown process.

If one of the Pod's containers has defined a preStop hook, the kubelet runs that hook inside of the container. If the preStop hook is still running after the grace period expires, the kubelet requests a small, one-off grace period extension of 2 seconds. If the preStop hook needs longer to complete than the default grace period allows, you must modify terminationGracePeriodSeconds to suit this.
The kubelet triggers the container runtime to send a TERM signal to process 1 inside each container (in this example, the Pod runs a single container).
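The preStop hook and the grace period described above are both set in the Pod spec. As a rough sketch, the snippet below builds such a manifest as a Python dict and prints it as JSON. The field names `terminationGracePeriodSeconds` and `lifecycle.preStop.exec.command` are the ones Kubernetes uses; the helper `pod_manifest`, the image name, the 60-second value, and the `sleep 10` hook command are assumptions made up for the example.

```python
import json


def pod_manifest(name, image, grace_seconds=30, pre_stop_command=None):
    """Build a minimal single-container Pod manifest as a plain dict."""
    container = {"name": name, "image": image}
    if pre_stop_command:
        # The kubelet runs this command inside the container before TERM is sent.
        container["lifecycle"] = {
            "preStop": {"exec": {"command": pre_stop_command}}
        }
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Must cover the preStop hook plus the time the process needs
            # to exit after receiving TERM.
            "terminationGracePeriodSeconds": grace_seconds,
            "containers": [container],
        },
    }


# Hypothetical values: image name and hook command are placeholders.
manifest = pod_manifest(
    "ftp-rest-service",
    "example.invalid/ftp-rest-service:latest",
    grace_seconds=60,
    pre_stop_command=["/bin/sh", "-c", "sleep 10"],
)
print(json.dumps(manifest, indent=2))
```

In practice the same fields would normally sit in the Pod template of a Deployment rather than in a bare Pod.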

At the same time as the kubelet starts graceful shutdown, the control plane removes the shutting-down Pod from Endpoints (and, if enabled, EndpointSlice) objects where these represent a Service with a configured selector.
ReplicaSets and other workload resources no longer treat the shutting-down Pod as a valid, in-service replica.
Pods that shut down slowly cannot continue to serve traffic, because load balancers (like the service proxy) remove the Pod from the list of endpoints as soon as the termination grace period begins.

When the grace period expires, the kubelet triggers forcible shutdown. The container runtime sends SIGKILL to any processes still running in any container in the Pod. The kubelet also cleans up a hidden pause container if the container runtime uses one.
The kubelet triggers forcible removal of the Pod object from the API server by setting the grace period to 0 (immediate deletion).
The API server deletes the Pod's API object, which is then no longer visible from any client.

Summary

The following 5 steps occur when Kubernetes terminates a Pod:

1. The Pod switches to the Terminating state and stops receiving new traffic. The container is still running inside the Pod.
2. The preStop hook, if one is defined, is executed. It is either a command run inside the container or an HTTP request sent to the container.
3. The SIGTERM signal is sent to the container's main process, telling it that it is about to be shut down.
4. Kubernetes waits up to the grace period (terminationGracePeriodSeconds, 30 seconds by default). The countdown runs in parallel with the preStop hook and the SIGTERM handling, so Kubernetes does not wait for them to finish. Once the period expires, it moves straight on to the next step. Setting the grace period correctly is therefore very important.
5. If the container is still running after the grace period, the SIGKILL signal is sent to its remaining processes. The Pod is then removed and the termination is finished.
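The timing in the steps above can be checked with some simple arithmetic. The function below is a deliberately simplified model, not how the kubelet is implemented: it assumes the grace period has to cover the preStop hook plus the time the process needs after SIGTERM, and it ignores the one-off 2-second extension mentioned earlier. The name `termination_outcome` and its parameters are hypothetical.

```python
def termination_outcome(grace_seconds, pre_stop_seconds, shutdown_seconds):
    """Return (result, elapsed_seconds) for a simplified termination model.

    The grace period starts when termination begins, so the preStop hook
    and the process's own shutdown after SIGTERM share the same budget.
    """
    needed = pre_stop_seconds + shutdown_seconds
    if needed <= grace_seconds:
        # Everything finished in time; no SIGKILL is needed.
        return "graceful", needed
    # The budget ran out; remaining processes receive SIGKILL.
    return "killed", grace_seconds


print(termination_outcome(30, 10, 15))  # ('graceful', 25)
print(termination_outcome(30, 20, 15))  # ('killed', 30)
```

In the second call the hook and the shutdown together need 35 seconds, which exceeds the default 30, so raising terminationGracePeriodSeconds to at least 35 would be needed under this model.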
