Kubernetes: Checking Pod Health

In my previous post on Pods, I gave a simple example running a container in Kubernetes. In this post, I will be discussing how to check pod health and keep them running with probes.

Probes on Pods

Even the best software fails from time to time. Be it networking issues, a software bug, or unrelated 3rd party dependencies. Being able to react to these failures and ensure the appropriate action occurs is critical to maintaining an overall healthy software system.

Kubernetes has probes to inspect a pod to ensure it is up, running, and healthy. Depending on the type of probe, it may stop directing traffic to the pod or terminate the pod.

There are three main probes on the Pods:

Startup Probe
Readiness Probe
Liveness Probe

The Startup Probe

The startup probe is to determine when a container is done starting and ready to begin processing. This affects the system as a pod that is not Ready won't be available for network traffic yet.

The startup probe inspects a Pod and ensures a condition is met prior to considering the Pod ready. If the startup probe fails it will stop the pod. The restart policy will take effect and the pod will be restarted if so configured. Many pieces of software will have some initialization routine, or need to prepare some resources before they can be considered ready to use.

The startup probe may seem redundant with the liveness probe. The two are very similar. However, the main reasoning for the startup probe is that certain applications take longer to startup. An example would be an application that performs database migrations on startup. That initial database migration may take a long time. However, it's a one time cost at the start of the application. Once it's are in a running state inspecting if they are still working is a faster operation.

The Liveness Probe

The liveness probe executes as the pod is running. It will intermittently check that the Pod is still healthy, if it is not, Kubernetes will terminate the container and restart it via it's configured restart policy. Where the startup probe was just when the container starts up, the liveness probe is the probe that handles checks after the container was initially ready.

The Readiness Probe

This one is perhaps the coolest probe. In the same way as the other probes, the Readiness probe inspects the Pod for health. However, this probe doesn't terminate the pod. If the probe indicates the container is not healthy, it will take it out of any service end points. In other words - traffic will stop routing to the Pod, but the Pod will remain running.

I haven't talked about networking in Kubernetes yet, but plan to in the future.

Options on The Probes

The probes can consist of one of the following:

A command executed within the Pod
An HTTP GET request
A TCP port check
A GRPC health check

It's up to a container to implement some mechanism from the above list for kubernetes to check its health. I can set a startup probe which will wait for the Pod to become available until a TCP port is available. Another example would be a liveness probe including an HTTP endpoint on my application which indicates the Pod is healthy. The liveness probe could continue to run and check the endpoint through an HTTP GET request.

There are other options including but not limited to:

How often the probe runs.
How many times the probe needs to fail.
How many times it needs to succeed to have traffic routed to it again (for a readiness probe).

Example #1: The Startup Probe

For my first sample, I'm going to setup a Pod, which will wait for 15 seconds and then write a file to disk. A startup probe based on an exec command which will check for that file to be written, that will check every 5 seconds and do up to 12 checks. The probe will be an exec command that will check for the file to exist on disk. The cat command will return a non-success return code if the file doesn't exist and the exec command will pick up on the return code. After that, Kubernetes should show the Pod as running.

To run the pod, I will just setup a YAML resource and use kubectl apply to apply it to the cluster.

Here's the YAML I'll be using for this example:

apiVersion: v1
kind: Pod
metadata:
  name: busybox-startup-exec-probe
spec:
  containers:
  - image: busybox
    command: ["/bin/sh"]
    args:
      - -c
      - echo "started";
        sleep 15;
        echo "touching startup file";
        touch ~/startup;
        echo "sleeping before restart";
        sleep 120;
    name: busybox-startup-exec-probe
    startupProbe:
      exec:
        command:
          - /bin/sh
          - -c
          - cat ~/startup
      periodSeconds: 5
      failureThreshold: 12
  restartPolicy: Always

Watching the Status

To see the status of the pod, the 'describe' command can be used as it starts up. There is a lot of information that comes back in describe, so I'm only going to highlight a few parts I'm interested in.

A few seconds after starting the container, I ran the kubectl describe pod/busybox-startup-exec-probe command to look at the status. There are a few things to note here. The container indicates it is Running. However, in the conditions you can see that the containers aren't Ready yet. This is because the startup probe has been failing. There is a separate Events section which shows us the actions Kuberenetes is taking against the Pod.

The event I want to point out is the final line. This indicates that the pod is unhealthy because the startup probe failed multiple times. This is due to the file not existing immediately after the Pod started running. The message section gives these details.

Name:         busybox-startup-exec-probe
Status:       Running
Containers:
  busybox-startup-exec-probe:
    State:          Running
      Started:      Sat, 18 Jun 2022 10:14:43 -0500
    Ready:          False
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Events:
  Type     Reason     Age              From               Message
  ----     ------     ----             ----               -------
  Normal   Scheduled  13s              default-scheduler  Successfully assigned default/busybox-startup-exec-probe to docker-desktop
  Normal   Pulling    13s              kubelet            Pulling image "busybox"
  Normal   Pulled     11s              kubelet            Successfully pulled image "busybox" in 1.2569084s
  Normal   Created    11s              kubelet            Created container busybox-startup-exec-probe
  Normal   Started    11s              kubelet            Started container busybox-startup-exec-probe
  Warning  Unhealthy  3s (x2 over 8s)  kubelet            Startup probe failed: cat: can't open '/root/startup': No such file or directory

After running for some time, I again run the command to describe the Pod. From the events, we can see no more Unhealthy events popped up after a while. It had 3 events, and is no longer logging new ones. The pod is now in a Ready state. The startup probe did its job.

Name:         busybox-startup-exec-probe
Containers:
  busybox-startup-exec-probe:
    State:          Running
      Started:      Sat, 18 Jun 2022 10:14:43 -0500
    Ready:          True
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True

Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  2m6s                 default-scheduler  Successfully assigned default/busybox-startup-exec-probe to docker-desktop
  Normal   Pulling    2m6s                 kubelet            Pulling image "busybox"
  Normal   Pulled     2m4s                 kubelet            Successfully pulled image "busybox" in 1.2569084s
  Normal   Created    2m4s                 kubelet            Created container busybox-startup-exec-probe
  Normal   Started    2m4s                 kubelet            Started container busybox-startup-exec-probe
  Warning  Unhealthy  111s (x3 over 2m1s)  kubelet            Startup probe failed: cat: can't open '/root/startup': No such file or directory

Example #2: The Liveness Probe

I'm going to modify my previous example. I will setup a liveness probe which will run the same exec command. I'm going to modify my pod so that after another 30 seconds, it will delete the file.

The result should be, the Pod will start, the startup probe will succeed when the file is there. The liveness probe will fail and will restart the container. The liveness probe will be set to check every second, and after 1 failure, it will terminate the container. Due to the Always restart policy, the container will be restarted.

Here is the yaml describing the Pod.

apiVersion: v1
kind: Pod
metadata:
  name: busybox-liveness-exec-probe
spec:
  containers:
  - image: busybox
    command: ["/bin/sh"]
    args:
      - -c
      - echo "started";
        sleep 15;
        echo "touching startup file";
        touch ~/startup;
        echo "waiting for 15 seconds and then deleting file";
        sleep 15;
        echo "deleting file";
        rm ~/startup;
        echo "sleeping before restart";
        sleep 120;
    name: busybox-liveness-exec-probe
    startupProbe:
      exec:
        command:
          - /bin/sh
          - -c
          - cat ~/startup
      periodSeconds: 5
      failureThreshold: 12
    livenessProbe:
      exec:
        command:
          - /bin/sh
          - -c
          - cat ~/startup
      periodSeconds: 1
      failureThreshold: 1
  restartPolicy: Always

Rather than monitoring the pod with the describe command, I'm going to look at the raw events for the pod.

I can do that by using a k get events --field-selector=involvedObject.name=busybox-liveness-exec-probe command. The field selector parameter is powerful in that it lets me reduce the list of messages to the ones just for my pod.

LAST SEEN   TYPE      REASON      OBJECT                            MESSAGE
58s         Normal    Scheduled   pod/busybox-liveness-exec-probe   Successfully assigned default/busybox-liveness-exec-probe to docker-desktop
57s         Normal    Pulling     pod/busybox-liveness-exec-probe   Pulling image "busybox"
53s         Normal    Pulled      pod/busybox-liveness-exec-probe   Successfully pulled image "busybox" in 4.3484866s
52s         Normal    Created     pod/busybox-liveness-exec-probe   Created container busybox-liveness-exec-probe
52s         Normal    Started     pod/busybox-liveness-exec-probe   Started container busybox-liveness-exec-probe
38s         Warning   Unhealthy   pod/busybox-liveness-exec-probe   Startup probe failed: cat: can't open '/root/startup': No such file or directory
22s         Warning   Unhealthy   pod/busybox-liveness-exec-probe   Liveness probe failed: cat: can't open '/root/startup': No such file or directory
22s         Normal    Killing     pod/busybox-liveness-exec-probe   Container busybox-liveness-exec-probe failed liveness probe, will be restarted

If I kept checking the events, I'd see this repeat again as the container will just keep getting restarted. I will spare the details for the describe command here.

Wrapping Up

Through probes, Kubernetes provides a way to inspect containers in pods to ensure they are still healthy. Having functionality like this baked into the orchestrator helps build automation for heath checks around even the most basic use cases. It also makes it very easy to configure.

I've started a GitHub repository with various examples as I write more blogs about Kubernetes. The YAML files for my examples from above can be found here.

I hope you enjoyed this post, and it will encourage you to configure some probes on the next Kubernetes pod you create.

Share this post: