Imagine you've deployed your web application to production. You run kubectl get pods and see STATUS: Running for all your pods. Everything looks perfect, so you go home for the evening. But at 2 AM, you get paged because users are reporting error messages and timeouts. You check again, and the pods still show as Running. What's going on?
A container can be technically running but completely nonfunctional. Maybe the application deadlocked while processing a request. Maybe it ran out of database connections and can't serve traffic anymore. The pod status only tells Kubernetes that the process is alive, not whether it's actually healthy. In this lesson, you'll learn how health probes enable Kubernetes to automatically detect these problems and take corrective action, whether that means restarting the container or temporarily stopping traffic until it recovers.
Kubernetes actually provides three different types of health probes, though we'll focus on two in this lesson. Understanding when to use each probe is crucial for building reliable systems. A liveness probe answers the question: "Is this container alive and functioning?" Think of it as checking for a pulse. If the liveness probe fails repeatedly, Kubernetes assumes the container is in a broken state that it cannot recover from on its own. The kubelet (the Kubernetes agent on the node) will kill the container and restart it, hoping that a fresh start will fix the problem. This is perfect for situations like application deadlocks, infinite loops, or corrupted internal states where the only solution is to restart the process.
A readiness probe answers a different question: "Is this container ready to receive traffic right now?" This probe doesn't indicate whether the container is broken — it indicates whether it's currently capable of handling requests. If the readiness probe fails, Kubernetes removes the pod's IP address from any Service endpoints, ensuring that no new traffic gets routed to it. The container keeps running, and Kubernetes keeps checking the readiness probe. Once the probe starts succeeding again, the pod is added back to the Service and begins receiving traffic. This is ideal for temporary conditions like loading large datasets into memory, waiting for database connections to be established, or performing cache warm-up operations.
The third type, a startup probe, is specifically designed for applications with slow startup times. It runs only during container initialization, and while it's running, the liveness and readiness probes are disabled. Once the startup probe succeeds, it stops running and the liveness and readiness probes take over. This prevents situations where a liveness probe kills a container that's still legitimately starting up. For most applications, configuring appropriate initialDelaySeconds on your liveness probe is sufficient, but startup probes are valuable when startup time is highly variable or exceptionally long.
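As a rough sketch of how a startup probe might sit alongside a liveness probe, the Pod below gives a slow-starting container up to 30 x 10 = 300 seconds to come up before the liveness probe takes over. The pod name, image, port, and /healthz path are hypothetical placeholders, not values from this lesson.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: slow-starter                    # hypothetical name
spec:
  containers:
    - name: app
      image: example.com/slow-app:1.0   # hypothetical image
      ports:
        - containerPort: 8080
      startupProbe:
        httpGet:
          path: /healthz                # assumed health endpoint
          port: 8080
        periodSeconds: 10
        failureThreshold: 30            # up to 30 x 10s = 300s to finish starting
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
        failureThreshold: 3             # only evaluated after the startup probe succeeds
```

The design choice here is to keep the liveness probe strict (it reacts quickly once the application is up) while letting the startup probe absorb the long or variable initialization time.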
Here's a concrete example to clarify the difference between liveness and readiness. Imagine your application needs to load a 2 GB product catalog from a database when it starts. During the first 30 seconds, the container is running and the application process is alive (liveness should pass), but it's not ready to serve customer requests because it hasn't finished loading the catalog (readiness should fail). You want Kubernetes to keep the container running but not send it any traffic yet. Now, imagine that after running for three days, your application hits a bug that causes all threads to deadlock. The process is still running (from the operating system's perspective), but it can't respond to any requests. In this case, the liveness probe should fail, triggering Kubernetes to restart the container and restore service.
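The catalog scenario might translate into probe configuration along these lines. This sketch assumes the application exposes two hypothetical endpoints: /healthz, which responds as long as the process can still handle requests, and /ready, which returns a success status only once the catalog has finished loading. The pod name, image, port, and timing values are likewise illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: catalog-app                       # hypothetical name
spec:
  containers:
    - name: catalog
      image: example.com/catalog-app:1.0  # hypothetical image
      ports:
        - containerPort: 8080
      livenessProbe:
        httpGet:
          path: /healthz                  # assumed: fails only if the process is stuck
          port: 8080
        periodSeconds: 10
        failureThreshold: 3               # a deadlocked app stops answering and gets restarted
      readinessProbe:
        httpGet:
          path: /ready                    # assumed: fails until the catalog is loaded
          port: 8080
        periodSeconds: 5                  # checked often so traffic starts soon after loading
```

During the 30-second load, /healthz would pass and /ready would fail, so the container keeps running without receiving traffic. After a deadlock, /healthz would stop responding and the container would be restarted.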
Most production applications should configure both liveness and readiness probes. They work together to provide comprehensive health monitoring. The liveness probe protects against permanent failures that require a restart, while the readiness probe handles temporary unavailability. When both probes are configured, Kubernetes can make intelligent decisions: temporarily stop traffic during a transient issue (readiness) or restart completely if the container is truly broken (liveness).
Kubernetes supports several mechanisms for performing health checks, each suited to different types of applications. This lesson covers the three most common ones (a fourth, grpc, targets gRPC services). The httpGet probe makes an HTTP GET request to a specified path and port on the container. If the response status code is at least 200 and below 400, the probe is considered successful. This is the most common probe type for web applications and REST APIs because these applications already expose HTTP endpoints. For example, you might configure a probe to hit /health or /ready endpoints that your application provides specifically for health checking.
The exec probe runs a command inside the container. If the command exits with status code 0, the probe succeeds. This is useful for applications that don't expose HTTP endpoints, such as databases or message queues. For instance, you might run pg_isready to check if PostgreSQL is accepting connections, or a custom script that verifies specific application conditions. The tcpSocket probe attempts to open a TCP connection to a specified port. If the connection succeeds, the probe passes. This is the simplest probe type and works well for any network service where you just need to verify that the port is accepting connections, without caring about the application-level protocol.
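To make the exec and tcpSocket mechanisms concrete, here is a minimal sketch of a PostgreSQL pod that uses pg_isready for readiness and a plain TCP check for liveness. The pod name and password value are placeholders chosen for illustration, and the timing values are assumptions rather than recommendations.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: postgres-probes            # hypothetical name
spec:
  containers:
    - name: postgres
      image: postgres:16
      env:
        - name: POSTGRES_PASSWORD
          value: example-only      # placeholder; use a Secret in real deployments
      ports:
        - containerPort: 5432
      readinessProbe:
        exec:
          # Succeeds (exit code 0) once the server accepts connections
          command: ["pg_isready", "-U", "postgres"]
        periodSeconds: 10
      livenessProbe:
        tcpSocket:
          port: 5432               # passes if a TCP connection can be opened
        periodSeconds: 10
```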
For this lesson, we'll focus on httpGet probes because they're the most widely applicable and because that's what our nginx example uses. When you configure an httpGet probe, you specify the path (like / or /health), the port (like 80 or 8080), and optionally headers or scheme (HTTP vs. HTTPS). Kubernetes will make HTTP GET requests to that path and port on the pod's IP address and evaluate the response. If you're running a simple web server like nginx, checking the root path works fine. For more complex applications, you'd typically create dedicated health check endpoints that verify database connections, cache availability, and other dependencies.
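The optional httpGet fields might be used as in the following sketch. The pod name, image, port, and header name are hypothetical; only the field names (path, port, scheme, httpHeaders) come from the Kubernetes probe specification.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: httpget-options                 # hypothetical name
spec:
  containers:
    - name: app
      image: example.com/web-app:1.0    # hypothetical image
      ports:
        - containerPort: 8443
      readinessProbe:
        httpGet:
          path: /health
          port: 8443
          scheme: HTTPS                 # defaults to HTTP when omitted
          httpHeaders:
            - name: X-Health-Check      # hypothetical custom header
              value: kubelet
```

When scheme is HTTPS, the kubelet does not verify the server certificate, so a self-signed certificate does not cause the probe to fail.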
Let's build a deployment that includes both liveness and readiness probes for an nginx web server. We'll start with the basic structure you learned in previous lessons, then add the probe configurations step by step. Start by creating a file called deployment-probes.yaml with the following content that defines the deployment metadata and replica settings:
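A minimal sketch of this first part, based on the values named in this lesson (the name web-app-probes, two replicas, and the label app: web-app), might look like the following. It is incomplete on its own, since a Deployment also requires a pod template.

```yaml
# deployment-probes.yaml (first part: metadata and replica settings)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app-probes
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-app        # must match the labels on the pod template
```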
This creates a deployment named web-app-probes that will manage two pod replicas. The selector tells the deployment to manage any pods with the label app: web-app. Now, let's add the pod template that defines what each replica should look like:
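The pod template could be sketched as below. The indentation assumes it sits under the Deployment's spec: key; the container name and the containerPort entry are assumptions, while the label and the nginx 1.25 image come from the lesson text.

```yaml
  # continues under spec: in deployment-probes.yaml
  template:
    metadata:
      labels:
        app: web-app              # matches the Deployment's selector
    spec:
      containers:
        - name: nginx             # container name is an assumption
          image: nginx:1.25
          ports:
            - containerPort: 80   # nginx's default HTTP port
```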
So far, this is familiar — we're defining a pod template with a single container running nginx version 1.25. The pod gets labeled with app: web-app so the deployment can track it. Now comes the important part: adding the liveness probe configuration:
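A liveness probe for the nginx container might be written as in this sketch. The indentation assumes it is placed under the container entry in the pod template; the timing values are assumptions chosen for illustration.

```yaml
          # continues under the nginx container entry in deployment-probes.yaml
          livenessProbe:
            httpGet:
              path: /                  # root URL, served by nginx's default welcome page
              port: 80
            initialDelaySeconds: 5     # assumed value: wait 5s before the first check
            periodSeconds: 10          # assumed value (also the Kubernetes default)
```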
The livenessProbe section goes directly under the container definition because each container can have its own probe. We've configured an httpGet probe that makes requests to path: / (the root URL) on port: 80 (nginx's default HTTP port). This means Kubernetes will perform an HTTP GET request to / on port 80 of the container to check if it is alive. nginx is configured by default to serve a welcome page at /, so a healthy container will respond with an HTTP 200 status code, making the probe succeed.
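Putting the pieces described above together, and adding a readiness probe alongside the liveness probe, a complete deployment-probes.yaml might look like the sketch below. The container name and all timing values are assumptions. The readiness probe reuses the same path and port as the liveness probe, which is reasonable for a static nginx server but not necessarily what a real application would want.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app-probes
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: nginx                  # container name is an assumption
          image: nginx:1.25
          ports:
            - containerPort: 80
          livenessProbe:               # repeated failure restarts the container
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 5     # assumed value
            periodSeconds: 10          # assumed value
          readinessProbe:              # failure removes the pod from Service endpoints
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 5     # assumed value
            periodSeconds: 5           # assumed value
```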
Let's deploy this health-monitored application to your cluster and verify that the probes are configured correctly. Apply the deployment using kubectl:
    kubectl apply -f deployment-probes.yaml
You should see confirmation that the deployment was created:
    deployment.apps/web-app-probes created
The Kubernetes API server has received your deployment configuration, and the deployment controller is now working to create your two replica pods. Let's check if the pods were created and are running:
    kubectl get pods -l app=web-app
The -l app=web-app flag filters to show only pods with the app=web-app label. You should see output like this:
Both pods show STATUS: Running and READY: 1/1, which means they started successfully and their readiness probes are passing. The RESTARTS column shows 0, indicating that the liveness probes have been succeeding and Kubernetes hasn't needed to restart any containers. If you run this command very quickly after applying the deployment (within 5-10 seconds), you might briefly see READY: 0/1 because the readiness probe hasn't passed yet. Once the probe succeeds, the status changes to 1/1.
Now, let's inspect one of these pods in detail to see exactly how the probes are configured. Copy one of the pod names from your output (your actual pod names will differ from any shown in this lesson) and run the following, replacing <pod-name> with the name you copied:
    kubectl describe pod <pod-name>
Understanding what happens when probes fail is crucial for debugging and troubleshooting. When a liveness probe fails repeatedly (three times by default), Kubernetes considers the container to be in an unrecoverable state and takes corrective action. The kubelet kills the container and starts a new one in its place. You'll see the RESTARTS column in kubectl get pods increment each time this happens. The pod itself isn't deleted — it's the same pod object, but the container inside has been restarted. This automatic recovery is one of Kubernetes' most powerful features: it can detect and fix broken containers without any human intervention.
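The "three times by default" behavior comes from the probe's timing fields. The sketch below spells out the Kubernetes defaults explicitly on an nginx pod so the arithmetic is visible; the pod name is hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-timing          # hypothetical name
spec:
  containers:
    - name: nginx
      image: nginx:1.25
      livenessProbe:
        httpGet:
          path: /
          port: 80
        initialDelaySeconds: 0   # default: start probing as soon as the container starts
        periodSeconds: 10        # default: probe every 10 seconds
        timeoutSeconds: 1        # default: a probe with no reply within 1s counts as a failure
        failureThreshold: 3      # default: restart after 3 consecutive failures
        successThreshold: 1      # default: one success resets the failure count
```

With these values, a container that stops responding is restarted after roughly periodSeconds x failureThreshold = 30 seconds. Lowering either field shortens detection time at the cost of more restarts on brief hiccups.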
When a readiness probe fails, Kubernetes takes a gentler approach. The container keeps running (it's not restarted), but the pod is removed from any Service endpoints that route traffic to it. This means if you have a Service pointing to pods with label app: web-app, and one pod's readiness probe starts failing, that pod will stop receiving traffic from the Service. The other healthy pods continue handling requests. Kubernetes keeps checking the readiness probe, and as soon as it starts succeeding again, the pod is added back to the Service endpoints and begins receiving traffic. This behavior is perfect for scenarios like temporary database connection issues or cache rebuilding — you want the pod to recover on its own without a full restart.
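For reference, a Service that routes to these pods might be defined as in the sketch below. The Service name is hypothetical; the selector matches the app: web-app label used in this lesson.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-app            # hypothetical name
spec:
  selector:
    app: web-app           # only pods with this label AND a passing readiness probe receive traffic
  ports:
    - port: 80             # port exposed by the Service
      targetPort: 80       # container port that traffic is forwarded to
```

Only pods whose readiness probe is currently passing are listed as ready endpoints for this Service, which is what takes a failing pod out of rotation without restarting it.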
Let's look at how to monitor probe status and events. The Events section at the bottom of kubectl describe pod output shows a timeline of what's happened to the pod, including probe failures. Run the describe command again and look for the Events section:
You've now learned how to configure liveness and readiness probes to build truly self-healing applications in Kubernetes. Liveness probes detect when containers are broken and need restarting, while readiness probes control when pods should receive traffic. Combined with namespace isolation from Lesson 1 and resource management from Lesson 2, you now have three essential reliability pillars: organized environments, guaranteed resources, and automatic health monitoring.
You created a deployment with both probe types using httpGet, applied it to your cluster, and learned how to inspect probe configuration and monitor probe events. In the upcoming practice exercises, you'll experiment with probe failures and watch Kubernetes automatically recover, cementing your understanding of how these health checks enable reliable, production-grade applications.
