In the previous lesson, you created Pods that worked correctly on the first try. You wrote a YAML manifest, applied it, and saw your Pod running smoothly. That's the happy path, and it's important to understand, but real-world Kubernetes operations involve troubleshooting failures just as often as creating working resources.
Pods don't always start successfully. Images might not exist, containers might crash immediately, or network issues might prevent image downloads. Understanding how Kubernetes manages Pods through their entire lifecycle — from creation through running to termination and failure — is essential for operating applications reliably. In this lesson, you'll learn how to read Pod status, interpret events, diagnose common failures, and observe what happens when Pods are deleted. By the end, you'll be able to troubleshoot broken Pods confidently and understand what Kubernetes is doing behind the scenes.
Every Pod goes through a series of phases during its lifetime. A phase is a high-level summary of where the Pod is in its lifecycle. Kubernetes tracks five possible phases: Pending, Running, Succeeded, Failed, and Unknown.
When you first create a Pod, it enters the Pending phase. This means Kubernetes has accepted the Pod definition, but its containers aren't running yet: the scheduler may still be looking for a suitable node, or the kubelet on that node may be pulling images from registries and preparing the environment. The Pod stays in Pending until all containers have been created and at least one container starts running.
Once at least one container in the Pod starts running, the phase changes to Running. This doesn't mean all containers are healthy or ready to serve traffic, just that they've started. The Pod remains in Running as long as at least one container is still running or restarting. Most long-lived applications, like web servers, stay in the Running phase indefinitely.
The Succeeded phase applies to Pods that are designed to run to completion, like batch jobs or one-time tasks. When all containers in the Pod terminate successfully (exit code 0) and won't be restarted, the Pod enters Succeeded. You won't see this phase often with typical web applications because they're meant to run continuously.
If all containers in a Pod have terminated and at least one terminated with a failure (non-zero exit code), the Pod enters the Failed phase. This indicates something went wrong: the application crashed, or a configuration error caused the container to exit uncleanly. A Failed Pod stays in this phase and doesn't recover on its own. If a higher-level controller (like a Deployment) manages it, the controller replaces it with a new Pod.
Finally, the Unknown phase means Kubernetes lost contact with the node where the Pod was running and can't determine the Pod's actual state. This is rare and usually indicates a serious infrastructure problem, like a node failure or network partition. Understanding these phases helps you quickly assess what's happening with your Pods and where to focus your troubleshooting efforts.
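To make these rules concrete, here is a simplified Python model that derives a phase from per-container states. It is a teaching sketch loosely based on the rules above, not the kubelet's actual logic. The container dictionaries, the `crashed_before` flag, and the `node_reachable` parameter are hypothetical inputs chosen for illustration.

```python
from typing import Dict, List


def pod_phase(containers: List[Dict], restart_policy: str = "Always",
              node_reachable: bool = True) -> str:
    """Simplified model of how a Pod's phase follows from its containers' states.

    Each container is a dict with a "state" key ("waiting", "running", or
    "terminated"), an optional "exit_code" for terminated containers, and an
    optional, hypothetical "crashed_before" flag for waiting containers that
    already ran once and are now waiting to be restarted.
    """
    if not node_reachable:
        return "Unknown"  # nobody can report the real state

    waiting = running = stopped = succeeded = 0
    for c in containers:
        if c["state"] == "running":
            running += 1
        elif c["state"] == "terminated":
            stopped += 1
            if c.get("exit_code") == 0:
                succeeded += 1
        elif c.get("crashed_before"):
            stopped += 1  # waiting to restart after a crash (CrashLoopBackOff)
        else:
            waiting += 1  # never started yet (e.g. ImagePullBackOff)

    if waiting:
        return "Pending"
    if running:
        return "Running"
    if stopped:
        if restart_policy == "Always":
            return "Running"    # stopped containers will be restarted
        if stopped == succeeded:
            return "Succeeded"  # everything exited 0 and won't be restarted
        if restart_policy == "Never":
            return "Failed"     # at least one non-zero exit, no restarts
        return "Running"        # OnFailure: failed containers get restarted
    return "Pending"            # no container status reported yet
```

Note how a crash-looping container keeps the Pod in Running under the default restartPolicy: Always, while a container that has never started keeps it in Pending.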
The kubectl get pods command is your first tool for checking Pod health. It prints a table with one row per Pod and the columns NAME, READY, STATUS, RESTARTS, and AGE, which together give you a quick overview of each Pod's state.
The NAME column shows the Pod's name from the metadata. The READY column shows how many containers in the Pod are ready versus the total number of containers. A value like 1/1 means one container is ready out of one total container, which is what you want. If you see 0/1, the container isn't ready yet: it may still be starting, it may be failing its health checks, or, as you'll see later in this lesson, it may never have started at all.
The STATUS column displays a summary of the Pod's state, but it's important to understand that this is not always the same as the Pod's phase. While it can show phase-like values such as Running, Pending, or Completed, it also surfaces container-level status reasons like ImagePullBackOff or CrashLoopBackOff that carry more specific meaning. Crucially, these fine-grained statuses don't always map directly to the Pod phase: a Pod displaying ImagePullBackOff in STATUS is actually in the Pending phase (the container never started, so the Pod is still waiting), while a Pod displaying CrashLoopBackOff is typically in the Running phase (the container keeps starting and crashing, and Kubernetes keeps restarting it under the default restartPolicy: Always). These container-level status reasons are more actionable than the phase alone and immediately tell you what kind of problem you're dealing with.
The RESTARTS column shows how many times the containers in the Pod have been restarted. A value of 0 is ideal. A higher or steadily growing number means the container crashed and Kubernetes restarted it, possibly many times; frequent restarts indicate an unstable application that's crashing repeatedly. The AGE column shows how long ago the Pod was created, which helps you tell whether a problem is new or has been ongoing.
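You can see the same STATUS-versus-phase distinction in the raw Pod object. The following Python sketch takes the parsed output of kubectl get pod <name> -o json and rebuilds the READY, STATUS, and RESTARTS values next to the real phase. It is a simplified approximation of how kubectl fills the STATUS column, not kubectl's own code; the field names (status.phase, status.containerStatuses, state.waiting.reason, metadata.deletionTimestamp) come from the Pod API, and the function name is hypothetical.

```python
from typing import Any, Dict


def summarize_pod(pod: Dict[str, Any]) -> Dict[str, Any]:
    """Approximate the READY/STATUS/RESTARTS columns of `kubectl get pods`."""
    status = pod.get("status", {})
    phase = status.get("phase", "Unknown")
    statuses = status.get("containerStatuses", [])
    total = len(pod.get("spec", {}).get("containers", []))

    ready = sum(1 for cs in statuses if cs.get("ready"))
    restarts = sum(cs.get("restartCount", 0) for cs in statuses)

    # Start from the Pod-level reason or phase, then let a
    # container-level reason (ImagePullBackOff, CrashLoopBackOff, ...) override it.
    display = status.get("reason") or phase
    for cs in statuses:
        state = cs.get("state", {})
        waiting_reason = state.get("waiting", {}).get("reason")
        terminated_reason = state.get("terminated", {}).get("reason")
        if waiting_reason:
            display = waiting_reason
        elif terminated_reason:
            display = terminated_reason

    # A Pod that has been marked for deletion is shown as Terminating.
    if pod.get("metadata", {}).get("deletionTimestamp"):
        display = "Terminating"

    return {"READY": f"{ready}/{total}", "STATUS": display,
            "RESTARTS": restarts, "phase": phase}
```

For example, save kubectl get pod nginx-broken -o json to a file, load it with json.load, and pass the result to summarize_pod: STATUS comes back as ImagePullBackOff while phase stays Pending.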
While kubectl get pods gives you a quick overview, it doesn't explain why a Pod is in a particular state. That's where kubectl describe pod comes in. This command shows detailed information about a Pod, including its configuration, current state, and, most importantly, a timeline of events that show what Kubernetes has been doing.
The output is long and contains several sections. Near the top, you'll see the Pod's metadata, labels, and status. Further down, you'll see details about each container, including its image, ports, environment variables, and current state. But the most valuable section for troubleshooting is at the bottom: Events.
The Events section shows a chronological log of actions Kubernetes took with this Pod. Each event has a timestamp, a type (Normal or Warning), a reason (like Scheduled, Pulling, Pulled, Created, Started), and a message explaining what happened. When a Pod is working correctly, you'll see a sequence of Normal events with the reasons Scheduled, Pulling, Pulled, Created, and Started, in that order.
This tells the story: the scheduler assigned the Pod to a node, the kubelet on that node pulled the image, created the container, and started it. Everything worked smoothly. When something goes wrong, you'll see Warning events with error messages that explain the problem. These messages are your primary clue for understanding failures.
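When the event list gets long, it can help to filter it programmatically. Assuming you have the parsed JSON from kubectl get events --field-selector involvedObject.name=nginx-broken -o json, this Python sketch keeps only the Warning events and orders them by time. The field names (type, reason, message, lastTimestamp, eventTime) come from the core Event API; the function name and the fallback between the two timestamp fields are assumptions of this sketch.

```python
from typing import Any, Dict, List, Tuple


def warning_events(event_list: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, reason, message) for each Warning event, oldest first."""
    warnings = []
    for event in event_list.get("items", []):
        if event.get("type") != "Warning":
            continue  # Normal events describe progress, not problems
        # Core v1 events usually carry lastTimestamp; fall back to eventTime.
        when = event.get("lastTimestamp") or event.get("eventTime") or ""
        warnings.append((when, event.get("reason", ""), event.get("message", "")))
    # RFC 3339 timestamps in the same format sort correctly as strings.
    return sorted(warnings)
```

For a healthy Pod the result is an empty list; for a broken one, the messages it returns are usually the first clue worth reading.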
One of the most common Pod failures you'll encounter is ImagePullBackOff. This status means Kubernetes tried to pull the container image from a registry but failed, and it's now backing off (waiting longer between retries) before trying again.
ImagePullBackOff happens for several reasons. The most common is a typo in the image name or tag — if you specify nginx:nonexistent, Kubernetes will try to pull that tag from Docker Hub, fail because it doesn't exist, and enter ImagePullBackOff. Another cause is trying to pull from a private registry without providing credentials. Network issues or registry outages can also cause this error, though they're less common.
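The "back-off" part means the wait between retries grows after each failure. As a rough sketch, assuming the kubelet's defaults of a 10-second initial delay that doubles after every failure and is capped at 300 seconds (five minutes), the retry schedule looks like this. The same doubling-with-a-cap pattern governs container restarts in CrashLoopBackOff.

```python
from typing import List


def backoff_delays(attempts: int, initial: int = 10, cap: int = 300) -> List[int]:
    """Delay in seconds before each retry: exponential doubling with an upper bound."""
    delays = []
    delay = initial
    for _ in range(attempts):
        delays.append(delay)
        delay = min(delay * 2, cap)  # double, but never exceed the cap
    return delays
```

backoff_delays(7) returns [10, 20, 40, 80, 160, 300, 300]: after a handful of failures the delay stops growing, so a stuck Pod keeps retrying roughly every five minutes.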
Let's create a Pod that will fail with ImagePullBackOff so you can see what it looks like. Write a manifest named pod-broken.yaml for a Pod called nginx-broken that uses the image nginx:nonexistent.
The only difference from a working Pod is the image tag: nginx:nonexistent doesn't exist on Docker Hub. Apply this Pod with kubectl apply -f pod-broken.yaml.
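If you generate manifests from code rather than writing YAML by hand, the same broken Pod can be expressed as plain data. This Python sketch builds it as a dict and writes it out as JSON, which kubectl apply -f accepts just like YAML. The container name nginx, the helper names, and the file name pod-broken.json are illustrative assumptions.

```python
import json
from typing import Any, Dict


def broken_pod_manifest(name: str = "nginx-broken",
                        image: str = "nginx:nonexistent") -> Dict[str, Any]:
    """A minimal Pod manifest whose image tag does not exist on Docker Hub."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # The container name "nginx" is a hypothetical choice.
            "containers": [{"name": "nginx", "image": image}],
        },
    }


def write_manifest(manifest: Dict[str, Any], path: str) -> None:
    """Serialize the manifest as JSON so it can be passed to kubectl apply -f."""
    with open(path, "w") as f:
        json.dump(manifest, f, indent=2)
```

Calling write_manifest(broken_pod_manifest(), "pod-broken.json") and then running kubectl apply -f pod-broken.json should produce the same failing Pod.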
Now check its status with kubectl get pods. The STATUS column shows ImagePullBackOff (right after creation you may briefly see ErrImagePull instead), and READY shows 0/1 because the container never started. Notice that even though STATUS says ImagePullBackOff, the Pod's actual phase is Pending: the container never ran, so the Pod is still waiting. The Pod is stuck trying to pull an image that doesn't exist. To understand exactly what went wrong, run kubectl describe pod nginx-broken and read the Warning events at the bottom of the output; their messages explain why the image pull failed.
Another essential troubleshooting tool is kubectl logs, which shows the output (stdout and stderr) from a container. Logs are useful for debugging application-level problems — crashes, configuration errors, or unexpected behavior after the container starts.
However, if you run kubectl logs nginx-broken against our broken Pod, kubectl returns an error instead of log output.
This makes sense: the container never started because the image couldn't be pulled, so there are no logs to show. Logs are only available after a container has started at least once. If a Pod fails during image pull or container creation, kubectl logs won't help — you need to use kubectl describe to see the events instead.
Logs become valuable when a container starts but then crashes or behaves incorrectly. For example, if your application starts but immediately exits with an error, the logs will show the error message. If it starts but serves errors to users, the logs might show stack traces or warnings. Always check logs when a Pod is in CrashLoopBackOff (meaning it starts, crashes, and restarts repeatedly) because the logs will show why the application is crashing.
While a container in CrashLoopBackOff is waiting out its backoff delay, you can add the --previous flag (kubectl logs <pod-name> --previous) to view logs from the most recent terminated instance.
This is helpful when a container crashes so quickly that you can't catch its logs before it restarts. Note that --previous only works when a container has actually started and crashed at least once, because there must be a previously terminated instance to inspect. It would not work on a Pod like nginx-broken, where the container never started at all. Understanding when to use kubectl describe (for infrastructure and scheduling issues) versus kubectl logs (for application issues) is key to efficient troubleshooting.
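The choice between describe, logs, and logs --previous can be read straight off a container's status. Here is a small Python heuristic, a sketch that takes one entry from status.containerStatuses in the Pod's JSON and suggests which command to try first. The function and its preference order are illustrative, not a kubectl feature; state and lastState are real Pod API fields.

```python
from typing import Any, Dict


def suggest_first_command(pod_name: str, container_status: Dict[str, Any]) -> str:
    """Pick a first troubleshooting command based on one container's status."""
    state = container_status.get("state", {})
    last_state = container_status.get("lastState", {})

    if "terminated" in last_state:
        # A previous instance exists, so its logs explain the crash,
        # even while the container is waiting out its backoff delay.
        return f"kubectl logs {pod_name} --previous"
    if "running" in state or "terminated" in state:
        # The container has started, so it has produced logs.
        return f"kubectl logs {pod_name}"
    # Still waiting for its first start (e.g. ImagePullBackOff): no logs yet,
    # so the events from describe are the only source of information.
    return f"kubectl describe pod {pod_name}"
```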
So far, we've focused on Pod creation and failure, but it's also important to understand what happens when you delete a Pod. Kubernetes doesn't just kill Pods instantly — it follows a graceful shutdown process to give applications time to finish their work and clean up.
When you run kubectl delete pod, Kubernetes immediately marks the Pod for deletion and stops sending new traffic to it (if it's part of a Service). Then it sends a TERM signal (SIGTERM) to the main process in each container, which is a polite request to shut down. The application has a grace period (default 30 seconds) to finish handling existing requests, close database connections, save state, and exit cleanly.
If the container doesn't exit within the grace period, Kubernetes sends a KILL signal (SIGKILL), which forcefully terminates the process. This ensures the Pod doesn't hang forever, but it also means the application might not clean up properly if it takes too long to shut down.
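The SIGTERM, wait, SIGKILL sequence is easy to model for a single local process. This Python sketch is not how the kubelet is implemented (the kubelet works through the container runtime), but it follows the same steps: Popen.terminate() sends SIGTERM on POSIX systems, wait(timeout=...) plays the role of the grace period, and Popen.kill() sends SIGKILL. The function name is hypothetical.

```python
import subprocess


def stop_with_grace(proc: subprocess.Popen, grace_seconds: float = 30.0) -> str:
    """Mimic the shutdown sequence for one process: SIGTERM, wait, then SIGKILL."""
    proc.terminate()  # SIGTERM: a polite request to shut down
    try:
        proc.wait(timeout=grace_seconds)  # the grace period
        return "exited after SIGTERM"
    except subprocess.TimeoutExpired:
        proc.kill()   # SIGKILL: cannot be caught or ignored
        proc.wait()   # reap the process
        return "killed with SIGKILL"
```

A process that handles or simply dies on SIGTERM ends inside the grace period; one that ignores SIGTERM is force-killed once the timeout expires.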
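Let's observe this process with our broken Pod. Delete it with kubectl delete pod nginx-broken.
kubectl confirms the request with the message pod "nginx-broken" deleted.
The command returns quickly here. By default, kubectl delete waits until the Pod is actually gone before returning, so to catch the Pod in the Terminating state, run kubectl get pods from a second terminal while the deletion is in progress, or pass --wait=false to kubectl delete and run kubectl get pods immediately afterward.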
Within a few seconds, the Pod disappears completely because it's been removed from the cluster. In this case, termination is fast because the container never started, so there's nothing to shut down gracefully. For a running Pod, you'd see the Terminating status for longer as Kubernetes waits for the application to exit.
Understanding graceful shutdown matters for stateful applications like databases or message queues. If you delete a Pod running a database, you want it to flush writes to disk and close connections cleanly before exiting. Kubernetes gives it time to do that. If your application needs more than 30 seconds, you can configure a longer grace period in the Pod spec using terminationGracePeriodSeconds.
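On the application side, graceful shutdown means catching SIGTERM instead of dying on it. This Python sketch shows one common pattern: the signal handler only sets a flag, and the main loop notices it, runs its cleanup, and returns so the process can exit normally well inside terminationGracePeriodSeconds. The class name, the max_iterations and tick parameters, and the cleanup placeholder are hypothetical; a real service would close connections or flush data in cleanup().

```python
import signal
import time
from typing import Optional


class GracefulShutdown:
    """Turn SIGTERM into a flag so the main loop can finish work and clean up."""

    def __init__(self) -> None:
        self.stop_requested = False
        self.cleaned_up = False
        # Must be called from the main thread.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame) -> None:
        # Don't exit inside the handler; just record that shutdown was requested.
        self.stop_requested = True

    def cleanup(self) -> None:
        # Placeholder: flush writes, close database connections, save state.
        self.cleaned_up = True

    def run(self, max_iterations: Optional[int] = None, tick: float = 0.1) -> None:
        iterations = 0
        while not self.stop_requested:
            if max_iterations is not None and iterations >= max_iterations:
                break
            time.sleep(tick)  # stand-in for handling one unit of work
            iterations += 1
        self.cleanup()
```

In a container you would call GracefulShutdown().run() as the entry point; when the Pod is deleted, the kubelet's SIGTERM ends the loop, cleanup runs, and the process exits before the grace period forces a SIGKILL.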
In this lesson, you moved beyond creating working Pods to understanding their full lifecycle and diagnosing failures. You learned about the five Pod phases (Pending, Running, Succeeded, Failed, Unknown) and what each phase means. You saw how to read Pod status with kubectl get pods, interpreting the READY, STATUS, RESTARTS, and AGE columns to quickly assess Pod health.
A key distinction you learned is that the STATUS column is not the same as the Pod phase. Container-level status reasons like ImagePullBackOff (Pod phase: Pending) and CrashLoopBackOff (Pod phase: Running) give you more actionable information than the phase alone and behave differently than the Failed phase — which only applies when all containers have actually terminated with a non-zero exit code.
You also learned to use kubectl describe pod to see detailed events and error messages, which is essential for understanding why a Pod isn't working. You created a Pod with a broken image to experience ImagePullBackOff firsthand and learned how to interpret the error messages in the Events section. You explored kubectl logs and understood when it's useful (after a container starts) versus when you need kubectl describe (for infrastructure issues like image pull failures). You also learned that the --previous flag is specifically for retrieving logs from a crashed container instance, and requires the container to have started and run at least once. Finally, you observed Pod termination and learned about Kubernetes' graceful shutdown process.
You now have a complete troubleshooting workflow: use kubectl get for a quick overview, kubectl describe to understand what went wrong, and kubectl logs to debug application issues. In the upcoming practice exercises, you'll apply these skills to diagnose and fix various failure scenarios, building confidence in your ability to operate Pods in real-world situations.
