In a dynamic Kubernetes environment, pods can sometimes fail to start correctly. Understanding how to diagnose these startup issues is a fundamental skill for any Kubernetes practitioner. Common failure states like ImagePullBackOff, CrashLoopBackOff, or a pod being perpetually stuck in Pending can bring applications down. This exercise provides a hands-on simulation of these common problems, guiding you through a systematic troubleshooting process using essential kubectl commands.
You are the on-call engineer for a team that has just deployed a new application. Shortly after the deployment, you receive an alert that the application is down. Your mission is to investigate the Kubernetes cluster, identify why the application's pods are not running correctly, and apply the necessary fixes to bring the service back online. You will face three separate challenges, each representing a common real-world problem.
This exercise consists of three challenges. You must diagnose and fix the startup issue for each of the three pods created in the Environment Setup section.
app-challenge-1 is in an ImagePullBackOff state. You must identify the cause and get the pod into a Running state.
app-challenge-2 is in a CrashLoopBackOff state. You must inspect the logs, identify the configuration error, and get the pod into a Running state.
app-challenge-3 is in a Pending state. You must identify the resource scheduling conflict and get the pod into a Running state.
The exercise is complete when all of the following are true:
app-challenge-1, initially suffering from an ImagePullBackOff error, is successfully fixed and reaches a Running state.
app-challenge-2, initially stuck in a CrashLoopBackOff loop, is successfully fixed by providing the required configuration and reaches a Running state.
app-challenge-3, initially stuck in a Pending state due to resource constraints, is successfully scheduled and reaches a Running state after its definition is corrected.
To begin the exercise, you must first create the problematic pods in your cluster. This YAML manifest defines three pods, each with a different common startup failure.
Create a file named broken-pods.yaml with the following content:
# broken-pods.yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-challenge-1
spec:
  containers:
  - name: challenge-1-container
    image: nginxx:1.21.0
    ports:
    - containerPort: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: app-challenge-2
spec:
  containers:
  - name: challenge-2-container
    image: busybox:1.35
    # Double quotes inside the script let the shell expand $MY_CONFIG,
    # so the test fails (and the container exits) while the variable is unset.
    command: ["/bin/sh", "-c", "echo \"Configuration value is: $MY_CONFIG\" && test -n \"$MY_CONFIG\" && sleep 3600"]
---
apiVersion: v1
kind: Pod
metadata:
  name: app-challenge-3
spec:
  containers:
  - name: challenge-3-container
    image: nginx:1.21.0
    ports:
    - containerPort: 80
    resources:
      requests:
        cpu: "1000" # 1000 CPU cores - intentionally excessive to trigger scheduling failure
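The challenge-2 container only keeps running if a shell test on MY_CONFIG passes. The following is a minimal local sketch of that kind of check, runnable without a cluster; the value example-value is made up for illustration. It also shows why quoting matters: inside single quotes the shell does not expand the variable, so the test sees literal text and always passes.

```shell
#!/bin/sh
# Local sketch of a "test -n" startup check like the one in challenge-2-container.
# MY_CONFIG is the variable name from the manifest; example-value is a made-up value.

unset MY_CONFIG

# Variable missing: "$MY_CONFIG" expands to an empty string, so test -n fails.
unset_result=$(sh -c 'test -n "$MY_CONFIG"; echo $?')

# Variable provided: the same test succeeds.
set_result=$(MY_CONFIG=example-value sh -c 'test -n "$MY_CONFIG"; echo $?')

# Single quotes suppress expansion: the literal text $MY_CONFIG is never empty,
# so this form succeeds even though the variable is unset.
literal_result=$(sh -c "test -n '\$MY_CONFIG'; echo \$?")

echo "unset:   exit $unset_result"
echo "set:     exit $set_result"
echo "literal: exit $literal_result"
```

A non-zero exit code from the container's command is what causes Kubernetes to restart it, and repeated restarts are what surface as CrashLoopBackOff.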
Apply the manifest to your cluster to create the pods:
kubectl apply -f broken-pods.yaml
Confirm that the pods have been created and are in their respective failure states:
kubectl get pods
# Expected output (the statuses and restart counts may vary depending on timing):
# NAME              READY   STATUS             RESTARTS   AGE
# app-challenge-1   0/1     ImagePullBackOff   0          15s
# app-challenge-2   0/1     CrashLoopBackOff   2          15s
# app-challenge-3   0/1     Pending            0          15s
Verification Note: It may take 30-60 seconds for the pods to reach their expected failure states. Run kubectl get pods -w to watch the status changes in real time, and press Ctrl+C to exit watch mode once all pods show their respective error states.
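When a namespace holds more pods than these three, it can help to filter the listing down to the ones that need attention. The sketch below is one possible approach, not part of the exercise: list_unhealthy is a hypothetical helper name, and it assumes the default column layout of kubectl get pods (NAME first, STATUS third). The sample text is adapted from the expected output above, so the snippet runs without a cluster.

```shell
#!/bin/sh
# list_unhealthy reads "kubectl get pods" style output on stdin and prints
# "NAME STATUS" for every pod whose STATUS is neither Running nor Completed.
# It skips the header row and assumes STATUS is the third column.
list_unhealthy() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Sample listing adapted from the expected output of this exercise.
sample_output='NAME              READY   STATUS             RESTARTS   AGE
app-challenge-1   0/1     ImagePullBackOff   0          15s
app-challenge-2   0/1     CrashLoopBackOff   2          15s
app-challenge-3   0/1     Pending            0          15s'

unhealthy=$(printf '%s\n' "$sample_output" | list_unhealthy)
printf '%s\n' "$unhealthy"

# Against a live cluster (requires kubectl and cluster access):
#   kubectl get pods | list_unhealthy
```

For the sample listing, all three challenge pods are reported, each with its failure status.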
Your task is to investigate each of the three challenge pods, diagnose the root cause of its failure, and apply a fix to get it into a Running state.
The kubectl describe pod <pod-name> command is your most powerful tool. Pay close attention to the Events section at the bottom of its output.
For a container that starts and then crashes, kubectl logs <pod-name> is essential. If the pod is restarting too quickly, use kubectl logs <pod-name> --previous to view logs from the previous (crashed) container run.
Pod startup failures are among the most common issues you will face in a production Kubernetes environment. The three scenarios covered here (ImagePullBackOff, CrashLoopBackOff, and Pending due to resource constraints) account for many of these real-world incidents. By mastering the kubectl describe and kubectl logs commands, you gain a systematic and effective method for diagnosing problems. This allows you to quickly identify the root cause, whether it's a simple typo, a missing configuration value, or a resource allocation issue, and restore service with confidence. This skill is fundamental to being a reliable and effective Kubernetes operator.