Troubleshooting Containerized Workloads

Practical tactics for hunting down failures in Docker and Kubernetes

Overview

Containers fail in a handful of predictable ways: startup/exit problems, image pulls, resource limits, networking, storage, and security contexts. This guide gives you a field-tested checklist and copy‑paste commands to get from mystery to root cause quickly.

Golden loop: Logs → Describe/Events → Exec → Reproduce locally.

Rapid triage flow

1) Is it running?

$ kubectl get pods -n <ns>
$ kubectl describe pod <pod> -n <ns>
$ kubectl get events -n <ns> --sort-by=.lastTimestamp

Look for CrashLoopBackOff, ImagePullBackOff, or OOMKilled in the pod status and container state, and for mount or permission errors in Events.
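In a busy namespace the failing pods can be hard to spot in a long listing. The sketch below filters the listing down to pods whose STATUS is neither Running nor Completed. The helper name unhealthy_pods and the namespace my-namespace are hypothetical, and the filter assumes the default single-namespace column order (NAME READY STATUS RESTARTS AGE).

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# and prints "<pod> <status>" for each pod that is not Running or Completed.
# Assumes the default column order: NAME READY STATUS RESTARTS AGE.
unhealthy_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage ("my-namespace" is a placeholder):
#   kubectl get pods -n my-namespace --no-headers | unhealthy_pods
```

A pod that is Running but not Ready (for example 0/1) passes this filter, so still check the READY column and the readiness probe separately.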

2) What do logs say?

$ kubectl logs <pod> -n <ns>
# previous container attempt
$ kubectl logs -p <pod> -n <ns>

If the pod has multiple containers, add -c <name>. For sidecars, tail both the app container and the sidecar.

3) Exec inside

$ kubectl exec -it <pod> -n <ns> -- /bin/sh
# check processes, listening ports, and environment
# use ss, or netstat if ss is not installed (minimal images may ship only one)
/ # ps -ef
/ # ss -tuln
/ # env | sort
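Some minimal images ship neither ss nor netstat. As a fallback, listening TCP ports can be read from /proc/net/tcp, where the local address is stored as HEXIP:HEXPORT and state 0A means LISTEN. The following is a rough sketch under those assumptions; the function name listening_ports is hypothetical, it needs awk in the image, and it covers IPv4 only (IPv6 sockets live in /proc/net/tcp6).

```shell
# Hypothetical helper: list listening TCP ports without ss or netstat.
# $1: path to a /proc/net/tcp-style file (defaults to /proc/net/tcp).
listening_ports() {
  file=${1:-/proc/net/tcp}
  # Skip the header row. Field 2 is local_address (HEXIP:HEXPORT),
  # field 4 is the socket state; 0A means LISTEN.
  awk 'NR > 1 && $4 == "0A" { split($2, a, ":"); print a[2] }' "$file" |
    while read -r hex; do
      # Convert the hexadecimal port to decimal with shell arithmetic.
      echo "$((0x$hex))"
    done | sort -n | uniq
}

# Usage inside the container:
#   listening_ports
```

If the port the app should serve on is missing from the output, the process is either not running or bound to a different port than the Service or probe expects.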

4) Reproduce locally

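$ docker run --rm -it <image:tag> /bin/sh
# if the image sets an ENTRYPOINT, override it instead
$ docker run --rm -it --entrypoint /bin/sh <image:tag>
$ docker logs <ctr>
$ docker inspect <ctr-or-image>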
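When a container exits locally, its exit code is often the fastest clue. By shell convention, codes above 128 usually mean the process was killed by signal (code minus 128), so 137 is SIGKILL (signal 9), which is what an out-of-memory kill produces. The helper below is a small sketch of that mapping; the name explain_exit_code and the container name my-ctr are hypothetical, and an application can also choose any of these codes itself.

```shell
# Hypothetical helper: turn a container exit code into a first guess.
explain_exit_code() {
  code=$1
  case "$code" in
    0)   echo "exited cleanly" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found" ;;
    *)
      # Codes above 128 conventionally mean "terminated by signal (code - 128)".
      if [ "$code" -gt 128 ] 2>/dev/null; then
        echo "killed by signal $((code - 128))"
      else
        echo "application error"
      fi ;;
  esac
}

# Usage ("my-ctr" is a placeholder for a stopped container):
#   explain_exit_code "$(docker inspect --format '{{.State.ExitCode}}' my-ctr)"
#   docker inspect --format '{{.State.OOMKilled}}' my-ctr
```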

Common failure categories & fixes

Startup & lifecycle

Image & registry

Resources (CPU, memory, disk)

Networking & DNS

Volumes & filesystem

Security context

Kubernetes playbook

Inspect

$ kubectl get pods -o wide -n <ns>
$ kubectl describe pod <pod> -n <ns>
$ kubectl get rs,deploy,sts,ds -n <ns>

Probe & health

livenessProbe:
  httpGet: { path: /healthz, port: 8080 }
  initialDelaySeconds: 10
  periodSeconds: 10
readinessProbe:
  httpGet: { path: /ready, port: 8080 }
  periodSeconds: 5

Sidecar debugging

# ephemeral toolbox container (K8s 1.23+); -it attaches an interactive shell
$ kubectl debug -it pod/<pod> -n <ns> --image=busybox:1.36 --target=<app-container>

Port-forward to test

$ kubectl port-forward pod/<pod> 8080:8080 -n <ns>
$ curl -i localhost:8080/healthz

Production checklist

FAQ

How do I check if my Service points to any Pods?

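$ kubectl get svc <name> -n <ns> -o yaml | yq '.spec.selector'
# pass the selector's own key=value pairs to -l; app=<value> is only an example
$ kubectl get pods -n <ns> -l app=<value>
# an empty ENDPOINTS column means no ready Pod matches the selector
$ kubectl get endpoints <name> -n <ns>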

How can I test DNS & egress quickly?

$ kubectl run diag -n <ns> --rm -it --image=busybox:1.36 --restart=Never -- sh
/ # nslookup example.com
/ # wget -qO- https://ifconfig.me

How do I spot env/config mistakes?

$ kubectl get cm,secret -n <ns>
# no -it here: a TTY adds carriage returns that garble piped output
$ kubectl exec <pod> -n <ns> -- env | sort
$ kubectl describe pod <pod> -n <ns> | sed -n '/Environment/,+8p'
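A common config mistake is a variable that is missing or set to an empty string. The sketch below checks an env dump for a list of required names. The helper missing_vars and the variable names DB_HOST, DB_PORT, and API_KEY are hypothetical; substitute the names your app actually reads.

```shell
# Hypothetical helper: reads KEY=VALUE lines on stdin and prints every
# required name (given as arguments) that is absent or has an empty value.
missing_vars() {
  env_dump=$(cat)
  for name in "$@"; do
    # "^NAME=." matches only when at least one character follows the "=".
    printf '%s\n' "$env_dump" | grep -q "^${name}=." || echo "$name"
  done
}

# Usage ("my-pod" and "my-namespace" are placeholders):
#   kubectl exec my-pod -n my-namespace -- env | missing_vars DB_HOST DB_PORT API_KEY
```

No output means every listed variable is present and non-empty. Whether the values are correct is a separate check.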