container-debugging

aj-geddes

About

This Claude Skill helps developers debug Docker containers and containerized applications by diagnosing deployment issues, container lifecycle problems, and resource constraints. It provides commands and techniques for troubleshooting containers that won't start, application crashes, network issues, and performance problems. Key capabilities include checking container status, inspecting logs, connecting to running containers, and analyzing resource usage.

Documentation

Container Debugging

Overview

Container debugging focuses on issues within Docker/Kubernetes environments including resource constraints, networking, and application runtime problems.

When to Use

Container won't start
Application crashes in container
Resource limits exceeded
Network connectivity issues
Performance problems in containers

Instructions

1. Docker Debugging Basics

# Check container status
docker ps -a
docker inspect <container-id>
docker stats <container-id>

# View container logs
docker logs <container-id>
docker logs --follow <container-id>  # Real-time
docker logs --tail 100 <container-id>  # Last 100 lines

# Connect to running container
docker exec -it <container-id> /bin/bash
docker exec -it <container-id> sh

# Inspect container details
docker inspect <container-id> | grep -A 5 "State"
docker inspect <container-id> | grep -E "Memory|Cpu"

# Check container processes
docker top <container-id>

# View resource usage (live stream; add --no-stream for a single snapshot)
docker stats <container-id>
# Shows: CPU %, memory usage / limit, network I/O, block I/O, PIDs

# Copy files from container
docker cp <container-id>:/path/to/file /local/path

# View image layers
docker history <image-name>
docker inspect <image-name>
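
The `grep` calls above match against raw JSON text, which can be brittle. A `--format` Go template selects exact fields instead. The following is a minimal sketch: the helper name `container_state` is hypothetical, while `.State.Status`, `.State.ExitCode`, and `.State.OOMKilled` are fields of `docker inspect` output.

```shell
# Hypothetical helper: print the lifecycle fields that matter most when triaging.
# Usage: container_state <container-id>
container_state() {
    # One inspect call; the Go template selects three fields from the State object.
    state=$(docker inspect \
        --format '{{.State.Status}} {{.State.ExitCode}} {{.State.OOMKilled}}' \
        "$1") || return 1
    # Split the space-separated result into positional parameters.
    set -- $state
    echo "status=$1 exit_code=$2 oom_killed=$3"
}
```

Calling `container_state <container-id>` on a stopped container gives a one-line summary that can be pasted into a bug report.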

2. Common Container Issues

Issue: Container Won't Start

Diagnosis:
  1. docker logs <container-id>
  2. Check exit code: docker inspect --format='{{.State.ExitCode}}' <container-id>
  3. Verify image exists: docker images
  4. Check entrypoint: docker inspect --format='{{.Config.Entrypoint}}' <container-id>

Common Exit Codes:
  0: Normal exit
  1: General application error
  127: Command not found
  128+N: Terminated by signal N
  137: Killed by SIGKILL (128+9); commonly the kernel OOM killer, also docker kill or a stop timeout
  139: Segmentation fault (128+11, SIGSEGV)

Solutions:
  - Fix application error
  - Ensure required files exist
  - Check executable permissions
  - Verify working directory
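
The exit-code table above can be turned into a small lookup, sketched here under a few assumptions: the function name `explain_exit_code` is hypothetical, and code 126 (command found but not executable) is added beyond the codes listed in the table. Any other code between 129 and 255 falls through to the 128+N rule.

```shell
# Hypothetical helper: translate a container exit code into a likely cause.
# Usage: explain_exit_code <code>
explain_exit_code() {
    code=$1
    case "$code" in
        0)   echo "normal exit" ;;
        1)   echo "general application error" ;;
        126) echo "command found but not executable" ;;
        127) echo "command not found" ;;
        137) echo "killed by SIGKILL (signal 9): often the OOM killer, also docker kill" ;;
        139) echo "segmentation fault (SIGSEGV, signal 11)" ;;
        *)
            # Codes above 128 encode the terminating signal as 128 + N.
            if [ "$code" -gt 128 ] && [ "$code" -le 255 ]; then
                echo "terminated by signal $((code - 128))"
            else
                echo "application-defined exit code"
            fi
            ;;
    esac
}
```

It pairs naturally with the exit-code check from the diagnosis steps: pass the value printed by `docker inspect --format='{{.State.ExitCode}}' <container-id>` as the argument.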

---

Issue: Out of Memory

Symptoms: Exit code 137 (SIGKILL)

Debug:
  docker stats <container-id>
  # Check Memory usage vs limit

Solution:
  # Check current limit (in bytes; 0 means no limit is set)
  docker inspect --format='{{.HostConfig.Memory}}' <container-id>
  # Increase memory limit
  docker run -m 512m <image>
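
To catch memory pressure before the container is killed, the memory percentage from `docker stats` can be compared against a threshold. This is a sketch only: the helper name `mem_pressure` and the default threshold of 80 are assumptions, and `{{.MemPerc}}` is the `docker stats` format placeholder for memory percentage.

```shell
# Hypothetical helper: warn when memory use reaches a threshold percentage of the limit.
# Usage: mem_pressure <container-id> [threshold-percent]   (threshold defaults to 80)
mem_pressure() {
    threshold=${2:-80}
    # --no-stream takes a single sample; MemPerc prints a value such as "45.32%".
    raw=$(docker stats --no-stream --format '{{.MemPerc}}' "$1") || return 2
    pct=${raw%\%}      # drop the trailing percent sign
    whole=${pct%.*}    # keep the integer part for a POSIX integer comparison
    if [ "$whole" -ge "$threshold" ]; then
        echo "WARN: memory at ${raw} of limit (threshold ${threshold}%)"
        return 1
    fi
    echo "OK: memory at ${raw} of limit"
}
```

When no memory limit is set, the percentage is relative to the host's total memory, so the warning is most useful for containers started with `-m`.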

---

Issue: Port Already in Use

Error: "bind: address already in use"

Debug:
  docker ps  # Check running containers
  netstat -tlnp | grep 8080  # Check port usage (use ss -tlnp where netstat is not installed)

Solution:
  docker run -p 8081:8080 <image>
  # Use different host port
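
Before picking a different port, it can help to see whether another container already holds it. The sketch below uses the `publish` filter of `docker ps`; the helper name `port_owner` is hypothetical.

```shell
# Hypothetical helper: list containers that publish a given port.
# Usage: port_owner <port>
port_owner() {
    # The "publish" filter matches containers with that published port.
    docker ps --filter "publish=$1" --format '{{.ID}} {{.Names}} {{.Ports}}'
}
```

If `port_owner 8080` prints nothing, the port is probably held by a non-Docker process on the host, which the port-usage check in the Debug step would show.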

---

Issue: Network Issues

Symptom: Cannot reach other containers

Debug:
  docker network ls
  docker inspect <container-id> | grep IPAddress
  docker exec <container-id> ping <other-container>

Solution:
  docker network create app-network
  docker run --network app-network --name <name> <image>
  # Containers on the same user-defined network can reach each other by name
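
Containers that are already running do not need to be recreated to join a network; `docker network connect` attaches them in place. A minimal sketch follows, in which the helper name `join_network` and the container names in the usage note are hypothetical.

```shell
# Hypothetical helper: make sure a user-defined network exists, then attach containers to it.
# Usage: join_network <network> <container> [<container>...]
join_network() {
    net=$1
    shift
    # Create the network only if inspect cannot find it.
    docker network inspect "$net" >/dev/null 2>&1 || docker network create "$net" >/dev/null
    for c in "$@"; do
        # Attach an already-running container without recreating it.
        docker network connect "$net" "$c" && echo "connected $c to $net"
    done
}
```

For example, `join_network app-network web db` would let `web` resolve `db` by name, since user-defined bridge networks provide name resolution that the default bridge does not.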

3. Container Optimization

Resource Limits:

Set in docker-compose:
  version: '3'
  services:
    app:
      image: myapp
      environment:
        - NODE_ENV=production
      # In version 3 files, resources must be nested under deploy
      deploy:
        resources:
          limits:
            cpus: '1.0'
            memory: 512M
          reservations:
            cpus: '0.5'
            memory: 256M

Limits: Hard ceiling the container cannot exceed
Reservations: Minimum resources set aside for the container
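
To confirm that limits took effect on a running container, `docker inspect` exposes `.HostConfig.Memory` (bytes) and `.HostConfig.NanoCpus` (billionths of a CPU). The sketch below converts those raw numbers into readable units. The helper name `format_limits` is hypothetical, and it assumes the limits were applied the same way `--memory` and `--cpus` would apply them.

```shell
# Hypothetical helper: render raw HostConfig values in readable units.
# Usage: format_limits <memory-bytes> <nano-cpus>
format_limits() {
    if [ "$1" -eq 0 ]; then
        mem="unlimited"          # Docker reports 0 when no memory limit is set
    else
        mem="$(( $1 / 1048576 ))MiB"
    fi
    if [ "$2" -eq 0 ]; then
        cpus="unlimited"         # 0 also means no CPU limit
    else
        hundredths=$(( $2 / 10000000 ))   # 1 CPU = 1000000000 nano-CPUs
        cpus=$(printf '%d.%02d' $(( hundredths / 100 )) $(( hundredths % 100 )))
    fi
    echo "memory=$mem cpus=$cpus"
}

# Example (requires a running container):
# format_limits $(docker inspect --format '{{.HostConfig.Memory}} {{.HostConfig.NanoCpus}}' <container-id>)
```

With the compose values above, a correctly limited container would be expected to report 536870912 bytes and 1000000000 nano-CPUs.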

---

Multi-Stage Builds:

FROM node:16 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build

FROM node:16-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY package*.json ./
RUN npm install --production
EXPOSE 3000
CMD ["node", "dist/index.js"]

Result: a much smaller final image (for example, roughly 1GB → 200MB; actual savings depend on the application)
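
The size saving is easy to measure on your own build, since `docker image inspect` reports `.Size` in bytes. A small sketch follows; the helper name `size_reduction` and the image tags in the commented example are placeholders.

```shell
# Hypothetical helper: percentage saved going from one image size to another (sizes in bytes).
# Usage: size_reduction <before-bytes> <after-bytes>
size_reduction() {
    echo "$(( ($1 - $2) * 100 / $1 ))%"
}

# Example (image tags are placeholders):
# before=$(docker image inspect --format '{{.Size}}' myapp:single-stage)
# after=$(docker image inspect --format '{{.Size}}' myapp:multi-stage)
# size_reduction "$before" "$after"
```

The result uses integer arithmetic, so it rounds down to a whole percent.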

4. Debugging Checklist

Container Issues:

[ ] Container starts without error
[ ] Ports mapped correctly
[ ] Logs show no errors
[ ] Environment variables set
[ ] Volumes mounted correctly
[ ] Network connectivity works
[ ] Resource limits appropriate
[ ] Permissions correct
[ ] Dependencies installed
[ ] Entrypoint working

Kubernetes Issues:

[ ] Pod running (not Pending/CrashLoopBackOff)
[ ] All containers started
[ ] Readiness probes passing
[ ] Liveness probes passing
[ ] Resource requests/limits set
[ ] Network policies allow traffic
[ ] Secrets/ConfigMaps available
[ ] Logs show no errors

Tools:

docker:
  - logs
  - stats
  - inspect
  - exec

docker-compose:
  - logs
  - ps
  - config

kubectl (Kubernetes):
  - logs
  - describe pod
  - get events
  - port-forward
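
The kubectl subcommands listed above can be chained into a first-look routine for a single pod. This is a sketch under assumptions: the helper name `pod_triage` is hypothetical, and the namespace defaults to `default` when none is given.

```shell
# Hypothetical helper: gather the usual first-look data for one pod.
# Usage: pod_triage <pod-name> [namespace]   (namespace defaults to "default")
pod_triage() {
    pod=$1
    ns=${2:-default}
    # Phase and restart counts at a glance.
    kubectl get pod "$pod" -n "$ns" -o wide
    # Probe failures, scheduling problems, and OOMKilled states show up here.
    kubectl describe pod "$pod" -n "$ns"
    # Logs from the previous container instance, useful in CrashLoopBackOff.
    kubectl logs "$pod" -n "$ns" --previous --tail=100
    # Recent events involving this pod, oldest first.
    kubectl get events -n "$ns" --field-selector "involvedObject.name=$pod" \
        --sort-by=.lastTimestamp
}
```

The `--previous` log call reports an error when the container has never restarted; the remaining commands still run.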

Key Points

Check logs first: docker logs <container>
Understand exit codes (137=SIGKILL, often OOM; 127=command not found)
Use resource limits appropriately
Put containers that need to communicate on the same network
Multi-stage builds reduce image size
Monitor resource usage with stats
Port mappings: host:container
Exec into running containers for debugging
Update base images regularly
Include health checks in containers

Quick Install

/plugin add https://github.com/aj-geddes/useful-ai-prompts/tree/main/container-debugging

Copy and paste this command in Claude Code to install this skill

GitHub Repository

aj-geddes/useful-ai-prompts

Path: skills/container-debugging
