
container-debugging

aj-geddes

About

This Claude Skill helps developers debug Docker containers and containerized applications by diagnosing deployment issues, container lifecycle problems, and resource constraints. It provides commands and techniques for troubleshooting containers that won't start, application crashes, network issues, and performance problems. Key capabilities include checking container status, inspecting logs, connecting to running containers, and analyzing resource usage.

Documentation

Container Debugging

Overview

Container debugging covers problems in Docker and Kubernetes environments, including resource constraints, networking faults, and application runtime errors.

When to Use

  • Container won't start
  • Application crashes in container
  • Resource limits exceeded
  • Network connectivity issues
  • Performance problems in containers

Instructions

1. Docker Debugging Basics

# Check container status
docker ps -a
docker inspect <container-id>
docker stats <container-id>

# View container logs
docker logs <container-id>
docker logs --follow <container-id>  # Real-time
docker logs --tail 100 <container-id>  # Last 100 lines

# Connect to running container
docker exec -it <container-id> /bin/bash
docker exec -it <container-id> sh  # Fallback when bash is not installed (e.g. Alpine images)

# Inspect container details
docker inspect <container-id> | grep -A 5 "State"
docker inspect <container-id> | grep -E "Memory|Cpu"

# Check container processes
docker top <container-id>

# View resource usage (live stream; add --no-stream for a single snapshot)
docker stats <container-id>
# Shows: CPU %, memory usage / limit, network I/O, block I/O, PIDs

# Copy files from container
docker cp <container-id>:/path/to/file /local/path

# View image layers
docker history <image-name>
docker inspect <image-name>
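
The commands above can be chained into a first-pass triage helper. The following is a minimal sketch, not part of the original skill: the function name `triage_container` and the 20-line log tail are illustrative assumptions. It only defines a function, so nothing runs until it is called.

```shell
#!/bin/sh
# triage_container (hypothetical helper): print a container's state,
# exit code, OOM flag, and its most recent log lines in one pass.
triage_container() {
    cid="$1"
    if [ -z "$cid" ]; then
        echo "usage: triage_container <container-id>" >&2
        return 2
    fi
    echo "== State =="
    # .State.Status, .State.ExitCode and .State.OOMKilled are fields of `docker inspect` output
    docker inspect --format 'status={{.State.Status}} exit={{.State.ExitCode}} oom={{.State.OOMKilled}}' "$cid" || return 1
    echo "== Last 20 log lines =="
    # Container stderr is merged into stdout so both streams appear in order
    docker logs --tail 20 "$cid" 2>&1
}
```

Usage: `triage_container <container-id>`. A non-zero return from the inspect step usually means the ID or name does not exist.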

2. Common Container Issues

Issue: Container Won't Start

Diagnosis:
  1. docker logs <container-id>
  2. Check exit code: docker inspect --format='{{.State.ExitCode}}' <container-id>
  3. Verify image exists: docker images
  4. Check entrypoint: docker inspect --format='{{.Config.Entrypoint}}' <container-id>

Common Exit Codes:
  0: Normal exit
  1: General application error
  127: Command not found
  128+N: Terminated by signal N
  137: Killed by SIGKILL (128+9); often the OOM killer, but also docker kill or a stop timeout
  139: Segmentation fault (SIGSEGV, 128+11)
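
The exit-code table can be expressed as a small lookup. This is a sketch under stated assumptions: `explain_exit_code` is a hypothetical name, and code 126 (command found but not executable) is a common shell convention that is not in the list above.

```shell
#!/bin/sh
# explain_exit_code (hypothetical helper): map a container exit code
# to the meaning given in the table, decoding 128+N as signal N.
explain_exit_code() {
    code="$1"
    case "$code" in
        0)   echo "normal exit" ;;
        1)   echo "general application error" ;;
        126) echo "command found but not executable" ;;  # assumption: shell convention, not in the table
        127) echo "command not found" ;;
        *)
            # Codes 129..255 follow the 128+N convention for signal N
            if [ "$code" -gt 128 ] 2>/dev/null && [ "$code" -le 255 ]; then
                echo "terminated by signal $(( code - 128 ))"
            else
                echo "application-defined exit code"
            fi
            ;;
    esac
}
```

For example, `explain_exit_code 137` reports signal 9 (SIGKILL), which is why 137 alone does not prove an out-of-memory kill.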

Solutions:
  - Fix application error
  - Ensure required files exist
  - Check executable permissions
  - Verify working directory

---

Issue: Out of Memory

Symptoms: Exit code 137 (SIGKILL) and OOMKilled=true in docker inspect

Debug:
  docker stats <container-id>
  # Check Memory usage vs limit

Solution:
  docker inspect --format='{{.HostConfig.Memory}}' <container-id>
  # Check current limit in bytes (0 = no limit)
  docker run -m 512m <image>
  # Set a higher memory limit (512m is an example value)
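
Because `docker inspect` reports the memory limit in bytes while `docker run -m` takes values such as `512m`, a small converter makes the two comparable. A minimal sketch: `to_bytes` and `memory_report` are hypothetical names, and the converter assumes an integer followed by an optional single-letter suffix (b, k, m, g) in binary units.

```shell
#!/bin/sh
# to_bytes (hypothetical helper): convert a Docker-style size such as
# 512m or 1g to bytes, using binary units (1m = 1024 * 1024 bytes).
to_bytes() {
    size="$1"
    num="${size%[bBkKmMgG]}"   # strip one trailing unit letter, if present
    case "$size" in
        *[kK]) echo $(( num * 1024 )) ;;
        *[mM]) echo $(( num * 1024 * 1024 )) ;;
        *[gG]) echo $(( num * 1024 * 1024 * 1024 )) ;;
        *)     echo "$num" ;;
    esac
}

# memory_report (hypothetical helper): show the configured limit in bytes
# and whether the container was stopped by the OOM killer.
memory_report() {
    docker inspect --format 'limit_bytes={{.HostConfig.Memory}} oom_killed={{.State.OOMKilled}}' "$1"
}
```

Comparing `to_bytes 512m` with the `limit_bytes` value from `memory_report <container-id>` confirms whether the intended limit was actually applied.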

---

Issue: Port Already in Use

Error: "bind: address already in use"

Debug:
  docker ps  # Check running containers
  netstat -tlnp | grep 8080  # Check port usage (or: ss -tlnp | grep 8080)

Solution:
  docker run -p 8081:8080 <image>
  # Use different host port
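
When the conflicting port is held by another container rather than a host process, `docker ps` can report the owner directly through its `publish` filter. A minimal sketch, with `port_owner` as a hypothetical helper name:

```shell
#!/bin/sh
# port_owner (hypothetical helper): report which running container,
# if any, publishes the given host port.
port_owner() {
    port="$1"
    # --filter publish=<port> limits the listing to containers publishing that port
    owner="$(docker ps --filter "publish=$port" --format '{{.Names}}')"
    if [ -n "$owner" ]; then
        echo "host port $port is published by container: $owner"
    else
        echo "no running container publishes host port $port; check host processes"
    fi
}
```

Usage: `port_owner 8080`. If no container is reported, fall back to the netstat check above to find the host process.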

---

Issue: Network Issues

Symptom: Cannot reach other containers

Debug:
  docker network ls
  docker inspect <container-id> | grep IPAddress
  docker exec <container-id> ping -c 3 <other-container>  # Requires ping in the image

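Solution:
  docker network create app-network
  docker run --network app-network --name <name> <image>
  # Containers on the same user-defined network can reach each other by name;
  # the default bridge network does not provide name resolution
  docker network connect app-network <container-id>  # Attach an already-running container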
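
Before creating a new network, it can help to confirm whether two containers already share one. The sketch below is illustrative, not from the original skill: `container_networks` and `shared_networks` are hypothetical names, and the template walks the `.NetworkSettings.Networks` map from `docker inspect`.

```shell
#!/bin/sh
# container_networks (hypothetical helper): list the names of the
# networks a container is attached to, separated by spaces.
container_networks() {
    docker inspect --format '{{range $name, $conf := .NetworkSettings.Networks}}{{$name}} {{end}}' "$1"
}

# shared_networks (hypothetical helper): print every network that both
# containers are attached to; prints nothing when they share none.
shared_networks() {
    nets_a="$(container_networks "$1")"
    nets_b="$(container_networks "$2")"
    for net in $nets_a; do
        for other in $nets_b; do
            if [ "$net" = "$other" ]; then
                echo "$net"
            fi
        done
    done
    return 0
}
```

Empty output from `shared_networks <a> <b>` means the containers cannot reach each other directly. If the only shared network is `bridge`, they can connect by IP address but not by container name.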

3. Container Optimization

Resource Limits:

Set in docker-compose:
  version: '3'
  services:
    app:
      image: myapp
      environment:
        - NODE_ENV=production
      deploy:
        resources:
          limits:
            cpus: '1.0'
            memory: 512M
          reservations:
            cpus: '0.5'
            memory: 256M

Limits: Hard cap on the resources the container may use
Reservations: Minimum resources set aside for the container

---

Multi-Stage Builds:

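# Stage 1: install all dependencies and build the application
FROM node:16 AS builder
WORKDIR /app
# Copy manifests first so the install layer is cached until they change
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build

# Stage 2: smaller runtime image with only the build output and production dependencies
FROM node:16-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY package*.json ./
RUN npm install --production
EXPOSE 3000
CMD ["node", "dist/index.js"]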

Illustrative result: roughly 1GB → 200MB image size (actual savings depend on the application and base image)
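
The size reduction is worth measuring rather than assuming. A minimal sketch: `image_size_mb` and `compare_images` are hypothetical helpers, the image tags in the usage note are placeholders, and sizes are reported in decimal megabytes (bytes divided by 1,000,000).

```shell
#!/bin/sh
# image_size_mb (hypothetical helper): print an image's size in decimal MB,
# read from the Size field of `docker image inspect`.
image_size_mb() {
    bytes="$(docker image inspect --format '{{.Size}}' "$1")" || return 1
    echo $(( bytes / 1000000 ))
}

# compare_images (hypothetical helper): print the sizes of two images side by side.
compare_images() {
    before="$(image_size_mb "$1")" || return 1
    after="$(image_size_mb "$2")" || return 1
    echo "$1: ${before}MB -> $2: ${after}MB"
}
```

Usage: build the single-stage and multi-stage variants under two tags (for example `myapp:single` and `myapp:multi`, both placeholder names), then run `compare_images myapp:single myapp:multi`.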

4. Debugging Checklist

Container Issues:

[ ] Container starts without error
[ ] Ports mapped correctly
[ ] Logs show no errors
[ ] Environment variables set
[ ] Volumes mounted correctly
[ ] Network connectivity works
[ ] Resource limits appropriate
[ ] Permissions correct
[ ] Dependencies installed
[ ] Entrypoint working

Kubernetes Issues:

[ ] Pod running (not Pending/CrashLoopBackOff)
[ ] All containers started
[ ] Readiness probes passing
[ ] Liveness probes passing
[ ] Resource requests/limits set
[ ] Network policies allow traffic
[ ] Secrets/ConfigMaps available
[ ] Logs show no errors

Tools:

docker:
  - logs
  - stats
  - inspect
  - exec

docker-compose:
  - logs
  - ps
  - config

kubectl (Kubernetes):
  - logs
  - describe pod
  - get events
  - port-forward
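
For the Kubernetes checklist, the kubectl subcommands listed above can be run in sequence against a single pod. This is a sketch under stated assumptions: `triage_pod` is a hypothetical helper, the namespace defaults to `default`, and the 50-line log tail is an arbitrary choice.

```shell
#!/bin/sh
# triage_pod (hypothetical helper): collect status, description,
# previous-instance logs, and events for one pod.
triage_pod() {
    pod="$1"
    ns="${2:-default}"
    echo "== Pod status =="
    kubectl get pod "$pod" -n "$ns" -o wide
    echo "== Describe (conditions, probe failures, last state) =="
    kubectl describe pod "$pod" -n "$ns"
    echo "== Logs from the previous container instance =="
    # --previous fails when the container has never restarted, so that case is tolerated
    kubectl logs "$pod" -n "$ns" --previous --tail=50 || echo "(no previous instance)"
    echo "== Events for this pod =="
    kubectl get events -n "$ns" --field-selector "involvedObject.name=$pod" --sort-by=.lastTimestamp
}
```

Usage: `triage_pod <pod-name> <namespace>`. The previous-instance logs are usually the most useful part for a pod stuck in CrashLoopBackOff, since the current instance may not have written anything yet.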

Key Points

  • Check logs first: docker logs <container>
  • Understand exit codes (137=SIGKILL, often OOM; 127=command not found)
  • Use resource limits appropriately
  • Put containers that must communicate on the same user-defined network
  • Multi-stage builds reduce image size
  • Monitor resource usage with stats
  • Port mappings: host:container
  • Exec into running containers for debugging
  • Update base images regularly
  • Include health checks in containers

Quick Install

/plugin add https://github.com/aj-geddes/useful-ai-prompts/tree/main/container-debugging

Copy and paste this command in Claude Code to install this skill

GitHub Repository

aj-geddes/useful-ai-prompts
Path: skills/container-debugging
