verify-inference-example
About
This skill verifies CVlization inference examples by checking their structure, ensuring successful builds, and validating correct inference execution. It's primarily used for validating new implementations or debugging existing inference pipeline issues. The skill includes GPU environment awareness, advising users to check resource availability before running intensive inference tasks.
Quick Install
Claude Code
Copy and paste one of these commands in Claude Code to install this skill:
/plugin add https://github.com/kungfuai/CVlization   (recommended)
git clone https://github.com/kungfuai/CVlization.git ~/.claude/skills/verify-inference-example
Documentation
Verify Inference Example
Systematically verify that a CVlization inference example is complete, properly structured, and functional.
When to Use
- Validating a new or modified inference example
- Debugging inference pipeline issues
- Ensuring example completeness before commits
- Verifying example works after CVlization updates
Important Context
Shared GPU Environment: This machine may be used by multiple users simultaneously. Before running GPU-intensive inference:
- Check GPU memory availability with nvidia-smi
- Wait for sufficient VRAM and low GPU utilization if needed
- Consider stopping other processes if you have permission
- If CUDA OOM errors occur, wait and retry when GPU is less busy
Verification Checklist
1. Structure Verification
Check that the example directory contains all required files:
# Navigate to example directory
cd examples/<capability>/<task>/<framework>/
# Expected structure:
# .
# ├── example.yaml # Required: CVL metadata
# ├── Dockerfile # Required: Container definition
# ├── build.sh # Required: Build script
# ├── predict.sh # Required: Inference script
# ├── predict.py # Required: Inference code
# ├── examples/ # Required: Sample inputs
# ├── outputs/ # Created at runtime
# └── README.md # Recommended: Documentation
Key files to check:
- example.yaml - Must have: name, capability, stability, presets (build, predict/inference)
- Dockerfile - Should copy necessary files and install dependencies
- build.sh - Must set SCRIPT_DIR and call docker build
- predict.sh - Must mount volumes correctly and call predict.py
- predict.py - Main inference script
- examples/ - Directory with sample input files
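A minimal sketch for automating this structure check, run from the example directory (file names follow the expected layout above; adjust if an example uses different script names):
# Check required files and directories
for f in example.yaml Dockerfile build.sh predict.sh predict.py; do
  if [ -f "$f" ]; then echo "OK: $f"; else echo "MISSING: $f"; fi
done
[ -d examples ] && echo "OK: examples/" || echo "MISSING: examples/"
# Sanity-check the metadata keys mentioned above (keys may be nested differently in some examples)
grep -E "^(name|capability|stability|presets):" example.yaml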
2. Build Verification
# Option 1: Build using script directly
./build.sh
# Option 2: Build using CVL CLI (recommended)
cvl run <example-name> build
# Verify image was created
docker images | grep <example-name>
# Expected: Image appears with recent timestamp
What to check:
- Build completes without errors (both methods)
- All dependencies install successfully
- Image size is reasonable
- cvl info <example-name> shows correct metadata
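One way to spot an unreasonably large image is to print its tag, size, and creation time after the build (a sketch, not a CVL requirement; replace <example-name> with the actual image name):
# Inspect the freshly built image
docker images --format "{{.Repository}}:{{.Tag}}  {{.Size}}  {{.CreatedSince}}" | grep <example-name>
# Confirm the CVL metadata matches what was built
cvl info <example-name>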
3. Inference Verification
Run inference with sample inputs:
# Option 1: Run inference using script directly
./predict.sh
# Option 2: Run inference using CVL CLI (recommended)
cvl run <example-name> predict
# With custom inputs (if supported)
./predict.sh path/to/custom/input.jpg
Immediate checks:
- Container starts without errors
- Model loads successfully (check GPU memory with nvidia-smi if using GPU)
- Inference completes (outputs generated)
- Output files created in outputs/ or similar directory
- Results look reasonable (open output files to inspect)
4. Output Verification
Check that inference produces valid outputs:
# Check outputs directory
ls -la outputs/
# Expected: Output files with recent timestamps
# Inspect output content
cat outputs/output.md # For text outputs
# or
python -m json.tool outputs/output.json # For JSON outputs
What to verify:
- Output files are created
- Output format is correct (markdown, JSON, etc.)
- Output contains expected content structure
- Output is non-empty and valid
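A small sketch that automates these checks for a JSON output (the path outputs/output.json is an assumption; substitute the actual output file):
# Fail fast if the output file is missing or empty
test -s outputs/output.json || echo "Output missing or empty"
# Validate that the JSON parses
python -m json.tool outputs/output.json > /dev/null && echo "Valid JSON" || echo "Malformed JSON"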
5. Model Caching Verification
Verify that pretrained models are cached properly:
# Check HuggingFace cache
ls -la ~/.cache/huggingface/hub/
# Expected: Model files downloaded once and reused
# Run inference twice and verify no re-download
./predict.sh 2>&1 | tee first_run.log
# Second run should reuse cached models
./predict.sh 2>&1 | tee second_run.log
# Verify no download messages in second run
grep -i "downloading" second_run.log
# Expected: No new downloads (models already cached)
What to verify:
- Models download to ~/.cache/huggingface/ (or framework-specific cache)
- Second run reuses cached models without re-downloading
- Check predict.py doesn't set custom cache directories that break caching
6. Runtime Checks
GPU VRAM Usage Monitoring (REQUIRED for GPU models):
Monitor GPU VRAM usage before, during, and after inference:
# In another terminal, watch GPU memory in real-time
watch -n 1 nvidia-smi
# Or get detailed memory breakdown
nvidia-smi --query-gpu=index,name,memory.used,memory.total,memory.free,utilization.gpu --format=csv,noheader,nounits
# Record peak VRAM usage during inference
nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits | awk '{print $1 " MB"}'
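The command above only captures a single reading. To record the true peak, one option is to poll nvidia-smi in the background while inference runs (a sketch; the 1-second interval and log file name are arbitrary):
# Sample VRAM usage every second into a log while inference runs
nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits -l 1 > vram_samples.log &
SAMPLER_PID=$!
./predict.sh
kill $SAMPLER_PID
# Report the peak usage in GB
sort -n vram_samples.log | tail -1 | awk '{printf "Peak VRAM: %.1f GB\n", $1/1024}'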
Expected metrics:
- Model loading: VRAM usage increases as model loads into memory
- Inference peak: VRAM spikes during forward pass
- Cleanup: Memory released after inference completes (for short-running containers)
- Temperature: Stable (<85°C)
What to record for verification metadata:
- Peak VRAM usage in GB (e.g., "8.2GB VRAM" or "12.5GB VRAM")
- Percentage of total VRAM (e.g., "52%" for 12.5GB on 24GB GPU)
- Whether 4-bit/8-bit quantization was used (affects VRAM requirements)
Troubleshooting:
- CUDA OOM: Use smaller model variant, enable quantization (4-bit/8-bit), or run on CPU
- High VRAM idle usage: Check if other processes are using GPU
- Memory not released: Container may still be running (check with docker ps)
Docker Container Health:
# Check container runs and exits cleanly
docker ps -a | head
# Verify mounts (for running container)
docker inspect <container-id> | grep -A 10 Mounts
# Should see: workspace, cvlization_repo, huggingface cache
7. Quick Validation Test
For fast verification during development:
# Run with smallest sample input
./predict.sh examples/small_sample.jpg
# Expected runtime: seconds to few minutes
# Verify: Completes without errors, output generated
8. Update Verification Metadata
After successful verification, update the example.yaml with verification metadata:
First, check GPU info:
# Get GPU model and VRAM
nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
Format:
verification:
  last_verified: 2025-10-25
  last_verification_note: "Verified build, inference, model caching, and outputs on [GPU_MODEL] ([VRAM]GB VRAM)"
What to include in the note:
- What was verified: build, inference, outputs
- Key aspects: model caching, GPU/CPU inference
- GPU info: Dynamically determine GPU model and VRAM using nvidia-smi (e.g., "A10 GPU (24GB VRAM)", "RTX 4090 (24GB)")
- If no GPU: Use "CPU-only"
- VRAM usage: Peak VRAM used during inference (e.g., "Uses 8.2GB VRAM (34%) with 4-bit quantization")
- Get with: nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits
- Convert to GB and calculate the percentage of total VRAM
- Note if quantization (4-bit/8-bit) was used
- Any limitations: e.g., "Requires 8GB VRAM", "GPU memory constraints"
- Quick notes: e.g., "First run downloads 470MB models"
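A sketch for assembling the GPU portion of the note from live nvidia-smi output (the peak value would come from the VRAM sampling step above; the variable names and example value are illustrative):
# GPU model and total VRAM
GPU_NAME=$(nvidia-smi --query-gpu=name --format=csv,noheader)
TOTAL_MB=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits)
PEAK_MB=8400   # illustrative: take this from the VRAM sampling log
# Print a note fragment like "NVIDIA A10 (24GB VRAM), peak 8.2GB (34%)"
awk -v n="$GPU_NAME" -v t="$TOTAL_MB" -v p="$PEAK_MB" \
  'BEGIN { printf "%s (%.0fGB VRAM), peak %.1fGB (%d%%)\n", n, t/1024, p/1024, 100*p/t }'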
Example complete entry:
name: pose-estimation-dwpose
docker: dwpose
capability: perception/pose_estimation
# ... other fields ...
verification:
  last_verified: 2025-10-25
  last_verification_note: "Verified build, inference with video/image inputs, model caching (470MB models), and JSON outputs on [detected GPU]."
When to update:
- After completing full verification checklist (steps 1-7)
- Only if ALL success criteria pass
- When re-verifying after CVlization updates or fixes
Common Issues and Fixes
Build Failures
# Issue: Dockerfile can't find files
# Fix: Check COPY paths are relative to Dockerfile location
# Issue: Dependency conflicts
# Fix: Check requirements.txt versions, update base image
# Issue: Large build context
# Fix: Add .dockerignore file
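A minimal .dockerignore sketch to shrink the build context (these entries are typical suggestions, not a CVlization requirement; place the file next to the Dockerfile):
# .dockerignore
outputs/
__pycache__/
*.log
.git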
Inference Failures
# Issue: CUDA out of memory
# Fix: Use smaller model variant or CPU inference
# Issue: Model not found
# Fix: Check model name/path in predict.py, ensure internet connection
# Issue: Input file not found
# Fix: Check file paths, ensure examples/ directory exists
# Issue: Permission denied on outputs
# Fix: Ensure output directories exist and are writable
Output Issues
# Issue: Empty outputs
# Fix: Check model loaded correctly, verify input format
# Issue: Malformed JSON output
# Fix: Check output parsing logic in predict.py
# Issue: Outputs not saved
# Fix: Verify output directory path, check file write permissions
Example Commands
Document AI - Granite Docling
cd examples/perception/doc_ai/granite_docling
./build.sh
./predict.sh
# Check: outputs/output.md contains extracted document structure
Vision-Language - Moondream
cd examples/perception/vision_language/moondream2
./build.sh
./predict.sh examples/demo.jpg
# Check: outputs/ contains image description
CVL Integration
Inference examples integrate with CVL command system:
# List all available examples
cvl list
# Get example info
cvl info granite-docling
# Run example directly (uses example.yaml presets)
cvl run granite-docling build
cvl run granite-docling predict
Success Criteria
An inference example passes verification when:
- ✅ Structure: All required files present, example.yaml valid
- ✅ Build: Docker image builds without errors (both ./build.sh and cvl run <name> build)
- ✅ Inference: Runs successfully on sample inputs (both ./predict.sh and cvl run <name> predict)
- ✅ Outputs: Valid output files generated in expected format
- ✅ Model Caching: Models cached to ~/.cache/ (typically ~/.cache/huggingface/), avoiding repeated downloads
- ✅ CVL CLI: cvl info <name> shows correct metadata, build and predict presets work
- ✅ Documentation: README explains how to use the example
- ✅ Verification Metadata: example.yaml updated with verification field containing last_verified date and last_verification_note
Related Files
Check these files for debugging:
- predict.py - Core inference logic
- predict.sh - Docker run script
- Dockerfile - Environment setup
- example.yaml - CVL metadata and presets
- examples/ - Sample input files
- README.md - Usage instructions
Tips
- Use small sample inputs for fast validation
- Monitor GPU memory with nvidia-smi if using GPU
- Check docker logs <container> if inference hangs
- For HuggingFace models, set HF_TOKEN environment variable if needed
- Most examples support custom input paths as arguments to predict.sh
- Check example.yaml for supported parameters and environment variables
- For diffusion/flow matching models: Reduce sampling steps for faster validation (e.g., --num_steps 5 or -i num_steps=5 for Cog). Most models support step parameters:
  - Common parameter names: num_steps, num_inference_steps, steps
  - Typical defaults: 20-50 steps
  - Fast validation: 5-10 steps (lower quality but completes quickly)
  - Production: Full step count for best quality
  - Examples: Stable Diffusion, SVD, FLUX, AnimateDiff, Flow Matching models
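Hypothetical invocations with reduced steps (flag names vary per example; check example.yaml or predict.py for the actual parameter):
# Script-based example
./predict.sh --num_steps 5
# Cog-based example
cog predict -i num_steps=5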
Related Skills
langchain
LangChain is a framework for building LLM applications using agents, chains, and RAG pipelines. It supports multiple LLM providers, offers 500+ integrations, and includes features like tool calling and memory management. Use it for rapid prototyping and deploying production systems like chatbots, autonomous agents, and question-answering services.
Algorithmic Art Generation
This skill helps developers create algorithmic art using p5.js, focusing on generative art, computational aesthetics, and interactive visualizations. It automatically activates for topics like "generative art" or "p5.js visualization" and guides you through creating unique algorithms with features like seeded randomness, flow fields, and particle systems. Use it when you need to build reproducible, code-driven artistic patterns.
webapp-testing
This Claude Skill provides a Playwright-based toolkit for testing local web applications through Python scripts. It enables frontend verification, UI debugging, screenshot capture, and log viewing while managing server lifecycles. Use it for browser automation tasks but run scripts directly rather than reading their source code to avoid context pollution.
requesting-code-review
This skill dispatches a code-reviewer subagent to analyze code changes against requirements before proceeding. It should be used after completing tasks, implementing major features, or before merging to main. The review helps catch issues early by comparing the current implementation with the original plan.
