verify-inference-example
About
This skill verifies CVlization inference examples by checking their structure, ensuring successful builds, and validating correct inference execution. It's primarily used for validating new implementations or debugging existing inference pipeline issues. The skill includes GPU environment awareness, advising users to check resource availability before running intensive inference tasks.
Quick Install
Claude Code
Copy and paste one of these commands in Claude Code to install this skill:
/plugin add https://github.com/kungfuai/CVlization   (recommended)
git clone https://github.com/kungfuai/CVlization.git ~/.claude/skills/verify-inference-example
Documentation
Verify Inference Example
Systematically verify that a CVlization inference example is complete, properly structured, and functional.
When to Use
- Validating a new or modified inference example
- Debugging inference pipeline issues
- Ensuring example completeness before commits
- Verifying example works after CVlization updates
Important Context
Shared GPU Environment: This machine may be used by multiple users simultaneously. Before running GPU-intensive inference:
- Check GPU memory availability with nvidia-smi
- Wait for sufficient VRAM and low GPU utilization if needed
- Consider stopping other processes if you have permission
- If CUDA OOM errors occur, wait and retry when GPU is less busy
Verification Checklist
1. Structure Verification
Check that the example directory contains all required files:
# Navigate to example directory
cd examples/<capability>/<task>/<framework>/
# Expected structure:
# .
# ├── example.yaml # Required: CVL metadata
# ├── Dockerfile # Required: Container definition
# ├── build.sh # Required: Build script
# ├── predict.sh # Required: Inference script
# ├── predict.py # Required: Inference code
# ├── examples/ # Required: Sample inputs
# ├── outputs/ # Created at runtime
# └── README.md # Recommended: Documentation
Key files to check:
- example.yaml - Must have: name, capability, stability, presets (build, predict/inference)
- Dockerfile - Should copy necessary files and install dependencies
- build.sh - Must set SCRIPT_DIR and call docker build
- predict.sh - Must mount volumes correctly and call predict.py
- predict.py - Main inference script
- examples/ - Directory with sample input files
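A minimal sketch for automating this structure check, run from the example directory (file names follow the expected layout above; adjust if an example uses different script names):
# Check required files and directories
for f in example.yaml Dockerfile build.sh predict.sh predict.py; do
  if [ -f "$f" ]; then echo "OK: $f"; else echo "MISSING: $f"; fi
done
[ -d examples ] && echo "OK: examples/" || echo "MISSING: examples/"
# Sanity-check the metadata keys mentioned above (keys may be nested differently in some examples)
grep -E "^(name|capability|stability|presets):" example.yaml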
2. Build Verification
# Option 1: Build using script directly
./build.sh
# Option 2: Build using CVL CLI (recommended)
cvl run <example-name> build
# Verify image was created
docker images | grep <example-name>
# Expected: Image appears with recent timestamp
What to check:
- Build completes without errors (both methods)
- All dependencies install successfully
- Image size is reasonable
- cvl info <example-name> shows correct metadata
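One way to spot an unreasonably large image is to print its tag, size, and creation time after the build (a sketch, not a CVL requirement; replace <example-name> with the actual image name):
# Inspect the freshly built image
docker images --format "{{.Repository}}:{{.Tag}}  {{.Size}}  {{.CreatedSince}}" | grep <example-name>
# Confirm the CVL metadata matches what was built
cvl info <example-name>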
3. Inference Verification
Run inference with sample inputs:
# Option 1: Run inference using script directly
./predict.sh
# Option 2: Run inference using CVL CLI (recommended)
cvl run <example-name> predict
# With custom inputs (if supported)
./predict.sh path/to/custom/input.jpg
Immediate checks:
- Container starts without errors
- Model loads successfully (check GPU memory with nvidia-smi if using GPU)
- Inference completes (outputs generated)
- Output files created in outputs/ or similar directory
- Results look reasonable (open output files to inspect)
4. Output Verification
Check that inference produces valid outputs:
# Check outputs directory
ls -la outputs/
# Expected: Output files with recent timestamps
# Inspect output content
cat outputs/output.md # For text outputs
# or
python -m json.tool outputs/output.json # For JSON outputs
What to verify:
- Output files are created
- Output format is correct (markdown, JSON, etc.)
- Output contains expected content structure
- Output is non-empty and valid
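A small sketch that automates these checks for a JSON output (the path outputs/output.json is an assumption; substitute the actual output file):
# Fail fast if the output file is missing or empty
test -s outputs/output.json || echo "Output missing or empty"
# Validate that the JSON parses
python -m json.tool outputs/output.json > /dev/null && echo "Valid JSON" || echo "Malformed JSON"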
5. Model Caching Verification
Verify that pretrained models are cached properly:
# Check HuggingFace cache
ls -la ~/.cache/huggingface/hub/
# Expected: Model files downloaded once and reused
# Run inference twice and verify no re-download
./predict.sh 2>&1 | tee first_run.log
# Second run should reuse cached models
./predict.sh 2>&1 | tee second_run.log
# Verify no download messages in second run
grep -i "downloading" second_run.log
# Expected: No new downloads (models already cached)
What to verify:
- Models download to ~/.cache/huggingface/ (or framework-specific cache)
- Second run reuses cached models without re-downloading
- Check predict.py doesn't set custom cache directories that break caching
6. Runtime Checks
GPU VRAM Usage Monitoring (REQUIRED for GPU models):
Monitor GPU VRAM usage before, during, and after inference:
# In another terminal, watch GPU memory in real-time
watch -n 1 nvidia-smi
# Or get detailed memory breakdown
nvidia-smi --query-gpu=index,name,memory.used,memory.total,memory.free,utilization.gpu --format=csv,noheader,nounits
# Record peak VRAM usage during inference
nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits | awk '{print $1 " MB"}'
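The command above only captures a single reading. To record the true peak, one option is to poll nvidia-smi in the background while inference runs (a sketch; the 1-second interval and log file name are arbitrary):
# Sample VRAM usage every second into a log while inference runs
nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits -l 1 > vram_samples.log &
SAMPLER_PID=$!
./predict.sh
kill $SAMPLER_PID
# Report the peak usage in GB
sort -n vram_samples.log | tail -1 | awk '{printf "Peak VRAM: %.1f GB\n", $1/1024}'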
Expected metrics:
- Model loading: VRAM usage increases as model loads into memory
- Inference peak: VRAM spikes during forward pass
- Cleanup: Memory released after inference completes (for short-running containers)
- Temperature: Stable (<85°C)
What to record for verification metadata:
- Peak VRAM usage in GB (e.g., "8.2GB VRAM" or "12.5GB VRAM")
- Percentage of total VRAM (e.g., "52%" for 12.5GB on 24GB GPU)
- Whether 4-bit/8-bit quantization was used (affects VRAM requirements)
Troubleshooting:
- CUDA OOM: Use smaller model variant, enable quantization (4-bit/8-bit), or run on CPU
- High VRAM idle usage: Check if other processes are using GPU
- Memory not released: Container may still be running (check with docker ps)
Docker Container Health:
# Check container runs and exits cleanly
docker ps -a | head
# Verify mounts (for running container)
docker inspect <container-id> | grep -A 10 Mounts
# Should see: workspace, cvlization_repo, huggingface cache
7. Quick Validation Test
For fast verification during development:
# Run with smallest sample input
./predict.sh examples/small_sample.jpg
# Expected runtime: seconds to few minutes
# Verify: Completes without errors, output generated
8. Update Verification Metadata
After successful verification, update the example.yaml with verification metadata:
First, check GPU info:
# Get GPU model and VRAM
nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
Format:
verification:
  last_verified: 2025-10-25
  last_verification_note: "Verified build, inference, model caching, and outputs on [GPU_MODEL] ([VRAM]GB VRAM)"
What to include in the note:
- What was verified: build, inference, outputs
- Key aspects: model caching, GPU/CPU inference
- GPU info: Dynamically determine GPU model and VRAM using nvidia-smi (e.g., "A10 GPU (24GB VRAM)", "RTX 4090 (24GB)")
- If no GPU: Use "CPU-only"
- VRAM usage: Peak VRAM used during inference (e.g., "Uses 8.2GB VRAM (34%) with 4-bit quantization")
- Get with: nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits
- Convert to GB and calculate the percentage of total VRAM
- Note if quantization (4-bit/8-bit) was used
- Any limitations: e.g., "Requires 8GB VRAM", "GPU memory constraints"
- Quick notes: e.g., "First run downloads 470MB models"
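A sketch for assembling the GPU portion of the note from live nvidia-smi output (the peak value would come from the VRAM sampling step above; the variable names and example value are illustrative):
# GPU model and total VRAM
GPU_NAME=$(nvidia-smi --query-gpu=name --format=csv,noheader)
TOTAL_MB=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits)
PEAK_MB=8400   # illustrative: take this from the VRAM sampling log
# Print a note fragment like "NVIDIA A10 (24GB VRAM), peak 8.2GB (34%)"
awk -v n="$GPU_NAME" -v t="$TOTAL_MB" -v p="$PEAK_MB" \
  'BEGIN { printf "%s (%.0fGB VRAM), peak %.1fGB (%d%%)\n", n, t/1024, p/1024, 100*p/t }'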
Example complete entry:
name: pose-estimation-dwpose
docker: dwpose
capability: perception/pose_estimation
# ... other fields ...
verification:
  last_verified: 2025-10-25
  last_verification_note: "Verified build, inference with video/image inputs, model caching (470MB models), and JSON outputs on [detected GPU]."
When to update:
- After completing full verification checklist (steps 1-7)
- Only if ALL success criteria pass
- When re-verifying after CVlization updates or fixes
Common Issues and Fixes
Build Failures
# Issue: Dockerfile can't find files
# Fix: Check COPY paths are relative to Dockerfile location
# Issue: Dependency conflicts
# Fix: Check requirements.txt versions, update base image
# Issue: Large build context
# Fix: Add .dockerignore file
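A minimal .dockerignore sketch to shrink the build context (these entries are typical suggestions, not a CVlization requirement; place the file next to the Dockerfile):
# .dockerignore
outputs/
__pycache__/
*.log
.git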
Inference Failures
# Issue: CUDA out of memory
# Fix: Use smaller model variant or CPU inference
# Issue: Model not found
# Fix: Check model name/path in predict.py, ensure internet connection
# Issue: Input file not found
# Fix: Check file paths, ensure examples/ directory exists
# Issue: Permission denied on outputs
# Fix: Ensure output directories exist and are writable
Output Issues
# Issue: Empty outputs
# Fix: Check model loaded correctly, verify input format
# Issue: Malformed JSON output
# Fix: Check output parsing logic in predict.py
# Issue: Outputs not saved
# Fix: Verify output directory path, check file write permissions
Example Commands
Document AI - Granite Docling
cd examples/perception/doc_ai/granite_docling
./build.sh
./predict.sh
# Check: outputs/output.md contains extracted document structure
Vision-Language - Moondream
cd examples/perception/vision_language/moondream2
./build.sh
./predict.sh examples/demo.jpg
# Check: outputs/ contains image description
CVL Integration
Inference examples integrate with CVL command system:
# List all available examples
cvl list
# Get example info
cvl info granite-docling
# Run example directly (uses example.yaml presets)
cvl run granite-docling build
cvl run granite-docling predict
Success Criteria
An inference example passes verification when:
- ✅ Structure: All required files present, example.yaml valid
- ✅ Build: Docker image builds without errors (both ./build.sh and cvl run <name> build)
- ✅ Inference: Runs successfully on sample inputs (both ./predict.sh and cvl run <name> predict)
- ✅ Outputs: Valid output files generated in expected format
- ✅ Model Caching: Models cached to ~/.cache/ (typically ~/.cache/huggingface/), avoiding repeated downloads
- ✅ CVL CLI: cvl info <name> shows correct metadata, build and predict presets work
- ✅ Documentation: README explains how to use the example
- ✅ Verification Metadata: example.yaml updated with verification field containing last_verified date and last_verification_note
Related Files
Check these files for debugging:
- predict.py - Core inference logic
- predict.sh - Docker run script
- Dockerfile - Environment setup
- example.yaml - CVL metadata and presets
- examples/ - Sample input files
- README.md - Usage instructions
Tips
- Use small sample inputs for fast validation
- Monitor GPU memory with nvidia-smi if using GPU
- Check docker logs <container> if inference hangs
- For HuggingFace models, set HF_TOKEN environment variable if needed
- Most examples support custom input paths as arguments to predict.sh
- Check example.yaml for supported parameters and environment variables
- For diffusion/flow matching models: Reduce sampling steps for faster validation (e.g., --num_steps 5 or -i num_steps=5 for Cog). Most models support step parameters:
  - Common parameter names: num_steps, num_inference_steps, steps
  - Typical defaults: 20-50 steps
  - Fast validation: 5-10 steps (lower quality but completes quickly)
  - Production: Full step count for best quality
  - Examples: Stable Diffusion, SVD, FLUX, AnimateDiff, Flow Matching models
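Hypothetical invocations with reduced steps (flag names vary per example; check example.yaml or predict.py for the actual parameter):
# Script-based example
./predict.sh --num_steps 5
# Cog-based example
cog predict -i num_steps=5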
Related Skills
langchain
LangChain is a framework for building LLM applications using agents, chains, and RAG pipelines. It supports multiple LLM providers, offers 500+ integrations, and includes features like tool calling and memory management. Use it for rapid prototyping and deploying production systems like chatbots, autonomous agents, and question-answering services.
Algorithmic Art Generation
This skill helps developers create algorithmic art using p5.js, focusing on generative art, computational aesthetics, and interactive visualizations. It automatically activates for topics like "generative art" or "p5.js visualization" and guides you through creating unique algorithms with features like seeded randomness, flow fields, and particle systems. Use it when you need to build reproducible, code-driven artistic patterns.
webapp-testing
This Claude Skill provides a Playwright-based toolkit for testing local web applications through Python scripts. It enables frontend verification, UI debugging, screenshot capture, and log viewing while managing server lifecycles. Use it for browser automation tasks but run scripts directly rather than reading their source code to avoid context pollution.
requesting-code-review
This skill dispatches a code-reviewer subagent to analyze code changes against requirements before proceeding. It should be used after completing tasks, implementing major features, or before merging to main. The review helps catch issues early by comparing the current implementation with the original plan.
