runpod
About
This Claude Skill enables cloud GPU processing via RunPod's serverless platform for running AI models for tasks such as image editing, upscaling, and text-to-speech. It handles endpoint setup, Docker deployment, and resource management, and covers five specific toolkit images. Developers should use it when they need pay-per-second GPU access without minimum commitments.
Quick Install
Claude Code
Recommended: npx skills add digitalsamba/claude-code-video-toolkit -a claude-code
/plugin add https://github.com/digitalsamba/claude-code-video-toolkit
git clone https://github.com/digitalsamba/claude-code-video-toolkit.git ~/.claude/skills/runpod
Copy and paste the command into Claude Code to install the skill.
Skill Documentation
RunPod Cloud GPU
Run open-source AI models on cloud GPUs via RunPod serverless. Pay-per-second, no minimums.
Setup
# 1. Create account at https://runpod.io
# 2. Add API key to .env
echo "RUNPOD_API_KEY=your_key_here" >> .env
# 3. Deploy any tool with --setup
python tools/image_edit.py --setup
python tools/upscale.py --setup
python tools/dewatermark.py --setup
python tools/sadtalker.py --setup
python tools/qwen3_tts.py --setup
Each --setup command:
- Creates a RunPod template from the Docker image
- Creates a serverless endpoint with appropriate GPU
- Saves the endpoint ID to .env (e.g. RUNPOD_QWEN_EDIT_ENDPOINT_ID)
Available Images
All images are public on GHCR — no authentication needed.
| Tool | Docker Image | GPU | VRAM | Typical Cost |
|---|---|---|---|---|
| image_edit | ghcr.io/conalmullan/video-toolkit-qwen-edit:latest | A6000/L40S | 48GB+ | ~$0.05-0.15/job |
| upscale | ghcr.io/conalmullan/video-toolkit-realesrgan:latest | RTX 3090/4090 | 24GB | ~$0.01-0.05/job |
| dewatermark | ghcr.io/conalmullan/video-toolkit-propainter:latest | RTX 3090/4090 | 24GB | ~$0.05-0.30/job |
| sadtalker | ghcr.io/conalmullan/video-toolkit-sadtalker:latest | RTX 4090 | 24GB | ~$0.05-0.15/job |
| qwen3_tts | ghcr.io/conalmullan/video-toolkit-qwen3-tts:latest | ADA 24GB | 24GB | ~$0.01-0.05/job |
Total monthly cost: Rarely exceeds $10 even with heavy use.
How It Works
All tools follow the same pattern:
Local CLI → Upload input to cloud storage → RunPod API → Poll for result → Download output
- File transfer: Tools use Cloudflare R2 when configured (R2_ACCOUNT_ID, R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, R2_BUCKET_NAME), falling back to free upload services
- RunPod API: Tools call the /run endpoint, then poll /status/{job_id} until complete
- Cold vs warm start: The first request after idle spins up a worker (~30-90s). Subsequent requests are fast (~5-15s)
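The submit-then-poll step can be sketched in Python as follows. This is a minimal illustration, not the toolkit's actual implementation: the helper names (call_api, submit_job, poll_until_done), the 2-second polling interval, the {"input": ...} payload wrapper, the "id" field in the reply, and the set of terminal status strings are all assumptions. Only the /run and /status/{job_id} paths, the api.runpod.ai/v2 base URL, and the Bearer header come from this document.

```python
import json
import time
import urllib.request

API_BASE = "https://api.runpod.ai/v2"
# Assumed terminal states; check RunPod's documentation for the full list.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def call_api(method, url, api_key, payload=None):
    """Send one authenticated JSON request and return the decoded reply."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    request.add_header("Authorization", f"Bearer {api_key}")
    request.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def submit_job(endpoint_id, api_key, job_input, call=call_api):
    """POST the job input to /run and return the job ID (assumed field: id)."""
    reply = call("POST", f"{API_BASE}/{endpoint_id}/run", api_key,
                 {"input": job_input})
    return reply["id"]


def poll_until_done(endpoint_id, api_key, job_id, interval=2.0, timeout=600.0,
                    call=call_api, sleep=time.sleep, clock=time.monotonic):
    """Poll /status/{job_id} until a terminal state or `timeout` seconds."""
    deadline = clock() + timeout
    url = f"{API_BASE}/{endpoint_id}/status/{job_id}"
    while True:
        status = call("GET", url, api_key)
        if status.get("status") in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"job {job_id} still {status.get('status')} after {timeout}s")
        sleep(interval)
```

The call, sleep, and clock parameters exist only so the loop can be exercised without network access; in normal use they keep their defaults.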
Endpoint Management
Workers
workersMin: 0 — Scale to zero when idle (no cost)
workersMax: 1 — Max concurrent jobs (increase for throughput)
idleTimeout: 5 — Seconds before worker scales down
Across all endpoints, you share a total worker pool based on your RunPod plan. If you hit limits, reduce workersMax on endpoints you're not actively using.
Checking Endpoint Status
Each tool stores its endpoint ID in .env:
| Tool | Env Var |
|---|---|
| image_edit | RUNPOD_QWEN_EDIT_ENDPOINT_ID |
| upscale | RUNPOD_UPSCALE_ENDPOINT_ID |
| dewatermark | RUNPOD_DEWATERMARK_ENDPOINT_ID |
| sadtalker | RUNPOD_SADTALKER_ENDPOINT_ID |
| qwen3_tts | RUNPOD_QWEN3_TTS_ENDPOINT_ID |
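To see at a glance which tools have been deployed, the table above can be turned into a small lookup over the .env file. This is a hedged sketch: parse_env and deployed_endpoints are hypothetical helpers, and the parser only handles simple KEY=value lines, which is an assumption about how the toolkit writes .env.

```python
from pathlib import Path

# Tool name -> env var, copied from the table above.
ENDPOINT_VARS = {
    "image_edit": "RUNPOD_QWEN_EDIT_ENDPOINT_ID",
    "upscale": "RUNPOD_UPSCALE_ENDPOINT_ID",
    "dewatermark": "RUNPOD_DEWATERMARK_ENDPOINT_ID",
    "sadtalker": "RUNPOD_SADTALKER_ENDPOINT_ID",
    "qwen3_tts": "RUNPOD_QWEN3_TTS_ENDPOINT_ID",
}


def parse_env(text):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def deployed_endpoints(env_text):
    """Return {tool: endpoint_id or None} for every toolkit tool."""
    env = parse_env(env_text)
    return {tool: env.get(var) or None for tool, var in ENDPOINT_VARS.items()}


if __name__ == "__main__":
    env_file = Path(".env")
    text = env_file.read_text() if env_file.exists() else ""
    for tool, endpoint_id in deployed_endpoints(text).items():
        print(f"{tool:12} {endpoint_id or 'not deployed'}")
```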
Disabling an Endpoint
To free worker slots without deleting the endpoint, set workersMax=0 via the RunPod dashboard or GraphQL API.
RunPod API Reference
Use these to query and manage endpoints programmatically. RunPod disables GraphQL introspection, so field names cannot be discovered at runtime; the names below have been verified and must be used exactly.
Authentication
All API calls require Authorization: Bearer $RUNPOD_API_KEY.
- GraphQL: POST https://api.runpod.io/graphql
- REST (Serverless): https://api.runpod.ai/v2/{endpoint_id}/...
GraphQL Queries
List all endpoints:
query { myself { endpoints { id name gpuIds templateId workersMax workersMin } } }
Current spend rate:
query { myself { currentSpendPerHr spendDetails { localStoragePerHour networkStoragePerHour gpuComputePerHour } } }
List pods:
query { myself { pods { id name runtime { uptimeInSeconds } machine { gpuDisplayName } desiredStatus } } }
Common mistakes: Field names are camelCase with full words (localStoragePerHour, not localStoragePerHr). Endpoints are endpoints, not serverlessWorkers. spending is not a field; use currentSpendPerHr and spendDetails.
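The queries above can be sent from Python with only the standard library. The sketch below is illustrative: build_graphql_request and run_graphql are hypothetical helper names, and the {"query": ...} JSON body and the top-level "errors" and "data" keys follow the usual GraphQL-over-HTTP convention, which is assumed rather than confirmed for RunPod here.

```python
import json
import urllib.request

GRAPHQL_URL = "https://api.runpod.io/graphql"
LIST_ENDPOINTS = ("query { myself { endpoints "
                  "{ id name gpuIds templateId workersMax workersMin } } }")


def build_graphql_request(query, api_key):
    """Build (but do not send) an authenticated GraphQL POST request."""
    body = json.dumps({"query": query}).encode("utf-8")
    request = urllib.request.Request(GRAPHQL_URL, data=body, method="POST")
    request.add_header("Authorization", f"Bearer {api_key}")
    request.add_header("Content-Type", "application/json")
    return request


def run_graphql(query, api_key):
    """Send the query and return its data object, raising on GraphQL errors."""
    with urllib.request.urlopen(build_graphql_request(query, api_key)) as response:
        reply = json.load(response)
    if reply.get("errors"):
        raise RuntimeError(reply["errors"])
    return reply["data"]
```

With RUNPOD_API_KEY exported, run_graphql(LIST_ENDPOINTS, os.environ["RUNPOD_API_KEY"])["myself"]["endpoints"] would return the endpoint list, assuming the response shape mirrors the query.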
GraphQL Mutations
Update endpoint GPU or config:
mutation { saveEndpoint(input: {
id: "endpoint_id",
name: "endpoint-name",
templateId: "template_id",
gpuIds: "AMPERE_24",
workersMin: 0,
workersMax: 1
}) { id gpuIds } }
saveEndpoint requires name and templateId even for updates — query first to get current values.
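Because name and templateId must be resent on every update, one way to avoid accidentally dropping them is to build the mutation from the endpoint's current values and override only what changes. The following is a sketch under that assumption; build_save_endpoint_mutation is a hypothetical helper, and the current argument is assumed to be one entry from the endpoints query shown earlier.

```python
import json


def build_save_endpoint_mutation(current, **overrides):
    """Build a saveEndpoint mutation from an endpoint's current values.

    `current` is a dict with id, name, templateId, gpuIds, workersMin and
    workersMax; `overrides` replaces selected fields without mutating it.
    """
    fields = {key: current[key] for key in
              ("id", "name", "templateId", "gpuIds", "workersMin", "workersMax")}
    fields.update(overrides)
    # json.dumps quotes strings and leaves integers bare, which matches
    # GraphQL literal syntax for these simple values.
    args = ", ".join(f"{key}: {json.dumps(value)}" for key, value in fields.items())
    return f"mutation {{ saveEndpoint(input: {{ {args} }}) {{ id gpuIds }} }}"
```

For example, build_save_endpoint_mutation(endpoint, workersMax=0) produces the mutation that disables an endpoint while keeping its name, template, and GPU settings intact.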
REST API (Serverless)
| Action | Method | URL |
|---|---|---|
| Submit job | POST | /v2/{id}/run |
| Check status | GET | /v2/{id}/status/{job_id} |
| Cancel job | POST | /v2/{id}/cancel/{job_id} |
| List pending | GET | /v2/{id}/requests |
| Health/stats | GET | /v2/{id}/health |
Health response includes job counts and worker state:
{
"jobs": { "completed": 16, "failed": 1, "inProgress": 0, "inQueue": 2, "retried": 0 },
"workers": { "idle": 0, "initializing": 1, "ready": 0, "running": 0, "throttled": 0 }
}
Note: /requests only returns pending/queued jobs. Completed job history is not available via the API; check the RunPod web console for logs.
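A health response like the one above can be reduced to a one-line verdict. The sketch below is an illustration only: diagnose_health is a hypothetical helper, and the reading of each worker state (for example, treating idle, ready, and running workers as able to take jobs) is an assumption, not documented RunPod semantics.

```python
def diagnose_health(health):
    """Return a short human-readable verdict for a /health response dict."""
    jobs = health.get("jobs", {})
    workers = health.get("workers", {})
    queued = jobs.get("inQueue", 0)
    # Assumption: idle, ready and running workers can all pick up work.
    active = (workers.get("running", 0) + workers.get("idle", 0)
              + workers.get("ready", 0))
    if queued == 0 and jobs.get("inProgress", 0) == 0:
        return "idle: nothing queued or running"
    if active > 0:
        return f"working: {active} worker(s) available, {queued} job(s) queued"
    if workers.get("initializing", 0) > 0:
        return f"cold start: {queued} job(s) waiting for a worker to initialize"
    if workers.get("throttled", 0) > 0:
        return f"throttled: {queued} job(s) queued but no GPU is available"
    return f"stalled: {queued} job(s) queued and no workers at all"
```

Applied to the sample response above (two jobs in queue, one worker initializing), this returns the cold start verdict.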
GPU Type IDs
| ID | GPU | VRAM | Typical Cost |
|---|---|---|---|
| AMPERE_24 | RTX 3090 | 24GB | ~$0.34/hr |
| ADA_24 | RTX 4090 | 24GB | ~$0.69/hr |
| AMPERE_48 | A6000 | 48GB | ~$0.76/hr |
| AMPERE_80 | A100 | 80GB | ~$1.99/hr |
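Since billing is per second, a rough per-job cost follows from the hourly figures: cost = hourly rate / 3600 × seconds of GPU time. The sketch below takes the table's approximate rates at face value; estimate_job_cost is a hypothetical helper, and real invoices may differ from these typical numbers.

```python
# Approximate hourly rates in USD, copied from the table above.
HOURLY_USD = {
    "AMPERE_24": 0.34,
    "ADA_24": 0.69,
    "AMPERE_48": 0.76,
    "AMPERE_80": 1.99,
}


def estimate_job_cost(gpu_id, seconds):
    """Approximate cost of one job billed per second at the hourly rate."""
    return HOURLY_USD[gpu_id] / 3600 * seconds


if __name__ == "__main__":
    # One minute of ADA_24 time at ~$0.69/hr is roughly $0.0115.
    print(f"${estimate_job_cost('ADA_24', 60):.4f}")
```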
Availability note: ADA_24 (4090) is frequently throttled/unavailable on RunPod. Always configure endpoints with multiple fallback GPU types (comma-separated) to avoid jobs getting stuck in queue indefinitely:
gpuIds: "AMPERE_24,ADA_24" # Try 3090 first, fall back to 4090
All toolkit tools also enforce a 5-minute queue timeout — if no GPU is available within 300 seconds, the job is automatically cancelled to prevent runaway billing from failed initialization cycles.
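The queue timeout described above might look roughly like this. It is a sketch of the idea, not the toolkit's code: wait_for_start is a hypothetical helper, get_status and cancel are hypothetical callables wrapping the /status and /cancel routes, and the IN_QUEUE status string and 5-second polling interval are assumptions.

```python
import time

QUEUE_TIMEOUT_SECONDS = 300  # the 5-minute limit described above


def wait_for_start(get_status, cancel, poll_interval=5.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Wait while a job is queued; cancel it if no worker picks it up in time.

    Returns the first status other than IN_QUEUE, or "CANCELLED" after the
    queue timeout has elapsed and cancel() has been called.
    """
    started = clock()
    while True:
        status = get_status()
        if status != "IN_QUEUE":
            return status
        if clock() - started >= QUEUE_TIMEOUT_SECONDS:
            cancel()
            return "CANCELLED"
        sleep(poll_interval)
```

The timer only covers time spent in the queue; once the job leaves IN_QUEUE, a separate polling loop would track it to completion.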
Cloudflare R2 via AWS CLI
R2 uses the S3-compatible API but requires --region auto:
AWS_ACCESS_KEY_ID="$R2_ACCESS_KEY_ID" \
AWS_SECRET_ACCESS_KEY="$R2_SECRET_ACCESS_KEY" \
aws s3api list-objects-v2 \
--bucket "$R2_BUCKET_NAME" \
--endpoint-url "https://${R2_ACCOUNT_ID}.r2.cloudflarestorage.com" \
--region auto
Common mistake: Omitting --region auto causes an InvalidRegionName error. R2 valid regions: wnam, enam, weur, eeur, apac, oc, auto.
Troubleshooting
Force Image Pull
When you push a new Docker image version, RunPod may still use the cached old one. To force a pull:
- Update the template's imageName to use @sha256:DIGEST notation
- Wait for the worker to restart
- Revert to :latest tag after confirming
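The first step amounts to swapping the tag for a digest in the image reference. A small helper can do this safely; pin_to_digest is a hypothetical name, and the digest itself must come from your own push output or registry, since nothing here looks it up.

```python
import re


def pin_to_digest(image, digest):
    """Replace an image's tag with an immutable @sha256 digest reference."""
    if digest.startswith("sha256:"):
        digest = digest[len("sha256:"):]
    if not re.fullmatch(r"[0-9a-f]{64}", digest):
        raise ValueError("expected a 64-character hex sha256 digest")
    # Drop any existing digest, then strip a trailing :tag while leaving a
    # registry port (host:5000/repo) untouched.
    repository = re.sub(r":[^/:@]+$", "", image.split("@", 1)[0])
    return f"{repository}@sha256:{digest}"
```

The returned string is what would go into the template's imageName until the worker has restarted on the new image.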
Cold Start Too Slow
- qwen3-tts: ~70s cold start, ~7s warm
- sadtalker: ~60s cold start, ~10s warm
- image_edit: ~90s cold start, ~15s warm
If cold starts are a problem, set workersMin: 1 (costs money when idle).
Job Fails with OOM
The model needs more VRAM than the GPU provides. Options:
- Use a larger GPU tier
- For dewatermark: reduce --resize-ratio (default 0.5 for safety)
- For image_edit: reduce --steps
"No workers available"
You've hit your plan's concurrent worker limit. Either:
- Wait for a running job to finish
- Set workersMax=0 on endpoints you're not using
- Upgrade your RunPod plan
Docker Images
All Dockerfiles live in docker/runpod-*/. Images use runpod/pytorch as the base to share layers across tools.
Building for RunPod (from Apple Silicon Mac):
docker buildx build --platform linux/amd64 -t ghcr.io/conalmullan/video-toolkit-<name>:latest docker/runpod-<name>/
docker push ghcr.io/conalmullan/video-toolkit-<name>:latest
GHCR packages default to private — you must manually make them public for RunPod to pull them. Go to GitHub > Packages > Package Settings > Change Visibility.
Cost Optimization
- Keep workersMin: 0 on all endpoints (scale to zero)
- Only deploy endpoints you actively need
- Use workersMax=0 to disable idle endpoints without deleting them
- Qwen3-TTS is significantly cheaper than ElevenLabs for voiceovers
- Check the RunPod dashboard for usage and billing
