gemini-video-understanding
About
This Claude Skill lets developers analyze video content, including YouTube URLs, using Google's Gemini API. It can describe video content, answer questions, transcribe audio with timestamps, and analyze specific segments of a video. Use it to process videos up to about 6 hours long (at low resolution on 2M-token models) in any of 9 supported formats.
Skill Documentation
Gemini Video Understanding Skill
This skill enables comprehensive video analysis using Google's Gemini API, including video summarization, question answering, transcription, timestamp references, and more.
Capabilities
- Video Summarization: Create concise summaries of video content
- Question Answering: Answer specific questions about video content
- Transcription: Transcribe audio with visual descriptions and timestamps
- Timestamp References: Query specific moments in videos (MM:SS format)
- Video Clipping: Process specific segments using start/end offsets
- Multiple Videos: Compare and analyze up to 10 videos (Gemini 2.5+)
- YouTube Support: Analyze YouTube videos directly (preview feature)
- Custom Frame Rate: Adjust FPS sampling for different video types
Supported Formats
- MP4, MPEG, MOV, AVI, FLV, MPG, WebM, WMV, 3GPP
Models Available
Gemini 2.5 Series:
- gemini-2.5-pro: Best quality, 1M context
- gemini-2.5-flash: Balanced quality/speed, 1M context
- gemini-2.5-flash-preview-09-2025: Preview features, 1M context
Gemini 2.0 Series:
- gemini-2.0-flash: Fast processing
- gemini-2.0-flash-lite: Lightweight option
Context Windows:
- 2M token models: ~2 hours (default) or ~6 hours (low-res)
- 1M token models: ~1 hour (default) or ~3 hours (low-res)
API Key Configuration
The skill supports both Google AI Studio and Vertex AI endpoints.
Option 1: Google AI Studio (Default)
The skill checks for GEMINI_API_KEY in this order:
- Process environment: process.env.GEMINI_API_KEY or $GEMINI_API_KEY
- Project root: .env
- .claude directory: .claude/.env
- .claude/skills directory: .claude/skills/.env
- Skill directory: .claude/skills/gemini-video-understanding/.env
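The lookup order above can be sketched as follows. This is a minimal illustration, not the skill's actual helper: the function names `find_api_key` and `read_key_from_env_file` are hypothetical, and the parser assumes plain `KEY=VALUE` lines in each `.env` file.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Candidate .env files relative to the project root, in the documented order.
ENV_FILE_CANDIDATES = [
    ".env",
    ".claude/.env",
    ".claude/skills/.env",
    ".claude/skills/gemini-video-understanding/.env",
]


def read_key_from_env_file(path: Path, name: str = "GEMINI_API_KEY") -> Optional[str]:
    """Return the value of `name` from a simple KEY=VALUE file, or None."""
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            return value.strip().strip('"').strip("'") or None
    return None


def find_api_key(project_root: Path,
                 environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Check the process environment first, then each .env file in order."""
    environ = os.environ if environ is None else environ
    if environ.get("GEMINI_API_KEY"):
        return environ["GEMINI_API_KEY"]
    for relative in ENV_FILE_CANDIDATES:
        value = read_key_from_env_file(project_root / relative)
        if value:
            return value
    return None
```

Calling `find_api_key(Path.cwd())` returns the first key found, or `None` if no location defines one.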
Get your API key: https://aistudio.google.com/apikey
To set up:
# Environment variable (recommended)
export GEMINI_API_KEY="your-api-key-here"
# Or in .env file (>> appends, so an existing .env is not overwritten)
echo "GEMINI_API_KEY=your-api-key-here" >> .env
Option 2: Vertex AI
To use Vertex AI instead:
# Enable Vertex AI
export GEMINI_USE_VERTEX=true
export VERTEX_PROJECT_ID=your-gcp-project-id
export VERTEX_LOCATION=us-central1 # Optional, defaults to us-central1
Or in .env file:
GEMINI_USE_VERTEX=true
VERTEX_PROJECT_ID=your-gcp-project-id
VERTEX_LOCATION=us-central1
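As a rough sketch of how these variables might be interpreted, the function below picks a backend from the environment. The name `resolve_backend` and the returned dictionary shape are assumptions for illustration; only the variable names and the `us-central1` default come from the settings above.

```python
import os
from typing import Mapping, Optional


def resolve_backend(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Choose Google AI Studio or Vertex AI based on environment variables."""
    environ = os.environ if environ is None else environ
    use_vertex = environ.get("GEMINI_USE_VERTEX", "").strip().lower() == "true"
    if not use_vertex:
        # Default path: Google AI Studio, authenticated with GEMINI_API_KEY.
        return {"backend": "ai_studio"}
    project = environ.get("VERTEX_PROJECT_ID")
    if not project:
        raise ValueError("VERTEX_PROJECT_ID must be set when GEMINI_USE_VERTEX=true")
    return {
        "backend": "vertex",
        "project": project,
        # VERTEX_LOCATION is optional and falls back to us-central1.
        "location": environ.get("VERTEX_LOCATION") or "us-central1",
    }
```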
Usage Instructions
When to Use This Skill
Use this skill when the user asks to:
- Analyze, summarize, or describe video content
- Answer questions about videos
- Transcribe video audio with visual context
- Extract information from specific timestamps
- Compare multiple videos
- Process YouTube video content
- Create quizzes or educational content from videos
Basic Video Analysis
For video files:
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
--video-path "/path/to/video.mp4" \
--prompt "Summarize this video in 3 key points"
For YouTube URLs:
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
--youtube-url "https://www.youtube.com/watch?v=VIDEO_ID" \
--prompt "What are the main topics discussed?"
Advanced Features
Video Clipping (specific time range):
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
--video-path "/path/to/video.mp4" \
--prompt "Summarize this segment" \
--start-offset "40s" \
--end-offset "80s"
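Offsets such as "40s" or "1m30s" can be checked before the script is called. The parser below is a sketch under assumptions: `parse_offset` and `validate_clip` are hypothetical helpers, and the optional hours component ("1h") is an extension not shown in this document.

```python
import re
from typing import Tuple

# Accepts forms like "40s", "2m", "1m30s"; the "h" part is an assumed extension.
_OFFSET_RE = re.compile(r"^(?:(\d+)h)?(?:(\d+)m)?(?:(\d+(?:\.\d+)?)s)?$")


def parse_offset(text: str) -> float:
    """Convert an offset string to seconds."""
    match = _OFFSET_RE.match(text.strip())
    # An empty string matches the pattern with every group unset, so reject it.
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognized offset: {text!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


def validate_clip(start: str, end: str) -> Tuple[float, float]:
    """Return (start, end) in seconds, ensuring the clip has positive length."""
    start_s, end_s = parse_offset(start), parse_offset(end)
    if end_s <= start_s:
        raise ValueError("end offset must be later than start offset")
    return start_s, end_s
```

For the command above, `validate_clip("40s", "80s")` yields a 40-second clip from 0:40 to 1:20.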
Custom Frame Rate:
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
--video-path "/path/to/video.mp4" \
--prompt "Analyze the rapid movements" \
--fps 5
Transcription with Timestamps:
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
--video-path "/path/to/video.mp4" \
--prompt "Transcribe the audio with timestamps and visual descriptions"
Multiple Videos (Gemini 2.5+ only):
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
--video-paths "/path/video1.mp4" "/path/video2.mp4" \
--prompt "Compare these two videos and highlight the differences"
Model Selection:
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
--video-path "/path/to/video.mp4" \
--prompt "Detailed analysis" \
--model "gemini-2.5-pro"
Script Parameters
Required (one of):
  --video-path PATH            Path to local video file
  --youtube-url URL            YouTube video URL
  --video-paths PATH [PATH..]  Multiple video paths (Gemini 2.5+)
Required:
  --prompt TEXT                Analysis prompt/question
Optional:
  --model NAME                 Model to use (default: gemini-2.5-flash)
  --start-offset TIME          Video clip start (e.g., "40s", "1m30s")
  --end-offset TIME            Video clip end (e.g., "80s", "2m")
  --fps NUMBER                 Frame sampling rate (default: 1)
  --output-file PATH           Save response to file
  --verbose                    Show detailed processing info
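For readers writing a similar tool, the parameters above map naturally onto `argparse`. This is an illustrative sketch only, and the real analyze_video.py may be structured differently; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Analyze video with Gemini (sketch)")

    # Exactly one video source must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--video-path", metavar="PATH", help="Path to local video file")
    source.add_argument("--youtube-url", metavar="URL", help="YouTube video URL")
    source.add_argument("--video-paths", metavar="PATH", nargs="+",
                        help="Multiple video paths (Gemini 2.5+)")

    parser.add_argument("--prompt", required=True, help="Analysis prompt/question")
    parser.add_argument("--model", default="gemini-2.5-flash", help="Model to use")
    parser.add_argument("--start-offset", metavar="TIME", help='Clip start, e.g. "40s"')
    parser.add_argument("--end-offset", metavar="TIME", help='Clip end, e.g. "80s"')
    parser.add_argument("--fps", type=float, default=1.0, help="Frame sampling rate")
    parser.add_argument("--output-file", metavar="PATH", help="Save response to file")
    parser.add_argument("--verbose", action="store_true", help="Show processing info")
    return parser
```

Using a mutually exclusive group makes `argparse` itself reject a call that passes both `--video-path` and `--youtube-url`.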
Common Use Cases
1. Video Summarization
Prompt: "Summarize this video in 3 key points with timestamps"
2. Educational Content
Prompt: "Create a quiz with 5 questions and answer key based on this video"
3. Timestamp-Specific Questions
Prompt: "What happens at 01:15 and how does it relate to the topic at 02:30?"
4. Transcription
Prompt: "Transcribe the audio from this video with timestamps for salient events and visual descriptions"
5. Content Comparison
Prompt: "Compare these two product demo videos. Which one explains the features more clearly?"
6. Action Detection
Prompt: "List all the actions performed in this tutorial video with timestamps"
Rate Limits & Quotas
Free Tier (per model):
- 10-15 RPM (requests per minute)
- 1M-4M TPM (tokens per minute)
- 1,500 RPD (requests per day)
YouTube Limitations:
- Free tier: 8 hours of YouTube video per day
- Paid tier: No length-based limits
- Public videos only (no private/unlisted)
Storage (Files API):
- 20GB per project
- 2GB per file
- 48-hour retention period
Token Calculation
Video tokens depend on resolution:
- Default resolution: ~300 tokens per second of video
- Low resolution: ~100 tokens per second of video
Example: A 10-minute video = 600 seconds × 300 tokens = ~180,000 tokens
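The same arithmetic can be wrapped in a small helper to estimate cost before uploading. The per-second rates are the approximate figures given above; the function names are hypothetical.

```python
# Approximate token rates per second of video, as stated in this document.
TOKENS_PER_SECOND = {"default": 300, "low": 100}


def estimate_video_tokens(duration_seconds: float, resolution: str = "default") -> int:
    """Estimate the token cost of a video of the given length."""
    return round(duration_seconds * TOKENS_PER_SECOND[resolution])


def max_video_seconds(context_tokens: int, resolution: str = "default") -> float:
    """Estimate the longest video that fits in a context window."""
    return context_tokens / TOKENS_PER_SECOND[resolution]


# A 10-minute video at default resolution: 600 s x 300 tokens/s.
ten_minutes = estimate_video_tokens(600)  # 180000
```

By the same rates, a 1M-token window holds roughly 3,333 seconds (about 55 minutes) at default resolution, which matches the "~1 hour" figure quoted under Context Windows. This ignores tokens used by the prompt and the response.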
Error Handling
Common errors and solutions:
| Error | Cause | Solution |
|---|---|---|
| 400 Bad Request | Invalid video format or corrupt file | Check file format and integrity |
| 403 Forbidden | Invalid/missing API key | Verify GEMINI_API_KEY configuration |
| 404 Not Found | File URI not found | Ensure file is uploaded and active |
| 429 Too Many Requests | Rate limit exceeded | Implement backoff, upgrade to paid tier |
| 500 Internal Error | Server-side issue | Retry with exponential backoff |
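The backoff advice for 429 and 500 responses can be sketched as a generic retry wrapper. `ApiError` is a hypothetical exception type standing in for whatever the client library raises, and the delay schedule (1s, 2s, 4s, with up to 10% jitter) is an assumption, not a documented requirement.

```python
import random
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


class ApiError(Exception):
    """Hypothetical error carrying an HTTP status code."""

    def __init__(self, status: int, message: str = ""):
        super().__init__(f"{status} {message}".strip())
        self.status = status


def call_with_backoff(fn: Callable[[], T],
                      retry_on: Iterable[int] = (429, 500),
                      max_attempts: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying retryable statuses with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    retryable = set(retry_on)
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ApiError as err:
            # Give up on non-retryable errors (400, 403, 404) or the last attempt.
            if err.status not in retryable or attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

Errors such as 400, 403, and 404 are re-raised immediately, since retrying will not fix a bad file, a missing key, or a missing file URI.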
Best Practices
- Use Files API for videos >20MB - More reliable than inline data
- Wait for file processing - Poll until state is ACTIVE before analysis
- Optimize FPS - Use lower FPS for static content to save tokens
- Clip long videos - Process specific segments instead of entire video
- Cache context - Reuse uploaded files for multiple queries
- Batch processing - Process multiple short videos in one request (2.5+)
- Specific prompts - Be precise about what you want to extract
Implementation Notes
For Claude Code:
When a user requests video analysis:
- Check API key availability first using the helper script
- Determine video source: local file, YouTube URL, or multiple videos
- Select appropriate model based on requirements (default: gemini-2.5-flash)
- Run the analysis script with proper parameters
- Parse and present results to the user clearly
- Handle errors gracefully with helpful suggestions
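The "run the analysis script with proper parameters" step can be sketched as a function that assembles the command line from the documented flags. `build_command` is a hypothetical helper; it only builds the argument list and leaves execution to the caller.

```python
from typing import List, Optional, Sequence

SCRIPT = ".claude/skills/gemini-video-understanding/scripts/analyze_video.py"


def build_command(prompt: str, *,
                  video_path: Optional[str] = None,
                  youtube_url: Optional[str] = None,
                  video_paths: Optional[Sequence[str]] = None,
                  model: Optional[str] = None,
                  start_offset: Optional[str] = None,
                  end_offset: Optional[str] = None,
                  fps: Optional[float] = None,
                  output_file: Optional[str] = None) -> List[str]:
    """Assemble the analyze_video.py argument list for one video source."""
    if sum(1 for s in (video_path, youtube_url, video_paths) if s) != 1:
        raise ValueError("provide exactly one of video_path, youtube_url, video_paths")

    cmd = ["python", SCRIPT]
    if video_path:
        cmd += ["--video-path", video_path]
    elif youtube_url:
        cmd += ["--youtube-url", youtube_url]
    else:
        cmd += ["--video-paths", *video_paths]
    cmd += ["--prompt", prompt]

    # Optional flags are added only when set, so the script's defaults apply.
    optional = (("--model", model), ("--start-offset", start_offset),
                ("--end-offset", end_offset), ("--fps", fps),
                ("--output-file", output_file))
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`. Passing a list rather than a shell string avoids quoting problems with prompts that contain spaces or quotes.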
Files API Workflow:
For videos >20MB or reusable content:
- Upload video using Files API (script handles this automatically)
- Wait for ACTIVE state (polling included in script)
- Use file URI for analysis
- Files auto-delete after 48 hours
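The "wait for ACTIVE state" step amounts to a polling loop with a timeout. The sketch below takes a caller-supplied `get_state` function rather than calling any specific SDK, so it makes no claims about the client API; the `PROCESSING` and `FAILED` state names, the 5-second interval, and the 300-second timeout are assumptions.

```python
import time
from typing import Callable


def wait_until_active(get_state: Callable[[], str],
                      timeout: float = 300.0,
                      interval: float = 5.0,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> None:
    """Poll get_state() until it returns "ACTIVE", or raise on failure/timeout."""
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state == "ACTIVE":
            return
        if state == "FAILED":
            raise RuntimeError("file processing failed")
        # Any other state (for example "PROCESSING") means keep waiting.
        if clock() >= deadline:
            raise TimeoutError(f"file still in state {state!r} after {timeout}s")
        sleep(interval)
```

In practice `get_state` would wrap whatever call the client library provides for fetching the uploaded file's current status.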
Inline Data Workflow:
For videos <20MB:
- Read video file as bytes
- Base64 encode for API
- Send in generateContent request
- Single-use, no upload needed
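The inline steps above can be sketched as a function that builds a generateContent request body. The JSON shape used here (`contents`, `parts`, `inline_data` with `mime_type` and `data`) follows the Gemini REST format as I understand it and should be checked against the API documentation; the helper names and the `video/mp4` default are assumptions.

```python
import base64
from pathlib import Path
from typing import Union

# The 20MB threshold named in this document.
INLINE_LIMIT_BYTES = 20 * 1024 * 1024


def should_use_files_api(path: Union[str, Path]) -> bool:
    """Return True when the file is too large for the inline workflow."""
    return Path(path).stat().st_size > INLINE_LIMIT_BYTES


def build_inline_request(path: Union[str, Path], prompt: str,
                         mime_type: str = "video/mp4") -> dict:
    """Read a small video, Base64-encode it, and build the request body."""
    data = Path(path).read_bytes()
    if len(data) > INLINE_LIMIT_BYTES:
        raise ValueError("video exceeds the inline limit; upload via the Files API")
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "contents": [{
            "parts": [
                {"inline_data": {"mime_type": mime_type, "data": encoded}},
                {"text": prompt},
            ]
        }]
    }
```

Base64 encoding expands data by about a third, so a file close to 20MB may still exceed the request size limit once encoded. Leaving some headroom below the threshold is prudent.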
Example Workflows
Workflow 1: YouTube Video Summary
# User: "Analyze this YouTube tutorial video"
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
--youtube-url "https://www.youtube.com/watch?v=abc123" \
--prompt "Create a structured summary with: 1) Main topics, 2) Key takeaways, 3) Recommended audience"
Workflow 2: Interview Transcription
# User: "Transcribe this interview with timestamps"
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
--video-path "interview.mp4" \
--prompt "Transcribe this interview with speaker labels, timestamps, and visual descriptions of gestures or slides shown"
Workflow 3: Product Comparison
# User: "Compare these two product demo videos"
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
--video-paths "demo1.mp4" "demo2.mp4" \
--model "gemini-2.5-pro" \
--prompt "Compare these product demos on: features shown, presentation quality, clarity of explanation, and overall effectiveness"
Troubleshooting
API Key Not Found:
# Check API key detection
python .claude/skills/gemini-video-understanding/scripts/check_api_key.py
Video Too Large:
Error: Request size exceeds 20MB
Solution: Script automatically uses Files API for large videos
Processing Timeout:
Error: File not reaching ACTIVE state
Solution: Check video integrity, try smaller file, or different format
Rate Limit Errors:
Error: 429 Too Many Requests
Solution: Wait before retry, or upgrade to paid tier
Additional Resources
- API Documentation: https://ai.google.dev/gemini-api/docs/video-understanding
- Files API Guide: https://ai.google.dev/gemini-api/docs/vision#uploading-files
- Rate Limits: https://ai.google.dev/gemini-api/docs/rate-limits
- Pricing: https://ai.google.dev/pricing
- Get API Key: https://aistudio.google.com/apikey
Version History
- 1.0.0 (2025-10-26): Initial release with full video understanding capabilities
Quick Install
/plugin add https://github.com/Elios-FPT/EliosCodePracticeService/tree/main/gemini-video-understanding
Copy and paste this command into Claude Code to install the skill.
