whisper-transcription

Name: whisper-transcription
Author: guia-matthieu

About

This skill transcribes audio and video files to text using OpenAI's Whisper model. It's ideal for developers needing to generate subtitles, convert podcasts to text, or build searchable audio archives. Key capabilities include extracting quotes from interviews and repurposing multimedia content into written formats.

Quick Install

Claude Code (recommended):

npx skills add guia-matthieu/clawfu-skills -a claude-code

Plugin command (alternative):

/plugin add https://github.com/guia-matthieu/clawfu-skills

Git clone (alternative):

git clone https://github.com/guia-matthieu/clawfu-skills.git ~/.claude/skills/whisper-transcription

Use any one of these commands to install the skill.

Documentation

Whisper Transcription

Transcribe any audio or video to text using OpenAI's Whisper model, the same technology behind ChatGPT's voice features.

When to Use This Skill

Podcast repurposing - Convert episodes to blog posts, show notes, social snippets
Video subtitles - Generate SRT/VTT files for YouTube, social media
Interview extraction - Pull quotes and insights from recorded calls
Content audit - Make audio/video libraries searchable
Translation - Transcribe and translate foreign language content

What Claude Does vs What You Decide

Claude Does	You Decide
Structures production workflow	Final creative direction
Suggests technical approaches	Equipment and tool choices
Creates templates and checklists	Quality standards
Identifies best practices	Brand/voice decisions
Generates script outlines	Final script approval

Dependencies

pip install openai-whisper torch ffmpeg-python click
# Also requires ffmpeg installed on system
# macOS: brew install ffmpeg
# Ubuntu: sudo apt install ffmpeg
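
Before running the commands below, it can help to confirm that everything in the install line is actually present. The following is a minimal sketch, not part of the skill's own scripts: `check_dependencies` is a hypothetical helper name, and it only checks that the `ffmpeg` binary is on `PATH` and that each Python package is importable.

```python
import importlib.util
import shutil


def check_dependencies() -> dict:
    """Report which required tools appear to be installed (hypothetical helper)."""
    return {
        # ffmpeg is a system binary, so look for it on PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # openai-whisper installs an importable module named "whisper"
        "whisper": importlib.util.find_spec("whisper") is not None,
        "torch": importlib.util.find_spec("torch") is not None,
        # ffmpeg-python installs an importable module named "ffmpeg"
        "ffmpeg-python": importlib.util.find_spec("ffmpeg") is not None,
        "click": importlib.util.find_spec("click") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

A `MISSING` entry for `ffmpeg` means the system package still needs to be installed, even if the `ffmpeg-python` wrapper is present.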

Commands

Transcribe Single File

python scripts/main.py transcribe audio.mp3 --model medium --output transcript.txt
python scripts/main.py transcribe video.mp4 --format srt --output subtitles.srt
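
To make the `--format srt` output concrete, here is a rough sketch of how Whisper segments could be turned into SRT cues. It assumes the segment shape returned by `openai-whisper`'s `model.transcribe()` (dicts with `start`, `end`, and `text`); the helper names are illustrative, and the skill's `scripts/main.py` may well do this differently.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list) -> str:
    """Render segments (dicts with start, end, text) as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)


def transcribe_to_srt(path: str, model_name: str = "medium") -> str:
    """Transcribe a file with openai-whisper and return SRT text."""
    import whisper  # imported lazily so the helpers above work without it

    model = whisper.load_model(model_name)
    result = model.transcribe(path)
    return segments_to_srt(result["segments"])
```

Keeping the formatting helpers separate from the model call makes them easy to check without downloading a model.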

Batch Transcription

python scripts/main.py batch ./recordings/ --format txt --output ./transcripts/
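
A batch run boils down to pairing each media file with an output path. The sketch below shows one way that pairing might look; `plan_batch` and the extension set are assumptions made for illustration, not the script's documented behavior.

```python
from pathlib import Path

# Assumed set of extensions; the skill's script may accept others
MEDIA_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".mp4", ".mov", ".mkv"}


def plan_batch(input_dir: str, output_dir: str, fmt: str = "txt") -> list:
    """Pair each media file in input_dir with its transcript path in output_dir."""
    src_root = Path(input_dir)
    dst_root = Path(output_dir)
    pairs = []
    for src in sorted(src_root.iterdir()):
        # Skip subfolders and non-media files such as notes or images
        if src.is_file() and src.suffix.lower() in MEDIA_EXTENSIONS:
            pairs.append((src, dst_root / f"{src.stem}.{fmt}"))
    return pairs
```

Printing the plan before transcribing gives a cheap dry run for large folders.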

Transcribe + Translate

python scripts/main.py translate foreign-audio.mp3 --to en
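
Whisper's built-in translation task produces English output only, which matches `--to en` here; whether the script supports other targets is not stated. As a hedged sketch of the underlying call, `openai-whisper` accepts `task="translate"` and an optional `language` hint as decode options. The function names below are illustrative.

```python
from typing import Optional


def build_decode_options(translate: bool = False, language: Optional[str] = None) -> dict:
    """Build keyword arguments for whisper's model.transcribe()."""
    options = {"task": "translate" if translate else "transcribe"}
    if language is not None:
        # Source-language hint such as "fr"; leaving it out means auto-detect
        options["language"] = language
    return options


def translate_to_english(path: str, model_name: str = "medium",
                         language: Optional[str] = None) -> str:
    """Transcribe foreign-language audio directly into English text."""
    import whisper  # lazy import: requires openai-whisper to be installed

    model = whisper.load_model(model_name)
    result = model.transcribe(path, **build_decode_options(True, language))
    return result["text"].strip()
```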

Extract Timestamps

python scripts/main.py timestamps podcast.mp3 --format json
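
Timestamped JSON is mostly useful for programmatic lookups, such as finding where a topic comes up in an interview. The sketch below assumes Whisper-style segments (dicts with `start`, `end`, `text`); the exact JSON layout the script emits is not documented here, so treat the field names as an assumption.

```python
import json


def segments_to_json(segments: list) -> str:
    """Keep only start, end, and text for each segment and dump as JSON."""
    rows = [
        {
            "start": round(seg["start"], 2),
            "end": round(seg["end"], 2),
            "text": seg["text"].strip(),
        }
        for seg in segments
    ]
    return json.dumps(rows, ensure_ascii=False, indent=2)


def find_mentions(segments: list, keyword: str) -> list:
    """Return the segments whose text contains keyword (case-insensitive)."""
    needle = keyword.lower()
    return [seg for seg in segments if needle in seg["text"].lower()]
```

For example, `find_mentions(segments, "pricing")` returns every segment mentioning pricing along with its start and end times.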

Examples

Example 1: Podcast to Blog Post

# Transcribe 1-hour podcast
python scripts/main.py transcribe episode-42.mp3 --model medium

# Output: episode-42.txt (full transcript with timestamps)
# Processing time: roughly 5 min per hour of audio on an M1 Mac (varies with model and hardware)

Example 2: YouTube Subtitles

# Generate SRT for video upload
python scripts/main.py transcribe marketing-video.mp4 --format srt

# Output: marketing-video.srt
# Upload directly to YouTube/Vimeo

Example 3: Batch Process Interview Library

# Transcribe all recordings in folder
python scripts/main.py batch ./customer-interviews/ --model small --format txt

# Output: ./customer-interviews/*.txt (one per audio file)

Model Selection Guide

Model	Speed	Accuracy	VRAM	Best For
`tiny`	Fastest	~70%	1GB	Quick drafts, short clips
`base`	Fast	~80%	1GB	Social media clips
`small`	Medium	~85%	2GB	Podcasts, interviews
`medium`	Slow	~90%	5GB	Professional transcripts
`large`	Slowest	~95%	10GB	Critical accuracy needs

Recommendation: Start with small for most marketing content. Use medium for client deliverables.
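
If the choice is driven by hardware rather than content type, the VRAM column can be turned into a simple lookup. This is a sketch only: `pick_model` is a hypothetical helper, and the figures are the approximate values from the table above, not measured limits.

```python
# Approximate VRAM needs in GB, copied from the table above (largest first)
MODEL_VRAM_GB = [("large", 10), ("medium", 5), ("small", 2), ("base", 1), ("tiny", 1)]


def pick_model(available_vram_gb: float) -> str:
    """Return the largest model whose listed VRAM need fits the budget."""
    for name, needed in MODEL_VRAM_GB:
        if available_vram_gb >= needed:
            return name
    return "tiny"  # fall back to the smallest model, e.g. on CPU-only machines
```

With 6 GB available this picks `medium`; with 2 GB it picks `small`.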

Output Formats

Format	Extension	Use Case
`txt`	.txt	Blog posts, analysis
`srt`	.srt	Video subtitles (YouTube)
`vtt`	.vtt	Web video subtitles
`json`	.json	Programmatic access
`tsv`	.tsv	Spreadsheet analysis

Performance Tips

GPU acceleration - Roughly 10x faster on a CUDA GPU than on CPU
Audio extraction - Script auto-extracts audio from video
Chunking - Long files auto-split for memory efficiency
Language detection - Automatic, or specify with --language
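
Two of the tips above can be sketched in a few lines: picking a device and splitting a long file into windows. The chunk length and overlap below are assumed values chosen for illustration; how the script actually splits long files is not described in this document.

```python
def chunk_ranges(duration_s: float, chunk_s: float = 600.0, overlap_s: float = 5.0) -> list:
    """Split [0, duration_s] into overlapping (start, end) windows in seconds."""
    if chunk_s <= overlap_s:
        raise ValueError("chunk_s must be larger than overlap_s")
    ranges = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        ranges.append((start, end))
        if end >= duration_s:
            break
        start = end - overlap_s  # step back so words at the boundary are not cut
    return ranges


def pick_device() -> str:
    """Use CUDA when PyTorch reports a usable GPU, else fall back to CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

The result of `pick_device()` can be passed as the `device` argument to `whisper.load_model()`. The small overlap between windows trades a little duplicate work for fewer clipped words at chunk edges.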

Skill Boundaries

What This Skill Does Well

Structuring audio production workflows
Providing technical guidance
Creating quality checklists
Suggesting creative approaches

What This Skill Cannot Do

Replace audio engineering expertise
Make subjective creative decisions
Access or edit audio files directly
Guarantee commercial success

Related Skills

video-processing - Extract audio from video
youtube-downloader - Download videos to transcribe
content-repurposer - Transform transcripts to content
podcast-production - Create podcasts

Skill Metadata

Mode: cyborg

category: automation
subcategory: audio-processing
dependencies: [openai-whisper, torch, ffmpeg-python, click]
difficulty: beginner
time_saved: 10+ hours/week

GitHub Repository

guia-matthieu/clawfu-skills

Path: skills/automation/whisper-transcription

Tags: ai-skills, anthropic, claude-code, claude-skills, marketing, mcp-server

FAQ

What is the whisper-transcription skill?

whisper-transcription is a Claude Skill by guia-matthieu. Skills package instructions and resources that Claude loads on demand, so Claude can perform whisper-transcription-related tasks without extra prompting.

How do I install whisper-transcription?

Use the install commands on this page: add whisper-transcription to Claude Code as a plugin, or clone its repository into your skills directory, then restart Claude so it picks up the skill.

What category does whisper-transcription belong to?

whisper-transcription is in the Meta category, tagged ai and design.

Is whisper-transcription free to use?

Yes. whisper-transcription is listed on AIMCP and free to install. It runs inside Claude, so no separate service account is required to use the skill itself.
