voice-localization
About
This skill helps you plan and run AI voice localization: using AI voice synthesis to deliver audio content in multiple languages while preserving your brand's vocal identity. It's ideal for dubbing videos, localizing marketing materials, and creating multilingual training content. Developers can use it to keep voice quality and character consistent as they expand into new language markets.
Quick Install
Claude Code
Recommended: npx skills add guia-matthieu/clawfu-skills -a claude-code
Plugin command: /plugin add https://github.com/guia-matthieu/clawfu-skills
Manual clone: git clone https://github.com/guia-matthieu/clawfu-skills.git ~/.claude/skills/voice-localization
Copy and paste the command into Claude Code to install the skill.
Skill Documentation
AI Voice Localization
Scale your brand voice across multiple languages using AI voice synthesis, maintaining consistent character and quality for global content.
When to Use This Skill
- Expanding video content to new language markets
- Creating multilingual courses or training
- Localizing ads and marketing videos
- Dubbing existing content for international audiences
- Building consistent global brand voice
- Deciding between dubbing and subtitles
Methodology Foundation
Source: ElevenLabs Multilingual + Global Content Best Practices
Core Principle: True localization means the same perceived person speaks each language natively—not a translated voice, but a voice that sounds local while maintaining brand character. AI voice synthesis enables this at scale by preserving voice identity while adapting pronunciation and rhythm to each language.
Why This Matters: Global content traditionally required a separate voice actor per language, which cost brand consistency. AI voice localization keeps the same "person" across 29+ languages, creating a unified brand experience worldwide while cutting production costs by roughly 70% or more (the Step 6 cost model works out to about 72%).
What Claude Does vs What You Decide
| Claude Does | You Decide |
|---|---|
| Structures production workflow | Final creative direction |
| Suggests technical approaches | Equipment and tool choices |
| Creates templates and checklists | Quality standards |
| Identifies best practices | Brand/voice decisions |
| Generates script outlines | Final script approval |
What This Skill Does
- Maintains voice identity across languages - Same character, different language
- Handles cultural adaptation - Beyond translation to localization
- Manages multilingual production - Efficient workflows for many languages
- Ensures quality per market - Native speaker validation
- Calculates ROI - Traditional dubbing vs. AI localization costs
How to Use
Plan Localization Project
Help me plan voice localization for [content].
Source language: [original]
Target languages: [list]
Content type: [video/audio/course]
Volume: [duration/number of assets]
Evaluate Localization Approach
Should I use AI voice localization or traditional dubbing?
Content: [describe]
Markets: [target countries]
Budget: [range]
Timeline: [deadline]
Instructions
When localizing voice content, follow this methodology:
Step 1: Assess Localization Needs
Determine the right approach for your content.
## Localization Decision Matrix
### When to Use AI Voice Localization
✓ Same brand voice needed across markets
✓ Frequent content updates (efficiency matters)
✓ Educational/informational content
✓ Budget constraints
✓ Quick turnaround needed
✓ 5+ languages needed
### When to Use Traditional Dubbing
✓ Character-driven content (emotions critical)
✓ One-time major production
✓ Markets expect dubbed content (Germany, France)
✓ Complex lip-sync requirements
✓ Budget allows $1,000+ per language
### When to Use Subtitles Instead
✓ Documentary/interview content
✓ Authenticity of original voice matters
✓ Lowest budget option
✓ Markets prefer subtitles (Nordics, Netherlands)
✓ Legal/compliance content (exact words matter)
### Hybrid Approach
Hero content → Traditional dubbing
Supporting content → AI localization
Supplementary → Subtitles
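The decision matrix above can be read as a small set of ordered rules. The sketch below is one possible encoding, not an official algorithm: the `Project` fields, the rule order, and the $300 fallback threshold (taken from the AI cost example in Step 6) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Project:
    languages: int                # number of target languages
    frequent_updates: bool        # content is revised often
    emotion_critical: bool        # character-driven, acting matters
    lip_sync_required: bool       # complex on-camera lip-sync
    exact_wording_required: bool  # legal/compliance content
    budget_per_language: float    # USD available per language


def recommend_approach(p: Project) -> str:
    """Return 'subtitles', 'traditional dubbing', or 'AI localization'."""
    # Exact wording matters: keep the original audio and subtitle it.
    if p.exact_wording_required:
        return "subtitles"
    # Emotion or lip-sync heavy content, if the budget allows $1,000+.
    if (p.emotion_critical or p.lip_sync_required) and p.budget_per_language >= 1000:
        return "traditional dubbing"
    # Many languages or frequent updates favor a repeatable AI workflow.
    if p.languages >= 5 or p.frequent_updates:
        return "AI localization"
    # Assumed fallback: AI if the budget covers ~$300 per language, else subtitles.
    return "AI localization" if p.budget_per_language >= 300 else "subtitles"


course = Project(5, True, False, False, False, 4900.0)
print(recommend_approach(course))
```

Real projects rarely reduce to one label; the hybrid approach corresponds to running this per content tier rather than once per project.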
Step 2: Select Languages Strategically
Prioritize languages based on market opportunity.
## Language Prioritization Framework
### Tier 1: High Volume Languages (500M+ speakers)
| Language | Global Speakers | Key Markets |
|----------|----------------|-------------|
| English | 1.5B | Global |
| Mandarin | 1.1B | China |
| Spanish | 550M | LATAM, Spain |
| Hindi | 600M | India |
### Tier 2: High Value Languages
| Language | Economic Value | Markets |
|----------|---------------|---------|
| German | High GDP | DACH |
| French | Broad Francophone reach | France, Africa |
| Japanese | High spending | Japan |
| Portuguese | Large market | Brazil |
### Tier 3: Strategic Languages
| Language | Strategic Value | Markets |
|----------|----------------|---------|
| Arabic | Growing middle class | MENA |
| Korean | Tech-forward | South Korea |
| Italian | Fashion/luxury | Italy |
| Dutch | High English proficiency | Benelux |
### ElevenLabs Supported Languages (29+)
English, Spanish, French, German, Italian, Portuguese,
Polish, Dutch, Hindi, Arabic, Chinese, Japanese, Korean,
Turkish, Swedish, Indonesian, Filipino, Malay, Russian,
Czech, Danish, Finnish, Greek, Romanian, Ukrainian,
Vietnamese, Norwegian, Hungarian, Tamil, and more.
Step 3: Prepare Content for Localization
Translation alone isn't enough—prepare for voice adaptation.
## Content Preparation Checklist
### Script Adaptation
**Text expansion/contraction**:
| Language | Length vs. English |
|----------|-----------|
| German | ~30% longer |
| French | 15-20% longer |
| Spanish | 15-25% longer |
| Chinese | ~30% shorter |
| Japanese | Variable |
**Implications**:
- Video may need re-timing
- Allow flexibility in pacing
- Consider sentence splitting for longer languages
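To see what the expansion figures mean for timing, here is a rough sketch that estimates how far a localized read overruns the original slot. It assumes spoken duration scales linearly with text length, which is a simplification (especially for Chinese), and it uses the midpoints of the ranges in the table; the function names are illustrative.

```python
# Midpoints of the ranges in the expansion table; Japanese is left out
# because the table lists it as variable.
EXPANSION_VS_ENGLISH = {
    "German": 0.30,
    "French": 0.175,
    "Spanish": 0.20,
    "Chinese": -0.30,
}


def estimated_seconds(source_seconds: float, language: str) -> float:
    """Scale the English duration by the text-length ratio (assumed linear)."""
    return source_seconds * (1 + EXPANSION_VS_ENGLISH[language])


def overflow_seconds(source_seconds: float, language: str) -> float:
    """Seconds by which the localized read is expected to overrun its slot."""
    return max(0.0, estimated_seconds(source_seconds, language) - source_seconds)


for lang in EXPANSION_VS_ENGLISH:
    print(lang, round(overflow_seconds(60.0, lang), 1))
```

A non-zero overflow is a signal to tighten the translation, split sentences, or re-time the video before generating audio.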
**Localization notes to provide**:
□ Brand terms (don't translate, keep English)
□ Product names (pronunciation guide)
□ Numbers (format varies by locale)
□ Dates (format varies by locale)
□ Currency (localize amounts)
□ Cultural references (adapt or explain)
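As an illustration of why numbers and dates need localization notes, the sketch below formats the same values for three locales. The small hand-written table is an assumption for demonstration only; a production pipeline would normally pull these conventions from CLDR data through a localization library rather than maintain them by hand.

```python
from datetime import date

# Illustrative conventions for three locales (not exhaustive).
LOCALE_FORMATS = {
    "en-US": {"thousands": ",", "decimal": ".", "date": "{m:02d}/{d:02d}/{y}"},
    "de-DE": {"thousands": ".", "decimal": ",", "date": "{d:02d}.{m:02d}.{y}"},
    # French groups thousands with a no-break space.
    "fr-FR": {"thousands": "\u00a0", "decimal": ",", "date": "{d:02d}/{m:02d}/{y}"},
}


def format_number(value: float, locale: str) -> str:
    """Format with two decimals using the locale's separators."""
    fmt = LOCALE_FORMATS[locale]
    us_style = f"{value:,.2f}"  # US-style grouping as a starting point
    # Swap separators through a placeholder so they do not collide.
    return (
        us_style.replace(",", "\x00")
        .replace(".", fmt["decimal"])
        .replace("\x00", fmt["thousands"])
    )


def format_date(d: date, locale: str) -> str:
    """Render a date in the locale's usual numeric order."""
    return LOCALE_FORMATS[locale]["date"].format(d=d.day, m=d.month, y=d.year)


print(format_number(1234.5, "de-DE"), format_date(date(2026, 1, 26), "de-DE"))
```

For voice scripts, the written form still has to be spelled out as the speaker would say it, so these formats mostly apply to captions and on-screen text.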
### Voice Consistency Notes
**Preserve across languages**:
- Character/personality
- Energy level
- Authority/warmth balance
- Pace relative to content
**Adapt per language**:
- Natural rhythm and cadence
- Pronunciation of brand terms
- Formal/informal register (varies by culture)
Step 4: Production Workflow
Efficient process for multilingual voice production.
## Multilingual Production Pipeline
### Phase 1: Source Production
1. Finalize English script
2. Record/generate English voice
3. Lock timing and pacing
4. Create master video/audio
### Phase 2: Translation
1. Professional translation (not machine)
2. Localization review (cultural adaptation)
3. Timing adaptation (fit original duration)
4. Brand term glossary enforcement
### Phase 3: Voice Generation
**Per language**:
- Load translated script
- Apply same voice settings as source
- Generate voice in target language
- Check pronunciation of brand terms
- Adjust pacing if needed
- Review for naturalness
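The per-language loop above might be sketched as follows. The code only builds the HTTP requests and does not send them. The endpoint path, the `xi-api-key` header, the `eleven_multilingual_v2` model ID, and the `voice_settings` fields reflect the ElevenLabs public text-to-speech API as commonly documented, but treat them as assumptions and verify against the current API reference. The settings values, `build_requests`, and the sample scripts are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"

# One settings object reused for every language keeps the voice consistent.
# The values are illustrative, not recommendations.
SHARED_SETTINGS = {"stability": 0.5, "similarity_boost": 0.75}


def build_tts_request(api_key: str, voice_id: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request for one script."""
    payload = {
        "text": text,
        "model_id": "eleven_multilingual_v2",  # assumed multilingual model ID
        "voice_settings": SHARED_SETTINGS,
    }
    return urllib.request.Request(
        url=f"{API_BASE}/{voice_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def build_requests(api_key: str, voice_id: str, scripts: dict[str, str]) -> dict:
    """Map each language code to a request that uses the same voice and settings."""
    return {lang: build_tts_request(api_key, voice_id, text) for lang, text in scripts.items()}


requests_by_lang = build_requests(
    "YOUR_API_KEY", "YOUR_VOICE_ID", {"es": "Hola a todos.", "de": "Hallo zusammen."}
)
print(sorted(requests_by_lang))
```

Sending a request with `urllib.request.urlopen(req)` should return audio bytes to write to a per-language file; pronunciation and pacing checks still happen afterwards with a human reviewer.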
### Phase 4: Quality Control
**Native speaker review checklist**:
□ Natural pronunciation
□ Correct emphasis and intonation
□ Brand terms handled correctly
□ No awkward phrasing
□ Appropriate formality level
□ Cultural appropriateness
### Phase 5: Integration
1. Replace audio track in video
2. Re-sync if timing changed
3. Update text overlays
4. Localize captions/subtitles
5. Final review per language
Step 5: Quality Assurance
Ensure each language meets standards.
## Localization QA Framework
### Technical QA
□ Audio levels consistent across languages
□ No clipping or distortion
□ Background music balanced correctly
□ Transitions smooth
□ Sync with video acceptable
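The first two technical checks can be partly automated. The sketch below assumes each track is already decoded to non-empty lists of float samples in the range -1.0 to 1.0, and it uses RMS level as a rough stand-in for loudness; broadcast workflows usually measure LUFS instead. The 2 dB spread tolerance and the clipping threshold are illustrative assumptions.

```python
import math


def rms_dbfs(samples: list[float]) -> float:
    """RMS level in dB relative to full scale (-inf for silence)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def check_levels(tracks: dict[str, list[float]],
                 max_spread_db: float = 2.0,
                 clip_threshold: float = 0.999) -> dict:
    """Compare levels across languages and list tracks that hit full scale."""
    levels = {lang: rms_dbfs(samples) for lang, samples in tracks.items()}
    spread = max(levels.values()) - min(levels.values())
    clipping = sorted(
        lang for lang, samples in tracks.items()
        if max(abs(s) for s in samples) >= clip_threshold
    )
    return {
        "levels_db": levels,
        "spread_db": spread,
        "levels_consistent": spread <= max_spread_db,
        "clipping": clipping,
    }


report = check_levels({
    "en": [0.5, -0.5, 0.5, -0.5],
    "de": [0.25, -0.25, 0.25, -0.25],
    "fr": [1.0, -1.0, 0.5, -0.5],
})
print(report["levels_consistent"], report["clipping"])
```

A failed check points to the track to re-level or regenerate; music balance, transitions, and sync still need a listening pass.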
### Linguistic QA
□ Translation accuracy (spot check 10%)
□ Natural flow and rhythm
□ Brand voice maintained
□ Technical terms correct
□ No machine-translation artifacts
### Cultural QA
□ No offensive content for market
□ References appropriate
□ Humor/idioms adapted correctly
□ Visual content appropriate
□ Call-to-action localized
### Native Speaker Sign-Off
For each language:
- [ ] Spanish (Reviewer: _____) ☐ Approved
- [ ] French (Reviewer: _____) ☐ Approved
- [ ] German (Reviewer: _____) ☐ Approved
- [ ] [Add languages...]
Step 6: Calculate ROI
Compare AI localization to traditional approaches.
## Localization Cost Comparison
### Traditional Dubbing (per language)
| Component | Cost |
|-----------|------|
| Translation | $0.15/word |
| Voice talent | $300-1,000 per finished hour |
| Studio time | $100-200/hour |
| Direction | $50-100/hour |
| Engineering | $50-100/hour |
**Example**: 10-minute video (1,500 words)
- Translation: $225
- Voice talent: $400
- Studio: $200
- Direction: $150
- Engineering: $100
- **Total: ~$1,075 per language**
### AI Voice Localization
| Component | Cost |
|-----------|------|
| Translation | $0.15/word |
| ElevenLabs Pro | $99/mo (within plan quota) |
| QA review | $50-100/language |
**Example**: 10-minute video (1,500 words)
- Translation: $225
- Voice generation: ~$0 (within plan)
- QA review: $75
- **Total: ~$300 per language**
### ROI Summary
| Languages | Traditional | AI Localization | Savings |
|-----------|-------------|-----------------|---------|
| 5 | $5,375 | $1,500 | 72% |
| 10 | $10,750 | $3,000 | 72% |
| 20 | $21,500 | $6,000 | 72% |
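**Bottom line**: In this model AI localization saves about 72% at every volume, because both approaches scale linearly per language. Expect savings of 70%+ vs. traditional dubbing in typical projects.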
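The comparison can be reproduced with a few lines of arithmetic. This sketch uses the example figures from the two cost breakdowns (a 1,500-word script, $0.15/word translation, the listed per-language fees) and, like the table, treats voice generation as roughly $0 within the monthly plan. The function names are illustrative.

```python
TRANSLATION_RATE = 0.15  # USD per word, from the cost tables

# Per-language fees from the traditional dubbing example.
TRADITIONAL_FEES = {"voice talent": 400, "studio": 200, "direction": 150, "engineering": 100}
AI_QA_REVIEW = 75  # USD per language, from the AI example


def traditional_cost(words: int) -> float:
    """Per-language cost of traditional dubbing."""
    return words * TRANSLATION_RATE + sum(TRADITIONAL_FEES.values())


def ai_cost(words: int) -> float:
    """Per-language cost of AI localization (subscription not included)."""
    return words * TRANSLATION_RATE + AI_QA_REVIEW


def roi_table(words: int, language_counts: list[int]) -> list[tuple]:
    """Rows of (languages, traditional total, AI total, savings in percent)."""
    rows = []
    for n in language_counts:
        trad = round(n * traditional_cost(words), 2)
        ai = round(n * ai_cost(words), 2)
        rows.append((n, trad, ai, round(100 * (1 - ai / trad))))
    return rows


for row in roi_table(1500, [5, 10, 20]):
    print(row)
```

Because every cost here is per language, the savings percentage does not change with volume; adding the fixed subscription fee would lower it slightly for small projects.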
Examples
Example 1: E-Learning Course Localization
Context: Localizing a 20-video course to 5 languages.
Input:
We have a 20-video SaaS training course in English (total 3 hours). Need to localize to Spanish, French, German, Portuguese, and Japanese. What's the best approach?
Output:
Course Localization Plan
Project Scope
- Content: 20 videos, 3 hours total
- Source: English
- Targets: Spanish, French, German, Portuguese, Japanese
- Content type: Educational (clarity > emotion)
Recommended Approach: AI Voice Localization
Why AI is ideal here:
- Educational content (fact delivery, not drama)
- 5 languages (efficiency critical)
- Likely ongoing updates (need repeatable process)
- Consistent "instructor" voice across all versions
Implementation Plan
Platform: ElevenLabs Pro ($99/mo)
Voice Strategy:
- Clone original instructor OR design consistent voice
- Same voice speaks all 6 languages (English plus the 5 targets)
- Maintains authority and teaching style
Workflow per Video:
1. Translation (outsource)
   - Professional translation (not machine)
   - Provide glossary of product terms
   - Timing guidance for text expansion
2. Voice Generation
   - Generate all 5 languages from same voice
   - ~15 minutes per language per video
   - Total: ~25 hours voice generation
3. QA Review
   - Native speaker review per language
   - Focus: pronunciation, naturalness, accuracy
   - Budget 1 hour review per language per video
4. Video Integration
   - Replace audio tracks
   - Adjust timing if needed
   - Verify captions match
Timeline
| Phase | Duration | Notes |
|---|---|---|
| Translation (all) | 2 weeks | Parallel |
| Voice generation | 1 week | ~5 hours/day |
| QA review | 2 weeks | Parallel per language |
| Integration | 1 week | Parallel |
| Total | 5-6 weeks | With buffer |
Budget Estimate
| Item | Cost |
|---|---|
| Translation (5 lang × 27k words at $0.15/word; 3 hours at ~150 words/min) | $20,250 |
| ElevenLabs (2 months) | $200 |
| QA review (5 lang × 20 videos × $30) | $3,000 |
| Integration (20 hours @ $50) | $1,000 |
| Total | ~$24,500 |
vs. Traditional dubbing: ~$107,500 (20 videos × 5 languages × ~$1,075 each), a 77% savings
Example 2: Global Ad Campaign
Context: 30-second ad needs localization for global launch.
Input:
We're launching a 30-second video ad globally. Need 10 language versions in 2 weeks. Budget is $15,000.
Output:
Ad Localization: Rapid Global Launch
Constraints
- 30-second ad (tight timing)
- 10 languages
- 2-week deadline
- $15,000 budget (~$1,500/language)
Recommended: Hybrid Approach
Tier 1 (Hero Markets) - Traditional Dubbing
- English (source)
- Spanish (largest reach)
- German (high value)
- French (high value)
Tier 2 (Scale Markets) - AI Localization
- Portuguese, Italian, Dutch, Polish, Japanese, Korean
Rationale
- Hero markets get premium treatment
- AI handles scale efficiently
- Both meet deadline
Production Schedule
Week 1:
| Day | Task |
|---|---|
| 1-2 | All translations complete |
| 2-3 | Traditional dubbing sessions (4 languages) |
| 3-4 | AI voice generation (6 languages) |
| 4-5 | QA review all versions |
Week 2:
| Day | Task |
|---|---|
| 1-2 | Revisions and fixes |
| 3-4 | Video integration all versions |
| 5 | Final review and delivery |
Budget Allocation
| Item | Cost |
|---|---|
| Translation (10 lang × ~120 words, ~$180 per language) | $1,800 |
| Traditional dubbing (4 lang) | $4,800 |
| AI generation (6 lang) | $600 |
| QA review (10 lang) | $2,000 |
| Integration (10 lang) | $2,500 |
| Buffer | $3,300 |
| Total | $15,000 |
Checklists & Templates
Localization Project Checklist
## Pre-Production
□ Languages selected and prioritized
□ Budget allocated per language
□ Timeline established
□ Translation vendor selected
□ Brand glossary prepared
□ Voice consistency plan defined
## Production
□ Translations complete
□ Translations reviewed for brand terms
□ Voice generated per language
□ Pronunciation verified
□ Timing adjusted if needed
## Quality Assurance
□ Native speaker review complete
□ Technical QA passed
□ Brand guidelines verified
□ Cultural review passed
□ Legal/compliance check (if needed)
## Delivery
□ Files named correctly per language
□ All formats delivered
□ Captions/subtitles provided
□ Documentation complete
□ Source files archived
Brand Glossary Template
## [Brand] Localization Glossary
### Never Translate
| English | Note |
|---------|------|
| [Brand Name] | Keep English, pronunciation: [X] |
| [Product Name] | Keep English |
| [Feature Name] | Keep English, explain in context |
### Translate Consistently
| English | Spanish | French | German |
|---------|---------|--------|--------|
| Dashboard | Panel | Tableau de bord | Dashboard |
| Workflow | Flujo de trabajo | Flux de travail | Arbeitsablauf |
| [Term] | | | |
### Pronunciation Guide
| Term | Pronunciation |
|------|--------------|
| [Brand] | /brănd/ |
| [Feature] | /fē-chər/ |
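A glossary like this can be enforced with a simple automated pass before native review. The sketch below is a minimal substring check, not a full terminology tool: `FlowBoard` is a hypothetical brand term, the Dashboard and Workflow entries come from the template above, and inflected or hyphenated forms would need a more careful matcher.

```python
# Hypothetical protected brand term; translations come from the glossary template.
NEVER_TRANSLATE = ["FlowBoard"]
TRANSLATE_AS = {
    "Dashboard": {"es": "Panel", "fr": "Tableau de bord", "de": "Dashboard"},
    "Workflow": {"es": "Flujo de trabajo", "fr": "Flux de travail", "de": "Arbeitsablauf"},
}


def glossary_issues(source: str, translation: str, lang: str) -> list[str]:
    """List glossary violations found in one translated script."""
    issues = []
    # Protected terms must appear verbatim in the translation.
    for term in NEVER_TRANSLATE:
        if term in source and term not in translation:
            issues.append(f"missing protected term: {term}")
    # Glossary terms must use the agreed translation (case-insensitive).
    for term, by_lang in TRANSLATE_AS.items():
        expected = by_lang[lang]
        if term.lower() in source.lower() and expected.lower() not in translation.lower():
            issues.append(f"expected '{expected}' for '{term}'")
    return issues


source = "Open the Dashboard in FlowBoard to edit your Workflow."
print(glossary_issues(source, "Abre el Panel en FlowBoard para editar tu flujo de trabajo.", "es"))
print(glossary_issues(source, "Abre el Tablero en TableroFlujo para editar tu proceso.", "es"))
```

An empty list means the script passed the mechanical check; it does not replace the native speaker sign-off.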
Skill Boundaries
What This Skill Does Well
- Structuring audio production workflows
- Providing technical guidance
- Creating quality checklists
- Suggesting creative approaches
What This Skill Cannot Do
- Replace audio engineering expertise
- Make subjective creative decisions
- Access or edit audio files directly
- Guarantee commercial success
References
- ElevenLabs. "Multilingual Voice Synthesis" - Platform documentation
- CSA Research. "Global Content Strategy" - Localization best practices
- Unbabel. "The State of Localization" - Industry benchmarks
- Nimdzi. "Localization Market Research" - Cost and ROI data
Related Skills
- voice-design - Creating the base voice
- voiceover-direction - Quality control principles
- transcription-to-content - Preparing source content
Skill Metadata (Internal Use)
name: voice-localization
category: audio
subcategory: voice
version: 1.0
author: MKTG Skills
source_expert: ElevenLabs, Localization Best Practices
source_work: Multilingual Content Production
difficulty: intermediate
estimated_value: 70%+ cost savings vs. traditional dubbing
tags: [localization, multilingual, dubbing, ai-voice, global]
created: 2026-01-26
updated: 2026-01-26
