voice-localization

guia-matthieu
Updated 2 days ago
aidesign

About

This skill provides AI-powered voice synthesis to localize audio content into multiple languages while preserving your brand's vocal identity. It's ideal for dubbing videos, localizing marketing materials, and creating multilingual training content. Developers can use it to maintain consistent voice quality and character across global language expansions.

Quick Install

Claude Code

Recommended
Primary method
npx skills add guia-matthieu/clawfu-skills -a claude-code
Plugin command (alternative)
/plugin add https://github.com/guia-matthieu/clawfu-skills
Git clone (alternative)
git clone https://github.com/guia-matthieu/clawfu-skills.git ~/.claude/skills/voice-localization

Copy and paste this command into Claude Code to install the skill.

Skill Documentation

AI Voice Localization

Scale your brand voice across multiple languages using AI voice synthesis, maintaining consistent character and quality for global content.

When to Use This Skill

  • Expanding video content to new language markets
  • Creating multilingual courses or training
  • Localizing ads and marketing videos
  • Dubbing existing content for international audiences
  • Building consistent global brand voice
  • Deciding between dubbing vs. subtitles

Methodology Foundation

Source: ElevenLabs Multilingual + Global Content Best Practices

Core Principle: True localization means the same perceived person speaks each language natively—not a translated voice, but a voice that sounds local while maintaining brand character. AI voice synthesis enables this at scale by preserving voice identity while adapting pronunciation and rhythm to each language.

Why This Matters: Global content traditionally required a separate voice actor per language, which fragments brand consistency. AI voice localization keeps the same "person" across 29+ languages, giving a unified brand experience worldwide while cutting production costs by an estimated 70-90% (the worked example in Step 6 comes to about 72%).

What Claude Does vs What You Decide

| Claude Does | You Decide |
|-------------|------------|
| Structures production workflow | Final creative direction |
| Suggests technical approaches | Equipment and tool choices |
| Creates templates and checklists | Quality standards |
| Identifies best practices | Brand/voice decisions |
| Generates script outlines | Final script approval |

What This Skill Does

  1. Maintains voice identity across languages - Same character, different language
  2. Handles cultural adaptation - Beyond translation to localization
  3. Manages multilingual production - Efficient workflows for many languages
  4. Ensures quality per market - Native speaker validation
  5. Calculates ROI - Traditional dubbing vs. AI localization costs

How to Use

Plan Localization Project

Help me plan voice localization for [content].
Source language: [original]
Target languages: [list]
Content type: [video/audio/course]
Volume: [duration/number of assets]

Evaluate Localization Approach

Should I use AI voice localization or traditional dubbing?
Content: [describe]
Markets: [target countries]
Budget: [range]
Timeline: [deadline]

Instructions

When localizing voice content, follow this methodology:

Step 1: Assess Localization Needs

Determine the right approach for your content.

## Localization Decision Matrix

### When to Use AI Voice Localization

✓ Same brand voice needed across markets
✓ Frequent content updates (efficiency matters)
✓ Educational/informational content
✓ Budget constraints
✓ Quick turnaround needed
✓ 5+ languages needed

### When to Use Traditional Dubbing

✓ Character-driven content (emotions critical)
✓ One-time major production
✓ Markets expect dubbed content (Germany, France)
✓ Complex lip-sync requirements
✓ Budget allows $1,000+ per language

### When to Use Subtitles Instead

✓ Documentary/interview content
✓ Authenticity of original voice matters
✓ Lowest budget option
✓ Markets prefer subtitles (Nordics, Netherlands)
✓ Legal/compliance content (exact words matter)

### Hybrid Approach
Hero content → Traditional dubbing
Supporting content → AI localization
Supplementary → Subtitles
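
The decision matrix above can be read as a small rule set. The following is a minimal sketch under stated assumptions, not a definitive policy: the `ContentProfile` fields and the rule order are hypothetical simplifications, and only the $1,000-per-language threshold comes from the matrix itself.

```python
from dataclasses import dataclass


@dataclass
class ContentProfile:
    """Hypothetical summary of one piece of content (field names are illustrative)."""
    emotion_critical: bool       # character-driven, emotional performance matters
    exact_words_matter: bool     # legal/compliance text or original-voice authenticity
    budget_per_language: float   # USD available per language


def recommend_approach(profile: ContentProfile) -> str:
    """Return 'subtitles', 'traditional dubbing', or 'AI localization'."""
    # Exact wording or original-voice authenticity points to subtitles.
    if profile.exact_words_matter:
        return "subtitles"
    # Emotional content with a $1,000+ per-language budget points to dubbing.
    if profile.emotion_critical and profile.budget_per_language >= 1000:
        return "traditional dubbing"
    # Everything else (informational, many languages, tight budget) goes to AI.
    return "AI localization"


print(recommend_approach(ContentProfile(False, False, 300.0)))
```

A real project would usually mix the three outcomes per asset, as the hybrid approach suggests, by calling such a rule once per piece of content rather than once per campaign.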

Step 2: Select Languages Strategically

Prioritize languages based on market opportunity.

## Language Prioritization Framework

### Tier 1: High Volume Languages (500M+ speakers)
| Language | Global Speakers | Key Markets |
|----------|----------------|-------------|
| English | 1.5B | Global |
| Mandarin | 1.1B | China |
| Spanish | 550M | LATAM, Spain |
| Hindi | 600M | India |

### Tier 2: High Value Languages
| Language | Economic Value | Markets |
|----------|---------------|---------|
| German | High GDP | DACH |
| French | Broad Francophone reach | France, Africa |
| Japanese | High spending | Japan |
| Portuguese | Large market | Brazil |

### Tier 3: Strategic Languages
| Language | Strategic Value | Markets |
|----------|----------------|---------|
| Arabic | Growing middle class | MENA |
| Korean | Tech-forward | South Korea |
| Italian | Fashion/luxury | Italy |
| Dutch | Affluent market, high English proficiency | Benelux |

### ElevenLabs Supported Languages (29+)
English, Spanish, French, German, Italian, Portuguese,
Polish, Dutch, Hindi, Arabic, Chinese, Japanese, Korean,
Turkish, Swedish, Indonesian, Filipino, Malay, Russian,
Czech, Danish, Finnish, Greek, Romanian, Ukrainian,
Vietnamese, Norwegian, Hungarian, Tamil, and more.

Step 3: Prepare Content for Localization

Translation alone isn't enough—prepare for voice adaptation.

## Content Preparation Checklist

### Script Adaptation

**Text expansion/contraction**:
| Language | vs English |
|----------|-----------|
| German | ~30% longer |
| French | 15-20% longer |
| Spanish | 15-25% longer |
| Chinese | ~30% shorter |
| Japanese | Variable |

**Implications**:
- Video may need re-timing
- Allow flexibility in pacing
- Consider sentence splitting for longer languages

**Localization notes to provide**:
□ Brand terms (don't translate, keep English)
□ Product names (pronunciation guide)
□ Numbers (format varies by locale)
□ Dates (format varies by locale)
□ Currency (localize amounts)
□ Cultural references (adapt or explain)

### Voice Consistency Notes

**Preserve across languages**:
- Character/personality
- Energy level
- Authority/warmth balance
- Pace relative to content

**Adapt per language**:
- Natural rhythm and cadence
- Pronunciation of brand terms
- Formal/informal register (varies by culture)
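
To see which languages are likely to need re-timing, the expansion table above can be turned into a rough estimate. This is a sketch only: it assumes spoken duration scales with text length at unchanged pacing, uses midpoints of the ranges in the table, and the 10% tolerance is an arbitrary illustrative threshold.

```python
# Expansion factors relative to English, using midpoints of the table's ranges.
# Japanese is listed as "Variable", so it is deliberately left out.
EXPANSION = {
    "German": 1.30,
    "French": 1.175,
    "Spanish": 1.20,
    "Chinese": 0.70,
}


def estimated_duration(english_seconds: float, language: str) -> float:
    """Estimate spoken duration if pacing stays the same as the English source."""
    return english_seconds * EXPANSION[language]


def needs_retiming(english_seconds: float, language: str, tolerance: float = 0.10) -> bool:
    """Flag a language whose estimate overruns the source by more than `tolerance`."""
    overrun = estimated_duration(english_seconds, language) / english_seconds - 1.0
    return overrun > tolerance


for lang in EXPANSION:
    print(lang, round(estimated_duration(60.0, lang), 1), needs_retiming(60.0, lang))
```

Languages flagged here are candidates for sentence splitting or a tighter translation before voice generation, rather than simply speeding up the voice.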

Step 4: Production Workflow

Efficient process for multilingual voice production.

## Multilingual Production Pipeline

### Phase 1: Source Production
1. Finalize English script
2. Record/generate English voice
3. Lock timing and pacing
4. Create master video/audio

### Phase 2: Translation
1. Professional translation (not machine)
2. Localization review (cultural adaptation)
3. Timing adaptation (fit original duration)
4. Brand term glossary enforcement

### Phase 3: Voice Generation

**Per language**:
  1. Load translated script
  2. Apply same voice settings as source
  3. Generate voice in target language
  4. Check pronunciation of brand terms
  5. Adjust pacing if needed
  6. Review for naturalness

### Phase 4: Quality Control

**Native speaker review checklist**:
□ Natural pronunciation
□ Correct emphasis and intonation
□ Brand terms handled correctly
□ No awkward phrasing
□ Appropriate formality level
□ Cultural appropriateness

### Phase 5: Integration
1. Replace audio track in video
2. Re-sync if timing changed
3. Update text overlays
4. Localize captions/subtitles
5. Final review per language
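
Phase 3 asks for the same voice settings in every language. One way to enforce that is to build every request from a single shared settings object. The sketch below only constructs the HTTP requests and does not send them. The endpoint path, the `xi-api-key` header, the `model_id` value, and the `voice_settings` field names reflect the ElevenLabs public text-to-speech API as best understood here and should be checked against the current platform documentation. The voice ID, API key, settings values, and scripts are placeholders.

```python
import json
import urllib.request

# Assumed endpoint; verify against the current ElevenLabs API reference.
API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"

# One settings object reused for every language so the voice character stays constant.
VOICE_SETTINGS = {"stability": 0.5, "similarity_boost": 0.75}  # illustrative values


def build_request(voice_id: str, api_key: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request for one translated script."""
    body = {
        "text": text,
        "model_id": "eleven_multilingual_v2",  # assumed multilingual model name
        "voice_settings": VOICE_SETTINGS,
    }
    return urllib.request.Request(
        url=f"{API_BASE}/{voice_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder translated scripts keyed by language code.
scripts = {"es": "Hola y bienvenidos.", "fr": "Bonjour et bienvenue."}
requests_by_lang = {
    lang: build_request("YOUR_VOICE_ID", "YOUR_API_KEY", text)
    for lang, text in scripts.items()
}
```

Sending each request with `urllib.request.urlopen` would return audio bytes to write to a per-language file. That step is omitted here because it needs a real key and voice ID.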

Step 5: Quality Assurance

Ensure each language meets standards.

## Localization QA Framework

### Technical QA
□ Audio levels consistent across languages
□ No clipping or distortion
□ Background music balanced correctly
□ Transitions smooth
□ Sync with video acceptable

### Linguistic QA
□ Translation accuracy (spot check 10%)
□ Natural flow and rhythm
□ Brand voice maintained
□ Technical terms correct
□ No machine-translation artifacts

### Cultural QA
□ No offensive content for market
□ References appropriate
□ Humor/idioms adapted correctly
□ Visual content appropriate
□ Call-to-action localized

### Native Speaker Sign-Off
For each language:
- [ ] Spanish (Reviewer: _____) ☐ Approved
- [ ] French (Reviewer: _____) ☐ Approved
- [ ] German (Reviewer: _____) ☐ Approved
- [ ] [Add languages...]
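
The 10% translation spot check in the linguistic QA list is easier to audit when the sample is chosen reproducibly. A minimal sketch, assuming the script is already split into numbered segments. The function name, the fixed seed, and the segment count are illustrative.

```python
import math
import random


def spot_check_sample(segment_ids, percent=10, seed=0):
    """Pick a reproducible random subset of segments for translation review."""
    ids = list(segment_ids)
    if not ids:
        return []
    # Round up so that short scripts still get at least one segment reviewed.
    count = max(1, math.ceil(len(ids) * percent / 100))
    rng = random.Random(seed)  # fixed seed so reviewers can re-derive the sample
    return sorted(rng.sample(ids, count))


segments = list(range(1, 201))  # e.g. 200 segments in one language version
to_review = spot_check_sample(segments)
print(len(to_review), "segments selected for review")
```

Using a different seed per language avoids always reviewing the same positions across every version.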

Step 6: Calculate ROI

Compare AI localization to traditional approaches.

## Localization Cost Comparison

### Traditional Dubbing (per language)

| Component | Cost |
|-----------|------|
| Translation | $0.15/word |
| Voice talent | $300-1,000/hour finished |
| Studio time | $100-200/hour |
| Direction | $50-100/hour |
| Engineering | $50-100/hour |

**Example**: 10-minute video (1,500 words)
- Translation: $225
- Voice talent: $400
- Studio: $200
- Direction: $150
- Engineering: $100
- **Total: ~$1,075 per language**

### AI Voice Localization

| Component | Cost |
|-----------|------|
| Translation | $0.15/word |
| ElevenLabs Pro | $99/mo (monthly usage quota; check current pricing) |
| QA review | $50-100/language |

**Example**: 10-minute video (1,500 words)
- Translation: $225
- Voice generation: ~$0 (within plan)
- QA review: $75
- **Total: ~$300 per language**

### ROI Summary

| Languages | Traditional | AI Localization | Savings |
|-----------|-------------|-----------------|---------|
| 5 | $5,375 | $1,500 | 72% |
| 10 | $10,750 | $3,000 | 72% |
| 20 | $21,500 | $6,000 | 72% |

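**Bottom line**: In this example, AI localization saves about 72% per language compared with traditional dubbing, and the percentage stays constant as languages are added because both costs scale linearly.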
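
The ROI summary can be reproduced from the per-language line items, which makes it easy to plug in your own rates. This is a sketch using only the figures from the two 10-minute examples above. As in the table, the monthly plan fee is left out of the per-language cost.

```python
WORDS = 1_500                # 10-minute video
TRANSLATION_PER_WORD = 0.15  # USD

# Per-language line items (USD) from the two worked examples.
TRADITIONAL_EXTRAS = {"voice talent": 400, "studio": 200, "direction": 150, "engineering": 100}
AI_EXTRAS = {"voice generation": 0, "qa review": 75}


def per_language_cost(extras: dict) -> float:
    """Translation cost plus the approach-specific line items."""
    return WORDS * TRANSLATION_PER_WORD + sum(extras.values())


def savings_table(language_counts):
    """Return (languages, traditional, ai, savings_percent) rows."""
    rows = []
    for n in language_counts:
        traditional = n * per_language_cost(TRADITIONAL_EXTRAS)
        ai = n * per_language_cost(AI_EXTRAS)
        rows.append((n, traditional, ai, round(100 * (1 - ai / traditional))))
    return rows


for row in savings_table([5, 10, 20]):
    print(row)
```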

Examples

Example 1: E-Learning Course Localization

Context: Localizing a 20-video course to 5 languages.

Input:

We have a 20-video SaaS training course in English (total 3 hours). Need to localize to Spanish, French, German, Portuguese, and Japanese. What's the best approach?

Output:

Course Localization Plan

Project Scope

  • Content: 20 videos, 3 hours total
  • Source: English
  • Targets: Spanish, French, German, Portuguese, Japanese
  • Content type: Educational (clarity > emotion)

Recommended Approach: AI Voice Localization

Why AI is ideal here:

  • Educational content (fact delivery, not drama)
  • 5 languages (efficiency critical)
  • Likely ongoing updates (need repeatable process)
  • Consistent "instructor" voice across all versions

Implementation Plan

Platform: ElevenLabs Pro ($99/mo)

Voice Strategy:

  • Clone original instructor OR design consistent voice
  • Same voice speaks all 6 languages (English plus the 5 targets)
  • Maintains authority and teaching style

Workflow per Video:

  1. Translation (outsource)

    • Professional translation (not machine)
    • Provide glossary of product terms
    • Timing guidance for text expansion
  2. Voice Generation

    • Generate all 5 languages from same voice
    • ~15 minutes per language per video
    • Total: ~25 hours voice generation
  3. QA Review

    • Native speaker review per language
    • Focus: pronunciation, naturalness, accuracy
    • Budget 1 hour review per language per video
  4. Video Integration

    • Replace audio tracks
    • Adjust timing if needed
    • Verify captions match

Timeline

| Phase | Duration | Notes |
|-------|----------|-------|
| Translation (all) | 2 weeks | Parallel |
| Voice generation | 1 week | ~5 hours/day |
| QA review | 2 weeks | Parallel per language |
| Integration | 1 week | Parallel |
| Total | 5-6 weeks | With buffer |

Budget Estimate

| Item | Cost |
|------|------|
| Translation (5 lang × 27k words) | $20,250 |
| ElevenLabs (2 months) | $200 |
| QA review (5 lang × 20 videos × $30) | $3,000 |
| Integration (20 hours @ $50) | $1,000 |
| Total | ~$24,500 |

vs. Traditional dubbing: ~$107,500 (20 videos × 5 languages × ~$1,075 each), about 77% savings
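
The figures in this plan can be re-derived in a few lines, which helps when the course length or language count changes. A sketch under one added assumption: a speaking rate of 150 words per minute, which is what reproduces the plan's 27k-word count. Everything else is taken from the plan above.

```python
HOURS = 3
WORDS_PER_MINUTE = 150  # assumed speaking rate; yields the plan's 27k words
LANGUAGES = 5
VIDEOS = 20

words = HOURS * 60 * WORDS_PER_MINUTE            # words per language
generation_hours = VIDEOS * LANGUAGES * 15 / 60  # 15 minutes per language per video

budget = {
    "translation": LANGUAGES * words * 0.15,  # $0.15 per word
    "elevenlabs": 200,                        # two months of the plan
    "qa_review": LANGUAGES * VIDEOS * 30,     # $30 per language per video
    "integration": 20 * 50,                   # 20 hours at $50
}
total = sum(budget.values())

traditional = 107_500  # traditional dubbing estimate quoted in the plan
savings_pct = round(100 * (1 - total / traditional))

print(words, generation_hours, round(total), savings_pct)
```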


Example 2: Global Ad Campaign

Context: 30-second ad needs localization for global launch.

Input:

We're launching a 30-second video ad globally. Need 10 language versions in 2 weeks. Budget is $15,000.

Output:

Ad Localization: Rapid Global Launch

Constraints

  • 30-second ad (tight timing)
  • 10 languages
  • 2-week deadline
  • $15,000 budget (~$1,500/language)

Recommended: Hybrid Approach

Tier 1 (Hero Markets) - Traditional Dubbing

  • English (source)
  • Spanish (largest reach)
  • German (high value)
  • French (high value)

Tier 2 (Scale Markets) - AI Localization

  • Portuguese, Italian, Dutch, Polish, Japanese, Korean

Rationale

  • Hero markets get premium treatment
  • AI handles scale efficiently
  • Both meet deadline

Production Schedule

Week 1:

| Day | Task |
|-----|------|
| 1-2 | All translations complete |
| 2-3 | Traditional dubbing sessions (4 languages) |
| 3-4 | AI voice generation (6 languages) |
| 4-5 | QA review all versions |

Week 2:

| Day | Task |
|-----|------|
| 1-2 | Revisions and fixes |
| 3-4 | Video integration all versions |
| 5 | Final review and delivery |

Budget Allocation

| Item | Cost |
|------|------|
| Translation (10 × ~120 words @ $0.15/word) | $180 |
| Traditional dubbing (4 lang) | $4,800 |
| AI generation (6 lang) | $600 |
| QA review (10 lang) | $2,000 |
| Integration (10 lang) | $2,500 |
| Buffer | $4,920 |
| Total | $15,000 |

Checklists & Templates

Localization Project Checklist

## Pre-Production
□ Languages selected and prioritized
□ Budget allocated per language
□ Timeline established
□ Translation vendor selected
□ Brand glossary prepared
□ Voice consistency plan defined

## Production
□ Translations complete
□ Translations reviewed for brand terms
□ Voice generated per language
□ Pronunciation verified
□ Timing adjusted if needed

## Quality Assurance
□ Native speaker review complete
□ Technical QA passed
□ Brand guidelines verified
□ Cultural review passed
□ Legal/compliance check (if needed)

## Delivery
□ Files named correctly per language
□ All formats delivered
□ Captions/subtitles provided
□ Documentation complete
□ Source files archived

Brand Glossary Template

## [Brand] Localization Glossary

### Never Translate
| English | Note |
|---------|------|
| [Brand Name] | Keep English, pronunciation: [X] |
| [Product Name] | Keep English |
| [Feature Name] | Keep English, explain in context |

### Translate Consistently
| English | Spanish | French | German |
|---------|---------|--------|--------|
| Dashboard | Panel | Tableau de bord | Dashboard |
| Workflow | Flujo de trabajo | Flux de travail | Arbeitsablauf |
| [Term] | | | |

### Pronunciation Guide
| Term | Pronunciation |
|------|--------------|
| [Brand] | /brănd/ |
| [Feature] | /fē-chər/ |
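
A glossary like the template above can also be checked automatically before voice generation. The following is a minimal sketch, assuming plain-text scripts. The brand names `Acme Cloud` and `FlowBoard` are hypothetical placeholders, and the simple substring matching will miss inflected forms, so it complements the native speaker review rather than replacing it.

```python
import re

# Hypothetical brand and product names that must stay in English.
NEVER_TRANSLATE = ["Acme Cloud", "FlowBoard"]

# Approved renderings per language, taken from the template's Spanish column.
TRANSLATE_AS = {"es": {"Dashboard": "Panel", "Workflow": "Flujo de trabajo"}}


def glossary_issues(source: str, translated: str, lang: str) -> list[str]:
    """List glossary violations found in one translated script."""
    issues = []
    for term in NEVER_TRANSLATE:
        # A protected term used in the source must appear unchanged in the target.
        if term in source and term not in translated:
            issues.append(f"'{term}' should stay in English")
    for english, target in TRANSLATE_AS.get(lang, {}).items():
        used_in_source = re.search(rf"\b{re.escape(english)}\b", source, re.IGNORECASE)
        if used_in_source and target.lower() not in translated.lower():
            issues.append(f"'{english}' should be rendered as '{target}'")
    return issues


source = "Open the Dashboard in Acme Cloud."
print(glossary_issues(source, "Abra el Tablero en Nube Acme.", "es"))
```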

Skill Boundaries

What This Skill Does Well

  • Structuring audio production workflows
  • Providing technical guidance
  • Creating quality checklists
  • Suggesting creative approaches

What This Skill Cannot Do

  • Replace audio engineering expertise
  • Make subjective creative decisions
  • Access or edit audio files directly
  • Guarantee commercial success

References

  • ElevenLabs. "Multilingual Voice Synthesis" - Platform documentation
  • CSA Research. "Global Content Strategy" - Localization best practices
  • Unbabel. "The State of Localization" - Industry benchmarks
  • Nimdzi. "Localization Market Research" - Cost and ROI data

Skill Metadata (Internal Use)

name: voice-localization
category: audio
subcategory: voice
version: 1.0
author: MKTG Skills
source_expert: ElevenLabs, Localization Best Practices
source_work: Multilingual Content Production
difficulty: intermediate
estimated_value: 70%+ cost savings vs. traditional dubbing
tags: [localization, multilingual, dubbing, ai-voice, global]
created: 2026-01-26
updated: 2026-01-26

GitHub Repository

guia-matthieu/clawfu-skills
Path: skills/audio/voice-localization
Tags: ai-skills, anthropic, claude-code, claude-skills, marketing, mcp-server
