SKILL·092F2C

tts

Name: tts
Author: NoizAI

NoizAI

Updated 1 month ago

9 views

517

View on GitHub

Metapdf

About

The TTS skill converts text into speech audio, supporting both simple generation and timeline-accurate audio aligned to subtitles. It handles voice cloning, emotion/speed control, and can process documents like EPUBs, PDFs, or SRT files into audio. Use it when users request text-to-speech, voiceovers, dubbing, or converting written content to spoken audio.

Quick Install

Claude Code

Recommended

Primary

npx skills add NoizAI/skills -a claude-code

Plugin CommandAlternative

/plugin add https://github.com/NoizAI/skills

Git CloneAlternative

git clone https://github.com/NoizAI/skills.git ~/.claude/skills/tts

Copy and paste this command in Claude Code to install this skill

Documentation

tts

Convert any text into speech audio. Supports two backends (Kokoro local, Noiz cloud), two modes (simple or timeline-accurate), and per-segment voice control.

Triggers

text to speech / tts / speak / say
voice clone / dubbing
epub to audio / srt to audio / convert to audio
语音 / 说 / 讲 / 说话

Simple Mode — text to audio

speak is the default — the subcommand can be omitted:

# Basic usage (speak is implicit)
python3 skills/tts/scripts/tts.py -t "Hello world"          # add -o path to save
python3 skills/tts/scripts/tts.py -f article.txt -o out.mp3

# Voice cloning — local file path or URL
python3 skills/tts/scripts/tts.py -t "Hello" --ref-audio ./ref.wav
python3 skills/tts/scripts/tts.py -t "Hello" --ref-audio https://example.com/my_voice.wav -o clone.wav

# Voice message format
python3 skills/tts/scripts/tts.py -t "Hello" --format opus -o voice.opus
python3 skills/tts/scripts/tts.py -t "Hello" --format ogg -o voice.ogg

Third-party integration (Feishu/Telegram/Discord) is documented in ref_3rd_party.md.

Timeline Mode — SRT to time-aligned audio

For precise per-segment timing (dubbing, subtitles, video narration).

Step 1: Get or create an SRT

If the user doesn't have one, generate from text:

python3 skills/tts/scripts/tts.py to-srt -i article.txt -o article.srt
python3 skills/tts/scripts/tts.py to-srt -i article.txt -o article.srt --cps 15 --gap 500

--cps = characters per second (default 4, good for Chinese; ~15 for English). The agent can also write SRT manually.

Step 2: Create a voice map

JSON file controlling default + per-segment voice settings. segments keys support single index "3" or range "5-8".

Kokoro voice map:

{
  "default": { "voice": "zf_xiaoni", "lang": "cmn" },
  "segments": {
    "1": { "voice": "zm_yunxi" },
    "5-8": { "voice": "af_sarah", "lang": "en-us", "speed": 0.9 }
  }
}

Noiz voice map (adds emo, reference_audio support). reference_audio can be a local path or a URL (user’s own audio; Noiz only):

{
  "default": { "voice_id": "voice_123", "target_lang": "zh" },
  "segments": {
    "1": { "voice_id": "voice_host", "emo": { "Joy": 0.6 } },
    "2-4": { "reference_audio": "./refs/guest.wav" }
  }
}

Dynamic Reference Audio Slicing: If you are translating or dubbing a video and want each sentence to automatically use the audio from the original video at the exact same timestamp as its reference audio, use the --ref-audio-track argument instead of setting reference_audio in the map:

python3 skills/tts/scripts/tts.py render --srt input.srt --voice-map vm.json --ref-audio-track original_video.mp4 -o output.wav

See examples/ for full samples.

Step 3: Render

python3 skills/tts/scripts/tts.py render --srt input.srt --voice-map vm.json -o output.wav
python3 skills/tts/scripts/tts.py render --srt input.srt --voice-map vm.json --backend noiz --auto-emotion -o output.wav

When to Choose Which

Need	Recommended
Just read text aloud, no fuss	Kokoro (default)
EPUB/PDF audiobook with chapters	Kokoro (native support)
Voice blending (`"v1:60,v2:40"`)	Kokoro
Voice cloning from reference audio	Noiz
Emotion control (`emo` param)	Noiz
Exact server-side duration per segment	Noiz

When the user needs emotion control + voice cloning + precise duration together, Noiz is the only backend that supports all three.

Guest Mode (no API key)

When no API key is configured, tts.py automatically falls back to guest mode — a limited Noiz endpoint that requires no authentication. Guest mode only supports --voice-id, --speed, and --format; voice cloning, emotion, duration, and timeline rendering are not available.

# Guest mode (auto-detected when no API key is set)
python3 skills/tts/scripts/tts.py -t "Hello" --voice-id 883b6b7c -o hello.wav

# Explicit backend override to use kokoro instead
python3 skills/tts/scripts/tts.py -t "Hello" --backend kokoro

Available guest voices (15 built-in):

voice_id	name	lang	gender	tone
`063a4491`	販売員（なおみ）	ja	F	喜び
`4252b9c8`	落ち着いた女性	ja	F	穏やか
`578b4be2`	熱血漢（たける）	ja	M	怒り
`a9249ce7`	安らぎ（みなと）	ja	M	穏やか
`f00e45a1`	旅人（かいと）	ja	M	穏やか
`b4775100`	悦悦｜社交分享	zh	F	Joyful
`77e15f2c`	婉青｜情绪抚慰	zh	F	Calm
`ac09aeb4`	阿豪｜磁性主持	zh	M	Calm
`87cb2405`	建国｜知识科普	zh	M	Calm
`3b9f1e27`	小明｜科技达人	zh	M	Joyful
`95814add`	Science Narration	en	M	Calm
`883b6b7c`	The Mentor (Alex)	en	M	Joyful
`a845c7de`	The Naturalist (Silas)	en	M	Calm
`5a68d66b`	The Healer (Serena)	en	F	Calm
`0e4ab6ec`	The Mentor (Maya)	en	F	Calm

Security & data disclosure

This skill performs the following file and network operations at runtime:

Credential storage: When you run config --set-api-key, the key is saved to ~/.config/noiz/api_key (permissions 0600). The NOIZ_API_KEY environment variable is also supported as an alternative.
Legacy key migration: If ~/.noiz_api_key exists and ~/.config/noiz/api_key does not, the key is copied (not deleted) to the new location. A message is printed; the old file is left untouched for you to remove manually.
Network calls (Noiz backend): Text and optional reference audio are uploaded to https://noiz.ai/v1/ for synthesis. No data is sent unless you invoke a Noiz command.
Reference audio download: When --ref-audio is a URL, the file is downloaded to a temp file, used for the API call, then deleted. If no voice-id or ref-audio is provided, a default reference audio is downloaded from storage.googleapis.com or noiz.ai.
Temp files: Temporary audio/text files may be created during synthesis and are cleaned up after use.
ffmpeg: Invoked only in timeline render mode to assemble the final audio.

No files outside the output path and ~/.config/noiz/ are modified. The Kokoro backend runs entirely offline with no network access.

Requirements

ffmpeg in PATH (timeline mode only)
requests package: uv pip install requests (required for Noiz backend)
Get your API key at Noiz Developer, then run python3 skills/tts/scripts/tts.py config --set-api-key YOUR_KEY (guest mode works without a key but has limited features)
Kokoro: if already installed, pass --backend kokoro to use the local backend

Noiz API authentication

Use only the base64-encoded API key as Authorization—no prefix (e.g. no APIKEY or Bearer ). Any prefix causes 401.

For backend details and full argument reference, see reference.md.

GitHub Repository

NoizAI/skills

Path: skills/tts

FAQ

Frequently asked questions

What is the tts skill?

tts is a Claude Skill by NoizAI. Skills package instructions and resources that Claude loads on demand, so Claude can perform tts-related tasks without extra prompting.

How do I install tts?

Use the install commands on this page: add tts to Claude Code as a plugin, or clone its repository into your skills directory, then restart Claude so it picks up the skill.

What category does tts belong to?

tts is in the Meta category, tagged pdf.

Is tts free to use?

Yes. tts is listed on AIMCP and free to install. It runs inside Claude, so no separate service account is required to use the skill itself.

Related Skills

content-collections

Meta

This skill provides a production-tested setup for Content Collections, a TypeScript-first tool that transforms Markdown/MDX files into type-safe data collections with Zod validation. Use it when building blogs, documentation sites, or content-heavy Vite + React applications to ensure type safety and automatic content validation. It covers everything from Vite plugin configuration and MDX compilation to deployment optimization and schema validation.

View skill

polymarket

Meta

This skill enables developers to build applications with the Polymarket prediction markets platform, including API integration for trading and market data. It also provides real-time data streaming via WebSocket to monitor live trades and market activity. Use it for implementing trading strategies or creating tools that process live market updates.

View skill

creating-opencode-plugins

Meta

This skill helps developers create OpenCode plugins that hook into 25+ event types like commands, files, and LSP operations. It provides the plugin structure, event API specifications, and implementation patterns for JavaScript/TypeScript modules. Use it when you need to intercept, monitor, or extend the OpenCode AI assistant's lifecycle with custom event-driven logic.

View skill

sglang

Meta

SGLang is a high-performance LLM serving framework that specializes in fast, structured generation for JSON, regex, and agentic workflows using its RadixAttention prefix caching. It delivers significantly faster inference, especially for tasks with repeated prefixes, making it ideal for complex, structured outputs and multi-turn conversations. Choose SGLang over alternatives like vLLM when you need constrained decoding or are building applications with extensive prefix sharing.

View skill