SKILL·092F2C

tts

Name: tts
Author: NoizAI

NoizAI

업데이트됨 1 month ago

9 조회

517

GitHub에서 보기

메타pdf

정보

TTS 스킬은 텍스트를 음성 오디오로 변환하며, 단순 생성과 자막에 맞춘 타임라인 정확 오디오 생성을 모두 지원합니다. 음성 복제, 감정/속도 제어를 처리하며, EPUB, PDF, SRT 파일과 같은 문서를 오디오로 변환할 수 있습니다. 사용자가 텍스트 음성 변환, 보이스오버, 더빙 또는 작성된 콘텐츠를 음성 오디오로 전환해 달라고 요청할 때 사용하세요.

빠른 설치

Claude Code

문서

tts

Convert any text into speech audio. Supports two backends (Kokoro local, Noiz cloud), two modes (simple or timeline-accurate), and per-segment voice control.

Triggers

text to speech / tts / speak / say
voice clone / dubbing
epub to audio / srt to audio / convert to audio
语音 / 说 / 讲 / 说话

Simple Mode — text to audio

speak is the default — the subcommand can be omitted:

# Basic usage (speak is implicit)
python3 skills/tts/scripts/tts.py -t "Hello world"          # add -o path to save
python3 skills/tts/scripts/tts.py -f article.txt -o out.mp3

# Voice cloning — local file path or URL
python3 skills/tts/scripts/tts.py -t "Hello" --ref-audio ./ref.wav
python3 skills/tts/scripts/tts.py -t "Hello" --ref-audio https://example.com/my_voice.wav -o clone.wav

# Voice message format
python3 skills/tts/scripts/tts.py -t "Hello" --format opus -o voice.opus
python3 skills/tts/scripts/tts.py -t "Hello" --format ogg -o voice.ogg

Third-party integration (Feishu/Telegram/Discord) is documented in ref_3rd_party.md.

Timeline Mode — SRT to time-aligned audio

For precise per-segment timing (dubbing, subtitles, video narration).

Step 1: Get or create an SRT

If the user doesn't have one, generate from text:

python3 skills/tts/scripts/tts.py to-srt -i article.txt -o article.srt
python3 skills/tts/scripts/tts.py to-srt -i article.txt -o article.srt --cps 15 --gap 500

--cps = characters per second (default 4, good for Chinese; ~15 for English). The agent can also write SRT manually.

Step 2: Create a voice map

JSON file controlling default + per-segment voice settings. segments keys support single index "3" or range "5-8".

Kokoro voice map:

{
  "default": { "voice": "zf_xiaoni", "lang": "cmn" },
  "segments": {
    "1": { "voice": "zm_yunxi" },
    "5-8": { "voice": "af_sarah", "lang": "en-us", "speed": 0.9 }
  }
}

Noiz voice map (adds emo, reference_audio support). reference_audio can be a local path or a URL (user’s own audio; Noiz only):

{
  "default": { "voice_id": "voice_123", "target_lang": "zh" },
  "segments": {
    "1": { "voice_id": "voice_host", "emo": { "Joy": 0.6 } },
    "2-4": { "reference_audio": "./refs/guest.wav" }
  }
}

Dynamic Reference Audio Slicing: If you are translating or dubbing a video and want each sentence to automatically use the audio from the original video at the exact same timestamp as its reference audio, use the --ref-audio-track argument instead of setting reference_audio in the map:

python3 skills/tts/scripts/tts.py render --srt input.srt --voice-map vm.json --ref-audio-track original_video.mp4 -o output.wav

See examples/ for full samples.

Step 3: Render

python3 skills/tts/scripts/tts.py render --srt input.srt --voice-map vm.json -o output.wav
python3 skills/tts/scripts/tts.py render --srt input.srt --voice-map vm.json --backend noiz --auto-emotion -o output.wav

When to Choose Which

Need	Recommended
Just read text aloud, no fuss	Kokoro (default)
EPUB/PDF audiobook with chapters	Kokoro (native support)
Voice blending (`"v1:60,v2:40"`)	Kokoro
Voice cloning from reference audio	Noiz
Emotion control (`emo` param)	Noiz
Exact server-side duration per segment	Noiz

When the user needs emotion control + voice cloning + precise duration together, Noiz is the only backend that supports all three.

Guest Mode (no API key)

When no API key is configured, tts.py automatically falls back to guest mode — a limited Noiz endpoint that requires no authentication. Guest mode only supports --voice-id, --speed, and --format; voice cloning, emotion, duration, and timeline rendering are not available.

# Guest mode (auto-detected when no API key is set)
python3 skills/tts/scripts/tts.py -t "Hello" --voice-id 883b6b7c -o hello.wav

# Explicit backend override to use kokoro instead
python3 skills/tts/scripts/tts.py -t "Hello" --backend kokoro

Available guest voices (15 built-in):

voice_id	name	lang	gender	tone
`063a4491`	販売員（なおみ）	ja	F	喜び
`4252b9c8`	落ち着いた女性	ja	F	穏やか
`578b4be2`	熱血漢（たける）	ja	M	怒り
`a9249ce7`	安らぎ（みなと）	ja	M	穏やか
`f00e45a1`	旅人（かいと）	ja	M	穏やか
`b4775100`	悦悦｜社交分享	zh	F	Joyful
`77e15f2c`	婉青｜情绪抚慰	zh	F	Calm
`ac09aeb4`	阿豪｜磁性主持	zh	M	Calm
`87cb2405`	建国｜知识科普	zh	M	Calm
`3b9f1e27`	小明｜科技达人	zh	M	Joyful
`95814add`	Science Narration	en	M	Calm
`883b6b7c`	The Mentor (Alex)	en	M	Joyful
`a845c7de`	The Naturalist (Silas)	en	M	Calm
`5a68d66b`	The Healer (Serena)	en	F	Calm
`0e4ab6ec`	The Mentor (Maya)	en	F	Calm

Security & data disclosure

This skill performs the following file and network operations at runtime:

Credential storage: When you run config --set-api-key, the key is saved to ~/.config/noiz/api_key (permissions 0600). The NOIZ_API_KEY environment variable is also supported as an alternative.
Legacy key migration: If ~/.noiz_api_key exists and ~/.config/noiz/api_key does not, the key is copied (not deleted) to the new location. A message is printed; the old file is left untouched for you to remove manually.
Network calls (Noiz backend): Text and optional reference audio are uploaded to https://noiz.ai/v1/ for synthesis. No data is sent unless you invoke a Noiz command.
Reference audio download: When --ref-audio is a URL, the file is downloaded to a temp file, used for the API call, then deleted. If no voice-id or ref-audio is provided, a default reference audio is downloaded from storage.googleapis.com or noiz.ai.
Temp files: Temporary audio/text files may be created during synthesis and are cleaned up after use.
ffmpeg: Invoked only in timeline render mode to assemble the final audio.

No files outside the output path and ~/.config/noiz/ are modified. The Kokoro backend runs entirely offline with no network access.

Requirements

ffmpeg in PATH (timeline mode only)
requests package: uv pip install requests (required for Noiz backend)
Get your API key at Noiz Developer, then run python3 skills/tts/scripts/tts.py config --set-api-key YOUR_KEY (guest mode works without a key but has limited features)
Kokoro: if already installed, pass --backend kokoro to use the local backend

Noiz API authentication

Use only the base64-encoded API key as Authorization—no prefix (e.g. no APIKEY or Bearer ). Any prefix causes 401.

For backend details and full argument reference, see reference.md.

GitHub 저장소

NoizAI/skills

경로: skills/tts

FAQ

Frequently asked questions

What is the tts skill?

tts is a Claude Skill by NoizAI. Skills package instructions and resources that Claude loads on demand, so Claude can perform tts-related tasks without extra prompting.

How do I install tts?

Use the install commands on this page: add tts to Claude Code as a plugin, or clone its repository into your skills directory, then restart Claude so it picks up the skill.

What category does tts belong to?

tts is in the Meta category, tagged pdf.

Is tts free to use?

Yes. tts is listed on AIMCP and free to install. It runs inside Claude, so no separate service account is required to use the skill itself.

연관 스킬

content-collections

메타

이 스킬은 콘텐츠 콜렉션(Content Collections)을 위한 프로덕션 검증된 설정을 제공합니다. 콘텐츠 콜렉션은 Markdown/MDX 파일을 Zod 검증이 포함된 타입 안전한 데이터 콜렉션으로 변환해주는 TypeScript 최우선 도구입니다. 블로그, 문서 사이트 또는 콘텐츠 중심의 Vite + React 애플리케이션을 구축할 때 타입 안전성과 자동 콘텐츠 검증을 보장하기 위해 사용하세요. Vite 플러그인 구성과 MDX 컴파일부터 배포 최적화 및 스키마 검증에 이르기까지 모든 것을 다룹니다.

스킬 보기

polymarket

메타

이 스킬은 개발자들이 Polymarket 예측 시장 플랫폼을 활용한 애플리케이션을 구축할 수 있도록 지원하며, 거래 및 시장 데이터를 위한 API 통합 기능을 포함합니다. 또한 WebSocket을 통한 실시간 데이터 스트리밍을 제공하여 실시간 거래와 시장 활동을 모니터링할 수 있습니다. 이를 통해 거래 전략을 구현하거나 실시간 시장 업데이트를 처리하는 도구를 생성하는 데 활용할 수 있습니다.

스킬 보기

creating-opencode-plugins

메타

이 스킬은 개발자들이 명령어, 파일, LSP 작업 등 25개 이상의 이벤트 유형에 연결되는 OpenCode 플러그인을 만들 수 있도록 돕습니다. JavaScript/TypeScript 모듈을 위한 플러그인 구조, 이벤트 API 명세, 구현 패턴을 제공합니다. OpenCode AI 어시스턴트의 라이프사이클을 사용자 정의 이벤트 기반 로직으로 가로채거나, 모니터링하거나, 확장해야 할 때 사용하세요.

스킬 보기

sglang

메타

SGLang은 RadixAttention 프리픽스 캐싱을 활용하여 JSON, 정규식, 에이전트 워크플로우를 위한 고속 구조화 생성에 특화된 고성능 LLM 서빙 프레임워크입니다. 특히 반복되는 프리픽스가 있는 작업에서 상당히 빠른 추론 속도를 제공하여 복잡한 구조화 출력 및 다중 턴 대화에 이상적입니다. 제약 디코딩이 필요하거나 광범위한 프리픽스 공유가 있는 애플리케이션을 구축할 때는 vLLM과 같은 대안보다 SGLang을 선택하십시오.

스킬 보기