speech-to-text

Name: speech-to-text
Author: NoizAI

Updated 1 month ago

Category: Meta · Tag: word

About

This skill transcribes audio/video files to text and is triggered by terms like 'transcribe' or 'speech to text'. It supports multilingual transcription, speaker identification, and timestamp generation for captions. Developers can use it to extract spoken content from media files, with automatic language detection.

Quick Install

Claude Code

Skill Documentation

speech-to-text

Transcribe an audio file (up to 50 MB and 10 minutes) to text. Supports automatic language detection, timestamps, and speaker labels.

Triggers

transcribe / transcript / transcription
speech to text / STT / audio to text
what does this audio say / convert audio
转录 / 语音转文字 / 识别音频 (Chinese for "transcribe" / "speech to text" / "recognize audio")

Quick Start

# Transcribe with auto language detection
python3 skills/speech-to-text/scripts/stt.py audio.mp3

# Specify language explicitly
python3 skills/speech-to-text/scripts/stt.py interview.wav --language en

# Save transcript to file
python3 skills/speech-to-text/scripts/stt.py podcast.m4a -o transcript.txt

# Output full JSON (with timestamps and speaker labels)
python3 skills/speech-to-text/scripts/stt.py meeting.wav --json -o result.json

Arguments

Argument	Default	Description
`file`	required	Audio file to transcribe (mp3, wav, m4a, ogg, flac, aac, webm). Max 50 MB, max 10 min.
`--language` / `-l`	auto-detect	BCP-47 language code (e.g. `en`, `zh`, `ja`). Omit to auto-detect.
`--output` / `-o`	stdout	Path to save transcript text (or JSON if `--json` is set).
`--json`	off	Output full JSON response with timestamps and speaker labels.
`--api-key`	from env/config	Noiz API key (overrides stored key).
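If you drive the script from another Python program, the flags in the table map directly onto an argument list. The sketch below is a minimal, hypothetical helper (`build_stt_command` is not part of the skill); it assumes the script path used in the Quick Start and uses only the flags documented above.

```python
import shlex
from typing import List, Optional

# Script path as used in the Quick Start examples.
STT_SCRIPT = "skills/speech-to-text/scripts/stt.py"


def build_stt_command(
    file: str,
    language: Optional[str] = None,
    output: Optional[str] = None,
    as_json: bool = False,
    api_key: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for stt.py (hypothetical helper, not part of the skill)."""
    cmd = ["python3", STT_SCRIPT, file]
    if language:  # omit to let the service auto-detect the language
        cmd += ["--language", language]
    if output:  # default is stdout
        cmd += ["--output", output]
    if as_json:  # full response with timestamps and speaker labels
        cmd.append("--json")
    if api_key:  # overrides the stored key
        cmd += ["--api-key", api_key]
    return cmd


if __name__ == "__main__":
    argv = build_stt_command("meeting.wav", output="result.json", as_json=True)
    print(shlex.join(argv))
```

Pass the returned list to `subprocess.run(argv, check=True)` to execute it. In automated jobs, prefer `NOIZ_API_KEY` or the saved config over `--api-key`, since command-line arguments can be visible to other users in the process list.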

Output Format

Without --json, only the transcript text is printed:

Hello, welcome to today's podcast. We have a special guest joining us...

With --json, the full structured response is printed:

{
  "language": "en",
  "transcript": "Hello, welcome to today's podcast...",
  "duration": 42.5,
  "segments": [
    {"text": "Hello, welcome to today's podcast.", "start": 0.0, "end": 3.2, "spk": 0},
    {"text": "We have a special guest joining us.", "start": 3.5, "end": 6.1, "spk": 0}
  ]
}
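Because each segment carries `start`/`end` times in seconds and an `spk` label, the `--json` output can be turned into caption files. A minimal sketch, assuming the field names shown in the example above. SubRip (.srt) is chosen here as an illustrative caption format, and the `[Speaker N]` prefix is an arbitrary convention, not something the skill emits.

```python
from typing import Any, Dict, List


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(result: Dict[str, Any]) -> str:
    """Convert a --json transcription result into SubRip caption text."""
    blocks: List[str] = []
    for index, seg in enumerate(result.get("segments", []), start=1):
        text = seg["text"].strip()
        # The [Speaker N] prefix is an illustrative convention, not skill output.
        if seg.get("spk") is not None:
            text = f"[Speaker {seg['spk']}] {text}"
        start, end = srt_timestamp(seg["start"]), srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # SRT separates cues with a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Segments copied from the example response above.
    sample = {
        "segments": [
            {"text": "Hello, welcome to today's podcast.", "start": 0.0, "end": 3.2, "spk": 0},
            {"text": "We have a special guest joining us.", "start": 3.5, "end": 6.1, "spk": 0},
        ]
    }
    print(segments_to_srt(sample))
```

For a real result, load `result.json` with `json.load` and pass the resulting dict to `segments_to_srt`.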

Supported Languages

Common codes: en (English), zh (Chinese), ja (Japanese), ko (Korean), es (Spanish), fr (French), de (German), pt (Portuguese), ru (Russian), ar (Arabic). Omit --language to auto-detect.

Configuration

# Save your API key once
python3 skills/speech-to-text/scripts/stt.py config --set-api-key YOUR_KEY

# Or set via environment variable
export NOIZ_API_KEY=YOUR_KEY

Get your API key at developers.noiz.ai.
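A script that wraps the skill may want to confirm a key is available before running. The sketch below looks in the three documented places. The docs only state that `--api-key` overrides the stored key; checking `NOIZ_API_KEY` before the config file is an assumption, and `resolve_api_key` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Optional

# Documented location of the key saved with `config --set-api-key`.
CONFIG_KEY_PATH = Path.home() / ".config" / "noiz" / "api_key"


def resolve_api_key(cli_key: Optional[str] = None, config_path: Path = CONFIG_KEY_PATH) -> Optional[str]:
    """Return the first API key found, or None (hypothetical helper).

    Order: --api-key value, then NOIZ_API_KEY, then the saved config file.
    Env var before config file is an assumption; the docs don't specify it.
    """
    if cli_key:
        return cli_key
    env_key = os.environ.get("NOIZ_API_KEY")
    if env_key:
        return env_key.strip()
    if config_path.is_file():
        stored = config_path.read_text(encoding="utf-8").strip()
        return stored or None
    return None


if __name__ == "__main__":
    key = resolve_api_key()
    print("API key found" if key else "No key: run `config --set-api-key` or export NOIZ_API_KEY")
```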

Pricing

Billed at $0.0006 per second of audio. A 10-minute file costs ~$0.36. New accounts include 10,000 free TTS characters; STT is billed separately.
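The pricing arithmetic can be checked before uploading: cost = duration in seconds × $0.0006, so 600 s × $0.0006 = $0.36. A minimal sketch, assuming billing is prorated on the exact duration (the docs do not say whether partial seconds are rounded up) and rejecting anything over the documented 600-second limit; `estimate_cost_usd` is a hypothetical helper.

```python
# Documented rate and per-file limit.
PRICE_PER_SECOND_USD = 0.0006
MAX_SECONDS = 600


def estimate_cost_usd(duration_seconds: float) -> float:
    """Estimate the STT charge for one file (hypothetical helper).

    Assumption: billing is prorated on the exact duration; the rounding
    policy for partial seconds is not documented.
    """
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    if duration_seconds > MAX_SECONDS:
        raise ValueError(f"{duration_seconds} s exceeds the {MAX_SECONDS} s per-file limit")
    # Round to 4 decimals to hide floating-point noise.
    return round(duration_seconds * PRICE_PER_SECOND_USD, 4)


if __name__ == "__main__":
    for minutes in (1, 5, 10):
        print(f"{minutes:>2} min: ${estimate_cost_usd(minutes * 60):.2f}")
```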

Security & data disclosure

Credential storage: The API key is saved to ~/.config/noiz/api_key (permissions 0600). The NOIZ_API_KEY environment variable is also supported.
Network calls: The audio file is uploaded to https://noiz.ai/v1/speech-to-text for transcription. No data is sent until you run the command.
File limits: Max 50 MB per file, max 10 minutes (600 seconds) of audio.
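The documented limits can be checked locally before anything is uploaded. A minimal sketch using only the standard library. It assumes 1 MB means 1024 × 1024 bytes, checks the extension against the formats listed under Arguments, and measures duration only for uncompressed PCM WAV files via the `wave` module; other formats would need a separate decoder, not shown. All function names are hypothetical.

```python
import os
import wave
from typing import List, Optional

# Limits from the documentation. Assumption: 1 MB = 1024 * 1024 bytes.
MAX_BYTES = 50 * 1024 * 1024
MAX_SECONDS = 600
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac", ".webm"}


def limit_problems(filename: str, size_bytes: int, duration_seconds: Optional[float] = None) -> List[str]:
    """Return reasons a file would break the documented limits (empty list = OK)."""
    problems: List[str] = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes / (1024 * 1024):.1f} MB (max 50 MB)")
    if duration_seconds is not None and duration_seconds > MAX_SECONDS:
        problems.append(f"audio is {duration_seconds:.0f} s (max {MAX_SECONDS} s)")
    return problems


def wav_duration_seconds(path: str) -> float:
    """Duration of an uncompressed PCM WAV file, read with the stdlib wave module."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_file(path: str) -> List[str]:
    """Check a file on disk; duration is only measured for .wav files."""
    duration = wav_duration_seconds(path) if path.lower().endswith(".wav") else None
    return limit_problems(path, os.path.getsize(path), duration)


if __name__ == "__main__":
    import sys

    for p in sys.argv[1:]:
        issues = check_file(p)
        print(f"{p}: {'OK' if not issues else '; '.join(issues)}")
```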

Requirements

requests package: pip install requests
Get your API key at developers.noiz.ai

GitHub Repository

NoizAI/skills

Path: skills/speech-to-text

FAQ

Frequently asked questions

What is the speech-to-text skill?

speech-to-text is a Claude Skill by NoizAI. Skills package instructions and resources that Claude loads on demand, so Claude can perform speech-to-text-related tasks without extra prompting.

How do I install speech-to-text?

Add speech-to-text to Claude Code as a plugin, or clone the NoizAI/skills repository into your skills directory, then restart Claude so it picks up the skill.

What category does speech-to-text belong to?

speech-to-text is in the Meta category, tagged word.

Is speech-to-text free to use?

The skill is free to install from AIMCP. Transcription itself runs through the Noiz API, however, so you need a Noiz API key (from developers.noiz.ai), and usage is billed at $0.0006 per second of audio.
