Higgs Audio v3: Self-hosted TTS + ASR API

The Higgs Audio v3 API Service

Expressive, controllable text-to-speech (100+ languages, inline emotion and prosody tags) and speech-to-text, served through an OpenAI-compatible API on an RTX 3090. A drop-in replacement for /v1/audio/speech and /v1/audio/transcriptions.

Base URL: …
Get started

API Docs

Full reference: endpoints, parameters, the inline control-token catalog, languages, and copy-paste samples.

🦊
β†—

Live Demo

"Coco": a bilingual Chinese/English early-childhood oral-practice and emotional-companion app that shows off emotion and pacing.

🧩
β†—

Models

higgs-audio-v3-tts-4b and higgs-audio-v3-stt: weights, model cards, and architecture.

Endpoints

Text β†’ Speech

Synthesize expressive audio. Inline <|emotion:…|> tags for emotion, prosody, speed, and sound effects. Zero-shot voice cloning.

POST /v1/audio/speech

Speech β†’ Text

Whisper-compatible transcription. 60+ languages, thinking mode for accuracy. Drop-in OpenAI transcriptions API.

POST /v1/audio/transcriptions
Quick start
# Synthesize speech with an emotion tag (set BASE to your server's base URL first)
curl "$BASE/v1/audio/speech" \
  -H "Content-Type: application/json" \
  -d '{"input":"<|emotion:elation|>Hello from Higgs!","response_format":"mp3"}' \
  --output hello.mp3
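
The quick start above covers synthesis only. For the transcription endpoint, a minimal sketch might look like the following. It assumes the server accepts the same multipart form fields as OpenAI's transcriptions API (`file` and `model`), which this page implies but does not spell out. The `transcribe` function name and the `STT_MODEL` variable are hypothetical helpers, and using `higgs-audio-v3-stt` as the `model` value is a guess based on the model name listed under Models.

```shell
# Sketch: send a local audio file to the transcription endpoint.
# Assumptions (not confirmed by this page):
#   - the form fields match OpenAI's API: "file" and "model"
#   - "higgs-audio-v3-stt" is an accepted model value (override via STT_MODEL)
# BASE must hold your server's base URL.
transcribe() {
  # $1: path to a local audio file
  curl -sS "$BASE/v1/audio/transcriptions" \
    -F "file=@$1" \
    -F "model=${STT_MODEL:-higgs-audio-v3-stt}"
}
```

With BASE set, running `transcribe hello.mp3` on the file produced by the quick start would send the synthesized audio back for transcription, which makes a convenient round-trip check.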
Resources

Boson AI Docs

Official Higgs Audio v3 documentation: API spec, voices, control tokens.


SGLang-Omni

The serving engine powering TTS: cookbook and deployment recipes.


ModelScope

Model weights mirror (CN-fast) for the TTS-4B and STT models.

Higgs Audio v3 · powered by SGLang-Omni (TTS) + transformers (ASR). Demo brain: DeepSeek V4 Flash via AI Gateway.