Expressive, controllable text-to-speech (100+ languages, inline emotion and prosody control)
and speech-to-text, served through an OpenAI-compatible API on an RTX 3090. Drop-in replacement for
/v1/audio/speech and
/v1/audio/transcriptions.
Full reference: endpoints, parameters, the inline control-token catalog, languages, and copy-paste samples.
"Coco": a bilingual Chinese/English early-childhood oral-practice and emotional-companion app that shows off emotion and pacing control.
higgs-audio-v3-tts-4b & higgs-audio-v3-stt: weights, model cards, and architecture.
Synthesize expressive audio. Inline <|emotion:…|> tags for emotion, prosody, speed, and sound effects. Zero-shot voice cloning.
Whisper-compatible transcription. 60+ languages, thinking mode for accuracy. Drop-in OpenAI transcriptions API.
POST /v1/audio/transcriptions

# Synthesize speech with an emotion tag
curl $BASE/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input":"<|emotion:elation|>Hello from Higgs!","response_format":"mp3"}' \
  --output hello.mp3
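The speech request above has a natural counterpart on the transcription side. The following is a minimal sketch, not a confirmed recipe: it assumes the endpoint accepts the same multipart fields as OpenAI's transcriptions API (`file` and `model`), that `$BASE` is the same server URL used in the speech example, and that the model id matches the published name `higgs-audio-v3-stt`. The `transcribe` helper and the `STT_MODEL` variable are hypothetical names introduced only for illustration.

```shell
# Hypothetical helper: upload an audio file to the transcriptions endpoint.
# Assumes OpenAI-style multipart fields; the model id is a guess based on the
# published model name and may differ on a given deployment.
transcribe() {
  # $1: path to an audio file, e.g. the hello.mp3 written by the speech example
  curl -sS "$BASE/v1/audio/transcriptions" \
    -F "file=@$1" \
    -F "model=${STT_MODEL:-higgs-audio-v3-stt}"
}
```

Calling `transcribe hello.mp3` would send the synthesized file back for transcription. If the server mirrors OpenAI's default response format, the reply should be a JSON object containing the transcript text.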
Official Higgs Audio v3 documentation: API spec, voices, control tokens.
The serving engine powering TTS: cookbook & deployment recipes.
Model weights mirror (CN-fast) for the TTS-4B and STT models.