TTS Service: Text-to-Speech

TTS in Nexivo is implemented inside Atlas (~/PycharmProjects/AI/atlas/src/voice_pipeline). There is no separate TTS microservice — Atlas selects, configures, and calls the TTS provider directly, then streams the audio back to the caller via LiveKit.

The TTS provider and voice are configured per agent via voice_config.tts_provider and voice_config.voice in Compass.
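As a rough illustration of the per-agent configuration, the sketch below parses the two documented fields from an agent record. The record shape, the `parse_voice_config` helper, and the `alloy` fallback voice are assumptions made for this example; only the field names, the provider keys, and the `openai` default come from this page.

```python
from dataclasses import dataclass

# Provider keys listed on this page; "openai" is documented as the default.
SUPPORTED_PROVIDERS = {
    "openai", "gemini", "elevenlabs", "google",
    "cartesia", "hume", "huggingface", "on_premise",
}


@dataclass(frozen=True)
class VoiceConfig:
    tts_provider: str = "openai"
    # Assumed fallback: "alloy" appears on this page only as an example value.
    voice: str = "alloy"


def parse_voice_config(agent_record: dict) -> VoiceConfig:
    """Build a VoiceConfig from a (hypothetical) Compass agent record dict."""
    raw = agent_record.get("voice_config") or {}
    config = VoiceConfig(
        tts_provider=raw.get("tts_provider", "openai"),
        voice=raw.get("voice", "alloy"),
    )
    if config.tts_provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Unsupported TTS provider: {config.tts_provider!r}")
    return config
```

For example, `parse_voice_config({"voice_config": {"tts_provider": "elevenlabs", "voice": "Rachel"}})` yields a config for the ElevenLabs `Rachel` voice, while an empty record falls back to the defaults.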

Supported Providers

Provider	Model(s)	Notes
`openai`	`gpt-4o-mini-tts`	Default; high quality
`gemini`	`gemini-2.5-flash-preview-tts`	Fast, expressive
`elevenlabs`	`eleven_turbo_v2_5`	Natural, multilingual
`google`	`Chirp3-HD`	Streaming; Indian languages
`cartesia`	`sonic-2`, `sonic-3`	Emotion and speed control
`hume`	`octave v1`, `octave v2`	Expressive, emotional
`huggingface`	Chatterbox	On-premise
`on_premise`	Custom API	`localhost:5000`
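The table can be read as a lookup from provider key to model names. The following is a minimal sketch of such a lookup, not the Atlas implementation: `PROVIDER_MODELS` and `resolve_model` are hypothetical names, and picking the first listed model when none is requested is an assumption.

```python
from typing import Optional, Tuple

# Model names per provider key, copied from the table above.
PROVIDER_MODELS = {
    "openai": ["gpt-4o-mini-tts"],
    "gemini": ["gemini-2.5-flash-preview-tts"],
    "elevenlabs": ["eleven_turbo_v2_5"],
    "google": ["Chirp3-HD"],
    "cartesia": ["sonic-2", "sonic-3"],
    "hume": ["octave v1", "octave v2"],
    "huggingface": ["Chatterbox"],
    "on_premise": [],  # custom API; no model name is documented
}

DEFAULT_PROVIDER = "openai"  # documented default


def resolve_model(provider: Optional[str] = None,
                  model: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Return (provider, model), falling back to the default provider."""
    provider = provider or DEFAULT_PROVIDER
    if provider not in PROVIDER_MODELS:
        raise ValueError(f"Unknown TTS provider: {provider!r}")
    models = PROVIDER_MODELS[provider]
    if model is None:
        # Assumption: the first listed model is used when none is requested.
        return provider, (models[0] if models else None)
    if models and model not in models:
        raise ValueError(f"Model {model!r} is not listed for {provider!r}")
    return provider, model
```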

Text Pre-Processing

Before synthesis, Atlas applies a text processing pipeline to the LLM response:

Step	Description
Abbreviation expansion	Expands common abbreviations (e.g. `Dr.` → `Doctor`) — results cached in Redis
Number normalisation	Converts digits to spoken form (e.g. `42` → `forty-two`)
Markdown stripping	Removes `**bold**`, `_italic_`, bullet points
Character cleaning	Removes unspeakable characters
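The four steps above can be sketched as a chain of small functions. This is an illustrative approximation only: the function names, the regular expressions, the set of characters kept, and the step order (taken from the table) are assumptions. A plain dict stands in for the Redis cache, and the number converter handles only whole numbers from 0 to 999.

```python
import re

# Only "Dr." -> "Doctor" comes from this page; "Mr." is an illustrative extra.
ABBREVIATIONS = {"Dr.": "Doctor", "Mr.": "Mister"}
_cache = {}  # in-memory stand-in for the Redis cache

_ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
         "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
_TENS = ["", "", "twenty", "thirty", "forty", "fifty",
         "sixty", "seventy", "eighty", "ninety"]


def expand_abbreviations(text: str) -> str:
    if text not in _cache:
        expanded = text
        for abbr, full in ABBREVIATIONS.items():
            expanded = expanded.replace(abbr, full)
        _cache[text] = expanded
    return _cache[text]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999."""
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return _TENS[tens] + ("-" + _ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    return _ONES[hundreds] + " hundred" + (" " + number_to_words(rest) if rest else "")


def normalise_numbers(text: str) -> str:
    # Converts standalone 1-3 digit integers; decimals, longer digit runs,
    # and digits attached to letters are left unchanged in this sketch.
    pattern = r"(?<![\w.,])\d{1,3}(?![\d.,]*\d)(?!\w)"
    return re.sub(pattern, lambda m: number_to_words(int(m.group())), text)


def strip_markdown(text: str) -> str:
    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)         # **bold**
    text = re.sub(r"(?<!\w)_(.+?)_(?!\w)", r"\1", text)  # _italic_
    return re.sub(r"(?m)^\s*[-*+]\s+", "", text)         # bullet markers


def clean_characters(text: str) -> str:
    # Assumption: keep word characters, whitespace, and basic punctuation.
    text = re.sub(r"[^\w\s.,!?;:'()\-]", "", text)
    return re.sub(r"[ \t]+", " ", text).strip()


def preprocess(text: str) -> str:
    for step in (expand_abbreviations, normalise_numbers,
                 strip_markdown, clean_characters):
        text = step(text)
    return text
```

With these definitions, `preprocess("**Dr.** Smith has 42 _new_ messages 🙂")` returns `Doctor Smith has forty-two new messages`.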

Output

Atlas receives a PCM or Opus audio stream from the TTS provider and forwards it directly to LiveKit for real-time playback to the caller.

TTS output text is logged at INFO level (first 300 characters) for debugging.
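A truncated log line of this kind might look like the sketch below. The logger name, message format, and `log_tts_text` helper are assumptions for illustration; only the INFO level and the 300-character limit come from this page.

```python
import logging

logger = logging.getLogger("atlas.tts")  # hypothetical logger name
MAX_LOGGED_CHARS = 300  # documented limit


def log_tts_text(text: str) -> str:
    """Log the first 300 characters of the text at INFO level and return them."""
    preview = text[:MAX_LOGGED_CHARS]
    logger.info("TTS output text: %s", preview)
    return preview
```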

Voice Model Selection

The voice_config.voice field in the Compass agent record selects the specific voice within the provider's catalogue. Example values:

Voice ID	Provider	Language	Character
`Zephyr`	Google	en-IN	Female
`alloy`	OpenAI	en	Neutral
`Rachel`	ElevenLabs	en	Female
`en-US-Neural2-F`	Google Cloud	en-US	Female
`ar-XA-Standard-B`	Google Cloud	ar	Male

TO DO

Document the full voice catalogue and per-provider voice ID conventions.

Interruption Handling

Atlas monitors the caller audio stream during TTS playback. If the caller starts speaking during TTS, Atlas decides whether to interrupt based on two thresholds:

Minimum interruption duration
Minimum word count

Interruptions that do not meet the thresholds are suppressed and logged as `interruption rejected`. Valid interruptions stop TTS immediately, and Atlas processes the new utterance.
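The decision described above can be sketched as a simple threshold check. The threshold values, the names `InterruptionThresholds` and `should_interrupt`, and the rule that both thresholds must be met are assumptions for this example; this page does not state the actual values or how the two thresholds are combined.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("atlas.interruptions")  # hypothetical logger name


@dataclass(frozen=True)
class InterruptionThresholds:
    min_duration_s: float = 0.5  # hypothetical value
    min_words: int = 2           # hypothetical value


def should_interrupt(speech_duration_s: float, transcript: str,
                     thresholds: InterruptionThresholds = InterruptionThresholds()) -> bool:
    """Return True if caller speech qualifies as a valid interruption."""
    word_count = len(transcript.split())
    # Assumption: both thresholds must be met for the interruption to count.
    accepted = (speech_duration_s >= thresholds.min_duration_s
                and word_count >= thresholds.min_words)
    if not accepted:
        logger.info("interruption rejected")
    return accepted
```

Under these assumed defaults, a short "uh" lasting 0.2 seconds is rejected and playback continues, while "wait, stop please" lasting 1.2 seconds is accepted. The caller of `should_interrupt` would then stop TTS and hand the new utterance to the pipeline.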