LLM Service¶

LLM inference in Nexivo is implemented inside Atlas (~/PycharmProjects/AI/atlas/src/voice_pipeline). There is no separate LLM microservice — Atlas manages the full conversation context and calls the configured LLM provider directly.

The LLM provider is configured per agent via voice_config.llm_model and voice_config.pipeline_mode in Compass.

Pipeline Modes¶

Atlas supports two pipeline modes selected per agent:

Mode	Description
`stt_llm_tts`	Standard pipeline — STT transcript → LLM text → TTS audio
`realtime`	Audio-native model (e.g. Gemini Live, OpenAI Realtime) — audio in, audio out; no separate STT/TTS

Supported Providers (STT-LLM-TTS mode)¶

Configured via voice_config.llm_model:

Provider	Models	Notes
`google`	`gemini-2.5-flash`, `gemini-3-flash`	Default; tool calling, vision capable
`openai`	`gpt-4o`, `gpt-4o-mini`	Tool calling, structured output
Custom endpoint	Any OpenAI-compatible API	HuggingFace, Together AI, Groq

Supported Providers (Realtime mode)¶

Provider	Models	Notes
`google`	Gemini Live (`gemini-2.5-flash-live-preview`)	Audio-native, low latency
`openai`	OpenAI Realtime API	Audio-native
`qwen`	Qwen Omni	Audio-native

Input / Output¶

Input to LLM (per turn):

ChatContext:
  [system]   assembled agent_prompt (from Compass prompt components)
  [user]     STT transcript of current utterance
  [history]  previous assistant + user messages (sliding window)
  [tools]    registered tool definitions from agent_skills

Output from LLM:

Plain response text → passed to TTS Service
Tool call request → Atlas executes tool, result fed back to LLM, then final response to TTS
Escalation signal → Atlas triggers transfer_to_human_agent

Prompt Assembly¶

The LLM system message is assembled by Atlas from the prompt components returned by Compass, sorted by priority:

Priority	Component	Content
10–30	`base`	Core agent identity
11	`base_prompt_voice`	Voice channel rules
60	`language`	Per-language conversation style
90	`escalation`	Human handoff rules
100	`guard`	Hallucination / safety guardrails
110+	`tool_prompt`	Per-tool usage instructions

For realtime mode, Compass returns a pre-assembled prompt_realtime string that is injected immutably at session start.

Tool Calling¶

Atlas registers active agent skills (from Compass agent_skills) as RawFunctionTools available to the LLM each turn. Execution is handled by MCPManager:

LLM emits a tool call request
Atlas routes to the relevant MCP server or local tool
Tool result is injected back into context as a tool response message
LLM generates the final response

Local tools always available:

Tool	Purpose
`transfer_to_human_agent`	Escalate to human agent queue
`switch_language`	Change conversation language mid-call
`end_call`	Gracefully terminate the call
`search_knowledge`	Query agent's Qdrant knowledge base

MCPManager deduplicates tool calls within a 30 s window to prevent concurrent duplicate executions.

Guardrails¶

Pre-generation: Prompt validation applied at agent creation time in Compass (malicious content check)
Post-generation: guard prompt components instruct the model to avoid hallucinations and enforce tone policy
Escalation: If the model or the guard component determines the query cannot be handled, it emits an escalation signal to trigger transfer_to_human_agent

Observability¶

All tool inputs and outputs logged at DEBUG level with call_id context
LangFuse / OpenTelemetry traces: session ID = call_id, user = caller phone number
Recoverable LLM errors: Atlas speaks a filler phrase while the SDK retries internally