Orchestrator Engine¶

The Orchestrator Engine externalizes AI/ML processing from Atlas. It implements the LLM Service (Phase 1, production) and provides a platform for future STT and TTS services. It communicates with Atlas using the A2A (Agent-to-Agent) JSON-RPC protocol and executes MCP tools via Gemini Flash 2.5 with three layers of hallucination guardrails.

Overview¶

Property	Value
Service	`orchestrator-engine`
Image	`ai-services/orchestrator-engine`
Namespace	`livekit`
Replicas	3
Port	`9100`
Language	Python
Framework	Starlette / Uvicorn
Protocol	A2A (JSON-RPC 2.0 over HTTP)

Tech Stack¶

Component	Library / Version
Web framework	Starlette + Uvicorn
A2A protocol	`a2a-sdk` 0.3.24+
LLM	`google-genai` — Gemini Flash 2.5
Embeddings	`gemini-embedding-001` (768 dims)
MCP client	`mcp` 1.0.0+
Vector store	`qdrant-client`
Database	`asyncpg` (PostgreSQL)
HTTP client	`httpx`
Config	`pydantic-settings`

Mounted Services¶

Two A2A services are mounted on the same process:

Service	Mount Path	Purpose
LLM Service	`/`	MCP tool execution + 3-layer post-guardrails
KB Service	`/kb/`	Qdrant vector search + Gemini summarization

API Endpoints¶

Method	Path	Description
`POST`	`/`	A2A LLM service (JSON-RPC `message/send` or `message/stream`)
`GET`	`/.well-known/agent-card.json`	A2A agent metadata for LLM service
`POST`	`/kb/`	A2A KB service (JSON-RPC)
`GET`	`/kb/.well-known/agent-card.json`	A2A agent metadata for KB service
`GET`	`/health`	Returns `{"status": "ok"}`
`GET`	`/ready`	Returns `{"status": "ready"}`

Atlas Integration¶

Atlas delegates the search_info tool to the orchestrator when enabled.

Atlas Env Var	Default	Description
`ORCHESTRATOR_ENABLED`	`false`	Enable orchestrator delegation
`ORCHESTRATOR_URL`	`http://orchestrator-engine:9100`	Service URL
`ORCHESTRATOR_TIMEOUT`	`30`	Request timeout (seconds)

When enabled, Atlas calls POST / with a JSON-RPC message/send or message/stream request. Metadata fields passed per request:

tenant_id
mcp_url
agent_id
workflow_context

LLM Service Request Flow¶

flowchart TD
    A([Atlas — A2A JSON-RPC]) --> B[Extract query + metadata\ntenant_id · mcp_url · agent_id]

    B --> C{PRE-GUARDRAILS}

    C --> D[Emergency detection\nchest pain · stroke · suicidal etc.]
    D -->|Match| E([Return: call 999])

    C --> F[Prompt injection detection\n49 regex patterns]
    F -->|Match| G([Return: deflect response])

    C -->|Pass| H[Get / create MCP connection\npool keyed by full MCP URL]

    H --> I[Acquire per-tenant semaphore\nmax 20 concurrent]

    I --> J[TOOL EXECUTION\nList tools → Build FunctionDeclarations]
    J --> K[Send to Gemini Flash 2.5]
    K --> L{Tool call\nrequested?}
    L -->|Yes — up to 5 rounds| M[Execute MCP tool\nRecord ToolCallRecord]
    M --> K
    L -->|No| N[LLM final response]

    N --> O{POST-GUARDRAILS}

    O --> P[Layer 1 — Provenance\nraw_outputs non-empty?]
    P -->|Empty — training data answer| Q([BLOCK\nconfidence 0.0])

    P -->|Non-empty| R[Layer 2 — Schema\ntool responses valid?]
    R -->|Partial| S([Partial pass\nconfidence 0.7])

    R -->|Valid| T[Layer 3 — Cross-reference\nentities in summary ∈ raw outputs?]
    T -->|Hallucinated entities| U([Replace with safe summary\nconfidence 0.3])

    T -->|All match| V([VerifiedResult\nconfidence 1.0 · verified true])

Pre-Guardrails¶

Emergency Detection¶

Triggers on keywords associated with medical or mental health emergencies (chest pain, stroke, suicidal, etc.). Returns a hardcoded "call 999" response immediately — no LLM invocation.

Prompt Injection Detection¶

49 compiled regex patterns cover common jailbreak and injection vectors. On match, the engine deflects without executing any tool.

MCP Tool Execution¶

List available tools from the MCP server.
Build FunctionDeclaration objects for the Gemini API.
Send the user query to Gemini Flash 2.5.
Enter a tool-call loop (max 5 rounds):
Execute the requested MCP tool.
Record a ToolCallRecord (tool_name, args, raw_response, timestamp).
Feed result back to Gemini.
Collect the final LLM response and all raw outputs.

Post-Guardrails — 3-Layer Verification¶

Layer	Check	Fail Outcome	Confidence
1 — Provenance	`raw_outputs` non-empty	BLOCK — LLM answered from training data	0.0
2 — Schema	Tool responses valid (non-null, no error dicts)	Partial pass	0.7
3 — Cross-reference	Entities in LLM summary exist in raw tool outputs	Replace summary with safe summary built from raw data	0.3
Clean pass	All layers pass	`verified=True`	1.0

VerifiedResult fields: summary, confidence (0.0–1.0), verified (bool), verification_notes (list), raw_outputs (list).

KB Service Flow¶

Generate query embedding — gemini-embedding-001, 768 dimensions, task_type=retrieval_query.
Search Qdrant collection tenant_{id}_agent_{id} — score threshold 0.5, top-K 6.
Log question to PostgreSQL question_logs table (non-blocking background task).
Summarize with Gemini Flash 2.5 — temperature 0.1, max_tokens 1024, system prompt: "Answer using ONLY provided KB articles."
Apply KB variant of post-guardrails.

MCP Connection Pool¶

Property	Value
Pool key	Full MCP URL (including query params)
Connection lifecycle	Background task holds context manager open
Idle cleanup	300 s
Per-tenant concurrency	20 simultaneous requests max
Tool call timeout	90 s

Phase Roadmap¶

Phase	Service	Status	Notes
Phase 1	LLM Service	Production	MCP tool execution + 3-layer verification
Phase 2	STT Service	Planned	VAD, language detection, model router (Deepgram / Google / ElevenLabs / Groq), turn detection
Phase 3	TTS Service	Planned	Model router (OpenAI / Google / ElevenLabs / Cartesia / Hume)
Long-term	Full orchestration	Future	Atlas becomes thin LiveKit adapter; orchestrator owns all AI/ML

Configuration¶

Variable	Default	Required	Description
`GOOGLE_API_KEY`	—	Yes	Google AI API key
`ORCHESTRATOR_LLM_MODEL`	`gemini-2.5-flash`	No	Gemini model for LLM service
`QDRANT_URL`	`http://qdrant:6333`	No	Qdrant vector store URL
`MCP_MAX_CONCURRENT_PER_TENANT`	`20`	No	Per-tenant semaphore limit
`MCP_TOOL_TIMEOUT`	`90.0`	No	MCP tool call timeout (seconds)
`MCP_IDLE_TIMEOUT`	`300.0`	No	Idle connection cleanup (seconds)
`KB_SCORE_THRESHOLD`	`0.5`	No	Qdrant similarity threshold
`KB_DEFAULT_LIMIT`	`6`	No	Qdrant top-K results
`POSTGRES_HOST`	—	No	PostgreSQL host for question logging
`POSTGRES_PORT`	—	No	PostgreSQL port
`POSTGRES_DB`	—	No	PostgreSQL database name
`POSTGRES_USER`	—	No	PostgreSQL user
`POSTGRES_PASSWORD`	—	No	PostgreSQL password

ORCHESTRATOR_ENABLED is off by default

Atlas ships with ORCHESTRATOR_ENABLED=false. The orchestrator must be explicitly opted in per deployment until Phase 1 is fully validated in production.