Orchestrator Engine¶
The Orchestrator Engine externalizes AI/ML processing from Atlas. It implements the LLM Service (Phase 1, production) and provides a platform for future STT and TTS services. It communicates with Atlas using the A2A (Agent-to-Agent) JSON-RPC protocol and executes MCP tools via Gemini Flash 2.5 with three layers of hallucination guardrails.
Overview¶
| Property | Value |
|---|---|
| Service | orchestrator-engine |
| Image | ai-services/orchestrator-engine |
| Namespace | livekit |
| Replicas | 3 |
| Port | 9100 |
| Language | Python |
| Framework | Starlette / Uvicorn |
| Protocol | A2A (JSON-RPC 2.0 over HTTP) |
Tech Stack¶
| Component | Library / Version |
|---|---|
| Web framework | Starlette + Uvicorn |
| A2A protocol | a2a-sdk 0.3.24+ |
| LLM | google-genai — Gemini Flash 2.5 |
| Embeddings | gemini-embedding-001 (768 dims) |
| MCP client | mcp 1.0.0+ |
| Vector store | qdrant-client |
| Database | asyncpg (PostgreSQL) |
| HTTP client | httpx |
| Config | pydantic-settings |
Mounted Services¶
Two A2A services are mounted on the same process:
| Service | Mount Path | Purpose |
|---|---|---|
| LLM Service | / |
MCP tool execution + 3-layer post-guardrails |
| KB Service | /kb/ |
Qdrant vector search + Gemini summarization |
API Endpoints¶
| Method | Path | Description |
|---|---|---|
POST |
/ |
A2A LLM service (JSON-RPC message/send or message/stream) |
GET |
/.well-known/agent-card.json |
A2A agent metadata for LLM service |
POST |
/kb/ |
A2A KB service (JSON-RPC) |
GET |
/kb/.well-known/agent-card.json |
A2A agent metadata for KB service |
GET |
/health |
Returns {"status": "ok"} |
GET |
/ready |
Returns {"status": "ready"} |
Atlas Integration¶
Atlas delegates the search_info tool to the orchestrator when enabled.
| Atlas Env Var | Default | Description |
|---|---|---|
ORCHESTRATOR_ENABLED |
false |
Enable orchestrator delegation |
ORCHESTRATOR_URL |
http://orchestrator-engine:9100 |
Service URL |
ORCHESTRATOR_TIMEOUT |
30 |
Request timeout (seconds) |
When enabled, Atlas calls POST / with a JSON-RPC message/send or message/stream request. Metadata fields passed per request:
tenant_idmcp_urlagent_idworkflow_context
LLM Service Request Flow¶
flowchart TD
A([Atlas — A2A JSON-RPC]) --> B[Extract query + metadata\ntenant_id · mcp_url · agent_id]
B --> C{PRE-GUARDRAILS}
C --> D[Emergency detection\nchest pain · stroke · suicidal etc.]
D -->|Match| E([Return: call 999])
C --> F[Prompt injection detection\n49 regex patterns]
F -->|Match| G([Return: deflect response])
C -->|Pass| H[Get / create MCP connection\npool keyed by full MCP URL]
H --> I[Acquire per-tenant semaphore\nmax 20 concurrent]
I --> J[TOOL EXECUTION\nList tools → Build FunctionDeclarations]
J --> K[Send to Gemini Flash 2.5]
K --> L{Tool call\nrequested?}
L -->|Yes — up to 5 rounds| M[Execute MCP tool\nRecord ToolCallRecord]
M --> K
L -->|No| N[LLM final response]
N --> O{POST-GUARDRAILS}
O --> P[Layer 1 — Provenance\nraw_outputs non-empty?]
P -->|Empty — training data answer| Q([BLOCK\nconfidence 0.0])
P -->|Non-empty| R[Layer 2 — Schema\ntool responses valid?]
R -->|Partial| S([Partial pass\nconfidence 0.7])
R -->|Valid| T[Layer 3 — Cross-reference\nentities in summary ∈ raw outputs?]
T -->|Hallucinated entities| U([Replace with safe summary\nconfidence 0.3])
T -->|All match| V([VerifiedResult\nconfidence 1.0 · verified true])
Pre-Guardrails¶
Emergency Detection¶
Triggers on keywords associated with medical or mental health emergencies (chest pain, stroke, suicidal, etc.). Returns a hardcoded "call 999" response immediately — no LLM invocation.
Prompt Injection Detection¶
49 compiled regex patterns cover common jailbreak and injection vectors. On match, the engine deflects without executing any tool.
MCP Tool Execution¶
- List available tools from the MCP server.
- Build
FunctionDeclarationobjects for the Gemini API. - Send the user query to Gemini Flash 2.5.
- Enter a tool-call loop (max 5 rounds):
- Execute the requested MCP tool.
- Record a
ToolCallRecord(tool_name, args, raw_response, timestamp). - Feed result back to Gemini.
- Collect the final LLM response and all raw outputs.
Post-Guardrails — 3-Layer Verification¶
| Layer | Check | Fail Outcome | Confidence |
|---|---|---|---|
| 1 — Provenance | raw_outputs non-empty |
BLOCK — LLM answered from training data | 0.0 |
| 2 — Schema | Tool responses valid (non-null, no error dicts) | Partial pass | 0.7 |
| 3 — Cross-reference | Entities in LLM summary exist in raw tool outputs | Replace summary with safe summary built from raw data | 0.3 |
| Clean pass | All layers pass | verified=True |
1.0 |
VerifiedResult fields: summary, confidence (0.0–1.0), verified (bool), verification_notes (list), raw_outputs (list).
KB Service Flow¶
- Generate query embedding —
gemini-embedding-001, 768 dimensions,task_type=retrieval_query. - Search Qdrant collection
tenant_{id}_agent_{id}— score threshold 0.5, top-K 6. - Log question to PostgreSQL
question_logstable (non-blocking background task). - Summarize with Gemini Flash 2.5 — temperature 0.1, max_tokens 1024, system prompt: "Answer using ONLY provided KB articles."
- Apply KB variant of post-guardrails.
MCP Connection Pool¶
| Property | Value |
|---|---|
| Pool key | Full MCP URL (including query params) |
| Connection lifecycle | Background task holds context manager open |
| Idle cleanup | 300 s |
| Per-tenant concurrency | 20 simultaneous requests max |
| Tool call timeout | 90 s |
Phase Roadmap¶
| Phase | Service | Status | Notes |
|---|---|---|---|
| Phase 1 | LLM Service | Production | MCP tool execution + 3-layer verification |
| Phase 2 | STT Service | Planned | VAD, language detection, model router (Deepgram / Google / ElevenLabs / Groq), turn detection |
| Phase 3 | TTS Service | Planned | Model router (OpenAI / Google / ElevenLabs / Cartesia / Hume) |
| Long-term | Full orchestration | Future | Atlas becomes thin LiveKit adapter; orchestrator owns all AI/ML |
Configuration¶
| Variable | Default | Required | Description |
|---|---|---|---|
GOOGLE_API_KEY |
— | Yes | Google AI API key |
ORCHESTRATOR_LLM_MODEL |
gemini-2.5-flash |
No | Gemini model for LLM service |
QDRANT_URL |
http://qdrant:6333 |
No | Qdrant vector store URL |
MCP_MAX_CONCURRENT_PER_TENANT |
20 |
No | Per-tenant semaphore limit |
MCP_TOOL_TIMEOUT |
90.0 |
No | MCP tool call timeout (seconds) |
MCP_IDLE_TIMEOUT |
300.0 |
No | Idle connection cleanup (seconds) |
KB_SCORE_THRESHOLD |
0.5 |
No | Qdrant similarity threshold |
KB_DEFAULT_LIMIT |
6 |
No | Qdrant top-K results |
POSTGRES_HOST |
— | No | PostgreSQL host for question logging |
POSTGRES_PORT |
— | No | PostgreSQL port |
POSTGRES_DB |
— | No | PostgreSQL database name |
POSTGRES_USER |
— | No | PostgreSQL user |
POSTGRES_PASSWORD |
— | No | PostgreSQL password |
ORCHESTRATOR_ENABLED is off by default
Atlas ships with ORCHESTRATOR_ENABLED=false. The orchestrator must be explicitly opted in per deployment until Phase 1 is fully validated in production.