Skip to content

Orchestrator Engine

The Orchestrator Engine externalizes AI/ML processing from Atlas. It implements the LLM Service (Phase 1, production) and provides a platform for future STT and TTS services. It communicates with Atlas using the A2A (Agent-to-Agent) JSON-RPC protocol and executes MCP tools via Gemini Flash 2.5 with three layers of hallucination guardrails.

Overview

Property Value
Service orchestrator-engine
Image ai-services/orchestrator-engine
Namespace livekit
Replicas 3
Port 9100
Language Python
Framework Starlette / Uvicorn
Protocol A2A (JSON-RPC 2.0 over HTTP)

Tech Stack

Component Library / Version
Web framework Starlette + Uvicorn
A2A protocol a2a-sdk 0.3.24+
LLM google-genai — Gemini Flash 2.5
Embeddings gemini-embedding-001 (768 dims)
MCP client mcp 1.0.0+
Vector store qdrant-client
Database asyncpg (PostgreSQL)
HTTP client httpx
Config pydantic-settings

Mounted Services

Two A2A services are mounted on the same process:

Service Mount Path Purpose
LLM Service / MCP tool execution + 3-layer post-guardrails
KB Service /kb/ Qdrant vector search + Gemini summarization

API Endpoints

Method Path Description
POST / A2A LLM service (JSON-RPC message/send or message/stream)
GET /.well-known/agent-card.json A2A agent metadata for LLM service
POST /kb/ A2A KB service (JSON-RPC)
GET /kb/.well-known/agent-card.json A2A agent metadata for KB service
GET /health Returns {"status": "ok"}
GET /ready Returns {"status": "ready"}

Atlas Integration

Atlas delegates the search_info tool to the orchestrator when enabled.

Atlas Env Var Default Description
ORCHESTRATOR_ENABLED false Enable orchestrator delegation
ORCHESTRATOR_URL http://orchestrator-engine:9100 Service URL
ORCHESTRATOR_TIMEOUT 30 Request timeout (seconds)

When enabled, Atlas calls POST / with a JSON-RPC message/send or message/stream request. Metadata fields passed per request:

  • tenant_id
  • mcp_url
  • agent_id
  • workflow_context

LLM Service Request Flow

flowchart TD
    A([Atlas — A2A JSON-RPC]) --> B[Extract query + metadata\ntenant_id · mcp_url · agent_id]

    B --> C{PRE-GUARDRAILS}

    C --> D[Emergency detection\nchest pain · stroke · suicidal etc.]
    D -->|Match| E([Return: call 999])

    C --> F[Prompt injection detection\n49 regex patterns]
    F -->|Match| G([Return: deflect response])

    C -->|Pass| H[Get / create MCP connection\npool keyed by full MCP URL]

    H --> I[Acquire per-tenant semaphore\nmax 20 concurrent]

    I --> J[TOOL EXECUTION\nList tools → Build FunctionDeclarations]
    J --> K[Send to Gemini Flash 2.5]
    K --> L{Tool call\nrequested?}
    L -->|Yes — up to 5 rounds| M[Execute MCP tool\nRecord ToolCallRecord]
    M --> K
    L -->|No| N[LLM final response]

    N --> O{POST-GUARDRAILS}

    O --> P[Layer 1 — Provenance\nraw_outputs non-empty?]
    P -->|Empty — training data answer| Q([BLOCK\nconfidence 0.0])

    P -->|Non-empty| R[Layer 2 — Schema\ntool responses valid?]
    R -->|Partial| S([Partial pass\nconfidence 0.7])

    R -->|Valid| T[Layer 3 — Cross-reference\nentities in summary ∈ raw outputs?]
    T -->|Hallucinated entities| U([Replace with safe summary\nconfidence 0.3])

    T -->|All match| V([VerifiedResult\nconfidence 1.0 · verified true])

Pre-Guardrails

Emergency Detection

Triggers on keywords associated with medical or mental health emergencies (chest pain, stroke, suicidal, etc.). Returns a hardcoded "call 999" response immediately — no LLM invocation.

Prompt Injection Detection

49 compiled regex patterns cover common jailbreak and injection vectors. On match, the engine deflects without executing any tool.

MCP Tool Execution

  1. List available tools from the MCP server.
  2. Build FunctionDeclaration objects for the Gemini API.
  3. Send the user query to Gemini Flash 2.5.
  4. Enter a tool-call loop (max 5 rounds):
  5. Execute the requested MCP tool.
  6. Record a ToolCallRecord (tool_name, args, raw_response, timestamp).
  7. Feed result back to Gemini.
  8. Collect the final LLM response and all raw outputs.

Post-Guardrails — 3-Layer Verification

Layer Check Fail Outcome Confidence
1 — Provenance raw_outputs non-empty BLOCK — LLM answered from training data 0.0
2 — Schema Tool responses valid (non-null, no error dicts) Partial pass 0.7
3 — Cross-reference Entities in LLM summary exist in raw tool outputs Replace summary with safe summary built from raw data 0.3
Clean pass All layers pass verified=True 1.0

VerifiedResult fields: summary, confidence (0.0–1.0), verified (bool), verification_notes (list), raw_outputs (list).

KB Service Flow

  1. Generate query embedding — gemini-embedding-001, 768 dimensions, task_type=retrieval_query.
  2. Search Qdrant collection tenant_{id}_agent_{id} — score threshold 0.5, top-K 6.
  3. Log question to PostgreSQL question_logs table (non-blocking background task).
  4. Summarize with Gemini Flash 2.5 — temperature 0.1, max_tokens 1024, system prompt: "Answer using ONLY provided KB articles."
  5. Apply KB variant of post-guardrails.

MCP Connection Pool

Property Value
Pool key Full MCP URL (including query params)
Connection lifecycle Background task holds context manager open
Idle cleanup 300 s
Per-tenant concurrency 20 simultaneous requests max
Tool call timeout 90 s

Phase Roadmap

Phase Service Status Notes
Phase 1 LLM Service Production MCP tool execution + 3-layer verification
Phase 2 STT Service Planned VAD, language detection, model router (Deepgram / Google / ElevenLabs / Groq), turn detection
Phase 3 TTS Service Planned Model router (OpenAI / Google / ElevenLabs / Cartesia / Hume)
Long-term Full orchestration Future Atlas becomes thin LiveKit adapter; orchestrator owns all AI/ML

Configuration

Variable Default Required Description
GOOGLE_API_KEY Yes Google AI API key
ORCHESTRATOR_LLM_MODEL gemini-2.5-flash No Gemini model for LLM service
QDRANT_URL http://qdrant:6333 No Qdrant vector store URL
MCP_MAX_CONCURRENT_PER_TENANT 20 No Per-tenant semaphore limit
MCP_TOOL_TIMEOUT 90.0 No MCP tool call timeout (seconds)
MCP_IDLE_TIMEOUT 300.0 No Idle connection cleanup (seconds)
KB_SCORE_THRESHOLD 0.5 No Qdrant similarity threshold
KB_DEFAULT_LIMIT 6 No Qdrant top-K results
POSTGRES_HOST No PostgreSQL host for question logging
POSTGRES_PORT No PostgreSQL port
POSTGRES_DB No PostgreSQL database name
POSTGRES_USER No PostgreSQL user
POSTGRES_PASSWORD No PostgreSQL password

ORCHESTRATOR_ENABLED is off by default

Atlas ships with ORCHESTRATOR_ENABLED=false. The orchestrator must be explicitly opted in per deployment until Phase 1 is fully validated in production.