Media Bot

Media Bot is a LiveKit worker that joins every call room alongside Atlas. Where Atlas handles the AI conversation, Media Bot handles everything else: hold music, ringback tones, busy signals, and live transcription with sentiment analysis. It is metadata-driven: Call Service controls it entirely through attributes set on participants in the LiveKit room, with no direct RPC.

Source

PycharmProjects/AI/media-bot — Python 3.9+ / livekit-agents 1.5


Role in the Pipeline

graph LR
    CS[Call Service] -->|manual dispatch| LK[LiveKit]
    LK --> MB[Media Bot]
    CS -->|room attribute: media.command| LK
    LK -->|attribute change event| MB
    MB -->|audio track| LK
    LK -->|hold music / tone| Caller
    MB -->|FINAL_TRANSCRIPT + sentiment| LK

Call Service dispatches both Media Bot and Atlas to every room. The two operate independently: Media Bot reacts to metadata, while Atlas runs the AI conversation.


Tech Stack

| Layer | Technology |
|---|---|
| Runtime | Python 3.9+ |
| Framework | livekit-agents 1.5.9 |
| STT (default) | Groq Whisper (multilingual) |
| STT (English) | Deepgram nova-3 |
| STT (Arabic / other) | ElevenLabs Scribe v2 |
| Sentiment | VADER |
| Audio decoding | numpy (48 kHz mono PCM) |
| HTTP | aiohttp (asset download) |

Responsibilities

Audio playback

Media Bot plays pre-recorded audio assets into the room. Call Service triggers playback by setting attributes on the caller's participant object in LiveKit:

| Attribute | Value | Purpose |
|---|---|---|
| media.command | play / stop | Start or stop playback |
| media.asset_url | HTTPS URL to MP3 | Audio to play |
| media.loop | true / false | Loop (hold music) or single-shot (tone) |
| media.command_id | UUID | Deduplication: prevents re-trigger on metadata propagation |
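
The deduplication role of media.command_id can be sketched as follows. This is a minimal illustration rather than Media Bot's actual code: the MediaCommand and CommandDeduplicator names are hypothetical, and the sketch assumes the attributes arrive as a plain string-to-string mapping and that only the most recent command ID needs to be remembered.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MediaCommand:
    """Parsed view of the media.* attributes (hypothetical type)."""
    command: str               # "play" or "stop"
    asset_url: Optional[str]   # required for "play"
    loop: bool
    command_id: str


class CommandDeduplicator:
    """Ignores attribute updates whose media.command_id was already handled."""

    def __init__(self) -> None:
        self._last_command_id: Optional[str] = None

    def parse(self, attributes: Mapping[str, str]) -> Optional[MediaCommand]:
        command = attributes.get("media.command")
        command_id = attributes.get("media.command_id")
        if command not in ("play", "stop") or not command_id:
            return None  # incomplete or unknown command
        if command_id == self._last_command_id:
            return None  # same command delivered again, do not re-trigger
        asset_url = attributes.get("media.asset_url")
        if command == "play" and not asset_url:
            return None  # nothing to play
        self._last_command_id = command_id
        loop = attributes.get("media.loop", "false").lower() == "true"
        return MediaCommand(command, asset_url, loop, command_id)
```

A handler would call parse() on every attribute change event and act only when it returns a command, so a repeated delivery of the same attributes does not restart playback.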

Media Bot downloads the asset via aiohttp, decodes it to 48 kHz mono PCM, scales the volume to 60%, and streams the frames on a TrackSource.SOURCE_SCREENSHARE_AUDIO track. This track is separate from Atlas's microphone track, which prevents collisions.
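
The volume scaling and framing steps can be illustrated with a small sketch. It assumes the asset has already been decoded to signed 16-bit samples at 48 kHz mono, and it uses the standard-library array module to stay dependency-free (the service itself uses numpy). The 20 ms frame length and the function names are assumptions made for illustration.

```python
from array import array
from typing import Iterator

SAMPLE_RATE = 48_000  # from the text: 48 kHz mono PCM
VOLUME = 0.6          # from the text: volume scaled to 60%
FRAME_MS = 20         # assumption: 20 ms frames


def scale_volume(pcm: array, volume: float = VOLUME) -> array:
    """Scale signed 16-bit samples, clamping to the int16 range."""
    return array(
        "h",
        (max(-32768, min(32767, round(sample * volume))) for sample in pcm),
    )


def iter_frames(pcm: array, frame_ms: int = FRAME_MS) -> Iterator[array]:
    """Yield fixed-size frames, zero-padding the final one."""
    samples_per_frame = SAMPLE_RATE * frame_ms // 1000
    for start in range(0, len(pcm), samples_per_frame):
        frame = pcm[start:start + samples_per_frame]  # slicing copies
        if len(frame) < samples_per_frame:
            frame.extend([0] * (samples_per_frame - len(frame)))
        yield frame
```

Each yielded frame would then be handed to whatever audio source feeds the published track. For looping assets the frame iterator would simply be restarted when it is exhausted.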

Supported audio types:

| Type | Loop | Trigger |
|---|---|---|
| Hold music | Yes | Agent puts caller on hold |
| Ringback tone | No | Call is being connected |
| Busy tone | No | All agents unavailable |

When non-looping audio (busy tone) has finished draining, Media Bot closes the room automatically.

Hold isolation

When a caller is placed on hold, Media Bot unsubscribes them from all other audio tracks at the LiveKit server level so they only hear the hold track. The held party is re-subscribed on unhold.
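
The selection of tracks to mute for a held party can be sketched as a pure function. This is an illustration under assumptions, not the service's code: PublishedTrack and tracks_to_unsubscribe are hypothetical names, and applying the result is left out (it would go through LiveKit's server-side subscription update, for example the RoomService UpdateSubscriptions call).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PublishedTrack:
    """Minimal description of a track in the room (hypothetical type)."""
    sid: str
    publisher_identity: str
    kind: str  # "audio" or "video"


def tracks_to_unsubscribe(
    tracks: Iterable[PublishedTrack],
    held_identity: str,
    hold_track_sid: str,
) -> List[str]:
    """Return the audio track SIDs the held party should stop receiving."""
    return [
        track.sid
        for track in tracks
        if track.kind == "audio"
        and track.sid != hold_track_sid                # keep the hold music
        and track.publisher_identity != held_identity  # own tracks are never subscribed
    ]
```

Keeping the returned list around makes the unhold path straightforward: the same SIDs are re-subscribed when the hold ends.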

Live transcription

Media Bot runs STT on all human participants in the room in parallel with the Atlas conversation. It selects the STT provider by language:

stt.language = "en" / "en-US" / "en-GB"  →  Deepgram nova-3
stt.language = "ar" / "ar-SA"             →  ElevenLabs Scribe v2
all other languages                        →  Groq Whisper
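
The routing above amounts to a lookup with a fallback. A minimal sketch, assuming exact matching on the listed codes and case-insensitive comparison (both assumptions; the provider labels returned here are illustrative strings, not identifiers from the codebase):

```python
DEEPGRAM_LANGUAGES = {"en", "en-us", "en-gb"}
ELEVENLABS_LANGUAGES = {"ar", "ar-sa"}


def select_stt_provider(language: str) -> str:
    """Map an stt.language value to a provider label."""
    normalized = language.strip().lower()
    if normalized in DEEPGRAM_LANGUAGES:
        return "deepgram"    # Deepgram nova-3
    if normalized in ELEVENLABS_LANGUAGES:
        return "elevenlabs"  # ElevenLabs Scribe v2
    return "groq"            # Groq Whisper for all other languages
```

For example, select_stt_provider("en-GB") returns "deepgram", while select_stt_provider("fr") falls through to "groq".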

Call Service activates transcription by setting attributes on Media Bot's own participant:

| Attribute | Value |
|---|---|
| stt.enabled | true / false |
| stt.language | BCP-47 language code |
| stt.command_id | UUID (deduplication) |

Transcription output (FINAL_TRANSCRIPT + VADER sentiment scores) is published to the LiveKit data channel.
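
A data-channel message of this kind might be assembled as shown below. The JSON field names and the build_transcript_message helper are assumptions for illustration, since the actual payload schema is not documented here. The four sentiment keys (neg, neu, pos, compound) are the ones VADER's polarity_scores returns.

```python
import json
from typing import Mapping

VADER_KEYS = ("neg", "neu", "pos", "compound")


def build_transcript_message(
    identity: str,
    text: str,
    sentiment: Mapping[str, float],
) -> bytes:
    """Serialize a final transcript and its VADER scores as UTF-8 JSON."""
    message = {
        "type": "FINAL_TRANSCRIPT",  # event name from the text
        "participant": identity,     # hypothetical field name
        "text": text,
        "sentiment": {key: sentiment[key] for key in VADER_KEYS},
    }
    # ensure_ascii=False keeps non-Latin transcripts (e.g. Arabic) readable
    return json.dumps(message, ensure_ascii=False).encode("utf-8")
```

The resulting bytes are what would be handed to the data channel publish call. Consumers decode them with json.loads.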


Media Bot vs Atlas

| | Media Bot | Atlas |
|---|---|---|
| Purpose | Tones, hold, transcription | AI conversation |
| Audio output | Pre-recorded assets | AI-generated TTS |
| Audio input | All participants (STT) | Caller only |
| Control | Room metadata attributes | LiveKit RPC + Compass config |
| Track type | SOURCE_SCREENSHARE_AUDIO | SOURCE_MICROPHONE |
| Deployment | Light (≤ 512 Mi RAM) | Full voice pipeline |

Configuration

| Variable | Default | Purpose |
|---|---|---|
| LIVEKIT_URL | (required) | LiveKit server WebSocket URL |
| LIVEKIT_API_KEY | (required) | LiveKit API key |
| LIVEKIT_API_SECRET | (required) | LiveKit API secret |
| STT_PROVIDER | groq | Default STT provider |
| GROQ_API_KEY | (required) | Groq API key |
| DEEPGRAM_API_KEY | (required) | Deepgram API key |
| ELEVENLABS_API_KEY | (required) | ElevenLabs API key |
| MEDIA_BOT_NUM_IDLE_PROCESSES | 2 | Pre-spawned worker processes |
| MEDIA_BOT_LOAD_THRESHOLD | 0.75 | CPU load scale trigger |
| MEDIA_BOT_MAX_RETRY | 16 | Max connection retries |
| MEDIA_BOT_PROMETHEUS_PORT | unset | Optional Prometheus metrics port |
| LOG_LEVEL | INFO | Log verbosity |
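
One way to read these variables with the documented defaults is sketched below. The Settings dataclass and load_settings function are hypothetical and only show how the required/default split in the table could be enforced at startup; the real service may load its configuration differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = (
    "LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET",
    "GROQ_API_KEY", "DEEPGRAM_API_KEY", "ELEVENLABS_API_KEY",
)


@dataclass(frozen=True)
class Settings:
    livekit_url: str
    stt_provider: str
    num_idle_processes: int
    load_threshold: float
    max_retry: int
    prometheus_port: Optional[int]
    log_level: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Validate required variables and apply the documented defaults."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required variables: " + ", ".join(missing))
    port = env.get("MEDIA_BOT_PROMETHEUS_PORT")
    return Settings(
        livekit_url=env["LIVEKIT_URL"],
        stt_provider=env.get("STT_PROVIDER", "groq"),
        num_idle_processes=int(env.get("MEDIA_BOT_NUM_IDLE_PROCESSES", "2")),
        load_threshold=float(env.get("MEDIA_BOT_LOAD_THRESHOLD", "0.75")),
        max_retry=int(env.get("MEDIA_BOT_MAX_RETRY", "16")),
        prometheus_port=int(port) if port else None,  # unset means no metrics port
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Failing fast on a missing key surfaces configuration problems when the pod starts rather than on the first call that needs the provider.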

Deployment

Media Bot is deployed to the livekit namespace (separate from the dps namespace used by other services):

export KUBECONFIG=~/dok8s/dev_tunnel.yaml

kubectl set image deploy/livekit-media-bot-service \
  livekit-media-bot-service=registry.freston.io/intento/tenantator/media-bot:<sha> \
  -n livekit

Config is read from ai-secrets and ai-configs Kubernetes resources in the livekit namespace.

Smoke test: Place a call and verify a media-bot-<uuid> participant appears in the room alongside the AI agent.