Media Bot

Media Bot is a LiveKit worker that joins every call room alongside Atlas. Where Atlas handles the AI conversation, Media Bot handles everything else: hold music, ringback tones, busy signals, and live transcription with sentiment analysis. It is metadata-driven: Call Service controls it entirely through LiveKit participant attributes, with no direct RPC.

Source

PycharmProjects/AI/media-bot — Python 3.9+ / livekit-agents 1.5

Role in the Pipeline

graph LR
    CS[Call Service] -->|manual dispatch| LK[LiveKit]
    LK --> MB[Media Bot]
    CS -->|participant attribute: media.command| LK
    LK -->|attribute change event| MB
    MB -->|audio track| LK
    LK -->|hold music / tone| Caller
    MB -->|FINAL_TRANSCRIPT + sentiment| LK

Call Service dispatches both Media Bot and Atlas to every room. They operate independently: Media Bot reacts to participant attribute changes, while Atlas runs the AI conversation.

Tech Stack

Layer	Technology
Runtime	Python 3.9+
Framework	livekit-agents 1.5.9
STT — default	Groq Whisper (multilingual)
STT — English	Deepgram nova-3
STT — Arabic / other	ElevenLabs Scribe v2
Sentiment	VADER
Audio decoding	numpy (48 kHz mono PCM)
HTTP	aiohttp (asset download)

Responsibilities

Audio playback

Media Bot plays pre-recorded audio assets into the room. Call Service triggers playback by setting attributes on the caller's participant object in LiveKit:

Attribute	Value	Purpose
`media.command`	`play` / `stop`	Start or stop playback
`media.asset_url`	HTTPS URL to MP3	Audio to play
`media.loop`	`true` / `false`	Loop (hold music) or single-shot (tone)
`media.command_id`	UUID	Deduplication — prevents re-trigger on metadata propagation
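Because `media.command_id` exists so that re-propagated attribute values do not re-trigger playback, the handler only needs to act when the command ID differs from the one it last handled. A minimal sketch, assuming attributes arrive as a plain string-to-string dictionary (in the LiveKit Python SDK, for example, `participant.attributes` read inside a `participant_attributes_changed` handler); `MediaCommand` and `MediaCommandDeduper` are hypothetical names, not Media Bot's actual code:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class MediaCommand:
    """One playback command parsed from the media.* attributes."""
    command: str      # "play" or "stop"
    asset_url: str    # HTTPS URL to an MP3; may be empty for "stop"
    loop: bool        # True for hold music, False for single-shot tones
    command_id: str   # UUID used for deduplication


class MediaCommandDeduper:
    """Hypothetical helper: turns attribute snapshots into new commands only.

    Attribute-change events can repeat the same media.* values (for example
    when an unrelated attribute changes), so a command is acted on only when
    its media.command_id differs from the last one handled.
    """

    def __init__(self) -> None:
        self._last_command_id: Optional[str] = None

    def parse(self, attributes: Dict[str, str]) -> Optional[MediaCommand]:
        command_id = attributes.get("media.command_id", "")
        command = attributes.get("media.command", "")
        if not command_id or command not in ("play", "stop"):
            return None  # incomplete or unrecognized command
        if command_id == self._last_command_id:
            return None  # already handled; ignore the re-propagated values
        self._last_command_id = command_id
        return MediaCommand(
            command=command,
            asset_url=attributes.get("media.asset_url", ""),
            loop=attributes.get("media.loop", "false").lower() == "true",
            command_id=command_id,
        )
```

Tracking only the last ID keeps the state to a single string; a bot that expected old IDs to reappear out of order would need a bounded set instead.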

Media Bot downloads the asset via aiohttp, decodes it to 48 kHz mono PCM, scales the volume to 60%, and streams frames to a `TrackSource.SOURCE_SCREENSHARE_AUDIO` track (separate from Atlas's microphone track to prevent collisions).
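The gain and framing step can be sketched as follows. This is an illustration under assumptions: it uses plain Python lists instead of numpy, applies the 60% volume by multiplying each int16 sample by 0.6 with clipping, and assumes 10 ms frames (480 samples at 48 kHz). In the real worker each frame would be wrapped in an `rtc.AudioFrame` and pushed to an `rtc.AudioSource` whose track is published with `TrackSource.SOURCE_SCREENSHARE_AUDIO`; `scale_and_frame` is a hypothetical helper.

```python
from typing import List, Sequence

SAMPLE_RATE = 48_000   # 48 kHz mono PCM
FRAME_MS = 10          # assumed frame duration: 480 samples per frame
VOLUME = 0.6           # 60% volume
INT16_MIN, INT16_MAX = -32768, 32767


def scale_and_frame(
    samples: Sequence[int],
    volume: float = VOLUME,
    sample_rate: int = SAMPLE_RATE,
    frame_ms: int = FRAME_MS,
) -> List[List[int]]:
    """Scale int16 PCM samples by `volume` and cut them into equal frames.

    Scaled values are clipped to the int16 range; the final frame is
    zero-padded so every frame has the same number of samples.
    """
    samples_per_frame = sample_rate * frame_ms // 1000
    scaled = [max(INT16_MIN, min(INT16_MAX, round(s * volume))) for s in samples]
    frames: List[List[int]] = []
    for start in range(0, len(scaled), samples_per_frame):
        frame = scaled[start:start + samples_per_frame]
        frame.extend([0] * (samples_per_frame - len(frame)))  # pad the tail
        frames.append(frame)
    return frames
```

With numpy, the same scaling is a single vectorized expression such as `np.clip(pcm * 0.6, -32768, 32767).astype(np.int16)`.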

Supported audio types:

Type	Loop	Trigger
Hold music	Yes	Agent puts caller on hold
Ringback tone	No	Call is being connected
Busy tone	No	All agents unavailable

Non-looping audio (busy tone) automatically closes the room after playback drains.
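The loop versus single-shot distinction and the close-after-drain behavior can be sketched as one asyncio task. Assumptions: frames are already decoded, `send_frame` stands in for pushing a frame to the audio source (which would also pace playback in real time), `on_drained` stands in for whatever closes the room, and a `stop` event is set when `media.command` becomes `stop`. All of these names are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

Frame = List[int]


async def play_asset(
    frames: Sequence[Frame],
    loop: bool,
    send_frame: Callable[[Frame], Awaitable[None]],
    on_drained: Callable[[], Awaitable[None]],
    stop: asyncio.Event,
) -> None:
    """Stream decoded frames: repeat for hold music, play once for tones.

    A set `stop` event interrupts playback at the next frame boundary. For
    single-shot assets, `on_drained` runs after the last frame is sent
    (standing in for closing the room after a busy tone).
    """
    if not frames:
        return  # nothing to play; also avoids spinning forever when looping
    while not stop.is_set():
        for frame in frames:
            if stop.is_set():
                return  # an explicit stop command ends playback early
            await send_frame(frame)
        if not loop:
            await on_drained()
            return
```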

Hold isolation

When a caller is placed on hold, Media Bot unsubscribes them from all other audio tracks at the LiveKit server level so they only hear the hold track. The held party is re-subscribed on unhold.
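Hold isolation amounts to computing which audio tracks the held participant must stop receiving. A minimal sketch, assuming the bot keeps a map from participant identity to published audio track SIDs (hypothetical). The resulting SIDs would then go through LiveKit's server-side RoomService `UpdateSubscriptions` call with subscribe set to false on hold, and the same set would be re-subscribed on unhold.

```python
from typing import Dict, List, Set


def hold_unsubscribe_plan(
    audio_tracks: Dict[str, List[str]],
    held_identity: str,
    hold_track_sid: str,
) -> Set[str]:
    """Audio track SIDs the held participant should be unsubscribed from.

    `audio_tracks` maps participant identity -> published audio track SIDs.
    Every other participant's audio is dropped except the hold-music track.
    """
    plan: Set[str] = set()
    for identity, track_sids in audio_tracks.items():
        if identity == held_identity:
            continue  # participants never receive their own tracks
        plan.update(sid for sid in track_sids if sid != hold_track_sid)
    return plan
```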

Live transcription

Media Bot runs STT on all human participants in the room in parallel with the Atlas conversation. It selects the STT provider by language:

stt.language = "en" / "en-US" / "en-GB"  →  Deepgram nova-3
stt.language = "ar" / "ar-SA"             →  ElevenLabs Scribe v2
all other languages                        →  Groq Whisper
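The routing table above reduces to a small lookup. A sketch, assuming exact matches on the listed codes, compared case-insensitively since BCP-47 tags are case-insensitive. Whether Media Bot also routes other regional variants such as `en-AU` by prefix is not specified, so this version sends them to Groq Whisper. The returned provider labels are hypothetical.

```python
from typing import Optional

# Codes from the routing table above, lowercased for case-insensitive matching.
DEEPGRAM_LANGS = {"en", "en-us", "en-gb"}
ELEVENLABS_LANGS = {"ar", "ar-sa"}


def select_stt_provider(language: Optional[str]) -> str:
    """Map the stt.language attribute to an STT provider label."""
    code = (language or "").strip().lower()
    if code in DEEPGRAM_LANGS:
        return "deepgram"    # Deepgram nova-3
    if code in ELEVENLABS_LANGS:
        return "elevenlabs"  # ElevenLabs Scribe v2
    return "groq"            # Groq Whisper, the multilingual default
```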

Call Service activates transcription by setting attributes on Media Bot's own participant:

Attribute	Value
`stt.enabled`	`true` / `false`
`stt.language`	BCP-47 language code
`stt.command_id`	UUID (deduplication)
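When `stt.enabled` flips to `true`, the bot has to decide which participants get an STT stream. A sketch, assuming "human" means any participant whose kind is not agent (LiveKit tags participants with a kind such as standard, SIP, or agent). The `Participant` dataclass and its lowercase kind strings are a simplified, hypothetical stand-in for the SDK's participant objects.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Participant:
    identity: str
    kind: str  # simplified stand-in for LiveKit's kind: "standard", "sip", "agent", ...


def participants_to_transcribe(
    participants: Iterable[Participant], bot_attributes: Dict[str, str]
) -> List[str]:
    """Identities that should get an STT stream under the current stt.* attributes."""
    enabled = bot_attributes.get("stt.enabled", "false").lower() == "true"
    if not enabled:
        return []
    # Humans only: skip Atlas, Media Bot itself, and any other agent participants.
    return [p.identity for p in participants if p.kind != "agent"]
```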

Transcription output (FINAL_TRANSCRIPT + VADER sentiment scores) is published to the LiveKit data channel.
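VADER's `SentimentIntensityAnalyzer().polarity_scores(text)` returns `neg`, `neu`, `pos`, and `compound` scores. The sketch below packs one final transcript and those scores into a JSON message for the data channel, labelling the compound score with VADER's conventional ±0.05 thresholds. The message schema is hypothetical, not Media Bot's actual wire format; the resulting bytes could be sent with `local_participant.publish_data(...)`.

```python
import json
from typing import Dict


def sentiment_label(compound: float) -> str:
    """Bucket a VADER compound score using the conventional ±0.05 thresholds."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def build_transcript_message(identity: str, text: str, scores: Dict[str, float]) -> bytes:
    """Serialize one final transcript and its VADER scores as JSON bytes.

    `scores` is the dict returned by polarity_scores(): neg, neu, pos, compound.
    The field names below are a hypothetical schema.
    """
    payload = {
        "type": "FINAL_TRANSCRIPT",
        "participant": identity,
        "text": text,
        "sentiment": {**scores, "label": sentiment_label(scores["compound"])},
    }
    return json.dumps(payload).encode("utf-8")
```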

Media Bot vs Atlas

	Media Bot	Atlas
Purpose	Tones, hold, transcription	AI conversation
Audio output	Pre-recorded assets	AI-generated TTS
Audio input	All human participants (STT)	Caller only
Control	Participant attributes	LiveKit RPC + Compass config
Track type	`SOURCE_SCREENSHARE_AUDIO`	`SOURCE_MICROPHONE`
Deployment	Light (≤ 512 Mi RAM)	Full voice pipeline

Configuration

Variable	Default	Purpose
`LIVEKIT_URL`	(required)	LiveKit server WebSocket URL
`LIVEKIT_API_KEY`	(required)	LiveKit API key
`LIVEKIT_API_SECRET`	(required)	LiveKit API secret
`STT_PROVIDER`	`groq`	Default STT provider
`GROQ_API_KEY`	(required)	Groq API key
`DEEPGRAM_API_KEY`	(required)	Deepgram API key
`ELEVENLABS_API_KEY`	(required)	ElevenLabs API key
`MEDIA_BOT_NUM_IDLE_PROCESSES`	`2`	Pre-spawned worker processes
`MEDIA_BOT_LOAD_THRESHOLD`	`0.75`	CPU load scale trigger
`MEDIA_BOT_MAX_RETRY`	`16`	Max connection retries
`MEDIA_BOT_PROMETHEUS_PORT`	unset	Optional Prometheus metrics port
`LOG_LEVEL`	`INFO`	Log verbosity
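The table maps naturally onto a typed settings object. A sketch, assuming every setting comes from environment variables (for example, populated from `ai-secrets` and `ai-configs`) and that missing required values should fail fast at startup; `MediaBotConfig`, `load_config`, and `REQUIRED` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# In this sketch the STT keys are only checked for presence, not stored.
REQUIRED = (
    "LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET",
    "GROQ_API_KEY", "DEEPGRAM_API_KEY", "ELEVENLABS_API_KEY",
)


@dataclass(frozen=True)
class MediaBotConfig:
    livekit_url: str
    livekit_api_key: str
    livekit_api_secret: str
    stt_provider: str
    num_idle_processes: int
    load_threshold: float
    max_retry: int
    prometheus_port: Optional[int]
    log_level: str


def load_config(env: Mapping[str, str] = os.environ) -> MediaBotConfig:
    """Read the settings above, applying the documented defaults."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        # Fail fast at startup rather than on the first call that needs a key.
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    port = env.get("MEDIA_BOT_PROMETHEUS_PORT")
    return MediaBotConfig(
        livekit_url=env["LIVEKIT_URL"],
        livekit_api_key=env["LIVEKIT_API_KEY"],
        livekit_api_secret=env["LIVEKIT_API_SECRET"],
        stt_provider=env.get("STT_PROVIDER", "groq"),
        num_idle_processes=int(env.get("MEDIA_BOT_NUM_IDLE_PROCESSES", "2")),
        load_threshold=float(env.get("MEDIA_BOT_LOAD_THRESHOLD", "0.75")),
        max_retry=int(env.get("MEDIA_BOT_MAX_RETRY", "16")),
        prometheus_port=int(port) if port else None,  # unset: no metrics port
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```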

Deployment

Media Bot is deployed to the livekit namespace (separate from the dps namespace used by other services):

export KUBECONFIG=~/dok8s/dev_tunnel.yaml

kubectl set image deploy/livekit-media-bot-service \
  livekit-media-bot-service=registry.freston.io/intento/tenantator/media-bot:<sha> \
  -n livekit

Config is read from ai-secrets and ai-configs Kubernetes resources in the livekit namespace.

Smoke test: Place a call and verify a media-bot-<uuid> participant appears in the room alongside the AI agent.
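The Media Bot half of the smoke test can be scripted once you have the room's participant identities (for example from the RoomService `ListParticipants` API). A sketch, assuming `<uuid>` is a canonical 8-4-4-4-12 hex UUID; `media_bot_present` is a hypothetical helper, and checking for the AI agent is left out because its identity format is not documented here.

```python
import re
from typing import Iterable

# Assumes <uuid> is a canonical 8-4-4-4-12 hex UUID (matched case-insensitively).
MEDIA_BOT_IDENTITY = re.compile(
    r"^media-bot-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)


def media_bot_present(identities: Iterable[str]) -> bool:
    """True if any participant identity looks like media-bot-<uuid>."""
    return any(MEDIA_BOT_IDENTITY.match(identity) for identity in identities)
```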