Media Bot¶
Media Bot is a LiveKit worker that joins every call room alongside Atlas. Where Atlas handles the AI conversation, Media Bot handles everything else: hold music, ringback tones, busy signals, and live transcription with sentiment analysis. It is metadata-driven — Call Service controls it entirely through LiveKit room attributes with no direct RPC.
Source
PycharmProjects/AI/media-bot — Python 3.9+ / livekit-agents 1.5
Role in the Pipeline¶
graph LR
CS[Call Service] -->|manual dispatch| LK[LiveKit]
LK --> MB[Media Bot]
CS -->|room attribute: media.command| LK
LK -->|attribute change event| MB
MB -->|audio track| LK
LK -->|hold music / tone| Caller
MB -->|FINAL_TRANSCRIPT + sentiment| LK
Media Bot and Atlas are dispatched to every room by Call Service. They operate independently — Media Bot reacts to metadata, Atlas runs the AI conversation.
Tech Stack¶
| Layer | Technology |
|---|---|
| Runtime | Python 3.9+ |
| Framework | livekit-agents 1.5.9 |
| STT — default | Groq Whisper (multilingual) |
| STT — English | Deepgram nova-3 |
| STT — Arabic / other | ElevenLabs Scribe v2 |
| Sentiment | VADER |
| Audio decoding | numpy (48 kHz mono PCM) |
| HTTP | aiohttp (asset download) |
Responsibilities¶
Audio playback¶
Media Bot plays pre-recorded audio assets into the room. Call Service triggers playback by setting attributes on the caller's participant object in LiveKit:
| Attribute | Value | Purpose |
|---|---|---|
| media.command | play / stop | Start or stop playback |
| media.asset_url | HTTPS URL to MP3 | Audio to play |
| media.loop | true / false | Loop (hold music) or single-shot (tone) |
| media.command_id | UUID | Deduplication: prevents re-trigger on metadata propagation |
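The attribute contract above can be sketched as a small parser that ignores repeated deliveries of the same `media.command_id`. This is a minimal illustration, not the service's actual code: the `MediaCommand` and `CommandDeduplicator` names are hypothetical, and it assumes attributes arrive as a plain string-to-string mapping.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MediaCommand:
    command: str      # "play" or "stop"
    asset_url: str    # HTTPS URL of the MP3 (may be empty for "stop")
    loop: bool        # True for hold music, False for a single-shot tone
    command_id: str   # UUID used for deduplication


class CommandDeduplicator:
    """Hypothetical helper: remembers the last command_id that was handled."""

    def __init__(self) -> None:
        self._last_id: Optional[str] = None

    def parse(self, attrs: Mapping[str, str]) -> Optional[MediaCommand]:
        """Return a MediaCommand for a new command, or None for a repeat or invalid one."""
        command = attrs.get("media.command", "")
        command_id = attrs.get("media.command_id", "")
        if command not in ("play", "stop") or not command_id:
            return None
        if command_id == self._last_id:
            return None  # same command delivered again: ignore it
        self._last_id = command_id
        return MediaCommand(
            command=command,
            asset_url=attrs.get("media.asset_url", ""),
            loop=attrs.get("media.loop", "false").lower() == "true",
            command_id=command_id,
        )
```

Keeping only the most recent ID is the simplest policy that covers repeated propagation of an unchanged attribute set; whether the real worker tracks a longer history is not stated here.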
Media Bot downloads the asset via aiohttp, decodes to 48 kHz mono PCM, scales volume to 60%, and streams frames to a TrackSource.SOURCE_SCREENSHARE_AUDIO track (separate from Atlas's microphone track to prevent collisions).
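The volume scaling and framing steps can be illustrated as follows. This sketch uses only the standard library for self-containment and assumes the asset has already been decoded to signed 16-bit samples; the 20 ms frame size and the function names are assumptions, not values taken from the service.

```python
from array import array
from typing import Iterator

SAMPLE_RATE = 48_000   # 48 kHz mono, as described above
VOLUME = 0.6           # 60% volume, as described above
FRAME_MS = 20          # assumption: 20 ms frames
SAMPLES_PER_FRAME = SAMPLE_RATE * FRAME_MS // 1000


def scale_volume(pcm: array, volume: float = VOLUME) -> array:
    """Scale signed 16-bit samples, clamping to the int16 range."""
    return array("h", (max(-32768, min(32767, round(s * volume))) for s in pcm))


def iter_frames(pcm: array, samples_per_frame: int = SAMPLES_PER_FRAME) -> Iterator[array]:
    """Yield fixed-size frames; the final frame is padded with silence."""
    for start in range(0, len(pcm), samples_per_frame):
        frame = pcm[start:start + samples_per_frame]
        if len(frame) < samples_per_frame:
            frame.extend([0] * (samples_per_frame - len(frame)))
        yield frame
```

Padding the last frame with zeros keeps every frame the same length, which simplifies pacing when frames are pushed to the track at a fixed interval.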
Supported audio types:
| Type | Loop | Trigger |
|---|---|---|
| Hold music | Yes | Agent puts caller on hold |
| Ringback tone | No | Call is being connected |
| Busy tone | No | All agents unavailable |
When non-looping audio (busy tone) finishes playing and the playback buffer drains, Media Bot closes the room automatically.
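The difference between looping and single-shot playback can be sketched as a small asyncio routine. Everything here is illustrative: `send_frame` and `on_drained` are hypothetical callbacks standing in for pushing a frame to the audio track and for whatever follow-up action runs once single-shot audio has drained.

```python
import asyncio
from typing import Awaitable, Callable, Sequence


async def play(
    frames: Sequence[bytes],
    loop: bool,
    stop: asyncio.Event,
    send_frame: Callable[[bytes], Awaitable[None]],
    on_drained: Callable[[], Awaitable[None]],
) -> None:
    """Send frames once, or repeatedly when loop is True, until stop is set."""
    if not frames:
        return  # nothing to play; avoids spinning on an empty asset
    while not stop.is_set():
        for frame in frames:
            if stop.is_set():
                return  # stopped mid-playback: no drain callback
            await send_frame(frame)
        if not loop:
            await on_drained()  # single-shot audio finished on its own
            return
```

A `stop` command sets the event and playback ends without invoking `on_drained`, so the follow-up action only runs when single-shot audio completes naturally.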
Hold isolation¶
When a caller is placed on hold, Media Bot unsubscribes them from all other audio tracks at the LiveKit server level so they only hear the hold track. The held party is re-subscribed on unhold.
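One way to picture hold isolation is as a planning step that decides which track SIDs the held party should stop receiving. The sketch below is a pure function under assumed data shapes: the `AudioTrack` type and the identities are hypothetical, and the resulting list would then be handed to a server-side subscription update (which API the service uses for that is not stated here).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AudioTrack:
    sid: str         # track SID
    publisher: str   # identity of the publishing participant
    source: str      # e.g. "SOURCE_MICROPHONE" or "SOURCE_SCREENSHARE_AUDIO"


def tracks_to_unsubscribe(
    tracks: Iterable[AudioTrack],
    held_identity: str,
    media_bot_identity: str,
) -> List[str]:
    """Return the SIDs the held party should be unsubscribed from."""
    result = []
    for track in tracks:
        if track.publisher == held_identity:
            continue  # participants do not subscribe to their own tracks
        is_hold_track = (
            track.publisher == media_bot_identity
            and track.source == "SOURCE_SCREENSHARE_AUDIO"
        )
        if not is_hold_track:
            result.append(track.sid)
    return result
```

On unhold, the same list can be reused with the subscription flag inverted to restore the held party's audio.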
Live transcription¶
Media Bot runs STT on all human participants in the room in parallel with the Atlas conversation. It selects the STT provider by language:
stt.language = "en" / "en-US" / "en-GB" → Deepgram nova-3
stt.language = "ar" / "ar-SA" → ElevenLabs Scribe v2
all other languages → Groq Whisper
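The routing rules above amount to a lookup with a fallback. The sketch below maps only the language codes listed and treats them case-insensitively; the provider strings and function name are illustrative, and whether other regional variants (for example en-AU) are routed to Deepgram is not stated, so they fall through to the default here.

```python
_PROVIDER_BY_LANGUAGE = {
    "en": "deepgram",
    "en-us": "deepgram",
    "en-gb": "deepgram",
    "ar": "elevenlabs",
    "ar-sa": "elevenlabs",
}


def select_stt_provider(language: str, default: str = "groq") -> str:
    """Pick an STT provider for a language code, falling back to the default."""
    return _PROVIDER_BY_LANGUAGE.get(language.strip().lower(), default)
```

For example, `select_stt_provider("en-GB")` resolves to the Deepgram entry, while `select_stt_provider("fr")` falls back to Groq.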
Call Service activates transcription by setting attributes on Media Bot's own participant:
| Attribute | Value |
|---|---|
| stt.enabled | true / false |
| stt.language | BCP-47 language code |
| stt.command_id | UUID (deduplication) |
Transcription output (FINAL_TRANSCRIPT + VADER sentiment scores) is published to the LiveKit data channel.
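To make the output concrete, here is a sketch of how a transcript message with sentiment might be serialized before being published. The JSON field names are hypothetical since the actual schema is not documented here. The four score keys (`neg`, `neu`, `pos`, `compound`) are the ones VADER returns, and the ±0.05 thresholds on `compound` are the conventional cut-offs from the VADER documentation.

```python
import json
from typing import Mapping


def sentiment_label(compound: float) -> str:
    """Map a VADER compound score to a label using the conventional thresholds."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def build_transcript_message(
    participant: str,
    text: str,
    language: str,
    scores: Mapping[str, float],
) -> bytes:
    """Serialize a final transcript plus sentiment as UTF-8 JSON (hypothetical schema)."""
    payload = {
        "type": "FINAL_TRANSCRIPT",
        "participant": participant,
        "language": language,
        "text": text,
        "sentiment": {
            "neg": scores["neg"],
            "neu": scores["neu"],
            "pos": scores["pos"],
            "compound": scores["compound"],
            "label": sentiment_label(scores["compound"]),
        },
    }
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")
```

`ensure_ascii=False` keeps Arabic and other non-Latin transcripts readable in the payload instead of escaping them.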
Media Bot vs Atlas¶
| | Media Bot | Atlas |
|---|---|---|
| Purpose | Tones, hold, transcription | AI conversation |
| Audio output | Pre-recorded assets | AI-generated TTS |
| Audio input | All participants (STT) | Caller only |
| Control | Room metadata attributes | LiveKit RPC + Compass config |
| Track type | SOURCE_SCREENSHARE_AUDIO | SOURCE_MICROPHONE |
| Deployment | Light (≤ 512 Mi RAM) | Full voice pipeline |
Configuration¶
| Variable | Default | Purpose |
|---|---|---|
| LIVEKIT_URL | (required) | LiveKit server WebSocket URL |
| LIVEKIT_API_KEY | (required) | LiveKit API key |
| LIVEKIT_API_SECRET | (required) | LiveKit API secret |
| STT_PROVIDER | groq | Default STT provider |
| GROQ_API_KEY | (required) | Groq API key |
| DEEPGRAM_API_KEY | (required) | Deepgram API key |
| ELEVENLABS_API_KEY | (required) | ElevenLabs API key |
| MEDIA_BOT_NUM_IDLE_PROCESSES | 2 | Pre-spawned worker processes |
| MEDIA_BOT_LOAD_THRESHOLD | 0.75 | CPU load scale trigger |
| MEDIA_BOT_MAX_RETRY | 16 | Max connection retries |
| MEDIA_BOT_PROMETHEUS_PORT | unset | Optional Prometheus metrics port |
| LOG_LEVEL | INFO | Log verbosity |
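As an illustration of how these variables and defaults fit together, the sketch below loads them into a typed settings object and fails fast when a required variable is missing. The `Settings` class and `load_settings` function are hypothetical; only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED_VARS = (
    "LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET",
    "GROQ_API_KEY", "DEEPGRAM_API_KEY", "ELEVENLABS_API_KEY",
)


@dataclass(frozen=True)
class Settings:
    livekit_url: str
    stt_provider: str
    num_idle_processes: int
    load_threshold: float
    max_retry: int
    prometheus_port: Optional[int]
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from the environment, applying the documented defaults."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    port = env.get("MEDIA_BOT_PROMETHEUS_PORT")
    return Settings(
        livekit_url=env["LIVEKIT_URL"],
        stt_provider=env.get("STT_PROVIDER", "groq"),
        num_idle_processes=int(env.get("MEDIA_BOT_NUM_IDLE_PROCESSES", "2")),
        load_threshold=float(env.get("MEDIA_BOT_LOAD_THRESHOLD", "0.75")),
        max_retry=int(env.get("MEDIA_BOT_MAX_RETRY", "16")),
        prometheus_port=int(port) if port else None,  # unset means no metrics port
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing the environment as a parameter makes the loader easy to exercise with a plain dictionary instead of real process state.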
Deployment¶
Media Bot is deployed to the livekit namespace (separate from the dps namespace used by other services):
export KUBECONFIG=~/dok8s/dev_tunnel.yaml
kubectl set image deploy/livekit-media-bot-service \
livekit-media-bot-service=registry.freston.io/intento/tenantator/media-bot:<sha> \
-n livekit
Config is read from ai-secrets and ai-configs Kubernetes resources in the livekit namespace.
Smoke test: Place a call and verify a media-bot-<uuid> participant appears in the room alongside the AI agent.
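The smoke-test check can be automated once the room's participant identities are available from whatever tooling lists them. The sketch below only covers the matching step, and it assumes the `<uuid>` suffix is a canonical 8-4-4-4-12 hexadecimal UUID, which is not confirmed here.

```python
import re
from typing import Iterable

# Assumption: the <uuid> suffix is a canonical hyphenated UUID.
_MEDIA_BOT_IDENTITY = re.compile(
    r"^media-bot-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)


def has_media_bot(identities: Iterable[str]) -> bool:
    """Return True if any participant identity looks like media-bot-<uuid>."""
    return any(_MEDIA_BOT_IDENTITY.match(identity) for identity in identities)
```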