Audio specifications
| Property | Value |
|---|---|
| Encoding | audio/x-mulaw (G.711 PCMU) |
| Sample rate | 8000 Hz |
| Bit rate | 64 kbps |
| Bit depth | 8-bit |
| Format | Base64-encoded PCM mono audio |
| Message frequency | Every 100ms |
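The values in this table are related by simple arithmetic. The sketch below derives them, assuming each media message carries exactly 100 ms of audio; the constant and variable names are illustrative and not part of the protocol.

```python
import base64

SAMPLE_RATE_HZ = 8000   # from the table above
BITS_PER_SAMPLE = 8     # mu-law stores one byte per sample
FRAME_MS = 100          # assumption: one message holds 100 ms of audio

# 8000 samples/s * 8 bits/sample = 64 kbps
bit_rate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE

# Number of samples, and therefore bytes, in one 100 ms message
samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000
bytes_per_frame = samples_per_frame * BITS_PER_SAMPLE // 8

# Base64 expands every 3 bytes to 4 characters (with padding)
base64_chars_per_frame = len(base64.b64encode(bytes(bytes_per_frame)))

print(bit_rate_bps, bytes_per_frame, base64_chars_per_frame)
```

Under these assumptions, an endpoint should expect roughly ten messages per second, each with a payload of about one kilobyte of Base64 text.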
Stream lifecycle
Connection established
A WebSocket connection is opened. The system sends a connected event as the initial handshake.
Stream initialized
A start event is sent with stream metadata, caller information, and audio format details.
Audio streaming
media events are exchanged bidirectionally: the system sends caller audio, and the vendor returns processed audio for playback.
Events sent to vendor
Connected
First handshake message after WebSocket connection is established.
Start
Contains stream metadata. Sent once at stream initialization.
| Field | Type | Description |
|---|---|---|
| event | string | Always "start" |
| sequenceNumber | string | Message order tracking (starts at 1) |
| start.streamSid | string | Unique stream identifier |
| start.accountSid | string | Account identifier |
| start.callSid | string | Call identifier |
| from | string | Originating phone number |
| to | string | Destination phone number |
| start.direction | string | "inbound" or "outbound" |
| start.mediaFormat | object | Audio format specifications |
| start.customParameters | object | Custom key-value data passed with the stream |
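As an illustration of how the start event might be consumed, here is a minimal sketch that parses the message and keeps the fields needed later. It assumes the dotted field names above map to a nested JSON object under a "start" key; the StreamInfo class, the parse_start helper, and all sample values are hypothetical.

```python
import json
from dataclasses import dataclass, field


@dataclass
class StreamInfo:
    """Subset of start-event fields kept for the rest of the stream."""
    stream_sid: str
    call_sid: str
    direction: str
    custom_parameters: dict = field(default_factory=dict)


def parse_start(raw: str) -> StreamInfo:
    """Parse a start message; the nesting under "start" is an assumption."""
    msg = json.loads(raw)
    if msg.get("event") != "start":
        raise ValueError(f"expected a start event, got {msg.get('event')!r}")
    start = msg["start"]
    return StreamInfo(
        stream_sid=start["streamSid"],
        call_sid=start["callSid"],
        direction=start["direction"],
        custom_parameters=start.get("customParameters") or {},
    )


# Hypothetical sample message; identifiers are placeholders.
example = json.dumps({
    "event": "start",
    "sequenceNumber": "1",
    "start": {
        "streamSid": "stream-123",
        "accountSid": "account-789",
        "callSid": "call-456",
        "direction": "inbound",
        "customParameters": {"language": "en"},
    },
})
info = parse_start(example)
print(info.stream_sid, info.direction)
```

Storing the streamSid once, at this point, makes it straightforward to reuse the same value in every later message for the stream.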
Media
Encapsulates raw audio data from the caller.
| Field | Type | Description |
|---|---|---|
| media.chunk | string | Sequential chunk number (starts at 1) |
| media.timestamp | string | Presentation timestamp in milliseconds from stream start |
| media.payload | string | Base64-encoded raw audio |
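Many speech pipelines expect 16-bit linear PCM rather than mu-law. The sketch below decodes a media.payload value from Base64 and expands each mu-law byte with the standard G.711 expansion; the function names are illustrative, and the sample payload is constructed locally rather than taken from a real stream.

```python
import base64


def mulaw_byte_to_pcm16(value: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~value & 0xFF                 # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_media_payload(payload_b64: str) -> list[int]:
    """Base64 text -> mu-law bytes -> list of 16-bit PCM samples."""
    raw = base64.b64decode(payload_b64)
    return [mulaw_byte_to_pcm16(b) for b in raw]


# Three hand-picked mu-law bytes: silence, positive peak, negative peak.
sample_payload = base64.b64encode(bytes([0xFF, 0x80, 0x00])).decode("ascii")
pcm = decode_media_payload(sample_payload)
print(pcm)
```

A per-byte Python loop is fine for illustration; a production endpoint would more likely use a precomputed 256-entry lookup table or a native audio library.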
Stop
Indicates stream termination or call end.
| Field | Type | Description |
|---|---|---|
| stop.accountSid | string | Account identifier |
| stop.callSid | string | Call identifier |
| stop.reason | string | Reason for stream termination |
DTMF
Sent when touch-tone digits are detected in the inbound audio stream.
| Field | Type | Description |
|---|---|---|
| dtmf.digit | string | Detected digit (0–9, *, #) |
Mark
Indicates completion of media playback. Used to synchronize audio delivery between client and vendor.
| Field | Type | Description |
|---|---|---|
| mark.name | string | Custom label for tracking playback completion |
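The events sent to the vendor can be routed with a small dispatcher keyed on the event field. This is a sketch under stated assumptions: only the "start" value is confirmed by the tables above, while the other event names ("connected", "media", "stop", "dtmf", "mark") and the JSON nesting are inferred from the field prefixes. The StreamSession class is hypothetical.

```python
import json


class StreamSession:
    """Collects state from inbound events for one stream."""

    def __init__(self) -> None:
        self.stream_sid = None
        self.payloads = []        # Base64 audio payloads, in arrival order
        self.digits = ""          # DTMF digits pressed so far
        self.marks = []           # names of completed playback marks
        self.stop_reason = None
        self.stopped = False

    def handle(self, raw: str) -> None:
        msg = json.loads(raw)
        event = msg.get("event")
        if event == "connected":
            pass                                   # handshake only
        elif event == "start":
            self.stream_sid = msg["start"]["streamSid"]
        elif event == "media":
            self.payloads.append(msg["media"]["payload"])
        elif event == "dtmf":
            self.digits += msg["dtmf"]["digit"]
        elif event == "mark":
            self.marks.append(msg["mark"]["name"])
        elif event == "stop":
            self.stop_reason = msg.get("stop", {}).get("reason")
            self.stopped = True
        # Unknown events are ignored so new event types do not break the handler.


session = StreamSession()
for message in [
    {"event": "connected"},
    {"event": "start", "start": {"streamSid": "stream-123"}},
    {"event": "media", "media": {"chunk": "1", "timestamp": "100", "payload": "/w=="}},
    {"event": "dtmf", "dtmf": {"digit": "5"}},
    {"event": "mark", "mark": {"name": "greeting-done"}},
    {"event": "stop", "stop": {"reason": "call-ended"}},
]:
    session.handle(json.dumps(message))
print(session.stream_sid, session.digits, session.marks, session.stopped)
```

In a real endpoint, handle would be called once per WebSocket text frame, and the media branch would pass the payload to an audio pipeline instead of buffering it.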
Events received from vendor
Media
The vendor sends audio data for playback to the caller. The payload must be audio/x-mulaw at 8000 Hz, Base64-encoded PCM mono.
Mark
The vendor signals that sent media has finished playing.
Clear
Interrupts any buffered audio and resets the stream state. Use this to stop playback immediately (e.g., when the caller interrupts).
Implementation notes
- The streamSid must be unique and consistent across all events for a given stream. Use the value from the start event for all subsequent messages.
- Base64 encoding ensures compatibility and reliable transmission across systems.
- mark events synchronize playback completion between endpoints; use them to know when audio has finished playing before sending the next segment.
- clear events allow you to interrupt the current audio (e.g., for barge-in when the caller speaks over the bot).
- Media messages are sent every 100ms; ensure your endpoint can process them at this rate.
- sequenceNumber tracks message order and increments with each event sent.
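To make the vendor-side messages concrete, the sketch below builds media, mark, and clear messages that reuse the streamSid from the start event. The exact JSON shape of vendor messages is not specified above, so the top-level "event" and "streamSid" keys and the nested "media" and "mark" objects are assumptions, as is splitting outbound audio into 100 ms (800-byte) frames. All function names are hypothetical.

```python
import base64
import json

FRAME_BYTES = 800  # assumption: 100 ms of 8 kHz, 8-bit mu-law per message


def media_message(stream_sid: str, mulaw_frame: bytes) -> str:
    payload = base64.b64encode(mulaw_frame).decode("ascii")
    return json.dumps({"event": "media", "streamSid": stream_sid,
                       "media": {"payload": payload}})


def mark_message(stream_sid: str, name: str) -> str:
    return json.dumps({"event": "mark", "streamSid": stream_sid,
                       "mark": {"name": name}})


def clear_message(stream_sid: str) -> str:
    return json.dumps({"event": "clear", "streamSid": stream_sid})


def playback_messages(stream_sid: str, audio: bytes, mark_name: str) -> list[str]:
    """Split mu-law audio into frames and append a mark after the last one."""
    messages = [
        media_message(stream_sid, audio[i:i + FRAME_BYTES])
        for i in range(0, len(audio), FRAME_BYTES)
    ]
    messages.append(mark_message(stream_sid, mark_name))
    return messages


# 2000 bytes of mu-law silence (0xFF) as placeholder audio.
silence = bytes([0xFF]) * 2000
outgoing = playback_messages("stream-123", silence, "greeting-done")
barge_in = clear_message("stream-123")
print(len(outgoing), json.loads(outgoing[-1])["event"])
```

Ending each segment with a mark gives the endpoint a way to tell when that segment has finished playing, while a clear message can be sent at any point to drop audio that is still buffered.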

