Use session.speaker to play audio through the user’s smart glasses. The SpeakerManager supports three modes: text-to-speech, audio file playback from a URL, and real-time audio streaming.
Quick Examples
Text-to-Speech
Convert text to speech using ElevenLabs and play it through the glasses speakers.
Audio File Playback
Play an audio file from a URL.
Real-Time Audio Streaming
Stream audio chunks in real time, for example from a conversational AI API.
Permissions
Your app needs the speaker permission. Add it in the Developer Console when creating or editing your app. Check the permission at runtime with session.speaker.hasPermission.
API Reference
speak
session.speaker.speak(text, options?) converts text to speech via ElevenLabs and plays it through the glasses. Returns a Promise<PlayResult>.
| Option | Type | Description |
|---|---|---|
| voiceId | string | ElevenLabs voice ID |
| modelId | string | ElevenLabs model ID |
| voiceSettings | object | Voice tuning parameters (see below) |
| volume | number | Playback volume (0.0 to 1.0) |
| trackId | string | Identifier for this audio track, used with stop() |
| stopOtherAudio | boolean | Stop any currently playing audio before speaking |
The voiceSettings object accepts the following fields:
| Setting | Type | Description |
|---|---|---|
| stability | number | Voice stability (0.0 to 1.0) |
| similarityBoost | number | Similarity boost (0.0 to 1.0) |
| style | number | Style exaggeration (0.0 to 1.0) |
| speed | number | Speech speed multiplier |
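The options above can be combined in a single call. The following is a minimal sketch, not the SDK's own sample: the interfaces are local stand-ins written from the tables in this section, the track name and setting values are illustrative, and fakeSpeaker is a hypothetical in-memory stub so the snippet runs without glasses attached.

```typescript
// Local stand-ins mirroring the option tables above; the real SDK exports its own types.
interface VoiceSettings {
  stability?: number;
  similarityBoost?: number;
  style?: number;
  speed?: number;
}

interface SpeakOptions {
  voiceId?: string;
  modelId?: string;
  voiceSettings?: VoiceSettings;
  volume?: number;
  trackId?: string;
  stopOtherAudio?: boolean;
}

type PlayResult = unknown; // result fields are not documented in this section

interface SpeakingSpeaker {
  speak(text: string, options?: SpeakOptions): Promise<PlayResult>;
}

// Speak a short notification, replacing whatever is currently playing.
function announce(speaker: SpeakingSpeaker, text: string): Promise<PlayResult> {
  return speaker.speak(text, {
    volume: 0.8,
    trackId: "announcement", // hypothetical track name
    stopOtherAudio: true,
    voiceSettings: { stability: 0.5, similarityBoost: 0.75, speed: 1.0 }, // illustrative values
  });
}

// Hypothetical stub that records calls instead of producing audio.
const spoken: Array<{ text: string; options: SpeakOptions | undefined }> = [];
const fakeSpeaker: SpeakingSpeaker = {
  speak: (text, options) => {
    spoken.push({ text, options });
    return Promise.resolve(undefined);
  },
};

void announce(fakeSpeaker, "Meeting starts in five minutes.");
```

In an app, session.speaker would take the place of fakeSpeaker.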
play
session.speaker.play(options) plays an audio file from a URL. Returns a Promise<PlayResult>.
| Option | Type | Description |
|---|---|---|
| url | string | URL of the audio file (must be publicly accessible) |
| volume | number | Playback volume (0.0 to 1.0) |
| trackId | string | Identifier for this audio track, used with stop() |
| stopOtherAudio | boolean | Stop any currently playing audio first |
createStream
session.speaker.createStream(options?) opens a real-time audio output stream. Returns a Promise<AudioOutputStream>.
| Option | Type | Default | Description |
|---|---|---|---|
| format | "mp3" \| "pcm16" | "mp3" | Audio format |
| sampleRate | number | - | Sample rate in Hz |
| channels | number | - | Number of audio channels |
| bitrate | number | - | Bitrate for compressed formats |
| volume | number | - | Playback volume (0.0 to 1.0) |
| trackId | string | - | Identifier for this audio track |
| stopOtherAudio | boolean | - | Stop any currently playing audio first |
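A minimal sketch of opening a stream and feeding it audio, using the write(), end(), and onStateChange() members documented for AudioOutputStream. The interfaces are local stand-ins, the 16000 Hz mono setting is an assumed value, and FakeStream is a hypothetical in-memory implementation whose transition timing (streaming on first write, ended immediately after end()) is a simplification rather than documented behavior.

```typescript
// Local stand-ins for the SDK types, written from the tables in this reference.
type StreamState = "created" | "streaming" | "ending" | "ended" | "error";

interface AudioOutputStream {
  write(chunk: Uint8Array): void;
  end(): void;
  flush(): void;
  onStateChange(handler: (state: StreamState) => void): void;
  readonly state: StreamState;
  readonly id: string;
}

interface StreamOptions {
  format?: "mp3" | "pcm16";
  sampleRate?: number;
  channels?: number;
}

interface StreamingSpeaker {
  createStream(options?: StreamOptions): Promise<AudioOutputStream>;
}

// Open a PCM16 stream, write every chunk, then end gracefully.
// Returns the state transitions observed along the way.
async function playChunks(speaker: StreamingSpeaker, chunks: Uint8Array[]): Promise<StreamState[]> {
  const seen: StreamState[] = [];
  // Sample rate and channel count are assumed values; match them to your audio source.
  const stream = await speaker.createStream({ format: "pcm16", sampleRate: 16000, channels: 1 });
  stream.onStateChange((state) => seen.push(state));
  for (const chunk of chunks) {
    stream.write(chunk);
  }
  stream.end(); // lets buffered audio finish, unlike flush()
  return seen;
}

// Hypothetical fake so the sketch runs off-device; its transition timing is assumed.
class FakeStream implements AudioOutputStream {
  readonly id = "fake-stream";
  state: StreamState = "created";
  bytesWritten = 0;
  private handlers: Array<(state: StreamState) => void> = [];

  write(chunk: Uint8Array): void {
    if (this.state === "created") this.transition("streaming");
    this.bytesWritten += chunk.length;
  }
  end(): void {
    this.transition("ending");
    this.transition("ended");
  }
  flush(): void {
    this.transition("ended");
  }
  onStateChange(handler: (state: StreamState) => void): void {
    this.handlers.push(handler);
  }
  private transition(next: StreamState): void {
    this.state = next;
    this.handlers.forEach((handler) => handler(next));
  }
}

const fakeStream = new FakeStream();
const fakeSpeaker: StreamingSpeaker = { createStream: () => Promise.resolve(fakeStream) };
const observed = playChunks(fakeSpeaker, [new Uint8Array(320), new Uint8Array(320)]);
```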
AudioOutputStream
The stream object returned by createStream() has the following interface:
| Member | Type | Description |
|---|---|---|
| stream.write(chunk) | (chunk: Uint8Array) => void | Write audio data to the stream |
| stream.end() | () => void | Gracefully end the stream (finishes playing buffered audio) |
| stream.flush() | () => void | Interrupt playback, discard the buffer, and go silent immediately |
| stream.onStateChange(handler) | (handler: (state) => void) => void | Listen for state transitions |
| stream.state | string | Current state of the stream |
| stream.id | string | UUID identifying this stream |

stream.state is one of "created", "streaming", "ending", "ended", or "error".
stop
session.speaker.stop(trackId?) stops audio playback. Pass a trackId to stop a specific track, or omit it to stop all audio.
hasPermission
session.speaker.hasPermission is a boolean that indicates whether the app has speaker permission.
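One way to use the flag is to guard every audio call so the app degrades gracefully when the permission is missing. The sketch below assumes only the hasPermission boolean and speak() described in this reference; speakIfAllowed and the fake speakers are hypothetical helpers for illustration.

```typescript
// Local stand-in for the part of session.speaker this sketch needs.
interface PermissionedSpeaker {
  readonly hasPermission: boolean;
  speak(text: string): Promise<unknown>;
}

// Speak only when the permission is granted.
// Resolves to true if audio was requested, false if it was skipped.
async function speakIfAllowed(speaker: PermissionedSpeaker, text: string): Promise<boolean> {
  if (!speaker.hasPermission) {
    return false; // the caller can fall back to a visual cue
  }
  await speaker.speak(text);
  return true;
}

// Hypothetical stubs that record what would have been spoken.
const spokenTexts: string[] = [];
function fakeSpeaker(hasPermission: boolean): PermissionedSpeaker {
  return {
    hasPermission,
    speak: (text) => {
      spokenTexts.push(text);
      return Promise.resolve(undefined);
    },
  };
}

const denied = speakIfAllowed(fakeSpeaker(false), "You have a new message.");
const granted = speakIfAllowed(fakeSpeaker(true), "You have a new message.");
```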
Common Patterns
Respond to voice commands with speech
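A rough sketch of this pattern, assuming your transcription events can be reduced to a text string plus a final/interim flag. The Transcript shape, the replies table, and handleTranscript are all hypothetical; only speak() and stopOtherAudio come from this reference. Interim results are ignored so that partial transcripts do not trigger repeated speech.

```typescript
// Hypothetical transcript shape; adapt it to whatever your transcription events deliver.
interface Transcript {
  text: string;
  isFinal: boolean;
}

interface SpeakingSpeaker {
  speak(text: string, options?: { stopOtherAudio?: boolean }): Promise<unknown>;
}

// Hypothetical command table: spoken phrase -> short reply.
const replies: Record<string, string> = {
  "hello": "Hi there. How can I help?",
  "thank you": "You're welcome.",
};

// Returns true when a reply was spoken.
function handleTranscript(speaker: SpeakingSpeaker, transcript: Transcript): boolean {
  if (!transcript.isFinal) return false; // ignore interim results
  const heard = transcript.text.trim().toLowerCase();
  for (const phrase of Object.keys(replies)) {
    if (heard.indexOf(phrase) !== -1) {
      void speaker.speak(replies[phrase], { stopOtherAudio: true });
      return true;
    }
  }
  return false;
}

// Hypothetical stub that records replies instead of playing them.
const said: string[] = [];
const fakeSpeaker: SpeakingSpeaker = {
  speak: (text) => {
    said.push(text);
    return Promise.resolve(undefined);
  },
};

handleTranscript(fakeSpeaker, { text: "Hello gla", isFinal: false }); // interim: skipped
handleTranscript(fakeSpeaker, { text: "Hello glasses", isFinal: true }); // final: answered
```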
Play a sound effect on an action
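A small sketch of a sound-effect helper built on play(). The URLs, the SOUNDS table, the "sfx" track name, and the 0.5 volume are placeholders chosen for illustration; the fake speaker is a hypothetical stub.

```typescript
// Local stand-ins written from the play() option table in this reference.
interface PlayOptions {
  url: string;
  volume?: number;
  trackId?: string;
  stopOtherAudio?: boolean;
}

interface PlayingSpeaker {
  play(options: PlayOptions): Promise<unknown>;
}

// Hypothetical sound library; the real files must be publicly accessible.
const SOUNDS = {
  confirm: "https://example.com/sounds/confirm.mp3",
  error: "https://example.com/sounds/error.mp3",
};

// Play a named effect on its own track so it can be stopped independently.
function playSound(speaker: PlayingSpeaker, name: keyof typeof SOUNDS): Promise<unknown> {
  return speaker.play({ url: SOUNDS[name], volume: 0.5, trackId: "sfx" });
}

// Hypothetical stub that records requests instead of playing audio.
const played: PlayOptions[] = [];
const fakeSpeaker: PlayingSpeaker = {
  play: (options) => {
    played.push(options);
    return Promise.resolve(undefined);
  },
};

void playSound(fakeSpeaker, "confirm");
```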
Stream audio from a conversational AI
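The sketch below wires an AI audio source to an output stream. The AiAudioSource interface (onAudio, onDone) is a hypothetical stand-in for whatever your AI client exposes, and the 24000 Hz mono PCM16 settings are assumed values that should match what that client actually emits. Only createStream(), write(), and end() come from this reference.

```typescript
// Local stand-ins for the stream members this sketch uses.
interface OutputStream {
  write(chunk: Uint8Array): void;
  end(): void;
}

interface StreamingSpeaker {
  createStream(options?: {
    format?: "mp3" | "pcm16";
    sampleRate?: number;
    channels?: number;
  }): Promise<OutputStream>;
}

// Hypothetical AI client surface: emits audio chunks, then signals completion.
interface AiAudioSource {
  onAudio(handler: (chunk: Uint8Array) => void): void;
  onDone(handler: () => void): void;
}

// Forward every AI audio chunk to the glasses and end the stream when the AI finishes.
async function pipeAiToGlasses(speaker: StreamingSpeaker, source: AiAudioSource): Promise<OutputStream> {
  // Assumed audio parameters; use the values your AI client produces.
  const stream = await speaker.createStream({ format: "pcm16", sampleRate: 24000, channels: 1 });
  source.onAudio((chunk) => stream.write(chunk));
  source.onDone(() => stream.end());
  return stream;
}

// Hypothetical fake source for running the sketch without a real AI service.
class FakeAiSource implements AiAudioSource {
  private audioHandlers: Array<(chunk: Uint8Array) => void> = [];
  private doneHandlers: Array<() => void> = [];
  onAudio(handler: (chunk: Uint8Array) => void): void {
    this.audioHandlers.push(handler);
  }
  onDone(handler: () => void): void {
    this.doneHandlers.push(handler);
  }
  emit(chunk: Uint8Array): void {
    this.audioHandlers.forEach((handler) => handler(chunk));
  }
  finish(): void {
    this.doneHandlers.forEach((handler) => handler());
  }
}

const log: string[] = [];
const fakeSpeaker: StreamingSpeaker = {
  createStream: () =>
    Promise.resolve({
      write: (chunk: Uint8Array) => { log.push("write:" + chunk.length); },
      end: () => { log.push("end"); },
    }),
};

const source = new FakeAiSource();
const piped = pipeAiToGlasses(fakeSpeaker, source).then(() => {
  source.emit(new Uint8Array(480));
  source.emit(new Uint8Array(480));
  source.finish();
  return log;
});
```

Because the handlers are attached only after createStream() resolves, this sketch assumes the stream is opened before the AI request starts; otherwise early chunks would be missed.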
Interrupt streaming audio
Use flush() to immediately stop playback and discard any buffered audio. This is useful when the user interrupts the AI mid-sentence.
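A minimal sketch of that interruption flow. InterruptibleReply is a hypothetical wrapper, not part of the SDK. Dropping chunks that arrive after the interrupt is a design choice made here, since this section does not say whether a stream accepts writes after flush().

```typescript
// Local stand-in for the two stream members this sketch uses.
interface FlushableStream {
  write(chunk: Uint8Array): void;
  flush(): void;
}

// Hypothetical wrapper around one AI reply that can be cut short.
class InterruptibleReply {
  private interrupted = false;
  private readonly stream: FlushableStream;

  constructor(stream: FlushableStream) {
    this.stream = stream;
  }

  // Forward AI audio unless the user has already cut in.
  onAiAudio(chunk: Uint8Array): void {
    if (this.interrupted) return; // drop late chunks still arriving from the AI
    this.stream.write(chunk);
  }

  // Call when the user starts talking over the reply.
  onUserInterrupt(): void {
    this.interrupted = true;
    this.stream.flush(); // discard buffered audio and go silent immediately
  }
}

// Hypothetical stub that records calls instead of playing audio.
const events: string[] = [];
const fakeStream: FlushableStream = {
  write: () => { events.push("write"); },
  flush: () => { events.push("flush"); },
};

const reply = new InterruptibleReply(fakeStream);
reply.onAiAudio(new Uint8Array(320)); // played
reply.onUserInterrupt();              // user cuts in
reply.onAiAudio(new Uint8Array(320)); // ignored
```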
Stop specific audio tracks
Use trackId to manage multiple audio tracks independently.
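A sketch of two named tracks where only one is stopped. The track names, the URL, and the helper functions are hypothetical. The return type of stop() is not stated in this section, so it is typed as unknown here, and the fake speaker is an in-memory stub that only tracks which IDs are active.

```typescript
// Local stand-ins for the three methods this sketch uses.
interface TrackSpeaker {
  play(options: { url: string; trackId?: string; volume?: number }): Promise<unknown>;
  speak(text: string, options?: { trackId?: string }): Promise<unknown>;
  stop(trackId?: string): unknown; // return type is not documented in this section
}

const MUSIC_TRACK = "background-music"; // hypothetical track names
const VOICE_TRACK = "narration";

// Start music and narration on separate tracks.
function startTour(speaker: TrackSpeaker): void {
  void speaker.play({ url: "https://example.com/audio/ambient.mp3", trackId: MUSIC_TRACK, volume: 0.3 });
  void speaker.speak("Welcome to the gallery.", { trackId: VOICE_TRACK });
}

// Stop only the music; the narration track is left alone.
function silenceMusic(speaker: TrackSpeaker): void {
  speaker.stop(MUSIC_TRACK);
}

// Omit the trackId to stop all audio.
function silenceEverything(speaker: TrackSpeaker): void {
  speaker.stop();
}

// Hypothetical stub that tracks which track IDs are active.
let active: string[] = [];
const fakeSpeaker: TrackSpeaker = {
  play: (options) => {
    active.push(options.trackId ?? "untracked");
    return Promise.resolve(undefined);
  },
  speak: (_text, options) => {
    active.push(options?.trackId ?? "untracked");
    return Promise.resolve(undefined);
  },
  stop: (trackId) => {
    active = trackId === undefined ? [] : active.filter((id) => id !== trackId);
  },
};

startTour(fakeSpeaker);
silenceMusic(fakeSpeaker);
```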
Tips
- Keep spoken text short and natural. The user is wearing glasses, not reading a document.
- Do not call speak() on every interim transcription result. Only speak on final results or specific triggers. Rapid-fire TTS calls will queue and overlap.
- Use MP3 for audio files. It offers the best balance of quality and file size.
- Host audio files on a CDN for fast delivery.
- For real-time AI integrations, use createStream() with PCM16 format for the lowest latency.
- Use trackId when you need to stop specific audio without interrupting everything else.
Migrating from v2
Text-to-speech and audio playback have moved from session.audio to session.speaker.
createStream() is new in v3 and has no v2 equivalent. The stop() method and trackId support are also new.
See the Migration Guide for the full list of changes.
