Speaker

Use session.speaker to play audio through the user’s smart glasses. The SpeakerManager supports three modes: text-to-speech, audio file playback from a URL, and real-time audio streaming.

Quick Examples

Text-to-Speech

Convert text to speech using ElevenLabs and play it through the glasses speakers:

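```typescript
import { MiniAppServer, type MentraSession } from "@mentra/sdk";

const app = new MiniAppServer();

app.onSession(async (session) => {
  // speak() returns a Promise; awaiting it lets a failed request surface in this handler
  await session.speaker.speak("Hello, welcome to my app");
});

app.start();
```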

Audio File Playback

Play an audio file from a URL:

```typescript
app.onSession((session) => {
  session.speaker.play({ url: "https://example.com/notification.mp3" });
});
```

Real-Time Audio Streaming

Stream audio chunks in real time, for example from a conversational AI API:

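```typescript
app.onSession(async (session) => {
  // The example TTS endpoint is assumed to return MP3 audio, matching format: "mp3"
  const stream = await session.speaker.createStream({ format: "mp3" });

  stream.onStateChange((state) => {
    session.logger.info(`Stream state: ${state}`);
  });

  // Fetch audio from a TTS API and stream it in real time
  const response = await fetch("https://api.example.com/tts", {
    method: "POST",
    body: "Hello",
  });

  // response.body is null when there is no body; bail out on errors too
  if (!response.ok || !response.body) {
    session.logger.warn(`TTS request failed with status ${response.status}`);
    await stream.end();
    return;
  }

  // response.body yields Uint8Array chunks, which can be written as-is
  for await (const chunk of response.body) {
    stream.write(chunk);
  }

  // Signal that no more data is coming
  await stream.end();
});
```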

Permissions

Your app needs the speaker permission. Add it in the Developer Console when creating or editing your app, then check for it at runtime:

```typescript
if (session.speaker.hasPermission) {
  session.speaker.speak("Permission granted");
} else {
  session.logger.warn("Speaker permission not granted");
}
```
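If several code paths speak, a small guard keeps the permission check in one place. The sketch below is a hypothetical helper, not part of the SDK. It assumes only the `hasPermission` property, the `speak()` method, and a logger with `warn()`, all of which appear on this page, and types them structurally so the helper can be exercised without a live session.

```typescript
// Hypothetical helper (not part of @mentra/sdk). It relies only on members
// documented on this page: speaker.hasPermission, speaker.speak(), logger.warn().
interface SpeakerLike {
  hasPermission: boolean;
  speak(text: string): Promise<unknown>;
}

interface LoggerLike {
  warn(message: string): void;
}

// Calls speak() only when the permission is granted.
// Returns speak()'s promise, or undefined if speaking was skipped.
function speakIfPermitted(
  speaker: SpeakerLike,
  logger: LoggerLike,
  text: string,
): Promise<unknown> | undefined {
  if (!speaker.hasPermission) {
    logger.warn(`Speaker permission not granted; skipped: "${text}"`);
    return undefined;
  }
  return speaker.speak(text);
}
```

Inside a session handler this would be called as `await speakIfPermitted(session.speaker, session.logger, "Ready");`. Awaiting `undefined` is harmless, so callers do not need to branch on the result.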

API Reference

speak

session.speaker.speak(text, options?) converts text to speech via ElevenLabs and plays it through the glasses. Returns a Promise<PlayResult>.

```typescript
await session.speaker.speak("The weather is 72 degrees and sunny");
```

With options:

```typescript
await session.speaker.speak("Hello there", {
  voiceId: "abc123",
  modelId: "eleven_monolingual_v1",
  voiceSettings: {
    stability: 0.5,
    similarityBoost: 0.75,
    style: 0.0,
    speed: 1.0,
  },
  volume: 0.8,
  trackId: "greeting",
  stopOtherAudio: true,
});
```

| Option | Type | Description |
| --- | --- | --- |
| `voiceId` | `string` | ElevenLabs voice ID |
| `modelId` | `string` | ElevenLabs model ID |
| `voiceSettings` | `object` | Voice tuning parameters (see below) |
| `volume` | `number` | Playback volume (0.0 to 1.0) |
| `trackId` | `string` | Identifier for this audio track, used with `stop()` |
| `stopOtherAudio` | `boolean` | Stop any currently playing audio before speaking |

Voice settings:

| Setting | Type | Description |
| --- | --- | --- |
| `stability` | `number` | Voice stability (0.0 to 1.0) |
| `similarityBoost` | `number` | Similarity boost (0.0 to 1.0) |
| `style` | `number` | Style exaggeration (0.0 to 1.0) |
| `speed` | `number` | Speech speed multiplier |
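The tables above document a 0.0 to 1.0 range for `volume`, `stability`, `similarityBoost`, and `style`, but this page does not say what the SDK does with out-of-range values. When those values come from user preferences, clamping them on the app side before calling `speak()` is a defensive choice. The helpers below are a hypothetical sketch, not part of the SDK; `speed` is passed through untouched because no range is documented for it.

```typescript
// Hypothetical helpers (not part of @mentra/sdk). The 0.0 to 1.0 ranges come
// from the option tables; `speed` is passed through unchanged because no range
// is documented for it.
interface VoiceSettings {
  stability?: number;
  similarityBoost?: number;
  style?: number;
  speed?: number;
}

// Clamp a value into [0, 1].
function clampUnit(value: number): number {
  return Math.min(1, Math.max(0, value));
}

// Return a copy with every documented 0-1 setting clamped into range.
function clampVoiceSettings(settings: VoiceSettings): VoiceSettings {
  const result: VoiceSettings = { ...settings };
  for (const key of ["stability", "similarityBoost", "style"] as const) {
    const value = result[key];
    if (value !== undefined) {
      result[key] = clampUnit(value);
    }
  }
  return result;
}
```

A call would then look like `await session.speaker.speak(text, { voiceSettings: clampVoiceSettings(prefs), volume: clampUnit(prefs.volume) })`, where `prefs` stands in for your own stored settings.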

play

session.speaker.play(options) plays an audio file from a URL. Returns a Promise<PlayResult>.

```typescript
await session.speaker.play({ url: "https://example.com/sound.mp3" });
```

With options:

```typescript
await session.speaker.play({
  url: "https://example.com/alert.mp3",
  volume: 1.0,
  trackId: "alert-sound",
  stopOtherAudio: false,
});
```

| Option | Type | Description |
| --- | --- | --- |
| `url` | `string` | URL of the audio file (must be publicly accessible) |
| `volume` | `number` | Playback volume (0.0 to 1.0) |
| `trackId` | `string` | Identifier for this audio track, used with `stop()` |
| `stopOtherAudio` | `boolean` | Stop any currently playing audio first |

Supported formats: MP3 (recommended), WAV, and OGG.
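Because `play()` needs a publicly accessible URL in one of these formats, a cheap pre-flight check can catch obvious mistakes (a local URL, a non-HTTP scheme, an unsupported extension) before the glasses try to fetch the file. This is a hypothetical sketch, not part of the SDK, and it only inspects the URL string.

```typescript
// Hypothetical pre-flight check (not part of @mentra/sdk). It only inspects
// the URL string: it cannot confirm that the file is reachable or that it is
// really encoded in the format its extension suggests.
const SUPPORTED_AUDIO_EXTENSIONS = [".mp3", ".wav", ".ogg"];

// Returns null if the URL looks usable, otherwise a short reason.
function checkAudioUrl(raw: string): string | null {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return "not a valid absolute URL";
  }
  if (url.protocol !== "https:" && url.protocol !== "http:") {
    return `unsupported protocol ${url.protocol}`;
  }
  if (url.hostname === "localhost" || url.hostname === "127.0.0.1") {
    return "local URLs are not publicly accessible";
  }
  // Query strings are ignored: only the path's extension is checked.
  const path = url.pathname.toLowerCase();
  if (!SUPPORTED_AUDIO_EXTENSIONS.some((ext) => path.endsWith(ext))) {
    return "expected a .mp3, .wav, or .ogg file";
  }
  return null;
}
```

Note the trade-off: URLs without a file extension (some signed CDN links, for example) are rejected by this check even if they serve valid audio, so relax the extension test if your hosting works that way.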

createStream

session.speaker.createStream(options?) opens a real-time audio output stream. Returns a Promise<AudioOutputStream>.

```typescript
const stream = await session.speaker.createStream({
  format: "pcm16",
  sampleRate: 24000,
  channels: 1,
  volume: 0.9,
  trackId: "ai-response",
  stopOtherAudio: true,
});
```

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `format` | `"mp3" \| "pcm16"` | `"mp3"` | Audio format |
| `sampleRate` | `number` | - | Sample rate in Hz |
| `channels` | `number` | - | Number of audio channels |
| `bitrate` | `number` | - | Bitrate for compressed formats |
| `volume` | `number` | - | Playback volume (0.0 to 1.0) |
| `trackId` | `string` | - | Identifier for this audio track |
| `stopOtherAudio` | `boolean` | - | Stop any currently playing audio first |
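With `format: "pcm16"`, each sample is a 16-bit (2-byte) integer, so the byte rate is `sampleRate × channels × 2`. At the 24000 Hz mono settings used in the example above, that is 24000 × 1 × 2 = 48,000 bytes per second, and a 20 ms chunk is 960 bytes. The helpers below are plain arithmetic, not SDK functions, and they assume uncompressed 16-bit PCM; they do not apply to `"mp3"`, whose size depends on the bitrate.

```typescript
// Plain arithmetic for uncompressed 16-bit PCM (format: "pcm16").
// Not part of @mentra/sdk; does not apply to "mp3".
const BYTES_PER_SAMPLE = 2; // 16-bit samples

function pcm16BytesPerSecond(sampleRate: number, channels: number): number {
  return sampleRate * channels * BYTES_PER_SAMPLE;
}

// Playback length of a chunk of `byteLength` bytes, in milliseconds.
function pcm16DurationMs(byteLength: number, sampleRate: number, channels: number): number {
  // Multiply before dividing to keep integer inputs exact.
  return (byteLength * 1000) / pcm16BytesPerSecond(sampleRate, channels);
}

// Bytes for a chunk of `ms` milliseconds, rounded down to whole frames
// (one frame = one sample per channel), so a chunk never splits a sample.
function pcm16BytesForMs(ms: number, sampleRate: number, channels: number): number {
  const frames = Math.floor((sampleRate * ms) / 1000);
  return frames * channels * BYTES_PER_SAMPLE;
}
```

For example, `pcm16BytesForMs(20, 24000, 1)` gives 960, and `pcm16DurationMs(48000, 24000, 1)` gives 1000 (one second).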

AudioOutputStream

The stream object returned by createStream() has the following interface:

| Member | Type | Description |
| --- | --- | --- |
| `stream.write(chunk)` | `(chunk: Uint8Array) => void` | Write audio data to the stream |
| `stream.end()` | `() => void` | Gracefully end the stream (finishes playing buffered audio) |
| `stream.flush()` | `() => void` | Interrupt playback, discard the buffer, and go silent immediately |
| `stream.onStateChange(handler)` | `(handler: (state) => void) => void` | Listen for state transitions |
| `stream.state` | `string` | Current state of the stream |
| `stream.id` | `string` | UUID identifying this stream |

Stream states: "created", "streaming", "ending", "ended", "error".
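When code needs to wait for a stream to reach a particular state (for example, until buffered audio has finished after `end()` before speaking a follow-up), `onStateChange()` can be wrapped in a Promise. This is a hypothetical helper, not part of the SDK. It relies only on `state` and `onStateChange()` from the table above and assumes that `"ended"` and `"error"` are terminal. Since the table shows no way to unsubscribe, the handler stays registered after the promise settles and simply becomes a no-op.

```typescript
// Hypothetical helper (not part of @mentra/sdk). Assumes "ended" and "error"
// are terminal states and relies only on `state` and `onStateChange()`.
type StreamState = "created" | "streaming" | "ending" | "ended" | "error";

interface StateSource {
  state: string;
  onStateChange(handler: (state: string) => void): void;
}

// Resolves when the stream reaches `target`; rejects if it reaches a
// terminal state ("ended" or "error") first.
function waitForState(stream: StateSource, target: StreamState): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const check = (state: string): boolean => {
      if (state === target) {
        resolve();
        return true;
      }
      if (state === "error" || state === "ended") {
        reject(new Error(`stream reached "${state}" before "${target}"`));
        return true;
      }
      return false;
    };
    // The stream may already be in the target state.
    if (check(stream.state)) return;
    // No unsubscribe is documented; resolve/reject are no-ops once settled.
    stream.onStateChange((state) => {
      check(state);
    });
  });
}
```

Typical use would be `stream.end(); await waitForState(stream, "ended");` before starting the next utterance.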

stop

session.speaker.stop(trackId?) stops audio playback. Pass a trackId to stop a specific track, or omit it to stop all audio.

```typescript
// Stop a specific track
session.speaker.stop("alert-sound");

// Stop all audio
session.speaker.stop();
```

hasPermission

session.speaker.hasPermission is a boolean that indicates whether the app has speaker permission.

```typescript
if (session.speaker.hasPermission) {
  session.speaker.speak("Ready");
}
```

Common Patterns

Respond to voice commands with speech

```typescript
app.onSession((session) => {
  session.transcription.on(async (data) => {
    // Ignore interim results; act only on the final transcript
    if (!data.isFinal) return;

    const text = data.text.toLowerCase();

    if (text.includes("weather")) {
      // fetchWeather() is a placeholder for your own weather lookup
      const weather = await fetchWeather();
      await session.speaker.speak(`It's ${weather.temp} degrees and ${weather.condition}`);
    }
  });
});
```
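The pattern above checks a single keyword with `includes()`, which also fires on words that merely contain it ("weatherproof"). With several commands, a small router keeps matching in one place and matches whole words. This is a hypothetical sketch, not part of the SDK; the normalization (lowercasing, turning punctuation into spaces) is an assumption about how transcripts are formatted.

```typescript
// Hypothetical keyword router (not part of @mentra/sdk). The normalization
// (lowercase, punctuation stripped, whole-word matching) is an assumption
// about how transcripts are formatted.
type CommandHandler = (text: string) => void | Promise<void>;

function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, " ") // punctuation becomes a space
    .replace(/\s+/g, " ")
    .trim();
}

// Return the handler of the first phrase found as whole words in the
// transcript, or null. Routes are checked in order, so list longer phrases first.
function matchCommand(
  text: string,
  routes: ReadonlyArray<readonly [string, CommandHandler]>,
): CommandHandler | null {
  const padded = ` ${normalize(text)} `;
  for (const [phrase, handler] of routes) {
    if (padded.includes(` ${normalize(phrase)} `)) {
      return handler;
    }
  }
  return null;
}
```

Inside the transcription handler, after the `isFinal` check, this becomes `const handler = matchCommand(data.text, routes); if (handler) await handler(data.text);`.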

Play a sound effect on an action

```typescript
app.onSession((session) => {
  session.transcription.on((data) => {
    if (!data.isFinal) return;

    if (data.text.toLowerCase().includes("take photo")) {
      session.camera.takePhoto();
      session.speaker.play({ url: "https://example.com/shutter.mp3" });
    }
  });
});
```

Stream audio from a conversational AI

```typescript
app.onSession(async (session) => {
  const stream = await session.speaker.createStream({
    format: "pcm16",
    sampleRate: 24000,
    channels: 1,
    stopOtherAudio: true,
  });

  stream.onStateChange((state) => {
    session.logger.info(`Audio stream state: ${state}`);
  });

  // aiClient is a placeholder for your conversational AI client; its audio
  // must match the stream settings above (PCM16, 24000 Hz, mono)
  aiClient.onAudioChunk((chunk: Uint8Array) => {
    stream.write(chunk);
  });

  // End gracefully so buffered audio finishes playing
  aiClient.onDone(() => {
    stream.end();
  });
});
```
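A `pcm16` stream expects raw 16-bit integer samples. Some audio sources hand out Float32 samples in the range [-1, 1] instead; whether yours does depends entirely on the AI client you use (`aiClient` above is itself a placeholder). The hypothetical converter below turns Float32 samples into 16-bit PCM bytes. Little-endian byte order is an assumption: this page does not state which order `pcm16` expects, although little-endian is the usual convention for 16-bit PCM, as in WAV files.

```typescript
// Hypothetical converter (not part of @mentra/sdk): Float32 samples in
// [-1, 1] -> 16-bit signed PCM bytes. Little-endian byte order is an
// assumption; the page does not state which order "pcm16" expects.
function float32ToPcm16(samples: Float32Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range samples saturate instead of wrapping around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Scale asymmetrically so -1 maps to -32768 and 1 maps to 32767.
    const value = s < 0 ? Math.round(s * 32768) : Math.round(s * 32767);
    view.setInt16(i * 2, value, true); // true = little-endian
  }
  return bytes;
}
```

With a client that delivers `Float32Array` chunks, the write becomes `stream.write(float32ToPcm16(samples))`. The sample rate and channel count still have to match the values passed to `createStream()`; this function does not resample.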

Interrupt streaming audio

Use flush() to immediately stop playback and discard any buffered audio. This is useful when the user interrupts the AI mid-sentence:

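```typescript
// `stream` is the AudioOutputStream returned by createStream() in the same session handler
session.mic.onVoiceActivity((isSpeaking) => {
  if (isSpeaking && stream.state === "streaming") {
    // Stop immediately and discard any buffered audio
    stream.flush();
    session.logger.info("Audio interrupted by user");
  }
});
```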

Stop specific audio tracks

Use trackId to manage multiple audio tracks independently:

```typescript
// Start background audio
session.speaker.play({
  url: "https://example.com/ambient.mp3",
  trackId: "background",
});

// Start a notification sound without stopping the background
session.speaker.play({
  url: "https://example.com/ding.mp3",
  trackId: "notification",
  stopOtherAudio: false,
});

// Later, stop only the background audio
session.speaker.stop("background");
```

Tips

- Keep spoken text short and natural. The user is wearing glasses, not reading a document.
- Do not call speak() on every interim transcription result; speak only on final results or specific triggers. Rapid-fire TTS calls will queue and overlap.
- Use MP3 for audio files. It offers the best balance of quality and file size.
- Host audio files on a CDN for fast delivery.
- For real-time AI integrations, use createStream() with PCM16 format for the lowest latency.
- Use trackId when you need to stop specific audio without interrupting everything else.
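The advice to speak only on final results can be enforced with a small gate that drops interim results and suppresses the same phrase repeated within a short window (for example, when a transcript is delivered twice). This is a hypothetical sketch, not part of the SDK, and the 3-second default cooldown is an arbitrary choice.

```typescript
// Hypothetical gate (not part of @mentra/sdk): lets a transcript through only
// if it is final, non-empty, and not a repeat of the last accepted text within
// `cooldownMs`. The default cooldown is an arbitrary choice.
function createSpeechGate(cooldownMs = 3000) {
  let lastText = "";
  let lastAt = -Infinity;

  return (text: string, isFinal: boolean, now: number = Date.now()): boolean => {
    if (!isFinal) return false;
    const normalized = text.trim().toLowerCase();
    if (normalized === "") return false;
    if (normalized === lastText && now - lastAt < cooldownMs) return false;
    lastText = normalized;
    lastAt = now;
    return true;
  };
}
```

Create one gate per session and check it first in the transcription handler: `if (!gate(data.text, data.isFinal)) return;`.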

Migrating from v2

Text-to-speech and audio playback have moved from session.audio to session.speaker:

```typescript
// v2
session.audio.speak("Hello");
session.audio.playAudio({ audioUrl: "https://example.com/sound.mp3" });

// v3
session.speaker.speak("Hello");
session.speaker.play({ url: "https://example.com/sound.mp3" });
```

Audio streaming via createStream() is new in v3 and has no v2 equivalent. The stop() method and trackId support are also new. See the Migration Guide for the full list of changes.
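In a codebase with many v2 call sites, a thin adapter can let you switch to `session.speaker` first and rename call sites gradually. The sketch below is a hypothetical shim, not part of the SDK. It forwards only the two v2 calls shown above and maps only the `audioUrl` to `url` rename; any other v2 `playAudio` options are not covered on this page and are not handled here.

```typescript
// Hypothetical migration shim (not part of @mentra/sdk). It forwards the two
// v2 calls shown above to their v3 equivalents and maps only the
// audioUrl -> url rename; other v2 playAudio options are not handled.
interface V2PlayAudioOptions {
  audioUrl: string;
}

interface V3PlayOptions {
  url: string;
  volume?: number;
  trackId?: string;
  stopOtherAudio?: boolean;
}

// The subset of the v3 speaker API that the shim needs.
interface SpeakerV3Like {
  speak(text: string): Promise<unknown>;
  play(options: V3PlayOptions): Promise<unknown>;
}

function legacyAudioFacade(speaker: SpeakerV3Like) {
  return {
    speak: (text: string) => speaker.speak(text),
    playAudio: (options: V2PlayAudioOptions) => speaker.play({ url: options.audioUrl }),
  };
}
```

With `const audio = legacyAudioFacade(session.speaker);`, existing `session.audio.speak(...)` and `session.audio.playAudio(...)` calls can become `audio.speak(...)` and `audio.playAudio(...)` until they are rewritten against the v3 API directly.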
