Use session.speaker to play audio through the user’s smart glasses. The SpeakerManager supports three modes: text-to-speech, audio file playback from a URL, and real-time audio streaming.

Quick Examples

Text-to-Speech

Convert text to speech using ElevenLabs and play it through the glasses speakers:
import { MiniAppServer, type MentraSession } from "@mentra/sdk";

const app = new MiniAppServer();

app.onSession((session) => {
  session.speaker.speak("Hello, welcome to my app");
});

app.start();

Audio File Playback

Play an audio file from a URL:
app.onSession((session) => {
  session.speaker.play({ url: "https://example.com/notification.mp3" });
});

Real-Time Audio Streaming

Stream audio chunks in real time, for example from a conversational AI API:
app.onSession(async (session) => {
  const stream = await session.speaker.createStream({ format: "mp3" });

  stream.onStateChange((state) => {
    session.logger.info(`Stream state: ${state}`);
  });

  // Fetch audio from a TTS API and stream it in real time
  const response = await fetch("https://api.example.com/tts", {
    method: "POST",
    body: "Hello",
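  });

  // response.body is null when there is no body to stream
  if (!response.ok || !response.body) {
    session.logger.warn(`TTS request failed: ${response.status}`);
    await stream.end();
    return;
  }

  for await (const chunk of response.body) {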
    stream.write(new Uint8Array(chunk));
  }

  // Signal that no more data is coming
  await stream.end();
});

Permissions

Your app needs the speaker permission. Add it in the Developer Console when creating or editing your app, then check at runtime that it has been granted:
if (session.speaker.hasPermission) {
  session.speaker.speak("Permission granted");
} else {
  session.logger.warn("Speaker permission not granted");
}
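The runtime check can be wrapped in a small reusable guard. The following is a minimal sketch: SpeakerLike and speakIfPermitted are hypothetical names, not SDK exports, and the interface only mirrors the two members described on this page.

```typescript
// Minimal local shape of the parts of session.speaker used here.
// Declared for this sketch only; it is not the SDK's own type.
interface SpeakerLike {
  hasPermission: boolean;
  speak(text: string): Promise<unknown>;
}

// Hypothetical helper: speaks only when the speaker permission is granted.
// Resolves to true if speak() was called and completed, false if it was skipped.
async function speakIfPermitted(
  speaker: SpeakerLike,
  text: string,
  onDenied: () => void = () => {},
): Promise<boolean> {
  if (!speaker.hasPermission) {
    onDenied();
    return false;
  }
  await speaker.speak(text);
  return true;
}
```

Inside a session handler this could be called as speakIfPermitted(session.speaker, "Ready", () => session.logger.warn("Speaker permission not granted")), assuming session.speaker matches the shape above.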

API Reference

speak

session.speaker.speak(text, options?) converts text to speech via ElevenLabs and plays it through the glasses. Returns a Promise<PlayResult>.
await session.speaker.speak("The weather is 72 degrees and sunny");
With options:
await session.speaker.speak("Hello there", {
  voiceId: "abc123",
  modelId: "eleven_monolingual_v1",
  voiceSettings: {
    stability: 0.5,
    similarityBoost: 0.75,
    style: 0.0,
    speed: 1.0,
  },
  volume: 0.8,
  trackId: "greeting",
  stopOtherAudio: true,
});
| Option | Type | Description |
| --- | --- | --- |
| voiceId | string | ElevenLabs voice ID |
| modelId | string | ElevenLabs model ID |
| voiceSettings | object | Voice tuning parameters (see below) |
| volume | number | Playback volume (0.0 to 1.0) |
| trackId | string | Identifier for this audio track, used with stop() |
| stopOtherAudio | boolean | Stop any currently playing audio before speaking |
Voice settings:
| Setting | Type | Description |
| --- | --- | --- |
| stability | number | Voice stability (0.0 to 1.0) |
| similarityBoost | number | Similarity boost (0.0 to 1.0) |
| style | number | Style exaggeration (0.0 to 1.0) |
| speed | number | Speech speed multiplier |
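If voice settings come from user input or configuration, it may be worth clamping them to the documented 0.0 to 1.0 ranges before calling speak(). This is a sketch only: VoiceSettings is a local interface mirroring the settings listed above, and normalizeVoiceSettings is a hypothetical helper. This page does not say how the SDK treats out-of-range values.

```typescript
// Local mirror of the voice settings listed above (not imported from the SDK).
interface VoiceSettings {
  stability?: number;
  similarityBoost?: number;
  style?: number;
  speed?: number;
}

// Clamp a number into the closed interval [0, 1].
function clamp01(value: number): number {
  return Math.min(1, Math.max(0, value));
}

// Hypothetical helper: returns a copy with the three 0.0-1.0 settings clamped.
// speed has no range documented here, so it is passed through unchanged.
function normalizeVoiceSettings(settings: VoiceSettings): VoiceSettings {
  const result: VoiceSettings = { ...settings };
  for (const key of ["stability", "similarityBoost", "style"] as const) {
    const value = settings[key];
    if (value !== undefined) result[key] = clamp01(value);
  }
  return result;
}
```

The result can then be passed as the voiceSettings option, for example voiceSettings: normalizeVoiceSettings(userSettings), where userSettings is whatever your app collected.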

play

session.speaker.play(options) plays an audio file from a URL. Returns a Promise<PlayResult>.
await session.speaker.play({ url: "https://example.com/sound.mp3" });
With options:
await session.speaker.play({
  url: "https://example.com/alert.mp3",
  volume: 1.0,
  trackId: "alert-sound",
  stopOtherAudio: false,
});
| Option | Type | Description |
| --- | --- | --- |
| url | string | URL of the audio file (must be publicly accessible) |
| volume | number | Playback volume (0.0 to 1.0) |
| trackId | string | Identifier for this audio track, used with stop() |
| stopOtherAudio | boolean | Stop any currently playing audio first |
Supported formats: MP3 (recommended), WAV, and OGG.
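A quick pre-check on the URL can catch obviously unsupported files before calling play(). The sketch below is a heuristic only: hasSupportedAudioExtension is a hypothetical helper, it looks at the file extension rather than the actual content, and a URL without an extension may still serve valid audio.

```typescript
// Extensions for the formats listed above: MP3, WAV, and OGG.
const SUPPORTED_EXTENSIONS = ["mp3", "wav", "ogg"];

// Hypothetical helper: true if the URL path ends in a supported extension.
// Query strings and fragments are ignored because only the pathname is inspected.
function hasSupportedAudioExtension(rawUrl: string): boolean {
  let pathname: string;
  try {
    pathname = new URL(rawUrl).pathname;
  } catch {
    return false; // not a valid absolute URL
  }
  const dot = pathname.lastIndexOf(".");
  if (dot === -1) return false;
  return SUPPORTED_EXTENSIONS.includes(pathname.slice(dot + 1).toLowerCase());
}
```

An app might log a warning and skip playback when this returns false, rather than treating the result as authoritative.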

createStream

session.speaker.createStream(options?) opens a real-time audio output stream. Returns a Promise<AudioOutputStream>.
const stream = await session.speaker.createStream({
  format: "pcm16",
  sampleRate: 24000,
  channels: 1,
  volume: 0.9,
  trackId: "ai-response",
  stopOtherAudio: true,
});
| Option | Type | Default | Description |
| --- | --- | --- | --- |
| format | "mp3" \| "pcm16" | "mp3" | Audio format |
| sampleRate | number | - | Sample rate in Hz |
| channels | number | - | Number of audio channels |
| bitrate | number | - | Bitrate for compressed formats |
| volume | number | - | Playback volume (0.0 to 1.0) |
| trackId | string | - | Identifier for this audio track |
| stopOtherAudio | boolean | - | Stop any currently playing audio first |
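When writing raw PCM16, it can help to size chunks by duration. The sketch below relies only on the fact that 16-bit PCM uses 2 bytes per sample per channel. The helper names and the 20 ms chunk length are illustrative, and this page does not state which chunk size the SDK prefers.

```typescript
// Number of bytes needed to hold `ms` milliseconds of 16-bit PCM audio.
function pcm16BytesForDuration(sampleRate: number, channels: number, ms: number): number {
  const bytesPerFrame = channels * 2; // 2 bytes per sample, one sample per channel
  const frames = Math.floor((sampleRate * ms) / 1000);
  return frames * bytesPerFrame;
}

// Split a PCM16 buffer into chunks of at most `chunkBytes` bytes.
// Chunks are views onto the original buffer (no copying); the last one may be shorter.
function splitPcm16(data: Uint8Array, chunkBytes: number): Uint8Array[] {
  if (chunkBytes <= 0 || chunkBytes % 2 !== 0) {
    throw new RangeError("chunkBytes must be a positive even number");
  }
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < data.length; offset += chunkBytes) {
    chunks.push(data.subarray(offset, offset + chunkBytes));
  }
  return chunks;
}
```

With the options shown above (24000 Hz, 1 channel), pcm16BytesForDuration(24000, 1, 20) is 960, so each 20 ms chunk passed to stream.write() would be 960 bytes.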

AudioOutputStream

The stream object returned by createStream() has the following interface:
| Member | Type | Description |
| --- | --- | --- |
| stream.write(chunk) | (chunk: Uint8Array) => void | Write audio data to the stream |
| stream.end() | () => void | Gracefully end the stream (finishes playing buffered audio) |
| stream.flush() | () => void | Interrupt playback, discard the buffer, and go silent immediately |
| stream.onStateChange(handler) | (handler: (state) => void) => void | Listen for state transitions |
| stream.state | string | Current state of the stream |
| stream.id | string | UUID identifying this stream |
Stream states: "created", "streaming", "ending", "ended", "error".
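The state list makes it possible to wait for a stream to finish. The following sketch assumes the onStateChange handler receives the new state as one of the strings above. StateSource and waitForStreamEnd are hypothetical names, declared locally so the example stands alone.

```typescript
// Local shape covering only the state-related members described above.
interface StateSource {
  state: string;
  onStateChange(handler: (state: string) => void): void;
}

// Hypothetical helper: resolves once the stream reports "ended",
// rejects if it reports "error". Other states are ignored.
function waitForStreamEnd(stream: StateSource): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const settle = (state: string): boolean => {
      if (state === "ended") {
        resolve();
        return true;
      }
      if (state === "error") {
        reject(new Error("audio stream entered the error state"));
        return true;
      }
      return false;
    };
    // Handle streams that have already finished before subscribing.
    if (settle(stream.state)) return;
    stream.onStateChange((state) => {
      settle(state);
    });
  });
}
```

After calling stream.end(), an app could await waitForStreamEnd(stream) before starting follow-up audio, provided the state transitions behave as assumed here.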

stop

session.speaker.stop(trackId?) stops audio playback. Pass a trackId to stop a specific track, or omit it to stop all audio.
// Stop a specific track
session.speaker.stop("alert-sound");

// Stop all audio
session.speaker.stop();

hasPermission

session.speaker.hasPermission is a boolean that indicates whether the app has speaker permission.
if (session.speaker.hasPermission) {
  session.speaker.speak("Ready");
}

Common Patterns

Respond to voice commands with speech

app.onSession((session) => {
  session.transcription.on(async (data) => {
    if (!data.isFinal) return;

    const text = data.text.toLowerCase();

    if (text.includes("weather")) {
      // fetchWeather() stands in for your own data source; it is not part of the SDK
      const weather = await fetchWeather();
      session.speaker.speak(`It's ${weather.temp} degrees and ${weather.condition}`);
    }
  });
});

Play a sound effect on an action

app.onSession((session) => {
  session.transcription.on((data) => {
    if (!data.isFinal) return;

    if (data.text.toLowerCase().includes("take photo")) {
      session.camera.takePhoto();
      session.speaker.play({ url: "https://example.com/shutter.mp3" });
    }
  });
});

Stream audio from a conversational AI

app.onSession(async (session) => {
  const stream = await session.speaker.createStream({
    format: "pcm16",
    sampleRate: 24000,
    channels: 1,
    stopOtherAudio: true,
  });

  stream.onStateChange((state) => {
    session.logger.info(`Audio stream state: ${state}`);
  });

  // aiClient stands in for your AI provider's client; it is not part of the SDK.
  // As it generates audio chunks, write them to the stream.
  aiClient.onAudioChunk((chunk: Uint8Array) => {
    stream.write(chunk);
  });

  aiClient.onDone(() => {
    stream.end();
  });
});
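If the AI client exposes its audio as an async iterable rather than callbacks, the same pattern can be written as a loop with explicit cleanup. This is a sketch under that assumption: AudioSink mirrors only the write(), end(), and flush() members documented above, and pipeAudio is a hypothetical helper.

```typescript
// Local shape of the AudioOutputStream members used in this sketch.
interface AudioSink {
  write(chunk: Uint8Array): void;
  end(): void;
  flush(): void;
}

// Hypothetical helper: writes every chunk from `source` to `sink`.
// On success it ends the stream gracefully and returns the byte count.
// If the source throws, it flushes buffered audio and rethrows the error.
async function pipeAudio(
  source: AsyncIterable<Uint8Array>,
  sink: AudioSink,
): Promise<number> {
  let bytes = 0;
  try {
    for await (const chunk of source) {
      sink.write(chunk);
      bytes += chunk.byteLength;
    }
  } catch (err) {
    sink.flush();
    throw err;
  }
  sink.end();
  return bytes;
}
```

Flushing on failure is a design choice for this sketch: it avoids playing a half-finished sentence, at the cost of cutting off audio that was already buffered.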

Interrupt streaming audio

Use flush() to immediately stop playback and discard any buffered audio. This is useful when the user interrupts the AI mid-sentence:
// `stream` is the AudioOutputStream returned earlier by createStream()
session.mic.onVoiceActivity((isSpeaking) => {
  if (isSpeaking && stream.state === "streaming") {
    stream.flush();
    session.logger.info("Audio interrupted by user");
  }
});

Stop specific audio tracks

Use trackId to manage multiple audio tracks independently:
// Start background audio
session.speaker.play({
  url: "https://example.com/ambient.mp3",
  trackId: "background",
});

// Start a notification sound without stopping the background
session.speaker.play({
  url: "https://example.com/ding.mp3",
  trackId: "notification",
  stopOtherAudio: false,
});

// Later, stop only the background audio
session.speaker.stop("background");

Tips

  • Keep spoken text short and natural. The user is wearing glasses, not reading a document.
  • Do not call speak() on every interim transcription result. Only speak on final results or specific triggers. Rapid-fire TTS calls pile up and can play over one another.
  • Use MP3 for audio files. It offers the best balance of quality and file size.
  • Host audio files on a CDN for fast delivery.
  • For real-time AI integrations, use createStream() with PCM16 format for the lowest latency.
  • Use trackId when you need to stop specific audio without interrupting everything else.
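
One way to keep several speak() calls from overlapping is to run them one at a time. The sketch below assumes the promise returned by speak() settles once playback has finished, which this page does not state. createSerialSpeaker is a hypothetical helper, and it accepts any function with the same shape as speak().

```typescript
type SpeakFn = (text: string) => Promise<unknown>;

// Hypothetical helper: wraps a speak-like function so that calls run strictly
// in order, each one starting only after the previous one has settled.
function createSerialSpeaker(speak: SpeakFn): (text: string) => Promise<void> {
  let tail: Promise<void> = Promise.resolve();
  return (text: string): Promise<void> => {
    const run = tail.then(async () => {
      await speak(text);
    });
    // Keep the chain usable even if this utterance fails;
    // the caller still sees the failure through `run`.
    tail = run.catch(() => {});
    return run;
  };
}
```

In a session handler this might be set up once as const say = createSerialSpeaker((text) => session.speaker.speak(text)) and then used wherever the app would otherwise call speak() directly.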

Migrating from v2

Text-to-speech and audio playback have moved from session.audio to session.speaker:
// v2
session.audio.speak("Hello");
session.audio.playAudio({ audioUrl: "https://example.com/sound.mp3" });

// v3
session.speaker.speak("Hello");
session.speaker.play({ url: "https://example.com/sound.mp3" });
Audio streaming via createStream() is new in v3 and has no v2 equivalent. The stop() method and trackId support are also new. See the Migration Guide for the full list of changes.