Audio Tutorial
Learn how to build MentraOS Apps that can:
- Play audio files from URLs on connected smart glasses or the phone
- Convert text to speech and play it through the glasses' speakers or the phone
- Stop audio playback when needed
Prerequisites
- MentraOS SDK ≥ 2.1.2 installed in your project
- A local development environment configured as described in Getting Started
1 - Set up the Project
Copy the basic project structure from the Quickstart if you haven’t already. We’ll focus on the contents of src/index.ts.
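Before adding any audio code, it helps to fail fast when the app's configuration is missing. The sketch below assumes the Quickstart reads the environment variables PACKAGE_NAME, MENTRAOS_API_KEY and PORT. That is our understanding of its .env convention, so adjust the names if yours differ. AppConfig and loadConfig are hypothetical helpers, not SDK exports.

```typescript
// Hypothetical config shape. The variable names are assumed to match the
// Quickstart's .env convention (PACKAGE_NAME, MENTRAOS_API_KEY, PORT).
interface AppConfig {
  packageName: string;
  apiKey: string;
  port: number;
}

type Env = Record<string, string | undefined>;

// Read a required variable and throw a readable error if it is missing or blank.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`${name} is not set in the environment`);
  }
  return value.trim();
}

// Build the config from an environment map (pass process.env in a real app).
function loadConfig(env: Env): AppConfig {
  const port = Number.parseInt(env.PORT ?? "3000", 10);
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got "${env.PORT}"`);
  }
  return {
    packageName: requireVar(env, "PACKAGE_NAME"),
    apiKey: requireVar(env, "MENTRAOS_API_KEY"),
    port,
  };
}
```

In src/index.ts you could call loadConfig(process.env) once at startup, then use the returned values wherever the Quickstart code reads them.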
2 - Playing Audio from URLs
The most straightforward way to play audio is from a publicly accessible URL.
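Here is a minimal sketch of URL playback. It assumes session.audio.playAudio() takes an options object with an audioUrl field and resolves to an object with a success flag and an optional error message. Verify that shape against the Audio Manager documentation. AudioOutput, isPlayableUrl and playFromUrl are hypothetical names. The small interface lets the logic run without the SDK.

```typescript
// Hypothetical shapes standing in for the SDK's audio manager. We assume
// playAudio() takes { audioUrl } and resolves to { success, error? }.
interface AudioPlayResult {
  success: boolean;
  error?: string;
}

interface AudioOutput {
  playAudio(options: { audioUrl: string }): Promise<AudioPlayResult>;
}

// The device fetches the file itself, so only accept http(s) URLs.
// This checks the scheme and rejects whitespace; it cannot tell whether
// the URL is actually reachable from the device.
function isPlayableUrl(url: string): boolean {
  return /^https?:\/\/\S+$/i.test(url);
}

// Play a URL and summarize the outcome as a human-readable status line.
async function playFromUrl(audio: AudioOutput, audioUrl: string): Promise<string> {
  if (!isPlayableUrl(audioUrl)) {
    return `Not an http(s) URL: ${audioUrl}`;
  }
  const result = await audio.playAudio({ audioUrl });
  return result.success
    ? `Played ${audioUrl}`
    : `Playback failed: ${result.error ?? "unknown error"}`;
}
```

Inside your session handler you would pass session.audio as the first argument, provided the SDK's audio manager matches the assumed shape.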
3 - Text-to-Speech (TTS)
Convert any text to natural-sounding speech using ElevenLabs and play it on the glasses.
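The following is a sketch of a thin text-to-speech wrapper. It assumes session.audio.speak(text, options) exists, accepts the options listed under TTS Configuration Options, and resolves to an object with a success flag. SpeechOutput, SpeakOptions and say are illustrative names, not SDK exports. The wrapper normalizes whitespace and never sends a blank request.

```typescript
// Hypothetical option/result types mirroring this tutorial's TTS options
// table; the real SDK types may differ in naming or optionality.
interface VoiceSettings {
  stability?: number;
  similarity_boost?: number;
  style?: number;
  use_speaker_boost?: boolean;
  speed?: number;
}

interface SpeakOptions {
  voice_id?: string;
  model_id?: string;
  voice_settings?: VoiceSettings;
}

interface SpeakResult {
  success: boolean;
  error?: string;
}

interface SpeechOutput {
  speak(text: string, options?: SpeakOptions): Promise<SpeakResult>;
}

// Collapse runs of whitespace and skip empty text so no blank request is sent.
async function say(audio: SpeechOutput, text: string, options?: SpeakOptions): Promise<boolean> {
  const cleaned = text.replace(/\s+/g, " ").trim();
  if (cleaned === "") return false;
  const result = await audio.speak(cleaned, options);
  return result.success;
}
```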
TTS Configuration Options
| Option | Type | Default | Description |
|---|---|---|---|
| voice_id | string | Server default | ElevenLabs voice ID |
| model_id | string | eleven_flash_v2_5 | TTS model to use (see Available TTS Models below) |
| voice_settings.stability | number | 0.5 | Voice stability and randomness (0.0-1.0). Lower values give a broader emotional range; higher values can sound monotonous |
| voice_settings.similarity_boost | number | 0.75 | How closely the AI adheres to the original voice (0.0-1.0) |
| voice_settings.style | number | 0.0 | Style exaggeration (0.0-1.0). Amplifies the original speaker's style but increases latency |
| voice_settings.use_speaker_boost | boolean | false | Boosts similarity to the original speaker at the cost of extra computation and latency |
| voice_settings.speed | number | 1.0 | Speaking speed: 1.0 = normal, below 1.0 = slower, above 1.0 = faster |
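Out-of-range values are easy to pass by accident. The sketch below checks a settings object against the ranges in the table before it is sent, and returns every problem it finds. The VoiceSettings type and the validateVoiceSettings helper are hypothetical. The table gives no bounds for speed, so the sketch only requires a positive number there.

```typescript
// Hypothetical voice-settings type following the table above.
interface VoiceSettings {
  stability?: number; // 0.0-1.0, default 0.5
  similarity_boost?: number; // 0.0-1.0, default 0.75
  style?: number; // 0.0-1.0, default 0.0
  use_speaker_boost?: boolean; // default false
  speed?: number; // 1.0 = normal speed
}

// Settings the table documents as ranging from 0.0 to 1.0.
const UNIT_RANGE_KEYS = ["stability", "similarity_boost", "style"] as const;

// Collect every problem instead of throwing on the first, so all can be logged.
function validateVoiceSettings(settings: VoiceSettings): string[] {
  const problems: string[] = [];
  for (const key of UNIT_RANGE_KEYS) {
    const value = settings[key];
    if (value !== undefined && (!Number.isFinite(value) || value < 0 || value > 1)) {
      problems.push(`${key} must be between 0.0 and 1.0 (got ${value})`);
    }
  }
  // No documented range for speed, so only reject non-positive or non-finite values.
  if (settings.speed !== undefined && (!Number.isFinite(settings.speed) || settings.speed <= 0)) {
    problems.push(`speed must be a positive number (got ${settings.speed})`);
  }
  return problems;
}
```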
Available TTS Models
| Model | Description | Languages | Latency |
|---|---|---|---|
| eleven_v3 | Human-like, expressive speech generation | 70+ languages | Standard |
| eleven_flash_v2_5 | Ultra-fast model optimized for real-time use | All eleven_multilingual_v2 languages plus hu, no, vi | ~75 ms |
| eleven_flash_v2 | Ultra-fast model (English only) | en | ~75 ms |
| eleven_turbo_v2_5 | High quality with low latency, a good balance of the two | Same as eleven_flash_v2_5 | ~250-300 ms |
| eleven_turbo_v2 | High quality with low latency (English only) | en | ~250-300 ms |
| eleven_multilingual_v2 | Most lifelike, with rich emotional expression | en, ja, zh, de, hi, fr, ko, pt, it, es, id, nl, tr, fil, pl, sv, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, ru | Standard |
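The table suggests a simple rule of thumb: prefer eleven_flash_v2_5 when latency matters and eleven_multilingual_v2 when expressiveness matters, as long as the language is listed for that model. The helper below encodes that heuristic using only the language codes from the table. chooseModel is hypothetical and is not part of the SDK. Its fallback to eleven_v3 for unlisted languages assumes, without a full list to check against, that the language is among the 70+ that eleven_v3 supports.

```typescript
// Language codes copied from the "Available TTS Models" table.
const MULTILINGUAL_V2_LANGS: string[] = [
  "en", "ja", "zh", "de", "hi", "fr", "ko", "pt", "it", "es",
  "id", "nl", "tr", "fil", "pl", "sv", "bg", "ro", "ar", "cs",
  "el", "fi", "hr", "ms", "sk", "da", "ta", "uk", "ru",
];
// eleven_flash_v2_5 (and eleven_turbo_v2_5) add Hungarian, Norwegian and Vietnamese.
const FLASH_V2_5_LANGS: string[] = MULTILINGUAL_V2_LANGS.concat(["hu", "no", "vi"]);

type Priority = "latency" | "quality";

// Illustrative heuristic only, not part of the SDK.
function chooseModel(lang: string, priority: Priority): string {
  const code = lang.trim().toLowerCase();
  if (priority === "latency" && FLASH_V2_5_LANGS.includes(code)) {
    return "eleven_flash_v2_5"; // ~75 ms in the table
  }
  if (priority === "quality" && MULTILINGUAL_V2_LANGS.includes(code)) {
    return "eleven_multilingual_v2"; // "most lifelike" in the table
  }
  // Not covered above: eleven_v3 advertises 70+ languages, but without an
  // exact list, confirm the language is supported before relying on this.
  return "eleven_v3";
}
```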
4 - Interactive Audio App
This section combines voice activation with audio responses in a single app.
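This sketch shows how voice activation can drive audio responses. It makes several assumptions. First, transcription events carry the recognized text and an isFinal flag, delivered through a handler such as session.events.onTranscription (the exact name is an assumption to check in the Events documentation). Second, the session's audio manager offers playAudio(), speak() and stopAudio(). The keyword matching, the Command type and CHIME_URL (a placeholder, not a real asset) are all illustrative.

```typescript
// Hypothetical event and audio shapes. We assume transcription events carry
// the recognized text plus an isFinal flag, and that the session's audio
// manager offers playAudio(), speak() and stopAudio().
interface TranscriptionEvent {
  text: string;
  isFinal: boolean;
}

interface AudioControls {
  playAudio(options: { audioUrl: string }): Promise<{ success: boolean }>;
  speak(text: string): Promise<{ success: boolean }>;
  stopAudio(): void;
}

type Command =
  | { kind: "play"; url: string }
  | { kind: "say"; text: string }
  | { kind: "stop" }
  | { kind: "none" };

// Placeholder URL for illustration; replace with a real, publicly hosted file.
const CHIME_URL = "https://example.com/chime.mp3";

// Map a transcript to a command with simple keyword matching.
function parseCommand(transcript: string): Command {
  const text = transcript.trim();
  const lower = text.toLowerCase();
  // "stop" is checked first so the user can always interrupt playback.
  if (lower.includes("stop")) return { kind: "stop" };
  if (lower.includes("play sound")) return { kind: "play", url: CHIME_URL };
  const spoken = /^say\s+(.+)$/i.exec(text)?.[1];
  if (spoken) return { kind: "say", text: spoken };
  return { kind: "none" };
}

// Act only on final transcripts; interim results would trigger duplicate actions.
async function handleTranscription(
  audio: AudioControls,
  event: TranscriptionEvent,
): Promise<Command["kind"]> {
  if (!event.isFinal) return "none";
  const command = parseCommand(event.text);
  switch (command.kind) {
    case "stop":
      audio.stopAudio();
      break;
    case "play":
      await audio.playAudio({ audioUrl: command.url });
      break;
    case "say":
      await audio.speak(command.text);
      break;
  }
  return command.kind;
}
```

Because "stop" is checked first, a user can always interrupt playback. The trade-off is that a phrase such as "say stop" is treated as a stop command.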
5 - Audio Management
Stopping Audio
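Stopping is simplest when the app keeps track of what it started. The sketch below assumes session.audio.stopAudio() halts whatever is currently playing, and that playAudio() resolves once the clip finishes or is stopped. PlaybackTracker and StoppableAudio are hypothetical names. Each clip gets an id, so a clip that finishes late cannot mark a newer clip as stopped.

```typescript
// Hypothetical audio surface. We assume stopAudio() halts whatever is playing
// and that playAudio() resolves when the clip finishes or is stopped.
interface StoppableAudio {
  playAudio(options: { audioUrl: string }): Promise<{ success: boolean }>;
  stopAudio(): void;
}

class PlaybackTracker {
  private readonly audio: StoppableAudio;
  private currentId = 0; // id of the clip we believe is playing; 0 = idle
  private nextId = 1;

  constructor(audio: StoppableAudio) {
    this.audio = audio;
  }

  get isPlaying(): boolean {
    return this.currentId !== 0;
  }

  // Stop any current clip before starting a new one so clips never overlap.
  async play(audioUrl: string): Promise<boolean> {
    this.stop();
    const id = this.nextId++;
    this.currentId = id;
    try {
      const result = await this.audio.playAudio({ audioUrl });
      return result.success;
    } finally {
      // Only clear the state if no newer clip has started in the meantime.
      if (this.currentId === id) this.currentId = 0;
    }
  }

  // Send a stop request only when something is playing; returns whether it did.
  stop(): boolean {
    if (this.currentId === 0) return false;
    this.audio.stopAudio();
    this.currentId = 0;
    return true;
  }
}
```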
Error Handling
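Playback can fail in two ways: the promise can reject, or it can resolve with success set to false. The sketch below handles both cases and falls back to a spoken message. It assumes playAudio() and speak() resolve to objects of the form { success, error? }, which you should verify in the Audio Manager documentation. FallbackAudio, Outcome and playWithFallback are illustrative names.

```typescript
// Hypothetical result and audio shapes; verify against the Audio Manager docs.
interface AudioResult {
  success: boolean;
  error?: string;
}

interface FallbackAudio {
  playAudio(options: { audioUrl: string }): Promise<AudioResult>;
  speak(text: string): Promise<AudioResult>;
}

// Which path actually produced sound, plus why the URL path was abandoned.
type Outcome =
  | { via: "url" }
  | { via: "tts"; reason: string }
  | { via: "none"; reason: string };

function describeError(err: unknown): string {
  return err instanceof Error ? err.message : String(err);
}

// Try the URL first; on a rejection or an unsuccessful result, speak a fallback.
async function playWithFallback(
  audio: FallbackAudio,
  audioUrl: string,
  fallbackText: string,
): Promise<Outcome> {
  let reason = "";
  try {
    const result = await audio.playAudio({ audioUrl });
    if (result.success) return { via: "url" };
    reason = result.error ?? "playback reported failure";
  } catch (err) {
    // Treat a rejected promise the same way as an unsuccessful result.
    reason = describeError(err);
  }
  try {
    const spoken = await audio.speak(fallbackText);
    if (spoken.success) return { via: "tts", reason };
    return { via: "none", reason: `${reason}; TTS failed: ${spoken.error ?? "unknown error"}` };
  } catch (err) {
    return { via: "none", reason: `${reason}; TTS failed: ${describeError(err)}` };
  }
}
```

Returning the reason instead of only logging it lets the caller decide whether to retry, show a message on the display, or stay silent.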
Next Steps
- See the detailed Audio Manager documentation
- Explore Device Capabilities to adapt audio features based on hardware
- Learn about Events to create voice-activated audio experiences
- Review Permissions for any audio-related permissions your app might need

