Audio Scenes
Audio scenes add sound to your composition: narration, music, sound effects, or any audio content. They play alongside visual scenes without affecting what's displayed.
How audio scenes work
Audio scenes are "invisible" layers. They have timing like visual scenes (start, duration) but don't render anything visible. Instead, they play audio at the specified time.
{
  type: "audio_only",
  start: 0,
  duration: 30,
  config: {
    audioUrl: "https://example.com/narration.mp3",
    fileId: "file_abc123"
  }
}
Adding audio scenes
Via AI
The most common way to add audio is text-to-speech:
Generate narration saying "Welcome to our product demo."
The assistant creates an audio scene with synthesized speech. Multiple voice options are available; check listSpeechModels for the full list.
For existing audio files:
Add the audio file from https://example.com/background-music.mp3
Or drag an audio file directly into the chat panel.
Via CLI
Generate speech from the command line:
program tool exec generateSpeech \
  --text "Welcome to our product demo" \
  --model "eleven_labs/rachel"
Add an existing audio file:
program tool exec addScene \
  --title "Background Music" \
  --duration 30 \
  --type audio_only \
  --config '{"audioUrl": "https://example.com/music.mp3"}'
List available voice models:
program tool exec listSpeechModels
In the editor
Audio scenes appear in the timeline like any other scene. Click to select, drag to reposition, or resize to adjust duration. Audio scenes show a waveform visualization when selected.
Timeline positioning
To adjust an audio scene's position on the timeline:
- Drag to move the audio to a different time
- Resize to change the duration (audio cuts off if the scene is shorter than the file; a longer scene ends in silence)
- Overlap with other scenes; audio and visuals are independent
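Because audio and visual scenes are positioned independently, checking whether two of them overlap is plain interval arithmetic on start and duration. The following is a minimal sketch, assuming both values share one time unit (the unit is not stated here). `TimedScene` and `overlaps` are hypothetical names for illustration, not part of the product's API.

```typescript
// Hypothetical shape: only `start` and `duration` mirror the scene object
// shown earlier; `title` mirrors the --title flag of addScene.
interface TimedScene {
  title: string;
  start: number;
  duration: number;
}

// Two scenes overlap when each one starts before the other ends.
function overlaps(a: TimedScene, b: TimedScene): boolean {
  return a.start < b.start + b.duration && b.start < a.start + a.duration;
}

const narration: TimedScene = { title: "Narration", start: 5, duration: 10 };
const intro: TimedScene = { title: "Intro visuals", start: 0, duration: 8 };
const outro: TimedScene = { title: "Outro visuals", start: 15, duration: 5 };

console.log(overlaps(narration, intro)); // true: 5..15 intersects 0..8
console.log(overlaps(narration, outro)); // false: narration ends as the outro starts
```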
Supported formats
Common audio formats work:
- MP3
- WAV
- AAC
- OGG
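A quick pre-flight check on an audio URL can catch unsupported files before a scene is created. This sketch uses the file extension as a proxy for the format, which is an assumption: the actual tool may inspect file contents instead. `hasSupportedExtension` is a hypothetical helper.

```typescript
// Extensions for the formats listed above. Matching on the extension alone
// is an assumption; it does not verify the file's real encoding.
const SUPPORTED_EXTENSIONS: string[] = ["mp3", "wav", "aac", "ogg"];

function hasSupportedExtension(audioUrl: string): boolean {
  // Drop any query string or fragment, then compare case-insensitively.
  const path = audioUrl.split(/[?#]/)[0].toLowerCase();
  const dot = path.lastIndexOf(".");
  if (dot === -1) return false;
  return SUPPORTED_EXTENSIONS.indexOf(path.slice(dot + 1)) !== -1;
}

console.log(hasSupportedExtension("https://example.com/narration.mp3")); // true
console.log(hasSupportedExtension("https://example.com/music.OGG?v=2")); // true
console.log(hasSupportedExtension("https://example.com/clip.flac")); // false
```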
Audio behavior
Playback
Audio scenes use Remotion's Html5Audio component:
- Synchronized: audio timing matches the composition timeline exactly
- Buffering: playback pauses if audio needs to load
- Premounting: audio loads slightly before its start time for seamless playback
Volume
Audio plays at full volume (1.0) by default. Volume control is handled at the composition level or through post-processing.
Looping
Audio doesn't automatically loop. If your scene duration exceeds the audio length, there's silence after the audio ends. For background music, either:
- Match the scene duration to the audio length
- Use audio that's long enough for your composition
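To see how a scene's duration and the audio's length interact, the gap can be computed directly. This is a rough sketch with a hypothetical `checkDuration` helper; it assumes both values use the same unit and that you already know the audio file's length.

```typescript
// Hypothetical helper: compares a scene's duration with the audio length.
interface DurationCheck {
  trailingSilence: number; // time left silent after the audio ends
  truncated: number; // audio cut off because the scene ends first
}

function checkDuration(sceneDuration: number, audioLength: number): DurationCheck {
  return {
    trailingSilence: Math.max(0, sceneDuration - audioLength),
    truncated: Math.max(0, audioLength - sceneDuration),
  };
}

console.log(checkDuration(30, 20)); // trailingSilence 10, truncated 0
console.log(checkDuration(30, 45)); // trailingSilence 0, truncated 15
```

When both numbers are zero, the scene duration matches the audio exactly.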
Multiple audio layers
You can have multiple audio scenes playing simultaneously:
- Background music (full duration)
- Narration (specific segments)
- Sound effects (short, timed)
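When several layers are stacked, it helps to know which ones sound at a given moment, for example to spot where music sits under narration. The sketch below is illustrative only: the `AudioLayer` type, the sample values, and `activeAt` are assumptions, not product APIs.

```typescript
// Hypothetical layer shape, with start and duration in the same unit.
interface AudioLayer {
  title: string;
  start: number;
  duration: number;
}

const layers: AudioLayer[] = [
  { title: "Background music", start: 0, duration: 60 },
  { title: "Narration: intro", start: 2, duration: 12 },
  { title: "Whoosh effect", start: 14, duration: 1 },
];

// Titles of every layer playing at `time` (start inclusive, end exclusive).
function activeAt(all: AudioLayer[], time: number): string[] {
  return all
    .filter((layer) => time >= layer.start && time < layer.start + layer.duration)
    .map((layer) => layer.title);
}

console.log(activeAt(layers, 5)); // Background music, Narration: intro
console.log(activeAt(layers, 14.5)); // Background music, Whoosh effect
```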
Speech generation
The generateSpeech tool converts text to audio:
Generate speech saying "This is the introduction to our demo"
| Parameter | Description |
|---|---|
| text | The words to speak |
| voice | Voice model to use |
| speed | Playback speed adjustment |
Use listSpeechModels to see the available voice options.
Best practices
- Match timing to content. If your narration mentions something, time the visual scene to appear when those words are spoken.
- Leave breathing room. Don't pack narration too tight. Brief pauses between sentences sound more natural.
- Check levels. If combining music and narration, ensure the music doesn't overpower the voice.
- Preview with audio. Always preview with audio enabled to catch timing issues.
Transcription
Have audio but need text? Use transcription:
Transcribe the audio from https://example.com/interview.mp3
The transcribeAudio tool converts speech to text, useful for creating captions or generating scenes from audio content.
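As an illustration of the caption use case, timed transcript segments can be turned into SubRip (SRT) cues. The output shape of transcribeAudio is not documented here, so the `TranscriptSegment` type below is purely an assumption; adapt the field names to whatever the tool actually returns.

```typescript
// Hypothetical segment shape; the real transcribeAudio output may differ.
interface TranscriptSegment {
  startSeconds: number;
  endSeconds: number;
  text: string;
}

// Left-pad a number with zeros to a fixed width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const s = Math.floor(totalMs / 1000) % 60;
  const m = Math.floor(totalMs / 60000) % 60;
  const h = Math.floor(totalMs / 3600000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Build numbered SRT cues separated by blank lines.
function toSrt(segments: TranscriptSegment[]): string {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${toSrtTime(seg.startSeconds)} --> ${toSrtTime(seg.endSeconds)}\n${seg.text}`)
    .join("\n\n");
}

console.log(toSrt([
  { startSeconds: 0, endSeconds: 2.5, text: "Hello" },
  { startSeconds: 2.5, endSeconds: 4, text: "World" },
]));
```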
For the full list of tools available for composition management, see the API Reference.