Audio Design

Audio is the fastest way to make a world feel real. A survival thriller with rain sounds and a distant heartbeat SFX feels different from one with just text — even if the writing is identical. The system is lightweight: upload a few tracks, configure how they play, and the engine handles the rest.

For the basics of track types (BGM, SFX, Ambient) and setup, see Get Started: Audio.


How audio gets triggered

There are three ways audio plays in your world, from simplest to most flexible.

1. Playlists: set it and forget it

The BGM Playlist auto-plays a sequence of tracks when the player enters your world. Most worlds only need this.

Configure it in the Audio section of the editor:

| Setting | What it does |
| --- | --- |
| tracks | Which BGM tracks to play, in order |
| playMode | loop (restart from beginning), shuffle (random), or sequential (play all, then repeat) |
| autoPlay | Start immediately when the world loads |
| waitForFirstMessage | Don't start until the player sends their first message; useful for worlds with character creation or an opening cutscene |
| gapSeconds (0-30) | Silence between tracks, for a "changing records" feel |

Sakura Season and PRTS Terminal both use the simplest possible setup: one BGM track, loop mode, autoPlay on. That's the whole audio configuration for a world with 2,000+ plays.

The minimal setup

Upload one audio file. Create a BGM track. Add it to the playlist with autoPlay: true. Done — players hear music from the moment they start.
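As a rough illustration, the minimal setup can be pictured as a small settings object. The sketch below is written in Python purely for illustration: the setting names come from the table above, but the dictionary shape, the track ID `main-theme`, and the `validate_playlist` helper are assumptions, not the editor's actual format (see World Spec: Audio for the real schema).

```python
# Hypothetical sketch: setting names are from the playlist table,
# everything else (dict shape, helper, track ID) is assumed for illustration.
VALID_PLAY_MODES = {"loop", "shuffle", "sequential"}


def validate_playlist(playlist):
    """Return a list of problems found in a playlist config (empty if none)."""
    problems = []
    if not playlist.get("tracks"):
        problems.append("tracks must list at least one BGM track")
    if playlist.get("playMode") not in VALID_PLAY_MODES:
        problems.append("playMode must be loop, shuffle, or sequential")
    gap = playlist.get("gapSeconds", 0)
    if not 0 <= gap <= 30:
        problems.append("gapSeconds must be between 0 and 30")
    return problems


# One track, loop mode, autoPlay on: the minimal setup described above.
minimal_playlist = {
    "tracks": ["main-theme"],  # hypothetical track ID
    "playMode": "loop",
    "autoPlay": True,
    "waitForFirstMessage": False,
    "gapSeconds": 0,
}

print(validate_playlist(minimal_playlist))
```

A check like this is only a way to sanity-test a config before pasting values into the editor; the editor itself enforces the real limits.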

2. AI directives: let the narrator cue the soundtrack

The AI can embed audio commands in its responses using the same bracket syntax as variable directives. The player never sees these — the engine strips them before displaying the text.

The door flies open with a deafening crash. [audio: door-slam play]
An armored figure steps through the smoke.

The player reads clean narrative and hears the door slam at the same time.

Available directives:

| Directive | What it does |
| --- | --- |
| [audio: trackId play] | Play the track |
| [audio: trackId stop] | Stop the track |
| [audio: trackId crossfade 2.0] | Fade from the current BGM to this one over 2 seconds |
| [audio: trackId volume 0.5] | Change volume without stopping |
| [audio: trackId play chain:nextTrackId] | Play this track, then automatically play the next one when it finishes |

The chain directive is especially useful for transitions: play a war horn SFX, then seamlessly transition to battle BGM when the horn finishes. Smoother than two separate directives.
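To make the "engine strips them" step concrete, here is a minimal sketch of how directives in the syntax above could be separated from the narrative text. It is not the engine's actual parser: the regular expression, the `extract_audio_directives` name, and the shape of the returned commands are all assumptions made for illustration.

```python
import re

# Matches [audio: trackId operation optional-argument], following the
# directive table above. The exact grammar the engine accepts is assumed.
AUDIO_DIRECTIVE = re.compile(
    r"\[audio:\s*(\S+)\s+(play|stop|crossfade|volume)(?:\s+([^\]\s]+))?\s*\]"
)


def extract_audio_directives(text):
    """Return (clean_text, commands) with audio directives removed from the text."""
    commands = []

    def collect(match):
        track_id, operation, argument = match.groups()
        command = {"track": track_id, "op": operation}
        if argument is not None:
            if argument.startswith("chain:"):
                # e.g. "chain:battle-bgm" queues a follow-up track
                command["chain"] = argument[len("chain:"):]
            else:
                # numeric argument: fade seconds or volume level
                command["value"] = float(argument)
        commands.append(command)
        return ""  # the player never sees the directive

    clean = AUDIO_DIRECTIVE.sub(collect, text)
    # Remove the trailing space a directive at the end of a line leaves behind.
    clean = re.sub(r"[ \t]+(\n|$)", r"\1", clean)
    return clean, commands


narration = (
    "The door flies open with a deafening crash. [audio: door-slam play]\n"
    "An armored figure steps through the smoke."
)
clean_text, audio_commands = extract_audio_directives(narration)
print(clean_text)
print(audio_commands)
```

In this sketch a chained directive such as `[audio: war-horn play chain:battle-bgm]` (hypothetical track IDs) yields a single command carrying both the track and its follow-up, which mirrors why chaining is smoother than two separate directives.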

AI directives can be mixed with state changes in the same response:

A rumbling echoes from deep in the dungeon. [audio: earthquake-sfx play]
Debris falls from the ceiling. [health: -5]
The ambient sound grows oppressive. [audio: ambient-cave volume 0.3]

The catch: The AI sometimes forgets to include directives, especially in long responses or when it's focused on complex narrative. For audio that absolutely must play at the right moment, use conditional BGM or rules instead.

3. Conditional BGM: music follows the story

Conditional BGM is the most interesting part of the audio system. You define conditions, and the engine automatically switches tracks when those conditions are met — no AI involvement required.

Think of it as programming a soundtrack that responds to the game state: tavern music when in the tavern, battle music when in combat, exploration music everywhere else.

Each conditional BGM entry has:

| Setting | What it does |
| --- | --- |
| triggerType | What to watch: variable (game state), keyword / ai-keyword (text matching), turn-count, or session-start |
| conditions | Variable checks (when using variable trigger) |
| targetTrackId | Which track to switch to |
| priority | Higher numbers win when multiple conditions match |
| fadeInDuration / fadeOutDuration | Transition speed in seconds |
| stopPreviousBGM | Usually true, unless you want to layer multiple tracks |
| fallback | What plays when the condition stops being true: "default" (return to playlist), "previous" (return to last track), or a specific trackId |

Example: combat track switch

Two BGM tracks: explore-bgm and battle-bgm. The playlist plays exploration music by default. A conditional BGM entry watches for location eq "battle_arena" — when it becomes true, the engine crossfades to battle music over 0.5 seconds. When the player leaves the arena, fallback: "default" returns to exploration.

The player hears a smooth musical transition whenever combat starts and ends, without the AI having to remember anything about audio.
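The selection logic in this example can be sketched in a few lines. This is an illustrative model only: the entry fields reuse the setting names from the table, but the shape of each condition (`variable`, `op`, `value`), the `pick_bgm` function, and the handling of just the "default" fallback are simplifying assumptions, not the engine's implementation.

```python
def condition_holds(condition, variables):
    """Evaluate one variable check. Only "eq" is modeled in this sketch."""
    if condition["op"] != "eq":
        raise ValueError(f"unsupported operator: {condition['op']}")
    return variables.get(condition["variable"]) == condition["value"]


def pick_bgm(entries, variables, playlist_track):
    """Return the track ID that should play for the current game state."""
    matching = [
        entry for entry in entries
        if all(condition_holds(c, variables) for c in entry["conditions"])
    ]
    if not matching:
        # No condition is true: behave like fallback "default" (back to playlist).
        return playlist_track
    # Higher priority wins when several entries match.
    return max(matching, key=lambda entry: entry["priority"])["targetTrackId"]


# Hypothetical entry mirroring the combat example above.
entries = [
    {
        "triggerType": "variable",
        "conditions": [{"variable": "location", "op": "eq", "value": "battle_arena"}],
        "targetTrackId": "battle-bgm",
        "priority": 10,
        "fallback": "default",
    },
]

print(pick_bgm(entries, {"location": "battle_arena"}, "explore-bgm"))
print(pick_bgm(entries, {"location": "tavern"}, "explore-bgm"))
```

The point of the model is that the choice depends only on game state, so the same state always produces the same track, with no reliance on the AI's output.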


Using rules for audio

The Behaviors system has a play-audio action that gives you full control over audio without involving the AI at all. This is the most reliable method for audio that must fire at a precise moment.

Battle Royale uses a death-sfx track and a heartbeat-sfx track as part of its audio toolkit. While its current version relies on AI directives to trigger them, wiring these to rules would guarantee they play at the right moment — a heartbeat SFX when health drops below 20, a death sound when health hits zero.

The play-audio action supports the same operations as AI directives:

| Action | What it does |
| --- | --- |
| play | Start the track |
| stop | Stop the track |
| crossfade | Fade from current BGM to this track |
| volume | Change volume |

Combined with variable-crossed triggers, this creates audio cues that are 100% reliable:

WHEN:     health drops below 20
THEN:     play-audio crisis-bgm crossfade (fadeDuration: 1.5)

No AI involvement. The engine handles it mechanically.
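The key idea behind a "drops below" trigger is that it fires on the crossing, not on every turn the value stays low. A minimal sketch of that check follows; the function names and the action dictionary are hypothetical, and treating the threshold as strict (exactly 20 does not count as below) is an assumption about the engine's behavior.

```python
def crossed_below(previous, current, threshold):
    """True only on the update where the value moves from >= threshold to < threshold."""
    return previous >= threshold and current < threshold


def audio_actions_for_health(previous_health, current_health):
    """Return the audio actions a rule like the one above would emit (sketch)."""
    actions = []
    if crossed_below(previous_health, current_health, 20):
        actions.append({
            "action": "play-audio",
            "track": "crisis-bgm",
            "operation": "crossfade",
            "fadeDuration": 1.5,
        })
    return actions


print(audio_actions_for_health(25, 15))  # crossing: one crossfade action
print(audio_actions_for_health(15, 10))  # already below: no new action
```

Firing only on the crossing keeps the crisis music from restarting each time health changes while it is already low.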


Design advice

Volume balance

BGM at 0.3-0.5 works for most worlds. Players are reading text — music that's too loud competes with concentration. Ambient tracks can sit even lower (0.1-0.3) as background texture. SFX should be louder (0.7-0.9) because they're brief and meant to punctuate moments.

Fewer tracks, more impact

Battle Royale has four audio tracks: a character creation BGM, a game playlist BGM, a heartbeat SFX, and a death SFX. That's enough to create tension throughout a multi-hour survival game. You don't need a library of 20 tracks — a few well-chosen ones with good fade transitions do more than a cluttered soundtrack.

Fade everything

Abrupt audio cuts are jarring. Set fadeIn: 2 and fadeOut: 1.5 on BGM tracks so transitions feel smooth. Use crossfade instead of stop-then-play when switching between tracks. The default conditional BGM fade (1 second in, 1 second out) is a reasonable starting point.
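To see why a crossfade sounds smoother than stop-then-play, it helps to look at what happens to the two volumes during the transition. The sketch below uses a simple linear ramp; the engine's actual fade curve is not documented here, so both the linear shape and the `crossfade_gains` helper are assumptions for illustration.

```python
def crossfade_gains(elapsed, duration):
    """Return (outgoing_gain, incoming_gain) for a linear crossfade.

    elapsed and duration are in seconds; gains are multipliers from 0.0 to 1.0.
    """
    if duration <= 0:
        # No fade time: this is the abrupt cut the advice above warns against.
        return 0.0, 1.0
    progress = min(max(elapsed / duration, 0.0), 1.0)
    return 1.0 - progress, progress


# Sample a 2-second crossfade at a few points in time.
for t in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(t, crossfade_gains(t, 2.0))
```

At every point in the fade some music is audible, whereas stop-then-play leaves a moment where the outgoing track has ended and the new one has not yet reached full volume.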

When to let the AI handle it vs. when to automate

Let the AI handle audio when the trigger is narrative — a door slamming, a character gasping, an explosion. These moments are unpredictable and the AI knows when they happen because it's writing them.

Automate audio when the trigger is mechanical — entering a location, health crossing a threshold, a specific turn number. These are precise state changes that the engine tracks better than the AI.

Many worlds use both: a playlist for base music, conditional BGM for location-based switches, and AI directives for dramatic SFX moments.


See also

Complete audio schema and playlist config → World Spec: Audio