How to Use Stable Audio 3 AI Audio Generator

A complete step-by-step guide for creators who want to generate music, ambient beds, and SFX from a text prompt — or transform and inpaint existing audio — directly in the browser, no local install required.

Stable Audio 3 turns a text prompt into a downloadable audio clip — entirely in your browser.

What Stable Audio 3 Is, and What This Guide Covers

Stable Audio 3 is an open-weights AI audio model family from Stability AI, released in May 2026. It supports three inference modes: Text-to-Audio (generate a clip from a written prompt), Audio-to-Audio editing (transform an uploaded clip while preserving its timing), and Audio Inpaint (regenerate a selected region of an uploaded clip while preserving the rest).

Stableaudio3.com is an online product experience for that workflow. This guide walks through all three modes, the prompt formula that works across them, genre and instrument vocabulary, BPM and key guidance, and the common mistakes that make AI audio sound generic. The same guidance works whether you are sketching music, building ambient beds, prototyping game audio, or producing podcast intros.

Step 1 — Sign Up and Access the Generator

Open the generator

Open the Stable Audio 3 generator from the navigation. Sign up if you have not already — new users get 100 free credits, enough to create about 100 seconds of audio.

Pick a mode

Choose Text-to-Audio if you are creating from scratch. Choose Audio-to-Audio if you want to transform an existing clip. Choose Audio Inpaint if most of a clip works but a section needs to be regenerated.

Plan a short first test

Short clips are better for prompt exploration. Once the direction works, spend more credits on longer or higher-quality versions.

💡 First generation tip: Keep your first test short. A 15–30 second clip is the fastest way to check whether your prompt direction is working before you spend more credits on longer or higher-quality outputs. The Stable Audio 3 pricing page explains how credit packs map to short-clip equivalents.
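As a rough budgeting aid, the sign-up figure in this step (100 free credits for about 100 seconds of audio) implies roughly one credit per second. The sketch below assumes that rate is linear and ignores any quality-tier multiplier. The constant `CREDITS_PER_SECOND` and the function name are illustrative, and the pricing page remains the authoritative source.

```python
# Assumption: about 1 credit per second, inferred from
# "100 free credits, enough to create about 100 seconds of audio".
# Real pricing may differ by quality tier; check the pricing page.
CREDITS_PER_SECOND = 1.0


def clips_within_budget(credits: float, clip_seconds: float,
                        rate: float = CREDITS_PER_SECOND) -> int:
    """Return how many clips of the given length fit in a credit budget."""
    if clip_seconds <= 0 or rate <= 0:
        raise ValueError("clip_seconds and rate must be positive")
    return int(credits // (clip_seconds * rate))


for seconds in (15, 30, 60):
    count = clips_within_budget(100, seconds)
    print(f"{seconds}s clips from 100 credits: {count}")
```

Under that assumption, the free credits cover several 15-second tests but only one 60-second clip, which is why short tests come first.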

Step 2 — Text-to-Audio: How It Works

Text-to-Audio is the primary mode for generating audio from scratch. You describe the clip you want — genre, instruments, mood, tempo — and Stable Audio 3 produces an audio file.

Prompt Formula

Genre / Style + Instruments + Mood + Tempo / BPM + Key (optional) + Production Style + Duration
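If you write many prompts, the formula can be assembled from named fields so that no slot is forgotten. The following is a minimal sketch, not part of the product: `PromptSpec` and `build_prompt` are hypothetical names, and the sentence wording is just one way to join the slots.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PromptSpec:
    """One field per slot in the prompt formula (names are illustrative)."""
    genre: str
    instruments: List[str]
    mood: str
    bpm: Optional[int] = None
    key: Optional[str] = None
    production_style: Optional[str] = None
    duration_seconds: Optional[float] = None


def build_prompt(spec: PromptSpec) -> str:
    """Join the filled-in slots into a single comma-separated prompt."""
    parts = [
        f"A {spec.genre} track with {', '.join(spec.instruments)}",
        f"{spec.mood} mood",
    ]
    if spec.bpm is not None:
        tempo = f"{spec.bpm} BPM"
        if spec.key:
            tempo += f" in {spec.key}"
        parts.append(tempo)
    elif spec.key:
        parts.append(f"in {spec.key}")
    if spec.production_style:
        parts.append(f"{spec.production_style} production style")
    if spec.duration_seconds is not None:
        parts.append(f"{spec.duration_seconds:g} seconds")
    return ", ".join(parts) + "."


spec = PromptSpec(
    genre="cinematic ambient",
    instruments=["slow synth pads", "deep sub bass", "distant piano notes"],
    mood="reflective",
    bpm=70,
    key="A minor",
    production_style="soundtrack",
    duration_seconds=30,
)
print(build_prompt(spec))
```

Optional slots (key, production style, duration) are simply skipped when left empty, which mirrors the "(optional)" marker in the formula.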

Prompt Examples You Can Copy

Cinematic Ambient

"A cinematic ambient track with slow synth pads, deep sub bass, distant piano notes, warm reverb, 70 BPM in A minor, soundtrack production style, 30 seconds."

Lo-fi Hip Hop Loop

"A lo-fi hip hop beat with mellow piano chords, soft sub bass, warm vinyl crackle, brushed drums, 80 BPM, relaxed afternoon mood, 30-second loop."

Electronic Dance

"An energetic electronic dance bed with driving 4-on-the-floor kick, plucky lead synth, bright hi-hats, 128 BPM in F minor, festival production style, 45 seconds."

Game UI SFX

"A short crisp UI confirmation sound effect for a sci-fi game interface, two layered tones with a quick decay, clean digital character, 1.5 seconds."

Podcast Intro Bed

"A warm podcast intro bed with rising synth pad, gentle kick drum, soft mallet percussion, 90 BPM, optimistic mood, 15 seconds with a tail for voiceover."

Step 3 — Audio-to-Audio: Transform an Existing Clip

Audio-to-Audio takes a clip you upload and reshapes it based on a transformation prompt. The model preserves the timing and structure of the source while changing genre, instrumentation, or feel.

Upload a clean source clip

Use an MP3, WAV, or FLAC file. The cleaner the upload, the cleaner the transformation. Avoid clips with heavy clipping or unclear instrumentation.
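Before uploading, you can run a quick local check on a file. The sketch below uses only the Python standard library: it checks the extension against the three formats named here and, for 16-bit PCM WAV files only, estimates how many samples sit at full scale, a rough proxy for clipping. The function names are hypothetical, and how large a fraction counts as "heavy" clipping is left to your judgment.

```python
import array
import sys
import wave
from pathlib import Path

ACCEPTED_EXTENSIONS = {".mp3", ".wav", ".flac"}  # formats named in this guide


def has_accepted_extension(path: str) -> bool:
    """True if the file extension is one of the formats listed above."""
    return Path(path).suffix.lower() in ACCEPTED_EXTENSIONS


def clipped_fraction(wav_path: str) -> float:
    """Fraction of samples at full scale in a 16-bit PCM WAV file."""
    with wave.open(wav_path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return 0.0
    clipped = sum(1 for s in samples if s >= 32767 or s <= -32768)
    return clipped / len(samples)
```

A result near zero suggests a clean source; a clip where a noticeable share of samples hit full scale is worth re-exporting at a lower level before you upload it.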

Describe the transformation

Say what should change, not what should stay the same. Examples: "transform into a lo-fi hip hop version," "shift to orchestral arrangement," "convert to a synthwave bed." Keep the change description focused.

Generate a short test first

Generate a short clip and compare it with the source. If the transformation went too far and lost the original feel, dial back the prompt. If it did not go far enough, be more specific about what should change.

Reuse the prompt as a template

Once a prompt + upload combination produces a transformation you like, save it as a template for similar source clips.
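One lightweight way to keep such a template is a string with named placeholders. This sketch uses Python's standard `string.Template`. The template wording and the function name are illustrative, not a syntax the generator requires.

```python
from string import Template

# Hypothetical template: the wording is an example, not a required syntax.
TRANSFORM_TEMPLATE = Template(
    "Transform into a $genre version with $instruments, $mood mood."
)


def render_transform(genre: str, instruments: str, mood: str) -> str:
    """Fill the transformation template for a new source clip."""
    return TRANSFORM_TEMPLATE.substitute(
        genre=genre, instruments=instruments, mood=mood
    )


print(render_transform("lo-fi hip hop", "mellow piano and brushed drums", "relaxed"))
print(render_transform("synthwave", "analog synths and gated drums", "nostalgic"))
```

Because `substitute` raises `KeyError` when a placeholder is missing, a half-filled template fails loudly instead of producing a vague prompt.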

⚠️ Only upload audio you have rights to use. Uploading copyrighted recordings, signed songs, or someone else's production without permission is not allowed under the Terms of Service.

Step 4 — Audio Inpaint: Regenerate a Region

Audio Inpaint lets you select a region of an uploaded clip on the waveform and regenerate just that section. The rest of the clip stays untouched. Use it to fix a problem section, remove an unwanted sound, swap an instrument, or extend a loop.

Upload the source clip

Choose the audio file that needs a fix or extension. Make sure it is audio you have rights to use.

Select the region on the waveform

Drag the handles to mark the start and end of the region you want the model to regenerate. For continuations, mark the very end and extend past the original.

Match the surrounding context in the prompt

Use the same genre, instruments, tempo, and key as the rest of the clip. If the rest is a lo-fi piano loop, the regenerated region should match: "Regenerate the selected region as a smooth piano transition that bridges into the next phrase."

Check the transitions

Listen to the full clip with the new region in place. Mismatches usually show at the start and end of the regenerated region — tighten the prompt or try a slightly larger region for more blend context.
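The "slightly larger region" adjustment can be reasoned about in seconds. The sketch below is only arithmetic on a start and end time, not a call into the generator: `widen_region` is a hypothetical helper that pads a selection on both sides, clamps it to the clip, and leaves a continuation (a region that already runs past the end) unclamped on the right.

```python
from typing import Tuple


def widen_region(start: float, end: float, pad: float,
                 clip_length: float) -> Tuple[float, float]:
    """Pad an inpaint region by `pad` seconds for more blend context."""
    if not (0 <= start < end):
        raise ValueError("need 0 <= start < end")
    if pad < 0:
        raise ValueError("pad must be non-negative")
    new_start = max(0.0, start - pad)
    # A continuation already runs past the clip, so only widen its start.
    new_end = end if end > clip_length else min(clip_length, end + pad)
    return new_start, new_end


print(widen_region(12.0, 16.0, 1.0, 30.0))   # region in the middle of a 30 s clip
print(widen_region(0.5, 4.0, 1.0, 30.0))     # region near the start
print(widen_region(28.0, 38.0, 1.0, 30.0))   # continuation past the end
```

Widening by a second or so on each side gives the model more of the surrounding material to blend with, at the cost of regenerating slightly more of the original.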

Step 5 — Choose the Right Settings

Setting	Options	When to use
Mode	T2A / A2A / Inpaint	Text-to-Audio creates from a prompt. Audio-to-Audio transforms an upload. Audio Inpaint regenerates a region of an upload.
Duration	5s · 15s · 30s · 60s+	Short clips for SFX and prompt exploration. Longer durations for music beds and ambient loops. Inpaint duration is the region size.
Quality	Standard / High	Start with standard for prompt exploration. Move to high once the prompt direction is working — higher quality costs more credits.
BPM	40–180 BPM	Specify when the use case has a sync target (video cut, voiceover bed, loop at a known tempo). Leave open for exploratory sketches.
Key	Major / Minor or open	Specify when you have a tonal center in mind. For exploratory sketches, leave open and let the model choose.
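When a clip has to loop or sync, it helps to check that the chosen duration holds a whole number of bars at the chosen BPM. The arithmetic is beats = seconds × BPM / 60 and bars = beats / beats per bar. The sketch assumes 4/4 time (four beats per bar), which this guide does not state, and the function names are illustrative.

```python
def bars_in_clip(duration_seconds: float, bpm: float,
                 beats_per_bar: int = 4) -> float:
    """Number of bars that fit in a clip (assumes a steady tempo)."""
    beats = duration_seconds * bpm / 60
    return beats / beats_per_bar


def loop_seconds(bars: int, bpm: float, beats_per_bar: int = 4) -> float:
    """Duration in seconds of a loop with a whole number of bars."""
    return bars * beats_per_bar * 60 / bpm


print(bars_in_clip(30, 80))    # 10.0 bars: a 30 s loop at 80 BPM closes cleanly
print(bars_in_clip(45, 128))   # 24.0 bars
print(bars_in_clip(30, 70))    # 8.75 bars: the loop point lands mid-bar
print(round(loop_seconds(8, 70), 2))  # 27.43 s for exactly 8 bars at 70 BPM
```

If the bar count is not a whole number, either adjust the BPM or trim the clip to the nearest whole-bar length in your editor.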

Step 6 — Genre, Mood, and Production Vocabulary

Genre Vocabulary

Cinematic: Film-score adjacent, with strings, pads, low brass, and swelling dynamics.

Ambient: Slow, atmospheric, often beatless or minimal-beat textures.

Electronic: Synth-driven and broad; narrow it with subgenres like house, techno, or drum and bass.

Lo-fi hip hop: Mellow piano, brushed or boom-bap drums, vinyl crackle, 70–90 BPM.

Synthwave: Retro 80s-style synths, gated reverb drums, neon mood, 100–120 BPM.

Orchestral: Acoustic strings, brass, woodwinds; works for cinematic and score use.

Drum and bass: Fast breakbeats with heavy sub bass, 160–180 BPM.

Jazz / folk / rock: Acoustic-led genres; specify era (modern / vintage) and ensemble size.
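The BPM ranges quoted in this vocabulary list can be kept as a small lookup to sanity-check a prompt before generating. This is a sketch built only from the three ranges listed above; the names `GENRE_BPM` and `bpm_fits_genre` are hypothetical, and genres without a stated range return `None` rather than a guess.

```python
from typing import Optional

# Ranges copied from the genre vocabulary above; other genres have no stated range.
GENRE_BPM = {
    "lo-fi hip hop": (70, 90),
    "synthwave": (100, 120),
    "drum and bass": (160, 180),
}


def bpm_fits_genre(genre: str, bpm: int) -> Optional[bool]:
    """True/False if the genre has a listed range, None if it does not."""
    bounds = GENRE_BPM.get(genre.lower())
    if bounds is None:
        return None
    low, high = bounds
    return low <= bpm <= high


print(bpm_fits_genre("Lo-fi hip hop", 80))   # True
print(bpm_fits_genre("synthwave", 140))      # False
print(bpm_fits_genre("ambient", 60))         # None
```

A `False` result is not an error; it only signals that the tempo departs from the convention, which may be exactly what you want.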

Mood Words That Shape the Feel

Calm / serene: Slow tempo, soft dynamics, warm pads. Good for focus and ambient beds.

Tense / urgent: Driving rhythm, dissonant intervals, builds. Useful for trailers and action beats.

Melancholic: Minor key, slow decay, piano or strings. Reflective and emotional.

Hopeful / uplifting: Major key, rising progressions, brighter timbres. Common for ad and brand work.

Epic / cinematic: Big dynamics, low brass, percussive swells. Good for reveals and finales.

Retro / nostalgic: Vintage production cues such as tape hiss, analog synths, and vinyl crackle.

Dreamy / ethereal: Reverb-heavy, breathy textures, slow harmonic motion. Good for surreal scenes.

Gritty / dark: Distortion, low-mid emphasis, sparse high end. Good for cyberpunk or industrial.

Step 7 — How to Iterate for Better Results

Listen before rewriting

Review the output and identify the weakest element first — instruments? tempo? mood? Don't rewrite the whole prompt if only one variable needs tightening.

Change one variable at a time

Adjust one or two things per iteration so you can tell what actually improved the result.

Preserve what works

Keep the prompt phrases that produced strong elements. If the kick drum sounds right but the synth is off, change only the synth description.

Save successful patterns

When a prompt produces a useful result, save the structure as a template for similar future work.
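If you keep each prompt as a set of named fields, a small comparison makes the "one or two variables per iteration" rule easy to check. The helper below is a hypothetical sketch: it lists which fields differ between two prompt versions so you can see at a glance whether an iteration changed too much.

```python
from typing import Dict, List


def changed_fields(previous: Dict[str, object],
                   current: Dict[str, object]) -> List[str]:
    """Names of fields whose values differ between two prompt versions."""
    keys = sorted(set(previous) | set(current))
    return [k for k in keys if previous.get(k) != current.get(k)]


v1 = {
    "genre": "electronic dance",
    "drums": "driving 4-on-the-floor kick",
    "lead": "plucky lead synth",
    "bpm": 128,
}
# The kick sounded right, so only the synth description changes.
v2 = dict(v1, lead="detuned saw lead")

diff = changed_fields(v1, v2)
print(diff)
if len(diff) > 2:
    print("More than two fields changed; the cause of any improvement will be unclear.")
```

Storing the field dictionaries alongside the clips you liked also doubles as the template library described above.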

Common Mistakes and How to Fix Them

Problem	Likely Cause	Fix
Prompt is too vague	Genre alone is not enough direction.	Add instruments, mood, tempo, and a production style cue.
Mixing too many genres	Asking for cinematic + lo-fi + synthwave produces muddy output.	Pick one main genre and at most one supporting flavor.
Tempo feels wrong	No BPM specified, so the model defaulted to genre convention.	Set BPM explicitly when the use case has a sync target.
Asked for vocals or speech	Out of scope — Stable Audio 3 is positioned for music, ambient, and SFX.	Use a dedicated voice or TTS tool for vocals; use Stable Audio 3 for the instrumental bed.
Upload failed for A2A or Inpaint	Unusual codec, DRM-locked file, or oversized upload.	Convert to MP3, WAV, or FLAC. Stay within the size limit shown on the upload field.
Inpaint region clashes with the rest of the clip	Prompt did not match the surrounding genre, instruments, or tempo.	Tighten the prompt to match the rest, or try a slightly larger region for more blend context.
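Several of the prompt-side mistakes in this table can be caught with a simple text check before you spend credits. The sketch below is a heuristic with hypothetical names and word lists, not a feature of the generator: it flags more than two genre words, a missing BPM, vocal or speech requests, and very short prompts. The BPM warning can be ignored for sound effects and exploratory sketches.

```python
import re
from typing import List

# Illustrative word lists; extend them to suit your own vocabulary.
GENRES = ("cinematic", "ambient", "lo-fi", "synthwave", "orchestral",
          "drum and bass", "techno", "house")
VOCAL_WORDS = ("vocal", "vocals", "singing", "singer", "lyrics", "speech")


def lint_prompt(prompt: str) -> List[str]:
    """Return a list of warnings for common prompt mistakes."""
    text = prompt.lower()
    warnings = []
    found = [g for g in GENRES if re.search(rf"\b{re.escape(g)}\b", text)]
    if len(found) > 2:  # one main genre plus at most one supporting flavor
        warnings.append("too many genres: " + ", ".join(found))
    if not re.search(r"\b\d{2,3}\s*bpm\b", text):
        warnings.append("no BPM specified")
    if any(re.search(rf"\b{w}\b", text) for w in VOCAL_WORDS):
        warnings.append("vocals or speech requested")
    if len(text.split()) < 6:
        warnings.append("prompt is very short")
    return warnings


print(lint_prompt("cinematic lo-fi synthwave song with vocals"))
print(lint_prompt("A lo-fi hip hop beat with mellow piano chords, soft sub bass, "
                  "warm vinyl crackle, brushed drums, 80 BPM, relaxed afternoon "
                  "mood, 30-second loop."))
```

The first call reports three warnings and the second reports none, which matches the guidance in the table: one main genre, an explicit tempo, and no vocals.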

FAQ

Questions About Stable Audio 3 AI Audio Generator

How do I start using Stable Audio 3?

Open the generator, choose Text-to-Audio, Audio-to-Audio, or Audio Inpaint, write a descriptive prompt, select a duration, and generate a short test clip. Start simple, then improve the prompt by adding genre, instruments, mood, and tempo.

What is the best prompt structure for Stable Audio 3?

Use the formula from Step 2: genre / style + instruments + mood + tempo / BPM + key (optional) + production style + duration. This gives the model both sonic content and structural direction. Prompts that only name the genre usually leave too much open to interpretation.

Should I start with text-to-audio or one of the editing modes?

Use text-to-audio when you want to create audio from scratch. Use audio-to-audio when you already have a clip that needs a genre or instrumentation shift. Use audio inpaint when most of an existing clip works but a specific section needs to be regenerated.

How long should my first clip be?

Start with a short test. A 15–30 second clip is enough to evaluate prompt direction, instruments, and mood. Once the prompt is working, you can spend more credits on longer versions.

Why does tempo (BPM) matter so much?

Tempo defines the structural feel of the clip. For sync-critical use cases (video cuts, music beds under voiceover, loops at a known BPM), specifying tempo is essential. For exploratory sketches, you can leave tempo open and adjust on the second iteration.

Can I generate vocals or singing with Stable Audio 3?

No. Stable Audio 3 is positioned around music, ambient, and sound effects. Vocal generation, singing voice synthesis, and speech-to-audio are different model classes — use a dedicated voice or TTS tool for those use cases.

Can I reuse prompt templates?

Yes. Reusing a strong template is one of the fastest ways to improve results. Keep the structure, then replace the genre, instruments, mood, tempo, and key. This helps you generate new ideas without starting from a blank prompt each time.

What audio formats can I upload for A2A and Inpaint?

MP3, WAV, and FLAC are the most reliable. Other common formats may also work. Stay within the file size limit shown on the upload field, and make sure you have rights to use the audio you upload.

Ready to Generate Your First Audio Clip?

Open the Stable Audio 3 AI Audio Generator and start creating music, ambient, or SFX from a text prompt — or edit and inpaint existing audio.
