stabilityai/stable-audio-open-1.0

Last updated on Jun 20, 2024

Original seeder @aitracker

Stable Audio Open 1.0

Model Overview

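Stable Audio Open 1.0 is a latent diffusion model that generates variable-length stereo audio (up to 47 s) at 44.1 kHz from text prompts. It has three components: an autoencoder that compresses waveforms into a shorter latent sequence, a T5-based text embedding model used for text conditioning, and a transformer-based diffusion model that operates in the autoencoder's latent space.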
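
To give a feel for the output size implied by these figures, the short sketch below computes how many sample frames and bytes a maximum-length clip contains. The sample rate, duration, and channel count come from the overview; the 16-bit PCM storage format and the helper name clip_size are assumptions made for illustration only.

```python
SAMPLE_RATE_HZ = 44_100  # 44.1 kHz, from the overview
MAX_SECONDS = 47         # maximum clip length, from the overview
CHANNELS = 2             # stereo
BYTES_PER_SAMPLE = 2     # assumption: audio stored as 16-bit PCM


def clip_size(seconds, sample_rate=SAMPLE_RATE_HZ, channels=CHANNELS,
              bytes_per_sample=BYTES_PER_SAMPLE):
    """Return (frames per channel, total sample values, raw PCM bytes)."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    frames = int(round(seconds * sample_rate))
    values = frames * channels
    return frames, values, values * bytes_per_sample


frames, values, pcm_bytes = clip_size(MAX_SECONDS)
print(frames, values, pcm_bytes)  # 2072700 4145400 8290800
```

Under these assumptions a full-length clip is roughly 8.3 MB of raw PCM data before any container overhead.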

Usage

To use the model, install the stable-audio-tools library and follow the example code provided with the model.
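
A minimal sketch of what text-to-audio generation with stable-audio-tools might look like is given below. The calls get_pretrained_model and generate_diffusion_cond, and the sampler settings, follow the example published with the model as best recalled here, so treat them as assumptions and check them against the installed version of the library. The helper names build_conditioning and generate_to_file are hypothetical, and the 47-second limit is taken from the overview. The heavy imports are deferred into the generation function so that the conditioning helper can be used without torch installed.

```python
MODEL_ID = "stabilityai/stable-audio-open-1.0"
MAX_SECONDS = 47  # upper bound stated in the model overview


def build_conditioning(prompt, seconds_total, seconds_start=0):
    """Build the text-and-timing conditioning list for a single clip."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not 0 < seconds_total <= MAX_SECONDS:
        raise ValueError(f"seconds_total must be in (0, {MAX_SECONDS}]")
    return [{
        "prompt": prompt,
        "seconds_start": seconds_start,
        "seconds_total": seconds_total,
    }]


def generate_to_file(prompt, seconds_total=30, path="output.wav"):
    """Generate one stereo clip from a text prompt and save it as a WAV file."""
    # Deferred imports: these packages are only needed for actual generation.
    import torch
    import torchaudio
    from einops import rearrange
    from stable_audio_tools import get_pretrained_model
    from stable_audio_tools.inference.generation import generate_diffusion_cond

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Downloads the weights on first use.
    model, model_config = get_pretrained_model(MODEL_ID)
    model = model.to(device)

    # Sampler settings below are assumed from the published example.
    output = generate_diffusion_cond(
        model,
        steps=100,
        cfg_scale=7,
        conditioning=build_conditioning(prompt, seconds_total),
        sample_size=model_config["sample_size"],
        sigma_min=0.3,
        sigma_max=500,
        sampler_type="dpmpp-3m-sde",
        device=device,
    )

    # Collapse the batch dimension into one (channels, samples) tensor.
    output = rearrange(output, "b d n -> d (b n)").to(torch.float32)

    # Peak-normalize, clip, and convert to 16-bit integers before saving.
    output = output.div(output.abs().max()).clamp(-1, 1).mul(32767)
    output = output.to(torch.int16).cpu()
    torchaudio.save(path, output, model_config["sample_rate"])
    return path
```

Calling generate_to_file("128 BPM tech house drum loop") would then write a 30-second clip to output.wav, assuming the library and model weights are available. A GPU is strongly advisable; on CPU the 100 sampling steps are likely to be slow.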

Model Details

Model type: Latent diffusion model based on a transformer architecture
Language: English
License: See LICENSE file
Commercial License: Refer to https://stability.ai/membership
Training dataset: 486,492 audio recordings from Freesound and Free Music Archive (FMA)
Attribution: See attribution files for Freesound and FMA

Mitigations

To ensure no unauthorized copyrighted music was present in the training data, we conducted an in-depth analysis using the PANNs music classifier and Audible Magic's identification services.

Use and Limitations

Intended use: Research and experimentation on AI-based music and audio generation
Out-of-scope use cases: Do not use the model to create hostile or alienating environments for people
Limitations:
- Cannot generate realistic vocals
- Performs poorly in languages other than English
- Biased towards certain music styles and cultures
- Better at generating sound effects and field recordings than music
- Prompt engineering may be required for satisfying results
Biases: Reflects biases from the training data, which may lack diversity and representation of all cultures and music genres.