stabilityai/stable-audio-open-1.0
Original seeder @aitracker
Stable Audio Open 1.0
Model Overview
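Stable Audio Open 1.0 is a latent diffusion model that generates variable-length stereo audio (up to 47 s) at 44.1 kHz from text prompts. It consists of three components: an autoencoder that compresses waveforms into a shorter latent sequence, a T5-based text embedding used for text conditioning, and a transformer-based diffusion model that operates in the autoencoder's latent space.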
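To make the figures above concrete, the short sketch below works out the nominal size of a decoded stereo waveform from the stated sample rate and maximum duration. The helper name `waveform_shape` is hypothetical and used only for illustration; it is not part of any library.

```python
SAMPLE_RATE_HZ = 44_100  # sample rate stated in the overview
MAX_SECONDS = 47         # maximum clip length stated in the overview
CHANNELS = 2             # stereo


def waveform_shape(seconds: float) -> tuple[int, int]:
    """Return (channels, samples_per_channel) for a clip of the given length."""
    if not 0 < seconds <= MAX_SECONDS:
        raise ValueError(f"seconds must be in (0, {MAX_SECONDS}]")
    return CHANNELS, round(seconds * SAMPLE_RATE_HZ)


print(waveform_shape(47))  # (2, 2072700)
print(waveform_shape(10))  # (2, 441000)
```

A full-length clip is therefore about 2.07 million samples per channel, which is why the diffusion model works on the autoencoder's compressed latents rather than on raw audio.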
Usage
To use the model, install the stable-audio-tools library and follow the example code provided.
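As a rough illustration of what such example code might look like, here is a minimal sketch. It assumes that `stable-audio-tools`, `torch`, and `torchaudio` are installed and that the model weights can be fetched from the `stabilityai/stable-audio-open-1.0` repository. The sampler settings (`steps`, `cfg_scale`, `sigma_min`, `sigma_max`, `sampler_type`) are illustrative values, not tuned recommendations, and the wrappers `build_conditioning` and `generate` are hypothetical names introduced here.

```python
MAX_SECONDS = 47  # upper bound stated in the model overview


def build_conditioning(prompt: str, seconds_total: float) -> list[dict]:
    """Build the text-and-timing conditioning for a single clip."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not 0 < seconds_total <= MAX_SECONDS:
        raise ValueError(f"seconds_total must be in (0, {MAX_SECONDS}]")
    return [{"prompt": prompt, "seconds_start": 0, "seconds_total": seconds_total}]


def generate(prompt: str, seconds_total: float = 30, out_path: str = "output.wav") -> str:
    """Generate one clip and write it to out_path (requires the model weights)."""
    # Heavy imports stay local so build_conditioning works without them installed.
    import torch
    import torchaudio
    from stable_audio_tools import get_pretrained_model
    from stable_audio_tools.inference.generation import generate_diffusion_cond

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
    model = model.to(device)

    # Run the conditioned diffusion sampler; output is (batch, channels, samples).
    audio = generate_diffusion_cond(
        model,
        steps=100,
        cfg_scale=7,
        conditioning=build_conditioning(prompt, seconds_total),
        sample_size=model_config["sample_size"],
        sigma_min=0.3,
        sigma_max=500,
        sampler_type="dpmpp-3m-sde",
        device=device,
    )

    # Take the first batch item, peak-normalize, and convert to 16-bit PCM.
    audio = audio[0].to(torch.float32)
    audio = audio / audio.abs().max().clamp(min=1e-8)
    pcm = (audio.clamp(-1, 1) * 32767).to(torch.int16).cpu()
    torchaudio.save(out_path, pcm, model_config["sample_rate"])
    return out_path
```

Calling `generate("128 BPM tech house drum loop", seconds_total=30)` would then write a stereo WAV file to `output.wav`. Note that this sketch saves the full generation window without trimming it to `seconds_total`.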
Model Details
- Model type: Latent diffusion model based on a transformer architecture
- Language: English
- License: See LICENSE file
- Commercial License: Refer to https://stability.ai/membership
- Training dataset: 486,492 audio recordings from Freesound and Free Music Archive (FMA)
- Attribution: See attribution files for Freesound and FMA
Mitigations
To ensure no unauthorized copyrighted music was present in the training data, we conducted an in-depth analysis using the PANNs music classifier and Audible Magic's identification services.
Use and Limitations
- Intended use: Research and experimentation on AI-based music and audio generation
- Out-of-scope use cases: Do not use the model to create hostile or alienating environments for people
- Limitations:
  - Cannot generate realistic vocals
  - Performs poorly with prompts in languages other than English
  - Biased towards certain music styles and cultures
  - Better at generating sound effects and field recordings than music
  - Prompt engineering may be required for satisfactory results
- Biases: Reflects biases from the training data, which may lack diversity and representation of all cultures and music genres.