stabilityai/stable-audio-open-1.0

Original seeder: @aitracker

Stable Audio Open 1.0

Model Overview

Stable Audio Open 1.0 is a latent diffusion model that generates variable-length stereo audio (up to 47 seconds) at 44.1 kHz from text prompts. It consists of three components: an autoencoder, a T5-based text embedding model for text conditioning, and a transformer-based diffusion model.
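
To make the output format concrete, the sketch below works out how many sample frames a clip of a given length occupies at 44.1 kHz in stereo. The constants come from the description above; the function names are illustrative and not part of any library, and the model's internal buffer size may differ from this nominal figure.

```python
SAMPLE_RATE_HZ = 44_100  # output sample rate stated in the overview
MAX_SECONDS = 47         # maximum clip length stated in the overview
CHANNELS = 2             # stereo output


def frames_for(seconds: float) -> int:
    """Return the number of sample frames per channel for a clip of this length."""
    if not 0 < seconds <= MAX_SECONDS:
        raise ValueError(f"seconds must be in (0, {MAX_SECONDS}]")
    return round(seconds * SAMPLE_RATE_HZ)


def total_samples(seconds: float) -> int:
    """Return the number of individual sample values across both channels."""
    return frames_for(seconds) * CHANNELS


# frames_for(47)    -> 2072700
# total_samples(47) -> 4145400
```

A full-length clip therefore holds roughly two million frames per channel, which is why the diffusion model operates on the autoencoder's compressed latents rather than on raw waveforms.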

Usage

To use the model, install the stable-audio-tools library and follow the example code provided.
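
As a rough illustration of what such usage can look like, here is a minimal sketch that assumes stable-audio-tools and PyTorch are installed (for example via `pip install stable-audio-tools`) and that access to the model weights has been granted. The calls to `get_pretrained_model` and `generate_diffusion_cond`, and the sampler settings, follow the example published with the model as best recalled here and may differ between library versions. `build_conditioning` and `generate` are hypothetical helper names introduced only for this sketch.

```python
MAX_SECONDS = 47  # upper bound on clip length given in the overview


def build_conditioning(prompt: str, seconds_total: float) -> list:
    """Build the text-and-timing conditioning list for a single clip."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not 0 < seconds_total <= MAX_SECONDS:
        raise ValueError(f"seconds_total must be in (0, {MAX_SECONDS}]")
    return [{
        "prompt": prompt,
        "seconds_start": 0,
        "seconds_total": seconds_total,
    }]


def generate(prompt: str, seconds_total: float = 30, steps: int = 100):
    """Generate one clip and return (audio_tensor, sample_rate).

    Heavy imports live inside the function so the helper above can be
    used and tested without the model libraries installed.
    """
    import torch
    from stable_audio_tools import get_pretrained_model
    from stable_audio_tools.inference.generation import generate_diffusion_cond

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Downloads the weights on first use and returns the model with its config.
    model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
    model = model.to(device)

    # Sampler settings below are assumptions copied from the published example.
    audio = generate_diffusion_cond(
        model,
        steps=steps,
        cfg_scale=7,
        conditioning=build_conditioning(prompt, seconds_total),
        sample_size=model_config["sample_size"],
        sigma_min=0.3,
        sigma_max=500,
        sampler_type="dpmpp-3m-sde",
        device=device,
    )
    # The result is expected to be shaped (batch, channels, samples).
    return audio, model_config["sample_rate"]
```

Calling `generate("128 BPM tech house drum loop", 30)` would then return a stereo tensor and its sample rate, which can be normalized and written to disk with an audio library such as torchaudio.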

Model Details

  • Model type: Latent diffusion model based on a transformer architecture
  • Language: English
  • License: See LICENSE file
  • Commercial License: Refer to https://stability.ai/membership
  • Training dataset: 486,492 audio recordings from Freesound and Free Music Archive (FMA)
  • Attribution: See attribution files for Freesound and FMA

Mitigations

To ensure no unauthorized copyrighted music was present in the training data, we conducted an in-depth analysis using the PANNs music classifier and Audible Magic's identification services.

Use and Limitations

  • Intended use: Research and experimentation on AI-based music and audio generation
  • Out-of-scope use cases: Do not use the model to create hostile or alienating environments for people
  • Limitations:
    • Cannot generate realistic vocals
    • Performs poorly with prompts in languages other than English
    • Biased towards certain music styles and cultures
    • Better at generating sound effects and field recordings than music
    • Prompt engineering may be required for satisfactory results
  • Biases: Reflects biases from the training data, which may lack diversity and representation of all cultures and music genres.
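
On the prompt engineering point, one simple approach is to assemble prompts from explicit descriptive fields rather than writing them free-form. The sketch below is a hypothetical helper, not part of stable-audio-tools, and it assumes that specific English descriptions (tempo, genre, sound source) tend to work better, which this card does not confirm.

```python
from typing import Optional, Sequence


def make_prompt(
    subject: str,
    genre: str = "",
    tempo_bpm: Optional[int] = None,
    extras: Sequence[str] = (),
) -> str:
    """Assemble an English text prompt from optional descriptive fields."""
    head = []
    if tempo_bpm is not None:
        head.append(f"{tempo_bpm} BPM")
    if genre:
        head.append(genre)
    head.append(subject)
    # Tempo, genre, and subject form one phrase; extras follow as a comma list.
    return ", ".join([" ".join(head), *extras])


# make_prompt("drum loop", genre="tech house", tempo_bpm=128)
#   -> "128 BPM tech house drum loop"
# make_prompt("rain on a tin roof", extras=("distant thunder",))
#   -> "rain on a tin roof, distant thunder"
```

Keeping the fields separate makes it easy to vary one attribute at a time and compare the generated results.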