
A Novel Approach to Extending Music Using Latent Diffusion

Using deep learning to synthetically generate music is a research domain that has gained increasing public attention in the past few years. A subproblem of music generation is music extension: the task of taking existing music and extending it. This work proposes the Continuer Pipeline, a novel technique that uses deep learning to extend music in five-second increments. It does this by treating music generation as an image generation problem; we use latent diffusion models (LDMs) to generate spectrograms, which are image representations of audio. The Continuer Pipeline receives a waveform as input and outputs a waveform representing its prediction of what the next five seconds might sound like. We trained the Continuer Pipeline using the diffusion model functionality provided by the HuggingFace platform, and our dataset consisted of 256x256 spectrogram images representing five-second snippets of various hip-hop songs from Spotify. The waveforms generated by the Continuer Pipeline are currently of much lower quality than human-generated music, but we affirm that the Continuer Pipeline still has many uses in its current state, and we describe several avenues for future improvement of this technology.
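The core idea the abstract describes is a round trip between audio and images: a five-second waveform is rendered as a 256x256 spectrogram image, the LDM predicts the image for the next five seconds, and that image is inverted back into audio. The sketch below illustrates that round trip, not the thesis's actual implementation: it assumes librosa, and the sample rate, STFT parameters, and the song.mp3 filename are illustrative placeholders; the diffusion step itself is elided.

    import numpy as np
    import librosa

    SR = 22050                        # sample rate (assumed)
    SNIPPET_SECS = 5                  # the pipeline works in 5-second increments
    N_MELS = 256                      # 256 mel bands -> 256-pixel image height
    HOP = SR * SNIPPET_SECS // 256    # hop length so 5 s spans ~256 frames

    def waveform_to_spectrogram(y):
        """Map a 5-second waveform to a 256x256 log-mel image in [0, 255]."""
        mel = librosa.feature.melspectrogram(y=y, sr=SR, n_fft=2048,
                                             hop_length=HOP, n_mels=N_MELS)
        db = librosa.power_to_db(mel, ref=np.max)         # clipped to [-80, 0] dB
        img = 255 * (db - db.min()) / (db.max() - db.min())
        return img[:, :256].astype(np.uint8)

    def spectrogram_to_waveform(img):
        """Invert a spectrogram image back to audio via Griffin-Lim."""
        # Assumes the full 80 dB range of power_to_db's default top_db.
        db = img.astype(np.float32) / 255.0 * 80.0 - 80.0
        mel = librosa.db_to_power(db)
        return librosa.feature.inverse.mel_to_audio(mel, sr=SR, n_fft=2048,
                                                    hop_length=HOP)

    y, _ = librosa.load("song.mp3", sr=SR, duration=SNIPPET_SECS)
    img = waveform_to_spectrogram(y)      # image an LDM would condition on
    # (LDM inference elided) the model's predicted image would be inverted:
    y_next = spectrogram_to_waveform(img)

Because a spectrogram image stores only magnitudes, the phase must be estimated during inversion (Griffin-Lim here), which is one likely reason spectrogram-based generation sounds lower-fidelity than the original audio, as the abstract notes.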

Identifier: oai:union.ndltd.org:CALPOLY/oai:digitalcommons.calpoly.edu:theses-4260
Date: 01 June 2023
Creators: Roohparvar, Keon; Kurfess, Franz J.
Publisher: DigitalCommons@CalPoly
Source Sets: California Polytechnic State University
Detected Language: English
Type: text
Format: application/pdf
Source: Master's Theses
