Using deep learning to synthetically generate music is a research domain that has gained increasing public attention in recent years. A subproblem of music generation is music extension, the task of taking existing music and extending it. This work proposes the Continuer Pipeline, a novel technique that uses deep learning to extend music in five-second increments. It does this by treating music generation as an image generation problem: we use latent diffusion models (LDMs) to generate spectrograms, which are image representations of music. The Continuer Pipeline receives a waveform as input, and its output is a prediction of what the next five seconds might sound like. We trained the Continuer Pipeline using the extensive diffusion model functionality provided by the HuggingFace platform, and our dataset consisted of 256x256 spectrogram images representing 5-second snippets of various hip-hop songs from Spotify. The musical waveforms generated by the Continuer Pipeline are currently of much lower quality than human-composed music, but we argue that the Continuer Pipeline still has many uses in its current state, and we describe several avenues for future improvement to this technology.
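The thesis text itself is not reproduced here, but the core idea described above (working in five-second increments with 256x256 spectrogram images) rests on a waveform-to-spectrogram round trip. The following is a minimal sketch of that conversion, not the authors' code: the sample rate, mel parameters, and dB range below are illustrative assumptions, and the diffusion step that predicts the next spectrogram is omitted.

```python
# Sketch of the waveform <-> 256x256 spectrogram round trip that a pipeline
# like the Continuer Pipeline relies on. All parameter values are assumptions
# for illustration, not taken from the thesis.
import numpy as np
import librosa

SR = 22050           # assumed sample rate
CLIP_SECONDS = 5     # the pipeline works in 5-second increments
N_MELS = 256         # 256 mel bands -> spectrogram image height of 256


def waveform_to_spectrogram(y: np.ndarray) -> np.ndarray:
    """Convert a 5-second waveform into a 256x256 log-mel spectrogram image."""
    # Pick the hop length so the time axis also comes out to ~256 frames.
    hop = int(SR * CLIP_SECONDS / 256)
    mel = librosa.feature.melspectrogram(y=y, sr=SR, n_mels=N_MELS, hop_length=hop)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    # Normalize to [0, 255] so the spectrogram can be treated as a grayscale image.
    img = (mel_db - mel_db.min()) / (mel_db.max() - mel_db.min() + 1e-8) * 255.0
    return img[:, :256].astype(np.uint8)


def spectrogram_to_waveform(img: np.ndarray) -> np.ndarray:
    """Approximately invert a 256x256 spectrogram image back into audio."""
    hop = int(SR * CLIP_SECONDS / 256)
    # Assume an 80 dB dynamic range when mapping pixel values back to decibels.
    mel_db = img.astype(np.float32) / 255.0 * 80.0 - 80.0
    mel = librosa.db_to_power(mel_db)
    return librosa.feature.inverse.mel_to_audio(mel, sr=SR, hop_length=hop)


if __name__ == "__main__":
    # Round-trip a bundled example clip; in the actual pipeline, the generated
    # spectrogram for the *next* five seconds would be inverted instead.
    y, _ = librosa.load(librosa.example("trumpet"), sr=SR, duration=CLIP_SECONDS)
    spec = waveform_to_spectrogram(y)
    y_hat = spectrogram_to_waveform(spec)
    print(spec.shape, y_hat.shape)
```

In this sketch the conditioning and generation step sits between the two functions: an input clip is encoded as a spectrogram image, a diffusion model produces the image for the following five seconds, and the inverse transform yields the extended audio.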
Identifier | oai:union.ndltd.org:CALPOLY/oai:digitalcommons.calpoly.edu:theses-4260
Date | 01 June 2023
Creators | Roohparvar, Keon, Kurfess, Franz J. |
Publisher | DigitalCommons@CalPoly |
Source Sets | California Polytechnic State University |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Master's Theses |