1

A Novel Approach to Extending Music Using Latent Diffusion

Roohparvar, Keon; Kurfess, Franz J. 01 June 2023
Using deep learning to synthetically generate music is a research domain that has gained increasing public attention in the past few years. A subproblem of music generation is music extension: the task of taking existing music and extending it. This work proposes the Continuer Pipeline, a novel technique that uses deep learning to take music and extend it in 5-second increments. It does this by treating the musical generation process as an image generation problem; we utilize latent diffusion models (LDMs) to generate spectrograms, which are image representations of music. The Continuer Pipeline receives a waveform as input, and its output is the pipeline's prediction of what the next five seconds might sound like. We trained the Continuer Pipeline using the diffusion model functionality provided by the HuggingFace platform, and our dataset consisted of 256x256 spectrogram images representing 5-second snippets of various hip-hop songs from Spotify. The musical waveforms generated by the Continuer Pipeline are currently of much lower quality than human-generated music, but we argue that the Continuer Pipeline still has many uses in its current state, and we describe many avenues for future improvement to this technology.
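The extension loop described in this abstract can be sketched as follows. This is a minimal illustration, not the thesis's implementation: `extend_waveform`, `echo_predictor`, and their parameters are hypothetical names, and the predictor is a trivial stand-in for the trained latent diffusion model (which, per the abstract, would work on spectrogram images rather than raw samples).

```python
from typing import Callable, List

Waveform = List[float]


def extend_waveform(
    waveform: Waveform,
    predict_next: Callable[[Waveform], Waveform],
    sample_rate: int,
    n_steps: int,
    window_seconds: float = 5.0,
) -> Waveform:
    """Append n_steps predicted windows to the waveform, one window at a time."""
    window = int(sample_rate * window_seconds)
    if len(waveform) < window:
        raise ValueError("need at least one full window of audio as context")
    extended = list(waveform)
    for _ in range(n_steps):
        # Condition each prediction on the most recent window only.
        context = extended[-window:]
        continuation = predict_next(context)
        if len(continuation) != window:
            raise ValueError("predictor must return exactly one window")
        extended.extend(continuation)
    return extended


def echo_predictor(context: Waveform) -> Waveform:
    """Hypothetical placeholder for the model: repeat the context at half amplitude."""
    return [0.5 * sample for sample in context]


# Toy usage: 5 seconds of "audio" at an artificially low 4 Hz sample rate.
toy_audio = [float(i) for i in range(20)]
result = extend_waveform(toy_audio, echo_predictor, sample_rate=4, n_steps=2)
```

Each pass adds one 5-second window, so two steps turn the 20-sample toy input into 60 samples. Swapping `echo_predictor` for a real model would leave the loop unchanged.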
2

Text to Music Audio Generation using Latent Diffusion Model: A re-engineering of AudioLDM Model

Wang, Ernan January 2023
In the emerging field of audio generation using diffusion models, this project adapts the AudioLDM model framework, initially designed for generating everyday sounds from text, to text-to-music audio generation. This shift addresses a gap in the current scope of audio diffusion models, which focus predominantly on everyday sounds. The motivation for this thesis stems from AudioLDM’s remarkable generative capabilities in producing everyday sounds from text descriptions; its application to music audio generation, however, remains underexplored. The thesis aims to modify AudioLDM’s architecture and training objectives to cater to the unique nuances of musical audio. The re-engineering process involved two primary methods. First, a dataset was constructed by sourcing a variety of music audio samples from the FMA dataset (FMA: A Dataset For Music Analysis) [1] and generating pseudo captions using a Large Language Model specialized in music captioning. This dataset served as the foundation for training the adapted model. Second, the text conditioning of the model’s diffusion backbone, a UNet architecture, was revised to incorporate both the CLAP encoder and the T5 text encoder. This dual-encoding method, coupled with a shift from the traditional noise prediction objective to the V-objective, aimed to enhance the model’s performance in generating coherent and musically relevant audio. The effectiveness of these adaptations was validated through both subjective and objective evaluations. Compared to the original AudioLDM model, the adapted version demonstrated superior quality in the audio output and a higher relevance between text prompts and generated music.
This advancement not only proves the feasibility of transforming AudioLDM for music generation but also opens new avenues for research and application in text-to-music audio synthesis.
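The abstract does not spell out the V-objective, so the sketch below assumes it refers to the commonly used v-prediction parameterization, in which the network regresses v = alpha * eps - sigma * x0 for a noisy sample x_t = alpha * x0 + sigma * eps with alpha^2 + sigma^2 = 1. The function names and toy values are illustrative only and are not taken from the thesis.

```python
import math
from typing import List, Tuple

Vector = List[float]


def noisy_sample(x0: Vector, eps: Vector, alpha: float, sigma: float) -> Vector:
    """Forward process: x_t = alpha * x0 + sigma * eps."""
    return [alpha * x + sigma * e for x, e in zip(x0, eps)]


def v_target(x0: Vector, eps: Vector, alpha: float, sigma: float) -> Vector:
    """Training target under v-prediction: v = alpha * eps - sigma * x0."""
    return [alpha * e - sigma * x for x, e in zip(x0, eps)]


def recover_from_v(
    x_t: Vector, v: Vector, alpha: float, sigma: float
) -> Tuple[Vector, Vector]:
    """Invert the parameterization (valid only when alpha^2 + sigma^2 = 1)."""
    x0_hat = [alpha * xt - sigma * vi for xt, vi in zip(x_t, v)]
    eps_hat = [sigma * xt + alpha * vi for xt, vi in zip(x_t, v)]
    return x0_hat, eps_hat


# Toy check with a variance-preserving schedule point alpha = cos(phi), sigma = sin(phi).
phi = 0.3
alpha, sigma = math.cos(phi), math.sin(phi)
x0 = [0.2, -1.0, 0.7]    # illustrative clean latent values
eps = [1.1, 0.4, -0.6]   # illustrative noise values
x_t = noisy_sample(x0, eps, alpha, sigma)
v = v_target(x0, eps, alpha, sigma)
x0_hat, eps_hat = recover_from_v(x_t, v, alpha, sigma)
```

Because both the clean signal and the noise can be recovered from a predicted v, a model trained on this target can still be used with samplers that expect either quantity.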
3

Diffusion models for anomaly detection in digital pathology

Bromée, Ruben January 2023
Challenges within the field of pathology lead to a high workload for pathologists. Machine learning has the ability to assist pathologists in their daily work and has shown good performance in a research setting. Anomaly detection is useful for preventing machine learning models used for classification and segmentation from being applied to data outside the model's training distribution. The purpose of this work was to create an optimal anomaly detection pipeline for digital pathology data using a latent diffusion model and various image similarity metrics. The resulting pipeline used a partial diffusion process, a combined similarity metric aggregating the results of multiple other similarity metrics, and a contrast matching strategy for better anomaly detection performance. The pipeline performed well in an out-of-distribution detection task, with an ROC-AUC score of 0.90. / The thesis work was carried out at the Department of Science and Technology (ITN), Faculty of Science and Engineering, Linköping University.
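To make the reported metric concrete, the sketch below shows one way an ROC-AUC value can be computed from per-image anomaly scores. It uses the pairwise interpretation: the probability that a randomly chosen out-of-distribution sample scores higher than a randomly chosen in-distribution one. The function name and the toy scores are hypothetical; they are not data or code from the thesis.

```python
from typing import Sequence


def roc_auc(in_dist_scores: Sequence[float], out_dist_scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (out, in) pairs ranked correctly; ties count half."""
    if not in_dist_scores or not out_dist_scores:
        raise ValueError("both score lists must be non-empty")
    correct = 0.0
    for out_score in out_dist_scores:
        for in_score in in_dist_scores:
            if out_score > in_score:
                correct += 1.0
            elif out_score == in_score:
                correct += 0.5
    return correct / (len(in_dist_scores) * len(out_dist_scores))


# Made-up anomaly scores (higher means "more anomalous"), for illustration only.
toy_in_dist = [0.1, 0.4]
toy_out_dist = [0.3, 0.5]
toy_auc = roc_auc(toy_in_dist, toy_out_dist)
```

In this toy case three of the four pairs are ordered correctly, giving 0.75. A score of 0.90, as reported in the abstract, would mean that about nine in ten such pairs are ranked correctly.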
