81

Rozpoznávání ručně psaného notopisu / Optical Recognition of Handwritten Music Notation

Hajič, Jan January 2019
Optical Music Recognition (OMR) is the field of computationally reading music notation. This thesis presents, in the form of a dissertation by publication, contributions to the theory, resources, and methods of OMR, especially for handwritten notation. The main contributions are (1) the Music Notation Graph (MuNG) formalism for describing arbitrarily complex music notation using an oriented graph that can be unambiguously interpreted in terms of musical semantics, (2) the MUSCIMA++ dataset of musical manuscripts with MuNG as ground truth, which can be used to train and evaluate OMR systems and subsystems from the image all the way to extracting the musical semantics encoded therein, and (3) a pipeline for performing OMR on musical manuscripts that relies on machine learning both for notation symbol detection and for the notation assembly stage, and on properties of the inferred MuNG representation to deterministically extract the musical semantics. While the OMR pipeline does not perform flawlessly, this is the first OMR system to perform basic useful tasks over musical semantics extracted from handwritten music notation of arbitrary complexity.
82

Identifikace cover verzí skladeb pomocí harmonických příznaků, modelu harmonie a harmonické složitosti / Cover Song Identification using Music Harmony Features, Model and Complexity Analysis

Maršík, Ladislav January 2019
Department: Department of Software Engineering. Supervisor: Prof. RNDr. Jaroslav Pokorný, CSc., Department of Software Engineering. Abstract: Analysis of digital music and its retrieval based on audio features is one of the popular topics within the music information retrieval (MIR) field. Every musical piece has its characteristic harmony structure, but harmony analysis is seldom used for retrieval. Retrieval systems that do not focus on similarities in harmony progressions may consider two versions of the same song different, even though they differ only in instrumentation or singing voice. This thesis takes various paths in exploring how music harmony can be used in MIR, and in particular in the cover song identification (CSI) task. We first create a music harmony model based on the knowledge of music theory. We define novel concepts: the harmonic complexity of a musical piece, as well as the chord and chroma distance features. We show how these concepts can be used for retrieval and complexity analysis, and how they compare with the state of the art in music harmony modeling. An extensive comparison of harmony features is then performed, using both the novel features and the...
83

Violin Artist Identification by Analyzing Raga-vistaram Audio

Ramlal, Nandakishor January 2023
With the inception of music streaming and media content delivery platforms, there has been a tremendous increase in the music available on the internet and the metadata associated with it. In this study, we address the problem of violin artist identification, which tries to classify the performing artist based on learned features. Numerous previous works have studied artist identification in detail and developed usable features and deep learning models, but most of them focused on western popular music and few on Indian classical music. For the same reason, there was no standardized dataset for this purpose. Hence, we curated a new dataset consisting of audio recordings from 6 renowned South Indian Carnatic violin artists. We explore the use of the log-Mel-spectrogram feature and of the embeddings generated by a pre-trained VGGish network as inputs to a Convolutional Neural Network and a Convolutional Recurrent Neural Network model. From the experiments, we observe that the Convolutional Recurrent Neural Network model trained on the log-Mel-spectrogram feature performed best, with a classification accuracy of 71.70%.
84

Musical Instrument Activity Detection using Self-Supervised Learning and Domain Adaptation / Självövervakad inlärning och Domänadaption för Musikinstrumentsaktivitetsigenkänning

Nyströmer, Carl January 2020
With ever-growing media and music catalogs, tools that search and navigate this data are important. More complex search queries require metadata, but manually labeling the vast amounts of new content is impossible. In this thesis, automatic labeling of musical instrument activities in song mixes is investigated, with a focus on ways to alleviate the lack of annotated data for instrument activity detection models. Two methods for alleviating the problem of small amounts of data are proposed and evaluated. Firstly, a self-supervised approach based on automatic labeling and mixing of randomized instrument stems is investigated. Secondly, a domain-adaptation approach that trains models on sampled MIDI files for instrument activity detection on recorded music is explored. The self-supervised approach yields better results than the baseline and indicates that deep learning models can learn instrument activity detection without an intrinsic musical structure in the audio mix. The domain-adaptation models trained solely on sampled MIDI files performed worse than the baseline; however, using MIDI data in conjunction with recorded music boosted the performance. A hybrid model combining self-supervised learning and domain adaptation, using both sampled MIDI data and recorded music, produced the best results overall.
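The self-supervised idea in this abstract (mixing randomly chosen instrument stems so that the labels come for free) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `make_training_example`, the clip-level labels, and the toy sample lists are hypothetical and are not taken from the thesis, which may label activity per frame and use real audio.

```python
import random

def make_training_example(stems, max_instruments=3, rng=random):
    """Mix a random subset of stems; the labels follow from the choice itself.

    stems: dict mapping instrument name -> list of samples (all equal length).
    Returns (mix, labels), where labels[name] is 1 if that stem is in the mix.
    """
    names = sorted(stems)
    # Pick how many instruments are active, then which ones.
    k = rng.randint(1, min(max_instruments, len(names)))
    chosen = set(rng.sample(names, k))
    length = len(stems[names[0]])
    # Sum the chosen stems sample by sample to form the mix.
    mix = [sum(stems[n][i] for n in chosen) for i in range(length)]
    labels = {n: int(n in chosen) for n in names}
    return mix, labels

# Hypothetical toy stems (two samples each) standing in for real audio.
stems = {"bass": [0.1, 0.2], "drums": [0.3, 0.0], "piano": [0.0, 0.5]}
mix, labels = make_training_example(stems, rng=random.Random(0))
```

Because the stems are combined at random, the resulting mixes have no coherent musical structure, which is the property the abstract refers to.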
85

Generation of a metrical grid informed by Deep Learning-based beat estimation in jazz-ensemble recordings / Generering av ett metriskt rutnät informerat på Deep Learning-baserad beatuppskattning i jazzensembleinspelningar

Alonso Toledo Carrera, Andres January 2023
This work uses a Deep Learning architecture, specifically a state-of-the-art Temporal Convolutional Network, to track the beat and downbeat positions in jazz-ensemble recordings in order to derive their metrical grid. This network architecture has been used successfully for general beat tracking purposes. However, the jazz genre presents difficulties for this Music Information Retrieval sub-task due to its inherent complexity, and there is a lack of dedicated datasets for evaluating a model’s beat tracking performance across the different playing styles of this genre. We present a methodology in which we trained a PyTorch implementation of the original architecture with a recalculated binary cross-entropy loss that boosts the model’s performance compared to a version trained in the standard way. In addition, we retrained these two models using source-separated drum and bass tracks from jazz recordings to improve performance. We further improved the model’s performance by calibrating rhythm parameters using a priori knowledge that narrows the model’s prediction range. Finally, we propose a novel jazz dataset comprising recordings of the same jazz piece played in different styles and use it to evaluate the performance of this methodology. We also evaluate a novel sample with tempo variations to demonstrate the architecture’s versatility. This methodology, or parts of it, can be transferred to other research work and music information tools that perform beat tracking or similar Music Information Retrieval sub-tasks.
86

Rozpoznání hudebního slohu z orchestrální nahrávky za pomoci technik Music Information Retrieval / Recognition of music style from orchestral recording using Music Information Retrieval techniques

Jelínková, Jana January 2020
Like all genres of popular music, classical music consists of many different subgenres. The aim of this work is to recognize those subgenres from orchestral recordings. It focuses on the period from the very end of the 16th century to the beginning of the 20th century, which means that the Baroque, Classical and Romantic eras are studied. Music Information Retrieval (MIR) methods were used to classify the chosen subgenres. In the first phase, parameters were extracted from the musical recordings and evaluated. Only the best parameters were used as input data for the machine learning classifiers, specifically kNN (k-Nearest Neighbors), LDA (Linear Discriminant Analysis), GMM (Gaussian Mixture Models) and SVM (Support Vector Machines). The final chapter summarizes the best results. According to the results, there is a significant difference between the Baroque era and the other eras studied, which led to better identification of Baroque recordings. In contrast, the Classical era turned out to be relatively similar to the Romantic era, and therefore all classifiers were less successful in identifying recordings from this era. The results are in line with music theory and the characteristics of the chosen musical eras.
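Of the classifiers named in this abstract, kNN is the simplest to illustrate. The following is a minimal sketch under stated assumptions: the two-dimensional feature vectors and their values are invented for illustration, and the function name `knn_predict` is hypothetical. The abstract does not say which parameters or which value of k were used.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of (feature_vector, label) pairs.
    """
    # Sort training points by Euclidean distance to the query, keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical, hand-made feature vectors; real MIR features would be
# extracted from the recordings.
train = [
    ((0.10, 0.20), "Baroque"),
    ((0.15, 0.25), "Baroque"),
    ((0.20, 0.10), "Baroque"),
    ((0.80, 0.90), "Romantic"),
    ((0.85, 0.80), "Romantic"),
    ((0.90, 0.95), "Romantic"),
]
era = knn_predict(train, (0.12, 0.22), k=3)
```

When two classes overlap in feature space, as the abstract reports for the Classical and Romantic eras, the nearest neighbours of a query come from both classes and the vote becomes unreliable.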
87

Semantic Federation of Musical and Music-Related Information for Establishing a Personal Music Knowledge Base

Gängler, Thomas 22 September 2011
Music is perceived and described very subjectively by every individual. Nowadays, people often get lost in their steadily growing digital music collections, which are spread across multiple locations. Existing music player and management applications run into trouble when dealing with the poor metadata that predominates in personal music collections. Several music information services are available that assist users by providing tools for precisely organising their music collection, or by giving them new insights into their own music library and listening habits. However, music consumers still cannot seamlessly interact with all these auxiliary services directly from the place where they access their music. To profit from the manifold music and music-related knowledge that is or can be available via various information services, this information has to be gathered, semantically federated, and integrated into a uniform knowledge base that can present this data to users in a personalised, appropriate visualisation. This personalised semantic aggregation of music metadata from several sources is the gist of this thesis. The outlined solution concentrates particularly on users’ needs regarding music collection management, which can vary strongly between individuals. The author’s proposal, the personal music knowledge base (PMKB), consists of a client-server architecture with uniform communication endpoints and an ontological knowledge representation model format that is able to represent the versatile information of its use cases. The PMKB concept covers the complete information flow life cycle, including the processes of user account initialisation, information service choice, individual information extraction, and proactive update notification. The PMKB implementation makes use of Semantic Web technologies. This work explains in particular the knowledge representation part of the PMKB vision.
Several new Semantic Web ontologies are defined, or existing ones substantially modified, to meet the requirements of a personalised semantic federation of music and music-related data for managing personal music collections. The outcome includes, amongst others:
• a new vocabulary for describing the play back domain,
• another one for representing information service categorisations and quality ratings, and
• one that unites the beneficial parts of the existing advanced user modelling ontologies.
The introduced vocabularies can be used in conjunction with the existing Music Ontology framework. Some RDFizers that also make use of the outlined ontologies in their mapping definitions illustrate the practical fitness of these specifications. A social evaluation method is applied to examine the reutilisation, application and feedback of the vocabularies explained in this work. This analysis shows that it is good practice to publish Semantic Web ontologies properly with the help of some Linked Data principles and basic SEO techniques, in order to reach the searching audience easily, to avoid duplicates of such KR specifications, and, last but not least, to directly establish a "shared understanding". Due to their project-independence, the proposed vocabularies can be deployed in every knowledge representation model that needs their knowledge representation capacities. This thesis contributes to making the vision of a personal music knowledge base come true.
88

Semantic Federation of Musical and Music-Related Information for Establishing a Personal Music Knowledge Base

Gängler, Thomas 20 May 2011
Contents:
1 Introduction and Background
1.1 Introduction
1.2 Personal Music Collection Use Cases
1.3 Summary
2 Music Information Management
2.1 Knowledge Management
2.1.1 Knowledge Representation
2.1.1.1 Knowledge Representation Models
2.1.1.2 Semantic Graphs
2.1.1.3 Ontologies
2.1.1.4 Summary
2.1.2 Knowledge Management Systems
2.1.2.1 Information Services
2.1.2.2 Ontology-based Distributed Knowledge Management Systems
2.1.2.3 Knowledge Management System Design Guideline
2.1.3 Summary
2.2 Semantic Web Technologies
2.2.1 The Evolution of the World Wide Web
2.2.1.1 The Hypertext Web
2.2.1.2 The Normative Principles of Web Architecture
2.2.1.3 The Semantic Web
2.2.2 Common Semantic Web Knowledge Representation Languages
2.2.3 Resource Description Levels and their Relations
2.2.4 Semantic Web Knowledge Representation Models
2.2.4.1 Construction
2.2.4.2 Mapping
2.2.4.3 Context Modelling
2.2.4.4 Storing
2.2.4.5 Providing
2.2.4.6 Consuming
2.2.5 Summary
2.3 Music Content and Context Data
2.3.1 Categories of Musical Characteristics
2.3.2 Music Metadata Formats
2.3.3 Music Metadata Services
2.3.3.1 Audio Signal Carrier Indexing Services
2.3.3.2 Music Recommendation and Discovery Services
2.3.3.3 Music Content and Context Analysis Services
2.3.4 Summary
2.4 Personalisation and Environmental Context
2.4.1 User Modelling
2.4.2 Context Modelling
2.4.3 Stereotype Modelling
2.5 Summary
3 The Personal Music Knowledge Base
3.1 Foundations
3.1.1 Knowledge Representation
3.1.2 Knowledge Management
3.2 Architecture
3.3 Workflow
3.3.1 User Account Initialisation
3.3.2 Individual Information Extraction
3.3.3 Information Service Choice
3.3.4 Proactive Update Notification
3.3.5 Information Exploration
3.3.6 Personal Associations and Context
3.4 Summary
4 A Personal Music Knowledge Base
4.1 Knowledge Representation
4.1.1 The Info Service Ontology
4.1.2 The Play Back Ontology and related Ontologies
4.1.2.1 The Ordered List Ontology
4.1.2.2 The Counter Ontology
4.1.2.3 The Association Ontology
4.1.2.4 The Play Back Ontology
4.1.3 The Recommendation Ontology
4.1.4 The Cognitive Characteristics Ontology and related Vocabularies
4.1.4.1 The Weighting Ontology
4.1.4.2 The Cognitive Characteristics Ontology
4.1.4.3 The Property Reification Vocabulary
4.1.5 The Media Types Taxonomy
4.1.6 Summary
4.2 Knowledge Management System
4.3 Summary
5 Personal Music Knowledge Base in Practice
5.1 Application
5.1.1 AudioScrobbler RDF Service
5.1.2 PMKB ID3 Tag Extractor
5.2 Evaluation
5.2.1 Reutilisation
5.2.2 Application
5.2.3 Reviews and Mentions
5.2.4 Indexing
5.3 Summary
6 Conclusion and Future Work
6.1 Conclusion
6.2 Future Work
89

Modeling High-Dimensional Audio Sequences with Recurrent Neural Networks

Boulanger-Lewandowski, Nicolas 04 1900
This thesis studies models of high-dimensional sequences based on recurrent neural networks (RNNs) and their application to music and speech. While in principle RNNs can represent the long-term dependencies and complex temporal dynamics present in real-world sequences such as video, audio and natural language, they have not been used to their full potential since their introduction by Rumelhart et al. (1986a) due to the difficulty of training them efficiently by gradient-based optimization. In recent years, the successful application of Hessian-free optimization and other advanced training techniques has motivated an increase in their use in many state-of-the-art systems. The work of this thesis is part of this development. The main idea is to exploit the power of RNNs to learn a probabilistic description of sequences of symbols, i.e. high-level information associated with observed signals, that in turn can be used as a prior to improve the accuracy of information retrieval.
For example, by modeling the evolution of note patterns in polyphonic music, chords in a harmonic progression, phones in a spoken utterance, or individual sources in an audio mixture, we can significantly improve the accuracy of polyphonic transcription, chord recognition, speech recognition and audio source separation, respectively. The practical application of our models to these tasks is detailed in the last four articles presented in this thesis. In the first article, we replace the output layer of an RNN with conditional restricted Boltzmann machines to describe much richer multimodal output distributions. In the second article, we review and develop advanced techniques to train RNNs. In the last four articles, we explore various ways to combine our symbolic models with deep networks and non-negative matrix factorization algorithms, namely using products of experts, input/output architectures, and generative frameworks that generalize hidden Markov models. We also propose and analyze efficient inference procedures for those models, such as greedy chronological search, high-dimensional beam search, dynamic programming-like pruned beam search and gradient descent. Finally, we explore issues such as label bias, teacher forcing, temporal smoothing, regularization and pre-training.
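The idea of using a symbolic sequence model as a prior during inference can be illustrated with a small beam search. The sketch below is not the thesis's implementation: a hand-written bigram table (hypothetical `toy_prior`) stands in for the RNN, and the per-frame acoustic scores are made-up numbers. It only shows how a prior over symbol sequences can override an ambiguous frame that a frame-by-frame decoder would mislabel.

```python
import math

def beam_search(acoustic_logp, prior_logp, beam_width=2):
    """Return the best (sequence, score) found by beam search.

    acoustic_logp: list over frames of {symbol: log p(frame | symbol)}.
    prior_logp(history, symbol): log p(symbol | history), the symbolic prior.
    """
    beam = [((), 0.0)]  # (sequence so far, accumulated log-score)
    for frame in acoustic_logp:
        candidates = []
        for history, score in beam:
            for symbol, log_obs in frame.items():
                new_score = score + log_obs + prior_logp(history, symbol)
                candidates.append((history + (symbol,), new_score))
        # Keep only the highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:beam_width]
    return beam[0]

def framewise_argmax(acoustic_logp):
    """Baseline without any prior: pick the best symbol in each frame."""
    return tuple(max(frame, key=frame.get) for frame in acoustic_logp)

def toy_prior(history, symbol):
    """Hypothetical stand-in for an RNN: chords tend to persist."""
    if not history:
        return math.log(0.5)
    return math.log(0.9) if symbol == history[-1] else math.log(0.1)

# Made-up acoustic scores for three frames over two chord labels.
# The middle frame is ambiguous and slightly favors "G".
FRAMES = [
    {"C": math.log(0.9), "G": math.log(0.1)},
    {"C": math.log(0.45), "G": math.log(0.55)},
    {"C": math.log(0.9), "G": math.log(0.1)},
]

if __name__ == "__main__":
    print("frame-wise:", framewise_argmax(FRAMES))
    print("with prior:", beam_search(FRAMES, toy_prior)[0])
```

With `beam_width=1` the same function reduces to a greedy chronological search. In a fuller system the prior would condition on the whole history rather than only on the last symbol, which is why the function receives `history` as a tuple.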
90

Non-negative matrix decomposition approaches to frequency domain analysis of music audio signals

Wood, Sean 12 1900 (has links)
We study the application of unsupervised matrix decomposition algorithms such as Non-negative Matrix Factorization (NMF) to frequency domain representations of music audio signals. These algorithms, driven by a given reconstruction error function, learn a set of basis functions and a set of corresponding coefficients that approximate the input signal. We compare the use of three reconstruction error functions when NMF is applied to monophonic and harmonized musical scales: least squares, Kullback-Leibler divergence, and a recently introduced “phase-aware” divergence measure. Novel supervised methods for interpreting the resulting decompositions are presented and compared to previously used methods that rely on domain knowledge. 
Finally, we analyze the ability of the learned basis functions to generalize across musical parameter values, including note amplitude, note duration and instrument type. To do so, we introduce two basis function labeling algorithms that outperform the previous labeling approach in the majority of our tests, instrument type with monophonic audio being the only notable exception.
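As a rough illustration of how NMF learns basis functions and coefficients under a reconstruction error, the sketch below applies the standard multiplicative updates for the generalized Kullback-Leibler divergence to a tiny non-negative matrix. It is an assumption-laden toy, not the procedure used in the thesis: the matrix `V` built from `W_TRUE` and `H_TRUE` is a made-up "spectrogram" with two spectral templates, and the function names are hypothetical.

```python
import math
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def kl_divergence(V, R, eps=1e-12):
    """Generalized KL divergence D(V || R) = sum(v*log(v/r) - v + r)."""
    total = 0.0
    for v_row, r_row in zip(V, R):
        for v, r in zip(v_row, r_row):
            if v > 0:
                total += v * math.log(v / (r + eps))
            total += r - v
    return total

def nmf_kl(V, rank, iters=200, seed=0, eps=1e-12):
    """Factor V (n x m) into W (n x rank) and H (rank x m), all non-negative."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # Update coefficients H: H *= (W^T (V / WH)) / column sums of W.
        R = matmul(W, H)
        Q = [[V[i][j] / (R[i][j] + eps) for j in range(m)] for i in range(n)]
        num = matmul(transpose(W), Q)
        w_sums = [sum(W[i][k] for i in range(n)) for k in range(rank)]
        H = [[H[k][j] * num[k][j] / (w_sums[k] + eps) for j in range(m)]
             for k in range(rank)]
        # Update basis functions W: W *= ((V / WH) H^T) / row sums of H.
        R = matmul(W, H)
        Q = [[V[i][j] / (R[i][j] + eps) for j in range(m)] for i in range(n)]
        num = matmul(Q, transpose(H))
        h_sums = [sum(H[k]) for k in range(rank)]
        W = [[W[i][k] * num[i][k] / (h_sums[k] + eps) for k in range(rank)]
             for i in range(n)]
    return W, H

# Made-up data: 4 frequency bins, 5 time frames, 2 "notes".
W_TRUE = [[1.0, 0.0], [0.5, 0.0], [0.0, 1.0], [0.0, 0.5]]
H_TRUE = [[1.0, 1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 1.0, 0.0]]
V = matmul(W_TRUE, H_TRUE)

if __name__ == "__main__":
    W, H = nmf_kl(V, rank=2)
    print("KL divergence after fitting:", kl_divergence(V, matmul(W, H)))
```

Swapping the error function (for example to least squares) changes only the update rules and the divergence being monitored; the columns of `W` play the role of the learned basis functions and the rows of `H` the corresponding coefficients.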
