Global ETD Search

91	Modeling High-Dimensional Audio Sequences with Recurrent Neural Networks
Boulanger-Lewandowski, Nicolas, 04 1900
This thesis studies models of high-dimensional sequences based on recurrent neural networks (RNNs) and their application to music and speech. While in principle RNNs can represent the long-term dependencies and complex temporal dynamics present in real-world sequences such as video, audio and natural language, they have not been used to their full potential since their introduction by Rumelhart et al. (1986a), because they are difficult to train efficiently by gradient-based optimization. In recent years, the successful application of Hessian-free optimization and other advanced training techniques has led to a resurgence of their use in many state-of-the-art systems. The work of this thesis is part of this development. The main idea is to exploit the power of RNNs to learn a probabilistic description of sequences of symbols, i.e. high-level information associated with observed signals, which in turn can be used as a prior to improve the accuracy of information retrieval. For example, by modeling the evolution of note patterns in polyphonic music, chords in a harmonic progression, phones in a spoken utterance, or individual sources in an audio mixture, we can significantly improve the accuracy of polyphonic transcription, chord recognition, speech recognition and audio source separation, respectively. The practical application of our models to these tasks is detailed in the last four articles presented in this thesis. In the first article, we replace the output layer of an RNN with conditional restricted Boltzmann machines to describe much richer multimodal output distributions. In the second article, we review and develop advanced techniques to train RNNs. In the last four articles, we explore various ways to combine our symbolic models with deep networks and non-negative matrix factorization algorithms, namely using products of experts, input/output architectures, and generative frameworks that generalize hidden Markov models. We also propose and analyze efficient inference procedures for those models, such as greedy chronological search, high-dimensional beam search, dynamic programming-like pruned beam search and gradient descent. Finally, we explore issues such as label bias, teacher forcing, temporal smoothing, regularization and pre-training.
Keywords: Machine learning; Recurrent neural networks; Music information retrieval; Sequential models; Polyphonic transcription; Speech recognition; Non-negative matrix factorization
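The abstract above lists beam search among the inference procedures it studies. As a rough illustration only, the following sketch shows generic beam search over symbol sequences. It is not the thesis's high-dimensional or pruned variant. The function names, the `step_log_probs` callback and the toy model are all hypothetical, introduced here for the example.

```python
import math

def beam_search(step_log_probs, n_steps, beam_width):
    """Generic beam search (illustrative sketch, not the thesis's algorithm).

    step_log_probs(prefix) must return a dict mapping each candidate next
    symbol to its log-probability given the prefix (a tuple of symbols).
    Returns the surviving (sequence, total_log_prob) pairs, best first.
    """
    beam = [((), 0.0)]
    for _ in range(n_steps):
        candidates = []
        for prefix, score in beam:
            for symbol, log_p in step_log_probs(prefix).items():
                candidates.append((prefix + (symbol,), score + log_p))
        # Keep only the beam_width highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:beam_width]
    return beam

def toy_model(prefix):
    """Hypothetical two-step model in which the greedy first choice is a trap."""
    if not prefix:
        return {"A": math.log(0.6), "B": math.log(0.4)}
    if prefix[0] == "A":
        return {"X": math.log(0.5), "Y": math.log(0.5)}
    return {"X": math.log(0.9), "Y": math.log(0.1)}
```

With `beam_width=1` the search is greedy and commits to `A` at the first step. With `beam_width=2` it keeps `B` alive long enough to find the more probable sequence `("B", "X")`.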
92	Non-negative matrix decomposition approaches to frequency domain analysis of music audio signals
Wood, Sean, 12 1900
We study the application of unsupervised matrix decomposition algorithms such as Non-negative Matrix Factorization (NMF) to frequency domain representations of music audio signals. These algorithms, driven by a given reconstruction error function, learn a set of basis functions and a set of corresponding coefficients that approximate the input signal. We compare the use of three reconstruction error functions when NMF is applied to monophonic and harmonized musical scales: least squares, Kullback-Leibler divergence, and a recently introduced "phase-aware" divergence measure. Novel supervised methods for interpreting the resulting decompositions are presented and compared to previously used methods that rely on domain knowledge. Finally, we analyze the ability of the learned basis functions to generalize across three musical parameters: note amplitude, note duration and instrument type. To do so, we introduce two basis function labeling algorithms that outperform the previous labeling approach in the majority of our tests, instrument type with monophonic audio being the only notable exception.
Keywords: Unsupervised machine learning; Semi-supervised machine learning; Non-negative matrix factorization; Sparse coding; Music information retrieval; Pitch detection
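To make concrete the idea of NMF driven by a reconstruction error function, here is a minimal pure-Python sketch of the widely used multiplicative updates for the generalized Kullback-Leibler divergence. It is an illustration under stated assumptions rather than the thesis's implementation. The function names, the random initialization and the `eps` guard are choices made for this example.

```python
import math
import random

def nmf_kl(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Approximate a non-negative matrix V (list of rows) as W @ H using
    multiplicative updates for the generalized KL divergence (sketch)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]

    def reconstruct():
        # eps keeps the ratio V / (W @ H) finite.
        return [[sum(W[i][k] * H[k][j] for k in range(rank)) + eps
                 for j in range(m)] for i in range(n)]

    for _ in range(n_iter):
        R = reconstruct()
        # Update coefficients: H[k][j] *= sum_i W[i][k] V[i][j]/R[i][j] / sum_i W[i][k]
        for k in range(rank):
            col_sum = sum(W[i][k] for i in range(n)) + eps
            for j in range(m):
                num = sum(W[i][k] * V[i][j] / R[i][j] for i in range(n))
                H[k][j] *= num / col_sum
        R = reconstruct()
        # Update basis functions: W[i][k] *= sum_j H[k][j] V[i][j]/R[i][j] / sum_j H[k][j]
        for k in range(rank):
            row_sum = sum(H[k][j] for j in range(m)) + eps
            for i in range(n):
                num = sum(H[k][j] * V[i][j] / R[i][j] for j in range(m))
                W[i][k] *= num / row_sum
    return W, H

def kl_divergence(V, W, H, eps=1e-12):
    """Generalized KL divergence between V and its reconstruction W @ H."""
    total = 0.0
    for i in range(len(V)):
        for j in range(len(V[0])):
            r = sum(W[i][k] * H[k][j] for k in range(len(H))) + eps
            v = V[i][j]
            total += (v * math.log(v / r) if v > 0 else 0.0) - v + r
    return total
```

In an audio setting, `V` would typically be a magnitude spectrogram (frequency bins by time frames), the columns of `W` the learned basis functions and the rows of `H` their activations over time.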
93	De l'usage des métadonnées dans l'objet sonore / The use of sound objects metadata
Debaecker, Jean, 09 October 2012
Emotion recognition in music is an industrial and academic challenge. In an age of exploding multimedia content, we aim to design structured sets of terms, concepts and metadata that facilitate the organization of and access to knowledge. Our research question is: can we have a priori knowledge of an emotion with a view to eliciting it? In other words, to what extent can the emotions felt while listening to a musical work be recorded as metadata, and can a formal algorithmic structure be built that isolates the mechanism triggering those emotions? Can we know the emotion a listener will feel before they hear a song, and can it be elicited after listening? Can an emotion be formalized so that it can be saved and shared? We give an overview of existing work and of the application context, and we examine the epistemological issues intrinsic to indexing emotion itself: through a psychological, physiological and philosophical approach, we propose a conceptual framework of five demonstrations establishing that emotion cannot be measured with a view to eliciting it. Having argued that indexing emotions is formally impossible, we then seek to understand the indexing mechanisms nonetheless proposed in industry and academia. Finally, through the analysis of quantitative and qualitative surveys, we propose an algorithm that recommends musical works to listen to, taking emotion into account.
Keywords: Audio indexing; Audio metadata; Elicitation; Emotion recognition in music; Indexing algorithm for emotions; Music information retrieval; Metadata for the semantic web; Music; Musical similarity; Musical works; Qualia
94	中文流行音樂詞曲情意關聯分析 / Conception association analysis between lyrics and music of Chinese popular music
林志傑, Lin, Chih Chieh, Unknown Date
This thesis studies the association between the affective meaning (the "conception") of the lyrics and of the music in Chinese popular music, and designs a lyrics recommendation system that, given a song, recommends lyrics matching its conception. Popular music is broadly defined as songs disseminated through mass media and aimed at a general audience. Because of this, its lyrics mostly deal with everyday life and clearly express the feeling of the song, and the resonance they evoke determines whether a song has commercial value. People also often use popular songs to sing their own stories and aspirations. We therefore propose to automatically recommend lyrics that suit the conception of a given piece of music, so that an existing song can be paired with new lyrics. Each pairing tells a different story and gives the original song new life. The literature and professional songwriters indicate that lyrics and music accompany each other in several ways, of which the match in affect and conception is the most important; this association is the core of our research. By analyzing a large number of commercially released popular songs, we can find clues to how lyrics and music are matched. For lyrics, we use the LSA (Latent Semantic Analysis) algorithm to extract conception features and compare them with metaphor blending theory from linguistics. For music, we extract features that convey its conception, such as pitch, tonality, tempo, rhythm, chords and timbre. We then use the CFA (Cross-Modal Factor Analysis) algorithm to learn an association model between the lyrics and music features, and use this model to build the recommendation system. In our experiments, when recommending lyrics with the same conception as the query music, the proposed approach (CFA Feature Model) reaches an average accuracy of 60.1% in the top 5 recommended lyrics and 41.4% in the top 50, compared with 45.1% and 28.6% for the control approach (Audio Feature Model), which considers music features only. In a case study, we input several songs composed by students and found that the recommended lyrics are very similar in affect and conception to the original lyrics written by the students. These results indicate that the proposed association model recommends lyrics suited to the conception of the query music and could support songwriters with inspiration.
Keywords: Lyrics and music conception association analysis; Music conception analysis; Lyrics conception analysis; Cross modal association mining; Recommendation lyrics by song; Music Information Retrieval
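The abstract reports accuracy within the top 5 recommended lyrics. One plausible reading of that metric, sketched below, is precision at k over a similarity ranking. This is an assumption, since the abstract does not define the metric. The helper names are hypothetical, and the sketch uses plain cosine similarity in a shared feature space rather than the CFA model itself.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(query_vec, lyric_vecs, k):
    """Return the names of the k lyrics most similar to the query music.

    lyric_vecs maps a lyric name to its vector; both query and lyrics are
    assumed to live in the same (already learned) feature space.
    """
    ranked = sorted(lyric_vecs,
                    key=lambda name: cosine(query_vec, lyric_vecs[name]),
                    reverse=True)
    return ranked[:k]

def precision_at_k(recommended, relevant):
    """Fraction of recommended lyrics whose conception matches the query."""
    if not recommended:
        return 0.0
    hits = sum(1 for name in recommended if name in relevant)
    return hits / len(recommended)
```

Averaging `precision_at_k` over all query songs with `k=5` would give a figure comparable in form to the reported top-5 accuracy, under the assumption stated above.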
96	Automatické tagování hudebních děl pomocí metod strojového učení / Automatic tagging of musical compositions using machine learning methods
Semela, René, January 2020
Automatic music tagging is one of the many challenges of machine learning, largely because of the complexity of the problem. Such systems can be used in practice for content analysis of music or for sorting music libraries. This thesis deals with the design, training, testing, and evaluation of artificial neural network architectures for automatic music tagging. It first lays out the theoretical foundation of the field. In the practical part, 8 neural network architectures are designed (4 fully convolutional and 4 convolutional recurrent). These architectures are trained using the MagnaTagATune Dataset and mel spectrograms, then tested and evaluated. The best results are achieved by the four-layer convolutional recurrent neural network (CRNN4), with ROC-AUC = 0.9046 ± 0.0016. Next, a completely new Last.fm Dataset 2020 is created. This dataset uses the Last.fm and Spotify APIs for data acquisition and contains 100 tags and 122877 tracks. The most successful architectures are then trained, tested, and evaluated on this new dataset, where the best results are achieved by the six-layer fully convolutional neural network (FCNN6), with ROC-AUC = 0.8590 ± 0.0011. Finally, a simple application is introduced for testing individual neural network architectures on a user-supplied audio file. The overall results are comparable to those of other papers on the topic, and the thesis contributes several new findings. Notably, it significantly reduces the complexity of the individual architectures while maintaining similar results.
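The abstract evaluates its tagging models by ROC-AUC. As a hedged illustration of how such a score can be computed for a multi-label tagger, the sketch below uses the pairwise-ranking definition of ROC-AUC and averages it over tags. The macro averaging and the function names are assumptions made for this example, as the abstract does not say how its scores were aggregated.

```python
def roc_auc(labels, scores):
    """ROC-AUC for one tag: the probability that a randomly chosen positive
    track is scored above a randomly chosen negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def macro_roc_auc(label_matrix, score_matrix):
    """Mean per-tag ROC-AUC; rows are tracks, columns are tags."""
    n_tags = len(label_matrix[0])
    per_tag = [
        roc_auc([row[t] for row in label_matrix],
                [row[t] for row in score_matrix])
        for t in range(n_tags)
    ]
    return sum(per_tag) / n_tags
```

The pairwise loop is quadratic in the number of tracks, which is fine for a small illustration. A rank-based computation would be the usual choice for a dataset of the size described above.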
