Global ETD Search

41	Analyse d'image hyperspectrale / Hyperspectral Image Analysis Faivre, Adrien 14 December 2017 (has links) Les travaux de thèse effectués dans le cadre de la convention Cifre conclue entre le laboratoire de mathématiques de Besançon et Digital Surf, entreprise éditrice du logiciel d’analyse métrologique Mountains, portent sur les techniques d’analyse hyperspectrale. Sujet en plein essor, ces méthodes permettent d’exploiter des images issues de micro-spectroscopie, et en particulier de spectroscopie Raman. Digital Surf ambitionne aujourd’hui de concevoir des solutions logicielles adaptées aux images produites par ces appareils. Ces dernières se présentent sous forme de cubes de valeurs, où chaque pixel correspond à un spectre. La taille importante de ces données, appelées images hyperspectrales en raison du nombre important de mesures disponibles pour chaque spectre, oblige à repenser certains des algorithmes classiques d’analyse d’image. Nous commençons par nous intéresser aux techniques de partitionnement de données. L’idée est de regrouper dans des classes homogènes les différents spectres correspondant à des matériaux similaires. La classification est une des techniques couramment utilisées en traitement des données. Cette tâche fait pourtant partie d’un ensemble de problèmes réputés trop complexes pour une résolution pratique : les problèmes NP-durs. L’efficacité des différentes heuristiques utilisées en pratique était jusqu’à récemment mal comprise. Nous proposons des arguments théoriques permettant de donner des garanties de succès quand les groupes à séparer présentent certaines propriétés statistiques. Nous abordons ensuite les techniques de dé-mélange. Cette fois, il ne s’agit plus de déterminer un ensemble de pixels semblables dans l’image, mais de proposer une interprétation de chaque pixel comme un mélange linéaire de différentes signatures spectrales, censées émaner de matériaux purs. 
Cette déconstruction de spectres composites se traduit mathématiquement comme un problème de factorisation en matrices positives. Ce problème est NP-dur lui aussi. Nous envisageons donc certaines relaxations, malencontreusement peu convaincantes en pratique. Contrairement au problème de classification, il semble très difficile de donner de bonnes garanties théoriques sur la qualité des résultats proposés. Nous adoptons donc une approche plus pragmatique, et proposons de régulariser cette factorisation en imposant des contraintes sur la variation totale de chaque facteur. Finalement, nous donnons un aperçu d’autres problèmes d’analyse hyperspectrale rencontrés lors de cette thèse, problèmes parmi lesquels figurent l’analyse en composantes indépendantes, la réduction non-linéaire de la dimension et la décomposition d’une image par rapport à une librairie regroupant un nombre important de spectres de référence. / This dissertation addresses hyperspectral image analysis, a set of techniques enabling exploitation of micro-spectroscopy images. Images produced by these sensors constitute cubic arrays, meaning that every pixel in the image is actually a spectrum. The size of these images, which is often quite large, calls for an upgrade of classical image analysis algorithms. We start out our investigation with clustering techniques. The main idea is to group every spectrum contained in a hyperspectral image into homogeneous clusters. Spectra taken across the image can indeed be generated by similar materials, and hence display spectral signatures resembling each other. Clustering is a commonly used method in data analysis. It belongs nonetheless to a class of particularly hard problems, known as NP-hard problems. The efficiency of a few heuristics used in practice was poorly understood until recently. We give theoretical arguments guaranteeing success when the groups studied display some statistical property. We then study unmixing techniques. 
The objective is no longer to decide to which class a pixel belongs, but to understand each pixel as a mix of basic signatures supposed to arise from pure materials. The underlying mathematical problem is again NP-hard. After studying its complexity, and suggesting two lengthy relaxations, we describe a more practical way to constrain the problem so as to obtain regularized solutions. We finally give an overview of other hyperspectral image analysis methods encountered during this thesis, among which are independent component analysis, non-linear dimension reduction, and regression against a spectrum library. Traitement d'images Relaxation SDP Factorisation matrices positives Imagerie hyperspectrale Partitionnement des données Factorisation par matrice des données Régularisation par la variation totale Hyperspectral imaging Image Analysis Clustering Non-negative matrix factorization Total variation regularization 510
42	Détection, localisation et quantification de déplacements par capteurs à fibre optique / Detection, localization and quantification of displacements thanks to optical fiber sensors Buchoud, Edouard 13 October 2014 (has links) Pour l’auscultation d’ouvrages, les capteurs à fibre optique sont généralement utilisés puisqu’ils présentent l’avantage de fournir des mesures réparties. Plus particulièrement, le capteur basé sur la technologie Brillouin permet d’acquérir un profil de fréquence Brillouin, sensible à la température et la déformation dans une fibre optique sur une dizaine de kilomètres avec un pas de l’ordre de la dizaine de centimètres. La première problématique est d’obtenir un profil centimétrique sur la même longueur d’auscultation. Nous y répondons en nous appuyant sur des méthodes de séparation de sources, de déconvolution et de résolution de problèmes inverses. Ensuite, nous souhaitons estimer la déformation athermique dans l’ouvrage. Pour cela, plusieurs algorithmes de filtrage adaptatif sont comparés. Finalement, un procédé pour quantifier le déplacement de l’ouvrage à partir des mesures de déformation est proposé. Toutes ces méthodes sont testées sur des données simulées et réelles acquises dans des conditions contrôlées. / For structural health monitoring, optical fiber sensors are mostly used thanks to their capacity to provide distributed measurements. Based on the principle of Brillouin scattering, optical fiber sensors measure a Brillouin frequency profile, sensitive to strain and temperature in the optical fiber, with a meter-scale spatial resolution over several kilometers. The first problem is to obtain a centimeter spatial resolution with the same sensing length. To solve it, source separation, deconvolution and inverse problem solving methodologies are used. Then, the athermal strain in the structure is estimated. Several algorithms based on adaptive filtering are tested to correct the thermal effect on strain measurements. 
Finally, several methods are developed to quantify structure displacements from the athermal strain measurements. They have been tested on simulated data and on real data acquired under controlled conditions. Suivi de l’état des structures Capteur à fibre optique Diffusion Brillouin Séparation de sources Factorisation en matrices non négatives Problème inverse Filtrage adaptatif Structural Health Monitoring Optical fiber sensor Brillouin backscattering Non negative matrix factorization Inverse problem Adaptive filter 550
43	Cluster Identification : Topic Models, Matrix Factorization And Concept Association Networks Arun, R 07 1900 (has links) (PDF) The problem of identifying clusters arising in the context of topic models and related approaches is important in the area of machine learning. The problem concerning traversals on Concept Association Networks is of great interest in the area of cognitive modelling. Cluster identification is the problem of finding the right number of clusters in a given set of points (or a dataset) in different settings, including topic models and matrix factorization algorithms. Traversals in Concept Association Networks provide useful insights into cognitive modelling and performance. First, we consider the problem of authorship attribution in stylometry and the problem of cluster identification for topic models. For the problem of authorship attribution, we show empirically that, by using stop-words as stylistic features of an author, vectors obtained from Latent Dirichlet Allocation (LDA) outperform other classifiers. Topics obtained by this method are generally abstract, and it may not be possible to identify the cohesiveness of words falling in the same topic by mere manual inspection. Hence it is difficult to determine whether the chosen number of topics is optimal. We next address this issue. We propose a new measure for topics arising out of LDA, based on the divergence between the singular value distribution and the L1 norm distribution of the document-topic and topic-word matrices, respectively. It is shown that, under certain assumptions, this measure can be used to find the right number of topics. Next we consider the Non-negative Matrix Factorization (NMF) approach for clustering documents. We propose entropy-based regularization for a variant of NMF with row-stochastic constraints on the component matrices. 
It is shown that when topic-splitting occurs (i.e., when an extra topic is required), an existing topic vector splits into two and the divergence term in the cost function decreases whereas the entropy term increases, leading to a regularization. Next we consider the problem of clustering in Concept Association Networks (CAN). CANs are generic graph models of relationships between abstract concepts. We propose a simple clustering algorithm which takes into account the complex network properties of CAN. The performance of the algorithm is compared with that of the graph-cut based spectral clustering algorithm. In addition, we study the properties of traversals by human participants on CAN. We obtain experimental results contrasting these traversals with those obtained from (i) random walk simulations and (ii) shortest path algorithms. Machine Learning Clustering (Concepts) Association Networks Concept Association Networks (CAN) Latent Dirichlet Allocation (LDA) Matrix Factorization Entropy (Information Theory) Cognitive Clustering Cluster Identification Non-negative Matrix Factorization (NMF) Computer Science
44	Méthodes rapides de traitement d’images hyperspectrales. Application à la caractérisation en temps réel du matériau bois / Fast methods for hyperspectral images processing. Application to the real-time characterization of wood material Nus, Ludivine 12 December 2019 (has links) Cette thèse aborde le démélange en-ligne d’images hyperspectrales acquises par un imageur pushbroom, pour la caractérisation en temps réel du matériau bois. La première partie de cette thèse propose un modèle de mélange en-ligne fondé sur la factorisation en matrices non-négatives. À partir de ce modèle, trois algorithmes pour le démélange séquentiel en-ligne, fondés respectivement sur les règles de mise à jour multiplicatives, le gradient optimal de Nesterov et l’optimisation ADMM (Alternating Direction Method of Multipliers) sont développés. Ces algorithmes sont spécialement conçus pour réaliser le démélange en temps réel, au rythme d'acquisition de l'imageur pushbroom. Afin de régulariser le problème d’estimation (généralement mal posé), deux sortes de contraintes sur les endmembers sont utilisées : une contrainte de dispersion minimale ainsi qu’une contrainte de volume minimal. Une méthode pour l’estimation automatique du paramètre de régularisation est également proposée, en reformulant le problème de démélange hyperspectral en-ligne comme un problème d’optimisation bi-objectif. Dans la seconde partie de cette thèse, nous proposons une approche permettant de gérer la variation du nombre de sources, i.e. le rang de la décomposition, au cours du traitement. Les algorithmes en-ligne préalablement développés sont ainsi modifiés, en introduisant une étape d’apprentissage d’une bibliothèque hyperspectrale, ainsi que des pénalités de parcimonie permettant de sélectionner uniquement les sources actives. Enfin, la troisième partie de ces travaux consiste en l’application de nos approches à la détection et à la classification des singularités du matériau bois. 
/ This PhD dissertation addresses the problem of on-line unmixing of hyperspectral images acquired by a pushbroom imaging system, for real-time characterization of wood. The first part of this work proposes an on-line mixing model based on non-negative matrix factorization. Based on this model, three algorithms for on-line sequential unmixing, using multiplicative update rules, the Nesterov optimal gradient and the ADMM optimization (Alternating Direction Method of Multipliers), respectively, are developed. These algorithms are specially designed to perform the unmixing in real time, at the pushbroom imager acquisition rate. In order to regularize the estimation problem (generally ill-posed), two types of constraints on the endmembers are used: a minimum dispersion constraint and a minimum volume constraint. A method for the unsupervised estimation of the regularization parameter is also proposed, by reformulating the on-line hyperspectral unmixing problem as a bi-objective optimization. In the second part of this manuscript, we propose an approach for handling the variation in the number of sources, i.e. the rank of the decomposition, during the processing. Thus, the previously developed on-line algorithms are modified by introducing a hyperspectral library learning stage as well as sparsity penalties that select only the active sources. Finally, the third part of this work consists in the application of these approaches to the detection and the classification of the singularities of wood. Démélange hyperspectral en-ligne Imagerie hyperspectrale pushbroom Factorisation en matrices non-négatives Contrainte de volume minimal Bibliothèque hyperspectrale Suivi du rang On-line hyperspectral unmixing Pushbroom hyperspectral imaging Non-negative matrix factorization Minimal volume constraint Regularization parameter estimation Hyperspectral library Rank tracking 621.367 006.4
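Illustration (not taken from the thesis): entry 44 builds its on-line unmixing algorithms on non-negative matrix factorization with multiplicative update rules. As a rough sketch of that building block only, the classical batch multiplicative updates for the squared Frobenius error are written out below in plain Python. The function names, the toy matrix, the iteration count and the `eps` safeguard are assumptions chosen for illustration; the on-line, regularized variants described in the abstract are not reproduced here.

```python
import random

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob_error(X, W, H):
    """Squared Frobenius norm of X - W H."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))

def nmf_multiplicative(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Batch NMF, X ~ W H, with multiplicative updates (illustrative only)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    # Strictly positive random initialisation keeps every factor non-negative.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (d + eps) for h, a, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (d + eps) for w, a, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H

# Toy data (hypothetical): 4 "bands" x 5 "pixels" mixed from 2 non-negative sources.
W_true = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.8]]
H_true = [[1.0, 0.0, 0.5, 0.3, 0.9], [0.0, 1.0, 0.5, 0.7, 0.1]]
X = matmul(W_true, H_true)

W0, H0 = nmf_multiplicative(X, rank=2, n_iter=0)   # initial factors only
W, H = nmf_multiplicative(X, rank=2, n_iter=200)   # after 200 updates
print(frob_error(X, W0, H0), frob_error(X, W, H))
```

Because each update multiplies a non-negative factor by a non-negative ratio, the factors stay non-negative without any projection step, which is the usual reason this rule is chosen as a starting point.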
45	A Confirmatory Analysis for Automating the Evaluation of Motivation Letters to Emulate Human Judgment Mercado Salazar, Jorge Anibal, Rana, S M Masud January 2021 (has links) Manually reading, evaluating, and scoring motivation letters as part of the admissions process is a time-consuming and tedious task for Dalarna University's program managers. An automated scoring system would provide them with relief as well as the ability to make much faster decisions when selecting applicants for admission. The aim of this thesis was to analyse current human judgment and attempt to emulate it using machine learning techniques. We used various topic modelling methods, such as Latent Dirichlet Allocation and Non-Negative Matrix Factorization, to find the most interpretable topics, build a bridge between topics and human-defined factors, and finally evaluate model performance by predicting scoring values and finding accuracy using logistic regression, discriminant analysis, and other classification algorithms. Although we were able to discover the meaning of almost all human factors on our own, the topic models' accuracy in predicting overall score was unexpectedly low. Setting a threshold on overall score to select applicants for admission yielded a good overall accuracy, but did not yield consistently good precision or recall. During our investigation, we attempted to determine the possible causes of these unexpected results and discovered that not only are the limitations of topic modelling to blame, but human bias also plays a role. Motivation Letter Natural Language Processing Topic Modelling Latent Dirichlet Allocation Non-Negative Matrix Factorization LDAVis Topic Factors Image Processing Text Processing Logistic Regression Unsupervised Learning Machine Learning Other Social Sciences Annan samhällsvetenskap
46	Reconstruction de phase par modèles de signaux : application à la séparation de sources audio / Phase recovery based on signal modeling : application to audio source separation Magron, Paul 02 December 2016 (has links) De nombreux traitements appliqués aux signaux audio travaillent sur une représentation Temps-Fréquence (TF) des données. Lorsque le résultat de ces algorithmes est un champ spectral d’amplitude, la question se pose, pour reconstituer un signal temporel, d’estimer le champ de phase correspondant. C’est par exemple le cas dans les applications de séparation de sources, qui estiment les spectrogrammes des sources individuelles à partir du mélange ; la méthode dite de filtrage de Wiener, largement utilisée en pratique, fournit des résultats satisfaisants mais est mise en défaut lorsque les sources se recouvrent dans le plan TF. Cette thèse aborde le problème de la reconstruction de phase de signaux dans le domaine TF appliquée à la séparation de sources audio. Une étude préliminaire révèle la nécessité de mettre au point de nouvelles techniques de reconstruction de phase pour améliorer la qualité de la séparation de sources. Nous proposons de baser celles-ci sur des modèles de signaux. Notre approche consiste à exploiter des informations issues de modèles sous-jacents aux données comme les mélanges de sinusoïdes. La prise en compte de ces informations permet de préserver certaines propriétés intéressantes, comme la continuité temporelle ou la précision des attaques. Nous intégrons ces contraintes dans des modèles de mélanges pour la séparation de sources, où la phase du mélange est exploitée. Les amplitudes des sources pourront être supposées connues, ou bien estimées conjointement dans un modèle inspiré de la factorisation en matrices non-négatives complexe. Enfin, un modèle probabiliste de sources à phase non-uniforme est mis au point. 
Il permet d’exploiter les à priori provenant de la modélisation de signaux et de tenir compte d’une incertitude sur ceux-ci. Ces méthodes sont testées sur de nombreuses bases de données de signaux de musique réalistes. Leurs performances, en termes de qualité des signaux estimés et de temps de calcul, sont supérieures à celles des méthodes traditionnelles. En particulier, nous observons une diminution des interférences entre sources estimées, et une réduction des artéfacts dans les basses fréquences, ce qui confirme l’intérêt des modèles de signaux pour la reconstruction de phase. / A variety of audio signal processing techniques act on a Time-Frequency (TF) representation of the data. When the result of those algorithms is a magnitude spectrum, it is necessary to reconstruct the corresponding phase field in order to resynthesize time-domain signals. For instance, in the source separation framework the spectrograms of the individual sources are estimated from the mixture; the widely used Wiener filtering technique then provides satisfactory results, but its performance decreases when the sources overlap in the TF domain. This thesis addresses the problem of phase reconstruction in the TF domain for audio source separation. From a preliminary study we highlight the need for novel phase recovery methods. We therefore introduce new phase reconstruction techniques that are based on music signal modeling: our approach consists in exploiting phase information that originates from signal models such as mixtures of sinusoids. Taking those constraints into account enables us to preserve desirable properties such as temporal continuity or transient precision. We integrate these into several mixture models where the mixture phase is exploited; the magnitudes of the sources are either assumed to be known, or jointly estimated in a complex nonnegative matrix factorization framework. 
Finally, we design a phase-dependent probabilistic mixture model that accounts for model-based phase priors. Those methods are tested on a variety of realistic music signals. They compare favorably with or outperform traditional source separation techniques in terms of signal reconstruction quality and computational cost. In particular, we observe a decrease in interferences between the estimated sources and a reduction of artifacts in the low-frequency components, which confirms the benefit of signal model-based phase reconstruction methods. Reconstruction de phase Modèles de signaux Séparation de sources audio Musique Mélanges de sinusoïdes Factorisation en matrices non-négatives Analyse temps-fréquence Modèles probabilistes Phase recovery Signal modeling Audio source separation Music Mixtures of sinusoids Non-negative matrix factorization Time-frequency analysis Probabilistic modeling
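Illustration (not taken from the thesis): the abstract of entry 46 refers to Wiener filtering as the widely used baseline for recovering sources from a mixture. A minimal sketch of that baseline for a single time frame is given below, assuming the power spectra of the sources have already been estimated by some other means. The function names and the toy numbers are hypothetical. Each bin of the mixture is shared between the sources in proportion to their estimated power, so every estimate inherits the phase of the mixture, which is the limitation that phase recovery methods aim to address.

```python
import cmath

def wiener_masks(power_specs, eps=1e-12):
    """Per-bin soft masks: each source's power divided by the total power."""
    n_bins = len(power_specs[0])
    totals = [sum(p[k] for p in power_specs) + eps for k in range(n_bins)]
    return [[p[k] / totals[k] for k in range(n_bins)] for p in power_specs]

def wiener_separate(mixture, power_specs):
    """Apply the real-valued masks to the complex mixture coefficients."""
    masks = wiener_masks(power_specs)
    return [[m * x for m, x in zip(mask, mixture)] for mask in masks]

# Toy frame (hypothetical): 3 frequency bins, 2 sources.
mixture = [1 + 1j, -2 + 0.5j, 0.3 - 0.4j]
power_specs = [[4.0, 1.0, 0.0],   # assumed power spectrum of source 0
               [1.0, 1.0, 2.0]]   # assumed power spectrum of source 1
estimates = wiener_separate(mixture, power_specs)

for k, x in enumerate(mixture):
    # The estimates add back up to the mixture in every bin.
    print(k, x, sum(est[k] for est in estimates))
```

In the second bin both sources have equal power, so each receives half of the mixture coefficient with an identical phase; this is the overlapping case in which the abstract reports that Wiener filtering performs less well.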
47	Apprentissage interactif de mots et d'objets pour un robot humanoïde / Interactive learning of words and objects for a humanoid robot Chen, Yuxin 27 February 2017 (has links) Les applications futures de la robotique, en particulier pour des robots de service à la personne, exigeront des capacités d’adaptation continue à l'environnement, et notamment la capacité à reconnaître des nouveaux objets et apprendre des nouveaux mots via l'interaction avec les humains. Bien qu'ayant fait d'énormes progrès en utilisant l'apprentissage automatique, les méthodes actuelles de vision par ordinateur pour la détection et la représentation des objets reposent fortement sur de très bonnes bases de données d’entrainement et des supervisions d'apprentissage idéales. En revanche, les enfants de deux ans ont une capacité impressionnante à apprendre à reconnaître des nouveaux objets et en même temps d'apprendre les noms des objets lors de l'interaction avec les adultes et sans supervision précise. Par conséquent, suivant l'approche de la robotique développementale, nous développons dans la thèse des approches d'apprentissage pour les objets, en associant leurs noms et leurs caractéristiques correspondantes, inspirées par les capacités des enfants, en particulier l'interaction ambiguë avec l’homme en s’inspirant de l'interaction qui a lieu entre les enfants et les parents. L'idée générale est d’utiliser l'apprentissage cross-situationnel (cherchant les points communs entre différentes présentations d’un objet ou d’une caractéristique) et la découverte de concepts multi-modaux basée sur deux approches de découverte de thèmes latents: la Factorisation en Matrices Non-Négatives (NMF) et l'Allocation de Dirichlet latente (LDA). 
Sur la base de descripteurs de vision et des entrées audio / vocale, les approches proposées vont découvrir les régularités sous-jacentes dans le flux de données brutes afin de parvenir à produire des ensembles de mots et leur signification visuelle associée (p. ex. le nom d’un objet et sa forme, ou un adjectif de couleur et sa correspondance dans les images). Nous avons développé une approche complète basée sur ces algorithmes et comparé leurs comportements face à deux sources d'incertitudes: ambiguïtés de références, dans des situations où plusieurs mots sont donnés qui décrivent des caractéristiques d'objets multiples; et les ambiguïtés linguistiques, dans des situations où les mots-clés que nous avons l'intention d'apprendre sont intégrés dans des phrases complètes. Cette thèse souligne les solutions algorithmiques requises pour pouvoir effectuer un apprentissage efficace de ces associations de mot-référent à partir de données acquises dans une configuration d'acquisition simplifiée mais réaliste qui a permis d'effectuer des simulations étendues et des expériences préliminaires dans des vraies interactions homme-robot. Nous avons également apporté des solutions pour l'estimation automatique du nombre de thèmes pour les NMF et LDA. Nous avons finalement proposé deux stratégies d'apprentissage actives: la Sélection par l'Erreur de Reconstruction Maximale (MRES) et l'Exploration Basée sur la Confiance (CBE), afin d'améliorer la qualité et la vitesse de l'apprentissage incrémental en laissant les algorithmes choisir les échantillons d'apprentissage suivants. Nous avons comparé les comportements produits par ces algorithmes et montré leurs points communs et leurs différences avec ceux des humains dans des situations d'apprentissage similaires. 
/ Future applications of robotics, especially personal service robots, will require continuous adaptability to the environment, and particularly the ability to recognize new objects and learn new words through interaction with humans. Though having made tremendous progress by using machine learning, current computational models for object detection and representation still rely heavily on good training data and ideal learning supervision. In contrast, two-year-old children have an impressive ability to learn to recognize new objects and at the same time to learn the object names during interaction with adults and without precise supervision. Therefore, following the developmental robotics approach, we develop in the thesis learning approaches for objects, associating their names and corresponding features, inspired by the infants' capabilities, in particular, the ambiguous interaction with humans, inspired by the interaction that occurs between children and parents. The general idea is to use cross-situational learning (finding the common points between different presentations of an object or a feature) and to implement multi-modal concept discovery based on two latent topic discovery approaches: Non Negative Matrix Factorization (NMF) and Latent Dirichlet Allocation (LDA). Based on vision descriptors and sound/voice inputs, the proposed approaches will find the underlying regularities in the raw dataflow to produce sets of words and their associated visual meanings (e.g., the name of an object and its shape, or a color adjective and its correspondence in images). We developed a complete approach based on these algorithms and compared their behavior when faced with two sources of uncertainty: referential ambiguities, in situations where multiple words are given that describe multiple object features; and linguistic ambiguities, in situations where keywords we intend to learn are merged in complete sentences. 
This thesis highlights the algorithmic solutions required to be able to perform efficient learning of these word-referent associations from data acquired in a simplified but realistic acquisition setup that made it possible to perform extensive simulations and preliminary experiments in real human-robot interactions. We also gave solutions for the automatic estimation of the number of topics for both NMF and LDA. We finally proposed two active learning strategies, Maximum Reconstruction Error Based Selection (MRES) and Confidence Based Exploration (CBE), to improve the quality and speed of incremental learning by letting the algorithms choose the next learning samples. We compared the behaviors produced by these algorithms and showed their common points and differences with those of humans in similar learning situations. Robotique développementale Apprentissage de mot-Référent Apprentissage cross-Situationnel Apprentissage actif Allocation de Dirichlet latente (LDA) Developmental robotics Word-Referent learning Cross-Situational learning Active learning Non negative Matrix Factorization (NMF) Latent Dirichlet Allocation (LDA) 006.3
48	Modeling High-Dimensional Audio Sequences with Recurrent Neural Networks Boulanger-Lewandowski, Nicolas 04 1900 (has links) Cette thèse étudie des modèles de séquences de haute dimension basés sur des réseaux de neurones récurrents (RNN) et leur application à la musique et à la parole. Bien qu'en principe les RNN puissent représenter les dépendances à long terme et la dynamique temporelle complexe propres aux séquences d'intérêt comme la vidéo, l'audio et la langue naturelle, ceux-ci n'ont pas été utilisés à leur plein potentiel depuis leur introduction par Rumelhart et al. (1986a) en raison de la difficulté de les entraîner efficacement par descente de gradient. Récemment, l'application fructueuse de l'optimisation Hessian-free et d'autres techniques d'entraînement avancées ont entraîné la recrudescence de leur utilisation dans plusieurs systèmes de l'état de l'art. Le travail de cette thèse prend part à ce développement. L'idée centrale consiste à exploiter la flexibilité des RNN pour apprendre une description probabiliste de séquences de symboles, c'est-à-dire une information de haut niveau associée aux signaux observés, qui en retour pourra servir d'à priori pour améliorer la précision de la recherche d'information. Par exemple, en modélisant l'évolution de groupes de notes dans la musique polyphonique, d'accords dans une progression harmonique, de phonèmes dans un énoncé oral ou encore de sources individuelles dans un mélange audio, nous pouvons améliorer significativement les méthodes de transcription polyphonique, de reconnaissance d'accords, de reconnaissance de la parole et de séparation de sources audio respectivement. L'application pratique de nos modèles à ces tâches est détaillée dans les quatre derniers articles présentés dans cette thèse. Dans le premier article, nous remplaçons la couche de sortie d'un RNN par des machines de Boltzmann restreintes conditionnelles pour décrire des distributions de sortie multimodales beaucoup plus riches. 
Dans le deuxième article, nous évaluons et proposons des méthodes avancées pour entraîner les RNN. Dans les quatre derniers articles, nous examinons différentes façons de combiner nos modèles symboliques à des réseaux profonds et à la factorisation matricielle non-négative, notamment par des produits d'experts, des architectures entrée/sortie et des cadres génératifs généralisant les modèles de Markov cachés. Nous proposons et analysons également des méthodes d'inférence efficaces pour ces modèles, telles la recherche vorace chronologique, la recherche en faisceau à haute dimension, la recherche en faisceau élagué et la descente de gradient. Finalement, nous abordons les questions de l'étiquette biaisée, du maître imposant, du lissage temporel, de la régularisation et du pré-entraînement. / This thesis studies models of high-dimensional sequences based on recurrent neural networks (RNNs) and their application to music and speech. While in principle RNNs can represent the long-term dependencies and complex temporal dynamics present in real-world sequences such as video, audio and natural language, they have not been used to their full potential since their introduction by Rumelhart et al. (1986a) due to the difficulty to train them efficiently by gradient-based optimization. In recent years, the successful application of Hessian-free optimization and other advanced training techniques motivated an increase of their use in many state-of-the-art systems. The work of this thesis is part of this development. The main idea is to exploit the power of RNNs to learn a probabilistic description of sequences of symbols, i.e. high-level information associated with observed signals, that in turn can be used as a prior to improve the accuracy of information retrieval. 
For example, by modeling the evolution of note patterns in polyphonic music, chords in a harmonic progression, phones in a spoken utterance, or individual sources in an audio mixture, we can improve significantly the accuracy of polyphonic transcription, chord recognition, speech recognition and audio source separation respectively. The practical application of our models to these tasks is detailed in the last four articles presented in this thesis. In the first article, we replace the output layer of an RNN with conditional restricted Boltzmann machines to describe much richer multimodal output distributions. In the second article, we review and develop advanced techniques to train RNNs. In the last four articles, we explore various ways to combine our symbolic models with deep networks and non-negative matrix factorization algorithms, namely using products of experts, input/output architectures, and generative frameworks that generalize hidden Markov models. We also propose and analyze efficient inference procedures for those models, such as greedy chronological search, high-dimensional beam search, dynamic programming-like pruned beam search and gradient descent. Finally, we explore issues such as label bias, teacher forcing, temporal smoothing, regularization and pre-training. Apprentissage automatique Machine learning Réseaux de neurones récurrents Recurrent neural networks Recherche d'information musicale Music information retrieval Modèles séquentiels Sequential models Transcription polyphonique Polyphonic transcription Reconnaissance de la parole Speech recognition Factorisation matricielle non-négative Non-negative matrix factorization
49	Competition improves robustness against loss of information Kolankeh, Arash Kermani, Teichmann, Michael, Hamker, Fred H. 21 July 2015 A substantial number of works have aimed at modeling the receptive field properties of the primary visual cortex (V1). Their evaluation criterion is usually the similarity of the model response properties to the recorded responses from biological organisms. However, as several algorithms were able to demonstrate some degree of similarity to biological data based on the existing criteria, we focus on the robustness against loss of information in the form of occlusions as an additional constraint for better understanding the algorithmic level of early vision in the brain. We investigate the influence of competition mechanisms on this robustness. To this end, we compared four methods employing different competition mechanisms, namely, independent component analysis, non-negative matrix factorization with sparseness constraint, predictive coding/biased competition, and a Hebbian neural network with lateral inhibitory connections. Each of those methods is known to be capable of developing receptive fields comparable to those of V1 simple cells. Since directly measuring the robustness against occlusion of methods with simple-cell-like receptive fields is difficult, we measure the robustness using the classification accuracy on the MNIST handwritten digit dataset. For this, we trained all methods on the training set of the MNIST handwritten digit dataset and tested them on an MNIST test set with different levels of occlusion. We observe that methods which employ competitive mechanisms have higher robustness against loss of information. The kind of competition mechanism also plays an important role in robustness. Global feedback inhibition as employed in predictive coding/biased competition has an advantage compared to local lateral inhibition learned by an anti-Hebb rule.
Wettbewerb Laterale Inhibition Hebbsches Lernen Unabhängige Komponentenanalyse Nicht-negative Matrixzerlegung Verdeckung Informationsverlust Technische Universität Chemnitz Publikationsfonds competition lateral inhibition Hebbian learning independent component analysis non-negative matrix factorization predictive coding/biased competition occlusion information loss Technische Universität Chemnitz Publication funds ddc:000 ddc:500 ddc:600
50	Non-negative matrix decomposition approaches to frequency domain analysis of music audio signals Wood, Sean 12 1900 We study the application of matrix decomposition algorithms, such as Non-negative Matrix Factorization (NMF), to frequency-domain representations of musical audio signals. These algorithms, driven by a reconstruction error function, learn a set of basis functions and a set of corresponding coefficients that approximate the input signal. We compare the use of three reconstruction error functions when NMF is applied to monophonic and harmonized scales: least squares, Kullback-Leibler divergence, and a recently introduced phase-dependent divergence measure. New methods for interpreting the resulting decompositions are presented and compared with previously used methods that require knowledge of the acoustic domain. Finally, we analyze the generalization ability of the learned basis functions with respect to three musical parameters: amplitude, duration and instrument type. To do so, we introduce two basis function labeling algorithms that perform better than the previous approach in the majority of our tests, the instrument task with monophonic audio being the only notable exception. / We study the application of unsupervised matrix decomposition algorithms such as Non-negative Matrix Factorization (NMF) to frequency domain representations of music audio signals. These algorithms, driven by a given reconstruction error function, learn a set of basis functions and a set of corresponding coefficients that approximate the input signal. We compare the use of three reconstruction error functions when NMF is applied to monophonic and harmonized musical scales: least squares, Kullback-Leibler divergence, and a recently introduced “phase-aware” divergence measure.
Novel supervised methods for interpreting the resulting decompositions are presented and compared to previously used methods that rely on domain knowledge. Finally, the ability of the learned basis functions to generalize across musical parameter values, including note amplitude, note duration and instrument type, is analyzed. To do so, we introduce two basis function labeling algorithms that outperform the previous labeling approach in the majority of our tests, instrument type with monophonic audio being the only notable exception. Apprentissage machine non-supervisé Apprentissage machine semi-supervisé Factorisation matricielle non-négative Encodage parcimonieux Extraction de l’information musicale Détection de la hauteur de notes Unsupervised machine learning Semi-supervised machine learning Non-negative matrix factorization Sparse coding Music information retrieval Pitch detection
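The abstract above describes NMF as learning basis functions and coefficients that approximate the input under a reconstruction error function. A minimal sketch of that idea follows, assuming the least-squares error and the standard multiplicative update rules; it is not the thesis's implementation, it does not cover the Kullback-Leibler or phase-aware variants, and the matrix `V` and all function names are made up for illustration.

```python
import random

EPS = 1e-9  # small constant that guards against division by zero


def matmul(a, b):
    """Matrix product of two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b]
            for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def squared_error(v, w, h):
    """Least-squares reconstruction error between v and the product w h."""
    wh = matmul(w, h)
    return sum((x - y) ** 2
               for row_v, row_wh in zip(v, wh)
               for x, y in zip(row_v, row_wh))


def nmf(v, rank, iterations=200, seed=0):
    """Factor non-negative v into w (basis functions) and h (coefficients)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.uniform(0.1, 1.0) for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(iterations):
        # Update coefficients: h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + EPS)
              for j in range(n_cols)] for i in range(rank)]
        # Update basis functions: w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + EPS)
              for j in range(rank)] for i in range(n_rows)]
    return w, h


# Hypothetical toy "spectrogram": rows are frequency bins, columns are frames.
V = [[1.0, 0.5, 0.0],
     [0.5, 0.25, 0.0],
     [0.0, 1.0, 2.0],
     [0.0, 0.5, 1.0]]
W, H = nmf(V, rank=2)
```

Because the updates only multiply by non-negative ratios, the factors stay non-negative without any explicit projection step, which is one reason this scheme is a common starting point.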
