Spelling suggestions: "subject:"oon negative 1atrix factorization (NMF)"" "subject:"oon negative 1atrix actorization (NMF)""
1 |
Decomposition methods of NMR signal of complex mixtures : models ans applicationsToumi, Ichrak 28 October 2013 (has links)
L'objectif de ce travail était de tester des méthodes de SAS pour la séparation des spectres complexes RMN de mélanges dans les plus simples des composés purs. Dans une première partie, les méthodes à savoir JADE et NNSC ont été appliqué es dans le cadre de la DOSY , une application aux données CPMG était démontrée. Dans une deuxième partie, on s'est concentré sur le développement d'un algorithme efficace "beta-SNMF" . Ceci s'est montré plus performant que NNSC pour beta inférieure ou égale à 2. Etant donné que dans la littérature, le choix de beta a été adapté aux hypothèses statistiques sur le bruit additif, une étude statistique du bruit RMN de la DOSY a été faite pour obtenir une image plus complète de nos données RMN étudiées. / The objective of the work was to test BSS methods for the separation of the complex NMR spectra of mixtures into the simpler ones of the pure compounds. In a first part, known methods namely JADE and NNSC were applied in conjunction for DOSY , performing applications for CPMG were demonstrated. In a second part, we focused on developing an effective algorithm "beta- SNMF ". This was demonstrated to outperform NNSC for beta less or equal to 2. Since in the literature, the choice of beta has been adapted to the statistical assumptions on the additive noise, a statistical study of NMR DOSY noise was done to get a more complete picture about our studied NMR data.
|
2 |
Evaluation de l'adhérence au contact roue-rail par analyse d'images spectrales / Wheel-track adhesion evaluation using spectral imagingNicodeme, Claire 04 July 2018 (has links)
L’avantage du train depuis sa création est sa faible résistance à l’avancement du fait du contact fer-fer de la roue sur le rail conduisant à une adhérence réduite. Cependant cette adhérence faible est aussi un inconvénient majeur : étant dépendante des conditions environnementales, elle est facilement altérée lors d’une pollution du rail (végétaux, corps gras, eau, etc.). Aujourd’hui, les mesures prises face à des situations d'adhérence dégradée impactent directement les performances du système et conduisent notamment à une perte de capacité de transport. L’objectif du projet est d’utiliser les nouvelles technologies d’imagerie spectrale pour identifier sur les rails les zones à adhérence réduite et leur cause afin d’alerter et d’adapter rapidement les comportements. La stratégie d’étude a pris en compte les trois points suivants : • Le système de détection, installé à bord de trains commerciaux, doit être indépendant du train. • La détection et l’identification ne doivent pas interagir avec la pollution pour ne pas rendre la mesure obsolète. Pour ce faire le principe d’un Contrôle Non Destructif est retenu. • La technologie d’imagerie spectrale permet de travailler à la fois dans le domaine spatial (mesure de distance, détection d’objet) et dans le domaine fréquentiel (détection et reconnaissance de matériaux par analyse de signatures spectrales). Dans le temps imparti des trois ans de thèse, nous nous sommes focalisés sur la validation du concept par des études et analyses en laboratoire, réalisables dans les locaux de SNCF Ingénierie & Projets. Les étapes clés ont été la réalisation d’un banc d’évaluation et le choix du système de vision, la création d'une bibliothèque de signatures spectrales de référence et le développement d'algorithmes classification supervisées et non supervisées des pixels. Ces travaux ont été valorisés par le dépôt d'un brevet et la publication d'articles dans des conférences IEEE. / The advantage of the train since its creation is in its low resistance to the motion, due to the contact iron-iron of the wheel on the rail leading to low adherence. However this low adherence is also a major drawback : being dependent on the environmental conditions, it is easily deteriorated when the rail is polluted (vegetation, grease, water, etc). Nowadays, strategies to face a deteriorated adherence impact the performance of the system and lead to a loss of transport capacity. The objective of the project is to use a new spectral imaging technology to identify on the rails areas with reduced adherence and their cause in order to quickly alert and adapt the train's behaviour. The study’s strategy took into account the three following points : -The detection system, installed on board of commercial trains, must be independent of the train. - The detection and identification process should not interact with pollution in order to keep the measurements unbiased. To do so, we chose a Non Destructive Control method. - Spectral imaging technology makes it possible to work with both spatial information (distance’s measurement, target detection) and spectral information (material detection and recognition by analysis of spectral signatures). In the assigned time, we focused on the validation of the concept by studies and analyses in laboratory, workable in the office at SNCF Ingénierie & Projets. The key steps were the creation of the concept's evaluation bench and the choice of a Vision system, the creation of a library containing reference spectral signatures and the development of supervised and unsupervised pixels classification. A patent describing the method and process has been filed and published.
|
3 |
Provable Methods for Non-negative Matrix FactorizationPani, Jagdeep January 2016 (has links) (PDF)
Nonnegative matrix factorization (NMF) is an important data-analysis problem which concerns factoring a given d n matrix A with nonnegative entries into matrices B and C where B and C are d k and k n with nonnegative entries. It has numerous applications including Object recognition, Topic Modelling, Hyper-spectral imaging, Music transcription etc. In general, NMF is intractable and several heuristics exists to solve the problem of NMF. Recently there has been interest in investigating conditions under which NMF can be tractably recovered. We note that existing attempts make unrealistic assumptions and often the associated algorithms tend to be not scalable.
In this thesis, we make three major contributions: First, we formulate a model of NMF with assumptions which are natural and is a substantial weakening of separability. Unlike requiring a bound on the error in each column of (A BC) as was done in much of previous work, our assumptions are about aggregate errors, namely spectral norm of (A BC) i.e. jjA BCjj2 should be low. This is a much weaker error assumption and the associated B; C would be much more resilient than existing models. Second, we describe a robust polynomial time SVD-based algorithm, UTSVD, with realistic provable error guarantees and can handle higher levels of noise than previous algorithms. Indeed, experimentally we show that existing NMF models, which are based on separability assumptions, degrade much faster than UTSVD, in the presence of noise. Furthermore, when the data has dominant features, UTSVD significantly outperforms existing models. On real life datasets we again see a similar outperformance of UTSVD on clustering tasks. Finally, under a weaker model, we prove a robust version of uniqueness of NMF, where again, the word \robust" refers to realistic error bounds.
|
4 |
Doppler Radar Data Processing And ClassificationAygar, Alper 01 September 2008 (has links) (PDF)
In this thesis, improving the performance of the automatic recognition of the Doppler radar targets is studied. The radar used in this study is a ground-surveillance doppler radar. Target types are car, truck, bus, tank, helicopter, moving man and running man. The input of this thesis is the output of the real doppler radar signals which are normalized and preprocessed (TRP vectors: Target Recognition Pattern vectors) in the doctorate thesis by Erdogan (2002). TRP vectors are normalized and homogenized doppler radar target signals with respect to target speed, target aspect angle and target range. Some target classes have repetitions in time in their TRPs. By the use of these repetitions, improvement of the target type classification performance is studied. K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) algorithms are used for doppler radar target classification and the results are evaluated. Before classification PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis), NMF (Nonnegative Matrix Factorization) and ICA (Independent Component Analysis) are implemented and applied to normalized doppler radar signals for feature extraction and dimension reduction in an efficient way. These techniques transform the input vectors, which are the normalized doppler radar signals, to another space. The effects of the implementation of these feature extraction algoritms and the use of the repetitions in doppler radar target signals on the doppler radar target classification performance are studied.
|
5 |
Cluster Identification : Topic Models, Matrix Factorization And Concept Association NetworksArun, R 07 1900 (has links) (PDF)
The problem of identifying clusters arising in the context of topic models and related approaches is important in the area of machine learning. The problem concerning traversals on Concept Association Networks is of great interest in the area of cognitive modelling. Cluster identification is the problem of finding the right number of clusters in a given set of points(or a dataset) in different settings including topic models and matrix factorization algorithms. Traversals in Concept Association Networks provide useful insights into cognitive modelling and performance. First, We consider the problem of authorship attribution of stylometry and the problem of cluster identification for topic models. For the problem of authorship attribution we show empirically that by using stop-words as stylistic features of an author, vectors obtained from the Latent Dirichlet Allocation (LDA) , outperforms other classifiers. Topics obtained by this method are generally abstract and it may not be possible to identify the cohesiveness of words falling in the same topic by mere manual inspection. Hence it is difficult to determine if the chosen number of topics is optimal. We next address this issue. We propose a new measure for topics arising out of LDA based on the divergence between the singular value distribution and the L1 norm distribution of the document-topic and topic-word matrices, respectively. It is shown that under certain assumptions, this measure can be used to find the right number of topics. Next we consider the Non-negative Matrix Factorization(NMF) approach for clustering documents. We propose entropy based regularization for a variant of the NMF with row-stochastic constraints on the component matrices. It is shown that when topic-splitting occurs, (i.e when an extra topic is required) an existing topic vector splits into two and the divergence term in the cost function decreases whereas the entropy term increases leading to a regularization.
Next we consider the problem of clustering in Concept Association Networks(CAN). The CAN are generic graph models of relationships between abstract concepts. We propose a simple clustering algorithm which takes into account the complex network properties of CAN. The performance of the algorithm is compared with that of the graph-cut based spectral clustering algorithm. In addition, we study the properties of traversals by human participants on CAN. We obtain experimental results contrasting these traversals with those obtained from (i) random walk simulations and (ii) shortest path algorithms.
|
6 |
Apprentissage interactif de mots et d'objets pour un robot humanoïde / Interactive learning of words and objects for a humanoid robotChen, Yuxin 27 February 2017 (has links)
Les applications futures de la robotique, en particulier pour des robots de service à la personne, exigeront des capacités d’adaptation continue à l'environnement, et notamment la capacité à reconnaître des nouveaux objets et apprendre des nouveaux mots via l'interaction avec les humains. Bien qu'ayant fait d'énormes progrès en utilisant l'apprentissage automatique, les méthodes actuelles de vision par ordinateur pour la détection et la représentation des objets reposent fortement sur de très bonnes bases de données d’entrainement et des supervisions d'apprentissage idéales. En revanche, les enfants de deux ans ont une capacité impressionnante à apprendre à reconnaître des nouveaux objets et en même temps d'apprendre les noms des objets lors de l'interaction avec les adultes et sans supervision précise. Par conséquent, suivant l'approche de le robotique développementale, nous développons dans la thèse des approches d'apprentissage pour les objets, en associant leurs noms et leurs caractéristiques correspondantes, inspirées par les capacités des enfants, en particulier l'interaction ambiguë avec l’homme en s’inspirant de l'interaction qui a lieu entre les enfants et les parents.L'idée générale est d’utiliser l'apprentissage cross-situationnel (cherchant les points communs entre différentes présentations d’un objet ou d’une caractéristique) et la découverte de concepts multi-modaux basée sur deux approches de découverte de thèmes latents: la Factorisation en Natrices Non-Négatives (NMF) et l'Allocation de Dirichlet latente (LDA). Sur la base de descripteurs de vision et des entrées audio / vocale, les approches proposées vont découvrir les régularités sous-jacentes dans le flux de données brutes afin de parvenir à produire des ensembles de mots et leur signification visuelle associée (p.ex le nom d’un objet et sa forme, ou un adjectif de couleur et sa correspondance dans les images). Nous avons développé une approche complète basée sur ces algorithmes et comparé leur comportements face à deux sources d'incertitudes: ambiguïtés de références, dans des situations où plusieurs mots sont donnés qui décrivent des caractéristiques d'objets multiples; et les ambiguïtés linguistiques, dans des situations où les mots-clés que nous avons l'intention d'apprendre sont intégrés dans des phrases complètes. Cette thèse souligne les solutions algorithmiques requises pour pouvoir effectuer un apprentissage efficace de ces associations de mot-référent à partir de données acquises dans une configuration d'acquisition simplifiée mais réaliste qui a permis d'effectuer des simulations étendues et des expériences préliminaires dans des vraies interactions homme-robot. Nous avons également apporté des solutions pour l'estimation automatique du nombre de thèmes pour les NMF et LDA.Nous avons finalement proposé deux stratégies d'apprentissage actives: la Sélection par l'Erreur de Reconstruction Maximale (MRES) et l'Exploration Basée sur la Confiance (CBE), afin d'améliorer la qualité et la vitesse de l'apprentissage incrémental en laissant les algorithmes choisir les échantillons d'apprentissage suivants. Nous avons comparé les comportements produits par ces algorithmes et montré leurs points communs et leurs différences avec ceux des humains dans des situations d'apprentissage similaires. / Future applications of robotics, especially personal service robots, will require continuous adaptability to the environment, and particularly the ability to recognize new objects and learn new words through interaction with humans. Though having made tremendous progress by using machine learning, current computational models for object detection and representation still rely heavily on good training data and ideal learning supervision. In contrast, two year old children have an impressive ability to learn to recognize new objects and at the same time to learn the object names during interaction with adults and without precise supervision. Therefore, following the developmental robotics approach, we develop in the thesis learning approaches for objects, associating their names and corresponding features, inspired by the infants' capabilities, in particular, the ambiguous interaction with humans, inspired by the interaction that occurs between children and parents.The general idea is to use cross-situational learning (finding the common points between different presentations of an object or a feature) and to implement multi-modal concept discovery based on two latent topic discovery approaches : Non Negative Matrix Factorization (NMF) and Latent Dirichlet Association (LDA). Based on vision descriptors and sound/voice inputs, the proposed approaches will find the underlying regularities in the raw dataflow to produce sets of words and their associated visual meanings (eg. the name of an object and its shape, or a color adjective and its correspondence in images). We developed a complete approach based on these algorithms and compared their behavior in front of two sources of uncertainties: referential ambiguities, in situations where multiple words are given that describe multiple objects features; and linguistic ambiguities, in situations where keywords we intend to learn are merged in complete sentences. This thesis highlights the algorithmic solutions required to be able to perform efficient learning of these word-referent associations from data acquired in a simplified but realistic acquisition setup that made it possible to perform extensive simulations and preliminary experiments in real human-robot interactions. We also gave solutions for the automatic estimation of the number of topics for both NMF and LDA.We finally proposed two active learning strategies, Maximum Reconstruction Error Based Selection (MRES) and Confidence Based Exploration (CBE), to improve the quality and speed of incremental learning by letting the algorithms choose the next learning samples. We compared the behaviors produced by these algorithms and show their common points and differences with those of humans in similar learning situations.
|
7 |
Sentiment-Driven Topic Analysis Of Song LyricsSharma, Govind 08 1900 (has links) (PDF)
Sentiment Analysis is an area of Computer Science that deals with the impact a document makes on a user. The very field is further sub-divided into Opinion Mining and Emotion Analysis, the latter of which is the basis for the present work. Work on songs is aimed at building affective interactive applications such as music recommendation engines. Using song lyrics, we are interested in both supervised and unsupervised analyses, each of which has its own pros and cons.
For an unsupervised analysis (clustering), we use a standard probabilistic topic model called Latent Dirichlet Allocation (LDA). It mines topics from songs, which are nothing but probability distributions over the vocabulary of words. Some of the topics seem sentiment-based, motivating us to continue with this approach. We evaluate our clusters using a gold dataset collected from an apt website and get positive results. This approach would be useful in the absence of a supervisor dataset.
In another part of our work, we argue the inescapable existence of supervision in terms of having to manually analyse the topics returned. Further, we have also used explicit supervision in terms of a training dataset for a classifier to learn sentiment specific classes. This analysis helps reduce dimensionality and improve classification accuracy. We get excellent dimensionality reduction using Support Vector Machines (SVM) for feature selection. For re-classification, we use the Naive Bayes Classifier (NBC) and SVM, both of which perform well. We also use Non-negative Matrix Factorization (NMF) for classification, but observe that the results coincide with those of NBC, with no exceptions. This drives us towards establishing a theoretical equivalence between the two.
|
8 |
Fusion pour la séparation de sources audio / Fusion for audio source separationJaureguiberry, Xabier 16 June 2015 (has links)
La séparation aveugle de sources audio dans le cas sous-déterminé est un problème mathématique complexe dont il est aujourd'hui possible d'obtenir une solution satisfaisante, à condition de sélectionner la méthode la plus adaptée au problème posé et de savoir paramétrer celle-ci soigneusement. Afin d'automatiser cette étape de sélection déterminante, nous proposons dans cette thèse de recourir au principe de fusion. L'idée est simple : il s'agit, pour un problème donné, de sélectionner plusieurs méthodes de résolution plutôt qu'une seule et de les combiner afin d'en améliorer la solution. Pour cela, nous introduisons un cadre général de fusion qui consiste à formuler l'estimée d'une source comme la combinaison de plusieurs estimées de cette même source données par différents algorithmes de séparation, chaque estimée étant pondérée par un coefficient de fusion. Ces coefficients peuvent notamment être appris sur un ensemble d'apprentissage représentatif du problème posé par minimisation d'une fonction de coût liée à l'objectif de séparation. Pour aller plus loin, nous proposons également deux approches permettant d'adapter les coefficients de fusion au signal à séparer. La première formule la fusion dans un cadre bayésien, à la manière du moyennage bayésien de modèles. La deuxième exploite les réseaux de neurones profonds afin de déterminer des coefficients de fusion variant en temps. Toutes ces approches ont été évaluées sur deux corpus distincts : l'un dédié au rehaussement de la parole, l'autre dédié à l'extraction de voix chantée. Quelle que soit l'approche considérée, nos résultats montrent l'intérêt systématique de la fusion par rapport à la simple sélection, la fusion adaptative par réseau de neurones se révélant être la plus performante. / Underdetermined blind source separation is a complex mathematical problem that can be satisfyingly resolved for some practical applications, providing that the right separation method has been selected and carefully tuned. In order to automate this selection process, we propose in this thesis to resort to the principle of fusion which has been widely used in the related field of classification yet is still marginally exploited in source separation. Fusion consists in combining several methods to solve a given problem instead of selecting a unique one. To do so, we introduce a general fusion framework in which a source estimate is expressed as a linear combination of estimates of this same source given by different separation algorithms, each source estimate being weighted by a fusion coefficient. For a given task, fusion coefficients can then be learned on a representative training dataset by minimizing a cost function related to the separation objective. To go further, we also propose two ways to adapt the fusion coefficients to the mixture to be separated. The first one expresses the fusion of several non-negative matrix factorization (NMF) models in a Bayesian fashion similar to Bayesian model averaging. The second one aims at learning time-varying fusion coefficients thanks to deep neural networks. All proposed methods have been evaluated on two distinct corpora. The first one is dedicated to speech enhancement while the other deals with singing voice extraction. Experimental results show that fusion always outperform simple selection in all considered cases, best results being obtained by adaptive time-varying fusion with neural networks.
|
Page generated in 0.1534 seconds