1 |
Processus gaussiens pour la séparation de sources et le codage informé / Gaussian processes for source separation and posterior source codingLiutkus, Antoine 27 November 2012 (has links)
La séparation de sources est la tâche qui consiste à récupérer plusieurs signaux dont on observe un ou plusieurs mélanges. Ce problème est particulièrement difficile et de manière à rendre la séparation possible, toute information supplémentaire connue sur les sources ou le mélange doit pouvoir être prise en compte. Dans cette thèse, je propose un formalisme général permettant d’inclure de telles connaissances dans les problèmes de séparation, où une source est modélisée comme la réalisation d’un processus gaussien. L’approche a de nombreux intérêts : elle généralise une grande partie des méthodes actuelles, elle permet la prise en compte de nombreux a priori et les paramètres du modèle peuvent être estimés efficacement. Ce cadre théorique est appliqué à la séparation informée de sources audio, où la séparation est assistée d'une information annexe calculée en amont de la séparation, lors d’une phase préliminaire où à la fois le mélange et les sources sont disponibles. Pour peu que cette information puisse se coder efficacement, cela rend possible des applications comme le karaoké ou la manipulation des différents instruments au sein d'un mix à un coût en débit bien plus faible que celui requis par la transmission séparée des sources. Ce problème de la séparation informée s’apparente fortement à un problème de codage multicanal. Cette analogie permet de placer la séparation informée dans un cadre théorique plus global où elle devient un problème de codage particulier et bénéficie à ce titre des résultats classiques de la théorie du codage, qui permettent d’optimiser efficacement les performances. / Source separation consists in recovering different signals that are only observed through their mixtures. To solve this difficult problem, any available prior information about the sources must be used so as to better identify them among all possible solutions. In this thesis, I propose a general framework, which permits to include a large diversity of prior information into source separation. In this framework, the sources signals are modeled as the outcomes of independent Gaussian processes, which are powerful and general nonparametric Bayesian models. This approach has many advantages: it permits the separation of sources defined on arbitrary input spaces, it permits to take many kinds of prior knowledge into account and also leads to automatic parameters estimation. This theoretical framework is applied to the informed source separation of audio sources. In this setup, a side-information is computed beforehand on the sources themselves during a so-called encoding stage where both sources and mixtures are available. In a subsequent decoding stage, the sources are recovered using this information and the mixtures only. Provided this information can be encoded efficiently, it permits popular applications such as karaoke or active listening using a very small bitrate compared to separate transmission of the sources. It became clear that informed source separation is very akin to a multichannel coding problem. With this in mind, it was straightforwardly cast into information theory as a particular source-coding problem, which permits to derive its optimal performance as rate-distortion functions as well as practical coding algorithms achieving these bounds.
|
2 |
Décompositions parcimonieuses pour l'analyse avancée de données en spectrométrie pour la Santé / Sparse decompositions for advanced data analysis of hyperspectral data in biological applicationsRapin, Jérémy 19 December 2014 (has links)
La séparation de sources en aveugle (SSA) vise à rechercher des signaux sources inconnus et mélangés de manière inconnue au sein de plusieurs observations. Cette approche très générique et non-supervisée ne fournit cependant pas nécessairement des résultats exploitables. Il est alors nécessaire d’ajouter des contraintes, notamment physiques, afin de privilégier la recherche de sources ayant une structure particulière. La factorisation en matrices positives (non-negative matrix factorization, NMF) qui fait plus précisément l’objet de cette thèse recherche ainsi des sources positives observées au travers de mélanges linéaires positifs.L’ajout de davantage d’information reste cependant souvent nécessaire afin de pouvoir séparer les sources. Nous nous intéressons ainsi au concept de parcimonie qui permet d’améliorer le contraste entre celles-ci tout en produisant des approches très robustes, en particulier au bruit. Nous montrons qu’afin d’obtenir des solutions stables, les contraintes de positivité et la régularisation parcimonieuse doivent être appliqués de manière adéquate. Aussi, l’utilisation de la parcimonie dans un espace transformé potentiellement redondant, permettant de capturer la structure de la plu- part des signaux naturels, se révèle difficile à appliquer au côté de la contrainte de positivité dans l’espace direct. Nous proposons ainsi un nouvel algorithme de NMF parcimonieuse, appelé nGMCA (non-negative Generalized Morphological Component Analysis), qui surmonte ces difficultés via l’utilisation de techniques de calcul proximal. Des expérimentations sur des données simulées montrent que cet algorithme est robuste à une contamination par du bruit additif Gaussien, à l’aide d’une gestion automatique du paramètre de parcimonie. Des comparaisons avec des algorithmes de l’état-de-l’art en NMF sur des données réalistes montrent l’efficacité ainsi que la robustesse de l’approche proposée.Finalement, nous appliquerons nGMCA sur des données de chromatographie en phase liquide - spectrométrie de masse (liquid chromatography - mass spectrometry, LC-MS). L’observation de ces données montre qu’elles sont contaminées par du bruit multiplicatif, lequel détériore grandement les résultats des algorithmes de NMF. Une extension de nGMCA conçue pour prendre en compte ce type de bruit à l’aide d’un a priori non-stationnaire permet alors d’obtenir d’excellents résultats sur des données réelles annotées. / Blind source separation aims at extracting unknown source signals from observations where these sources are mixed together by an unknown process. However, this very generic and non-supervised approach does not always provide exploitable results. Therefore, it is often necessary to add more constraints, generally arising from physical considerations, in order to favor the recovery of sources with a particular sought-after structure. Non-negative matrix factorization (NMF), which is the main focus of this thesis, aims at searching for non-negative sources which are observed through non-negative linear mixtures.In some cases, further information still remains necessary in order to correctly separate the sources. Here, we focus on the sparsity concept, which helps improving the contrast between the sources, while providing very robust approaches, even when the data are contaminated by noise. We show that in order to obtain stable solutions, the non-negativity and sparse constraints must be applied adequately. In addition, using sparsity in a potentially redundant transformed domain could allow to capture the structure of most of natural image, but this kind of regularization proves difficult to apply together with the non-negativity constraint in the direct domain. We therefore propose a sparse NMF algorithm, named nGMCA (non-negative Generalized Morphological Component Analysis), which overcomes these difficulties by making use of proximal calculus techniques. Experiments on simulated data show that this algorithm is robust to additive Gaussian noise contamination, with an automatic control of the sparsity parameter. This novel algorithm also proves to be more efficient and robust than other state-of-the-art NMF algorithms on realistic data.Finally, we apply nGMCA on liquid chromatography - mass spectrometry data. Observation of these data show that they are contaminated by multiplicative noise, which greatly deteriorates the results of the NMF algorithms. An extension of nGMCA was designed to take into account this type of noise, thanks to the use of a non-stationary prior. This extension is then able to obtain excellent results on annotated real data.
|
3 |
Provable Methods for Non-negative Matrix FactorizationPani, Jagdeep January 2016 (has links) (PDF)
Nonnegative matrix factorization (NMF) is an important data-analysis problem which concerns factoring a given d n matrix A with nonnegative entries into matrices B and C where B and C are d k and k n with nonnegative entries. It has numerous applications including Object recognition, Topic Modelling, Hyper-spectral imaging, Music transcription etc. In general, NMF is intractable and several heuristics exists to solve the problem of NMF. Recently there has been interest in investigating conditions under which NMF can be tractably recovered. We note that existing attempts make unrealistic assumptions and often the associated algorithms tend to be not scalable.
In this thesis, we make three major contributions: First, we formulate a model of NMF with assumptions which are natural and is a substantial weakening of separability. Unlike requiring a bound on the error in each column of (A BC) as was done in much of previous work, our assumptions are about aggregate errors, namely spectral norm of (A BC) i.e. jjA BCjj2 should be low. This is a much weaker error assumption and the associated B; C would be much more resilient than existing models. Second, we describe a robust polynomial time SVD-based algorithm, UTSVD, with realistic provable error guarantees and can handle higher levels of noise than previous algorithms. Indeed, experimentally we show that existing NMF models, which are based on separability assumptions, degrade much faster than UTSVD, in the presence of noise. Furthermore, when the data has dominant features, UTSVD significantly outperforms existing models. On real life datasets we again see a similar outperformance of UTSVD on clustering tasks. Finally, under a weaker model, we prove a robust version of uniqueness of NMF, where again, the word \robust" refers to realistic error bounds.
|
4 |
Scalable and distributed constrained low rank approximationsKannan, Ramakrishnan 27 May 2016 (has links)
Low rank approximation is the problem of finding two low rank factors W and H such that the rank(WH) << rank(A) and A ≈ WH. These low rank factors W and H can be constrained for meaningful physical interpretation and referred as Constrained Low Rank Approximation (CLRA). Like most of the constrained optimization problem, performing CLRA can be computationally expensive than its unconstrained counterpart. A widely used CLRA is the Non-negative Matrix Factorization (NMF) which enforces non-negativity constraints in each of its low rank factors W and H. In this thesis, I focus on scalable/distributed CLRA algorithms for constraints such as boundedness and non-negativity for large real world matrices that includes text, High Definition (HD) video, social networks and recommender systems. First, I begin with the Bounded Matrix Low Rank Approximation (BMA) which imposes a lower and an upper bound on every element of the lower rank matrix. BMA is more challenging than NMF as it imposes bounds on the product WH rather than on each of the low rank factors W and H. For very large input matrices, we extend our BMA algorithm to Block BMA that can scale to a large number of processors. In applications, such as HD video, where the input matrix to be factored is extremely large, distributed computation is inevitable and the network communication becomes a major performance bottleneck. Towards this end, we propose a novel distributed Communication Avoiding NMF (CANMF) algorithm that communicates only the right low rank factor to its neighboring machine. Finally, a general distributed HPC- NMF framework that uses HPC techniques in communication intensive NMF operations and suitable for broader class of NMF algorithms.
|
5 |
Super-resolution methods for fluorescence microscopyMandula, Ondrej January 2013 (has links)
Fluorescence microscopy is an important tool for biological research. However, the resolution of a standard fluorescence microscope is limited by diffraction, which makes it difficult to observe small details of a specimen’s structure. We have developed two fluorescence microscopy methods that achieve resolution beyond the classical diffraction limit. The first method represents an extension of localisation microscopy. We used nonnegative matrix factorisation (NMF) to model a noisy dataset of highly overlapping fluorophores with intermittent intensities. We can recover images of individual sources from the optimised model, despite their high mutual overlap in the original dataset. This allows us to consider blinking quantum dots as bright and stable fluorophores for localisation microscopy. Moreover, NMF allows recovery of sources each having a unique shape. Such a situation can arise, for example, when the sources are located in different focal planes, and NMF can potentially be used for three dimensional superresolution imaging. We discuss the practical aspects of applying NMF to real datasets, and show super-resolution images of biological samples labelled with quantum dots. It should be noted that this technique can be performed on any wide-field epifluorescence microscope equipped with a camera, which makes this super-resolution method very accessible to a wide scientific community. The second optical microscopy method we discuss in this thesis is a member of the growing family of structured illumination techniques. Our main goal is to apply structured illumination to thick fluorescent samples generating a large out-of-focus background. The out-of-focus fluorescence background degrades the illumination pattern, and the reconstructed images suffer from the influence of noise. We present a combination of structured illumination microscopy and line scanning. This technique reduces the out-of-focus fluorescence background, which improves the quality of the illumination pattern and therefore facilitates reconstruction. We present super-resolution, optically sectioned images of a thick fluorescent sample, revealing details of the specimen’s inner structure. In addition, in this thesis we also discuss a theoretical resolution limit for noisy and pixelated data. We correct a previously published expression for the so-called fundamental resolution measure (FREM) and derive FREM for two fluorophores with intermittent intensity. We show that the intensity intermittency of the sources (observed for quantum dots, for example) can increase the “resolution” defined in terms of FREM.
|
6 |
ENHANCE NMF-BASED RECOMMENDATION SYSTEMS WITH AUXILIARY INFORMATION IMPUTATIONAlghamedy, Fatemah 01 January 2019 (has links)
This dissertation studies the factors that negatively impact the accuracy of the collaborative filtering recommendation systems based on nonnegative matrix factorization (NMF). The keystone in the recommendation system is the rating that expresses the user's opinion about an item. One of the most significant issues in the recommendation systems is the lack of ratings. This issue is called "cold-start" issue, which appears clearly with New-Users who did not rate any item and New-Items, which did not receive any rating.
The traditional recommendation systems assume that users are independent and identically distributed and ignore the connections among users whereas the recommendation actually is a social activity. This dissertation aims to enhance NMF-based recommendation systems by utilizing the imputation method and limiting the errors that are introduced in the system. External information such as trust network and item categories are incorporated into NMF-based recommendation systems through the imputation.
The proposed approaches impute various subsets of the missing ratings. The subsets are defined based on the total number of the ratings of the user or item before the imputation, such as impute the missing ratings of New-Users, New-Items, or cold-start users or items that suffer from the lack of the ratings. In addition, several factors are analyzed that affect the prediction accuracy when the imputation method is utilized with NMF-based recommendation systems. These factors include the total number of the ratings of the user or item before the imputation, the total number of imputed ratings for each user and item, the average of imputed rating values, and the value of imputed rating values. In addition, several strategies are applied to select the subset of missing ratings for the imputation that lead to increasing the prediction accuracy and limiting the imputation error. Moreover, a comparison is conducted with some popular methods that are in common with the proposed method in utilizing the imputation to handle the lack of ratings, but they differ in the source of the imputed ratings.
Experiments on different large-size datasets are conducted to examine the proposed approaches and analyze the effects of the imputation on accuracy. Users and items are divided into three groups based on the total number of the ratings before the imputation is applied and their recommendation accuracy is calculated. The results show that the imputation enhances the recommendation system by capacitating the system to recommend items to New-Users, introduce New-Items to users, and increase the accuracy of the cold-start users and items. However, the analyzed factors play important roles in the recommendation accuracy and limit the error that is introduced from the imputation.
|
7 |
Processus gaussiens pour la séparation de sources et le codage informéLiutkus, Antoine 27 November 2012 (has links) (PDF)
La séparation de sources est la tâche qui consiste à récupérer plusieurs signaux dont on observe un ou plusieurs mélanges. Ce problème est particulièrement difficile et de manière à rendre la séparation possible, toute information supplémentaire connue sur les sources ou le mélange doit pouvoir être prise en compte. Dans cette thèse, je propose un formalisme général permettant d'inclure de telles connaissances dans les problèmes de séparation, où une source est modélisée comme la réalisation d'un processus gaussien. L'approche a de nombreux intérêts : elle généralise une grande partie des méthodes actuelles, elle permet la prise en compte de nombreux a priori et les paramètres du modèle peuvent être estimés efficacement. Ce cadre théorique est appliqué à la séparation informée de sources audio, où la séparation est assistée d'une information annexe calculée en amont de la séparation, lors d'une phase préliminaire où à la fois le mélange et les sources sont disponibles. Pour peu que cette information puisse se coder efficacement, cela rend possible des applications comme le karaoké ou la manipulation des différents instruments au sein d'un mix à un coût en débit bien plus faible que celui requis par la transmission séparée des sources. Ce problème de la séparation informée s'apparente fortement à un problème de codage multicanal. Cette analogie permet de placer la séparation informée dans un cadre théorique plus global où elle devient un problème de codage particulier et bénéficie à ce titre des résultats classiques de la théorie du codage, qui permettent d'optimiser efficacement les performances.
|
8 |
Approaches to Natural Language ProcessingSmith, Sydney 01 January 2018 (has links)
This paper explores topic modeling through the example text of Alice in Wonderland. It explores both singular value decomposition as well as non-‐‑negative matrix factorization as methods for feature extraction. The paper goes on to explore methods for partially supervised implementation of topic modeling through introducing themes. A large portion of the paper also focuses on implementation of these techniques in python as well as visualizations of the results which use a combination of python, html and java script along with the d3 framework. The paper concludes by presenting a mixture of SVD, NMF and partially-‐‑supervised NMF as a possible way to improve topic modeling.
|
9 |
Finding Community Structures In Social Activity DataPeng, Chengbin 19 May 2015 (has links)
Social activity data sets are increasing in number and volume. Finding community structure in such data is valuable in many applications. For example, understand- ing the community structure of social networks may reduce the spread of epidemics or boost advertising revenue; discovering partitions in tra c networks can help to optimize routing and to reduce congestion; finding a group of users with common interests can allow a system to recommend useful items. Among many aspects, qual- ity of inference and e ciency in finding community structures in such data sets are of paramount concern. In this thesis, we propose several approaches to improve com- munity detection in these aspects.
The first approach utilizes the concept of K-cores to reduce the size of the problem. The K-core of a graph is the largest subgraph within which each node has at least K connections. We propose a framework that accelerates community detection. It first applies a traditional algorithm that is relatively slow to the K-core, and then uses a fast heuristic to infer community labels for the remaining nodes.
The second approach is to scale the algorithm to multi-processor systems. We de- vise a scalable community detection algorithm for large networks based on stochastic block models. It is an alternating iterative algorithm using a maximum likelihood ap- proach. Compared with traditional inference algorithms for stochastic block models, our algorithm can scale to large networks and run on multi-processor systems. The
time complexity is linear in the number of edges of the input network.
The third approach is to improve the quality. We propose a framework for non- negative matrix factorization that allows the imposition of linear or approximately linear constraints on each factor. An example of the applications is to find community
structures in bipartite networks, which is useful in recommender systems.
Our algorithms are compared with the results in recent papers and their quality
and e ciency are verified by experiments.
|
10 |
Decomposition methods of NMR signal of complex mixtures : models ans applicationsToumi, Ichrak 28 October 2013 (has links)
L'objectif de ce travail était de tester des méthodes de SAS pour la séparation des spectres complexes RMN de mélanges dans les plus simples des composés purs. Dans une première partie, les méthodes à savoir JADE et NNSC ont été appliqué es dans le cadre de la DOSY , une application aux données CPMG était démontrée. Dans une deuxième partie, on s'est concentré sur le développement d'un algorithme efficace "beta-SNMF" . Ceci s'est montré plus performant que NNSC pour beta inférieure ou égale à 2. Etant donné que dans la littérature, le choix de beta a été adapté aux hypothèses statistiques sur le bruit additif, une étude statistique du bruit RMN de la DOSY a été faite pour obtenir une image plus complète de nos données RMN étudiées. / The objective of the work was to test BSS methods for the separation of the complex NMR spectra of mixtures into the simpler ones of the pure compounds. In a first part, known methods namely JADE and NNSC were applied in conjunction for DOSY , performing applications for CPMG were demonstrated. In a second part, we focused on developing an effective algorithm "beta- SNMF ". This was demonstrated to outperform NNSC for beta less or equal to 2. Since in the literature, the choice of beta has been adapted to the statistical assumptions on the additive noise, a statistical study of NMR DOSY noise was done to get a more complete picture about our studied NMR data.
|
Page generated in 0.0305 seconds