Spelling suggestions: "subject:"[een] NON-NEGATIVE MATRIX FACTORIZATION"" "subject:"[enn] NON-NEGATIVE MATRIX FACTORIZATION""
11 |
Nonnegative matrix factorization with applications to sequencing data analysisKong, Yixin 25 February 2022 (has links)
A latent factor model for count data is popularly applied when deconvoluting mixed signals in biological data as exemplified by sequencing data for transcriptome or microbiome studies. Due to the availability of pure samples such as single-cell transcriptome data, the estimators can enjoy much better accuracy by utilizing the extra information. However, such an advantage quickly disappears in the presence of excessive zeros. To correctly account for such a phenomenon, we propose a zero-inflated non-negative matrix factorization that models excessive zeros in both mixed and pure samples and derive an effective multiplicative parameter updating rule. In simulation studies, our method yields smaller bias comparing to other deconvolution methods. We applied our approach to gene expression from brain tissue as well as fecal microbiome datasets, illustrating the superior performance of the approach. Our method is implemented as a publicly available R-package, iNMF.
In zero-inflated non-negative matrix factorization (iNMF) for the deconvolution of mixed signals of biological data, pure-samples play a significant role by solving the identifiability issue as well as improving the accuracy of estimates. One of the main issues of using single-cell data is that the identities(labels) of the cells are not given. Thus, it is crucial to sort these cells into their correct types computationally. We propose a nonlinear latent variable model that can be used for sorting pure-samples as well as grouping mixed-samples via deep neural networks. The computational difficulty will be handled by adopting a method known as variational autoencoding. While doing so, we keep the NMF structure in a decoder neural network, which makes the output of the network interpretable.
|
12 |
Merging and Fractionation of Muscle Synergy Indicate the Recovery Process in Patients with Hemiplegia: The First Study of Patients after Subacute Stroke. / 筋シナジーの混合と分離は脳卒中後片麻痺者の回復過程を示す : 回復期脳卒中後片麻痺者を対象とした研究)Hashiguchi, Yu 26 March 2018 (has links)
京都大学 / 0048 / 新制・課程博士 / 博士(人間健康科学) / 甲第21040号 / 人健博第56号 / 新制||人健||4(附属図書館) / 京都大学大学院医学研究科人間健康科学系専攻 / (主査)教授 三谷 章, 教授 澤本 伸克, 教授 髙橋 良輔 / 学位規則第4条第1項該当 / Doctor of Human Health Sciences / Kyoto University / DFAM
|
13 |
Machine Learning Approaches to Historic Music RestorationColeman, Quinn 01 March 2021 (has links) (PDF)
In 1889, a representative of Thomas Edison recorded Johannes Brahms playing a piano arrangement of his piece titled “Hungarian Dance No. 1”. This recording acts as a window into how musical masters played in the 19th century. Yet, due to years of damage on the original recording medium of a wax cylinder, it was un-listenable by the time it was digitized into WAV format. This thesis presents machine learning approaches to an audio restoration system for historic music, which aims to convert this poor-quality Brahms piano recording into a higher quality one. Digital signal processing is paired with two machine learning approaches: non-negative matrix factorization and deep neural networks. Our results show the advantages and disadvantages of our approaches, when we compare them to a benchmark restoration of the same recording made by the Center for Computer Research in Music and Acoustics at Stanford University. They also show how this system provides the restoration potential for a wide range of historic music artifacts like this recording, requiring minimal overhead made possible by machine learning. Finally, we go into possible future improvements to these approaches.
|
14 |
Mining Structural and Functional Patterns in Pathogenic and Benign Genetic Variants through Non-negative Matrix FactorizationPeña-Guerra, Karla A 08 1900 (has links)
The main challenge in studying genetics has evolved from identifying variations and their impact on traits to comprehending the molecular mechanisms through which genetic variations affect human biology, including disease susceptibility. Despite having identified a vast number of variants associated with human traits through large scale genome wide association studies (GWAS) a significant portion of them still lack detailed insights into their underlying mechanisms [1]. Addressing this uncertainty requires the development of precise and scalable approaches to discover how genetic variation precisely influences phenotypes at a molecular level. In this study, we developed a pipeline to automate the annotation of structural variant feature effects. We applied this pipeline to a dataset of 33,942 variants from the ClinVar and GnomAD databases, which included both pathogenic and benign associations. To bridge the gap between genetic variation data and molecular phenotypes, I implemented Non-negative Matrix Factorization (NMF) on this large-scale dataset. This algorithm revealed 6 distinct clusters of variants with similar feature profiles. Among these groups, two exhibited a predominant presence of benign variants (accounting for 70% and 85% of the clusters), while one showed an almost equal distribution of pathogenic and benign variants. The remaining three groups were predominantly composed of pathogenic variants, comprising 68%, 83%, and 77% of the respective clusters. These findings revealed valuable insights into the underlying mechanisms contributing to pathogenicity. Further analysis of this dataset and the exploration of disease-related genes can enhance the accuracy of genetic diagnosis and therapeutic development through the direct inference of variants that are likely to affect the functioning of essential genes.
|
15 |
Accuracy and Interpretability Testing of Text Mining MethodsAshton, Triss A. 08 1900 (has links)
Extracting meaningful information from large collections of text data is problematic because of the sheer size of the database. However, automated analytic methods capable of processing such data have emerged. These methods, collectively called text mining first began to appear in 1988. A number of additional text mining methods quickly developed in independent research silos with each based on unique mathematical algorithms. How good each of these methods are at analyzing text is unclear. Method development typically evolves from some research silo centric requirement with the success of the method measured by a custom requirement-based metric. Results of the new method are then compared to another method that was similarly developed. The proposed research introduces an experimentally designed testing method to text mining that eliminates research silo bias and simultaneously evaluates methods from all of the major context-region text mining method families. The proposed research method follows a random block factorial design with two treatments consisting of three and five levels (RBF-35) with repeated measures. Contribution of the research is threefold. First, the users perceived a difference in the effectiveness of the various methods. Second, while still not clear, there are characteristics with in the text collection that affect the algorithms ability to extract meaningful results. Third, this research develops an experimental design process for testing the algorithms that is adaptable into other areas of software development and algorithm testing. This design eliminates the bias based practices historically employed by algorithm developers.
|
16 |
Discovering Hidden Networks Using Topic ModelingCooper, Wyatt 01 January 2017 (has links)
This paper explores topic modeling via unsupervised non-negative matrix factorization. This technique is used on a variety of sources in order to extract salient topics. From these topics, hidden entity networks are discovered and visualized in a graph representation. In addition, other visualization techniques such as examining the time series of a topic and examining the top words of a topic are used for evaluation and analysis. There is a large software component to this project, and so this paper will also focus on the design decisions that were made in order to make the program developed as versatile and extensible as possible.
|
17 |
Fatoração de matrizes no problema de coagrupamento com sobreposição de colunas / Matrix factorization for overlapping columns coclusteringBrunialti, Lucas Fernandes 31 August 2016 (has links)
Coagrupamento é uma estratégia para análise de dados capaz de encontrar grupos de dados, então denominados cogrupos, que são formados considerando subconjuntos diferentes das características descritivas dos dados. Contextos de aplicação caracterizados por apresentar subjetividade, como mineração de texto, são candidatos a serem submetidos à estratégia de coagrupamento; a flexibilidade em associar textos de acordo com características parciais representa um tratamento adequado a tal subjetividade. Um método para implementação de coagrupamento capaz de lidar com esse tipo de dados é a fatoração de matrizes. Nesta dissertação de mestrado são propostas duas estratégias para coagrupamento baseadas em fatoração de matrizes não-negativas, capazes de encontrar cogrupos organizados com sobreposição de colunas em uma matriz de valores reais positivos. As estratégias são apresentadas em termos de suas definições formais e seus algoritmos para implementação. Resultados experimentais quantitativos e qualitativos são fornecidos a partir de problemas baseados em conjuntos de dados sintéticos e em conjuntos de dados reais, sendo esses últimos contextualizados na área de mineração de texto. Os resultados são analisados em termos de quantização do espaço e capacidade de reconstrução, capacidade de agrupamento utilizando as métricas índice de Rand e informação mútua normalizada e geração de informação (interpretabilidade dos modelos). Os resultados confirmam a hipótese de que as estratégias propostas são capazes de descobrir cogrupos com sobreposição de forma natural, e que tal organização de cogrupos fornece informação detalhada, e portanto de valor diferenciado, para as áreas de análise de agrupamento e mineração de texto / Coclustering is a data analysis strategy which is able to discover data clusters, known as coclusters. This technique allows data to be clustered based on different subsets defined by data descriptive features. Application contexts characterized by subjectivity, such as text mining, are candidates for applying coclustering strategy due to the flexibility to associate documents according to partial features. The coclustering method can be implemented by means of matrix factorization, which is suitable to handle this type of data. In this thesis two strategies are proposed in non-negative matrix factorization for coclustering. These strategies are able to find column overlapping coclusters in a given dataset of positive data and are presented in terms of their formal definitions as well as their algorithms\' implementation. Quantitative and qualitative experimental results are presented through applying synthetic datasets and real datasets contextualized in text mining. This is accomplished by analyzing them in terms of space quantization, clustering capabilities and generated information (interpretability of models). The well known external metrics Rand index and normalized mutual information are used to achieve the analysis of clustering capabilities. Results confirm the hypothesis that the proposed strategies are able to discover overlapping coclusters naturally. Moreover, these coclusters produced by the new algorithms provide detailed information and are thus valuable for future research in cluster analysis and text mining
|
18 |
Simplificação e análise de redes com dados multivariados / Simplification and analysis of network with multivariate dataDias, Markus Diego Sampaio da Silva 17 October 2018 (has links)
As técnicas de visualização desempenham um papel importante na assistência e compreensão de redes e seus elementos. No entanto, quando enfrentamos redes massivas, a análise tende a ser prejudicada pela confusão visual. Esquemas de simplificação e agrupamento têm sido algumas das principais alternativas neste contexto. No entanto, a maioria das técnicas de simplificação consideram apenas informações extraídas da topologia da rede, desconsiderando conteúdo adicional definido nos nós ou arestas da rede. Neste trabalho, propomos dois estudos. Primeiro uma nova metodologia para simplificação de redes que utiliza tanto a topologia quanto o conteúdo associado aos elementos de rede. A metodologia proposta baseia-se na fatoração de matriz não negativa (NMF) e emparelhamento para realizar a simplificação, combinadas para gerar uma representação hierárquica da rede, agrupando elementos semelhantes em cada nível da hierarquia. Propomos também um estudo da utilização da teoria de processamento de sinal em grafos para filtrar os dados associados aos elementos da rede e o seu efeito no processo de simplificação. / Visualization tools play an important role in assisting and understanding networks and their elements. However, when faced with larger networks, analytical tasks can be hindered by visual clutter. Schemes of simplification and clustering have been a main alternative in this context. Nevertheless, most simplification techniques consider only information extracted from the network topology, disregarding additional content defined in nodes or edges. In this paper, we propose two studies. First, a new methodology for network simplification that uses both topology and content associated with network elements. The proposed methodology is based on non-negative matrix factorization (NMF) and graph matching to perform the simplification, combined to generate a hierarchical representation of the network, grouping the most similar elements at each level of a hierarchy. We also provide a study of the use of the graph signal processing theory to filter data associated to the elements of a network and its effect in the process of simplification.
|
19 |
Fatoração de matrizes no problema de coagrupamento com sobreposição de colunas / Matrix factorization for overlapping columns coclusteringLucas Fernandes Brunialti 31 August 2016 (has links)
Coagrupamento é uma estratégia para análise de dados capaz de encontrar grupos de dados, então denominados cogrupos, que são formados considerando subconjuntos diferentes das características descritivas dos dados. Contextos de aplicação caracterizados por apresentar subjetividade, como mineração de texto, são candidatos a serem submetidos à estratégia de coagrupamento; a flexibilidade em associar textos de acordo com características parciais representa um tratamento adequado a tal subjetividade. Um método para implementação de coagrupamento capaz de lidar com esse tipo de dados é a fatoração de matrizes. Nesta dissertação de mestrado são propostas duas estratégias para coagrupamento baseadas em fatoração de matrizes não-negativas, capazes de encontrar cogrupos organizados com sobreposição de colunas em uma matriz de valores reais positivos. As estratégias são apresentadas em termos de suas definições formais e seus algoritmos para implementação. Resultados experimentais quantitativos e qualitativos são fornecidos a partir de problemas baseados em conjuntos de dados sintéticos e em conjuntos de dados reais, sendo esses últimos contextualizados na área de mineração de texto. Os resultados são analisados em termos de quantização do espaço e capacidade de reconstrução, capacidade de agrupamento utilizando as métricas índice de Rand e informação mútua normalizada e geração de informação (interpretabilidade dos modelos). Os resultados confirmam a hipótese de que as estratégias propostas são capazes de descobrir cogrupos com sobreposição de forma natural, e que tal organização de cogrupos fornece informação detalhada, e portanto de valor diferenciado, para as áreas de análise de agrupamento e mineração de texto / Coclustering is a data analysis strategy which is able to discover data clusters, known as coclusters. This technique allows data to be clustered based on different subsets defined by data descriptive features. Application contexts characterized by subjectivity, such as text mining, are candidates for applying coclustering strategy due to the flexibility to associate documents according to partial features. The coclustering method can be implemented by means of matrix factorization, which is suitable to handle this type of data. In this thesis two strategies are proposed in non-negative matrix factorization for coclustering. These strategies are able to find column overlapping coclusters in a given dataset of positive data and are presented in terms of their formal definitions as well as their algorithms\' implementation. Quantitative and qualitative experimental results are presented through applying synthetic datasets and real datasets contextualized in text mining. This is accomplished by analyzing them in terms of space quantization, clustering capabilities and generated information (interpretability of models). The well known external metrics Rand index and normalized mutual information are used to achieve the analysis of clustering capabilities. Results confirm the hypothesis that the proposed strategies are able to discover overlapping coclusters naturally. Moreover, these coclusters produced by the new algorithms provide detailed information and are thus valuable for future research in cluster analysis and text mining
|
20 |
Characterisation of a developer’s experience fields using topic modellingDéhaye, Vincent January 2020 (has links)
Finding the most relevant candidate for a position represents an ubiquitous challenge for organisations. It can also be arduous for a candidate to explain on a concise resume what they have experience with. Due to the fact that the candidate usually has to select which experience to expose and filter out some of them, they might not be detected by the person carrying out the search, whereas they were indeed having the desired experience. In the field of software engineering, developing one's experience usually leaves traces behind: the code one produced. This project explores approaches to tackle the screening challenges with an automated way of extracting experience directly from code by defining common lexical patterns in code for different experience fields, using topic modeling. Two different techniques were compared. On one hand, Latent Dirichlet Allocation (LDA) is a generative statistical model which has proven to yield good results in topic modeling. On the other hand Non-Negative Matrix Factorization (NMF) is simply a singular value decomposition of a matrix representing the code corpus as word counts per piece of code.The code gathered consisted of 30 random repositories from all the collaborators of the open-source Ruby-on-Rails project on GitHub, which was then applied common natural language processing transformation steps. The results of both techniques were compared using respectively perplexity for LDA, reconstruction error for NMF and topic coherence for both. The two first represent how well the data could be represented by the topics produced while the later estimates the hanging and fitting together of the elements of a topic, and can depict human understandability and interpretability. Given that we did not have any similar work to benchmark with, the performance of the values obtained is hard to assess scientifically. However, the method seems promising as we would have been rather confident in assigning labels to 10 of the topics generated. The results imply that one could probably use natural language processing methods directly on code production in order to extend the detected fields of experience of a developer, with a finer granularity than traditional resumes and with fields definition evolving dynamically with the technology.
|
Page generated in 0.1116 seconds