Nonnegative matrix factorization with applications to sequencing data analysis

Latent factor models for count data are widely used to deconvolute mixed signals in biological data, such as sequencing data from transcriptome or microbiome studies. When pure samples such as single-cell transcriptome data are available, the estimators can be made considerably more accurate by using this extra information. However, this advantage quickly disappears in the presence of excessive zeros. To account for this phenomenon correctly, we propose a zero-inflated non-negative matrix factorization that models excessive zeros in both mixed and pure samples, and we derive an effective multiplicative parameter updating rule. In simulation studies, our method yields smaller bias than other deconvolution methods. We applied our approach to gene expression data from brain tissue as well as to fecal microbiome datasets, illustrating the superior performance of the approach. Our method is implemented as a publicly available R package, iNMF.
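For orientation, the sketch below shows what a multiplicative updating rule looks like in the standard, non-zero-inflated case: the classical updates that minimize the generalized Kullback-Leibler divergence, which corresponds to a Poisson likelihood for counts. It is not the zero-inflated rule derived in the thesis and not the iNMF package API; the function names `nmf_kl` and `kl_divergence`, the rank `k`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def kl_divergence(X, WH, eps=1e-10):
    """Generalized KL divergence between counts X and the reconstruction WH."""
    return np.sum(X * np.log((X + eps) / (WH + eps)) - X + WH)


def nmf_kl(X, k, n_iter=200, seed=0, eps=1e-10):
    """Plain NMF, X ~ W @ H, fitted with classical multiplicative updates.

    X: (features x samples) nonnegative count matrix.
    W: (features x k) signatures, H: (k x samples) loadings.
    Illustrative only: no zero-inflation component is modeled here.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Update W using the ratio of observed to fitted counts.
        R = X / (W @ H + eps)
        W *= (R @ H.T) / (H.sum(axis=1) + eps)
        # Recompute the ratio with the new W, then update H.
        R = X / (W @ H + eps)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + eps)
    return W, H


if __name__ == "__main__":
    # Simulated counts from a rank-3 model (hypothetical data).
    rng = np.random.default_rng(1)
    X = rng.poisson(rng.random((20, 3)) @ (10 * rng.random((3, 15)))).astype(float)
    W, H = nmf_kl(X, k=3)
    print("KL divergence after fitting:", kl_divergence(X, W @ H))
```

Because each update multiplies the current value by a nonnegative ratio, W and H stay nonnegative without any explicit projection step, which is the main appeal of this family of rules.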
In zero-inflated non-negative matrix factorization (iNMF) for the deconvolution of mixed signals in biological data, pure samples play a significant role: they resolve the identifiability issue and improve the accuracy of the estimates. One of the main issues with using single-cell data is that the identities (labels) of the cells are not given. Thus, it is crucial to sort these cells into their correct types computationally. We propose a nonlinear latent variable model, built on deep neural networks, that can be used for sorting pure samples as well as grouping mixed samples. We handle the computational difficulty by adopting variational autoencoding. In doing so, we keep the NMF structure in the decoder network, which makes the output of the network interpretable.
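One way to picture an NMF-structured decoder inside a variational autoencoder is the untrained forward pass sketched below. Everything in it is an assumption rather than the architecture of the thesis: a one-layer linear encoder on log-transformed counts, a softmax that turns the sampled latent vector into mixing proportions, a softplus that keeps the signature matrix nonnegative, and the hypothetical names `init_params`, `forward`, and `poisson_nll`. Training, including the KL term of the variational objective, is left out.

```python
import numpy as np


def softplus(z):
    """Smooth map from the real line to positive values."""
    return np.logaddexp(0.0, z)


def softmax(z, axis=0):
    """Normalize along `axis` so that entries are positive and sum to one."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(n_features, k, rng, scale=0.01):
    """Small random weights for a hypothetical one-layer encoder and the decoder."""
    return {
        "A_mu": scale * rng.standard_normal((k, n_features)),
        "b_mu": np.zeros(k),
        "A_lv": scale * rng.standard_normal((k, n_features)),
        "b_lv": np.zeros(k),
        "W_raw": rng.standard_normal((n_features, k)),
    }


def forward(X, params, rng):
    """One stochastic forward pass for X of shape (features x samples)."""
    L = np.log1p(X)
    # Encoder: mean and log-variance of the approximate posterior.
    mu = params["A_mu"] @ L + params["b_mu"][:, None]
    log_var = params["A_lv"] @ L + params["b_lv"][:, None]
    # Reparameterization: sample the latent variable z.
    z = mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)
    # Decoder with NMF structure: X_hat = W @ H, with W >= 0 and H >= 0.
    H = softmax(z, axis=0)            # (k x samples) mixing proportions
    W = softplus(params["W_raw"])     # (features x k) nonnegative signatures
    return W @ H, W, H


def poisson_nll(X, X_hat, eps=1e-10):
    """Poisson negative log-likelihood up to a constant in X."""
    return np.sum(X_hat - X * np.log(X_hat + eps))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.poisson(5.0, size=(30, 8)).astype(float)  # hypothetical counts
    params = init_params(30, 4, rng)
    X_hat, W, H = forward(X, params, rng)
    print("reconstruction loss:", poisson_nll(X, X_hat))
```

In this sketch the interpretability comes from the last step: each column of `H` can be read as the proportions of `k` components in a sample, and each column of `W` as the profile of one component, exactly as in a linear NMF.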

Identifier: oai:union.ndltd.org:bu.edu/oai:open.bu.edu:2144/43944
Date: 25 February 2022
Creators: Kong, Yixin
Contributors: Carvalho, Luis E.; Chun, Hyonho
Source Sets: Boston University
Language: en_US
Detected Language: English
Type: Thesis/Dissertation
