Global ETD Search

1	Analysis and Prediction of Community Structure Using Unsupervised Learning Biradar, Rakesh 26 January 2016 (has links) In this thesis, we perform analysis and prediction for community structures in graphs using unsupervised learning. The methods we use require the data matrices to be of low rank, and such matrices appear quite often in real world problems across a broad range of domains. Such a modelling assumption is widely considered by classical algorithms such as principal component analysis (PCA), and the same assumption is often used to achieve dimensionality reduction. Dimension reduction, which is a classic method in unsupervised learning, can be leveraged in a wide array of problems, including prediction of strength of connection between communities from unlabeled or partially labeled data. Accordingly, a low rank assumption addresses many real world problems, and a low rank assumption has been used in this thesis to predict the strength of connection between communities in Amazon product data. In particular, we have analyzed real world data across retail and cyber domains, with the focus being on the retail domain. Herein, our focus is on analyzing the strength of connection between the communities in Amazon product data, where each community represents a group of products, and we are given the strength of connection between the individual products but not between the product communities. We call the strength of connection between individual products first order data and the strength of connection between communities second order data. This usage is inspired by [1] where first order time series are used to compute second order covariance matrices where such covariance matrices encode the strength of connection between the time series. In order to find the strength of connection between the communities, we define various metrics to measure this strength, and one of the goals of this thesis is to choose a good metric, which supports effective predictions. 
However, the main objective is to predict the strength of connection between most of the communities, given measurements of the strength of connection between only a few communities. To address this challenge, we use modern extensions of PCA such as eRPCA that can provide better predictions and can be computationally efficient for large problems. However, the current theory of eRPCA algorithms is not designed to treat problems where the initial data (such as the second order matrix of community strength) is both low rank and sparse. Therefore, we analyze the performance of the eRPCA algorithm on such data and modify our approaches for the particular structure of Amazon product communities to perform the necessary predictions. eRPCA Community Prediction Low Rank Sparse Matrix
2	Discovery and Interpretation of Subspace Structures in Omics Data by Low-Rank Representation Lu, Xiaoyu 10 1900 (has links) Indiana University-Purdue University Indianapolis (IUPUI) / Biological functions in cells are highly complicated and heterogeneous, and can be reflected by omics data, such as gene expression levels. Detecting subspace structures in omics data and understanding the diversity of the biological processes is essential to the full comprehension of biological mechanisms and complicated biological systems. In this thesis, we are developing novel statistical learning approaches to reveal the subspace structures in omics data. Specifically, we focus on three types of subspace structures: low-rank subspace, sparse subspace and covariates explainable subspace. For low-rank subspace, we developed a semi-supervised model SSMD to detect cell type specific low-rank structures and predict their relative proportions across different tissue samples. SSMD is the first computational tool that utilizes semi-supervised identification of cell types and their marker genes specific to each mouse tissue transcriptomics data, for better understanding of the disease microenvironment and downstream disease mechanism. For sparsity-driven sparse subspace, we proposed a novel positive and unlabeled learning model, namely PLUS, that could identify cancer metastasis related genes, predict cancer metastasis status and specifically address the under-diagnosis issue in studying metastasis potential. We found that the metastasis potential predicted by PLUS at diagnosis has a significantly strong association with patients’ progression-free survival in their follow-up data. Lastly, to discover the covariates explainable subspace, we proposed an analytical pipeline based on covariance regression, namely, scCovReg. 
We utilized scCovReg to detect the pathway level second-order variations using scRNA-Seq data in a statistically powerful manner, and to associate the second-order variations with important subject-level characteristics, such as disease status. In conclusion, we presented a set of state-of-the-art computational solutions for identifying sparse subspaces in omics data, which promise to provide insights into the mechanism in complex diseases. Bioinformatics Computational Biology Low-Rank Representation Subspace
3	Pushing the limits of spectroscopic imaging using novel low-rank based reconstruction algorithm Bhattacharya, Ipshita 01 May 2017 (has links) Non-invasively resolving the spatial distribution of tissue metabolites serves as a diagnostic tool to in-vivo metabolism, thus making magnetic resonance spectroscopic imaging (MRSI) a very useful application. The tissue concentrations of various metabolites reveal disease state and pseudo-progression of tumors. Also, bio-chemical changes manifest much earlier than structural changes that are achieved using standard magnetic resonance imaging (MRI). However, MRSI has not achieved its potential due to several technical challenges that are specific to it. Several technical advances in the field of MRI do not translate to MRSI. The specific limitations which make MRSI challenging include long scan times, poor spatial resolution, and extremely low signal to noise ratio (SNR). In the last few decades, research in MRSI has focused on advanced data acquisition and reconstruction methods; however, they cannot achieve high resolution and feasible scan time. Moreover there are several artifacts that lead to increase of spatial resolution not to mention starved SNR. Existing methods cannot deal with these limitations, which considerably impacts applications of MRSI. In this thesis work, we revisit these problems and introduce data acquisition and reconstruction techniques to address several such challenges. In the first part of the thesis we introduce a variable density spiral acquisition technique which achieves high SNR corresponding to metabolites of interest while reducing truncation artifacts. Along with that we develop a novel compartmentalized reconstruction framework to recover high resolution data from lipid unsuppressed data. Avoiding lipid suppression not only reduces scan time and reliability but also improves SNR which is otherwise reduced even further with existing lipid suppression methods. 
The proposed algorithm exploits the idea that the lipid and metabolite compartments reside in low-dimensional subspaces and we use orthogonality priors to reduce overlap of subspaces. We also look at spectral artifacts like Nyquist ghosting, which is a common problem with spectral interleaving. Especially in echo-planar spectroscopic imaging (EPSI), one of the most popular MRSI techniques, maintaining a spatial and spectral resolution requires interleaving. Due to scanner inconsistencies spurious peaks arise, which makes quantification inefficient. In this thesis a novel structural low-rank prior is used to reduce and denoise spectra and achieve high resolution EPSI data. Finally we look at accelerating multi-dimensional spectroscopic problems. Resolving spectra in two dimensions can help study overlapping spectra and achieve more insight. However with an increased dimension the scan time increases. We developed an algorithm for accelerating this method by recovering data from undersampled measurements. We demonstrate the performance in two applications, 2D infrared spectroscopy and 2D MR spectroscopy. The aim of the thesis is to solve these challenges in MRSI from a signal processing perspective and be able to achieve higher resolution data in practical scan time to ultimately help MRSI reach its potential. Low-rank based algorithms Magnetic Resonance Imaging Optimization Algorithm Spectroscopy Structured low-rank Electrical and Computer Engineering
4	Scalable and distributed constrained low rank approximations Kannan, Ramakrishnan 27 May 2016 (has links) Low rank approximation is the problem of finding two low rank factors W and H such that rank(WH) << rank(A) and A ≈ WH. These low rank factors W and H can be constrained for meaningful physical interpretation, in which case the problem is referred to as Constrained Low Rank Approximation (CLRA). Like most constrained optimization problems, CLRA can be computationally more expensive than its unconstrained counterpart. A widely used CLRA is the Non-negative Matrix Factorization (NMF), which enforces non-negativity constraints on each of its low rank factors W and H. In this thesis, I focus on scalable/distributed CLRA algorithms for constraints such as boundedness and non-negativity for large real world matrices that include text, High Definition (HD) video, social networks and recommender systems. First, I begin with the Bounded Matrix Low Rank Approximation (BMA), which imposes a lower and an upper bound on every element of the lower rank matrix. BMA is more challenging than NMF as it imposes bounds on the product WH rather than on each of the low rank factors W and H. For very large input matrices, we extend our BMA algorithm to Block BMA that can scale to a large number of processors. In applications, such as HD video, where the input matrix to be factored is extremely large, distributed computation is inevitable and the network communication becomes a major performance bottleneck. Towards this end, we propose a novel distributed Communication Avoiding NMF (CANMF) algorithm that communicates only the right low rank factor to its neighboring machine. Finally, we present a general distributed HPC-NMF framework that uses HPC techniques in communication intensive NMF operations and is suitable for a broader class of NMF algorithms. Distributed Scalable NMF Communication avoiding HPC Low rank approximation
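To make the factorization A ≈ WH with non-negative factors concrete, the following is a minimal single-machine sketch using the classic multiplicative update rules for NMF. It is an illustration only, not the BMA, CANMF, or HPC-NMF algorithms from the thesis; the function name `nmf_multiplicative`, the iteration count, and the small constant `eps` are assumptions made for this example.

```python
import numpy as np

def nmf_multiplicative(A, rank, iters=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix A by W @ H with W, H >= 0.

    Uses multiplicative updates for the Frobenius-norm objective.
    Hypothetical helper written for illustration only.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Strictly positive random initialization keeps the updates well defined.
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(iters):
        # Each update rescales entries by a non-negative ratio,
        # so the factors never become negative.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic non-negative matrix with an exact rank-3 factorization.
rng = np.random.default_rng(1)
A = rng.random((20, 3)) @ rng.random((3, 15))
W, H = nmf_multiplicative(A, rank=3)
rel_err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Because the constraints here apply to W and H separately, each factor can be updated entrywise; bounding the product WH, as BMA does, does not decompose this way, which is one reason the abstract calls it more challenging.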
5	Nonnegative matrix factorization algorithms and applications Ho, Ngoc-Diep 09 June 2008 (has links) Data-mining has become a hot topic in recent years. It consists of extracting relevant information or structures from data such as pictures, textual material, networks, etc. Such information or structures are usually not trivial to obtain and many techniques have been proposed to address this problem, including Independent Component Analysis, Latent Semantic Analysis, etc. Nonnegative Matrix Factorization is yet another technique that relies on the nonnegativity of the data and the nonnegativity assumption of the underlying model. The main advantage of this technique is that nonnegative objects are modeled by a combination of some basic nonnegative parts, which provides a physical interpretation of the construction of the objects. This is an exclusive feature that is known to be useful in many areas such as Computer Vision, Information Retrieval, etc. In this thesis, we look at several aspects of Nonnegative Matrix Factorization, focusing on numerical algorithms and their applications to different kinds of data and constraints. This includes Tensor Nonnegative Factorization, Weighted Nonnegative Matrix Factorization, Symmetric Nonnegative Matrix Factorization, Stochastic Matrix Approximation, etc. The recently proposed Rank-one Residue Iteration (RRI) is the common thread in all of these factorizations. It is shown to be a fast method with good convergence properties which adapts well to many situations. Approximation Factorization Nonnegative matrix Low-rank Data-mining Numerical algorithm
6	Remote-Sensed LIDAR Using Random Impulsive Scans Castorena, Juan 10 1900 (has links) Third generation full-waveform (FW) LIDAR systems image an entire scene by emitting laser pulses in particular directions and measuring the echoes. Each of these echoes provides range measurements about the objects intercepted by the laser pulse along a specified direction. By scanning through a specified region using a series of emitted pulses and observing their echoes, connected 1D profiles of 3D scenes can be readily obtained. This extra information has proven helpful in providing additional insight into the scene structure which can be used to construct effective characterizations and classifications. Unfortunately, massive amounts of data are typically collected which impose storage, processing and transmission limitations. To address these problems, a number of compression approaches have been developed in the literature. These, however, generally require the initial acquisition of large amounts of data only to later discard most of it by exploiting redundancies, thus sampling inefficiently. Based on this, our main goal is to apply efficient and effective LIDAR sampling schemes that achieve acceptable reconstruction quality of the 3D scenes. To achieve this goal, we propose using compressive sampling by emitting pulses only into random locations within the scene and collecting only the corresponding returned FW signals. Under this framework, the number of emissions would typically be much smaller than what traditional LIDAR systems require. Application of this requires, however, that scenes contain many degrees of freedom. Fortunately, such a requirement is satisfied in most natural and man-made scenes. Here, we propose to use a measure of rank as the measure of degrees of freedom. To recover the connected 1D profiles of the 3D scene, matrix completion is applied to the tensor slices. 
In this paper, we test our approach by showing that recovery of compressively sampled 1D profiles of actual 3D scenes is possible using only a subset of measurements. LIDAR full-waveform compressive sensing matrix completion low-rank approximations
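As a rough illustration of recovering a low-rank matrix from a random subset of its entries, the sketch below alternates between a truncated SVD and re-imposing the observed entries. This generic scheme and the names `complete_low_rank`, `mask`, and `err` are assumptions for this example; the abstract does not state which matrix completion solver was used on the LIDAR tensor slices.

```python
import numpy as np

def complete_low_rank(M_obs, mask, rank, iters=200):
    """Fill in missing entries of a matrix assumed to have low rank.

    M_obs : array with valid values where mask is True
    mask  : boolean array, True where an entry was measured
    Hypothetical helper written for illustration only.
    """
    X = np.where(mask, M_obs, 0.0)            # start from a zero-filled matrix
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # best rank-r approximation
        X = np.where(mask, M_obs, L)          # keep measurements, update the rest
    return L

# Synthetic rank-1 "scene" with roughly half of the entries measured at random.
rng = np.random.default_rng(0)
M = np.outer(rng.random(30) + 0.5, rng.random(25) + 0.5)
mask = rng.random(M.shape) < 0.5
L = complete_low_rank(M, mask, rank=1)

# Relative error measured only on the entries that were never observed.
err = np.linalg.norm((L - M)[~mask]) / np.linalg.norm(M[~mask])
print(f"relative error on unobserved entries: {err:.4f}")
```

The rank passed to the solver plays the role of the degrees-of-freedom measure mentioned in the abstract: the lower it is relative to the matrix size, the fewer random measurements are needed for a usable reconstruction.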
7	A non-asymptotic study of low-rank estimation of smooth kernels on graphs Rangel Walteros, Pedro Andres 12 January 2015 (has links) This dissertation investigates the problem of estimating a kernel over a large graph based on a sample of noisy observations of linear measurements of the kernel. We are interested in solving this estimation problem in the case when the sample size is much smaller than the ambient dimension of the kernel. As is typical in high-dimensional statistics, we are able to design a suitable estimator based on a small number of samples only when the target kernel belongs to a subset of restricted complexity. In our study, we restrict the complexity by considering scenarios where the target kernel is both low-rank and smooth over a graph. Using standard tools of non-parametric estimation, we derive a minimax lower bound on the least squares error in terms of the rank and the degree of smoothness of the target kernel. To prove the optimality of our lower-bound, we proceed to develop upper bounds on the error for a least-square estimator based on a non-convex penalty. The proof of these upper bounds depends on bounds for estimators over uniformly bounded function classes in terms of Rademacher complexities. We also propose a computationally tractable estimator based on least-squares with convex penalty. We derive an upper bound for the computationally tractable estimator in terms of a coherence function introduced in this work. Finally, we present some scenarios wherein this upper bound achieves a near-optimal rate. The motivations for studying such problems come from various real-world applications like recommender systems and social network analysis. Low-rank matrix completion Kernels on graphs High dimensional probability
8	LOW RANK AND SPARSE MODELING FOR DATA ANALYSIS Kang, Zhao 01 May 2017 (has links) (PDF) Nowadays, many real-world problems must deal with collections of high-dimensional data. High dimensional data usually have intrinsic low-dimensional representations, which are suited for subsequent analysis or processing. Therefore, finding low-dimensional representations is an essential step in many machine learning and data mining tasks. Low-rank and sparse modeling are emerging mathematical tools dealing with uncertainties of real-world data. Leveraging the underlying structure of data, low-rank and sparse modeling approaches have achieved impressive performance in many data analysis tasks. Since the general rank minimization problem is computationally NP-hard, the convex relaxation of the original problem is often solved. One popular heuristic method is to use the nuclear norm to approximate the rank of a matrix. Despite the success of nuclear norm minimization in capturing the low intrinsic-dimensionality of data, the nuclear norm minimizes not only the rank, but also the variance of the matrix and may not be a good approximation to the rank function in practical problems. To mitigate the above issue, this thesis proposes several nonconvex functions to approximate the rank function. However, it is often difficult to solve nonconvex problems. In this thesis, an optimization framework for nonconvex problems is further developed. The effectiveness of this approach is examined on several important applications, including matrix completion, robust principal component analysis, clustering, and recommender systems. Another issue associated with current clustering methods is that they work in two separate steps including similarity matrix computation and subsequent spectral clustering. The learned similarity matrix may not be optimal for subsequent clustering. Therefore, a unified algorithm framework is developed in this thesis. 
To capture the nonlinear relations among data points, we formulate this method in kernel space. Furthermore, because the obtained continuous spectral solutions could severely deviate from the true discrete cluster labels, a discrete transformation is further incorporated in our model. Finally, our framework can simultaneously learn the similarity matrix, kernel, and discrete cluster labels. The performance of the proposed algorithms is established through extensive experiments. This framework can be easily extended to semi-supervised classification. clustering data analytics data mining low rank machine learning sparse
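The remark that the nuclear norm penalizes more than the rank can be seen in its proximal operator, singular value thresholding, which shrinks every singular value by the same amount. The sketch below shows only this standard convex building block, not the nonconvex surrogates proposed in the thesis; the function name `svt` and the threshold `tau` are assumptions for this example.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the proximal operator of tau * nuclear norm.

    Every singular value is reduced by tau and clipped at zero.
    Hypothetical helper written for illustration only.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt

# Two dominant singular values (5 and 3) plus one small, noise-like value (0.1).
M = np.diag([5.0, 3.0, 0.1])
X = svt(M, tau=0.5)

print(np.linalg.matrix_rank(X))            # the small singular value is removed
print(np.linalg.svd(X, compute_uv=False))  # the large ones are shrunk as well
```

The small singular value is eliminated, which lowers the rank, but the two large singular values are also reduced by `tau`. That uniform shrinkage of the dominant components is the bias that motivates replacing the nuclear norm with a closer approximation to the rank function.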
9	Provable Algorithms for Scalable and Robust Low-Rank Matrix Recovery Li, Yuanxin 09 October 2018 (has links) No description available. Electrical Engineering low-rank matrix recovery outliers provability robustness scalability
10	Low rank methods for network alignment Huda Nassar (7047152) 15 August 2019 (has links) Network alignment is the problem of finding a common subgraph between two graphs, and more generally k graphs. The results of network alignment are often used for information transfer, which makes it a powerful tool for deducing information or insight about networks. Network alignment is tightly related to the subgraph isomorphism problem, which is known to be NP-hard; this makes the network alignment problem supremely hard in practice. Some algorithms have been devised to approach it via solving a form of a relaxed version of the NP-hard problem or by defining certain heuristic measures. These algorithms normally work well for problems when there is some form of prior known similarity between the nodes of the graphs to be aligned. The absence of such information makes the problem more challenging. In this scenario, these algorithms would often require much more time to finish executing, and even fail sometimes. The version of network alignment that this thesis tackles is the one when such prior similarity measures are absent. In this thesis, we address three versions of network alignment: (i) multimodal network alignment, (ii) standard pairwise network alignment, and (iii) multiple network alignment. A key common component of the algorithms presented in this thesis is exploiting a low rank structure in the network alignment problem and thus producing algorithms that run much faster than classic network alignment algorithms. Applied Computer Science network alignment low rank methods bipartite matching matching k-dimensional matching low rank matching
