  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

Analysis and Prediction of Community Structure Using Unsupervised Learning

Biradar, Rakesh 26 January 2016
In this thesis, we perform analysis and prediction of community structures in graphs using unsupervised learning. The methods we use require the data matrices to be of low rank, and such matrices appear quite often in real-world problems across a broad range of domains. This modelling assumption underlies classical algorithms such as principal component analysis (PCA), where it is used to achieve dimensionality reduction. Dimensionality reduction, a classic method in unsupervised learning, can be leveraged in a wide array of problems, including predicting the strength of connection between communities from unlabeled or partially labeled data. In this thesis, the low-rank assumption is used to predict the strength of connection between communities in Amazon product data. We have analyzed real-world data across the retail and cyber domains, with the focus on the retail domain. Specifically, we analyze the strength of connection between communities in Amazon product data, where each community represents a group of products and we are given the strength of connection between individual products but not between product communities. We call the strength of connection between individual products first-order data and the strength of connection between communities second-order data. This usage is inspired by [1], where first-order time series are used to compute second-order covariance matrices that encode the strength of connection between the time series. To find the strength of connection between communities, we define various metrics, and one goal of this thesis is to choose a good metric that supports effective predictions.
The main objective, however, is to predict the strength of connection between most of the communities, given measurements of the strength of connection between only a few of them. To address this challenge, we use modern extensions of PCA such as eRPCA, which can provide better predictions and can be computationally efficient for large problems. However, the current theory of eRPCA algorithms is not designed for problems where the initial data (such as the second-order matrix of community strengths) is both low rank and sparse. Therefore, we analyze the performance of the eRPCA algorithm on such data and adapt our approach to the particular structure of Amazon product communities to perform the necessary predictions.
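The abstract distinguishes first-order data (product-to-product strengths) from second-order data (community-to-community strengths) and mentions defining several metrics for the latter without naming them. The sketch below shows two plausible candidate metrics, a total and a per-pair mean, under an assumed input format; the function name, the `frozenset` edge keys, and both metrics are illustrative assumptions, not the thesis's definitions.

```python
from itertools import combinations


def community_strength(edges, communities, metric="mean"):
    """Aggregate first-order (product-product) weights into second-order
    (community-community) strengths.

    edges:       dict mapping frozenset({p, q}) -> connection strength
                 (hypothetical input format; missing pairs count as 0)
    communities: dict mapping community id -> set of product ids
    metric:      "sum"  -> total weight between the two communities
                 "mean" -> total weight / number of product pairs
    """
    strength = {}
    for a, b in combinations(sorted(communities), 2):
        total = sum(
            edges.get(frozenset((p, q)), 0.0)
            for p in communities[a]
            for q in communities[b]
        )
        if metric == "sum":
            strength[(a, b)] = total
        elif metric == "mean":
            strength[(a, b)] = total / (len(communities[a]) * len(communities[b]))
        else:
            raise ValueError(f"unknown metric: {metric!r}")
    return strength


# Toy example: three product communities and three observed product links.
communities = {"A": {1, 2}, "B": {3}, "C": {4, 5}}
edges = {frozenset((1, 3)): 2.0, frozenset((2, 3)): 1.0, frozenset((3, 4)): 4.0}
print(community_strength(edges, communities))
# {('A', 'B'): 1.5, ('A', 'C'): 0.0, ('B', 'C'): 2.0}
```

The "mean" variant normalizes for community size, so a large community is not rated as strongly connected merely because it has many products. The prediction step (eRPCA on a mostly unobserved second-order matrix) is not sketched here.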

Discovery and Interpretation of Subspace Structures in Omics Data by Low-Rank Representation

Lu, Xiaoyu 10 1900
Indiana University-Purdue University Indianapolis (IUPUI) / Biological functions in cells are highly complicated and heterogeneous, and they can be reflected in omics data such as gene expression levels. Detecting subspace structures in omics data and understanding the diversity of biological processes is essential to fully comprehending biological mechanisms and complex biological systems. In this thesis, we develop novel statistical learning approaches to reveal the subspace structures in omics data. Specifically, we focus on three types of subspace structure: low-rank subspaces, sparse subspaces, and covariate-explainable subspaces. For low-rank subspaces, we developed a semi-supervised model, SSMD, to detect cell-type-specific low-rank structures and predict their relative proportions across different tissue samples. SSMD is the first computational tool that uses semi-supervised identification of cell types and their marker genes specific to each mouse tissue transcriptomics dataset, enabling a better understanding of the disease microenvironment and downstream disease mechanisms. For sparsity-driven subspaces, we proposed a novel positive and unlabeled learning model, PLUS, that can identify cancer-metastasis-related genes, predict cancer metastasis status, and specifically address the under-diagnosis issue in studying metastasis potential. We found that the metastasis potential predicted by PLUS at diagnosis is significantly associated with patients' progression-free survival in their follow-up data. Lastly, to discover covariate-explainable subspaces, we proposed an analytical pipeline based on covariance regression, scCovReg. We used scCovReg to detect pathway-level second-order variations in scRNA-Seq data in a statistically powerful manner and to associate these second-order variations with important subject-level characteristics, such as disease status.
In conclusion, we presented a set of state-of-the-art computational solutions for identifying sparse subspaces in omics data, which promise to provide insights into the mechanisms of complex diseases.
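PLUS is described only as a positive and unlabeled (PU) learning model, so its internals are not reproduced here. To illustrate why PU learning suits an under-diagnosis setting, the sketch below uses a different, well-known PU technique, the Elkan-Noto correction: a classifier trained to separate labeled positives from unlabeled samples outputs P(labeled | x), and dividing by the estimated labeling frequency c recovers P(positive | x). The scores are assumed to come from any probabilistic classifier; function names and the threshold are hypothetical.

```python
def estimate_label_frequency(scores_on_labeled_positives):
    # Elkan-Noto estimator: c = P(labeled | truly positive) is approximated by
    # the mean "labeled vs. unlabeled" score over the known positives.
    if not scores_on_labeled_positives:
        raise ValueError("need at least one labeled positive")
    return sum(scores_on_labeled_positives) / len(scores_on_labeled_positives)


def pu_posterior(score, c):
    # P(positive | x) = P(labeled | x) / c under the "selected completely at
    # random" assumption; clipped to 1 because c is only an estimate.
    return min(1.0, score / c)


def flag_hidden_positives(unlabeled_scores, c, threshold=0.5):
    # Indices of unlabeled samples whose corrected posterior exceeds the
    # threshold, i.e. candidate positives that were never labeled.
    return [k for k, s in enumerate(unlabeled_scores) if pu_posterior(s, c) > threshold]


c = estimate_label_frequency([0.5, 0.4, 0.6])    # c is about 0.5
print(flag_hidden_positives([0.1, 0.3, 0.2], c))  # [1]
```

A value of c well below 1 means many true positives sit unlabeled, which is the situation an under-diagnosed metastasis label would create.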

Pushing the limits of spectroscopic imaging using novel low-rank based reconstruction algorithm

Bhattacharya, Ipshita 01 May 2017
Non-invasively resolving the spatial distribution of tissue metabolites provides a diagnostic window into in vivo metabolism, which makes magnetic resonance spectroscopic imaging (MRSI) a very useful application. The tissue concentrations of various metabolites reveal disease state and pseudo-progression of tumors. Moreover, biochemical changes manifest much earlier than the structural changes detected by standard magnetic resonance imaging (MRI). However, MRSI has not achieved its potential because of several technical challenges specific to it, and many technical advances in MRI do not translate to MRSI. The specific limitations that make MRSI challenging include long scan times, poor spatial resolution, and an extremely low signal-to-noise ratio (SNR). In the last few decades, MRSI research has focused on advanced data acquisition and reconstruction methods; however, these cannot achieve high resolution within a feasible scan time. Moreover, several artifacts degrade the effective spatial resolution, not to mention the starved SNR. Existing methods cannot deal with these limitations, which considerably restricts the applications of MRSI. In this thesis, we revisit these problems and introduce data acquisition and reconstruction techniques to address several of these challenges. In the first part of the thesis, we introduce a variable-density spiral acquisition technique that achieves high SNR for the metabolites of interest while reducing truncation artifacts. Along with it, we develop a novel compartmentalized reconstruction framework to recover high-resolution data from lipid-unsuppressed data. Avoiding lipid suppression not only reduces scan time and improves reliability but also improves SNR, which existing lipid suppression methods reduce even further. The proposed algorithm exploits the idea that the lipid and metabolite compartments reside in low-dimensional subspaces, and we use orthogonality priors to reduce the overlap between these subspaces.
We also look at spectral artifacts such as Nyquist ghosting, a common problem with spectral interleaving. Especially in echo-planar spectroscopic imaging (EPSI), one of the most popular MRSI techniques, maintaining spatial and spectral resolution requires interleaving. Scanner inconsistencies give rise to spurious peaks, which make quantification inefficient. In this thesis, a novel structured low-rank prior is used to reduce these artifacts, denoise the spectra, and achieve high-resolution EPSI data. Finally, we look at accelerating multi-dimensional spectroscopic problems. Resolving spectra in two dimensions can help separate overlapping spectra and provide more insight; however, the added dimension increases the scan time. We developed an algorithm that accelerates such acquisitions by recovering data from undersampled measurements, and we demonstrate its performance in two applications: 2D infrared spectroscopy and 2D MR spectroscopy. The aim of the thesis is to solve these challenges in MRSI from a signal processing perspective, achieving higher-resolution data in a practical scan time so that MRSI can ultimately reach its potential.
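The compartmentalized reconstruction rests on the idea that lipid and metabolite signals occupy low-dimensional subspaces with little overlap. The toy sketch below shows only that geometric idea: if the lipid subspace is known and the metabolite component is orthogonal to it, projecting a voxel's signal onto the orthogonal complement removes the lipid contribution. The thesis uses orthogonality as a prior inside an iterative reconstruction rather than a hard projection; the function names and the tiny 4-sample "spectra" are hypothetical.

```python
import math


def inner(a, b):
    # Hermitian inner product <a, b> = sum(conj(a_i) * b_i); works for real or complex lists.
    return sum(x.conjugate() * y for x, y in zip(a, b))


def orthonormal_basis(vectors, tol=1e-12):
    # Gram-Schmidt orthonormalization; drops (numerically) dependent vectors.
    basis = []
    for v in vectors:
        w = list(v)
        for u in basis:
            c = inner(u, w)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(abs(inner(w, w)))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def remove_lipid_component(signal, lipid_vectors):
    # Project the signal onto the orthogonal complement of span(lipid_vectors).
    out = list(signal)
    for u in orthonormal_basis(lipid_vectors):
        c = inner(u, out)
        out = [oi - c * ui for oi, ui in zip(out, u)]
    return out


lipids = [[1, 1, 0, 0], [0, 1, 1, 0]]    # hypothetical lipid basis spectra
metabolite = [0, 0, 0, 2]                # orthogonal to the lipid span
mixed = [3, 2, -1, 2]                    # 3*lipids[0] - lipids[1] + metabolite
print(remove_lipid_component(mixed, lipids))  # approximately [0, 0, 0, 2]
```

When the two subspaces overlap, a hard projection also removes part of the metabolite signal, which is why an explicit prior that discourages overlap is useful.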

A non-asymptotic study of low-rank estimation of smooth kernels on graphs

Rangel Walteros, Pedro Andres 12 January 2015
This dissertation investigates the problem of estimating a kernel over a large graph from a sample of noisy observations of linear measurements of the kernel. We are interested in solving this estimation problem when the sample size is much smaller than the ambient dimension of the kernel. As is typical in high-dimensional statistics, we can design a suitable estimator from a small number of samples only when the target kernel belongs to a subset of restricted complexity. In our study, we restrict the complexity by considering scenarios where the target kernel is both low-rank and smooth over a graph. Using standard tools of non-parametric estimation, we derive a minimax lower bound on the least-squares error in terms of the rank and the degree of smoothness of the target kernel. To show that this lower bound is optimal, we develop upper bounds on the error of a least-squares estimator based on a non-convex penalty. The proof of these upper bounds relies on bounds for estimators over uniformly bounded function classes in terms of Rademacher complexities. We also propose a computationally tractable estimator based on least squares with a convex penalty, and we derive an upper bound for it in terms of a coherence function introduced in this work. Finally, we present some scenarios in which this upper bound achieves a near-optimal rate. The motivation for studying such problems comes from real-world applications such as recommender systems and social network analysis.
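"Smooth over a graph" is commonly quantified with the graph Laplacian L: a vector x is smooth when x^T L x, the weighted sum of squared differences across edges, is small, and a symmetric kernel K is smooth when tr(K L) is small. The abstract does not spell out its smoothness class, so this Laplacian measure is an assumption chosen for illustration. The sketch computes both quantities from an edge list, using the expansion of L = D - W to avoid forming L explicitly; names are hypothetical.

```python
def laplacian_quadratic(x, edges):
    # x^T L x = sum over undirected edges (i, j, w) of w * (x_i - x_j)^2.
    return sum(w * (x[i] - x[j]) ** 2 for i, j, w in edges)


def kernel_smoothness(K, edges):
    # tr(K L) for a symmetric kernel K (list of lists); expanding L = D - W
    # gives the sum over edges of w * (K_ii + K_jj - 2 * K_ij).
    return sum(w * (K[i][i] + K[j][j] - 2 * K[i][j]) for i, j, w in edges)


# Path graph 0 - 1 - 2 with unit weights.
edges = [(0, 1, 1.0), (1, 2, 1.0)]
x = [1.0, 2.0, 4.0]
K = [[a * b for b in x] for a in x]    # rank-1 kernel K = x x^T
print(laplacian_quadratic(x, edges))   # 5.0
print(kernel_smoothness(K, edges))     # 5.0, since tr(x x^T L) = x^T L x
```

For a low-rank kernel K = sum_k lambda_k v_k v_k^T, tr(K L) = sum_k lambda_k v_k^T L v_k, so a kernel that is both low-rank and smooth is built from a few smooth eigenvectors.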

Scalable and distributed constrained low rank approximations

Kannan, Ramakrishnan 27 May 2016
Low rank approximation is the problem of finding two low rank factors W and H such that rank(WH) ≪ rank(A) and A ≈ WH. When W and H are constrained so that they admit a meaningful physical interpretation, the problem is referred to as Constrained Low Rank Approximation (CLRA). Like most constrained optimization problems, CLRA can be more computationally expensive than its unconstrained counterpart. A widely used CLRA is Non-negative Matrix Factorization (NMF), which enforces non-negativity constraints on each of its low rank factors W and H. In this thesis, I focus on scalable/distributed CLRA algorithms for constraints such as boundedness and non-negativity on large real-world matrices, including text, High Definition (HD) video, social networks, and recommender systems. First, I begin with Bounded Matrix Low Rank Approximation (BMA), which imposes a lower and an upper bound on every element of the low rank product WH. BMA is more challenging than NMF because it imposes bounds on the product WH rather than on each of the low rank factors W and H. For very large input matrices, we extend our BMA algorithm to Block BMA, which can scale to a large number of processors. In applications such as HD video, where the input matrix to be factored is extremely large, distributed computation is inevitable and network communication becomes a major performance bottleneck. To this end, we propose a novel distributed Communication Avoiding NMF (CANMF) algorithm that communicates only the right low rank factor to its neighboring machine. Finally, we present a general distributed HPC-NMF framework that applies HPC techniques to communication-intensive NMF operations and is suitable for a broader class of NMF algorithms.
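For reference, the basic NMF problem that BMA, CANMF, and HPC-NMF build on can be solved on small inputs with the classic Lee-Seung multiplicative updates, which keep W and H nonnegative because every update multiplies by a ratio of nonnegative terms. The sketch below is that textbook baseline in pure Python, not any of the thesis's algorithms; it is meant only to make the objective A ≈ WH with W, H ≥ 0 concrete. Names and the toy matrix are illustrative.

```python
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def frobenius_error(A, W, H):
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb)) ** 0.5


def nmf_mu(A, k, iters=500, seed=0, eps=1e-12):
    """Lee-Seung multiplicative updates for min ||A - WH||_F with W, H >= 0."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H); eps guards against division by zero.
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[h * nu / (de + eps) for h, nu, de in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (A H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[w * nu / (de + eps) for w, nu, de in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


A = [[1, 2, 0], [2, 5, 3], [0, 3, 9], [1, 3, 3]]  # exactly rank 2, nonnegative
W, H = nmf_mu(A, k=2)
print(round(frobenius_error(A, W, H), 3))
```

In a distributed setting the expensive pieces are the products W^T A and A H^T, which is where communication-avoiding schemes such as the CANMF described above come into play.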


Kang, Zhao 01 May 2017
Nowadays, many real-world problems must deal with collections of high-dimensional data. High-dimensional data usually have intrinsic low-dimensional representations, which are suited to subsequent analysis or processing, so finding such representations is an essential step in many machine learning and data mining tasks. Low-rank and sparse modeling are emerging mathematical tools for dealing with the uncertainties of real-world data. By leveraging the underlying structure of data, low-rank and sparse modeling approaches have achieved impressive performance in many data analysis tasks. Since the general rank minimization problem is NP-hard, a convex relaxation of the original problem is often solved instead. One popular heuristic is to use the nuclear norm to approximate the rank of a matrix. Despite the success of nuclear norm minimization in capturing the low intrinsic dimensionality of data, the nuclear norm minimizes not only the rank but also the variance of the matrix, and it may not be a good approximation to the rank function in practical problems. To mitigate this issue, this thesis proposes several nonconvex functions to approximate the rank function. Because nonconvex problems are often difficult to solve, the thesis further develops an optimization framework for them. The effectiveness of this approach is examined on several important applications, including matrix completion, robust principal component analysis, clustering, and recommender systems. Another issue with current clustering methods is that they work in two separate steps, similarity matrix computation followed by spectral clustering, so the learned similarity matrix may not be optimal for the clustering step. Therefore, this thesis develops a unified algorithmic framework. To capture the nonlinear relations among data points, we formulate this method in kernel space.
Furthermore, because the continuous spectral solutions obtained this way can deviate severely from the true discrete cluster labels, a discrete transformation is also incorporated into our model. As a result, our framework can simultaneously learn the similarity matrix, the kernel, and the discrete cluster labels. The performance of the proposed algorithms is established through extensive experiments, and the framework can easily be extended to semi-supervised classification.
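The argument against the nuclear norm is that it sums the singular values, so it penalizes large singular values (the "variance" of the matrix) as heavily as it penalizes rank. The abstract does not name its nonconvex surrogates; the sketch below uses the log-determinant-style surrogate sum(log(1 + sigma_i / delta)) purely as an assumed, representative example, and compares how the three quantities react when a matrix is scaled up without changing its rank.

```python
import math


def rank(sigmas, tol=1e-10):
    # Number of singular values above a numerical tolerance.
    return sum(1 for s in sigmas if s > tol)


def nuclear_norm(sigmas):
    # Convex surrogate: grows linearly with every singular value.
    return sum(sigmas)


def log_det_surrogate(sigmas, delta=1.0):
    # Assumed nonconvex surrogate: grows only logarithmically, so large
    # singular values are penalized much less than by the nuclear norm.
    return sum(math.log(1 + s / delta) for s in sigmas)


sigmas = [10.0, 1.0, 0.0]            # singular values of some rank-2 matrix
scaled = [10 * s for s in sigmas]    # same matrix multiplied by 10
for f in (rank, nuclear_norm, log_det_surrogate):
    print(f.__name__, f(sigmas), f(scaled))
```

Scaling by 10 leaves the rank at 2 and multiplies the nuclear norm by 10, while the log-based surrogate grows by a factor of roughly 2.3, so it tracks the rank function more closely.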

Remote-Sensed LIDAR Using Random Impulsive Scans

Castorena, Juan 10 1900
Third-generation full-waveform (FW) LIDAR systems image an entire scene by emitting laser pulses in particular directions and measuring the echoes. Each echo provides range measurements of the objects intercepted by the laser pulse along a specified direction. By scanning a specified region with a series of emitted pulses and observing their echoes, connected 1D profiles of 3D scenes can be readily obtained. This extra information has proven helpful in providing additional insight into scene structure, which can be used to construct effective characterizations and classifications. Unfortunately, massive amounts of data are typically collected, which impose storage, processing, and transmission limitations. To address these problems, a number of compression approaches have been developed in the literature. These, however, generally require the initial acquisition of large amounts of data only to discard most of it later by exploiting redundancies, and thus sample inefficiently. Our main goal is therefore to apply efficient and effective LIDAR sampling schemes that achieve acceptable reconstruction quality of the 3D scenes. To achieve this goal, we propose using compressive sampling: emitting pulses only into random locations within the scene and collecting only the corresponding returned FW signals. Under this framework, the number of emissions would typically be much smaller than what traditional LIDAR systems require. Applying this approach, however, requires that scenes contain few degrees of freedom. Fortunately, this requirement is satisfied by most natural and man-made scenes. Here, we propose to use rank as the measure of degrees of freedom. To recover the connected 1D profiles of the 3D scene, matrix completion is applied to the tensor slices. In this work, we test our approach by showing that compressively sampled 1D profiles of actual 3D scenes can be recovered from only a subset of measurements.
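Why a random subset of emissions can suffice is easiest to see in the extreme low-rank case. For a noiseless rank-1 matrix X = u v^T with nonzero entries, every sampled entry X_ij = u_i v_j links row i to column j, and if the sampled entries connect all rows and columns, every missing entry is determined. The sketch below draws a random sampling mask and completes the matrix by propagating along the sampled entries. This toy is not the matrix completion solver used in the work, which handles noise and higher rank; the names, sizes, and 60% sampling rate are assumptions.

```python
import random
from collections import deque


def random_mask(m, n, fraction, seed=0):
    # Choose round(fraction * m * n) distinct (row, col) sampling locations.
    rng = random.Random(seed)
    cells = [(i, j) for i in range(m) for j in range(n)]
    return set(rng.sample(cells, round(fraction * m * n)))


def complete_rank1(observed, m, n):
    """Complete a noiseless rank-1 matrix with nonzero entries from a dict
    (i, j) -> value. Returns None if the samples do not connect every row
    and column (the matrix is then not determined by the samples)."""
    cols_of = {i: [] for i in range(m)}
    rows_of = {j: [] for j in range(n)}
    for i, j in observed:
        cols_of[i].append(j)
        rows_of[j].append(i)
    u, v = [None] * m, [None] * n
    u[0] = 1.0                       # fix the scale ambiguity (a*u)(v/a)^T
    queue = deque([("row", 0)])
    while queue:                     # breadth-first propagation of factors
        kind, idx = queue.popleft()
        if kind == "row":
            for j in cols_of[idx]:
                if v[j] is None:
                    v[j] = observed[(idx, j)] / u[idx]
                    queue.append(("col", j))
        else:
            for i in rows_of[idx]:
                if u[i] is None:
                    u[i] = observed[(i, idx)] / v[idx]
                    queue.append(("row", i))
    if None in u or None in v:
        return None
    return [[u[i] * v[j] for j in range(n)] for i in range(m)]


truth_u = [1.0, 2.0, 3.0, 4.0, 5.0]
truth_v = [2.0, 1.0, 3.0, 1.0, 2.0, 4.0]
truth = [[a * b for b in truth_v] for a in truth_u]
mask = random_mask(5, 6, 0.6, seed=1)
completed = complete_rank1({ij: truth[ij[0]][ij[1]] for ij in mask}, 5, 6)
```

If the random draw leaves a row or column unconnected, the function reports failure instead of guessing, which mirrors the requirement that random sampling cover the scene well enough for recovery.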

Provable Algorithms for Scalable and Robust Low-Rank Matrix Recovery

Li, Yuanxin 09 October 2018
No description available.

Nonnegative matrix factorization algorithms and applications

Ho, Ngoc-Diep 09 June 2008
Data mining has become a hot topic in recent years. It consists of extracting relevant information or structures from data such as pictures, textual material, and networks. Such information or structures are usually not trivial to obtain, and many techniques have been proposed to address this problem, including Independent Component Analysis and Latent Semantic Analysis. Nonnegative Matrix Factorization is yet another technique; it relies on the nonnegativity of the data and the nonnegativity assumption of the underlying model. The main advantage of this technique is that nonnegative objects are modeled by a combination of basic nonnegative parts, which provides a physical interpretation of how the objects are constructed. This distinctive feature is known to be useful in many areas, such as Computer Vision and Information Retrieval. In this thesis, we look at several aspects of Nonnegative Matrix Factorization, focusing on numerical algorithms and their applications to different kinds of data and constraints. These include Tensor Nonnegative Factorization, Weighted Nonnegative Matrix Factorization, Symmetric Nonnegative Matrix Factorization, and Stochastic Matrix Approximation. The recently proposed Rank-one Residue Iteration (RRI) is the common thread in all of these factorizations; it is shown to be a fast method with good convergence properties that adapts well to many situations.
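Rank-one Residue Iteration updates one rank-one term w_t h_t^T at a time: it forms the residue R_t = A minus all other rank-one terms, then solves for h_t and w_t in closed form, each being the nonnegative part of a least-squares solution. The sketch below is a minimal, unoptimized version of that update for plain NMF, written from this description; it omits the safeguards, weights, and tensor or symmetric variants studied in the thesis, and the names and toy matrix are illustrative.

```python
import random


def frobenius_error(A, W, H):
    k = len(H)
    return sum(
        (A[i][j] - sum(W[i][s] * H[s][j] for s in range(k))) ** 2
        for i in range(len(A)) for j in range(len(A[0]))
    ) ** 0.5


def rri_nmf(A, k, iters=200, seed=0):
    """NMF via rank-one residue updates; W is m x k, H is k x n."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() for _ in range(k)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        for t in range(k):
            # Residue R_t = A - sum_{s != t} w_s h_s^T.
            R = [[A[i][j] - sum(W[i][s] * H[s][j] for s in range(k) if s != t)
                  for j in range(n)] for i in range(m)]
            w = [W[i][t] for i in range(m)]
            ww = sum(x * x for x in w)
            if ww > 0:
                # h_t = max(0, R_t^T w_t / ||w_t||^2)
                H[t] = [max(0.0, sum(R[i][j] * w[i] for i in range(m)) / ww)
                        for j in range(n)]
            h = H[t]
            hh = sum(x * x for x in h)
            if hh > 0:
                # w_t = max(0, R_t h_t / ||h_t||^2)
                for i in range(m):
                    W[i][t] = max(0.0, sum(R[i][j] * h[j] for j in range(n)) / hh)
    return W, H


A = [[1, 2, 0], [2, 5, 3], [0, 3, 9], [1, 3, 3]]
W, H = rri_nmf(A, k=2)
print(round(frobenius_error(A, W, H), 4))
```

Because each step exactly minimizes the error over one vector with the others fixed, the error never increases from one update to the next, which is one source of the good convergence behaviour the abstract mentions.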

On the use of low-rank arithmetic to reduce the complexity of parallel sparse linear solvers based on direct factorization techniques

Pichon, Grégoire 29 November 2018
Solving sparse linear systems is a problem that arises in many scientific applications, and sparse direct solvers are a time-consuming and key kernel for those applications and for more advanced solvers such as hybrid direct-iterative solvers. For those reasons, optimizing their performance on modern architectures is critical. However, memory requirements and time-to-solution limit the use of direct methods for very large matrices. For other approaches, such as iterative methods, general black-box preconditioners that can ensure fast convergence for a wide range of problems are still missing. In the first part of this thesis, we present two approaches that use a Block Low-Rank (BLR) compression technique to reduce the memory footprint and/or the time-to-solution of a supernodal sparse direct solver. This flat, non-hierarchical compression method makes it possible to exploit the low-rank property of the blocks that appear during the factorization of sparse linear systems. The proposed solver can be used either as a direct solver at lower precision or as a very robust preconditioner.
The first approach, called Minimal Memory, illustrates the maximum memory gain that can be obtained with BLR compression, while the second approach, called Just-In-Time, mainly focuses on reducing the computational complexity and thus the time-to-solution. In the second part, we present a reordering strategy that increases the block granularity to better exploit locality on multicore architectures and to provide larger tasks to GPUs. This strategy relies on the block-symbolic factorization to refine the ordering produced by tools such as Metis or Scotch, and it does not change the number of operations required to solve the problem. Building on this approach, in the third part of this manuscript we propose a new low-rank clustering technique designed to cluster unknowns within a separator to obtain the BLR partition, and we demonstrate its advantages over widely used clustering strategies. Both the reordering and the clustering strategies were designed for the flat BLR representation but are also a first step toward hierarchical formats. In the last part of this thesis, we investigate a modified nested dissection strategy that aligns separators with respect to their parent to obtain more regular data structures.
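BLR saves memory by storing a numerically low-rank m x n block as factors U (m x r) and V (r x n), that is r(m + n) numbers instead of mn. The abstract does not say which kernel compresses each block, so the sketch below uses adaptive cross approximation (ACA) with full pivoting, one standard low-rank construction chosen here only for brevity; practical BLR solvers more commonly rely on truncated SVD or rank-revealing QR. The tolerance, names, and test block are assumptions.

```python
def aca_full_pivot(A, tol=1e-10, max_rank=None):
    """Return lists U (columns) and V (rows) with A ~= sum_r outer(U[r], V[r])."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]                  # residual, updated in place
    scale = max(abs(x) for row in A for x in row)
    max_rank = max_rank or min(m, n)
    U, V = [], []
    while len(U) < max_rank:
        # Full pivoting: take the largest remaining residual entry.
        i0, j0 = max(((i, j) for i in range(m) for j in range(n)),
                     key=lambda p: abs(R[p[0]][p[1]]))
        pivot = R[i0][j0]
        if abs(pivot) <= tol * scale:          # residual negligible: stop
            break
        u = [R[i][j0] for i in range(m)]
        v = [R[i0][j] / pivot for j in range(n)]
        for i in range(m):                     # subtract the rank-one cross
            for j in range(n):
                R[i][j] -= u[i] * v[j]
        U.append(u)
        V.append(v)
    return U, V


def reconstruct(U, V):
    m, n = len(U[0]), len(V[0])
    return [[sum(u[i] * v[j] for u, v in zip(U, V)) for j in range(n)] for i in range(m)]


def storage(m, n, r):
    # (dense entries, low-rank entries) for an m x n block of rank r.
    return m * n, r * (m + n)


x1, y1 = [1, 2, 3, 4, 5, 6, 7, 8], [1, 2, 1, 2, 1, 2, 1, 2]
x2, y2 = [1, 0, 2, 0, 3, 0, 4, 1], [3, 1, 4, 1, 5, 9, 2, 6]
block = [[x1[i] * y1[j] + x2[i] * y2[j] for j in range(8)] for i in range(8)]
U, V = aca_full_pivot(block)
print(len(U), storage(8, 8, len(U)))   # 2 (64, 32)
```

Savings grow with block size: for a 256 x 256 block of rank 16, the factors need 8192 entries instead of 65536. A looser tolerance gives a lower rank and the "reduced precision" direct solver the abstract describes.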
