11 |
Cooperative Wideband Spectrum Sensing Based on Joint Sparsity
Jowkar, Ghazaleh, 01 January 2017 (has links)
COOPERATIVE WIDEBAND SPECTRUM SENSING BASED ON JOINT SPARSITY
By Ghazaleh Jowkar, Master of Science
A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science at Virginia Commonwealth University
Virginia Commonwealth University 2017
Major Director: Dr. Ruixin Niu, Associate Professor, Department of Electrical and Computer Engineering
In this thesis, the problem of wideband spectrum sensing in cognitive radio (CR) networks using sub-Nyquist sampling and sparse signal processing techniques is investigated. To mitigate multi-path fading, it is assumed that a group of spatially dispersed secondary users (SUs) collaborate for wideband spectrum sensing, to determine whether or not a channel is occupied by a primary user (PU). Because the PUs underutilize the spectrum, the spectrum matrix has only a small number of non-zero rows. In existing state-of-the-art approaches, the spectrum sensing problem was solved using the low-rank matrix completion technique involving matrix nuclear-norm minimization. Motivated by the fact that the spectrum matrix is not only low-rank but also sparse, a spectrum sensing approach is proposed based on minimizing a mixed norm of the spectrum matrix instead of performing low-rank matrix completion, so as to promote the joint sparsity among the column vectors of the spectrum matrix. Simulation results demonstrate that the proposed mixed-norm minimization approach outperforms the low-rank matrix completion based approach in terms of PU detection performance. Furthermore, the mixed-norm minimization model is applied to multi-time-frame detection. Simulation results show that increasing the number of time frames improves the detection performance up to a point, beyond which the performance decreases dramatically.
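As a rough illustration of the mixed-norm idea (a sketch under assumed dimensions and parameters, not the thesis' implementation), the snippet below recovers a row-sparse spectrum matrix from compressive measurements by minimizing a least-squares data fit plus an l2,1 penalty with proximal gradient descent; the measurement matrix A, the penalty weight lam, and the toy sizes are all illustrative.

```python
import numpy as np

def row_soft_threshold(X, tau):
    """Proximal operator of tau * ||X||_{2,1}: shrink the l2 norm of each row by tau."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X * np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)

def joint_sparse_recovery(A, Y, lam=0.1, n_iter=500):
    """Minimize 0.5*||A X - Y||_F^2 + lam*||X||_{2,1} by proximal gradient (ISTA)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1/L, with L the squared spectral norm of A
    X = np.zeros((A.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = A.T @ (A @ X - Y)                # gradient of the data-fit term
        X = row_soft_threshold(X - step * grad, step * lam)
    return X

# Toy demo: 4 occupied channels out of 64, observed by 5 cooperating SUs at half the Nyquist rate
rng = np.random.default_rng(0)
n_channels, n_sus, n_meas = 64, 5, 32
X_true = np.zeros((n_channels, n_sus))
X_true[rng.choice(n_channels, 4, replace=False)] = rng.standard_normal((4, n_sus))
A = rng.standard_normal((n_meas, n_channels)) / np.sqrt(n_meas)
Y = A @ X_true + 0.01 * rng.standard_normal((n_meas, n_sus))
occupied = np.linalg.norm(joint_sparse_recovery(A, Y, lam=0.05), axis=1) > 0.1  # detected channels
```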
|
12 |
Neural Inductive Matrix Completion for Predicting Disease-Gene Associations
Hou, Siqing, 21 May 2018 (has links)
In silico prioritization of undiscovered associations can help find causal genes of newly discovered diseases. Some existing methods are based on known associations and on side information about diseases and genes. We explore the use of a neural network model, neural inductive matrix completion (NIMC), for disease-gene prediction. Compared to the state-of-the-art inductive matrix completion method, using neural networks allows us to learn latent features as non-linear functions of the input features.
Previous methods derive disease features only from mining text. Compared to text mining, the disease ontology is a more informative way of discovering correlations among diseases, from which we can calculate similarities between diseases and thereby improve the performance of predicting disease-gene associations.
We compare the proposed method with other state-of-the-art methods for predicting associated genes for diseases from the Online Mendelian Inheritance in Man (OMIM) database. Results show that both the new features and the proposed NIMC model improve the chance of recovering an unknown associated gene among the top 100 predicted genes. The best results are obtained by using both the new features and the new model. Results also show that the proposed method does better at predicting associated genes for newly discovered diseases.
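A minimal sketch of what a neural inductive matrix completion scorer can look like (the layer sizes, feature dimensions, and training loop below are illustrative assumptions written in PyTorch, not the paper's exact architecture): disease and gene feature vectors are passed through small networks, and the association score is the inner product of the resulting latent vectors.

```python
import torch
import torch.nn as nn

class NeuralIMC(nn.Module):
    """Illustrative neural inductive matrix completion model (dimensions are assumptions)."""
    def __init__(self, d_dim, g_dim, latent=64):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(d_dim, 128), nn.ReLU(), nn.Linear(128, latent))
        self.g = nn.Sequential(nn.Linear(g_dim, 128), nn.ReLU(), nn.Linear(128, latent))

    def forward(self, x_d, y_g):
        # Predicted association score = <f(disease features), g(gene features)>
        return (self.f(x_d) * self.g(y_g)).sum(dim=-1)

# Training on observed (disease, gene, label) triples with a logistic loss
model = NeuralIMC(d_dim=200, g_dim=300)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()
x_d = torch.randn(32, 200)   # batch of disease feature vectors (e.g. ontology-based similarities)
y_g = torch.randn(32, 300)   # batch of gene feature vectors
labels = torch.randint(0, 2, (32,)).float()
for _ in range(10):
    opt.zero_grad()
    loss = loss_fn(model(x_d, y_g), labels)
    loss.backward()
    opt.step()
```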
|
13 |
Matrix completion: statistical and computational aspects / Complétion de matrice : aspects statistiques et computationnels
Lafond, Jean, 19 December 2016 (has links)
This thesis deals with low-rank matrix completion methods and focuses on some related problems of both statistical and algorithmic nature. The first part of this work extends the existing statistical guarantees obtained for sub-Gaussian additive noise models to more general distributions. In particular, we provide upper bounds on the prediction error of the trace-norm penalized estimator, holding with high probability, for multinomial distributions and for distributions belonging to the exponential family. For the latter, we prove that the trace-norm penalized estimators are minimax optimal up to a logarithmic factor by giving a matching lower bound. The second part of this work focuses on the conditional gradient algorithm, which is used in particular to compute the previous estimators. We consider the stochastic optimization framework and give the convergence rates of two variants of the conditional gradient algorithm. We give the conditions under which these algorithms match the performance of projected gradient algorithms.
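For context, a conditional gradient (Frank-Wolfe) step for trace-norm-constrained matrix completion only requires the top singular vector pair of the gradient, which is what makes it attractive for computing such estimators. The sketch below is illustrative: the constraint radius, the step rule 2/(t+2), and the full SVD (a practical code would use a truncated SVD) are assumptions, not choices taken from the thesis.

```python
import numpy as np

def frank_wolfe_completion(M_obs, mask, radius=10.0, n_iter=200):
    """Frank-Wolfe sketch for min 0.5*||P_Omega(X - M_obs)||_F^2 s.t. ||X||_* <= radius."""
    X = np.zeros_like(M_obs)
    for t in range(n_iter):
        grad = mask * (X - M_obs)                 # gradient of the squared loss on observed entries
        U, s, Vt = np.linalg.svd(-grad, full_matrices=False)
        S = radius * np.outer(U[:, 0], Vt[0])     # atom: rank-1 extreme point of the trace-norm ball
        gamma = 2.0 / (t + 2.0)                   # standard Frank-Wolfe step size
        X = (1 - gamma) * X + gamma * S
    return X

# Toy usage: rank-2 ground truth, about 50% of entries observed
rng = np.random.default_rng(1)
M = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 40))
mask = (rng.random(M.shape) < 0.5).astype(float)
X_hat = frank_wolfe_completion(mask * M, mask, radius=np.linalg.norm(M, 'nuc'))
```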
|
14 |
GRAPH-BASED ANALYSIS OF NON-RANDOM MISSING DATA PROBLEMS WITH LOW-RANK NATURE: STRUCTURED PREDICTION, MATRIX COMPLETION AND SPARSE PCA
Hanbyul Lee (17586345), 09 December 2023 (has links)
In most theoretical studies on missing data analysis, data is typically assumed to be missing according to a specific probabilistic model. However, such an assumption may not accurately reflect real-world situations, and sometimes missingness is not purely random. In this thesis, our focus is on analyzing incomplete data matrices without relying on any probabilistic model assumptions for the missing schemes. To characterize a missing scheme deterministically, we employ a graph whose adjacency matrix is a binary matrix that indicates whether each matrix entry is observed or not. Leveraging its graph properties, we mathematically represent the missing pattern of an incomplete data matrix and conduct a theoretical analysis of how this non-random missing pattern affects the solvability of specific problems related to incomplete data. This dissertation primarily focuses on three types of incomplete data problems characterized by their low-rank nature: structured prediction, matrix completion, and sparse PCA.
First, we investigate a basic structured prediction problem, which involves recovering binary node labels on a fixed undirected graph, where noisy binary observations corresponding to edges are given. Essentially, this setting parallels a simple binary rank-1 symmetric matrix completion problem, where missing entries are determined by a fixed undirected graph. Our aim is to establish the fundamental limits of this problem, revealing a close association between these limits and graph properties such as connectivity.
Second, we move on to the general low-rank matrix completion problem. In this study, we establish provable guarantees for exact and approximate low-rank matrix completion that apply to any non-random missing pattern, by utilizing the observation graph corresponding to the missing scheme. We show, theoretically and experimentally, that the standard constrained nuclear norm minimization algorithm can successfully recover the true matrix when the observation graph is well connected and has similar node degrees. We also verify that matrix completion is achievable with a near-optimal sample complexity rate when the observation graph has uniform node degrees and its adjacency matrix has a large spectral gap.
Finally, we address the sparse PCA problem, featuring an approximate low-rank attribute. Missing data is common in situations where sparse PCA is useful, such as single-cell RNA sequencing data analysis. We propose a semidefinite relaxation of the non-convex $\ell_1$-regularized PCA problem to solve sparse PCA on incomplete data. We demonstrate that the method is particularly effective when the observation pattern has favorable properties. Our theory is substantiated through synthetic and real data analysis, showcasing the superior performance of our algorithm compared to other sparse PCA approaches, especially when the observed data pattern has specific characteristics.
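A small sketch of the kind of observation-graph diagnostics referred to above (the construction and quantities are illustrative, not the dissertation's code): the binary mask is viewed as the biadjacency matrix of a bipartite row-column graph, and its degree spread and spectral gap indicate how favorable the missing pattern is.

```python
import numpy as np

def observation_graph_diagnostics(mask):
    """Degree uniformity and spectral gap of the bipartite observation graph of an m x n mask."""
    m, n = mask.shape
    A = np.zeros((m + n, m + n))
    A[:m, m:] = mask                 # rows connect to the columns they are observed in
    A[m:, :m] = mask.T
    degrees = A.sum(axis=1)
    eigvals = np.sort(np.linalg.eigvalsh(A))[::-1]
    return {
        "min_degree": degrees.min(),
        "max_degree": degrees.max(),
        "spectral_gap": eigvals[0] - eigvals[1],   # larger gap suggests a better-connected pattern
    }

mask = (np.random.default_rng(2).random((30, 20)) < 0.4).astype(float)
print(observation_graph_diagnostics(mask))
```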
|
15 |
Overcoming the Curse of Missing and Noisy Data in Computational Drug Design
Meng, Fanwang, January 2022 (has links)
Machine learning (ML) has enjoyed great success in chemistry and drug design, from designing synthetic pathways and screening drug candidates to predicting biomolecular properties. However, an ML model's generalizability and robustness require high-quality training data, which is often difficult to obtain, especially when the training data is acquired from experimental measurements. While one can always discard all data associated with noisy and/or missing values, doing so often means discarding invaluable data.
This thesis presents mathematical techniques for this problem and applies them to questions in molecular medicinal chemistry. In Chapter 1, we show that the missing-data problem can be expressed as a matrix completion problem, and we point out how frequently matrix completion problems arise in (bio)chemical settings. Next, we use matrix completion to impute the missing values in protein NMR data, and use this as a stepping stone for understanding protein allostery in Chapter 2. This chapter also uses several other techniques from statistical data analysis and machine learning, including denoising (via robust principal component analysis), latent feature identification from singular value decomposition, and residue clustering with a Gaussian mixture model.
In Chapter 3, Δ-learning was used to predict free energies of hydration (ΔG). The aim of this study is to correct hydration energies estimated from low-level quantum chemistry calculations with continuum solvation models, without significant additional computation. Extensive feature engineering, 8 different regression algorithms, and Gaussian process regression with 38 different kernels were used to construct the predictive models. The optimal model gives an MAE of 0.6249 kcal/mol and an RMSE of 1.0164 kcal/mol. Chapter 4 presents an open-source computational tool, Procrustes, for finding the maximum similarity between matrices, along with examples showing how to use it for chemical and biological problems. Finally, in Chapters 5 and 6, a database for permeability of the blood-brain barrier (BBB) was curated and combined with resampling strategies to build predictive models. The resulting models have promising performance and are released along with a computational tool, B3clf, for their evaluation. / Thesis / Doctor of Science (PhD)
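As a toy illustration of Chapter 1's framing of missing-value imputation as matrix completion (the rank, iteration count, and synthetic data are assumptions, not the thesis' pipeline), the sketch below alternates a truncated SVD fit with refilling the observed entries.

```python
import numpy as np

def iterative_svd_impute(X, rank=5, n_iter=100):
    """Fill NaN entries by iterating: fill -> truncated SVD -> restore observed entries."""
    mask = ~np.isnan(X)
    filled = np.where(mask, X, np.nanmean(X))            # initialize missing entries with the global mean
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # best rank-r approximation
        filled = np.where(mask, X, low_rank)             # keep observed values, update only missing ones
    return filled

# Toy usage: impute 20% missing entries of a rank-3 matrix
rng = np.random.default_rng(3)
M = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 30))
X = M.copy()
X[rng.random(M.shape) < 0.2] = np.nan
M_hat = iterative_svd_impute(X, rank=3)
```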
|
16 |
A Study of Machine Learning Approaches for Integrated Biomedical Data Analysis
Chang, Yi Tan, 29 June 2018 (links)
This thesis consists of two projects in which various machine learning approaches and statistical analyses for integrated biomedical data analysis were explored, developed, and tested. Integrating different biomedical data sources allows us to get a better understanding of the human body from a bigger picture. With a more complete view of the data, we not only obtain a more complete view of the molecular basis of phenotype, but can also identify abnormalities in diseases that are not found when using only one type of biomedical data. The objective of the first project is to find biological pathways related to Duchenne Muscular Dystrophy (DMD) and Lamin A/C (LMNA) using the integration of multi-omics data. We propose a novel method that allows us to integrate proteins, mRNAs, and miRNAs to find disease-related pathways. The goal of the second project is to develop a personalized recommendation system that recommends cancer treatments to patients. Compared to the traditional way of using only users' ratings to impute missing values, we propose a method that incorporates users' profiles to help improve the accuracy of the prediction. / Master of Science / There are two existing major problems in the biomedical field. Previously, researchers only used one data type for analysis. However, one measurement does not fully capture the processes at work and can lead to inaccurate results with low sensitivity and specificity. Moreover, there are too many missing values in the biomedical data. This leaves us with many questions unanswered and can lead us to draw wrong conclusions from the data. To overcome these problems, we would like to integrate multiple data types, which not only better captures the complex biological processes but also leads to a more comprehensive characterization. Moreover, utilizing the correlation among various data structures also helps us impute missing values in the biomedical datasets.
For my two research projects, we are interested in integrating multiple biological data types to identify disease-specific pathways and to predict unknown treatment responses for cancer patients. In this thesis, we propose a novel approach for pathway identification using the integration of multi-omics data. Secondly, we also develop a recommendation system that combines different types of patients' medical information to predict missing treatment responses. Our goal is to find disease-related pathways in the first project and to improve the prediction of missing treatment responses in the second project using the methods we develop.
The findings of my studies show that it is possible to find pathways related to muscular dystrophies using the integration of multi-omics data. Moreover, we also demonstrate that incorporating patients' genetic profiles can improve the prediction accuracy compared to using the treatment-response matrix alone for imputation.
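A hedged sketch of the second project's idea of combining responses with patient profiles (the similarity blend, weights, and neighborhood size below are illustrative assumptions, not the thesis' method): missing responses are imputed from the most similar patients, where similarity mixes response-based and profile-based components.

```python
import numpy as np

def profile_aware_impute(R, P, alpha=0.5, k=5):
    """Impute NaN entries of a patients x treatments matrix R using patient profiles P."""
    def cosine(A):
        A = np.nan_to_num(A)
        A = A / (np.linalg.norm(A, axis=1, keepdims=True) + 1e-12)
        return A @ A.T

    S = alpha * cosine(R) + (1 - alpha) * cosine(P)      # blended patient-patient similarity
    R_hat = R.copy()
    for i, j in zip(*np.where(np.isnan(R))):
        neighbors = np.argsort(-S[i])                    # most similar patients first
        vals = [R[u, j] for u in neighbors if u != i and not np.isnan(R[u, j])][:k]
        R_hat[i, j] = np.mean(vals) if vals else np.nanmean(R[:, j])
    return R_hat

# Toy usage with synthetic responses and profiles
rng = np.random.default_rng(5)
R = rng.random((20, 8)); R[rng.random(R.shape) < 0.3] = np.nan
P = rng.random((20, 6))
R_filled = profile_aware_impute(R, P)
```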
|
17 |
Efficient algorithms for compressed sensing and matrix completion
Wei, Ke, January 2014 (links)
Compressed sensing and matrix completion are two new data acquisition techniques whose efficiency is achieved by exploiting low-dimensional structures in high-dimensional data. Despite the combinatorial nature of compressed sensing and matrix completion, there has been significant development of computationally efficient algorithms which can produce accurate solutions to these problems. In this thesis, we are concerned with the development of algorithms with low per-iteration computational complexity for compressed sensing and matrix completion. First, we derive a locally optimal stepsize selection rule for the simplest iterative hard thresholding algorithm for matrix completion, and obtain a simple yet efficient algorithm. It is observed to have average-case performance superior in some aspects to other matrix completion algorithms. To balance the fast convergence rates of more sophisticated recovery algorithms with the low per-iteration computational cost of simple line-search algorithms, we introduce a family of conjugate gradient iterative hard thresholding algorithms for both compressed sensing and matrix completion. The theoretical results establish recovery guarantees for the restarted and projected variants of the algorithms, while the empirical performance comparisons establish significant computational advantages of the proposed methods over other hard thresholding algorithms. Finally, we introduce an alternating steepest descent method and a scaled variant designed especially for the matrix completion problem, based on a simple factorization model of the low-rank matrix. The computational efficacy of this method is achieved by reducing the high per-iteration computational cost of the second-order method and fully exploiting the numerical linear algebra structure in the algorithm. Empirical evaluations establish the effectiveness of the proposed algorithms compared with other state-of-the-art algorithms.
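The following sketch shows a hard thresholding iteration for matrix completion with an adaptive stepsize in the spirit described above; the normalized-IHT stepsize rule and SVD-based initialization here are common choices used for illustration, not necessarily the thesis' exact rule.

```python
import numpy as np

def niht_matrix_completion(M_obs, mask, rank, n_iter=300):
    """Normalized iterative hard thresholding sketch for rank-r matrix completion."""
    # Initialize with the best rank-r approximation of the zero-filled data
    U, s, Vt = np.linalg.svd(M_obs, full_matrices=False)
    X = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    for _ in range(n_iter):
        G = mask * (M_obs - X)                        # negative gradient on observed entries
        U = np.linalg.svd(X, full_matrices=False)[0][:, :rank]
        D = U @ (U.T @ G)                             # gradient projected onto the current column space
        denom = ((mask * D) ** 2).sum()
        alpha = (D ** 2).sum() / denom if denom > 0 else 1.0   # locally adaptive stepsize
        U2, s2, Vt2 = np.linalg.svd(X + alpha * G, full_matrices=False)
        X = (U2[:, :rank] * s2[:rank]) @ Vt2[:rank]   # hard threshold back to rank r
    return X

# Toy usage: rank-2 matrix, 40% of entries observed
rng = np.random.default_rng(6)
M = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 50))
mask = (rng.random(M.shape) < 0.4).astype(float)
X_hat = niht_matrix_completion(mask * M, mask, rank=2)
```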
|
18 |
Provable alternating minimization for non-convex learning problems
Netrapalli, Praneeth Kumar, 17 September 2014 (links)
Alternating minimization (AltMin) is a generic term for a widely popular approach in non-convex learning: often, it is possible to partition the variables into two (or more) sets so that the problem is convex/tractable in one set if the other is held fixed (and vice versa). This allows for alternating between optimally updating one set of variables and then the other. AltMin methods typically do not have associated global consistency guarantees, even though they are empirically observed to perform better than methods (e.g., those based on convex optimization) that do have guarantees. In this thesis, we obtain rigorous performance guarantees for AltMin in three statistical learning settings: low-rank matrix completion, phase retrieval, and learning sparsely-used dictionaries. The overarching theme behind our results consists of two parts: (i) devising new initialization procedures (as opposed to initializing randomly, as is typical), and (ii) establishing exponential local convergence from this initialization. Our work shows that the pursuit of statistical guarantees can yield algorithmic improvements (initialization, in our case) that perform better in practice. / text
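To make the recipe concrete, here is a minimal sketch of alternating minimization for low-rank matrix completion with a spectral initialization; the initialization and least-squares updates follow the generic scheme described above and are illustrative, not the thesis' exact algorithms or guarantees.

```python
import numpy as np

def altmin_completion(M_obs, mask, rank, n_iter=50):
    """Alternating least squares for matrix completion with spectral initialization."""
    m, n = M_obs.shape
    # (i) Initialization: top-r left singular vectors of the zero-filled observed matrix
    U = np.linalg.svd(M_obs, full_matrices=False)[0][:, :rank]
    V = np.zeros((n, rank))
    for _ in range(n_iter):
        # (ii) Alternate: each column of V, then each row of U, solves a small least-squares problem
        for j in range(n):
            rows = mask[:, j].astype(bool)
            if rows.any():
                V[j], *_ = np.linalg.lstsq(U[rows], M_obs[rows, j], rcond=None)
        for i in range(m):
            cols = mask[i].astype(bool)
            if cols.any():
                U[i], *_ = np.linalg.lstsq(V[cols], M_obs[i, cols], rcond=None)
    return U @ V.T

# Toy usage: rank-3 matrix, 50% of entries observed
rng = np.random.default_rng(7)
M = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
mask = (rng.random(M.shape) < 0.5).astype(float)
M_hat = altmin_completion(mask * M, mask, rank=3)
```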
|
19 |
Non-convex methods for spectrally sparse signal reconstruction via low-rank Hankel matrix completion
Wang, Tianming, 01 May 2018 (links)
Spectrally sparse signals arise in many applications of signal processing. A spectrally sparse signal is a mixture of a few undamped or damped complex sinusoids. An important problem in practice is to reconstruct such a signal from partial time-domain samples. Previous convex methods have the drawback that the computation and storage costs do not scale well with respect to the signal length. This common drawback restricts their applicability to large and high-dimensional signals.
The reconstruction of a spectrally sparse signal from partial samples can be formulated as a low-rank Hankel matrix completion problem. We develop two fast and provable non-convex solvers, FIHT and PGD. FIHT is based on Riemannian optimization, while PGD is based on Burer-Monteiro factorization with projected gradient descent. Suppose the underlying spectrally sparse signal is of model order r and length n. We prove that O(r^2 log^2(n)) and O(r^2 log(n)) random samples are sufficient for FIHT and PGD, respectively, to achieve exact recovery with overwhelming probability. At every iteration, the computation and storage costs of both methods are linear with respect to the signal length n. Therefore they are suitable for handling spectrally sparse signals of large size, which may be prohibitive for previous convex methods. Extensive numerical experiments verify their recovery abilities as well as their computational efficiency, and also show that the algorithms are robust to noise and to mis-specification of the model order. Comparing the two solvers, FIHT is faster for easier problems, while PGD has a better recovery ability.
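A small illustration of the Hankel lift behind this formulation (the frequencies, damping factors, and sizes are made up for the example): a mixture of r damped complex sinusoids yields a Hankel matrix whose numerical rank is r, which is the low-rank structure the solvers exploit.

```python
import numpy as np
from scipy.linalg import hankel

n, r = 127, 3
t = np.arange(n)
freqs = np.array([0.10, 0.27, 0.40])       # normalized frequencies (illustrative)
damps = np.array([0.000, 0.002, 0.001])    # damping factors (illustrative)
x = sum(np.exp((2j * np.pi * f - d) * t) for f, d in zip(freqs, damps))

H = hankel(x[: n // 2 + 1], x[n // 2:])    # square Hankel matrix built from the signal
s = np.linalg.svd(H, compute_uv=False)
print(np.sum(s > 1e-6 * s[0]))             # numerical rank, equal to the model order r = 3
```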
|
20 |
Semidefinite Facial Reduction for Low-Rank Euclidean Distance Matrix Completion
Krislock, Nathan, January 2010 (links)
The main result of this thesis is the development of a theory of semidefinite facial reduction for the Euclidean distance matrix completion problem. Our key result shows a close connection between cliques in the graph of the partial Euclidean distance matrix and faces of the semidefinite cone containing the feasible set of the semidefinite relaxation. We show how using semidefinite facial reduction allows us to dramatically reduce the number of variables and constraints required to represent the semidefinite feasible set. We have used this theory to develop a highly efficient algorithm capable of solving many very large Euclidean distance matrix completion problems exactly, without the need for a semidefinite optimization solver. For problems with a low level of noise, our SNLSDPclique algorithm outperforms existing algorithms in terms of both CPU time and accuracy. Using only a laptop, problems of size up to 40,000 nodes can be solved in under a minute and problems with 100,000 nodes require only a few minutes to solve.
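As background for the clique-based reduction, the sketch below shows the standard classical-MDS computation that turns a fully specified block of squared distances (a clique) into a point configuration via its centered Gram matrix; it is an illustrative fragment under assumed toy data, not the SNLSDPclique algorithm.

```python
import numpy as np

def classical_mds(D, dim):
    """Recover points (up to rotation/translation) from a complete squared-distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * J @ D @ J                          # centered Gram matrix of the clique
    w, V = np.linalg.eigh(G)
    idx = np.argsort(w)[::-1][:dim]               # keep the top `dim` eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))

# Toy usage: a clique of 5 sensors in the plane is recovered from its squared distances
P = np.random.default_rng(4).random((5, 2))
D = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)
P_hat = classical_mds(D, dim=2)
```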
|