Global ETD Search

1	Statistical methods for the testing and estimation of linear dependence structures on paired high-dimensional data: application to genomic data Mestres, Adrià Caballé January 2018 This thesis provides novel methodology for the statistical analysis of paired high-dimensional genomic data, with the aim of identifying gene interactions specific to each group of samples as well as the gene connections that change between the two classes of observations. An example of such groups is patients under two medical conditions, where estimating gene interaction networks helps biologists discern the gene regulatory mechanisms that control a disease process such as cancer. We construct these interaction networks from data by considering the non-zero structure of correlation matrices, which measure linear dependence between random variables, and of their inverse matrices, commonly known as precision matrices, which instead determine linear conditional dependence. In this regard, we study three statistical problems related to the testing, single estimation and joint estimation of (conditional) dependence structures. First, we develop hypothesis testing methods to assess the equality of two correlation matrices, and also of two correlation sub-matrices, corresponding to two classes of samples, and hence the equality of the underlying gene interaction networks. We consider statistics based on the average of squares, the maximum and the sum of exceedances of sample correlations, which are suitable for both independent and paired observations. We derive the limiting distributions of the test statistics where possible and, for practical needs, we present a permutation-based approach to find their corresponding non-parametric distributions.
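A minimal sketch of the permutation idea for paired samples, assuming NumPy and using only the average-of-squares statistic (the maximum and sum-of-exceedances statistics are not shown). The function names are hypothetical and this is not the ldstatsHD implementation; in particular, the permutation scheme, which randomly swaps the two members of each pair, assumes that paired observations are exchangeable under the null hypothesis.

```python
import numpy as np


def avg_sq_stat(x, y):
    """Mean squared difference between the off-diagonal sample correlations of x and y."""
    rx = np.corrcoef(x, rowvar=False)
    ry = np.corrcoef(y, rowvar=False)
    iu = np.triu_indices_from(rx, k=1)   # upper triangle, diagonal excluded
    return np.mean((rx[iu] - ry[iu]) ** 2)


def paired_perm_test(x, y, n_perm=999, seed=0):
    """Permutation p-value for H0: corr(x) == corr(y), where row i of x and
    row i of y are paired measurements on the same subject.

    Assumes the two members of each pair are exchangeable under H0, so each
    permutation swaps x[i] and y[i] with probability 1/2.
    """
    rng = np.random.default_rng(seed)
    observed = avg_sq_stat(x, y)
    exceed = 0
    for _ in range(n_perm):
        swap = rng.random(x.shape[0]) < 0.5
        xp = np.where(swap[:, None], y, x)
        yp = np.where(swap[:, None], x, y)
        exceed += avg_sq_stat(xp, yp) >= observed
    return observed, (exceed + 1) / (n_perm + 1)
```

For independent samples, the analogous scheme would shuffle the group labels across all rows instead of swapping within pairs.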
Cases where such a hypothesis test presents enough evidence against the null hypothesis of equality of two correlation matrices give rise to the problem of estimating two correlation (or precision) matrices. Before that, however, we address the statistical problem of estimating conditional dependence between random variables in a single class of samples when data are high-dimensional, which is the second topic of the thesis. We study the graphical lasso method, which employs an L1-penalized likelihood to estimate the precision matrix and its underlying non-zero graph structure. The lasso penalty term is given by the L1 norm of the precision matrix elements scaled by a regularization parameter, which determines the trade-off between sparsity of the graph and fit to the data; its selection is our main focus of investigation. We propose several procedures to select the regularization parameter in the graphical lasso optimization problem that rely on network characteristics such as clustering or connectivity of the graph. Third, we address the more general problem of estimating two precision matrices that are expected to be similar when the datasets are dependent, focusing on the particular case of paired observations. We propose a new method to estimate these precision matrices simultaneously, a weighted fused graphical lasso estimator. The analogous joint estimation method for two regression coefficient matrices, which we call the weighted fused regression lasso, is also developed in this thesis under the same paired, high-dimensional setting. The two joint estimators maximize penalized marginal log-likelihood functions that encourage both sparsity and similarity in the estimated matrices, and they are solved using an alternating direction method of multipliers (ADMM) algorithm.
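A minimal sketch of the single-class graphical lasso, assuming NumPy and using the block coordinate descent scheme commonly used for this problem: each column of the working covariance W is updated by solving a small lasso problem. This sketch penalizes the diagonal as well, and it shows neither the thesis's network-based selection of the regularization parameter nor the weighted fused ADMM estimator; the function names are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Elementwise soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def graphical_lasso(S, lam, max_iter=100, tol=1e-6, inner_iter=100):
    """Maximise log det(Theta) - tr(S Theta) - lam * sum_ij |Theta_ij| (diagonal penalised too).

    Returns (Theta, W), where W is the working estimate of the covariance matrix.
    """
    p = S.shape[0]
    W = S + lam * np.eye(p)   # diagonal stays fixed at S_jj + lam
    B = np.zeros((p, p))      # column j: lasso coefficients of variable j on the others
    for _ in range(max_iter):
        W_old = W.copy()
        for j in range(p):
            idx = np.arange(p) != j
            W11 = W[np.ix_(idx, idx)]
            s12 = S[idx, j]
            beta = B[idx, j]  # warm start (boolean indexing returns a copy)
            # Coordinate descent for min 0.5 b'W11 b - s12'b + lam * ||b||_1
            for _ in range(inner_iter):
                beta_old = beta.copy()
                for k in range(p - 1):
                    partial = s12[k] - W11[k] @ beta + W11[k, k] * beta[k]
                    beta[k] = soft_threshold(partial, lam) / W11[k, k]
                if np.max(np.abs(beta - beta_old)) < tol:
                    break
            B[idx, j] = beta
            w12 = W11 @ beta
            W[idx, j] = w12
            W[j, idx] = w12
        if np.max(np.abs(W - W_old)) < tol:
            break
    # Recover Theta column by column from W and the lasso coefficients.
    Theta = np.zeros((p, p))
    for j in range(p):
        idx = np.arange(p) != j
        Theta[j, j] = 1.0 / (W[j, j] - W[idx, j] @ B[idx, j])
        Theta[idx, j] = -B[idx, j] * Theta[j, j]
    return (Theta + Theta.T) / 2, W
```

When `lam` is at least the largest absolute off-diagonal entry of S, every lasso coefficient stays at zero and the estimate is diagonal. This is the sparse end of the regularization path over which a selection rule would search.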
Sparsity and similarity of the matrices are determined by two tuning parameters, which we propose to choose by controlling the corresponding average error rates related to the expected number of false positive edges in the estimated conditional dependence networks. These testing and estimation methods are implemented in the R package ldstatsHD and are applied to a comprehensive range of simulated data sets as well as to high-dimensional real case studies of genomic data. We use the testing approaches to discover pathway lists of genes whose correlation matrices differ significantly between healthy and unhealthy (e.g., tumor) samples. We also use hypothesis tests on correlation sub-matrices to reduce the number of genes for estimation. The proposed joint estimation methods are then used to find gene interactions that are common to both medical conditions as well as interactions that vary in the presence of unhealthy tissue.
2	Dynamic Adaptive Robust Estimations for High-Dimensional Standardized Transelliptical Latent Networks Wu, Tzu-Chun 24 May 2022 No description available. Statistics graphical models latent precision matrix estimation rank-based estimators sparse networks transelliptical distribution
3	Estimating Dependence Structures with Gaussian Graphical Models: A Simulation Study in R Angelchev Shiryaev, Artem, Karlsson, Johan January 2021 Graphical models are powerful tools for estimating complex dependence structures in large data sets. This thesis restricts its scope to undirected Gaussian graphical models. A predefined sparse precision matrix was specified to generate multivariate normally distributed data. Using the generated data, a simulation study was conducted to assess the accuracy, sensitivity and specificity of the estimated precision matrix. The graphical LASSO was applied using four different R packages with seven selection criteria for estimating the tuning parameter. The findings are mostly in line with previous research. In contrast to stepwise model selection, the graphical LASSO is generally faster and feasible in high dimensions. Some of the selection methods for estimating the optimal tuning parameter recovered the true network structure. The results indicate how well each model recovers the true, predefined dependence structure used in our simulation. Since the simulated data are merely an approximation of real-world data, the results should not be the only consideration when choosing a model. Simulation study Graphical models undirected Gaussian graphical model Partial correlation Precision matrix Probability Theory and Statistics
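Tuning-parameter selection criteria of the kind compared here score each candidate estimate along the regularization path. One commonly used example, not necessarily among the seven criteria in this thesis, is the extended BIC (EBIC): the Gaussian log-likelihood plus a penalty on the number of edges $|E|$, $\mathrm{EBIC}_\gamma = -2\ell_n(\hat\Theta) + |E|\log n + 4\gamma|E|\log p$. A minimal NumPy sketch follows. It assumes $\gamma = 0.5$ as a default and counts an edge wherever an off-diagonal entry exceeds a small tolerance. The function names are hypothetical.

```python
import numpy as np


def gaussian_loglik(Theta, S, n):
    """Gaussian log-likelihood (n/2)(log det Theta - tr(S Theta)), constants dropped."""
    sign, logdet = np.linalg.slogdet(Theta)
    if sign <= 0:
        return -np.inf   # determinant not positive, so Theta is not positive definite
    return 0.5 * n * (logdet - np.trace(S @ Theta))


def ebic(Theta, S, n, gamma=0.5, tol=1e-8):
    """Extended BIC: -2 loglik + |E| log n + 4 gamma |E| log p (lower is better)."""
    p = Theta.shape[0]
    n_edges = int(np.sum(np.abs(np.triu(Theta, k=1)) > tol))
    return (-2.0 * gaussian_loglik(Theta, S, n)
            + n_edges * np.log(n)
            + 4.0 * gamma * n_edges * np.log(p))
```

In use, one would evaluate `ebic` for the estimate at each value on a grid of tuning parameters and keep the minimizer. Setting `gamma=0` recovers the ordinary BIC.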
4	Dynamics of high-dimensional covariance matrices Avanesov, Valeriy 15 February 2018 We consider the detection and localization of an abrupt break in the covariance structure of high-dimensional random data. The study proposes two novel approaches to this problem. The approaches are essentially hypothesis testing procedures, which require a proper choice of a critical level. To that end, calibration schemes based on different non-standard bootstrap procedures are proposed. One of the approaches relies on techniques of inverse covariance matrix estimation and is motivated by applications in neuroimaging. A limitation of this approach is the sparsity assumption crucial for precision matrix estimation, on which the second approach does not rely. The description of the approaches is followed by a formal theoretical study that justifies the proposed calibration schemes under mild assumptions and provides guarantees for break detection. The theoretical results for the first approach rely on guarantees for inference procedures for the precision matrix; we therefore rigorously justify adaptive inference procedures for precision matrices. All results are obtained in a truly high-dimensional (dimensionality p >> n) finite-sample setting. The theoretical results are supported by simulation studies, most of which are inspired by real-world neuroimaging or financial data. Bootstrap structural change precision matrix covariance matrix 510 Mathematics SK 840 ddc:510
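The thesis calibrates its tests with non-standard bootstrap procedures, which are not reproduced here. As a much simpler illustration of what localizing a single break means, the sketch below scans every admissible split point and compares the sample covariances on either side in max-norm. The weighting, the minimum segment length `min_seg` and the function name are assumptions of this sketch, which returns only the most likely break location without any critical value.

```python
import numpy as np


def covariance_break_scan(X, min_seg=30):
    """Locate a single abrupt change in the covariance of the rows of X.

    For each split t, compares np.cov(X[:t]) with np.cov(X[t:]) in max-norm.
    The factor sqrt(t (n - t) / n) roughly equalises the noise level of the
    difference across t, so splits near the ends are not favoured by noise.
    Returns (t_hat, stats); no critical value or bootstrap calibration.
    """
    n = X.shape[0]
    candidates = np.arange(min_seg, n - min_seg + 1)
    stats = np.empty(len(candidates))
    for i, t in enumerate(candidates):
        diff = np.cov(X[:t], rowvar=False) - np.cov(X[t:], rowvar=False)
        stats[i] = np.sqrt(t * (n - t) / n) * np.max(np.abs(diff))
    return int(candidates[np.argmax(stats)]), stats
```

Deciding whether the maximum of `stats` is large enough to declare a break at all is exactly where a calibrated critical value, such as a bootstrap one, is needed.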
5	Joint Gaussian Graphical Model for multi-class and multi-level data Shan, Liang 01 July 2016 The Gaussian graphical model has been a popular tool for investigating conditional dependence between random variables by estimating sparse precision matrices. The estimated precision matrices can be mapped into networks for visualization. For related but different classes, jointly estimating networks by taking advantage of the common structure across classes can help us better estimate conditional dependencies among variables. Furthermore, there may be a multilevel structure among variables: some variables are considered higher-level variables, and others, called lower-level variables, are nested within them. In this dissertation, we make several contributions to the joint estimation of Gaussian graphical models across heterogeneous classes. The first is a joint estimation method for Gaussian graphical models across unbalanced multiple classes. The second incorporates multilevel variable information into the joint estimation procedure and simultaneously estimates the higher-level and lower-level networks. In the first project, we consider the problem of jointly estimating Gaussian graphical models across unbalanced classes. Most existing methods require equal or similar sample sizes among classes; however, many real applications do not have similar sample sizes. Hence, we propose the joint adaptive graphical lasso, a weighted L1-penalized approach for unbalanced multi-class problems. Our joint adaptive graphical lasso combines information across classes so that their common characteristics can be shared during estimation. We also introduce regularization into the adaptive term so that the imbalance of the data is taken into account.
Simulation studies show that our approach performs better than existing methods in terms of false positive rate, accuracy, Matthews correlation coefficient and false discovery rate. We demonstrate the advantage of our approach using a liver cancer data set. In the second project, we propose a method to jointly estimate multilevel Gaussian graphical models across multiple classes. Existing methods are limited to investigating a single-level conditional dependence structure even when a multilevel structure exists among variables. Because higher-level variables may work together to accomplish certain tasks, our main interest is to explore simultaneously the conditional dependence structures among higher-level variables and among lower-level variables. Given multilevel data from heterogeneous classes, our method ensures that common multilevel conditional dependence structures are shared during estimation, while the structures unique to each class are retained. Our approach first introduces a higher-level variable factor within each class and then common factors across classes. Its performance is evaluated on several simulated networks, and we also demonstrate its advantage using breast cancer patient data. / Ph. D. Bias Correction Gaussian graphical model Heterogeneous classes Joint adaptive graphical lasso Joint estimation Multilevel network Precision matrix Unbalanced multi-class.
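The comparison criteria named above (false positive rate, false discovery rate, accuracy and the Matthews correlation coefficient) are computed from the edge sets of the true and estimated networks. A minimal NumPy sketch follows. It assumes an edge is present wherever an off-diagonal precision entry exceeds a small tolerance in absolute value. The function name and the convention of returning 0 when a denominator vanishes are assumptions.

```python
import numpy as np


def edge_metrics(theta_true, theta_hat, tol=1e-8):
    """Edge-recovery metrics over the upper-triangular (off-diagonal) pairs."""
    iu = np.triu_indices_from(theta_true, k=1)
    truth = np.abs(theta_true[iu]) > tol
    est = np.abs(theta_hat[iu]) > tol
    tp = int(np.sum(truth & est))
    tn = int(np.sum(~truth & ~est))
    fp = int(np.sum(~truth & est))
    fn = int(np.sum(truth & ~est))
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fdr = fp / (fp + tp) if fp + tp else 0.0
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return {"FPR": fpr, "FDR": fdr, "accuracy": acc, "MCC": mcc}
```

Unlike accuracy, the MCC stays informative when true edges are rare, which is the usual situation for sparse networks.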
6	On regularized estimation methods for precision and covariance matrix and statistical network inference Kuismin, M. (Markku) 14 November 2018 Estimation of the covariance matrix is an important problem in statistics in general, because the covariance matrix is an essential part of principal component analysis, statistical pattern recognition, multivariate regression and network exploration, to name but a few applications. Penalized likelihood methods are used when standard estimates cannot be computed. This is common when the number of explanatory variables is much larger than the sample size (the high-dimensional case). An alternative ridge-type estimator for the precision matrix is introduced in Article I; this estimate is derived using a penalized likelihood estimation method. Undirected networks, which are connected to penalized covariance and precision matrix estimation, and some applications related to networks are also explored in this dissertation. In Article II, novel statistical methods are used to infer population networks from discrete measurements of genetic data. More precisely, the Least Absolute Shrinkage and Selection Operator (LASSO) is applied in neighborhood selection. The inferred network is used for more detailed inference of population structure. We illustrate how community detection can be a promising tool for exploring population structure and admixture in genetic data. In addition, Article IV shows how the precision matrix estimator introduced in Article I can be used in graphical model selection via a multiple hypothesis testing procedure. Article III of this dissertation contains a review of current tools for practical graphical model selection and precision/covariance matrix estimation. The other three publications describe in detail the fundamental computational and mathematical results on which the methods presented in these articles are based.
Each publication contains a collection of practical research questions to which the novel methods can be applied. We hope that these applications will help readers better understand the possible uses of the methods presented in this dissertation. LASSO covariance matrix graphical model network estimation precision matrix ridge high-dimensional setting
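For intuition about ridge-type precision estimators: replacing the lasso penalty by a squared Frobenius-norm penalty $\lambda\|\Theta\|_F^2$ in the penalized log-likelihood gives an estimator in closed form. Setting the gradient $\Theta^{-1} - S - 2\lambda\Theta$ to zero shows that $\hat\Theta$ shares the eigenvectors of $S$, and each eigenvalue $s$ of $S$ maps to $\theta = 2/(s + \sqrt{s^2 + 8\lambda})$. A NumPy sketch follows. It assumes this particular penalty purely for illustration and is not claimed to be exactly the estimator of Article I; the function name is hypothetical.

```python
import numpy as np


def ridge_precision(S, lam):
    """Maximiser of log det(Theta) - tr(S Theta) - lam * ||Theta||_F^2, lam > 0.

    Stationarity gives inv(Theta) - S - 2 lam Theta = 0; Theta shares the
    eigenvectors of S, and each eigenvalue s of S maps to
    theta = 2 / (s + sqrt(s^2 + 8 lam)), which is positive even when s = 0.
    """
    s, U = np.linalg.eigh(S)
    theta = 2.0 / (s + np.sqrt(s ** 2 + 8.0 * lam))
    return (U * theta) @ U.T   # U diag(theta) U^T
```

Because every eigenvalue of the estimate is strictly positive, the result is positive definite even when $p > n$ and $S$ is singular. This is the high-dimensional case in which the ordinary inverse of $S$ does not exist.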
7	Some contributions to large precision matrix estimation Balmand, Samuel 27 June 2016 Under the Gaussian assumption, the relationship between conditional independence and sparsity justifies the construction of estimators of the inverse of the covariance matrix, also called the precision matrix, from regularized approaches. This thesis, originally motivated by the problem of image classification, aims at developing a method to estimate the precision matrix in high dimension, that is, when the sample size $n$ is small compared to the dimension $p$ of the model. Our approach relies essentially on the connection between the precision matrix and the linear regression model. It consists of estimating the precision matrix in two steps. The off-diagonal elements are first estimated by solving $p$ minimization problems of the $\ell_1$-penalized square-root least-squares type. The diagonal entries are then obtained from the result of the previous step, by residual analysis or likelihood maximization. These various estimators of the diagonal entries are compared in terms of estimation risk. Moreover, we propose a new estimator designed to account for possible contamination of the data by outliers, through the addition of an $\ell_2/\ell_1$ mixed-norm regularization term. The non-asymptotic analysis of the convergence of our estimator highlights the relevance of our method. Precision matrix estimation Sparse regression Gaussian graphical models Robust estimation Nonasymptotic analysis Convex minimization
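The two-step idea can be sketched with the regression characterization of the precision matrix. Regressing $X_j$ on the other variables gives coefficients $\beta_{jk} = -\theta_{jk}/\theta_{jj}$ and residual variance $1/\theta_{jj}$. The sketch below assumes NumPy. It solves each square-root lasso through its equivalent scaled-lasso form, alternating a noise-level estimate $\sigma$ with a lasso fit at penalty $\lambda\sigma$, and uses the residual-analysis option for the diagonal. The function names and the symmetrization by averaging are assumptions, and the robust $\ell_2/\ell_1$ variant is not shown.

```python
import numpy as np


def lasso_cd(X, y, alpha, beta=None, n_iter=200, tol=1e-8):
    """Coordinate descent for (1 / 2n) ||y - X b||^2 + alpha * ||b||_1."""
    n, d = X.shape
    beta = np.zeros(d) if beta is None else beta.copy()
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y - X @ beta
    for _ in range(n_iter):
        max_change = 0.0
        for k in range(d):
            old = beta[k]
            rho = X[:, k] @ resid / n + col_sq[k] * old   # partial-residual correlation
            beta[k] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[k]
            if beta[k] != old:
                resid -= X[:, k] * (beta[k] - old)
                max_change = max(max_change, abs(beta[k] - old))
        if max_change < tol:
            break
    return beta


def sqrt_lasso(X, y, lam, n_iter=50, tol=1e-8):
    """Square-root lasso ||y - X b|| / sqrt(n) + lam ||b||_1 via its scaled-lasso form."""
    n = X.shape[0]
    beta = np.zeros(X.shape[1])
    sigma = np.linalg.norm(y) / np.sqrt(n)
    for _ in range(n_iter):
        beta = lasso_cd(X, y, lam * sigma, beta)               # lasso at fixed noise level
        new_sigma = np.linalg.norm(y - X @ beta) / np.sqrt(n)   # noise level at fixed beta
        converged = abs(new_sigma - sigma) < tol
        sigma = new_sigma
        if converged:
            break
    return beta, sigma


def two_step_precision(X, lam):
    """Step 1: off-diagonals from p square-root lasso regressions.
    Step 2: diagonals from the residual variances (theta_jj = 1 / sigma_j^2)."""
    Xc = X - X.mean(axis=0)
    p = Xc.shape[1]
    Theta = np.zeros((p, p))
    for j in range(p):
        others = np.arange(p) != j
        beta, sigma = sqrt_lasso(Xc[:, others], Xc[:, j], lam)
        Theta[j, j] = 1.0 / sigma ** 2
        Theta[j, others] = -beta * Theta[j, j]   # beta_jk = -theta_jk / theta_jj
    return (Theta + Theta.T) / 2   # simple symmetrisation (an assumption)
```

A practical appeal of the square-root form is that the effective lasso penalty $\lambda\sigma$ adapts to each regression's unknown noise level.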
8	Addressing Challenges in Graphical Models: MAP estimation, Evidence, Non-Normality, and Subject-Specific Inference Sagar K N Ksheera (15295831) 17 April 2023 Graphs are a natural choice for understanding the associations between variables, and assuming a probabilistic embedding for the graph structure leads to a variety of graphical models that enable us to understand these associations even further. In the realm of high-dimensional data, where the number of associations between interacting variables is far greater than the available number of data points, the goal is to infer a sparse graph. In this thesis, we make contributions in the domain of Bayesian graphical models, where our prior belief on the graph structure, encoded via uncertainty on the model parameters, enables the estimation of sparse graphs.
We begin with the Gaussian Graphical Model (GGM) in Chapter 2, one of the simplest and most famous graphical models, where the joint distribution of interacting variables is assumed to be Gaussian. In GGMs, conditional independence among variables is encoded in the inverse of the covariance matrix, also known as the precision matrix. Under a Bayesian framework, we propose a novel prior-penalty dual, called the "graphical horseshoe-like" prior and penalty, to estimate the precision matrix. We also establish the posterior convergence of the precision matrix estimate and the frequentist consistency of the maximum a posteriori (MAP) estimator.
In Chapter 3, we develop a general framework based on local linear approximation for MAP estimation of the precision matrix in GGMs. This general framework holds for any graphical prior whose element-wise priors can be written as Laplace scale mixtures.
As an application of the framework, we perform MAP estimation of the precision matrix under the graphical horseshoe penalty.
In Chapter 4, we focus on graphical models where the joint distribution of interacting variables cannot be assumed to be Gaussian. Motivated by quantile graphical models, where the Gaussian likelihood assumption is relaxed, we draw inspiration from precision medicine, where personalized inference is crucial to tailoring individual-specific treatment plans. With the aim of inferring directed acyclic graphs (DAGs), we propose a novel quantile DAG learning framework in which the DAGs depend on individual-specific covariates, making personalized inference possible. We demonstrate the potential of this framework for precision medicine by applying it to infer protein-protein interaction networks in lung adenocarcinoma and lung squamous cell carcinoma.
Finally, we conclude this thesis in Chapter 5 by developing a novel framework to compute the marginal likelihood in a GGM, addressing a longstanding open problem. Under this framework, we can compute the marginal likelihood for a broad class of priors on the precision matrix. In this class, the element-wise priors on the diagonal entries can be written as gamma random variables or scale mixtures of gamma random variables, and those on the off-diagonal terms can be represented as normals or scale mixtures of normals. This result paves the way for model selection using Bayes factors and for tuning prior hyper-parameters. Applied statistics Biostatistics Computational statistics Statistical data science Statistical theory graphical models non-convex optimization posterior concentration posterior consistency sparsity complete monotonicity graph structure learning graphical horseshoe prior precision matrix estimation Global-local shrinkage priors Precision medicine Quantile regression Varying sparsity model Bayes factor Chib's method Marginal likelihood
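The local linear approximation (LLA) step for Laplace scale-mixture priors can be illustrated on a single entry. If $p(\theta) = \int_0^\infty \tfrac{r}{2} e^{-r|\theta|} g(r)\,dr$ with Laplace rate $r$, then the penalty $\mathrm{pen}(\theta) = -\log p(\theta)$ has derivative $\mathrm{pen}'(|\theta|) = E[r \mid \theta]$. LLA replaces the penalty around the current iterate $\theta^{(k)}$ by a weighted $\ell_1$ term with weight $\mathrm{pen}'(|\theta^{(k)}|)$. The NumPy sketch below assumes a Gamma(a, b) mixing density for $r$ purely for illustration, because it yields the closed form $(a+1)/(b+|\theta|)$ to check the quadrature against. It does not reproduce the graphical horseshoe-like prior, and the function names are hypothetical.

```python
import math

import numpy as np


def gamma_pdf(r, a, b):
    """Gamma(shape=a, rate=b) density; a >= 1 keeps the value at r = 0 finite."""
    return b ** a * r ** (a - 1) * np.exp(-b * r) / math.gamma(a)


def lla_weight_numeric(theta, mixing_pdf, r_max=100.0, n_grid=200001):
    """LLA weight pen'(|theta|) = E[r | theta] for the Laplace scale mixture
    p(theta) = int (r / 2) exp(-r |theta|) g(r) dr, by trapezoidal quadrature."""
    r = np.linspace(0.0, r_max, n_grid)
    post = r * np.exp(-r * abs(theta)) * mixing_pdf(r)   # unnormalised density of r given theta
    dr = np.diff(r)

    def trapezoid(f):
        return np.sum((f[1:] + f[:-1]) * dr) / 2.0

    return trapezoid(r * post) / trapezoid(post)


def lla_weight_gamma(theta, a, b):
    """Closed form for Gamma(a, b) mixing: pen(theta) = (a + 1) log(b + |theta|) + const."""
    return (a + 1.0) / (b + abs(theta))


weight = lla_weight_numeric(0.5, lambda r: gamma_pdf(r, 2.0, 1.0))
```

Inside a MAP algorithm, such weights would be evaluated at the current estimate and used as entrywise penalties in a weighted graphical lasso that is then re-solved. Larger current entries receive smaller weights and are therefore shrunk less.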
