Spelling suggestions: "subject:"gaussian graphical models"" "subject:"maussian graphical models""
11 |
Quelques contributions à l'estimation de grandes matrices de précision / Some contributions to large precision matrix estimationBalmand, Samuel 27 June 2016 (has links)
Sous l'hypothèse gaussienne, la relation entre indépendance conditionnelle et parcimonie permet de justifier la construction d'estimateurs de l'inverse de la matrice de covariance -- également appelée matrice de précision -- à partir d'approches régularisées. Cette thèse, motivée à l'origine par la problématique de classification d'images, vise à développer une méthode d'estimation de la matrice de précision en grande dimension, lorsque le nombre $n$ d'observations est petit devant la dimension $p$ du modèle. Notre approche repose essentiellement sur les liens qu'entretiennent la matrice de précision et le modèle de régression linéaire. Elle consiste à estimer la matrice de précision en deux temps. Les éléments non diagonaux sont tout d'abord estimés en considérant $p$ problèmes de minimisation du type racine carrée des moindres carrés pénalisés par la norme $ell_1$.Les éléments diagonaux sont ensuite obtenus à partir du résultat de l'étape précédente, par analyse résiduelle ou maximum de vraisemblance. Nous comparons ces différents estimateurs des termes diagonaux en fonction de leur risque d'estimation. De plus, nous proposons un nouvel estimateur, conçu de sorte à tenir compte de la possible contamination des données par des {em outliers}, grâce à l'ajout d'un terme de régularisation en norme mixte $ell_2/ell_1$. L'analyse non-asymptotique de la convergence de notre estimateur souligne la pertinence de notre méthode / Under the Gaussian assumption, the relationship between conditional independence and sparsity allows to justify the construction of estimators of the inverse of the covariance matrix -- also called precision matrix -- from regularized approaches. This thesis, originally motivated by the problem of image classification, aims at developing a method to estimate the precision matrix in high dimension, that is when the sample size $n$ is small compared to the dimension $p$ of the model. Our approach relies basically on the connection of the precision matrix to the linear regression model. It consists of estimating the precision matrix in two steps. The off-diagonal elements are first estimated by solving $p$ minimization problems of the type $ell_1$-penalized square-root of least-squares. The diagonal entries are then obtained from the result of the previous step, by residual analysis of likelihood maximization. This various estimators of the diagonal entries are compared in terms of estimation risk. Moreover, we propose a new estimator, designed to consider the possible contamination of data by outliers, thanks to the addition of a $ell_2/ell_1$ mixed norm regularization term. The nonasymptotic analysis of the consistency of our estimator points out the relevance of our method
|
12 |
Uncovering Structure in High-Dimensions: Networks and Multi-task Learning ProblemsKolar, Mladen 01 July 2013 (has links)
Extracting knowledge and providing insights into complex mechanisms underlying noisy high-dimensional data sets is of utmost importance in many scientific domains. Statistical modeling has become ubiquitous in the analysis of high dimensional functional data in search of better understanding of cognition mechanisms, in the exploration of large-scale gene regulatory networks in hope of developing drugs for lethal diseases, and in prediction of volatility in stock market in hope of beating the market. Statistical analysis in these high-dimensional data sets is possible only if an estimation procedure exploits hidden structures underlying data.
This thesis develops flexible estimation procedures with provable theoretical guarantees for uncovering unknown hidden structures underlying data generating process. Of particular interest are procedures that can be used on high dimensional data sets where the number of samples n is much smaller than the ambient dimension p. Learning in high-dimensions is difficult due to the curse of dimensionality, however, the special problem structure makes inference possible. Due to its importance for scientific discovery, we put emphasis on consistent structure recovery throughout the thesis. Particular focus is given to two important problems, semi-parametric estimation of networks and feature selection in multi-task learning.
|
13 |
Modelling of extremesHitz, Adrien January 2016 (has links)
This work focuses on statistical methods to understand how frequently rare events occur and what the magnitude of extreme values such as large losses is. It lies in a field called extreme value analysis whose scope is to provide support for scientific decision making when extreme observations are of particular importance such as in environmental applications, insurance and finance. In the univariate case, I propose new techniques to model tails of discrete distributions and illustrate them in an application on word frequency and multiple birth data. Suitably rescaled, the limiting tails of some discrete distributions are shown to converge to a discrete generalized Pareto distribution and generalized Zipf distribution respectively. In the multivariate high-dimensional case, I suggest modeling tail dependence between random variables by a graph such that its nodes correspond to the variables and shocks propagate through the edges. Relying on the ideas of graphical models, I prove that if the variables satisfy a new notion called asymptotic conditional independence, then the density of the joint distribution can be simplified and expressed in terms of lower dimensional functions. This generalizes the Hammersley- Clifford theorem and enables us to infer tail distributions from observations in reduced dimension. As an illustration, extreme river flows are modeled by a tree graphical model whose structure appears to recover almost exactly the actual river network. A fundamental concept when studying limiting tail distributions is regular variation. I propose a new notion in the multivariate case called one-component regular variation, of which Karamata's and the representation theorem, two important results in the univariate case, are generalizations. Eventually, I turn my attention to website visit data and fit a censored copula Gaussian graphical model allowing the visualization of users' behavior by a graph.
|
Page generated in 0.0773 seconds