11

The polyphonic Latin hymns of Orlando di Lasso: a liturgical and repertorial study

Zager, Daniel. January 1985 (has links)
Thesis (Ph. D.)--University of Minnesota, 1985. Typescript. Includes bibliographical references (leaves 300-310).
12

Threshold Regression Estimation via Lasso, Elastic-Net, and LAD-Lasso: A Simulation Study with Applications to Urban Traffic Data

January 2015 (has links)
Threshold regression is used to model regime-switching dynamics, where the effect of the explanatory variables on the response depends on whether a certain threshold has been crossed. When regime-switching dynamics are present, new estimation problems arise related to estimating the value of the threshold. Conventional methods use an iterative search procedure that minimizes the sum-of-squares criterion. However, when unnecessary variables are included in the model, or when certain variables drop out of the model depending on the regime, this method can have high variability. This thesis proposes Lasso-type methods as an alternative to ordinary least squares. By incorporating an L1 penalty term, Lasso methods perform variable selection, potentially reducing some of the variance in estimating the threshold parameter. The thesis discusses the results of a study in which two different underlying model structures were simulated: the first is a regression model with correlated predictors, the second a self-exciting threshold autoregressive model. Finally, the proposed Lasso-type methods are compared to conventional methods in an application to urban traffic data. / Masters Thesis, Industrial Engineering, 2015
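The grid-search-plus-Lasso idea described in this abstract can be sketched in a few lines. The snippet below is an illustrative sketch on synthetic data, not the thesis's code: for each candidate threshold it fits a regime-split Lasso with scikit-learn and keeps the threshold with the smallest residual sum of squares. The data-generating model, penalty level, and candidate grid are all assumptions.

```python
# Minimal sketch: estimate a threshold by grid search, fitting a Lasso
# on a regime-split design at each candidate (synthetic data, illustrative only).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 400, 10
X = rng.normal(size=(n, p))
q = X[:, 0]                      # threshold variable
true_threshold = 0.3
# Regime 1 uses coefficient 2.0 on x1, regime 2 uses -1.5 on x2; the rest are noise.
y = np.where(q <= true_threshold, 2.0 * X[:, 1], -1.5 * X[:, 2]) + rng.normal(scale=0.5, size=n)

def regime_design(X, q, r):
    """Stack regime-specific copies of X so each regime gets its own coefficients."""
    low = (q <= r)[:, None]
    return np.hstack([X * low, X * (~low)])

candidates = np.quantile(q, np.linspace(0.15, 0.85, 50))
best = None
for r in candidates:
    D = regime_design(X, q, r)
    model = Lasso(alpha=0.05).fit(D, y)
    sse = np.sum((y - model.predict(D)) ** 2)   # sum-of-squares criterion
    if best is None or sse < best[0]:
        best = (sse, r, model)

print("estimated threshold:", round(best[1], 3))
print("nonzero coefficients:", np.flatnonzero(best[2].coef_))
```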
13

On ridge regression and least absolute shrinkage and selection operator

AlNasser, Hassan 30 August 2017 (has links)
This thesis focuses on ridge regression (RR) and the least absolute shrinkage and selection operator (lasso). Ridge properties are investigated in detail, including the bias, the variance, and the mean squared error as functions of the tuning parameter. We also study the convexity of the trace of the mean squared error in terms of the tuning parameter. In addition, we examine some special properties of RR for factorial experiments. Not only do we review ridge properties, we also review lasso properties, because the two methods are somewhat similar: rather than shrinking the estimates toward zero as RR does, the lasso provides a sparse solution, setting many coefficient estimates exactly to zero. Furthermore, we try a new approach to the lasso problem by formulating it as a bilevel program and implementing a new algorithm to solve it. / Graduate
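As a hedged illustration of the contrast drawn above (ridge shrinks coefficients continuously, the lasso sets some exactly to zero), the following sketch fits both estimators over a grid of tuning parameters on synthetic data; the data and the alpha grid are arbitrary choices, not taken from the thesis.

```python
# Sketch: contrasting ridge shrinkage with the lasso's exact zeros as the
# tuning parameter grows (synthetic data; alphas chosen arbitrarily).
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(1)
n, p = 200, 8
X = rng.normal(size=(n, p))
beta = np.array([3.0, -2.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0])   # a sparse truth
y = X @ beta + rng.normal(scale=1.0, size=n)

for alpha in [0.01, 0.1, 1.0, 10.0]:
    ridge = Ridge(alpha=alpha).fit(X, y)
    lasso = Lasso(alpha=alpha).fit(X, y)
    # Ridge coefficients shrink but stay nonzero; lasso zeroes more of them as alpha grows.
    print(f"alpha={alpha:5.2f}  ridge zeros: {np.sum(ridge.coef_ == 0)}  "
          f"lasso zeros: {np.sum(lasso.coef_ == 0)}")
```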
14

Statistical analysis of high dimensional data

Ruan, Lingyan 05 November 2010 (has links)
This century is surely the century of data (Donoho, 2000). Data analysis has been an emerging activity over the last few decades, and high dimensional data in particular have become more and more pervasive with the advance of massive data collection systems such as microarrays, satellite imagery, and financial data. However, the analysis of high dimensional data is challenging because of the so-called curse of dimensionality (Bellman, 1961). This dissertation presents several methodologies for high dimensional data analysis.

The first part discusses a joint analysis of multiple microarray gene expression studies. Microarray analysis dates back to Golub et al. (1999) and has drawn much attention since. One common goal of microarray analysis is to determine which genes are differentially expressed, that is, which genes behave significantly differently between groups of individuals. However, microarray studies involve thousands of genes but few arrays (samples, individuals), so reproducibility remains relatively low. It is natural to consider joint analyses that effectively combine microarrays from different experiments in order to achieve improved accuracy. In particular, we present a model-based approach for better identification of differentially expressed genes by incorporating data from different studies. The model can accommodate, in a seamless fashion, a wide range of studies, including those performed on different platforms and/or under different but overlapping biological conditions. Model-based inference can be carried out in an empirical Bayes fashion. Because of the information sharing among studies, the joint analysis dramatically improves inference relative to individual analyses. Simulation studies and real data examples demonstrate the effectiveness of the proposed approach under a variety of complications that often arise in practice.

The second part concerns covariance matrix estimation for high dimensional data. First, we propose a penalized likelihood estimator for the high dimensional t-distribution. The Student t-distribution is of increasing interest in mathematical finance, education, and many other applications, but its use is limited by the difficulty of estimating the covariance matrix in high dimensions. We show that by imposing a LASSO penalty on the Cholesky factors of the covariance matrix, an EM algorithm can efficiently compute the estimator, which performs much better than other popular estimators. Second, we propose an estimator for high dimensional Gaussian mixture models. Finite Gaussian mixture models are widely used in statistics thanks to their great flexibility, but parameter estimation in high dimensions can be rather challenging because of the huge number of parameters to estimate. We therefore propose a penalized likelihood estimator that specifically addresses this difficulty: the LASSO penalty imposed on the inverse covariance matrices encourages sparsity in their entries and thereby helps reduce the dimensionality of the problem. We show that the proposed estimator can be computed efficiently via an Expectation-Maximization algorithm. To illustrate its practical merits, we consider applications in model-based clustering and mixture discriminant analysis. Numerical experiments with both simulated and real data show that the new method is a valuable tool for handling high dimensional data.

Finally, we present structured estimators for high dimensional Gaussian mixture models. The graphical representation of every cluster in a Gaussian mixture model may have the same or a similar structure, an important feature in many applications such as image processing, speech recognition, and gene network analysis. Failing to account for this shared structure would deteriorate the estimation accuracy. To address this, we propose two structured estimators, a hierarchical Lasso estimator and a group Lasso estimator. An EM algorithm can be applied to solve the estimation problem conveniently. We show that when clusters share similar structures, the proposed estimators perform much better than the separate Lasso estimator.
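A minimal sketch of the key ingredient in the second part, an L1 (LASSO) penalty that induces a sparse inverse covariance estimate, is shown below using scikit-learn's GraphicalLasso on synthetic Gaussian data. This is an assumption-laden stand-in for illustration, not the penalized mixture estimators developed in the dissertation.

```python
# Sketch: L1-penalized (graphical lasso) estimation of a sparse inverse
# covariance matrix from synthetic Gaussian data.
import numpy as np
from sklearn.covariance import GraphicalLasso
from sklearn.datasets import make_sparse_spd_matrix

rng = np.random.default_rng(2)
p, n = 20, 300
precision = make_sparse_spd_matrix(p, alpha=0.9, random_state=2)  # sparse true inverse covariance
cov = np.linalg.inv(precision)
X = rng.multivariate_normal(np.zeros(p), cov, size=n)

model = GraphicalLasso(alpha=0.1).fit(X)          # L1 penalty on the precision matrix
est_precision = model.precision_
upper = np.triu_indices(p, 1)
print("nonzero off-diagonal entries (true):     ", np.sum(precision[upper] != 0))
print("nonzero off-diagonal entries (estimated):", np.sum(np.abs(est_precision[upper]) > 1e-4))
```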
15

The Linkage Disequilibrium LASSO for SNP Selection in Genetic Association Studies

Younkin, Samuel G. January 2011 (has links)
No description available.
16

Adaptive learning in lasso models

Patnaik, Kaushik 07 January 2016 (has links)
Regression with L1 regularization, the Lasso, is a popular algorithm for recovering the sparsity pattern (also known as model selection) in linear models from observations contaminated by noise. We examine a scenario in which a fraction of the zero covariates are highly correlated with the non-zero covariates, making sparsity recovery difficult. We propose two methods that adaptively increment the regularization parameter to prune the Lasso solution set. We prove that the algorithms achieve consistent model selection with high probability while using fewer samples than the traditional Lasso. The algorithms can be extended to a broad class of L1-regularized M-estimators for linear statistical models.
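A hedged sketch of the penalty-incrementing idea is below: it geometrically increases the lasso regularization parameter and reports the active set at each step on synthetic data in which some zero covariates are highly correlated with the true signals. The schedule, penalty values, and data model are illustrative assumptions, not the algorithms analyzed in the thesis.

```python
# Illustrative sketch (not the thesis algorithm): increase the lasso penalty
# geometrically and watch the active set get pruned when some zero covariates
# are highly correlated with the true ones.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n, p = 150, 12
Z = rng.normal(size=(n, 3))                      # true signal covariates (indices 0-2)
corr_noise = Z + 0.1 * rng.normal(size=(n, 3))   # zero covariates highly correlated with them (3-5)
X = np.hstack([Z, corr_noise, rng.normal(size=(n, p - 6))])
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + rng.normal(scale=0.5, size=n)

alpha = 0.01
for _ in range(10):
    active = np.flatnonzero(Lasso(alpha=alpha, max_iter=10000).fit(X, y).coef_)
    print(f"alpha={alpha:6.3f}  active set: {active.tolist()}")
    alpha *= 2   # adaptively increment the penalty to prune spurious covariates
```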
17

Modeling Non-Linear Relationships Between DNA Methylation and Age: The Application of Regularization Methods to Predict Human Age and the Implication of DNA Methylation in Immunosenescence

Johnson, Nicholas 13 May 2016 (has links)
Background: Gene expression is regulated via highly coordinated epigenetic changes, the most studied of which is DNA methylation (DNAm). Many studies have shown that DNAm is linearly associated with age, and some have even used DNAm data to build predictive models of human age, which are immensely important considering that DNAm can predict health outcomes, such as all-cause mortality, better than chronological age. Nevertheless, few studies have investigated non-linear relationships between DNAm and age, which could potentially improve these predictive models. While such investigations are relevant to predicting health outcomes, non-linear relationships between DNAm and age can also add to our understanding of biological responses to late-life events, such as diseases that afflict the elderly. Objectives: We aim to (1) examine non-linear relationships between DNAm and age at specific loci on the genome and (2) build upon regularization methods by comparing prediction errors between models with both non-transformed and square-root transformed predictors to models that include only non-transformed predictors. We used both the sparse partial least squares (SPLS) regression model and the lasso regression model to make our comparisons. Results: We found two age-differentially methylated sites implicated in the regulation of a gene known as KLF14, which could be involved in an immunosenescent phenotype. Inclusion of the square-root transformed variables had little effect on the prediction error of the SPLS model. On the other hand, the prediction error increased substantially in the lasso regression model, particularly when few predictors (70) were included. Conclusion: The growing amount and complexity of biological data coupled with advances in computational technology are indispensable to our understanding of biological pathways and perplexing biological phenomena. Moreover, high-dimensional biological data have enormous implications for clinical practice. Our findings implicate a possible biological pathway involved in immunosenescence. While we were unable to improve the predictive models of human age, future research should investigate other possible non-linear relationships between DNAm and human age, considering that such statistical methods can improve predictions of health outcomes.
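The comparison of prediction error with and without square-root transformed predictors can be sketched as follows. This is a toy example on synthetic data in which a square-root effect is present by construction, so its output will not mirror the study's findings; the stand-in features, sample sizes, and use of LassoCV are assumptions for illustration.

```python
# Sketch: lasso prediction error with original predictors versus predictors
# augmented by their square-root transforms (synthetic stand-in for methylation
# proportions, which lie in [0, 1] so the square root is well defined).
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(4)
n, p = 300, 50
X = rng.uniform(0, 1, size=(n, p))                          # stand-in for methylation values
age = 30 + 40 * np.sqrt(X[:, 0]) - 25 * X[:, 1] + rng.normal(scale=3, size=n)

X_aug = np.hstack([X, np.sqrt(X)])                          # add square-root transformed predictors
for name, features in [("original", X), ("augmented", X_aug)]:
    X_tr, X_te, y_tr, y_te = train_test_split(features, age, random_state=0)
    model = LassoCV(cv=5).fit(X_tr, y_tr)
    print(name, "test MSE:", round(mean_squared_error(y_te, model.predict(X_te)), 2))
```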
18

Least absolute shrinkage and selection operator type methods for the identification of serum biomarkers of overweight and obesity: simulation and application

Vasquez, Monica M., Hu, Chengcheng, Roe, Denise J., Chen, Zhao, Halonen, Marilyn, Guerra, Stefano 14 November 2016 (has links)
Background: The study of circulating biomarkers and their association with disease outcomes has become progressively complex due to advances in the measurement of these biomarkers through multiplex technologies. The Least Absolute Shrinkage and Selection Operator (LASSO) is a data analysis method that may be utilized for biomarker selection in these high dimensional data. However, it is unclear which LASSO-type method is preferable when considering data scenarios that may be present in serum biomarker research, such as high correlation between biomarkers, weak associations with the outcome, and sparse number of true signals. The goal of this study was to compare the LASSO to five LASSO-type methods given these scenarios. Methods: A simulation study was performed to compare the LASSO, Adaptive LASSO, Elastic Net, Iterated LASSO, Bootstrap-Enhanced LASSO, and Weighted Fusion for the binary logistic regression model. The simulation study was designed to reflect the data structure of the population-based Tucson Epidemiological Study of Airway Obstructive Disease (TESAOD), specifically the sample size (N = 1000 for total population, 500 for sub-analyses), correlation of biomarkers (0.20, 0.50, 0.80), prevalence of overweight (40%) and obese (12%) outcomes, and the association of outcomes with standardized serum biomarker concentrations (log-odds ratio = 0.05-1.75). Each LASSO-type method was then applied to the TESAOD data of 306 overweight, 66 obese, and 463 normal-weight subjects with a panel of 86 serum biomarkers. Results: Based on the simulation study, no method had an overall superior performance. The Weighted Fusion correctly identified more true signals, but incorrectly included more noise variables. The LASSO and Elastic Net correctly identified many true signals and excluded more noise variables. In the application study, biomarkers of overweight and obesity selected by all methods were Adiponectin, Apolipoprotein H, Calcitonin, CD14, Complement 3, C-reactive protein, Ferritin, Growth Hormone, Immunoglobulin M, Interleukin-18, Leptin, Monocyte Chemotactic Protein-1, Myoglobin, Sex Hormone Binding Globulin, Surfactant Protein D, and YKL-40. Conclusions: For the data scenarios examined, choice of optimal LASSO-type method was data structure dependent and should be guided by the research objective. The LASSO-type methods identified biomarkers that have known associations with obesity and obesity related conditions.
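The plain LASSO variant compared in this study corresponds to an L1-penalized logistic regression. A minimal sketch on synthetic data follows; the biomarker matrix, true signals, and penalty strength C are illustrative assumptions, not the TESAOD panel or the tuning procedure used in the paper.

```python
# Sketch: L1-penalized logistic regression for selecting "biomarkers"
# associated with a binary outcome (synthetic data, illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
n, p = 500, 86                                   # mimic a panel of 86 serum biomarkers
X = StandardScaler().fit_transform(rng.normal(size=(n, p)))
logit = 0.8 * X[:, 0] - 0.6 * X[:, 1] + 0.5 * X[:, 2]     # a few true signals
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# C is the inverse penalty strength; smaller C means a sparser selection.
lasso_logit = LogisticRegression(penalty="l1", solver="liblinear", C=0.2).fit(X, y)
selected = np.flatnonzero(lasso_logit.coef_[0])
print("selected biomarker indices:", selected.tolist())
```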
19

Dimension Reduction and LASSO using Pointwise and Group Norms

Jutras, Melanie A 11 December 2018 (has links)
Principal Components Analysis (PCA) is a statistical procedure commonly used to analyze high dimensional data. It is often used for dimensionality reduction, which is accomplished by determining orthogonal components that contribute most to the underlying variance of the data. While PCA is widely used for identifying patterns and capturing the variability of data in lower dimensions, it has some known limitations. In particular, PCA represents its results as linear combinations of data attributes, so it is often seen as difficult to interpret, and because of the underlying optimization problem being solved it is not robust to outliers. In this thesis, we examine extensions of PCA that address these limitations. Specific techniques researched in this thesis include variations of Robust and Sparse PCA, as well as novel combinations of these two methods that result in a structured low-rank approximation that is robust to outliers. Our work is inspired by the well-known machine learning method of the Least Absolute Shrinkage and Selection Operator (LASSO) as well as by pointwise and group matrix norms. Practical applications are discussed, including robust and non-linear methods for anomaly detection in Domain Name System network data and interpretable feature selection for a website classification problem, along with implementation details and techniques for analyzing the regularization parameters.
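Sparse PCA, one of the building blocks combined in this thesis, can be illustrated briefly: adding an L1-type penalty to the loadings makes each component depend on only a few attributes. The sketch below uses scikit-learn's SparsePCA on synthetic data and is only an assumption-based illustration; it does not include the robustness modifications the thesis develops.

```python
# Sketch: sparse versus ordinary PCA loadings (synthetic data).
import numpy as np
from sklearn.decomposition import PCA, SparsePCA

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 15))
X[:, :3] += 3 * rng.normal(size=(200, 1))        # a shared latent factor on 3 attributes

pca = PCA(n_components=2).fit(X)
spca = SparsePCA(n_components=2, alpha=2.0, random_state=0).fit(X)  # alpha is the L1 penalty

# Ordinary PCA loadings are dense; sparse PCA zeroes out most of them.
print("PCA nonzero loadings per component:      ", np.sum(pca.components_ != 0, axis=1))
print("SparsePCA nonzero loadings per component:", np.sum(spca.components_ != 0, axis=1))
```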
20

Estimation for counting processes with high-dimensional covariates

Lemler, Sarah 09 December 2014 (has links)
We consider the problem of estimating the intensity of a counting process adjusted on high-dimensional covariates. We propose two different approaches. First, we consider a non-parametric intensity function and estimate it by the best Cox proportional hazards model given two dictionaries of functions. The first dictionary is used to construct an approximation of the logarithm of the baseline hazard function and the second to approximate the relative risk. In this high-dimensional setting, we consider the Lasso procedure to estimate simultaneously the unknown parameters of the best Cox model approximating the intensity. We provide non-asymptotic oracle inequalities for the resulting Lasso estimator. In a second part, we consider an intensity that relies on the Cox model. We propose two two-step procedures to estimate the unknown parameters of the Cox model. Both procedures rely on a first step that consists of estimating the regression parameter in high dimension via a Lasso procedure. The baseline function is then estimated either via model selection or by a kernel estimator with a bandwidth selected by the Goldenshluger and Lepski method. We establish non-asymptotic oracle inequalities for the two resulting estimators of the baseline function. We conduct a comparative study of these estimators on simulated data, and finally, we apply the implemented procedures to a real dataset on breast cancer.
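A rough sketch of the first step (a lasso-penalized Cox regression in high dimension) is given below using the lifelines package as a stand-in; the package choice, the simulated survival data, and the penalty level are all assumptions, not the procedures or dataset of the thesis.

```python
# Sketch: lasso-penalized Cox regression on simulated survival data using
# lifelines as a stand-in implementation (illustrative only).
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(7)
n, p = 300, 10
X = rng.normal(size=(n, p))
hazard = np.exp(0.8 * X[:, 0] - 0.5 * X[:, 1])   # two active covariates drive the hazard
T = rng.exponential(1.0 / hazard)                # event times
E = rng.binomial(1, 0.8, size=n)                 # event indicator (some observations censored)

df = pd.DataFrame(X, columns=[f"x{j}" for j in range(p)])
df["T"], df["E"] = T, E

# l1_ratio=1.0 makes the elastic-net penalty a pure lasso penalty.
cph = CoxPHFitter(penalizer=0.1, l1_ratio=1.0)
cph.fit(df, duration_col="T", event_col="E")
print(cph.params_[cph.params_.abs() > 1e-3])     # covariates retained by the penalized fit
```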
