About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

A numerical study of penalized regression

Yu, Han 22 August 2013 (has links)
In this thesis, we review important aspects and issues of multiple linear regression, in particular on the problem of multi-collinearity. The focus is on a numerical study of different methods of penalized regression, including the ridge regression, lasso regression and elastic net regression, as well as the newly introduced correlation adjusted regression and correlation adjusted elastic net regression. We compare the performance and relative advantages of these methods.
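
As a concrete illustration of the three standard penalties this thesis compares (an editorial sketch, not code from the thesis), the following Python snippet fits ridge, lasso, and elastic net regressions to a deliberately collinear synthetic design; all data-generating values are assumptions chosen for demonstration.

    # Sketch: comparing penalized regression methods on collinear data.
    # Illustrative only -- not code from the thesis above.
    import numpy as np
    from sklearn.linear_model import Ridge, Lasso, ElasticNet

    rng = np.random.default_rng(0)
    n, p = 100, 10

    # Build a highly collinear design: each predictor is a noisy copy of a
    # common latent variable, so ordinary least squares is unstable.
    latent = rng.normal(size=(n, 1))
    X = latent + 0.1 * rng.normal(size=(n, p))
    beta = np.array([3.0, -2.0] + [0.0] * (p - 2))  # sparse truth (assumed)
    y = X @ beta + rng.normal(size=n)

    for name, model in [("ridge", Ridge(alpha=1.0)),
                        ("lasso", Lasso(alpha=0.1)),
                        ("elastic net", ElasticNet(alpha=0.1, l1_ratio=0.5))]:
        model.fit(X, y)
        print(f"{name:12s} coefficients: {np.round(model.coef_, 2)}")

With data like these, the lasso and elastic net typically zero out most of the redundant coefficients, while ridge shrinks all of them toward zero without setting any exactly to zero.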
2

Penalized methods in genome-wide association studies

Liu, Jin 01 July 2011 (has links)
Penalized regression methods are becoming increasingly popular in genome-wide association studies (GWAS) for identifying genetic markers associated with disease. However, standard penalized methods such as the LASSO do not take into account the possible linkage disequilibrium between adjacent markers. We propose a novel penalized approach for GWAS using a dense set of single nucleotide polymorphisms (SNPs). The proposed method uses the minimax concave penalty (MCP) for marker selection and incorporates linkage disequilibrium (LD) information by penalizing the difference of the genetic effects at adjacent SNPs with high correlation. A coordinate descent algorithm is derived to implement the proposed method. This algorithm is efficient and stable in dealing with a large number of SNPs. A multi-split method is used to calculate the p-values of the selected SNPs for assessing their significance. We refer to the proposed penalty function as the smoothed MCP (SMCP) and the proposed approach as the SMCP method. Performance of the proposed SMCP method and its comparison with a LASSO approach are evaluated through simulation studies, which demonstrate that the proposed method is more accurate in selecting associated SNPs. Its applicability to real data is illustrated using data from a GWAS on rheumatoid arthritis.

Based on the idea of the SMCP, we propose a new penalized method for group variable selection in GWAS that accounts for the correlation between adjacent groups. The proposed method uses the group LASSO to encourage group sparsity and a quadratic difference penalty for adjacent-group smoothing; we call it the smoothed group LASSO, or SGL for short. Canonical correlations between adjacent groups of SNPs are used as the weights in the quadratic difference penalty. Principal components are used to reduce dimensionality locally within groups. We derive a group coordinate descent algorithm for computing the solution path of the SGL. Simulation studies are used to evaluate the finite-sample performance of the SGL and the group LASSO, and we also demonstrate the method's applicability on rheumatoid arthritis data.
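
The univariate MCP update inside a coordinate descent loop has a simple closed form (the so-called firm threshold), which is the computational core of algorithms like the one described above. Below is a minimal Python sketch of plain MCP-penalized least squares; it is an editorial illustration that omits the LD-smoothing term of the SMCP, and lam and gamma are assumed tuning parameters.

    # Sketch: coordinate descent for MCP-penalized least squares.
    # Illustrative only; the thesis's SMCP adds an LD-smoothing term on top.
    import numpy as np

    def soft(z, lam):
        """Soft-thresholding operator."""
        return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

    def mcp_threshold(z, lam, gamma):
        """Closed-form univariate MCP solution (the 'firm' threshold)."""
        if abs(z) <= gamma * lam:
            return soft(z, lam) / (1.0 - 1.0 / gamma)
        return z

    def mcp_coordinate_descent(X, y, lam=0.1, gamma=3.0, n_iter=100):
        """X is assumed standardized: column means 0, column variances 1."""
        n, p = X.shape
        beta = np.zeros(p)
        r = y - X @ beta                       # current residual
        for _ in range(n_iter):
            for j in range(p):
                zj = X[:, j] @ r / n + beta[j]   # partial-residual correlation
                bj = mcp_threshold(zj, lam, gamma)
                r -= X[:, j] * (bj - beta[j])    # update residual in place
                beta[j] = bj
        return beta

    # Example usage on standardized synthetic data:
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))
    X = (X - X.mean(0)) / X.std(0)
    y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(size=200)
    print(np.round(mcp_coordinate_descent(X, y)[:5], 2))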
3

penalized: A MATLAB toolbox for fitting generalized linear models with penalties

McIlhagga, William H. 07 August 2015 (has links)
penalized is a flexible, extensible, and efficient MATLAB toolbox for penalized maximum likelihood. penalized allows you to fit a generalized linear model (Gaussian, logistic, Poisson, or multinomial) using any of ten provided penalties, or none. The toolbox can be extended by creating new maximum likelihood models or new penalties. The toolbox also includes routines for cross-validation and plotting.
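
The toolbox itself is MATLAB, and its API is not reproduced here. Purely as a cross-language illustration of the same idea — a penalized GLM fit with cross-validation over the penalty strength — here is a rough Python analogue; the penalty choice and synthetic data are assumptions.

    # Sketch: an L1-penalized logistic regression with cross-validation,
    # analogous in spirit to a penalized GLM fit (not the MATLAB toolbox API).
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegressionCV

    X, y = make_classification(n_samples=200, n_features=20,
                               n_informative=5, random_state=0)

    # Cross-validate the penalty strength over a grid of 10 values.
    fit = LogisticRegressionCV(Cs=10, penalty="l1", solver="liblinear", cv=5)
    fit.fit(X, y)
    print("selected C:", fit.C_[0])
    print("nonzero coefficients:", np.sum(fit.coef_ != 0))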
4

Fishing Economic Growth Determinants Using Bayesian Elastic Nets

Hofmarcher, Paul, Crespo Cuaresma, Jesus, Grün, Bettina, Hornik, Kurt 09 1900 (has links) (PDF)
We propose a method to deal simultaneously with model uncertainty and correlated regressors in linear regression models by combining elastic net specifications with a spike and slab prior. The estimation method nests ridge regression and the LASSO estimator and thus allows for a more flexible modelling framework than existing model averaging procedures. In particular, the proposed technique has clear advantages when dealing with datasets of (potentially highly) correlated regressors, a pervasive characteristic of the model averaging datasets used hitherto in the econometric literature. We apply our method to the dataset of economic growth determinants by Sala-i-Martin et al. (Sala-i-Martin, X., Doppelhofer, G., and Miller, R. I. (2004). Determinants of Long-Term Growth: A Bayesian Averaging of Classical Estimates (BACE) Approach. American Economic Review, 94: 813-835) and show that our procedure has superior out-of-sample predictive abilities as compared to the standard Bayesian model averaging methods currently used in the literature. (authors' abstract) / Series: Research Report Series / Department of Statistics and Mathematics
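
The nesting the authors exploit is easy to see numerically: in the standard elastic net objective, a mixing parameter of 1 recovers the LASSO, and a value near 0 approaches ridge regression. A minimal sketch of this behavior follows (an illustration only, not the paper's Bayesian spike-and-slab implementation; all data values are assumptions).

    # Sketch: the elastic net nests ridge and LASSO via its mixing parameter.
    # Illustration only; the paper embeds this in a Bayesian spike-and-slab prior.
    import numpy as np
    from sklearn.linear_model import ElasticNet

    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 8))
    X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=100)  # two highly correlated regressors
    y = X[:, 0] + rng.normal(size=100)

    for ratio in (0.01, 0.5, 1.0):   # ~ridge ... balanced ... LASSO
        fit = ElasticNet(alpha=0.1, l1_ratio=ratio).fit(X, y)
        print(f"l1_ratio={ratio:4.2f}  coef[0:2] = {np.round(fit.coef_[:2], 2)}")

With nearly duplicated regressors, the ridge-like end of the path splits the effect between the two copies, while the LASSO end tends to keep one and drop the other.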
5

Two component semiparametric density mixture models with a known component

Zhou Shen (5930258) 17 January 2019 (has links)
Finite mixture models have been successfully used in many applications, such as classification, clustering, and many others. As opposed to classical parametric mixture models, nonparametric and semiparametric mixture models often provide more flexible approaches to the description of inhomogeneous populations. As an example, in the last decade a particular two-component semiparametric density mixture model with a known component has attracted substantial research interest. Our thesis provides an innovative way of estimation for this model based on minimization of a smoothed objective functional, conceptually similar to the log-likelihood. The minimization is performed with the help of an EM-like algorithm. We show that the algorithm is convergent and the minimizers of the objective functional, viewed as estimators of the model parameters, are consistent.

More specifically, in our thesis, a semiparametric mixture of two density functions is considered where one of them is known while the weight and the other function are unknown. For the first part, a new sufficient identifiability condition for this model is derived, and a specific class of distributions describing the unknown component is given for which this condition is mostly satisfied. A novel approach to estimation of this model is derived. That approach is based on the idea of using a smoothed likelihood-like functional as an objective functional in order to avoid ill-posedness of the original problem. Minimization of this functional is performed using an iterative Majorization-Minimization (MM) algorithm that estimates all of the unknown parts of the model. The algorithm possesses a descent property with respect to the objective functional. Moreover, we show that the algorithm converges even when the unknown density is not defined on a compact interval. Later, we also study properties of the minimizers of this functional viewed as estimators of the mixture model parameters. Their convergence to the true solution with respect to a bandwidth parameter is justified by reconsidering it in the framework of a Tikhonov-type functional. They also turn out to be large-sample consistent; this is justified using an empirical minimization approach. The third part of the thesis contains a series of simulation studies, a comparison with another method, and a real data example. All of them show the good performance of the proposed algorithm in recovering unknown components from data.
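
To make the EM-like scheme concrete, here is a minimal sketch of one common variant of estimation for this model, written as an editorial illustration rather than the thesis's smoothed-functional algorithm: the known component is assumed standard normal, posterior weights are updated as in EM, and the unknown density is re-estimated by a weighted kernel density estimate.

    # Sketch: EM-like estimation for f(x) = (1-w) * f0(x) + w * f1(x),
    # with f0 known (standard normal here, an assumption) and f1 unknown.
    # Illustrative variant, not the thesis's smoothed-functional algorithm.
    import numpy as np
    from scipy.stats import norm

    def em_known_component(x, w=0.5, bandwidth=0.3, n_iter=50):
        n = len(x)
        f0 = norm.pdf(x)                            # known component at data
        f1 = np.full(n, 1.0 / (x.max() - x.min()))  # flat initial guess
        for _ in range(n_iter):
            # E-step: posterior probability each point came from the unknown part.
            post = w * f1 / (w * f1 + (1 - w) * f0)
            # M-step: update the weight and the unknown density (weighted KDE).
            w = post.mean()
            kern = norm.pdf((x[:, None] - x[None, :]) / bandwidth) / bandwidth
            f1 = kern @ post / post.sum()
            f1 = np.maximum(f1, 1e-300)  # guard against numerical underflow
        return w, f1

    # Example: 30% contamination from a shifted normal.
    rng = np.random.default_rng(2)
    x = np.concatenate([rng.normal(0, 1, 700), rng.normal(3, 0.5, 300)])
    w_hat, _ = em_known_component(x)
    print("estimated weight of unknown component:", round(w_hat, 3))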
6

Robust Approaches to Marker Identification and Evaluation for Risk Assessment

Dai, Wei January 2013 (has links)
Assessment of risk has been a key element in efforts to identify factors associated with disease, to assess potential targets of therapy, and to enhance disease prevention and treatment. Considerable work has been done to develop methods to identify markers, construct risk prediction models, and evaluate such models. This dissertation aims to develop robust approaches for these tasks. In Chapter 1, we present a robust, flexible, yet powerful approach to identify genetic variants that are associated with disease risk in genome-wide association studies when some subjects are related. In Chapter 2, we focus on identifying important genes predictive of survival outcome when the number of covariates greatly exceeds the number of observations, via a nonparametric transformation model. We propose a rank-based estimator that requires minimal assumptions and develop an efficient […]
7

Regularized Markov Model for Modeling Disease Transitioning

Huang, Shuang January 2017 (has links)
In longitudinal studies of chronic diseases, the disease states of individuals are often collected at several pre-scheduled clinical visits, but the exact states and the times of transitioning from one state to another between observations are not observed. This is commonly referred to as "panel data". Statistical challenges arise in panel data in regard to identifying predictors governing the transitions between different disease states with only the partially observed disease history. Continuous-time Markov models (CTMMs) are commonly used to analyze panel data, and allow maximum likelihood estimation without making any assumptions about the unobserved states and transition times. By assuming that the underlying disease process is Markovian, CTMMs yield a tractable likelihood. However, CTMMs generally allow covariate effects to differ across transitions, resulting in many more coefficients to estimate than there are covariates, and model overfitting can easily happen in practice. In three papers, I develop a regularized CTMM using the elastic net penalty for panel data, and implement it in an R package. The proposed method is capable of simultaneous variable selection and estimation even when the dimension of the covariates is high.

In the first paper (Section 2), I use the elastic net penalty to regularize the CTMM, and derive an efficient coordinate descent algorithm to solve the corresponding optimization problem. The algorithm takes advantage of the multinomial state distribution under the non-informative observation scheme assumption to simplify computation of key quantities. A simulation study shows that this method can effectively select true non-zero predictors while reducing model size.

In the second paper (Section 3), I extend the regularized CTMM developed in the first paper to accommodate exact death times and censored states. Death is commonly included as an endpoint in longitudinal studies, and the exact time of death can be easily obtained, but the state path leading to death is usually unknown. I show that exact death times result in a very different form of likelihood, and the dependency of the death time on the model requires significantly different numerical methods for computing the derivatives of the log-likelihood, a key quantity for the coordinate descent algorithm. I propose to use numerical differentiation to compute these derivatives. Computation of the derivatives of the log-likelihood from a transition involving a censored state is also discussed. I carry out a simulation study to evaluate the performance of this extension, which shows consistently good variable selection properties and prediction accuracy comparable to oracle models in which only the true non-zero coefficients are fitted. I then apply the regularized CTMM to the airflow limitation data from the TESAOD (Tucson Epidemiological Study of Airway Obstructive Disease) study with exact death times and censored states, and obtain a prediction model greatly reduced in size from a total of 220 potential parameters.

Methods developed in the first two papers are implemented in an R package, markovnet, and a detailed introduction to the key functionalities of the package is demonstrated with a simulated data set in the third paper (Section 4). Finally, some concluding remarks are given and directions for future work are discussed (Section 5).

The outline of this dissertation is as follows. Section 1 presents an in-depth background regarding panel data, CTMMs, and penalized regression methods, as well as a brief description of the TESAOD study design. Section 2 describes the first paper, entitled "Regularized continuous-time Markov model via elastic net". Section 3 describes the second paper, entitled "Regularized continuous-time Markov model with exact death times and censored states". Section 4 describes the third paper, "Regularized continuous-time Markov model for panel data: the markovnet package for R". Section 5 gives an overall summary and a discussion of future work.
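
The tractable likelihood the abstract refers to arises because a CTMM's transition probability matrix over a time gap t is the matrix exponential exp(Qt) of the generator Q. The sketch below (an editorial illustration, not the markovnet package) evaluates an elastic-net-penalized negative log-likelihood of panel transitions for a fixed generator; the generator, observations, and coefficients are assumed values, and the link between covariates and Q — where the coordinate descent described above operates — is omitted.

    # Sketch: penalized negative log-likelihood of panel data under a CTMM.
    # Illustration only -- not the markovnet package. The generator Q and the
    # observed transitions below are made-up assumptions. In the real model,
    # Q would be a function of the covariate coefficients; that link is omitted.
    import numpy as np
    from scipy.linalg import expm

    def penalized_nll(Q, transitions, coefs, lam=0.1, alpha=0.5):
        """transitions: list of (state_from, state_to, time_gap) triples.
        coefs: covariate coefficients being penalized (elastic net)."""
        nll = 0.0
        for s_from, s_to, dt in transitions:
            P = expm(Q * dt)              # transition probabilities over gap dt
            nll -= np.log(P[s_from, s_to])
        # Elastic net penalty: alpha mixes the L1 and squared-L2 terms.
        penalty = lam * (alpha * np.abs(coefs).sum()
                         + 0.5 * (1 - alpha) * (coefs ** 2).sum())
        return nll + penalty

    # A 3-state progressive generator (rows sum to zero) -- assumed values.
    Q = np.array([[-0.3, 0.2, 0.1],
                  [0.0, -0.4, 0.4],
                  [0.0, 0.0, 0.0]])       # state 2 absorbing (e.g., death)
    obs = [(0, 0, 1.0), (0, 1, 2.0), (1, 2, 1.5)]
    print(penalized_nll(Q, obs, coefs=np.array([0.5, -0.2])))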
8

Avoiding the redundant effect on regression analyses of including an outcome in the imputation model

Tamegnon, Monelle 01 January 2018 (has links)
Imputation is one well-recognized method for handling missing data. Multiple imputation provides a framework for imputing missing data that incorporates uncertainty about the imputations at the analysis stage. An important factor to consider when performing multiple imputation is the imputation model. In particular, a careful choice of the covariates to include in the model is crucial. The current recommendation by several authors in the literature (van Buuren, 2012; Moons et al., 2006; Little and Rubin, 2002) is to include all variables that will appear in the analytical model, including the outcome, as covariates in the imputation model. When the goal of the analysis is to explore the relationship between the outcome and the variable with missing data (the target variable), this recommendation seems questionable. Should we make use of the outcome to fill in the target variable's missing observations and then use these filled-in observations, along with the observed data on the target variable, to explore the relationship of the target variable with the outcome? We believe that this approach is circular. Instead, we have designed multiple imputation approaches rooted in machine learning techniques that avoid the use of the outcome at the imputation stage and maintain reasonable inferential properties. We also compare the performance of our approaches to currently available methods.
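
As a concrete illustration of the design question — not of the dissertation's machine-learning approaches — the sketch below imputes a target variable twice with scikit-learn's IterativeImputer, once including the outcome in the imputation model and once excluding it, and compares the resulting slope estimates; the data and column roles are assumptions.

    # Sketch: imputing a target variable with vs. without the outcome column.
    # Illustration of the design question only, not the dissertation's methods.
    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer

    rng = np.random.default_rng(3)
    n = 300
    target = rng.normal(size=n)                # variable with missing values
    covar = target + rng.normal(size=n)        # auxiliary covariate
    outcome = 2 * target + rng.normal(size=n)  # analysis outcome

    target_obs = target.copy()
    target_obs[rng.random(n) < 0.3] = np.nan   # 30% missing at random

    with_y = np.column_stack([target_obs, covar, outcome])
    without_y = np.column_stack([target_obs, covar])

    imp_with = IterativeImputer(random_state=0).fit_transform(with_y)[:, 0]
    imp_without = IterativeImputer(random_state=0).fit_transform(without_y)[:, 0]

    # Compare the slope of outcome ~ target under each imputation strategy.
    for name, t in [("with outcome", imp_with), ("without outcome", imp_without)]:
        slope = np.polyfit(t, outcome, 1)[0]
        print(f"{name:16s} estimated slope: {slope:.2f}")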
9

Zonal And Regional Load Forecasting In The New England Wholesale Electricity Market: A Semiparametric Regression Approach

Farland, Jonathan 01 January 2013 (has links) (PDF)
Power system planning, reliability analysis and economically efficient capacity scheduling all rely heavily on electricity demand forecasting models. In the context of a deregulated wholesale electricity market, scheduling a region's bulk electricity generation is inherently linked to future values of demand. Predictive models are used by municipalities and suppliers to bid into the day-ahead market and by utilities to arrange contractual interchanges among neighboring utilities. These numerical predictions are therefore pervasive in the energy industry. This research seeks to develop a regression-based forecasting model. Specifically, electricity demand is modeled as a function of calendar effects, lagged demand effects, weather effects, and a stochastic disturbance. Variables such as temperature, wind speed, cloud cover and humidity are known to be among the strongest predictors of electricity demand and as such are used as model inputs. It is well known, however, that the relationship between demand and weather can be highly nonlinear. Rather than assuming a linear functional form, the structural change in these relationships is explored. Those variables that exhibit a nonlinear relationship with demand are accommodated with penalized splines in a semiparametric regression framework. The equivalence between penalized splines and a special case of the mixed model formulation allows for model estimation with currently available statistical packages such as R, STATA and SAS. Historical data are available for the entire New England region as well as for the smaller zones that collectively make up the regional grid. As such, a secondary research objective of this thesis is to explore whether an aggregation of zonal forecasts might perform better than forecasts produced from a single regional model. Prior to this research, neither the applicability of a semiparametric regression-based approach to load forecasting nor the potential improvement in forecasting performance from zonal load forecasting had been investigated for the New England wholesale electricity market.
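
To illustrate the penalized-spline idea underlying the semiparametric model (an editorial sketch with synthetic data, not the thesis's model or the New England data), the code below fits demand on temperature using a truncated-line basis with a ridge penalty on the knot coefficients — precisely the form that maps onto a linear mixed model; the knot count and smoothing parameter are assumptions.

    # Sketch: a penalized spline fit of demand on temperature via a
    # truncated-line basis with a ridge penalty on the knot coefficients.
    # Illustration only -- synthetic data, not the New England load data.
    import numpy as np

    rng = np.random.default_rng(4)
    temp = np.sort(rng.uniform(0, 35, 200))             # degrees C (assumed)
    demand = 50 + 0.5 * (temp - 18) ** 2 + rng.normal(0, 10, 200)  # U-shape

    knots = np.linspace(temp.min(), temp.max(), 12)[1:-1]  # 10 interior knots
    # Design: intercept, linear term, and truncated lines (x - k)_+ per knot.
    Z = np.maximum(temp[:, None] - knots[None, :], 0.0)
    X = np.column_stack([np.ones_like(temp), temp, Z])

    # Penalize only the truncated-line coefficients (the "random effects" in
    # the equivalent mixed-model formulation); leave intercept/slope unpenalized.
    lam = 10.0
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))
    beta = np.linalg.solve(X.T @ X + lam * D, X.T @ demand)
    fitted = X @ beta
    print("residual SD:", round(np.std(demand - fitted), 2))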
