About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

A numerical study of penalized regression

Yu, Han 22 August 2013 (has links)
In this thesis, we review important aspects and issues of multiple linear regression, in particular on the problem of multi-collinearity. The focus is on a numerical study of different methods of penalized regression, including the ridge regression, lasso regression and elastic net regression, as well as the newly introduced correlation adjusted regression and correlation adjusted elastic net regression. We compare the performance and relative advantages of these methods.
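
As a concrete illustration of the three standard penalties this thesis compares (an editorial sketch, not code from the thesis), the following Python snippet fits ridge, lasso, and elastic net regressions to a deliberately collinear synthetic design; all data-generating values are assumptions chosen for demonstration.

    # Sketch: comparing penalized regression methods on collinear data.
    # Illustrative only -- not code from the thesis above.
    import numpy as np
    from sklearn.linear_model import Ridge, Lasso, ElasticNet

    rng = np.random.default_rng(0)
    n, p = 100, 10

    # Build a highly collinear design: each predictor is a noisy copy of a
    # common latent variable, so ordinary least squares is unstable.
    latent = rng.normal(size=(n, 1))
    X = latent + 0.1 * rng.normal(size=(n, p))
    beta = np.array([3.0, -2.0] + [0.0] * (p - 2))  # sparse truth (assumed)
    y = X @ beta + rng.normal(size=n)

    for name, model in [("ridge", Ridge(alpha=1.0)),
                        ("lasso", Lasso(alpha=0.1)),
                        ("elastic net", ElasticNet(alpha=0.1, l1_ratio=0.5))]:
        model.fit(X, y)
        print(f"{name:12s} coefficients: {np.round(model.coef_, 2)}")

With data like these, the lasso and elastic net typically zero out most of the redundant coefficients, while ridge shrinks all of them toward zero without setting any exactly to zero.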
2

Penalized methods in genome-wide association studies

Liu, Jin 01 July 2011 (has links)
Penalized regression methods are becoming increasingly popular in genome-wide association studies (GWAS) for identifying genetic markers associated with disease. However, standard penalized methods such as the LASSO do not take into account the possible linkage disequilibrium between adjacent markers. We propose a novel penalized approach for GWAS using a dense set of single nucleotide polymorphisms (SNPs). The proposed method uses the minimax concave penalty (MCP) for marker selection and incorporates linkage disequilibrium (LD) information by penalizing the difference of the genetic effects at adjacent SNPs with high correlation. A coordinate descent algorithm is derived to implement the proposed method. This algorithm is efficient and stable in dealing with a large number of SNPs. A multi-split method is used to calculate the p-values of the selected SNPs for assessing their significance. We refer to the proposed penalty function as the smoothed MCP (SMCP) and the proposed approach as the SMCP method. Performance of the proposed SMCP method and its comparison with a LASSO approach are evaluated through simulation studies, which demonstrate that the proposed method is more accurate in selecting associated SNPs. Its applicability to real data is illustrated using data from a GWAS on rheumatoid arthritis.

Based on the idea of the SMCP, we propose a new penalized method for group variable selection in GWAS that accounts for the correlation between adjacent groups. The proposed method uses the group LASSO to encourage group sparsity and a quadratic difference penalty for adjacent-group smoothing; we call it the smoothed group LASSO, or SGL for short. Canonical correlations between adjacent groups of SNPs are used as the weights in the quadratic difference penalty. Principal components are used to reduce dimensionality locally within groups. We derive a group coordinate descent algorithm for computing the solution path of the SGL. Simulation studies are used to evaluate the finite-sample performance of the SGL and the group LASSO, and we also demonstrate the method's applicability on rheumatoid arthritis data.
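
The univariate MCP update inside a coordinate descent loop has a simple closed form (the so-called firm threshold), which is the computational core of algorithms like the one described above. Below is a minimal Python sketch of plain MCP-penalized least squares; it is an editorial illustration that omits the LD-smoothing term of the SMCP, and lam and gamma are assumed tuning parameters.

    # Sketch: coordinate descent for MCP-penalized least squares.
    # Illustrative only; the thesis's SMCP adds an LD-smoothing term on top.
    import numpy as np

    def soft(z, lam):
        """Soft-thresholding operator."""
        return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

    def mcp_threshold(z, lam, gamma):
        """Closed-form univariate MCP solution (the 'firm' threshold)."""
        if abs(z) <= gamma * lam:
            return soft(z, lam) / (1.0 - 1.0 / gamma)
        return z

    def mcp_coordinate_descent(X, y, lam=0.1, gamma=3.0, n_iter=100):
        """X is assumed standardized: column means 0, column variances 1."""
        n, p = X.shape
        beta = np.zeros(p)
        r = y - X @ beta                       # current residual
        for _ in range(n_iter):
            for j in range(p):
                zj = X[:, j] @ r / n + beta[j]   # partial-residual correlation
                bj = mcp_threshold(zj, lam, gamma)
                r -= X[:, j] * (bj - beta[j])    # update residual in place
                beta[j] = bj
        return beta

    # Example usage on standardized synthetic data:
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))
    X = (X - X.mean(0)) / X.std(0)
    y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(size=200)
    print(np.round(mcp_coordinate_descent(X, y)[:5], 2))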
3

penalized: A MATLAB toolbox for fitting generalized linear models with penalties

McIlhagga, William H. 07 August 2015 (has links)
penalized is a flexible, extensible, and efficient MATLAB toolbox for penalized maximum likelihood. penalized allows you to fit a generalized linear model (Gaussian, logistic, Poisson, or multinomial) using any of ten provided penalties, or none. The toolbox can be extended by creating new maximum likelihood models or new penalties. The toolbox also includes routines for cross-validation and plotting.
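
The toolbox itself is MATLAB, and its API is not reproduced here. Purely as a cross-language illustration of the same idea — a penalized GLM fit with cross-validation over the penalty strength — here is a rough Python analogue; the penalty choice and synthetic data are assumptions.

    # Sketch: an L1-penalized logistic regression with cross-validation,
    # analogous in spirit to a penalized GLM fit (not the MATLAB toolbox API).
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegressionCV

    X, y = make_classification(n_samples=200, n_features=20,
                               n_informative=5, random_state=0)

    # Cross-validate the penalty strength over a grid of 10 values.
    fit = LogisticRegressionCV(Cs=10, penalty="l1", solver="liblinear", cv=5)
    fit.fit(X, y)
    print("selected C:", fit.C_[0])
    print("nonzero coefficients:", np.sum(fit.coef_ != 0))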
4

Fishing Economic Growth Determinants Using Bayesian Elastic Nets

Hofmarcher, Paul, Crespo Cuaresma, Jesus, Grün, Bettina, Hornik, Kurt 09 1900 (has links) (PDF)
We propose a method to deal simultaneously with model uncertainty and correlated regressors in linear regression models by combining elastic net specifications with a spike and slab prior. The estimation method nests ridge regression and the LASSO estimator and thus allows for a more flexible modelling framework than existing model averaging procedures. In particular, the proposed technique has clear advantages when dealing with datasets of (potentially highly) correlated regressors, a pervasive characteristic of the model averaging datasets used hitherto in the econometric literature. We apply our method to the dataset of economic growth determinants by Sala-i-Martin et al. (Sala-i-Martin, X., Doppelhofer, G., and Miller, R. I. (2004). Determinants of Long-Term Growth: A Bayesian Averaging of Classical Estimates (BACE) Approach. American Economic Review, 94: 813-835) and show that our procedure has superior out-of-sample predictive abilities as compared to the standard Bayesian model averaging methods currently used in the literature. (authors' abstract) / Series: Research Report Series / Department of Statistics and Mathematics
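
The nesting the authors exploit is easy to see numerically: in the standard elastic net objective, a mixing parameter of 1 recovers the LASSO, and a value near 0 approaches ridge regression. A minimal sketch of this behavior follows (an illustration only, not the paper's Bayesian spike-and-slab implementation; all data values are assumptions).

    # Sketch: the elastic net nests ridge and LASSO via its mixing parameter.
    # Illustration only; the paper embeds this in a Bayesian spike-and-slab prior.
    import numpy as np
    from sklearn.linear_model import ElasticNet

    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 8))
    X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=100)  # two highly correlated regressors
    y = X[:, 0] + rng.normal(size=100)

    for ratio in (0.01, 0.5, 1.0):   # ~ridge ... balanced ... LASSO
        fit = ElasticNet(alpha=0.1, l1_ratio=ratio).fit(X, y)
        print(f"l1_ratio={ratio:4.2f}  coef[0:2] = {np.round(fit.coef_[:2], 2)}")

With nearly duplicated regressors, the ridge-like end of the path splits the effect between the two copies, while the LASSO end tends to keep one and drop the other.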
5

Two component semiparametric density mixture models with a known component

Zhou Shen (5930258) 17 January 2019 (has links)
Finite mixture models have been successfully used in many applications, such as classification, clustering, and many others. As opposed to classical parametric mixture models, nonparametric and semiparametric mixture models often provide more flexible approaches to the description of inhomogeneous populations. As an example, in the last decade a particular two-component semiparametric density mixture model with a known component has attracted substantial research interest. Our thesis provides an innovative way of estimation for this model based on minimization of a smoothed objective functional, conceptually similar to the log-likelihood. The minimization is performed with the help of an EM-like algorithm. We show that the algorithm is convergent and the minimizers of the objective functional, viewed as estimators of the model parameters, are consistent.

More specifically, in our thesis, a semiparametric mixture of two density functions is considered where one of them is known while the weight and the other function are unknown. For the first part, a new sufficient identifiability condition for this model is derived, and a specific class of distributions describing the unknown component is given for which this condition is mostly satisfied. A novel approach to estimation of this model is derived. That approach is based on the idea of using a smoothed likelihood-like functional as an objective functional in order to avoid ill-posedness of the original problem. Minimization of this functional is performed using an iterative Majorization-Minimization (MM) algorithm that estimates all of the unknown parts of the model. The algorithm possesses a descent property with respect to the objective functional. Moreover, we show that the algorithm converges even when the unknown density is not defined on a compact interval. Later, we also study properties of the minimizers of this functional viewed as estimators of the mixture model parameters. Their convergence to the true solution with respect to a bandwidth parameter is justified by reconsidering it in the framework of a Tikhonov-type functional. They also turn out to be large-sample consistent; this is justified using an empirical minimization approach. The third part of the thesis contains a series of simulation studies, a comparison with another method, and a real data example. All of them show the good performance of the proposed algorithm in recovering unknown components from data.
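
To make the EM-like scheme concrete, here is a minimal sketch of one common variant of estimation for this model, written as an editorial illustration rather than the thesis's smoothed-functional algorithm: the known component is assumed standard normal, posterior weights are updated as in EM, and the unknown density is re-estimated by a weighted kernel density estimate.

    # Sketch: EM-like estimation for f(x) = (1-w) * f0(x) + w * f1(x),
    # with f0 known (standard normal here, an assumption) and f1 unknown.
    # Illustrative variant, not the thesis's smoothed-functional algorithm.
    import numpy as np
    from scipy.stats import norm

    def em_known_component(x, w=0.5, bandwidth=0.3, n_iter=50):
        n = len(x)
        f0 = norm.pdf(x)                            # known component at data
        f1 = np.full(n, 1.0 / (x.max() - x.min()))  # flat initial guess
        for _ in range(n_iter):
            # E-step: posterior probability each point came from the unknown part.
            post = w * f1 / (w * f1 + (1 - w) * f0)
            # M-step: update the weight and the unknown density (weighted KDE).
            w = post.mean()
            kern = norm.pdf((x[:, None] - x[None, :]) / bandwidth) / bandwidth
            f1 = kern @ post / post.sum()
            f1 = np.maximum(f1, 1e-300)  # guard against numerical underflow
        return w, f1

    # Example: 30% contamination from a shifted normal.
    rng = np.random.default_rng(2)
    x = np.concatenate([rng.normal(0, 1, 700), rng.normal(3, 0.5, 300)])
    w_hat, _ = em_known_component(x)
    print("estimated weight of unknown component:", round(w_hat, 3))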
6

Robust Approaches to Marker Identification and Evaluation for Risk Assessment

Dai, Wei January 2013 (has links)
Assessment of risk has been a key element in efforts to identify factors associated with disease, to assess potential targets of therapy, and to enhance disease prevention and treatment. Considerable work has been done to develop methods to identify markers, construct risk prediction models, and evaluate such models. This dissertation aims to develop robust approaches for these tasks. In Chapter 1, we present a robust, flexible, yet powerful approach to identify genetic variants that are associated with disease risk in genome-wide association studies when some subjects are related. In Chapter 2, we focus on identifying important genes predictive of survival outcome when the number of covariates greatly exceeds the number of observations, via a nonparametric transformation model. We propose a rank-based estimator that requires minimal assumptions and develop an efficient […]
7

Regularized Markov Model for Modeling Disease Transitioning

Huang, Shuang January 2017 (has links)
In longitudinal studies of chronic diseases, the disease states of individuals are often collected at several pre-scheduled clinical visits, but the exact states and the times of transitioning from one state to another between observations are not observed. This is commonly referred to as "panel data". Statistical challenges arise in panel data in regard to identifying predictors governing the transitions between different disease states with only the partially observed disease history. Continuous-time Markov models (CTMMs) are commonly used to analyze panel data, and allow maximum likelihood estimation without making any assumptions about the unobserved states and transition times. By assuming that the underlying disease process is Markovian, CTMMs yield a tractable likelihood. However, CTMMs generally allow covariate effects to differ across transitions, resulting in many more coefficients to estimate than there are covariates, and model overfitting can easily happen in practice. In three papers, I develop a regularized CTMM using the elastic net penalty for panel data, and implement it in an R package. The proposed method is capable of simultaneous variable selection and estimation even when the dimension of the covariates is high.

In the first paper (Section 2), I use the elastic net penalty to regularize the CTMM, and derive an efficient coordinate descent algorithm to solve the corresponding optimization problem. The algorithm takes advantage of the multinomial state distribution under the non-informative observation scheme assumption to simplify computation of key quantities. A simulation study shows that this method can effectively select true non-zero predictors while reducing model size.

In the second paper (Section 3), I extend the regularized CTMM developed in the first paper to accommodate exact death times and censored states. Death is commonly included as an endpoint in longitudinal studies, and the exact time of death can be easily obtained, but the state path leading to death is usually unknown. I show that exact death times result in a very different form of likelihood, and the dependency of the death time on the model requires significantly different numerical methods for computing the derivatives of the log-likelihood, a key quantity for the coordinate descent algorithm. I propose to use numerical differentiation to compute these derivatives. Computation of the derivatives of the log-likelihood from a transition involving a censored state is also discussed. I carry out a simulation study to evaluate the performance of this extension, which shows consistently good variable selection properties and prediction accuracy comparable to oracle models in which only the true non-zero coefficients are fitted. I then apply the regularized CTMM to the airflow limitation data from the TESAOD (Tucson Epidemiological Study of Airway Obstructive Disease) study with exact death times and censored states, and obtain a prediction model greatly reduced in size from a total of 220 potential parameters.

Methods developed in the first two papers are implemented in an R package, markovnet, and a detailed introduction to the key functionalities of the package is demonstrated with a simulated data set in the third paper (Section 4). Finally, some concluding remarks are given and directions for future work are discussed (Section 5).

The outline of this dissertation is as follows. Section 1 presents an in-depth background regarding panel data, CTMMs, and penalized regression methods, as well as a brief description of the TESAOD study design. Section 2 describes the first paper, entitled "Regularized continuous-time Markov model via elastic net". Section 3 describes the second paper, entitled "Regularized continuous-time Markov model with exact death times and censored states". Section 4 describes the third paper, "Regularized continuous-time Markov model for panel data: the markovnet package for R". Section 5 gives an overall summary and a discussion of future work.
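
The tractable likelihood the abstract refers to arises because a CTMM's transition probability matrix over a time gap t is the matrix exponential exp(Qt) of the generator Q. The sketch below (an editorial illustration, not the markovnet package) evaluates an elastic-net-penalized negative log-likelihood of panel transitions for a fixed generator; the generator, observations, and coefficients are assumed values, and the link between covariates and Q — where the coordinate descent described above operates — is omitted.

    # Sketch: penalized negative log-likelihood of panel data under a CTMM.
    # Illustration only -- not the markovnet package. The generator Q and the
    # observed transitions below are made-up assumptions. In the real model,
    # Q would be a function of the covariate coefficients; that link is omitted.
    import numpy as np
    from scipy.linalg import expm

    def penalized_nll(Q, transitions, coefs, lam=0.1, alpha=0.5):
        """transitions: list of (state_from, state_to, time_gap) triples.
        coefs: covariate coefficients being penalized (elastic net)."""
        nll = 0.0
        for s_from, s_to, dt in transitions:
            P = expm(Q * dt)              # transition probabilities over gap dt
            nll -= np.log(P[s_from, s_to])
        # Elastic net penalty: alpha mixes the L1 and squared-L2 terms.
        penalty = lam * (alpha * np.abs(coefs).sum()
                         + 0.5 * (1 - alpha) * (coefs ** 2).sum())
        return nll + penalty

    # A 3-state progressive generator (rows sum to zero) -- assumed values.
    Q = np.array([[-0.3, 0.2, 0.1],
                  [0.0, -0.4, 0.4],
                  [0.0, 0.0, 0.0]])       # state 2 absorbing (e.g., death)
    obs = [(0, 0, 1.0), (0, 1, 2.0), (1, 2, 1.5)]
    print(penalized_nll(Q, obs, coefs=np.array([0.5, -0.2])))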
8

Avoiding the redundant effect on regression analyses of including an outcome in the imputation model

Tamegnon, Monelle 01 January 2018 (has links)
Imputation is one well-recognized method for handling missing data. Multiple imputation provides a framework for imputing missing data that incorporates uncertainty about the imputations at the analysis stage. An important factor to consider when performing multiple imputation is the imputation model. In particular, a careful choice of the covariates to include in the model is crucial. The current recommendation by several authors in the literature (van Buuren, 2012; Moons et al., 2006; Little and Rubin, 2002) is to include all variables that will appear in the analytical model, including the outcome, as covariates in the imputation model. When the goal of the analysis is to explore the relationship between the outcome and the variable with missing data (the target variable), this recommendation seems questionable. Should we make use of the outcome to fill in the target variable's missing observations and then use these filled-in observations, along with the observed data on the target variable, to explore the relationship of the target variable with the outcome? We believe that this approach is circular. Instead, we have designed multiple imputation approaches rooted in machine learning techniques that avoid the use of the outcome at the imputation stage and maintain reasonable inferential properties. We also compare the performance of our approaches to currently available methods.
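
As a concrete illustration of the design question — not of the dissertation's machine-learning approaches — the sketch below imputes a target variable twice with scikit-learn's IterativeImputer, once including the outcome in the imputation model and once excluding it, and compares the resulting slope estimates; the data and column roles are assumptions.

    # Sketch: imputing a target variable with vs. without the outcome column.
    # Illustration of the design question only, not the dissertation's methods.
    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer

    rng = np.random.default_rng(3)
    n = 300
    target = rng.normal(size=n)                # variable with missing values
    covar = target + rng.normal(size=n)        # auxiliary covariate
    outcome = 2 * target + rng.normal(size=n)  # analysis outcome

    target_obs = target.copy()
    target_obs[rng.random(n) < 0.3] = np.nan   # 30% missing at random

    with_y = np.column_stack([target_obs, covar, outcome])
    without_y = np.column_stack([target_obs, covar])

    imp_with = IterativeImputer(random_state=0).fit_transform(with_y)[:, 0]
    imp_without = IterativeImputer(random_state=0).fit_transform(without_y)[:, 0]

    # Compare the slope of outcome ~ target under each imputation strategy.
    for name, t in [("with outcome", imp_with), ("without outcome", imp_without)]:
        slope = np.polyfit(t, outcome, 1)[0]
        print(f"{name:16s} estimated slope: {slope:.2f}")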
9

Zonal And Regional Load Forecasting In The New England Wholesale Electricity Market: A Semiparametric Regression Approach

Farland, Jonathan 01 January 2013 (has links) (PDF)
Power system planning, reliability analysis and economically efficient capacity scheduling all rely heavily on electricity demand forecasting models. In the context of a deregulated wholesale electricity market, scheduling a region's bulk electricity generation is inherently linked to future values of demand. Predictive models are used by municipalities and suppliers to bid into the day-ahead market and by utilities to arrange contractual interchanges among neighboring utilities. These numerical predictions are therefore pervasive in the energy industry. This research seeks to develop a regression-based forecasting model. Specifically, electricity demand is modeled as a function of calendar effects, lagged demand effects, weather effects, and a stochastic disturbance. Variables such as temperature, wind speed, cloud cover and humidity are known to be among the strongest predictors of electricity demand and as such are used as model inputs. It is well known, however, that the relationship between demand and weather can be highly nonlinear. Rather than assuming a linear functional form, the structural change in these relationships is explored. Those variables that exhibit a nonlinear relationship with demand are accommodated with penalized splines in a semiparametric regression framework. The equivalence between penalized splines and a special case of the mixed model formulation allows for model estimation with currently available statistical packages such as R, STATA and SAS. Historical data are available for the entire New England region as well as for the smaller zones that collectively make up the regional grid. As such, a secondary research objective of this thesis is to explore whether an aggregation of zonal forecasts might perform better than forecasts produced from a single regional model. Prior to this research, neither the applicability of a semiparametric regression-based approach to load forecasting nor the potential improvement in forecasting performance from zonal load forecasting had been investigated for the New England wholesale electricity market.
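
To illustrate the penalized-spline idea underlying the semiparametric model (an editorial sketch with synthetic data, not the thesis's model or the New England data), the code below fits demand on temperature using a truncated-line basis with a ridge penalty on the knot coefficients — precisely the form that maps onto a linear mixed model; the knot count and smoothing parameter are assumptions.

    # Sketch: a penalized spline fit of demand on temperature via a
    # truncated-line basis with a ridge penalty on the knot coefficients.
    # Illustration only -- synthetic data, not the New England load data.
    import numpy as np

    rng = np.random.default_rng(4)
    temp = np.sort(rng.uniform(0, 35, 200))             # degrees C (assumed)
    demand = 50 + 0.5 * (temp - 18) ** 2 + rng.normal(0, 10, 200)  # U-shape

    knots = np.linspace(temp.min(), temp.max(), 12)[1:-1]  # 10 interior knots
    # Design: intercept, linear term, and truncated lines (x - k)_+ per knot.
    Z = np.maximum(temp[:, None] - knots[None, :], 0.0)
    X = np.column_stack([np.ones_like(temp), temp, Z])

    # Penalize only the truncated-line coefficients (the "random effects" in
    # the equivalent mixed-model formulation); leave intercept/slope unpenalized.
    lam = 10.0
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))
    beta = np.linalg.solve(X.T @ X + lam * D, X.T @ demand)
    fitted = X @ beta
    print("residual SD:", round(np.std(demand - fitted), 2))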
