Global ETD Search

1	Some statistical methods for dimension reduction Al-Kenani, Ali J. Kadhim January 2013 The aim of the work in this thesis is to carry out dimension reduction (DR) for high dimensional (HD) data by using statistical methods for variable selection, feature extraction and a combination of the two. In Chapter 2, the DR is carried out through robust feature extraction. Robust canonical correlation (RCCA) methods have been proposed. In the correlation matrix of canonical correlation analysis (CCA), we suggest that the Pearson correlation should be substituted by robust correlation measures in order to obtain robust correlation matrices. These matrices have been employed for producing RCCA. Moreover, the classical covariance matrix has been substituted by robust estimators for multivariate location and dispersion in order to get RCCA. In Chapters 3 and 4, the DR is carried out by combining the ideas of variable selection using regularisation methods with feature extraction, through the minimum average variance estimator (MAVE) and single index quantile regression (SIQ) methods, respectively. In particular, we extend the sparse MAVE (SMAVE) reported in (Wang and Yin, 2008) by combining the MAVE loss function with different regularisation penalties in Chapter 3. An extension of the SIQ of Wu et al. (2010) by considering different regularisation penalties is proposed in Chapter 4. In Chapter 5, the DR is done through variable selection under a Bayesian framework. A flexible Bayesian framework for regularisation in the quantile regression (QR) model has been proposed. This work differs from Bayesian Lasso quantile regression (BLQR), which employs the asymmetric Laplace error distribution (ALD): here the error distribution is assumed to be an infinite mixture of Gaussian (IMG) densities. 519.5
2	Metoda Lasso a její aplikace v časových řadách / The Lasso and its application to time series Holý, Vladimír January 2014 This thesis first describes the Lasso method and its adaptive improvement. Then the basic theoretical properties are shown and different algorithms are introduced. The main part of this thesis is the application of the Lasso method to AR, MA and ARCH time series and to REGAR, REGMA and REGARCH models. An algorithm for the adaptive Lasso in a more general time series model, which includes all the above-mentioned models and series, is developed. The properties of the methods and algorithms are demonstrated in simulations and on a practical example.
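As a rough illustration of the adaptive Lasso applied to an autoregressive series, the sketch below selects lags by coordinate descent in plain Python. It is not the algorithm developed in the thesis: the function names (`weighted_lasso`, `adaptive_lasso`, `lagged_design`), the unpenalised initial fit, the weights 1/|b_j|^gamma and the tuning values are all assumptions chosen for illustration.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by Lasso coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, lam, weights, n_iter=200, tol=1e-10):
    """Minimise (1/2n)||y - Xb||^2 + lam * sum_j weights[j] * |b_j|."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - Xb for the current b (b = 0 at start)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def adaptive_lasso(X, y, lam, gamma=1.0, eps=1e-8):
    """Two-step adaptive Lasso: unpenalised fit, then a weighted Lasso.

    Assumption: weights are 1 / |initial estimate|^gamma, with eps guarding
    against division by zero.
    """
    init = weighted_lasso(X, y, 0.0, [1.0] * len(X[0]))  # lam = 0: least squares
    weights = [1.0 / max(abs(b), eps) ** gamma for b in init]
    return weighted_lasso(X, y, lam, weights)


def lagged_design(series, max_lag):
    """Build rows [x_{t-1}, ..., x_{t-max_lag}] with targets x_t."""
    rows = [[series[t - k] for k in range(1, max_lag + 1)]
            for t in range(max_lag, len(series))]
    return rows, list(series[max_lag:])


if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.0]
    for _ in range(300):  # simulate a hypothetical AR(1) series, phi = 0.6
        x.append(0.6 * x[-1] + rng.gauss(0.0, 1.0))
    X, y = lagged_design(x, max_lag=3)
    print(adaptive_lasso(X, y, lam=0.05))
```

Because the weights grow as the initial estimates shrink, lags with small initial coefficients are penalised more heavily, which is the intuition behind the adaptive improvement of the Lasso.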
3	Adaptive L1 regularized second-order least squares method for model selection Xue, Lin 11 September 2015 The second-order least squares (SLS) method for regression models proposed by Wang (2003, 2004) is based on the first two conditional moments of the response variable given the observed predictor variables. Wang and Leblanc (2008) show that the SLS estimator (SLSE) is asymptotically more efficient than the ordinary least squares estimator (OLSE) if the third moment of the random error is nonzero. We apply the SLS method to variable selection problems and propose the adaptively weighted L1 regularized SLSE (L1-SLSE). The L1-SLSE is robust against the shape of error distributions in variable selection problems. Finite sample simulation studies show that the L1-SLSE is more efficient than L1-OLSE in the case of asymmetric error distributions. A real data application with L1-SLSE is presented to demonstrate the usage of this method. / October 2015
4	Statistical Discovery of Biomarkers in Metagenomics Abdul Wahab, Ahmad Hakeem January 2015 Metagenomics holds great potential for uncovering relationships within microbial communities that have yet to be discovered, particularly because the field circumvents the need to isolate and culture microbes from their natural environmental settings. A common research objective is to detect biomarkers, that is, microbes that are associated with changes in a status. For instance, determining such microbes across conditions such as healthy and diseased groups allows researchers to identify pathogens and probiotics. This is often achieved via analysis of differential abundance of microbes. The problem is that differential abundance analysis looks at each microbe individually without considering the possible associations the microbes may have with each other. This is not favorable, since microbes rarely act individually but within intricate communities involving other microbes. An alternative would be variable selection techniques such as Lasso or Elastic Net, which consider all the microbes simultaneously and conduct selection. However, Lasso often selects only a representative feature of a correlated cluster of features and the Elastic Net may incorrectly select unimportant features too frequently and erratically due to high levels of sparsity and variation in the data. In this research paper, the proposed method AdaLassop is an augmented variable selection technique that overcomes the shortcomings of Lasso and Elastic Net. It provides researchers with a holistic model that takes into account the effects of selected biomarkers in the presence of other important biomarkers. For AdaLassop, variable selection on sparse ultra-high dimensional data is implemented using the Adaptive Lasso with p-values extracted from Zero Inflated Negative Binomial Regressions as augmented weights.
Comprehensive simulations involving varying correlation structures indicate that AdaLassop has optimal performance in the presence of multicollinearity. This is especially apparent as sample size grows. Application of AdaLassop to a metagenome-wide study of diabetic patients reveals both pathogens and probiotics that have been researched in the medical field. Adaptive Lasso Biomarker Metagenomics Variable Selection Statistics Adaptive Elastic Net
5	Computing a journal meta-ranking using paired comparisons and adaptive lasso estimators Vana, Laura, Hochreiter, Ronald, Hornik, Kurt 01 1900 In a "publish-or-perish culture", the ranking of scientific journals plays a central role in assessing performance in the current research environment. With a wide range of existing methods for deriving journal rankings, meta-rankings have gained popularity as a means of aggregating different information sources. In this paper, we propose a method to create a meta-ranking using heterogeneous journal rankings. Employing a parametric model for paired comparison data, we estimate quality scores for 58 journals in the OR/MS/POM community, which together with a shrinkage procedure allows for the identification of clusters of journals with similar quality. The use of paired comparisons provides a flexible framework for deriving an aggregated score while eliminating the problem of missing data.
6	THE FAMILY OF CONDITIONAL PENALIZED METHODS WITH THEIR APPLICATION IN SUFFICIENT VARIABLE SELECTION Xie, Jin 01 January 2018 When scientists know in advance that some features (variables) are important in modeling a data set, these important features should be kept in the model. How can we utilize this prior information to effectively find other important features? This dissertation provides a solution that uses such prior information. We propose the Conditional Adaptive Lasso (CAL) estimates to exploit this knowledge. By choosing a meaningful conditioning set, namely the prior information, CAL shows better performance in both variable selection and model estimation. We also propose Sufficient Conditional Adaptive Lasso Variable Screening (SCAL-VS) and Conditioning Set Sufficient Conditional Adaptive Lasso Variable Screening (CS-SCAL-VS) algorithms based on CAL. The asymptotic and oracle properties are proved. Simulations, especially for large p small n problems, are performed with comparisons with other existing methods. We further extend the linear model setup to generalized linear models (GLM). Instead of least squares, we consider the likelihood function with an L1 penalty, that is, penalized likelihood methods. We propose the Generalized Conditional Adaptive Lasso (GCAL) for generalized linear models. We then further extend the method to any penalty terms that satisfy certain regularity conditions, namely the Conditionally Penalized Estimate (CPE). Asymptotic and oracle properties are shown. Four corresponding sufficient variable screening algorithms are proposed. Simulation examples are evaluated for our method with comparisons with existing methods. GCAL is also evaluated with a real data set on leukemia. Generalized Conditional Adaptive Lasso High-dimensional Data Variable Screening Variable Selection Applied Statistics Statistical Methodology Statistical Models Statistical Theory
7	A Study of Missing Data Imputation and Predictive Modeling of Strength Properties of Wood Composites Zeng, Yan 01 August 2011 Problem: Real-time process and destructive test data were collected from a wood composite manufacturer in the U.S. to develop real-time predictive models of two key strength properties (Modulus of Rupture (MOR) and Internal Bond (IB)) of a wood composite manufacturing process. Sensor malfunction and data “send/retrieval” problems led to null fields in the company’s data warehouse, which resulted in information loss. Many manufacturers attempt to build accurate predictive models by excluding entire records with null fields or by using summary statistics such as the mean or median in place of the null field. However, predictive model errors in validation may be higher in the presence of information loss. In addition, the selection of predictive modeling methods poses another challenge to many wood composite manufacturers. Approach: This thesis consists of two parts addressing the above issues: 1) how to improve data quality using missing data imputation; 2) which predictive modeling method is better in terms of prediction precision (measured by root mean square error or RMSE). The first part summarizes an application of missing data imputation methods in predictive modeling. After variable selection, two missing data imputation methods were selected after comparing six possible methods. Predictive models of imputed data were developed using partial least squares regression (PLSR) and compared with models of non-imputed data using ten-fold cross-validation. Root mean square error of prediction (RMSEP) and normalized RMSEP (NRMSEP) were calculated. The second part presents a series of comparisons among four predictive modeling methods using imputed data without variable selection.
Results: The first part concludes that the expectation-maximization (EM) algorithm and multiple imputation (MI) using Markov Chain Monte Carlo (MCMC) simulation achieved more precise results. Predictive models based on imputed datasets generated more precise prediction results (average NRMSEP of 5.8% for the MOR model and 7.2% for the IB model) than models of non-imputed datasets (average NRMSEP of 6.3% for the MOR model and 8.1% for the IB model). The second part finds that Bayesian Additive Regression Tree (BART) produced more precise prediction results (average NRMSEP of 7.7% for the MOR model and 8.6% for the IB model) than the other three models: PLSR, LASSO, and Adaptive LASSO. missing data imputation predictive modeling partial least squares regression LASSO Adaptive LASSO BART Applied Statistics Statistical Methodology Statistical Models
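The precision measures named in this abstract can be sketched in a few lines of Python. The abstract does not say how RMSEP is normalized, so dividing by the mean of the observed values is an assumption here (normalizing by the observed range is another common convention), and the function names are hypothetical.

```python
import math


def rmsep(observed, predicted):
    """Root mean square error of prediction over held-out observations."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def nrmsep(observed, predicted):
    """RMSEP divided by the mean observed value (assumed normalization)."""
    mean_observed = sum(observed) / len(observed)
    return rmsep(observed, predicted) / mean_observed
```

Under this convention, an NRMSEP of 0.058 corresponds to the 5.8% style of figure quoted in the abstract; in a ten-fold cross-validation the measure would be computed on each held-out fold and then averaged.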
8	Variable Selection and Function Estimation Using Penalized Methods Xu, Ganggang 2011 December 1900 Penalized methods are becoming more and more popular in statistical research. This dissertation research covers two major aspects of applications of penalized methods: variable selection and nonparametric function estimation. The following two paragraphs give brief introductions to each of the two topics. Infinite variance autoregressive models are important for modeling heavy-tailed time series. We use a penalty method to conduct model selection for autoregressive models with innovations in the domain of attraction of a stable law indexed by alpha in (0, 2). We show that by combining the least absolute deviation loss function and the adaptive lasso penalty, we can consistently identify the true model. At the same time, the resulting coefficient estimator converges at a rate of n^(-1/alpha). The proposed approach gives a unified variable selection procedure for both the finite and infinite variance autoregressive models. While automatic smoothing parameter selection for nonparametric function estimation has been extensively researched for independent data, it is much less so for clustered and longitudinal data. Although leave-subject-out cross-validation (CV) has been widely used, its theoretical properties are unknown and its minimization is computationally expensive, especially when there are multiple smoothing parameters. By focusing on penalized modeling methods, we show that leave-subject-out CV is optimal in that its minimization is asymptotically equivalent to the minimization of the true loss function. We develop an efficient Newton-type algorithm to compute the smoothing parameters that minimize the CV criterion. Furthermore, we derive a simplification of the leave-subject-out CV, which leads to a more efficient algorithm for selecting the smoothing parameters.
We show that the simplified version of the CV criterion is asymptotically equivalent to the unsimplified one and thus enjoys the same optimality property. This CV criterion also provides a completely data-driven approach to selecting the working covariance structure using generalized estimating equations in longitudinal data analysis. Our results are applicable to additive, linear varying-coefficient, and nonlinear models with data from exponential families. Adaptive lasso Autoregressive model Infinite variance Least absolute deviation
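For orientation, the combination of least absolute deviation loss and adaptive lasso penalty described in this abstract can be written as follows for an AR(p) model. The notation is an assumption rather than the dissertation's own: $y_t$ is the series, $\phi_j$ the autoregressive coefficients, $\lambda_n$ a tuning parameter, and the weights take the usual adaptive lasso form built from an initial estimate $\tilde{\phi}_j$ with exponent $\gamma > 0$.

```latex
\hat{\phi} = \arg\min_{\phi \in \mathbb{R}^p}
  \sum_{t=p+1}^{n} \Bigl| y_t - \sum_{j=1}^{p} \phi_j \, y_{t-j} \Bigr|
  + \lambda_n \sum_{j=1}^{p} w_j \, |\phi_j|,
\qquad
w_j = \frac{1}{|\tilde{\phi}_j|^{\gamma}} .
```

Using absolute rather than squared residuals is what keeps the criterion meaningful when the innovations have infinite variance, while the weights $w_j$ penalise lags with small initial estimates more heavily.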
9	Comparison of existing ZOI estimation methods with different model specifications and data. Mukhopadhyay, Shraddha January 2020 With the increasing demand for and interest in wind power worldwide, it is of interest to study the effects of running windfarms on the activity of reindeer and to estimate the associated Zone of Influence (ZOI) of these disturbances. Through simulation, Hierarchical Likelihood (HL) and adaptive Lasso methods are used to estimate the ZOI of windfarms and to capture the correct threshold at which the negative effect of the disturbances on reindeer behaviour disappears. The results lend some merit to the explanation that the negative effect may not disappear abruptly, and more merit to the finding that a linear model was still a better choice than the smooth polynomial models used. Real-life data on reindeer faecal pellet counts from an area in northern Sweden where windfarms were running were analyzed. The yearly time series data were divided into three periods: before construction, during construction and during operation of the windfarms. Logistic regression, segmented model, and HL methods were implemented for data analysis using as covariates the distance from the wind turbine, the vegetation type, and the interaction between distance to the wind turbine and time period. A significant breakpoint could be estimated using the segmented model at a distance of 2.8 km from a running windfarm, after which the negative effects of the windfarm on reindeer activity disappeared. However, further work is needed on estimating the ZOI using the HL method and on considering other possible factors causing disturbances to the reindeer habitat and behaviour. Zone of Influence reindeer pellet group count logistic regression segmented model Hierarchical Likelihood Adaptive Lasso threshold breakpoint Computer and Information Sciences
10	Model Selection and Adaptive Lasso Estimation of Spatial Models Liu, Tuo 07 December 2017 No description available. Economics likelihood ratio near-epoch dependence spatial autoregressive model adaptive lasso oracle property least squares approximation selection consistency
