1. Parametric classification and variable selection by the minimum integrated squared error criterion (January 2012)
This thesis presents a robust solution to the classification and variable selection problem when the dimension of the data, or number of predictor variables, may greatly exceed the number of observations. When classifying objects from many measured attributes, the goal is to build a model that makes the most accurate predictions using only the most meaningful subset of the available measurements. The introduction of ℓ1-regularized model fitting has inspired many approaches that perform model fitting and variable selection simultaneously. If parametric models are employed, the standard approach is some form of regularized maximum likelihood estimation. While this is an asymptotically efficient procedure under very general conditions, it is not robust: outliers can negatively impact both estimation and variable selection, and they can be very difficult to identify as the number of predictor variables becomes large. Minimizing the integrated squared error, or L2 error, while less efficient, has been shown to generate parametric estimators that are robust to a fair amount of contamination in several contexts. In this thesis, we present a novel robust parametric regression model for the binary classification problem based on L2 distance, the logistic L2 estimator (L2E). To perform simultaneous model fitting and variable selection among correlated predictors in the high-dimensional setting, an elastic net penalty is introduced. A fast computational algorithm for minimizing the elastic net penalized logistic L2E loss is derived, and results on the algorithm's global convergence properties are given. Through simulations we demonstrate the utility of the penalized logistic L2E at robustly recovering sparse models from high-dimensional data in the presence of outliers and inliers. Results on real genomic data are also presented.
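For reference, the generic L2E criterion and the elastic net penalty take the following standard forms; this is only a sketch of the general setup, not necessarily the exact conditional-model loss derived in the thesis. Writing $f_\theta$ for the model density and $x_1, \dots, x_n$ for the data,

    \hat{\theta}_{\mathrm{L2E}} = \arg\min_{\theta} \Big[ \int f_{\theta}(x)^2 \, dx - \frac{2}{n} \sum_{i=1}^{n} f_{\theta}(x_i) \Big],

which estimates the integrated squared error $\int (f_{\theta} - f)^2$ up to a constant not depending on $\theta$. The elastic net penalty added to such a loss is

    P_{\lambda,\alpha}(\beta) = \lambda \Big( \alpha \|\beta\|_1 + \frac{1-\alpha}{2} \|\beta\|_2^2 \Big), \qquad 0 \le \alpha \le 1,

with $\alpha = 1$ recovering the lasso penalty and $\alpha = 0$ the ridge penalty.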
2. Minimum Distance Estimation in Categorical Conditional Independence Models (January 2012)
One of the oldest and most fundamental problems in statistics is the analysis of cross-classified data, called contingency tables. Analyzing contingency tables is typically a question of association: do the variables represented in the table exhibit special dependencies or lack thereof? The statistical models which best capture these experimental notions of dependence are the categorical conditional independence models; however, until recent discoveries concerning the strongly algebraic nature of the conditional independence models surfaced, the models were widely overlooked due to their unwieldy implicit description. Apart from the inferential question above, this thesis asks a more basic question: supposing such an experimental model of association is known, how can one incorporate this information into the estimation of the joint distribution of the table? In the traditional parametric setting, several estimation paradigms have been developed over the past century; however, traditional results are not applicable to arbitrary categorical conditional independence models due to their implicit nature. After laying out the framework for conditional independence and algebraic statistical models, we consider three aspects of estimation in the models using the minimum Euclidean distance (L2E), minimum Pearson chi-squared, and minimum Neyman modified chi-squared distance paradigms as well as the more ubiquitous maximum likelihood approach (MLE). First, we consider the theoretical properties of the estimators and demonstrate that under general conditions the estimators exist and are asymptotically normal. For small samples, we present the results of large-scale simulations to address the estimators' bias and mean squared error (in the Euclidean and Frobenius norms, respectively). Second, we identify the computation of such estimators as an optimization problem and, for the case of the L2E, propose two different methods by which the problem can be solved, one algebraic and one numerical. Finally, we present an R implementation via two novel packages, mpoly for symbolic computing with multivariate polynomials and catcim for fitting categorical conditional independence models. It is found that, in general, minimum distance estimators in categorical conditional independence models behave as they do in the more traditional parametric setting and can be computed in many practical situations with the implementation provided.
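As a concrete illustration of the minimum distance paradigms named above, the sketch below fits the 2x2 independence model by minimizing the Pearson chi-squared distance with base R's optim(); it is a toy example with made-up counts and does not use the thesis's mpoly or catcim packages. Replacing the fitted proportions in the denominator by the observed ones gives the Neyman modified chi-squared criterion, and dropping the denominator entirely gives the minimum Euclidean (L2E) criterion.

    # A minimal sketch: fit the 2x2 independence model p_ij = r_i * s_j by
    # minimizing Pearson's chi-squared distance with base R's optim().
    counts <- matrix(c(30, 10, 20, 40), nrow = 2)   # hypothetical 2x2 table
    phat   <- counts / sum(counts)                  # observed cell proportions

    # Parametrize the model by (r1, s1) in (0, 1); each margin then sums to one.
    model_probs <- function(par) {
      r <- c(par[1], 1 - par[1])
      s <- c(par[2], 1 - par[2])
      outer(r, s)                                   # p_ij = r_i * s_j
    }

    pearson <- function(par) {
      p <- model_probs(par)
      sum(counts) * sum((phat - p)^2 / p)           # Pearson chi-squared statistic
    }

    fit <- optim(c(0.5, 0.5), pearson, method = "L-BFGS-B",
                 lower = 1e-6, upper = 1 - 1e-6)
    fit$par               # estimated row/column marginal probabilities
    model_probs(fit$par)  # fitted joint distribution under independence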
3. Minimum disparity inference for discrete ranked set sampling data. Alexandridis, Roxana Antoanela (12 September 2005)
No description available.