Dimension reduction is a vital tool in the many areas of applied statistics where the dimensionality of the predictors can be large. In such cases, many statistical methods fail or yield unsatisfactory results. However, many high-dimensional data sets actually have a much simpler, low-dimensional structure. Classical methods such as principal components analysis detect linear structure very effectively but fail when that structure is nonlinear. In the first part of this thesis, we investigate the asymptotic behavior of two nonlinear dimensionality reduction algorithms: Local Tangent Space Alignment (LTSA) and Hessian Locally Linear Embedding (HLLE). In particular, we show that under suitable conditions both algorithms asymptotically recover the true generating coordinates up to an isometry. We also discuss the relative merits of the two algorithms and how the probability distribution of the underlying coordinates affects their performance.
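The abstract names LTSA without describing it, so a concrete sketch may help. The code below is a minimal sketch of the commonly used textbook formulation of LTSA. It is not necessarily the exact variant analyzed in the thesis. The function name `ltsa` and its parameters are hypothetical, and neighbors are found by brute force. The steps are:

1. Fit a local tangent space to each point's k-neighborhood with an SVD.
2. Accumulate the local projections into a global alignment matrix `B`.
3. Return the eigenvectors of `B` with the smallest nonzero eigenvalues.

In this unnormalized form, the output determines the generating coordinates only up to an affine map.

```python
import numpy as np


def ltsa(X, n_neighbors=10, n_components=2):
    """Sketch of Local Tangent Space Alignment (hypothetical helper).

    X: (n, D) data matrix. Returns an (n, n_components) embedding.
    """
    n = X.shape[0]
    # Brute-force squared Euclidean distances between all pairs of points.
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    # Each neighborhood includes the point itself (distance ~0).
    neighbors = np.argsort(D2, axis=1)[:, :n_neighbors]

    B = np.zeros((n, n))
    for i in range(n):
        idx = neighbors[i]
        # Center the neighborhood; its top singular vectors span the tangent space.
        Xi = X[idx] - X[idx].mean(axis=0)
        U, _, _ = np.linalg.svd(Xi, full_matrices=False)
        # G = [constant vector, local tangent coordinates], orthonormal columns.
        G = np.hstack([np.full((n_neighbors, 1), 1.0 / np.sqrt(n_neighbors)),
                       U[:, :n_components]])
        # Penalize the part of a global coordinate not explained locally.
        B[np.ix_(idx, idx)] += np.eye(n_neighbors) - G @ G.T

    # eigh returns eigenvalues in ascending order; index 0 is the constant vector.
    _, eigvecs = np.linalg.eigh(B)
    return eigvecs[:, 1:n_components + 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T = rng.uniform(size=(200, 2))             # true 2-D coordinates
    X = T @ rng.normal(size=(2, 3)) + 1.0      # flat sheet embedded in 3-D
    print(ltsa(X, n_neighbors=10, n_components=2).shape)
```

On flat data like the usage example, the true coordinates lie exactly in the span of the constant vector and the returned columns. On curved data, the relationship is only approximate at finite sample sizes.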
Model selection is a fundamental problem in nearly all areas of applied statistics: a balance must be struck between good in-sample fit and good out-of-sample prediction. It is typically easy to fit the sample data well, but empirically such models often generalize poorly. In the second part of the thesis, we propose a new model selection procedure that generalizes traditional methods. It combines existing model selection criteria through a ranking procedure. This produces new criteria that merge measures of in-sample fit and out-of-sample predictive performance into a single value. We then propose an algorithm that provably finds the optimal combination with a specified probability. We demonstrate through simulations that these combined criteria can be substantially more powerful than any individual criterion.
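The abstract does not specify the ranking procedure, so the sketch below is a hypothetical illustration only. It is not the thesis's method, and it does not implement the search for the optimal combination. It assumes two criteria for a set of candidate polynomial regressions, where smaller is better for both:

* AIC as the in-sample measure.
* Leave-one-out cross-validation error as the out-of-sample measure.

Each criterion ranks the candidate models, and a weighted sum of the ranks selects the model. The names `rank_models`, `combined_rank_select`, `poly_criteria` and the weights are all assumptions made for the example.

```python
import numpy as np


def rank_models(values):
    """Rank models by one criterion: 0 = best (smallest); ties keep input order."""
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks


def combined_rank_select(criteria, weights):
    """criteria: (n_criteria, n_models), smaller is better.

    Returns the index of the model with the smallest weighted rank sum, and the scores.
    """
    ranks = np.vstack([rank_models(c) for c in np.asarray(criteria, dtype=float)])
    score = np.asarray(weights, dtype=float) @ ranks
    return int(np.argmin(score)), score


def poly_criteria(x, y, max_degree):
    """Gaussian AIC (up to a constant) and LOO-CV error for degrees 0..max_degree."""
    n = len(x)
    aic, loo = [], []
    for d in range(max_degree + 1):
        X = np.vander(x, d + 1)                  # columns x^d, ..., x^0
        H = X @ np.linalg.pinv(X)                # OLS hat matrix
        resid = y - H @ y
        rss = np.sum(resid ** 2)
        aic.append(n * np.log(rss / n) + 2 * (d + 1))
        # Closed-form leave-one-out residuals for OLS: e_i / (1 - h_ii).
        loo.append(np.mean((resid / (1.0 - np.diag(H))) ** 2))
    return np.array([aic, loo])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = np.linspace(-1.0, 1.0, 50)
    y = 1.0 + 2.0 * x - 3.0 * x ** 2 + 0.1 * rng.normal(size=50)
    crit = poly_criteria(x, y, max_degree=6)
    degree, _ = combined_rank_select(crit, weights=[0.5, 0.5])
    print("selected degree:", degree)
```

Ranks put criteria on different scales onto a common footing, which is one plausible reason to combine them by rank rather than by raw value.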
Identifier | oai:union.ndltd.org:GATECH/oai:smartech.gatech.edu:1853/22586 |
Date | 26 March 2008 |
Creators | Smith, Andrew Korb |
Publisher | Georgia Institute of Technology |
Source Sets | Georgia Tech Electronic Thesis and Dissertation Archive |
Detected Language | English |
Type | Dissertation |