Return to search

Dimensionality Reduction in the Creation of Classifiers and the Effects of Correlation, Cluster Overlap, and Modelling Assumptions.

Discriminant analysis and random forests are used to create models for classification. The number of variables to be tested for inclusion in a model can be large. The goal of this work was to create an efficient and effective selection program. The first method used was based on the work of others. The resulting models were underperforming, so another approach was adopted. Models were built by adding the variable that maximized new-model accuracy. The two programs were used to generate discriminant-analysis and random forest models for three data sets. An existing software package was also used. The second program outperformed the alternatives. For the small number of runs produced in this study, it outperformed the method that inspired this work. The data sets were studied to identify determinants of performance. No definite conclusions were reached, but the results suggest topics for future study.

Identiferoai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:OGU.10214/2933
Date31 August 2011
CreatorsPetrcich, William
ContributorsMcNicholas, Dr. Paul
Source SetsLibrary and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada
LanguageEnglish
Detected LanguageEnglish
TypeThesis

Page generated in 0.0293 seconds