Global ETD Search

Return to search

Dimensionality Reduction in the Creation of Classifiers and the Effects of Correlation, Cluster Overlap, and Modelling Assumptions.

Discriminant analysis and random forests are used to create models for classification. The number of variables to be tested for inclusion in a model can be large. The goal of this work was to create an efficient and effective selection program. The first method used was based on the work of others. The resulting models were underperforming, so another approach was adopted. Models were built by adding the variable that maximized new-model accuracy. The two programs were used to generate discriminant-analysis and random forest models for three data sets. An existing software package was also used. The second program outperformed the alternatives. For the small number of runs produced in this study, it outperformed the method that inspired this work. The data sets were studied to identify determinants of performance. No definite conclusions were reached, but the results suggest topics for future study.

http://hdl.handle.net/10214/2933

Identifer	oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:OGU.10214/2933
Date	31 August 2011
Creators	Petrcich, William
Contributors	McNicholas, Dr. Paul
Source Sets	Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada
Language	English
Detected Language	English
Type	Thesis

Page generated in 0.002 seconds

Dimensionality Reduction in the Creation of Classifiers and the Effects of Correlation, Cluster Overlap, and Modelling Assumptions.

Description

Links & Downloads

Tags

Additional Fields