Recent advances in computing technology have led to the development of algorithmic modeling techniques. These methods can be used to analyze data that are difficult to analyze with traditional statistical models. This study examined how effectively variable importance estimates from the random forest algorithm identify the true predictor among a large number of candidate predictors. A simulation study was conducted using twenty levels of association among the independent variables and seven levels of association between the true predictor and the response. We conclude that the random forest method is an effective classification tool when the goals of a study are to produce an accurate classifier and to provide insight into the discriminative ability of individual predictor variables. Because these goals are common in gene expression analysis, we apply the random forest method to estimate variable importance on a microarray data set.
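The abstract does not specify the simulation's data-generating model or the exact importance measure. The following is a minimal, standard-library sketch of the general idea under stated assumptions, not the thesis's method. Candidate predictors are made equicorrelated through a shared latent factor, so every pair has correlation `rho`. Column 0 is the true predictor, and a threshold model with strength `beta` links it to a binary response. Importance is computed as model-agnostic permutation importance on the evaluation data. Breiman's random forest measure instead permutes each variable in the out-of-bag samples of each tree. The names `simulate`, `permutation_importance`, `rho`, and `beta` are hypothetical. A simple threshold rule on column 0 stands in for a fitted random forest so the block runs without third-party packages.

```python
import math
import random


def simulate(n, p, rho, beta, seed=0):
    """Hypothetical generator: p equicorrelated predictors; column 0 is the true predictor.

    X_j = sqrt(rho) * Z + sqrt(1 - rho) * E_j, with Z shared across columns,
    so corr(X_j, X_k) = rho for j != k. The response is 1 when
    beta * X_0 + sqrt(1 - beta**2) * noise > 0, so 0 <= beta <= 1 sets
    the strength of association between X_0 and the response.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # shared latent factor induces the correlation
        row = [math.sqrt(rho) * z + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
               for _ in range(p)]
        latent = beta * row[0] + math.sqrt(1.0 - beta ** 2) * rng.gauss(0.0, 1.0)
        X.append(row)
        y.append(1 if latent > 0 else 0)
    return X, y


def accuracy(predict, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when a single column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between column j and the response
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def predict(row):
    """Stand-in classifier (hypothetical): thresholds the true predictor at 0."""
    return 1 if row[0] > 0 else 0


X, y = simulate(n=2000, p=10, rho=0.5, beta=0.8, seed=1)
imp = permutation_importance(predict, X, y)
print(max(range(len(imp)), key=imp.__getitem__))  # prints 0
```

Because the stand-in rule ignores columns 1 to 9, shuffling them leaves accuracy unchanged and their importance is exactly zero. Shuffling column 0 drops accuracy from roughly 0.8 toward 0.5. With a real random forest, correlated noise predictors can pick up nonzero importance, and that effect is what varying `rho` is meant to probe.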
Identifier | oai:union.ndltd.org:vcu.edu/oai:scholarscompass.vcu.edu:etd-2432 |
Date | 01 January 2006 |
Creators | Kimes, Ryan Vincent |
Publisher | VCU Scholars Compass |
Source Sets | Virginia Commonwealth University |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Theses and Dissertations |
Rights | © The Author |