Stereotype Logit Models for High Dimensional Data

Gene expression studies are of growing importance in medicine. Subtypes within the same disease have been shown to have differing gene expression profiles (Golub et al., 1999). Researchers are often interested in classifying a disease into categories that reflect its progression: for example, identifying genes associated with progression and accurately predicting the state of progression from gene expression data. One challenge in modeling microarray gene expression data is that there are more genes (variables) than observations. In addition, the genes usually exhibit a complex variance-covariance structure. Modeling a categorical variable that reflects disease progression from gene expression data therefore requires methods capable of handling an ordinal outcome in the presence of a high-dimensional covariate space. In this research we present a method that combines the stereotype regression model (Anderson, 1984) with an elastic net penalty (Friedman et al., 2010) to model an ordinal outcome for high-throughput genomic datasets. Results from applying the proposed method to both simulated and gene expression data are reported, and its effectiveness relative to a univariable and heuristic approach is discussed.

Identifier: oai:union.ndltd.org:vcu.edu/oai:scholarscompass.vcu.edu:etd-1146
Date: 29 October 2010
Creators: Williams, Andre
Publisher: VCU Scholars Compass
Source Sets: Virginia Commonwealth University
Detected Language: English
Type: text
Format: application/pdf
Source: Theses and Dissertations
Rights: © The Author
