In this thesis, we consider the problem of constructing an additive risk model based on the right censored survival data to predict the survival times of the cancer patients, especially when the dimension of the covariates is much larger than the sample size. For microarray Gene Expression data, the number of gene expression levels is far greater than the number of samples. Such ¡°small n, large p¡± problems have attracted researchers to investigate the association between cancer patient survival times and gene expression profiles for recent few years. We apply Partial Least Squares to reduce the dimension of the covariates and get the corresponding latent variables (components), and these components are used as new regressors to fit the extensional additive risk model. Also we employ the time dependent AUC curve (area under the Receiver Operating Characteristic (ROC) curve) to assess how well the model predicts the survival time. Finally, this approach is illustrated by re-analysis of the well known AML data set and breast cancer data set. The results show that the model fits both of the data sets very well.
Identifer | oai:union.ndltd.org:GEORGIA/oai:digitalarchive.gsu.edu:math_theses-1005 |
Date | 09 June 2006 |
Creators | Zhou, Yue |
Publisher | Digital Archive @ GSU |
Source Sets | Georgia State University |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Mathematics Theses |
Page generated in 0.0022 seconds