This dissertation consists of five independent projects. In each project, a novel
statistical method was developed to address a practical problem encountered in genomic
contexts. For example, we considered testing for constant nonparametric effects
in a general semiparametric regression model in genetic epidemiology; analyzed the
relationship between covariates in the secondary analysis of case-control data; performed
model selection in joint modeling of paired functional data; and assessed the
prediction ability of genes in gene expression data generated by the CodeLink System
from GE.
In the first project in Chapter II we considered the problem of testing for constant
nonparametric effects in a general semiparametric regression model when there is the
potential for interaction between the parametrically and nonparametrically modeled
variables. We derived a generalized likelihood ratio test for this hypothesis, showed
how to implement it, and gave evidence that it can improve statistical power when
compared to standard partially linear models.
The second project in Chapter III addressed the issue of score testing for the
independence of X and Y in the secondary analysis of case-control data. Semiparametric
efficient approaches can be used to construct semiparametric score tests, but
they suffer from a lack of robustness to the assumed model for Y given X. We showed
how to adjust the semiparametric score test so that its level (Type I error rate) remains
correct even if the assumed model for Y given X is incorrect, making the test robust.
The third project in Chapter IV took up the issue of estimation of a regression
function when Y given X follows a homoscedastic regression model. We showed how
to estimate the regression parameters in a rare disease case even if the assumed model
for Y given X is incorrect, and thus the estimates are model-robust.
In the fourth project in Chapter V we developed novel AIC- and BIC-type methods
for estimating the smoothing parameters in a joint model of paired, hierarchical
sparse functional data, and showed in our numerical work that they are many times
faster than 10-fold cross-validation while at the same time giving results that are
remarkably close to the cross-validated estimates.
In the fifth project in Chapter VI we introduced a practical permutation test
that uses cross-validated genetic predictors to determine if the list of genes in question
has “good” prediction ability. It avoids overfitting by using cross-validation to
derive the genetic predictor and determines if the count of genes that give “good”
prediction could have been obtained by chance. This test was then used to explore
gene expression of colonic tissue and exfoliated colonocytes in the fecal stream to
discover similarities between the two.
Identifier | oai:union.ndltd.org:tamu.edu/oai:repository.tamu.edu:1969.1/ETD-TAMU-2010-08-8269 |
Date | August 2010 |
Creators | Wei, Jiawei |
Contributors | Carroll, Raymond J. |
Source Sets | Texas A and M University |
Language | en_US |
Detected Language | English |
Type | thesis, text |
Format | application/pdf |