Mixture Modeling and Outlier Detection in Microarray Data Analysis
George, Nysia I. 16 January 2010
Microarray technology has become a dynamic tool in gene expression analysis
because it allows for the simultaneous measurement of thousands of gene expressions.
Uniqueness in experimental units and microarray data platforms, coupled with how
gene expressions are obtained, make the field open for interesting research questions.
In this dissertation, we present our investigations of two independent studies related
to microarray data analysis.
First, we study a recent platform in biology and bioinformatics that compares
the quality of genetic information from exfoliated colonocytes in fecal matter with
genetic material from mucosa cells within the colon. Using the intraclass correlation
coefficient (ICC) as a measure of reproducibility, we assess the reliability of density
estimation obtained from preliminary analysis of fecal and mucosa data sets. Numerical findings clearly show that the distribution comprises two components.
For measurements between 0 and 1, it is natural to assume that the data points are
from a beta-mixture distribution. We explore whether ICC values should be modeled
with a beta mixture or transformed first and fit with a normal mixture. We find that
the use of a mixture of normals on the inverse-probit transformed scale is less sensitive to model misspecification, which could otherwise lead to a biased conclusion. By
using the normal mixture approach to compare the ICC distributions of fecal and
mucosa samples, we observe the quality of reproducible genes in fecal array data to
be comparable with that in mucosa arrays.
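
The transform-then-fit idea above can be illustrated with a minimal sketch. It is not the dissertation's code: it assumes the transform maps values in (0, 1) to the real line through the standard normal quantile function, fits a two-component normal mixture with a plain EM loop, and runs on simulated values rather than real ICC data. The function names, the clipping constant `eps`, and the iteration count are hypothetical choices made for illustration.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the quantile transform


def to_real_line(icc_values, eps=1e-6):
    """Map values in (0, 1) to the real line via the normal quantile function.

    Values are clipped to [eps, 1 - eps] so the quantile stays finite.
    """
    return [STD_NORMAL.inv_cdf(min(max(v, eps), 1.0 - eps)) for v in icc_values]


def fit_two_normal_mixture(z, n_iter=100):
    """Fit a two-component normal mixture by EM; returns (weights, means, sds)."""
    n = len(z)
    z_sorted = sorted(z)
    half = n // 2
    # Crude initialisation: lower and upper halves of the sorted data.
    means = [sum(z_sorted[:half]) / half, sum(z_sorted[half:]) / (n - half)]
    grand = sum(z) / n
    sd0 = (sum((x - grand) ** 2 for x in z) / n) ** 0.5
    sds = [sd0, sd0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        comps = [NormalDist(means[k], sds[k]) for k in range(2)]
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in z:
            d0 = weights[0] * comps[0].pdf(x)
            d1 = weights[1] * comps[1].pdf(x)
            total = (d0 + d1) or 1e-300  # guard against underflow to zero
            resp.append((d0 / total, d1 / total))
        # M-step: update weights, means, and standard deviations.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, z)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, z)) / nk
            sds[k] = max(var ** 0.5, 1e-3)  # floor keeps a component from collapsing
    return weights, means, sds


# Simulated stand-in for ICC values: a low group and a high group.
random.seed(0)
simulated_icc = ([random.betavariate(2, 8) for _ in range(300)]
                 + [random.betavariate(8, 2) for _ in range(300)])
weights, means, sds = fit_two_normal_mixture(to_real_line(simulated_icc))
print(weights, means, sds)
```

In this sketch the component with the larger mean plays the role of the more reproducible group of genes; comparing its weight and location across two data sets is one simple way to compare their ICC distributions.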
For microarray data, within-gene variance estimation is often challenging due
to the high frequency of low replication studies. Several methodologies have been
developed to strengthen variance terms by borrowing information across genes. However, even with such accommodations, variance may be inflated by the presence of
outliers. For our second study, we propose a robust modification of optimal shrinkage variance estimation to improve outlier detection. In order to increase power, we
suggest grouping standardized data so that information shared across genes is similar
in distribution. Simulation studies and analysis of real colon cancer microarray data
reveal that our methodology is insensitive to outliers, free of distributional assumptions, effective for small sample sizes, and data adaptive.
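
As a generic illustration of borrowing information across genes, the following sketch shrinks robust per-gene variance estimates toward a common target and flags replicates that lie far from their gene's median. It is not the estimator proposed in the dissertation: the MAD-based variance, the fixed shrinkage `weight`, the median target, and the `cutoff` are all assumptions chosen for simplicity, and the data are made up.

```python
from statistics import median


def mad_variance(values):
    """Robust variance estimate: (1.4826 * median absolute deviation) squared."""
    m = median(values)
    mad = median([abs(v - m) for v in values])
    return (1.4826 * mad) ** 2


def shrink_variances(gene_vars, weight=0.5):
    """Pull each gene's variance toward the median variance across genes."""
    target = median(gene_vars)
    return [weight * target + (1.0 - weight) * v for v in gene_vars]


def flag_outliers(data, weight=0.5, cutoff=3.0):
    """Return (gene_index, replicate_index) pairs whose standardized
    distance from the gene median exceeds the cutoff."""
    shrunk = shrink_variances([mad_variance(row) for row in data], weight)
    flags = []
    for g, row in enumerate(data):
        center = median(row)
        sd = shrunk[g] ** 0.5
        for j, v in enumerate(row):
            if sd > 0 and abs(v - center) / sd > cutoff:
                flags.append((g, j))
    return flags


# Toy expression matrix: four genes, four replicates each; gene 1 has one aberrant value.
toy_data = [
    [1.0, 1.1, 0.9, 1.05],
    [2.0, 2.1, 1.9, 8.0],
    [0.5, 0.6, 0.4, 0.55],
    [3.0, 3.2, 2.9, 3.1],
]
flags = flag_outliers(toy_data)
print(flags)
```

Because both the center and the scale are median based, the single aberrant replicate does not inflate the variance used to judge it, which is the property the robust modification is meant to secure.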