Return to search

The Growth Curve Model for High Dimensional Data and its Application in Genomics

<p>Recent advances in technology have allowed researchers to collect high-dimensional biological data simultaneously. In genomic studies, for instance, measurements from tens of thousands of genes are taken from individuals across several experimental groups. In time course microarray experiments, gene expression is measured at several time points for each individual across the whole genome resulting in massive amount of data. In such experiments, researchers are faced with two types of high-dimensionality. The first is global high-dimensionality, which is common to all genomic experiments. The global high-dimensionality arises because inference is being done on tens of thousands of genes resulting in multiplicity. This challenge is often dealt with statistical methods for multiple comparison, such as the Bonferroni correction or false discovery rate (FDR). We refer to the second type of high-dimensionality as gene specific high-dimensionality, which arises in time course microarry experiments due to the fact that, in such experiments, sample size is often smaller than the number of time points ($n</p> <p>In this thesis, we use the growth curve model (GCM), which is a generalized multivariate analysis of variance (GMANOVA) model, and propose a moderated test statistic for testing a special case of the general linear hypothesis, which is specially useful for identifying genes that are expressed. We use the trace test for the GCM and modify it so that it can be used in high-dimensional situations. We consider two types of moderation: the Moore-Penrose generalized inverse and Stein's shrinkage estimator of $ S $. We performed extensive simulations to show performance of the moderated test, and compared the results with original trace test. We calculated empirical level and power of the test under many scenarios. Although the focus is on hypothesis testing, we also provided moderated maximum likelihood estimator for the parameter matrix and assessed its performance by investigating bias and mean squared error of the estimator and compared the results with those of the maximum likelihood estimators. Since the parameters are matrices, we consider distance measures in both power and level comparisons as well as when investigating bias and mean squared error. We also illustrated our approach using time course microarray data taken from a study on Lung Cancer. We were able to filter out 1053 genes as non-noise genes from a pool of 22,277 genes which is approximately 5\% of the total number of genes. This is in sync with results from most biological experiments where around 5\% genes are found to be differentially expressed.</p> / Master of Science (MSc)

Identiferoai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/12780
Date04 1900
CreatorsJana, Sayantee
ContributorsHamid, Jemila S, N. Balakrishnan, R. Viveros, Mathematics and Statistics
Source SetsMcMaster University
Detected LanguageEnglish
Typethesis

Page generated in 0.0018 seconds