DNA microarray technology makes it possible to measure the expression levels of many thousands of genes simultaneously. One goal of microarray data analysis is to understand the multiple biological roles of genes and their interactions in complex biological processes. Genes with similar expression patterns are likely to share similar functions or take part in the same biological processes, so changes in gene expression over the course of a particular biological process are of special interest. Unsupervised clustering methods provide an efficient way of finding overall patterns and tendencies in microarray gene expression data; under the assumption above, genes in the same cluster are taken to be regulated in a similar manner. However, traditional unsupervised clustering methods usually yield clusters of genes with similar expression patterns but offer no interpretation of those clusters in terms of the gene functions or processes involved.
In this project, several statistical techniques are applied to a data set from microarray experiments on sporulation in yeast. LOWESS normalization is used to remove systematic variation from the data; a partitional clustering method, K-means, is run with initial centroids obtained from the hierarchical clustering method DIANA; the "gap statistic" is implemented to estimate the "optimal" number of clusters in the data set; and finally, multiple hypothesis testing, carried out with the web query tool FatiGO, is used to determine whether biologically related genes are statistically over-represented in the gene clusters. These methods are combined with graphical displays of cluster profile shapes and heat maps that colour-code up- and down-regulation. Application of these methods to a yeast sporulation time-course data set [Chu et al. 1998] demonstrates the utility of cluster analysis for such data sets and provides an automated method for including biological information about gene function and characteristics. / Thesis / Master of Science (MSc)
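As an illustration of the cluster-number step described above, the following is a minimal sketch of the gap statistic computed around K-means. It is not the code used in the thesis. The function names (`kmeans_dispersion`, `gap_statistic`, `choose_k`), the uniform reference distribution over the bounding box of the data, and the default number of reference sets are all assumptions made for the example. In particular, the deterministic farthest-point initialization is only a stand-in for the DIANA-derived starting centroids and does not implement DIANA.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of profiles."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def farthest_point_init(points, k):
    # Hypothetical stand-in for the DIANA-based starting centroids:
    # start from the point farthest from the overall mean, then keep
    # adding the point farthest from the centroids chosen so far.
    centre = mean_point(points)
    chosen = [max(points, key=lambda p: sq_dist(p, centre))]
    while len(chosen) < k:
        chosen.append(max(points, key=lambda p: min(sq_dist(p, c) for c in chosen)))
    return [list(c) for c in chosen]

def kmeans_dispersion(points, k, n_iter=50):
    """Run Lloyd's K-means and return the within-cluster sum of squares W_k."""
    centroids = farthest_point_init(points, k)
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [mean_point(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def gap_statistic(points, k_max, n_ref=10, seed=0):
    """Return (gaps, errs) for k = 1..k_max.

    Assumes W_k > 0 for the data, i.e. k is smaller than the number of
    distinct profiles; otherwise log(W_k) is undefined.
    """
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        log_w = math.log(kmeans_dispersion(points, k))
        ref_logs = []
        for _ in range(n_ref):
            # Reference set: uniform draws over the bounding box of the data.
            ref = [[rng.uniform(lo, hi) for lo, hi in zip(lows, highs)] for _ in points]
            ref_logs.append(math.log(kmeans_dispersion(ref, k)))
        mean_ref = sum(ref_logs) / n_ref
        sd = math.sqrt(sum((v - mean_ref) ** 2 for v in ref_logs) / n_ref)
        gaps.append(mean_ref - log_w)
        errs.append(sd * math.sqrt(1 + 1 / n_ref))
    return gaps, errs

def choose_k(gaps, errs):
    """Smallest k with Gap(k) >= Gap(k+1) - s_(k+1); falls back to k_max."""
    for i in range(len(gaps) - 1):
        if gaps[i] >= gaps[i + 1] - errs[i + 1]:
            return i + 1
    return len(gaps)
```

Calling `gap_statistic(profiles, k_max=10)` on a list of expression profiles (one list of time-point values per gene) and passing the result to `choose_k` gives a candidate number of clusters. For a real microarray data set, a larger `n_ref` and a vectorized K-means implementation would likely be preferable to this pure-Python version.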
Identifier | oai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/22779 |
Date | 09 1900 |
Creators | Li, Fang |
Contributors | Esterby, Sylvia, Statistics |
Source Sets | McMaster University |
Language | English |
Detected Language | English |
Type | Thesis |