The availability of large gene expression microarray data sets has brought many challenges for biological data mining. Many clustering methods have been proposed and widely used to analyze gene expression data. Biclustering identifies sets of genes that share similar expression patterns across subsets of samples, and its usefulness has been demonstrated for different organisms and data sets. Several biclustering methods based on different techniques currently exist; however, it is not clear how to compare the resulting biclusters with respect to biological relevance, and no guidelines are available for choosing among the existing techniques. In this work, we propose two new biclustering methods based on the Mean Squared Residue (MSR). The first is a dual biclustering algorithm that finds a set of biclusters using a greedy approach. The second combines the dual biclustering algorithm with quadratic programming: the dual biclustering algorithm reduces the size of the matrix so that the quadratic program can find an optimal bicluster reasonably fast. We also describe the comparison method and explain how we handle bicluster overlap and missing data.
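For readers unfamiliar with the score both methods build on, the following is a minimal sketch of the Mean Squared Residue as it is commonly defined (Cheng and Church): the average squared deviation of each entry from its row mean, column mean, and overall mean within the selected submatrix. The function name `mean_squared_residue` and the toy matrix are illustrative assumptions; this is not the dissertation's dual biclustering or quadratic programming code, and it does not model missing values.

```python
def mean_squared_residue(matrix, rows, cols):
    """Return the MSR of the submatrix picked out by `rows` and `cols`.

    Hypothetical helper for illustration; `matrix` is a list of lists of
    numbers (genes x samples), `rows` and `cols` are index lists.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])

    # Means over each row, each column, and the whole submatrix.
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows

    # Residue of an entry: value - row mean - column mean + overall mean.
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)


# Toy data (made up): the first three rows differ only by a constant shift,
# so they form a perfectly coherent bicluster; the last row does not fit.
expression = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
    [9.0, 1.0, 5.0],
]

coherent = mean_squared_residue(expression, [0, 1, 2], [0, 1, 2])
with_outlier = mean_squared_residue(expression, [0, 1, 2, 3], [0, 1, 2])
```

A lower score indicates a more coherent bicluster, which is why greedy MSR-based methods typically add or remove rows and columns according to their effect on this value.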
Identifier | oai:union.ndltd.org:GEORGIA/oai:digitalarchive.gsu.edu:cs_diss-1042 |
Date | 01 December 2009 |
Creators | Gremalschi, Stefan |
Publisher | Digital Archive @ GSU |
Source Sets | Georgia State University |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Computer Science Dissertations |