Because of the Human Genome Project, enormous quantities of biological data, e.g., microarray data, have been generated. Since the amount of biological data is very large, data mining techniques can help biologists analyze it efficiently. For microarray data, biclustering, which clusters rows (e.g., genes) and columns (e.g., conditions) simultaneously, has proved of great value for finding interesting patterns. Several types of biclusters have been proposed. To mine biclusters with coherent values, most previous methods need to compute Maximum Dimension Sets (MDSs) for every pair of genes in the microarray data. Since the number of genes is far larger than the number of conditions, this step is inefficient. To mine biclusters with coherent evolutions, on the other hand, the Co-gclustering method was proposed, which can simultaneously find biclusters with both coregulated and negatively coregulated patterns. However, its time complexity is exponential in the number of conditions, which makes it inefficient. Therefore, in this dissertation, to solve the biclustering problem for microarray databases efficiently, we first propose a Condition Enumeration Tree (CE-Tree) method, which mines biclusters with coherent values. Second, we propose an Up-Down Bit Pattern (UDB) method, which mines biclusters with coherent evolutions.

In the first proposed method, CE-Tree, instead of generating MDSs for every pair of genes, we generate MDSs only for every pair of conditions. We then expand the CE-Tree in a special local breadth-first within global depth-first manner to find the clustering result efficiently. Experimental results on real data show that the CE-Tree method mines biclusters more efficiently than several previous methods.

In the second proposed method, UDB, we use up-down bit patterns to record the condition pairs in which a gene is upregulated or downregulated. We then apply bit operations and a heuristic to these up-down bit patterns to find the clustering result efficiently. Compared with the Co-gclustering method, the UDB method reduces the time complexity from exponential to polynomial. Experimental results on real data show that the UDB method is more efficient than the Co-gclustering method.
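
The idea of recording up- and downregulation over condition pairs as bit patterns, and comparing genes with bit operations, can be illustrated with a minimal sketch. It is not the dissertation's algorithm: the abstract does not specify the encoding or the heuristic, so the bit layout (one bit per condition pair in `itertools.combinations` order) and the helper names `up_down_bits` and `shared_pairs` are assumptions made here for illustration only.

```python
from itertools import combinations


def up_down_bits(expr):
    """Encode one gene's expression profile as two integers used as bit sets.

    Assumed layout (hypothetical): bit k corresponds to the k-th condition
    pair (i, j) with i < j, in the order produced by itertools.combinations.
    The 'up' bit is set when the value rises from condition i to condition j,
    the 'down' bit when it falls. Equal values set neither bit.
    """
    up = down = 0
    for k, (i, j) in enumerate(combinations(range(len(expr)), 2)):
        if expr[j] > expr[i]:
            up |= 1 << k
        elif expr[j] < expr[i]:
            down |= 1 << k
    return up, down


def shared_pairs(gene_a, gene_b):
    """Count condition pairs where two genes move the same way or opposite ways.

    A bitwise AND of two patterns keeps only the condition pairs on which
    both patterns have a set bit, so no per-pair loop is needed here.
    """
    up_a, down_a = up_down_bits(gene_a)
    up_b, down_b = up_down_bits(gene_b)
    same = (up_a & up_b) | (down_a & down_b)        # coregulated pairs
    opposite = (up_a & down_b) | (down_a & up_b)    # negatively coregulated pairs
    return bin(same).count("1"), bin(opposite).count("1")


if __name__ == "__main__":
    # Made-up expression values over three conditions, for illustration only.
    rising = [1.0, 2.0, 3.0]
    also_rising = [2.0, 4.0, 6.0]
    falling = [3.0, 2.0, 1.0]
    print(shared_pairs(rising, also_rising))
    print(shared_pairs(rising, falling))
```

With m conditions each gene needs m(m-1)/2 bits per pattern, so comparing two genes costs a handful of word-level operations rather than a scan over all condition pairs. How the dissertation's heuristic then groups genes into biclusters is not described in the abstract and is not modeled here.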
Identifier | oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0614110-161634 |
Date | 14 June 2010 |
Creators | Chen, Jiun-Rung |
Contributors | Suh-Yin Lee, Chiang Lee, Wei-Pang Yang, Ye-In Chang, Chungnan Lee, Vincent Shin-Mu Tseng, Tei-Wei Kuo |
Publisher | NSYSU |
Source Sets | NSYSU Electronic Thesis and Dissertation Archive |
Language | English |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0614110-161634 |
Rights | withheld, Copyright information available at source archive |