For TFBS motif discovery, three novel GA-based algorithms are developed: GALF-P, which focuses on optimization; GALF-G, for modeling; and GASMEN, for spaced motifs. Novel memetic operators, namely local filtering and probabilistic refinement, are introduced to significantly improve the effectiveness (e.g. 73% better than MEME) and efficiency (e.g. a 4.49 times speedup) of the search. The GA-based algorithms have been extensively tested on comprehensive synthetic, real and benchmark datasets, and show outstanding performance compared with state-of-the-art approaches. Our algorithms also "evolve" to handle progressively more relaxed cases: from fixed motif widths to the most flexible widths, from single motifs to multiple motifs with overlapping control, from stringent motif-instance assumptions to very relaxed ones, and from contiguous motifs to generic spaced motifs with arbitrary spacers. / TF-TFBS associated sequence pattern (rule) discovery is further investigated to better decipher protein-DNA interactions in regulation. For the first time, we generalize previous exact TF-TFBS rules to approximate ones using a progressive approach. A customized algorithm is developed, outperforming MEME by over 73%. Compared with the exact rules, the approximate TF-TFBS rules yield significantly more verified rules and better verification ratios. Detailed analysis of PDB cases and conservation verification on NCBI protein records illustrate that the approximate rules reveal flexible and specific protein-DNA interactions with much greater generalization capability. / The comprehensive pattern discovery algorithms developed will be further verified, improved and extended to further decipher transcriptional regulation, such as inferring whole gene regulatory networks by applying the TFBS and TF-TFBS patterns discovered and incorporating expression data. 
/ Binding between Transcription Factors (TFs) and Transcription Factor Binding Sites (TFBSs) is a fundamental protein-DNA interaction in transcriptional regulation. TFs and TFBSs are conserved and form patterns (motifs) because of their important roles in controlling gene expression and, ultimately, affecting function and appearance. Pattern discovery is thus important for deciphering gene regulation, which has a tremendous impact on the understanding of life, bio-engineering and therapeutic applications. This thesis contributes to pattern discovery involving TFBS motifs and TF-TFBS associated sequence patterns based on Evolutionary Computation (EC), especially Genetic Algorithms (GAs), which are promising for bioinformatics problems with huge and noisy search spaces. / Chan, Tak Ming. / Advisers: Kwong-Sak Leung; Kin-Hong Lee. / Source: Dissertation Abstracts International, Volume: 73-03, Section: B, page: . / Thesis (Ph.D.)--Chinese University of Hong Kong, 2010. / Includes bibliographical references (leaves 147-153). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [201-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese.
Identifier | oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_344879 |
Date | January 2010 |
Contributors | Chan, Tak Ming., Chinese University of Hong Kong Graduate School. Division of Computer Science and Engineering. |
Source Sets | The Chinese University of Hong Kong |
Language | English, Chinese |
Detected Language | English |
Type | Text, theses |
Format | electronic resource, microform, microfiche, 1 online resource (xviii, 156 leaves : ill.) |
Rights | Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/) |