De novo motif discovery in biological sequences is an important and computationally challenging problem. A myriad of algorithms have been developed to solve this problem with varying success, but it can be difficult for even a small number of these tools to reach a consensus. Because individual tools can be better suited for specific scenarios, an ensemble tool that combines the results of many algorithms can yield a more confident and complete result. We present a novel and fast tool MCAT (Motif Combining and Association Tool) for de novo motif discovery by combining six state-of-the-art motif discovery tools (MEME, BioProspector, DECOD, XXmotif, Weeder, and CMF). We apply MCAT to data sets with DNA sequences that come from various species and compare our results with two well-established ensemble motif finding tools, EMD and DynaMIT. The experimental results show that MCAT is able to identify exact match motifs in DNA sequences efficiently, and it has a better performance in practice. / Master of Science / Finding hidden motifs in DNA or protein sequences is an important and computationally challenging problem. A motif is a short patterned DNA/protein sequence that has biological functions. Motifs regulate the process of gene expression, which is the fundamental biological process in which DNA is transcribed into RNA which is then translated to protein. In the past 20 years, a myriad of algorithms have been developed to solve the motif finding problem with varying success, but it can be difficult for even a small number of these tools to reach a consensus. Because individual tools can be better suited for specific scenarios, an ensemble tool that combines the results of many algorithms can yield a more confident and complete result. I present a novel and fast tool MCAT (Motif Combining and Association Tool) for motif discovery by combining six state-of-the-art motif discovery tools (MEME, BioProspector, DECOD, XXmotif, Weeder, and CMF). I apply MCAT to data sets with DNA sequences that come from various species and compare our results with two well-established ensemble motif finding tools, EMD and DynaMIT. The experimental results show that MCAT is able to identify exact match motifs in DNA sequences efficiently, and it has an improved performance in practice.
Identifer | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/84999 |
Date | 02 July 2018 |
Creators | Yang, Yanshen |
Contributors | Computer Science, Heath, Lenwood S., Zhang, Liqing, Hauf, Silke |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertation |
Detected Language | English |
Type | Thesis |
Format | ETD, application/pdf |
Rights | In Copyright, http://rightsstatements.org/vocab/InC/1.0/ |
Page generated in 0.0023 seconds