Global ETD Search

1	MCAT: Motif Combining and Association Tool Yang, Yanshen 02 July 2018 De novo motif discovery in biological sequences is an important and computationally challenging problem. A myriad of algorithms have been developed to solve this problem with varying success, but it can be difficult for even a small number of these tools to reach a consensus. Because individual tools can be better suited to specific scenarios, an ensemble tool that combines the results of many algorithms can yield a more confident and complete result. We present a novel and fast tool, MCAT (Motif Combining and Association Tool), for de novo motif discovery that combines six state-of-the-art motif discovery tools (MEME, BioProspector, DECOD, XXmotif, Weeder, and CMF). We apply MCAT to data sets of DNA sequences from various species and compare our results with two well-established ensemble motif finding tools, EMD and DynaMIT. The experimental results show that MCAT identifies exact-match motifs in DNA sequences efficiently and performs better in practice. / Master of Science / Finding hidden motifs in DNA or protein sequences is an important and computationally challenging problem. A motif is a short, patterned DNA or protein sequence that has a biological function. Motifs regulate gene expression, the fundamental biological process in which DNA is transcribed into RNA, which is then translated into protein. In the past 20 years, a myriad of algorithms have been developed to solve the motif finding problem with varying success, but it can be difficult for even a small number of these tools to reach a consensus. Because individual tools can be better suited to specific scenarios, an ensemble tool that combines the results of many algorithms can yield a more confident and complete result. I present a novel and fast tool, MCAT (Motif Combining and Association Tool), for motif discovery that combines six state-of-the-art motif discovery tools (MEME, BioProspector, DECOD, XXmotif, Weeder, and CMF). I apply MCAT to data sets of DNA sequences from various species and compare my results with two well-established ensemble motif finding tools, EMD and DynaMIT. The experimental results show that MCAT identifies exact-match motifs in DNA sequences efficiently and performs better in practice. Motif finding
2	Improved Algorithms for Discovery of Transcription Factor Binding Sites in DNA Sequences Zhao, Xiaoyan December 2010 Understanding the mechanisms that regulate gene expression is a major challenge in biology. One of the most important tasks in this challenge is to identify transcription factor binding sites (TFBS) in DNA sequences. The common representation of these binding sites is called a "motif", and the TFBS discovery problem is also referred to as the motif finding problem in computer science. Despite extensive efforts in the past decade, none of the existing algorithms performs very well. This dissertation focuses on this difficult problem and proposes three new methods (MotifEnumerator, PosMotif, and Enrich) that improve on existing approaches. An improved pattern-driven algorithm, MotifEnumerator, is first proposed to detect the optimal motif with reduced time complexity compared to traditional exact pattern-driven approaches. This strategy is further extended to allow arbitrary don't-care positions within a motif without much decrease in the solvable values of motif length. The performance of this algorithm is comparable to that of the best existing motif finding algorithms on a large benchmark set of samples. A second algorithm with further post-processing, PosMotif, uses a string representation that allows arbitrary ignored positions within the non-conserved portion of single motifs, and uses Markov chains to model the background distributions of motifs of a given length while skipping these positions within each Markov chain. Two post-processing steps that consider redundancy information are applied in this algorithm. PosMotif demonstrates improved performance compared to the five best existing motif finding algorithms on several large benchmark sets of samples. The third method, Enrich, is proposed to improve the performance of general motif finding algorithms by adding more sequences to the samples in the existing benchmark datasets. Five well-known motif finding algorithms were run on the original datasets and the enriched datasets, and the performance comparisons show a substantial overall improvement on the enriched datasets. Computational Biology Motif finding Transcription
3	Motif Finding in Biological Sequences Liao, Ying-Jer 21 August 2003 A huge amount of genomic information, including protein and DNA sequences, has been generated by the Human Genome Project. One way to decipher these biological sequences is to detect local residue patterns in them. However, detecting unknown patterns across multiple sequences is still very difficult. In this thesis, we propose an algorithm, based on the Gibbs sampler method, for identifying local consensus patterns (motifs) in monomolecular sequences. We first designed an ACO (ant colony optimization) algorithm to find a good initial solution and a set of better candidate positions for revising the motif. The Gibbs sampler method is then applied with these better candidate positions as the input. The time required to find motifs with our algorithm is reduced drastically: it takes only 20% of the time of the Gibbs sampler method while maintaining comparable quality. Computational Biology Motif Finding ACO Algorithm Local Sequences Alignment
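As a rough illustration of the Gibbs sampler step that the abstract above builds on, the following is a minimal sketch of a plain Gibbs site sampler for DNA. It is not the thesis's algorithm: the ACO initialization is omitted (random starts are used instead), a uniform background is assumed, and all names (`build_profile`, `gibbs_motif_search`, `pseudocount`) are hypothetical choices made for this example.

```python
import random
from collections import Counter

ALPHABET = "ACGT"  # assumption: DNA sequences containing only these letters


def build_profile(sequences, starts, width, skip, pseudocount=1.0):
    """Column-wise residue probabilities from every motif instance except `skip`."""
    profile = []
    for col in range(width):
        counts = Counter(
            seq[start + col]
            for i, (seq, start) in enumerate(zip(sequences, starts))
            if i != skip
        )
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        profile.append({a: (counts[a] + pseudocount) / total for a in ALPHABET})
    return profile


def gibbs_motif_search(sequences, width, iterations=500, seed=0):
    """Return one motif start position per sequence after Gibbs sampling."""
    rng = random.Random(seed)
    # Random initial positions (the thesis uses ACO to pick better ones).
    starts = [rng.randrange(len(s) - width + 1) for s in sequences]
    for _ in range(iterations):
        # Hold one sequence out and rebuild the profile from the others.
        i = rng.randrange(len(sequences))
        profile = build_profile(sequences, starts, width, skip=i)
        seq = sequences[i]
        # Weight every window of the held-out sequence by its profile likelihood.
        weights = []
        for pos in range(len(seq) - width + 1):
            w = 1.0
            for col in range(width):
                w *= profile[col][seq[pos + col]]
            weights.append(w)
        # Sample the new start position in proportion to those weights.
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts
```

Because the sampler is stochastic, the returned positions depend on `seed` and `iterations`; a better starting point, as the abstract suggests, would mainly reduce the number of iterations needed.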
4	COPIA: A New Software for Finding Consensus Patterns in Unaligned Protein Sequences Liang, Chengzhi January 2001 The consensus pattern problem (CPP) aims at finding conserved regions, or motifs, in unaligned sequences. This problem is NP-hard under various scoring schemes. To solve this problem for protein sequences more efficiently, a new scoring scheme and a randomized algorithm based on a substitution matrix are proposed here. Any practical solution to a bioinformatics problem must observe two principles: (1) the problem that it solves accurately describes the real problem; in CPP, this requires that the scoring scheme be able to distinguish a real motif from background; (2) it provides an efficient algorithm to solve the mathematical problem. A key question in protein motif-finding is how to determine the motif length. One problem in EM algorithms for solving CPP is how to find good starting points to reach the global optimum. These two questions were both well addressed under this scoring scheme, which made the randomized algorithm both fast and accurate in practice. A software tool, COPIA (COnsensus Pattern Identification and Analysis), has been developed implementing this algorithm. Experiments using sequences from the von Willebrand factor (vWF) family showed that it worked well at finding multiple motifs and repeats. COPIA's ability to find repeats also makes it useful in illustrating the internal structures of multidomain proteins. Comparative studies using several groups of protein sequences demonstrated that COPIA performed better than the commonly used motif-finding programs. Computer Science bioinformatics software multiple alignment motif-finding consensus pattern problem
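To make the consensus pattern objective concrete, here is a minimal sketch that scores one candidate set of equal-length windows (one per sequence) by majority-vote consensus and total Hamming distance. This is an assumption-laden simplification: COPIA's own scoring scheme is based on a substitution matrix and is not reproduced here, and the function name `consensus_and_cost` is hypothetical.

```python
from collections import Counter


def consensus_and_cost(windows):
    """Return the majority-vote consensus of the windows and the total
    number of mismatches (Hamming distance) between it and each window."""
    if not windows or len({len(w) for w in windows}) != 1:
        raise ValueError("need at least one window, all of the same length")
    # Most frequent residue in each aligned column.
    consensus = "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*windows)
    )
    # Sum of per-position mismatches over all windows.
    cost = sum(a != b for w in windows for a, b in zip(w, consensus))
    return consensus, cost


consensus, cost = consensus_and_cost(["ACGT", "ACGA", "TCGT"])
print(consensus, cost)  # ACGT 2
```

A motif finder for CPP would search over the choice of windows to minimize this kind of cost; the hard part, as the abstract notes, is that the search is NP-hard under various scoring schemes.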
6	Discovery of Putative STAT5 Transcription Factor Binding Sites in Mice with Diabetic Nephropathy Schmidt, Jens January 2013 No description available. Computer Science Bioinformatics STAT5 diabetic nephropathy diabetes inflammation motif finding TFBS
7	Computational models to investigate binding mechanisms of regulatory proteins Munteanu, Alina 07 May 2018 There are thousands of eukaryotic regulatory proteins that bind to specific cis-regulatory regions of genes and/or RNA transcripts and coordinate gene expression. At the DNA level, transcription factors (TFs) modulate the initiation of transcription, while at the RNA level, RNA-binding proteins (RBPs) regulate every aspect of RNA metabolism and function. The DNA or RNA targets and/or the sequence preferences of hundreds of eukaryotic regulatory proteins have been determined thus far using high-throughput in vivo and in vitro experiments, such as chromatin immunoprecipitation (ChIP) followed by sequencing (ChIP-seq) and protein binding microarrays (PBMs) for TFs, or cross-linking and immunoprecipitation (CLIP) techniques and RNAcompete for RBPs. However, the derived short sequence motifs do not fully explain the highly specific binding of these regulatory proteins. To improve our understanding of how different proteins achieve their regulatory specificity, we developed two computational tools that incorporate additional information into the analysis of experimentally determined binding sites. For protein-DNA interactions, we investigate the binding specificity of paralogous TFs (i.e., members of the same TF family). Focusing on distinguishing between genomic regions bound in vivo by pairs of closely related TFs, we developed a classification framework that identifies putative co-factors that provide specificity to paralogous TFs. For protein-RNA interactions, we investigate the role of RNA secondary structure and its impact on binding-site recognition. We developed a motif finding algorithm that integrates secondary structure with primary sequence in order to better identify the binding preferences of RBPs. gene expression regulatory proteins motif finding classification 004 Data processing; Computer science 570 Life sciences; Biology WC 7700 ddc:000 ddc:004 ddc:570
