Global ETD Search

1	Novel stochastic and entropy-based Expectation-Maximisation algorithm for transcription factor binding site motif discovery
Kilpatrick, Alastair Morris (January 2015)
The discovery of transcription factor binding site (TFBS) motifs remains an important and challenging problem in computational biology. This thesis presents MITSU, a novel algorithm for TFBS motif discovery which exploits stochastic methods both as a means of overcoming optimality limitations in current algorithms and as a framework for incorporating relevant prior knowledge in order to improve results. The current state of the TFBS motif discovery field is surveyed, with a focus on probabilistic algorithms that typically take the promoter regions of coregulated genes as input. A case is made for an approach based on the stochastic Expectation-Maximisation (sEM) algorithm; its position amongst existing probabilistic algorithms for motif discovery is shown. The algorithm developed in this thesis is unique amongst existing motif discovery algorithms in that it combines the sEM algorithm with a derived data set which leads to an improved approximation to the likelihood function. This likelihood function is unconstrained with regard to the distribution of motif occurrences within the input dataset. MITSU also incorporates a novel heuristic to automatically determine TFBS motif width. This heuristic, known as MCOIN, is shown to outperform current methods for determining motif width. MITSU is implemented in Java and an executable is available for download. MITSU is evaluated quantitatively using realistic synthetic data and several collections of previously characterised prokaryotic TFBS motifs. The evaluation demonstrates that MITSU improves on a deterministic EM-based motif discovery algorithm and an alternative sEM-based algorithm, in terms of previously established metrics.
The ability of the sEM algorithm to escape stable fixed points of the EM algorithm, which trap deterministic motif discovery algorithms, and the ability of MITSU to discover multiple motif occurrences within a single input sequence are also demonstrated. MITSU is validated using previously characterised Alphaproteobacterial motifs, before being applied to motif discovery in uncharacterised Alphaproteobacterial data. A number of novel results from this analysis are presented and motivate two extensions of MITSU: a strategy for the discovery of multiple different motifs within a single dataset and a higher order Markov background model. The effects of incorporating these extensions within MITSU are evaluated quantitatively using previously characterised prokaryotic TFBS motifs and demonstrated using Alphaproteobacterial motifs. Finally, an information-theoretic measure of motif palindromicity is presented and its advantages over existing approaches for discovering palindromic motifs are discussed.
572.8
2	Motif Selection: Identification of Gene Regulatory Elements using Sequence Coverage Based Models and Evolutionary Algorithms
Al-Ouran, Rami (January 2015)
No description available.
Bioinformatics Computer Science Motif selection motif discovery ENCODE
3	A Parallel, High-Throughput Framework for Discovery of DNA Motifs
Kurz, Kyle W. (27 July 2010)
No description available.
Bioinformatics Computer Science DNA motifs bioinformatics motif discovery bioinformatics framework
4	MotifGP: DNA Motif Discovery Using Multiobjective Evolution
Belmadani, Manuel (January 2016)
The motif discovery problem is becoming increasingly important for molecular biologists as new sequencing technologies are producing large amounts of data at unprecedented rates. The solution space for DNA motifs is too large to search with naive methods, meaning there is a need for fast and accurate motif detection tools. We propose MotifGP, a multiobjective motif discovery tool evolving regular expressions that characterize overrepresented motifs in a given input dataset. This thesis describes and evaluates a multiobjective strongly typed genetic programming algorithm for the discovery of network expressions in DNA sequences. Using 13 realistic data sets, we compare the results of our tool, MotifGP, to those of DREME, a state-of-the-art program. MotifGP outperforms DREME when the motifs to be sought are long, and the specificity is distributed over the length of the motif. For shorter motifs, the performance of MotifGP compares favourably with the state-of-the-art method. Finally, we discuss the advantages of multi-objective optimization in the context of this specific motif discovery problem.
Genetic Programming Multiobjective optimization Motif Discovery Evolutionary Computing Bioinformatics ChIP-seq
5	In Silico Discovery of Pollen-specific Cis-regulatory Elements in the Arabidopsis Hydroxyproline-Rich Glycoprotein Gene Family
Wolfe, Richard A. (January 2014)
No description available.
Bioinformatics Computer Science Bioinformatics transcription factor transcription factor binding site motif discovery computer
6	Mining Gene Regulatory Motifs Using the Concept of Sequence Coverage
Naik, Ashwini (January 2014)
No description available.
Bioinformatics Computer Science Sequence Coverage Motif Discovery Set Cover Mining Greedy Method
7	Using Weighted Set Cover to Identify Biologically Significant Motifs
Schmidt, Robert J.M. (January 2015)
No description available.
Bioinformatics Computer Science Discriminative Motif Discovery Weighted Set Cover Greedy Set Cover Linear Programming ENCODE
8	Discovering Protein Sequence-Structure Motifs and Two Applications to Structural Prediction
Tang, Thomas Cheuk Kai (January 2004)
This thesis investigates the correlations between short protein peptide sequences and local tertiary structures. In particular, it introduces a novel algorithm for partitioning short protein segments into clusters of local sequence-structure motifs, and demonstrates that these motif clusters contain useful structural information via two applications to structural prediction. The first application utilizes motif clusters to predict local protein tertiary structures. A novel dynamic programming algorithm that performs comparably with some of the best existing algorithms is described. The second application exploits the capability of motif clusters in recognizing regular secondary structures to improve the performance of secondary structure prediction based on Support Vector Machines. Empirical results show significant improvement in overall prediction accuracy with no performance degradation in any specific aspect being measured. The encouraging results obtained illustrate the great potential of using local sequence-structure motifs to tackle protein structure predictions and possibly other important problems in computational biology.
Computer Science Bioinformatics Data mining Clustering Secondary Structure Prediction Local Tertiary Structure Prediction SVM
9	Genome-Wide Studies of Transcriptional Regulation in Mammalian Cells
Wallerman, Ola (January 2010)
The key to the complexity of higher organisms lies not in the number of protein coding genes they carry, but rather in the intrinsic complexity of the gene regulatory networks. The major effectors of transcriptional regulation are proteins called transcription factors, and in this thesis four papers describing genome-wide studies of seven such factors are presented, together with studies on components of the chromatin and transcriptome. In Paper I, we optimized a large-scale in vivo method, ChIP-chip, to study protein-DNA interactions using microarrays. The metabolic-disease related transcription factors USF1, HNF4a and FOXA2 were studied in 1% of the genome, and a surprising number of binding sites were found, mostly far from annotated genes. In Paper II, a novel sequencing-based method, ChIP-seq, was applied to FOXA2, HNF4a and GABPa, allowing a true genome-wide view of binding sites. A large overlap between the datasets was seen, and molecular interactions were verified in vivo. Using a ChIP-seq specific motif discovery method, we identified both the expected motifs and several for co-localized transcription factors. In Paper III, we identified and studied a novel transcription factor, ZBED6, using the ChIP-seq method. Here, we went from one known binding site to several hundred sites throughout the mouse genome. Finally, in Paper IV, we studied the chromatin landscape by deep sequencing of nucleosomal DNA, and further used RNA-sequencing to quantify expression levels, and extended the knowledge about the binding profiles for the transcription factors NFY and TCF7L2.
ChIP ChIP-chip ChIP-seq transcription factors motif discovery nucleosome positioning HepG2 genome-wide RNA-seq Medical genetics
10	Computational Biology: Insights into Hemagglutinin and Polycomb Repressive Complex 2 Function
January 2012
Influenza B virus hemagglutinin (HA) is a major surface glycoprotein with frequent amino-acid substitutions. However, the roles of antibody selection in the amino-acid substitutions of HA were still poorly understood. An analysis was conducted on a total of 271 HA1 sequences of influenza B virus strains isolated during 1940∼2007, finding positively selected sites all located in the four major epitopes (120-loop, 150-loop, 160-loop and 190-helix), supporting a predominant role of antibody selection in HA evolution. Of particular significance is the involvement of the 120-loop in positive selection. Influenza B virus HA continues to evolve into new sublineages, within which the four major epitopes were targeted selectively in positive selection. Thus, any newly emerging strains need to be placed in the context of their evolutionary history in order to understand and predict their epidemic potential. As key epigenetic regulators, polycomb group (PcG) proteins are responsible for the control of cell proliferation and differentiation as well as stem cell pluripotency and self-renewal. To facilitate experimental identification of PcG target genes, which are poorly understood, we propose a novel computational method, EpiPredictor, which models transcription factor interaction using a non-linear kernel. The resulting targets suggest that multiple transcription factor networking at the cis-regulatory elements is critical for PcG recruitment, while high GC content and high conservation level are also important features of PcG target genes.
To translate EpiPredictor to human data, we performed a computational study utilizing 22 human genome-wide ChIP datasets to identify DNA motifs and genome features that would potentially specify PRC2, using five motif discovery algorithms, JASPAR known transcription binding motifs, and other whole genome data. We have found multiple motifs within the various subgroups of experimental categories that have much higher enrichment among ChIP-identified gene promoters than among random gene promoters. Specifically, we have identified Low CpG content CpG Islands (LeG's) as being critical in the separation of cancer cell line identified targets from embryonic stem cell line identified targets. Additionally, there are differences between human and mouse ES cell predictions using the same motifs and features, suggesting relevant evolutionary divergence.
Applied sciences Biological sciences Polycomb Polycomb response element Hemagglutinin Motif discovery Positive selective pressure Viral evolution Biomedical engineering Bioinformatics