1 |
Computational discovery of cis-regulatory elements in multiple Drosophila species
Arunachalam, Manonmani 09 November 2009
Gene regulation lies at the heart of most biological processes, and transcription factors are the key molecules that control tissue-specific gene expression. In higher eukaryotes, transcription factors control gene expression by binding regulatory DNA segments called cis-regulatory modules (CRMs). The increasing number of sequenced genomes of multicellular eukaryotes, along with high-throughput methods such as whole-genome expression microarrays, allows for systematic characterization of the CRMs that control gene expression. A first step towards understanding gene regulation is the identification of the regulatory elements present in the genome. We take advantage of the large database of spatio-temporal patterns of gene expression in D. melanogaster embryogenesis to identify sets of developmentally co-expressed genes. We developed a computational method that identifies DNA binding sites for transcription factors from families of co-regulated genes that are expressed during Drosophila embryo development. This method discovers over-represented motifs in a set of co-regulated genes using the exhaustive motif enumeration technique. Clustering the predicted motifs identifies the CRMs, which assist in translating a combinatorial code of TF inputs into a specific gene expression output. The predicted CRMs were verified experimentally by searching the whole genome for the predicted CRMs and establishing the expression patterns of the genes that are associated with these CRMs. It is well known that gene expression is substantially controlled through CRMs and that those key regulatory sequences are conserved in related species. The conservation of CRMs can be studied by comparing the related genomes, and alignment methods are widely used computational tools for comparing the sequences. However, in distantly related species the CRM sequences are simply not alignable.
To identify similar CRMs in distantly related species, we developed a non-alignment-based method. This method is based on word frequencies, where the given sequences are compared using a Poisson-based metric. Starting with a set of CRMs involved in Drosophila early embryo development, we show here that our non-alignment method successfully detects similar CRMs in distantly related species (D. ananassae, D. pseudoobscura, D. willistoni, D. mojavensis, D. virilis, D. grimshawi). This method proved efficient in discriminating the functional CRMs from the non-functional ones.
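The abstract does not spell out its Poisson-based metric, so the following is only a minimal sketch of how an alignment-free, word-frequency comparison of this kind might look. The function names, the word length `k`, the pseudocount, and the scoring rule (summing negative log tail probabilities over the words of the reference CRM) are all assumptions made for illustration, not the thesis method.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping words of length k in an upper-cased DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def poisson_sf(count, lam):
    """P(X >= count) for X ~ Poisson(lam), computed from the lower tail."""
    if count <= 0:
        return 1.0
    term = math.exp(-lam)          # P(X = 0)
    cdf = term
    for i in range(1, count):
        term *= lam / i            # P(X = i) from P(X = i - 1)
        cdf += term
    return max(1.0 - cdf, 1e-300)  # floor avoids log(0)


def poisson_similarity(reference, candidate, background, k=4):
    """Hypothetical score: how surprising are the reference's words in the candidate?

    For each word seen in the reference CRM, the count in the candidate is
    compared with a Poisson null whose rate comes from the background sequence.
    """
    ref = kmer_counts(reference, k)
    cand = kmer_counts(candidate, k)
    bg = kmer_counts(background, k)
    bg_total = sum(bg.values())
    n_positions = len(candidate) - k + 1
    score = 0.0
    for word in ref:
        # Pseudocount keeps the rate positive for words unseen in the background.
        freq = (bg[word] + 1) / (bg_total + 4 ** k)
        lam = n_positions * freq
        score += -math.log(poisson_sf(cand[word], lam))
    return score
```

A higher score means the candidate contains the reference CRM's words more often than the background rate would predict, with no alignment between the two sequences required.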
|
2 |
Computational discovery of cis-regulatory elements in multiple Drosophila species
Arunachalam, Manonmani 02 November 2009
Gene regulation lies at the heart of most biological processes, and transcription factors are the key molecules that control tissue-specific gene expression. In higher eukaryotes, transcription factors control gene expression by binding regulatory DNA segments called cis-regulatory modules (CRMs). The increasing number of sequenced genomes of multicellular eukaryotes, along with high-throughput methods such as whole-genome expression microarrays, allows for systematic characterization of the CRMs that control gene expression. A first step towards understanding gene regulation is the identification of the regulatory elements present in the genome. We take advantage of the large database of spatio-temporal patterns of gene expression in D. melanogaster embryogenesis to identify sets of developmentally co-expressed genes. We developed a computational method that identifies DNA binding sites for transcription factors from families of co-regulated genes that are expressed during Drosophila embryo development. This method discovers over-represented motifs in a set of co-regulated genes using the exhaustive motif enumeration technique. Clustering the predicted motifs identifies the CRMs, which assist in translating a combinatorial code of TF inputs into a specific gene expression output. The predicted CRMs were verified experimentally by searching the whole genome for the predicted CRMs and establishing the expression patterns of the genes that are associated with these CRMs. It is well known that gene expression is substantially controlled through CRMs and that those key regulatory sequences are conserved in related species. The conservation of CRMs can be studied by comparing the related genomes, and alignment methods are widely used computational tools for comparing the sequences. However, in distantly related species the CRM sequences are simply not alignable.
To identify similar CRMs in distantly related species, we developed a non-alignment-based method. This method is based on word frequencies, where the given sequences are compared using a Poisson-based metric. Starting with a set of CRMs involved in Drosophila early embryo development, we show here that our non-alignment method successfully detects similar CRMs in distantly related species (D. ananassae, D. pseudoobscura, D. willistoni, D. mojavensis, D. virilis, D. grimshawi). This method proved efficient in discriminating the functional CRMs from the non-functional ones.
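The exhaustive motif enumeration step mentioned in this abstract can be illustrated with a small sketch: every possible DNA word of a fixed length is generated and scored by how much more often it occurs in the co-regulated (foreground) set than in a background set. The pseudocounted log2 ratio used here, the function names, and the default word length are assumptions chosen for brevity; the abstract does not state which statistic the thesis uses.

```python
import math
from itertools import product


def fraction_containing(word, sequences):
    """Fraction of sequences that contain the word at least once."""
    return sum(word in s for s in sequences) / len(sequences)


def enumerate_overrepresented(foreground, background, k=6, top=5):
    """Score every possible DNA word of length k by foreground enrichment.

    The score is a pseudocounted log2 ratio of the fraction of foreground
    sequences containing the word to the same fraction in the background.
    This ratio is an illustrative stand-in, not the thesis statistic.
    """
    foreground = [s.upper() for s in foreground]
    background = [s.upper() for s in background]
    pseudo = 1.0 / (len(background) + 1)
    scored = []
    for letters in product("ACGT", repeat=k):   # all 4**k candidate words
        word = "".join(letters)
        fg = fraction_containing(word, foreground)
        if fg == 0.0:
            continue                            # never seen in the co-regulated set
        bg = fraction_containing(word, background)
        scored.append((math.log2((fg + pseudo) / (bg + pseudo)), word))
    scored.sort(reverse=True)
    return scored[:top]
```

Because the search space grows as 4**k, exhaustive enumeration of this kind is practical only for short words; longer or degenerate motifs would need a different search strategy.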
|
3 |
Identification and Characterization of Novel Ribosomal Protein-binding RNA motifs in Bacteria
Fu, Yang January 2014
Thesis advisor: Michelle M. Meyer / As the factories responsible for producing proteins, ribosomes are of great importance. In bacteria, ribosomes are composed of three ribosomal RNAs (rRNAs) of different sizes and around 50 ribosomal proteins (r-proteins). During ribosome biogenesis in bacteria, the synthesis of rRNAs and r-proteins is tightly regulated and coordinated to ensure robust growth. In particular, a group of cis-regulatory RNA elements located in the 5' untranslated regions or the intergenic regions of r-protein operons is responsible for the regulation of r-protein biosynthesis. Because RNA-regulated r-protein biosynthesis is essential and universal in bacteria, such unique and varied regulatory RNAs could provide new targets for antibacterial purposes. In this thesis, we report and experimentally verify a novel r-protein L1 regulation model that contains a dual L1-binding RNA motif and, for the first time, an S6:S18 dimer-binding RNA structure in the S6 operon. We also describe Escherichia coli-based and Schizosaccharomyces pombe-based reporter systems for in vivo characterization of RNA-protein interactions. So far, both in vivo systems have failed to report RNA-protein interactions and thus need further tuning. In addition, we performed phage display to select for small peptides that bind regulatory RNAs and examined their effects on bacterial viability. One selected peptide, N-TVNFKLY-C, caused defective growth when overexpressed in E. coli. Further studies must nevertheless be conducted to verify the possibility that bacteria were killed by a direct RNA-peptide interaction that disrupted the native r-protein regulation. / Thesis (MS), Boston College, 2014. / Submitted to: Boston College. Graduate School of Arts and Sciences. / Discipline: Biology.
|
4 |
ModuleInducer: Automating the Extraction of Knowledge from Biological Sequences
Korol, Oksana 14 October 2011
In the past decade, rapid advances have been made in the sequencing, digitalization and collection of biological data. However, the bottleneck remains at the point of analysis and extraction of patterns from the data. We have developed a method aimed at widening this bottleneck by automating knowledge extraction from biological data. Our approach is aimed at discovering patterns in a set of DNA sequences based on the location of transcription factor binding sites or any other biological markers, with an emphasis on discovering relationships. A variety of statistical and computational methods exist to analyze such data. However, they either require an initial hypothesis, which is later tested, or classify the data based on its attributes. Our approach does not require an initial hypothesis, and the classification it produces is based on the relationships between attributes. The value of such an approach is that it is able to uncover new knowledge about the data by inducing a general theory based on basic known rules.
The core of our approach lies in an inductive logic programming (ILP) engine, which, based on positive and negative examples as well as background knowledge, is able to induce a descriptive, human-readable theory describing the data. An application provides an end-to-end analysis of DNA sequences. An easy-to-use Web interface accepts a set of related sequences to be analyzed, an optional set of negative example sequences to contrast with the main set, and a set of possible genetic markers as position-specific scoring matrices. A Java-based backend formats the sequences, determines the location of the genetic markers inside them and passes the information to the ILP engine, which induces the theory.
The model assumed in our background knowledge is a set of basic interactions between biological markers in any DNA sequence. This makes our approach applicable to a wide variety of biological problems, including detection of cis-regulatory modules and analysis of ChIP-Sequencing experiments. We have evaluated our method in the context of such applications on two real-world datasets as well as a number of specially designed synthetic datasets. The approach has been shown to have merit even in situations where no significant classification could be determined.
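The abstract describes the backend (Java-based in the original system) as locating markers given as position-specific scoring matrices. A minimal Python sketch of that general step follows, assuming a simple log-odds matrix built from base counts with a uniform background; the function names, pseudocount, and threshold handling are illustrative assumptions and do not reflect ModuleInducer's actual code.

```python
import math


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Turn per-position base counts into a log-odds scoring matrix."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    sequence = sequence.upper()
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows with ambiguous bases such as N
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits
```

The resulting list of marker positions is the kind of information that could then be handed to a downstream learner; the sketch scans one strand only.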
|
7 |
ModuleInducer: Automating the Extraction of Knowledge from Biological Sequences
Korol, Oksana January 2011
In the past decade, rapid advances have been made in the sequencing, digitalization and collection of biological data. However, the bottleneck remains at the point of analysis and extraction of patterns from the data. We have developed a method aimed at widening this bottleneck by automating knowledge extraction from biological data. Our approach is aimed at discovering patterns in a set of DNA sequences based on the location of transcription factor binding sites or any other biological markers, with an emphasis on discovering relationships. A variety of statistical and computational methods exist to analyze such data. However, they either require an initial hypothesis, which is later tested, or classify the data based on its attributes. Our approach does not require an initial hypothesis, and the classification it produces is based on the relationships between attributes. The value of such an approach is that it is able to uncover new knowledge about the data by inducing a general theory based on basic known rules.
The core of our approach lies in an inductive logic programming (ILP) engine, which, based on positive and negative examples as well as background knowledge, is able to induce a descriptive, human-readable theory describing the data. An application provides an end-to-end analysis of DNA sequences. An easy-to-use Web interface accepts a set of related sequences to be analyzed, an optional set of negative example sequences to contrast with the main set, and a set of possible genetic markers as position-specific scoring matrices. A Java-based backend formats the sequences, determines the location of the genetic markers inside them and passes the information to the ILP engine, which induces the theory.
The model assumed in our background knowledge is a set of basic interactions between biological markers in any DNA sequence. This makes our approach applicable to a wide variety of biological problems, including detection of cis-regulatory modules and analysis of ChIP-Sequencing experiments. We have evaluated our method in the context of such applications on two real-world datasets as well as a number of specially designed synthetic datasets. The approach has been shown to have merit even in situations where no significant classification could be determined.
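To make the idea of classifying by relationships between markers (rather than by single attributes) concrete, here is a deliberately simplified sketch. It is not inductive logic programming and not the ModuleInducer engine: it only derives two hypothetical pairwise relations, `before` and `close`, from marker positions and keeps those that hold in every positive sequence and in no negative one. The relation names, the `max_gap` parameter, and the marker labels in the usage note are assumptions.

```python
from itertools import permutations


def relations(hits, max_gap=50):
    """Derive relational facts from (marker, start) hits in one sequence."""
    facts = set()
    for (a, pos_a), (b, pos_b) in permutations(hits, 2):
        if pos_a < pos_b:
            facts.add(("before", a, b))
            if pos_b - pos_a <= max_gap:
                facts.add(("close", a, b))
    return facts


def discriminating_relations(positives, negatives, max_gap=50):
    """Relations found in every positive sequence and in no negative one."""
    pos_facts = [relations(h, max_gap) for h in positives]
    neg_facts = [relations(h, max_gap) for h in negatives]
    shared = set.intersection(*pos_facts) if pos_facts else set()
    seen_in_negatives = set().union(*neg_facts)
    return shared - seen_in_negatives
```

For example, with positives `[[("bcd", 10), ("hb", 40)], [("bcd", 5), ("hb", 30), ("kr", 200)]]` and negatives `[[("hb", 10), ("bcd", 100)]]`, the function returns the two facts stating that `bcd` occurs before and close to `hb`. A real ILP engine would search a much richer space of first-order rules and tolerate noisy examples.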
|
8 |
Transcriptional Regulation of FEV, a Human Serotonin Neuron Developmental Control Gene
Krueger, Katherine C. 21 July 2009
No description available.
|
9 |
Hox Specificity: Constrained vs. Flexible Requirements for the PBC and MEIS Cofactors
Uhl, Juli D. 17 October 2014
No description available.
|
10 |
TRANSLATIONAL CONTROL OF MATERNAL mRNA POPULATION IN MOUSE EMBRYOS
Potireddy, Santhi January 2010
Early mammalian development before the oocyte-to-embryo transition is under 'maternal control' from factors deposited in the cytoplasm during oocyte growth and synthesized independently of de novo transcription. Maternal mRNAs encode proteins necessary for early embryo development. Two elements in the mRNA 3' untranslated region (UTR), the cytoplasmic polyadenylation element (CPE) and the hexanucleotide (AAUAAA), are involved in the control of translation of specific mRNAs during meiotic maturation. Despite advances in understanding translational regulation during meiotic maturation, regulation at the 1-cell stage has not been explained. More studies are required to explain this complex mechanism of temporal mRNA recruitment after fertilization. Maternal mRNAs translated at different stages were examined to understand how specific maternal mRNAs are synthesized and stored, what these maternal mRNAs are, which of them are translated, and how they are temporally regulated. Polysomal mRNAs from eggs and 1-cell embryos were analyzed by microarray, which indicated that temporally significant biological activities were encoded by mRNAs recruited at different stages of development. The mRNAs recruited in eggs were involved in homeostasis and transport mechanisms, and those recruited in zygotes were involved in biosynthesis and metabolic activities. These data indicated that maternal mRNAs are temporally regulated to meet the different biological requirements of the embryos. After the identification of temporally translated mRNAs, experiments were performed to understand the mechanism underlying temporal translation. The prevalence of the CPE differed between the two translated mRNA populations, i.e., egg and 1-cell stage polysomal mRNAs. CPEs were present in ~53% of transcripts at the 1-cell stage compared to ~86% at the MII stage. This indicated that novel motifs other than CPEs regulate translation of mRNAs at the 1-cell stage.
Truncation and deletion experiments were conducted using chimeric mRNAs based on one mRNA that was enriched in the 1-cell polysomes (Bag4). These experiments led to the identification of two regulatory regions that control translation at the 1-cell stage, an 80 nt region and a 43 nt region with different regulatory motifs. The 80 nt region is involved in activation of translation, and the 43 nt region has an inhibitory effect on translation at the MII and early 1-cell stages. These results provide a detailed picture of how specific maternal mRNAs are prevented from undergoing translation at the MII stage and how the effect of inhibition is eliminated by the late 1-cell stage. / Biochemistry
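The CPE prevalence comparison reported in this abstract (the share of transcripts whose 3' UTR carries a CPE) amounts to a motif-presence count. A minimal sketch follows; the regular expressions are assumed consensus forms, since the abstract does not state which CPE definitions were used, and the function names are hypothetical.

```python
import re

# Assumed consensus patterns; the abstract does not give the exact motifs used.
CPE = re.compile(r"U{4,5}A{1,2}U")
HEXANUCLEOTIDE = re.compile(r"AAUAAA")


def to_rna(sequence):
    """Upper-case a sequence and convert DNA T to RNA U."""
    return sequence.upper().replace("T", "U")


def motif_prevalence(utrs, pattern):
    """Fraction of 3' UTR sequences with at least one match to pattern."""
    if not utrs:
        return 0.0
    return sum(bool(pattern.search(to_rna(u))) for u in utrs) / len(utrs)
```

Running `motif_prevalence` separately on the UTRs of each polysomal mRNA population would give per-stage percentages of the kind quoted above; the actual figures depend on the motif definition and UTR annotation used.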
|