1 |
A Parallel, High-Throughput Framework for Discovery of DNA Motifs
Kurz, Kyle W. 27 July 2010
No description available.
|
2 |
Mining DNA elements involved in targeting of chromatin modifiers
Philip, Philge January 2014
Background: In all higher organisms, nuclear DNA is condensed into nucleosomes, which consist of DNA wrapped around a core of highly conserved histone proteins. DNA bound to histones and other structural proteins forms chromatin. Generally, only a few regions of DNA are accessible, and most of the time RNA polymerase and other DNA-binding proteins have to overcome this compaction to initiate transcription. Several proteins are involved in making chromatin more compact or more open. Such chromatin-modifying proteins make distinct post-translational modifications of histones, especially in the histone tails, to alter their affinity for DNA.
Aim: The main aim of my thesis work is to study the targeting of chromatin modifiers important for correct gene expression in Drosophila melanogaster (fruit flies). Primary DNA sequences, chromatin-associated proteins, transcription, and non-coding RNAs are all likely to be involved in targeting mechanisms. This thesis work involves the development of new computational methods for identifying DNA motifs and protein factors involved in the targeting of chromatin modifiers. The targeting and function of two chromatin modifiers, the male-specific lethal (MSL) complex and CREB-binding protein (CBP), are studied specifically. The MSL complex is a protein complex that mediates dosage compensation in flies. CBP is a transcriptional co-regulator in metazoans with histone acetyltransferase activity, and it has been used to predict novel enhancers.
Results: My studies of the binding sites of the MSL complex show that promoters and coding sequences of MSL-bound genes on the X chromosome of Drosophila melanogaster can influence the spreading of the complex along the X chromosome. Analysis of MSL binding sites when the two non-coding roX RNAs are mutated shows that MSL-complex recruitment to high-affinity sites on the X chromosome is independent of roX, and that the role of the roX RNAs is to prevent binding to repeats at autosomal sites. Functional analysis of MSL-bound genes using their dosage compensation status shows that the function of the MSL complex is to enhance the expression of short housekeeping genes, but that MSL-independent mechanisms exist to achieve complete dosage compensation. Studies of the binding sites of CBP show that, in early embryos, Dorsal, in cooperation with GAGA factor (GAF) and factors such as Medea and Dichaete, targets CBP to its binding sites. In the S2 cell line, GAF is identified as the targeting factor of CBP at promoters and enhancers, and GAF and CBP together are found to induce high levels of polymerase II pausing at promoters. In another study using integrated data analysis, CBP binding sites could be classified into polycomb protein binding sites, repressed enhancers, insulator protein-bound regions, active promoters, and active enhancers, which suggests different potential roles for CBP. A new approach was also developed to eliminate technical bias in skewed experiments. Our study shows that, for skewed datasets, it is always better to identify non-altered variables and to normalize the data using only those variables.
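The closing point about skewed datasets can be illustrated with a small sketch. It is not the method developed in the thesis, which the abstract does not describe. It assumes the non-altered variables are already known (the hypothetical `invariant_idx`) and uses a median-of-ratios scale factor, which is a choice made here for illustration only.

```python
from statistics import median

def scale_factor(reference, sample, idx):
    """Median of reference/sample ratios over the chosen variables."""
    ratios = [reference[i] / sample[i] for i in idx if sample[i] > 0]
    return median(ratios)

def normalize(reference, sample, invariant_idx):
    """Rescale `sample` using only the variables assumed to be non-altered."""
    factor = scale_factor(reference, sample, invariant_idx)
    return [value * factor for value in sample]

# Toy data (invented): the sample was measured at twice the depth of the
# reference, and variables 0-2 are truly 4x higher; variables 3-5 are unchanged.
reference = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
sample = [80.0, 160.0, 240.0, 80.0, 100.0, 120.0]

# Factor estimated from all variables is pulled off by the altered half.
naive_factor = scale_factor(reference, sample, range(len(sample)))

# Factor estimated from the non-altered variables recovers the depth difference.
normalized = normalize(reference, sample, invariant_idx=[3, 4, 5])
```

In this toy case the factor computed from all six variables is distorted by the three altered ones, while the factor computed from the unchanged variables leaves the true fourfold change visible after scaling.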
|
3 |
Aplicação de métodos estatísticos e computacionais para o estudo da cis-regulação da expressão gênica / Application of computational and statistical methods for the study of cis-regulation of gene expression
Almeida, Marcio Augusto Afonso de 16 April 2010
Bioinformatics tools have become the method of choice for helping researchers both annotate new genes and study genes under physiological conditions of interest. Prominent among these tools are phylogenetic clustering algorithms and algorithms for predicting short DNA patterns, such as transcription factor binding sites. Developing a combined approach that groups genes based solely on the transcriptional signals predicted in their sequences is a difficult challenge. In the present study, we present our results in trying to overcome this limitation, which can be divided into two sections: in the first, we developed a procedure to improve the computational prediction of transcription factor binding sites; in the second, we grouped genes based solely on the transcriptional signals predicted in their conserved flanking sequences. The first section focused on an enhancer near the mouse Aldh1a2 gene, in which binding sites were predicted, later tested biologically, and shown to strongly influence the control of this gene's expression. Through a comprehensive literature survey, we identified a group of 57 transcription factors already associated with the specialization of neuron subpopulations during neuroembryonic development in vertebrates. Our computational procedure for selecting binding sites of high biological value was then applied to conserved flanking sequences of each of the genes encoding these transcription factors, and a large group of binding sites was predicted. These sites were counted and used as input for our clustering procedure. Analysis of the clustering results showed that our procedure was sensitive enough to construct a solution tree that agrees well with the already known expression patterns of the clustered genes. This procedure could be used both for the functional annotation of genes of interest and for data mining within a previously selected group of genes.
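The step of counting predicted sites and feeding the counts to a clustering procedure can be sketched as follows. The abstract does not say which clustering algorithm or distance was used, so average-linkage agglomerative clustering on Euclidean distance is an assumption made here, and the gene names and counts are invented for illustration.

```python
from itertools import combinations
from math import dist

def cluster(profiles):
    """Average-linkage agglomerative clustering; returns a nested-tuple tree."""
    # Each cluster is (subtree, list of member gene names).
    clusters = [(name, [name]) for name in profiles]

    def linkage(a, b):
        # Mean Euclidean distance over all cross-cluster pairs of genes.
        pairs = [(x, y) for x in a[1] for y in b[1]]
        return sum(dist(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two closest clusters into one subtree.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(((a[0], b[0]), a[1] + b[1]))
    return clusters[0][0]

# Hypothetical counts of predicted binding sites for three transcription
# factors in the conserved flanking sequence of each gene.
profiles = {
    "geneA": [5, 0, 1],
    "geneB": [4, 0, 2],
    "geneC": [0, 6, 0],
    "geneD": [1, 4, 0],
}
tree = cluster(profiles)
```

The resulting tree groups genes with similar site-count profiles, which is the kind of solution tree that could then be compared with known expression patterns.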
|
5 |
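Une nouvelle approche computationnelle pour la découverte des sites de fixation de facteurs de transcription à l’ADN, adaptée aux données de ChIP-chip et de ChIP-séquençage [A new computational approach for the discovery of transcription factor DNA binding sites, adapted to ChIP-chip and ChIP-sequencing data]
Aid, Malika 09 1900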
Transcription factors are specialized proteins that play an important role in various biological processes such as differentiation, the cell cycle and tumorigenesis. They regulate gene transcription by binding to specific DNA sequences (cis-regulatory elements). Identifying these elements is a crucial step in understanding gene regulatory networks. With the advent of high-throughput sequencing technologies, the identification of all functional elements in genomes, including genes and cis-regulatory elements, has advanced considerably. While the number of genes in different species can now be estimated, information on the elements that control and orchestrate the regulation of these genes remains poorly defined. ChIP-chip and ChIP-sequencing techniques make it possible to identify all regions of the genome bound by a transcription factor of interest. Several computational approaches have been developed to predict the sites bound by transcription factors. These approaches fall into two main categories: enumerative and probabilistic algorithms. However, several studies have shown that these approaches yield high rates of false negatives and false positives, which makes the results difficult to interpret and, consequently, difficult to validate experimentally.
This thesis pursued two objectives. The first was to develop a new approach for discovering transcription factor DNA binding sites (SAMD-ChIP), adapted to ChIP-chip and ChIP-sequencing data. Our approach implements a hybrid algorithm that combines the enumerative and probabilistic strategies in order to exploit the strengths of each. It performed well compared with existing motif discovery tools on simulated datasets and on ChIP-chip and ChIP-sequencing datasets. SAMD-ChIP also has the advantage of exploiting the distribution of transcription factor binding sites around the center of bound regions, restricting prediction to motifs that are enriched in a fixed-length window around the center of these regions.
Transcription factors rarely act alone. They often form complexes to interact with DNA and regulate their target genes. These interactions involve transcription factors whose DNA binding sites lie close to one another or are brought together by chromatin loops. Our second objective was to exploit the spatial proximity of transcription factor binding sites in ChIP-chip and ChIP-sequencing regions to develop an approach for predicting composite motifs (motifs composed of two sites separated by a spacer of fixed size). We tested this module by predicting the co-localization of the two ERE half-sites that form the ERE site bound by the estrogen receptor ERα. This module has been incorporated into our motif discovery tool, SAMD-ChIP. / Transcription factors (TFs) play important roles in various biological processes such as differentiation, cell cycle progression and tumorigenesis. They regulate gene expression by binding to specific DNA sequences (transcription factor binding sites, TFBS). Identifying these cis-regulatory elements is a crucial step in understanding gene regulatory networks. Technological developments have enabled DNA sequencing at genomic scale. On the basis of the resulting sequences, computational biologists now attempt to localize the most important functional regions, starting with genes but also, importantly, characterizing transcription factor binding sites genome-wide, which has led to the development of several computational DNA motif discovery tools.
Although these tools are widely used and have been successful at discovering novel motifs, they are not adapted to ChIP-chip and ChIP-sequencing data. Their main drawback is that most of the predicted motifs are artifacts resulting from an inefficient assessment of motif enrichment.
This thesis concerns transcription factors and the statistical analysis of their binding sites in ChIP-chip and ChIP-sequencing data. The first objective was to develop a new de novo DNA motif discovery tool adapted to ChIP-chip and ChIP-sequencing data. SAMD-ChIP combines enumerative and stochastic strategies to predict motifs enriched in the vicinity of ChIP peak summits. Our approach is an automated pipeline that includes motif discovery, motif clustering, motif optimization and, finally, motif identification using transcription factor (TF) databases. SAMD-ChIP outperforms state-of-the-art motif discovery tools in terms of the number of predicted motifs and the prediction of rare and degenerate motifs. In particular, SAMD-ChIP efficiently identifies gapped motifs, such as inverted or direct repeats bound by nuclear receptors, and composite motifs resulting from the association of different single TF binding sites.
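The idea of restricting prediction to motifs enriched near peak summits can be illustrated with a minimal enumerative sketch. This is not SAMD-ChIP: the function names, the toy sequences, the 6-mer `AGGTCA`, and the use of a simple "fraction of occurrences inside the window" score are all assumptions made for illustration.

```python
from collections import Counter

def count_kmers(seq, k):
    """Enumerate every overlapping k-mer in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def central_fraction(peaks, half_width, k):
    """For each k-mer, the fraction of its occurrences that fall inside
    a fixed window of 2 * half_width bases centred on the peak summit."""
    inside, total = Counter(), Counter()
    for seq, summit in peaks:
        start = max(0, summit - half_width)
        inside.update(count_kmers(seq[start:summit + half_width], k))
        total.update(count_kmers(seq, k))
    return {kmer: inside[kmer] / n for kmer, n in total.items()}

# Toy peak regions (invented): (sequence, summit position). Each carries the
# same 6-mer close to its summit, surrounded by unrelated flanking sequence.
peaks = [
    ("CCCCCAGGTCACCCCC", 8),
    ("TTTTTAGGTCATTTTT", 8),
    ("ACACAAGGTCAACACA", 8),
]
fractions = central_fraction(peaks, half_width=4, k=6)
```

A k-mer whose occurrences concentrate in the window scores near 1, whereas a k-mer spread uniformly over these 16-base regions would be expected to score about 3/11, the share of 6-mer start positions that lie inside the window. A real tool would replace this raw fraction with a proper statistical assessment of enrichment.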
The second objective rests on the assumption that, in regulatory regions, the binding sites of interacting transcription factors co-occur more often than expected by chance in the vicinity of ChIP peak summits. We proposed an approach to predict the co-localization of transcription factor binding sites, based either on single motifs predicted by de novo motif discovery tools or on TFBS models from TF databases.
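The co-occurrence assumption can be made concrete with a short sketch that compares the observed number of peaks containing both sites with the number expected if the two sites occurred independently. The exact-string matching, the two motifs, and the toy peaks are hypothetical. The thesis works with motif models, and its scoring is not described in the abstract.

```python
def has_site(seq, motif, summit, half_width):
    """True if the motif occurs within the window around the peak summit."""
    start = max(0, summit - half_width)
    return motif in seq[start:summit + half_width]

def cooccurrence(peaks, motif_a, motif_b, half_width):
    """Observed count of peaks with both sites, and the count expected
    if the two sites were distributed independently across peaks."""
    n = len(peaks)
    hits_a = [has_site(seq, motif_a, summit, half_width) for seq, summit in peaks]
    hits_b = [has_site(seq, motif_b, summit, half_width) for seq, summit in peaks]
    observed = sum(a and b for a, b in zip(hits_a, hits_b))
    expected = sum(hits_a) * sum(hits_b) / n
    return observed, expected

# Toy peaks (invented): two regions carry both motifs, two carry neither.
peaks = [
    ("CCAGGTCACCGATAAGCCCC", 10),
    ("TTAGGTCATTGATAAGTTTT", 10),
    ("CCCCCCCCCCCCCCCCCCCC", 10),
    ("TTTTTTTTTTTTTTTTTTTT", 10),
]
observed, expected = cooccurrence(peaks, "AGGTCA", "GATAAG", half_width=10)
```

An observed count well above the expected one hints at co-localization. Deciding whether the excess is more than chance would require a significance test on real data, which this sketch omits.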
|
6 |
Optimizing Biomarkers From an Ensemble Learning Pipeline
Kuntala, Prashant Kumar January 2017
No description available.
|