1 |
Elucidating transcription factor regulation by TCDD within the hs1,2 enhancer
Ochs, Sharon D. 13 April 2012
No description available.
|
2 |
The Role of Affinity and Arrangement of Transcription Factor Binding Sites in Determining Hox-regulated Gene Expression Patterns
Zandvakili, Arya 30 October 2018
No description available.
|
3 |
Combining Prior Information for the Prediction of Transcription Factor Binding Sites
Benner, Philipp 21 June 2018
Although every cell in an organism carries the same genetic information, cells can differ fundamentally in their function. The molecular basis for this functional diversity lies in the biochemical processes that regulate the expression of genes. Key to this regulatory process are proteins called transcription factors, which recognize and bind specific DNA sequences of a few nucleotides. Here we tackle the problem of identifying the binding sites of a given transcription factor. Predicting binding preferences from the structure of a transcription factor is still an unsolved problem. For that reason, binding sites are commonly identified by searching for overrepresented sites in a given collection of nucleotide sequences. Such sequences might be known regulatory regions of genes that are assumed to be coregulated, or they are obtained from so-called ChIP-seq experiments, which identify approximately the sites that were bound by a given transcription factor. In both cases, the observed nucleotide sequences are much longer than the actual binding sites, and computational tools are required to uncover the actual binding preferences of a factor. Because a transcription factor typically recognizes a family of similar sequences rather than a single one, the search for overrepresented patterns in a given collection of sequences has proven to be a challenging problem.
Most computational methods rely only on the given set of sequences, but additional information is required in order to make reliable predictions. Here, this information is obtained by looking at the evolution of nucleotide sequences. To this end, each nucleotide sequence in the observed data is augmented by its orthologs, i.e. sequences from related species where the same transcription factor is present. By constructing multiple sequence alignments of the orthologous sequences it is possible to identify functional regions that are under selective pressure and therefore appear more conserved than others. Processing the additional information provided by orthologous sequences relies on a phylogenetic tree equipped with a nucleotide substitution model that carries information not only about the ancestry, but also about the expected similarity of functional sites.
As a result, a Bayesian method for the identification of transcription factor binding sites is presented. The method relies on a phylogenetic tree that agrees with the assumptions of the nucleotide substitution process. Therefore, the problem of estimating phylogenetic trees is discussed first. The computation of point estimates relies on recent developments in Hadamard spaces. Second, the statistical model is presented that captures the enrichment and conservation of binding sites and other functional regions in the observed data. The performance of the method is evaluated on ChIP-seq data of transcription factors, where the binding preferences have been estimated in previous studies.
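The idea that functional regions appear more conserved across orthologous sequences can be illustrated with a minimal sketch. It is not the Bayesian, tree-based model of the thesis: it ignores the phylogeny and the substitution model entirely and simply scores each alignment column by its Shannon entropy. The function name `column_conservation` and the toy alignment are assumptions made for illustration.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Return one conservation score in [0, 1] per alignment column.

    Score = 1 - H / 2, where H is the Shannon entropy (in bits) of the
    nucleotide frequencies in the column. Gap characters ('-') are ignored.
    This is a phylogeny-free simplification, not the thesis model.
    """
    length = len(alignment[0])
    scores = []
    for i in range(length):
        column = [seq[i] for seq in alignment if seq[i] != "-"]
        if not column:
            scores.append(0.0)  # column made of gaps only
            continue
        total = len(column)
        entropy = -sum(
            (c / total) * math.log2(c / total) for c in Counter(column).values()
        )
        scores.append(1.0 - entropy / 2.0)  # 2 bits is the maximum for 4 letters
    return scores


# Hypothetical toy alignment of one region across four species.
toy_alignment = [
    "ACGTGACA",
    "ACGTGATA",
    "ACGTGCCA",
    "TCGTGGAA",
]
scores = column_conservation(toy_alignment)
```

Columns with a score close to 1 are candidates for conserved, possibly functional positions. A tree-aware model would additionally weight species by their evolutionary distance, which this sketch does not attempt.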
|
4 |
An integrated genomic approach for the identification and analysis of single nucleotide polymorphisms that affect cancer in humans
Repapi, Emmanouela January 2013
The identification of genetic variants such as single nucleotide polymorphisms (SNPs), which affect cancer progression, survival and response to treatments could help in the design of better prevention and treatment strategies. Genome-wide association studies (GWAS) have provided the first step of identifying SNPs associating with cancer risk. However, identifying the causal SNPs responsible for the associations has proven challenging, and GWAS have not been successful for time-to-event phenotypes such as cancer progression, due to the insurmountable obstacle of the large sample size needed. The aim of this thesis is to design and implement strategies that combine the identification of SNPs significantly associated with cancer, focusing on time-to-event phenotypes, with detailed bioinformatics analysis to allow for further experimental validation and modelling, to better understand cancer-associated genomic loci and accelerate their incorporation into the clinic. First, a methodology that utilises the Random Survival Forest is developed and combined with a bioinformatics analysis that ranks SNPs according to their potential to result in differential protein levels or activity, in order to identify SNPs that affect the progression of B-cell chronic lymphocytic leukaemia. Next, an analysis that aims to extend our understanding of the role of SNPs in mediating the cellular responses to chemotherapeutic agents is applied. SNPs that could associate with differential cellular growth responses in cancer cell line panels are identified, and their association with the differential survival of cancer patients is explored. Finally, the potential roles of SNPs in affecting the transcriptional regulation of key cancer genes resulting in differential cancer risk are assessed. 
This is done first by focusing on SNPs in an important transcription factor binding motif that has been shown to be extremely sensitive to single base pair changes (the E-box), and then by exploring the possibility that polymorphic transcription factor binding sites could underlie the significant associations noted in cancer GWAS.
|
5 |
Phenotype-related regulatory element and transcription factor identification via phylogeny-aware discriminative sequence motif scoring
Langer, Björn 18 September 2018
Understanding the connection between an organism’s genotype and its phenotype is a key question in evolutionary biology and genetics. It has been shown that many changes of morphological or other complex phenotypic traits result from changes in the expression pattern of key developmental genes rather than from changes in the genes themselves. Such altered gene expression often arises from changes in the gene regulatory regions. That usually means the loss of important transcription factor (TF) binding sites within these regulatory regions, because the interaction between TFs and specific sites on the DNA is a key element of gene regulation.
An established approach for the genome-wide mapping of genomic regions to phenotypes is the Forward Genomics framework. This approach compares the genomic sequences of species with and without the phenotype of interest, based upon two ideas. First, the initial loss of a phenotype relaxes selection on all phenotypically related genomic regions and, second, this can happen independently in multiple species. Of interest are regions that diverged specifically in phenotype-loss species. Although this principle is general, the current implementation is only well suited for the identification of phenotype-related gene-coding regions and has limited applicability to regulatory regions. The reason is its reliance on sequence conservation as a divergence measure, which does not accurately measure functional divergence of regulatory elements.
In this thesis, I developed REforge, a novel implementation of the Forward Genomics principle that takes functional information about regulatory elements into account, in the form of known phenotype-related TFs. Considering the flexible organization of TF binding sites within a regulatory region, both in terms of strength and order, allows abstraction from the region’s sequence level to its functional level. Thus, functional divergence of regulatory regions is directly compared to phenotypic divergence, which tremendously improves performance compared to Forward Genomics, as I demonstrated on synthetic and real data.
Additionally, I developed TFforge, which follows the same approach but aims at identifying the TFs relevant for the given phenotype. Given a multi-species alignment with a phenotype annotation and a set of regulatory regions, TFforge systematically searches for TFs whose changes in binding affinity between species fit the phenotype signature. The reported output is a ranking of the TFs according to their level of correspondence. I provide a proof of concept for this approach on both biological data and artificially generated regions. TFforge can be used as a standalone analysis tool and also to generate the input set of TFs for a subsequent REforge analysis. I demonstrate that REforge in combination with TFforge is able to substantially outperform standard Forward Genomics, even without prior knowledge of relevant TFs.
Overall, the methods introduced in this thesis exemplify the power of computational tools in comparative genomics to catalyze biological insights. In addition to describing the methods in detail, I validated them through an analysis of real data. REforge and TFforge are applicable to a wide range of phenotypes, each on its own for associating TFs or regulatory regions with a phenotype. Moreover, their combination constitutes a valuable tool set for evo-devo studies, particularly with respect to gene regulatory network analyses.
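The TF ranking step described above, searching for TFs whose binding affinity across species fits the phenotype signature, can be sketched roughly as follows. This is only an illustration under strong assumptions: it uses a plain Pearson correlation between per-species binding scores and a 0/1 phenotype label and ignores the phylogeny, which the actual method accounts for. The names `rank_tfs`, `TF_A`, `TF_B` and all scores are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_tfs(binding_scores, phenotype):
    """Rank TFs by how well their per-species binding score tracks the phenotype.

    binding_scores: {tf_name: {species: score}} (hypothetical affinity scores)
    phenotype:      {species: 1 if the trait is present, 0 if it was lost}
    """
    species = sorted(phenotype)
    labels = [phenotype[s] for s in species]
    ranking = [
        (tf, pearson([by_species[s] for s in species], labels))
        for tf, by_species in binding_scores.items()
    ]
    return sorted(ranking, key=lambda item: item[1], reverse=True)


# Hypothetical data: TF_A binding is lost exactly in the phenotype-loss species.
phenotype = {"sp1": 1, "sp2": 1, "sp3": 0, "sp4": 0}
binding_scores = {
    "TF_A": {"sp1": 9.0, "sp2": 8.5, "sp3": 2.0, "sp4": 1.5},
    "TF_B": {"sp1": 5.0, "sp2": 4.0, "sp3": 5.5, "sp4": 4.5},
}
ranking = rank_tfs(binding_scores, phenotype)
```

In this toy input, `TF_A` ranks first because its scores drop in the two species that lost the trait. A phylogeny-aware method would also correct for the fact that closely related species are not independent observations.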
|
6 |
Ferramenta de bioinformática para integrar e compreender as mudanças epigenômicas e genômicas aberrantes associadas com câncer: métodos, desenvolvimento e análise / Bioinformatic tool to integrate and understand aberrant epigenomic and genomic changes associated with cancer: Methods, development and analysis
Silva, Tiago Chedraoui 01 February 2018
Cancer, which is one of the major causes of mortality worldwide, is a complex disease orchestrated by aberrant genomic and epigenomic changes that can modify gene regulatory circuits and cellular identity. Emerging evidence obtained through high-throughput genomic data deposited within the public TCGA international consortium suggests that one in ten cancer patients would be more accurately classified by molecular taxonomy than by histology. Therefore, we have hypothesized that establishing genome-wide maps of the localization of de novo DNA binding motifs, coupled with differentially methylated regions and gene expression changes, might help to characterize and exploit cancer phenotypes at the molecular level. Technological advances and public databases like The Cancer Genome Atlas (TCGA), The Encyclopedia of DNA Elements (ENCODE), and The NIH Roadmap Epigenomics Mapping Consortium (Roadmap) have provided unprecedented opportunities to interrogate the epigenome of cultured cancer cell lines as well as normal and tumor tissues with high resolution. However, biological information is stored in different formats and there is currently no tool to integrate the data, highlighting an urgent need to develop bioinformatics tools and software to overcome this challenge. In this context, the main purpose of this study is the development of bioinformatics tools in the R programming language that will be submitted to the larger open-source Bioconductor community project under the GNU GPL3 (General Public License version 3). Also, we will help our collaborators improve the R/Bioconductor ELMER package, which identifies regulatory enhancers using gene expression, DNA methylation data and motif analysis. Our expectation is that these tools can effectively automate the search, retrieval, and analysis of the vast (epi)genomic data currently available from TCGA, ENCODE, and Roadmap, and integrate genomic and epigenomic features with researchers' own high-throughput data. Furthermore, we will also navigate through these data manually in order to validate the capacity of these tools to discover epigenomic signatures able to redefine subtypes of cancer. Finally, we will use them to investigate the molecular differences between two subgroups of gliomas, one of the most aggressive primary brain cancers, recently discovered by our laboratory.
|
7 |
Predição computacional de sítios de ligação de fatores de transcrição baseada em gramáticas regulares estocásticas / Computational prediction of transcription factor binding sites based on stochastic regular grammars
Ferrão Neto, Antonio 27 October 2017
Transcription factors (TFs) are proteins that bind to specific and well-conserved sequences of nucleotides in the DNA, called transcription factor binding sites (TFBS), located in regions of gene regulation known as cis-regulatory modules (CRM). On recognizing a TFBS, the transcription factor binds to that site and positively or negatively influences gene transcription. There are experimental procedures for the identification of TFBS in a genome, such as footprinting, ChIP-chip or ChIP-seq. However, carrying out these techniques is costly and time-consuming. Alternatively, one may take the TFBS sequences already known for a particular transcription factor and apply supervised machine learning techniques to create a computational model for that site and then perform computational prediction in the genome. However, most existing software tools for this purpose assume independence between nucleotide positions in the site - such as those based on PWMs (position weight matrices) - which is not necessarily true. This project aimed to evaluate the use of stochastic regular grammars (SRGs) as an alternative technique to PWMs for this problem, since SRGs are able to characterize dependencies between consecutive positions in the sites. Although differences in performance were subtle, SRGs appear to be more suitable than PWMs in the presence of higher base dependency values, and PWMs in the other cases. Finally, a computational TFBS prediction tool was created based on both SRGs and PWMs.
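The contrast between position-independent PWMs and models that capture dependencies between consecutive positions can be made concrete with a small sketch. It does not implement the thesis's stochastic regular grammars: as a stand-in, it uses a position-specific first-order Markov model, which is one simple way to represent adjacent-position dependencies. The training sites, function names and pseudocount value are assumptions for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def train_pwm(sites, pseudocount=1.0):
    """Per-position nucleotide probabilities; positions are treated as independent."""
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        pwm.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return pwm


def pwm_log_prob(pwm, seq):
    """Log-probability of seq under the independent-position model."""
    return sum(math.log(pwm[i][b]) for i, b in enumerate(seq))


def train_first_order(sites, pseudocount=1.0):
    """Position-specific P(base_i | base_{i-1}), a stand-in for adjacent dependencies."""
    first = train_pwm(sites, pseudocount)[0]
    transitions = []
    for i in range(1, len(sites[0])):
        pair_counts = Counter((s[i - 1], s[i]) for s in sites)
        prev_counts = Counter(s[i - 1] for s in sites)
        table = {}
        for prev in ALPHABET:
            total = prev_counts[prev] + pseudocount * len(ALPHABET)
            for cur in ALPHABET:
                table[(prev, cur)] = (pair_counts[(prev, cur)] + pseudocount) / total
        transitions.append(table)
    return first, transitions


def first_order_log_prob(model, seq):
    """Log-probability of seq under the first-order model."""
    first, transitions = model
    logp = math.log(first[seq[0]])
    for i in range(1, len(seq)):
        logp += math.log(transitions[i - 1][(seq[i - 1], seq[i])])
    return logp


# Hypothetical sites in which positions 1 and 2 co-vary: "AC.." or "TG..".
sites = ["ACGT", "ACGT", "ACGT", "TGGT", "TGGT", "TGGT"]
pwm = train_pwm(sites)
markov = train_first_order(sites)
```

With these toy sites, the PWM cannot tell the observed site `ACGT` from the never-observed mixture `AGGT`, because A and T are equally frequent at the first position and C and G at the second. The first-order model assigns `AGGT` a lower probability, which is the kind of dependency the abstract says PWMs miss.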
|
8 |
A functional genomics approach to map transcriptional and post-transcriptional gene regulatory networks
Bhinge, Akshay Anant 15 October 2009
It has been suggested that organismal complexity correlates with the complexity
of gene regulation. Transcriptional control of gene expression is mediated by binding of
regulatory proteins to cis-acting sequences on the genome. Hence, it is crucial to identify
the chromosomal targets of transcription factors (TFs) to delineate transcriptional
regulatory networks underlying gene expression programs. The development of ChIP-chip
technology has enabled high throughput mapping of TF binding sites across the
genome. However, there are many limitations to the technology including the availability
of whole genome arrays for complex organisms such as human or mouse. To circumvent
these limitations, we developed the Sequence Tag Analysis of Genomic Enrichment
(STAGE) methodology that is based on extracting short DNA sequences or “tags” from
ChIP-enriched DNA. With improvements in sequencing technologies, we applied the
recently developed ChIP-Seq technique i.e. ChIP followed by ultra high throughput
sequencing, to identify binding sites for the TF E2F4 across the human genome. We identified previously uncharacterized E2F4 binding sites in intergenic regions and found
that several microRNAs are potential E2F4 targets.
Binding of TFs to their respective chromosomal targets requires access of the TF
to its regulatory element, which is strongly influenced by nucleosomal remodeling. In
order to understand nucleosome remodeling in response to transcriptional perturbation,
we used ultra high throughput sequencing to map nucleosome positions in yeast that were
subjected to heat shock or were grown normally. We generated nucleosome remodeling
profiles across yeast promoters and found that specific remodeling patterns correlate with
specific TFs active during the transcriptional reprogramming.
Another important aspect of gene regulation operates at the post-transcriptional
level. MicroRNAs (miRNAs) are ~22 nucleotide non-coding RNAs that suppress
translation or mark mRNAs for degradation. MiRNAs regulate TFs and in turn can be
regulated by TFs. We characterized a TF-miRNA network involving the oncofactor Myc
and the miRNA miR-22 that suppresses the interferon pathway as primary fibroblasts
enter a stage of rapid proliferation. We found that miR-22 suppresses the interferon
pathway by inhibiting nuclear translocation of the TF NF-kappaB. Our results show how
the oncogenic TF Myc cross-talks with other TF regulatory pathways via a miRNA intermediary.
|
10 |
Análise in silico de regiões promotoras de genes de Xylella fastidiosa / In silico analysis on promoter sequences of protein-coding genes from Xylella fastidiosa
Tria, Fernando Domingues Kümmel 24 June 2013
Xylella fastidiosa is a gram-negative, non-flagellated bacterium that causes economically important diseases such as Pierce's disease in grapevines and citrus variegated chlorosis (CVC) in sweet orange trees. In the present work we performed in silico analyses of promoter sequences of protein-coding genes from this phytopathogen, including those involved in virulence and pathogenicity mechanisms, in an attempt to better understand the underlying transcriptional regulatory dynamics. Two strategies for predicting cis-regulatory elements were applied to promoter sequences from the genome of strain 9a5c, a proven causal agent of CVC. The first, known as phylogenetic footprinting, involved the prediction of regulatory motifs conserved in promoter sequences of orthologous transcription units from X. fastidiosa and a set of seven comparison species. The criterion for identifying orthologous transcription units, i.e. those from different species whose promoter sequences share at least one common regulatory motif, was studied using regulatory information available for the model organisms Pseudomonas aeruginosa, Bacillus subtilis and Escherichia coli. The phylogenetic footprinting analysis gave us genome-wide access to the transcriptional regulatory network of the species, with a total of 2990 regulatory interactions corresponding to 80 predicted motifs distributed over the promoter sequences of 56.8% of all transcription units. In the second strategy, experimentally validated regulatory information from E. coli was recovered and used, through a scanning procedure, to expand the knowledge of ten regulons in X. fastidiosa; some of these regulatory interactions had previously been described by independent studies. We highlight the regulons of Fur and CRP, two global transcriptional regulators, which include genes related to host invasion and colonization. Lastly, comparative analyses of corresponding regulatory regions among strains were performed, and differences possibly associated with phenotypic variation were identified between 9a5c and J1a12, a non-virulent strain isolated from orange trees, and between 9a5c and Temecula1, a strain associated with Pierce's disease in grapevines.
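The scanning strategy mentioned above, taking a motif known from one organism and searching promoter sequences of another for matches, might look roughly like the following sketch. The motif, the promoter sequence, the uniform background and the threshold are all hypothetical, and only the forward strand is scanned. It is not the pipeline used in the thesis.

```python
import math

# Assumed uniform background nucleotide frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds_matrix(prob_matrix, background=BACKGROUND):
    """Convert per-position probabilities into log2 odds against the background."""
    return [
        {b: math.log2(col[b] / background[b]) for b in col} for col in prob_matrix
    ]


def scan(sequence, matrix, threshold):
    """Slide the matrix along the sequence; return (start, window, score) hits.

    Expects an upper-case sequence containing only A, C, G and T.
    """
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][b] for i, b in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Hypothetical 4-bp motif that strongly favours "TGAC" (not a real regulon motif).
motif_probs = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
]
matrix = log_odds_matrix(motif_probs)
promoter = "CCATGACGGTTGACAA"  # hypothetical promoter fragment
hits = scan(promoter, matrix, threshold=6.0)
```

Each hit would correspond to a candidate regulatory interaction between the regulator and the gene downstream of the promoter. Real motifs are longer and the threshold is usually calibrated against background sequences, neither of which is modelled here.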
|