Global ETD Search

1	On diverse biophysical aspects of genetics: from the action of regulators to the characterization of transcripts
Fouquier D'Hérouel, Aymeric, January 2011
Genetics is among the most rewarding fields of biology for the theoretically inclined, offering both room and need for modeling approaches in light of an abundance of experimental data of different kinds. Many aspects of the field are today understood in terms of physical and chemical models, joined by information-theoretical descriptions. This thesis discusses different mechanisms and phenomena related to genetics, employing tools from statistical physics along with experimental biomolecular methods. Five articles support this work. Two articles deal with interactions between proteins and DNA. The first reports on the properties of non-specific binding of transcription factor proteins in the yeast Saccharomyces cerevisiae, which arises from an effective background free energy that describes the affinity of a single protein for random locations on DNA. We argue that a background pool of non-specific binding sites is filled up before specific binding sites can be occupied with high probability, thus presenting a natural filter for genetic responses to spurious transcription factor production. The second article describes an algorithm for the inference of transcription factor binding sites using a realistic physical model. The functionality of the method is verified on a set of known binding sequences for Escherichia coli transcription factors. The third article describes a possible genetic feedback mechanism between human cells and the ubiquitous Epstein-Barr virus (EBV). Forty binding regions for the major EBV transcription factor EBNA1 are identified in human DNA. Several of these are located near genes of particular relevance in the context of EBV infection, and the most interesting ones are discussed. The fourth article describes results obtained from a positional autocorrelation analysis of the human genome, a simple technique to visualize and classify sequence repeats, which constitute large parts of eukaryotic genomes. Applying this analysis to genome sequences from which previously known repeats have been removed gives rise to signals corroborating the existence of as yet unclassified repeats of surprisingly long periods. The fifth article combines computational predictions with a novel molecular biological method based on the rapid amplification of cDNA ends (RACE), coined 5’tagRACE. The first search for non-coding RNAs (ncRNAs) encoded in the genome of the opportunistic bacterium Enterococcus faecalis is performed here. Applying 5’tagRACE allows us to discover and map 29 novel ncRNAs, 10 putative novel mRNAs and 16 antisense transcriptional organizations. Further studies, which are not included as articles, are outlined in the central chapters: the monitoring of secondary structure formation of nucleic acids during thermal renaturation, and the inference of genetic couplings of various kinds from massive gene expression data and computational predictions. / QC 20110316
Keywords: transcription regulation; regulatory motifs; binding affinity; genetic interactions; secondary structure; sequence repeats; transcript characterization; Biological physics
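The positional autocorrelation analysis mentioned in record 1 can be illustrated with a toy calculation. The sketch below is not taken from the thesis; it assumes one simple definition (the fraction of positions whose base matches the base a fixed lag downstream), and the function name and the test sequence are hypothetical. Under that assumption, a repeat of period p shows up as peaks at lags that are multiples of p.

```python
def positional_autocorrelation(seq, max_lag):
    """Return {lag: fraction of positions i with seq[i] == seq[i + lag]}.

    Assumed definition for illustration only; the thesis may use a
    different normalization or a per-base decomposition.
    """
    seq = seq.upper()
    n = len(seq)
    result = {}
    for lag in range(1, max_lag + 1):
        pairs = n - lag  # number of (i, i + lag) pairs available
        if pairs <= 0:
            break
        matches = sum(1 for i in range(pairs) if seq[i] == seq[i + lag])
        result[lag] = matches / pairs
    return result


# Hypothetical sequence built from a 5-base unit repeated 20 times.
toy = "ACGTT" * 20
ac = positional_autocorrelation(toy, 10)
for lag, value in ac.items():
    print(lag, round(value, 3))
```

On real genomic sequence the signal would sit on a composition-dependent baseline, so a practical analysis would compare each lag against that baseline rather than against zero.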
2	Análise in silico de regiões promotoras de genes de Xylella fastidiosa / In silico analysis on promoter sequences of protein-coding genes from Xylella fastidiosa
Tria, Fernando Domingues Kümmel, 24 June 2013
Xylella fastidiosa is a gram-negative, non-flagellated bacterium responsible for economically important diseases such as Pierce's disease in grapevines and Citrus Variegated Chlorosis (CVC) in sweet orange trees. In the present work we performed in silico analyses of promoter sequences of protein-coding genes from this phytopathogen, including those involved in virulence and pathogenicity mechanisms, in an attempt to better understand the underlying dynamics of transcriptional regulation. Two strategies for the prediction of cis-regulatory elements were applied to promoter sequences from the genome of the reference strain 9a5c, a proven causal agent of CVC. The first, known as phylogenetic footprinting, involved the prediction of regulatory motifs conserved in promoter sequences of orthologous transcription units from X. fastidiosa and a set of seven comparison species. The criterion for identifying orthologous transcription units, i.e., transcription units from different species whose promoter sequences share at least one common regulatory motif, was studied in parallel using regulatory information available for the model organisms Pseudomonas aeruginosa, Bacillus subtilis and Escherichia coli. The results of the phylogenetic footprinting analysis gave us a comprehensive (genome-wide) view of the transcriptional regulatory network of the species, with a total of 2990 regulatory interactions corresponding to 80 predicted motifs distributed over the promoter sequences of 56.8% of all transcription units. In the second strategy, experimentally validated regulatory information from E. coli was retrieved and used, through a scanning process, to expand the knowledge of ten regulons in X. fastidiosa; some of the resulting regulatory interactions had previously been described by independent studies. We highlight the regulons of Fur and CRP, two global transcriptional regulators, which were found to modulate genes related to invasion and colonization of the plant host, among others. Lastly, comparative analyses of corresponding regulatory regions among strains were performed, and differences possibly associated with phenotypic variation were identified between 9a5c and J1a12, a non-virulent strain isolated from orange trees, and between 9a5c and Temecula1, a grapevine isolate that causes Pierce's disease.
Keywords: Citrus Variegated Chlorosis; Pierce's disease; plant pathogen; promoters; regulatory motifs; regulon; Xylella fastidiosa
4	Caractérisation systématique des motifs de régulation en cis à l’échelle transcriptomique et liens avec la localisation des ARN / Systematic characterization of cis-regulatory motifs at the transcriptomic scale and links with RNA localization
Benoit Bouvrette, Louis Philip, 04 1900
The subcellular localization of RNA allows a rapid and spatially restricted deployment of protein and noncoding RNA activities. The trafficking of RNA is directed by sequence elements (primary subsequences, secondary structures), also called regulatory motifs, present in cis within the RNA molecule. These motifs are recognized by RNA-binding proteins that mediate the transport of transcripts to specific sites in the cell. Recent studies in the Drosophila embryo indicate that the majority of RNAs display an asymmetric subcellular localization, suggesting the existence of a complex "localization code". However, this may represent an exceptional example, and the question remained, until now, whether a comparable prevalence of RNA localization is observable in standard cells grown in culture. In addition, readily available information about the topological distribution of motif instances across full transcriptomes has hitherto been lacking. To obtain a broad overview of the extent of RNA localization and the properties involved, we subjected Drosophila (D17) and human (HepG2) cells to biochemical fractionation to isolate the nuclear, cytosolic, membrane and insoluble fractions. We then performed deep sequencing on the extracted RNA and analyzed the proteins extracted from these fractions by mass spectrometry. We named this method CeFra-Seq. Through bioinformatics analyses, I then profiled the enrichment of various RNA biotypes (e.g. messenger RNA, long noncoding RNA, circular RNA) and proteins within the subcellular fractions. This revealed a high prevalence of asymmetric distribution among both coding and noncoding RNA species. An analysis of orthologous genes between fly and human also showed strong similarities, suggesting that the localization process is evolutionarily conserved. In addition, I observed distinct attributes (e.g. transcript size) among fraction-specific messenger RNA populations. Finally, I observed specific correlations and anti-correlations between defined groups of messenger RNAs and the proteins they encode. To study motif topology and conservation, I created oRNAment, a database of putative instances of RNA-binding protein binding sites in coding and noncoding RNAs. Using protein binding motifs derived from RNAcompete and RNA Bind-n-Seq experiments, I developed an algorithm allowing the rapid identification of potential instances of these motifs in a complete transcriptome. I was able to catalog the instances of 453 motifs from 223 RNA-binding proteins for 525,718 transcripts in five species. The results were validated by comparing them with public eCLIP data. I then used oRNAment to analyze in detail the topological aspects of these motif instances and their relative evolutionary conservation. This showed that most motifs are distributed in a similar fashion between species. In addition, I detected commonalities between subgroups of proteins that preferentially bind distinct biotypes or specific RNA regions. The presence or absence of such patterns between species likely reflects the importance of their functions. Moreover, a more detailed analysis of the position of a motif among comparable transcriptomic regions in vertebrates suggests a syntenic conservation, to varying degrees, in all RNA biotypes. The regional topology of certain repeated motif instances also appears to be evolutionarily conserved and may be important for adequate binding of the protein. Finally, the results compiled with oRNAment allowed us to postulate a potential new role for the long noncoding RNA HELLPAR as an RNA-binding protein sponge. The systematic characterization of localized RNAs and cis-regulatory motifs presented in this thesis demonstrates how the integration of information at the transcriptomic scale enables the assessment of the prevalence of asymmetry, the distinct characteristics and the evolutionary conservation of collections of RNAs.
Keywords: RNA localization; Post-transcriptional regulation; Transcriptomics; Messenger RNA; Noncoding RNA; RNA-binding protein; Cis-regulatory motifs; Subcellular fractionation; RNA sequencing; Evolutionary conservation
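Record 4 describes identifying putative instances of protein binding motifs across a transcriptome. As a rough illustration of the general idea, and not the oRNAment algorithm itself, the sketch below scores every window of an RNA sequence against a position weight matrix and keeps windows above a threshold. The motif frequencies, the uniform background, the threshold and the function names are all hypothetical choices made for this example.

```python
import math


def log_odds_matrix(pfm, background=0.25):
    """Convert per-position base frequencies into log2-odds scores.

    Assumes a uniform background; a real analysis would likely use
    transcriptome-specific base frequencies.
    """
    return [{b: math.log2(col[b] / background) for b in "ACGU"} for col in pfm]


def scan(seq, pwm, threshold):
    """Return (start, window, score) for each window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in "ACGU" for b in window):
            continue  # skip windows containing ambiguous bases
        score = sum(pwm[j][b] for j, b in enumerate(window))
        if score >= threshold:
            hits.append((i, window, score))
    return hits


# Hypothetical 4-nt motif that favors "UGUA" (not taken from any dataset).
pfm = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "U": 0.85},
    {"A": 0.05, "C": 0.05, "G": 0.85, "U": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "U": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "U": 0.05},
]
pwm = log_odds_matrix(pfm)
hits = scan("AAUGUACCUGUAGG", pwm, threshold=4.0)
for start, window, score in hits:
    print(start, window, round(score, 2))
```

A window-by-window scan like this is linear in sequence length per motif; cataloging hundreds of motifs over hundreds of thousands of transcripts, as the abstract reports, would presumably call for a faster matching strategy than this sketch uses.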
