Global ETD Search

31	Flexible and Data-Driven Modeling of 3D Protein Complex Structures Charles W Christoffer (17482395) 30 November 2023 (has links) <p dir="ltr">Proteins and their interactions with each other, with nucleic acids, and with other molecules are foundational to all known forms of life. The three-dimensional structures of these interactions are an essential component of a comprehensive understanding of how they function. Molecular-biological hypothesis formulation and rational drug design are both often predicated on a particular structure model of the molecule or complex of interest. While experimental methods capable of determining atomic-detail structures of molecules and complexes exist, such as the popular X-ray crystallography and cryo-electron microscopy, these methods require both laborious sample preparation and expensive instruments with limited throughput. Computational methods of predicting complex structures are therefore desirable if they can enable cheap, high-throughput virtual screening of the space of biological hypotheses. Many common biomolecular contexts have largely been blind spots for predictive modeling of complex structures. In this direction, docking methods are proposed to address extreme conformational change, nonuniform environments, and distance-geometric priors. Flex-LZerD deforms a flexible protein using a novel fitting procedure based on iterated normal mode decomposition and was shown to construct accurate complex models even when an initial input subunit structure exhibits extreme conformational differences from its bound state. Mem-LZerD efficiently constrains the docking search space by augmenting the geometric hashing data structure at the core of the LZerD algorithm and enabled membrane protein complexes to be efficiently and accurately modeled. 
Finally, atomic distance-based approaches developed during modeling competitions and collaborations with wet lab biologists were shown to effectively integrate domain knowledge into complex modeling pipelines.</p> Bioinformatic methods development Protein-protein docking Flexible docking Membrane protein docking algorithms normal mode analysis (NMA)
32	HIGHLY ACCURATE MACROMOLECULAR STRUCTURE COMPLEX DETECTION, DETERMINATION AND EVALUATION BY DEEP LEARNING Xiao Wang (17405185) 17 November 2023 (has links) <p dir="ltr">In life sciences, the determination of macromolecular structures and their functions, particularly proteins and protein complexes, is of paramount importance, as these molecules play critical roles within cells. The specific physical interactions of macromolecules govern molecular and cellular functions, making the 3D structure elucidation of these entities essential for comprehending the mechanisms underlying life processes, diseases, and drug discovery. Cryo-electron microscopy (cryo-EM) has emerged as a promising experimental technique for obtaining 3D macromolecular structures. In the course of my research, I proposed CryoREAD, an innovative AI-based method for <i>de novo</i> DNA/RNA structure modeling. This novel approach represents the first fully automated solution for DNA/RNA structure modeling from cryo-EM maps at near-atomic resolution. However, as the resolution decreases, structure modeling becomes significantly more challenging. To address this challenge, I introduced Emap2sec+, a 3D deep convolutional neural network designed to identify protein secondary structures, RNA, and DNA information from cryo-EM maps at intermediate resolutions ranging from 5-10 Å. Additionally, I presented Alpha-EM-Multimer, a groundbreaking method for automatically building full protein complexes from cryo-EM maps at intermediate resolution. Alpha-EM-Multimer employs a diffusion model to trace the protein backbone and subsequently fits the AlphaFold predicted single-chain structure to construct the complete protein complex. Notably, this method stands as the first to enable the modeling of protein complexes with more than 10,000 residues for cryo-EM maps at intermediate resolution, achieving an average TM-Score of predicted protein complexes above 0.8, which closely approximates the native structure. 
Furthermore, I addressed the recognition of local structural errors in predicted and experimental protein structures by proposing DAQ, an evaluation approach for experimental protein structure quality that utilizes detection probabilities derived from cryo-EM maps via a pretrained multi-task neural network. In the pursuit of evaluating protein complexes generated through computational methods, I developed GNN-DOVE and DOVE, leveraging convolutional neural networks and graph neural networks to assess the accuracy of predicted protein complex structures. These advancements in cryo-EM-based structural modeling and evaluation methodologies hold significant promise for advancing our understanding of complex macromolecular systems and their biological implications.</p> Bioinformatic methods development Deep learning macromolecular structure determination cryo-electron microscopy images deep learning
33	The Majority of the Diaphragm Immune Transcriptome Profile Rescued in Mdx Mice by Microdystrophin Gene Therapy was maintained by Voluntary Wheel Running Yuan, Zeyu 09 February 2023 (has links) The purpose of this thesis project was to elucidate the immune transcriptomic changes in the diaphragm of mdx mice treated with microdystrophin gene therapy with and without running wheel activity. Mdx mice are a model of Duchenne Muscular Dystrophy (DMD). Similar to DMD, mdx pathophysiology is associated with chronic inflammation due to sarcolemma fragility and cellular membrane leakage. Immune modulation has not yet been described when endurance exercise and AAV-microdystrophin gene therapy have been combined in mdx mice. An increase of physical activity in DMD individuals is a potential outcome of current clinical studies investigating microdystrophin treatment; therefore, understanding the impacts of physical activity on the immune system, particularly for the diaphragm, may be important to minimize risk. Recently, the Grange lab published the endurance and contractile property outcomes of combined microdystrophin gene therapy and running wheel activity in mdx mice.1 Diaphragm RNA-seq transcriptomic data were also collected from this study for gene expression analysis. Using this dataset, I tested the hypothesis that relative to mdxGT (mdx mice treated with gene therapy), transcripts related to the immune response such as immune cell recruitment, activation, and downstream signals that promote fibrosis deposition were unchanged or downregulated in mdxRGT (mdx mice treated with gene therapy and access to running wheel). DEGs (differentially expressed genes) were analyzed with Microsoft Excel, R, and bioinformatic tools such as KEGG and DAVID to explain immune system adaptations in response to combined microdystrophin treatment and running in mdx mice. 
Two major inflammatory signaling pathways that are translationally relevant to DMD patients, the IL-6/JAK/STAT and NF-κB pathways, were rescued by gene therapy towards WT expression levels. Although running maintained the majority of the rescued transcriptome profile (691 of 724 genes), the expression of some immune response-related genes (33 of 724) was modulated, including genes related to chemotaxis and cellular migration. These changes suggested potential signaling for angiogenesis and a fast-to-slow fiber type shift; however, unbiased analysis with bioinformatic tools did not confirm either of these possibilities. The data from this study revealed that inflammatory and fibrotic signaling pathways commonly observed in DMD patients and mdx mice were rescued by the AAV microdystrophin gene therapy and were maintained by voluntary wheel running. / Master of Science / Duchenne Muscular Dystrophy (DMD) is an X chromosome-linked muscular dystrophy, a genetic disease that affects around 1 in 14,000 boys globally. DMD is lethal and there is currently no cure. Mutations in the DMD gene result in the absence of the protein dystrophin. Dystrophin and the proteins associated with it provide structural support to the skeletal muscle membrane. Without dystrophin, muscles are more easily damaged during contraction. This damage promotes the recruitment of immune cells, which initiates the first stage of muscle repair. Under normal circumstances, this inflammatory reaction restores the skeletal muscles. However, in DMD patients, repeated breakdown and regeneration of skeletal muscle leads to abnormal inflammation, which promotes negative outcomes such as increased fibrosis. Fibrosis impairs muscle function, especially in the diaphragm. 
Hamm et al., 2021 from the Grange lab investigated the effects of microdystrophin gene therapy and increased physical activity in mdx mice, a mouse model of DMD, with the idea that some of the negative changes with muscular dystrophy could be improved. The results showed a positive increase of endurance capacity in mdx mice treated with gene therapy alone (mdxGT group) and a greater increase if the mice also used a running wheel (mdxRGT group) compared to untreated mdx mice (mdx group). These findings suggested that gene therapy can increase a DMD patient's ability to become more physically active. However, the effects of running and microdystrophin gene therapy on the damaging inflammatory response in the diaphragm were not reported. To address this question, gene expression data from diaphragm muscles of all treatment groups were collected in the Hamm et al., 2021 study for later analysis. In my study, these diaphragm gene expression data were used to compare inflammatory signals between the various treatment groups. Indicators of skeletal muscle damage, immune cell accumulation and fibrosis deposition were rescued (i.e., returned to healthy mice levels) by microdystrophin gene therapy (mdxGT group). Running did not exert any negative effects on the majority of genes rescued by the microdystrophin therapy (mdxRGT group). These results indicated that voluntary wheel running could maintain the reduced inflammatory signals due to the microdystrophin gene therapy in mdx mice. If the function of the skeletal muscle of dystrophic boys was similarly improved by microdystrophin gene therapy and exercise did not interfere with its positive effects, DMD boys could potentially be physically active similar to normal boys of their age. DMD Inflammation exercise physiology mdx running wheel RNA-Seq Bioinformatic R studio AAV microdystrophin gene therapy
34	New Phages, New Insights: Diversity in Phage Research Leads To Impactful Phage Therapy Outcomes Harry Jack Ashbaugh (18858763) 22 June 2024 (has links) <p dir="ltr">Bacteriophages are viruses that infect, replicate in, and kill bacteria. In industries that utilize microbes for production, like <i>E.coli</i> in the production of insulin or <i>A. globiformis</i> in the production of cheese, bacteriophages can pose a huge threat to manufacturing. However, bacteriophages aren’t entirely detrimental: we can use the destructive nature of bacteriophages to kill bacterial infections in the human body. This process is known as phage therapy, and while it isn’t a new concept, it is being seen as an increasingly necessary alternative to traditional antibiotics due to the increasing rise of antimicrobial resistance. Because bacteriophages have an entirely different mechanism of destroying bacteria, they can be used in tandem with traditional antibiotic regimens to help wipe out infections. Also, phages have a highly specific host range, meaning that an injection of a certain type of phage will only infect the bacteria it is targeting, sparing important gut microbes.</p><p dir="ltr">The search for new phages to treat infections has resulted in the discovery of over 25,000 actinobacteriophages, with about 4898 of them being sequenced. This is extremely important and necessary, but 49% of these sequenced phages are all mycobacteriophages. This bias towards mycobacteriophages is likely because they infect the genus mycobacterium, where the deadly <i>M. tuberculosis</i> resides. The discovery of new phages using less studied hosts results in novel phages that exhibit rarely seen morphologies, phenotypes, and genotypes. 
This leads to a better overall understanding of the phage proteome and can lead to new breakthroughs in phage therapy.</p><p dir="ltr">The purpose of this research is to study the differences between types of phages and to determine the impact these differences may have on phage therapy. This thesis is divided into three chapters. In the first chapter, novel phages from different hosts, including <i>M. smegmatis</i> and <i>A. globiformis</i>, were discovered and annotated, and the differences between them were characterized. The discovery of arthrobacteriophages immediately revealed rare and previously unseen phage characteristics. In the second chapter, proteomic mass spectrometry data from diverse mycobacteriophages were analyzed to identify differences. Although the phages came from multiple clusters and lifecycles, the expression data showed more similarities than differences. In the third chapter, an alternative method of extracting DNA from phages was explored to investigate discrepancies in gel quality between <i>M. smegmatis</i> and <i>A. globiformis</i>. Although a large amount of nucleic material was obtained, it was not stable DNA and was unsuitable for use. The reason for the poor gel quality is still unknown.</p> Bioinformatic methods development Proteomics and metabolomics Virology phage therapy and biotechnology bacteriophage engineering
35	Algoritmo híbrido multi-objetivo para predição de estrutura terciária de proteínas / Multi-objective approach to protein tertiary structure prediction Faccioli, Rodrigo Antonio 12 April 2007 (has links) Muitos problemas de otimização multi-objetivo utilizam os algoritmos evolutivos para encontrar as melhores soluções. Muitos desses algoritmos empregam as fronteiras de Pareto como estratégia para obter tais soluções. Entretanto, conforme relatado na literatura, há a limitação da fronteira para problemas com até três objetivos, podendo tornar seu emprego insatisfatório para os problemas com quatro ou mais objetivos. Além disso, as propostas apresentadas muitas vezes eliminam o emprego dos algoritmos evolutivos, os quais utilizam tais fronteiras. Entretanto, as características dos algoritmos evolutivos os qualificam para ser empregados em problemas de otimização, como já vem sendo difundido pela literatura, evitando eliminá-lo por causa da limitação das fronteiras de Pareto. Assim sendo, neste trabalho se buscou eliminar as fronteiras de Pareto e para isso utilizou a lógica Fuzzy, mantendo-se assim o emprego dos algoritmos evolutivos. O problema escolhido para investigar essa substituição foi o problema de predição de estrutura terciária de proteínas, pois além de se encontrar em aberto é de suma relevância para a área de bioinformática. / Many multi-objective optimization problems use evolutionary algorithms to find the best solutions. Many of these algorithms use the Pareto front as a strategy for obtaining these solutions. However, according to the literature, the Pareto front is limited to problems with up to three objectives, which can make it unsatisfactory for problems with four or more objectives. Moreover, the remedies proposed often abandon the evolutionary algorithms that rely on the Pareto front. 
Nevertheless, the characteristics of evolutionary algorithms qualify them for use in optimization problems, as is widely reported in the literature, so they should not be abandoned merely because of the Pareto front limitation. This work therefore sought to eliminate the Pareto front by using fuzzy logic instead, so that evolutionary algorithms could still be used. The problem chosen to investigate this replacement was protein tertiary structure prediction, because it is an open problem of great relevance to bioinformatics. Algoritmos evolutivos Bioinformatic Bioinformática Evolutionary algorithms Folding Folding Fronteiras de Pareto Fuzzy logic Lógica Fuzzy Multi-objective Multi-objetivo Pareto front
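The contrast drawn in this abstract, Pareto-front ranking versus a fuzzy-logic aggregation of objectives, can be illustrated with a minimal Python sketch. It assumes all objectives are minimized; the linear membership function and the `min` (fuzzy AND) aggregation are illustrative assumptions and are not taken from the thesis.

```python
def dominates(a, b):
    """True if solution a Pareto-dominates b (every objective is minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the non-dominated solutions, in their original order."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]


def fuzzy_score(point, lows, highs):
    """Collapse several objectives into one score in [0, 1].

    Each objective gets a linear membership (1 at its best value `lo`,
    0 at its worst value `hi`); the memberships are combined with min,
    a common fuzzy AND. Both choices are assumptions of this sketch.
    """
    memberships = []
    for x, lo, hi in zip(point, lows, highs):
        if hi == lo:
            memberships.append(1.0)
        else:
            m = (hi - x) / (hi - lo)
            memberships.append(min(1.0, max(0.0, m)))
    return min(memberships)


candidates = [(1, 4), (2, 2), (4, 1), (3, 3)]
front = pareto_front(candidates)
scores = {p: fuzzy_score(p, lows=(1, 1), highs=(4, 4)) for p in candidates}
```

A single fuzzy score gives a total order over candidates regardless of the number of objectives, whereas dominance leaves most candidates mutually incomparable as objectives are added.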
36	Analyse bioinformatique du contrôle des éléments transposables par les siARN chez Arabidopsis thaliana / Bioinformatic analysis of siRNA control on transposable elements in Arabidopsis thaliana Sarazin, Alexis 23 October 2012 (has links) De nombreux mécanismes contrôlent et limitent la prolifération des éléments transposables (ET) dans les génomes dont ils menacent l'intégrité structurale et fonctionnelle. Chez les plantes l'interférence ARN (ARNi) joue un rôle important dans ces contrôles via des petits ARN d'environ 20nt qui guident la régulation de l'expression de séquences endogènes ou exogènes par deux types de mécanismes. Un premier mécanisme, partagé par de nombreux organismes eucaryotes, inhibe l'activité d'ARNm par un contrôle post-transcriptionnel. Un deuxième type de régulation, permet un contrôle transcriptionnel de l'activité des ET via un mécanisme appelé RNA directed DNA Methylation (RdDM) qui implique des siARN (« short-interfering RNA ») de 24nt qui guident la méthylation de l'ADN spécifiquement au niveau des séquences d'ET. Les siARN sont impliqués également dans la restauration progressive de la méthylation de l'ADN après une perte induite par la mutation du gène DDM1 (Decrease in DNA Methylation 1). L'objectif de cette thèse est de tirer avantage des technologies de séquençage à haut débit pour caractériser le contrôle des ET par les siARN chez la plante modèle Arabidopsis thaliana.Dans un premier temps, j'ai développé des méthodes et outils bioinformatiques afin de gérer efficacement les données de séquençage à haut débit de banques de petit ARN. 
Ces outils, regroupés en pipeline, visent à permettre l'étude de l'accumulation des siARN correspondant aux séquences d'ET ou de familles d'ET ainsi que leur visualisation de manière globale ou détaillée.Ces outils ont ensuite été appliqués pour caractériser, dans un contexte sauvage, l'association entre les siARN et les ET afin de déterminer des facteurs pouvant expliquer les différences d'abondance en siARN observées. Ces analyses, réalisées en tenant compte de l'état de méthylation de l'ADN et du contexte génomique des ET apportent une vue statique du contrôle des ET par les siARN et de leur impact sur les gènes situés à proximité.L'analyse de banques de petits ARN de mutants de la voie de l'ARNi a ensuite été réalisée afin mieux caractériser l'impact de la perte de méthylation de l'ADN sur les populations de siARN et notamment définir les mécanismes impliqués dans la production des siARN de 21nt induite dans le mutant ddm1. Ces analyses comparatives du contrôle des ET lors d'une perte de la méthylation de l'ADN ont permis de mettre en évidence une production de siARN de 24nt indépendante de la voie classique du RdDM et de proposer un modèle permettant d'expliquer la production de siARN de 21nt dans le mutant ddm1.Dans un dernier temps, j'ai cherché à mieux définir l'implication des siARN dans la restauration des états de méthylation de l'ADN. Les variations de méthylation de l'ADN induites par la mutation ddm1 ont été caractérisées ainsi que leur stabilité transgénérationnelle au sein d'une population d'epiRIL. 
La stabilité de l'hypométhylation de l'ADN a été étudiée, au regard de données de séquençage à haut débit de banques de petits ARN de lignées WT, ddm1 ainsi que pour 4 lignées epiRIL, afin d'apporter une notion temporelle à l'étude du contrôle des ET par les siARN. Les résultats soulignent le rôle majeur des petits ARN pour le contrôle des éléments transposables afin de préserver l'intégrité structurale et fonctionnelle du génome et ce, via des mécanismes variés en fonction des ET. Ce travail ouvre la voie vers une analyse du contrôle des ET par les siARN basée sur une approche regroupant les ET en réseaux en fonction des séquences de siARN qu'ils partagent. Cela permettrait d'étudier les « connections-siARN » entre ET afin de, par exemple, explorer l'action en trans des siARN pour la restauration de la méthylation de l'ADN. / Many mechanisms control and limit the proliferation of transposable elements (TEs), which could otherwise threaten the structural and functional integrity of the genome. In plants, RNA interference (RNAi) plays an important role in this control through small RNAs that guide the regulation of expression of endogenous or exogenous sequences by two types of mechanisms. The first mechanism, shared by many eukaryotic organisms, acts at the post-transcriptional level to inhibit the activity of mRNA. The second type of regulation allows transcriptional control of TE activity through a mechanism called RNA-directed DNA methylation (RdDM), which involves 24-nt siRNAs ("short-interfering RNAs") that guide DNA methylation specifically to TE sequences. siRNAs are also involved in the progressive restoration of DNA methylation after a loss induced by mutation of the DDM1 gene (Decrease in DNA Methylation 1). 
The aim of this thesis is to take advantage of high-throughput sequencing technologies to characterize the control of TEs by siRNAs in the model plant Arabidopsis thaliana. First, I developed bioinformatics methods and tools to manage efficiently the data produced by high-throughput sequencing of small RNA libraries. These tools, combined in a pipeline, are designed to allow the study of the accumulation of siRNAs corresponding to TE sequences or TE families, as well as their global or detailed visualization. These tools were then applied to characterize, in a wild-type background, the association between siRNAs and TEs in order to identify factors that may explain the observed differences in siRNA abundance. These analyses, which take into account both DNA methylation states and genomic context, provide a static view of siRNA control of TEs and of their impact on nearby genes. Next, small RNA libraries from mutants of the RNAi pathway were analyzed to better characterize the impact of DNA methylation loss on siRNA populations and to define the mechanisms involved in the production of the 21-nt siRNAs induced in the ddm1 mutant. These comparative analyses of TE control after loss of DNA methylation revealed production of 24-nt siRNAs independent of the classical RdDM pathway and led to a proposed model explaining the production of 21-nt siRNAs in the ddm1 mutant. Finally, I sought to clarify the involvement of siRNAs in the restoration of DNA methylation. Changes in DNA methylation induced by the ddm1 mutation were characterized, as was their transgenerational stability in an epiRIL population. The stability of DNA hypomethylation was studied using high-throughput sequencing data from small RNA libraries of WT, ddm1 and four epiRIL lines, which provides a temporal view of TE control by siRNAs. 
The results highlight the major role of small RNAs in controlling transposable elements in order to preserve the structural and functional integrity of the genome, through mechanisms that vary with the TE. This work opens the way to an analysis of siRNA control of TEs based on grouping TEs into networks according to the siRNA sequences they share. This would make it possible to study "siRNA connections" between TEs in order to explore, for example, the action in trans of siRNAs in the restoration of DNA methylation. Bioinformatique Arabidopsis Éléments transposables SiARN Méthylation de l'ADN Séquençage à haut débit Bioinformatic Arabidopsis Transposable elements SiRNA DNA methylation High-throughput sequencing
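The network idea raised at the end of this abstract, linking TEs through the siRNA sequences they share, can be sketched in a few lines of Python. The input mapping, the identifiers and the function name `te_network` are hypothetical; the thesis presents this only as a perspective and does not describe an implementation.

```python
from collections import defaultdict
from itertools import combinations


def te_network(sirna_to_tes):
    """Build weighted TE-TE edges from a mapping siRNA -> TEs it matches.

    The weight of an edge is the number of distinct siRNA sequences
    shared by the two TEs. Illustrative sketch only.
    """
    edges = defaultdict(int)
    for tes in sirna_to_tes.values():
        # Each unordered pair of TEs matched by the same siRNA gains one unit.
        for a, b in combinations(sorted(set(tes)), 2):
            edges[(a, b)] += 1
    return dict(edges)


# Toy mapping with made-up identifiers.
toy = {
    "siRNA_1": ["TE1", "TE2"],
    "siRNA_2": ["TE2", "TE1", "TE3"],
    "siRNA_3": ["TE3"],
}
network = te_network(toy)
```

An siRNA that matches a single TE contributes no edge, so isolated TEs drop out of the network; whether that is desirable depends on the question being asked.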
37	Exploration du rôle de l'épissage mineur dans le développement embryonnaire : modèle du syndrome de Taybi-Linder) (TALS) / Exploration of minor splicing function during embryonic development with the Taybi-Linder Syndrome (TALS) model Cologne, Audric 10 October 2019 (has links) Le Syndrome de Taybi-Linder (TALS) est une maladie génétique rare affectant le développement embryonnaire, caractérisée par un nanisme microcéphalique sévère et un décès précoce des patients. Le gène muté dans ce syndrome est RNU4ATAC, qui encode un petit ARN nucléaire (snRNA) non-codant : U4atac. Ce snRNA est l’une des briques composant le spliceosome mineur, une machinerie nucléaire dédiée à l’épissage des introns U12, un groupe d’introns peu étudié car présent dans ~1 % des gènes seulement. Dans le TALS, ces introns sont fréquemment retenus dans les transcrits matures, l’épissage correct des introns U12 semble donc capital pour le développement embryonnaire. L’étude du profil transcriptomique des patients TALS permet ainsi d’établir les conséquences moléculaires d’un dysfonctionnement du spliceosome mineur, nous permettant d’en apprendre davantage sur les mécanismes d’épissage des introns U12 en condition physiologique ou pathologique, et sur le rôle de l’épissage mineur dans le développement embryonnaire. Cette thèse présente la première analyse approfondie du transcriptome de cellules provenant de patients TALS. Pour mener cette analyse, nous avons développé un pipeline bioinformatique qui, à partir de données RNA-seq de seconde génération, utilise différentes méthodes dédiées à l’étude différentielle de l’expression des gènes ou de la qualité d’épissage entre patients et contrôles. 
L’épissage étant particulièrement complexe à analyser à partir de reads courts, deux approches complémentaires ont été utilisées : l’une classique, basée sur l’alignement des reads, et l’autre plus originale, basée sur l’assemblage des reads et permettant de détecter plus d’événements d’épissage non-annotés (KisSplice). Une des conséquences attendue d’un dysfonctionnement du spliceosome mineur est une rétention massive des introns U12 dans les ARN matures. Cependant, la détection et la quantification de rétentions d’intron chez les mammifères constituent encore aujourd’hui un challenge bioinformatique. Nous avons donc utilisé une méthode récente dédiée à l’analyse des rétentions d’introns pour caractériser le plus précisément possible le profil transcriptomique des patients TALS. J’ai ainsi participé au développement de KisSplice et de notre outil d’analyse statistique des différentielles d’épissage, kissDE, et mis en évidence certaines caractéristiques de l’épissage mineur, que ce soit en condition physiologique ou pathologique. / Taybi-Linder Syndrome (TALS) is a rare genetic disorder of embryonic development leading to severe microcephaly, primordial dwarfism and early/unexpected death. The gene mutated in this syndrome is RNU4ATAC, which encodes a non-coding small nuclear RNA (snRNA) named U4atac, a component of the minor spliceosome. This nuclear machinery is dedicated to the splicing of a small number of particular introns: the U12 introns. Because only about 1% of human genes contain at least one U12 intron, these introns have not been extensively studied and little is known about their function. In TALS patients’ cells, most of the U12 introns are retained in mature transcripts; hence, correct splicing of U12 introns seems important for embryonic development. 
Studying the transcriptomes of TALS patients’ cells, in both physiological and pathological conditions, should enable us to identify precisely most of the molecular consequences of a minor splicing defect and could shed light on the mechanism linking minor splicing and embryonic development. This thesis is the first work to conduct an in-depth analysis of the transcriptomes of TALS patients’ cells. To carry out a precise analysis, we developed a bioinformatic pipeline that uses multiple methods to detect, from second-generation RNA-seq data, genes that are differentially expressed or spliced between patients and controls. Splicing is particularly complex to analyze from short reads; hence, we used two complementary approaches. The first, conventionally used to study splicing, is based on read alignment to a reference genome; the second, more original, is based on read assembly (KisSplice) and can find more non-annotated splicing events. One of the expected consequences of a minor splicing malfunction is global retention of U12 introns in mature transcripts. However, detecting and quantifying intron retention in mammals is particularly difficult, so we used a recent method dedicated to intron retention analysis to study the transcriptomic profile of TALS patients. During my thesis, I was one of the developers of KisSplice and of kissDE, our differential splicing analysis tool, and I identified important characteristics of minor splicing in both physiological and pathological conditions. TALS Épissage mineur Introns U12 RNU4ATAC Transcriptomique Bioinformatique TALS Minor splicing U12 introns RNU4ATAC Transcriptomic Bioinformatic 570
38	Conséquences du contexte haplotypique sur la fonctionnalité des protéines : application à la mucoviscidose / Consequences of the haplotype context on protein function : application to cystic fibrosis Cuppens, Tania 07 May 2019 (has links) Notre génome contient des centaines de milliers de variants génétiques, qui pour la plupart, n’ont aucun impact sur notre santé. Après séquençage, il faut les filtrer pour ne conserver que ceux qui sont potentiellement impliquées dans une maladie. On utilise des annotateurs qui prédisent l’impact des variants. Ces prédictions sont faites sans tenir compte des variants en cis dans le même gène. Pourtant, des variants neutres peuvent, lorsqu’ils sont réunis chez un individu, devenir délétères. J’ai donc développé l’outil bioinformatique GEMPROT qui permet de visualiser l’effet des variants génétiques sur la séquence protéique et de mettre en évidence les combinaisons de variants touchant un même domaine fonctionnel.J’ai ensuite étudié l’impact de deux variants associés à la p.Phe508del (508del) sur la protéine CFTR.Le variant p.Val470M est présent sur tous les haplotypes portant la délétion mais pas sur la séquence de référence, qui est généralement utilisée pour la construction de plasmides. Nous avons montré des différences de fonction de la protéine CFTR selon l’acide aminé en position 470. La fonction est augmentée avec une Valine et il convient donc de s’assurer, lors de la construction de plasmides, que le contexte haplotypique des variants étudiés est bien respecté. Le variant p.Ile1027Thr conduit à une dégradation de la fonction de la protéine 508del.Ce variant n’est présent que sur une partie des haplotypes 508del et pourrait donc avoir un effet modificateur de l’expression de la délétion. En conclusion, nous montrons l’importance de la prise en compte des contextes haplotypiques dans l’étude des maladies et proposons un outil bioinformatique pour le faire. 
/ We all carry hundreds of thousands of genetic variants in our genome, most of which have no impact on our health. After sequencing, they must be filtered to retain only those potentially involved in a disease. Annotators are used to predict the impact of variants. These predictions are made for each variant independently, without considering cis variants in the same gene. However, neutral variants can become deleterious when they occur together in one individual. I have developed the bioinformatics tool GEMPROT, which makes it possible to visualize the effect of genetic variants on the protein sequence and to highlight combinations of variants affecting the same functional domain. I then studied the impact of two variants associated with p.Phe508del (508del) on CFTR protein function. The variant p.Val470M is present on all haplotypes carrying the deletion but not on the reference sequence, which is generally used for the construction of plasmids. We have shown differences in the function of the mutated CFTR protein 508del according to the amino acid at position 470. The function is increased with a valine, and it is therefore necessary to ensure, when constructing plasmids, that the haplotype context of the studied variants is respected. The variant p.Ile1027Thr leads to a degradation of the function of the 508del protein. This variant is present only on a portion of the 508del haplotypes and could therefore have a modifying effect on the expression of the deletion. In conclusion, we show the importance of considering haplotype context in the study of diseases and propose a bioinformatics tool to do so. Outil bioinformatique Contexte haplotypique Visualisation Mucoviscidose Fonction protéique Bioinformatic tool Haplotype context Visualization Cystic fibrosis Protein function
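The idea of flagging combinations of cis variants that fall in the same functional domain can be illustrated with the short Python sketch below. It is not GEMPROT's code or interface: the function name and the domain boundaries are hypothetical placeholders, and only the residue positions 470, 508 and 1027 come from the abstract.

```python
from collections import defaultdict


def variants_by_domain(haplotype_positions, domains):
    """Group the variant positions of ONE haplotype by protein domain.

    Returns only the domains hit by two or more variants, i.e. the
    combinations a per-variant annotator would not see.
    `domains` maps a domain name to an inclusive (start, end) residue range.
    """
    grouped = defaultdict(list)
    for pos in sorted(haplotype_positions):
        for name, (start, end) in domains.items():
            if start <= pos <= end:
                grouped[name].append(pos)
    return {name: hits for name, hits in grouped.items() if len(hits) >= 2}


# Placeholder domain boundaries, NOT real CFTR annotations.
toy_domains = {"domain_A": (400, 650), "domain_B": (1000, 1100)}

# One haplotype carrying variants at residues 470, 508 and 1027.
combined = variants_by_domain([508, 470, 1027], toy_domains)
```

Working per haplotype rather than per individual is the point of the sketch: two variants on different copies of the gene never alter the same protein molecule and should not be grouped.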
39	Bioinformatic prediction of conserved promoters across multiple whole genomes of Chlamydia Grech, Brian James January 2007 Genome sequencing projects have generated a wealth of genomic data, and the analysis of these data has provided many interesting findings. However, genome-wide analysis of bacteria for promoters has lagged behind, because the high level of background noise in bacterial genomes makes accurate promoter prediction difficult. One approach to overcome this problem is to predict phylogenetically conserved promoters across multiple genomes of different bacteria, thus filtering out many of the false positives predicted by current methods. However, there are no programmes capable of doing this. Therefore, the work presented in this thesis developed a position weight matrix (PWM) based programme called Multiscan that predicts conserved promoters across multiple bacterial genomes. Since Chlamydia is one of the most sequenced bacterial genera and has a high level of conservation of genes and large-scale conservation of gene order between species, Multiscan was developed and tested on Chlamydia. When Multiscan analysed a genome-wide dataset of equivalent non-coding regions (NCRs) upstream of genes from Chlamydia trachomatis, Chlamydia pneumoniae and Chlamydia caviae for phylogenetically conserved σ66 promoters, it predicted 42 promoters. Since only one of the 42 promoters predicted by Multiscan had previously available biological data to confirm its prediction, a subset of 10 of the remaining 41 σ66 promoters was analysed in C. trachomatis by mapping the 5' end of the transcripts. The primer extension assay synthesised cDNA products of the correct length for seven of the 10 genes chosen.
When the performance of Multiscan was compared with one of the accepted methods for genome-wide prediction of promoters in bacteria, the "standard PWM method", Multiscan predicted 32 more promoters than the standard PWM method in Chlamydia. Furthermore, the promoters predicted by Multiscan had up to three more mismatches to the Escherichia coli σ70 consensus sequence than the promoters predicted by the standard PWM method. Although Multiscan predicted 42 promoters that were well conserved across the three chlamydial species, the analysis was unable to identify the 14 known σ66 promoters in C. trachomatis. These promoters were missed (1) because they were dissimilar to the E. coli σ70 consensus sequence and/or (2) because they were poorly conserved across the three chlamydial species. To address the second possibility, the 14 false negatives were analysed by another phylogenetic footprinting method. Fourteen sets of equivalent NCRs located upstream of the homologous genes from the three chlamydiae were aligned with the computer programme Clustal W, and the alignments were analysed "by eye" for evidence of phylogenetic footprints containing the 14 false negatives. The analysis identified that seven of the 14 false negatives were poorly conserved across the chlamydial species. Analysis of two of the seven promoters that could not be footprinted, those of ltuA and ltuB, by mapping the transcriptional start sites in C. caviae, confirmed their poor conservation across C. trachomatis and C. caviae. This analysis showed that substantial differences exist in chlamydial σ66 promoters from equivalent NCRs upstream of genes. This study developed a new computer programme for genome-wide prediction of phylogenetically conserved promoters and showed its value by identifying seven new well-conserved promoters and seven candidate poorly conserved promoters in Chlamydia.
algorithm bioinformatic Chlamydia comparative genomics gene expression regulation phylogenetic footprinting phylogeny promoter sigma factor transcription transcription factor transcription start site
40	Algoritmo híbrido multi-objetivo para predição de estrutura terciária de proteínas / Multi-objective approach to protein tertiary structure prediction Rodrigo Antonio Faccioli 12 April 2007 Many multi-objective optimization problems use evolutionary algorithms to find the best solutions. Many of these algorithms employ Pareto fronts as the strategy for obtaining such solutions. However, as reported in the literature, the Pareto front is limited to problems with up to three objectives, which can make it unsatisfactory for problems with four or more objectives. Moreover, the proposals put forward in response often discard the evolutionary algorithms that rely on such fronts. Yet the characteristics of evolutionary algorithms qualify them for use in optimization problems, as is widely reported in the literature, so they should not be eliminated because of the Pareto front limitation. This work therefore sought to eliminate the Pareto front by using fuzzy logic instead, thereby retaining the evolutionary algorithms. The problem chosen to investigate this replacement was protein tertiary structure prediction, because it remains an open problem and is highly relevant to bioinformatics.
Bioinformatics Evolutionary algorithms Folding Fuzzy logic Multi-objective Pareto front
