41 |
Etude des facteurs de transcription impliqués dans l'accumulation lipidique en condition de stress azoté chez la microalgue haptophyte Isochrysis affinis galbana / Study of transcription factors involved in lipid accumulation induced by nitrogen stress in the microalgae Isochrysis affinis galbana
Thiriet-Rupert, Stanislas 10 January 2017
Chez tout organisme, l’évolution et l’acclimatation aux changements du milieu de vie sont orchestrées par de nombreux acteurs moléculaires. Parmi eux, les facteurs de transcription (FTs) jouent un rôle clé en régulant l’expression des gènes. Identifier les FTs impliqués dans la production de composés d’intérêt est donc une étape importante dans un contexte biotechnologique. Le laboratoire dispose d’une souche mutante de la microalgue haptophyte Tisochrysis lutea produisant deux fois plus de lipides de réserve que la souche sauvage en condition de privation azotée. Compte tenu du rôle clé des FTs dans l’établissement du phénotype, cette thèse vise à identifier les FTs impliqués dans la mise en place de ce phénotype mutant. Un pipeline bio-informatique d’identification et classification des FTs présents dans le génome de T. lutea a été élaboré. Le manque de données chez les haptophytes constituant un vide dans l’étude de l’histoire évolutive des microalgues, une étude comparative des FTs présents dans le génome d’algues de différentes lignées a été réalisée. Celle-ci révèle que l’étude des FTs aide à comprendre et illustrer l’histoire évolutive des microalgues par la mise en évidence de présences/absences de familles de FTs spécifiques de lignée. Afin de comprendre l’établissement du phénotype de la souche mutante de T. lutea, des données transcriptomiques ont permis la construction de réseaux de co-expression et de régulation des gènes chez les deux souches. Leur analyse croisée a identifié sept FTs candidats potentiellement liés au phénotype mutant. Une approche de q-RT-PCR a confirmé l’implication de deux FTs dans la remobilisation de l’azote en condition de stress azoté. / In every organism, evolution and acclimation to environmental changes are orchestrated by numerous molecular players. Among them, transcription factors (TFs) play a crucial role by regulating gene expression. 
Therefore, identifying TFs involved in the production of high-value products is a significant step in a biotechnological context. The laboratory has at its disposal a mutant strain of the haptophyte microalga Tisochrysis lutea that produces twice as many storage lipids as the wild-type strain when exposed to nitrogen deprivation. Given the key role of TFs in phenotype establishment, this PhD aims to identify the TFs involved in establishing the mutant phenotype of T. lutea. A TF identification and classification pipeline was developed and applied to the T. lutea genome. Since the lack of data on haptophytes leaves a gap in studies of microalgal evolutionary history, a comparative study of TFs identified in the genomes of microalgae belonging to different lineages was carried out. This study reveals that TFs can help to understand and illustrate microalgal evolutionary history by highlighting the lineage-specific presence/absence of TF families. To understand how the phenotype of the T. lutea mutant strain is established, transcriptomic data were used to build gene co-expression networks and gene regulatory networks for both strains. Their comparative analysis identified seven candidate TFs potentially linked to the mutant phenotype. A q-RT-PCR approach confirmed the involvement of two TFs in nitrogen recycling under nitrogen deprivation.
|
42 |
Desenvolvimento de ferramenta e análise in silico da ocorrência de microssatélites no genoma do arroz / Development of a bioinformatics tool and in silico studies on microsatellite occurrence in the rice genome
Maia, Luciano Carlos da 05 April 2007
Classical plant breeding methods are responsible for the major advances of
modern agriculture. However, molecular biology and genomic techniques have
provided insights into how to further advance genetic gains. Molecular markers have
been successfully applied in genetic mapping and marker-assisted selection in many
plant species. Since the complete sequencing of its genome, rice has been increasingly
used as a model for cereal improvement. Current strategies rely on the transfer of
information between the model genome (rice) and other grasses, so that the
information generated for major crops such as maize, wheat and rice can also be
used to improve orphan grass crops. Microsatellites (SSRs) have been described as
the preferred type of marker for these studies. Given these features of rice and SSRs,
and aiming to evaluate the abundance of these markers in the rice genome and their
availability for public use, a computational tool was developed. This tool searches for
and characterizes these loci, designs primers and searches for anchoring sites for the
primers in genomic databases of any species. It also evaluates the transferability of
these markers between species by simulating a PCR. All SSRs found in the rice genome
were analyzed, and primers were designed for all loci on chromosome 1 and simulated
against the other rice chromosomes, giving an idea of potentially duplicated regions
across the genome. / O uso de várias classes de marcadores moleculares tem sido implementado
em mapeamento genético e na seleção assistida de várias espécies vegetais. O
arroz, após o sequenciamento completo do seu genoma, tem sido proposto como
um modelo genético entre as várias espécies gramíneas de importância agronômica.
Modernamente, uma estratégia tem sido adotada com base na transferência de
informações da genômica estrutural dessas gramíneas, sendo que, desta forma, o
conhecimento obtido em espécies com maiores investimentos técnicos como o
milho, trigo e o arroz, possam ser utilizados também nas gramíneas com menores
níveis tecnológicos de pesquisa. A classe dos marcadores moleculares conhecida
como microssatélites é atualmente descrito como preferencial nestes estudos.
Dadas essas características do arroz e dos microssatélites, com objetivo de se
conhecer a riqueza desses marcadores no genoma do arroz e a disponibilização
desses para a utilização pública, foi desenvolvido uma ferramenta computacional
para busca e caracterização desses locos, desenho de primers e busca por sítios de
ancoramento para os primers em bancos de dados genômicos de qualquer espécie,
avaliando dessa forma, a transposição de marcadores moleculares entre espécies
através da simulação da PCR. Foram analisadas e descritas todas as possíveis
ocorrências de microssatélites no genoma do arroz e desenhados primers para os
locos encontrados no cromossomo 1, que posteriormente foram usados para a
simulação da PCR contra os demais cromossomos do arroz, resultando o número de
amplificações possíveis para cada conjunto de primers e suas respectivas regiões
genômicas.
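As an illustration of the microsatellite search step described in this abstract, the following is a minimal Python sketch and not the tool developed in the thesis. The function name `find_ssrs` and the thresholds (motifs of 2 to 6 bases, at least 4 tandem copies) are assumptions chosen for the example; only perfect repeats are reported.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, motif, copies) for each perfect tandem repeat in seq.

    Hypothetical helper: the thresholds are illustrative defaults, not the
    parameters used in the thesis.
    """
    seq = seq.upper()
    # A lazily matched motif followed by at least (min_repeats - 1) more copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


if __name__ == "__main__":
    # Toy sequence containing a (GA)5 repeat starting at offset 2.
    print(find_ssrs("TTGAGAGAGAGACC"))
```

Because the motif is matched lazily, the shortest repeating unit is reported, so a run of GA is described with the motif GA and not GAGA.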
|
43 |
Análise in sílico de proteínas relacionadas a sementes e identificação de microssatélites através da bioinformática / In silico analysis of proteins present in seeds stocked at the Swiss-Prot database
Meneghello, Geri Eduardo 04 April 2007
The main reserve components in seeds are carbohydrates, lipids and proteins. Aside from
their nutritional role, proteins have several important functions in seeds, being essential to
the molecular biology of the plant. Detailed knowledge of proteins makes it possible to
outline genetic improvement strategies aimed at increasing yield and resistance to
pathogens, among other traits. Progress in molecular biology, genomics and
proteomics has generated a large volume of data on proteins from several
plant species, with information about their function in the various tissues and
organs. Using this information requires computational
tools. This need drove the emergence and development of bioinformatics, i.e. the use
of computational tools for the study of biological data. The objective of this work was to
quantify the seed-related proteins already described and available in the Swiss-Prot
database, and to assess the similarity and hydrophobicity pattern of proteins with the
same function in different species. A detailed query was performed in the Swiss-Prot
database, in search of proteins found in seeds, seedlings and their component tissues.
The sequences found were grouped and analyzed according to the tissue in which they
were expressed: epicotyl, aleurone, coleoptile, cotyledon, endosperm, hypocotyl,
mesocotyl, immature seed, seedling and seed. Alignments were performed among
sequences with similar function in different organs and in different species to verify
similarities across tissues and species. The hydrophobic/hydrophilic character of the
proteins found was analyzed to identify patterns. Four hundred and fifty seed-related
proteins are deposited in the Swiss-Prot database. Oryza sativa, Zea mays, Arabidopsis
thaliana, Triticum aestivum, Hordeum vulgare and Glycine max are the species with the
largest number of seed proteins studied so far. There is no pattern of similarity in the
amino acid sequences of proteins with the same function. / Dentre os principais componentes de reserva de uma semente destacam-se os
carboidratos, os lipídeos e as proteínas. Além da função nutritiva, as proteínas têm
diversas funções importantes nas sementes, sendo integrantes da biologia molecular da
planta. O conhecimento detalhado das proteínas possibilita que sejam adotadas
estratégias de melhoramento genético visando aumento de produção, resistência a
patógenos, etc... Avanços na biologia molecular, genômica e proteômica impulsionaram a
criação de um grande volume de dados sobre proteínas de diversas espécies vegetais,
com informações sobre a sua funcionalidade nos diversos tecidos e órgãos. Para utilizar
essas informações, é imprescindível a utilização de ferramentas computacionais. Essa
necessidade impulsionou o surgimento e o desenvolvimento da bioinformática, que é a
utilização de ferramentas computacionais para o estudo de dados biológicos. O objetivo
deste trabalho foi quantificar as proteínas relacionadas às sementes já descritas e
disponíveis no banco Swiss-Prot, e verificar a similaridade e padrão de hidrofobicidade
das proteínas de mesma função em espécies diferentes. Para isso, foi feita uma consulta
detalhada no banco de dados Swiss-Prot em busca de proteínas encontradas em
sementes, plântulas e seus diversos tecidos. As seqüências encontradas foram
agrupadas por tecido, sendo selecionadas e analisadas aquelas expressas no epicótilo,
camada de aleurona, coleóptilo, cotilédones, endosperma, hipocótilo, mesocótilo, semente
imatura, plântula e semente. Realizou-se alinhamentos entre as seqüências com função
similar encontradas em diferentes órgãos e em diferentes espécies para verificar a
similaridade existente entre as mesmas. Estudou-se a hidrofobicidade/hidrofilicidade nas
proteínas encontradas, buscando identificar padrões. Existem 450 proteínas relacionadas
a sementes depositadas no banco de dados Swiss-Prot.Oryza sativa, Zea mays,
Arabidopsis thaliana, Triticum aestivum, Hordeum vulgare, Glycine max são espécies que
possuem um maior número de proteínas de sementes estudadas. Não há um padrão de
similaridade nas seqüências de aminoácidos das proteínas de mesma função
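The hydrophobicity analysis mentioned in this abstract can be pictured with a sliding-window hydropathy profile. The abstract does not say which scale or window was used, so the sketch below assumes the Kyte-Doolittle scale and a default window of 9 residues; `hydropathy_profile` is a hypothetical name, and the code is an illustration only, not the procedure of the thesis.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the abstract names none).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Mean hydropathy of every full window along a protein sequence.

    Positive values indicate hydrophobic stretches, negative values
    hydrophilic ones. Returns an empty list if the window does not fit.
    """
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    if window < 1 or window > len(values):
        return []
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    # Toy peptide: a hydrophobic half followed by a charged half.
    print(hydropathy_profile("AAAKKK", window=3))
```

Profiles computed this way for proteins of the same function in different species can then be compared position by position, which is one plausible reading of the pattern search described above.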
|
44 |
Bioinformatic inference of a prognostic epigenetic signature of immunity in breast cancers
Bizet, Martin 10 January 2018
L’altération des marques épigénétiques est de plus en plus reconnue comme une caractéristique fondamentale des cancers. Dans cette thèse, nous avons utilisé des profils de méthylation de l’ADN en vue d’améliorer la classification des patients atteints du cancer du sein grâce à une approche basée sur l’apprentissage automatique. L’objectif à long terme est le développement d’outils cliniques de médecine personnalisée. Les données de méthylation de l’ADN furent acquises à l’aide d’une puce à ADN dédiée à la méthylation, appelée Infinium. Cette technologie est récente comparée, par exemple, aux puces d’expression génique et son prétraitement n’est pas encore standardisé. La première partie de cette thèse fut donc consacrée à l’évaluation des méthodes de normalisation par comparaison des données normalisées avec d’autres technologies (pyroséquençage et RRBS) pour les deux technologies Infinium les plus récentes (450k et 850k). Nous avons également évalué la couverture de régions biologiquement pertinentes (promoteurs et amplificateurs) par les deux technologies. Ensuite, nous avons utilisé les données Infinium (correctement prétraitées) pour développer un score, appelé MeTIL score, qui présente une valeur pronostique et prédictive dans les cancers du sein. Nous avons profité de la capacité de la méthylation de l’ADN à refléter la composition cellulaire pour extraire une signature de méthylation (c’est-à-dire un ensemble de positions de l’ADN où la méthylation varie) qui reflète la présence de lymphocytes dans l’échantillon tumoral. Après une sélection de sites présentant une méthylation spécifique aux lymphocytes, nous avons développé une approche basée sur l’apprentissage automatique pour obtenir une signature d’une taille optimale réduite à cinq sites permettant potentiellement une utilisation en clinique. Après conversion de cette signature en un score, nous avons montré sa spécificité pour les lymphocytes à l’aide de données externes et de simulations informatiques. 
Puis, nous avons montré la capacité du MeTIL score à prédire la réponse à la chimiothérapie ainsi que son pouvoir pronostique dans des cohortes indépendantes de cancer du sein et, même, dans d’autres cancers. / Epigenetic alterations are increasingly recognised as a hallmark of cancers. In this thesis, we used a machine-learning-based approach to improve the classification of breast cancer patients using DNA methylation profiling, with the long-term aim of making treatment more personalised. The DNA methylation data were acquired using a high-density DNA methylation array called Infinium. This technology is recent compared to expression arrays and its preprocessing is not yet standardised. The first part of this thesis was therefore devoted to evaluating normalisation methods by comparing normalised data against other technologies (pyrosequencing and RRBS) for the two most recent Infinium arrays (450k and 850k). We also assessed how well these arrays cover biologically relevant regions such as promoters and enhancers. Then, we used properly preprocessed Infinium data to develop a score, called the MeTIL score, which shows prognostic and predictive value in breast cancers. We took advantage of the fact that DNA methylation can mirror cell composition to extract a DNA methylation signature (i.e. a set of DNA methylation sites) that reflects the presence of lymphocytes within the tumour. After an initial selection of lymphocyte-specific sites, we developed a machine-learning-based framework that reduced the predictive set to an optimal size of five methylation sites, making it potentially suitable for clinical use. After converting this signature into a score, we showed its specificity for lymphocytes using external datasets and simulations. Then, we showed its ability to predict response to chemotherapy and, finally, its prognostic value in independent breast cancer cohorts and even in other cancers. / Doctorat en Sciences
|
45 |
Développement de méthodes et d'algorithmes pour la caractérisation et l'annotation des transcriptomes avec les séquenceurs haut débit. / Development of methods and tools for the characterization and annotation of the transcriptomes with Next-Generation Sequencing technologies.
Philippe, Nicolas 29 September 2011
Depuis leur apparition, les séquenceurs haut débit ont révolutionné l'étude des transcriptomes à l'échelle du génome. En effet, ils offrent la possibilité de générer des millions, voire des milliards de séquences, appelées reads. Des nouvelles approches transcriptomiques, telles que la Digital Gene Expression (DGE) et le RNA-Sequencing (RNA-Seq), permettent aujourd'hui de répertorier, de quantifier, voire reconstruire tous les transcrits d'une cellule, même les plus rares. Parmi ce type de transcrits se trouvent des ARN non-codants régulateurs ; des variants d'épissages créateurs de protéines ; et aussi des chimères (par fusion de gènes ou trans-épissage). La caractérisation de l'ensemble de ces transcrits représente un réel défi algorithmique, mais suscite aussi un défi biologique car certains peuvent être impliqués dans de nombreux processus cellulaires physiologiques et pathologiques et sont fréquemment décrits dans les cancers. Dans ce travail, nous proposons des algorithmes et des méthodes pour la caractérisation et l'annotation des transcriptomes. Tout d'abord, nous proposons une étude statistique sur la DGE afin d'évaluer l'impact des erreurs de séquences lors de l'analyse des reads. À partir de cette analyse, nous avons développé un pipeline d'annotation pour la DGE. Par le biais de ce premier travail, nous avons pu démontrer que de nombreuses informations étaient partagées entre les reads. Cela nous a amené à concevoir la structure d'indexation Gk arrays qui permet d'organiser une quantité massive de reads de façon à pouvoir interroger rapidement la structure sous forme de requêtes. Enfin, en s'appuyant sur les Gk arrays, nous avons développé CRAC qui est un logiciel spécialisé dans le traitement du RNA-Seq. En intégrant sa propre phase de mapping, CRAC est capable de distinguer les phénomènes biologiques des erreurs de séquences. 
Il permet notamment l'identification de chimères qui sont souvent très faiblement exprimées dans un transcriptome et sont par nature complexes à détecter, avec des parties localisées à différents endroits sur le génome. / Since their introduction, high-throughput sequencers have revolutionized transcriptomic studies at genome scale. Indeed, they can generate millions, or even billions, of short sequences called reads. New transcriptomic approaches, such as Digital Gene Expression (DGE) and RNA-sequencing (RNA-Seq), enable the identification, quantification, and reconstruction of all transcripts of the cell, even rare ones. Among these transcripts are regulatory non-coding RNAs, alternative splice variants that code for novel proteins, and non-colinear transcripts termed chimeras (generated by either gene fusion or trans-splicing). The characterization of these transcripts is both an algorithmic and a biological challenge, owing to their differences in nature, their diverse involvement in physiological and cellular processes and, for some, their role in cancer development. In this work, we focus on algorithms and methods for the characterization and annotation of transcriptomes. First, we propose a statistical study of DGE to assess the impact of sequence errors on the analysis. From this analysis, we developed an annotation pipeline for DGE. Through this initial work, we demonstrated that a lot of information is shared between the reads. This property led us to design the Gk arrays, an indexing data structure for organizing huge amounts of reads in memory, together with algorithms to query this structure quickly. Finally, building on the Gk arrays, we designed CRAC, a software tool specialised in RNA-Seq processing. By integrating its own mapping process, CRAC is able to distinguish biological phenomena from sequence errors. 
Moreover, it enables the identification of chimeric RNAs, which may be weakly expressed in a transcriptome and are inherently complex to detect since their fragments originate from different places on the genome.
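The idea of indexing reads so that shared information can be queried quickly can be sketched with a plain k-mer dictionary. This is a deliberately simplified stand-in written for illustration: it is not the Gk arrays structure, whose actual layout is not described in this abstract, and the names `build_kmer_index` and `reads_sharing_kmer` are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reads, k):
    """Map each k-mer to the list of (read_id, offset) where it occurs."""
    index = defaultdict(list)
    for read_id, read in enumerate(reads):
        for offset in range(len(read) - k + 1):
            index[read[offset:offset + k]].append((read_id, offset))
    return index


def reads_sharing_kmer(index, kmer):
    """Return the sorted ids of the reads that contain the given k-mer."""
    return sorted({read_id for read_id, _ in index.get(kmer, [])})


if __name__ == "__main__":
    # Toy reads; real RNA-Seq reads are longer and far more numerous.
    reads = ["ACGTAC", "GTACGG", "TTTTTT"]
    index = build_kmer_index(reads, 4)
    print(reads_sharing_kmer(index, "GTAC"))
```

A dictionary of Python lists would not scale to billions of reads, which is the kind of constraint a dedicated indexing structure is meant to address.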
|
46 |
Une problématique de découverte de signatures de biomarqueurs / A biomarkers signatures discovery problem
Abtroun Hamlaoui Belmouloud, Lilia 12 December 2011
Appliqué à des problèmes actuels de recherche pharmaceutique, ce mémoire traite de la génération de signatures de biomarqueurs par une approche d'extraction de règles d'association et une Analyse Formelle de Concepts. Elle a abouti au développement d'une méthodologie qui a été validée par six projets de recherche de signatures de biomarqueurs. Alors qu'il n'existe pas de méthode optimale pour traiter les données biomarqueurs, cette méthodologie logique s'appuie sur un scénario global d'analyse déployant quatre méthodes, chacune dépendante de procédés différents. Cette architecture qualifie une problématique centrale de manière à optimiser la qualité d'une solution aux différents problèmes scientifiques posés. Les six applications pratiques ont démontré l'intérêt de la prise en compte précoce des critères de qualité énoncés par les experts du domaine. L'interactivité est soutenue tout au long du processus de découverte et produit des résultats imprévus pour l'expert. La méthodologie s'inscrit dans la lignée des approches dédiées à la stratification systématique des individus, qui constitue le premier palier vers une médecine personnalisée. / In the context of complex questions currently facing the pharmaceutical industry, this manuscript examines the generation of biomarker signatures through an approach that combines association rule extraction and Formal Concept Analysis. It led to the development of a methodology that was validated in six industrial research projects. While there is no single optimal method for handling biomarker datasets, this logical methodology relies on a global data-mining scenario made up of four different methods, each relying on different processes. This architecture defines a global approach that helps to optimize the response to different biomarker signature discovery problems. 
The six applications presented in this manuscript demonstrate the value of taking into account, at an early stage, the quality criteria expressed by the experts in the field. Interactivity is supported throughout the discovery process and produces results that are unexpected for the expert. The methodology supports the systematic stratification of individuals, which constitutes the first step towards personalized medicine.
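Association rule extraction, one of the two techniques named in this abstract, rests on two standard measures: the support of an itemset and the confidence of a rule. The sketch below computes both on toy data; the item labels (`geneA_high`, `responder`) and function names are hypothetical and the example makes no claim about the measures or thresholds actually used in the manuscript.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    if not transactions:
        return 0.0
    matching = sum(items <= set(transaction) for transaction in transactions)
    return matching / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base


if __name__ == "__main__":
    # Hypothetical patients described by discretised biomarker attributes.
    patients = [
        {"geneA_high", "responder"},
        {"geneA_high", "responder"},
        {"geneA_high"},
        {"responder"},
    ]
    print(support(patients, {"geneA_high", "responder"}))
    print(confidence(patients, {"geneA_high"}, {"responder"}))
```

A candidate signature would correspond to an antecedent whose rule towards the outcome of interest has both sufficient support and high confidence.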
|
47 |
Adaptation de la levure à la suite des perturbations du mécanisme de contrôle de qualité de l'ARN
Gendron, Louis 09 1900
The life cycle of RNA is determined by several processing steps, which allow the cell to export and translate a coding transcript. The cell has developed an astonishingly complex mechanism to ensure the integrity of RNA processing steps. The RNA quality control mechanism balances the biosynthesis and degradation of various transcripts, adding another layer of regulation to the complex system of gene expression. The exosome is a central piece of the RNA quality control mechanism, as it degrades many of the aberrant or non-functional RNAs in the nucleus and the cytoplasm. This project characterizes and highlights the response to mutation of components of the RNA quality control mechanism in Saccharomyces cerevisiae. These perturbations include functional components of the exosome (Csl4 and Dis3), a cofactor of the nuclear exosome (Rrp6), an essential protein for pre-rRNA processing (Enp1) and a component of the RNA export machinery (Srm1). Here, I present bioinformatics approaches to characterize the cellular response at the level of transcript expression and the size of polyadenylated segments. The stress response embedded in the gene expression profile is highly similar between the mutants. This work suggests a generic response to a failure in different components of the RNA quality control machinery. / Le cycle de vie des ARN est déterminé par différentes étapes permettant à la cellule d’exporter et de traduire un transcrit codant. La cellule a développé un mécanisme incroyablement complexe pour s’assurer de l’intégrité des étapes de maturation de l’ARN. Le mécanisme de contrôle de qualité balance la biosynthèse et la dégradation de différents transcrits, ce qui ajoute un niveau de régulation au système de l’expression génique. L’exosome est une pièce centrale du mécanisme de contrôle de qualité de l’ARN alors qu’elle dégrade une grande partie des transcrits aberrants ou non-fonctionnels dans le noyau et le cytoplasme. 
Ce projet caractérise et souligne la réponse cellulaire à la suite de la mutation de composantes du mécanisme de contrôle de qualité de l’ARN chez Saccharomyces cerevisiae. Ces perturbations comportent des composantes fonctionnelles du complexe de l’exosome (Csl4 et Dis3), un cofacteur de l’exosome nucléaire (Rrp6), une protéine essentielle pour la maturation des pré-ARNr (Enp1) et une composante de la machinerie d’export de l’ARN (Srm1). Ici, je présente des approches bio-informatiques pour caractériser la réponse cellulaire au niveau de l’expression des transcrits et de la taille des segments polyadénylés. La réponse au stress cellulaire intégré dans le profil d’expression du génome est très similaire entre les mutants. Ce travail suggère une réponse générique à la suite de la perturbation de différentes composantes du mécanisme de contrôle de qualité de l’ARN.
|
48 |
Immunological Cross-Reactivity : Construction of a Workflow That Enables Cross-Reactivity Predictions
Blomlöf, Alexander, Unge, Alvin, Byström, Petter, Lindberg, Erika, Fries, Torbjörn January 2022
Cross-reactivity occurs when an antibody binds to the epitope of a protein that is not the targeted antigen. This is problematic in the analysis of immunoassay diagnostics. Detecting a protein incorrectly might cause issues such as incorrect mapping of metabolic conditions for research or diagnosis. In this study, articles were collected within two main fields. The first focuses on bioinformatic tools to predict cross-reactivity risk, and the second investigates how single substitutions affect antibody-antigen binding. The results from the collected articles were analyzed with the aim of providing as much information on the topic as possible, to gain a further understanding of how protein similarities impact cross-reactivity. FASTA alignments proved to be efficient in classifying cross-reactive proteins based on sequence similarity. Moreover, epitope analysis, using PD tool or Cross-React, can provide an even more precise subset of proteins at risk of causing cross-reactivity. Individual residues of the epitopes in this subset can then be analyzed. The physicochemical properties of specific residues, such as hydrophobicity, polarity, size and charge, have proven to be relevant for binding affinity, with charge having the largest impact. The position of an amino acid has also been shown to be of great importance: amino acids located more centrally within the epitope contribute more to paratope affinity than those at the outer positions. However, a conclusive classifier based on specific residues within epitopes is difficult to implement in cross-reactivity analysis. The different prediction steps have been assembled into a workflow that may be implemented as an automated pipeline in the future.
|
49 |
<b>Systems Modeling of host microbiome interactions in Inflammatory Bowel Diseases</b>
Javier E Munoz (18431688) 24 April 2024
<p dir="ltr">Crohn’s disease and ulcerative colitis are chronic inflammatory bowel diseases (IBD) with a rising global prevalence, influenced by clinical and demographic factors. The pathogenesis of IBD involves complex interactions between gut microbiome dysbiosis, epithelial cell barrier disruption, and immune hyperactivity, which are poorly understood. This necessitates the development of novel approaches to integrate and model multiple clinical and molecular data modalities from patients, animal models, and <i>in-vitro</i> systems to discover effective biomarkers for disease progression and drug response. As sequencing technologies advance, the amount of molecular and compositional data from paired measurements of host and microbiome systems is exploding. While it has become routine to generate such rich, deep datasets, tools for their interpretation lag behind. Here, I present a computational framework for integrative modeling of microbiome multi-omics data titled Latent Interacting Variable Effects (LIVE) modeling. LIVE combines various types of microbiome multi-omics data using single-omic latent variables (LV) into a structured meta-model to determine the most predictive combinations of multi-omics features predicting an outcome, patient group, or phenotype. I implemented and tested LIVE using publicly available metagenomic and metabolomic datasets from Crohn’s disease (CD) and ulcerative colitis (UC) patients in the PRISM and LLDeep cohorts. The findings show that LIVE reduced the number of feature interactions from the original datasets for CD to tractable numbers and facilitated prioritization of biological associations between microbes, metabolites, enzymes, clinical variables, and a disease status outcome. 
LIVE modeling makes a distinct and complementary contribution to current methods for integrating microbiome data to predict IBD status because of its flexibility to adapt to different types of microbiome multi-omics data, its scalability to large and small cohort studies via reliance on latent variables and dimensionality reduction, and the intuitive interpretability of the meta-model integrating -omic data types.</p><p dir="ltr">A novel application of the LIVE modeling framework concerned sex-based differences in UC. Men are 20% more likely to develop this condition and 60% more likely to progress to colitis-associated cancer compared to women. A possible explanation for this observation is a difference in estrogen signaling between men and women, in which estrogen signaling may be protective against UC. Extracting causal insights into how gut microbes and metabolites regulate host estrogen receptor β (ERβ) signaling can facilitate the study of the gut microbiome’s effects on ERβ’s protective role against UC. Supervised LIVE models ERβ signaling using high-dimensional gut microbiome data while controlling for clinical covariates such as sex and disease status. LIVE models predicted an inhibitory effect on ER-UP and ER-DOWN signaling activities by pairs of gut microbiome features, generating a novel catalog of metabolites, microbial species and their interactions capable of modulating ER. Two strongly positively correlated pairs of gut microbiome features, <i>Ruminococcus gnavus</i> with acesulfame and <i>Eubacterium rectale</i> with 4-Methylcatechol, were prioritized as suppressors of ER-UP and ER-DOWN signaling activities. An <i>in-vitro</i> experimental validation roadmap is proposed to study the synergistic relationships between metabolite and microbiota suppressors of ERβ signaling in the context of UC. 
Two <i>in-vitro</i> systems, HT-29 female colon cancer cells and female epithelial gut organoids, are described to evaluate the effect of the gut microbiome on ERβ signaling. Detailed experiments are described for each system, including the selection of doses, treatments, metrics, potential interpretations and limitations. This experimental roadmap aims to compare experimental conditions for studying the inhibitory effects of the gut microbiome on ERβ signaling and how they could raise or lower the risk of developing UC. The intuitive interpretability of the meta-model integrating -omic data types, in conjunction with the presented experimental validation roadmap, aims to transform an artificial intelligence-generated big data hypothesis into testable experimental predictions.</p>
|
50 |
Développement de méthodes bioinformatiques dédiées à la prédiction et l'analyse des réseaux métaboliques et des ARN non codants / Development of bioinformatic methods dedicated to the prediction and the analysis of metabolic networks and non-coding RNA
Ghozlane, Amine 20 November 2012
L'identification des interactions survenant au niveau moléculaire joue un rôle crucial pour la compréhension du vivant. L'objectif de ce travail a consisté à développer des méthodes permettant de modéliser et de prédire ces interactions pour le métabolisme et la régulation de la transcription. Nous nous sommes basés pour cela sur la modélisation de ces systèmes sous la forme de graphes et d'automates. Nous avons dans un premier temps développé une méthode permettant de tester et de prédire la distribution du flux au sein d'un réseau métabolique en permettant la formulation d'une à plusieurs contraintes. Nous montrons que la prise en compte des données biologiques par cette méthode permet de mieux reproduire certains phénotypes observés in vivo pour notre modèle d'étude du métabolisme énergétique du parasite Trypanosoma brucei. Les résultats obtenus ont ainsi permis de fournir des éléments d'explication pour comprendre la flexibilité du flux de ce métabolisme, qui étaient cohérentes avec les données expérimentales. Dans un second temps, nous nous sommes intéressés à une catégorie particulière d'ARN non codants appelés sRNAs, qui sont impliqués dans la régulation de la réponse cellulaire aux variations environnementales. Nous avons développé une approche permettant de mieux prédire les interactions qu'ils effectuent avec d'autres ARN en nous basant sur une prédiction des interactions, une analyse par enrichissement du contexte biologique de ces cibles, et en développant un système de visualisation spécialement adapté à la manipulation de ces données. Nous avons appliqué notre méthode pour l'étude des sRNAs de la bactérie Escherichia coli. Les prédictions réalisées sont apparues être en accord avec les données expérimentales disponibles, et ont permis de proposer plusieurs nouvelles cibles candidates. / The identification of the interactions occurring at the molecular level is crucial to understand the life process. 
The aim of this work was to develop methods to model and predict these interactions for metabolism and the regulation of transcription. We modeled these systems as graphs and automata. First, we developed a method to test and predict the flux distribution in a metabolic network, which allows one or more constraints to be formulated. We showed that taking biological data into account with this method better reproduces some of the phenotypes observed in vivo for our study model, the energy metabolism of the parasite Trypanosoma brucei. The results provided elements of explanation for the flexibility of the metabolic flux that were consistent with the experimental data. Second, we considered a particular class of non-coding RNAs called sRNAs, which are involved in regulating the cellular response to environmental changes. We developed an approach to better predict their interactions with other RNAs, based on interaction prediction, an enrichment analysis of the biological context of the targets, and a visualization system adapted to the manipulation of these data. We applied our method to the study of sRNA interactions in the bacterium Escherichia coli. The predictions were in agreement with the available experimental data and made it possible to propose several new candidate targets.
|