Global ETD Search

171	A functional genomics approach to map transcriptional and post-transcriptional gene regulatory networks Bhinge, Akshay Anant 15 October 2009 (has links) It has been suggested that organismal complexity correlates with the complexity of gene regulation. Transcriptional control of gene expression is mediated by binding of regulatory proteins to cis-acting sequences on the genome. Hence, it is crucial to identify the chromosomal targets of transcription factors (TFs) to delineate transcriptional regulatory networks underlying gene expression programs. The development of ChIP-chip technology has enabled high-throughput mapping of TF binding sites across the genome. However, there are many limitations to the technology, including the availability of whole-genome arrays for complex organisms such as human or mouse. To circumvent these limitations, we developed the Sequence Tag Analysis of Genomic Enrichment (STAGE) methodology, which is based on extracting short DNA sequences or “tags” from ChIP-enriched DNA. With improvements in sequencing technologies, we applied the recently developed ChIP-Seq technique, i.e., ChIP followed by ultra-high-throughput sequencing, to identify binding sites for the TF E2F4 across the human genome. We identified previously uncharacterized E2F4 binding sites in intergenic regions and found that several microRNAs are potential E2F4 targets. Binding of TFs to their respective chromosomal targets requires access of the TF to its regulatory element, which is strongly influenced by nucleosomal remodeling. In order to understand nucleosome remodeling in response to transcriptional perturbation, we used ultra-high-throughput sequencing to map nucleosome positions in yeast that were subjected to heat shock or were grown normally. We generated nucleosome remodeling profiles across yeast promoters and found that specific remodeling patterns correlate with specific TFs active during the transcriptional reprogramming. 
Another important aspect of gene regulation operates at the post-transcriptional level. MicroRNAs (miRNAs) are ~22 nucleotide non-coding RNAs that suppress translation or mark mRNAs for degradation. MiRNAs regulate TFs and in turn can be regulated by TFs. We characterized a TF-miRNA network involving the oncofactor Myc and the miRNA miR-22 that suppresses the interferon pathway as primary fibroblasts enter a stage of rapid proliferation. We found that miR-22 suppresses the interferon pathway by inhibiting nuclear translocation of the TF NF-kappaB. Our results show how the oncogenic TF Myc cross-talks with other TF regulatory pathways via a miRNA intermediary. / text Gene regulatory networks Gene regulation Human genome Gene mapping Transcription factors Transcription factor binding sites ChIP-enriched DNA DNA sequences Nucleosomal remodeling MicroRNAs Genomics
172	Identification des réseaux transcriptionnnels de résistance aux antifongiques chez Candida albicans Znaidi, Sadri 10 1900 (has links) Plusieurs souches cliniques de Candida albicans résistantes aux médicaments antifongiques azolés surexpriment des gènes encodant des effecteurs de la résistance appartenant à deux classes fonctionnelles : i) des transporteurs expulsant les azoles, CDR1, CDR2 et MDR1 et ii) la cible des azoles 14-lanostérol déméthylase encodée par ERG11. La surexpression de ces gènes est due à la sélection de mutations activatrices dans des facteurs de transcription à doigts de zinc de la famille zinc cluster (Zn2Cys6) qui contrôlent leur expression : Tac1p (Transcriptional activator of CDR genes 1) contrôlant l’expression de CDR1 et CDR2, Mrr1p (Multidrug resistance regulator 1), régulant celle de MDR1 et Upc2p (Uptake control 2), contrôlant celle d’ERG11. Un autre effecteur de la résistance clinique aux azoles est PDR16, encodant une transférase de phospholipides, dont la surexpression accompagne souvent celle de CDR1 et CDR2, suggérant que les trois gènes appartiennent au même régulon, potentiellement celui de Tac1p. De plus, la régulation transcriptionnelle du gène MDR1 ne dépend pas seulement de Mrr1p, mais aussi du facteur de transcription de la famille basic-leucine zipper Cap1p (Candida activator protein 1), un régulateur majeur de la réponse au stress oxydatif chez C. albicans qui, lorsque muté, induit une surexpression constitutive de MDR1 conférant la résistance aux azoles. Ces observations suggèrent qu’un réseau de régulation transcriptionnelle complexe contrôle le processus de résistance aux antifongiques azolés chez C. albicans. 
L’objectif de mon projet au doctorat était d’identifier les cibles transcriptionnelles directes des facteurs de transcription Tac1p, Upc2p et Cap1p, en me servant d’approches génétiques et de génomique fonctionnelle, afin de i) caractériser leur réseau transcriptionnel et les modules transcriptionnels qui sont sous leur contrôle direct, et ii) d’inférer leurs fonctions biologiques et ainsi mieux comprendre leur rôle dans la résistance aux azoles. Dans un premier volet, j’ai démontré, par des expériences de génétique, que Tac1p contrôle non seulement la surexpression de CDR1 et CDR2 mais aussi celle de PDR16. Mes résultats ont identifié une nouvelle mutation activatrice de Tac1p (N972D) et ont révélé la participation d’un autre régulateur dans le contrôle transcriptionnel de CDR1 et PDR16 dont l’identité est encore inconnue. Une combinaison d’expériences de transcriptomique et d’immunoprécipitation de la chromatine couplée à l’hybridation sur des biopuces à ADN (ChIP-chip) m’a permis d’identifier plusieurs gènes dont l’expression est contrôlée in vivo et directement par Tac1p (PDR16, CDR1, CDR2, ERG2, autres), Upc2p (ERG11, ERG2, MDR1, CDR1, autres) et Cap1p (MDR1, GCY1, GLR1, autres). Ces expériences ont révélé qu’Upc2p ne contrôle pas seulement l’expression d’ERG11, mais aussi celle de MDR1 et CDR1. Plusieurs nouvelles propriétés fonctionnelles de ces régulateurs ont été caractérisées, notamment la liaison in vivo de Tac1p aux promoteurs de ses cibles de façon constitutive et indépendamment de son état d’activation, et la liaison de Cap1p non seulement à la région du promoteur de ses cibles, mais aussi celle couvrant le cadre de lecture ouvert et le terminateur transcriptionnel putatif, suggérant une interaction physique avec la machinerie de la transcription. 
La caractérisation du réseau transcriptionnel a révélé une interaction fonctionnnelle entre ces différents facteurs, notamment Cap1p et Mrr1p, et a permis d’inférer des fonctions biologiques potentielles pour Tac1p (trafic et la mobilisation des lipides, réponse au stress oxydatif et osmotique) et confirmer ou proposer d’autres fonctions pour Upc2p (métabolisme des stérols) et Cap1p (réponse au stress oxydatif, métabolisme des sources d’azote, transport des phospholipides). Mes études suggèrent que la résistance aux antifongiques azolés chez C. albicans est intimement liée au métabolisme des lipides membranaires et à la réponse au stress oxydatif. / Many azole resistant Candida albicans clinical isolates overexpress genes encoding azole resistance effectors that belong to two functional categories: i) CDR1, CDR2 and MDR1, encoding azole-efflux transporters and ii) ERG11, encoding the target of azoles 14-lanosterol demethylase. The constitutive overexpression of these genes is due to activating mutations in transcription factors of the zinc cluster family (Zn2Cys6) which control their expression. Tac1p (Transcriptional activator of CDR genes 1), controlling the expression of CDR1 and CDR2, Mrr1p (Multidrug resistance regulator 1), regulating MDR1 expression and Upc2p (Uptake control 2), controlling the expression of ERG11. Another determinant of clinical azole resistance is PDR16, encoding a phospholipid transferase, whose overexpression often accompanies that of CDR1 and CDR2 in clinical isolates, suggesting that the three genes belong to the same regulon, potentially that of Tac1p. Further, MDR1 expression is not only regulated by Mrr1p, but also by the basic-leucine zipper transcription factor Cap1p (Candida activator protein 1), which controls the oxidative stress response in C. albicans and whose mutation confers azole resistance via MDR1 overexpression. 
These observations suggest that a complex transcriptional regulatory network controls azole resistance in C. albicans. My Ph.D. studies aimed to identify the direct transcriptional targets of Tac1p, Upc2p and Cap1p using genetics and functional genomics approaches in order to i) characterize their regulatory network and the transcriptional modules under their direct control and ii) infer their biological functions and better understand their roles in azole resistance. In the first part of my studies, I showed that Tac1p controls the expression not only of CDR1 and CDR2, but also of PDR16. My results also identified a new activating mutation in Tac1p (N972D) and revealed that the expression of CDR1 and PDR16 is under the control of another, as yet unknown, regulator. The combination of transcriptomics and genome-wide location (ChIP-chip) approaches allowed me to identify the in vivo direct targets of Tac1p (PDR16, CDR1, CDR2, ERG2, others), Upc2p (ERG11, ERG2, MDR1, CDR1, others) and Cap1p (MDR1, GCY1, GLR1, others). These results also revealed that Upc2p controls the expression not only of ERG11 but also of MDR1 and CDR1. Many new functional features of these transcription factors were found, including the constitutive binding of Tac1p to its targets under both activating and non-activating conditions, and the binding of Cap1p, which extends beyond the promoter region of its target genes to cover the open reading frame and the putative transcription termination regions, suggesting a physical interaction with the transcriptional machinery. 
The characterization of the transcriptional regulatory network revealed a functional interaction between these factors, notably between Cap1p and Mrr1p, and inferred potential biological functions for Tac1p (lipid mobilization and traffic, response to oxidative and osmotic stress) and confirmed or suggested other functions for Upc2p (sterol metabolism) and Cap1p (oxidative stress response, regulation of nitrogen utilization and phospholipids transport). Taken together, my results suggest that azole resistance in C. albicans is tightly linked to membrane lipid metabolism and oxidative stress response. Candida albicans antifongiques azolés résistance aux médicaments génomique fonctionnelle réseau transcriptionnel Candida albicans antifungal agents drug resistance functional genomics transcriptional regulatory networks
173	Probabilistic Models for Collecting, Analyzing, and Modeling Expression Data Le, Hai-Son Phuoc 01 May 2013 (has links) Advances in genomics allow researchers to measure the complete set of transcripts in cells. These transcripts include messenger RNAs (which encode proteins) and microRNAs, short RNAs that play an important regulatory role in cellular networks. While this data is a great resource for reconstructing the activity of networks in cells, it also presents several computational challenges. These challenges include the data collection stage, which often results in incomplete and noisy measurements, developing methods to integrate several experiments within and across species, and designing methods that can use this data to map the interactions and networks that are activated in specific conditions. Novel and efficient algorithms are required to successfully address these challenges. In this thesis, we present probabilistic models to address the set of challenges associated with expression data. First, we present a novel probabilistic error correction method for RNA-Seq reads. RNA-Seq generates large and comprehensive datasets that have revolutionized our ability to accurately recover the set of transcripts in cells. However, sequencing reads inevitably contain errors, which affect all downstream analyses. To address these problems, we develop an efficient hidden Markov model-based error correction method for RNA-Seq data. Second, for the analysis of expression data across species, we develop clustering and distance function learning methods for querying large expression databases. The methods use a Dirichlet Process Mixture Model with latent matchings and infer soft assignments between genes in two species to allow comparison and clustering across species. Third, we introduce new probabilistic models to integrate expression and interaction data in order to predict targets and networks regulated by microRNAs. 
Combined, the methods developed in this thesis provide a solution to the pipeline of expression analysis used by experimentalists when performing expression experiments. genomics gene expression gene regulation microarray RNA-Seq transcriptomics error correction comparative genomics regulatory networks cross-species expression database Gene Expression Omnibus GEO orthologs microRNA target prediction Dirichlet Process Indian Buffet Process hidden Markov model immune response cancer. Computer Sciences
174	Inverse inference in the asymmetric Ising model Sakellariou, Jason 22 February 2013 (has links) (PDF) Recent experimental techniques in biology have made it possible to acquire very large amounts of data on complex biological networks, such as neural networks, gene regulation networks and protein-protein interaction networks. These techniques are able to record the states of individual components of such networks (neurons, genes, proteins) for a large number of configurations. However, the most biologically relevant information lies in their connectivity and in the way their components interact, information that these techniques are not able to record directly. The aim of this thesis is to study statistical methods for inferring information about the connectivity of complex networks starting from experimental data. The subject is approached from a statistical physics point of view, drawing on the arsenal of methods developed in the study of spin glasses. Spin glasses are prototypes of networks of discrete variables interacting in a complex way and are widely used to model biological networks. After an introduction to the models used and a discussion of the biological motivation of the thesis, all known methods of network inference are introduced and analysed from the point of view of their performance. Then, in the third part of the thesis, a new method is proposed which relies on the observation that interactions in biology are not necessarily symmetric (i.e. the interaction from node A to node B is not the same as the one from B to A). It is shown that this assumption leads to methods that are both exact and efficient. This means that the interactions can be computed exactly, given a sufficient amount of data, and in a reasonable amount of time. This is an important original contribution since no other method is known to be both exact and efficient. 
Statistical physics Disordered systems Complex networks Biological networks Neural networks Gene regulatory networks Information theory Statistical Inference Spin glasses Graphical models Asymmetric interactions Inverse Ising problem Kinetic Ising model
175	Emergence de structures modulaires dans les régulations des systèmes biologiques : théorie et applications à Bacillus subtilis Goelzer, Anne 04 November 2010 (has links) Cette thèse consiste à étudier l'organisation du système de contrôle des voies métaboliques des bactéries afin de dégager des propriétés systémiques révélant son fonctionnement. Dans un premier temps, nous montrons que le contrôle des voies métaboliques est hautement structuré et peut se décomposer en modules fortement découplés en régime stationnaire. Ces modules possèdent des propriétés mathématiques remarquables ayant des conséquences importantes en biologie. Cette décomposition, basée intrinsèquement sur la vision système de l'Automatique, offre un cadre théorique formel général d'analyse du contrôle des voies métaboliques qui s'est révélé effectif pour analyser des données expérimentales. dans un deuxième temps, nous nous intéressons aux raisons possibles de l'émergence de cette structure de contrôle similaire. Nous identifions un ensemble de contraintes structurelles agissant au niveau de la répartition d'une ressource commune, les protéines, entre les processus cellulaires. Respecter ces contraintes pour un taux de croissance donné conduit à formaliser et résoudre un problème d'optimisation convexe non différentiable, que nous appelons Resource balance Analysis. Ce problème d'optimisation se résout numériquement à l'échelle de la bactérie grâce à un problème de Programmation Linéaire équivalent. plusieurs propriétés sont déduites de l'analyse théorique du critère obtenu. Tout d'abord, le taux de croissance est structurellement limité par la répartition d'une quantité finie de protéines entre les voies métaboliques et les ribosomes. Ensuite, l'émergence des modules dans les voies métaboliques provient d'une politique générale d'économie en protéines chez la bactérie pour gagner du taux de croissance. 
Certaines stratégies de transport bien connues comme la répression catabolique ou la substitution de transporteurs haute/basse affinités sont prédites par notre méthode et peuvent alors être interprétées comme le moyen de maximiser la croissance tout en minimisant l'investissement en protéines. / This thesis studies the organization of the control system of bacterial metabolic pathways in order to identify systemic properties that reveal how it operates. First, we show that the control of metabolic pathways is highly structured and can be decomposed into modules that are strongly decoupled at steady state. These modules have remarkable mathematical properties with important implications for biology. This decomposition, based inherently on the systems view of automatic control, offers a general formal theoretical framework for analysing the control of metabolic pathways, which has proved effective for analysing experimental data. Second, we consider the possible reasons for the emergence of this modular control structure. We identify a set of structural constraints acting on the distribution of a common resource, proteins, among cellular processes. Satisfying these constraints for a given growth rate leads us to formalize and solve a non-differentiable convex optimization problem, which we call Resource Balance Analysis. This optimization problem is solved numerically at the scale of the bacterium through an equivalent linear programming problem. Several properties are derived from the theoretical analysis of the resulting criterion. First, the growth rate is structurally limited by the distribution of a finite amount of proteins between the metabolic pathways and the ribosomes. Second, the emergence of modules in metabolic pathways arises from a general policy of protein economy in the bacterium to increase the growth rate. 
Some well-known transport strategies, such as catabolite repression or the substitution between low- and high-affinity transporters, are predicted by our method and can consequently be interpreted as ways to maximize growth while minimizing investment in proteins. Biologie des systèmes Modularité Limitation du taux de croissance Gestion des ressources Resource Balance Analysis Optimisation convexe Prédiction des modules Systems biology Modularity Growth rate limitation Resource management Resource Balance Analysis Convex optimization Module prediction
176	Modélisation de l'évolution temporelle de l'expression des gènes sur la base de données de puces à ADN: application à la drosophile / Modeling the temporal evolution of gene expression from DNA microarray data: application to Drosophila Haye, Alexandre 24 June 2011 (has links) This doctoral thesis concerns the development and use of mathematical and computational methods that exploit time-series gene expression data from DNA microarrays in order to rationalize and model gene regulatory networks. To this end, we focused mainly on gene expression data from the fruit fly (Drosophila melanogaster) during its development, from the embryonic stage to the adult stage. We also studied data on the development of other higher eukaryotes, the response of a bacterium subjected to various stresses, and the cell cycle of a yeast. The work was organized around three main themes: detection of developmental stages and perturbations, classification of expression profiles, and modeling of regulatory networks.
First, examination of the expression data led us to study in greater depth the phenomena that occur when Drosophila changes developmental stage. To this end, two methods for automatically detecting these changes were developed and applied to the available time-series data on the development of higher eukaryotes. They were also applied to time-series data on external perturbations of bacteria. This study showed that a simple mathematical formulation can recover, from the expression profiles alone, the experimental time points at which a perturbation or a change of developmental stage is observed. Moreover, the response to an external perturbation turns out to be indistinguishable from a succession of developmental stages on the basis of temporal expression profiles alone.
Second, because of the size of the problem posed by expression data for several thousand genes, and because the regulatory roles of genes with similar expression profiles cannot be told apart, it proved necessary to classify genes according to their expression profiles. Building on the results of the developmental-stage detection, the approach groups genes whose temporal expression profiles behave similarly not only over the complete time series but also within each developmental stage. To this end, three distances were proposed and used in a hierarchical clustering of the Drosophila expression data.
Third, linear and nonlinear model structures, together with parameter estimation and parameter reduction methods, were developed and used to reproduce the expression data from Drosophila development. The results showed that a simple linear model structure reproduced the experimental profiles very well and that, in this case, the Drosophila gene regulatory network could make do with low connectivity (on average 3 connections per gene class), without any a priori assumption. However, the linear models were then seriously called into question by analyses of robustness to parameter perturbations and of profile stability after extrapolation in time. Four nonlinear model structures and five parameter reduction methods were therefore proposed and used to reconcile the criteria of data reproduction, robustness, and stability of the identified networks. In addition, these modeling methods were applied to a subset of 20 genes involved in Drosophila muscle development, for which 36 interactions have been validated experimentally, as well as to noisy synthetic profiles. We found that more than half of the connections and non-connections are recovered by three nonlinear models. The results of this study made it possible to eliminate certain model structures and reduction methods and highlighted several future directions for the modeling of gene regulatory networks. / Doctorat en Sciences de l'ingénieur / info:eu-repo/semantics/nonPublished Chimie Sciences de l'ingénieur Gene expression -- Mathematical models DNA microarrays Drosophila -- Genetics Puces à ADN Drosophiles -- Génétique gene expression gene regulatory networks cDNA microarrays mathematical modeling systems biology computational biology
177	Redes complexas de expressão gênica: síntese, identificação, análise e aplicações / Gene expression complex networks: synthesis, identification, analysis and applications Fabricio Martins Lopes 21 February 2011 (has links) Os avanços na pesquisa em biologia molecular e bioquímica permitiram o desenvolvimento de técnicas capazes de extrair informações moleculares de milhares de genes simultaneamente, como DNA Microarrays, SAGE e, mais recentemente RNA-Seq, gerando um volume massivo de dados biológicos. O mapeamento dos níveis de transcrição dos genes em larga escala é motivado pela proposição de que o estado funcional de um organismo é amplamente determinado pela expressão de seus genes. No entanto, o grande desafio enfrentado é o pequeno número de amostras (experimentos) com enorme dimensionalidade (genes). Dessa forma, se faz necessário o desenvolvimento de novas técnicas computacionais e estatísticas que reduzam o erro de estimação intrínseco cometido na presença de um pequeno número de amostras com enorme dimensionalidade. Neste contexto, um foco importante de pesquisa é a modelagem e identificação de redes de regulação gênica (GRNs) a partir desses dados de expressão. O objetivo central nesta pesquisa é inferir como os genes estão regulados, trazendo conhecimento sobre as interações moleculares e atividades metabólicas de um organismo. Tal conhecimento é fundamental para muitas aplicações, tais como o tratamento de doenças, estratégias de intervenção terapêutica e criação de novas drogas, bem como para o planejamento de novos experimentos. 
Nessa direção, este trabalho apresenta algumas contribuições: (1) software de seleção de características; (2) nova abordagem para a geração de Redes Gênicas Artificiais (AGNs); (3) função critério baseada na entropia de Tsallis; (4) estratégias alternativas de busca para a inferência de GRNs: SFFS-MR e SFFS-BA; (5) investigação biológica das redes gênicas envolvidas na biossíntese de tiamina, usando a Arabidopsis thaliana como planta modelo. O software de seleção de características consiste de um ambiente de código livre, gráfico e multiplataforma para problemas de bioinformática, que disponibiliza alguns algoritmos de seleção de características, funções critério e ferramentas de visualização gráfica. Em particular, implementa um método de inferência de GRNs baseado em seleção de características. Embora existam vários métodos propostos na literatura para a modelagem e identificação de GRNs, ainda há um problema muito importante em aberto: como validar as redes identificadas por esses métodos computacionais? Este trabalho apresenta uma nova abordagem para validação de tais algoritmos, considerando três aspectos principais: (a) Modelo para geração de Redes Gênicas Artificiais (AGNs), baseada em modelos teóricos de redes complexas, os quais são usados para simular perfis temporais de expressão gênica; (b) Método computacional para identificação de redes gênicas a partir de dados temporais de expressão; e (c) Validação das redes identificadas por meio do modelo AGN. O desenvolvimento do modelo AGN permitiu a análise e investigação das características de métodos de inferência de GRNs, levando ao desenvolvimento de um estudo comparativo entre quatro métodos disponíveis na literatura. 
A avaliação dos métodos de inferência levou ao desenvolvimento de novas metodologias para essa tarefa: (a) uma função critério, baseada na entropia de Tsallis, com objetivo de inferir os inter-relacionamentos gênicos com maior precisão; (b) uma estratégia alternativa de busca para a inferência de GRNs, chamada SFFS-MR, a qual tenta explorar uma característica local das interdependências regulatórias dos genes, conhecida como predição intrinsecamente multivariada; e (c) uma estratégia de busca, interativa e flutuante, que baseia-se na topologia de redes scale-free, como uma característica global das GRNs, considerada como uma informação a priori, com objetivo de oferecer um método mais adequado para essa classe de problemas e, com isso, obter resultados com maior precisão. Também é objetivo deste trabalho aplicar a metodologia desenvolvida em dados biológicos, em particular na identificação de GRNs relacionadas a funções específicas de Arabidopsis thaliana. Os resultados experimentais, obtidos a partir da aplicação das metodologias propostas, mostraram que os respectivos ganhos de desempenho foram significativos e adequados para os problemas a que foram propostos. / Thanks to recent advances in molecular biology and biochemistry, allied to an ever increasing amount of experimental data, the functional state of thousands of genes can now be extracted simultaneously by using methods such as DNA microarrays, SAGE, and more recently RNA-Seq, generating a massive volume of biological data. The mapping of gene transcription levels at large scale is motivated by the proposition that information of the functional state of an organism is broadly determined by its gene expression. However, the main limitation faced is the small number of samples (experiments) with huge dimensionalities (genes). 
Thus, it is necessary to develop new computational and statistical techniques to reduce the inherent estimation error incurred when only a small number of samples with large dimensionality is available. In this context, a particularly important line of research is the modeling and identification of gene regulatory networks (GRNs) from expression data sets. The main objective of this research is to infer how genes are regulated, bringing knowledge about the molecular interactions and metabolic activities of an organism. Such knowledge is fundamental for many applications, such as disease treatment, therapeutic intervention strategies and drug design, as well as for planning new high-throughput experiments. In this direction, this work presents some contributions: (1) feature selection software; (2) a new approach for the generation of artificial gene networks (AGNs); (3) a criterion function based on Tsallis entropy; (4) alternative search strategies for GRN inference: SFFS-MR and SFFS-BA; (5) a biological investigation of the GRNs involved in thiamine biosynthesis, adopting Arabidopsis thaliana as a model plant. The feature selection software is an open-source, multiplatform graphical environment for bioinformatics problems, which supports many feature selection algorithms, criterion functions and graphic visualization tools. In particular, a feature selection method for GRN inference is also implemented in the software. Although several methods have been proposed in the literature for the modeling and identification of GRNs, an important open problem remains: how to validate such methods and their results? 
This work presents a new approach for the validation of such algorithms by considering three main aspects: (a) generation of Artificial Gene Network (AGN) models from theoretical models of complex networks, which are used to simulate temporal expression data; (b) a computational method for GRN identification from temporal expression data; and (c) validation of the network identified from the AGN data through comparison with the original network. The development of the AGN model made it possible to analyse and investigate the characteristics of GRN inference methods, leading to a comparative study of four inference methods available in the literature. The evaluation of inference methods led to the development of new methodologies for this task: (a) a new criterion function based on Tsallis entropy, in order to infer the genetic inter-relationships with better precision; (b) an alternative search strategy for GRN inference, called SFFS-MR, which tries to exploit a local property of the regulatory gene interdependencies known as intrinsically multivariate prediction; and (c) a search strategy, interactive and floating, which is based on scale-free network topology, a global property of GRNs treated as a priori information, in order to provide a more appropriate method for this class of problems and thereby achieve results with better precision. A further objective of this work is to apply the developed methodology to biological data, particularly to identify GRNs related to specific functions of Arabidopsis thaliana. The experimental results obtained by applying the proposed methodologies indicate that the respective performance gains were significant and adequate to the problems for which they were proposed. 
entropia entropia de Tsallis inferência de redes reconhecimento de padrões redes complexas redes de regulação gênica seleção de características validação complex networks entropy feature selection gene regulatory networks network inference pattern recognition Tsallis entropy validation
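The abstract above does not state the exact form of its Tsallis-entropy criterion function. As a rough illustration only, the Python sketch below computes the standard Tsallis entropy S_q = (1 - sum_i p_i^q) / (q - 1) and a mean conditional version over discretized expression data, which is one plausible way to score a candidate predictor set. The function names, the toy data and the choice q = 2 are assumptions made for this example and are not taken from the thesis.

```python
import math
from collections import Counter, defaultdict


def tsallis_entropy(probs, q):
    """Tsallis entropy of a discrete distribution; q -> 1 gives Shannon entropy (nats)."""
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)


def mean_conditional_tsallis(predictors, target, q):
    """Average Tsallis entropy of the target, weighted by how often each
    predictor pattern is observed (lower means the predictors explain more)."""
    groups = defaultdict(Counter)
    for x, y in zip(predictors, target):
        groups[tuple(x)][y] += 1
    n = len(target)
    total = 0.0
    for counts in groups.values():
        m = sum(counts.values())
        total += (m / n) * tsallis_entropy([c / m for c in counts.values()], q)
    return total


# Hypothetical toy data: one binary predictor gene, one binary target gene.
predictors = [(0,), (0,), (1,), (1,)]
target = [0, 1, 1, 1]
score = mean_conditional_tsallis(predictors, target, q=2.0)
print(score)
```

In a feature selection search such as SFFS, a score of this kind would be evaluated for each candidate predictor subset and the subset with the lowest value kept; how the thesis combines the criterion with SFFS-MR or SFFS-BA is not described in the abstract.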
178	Seleção de características e predição intrinsecamente multivariada em identificação de redes de regulação gênica / Feature selection and intrinsically multivariate prediction in gene regulatory networks identification David Corrêa Martins Junior 01 December 2008 (has links) Seleção de características é um tópico muito importante em aplicações de reconhecimento de padrões, especialmente em bioinformática, cujos problemas são geralmente tratados sobre um conjunto de dados envolvendo muitas variáveis e poucas observações. Este trabalho analisa aspectos de seleção de características no problema de identificação de redes de regulação gênica a partir de sinais de expressão gênica. Particularmente, propusemos um modelo de redes gênicas probabilísticas (PGN) que devolve uma rede construída a partir da aplicação recorrente de algoritmos de seleção de características orientados por uma função critério baseada em entropia condicional. Tal critério embute a estimação do erro por penalização de amostras raramente observadas. Resultados desse modelo aplicado a dados sintéticos e a conjuntos de dados de microarray de Plasmodium falciparum, um agente causador da malária, demonstram a validade dessa técnica, tendo sido capaz não apenas de reproduzir conhecimentos já produzidos anteriormente, como também de produzir novos resultados. Outro aspecto investigado nesta tese é o fenômeno da predição intrinsecamente multivariada (IMP), ou seja, o fato de um conjunto de características ser um ótimo caracterizador dos objetos em questão, mas qualquer de seus subconjuntos propriamente contidos não conseguirem representá-los de forma satisfatória. Neste trabalho, as condições para o surgimento desse fenômeno foram obtidas de forma analítica para conjuntos de 2 e 3 características em relação a uma variável alvo. 
No contexto de redes de regulação gênica, foram obtidas evidências de que genes alvo de conjuntos IMP possuem um enorme potencial para exercerem funções vitais em sistemas biológicos. O fenômeno conhecido como canalização é particularmente importante nesse contexto. Em dados de microarray de melanoma, constatamos que o gene DUSP1, conhecido por exercer função canalizadora, foi aquele que obteve o maior número de conjuntos de genes IMP, sendo que todos eles possuem lógicas de predição canalizadoras. Além disso, simulações computacionais para construção de redes com 3 ou mais genes mostram que o tamanho do território de um gene alvo pode ter um impacto positivo em seu teor de IMP com relação a seus preditores. Esta pode ser uma evidência que confirma a hipótese de que genes alvo de conjuntos IMP possuem a tendência de controlar diversas vias metabólicas cruciais para a manutenção das funções vitais de um organismo. / Feature selection is a crucial topic in pattern recognition applications, especially in bioinformatics, where problems usually involve data with a large number of variables and a small number of observations. The present work addresses feature selection aspects in the problem of gene regulatory network identification from expression profiles. In particular, we proposed a probabilistic genetic network (PGN) model that returns a network constructed from the recurrent application of feature selection algorithms guided by a criterion function based on conditional entropy. This criterion embeds error estimation by penalizing rarely observed patterns. Results from this model applied to synthetic data and to real microarray data sets from Plasmodium falciparum, a causative agent of malaria, demonstrate the validity of this technique. The method was able not only to reproduce previously established knowledge, but also to produce other potentially relevant results. The intrinsically multivariate prediction (IMP) phenomenon has also been investigated. 
This phenomenon occurs when a feature set is a good predictor of the objects under study, but none of its proper subsets can predict those objects satisfactorily. In this work, the conditions for the emergence of this phenomenon were obtained analytically for sets of 2 and 3 features with respect to a target variable. In the context of gene regulatory networks, evidence was obtained that target genes of IMP sets have great potential to perform vital functions in biological systems. The phenomenon known as canalization is particularly important in this context. In melanoma microarray data, we verified that the DUSP1 gene, known to have a canalizing function, was the one with the largest number of IMP gene sets. It was also verified that all these sets have canalizing predictive logics. Moreover, computational simulations for the construction of networks with 3 or more genes show that the territory size of a target gene can contribute positively to its IMP score with regard to its predictors. This could be evidence supporting the hypothesis that target genes of IMP sets tend to control several metabolic pathways essential to the maintenance of the vital functions of an organism. coeficiente de determinação entropia condicional média malária melanoma microarray predição intrinsecamente multivariada redes de regulação gênica seleção de características coefficient of determination feature selection gene regulatory networks intrinsically multivariate prediction malaria mean conditional entropy melanoma microarray
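The intrinsically multivariate prediction (IMP) idea described in this entry can be illustrated with a textbook case that is not drawn from the thesis: a target equal to the XOR of two binary features. The Python sketch below, under that assumption, computes the mean conditional Shannon entropy of the target given each feature subset. It omits the penalization of rarely observed patterns mentioned in the abstract, and the function name and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import product


def mean_conditional_entropy(rows, target, cols):
    """Mean conditional Shannon entropy (bits) of the target given the
    features in `cols`; 0 means the chosen features predict it perfectly."""
    groups = defaultdict(Counter)
    for row, y in zip(rows, target):
        groups[tuple(row[c] for c in cols)][y] += 1
    n = len(target)
    h = 0.0
    for counts in groups.values():
        m = sum(counts.values())
        h -= (m / n) * sum((c / m) * math.log2(c / m) for c in counts.values())
    return h


# Toy data (an assumption for illustration): target = XOR of two binary features.
rows = list(product((0, 1), repeat=2))
target = [a ^ b for a, b in rows]

h_pair = mean_conditional_entropy(rows, target, cols=(0, 1))
h_first = mean_conditional_entropy(rows, target, cols=(0,))
h_second = mean_conditional_entropy(rows, target, cols=(1,))
print(h_pair, h_first, h_second)
```

Here the pair of features leaves no uncertainty about the target while each feature alone leaves a full bit, which is the pattern an IMP set exhibits: the full set predicts well and no proper subset does.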
179	C. Elegans Metabolic Gene Regulatory Networks: A Dissertation Arda, H. Efsun 30 July 2010 (has links) In multicellular organisms, determining when and where genes will be expressed is critical for their development and physiology. Transcription factors (TFs) are major specifiers of differential gene expression. By establishing physical contacts with the regulatory elements of their target genes, TFs often determine whether the target genes will be expressed or not. These physical and/or regulatory TF-DNA interactions can be modeled into gene regulatory networks (GRNs), which provide a systems-level view of differential gene expression. Thus far, most GRN delineation efforts have focused on metazoan development, whereas the organization of GRNs that pertain to systems physiology remains mostly unexplored. My work has focused on delineating the first gene regulatory network of metabolic genes in the nematode Caenorhabditis elegans, and investigating how this network relates to the energy homeostasis of the nematode. The resulting metabolic GRN consists of ~70 metabolic genes, 100 TFs and more than 500 protein–DNA interactions. It also includes novel protein-protein interactions involving the metabolic transcriptional cofactor MDT-15 and several TFs that occur in the metabolic GRN. On a global level, we found that the metabolic GRN is enriched for nuclear hormone receptors (NHRs). NHRs form a special class of TFs that can interact with diffusible biomolecules and are well-known regulators of lipid metabolism in other organisms, including humans. Interestingly, NHRs comprise the largest family of TFs in nematodes; the C. elegans genome encodes 284 NHRs, most of which are uncharacterized. In our study, we show that the C. elegans NHRs that we retrieved in the metabolic GRN organize into network modules, and that most of these NHRs function to maintain lipid homeostasis in the nematode. 
Network modularity has been proposed to facilitate rapid and robust changes in gene expression. Our results suggest that the C. elegans metabolic GRN may have evolved by combining NHR family expansion with the specific modular wiring of NHRs to enable the rapid adaptation of the animal to different environmental cues. Caenorhabditis elegans Caenorhabditis elegans Proteins Gene Regulatory Networks Gene Expression Regulation Receptors Cytoplasmic and Nuclear Amino Acids, Peptides, and Proteins Animal Experimentation and Research Genetic Phenomena Genetics and Genomics
180	Machine Learning for Exploring State Space Structure in Genetic Regulatory Networks Thomas, Rodney H. 01 January 2018 (has links) Genetic regulatory networks (GRNs) offer a useful model for clinical biology. Specifically, such networks capture interactions among genes, proteins, and other metabolic factors. Unfortunately, it is difficult to understand and predict the behavior of networks that are of realistic size and complexity. In this dissertation, behavior refers to the trajectory of a state, through a series of state transitions over time, to an attractor in the network. This project assumes asynchronous Boolean networks, implying that a state may transition to more than one attractor. The goal of this project is to efficiently identify a network's set of attractors and to predict the likelihood with which an arbitrary state leads to each of the network’s attractors. These probabilities will be represented using a fuzzy membership vector. Predicting fuzzy membership vectors using machine learning techniques may address the intractability posed by networks of realistic size and complexity. Modeling and simulation can be used to provide the necessary training sets for machine learning methods to predict fuzzy membership vectors. The experiments comprise several GRNs, each represented by a set of output classes. These classes consist of thresholds τ and ¬τ, where τ = [τlow, τhigh]; a state s belongs to class τ if the probability of its transitioning to attractor A lies in the range [τlow, τhigh]; otherwise it belongs to class ¬τ. Finally, each machine learning classifier was trained with the training sets that were previously collected. The objective is to explore methods to discover patterns for meaningful classification of states in realistically complex regulatory networks. The research design took a GRN and a machine learning method as input and produced the output class < Ατ > and its negation ¬ < Ατ >. 
For each GRN, attractors were identified, data were collected by sampling each state to create fuzzy membership vectors, and machine learning methods were trained to predict whether a state is in a healthy attractor or not. For T-LGL, SVMs had the highest prediction accuracy (between 93.6% and 96.9%) and precision (between 94.59% and 97.87%). However, naive Bayesian classifiers had the highest recall (between 94.71% and 97.78%). This study showed that the results of all experiments are extremely significant, with p-value < 0.0001. The contribution of this research is to help clinical biologists submit genetic states and get an initial result on their outcomes. For future work, this implementation could use other machine learning classifiers such as XGBoost or deep learning methods. Other suggestions include developing methods that improve the performance of state transitions and so allow larger training sets to be sampled. asynchronous Boolean networks attractors Boolean networks cross-validation decision trees fuzzy basins fuzzy membership vectors fuzzy vectors genetic regulatory networks Markov Chain Monte Carlo naïve Bayesian classifiers support vector machines Computer Sciences
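As a hedged illustration of the sampling idea in this entry, the Python sketch below estimates a fuzzy membership vector for a state of an asynchronous Boolean network by repeated random simulation, then applies a threshold class of the form [τlow, τhigh]. The two-gene toggle-switch network, the uniform random choice of which gene to update, the restriction to fixed-point attractors, and all function names are assumptions made for this example; they are not the dissertation's networks or code.

```python
import random
from collections import Counter
from itertools import product

# Hypothetical toy network (two mutually repressing genes), not from the dissertation.
RULES = (
    lambda s: int(not s[1]),  # gene 0 is repressed by gene 1
    lambda s: int(not s[0]),  # gene 1 is repressed by gene 0
)


def is_fixed_point(state):
    """A state is a fixed-point attractor if no single-gene update changes it."""
    return all(rule(state) == state[i] for i, rule in enumerate(RULES))


def async_step(state, rng):
    """Asynchronous update: pick one gene uniformly at random and apply its rule."""
    i = rng.randrange(len(RULES))
    nxt = list(state)
    nxt[i] = RULES[i](state)
    return tuple(nxt)


def membership_vector(state, attractors, n_runs=2000, max_steps=1000, seed=0):
    """Monte Carlo estimate of the probability of reaching each attractor."""
    rng = random.Random(seed)
    hits = Counter()
    for _ in range(n_runs):
        s = state
        for _ in range(max_steps):
            if is_fixed_point(s):
                hits[s] += 1
                break
            s = async_step(s, rng)
    total = sum(hits.values())
    if total == 0:
        raise ValueError("no run reached a fixed-point attractor")
    return [hits[a] / total for a in attractors]


def in_class(vector, k, tau_low, tau_high):
    """True if the estimated probability of reaching attractor k lies in [tau_low, tau_high]."""
    return tau_low <= vector[k] <= tau_high


# Enumerate fixed points by brute force (feasible only for tiny networks).
ATTRACTORS = [s for s in product((0, 1), repeat=len(RULES)) if is_fixed_point(s)]
vec = membership_vector((0, 0), ATTRACTORS)
print(ATTRACTORS, vec, in_class(vec, 0, 0.4, 0.6))
```

Labelled pairs of states and threshold classes produced this way could serve as a training set for a classifier such as an SVM or a naive Bayesian classifier. Networks with cyclic or complex attractors would need a different attractor test than the fixed-point check used here.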
