111

Modélisation stochastique de l'expression des gènes et inférence de réseaux de régulation / From stochastic modelling of gene expression to inference of regulatory networks

Herbach, Ulysse 27 September 2018
Gene expression in a cell has long been observable only through averaged quantities measured over cell populations. The advent of single-cell techniques now makes it possible to measure RNA and protein levels in individual cells: it turns out that even in an isogenic population, cell-to-cell variability can be very strong. In particular, an averaged description is clearly insufficient for studying cell differentiation, that is, the way stem cells make specialization choices.
In this thesis, we are interested in the emergence of such cell decision-making from underlying gene regulatory networks, which we would like to infer from data. The starting point is the construction of a stochastic gene network model that is able to reproduce the observations using physical arguments. Genes are then described as an interacting particle system that happens to be a piecewise-deterministic Markov process, and our aim is to derive a tractable statistical model from its stationary distribution. We present two approaches: the first is a field approximation that is fairly popular in physics, for which we obtain a concentration result, and the second is based on a particular case that can be solved explicitly, which yields a hidden Markov random field with interesting properties.
112

Outils d'aide à la conception pour l'ingénierie de systèmes biologiques / Design tools for the engineering of biological systems

Rosati, Elise 05 April 2018
In synthetic biology, gene regulatory networks (GRNs) are one of the main ways to create new biological functions that address problems in areas such as therapeutics, biofuels, biomaterials and biosensing. However, the complexity of the designed networks has reached a limit, which restricts the variety of problems they can address. How can biologists overcome this limit and further increase the complexity of their systems? The goal of this thesis is to provide biologists with tools for the design and simulation of complex GRNs. After reviewing the state of the art, we chose to adapt tools from microelectronics to biology and to create a genetic programming algorithm for GRN design. On the one hand, Verilog-A models were written for several biological systems (band-pass, prey-predator, repressilator, XOR) and for the spatiotemporal diffusion of a molecule. These models run well on several electronic circuit simulators (Spectre and the free simulator NgSpice). On the other hand, the first steps toward automated GRN design were taken: an algorithm was developed that optimizes the parameters of a given GRN to meet a specification, and genetic programming was used to optimize both the topology and the parameters of a GRN. These tools proved useful by providing relevant answers to questions arising during the development of biological systems. This work shows that adapting microelectronics tools and genetic programming algorithms to synthetic biology is feasible and useful.
By assisting design and simulation, such tools should encourage the emergence of more complex systems.
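As a rough illustration of the kind of system listed above, the repressilator (a ring of three genes, each repressing the next) can be simulated in a few lines. This is only a sketch under stated assumptions: it is written in Python rather than the Verilog-A used in the thesis, it uses a simplified protein-only model with Hill-type repression and explicit Euler integration, and the function name and all parameter values (`alpha`, `hill`, `dt`) are hypothetical choices made for the example, not taken from the thesis.

```python
def simulate_repressilator(alpha=50.0, hill=3.0, dt=0.01, steps=5000,
                           p0=(1.0, 2.0, 3.0)):
    """Explicit-Euler simulation of a simplified three-gene repressilator.

    Assumed dimensionless model (not the thesis model):
        dp_i/dt = alpha / (1 + p_{i-1}**hill) - p_i
    where gene i is repressed by gene i-1 in a cyclic ring.
    Returns a list of (p_0, p_1, p_2) tuples, one per time point.
    """
    p = list(p0)
    trajectory = [tuple(p)]
    for _ in range(steps):
        # p[i - 1] wraps around for i == 0, closing the repression ring.
        production = [alpha / (1.0 + p[i - 1] ** hill) for i in range(3)]
        p = [p[i] + dt * (production[i] - p[i]) for i in range(3)]
        trajectory.append(tuple(p))
    return trajectory


trajectory = simulate_repressilator()
```

With `dt < 1`, each Euler step is a convex combination of the current level and a production term in `(0, alpha]`, so the simulated levels stay positive and bounded. Whether the ring oscillates depends on the repression strength and cooperativity chosen.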
113

Seleção de características e predição intrinsecamente multivariada em identificação de redes de regulação gênica / Feature selection and intrinsically multivariate prediction in gene regulatory networks identification

Martins Junior, David Corrêa 01 December 2008
Feature selection is a crucial topic in pattern recognition applications, especially in bioinformatics, where problems usually involve data with many variables and few observations. This work addresses feature selection in the problem of identifying gene regulatory networks from gene expression signals. In particular, we propose a probabilistic genetic network (PGN) model that returns a network built by the recurrent application of feature selection algorithms guided by a criterion function based on conditional entropy. This criterion embeds error estimation by penalizing rarely observed patterns. Results from applying this model to synthetic data and to microarray data sets from Plasmodium falciparum, a causative agent of malaria, demonstrate the validity of the technique: it not only reproduced previously established knowledge but also produced new results. Another aspect investigated in this thesis is the phenomenon of intrinsically multivariate prediction (IMP), in which a feature set is a very good predictor of the objects under study while none of its proper subsets can predict them satisfactorily.
In this work, the conditions under which this phenomenon arises were obtained analytically for sets of 2 and 3 features with respect to a target variable. In the context of gene regulatory networks, we obtained evidence that target genes of IMP sets have great potential to perform vital functions in biological systems. The phenomenon known as canalization is particularly important in this context. In melanoma microarray data, we found that the gene DUSP1, known to have a canalizing function, was the gene with the largest number of IMP gene sets, all of which have canalizing predictive logics. Moreover, computational simulations of networks with 3 or more genes show that the territory size of a target gene can contribute positively to its IMP score with respect to its predictors. This may be evidence supporting the hypothesis that target genes of IMP sets tend to control several metabolic pathways essential to maintaining the vital functions of an organism.
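The idea of intrinsically multivariate prediction can be made concrete with a toy example. The sketch below computes the empirical conditional entropy of a target given a set of predictors and applies it to a target that is the XOR of two binary features. It is an illustrative assumption, not the thesis's criterion function: in particular it omits the penalization of rarely observed patterns, and the function name and data layout are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2


def conditional_entropy(samples, predictor_idx, target_idx):
    """Empirical H(target | predictors) in bits.

    samples: list of tuples of discrete values.
    predictor_idx: indices of the predictor columns.
    target_idx: index of the target column.
    """
    joint = Counter()     # counts of (predictor pattern, target value)
    marginal = Counter()  # counts of predictor pattern alone
    for row in samples:
        x = tuple(row[i] for i in predictor_idx)
        joint[(x, row[target_idx])] += 1
        marginal[x] += 1
    n = len(samples)
    h = 0.0
    for (x, _y), count in joint.items():
        h -= (count / n) * log2(count / marginal[x])
    return h


# Toy data: the target is the XOR of two binary predictors,
# with every input pattern observed exactly once.
samples = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2)]
h_x1 = conditional_entropy(samples, [0], 2)       # one predictor alone
h_x2 = conditional_entropy(samples, [1], 2)       # the other alone
h_both = conditional_entropy(samples, [0, 1], 2)  # both together
```

On this toy data each predictor alone leaves one full bit of uncertainty about the target, while the pair leaves none, which is the pattern the abstract describes: the full set predicts well and every proper subset predicts poorly.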
114

Estudo dinâmico da expressão gênica global durante a interação STEC-enterócito utilizando séries temporais / Dynamic study of global gene expression along STEC-enterocyte interaction using time series

Priscila Iamashita 27 November 2017
Shiga toxin-producing Escherichia coli (STEC) are important human pathogens, causing illness ranging from diarrhea to hemolytic uremic syndrome (HUS). Several serotypes are associated with HUS, such as O157:H7 and O113:H21. In Brazil, O113:H21 strains are commonly found in cattle carcasses and feces but have so far not been isolated from HUS patients. Previously, our group conducted comparative gene co-expression network (GCN) analyses of two O113:H21 STEC strains: EH41, isolated from a HUS patient in Australia, and Ec472/01, isolated from bovine feces in Brazil.
Differential transcriptome profiles for EH41 and Ec472/01 revealed a gene set expressed exclusively in EH41, which includes dicA, a transcriptional regulator and putative virulence factor regulator. GCN analysis showed that most of these genes lie in a single EH41-specific transcriptional module that may be associated with virulence factors. Therefore, the present work used a systems biology approach to investigate the differential response, genomic and phenotypic, of Caco-2 enterocytes to EH41 (Caco-2/EH41) or to Ec472/01 (Caco-2/Ec472) during enterocyte-bacteria interaction. The genomic analysis was based on temporal GCN data in order to better understand the molecular mechanisms underlying the capacity to cause HUS. Phenotypic alterations in Caco-2 cells during the interaction were assessed by scanning electron microscopy (SEM). The genomic analysis showed that the molecular mechanism of the Caco-2 response to EH41 is clearly distinct from the response to Ec472/01. The GCN topological analyses for the Caco-2/EH41 group revealed loss of scale-free status after one hour of interaction, persistence of this condition through the second hour, and establishment of a new gene hierarchy thereafter. These events resemble the network model of health-disease transition: the newly established network represents an adaptive cell response to the pathogen, not a return to a "normal" state. Conversely, the networks for the Caco-2/Ec472 group showed a slow and progressive loss of scale-free status without its restoration by the end of the time interval studied, which suggests a more prolonged transition state for reorganizing the network hierarchy. Characterizing the transcriptional modules revealed the dynamics of the molecular mechanisms involved in the differential Caco-2 responses to the two STEC isolates. EH41 induces a rapid inflammatory and apoptotic response from the first hour of enterocyte-bacteria interaction.
In contrast, the Caco-2 response to Ec472/01 is characterized by phagocytosis activation at the first hour, followed by the expression of immune homeostasis modulators from the second hour. SEM analysis showed more intense microvilli destruction in cells exposed to EH41 than in cells exposed to Ec472/01. Integrating the genomic and phenotypic data, we conclude that EH41, compared with Ec472/01, induces greater and earlier alterations in global gene expression in Caco-2, together with excessive inflammatory and apoptotic responses, leading to the more pronounced morphological alterations observed by SEM. Altogether, these results contribute to a better understanding of the molecular mechanisms involved in the pathogenicity of HUS-associated STEC. Future work should include the investigation of virulence factors and molecular pathways involved in inducing immune responses that can lead to HUS.
115

A functional genomics approach to map transcriptional and post-transcriptional gene regulatory networks

Bhinge, Akshay Anant 15 October 2009
It has been suggested that organismal complexity correlates with the complexity of gene regulation. Transcriptional control of gene expression is mediated by binding of regulatory proteins to cis-acting sequences on the genome. Hence, it is crucial to identify the chromosomal targets of transcription factors (TFs) to delineate the transcriptional regulatory networks underlying gene expression programs. The development of ChIP-chip technology has enabled high-throughput mapping of TF binding sites across the genome. However, the technology has many limitations, including the availability of whole-genome arrays for complex organisms such as human or mouse. To circumvent these limitations, we developed the Sequence Tag Analysis of Genomic Enrichment (STAGE) methodology, which is based on extracting short DNA sequences or “tags” from ChIP-enriched DNA. With improvements in sequencing technologies, we applied the recently developed ChIP-Seq technique, i.e. ChIP followed by ultra-high-throughput sequencing, to identify binding sites for the TF E2F4 across the human genome. We identified previously uncharacterized E2F4 binding sites in intergenic regions and found that several microRNAs are potential E2F4 targets. Binding of TFs to their respective chromosomal targets requires access of the TF to its regulatory element, which is strongly influenced by nucleosome remodeling. To understand nucleosome remodeling in response to transcriptional perturbation, we used ultra-high-throughput sequencing to map nucleosome positions in yeast that were subjected to heat shock or grown normally. We generated nucleosome remodeling profiles across yeast promoters and found that specific remodeling patterns correlate with specific TFs active during the transcriptional reprogramming. Another important aspect of gene regulation operates at the post-transcriptional level.
MicroRNAs (miRNAs) are ~22 nucleotide non-coding RNAs that suppress translation or mark mRNAs for degradation. MiRNAs regulate TFs and in turn can be regulated by TFs. We characterized a TF-miRNA network involving the oncofactor Myc and the miRNA miR-22 that suppresses the interferon pathway as primary fibroblasts enter a stage of rapid proliferation. We found that miR-22 suppresses the interferon pathway by inhibiting nuclear translocation of the TF NF-kappaB. Our results show how the oncogenic TF Myc cross-talks with other TF regulatory pathways via a miRNA intermediary.
116

Inverse inference in the asymmetric Ising model

Sakellariou, Jason 22 February 2013
Recent experimental techniques in biology have made it possible to acquire overwhelming amounts of data on complex biological networks, such as neural networks, gene regulation networks and protein-protein interaction networks. These techniques can record the states of individual components of such networks (neurons, genes, proteins) for a large number of configurations. However, the most biologically relevant information lies in the connectivity of the networks and in the way their components interact, which these techniques cannot record directly. The aim of this thesis is to study statistical methods for inferring information about the connectivity of complex networks from experimental data. The subject is approached from a statistical physics point of view, drawing on the arsenal of methods developed in the study of spin glasses. Spin glasses are prototypes of networks of discrete variables interacting in a complex way and are widely used to model biological networks. After an introduction to the models used and a discussion of the biological motivation of the thesis, all known methods of network inference are introduced and analysed in terms of their performance. Then, in the third part of the thesis, a new method is proposed which relies on the observation that interactions in biology are not necessarily symmetric (i.e. the interaction from node A to node B is not the same as the one from B to A). It is shown that this assumption leads to methods that are both exact and efficient: the interactions can be computed exactly, given a sufficient amount of data, and in a reasonable amount of time. This is an important original contribution since no other method is known to be both exact and efficient.
117

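Modélisation de l'évolution temporelle de l'expression des gènes sur la base de données de puces à ADN: application à la drosophile / Modelling the temporal evolution of gene expression from DNA microarray data: application to Drosophila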

Haye, Alexandre 24 June 2011
This doctoral thesis concerns the development and use of mathematical and computational methods that exploit time-series gene expression data from DNA microarrays in order to rationalize and model gene regulatory networks. To this end, we focused mainly on gene expression data for the fruit fly (Drosophila melanogaster) during its development, from the embryonic stage to the adult stage. We also studied data on the development of other higher eukaryotes, the response of a bacterium subjected to various stresses, and the cell cycle of a yeast. The work was organized around three main parts: the detection of developmental stages and perturbations, the classification of expression profiles, and the modelling of regulatory networks.
First, examination of the expression data led us to study in more depth the phenomena that occur at changes of developmental stage in Drosophila. For this purpose, two methods for automatically detecting these changes were developed and applied to the available time-series data on the development of higher eukaryotes. They were also applied to time-series data on external perturbations of bacteria. This study showed that a simple mathematical formulation recovers, from the expression profiles alone, the experimental time points at which a perturbation or a change of developmental stage is observed. Moreover, the response to an external perturbation proves indistinguishable from a succession of developmental stages on the basis of temporal expression profiles alone.
Second, because of the size of the problem posed by expression data for several thousand genes, and because the regulatory roles of genes with similar expression profiles cannot be distinguished, it proved necessary to classify genes according to their expression profiles. Building on the results of the developmental stage detection, the approach is to group genes whose temporal expression profiles behave similarly not only over the complete time series but also within each developmental stage. To this end, three distances were proposed and used in a hierarchical classification of the Drosophila expression data.
Third, linear and nonlinear model structures, together with parameter estimation and parameter reduction methods, were developed and used to reproduce the expression data for Drosophila development. The results showed that, with a simple linear model structure, the experimental profiles were reproduced very well and that, in this case, the Drosophila gene regulatory network could make do with low connectivity (on average 3 connections per gene class), without any a priori assumption. However, the linear models were then seriously called into question by analyses of robustness to parameter perturbations and of the stability of the profiles after extrapolation in time. Consequently, four nonlinear model structures and five parameter reduction methods were proposed and used to reconcile the criteria of data reproduction, robustness and stability of the identified networks. In addition, these modelling methods were applied to a subset of 20 genes involved in Drosophila muscle development, for which 36 interactions have been validated experimentally, as well as to noisy synthetic profiles. We found that more than half of the connections and non-connections are recovered by three nonlinear models. The results of this study made it possible to eliminate certain model structures and reduction methods and highlighted several future directions for modelling gene regulatory networks. / Doctorate in Engineering Sciences / info:eu-repo/semantics/nonPublished
118

Redes complexas de expressão gênica: síntese, identificação, análise e aplicações / Gene expression complex networks: synthesis, identification, analysis and applications

Fabricio Martins Lopes 21 February 2011
Os avanços na pesquisa em biologia molecular e bioquímica permitiram o desenvolvimento de técnicas capazes de extrair informações moleculares de milhares de genes simultaneamente, como DNA Microarrays, SAGE e, mais recentemente RNA-Seq, gerando um volume massivo de dados biológicos. O mapeamento dos níveis de transcrição dos genes em larga escala é motivado pela proposição de que o estado funcional de um organismo é amplamente determinado pela expressão de seus genes. No entanto, o grande desafio enfrentado é o pequeno número de amostras (experimentos) com enorme dimensionalidade (genes). Dessa forma, se faz necessário o desenvolvimento de novas técnicas computacionais e estatísticas que reduzam o erro de estimação intrínseco cometido na presença de um pequeno número de amostras com enorme dimensionalidade. Neste contexto, um foco importante de pesquisa é a modelagem e identificação de redes de regulação gênica (GRNs) a partir desses dados de expressão. O objetivo central nesta pesquisa é inferir como os genes estão regulados, trazendo conhecimento sobre as interações moleculares e atividades metabólicas de um organismo. Tal conhecimento é fundamental para muitas aplicações, tais como o tratamento de doenças, estratégias de intervenção terapêutica e criação de novas drogas, bem como para o planejamento de novos experimentos. Nessa direção, este trabalho apresenta algumas contribuições: (1) software de seleção de características; (2) nova abordagem para a geração de Redes Gênicas Artificiais (AGNs); (3) função critério baseada na entropia de Tsallis; (4) estratégias alternativas de busca para a inferência de GRNs: SFFS-MR e SFFS-BA; (5) investigação biológica das redes gênicas envolvidas na biossíntese de tiamina, usando a Arabidopsis thaliana como planta modelo. 
O software de seleção de características consiste de um ambiente de código livre, gráfico e multiplataforma para problemas de bioinformática, que disponibiliza alguns algoritmos de seleção de características, funções critério e ferramentas de visualização gráfica. Em particular, implementa um método de inferência de GRNs baseado em seleção de características. Embora existam vários métodos propostos na literatura para a modelagem e identificação de GRNs, ainda há um problema muito importante em aberto: como validar as redes identificadas por esses métodos computacionais? Este trabalho apresenta uma nova abordagem para validação de tais algoritmos, considerando três aspectos principais: (a) Modelo para geração de Redes Gênicas Artificiais (AGNs), baseada em modelos teóricos de redes complexas, os quais são usados para simular perfis temporais de expressão gênica; (b) Método computacional para identificação de redes gênicas a partir de dados temporais de expressão; e (c) Validação das redes identificadas por meio do modelo AGN. O desenvolvimento do modelo AGN permitiu a análise e investigação das características de métodos de inferência de GRNs, levando ao desenvolvimento de um estudo comparativo entre quatro métodos disponíveis na literatura. 
A avaliação dos métodos de inferência levou ao desenvolvimento de novas metodologias para essa tarefa: (a) uma função critério, baseada na entropia de Tsallis, com objetivo de inferir os inter-relacionamentos gênicos com maior precisão; (b) uma estratégia alternativa de busca para a inferência de GRNs, chamada SFFS-MR, a qual tenta explorar uma característica local das interdependências regulatórias dos genes, conhecida como predição intrinsecamente multivariada; e (c) uma estratégia de busca, interativa e flutuante, que baseia-se na topologia de redes scale-free, como uma característica global das GRNs, considerada como uma informação a priori, com objetivo de oferecer um método mais adequado para essa classe de problemas e, com isso, obter resultados com maior precisão. Também é objetivo deste trabalho aplicar a metodologia desenvolvida em dados biológicos, em particular na identificação de GRNs relacionadas a funções específicas de Arabidopsis thaliana. Os resultados experimentais, obtidos a partir da aplicação das metodologias propostas, mostraram que os respectivos ganhos de desempenho foram significativos e adequados para os problemas a que foram propostos. / Thanks to recent advances in molecular biology and biochemistry, allied to an ever increasing amount of experimental data, the functional state of thousands of genes can now be extracted simultaneously by using methods such as DNA microarrays, SAGE, and more recently RNA-Seq, generating a massive volume of biological data. The mapping of gene transcription levels at large scale is motivated by the proposition that information of the functional state of an organism is broadly determined by its gene expression. However, the main limitation faced is the small number of samples (experiments) with huge dimensionalities (genes). 
Thus, it is necessary to develop new computational and statistical techniques to reduce the estimation error inherent in working with a small number of samples of high dimensionality. In this context, the modeling and identification of gene regulatory networks (GRNs) from expression data sets are particularly important lines of investigation. The main objective of this research is to infer how genes are regulated, providing knowledge about the molecular interactions and metabolic activities of an organism. Such knowledge is fundamental for many applications, such as disease treatment, therapeutic intervention strategies, and drug design, as well as for planning new high-throughput experiments. To this end, this work presents the following contributions: (1) feature selection software; (2) a new approach for generating artificial gene networks (AGNs); (3) a criterion function based on Tsallis entropy; (4) alternative search strategies for GRN inference, SFFS-MR and SFFS-BA; and (5) a biological investigation of the GRNs involved in thiamine biosynthesis, using Arabidopsis thaliana as a model plant. The feature selection software is an open-source, multiplatform graphical environment for bioinformatics problems, which provides several feature selection algorithms, criterion functions, and graphical visualization tools. In particular, it implements a feature selection method for GRN inference. Although several methods have been proposed in the literature for the modeling and identification of GRNs, an important problem remains open: how can such methods and their results be validated? 
This work presents a new approach for validating such algorithms, considering three main aspects: (a) a model for generating Artificial Gene Networks (AGNs), based on theoretical models of complex networks, which is used to simulate temporal expression data; (b) a computational method for identifying GRNs from temporal expression data; and (c) validation of the identified networks by comparison with the original AGN. The development of the AGN model made it possible to analyze and investigate the characteristics of GRN inference methods, leading to a comparative study of four inference methods available in the literature. The evaluation of inference methods led to the development of new methodologies for this task: (a) a new criterion function based on Tsallis entropy, intended to infer gene inter-relationships with greater precision; (b) an alternative search strategy for GRN inference, called SFFS-MR, which tries to exploit a local property of regulatory gene interdependencies known as intrinsically multivariate prediction; and (c) an interactive, floating search strategy based on scale-free network topology, a global property of GRNs treated as a priori information, in order to provide a method better suited to this class of problems and thereby achieve more precise results. A further objective of this work is to apply the developed methodology to biological data, in particular to identify GRNs related to specific functions of Arabidopsis thaliana. The experimental results obtained by applying the proposed methodologies indicate that the respective performance gains were significant and adequate for the problems they were designed to address.
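As a rough illustration of what a Tsallis-entropy criterion function might look like, the sketch below scores a candidate predictor set by the conditional Tsallis entropy of the target gene. The abstract does not give the thesis's exact formula, so the function names, the default `q = 2.0`, and the frequency-weighted averaging are assumptions made for this example only.

```python
import math
from collections import Counter, defaultdict


def tsallis_entropy(probs, q=2.0):
    """Tsallis entropy S_q = (1 - sum(p**q)) / (q - 1).

    As q -> 1 this tends to the Shannon entropy (in nats), handled explicitly here.
    """
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)


def conditional_tsallis(predictors, target, q=2.0):
    """Hypothetical criterion: mean Tsallis entropy of the target given each
    observed predictor state, weighted by how often that state occurs.
    Lower values mean the predictors determine the target more tightly."""
    groups = defaultdict(Counter)
    for state, value in zip(predictors, target):
        groups[tuple(state)][value] += 1
    n = len(target)
    total = 0.0
    for counts in groups.values():
        m = sum(counts.values())
        total += (m / n) * tsallis_entropy([c / m for c in counts.values()], q)
    return total


# Toy binary expression data: one predictor gene, two candidate targets.
predictor = [(0,), (0,), (1,), (1,)]
copied_target = [0, 0, 1, 1]     # fully determined by the predictor
unrelated_target = [0, 1, 0, 1]  # independent of the predictor
```

With these toy samples, `conditional_tsallis(predictor, copied_target)` is 0 and `conditional_tsallis(predictor, unrelated_target)` is 0.5, so a search strategy minimizing the criterion would prefer the first pairing.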
119

Seleção de características e predição intrinsecamente multivariada em identificação de redes de regulação gênica / Feature selection and intrinsically multivariate prediction in gene regulatory networks identification

David Corrêa Martins Junior 01 December 2008
Seleção de características é um tópico muito importante em aplicações de reconhecimento de padrões, especialmente em bioinformática, cujos problemas são geralmente tratados sobre um conjunto de dados envolvendo muitas variáveis e poucas observações. Este trabalho analisa aspectos de seleção de características no problema de identificação de redes de regulação gênica a partir de sinais de expressão gênica. Particularmente, propusemos um modelo de redes gênicas probabilísticas (PGN) que devolve uma rede construída a partir da aplicação recorrente de algoritmos de seleção de características orientados por uma função critério baseada em entropia condicional. Tal critério embute a estimação do erro por penalização de amostras raramente observadas. Resultados desse modelo aplicado a dados sintéticos e a conjuntos de dados de microarray de Plasmodium falciparum, um agente causador da malária, demonstram a validade dessa técnica, tendo sido capaz não apenas de reproduzir conhecimentos já produzidos anteriormente, como também de produzir novos resultados. Outro aspecto investigado nesta tese é o fenômeno da predição intrinsecamente multivariada (IMP), ou seja, o fato de um conjunto de características ser um ótimo caracterizador dos objetos em questão, mas qualquer de seus subconjuntos propriamente contidos não conseguirem representá-los de forma satisfatória. Neste trabalho, as condições para o surgimento desse fenômeno foram obtidas de forma analítica para conjuntos de 2 e 3 características em relação a uma variável alvo. No contexto de redes de regulação gênica, foram obtidas evidências de que genes alvo de conjuntos IMP possuem um enorme potencial para exercerem funções vitais em sistemas biológicos. O fenômeno conhecido como canalização é particularmente importante nesse contexto. 
Em dados de microarray de melanoma, constatamos que o gene DUSP1, conhecido por exercer função canalizadora, foi aquele que obteve o maior número de conjuntos de genes IMP, sendo que todos eles possuem lógicas de predição canalizadoras. Além disso, simulações computacionais para construção de redes com 3 ou mais genes mostram que o tamanho do território de um gene alvo pode ter um impacto positivo em seu teor de IMP com relação a seus preditores. Esta pode ser uma evidência que confirma a hipótese de que genes alvo de conjuntos IMP possuem a tendência de controlar diversas vias metabólicas cruciais para a manutenção das funções vitais de um organismo. / Feature selection is a crucial topic in pattern recognition applications, especially in bioinformatics, where problems usually involve data with a large number of variables and a small number of observations. The present work addresses aspects of feature selection in the problem of identifying gene regulatory networks from expression profiles. In particular, we propose a probabilistic genetic network (PGN) model that returns a network built by repeatedly applying feature selection algorithms guided by a criterion function based on conditional entropy. This criterion embeds error estimation by penalizing rarely observed patterns. Results from this model applied to synthetic data and to microarray data sets from Plasmodium falciparum, a causative agent of malaria, demonstrate the validity of the technique, which was able not only to reproduce previously established knowledge but also to produce new, potentially relevant results. The intrinsically multivariate prediction (IMP) phenomenon was also investigated. It occurs when a feature set is a very good predictor of the objects under study, but none of its proper subsets can predict them satisfactorily. 
In this work, the conditions under which this phenomenon arises were obtained analytically for sets of 2 and 3 features with respect to a target variable. In the context of gene regulatory networks, evidence was obtained that target genes of IMP sets have great potential to perform vital functions in biological systems. The phenomenon known as canalization is particularly important in this context. In melanoma microarray data, we found that the DUSP1 gene, known to have a canalizing function, was the gene with the largest number of IMP gene sets, all of which have canalizing predictive logics. Moreover, computational simulations of networks with 3 or more genes show that the territory size of a target gene can contribute positively to its IMP score with respect to its predictors. This may be evidence supporting the hypothesis that target genes of IMP sets tend to control several metabolic pathways essential to maintaining the vital functions of an organism.
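The IMP idea can be made concrete with a small sketch that uses conditional Shannon entropy as the criterion. The "gap" computed here (best proper subset minus full set) is an illustrative stand-in and not the thesis's IMP score; the penalization of rarely observed patterns mentioned in the abstract is omitted, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations, product


def conditional_entropy(samples, predictor_idx, target_idx):
    """Conditional Shannon entropy H(target | predictors) in bits,
    estimated from observed frequencies (no penalization applied)."""
    groups = defaultdict(Counter)
    for row in samples:
        key = tuple(row[i] for i in predictor_idx)
        groups[key][row[target_idx]] += 1
    n = len(samples)
    h = 0.0
    for counts in groups.values():
        m = sum(counts.values())
        h_state = -sum((c / m) * math.log2(c / m) for c in counts.values())
        h += (m / n) * h_state
    return h


def imp_gap(samples, predictor_idx, target_idx):
    """Illustrative IMP measure: how much worse the best proper subset of the
    predictors is than the full set. Needs at least two predictors."""
    full = conditional_entropy(samples, predictor_idx, target_idx)
    best_subset = min(
        conditional_entropy(samples, subset, target_idx)
        for size in range(1, len(predictor_idx))
        for subset in combinations(predictor_idx, size)
    )
    return best_subset - full


# Each row is (gene_a, gene_b, target) over all four input combinations.
xor_samples = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2)]
and_samples = [(a, b, a & b) for a, b in product((0, 1), repeat=2)]
```

For the XOR target, `imp_gap(xor_samples, (0, 1), 2)` is 1 bit: the pair predicts the target perfectly while each gene alone predicts nothing, which is the extreme IMP case. For the AND target, a canalizing function in which one input value fixes the output, the gap is 0.5 bit, because a single gene already resolves half of the uncertainty.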
120

C. Elegans Metabolic Gene Regulatory Networks: A Dissertation

Arda, H. Efsun 30 July 2010
In multicellular organisms, determining when and where genes will be expressed is critical for their development and physiology. Transcription factors (TFs) are major specifiers of differential gene expression. By establishing physical contacts with the regulatory elements of their target genes, TFs often determine whether the target genes will be expressed or not. These physical and/or regulatory TF-DNA interactions can be modeled into gene regulatory networks (GRNs), which provide a systems-level view of differential gene expression. Thus far, most GRN delineation efforts have focused on metazoan development, whereas the organization of GRNs that pertain to systems physiology remains mostly unexplored. My work has focused on delineating the first gene regulatory network of the metabolic genes of the nematode Caenorhabditis elegans, and investigating how this network relates to the energy homeostasis of the nematode. The resulting metabolic GRN consists of ~70 metabolic genes, 100 TFs and more than 500 protein-DNA interactions. It also includes novel protein-protein interactions involving the metabolic transcriptional cofactor MDT-15 and several TFs that occur in the metabolic GRN. On a global level, we found that the metabolic GRN is enriched for nuclear hormone receptors (NHRs). NHRs form a special class of TFs that can interact with diffusible biomolecules and are well-known regulators of lipid metabolism in other organisms, including humans. Interestingly, NHRs comprise the largest family of TFs in nematodes; the C. elegans genome encodes 284 NHRs, most of which are uncharacterized. In our study, we show that the C. elegans NHRs that we retrieved in the metabolic GRN organize into network modules, and that most of these NHRs function to maintain lipid homeostasis in the nematode. Network modularity has been proposed to facilitate rapid and robust changes in gene expression. Our results suggest that the C. elegans metabolic GRN may have evolved by combining NHR family expansion with the specific modular wiring of NHRs to enable the rapid adaptation of the animal to different environmental cues.
