Global ETD Search

1	HAPPI: A Bioinformatics Database Platform Enabling Network Biology Studies Mamidipalli, SudhaRani 29 June 2006 Submitted to the faculty of the Informatics Graduate Program in partial fulfillment of the requirements for the degree Master of Science in Bioinformatics in the School of Informatics, Indiana University, May 2006 / The publication of the draft human genome, consisting of 30,000 genes, is merely the beginning of genome biology. A new way to understand the complexity and richness of the molecular and cellular functions of proteins in biological processes is through the study of biological networks. These networks include protein-protein interaction networks, gene regulatory networks, and metabolic networks. In this thesis, we focus on human protein-protein interaction networks using informatics techniques. First, we performed a thorough literature survey to document the experimental methods used to detect and collect protein interactions, the current public databases that store these interactions, and the computational software used to predict, validate, and interpret protein networks. Then, we developed the Human Annotated Protein-Protein Interaction (HAPPI) database to manage a wealth of integrated information related to protein functions, protein-protein functional links, and protein-protein interactions. Approximately 12,900 proteins from Swissprot, 57,900 proteins from Trembl, 52,186 protein domains from Swisspfam, 4,084 gene pathways from KEGG, 2,403,190 interactions from STRING, and 51,207 interactions from OPHID public databases were integrated into a single relational database platform using Oracle 10g on an IU Supercomputing grid. We further assigned a confidence score to each protein interaction pair to help assess the quality and reliability of each protein-protein interaction. We hosted the database on the Discovery Informatics and Computing web site, where it is now publicly accessible.
The HAPPI database differs from other protein interaction databases in the following respects: 1) It focuses on human protein interactions and contains approximately 860,000 high-confidence protein interaction records, making it one of the most complete and reliable sources of human protein interactions today; 2) It includes thorough protein domain, gene, and pathway information for interacting proteins, thereby providing a complete view of protein functional information; 3) It contains a consistent ranking score that can be used to gauge the confidence of protein interactions. To show the benefits of the HAPPI database, we performed a case study using the insulin signaling pathway in collaboration with a biology team on campus. We began by taking two sets of proteins that were previously well studied as separate processes, set A and set B. We queried these proteins against the HAPPI database and derived high-confidence protein interaction data sets annotated with known KEGG pathways. We then organized these protein interactions on a network diagram. The end result shows many novel hub proteins that connect set A or B proteins. Some hub proteins are even novel members outside of any annotated pathway, making them interesting targets to validate in subsequent biological studies. Bioinformatics Database Genome Regulatory network
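The last step of the case study above (filter interactions by confidence, then look for proteins that link the two query sets) can be sketched roughly as follows. This is only an illustration under stated assumptions: the protein names, the scores, the 0.75 cutoff, and the function name `find_hubs` are all hypothetical, and "hub" is read here as a protein outside both sets that has at least one high-confidence partner in set A and one in set B. It is not the HAPPI schema or scoring scheme.

```python
from collections import defaultdict

def find_hubs(interactions, set_a, set_b, min_score=0.75):
    """Return proteins outside A and B that have at least one
    high-confidence partner in A and at least one in B."""
    neighbors = defaultdict(set)
    for p, q, score in interactions:
        if score >= min_score:          # keep only high-confidence pairs
            neighbors[p].add(q)
            neighbors[q].add(p)

    hubs = {}
    for protein, partners in neighbors.items():
        if protein in set_a or protein in set_b:
            continue                    # query proteins are not reported as hubs
        a_hits = partners & set_a
        b_hits = partners & set_b
        if a_hits and b_hits:
            hubs[protein] = (sorted(a_hits), sorted(b_hits))
    return hubs

# Hypothetical scored interaction pairs: (protein, protein, confidence)
interactions = [
    ("A1", "H1", 0.90),
    ("H1", "B1", 0.80),
    ("A2", "H2", 0.95),
    ("H2", "B2", 0.40),   # below the cutoff, so H2 does not reach set B
    ("A1", "B1", 0.99),
]
set_a = {"A1", "A2"}
set_b = {"B1", "B2"}

hubs = find_hubs(interactions, set_a, set_b)
print(hubs)
```

Raising or lowering `min_score` changes which candidate hubs survive, which is one way a per-pair confidence score can be put to use.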
2	Developing machine learning tools to understand transcriptional regulation in plants Song, Qi 09 September 2019 Abiotic stresses constitute a major category of stresses that negatively impact plant growth and development. It is important to understand how plants cope with environmental stresses and reprogram gene responses, which in turn confer stress tolerance. Recent advances in genomic technologies have led to the generation of large amounts of genomic data for the model plant Arabidopsis. To understand gene responses activated by specific external stress signals, these large-scale data sets need to be analyzed to generate new insights into gene functions in stress responses. This poses new computational challenges in mining gene associations and reconstructing regulatory interactions from large-scale data sets. In this dissertation, several computational tools were developed to address these challenges. In Chapter 2, ConSReg was developed to infer condition-specific regulatory interactions and prioritize transcription factors (TFs) that are likely to play condition-specific regulatory roles. A comprehensive investigation was performed to optimize the performance of ConSReg, and a systematic recovery of nitrogen response TFs was performed to evaluate it. In Chapter 3, CoReg was developed to infer co-regulation between genes, using only regulatory networks as input. CoReg was compared to other computational methods, and the results showed that CoReg outperformed them. CoReg was further applied to identify modules in a regulatory network generated from DAP-seq (DNA affinity purification sequencing). Using a large expression dataset generated under many abiotic stress treatments, many regulatory modules with common regulatory edges were found to be highly co-expressed, suggesting that target modules are structurally stable modules under abiotic stress conditions.
In Chapter 4, exploratory analysis was performed to classify cell types for Arabidopsis root single-cell RNA-seq data. This is a first step towards construction of a cell-type-specific regulatory network for Arabidopsis root cells, which is important for improving current understanding of stress response. / Doctor of Philosophy / Abiotic stresses constitute a major category of stresses that negatively impact plant growth and development. It is important to understand how plants cope with environmental stresses and reprogram gene responses, which in turn confer stress tolerance to plants. Genomics technology has been used in the past decade to generate gene expression data under different abiotic stresses for the model plant Arabidopsis. Recent genomic technologies, such as DAP-seq, have generated large-scale regulatory maps that provide information about which genes have the potential to regulate other genes in the genome. However, this technology does not provide context-specific interactions. It is unknown which transcription factor can regulate which gene under a specific abiotic stress condition. To address this challenge, several computational tools were developed to identify regulatory interactions and co-regulated genes for stress response. In addition, using single-cell RNA-seq data generated from the model plant organism Arabidopsis, preliminary analysis was performed to build a model that classifies Arabidopsis root cell types. This analysis is the first step towards the ultimate goal of constructing a cell-type-specific regulatory network for Arabidopsis, which is important for improving current understanding of stress response in plants. regulatory network Machine learning genomics
3	Inference of gene regulatory networks for Mus musculus by incorporating network motifs from yeast. Weishaupt, Holger January 2007 Recently, particular interest has been drawn to the inference of gene regulatory networks from microarray gene expression data. Despite major improvements in data-based methods, network reconstruction from expression data alone still presents a computationally complex (NP-hard) problem. This work incorporates additional information, namely regulatory motifs from yeast, when inferring a gene regulatory network for mouse genes. The hypothesis put forward was that regulatory patterns analogous to these motifs are present in the set of mouse genes and can be identified by comparing yeast and mouse genes in terms of sequence similarity or Gene Ontology (The Gene Ontology Consortium 2000) annotations. To examine this hypothesis, small permutations of genes with high similarity to such yeast gene regulatory motifs were first tested against simple data-driven regulatory networks by means of consistency with the expression data. Second, using the best-scoring interactions provided by these permutations, networks were inferred for the whole set of mouse genes. The results showed that individual permutations of genes with a high similarity to a given yeast motif did not perform better than low-scoring motifs, and that complete networks inferred from regulatory interactions provided by permutations showed neither a noticeable improvement over the corresponding data-driven network nor a high consistency with the expression data. It was therefore found that the hypothesis failed, i.e. neither the use of sequence similarity nor the search for identical functional annotations between mouse and yeast genes made it possible to identify sets of genes that showed a high consistency with the expression data or that would have allowed for improved gene regulatory network inference.
gene regulatory network regulatory motif Bioinformatics Bioinformatik
5	The Potential Power of Dynamics in Epistasis Analysis Awdeh, Aseel January 2015 Inferring regulatory relationships between genes, including the direction and the nature of influence between them, is the foremost problem in the field of genetics. One classical approach to this problem is epistasis analysis. Broadly speaking, epistasis analysis infers the regulatory relationships between a pair of genes in a genetic pathway by considering the patterns of change in an observable trait resulting from single and double deletion of genes. More specifically, a “surprising” situation occurs when the phenotype of a double mutant has a similar, aggravating or alleviating effect compared to the phenotype resulting from the single deletion of either one of the genes. As useful as this broad approach has been, there are limits to its ability to discriminate between alternative pathway structures, meaning it is not always possible to infer the relationship between the genes. Here, we explore the possibility of dynamic epistasis analysis. In addition to performing genetic perturbations, we drive a genetic pathway with a dynamic, time-varying upstream signal, and the phenotypic consequence is measured at each time step. We explore the theoretical power of dynamic epistasis analysis by conducting an identifiability analysis of Boolean models of genetic pathways, comparing static and dynamic approaches. We also explore the identifiability of individual links in the pathway. Through these evaluations, we quantify how helpful the addition of dynamics is. We believe that a dynamic input in addition to epistasis analysis is a powerful tool to discriminate between different networks. Our primary findings show that the use of a dynamic input signal alone, without genetic perturbations, appears to be very weak in comparison with the more traditional genetic approaches based on the deletion of genes.
However, the combination of dynamic input with genetic perturbations is far more powerful than the classical epistasis analysis approach. In all cases, we find that even relatively simple input dynamics combined with gene deletions greatly increase the power of epistasis analysis to discriminate between alternative network structures and to confidently identify individual links in a network. Our positive results show the potential value of dynamics in epistasis analysis. epistasis gene regulatory network phenotype dynamics
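To make the idea of identifiability concrete, here is a toy sketch, not the thesis's model class or analysis. It assumes two hypothetical Boolean pathways with synchronous updates (a linear chain and a parallel AND structure), takes the "static" experiment to be the steady-state phenotype under a constant signal for each deletion background, and takes the "dynamic" experiment to be the full phenotype trace under a one-step pulse. All names (`simulate`, `static_signature`, `dynamic_signature`) are illustrative.

```python
# Two hypothetical pathways driven by an upstream signal S, with genes X and Y.
# Each model is (update rule, phenotype read-out); updates are synchronous.
def linear_step(s, x, y):
    return s, x            # S -> X -> Y

def parallel_step(s, x, y):
    return s, s            # S -> X and S -> Y

MODELS = {
    "linear": (linear_step, lambda x, y: y),          # phenotype P = Y
    "parallel": (parallel_step, lambda x, y: x and y) # phenotype P = X AND Y
}

DELETIONS = [(), ("X",), ("Y",), ("X", "Y")]  # wild type, singles, double

def simulate(model, signal, deleted=()):
    """Return the phenotype at every time step; deleted genes are held at 0."""
    step, readout = model
    x = y = 0
    trace = []
    for s in signal:
        if "X" in deleted:
            x = 0
        if "Y" in deleted:
            y = 0
        trace.append(int(readout(x, y)))
        x, y = step(s, x, y)
    return tuple(trace)

def static_signature(model, steps=6):
    # Classical read-out: final phenotype under a constant signal, per deletion.
    return tuple(simulate(model, [1] * steps, d)[-1] for d in DELETIONS)

def dynamic_signature(model, signal=(1, 0, 0, 0, 0, 0)):
    # Dynamic read-out: whole phenotype trace under a pulse, per deletion.
    return tuple(simulate(model, signal, d) for d in DELETIONS)

for name, model in MODELS.items():
    print(name, static_signature(model), dynamic_signature(model)[0])
```

In this toy case both models give the same static signature, so the static experiment cannot tell them apart, while the pulse response arrives one step later in the linear chain than in the parallel structure. An identifiability analysis in this spirit would group many candidate models by signature and count how many groups each experimental design can separate.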
6	Transcriptional co-regulation of microRNAs and protein-coding genes Webber, Aaron January 2013 This thesis was presented by Aaron Webber on 4 December 2013 for the degree of Doctor of Philosophy from the University of Manchester. The title of this thesis is ‘Transcriptional co-regulation of microRNAs and protein-coding genes’. The thesis relates to gene expression regulation within humans and closely related primate species. We have investigated the binding site distributions from publicly available ChIP-seq data of 117 transcription regulatory factors (TRFs) within the human genome. These were mapped to cis-regulatory regions of two major classes of genes: approximately 20,000 genes encoding proteins and approximately 1,500 genes encoding microRNAs. MicroRNAs are short (20-24 nt) noncoding RNAs which bind complementary regions within target mRNAs to repress translation. The complete collection of ChIP-seq binding site data is related to genomic associations between protein-coding and microRNA genes, and to the expression patterns and functions of both gene types across human tissues. We show that microRNA genes are associated with highly regulated protein-coding gene regions, and show rigorously that transcriptional regulation is greater than expected, given properties of these protein-coding genes. We find enrichment in developmental proteins among protein-coding genes hosting microRNA sequences. Novel subclasses of microRNAs are identified that lie outside of protein-coding genes yet may still be expressed from a shared promoter region with their protein-coding neighbours. We show that such microRNAs are more likely to form regulatory feedback loops with the transcriptional regulators lying in the upstream protein-coding promoter region. We show that when a microRNA and a TRF regulate one another, the TRF is more likely to sometimes function as a repressor.
As in many studies, the data show that microRNAs lying downstream of particular TRFs share significantly many target genes with these TRFs. We then demonstrate that the prevalence of such TRF/microRNA regulatory partnerships relates directly to the variation in mRNA expression across human tissues, with the least variable mRNAs having the most significant enrichment in such partnerships. This result is connected to theory describing the buffering of gene expression variation by microRNAs. Taken together, our study has demonstrated significant novel linkages between the transcriptional (TRF-mediated) and post-transcriptional (microRNA-mediated) regulatory layers. We finally consider transcriptional regulators alone, by mapping these to genes clustered on the basis of their expression patterns through time, within the context of CD4+ T cells from African green monkeys and rhesus macaques infected with simian immunodeficiency virus (SIV). African green monkeys maintain a functioning immune system despite never clearing the virus, while in rhesus macaques the immune system becomes chronically stimulated, leading to pathogenesis. Gene expression clusters were identified characterizing the natural and pathogenic host systems. We map transcriptional regulators to these expression clusters and demonstrate significant yet unexpected co-binding by two heterodimers (STAT1:STAT2 and BATF:IRF4) over key viral response genes. From 34 structural families of TRFs, we demonstrate that bZIPs, STATs and IRFs are the most frequently perturbed upon SIV infection. Our work therefore contributes to the characterization of both natural and pathogenic SIV infections, with longer-term implications for HIV therapeutics. 572.8
7	Unveiling the effect of global regulators in the regulatory network for biofilm formation in Escherichia coli / Entendendo o efeito dos reguladores globais na rede regulatória para a formação de biofilme em Escherichia coli Amores, Gerardo Ruiz 29 March 2017 In nature, a biofilm is a complex structure formed by multicellular bacterial communities that provides important nutritional functions and the acquisition of protective traits such as antibiotic resistance and horizontal gene transfer. The development from solitary planktonic bacteria to the mature multilayered biofilm structure consists of three main phases: motility, attachment and biofilm maturation. At the cellular level, the process is controlled by several genes such as flhD, fliA, rpoS, csgD, adrA and cpxR, all acting as master regulators. Additionally, the global regulators CRP, IHF and Fis, and less frequently others, have been related to biofilm formation, although the available information is inconclusive. In this thesis we used synthetic, molecular and cellular biology approaches to understand the effect of CRP, IHF and Fis on the transcriptional regulatory network in the bacterium Escherichia coli. In the first chapter, we employed network analysis to reconstruct and analyze part of the regulatory network described as modulating the flagella-biofilm program. With this analysis we identified some critical interactions responsible for the planktonic-biofilm transition. Next, we selected the top ten effector nodes of the network and cloned the promoter regions of those genes into a reporter system. As explained extensively in chapter II, this system allowed us to validate interactions in the network as well as suggest new ones.
Additionally, the measurement of promoter activity during bacterial development shows that CRP, IHF and Fis differentially modulate most of the surveyed genes, suggesting that these global regulators help modulate gene expression in different phases of planktonic-biofilm development. In chapter three, to get a better overview of the entire process, we performed motility, adherence/early biofilm and mature biofilm assays. We describe the intrinsic ability of E. coli to perform motility, adherence and mature biofilm formation at 37 °C. In contrast, the absence of ihf or fis, as well as Carbon Catabolite Repression (CCR), leads to altered phenotypes in both motility and biofilm development. Finally, we discuss how the changes in promoter activity of target genes, together with our network analysis, could explain part of the altered phenotypes observed. For instance, we observed changes in the main stress responders rpoS and rpoE that, in combination with alterations in specific genes such as fliA, can explain the enhanced motility of the E. coli Δihf strain. Altogether, in this thesis, we provide evidence that CRP, IHF and Fis control the activity of the promoter regions of genes involved in planktonic-biofilm development. Biofilm Regulatory network Transcriptional regulation
8	A Gene Regulatory Network for the Specification of Immunocytes in an Invertebrate Model System Solek, Cynthia 31 August 2012 Hematopoietic systems in vertebrates have been the focus of intense study. However, immunocyte development is well characterized in very few invertebrate groups. The sea urchin is an attractive model for the study of immune cell development. Larval immunocytes, pigment cells and derivatives of the blastocoelar cells, emerge from a small population of precursors specified at the blastula stage. Analyses of the genome reveal a complex system of immune receptors and effectors and a near-complete set of homologues of vertebrate transcriptional regulators. Characterization of the expression profile and function of sea urchin homologues of key vertebrate hematopoietic transcription factors implies a conserved role in immunocyte development. SpGatac, an orthologue of the vertebrate Gata-1/2/3 transcription factors, and SpScl, an orthologue of the Scl/Tal-2/Lyl-1 transcription factors, are both required for immune cell specification in the embryo. An important cis-regulatory mechanism that restricts SpGatac expression to the blastocoelar cells involves repression by SpGcm in the pigment cells. Characterization of the expression of several additional transcription factors, including SpE2A, an orthologue of vertebrate E2A/HEB/ITF2, SpId, an orthologue of the Class V bHLH factors that modulate E-protein function, and SpLmo2, an orthologue of the cofactor part of the transcriptional complex that includes Scl and Gata family members, suggests the existence of a conserved regulatory complex for hematopoiesis. Two isoforms of the SpE2A gene were identified. The shorter isoform shares genomic organization and sequence conservation with the mouse paralogue of E2A, HEBAlt. Expression of SpE2A and SpE2AAlt is consistent with a function in immunocyte development in the sea urchin embryo.
Findings of the counterpart to a key vertebrate regulatory system functioning in the development of immunocytes in the simple sea urchin embryo lay the foundation for comparative immunocyte developmental gene regulatory network analyses. These will in turn lead to a greater understanding of the evolution of immune systems across phyla and will provide simple invertebrate model systems for detailed comparative investigations of regulatory function with direct relevance to vertebrates. development immune cells gene regulatory network transcription evolution 0307
10	Comparisons of statistical modeling for constructing gene regulatory networks Chen, Xiaohui 11 1900 Genetic regulatory networks are of great importance, both as a matter of scientific interest and for practical medicine. Since a number of high-throughput measurement technologies are available, such as microarrays and sequencing techniques, regulatory networks have been intensively studied over the last decade. For these high-throughput data sets, statistical interpretation of these billions of bits is crucial for biologists to extract meaningful results. In this thesis, we compare a variety of existing regression models and apply them to construct regulatory networks which span transcription factors and microRNAs. We also propose an extended algorithm to address the local optimum issue in finding the maximum a posteriori estimator. An E. coli mRNA expression microarray data set with known bona fide interactions is used to evaluate our models, and we show that our regression networks with a properly chosen prior can perform comparably to the state-of-the-art regulatory network construction algorithm. Finally, we apply our models to a p53-related data set, the NCI-60 data. By further incorporating available prior structural information from sequencing data, we identify several significantly enriched interactions with cell proliferation function. In both data sets, we select specific examples to show that many regulatory interactions can be confirmed by previous studies or functional enrichment analysis. Through comparing statistical models, we conclude that combining different models with over-representation analysis and prior structural information can improve the quality of prediction and facilitate biological interpretation. Keywords: regulatory network, variable selection, penalized maximum likelihood estimation, optimization, functional enrichment analysis.
Regulatory network Variable selection Optimization Functional enrichment analysis
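As one familiar example of penalized maximum likelihood used for variable selection, the sketch below fits an L1-penalized least-squares regression (the lasso, which corresponds to a maximum a posteriori estimate under a Laplace prior) by cyclic coordinate descent. The abstract does not say which regression models the thesis compares, so this is an assumed stand-in rather than the thesis's method; the tiny data set, the penalty value, and the function names are made up for illustration. Here each column of `X` plays the role of a candidate regulator and `y` the expression of one target gene.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0    # correlation of feature j with the partial residual
            norm = 0.0   # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * b[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            b[j] = soft_threshold(rho / n, lam) / (norm / n)
    return b

# Hypothetical data: 4 samples, 3 candidate regulators with orthogonal profiles.
X = [
    [1, 1, 1],
    [1, -1, -1],
    [-1, 1, -1],
    [-1, -1, 1],
]
# Target expression driven mostly by regulator 0, weakly by regulator 1.
y = [2.1, 1.9, -1.9, -2.1]

coefs = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, bj in enumerate(coefs) if bj != 0.0]
print(coefs, selected)
```

Regulators whose coefficients are shrunk exactly to zero are dropped, so the nonzero entries can be read as candidate edges into the target gene. A larger `lam` gives a sparser network at the cost of more shrinkage on the retained coefficients.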
