1 |
In silico bacterial gene regulatory network reconstruction from sequence. Fichtenholtz, Alexander Michael. January 2012
Thesis (Ph.D.)--Boston University / PLEASE NOTE: Boston University Libraries did not receive an Authorization To Manage form for this thesis or dissertation. It is therefore not openly accessible, though it may be available by request. If you are the author or principal advisor of this work and would like to request open access for it, please contact us at open-help@bu.edu. Thank you. / DNA sequencing techniques have evolved to the point where one can sequence millions of bases per minute, while our capacity to use this information has been left behind. One particularly notorious example is in the area of gene regulatory networks. A molecular study of gene regulation proceeds one protein at a time, requiring bench scientists months of work purifying transcription factors and performing DNA footprinting studies. Massive scale options like ChIP-Seq and microarrays are a step up, but still require considerable resources in terms of manpower and materials. While computational biologists have developed methods to predict protein function from sequence, gene locations from sequence, and even metabolic networks from sequence, the space of regulatory network reconstruction from sequence remains virtually untouched. Part of the reason comes from the fact that the components of a regulatory interaction, such as transcription factors and binding sites, are difficult to detect. The other, more prominent reason, is that there exists no "recognition code" to determine which transcription factors will bind which sites. I've created a pipeline to reconstruct regulatory networks starting from an unannotated complete genomic sequence for a prokaryotic organism. The pipeline predicts necessary information, such as gene locations and transcription factor sequences, using custom tools and third party software. 
The core step is to determine the likelihood of interaction between a TF and a binding site using a black box style recognition code developed by applying machine learning methods to databases of prokaryotic regulatory interactions. I show how one can use this pipeline to reconstruct the virtually unknown regulatory network of Bacillus anthracis. / 2031-01-01
|
2 |
Integrative approaches to modelling and knowledge discovery of molecular interactions in bioinformatics: a thesis submitted to Auckland University of Technology in partial fulfilment for the degree of Doctor of Philosophy - PhD, 2008. Jain, Vishal. January 2008
Thesis (PhD) -- AUT University, 2008. / Includes bibliographical references. Also held in print (x, 296 leaves : col. ill. ; 30 cm.) in the Archive at the City Campus (T 572.330285 JAI)
|
3 |
Role of chromatin structure and JmjC histone demethylases in the response to hypoxia. Batie, Michael. January 2017
In response to low oxygen (hypoxia), cells have evolved sophisticated gene expression programmes for survival and adaptation. How the chromatin state coordinates these changes remains largely unknown. Global histone methylation changes occur in response to hypoxia; however, the temporal dynamics of these changes and how they correlate with hypoxia-induced gene transcription changes are ill defined. The Jumonji C (JmjC) histone demethylases are oxygen-dependent enzymes and represent a potential link between chromatin structure and oxygen sensing. Many of these enzymes are differentially expressed in hypoxia and some have been found to influence the hypoxic response. Here, the JmjC histone demethylase KDM2B is found to be induced at the mRNA level but not at the protein level in response to hypoxia. KDM2B was also found to regulate the transcriptional response to hypoxia, in a cell-type-dependent manner, through control of the Hypoxia Inducible Factor (HIF) subunits HIF-1α and HIF-2α. These findings highlight complex HIF-KDM2B crosstalk involved in the cell's response to low oxygen. Additionally, it was found that various histone methylation marks are induced in the early response to hypoxia, prior to hypoxia-induced gene transcription changes. This demonstrates that chromatin structural marks respond rapidly to changes in oxygen availability. Furthermore, the methylation landscapes of two histone methylation marks associated with active transcription, H3K4me3 and H3K36me3, were mapped by ChIP sequencing in the acute response to hypoxia. This analysis found specific changes in histone methylation that correlate with the core gene transcription changes in hypoxia, pointing towards a mechanism by which rapid chromatin changes program the cell for hypoxic transcription. Finally, KDM5A was identified to regulate, at least in part, early hypoxia H3K4me3 changes and changes in gene expression of a subset of hypoxia-responsive genes.
Findings described herein provide evidence for the role of chromatin structure dynamics, mediated by chromatin modifying enzymes, in regulating the hypoxic response. Specifically, early histone methylation changes elicited in acute hypoxia may help establish a chromatin landscape for the cell to transcriptionally respond, which is essential for survival and adaptation to hypoxia. Insights into chromatin dynamics in the response to hypoxia and the role played by JmjC histone demethylases in regulating the hypoxic response have the potential to drive new drug discovery in diseases such as cancer, where hypoxia, epigenetics and JmjC enzymes are often implicated in disease progression.
|
4 |
A Granger causality approach to gene regulatory network reconstruction based on data from multiple experiments. Tam, Hak-fui, 譚克奎. January 2012
The discovery of gene regulatory networks (GRNs) using gene expression data is one of the promising directions for deciphering biological mechanisms, which underlie many basic aspects of scientific and medical advances. In this thesis, we focus on the reconstruction of GRNs from time-series data using a Granger causality (GC) approach. As there is little existing research on combining data from multiple time-series experiments, we identify the need for a methodology, with underlying theory, that combines multiple experiments for statistically significant discovery.
We derive a statistical theory for the intersection of two discovered networks. This statistical framework is novel and intended for our GRN discovery problem. However, the theory is not limited to GRN or GC, and may be applied to other problems as long as one can take the intersection of discoveries obtained from multiple experiments (or datasets).
We propose a number of novel methods for combining data from multiple experiments. Our single underlying model (SUM) method regresses data of multiple experiments in one go, enabling GC to fully utilize the information in the original data. Based on our statistical theory and SUM, we develop new meta-analysis methods, including union of pairwise common edges (UPCE) and leave-one-out hybrid of SUM and UPCE (LOOHSU). Applications on synthetic data and real data show that our new methods give discoveries of substantially higher precision than traditional meta-analysis.
We also propose methods for estimating the precision of GC-discovered networks and thus fill in an important gap not considered in the literature. This allows us to assess how good a discovered network is in the case of unknown ground truth, which is typical in most biological applications. Our precision estimation by half-half splitting with combinations (HHSC) gives an estimate much closer to the true value compared with that computed from the Benjamini-Hochberg false discovery rate controlling procedure. Furthermore, using a network covering notion, we design a method that can identify a small number of links with high precision of around 0.8-0.9, which may relieve the burden of testing many hypothetical interactions of low precision in biological experiments.
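For readers unfamiliar with the baseline mentioned above, the following is a minimal sketch of the standard Benjamini-Hochberg step-up procedure applied to per-edge p-values. It is not the thesis's HHSC method, whose details the abstract does not give. The function name `benjamini_hochberg` and the example p-values are hypothetical and chosen only for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices of hypotheses rejected by the BH step-up rule."""
    m = len(p_values)
    # Rank hypotheses from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    return sorted(order[:k_max])


# Hypothetical p-values, one per candidate edge in a discovered network.
edge_p_values = [0.041, 0.001, 0.74, 0.039, 0.008, 0.040]
kept_edges = benjamini_hochberg(edge_p_values, alpha=0.05)
print(kept_edges)
```

Because the rule is step-up, an edge whose p-value exceeds its own rank threshold can still be kept when a higher-ranked edge passes, which is one reason the nominal false discovery rate need not match the precision actually achieved.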
For the situation where the number of genes is much larger than the data length, in which case full-model GC cannot be applied, GC is often applied to the genes pairwise. We analyze how spurious causalities (false discoveries) may arise in this setting and demonstrate that model validation can effectively remove them. With our proposed implementation, in which model orders are fixed by the Akaike information criterion and every model is subject to validation, we report a new observation that network hubs tend to act as sources rather than receivers of interactions. / published_or_final_version / Electrical and Electronic Engineering / Doctoral / Doctor of Philosophy
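The pairwise use of Granger causality described in this abstract can be illustrated with a small sketch. It assumes a fixed lag order of 1 (the thesis selects orders by the Akaike information criterion), omits model validation, and uses synthetic data. All function names (`solve_linear`, `rss`, `granger_f`) are hypothetical helpers, not the author's implementation.

```python
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(rows, targets):
    """Residual sum of squares of an ordinary least squares fit."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))


def granger_f(cause, effect, order=1):
    """F statistic for the null that lags of `cause` do not help predict `effect`."""
    n = len(effect)
    targets = effect[order:]
    # Restricted model: intercept plus the effect's own lags.
    restricted = [[1.0] + [effect[t - lag] for lag in range(1, order + 1)]
                  for t in range(order, n)]
    # Full model: the restricted regressors plus lags of the candidate cause.
    full = [row + [cause[t - lag] for lag in range(1, order + 1)]
            for row, t in zip(restricted, range(order, n))]
    rss_r, rss_f = rss(restricted, targets), rss(full, targets)
    dof = len(targets) - len(full[0])
    return ((rss_r - rss_f) / order) / (rss_f / dof)


# Synthetic example (assumption): x drives y with a one-step delay.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + random.gauss(0, 0.1) for t in range(1, 200)]
f_xy = granger_f(x, y)  # tests x -> y
f_yx = granger_f(y, x)  # tests y -> x
print(f_xy > f_yx)
```

In practice each F statistic would be converted to a p-value and corrected for multiple testing across all gene pairs before an edge is declared.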
|
5 |
Gene Regulatory Networks are a Mechanism for Drug Resistance. Camellato, Brendan. January 2018
Multidrug resistance has become a major issue in the treatment of both microbial infections and cancers. While genetically encoded drug resistance is fairly well understood, it cannot explain all observed cases of resistance, namely the ability of a subset of disease cells to persist in an otherwise susceptible population. This non-genetic resistance requires the heterogeneous expression of a drug resistance phenotype, which can be produced by certain gene regulatory network architectures. Two particular network motifs, the coherent feedforward loop (CFFL) and the positive feedback loop (PFL), have functional properties that implicate them in the development of non-genetic heterogeneity and response to changing conditions. Motivated by the observation that CFFL and PFL motifs are involved in the transcriptional regulation of multiple pleiotropic drug resistance (PDR) genes in yeast, it has been hypothesized that CFFLs and PFLs could contribute to the development of drug resistance. This hypothesis was based on model simulations and has not been tested experimentally. In this thesis, it is demonstrated experimentally that the PDR5 gene is indeed expressed heterogeneously within an isogenic population of yeast cells, and that this cell-to-cell variability enables a subset of cells to persist through drug treatment. While these observations agree with model predictions, it is also observed that the resistant phenotype occurs within a subset of cells that are morphologically distinct. This subpopulation has previously been linked to abnormal mitochondrial function, which cannot be ruled out as a likely cause of the observed drug resistance. To validate the hypothesis that CFFLs and PFLs contribute to drug resistance, the expression of the PDR5 gene was placed under the control of synthetic gene regulatory networks constructed to contain different combinations of direct activation, indirect activation, and positive feedback. 
These networks are used to show that direct activation can provide a selective advantage enabling rapid responses, while indirect activation and positive feedback can provide a selective advantage by maintaining favourable gene expression states. These results demonstrate that a gene regulatory network combining CFFLs and PFLs can contribute to the development of drug resistance, and highlight plausible means by which cells can exploit certain network features to gain a fitness advantage.
|
6 |
The Application of the Expectation-Maximization Algorithm to the Identification of Biological Models. Chen, Shuo. 09 March 2007
With the onset of large-scale gene expression profiling, many researchers have turned their attention toward biological process modeling and system identification. The abundance of data available, while inspiring, is also daunting to interpret. Following the initial work of Rangel et al., we propose a linear model for identifying the biological model behind the data and utilize a modification of the Expectation-Maximization algorithm for training it. With our model, we explore some commonly accepted assumptions concerning sampling, discretization, and state transformations. Also, we illuminate the model complexities and interpretation difficulties caused by unknown state transformations and propose some solutions for resolving these problems. Finally, we elucidate the advantages and limitations of our linear state-space model with simulated data from several nonlinear networks. / Master of Science
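As a point of reference for the linear state-space model mentioned in this abstract, the generic textbook form is sketched below. The abstract does not give the thesis's exact parameterization or its modification of the EM algorithm, so the symbols here (hidden state x_t, observed expression y_t, matrices A and C, noise covariances Q and R) are assumptions for illustration only.

```latex
\begin{aligned}
x_{t+1} &= A\,x_t + w_t, & w_t &\sim \mathcal{N}(0, Q),\\
y_t     &= C\,x_t + v_t, & v_t &\sim \mathcal{N}(0, R).
\end{aligned}
```

In the standard EM treatment of such a model, the E-step estimates the hidden states with a Kalman smoother under the current parameters, and the M-step re-estimates A, C, Q and R from those smoothed estimates.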
|
7 |
TFAP2A in the neural crest gene regulatory network and disease. Hallberg, Andrea Rachel. 01 May 2019
The neural crest is a transient, multipotent cell population that gives rise to several important tissues during embryonic development, including the craniofacial skeleton, peripheral nervous system, and melanocytes. The neural crest arises from the ectoderm, along with the skin and central nervous system. This process of specification is dependent on a gene regulatory network (GRN), which is made up of transcription factors that regulate each other. While we know many of the members of this GRN, the direct connections among the members are largely unsolved. Breakdown of this GRN can lead to birth defects, such as cleft lip and palate, and cancer of neural crest derivatives, such as melanoma; understanding the intricate details of this network is therefore important.
The transcription factor Tfap2a is an important member of the GRN, as loss of tfap2a and its paralog tfap2c leads to loss of pre-migratory neural crest and all neural crest derivatives. Despite its importance in this network little is known about how its expression is regulated. We hypothesized that, due to its importance in this network, it will have multiple enhancers that drive its expression in the neural crest. We have identified two neural crest enhancers of tfap2a. We found that one of these enhancers is responsive to WNT signals and is maintained by forming a positive feedback loop with Sox10. Our results suggest that this enhancer is important for both induction and maintenance of tfap2a expression in the neural crest.
Tfap2 paralogs are important at several different stages throughout neural crest lineage specification. However, the only direct target of Tfap2a that has been identified is sox10. Thus, we wanted to determine the direct targets of Tfap2 in this network. Through the integration of several data sets, including ATAC-seq and expression profiling of tfap2a/c double mutants, we have identified several direct targets including sox9b and alx1.
Melanoma is cancer of the melanocytes, a neural crest derivative. Recent studies have shown that melanoma and the neural crest share genetic similarities. TFAP2A expression is decreased in metastatic melanoma compared to primary tumors, thus we wanted to investigate the mechanism of TFAP2A in metastatic melanoma. We found that the promoter of TFAP2A is hypermethylated in some metastatic melanoma tumors. This was confirmed by samples in the TCGA database. Hypermethylation of the promoter contributes to the downregulation of TFAP2A in metastatic melanoma.
In conclusion, we have further illuminated the connections among transcription factors in the GRN important for neural crest lineage specification. Further, we have identified a new mechanism regulating TFAP2A expression in metastatic melanoma. Together, these studies reveal regulatory mechanisms of TFAP2A gene expression.
|
8 |
Modeling gene regulatory networks using a state-space model with time delays. Koh, Chu Shin. 17 March 2008
Computational gene regulation models provide a means for scientists to draw biological inferences from large-scale gene expression data. The expression data used in the models usually are obtained in a time series in response to an initial perturbation. The common objective is to reverse engineer the internal structure and function of the genetic network from observing and analyzing its output in a time-based fashion. In many studies (Wang [39], Resendis-Antonio [31]), each gene is considered to have a regulatory effect on another gene. A network association is created based on the correlation of expression data. Highly correlated genes are thought to be co-regulated by similar (if not the same) mechanisms. Gene co-regulation network models disregard the cascading effects of regulatory genes such as transcription factors, which could be missing in the expression data or are expressed at very low concentrations and thus undetectable by the instrument. As an alternative to the former methods, some authors (Wu et al. [40], Rangel et al. [28], Li et al. [20]) have proposed treating expression data solely as observation values of a state-space system and deriving conceptual internal regulatory elements, i.e. the state variables, from these measurements. This approach allows one to model unknown biological factors as hidden variables and therefore can potentially reveal more complex regulatory relations.
In a preliminary portion of this work, two state-space models, developed by Rangel et al. and Wu et al. respectively, were compared. The Rangel model provides a means for constructing a statistically reliable regulatory network. The model is demonstrated on highly replicated T-cell activation data [28]. On the other hand, Wu et al. develop a time-delay module that takes transcriptional delay dynamics into consideration. The model is demonstrated on non-replicated yeast cell-cycle data [40]. Both models presume time-invariant expression data. 
Our attempt to use the Wu model to infer a small gene regulatory network in yeast was not successful. Thus we develop a new modeling tool incorporating a time-lag module and a novel method for constructing regulatory networks from non-replicated data. The latter involves an alternative scheme for determining network connectivity. Finally, we evaluate the networks generated from the original and extended models based on a priori biological knowledge.
|
9 |
Inference of gene regulatory networks for Mus musculus by incorporating network motifs from yeast. Weishaupt, Holger. January 2007
Recently, particular interest has been drawn to the inference of gene regulatory networks from microarray gene expression data. Despite major improvements in data-based methods, network reconstruction from expression data alone remains a computationally complex (NP-hard) problem. This work incorporates additional information, namely regulatory motifs from yeast, when inferring a gene regulatory network for mouse genes. The hypothesis was put forward that regulatory patterns analogous to these motifs are present in the set of mouse genes and can be identified by comparing yeast and mouse genes in terms of sequence similarity or Gene Ontology (The Gene Ontology Consortium 2000) annotations. To examine this hypothesis, small permutations of genes with high similarity to such yeast gene regulatory motifs were first tested against simple data-driven regulatory networks for consistency with the expression data. Second, networks for the whole set of mouse genes were inferred using the best-scored interactions provided by these permutations. The results showed that individual permutations of genes with a high similarity to a given yeast motif did not perform better than low-scored motifs, and that complete networks inferred from regulatory interactions provided by the permutations showed neither a noticeable improvement over the corresponding data-driven network nor a high consistency with the expression data. The hypothesis was therefore rejected: neither sequence similarity nor identical functional annotations between mouse and yeast genes made it possible to identify sets of genes that showed a high consistency with the expression data or that would have allowed improved gene regulatory network inference.
|