Global ETD Search

111	ASSESSMENT OF ORTHOLOGY IDENTIFICATION APPROACHES AND THE IMPACT OF GENE FUSION AND FISSION IN BACTERIA Sung, WL Wilson 10 1900 Orthology identification is central to comparative and evolutionary genomics and is an active area of research. Despite a recent shift towards tree reconciliation and other phylogenetic methods, previous comparisons between different algorithms relied on real datasets where true orthology relationships are unknown and did not conclusively show whether phylogenetic methods truly outperform sequence similarity-based methods. Using simulated datasets generated from programs we developed, we show that tree reconciliation does perform better than similarity-based methods when the true species phylogeny is known. However, even slight deviations in the species phylogeny can have adverse effects on the performance of reconciliation algorithms, and in those cases similarity-based methods may perform better. Fusion and fission complicate orthology identification and are not explicitly considered in most existing algorithms. Programs designed specifically to investigate fusion and fission events are either unavailable or are not specific enough to identify events affecting orthologous genes. We developed a pipeline of programs called FusionFinder that performs this task, gaining new insights into the contributions of fusion and fission to bacterial protein evolution and uncovering an unexpected abundance of fissions in Bacillus anthracis that, to our knowledge, has yet to be reported. / Master of Science (MS) ORTHOLOGS SIMULATION GENOME SEQUENCE PROTEINS GENE FUSION GENE FISSION Computational Biology
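As an illustration of the sequence similarity-based family that entry 111 compares against tree reconciliation, the following is a minimal sketch of reciprocal best hits, one common heuristic of that kind. The abstract does not say which similarity-based methods were evaluated, so the choice of heuristic, the function names, and the score dictionaries are all assumptions made for illustration.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    scores: dict mapping (query, subject) -> similarity score
    (hypothetical input format, e.g. parsed alignment bit scores).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)
```

A pair is reported only when the best hit holds in both directions, which is why a fused or split gene can be missed: its best hit in one direction may cover only part of the sequence.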
112	Machine Learning in Computational Biology: Models of Alternative Splicing Shai, Ofer 03 March 2010 Alternative splicing, the process by which a single gene may code for similar but different proteins, is an important process in biology, linked to development, cellular differentiation, genetic diseases, and more. Genome-wide analysis of alternative splicing patterns and regulation has recently been made possible by new high-throughput techniques for monitoring gene expression and genomic sequencing. This thesis introduces two algorithms for alternative splicing analysis based on large microarray and genomic sequence data. The algorithms, based on generative probabilistic models that capture structure and patterns in the data, are used to study global properties of alternative splicing. In the first part of the thesis, a microarray platform for monitoring alternative splicing is introduced. A spatial noise removal algorithm that removes artifacts and improves data fidelity is presented. The GenASAP algorithm (generative model for alternative splicing array platform) models the non-linear process in which targeted molecules bind to a microarray’s probes and is used to predict patterns of alternative splicing. Two versions of GenASAP have been developed. The first uses variational approximation to infer the relative amounts of the targeted molecules, while the second incorporates a more accurate noise and generative model and utilizes Markov chain Monte Carlo (MCMC) sampling. GenASAP, the first method to provide quantitative predictions of alternative splicing patterns on large-scale data sets, is shown to generate useful and precise predictions based on independent RT-PCR validation (a slow but more accurate approach to measuring cellular expression patterns). In the second part of the thesis, the results obtained by GenASAP are analysed to reveal jointly regulated genes.
The sequences of the genes are examined for potential regulatory factor binding sites using a new motif finding algorithm designed for this purpose. The motif finding algorithm, called GenBITES (generative model for binding sites), uses a fully Bayesian generative model for sequences, and the MCMC approach used for inference in the model includes moves that can efficiently create or delete motifs, and extend or contract the width of existing motifs. GenBITES has been applied to several synthetic and real data sets, and is shown to be highly competitive at a task for which many algorithms already exist. Although developed to analyze alternative splicing data, GenBITES outperforms most reported results on a benchmark data set based on transcription data. Machine Learning Graphical Models Computational Biology Alternative Splicing
113	Collective analysis of multiple high-throughput gene expression datasets Abu Jamous, Basel January 2015 Modern technologies have resulted in the production of numerous high-throughput biological datasets. However, the development of capable computational methods has not kept pace with the generation of new high-throughput datasets. Amongst the most popular biological high-throughput datasets are gene expression datasets (e.g. microarray datasets). This work targets this aspect by proposing a suite of computational methods which can analyse multiple gene expression datasets collectively. The focal method in this suite is the unification of clustering results from multiple datasets using external specifications (UNCLES). This method applies clustering to multiple heterogeneous datasets which measure the expression of the same set of genes separately and then combines the resulting partitions in accordance with one of two types of external specifications: type A identifies the subsets of genes that are consistently co-expressed in all of the given datasets, while type B identifies the subsets of genes that are consistently co-expressed in a subset of datasets while being poorly co-expressed in another subset of datasets. This contributes to the types of questions which can be addressed by computational methods because existing clustering, consensus clustering, and biclustering methods are inapplicable to the aforementioned objectives. Moreover, in order to assist in setting some of the parameters required by UNCLES, the M-N scatter plots technique is proposed. These methods, and less mature versions of them, have been validated and applied to numerous real datasets from the biological contexts of budding yeast, bacteria, human red blood cells, and malaria. While collaborating with biologists, these applications have led to various biological insights.
In yeast, the role of the poorly-understood gene CMR1 in the yeast cell-cycle has been further elucidated. Also, a novel subset of poorly understood yeast genes has been discovered with an expression profile consistently negatively correlated with the well-known ribosome biogenesis genes. Bacterial data analysis has identified two clusters of negatively correlated genes. Analysis of data from human red blood cells has produced some hypotheses regarding the regulation of the pathways producing such cells. On the other hand, malarial data analysis is still at a preliminary stage. Taken together, this thesis provides an original integrative suite of computational methods which scrutinise multiple gene expression datasets collectively to address previously unresolved questions, and provides the results and findings of many applications of these methods to real biological datasets from multiple contexts. 572.8
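The "type A" objective in entry 113 (genes consistently co-expressed in all datasets) can be illustrated with a deliberately simplified sketch. This is not the UNCLES algorithm, whose details the abstract does not give; it only assumes that each dataset has already been clustered into a hard partition, and it groups genes that share a cluster in every dataset. The function name, the input format, and the `min_size` parameter are hypothetical.

```python
from collections import defaultdict


def consistently_coclustered(partitions, min_size=2):
    """Return groups of genes that fall in the same cluster in every dataset.

    partitions: list of dicts, one per dataset, mapping gene -> cluster label
    (labels need not match across datasets).
    """
    if not partitions:
        return []
    # Only genes measured in every dataset can be compared.
    genes = set(partitions[0])
    for partition in partitions[1:]:
        genes &= set(partition)
    # Genes with the same label signature are co-clustered everywhere.
    groups = defaultdict(set)
    for gene in genes:
        signature = tuple(partition[gene] for partition in partitions)
        groups[signature].add(gene)
    return sorted(sorted(members) for members in groups.values()
                  if len(members) >= min_size)
```

A hard intersection like this is strict: a single disagreement in one dataset removes a gene from a group, which is one reason a method with tunable external specifications may be preferable in practice.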
114	Quantitative Models of Calcium-Dependent Protein Signaling in Neuronal Dendritic Spines Matthew C Pharris (6848951) 15 August 2019 Worldwide, as many as 1 billion people suffer from neurological disorders. Fundamentally, neurological disorders are caused by dysregulation of biochemical signaling within neurons, leading to deficits in learning and memory formation. To identify better preventative and therapeutic strategies for patients with neurological disorders, we require a better understanding of how biochemical signaling is regulated within neurons. Biochemical signaling at the connections between neurons, called synapses, regulates dynamic shifts in a synapse’s size and connective strength. Called synaptic plasticity, these shifts are initiated by calcium ion (Ca²⁺) flux into message-receiving structures called dendritic spines. Within dendritic spines, Ca²⁺ binds sensor proteins such as calmodulin (CaM). Importantly, Ca²⁺/CaM may bind and activate a wide variety of proteins, which subsequently facilitate signaling pathways regulating the dendritic spine’s size and connective strength. In this thesis, I use computational models to characterize molecular mechanisms regulating Ca²⁺-dependent protein signaling within the dendritic spine. Specifically, I explore how Ca²⁺/CaM differentially activates binding partners and how these binding partners transduce signals downstream. For this, I present deterministic models of Ca²⁺, CaM, and CaM-dependent proteins, and in analyzing model output I demonstrate in part that competition for CaM-binding alone may be sufficient to set the Ca²⁺ frequency-dependence of protein activation.
Subsequently, I adapt my deterministic models into particle-based, spatial-stochastic frameworks to quantify how spatial effects influence model output, showing evidence that spatial gradients of Ca²⁺/CaM may set spatial gradients of activated proteins downstream. Additionally, I incorporate into my models the most detailed model to date of Ca²⁺/CaM-dependent protein kinase II (CaMKII), a multi-subunit protein essential to synaptic plasticity. With this detailed model of CaMKII, my analysis suggests that the many subunits of CaMKII provide avidity effects that significantly increase the protein’s effective affinity for binding partners, particularly Ca²⁺/CaM. Altogether, this thesis provides a detailed analysis of Ca²⁺-dependent signaling within dendritic spines, characterizing molecular mechanisms that may be useful for the development of novel therapeutics for patients with neurological disorders. Protein Signaling Neuroscience Computational Biology
115	A bioinformatics approach to the identification of type 2 diabetes susceptibility gene variants in Africans Oduaran, Ovokeraye Hilda 08 April 2015 Type 2 diabetes (T2D) is a metabolic disease that results from complex interactions between the environment, genetic variation, and epigenetic regulation of gene expression in individuals. Beta-cell dysfunction and insulin resistance are regarded as the hallmarks of the disease, as the common presentation of T2D is the inability of beta-cells to adequately respond to the insulin demands of the body. The prevalence of T2D in Africa, and particularly South Africa, is on the rise. This is very likely the result of the combination of genetic susceptibility with increasing availability and accessibility of relatively cheap, highly palatable, calorie-dense meals with no corresponding lifestyle adjustment. This study aims to utilize available data from GWAS and gene expression arrays to identify potential variants that likely influence T2D susceptibility in African populations. Two public data repositories were mined: the National Center for Biotechnology Information’s (NCBI) Gene Expression Omnibus (GEO) and the National Human Genome Research Institute’s (NHGRI) GWAS Catalog. The criteria for selecting the studies for inclusion were based on ten descriptive T2D-related terms taken from the GWAS Catalog’s pre-defined search categories. These terms were also applied to the selection of gene expression studies in GEO. These terms are: “fasting glucose-related traits”, “fasting insulin-related traits”, “fasting plasma glucose”, “insulin resistance/response”, “insulin traits”, “diabetes-related insulin traits”, “pro insulin levels”, “Type 2 diabetes”, “type 2 diabetes and 6 quantitative traits” and “type 2 diabetes and other traits”. Ten Affymetrix platform-based studies in human tissues were chosen from GEO using these criteria.
A Benjamini-Hochberg adjusted p-value of 0.05 was set as the cut-off for significant differentially expressed genes (7,887 genes), with the 497 genes occurring in two or more studies, based on tissue- or array-type, considered candidates for downstream analysis. The GWAS Catalog presented 175 “reported” genes and 218 SNPs from 51 studies matching the set T2D-related criteria. Functional analyses done with the Database for Annotation, Visualization and Integrated Discovery (DAVID) on both the GWAS and expression studies gene lists, Diabetes Mellitus, Type 2 Genetic Phenomena Computational Biology
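The Benjamini-Hochberg adjusted p-value cut-off mentioned in entry 115 can be sketched as follows. This is the standard step-up adjustment written in plain Python under the assumption that raw per-gene p-values are available as a list; the function names are illustrative and the abstract does not state which software the study used.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_indices(p_values, alpha=0.05):
    """Indices whose adjusted p-value is at or below the cut-off alpha."""
    adjusted = benjamini_hochberg(p_values)
    return [i for i, q in enumerate(adjusted) if q <= alpha]
```

With `alpha=0.05`, `significant_indices` returns the positions of the genes that would pass a cut-off like the one described above.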
116	Network-based approaches for multi-omic data integration Xiao, Hui January 2019 The advent of advanced high-throughput biological technologies provides opportunities to measure the whole genome at different molecular levels in biological systems, which produces different types of omic data such as genome, epigenome, transcriptome, translatome, proteome, metabolome and interactome. Biological systems are highly dynamic and complex mechanisms which involve not only the within-level functionality but also the between-level regulation. In order to uncover the complexity of biological systems, it is desirable to integrate multi-omic data to transform the multiple level data into biological knowledge about the underlying mechanisms. Due to the heterogeneity and high dimensionality of multi-omic data, it is necessary to develop effective and efficient methods for multi-omic data integration. This thesis aims to develop efficient approaches for multi-omic data integration using machine learning methods and network theory. We assume that a biological system can be represented by a network with nodes denoting molecules and edges indicating functional links between molecules, in which multi-omic data can be integrated as attributes of nodes and edges. We propose four network-based approaches for multi-omic data integration using machine learning methods. Firstly, we propose an approach for gene module detection by integrating multi-condition transcriptome data and interactome data using a network overlapping module detection method. We apply the approach to study the transcriptome data of human pre-implantation embryos across multiple development stages, and identify several stage-specific dynamic functional modules and genes which provide interesting biological insights. We evaluate the reproducibility of the modules by comparing with some other widely used methods and show that the intra-module genes are significantly overlapped between the different methods.
Secondly, we propose an approach for gene module detection by integrating transcriptome, translatome, and interactome data using a multilayer network. We apply the approach to study the ribosome profiling data of mTOR-perturbed human prostate cancer cells and mine several translation efficiency regulated modules associated with mTOR perturbation. We develop an R package, TERM, for implementation of the proposed approach which offers a useful tool for the research field. Next, we propose an approach for feature selection by integrating transcriptome and interactome data using network-constrained regression. We develop a more efficient network-constrained regression method, eGBL. We evaluate its performance in terms of variable selection and prediction, and show that eGBL outperforms the other related regression methods. Applying it to the transcriptome data of human blastocysts, we select several genes of interest associated with time-lapse parameters. Finally, we propose an approach for classification by integrating epigenome and transcriptome data using neural networks. We introduce a superlayer neural network (SNN) model which learns DNA methylation and gene expression data in parallel in superlayers but with cross-connections allowing crosstalk between them. We evaluate its performance on human breast cancer classification. The SNN provides superior performance and outperforms several other common machine learning methods. The approaches proposed in this thesis offer effective and efficient solutions for integration of heterogeneous high-dimensional datasets, which can be easily applied to other datasets with similar structures. They are therefore applicable to many fields including but not limited to Bioinformatics and Computer Science.
117	A computational investigation of the electrocardiogram with healthy and diseased human ventricles Cardone-Noott, Louie January 2016 Cardiovascular diseases are the leading cause of death worldwide, and are estimated to kill over 17 million people each year, about 31% of all deaths. In the clinic, the first diagnostic procedure for a suspected cardiac abnormality is often acquisition of an electrocardiogram (ECG), which measures the electrical potential of the heart at the body surface. Understanding the mechanisms underlying generation of the ECG waveforms is crucial for optimal clinical benefit. Computer simulations possess several strengths as a tool to gain this understanding, particularly in terms of human-specificity, flexibility, repeatability, and ethics. The ventricles make up the majority of the cardiac volume and are therefore responsible for the majority of ECG waveforms. Ventricular disorders are the most life-threatening, because the ventricles are responsible for pumping blood to the body. Due to their size it has only recently become possible to perform biophysically detailed simulations of the ventricles and torso using supercomputers. In this thesis, multiscale, mathematical models of the ventricles and torso using the Chaste software library are simulated on high performance computing systems. A description is included of the performance enhancements made in Chaste to improve resource efficiency and accelerate job turnaround, particularly in data storage and the auxiliary tasks of post-processing and data conversion. A novel model of ventricular activation is presented and parametrized using multi-modal human data, and successfully used to simulate normal and pathological QRS complexes. Similarly, repolarization gradients are imposed based on the literature and result in a variety of T waves.
Finally, the developed human whole-ventricular and torso models are utilized to gain new insights into possible ionic mechanisms underlying the clinical manifestations of the early repolarization syndrome. Overall, this thesis presents a novel framework for simulation of the human ECG using high performance computers, with possible applications in basic science and computational medicine. 004
118	Defining complex rule-based models in space and over time Wilson-Kanamori, John Roger January 2015 Computational biology seeks to understand complex spatio-temporal phenomena across multiple levels of structural and functional organisation. However, questions raised in this context are difficult to answer without modelling methodologies that are intuitive and approachable for non-expert users. Stochastic rule-based modelling languages such as Kappa have been the focus of recent attention in developing complex biological models that are nevertheless concise, comprehensible, and easily extensible. We look at further developing Kappa, in terms of how we might define complex models in both the spatial and the temporal axes. In defining complex models in space, we address the assumption that the reaction mixture of a Kappa model is homogeneous and well-mixed. We propose evolutions of the current iteration of Spatial Kappa to streamline the process of defining spatial structures for different modelling purposes. We also verify the existing implementation against established results in diffusion and narrow escape, thus laying the foundations for querying a wider range of spatial systems with greater confidence in the accuracy of the results. In defining complex models over time, we draw attention to how non-modelling specialists might define, verify, and analyse rules throughout a rigorous model development process. We propose structured visual methodologies for developing and maintaining knowledge base data structures, incorporating the information needed to construct a Kappa rule-based model. We further extend these methodologies to deal with biological systems defined by the activity of synthetic genetic parts, with the hope of providing tractable operations that allow multiple users to contribute to their development over time according to their area of expertise.
Throughout the thesis we pursue the aim of bridging the divide between information sources such as literature and bioinformatics databases and the abstracting decisions inherent in a model. We consider methodologies for automating the construction of spatial models, providing traceable links from source to model element, and updating a model via an iterative and collaborative development process. By providing frameworks for modellers from multiple domains of expertise to work with the language, we reduce the entry barrier and open the field to further questions and new research. 572.80285
119	Charting the single-cell transcriptional landscape of haematopoiesis Hamey, Fiona Kathryn January 2019 High turnover in the haematopoietic system is sustained by stem and progenitor cells, which divide and mature to produce the range of cell types present in the blood. This complex system has long served as a model of differentiation in adult stem cell systems and its study has important clinical relevance. Maintaining a healthy blood system requires regulation of haematopoietic cell fate decisions, with severe dysregulation of these fate choices observed in diseases such as leukaemia. As transcriptional regulation is known to play a role in this regulation, the gene expression of many haematopoietic progenitors has been measured. However, many of the classic populations are actually extremely heterogeneous in both expression and function, highlighting the need for characterising the haematopoietic progenitor compartment at the level of individual cells. The first aim of this work was to chart the single-cell transcriptional landscape of the haematopoietic stem and progenitor cell (HSPC) compartment. To build a comprehensive map of this landscape, 1,654 HSPCs from mouse bone marrow were profiled using single-cell RNA-sequencing. Analysis of these data generated a useful resource, and reconstructed changes in gene expression, cell cycle and RNA content along differentiation trajectories to three blood lineages. To investigate how single-cell gene expression can be used to learn about regulatory relationships, data measuring the expression of 41 genes (including 31 transcription factors) in 2,167 stem and progenitor cells were used to construct Boolean gene regulatory network models describing the regulation of differentiation from stem cells to two different progenitor populations.
The inferred relationships revealed positive regulation of Nfe2 and Cbfa2t3h by Gata2 that was unique to differentiation towards megakaryocyte-erythroid progenitors, which was subsequently experimentally validated. The next study focused on investigating the link between transcriptional and functional heterogeneity within blood progenitor populations. Single-cell profiles of human cord blood progenitors revealed a continuum of lympho-myeloid gene expression. Culture assays performed to assess the functional output of single cells found both unilineage and bilineage output and, by investigating the link between surface marker expression and function, a new sorting strategy was devised that was able to enrich for function within conventional lympho-myeloid progenitor sorting gates. The final project aimed to study changes to the HSPC compartment in a perturbed state. A droplet-based single-cell RNA-sequencing dataset of 44,802 cells was analysed to identify entry points to eight blood lineages and to characterise gene expression changes in this transcriptional landscape. Mapping single-cell data from W41/W41 Kit mutant mice highlighted quantitative shifts in progenitor populations such as a reduction in mast cell progenitors and an increase towards more mature progenitors along the erythroid trajectory. Differential gene expression identified upregulation of stress response and a reduction of apoptosis during erythropoiesis as potential compensatory mechanisms in the Kit mutant progenitors. Together this body of work characterises the HSPC compartment at single-cell level and provides methods for how single-cell data can be used to discover regulatory relationships, link expression heterogeneity to function, and investigate changes in the transcriptional landscape in a perturbed environment.
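To make the idea of a Boolean gene regulatory network from entry 119 concrete, here is a toy sketch with synchronous updates. Only the direction of the reported relationship (Gata2 positively regulating Nfe2 and Cbfa2t3h) comes from the abstract; the update scheme, the rule that Gata2 simply holds its own state, and the function names are assumptions for illustration, not the inferred model.

```python
def step(state, rules):
    """Apply every rule to the current state at once (synchronous update)."""
    return {gene: rule(state) for gene, rule in rules.items()}


def simulate(state, rules, max_steps=20):
    """Iterate until the state stops changing or max_steps is reached."""
    trajectory = [state]
    for _ in range(max_steps):
        nxt = step(trajectory[-1], rules)
        if nxt == trajectory[-1]:
            break  # reached a fixed point
        trajectory.append(nxt)
    return trajectory


# Hypothetical toy rules: Gata2 is treated as a fixed input,
# and each target gene is switched on whenever Gata2 is on.
rules = {
    "Gata2": lambda s: s["Gata2"],
    "Nfe2": lambda s: s["Gata2"],
    "Cbfa2t3h": lambda s: s["Gata2"],
}

start = {"Gata2": True, "Nfe2": False, "Cbfa2t3h": False}
trajectory = simulate(start, rules)
```

In a model inferred from data, each rule would be a Boolean function of several regulators chosen so that the network's state transitions agree with the observed single-cell expression states.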
120	Genetic determinants of Metabolic Syndrome in Lyon Hypertensive rats Ma, Man Chun John 01 December 2013 Metabolic Syndrome (MetS) is a collective term for a cluster of disorders, including dysglycemia, central obesity, dyslipidemia, hypertension, and eventual end organ damage. The combination of these disorders increases the risk of many kinds of end organ damage, including coronary heart disease, kidney failure, and cirrhosis. MetS is highly prevalent in the United States, affecting one third of the U.S. population according to a 2009 estimate. The Lyon strains are three rat strains selectively inbred from the same colony of outbred rats for different blood pressure levels. The Lyon Hypertensive (LH) strain, in addition to its essential hypertension phenotype, also harbors many disorders found in MetS. The Lyon Normotensive (LN) rat strain is completely devoid of these symptoms, while Lyon Low-pressure (LL) is obese but is resistant to other traits of MetS. Rat chromosome 17 (RNO17) has previously been linked with many of MetS' phenotypes in Lyon Hypertensive (LH). In this project, we are using a mixture of genetical genomics and systems biology methods to identify genetic elements that may cause the LH phenotype. Divergent haplotype blocks between the Lyon strains were first identified by the analysis of the distribution of observed strain differences (OSD) calculated from the result of genome resequencing. Divergent haplotype regions totaling less than 16% of the rat genome that contain more than 95% of the identified SNPs in each of the three pairwise comparisons between the Lyon strains have been identified; in particular, there are 14 divergent haplotype blocks between LH and LN spanning 7.7% of RNO17 that harbor more than 97% of SNPs identified on RNO17. Twenty-five genes in these regions were thus identified as potential genetic determinants for MetS.
Phenotypic QTL (pQTL) and expression QTL (eQTL) mapping from a cohort of male LH × LN F2 rats was performed by putting the cohort on a 15-week phenotyping protocol and genome-wide genotyping. Total liver RNA from 36 individuals from the cohort was sequenced to provide expression data for eQTL mapping. We have mapped 22 pQTLs that are statistically linked to 15 traits, with RNO17 linked to 15 traits associated with blood pressure, leptin and body weight. We have also identified 1,200 eQTLs from this cohort, including 11 eQTLs with cis-linkage with one or more genes. On RNO17, we have identified two SNPs between 29 and 39 Mb which are significantly linked to the expression of 85 genes; the only gene with cis-linkage with these SNPs, RGD1562963, was hence identified as a putative master regulator. Transcriptome analyses were then performed on the Lyon parental animals; total liver and kidney RNA from six animals each of the LH, LL and LN strains that were subjected to the same 15-week phenotyping protocol was sequenced for differential expression analysis, gene coexpression network analysis and quantitative trait transcript analysis. Differential expression analysis identified 4 genes on RNO17's divergent haplotype regions: Cul2 and the aforementioned RGD1562963 for liver, Amph and Bambi for kidney. Quantitative trait transcript analyses have shown significant correlations between the expression of these four genes and one or more of the traits of the animals treated, validating their status as potential genetic determinants for MetS. However, out of the 84 genes that RGD1562963 potentially regulates, only two other genes (Cul2 and Supt4h1) have significant correlations with one or more traits. Gene coexpression network analyses have shown a relationship between genes on the TGF-β pathway and the differentially expressed genes in the kidney, supporting our speculation on the hyperactivity of the TGF-β system in the etiology of the LH phenotypes.
An LH-17LN consomic strain was also generated by introgressing an LN copy of RNO17 onto the LH genomic background to validate in vivo the role of RNO17 in the etiology of MetS symptoms in LH. We have observed that the consomic strain has significantly decreased body weight, adiposity, blood pressure, and inter-week blood pressure differences that may be a surrogate for salt sensitivity. Thus, the role of RNO17 on the LH genotype is validated. In summary, we have been able to identify, by in vivo and in silico methods, that RNO17 is related to the MetS traits in LH; that 4 genes, Amph, Bambi, Cul2, and RGD1562963, are potential genetic contributors to RNO17's effects; and that their effects may include, but are not limited to, the activation of TGF-β signals. Computational Biology Diabetes Metabolic Syndrome Obesity Rat Genetics

Search results