Global ETD Search

41	Integrative Genomic Modeling of Complex Traits using Pathway Analysis Bennett, Brian D. January 2012 Understanding the root molecular causes driving complex traits is a fundamental challenge in genomics and genetics. Numerous studies have used variation in gene expression to understand complex traits, but the underlying genomic variation that contributes to these expression changes is not well understood. The overall goal of this work is to develop an integrative framework to better understand the genetic and molecular causes of complex traits, including complex diseases. In this work, I present a computational framework that I developed to integrate gene expression and other genomic data to identify biological differences between samples from opposing complex trait classes that are driven by expression changes and genomic variation. This framework combines analysis on the multi-gene biological pathway level with multi-task learning to build predictive models that also uncover pathways potentially relevant to the complex trait of interest. To validate this framework, I first performed a simulation study to test its predictive ability and to measure how well it uncovered pathways that contain genes that are both differentially expressed and genetically associated with a complex trait. The predictive performance of the multi-task model was found to be comparable to that of similar methods. Also, multi-task learning, along with other methods that jointly considered pathway scores from both data sets, was better able to identify pathways with both genetic and expression differences related to the phenotype. I applied this framework to gene expression and genotype data from estrogen receptor (ER) positive and ER negative breast cancer samples. The top 15 predictive pathways from the multi-task model were all related to estrogen, steroids, cell signaling, or the cell cycle.
The results from both the simulation studies and the breast cancer analysis suggest that this multi-task framework is useful for identifying biologically relevant pathways associated with a phenotype across multiple data types while retaining predictive performance similar to that of comparable methods. / Dissertation Bioinformatics
42	Connecting bioinformatics analysis to scientific practice: an integrated information behaviour and task analysis approach / Bartlett, Joan Catherine. January 2004 Thesis (Ph.D.)--University of Toronto, 2004. / Adviser: Elaine Toms. Completed at the Faculty of Information Studies, University of Toronto. Includes bibliographical references (leaves 179-187). Bioinformatics.
43	Investigating the Genetic Basis of Gene Expression Using EQTL Techniques Quitadamo, Andrew 28 November 2018 With advances in genome sequencing technology, datasets with large sample sizes can be generated relatively quickly and cheaply, especially compared to the past decade or so. We can use these data to analyze the associations between genetic variants and gene expression, and how those associations in turn relate to specific phenotypes. We will explore the impact of structural variants (SVs) on gene expression and microRNA expression in healthy individuals. This dissertation applies expression quantitative trait loci (eQTL) analysis techniques to several of these datasets and describes an eQTL analysis pipeline software package. Bioinformatics
44	Integrated computational and experimental analysis of host-virus interaction systems Garamszegi, Sara 12 March 2016 Host-virus systems biology seeks to elucidate the complex interactions between a virus and its host, and to determine the downstream consequences of these interactions for the host. Traditional studies of host-virus interactions, conducted one at a time, yield high-quality results, but these have limited scope. By contrast, systems biology uses a holistic approach to examine many interactions simultaneously, thereby increasing the breadth of interactions revealed. However, these studies have largely focused on common human pathogens (e.g., influenza or HIV), and their results may not apply to unrelated viruses, such as those that cause hemorrhagic fevers. Combining experimental and computational techniques can yield novel information about host-virus interactions that traditional virological or purely computational systems-biology methods cannot uncover. In this thesis, I demonstrate the utility of combined experimental and computational approaches by: (1) revealing general principles of host-virus interactions, broadly applicable to a wide range of viruses; and (2) probing a specific host-virus interaction system to identify transcriptional signatures which elucidate host response to Ebola virus. I identify general mechanisms governing host-virus protein-protein interactions (PPIs) using domain-resolved PPI networks. This method identifies mechanistic differences between virus-human and within-human interactions, such as the preference viral proteins exhibit for binding human proteins containing linear motif-binding domains. Using domain-resolved PPIs reveals novel signatures of pleiotropy, economy, and convergent evolution in the viral-host interactome not previously identified in other PPI networks.
I further identify transcriptional signatures of host response to Ebola virus (EBOV) infection by pairing high-throughput microarray data with advanced pathway analyses. I compare EBOV-infected non-human primates with and without anticoagulant treatment to identify transcriptional signatures associated with survival following infection. Having found that CCAAT-enhancer binding proteins (CEBPs) are associated with survival, I determine the role CEBPs have in EBOV infection by using comparative microarray analysis of multiple viral infections of hemorrhagic and non-hemorrhagic origin. I also identify unique transcriptional changes in the host that distinguish EBOV infection from other viral infections, such as influenza. Integrating these two areas of research provides information about universally applicable patterns of viral infection, while simultaneously examining the consequences of specific host-pathogen interactions. Bioinformatics
45	Computational approaches to understanding infectious disease Tan, Yan 08 April 2016 Infectious diseases derive from organisms such as viruses, bacteria, fungi and parasites that can be passed from person to person, transmitted via bites from insects or animals, or acquired through ingestion of contaminated food or water or environmental exposure. Infectious diseases cause roughly 20% of annual deaths worldwide, including many children under the age of five. In developing countries, these diseases remain a major public health problem. They can also cause societal and economic burdens through life-long disability. We need a better understanding of these diseases with a view towards the goals of prevention and cure. The advent of whole-genome transcriptional profiling technology and powerful computational resources has made it possible to study infectious diseases on a genome-wide scale. Such studies can lead to improvements in diagnostic tools as well as preventive measures such as vaccines. The work of this thesis focuses on a number of projects with the common thread of developing and applying computational methods to extract biological information from high-throughput transcriptional data related to infectious diseases. These include (1) the identification of gene signatures related to B-cell proliferation that predict an influenza vaccine-induced antibody response; (2) the study of the physiological state of the Plasmodium falciparum malaria parasite when sequestered in human tissue; and (3) the identification of the similarities and differences in the responses to five anti-viral vaccines. To achieve the scientific goals of these projects I developed two new computational methods that can be utilized more broadly for the downstream interpretation of results from enrichment analyses of whole transcriptome profiles.
These are the Constellation Map, a combined visualization and annotation approach, and the Leading Edge Metagene Detector, which systematically consolidates functionally related genes from multiple sets representing highly enriched biological pathways and processes in the comparison of expression data of two biological phenotypes. The application of these computational approaches and tools in this dissertation enabled a better understanding of the biological mechanisms related to human vaccine response. The software packages developed are freely available for use by biological investigators across many fields. Bioinformatics
46	Computational studies for prediction of protein folding and ligand binding Luo, Lingqi 02 February 2018 This dissertation comprises four projects. I) Glycosylation is a post-translational modification that affects many physiological processes, including protein folding, cell interaction and host immune response. PglC, a phosphoglycosyl transferase (PGT) involved in the biosynthesis of N-linked glycoproteins in Campylobacter jejuni, is representative of one of the structurally simplest members of the small bacterial PGT family. The research uses sequence similarity network and evolutionary covariance studies to identify the catalytic core of PglC, followed by modeling of its three-dimensional structure using the covariance as constraints. II) Rapid growth of fragment-based drug discovery necessitates accurate fragment library screening for targets of interest, finding strong binders with specific binding. While many high-resolution biophysical methods for fragment screening work well, docking-based virtual screening is highly desired because of its speed and cost efficiency. Beyond the key performance-determining factors like score function and search method, the goal is to learn from the experimental fragment-bound structures in the PDBbinder database and to evaluate the profile of side-chain flexibility in the interface and its contribution to docking performance. III) Protein docking procedures carry out the task of predicting the structure of a protein–protein complex starting from the known structures of the individual protein components. However, the structure of one or both components frequently must be obtained by homology modeling based on known structures. This work presents a benchmark dataset of experimentally determined target complexes with a large set of sufficiently diverse template complexes identified for each target.
The dataset allows developers to test algorithms that combine homology modeling and docking, in order to determine the factors that critically influence prediction performance. IV) Human Eukaryotic Initiation Factor 4AI (heIF4AI) is the enzymatic component of a highly efficient complex, heIF4F. Its helicase activity binds and unwinds the secondary structure of mRNA at the 5' end and thus plays a crucial role in translation initiation. This research focuses on the C-terminal domain of heIF4AI and investigates its potential as an anti-cancer target by integrating the approaches of solvent mapping, docking, crystallization and NMR. Bioinformatics
47	Topological Analysis of Biological Pathways: Genes, MicroRNAs and Pathways Involved in Hepatocellular Carcinoma January 2017 abstract: Rewired biological pathways and/or rewired microRNA (miRNA)-mRNA interactions might also influence the activity of biological pathways. Here, rewired biological pathways are defined as the differential (rewiring) effects of genes on the topology of biological pathways between controls and cases. Similarly, rewired miRNA-mRNA interactions are defined as the differential (rewiring) effects of miRNAs on the topology of biological pathways between controls and cases. This dissertation discusses how rewired biological pathways (Chapter 1) and/or rewired miRNA-mRNA interactions (Chapter 2) aberrantly influence the activity of biological pathways and their association with disease. It proposes two PageRank-based analytical methods, Pathways of Topological Rank Analysis (PoTRA) and miR2Pathway, discussed in Chapter 1 and Chapter 2, respectively. PoTRA focuses on detecting pathways in which the number of hub genes differs between two phenotypes. The basis for PoTRA is that the loss of connectivity is a common topological trait of cancer networks, as well as the prior knowledge that a normal biological network is a scale-free network whose degree distribution follows a power law, where a small number of nodes are hubs and a large number of nodes are non-hubs. From normal to cancer, however, the process of the network losing connectivity might be the process of disrupting the scale-free structure of the network; that is, the number of hub genes might be altered in cancer compared to that in normal samples. Hence, it is hypothesized that if the number of hub genes in a pathway differs between normal and cancer, this pathway might be involved in cancer.
MiR2Pathway focuses on quantifying the differential effects of miRNAs on the activity of a biological pathway when miRNA-mRNA connections are altered from normal to disease, and on ranking the disease risk of rewired miRNA-mediated biological pathways. This dissertation explores how rewired gene-gene interactions and rewired miRNA-mRNA interactions lead to aberrant activity of biological pathways, and ranks pathways by their disease risk. The two methods proposed here can be used to complement existing genomics analysis methods to facilitate the study of biological mechanisms behind disease at the systems level. / Dissertation/Thesis / Doctoral Dissertation Molecular and Cellular Biology 2017 Bioinformatics
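The hub-counting idea behind PoTRA can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the toy gene networks, the hub threshold (`factor` times the uniform score), and the function names `pagerank` and `count_hubs` are all assumptions made for illustration, and the abstract does not state how hubs are defined or how hub counts are compared statistically.

```python
def pagerank(nodes, edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on an undirected graph given as an edge list."""
    neighbors = {node: set() for node in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by isolated nodes is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not neighbors[v])
        new_rank = {}
        for v in nodes:
            incoming = sum(rank[u] / len(neighbors[u]) for u in neighbors[v])
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


def count_hubs(nodes, edges, factor=1.5):
    """Count nodes whose PageRank exceeds `factor` times the uniform score 1/n.

    The threshold rule is an assumption for this sketch, not PoTRA's definition.
    """
    rank = pagerank(nodes, edges)
    threshold = factor / len(nodes)
    return sum(1 for score in rank.values() if score > threshold)


genes = ["A", "B", "C", "D", "E", "F"]
# Hypothetical "normal" pathway network: gene A is linked to every other gene.
normal_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("A", "F")]
# Hypothetical "cancer" pathway network: the same genes arranged in a ring.
cancer_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("F", "A")]

normal_hubs = count_hubs(genes, normal_edges)
cancer_hubs = count_hubs(genes, cancer_edges)
print("hubs in normal:", normal_hubs, "hubs in cancer:", cancer_hubs)
```

In this toy case the star-shaped network has one hub and the ring has none, so the pathway would be flagged as having an altered hub count; a real analysis would need networks inferred from expression data and a statistical test on the counts.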
48	Use of Large, Immunosignature Databases to Pose New Questions About Infection and Health Status January 2018 abstract: Immunosignature is a technology that retrieves information from the immune system. The technology is based on microarrays with peptides chosen from random sequence space. My thesis focuses on improving the Immunosignature platform and using Immunosignatures to improve diagnosis for diseases. I first contributed to the optimization of the immunosignature platform by introducing scoring metrics to select optimal parameters, considering performance as well as practicality. Next, I primarily worked on identifying a signature shared across various pathogens that can distinguish them from the healthy population. I further retrieved consensus epitopes from the disease common signature and proposed that most pathogens could share the signature by studying the enrichment of the common signature in the pathogen proteomes. Following this, I studied cancer samples from different stages and correlated the immune response with whether the epitope presented by the tumor is similar to the pathogen proteome. An effective immune response is defined as an antibody titer that increases and then decreases, suggesting elimination of the epitope. I found that an effective immune response usually correlates with epitopes that are more similar to pathogens. This suggests that the immune system might occupy a limited space and can be effective against only certain epitopes that have similarity with pathogens. I then participated in the attempt to solve the antibiotic resistance problem by developing a classification algorithm that can distinguish bacterial from viral infection. This algorithm outperforms other currently available classification methods. Finally, I worked on the concept of deriving a single number to represent all the data on the immunosignature platform.
This resembles the concept of temperature, which is an approximate measurement of whether an individual is healthy. The measure of Immune Entropy was found to work best as a single measurement to describe the immune system information derived from the immunosignature. Entropy is relatively invariant in the healthy population, but shows significant differences when comparing healthy donors with patients who are either infected with a pathogen or have cancer. / Dissertation/Thesis / Doctoral Dissertation Molecular and Cellular Biology 2018 Bioinformatics
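The abstract does not say how Immune Entropy is calculated. As a hedged illustration of how one number can summarize an array of peptide intensities, the Python sketch below assumes a Shannon entropy over intensities normalized to sum to one. The function name `shannon_entropy` and both intensity profiles are hypothetical.

```python
import math


def shannon_entropy(intensities):
    """Shannon entropy (in bits) of intensities normalized to sum to 1.

    Assumed formula for illustration; the dissertation may define
    Immune Entropy differently.
    """
    total = sum(intensities)
    probabilities = [x / total for x in intensities if x > 0]
    return -sum(p * math.log2(p) for p in probabilities)


# Hypothetical array readouts: binding spread evenly across four peptides,
# and binding concentrated on a single peptide.
even_profile = [100.0, 100.0, 100.0, 100.0]
skewed_profile = [370.0, 10.0, 10.0, 10.0]

even_entropy = shannon_entropy(even_profile)
skewed_entropy = shannon_entropy(skewed_profile)
print(even_entropy, skewed_entropy)
```

The evenly spread profile reaches the maximum of 2 bits for four peptides, while the concentrated profile scores lower. The sketch makes no claim about which pattern corresponds to health or disease.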
49	A Novel Approach to the Comparative Genomic Analysis of Canine and Human Cancers January 2018 abstract: Study of canine cancer’s molecular underpinnings holds great potential for informing veterinary and human oncology. Sporadic canine cancers are highly abundant (~4 million diagnoses/year in the United States), and the dog’s unique genomic architecture due to selective inbreeding and the high similarity between dog and human genomes both confer power for improving understanding of cancer genes. However, characterization of canine cancer genome landscapes has been limited, hindered by a lack of canine-specific tools and resources. To enable robust and reproducible comparative genomic analysis of canine cancers, I have developed a workflow for somatic and germline variant calling in canine cancer genomic data. I first adapted a human cancer genomics pipeline to create a semi-automated canine pipeline used to map genomic landscapes of canine melanoma, lung adenocarcinoma, osteosarcoma and lymphoma. This pipeline also forms the backbone of my novel comparative genomics workflow. Practical impediments to comparative genomic analysis of dog and human include challenges identifying similarities in mutation type and function across species. For example, canine genes may have evolved functions that differ from those of their human orthologs. Hence, I undertook a systematic statistical evaluation of dog and human cancer genes and assessed functional similarities and differences between orthologs to improve understanding of the roles of these genes in cancer across species. I tested this pipeline on canine and human Diffuse Large B-Cell Lymphoma (DLBCL), given that canine DLBCL is the most comprehensively genomically characterized canine cancer. Logistic regression with genes bearing somatic coding mutations in each cancer was used to determine if conservation metrics (sequence identity, network placement, etc.) could explain co-mutation of genes in both species. Using this model, I identified 25 co-mutated and evolutionarily similar genes that may be compelling cross-species cancer genes. For example, PCLO was identified as a co-mutated, conserved gene; it had previously been identified as recurrently mutated in human DLBCL, but with an unclear role in oncogenesis. Further investigation of these genes might shed new light on the biology of lymphoma in dogs and humans, and this approach may more broadly serve to prioritize new genes for comparative cancer biology studies. / Dissertation/Thesis / Doctoral Dissertation Biomedical Informatics 2018 Bioinformatics
50	Studying Low Complexity Structures in Bioinformatics Data Analysis of Biological and Biomedical Data Causey, Jason L. 02 June 2018 Biological, biomedical, and radiological data tend to be large, complex, and noisy. Gene expression studies contain expression levels for thousands of genes and hundreds or thousands of patients. Chest Computed Tomography images used for diagnosing lung cancer consist of hundreds of 2-D image “slices”, each containing hundreds of thousands of pixels. Beneath the size and apparent complexity of many of these data are simple and sparse structures. These low complexity structures can be leveraged into new approaches to biological, biomedical, and radiological data analyses. Two examples are presented here. First, we present SparRec (Sparse Recovery), a new framework for imputation of GWAS data based on a matrix completion (MC) model that takes advantage of the low rank and low number of co-clusters of GWAS matrices. SparRec is flexible enough to impute meta-analyses with multiple cohorts genotyped on different sets of SNPs, even without a reference panel. Compared with Mendel-Impute, another MC method, our low-rank based method achieves similar accuracy and efficiency even with up to 90% missing data; our co-clustering based method has advantages in running time. MC methods are shown to have advantages over statistics-based methods, including Beagle and fastPhase. Second, we demonstrate NoduleX, a method for predicting lung nodule malignancy from chest Computed Tomography (CT) data, based on deep convolutional neural networks. For training and validation, we analyze >1000 lung nodules in images from the LIDC/IDRI cohort and compare our results with classifications provided by four experienced thoracic radiologists who participated in the LIDC project. NoduleX achieves high accuracy for nodule malignancy classification, with an AUC of up to 0.99, commensurate with the radiologists’ analysis.
Whether they are leveraged directly or extracted using mathematical optimization and machine learning techniques, low complexity structures provide researchers with powerful tools for taming complex data. Bioinformatics

Search results