Assigning human genes to diseases and related phenotypes is an important topic in modern genomics. Human Phenotype Ontology (HPO) is a standardized vocabulary of phenotypic abnormalities that occur in human diseases. Computational methods such as label-propagation and supervised-learning address challenges posed by traditional approaches such as manual curation to link genes to phenotypes in the HPO. It is only in recent years that computational methods have been applied in a network-based approach for predicting genes to disease-related phenotypes. In this thesis, we present an extensive benchmarking of various computational methods for the task of network-based gene classification. These methods are evaluated on multiple protein interaction networks and feature representations. We empirically evaluate the performance of multiple prediction tasks using two evaluation experiments: cross-fold validation and the more stringent temporal holdout. We demonstrate that all of the prediction methods considered in our benchmarking analysis have similar performance, with each of the methods outperforming a random predictor. / Master of Science / For many years biologists have been working towards studying diseases, characterizing dis- ease history and identifying what factors and genetic variants lead to diseases. Such studies are critical to working towards the advanced prognosis of diseases and being able to iden- tify targeted treatment plans to cure diseases. An important characteristic of diseases is that they can be expressed by a set of phenotypes. Phenotypes are defined as observable characteristics or traits of an organism, such as height and the color of the eyes and hair. In the context of diseases, the phenotypes that describe diseases are referred to as clinical phenotypes, with some examples being short stature, abnormal hair pattern, etc. Biologists have identified the importance of deep phenotyping, which is defined as a concise analysis that gathers information about diseases and their observed traits in humans, in finding genetic variants underlying human diseases. We make use of the Human Phenotype Ontology (HPO), a standardized vocabulary of phenotypic abnormalities that occur in human diseases. The HPO provides relationships between phenotypes as well as associations between phenotypes and genes. In our study, we perform a systematic benchmarking to evaluate different types of computational approaches for the task of phenotype-gene prediction, across multiple molecular networks using various feature representations and for multiple evaluation strategies.
Identifer | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/109304 |
Date | 16 September 2020 |
Creators | Tyagi, Tanya |
Contributors | Computer Science, Murali, T. M., Heath, Lenwood S., Karpatne, Anuj |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertation |
Detected Language | English |
Type | Thesis |
Format | ETD, application/pdf |
Rights | In Copyright, http://rightsstatements.org/vocab/InC/1.0/ |
Page generated in 0.0023 seconds