1 |
Semantic Prioritization of Novel Causative Genomic Variants in Mendelian and Oligogenic Diseases
Boudellioua, Imene 21 March 2019
Recent advances in Next Generation Sequencing (NGS) technologies have facilitated the generation of massive amounts of genomic data, which in turn brings the promise that personalized medicine will soon become widely available. As a result, there is increasing pressure to develop computational tools to analyze and interpret genomic data. In this dissertation, we present a systematic approach for interrogating patients’ genomes to identify candidate causal genomic variants of Mendelian and oligogenic diseases. To achieve that, we leverage biomedical data available from extensive biological experiments along with machine learning techniques to build predictive models that rival the currently adopted approaches in the field. We integrate a collection of features representing molecular information about the genomic variants and information derived from biological networks. Furthermore, we incorporate genotype-phenotype relations by exploiting semantic technologies and automated reasoning over a cross-species phenotypic ontology network obtained from human, mouse, and zebrafish studies. In our first developed method, named PhenomeNet Variant Predictor (PVP), we perform an extensive evaluation on a large set of synthetic exomes and genomes of diverse Mendelian diseases and phenotypes. Moreover, we evaluate PVP on a set of exomes from real patients with congenital hypothyroidism. We show that PVP outperforms state-of-the-art methods and provides a promising tool for accurate variant prioritization for Mendelian diseases. Next, we update the PVP method using a deep neural network architecture as a backbone for learning and illustrate the enhanced performance of the new method, DeepPVP, on synthetic exomes and genomes. Furthermore, we propose OligoPVP, an extension of DeepPVP that prioritizes candidate oligogenic combinations in personal exomes and genomes by integrating knowledge from protein-protein interaction networks, and we evaluate the performance of OligoPVP on synthetic genomes created from known disease-causing digenic combinations. Finally, we discuss some limitations and future steps for extending the applicability of our proposed methods to identify the genetic underpinnings of Mendelian and oligogenic diseases.
|
2 |
Prioritizing Causative Genomic Variants by Integrating Molecular and Functional Annotations from Multiple Biomedical Ontologies
Althagafi, Azza Th. 20 July 2023
Whole-exome and genome sequencing are widely used to diagnose individual patients. However, despite its success, this approach leaves many patients undiagnosed. This could be due to the need to discover more disease genes and variants or because disease phenotypes are novel and arise from a combination of variants of multiple known genes related to the disease. Recent rapid increases in available genomic, biomedical, and phenotypic data enable computational analyses, reducing the search space for disease-causing genes or variants and facilitating the prediction of causal variants. Therefore, artificial intelligence, data mining, machine learning, and deep learning are essential tools that have been used to identify biological interactions, including protein-protein interactions, gene-disease predictions, and variant-disease associations. Predicting these biological associations is a critical step in diagnosing patients with rare or complex diseases.
In recent years, computational methods have emerged to improve gene-disease prioritization by incorporating phenotype information. These methods evaluate a patient's phenotype against a database of gene-phenotype associations to identify the closest match. However, inadequate knowledge of phenotypes linked with specific genes in humans and model organisms limits the effectiveness of the prediction. Information about gene product functions and anatomical locations of gene expression is accessible for many genes and can be associated with phenotypes through ontologies and machine-learning models. Incorporating this information can enhance gene-disease prioritization methods and more accurately identify potential disease-causing genes.
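The phenotype-matching idea described above can be illustrated with a minimal sketch. It assumes a plain Jaccard overlap between phenotype sets, which is a simplification chosen here for illustration; the abstract does not specify the similarity measure, and ontology-aware semantic similarity would be used in practice. The names `jaccard`, `rank_genes`, and all gene and phenotype labels are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap between two collections of phenotype labels."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_genes(patient_phenotypes, gene_phenotypes):
    """Rank genes by how closely their known phenotypes match the patient's.

    gene_phenotypes maps a gene name to its set of associated phenotype labels.
    Returns (gene, score) pairs, best match first; ties are broken by gene name.
    """
    scores = {
        gene: jaccard(patient_phenotypes, phenotypes)
        for gene, phenotypes in gene_phenotypes.items()
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data: placeholder labels, not real genes or ontology terms.
gene_phenotypes = {
    "GENE_A": {"PHENO_1", "PHENO_2"},
    "GENE_B": {"PHENO_3"},
    "GENE_C": {"PHENO_2", "PHENO_3", "PHENO_4"},
}
patient = {"PHENO_1", "PHENO_2"}

ranking = rank_genes(patient, gene_phenotypes)
for gene, score in ranking:
    print(f"{gene}\t{score:.2f}")
```

A gene with no recorded phenotypes always scores zero in this sketch, which mirrors the limitation noted above: incomplete gene-phenotype knowledge directly limits the ranking.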
This dissertation aims to address key limitations in gene-disease prediction and variant prioritization by developing computational methods that systematically relate human phenotypes that arise as a consequence of the loss or change of gene function to gene functions and anatomical and cellular locations of activity. To achieve this objective, this work focuses on crucial problems in the causative variant prioritization pipeline and presents novel computational methods that significantly improve prediction performance by leveraging large background knowledge data and integrating multiple techniques.
Specifically, this dissertation presents novel approaches that utilize graph-based machine-learning techniques to leverage biomedical ontologies and linked biological data as background knowledge graphs. The methods employ representation learning with knowledge graphs and introduce generic models that address computational problems in gene-disease associations and variant prioritization. I demonstrate that my approach is capable of compensating for incomplete information in public databases and efficiently integrating with other biomedical data for similar prediction tasks. Moreover, my methods outperform other relevant approaches that rely on manually crafted features and laborious pre-processing. I systematically evaluate my methods and illustrate their potential applications for data analytics in biomedicine. Finally, I demonstrate how my prediction tools can be used in the clinic to assist geneticists in decision-making. In summary, this dissertation contributes to the development of more effective methods for predicting disease-causing variants and advancing precision medicine.
|