
Mining Structural and Functional Patterns in Pathogenic and Benign Genetic Variants through Non-negative Matrix Factorization

The main challenge in studying genetics has evolved from identifying variations and their impact on traits to comprehending the molecular mechanisms through which genetic variations affect human biology, including disease susceptibility. Although large-scale genome-wide association studies (GWAS) have identified a vast number of variants associated with human traits, the underlying mechanisms of a significant portion of them remain poorly understood [1]. Addressing this uncertainty requires precise and scalable approaches to discover how genetic variation influences phenotypes at the molecular level. In this study, we developed a pipeline to automate the annotation of structural variant feature effects. We applied this pipeline to a dataset of 33,942 variants from the ClinVar and gnomAD databases, which included both pathogenic and benign associations. To bridge the gap between genetic variation data and molecular phenotypes, we applied Non-negative Matrix Factorization (NMF) to this large-scale dataset. The algorithm revealed 6 distinct clusters of variants with similar feature profiles. Two of these clusters consisted predominantly of benign variants (70% and 85% of their respective clusters), while one showed an almost equal distribution of pathogenic and benign variants. The remaining three clusters were predominantly composed of pathogenic variants, which made up 68%, 83%, and 77% of the respective clusters. These findings provide insight into the underlying mechanisms contributing to pathogenicity. Further analysis of this dataset and the exploration of disease-related genes can enhance the accuracy of genetic diagnosis and therapeutic development through the direct inference of variants that are likely to affect the functioning of essential genes.
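As a rough illustration of the clustering step described in the abstract, the sketch below factorizes a tiny, made-up variant-by-feature matrix with NMF, assigns each variant to its dominant component, and reports the pathogenic fraction per cluster. It is not the thesis pipeline: the toy matrix, the labels, the helper names (`matmul`, `transpose`, `nmf`), the choice of Lee-Seung multiplicative updates, and the use of k = 2 components are all assumptions made for the example.

```python
import random

def matmul(a, b):
    # Plain-Python matrix product; adequate for small toy matrices.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def nmf(v, k, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix v (n x m) into w (n x k) and h (k x m)
    using Lee-Seung multiplicative updates for the squared-error loss."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h

# Hypothetical data: 6 variants (rows) x 4 binary structural features (columns).
features = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical labels: 1 = pathogenic, 0 = benign.
pathogenic = [1, 1, 0, 0, 0, 1]

w, h = nmf(features, k=2)

# Assign each variant to the component with the largest weight in its row of W.
clusters = [max(range(len(row)), key=lambda c: row[c]) for row in w]

# Fraction of pathogenic variants in each non-empty cluster.
fractions = {}
for c in set(clusters):
    members = [pathogenic[i] for i, ci in enumerate(clusters) if ci == c]
    fractions[c] = sum(members) / len(members)

for c, frac in sorted(fractions.items()):
    print(f"cluster {c}: {frac:.0%} pathogenic")
```

In a real analysis the number of components would be chosen by a model-selection criterion and an established NMF implementation would likely replace the hand-written updates; the abstract does not specify either choice.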

Identifier: oai:union.ndltd.org:kaust.edu.sa/oai:repository.kaust.edu.sa:10754/693750
Date: 08 1900
Creators: Peña-Guerra, Karla A
Contributors: Arold, Stefan T., Biological and Environmental Science and Engineering (BESE) Division, Henao, Ricardo, Reversade, Bruno
Source Sets: King Abdullah University of Science and Technology
Language: English
Detected Language: English
Type: Thesis
Rights: 2024-08-24, At the time of archiving, the student author of this thesis opted to temporarily restrict access to it. The full text of this thesis will become available to the public after the expiration of the embargo on 2024-08-24.
Relation: N/A
